Discussion: netperf
Анна Будкина
2014-01-28 10:02:40 UTC
Permalink
Hello!

I'm running the netperf TCP_STREAM and TCP_MAERTS tests for the lxip stack and I
obtained an extremely low result - 5x10^6 bits/s - while the maximum throughput
is 1000 Mbit/s.
I wonder what could be the reason for that.

Anna
Sebastian Sumpf
2014-01-28 10:19:18 UTC
Permalink
Hi Anna,
Post by Анна Будкина
Hello!
I'm running the netperf TCP_STREAM and TCP_MAERTS tests for the lxip stack and I
obtained an extremely low result - 5x10^6 bits/s - while the maximum
throughput is 1000 Mbit/s.
I wonder what could be the reason for that.
That is only 5 Mbit/s, which is indeed pretty low. Right now the LXIP
stack should do at least about 500 Mbit/s in either direction. Please
let me know what hardware platform you are using, so I can try to
reproduce the behavior.

Thanks,

Sebastian
--
Sebastian Sumpf
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Анна Будкина
2014-01-28 10:58:35 UTC
Permalink
I'm measuring throughput between two hosts. I use Genode running on
Fiasco.OC on one machine and monolithic Linux on the other machine. There's
an 82579LM NIC in each host. I'm running the netperf_lxip.run script. As the
acpi driver doesn't work on my machine, I'm using the pci_drv driver. Another
problem is that level-triggered interrupts are not received, so I'm polling
for interrupts in a busy loop in /os/src/lib/dde_kit/interrupt.cc:
/* workaround: poll the IRQ handler in a busy loop */
while (1) {
        _lock.lock();
        if (_handle_irq) _handler(_priv);
        _lock.unlock();
}

/* original, interrupt-driven code, disabled for now */
//while (1) {
//        _irq.wait_for_irq();
          /* only call registered handler function, if IRQ is not disabled */
//        _lock.lock();
//        if (_handle_irq) _handler(_priv);
//        _lock.unlock();
//}
I did the same thing while testing the L4Re network stack, and it didn't
affect performance then.
Post by Sebastian Sumpf
Hi Anna,
Post by Анна Будкина
Hello!
I'm running the netperf TCP_STREAM and TCP_MAERTS tests for the lxip stack and I
obtained an extremely low result - 5x10^6 bits/s - while the maximum
throughput is 1000 Mbit/s.
I wonder what could be the reason for that.
That is only 5 mbit/s, which is indeed pretty low. Right now the LXIP
stack should at least do about 500 mbit/s in either direction. Please
let me know what hardware platform you are using, so I can try to
reproduce the behavior.
Thanks,
Sebastian
--
Sebastian Sumpf
Genode Labs
http://www.genode-labs.com · http://genode.org
Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Sebastian Sumpf
2014-01-28 11:50:29 UTC
Permalink
Hi again Anna,

for the future, please send your replies at the bottom, not the top, like
I am doing now .-)
Post by Анна Будкина
I'm measuring throughput between two hosts. I use Genode running on
Fiasco.OC on one machine and monolithic Linux on the other machine.
There's an 82579LM NIC in each host. I'm running the netperf_lxip.run
script. As the acpi driver doesn't work on my machine, I'm using the
pci_drv driver. Another problem is that level-triggered interrupts are
not received, so I'm polling for interrupts in a busy loop in
/os/src/lib/dde_kit/interrupt.cc:
while (1) {
        _lock.lock();
        if (_handle_irq) _handler(_priv);
        _lock.unlock();
}
//while (1) {
//        _irq.wait_for_irq();
          /* only call registered handler function, if IRQ is not disabled */
//        _lock.lock();
//        if (_handle_irq) _handler(_priv);
//        _lock.unlock();
//}
I've done the same thing while testing l4re network stack and it didn't
affect performance then.
It sounds as if you are using an x86 machine - right? Polling is a
no-go, and I have tried to fix the interrupt-mode issue (edge vs. level
vs. low and high) on Fiasco OC on several occasions. Unfortunately,
there seems to be no universal solution to that. Maybe you should ask
about that one on the L4Hackers mailing list:
l4-***@os.inf.tu-dresden.de

If you can afford it and if you're running on x86, please try the Nova
version of Genode and let me know about the outcome, performance-wise of
course. I would appreciate any hints as to why the ACPI driver is not
working (other than that it is not an x86 computer, of course).

For the record: I cannot recommend the changes above!

Sebastian
Sartakov A. Vasily
2014-01-28 11:58:42 UTC
Permalink
Hello.
Post by Sebastian Sumpf
If you can afford it and if you're running on x86, please try the Nova
version of Genode and let me know about the outcome, performance-wise of
course. I appreciate any hints why the ACPI-driver is not working (other
then it is not an x86 computer of course).
Just out of curiosity: have you compared lxip performance with different kernels - NOVA vs. FOC - on hardware, of course?
--
Sartakov A. Vasily
***@ksyslabs.org
Sebastian Sumpf
2014-01-28 21:06:17 UTC
Permalink
Hi Vasily,
Post by Sartakov A. Vasily
Hello.
Post by Sebastian Sumpf
If you can afford it and if you're running on x86, please try the Nova
version of Genode and let me know about the outcome, performance-wise of
course. I appreciate any hints why the ACPI-driver is not working (other
then it is not an x86 computer of course).
Just for curiosity: Have you compared LXI performance with different kernels - NOVA vs FOC? on hardware, ofc
No, not really. This thing has been optimized for the Exynos 5250
platform on Fiasco OC, using a Gigabit USB network adapter. Right now I
am looking into the rump kernel stuff (http://wiki.netbsd.org/rumpkernel),
hoping to get decent performance out of it! Just talk to my
colleagues at FOSDEM about it!

Sorry I cannot be there,

Sebastian
Christian Helmuth
2014-01-28 11:59:50 UTC
Permalink
Hello Anna,

welcome to the list ;-)
Post by Анна Будкина
I'm measuring throughput between two hosts. I use Genode running on
Fiasco.OC on one machine and monolithic Linux on the other machine. There's
an 82579LM NIC in each host. I'm running the netperf_lxip.run script. As the
acpi driver doesn't work on my machine, I'm using the pci_drv driver. Another
problem is that level-triggered interrupts are not received, and I'm polling
[...]

I was slightly astonished by the bad benchmark results. So, I tried
today's Genode master with the following scenario:

* Genode on Lenovo T61 (82566mm, PCIe 8086:1049)
* Linux on T410 (82577LM, PCIe 8086:10ea)

With your patch I got

! PERF: TCP_STREAM 2.02 MBit/s
! PERF: TCP_MAERTS 8.00 MBit/s

This substantiates my assumption that the "polling" you implemented
degrades the performance significantly. The original code without
polling produces

! PERF: TCP_STREAM 65.59 MBit/s
! PERF: TCP_MAERTS 543.35 MBit/s

which is not a top-notch result, but looks more promising. We have not
investigated the performance drop on TCP_STREAM up to now, but suspect
the NIC driver or its integration to be the cause.

Best regards
--
Christian Helmuth
Genode Labs

http://www.genode-labs.com/ · http://genode.org/
https://twitter.com/GenodeLabs · /ˈdʒiː.nəʊd/

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Sebastian Sumpf
2014-01-28 12:07:19 UTC
Permalink
Post by Christian Helmuth
Hello Anna,
welcome to the list ;-)
Post by Анна Будкина
I'm measuring throughput between two hosts. I use Genode running on
Fiasco.OC on one machine and monolithic Linux on the other machine. There's
an 82579LM NIC in each host. I'm running the netperf_lxip.run script. As the
acpi driver doesn't work on my machine, I'm using the pci_drv driver. Another
problem is that level-triggered interrupts are not received, and I'm polling
[...]
I was slightly astonished by the bad benchmark results. So, I tried
* Genode on Lenovo T61 (82566mm, PCIe 8086:1049)
* Linux on T410 (82577LM, PCIe 8086:10ea)
With your patch I got
! PERF: TCP_STREAM 2.02 MBit/s
! PERF: TCP_MAERTS 8.00 MBit/s
This substantiates my assumption that your implemented "polling"
degrades the performance significantly. The original code without
polling produces
! PERF: TCP_STREAM 65.59 MBit/s
! PERF: TCP_MAERTS 543.35 MBit/s
which is no top-notch result, but looks more promising. We did not
investigate the performance drop on TCP_STREAM up to now, but suspect
the NIC driver or its integration to be the cause.
Thanks for your tests! But I don't like the 65 MBit/s thing! What is
going on? Is this RX or TX?

Sebastian
--
Sebastian Sumpf
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Christian Helmuth
2014-01-28 12:20:04 UTC
Permalink
Sebastian,
Post by Sebastian Sumpf
Thanks for your tests! But I don't like the 65 MBit/s thing! What is
going on? Is this RX or TX?
Complete netperf output follows

---------------------------- TCP_STREAM -----------------------
spawn netperf-2.6.0 -H 10.0.0.65 -P 1 -v 2 -t TCP_STREAM -c -C -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.65 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB

87380 16384 1024 10.02 65.59 34.95 -1.00 174.598 -1.249

Alignment Offset Bytes Bytes Sends Bytes Recvs
Local Remote Local Remote Xfered Per Per
Send Recv Send Recv Send (avg) Recv (avg)
8 8 0 0 82158592 1024.00 80233 8905.12 9226

Maximum
Segment
Size (bytes)
1448

calculation: overall bytes / size per packet / time = packets per second
82158592 Bytes / 1024 Bytes / 10.02 s = 8007 packets/s

! PERF: TCP_STREAM 65.59 MBit/s ok

---------------------------- TCP_MAERTS -----------------------
spawn netperf-2.6.0 -H 10.0.0.65 -P 1 -v 2 -t TCP_MAERTS -c -C -- -m 1024
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.65 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Recv Send Recv Send
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB

87380 16384 1024 10.00 543.35 39.06 -1.00 23.555 -0.151

Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 679280808 13875.90 48954 16384.00 41474

Maximum
Segment
Size (bytes)
1448

calculation: overall bytes / size per packet / time = packets per second
679280808 Bytes / 1024 Bytes / 10.00 s = 66336 packets/s

! PERF: TCP_MAERTS 543.35 MBit/s ok


The manual states

TCP_STREAM It is quite simple, transferring some quantity of data
from the system running netperf to the system running
netserver.
TCP_MAERTS A TCP_MAERTS (MAERTS is STREAM backwards) test is “just
like” a TCP_STREAM test except the data flows from the
netserver to the netperf.

So, the scenario is much slower if the Genode side is _receiving_.

Regards
--
Christian Helmuth
Genode Labs

http://www.genode-labs.com/ · http://genode.org/
https://twitter.com/GenodeLabs · /ˈdʒiː.nəʊd/

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Sebastian Sumpf
2014-01-28 12:32:25 UTC
Permalink
Post by Christian Helmuth
Sebastian,
Post by Sebastian Sumpf
Thanks for your tests! But I don't like the 65 MBit/s thing! What is
going on? Is this RX or TX?
Complete netperf output follows
---------------------------- TCP_STREAM -----------------------
spawn netperf-2.6.0 -H 10.0.0.65 -P 1 -v 2 -t TCP_STREAM -c -C -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.65 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
87380 16384 1024 10.02 65.59 34.95 -1.00 174.598 -1.249
Alignment Offset Bytes Bytes Sends Bytes Recvs
Local Remote Local Remote Xfered Per Per
Send Recv Send Recv Send (avg) Recv (avg)
8 8 0 0 82158592 1024.00 80233 8905.12 9226
Maximum
Segment
Size (bytes)
1448
calculation: overall bytes / size per packet / time = packets per second
82158592 Bytes / 1024 Bytes / 10.02 s = 8007 packets/s
! PERF: TCP_STREAM 65.59 MBit/s ok
---------------------------- TCP_MAERTS -----------------------
spawn netperf-2.6.0 -H 10.0.0.65 -P 1 -v 2 -t TCP_MAERTS -c -C -- -m 1024
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.65 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Recv Send Recv Send
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB
87380 16384 1024 10.00 543.35 39.06 -1.00 23.555 -0.151
Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 679280808 13875.90 48954 16384.00 41474
Maximum
Segment
Size (bytes)
1448
calculation: overall bytes / size per packet / time = packets per second
679280808 Bytes / 1024 Bytes / 10.00 s = 66336 packets/s
! PERF: TCP_MAERTS 543.35 MBit/s ok
The manual states
TCP_STREAM It is quite simple, transferring some quantity of data
from the system running netperf to the system running
netserver.
TCP_MAERTS A TCP_MAERTS (MAERTS is STREAM backwards) test is “just
like” a TCP_STREAM test except the data flows from the
netserver to the netperf.
So, the scenario is much slower if the Genode side is _receiving_.
Without cursing: This is no good! I will look into that!

Sebastian
--
Sebastian Sumpf
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Анна Будкина
2014-01-28 12:49:49 UTC
Permalink
Post by Sebastian Sumpf
Post by Christian Helmuth
[...]
So, the scenario is much slower if the Genode side is _receiving_.
Without cursing: This is no good! I will look into that!
Sebastian
Thank you very much for the reply! I will try to use Genode on Nova to
perform these tests.
Christian Helmuth
2014-01-28 13:39:43 UTC
Permalink
Hello,

just for completeness...
Post by Анна Будкина
Thank you very much for the reply! I will try to use Genode on Nova to
perform these tests.
My results on NOVA

! PERF: TCP_STREAM 230.55 MBit/s
! PERF: TCP_MAERTS 664.66 MBit/s

So, on NOVA it performs slightly better on TCP_MAERTS and also yields
much improved performance on TCP_STREAM - unfortunately still about 1/3
of TCP_MAERTS.

Regards
--
Christian Helmuth
Genode Labs

http://www.genode-labs.com/ · http://genode.org/
https://twitter.com/GenodeLabs · /ˈdʒiː.nəʊd/

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Sebastian Sumpf
2014-01-28 21:08:14 UTC
Permalink
Post by Анна Будкина
[...]
Thank you very much for the reply! I will try to use Genode on Nova to
perform these tests.
So, it is x86, I guess.

Sebastian
Julian Stecklina
2014-01-28 21:22:47 UTC
Permalink
Post by Sebastian Sumpf
Thanks for your tests! But I don't like the 65 MBit/s thing! What is
going on? Is this RX or TX?
For the extremely bad case, it might be interesting to capture a packet
trace and use tcptrace/xplot on it.

Julian
Sebastian Sumpf
2014-01-28 21:48:25 UTC
Permalink
Post by Julian Stecklina
Post by Sebastian Sumpf
Thanks for your tests! But I don't like the 65 MBit/s thing! What is
going on? Is this RX or TX?
For the extremely bad case, it might be interesting to capture a packet
trace and use tcptrace/xplot on it.
Thanks Julian, I will have a look at it, even though Alex seems to be
our plot guy .-)

Sebastian
Johannes Schlatow
2016-05-24 20:11:27 UTC
Permalink
On Tue, 28 Jan 2014 22:48:25 +0100
Post by Sebastian Sumpf
Post by Julian Stecklina
Post by Sebastian Sumpf
Thanks for your tests! But I don't like the 65 MBit/s thing! What
is going on? Is this RX or TX?
For the extremely bad case, it might be interesting to capture a
packet trace and use tcptrace/xplot on it.
Thanks Julian, I will have a look at it, even though Alex seems to be
our plot guy .-)
Hi Sebastian,

I was wondering whether you actually looked into that, as we are
experiencing some strange effects with netperf as well.

Let me briefly summarise our findings:
We are running netperf_lwip on base-linux in order to evaluate how our
changes to the software affect the networking performance. For
TCP_STREAM, I get results of approx. 350 Mbit/s, while TCP_MAERTS results
in approx. 110 Mbit/s. Interestingly, this asymmetry is the reverse of the
results that have been discussed here.
However, what actually puzzles me most is the fact that
netperf_lwip_bridge draws a quite different picture. More precisely,
TCP_STREAM falls down to roughly 170 Mbit/s, which I guess is
perfectly explainable by the additional context switch and copying of
the nic_bridge. Yet TCP_MAERTS performs better, i.e. 130 Mbit/s with the
additional nic_bridge. All results are reproducible. I could also
observe a similar behaviour on hw_rpi.

AFAIK the netserver code for TCP_STREAM only uses recv() whereas the
code for TCP_MAERTS only uses send(). Hence, it's totally
comprehensible to me that we experience asymmetric throughput results
depending on which path (RX or TX) performs better. However, I just
don't get why the nic_bridge, which not only adds a context switch but
also additional copying, increases the performance for TCP_MAERTS.
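
For reference, the recv()/send() asymmetry mentioned above boils down to
roughly the following server-side loops - a stripped-down POSIX-socket sketch,
not the actual netperf/netserver sources; 'sock' stands for an already
accepted, connected TCP socket on the side that runs netserver:

#include <sys/socket.h>

/* TCP_STREAM: netserver only receives, i.e. the side running netserver
 * exercises its RX path */
static void stream_server_loop(int sock)
{
        char buf[16384];
        while (recv(sock, buf, sizeof(buf), 0) > 0) { /* discard payload */ }
}

/* TCP_MAERTS: netserver only sends, i.e. the side running netserver
 * exercises its TX path */
static void maerts_server_loop(int sock)
{
        char buf[16384] = { 0 };
        /* in the real test this loop runs for the measurement interval */
        while (send(sock, buf, sizeof(buf), 0) > 0) { }
}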

I guess this might be caused by bulk processing of multiple packets
enabled by the asynchronous packet-stream interface. I think I could
test this by assigning a high scheduling priority to the nic_bridge
so that it always processes a single packet.

Up to this point I have basically two questions:
1. Has anyone made any further investigations of Genode's networking
performance?
2. Any other (possible) explanations for my observations?

Cheers
Johannes
Sebastian Sumpf
2016-06-07 09:15:44 UTC
Permalink
Hi Johannes,

sorry for the late answer. I first had to fix the RPI's USB and networking
after the last release, as it seemed to be partially or totally broken since
the 15.05 release. Reinier kindly pointed me to this. My answer can be
found below.
Post by Johannes Schlatow
On Tue, 28 Jan 2014 22:48:25 +0100
Post by Sebastian Sumpf
Post by Julian Stecklina
Post by Sebastian Sumpf
Thanks for your tests! But I don't like the 65 MBit/s thing! What
is going on? Is this RX or TX?
For the extremely bad case, it might be interesting to capture a
packet trace and use tcptrace/xplot on it.
Thanks Julian, I will have a look at it, even though Alex seems to be
our plot guy .-)
Hi Sebastian,
I was wondering whether you actually looked into that as we are
experiencing some strange effects with netperf as well.
We are running netperf_lwip on base-linux in order to evaluate how our
changes in the software affect the networking performance. For
TCP_STREAM, I get results of approx. 350Mbit/s while TCP_MAERTS results
in approx. 110Mbit/s. Interestingly, this asymmetry is reverse to the
results that have been discussed here.
However, what actually puzzles me most is the fact that
netperf_lwip_bridge draws a quite different picture. More precisely,
TCP_STREAM falls down to round about 170Mbit/s which I guess is
perfectly explainable by the additional context switch and copying of
the nic_bridge. Yet TCP_MAERTS performs better, i.e. 130Mbit/s with the
additional nic_bridge. All results are reproducible. I could also
observe a similar behaviour on hw_rpi.
AFAIK the netserver code for TCP_STREAM only uses recv() whereas the
code for TCP_MAERTS only uses send(). Hence, it's totally
comprehensible to me that we experience asymmetric throughput results
depending on which path (RX or TX) performs better. However, I just
don't get why the nic_bridge, which not only adds a context switch but
also additional copying, increases the performance for TCP_MAERTS.
I guess this might be caused by bulk processing of multiple packets
enabled by the asynchronous packet-stream interface. I think I could
test this by assigning a high scheduling priority to the nic_bridge
so that it always processes a single packet.
1. Has anyone made any further investigations of Genode's networking
performance?
2. Any other (possible) explanations for my observations?
1. Not really.

2. TCP uses receive and send window sizes. This means that an ACK has to
be sent for each window or segment, as they call it, not for each TCP
packet. Usually, the higher the throughput, the larger the window
sizes. We have seen window sizes as large as 20 KB, but only when Linux
is sending. The window size dynamically adapts to the rate of ACKs and
heavily depends on the timing of both communication partners. Also, when
sending (MAERTS) we cannot batch packets as we do when receiving them
directly from the hardware (there can be multiple packets available in
one DMA transaction - on most cards). This means each packet is sent to
the card in a separate request (especially on Linux). Therefore, I would
regard the sending numbers as the baseline for sending or receiving one
packet at a time. Because of the nic_bridge, the timing changed so that
the ACK rate somehow caused a slightly larger TCP window (you can check
that with wireshark). Because of batching, the receive numbers are in turn
the current (and not so great ;) upper limit. That would be my three cents.
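
If wireshark is not at hand, the window and congestion state that Linux
currently uses for the netperf connection can also be sampled on the Linux
peer via the TCP_INFO socket option - a minimal sketch (the socket descriptor
is a placeholder, and this only shows Linux's local view of the connection):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <cstdio>

/* print congestion window, MSS, and RTT as seen by the Linux kernel */
static void print_tcp_state(int sock)
{
        struct tcp_info info;
        socklen_t       len = sizeof(info);

        if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
                std::printf("cwnd: %u segments, snd_mss: %u bytes, rtt: %u us\n",
                            info.tcpi_snd_cwnd, info.tcpi_snd_mss, info.tcpi_rtt);
}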

Programming a TCP/IP stack that actually works and performs in the wild
is complicated stuff and I guess we could keep our whole company busy,
just doing that. I hope this helps to explain some parts of your
observation,

Sebastian
--
Sebastian Sumpf
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth