Hi Johannes,

sorry for the late reply. I first had to fix the RPi's USB and networking
after the last release; both seemed to be partially or even completely
broken since the 15.05 release. Reinier kindly pointed me to this. My
answers can be found below.

Johannes Schlatow wrote, quoting an earlier thread
(Tue, 28 Jan 2014 22:48:25 +0100):

  Sebastian Sumpf wrote:
    Thanks for your tests! But I don't like the 65 MBit/s thing! What
    is going on? Is this RX or TX?

  Julian Stecklina wrote:
    For the extremely bad case, it might be interesting to capture a
    packet trace and use tcptrace/xplot on it.

  Sebastian Sumpf wrote:
    Thanks Julian, I will have a look at it, even though Alex seems to be
    our plot guy .-)

Hi Sebastian,
I was wondering whether you actually looked into that, as we are
experiencing some strange effects with netperf as well.
We are running netperf_lwip on base-linux in order to evaluate how our
software changes affect networking performance. For TCP_STREAM, I get
results of approx. 350 Mbit/s, while TCP_MAERTS results in approx.
110 Mbit/s. Interestingly, this asymmetry is the reverse of the results
that have been discussed here.
However, what actually puzzles me most is that netperf_lwip_bridge
paints a quite different picture. More precisely, TCP_STREAM drops to
roughly 170 Mbit/s, which I guess is perfectly explainable by the
additional context switch and copying of the nic_bridge. Yet TCP_MAERTS
performs better, i.e., 130 Mbit/s with the additional nic_bridge. All
results are reproducible, and I could observe a similar behaviour on
hw_rpi.
AFAIK, the netserver code for TCP_STREAM only uses recv(), whereas the
code for TCP_MAERTS only uses send() (sketched below). Hence, it is
perfectly understandable to me that we see asymmetric throughput
results, depending on which path (RX or TX) performs better. However, I
just don't get why the nic_bridge, which not only adds a context switch
but also additional copying, increases the performance for TCP_MAERTS.
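
To make the two cases concrete, the data paths boil down to loops like
the following. This is only a simplified sketch, not the actual netperf
sources; the buffer size is made up and error handling is omitted:

  #include <sys/socket.h>
  #include <sys/types.h>

  /* TCP_STREAM: the netserver side only receives, exercising the RX path */
  static void stream_rx(int sock)
  {
          static char buf[16384];              /* arbitrary buffer size */
          while (recv(sock, buf, sizeof(buf), 0) > 0)
                  ; /* payload is discarded, only the byte count matters */
  }

  /* TCP_MAERTS: the netserver side only sends, exercising the TX path */
  static void maerts_tx(int sock)
  {
          static char buf[16384];
          while (send(sock, buf, sizeof(buf), 0) > 0)
                  ; /* keep the pipe full until the test ends */
  }

Both loops assume an already connected socket, so each test direction
stresses exactly one of the two data paths in isolation.
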
I guess this might be caused by bulk processing of multiple packets,
enabled by the asynchronous packet-stream interface (see the sketch
below). I think I could test this by assigning a high scheduling
priority to the nic_bridge, so that it gets scheduled for every single
packet instead of draining a batch per activation.
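
The effect I have in mind is illustrated below. All names are
hypothetical and the packet queue is just a counter; the point is only
how many wakeups (and thus signals and context switches) it takes to
consume a burst of packets that arrived while the consumer was not
running:

  #include <stdio.h>

  enum { BURST = 8 };        /* packets that queued up between wakeups */

  static int backlog;

  static int  packet_available(void) { return backlog > 0; }
  static void process_packet(void)   { backlog--; }

  int main(void)
  {
          int wakeups;

          /* Per-packet mode: the consumer runs after every single
           * packet, e.g., because it has a higher scheduling priority
           * than the producer. */
          backlog = BURST;
          for (wakeups = 0; packet_available(); wakeups++)
                  process_packet();
          printf("per-packet mode: %d wakeups\n", wakeups);

          /* Batched mode: one wakeup drains the whole backlog,
           * amortizing the signalling overhead over several packets. */
          backlog = BURST;
          wakeups = 1;
          while (packet_available())
                  process_packet();
          printf("batched mode:    %d wakeup(s)\n", wakeups);

          return 0;
  }

The toy numbers only illustrate why a component that is woken up less
often can move more data per activation.
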
So, my questions are:

1. Has anyone investigated Genode's networking performance any further?
2. Are there any other (possible) explanations for my observations?
1. Not really.
2. TCP uses receive and send window sizes. This means that an ACK has to
be sent for each window or segment, as they call it, not for each TCP
packet. Usually, the higher the throughput, the larger the window sizes.
We have seen window sizes as large as 20 KB, but only when Linux is
sending. The window size adapts dynamically to the rate of ACKs and
depends heavily on the timing of both communication partners.

Also, when sending (MAERTS) we cannot batch packets the way we do when
receiving them directly from the hardware (on most cards, multiple
packets can be available in one DMA transaction). This means each packet
is sent to the card in a separate request (especially on Linux).
Therefore, I would regard the send numbers as the baseline for handling
one packet at a time.

Because of the nic_bridge, the timing changed so that the ACK rate
somehow led to a slightly larger TCP window (you can check that with
Wireshark). Because of batching, the receive numbers would in turn be
the current (and not so great ;) upper limit. That would be my three
cents.
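
To put a rough number on the window argument: the achievable throughput
of a TCP connection is bounded by the window size divided by the
round-trip time. The 1 ms RTT below is just an assumed example value,
not something we measured:

  #include <stdio.h>

  int main(void)
  {
          double window_bytes = 20 * 1024;   /* ~20 KB, as observed     */
          double rtt_seconds  = 0.001;       /* assumed 1 ms round trip */

          /* Only one window's worth of data can be unacknowledged
           * ("in flight") at any time, so window / RTT bounds the
           * throughput. */
          double bps = window_bytes * 8.0 / rtt_seconds;

          printf("upper bound: %.0f Mbit/s\n", bps / 1e6);   /* ~164 */
          return 0;
  }

With those example numbers the bound lands in the same ballpark as the
110-170 Mbit/s figures you report, which is why a slightly larger window
(or a slightly shorter effective round-trip time) can already make a
visible difference.
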
Programming a TCP/IP stack that actually works and performs well in the
wild is complicated stuff, and I guess it could keep our whole company
busy just doing that. I hope this helps to explain some parts of your
observations,
Sebastian
--
Sebastian Sumpf
Genode Labs
http://www.genode-labs.com · http://genode.org
Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth