Hi Johannes,

sorry for the late reply. I first had to fix the RPi's USB and networking
after the last release; both seemed to be partially or even completely
broken since the 15.05 release. Reinier kindly pointed me to this. My
answers can be found below.

Johannes Schlatow wrote, quoting an earlier thread
(Tue, 28 Jan 2014 22:48:25 +0100):

  Sebastian Sumpf wrote:
    Thanks for your tests! But I don't like the 65 MBit/s thing! What
    is going on? Is this RX or TX?

  Julian Stecklina wrote:
    For the extremely bad case, it might be interesting to capture a
    packet trace and use tcptrace/xplot on it.

  Sebastian Sumpf wrote:
    Thanks Julian, I will have a look at it, even though Alex seems to be
    our plot guy .-)

Hi Sebastian,
I was wondering whether you actually looked into that, as we are
experiencing some strange effects with netperf as well.
We are running netperf_lwip on base-linux in order to evaluate how our
software changes affect networking performance. For TCP_STREAM, I get
results of approx. 350 Mbit/s, while TCP_MAERTS results in approx.
110 Mbit/s. Interestingly, this asymmetry is the reverse of the results
that have been discussed here.
However, what actually puzzles me most is that netperf_lwip_bridge
paints a quite different picture. More precisely, TCP_STREAM drops to
roughly 170 Mbit/s, which I guess is perfectly explainable by the
additional context switch and copying of the nic_bridge. Yet TCP_MAERTS
performs better, i.e., 130 Mbit/s with the additional nic_bridge. All
results are reproducible, and I could observe a similar behaviour on
hw_rpi.
AFAIK, the netserver code for TCP_STREAM only uses recv(), whereas the
code for TCP_MAERTS only uses send() (sketched below). Hence, it is
perfectly understandable to me that we see asymmetric throughput
results, depending on which path (RX or TX) performs better. However, I
just don't get why the nic_bridge, which not only adds a context switch
but also additional copying, increases the performance for TCP_MAERTS.
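
To make the two cases concrete, the data paths boil down to loops like
the following. This is only a simplified sketch, not the actual netperf
sources; the buffer size is made up and error handling is omitted:

  #include <sys/socket.h>
  #include <sys/types.h>

  /* TCP_STREAM: the netserver side only receives, exercising the RX path */
  static void stream_rx(int sock)
  {
          static char buf[16384];              /* arbitrary buffer size */
          while (recv(sock, buf, sizeof(buf), 0) > 0)
                  ; /* payload is discarded, only the byte count matters */
  }

  /* TCP_MAERTS: the netserver side only sends, exercising the TX path */
  static void maerts_tx(int sock)
  {
          static char buf[16384];
          while (send(sock, buf, sizeof(buf), 0) > 0)
                  ; /* keep the pipe full until the test ends */
  }

Both loops assume an already connected socket, so each test direction
stresses exactly one of the two data paths in isolation.
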
I guess this might be caused by bulk processing of multiple packets,
enabled by the asynchronous packet-stream interface (see the sketch
below). I think I could test this by assigning a high scheduling
priority to the nic_bridge, so that it gets scheduled for every single
packet instead of draining a batch per activation.
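
The effect I have in mind is illustrated below. All names are
hypothetical and the packet queue is just a counter; the point is only
how many wakeups (and thus signals and context switches) it takes to
consume a burst of packets that arrived while the consumer was not
running:

  #include <stdio.h>

  enum { BURST = 8 };        /* packets that queued up between wakeups */

  static int backlog;

  static int  packet_available(void) { return backlog > 0; }
  static void process_packet(void)   { backlog--; }

  int main(void)
  {
          int wakeups;

          /* Per-packet mode: the consumer runs after every single
           * packet, e.g., because it has a higher scheduling priority
           * than the producer. */
          backlog = BURST;
          for (wakeups = 0; packet_available(); wakeups++)
                  process_packet();
          printf("per-packet mode: %d wakeups\n", wakeups);

          /* Batched mode: one wakeup drains the whole backlog,
           * amortizing the signalling overhead over several packets. */
          backlog = BURST;
          wakeups = 1;
          while (packet_available())
                  process_packet();
          printf("batched mode:    %d wakeup(s)\n", wakeups);

          return 0;
  }

The toy numbers only illustrate why a component that is woken up less
often can move more data per activation.
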
So, my questions are:

1. Has anyone investigated Genode's networking performance any further?
2. Are there any other (possible) explanations for my observations?
1. Not really.
2. TCP uses receive and send window sizes. This means that an ACK has to
be sent for each window or segment, as they call it, not for each TCP
packet. Usually, the higher the throughput, the larger the window sizes.
We have seen window sizes as large as 20 KB, but only when Linux is
sending. The window size adapts dynamically to the rate of ACKs and
depends heavily on the timing of both communication partners.

Also, when sending (MAERTS) we cannot batch packets the way we do when
receiving them directly from the hardware (on most cards, multiple
packets can be available in one DMA transaction). This means each packet
is sent to the card in a separate request (especially on Linux).
Therefore, I would regard the send numbers as the baseline for handling
one packet at a time.

Because of the nic_bridge, the timing changed so that the ACK rate
somehow led to a slightly larger TCP window (you can check that with
Wireshark). Because of batching, the receive numbers would in turn be
the current (and not so great ;) upper limit. That would be my three
cents.
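
To put a rough number on the window argument: the achievable throughput
of a TCP connection is bounded by the window size divided by the
round-trip time. The 1 ms RTT below is just an assumed example value,
not something we measured:

  #include <stdio.h>

  int main(void)
  {
          double window_bytes = 20 * 1024;   /* ~20 KB, as observed     */
          double rtt_seconds  = 0.001;       /* assumed 1 ms round trip */

          /* Only one window's worth of data can be unacknowledged
           * ("in flight") at any time, so window / RTT bounds the
           * throughput. */
          double bps = window_bytes * 8.0 / rtt_seconds;

          printf("upper bound: %.0f Mbit/s\n", bps / 1e6);   /* ~164 */
          return 0;
  }

With those example numbers the bound lands in the same ballpark as the
110-170 Mbit/s figures you report, which is why a slightly larger window
(or a slightly shorter effective round-trip time) can already make a
visible difference.
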
Programming a TCP/IP stack that actually works and performs well in the
wild is complicated stuff, and I guess it could keep our whole company
busy just doing that. I hope this helps to explain some parts of your
observations,
Sebastian
--
Sebastian Sumpf
Genode Labs
http://www.genode-labs.com · http://genode.org
Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth