I've been tracking down a problem with ibrowser being extremely slow for me (it took 64 sec to load http://www.reactos.org). It turns out to be a problem associated with the loopback interface. The attached test program (gcc -o loop.exe loop.c -lws2_32) is the minimum test program to demonstrate the problem. It occurs when you send a small amount of data over the loopback interface when there is no pending recv (so you send, then start a recv for the data, then another send, then another recv, no recv waiting while you send).
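The attached loop.c is not reproduced in this thread. As a rough illustration of the access pattern it is described as exercising, here is a minimal Python sketch (not the original Winsock program). The function name `time_send_recv_rounds` and its parameters are made up for this example. It opens both ends of a TCP connection on 127.0.0.1 in a single thread, so that no recv is ever pending while a send is in progress.

```python
import socket
import time


def time_send_recv_rounds(rounds=5, timeout=30.0):
    """Send one byte, then receive it, `rounds` times over TCP loopback.

    Both ends live in the same thread, so no recv is pending during a send.
    Returns the elapsed seconds for each send/recv round.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(timeout)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.settimeout(timeout)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()
    receiver.settimeout(timeout)

    timings = []
    try:
        for _ in range(rounds):
            start = time.monotonic()
            sender.sendall(b"x")     # nothing is waiting in recv at this point
            data = receiver.recv(1)  # only now do we ask for the byte
            if data != b"x":
                raise RuntimeError("unexpected data on loopback connection")
            timings.append(time.monotonic() - start)
    finally:
        receiver.close()
        sender.close()
        listener.close()
    return timings


if __name__ == "__main__":
    for i, seconds in enumerate(time_send_recv_rounds(), start=1):
        print(f"round {i}: {seconds:.3f} s")
```

On a stack that behaves as described in this thread, every round after the first would be expected to stall until the internal timer fires; on a stack without the problem each round should complete almost immediately.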
Sequence of events:
- Start first send
- TCP/IP stack queues the data at the receiving end
- First send returns
- Start first recv
- Queued data is retrieved, but it is determined that there's plenty of space in the TCP window left, so there is no ACK sent back
- First recv returns
- Start second send
- tcp_output determines that the connection is not idle (there is some un-ACKed data) and queues the data at the sending end
- Second send returns
- Start second recv
- Since there is no data waiting at the receiving end, recv just sits there
- An internal timer with a period of 2.5 sec expires
- The timer proc notices that there is an ACK pending
- ACK is sent back to the sending end
- Sending end determines connection is idle now and sends the queued data to the receiving end
- Timer proc terminates and reschedules itself
- Data has now arrived at the receiving end and can be retrieved by the waiting recv. Again no ACK is sent back
- Second recv returns
- Third send starts, queues its data at sending end and returns
- Third recv starts, has to wait 2.5 sec for the internal timer to timeout and then returns
- etc.
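The sequence above can be condensed into a toy timing model. This is only a sketch under simplifying assumptions: application calls take no time, the timer is aligned to time zero, and the receiver never sends an ACK on its own. The function name and the `hold_when_unacked` switch are hypothetical; the switch stands in for the "is the connection idle" check that the proposed fix removes.

```python
ACK_TIMER_PERIOD = 2.5  # seconds; the internal timer period quoted above


def recv_completion_times(rounds, hold_when_unacked=True, period=ACK_TIMER_PERIOD):
    """Return the model time at which each recv returns.

    hold_when_unacked=True models the sender queuing small data while earlier
    data is still un-ACKed; False models sending regardless of idleness.
    """
    ticks = 0        # number of timer expiries so far
    unacked = False  # does the sender still have un-ACKed data outstanding?
    times = []
    for _ in range(rounds):
        if unacked and hold_when_unacked:
            # The data sits at the sending end until the timer fires, the
            # pending ACK goes out, and the sender sees an idle connection.
            ticks += 1
        times.append(ticks * period)
        # The receiver takes the data but sends no ACK, so the next send
        # again finds un-ACKed data outstanding.
        unacked = True
    return times


print(recv_completion_times(4))                           # [0.0, 2.5, 5.0, 7.5]
print(recv_completion_times(4, hold_when_unacked=False))  # [0.0, 0.0, 0.0, 0.0]
```

In this model every round after the first costs one full timer period, which matches the pattern described for the third and later recv calls.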
Although I understand what's wrong, I have a bit of difficulty trying to figure out how to fix it. Attached is a proposed fix, which basically attacks the problem at the sending end by removing the check if the connection is idle or not. With that fix the loop.c program works as I expect and http://www.reactos.org loads in a much more reasonable 2-3 sec in ibrowser. The problem that I have with the fix is that it's a change in code we borrowed from the BSD stack. I can hardly imagine that a piece of software so heavily used as the BSD stack would have such a fundamental problem.
Any thoughts?
GvG
A correctly implemented TCP stack will keep sending packets as long as there is free space in the window. If the existing code refuses to send more packets because there is SOME unacknowledged data, but enough room in the window to send more, then it is broken.
The entire point of having a window is so that the sender can send multiple packets before the first is ACK'd. Are you sure that the existing code refuses to send even when the window should allow it? Or is the window just too small? If so, are you sure it came directly from the bsd stack? I can not believe that both are true.
On Tue, 20 Dec 2005 23:45:41 -0500 Phillip Susi psusi@cfl.rr.com wrote:
> A correctly implemented TCP stack will keep sending packets as long as there is free space in the window. If the existing code refuses to send more packets because there is SOME unacknowledged data, but enough room in the window to send more, then it is broken.
> The entire point of having a window is so that the sender can send multiple packets before the first is ACK'd. Are you sure that the existing code refuses to send even when the window should allow it? Or is the window just too small? If so, are you sure it came directly from the bsd stack? I can not believe that both are true.
The problem is likely in our loopback pseudo adapter rather than the BSD code, I'd guess.
It actually doesn't come directly from BSD. It was first in oskit, in which some work was done to isolate a few parts of FreeBSD and simplify and generalize the buffer management code. If there's a mistake in the TCP implementation itself, I might have introduced it. Since the IP and TCP code had stuff that overlaps code we already had in tcpip.sys, I narrowed the scope of the BSD code to make the import more compact (and preserve as much original code as possible in ReactOS).
From: Phillip Susi
> The entire point of having a window is so that the sender can send multiple packets before the first is ACK'd. Are you sure that the existing code refuses to send even when the window should allow it?
I've been wrong before, but that's my interpretation of what's happening. We've only sent one byte and are trying to send another single byte.
> Or is the window just too small?
Since a window size of 1 doesn't make sense, the window size must be big enough. I believe it is actually 16384, plenty of room for the 2 bytes we've sent/are trying to send.
From: art yerkes
> The problem is likely in our loopback pseudo adapter rather than the BSD code, I'd guess.
Please don't take this the wrong way (I'm still impressed by the work you and Casper did on integrating the network stack), but that was my initial thought too... However, as far as I can see the second send just doesn't make it to the loopback adapter, it's tcp_output() which decides it shouldn't be sent.
> It actually doesn't come directly from BSD. It was first in oskit, in which some work was done to isolate a few parts of FreeBSD and simplify and generalize the buffer management code. If there's a mistake in the TCP implementation itself, I might have introduced it.
I checked the FreeBSD code and there seems to be a fix in there (which is not present in our tree) related to this (http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c#rev1.53):
"Add a flag TF_LASTIDLE, that forces a previously idle connection to send all its data, especially when the data is less than one MSS. This fixes an issue where the stack was delaying the sending of data, eventhough there was enough window to send all the data and the sending of data was emptying the socket buffer."
That same TF_LASTIDLE flag is also present in OpenBSD (not surprising, since they say they imported it from FreeBSD). I'll see if I can apply the same change to our tree.
GvG
Ge van Geldorp wrote:
> I've been wrong before, but that's my interpretation of what's happening. We've only sent one byte and are trying to send another single byte.
Wait a second... why is it sending only one byte at a time? The caller is only send()ing one byte at a time? The caller really should not be doing that. You may be seeing the Nagle algorithm kicking in then, which specifically tries to wait for an ack before sending more frames, _unless_ there is enough queued data to send another complete frame. This prevents tons of single byte frames from flooding the network, instead coalescing them into larger frames.
I'm curious now about the application. Why is it sending one byte at a time to itself via the loopback? That seems to be broken.
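To make the two positions in this exchange concrete, here is a simplified Python sketch of a Nagle-style send decision. It is an illustration only, not the tcp_output code from the ReactOS or BSD trees. The function name, the parameter names and the default `mss` of 1460 bytes are assumptions made for this example; the 16384-byte window is the figure quoted earlier in the thread.

```python
def may_send_now(pending, unacked, window=16384, mss=1460, nodelay=False):
    """Decide whether queued data may be transmitted right away.

    pending: bytes waiting in the send queue
    unacked: bytes already sent but not yet acknowledged
    window:  send window in bytes (16384 is the value quoted in the thread)
    mss:     maximum segment size (1460 is an assumed, typical value)
    nodelay: True models a socket with Nagle's algorithm disabled
    """
    if pending <= 0:
        return False
    usable = window - unacked  # room left in the window
    if usable <= 0:
        return False           # the window itself forbids sending
    if min(pending, usable) >= mss:
        return True            # a complete segment is always worth sending
    if nodelay:
        return True            # small segments allowed when Nagle is off
    return unacked == 0        # otherwise only send small data when idle


# One byte outstanding, one more byte queued: the window has plenty of
# room, yet the Nagle-style rule still holds the second byte back.
print(may_send_now(pending=1, unacked=1))  # False
print(may_send_now(pending=1, unacked=0))  # True
```

The first call shows why both observations can hold at once: the window is nowhere near full, and the second single-byte send is still deferred until the first byte is acknowledged.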
From: Phillip Susi
> Ge van Geldorp wrote:
>> I've been wrong before, but that's my interpretation of what's happening. We've only sent one byte and are trying to send another single byte.
> Wait a second... why is it sending only one byte at a time? The caller is only send()ing one byte at a time?
Yup.
> The caller really should not be doing that.
Hmm, I just read a Larry Wall quote: "we're not terribly interested in telling people what they can't do" :-)
> You may be seeing the Nagle algorithm kicking in then, which specifically tries to wait for an ack before sending more frames, _unless_ there is enough queued data to send another complete frame. This prevents tons of single byte frames from flooding the network, instead coalescing them into larger frames.
> I'm curious now about the application. Why is it sending one byte at a time to itself via the loopback? That seems to be broken.
The application is the Mozilla ActiveX control. My guess (but I haven't studied its source code to confirm this) is that it is doing this as a form of inter-thread communication. It might be broken, but it works ok on Windows, so we have to make it work also. In my first post I've included a simple test app which demonstrates the behaviour. I can send it again if you like.
GvG
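For a test program under one's own control (unlike the Mozilla control), one way to check whether Nagle's algorithm is involved is to disable it on the sending socket with the standard TCP_NODELAY option. The Python sketch below shows the call; the helper name is hypothetical, and whether the stack discussed here honours the option is not stated anywhere in this thread.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
# A non-zero value means the option is set on this socket.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

If the delays disappear with the option set, that points at the small-segment send decision rather than at the loopback adapter.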
This is the loopback adapter we are talking about; we won't be flooding any "network" with that traffic.
On 12/21/05, Phillip Susi psusi@cfl.rr.com wrote:
> Ge van Geldorp wrote:
>> I've been wrong before, but that's my interpretation of what's happening. We've only sent one byte and are trying to send another single byte.
> Wait a second... why is it sending only one byte at a time? The caller is only send()ing one byte at a time? The caller really should not be doing that. You may be seeing the Nagle algorithm kicking in then, which specifically tries to wait for an ack before sending more frames, _unless_ there is enough queued data to send another complete frame. This prevents tons of single byte frames from flooding the network, instead coalescing them into larger frames.
> I'm curious now about the application. Why is it sending one byte at a time to itself via the loopback? That seems to be broken.
-- <Russell> argh <Russell> iterator shenanigans :/