I've been tracking down a problem with ibrowser being extremely slow for me
(it took 64 sec to load http://www.reactos.org). It turns out to be a
problem associated with the loopback interface. The attached test program
(gcc -o loop.exe loop.c -lws2_32) is the minimum test program to demonstrate
the problem. It occurs when you send a small amount of data over the
loopback interface while no recv is pending: you send, then start a recv
for the data, then send again, then recv again, so no recv is ever waiting
while you send.
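
For readers without the attachment, the call pattern can be sketched roughly as follows. This is not the attached loop.c: it is a minimal sketch that assumes POSIX sockets rather than Winsock, runs both ends of the connection from a single thread, and uses a hypothetical helper name (run_loopback_rounds). It only illustrates the order of the calls; how long each recv takes depends entirely on the stack it runs on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from loop.c): runs `rounds` send-then-recv pairs
 * over a loopback TCP connection from one thread, so that no recv is pending
 * while a send is in progress.
 * Returns the number of completed rounds, or -1 on a socket error. */
int run_loopback_rounds(int rounds)
{
    int done = -1;
    int listener = -1, client = -1, server = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char byte = 'x', got = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the stack pick a free port */

    listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) goto out;
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if (listen(listener, 1) < 0) goto out;
    /* Find out which port was actually assigned. */
    if (getsockname(listener, (struct sockaddr *)&addr, &len) < 0) goto out;

    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) goto out;
    /* On loopback the handshake completes against the listen backlog,
     * so connect can return before accept is called. */
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    server = accept(listener, NULL, NULL);
    if (server < 0) goto out;

    for (done = 0; done < rounds; done++) {
        /* Small send first; the recv is only started afterwards. */
        if (send(client, &byte, 1, 0) != 1) { done = -1; break; }
        if (recv(server, &got, 1, 0) != 1) { done = -1; break; }
    }

out:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    if (listener >= 0) close(listener);
    return done;
}
```

Timing each recv call in the loop would show whether a given stack stalls on the second and later rounds.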
Sequence of events:
- Start first send
- TCP/IP stack queues the data at the receiving end
- First send returns
- Start first recv
- Queued data is retrieved, but it is determined that there's plenty of
space in the TCP window left, so there is no ACK sent back
- First recv returns
- Start second send
- tcp_output determines that the connection is not idle (there is some
un-ACKed data) and queues the data at the sending end
- Second send returns
- Start second recv
- Since there is no data waiting at the receiving end, recv just sits there
- An internal timer with a period of 2.5 sec expires
- The timer proc notices that there is an ACK pending
- ACK is sent back to the sending end
- Sending end determines connection is idle now and sends the queued data to
the receiving end
- Timer proc terminates and reschedules itself
- Data has now arrived at the receiving end and can be retrieved by the
waiting recv. Again no ACK is sent back
- Second recv returns
- Third send starts, queues its data at sending end and returns
- Third recv starts, has to wait 2.5 sec for the internal timer to expire
and then returns
- etc.
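
The sequence above can be replayed with a toy model. This is a sketch under simplifying assumptions, not code from the stack: all names (loop_model, model_send, simulate_wait_ms, and so on) are hypothetical, each stall is charged as one full 2.5 sec timer period, and the only behaviours modelled are the sender's idle check and the receiver's timer-driven ACK.

```c
#include <assert.h>
#include <stdbool.h>

#define DELAYED_ACK_PERIOD_MS 2500 /* timer period from the description */

/* Hypothetical model state; none of these names come from the real stack. */
struct loop_model {
    bool unacked_in_flight;  /* sender has data the receiver has not ACKed */
    bool held_at_sender;     /* data held back by the idle check */
    bool queued_at_receiver; /* data ready for recv to pick up */
    bool ack_pending;        /* receiver still owes an ACK */
    long waited_ms;          /* total time recv spent waiting on the timer */
};

static void model_send(struct loop_model *m, bool idle_check)
{
    if (idle_check && m->unacked_in_flight) {
        /* Connection is not idle: queue the data at the sending end. */
        m->held_at_sender = true;
        return;
    }
    m->queued_at_receiver = true;
    m->unacked_in_flight = true;
}

static void model_timer(struct loop_model *m)
{
    m->waited_ms += DELAYED_ACK_PERIOD_MS;
    if (m->ack_pending) {
        /* The pending ACK goes out, so the sender sees an idle connection. */
        m->ack_pending = false;
        m->unacked_in_flight = false;
        if (m->held_at_sender) {
            /* The sender now releases the data it was holding. */
            m->held_at_sender = false;
            m->queued_at_receiver = true;
            m->unacked_in_flight = true;
        }
    }
}

static void model_recv(struct loop_model *m)
{
    if (!m->queued_at_receiver) {
        /* Nothing has arrived: only the timer can unblock this recv. */
        assert(m->held_at_sender && m->ack_pending);
        model_timer(m);
    }
    m->queued_at_receiver = false;
    m->ack_pending = true; /* window has room, so no immediate ACK */
}

/* Total time spent waiting over `rounds` send-then-recv pairs. */
long simulate_wait_ms(int rounds, bool idle_check)
{
    struct loop_model m = {0};
    for (int i = 0; i < rounds; i++) {
        model_send(&m, idle_check);
        model_recv(&m);
    }
    return m.waited_ms;
}
```

In this model simulate_wait_ms(3, true) returns 5000: the first recv returns at once and each later one waits a full timer period. With the idle check disabled, simulate_wait_ms(3, false) returns 0.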
Although I understand what's wrong, I have a bit of difficulty trying to
figure out how to fix it. Attached is a proposed fix, which basically
attacks the problem at the sending end by removing the check for whether the
connection is idle. With that fix the loop.c program works as I expect and
http://www.reactos.org loads in a much more reasonable 2-3 sec in ibrowser.
The problem I have with the fix is that it changes code we borrowed from the
BSD stack. I can hardly imagine that a piece of software as heavily used as
the BSD stack would have such a fundamental problem.
Any thoughts?
GvG