Sunday, January 19, 2014

Further Adventures in TCP

TCP can be slow. This has lead to new protocols being invented. Oracle Coherence uses TCMP for data transmission (unlike Hazelcast where "communication among cluster members is always TCP/IP with Java NIO beauty" [1]). There are several implementations of UDT, a UDP based protocol, such as the ones found here (see Netty's use of the native libraries of Barchart-UDT in io.netty.testsuite.transport.udt.UDTClientServerConnectionTest).

ACK Knowledge

Why is TCP so slow? Well, first there is the overhead. Using tcpdump on the command line, I watched the communication between two boxes connected on the same LAN during a performance test. Here is the slightly edited output:

> sudo tcpdump -nn host 192.168.1.94 and 192.168.1.91 -i p7p1
.
.
21:59:46.105835 IP [CLIENT] > [SERVER] Flags [S], seq 3624779548, win 8192, options [mss 1460,nop,wscale 0,nop,nop,TS val 415343776 ecr 0,sackOK,eol], length 0
21:59:46.105842 IP [SERVER] > [CLIENT]: Flags [S.], seq 4258144914, ack 3624779549, win 2896, options [mss 1460,sackOK,TS val 8505455 ecr 415343776,nop,wscale 0], length 0 
21:59:46.113288 IP [CLIENT] > [SERVER] Flags [.], ack 1, win 8688, options [nop,nop,TS val 415343783 ecr 8505455], length 0
21:59:46.113554 IP [CLIENT] > [SERVER] Flags [P.], seq 1:1025, ack 1, win 8688, options [nop,nop,TS val 415343783 ecr 8505455], length 1024
21:59:46.113559 IP [SERVER] > [CLIENT]: Flags [.], ack 1025, win 2896, options [nop,nop,TS val 8505463 ecr 415343783], length 0 
21:59:46.113625 IP [CLIENT] > [SERVER] Flags [.], seq 1025:2473, ack 1, win 8688, options [nop,nop,TS val 415343783 ecr 8505455], length 1448
21:59:46.113843 IP [SERVER] > [CLIENT]: Flags [.], ack 2473, win 2896, options [nop,nop,TS val 8505463 ecr 415343783], length 0 
21:59:46.120443 IP [CLIENT] > [SERVER] Flags [.], seq 2473:3921, ack 1, win 8688, options [nop,nop,TS val 415343790 ecr 8505463], length 1448
21:59:46.120695 IP [CLIENT] > [SERVER] Flags [.], seq 3921:5369, ack 1, win 8688, options [nop,nop,TS val 415343791 ecr 8505463], length 1448
21:59:46.120710 IP [SERVER] > [CLIENT]: Flags [.], ack 5369, win 2896, options [nop,nop,TS val 8505470 ecr 415343790], length 0 
21:59:46.127322 IP [CLIENT] > [SERVER] Flags [.], seq 5369:6817, ack 1, win 8688, options [nop,nop,TS val 415343797 ecr 8505470], length 1448
21:59:46.127370 IP [CLIENT] > [SERVER] Flags [.], seq 6817:8265, ack 1, win 8688, options [nop,nop,TS val 415343797 ecr 8505470], length 1448
21:59:46.127588 IP [SERVER] > [CLIENT]: Flags [.], ack 8265, win 2896, options [nop,nop,TS val 8505477 ecr 415343797], length 0 
21:59:46.132814 IP [CLIENT] > [SERVER] Flags [.], seq 8265:9713, ack 1, win 8688, options [nop,nop,TS val 415343804 ecr 8505477], length 1448
21:59:46.132817 IP [CLIENT] > [SERVER] Flags [P.], seq 9713:10011, ack 1, win 8688, options [nop,nop,TS val 415343804 ecr 8505477], length 298
21:59:46.133006 IP [SERVER] > [CLIENT]: Flags [.], ack 10011, win 2896, options [nop,nop,TS val 8505482 ecr 415343804], length 0 
21:59:46.133170 IP [SERVER] > [CLIENT]: Flags [P.], seq 1:3, ack 10011, win 2896, options [nop,nop,TS val 8505483 ecr 415343804], length 2 
21:59:46.133177 IP [SERVER] > [CLIENT]: Flags [R.], seq 3, ack 10011, win 2896, options [nop,nop,TS val 8505483 ecr 415343804], length 0 
21:59:46.139801 IP [CLIENT] > [SERVER] Flags [.], ack 3, win 8686, options [nop,nop,TS val 415343809 ecr 8505483], length 0

The server to client communication is in blue and mostly consists of acknowledgements.

"The peer TCP must acknowledge the data, and as the ACKs arrive from the peer, only then can our TCP discard the acknowledged data from the socket send buffer. TCP must keep a copy of our data until it is acknowledged by the peer" [2].

Handshakes

If you take a look at the first three lines of that TCP dump, you'll see the 3-way handshake taking about 7 ms - that's about 20% of the overall call time. So, if you're connecting each time you talk to your server, you might want to keep the connection open and pool them.

There are moves afoot to exploit these packets to carry application data [4].

Slow starters

This in itself may not be enough. If a connection has been idle for a sufficiently large amount of time, something called Slow-Start Restart [3] may occur. This may depend on your kernel settings.

First, what is Slow-Start?

"The only way to estimate the available capacity between the client and the server is to measure it by exchanging data, and this is precisely what slow-start is designed to do. To start, the server initializes a new congestion window (cwnd) ... The cwnd variable is not advertised or exchanged between the sender and receiver... Further, a new rule is introduced: the maximum amount of data in flight (not ACKed) between the client and the server is the minimum of the rwnd and cwnd variables... [We aim] to start slow and to grow the window size as the packets are acknowledged: slow-start!" [3]

The size of this variable on my system is given by:

[henryp@corsair Blogs]$ grep -A 2 initcwnd `find /usr/src/kernels/3.6.10-2.fc17.x86_64/include  -type f -iname '*h'`
/usr/src/kernels/3.6.10-2.fc17.x86_64/include/net/tcp.h:/* TCP initial congestion window as per draft-hkchu-tcpm-initcwnd-01 */
/usr/src/kernels/3.6.10-2.fc17.x86_64/include/net/tcp.h-#define TCP_INIT_CWND 10
/usr/src/kernels/3.6.10-2.fc17.x86_64/include/net/tcp.h-

The receive window size is exchanged in the packet headers [5].

Cache Connections as a Solution

Perhaps the obvious response is to cache the connections on the client side. But beware of slow-start restart "which resets the congestion window of a connection after it has been idle for a defined period of time". [3]

On a Linux box, this can be checked with:

sysctl net.ipv4.tcp_slow_start_after_idle

where an output of 1 indicates that it this functionality is turned on. It is recommended that this is turned off if you want your client to cache connections [3].

Dropped Packets

Don't worry about dropped packets in TCP - they are essential. "In fact, packet loss is necessary to get the best performance from TCP! " [3]

[1] Hazelcast Community Edition.
[2] Unix Network Programming, p58, Stevens et al.
[3] High Performance Browser Networking - O'Reilly.
[4] TCP Fast Open: expediting web services - lwn.net
[5] Spy on Yourself with tcpdump




No comments:

Post a Comment