Title: TCP transfers over high latency/bandwidth network
1 TCP transfers over high latency/bandwidth network - Grid TCP
- Sylvain Ravot
- sylvain_at_hep.caltech.edu
2 Test configuration
[Figure: test network diagram. Hosts: Pcgiga-gbe.cern.ch (Geneva), Lxusa-ge.cern.ch (Chicago), Plato.cacr.caltech.edu (California). Other labels: Cernh 9, Ar1-chicago. Links: GbE, POS 155 Mbps, Calren2 / Abilene.]
- CERN (Geneva) <--> Caltech (California)
  - RTT: 175 ms
  - Bandwidth-delay product: 3.4 MBytes
- CERN <--> Chicago
  - RTT: 110 ms
  - Bandwidth-delay product: 1.9 MBytes
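The bandwidth-delay product quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name is illustrative, and it assumes the nominal 155 Mbps link rate and decimal megabytes.

```python
def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    return bandwidth_bps * rtt_s / 8  # bits in flight, converted to bytes

# CERN <--> Caltech: nominal 155 Mbps link, 175 ms RTT (values from the slide)
caltech_bdp = bandwidth_delay_product(155e6, 0.175)
print(f"CERN <--> Caltech: {caltech_bdp / 1e6:.2f} MBytes")  # 3.39 MBytes
```

With the nominal 155 Mbps rate and a 110 ms RTT, the same formula gives about 2.13 MBytes for the Chicago path, somewhat above the 1.9 MBytes quoted. The slides do not say which usable rate was assumed there.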
- TCP flows were generated with Iperf.
- Tcpdump was used to capture the packet flows.
- Tcptrace and xplot were used to plot and summarize the tcpdump data sets.
3 TCP overview: slow start and congestion avoidance
Example: an estimate of the cwnd (output of tcptrace)
[Figure: cwnd versus time. Curves: cwnd average of the last 10 samples; cwnd average over the life of the connection to that point. Annotations: SSTHRESH, slow start, congestion avoidance.]
- Slow start: fast increase of the cwnd
- Congestion avoidance: slow increase of the window size
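The two phases can be illustrated with a toy model that tracks cwnd once per RTT. It is a sketch under simplifying assumptions that are not taken from the slides: no loss, one ACK per segment, a 1460-byte segment, and cwnd clamped to ssthresh at the end of slow start. The function name is illustrative.

```python
MSS = 1460  # segment size in bytes (assumed)

def cwnd_trace(ssthresh, rtts, initial=MSS, mss=MSS):
    """Return cwnd (bytes) at the start of each RTT, assuming no loss."""
    cwnd = initial
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow start: cwnd doubles each RTT
        else:
            cwnd += mss                     # congestion avoidance: +1 segment per RTT
    return trace

trace = cwnd_trace(ssthresh=16 * MSS, rtts=8)
print([c // MSS for c in trace])  # [1, 2, 4, 8, 16, 17, 18, 19]
```

The window doubles until it reaches ssthresh, then grows by a single segment per round trip.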
4 Influence of the initial SSTHRESH on TCP performance
[Figure: two plots of cwnd = f(time), each showing the slow start and congestion avoidance phases. Labels: SSTHRESH = 1460 Kbyte, SSTHRESH = 730 Kbyte; throughput = 33 Mbit/s, throughput = 63 Mbit/s.]
- During congestion avoidance and without any loss, the cwnd increases by one segment each RTT. In our case there is no loss, so the window increases by 1460 bytes every 175 ms. Starting from a cwnd of 730 kbyte, it takes more than 5 minutes to reach a cwnd larger than the bandwidth-delay product (3.4 MByte). In other words, we have to wait more than 5 minutes before using the whole capacity of the link (155 Mbps)!
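The five-minute figure follows directly from the numbers on the slide. A minimal sketch of the arithmetic, assuming decimal units (730 kbyte = 730,000 bytes, 3.4 MByte = 3,400,000 bytes); the function name is illustrative:

```python
import math

def time_to_reach(target_bytes, start_bytes, rtt_s, mss=1460, increment=1):
    """Seconds needed for cwnd to grow from start to target in congestion
    avoidance, adding `increment` segments of `mss` bytes per RTT, with no loss."""
    rtts = math.ceil((target_bytes - start_bytes) / (increment * mss))
    return rtts * rtt_s

seconds = time_to_reach(3.4e6, 730e3, 0.175)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")  # 320 s, about 5.3 minutes
```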
5 Reactivity
- TCP reactivity
  - The time to recover a 200 Mbps throughput after a loss is larger than 50 seconds for a connection between Chicago and CERN.
- A single loss is disastrous
  - TCP is much more sensitive to packet loss in WANs than in LANs
[Figure annotation: 53 sec]
6 Linux patch: Grid TCP
- Parameter tuning
  - New parameter to better start a TCP transfer
  - Set the value of the initial SSTHRESH
- Modifications of the TCP algorithms (RFC 2001)
  - Modification of the well-known congestion avoidance algorithm
    - During congestion avoidance, for every useful acknowledgement received, cwnd increases by M * (segment size) * (segment size) / cwnd. This is equivalent to increasing cwnd by M segments each RTT. M is called the congestion avoidance increment.
  - Modification of the slow start algorithm
    - During slow start, for every useful acknowledgement received, cwnd increases by N segments. N is called the slow start increment.
  - Note: N = 1 and M = 1 in common TCP implementations.
- Smaller backoff (not implemented yet)
  - Reduce the strong penalty imposed by a loss
- Reproduce the behavior of a multi-stream TCP connection.
- Only the sender's TCP stack needs to be modified
- Alternative to multi-stream TCP transfers
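The two per-acknowledgement rules described above can be written down as follows. This is a sketch of the update rules as stated on the slide, not the patch itself: the function names, the 1460-byte segment size, and the one-ACK-per-segment assumption are illustrative.

```python
MSS = 1460  # segment size in bytes (assumed)

def on_ack(cwnd, ssthresh, n=1, m=1, mss=MSS):
    """Return the new cwnd (bytes) after one useful acknowledgement.

    n is the slow start increment, m the congestion avoidance increment.
    """
    if cwnd < ssthresh:
        return cwnd + n * mss            # slow start: +N segments per ACK
    return cwnd + m * mss * mss / cwnd   # congestion avoidance: about +M segments per RTT

def grow_one_rtt(cwnd, ssthresh, n=1, m=1, mss=MSS):
    """Apply on_ack once per segment in flight, approximating one RTT."""
    for _ in range(int(cwnd // mss)):
        cwnd = on_ack(cwnd, ssthresh, n, m, mss)
    return cwnd

# Congestion avoidance from a 100-segment window, with M = 1 and M = 10
start = 100 * MSS
for m in (1, 10):
    growth = grow_one_rtt(start, ssthresh=50 * MSS, m=m) - start
    print(f"M={m}: cwnd grows by about {growth / MSS:.1f} segments in one RTT")
```

The growth per RTT comes out slightly below M segments, because cwnd already increases while the acknowledgements of that round trip arrive.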
7 TCP tuning by modifying the slow start increment
[Figure: four plots of the congestion window (cwnd) as a function of time]
- Slow start increment = 1, throughput = 98 Mbit/s
- Slow start increment = 2, throughput = 113 Mbit/s
- Slow start increment = 3, throughput = 116 Mbit/s
- Slow start increment = 5, throughput = 119 Mbit/s
Slow start durations annotated on the plots: 0.8 s, 1.2 s, 0.65 s
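A rough way to see why a larger slow start increment shortens slow start: if every acknowledged segment adds N segments, the window is multiplied by (1 + N) each round trip. The sketch below counts idealized round trips only. It assumes one ACK per segment and no loss, uses an illustrative function name, and does not attempt to reproduce the measured durations.

```python
def slow_start_rtts(target_segments, n=1, initial_segments=1):
    """Round trips needed for cwnd to reach target_segments during slow start."""
    cwnd = initial_segments
    rtts = 0
    while cwnd < target_segments:
        cwnd += n * cwnd  # each of the cwnd ACKs adds n segments
        rtts += 1
    return rtts

for n in (1, 2, 3, 5):
    print(f"increment {n}: {slow_start_rtts(1024, n)} RTTs to reach 1024 segments")
```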
8 TCP tuning by modifying the congestion avoidance increment (1)
[Figure: two plots of the congestion window (cwnd) as a function of time. Annotation: SSTHRESH = 0.783 Mbyte]
- Congestion avoidance increment = 1, throughput = 37.5 Mbit/s: cwnd is increased by 1200 bytes in 27 sec.
- Congestion avoidance increment = 10, throughput = 61.5 Mbit/s: cwnd is increased by 12000 bytes (10 * 1200) in 27 sec.
9 Benefit of a larger congestion avoidance increment when losses occur
- We simulate losses with a program that drops packets according to a configured loss rate. For the next two plots, the program drops one packet every 10000 packets.
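The slides do not show the loss-simulation program. As a purely hypothetical illustration of the "one packet every 10000" rule, a periodic dropper could look like the sketch below. The class and method names are invented for this example, and the original tool may have selected packets differently, for instance at random.

```python
class PeriodicDropper:
    """Drop exactly one packet out of every `period` packets (hypothetical helper)."""

    def __init__(self, period):
        self.period = period
        self.seen = 0  # packets seen since the last drop

    def should_drop(self):
        """Return True if the current packet should be dropped."""
        self.seen += 1
        if self.seen == self.period:
            self.seen = 0
            return True
        return False

dropper = PeriodicDropper(period=10_000)
dropped = sum(dropper.should_drop() for _ in range(30_000))
print(dropped)  # 3
```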
[Figure: two plots of the congestion window (cwnd) as a function of time]
- Congestion avoidance increment = 1, throughput = 8 Mbit/s
- Congestion avoidance increment = 10, throughput = 20 Mbit/s
Annotations on the plots:
1) A packet is lost
2) Fast recovery (temporary state until the loss is repaired)
3) cwnd = cwnd/2
- When a loss occurs, the cwnd is divided by two. The performance is then determined by the speed at which the cwnd increases after the loss: the higher the congestion avoidance increment, the better the performance.
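The effect can be reproduced qualitatively with a toy model that tracks cwnd once per RTT, halves it whenever the 10000th packet since the last loss has been sent, and otherwise adds the congestion avoidance increment. This is a sketch under simplifying assumptions that are not taken from the slides: there is no link capacity limit, no slow start, and no fast recovery phase, and the function name is illustrative.

```python
def average_cwnd(increment, loss_every=10_000, rtts=20_000):
    """Average cwnd (in segments) over `rtts` round trips with periodic loss."""
    cwnd = 2.0   # starting window in segments (arbitrary assumption)
    sent = 0     # packets sent since the last drop
    total = 0.0
    for _ in range(rtts):
        total += cwnd
        sent += int(cwnd)
        if sent >= loss_every:
            sent -= loss_every
            cwnd = max(cwnd / 2, 1.0)  # a loss divides the window by two
        else:
            cwnd += increment          # congestion avoidance growth per RTT
    return total / rtts

avg_1 = average_cwnd(increment=1)
avg_10 = average_cwnd(increment=10)
print(f"increment 1: {avg_1:.0f} segments, increment 10: {avg_10:.0f} segments")
```

Since throughput is roughly proportional to the average window for a fixed RTT, the larger increment sustains a higher rate in this model. The exact ratio depends on the assumptions listed above.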
10 TCP performance improvement
- Memory to memory transfers
[Figure: performance chart. Series: Without any tuning; By tuning TCP buffers; TCP Grid on 155 Mbps US-CERN link; TCP Grid on 2 x 155 Mbps US-CERN link; TCP Grid on 622 Mbps US-CERN link]
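One of the configurations compared is TCP buffer tuning. The slides do not say how the buffers were set. A common approach, shown here only as an assumption, is to request socket buffers of about the bandwidth-delay product through the standard SO_SNDBUF and SO_RCVBUF socket options. The function name and the 3.4 MByte value are illustrative.

```python
import socket

def make_tuned_socket(buffer_bytes):
    """Create a TCP socket and request send/receive buffers of buffer_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock

# Request roughly the CERN <--> Caltech bandwidth-delay product
sock = make_tuned_socket(3_400_000)
# The operating system may clamp or adjust the requested size, so read it back
print("send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```

The value actually granted depends on system-wide limits, which usually have to be raised as well for buffers of this size.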
- New bottlenecks
  - Iperf is not able to perform long transfers
  - Linux station with a 32-bit, 33 MHz PCI bus (to be replaced with a modern server)
11 Conclusion
- To achieve high throughput over a high latency/bandwidth network, we need to:
  - Set the initial slow start threshold (ssthresh) to an appropriate value for the delay and bandwidth of the link.
  - Avoid loss
    - by limiting the max cwnd size.
  - Recover fast if loss occurs
    - Larger cwnd increment => the cwnd increases faster after a loss
    - Smaller window reduction after a loss
    - ..