1
High-Performance Transport Protocols for
Data-Intensive World-Wide Grids
  • T. Kelly, University of Cambridge, UK
  • S. Ravot, Caltech, USA
  • J.P. Martin-Flatin, CERN, Switzerland

2
Outline
  • Overview of DataTAG project
  • Problems with TCP in data-intensive Grids
  • Problem statement
  • Analysis and characterization
  • Solutions
  • Scalable TCP
  • GridDT
  • Future Work

3
Overview of DataTAG Project
4
Member Organizations
http://www.datatag.org/
5
Project Objectives
  • Build a testbed to experiment with massive file
    transfers (TBytes) across the Atlantic
  • Provide high-performance protocols for gigabit
    networks underlying data-intensive Grids
  • Guarantee interoperability between major HEP Grid
    projects in Europe and the USA

6
DataTAG Testbed
[Network diagram of the DataTAG testbed: PC clusters in Geneva
(w01gva-w20gva, v02gva, v03gva) and Chicago (w01chi-w06chi,
v10chi-v13chi), plus w01bol at CNAF; routers cernh7, r04gva
(Cisco 7606), r04chi (Cisco 7609), r05gva/r05chi (Juniper M10),
r06gva/r06chi (Alcatel 7770); switches s01gva (Extreme S1i) and
s01chi (Extreme S5i); Alcatel 1670 and ONS 15454 SDH/SONET equipment;
transatlantic STM-64 (GC) and STM-16 circuits (FranceTelecom, Colt
backup/projects, DTag); peerings with VTHD/INRIA, SURFnet, CESNET,
CNAF and GEANT; link types 1000baseSX, 1000baseT, 10GbaseLX,
SDH/SONET, CCC tunnel. Diagram: Edoardo Martelli]
7
Network Research Activities
  • Enhance performance of network protocols for
    massive file transfers
  • Data-transport layer: TCP, UDP, SCTP
  • QoS
  • LBE (Scavenger)
  • Equivalent DiffServ (EDS)
  • Bandwidth reservation
  • AAA-based bandwidth on demand
  • Lightpaths managed as Grid resources
  • Monitoring

8
Problems with TCP in Data-Intensive Grids
9
Problem Statement
  • End-user's perspective
  • Using TCP as the data-transport protocol for
    Grids leads to poor bandwidth utilization in
    fast WANs
  • Network protocol designer's perspective
  • TCP is inefficient in high bandwidth-delay
    networks because:
  • few TCP implementations have been tuned for
    gigabit WANs
  • TCP was not designed with gigabit WANs in mind

10
Design Problems (1/2)
  • TCP's congestion control algorithm (AIMD) is not
    suited to gigabit networks
  • Due to TCP's limited feedback mechanisms, line
    errors are interpreted as congestion
  • Bandwidth utilization is reduced when it
    shouldn't be
  • RFC 2581 (which gives the formula for increasing
    cwnd) forgot delayed ACKs
  • Loss recovery time is twice as long as it should be

11
Design Problems (2/2)
  • TCP requires that ACKs be sent at most every
    second segment
  • Causes ACK bursts
  • Bursts are difficult for the kernel and NIC to handle

12
AIMD (1/2)
  • Van Jacobson, SIGCOMM 1988
  • Congestion avoidance algorithm (sketched below)
  • For each ACK in an RTT without loss, increase cwnd
  • For each window experiencing loss, decrease cwnd
  • Slow-start algorithm
  • Increase cwnd by one MSS per ACK until ssthresh is
    reached
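A minimal Python sketch of these update rules (standard slow start and AIMD congestion avoidance in the spirit of RFC 2581; the variable names and the simple loss handling are illustrative assumptions, not DataTAG code):

```python
# Minimal sketch of TCP slow start + AIMD congestion avoidance (RFC 2581 style).
# Assumptions: cwnd and ssthresh in bytes, one update per ACK, losses handled
# by a simple multiplicative decrease. Illustrative only.

MSS = 1460  # bytes

def on_ack(cwnd, ssthresh):
    """Grow cwnd when a new ACK arrives."""
    if cwnd < ssthresh:                 # slow start: +1 MSS per ACK
        return cwnd + MSS
    return cwnd + MSS * MSS // cwnd     # congestion avoidance: ~ +1 MSS per RTT

def on_loss(cwnd):
    """Multiplicative decrease when a window experiences loss."""
    ssthresh = max(cwnd // 2, 2 * MSS)
    return ssthresh, ssthresh           # new (cwnd, ssthresh)
```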

13
AIMD (2/2)
  • Additive Increase
  • A TCP connection slowly increases its bandwidth
    utilization in the absence of loss
  • forever, unless we run out of send/receive
    buffers or detect a packet loss
  • TCP is greedy: no attempt to reach a stationary
    state
  • Multiplicative Decrease
  • A TCP connection reduces its bandwidth
    utilization drastically whenever a packet loss is
    detected
  • assumption: line errors are negligible, hence a
    packet loss means congestion

14
Congestion Window (cwnd)
[Plot of cwnd over time: slow start, loss, congestion avoidance, buffering beyond the bandwidth-delay product]
15
Disastrous Effect of Packet Loss on TCP in Fast
WANs (1/2)
[Plot: AIMD, C = 1 Gbit/s, MSS = 1,460 Bytes]
16
Disastrous Effect of Packet Loss on TCP in Fast
WANs (2/2)
  • Long time to recover from a single loss
  • TCP should react to congestion rather than packet
    loss
  • line errors and transient faults in equipment are
    no longer negligible in fast WANs
  • TCP should recover more quickly from a loss
  • TCP is particularly sensitive to packet loss in
    fast WANs (i.e., when both cwnd and RTT are large)

17
Characterization of the Problem (1/2)
  • The responsiveness r measures how quickly we
    go back to using the network link at full
    capacity after experiencing a loss (i.e., the loss
    recovery time if the loss occurs when bandwidth
    utilization = network link capacity)

r = (C · RTT²) / (2 · inc)
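As a rough numeric illustration of this formula (a sketch using parameters quoted on nearby slides: C = 1 Gbit/s, RTT = 120 ms, inc = MSS = 1,460 bytes; the resulting figure is computed here, not taken from the slides):

```python
# Rough numeric check of the responsiveness formula r = C * RTT^2 / (2 * inc).
# Parameters come from the surrounding slides; the result is illustrative.

C = 1e9            # link capacity, bit/s
RTT = 0.120        # round-trip time, s
inc = 1460 * 8     # additive increment per RTT, bits (1 MSS = 1,460 bytes)

r = C * RTT**2 / (2 * inc)
print(f"recovery time ~ {r:.0f} s ({r/60:.0f} min)")   # ~616 s, about 10 minutes
```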
18
Characterization of the Problem (2/2)
inc size = MSS = 1,460 Bytes
19
Congestion vs. Line Errors
RTT = 120 ms, MTU = 1,500 Bytes, AIMD
At gigabit speed, the loss rate required for
packet loss to be ascribed only to congestion is
unrealistic with AIMD
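One way to quantify this claim, not spelled out on the slide, is the Mathis et al. throughput estimate BW ≈ (MSS/RTT) · 1.22/√p; solving for the loss probability p that still lets a single AIMD flow fill a 1 Gbit/s path with the parameters above gives a figure far below realistic line-error rates. A sketch, with the formula itself being the borrowed assumption:

```python
# Loss rate an AIMD flow could tolerate while still filling a 1 Gbit/s link,
# using the Mathis et al. estimate BW ~ (MSS/RTT) * 1.22 / sqrt(p).
# The formula is brought in for illustration; it does not appear on the slide.

BW = 1e9           # target throughput, bit/s
RTT = 0.120        # s
MSS = 1460 * 8     # bits (MTU = 1,500 bytes)

p = (MSS * 1.22 / (RTT * BW)) ** 2
print(f"required loss rate ~ {p:.1e}")   # ~1.4e-08: unrealistically low for real links
```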
20
Single TCP Stream Performance under Periodic
Losses
MSS = 1,460 Bytes
  • Loss rate: 0.01%
  • LAN BW utilization: 99%
  • WAN BW utilization: 1.2%

21
Solutions
22
What Can We Do?
  • To achieve higher throughput over high
    bandwidth-delay networks, we can:
  • Change the AIMD algorithm
  • Use larger MTUs
  • Change the initial setting of ssthresh
  • Avoid losses in end hosts
  • Two proposals:
  • Kelly: Scalable TCP
  • Ravot: GridDT

23
Delayed ACKs with AIMD
  • RFC 2581 (the spec defining TCP's AIMD congestion
    control algorithm) erred
  • Implicit assumption: one ACK per packet
  • In reality: one ACK every second packet with
    delayed ACKs
  • Responsiveness multiplied by two
  • Makes a bad situation worse in fast WANs
  • Problem fixed by ABC in RFC 3465 (Feb 2003)
  • Not implemented in Linux 2.4.21 (see the sketch
    below)
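A minimal sketch contrasting per-ACK window growth with Appropriate Byte Counting (RFC 3465) during congestion avoidance, assuming delayed ACKs that cover two segments; names and structure are illustrative:

```python
# Congestion-avoidance growth with delayed ACKs (one ACK per two segments):
# per-ACK counting vs. Appropriate Byte Counting (ABC, RFC 3465). Illustrative only.

MSS = 1460  # bytes

def ca_per_ack(cwnd):
    # Per-ACK counting: a fixed increment per ACK received, regardless of how
    # many bytes it covers. With delayed ACKs cwnd grows only ~0.5 MSS per RTT.
    return cwnd + MSS * MSS // cwnd

def ca_abc(cwnd, bytes_acked_accum, acked_bytes):
    # ABC: credit the bytes actually acknowledged, so delayed ACKs no longer
    # halve the growth rate.
    bytes_acked_accum += acked_bytes
    if bytes_acked_accum >= cwnd:        # one full window acknowledged
        bytes_acked_accum -= cwnd
        cwnd += MSS                      # +1 MSS per RTT, as intended
    return cwnd, bytes_acked_accum
```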

24
Delayed ACKs with AIMD and ABC
25
Scalable TCP Algorithm
  • For cwnd > lwnd, replace AIMD with the new
    algorithm (sketched below):
  • for each ACK in an RTT without loss:
  • cwnd_{i+1} = cwnd_i + a
  • for each window experiencing loss:
  • cwnd_{i+1} = cwnd_i - (b × cwnd_i)
  • Kelly's proposal during his internship at CERN:
    (lwnd, a, b) = (16, 0.01, 0.125)
  • Trade-off between fairness, stability, variance
    and convergence
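A minimal sketch of these update rules, assuming cwnd is tracked in segments and that the connection falls back to standard AIMD behaviour below lwnd (the fallback is an assumption of this sketch; the constants are the slide's):

```python
# Scalable TCP window updates as described on this slide (cwnd in segments).
# Falling back to standard AIMD below lwnd is an assumption of this sketch.

LWND, A, B = 16, 0.01, 0.125   # (lwnd, a, b) proposed by Kelly

def on_ack(cwnd):
    if cwnd > LWND:
        return cwnd + A             # Scalable TCP: +a per ACK
    return cwnd + 1.0 / cwnd        # AIMD: ~ +1 segment per RTT

def on_loss(cwnd):
    if cwnd > LWND:
        return cwnd - B * cwnd      # Scalable TCP: cut window by fraction b
    return cwnd / 2.0               # AIMD: halve the window
```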

26
Scalable TCP: lwnd
27
Scalable TCP Advantages
  • Responsiveness is independent of capacity
  • Responsiveness improves dramatically for gigabit
    networks

28
Scalable TCP Responsiveness Independent of
Capacity
29
Scalable TCP: Improved Responsiveness
  • Responsiveness for RTT = 200 ms and MSS = 1,460
    Bytes (a quick numeric check follows below)
  • Scalable TCP: 3 s
  • AIMD:
  • 3 min at 100 Mbit/s
  • 1 h 10 min at 2.5 Gbit/s
  • 4 h 45 min at 10 Gbit/s
  • Patch available for Linux kernel 2.4.19
  • http://www-lce.eng.cam.ac.uk/~ctk21/scalable/
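A quick numeric cross-check of these figures (a sketch: AIMD uses the responsiveness formula from slide 17 with inc = 1 MSS; for Scalable TCP it counts the RTTs needed to regrow the window by a factor 1/(1-b) when each RTT multiplies it by roughly (1+a)):

```python
# Cross-check of the responsiveness figures quoted above (illustrative).
from math import log

RTT, MSS = 0.200, 1460 * 8      # s, bits
a, b = 0.01, 0.125              # Scalable TCP constants from slide 25

# AIMD: r = C * RTT^2 / (2 * inc), with inc = 1 MSS per RTT
for C in (100e6, 2.5e9, 10e9):
    r = C * RTT**2 / (2 * MSS)
    print(f"AIMD at {C/1e9:g} Gbit/s: {r/60:.0f} min")   # ~3 min, ~71 min, ~285 min

# Scalable TCP: regrow by 1/(1-b), gaining a factor of about (1+a) per RTT
rtts = log(1 / (1 - b)) / log(1 + a)
print(f"Scalable TCP: {rtts * RTT:.1f} s")                # ~2.7 s, i.e. about 3 s
```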

30
Scalable TCP vs. AIMD: Benchmarking
Bulk throughput tests with C = 2.5 Gbit/s. Flows
transfer 2 GBytes and start again for 20 min.
31
GridDT Algorithm
  • Congestion avoidance algorithm
  • For each ACK in an RTT without loss, increase cwnd
    by an additive increment A (see the sketch below)
  • By modifying A dynamically according to the RTT,
    GridDT guarantees fairness among TCP connections
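A minimal sketch of a GridDT-style congestion-avoidance update. The scaling rule shown (A proportional to RTT², normalized against a reference flow) is an assumption chosen to be consistent with the A and RTT values on the next slides, not a formula taken from them:

```python
# GridDT-style congestion avoidance: the additive increment A is tuned from the
# measured RTT to counter AIMD's bias against long-RTT flows. The A ~ RTT^2
# scaling and the reference values are assumptions for illustration.

def increment(rtt, ref_rtt=0.117, ref_a=3.0):
    """Additive increment A for a flow with round-trip time rtt (seconds)."""
    return ref_a * (rtt / ref_rtt) ** 2

def on_ack(cwnd, rtt):
    """Per-ACK congestion-avoidance update, with cwnd in segments."""
    return cwnd + increment(rtt) / cwnd    # ~ +A segments per RTT

print(increment(0.181))   # ~7.2, close to the A1 = 7 used for the 181 ms flow
```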

32
AIMD RTT Bias
  • Two TCP streams share a 1 Gbit/s bottleneck
  • CERN-Sunnyvale: RTT = 181 ms. Avg. throughput over
    a period of 7,000 s: 202 Mbit/s
  • CERN-StarLight: RTT = 117 ms. Avg. throughput over
    a period of 7,000 s: 514 Mbit/s
  • MTU = 9,000 Bytes. Link utilization: 72%

33
GridDT Fairer than AIMD
  • CERN-Sunnyvale: RTT = 181 ms. Additive inc. A1 =
    7. Avg. throughput: 330 Mbit/s
  • CERN-StarLight: RTT = 117 ms. Additive inc. A2 =
    3. Avg. throughput: 388 Mbit/s
  • MTU = 9,000 Bytes. Link utilization: 72% (rough
    check below)

[Plot legend: A1 = 7, RTT = 181 ms; A2 = 3, RTT = 117 ms]
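A rough arithmetic check of these two slides (computed here for illustration): under AIMD the throughput ratio roughly tracks the square of the RTT ratio, and GridDT's choice of A1/A2 close to (RTT1/RTT2)² largely cancels that bias:

```python
# Arithmetic check of the RTT bias (slide 32) and GridDT's compensation (slide 33).
rtt1, rtt2 = 0.181, 0.117
print((rtt1 / rtt2) ** 2)   # ~2.39: expected AIMD bias toward the short-RTT flow
print(514 / 202)            # ~2.54: measured throughput ratio with AIMD
print(7 / 3)                # ~2.33: A1/A2 chosen for GridDT, close to (RTT1/RTT2)^2
print(388 / 330)            # ~1.18: residual imbalance with GridDT
```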
34
Measurements with Different MTUs (1/2)
  • Mathis advocates the use of large MTUs
  • we tested the standard Ethernet MTU and Jumbo
    frames (see the note below)
  • Experimental environment:
  • Linux 2.4.21
  • SysKonnect device driver 6.12
  • Traffic generated by iperf
  • average throughput over the last 5 seconds
  • Single TCP stream
  • RTT = 119 ms
  • Duration of each test: 2 hours
  • Transfers from Chicago to Geneva
  • MTUs:
  • POS MTU set to 9,180
  • Max MTU on the NIC of a PC running Linux 2.4.21:
    9,000
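One reason larger MTUs help, following the responsiveness formula from slide 17: the per-RTT increment scales with the MSS, so loss recovery is roughly six times faster with a 9,000-byte MTU. The MSS values below (MTU minus 40 bytes of IP/TCP headers) are assumptions for illustration:

```python
# How a larger MTU shortens AIMD loss recovery, via r = C * RTT^2 / (2 * inc).
# MSS values are assumed to be MTU - 40 bytes; figures are illustrative.

C, RTT = 1e9, 0.119                       # 1 Gbit/s path, RTT from this slide
for mtu, mss in ((1500, 1460), (9000, 8960)):
    r = C * RTT**2 / (2 * mss * 8)
    print(f"MTU {mtu}: recovery ~ {r:.0f} s")   # ~606 s vs. ~99 s
```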

35
Measurements with Different MTUs (2/2)
[Plot: TCP max 990 Mbit/s (MTU = 9,000); TCP max 940 Mbit/s (MTU = 1,500)]
36
Related Work
  • Floyd: HighSpeed TCP
  • Low: FAST TCP
  • Katabi: XCP
  • Web100 and Net100 projects
  • PFLDnet 2003 workshop
  • http://www.datatag.org/pfldnet2003/

37
Research Directions
  • Compare performance of TCP variants
  • Investigate the proposal by Shorten, Leith, Foy
    and Kilduff
  • More stringent definition of congestion:
  • Lose more than 1 packet per RTT
  • ACK more than two packets in one go:
  • Decrease ACK bursts
  • SCTP vs. TCP