Characterization and Evaluation of TCP and UDP-based Transport on Real Networks - PowerPoint PPT Presentation

About This Presentation

Description:

Title: Evaluation of Advanced TCP stacks on Fast Long-distance Production Networks Author: julio Last modified by: cottrell Created Date: 8/5/2001 1:39:00 AM – PowerPoint PPT presentation

Slides: 18

Provided by: jul9

Learn more at: https://www.slac.stanford.edu

Transcript and Presenter's Notes

1
Characterization and Evaluation of TCP and
UDP-based Transport on Real Networks

Les Cottrell, Saad Ansari, Parakram Khandpur,
Ruchi Gupta, Richard Hughes-Jones, Michael Chen,
Larry McIntosh, Frank Leers
SLAC, Manchester University, Chelsio and Sun
Site visit to SLAC by DoE program managers Thomas
Ndousse and Mary Anne Scott
April 27, 2005
www.slac.stanford.edu/grp/scs/net/talk05/tcp-apr05.ppt

Partially funded by DOE/MICS Field Work Proposal
on Internet End-to-end Performance Monitoring
(IEPM), also supported by IUPAP
2
Project goals

Evaluate various techniques for achieving high
bulk-throughput on fast long-distance real
production WAN links
Compare and contrast ease of configuration,
throughput, convergence, fairness, stability etc.
For different RTTs
Recommend optimum techniques for data intensive
science (BaBar) transfers using bbftp, bbcp,
GridFTP
Validate simulator and emulator findings and
provide feedback

3
Techniques rejected

Jumbo frames
Not an IEEE standard
May break some UDP applications
Not supported on SLAC LAN
Sender mods only, HENP model is few big senders,
lots of smaller receivers
Simplifies deployment, only a few hosts at a few
sending sites
So no Dynamic Right Sizing (DRS)
Runs on production nets
No router mods (XCP/ECN)

4
Software Transports

Advanced TCP stacks
To overcome AIMD congestion behavior of Reno
based TCPs
BUT
SLAC datamovers are all based on Solaris, while
advanced TCPs currently are Linux only
SLAC production systems people concerned about
non-standard kernels, ensuring TCP patches keep
current with security patches for SLAC supported
Linux version
So also very interested in transport that runs in
user space (no kernel mods)
Evaluate UDT from UIC folks

5
Hardware Assists

For 1Gbits/s paths, cpu, bus etc. not a problem
For 10Gbits/s they are more important
NIC assistance to the CPU is becoming popular
Checksum offload
Interrupt coalescence
Large send/receive offload (LSO/LRO)
TCP Offload Engine (TOE)
Several vendors for 10Gbits/s NICs, at least one
for 1Gbits/s NIC
But currently restricted to using the NIC vendor's
TCP implementation
Most focus is on the LAN
Cheap alternative to Infiniband, MyriNet etc.

6
Protocols Evaluated

TCP (implementations as of April 2004)
Linux 2.4 New Reno with SACK single and parallel
streams (Reno)
Scalable TCP (Scalable)
Fast TCP
HighSpeed TCP (HSTCP)
HighSpeed TCP Low Priority (HSTCP-LP)
Binary Increase Control TCP (BICTCP)
Hamilton TCP (HTCP)
Layering TCP (LTCP)
UDP
UDT v2.

7
Methodology (1Gbit/s)

Chose 3 paths from SLAC
Caltech (10ms), Univ Florida (80ms), CERN (180ms)
Used iperf/TCP and UDT/UDP to generate traffic
Each run was 16 minutes, in 7 regions

[Figure: test setup. TCP/UDP traffic (iperf or UDT) and ICMP ping (1/s) run from SLAC across the bottleneck to Caltech/UFL/CERN; timeline intervals labeled 4 mins and 2 mins]
8
Behavior Indicators

Achievable throughput
Stability S = σ/µ (standard deviation/average)
Intra-protocol fairness F
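
As a rough illustration of the two indicators above, the sketch below computes stability as the standard deviation of a throughput series divided by its mean. The slide does not give the formula for F; the sketch assumes Jain's fairness index, a common choice, and the function names and sample values are made up for illustration.

```python
import statistics

def stability(samples):
    """Stability S = standard deviation / mean of throughput samples (smaller is steadier)."""
    return statistics.pstdev(samples) / statistics.mean(samples)

def jain_fairness(rates):
    """ASSUMPTION: Jain's index, (sum x)^2 / (n * sum x^2); 1.0 means equal shares."""
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(x * x for x in rates))

# Hypothetical throughput samples (Mbps) for one flow, and hypothetical per-flow averages
flow_samples = [400.0, 420.0, 380.0, 410.0, 390.0]
per_flow_avg = [300.0, 300.0, 150.0]

print(round(stability(flow_samples), 3))
print(round(jain_fairness(per_flow_avg), 3))
```

With this definition, flows that share the link equally give F = 1, and F falls toward 1/n as one flow takes everything.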

9
Behavior wrt RTT

10ms (Caltech): Throughput, Stability (small is
good), Fairness minimum over regions 2 thru 6
(closer to 1 is better)
Excl. FAST: 720±64Mbps, S=0.18±0.04, F=0.95
FAST: 400±120Mbps, S=0.33, F=0.88
80ms (U. Florida): Throughput, Stability
All: 350±103Mbps, S=0.3±0.12, F=0.82
180ms (CERN)
All: 340±130Mbps, S=0.42±0.17, F=0.81
The Stability and Fairness effects are more
manifest on longer RTT, so focus on CERN

10
Reno single stream

Low performance on fast long distance paths
AIMD (add a=1 pkt to cwnd per RTT, decrease cwnd by
factor b=0.5 on congestion)
Net effect recovers slowly, does not effectively
use available bandwidth, so poor throughput
Remaining flows do not take up slack when flow
removed
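
A back-of-the-envelope sketch of why Reno recovers slowly on these paths. It assumes a 1 Gbit/s bottleneck, 1500-byte packets (no jumbo frames), and AIMD values of a = 1 packet per RTT and b = 0.5; the RTTs are those of the three paths used in this study. The function name and packet size are illustrative choices, not taken from the slides.

```python
def reno_recovery_seconds(rate_bps, rtt_s, pkt_bytes=1500, a=1.0, b=0.5):
    """Time for AIMD to regrow cwnd to the full window after a single loss."""
    full_window = rate_bps * rtt_s / (pkt_bytes * 8)  # bandwidth-delay product, in packets
    lost = full_window * (1 - b)                      # packets of cwnd given up on the loss
    rtts_needed = lost / a                            # regained at a packets per RTT
    return rtts_needed * rtt_s

for name, rtt_ms in [("Caltech", 10), ("Univ Florida", 80), ("CERN", 180)]:
    t = reno_recovery_seconds(1e9, rtt_ms / 1000)
    print(f"{name}: {t:.0f} s")
```

Under these assumptions a single loss on the CERN path costs about 1350 s of recovery, longer than a whole 16-minute run, and the recovery time grows with the square of the RTT.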

Multiple streams increase recovery rate
Congestion has a dramatic effect
SLAC to CERN
Recovery is slow
RTT increases when achieves best throughput
11
Fast

Also uses RTT to detect congestion
RTT is very stable: σ(RTT) = 9ms vs 37±0.14ms for
the others

2nd flow never gets equal share of bandwidth
Big drops in throughput which take several
seconds to recover from
SLAC-CERN
12
HTCP

One of the best performers
Throughput is high
Big effects on RTT when achieves best throughput
Flows share equally

Appears to need > 1 flow to achieve best
throughput
Two flows share equally
SLAC-CERN
13
BICTCP

Needs > 1 flow for best throughput

14
UDTv2

Similar behavior to better TCP stacks
RTT very variable at best throughputs
Intra-protocol sharing is good
Behaves well as flows are added and subtracted

15
Overall
Proto      Avg thru (Mbps)  S (σ/µ)  min(F)  σ(RTT) (ms)  MHz/Mbps
Scal.      423±115          0.27     0.83    22           0.64
BIC        412±117          0.28     0.98    55           0.71
HTCP       402±113          0.28     0.99    57           0.65
UDT        390±136          0.35     0.95    49           1.2
LTCP       376±137          0.36     0.56    41           0.67
Fast       335±110          0.33     0.58    9            0.66
HSTCP      255±187          0.73     0.79    25           0.9
Reno       248±163          0.66     0.6     22           0.63
HSTCP-LP   228±114          0.5      0.64    33           0.65
Scalable is one of the best, but inter-protocol sharing is poor (see Bullot et al.)
BIC and HTCP are about equal
UDT is close, BUT cpu intensive (used to be much (factor of 10) worse)
Fast gives low RTT values and variability
All TCP protocols use similar cpu (HSTCP looks poor because throughput is low)
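
To make the MHz/Mbps column concrete, the sketch below scales it to a target rate. It assumes CPU cost grows linearly with throughput, which the slides do not claim; the figures are copied from the Overall table and the function name is illustrative.

```python
# MHz/Mbps values copied from the Overall table (subset of protocols)
MHZ_PER_MBPS = {"Scalable": 0.64, "BIC": 0.71, "HTCP": 0.65, "UDT": 1.2, "Reno": 0.63}

def cpu_mhz_needed(proto, target_mbps):
    """ASSUMPTION: CPU cost scales linearly with throughput."""
    return MHZ_PER_MBPS[proto] * target_mbps

for proto in MHZ_PER_MBPS:
    print(f"{proto}: {cpu_mhz_needed(proto, 1000):.0f} MHz for 1 Gbit/s")
```

On that linear assumption, UDT would need roughly twice the CPU of the TCP stacks to fill a 1 Gbit/s path.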
16
Conclusions