Title: Masaki Hirabaru
1. Performance Measurement on Large Bandwidth-Delay Product Networks
3rd e-VLBI Workshop, October 6, 2004, Makuhari, Japan
- Masaki Hirabaru
- masaki_at_nict.go.jp, NICT Koganei
2. An Example: How much speed can we get?
[Diagram a-1: Sender (GbE) and Receiver (GbE) across a high-speed backbone, with an L2/L3 switch and a 100M link in the path; RTT 200 ms]
[Diagram a-2: Sender and Receiver (GbE) behind switches, with 100M links on both sides of the high-speed backbone; RTT 200 ms]
3. Average TCP throughput: less than 20 Mbps
(in the case where the sending rate is limited to 100 Mbps)
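The gap between the 100 Mbps bottleneck and the observed throughput is roughly what a loss-based TCP response predicts at a 200 ms RTT. A small illustration using the well-known Mathis et al. estimate, throughput ~ (MSS/RTT) * 1.22/sqrt(p); the loss rates below are assumed for illustration, not taken from the slides:

# Rough TCP Reno throughput estimate (Mathis et al. 1997):
#   throughput ~ (MSS / RTT) * 1.22 / sqrt(p)
# Illustrative only; the loss rates below are assumed, not measured.
from math import sqrt

MSS = 1460 * 8      # bits per segment (1500-byte MTU minus 40 B of TCP/IP headers)
RTT = 0.200         # seconds (the 200 ms RTT from the example above)

for p in (1e-4, 1e-5, 1e-6):
    bps = (MSS / RTT) * 1.22 / sqrt(p)
    print(f"loss rate {p:g}: ~{bps / 1e6:.1f} Mbps")
# Even a loss rate around 1e-5 caps a single Reno stream near 20 Mbps at this
# RTT, regardless of the 100 Mbps bottleneck capacity.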
4. An Example (2)
[Diagram b: Sender (GbE) and Receiver (GbE) across a high-speed backbone with only 900 Mbps available; RTT 200 ms]
5. Purposes
- Measure, analyze, and improve end-to-end performance in high bandwidth-delay-product, packet-switched networks
  - to support networked science applications
  - to help operations find a bottleneck
  - to evaluate advanced transport protocols (e.g. Tsunami, SABUL, HSTCP, FAST, XCP, ours)
- Improve TCP under simplified conditions
  - a single TCP stream
  - memory to memory
  - a bottleneck but no cross traffic
  - consume all the available bandwidth
6. TCP on a path with a bottleneck
[Diagram: sender traffic entering a bottleneck; the queue overflows and causes loss]
The sender may generate bursty traffic. The sender recognizes the overflow only after a delay > RTT/2. The bottleneck may change over time.
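To put that feedback delay into numbers, a small illustration using assumed figures (GbE sender, 1500-byte packets, the 200 ms RTT from the earlier example):

# How much traffic a sender emits before loss feedback can arrive.
# Assumed figures: GbE line rate, 1500-byte packets, 200 ms RTT.
rate_bps = 1_000_000_000
pkt_bytes = 1500
rtt = 0.200

# Loss is noticed no earlier than ~RTT/2 (bottleneck -> receiver -> sender),
# and typically only after about a full RTT via duplicate ACKs.
for delay in (rtt / 2, rtt):
    blind_bytes = rate_bps / 8 * delay
    print(f"feedback delay {delay * 1000:.0f} ms: "
          f"~{blind_bytes / 1e6:.1f} MB (~{blind_bytes / pkt_bytes:.0f} packets) already sent")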
7. Web100 (http://www.web100.org)
- A kernel patch for monitoring/modifying TCP metrics in the Linux kernel
  - We need to know TCP behavior to identify a problem.
- Iperf (http://dast.nlanr.net/Projects/Iperf/)
  - TCP/UDP bandwidth measurement
- bwctl (http://e2epi.internet2.edu/bwctl/)
  - wrapper for iperf with authentication and scheduling
- tcpplot
  - visualizer for Web100 data
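The slides list the tools but not how they are invoked. As an illustrative sketch (not from the slides), a single-stream, memory-to-memory test like those below could be driven as follows, assuming iperf version 2 is installed on both hosts and "receiver.example.org" stands in for the real receiver:

# Minimal sketch: run a 60-second single-stream iperf TCP test with a large
# socket buffer. The host name is a placeholder; the receiver side runs
# "iperf -s -w 64M" beforehand.
import subprocess

cmd = [
    "iperf",
    "-c", "receiver.example.org",  # client mode, connect to the receiver
    "-w", "64M",                   # request a 64 MB socket buffer (TCP window)
    "-t", "60",                    # run for 60 seconds
    "-i", "10",                    # report every 10 seconds
    "-f", "m",                     # report in Mbits/s
]
print(subprocess.run(cmd, capture_output=True, text=True).stdout)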
8. 1st Step: Tuning a Host with UDP
- Remove any bottlenecks on the host
  - CPU, memory, bus, OS (driver), ...
- Dell PowerEdge 1650 (not enough power)
  - Intel Xeon 1.4 GHz x1 (2), memory 1 GB
  - Intel Pro/1000 XT onboard, PCI-X (133 MHz)
- Dell PowerEdge 2650
  - Intel Xeon 2.8 GHz x1 (2), memory 1 GB
  - Intel Pro/1000 XT, PCI-X (133 MHz)
- Iperf UDP throughput: 957 Mbps
  - GbE wire rate with UDP (8 B) + IP (20 B) + Ethernet II (38 B) headers
  - Linux 2.4.26 (RedHat 9) with Web100
  - PE1650: TxIntDelay=0
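The 957 Mbps result can be reproduced from the header overheads listed above; a quick check assuming a 1500-byte IP MTU (the 38 B of Ethernet II overhead covers the MAC header, FCS, preamble, and inter-frame gap):

# Theoretical UDP goodput on GbE with a 1500-byte IP MTU.
# Per-packet overhead from the slide: UDP 8 B + IP 20 B + Ethernet II 38 B.
line_rate = 1_000_000_000               # bits/s
ip_mtu = 1500
udp_payload = ip_mtu - 20 - 8           # 1472 B of application data per packet
wire_frame = udp_payload + 8 + 20 + 38  # 1538 B occupied on the wire
goodput = line_rate * udp_payload / wire_frame
print(f"{goodput / 1e6:.0f} Mbps")       # ~957 Mbps, matching the iperf result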
9. 2nd Step: Tuning a Host with TCP
- Maximum socket buffer size (TCP window size)
  - net.core.wmem_max, net.core.rmem_max (64 MB)
  - net.ipv4.tcp_wmem, net.ipv4.tcp_rmem (64 MB)
- Driver descriptor length
  - e1000: TxDescriptors=1024, RxDescriptors=256 (default)
- Interface queue length
  - txqueuelen=100 (default)
  - net.core.netdev_max_backlog=300 (default)
- Interface queue discipline
  - fifo (default)
- MTU
  - mtu=1500 (IP MTU)
- Iperf TCP throughput: 941 Mbps
  - GbE wire rate with TCP (32 B) + IP (20 B) + Ethernet II (38 B) headers
  - Linux 2.4.26 (RedHat 9) with Web100
- Web100 (incl. High Speed TCP)
  - net.ipv4.web100_no_metric_save=1 (do not store TCP metrics in the route cache)
  - net.ipv4.WAD_IFQ=1 (do not send a congestion signal on buffer full)
  - net.ipv4.web100_rbufmode=0, net.ipv4.web100_sbufmode=0 (disable auto tuning)
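Two quick checks on the numbers above, assuming a 1500-byte MTU and the roughly 200 ms trans-Pacific RTT used elsewhere in the slides: the socket buffer needed to fill a GbE path, and the theoretical TCP goodput that the 941 Mbps iperf result matches.

# 1) Socket buffer needed to fill the pipe: bandwidth x RTT (the BDP).
#    Assumed: 1 Gbps path, 200 ms RTT as in the trans-Pacific example.
rate_bps, rtt = 1_000_000_000, 0.200
bdp_bytes = rate_bps / 8 * rtt
print(f"BDP = {bdp_bytes / 2**20:.0f} MiB")   # ~24 MiB, so a 64 MB cap is comfortable

# 2) Theoretical TCP goodput on GbE with TCP (20 B + 12 B timestamp option),
#    IP (20 B), and Ethernet II (38 B) overhead per 1500-byte packet.
payload = 1500 - 20 - 32                      # 1448 B of data per segment
wire_frame = 1500 + 38                        # 1538 B on the wire
print(f"{rate_bps * payload / wire_frame / 1e6:.0f} Mbps")  # ~941 Mbps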
10. TransPAC/I2 Test: High Speed TCP (60 min)
From Tokyo to Indianapolis
11. Test in a Laboratory with a Bottleneck
[Diagram: Sender and Receiver hosts (PE 2650, PE 1650) on GbE/T, connected through an L2 switch (FES12GCF) and a network emulator (GbE/SX)]
Emulator: bandwidth 800 Mbps, delay 88 ms, loss 0
2 x BDP = 16 MB (BDP: Bandwidth-Delay Product)
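A quick check of the 2 x BDP figure, taking the emulator's 88 ms as the round-trip delay; the slide's 16 MB looks like a binary-prefix rounding of the same quantity:

# Bandwidth-delay product of the emulated path (800 Mbps, 88 ms taken as the RTT).
rate_bps, rtt = 800_000_000, 0.088
bdp = rate_bps / 8 * rtt                      # bytes
print(f"BDP   = {bdp / 2**20:.1f} MiB")       # ~8.4 MiB
print(f"2xBDP = {2 * bdp / 2**20:.1f} MiB")   # ~16.8 MiB, close to the 16 MB on the slide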
12. Laboratory Tests: 800 Mbps Bottleneck
[Throughput plots: TCP NewReno (Linux), HighSpeed TCP (Web100)]
13. BIC TCP
[Throughput plots: buffer size 100 packets, buffer size 1000 packets]
14. FAST TCP
[Throughput plots: buffer size 100 packets, buffer size 1000 packets]
15. Identify the Bottleneck
- Existing tools: pathchar, pathload, pathneck, etc.
  - available bandwidth along the path
- How large is the bottleneck (router) buffer?
- pathbuff (under development)
  - measures the buffer size at the bottleneck
  - sends a packet train, then detects loss and delay
16. A Method of Measuring Buffer Size
[Diagram: a Sender injects a packet train of n packets over an interval T into a network with a bottleneck of capacity C, toward the Receiver]
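The slides only say that pathbuff sends a packet train and then detects loss and delay, so the following is a hypothetical sketch of one way such an estimate could work, not the actual pathbuff algorithm: if the train enters the bottleneck back-to-back at line rate R > C with an initially empty queue, the queue grows by (1 - C/R) per packet, so the index of the first lost packet bounds the buffer size.

# Illustrative sketch only -- not the actual pathbuff tool. Assumes the train
# is sent back-to-back at line rate R into a bottleneck of capacity C with an
# initially empty queue; the queue then grows by (1 - C/R) per packet, so the
# first lost packet's index indicates how many packets the buffer holds.
def estimate_buffer_packets(first_lost_index: int, line_rate: float, capacity: float) -> float:
    """Estimate bottleneck buffer size (in packets) from the first loss in a train."""
    return first_lost_index * (1.0 - capacity / line_rate)

# Example: GbE sender, 800 Mbps bottleneck, first loss at packet 500 of the train.
print(f"~{estimate_buffer_packets(500, 1e9, 800e6):.0f} packets of buffering")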
17. Typical Cases of Congestion Points
- Congestion point with a small buffer (~100 packets): switch (behind a router)
  - inexpensive, but poor TCP performance on a high bandwidth-delay path
- Congestion point with a large buffer (>1000 packets): router
  - better TCP performance on a high bandwidth-delay path
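For context on why ~100 packets counts as small, compare the buffer with the path's bandwidth-delay product (the classic buffer-sizing rule of thumb for a single loss-based TCP flow). A rough calculation using the 800 Mbps / 88 ms laboratory setting and 1500-byte packets as assumed values:

# Compare bottleneck buffer sizes with the path BDP (800 Mbps, 88 ms RTT,
# 1500-byte packets -- the laboratory setting used earlier in the slides).
rate_bps, rtt, pkt = 800_000_000, 0.088, 1500
bdp_packets = rate_bps / 8 * rtt / pkt
print(f"BDP ~ {bdp_packets:.0f} packets")
for buf in (100, 1000):
    print(f"buffer {buf:4d} packets = {buf / bdp_packets:.2f} x BDP, "
          f"~{buf * pkt * 8 / rate_bps * 1000:.1f} ms of queueing")
# A 100-packet buffer is only a few percent of the BDP, which is why loss-based
# TCP struggles to fill the path in that case.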
18. Summary
- Performance measurement to obtain a reliable result and identify a bottleneck
- The bottleneck buffer size has an impact on the result
Future Work
- A performance measurement platform in cooperation with applications
19. Network Diagram for e-VLBI and Test Servers
[Diagram: end-to-end paths linking Kashima (~100 km) and Koganei (e-VLBI server, 1G) through the Tokyo XP over JGN II; Fukuoka, Kitakyushu, and the Genkai XP (Fukuoka, Japan) over APII/JGNII (~250 km) to Busan, Taegu, Kwangju, Daejon (bwctl server), and the Seoul XP over KOREN in Korea; TransPAC/JGN II (10G, ~7,000-9,000 km) to Chicago, then Abilene (Indianapolis, Washington DC, Los Angeles; 2.4G x2, ~4,000 km) and MIT Haystack (1G); GEANT (2.5G SONET) and SWITCH toward Europe. Link speeds range from 1G to 10G; perf servers are placed along the paths.]
Performance Measurement Point Directory: http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html