Title: Transmission Rate Controlled TCP in Data Reservoir - Software Control Approach - Mary Inaba

1. Transmission Rate Controlled TCP in Data Reservoir - Software control approach
Mary Inaba
- University of Tokyo
- Fujitsu Laboratories
- Fujitsu Computer Technologies
2. Data Intensive Scientific Computation through Global Networks
[Diagram: scientific data sources - the X-ray astronomy satellite ASUKA, Nobeyama Radio Observatory (VLBI), nuclear experiments (Belle), the Digital Sky Survey, the SUBARU Telescope, and Grape6 - connected through Data Reservoirs over a very high-speed network; distributed shared files provide local accesses for data analysis at the University of Tokyo]
3. Research Projects with Data Reservoir

4. Dream Computing System for Real Scientists
- Fast CPU, huge memory and disks, good graphics
  - Cluster technology, DSM technology, graphics processors, Grid technology
- Very fast remote file accesses
  - Global file systems, data-parallel file systems, replication facilities
- Transparency to local computation
  - No complex middleware; no or only small modifications to existing software
- Real scientists are not computer scientists
  - Computer scientists are not a workforce for real scientists
5. Objectives of Data Reservoir
- Sharing scientific data between distant research institutes
  - Physics, astronomy, earth science, simulation data
- Very high-speed single-file transfer on a Long Fat pipe Network (LFN)
  - > 10 Gbps, > 20,000 km (12,500 miles), > 400 ms RTT
- High utilization of available bandwidth
  - Transferred file data rate > 90% of available bandwidth
  - Including header overheads and initial negotiation overheads
- OS and file system transparency
  - Storage-level data sharing (high-speed iSCSI protocol on stock TCP)
  - Fast single-file transfer
6. Basic Architecture
[Diagram: two Data Reservoirs with cache disks, connected over a high-latency, very high-bandwidth network; disk-block level, parallel, multi-stream transfer distributes shared data (a DSM-like architecture); each side serves local file accesses]
7. Data Reservoir Features
- Data sharing in a low-level protocol
  - Use of the iSCSI protocol
  - Efficient data transfer (optimization of disk head movements)
- File system transparency
  - Single file image
- Multi-level striping for performance scalability
- Local file accesses through the LAN, global disk transfer through the WAN
  - Unified by the iSCSI protocol
8. File Accesses on Data Reservoir
[Diagram: scientific detectors and user programs access four file servers (1st-level striping); the file servers reach four disk servers through IP switches via iSCSI (2nd-level striping); servers are IBM x345 (2 x 2.6 GHz)]
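The two-level striping above can be sketched as a simple address mapping. The function below is only an illustration with assumed server counts (four file servers, four disk servers behind each); it is not the actual Data Reservoir code.

```python
# Hypothetical sketch of the two-level striping shown on this slide:
# logical blocks are striped first across file servers, then across
# the disk servers behind them. Server counts are assumptions.
def locate(block, n_file_servers=4, n_disk_servers=4):
    """Map a logical block number to (file_server, disk_server, local_block)."""
    fs = block % n_file_servers                       # 1st-level striping
    ds = (block // n_file_servers) % n_disk_servers   # 2nd-level striping
    local = block // (n_file_servers * n_disk_servers)
    return fs, ds, local

print(locate(5))    # (1, 1, 0)
print(locate(21))   # (1, 1, 1)
```

Consecutive blocks land on different file servers, and consecutive stripes on different disk servers, so sequential transfers keep every spindle busy.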
9. Global Data Transfer
10. BW Behavior
[Plots: bandwidth (Mbps) vs. time (sec), comparing Data Reservoir transfer with transfer through a file system]
11. Problems in the BWC2002 Experiments
- Low TCP bandwidth due to packet losses
  - TCP congestion window size control
  - Very slow recovery from the fast recovery phase (> 20 min)
- Unbalance among parallel iSCSI streams
  - Packet scheduling by switches and routers
  - The user and other network users care only about the total behavior of the parallel TCP streams
12. Fast Ethernet vs. GbE
- iperf for 30 seconds
- Min/Avg: Fast Ethernet > GbE
[Plots: FE and GbE bandwidth over time]
13. Packet Transmission Rate
- Bursty behavior
  - Transmission within 20 ms of the 200 ms RTT
  - Idle for the remaining 180 ms
- Packet loss occurred
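The arithmetic behind the loss: sending a whole RTT's worth of data in the first 20 ms inflates the instantaneous rate tenfold. A quick sketch, with an assumed 500 Mbps average rate for illustration:

```python
# Why bursty transmission overflows buffers: an RTT's worth of data
# sent in only 20 ms arrives 10x faster than the average rate suggests.
# The 500 Mbps average is an assumed figure, not from the slide.
rtt = 0.200            # seconds
burst_window = 0.020   # all packets go out in the first 20 ms
avg_rate_mbps = 500

instantaneous_mbps = avg_rate_mbps * rtt / burst_window
print(instantaneous_mbps)   # 5000.0 - a 5 Gbps burst the path must absorb
```

Switch and router buffers along the path cannot absorb such bursts, hence the observed packet loss.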
14. Packet Spacing
- Ideal story: transmit one packet every RTT/cwnd
  - 24 µs interval for 500 Mbps (MTU 1500 B)
- High load for software
  - Low overhead overall because of limited use: only during the slow start phase
[Diagram: packets spaced RTT/cwnd apart across one RTT]
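The 24 µs figure follows directly from the packet size and the target rate:

```python
# Ideal inter-packet gap (IPG) from the slide: one MTU-sized packet
# every RTT/cwnd, which at a steady rate equals packet_bits / rate.
mtu_bits = 1500 * 8    # 1500-byte MTU
rate_bps = 500e6       # 500 Mbps target rate

ipg_us = mtu_bits / rate_bps * 1e6
print(ipg_us)          # 24.0 microseconds, as on the slide
```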
15. Example Case of 8 IPG
- Success on fast retransmit
- Smooth transition to congestion avoidance
- Congestion avoidance takes 28 minutes to recover to 550 Mbps
16. Best Case: 1023 B IPG
- Behaves like the Fast Ethernet case
- Proper transmission rate
- Spurious retransmits due to reordering
17. Performance Divergence on LFN
- With parallel streams, the difference between streams grows over time
- The slowest stream determines total performance
18. Unbalance within Parallel TCP Streams
- Unbalance among parallel iSCSI streams
  - Packet scheduling by switches and routers
  - Meaningless unfairness among parallel streams
  - The user and other network users care only about the total behavior of the parallel TCP streams
- Our approach
  - Keep the sum of the congestion windows (Σ cwnd_i) constant, for fair TCP network usage toward other users
  - Balance each cwnd_i by communicating between the parallel TCP streams
[Plots: per-stream bandwidth vs. time, before and after balancing]
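A minimal sketch of the balancing idea above, assuming a simple equal-share redistribution rule (the actual policy is not spelled out on the slide):

```python
# Hypothetical cwnd balancing across parallel TCP streams: preserve the
# sum of the windows (fairness toward other network users) while
# equalizing the per-stream shares. The equal-share rule is an assumption.
def balance_cwnd(cwnds):
    """Redistribute windows toward the mean, keeping the total constant."""
    total, n = sum(cwnds), len(cwnds)
    base, rem = divmod(total, n)
    # Remainder segments go to the first streams so the sum stays exact.
    return [base + (1 if i < rem else 0) for i in range(n)]

streams = [120, 10, 40, 30]      # unbalanced windows (in segments)
print(balance_cwnd(streams))     # [50, 50, 50, 50] - same total of 200
```

Because the slowest stream limits total performance, equalizing the windows lifts the aggregate rate without taking more than a fair share from other traffic.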
19. BWC2003 US-Japan Experiments
- 24,000 km (15,000 miles) distance (400 ms RTT)
- Phoenix -> Tokyo -> Portland -> Tokyo
- OC-48 x 3, OC-192, GbE x 1
- Transfer of a 1 TB file
- 32 servers, 128 iSCSI disks
[Diagram: two Data Reservoirs linked by 10G Ethernet x 2, 10G Ethernet, GbE x 4, and OC-48 x 2 through Phoenix, L.A., Seattle, Chicago, N.Y., Portland, and Tokyo, over Abilene, IEEAF/WIDE, NTT Com, APAN, and SUPER-SINET; OC-48, OC-192, and GbE segments]
20. 24,000 km (15,000 miles)
[Map: route segments of 15,680 km (9,800 miles) and 8,320 km (5,200 miles); OC-48 x 3 + GbE x 4 and OC-192 links; Juniper T320 router]
21. SC2002
- BWC2002: 560 Mbps (200 ms RTT)
  - 95% utilization of available bandwidth
  - U. of Tokyo -> SCinet (Maryland, USA)
- Conclusion: Data Reservoir can saturate a 10 Gbps network once one becomes available for the US-Japan connection
22. Results
- BWC2002
  - Tokyo -> Baltimore, 10,800 km (6,700 miles)
  - Peak bandwidth (on network): 600 Mbps
  - Average file transfer bandwidth: 560 Mbps
  - Bandwidth-distance product: 6,048 terabit-meters/second
- BWC2003 results (pre-test)
  - Phoenix -> Tokyo -> Portland -> Tokyo, 24,000 km (15,000 miles)
  - Peak bandwidth (on network): > 8 Gbps
  - Average file transfer bandwidth: > 7 Gbps
  - Bandwidth-distance product: > 168 petabit-meters/second
  - More than a 25x improvement over the BWC2002 performance (bandwidth-distance product)
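The quoted products check out arithmetically when the distance is taken in meters; this verification sketch is not part of the original deck:

```python
# Verify the bandwidth-distance products: rate (bit/s) x distance (m).
bwc2002 = 560e6 * 10_800e3    # 560 Mbps over 10,800 km
bwc2003 = 7e9 * 24_000e3      # 7 Gbps over 24,000 km

print(bwc2002 / 1e12)               # 6048.0 (terabit-meters/second)
print(bwc2003 / 1e15)               # 168.0  (petabit-meters/second)
print(round(bwc2003 / bwc2002, 1))  # 27.8 -> "more than 25 times improvement"
```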
23. Bad News
- Network cut on 11/8
  - The US-Japan north route connection was completely out of order
  - 2-3 weeks were necessary to repair the undersea fibers
- Planned BW: 11.2 Gbps (OC-48 x 3 + GbE x 4)
- Actual maximum BW: ~8.2 Gbps (OC-48 x 3 + GbE x 1)
24. How Your Science Benefits from High-Performance, High-Bandwidth Networking
- Easy and transparent access to remote scientific data
  - Without special programming (normal NFS-style accesses)
  - Purely a software approach with IA servers
- Utilization of the high-BW network for the scientist's data
  - 17 minutes for a 1 TB file transfer from the opposite side of the earth
  - High utilization factor (> 90%)
- Good for both scientists and network agencies
  - Scientists can concentrate on their research topics
- Good for both scientists and computer scientists
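The 17-minute figure is consistent with the roughly 8 Gbps rates reported earlier; a quick cross-check:

```python
# Cross-check: 1 TB in 17 minutes implies roughly the 7-8 Gbps
# the experiments achieved.
file_bits = 1e12 * 8        # 1 TB file
seconds = 17 * 60

rate_gbps = file_bits / seconds / 1e9
print(round(rate_gbps, 2))  # 7.84 Gbps
```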
25. Summary
- The most distant data transfer at BWC2003 (24,000 km)
- Software techniques for improving efficiency and stability
  - Transfer rate control on TCP
  - cwnd balancing across parallel TCP streams
  - Based on the stock TCP algorithm
- Possibly the highest bandwidth-distance product for file transfer between two points
- Still high utilization of available bandwidth
26. The BWC2003 Experiment Is Supported by
NTT / VERIO