Title: Automatic Parameter Configuration Mechanism for Data Transfer Protocol GridFTP
- Takeshi Ito, Hiroyuki Ohsaki, Makoto Imase
- Graduate School of Information Science and Technology, Osaka University, Japan
Contents
- Background
- GridFTP
- Automatic Parameter Configuration Mechanism (APCM)
- Design strategy
- Basic idea
- Simulation
- Conclusions
Background
- TCP (Transmission Control Protocol)
- Has been widely used in the Internet
- Has difficulty maintaining high throughput
- In a network with high packet loss probability
- GridFTP
- A protocol for efficiently transferring large volumes of data
- Addresses the existing TCP problems
- Currently being standardized by the GGF (Global Grid Forum)
GridFTP
- Features
- Parallel data transfer
- Automatic negotiation of TCP socket buffer size
- Third-party control of file transfer
- Partial file transfer
- Security
- Reliable file transfer
- However, how to configure GridFTP control parameters has not yet been sufficiently examined
Objectives
- Propose a mechanism
- i.e., Automatic Parameter Configuration Mechanism (APCM)
- Optimizes the number of parallel TCP connections
- Improves GridFTP performance
- Demonstrate the effectiveness of APCM
Related Work
[9] S. Thulasidasan, W. Feng, and M. K. Gardner, "Optimizing GridFTP through dynamic right-sizing," in Proceedings of IEEE International Symposium on High Performance Distributed Computing, June 2003.
[1] T. Ito, H. Ohsaki, and M. Imase, "On parameter tuning of data transfer protocol GridFTP in wide-area Grid computing," in Proceedings of Second International Workshop on Networks for Grid Applications (GridNets 2005), Oct. 2005, pp. 415-421.
- [9] proposed a method to determine the required TCP socket buffer size
- Realized by extending the SBUF command
- [1] derived the optimal number of parallel TCP connections and TCP socket buffer size using a TCP fluid-flow model
- However, the required information (round-trip time, available bandwidth) must be known in advance
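Both approaches revolve around how large a socket buffer a path needs. As a minimal sketch, and not the method of [9] or the model of [1], the standard bandwidth-delay-product rule below shows why such methods need the round-trip time and bandwidth in advance. The function name and the example values are illustrative assumptions.

```python
def required_buffer_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes.

    A single TCP connection needs a window (and hence a socket buffer)
    of at least bandwidth x RTT to keep the path full.
    """
    return int(bandwidth_bps * rtt_s / 8)


# Illustrative values, not from the slides: 1 Gbit/s path with a 100 ms RTT.
print(required_buffer_bytes(1e9, 0.1))  # 12500000 bytes (about 12.5 Mbyte)
```

Without a measured RTT and bandwidth, neither input is available, which is the limitation noted above.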
Parallel Data Transfer
- Higher throughput can be expected than with a single TCP connection
- Throughput drops if the number of TCP connections becomes too large
(Figure: a single file can be transferred through multiple TCP connections)
Design Strategy
- It is extremely important to:
- Provide compatibility with existing GridFTP
- Realize APCM on the GridFTP client side
- Establish APCM without changing the existing GridFTP protocol
- Realize APCM in the Grid middleware
- It is difficult to realize APCM for third-party transfers
- However, most transfers are executed between GridFTP servers and clients, so this is not a serious problem
Basic Idea
(Figure: a GridFTP client and a GridFTP server connected over a Grid network by a control channel and a data channel; the file is divided into chunks)
1. Fix the number of TCP connections
2. Transfer a chunk of the file
3. Measure the network status on the GridFTP client
4. Update the number of TCP connections, and transfer another chunk
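The four steps form a simple feedback loop over chunks. The sketch below illustrates that loop under stated assumptions: `transfer_chunk` and `update` are hypothetical hooks (not GridFTP or Globus APIs), goodput is measured in bytes per second, and the mode-specific rule (MI, MI+, or AIMD) is whatever `update` implements.

```python
from typing import Callable, List, Tuple

# Hypothetical hook: sends `size` bytes over `n` parallel TCP connections
# and returns the measured transfer time in seconds.
TransferFn = Callable[[int, int], float]
# Hypothetical hook: maps (current n, measured goodput) to the next n.
UpdateFn = Callable[[int, float], int]


def transfer_in_chunks(file_size: int, chunk_size: int, n_init: int,
                       transfer_chunk: TransferFn,
                       update: UpdateFn) -> List[Tuple[int, float]]:
    """Transfer a file chunk by chunk, re-tuning n after each chunk."""
    history: List[Tuple[int, float]] = []
    n = n_init                                  # 1. fix the number of connections
    offset = 0
    while offset < file_size:
        size = min(chunk_size, file_size - offset)
        elapsed = transfer_chunk(n, size)       # 2. transfer one chunk
        goodput = size / elapsed                # 3. measure goodput (bytes/s)
        history.append((n, goodput))
        n = update(n, goodput)                  # 4. update n for the next chunk
        offset += size
    return history


# Toy usage: each connection carries 1 Mbyte/s, capped at 5 Mbyte/s in total,
# and n simply doubles after every chunk.
hist = transfer_in_chunks(1000, 100, 1,
                          lambda n, size: size / min(n * 1e6, 5e6),
                          lambda n, g: n * 2)
print(hist[:3])
```

Keeping the policy in a separate `update` hook mirrors the fact that the loop is the same for all three operational modes.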
Measuring Network Status
- Measured on the GridFTP client side
- GridFTP goodput: calculated from the chunk size and its transfer time
- Round-trip time: command response time on the GridFTP control channel
- TCP socket buffer size on the GridFTP client: use the socket API
- TCP socket buffer size on the GridFTP server: use a GridFTP feature (SBUF)
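A minimal sketch of the client-side measurements, assuming goodput in bytes per second and timing a control-channel round trip with a monotonic clock. `send_command` and `wait_reply` are hypothetical callables (for example, wrapping a cheap FTP command such as NOOP is an assumption, not something the slides specify); `getsockopt` with `SO_SNDBUF`/`SO_RCVBUF` is the standard socket API.

```python
import socket
import time
from typing import Callable, Tuple


def goodput_bytes_per_s(chunk_size: int, transfer_time_s: float) -> float:
    """GridFTP goodput of one chunk: bytes delivered / transfer time."""
    return chunk_size / transfer_time_s


def command_response_time(send_command: Callable[[], None],
                          wait_reply: Callable[[], None]) -> float:
    """Approximate the RTT as the response time of one control-channel command.

    `send_command` and `wait_reply` are hypothetical callables that write a
    command to the control channel and block until its reply arrives.
    """
    start = time.monotonic()
    send_command()
    wait_reply()
    return time.monotonic() - start


def local_buffer_sizes(sock: socket.socket) -> Tuple[int, int]:
    """Client-side (send, receive) buffer sizes via the standard socket API."""
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

On Linux, `getsockopt` reports twice the value set with `setsockopt`, because the kernel reserves extra space for bookkeeping; a real client would need to account for that.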
Adjusting the Number of Parallel TCP Connections
- APCM adjusts the number of parallel TCP connections
- We discuss three operational modes:
- MI (Multiplicative Increase)
- MI+ (Multiplicative Increase Plus)
- AIMD (Additive Increase and Multiplicative Decrease)
MI (Multiplicative Increase) Mode
1. Initialize the number of parallel TCP connections
2. After the k-th chunk transfer, if (4) is satisfied, increase the number of parallel TCP connections
3. If (4) is not satisfied, terminate the algorithm
(Figure: number of parallel TCP connections versus k-th chunk transfer, k = 1 to 4)
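MI mode can be sketched as a per-chunk decision. This is a minimal illustration under assumptions not stated on the slide: the growth factor is 2, the current number of connections is kept when the algorithm terminates, and `condition_4` is a hypothetical placeholder for evaluating inequality (4) after chunk k.

```python
from typing import Callable, Tuple


def mi_next(n: int, condition_4_holds: bool, factor: int = 2) -> Tuple[int, bool]:
    """One MI-mode decision after a chunk transfer; returns (next_n, done)."""
    if condition_4_holds:
        return n * factor, False   # (4) holds: grow n multiplicatively
    return n, True                 # otherwise keep n and stop adjusting


def run_mi(n_init: int, condition_4: Callable[[int, int], bool],
           max_chunks: int) -> int:
    """Apply MI mode chunk by chunk; condition_4(k, n) evaluates (4) for chunk k."""
    n = n_init
    for k in range(1, max_chunks + 1):
        n, done = mi_next(n, condition_4(k, n))
        if done:
            break
    return n


# Toy condition: pretend (4) holds while fewer than 8 connections are open.
print(run_mi(1, lambda k, n: n < 8, max_chunks=100))  # 8
```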
MI+ (Multiplicative Increase Plus) Mode
1. Initialize the number of parallel TCP connections
2. After the k-th chunk transfer, if (7) is satisfied, increase the number of parallel TCP connections
3. If (7) is not satisfied, numerically derive the number of parallel TCP connections that maximizes Eq. (1) [1]
4. Configure the number of parallel TCP connections to that value, and terminate the algorithm
(Figure: number of parallel TCP connections versus k-th chunk transfer, k = 1 to 4)
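The Multiplicative Increase Plus (MI+) mode differs from MI only in its exit step: instead of keeping the current value, it jumps to the model optimum. A minimal sketch, assuming a growth factor of 2; `solve_optimal_n` is a hypothetical stand-in for the numerical maximization of Eq. (1), which is not reproduced on the slides.

```python
from typing import Callable, Tuple


def mi_plus_next(n: int, condition_7_holds: bool,
                 solve_optimal_n: Callable[[], int],
                 factor: int = 2) -> Tuple[int, bool]:
    """One MI+-mode decision after a chunk transfer; returns (next_n, done).

    `solve_optimal_n` stands in for the numerical maximization of Eq. (1).
    """
    if condition_7_holds:
        return n * factor, False          # keep growing multiplicatively
    return solve_optimal_n(), True        # jump to the model optimum and stop


print(mi_plus_next(4, True, lambda: 11))   # (8, False)
print(mi_plus_next(8, False, lambda: 11))  # (11, True)
```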
AIMD (Additive Increase and Multiplicative Decrease) Mode
1. Initialize the number of parallel TCP connections
2. After the k-th chunk transfer, if (10) is satisfied, increase the number of parallel TCP connections
3. If (10) is not satisfied, decrease the number of parallel TCP connections
(Figure: number of parallel TCP connections versus k-th chunk transfer, k = 1 to 7)
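Unlike MI mode, AIMD mode never terminates; it keeps adjusting after every chunk. The sketch below assumes an additive step of 1 connection and a decrease factor of 1/2 (classic AIMD constants, not values from the slides), and treats the outcome of (10) as a given boolean.

```python
def aimd_next(n: int, condition_10_holds: bool,
              increase: int = 1, decrease_factor: float = 0.5) -> int:
    """One AIMD-mode update after a chunk transfer (never drops below 1)."""
    if condition_10_holds:
        return n + increase                       # additive increase
    return max(1, int(n * decrease_factor))       # multiplicative decrease


# Toy run over seven chunks: (10) holds four times, fails once, then holds again.
n = 1
for holds in [True, True, True, True, False, True, True]:
    n = aimd_next(n, holds)
print(n)  # 4
```

The slow, one-step-per-chunk growth is why AIMD mode ramps up more slowly than the multiplicative modes.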
Network Topology Used in Simulation
- A GridFTP server and a GridFTP client are connected via two DropTail routers
- A 10-Gbyte file is transferred from the GridFTP client to the GridFTP server
- Chunk size: 100 Mbyte; TCP socket buffer size: 64 Kbyte
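Some back-of-the-envelope arithmetic for this setup. The 100 ms RTT and 100 Mbit/s bottleneck below are illustrative assumptions (the slides do not state them), and decimal units are assumed for the file and chunk sizes but binary units for the socket buffer. The sketch shows how many chunks (re-tuning opportunities) APCM gets and why a single 64-Kbyte window cannot fill a fast, long path.

```python
import math

FILE_SIZE = 10 * 10**9     # 10 Gbyte, decimal units assumed
CHUNK_SIZE = 100 * 10**6   # 100 Mbyte, decimal units assumed
BUFFER = 64 * 1024         # 64 Kbyte socket buffer, binary units assumed

n_chunks = math.ceil(FILE_SIZE / CHUNK_SIZE)
print(n_chunks)  # 100 chunks, i.e. up to 100 opportunities to re-tune n


def window_limited_bps(n_conns: int, buffer_bytes: int, rtt_s: float) -> float:
    """Goodput bound when each connection sends at most one buffer per RTT."""
    return n_conns * buffer_bytes * 8 / rtt_s


def connections_to_fill(link_bps: float, buffer_bytes: int, rtt_s: float) -> int:
    """Smallest n whose window-limited bound reaches the link bandwidth."""
    return math.ceil(link_bps * rtt_s / (buffer_bytes * 8))


# Illustrative values only: 100 ms RTT, 100 Mbit/s bottleneck link.
print(window_limited_bps(1, BUFFER, 0.1) / 1e6)   # about 5.2 Mbit/s per connection
print(connections_to_fill(100e6, BUFFER, 0.1))    # 20
```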
Simulation Result (Evolution of GridFTP Goodput and Number of Parallel TCP Connections)
- The number of parallel TCP connections rises more slowly in AIMD mode than in both MI mode and MI+ mode
- With MI and MI+ modes, GridFTP can sufficiently utilize the available bandwidth
Simulation Result (Effect of the Bottleneck Link Bandwidth)
- For comparison, GridFTP goodput with a single TCP connection is also shown
- With APCM, the bottleneck link bandwidth can be almost fully utilized
- Poor performance when the bottleneck link bandwidth is large
Simulation Result (Effect of Propagation Delay)
- With MI mode and MI+ mode, GridFTP goodput hardly degrades even for a large propagation delay
- With AIMD mode, GridFTP goodput degrades linearly as the propagation delay becomes large
Conclusions
- Proposed APCM for GridFTP
- Focuses on the parallel data transfer feature
- Transfers a file as chunks and measures GridFTP goodput and round-trip time
- Three operational modes (MI, MI+, and AIMD)
- Evaluated its effectiveness through simulation experiments
- APCM significantly improves GridFTP goodput
Future Work
- Conduct simulation experiments under various network configurations
- Identify a parameter range that optimizes the performance of APCM
- Evaluate fairness among GridFTP sessions that share the same bottleneck link
Thank you for your kind attention
Why Is the Design Strategy Important? (1/2)
- Provide compatibility with existing GridFTP
- GridFTP is implemented in the Globus Toolkit
- GridFTP has been spreading rapidly in recent years
- Realize APCM on the GridFTP client side
- Many GridFTP servers are already in operation
- Establish APCM without changing the existing GridFTP protocol
- To interoperate with existing GridFTP servers
Why Is the Design Strategy Important? (2/2)
- Realize APCM in the Grid middleware
- APCM can then be easily installed in a Grid computing environment
- APCM needs to operate in various computer environments as well as various network environments
- Grid computing is characterized by the heterogeneity of the computers and networks that constitute the Grid
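What Does Inequality (4) Mean?
- When the number of connections is too small, GridFTP goodput is upper-bounded by N W / R (N: number of parallel TCP connections, W: TCP socket buffer size, R: round-trip time)
- A parameter with a value between 0 and 1 accounts for the effect of the TCP slow-start phase
- So, if inequality (4) is satisfied, we conjecture that the TCP socket buffer size is the bottleneck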
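The explanation suggests that (4) compares the measured goodput with a discounted version of the N W / R bound. The sketch below assumes the form G >= gamma * N * W / R; both that exact form and the parameter name `gamma` are assumptions for illustration, since the inequality itself is not shown here.

```python
def socket_buffer_is_bottleneck(goodput: float, n: int, w: float,
                                rtt: float, gamma: float) -> bool:
    """Assumed form of inequality (4): G >= gamma * N * W / R.

    goodput and w must use consistent units (e.g. bytes/s and bytes);
    gamma (hypothetical name) discounts the bound for TCP slow start.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return goodput >= gamma * n * w / rtt


# 4 connections, 64 Kbyte buffers, 100 ms RTT: bound = 2,621,440 bytes/s.
print(socket_buffer_is_bottleneck(2.5e6, 4, 65536, 0.1, gamma=0.8))  # True
print(socket_buffer_is_bottleneck(1.0e6, 4, 65536, 0.1, gamma=0.8))  # False
```

A goodput close to the window-limited bound means adding connections (more aggregate window) should help; a goodput well below it points to some other bottleneck.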
How Is the Number of Parallel TCP Connections Numerically Derived from Eq. (1)?
- From Eqs. (1) and (2), G can be expressed as a function of N, W, R, and B
- G, N, W, and R can be measured, but B cannot
- B can instead be calculated from Eqs. (1) and (2)
- Derive the N that maximizes G in Eq. (1) using the values of W, R, and B
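The two-step procedure (recover B from a measurement, then maximize over N) can be sketched generically. Because Eq. (1) is not reproduced here, the goodput model is caller-supplied and assumed nondecreasing in B; bisection and an exhaustive integer search are illustrative choices, not necessarily the numerical method of [1]. The `toy` model is purely hypothetical and is NOT the paper's model.

```python
from typing import Callable

# Goodput model G(N, W, R, B), supplied by the caller and nondecreasing in B.
Model = Callable[[int, float, float, float], float]


def estimate_b(model: Model, g: float, n: int, w: float, r: float,
               lo: float, hi: float, iters: int = 200) -> float:
    """Solve model(n, w, r, B) = g for B by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if model(n, w, r, mid) < g:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def best_n(model: Model, w: float, r: float, b: float, n_max: int = 256) -> int:
    """Integer N in [1, n_max] maximizing the modelled goodput."""
    return max(range(1, n_max + 1), key=lambda n: model(n, w, r, b))


# Hypothetical toy stand-in for Eq. (1): window-limited goodput capped at B,
# with a small per-connection overhead so that an interior optimum exists.
def toy(n, w, r, b):
    return min(n * w / r, b) * (1 - 0.01 * n)


b_hat = estimate_b(toy, g=2.76e6, n=8, w=65536, r=0.1, lo=1.0, hi=1e9)
print(round(b_hat))                        # 3000000
print(best_n(toy, 65536, 0.1, b_hat))      # 5
```

Bisection only needs monotonicity in B, so it works for any model with that property; if the measurement was taken while the connections were window-limited, B is not identifiable and the estimate is meaningless.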
Why Three Modes?
- We do not know which mode performs better
- In one environment, MI mode may perform better
- In another environment, MI+ mode may perform better
- So we consider three modes and evaluate their effectiveness
- A more thorough investigation of all three modes is required under various network conditions
Does MI+ Mode Always Perform Better than MI Mode?
- No, we do not think so
- The analysis in [1] is based on some assumptions
- When an assumption is not satisfied, MI+ mode could misconfigure the number of TCP connections
- In that case, MI mode could perform better than MI+ mode
- More investigation under various network configurations is needed
Is AIMD Mode Not Useful?
- We think AIMD mode is also useful
- For example, in a general network with multiple GridFTP sessions
- Regarding fairness among GridFTP sessions, AIMD mode may perform better than MI and MI+ modes
Why Is Utilization of the Bottleneck Link Bandwidth Slightly Degraded as the Bottleneck Link Bandwidth Becomes Large?
- APCM takes time to find the optimal number of parallel TCP connections
- When the bottleneck link bandwidth is large, TCP connections take time to fully utilize it
Why Not Measure the Packet Loss Probability and Available Bandwidth?
- They are difficult to measure at the middleware layer with a passive measurement method
- They could be measured by using features specific to certain operating systems or network devices
- However, ease of installation and deployment in the Grid computing environment would then be compromised
- Even if they could be measured with active measurement, compatibility with existing GridFTP servers would be impaired