Title: TeraGrid Data Transfer
1. TeraGrid Data Transfer
- Jeffrey P. Gardner
- Pittsburgh Supercomputing Center
- gardnerj_at_psc.edu
2. Outline
- GSISSH
  - Use passwordless login between TeraGrid machines
  - Hands-on Exercises
- TeraGrid File Management
- Data Transfer Performance
- GridFTP
  - Terminology
  - TeraGrid Deployment
  - Hands-on Exercises
    - Use of GridFTP clients and servers to transfer files
3. Hands-on Preparation
- Prepare for exercises by logging into NCSA and getting a valid proxy certificate.
- Log in to tg-login.ncsa.teragrid.org
  - ssh userid@tg-login.ncsa.teragrid.org
  - Enter your password:
  - xxxxxx
- Get a valid proxy certificate
  - tg-login1> grid-proxy-init
  - Enter GRID pass phrase for this identity: yyyyyy
  - Creating proxy . . . . . . . . . . . Done
  - Your proxy is valid until: Tue Jun 21 08:06:03 2005
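Before a long series of transfers it can help to check how much lifetime the proxy has left. The following is a minimal sketch, not part of the original exercises: `proxy_ok` and its one-hour default threshold are hypothetical, and it assumes `grid-proxy-info -timeleft` reports the remaining lifetime in seconds.

```shell
#!/bin/sh
# proxy_ok TIMELEFT_SECONDS [MIN_SECONDS]
# Succeeds when the remaining proxy lifetime is at least MIN_SECONDS.
# The 3600-second default is an illustrative assumption, not a TeraGrid rule.
proxy_ok() {
    timeleft=${1:-0}
    min_seconds=${2:-3600}
    [ "$timeleft" -ge "$min_seconds" ]
}

# Intended use on a machine with the Globus tools installed (not run here):
#   if proxy_ok "$(grid-proxy-info -timeleft)"; then
#       echo "proxy OK"
#   else
#       grid-proxy-init
#   fi
```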
4. GSISSH: SSH using TG Certificates
- Now log in to TACC using GSISSH
  - tg-login> gsissh tg-login.tacc.teragrid.org
  - TA DA!
- See that your NCSA certificate DN and user account name have been entered into TACC's grid-mapfile
  - > grep -i userid /etc/grid-security/grid-mapfile
  - "/C=US/O=National Center for Supercomputing Applications/CN=Jeff Gardner" gardnerj
- Log out of TACC
  - > exit
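The grid-mapfile entry shown above pairs a quoted certificate DN with a local account name. As a rough sketch of how that mapping can be read, the hypothetical helper `account_for_dn` below looks up the account for a given DN in grid-mapfile text supplied on standard input. It assumes one `"DN" account` pair per line, as in the example entry.

```shell
#!/bin/sh
# account_for_dn DN
# Reads grid-mapfile lines of the form  "<DN>" <account>  on stdin and
# prints the account mapped to DN. Hypothetical helper for illustration.
account_for_dn() {
    awk -v dn="$1" -F'"' '
        $2 == dn {
            acct = $3
            gsub(/^[ \t]+|[ \t]+$/, "", acct)   # trim surrounding whitespace
            print acct
        }'
}

# Example (not run here):
#   account_for_dn "/C=US/O=.../CN=Your Name" < /etc/grid-security/grid-mapfile
```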
5. TeraGrid File Placement
- No common cross-site filesystems (currently)
  - This will change very shortly!
  - NCSA, SDSC, TACC, ANL will install GPFS (General Parallel File System)
- User controls where their data resides
  - Appropriate site(s)
  - Appropriate storage
- Online Filesystem(s)
  - Speed, visibility, quotas, backup policy
  - Each filesystem directly accessible from a single site
- Mass Storage Systems
  - Long-term storage, slower access
6. TeraGrid File Movement
- File movement is the responsibility of the user
- Between Online Filesystems
  - Intra-site
  - Cross-site
- Between Mass Storage and Online Filesystems
  - Intra-site
  - Cross-site
- Session focuses on these types of transfers
7. TeraGrid Transfer Environment
- TeraGrid backbone bandwidth means the Wide Area Network is rarely a bottleneck
  - SDSC<->Caltech<->NCSA<->PSC: 40 Gb/sec
  - NCSA<->TACC: 10 Gb/sec
- GSI authentication and proxy certificates provide automagic security for transfers
  - just do grid-proxy-init and you're in
- Transfer requests can be integrated into job execution scripts
  - Moving input data to site(s) of job execution
  - Moving results to another filesystem, site, or archive
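Integrating transfers into a job script usually means staging input in, running the program, and staging results out. The sketch below illustrates that pattern only. `stage`, `run_job`, and the `REMOTE`, `WORK`, `PROGRAM` and `DRY_RUN` variables are hypothetical names, and the URLs in the example are placeholders rather than real TeraGrid paths.

```shell
#!/bin/sh
# stage SRC_URL DST_URL: copy one file with globus-url-copy.
# With DRY_RUN=1 the command is printed instead of executed.
stage() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "globus-url-copy $1 $2"
    else
        globus-url-copy "$1" "$2"
    fi
}

# run_job: stage input in, run the program, stage results out.
# REMOTE (a gsiftp:// URL prefix), WORK (a local directory) and PROGRAM
# are placeholders supplied by the caller; PROGRAM defaults to "true".
run_job() {
    stage "$REMOTE/input.dat" "file://$WORK/input.dat" || return 1
    "${PROGRAM:-true}" "$WORK/input.dat" "$WORK/output.dat" || return 1
    stage "file://$WORK/output.dat" "$REMOTE/output.dat"
}
```

Setting `DRY_RUN=1` before calling `run_job` prints the two transfer commands without contacting any server, which is a convenient way to check the URLs before submitting the job.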
8. Data Transfer Performance
- What impacts transfer rates?
  - Disk and filesystem speed
  - Connectivity of filesystem to node
  - Node characteristics and load
  - Connectivity of node to WAN
- For all networks
  - Bandwidth
  - Latency
  - Buffer Size
  - Protocol
  - Load
  - Encryption
- Don't expect 40 Gb/sec!
[Figure: network path between sites. Nodes connect at 1 Gb/s to a site switch, each switch connects at 30 Gb/s to the WAN (TG Backbone), which runs at 40 Gb/s.]
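Bandwidth, latency and buffer size interact: a single TCP stream can only keep a link busy if its buffer holds roughly one round trip's worth of data (the bandwidth-delay product). The sketch below computes that figure. The helper name `bdp_bytes` and the 32 ms round-trip time are assumptions chosen for illustration, not measured TeraGrid values.

```shell
#!/bin/sh
# bdp_bytes MBIT_PER_SEC RTT_MS
# bytes in flight = (Mbit/s * 10^6 / 8) * (ms / 1000) = Mbit/s * ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# A 1 Gb/s node link with an assumed 32 ms round trip:
bdp_bytes 1000 32    # prints 4000000
```

Under these assumptions a buffer of about 4 MB per stream is needed to fill a 1 Gb/s link. A smaller buffer caps the rate well below the link speed regardless of backbone capacity.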
9. Performance Choices Matter
- Transfer large files for best performance
- Use fast filesystems, dedicated transfer nodes, optimized transfer parameters
- Transfer of a 1 GByte file from NCSA to SDSC (10/6/2004)
10. GridFTP Terminology - Protocol
- "GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth, wide-area networks. GridFTP is based on FTP, the highly popular Internet file transfer protocol."
  - Quoted from Globus Alliance website
11. Terminology - Client
- GridFTP client programs issue requests that adhere to the GridFTP protocol
- Users run GridFTP client programs to transfer files
- There is no client program named gridFTP, which can be confusing because users are told "use gridFTP to transfer your files"
- tgcp, globus-url-copy and uberftp are three GridFTP client programs that are part of the Common TeraGrid Software Stack (CTSS)
12. Terminology - 3rd Party Transfer
- A GridFTP transfer between two GridFTP servers, rather than between a server and a client, is called a third-party transfer
- A third-party transfer occurs when the GridFTP client initiating the transfer is run on a system that is neither the source nor the destination of the transfer operation
- Allows use of dedicated transfer nodes
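The definition above reduces to a simple comparison of three host names. As an illustrative sketch (the function name `transfer_kind` is hypothetical and not part of any GridFTP client), a transfer is third-party exactly when the client host matches neither endpoint:

```shell
#!/bin/sh
# transfer_kind CLIENT_HOST SRC_HOST DST_HOST
# Prints "third-party" when the client is neither the source nor the
# destination of the transfer, otherwise "client-server".
transfer_kind() {
    if [ "$1" != "$2" ] && [ "$1" != "$3" ]; then
        echo third-party
    else
        echo client-server
    fi
}

# Client on a login node, data moving between two dedicated GridFTP nodes:
transfer_kind tg-login.ncsa.teragrid.org \
    tg-gridftp.ncsa.teragrid.org tg-gridftp.tacc.teragrid.org
```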
13. Terminology - Server
- A GridFTP server process understands requests that adhere to the GridFTP protocol, and performs authentication and data transfer operations based on those requests
- TeraGrid GridFTP servers usually run on
  - Login nodes
    - tg-login.<site>.teragrid.org
  - Dedicated GridFTP nodes
    - tg-gridftp.<site>.teragrid.org
- Some mass storage front-ends are GridFTP servers
  - mss.ncsa.teragrid.org
14. TG GridFTP Server Deployment
- tg-login.<site>.teragrid.org is a login node and also runs a GridFTP server
  - Shared resource: many tasks
- tg-gridftp.<site>.teragrid.org is a dedicated GridFTP server
  - Dedicated file transfer resource
  - usually better connectivity
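The two naming patterns above differ only in the role prefix, so scripts can build server names from a role and a site. This is a small sketch with a hypothetical helper name, `tg_host`. Whether a given site actually runs both kinds of server is not something the script checks.

```shell
#!/bin/sh
# tg_host ROLE SITE
# ROLE is "login" (shared login node) or "gridftp" (dedicated transfer node).
tg_host() {
    case $1 in
        login|gridftp) echo "tg-$1.$2.teragrid.org" ;;
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

tg_host gridftp tacc    # prints tg-gridftp.tacc.teragrid.org
```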
15. TG GridFTP Client Deployment
- uberftp
  - interactive GridFTP transfer client
  - configurable TCP buffer size and number of parallel streams
16. TG GridFTP Client Deployment
- globus-url-copy <source_url> <destination_url>
  - command line interface
  - -tcp-bs <size> | -tcp-buffer-size <size>
    - specify the size (in bytes) of the buffer to be used by the underlying ftp data channels
  - -p <parallelism> | -parallel <parallelism>
    - specify the number of streams to be used in the ftp transfer
- tgcp gridFTP-server1:file1 gridFTP-server2:file2
  - command line interface
  - friendly scp-like wrapper around globus-url-copy
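To make the option layout concrete, the following sketch assembles a globus-url-copy command line from a source URL, a destination URL, and the two optional tuning values described above. `guc_cmd` is a hypothetical helper that only prints the command, and the host names in the example are placeholders.

```shell
#!/bin/sh
# guc_cmd SRC_URL DST_URL [TCP_BUFFER_BYTES] [STREAMS]
# Prints a globus-url-copy command line; options are added only when given.
guc_cmd() {
    cmd="globus-url-copy"
    if [ -n "${3:-}" ]; then cmd="$cmd -tcp-bs $3"; fi
    if [ -n "${4:-}" ]; then cmd="$cmd -p $4"; fi
    echo "$cmd $1 $2"
}

# Default parameters, then a 4000000-byte buffer with 4 parallel streams:
guc_cmd gsiftp://src.example.org/data/f gsiftp://dst.example.org/data/f
guc_cmd gsiftp://src.example.org/data/f gsiftp://dst.example.org/data/f 4000000 4
```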
17. Hands-on
- Participants will be led through a series of exercises using tgcp, globus-url-copy and uberftp.
- Demonstrates transferring files
  - Between TeraGrid sites
  - Between TG machines and archival storage systems
18. Hands-on preparation
- Log in to tg-login.ncsa.teragrid.org if you have not already done so
- Get the test data file
  - wget http://www.psc.edu/gardnerj/test.file
19. Hands-on Exercise 1: GridFTP between login nodes
- Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use the login node at TACC as the remote GridFTP server. Use default transfer parameters.
- Use globus-url-copy to transfer the file
  - Type the command on a single line: no carriage return!
  - tg-login1> /usr/bin/time -f %e globus-url-copy file:`pwd`/test.file gsiftp://tg-login.tacc.teragrid.org/~/test.file-Ex1
  - 3.18
20. Hands-on Exercise 2: GridFTP between GridFTP Servers
- Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use a third-party transfer and the GridFTP server nodes at both NCSA and TACC.
- Use globus-url-copy to transfer the file
  - tg-login1> /usr/bin/time -f %e globus-url-copy gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex2
  - 3.01
21. Hands-on Exercise 3: GridFTP between GridFTP Servers
- Copy a 9 MByte file from the current directory at NCSA to your home directory at TACC. Use a third-party transfer and the GridFTP server nodes at both NCSA and TACC. Use optimized transfer parameters.
- Use globus-url-copy to transfer the file
  - tg-login1> /usr/bin/time -f %e globus-url-copy -tcp-bs 4000000 -p 4 gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex3
  - 2.54
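The elapsed times reported in Exercises 1 to 3 are easier to compare as rates. The sketch below converts a byte count and a wall-clock time into MB/s (taking 1 MB as 10^6 bytes). `rate_mb_per_s` is a hypothetical helper, and the 9621728-byte size is the test file size shown in the uberftp output later in this session. Because the timings cover the whole command, including start-up and authentication, they understate the raw transfer rate for a file this small.

```shell
#!/bin/sh
# rate_mb_per_s BYTES SECONDS
# Prints the average rate in MB/s (1 MB = 10^6 bytes), two decimals.
rate_mb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1000000 }'
}

rate_mb_per_s 9621728 3.18    # Exercise 1 timing: prints 3.03
rate_mb_per_s 9621728 3.01    # Exercise 2 timing
rate_mb_per_s 9621728 2.54    # Exercise 3 timing
```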
22. Hands-on Exercise 4: Using tgcp
- Copy a 9 MByte file from your home directory at NCSA to your home directory at TACC using tgcp. tgcp automatically uses third-party transfers and optimized transfer parameters.
- Add tgcp to your path (it is not in there by default)
  - tg-login1> soft add +tgcp
- Use tgcp to transfer the file
  - tg-login1> /usr/bin/time -f %e tgcp test.file tg-gridftp.tacc.teragrid.org:/home/userid/test.file-Ex4
  - globus-url-copy -p 4 -tcp-bs 2000000 gsiftp://tg-gridftp.ncsa.teragrid.org:2812/home/ac/gardnerj/test.file gsiftp://tg-gridftp.tacc.teragrid.org:2812/home/gardnerj/test.file
  - 4.06 (?!!)
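The output above shows tgcp expanding its scp-like arguments into gsiftp URLs on port 2812. The sketch below imitates that one step for a `host:/path` argument. It is not tgcp's actual implementation: `to_gsiftp_url` is a hypothetical name, and the 2812 default is taken only from the output shown in this exercise.

```shell
#!/bin/sh
# to_gsiftp_url HOST:/ABSOLUTE/PATH [PORT]
# Converts an scp-style location into a gsiftp URL.
# The default port 2812 mirrors the exercise output; it is an assumption
# that the same port applies elsewhere.
to_gsiftp_url() {
    host=${1%%:*}    # text before the first colon
    path=${1#*:}     # text after the first colon
    echo "gsiftp://$host:${2:-2812}$path"
}

to_gsiftp_url tg-gridftp.tacc.teragrid.org:/home/userid/test.file-Ex4
```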
23. Hands-on Exercise 5 pg 1: UberFTP between login nodes
- Copy a 9 MByte file from your NCSA home directory to TACC. Use optimized transfer parameters. Interactive session.
- Start uberftp and set transfer parameters
  - tg-login1> uberftp
  - uberftp> parallel 4
  - uberftp> tcpbuf 4000000
  - TCP buffer set to 4000000 bytes
- Open connection to TACC
  - uberftp> open tg-login.tacc.teragrid.org
  - BANNER
  - 220 UNIX Archive FTP server ready.
  - 230 User xxx logged in.
24. Hands-on Exercise 5 pg 2: UberFTP between login nodes
- Copy the file
  - uberftp> put test.file test.file-Ex5
  - 150 Opening BINARY connection(s) for test.file-Ex5.
  - 226 Transfer complete.
  - Transfer rate: 9621728 bytes in 0.51 seconds. 19017.90 KB/sec
- Get a listing of the TACC home directory
  - uberftp> ls
  - -rw---- user group 9621728 date test.file-Ex1
  - -rw---- user group 9621728 date test.file-Ex2
  - -rw---- user group 9621728 date test.file-Ex3
  - . . .
- Exit UberFTP
  - uberftp> quit
25. Hands-on Exercise 6 pg 1: UberFTP between GridFTP servers
- Copy a 9 MByte file from your NCSA home directory to TACC using third-party transfers. Use optimized transfer parameters. Interactive session.
- Start uberftp and set transfer parameters
  - tg-login1> uberftp
  - uberftp> parallel 4
  - uberftp> tcpbuf 4000000
  - TCP buffer set to 4000000 bytes
26. Hands-on Exercise 6 pg 2: UberFTP between GridFTP servers
- Open local connection to NCSA dedicated GridFTP server
  - uberftp> lopen tg-gridftp.ncsa.teragrid.org
  - 220 tg-gridftp4.ncsa...blah..blah ready.
  - 230 User xxx logged in.
- Open remote connection to TACC dedicated GridFTP server
  - uberftp> open tg-gridftp.tacc.teragrid.org
  - 220 lonestar GridFTP...blah..blah ready.
  - 230 User xxx logged in.
27. Hands-on Exercise 6 pg 3: UberFTP between GridFTP servers
- Copy the file
  - uberftp> put test.file test.file-ex6
  - src> 150 Opening BINARY mode data connection(s).
  - dst> 150 Opening BINARY mode data connection(s).
  - src> 226 Transfer complete.
  - dst> 226 Transfer complete.
- Exit UberFTP
  - uberftp> quit
28. Useful UberFTP commands
- Unix-like commands
  - ls, cd, mkdir, rmdir, pwd, rm
  - Put "l" in front for local versions of commands
  - lls, lcd, lmkdir, lrmdir, lpwd, lrm
- put
  - transfer from local host to remote host
- get
  - transfer from remote host to local host
- mput, mget
  - transfer multiple files between hosts
- help
29. Tweaking Optimization Parameters
- globus-url-copy
  - -tcp-bs <size> | -tcp-buffer-size <size>
    - specify the size (in bytes) of the buffer to be used by the underlying ftp data channels
    - Low network traffic: 8000000
    - High network traffic: 4000000
  - -p <parallelism> | -parallel <parallelism>
    - specify the number of streams to be used in the ftp transfer
    - Low network traffic: 1
    - High network traffic: 2 - 4
30. Tweaking Optimization Parameters
- uberftp
  - tcpbuf <size>
    - specify the size (in bytes) of the buffer to be used by the underlying ftp data channels
    - Low network traffic: 8000000
    - High network traffic: 4000000
  - parallel <parallelism>
    - specify the number of streams to be used in the ftp transfer
    - Low network traffic: 1
    - High network traffic: 2 - 4
31. Using Robotic-Tape Archival Resources
- NCSA Mass Storage System (MSS)
  - Accessible using GridFTP to mss.ncsa.teragrid.org
- TACC SGI Data Migration Facility (DMF)
  - Accessible by simply placing files in ARCHIVE directory
- SDSC HPSS archival storage system
  - Use HSI from SDSC cluster only
- PSC Golem
  - Accessible using GridFTP to tg-gridftp.psc.teragrid.org
32. Using Robotic-Tape Archival Resources
- Files on these machines are transferred to their local disks, but may be automatically migrated to tape if necessary.
- If you access a file that has been migrated to tape, it will be retrieved automatically, but expect some delay (up to a few minutes)
- Storage capacity is essentially infinite!
33. Hands-on Exercise 7 pg 1
- Copy several 9 MByte files from your home directory at TACC to the NCSA Mass Storage System. Use 3rd party transfer at TACC.
- GSISSH from NCSA to TACC
  - tg-login> gsissh tg-login.tacc.teragrid.org
- Start uberftp session
  - lonestar> uberftp
- Establish local connection to TACC dedicated GridFTP server
  - uberftp> lopen tg-gridftp.tacc.teragrid.org
  - 220 lonestar GridFTP..blah..blah..ready.
  - 230 User xxx logged in.
- Establish remote connection to NCSA Mass Storage System
  - uberftp> open mss.ncsa.teragrid.org
  - Lots of Stuff
  - 230 User xxx logged in.
34. Hands-on Exercise 7 pg 2
- Put multiple files to NCSA MSS
  - uberftp> mput test.file*
  - src> 150 Opening BINARY mode data connection for test file...
  - dst> 150 Opening BINARY mode data connection for test file...
  - src> 226 Transfer complete.
  - dst> 226 Transfer complete.
  - . . .
35. Hands-on Exercise 7 pg 3
- Get a listing of the Mass Storage System directory
  - uberftp> ls
  - -rw---- user group DK common 9621728 date test.file-Ex1
  - -rw---- user group DK common 9621728 date test.file-Ex2
  - -rw---- user group DK common 9621728 date test.file-Ex3
  - . . .
- Quit uberftp
  - uberftp> quit
36. Using PSC Golem
- tg-gridftp.psc.teragrid.org maps directly onto Golem's filesystem.
- Example
  - tg-login1> globus-url-copy -tcp-bs 4000000 -p 4 gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file gsiftp://tg-gridftp.psc.teragrid.org/~/test.file
37. Using TACC DMF
- Simply copy files to ARCHIVE directory
  - Files in this directory are automatically migrated to tape if necessary.
  - If you access a file that has been migrated to tape, it will be retrieved automatically, but expect some delay (up to a few minutes)
- /archive/teragrid/username is visible from the login nodes, but not the TACC dedicated GridFTP servers.
38. Hands-on Wrapup
- Log out of TACC gsissh session
  - lonestar> exit
- Destroy your proxy
  - tg-login> grid-proxy-destroy
- Log out of NCSA ssh session
  - tg-login> exit
39. Data Transfer Summary
- GridFTP clients tgcp, globus-url-copy and uberftp can be used to perform transfers between many TeraGrid online filesystems and mass storage systems accessible via GridFTP servers.
- Users are responsible for managing data transfers, including job-related data movement, which can be incorporated into job scripts.
- Choose servers, filesystems, and transfer parameters wisely to optimize performance.
- Efforts to improve rates and usability are ongoing.
40. Useful URLs for help
- TeraGrid user information overview
  - http://www.teragrid.org/userinfo/index.html
- Summary of TG Resources
  - http://www.teragrid.org/userinfo/guide_hardware_table.html
- Summary of machines with links to site-specific user guides (just click on the name of each site)
  - http://www.teragrid.org/userinfo/guide_hardware_specs.html
- Data Transfer guide
  - http://www.teragrid.org/userinfo/guide_data_transfer.html
- Archival Storage guide
  - http://www.teragrid.org/userinfo/guide_data_storage.html#archival