TeraGrid Data Transfer

1
TeraGrid Data Transfer
  • Jeffrey P. Gardner
  • Pittsburgh Supercomputing Center
  • gardnerj@psc.edu

2
Outline
  • GSISSH
  • Use passwordless login between TeraGrid machines
  • Hands-on Exercises
  • TeraGrid File Management
  • Data Transfer Performance
  • GridFTP
  • Terminology
  • TeraGrid Deployment
  • Hands-on Exercises
  • Use of GridFTP clients and servers to transfer
    files

3
Hands-on Preparation
  • Prepare for exercises by logging into NCSA and
    getting a valid proxy certificate.
  • Login to tg-login.ncsa.teragrid.org
  • ssh userid@tg-login.ncsa.teragrid.org
  • Enter your password
  • xxxxxx
  • Get a valid proxy certificate
  • tg-login1> grid-proxy-init
  • Enter GRID pass phrase for this identity:
    yyyyyy
  • Creating proxy . . . . . . . . . . . Done
  • Your proxy is valid until: Tue Jun 21 08:06:03
    2005
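
Before starting a long transfer it can help to confirm that the proxy will outlive it. A minimal sketch, assuming the Globus `grid-proxy-info -timeleft` option (which prints the remaining lifetime in seconds) is available; the helper name `proxy_covers` and the 2-hour threshold are hypothetical choices for illustration:

```shell
# Succeeds if <seconds_left> covers at least <hours_needed> hours.
proxy_covers() {
    [ "$1" -ge $(( $2 * 3600 )) ]
}

# Only query the real proxy when the Globus tools are installed.
if command -v grid-proxy-info >/dev/null 2>&1; then
    left=$(grid-proxy-info -timeleft 2>/dev/null) || left=0
    # 2 hours is an assumed threshold, not a TeraGrid recommendation.
    proxy_covers "${left:-0}" 2 || echo "proxy expires in under 2 hours; rerun grid-proxy-init"
fi
```

If the check fails, running grid-proxy-init again creates a fresh proxy.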

4
GSISSH: SSH using TG Certificates
  • Now login to TACC using GSISSH
  • tg-login> gsissh tg-login.tacc.teragrid.org
  • TA DA!
  • See that your NCSA certificate DN and user
    account name have been entered into TACC's
    grid-mapfile
  • > grep -i userid /etc/grid-security/grid-mapfile
    "/C=US/O=National Center for Supercomputing
    Applications/CN=Jeff Gardner" gardnerj
  • Logout of TACC
  • > exit

5
TeraGrid File Placement
  • No common cross-site filesystems (currently)
  • This will change very shortly!
  • NCSA, SDSC, TACC, ANL will install GPFS (Global
    Parallel File System)
  • User controls where their data resides
  • Appropriate site(s)
  • Appropriate storage
  • Online Filesystem(s)
  • Speed, visibility, quotas, backup policy
  • Each filesystem directly accessible from single
    site
  • Mass Storage Systems
  • Long-term storage, slower access

6
TeraGrid File Movement
  • File movement responsibility of user
  • Between Online Filesystems
  • Intra-site
  • Cross-site
  • Between Mass Storage and Online Filesystems
  • Intra-site
  • Cross-site
  • Session focuses on these types of transfers

7
TeraGrid Transfer Environment
  • TeraGrid backbone bandwidth means Wide Area
    Network is rarely a bottleneck
  • SDSC <-> Caltech <-> NCSA <-> PSC: 40 Gb/sec
  • NCSA <-> TACC: 10 Gb/sec
  • GSI authentication and proxy certificates provide
    automagic security for transfers
  • just do grid-proxy-init and you're in
  • Transfer requests can be integrated into job
    execution scripts
  • Moving input data to site(s) of job execution
  • Moving results to another filesystem, site, or
    archive
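
As a rough sketch of how stage-in and stage-out steps might wrap a job, the script below prints (or, with DRY_RUN=0, runs) three commands. The executable `./my_simulation`, the file names `input.dat` and `results.dat`, and the `DRY_RUN` switch are all hypothetical; the host names are the dedicated NCSA and TACC GridFTP servers used in the exercises.

```shell
#!/bin/sh
# Sketch of a job script with stage-in and stage-out steps.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical input and result locations ("/~/" means the remote home directory).
SRC='gsiftp://tg-gridftp.ncsa.teragrid.org/~/input.dat'
DST='gsiftp://tg-gridftp.tacc.teragrid.org/~/results.dat'

job() {
    run globus-url-copy "$SRC" "file://$PWD/input.dat"      # stage input in
    run ./my_simulation input.dat results.dat               # hypothetical executable
    run globus-url-copy "file://$PWD/results.dat" "$DST"    # stage results out
}

job
```

A valid proxy certificate must exist on the node where the script runs for the real transfers to succeed.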

8
Data Transfer Performance
  • What impacts transfer rates?
  • Disk and filesystem speed
  • Connectivity of filesystem to node
  • Node characteristics and load
  • Connectivity of node to WAN
  • For all networks
  • Bandwidth
  • Latency
  • Buffer Size
  • Protocol
  • Load
  • Encryption
  • Don't expect 40 Gb/sec!
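
Bandwidth, latency and buffer size interact through the bandwidth-delay product: a single TCP stream can only fill a path if its buffer holds at least bandwidth times round-trip time worth of data. The sketch below computes that product; the helper name `bdp_bytes` and the 60 ms round-trip time are assumptions for illustration, not measured TeraGrid values.

```shell
# Bandwidth-delay product: bytes "in flight" needed to keep a path full.
# Usage: bdp_bytes <bandwidth in Mbit/s> <round-trip time in ms>
bdp_bytes() {
    # Mbit/s * 10^6 / 8 gives bytes per second; then scale by RTT (ms / 1000).
    echo $(( $1 * 1000000 / 8 * $2 / 1000 ))
}

bdp_bytes 1000 60   # 1 Gb/s node link, assumed 60 ms round trip
```

With these inputs the result is a few megabytes, far above typical default TCP buffer sizes, which is why buffer size is a tuning parameter for wide-area transfers.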

[Figure: nodes with 1 Gb/s links to a switch; 30 Gb/s from each switch to the WAN (TG Backbone), 40 Gb/s]

9
Performance Choices Matter
  • Transfer large files for best performance
  • Use fast filesystems, dedicated transfer nodes,
    optimized transfer parameters
  • Transfer 1 GByte file from NCSA to SDSC
    (10/6/2004)

10
GridFTP Terminology - Protocol
  • GridFTP is a high-performance, secure, reliable
    data transfer protocol optimized for
    high-bandwidth, wide-area networks. GridFTP is
    based on FTP, the highly popular Internet file
    transfer protocol.
  • - Quoted from Globus Alliance website

11
Terminology - Client
  • GridFTP client programs issue requests that
    adhere to the GridFTP protocol
  • Users run GridFTP client programs to transfer
    files
  • There is no client program named gridFTP, which
    can be confusing because users are told to "use
    gridFTP to transfer your files"
  • tgcp, globus-url-copy and uberftp are three
    GridFTP client programs that are part of the
    Common TeraGrid Software Stack (CTSS)

12
Terminology - 3rd Party Transfer
  • A GridFTP transfer between two GridFTP servers,
    rather than between a server and a client, is
    called a third-party transfer
  • A third-party transfer occurs when the GridFTP
    client initiating the transfer is run on a
    system that is neither the source nor the
    destination of the transfer operation
  • Allows use of dedicated transfer nodes
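
One way to tell the two cases apart is to look at the URLs handed to the client: if both are gsiftp:// URLs, two GridFTP servers move the data between themselves. The helper `transfer_kind` below is a hypothetical illustration of that rule of thumb, not part of any TeraGrid tool.

```shell
# Classify a transfer by its source and destination URLs.
# gsiftp:// on both sides: two GridFTP servers exchange the data (third-party).
# Anything else (e.g. a file: URL): the client host is one endpoint.
transfer_kind() {
    case "$1|$2" in
        gsiftp://*"|"gsiftp://*) echo "third-party" ;;
        *)                        echo "client-server" ;;
    esac
}

transfer_kind gsiftp://tg-gridftp.ncsa.teragrid.org/a gsiftp://tg-gridftp.tacc.teragrid.org/b
transfer_kind file:/tmp/a gsiftp://tg-login.tacc.teragrid.org/b
```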

13
Terminology - Server
  • A GridFTP server process understands requests
    that adhere to the GridFTP protocol, and performs
    authentication and data transfer operations based
    on those requests
  • TeraGrid GridFTP servers usually run on
  • Login nodes
  • tg-login.<site>.teragrid.org
  • Dedicated GridFTP nodes
  • tg-gridftp.<site>.teragrid.org
  • Some mass storage front-ends are GridFTP servers
  • mss.ncsa.teragrid.org

14
TG GridFTP Server Deployment
  • tg-login.<site>.teragrid.org is a login node and
    also runs a GridFTP server
  • Shared resource: many tasks
  • tg-gridftp.<site>.teragrid.org is a dedicated
    GridFTP server
  • Dedicated file transfer resource
  • Usually better connectivity

15
TG GridFTP Client Deployment
  • uberftp
  • interactive GridFTP transfer client
  • configurable TCP buffer size and number of
    parallel streams

16
TG GridFTP Client Deployment
  • globus-url-copy <source_url> <destination_url>
  • command line interface
  • -tcp-bs <size> | -tcp-buffer-size <size>
  • specify the size (in bytes) of the buffer to be
    used by the underlying ftp data channels
  • -p <parallelism> | -parallel <parallelism>
  • specify the number of streams to be used in the
    ftp transfer
  • tgcp gridFTP-server1:file1 gridFTP-server2:file2
  • command line interface
  • friendly scp-like wrapper around
    globus-url-copy
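
To make the "scp-like wrapper" idea concrete, the sketch below turns an scp-style host:path argument into the kind of URL globus-url-copy expects. The helper `to_url` is hypothetical and only illustrates the mapping; it is not how tgcp is actually implemented, and it omits details such as the port number.

```shell
# Turn an scp-style "host:path" argument into a gsiftp:// URL.
# An argument without "host:" is treated as a local file.
to_url() {
    case "$1" in
        *:*)
            host=${1%%:*}            # text before the first colon
            path=${1#*:}             # text after the first colon
            case "$path" in
                /*) ;;               # absolute remote path: use as is
                *)  path="/~/$path" ;;  # relative: resolve against remote home
            esac
            echo "gsiftp://$host$path" ;;
        /*) echo "file://$1" ;;
        *)  echo "file://$PWD/$1" ;;
    esac
}

to_url tg-gridftp.tacc.teragrid.org:/home/userid/test.file
to_url test.file
```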

17
Hands-on
  • Participants will be led through a series of
    exercises using tgcp, globus-url-copy and
    uberftp.
  • Demonstrates transferring files
  • Between TeraGrid sites
  • Between TG machines and archival storage systems

18
Hands-on preparation
  • Login to tg-login.ncsa.teragrid.org if you have
    not already done so
  • Get the test data file
  • wget http://www.psc.edu/gardnerj/test.file

19
Hands-on Exercise 1: GridFTP between login nodes
  • Copy a 9 MByte file from the current directory at
    NCSA to your home directory at TACC. Use the
    login node at TACC as the remote GridFTP server.
    Use default transfer parameters.
  • Use globus-url-copy to transfer the file
  • Type the command on a single line, with no
    carriage return!
  • tg-login1> /usr/bin/time -f %e globus-url-copy
    file:`pwd`/test.file
    gsiftp://tg-login.tacc.teragrid.org/~/test.file.Ex1
  • 3.18

20
Hands-on Exercise 2: GridFTP between GridFTP Servers
  • Copy a 9 MByte file from the current directory at
    NCSA to your home directory at TACC. Use a
    third-party transfer and the GridFTP server nodes
    at both NCSA and TACC.
  • Use globus-url-copy to transfer the file
  • tg-login1> /usr/bin/time -f %E globus-url-copy
    gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file
    gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex2
  • 3.01

21
Hands-on Exercise 3: GridFTP between GridFTP Servers
  • Copy a 9 MByte file from the current directory at
    NCSA to your home directory at TACC. Use a
    third-party transfer and the GridFTP server nodes
    at both NCSA and TACC. Use optimized transfer
    parameters.
  • Use globus-url-copy to transfer the file
  • tg-login1> /usr/bin/time -f %E globus-url-copy
    -tcp-bs 4000000 -p 4
    gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file
    gsiftp://tg-gridftp.tacc.teragrid.org/~/test.file-Ex3
  • 2.54
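
The elapsed times from Exercises 1 to 3 can be converted into effective transfer rates. The sketch below assumes the test file is 9621728 bytes, the size shown in the uberftp listing later in this session; the helper name `rate_mb_s` is hypothetical. The times include connection setup and authentication, so for a file this small the rates understate what the network can sustain.

```shell
# Effective rate in MB/s (10^6 bytes per second) for <bytes> moved in <seconds>.
rate_mb_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1000000 }'
}

SIZE=9621728                 # assumed size of test.file in bytes
for t in 3.18 3.01 2.54; do  # elapsed seconds reported for Exercises 1-3
    echo "$t s -> $(rate_mb_s "$SIZE" "$t") MB/s"
done
```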

22
Hands-on Exercise 4: Using tgcp
  • Copy a 9 MByte file from your home directory at
    NCSA to your home directory at TACC using tgcp.
    tgcp automatically uses third-party transfers and
    optimized transfer parameters.
  • Add tgcp to your path (it is not in there by
    default)
  • tg-login1gt soft add tgcp
  • Use tgcp to transfer the file
  • tg-login1> /usr/bin/time -f %E tgcp test.file
    tg-gridftp.tacc.teragrid.org:/home/userid/test.file-Ex4
  • globus-url-copy -p 4 -tcp-bs 2000000
    gsiftp://tg-gridftp.ncsa.teragrid.org:2812/home/ac/gardnerj/test.file
    gsiftp://tg-gridftp.tacc.teragrid.org:2812/home/gardnerj/test.file
  • 4.06 (?!!)

23
Hands-on Exercise 5, pg 1: UberFTP between login nodes
  • Copy a 9 MByte file from your NCSA home directory
    to TACC. Use optimized transfer parameters.
    Interactive session.
  • Start uberftp and set transfer parameters
  • tg-login1> uberftp
  • uberftp> parallel 4
  • uberftp> tcpbuf 4000000
  • TCP buffer set to 4000000 bytes
  • Open connection to TACC
  • uberftp> open tg-login.tacc.teragrid.org
  • BANNER
  • 220 UNIX Archive FTP server ready.
  • 230 User xxx logged in.

24
Hands-on Exercise 5, pg 2: UberFTP between login nodes
  • Copy the file
  • uberftp> put test.file test.file-Ex5
  • 150 Opening BINARY connection(s) for
    test.file-Ex5.
  • 226 Transfer complete.
  • Transfer rate: 9621728 bytes in 0.51 seconds.
    19017.90 KB/sec
  • Get a listing of the TACC home directory
  • uberftp> ls
  • -rw---- user group 9621728 date test.file-Ex1
  • -rw---- user group 9621728 date test.file-Ex2
  • -rw---- user group 9621728 date test.file-Ex3
  • . . .
  • Exit UberFTP
  • uberftp> quit

25
Hands-on Exercise 6, pg 1: UberFTP between GridFTP servers
  • Copy a 9 MByte file from your NCSA home directory
    to TACC using third-party transfers. Use
    optimized transfer parameters. Interactive
    session.
  • Start uberftp and set transfer parameters
  • tg-login1> uberftp
  • uberftp> parallel 4
  • uberftp> tcpbuf 4000000
  • TCP buffer set to 4000000 bytes

26
Hands-on Exercise 6, pg 2: UberFTP between GridFTP servers
  • Open local connection to NCSA dedicated GridFTP
    server
  • uberftp> lopen tg-gridftp.ncsa.teragrid.org
  • 220 tg-gridftp4.ncsa...blah..blah ready.
  • 230 User xxx logged in.
  • Open remote connection to TACC dedicated
    GridFTP server
  • uberftp> open tg-gridftp.tacc.teragrid.org
  • 220 lonestar GridFTP...blah..blah ready.
  • 230 User xxx logged in.

27
Hands-on Exercise 6, pg 3: UberFTP between GridFTP servers
  • Copy the file
  • uberftp> put test.file test.file-ex6
  • src> 150 Opening BINARY mode data connection(s).
  • dst> 150 Opening BINARY mode data connection(s).
  • src> 226 Transfer complete.
  • dst> 226 Transfer complete.
  • Exit UberFTP
  • uberftp> quit

28
Useful UberFTP commands
  • Unix-like commands
  • ls, cd, mkdir, rmdir, pwd, rm
  • Put "l" in front for local versions of commands
  • lls, lcd, lmkdir, lrmdir, lpwd, lrm
  • put
  • transfer from local host to remote host
  • get
  • transfer from remote host to local host
  • mput, mget
  • transfer multiple files between hosts
  • help

29
Tweaking Optimization Parameters
  • globus-url-copy
  • -tcp-bs <size> | -tcp-buffer-size <size>
  • specify the size (in bytes) of the buffer to be
    used by the underlying ftp data channels
  • Low network traffic: 8000000
  • High network traffic: 4000000
  • -p <parallelism> | -parallel <parallelism>
  • specify the number of streams to be used in the
    ftp transfer
  • Low network traffic: 1
  • High network traffic: 2 - 4
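
The starting values listed above could be wrapped in a small helper so that scripts pick them by traffic level. This is only a sketch: the function name `tuning_flags` is hypothetical, and choosing 4 streams for the high-traffic case is one pick from the suggested 2 - 4 range.

```shell
# Print globus-url-copy tuning flags for a traffic level ("low" or "high").
tuning_flags() {
    case "$1" in
        low)  echo "-tcp-bs 8000000 -p 1" ;;
        high) echo "-tcp-bs 4000000 -p 4" ;;   # 4 chosen from the 2 - 4 range
        *)    echo "usage: tuning_flags low|high" >&2; return 1 ;;
    esac
}

echo "globus-url-copy $(tuning_flags high) <source_url> <destination_url>"
```

These are starting points; measuring a few transfers with different values is the only way to find what works for a given pair of sites.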

30
Tweaking Optimization Parameters
  • uberftp
  • tcpbuf <size>
  • specify the size (in bytes) of the buffer to be
    used by the underlying ftp data channels
  • Low network traffic: 8000000
  • High network traffic: 4000000
  • parallel <parallelism>
  • specify the number of streams to be used in the
    ftp transfer
  • Low network traffic: 1
  • High network traffic: 2 - 4

31
Using Robotic-Tape Archival Resources
  • NCSA Mass Storage System (MSS)
  • Accessible using GridFTP to mss.ncsa.teragrid.org
  • TACC SGI Data Migration Facility (DMF)
  • Accessible by simply placing files in ARCHIVE
    directory
  • SDSC HPSS archival storage system
  • Use HSI from SDSC cluster only
  • PSC Golem
  • Accessible using GridFTP to
    tg-gridftp.psc.teragrid.org

32
Using Robotic-Tape Archival Resources
  • Files on these machines are transferred to their
    local disks, but may be automatically migrated to
    tape if necessary.
  • If you access a file that has been migrated to
    tape, it will be retrieved automatically, but
    expect some delay (up to a few minutes)
  • Storage capacity is essentially infinite!

33
Hands-on Exercise 7, pg 1
  • Copy several 9 MByte files from your home
    directory at TACC to the NCSA Mass Storage
    System. Use 3rd party transfer at TACC.
  • GSISSH from NCSA to TACC
  • tg-login> gsissh tg-login.tacc.teragrid.org
  • Start uberftp session
  • lonestar> uberftp
  • Establish local connection to TACC dedicated
    GridFTP server
  • uberftp> lopen tg-gridftp.tacc.teragrid.org
  • 220 lonestar GridFTP..blah..blah..ready.
  • 230 User xxx logged in.
  • Establish remote connection to NCSA Mass Storage
    System
  • uberftp> open mss.ncsa.teragrid.org
  • Lots of Stuff
  • 230 User xxx logged in.

34
Hands-on Exercise 7, pg 2
  • Put multiple files to NCSA MSS
  • uberftp> mput test.file*
  • src> 150 Opening BINARY mode data connection for
    test file...
  • dst> 150 Opening BINARY mode data connection for
    test file...
  • src> 226 Transfer complete.
  • dst> 226 Transfer complete.
  • . . .

35
Hands-on Exercise 7, pg 3
  • Get a listing of the Mass Storage System
    directory
  • uberftp> ls
  • -rw---- user group DK common 9621728 date
    test.file-Ex1
  • -rw---- user group DK common 9621728 date
    test.file-Ex2
  • -rw---- user group DK common 9621728 date
    test.file-Ex3
  • . . .
  • Quit uberftp
  • uberftp> quit

36
Using PSC Golem
  • tg-gridftp.psc.teragrid.org maps directly onto
    Golem's filesystem.
  • Example
  • tg-login1> globus-url-copy -tcp-bs 4000000 -p 4
    gsiftp://tg-gridftp.ncsa.teragrid.org/`pwd`/test.file
    gsiftp://tg-gridftp.psc.teragrid.org/~/test.file

37
Using TACC DMF
  • Simply copy files to ARCHIVE directory
  • Files in this directory are automatically
    migrated to tape if necessary.
  • If you access a file that has been migrated to
    tape, it will be retrieved automatically, but
    expect some delay (up to a few minutes)
  • /archive/teragrid/username is visible from the
    login nodes, but not the TACC dedicated GridFTP
    servers.

38
Hands-on Wrapup
  • Logout of TACC gsissh session
  • lonestar> exit
  • Destroy your proxy
  • tg-login> grid-proxy-destroy
  • Logout of NCSA ssh session
  • tg-login> exit

39
Data Transfer Summary
  • GridFTP clients tgcp, globus-url-copy and uberftp
    can be used to perform transfers between many
    TeraGrid online filesystems and mass storage
    systems accessible via GridFTP servers.
  • Users responsible for managing data transfers,
    including job-related data movement which can be
    incorporated into job scripts.
  • Choose servers, filesystems, and transfer
    parameters wisely to optimize performance.
  • Ongoing efforts to improve rates and usability.

40
Useful URLs for help
  • TeraGrid user information overview
  • http://www.teragrid.org/userinfo/index.html
  • Summary of TG Resources
  • http://www.teragrid.org/userinfo/guide_hardware_table.html
  • Summary of machines with links to site-specific
    user guides (just click on the name of each site)
  • http://www.teragrid.org/userinfo/guide_hardware_specs.html
  • Data Transfer guide
  • http://www.teragrid.org/userinfo/guide_data_transfer.html
  • Archival Storage guide
  • http://www.teragrid.org/userinfo/guide_data_storage.html#archival