SC04 TeraGrid Tutorial: Applications in the TeraGrid Environment

1
SC04 TeraGrid Tutorial: Applications in the
TeraGrid Environment
  • John Towns, NCSA <jtowns_at_ncsa.edu>
  • Nancy Wilkins-Diehr, SDSC <wilkinsn_at_sdsc.edu>
  • Derek Simmel, PSC <dsimmel_at_psc.edu>
  • Eric Roberts, TACC <ericrobe_at_tacc.utexas.edu>
  • Bill Whitson, Purdue <wiw_at_purdue.edu>
  • Leesa Brieger, SDSC <leesa_at_sdsc.edu>
  • Ruth Aydt, NCSA <aydt_at_ncsa.edu>
  • Mike Papka, ANL <papka_at_mcs.anl.gov>
  • Kelly Gaither, TACC <kelly_at_tacc.utexas.edu>
  • John Cobb, ORNL <cobbjw_at_ornl.gov>
  • Amit Majumdar, SDSC <majumdar_at_sdsc.edu>
  • and many others participating in the TeraGrid
    Project

2
Tutorial Outline - Morning
  • Introduction to TeraGrid Resources and Services
  • John Towns, Nancy Wilkins-Diehr (Slide 4, 60 mins)
  • User Certificates on the TeraGrid
  • Derek Simmel (Slide 44, 30 mins)
  • TeraGrid User Portal
  • Eric Roberts (Slide 77, 30 mins)
  • BREAK
  • Data Collections and Databases
  • Bill Whitson, Leesa Brieger (Slide 104, 60 mins)
  • Data Management
  • Ruth Aydt, Leesa Brieger (Slide 145, 60 mins)

3
Tutorial Outline - Afternoon
  • LUNCH
  • Launching jobs: single site, Globus, Condor, and
    DAGMan for pipelined applications
  • Leesa Brieger (Slide 194, 75 mins)
  • Visualization and MPICH-G2
  • Mike Papka, Kelly Gaither (Slide 260, 45 mins)
  • BREAK
  • Instrument Integration
  • John Cobb, Amit Majumdar (Slide 287, 30 mins)
  • Q&A
  • Staff available to address individual application
    needs, allocation requests?

4
Introduction to TeraGrid Resources and Services
(60 min.)
  • Nancy Wilkins-Diehr SDSC / UCSD
  • wilkinsn_at_sdsc.edu
  • John Towns NCSA / Univ of Illinois
  • jtowns_at_ncsa.edu

5
Introduction to the TeraGrid
  • TeraGrid vision
  • Objectives
  • Science Applications
  • Current resources, extensibility
  • Data management/data resources
  • Visualization
  • Networking
  • Grid services
  • Software stack
  • Job execution
  • Allocations and proposals
  • Support, Documentation, Training

6
The TeraGrid Vision: Distributing the resources is
better than putting them at one site
  • Build new, extensible, grid-based infrastructure
    to support grid-enabled scientific applications
  • Expand centers to support cyberinfrastructure
  • Distributed, coordinated operations center
  • Exploit unique partner expertise and resources to
    make whole greater than the sum of its parts
  • Leverage homogeneity to make the distributed
    computing easier and simplify initial development
    and standardization
  • Run single job across entire TeraGrid
  • Move executables between sites

7
TeraGrid Objectives
  • Create unprecedented capability
  • integrated with extant PACI capabilities
  • supporting a new class of scientific research
  • Deploy a balanced, distributed system
  • not a distributed computer but rather
  • a distributed system using Grid technologies
  • computing and data management
  • visualization and scientific application analysis
  • Define an open and extensible infrastructure
  • an enabling cyberinfrastructure for scientific
    research
  • extensible beyond the original sites
  • NCSA, SDSC, ANL, Caltech, PSC (under ETF)
  • ETF2 awards to TACC, Indiana/Purdue, ORNL

8
Measuring Success
  • Breakthrough science via new capabilities
  • integrated capabilities more powerful than
    existing PACI resources
  • current PACI users and new communities requiring
    Grids
  • An extensible Grid
  • design principles assume heterogeneity and more
    than four sites
  • Grid hierarchy, scalable, replicable, and
    interoperable
  • formally documented design, standard protocols
    and specifications
  • encourage, support, and leverage open source
    software
  • A pathway for current users
  • evolutionary paths from current practice
  • provide examples, tools, and training to exploit
    Grid capabilities
  • user support, user support, and user support

9
TeraGrid Application Targets
  • Multiple classes of user support
  • each with differing implementation complexity
  • minimal change from current practice
  • new models, software, and applications
  • Usage exemplars
  • traditional supercomputing made simpler
  • remote access to data archives and computers
  • distributed data archive access and correlation
  • remote rendering and visualization
  • remote sensor and instrument coupling

10
Wide Variety of Usage Scenarios
  • Tightly coupled jobs storing vast amounts of
    data, performing visualization remotely as well
    as making data available through online
    collections (ENZO)
  • Thousands of independent jobs using data from a
    distributed data collection (NVO)
  • Applications employing novel latency-hiding
    algorithms adapting to a changing number of
    processors (PPM)
  • High-throughput applications of loosely coupled
    jobs (MCell)

11
TeraGrid Roaming
Attend TeraGrid training class or access
web-based TG training materials
Receive Account info, pointers to training, POC
for user services Ops, pointers to login
resources, atlas of TG resources
Apply for TeraGrid Account
Develop and optimize code at Caltech
Run large job at NCSA, move data from SRB to
local scratch and store results in SRB
Run large job at SDSC, store data using SRB.
Run larger job using both SDSC and PSC systems
together, move data from SRB to local scratch
storing results in SRB
Move small output set from SRB to ANL cluster, do
visualization experiments, render small sample,
store results in SRB
Move large output data set from SRB to
remote-access storage cache at SDSC, render using
ANL hardware, store results in SRB
(Recompile may be necessary in some cases)
12
Expanding to Science Gateways
Science Gateways
TG backend into GenDB which will make it
possible for about 70 genome sequencing projects
to use TG
13
TeraGrid Extensibility
  • Sites interested in joining the TG need
  • fast network
  • non-trivial resources
  • meet SLA (testing and QA requirements)
  • become a member of the virtual organization
  • capable of TG hosting (peering arrangements)
  • TG Software Environment
  • user (download, configure, install, and run TG
    1.0)
  • developer (join distributed engineering team)
  • TG Virtual Organization
  • Working group participation
  • Operations, User services
  • Add new capability
  • make the whole greater than the sum of its parts

14
Organization
Strategic Relationships
CUAC
Executive Steering Committee: Catlett (Director),
Andrews (SDSC), Boisseau (TACC), Gannon (IU),
Pennington (NCSA), Roskies (PSC)
Project Director (Catlett)
Science Communities
NSF
MRE Projects
Community Grids
NMI
ITR Projects
Globus Alliance
Support, Project Mgmt, Finance, EOT
Core
Rotating Chair TeraGrid Resource Forum
. . .
Area Director
Area Director
TeraGrid Resources
Base CyberInfrastructure
Community Engagement
Area Director
Area Director
  • Software
  • Services
  • Collaboration

User Services
Software Integration
15
TeraGrid Vision: A Unified National HPC
Infrastructure that is Persistent and Reliable
  • Largest NSF compute resources
  • Largest DOE instrument (SNS)
  • Fastest network
  • Massive storage
  • Visualization instruments
  • Science Gateways
  • Community databases

E.g., Geosciences: 4 data collections including
high-res CT scans, global telemetry data,
worldwide hydrology data, and regional LIDAR
terrain data
16
TeraGrid Components
  • Compute hardware
  • Intel/Linux Clusters, Alpha SMP clusters, POWER4
    cluster, POWER3 cluster, SUN visualization system
  • Large-scale storage systems
  • hundreds of terabytes for secondary storage
  • Very high-speed network backbone
  • bandwidth for rich interaction and tight
    coupling
  • Grid middleware
  • Globus, data management,
  • Next-generation applications

17
Resources and Services (40 TF, 1.4 PB disk, 12 PB
tape)
18
TeraGrid Compute Resources
4 Lambdas
CHI
LA
96 GeForce4 Graphics Pipes
100 TB DataWulf
96 Pentium4 64 2p Madison Myrinet
32 Pentium4 52 2p Madison 20 2p Madison Myrinet
20 TB
Caltech
ANL
256 2p Madison 667 2p Madison Myrinet
128 2p Madison 256 2p Madison Myrinet
1.1 TF Power4 Federation
500 TB FCS SAN
230 TB FCS SAN
NCSA
SDSC
PSC
19
Data Resources and Data Management Services
  • Approach
  • Deploy core services
  • Drive the system with data intensive flagship
    applications
  • TG Data Services Plan
  • High-speed cross-site data transfer capability
  • Parallel file systems
  • Mass storage systems
  • GridFTP and SRB-based access to data
  • Hosted data collections
  • Database capabilities
  • 2 dedicated 32-processor IBM p690 nodes
  • DB2, mySQL, Oracle clients
  • HDF5 libraries
  • Bill, Leesa and Ruth will discuss some of these
    this morning

20
The TeraGrid Visualization Strategy
  • Combine existing resources and current
    technology
  • Commodity clustering and commodity graphics
  • Grid technology
  • Access Grid collaborative tools
  • Efforts, expertise, and tools from each of the
    sites
  • Volume Rendering (SDSC)
  • Coupled Visualization (PSC)
  • Volume Rendering (Caltech)
  • VisBench (NCSA)
  • Grid and Visualization Services (ANL)
  • Sun Terascale Visualization Machine (TACC)
  • to enable new and novel ways of visually
    interacting with simulations and data
  • Mike and Kelly will discuss this afternoon

21
Visualization Sample Use Cases
22
The TeraGrid Networking Strategy
  • TeraGrid Backplane
  • Provides sufficient connectivity (bandwidth,
    latency) to support virtual machine room
  • Core backbone 40 Gbps
  • Connectivity to each site 10-30 Gbps
  • Local networking
  • Provide support for all nodes at each site to
    have adequate access to backplane
  • Support for distributed simulations
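To get a feel for what these link speeds mean for cross-site data movement, the following is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), a fully utilized link, and no protocol or disk overhead, so real transfers would be slower. The function name is hypothetical and not part of any TeraGrid tool.

```python
def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link of link_gbps (10**9 bits/s)."""
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / (link_gbps * 1e9)


TB = 1e12  # assumption: decimal terabyte

# Site connectivity (10 and 30 Gbps) and core backbone (40 Gbps) from the slide.
for gbps in (10, 30, 40):
    print(f"1 TB at {gbps} Gbps: about {transfer_seconds(TB, gbps):.0f} s")
```

Under these idealized assumptions, a terabyte moves between sites in minutes, which is what makes a "virtual machine room" plausible.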

23
Grid Services: A Layered Grid Architecture
  • Connectivity: "Talking to things": communication
    (Internet protocols), security
  • Fabric: "Controlling things locally": access to,
    control of, resources
24
TeraGrid Runtime Environment
CREDENTIAL
Single sign-on via grid-id
Assignment of credentials to user proxies
Globus Credential
Mutual user-resource authentication
Site 2
Authenticated interprocess communication
Mapping to local ids
Certificate
25
Common Authentication Service
  • Standardized GSI authentication
  • A user can authenticate for SSH and Grid services
    to every TeraGrid system with a single user
    certificate
  • Developed coordinated CA acceptance policy
  • TeraGrid-accepted Certificate Authorities include
  • NCSA, SDSC, PSC, Purdue, TACC CAs
  • DOEGrids CA, UK E-Science CA
  • Procedures and tools to simplify the use and
    management of certificates
  • Certificate request, retrieval, distribution,
    installation
  • Derek will cover these in more detail later

26
Grid Information Services
  • Currently Leveraging Globus Grid Information
    Service
  • each service/resource is an information source
  • index servers at each of the TG sites
  • full mesh between index servers for fault
    tolerance
  • access control as needed
  • Resource Information Management Strategies
  • TG GIS for systems level information
  • generic non-sensitive information
  • access control on sensitive info such as job
    level information
  • Applications specific GIS services
  • access controls applied as needed
  • user control

27
Grid Scheduling & Job Management: Condor-G, the
User Interface
  • Condor-G is the preferred job management
    interface
  • job scheduling, submission, tracking, etc.
  • allows for complex job relationships and data
    staging issues
  • interfaces to Globus layers transparently
  • allows you to use your workstation as your
    interface to the grid
  • The ability to determine current system loads and
    queue status will come in the form of a web
    interface
  • Talk this morning by Eric Roberts about early
    portal development work

28
Homogeneity Strategies
  • Common Grid Middleware Layer
  • TeraGrid Software Stack
  • Collectively Designed
  • tests and build modules required
  • Multiple Layers Coordinated
  • environment variables
  • pathnames
  • versions for system software, libraries, tools
  • Minimum requirements plus Site Value-Added
  • multiple environments possible
  • special services and tools on top of common
    TeraGrid software stack
  • Community software areas, TG_COMMUNITY
  • Common User Environment with softenv

29
TeraGrid Software Stack
  • CTSS: Common TeraGrid Software Stack
  • A social contract with the user
  • LORA: Learn Once, Run Anywhere
  • Reproducibility
  • standard configure, build, and install
  • single CVS repository for software
  • initial releases for IA-64, IA-32, Power4, Alpha

30
Current TG Software Stack
  • SuSE SLES
  • X-cat
  • MPICH, MPICH-G2, MPICH-VMI
  • gm drivers
  • VMI/CRM
  • Globus
  • Condor-G
  • gsi-ssh
  • GPT
  • SoftEnv
  • MyProxy
  • Intel compilers
  • GNU compilers
  • HDF4/5
  • SRB client
  • Inca
  • db2-client
  • uberftp
  • tg-policy
  • tg_usage
  • myprojects
  • Kx509
  • atlas

31
TG_COMMUNITY
  • Community Software Area (CSA) of TeraGrid is
    available and accepting requests from PIs
  • Intended for the installation of executables and
    libraries that will be used by a community of
    users.
  • Software in this area is maintained by users and
    is not guaranteed to be in any particular level
    of readiness
  • Consistent and easy-to-find location, for example
    $TG_COMMUNITY/my_physics_application, on all
    requested TG platforms
  • softenv keys are available for a community of
    users to customize their environment

32
SoftEnv System
  • Software package management system instituting
    symbolic keys for user environments
  • Replaces traditional UNIX dot files
  • Supports community keys
  • Programmable similar to other dot files
  • Integrated user environment transfer
  • Well suited to software lifecycles
  • Offers unified view of heterogeneous platforms

33
Manipulating the Environment
  • /home/<username>/.soft
  • softenv
  • Displays symbolic software key names
  • soft add <package-name>
  • Temporary addition of package to environment
  • soft delete <package-name>
  • Temporary package removal from environment
  • resoft
  • Modify dotfile and apply to present environment
  • man softenv

34
Complete User Support
  • Allocations
  • Documentation
  • Applications
  • Consulting
  • Training

35
Allocations Policies
  • Any US researcher can request an allocation
  • Policies/procedures posted at
  • http://www.paci.org/Allocations.html
  • Online proposal submission
  • https://pops-submit.paci.org/
  • Different levels of review for different size
    allocation requests: development, medium, and
    national resource allocation committees
  • DAC: up to 30,000 SUs
  • only a one paragraph abstract required
  • accepted continuously!
  • MRAC: <200,000 SUs/year
  • reviewed every 6 months
  • next deadline April 2005
  • NRAC: 200,000 SUs/year or more
  • reviewed every 6 months
  • next deadline January 2005
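The three review levels can be read as a simple threshold rule. The sketch below is only an illustration of that reading: the function name is hypothetical, and treating a request of exactly 200,000 SUs as NRAC is an assumption, since the slide gives MRAC as under 200,000.

```python
def review_committee(requested_sus: int) -> str:
    """Map a yearly SU request to the review level named on the slide."""
    if requested_sus <= 0:
        raise ValueError("request must be a positive number of SUs")
    if requested_sus <= 30_000:
        return "DAC"   # development: one-paragraph abstract, accepted continuously
    if requested_sus < 200_000:
        return "MRAC"  # medium: reviewed every 6 months
    return "NRAC"      # national: reviewed every 6 months (boundary is an assumption)


for request in (10_000, 150_000, 500_000):
    print(request, review_committee(request))
```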

36
Roaming and Specific Allocations
  • R-Type roaming allocations
  • can be used on any TG resource
  • usage debited to a single (global) allocation of
    resource maintained in a central database
  • S-Type specific allocations
  • can only be used on specified resource
  • All S awards come with 30,000 roaming SUs to
    encourage roaming usage of TG
  • usage debited to a single allocation of resource
    maintained in a central database
  • R- and S-Type allocation all come from single
    pool of TG resources

37
Accounts and Account Management
  • TG accounts created on ALL TG systems for every
    user
  • information regarding accounts on all resources
    delivered
  • working toward single US mail packet arriving for
    user
  • accounts synched through centralized database
  • certificates provide uniform access for users
  • jobs can be submitted to/run on any TG resource
  • NMI Account Management Information Exchange
    (AMIE) used to manage account and transport usage
    records in GGF format

38
Variety of TeraGrid Usage Scenarios
  • Tightly coupled distributed usage
  • plan to use resources at multiple sites in a
    coordinated fashion
  • multi-site MPI applications, coupled simulations,
    etc.
  • Loosely coupled distributed usage
  • can use resources at multiple sites
  • may only use resources at one site for a
    particular run
  • multiple jobs on multiple resources pipelined
    applications, etc.
  • Usage local to a site/resource
  • will only use resources local to a site for any
    particular run
  • may or may not be runnable at more than one site
  • All of these applications types are appropriate
    for TeraGrid
  • Fast networks allow users to consider new
    approaches
  • Will see many examples today!

39
Documentation
  • TeraGrid-wide documentation
  • simple
  • high-level
  • what works on all resources
  • Site-specific documentation
  • full details on unique capabilities of each
    resource
  • http://www.teragrid.org/docs

40
Common Installation of Applications
  • ls $TG_APPS_PREFIX
    ATLAS            globus-2.4.2-2003-07-30-test2  netcdf-3.5.0
    HPSS             goto                           papi
    LAPACK           gx-map                         pbs
    PBSPro_5_2_2_2d  gx-map-0.3                     perfmon
    bin              hdf4                           petsc
    crm              hdf5
  • Support for installation of your community's
    application on all TeraGrid platforms in
    $TG_COMMUNITY

41
24/7 Consulting Support
  • help_at_teragrid.org
  • Advanced ticketing system for cross-site support
  • 866-336-2357
  • http://news.teragrid.org/
  • Extensive experience solving problems for early
    access users

42
Training that Meets User Needs
  • Asynchronous training
  • This tutorial and materials will be available
    online
  • Synchronous training
  • TeraGrid training incorporated into ongoing
    training activities at all sites
  • Training at your site
  • With sufficient participants

43
We'd like to have you as a TeraGrid user!
  • Talk with any of us during the breaks about how
    to apply for time or visit
    http://www.paci.org/Allocations.html

44
User Authentication and TeraGrid Certificates (30
min.)
  • Derek Simmel
  • Pittsburgh Supercomputing Center
  • dsimmel_at_psc.edu

45
User Authentication
  • Goals
  • Single Sign-On
  • Unattended Program Execution
  • Manageable Security
  • Interactive Login to Compute Resources
  • Normal SSH via passwords or DSA/RSA keys
  • Authentication via X.509 User Certificates
  • GSI-SSH and Globus services
  • MyProxy
  • Managing User Credentials

46
Password Authentication
  • Without coordination of authentication between
    sites

47
Single Password Authentication
48
SSH Password Authentication
  • Users use an account password or Kerberos
    password to authenticate login and file transfers
  • The password is a shared secret between the user
    and the remote resource provider

49
Public Key Cryptography
  • A pair of different but complementary keys: one
    private (secret) key and one public key
  • Whatever you encrypt via one of the keys, you can
    only decrypt with the other key
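The "encrypt with one key, decrypt only with the other" property can be seen with a toy RSA key pair. This is a sketch for illustration only: the tiny textbook primes and the helper name `apply_key` are assumptions made here, and nothing this small is secure. Real SSH and X.509 keys use far larger numbers plus padding schemes.

```python
# Toy RSA with tiny textbook primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                  # modulus shared by both keys (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)


def apply_key(value: int, exponent: int) -> int:
    """Apply one half of the key pair to an integer smaller than n."""
    return pow(value, exponent, n)


message = 65
ciphertext = apply_key(message, e)    # encrypt with the public key
recovered = apply_key(ciphertext, d)  # the private key undoes it
signature = apply_key(message, d)     # transform with the private key instead
verified = apply_key(signature, e)    # anyone holding the public key can check it

print(recovered == message, verified == message)
```

The second direction (private key first, public key to check) is the basis of the digital signatures that certificate authorities rely on.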

50
Public Key Authentication
  • Private, Secret Key (S) is kept securely by the
    user
  • Public Key (P) is distributed to sites to
    authenticate use of the Private Key (S)

51
SSH Public Key Authentication
  • Generate an SSH key-pair via ssh-keygen
  • Place ~/.ssh/id_dsa.pub or ~/.ssh/id_rsa.pub in
    the remote host account's ~/.ssh/authorized_keys
  • SSH prompts the user for the private key
    passphrase when using ssh or scp
  • On the remote host, the ~/.ssh/authorized_keys file
    contains public keys for those allowed to access
    this account
  • SSH Key Passphrase is not transmitted over the
    wire

52
SSH Public Key Authentication
  • Generate an SSH key with ssh-keygen

dsimmel$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/Users/dsimmel/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/dsimmel/.ssh/id_dsa.
Your public key has been saved in /Users/dsimmel/.ssh/id_dsa.pub.
The key fingerprint is:
2f:bc:aa:c9:8e:3e:7f:ee:8e:65:aa:9d:50:56:1d:c6 dsimmel@Derek-Simmels-Computer.local
dsimmel$
53
SSH Public Key Authentication
  • Copy the public key to the remote host
  • Some TeraGrid sites do not permit password-based
    SSH authentication according to local policy
  • ANL, Caltech
  • Users must contact these sites, verify their
    identity, and transmit a copy of their SSH public
    key to them so that it can be installed on their
    behalf

dsimmel$ scp .ssh/id_dsa.pub dsimmel@lemieux.psc.edu:id_dsa.pub
dsimmel@lemieux.psc.edu's password:
id_dsa.pub                100%  626   562.9KB/s   00:00
dsimmel$
54
SSH Public Key Authentication
  • Put the public key into the authorized_keys file
    on the remote host (policy permitting)

dsimmel$ ssh dsimmel@lemieux.psc.edu
dsimmel@lemieux.psc.edu's password:
bash-2.04$ ls -l id_dsa.pub
-rw-------  1 dsimmel  staff  626 Oct 11 14:09 id_dsa.pub
bash-2.04$ cat id_dsa.pub >> .ssh/authorized_keys
bash-2.04$ ls -l .ssh/authorized_keys
-rw-r--r--  1 dsimmel  staff  626 Oct 11 14:47 .ssh/authorized_keys
bash-2.04$ exit
dsimmel$
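The append step in this session can also be scripted so that running it twice does not duplicate the key. The following is a minimal sketch, not a TeraGrid-provided tool: the function name is hypothetical, and the 0700/0600 permissions are a deliberately stricter choice than the listing in the session shows.

```python
import os
from pathlib import Path


def install_public_key(pubkey_line: str, ssh_dir: Path) -> bool:
    """Append pubkey_line to ssh_dir/authorized_keys unless already present.

    Returns True if the key was added, False if it was already there.
    """
    key = pubkey_line.strip()
    if not key:
        raise ValueError("empty public key")
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    with auth_file.open("a") as handle:
        if text and not text.endswith("\n"):
            handle.write("\n")  # keep one key per line
        handle.write(key + "\n")
    os.chmod(auth_file, 0o600)  # readable and writable by the owner only
    return True
```

On the remote host this might be called as `install_public_key(Path("id_dsa.pub").read_text(), Path.home() / ".ssh")`, subject to the site's policy on who may install keys.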
55
SSH Public Key Authentication
  • Login using your SSH key

dsimmel$ ssh dsimmel@lemieux.psc.edu
Enter passphrase for key '/Users/dsimmel/.ssh/id_dsa':
Compaq Tru64 UNIX V5.1A (Rev. 1885) Compaq AlphaServer SC TS2.5
This system is for the use of authorized users only.
Unauthorized use may be monitored and recorded.
In the course of such monitoring or through system maintenance,
the activities of authorized users may be monitored.
By using this system you expressly consent to such monitoring.
If there are any problems, please contact remarks@psc.edu.
bash-2.04$
56
SSH Public Key Single Sign-On
  • SSH-agent is a program that sets up a small
    service on your local machine to temporarily hold
    your private SSH keys
  • You add your keys with ssh-add
  • With your keys loaded, ssh and scp commands
    require no passphrase entry
  • Use ssh-agent -k to remove the service when done

57
SSH Public Key Single Sign-On
dsimmel$ eval `ssh-agent`
Agent pid 544
dsimmel$ env | grep SSH
SSH_AGENT_PID=544
SSH_AUTH_SOCK=/tmp/ssh-6SvMtTGu/agent.543
dsimmel$ ssh-add
Enter passphrase for /Users/dsimmel/.ssh/id_dsa:
Identity added: /Users/dsimmel/.ssh/id_dsa (/Users/dsimmel/.ssh/id_dsa)
dsimmel$ ssh dsimmel@lemieux.psc.edu
Compaq Tru64 UNIX V5.1A (Rev. 1885) Compaq AlphaServer SC TS2.5
This system is for the use of authorized users only.
Unauthorized use may be monitored and recorded.
In the course of such monitoring or through system maintenance,
the activities of authorized users may be monitored.
By using this system you expressly consent to such monitoring.
If there are any problems, please contact remarks@psc.edu.
bash-2.04$
58
SSH Public Key Single Sign-On
bash-2.04$ exit
logout
Connection to lemieux.psc.edu closed.
dsimmel$ eval `ssh-agent -k`
Agent pid 544 killed
dsimmel$
  • The output of the ssh-agent command is a short
    set of commands to set (or with -k, remove) the
    SSH agent environment variables
  • Using eval `ssh-agent -k` is a shortcut to
    execute those environment-setting commands
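To make concrete what eval is executing, the sketch below pulls the variable assignments out of Bourne-style ssh-agent output. The sample text is an assumption modeled on the values in the session; the exact output differs between OpenSSH versions and shells (csh-style output uses setenv instead), and the function name is hypothetical.

```python
import re

# Assumed sample of what `ssh-agent` prints for Bourne-style shells.
SAMPLE_OUTPUT = """\
SSH_AUTH_SOCK=/tmp/ssh-6SvMtTGu/agent.543; export SSH_AUTH_SOCK;
SSH_AGENT_PID=544; export SSH_AGENT_PID;
echo Agent pid 544;
"""


def parse_agent_output(text: str) -> dict:
    """Extract NAME=value assignments from Bourne-style ssh-agent output."""
    pattern = re.compile(r"^(SSH_[A-Z_]+)=([^;]*);", re.MULTILINE)
    return dict(pattern.findall(text))


agent_env = parse_agent_output(SAMPLE_OUTPUT)
for name, value in agent_env.items():
    print(f"{name} -> {value}")
```

These two variables are how ssh, scp, and ssh-add later locate the running agent.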

59
Limitations of SSH
  • SSH was designed as a secure replacement for
    Telnet and r-commands (rlogin, rsh, rcp)
  • The ssh server provides a specific set of
    services for secure remote command execution and
    file transfer
  • Limitations of SSH Public Key method
  • Users have to prove who they are to every
    resource
  • With certificates, a resource provider need only
    trust the CA
  • Difficult to know if users' keys are still valid
  • Every user needs to be polled, since there is no
    CRL

60
X.509 Certificates
  • X.509 certificates are widely accepted and
    flexible for current and future needs
  • A variety of authentication and authorization
    methods support X.509 certificates, including web
    services
  • X.509 standard permits additional data fields in
    the certificate for specialized purposes
  • With GSI-SSH, X.509 user certificates can be used
    to authenticate to SSH servers too

61
Certificates and Public Key Infrastructure
Registration Authority
Certificate Authority
A
CA
RA
User Z
62
Certificate Authority (CA)
  • A trusted 3rd party for issuing credentials
  • Validates user identity
  • role of the Registration Authority (RA)
  • Issues digitally-signed certificates
  • To affirm the identity of users, hosts and
    services to those who trust the CA
  • Maintains a current certificate revocation list
    (CRL) for certificates revoked prior to their
    expiration date/time.

63
TeraGrid-Accepted CAs
  • Certificates issued by the following Certificate
    Authorities are accepted for use on the TeraGrid
  • NCSA
  • SDSC
  • PSC, PSC KCA
  • Purdue
  • TACC
  • DOEGrids
  • UK E-Science CA

64
X.509 User Certificates
  • De facto standard for Grid authentication
  • Public Key approach
  • User keeps the secret key private and secure
  • CA digitally signs the public key along with
    additional info to produce a user certificate
  • TeraGrid supports X.509 certificates for
  • GSI-SSH authentication
  • Globus grid services authentication
  • E.g., globus-job-submit, globus-url-copy, uberftp

65
Where to get a User Certificate
  • Users may request a new user certificate for use
    on the TeraGrid from one of the following
    Certificate Authorities
  • NCSA
  • SDSC
  • Instructions for acquiring and installing user
    certificates for use on TeraGrid are provided at
  • http//www.teragrid.org/userinfo/guide_access_auth
    _setup.html

66
The ~/.globus directory
  • The default location where a user's private key
    and certificate are installed
  • The directory in which Globus creates temporary
    subdirectories and files to handle grid job
    submission and file transfer

ls -la ~/.globus
total 24
drwxr-xr-x   3 train00 train00 4096 Nov 17 13:45 .
drwx------  33 train00 train00 4096 Oct 17 20:17 ..
-r--r--r--   1 train00 train00 2703 Nov 17 13:55 usercert.pem
-r--r--r--   1 train00 train00 1420 Nov 17 13:50 usercert_request.pem
-r--------   1 train00 train00  963 Nov 17 13:50 userkey.pem
67
User Certificate Information
  • grid-cert-info -issuer -subject -startdate -enddate
    /C=US/O=National Center for Supercomputing Applications/CN=Certification Authority
    /C=US/O=National Center for Supercomputing Applications/CN=Training User00
    Jul 11 21:16:05 2003 GMT
    Jul 10 21:16:05 2004 GMT
  • The certificate's subject field is its
    Distinguished Name (DN) - this is needed for
    entry into the grid-mapfile
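The subject and date fields printed by grid-cert-info lend themselves to simple checks. The sketch below assumes the DN is slash-separated with `=` between attribute and value (for example `/C=US/O=.../CN=Training User00`) and that dates look like `Jul 11 21:16:05 2003 GMT`. The function names are hypothetical, and the DN splitter assumes no value itself contains a slash.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"  # assumed layout of the printed dates


def parse_cert_date(text: str) -> datetime:
    """Parse a date such as 'Jul 11 21:16:05 2003 GMT' as an aware UTC datetime."""
    return datetime.strptime(text.strip(), DATE_FORMAT).replace(tzinfo=timezone.utc)


def is_valid_at(start: str, end: str, moment: datetime) -> bool:
    """True if moment falls inside the certificate's validity window."""
    return parse_cert_date(start) <= moment <= parse_cert_date(end)


def split_dn(dn: str) -> list:
    """Split a slash-separated DN into (attribute, value) pairs."""
    parts = [part for part in dn.split("/") if part]
    return [tuple(part.split("=", 1)) for part in parts]


subject = "/C=US/O=National Center for Supercomputing Applications/CN=Training User00"
print(split_dn(subject))
print(is_valid_at("Jul 11 21:16:05 2003 GMT", "Jul 10 21:16:05 2004 GMT",
                  datetime(2003, 11, 17, tzinfo=timezone.utc)))
```

A validity-window check says nothing about revocation; that still requires the CA's certificate revocation list.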

68
Authentication using X.509 certificates
  • Single Sign-On
  • Users authenticate once by generating a proxy
    certificate which is subsequently used whenever
    authentication is required
  • Globus grid-proxy-init
  • KX.509 (kinit, kx509, kxlist -p)
  • Unattended Authentication
  • Users can escrow proxy certificates in a MyProxy
    server for automated use later

69
Globus grid-proxy-init
  • Example: Create a proxy that will last for 24 hours
    tg-login2:/users/kericson$ grid-proxy-init -hours 24
    Your identity: /C=US/O=NPACI/OU=SDSC/CN=Kate Ericson/USERID=kericson
    Enter GRID pass phrase for this identity:
    Creating proxy .................... Done
    Your proxy is valid until: Sat Nov 22 10:38:40 2003
    tg-login2:/users/kericson$ grid-proxy-destroy

70
KX.509/KCA Authentication
  • KX.509/KCA (Univ. of Michigan)
  • PSC employs Kerberos for site-wide user
    authentication and can generate proxy
    certificates for TeraGrid users as needed using
    KX.509/KCA
  • Users authenticate to the Kerberos server, e.g.
  • > kinit myaccount@PSC.EDU
  • Obtain a short term certificate from KCA service
  • > kx509
  • Generate and install a Globus-compatible proxy
  • > kxlist -p
  • This is equivalent to using Globus grid-proxy-init

71
KX.509/KCA Authentication
bash-2.04$ kinit dsimmel@PSC.EDU
dsimmel@PSC.EDU's Password:
bash-2.04$ kx509
bash-2.04$ kxlist -p
Service kx509/certificate
 issuer= /C=US/O=Pittsburgh Supercomputing Center/CN=PSC Kerberos Certification Authority
 subject= /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=dsimmel/UID=dsimmel/emailAddress=dsimmel@PSC.EDU
 serial=0F34
 hash=a31d5407
bash-2.04$ grid-proxy-info
subject  : /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=dsimmel/USERID=dsimmel/Email=dsimmel@PSC.EDU
issuer   : /C=US/O=Pittsburgh Supercomputing Center/CN=PSC Kerberos Certification Authority
identity : /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=dsimmel/USERID=dsimmel/Email=dsimmel@PSC.EDU
type     : end entity credential
strength : 512 bits
path     : /tmp/x509up_u17780
timeleft : 9:59:38
72
Getting into the grid-mapfile
  • The Globus grid-mapfile maps the Distinguished
    Names (DNs) of users' certificates to their
    corresponding local user accounts
  • In future, installing one's certificate DNs on
    all the TeraGrid systems will be simplified via
    the TeraGrid User Portal (coming soon to a
    browser near you)
  • For now, users must manually add their
    certificates to the grid-mapfile at each site
  • At most TeraGrid sites, SSH to a login node and
    execute
  • gx-map -interactive
  • At PSC, access the DN management webpage at
  • https://dirs.psc.edu/teragrid/userpage/
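The DN-to-account mapping can be pictured with a small parser. This is a sketch under assumptions: it takes the conventional grid-mapfile layout of a double-quoted DN followed by one or more comma-separated local account names, the sample entries are made up from names that appear in these slides, and the function name is hypothetical. It is not how gx-map itself works.

```python
import shlex


def parse_grid_mapfile(text: str) -> dict:
    """Map each quoted DN to its list of local account names."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the quotes around the DN
        if len(fields) != 2:
            raise ValueError(f"unexpected grid-mapfile line: {raw!r}")
        dn, accounts = fields
        mapping[dn] = accounts.split(",")
    return mapping


# Hypothetical entries for illustration only.
SAMPLE = '''
# DN                                                        local account
"/C=US/O=National Center for Supercomputing Applications/CN=Training User00" train00
"/C=US/O=NPACI/OU=SDSC/CN=Kate Ericson/USERID=kericson" kericson
'''

for dn, accounts in parse_grid_mapfile(SAMPLE).items():
    print(accounts, "<-", dn)
```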

73
MyProxy
  • TeraGrid operates a MyProxy Server at
    myproxy.teragrid.org
  • MyProxy client tools are used to install proxy
    certificates onto the server for later retrieval
  • Use myproxy-init in place of grid-proxy-init
  • Retrieve a stored proxy using
    myproxy-get-delegation
  • Grid portals commonly use MyProxy to facilitate
    grid authentication tasks
  • MyProxy User Documentation
  • http://grid.ncsa.uiuc.edu/myproxy/userguide.html

74
MyProxy
myproxy-init -a -s myproxy.teragrid.org
Your identity: /C=US/O=National Computational Science Alliance/CN=Jim Basney
Enter GRID pass phrase for this identity:
Creating proxy ........................................... Done
Your proxy is valid until: Fri Sep 13 13:52:56 2002
Enter MyProxy Pass Phrase:
Verifying password - Enter MyProxy Pass Phrase:
A proxy valid for 168 hours (7.0 days) for user jbasney now exists on myproxy.teragrid.org.
myproxy-get-delegation -s myproxy.teragrid.org
Enter MyProxy Pass Phrase:
A proxy has been received for user jbasney in /tmp/x509up_u500
75
Security of User Credentials
  • Identity Theft is rampant
  • World-wide security incident affected TeraGrid
    sites too
  • Passwords and keys for all users had to be
    replaced
  • Continues to require vigilant monitoring
  • Dangers of passwordless keys
  • Yes, they are convenient (proxies are such but
    are short-term)
  • Typically relies completely on filesystem
    protections
  • Implicitly trusts all who can read files where
    the private keys are kept, including system
    administrators and operators
  • Please help us keep the TeraGrid secure for
    everyone!
  • Pick strong passwords and passphrases
  • Keep them secret and private - do NOT share them
  • Future directions - hardware tokens?

76
User Authentication Summary
  • All TeraGrid sites support SSH and GSI-based
    authentication (GSI-SSH, Globus)
  • Some TeraGrid sites require SSH keys
  • ANL, Caltech
  • TeraGrid-approved Certificate Authorities provide
    certificates for use on TeraGrid
  • X.509 user certificates enable single sign-on and
    unattended program execution
  • Globus grid-proxy-init or KX.509 method, MyProxy
  • Please protect your passwords and passphrases!

77
TeraGrid User Portal (30 min.)
  • Eric Roberts - TACC
  • ericrobe_at_tacc.utexas.edu

78
Outline
  • Motivation
  • What is a Portal?
  • Characteristics of a User Portal
  • Current TeraGrid User Portal Capabilities
  • Future Directions
  • Summary
  • Demo

79
Motivation
  • Make joining the TeraGrid easier for users
  • Single place for users to find user information
    and get user support
  • Certain information can be displayed better in a
    web page than in a command shell
  • Allow novice users to start using grid resources
    securely through a Web interface
  • Increase productivity of TeraGrid researchers:
    do more science!

80
What is a Portal?
  • In general, a portal is a gateway to a set of
    distributed services accessible from a Web
    browser
  • Provides
  • Aggregation of different services as a set of Web
    pages
  • Single URL
  • Single Sign-On
  • Personalization
  • Customization

81
Characteristics of a User Portal
  • A User Portal is a type of
  • Portal
  • Science Gateway
  • Includes the following services
  • Documentation Services
  • Notification Services
  • User Support Services
  • Allocations
  • Accounts
  • Training
  • Consulting

82
Characteristics of a User Portal
  • (cont'd)
  • Collaborative Services
  • Calendar
  • Chat
  • Resource sharing
  • Information Services
  • Resource
  • Grid-wide
  • Interactive Services
  • Doesn't replace the command shell but provides a
    simpler, alternative interface

83
Service Aggregation
  • [Diagram] Services aggregated by the User Portal
    over HTTP/SSL/SOAP
  • User Support - Consulting
  • Notification - User News
  • Collaborative - Calendar, Chat
  • Documentation - User Guides
  • Information - Resource, Grid
  • Interactive - Job Submission, File Transfer
  • The client browser reaches the User Portal over
    HTTP/SSL
84
Documentation Services
  • Provide documentation for using the TeraGrid
  • User Guides
  • Help documentation

85
Notification Services
  • Notify users of events within the TeraGrid
  • User news
  • Message of the Day
  • Downtimes
  • Previous
  • Scheduled
  • Un-scheduled

86
Message of the Day
87
User Support Services
  • Allocations
  • Apply for and manage your TeraGrid allocations
  • Accounts
  • Create and manage TeraGrid accounts
  • Training
  • Important information about upcoming training
    classes
  • Consulting
  • Ask questions and get answers from TeraGrid staff

88
User Support Services
89
Collaborative Services
  • Collaborate with other users through the portal
  • Calendar
  • Post and view important events (e.g. instrument
    reservations, resource downtimes)
  • Chat
  • Chat with other users who are logged into the
    portal
  • Resource Sharing
  • Share documents with the TeraGrid community

90
Calendar Service
91
Information Services
  • Resource
  • State information about individual resources
  • Queue, Status, Load, OS Version, Uptime,
    Software, etc.
  • Grid software status
  • Grid
  • Grid-wide network performance
  • Aggregated capability

92
System Information
93
Grid Information
94
Interactive Services
  • Security
  • Hidden from the user as much as possible
  • Remote command execution
  • File Management
  • Upload
  • Download
  • Transfer between resources
  • Job Submission to a single resource
  • Job Submission to a grid meta-scheduler
  • Composite Job Sequencing

95
Manage Proxies
96
Remote Command Execution
97
Job Submission
98
File Management
99
Current TeraGrid User Portal Capabilities
  • User Services
  • Portal Account
  • Information Services
  • System
  • Grid
  • Network
  • Interactive
  • Remote Command Execution
  • Job Submission
  • File Management

100
Future Directions
  • Central gateway for TeraGrid services
  • TeraGrid allocations and account
    creation/management through portal
  • Streamline the process
  • Application portals
  • Science gateways that expose scientific
    applications through interfaces

101
Summary
  • Currently the User Portal offers basic user,
    informational and interactive services
  • Will extend current User Portal feature set in
    the future
  • User Portal should be starting place for new
    users
  • Ease the process of joining the TeraGrid

102
Demonstration
  • Login with MyProxy
  • Demonstration of Services
  • Documentation
  • Notification
  • User Support
  • Collaborative
  • Information
  • Interactive

103
Applications in the TeraGrid Environment: Data
Collections and Databases (60 min.)
  • Bill Whitson - Purdue
  • wiw_at_purdue.edu
  • Leesa Brieger - SDSC
  • leesa_at_sdsc.edu

104
Data Collections
  • Bill Whitson - Purdue
  • wiw_at_purdue.edu

105
Data on the TeraGrid: Helping fulfill the
cyberinfrastructure vision
  • The NSF Blue-Ribbon Advisory Panel on
    Cyberinfrastructure emphasized the importance of
    multidisciplinary, well-curated federated
    collections of scientific data to future research
  • TeraGrid provides
  • Scientific data collections on distributed mass
    storage systems
  • Storage and database services for new collections
  • GridFTP- and SRB-based access to data by
    computational grid via a very high-speed network

106
Data Collections on TeraGrid
  • Indiana
  • CLIOH - Cultural Digital Libraries Indexing Our
    Heritage
  • CLSD - Central Life Sciences Data
  • Flybase and euGenes databases
  • NCSA
  • Astronomy databases
  • Purdue
  • PTO - Purdue Terrestrial Observatory
  • LARS - Laboratory for the Application of Remote
    Sensing

107
Data Collections on TeraGrid (2)
  • SDSC
  • NVO - National Virtual Observatory
  • SCEC - Southern California Earthquake Center
  • NSDL - K-12 Curriculum Web Sites
  • TACC
  • Worldwide hydrological data
  • Telemetry data
  • LIDAR terrain data
  • High-res CT scans of geological and biological
    specimens

108
Storage and Database Services
  • Mass storage
  • NCSA, SDSC, TACC
  • SRB
  • Caltech, NCSA, SDSC
  • Database services
  • IU, SDSC, TACC

109
Adding Data Collections
  • Hosting services being defined
  • datacentral.sdsc.edu
  • Considerations for hosting data include
  • Field of science, impact to domain
  • Size, growth rate, lifetime of collection
  • Methods of access/limitations on access
  • Data collection or database
  • For additional information about TeraGrid data
    collections, see the Resource Overview
  • http://www.teragrid.org/userinfo/guide_hardware.html

110
Databases
  • Leesa Brieger - SDSC
  • leesa_at_sdsc.edu
  • With Jesus Castegnetto and Qiao Xin, SDSC

111
Databases on TeraGrid
  • This session will present the database tools
    hosted at SDSC
  • DB2 on TeraGrid
  • Using the DB2 client
  • DB2 Java programming
  • How to get a DB2 account
  • Useful links

112
DB2 on TeraGrid
  • Database servers
  • ds003.sdsc.edu
  • 128 GB memory, 2 TB SAN, DB2
  • ds005.sdsc.edu
  • 128 GB memory, DB2
  • Database clients
  • DB2 client installed on TeraGrid login nodes

113
DB2 on TeraGrid
  • Before the db2 client can connect to a
    database on a given server, the server and its
    databases must be registered (catalogued) once
    from the host running the client.

114
DB2 on TeraGrid
  • Connect to server
  • Log on to TG login node using SSH
  • ssh username@tg-login.sdsc.teragrid.org
  • Set up the db2 client
  • soft add +db2
  • Or add +db2 to your ~/.soft file and run
  • resoft
  • cd /tutorial/db_example/sample

115
DB2 on TeraGrid
  • Three ways to use the DB2 client
  • Interactive input mode
  • db2
  • db2 => (type SQL commands here)
  • Command mode
  • db2 "SELECT * FROM SCHEMA.TABLE"
  • Batch mode
  • db2 -tvf filename
  • the file filename contains SQL commands

116
Access the Databases
  • List database directories
  • db2 LIST DB DIRECTORY
    Database 1 entry:
     Database alias                       = SC2004
     Database name                        = SC2004
     Node name                            = DS003
     Database release level               = a.00
     Comment                              =
     Directory entry type                 = Remote
     Catalog database partition number    = -1

117
Access the Databases
  • Connect to database
  • db2 CONNECT TO SC2004 USER username
  • Enter current password for username: (type
    your password here)
  • Show tables
  • db2 LIST TABLES
    Table/View                      Schema          Type  Creation time
    ------------------------------- --------------- ----- --------------------------

      0 record(s) selected.

118
Create New Tables
  • Create table
  • db2 "CREATE TABLE CAPITAL(STATE VARCHAR(20)
    NOT NULL, CAPITAL VARCHAR(20) NOT NULL,
    LARGEST_CITY VARCHAR(20) NOT NULL)"
  • Show table info
  • db2 LIST TABLES
    Table/View                      Schema          Type  Creation time
    ------------------------------- --------------- ----- --------------------------
    CAPITAL                         QIAO            T     2004-09-30-10.51.12.382651

119
More Info on Table
  • Find table info
  • db2 DESCRIBE TABLE CAPITAL
    Column                         Type      Type
    name                           schema    name               Length   Scale Nulls
    ------------------------------ --------- ------------------ -------- ----- ------
    STATE                          SYSIBM    VARCHAR                  20     0 No
    CAPITAL                        SYSIBM    VARCHAR                  20     0 No
    LARGEST_CITY                   SYSIBM    VARCHAR                  20     0 No

      3 record(s) selected.

120
Moving Data into Databases
  • Three ways to move data into databases
  • Insert
  • Import
  • Load
  • (Note load has partitioned load options to move
    data in parallel and it needs LOAD privilege. We
    will talk about insert and import here.)

121
Import Data into Databases
  • Import data
  • db2 IMPORT FROM data.txt OF DEL INSERT INTO
    CAPITAL
  • Check table
  • db2 "select * from capital"
    STATE                CAPITAL              LARGEST_CITY
    -------------------- -------------------- --------------------
    Alabama              Montgomery           Birmingham
    Alaska               Juneau               Anchorage
    . . .
    Wisconsin            Madison              Milwaukee

      49 record(s) selected.

122
Import Data Format
  • File data.txt
  • "Alabama", "Montgomery", "Birmingham"
  • "Alaska", "Juneau", "Anchorage"
  • "Arizona", "Phoenix", "Phoenix"
  • "Arkansas", "Little Rock", "Little Rock"
  • "California", "Sacramento", "Los Angeles"
  • "Colorado", "Denver", "Denver"
  • "Connecticut", "Hartford", "Bridgeport"
  • "Delaware", "Dover", "Wilmington"
  • "Florida", "Tallahassee", "Jacksonville"
  • . . .
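
The file above uses DB2's delimited ASCII (DEL) layout: one record per line, fields separated by commas, character strings wrapped in double quotes. As a rough illustration of how such lines could be generated from program data, here is a minimal sketch. The class DelFormat and its methods are hypothetical names, and the sketch assumes the default DEL delimiters (comma and double quote), with an embedded double quote written as two double quotes. Unlike the listing above, it emits no space after each comma.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class DelFormat {
    // Wrap a character field in double quotes, doubling any embedded quote.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the quoted fields with commas to form one DEL record.
    static String toDelLine(String... fields) {
        List<String> quoted = new ArrayList<>();
        for (String f : fields) {
            quoted.add(quote(f));
        }
        return String.join(",", quoted);
    }

    public static void main(String[] args) {
        System.out.println(toDelLine("Alabama", "Montgomery", "Birmingham"));
        System.out.println(toDelLine("Alaska", "Juneau", "Anchorage"));
    }
}
```

Redirecting the output to a file gives something that could be passed to IMPORT ... OF DEL, provided the column order matches the table definition.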

123
Insert Data into Databases
  • Insert data
  • db2 "INSERT INTO CAPITAL VALUES
    ('Wyoming', 'Cheyenne', 'Cheyenne')"
  • Check table
  • db2 "SELECT * FROM CAPITAL WHERE
    STATE='Wyoming'"
    STATE                CAPITAL              LARGEST_CITY
    -------------------- -------------------- --------------------
    Wyoming              Cheyenne             Cheyenne

      1 record(s) selected.

124
Database Queries
  • Find the largest city for the state Iowa
  • db2 "SELECT LARGEST_CITY FROM CAPITAL
    WHERE STATE='Iowa'"
  • Find states whose name looks like South
  • db2 "SELECT * FROM CAPITAL WHERE STATE
    LIKE 'South%'"
  • Display rows in descending order where state
    names include the letter i
  • db2 "SELECT * FROM CAPITAL WHERE STATE
    LIKE '%i%' ORDER BY STATE DESC"
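
In a LIKE pattern, % matches any run of characters (including none) and _ matches exactly one character. The comparison is normally case-sensitive, so a pattern containing a lowercase i does not match a name whose only I is the capital initial. The sketch below imitates that behaviour with a regular expression so the effect can be tried without a database. LikeDemo is a hypothetical class, and it ignores ESCAPE clauses and any database-specific collation.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not how DB2 evaluates LIKE internally.
class LikeDemo {
    // Translate a LIKE pattern to a regex: % -> any run of characters, _ -> one character.
    static Pattern likeToRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // True when the whole value matches the LIKE pattern (case-sensitive).
    static boolean like(String value, String pattern) {
        return likeToRegex(pattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] states = {"South Carolina", "South Dakota", "Iowa", "Illinois"};
        for (String s : states) {
            System.out.println(s + ": South% -> " + like(s, "South%")
                + ", %i% -> " + like(s, "%i%"));
        }
    }
}
```

With these sample names, Iowa is not matched by %i%, while Illinois is, because only the latter contains a lowercase i.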

125
More DB2 Commands
  • Delete all rows from the table
  • db2 DELETE FROM CAPITAL
  • Drop table
  • db2 DROP TABLE CAPITAL

126
Using Batch Mode
  • table.cmd
    -- create table
    create table capital(
      state varchar(20) not null,
      capital varchar(20) not null,
      largest_city varchar(20) not null
    );
    -- import data
    import from all_data.txt of del insert into capital;
    -- check table data
    select * from capital;

127
Using Batch Mode
  • Run the SQL commands in table.cmd
  • make sure earlier examples of table CAPITAL have
    been deleted
  • db2 DROP TABLE CAPITAL
  • db2 -tvf table.cmd

128
Java Programming for DB2
  • Construct a program which handles a whole
    set of DB2 queries and DB management -
    in a Java program.
  • Next example -
  • cd /tutorial/db_example/acronym
  • README.txt in this directory contains detailed
    information.

129
Acronyms Example
  • First do some queries to get an idea of the data
  • Find info about the table
  • db2 DESCRIBE TABLE JESUS.ACRONYMS
    Column                         Type      Type
    name                           schema    name               Length   Scale Nulls
    ------------------------------ --------- ------------------ -------- ----- ------
    ACRONYM                        SYSIBM    VARCHAR                  10     0 No
    DEFINITION                     SYSIBM    VARCHAR                 120     0 No
  • Look at contents of table ACRONYMS
  • db2 "SELECT * FROM JESUS.ACRONYMS"
  • (Note the use of " " since we have * in the
    SQL command.)

130
Java Programming
  • Type 4 Driver
  • No DB2 client required
  • Can be used to create both Java applications
    and applets
  • Communicates directly with the database
  • Type 2 Driver
  • Relies on the DB2 client to connect to the server
  • Cannot be used to create Java applets
  • (We will use type 4 driver here.)
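
One visible difference between the two driver types is the connection URL. With the IBM DB2 JDBC driver, a type 4 URL names the server host and port, while a type 2 URL names a database alias that the local DB2 client has cataloged. A minimal sketch follows: Db2Urls is a hypothetical class, and the host, port, and database values are the ones used by the tutorial's GetAcronym.java example.

```java
// Hypothetical helper for illustration only.
class Db2Urls {
    // Type 4: the driver talks to the server directly, so the URL carries host and port.
    static String type4Url(String host, int port, String database) {
        return "jdbc:db2://" + host + ":" + port + "/" + database;
    }

    // Type 2: the local DB2 client resolves a cataloged database alias.
    static String type2Url(String alias) {
        return "jdbc:db2:" + alias;
    }

    public static void main(String[] args) {
        System.out.println(type4Url("ds003.sdsc.edu", 60035, "SC2004"));
        System.out.println(type2Url("SC2004"));
    }
}
```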

131
Java Programming
  • GetAcronym.java - implements the DB2-style query
  • db2 "SELECT ACRONYM,DEFINITION FROM
    JESUS.ACRONYMS WHERE ACRONYM='EMACS'"
  • which finds the definition for acronym EMACS.
  • GetAcronym.java takes as argument the acronym
    you're searching for.

132
Java Programming Steps
  • Import Java package java.sql.*
  • Load the appropriate JDBC driver
  • Connect to the database
  • Pass SQL statement to the database
  • Get the results from the database
  • Process the results
  • Close the connection

133
Java Programming
  • GetAcronym.java
    /* 1. import the java packages */
    import java.sql.*;
    import java.util.*;

    /* 2. load the type 4 driver */
    public class GetAcronym {
        static {
            try {
                Class.forName("com.ibm.db2.jcc.DB2Driver");
            } catch (ClassNotFoundException e) {
                System.err.println("Unable to load DB2 driver\n" + e.getMessage());
                e.printStackTrace();
                System.exit(1);
            }
        }
  • (cont'd)

134
Java Programming
  • GetAcronym.java (cont'd)
        public static void main(String[] args) {
            /* check argument */
            if (args.length != 1) {
                System.out.println("Missing parameter: An acronym to search for is needed");
                System.exit(1);
            }
            /* get value for acronym, database, user, passwd and query */
            String acronym = args[0].toUpperCase();
            String dbURL = "jdbc:db2://ds003.sdsc.edu:60035/SC2004";
            String dbUser = System.getProperty("user");
            String dbPassword = System.getProperty("password");
            String query = "select definition from jesus.acronyms where acronym=?";
  • (cont'd)

135
Java Programming
  • GetAcronym.java (cont'd)
            try {
                /* 3. connect to database */
                Connection conn = DriverManager.getConnection(dbURL, dbUser, dbPassword);
                /* 4. prepare and execute statement */
                PreparedStatement stmt = conn.prepareStatement(query);
                stmt.setString(1, acronym);
                ResultSet rs = stmt.executeQuery();
                /* 5. get back resultset */
                ArrayList desc = new ArrayList();
                while (rs.next()) {
                    desc.add(rs.getString("definition").trim());
                }
  • (cont'd)

136
Java Programming
  • GetAcronym.java (cont'd)
                /* 6. process and print out the resultset */
                if (desc.size() == 0) {
                    System.out.println("The acronym " + acronym + " is not in our database");
                } else {
                    String out = "Description(s) found for acronym " + acronym + "\n\n";
                    for (int i = 0; i < desc.size(); i++) {
                        out += "" + String.valueOf(i + 1) + " ";
                        out += (String) desc.get(i) + "\n";
                    }
                    System.out.print(out);
                }
  • (cont'd)

137
Java Programming
  • GetAcronym.java (cont'd)
                /* 7. close resultset, statement and connection */
                rs.close();
                stmt.close();
                conn.close();
            } catch (SQLException e) {
                System.err.println("SQL Exception " + e.getMessage());
                e.printStackTrace();
            }
        }
    }

138
Java Programming
  • Using GetAcronym.java
  • Compile the program
  • javac GetAcronym.java
  • Run
  • java -Duser=username -Dpassword=password
    GetAcronym acronym
  • You don't really want to put your password on the
    command line, so use an alternative approach
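
Command-line arguments are typically visible to other users through process listings and may end up in shell history, which is why a password passed that way is exposed. One possible alternative, sketched here, is to prompt for the password inside the Java program. This is not the tutorial's own solution: PasswordPrompt is a hypothetical class. java.io.Console hides the typed characters when a terminal is attached, and the sketch falls back to reading a line from standard input otherwise.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper for illustration only.
class PasswordPrompt {
    // Fallback: read one line from a reader (input is echoed, e.g. when piped).
    static String readLine(BufferedReader in) throws IOException {
        String line = in.readLine();
        return line == null ? "" : line;
    }

    // Prefer the console, which does not echo typed characters; otherwise use stdin.
    static String promptPassword(String prompt) throws IOException {
        Console console = System.console();
        if (console != null) {
            char[] pw = console.readPassword("%s", prompt);
            return pw == null ? "" : new String(pw);
        }
        System.err.print(prompt);
        return readLine(new BufferedReader(new InputStreamReader(System.in)));
    }

    public static void main(String[] args) throws IOException {
        String dbPassword = promptPassword("DB2 password: ");
        // A real program would pass dbPassword to DriverManager.getConnection.
        System.out.println("Read a password of length " + dbPassword.length());
    }
}
```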

139
Java Programming
  • A shell script that prompts for the appropriate
    information and runs the java program
  • ./find_acronym.sh acronym

140
Java Programming
  • GetAcronym2.java - a few more features
    built into this program
  • Find definitions for acronyms containing ASA
  • db2 "SELECT * FROM JESUS.ACRONYMS WHERE
    ACRONYM LIKE '%ASA%'"
  • Display acronyms whose definitions include Array,
    in descending order
  • db2 "SELECT * FROM JESUS.ACRONYMS WHERE
    DEFINITION LIKE '%Array%' ORDER BY ACRONYM DESC"
  • Takes a search argument

141
Java Programming
  • Scripts for running GetAcronym2
  • Examples
  • ./acronym ASA
  • ./acronym_like ASA
  • ./acronym_with_definition As
  • Refer to README.txt for detailed info.

142
Databases on TeraGrid
  • How to get an account on SDSC's DataStar?
  • Hosting temporary DB2 databases
  • http://datacentral.sdsc.edu/allocation_short_term.html
  • Hosting long-term DB2 databases
  • http://datacentral.sdsc.edu/allocations.html

143
Databases on TeraGrid
  • U