Title: SC04 TeraGrid Tutorial: Applications in the TeraGrid Environment
Slide 1: SC04 TeraGrid Tutorial: Applications in the
TeraGrid Environment
- John Towns, NCSA <jtowns_at_ncsa.edu>
- Nancy Wilkins-Diehr, SDSC <wilkinsn_at_sdsc.edu>
- Derek Simmel, PSC <dsimmel_at_psc.edu>
- Eric Roberts, TACC <ericrobe_at_tacc.utexas.edu>
- Bill Whitson, Purdue <wiw_at_purdue.edu>
- Leesa Brieger, SDSC <leesa_at_sdsc.edu>
- Ruth Aydt, NCSA <aydt_at_ncsa.edu>
- Mike Papka, ANL <papka_at_mcs.anl.gov>
- Kelly Gaither, TACC <kelly_at_tacc.utexas.edu>
- John Cobb, ORNL <cobbjw_at_ornl.gov>
- Amit Majumdar, SDSC <majumdar_at_sdsc.edu>
- and many others participating in the TeraGrid
Project
Slide 2: Tutorial Outline - Morning
- Introduction to TeraGrid Resources and Services
- John Towns, Nancy Wilkins-Diehr (Slide 4, 60 mins)
- User Certificates on the TeraGrid
- Derek Simmel (Slide 44, 30 mins)
- TeraGrid User Portal
- Eric Roberts (Slide 77, 30 mins)
- BREAK
- Data Collections and Databases
- Bill Whitson, Leesa Brieger (Slide 104, 60 mins)
- Data Management
- Ruth Aydt, Leesa Brieger (Slide 145, 60 mins)
Slide 3: Tutorial Outline - Afternoon
- LUNCH
- Launching jobs: single site, Globus, Condor, and
DAGMan for pipelined applications
- Leesa Brieger (Slide 194, 75 mins)
- Visualization and MPICH-G2
- Mike Papka, Kelly Gaither (Slide 260, 45 mins)
- BREAK
- Instrument Integration
- John Cobb, Amit Majumdar (Slide 287, 30 mins)
- Q&A
- Staff available to address individual application
needs, allocation requests?
Slide 4: Introduction to TeraGrid Resources and Services
(60 min.)
- Nancy Wilkins-Diehr SDSC / UCSD
- wilkinsn_at_sdsc.edu
- John Towns NCSA / Univ of Illinois
- jtowns_at_ncsa.edu
Slide 5: Introduction to the TeraGrid
- TeraGrid vision
- Objectives
- Science Applications
- Current resources, extensibility
- Data management/data resources
- Visualization
- Networking
- Grid services
- Software stack
- Job execution
- Allocations and proposals
- Support, Documentation, Training
Slide 6: The TeraGrid Vision: Distributing the resources is
better than putting them at one site
- Build new, extensible, grid-based infrastructure
to support grid-enabled scientific applications
- Expand centers to support cyberinfrastructure
- Distributed, coordinated operations center
- Exploit unique partner expertise and resources to
make whole greater than the sum of its parts
- Leverage homogeneity to make the distributed
computing easier and simplify initial development
and standardization
- Run single job across entire TeraGrid
- Move executables between sites
Slide 7: TeraGrid Objectives
- Create unprecedented capability
- integrated with extant PACI capabilities
- supporting a new class of scientific research
- Deploy a balanced, distributed system
- not a distributed computer but rather
- a distributed system using Grid technologies
- computing and data management
- visualization and scientific application analysis
- Define an open and extensible infrastructure
- an enabling cyberinfrastructure for scientific
research
- extensible beyond the original sites
- NCSA, SDSC, ANL, Caltech, PSC (under ETF)
- ETF2 awards to TACC, Indiana/Purdue, ORNL
Slide 8: Measuring Success
- Breakthrough science via new capabilities
- integrated capabilities more powerful than
existing PACI resources
- current PACI users and new communities requiring
Grids
- An extensible Grid
- design principles assume heterogeneity and more
than four sites
- Grid hierarchy, scalable, replicable, and
interoperable
- formally documented design, standard protocols
and specifications
- encourage, support, and leverage open source
software
- A pathway for current users
- evolutionary paths from current practice
- provide examples, tools, and training to exploit
Grid capabilities
- user support, user support, and user support
Slide 9: TeraGrid Application Targets
- Multiple classes of user support
- each with differing implementation complexity
- minimal change from current practice
- new models, software, and applications
- Usage exemplars
- traditional supercomputing made simpler
- remote access to data archives and computers
- distributed data archive access and correlation
- remote rendering and visualization
- remote sensor and instrument coupling
Slide 10: Wide Variety of Usage Scenarios
- Tightly coupled jobs storing vast amounts of
data, performing visualization remotely as well
as making data available through online
collections (ENZO)
- Thousands of independent jobs using data from a
distributed data collection (NVO)
- Applications employing novel latency-hiding
algorithms adapting to a changing number of
processors (PPM)
- High-throughput applications of loosely coupled
jobs (MCell)
Slide 11: TeraGrid Roaming
- Apply for TeraGrid account
- Receive account info, pointers to training, POC
for user services & Ops, pointers to login
resources, atlas of TG resources
- Attend TeraGrid training class or access
web-based TG training materials
- Develop and optimize code at Caltech
- Run large job at NCSA, move data from SRB to
local scratch and store results in SRB
- Run large job at SDSC, store data using SRB
- Run larger job using both SDSC and PSC systems
together, move data from SRB to local scratch
storing results in SRB
- Move small output set from SRB to ANL cluster, do
visualization experiments, render small sample,
store results in SRB
- Move large output data set from SRB to
remote-access storage cache at SDSC, render using
ANL hardware, store results in SRB
- (Recompile may be necessary in some cases)
Slide 12: Expanding to Science Gateways
- TG backend into GenDB, which will make it
possible for about 70 genome sequencing projects
to use TG
Slide 13: TeraGrid Extensibility
- Sites interested in joining the TG need
- fast network
- non-trivial resources
- meet SLA (testing and QA requirements)
- become a member of the virtual organization
- capable of TG hosting (peering arrangements)
- TG Software Environment
- user (download, configure, install, and run TG
1.0)
- developer (join distributed engineering team)
- TG Virtual Organization
- Working group participation
- Operations, User services
- Add new capability
- make the whole greater than the sum of its parts
Slide 14: Organization
[Organization chart; labels recovered from the figure]
- Executive Steering Committee: Catlett (Director),
Andrews (SDSC), Boisseau (TACC), Gannon (IU),
Pennington (NCSA), Roskies (PSC)
- Project Director (Catlett)
- Core: Support, Project Mgmt, Finance, EOT
- Rotating Chair, TeraGrid Resource Forum
- Area Directors (four boxes); areas shown: TeraGrid
Resources, Base CyberInfrastructure, Community
Engagement, User Services, Software Integration
- Software, Services, Collaboration
- Other boxes: Strategic Relationships, CUAC, Science
Communities, NSF, MRE Projects, Community Grids,
NMI, ITR Projects, Globus Alliance
Slide 15: TeraGrid Vision: A Unified National HPC
Infrastructure that is Persistent and Reliable
- Largest NSF compute resources
- Largest DOE instrument (SNS)
- Fastest network
- Massive storage
- Visualization instruments
- Science Gateways
- Community databases
- E.g., Geosciences: 4 data collections including
high-res CT scans, global telemetry data,
worldwide hydrology data, and regional LIDAR
terrain data
Slide 16: TeraGrid Components
- Compute hardware
- Intel/Linux Clusters, Alpha SMP clusters, POWER4
cluster, POWER3 cluster, SUN visualization system
- Large-scale storage systems
- hundreds of terabytes for secondary storage
- Very high-speed network backbone
- bandwidth for rich interaction and tight
coupling
- Grid middleware
- Globus, data management,
- Next-generation applications
Slide 17: Resources and Services (40 TF, 1.4 PB disk, 12 PB
tape)
Slide 18: TeraGrid Compute Resources
[Diagram of resources at Caltech, ANL, NCSA, SDSC, and PSC;
network labels: 4 Lambdas, CHI, LA. Resource labels
recovered from the figure:]
- 96 GeForce4 Graphics Pipes
- 100 TB DataWulf
- 96 Pentium4, 64 2p Madison, Myrinet
- 32 Pentium4, 52 2p Madison, 20 2p Madison, Myrinet
- 20 TB
- 256 2p Madison, 667 2p Madison, Myrinet
- 128 2p Madison, 256 2p Madison, Myrinet
- 1.1 TF Power4 Federation
- 500 TB FCS SAN
- 230 TB FCS SAN
Slide 19: Data Resources and Data Management Services
- Approach
- Deploy core services
- Drive the system with data intensive flagship
applications
- TG Data Services Plan
- High-speed cross-site data transfer capability
- Parallel file systems
- Mass storage systems
- GridFTP and SRB-based access to data
- Hosted data collections
- Database capabilities
- 2 dedicated 32-processor IBM p690 nodes
- DB2, mySQL, Oracle clients
- HDF5 libraries
- Bill, Leesa and Ruth will discuss some of these
this morning
Slide 20: The TeraGrid Visualization Strategy
- Combine existing resources and current
technology
- Commodity clustering and commodity graphics
- Grid technology
- Access Grid collaborative tools
- Efforts, expertise, and tools from each of the
sites
- Volume Rendering (SDSC)
- Coupled Visualization (PSC)
- Volume Rendering (Caltech)
- VisBench (NCSA)
- Grid and Visualization Services (ANL)
- Sun Terascale Visualization Machine (TACC)
- to enable new and novel ways of visually
interacting with simulations and data
- Mike and Kelly will discuss this afternoon
Slide 21: Visualization Sample Use Cases
Slide 22: The TeraGrid Networking Strategy
- TeraGrid Backplane
- Provides sufficient connectivity (bandwidth,
latency) to support virtual machine room
- Core backbone: 40 Gbps
- Connectivity to each site: 10-30 Gbps
- Local networking
- Provide support for all nodes at each site to
have adequate access to backplane
- Support for distributed simulations
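To put the backbone and site bandwidths in perspective, the arithmetic can be sketched as below. This is a rough illustration only: it assumes the link runs at its full nominal rate with no protocol or storage overhead, uses decimal units (1 TB = 1000 GB), and the function name `transfer_seconds` is made up for this example.

```shell
# Ideal transfer time at a given line rate, ignoring all overhead.
# usage: transfer_seconds <size_in_gigabytes> <rate_in_gigabits_per_second>
transfer_seconds() {
    ts_size_gb=$1
    ts_rate_gbps=$2
    # 1 gigabyte = 8 gigabits; round up to a whole number of seconds.
    echo $(( (ts_size_gb * 8 + ts_rate_gbps - 1) / ts_rate_gbps ))
}

# Moving 1 TB over a 10 Gbps site link, a 30 Gbps site link,
# and the 40 Gbps core backbone.
for rate in 10 30 40; do
    echo "1 TB at ${rate} Gbps: $(transfer_seconds 1000 "$rate") s"
done
```

Under these idealized assumptions a terabyte moves in 800, 267, and 200 seconds respectively; real transfers will be slower.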
Slide 23: Grid Services: A Layered Grid Architecture
- Connectivity: "Talking to things": communication
(Internet protocols) and security
- Fabric: "Controlling things locally": access to,
and control of, resources
Slide 24: TeraGrid Runtime Environment
[Diagram; labels recovered from the figure]
- Single sign-on via grid-id
- Assignment of credentials to user proxies
- Mutual user-resource authentication
- Authenticated interprocess communication
- Mapping to local ids
- Other labels: CREDENTIAL, Globus Credential,
Certificate, Site 2
Slide 25: Common Authentication Service
- Standardized GSI authentication
- A user can authenticate for SSH and Grid services
to every TeraGrid system with a single user
certificate
- Developed coordinated CA acceptance policy
- TeraGrid-accepted Certificate Authorities include
- NCSA, SDSC, PSC, Purdue, TACC CAs
- DOEGrids CA, UK E-Science CA
- Procedures and tools to simplify the use and
management of certificates
- Certificate request, retrieval, distribution,
installation
- Derek will cover these in more detail later
Slide 26: Grid Information Services
- Currently leveraging Globus Grid Information
Service
- each service/resource is an information source
- index servers at each of the TG sites
- full mesh between index servers for fault
tolerance
- access control as needed
- Resource Information Management Strategies
- TG GIS for systems level information
- generic non-sensitive information
- access control on sensitive info such as job
level information
- Applications specific GIS services
- access controls applied as needed
- user control
Slide 27: Grid Scheduling and Job Management: Condor-G, the
User Interface
- Condor-G is the preferred job management
interface
- job scheduling, submission, tracking, etc.
- allows for complex job relationships and data
staging issues
- interfaces to Globus layers transparently
- allows you to use your workstation as your
interface to the grid
- The ability to determine current system loads and
queue status will come in the form of a web
interface
- Talk this morning by Eric Roberts about early
portal development work
Slide 28: Homogeneity Strategies
- Common Grid Middleware Layer
- TeraGrid Software Stack
- Collectively Designed
- tests and build modules required
- Multiple Layers Coordinated
- environment variables
- pathnames
- versions for system software, libraries, tools
- Minimum requirements plus Site Value-Added
- multiple environments possible
- special services and tools on top of common
TeraGrid software stack
- Community software areas, $TG_COMMUNITY
- Common User Environment with softenv
Slide 29: TeraGrid Software Stack
- CTSS: Common TeraGrid Software Stack
- A social contract with the user
- LORA: Learn Once, Run Anywhere
- Reproducibility
- standard configure, build, and install
- single CVS repository for software
- initial releases for IA-64, IA-32, Power4, Alpha
Slide 30: Current TG Software Stack
- SuSE SLES
- X-cat
- MPICH, MPICH-G2, MPICH-VMI
- gm drivers
- VMI/CRM
- Globus
- Condor-G
- gsi-ssh
- GPT
- SoftEnv
- MyProxy
- Intel compilers
- GNU compilers
- HDF4/5
- SRB client
- Inca
- db2-client
- uberftp
- tg-policy
- tg_usage
- myprojects
- Kx509
- atlas
Slide 31: $TG_COMMUNITY
- Community Software Area (CSA) of TeraGrid is
available and accepting requests from PIs
- Intended for the installation of executables and
libraries that will be used by a community of
users
- Software in this area is maintained by users and
is not guaranteed to be in any particular level
of readiness
- Consistent and easy-to-find location, for example
$TG_COMMUNITY/my_physics_application, on all
requested TG platforms
- softenv keys are available for a community of
users to customize their environment
Slide 32: SoftEnv System
- Software package management system instituting
symbolic keys for user environments
- Replaces traditional UNIX dot files
- Supports community keys
- Programmable similar to other dot files
- Integrated user environment transfer
- Well suited to software lifecycles
- Offers unified view of heterogeneous platforms
Slide 33: Manipulating the Environment
- /home/<username>/.soft
- softenv
- Displays symbolic software key names
- soft add <package-name>
- Temporary addition of package to environment
- soft delete <package-name>
- Temporary package removal from environment
- resoft
- Modify dotfile and apply to present environment
- man softenv
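As a rough illustration of the dotfile mentioned above, the sketch below writes a sample .soft file into a scratch directory rather than the real home directory. It assumes the usual SoftEnv syntax, in which a leading `+` adds a single key and a leading `@` pulls in a macro. The key names `+example-compiler` and `+example-mpi` are hypothetical placeholders, and `@default` is assumed to be the site's default macro; the real names are whatever `softenv` lists on a given system.

```shell
# Build a sample .soft file in a throwaway directory (not the real $HOME).
demo_home=$(mktemp -d)

cat > "$demo_home/.soft" <<'EOF'
# Hypothetical key names; list the real ones with the softenv command.
+example-compiler
+example-mpi
# Assumed name of the site-wide default macro.
@default
EOF

cat "$demo_home/.soft"
```

After editing the real ~/.soft, running resoft applies the change to the current session, as noted in the list above.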
Slide 34: Complete User Support
- Allocations
- Documentation
- Applications
- Consulting
- Training
Slide 35: Allocations Policies
- Any US researcher can request an allocation
- Policies/procedures posted at
- http://www.paci.org/Allocations.html
- Online proposal submission
- https://pops-submit.paci.org/
- Different levels of review for different size
allocation requests: development, medium, and
national resource allocation committees
- DAC: up to 30,000 SUs
- only a one paragraph abstract required
- accepted continuously!
- MRAC: <200,000 SUs/year
- reviewed every 6 months
- next deadline April 2005
- NRAC: 200,000 SUs/year or more
- reviewed every 6 months
- next deadline January 2005
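The size thresholds above can be read as a simple decision rule. The sketch below is only an illustration of that rule: the function name `classify_allocation` is made up, and the handling of the exact boundary values (30,000 counted as DAC, 200,000 counted as NRAC) is an assumption, since the slide does not spell it out.

```shell
# Map a requested number of service units (SUs) per year to the
# committee that would review it, following the thresholds listed above.
# usage: classify_allocation <SUs_requested>
classify_allocation() {
    ca_su=$1
    if [ "$ca_su" -le 30000 ]; then
        echo "DAC"      # development: abstract only, accepted continuously
    elif [ "$ca_su" -lt 200000 ]; then
        echo "MRAC"     # medium: reviewed every 6 months
    else
        echo "NRAC"     # national: reviewed every 6 months
    fi
}

for request in 10000 150000 500000; do
    echo "$request SUs -> $(classify_allocation "$request")"
done
```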
Slide 36: Roaming and Specific Allocations
- R-Type: roaming allocations
- can be used on any TG resource
- usage debited to a single (global) allocation of
resource maintained in a central database
- S-Type: specific allocations
- can only be used on specified resource
- All S awards come with 30,000 roaming SUs to
encourage roaming usage of TG
- usage debited to a single allocation of resource
maintained in a central database
- R- and S-Type allocations all come from a single
pool of TG resources
Slide 37: Accounts and Account Management
- TG accounts created on ALL TG systems for every
user
- information regarding accounts on all resources
delivered
- working toward single US mail packet arriving for
user
- accounts synched through centralized database
- certificates provide uniform access for users
- jobs can be submitted to/run on any TG resource
- NMI Account Management Information Exchange
(AMIE) used to manage account and transport usage
records in GGF format
Slide 38: Variety of TeraGrid Usage Scenarios
- Tightly coupled distributed usage
- plan to use resources at multiple sites in a
coordinated fashion
- multi-site MPI applications, coupled simulations,
etc.
- Loosely coupled distributed usage
- can use resources at multiple sites
- may only use resources at one site for a
particular run
- multiple jobs on multiple resources, pipelined
applications, etc.
- Usage local to a site/resource
- will only use resources local to a site for any
particular run
- may or may not be runnable at more than one site
- All of these application types are appropriate
for TeraGrid
- Fast networks allow users to consider new
approaches
- Will see many examples today!
Slide 39: Documentation
- TeraGrid-wide documentation
- simple
- high-level
- what works on all resources
- Site-specific documentation
- full details on unique capabilities of each
resource
- http://www.teragrid.org/docs
Slide 40: Common Installation of Applications
- ls $TG_APPS_PREFIX
    ATLAS            globus-2.4.2-2003-07-30-test2  netcdf-3.5.0
    HPSS             goto                           papi
    LAPACK           gx-map                         pbs
    PBSPro_5_2_2_2d  gx-map-0.3                     perfmon
    bin              hdf4                           petsc
    crm              hdf5
- Support for installation of your community's
application on all TeraGrid platforms in
$TG_COMMUNITY
Slide 41: 24/7 Consulting Support
- help_at_teragrid.org
- Advanced ticketing system for cross-site support
- 866-336-2357
- http://news.teragrid.org/
- Extensive experience solving problems for early
access users
Slide 42: Training that Meets User Needs
- Asynchronous training
- This tutorial and materials will be available
online
- Synchronous training
- TeraGrid training incorporated into ongoing
training activities at all sites
- Training at your site
- With sufficient participants
Slide 43: We'd like to have you as a TeraGrid user!
- Talk with any of us during the breaks about how
to apply for time, or visit
http://www.paci.org/Allocations.html
Slide 44: User Authentication and TeraGrid Certificates (30
min.)
- Derek Simmel
- Pittsburgh Supercomputing Center
- dsimmel_at_psc.edu
Slide 45: User Authentication
- Goals
- Single Sign-On
- Unattended Program Execution
- Manageable Security
- Interactive Login to Compute Resources
- Normal SSH via passwords or DSA/RSA keys
- Authentication via X.509 User Certificates
- GSI-SSH and Globus services
- MyProxy
- Managing User Credentials
Slide 46: Password Authentication
- Without coordination of authentication between
sites
Slide 47: Single Password Authentication
Slide 48: SSH Password Authentication
- Users use an account password or Kerberos
password to authenticate login and file transfers
- The password is a shared secret between the user
and the remote resource provider
Slide 49: Public Key Cryptography
- A pair of different but complementary keys: one
private (secret) key and one public key
- Whatever you encrypt via one of the keys, you can
only decrypt with the other key
Slide 50: Public Key Authentication
- Private, Secret Key (S) is kept securely by the
user
- Public Key (P) is distributed to sites to
authenticate use of the Private Key (S)
Slide 51: SSH Public Key Authentication
- Generate an SSH key-pair via ssh-keygen
- Place ~/.ssh/id_dsa.pub or ~/.ssh/id_rsa.pub in the
remote host account's ~/.ssh/authorized_keys
- SSH prompts the user for the private key
passphrase when using ssh or scp
- On the remote host, the ~/.ssh/authorized_keys file
contains public keys for those allowed to access
this account
- SSH Key Passphrase is not transmitted over the
wire
Slide 52: SSH Public Key Authentication
- Generate an SSH key with ssh-keygen
    dsimmel$ ssh-keygen -t dsa
    Generating public/private dsa key pair.
    Enter file in which to save the key (/Users/dsimmel/.ssh/id_dsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /Users/dsimmel/.ssh/id_dsa.
    Your public key has been saved in /Users/dsimmel/.ssh/id_dsa.pub.
    The key fingerprint is:
    2f:bc:aa:c9:8e:3e:7f:ee:8e:65:aa:9d:50:56:1d:c6 dsimmel@Derek-Simmels-Computer.local
    dsimmel$
Slide 53: SSH Public Key Authentication
- Copy the public key to the remote host
- Some TeraGrid sites do not permit password-based
SSH authentication according to local policy
- ANL, Caltech
- Users must contact these sites, verify their
identity, and transmit a copy of their SSH public
key to them so that it can be installed on their
behalf
    dsimmel$ scp .ssh/id_dsa.pub dsimmel@lemieux.psc.edu:id_dsa.pub
    dsimmel@lemieux.psc.edu's password:
    id_dsa.pub                 100%  626   562.9KB/s   00:00
    dsimmel$
Slide 54: SSH Public Key Authentication
- Put the public key into the authorized_keys file
on the remote host (policy permitting)
    dsimmel$ ssh dsimmel@lemieux.psc.edu
    dsimmel@lemieux.psc.edu's password:
    bash-2.04$ ls -l id_dsa.pub
    -rw-------   1 dsimmel  staff   626 Oct 11 14:09 id_dsa.pub
    bash-2.04$ cat id_dsa.pub >> .ssh/authorized_keys
    bash-2.04$ ls -l .ssh/authorized_keys
    -rw-r--r--   1 dsimmel  staff   626 Oct 11 14:47 .ssh/authorized_keys
    bash-2.04$ exit
    dsimmel$
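The append step above can be wrapped so that running it twice does not duplicate the key. The following is a minimal sketch, not part of the tutorial: the function name `install_pubkey` is made up, the key line is an obviously fake placeholder, and everything runs in a scratch directory. Tightening the directory and file permissions is a common precaution added here, not something the slide requires.

```shell
# Append a public key to authorized_keys only if it is not already there.
# usage: install_pubkey <public-key-file> <ssh-directory>
install_pubkey() {
    ip_pubkey=$1
    ip_sshdir=$2
    mkdir -p "$ip_sshdir" && chmod 700 "$ip_sshdir"
    touch "$ip_sshdir/authorized_keys"
    chmod 600 "$ip_sshdir/authorized_keys"
    # -x: whole-line match, -F: fixed string, -f: take the pattern from the key file.
    if ! grep -qxF -f "$ip_pubkey" "$ip_sshdir/authorized_keys"; then
        cat "$ip_pubkey" >> "$ip_sshdir/authorized_keys"
    fi
}

# Demonstration in a throwaway directory with a placeholder (not a real) key.
demo_dir=$(mktemp -d)
echo "ssh-dss PLACEHOLDER-NOT-A-REAL-KEY dsimmel@example" > "$demo_dir/id_dsa.pub"
install_pubkey "$demo_dir/id_dsa.pub" "$demo_dir/.ssh"
install_pubkey "$demo_dir/id_dsa.pub" "$demo_dir/.ssh"   # second call adds nothing
wc -l < "$demo_dir/.ssh/authorized_keys"
```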
Slide 55: SSH Public Key Authentication
    dsimmel$ ssh dsimmel@lemieux.psc.edu
    Enter passphrase for key '/Users/dsimmel/.ssh/id_dsa':
    Compaq Tru64 UNIX V5.1A (Rev. 1885)
    Compaq AlphaServer SC TS2.5
    This system is for the use of authorized users only.
    Unauthorized use may be monitored and recorded.
    In the course of such monitoring or through system maintenance,
    the activities of authorized users may be monitored.
    By using this system you expressly consent to such monitoring.
    If there are any problems, please contact remarks@psc.edu.
    bash-2.04$
Slide 56: SSH Public Key Single Sign-On
- ssh-agent is a program that sets up a small
service on your local machine to temporarily hold
your private SSH keys
- You add your keys with ssh-add
- With your keys loaded, ssh and scp commands
require no passphrase entry
- Use ssh-agent -k to remove the service when done
Slide 57: SSH Public Key Single Sign-On
    dsimmel$ eval `ssh-agent`
    Agent pid 544
    dsimmel$ env | grep SSH
    SSH_AGENT_PID=544
    SSH_AUTH_SOCK=/tmp/ssh-6SvMtTGu/agent.543
    dsimmel$ ssh-add
    Enter passphrase for /Users/dsimmel/.ssh/id_dsa:
    Identity added: /Users/dsimmel/.ssh/id_dsa (/Users/dsimmel/.ssh/id_dsa)
    dsimmel$ ssh dsimmel@lemieux.psc.edu
    Compaq Tru64 UNIX V5.1A (Rev. 1885)
    Compaq AlphaServer SC TS2.5
    This system is for the use of authorized users only.
    Unauthorized use may be monitored and recorded.
    In the course of such monitoring or through system maintenance,
    the activities of authorized users may be monitored.
    By using this system you expressly consent to such monitoring.
    If there are any problems, please contact remarks@psc.edu.
    bash-2.04$
Slide 58: SSH Public Key Single Sign-On
    bash-2.04$ exit
    logout
    Connection to lemieux.psc.edu closed.
    dsimmel$ eval `ssh-agent -k`
    Agent pid 544 killed
    dsimmel$
- The output of the ssh-agent command is a short
set of commands to set (or with -k, remove) the
SSH agent environment variables
- Using eval `ssh-agent -k` is a shortcut to
execute those environment-setting commands
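Why eval is needed can be seen without touching a real agent. In the sketch below, `fake_agent` is a hypothetical stand-in that, like ssh-agent, only prints variable assignments on standard output; the `DEMO_` variable names and values are invented for the illustration.

```shell
# Hypothetical stand-in for ssh-agent: it prints shell commands instead of
# changing the caller's environment itself.
fake_agent() {
    echo "DEMO_AUTH_SOCK=/tmp/demo-agent.sock; export DEMO_AUTH_SOCK;"
    echo "DEMO_AGENT_PID=12345; export DEMO_AGENT_PID;"
}

# Running it directly just produces text; the current shell is unchanged.
fake_agent > /dev/null
echo "before eval: ${DEMO_AGENT_PID:-unset}"

# eval executes the printed commands in the current shell,
# which is what actually sets the variables.
eval "$(fake_agent)"
echo "after eval: $DEMO_AGENT_PID"
```

The same pattern applies to the -k case: the agent prints commands that unset the variables, and eval applies them to the current shell.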
Slide 59: Limitations of SSH
- SSH was designed as a secure replacement for
Telnet and r-commands (rlogin, rsh, rcp)
- The ssh server provides a specific set of
services for secure remote command execution and
file transfer
- Limitations of SSH Public Key method
- Users have to prove who they are to every
resource
- With certificates, a resource provider need only
trust the CA
- Difficult to know if users' keys are still valid
- Every user needs to be polled, since there is no
CRL
Slide 60: X.509 Certificates
- X.509 certificates are widely accepted and
flexible for current and future needs
- A variety of authentication and authorization
methods support X.509 certificates, including web
services
- X.509 standard permits additional data fields in
the certificate for specialized purposes
- With GSI-SSH, X.509 user certificates can be used
to authenticate to SSH servers too
Slide 61: Certificates and Public Key Infrastructure
[Diagram; labels recovered from the figure: Registration
Authority (RA), Certificate Authority (CA), A, User Z]
Slide 62: Certificate Authority (CA)
- A trusted 3rd party for issuing credentials
- Validates user identity
- role of the Registration Authority (RA)
- Issues digitally-signed certificates
- To affirm the identity of users, hosts and
services to those who trust the CA
- Maintains a current certificate revocation list
(CRL) for certificates revoked prior to their
expiration date/time
Slide 63: TeraGrid-Accepted CAs
- Certificates issued by the following Certificate
Authorities are accepted for use on the TeraGrid
- NCSA
- SDSC
- PSC, PSC KCA
- Purdue
- TACC
- DOEGrids
- UK E-Science CA
Slide 64: X.509 User Certificates
- De facto standard for Grid authentication
- Public Key approach
- User keeps the secret key private and secure
- CA digitally signs the public key along with
additional info to produce a user certificate
- TeraGrid supports X.509 certificates for
- GSI-SSH authentication
- Globus grid services authentication
- E.g., globus-job-submit, globus-url-copy, uberftp
Slide 65: Where to get a User Certificate
- Users may request a new user certificate for use
on the TeraGrid from one of the following
Certificate Authorities
- NCSA
- SDSC
- Instructions for acquiring and installing user
certificates for use on TeraGrid are provided at
- http://www.teragrid.org/userinfo/guide_access_auth_setup.html
Slide 66: The ~/.globus directory
- The default location where a user's private key
and certificate are installed
- The directory in which Globus creates temporary
subdirectories and files to handle grid job
submission and file transfer
    $ ls -la ~/.globus
    total 24
    drwxr-xr-x   3 train00 train00 4096 Nov 17 13:45 .
    drwx------  33 train00 train00 4096 Oct 17 20:17 ..
    -r--r--r--   1 train00 train00 2703 Nov 17 13:55 usercert.pem
    -r--r--r--   1 train00 train00 1420 Nov 17 13:50 usercert_request.pem
    -r--------   1 train00 train00  963 Nov 17 13:50 userkey.pem
Slide 67: User Certificate Information
- grid-cert-info -issuer -subject -startdate -enddate
    /C=US/O=National Center for Supercomputing Applications/CN=Certification Authority
    /C=US/O=National Center for Supercomputing Applications/CN=Training User00
    Jul 11 21:16:05 2003 GMT
    Jul 10 21:16:05 2004 GMT
- The certificate's subject field is its
Distinguished Name (DN); this is needed for
entry into the grid-mapfile
Slide 68: Authentication using X.509 certificates
- Single Sign-On
- Users authenticate once by generating a proxy
certificate which is subsequently used whenever
authentication is required
- Globus grid-proxy-init
- KX.509 (kinit, kx509, kxlist -p)
- Unattended Authentication
- Users can escrow proxy certificates in a MyProxy
server for automated use later
Slide 69: Globus grid-proxy-init
- Example: Create a proxy that will last for 24
hours
    tg-login2:/users/kericson$ grid-proxy-init -hours 24
    Your identity: /C=US/O=NPACI/OU=SDSC/CN=Kate Ericson/USERID=kericson
    Enter GRID pass phrase for this identity:
    Creating proxy .................... Done
    Your proxy is valid until: Sat Nov 22 10:38:40 2003
    tg-login2:/users/kericson$ grid-proxy-destroy
Slide 70: KX.509/KCA Authentication
- KX.509/KCA (Univ. of Michigan)
- PSC employs Kerberos for site-wide user
authentication and can generate proxy
certificates for TeraGrid users as needed using
KX.509/KCA
- Users authenticate to the Kerberos server, e.g.
- > kinit myaccount@PSC.EDU
- Obtain a short term certificate from KCA service
- > kx509
- Generate and install a Globus-compatible proxy
- > kxlist -p
- This is equivalent to using Globus grid-proxy-init
Slide 71: KX.509/KCA Authentication
    bash-2.04$ kinit dsimmel@PSC.EDU
    dsimmel@PSC.EDU's Password:
    bash-2.04$ kx509
    bash-2.04$ kxlist -p
    Service kx509/certificate
     issuer= /C=US/O=Pittsburgh Supercomputing Center/CN=PSC Kerberos Certification Authority
     subject= /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=dsimmel/UID=dsimmel/emailAddress=dsimmel@PSC.EDU
     serial=0F34
     hash=a31d5407
    bash-2.04$ grid-proxy-info
    subject  : /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=dsimmel/USERID=dsimmel/Email=dsimmel@PSC.EDU
    issuer   : /C=US/O=Pittsburgh Supercomputing Center/CN=PSC Kerberos Certification Authority
    identity : /C=US/O=Pittsburgh Supercomputing Center/OU=PSC Kerberos Certification Authority/CN=dsimmel/USERID=dsimmel/Email=dsimmel@PSC.EDU
    type     : end entity credential
    strength : 512 bits
    path     : /tmp/x509up_u17780
    timeleft : 9:59:38
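The timeleft field above is shown as hours, minutes, and seconds. When scripting around a proxy it is often easier to work with a plain number of seconds (grid-proxy-info is understood to offer a -timeleft option that prints one, though that is not shown in this tutorial) and convert it for display. The sketch below shows only the conversion; the function name `format_timeleft` is made up for this example.

```shell
# Convert a lifetime in seconds to the H:MM:SS form used in the
# grid-proxy-info output.
# usage: format_timeleft <seconds>
format_timeleft() {
    ft_secs=$1
    printf '%d:%02d:%02d\n' \
        $((ft_secs / 3600)) $((ft_secs % 3600 / 60)) $((ft_secs % 60))
}

format_timeleft 35978
```

Here 35978 seconds corresponds to the 9:59:38 remaining in the listing above.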
Slide 72: Getting into the grid-mapfile
- The Globus grid-mapfile maps the Distinguished
Names (DNs) of users' certificates to their
corresponding local user accounts
- In future, installing one's certificate DNs on
all the TeraGrid systems will be simplified via
the TeraGrid User Portal (coming soon to a
browser near you)
- For now, users must manually add their
certificates to the grid-mapfile at each site
- At most TeraGrid sites, SSH to a login node and
execute
- gx-map -interactive
- At PSC, access the DN management webpage at
- https://dirs.psc.edu/teragrid/userpage/
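To make the mapping concrete, the sketch below builds a small sample grid-mapfile and looks up the local account for a DN. It assumes the usual entry format of a double-quoted DN followed by a local user name. The pairing of the Training User00 DN with the train00 account is an assumption based on the listings earlier in this tutorial, the second entry is entirely made up, and `local_user_for_dn` is a hypothetical helper, not a Globus tool.

```shell
# Sample grid-mapfile in a scratch file: "quoted DN" local_user
mapfile_demo=$(mktemp)
cat > "$mapfile_demo" <<'EOF'
"/C=US/O=National Center for Supercomputing Applications/CN=Training User00" train00
"/C=US/O=Example Org/CN=Jane Doe" jdoe
EOF

# Print the local account mapped to a DN, or nothing if there is no entry.
# usage: local_user_for_dn <grid-mapfile> <DN>
local_user_for_dn() {
    awk -v dn="$2" '
        {
            # Position of the closing quote, counted from the 2nd character.
            close_quote = index(substr($0, 2), "\"")
            if (substr($0, 1, 1) == "\"" && close_quote > 0 &&
                substr($0, 2, close_quote - 1) == dn) {
                rest = substr($0, close_quote + 2)
                sub(/^[ \t]+/, "", rest)   # drop the separator whitespace
                print rest
            }
        }' "$1"
}

local_user_for_dn "$mapfile_demo" \
    "/C=US/O=National Center for Supercomputing Applications/CN=Training User00"
```

The DN must match character for character, including spaces inside the quoted string, which is why it is usually copied from the certificate's subject field rather than retyped.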
Slide 73: MyProxy
- TeraGrid operates a MyProxy Server at
myproxy.teragrid.org
- MyProxy client tools are used to install proxy
certificates onto the server for later retrieval
- Use myproxy-init in place of grid-proxy-init
- Retrieve a stored proxy using
myproxy-get-delegation
- Grid portals commonly use MyProxy to facilitate
grid authentication tasks
- MyProxy User Documentation
- http://grid.ncsa.uiuc.edu/myproxy/userguide.html
Slide 74: MyProxy
    $ myproxy-init -a -s myproxy.teragrid.org
    Your identity: /C=US/O=National Computational Science Alliance/CN=Jim Basney
    Enter GRID pass phrase for this identity:
    Creating proxy ........................................... Done
    Your proxy is valid until: Fri Sep 13 13:52:56 2002
    Enter MyProxy Pass Phrase:
    Verifying password - Enter MyProxy Pass Phrase:
    A proxy valid for 168 hours (7.0 days) for user jbasney now exists on myproxy.teragrid.org.
    $ myproxy-get-delegation -s myproxy.teragrid.org
    Enter MyProxy Pass Phrase:
    A proxy has been received for user jbasney in /tmp/x509up_u500
Slide 75: Security of User Credentials
- Identity Theft is rampant
- World-wide security incident affected TeraGrid
sites too
- Passwords and keys for all users had to be
replaced
- Continues to require vigilant monitoring
- Dangers of passwordless keys
- Yes, they are convenient (proxies are such but
are short-term)
- Typically relies completely on filesystem
protections
- Implicitly trusts all who can read files where
the private keys are kept, including system
administrators and operators
- Please help us keep the TeraGrid secure for
everyone!
- Pick strong passwords and passphrases
- Keep them secret and private - do NOT share them
- Future directions - hardware tokens?
Slide 76: User Authentication Summary
- All TeraGrid sites support SSH and GSI-based
authentication (GSI-SSH, Globus)
- Some TeraGrid sites require SSH keys
- ANL, Caltech
- TeraGrid-approved Certificate Authorities provide
certificates for use on TeraGrid
- X.509 user certificates enable single sign-on and
unattended program execution
- Globus grid-proxy-init or KX.509 method, MyProxy
- Please protect your passwords and passphrases!
Slide 77: TeraGrid User Portal (30 min.)
- Eric Roberts - TACC
- ericrobe_at_tacc.utexas.edu
Slide 78: Outline
- Motivation
- What is a Portal?
- Characteristics of a User Portal
- Current TeraGrid User Portal Capabilities
- Future Directions
- Summary
- Demo
Slide 79: Motivation
- Make joining the TeraGrid easier for users
- Single place for users to find user information
and get user support
- Certain information can be displayed better in a
web page than in a command shell
- Allow novice users to start using grid resources
securely through a Web interface
- Increase productivity of TeraGrid researchers:
do more science!
Slide 80: What is a Portal?
- In general, a portal is a gateway to a set of
distributed services accessible from a Web
browser
- Provides
- Aggregation of different services as a set of Web
pages
- Single URL
- Single Sign-On
- Personalization
- Customization
Slide 81: Characteristics of a User Portal
- A User Portal is a type of
- Portal
- Science Gateway
- Includes the following services
- Documentation Services
- Notification Services
- User Support Services
- Allocations
- Accounts
- Training
- Consulting
Slide 82: Characteristics of a User Portal (cont'd)
- Collaborative Services
- Calendar
- Chat
- Resource sharing
- Information Services
- Resource
- Grid-wide
- Interactive Services
- Doesn't replace the command shell but provides a
simpler, alternative interface
Slide 83: Service Aggregation
[Diagram: services aggregated by the User Portal]
- User Support: Consulting
- Notification: User News
- Collaborative: Calendar, Chat
- Documentation: User Guides
- Information: Resource, Grid
- Interactive: Job Submission, File Transfer
- Services to User Portal: HTTP/SSL/SOAP
- User Portal to Client Browser: HTTP/SSL
Slide 84: Documentation Services
- Provide documentation for using the TeraGrid
- User Guides
- Help documentation
Slide 85: Notification Services
- Notify users of events within the TeraGrid
- User news
- Message of the Day
- Downtimes
- Previous
- Scheduled
- Un-scheduled
Slide 86: Message of the Day
Slide 87: User Support Services
- Allocations
- Apply for and manage your TeraGrid allocations
- Accounts
- Create and manage TeraGrid accounts
- Training
- Important information about upcoming training
classes
- Consulting
- Ask questions and get answers from TeraGrid staff
88User Support Services
89Collaborative Services
- Collaborate with other users through the portal
- Calendar
- Post and view important events (e.g. instrument reservations, resource downtimes)
- Chat
- Chat with other users who are logged into the portal
- Resource Sharing
- Share documents with the TeraGrid community
90Calendar Service
91Information Services
- Resource
- State information about individual resources
- Queue, Status, Load, OS Version, Uptime, Software, etc.
- Grid software status
- Grid
- Grid-wide network performance
- Aggregated capability
92System Information
93Grid Information
94Interactive Services
- Security
- Hidden from the user as much as possible
- Remote command execution
- File Management
- Upload
- Download
- Transfer between resources
- Job Submission to a single resource
- Job Submission to a grid meta-scheduler
- Composite Job Sequencing
95Manage Proxies
96Remote Command Execution
97Job Submission
98File Management
99Current TeraGrid User Portal Capabilities
- User Services
- Portal Account
- Information Services
- System
- Grid
- Network
- Interactive
- Remote Command Execution
- Job Submission
- File Management
100Future Directions
- Central gateway for TeraGrid services
- TeraGrid allocations and account creation/management through portal
- Streamline the process
- Application portals
- Science gateways that expose scientific
applications through interfaces
101Summary
- Currently the User Portal offers basic user, informational and interactive services
- Will extend current User Portal feature set in the future
- User Portal should be starting place for new users
- Ease the process of joining the TeraGrid
102Demonstration
- Login with MyProxy
- Demonstration of Services
- Documentation
- Notification
- User Support
- Collaborative
- Information
- Interactive
103Applications in the TeraGrid Environment Data
Collections and Databases (60 min.)
- Bill Whitson - Purdue
- wiw_at_purdue.edu
- Leesa Brieger SDSC
- leesa_at_sdsc.edu
104Data Collections
- Bill Whitson - Purdue
- wiw_at_purdue.edu
105Data on the TeraGrid: Helping fulfill the cyberinfrastructure vision
- The NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure emphasized the importance of multidisciplinary, well-curated federated collections of scientific data to future research
- TeraGrid provides
- Scientific data collections on distributed mass storage systems
- Storage and database services for new collections
- GridFTP- and SRB-based access to data by computational grid via a very high-speed network
106Data Collections on TeraGrid
- Indiana
- CLIOH: Cultural Digital Libraries Indexing Our Heritage
- CLSD: Central Life Sciences Data
- Flybase and euGenes databases
- NCSA
- Astronomy databases
- Purdue
- PTO: Purdue Terrestrial Observatory
- LARS: Laboratory for the Application of Remote Sensing
107Data Collections on TeraGrid (2)
- SDSC
- NVO: National Virtual Observatory
- SCEC: Southern California Earthquake Center
- NSDL K-12 Curriculum Web Sites
- TACC
- Worldwide hydrological data
- Telemetry data
- LIDAR terrain data
- High-res CT scans of geological and biological
specimens
108Storage and Database Services
- Mass storage
- NCSA, SDSC, TACC
- SRB
- Caltech, NCSA, SDSC
- Database services
- IU, SDSC, TACC
109Adding Data Collections
- Hosting services being defined
- datacentral.sdsc.edu
- Considerations for hosting data include
- Field of science, impact to domain
- Size, growth rate, lifetime of collection
- Methods of access/limitations on access
- Data collection or database
- For additional information about TeraGrid data collections, see the Resource Overview
- http://www.teragrid.org/userinfo/guide_hardware.html
110Databases
- Leesa Brieger SDSC
- leesa_at_sdsc.edu
- With Jesus Castegnetto and Qiao Xin, SDSC
111Databases on TeraGrid
- This session will present the database tools hosted at SDSC
- DB2 on TeraGrid
- Using the DB2 client
- DB2 Java programming
- How to get a DB2 account
- Useful links
112DB2 on TeraGrid
- Database servers
- ds003.sdsc.edu
- 128 GB memory 2 TB SAN DB2
- ds005.sdsc.edu
- 128 GB memory DB2
- Database clients
- DB2 client installed on TeraGrid login nodes
113DB2 on TeraGrid
- Before the db2 client can connect to a database on a given server, the server and its databases must be registered (catalogued) once from the host running the client.
114DB2 on TeraGrid
- Connect to server
- Log on to TG login node using SSH
- ssh username@tg-login.sdsc.teragrid.org
- Set up db2 client
- soft add +db2
- resoft
- Or add +db2 to your ~/.soft file
- resoft
- cd /tutorial/db_example/sample
115DB2 on TeraGrid
- Three ways to use the DB2 client
- Interactive input mode
- db2
- db2 => (type SQL commands here)
- Command mode
- db2 "SELECT * FROM SCHEMA.TABLE"
- Batch mode
- db2 -tvf filename
- the file filename contains SQL commands
116Access the Databases
- List database directories
- db2 LIST DB DIRECTORY
    Database 1 entry:
      Database alias                       = SC2004
      Database name                        = SC2004
      Node name                            = DS003
      Database release level               = a.00
      Comment                              =
      Directory entry type                 = Remote
      Catalog database partition number    = -1
117Access the Databases
- Connect to database
- db2 CONNECT TO SC2004 USER username
- Enter current password for username: (type your password here)
- Show tables
- db2 LIST TABLES
    Table/View  Schema  Type  Creation time
    ----------  ------  ----  -------------
    0 record(s) selected.
118Create New Tables
- Create table
- db2 "CREATE TABLE CAPITAL(STATE VARCHAR(20) NOT NULL, CAPITAL VARCHAR(20) NOT NULL, LARGEST_CITY VARCHAR(20) NOT NULL)"
- Show table info
- db2 LIST TABLES
    Table/View  Schema  Type  Creation time
    ----------  ------  ----  --------------------------
    CAPITAL     QIAO    T     2004-09-30-10.51.12.382651
119More Info on Table
- Find table info
- db2 DESCRIBE TABLE CAPITAL
    Column name   Type schema  Type name  Length  Scale  Nulls
    ------------  -----------  ---------  ------  -----  -----
    STATE         SYSIBM       VARCHAR        20      0  No
    CAPITAL       SYSIBM       VARCHAR        20      0  No
    LARGEST_CITY  SYSIBM       VARCHAR        20      0  No
    3 record(s) selected.
120Moving Data into Databases
- Three ways to move data into databases
- Insert
- Import
- Load
- (Note: load has partitioned load options to move data in parallel, and it needs LOAD privilege. We will talk about insert and import here.)
121Import Data into Databases
- Import data
- db2 IMPORT FROM data.txt OF DEL INSERT INTO CAPITAL
- Check table
- db2 "select * from capital"
    STATE      CAPITAL     LARGEST_CITY
    ---------  ----------  ------------
    Alabama    Montgomery  Birmingham
    Alaska     Juneau      Anchorage
    . . .
    Wisconsin  Madison     Milwaukee
    49 record(s) selected.
122Import Data Format
- File data.txt
- "Alabama", "Montgomery", "Birmingham"
- "Alaska", "Juneau", "Anchorage"
- "Arizona", "Phoenix", "Phoenix"
- "Arkansas", "Little Rock", "Little Rock"
- "California", "Sacramento", "Los Angeles"
- "Colorado", "Denver", "Denver"
- "Connecticut", "Hartford", "Bridgeport"
- "Delaware", "Dover", "Wilmington"
- "Florida", "Tallahassee", "Jacksonville"
- . . .
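The file layout above (comma-separated fields, character strings in double quotes) can also be read from a Java program, for example to check a file before importing it. The following is a minimal sketch, not part of the tutorial materials: the class name DelLineParser is hypothetical, and it assumes the default delimiters (comma between columns, double quote around strings, a doubled quote for a literal quote inside a string).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits one line of a comma-delimited (DEL-style) file. */
final class DelLineParser {

    /** Returns the fields of one line; quoted fields keep their inner text verbatim. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // currently inside "..."
        boolean quoted = false;   // the current field was written in quotes
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote stands for one literal quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == ',') {
                fields.add(quoted ? current.toString() : current.toString().trim());
                current.setLength(0);
                quoted = false;
            } else if (c == '"' && !quoted) {
                current.setLength(0);    // drop padding before the opening quote
                inQuotes = true;
                quoted = true;
            } else if (!quoted) {
                current.append(c);       // unquoted value, e.g. a number
            }
            // anything after a closing quote and before the next comma is ignored
        }
        fields.add(quoted ? current.toString() : current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        // One line in the style of data.txt
        System.out.println(parse("\"Arkansas\", \"Little Rock\", \"Little Rock\""));
    }
}
```

Each parsed line should yield three fields, matching the three NOT NULL columns of table CAPITAL.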
123Insert Data into Databases
- Insert data
- db2 "INSERT INTO CAPITAL VALUES ('Wyoming', 'Cheyenne', 'Cheyenne')"
- Check table
- db2 "SELECT * FROM CAPITAL WHERE STATE='Wyoming'"
    STATE    CAPITAL   LARGEST_CITY
    -------  --------  ------------
    Wyoming  Cheyenne  Cheyenne
    1 record(s) selected.
124Database Queries
- Find the largest city for the state Iowa
- db2 "SELECT LARGEST_CITY FROM CAPITAL WHERE STATE='Iowa'"
- Find states whose name looks like South
- db2 "SELECT * FROM CAPITAL WHERE STATE LIKE '%South%'"
- Display rows in descending order where state names include the letter i
- db2 "SELECT * FROM CAPITAL WHERE STATE LIKE '%i%' ORDER BY STATE DESC"
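When a search term comes from a user rather than being typed into the query, the LIKE pattern can be built in Java and passed as a statement parameter. This is a minimal sketch under stated assumptions: the class LikeQuery and its method names are hypothetical, and '!' is chosen here as the escape character so that a literal % or _ in the term is not treated as a wildcard.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical helper: builds a "contains" LIKE pattern for a parameterized query. */
final class LikeQuery {
    /** Escape character, announced to the database by the ESCAPE clause below. */
    static final char ESCAPE = '!';

    /** Same shape as the ORDER BY query on the slide, with the pattern as a parameter. */
    static final String SQL =
        "SELECT * FROM CAPITAL WHERE STATE LIKE ? ESCAPE '!' ORDER BY STATE DESC";

    /** Wraps the term in % wildcards and escapes wildcard characters inside it. */
    static String containsPattern(String term) {
        StringBuilder sb = new StringBuilder("%");
        for (char c : term.toCharArray()) {
            if (c == '%' || c == '_' || c == ESCAPE) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.append('%').toString();
    }

    /** Prepares the query on an open connection; the caller executes and closes it. */
    static PreparedStatement prepare(Connection conn, String term) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(SQL);
        stmt.setString(1, containsPattern(term));
        return stmt;
    }

    public static void main(String[] args) {
        System.out.println(SQL);
        System.out.println(containsPattern("South"));
    }
}
```

Passing the pattern as a parameter also avoids the shell and SQL quoting needed in command mode.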
125More DB2 Commands
- Delete all rows from the table
- db2 DELETE FROM CAPITAL
- Drop table
- db2 DROP TABLE CAPITAL
126Using Batch Mode
- table.cmd
    -- create table
    create table capital(
      state varchar(20) not null,
      capital varchar(20) not null,
      largest_city varchar(20) not null
    );
    -- import data
    import from all_data.txt of del insert into capital;
    -- check table data
    select * from capital;
127Using Batch Mode
- Run the SQL commands in table.cmd
- Make sure earlier examples of table CAPITAL have been deleted
- db2 DROP TABLE CAPITAL
- db2 -tvf table.cmd
128Java Programming for DB2
- Construct a program which handles a whole set of DB2 queries and DB management in a Java program.
- Next example
- cd /tutorial/db_example/acronym
- README.txt in this directory contains detailed information.
129Acronyms Example
- First do some queries to get an idea of the data
- Find info about the table
- db2 DESCRIBE TABLE JESUS.ACRONYMS
    Column name  Type schema  Type name  Length  Scale  Nulls
    -----------  -----------  ---------  ------  -----  -----
    ACRONYM      SYSIBM       VARCHAR        10      0  No
    DEFINITION   SYSIBM       VARCHAR       120      0  No
- Look at contents of table ACRONYMS
- db2 "SELECT * FROM JESUS.ACRONYMS"
- (Note the use of double quotes, since we have * in the SQL command.)
130Java Programming
- Type 4 Driver
- No DB2 client required
- Can be used to create both Java applications and applets
- Communicates directly with the database
- Type 2 Driver
- Relies on the DB2 client to connect to the server
- Cannot be used to create Java applets
- (We will use type 4 driver here.)
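In practice the two driver types are told apart by the form of the JDBC URL handed to the driver. The sketch below assumes the URL syntax of the IBM DB2 JDBC driver (host and port for type 4, a locally catalogued database alias for type 2). The class and method names are hypothetical; the host, port and database name are the ones used in this tutorial.

```java
/** Hypothetical helper: builds JDBC URLs for the two DB2 connectivity types. */
final class Db2JdbcUrl {

    /** Type 4: the driver talks to the server directly, so host and port are required. */
    static String type4(String host, int port, String database) {
        return "jdbc:db2://" + host + ":" + port + "/" + database;
    }

    /** Type 2: the local DB2 client resolves a catalogued database alias. */
    static String type2(String databaseAlias) {
        return "jdbc:db2:" + databaseAlias;
    }

    public static void main(String[] args) {
        System.out.println(type4("ds003.sdsc.edu", 60035, "SC2004"));
        System.out.println(type2("SC2004"));
    }
}
```

The type 2 form only works on a host where the database has been catalogued for the DB2 client.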
131Java Programming
- GetAcronym.java - implements the DB2-style query
- db2 "SELECT ACRONYM,DEFINITION FROM JESUS.ACRONYMS WHERE ACRONYM='EMACS'"
- which finds the definition for acronym EMACS.
- GetAcronym.java takes as argument the acronym you're searching for.
132Java Programming Steps
- Import Java package java.sql.*
- Load the appropriate JDBC driver
- Connect to the database
- Pass SQL statement to the database
- Get the results from the database
- Process the results
- Close the connection
133Java Programming
- GetAcronym.java
    /* 1. import the java packages */
    import java.sql.*;
    import java.util.*;

    /* 2. load the type 4 driver */
    public class GetAcronym {
        static {
            try {
                Class.forName("com.ibm.db2.jcc.DB2Driver");
            } catch (ClassNotFoundException e) {
                System.err.println("Unable to load DB2 driver\n" + e.getMessage());
                e.printStackTrace();
                System.exit(1);
            }
        }
- (contd)
134Java Programming
- GetAcronym.java(contd)
        public static void main(String[] args) {
            /* check argument */
            if (args.length != 1) {
                System.out.println("Missing parameter: An acronym to search for is needed");
                System.exit(1);
            }
            /* get value for acronym, database, user, passwd and query */
            String acronym = args[0].toUpperCase();
            String dbURL = "jdbc:db2://ds003.sdsc.edu:60035/SC2004";
            String dbUser = System.getProperty("user");
            String dbPassword = System.getProperty("password");
            String query = "select definition from jesus.acronyms where acronym=?";
- (contd)
135Java Programming
- GetAcronym.java (contd)
            try {
                /* 3. connect to database */
                Connection conn = DriverManager.getConnection(dbURL, dbUser, dbPassword);
                /* 4. prepare and execute statement */
                PreparedStatement stmt = conn.prepareStatement(query);
                stmt.setString(1, acronym);
                ResultSet rs = stmt.executeQuery();
                /* 5. get back resultset */
                ArrayList desc = new ArrayList();
                while (rs.next()) {
                    desc.add(rs.getString("definition").trim());
                }
- (contd)
136Java Programming
- GetAcronym.java(contd)
                /* 6. process and print out the resultset */
                if (desc.size() == 0) {
                    System.out.println("The acronym " + acronym + " is not in our database");
                } else {
                    String out = "Description(s) found for acronym " + acronym + "\n\n";
                    for (int i = 0; i < desc.size(); i++) {
                        out += "" + String.valueOf(i + 1) + " ";
                        out += (String) desc.get(i) + "\n";
                    }
                    System.out.print(out);
                }
- (contd)
137Java Programming
- GetAcronym.java(contd)
                /* 7. close resultset, statement and connection */
                rs.close();
                stmt.close();
                conn.close();
            } catch (SQLException e) {
                System.err.println("SQL Exception " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
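On current Java versions, steps 3 through 7 can be written more compactly with try-with-resources, which closes the result set, statement and connection even when an exception is thrown. The following is a sketch and not the tutorial's GetAcronym.java: the class AcronymLookup and its method names are hypothetical, and it assumes a JDBC 4 driver on the classpath, so no explicit Class.forName call is made. The query, URL and system properties are the ones used on the slides.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical variant of the acronym lookup using try-with-resources. */
final class AcronymLookup {
    static final String QUERY = "select definition from jesus.acronyms where acronym=?";

    /** Steps 4 and 5: run the parameterized query and collect the definitions. */
    static List<String> lookup(Connection conn, String acronym) throws SQLException {
        List<String> definitions = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setString(1, acronym);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    definitions.add(rs.getString("definition").trim());
                }
            }
        } // statement and result set are closed here (step 7)
        return definitions;
    }

    /** Step 6: turn the definitions into printable text. */
    static String format(String acronym, List<String> definitions) {
        if (definitions.isEmpty()) {
            return "The acronym " + acronym + " is not in our database\n";
        }
        StringBuilder out = new StringBuilder("Description(s) found for acronym " + acronym + "\n\n");
        for (int i = 0; i < definitions.size(); i++) {
            out.append(i + 1).append(" ").append(definitions.get(i)).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Missing parameter: An acronym to search for is needed");
            return;
        }
        String acronym = args[0].toUpperCase();
        String dbURL = "jdbc:db2://ds003.sdsc.edu:60035/SC2004"; // URL from the slides
        // Step 3: the connection is closed automatically at the end of the try block
        try (Connection conn = DriverManager.getConnection(
                dbURL, System.getProperty("user"), System.getProperty("password"))) {
            System.out.print(format(acronym, lookup(conn, acronym)));
        } catch (SQLException e) {
            System.err.println("SQL Exception " + e.getMessage());
        }
    }
}
```

Keeping the formatting in its own method means it can be checked without a database connection.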
138Java Programming
- Using GetAcronym.java
- Compile the program
- javac GetAcronym.java
- Run
- java -Duser=username -Dpassword=password GetAcronym acronym
- You don't really want to put your password on the command line, so use an alternative approach
139Java Programming
- A shell script that prompts for the appropriate information and runs the java program
- ./find_acronym.sh acronym
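Another way to keep the password off the command line is to let the Java program prompt for it. This is a minimal sketch, independent of find_acronym.sh (whose contents are not shown here): the class PasswordPrompt and the method readSecret are hypothetical. It uses java.io.Console, which reads without echo when a terminal is attached, and falls back to a plain reader when input is piped.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

/** Hypothetical helper: reads a password without placing it on the command line. */
final class PasswordPrompt {

    /**
     * Reads one secret. With a console the input is not echoed;
     * without one (e.g. piped input) the first line of the fallback reader is used.
     */
    static String readSecret(Console console, Reader fallback, String prompt) throws IOException {
        if (console != null) {
            char[] chars = console.readPassword("%s", prompt);
            return chars == null ? "" : new String(chars);
        }
        String line = new BufferedReader(fallback).readLine();
        return line == null ? "" : line;
    }

    public static void main(String[] args) throws IOException {
        String password = readSecret(System.console(),
                new InputStreamReader(System.in), "DB2 password: ");
        // The value would be handed to DriverManager.getConnection instead of
        // System.getProperty("password"); here only its length is reported.
        System.out.println("Read a password of " + password.length() + " characters");
    }
}
```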
140Java Programming
- GetAcronym2.java - a few more functionalities built into this program
- Find definition for acronyms containing ASA
- db2 "SELECT * FROM JESUS.ACRONYMS WHERE ACRONYM LIKE '%ASA%'"
- Display acronyms whose definition includes Array in descending order
- db2 "SELECT * FROM JESUS.ACRONYMS WHERE DEFINITION LIKE '%Array%' ORDER BY ACRONYM DESC"
- Takes a search argument
141Java Programming
- Scripts for running GetAcronym2
- Examples
- ./acronym ASA
- ./acronym_like ASA
- ./acronym_with_definition As
- Refer to README.txt for detailed info.
142Databases on TeraGrid
- How to get an account on SDSC's DataStar?
- Hosting temporary DB2 databases
- http://datacentral.sdsc.edu/allocation_short_term.html
- Hosting long-term DB2 databases
- http://datacentral.sdsc.edu/allocations.html
143Databases on TeraGrid