Title: Grid Computing Using Modern Technologies
1. Grid Computing Using Modern Technologies
- A 3-Part Tutorial presented by
- Mary Thomas
- Dennis Gannon
- Geoffrey Fox
2. Tutorial Outline
- Part I: Mary Thomas (TACC/UT)
- Understanding Web and Grid portal technologies
- Building application portals with GridPort
- Grid Web services
- Part II: Dennis Gannon (Indiana)
- Distributed software components
- Grid Web services and science portals for scientific applications
- Part III: Geoffrey Fox (Indiana)
- Integrating peer-to-peer networks and Web services with the Grid
- Grid Web services and Gateway
3. Introduction to Developing Web-Based Grid Computing Portals
- Mary Thomas
- Texas Advanced Computing Center
- The University of Texas at Austin
- and
- NPACI
- Presented at GGF4, Toronto, Canada, Sunday, 2/15/02
4. Goals of Part 1
- Introduce basic portal technologies and concepts
- Provide enough knowledge to begin evaluating and understanding the technologies
- Provide enough knowledge to build a computing portal based on GridPort/Perl/CGI
- Not going to teach you how to install Grid software: assume you can do this or you have someone who takes care of it
- Audience: application developers or scientists
5. Approach
- Introduce the concept of a portal for computational Grids and science applications
- Use the GridPort Toolkit to demonstrate the basic concepts needed to understand how to use the Web and Grid technologies
- Show examples of how to program a portal
6. Outline
- Defining Portals and Web Technologies
- The GridPort Portal Architecture
- GridPort-Based Portals
- Application Portals
- Programming Example
- Portal Services for Remote Clients
- Client Portal Services
- Grid Portal Web Services
7. Defining The Web
8. What is the Web?
9. What is the Grid Web?
10. Web Server Technologies
- Web Servers
- Run as a process on a machine; clients access that process
- Common Versions
- Netscape (http://www.netscape.com)
- Apache (http://www.apache.org) - open source
- OSs: Windows, Unix, Macintosh, Linux, etc.
- Web Programming Languages
- Server: Java, Javascript, Python, PHP, Perl
- Client: HTML, Javascript
- Protocols: HTTP/CGI/Servlets/Applets
- Security
- HTTPS, SSL, Encryption
- Cookies
- Certificates
11. Web Clients
- Multiple display devices
- Desktop workstations, PCs, PDAs, cell phones, pagers, other wireless devices, televisions
- Various viewing tools
- Browsers: Internet Explorer, Netscape Navigator, Opera
- Tools for the visually impaired
- OSs: Windows/WinCE, Unix, Mac, Linux, Palm, etc.
- Web Programming Languages
- HTML, Javascript
- Perl, Java for scrapers
- Security
- HTTPS, SSL, Encryption, Cookies
- Certificates
12. Defining Computational Web Portals
13. What is a Portal?
- Web sites that provide centralized access to a set of resources
- Characterized by
- Personalization
- Security/authentication/authorization
- What you see often changes based on what you are looking for (e.g., ads)
- Navigation/choices
- Gateway for Web access to distributed resources/information
- Hub from which users can locate all the Web content that they commonly need
14. Classes of Portals
- Horizontal or mega-portals
- Information from search engines and ISPs (Yahoo)
- Everybody comes in and sees the same thing
- Allow personalization to some degree
- Vertical
- Portals that are customized by the system
- The system recognizes who you are and gives you a different view of the university or the company that you're going to build
- More specialized (Amazon, Travelocity, etc.)
- Intranet
- Portals inside a company that give particular people the information they need
15. Scientific Web Portals
- Differ from commercial portals (Yahoo, Amazon)
- Types of Science Portals
- User Portals
- Simplify users' ability to interact with and utilize a complex, often distributed environment
- Direct access to resources (compute, data, archival, instruments, and information)
- Application Interfaces
- Enable scientists to conduct simulations on multiple resources
- EOT Portals
- Educate the public (future scientists?) about science using software simulations, visualizations, etc.
- Learning tools
- Individual Portals
- Users can roll out their own portals by writing web pages using standard HTML or Perl/CGI
16. Why Use Portals for Computational Science?
- Computational science environment is complex
- Users have access to a variety of distributed resources (compute, storage, etc.)
- Interfaces, OSs, and Grid tools for these resources vary and change often
- Environment changes
- Relocation/upgrade of binaries
- Policies at sites sometimes differ, allocations change
- Using multiple resources can be cumbersome
- Grid adds complexity for programmers
18. Portals Provide Simple Interfaces
- Portals are web based, and that has advantages
- Users know and understand the web
- Can serve as a layer in the middle-tier infrastructure of the Grid
- Integrate various Grid services and resources
- Users can be isolated from resource-specific details
- Single web interface isolates system changes/differences
- Not an end-all solution: several issues/challenges here
- Performance, scalability
19. Virtual Organizations
- GGF model, based on the Foster paper
- Anatomy of the Grid
- Hierarchical, tree based
- Each node represents collections of
- Compute resources
- Projects
- Centers
- HotPage portals represent VOs
- NPACI, PACI
- Would like to build HiPCAT as a VO to run as a DTF node
20. Virtual Organizations (HotPage)
[Diagram: VO hierarchy tree with nodes NSF, HiPCAT, PACI, DTF, NPACI, PSC, Alliance, SDSCI, TACC, Mich]
21. Portal Toolkits
- Commercial
- Sun Java Servlets, iPlanet
- IBM WebSphere
- MSFT .NET
- Special interest groups
- uPortal, Javaspeed
- R&D within Grid community
- GridPort Toolkit (http://gridport.npaci.edu)
- GPDK ()
- GridSphere (refer to ACM site???)
- Gateway (Fox)
- CCA (Gannon)
22. Building Application Portals Using the GridPort Toolkit
23. GridPort Architecture
24. GridPort Toolkit Design Concepts
- Key design goals
- Any site should be able to host a user portal
- Any user should be able to create their own user portal if they have accounts and a certificate
- Key requirements
- Base software design on infrastructure provided by the World Wide Web
- Use commodity technologies wherever possible
- Avoid shell programs/applications/applets
- GridPort Toolkit should not require that additional services be run on the HPC systems
- Reduce complexity: there are enough of these already
- So, leverage existing grid research and development
- GSI certificate (PKI)
25. GridPort Designed for Ease of Use
- WWW interface is common, well understood, and pervasive
- User portals can be accessed by anyone who has access to a web browser, regardless of location
- Users can construct customized application web pages
- Only basic knowledge of HTML and Perl/CGI is needed
- Application programmers can extend the set of basic functions provided by the Toolkit
- Portal services hosts can modify support services by adding, removing, or modifying broker or grid interface codes
26. Commodity Web Technologies
- Use of commodity web technologies -> portability
- Contribute to a plug-n-play grid
- Requirements
- Any browser: Communicator, IE
- HTTP, HTTPS, SSL, HTML/JavaScript, Perl/CGI, SSH, FTP
- Netscape or Apache servers
- Based on simple technology, this software is easily ported to, and used by, other sites
- Easy to modify and adapt to local site policies and requirements
- Goal is to design a toolkit that is simple to implement, support, port, and develop
27. Grid Technologies
- Security
- Globus Grid Security Infrastructure (GSI), SSH
- MyProxy for remote proxies
- Job Execution
- Globus GRAM
- Information
- Globus Grid Information Services (GIS)
- File Management
- SDSC Storage Resource Broker (SRB)
- GridFTP
28. Information Services
- Designed to provide a user-oriented interface to NPACI resources and services
- Consists of on-line documentation, static informational pages, and links to events within NPACI, including basic user information such as
- Documentation, training, news, consulting
- Simple tools
- Application search
- Systems information
- Generation of batch scripts for all compute resources
- Network Weather Service
- No user authentication is required to view this information
29. Information Services: Dynamic
- Dynamic information provided by automatic data collection scripts that provide real-time information for each machine (or summaries), such as
- Status Bar: displays live updates showing operational status and utilization of all NPACI resources
- Machine Usage: displays summary of machine status, load, and batch queues
- Batch Queues: displays currently executing and queued jobs
- Node Maps: displays graphical map of how running applications are mapped to nodes
- Network Weather Service: provides connectivity information between a user's local host and grid resources
- Pulled from 3 possible sources
- MDS, web services, local cron jobs
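One way to picture "pulled from 3 possible sources" is a simple fallback chain: ask each source in turn and use the first one that answers. The sketch below is illustrative only. The three fetch functions, their return values, and the host name are hypothetical stand-ins, not GridPort code.

```python
# Hypothetical stand-ins for the three sources named on the slide.
def from_mds(machine):
    return None  # pretend MDS has no record for this machine

def from_web_service(machine):
    raise ConnectionError("service unreachable")  # pretend the service is down

def from_cron_cache(machine):
    # pretend a local cron job wrote this summary earlier
    return {"machine": machine, "status": "up", "load": 0.42}

SOURCES = [
    ("mds", from_mds),
    ("web_service", from_web_service),
    ("cron", from_cron_cache),
]

def machine_status(machine, sources=SOURCES):
    """Return the first usable record, tagged with the source that supplied it."""
    for name, fetch in sources:
        try:
            record = fetch(machine)
        except Exception:
            continue  # a failing source must not break the status page
        if record is not None:
            return dict(record, source=name)
    return {"machine": machine, "status": "unknown", "source": None}

print(machine_status("example-host"))
```

Ordering the list from most to least current source is a design choice assumed here; a portal could just as well prefer the local cache for speed.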
30. Web Server to Grid Resource
31. Interactive Sessions
- How do they work?
- What do they do?
- Job submission
- File management
- Authentication
- What do we use to do them?
- List of grid technologies
32. GridPort Interactive Services Diagram
33. Interactive Sessions: Login/Logout
- Login
- Client browser connects to an HTTPS server
- User enters valid NPACI Portal account ID
- Login to the Portal using CA authentication (Globus)
- Globus infrastructure manages user accounts on remote hosts
- Username used to map passphrase to valid key and certificate (local repository), or MyProxy (remote)
- Passphrase used to create proxy certificate using globus-proxy-init
- If proxy-init successful, session key stored on client browser
- Data passed through web server over SSL channel
- Session info stored in secure, local repository
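The last steps of the login flow above (proxy creation succeeds, a session key goes to the browser, session info stays on the server) can be sketched as follows. This is a minimal sketch under stated assumptions: `create_proxy` is a hypothetical callable standing in for the proxy-creation step, the in-memory `SESSIONS` dictionary stands in for the secure local repository, and the cookie name is invented.

```python
import secrets
import time

SESSIONS = {}  # stand-in for the secure, local session repository

def login(username, passphrase, create_proxy):
    """Create a session only if the (hypothetical) proxy step succeeds."""
    if not create_proxy(username, passphrase):
        return None
    key = secrets.token_hex(16)  # random, unguessable session key
    SESSIONS[key] = {"user": username, "last_seen": time.time()}
    return key

def session_cookie(key):
    """HTTP header that stores the session key on the client browser."""
    # "Secure" restricts the cookie to HTTPS; the cookie name is an assumption.
    return "Set-Cookie: portal_session=%s; Secure; HttpOnly" % key

key = login("alice", "not-a-real-passphrase", lambda user, phrase: True)
print(session_cookie(key))
```

Only the random key travels to the browser; the passphrase and proxy never leave the server side in this sketch.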
34. Interactive Sessions: Login/Logout
- Logout
- User automatically logged out
- If logout selected
- Session times out
- On logout
- Active session data files cleared
- Relevant user information archived and stored for future sessions
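The two logout triggers above (explicit logout and session timeout) can be sketched as a check that runs on every request. The 30-minute timeout, the dictionary layout, and the archive list are assumptions made for illustration, not values from GridPort.

```python
import time

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed value, not taken from the slides

def logout(sessions, key, archive, now=None):
    """Clear the active session and archive what later sessions may need."""
    now = time.time() if now is None else now
    session = sessions.pop(key, None)
    if session is None:
        return False
    archive.append({"user": session["user"], "ended": now})
    return True

def check_session(sessions, key, archive, now=None):
    """Return the live session, or None after logging out an expired one."""
    now = time.time() if now is None else now
    session = sessions.get(key)
    if session is None:
        return None
    if now - session["last_seen"] > SESSION_TIMEOUT_SECONDS:
        logout(sessions, key, archive, now)
        return None
    session["last_seen"] = now  # activity resets the timeout clock
    return session
```

Passing `now` explicitly keeps the timeout logic easy to exercise without waiting for real time to pass.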
35. Grid Security at all Layers
- GSI authentication for all portal services
- Transparent access to the grid via GSI infrastructure
- Security between the client -> web server -> grid
- SSL/RC4-40 128 bit key / SSL RSA X509 certificate
- Authentication tracked with cookies coupled to server database/session tracking
- Single login environment (another key goal)
- Provide access to all NPACI resources where GSI is available
- With full account access privileges for specific host
- Use client browser cookies to track state
36. Portal Accounts
- Portal accounts are not the same as resource accounts
- Must be a valid Grid user on the resource and need allocations
- Processes run under the user's own account with the same access and privileges as if they had logged onto the resource
- Portal users must have a digital certificate signed by a known Certificate Authority (CA)
- And must get DN into mapfile
- Accounts for NPACI users obtained via an on-line web form
- Can generate a certificate: certificate and key are placed in a secure repository
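"Get DN into mapfile" refers to the file that maps a certificate's distinguished name (DN) to a local account. The sketch below assumes the common Globus-style layout of one entry per line, a quoted DN followed by a local user name; the DNs and account names are invented, and a real mapfile may carry details this sketch ignores.

```python
import shlex

def parse_mapfile(text):
    """Map each certificate DN to a local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping

# Invented example entries, for illustration only.
EXAMPLE = '''
# DN                                        local account
"/O=Grid/O=Example/CN=Jane Doe" jdoe
"/O=Grid/O=Example/CN=Sam Roe"  sroe
'''

accounts = parse_mapfile(EXAMPLE)
print(accounts.get("/O=Grid/O=Example/CN=Jane Doe"))
```

A portal user whose DN is missing from the mapping cannot be mapped to a local account, which is why the slide lists this as a requirement.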
37. Interactive Sessions: Job Execution
- Web server transactions
- Confirm/authenticate user login status
- Parse command/request (CGI vars)
- Establish user environment
- Assemble remote command (Globus/SSH)
- Verify proxy (if Globus) or recreate
- Send command (e.g., to Globus daemon on remote host)
- Parse, format, and return results to the web browser on the user's workstation, or store data (e.g., FTP)
- While the user login is in the active state
- Check for timeout
- Track current state
- Record information about job requests and user data for use in subsequent transactions or sessions
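The transaction steps listed above can be read as one request handler that runs them in order. The sketch below follows that order with hypothetical names throughout: `run_remote` stands in for whatever sends the command to the remote host, and the session dictionary layout is assumed.

```python
import html

def handle_request(sessions, session_key, params, run_remote):
    """Run one portal transaction and return a status code and HTML body."""
    # 1. Confirm/authenticate user login status.
    session = sessions.get(session_key)
    if session is None:
        return {"status": 401, "body": "not logged in"}
    # 2. Parse the command/request.
    command = params.get("command", "").strip()
    if not command:
        return {"status": 400, "body": "no command given"}
    # 3. Establish the user environment.
    env = {"USER": session["user"], "HOME": session.get("home", "/")}
    # 4. Verify the proxy, or recreate it (stubbed here as a flag).
    if not session.get("proxy_valid", False):
        session["proxy_valid"] = True
    # 5. Send the command and collect its output.
    output = run_remote(command, env)
    # 6. Record the request for use in later transactions.
    session.setdefault("history", []).append(command)
    # 7. Format the result for the browser, escaping it as plain text.
    return {"status": 200, "body": "<pre>%s</pre>" % html.escape(output)}
```

Escaping the output before embedding it in HTML is an addition of this sketch; the slide only says the results are parsed and formatted.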
38. GridPort File System
- Without SRB capabilities, files are distributed
- Adds to complexity when migrating and managing data
39. GridPort SRB Architecture
- With SRB capabilities, file access is direct
- Single SRB account access allows for more flexible data management
40. Variety of GridPort Applications
- Current applications in production
- NPACI/PACI HotPages (also @PACI/NCSA)
- https://hotpage.npaci.edu
- LAPK Portal: Pharmacokinetic Modeling (live demo of Pharmacokinetic Modeling Portal)
- https://gridport.npaci.edu/LAPK
- GAMESS (General Atomic and Molecular Electronic Structure System)
- https://gridport.npaci.edu/GAMESS
- Telescience (Ellisman)
- https://gridport.npaci.edu/Telescience
- Protein Data Bank CE Portal (Phil Bourne)
- https://gridport.npaci.edu/CE
41. Programming Example: Job Submit
- Client
- Example of client HTML page
- HTML code
- Server
- Perl/CGI parser script running on server
- GridPort Toolkit function code
42. HotPage View: Job Submission
43. JobSubmit Web Page
44. JobSubmit HTML Code
    <FORM action="https://hotpage.npaci.edu/tools/cgi-bin/job_submit.cgi"
          method=post enctype="application/x-www-form-urlencoded" name="job_submit">
    Arguments: <INPUT TYPE="text" NAME="args">
    Select Queue: <SELECT NAME="queue">
    <OPTION VALUE="low">low
    <OPTION VALUE="normal">normal
    <OPTION VALUE="high">high
    <OPTION VALUE="express">express
    </SELECT>
    Number of Cpus: <INPUT TYPE="text" NAME="cpus">
    Max Time (min): <INPUT TYPE="text" NAME="max_time">
    <INPUT TYPE="hidden" NAME="mach" VALUE="SSPHN">
    <INPUT TYPE="hidden" NAME="exe" VALUE="/rmount/paci/sdsc/mthomas/mpi_pi">
    <INPUT TYPE="submit" METHOD="post" ACTION="https://hotpage.npaci.edu/tools/cgi-bin/job_submit.cgi">
    </FORM>
45. JobSubmit Server Perl/CGI Parser
- Grabs HTTP/CGI data and sends it to the GridPort subroutine, then waits for results
    #!/usr/local/bin/perl
    use CGI qw(:all);
    my $query = new CGI;
    1;
    BEGIN {
        # GET THE SCRIPTS LOCATION AND THE GLOBAL VARS
        $MY_LOCATION = "tools/cgi-bin";
        $CURRENT_DIR = `pwd`;
        ($PORTAL_ROOT, $rest) = split(/$MY_LOCATION/, $CURRENT_DIR);
        $GLOBAL_VARS_CONFIG = $PORTAL_ROOT . "cgi-bin/global_vars.cgi";
        require "$GLOBAL_VARS_CONFIG";
        require "$PORTAL_HOME_DIR/cgi-bin/hotpage_authen.cgi";
    }
46. JobSubmit Server Perl/CGI Code (cont.)
    # load in code to do job submission through globus
    require "$GRIDPORT_HOME_DIR/services/globus/cgi-bin/gridport_globus_job.cgi";
    # subroutines to get/set user directories (home, work, current) and do job handling
    require "$PORTAL_HOME_DIR/tools/cgi-bin/user_dirs.cgi";
    require "$PORTAL_HOME_DIR/tools/cgi-bin/user_jobs.cgi";
    # read the form fields posted by the JobSubmit page
    my $args     = $query->param('args');
    my $queue    = $query->param('queue');
    my $cpus     = $query->param('cpus');
    my $max_time = $query->param('max_time');
    $mach        = $query->param('mach');
    my $exe      = $query->param('exe');
    $exe = $exe . " $args";
    # run the command through Globus, trap output, return to caller process
    @output = gridport_globus_job_submit($mach, $cpus, 60, $exe, $max_time, $queue);
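The parser above passes the posted fields straight on to the job routine. As an illustration of the same parsing step with input checks added, here is a small Python sketch. The field names and queue values come from the JobSubmit form; the validation rules themselves are assumptions and not part of the original script.

```python
from urllib.parse import parse_qs

ALLOWED_QUEUES = {"low", "normal", "high", "express"}  # options in the form

def parse_job_form(body):
    """Decode a urlencoded POST body and validate the job fields."""
    raw = parse_qs(body, keep_blank_values=True)
    fields = {name: values[0] for name, values in raw.items()}
    if fields.get("queue") not in ALLOWED_QUEUES:
        raise ValueError("unknown queue")
    cpus = int(fields.get("cpus", ""))          # ValueError if not a number
    max_time = int(fields.get("max_time", ""))  # minutes
    if cpus < 1 or max_time < 1:
        raise ValueError("cpus and max_time must be positive")
    return {
        "mach": fields.get("mach", ""),
        "exe": fields.get("exe", ""),
        "args": fields.get("args", ""),
        "queue": fields["queue"],
        "cpus": cpus,
        "max_time": max_time,
    }

BODY = ("args=1000&queue=normal&cpus=4&max_time=10&mach=SSPHN"
        "&exe=%2Frmount%2Fpaci%2Fsdsc%2Fmthomas%2Fmpi_pi")
print(parse_job_form(BODY))
```

Checking the queue against the known options and forcing the numeric fields through `int` keeps malformed input from reaching the command that is later assembled from these values.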
47. gridport_globus_job_submit
    sub gridport_globus_job_submit {
        my @job = ();
        my $user = get_username();
        # get the input and set up globus
        my ($mach, $cpus, $timeout, $exe, $max_cpu_time, $queue) = @_;
        globus_config($user);   # verify data
        mach_config($mach);     # verify data
        # build the globus command
        my $globus_submit = "globus_job_submitmachinesmachgv ";
        $globus_submit .= "machinesmachnamejob -np $cpus -queue $queue ";
        $globus_submit .= "-maxtime $max_cpu_time $exe";
        @job = run_command_timeout($globus_submit, $timeout);   # run job
        return @job;
    }
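The routine above hands the assembled command to `run_command_timeout`, whose body is not shown on the slides. A rough Python analogue of "run a command, trap its output, give up after a timeout" might look like this. It is a sketch of the idea only, and the error-line format is invented.

```python
import subprocess
import sys

def run_command_timeout(argv, timeout_seconds):
    """Run a command and return its output lines, or an error line on timeout."""
    try:
        done = subprocess.run(
            argv,
            capture_output=True,  # trap stdout and stderr
            text=True,            # decode bytes to str
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return ["ERROR: command timed out after %s seconds" % timeout_seconds]
    return done.stdout.splitlines()

# Usage: the Python interpreter stands in for a real job-submission command.
print(run_command_timeout([sys.executable, "-c", "print('job accepted')"], 10))
```

Passing the command as a list of arguments rather than one shell string avoids shell interpretation of user-supplied values such as the `args` field.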
48. Laboratory for Applied Pharmacokinetics (LAPK) Portal
- Users are doctors, so need an extremely simple interface
- Must be portable: run from many countries
- Need to hide details such as
- Type of resources (T3E), file storage, batch script details, compilation, UNIX command line
- Major success
- LAPK users can now run multiple jobs at one time using the portal
- Not possible before because developers had to keep codes and scripts simple enough for doctors to use on the T3E
49. Laboratory for Applied Pharmacokinetics
- Uses gridport.npaci.edu portal services/capabilities
- File upload/download between local host/portal/HPC systems
- Job Submit
- Submission (builds batch script, moves files to resource, submits jobs)
- Job tracking in the background: portal tracks jobs on system and moves results back over to portal storage when done
- Job cancel/delete
- Job History: maintains relevant job information
50. LAPK Job Submit and Job History
51. Portal Services for Remote Clients
52. Portal Services
- How does one convert/modify existing applications?
- Can develop your own version of GridPort
- Can install GridPort or other toolkits
- Remote clients are typically browsers accessing a local portal (HotPage)
- Application website located on ANY server
- Either on the local filespace/system where the webserver and GridPort toolkit are installed
- Or running on a remote machine
- Want to allow remote users to have control over access, display, interactions, etc.
- Need for a new service model
53. Remote Portal Services
- Must be Grid based
- Critical to support GSI
- 2 current solutions (GridPort)
- GridPort Client Toolkit
- Allows client to use simple HTML to build remote web pages; limited to HTML/FORMS, CGI/JSP model
- Grid Portal Web Services
- Supports variety of clients
- Can be an application
- Can be a portal server program
- Can be another web service
54. Remote Portal Running on Laptop
55. GridPort Client Toolkit
- Focus on medium/small applications and researchers
- Not all application scientists can afford to hire a web team
- Based on simple protocols (HTTPS/CGI/Perl)
- Could use applets or JSP
- Connection to portal services is through the GCT
- GridPort Client Toolkit
- https://portals.npaci.edu/client/tools/FUNCTIONS
- Inherits all existing portal services running on portal
- Limited job functions, but concept works and is needed
- An experiment in progress
- Not production yet
56. GridPort Client Toolkit
- Ease of use
- Do not have to install complex code to get started
- Webservers, no Globus, no SSH, no SSL, no PKI, etc.
- Do not have to write complex interface scripts to access these services (we've done that already)
- Do not have to fund advanced web development teams
- Client has local control over project, including filespace, etc.
- Integration with existing portals can be done
- Bays to Estuaries project
57. How Does GCT Work?
FORM/CGI action
user_job_submit.html
https://portals.npaci.edu/client/tools/auth/jobsubmit.cgi
Job_submit.html
HREF job_error.html
HREF user_job_submit.html
58. Services Implemented in GCT
- Authentication
- Login
- Logout
- Check authentication state
- Jobs
- Submit jobs to queues
- Cancel jobs
- Execute commands (command-line interface)
- Files
- Upload from local host
- Download to local host
- FTP move file
- View Portal filespace (?)
- Commands
- pwd
- cd
- whoami
- Etc.
59. Basin, Bays to Estuaries (BBE) Portal
- Community model scientific portal for conducting multi-model Earth System Science (ESS)
- Simulations are run to forecast the transport of sediments within the San Diego Bay area during a storm
- Technology developed for the BBE project
- Website located on BBE webserver/machine
- Uses SRB for file management (GSI)
- Perl/CGI
- Uses GCT for all interactive functions
- Minimal effort required to modify code
- Roughly 14 tests needed to integrate GCT
- Four new Perl scripts required
60. Basin, Bays to Estuaries (BBE) Portal
61. Grid Portals: the Problem
- Example: portals or applications need to perform grid tasks for any arbitrary user, on any arbitrary resource, and span all layers of the grid
- Portals must be aware of resources (use GIS)
- What grid services are running on that resource?
- Globus/Legion/VegaGrid/SSH, etc.
- GIS
- GSI/Kerberos, MyProxy
- Request syntax differs for each resource
- GRAM/Legion/SSH/MAUI/PBS/Others
- Portal must have permission to use/access for user (GSI, MyProxy)
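"Request syntax differs for each resource" means a portal needs one request builder per kind of resource plus a lookup that picks the right one. The sketch below shows that dispatch pattern only. The host names are invented, the `RESOURCES` table stands in for an information-service lookup, and `gram-submit-placeholder` with its `--cpus` flag is a made-up placeholder, not a real Globus command line.

```python
def build_gram_request(host, exe, cpus):
    # Placeholder syntax: NOT a real GRAM command line.
    return ["gram-submit-placeholder", host, "--cpus", str(cpus), exe]

def build_ssh_request(host, exe, cpus):
    # Plain "ssh host command"; the cpu count has no meaning here.
    return ["ssh", host, exe]

# Stand-in for what an information service would report per resource.
RESOURCES = {
    "hostA.example.org": "gram",
    "hostB.example.org": "ssh",
}

BUILDERS = {
    "gram": build_gram_request,
    "ssh": build_ssh_request,
}

def build_request(host, exe, cpus=1):
    """Pick the request syntax that matches the service on the target host."""
    service = RESOURCES.get(host)
    if service is None:
        raise LookupError("no known grid service on %s" % host)
    return BUILDERS[service](host, exe, cpus)

print(build_request("hostB.example.org", "/bin/hostname"))
```

Every portal that hard-codes tables like these repeats the same work, which is the scaling problem the following slides raise.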
62. GridPort Web Services
63. Grid Portals: Complexity Grows
- Growth of the Grid presents a huge complexity problem for developers and systems that does not scale
- Portals interact with/integrate all layers
- GUI/client interface
- Uses all middleware services (Globus, SRB, GSI-FTP, etc.)
- Each portal in the world must store and configure the same data
- Repeated data, open to errors, variations
- Multiple programmers repeating the same tasks and implementations
- Much portal software is hard-coded and not dynamic
- The Grid is international: need for scalable, interoperable services
- Too much hard-coding needed at this time (big issue for portals)
64. Web Services: a Proposed Solution
- Web services architecture provides mechanisms for
- Dynamic service discovery (UDDI)
- Separation of implementation from function (WSDL)
- Known protocol (SOAP/HTTP, SOAP/RPC)
- Service provider encapsulates implementation details
- Client does not need to know details, just where to send the request
- Challenge will be discovery (a problem with Jini/CORBA)
- Commercial world developing web services technologies in P2P world
- Implies funding/support
- Rapid development/technology advancement
- Caution: this does NOT imply cohesiveness or standards
- Note: in some ways, Globus/GRAM is a web service
- Advantage: language independent, so can run on any system
- We are pursuing Perl, Python, Java, C at this time
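To make "the client just needs to know where to send the request" concrete, here is a sketch that builds a SOAP 1.1 envelope for a job-submit call. The envelope namespace is the standard SOAP 1.1 one; the `submitJob` operation, its `urn:example:jobs` namespace, and the parameter names are hypothetical and do not describe the actual GridPort service interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
JOB_NS = "urn:example:jobs"                             # hypothetical service namespace

def build_job_request(mach, exe, cpus, queue):
    """Return a SOAP 1.1 envelope (as text) for a hypothetical submitJob call."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    operation = ET.SubElement(body, "{%s}submitJob" % JOB_NS)
    for name, value in (("mach", mach), ("exe", exe),
                        ("cpus", cpus), ("queue", queue)):
        ET.SubElement(operation, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

xml_text = build_job_request("SSPHN", "/rmount/paci/sdsc/mthomas/mpi_pi", 4, "normal")
print(xml_text)
```

The client would POST this text to the service URL over HTTP or HTTPS; how the service runs the job stays hidden behind the interface, which is the separation the slide describes.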
65. Proposed Web Service Architecture
- Adopt W3C standards for
- WSDL 1.1: current standard (note: uses old XML version)
- SOAP 1.1 over HTTP/HTTPS/GSI for authentication
- Java 1.3 or greater
- Python 2.0 or greater
- XML schemas for language/description
- Explore UDDI 2.0/WSIL
- Require
- GSI
- Adopt Anatomy of the Grid model
- Virtual organizations
- Portals to be built as services in addition to applications
66. Portals Web Services Architecture
67. Web Services Example: Job Submit
68. [Figure: layered Grid portal architecture]
- Layers: Clients; Application Portals; Grid Web Services; Grid Services (Collective and Resource Access); Resources
- Connections between layers: http, https, CORBA; XML / SOAP over Grid Security Infrastructure; Grid Protocols and Grid Security Infrastructure
- Interface and portal labels: Web Browser; Problem Solving Environments (AVS, SciRun, Cactus); Discipline / Application Specific Portals (e.g., SDSC TeleScience); Environment Management (LaunchPad, HotPage); composition frameworks
- Service labels: Job Submission / Control; File Transfer; Data Management; Monitoring; Events; Credential Management; Workflow Management; other services (visualization, interface builders, collaboration tools, numerical grid generators, etc.)
- Grid component labels: GRAM; Condor-G; SRB/MCAT; GridFTP; Replica Catalog / Management; Grid Monitoring Architecture; Grid X.509 Certification Authority; MPI; Secure, Reliable Group Comm.; Grid Information Service
- Implementation labels: Python, Java, Perl, etc., JSPs format to html; CoG Kits implementing Web Services in servlets, servers, etc.; Grid Web Service Description (WSDL) and Discovery (UDDI); Apache Tomcat, WebSphere, Cold Fusion, JVM servlet instantiation and routing; Apache SOAP, .NET, etc.
69. GridPort Team
- GridPort Project represents collaboration efforts spanning TACC, SDSC, NPACI
- Mary Thomas, Rich Toscano (TACC)
- Steve Mock, Maytal Dahan, Cathie Mills, Kurt Mueller (SDSC)
- And input from other institutions
- Argonne/ISI Globus development team
- NCSA/Alliance
- NASA/IPG
- GGF/GCE Interoperable Web Services Testbed
70. References
- www.w3c.org
- www.soaplite.com
- www.apache.org
- http://www-106.ibm.com/developerworks/webservices/
- GridPort Toolkit contact: Mary Thomas (mthomas@tacc.utexas.edu)
- https://gridport.npaci.edu
- HotPage User Portals
- https://hotpage.npaci.edu, https://hotpage.paci.org
- Downloads
- http://gridport.npaci.edu/download
- GridPort Toolkit, NPACI HotPage, GCT Portal (frames based)
- GGF/GCE website: http://www.computingportals.org