Title: Introduction to Grid Computing and the Globus Toolkit
1Introduction to Grid Computing and the Globus
Toolkit
- Jennifer M. Schopf
- Argonne National Lab
- National eScience Centre
3Some Background Questions
- How many people have heard of Web Services?
- How many people have heard of Grids or Grid Computing?
- How many people have heard of Globus?
- How many of you could explain any of these to your date at a dinner party?
4Overview
- What is a Grid
- What does the Globus Toolkit do?
- Security
- Monitoring
- Resource Management
- Data Management
- Example Application: Grid3
- Conclusions
5What is a Grid?
- Resource sharing
- Computers, storage, sensors, networks, ...
- Sharing is always conditional: issues of trust, policy, negotiation, payment, ...
- Coordinated problem solving
- Beyond client-server: distributed data analysis, computation, collaboration, ...
- Dynamic, multi-institutional virtual orgs
- Community overlays on classic org structures
- Large or small, static or dynamic
6Why call it a Grid?
- Model this approach after a power grid
- When I need to run a toaster, I don't care where the power comes from
- Coal, wind, dynamo, etc.
- I just plug my toaster into the wall, and I have power!
- Vision for Computational Grids
- I don't care where my compute cycles are coming from, I just need to run my application
- No, we're not there yet
7Not A New Idea
- Late 70s: Networked operating systems
- Late 80s: Distributed operating systems
- Early 90s: Heterogeneous computing
- Mid 90s: Metacomputing, or parallel distributed computing
- Then the Grid (Foster and Kesselman, 1999)
8Why is this hard/different?
- Lack of central control
- Cannot dictate what runs on a resource, how, or when
- Different policies at different sites
- Heterogeneity is everywhere
- Shared resources
- Contention, variability
- Communication
- Different sites imply different sys admins, users, institutional goals, and often strong personalities
9So why do it?
- Computations that need to be done within a time limit
- Data that can't fit on one site
- Data owned by multiple sites
- Applications that need to be run bigger, faster, more
10What Kinds of Applications?
- Computation intensive
- Interactive simulation (climate modeling, financial markets)
- Very large-scale simulation and analysis (galaxy formation, gravity waves, battlefield simulation, business models)
- Engineering (parameter studies, linked component models)
- Data intensive
- Experimental data analysis (high-energy physics)
- Image and sensor analysis (astronomy, climate study, ecology)
- Distributed collaboration
- Online instrumentation (microscopes, x-ray devices, etc.)
- Remote visualization (climate studies, biology)
- Engineering (large-scale structural testing, chemical engineering)
- In all cases, the problems were big enough that they required people in several organizations to collaborate and share computing resources, data, and instruments.
11How Do Grids Relate to Web Services? Both are Service-Oriented Architectures!
- Idea is simple (and old)
- Define remote activities in terms of interface and behavior, not implementation
- Devil is in the details
- How to describe, discover, and access various types of service (semantically and practically)
- Grids and Web services
- Broad adoption, flexible XML-based model
- Standards including WSDL, SOAP, WS-Security
- Interfaces still being defined to date
- Performance challenges
12Grid and Web Services Convergence
- The definition of WSRF means that the Grid and
Web services communities can move forward on a
common base.
13Summary of Introduction
- Applications have grown beyond what a single resource can handle
- Scientists are using Grids to address this need, but it's hard!
- The Globus Toolkit can add a level of indirection and some standard tools to help
- Grids are now using Web services and have broader acceptance because of this
15What Is the Globus Toolkit?
- Collection of solutions to common problems when building collaborative distributed applications.
- A set of basic Grid services
- Job submission/management
- File transfer (individual, queued)
- Database access
- Data management (replication, metadata)
- Monitoring/Indexing system information
- A Grid development environment for your own services
- Building blocks for WSRF-compliant Web Services, including security infrastructure
- Tools and Examples
- The prerequisites for many Grid community tools!
16Globus Is Standard Plumbing for the Grid
- Not turnkey solutions, but building blocks and tools for application developers and system integrators.
- Some components (e.g., file transfer) go farther than others (e.g., remote job submission) toward end-user relevance.
- Since these solutions exist and others are already using them (and they're free), it's easier to reuse than to reinvent.
- And compatibility with other Grid systems comes for free!
17How it Really Happens
[Diagram components: two compute servers, simulation tool, Web browser, Web portal, registration service, two cameras, telepresence monitor, data viewer tool, three database services, chat tool, data catalog, credential repository, certificate authority]
- Users work with client applications
- Application services organize VOs and enable access to other services
- Collective services aggregate and/or virtualize resources
- Resources implement standard access and management interfaces
18How it Really Happens (without the Globus Toolkit)
[Diagram: the same components as on the previous slide; the two compute servers carry the labels A and B, and the three database services carry the labels C, D, and E]
- Users work with client applications
- Application services organize VOs and enable access to other services
- Collective services aggregate and/or virtualize resources
- Resources implement standard access and management interfaces
19How it Really Happens (with the Grid)
[Diagram: the same components with Grid software filled in: Globus GRAM on both compute servers, Portal/CHEF as the Web portal, the Globus Index Service for registration, Globus OGSA-DAI on the three database services, the CHEF Chat Teamlet as the chat tool, Globus MCS/RLS as the data catalog, MyProxy as the credential repository, and a Certificate Authority; the simulation tool, Web browser, cameras, telepresence monitor, and data viewer tool are unchanged]
- Users work with client applications
- Application services organize VOs and enable access to other services
- Collective services aggregate and/or virtualize resources
- Resources implement standard access and management interfaces
20Why Grid Security is Hard
- Resources being used may be valuable and the problems being solved sensitive
- Resources are often located in distinct administrative domains
- Each resource has its own policies and procedures
- Set of resources used by a single computation may be large, dynamic, and unpredictable
- Not just client/server, requires delegation
- It must be broadly available and applicable
- Standard, well-tested, well-understood protocols integrated with a wide variety of tools
21Security Tools
- Grid Security is based on public key infrastructure
- Basic Grid Security Mechanisms
- Certificate Generation Tools
- Certificate Management Tools
- Getting users registered to use a Grid
- Getting Grid credentials to wherever they're needed in the system
- Authorization/Access Control Tools
- Storing and providing access to system-wide authorization information
22Basic Grid Security Mechanisms
- Globus Toolkit provides
- Grid-wide identities implemented as PKI certificates
- Transport-level and message-level authentication
- Ability to delegate credentials to agents
- Ability to map between Grid and local identities
- Local security administration and enforcement
- Single sign-on support implemented as proxies
- A plug-in framework for authorization decisions
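The mapping between Grid and local identities can be pictured as a lookup from a certificate subject name to a local account. Below is a minimal sketch, assuming a two-column mapping text (a quoted subject name followed by a local user name). The subject names, account names, and the helpers `parse_mapping` and `local_user` are invented for illustration and are not Globus Toolkit APIs.

```python
import shlex

# Hypothetical mapping text: quoted certificate subject, then a local account.
MAPPING_TEXT = '''
"/O=Grid/OU=Example/CN=Alice Smith" asmith
"/O=Grid/OU=Example/CN=Bob Jones" bjones
'''

def parse_mapping(text):
    """Return {subject_name: local_user} parsed from the mapping text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, user = shlex.split(line)  # shlex handles the quoted subject
        mapping[subject] = user
    return mapping

def local_user(mapping, subject):
    """Look up the local account for a Grid identity; None if unmapped."""
    return mapping.get(subject)

mapping = parse_mapping(MAPPING_TEXT)
print(local_user(mapping, "/O=Grid/OU=Example/CN=Alice Smith"))  # prints asmith
```

An unmapped identity returns `None`, which a site could treat as "access denied"; that policy choice is left to local administration in this sketch.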
23Basic Grid Security Mechanisms
- Basic security mechanisms are provided as libraries/classes and APIs.
- Integrated with other GT tools and services
- Integrated with many Grid community tools and services (and application systems)
- A few stand-alone tools are also included.
24A Cautionary Note
- Grid security mechanisms are tedious to set up.
- If exposed to users, hand-holding is usually required.
- These mechanisms can be hidden entirely from end users, but still used behind the scenes.
- These mechanisms exist for good reasons.
- Many useful things can be done without Grid security.
- It is unlikely that an ambitious project could go into production operation without security like this.
- Most successful projects end up using Grid security, but using it in ways that end users don't see much.
25Monitoring and Discovery Challenges
- Grid Information Service
- Requirements and characteristics
- Uniform, flexible access to information
- Scalable, efficient access to dynamic data
- Access to multiple information sources
- Decentralized maintenance
- Secure information provision
26Monitoring and Discovery Service in GT4 (MDS4)
- WS-RF compatible
- Monitoring of basic service data
- Primary use case is discovery of services
- Starting to be used for up/down statistics
27MDS4 Information Providers
- Code that generates resource property information
- Were called service data providers in GT3
- XML-based, not LDAP
- Basic cluster data
- Interfaces to Ganglia, Hawkeye
- GLUE schema
- Some service data from GT4 services
- Start, timeout, etc
- Soft-state registration
- Push and pull data models
28MDS4 Index Service
- Index Service is both registry and cache
- Subscribes to information providers
- Data, datatype, data provider information
- Caches last value of all data
- In-memory storage by default
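The "registry and cache" behavior, combined with the soft-state registration mentioned for information providers, can be sketched as a small in-memory table whose entries expire unless refreshed. This is an illustration under assumptions, not MDS4 code: the class `IndexCache`, the provider names, and the 60-second lifetime are all made up, and time is passed in explicitly to keep the example deterministic.

```python
class IndexCache:
    """Registry plus last-value cache; entries expire unless refreshed."""

    def __init__(self, lifetime):
        self.lifetime = lifetime  # seconds a registration stays valid
        self.entries = {}         # provider name -> (last value, expiry time)

    def update(self, provider, value, now):
        """Register or refresh a provider and cache its latest value."""
        self.entries[provider] = (value, now + self.lifetime)

    def query(self, now):
        """Drop expired registrations and return the cached last values."""
        self.entries = {
            p: (v, exp) for p, (v, exp) in self.entries.items() if exp > now
        }
        return {p: v for p, (v, _) in self.entries.items()}

index = IndexCache(lifetime=60)
index.update("cluster-a", {"free_cpus": 12}, now=0)
index.update("cluster-b", {"free_cpus": 3}, now=30)
index.update("cluster-a", {"free_cpus": 8}, now=50)  # newer value replaces the old one
print(index.query(now=100))  # cluster-b expired at 90; cluster-a is valid until 110
```

Only the last value per provider is kept, so a query never sees history, and a provider that stops refreshing simply disappears from the results.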
29MDS4 Trigger Service
- Compound consumer-producer service
- Subscribe to a set of resource properties
- Set of tests on incoming data streams to evaluate trigger conditions
- When a condition matches, email is sent to a pre-defined address
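The trigger pattern described here (test each incoming update, notify on a match) can be sketched in a few lines. This is a hypothetical illustration rather than the MDS4 Trigger Service interface: `make_trigger`, the `free_disk_gb` property, and the threshold are invented, and a list stands in for outgoing email.

```python
def make_trigger(name, test, notify):
    """Return a callback that runs `test` on each update and notifies on a match."""
    def on_update(properties):
        if test(properties):
            notify(f"Trigger '{name}' fired: {properties}")
    return on_update

sent = []  # stands in for the outgoing email queue
trigger = make_trigger("low-disk", lambda p: p["free_disk_gb"] < 10, sent.append)

# Simulated stream of resource property updates.
for update in [{"free_disk_gb": 50}, {"free_disk_gb": 4}]:
    trigger(update)

print(sent)  # one message, produced by the second update
```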
30The Resource Management Challenge
- Enabling secure, controlled remote access to heterogeneous computational resources and management of remote computation
- Authentication and authorization
- Resource discovery and characterization
- Reservation and allocation
- Computation monitoring and control
- Addressed by a set of protocols and services
- GRAM protocol as a basic building block
- Resource brokering and co-allocation services
- GSI for security, MDS for discovery
31GRAM - Basic Job Submission and Control Service
- A uniform service interface for remote job submission and control
- Includes file staging and I/O management
- Includes reliability features
- Supports basic Grid security mechanisms
- Available in pre-WS and WS versions
- GRAM is not a scheduler.
- No scheduling
- No metascheduling/brokering
- Often used as a front-end to schedulers, and often used to simplify metaschedulers/brokers
32Condor-G
- The Condor project has produced a helper front-end to GRAM
- Managing sets of subtasks
- Reliable front-end to GRAM to manage computational resources
- Note: this is not Condor, which promotes high-throughput computing and use of idle resources
33Data Management
[Diagram: an Application interacts with a Metadata Catalog, a Replica Catalog, and Replica Selection; MDS and NWS supply performance information and predictions; data moves over the GridFTP control channel and GridFTP data channel among three replica locations (disk caches, a tape library, a disk array). Labeled steps:]
(1) Attribute Specification
(2) Logical Collection and Logical File Name
(3) Log. Info
(4) Multiple Locations
(5) Selected Replica
(6) Phys Info
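The flow in the diagram (find the replicas of a logical file, then pick one using performance predictions) can be sketched as follows. This is a minimal illustration, not Globus code: the catalog contents, site names, predicted bandwidths, and the function `select_replica` are all invented for the example.

```python
# Hypothetical replica catalog: logical file name -> sites holding a copy.
replica_catalog = {"lfn://run42/events.dat": ["site1", "site2", "site3"]}

# Hypothetical bandwidth predictions (Mbit/s), standing in for the
# performance information a monitoring service would supply.
predicted_mbps = {"site1": 40.0, "site2": 95.0, "site3": 12.5}

def select_replica(logical_name, catalog, predictions):
    """Return the replica site with the highest predicted bandwidth."""
    locations = catalog[logical_name]
    # Sites without a prediction are treated as the slowest choice.
    return max(locations, key=lambda site: predictions.get(site, 0.0))

print(select_replica("lfn://run42/events.dat", replica_catalog, predicted_mbps))  # prints site2
```

Picking the single fastest site is only one possible policy; a real selector might also weigh load, cost, or access rights.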
34GridFTP
- A high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks
- FTP with well-defined extensions
- Uses basic Grid security (control and data channels)
- Multiple data channels for parallel transfers
- Partial file transfers
- Third-party (direct server-to-server) transfers
- Reusable data channels
- Command pipelining
- GGF recommendation GFD.20
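To make "multiple data channels" and "partial file transfers" concrete, here is a small sketch of how a file could be divided into byte ranges, one per channel. It only illustrates the idea of partitioning; it is not the block layout that GridFTP itself uses, and `byte_ranges` is a made-up helper.

```python
def byte_ranges(file_size, channels):
    """Split [0, file_size) into `channels` contiguous (offset, length) ranges."""
    base, extra = divmod(file_size, channels)
    ranges, offset = [], 0
    for i in range(channels):
        # The first `extra` ranges get one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges

print(byte_ranges(10, 4))  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Each (offset, length) pair could be fetched independently, which is also what makes restarting or requesting only part of a file possible.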
35Striped GridFTP Service
- A distributed GridFTP service that runs on a storage cluster
- Every node of the cluster is used to transfer data into/out of the cluster
- Head node coordinates transfers
- Multiple NICs/internal busses lead to very high performance
- Maximizes use of Gbit WANs
36RFT - File Transfer Queuing
- A WSRF service for queuing file transfer requests
- Server-to-server transfers
- Checkpointing for restarts
- Database back-end for failovers
- Allows clients to request transfers and then disappear
- No need to manage the transfer
- Status monitoring available if desired
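The combination of queuing and checkpointing can be sketched as a loop that records its progress after every completed transfer, so that a restarted service resumes where it stopped. This is an illustration under assumptions, not the RFT implementation: a JSON file stands in for the database back-end, the URLs are placeholders, and `run_queue` with its `fail_after` parameter (used to simulate a crash) is invented.

```python
import json
import os
import tempfile

def run_queue(transfers, state_path, copy, fail_after=None):
    """Process transfers in order, recording progress so a restart can resume."""
    done = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)["done"]  # resume from the last checkpoint
    for i in range(done, len(transfers)):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("simulated crash")
        copy(*transfers[i])
        with open(state_path, "w") as f:
            json.dump({"done": i + 1}, f)  # checkpoint after each transfer

# Placeholder source/destination pairs.
transfers = [
    ("gsiftp://a.example.org/f1", "gsiftp://b.example.org/f1"),
    ("gsiftp://a.example.org/f2", "gsiftp://b.example.org/f2"),
    ("gsiftp://a.example.org/f3", "gsiftp://b.example.org/f3"),
]
copied = []  # records which sources were "copied"
state = os.path.join(tempfile.mkdtemp(), "rft_state.json")

try:
    run_queue(transfers, state, lambda src, dst: copied.append(src), fail_after=2)
except RuntimeError:
    pass  # the service "crashed" after two transfers

run_queue(transfers, state, lambda src, dst: copied.append(src))  # restart
print(copied)  # each source appears exactly once
```

Because the state lives outside the client, the client can submit the list and disconnect, which is the point of the "request transfers and then disappear" bullet.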
37OGSA-DAI
- OGSA interface for accessing XML and relational data stores
- Implements the GGF DAIS WG standard (in progress)
Figure courtesy of Malcolm Atkinson and Rob Baxter, UK eScience Center
38Where is Globus Today
- Previous versions of the software are currently available and in use by hundreds of projects
- www.globus.org
- GT4 is what I've mostly talked about
- WS-RF based, latest standards
- Beta currently available, Final in April 2005
- GT2 software (pre-WS, mixed standards) is also included in the GT4 release
- Complete functionality
- Not interoperable
40How to Get Involved
- Become a GT4 Friend!
- Open group of people from various organizations working with GT4 pre-release code and documents
- Reporting problems in code and documents
- Contributing ideas, tests, documentation
- Building GT4-enabled applications
- Weekly telephone calls
- Discussion list
- To subscribe to the GT4 friends list, send an email to majordomo_at_globus.org containing the words "subscribe gt4-friends" in the message body
41General Globus Help and Support
- Globus-discuss list
- discuss_at_globus.org
- http://globus.org/about/contacts.html
- Bugzilla
- Bugzilla.globus.org
- GT4 Information
- gt4-friends_at_globus.org
- Weekly telecons for early testers
43Grid2003: An Operational Grid
- 28 sites (2100-2800 CPUs) and growing
- 400-1300 concurrent jobs
- 8 substantial applications and CS experiments
- Running since October 2003
[Map of participating sites; one site is labeled Korea]
http://www.ivdgl.org/grid2003
44Grid2003 Project Goals
- Ramp up U.S. Grid capabilities in anticipation of LHC experiment needs in 2005.
- Build, deploy, and operate a working Grid.
- Include all U.S. LHC institutions.
- Run real scientific applications on the Grid.
- Provide state-of-the-art monitoring services.
- Cover non-technical issues (e.g., SLAs) as well as technical ones.
- Unite the U.S. CS and Physics projects that are aimed at support for LHC.
- Common infrastructure
- Joint (collaborative) work
45Example Grid2003 Workflows
- Genome sequence analysis
- Sloan Digital Sky Survey
- Physics data analysis
46Grid2003 Components
- Security
- GT GSI, CAS, GSI-OpenSSH
- Monitoring
- GT MDS, MonALISA, Ganglia
- Job Submission
- GT GRAM, Condor-G, Chimera and Pegasus
- Data Tools
- GT GridFTP, GT RLS, GT MCS
47Grid2003 Metrics
48Grid2003 Summary
- Working Grid for a wide set of applications
- Joint effort between application scientists and computer scientists
- Globus software as a starting point, additions from other communities as needed
49What Should You Take Away From This Talk?
- Grids are a way to work between administrative domains
- The Globus Toolkit offers a starting point for building these applications
- Many applications both in science and business use these resources
- Much work still to be done in this area: many open research questions!
50Global Community
51For More Information
- Jennifer Schopf
- jms_at_mcs.anl.gov
- www.mcs.anl.gov/jms
- Globus Alliance
- www.globus.org
- Global Grid Forum
- www.ggf.org
2nd Edition www.mkp.com/grid2
53Globus Certificate Service
- An online service that issues low-quality GSI certificates
- Intended for people who want to experiment with Grid components that require certificates but do not have any other means of acquiring certificates.
- These certificates are not to be used on production systems.
- Not a true Certificate Authority (CA)
- No revoking or reissuing certificates
- No verification of identities
- The service itself is not especially secure.
54Simple CA
- A convenient method of setting up a certificate authority (CA).
- The Certificate Authority can then be used to issue certificates for users and services that work with GSI and WS-Security.
- Simple CA is intended for operators of small Grid testing environments and users who are not part of a larger Grid.
- Most production Grids will not accept certificates that are not signed by a well-known CA, so the certificates generated by Simple CA will usually not be sufficient to gain access to production services.
55MyProxy
- MyProxy is a remote service that stores user credentials.
- Users can request proxies for local use on any system on the network.
- Web Portals can request user proxies for use with back-end Grid services.
- Grid administrators can pre-load credentials in the server for users to retrieve when needed.
- Greatly simplifies certificate management!
56CAS - Community Authorization Service
- CAS allows resource providers to specify coarse-grained access control policies in terms of communities as a whole.
- Fine-grained access control is delegated to the community.
- Resource providers maintain ultimate authority over their resources (including per-user control and auditing) but are spared most day-to-day policy administration tasks.
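The two levels of policy can be sketched as two tables that must both allow a request. This is a simplified illustration of the idea, not the CAS protocol or API: the community name, users, actions, and the `allowed` function are invented.

```python
# Coarse-grained policy, set by the resource provider: what a community may do.
resource_policy = {"physics-vo": {"read", "write"}}

# Fine-grained policy, managed by the community: what each member may do.
community_policy = {
    ("physics-vo", "alice"): {"read", "write"},
    ("physics-vo", "bob"): {"read", "delete"},
}

def allowed(community, user, action):
    """A request succeeds only if both the provider and the community permit it."""
    coarse = resource_policy.get(community, set())
    fine = community_policy.get((community, user), set())
    return action in coarse and action in fine

print(allowed("physics-vo", "alice", "write"))  # True
print(allowed("physics-vo", "bob", "write"))    # False: the community did not grant it
print(allowed("physics-vo", "bob", "delete"))   # False: the provider never allowed it
```

The last case shows why the provider keeps ultimate authority: the community cannot hand out a right that the resource policy does not contain.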
57VOMS
- A community-level group membership system
- Database of user roles
- Administrative tools
- Client interface
- voms-proxy-init
- Uses client interface to produce an attribute certificate (instead of proxy) that includes roles and capabilities signed by the VOMS server
- Works with non-VOMS services, but gives more info to VOMS-aware services
- Allows VOs to centrally manage user roles
58Chimera Virtual Data
- Captures both logical and physical steps in a data analysis process.
- Transformations (logical)
- Derivations (physical)
- Builds a catalog.
- Results can be used to replay analysis.
- Generation of DAG (via Pegasus)
- Execution on Grid
- Catalog allows introspection of analysis process.
[Figure labels: Sloan Survey Data; Galaxy cluster size distribution]
59Pegasus Workflow Transformation
- Converts Abstract Workflow (AW) into Concrete Workflow (CW).
- Uses Metadata to convert user request to logical data sources
- Obtains AW from Chimera
- Uses replication data to locate physical files
- Delivers CW to DAGman
- Executes using Condor
- Publishes new replication and derivation data in RLS and Chimera (optional)
[Diagram components: Chimera Virtual Data Catalog, Metadata Catalog, DAGman, Replica Location Service, Condor, four compute servers, three storage systems]
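The abstract-to-concrete conversion can be sketched as replacing logical file names with physical locations and assigning each step to a site. This is a toy illustration of the idea only, not Pegasus code or its workflow format: the job names, file names, host names, and the `concretize` function are invented, and every job is placed on a single site for simplicity.

```python
# Abstract workflow: steps refer to files by logical name only.
abstract_workflow = [
    {"job": "extract", "inputs": ["raw.dat"], "output": "clusters.dat"},
    {"job": "histogram", "inputs": ["clusters.dat"], "output": "sizes.dat"},
]

# Hypothetical replica data: logical name -> physical location.
replicas = {"raw.dat": "gsiftp://storage1.example.org/data/raw.dat"}

def concretize(workflow, replicas, site):
    """Bind each step to a site and resolve its logical files to physical ones."""
    known = dict(replicas)  # physical locations known so far
    concrete = []
    for step in workflow:
        inputs = [known[name] for name in step["inputs"]]
        output = f"gsiftp://{site}/scratch/{step['output']}"
        known[step["output"]] = output  # later steps can consume this file
        concrete.append(
            {"job": step["job"], "site": site, "inputs": inputs, "output": output}
        )
    return concrete

for step in concretize(abstract_workflow, replicas, "compute1.example.org"):
    print(step["job"], step["inputs"], "->", step["output"])
```

The `known` table plays the role of the "new replication data" bullet: outputs produced along the way become locatable inputs for later steps.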
60MCS - Metadata Catalog Service
- A stand-alone metadata catalog service
- WSRF service interface
- Stores system-defined and user-defined attributes for logical files/objects
- Supports manipulation and query
- Integrated with OGSA-DAI
- OGSA-DAI provides metadata storage
- When run with OGSA-DAI, basic Grid authentication mechanisms are available
61RLS - Replica Location Service
- A distributed system for tracking replicated data
- Consistent local state maintained in Local Replica Catalogs (LRCs)
- Collective state with relaxed consistency maintained in Replica Location Indices (RLIs)
- Performance features
- Soft state maintenance of RLI state
- Compression of state updates
- Membership and partitioning information maintenance
- Note
- RLS (developed by Globus Alliance and the DataGrid Project) replaces earlier components in the Globus Toolkit 2.x.
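The split between consistent local catalogs and a loosely consistent index can be sketched with two small classes. This is an illustration of the concept, not the RLS interface: the class and method names, site names, and file names are invented, and state updates are sent by an explicit call rather than on a timer.

```python
class LocalReplicaCatalog:
    """Authoritative logical-to-physical mappings for one site."""

    def __init__(self, site):
        self.site = site
        self.mappings = {}  # logical name -> set of physical locations

    def add(self, logical, physical):
        self.mappings.setdefault(logical, set()).add(physical)

class ReplicaLocationIndex:
    """Knows which sites hold a logical name, as of their last update."""

    def __init__(self):
        self.sites = {}  # logical name -> set of site names

    def receive_update(self, lrc):
        """Absorb a state update listing the logical names an LRC holds."""
        for logical in lrc.mappings:
            self.sites.setdefault(logical, set()).add(lrc.site)

    def lookup(self, logical):
        return sorted(self.sites.get(logical, set()))

lrc_a = LocalReplicaCatalog("site-a")
lrc_b = LocalReplicaCatalog("site-b")
rli = ReplicaLocationIndex()

lrc_a.add("events.dat", "/data/a/events.dat")
rli.receive_update(lrc_a)
lrc_b.add("events.dat", "/data/b/events.dat")  # not yet reported to the index

print(rli.lookup("events.dat"))  # ['site-a']: the index lags behind site-b
rli.receive_update(lrc_b)
print(rli.lookup("events.dat"))  # ['site-a', 'site-b']
```

The first lookup shows the relaxed consistency: the index answers from its last received updates, and a client then asks the named sites' local catalogs for the authoritative physical locations.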