Title: Introduction to Grid Computing and the Globus Toolkit


1
Introduction to Grid Computing and the Globus Toolkit
  • Jennifer M. Schopf
  • Argonne National Lab
  • National eScience Centre

2
Overview
  • What is a Grid
  • What does the Globus Toolkit do?
  • Security
  • Resource Management
  • Data Management
  • Monitoring
  • Example Application: OSG
  • Conclusions

3
What is a Grid?
  • Resource sharing
    • Computers, storage, sensors, networks, …
    • Sharing always conditional: issues of trust,
      policy, negotiation, payment, …
  • Coordinated problem solving
    • Beyond client-server: distributed data analysis,
      computation, collaboration, …
  • Dynamic, multi-institutional virtual orgs
    • Community overlays on classic org structures
    • Large or small, static or dynamic

4
What is a Grid?
  • Many definitions, many differences, especially
    between academia and industry
    • Both use the buzzword to get funding
  • My definition:
    • Resource sharing
    • Coordinated problem solving
    • Dynamic, multi-institutional virtual orgs

5
Resource Sharing
  • Resources can be anything:
    • Computers
    • Storage/repositories
    • Sensors and networks
    • People and software
  • Local control of the resources, and local
    policies for their use
  • Sharing is always conditional
    • Issues of trust, policy
    • Negotiation and payment

6
Coordinated Problem Solving
  • Beyond client-server
    • Client-server defines a small set of
      well-understood interactions as the only ones
      that can take place
  • Actions in this space can include:
    • Distributed data analysis
    • Computation and visualization of results
    • Collaboration

7
Dynamic, Multi-institutional Virtual Organizations
  • Crossing administrative domains
    • No one has full control over the resources
    • Local policy, not global
    • Different local policy on different sites
  • Community overlays on classic organizational
    structures
  • Large or small, static or dynamic

8
Why is this hard/different?
  • Lack of central control
    • Cannot dictate what runs on a resource, how, or
      when
    • Different policies at different sites
  • Heterogeneity is everywhere
  • Shared resources
    • Contention, variability
  • Communication
    • Different sites imply different sysadmins,
      users, institutional goals, and often strong
      personalities

9
So why do it?
  • Computations that need to be done within a time
    limit
  • Data that can't fit on one site
  • Data owned by multiple sites
  • Applications that need to be run bigger, faster,
    more

10
What Kinds of Applications?
  • Computation intensive
    • Interactive simulation (climate modeling,
      financial markets)
    • Very large-scale simulation and analysis (galaxy
      formation, gravity waves, battlefield simulation,
      business models)
    • Engineering (parameter studies, linked models)
  • Data intensive
    • Experimental data analysis (high-energy physics)
    • Image and sensor analysis (astronomy, climate
      study, ecology)
  • Distributed collaboration
    • Online instrumentation (microscopes, x-ray
      devices, etc.)
    • Remote visualization (climate studies, biology)
    • Engineering (large-scale structural testing,
      chemical engineering)
  • All require people in several organizations to
    collaborate and share computing resources, data,
    and instruments

11
History
  • In the early 90s, Ian Foster (ANL, U-C) and Carl
    Kesselman (USC-ISI) enjoyed helping scientists
    apply distributed computing.
  • Opportunities seemed ripe for the picking.
  • Application of technology always uncovers new and
    interesting requirements.
  • Science is cool!
  • Big/Innovative science is even cooler!

12
What Types of Problems?
  • While helping to build/integrate a diverse
    range of applications, the same problems
    kept showing up over and over again:
    • Too hard to keep track of authentication data
      (ID/password) across institutions
    • Too many ways to submit jobs
    • Too many ways to store and access files and data
    • Too many ways to keep track of data
    • Too easy to leave dangling resources lying
      around (robustness)

13
What Was Needed
  • Solutions to common problems
  • Way to address heterogeneity
  • Way to use standards, or to help push standards
    forward
    • Without standards we can't have interoperability
  • Globus Toolkit was built to address this

14
Overview
  • What is a Grid
  • What does the Globus Toolkit do?
  • Security
  • Monitoring
  • Resource Management
  • Data Management
  • Example Application: Grid3
  • Conclusions

15
The Role of the Globus Toolkit
  • A collection of solutions to problems that come
    up frequently when building collaborative
    distributed applications
  • Heterogeneity
    • A focus, in particular, on overcoming
      heterogeneity for application developers
  • Standards
    • We capitalize on and encourage use of existing
      standards (IETF, W3C, OASIS, GGF)
    • GT also includes reference implementations of
      new/proposed standards in these organizations

16
Globus Is an Hourglass
  • Local sites have their own policies and installs:
    heterogeneity!
    • Queuing systems, monitors, network protocols, etc.
  • Globus unifies
    • Build on Web services
    • Use WS-RF, WS-Notification to represent/access
      state
    • Common management abstractions and interfaces
[Figure: hourglass diagram, labeled Higher-Level Services and Users at the top, Standard GT4 Interfaces at the narrow waist, and Local heterogeneity at the bottom]
17
What Is the Globus Toolkit?
  • Collection of solutions to common problems when
    building collaborative distributed applications
  • A set of basic Grid services
    • Job submission/management
    • File transfer (individual, queued)
    • Database access
    • Data management (replication, metadata)
    • Monitoring/Indexing system information
  • A Grid development environment for your own
    services
    • Building blocks for WSRF-compliant Web Services,
      including security infrastructure
  • Tools and Examples
  • The prerequisites for many Grid community tools!

18
Globus Is Standard Plumbing for the Grid
  • Not turnkey solutions, but building blocks and
    tools for application developers and system
    integrators.
    • Some components (e.g., file transfer) go farther
      than others (e.g., remote job submission) toward
      end-user relevance.
  • Since these solutions exist and others are
    already using them (and they're free), it's
    easier to reuse than to reinvent.
    • And compatibility with other Grid systems comes
      for free!

19
How It Really Happens
[Figure: layered diagram of a virtual organization's components: Web Browser, Web Portal, Simulation Tool, Data Viewer Tool, Chat Tool, Telepresence Monitor, Registration Service, Data Catalog, Credential Repository, Certificate Authority, Compute Servers, Cameras, and Database services]
  • Resources implement standard access and management
    interfaces
  • Collective services aggregate and/or virtualize
    resources
  • Users work with client applications
  • Application services organize VOs and enable access
    to other services
20
How It Really Happens (without the Globus Toolkit)
[Figure: the same layered diagram as the previous slide, with the two Compute Servers and the three Database services labeled A through E]
21
How It Really Happens (with the Grid)
[Figure: the same layered diagram, now built from Grid components: Globus GRAM in front of each Compute Server, Portal/CHEF as the Web Portal, the Globus Index Service as the Registration Service, Globus OGSA-DAI in front of each Database service, the CHEF Chat Teamlet as the Chat Tool, Globus MCS/RLS as the Data Catalog, MyProxy as the Credential Repository, plus a Certificate Authority]
22
Globus Toolkit V4.0
  • Released in April 2005
  • Previous fifteen months spent on design,
    development, and testing
  • 1.8M lines of code
  • Major contributions from five institutions
  • Hundreds of millions of service calls executed
    over weeks of continuous operation
  • Significant improvements over GT3 code base in
    all dimensions

23
Our Goals for GT4
  • Usability, reliability, scalability, …
    • Web service components have quality equal or
      superior to pre-WS components
    • Documentation at acceptable quality level
  • Consistency with latest standards (WS-*, WSRF,
    WS-N, etc.) and Apache platform
    • WS-I Basic (Security) Profile compliant
  • New components, platforms, languages
    • And links to larger Globus ecosystem

24
Open Source/Open Standards
  • WSRF developed in collaboration with IBM
    • Currently in OASIS process
  • Contributions to Apache for
    • WS-Security
    • WS-Addressing
    • Axis
    • Apollo (WSRF)
    • Hermes (WS-Notification)

25
WSRF vs XML/SOAP
  • The definition of WSRF means that the Grid and
    Web services communities can move forward on a
    common base
  • Why not just use XML/SOAP?
    • WSRF and WS-N are just XML and SOAP
    • WSRF and WS-N are just Web services
  • Benefits of following the specs
    • These patterns represent best practices that have
      been learned in many Grid applications
    • There is a community behind them
    • Why reinvent the wheel?
    • Standards facilitate interoperability
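To make "just XML and SOAP" concrete, here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` to build a SOAP 1.1 envelope asking a WS-Resource for one property. Assumptions: the WS-ResourceProperties namespace is the one from the final OASIS WSRF 1.2 specification (GT4-era services may expect an earlier draft namespace), the property `freeCpus` and its namespace are hypothetical, and the WS-Addressing headers that identify the target resource are omitted.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
WSRF_RP_NS = "http://docs.oasis-open.org/wsrf/rp-2"    # assumption: WSRF 1.2 final namespace
APP_NS = "http://example.org/cluster"                  # hypothetical property namespace

ET.register_namespace("soapenv", SOAP_NS)
ET.register_namespace("wsrf-rp", WSRF_RP_NS)


def get_resource_property_request(prop: str) -> str:
    """Build a SOAP envelope asking for one resource property of a WS-Resource."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    # WS-Addressing headers naming the target resource would go in the Header.
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{WSRF_RP_NS}}}GetResourceProperty")
    # The request's content is a QName, so its prefix must be declared here;
    # ElementTree only declares prefixes it sees in tag or attribute names.
    request.set("xmlns:app", APP_NS)
    request.text = f"app:{prop}"
    return ET.tostring(envelope, encoding="unicode")


print(get_resource_property_request("freeCpus"))
```

Nothing in the message is Grid-specific plumbing: it is an ordinary SOAP envelope whose body element happens to follow the WS-ResourceProperties pattern.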

26
Globus Toolkit: Open Source Grid Infrastructure
[Figure: Globus Toolkit v4 components (www.globus.org), grouped by area]
  • Security: Authentication & Authorization,
    Delegation, Community Authorization, Credential Mgmt
  • Data Mgmt: GridFTP, Reliable File Transfer, Replica
    Location, Data Replication, Data Access & Integration
  • Execution Mgmt: Grid Resource Allocation &
    Management, Community Scheduling Framework,
    Workspace Management, Grid Telecontrol Protocol
  • Info Services: Index, Trigger, WebMDS
  • Common Runtime: Java Runtime, C Runtime, Python
    Runtime
27
Why Grid Security is Hard
  • Resources being used may be valuable and the
    problems being solved sensitive
  • Resources are often located in distinct
    administrative domains
    • Each resource has its own policies and procedures
  • Set of resources used by a single computation may
    be large, dynamic, and unpredictable
    • Not just client/server; requires delegation
  • It must be broadly available and applicable
    • Standard, well-tested, well-understood protocols
      integrated with a wide variety of tools

28
Security Tools
  • Grid security is based on public key
    infrastructure
  • Basic Grid Security Mechanisms
  • Certificate Generation Tools
  • Certificate Management Tools
    • Getting users registered to use a Grid
    • Getting Grid credentials to wherever they're
      needed in the system
  • Authorization/Access Control Tools
    • Storing and providing access to system-wide
      authorization information
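One common form of such system-wide authorization information is the grid-mapfile that Globus sites have used to map certificate subjects to local accounts. Below is a minimal sketch, assuming entries take the form of a quoted subject DN followed by comma-separated account names. The function names and sample DNs are invented, treating the first account as the default is an assumption, and real deployments have further syntax and tooling that this ignores.

```python
from __future__ import annotations

import shlex


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Parse lines of the form: "<certificate subject DN>" account[,account...]"""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps the quoted DN (which contains spaces) as one token.
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError(f"malformed entry: {raw!r}")
        dn, accounts = parts
        mapping.setdefault(dn, []).extend(a for a in accounts.split(",") if a)
    return mapping


def local_account(mapping: dict[str, list[str]], dn: str,
                  requested: str | None = None) -> str:
    """Authorize a Grid identity and choose the local account it runs as."""
    accounts = mapping.get(dn)
    if not accounts:
        raise PermissionError(f"no local account for {dn}")
    if requested is None:
        return accounts[0]  # assumption: first listed account is the default
    if requested not in accounts:
        raise PermissionError(f"{dn} may not use account {requested}")
    return requested


GRIDMAP = """
# certificate subject                     local account(s)
"/O=Grid/OU=Example/CN=Jane Doe"          jdoe
"/O=Grid/OU=Example/CN=John Smith"        jsmith,guest
"""
table = parse_gridmap(GRIDMAP)
print(local_account(table, "/O=Grid/OU=Example/CN=John Smith"))  # jsmith
```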

29
Basic Grid Security Mechanisms
  • Globus Toolkit provides
    • Grid-wide identities implemented as PKI
      certificates
    • Transport-level and message-level authentication
    • Ability to delegate credentials to agents
    • Ability to map between Grid and local identities
    • Local security administration and enforcement
    • Single sign-on support implemented as proxies
    • A plug-in framework for authorization decisions
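Single sign-on and delegation both rest on proxy certificates. The user's long-lived certificate signs a short-lived proxy, and that proxy can in turn sign further proxies that are handed to remote agents. The toy model below checks only chain structure and lifetimes. It assumes the legacy Globus convention that a proxy's subject is its issuer's subject plus a `/CN=proxy` component; RFC 3820 proxies use a different naming rule. Real GSI validation also verifies signatures, key usage, and proxy policy. All class and function names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Credential:
    subject: str          # distinguished name of this certificate
    issuer: str           # distinguished name of whoever signed it
    not_after: datetime
    is_proxy: bool


def issue_proxy(parent: Credential, lifetime: timedelta, now: datetime) -> Credential:
    """Derive a short-lived proxy signed by `parent` (signatures not modeled)."""
    # A proxy never outlives the credential that signed it.
    expiry = min(parent.not_after, now + lifetime)
    return Credential(parent.subject + "/CN=proxy", parent.subject, expiry, True)


def effective_identity(chain: list[Credential], trusted_cas: set[str],
                       now: datetime) -> str:
    """Walk chain[0] (user certificate) through its proxies; return the user DN."""
    if not chain:
        raise PermissionError("empty chain")
    user = chain[0]
    if user.is_proxy or user.issuer not in trusted_cas:
        raise PermissionError("chain does not start at a CA-issued user certificate")
    for parent, child in zip(chain, chain[1:]):
        if not child.is_proxy or child.issuer != parent.subject:
            raise PermissionError(f"{child.subject} was not issued by {parent.subject}")
        if not child.subject.startswith(parent.subject + "/CN="):
            raise PermissionError("proxy subject must extend its issuer's subject")
    if any(c.not_after <= now for c in chain):
        raise PermissionError("a credential in the chain has expired")
    return user.subject


CA = "/O=Grid/CN=Example CA"
start = datetime(2005, 4, 1, 9, 0)
user_cert = Credential("/O=Grid/OU=Example/CN=Jane Doe", CA,
                       start + timedelta(days=365), False)
login_proxy = issue_proxy(user_cert, timedelta(hours=12), start)  # single sign-on
job_proxy = issue_proxy(login_proxy, timedelta(hours=2), start)   # delegated to a job
print(effective_identity([user_cert, login_proxy, job_proxy], {CA}, start))
```

The remote job acts with the user's identity, but only for as long as the shortest-lived proxy in its chain allows.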

30
Other Security Services Include
  • MyProxy
    • Simplified credential management
    • Web portal integration
    • Single-sign-on support
  • KCA and kx.509
    • Bridging into/out-of Kerberos domains
  • SimpleCA
    • Online credential generation
  • PERMIS
    • Authorization service callout

31
A Cautionary Note
  • Grid security mechanisms are tedious to set up.
    • If exposed to users, hand-holding is usually
      required.
    • These mechanisms can be hidden entirely from end
      users, but still used behind the scenes.
  • These mechanisms exist for good reasons.
    • Many useful things can be done without Grid
      security.
    • It is unlikely that an ambitious project could go
      into production operation without security like
      this.
  • Most successful projects end up using Grid
    security, but using it in ways that end users
    don't see much.

32
The Resource Management Challenge
  • Enabling secure, controlled remote access to
    heterogeneous computational resources and
    management of remote computation
    • Authentication and authorization
    • Resource discovery and characterization
    • Reservation and allocation
    • Computation monitoring and control
  • Addressed by a set of protocols and services
    • GRAM protocol as a basic building block
    • Resource brokering and co-allocation services
    • GSI for security, MDS for discovery

33
GRAM - Basic Job Submission and Control Service
  • A uniform service interface for remote job
    submission and control
    • Includes file staging and I/O management
    • Includes reliability features
    • Supports basic Grid security mechanisms
    • Available in Pre-WS and WS
  • GRAM is not a scheduler.
    • No scheduling
    • No metascheduling/brokering
    • Often used as a front-end to schedulers, and
      often used to simplify metaschedulers/brokers
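Because GRAM fronts local schedulers rather than scheduling itself, its role can be pictured as an adapter: one scheduler-neutral description goes in, and scheduler-specific submission text comes out. In this sketch the `JobSpec` type and the adapter functions are hypothetical, not GRAM's interface. Only the emitted PBS directives and HTCondor submit-file keywords follow those systems' common syntax, and a real adapter would also submit the job and track its status.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Hypothetical scheduler-neutral job description."""
    name: str
    executable: str
    arguments: list[str] = field(default_factory=list)
    count: int = 1
    walltime_minutes: int = 10
    stdout: str = "job.out"


def to_pbs(job: JobSpec) -> str:
    hours, minutes = divmod(job.walltime_minutes, 60)
    command = " ".join(shlex.quote(p) for p in [job.executable, *job.arguments])
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N {job.name}",
        f"#PBS -l nodes={job.count}",
        f"#PBS -l walltime={hours:02d}:{minutes:02d}:00",
        f"#PBS -o {job.stdout}",
        command,
    ]) + "\n"


def to_condor(job: JobSpec) -> str:
    # "queue N" starts N independent copies, which is not the same as PBS
    # nodes=N; a real adapter has to reconcile semantic gaps like this one.
    # Arguments containing spaces would need Condor's quoting rules (omitted).
    return "\n".join([
        "universe   = vanilla",
        f"executable = {job.executable}",
        f"arguments  = {' '.join(job.arguments)}",
        f"output     = {job.stdout}",
        f"queue {job.count}",
    ]) + "\n"


ADAPTERS = {"pbs": to_pbs, "condor": to_condor}


def render_submission(job: JobSpec, scheduler: str) -> str:
    """Single entry point, whatever local scheduler sits behind it."""
    try:
        adapter = ADAPTERS[scheduler]
    except KeyError:
        raise ValueError(f"no adapter for scheduler {scheduler!r}") from None
    return adapter(job)


hello = JobSpec("hello", "/bin/echo", ["hi", "grid"], walltime_minutes=90)
print(render_submission(hello, "pbs"))
```

Supporting another scheduler means adding one function to `ADAPTERS`, which is the sense in which a uniform front end hides local heterogeneity from callers.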

34
Execution Management (GRAM)
  • Common WS interface to schedulers
    • Unix, Condor, LSF, PBS, SGE, …
  • More generally, interface for process execution
    management
    • Lay down execution environment
    • Stage data
    • Monitor and manage lifecycle
    • Kill it, clean up
  • A basis for application-driven provisioning
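The lifecycle bullets above (stage data, monitor, kill, clean up) suggest a state machine. The state names below loosely follow those reported by GT4 WS-GRAM. The transition table is a simplification assumed for illustration, as is the choice that a killed job passes through CleanUp and ends in Failed.

```python
from enum import Enum


class JobState(Enum):
    UNSUBMITTED = "Unsubmitted"
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    SUSPENDED = "Suspended"
    STAGE_OUT = "StageOut"
    CLEAN_UP = "CleanUp"
    DONE = "Done"
    FAILED = "Failed"


S = JobState
# Simplified normal-path transitions (our assumption, not GRAM's exact table).
TRANSITIONS = {
    S.UNSUBMITTED: {S.STAGE_IN, S.PENDING},   # stage-in is optional
    S.STAGE_IN: {S.PENDING},
    S.PENDING: {S.ACTIVE},
    S.ACTIVE: {S.SUSPENDED, S.STAGE_OUT, S.CLEAN_UP},
    S.SUSPENDED: {S.ACTIVE},
    S.STAGE_OUT: {S.CLEAN_UP},
    S.CLEAN_UP: {S.DONE},
}
TERMINAL = {S.DONE, S.FAILED}


class Job:
    def __init__(self) -> None:
        self.history = [S.UNSUBMITTED]

    @property
    def state(self) -> JobState:
        return self.history[-1]

    def advance(self, new: JobState) -> None:
        if self.state in TERMINAL:
            raise ValueError(f"job already finished ({self.state.value})")
        # A fault may be reported from any non-terminal state.
        if new is not S.FAILED and new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        self.history.append(new)

    def kill(self) -> None:
        """Cancel the job: clean-up always runs, then the job ends as Failed."""
        if self.state in TERMINAL:
            raise ValueError("cannot kill a finished job")
        if self.state is not S.CLEAN_UP:
            self.history.append(S.CLEAN_UP)
        self.history.append(S.FAILED)


job = Job()
for step in (S.STAGE_IN, S.PENDING, S.ACTIVE, S.STAGE_OUT, S.CLEAN_UP, S.DONE):
    job.advance(step)
print(" -> ".join(s.value for s in job.history))
# Unsubmitted -> StageIn -> Pending -> Active -> StageOut -> CleanUp -> Done
```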

35
GT4 GRAM
  • 2nd-generation WS implementation
    optimized for performance, stability,
    scalability
  • Streamlined critical path
    • Use only what you need
  • Flexible credential management
    • Credential cache and delegation service
  • GridFTP and RFT used for data operations
    • Data staging and streaming output
    • Eliminates redundant GASS code
  • Single and multi-job support

37
GT4 Data Management
  • Stage/move large data to/from nodes
    • GridFTP, Reliable File Transfer (RFT)
    • Alone, and integrated with GRAM
  • Locate data of interest
    • Replica Location Service (RLS)
  • Replicate data for performance/reliability
    • Distributed Replication Service (DRS)
  • Provide access to diverse data sources
    • File systems, parallel file systems, hierarchical
      storage: GridFTP
    • Databases: OGSA-DAI

38
Data Management
[Figure: data-access workflow linking an Application, a Metadata Catalog, a Replica Catalog, Replica Selection, MDS, and NWS (performance information and predictions), with numbered steps (1) attribute specification, (2) logical collection and logical file name, (3) logical info, (4) multiple locations, (5) selected replica, (6) physical info; data moves over GridFTP control and data channels from Replica Locations 1 to 3 (disk caches, a tape library, a disk array)]
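The numbered flow on this slide can be sketched as two lookups and a choice: attributes to logical file names, logical names to physical replicas, and then the replica with the best predicted performance. The dictionaries stand in for the Metadata Catalog, the Replica Catalog, and NWS-style bandwidth forecasts. All names, URLs, and numbers are invented, the step numbers in the comments map only loosely onto the figure, and picking the highest predicted bandwidth is just one possible policy.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Stand-ins for the Metadata Catalog, Replica Catalog and NWS-style forecasts.
METADATA_CATALOG = {
    "run42.dat": {"experiment": "cms", "year": 2004},
    "run43.dat": {"experiment": "cms", "year": 2005},
    "sky17.fits": {"experiment": "sdss", "year": 2004},
}
REPLICA_CATALOG = {
    "run42.dat": ["gsiftp://tape.site-a.example/store/run42.dat",
                  "gsiftp://disk.site-b.example/cache/run42.dat"],
    "run43.dat": ["gsiftp://tape.site-a.example/store/run43.dat",
                  "gsiftp://array.site-c.example/data/run43.dat"],
    "sky17.fits": ["gsiftp://array.site-c.example/data/sky17.fits"],
}
PREDICTED_MBPS = {  # made-up bandwidth predictions per storage host
    "tape.site-a.example": 5.0,
    "disk.site-b.example": 80.0,
    "array.site-c.example": 40.0,
}


def find_logical_files(**attributes) -> list[str]:
    """Roughly steps (1)-(2): attribute query -> logical file names."""
    return sorted(lfn for lfn, meta in METADATA_CATALOG.items()
                  if all(meta.get(k) == v for k, v in attributes.items()))


def select_replica(lfn: str) -> str:
    """Roughly steps (3)-(5): logical name -> all locations -> best prediction."""
    locations = REPLICA_CATALOG.get(lfn)
    if not locations:
        raise LookupError(f"no replicas registered for {lfn}")
    return max(locations,
               key=lambda url: PREDICTED_MBPS.get(urlsplit(url).hostname, 0.0))


for name in find_logical_files(experiment="cms"):
    print(name, "->", select_replica(name))
```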
39
GridFTP
  • A high-performance, secure, reliable data
    transfer protocol optimized for high-bandwidth
    wide-area networks
    • FTP with well-defined extensions
    • Uses basic Grid security (control and data
      channels)
    • Multiple data channels for parallel transfers
    • Partial file transfers
    • Third-party (direct server-to-server) transfers
    • Reusable data channels
    • Command pipelining
  • GGF recommendation GFD.20
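Partial file transfers and multiple data channels combine naturally: split a file into byte ranges and move each range on its own stream. The sketch below simulates this locally with threads, each copying one range between two files at the same offset. It illustrates the idea only and does not implement the GridFTP protocol, which carries parallel data over separate TCP connections using extended block mode. The default of 4 streams is arbitrary.

```python
from __future__ import annotations

import os
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    """Split [0, size) into at most `streams` contiguous (offset, length) pieces."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def copy_range(src: str, dst: str, offset: int, length: int) -> int:
    """One 'stream': a partial transfer of bytes [offset, offset + length)."""
    # Each stream opens its own file handles, loosely like its own data channel.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        data = fin.read(length)  # whole range held in memory: fine for a sketch
        fout.seek(offset)
        fout.write(data)
    return len(data)


def parallel_copy(src: str, dst: str, streams: int = 4) -> int:
    """Copy src to dst using `streams` concurrent partial transfers."""
    size = os.path.getsize(src)
    ranges = byte_ranges(size, streams)
    with open(dst, "wb") as f:
        f.truncate(size)  # pre-size so every stream can write at its own offset
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(lambda r: copy_range(src, dst, *r), ranges))
```

For example, `byte_ranges(10, 3)` yields `[(0, 4), (4, 3), (7, 3)]`. Because every range is self-describing (offset plus length), pieces can arrive in any order, which is also what makes restarting a partial transfer possible.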

40
Striped GridFTP Service
  • A distributed GridFTP service that runs on a
    storage cluster
    • Every node of the cluster is used to transfer
      data into/out of the cluster
    • Head node coordinates transfers
  • Multiple NICs/internal buses lead to very high
    performance
    • Maximizes use of Gbit WANs

41
RFT - File Transfer Queuing
  • A GT4 web service for queuing file transfer
    requests
    • Server-to-server transfers
    • Checkpointing for restarts
    • Database back-end for failovers
  • Allows clients to request transfers and then
    disappear
    • No need to manage the transfer
    • Status monitoring available if desired
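RFT's pattern of a persistent request queue plus checkpointed restarts can be sketched with Python's built-in `sqlite3` standing in for the database back-end. Each request is a row, progress is committed after every chunk, and a restarted worker resumes from the last committed offset. The schema, the chunk size, and the `transfer_chunk` callback are hypothetical; the real RFT drives GridFTP servers rather than a local callback.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS transfers (
    id         INTEGER PRIMARY KEY,
    source     TEXT NOT NULL,
    dest       TEXT NOT NULL,
    size       INTEGER NOT NULL,
    done_bytes INTEGER NOT NULL DEFAULT 0,
    status     TEXT NOT NULL DEFAULT 'pending'
)"""


def open_queue(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def enqueue(db, source, dest, size):
    """A client submits a request and may then disconnect."""
    with db:  # commits on success
        cur = db.execute(
            "INSERT INTO transfers (source, dest, size) VALUES (?, ?, ?)",
            (source, dest, size))
    return cur.lastrowid


def run_queue(db, transfer_chunk, chunk=1024, max_chunks=None):
    """Work through unfinished requests; max_chunks simulates a crash."""
    moved = 0
    rows = db.execute(
        "SELECT id, source, dest, size, done_bytes FROM transfers "
        "WHERE status != 'done' ORDER BY id").fetchall()
    for tid, src, dst, size, done in rows:
        while done < size:
            if max_chunks is not None and moved >= max_chunks:
                return False  # "crash": only committed progress survives
            n = min(chunk, size - done)
            transfer_chunk(src, dst, done, n)
            done += n
            moved += 1
            with db:  # checkpoint after every chunk
                db.execute("UPDATE transfers SET done_bytes = ?, status = 'active' "
                           "WHERE id = ?", (done, tid))
        with db:
            db.execute("UPDATE transfers SET status = 'done' WHERE id = ?", (tid,))
    return True


db = open_queue()
request = enqueue(db, "gsiftp://a.example/f", "gsiftp://b.example/f", 2500)
log = []
run_queue(db, lambda s, d, off, n: log.append((off, n)), chunk=1000, max_chunks=2)
run_queue(db, lambda s, d, off, n: log.append((off, n)), chunk=1000)  # restart
print(log)  # [(0, 1000), (1000, 1000), (2000, 500)]
```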

42
Replica Location Service
  • Identify location of files via a logical to
    physical name map
  • Distributed indexing of names, fault-tolerant
    update protocols
  • GT4 version scalable and stable
    • Managing 40 million files across 10 sites
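The distributed index behind RLS has two tiers. Local Replica Catalogs (LRCs) hold a site's logical-to-physical mappings, and a Replica Location Index (RLI) records which LRCs know each logical name, kept current by soft-state updates that expire unless refreshed. The sketch models that structure. The class names echo the RLS component names, but the code, site names, timeout, and timestamps are our own, and real LRCs can send compressed (Bloom filter) summaries instead of full name lists.

```python
from __future__ import annotations


class LocalReplicaCatalog:
    """LRC: logical -> physical mappings held at one site."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.mappings: dict[str, set[str]] = {}

    def add(self, lfn: str, pfn: str) -> None:
        self.mappings.setdefault(lfn, set()).add(pfn)

    def lookup(self, lfn: str) -> list[str]:
        return sorted(self.mappings.get(lfn, ()))


class ReplicaLocationIndex:
    """RLI: which sites' LRCs claim to know each logical name."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.updates: dict[str, tuple[float, frozenset[str]]] = {}

    def soft_state_update(self, lrc: LocalReplicaCatalog, now: float) -> None:
        # The LRC periodically resends its full list of logical names.
        self.updates[lrc.site] = (now, frozenset(lrc.mappings))

    def sites_for(self, lfn: str, now: float) -> list[str]:
        # Updates older than the timeout are ignored: soft state expires.
        return sorted(site for site, (sent, names) in self.updates.items()
                      if now - sent <= self.timeout and lfn in names)


def locate(lfn, rli, catalogs, now):
    """Ask the index which sites know `lfn`, then query only those LRCs."""
    found = []
    for site in rli.sites_for(lfn, now):
        found.extend(catalogs[site].lookup(lfn))
    return found


site_a, site_b = LocalReplicaCatalog("site-a"), LocalReplicaCatalog("site-b")
site_a.add("lfn:run42", "gsiftp://a.example/data/run42")
site_b.add("lfn:run42", "file:///archive/run42")
catalogs = {lrc.site: lrc for lrc in (site_a, site_b)}
rli = ReplicaLocationIndex(timeout=600)
for lrc in (site_a, site_b):
    rli.soft_state_update(lrc, now=0)
print(locate("lfn:run42", rli, catalogs, now=60))
```

If a site stops sending updates, its entries quietly age out of the index instead of pointing clients at a catalog that may be gone.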

43
OGSA-DAI
  • Web service interface for accessing XML and
    relational data stores
  • Implements the GGF DAIS WG standard (in progress)

Figure courtesy of Malcolm Atkinson and Rob
Baxter, UK eScience Center
45
Monitoring and Discovery System (MDS4)
  • Grid-level monitoring system used most often for
    resource selection
    • Aid user/agent to identify host(s) on which to
      run an application
  • Uses standard interfaces to provide publishing of
    data, discovery, and data access, including
    subscription/notification
    • WS-ResourceProperties, WS-BaseNotification,
      WS-ServiceGroup
  • Functions as an hourglass to provide a common
    interface to lower-level monitoring tools
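Resource selection, the main use named above, comes down to filtering host records on hard requirements and ranking the rest. The sketch assumes the records have already been fetched from an index. The `HostInfo` fields only loosely echo the kinds of cluster and queue attributes MDS publishes, and the ranking rule is an arbitrary example, not an MDS4 query API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HostInfo:
    name: str
    os_name: str
    free_cpus: int
    memory_mb: int
    load: float


def select_hosts(hosts: list[HostInfo], cpus: int, memory_mb: int,
                 os_name: str, limit: int = 1) -> list[str]:
    """Filter on hard requirements, then rank what is left."""
    fits = [h for h in hosts
            if h.os_name == os_name and h.free_cpus >= cpus
            and h.memory_mb >= memory_mb]
    # Most free CPUs first, then lowest load; the name makes ties deterministic.
    fits.sort(key=lambda h: (-h.free_cpus, h.load, h.name))
    return [h.name for h in fits[:limit]]


hosts = [
    HostInfo("a.example", "Linux", 8, 4096, 0.5),
    HostInfo("b.example", "Linux", 16, 1024, 0.1),   # too little memory
    HostInfo("c.example", "Linux", 8, 8192, 0.2),
    HostInfo("d.example", "AIX", 32, 16384, 0.0),    # wrong OS
]
print(select_hosts(hosts, cpus=4, memory_mb=2048, os_name="Linux", limit=2))
# ['c.example', 'a.example']
```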

46
[Figure: MDS4 hourglass diagram, labeled: Information Users (Schedulers, Portals, etc.); WS standard interfaces for subscription, registration, notification; GLUE Schema Attributes (cluster info, queue info, FS info)]
47
MDS4 Components
  • Higher-level services
    • Index Service: a way to aggregate data
    • Trigger Service: a way to be notified of changes
    • Both built on common aggregator framework
  • Information providers
    • Monitoring is a part of every WSRF service
    • Non-WS services can also be used
  • Clients
    • WebMDS
  • All of the tools are schema-agnostic, but
    interoperability needs a well-understood common
    language

48
MDS4 Index Service
  • Index Service is both registry and cache
  • Subscribes to information providers
  • Publishes (as resource properties)
    • Datatype and data provider info, like a registry
    • Last value of data, like a cache
  • In-memory default approach; DB backing store
    currently being developed to allow for very large
    indexes
  • Soft-state registration
  • Can be set up for a site or set of sites, a
    specific set of project data, or for
    user-specific data only
  • Can be a multi-rooted hierarchy
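The Index Service's dual role, registry plus cache with soft-state registration, can be modeled as entries that carry an expiry time. A provider must re-register before its lifetime ends, the index keeps the last value it received, and lapsed entries disappear. The class and method names, the lifetimes, and the explicit `now` argument (used instead of a clock so the behavior is easy to test) are assumptions of this sketch, not the MDS4 interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Entry:
    datatype: str       # registry part: what kind of data the provider publishes
    value: Any          # cache part: the last value received
    expires_at: float   # soft state: the entry lapses unless renewed


class SoftStateIndex:
    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def register(self, provider: str, datatype: str, value: Any,
                 lifetime: float, now: float) -> None:
        """(Re-)register a provider; renewing also refreshes the cached value."""
        self._entries[provider] = Entry(datatype, value, now + lifetime)

    def _purge(self, now: float) -> None:
        expired = [p for p, e in self._entries.items() if e.expires_at <= now]
        for provider in expired:
            del self._entries[provider]

    def registry(self, now: float) -> dict[str, str]:
        """Registry view: live providers and the type of data each publishes."""
        self._purge(now)
        return {p: e.datatype for p, e in self._entries.items()}

    def cached(self, provider: str, now: float) -> Any:
        """Cache view: last value from a live provider, else None."""
        self._purge(now)
        entry = self._entries.get(provider)
        return None if entry is None else entry.value


index = SoftStateIndex()
index.register("rft.site-a", "RFTStatus", {"activeTransfers": 3}, lifetime=300, now=0)
index.register("gram.site-b", "ClusterInfo", {"freeCpus": 12}, lifetime=300, now=0)
index.register("gram.site-b", "ClusterInfo", {"freeCpus": 7}, lifetime=300, now=250)
print(index.registry(now=400))  # {'gram.site-b': 'ClusterInfo'}
```

Providers that crash or are decommissioned need no explicit deregistration: they simply stop renewing.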

49
MDS4 Trigger Service
  • Subscribe to a set of resource properties
  • Evaluate that data against a set of
    pre-configured conditions (triggers)
  • When a condition matches, an email is sent to a
    pre-defined address
  • Similar functionality in Hawkeye
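A trigger pairs a condition over resource property values with an action. The sketch models conditions as Python predicates and replaces e-mail with a callback. The property names, the thresholds, and the rule of firing only when a condition becomes true (rather than on every evaluation) are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    name: str
    condition: Callable[[dict], bool]   # predicate over resource property values
    message: str
    active: bool = False                # was the condition true last time?


class TriggerService:
    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify            # stands in for sending e-mail
        self.triggers: list[Trigger] = []

    def add(self, trigger: Trigger) -> None:
        self.triggers.append(trigger)

    def evaluate(self, properties: dict) -> None:
        """Check every trigger against freshly received property values."""
        for t in self.triggers:
            matched = bool(t.condition(properties))
            if matched and not t.active:    # fire on the rising edge only
                self.notify(f"[{t.name}] {t.message}")
            t.active = matched


sent = []
service = TriggerService(sent.append)
service.add(Trigger("queue-full", lambda p: p.get("freeCpus", 1) == 0,
                    "no free CPUs"))
service.add(Trigger("disk-low", lambda p: p.get("diskFreeGB", 1e9) < 10,
                    "scratch space below 10 GB"))
for snapshot in ({"freeCpus": 4, "diskFreeGB": 50},
                 {"freeCpus": 0, "diskFreeGB": 50},
                 {"freeCpus": 0, "diskFreeGB": 5}):
    service.evaluate(snapshot)
print(sent)
```

Firing on the rising edge keeps a persistent condition from producing one e-mail per polling interval.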

50
Information Providers: Cluster and Queue Data
  • Interfaces to Hawkeye, Ganglia, CluMon
    • Not WS, so these are Execution Sources
    • Basic host data (name, ID), processor
      information, memory size, OS name and version,
      file system data, processor load data
    • Some Condor/cluster-specific data
  • Interfaces to PBS, Torque, and LSF queue systems
    • Queue information, number of CPUs available and
      free, job count information, some memory
      statistics and host info for head node of cluster

51
Information Providers: GT4 Services
  • Every WS built using GT4 core
    • ServiceMetaDataInfo element includes start time,
      version, and service type name
  • Reliable File Transfer Service (RFT)
    • Service status data, number of active transfers,
      transfer status, information about the resource
      running the service
  • Community Authorization Service (CAS)
    • Identifies the VO served by the service instance
  • Replica Location Service (RLS)
    • Note: not a WS
    • Location of replicas on physical storage systems
      (based on user registrations) for later queries

52
WebMDS User Interface
  • Web-based interface to WSRF resource property
    information
  • User-friendly front-end to the Index Service
  • Uses standard resource property requests to query
    resource property data
  • XSLT transforms to format and display them
  • Customized pages are simply done by using HTML
    form options and creating your own XSLT
    transforms
  • Sample page:
    • http://mds.globus.org:8080/webmds/webmds?info=indexinfo&xsl=servicegroupxsl

53
WebMDS Service
56
Overview
  • What is a Grid
  • What does the Globus Toolkit do?
  • Security
  • Resource Management
  • Data Management
  • Monitoring
  • Example Application: Grid3
  • Conclusions

57
  • Open Science Grid (OSG)
    • 28 sites (2100-2800 CPUs) and growing
    • 400-1300 concurrent jobs
    • 8 substantial applications plus CS experiments
    • Running since October 2003
[Figure: map of participating sites; one label reads Korea]
http://www.ivdgl.org/grid2003
58
Grid2003 Project Goals
  • Ramp up U.S. Grid capabilities in anticipation of
    LHC experiment needs in 2005.
  • Build, deploy, and operate a working Grid.
  • Include all U.S. LHC institutions.
  • Run real scientific applications on the Grid.
  • Provide state-of-the-art monitoring services.
  • Cover non-technical issues (e.g., SLAs) as well
    as technical ones.
  • Unite the U.S. CS and Physics projects that are
    aimed at support for LHC.
    • Common infrastructure
    • Joint (collaborative) work

59
Example OSG Workflows
[Figure: example workflows for genome sequence analysis, the Sloan Digital Sky Survey, and physics data analysis]
60
Grid2003 Components
  • Security
    • GT GSI, GSI-OpenSSH, Community Authorization
      Service (CAS)
  • Monitoring
    • GT MDS, Ganglia on local systems
  • Job Submission
    • GT GRAM, Chimera and Pegasus for workflows
  • Data Tools
    • GT GridFTP, GT RLS, metadata catalogs

61
OSG Metrics
62
OSG Summary
  • Working Grid for wide set of applications
  • Joint effort between application scientists,
    computer scientists
  • Globus software as a starting point, additions
    from other communities as needed

63
What Should You Take Away from This Talk
  • Grids are a way to work between administrative
    domains
  • The Globus Toolkit offers a starting point for
    building these applications
  • Many applications (research, science, and
    business) use these resources
  • Much work still to be done in this area: many
    open research questions!

64
For More Information
  • Jennifer M. Schopf
    • jms@mcs.anl.gov
    • www.mcs.anl.gov/jms
    • Support from DOE, NSF, NeSC
  • This talk
    • www.mcs.anl.gov/jms/Talks (not there yet)
  • Globus Toolkit
    • www.globus.org
  • UK ETF GT4 report
    • www.nesc.ac.uk/technical_papers/UKeS-2005-03.pdf