An Introduction to Grid Computing

Provided by: jmbr8
1
An Introduction to Grid Computing
  • Richard Fujimoto
  • Reference: The Grid 2, ch. 1-4, 7
  • Ian Foster and Carl Kesselman (eds.)

2
Outline
  • What is Grid Computing?
  • Why are we interested in Grids?
  • Grid Architecture from 10,000 feet

3
Evolution of Technology
  • Phase I: Developmental Stage
  • Concerned with development of the technology
  • Focus is the technology itself - how it is built,
    how it works
  • Users of the technology are experts
  • If successful, technology grows in popularity,
    standards develop, costs decline, widespread use
  • Examples: automobile, electric power
  • Phase II: Post-Technology Phase
  • Technology is taken for granted, except when it
    fails
  • Main issues are application of technology, ease
    of use, reliability, availability, cost
  • Experts behind the scenes make it work,
    transparent to users

4
Information Technology
  • Fast approaching post-technology phase (mass
    adoption)
  • Increasing commoditization (processors, memory,
    storage, communications)
  • More complex, powerful systems
  • Possible to have systems with billions of devices
    and sophistication to hide them from users
  • Issues
  • Integration and standards
  • Efficiency while maintaining transparency
  • Virtualization seen as key approach to allow
    transparent, shared resource usage
  • Quality of Service
  • Sophisticated, end-to-end resource management
    needed to ensure high quality at low price

5
Virtual Organizations
  • "mutually distrustful participants with varying
    degrees of prior relationships (perhaps none at
    all) want to share resources in order to perform
    some task." (Foster/Kesselman, p. 39)
  • Coordinated, controlled resource sharing among
    dynamic multi-institutional virtual organizations
  • Resources
  • Computational facilities
  • Software
  • Data
  • Sensors, instruments, actuators
  • Control over what is shared, who has access,
    conditions under which sharing occurs
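
The three kinds of control listed above (what is shared, who has access, and under which conditions) can be sketched as a small policy check. This is a minimal illustration, not anything taken from the book: the `SharingRule` class, the `may_use` function, and the CPU-hour condition are all hypothetical names and assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingRule:
    """One hypothetical sharing rule inside a virtual organization."""
    resource: str               # what is shared
    allowed_members: frozenset  # who has access
    max_cpu_hours: float        # condition under which sharing occurs


def may_use(rules, member, resource, cpu_hours):
    """Return True if any rule grants `member` the requested use of `resource`."""
    return any(
        rule.resource == resource
        and member in rule.allowed_members
        and cpu_hours <= rule.max_cpu_hours
        for rule in rules
    )


# Assumed example policy: two members may use one cluster for up to 100 CPU hours.
rules = [SharingRule("cluster-a", frozenset({"alice", "bob"}), max_cpu_hours=100.0)]

print(may_use(rules, "alice", "cluster-a", 10.0))   # member, within the limit
print(may_use(rules, "carol", "cluster-a", 10.0))   # not a listed member
print(may_use(rules, "bob", "cluster-a", 500.0))    # exceeds the condition
```

A real virtual organization would express such rules through its security and policy infrastructure; the point here is only that each request is checked against all three dimensions.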

6
What is a Grid?
  • Includes three essential elements
  • Coordinates sharing of distributed resources
  • Resources and users live within different control
    domains
  • Issues such as security, policy, payment,
    membership etc.
  • Uses standard, open, general-purpose protocols
    and interfaces
  • Address issues such as authentication,
    authorization, resource discovery, resource
    access
  • Deliver non-trivial qualities of service
  • Throughput, response time, availability, security
  • But what does this mean?
  • Main elements
  • Distributed computing using
  • Standard interfaces, APIs, tools in order to
  • Virtualize resources, people, applications, to
    support
  • Virtual organizations

7
Virtual Observatory Application
Reference: The Grid 2, Chapter 7 (Szalay, Gray)
  • Multiple archives of astronomical data stored at
    geographically distributed sites
  • Each covers part of the electromagnetic spectrum
    for a certain period of time for certain
    celestial objects
  • Desire to do multi-spectral or temporal studies
    of specific objects by combining data from
    different archives
  • Terabytes to petabytes of data; data growing at
    an exponential rate
  • Peer-reviewed data!
  • Virtual data: Grid Physics Network Project
    (GriPhyN)
  • Pipelined processing of data is typical
  • Data used by analysis packages might be generated
    dynamically, e.g., by querying distributed data
    and processing it in a pipeline specified by the
    user (e.g., recalibration followed by object
    detection)
  • Moving data vs. moving computation?
  • Large data sets and operations involving much
    reduction suggest moving the computation to the
    data
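
The user-specified pipeline mentioned above (recalibration followed by object detection) can be sketched as a chain of stages applied in order. Everything here is an assumption for illustration: the record layout, the gain, the detection threshold, and the function names are hypothetical and do not come from GriPhyN or the Virtual Observatory.

```python
from functools import reduce


def run_pipeline(records, stages):
    """Apply each stage to the output of the previous one, in order."""
    return reduce(lambda data, stage: stage(data), stages, records)


def recalibrate(records):
    """Hypothetical recalibration: scale each raw flux value by a fixed gain."""
    gain = 2.0
    return [{**r, "flux": r["flux"] * gain} for r in records]


def detect_objects(records):
    """Hypothetical detection: keep records whose flux exceeds a threshold."""
    threshold = 5.0
    return [r for r in records if r["flux"] > threshold]


raw = [{"id": 1, "flux": 1.0}, {"id": 2, "flux": 3.0}, {"id": 3, "flux": 4.0}]
detected = run_pipeline(raw, [recalibrate, detect_objects])
print([r["id"] for r in detected])  # ids of the records that survive detection
```

The output is smaller than the input, which is the kind of reduction that favors running the pipeline next to the archive and shipping only the result.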

8
Hierarchical Architecture
  • Archives
  • Text, images, raw data
  • Data mining tools to search and subset data
    objects
  • Metadata (units, provenance)
  • Web services
  • Queries
  • File transfer
  • Data format standards (VOTable) - similar to HLA
    OMT
  • Registries
  • Records kinds of information stored in each
    archive - sky coverage, temporal coverage,
    spectral coverage, resolution
  • Portals
  • Process user queries by integrating data from
    different archives
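
The registry and portal roles can be sketched as a lookup over coverage records. This is an illustrative assumption, not the Virtual Observatory registry interface: `ArchiveEntry`, `find_archives`, and the archive names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveEntry:
    """Hypothetical registry record describing what one archive covers."""
    name: str
    band: str     # spectral coverage, e.g. "optical"
    years: range  # temporal coverage


def find_archives(registry, band, year):
    """Portal-side step: names of archives covering the requested band and year."""
    return [e.name for e in registry if e.band == band and year in e.years]


registry = [
    ArchiveEntry("archive-optical", "optical", range(1998, 2004)),
    ArchiveEntry("archive-xray", "x-ray", range(1999, 2003)),
    ArchiveEntry("archive-optical-old", "optical", range(1950, 1960)),
]

print(find_archives(registry, "optical", 2000))
```

A portal would then send the user's query to each matching archive and integrate the answers.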

9
Issues
  • Economics of database queries
  • Empirical costs for computation, disk space,
    network bandwidth, DB access used to compute the
    most economical approach to processing a query
  • Most queries are data intensive (<10K
    instructions per byte), suggesting it is usually
    better to move computation near data
  • Either provide cluster near data, or move
    database to user (Internet or sneakernet)
  • Compute-Intensive tasks
  • Raw data must be converted to calibrated,
    cataloged data
  • Must reprocess data annually due to s/w
    improvements
  • Currently, about 10^17 instructions (15 TB data),
    or 10 CPU years
  • Clusters can do it in about 6 weeks
  • Exploit grid computing
  • Data mining and statistical calculations
  • Amount of computation for large data sets is a
    major impediment
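
The figures on this slide can be checked with a little arithmetic. The sketch below uses only the quoted numbers (10^17 instructions, 10 CPU years, 6 weeks) and assumes perfect parallel scaling on the cluster. The `prefer_moving_computation` helper is a hypothetical name that applies the 10K instructions-per-byte rule of thumb.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Figures quoted on the slide.
TOTAL_INSTRUCTIONS = 1e17
CPU_YEARS = 10
CLUSTER_WEEKS = 6

# Sustained rate a single CPU needs to finish 10^17 instructions in 10 years.
implied_rate = TOTAL_INSTRUCTIONS / (CPU_YEARS * SECONDS_PER_YEAR)

# CPUs needed to do the same work in 6 weeks, assuming perfect scaling.
cluster_seconds = CLUSTER_WEEKS * 7 * 24 * 3600
cpus_needed = CPU_YEARS * SECONDS_PER_YEAR / cluster_seconds


def prefer_moving_computation(instructions, data_bytes, threshold=10_000):
    """Rule of thumb from the slide: below ~10K instructions per byte,
    the query is data intensive and computation should move to the data."""
    return instructions / data_bytes < threshold


print(f"{implied_rate:.2e} instructions/s per CPU")
print(f"{cpus_needed:.1f} CPUs for a 6-week run")
```

Under these assumptions the 6-week figure corresponds to a cluster of roughly 87 CPUs, each sustaining about 3 x 10^8 instructions per second.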

10
Sample VO Grid Workflow
  • Locate suitable sites (data archives)
  • Authenticate access to these sites
  • Allocate resources on those computers
  • Select, configure and initiate computations at
    those sites
  • Automatically and transparently adapt to changes
    in resource availability, changes in user
    requirements
  • Display output to user
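
The workflow steps above can be sketched end to end as a single function. This is a toy model under stated assumptions: sites are plain dictionaries, authentication is a set-membership check, and "adapting" just means skipping sites that lack capacity. None of the names correspond to a real Grid toolkit.

```python
def run_workflow(sites, credentials, task):
    """Toy version of the workflow; all data structures are hypothetical."""
    # 1. Locate suitable sites (archives holding the requested band).
    suitable = [s for s in sites if task["band"] in s["bands"]]
    # 2. Authenticate: keep only sites where the user holds a credential.
    authorized = [s for s in suitable if s["name"] in credentials]

    results = []
    for site in authorized:
        # 5. Adapt to resource availability: skip sites without enough CPUs.
        if site["free_cpus"] < task["cpus"]:
            continue
        # 3. Allocate resources on the site.
        site["free_cpus"] -= task["cpus"]
        try:
            # 4. Configure and initiate the computation at the site.
            results.append((site["name"], task["compute"](site)))
        finally:
            # Release the allocation whether or not the computation succeeded.
            site["free_cpus"] += task["cpus"]
    # 6. Hand the output back for display.
    return results


sites = [
    {"name": "A", "bands": {"optical"}, "free_cpus": 8, "objects": 120},
    {"name": "B", "bands": {"optical", "x-ray"}, "free_cpus": 2, "objects": 40},
    {"name": "C", "bands": {"x-ray"}, "free_cpus": 16, "objects": 10},
]
task = {"band": "optical", "cpus": 4, "compute": lambda site: site["objects"]}

print(run_workflow(sites, {"A", "B"}, task))
```

In this run, site C is not suitable, site B is suitable but lacks capacity, and only site A executes the task.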

11
Grid Architecture
Slide courtesy of C. Kesselman, Cal(IT)2 presentation

12
Fabric Layer
  • Two types of basic services for individual
    resources
  • Introspection mechanisms
  • Determination of structure, state, capability of
    resource
  • Resource management mechanisms
  • Control over delivered quality of service
  • Resource types and example services
  • Computational resources
  • Characteristics of hardware/software resources
    available, status (e.g., load, job queue length)
  • Starting programs, monitoring and controlling
    execution of processes
  • Control over resources allocated to processes,
    advance reservations
  • Storage resources
  • File access (read, write)
  • Check availability of memory or disk space
  • Control of resources allocated for data transfer
    (e.g., disk bandwidth)
  • Network resources
  • Control over prioritization, bandwidth allocation
  • Interrogate for network characteristics and load
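
The two kinds of fabric service, introspection and resource management, can be sketched as a two-method interface that each resource type implements. The class and method names below are hypothetical, chosen only to mirror the slide. They are not an API from any Grid middleware.

```python
from abc import ABC, abstractmethod


class FabricResource(ABC):
    """Hypothetical common interface for a fabric-layer resource."""

    @abstractmethod
    def describe(self) -> dict:
        """Introspection: report structure, state, and capability."""

    @abstractmethod
    def reserve(self, amount: int) -> bool:
        """Resource management: try to set aside `amount` units."""


class ComputeResource(FabricResource):
    def __init__(self, cpus, queued_jobs=0):
        self.cpus = cpus
        self.reserved = 0
        self.queued_jobs = queued_jobs

    def describe(self):
        return {
            "type": "compute",
            "cpus": self.cpus,
            "free_cpus": self.cpus - self.reserved,
            "job_queue_length": self.queued_jobs,
        }

    def reserve(self, amount):
        # Grant the reservation only if enough CPUs remain unreserved.
        if amount <= self.cpus - self.reserved:
            self.reserved += amount
            return True
        return False


node = ComputeResource(cpus=8, queued_jobs=3)
print(node.reserve(6), node.reserve(4), node.describe()["free_cpus"])
```

Storage and network resources would implement the same two methods with their own units, such as disk space or bandwidth.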

13
Connectivity Layer
  • Communication services between fabric layer
    resources
  • Basically, Internet protocols (TCP, UDP, DNS,
    RSVP, etc.)
  • Authentication protocols
  • Single sign-on to access multiple resources
  • Delegation - give program ability to access
    resources user is authorized to access
  • Integration with local security mechanisms
  • User-based trust relationships - if user can
    access A and B, should be able to access both
    without requiring A's and B's security
    administrators to interact
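
Delegation can be illustrated with a toy credential model in which a program receives a restricted, short-lived copy of the user's rights. This sketch contains no cryptography and is not how any real Grid security infrastructure is implemented. `Credential`, `delegate`, and `can_access` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    subject: str
    rights: frozenset  # resources this credential may access
    expires_at: float  # time after which the credential is invalid


def delegate(parent, program, rights, lifetime, now):
    """Issue a credential for `program` that never exceeds the parent's rights."""
    granted = frozenset(rights) & parent.rights
    expires = min(parent.expires_at, now + lifetime)
    return Credential(f"{parent.subject}/{program}", granted, expires)


def can_access(cred, resource, now):
    """A resource accepts the credential if it is unexpired and lists the resource."""
    return now < cred.expires_at and resource in cred.rights


# The user signs on once and holds rights on resources A and B.
user = Credential("alice", frozenset({"A", "B"}), expires_at=1000.0)
# The program asks for A and C; it is granted only A, for 60 time units.
proxy = delegate(user, "job-1", {"A", "C"}, lifetime=60.0, now=0.0)

print(can_access(proxy, "A", now=10.0))   # delegated and unexpired
print(can_access(proxy, "C", now=10.0))   # the user never held this right
print(can_access(proxy, "A", now=100.0))  # the delegated credential has expired
```

Because each resource checks the credential itself, the administrators of A and B do not need to coordinate with each other.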

14
Resource Layer: Sharing Single Resources
  • Protocols for secure negotiation, initiation,
    monitoring, control, accounting, and payment of
    sharing operations on individual resources
  • Envisioned to be a small set of protocols
  • Use fabric level functions to access and control
    local resources
  • Information protocols - obtain information on
    structure and state of resource (e.g., loading,
    configuration, cost of use)
  • Management protocols - negotiate access to
    resource, e.g., for QoS
  • Check usage against policy
  • Accounting and payment
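
The split between information and management protocols, together with the policy check and accounting, can be sketched as one request handler for a single resource. The message format, field names, and hour-based policy are assumptions made for illustration. They do not correspond to a specific Grid protocol.

```python
def handle_request(resource, policy, ledger, request):
    """Toy handler for one resource; all message fields are hypothetical."""
    if request["op"] == "info":
        # Information protocol: report state and cost of use.
        return {
            "status": "ok",
            "load": resource["load"],
            "cost_per_hour": resource["cost_per_hour"],
        }
    if request["op"] == "allocate":
        # Management protocol: check the requested usage against policy.
        user, hours = request["user"], request["hours"]
        if hours > policy.get(user, 0):
            return {"status": "denied"}
        # Accounting: record the charge for this user.
        charge = hours * resource["cost_per_hour"]
        ledger[user] = ledger.get(user, 0.0) + charge
        return {"status": "ok", "charge": charge}
    return {"status": "error"}


resource = {"load": 0.4, "cost_per_hour": 2.0}
policy = {"alice": 10}  # maximum hours per request, per user
ledger = {}

print(handle_request(resource, policy, ledger, {"op": "info"}))
print(handle_request(resource, policy, ledger,
                     {"op": "allocate", "user": "alice", "hours": 5}))
print(handle_request(resource, policy, ledger,
                     {"op": "allocate", "user": "bob", "hours": 1}))
```

Internally, such a handler would call fabric-level functions to read the load and enforce the allocation.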

15
Collective: Coordinating Multiple Resources
  • Discovery services to allow discovery of
    resources and queries of status
  • Coallocation, scheduling, and brokering services
    to utilize multiple resources for a specific
    purpose
  • Monitoring and diagnostic services
  • Data replication services - manage storage
    resources to achieve acceptable performance
  • Programming models and tools
  • Grid-enabled programming systems, e.g., grid-MPI
  • Workflow specification and management
  • Software discovery services
  • Collaborative work services
  • Security, policy, accounting issues
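
Co-allocation can be sketched as an all-or-nothing plan that spreads one request across several sites. The greedy largest-first strategy below is an assumption chosen for brevity, not a scheduling policy described in the book. `coallocate` is a hypothetical name.

```python
def coallocate(free_cpus, cpus_needed):
    """Greedy all-or-nothing plan: return {site: cpus} covering the request,
    or None if the sites together cannot satisfy it."""
    plan = {}
    remaining = cpus_needed
    # Visit the sites with the most free CPUs first to involve as few as possible.
    for site, free in sorted(free_cpus.items(), key=lambda kv: kv[1], reverse=True):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            plan[site] = take
            remaining -= take
    return plan if remaining == 0 else None


availability = {"a": 4, "b": 8, "c": 2}  # as a discovery service might report
print(coallocate(availability, 10))
print(coallocate(availability, 20))
```

A real broker would also weigh cost, policy, and data location, and would have to reserve all parts of the plan before starting the job.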

16
Final Comments
  • Current trends
  • Merging of Grid and Web services
  • Has much momentum - substantial industry support
  • Universally embraced by scientific computing
    community
  • Enterprise computing in commercial sector
  • Ideas have been around for a while (e.g.,
    meta-computing)
  • Standardization perhaps most important aspect