Transcript and Presenter's Notes

Title: Grid Computing


1
Grid Computing
  • Session 1
  • David Yates
  • January 9, 2004

2
Acknowledgments
  • Most of these slides were written by the authors
    of the papers
  • Most of the editorial comments are mine

3
Areas of Grid Computing
  • Distributed everything
  • Distributed (and parallel) computing
  • Distributed databases
  • Distributed file systems
  • Distributed resource management
  • Distributed AI
  • There is depth with this breadth, but the state
    of Grid Computing reveals mostly architecture and
    framework development, technology implementation,
    and engineering, most of it motivated by very
    interesting applications
  • Resource management and data management are two
    important areas within the landscape of Grid
    computing
  • Session 1 covers sample papers for applications,
    some data management (virtual data and query
    processing), resource management (SLA and
    Ethernet-for-grid), profiling and benchmarking,
    and resource scheduling

4
Session 1 Papers
James Annis, Yong Zhao, Jens Voeckler, Michael Wilde, Steve Kent and Ian Foster, "Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey." In Supercomputing 2002 (SC2002), November 2002. http://www.globus.org/research/papers/SDSS-SC02.pdf

Henrique Andrade, Tahsin Kurc, Alan Sussman and Joel Saltz, "Active Proxy-G: Optimizing the Query Execution Process in the Grid." In Supercomputing 2002 (SC2002), November 2002. ftp://ftp.cs.umd.edu/pub/hpsl/papers/papers-pdf/sc02-tr.pdf

K. Appleby, S. Fakhouri, L. Fong, G. Goldszmidt, M. Kalantar, S. Krishnakumar, D. Pazel, J. Pershing and B. Rochwerger, "Oceano - SLA based management of a computing utility." In Seventh IFIP/IEEE International Symposium on Integrated Network Management, Seattle, WA, May 2001. http://roc.cs.berkeley.edu/294fall01/readings/oceanoIM01.pdf

Douglas Thain and Miron Livny, "The Ethernet Approach to Grid Computing." In Twelfth IEEE Symposium on High Performance Distributed Computing (HPDC-12), Seattle, WA, 2003. http://www.cs.wisc.edu/condor/doc/ethernet-hpdc12.pdf
5
Session 1 Papers, Continued
Matthew S. Allen and Rich Wolski, "The Livny and Plank-Beck Problems: Studies in Data Movement on the Computational Grid." In Supercomputing 2003 (SC2003), Phoenix, AZ, November 2003. http://www.cs.ucsb.edu/msa/publications/livnyplank.pdf

Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau and Miron Livny, "Pipeline and Batch Sharing in Grid Workloads." In Twelfth IEEE Symposium on High Performance Distributed Computing (HPDC-12), Seattle, WA, June 2003. http://www.cs.wisc.edu/condor/doc/profiling.pdf

Chuang Liu, Lingyun Yang, Ian Foster and Dave Angulo, "Design and Evaluation of a Resource Selection Framework for Grid Applications." In Eleventh IEEE Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 2002. http://www.globus.org/research/papers/RS-hpdc.pdf

Rajesh Raman, Miron Livny and Marvin Solomon, "Policy Driven Heterogeneous Resource Co-Allocation with Gangmatching." In Twelfth IEEE International Symposium on High Performance Distributed Computing (HPDC-12), Seattle, WA, June 2003. http://www.cs.wisc.edu/condor/doc/gangmatching-hpdc12.pdf
6
Virtual Data and Chimera Vision
  • Problem
  • Managing programs, computations and datasets as
    community resources
  • Solution
  • Virtual data -- Representation of complex
    workflows (including transformations and
    derivations of datasets)
  • Facilitates Planning
  • Mapping complex workflows to distributed
    resources
  • Facilitates Execution
  • Reliable, high-performance execution of complex
    workflows
  • Facilitates Monitoring
  • Monitoring for performance and troubleshooting

7
Programs as Community Resources: Data Derivation
and Provenance
  • Most scientific data are not simple
    measurements; essentially all are
  • Computationally corrected/reconstructed
  • And/or produced by numerical simulation
  • And thus, as data and computers become ever
    larger and more expensive
  • Programs are significant community resources
  • So are the executions of those programs
  • Management of the transformations that map
    between datasets is an important problem

8
Virtual Data Grid Vision
9
Chimera Virtual Data System
  • Virtual data catalog
  • Transformations, derivations, data
  • Virtual data language
  • Catalog definitions
  • Query Tool
  • Applications include browsers and data analysis
    applications
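
As an editorial aside, the Python sketch below is a toy model of the virtual-data idea: a catalog that records transformations and derivations and can re-derive a dataset on demand. Class and method names are mine for illustration; this is not Chimera's virtual data language or API.

    # Hypothetical illustration of the virtual-data idea: a catalog that records
    # how each dataset is derived so it can be re-materialized on demand.
    # This is NOT Chimera's VDL or API, just a toy model of the same concepts.

    class VirtualDataCatalog:
        def __init__(self):
            self.transformations = {}   # name -> callable (the "program")
            self.derivations = {}       # dataset -> (transformation name, input datasets)
            self.materialized = {}      # dataset -> concrete value

        def add_transformation(self, name, func):
            self.transformations[name] = func

        def add_derivation(self, dataset, transformation, inputs):
            self.derivations[dataset] = (transformation, inputs)

        def materialize(self, dataset):
            """Produce a dataset, recursively re-deriving its inputs if needed."""
            if dataset in self.materialized:
                return self.materialized[dataset]
            transformation, inputs = self.derivations[dataset]
            args = [self.materialize(d) for d in inputs]
            value = self.transformations[transformation](*args)
            self.materialized[dataset] = value
            return value

    # Example: record provenance for a tiny two-step pipeline.
    catalog = VirtualDataCatalog()
    catalog.materialized["raw"] = [3, 1, 2]
    catalog.add_transformation("sort", sorted)
    catalog.add_transformation("head", lambda xs: xs[0])
    catalog.add_derivation("sorted", "sort", ["raw"])
    catalog.add_derivation("smallest", "head", ["sorted"])
    print(catalog.materialize("smallest"))   # 1, derived on demand from "raw"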

10
Chimera Virtual Data in SDSS Galaxy Cluster
Search
(Figure: DAG mapping Sloan data to a galaxy cluster size distribution. Credit: Jim Annis, Steve Kent, Vijay Sehkri, Fermilab; Michael Milligan, Yong Zhao, University of Chicago.)
11
Active Proxy-G: Optimizing Query Execution
(Figure: clients send queries to the Active Proxy-G, which dispatches subqueries to application servers, inserts subquery results into cache servers, and returns results to the clients.)
12
Functional Components
  • Query Server
  • Lightweight Directory Service
  • Workload Monitor Service
  • Persistent Data Store Service

(Figure: clients 1..k send a query workload to the Active Proxy-G, whose Query Server works with the Lightweight Directory Service, Workload Monitor Service, and Persistent Data Store Service, receiving directory and workload updates and dispatching subqueries to application servers I..n.)
13
Performance Results
  • 12 clients
  • 4 x 16 VR queries
  • 8 x 32 VM queries
  • 4 processors (up to 4 queries at the same time)

(Chart: average execution time per query, in seconds, versus PDSS size of 128M, 192M, 256M, and 320M, comparing the Disabled, Basic, and Projection Aware configurations.)
14
Load Balancing
  • 12 clients
  • 4 x 16 VR queries
  • 8 x 32 VM queries
  • Some queries are sent directly to app servers
  • Some queries are sent to APG

(Chart: batch execution time in seconds for client splits of 8-4, 10-2, 11-1, and 12-0 between the APG and direct app-server assignment, comparing the Round Robin, TPU, NDRR, and TPUNDRR policies.)
15
Oceano - SLA Based Management
Dolphins can be allocated to a customer domain
and later reallocated to another. Whales are
permanently assigned to a domain (e.g., a
customer database server).
16
Oceano Domain Components
17
Predefined Metrics in Oceano
18
Dolphin Server States
19
Oceano Preliminary Performance (Server Allocation Time)
  • C = cold server allocation, W = warm allocation
  • 2.5 minutes best case, 12.5 minutes worst case

20
Ethernet Approach to Grid Computing
Ethernet: Carrier Sense, Collision Detect,
Exponential Backoff, Limited Allocation
  • Two problems in real systems
  • Timing is uncontrollable.
  • Failures lack detail.
  • A solution
  • The Ethernet Approach.
  • A language and a tool
  • The Fault Tolerant Shell.
  • Time and failures are explicit.
  • Example Applications
  • Shared Job Queue.
  • Shared Data Servers.

try for 30 minutes ... end
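
The fragment above is Fault Tolerant Shell syntax. As a rough analogue (mine, not FTSH or Condor code), the Python sketch below applies the same Ethernet discipline: try, detect failure, back off exponentially with jitter, and stop when the explicit time allocation runs out. The submit_job call in the usage comment is hypothetical.

    import random
    import time

    def try_for(seconds, operation, initial_backoff=1.0, max_backoff=60.0):
        """Ethernet-style retry: detect failure, back off exponentially
        (with jitter), and give up when the explicit time budget expires.
        Illustrative only; not the Fault Tolerant Shell's implementation."""
        deadline = time.time() + seconds
        backoff = initial_backoff
        while True:
            try:
                return operation()                 # just try it
            except Exception as failure:           # failure detected ("collision")
                if time.time() + backoff > deadline:
                    raise TimeoutError("time allocation exhausted") from failure
                time.sleep(backoff * random.uniform(0.5, 1.5))
                backoff = min(backoff * 2, max_backoff)   # exponential backoff, capped

    # Usage sketch: keep trying to submit a job for 30 minutes.
    # try_for(30 * 60, lambda: submit_job("cluster.example.org", job))
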
21
Shared Job Queue
Multiple clients connect to a job queue to
manipulate jobs (submit, query, remove, etc.).
What's the bottleneck?
(Figure: clients connect to a Condor schedd, which manages the job queue and activity log on the local filesystem; a matchmaker assigns queued jobs to CPUs.)
22
(No Transcript)
23
Shared Data Servers
Accepts all connections and holds them idle
indefinitely.
A healthy but loaded server might also have a
high response time.
Each client wants one instance of the data set,
but doesn't care which one. How to deal with
delays and failures?
24
All Clients Blocked on Black Hole
25
Livny and Plank-Beck Problems: Livny Problem
  • Given k files of uniform size distributed across
    k hosts in a Grid setting, optimize the transfer
    of all files to a central location so that the
    maximum number of files arrive in the minimum
    amount of time

26
Plank-Beck Problem
  • Given a single large file divided into k ordered
    segments of uniform length in which each segment
    has r replicas distributed across hosts, minimize
    the time necessary to deliver each segment in
    order

27
Predictions
(Figure: time-stamped measurements (T1, M1), (T2, M2), (T3, M3) feed a forecaster, which produces forecast F.)
  • The Network Weather Service (NWS)
  • distributed resource monitoring tool
  • Uses non-parametric statistical algorithms that
    analyze a set of time-stamped measurements to
    generate a prediction and a prediction error
  • Used 64Kb bandwidth probes for network evaluation
  • Used file send times for timeout prediction
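
To illustrate the forecasting step (this is not the NWS code or its actual predictor set), the Python sketch below keeps a sliding window of measurements, scores a few simple non-parametric predictors against each new sample, and returns the best predictor's forecast together with its recent mean error.

    from collections import deque
    from statistics import mean, median

    class SlidingWindowForecaster:
        """Toy NWS-like forecaster: not the real Network Weather Service.
        Tracks recent measurements, evaluates simple predictors on them,
        and returns the best predictor's forecast plus its mean error."""

        def __init__(self, window=20):
            self.history = deque(maxlen=window)
            self.predictors = {"last": lambda xs: xs[-1],
                               "mean": mean,
                               "median": median}
            self.errors = {name: deque(maxlen=window) for name in self.predictors}

        def add_measurement(self, value):
            # Score each predictor against the new measurement before adding it.
            if self.history:
                for name, predict in self.predictors.items():
                    self.errors[name].append(abs(predict(list(self.history)) - value))
            self.history.append(value)

        def forecast(self):
            # Pick the predictor with the lowest mean error over the window.
            best = min(self.predictors,
                       key=lambda n: mean(self.errors[n]) if self.errors[n] else float("inf"))
            prediction = self.predictors[best](list(self.history))
            error = mean(self.errors[best]) if self.errors[best] else None
            return prediction, error

    # Usage sketch (probe_samples is hypothetical):
    # forecaster = SlidingWindowForecaster()
    # for sample in probe_samples:
    #     forecaster.add_measurement(sample)
    # prediction, prediction_error = forecaster.forecast()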

28
(No Transcript)
29
Progress-Driven Redundancy for Plank-Beck Problem
  • Parallel fetches are used to retrieve different
    segments from the replica servers
  • If the download of segment k has not completed,
    but k+1 and k+2 have completed, additional
    fetches are issued for segment k
  • Two variants
  • Original
  • General
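
A toy Python rendering of the progress-driven rule (names and structure are mine, not the paper's implementation): when the first unfinished in-order segment k lags while segments k+1 and k+2 have already completed, schedule an additional fetch of k from a replica that is not already serving it.

    # Illustrative sketch of progress-driven redundancy (not the authors' code).
    # 'downloads[k]' holds the hosts currently fetching segment k; 'done' marks
    # completed segments. When the first unfinished segment k lags behind the
    # two segments after it, add a redundant fetch of k from a fresh replica.

    def schedule_redundant_fetches(done, downloads, replicas):
        """done: set of finished segment indices
        downloads: dict segment -> set of hosts already fetching it
        replicas: dict segment -> list of hosts holding a copy
        Returns a list of (segment, host) fetches to start now."""
        extra = []
        for k in sorted(replicas):
            if k in done:
                continue
            # Progress test: segments k+1 and k+2 finished, but k has not.
            if (k + 1) in done and (k + 2) in done:
                unused = [h for h in replicas[k] if h not in downloads.get(k, set())]
                if unused:
                    extra.append((k, unused[0]))   # redundant fetch from a new replica
            break  # only the first unfinished (in-order) segment matters here
        return extra

    # Example: segment 3 is stuck while 4 and 5 are done -> start a second fetch of 3.
    print(schedule_redundant_fetches(
        done={0, 1, 2, 4, 5},
        downloads={3: {"hostA"}},
        replicas={3: ["hostA", "hostB"], 4: ["hostC"], 5: ["hostD"]}))
    # [(3, 'hostB')]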

30
(No Transcript)
31
(No Transcript)
32
Pipeline and Batch Sharing in Grid Workloads
  • Behavior of single applications has been well
    studied
  • sequential and parallel
  • But many apps are not run in isolation
  • End result is product of a group of apps
  • Commonly found in batch systems
  • Run 100s or 1000s of times
  • Key is sharing behavior between apps

33
Batch-Pipelined Sharing
34
3 types of I/O
  • Endpoint: unique input and output
  • Pipeline: ephemeral data
  • Batch: shared input data

35
Six (plus one) target scientific applications
  • BLAST - biology
  • IBIS - ecology
  • CMS - physics
  • Hartree-Fock - chemistry
  • Nautilus - molecular dynamics
  • AMANDA - astrophysics
  • SETI@home - astronomy

36
Example Application: IBIS
(Figure: climate data inputs are analyzed by IBIS to produce a forecast.)
IBIS is a global-scale simulation of earth's
climate used to study the effects of human activity
(e.g., global warming). Only one app, thus no
pipeline sharing.
37
Absolute Resources Consumed
  • Wide range of runtimes. Modest memory usage.

38
Absolute I/O Mix
  • Only IBIS has significant ratio of endpoint I/O.

39
Scalability of batch width
(Chart compares a storage center at 1500 MB/s with a commodity disk at 15 MB/s.)
40
Batch elimination
(Chart compares a storage center at 1500 MB/s with a commodity disk at 15 MB/s.)
41
Pipeline elimination
(Chart compares a storage center at 1500 MB/s with a commodity disk at 15 MB/s.)
42
Endpoint only
(Chart compares a storage center at 1500 MB/s with a commodity disk at 15 MB/s.)
43
Resource Selection Framework for Grid Applications
  • Co-selection of resources, which involves
  • Select a set of resources for a Grid application
    that meets application requirements
  • Map workload to resources
  • Investigate feasibility of declarative language
    for expressing application requirements and
    resource capabilities
  • Set matching as an extension of matchmaking
  • Represent characteristics of resources
  • Represent requests for a resource set
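
To make set matching concrete, here is an illustrative Python sketch (not Condor's set-extended ClassAds engine): resources are added greedily until the set as a whole satisfies an aggregate constraint, while each member also satisfies a per-resource constraint. The machine names reuse hosts from the experiments later in the deck; the attribute values are made up.

    # Toy set matcher (illustrative; not Condor's set-extended ClassAds).
    # Each resource is a dict of attributes; a request constrains both
    # individual resources and aggregates over the selected set.

    def set_match(resources, per_resource_ok, set_ok):
        """Greedily add acceptable resources until the set predicate holds.
        Returns the matched set, or None if no satisfying set is found."""
        selected = []
        # Prefer faster machines first; the real matcher uses ranked ClassAds.
        for res in sorted(resources, key=lambda r: -r["mhz"]):
            if per_resource_ok(res):
                selected.append(res)
                if set_ok(selected):
                    return selected
        return None

    machines = [
        {"name": "o.ucsd.edu",           "mhz": 450, "memory_mb": 256},
        {"name": "mystere.ucsd.edu",     "mhz": 500, "memory_mb": 256},
        {"name": "saltimbanco.ucsd.edu", "mhz": 400, "memory_mb": 128},
    ]

    # Request: every node at least 300 MHz, and 512 MB of memory in aggregate.
    match = set_match(machines,
                      per_resource_ok=lambda r: r["mhz"] >= 300,
                      set_ok=lambda s: sum(r["memory_mb"] for r in s) >= 512)
    print([r["name"] for r in match])   # ['mystere.ucsd.edu', 'o.ucsd.edu']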

44
Set Match Matchmaker
(Figure: resource ClassAds 1-4 and a request ClassAd are fed to the set-extended Condor matchmaking engine; a set constructor builds candidate resource sets such as {Res2}, {Res1, Res2}, and {Res1, Res3}, which are evaluated against the request until a match is found or the request fails.)

45
Set Match: An Example
46
Resource Selection Service Framework
(Figure: Resource Selection Service (RSS) framework. Resource information comes from MDS (GIIS and GRISes) and NWS via a resource monitor; the application submits a resource request; the set matcher and mapper produce the result.)
47
Experiments: Mapping Result
  • Machine1: 450 MHz, no CPU load
  • Machine2: 500 MHz, CPU load 2

48
Experiments: Resource Selection (One Site)
  • 1. o.ucsd.edu, mystere.ucsd.edu, saltimbanco.ucsd.edu
  • 2. mystere.ucsd.edu, o.ucsd.edu
  • 3. o.ucsd.edu, saltimbanco.ucsd.edu
  • 4. o.ucsd.edu
  • 5. saltimbanco.ucsd.edu
  • 6. mystere.ucsd.edu

49
Experiments: Resource Selection (Two Sites)
  • 1. torc6.cs.utk.edu
  • 2. o.ucsd.edu
  • 3. saltimbanco.ucsd.edu
  • 4. torc6.cs.utk.edu, o.ucsd.edu
  • 5. o.ucsd.edu, saltimbanco.ucsd.edu
  • 6. o.ucsd.edu, mystere.ucsd.edu, torc6.cs.utk.edu

50
Summary and Future Work
  • Extended the ClassAds language to describe
    set-based requirements for a resource set
  • Implemented a set matchmaker and created a
    resource selection service framework
  • Validation with the Cactus application
  • Future Work: extend the semantics, implementation,
    and application of the set matching framework

51
Resource Co-Allocation with Gangmatching
  • Problem: Matchmaking is limited by its purely
    bilateral formalism of matching a single customer
    with a single resource
  • One-to-one matching of customers and resources
    makes it awkward or impossible to support
    heterogeneous resource co-allocation.
  • Solution: Gangmatching is a multilateral
    extension to Matchmaking
  • Gangmatching replaces a single implicit
    bilateral match imperative with an explicit list
    of required bilateral matches.
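
An illustrative Python sketch of the gangmatching idea (not the actual ClassAd gangmatching implementation): the request carries an explicit list of required bilateral matches, here a workstation and a software license, and the gang forms only if every requirement is matched by a distinct advertisement.

    # Toy gangmatch (illustrative; not the actual ClassAd gangmatching engine).
    # A request lists its required bilateral matches; each must be satisfied
    # by a distinct advertisement for the whole gang (co-allocation) to form.

    def gangmatch(request, ads):
        """request: list of predicates, one per required bilateral match.
        ads: list of advertisement dicts. Returns one ad per requirement,
        or None if any requirement cannot be satisfied."""
        gang, used = [], set()
        for requirement in request:
            candidate = next((i for i, ad in enumerate(ads)
                              if i not in used and requirement(ad)), None)
            if candidate is None:
                return None                      # co-allocation fails as a whole
            used.add(candidate)
            gang.append(ads[candidate])
        return gang

    ads = [
        {"type": "Workstation", "memory_mb": 512, "arch": "INTEL"},
        {"type": "License", "software": "simulate", "seats": 3},
    ]

    # Job needs BOTH an INTEL workstation with >= 256 MB and a 'simulate' license.
    gang = gangmatch(
        [lambda ad: ad["type"] == "Workstation" and ad["arch"] == "INTEL"
                    and ad["memory_mb"] >= 256,
         lambda ad: ad["type"] == "License" and ad["software"] == "simulate"],
        ads)
    print(gang is not None)   # True: both bilateral matches were found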

52
A Gangmatch Request
53
A Workstation Advertisement
54
A Software License Advertisement
55
Performance of Indexing
56
Performance of Dynamic Algorithm
57
Performance of Indexing vs. Dynamic Algorithm
58
Future Work
  • Incorporate request and resource preferences into
    the Gangmatching algorithm
  • Develop more sophisticated algorithms to cope
    with larger (i.e., both wider and deeper)
    ClassAd gangs
  • Support an unknown number of co-allocated
    resources (e.g., a dynamic number of relatively
    homogeneous resources)

59
Grid Computing: Additional Slides
  • Session 1
  • David Yates
  • January 9, 2004

60
Computer Science and GriPhyN (Foster, Jan 2003)
(Figure: GriPhyN flow among Partner Physics Experiments, Computer Science Research, the Virtual Data Toolkit, and the Larger Science Community, via requirements, prototyping and experimentation, production deployment, productization, and technology transfer; related communities: Globus, Condor, NMI, EU DataGrid, PPDG.)
61
GriPhyN CS R&D Activities: Coupling with VDT
(Foster, Jan 2003)
(Figure: research activities coupled to the VDT stack, from knowledge to jobs.
Knowledge - partial queries (Liu, Franklin); ontologies (Zhao).
Virtual Data - Chimera Virtual Data System and Pegasus Planner; virtual data language design (Voeckler, Wilde); AI planning (Deelman, Narang); virtual data language applications (Milligan, Zhao); decentralized scheduling (Ranganathan); Prophesy (Taylor, Yin).
Workflow - DAGman; fault-tolerant master-worker (Marzullo); DAGman enhancements (UW team); policy-aware scheduling (Dumitrescu).
Jobs - Globus Toolkit, Condor, Ganglia, etc.; scalable replica location service (UC, ISI team); NeST storage mgmt (UW team); HP monitoring (George).)
62
Virtual Data Research Issues (Wilde, Oct 2002)
  • Representation
  • Metadata: how is it created, stored, propagated?
  • Datasets and Type Model
  • What knowledge must be represented? How?
  • Capturing notions of data approximation
  • Virtual data catalogs as a community resource
  • Automating data capture
  • Access control, sharing and privacy issues
  • Quality control
  • Hyperlinked virtual data catalogs
  • Data derivation
  • Query estimation and request planning

63
Virtual Data Research Issues (Wilde, Oct 2002)
  • Engineering issues
  • Dynamic (runtime-computed) dependencies
  • Large dependent sets
  • Extensions to other data models: relational, OO
  • Virtual data browsers
  • XML vs. relational databases; query languages
  • Additional usage modalities
  • E.g., meta-analyses, automated experiment
    generation, active notebooks
  • Virtual data browsers, editors
  • Additional applications
  • E.g., bioinformatics, earth sciences

64
Active Proxy-G Integration with Globus
  • Integrate standard components
  • Globus MDS (Monitoring and Discovery Service)
  • NWS (Network Weather Service)
  • SRB (Storage Resource Broker)
  • Use standard protocols
  • WSDL (Web Services Description Language) interface
  • SOAP (Simple Object Access Protocol)

(Figure: the Active Proxy-G architecture with a WSDL interface in front of clients 1..k; Globus MDS and NWS feed the lightweight directory service and workload monitor service alongside the query server and persistent data store service; subqueries go to application servers I..n, with SRB as the data source.)