Title: Grid Computing
1. Grid Computing
- Session 1
- David Yates
- January 9, 2004
2. Acknowledgments
- Most of these slides were written by the authors of the papers
- Most of the editorial comments are mine
3. Areas of Grid Computing
- Distributed everything
- Distributed (and parallel) computing
- Distributed databases
- Distributed file systems
- Distributed resource management
- Distributed AI
- There is depth with this breadth, but the state of Grid computing reveals a great deal of architecture and framework development, technology implementation, and engineering, most of it motivated by very interesting applications
- Resource management and data management are two important areas within the landscape of Grid computing
- Session 1 covers sample papers on applications, data management (virtual data and query processing), resource management (SLAs and Ethernet-for-Grid), profiling and benchmarking, and resource scheduling
4. Session 1 Papers
- James Annis, Yong Zhao, Jens Voeckler, Michael Wilde, Steve Kent and Ian Foster, "Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey." In Supercomputing 2002 (SC2002), November 2002. http://www.globus.org/research/papers/SDSS-SC02.pdf
- Henrique Andrade, Tahsin Kurc, Alan Sussman and Joel Saltz, "Active Proxy-G: Optimizing the Query Execution Process in the Grid." In Supercomputing 2002 (SC2002), November 2002. ftp://ftp.cs.umd.edu/pub/hpsl/papers/papers-pdf/sc02-tr.pdf
- K. Appleby, S. Fakhouri, L. Fong, G. Goldszmidt, M. Kalantar, S. Krishnakumar, D. Pazel, J. Pershing and B. Rochwerger, "Oceano: SLA-Based Management of a Computing Utility." In Seventh IFIP/IEEE International Symposium on Integrated Network Management, Seattle, WA, May 2001. http://roc.cs.berkeley.edu/294fall01/readings/oceanoIM01.pdf
- Douglas Thain and Miron Livny, "The Ethernet Approach to Grid Computing." In Twelfth IEEE Symposium on High Performance Distributed Computing (HPDC-12), Seattle, WA, June 2003. http://www.cs.wisc.edu/condor/doc/ethernet-hpdc12.pdf
5. Session 1 Papers, Continued
- Matthew S. Allen and Rich Wolski, "The Livny and Plank-Beck Problems: Studies in Data Movement on the Computational Grid." In Supercomputing 2003 (SC2003), Phoenix, AZ, November 2003. http://www.cs.ucsb.edu/msa/publications/livnyplank.pdf
- Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau and Miron Livny, "Pipeline and Batch Sharing in Grid Workloads." In Twelfth IEEE Symposium on High Performance Distributed Computing (HPDC-12), Seattle, WA, June 2003. http://www.cs.wisc.edu/condor/doc/profiling.pdf
- Chuang Liu, Lingyun Yang, Ian Foster and Dave Angulo, "Design and Evaluation of a Resource Selection Framework for Grid Applications." In Eleventh IEEE Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 2002. http://www.globus.org/research/papers/RS-hpdc.pdf
- Rajesh Raman, Miron Livny and Marvin Solomon, "Policy Driven Heterogeneous Resource Co-Allocation with Gangmatching." In Twelfth IEEE International Symposium on High Performance Distributed Computing (HPDC-12), Seattle, WA, June 2003. http://www.cs.wisc.edu/condor/doc/gangmatching-hpdc12.pdf
6. Virtual Data and Chimera Vision
- Problem: managing programs, computations and datasets as community resources
- Solution: virtual data, a representation of complex workflows (including transformations and derivations of datasets)
- Facilitates planning: mapping complex workflows to distributed resources
- Facilitates execution: reliable, high-performance execution of complex workflows
- Facilitates monitoring: monitoring for performance and troubleshooting
7. Programs as Community Resources: Data Derivation and Provenance
- Most scientific data are not simple measurements; essentially all are
- Computationally corrected/reconstructed
- And/or produced by numerical simulation
- And thus, as data and computers become ever larger and more expensive
- Programs are significant community resources
- So are the executions of those programs
- Management of the transformations that map between datasets is an important problem
8. Virtual Data Grid Vision
9. Chimera Virtual Data System
- Virtual data catalog: transformations, derivations, data
- Virtual data language: catalog definitions
- Query tool: applications include browsers and data analysis applications
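To make the catalog idea concrete, here is a minimal Python sketch of a virtual data catalog. It is an illustration only: the class and method names are invented and do not reflect Chimera's actual VDL or schema.

```python
from dataclasses import dataclass

# Minimal sketch of a virtual data catalog (invented names, not
# Chimera's actual VDL or schema): record transformations (programs)
# and derivations (invocations) so that any dataset's provenance can
# later be queried, or the dataset recomputed, from the catalog alone.
@dataclass
class Transformation:
    name: str           # e.g. a cluster-finding program
    executable: str     # path or logical name of the program

@dataclass
class Derivation:
    transformation: str     # name of the transformation applied
    inputs: list            # names of the input datasets
    output: str             # name of the derived dataset

class VirtualDataCatalog:
    def __init__(self):
        self.transformations = {}
        self.derivations = {}       # keyed by output dataset name

    def define(self, t):
        self.transformations[t.name] = t

    def derive(self, d):
        self.derivations[d.output] = d

    def provenance(self, dataset):
        """Walk derivations back to raw data: a query-tool style query."""
        chain, frontier = [], [dataset]
        while frontier:
            d = self.derivations.get(frontier.pop())
            if d is not None:           # raw inputs have no derivation
                chain.append(d)
                frontier.extend(d.inputs)
        return chain
```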
10. Chimera Virtual Data in SDSS Galaxy Cluster Search
[Figure: a DAG mapping Sloan data to a galaxy cluster size distribution]
Credits: Jim Annis, Steve Kent, Vijay Sehkri (Fermilab); Michael Milligan, Yong Zhao (University of Chicago)
11. Active Proxy-G: Optimizing Query Execution
[Figure: clients send queries to the Active Proxy-G, which decomposes them into subqueries for the application servers, inserts subquery results into cache servers, and assembles the results returned to each client; later queries can be answered directly by the cache servers]
12. Functional Components
- Query Server
- Lightweight Directory Service
- Workload Monitor Service
- Persistent Data Store Service
[Figure: clients 1..k submit their query workload to the Active Proxy-G Query Server, which consults the Lightweight Directory, Workload Monitor, and Persistent Data Store Services (kept current by directory and workload updates) to dispatch subqueries across application servers I..n]
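As a rough illustration of what such a proxy does (not the Active Proxy-G implementation; all names below are invented), this Python sketch answers subqueries from a result cache when possible and otherwise dispatches them to the least-loaded application server:

```python
# Illustrative sketch of a caching query proxy (not the actual Active
# Proxy-G code): answer a subquery from the cache when possible,
# otherwise dispatch it to the least-loaded known application server.
class QueryProxy:
    def __init__(self, servers):
        self.servers = servers      # name -> callable(subquery) -> result
        self.load = {name: 0 for name in servers}   # workload monitor
        self.cache = {}                              # persistent data store

    def execute(self, query, decompose):
        results = []
        for subquery in decompose(query):
            if subquery in self.cache:               # cache hit
                results.append(self.cache[subquery])
                continue
            server = min(self.load, key=self.load.get)   # least loaded
            self.load[server] += 1
            try:
                result = self.servers[server](subquery)
            finally:
                self.load[server] -= 1
            self.cache[subquery] = result            # insert for reuse
            results.append(result)
        return results
```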
13. Performance Results
- 12 clients
- 4 x 16 VR queries
- 8 x 32 VM queries
- 4 processors (up to 4 queries at the same time)
[Chart: average execution time per query (roughly 10-50 s) versus PDSS size (128M-320M) for the Disabled, Basic, and Projection Aware configurations]
14. Load Balancing
- 12 clients
- 4 x 16 VR queries
- 8 x 32 VM queries
- Some queries are sent directly to app servers
- Some queries are sent to the APG
[Chart: batch execution time (roughly 250-700 s) for client splits 8-4, 10-2, 11-1 and 12-0 (clients assigned to the APG versus directly to an app server), comparing the Round Robin, TPU, NDRR and TPU+NDRR policies]
15Oceano - SLA Based Management
Dolphins can be allocated to a customer domain
and later reallocated to another. Whales are
permanently assigned to a domain (e.g., a
customer database server)
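As a toy illustration of that allocation idea (not Oceano's actual control logic; the utilization thresholds and data shapes below are invented), a reallocation step might look like:

```python
# Toy sketch of SLA-driven reallocation (not Oceano's actual control
# logic; the 0.9/0.3 utilization thresholds and data shapes are
# invented). Dolphins move between domains; whales never leave theirs.
def rebalance(domains, free_pool, utilization):
    """domains: name -> list of server dicts; free_pool: idle dolphins."""
    for name, servers in domains.items():
        if utilization(name) > 0.9 and free_pool:     # SLA at risk
            servers.append(free_pool.pop())           # allocate a dolphin
        elif utilization(name) < 0.3:                 # over-provisioned
            dolphins = [s for s in servers if not s.get("whale")]
            if dolphins:                              # whales stay put
                servers.remove(dolphins[0])
                free_pool.append(dolphins[0])         # return to free pool
```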
16. Oceano Domain Components
17. Predefined Metrics in Oceano
18. Dolphin Server States
19. Oceano Preliminary Performance (Server Allocation Time)
- C = cold server allocation; W = warm allocation
- 2.5 minutes best case; 12.5 minutes worst case
20. Ethernet Approach to Grid Computing
Ethernet: carrier sense, collision detect, exponential backoff, limited allocation.
- Two problems in real systems
- Timing is uncontrollable
- Failures lack detail
- A solution: the Ethernet approach
- A language and a tool: the Fault Tolerant Shell, in which time and failures are explicit
- Example applications
- Shared job queue
- Shared data servers

try for 30 minutes ... end
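The closing fragment is FTSH syntax: retry a block until it succeeds or a 30-minute budget expires. A rough Python analogue of the same Ethernet-style discipline (the function and parameter names are illustrative, not FTSH's implementation):

```python
import random, time

# Illustrative Python analogue of FTSH's "try for 30 minutes ... end":
# retry an unreliable operation under an explicit time budget, backing
# off exponentially (with jitter) between attempts, Ethernet-style.
def try_for(seconds, operation):
    deadline = time.monotonic() + seconds
    delay = 1.0
    while True:
        try:
            return operation()                # success: we are done
        except Exception as failure:
            if time.monotonic() + delay > deadline:
                raise TimeoutError("time budget exhausted") from failure
            time.sleep(delay * random.random())    # randomized backoff
            delay = min(delay * 2, 60)             # cap the backoff

# Usage: try_for(30 * 60, lambda: fetch("http://example.com/data"))
# where fetch is any operation that raises on failure (hypothetical).
```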
21. Shared Job Queue
Multiple clients connect to a job queue to manipulate jobs (submit, query, remove, etc.). What's the bottleneck?
[Figure: clients talk to a Condor schedd, which matches jobs via the Match Maker and dispatches them to CPUs; the job queue and activity log are kept on the local filesystem]
22. (Figure-only slide; no transcript available)
23. Shared Data Servers
A failed server may accept all connections and hold them idle indefinitely, while a healthy but loaded server might also have a high response time.
Each client wants one instance of the data set, but doesn't care which one. How to deal with delays and failures?
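One remedy, in the spirit of making time explicit: race the replicas and take the first that answers, under a deadline so a black hole cannot block a client forever. A Python sketch (illustrative; the names are invented and this is not FTSH):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Illustrative sketch (invented names, not FTSH): fetch the data set
# from whichever replica answers first, under a deadline so that a
# black-hole server cannot block the client indefinitely.
def fetch_from_any(servers, fetch, timeout=30):
    pool = ThreadPoolExecutor(max_workers=len(servers))
    futures = [pool.submit(fetch, s) for s in servers]
    try:
        for done in as_completed(futures, timeout=timeout):
            try:
                return done.result()   # first replica to answer wins
            except Exception:
                continue               # failed replica: wait for another
        raise TimeoutError("all replicas failed")
    finally:
        pool.shutdown(wait=False)      # do not block on stragglers
```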
24. All Clients Blocked on Black Hole
25. The Livny and Plank-Beck Problems: The Livny Problem
- Given k files of uniform size distributed across
k hosts in a Grid setting, optimize the transfer
of all files to a central location so that the
maximum number of files arrive in the minimum
amount of time
26. The Plank-Beck Problem
- Given a single large file divided into k ordered
segments of uniform length in which each segment
has r replicas distributed across hosts, minimize
the time necessary to deliver each segment in
order
27. Predictions
[Figure: a Forecaster turns time-stamped measurements (T1,M1), (T2,M2), (T3,M3) into a forecast F]
- The Network Weather Service (NWS) is a distributed resource monitoring tool
- Uses non-parametric statistical algorithms that analyze a set of time-stamped measurements to generate a prediction and a prediction error
- Used 64Kb bandwidth probes for network evaluation
- Used file send times for timeout prediction
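A toy forecaster in the same spirit (a sliding-window median with an empirical error estimate; an illustration only, not NWS's actual battery of predictors):

```python
from collections import deque
from statistics import median

# Toy non-parametric forecaster in the spirit of NWS (illustrative
# only, not NWS's actual algorithms): predict the next measurement as
# the median of a sliding window, and report the mean absolute error
# this predictor has made on the history seen so far.
class Forecaster:
    def __init__(self, window=10):
        self.window = deque(maxlen=window)
        self.abs_errors = []

    def observe(self, measurement):
        if self.window:                  # score the previous prediction
            self.abs_errors.append(abs(self.forecast() - measurement))
        self.window.append(measurement)

    def forecast(self):
        return median(self.window)

    def error(self):
        if not self.abs_errors:
            return 0.0
        return sum(self.abs_errors) / len(self.abs_errors)
```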
28. (Figure-only slide; no transcript available)
29. Progress-Driven Redundancy for the Plank-Beck Problem
- Parallel fetches are used to retrieve different segments from the replica servers
- If the download of segment k has not completed, but segments k+1 and k+2 have completed, additional fetches are issued for segment k
- Two variants, original and general (see the sketch below)
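A Python sketch of this policy as I read it from the slide (not the authors' implementation; the data structures and worker bound are invented):

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Sketch of progress-driven redundancy (my reading of the slide, not
# the authors' code). Segments are fetched in parallel; whenever
# segment s is still missing although s+1 and s+2 have arrived, an
# extra fetch for s is issued from one of its remaining replicas.
def download_ordered(k, replicas, fetch):
    """replicas: per-segment lists of hosts; fetch(seg, host) -> bytes."""
    done = [None] * k
    lock = threading.Lock()
    futures = []
    pool = ThreadPoolExecutor(max_workers=4 * k)

    def run(seg, host):
        data = fetch(seg, host)
        with lock:
            if done[seg] is None:
                done[seg] = data
            for s in range(k - 2):                  # progress check
                lagging = (done[s] is None and
                           done[s + 1] is not None and
                           done[s + 2] is not None)
                if lagging and replicas[s]:         # replicas left to try
                    futures.append(pool.submit(run, s, replicas[s].pop(0)))

    for seg in range(k):
        futures.append(pool.submit(run, seg, replicas[seg].pop(0)))
    while futures:                  # every submitted future is waited on,
        wait([futures.pop()])       # including late redundant fetches
    pool.shutdown()
    return done
```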
30. (Figure-only slide; no transcript available)
31. (Figure-only slide; no transcript available)
32. Pipeline and Batch Sharing in Grid Workloads
- The behavior of single applications, sequential and parallel, has been well studied
- But many apps are not run in isolation
- The end result is the product of a group of apps
- Commonly found in batch systems
- Run 100s or 1000s of times
- The key is the sharing behavior between apps
33. Batch-Pipelined Sharing
34. Three Types of I/O
- Endpoint: unique input and output
- Pipeline: ephemeral data
- Batch: shared input data (a shared dataset)
35. Six (plus one) target scientific applications
- BLAST - biology
- IBIS - ecology
- CMS - physics
- Hartree-Fock - chemistry
- Nautilus - molecular dynamics
- AMANDA - astrophysics
- SETI@home - astronomy
36. Example Application: IBIS
[Figure: inputs and climate data flow into IBIS, which analyzes them to produce a forecast]
IBIS is a global-scale simulation of earth's climate used to study the effects of human activity (e.g., global warming). Only one app, thus no pipeline sharing.
37. Absolute Resources Consumed
- Wide range of runtimes. Modest memory usage.
38. Absolute I/O Mix
- Only IBIS has a significant ratio of endpoint I/O.
39. Scalability of batch width
[Chart: storage center (1500 MB/s) versus commodity disk (15 MB/s)]
40. Batch elimination
[Chart: storage center (1500 MB/s) versus commodity disk (15 MB/s)]
41. Pipeline elimination
[Chart: storage center (1500 MB/s) versus commodity disk (15 MB/s)]
42. Endpoint only
[Chart: storage center (1500 MB/s) versus commodity disk (15 MB/s)]
43. Resource Selection Framework for Grid Applications
- Co-selection of resources, which involves
- Selecting a set of resources for a Grid application that meets the application's requirements
- Mapping the workload to those resources
- Investigate the feasibility of a declarative language for expressing application requirements and resource capabilities
- Set matching as an extension of matchmaking (see the sketch after this list)
- Represent characteristics of resources
- Represent requests for a resource set
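A toy set matcher (illustrative; not the authors' set-extended ClassAd engine, and the attribute names are invented): grow a set of resources greedily until its aggregate satisfies the request.

```python
# Toy set matcher (not the set-extended ClassAd engine): greedily grow
# a set of resources until their aggregate satisfies a request, e.g. a
# total memory target with a minimum per-node CPU speed.
def set_match(request, resources):
    """request: {'total_memory_gb': ..., 'min_cpu_mhz': ...};
    resources: list of dicts with 'memory_gb' and 'cpu_mhz' keys."""
    candidates = [r for r in resources
                  if r["cpu_mhz"] >= request["min_cpu_mhz"]]
    candidates.sort(key=lambda r: r["memory_gb"], reverse=True)
    chosen, memory = [], 0
    for r in candidates:
        if memory >= request["total_memory_gb"]:
            break
        chosen.append(r)
        memory += r["memory_gb"]
    # match, or fail (mirroring the matchmaker's two outcomes)
    return chosen if memory >= request["total_memory_gb"] else None

# Example: set_match({'total_memory_gb': 8, 'min_cpu_mhz': 400},
#                    [{'memory_gb': 4, 'cpu_mhz': 450}, ...])
```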
44. Set Match Matchmaker
[Figure: resource ClassAds 1-4 pass through a Set Constructor, which proposes candidate sets (e.g., {Res2}, {Res1, Res2}, {Res1, Res3}); the set-extended Condor matchmaking engine evaluates each candidate against the request ClassAd and returns a match, or fails]
45. Set Match: An Example
46. Resource Selection Service Framework
[Figure: the application sends a resource request to the Resource Selection Service (RSS); the Resource Monitor gathers resource information from MDS (the GIIS and GRISes) and from NWS, the Set Matcher computes a matching resource set, and the Mapper returns the result to the application]
47. Experiments: Mapping Result
- Machine 1: 450 MHz, no CPU load
- Machine 2: 500 MHz, CPU load 2
48. Experiments: Resource Selection (One Site)
- 1. o.ucsd.edu, mystere.ucsd.edu, saltimbanco.ucsd.edu
- 2. mystere.ucsd.edu, o.ucsd.edu
- 3. o.ucsd.edu, saltimbanco.ucsd.edu
- 4. o.ucsd.edu
- 5. saltimbanco.ucsd.edu
- 6. mystere.ucsd.edu
49. Experiments: Resource Selection (Two Sites)
- 1. torc6.cs.utk.edu
- 2. o.ucsd.edu
- 3. saltimbanco.ucsd.edu
- 4. torc6.cs.utk.edu, o.ucsd.edu
- 5. o.ucsd.edu, saltimbanco.ucsd.edu
- 6. o.ucsd.edu, mystere.ucsd.edu, torc6.cs.utk.edu
50. Summary and Future Work
- Extended the ClassAds language to describe set-based requirements for a resource set
- Implemented a set matchmaker and created a resource selection service framework
- Validation with the Cactus application
- Future work: extend the semantics, implementation and application of the set matching framework
51. Resource Co-Allocation with Gangmatching
- Problem: matchmaking is limited by its purely bilateral formalism of matching a single customer with a single resource. One-to-one matching of customers and resources makes it awkward or impossible to support heterogeneous resource co-allocation.
- Solution: gangmatching is a multilateral extension to matchmaking. It replaces the single implicit bilateral match imperative with an explicit list of required bilateral matches (sketched below).
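A toy gangmatch in Python (illustrative; not Condor's gangmatching engine, and real gangmatching also evaluates cross-references between the ads in a gang). The request lists its required bilateral matches explicitly:

```python
from itertools import product

# Toy gangmatch (not Condor's engine): a request carries an explicit
# list of required bilateral matches and succeeds only when one
# distinct ad per requirement satisfies its predicate simultaneously.
def gangmatch(required, ads):
    """required: list of (kind, predicate); ads: dicts with a 'kind' key."""
    pools = [[a for a in ads if a["kind"] == kind and ok(a)]
             for kind, ok in required]
    for gang in product(*pools):                 # candidate combinations
        if len({id(a) for a in gang}) == len(gang):  # ads all distinct
            return list(gang)                    # one match per need
    return None                                  # co-allocation failed

# Hypothetical request echoing the slides' example: a workstation with
# at least 1 GB of memory plus one seat of a software license.
request = [("workstation", lambda a: a["memory_gb"] >= 1),
           ("license",     lambda a: a["seats"] > 0)]
```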
52. A Gangmatch Request
53. A Workstation Advertisement
54. A Software License Advertisement
55. Performance of Indexing
56. Performance of Dynamic Algorithm
57. Performance of Indexing vs. Dynamic Algorithm
58. Future Work
- Incorporate request and resource preferences into the Gangmatching algorithm
- Develop more sophisticated algorithms to cope with larger (i.e., both wider and deeper) ClassAd gangs
- Support an unknown number of co-allocation resources (e.g., a dynamic number of relatively homogeneous resources)
59. Grid Computing: Additional Slides
- Session 1
- David Yates
- January 9, 2004
60. Computer Science and GriPhyN (Foster, Jan 2003)
[Figure: a cycle in which partner physics experiments supply requirements to computer science research; prototyping and experimentation feed the Virtual Data Toolkit; productization leads to production deployment; and technology transfer reaches the larger science community and the Globus, Condor, NMI, EU DataGrid and PPDG communities]
61. GriPhyN CS R&D Activities: Coupling with VDT (Foster, Jan 2003)
[Figure: VDT research activities arranged along the layers knowledge, virtual data, workflow and jobs, built on the Chimera Virtual Data System, the Pegasus planner, DAGMan, and the Globus Toolkit, Condor, Ganglia, etc. Activities include: partial queries (Liu, Franklin); ontologies (Zhao); virtual data language design (Voeckler, Wilde); AI planning (Deelman, Narang); virtual data language applications (Milligan, Zhao); decentralized scheduling (Ranganathan); Prophesy (Taylor, Yin); fault-tolerant master-worker (Marzullo); DAGMan enhancements (UW team); policy-aware scheduling (Dumitrescu); scalable replica location service (UC, ISI team); NeST storage management (UW team); and HP monitoring (George)]
62. Virtual Data Research Issues (Wilde, Oct 2002)
- Representation
- Metadata: how is it created, stored, propagated?
- Datasets and Type Model
- What knowledge must be represented? How?
- Capturing notions of data approximation
- Virtual data catalogs as a community resource
- Automating data capture
- Access control, sharing and privacy issues
- Quality control
- Hyperlinked virtual data catalogs
- Data derivation
- Query estimation and request planning
63. Virtual Data Research Issues (Wilde, Oct 2002), Continued
- Engineering issues
- Dynamic (runtime-computed) dependencies
- Large dependent sets
- Extensions to other data models: relational, OO
- Virtual data browsers
- XML vs. relational databases; query languages
- Additional usage modalities
- E.g., meta-analyses, automated experiment generation, active notebooks
- Virtual data browsers, editors
- Additional applications
- E.g., bioinformatics, earth sciences
64. Active Proxy-G: Integration with Globus
- Integrate standard components
- Globus MDS (Monitoring and Discovery Service)
- NWS (Network Weather Service)
- SRB (Storage Resource Broker)
- Use standard protocols
- WSDL (Web Services Description Language) interface
- SOAP (Simple Object Access Protocol)
[Figure: the slide-12 architecture with a WSDL interface in front of the Active Proxy-G Query Server, Globus MDS feeding the Lightweight Directory Service, NWS feeding the Workload Monitor Service, and SRB serving as the data source behind application servers I..n]