Gathering at the Well: Creating Communities for Grid I/O presentation

1
Gathering at the Well: Creating Communities for Grid I/O
2
New framework needed

Remote I/O is possible anywhere
Build notion of locality into system?
What are possibilities?
Move job to data
Move data to job
Allow job to access data remotely
Need framework to expose these policies

3
Key elements

Storage appliance, interposition agents,
schedulers and match-makers
Mechanism not policies
Policies are exposed to an upper layer
We will however demonstrate the strength of this
mechanism

4
To infinity and beyond

Speedups of 2.5x possible when we are able to use
locality intelligently
This will continue to be important
Data sets are getting larger and larger
There will always be bottlenecks

5
Outline

Motivation
Components
Expressing locality
Experiments
Conclusion

6
I/O communities

A mechanism that allows either
jobs to move to data, or
data to move to jobs, or
data to be accessed remotely
Framework to evaluate these policies
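The three options on this slide can be written down as a small data model. This is only an illustrative sketch: the names `Policy`, `Plan`, and `make_plan`, and the site labels "uw" and "infn", are assumptions made for the example and are not taken from the presented system.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    MOVE_JOB_TO_DATA = "move job to data"
    MOVE_DATA_TO_JOB = "move data to job"
    REMOTE_ACCESS = "access data remotely"


@dataclass(frozen=True)
class Plan:
    run_at: str        # site where the job executes
    read_from: str     # site whose storage the job reads during the run
    stage_first: bool  # whether the data set is copied before the job starts


def make_plan(policy: Policy, job_site: str, data_site: str) -> Plan:
    """Translate a policy into where the job runs and where it reads from."""
    if policy is Policy.MOVE_JOB_TO_DATA:
        return Plan(run_at=data_site, read_from=data_site, stage_first=False)
    if policy is Policy.MOVE_DATA_TO_JOB:
        return Plan(run_at=job_site, read_from=job_site, stage_first=True)
    # Remote access: the job stays put and reads across the network.
    return Plan(run_at=job_site, read_from=data_site, stage_first=False)


for policy in Policy:
    print(policy.value, "->", make_plan(policy, job_site="uw", data_site="infn"))
```

A framework that evaluates policies would then compare such plans on whatever metric matters to the user.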

7
Grocers, butchers, cops

Members of an I/O community
Storage appliances
Interposition agents
Scheduling systems
Discovery systems
Match-makers
Collection of CPUs

8
Storage appliances

Should run without special privilege
Flexible and easily deployable
Acceptable to nervous sys admins
Should allow multiple access modes
Low latency local accesses
High bandwidth remote puts and gets

9
NeST
Storage Manager
Physical storage layer
10
Interposition agents

Thin software layer interposed between
application and OS
Allow applications to transparently interact with
storage appliances
Unmodified programs can run in grid environment
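As a rough analogy for interposition, the sketch below puts a thin layer between an "application" and `open()`, redirecting a logical path to wherever the data actually lives. It is not how PFS works internally; the `Agent` class, the `/cms` prefix, and the temporary directory standing in for a storage appliance are all hypothetical.

```python
import os
import tempfile


class Agent:
    """Toy interposition layer: rewrites logical paths before calling open()."""

    def __init__(self, mounts):
        # mounts maps a logical prefix to the directory that really holds the data
        self.mounts = dict(mounts)

    def resolve(self, path):
        for prefix, target in self.mounts.items():
            if path == prefix or path.startswith(prefix + "/"):
                return target + path[len(prefix):]
        return path  # not under any mount: pass through unchanged

    def open(self, path, mode="r"):
        return open(self.resolve(path), mode)


# The "application" opens a logical path; the agent redirects it.
with tempfile.TemporaryDirectory() as appliance_dir:
    with open(os.path.join(appliance_dir, "db.txt"), "w") as f:
        f.write("cms data")
    agent = Agent({"/cms": appliance_dir})
    with agent.open("/cms/db.txt") as f:
        contents = f.read()
```

The application code only ever sees `/cms/db.txt`, which is the sense in which the program itself stays unmodified.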

11
PFS Pluggable File System
12
Scheduling systems and discovery

Top level scheduler needs ability to discover
diverse resources
CPU discovery
Where can a job run?
Device discovery
Where is my local storage appliance?
Replica discovery
Where can I find my data?

13
Match-making

Match-making is the glue which brings discovery
systems together
Allows participants to indirectly identify each
other
i.e. can locate resources without explicitly
naming them

14
Condor and ClassAds
15
Outline

Motivation
Components
Expressing locality
Experiments
Conclusion

16
I/O Communities
17
Two I/O communities

INFN Condor pool
236 machines, about 30 available at any one time
Wide range of machines and networks spread across
Italy
Storage appliance in Bologna
750 MIPS, 378 MB RAM

18
Two I/O communities

UW Condor pool
900 machines, 100 dedicated for us
Each is 600 MIPS, 512 MB RAM
Networked on 100 Mb/s switch
One was used as a storage appliance

19
Who Am I This Time?

We assumed the role of an Italian scientist
Database stored in Bologna
Need to run 300 instances of simulator

20
Hmmm
21
Three way matching
[Diagram: the Job Ad, Machine Ad, and Storage Ad are matched together. The Job Ad refers to NearestStorage; the Machine Ad knows where NearestStorage is. The ads describe the Job, the Machine, and the NeST.]
22
Two way ClassAds
Job ClassAd
Type = "job"
TargetType = "machine"
Cmd = "sim.exe"
Owner = "thain"
Requirements = (OpSys == "linux")

Machine ClassAd
Type = "machine"
TargetType = "job"
OpSys = "linux"
Requirements = (Owner == "thain")
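Two-way matching can be mimicked in a few lines of Python. This is a sketch under simplifying assumptions: ads are plain dictionaries, each `Requirements` expression is modeled as a function of the other ad, and `two_way_match` is a hypothetical helper rather than part of Condor.

```python
job = {
    "Type": "job",
    "TargetType": "machine",
    "Cmd": "sim.exe",
    "Owner": "thain",
    # The job accepts any machine running linux.
    "Requirements": lambda other: other.get("OpSys") == "linux",
}

machine = {
    "Type": "machine",
    "TargetType": "job",
    "OpSys": "linux",
    # The machine accepts only jobs owned by thain.
    "Requirements": lambda other: other.get("Owner") == "thain",
}


def two_way_match(a, b):
    """Both ads must target each other's type and accept each other."""
    return (
        a["TargetType"] == b["Type"]
        and b["TargetType"] == a["Type"]
        and a["Requirements"](b)
        and b["Requirements"](a)
    )


print(two_way_match(job, machine))
```

Neither side names the other; each only states what it is willing to accept.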
23
Three way ClassAds
Job ClassAd
Type = "job"
TargetType = "machine"
Cmd = "sim.exe"
Owner = "thain"
Requirements = (OpSys == "linux") && NearestStorage.HasCMSData

Machine ClassAd
Type = "machine"
TargetType = "job"
OpSys = "linux"
Requirements = (Owner == "thain")
NearestStorage = (Name == "turkey") && (Type == "Storage")
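The extra step in three-way matching is resolving the machine's NearestStorage expression to an actual storage ad before the job's requirements are checked. The following is a simplified sketch, assuming dictionary ads and predicate functions; `three_way_match` and the contents of the storage ad are illustrative, not the Condor match-maker's implementation.

```python
job = {
    "Type": "job",
    "Owner": "thain",
    # The job wants linux and a nearby storage appliance holding CMS data.
    "Requirements": lambda m: m.get("OpSys") == "linux"
    and m["NearestStorage"].get("HasCMSData", False),
}

machine = {
    "Type": "machine",
    "OpSys": "linux",
    "Requirements": lambda j: j.get("Owner") == "thain",
    # The machine identifies its storage by attributes, not by a direct reference.
    "NearestStorage": lambda s: s.get("Name") == "turkey" and s.get("Type") == "Storage",
}

storage = {"Type": "Storage", "Name": "turkey", "HasCMSData": True}


def three_way_match(job, machine, storage_ads):
    """Return the storage ad completing the match, or None if there is no match."""
    nearest = next((s for s in storage_ads if machine["NearestStorage"](s)), None)
    if nearest is None:
        return None
    # The job evaluates the machine ad with the resolved storage ad attached.
    view = dict(machine, NearestStorage=nearest)
    if job["Requirements"](view) and machine["Requirements"](job):
        return nearest
    return None


print(three_way_match(job, machine, [storage]))
```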
24
Outline

Motivation
Components
Expressing locality
Experiments
Conclusion

25
BOOM!
26
CMS simulator sample run

Purposely chose a run with a high I/O-to-CPU ratio
Accesses about 20 MB of data from a 300 MB
database
Writes about 1 MB of output
160 seconds execution time
on a 600 MIPS machine with local disk
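A back-of-envelope calculation with the figures on this slide, plus the 300 instances mentioned earlier, shows why the choice between staging and remote access is not obvious. It counts bytes only. It assumes each job reads its 20 MB independently, ignores output traffic in the comparison, and ignores latency and bandwidth differences, so it is a rough illustration rather than a measured result.

```python
DB_MB = 300     # size of the database
READ_MB = 20    # data read by one run
WRITE_MB = 1    # output written by one run
RUN_S = 160     # execution time on a 600 MIPS machine with local disk
JOBS = 300      # simulator instances to run

# Average I/O rate one job needs in order to run at local-disk speed.
io_rate_mb_s = (READ_MB + WRITE_MB) / RUN_S

# Wide-area read volume if every job reads remotely.
remote_read_total_mb = JOBS * READ_MB

# Wide-area volume to stage one full copy of the database into a community.
stage_total_mb = DB_MB

# Number of jobs sharing one staged copy at which staging moves fewer bytes.
break_even_jobs = DB_MB / READ_MB

print(f"{io_rate_mb_s:.3f} MB/s per job")
print(f"{remote_read_total_mb} MB remote vs {stage_total_mb} MB staged")
print(f"break-even at {break_even_jobs:.0f} jobs per staged copy")
```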

27
Policy specification

Run only with locality
Requirements = (NearestStorage.HasCMSData)
Run in only one particular community
Requirements = (NearestStorage.Name == "nestore.bologna")
Prefer home community first
Requirements = (NearestStorage.HasCMSData)
Rank = (NearestStorage.Name == "nestore.bologna") ? 10 : 0
Arbitrarily complex
Requirements = (NearestStorage.Name == "nestore.bologna") || (ClockHour < 7) || (ClockHour > 18)
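The "prefer home community first" policy combines a filter (Requirements) with an ordering (Rank). A minimal Python sketch of that combination follows; the machine names and the second storage name are made up for the example, and `prefer_home` is a hypothetical helper, not a Condor call.

```python
HOME = "nestore.bologna"

machines = [
    {"Name": "uw-node", "NearestStorage": {"Name": "nest.uw.example", "HasCMSData": True}},
    {"Name": "infn-node", "NearestStorage": {"Name": HOME, "HasCMSData": True}},
    {"Name": "bare-node", "NearestStorage": {"Name": "nest.other.example", "HasCMSData": False}},
]


def requirements(machine):
    # Run only where the nearest storage appliance already holds the data.
    return machine["NearestStorage"].get("HasCMSData", False)


def rank(machine):
    # Prefer the home community: 10 if it is the Bologna appliance, else 0.
    return 10 if machine["NearestStorage"]["Name"] == HOME else 0


def prefer_home(candidates):
    """Drop machines that fail Requirements, then order by descending Rank."""
    eligible = [m for m in candidates if requirements(m)]
    return sorted(eligible, key=rank, reverse=True)


ordered = [m["Name"] for m in prefer_home(machines)]
print(ordered)
```

Requirements decides whether a match is allowed at all, while Rank only orders the matches that remain.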

28
Policies evaluated

INFN local
UW remote
UW stage first
UW local (pre-staged)
INFN local, UW remote
INFN local, UW stage
INFN local, UW local

29
Completion Time
30
CPU Efficiency
31
Conclusions

I/O communities expose locality policies
Users can increase throughput
Owners can maximize resource utilization

32
Future work

Automation
Configuration of communities
Dynamically adjust size as load dictates
Automation
Selection of movement policy
Automation

33
For more info

Condor
http://www.cs.wisc.edu/condor
ClassAds
http://www.cs.wisc.edu/condor/classad
PFS
http://www.cs.wisc.edu/condor/pfs
NeST
http://www.nestproject.org

34
Local only
35
Remote only
36
Both local and remote
37
I/O communities are an old idea, right?

File servers and administrative domains
No, not really. We need
more flexible boundaries
simple mechanism by which users can express I/O
community relationships
hooks into system that allow users to use locality

38
Grid applications have demanding I/O needs

Petabytes of data in tape repositories
Scheduling systems have demonstrated that there
are idle CPUs
Some systems
move jobs to data
move data to jobs
allow job remote access to data
No one approach is always best

39
Easy come, easy go

In a computational grid, resources are very dynamic
Programs need rich methods for finding and
claiming resources
CPU discovery
Device discovery
Replica discovery

40
Bringing it all together
[Diagram labels: Distributed Repository; Storage appliance; Execution site; Job; Agent; Short-haul I/O; Long-haul I/O; CPU Discovery System; Device Discovery System; Replica Discovery System]
41
Conclusions

Locality is good
Balance point between staging data and accessing
it remotely is not static
depends on specific attributes of the job
data size, expected degree of re-reference, etc
depends on performance metric
CPU efficiency or job completion time

42
Implementation

NeST
storage appliance
Pluggable File System (PFS)
interposition agent built with Bypass
Condor and ClassAds
scheduling system
discovery system
match-maker

43
Jim Gast and Bart say ...

Too many bullet slides
Contributions
Scientist doesn't want to name resources, because
resources are dynamic and
the name is irrelevant
hooks into system to allow users to express and
take advantage of locality

44
Jim Gast and Bart say ...

Everyone knows locality is good, but there is no
way to express this and run jobs on the grid
I/O communities are the mechanism by which a user can
use locality and specify policies to optimize job
performance

45
4 earth-shattering revelations

1) The grid is big.
2) Scientific data-sets are large.
3) Idle resources are available.
4) Locality is good.

46
Mechanisms not policies

I/O communities are a mechanism not a policy
A higher layer is expected to choose application
appropriate policies
We will however demonstrate the strength of the
mechanism by defining appropriate policies for
one particular application

47
Experimental results

Implementation
Environment
Application
Measurements
Evaluation
