Title: Distributed Services for Grid Enabled Data Analysis
Distributed Services for Grid Enabled Data Analysis
Scenario
- Liz and John are members of CMS
- Liz is from Caltech and is an expert in event reconstruction
- John is from Florida and is an expert in statistical fits
- They wish to combine their expertise and collaborate on a CMS Data Analysis Project
Demo Goals
- Prototype a vertically integrated system
- Transparent/seamless experience
- Distribute grid services using a uniform web service: Clarens!
- Understand the system
  - latencies
  - failure modes
- Investigate request scheduling in a resource-limited and dynamic environment
  - Emphasize functionality over scalability
- Investigate interactive vs. scheduled data analysis on a grid
  - Hybrid example
- Understand where the difficult issues are
Data Discovery
Virtual data products are pre-registered with the Chimera Virtual Data Service. Using Clarens, Liz and John discover data products by remotely browsing the Chimera Virtual Data Service.
[Figure: Chimera Virtual Data System, showing the derivation chains x.cards → pythia → x.ntpl → h2root → x.root and y.cards → pythia → y.ntpl → h2root → y.root]
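Browsing can be pictured as a query against a derivation graph: each logical file name (LFN) is either raw input or the output of a transformation applied to other LFNs. The C++ sketch below models that idea with hypothetical types (`Derivation`, `VirtualDataCatalog`, `makeDemoCatalog`); it is not the Chimera or Clarens API, and the idea that the catalog also records which LFNs physically exist is an assumption made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical catalog entry: an LFN is produced by running a transformation
// (e.g. "pythia", "h2root") on zero or more input LFNs. Raw inputs such as
// x.cards have an empty transformation and no inputs.
struct Derivation {
    std::string transformation;
    std::vector<std::string> inputs;
};

class VirtualDataCatalog {
public:
    // Pre-register an LFN together with the recipe that derives it.
    void registerProduct(const std::string& lfn, Derivation d) {
        assert(!lfn.empty());
        recipes_[lfn] = std::move(d);
    }
    // Assumption: the catalog also knows which LFNs physically exist.
    void markMaterialized(const std::string& lfn) { materialized_[lfn] = true; }
    bool isMaterialized(const std::string& lfn) const {
        auto it = materialized_.find(lfn);
        return it != materialized_.end() && it->second;
    }
    // "Browse": every registered LFN in sorted order, whether or not it exists yet.
    std::vector<std::string> browse() const {
        std::vector<std::string> out;
        for (const auto& entry : recipes_) out.push_back(entry.first);
        return out;
    }
    // Recipe for an LFN, or nullptr if the catalog has never heard of it.
    const Derivation* recipeFor(const std::string& lfn) const {
        auto it = recipes_.find(lfn);
        return it == recipes_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Derivation> recipes_;
    std::map<std::string, bool> materialized_;
};

// Pre-register x.cards -> pythia -> x.ntpl -> h2root -> x.root and the same
// chain for y; in this sketch only the card files are marked as existing.
VirtualDataCatalog makeDemoCatalog() {
    VirtualDataCatalog vdc;
    for (const std::string p : {"x", "y"}) {
        vdc.registerProduct(p + ".cards", {"", {}});
        vdc.markMaterialized(p + ".cards");
        vdc.registerProduct(p + ".ntpl", {"pythia", {p + ".cards"}});
        vdc.registerProduct(p + ".root", {"h2root", {p + ".ntpl"}});
    }
    return vdc;
}
```

Browsing this catalog lists x.root and y.root even though, in this sketch, only the .cards files exist; that gap between "registered" and "materialized" is what makes the products virtual.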
Data Analysis
Liz wants to analyse x.root using her analysis code a.C.
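// Analysis code a.C (ROOT/CINT macro; only the opening of the file appears on the slide)
#include <iostream>
#include <cmath>
#include "TFile.h"
#include "TTree.h"
#include "TBrowser.h"
#include "TH1.h"
#include "TH2.h"
#include "TH3.h"
#include "TRandom.h"
#include "TCanvas.h"
#include "TPolyLine3D.h"
#include "TPolyMarker3D.h"
#include "TString.h"

void a( const char *treefile, const char *newtreefile )
{
   // HEPEVT-style event record: one local variable per ntuple column
   Int_t   Nhep;
   Int_t   Nevhep;
   Int_t   Isthep[3000];
   Int_t   Idhep[3000], Jmohep[3000][2], Jdahep[3000][2];
   Float_t Phep[3000][5], Vhep[3000][4];
   Int_t   Irun, Ievt;
   Float_t Weight;
   Int_t   Nparam;
   Float_t Param[200];

   // Open the input file, fetch tree "h10" and attach the "Nhep" branch
   TFile *file = new TFile( treefile );
   TTree *tree = (TTree*) file->Get( "h10" );
   tree->SetBranchAddress( "Nhep", &Nhep );
   // ... (the remainder of a.C is not shown on the slide)
}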
[Figure: Chimera Virtual Data System chain x.cards → pythia → x.ntpl → h2root → x.root]
Interactive Workflow Generation
Liz browses the local directory for her analysis code, and the Chimera Virtual Data Service for input logical file names (LFNs).
[Figure: Clarens client with "browse" and "register" actions and the fields "Select input LFN", "Select CINT script" and "Define output LFN", beside the Chimera Virtual Data System chain x.cards → pythia → x.ntpl → h2root → x.root]
Interactive Workflow Generation
She selects and registers (to the Grid) her analysis code, the appropriate input LFN, and a newly defined output LFN.
[Figure: input LFN choices y.ntpl, y.root, x.ntpl, x.root; CINT script choices a.C, b.C, c.C, d.C; output LFN defined as xa.root; "browse" and "register" actions on the Chimera Virtual Data System]
Interactive Workflow Generation
A new branch is automatically added to the Chimera Virtual Data Catalog, and a.C is uploaded into grid space and registered with the Replica Location Service (RLS).
[Figure: the Chimera Virtual Data System graph gains a new branch in which root runs a.C on x.root to produce xa.root; fields "Select input LFN", "Select CINT script" and "Define output LFN" (xa.root) with "browse" and "register" actions]
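The registration step amounts to two bookkeeping updates: a new branch (input LFN + script → output LFN) in the virtual data catalog, and an entry mapping the uploaded script's LFN to its physical file name (PFN) in a replica table. Everything below (`registerAnalysis`, the map-based catalog and replica table, the `gsiftp://se.example.org/...` URL) is a hypothetical illustration, not the Chimera or RLS interface.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical catalog branch: output LFN <- transformation applied to one input LFN.
struct Branch {
    std::string transformation;  // e.g. "h2root", or a user CINT script such as "a.C"
    std::string input;
};
using VirtualDataCatalog = std::map<std::string, Branch>;  // keyed by output LFN

// Hypothetical replica table: one LFN may have several physical copies (PFNs).
using ReplicaTable = std::map<std::string, std::vector<std::string>>;

// Sketch of the "register" action: validate the request, add the new branch
// to the catalog, then record where the uploaded script physically lives.
void registerAnalysis(VirtualDataCatalog& vdc, ReplicaTable& rls,
                      const std::string& inputLfn, const std::string& script,
                      const std::string& outputLfn, const std::string& scriptPfn) {
    assert(!scriptPfn.empty());
    if (vdc.count(inputLfn) == 0)
        throw std::invalid_argument("unknown input LFN: " + inputLfn);
    if (vdc.count(outputLfn) != 0)
        throw std::invalid_argument("output LFN already registered: " + outputLfn);
    vdc[outputLfn] = Branch{script, inputLfn};
    rls[script].push_back(scriptPfn);
}
```

With x.root already in the catalog, registering a.C → xa.root succeeds once; a second attempt with the same output LFN is rejected before anything is recorded, so an existing product is never silently redefined. Rejecting duplicates is a design choice of this sketch.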
Interactive Workflow Generation
Querying the Virtual Data Service, Liz sees that xa.root is now available to her as a new virtual data product.
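[Figure: available LFNs y.ntpl, y.root, x.ntpl, x.root, xa.root; Chimera Virtual Data System graph x.cards → pythia → x.ntpl → h2root → x.root, then root runs a.C on x.root to produce xa.root; "browse" and "request" actions]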
Request Submission
She requests xa.root.
[Figure: xa.root selected with the "request" action in the Chimera Virtual Data System view]
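Because xa.root is virtual, a request must be turned into the list of derivations that still have to run. A minimal sketch, assuming the catalog can be read as a map from each derived LFN to its transformation and inputs, is a depth-first post-order walk that skips LFNs already on the grid. `Recipe` and `materializationPlan` are hypothetical names; the planner used in the demo may work differently.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical recipe: output LFN <- transformation(inputs).
struct Recipe {
    std::string transformation;
    std::vector<std::string> inputs;
};

// Depth-first, post-order walk: every input is planned before the step that
// consumes it; LFNs that already exist, or were already planned, are skipped.
// Assumes the derivation graph is acyclic.
void plan(const std::string& lfn,
          const std::map<std::string, Recipe>& recipes,
          const std::set<std::string>& materialized,
          std::set<std::string>& visited,
          std::vector<std::string>& steps) {
    if (materialized.count(lfn) || !visited.insert(lfn).second) return;
    const Recipe& r = recipes.at(lfn);  // throws std::out_of_range if underivable
    for (const auto& in : r.inputs) plan(in, recipes, materialized, visited, steps);
    steps.push_back(r.transformation + " -> " + lfn);
}

// Ordered list of "transformation -> output" steps needed to produce `requested`.
std::vector<std::string> materializationPlan(
        const std::string& requested,
        const std::map<std::string, Recipe>& recipes,
        const std::set<std::string>& materialized) {
    std::set<std::string> visited;
    std::vector<std::string> steps;
    plan(requested, recipes, materialized, visited, steps);
    return steps;
}
```

If only x.cards and a.C exist, requesting xa.root yields three steps (pythia, h2root, then root running a.C); if x.root has already been produced, only the final step remains.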
Brief Interlude: The Grid is Busy and Resources are Limited!
- Busy
  - Production is taking place
  - Other physicists are using the system
  - Use MonALISA to avoid congestion in the grid
- Limited
  - As grid computing becomes standard fare, oversubscription of resources will be common!
  - CMS gives Liz a global high priority
  - Based upon local and global policies, and current Grid weather, a grid scheduler must schedule her requests for optimal resource use
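The scheduling problem on this slide can be illustrated with a priority queue: requests are ordered by a policy-assigned priority (first come, first served among equals), and each is placed on the least-loaded site that still has a free slot, using a snapshot of "grid weather" such as a monitoring service might supply. The types, fields and placement rule below are hypothetical; they show the shape of policy-based scheduling, not the algorithm of the demo's scheduler.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical request: priority comes from (assumed) global/local policy;
// seq records submission order so equal priorities are served in order.
struct Request {
    std::string user;
    std::string lfn;
    int priority;  // higher runs first
    long seq;      // lower was submitted earlier
};

// Comparator for std::priority_queue: returns true if a should run after b.
struct LowerPriority {
    bool operator()(const Request& a, const Request& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;  // the earlier submission wins a tie
    }
};

// Hypothetical grid-weather snapshot for one site.
struct Site {
    std::string name;
    int freeSlots;
    double load;  // 0.0 (idle) .. 1.0 (saturated)
};

// Pick the least-loaded site with a free slot and reserve the slot; "" if none.
std::string chooseSite(std::vector<Site>& sites) {
    Site* best = nullptr;
    for (auto& s : sites)
        if (s.freeSlots > 0 && (!best || s.load < best->load)) best = &s;
    if (!best) return "";
    --best->freeSlots;
    return best->name;
}

using RequestQueue = std::priority_queue<Request, std::vector<Request>, LowerPriority>;

// Place requests in priority order; those that find no free slot stay queued.
std::vector<std::string> schedule(RequestQueue& q, std::vector<Site>& sites) {
    std::vector<std::string> assignments;
    while (!q.empty()) {
        std::string site = chooseSite(sites);
        if (site.empty()) break;
        assignments.push_back(q.top().lfn + "@" + site);
        q.pop();
    }
    return assignments;
}
```

With Liz's request at a higher priority than John's, her job is placed first and takes the less-loaded site. For simplicity the sketch only decrements free slots after each placement and does not re-estimate site load.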
Sphinx Scheduling Server
- Nerve Centre
  - Global view of the system
- Data Warehouse
  - Information driven
  - Repository of the current state of the grid
- Control Process
  - Finite State Machine
  - Different modules modify jobs, graphs, workflows, etc., and change their state
  - Flexible
  - Extensible
[Figure: Sphinx Server architecture. Modules: Message Interface, Control Process, Graph Reducer, Graph Predictor, Graph Admission Control, Graph Data Planner, Graph Tracker, Job Predictor, Job Admission Control, Job Execution Planner, Data Management and Information Gatherer, around a Data Warehouse holding policies, accounting info, grid weather, resource properties and status, request tracking, workflows, etc.]
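The control process is described as a finite state machine whose modules move jobs and workflows between states. A table-driven C++ sketch of that idea follows; the states, events and their order are hypothetical, loosely suggested by the module names (reducer, predictor, admission control, planner, tracker), and are not Sphinx's actual state list.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical workflow states and the events (module actions) that move them.
enum class State { Received, Reduced, Predicted, Admitted, Rejected, Planned, Running, Finished };
enum class Event { Reduce, Predict, Admit, Reject, Plan, Submit, Complete };

class ControlProcess {
public:
    ControlProcess() {
        // Each row: (current state, event) -> next state.
        table_[{State::Received,  Event::Reduce}]   = State::Reduced;
        table_[{State::Reduced,   Event::Predict}]  = State::Predicted;
        table_[{State::Predicted, Event::Admit}]    = State::Admitted;
        table_[{State::Predicted, Event::Reject}]   = State::Rejected;
        table_[{State::Admitted,  Event::Plan}]     = State::Planned;
        table_[{State::Planned,   Event::Submit}]   = State::Running;
        table_[{State::Running,   Event::Complete}] = State::Finished;
    }
    // Apply one event; an event that is not legal in the current state is an error.
    State step(State current, Event e) const {
        auto it = table_.find({current, e});
        if (it == table_.end()) throw std::logic_error("illegal transition");
        return it->second;
    }

private:
    std::map<std::pair<State, Event>, State> table_;
};
```

Keeping the legal transitions in one table is what makes such a design easy to extend: adding a module means adding states or table rows rather than rewriting the control loop.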
Distributed Services for Grid Enabled Data Analysis
[Figure: distributed services architecture connecting Clarens servers with Globus, GridFTP and MonALISA]
Collaborative Analysis
Meanwhile, John has been developing his statistical fits in b.C by analysing the data product x.root.
[Figure: root runs a.C on x.root to produce xa.root and b.C on x.root to produce xb.root; available LFNs y.ntpl, y.root, x.ntpl, x.root, xa.root, xb.root; "browse" and "request" actions]
Collaborative Analysis
After Liz has finished optimising the event reconstruction, John uses his analysis code b.C on her data product xa.root to produce the final statistical fits and results!
[Figure: root runs b.C on xa.root to produce xab.root; available LFNs y.root, x.ntpl, x.root, xa.root, xb.root, xab.root; "browse" and "request" actions]
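Because every product in this scenario was derived through registered transformations, its provenance can be reconstructed by following input links back to raw data. The sketch below uses a hypothetical `Recipe` map and `lineage` function, not any Chimera interface, to list the derivation history of xab.root, oldest step first.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical provenance record: how each derived LFN was produced.
struct Recipe {
    std::string transformation;
    std::vector<std::string> inputs;
};

// Append the derivation history of `lfn` to `history`, oldest step first,
// by recursing into inputs before recording the step that consumed them.
// LFNs without a recipe (raw data, uploaded scripts) end the recursion.
void lineage(const std::string& lfn,
             const std::map<std::string, Recipe>& recipes,
             std::vector<std::string>& history) {
    auto it = recipes.find(lfn);
    if (it == recipes.end()) return;
    for (const auto& in : it->second.inputs) lineage(in, recipes, history);
    std::string step = it->second.transformation + "(";
    for (std::size_t i = 0; i < it->second.inputs.size(); ++i)
        step += (i ? ", " : "") + it->second.inputs[i];
    history.push_back(step + ") -> " + lfn);
}
```

For xab.root it reports four steps, from pythia(x.cards) -> x.ntpl through root(xa.root, b.C) -> xab.root. The sketch does not deduplicate shared ancestors, which would matter for graphs with diamond-shaped dependencies.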
Key Features
- Distributed Services Prototype in Data Analysis
  - Remote Data Service
  - Replica Location Service
  - Virtual Data Service
  - Scheduling Service
  - Grid-Execution Service
  - Monitoring Service
- Smart Replication Strategies for Hot Data
  - Virtual Data w.r.t. Location
- Execution Priority Management on a Resource-Limited Grid
  - Policy-Based Scheduling & QoS
  - Virtual Data w.r.t. Existence
- Collaborative Environment
  - Sharing of Datasets
  - Use of Provenance
Credits
- California Institute of Technology
  - Julian Bunn, Iosif Legrand, Harvey Newman, Suresh Singh, Conrad Steenberg, Michael Thomas, Frank Van Lingen, Yang Xia
- University of Florida
  - Paul Avery, Dimitri Bourilkov, Richard Cavanaugh, Laukik Chitnis, Jang-uk In, Mandar Kulkarni, Pradeep Padala, Craig Prescott, Sanjay Ranka
- Fermi National Accelerator Laboratory
  - Anzar Afaq, Greg Graham