Title: Grid Enabled Analysis
1. Grid Enabled Analysis
- Conrad Steenberg (conrad_at_hep.caltech.edu)
- Frank van Lingen (fvlingen_at_caltech.edu)
2. Physics analysis
3. LHC Data Grid Hierarchy
[Diagram: the CMS tiered data grid (144 institutes, 2000 users). Data flows at ~PByte/sec from the detector and 100-1500 MBytes/sec into the CERN Tier 0 center (PBs of disk, tape robot); 10 Gbps links connect Tier 1 regional centers (FNAL, IN2P3, INFN, RAL); institutes with physics data caches connect at 2.5-10 Gbps; Tier 4 workstations connect at 0.1 to 10 Gbps.]
Needle in the haystack problem
4. Architecture
5. System view
- Support 100s to 1000s of analysis and production tasks
- Batch and interactive use (interactive should be really interactive!)
- Chaotic behavior (not only production-like workflow)
- Resource limited and policy constrained
  - Who is allowed to access what resources?
- Real-time monitoring and trend analysis
- Workflow tracking and data provenance
- Collaborate on analysis (country-wide, world-wide)
- Provide secure access to data and resources
- Self-organizing (prevent the 100-system-administrators nightmare)
  - Detect bottlenecks within the grid (network, storage, CPU) and take action without human intervention
- Secure, robust, fast data transfer
- High-level services: autonomous replication, steering of jobs, workflow management (service flows, data analysis flows)
- Create a robust end-to-end system for physics analysis
  - No single point of failure
  - Composite services
- Provide a simple access point for the user, while performing complex tasks behind the scenes
6. User (scientist) view
- Provide a transparent environment for a physicist to perform his/her analysis (batch/interactive) in a distributed, dynamic environment: identify your data (Catalogs), submit your (complex) job (Scheduling, Workflow, JDL), get fair access to resources (Priority, Accounting), monitor job progress (Monitor, Steering), get the results (Storage, Retrieval), then repeat the process and refine the results (a minimal sketch of this loop follows below)
- I want to share my results/code with a selected audience!
- I want access to data as quickly as possible!
[Diagram (simplistic view!): the client identifies and locates data via Catalogs, submits to a Scheduler that executes on a Farm, follows progress through Monitor/Steering, and results land in Storage, which notifies the client or moves the data.]
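A minimal sketch of the user's analysis loop, using hypothetical service handles: the catalog, scheduler, monitor, storage and refine objects below are placeholders for illustration, not an actual GAE or Clarens API.

# Hypothetical illustration of the analysis loop; every object and method
# name here is a placeholder, not a real GAE/Clarens interface.
def analysis_loop(catalog, scheduler, monitor, storage, query, job_spec, refine):
    """Identify data, submit a job, follow it, fetch results, refine, repeat."""
    while True:
        datasets = catalog.find(query)                 # identify your data (Catalogs)
        job_id = scheduler.submit(job_spec, datasets)  # submit your (complex) job (Scheduling, Workflow, JDL)
        while monitor.status(job_id) == 'running':     # monitor job progress (Monitor, Steering)
            monitor.wait(job_id)
        results = storage.retrieve(job_id)             # get the results (Storage, Retrieval)
        query, job_spec, done = refine(results)        # repeat the process and refine results
        if done:
            return results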
7. More users
- System administrators
- Grid operators
8. Architecture
Example implementations associated with the GAE components. Analysis Clients: ROOT (analysis tool), Python, COJAC (detector viz.), IGUANA (CMS viz. tool).
- Analysis Clients talk standard protocols to the Grid Services Web Server, a.k.a. the Clarens data/services portal.
- A simple web service API allows Analysis Clients (simple or complex) to operate in this architecture (see the XML-RPC sketch after the diagram summary below).
- Typical clients: ROOT, web browser, IGUANA, COJAC.
- The Clarens portal hides the complexity of the Grid Services from the client, but can expose it in as much detail as required, e.g. for monitoring.
- Key features: Global Scheduler, Catalogs, Monitoring, and Grid-wide Execution service.
[Architecture diagram: Analysis Clients (discovery, ACL management, certificate-based access) talk HTTP, SOAP or XML-RPC to Clarens, the Grid Services Web Server. Behind Clarens sit the Scheduler (Sphinx) with fully-abstract, partially-abstract and fully-concrete planners (MCRunjob), Catalogs (metadata, virtual data, replica; RefDB, POOL, Chimera, MOPDB), Monitoring (MonALISA, BOSS), Data Management, the applications (ORCA, FAMOS, ROOT), and an Execution Priority Manager in front of the Grid-wide Execution Service on a VDT server.]
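As a concrete illustration of "standard protocols", a client can talk XML-RPC to a Clarens server using only the Python standard library; a minimal sketch, in which the server URL is a placeholder and echo.echo is the simple test method shown in the Clarens example later in this talk:

try:
    import xmlrpclib                    # Python 2, contemporary with this talk
except ImportError:
    import xmlrpc.client as xmlrpclib   # the Python 3 name

# The URL is a placeholder for any Clarens Grid Services Web Server.
server = xmlrpclib.ServerProxy('http://localhost:8080/clarens/')
print(server.echo.echo('alive?'))       # simple echo call over XML-RPC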
9. Peer-to-Peer System
- Allows a peer-to-peer configuration to be built, with the associated robustness and scalability features:
  - Discovery of services (a failover sketch is given below)
  - No single point of failure
  - Robust file download
[Diagram: a client can ask any of several peers to discover a service (e.g. a Catalog), then query that catalog for data and download the file from whichever peer serves it.]
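A minimal sketch of discovery with no single point of failure, assuming each peer exposes some lookup call; the peer URLs and the discovery.lookup method are hypothetical, not a real Clarens interface.

try:
    import xmlrpclib
except ImportError:
    import xmlrpc.client as xmlrpclib

# Several known entry points, so there is no single point of failure.
DISCOVERY_PEERS = [
    'http://peer1.example.org:8080/clarens/',
    'http://peer2.example.org:8080/clarens/',
]

def find_service(kind):
    """Ask each known peer in turn until one returns a URL for the requested service kind."""
    for url in DISCOVERY_PEERS:
        try:
            peer = xmlrpclib.ServerProxy(url)
            return peer.discovery.lookup(kind)   # hypothetical discovery method
        except Exception:
            continue                             # peer unreachable: try the next one
    raise RuntimeError('no discovery peer reachable for %s' % kind)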
10. Self Organizing
[Diagram: trend analysis and real-time feedback loops drive job scheduling (steering jobs, job feedback) and autonomous replica management (replicate or remove replicas). A hedged sketch of such a feedback rule follows.]
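One way such a feedback rule could look, purely as an illustration: the thresholds and the monitor/replica_manager interfaces are hypothetical, not part of MonALISA or Clarens.

# Hypothetical self-organizing rule: replicate hot datasets, remove cold replicas.
# The monitor/replica_manager objects and both thresholds are illustrative only.
HOT_ACCESSES_PER_HOUR = 100
COLD_ACCESSES_PER_HOUR = 1

def rebalance(monitor, replica_manager, datasets):
    for ds in datasets:
        rate = monitor.access_rate(ds)           # trend analysis on monitoring data
        if rate > HOT_ACCESSES_PER_HOUR:
            replica_manager.replicate(ds)        # real-time feedback: add a replica
        elif rate < COLD_ACCESSES_PER_HOUR and replica_manager.count(ds) > 1:
            replica_manager.remove_one(ds)       # reclaim storage held by cold replicas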
11. Development
12. Services Backbone: Clarens Service Framework
- X.509 certificate based access
- Good performance
- Access control management
- Remote file access
- Dynamic discovery of services on a global scale
- Available in Python and Java
- Easy to install, as root or as a normal user, and part of the DPE distribution. As root do:
  wget -q -O - http://hepgrid1.caltech.edu/clarens/setup_clump.sh | sh
  export opkg_root=/opt/openpkg
- Interoperability with other web service environments such as Globus, through SOAP
- Interoperability with MonALISA (publication of service methods via MonALISA)
[Screenshots: monitoring Clarens parameters; service publication.]
13. Clarens (cont.)
Services hosted in Clarens: POOL catalog, RefDB/PubDB (CERN), BOSS, PhEDEx (CERN), MCRunJob/MOPDB (FNAL), Sphinx grid scheduler (UFL).
Python:
import Clarens  # assumes the Clarens Python client module is importable under this name
dbsvr = Clarens.clarens_client('http://tier2c.cacr.caltech.edu:8080/clarens/')
dbsvr.echo.echo('alive?')
dbsvr.file.size('index.html')
dbsvr.file.ls('/web/system', '*.html')
dbsvr.file.find('//web', '*', 'all')
dbsvr.catalog.getMetaDataSpec('cat4')
dbsvr.catalog.queryCatalog('cat4', 'val1 LIKE "val"', 'meta')
dbsvr.refdb.listApplications('cms', 0, 20, 'Simulation')
[Diagram: clients connect over http/https to the Clarens web server:
- Web browser (Grid Portal: secure cert-based access to services through the browser)
- Java client
- ROOT (analysis tool)
- IGUANA (CMS viz. tool)
- ROOT-CAVES client (analysis sharing tool)
- CLASH (Giulio)
- any app that can make XML-RPC/SOAP calls
- other servers]
14. Clarens Grid Portals
[Screenshots: PDA portal, job execution, catalog service, Collaborative Analysis Desktop.]
15. MonALISA Integration
- Query repositories for monitoring information
- Gather and publish access patterns on collections of data
- Publish web service information for discovery in other distribution systems (a minimal publishing sketch follows below)
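As an illustration of publishing values into MonALISA from Python, a minimal sketch assuming the ApMon client library that ships with MonALISA is installed; the destination host, cluster, node and parameter names are placeholders.

import apmon

# Destination, cluster, node and parameter names are placeholders.
mon = apmon.ApMon(['monalisa.example.org:8884'])
mon.sendParameters('GAE_Testbed', 'clarens-host-1',
                   {'dataset_accesses': 42, 'active_jobs': 7})
mon.free()   # stop ApMon's background threads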
16. SPHINX Grid Scheduler
- Data warehouse
  - Policies, account information, grid weather, resource properties and status, request tracking, workflows
- Control process
  - Finite state machine
  - Different modules modify jobs, graphs, workflows
  - Flexible
  - Extensible
[Diagram: a Sphinx client, colocated with a VDT client and the Chimera Virtual Data System, talks through Clarens to the Sphinx server.]
- Simple sanity checks
  - 120 canonical virtual data workflows submitted to the US-CMS Grid
- Round-robin strategy
  - Equally distribute work to all sites
- Upper-limit strategy
  - Makes use of global information (site capacity)
  - Throttles jobs using just-in-time planning
  - ~40% better throughput (given the grid topology); a toy comparison of the two strategies is sketched below
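A toy sketch of the two placement strategies, purely illustrative: the site names, capacities and job count are invented, whereas the real Sphinx consults its data warehouse and monitoring feeds.

from itertools import cycle

SITES = {'site_a': 10, 'site_b': 4, 'site_c': 2}   # name -> free CPU slots (invented)

def round_robin(n_jobs):
    """Equally distribute work to all sites, ignoring capacity."""
    plan = {s: 0 for s in SITES}
    for _, site in zip(range(n_jobs), cycle(SITES)):
        plan[site] += 1
    return plan

def upper_limit(n_jobs):
    """Use global information (site capacity) and throttle jobs just in time."""
    plan = {s: 0 for s in SITES}
    for _ in range(n_jobs):
        site = max(SITES, key=lambda s: SITES[s] - plan[s])   # most free headroom
        if SITES[site] > plan[site]:
            plan[site] += 1        # otherwise the job waits instead of overloading a site
    return plan

print(round_robin(12))   # overloads the smallest site
print(upper_limit(12))   # respects per-site capacity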
[Diagram: the Sphinx server combines request processing, the data warehouse and data management modules on a web-services backbone, submits through Condor-G/DAGMan to VDT server sites (Globus resources), and gathers information from the Replica Location Service and the MonALISA monitoring service.]
17. Services
- PubDB (provenance catalog) - CERN
- POOL catalog (data catalog) - Caltech/CERN
- BOSS (job submission) - CERN/Caltech/INFN
- TMDB (transfer catalog) - CERN
- RefDB (data/provenance catalog) - CERN/Caltech
- Monte Carlo processing service - FNAL/Caltech
- MOPDB (Monte Carlo production log) - FNAL
- MCRunJob (Monte Carlo production) - FNAL
- Codesh - UFL
- Sphinx (scheduling) - UFL
- MonALISA (monitoring) - Caltech
- SRM (storage resource management)
- SRB
- GROSS (physics analysis job submission)
- Clarens core services: service discovery, ACL management, VO management, file access
Legend: accessible through a web service; has a JavaScript front end; service being developed; Clarens core service; on the wish list to become a service or to interoperate with this service.
18. Deployment
19. The new Clarens distributions register automatically with MonALISA. (Notice there are several entries for the same server, representing different protocols.)
http://monalisa.cacr.caltech.edu/
[Map: GAE testbed with approximately 20 installations world wide, including CACR, CERN, UCSD, the UK, Pakistan, and a conference user!]
20. Scenario I
21. Querying for datasets
[Diagram across hosts 1-7, services at https://rick.dnsalias.net:443/clarens/:
(1) The client discovers the POOL catalog, RefDB and grid schedulers via discovery services.
(2) The client queries several hosts (including a RefDB replica) for the dataset.
(3) The client submits ORCA/ROOT job(s) with the dataset(s) to a grid scheduler / runjob service for reconstruction/analysis.]
- Client code has no knowledge about the location of services, except for several URLs of discovery services.
- Multiple clients will query and submit jobs.
A minimal sketch of steps (1)-(3) is given below.
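A minimal sketch of this scenario in Python, with every URL and the discovery/scheduler method names as hypothetical stand-ins; only the catalog call mirrors the Clarens example shown earlier.

try:
    import xmlrpclib
except ImportError:
    import xmlrpc.client as xmlrpclib

# (1) Discover the POOL catalog, RefDB and grid schedulers via a discovery service.
discovery = xmlrpclib.ServerProxy('https://discovery.example.org/clarens/')
catalog_url = discovery.discovery.lookup('pool_catalog')      # hypothetical method
scheduler_url = discovery.discovery.lookup('grid_scheduler')  # hypothetical method

# (2) Query for the dataset.
catalog = xmlrpclib.ServerProxy(catalog_url)
datasets = catalog.catalog.queryCatalog('cat4', 'owner LIKE "cms"', 'meta')

# (3) Submit ORCA/ROOT job(s) with the dataset(s) for reconstruction/analysis.
scheduler = xmlrpclib.ServerProxy(scheduler_url)
job_id = scheduler.scheduler.submit('orca_reconstruction', datasets)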
22. Scheduling: Push Model
[Diagram, services at https://rick.dnsalias.net:443/clarens/:
(1) Submit job(s) with dataset(s) for reconstruction/analysis to the scheduler service.
(2) The scheduler queries the resource status of each farm (farms 1-4).
(3) The scheduler submits the job(s) to a chosen farm through a uniform job submission layer (BOSS on top of, e.g., PBS).]
- The push model has limitations once the system becomes resource limited (a sketch follows below).
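A toy push-model sketch; the farm interface (free_slots/submit) is a hypothetical placeholder for the uniform submission layer (BOSS over a local batch system such as PBS).

# Toy push-model scheduler: poll every farm, push the job to the least loaded one.
def push_schedule(job, farms):
    status = {farm: farm.free_slots() for farm in farms}   # (2) query resource status
    best = max(status, key=status.get)                     # least loaded farm
    if status[best] == 0:
        raise RuntimeError('all farms busy: the push model stalls once resources run out')
    return best.submit(job)                                # (3) submit the job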
23. Scheduling: Pull Model
[Diagram:
(1) Submit ORCA/ROOT job(s) with dataset(s) for reconstruction/analysis to the scheduler service, which places them in a queue.
(2) Farms announce "resources are available, give me a job" through the uniform job submission layer (BOSS on top of, e.g., PBS).
(3) The farms pull job(s) from the queue.]
- Combining push and pull gives better scalability (a sketch of the pull side follows below).
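A toy pull-model sketch to contrast with the push sketch above; the queue contents and the worker interface are hypothetical.

import collections

# (1) Submitted jobs wait in a central queue.
job_queue = collections.deque(['job1', 'job2', 'job3'])

def farm_worker(free_slots):
    """(2) 'Resources are available, give me a job' -- (3) pull jobs while slots remain."""
    pulled = []
    while free_slots > 0 and job_queue:
        pulled.append(job_queue.popleft())   # pull the next job from the queue
        free_slots -= 1
    return pulled

print(farm_worker(2))   # -> ['job1', 'job2']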
24. Scenario II
25. Client code and the global manager have no knowledge about the location of services, except for several URLs of discovery services.
- Similarity with other approaches (PEAC)
- Multiple clients query and submit jobs
[Diagram of the session sequence:
(1) Discover a global manager.
(2) Request a session (dataset).
(3) Discover a catalog service.
(4) Get the list of farms that have this dataset.
(5) Reserve processing time on the farms.
(6) Allocate time.
(7) Submit job(s); move data to the nodes; report access statistics to MonALISA.
(8) Create job.
(9) Data ready? / data moved.
(10) Alive signal during processing.]
A sketch of this session sequence is given below.
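A compact sketch of the session sequence, with every object and method name hypothetical: the discovery, global manager, catalog and farm interfaces are placeholders inspired by steps (1)-(10) above, not a real PEAC or GAE API.

def run_session(discovery, client_job, dataset):
    gm = discovery.find('global_manager')            # (1) discover a global manager
    session = gm.request_session(dataset)            # (2) request a session for a dataset
    catalog = discovery.find('catalog')              # (3) discover a catalog service
    farms = catalog.farms_with(dataset)              # (4) farms that hold this dataset
    slots = [farm.reserve() for farm in farms]       # (5)/(6) reserve and allocate time
    for farm, slot in zip(farms, slots):
        farm.move_data(dataset)                      # (7) move data to the nodes
        farm.create_job(client_job, slot)            # (8) create the job
    while not all(farm.data_ready() for farm in farms):   # (9) data ready / data moved?
        session.heartbeat()                          # (10) alive signal during processing
    return session.collect_results()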
26. PEAC test run with MonALISA
27. Scenario III
28. Future Work
- Migrate the JavaScript interface to Java applets
- Develop/deploy an accounting service (PPDG activity)
- Job steering service
- Autonomous replication
- Trend analysis using monitoring data
- Integrate/interoperate mass storage applications (e.g. SRM) with the Clarens environment
29. GAE Pointers
- GAE web page: http://ultralight.caltech.edu/gaeweb/
- Clarens web page: http://clarens.sourceforge.net
- Service descriptions: http://hepgrid1.caltech.edu/GAE/services/
- MonALISA: http://monalisa.cacr.caltech.edu/
- SPHINX: http://www.griphyn.org/sphinx/Research/research.php