1
Grid Enabled Analysis
  • Conrad Steenberg (conrad@hep.caltech.edu)
  • Frank van Lingen (fvlingen@caltech.edu)

2
Physics analysis
3
LHC Data Grid Hierarchy
[Diagram: LHC Data Grid hierarchy for CMS (144 institutes, 2000 users). The online system feeds the CERN Tier 0 center (PBs of disk, tape robot); Tier 1 centers (FNAL, IN2P3, INFN, RAL) connect over 2.5-10 Gbps links; Tier 2/3 institutes hold physics data caches at 0.1 to 10 Gbps; Tier 4 is end-user workstations. Data rates range from 100-1500 MBytes/sec up to ~PByte/sec at the source. Finding the interesting events in this data is a needle-in-the-haystack problem.]
4
ARCHITECTURE
5
System view
  • Support 100s-1000s of analysis and production tasks
  • Batch and interactive use (interactive should be truly
    interactive!)
  • Chaotic behavior (not only production-like workflows)
  • Resource limited and policy constrained
  • Who is allowed to access what resources?
  • Real-time monitoring and trend analysis
  • Workflow tracking and data provenance
  • Collaborate on analysis (country-wide, world-wide)
  • Provide secure access to data and resources
  • Self organizing (prevent the "100 system
    administrators" nightmare)
  • Detect bottlenecks within the grid (network,
    storage, CPU) and take action without
    human intervention
  • Secure, robust, fast data transfer
  • High-level services: autonomous replication,
    steering of jobs, workflow management (service
    flows, data analysis flows)
  • Create a robust end-to-end system for physics
    analysis
  • No single point of failure
  • Composite services
  • Provide a simple access point for the user, while
    performing complex tasks behind the scenes

6
User (scientist) view
  • Provide a transparent environment for a physicist
    to perform his/her analysis (batch/interactive)
    in a distributed, dynamic environment: identify
    your data (Catalogs), submit your (complex) job
    (Scheduling, Workflow, JDL), get fair access to
    resources (Priority, Accounting), monitor job
    progress (Monitor, Steering), get the results
    (Storage, Retrieval), then repeat the process and
    refine the results (see the sketch below)
  • I want to share my results/code with a selected
    audience!
  • I want access to data as quickly as possible!

[Diagram: simplistic view of the analysis loop — identify and locate data via Catalogs, submit to the Scheduler, execute on a Farm, store results in Storage, monitor/steer the job, and get notified when results are moved back.]
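The loop above can be pictured as a short client-side script. The sketch below is illustrative only: the Clarens.clarens_client call mirrors the Python example on slide 13, while catalog.findDatasets, scheduler.submit, monitor.status and storage.fetch are hypothetical placeholder methods for whatever catalog, scheduling, monitoring and storage services a given deployment exposes.

# Illustrative sketch of the identify / submit / monitor / retrieve loop.
# clarens_client mirrors slide 13; the other method names are hypothetical.
import time
import Clarens

svr = Clarens.clarens_client('http://tier2c.cacr.caltech.edu:8080/clarens/')

datasets = svr.catalog.findDatasets('ttbar')      # identify your data (Catalogs)
job_id = svr.scheduler.submit(datasets[0])        # submit your job (Scheduling, JDL)

while svr.monitor.status(job_id) == 'running':    # monitor job progress (Monitor, Steering)
    time.sleep(30)

results = svr.storage.fetch(job_id)               # get the results (Storage, Retrieval)
# ...inspect the results, refine the query, and repeat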
7
More users
  • System administrators
  • Grid operators

8
Architecture
Example implementations associated with GAE components:
ROOT (analysis tool), Python, COJAC (detector viz.), IGUANA (CMS viz. tool)
Analysis Client
  • Analysis Clients talk standard protocols to the
    Grid Services Web Server, a.k.a. the Clarens
    data/services portal.
  • A simple Web service API allows Analysis Clients
    (simple or complex) to operate in this
    architecture (see the sketch below).
  • Typical clients: ROOT, Web Browser, IGUANA,
    COJAC
  • The Clarens portal hides the complexity of the
    Grid Services from the client, but can expose it
    in as much detail as required, e.g. for monitoring.
  • Key features: Global Scheduler, Catalogs,
    Monitoring, and Grid-wide Execution service.

[Diagram: Analysis Clients (service discovery, ACL management, certificate-based access) talk HTTP, SOAP and XML-RPC to the Clarens Grid Services Web Server. Behind Clarens sit the Scheduler (Sphinx, with fully-abstract, partially-abstract and fully-concrete planners), Catalogs (metadata, virtual data via Chimera, replica, RefDB, MOPDB), Data Management, Monitoring (MonALISA, BOSS), Applications (ORCA, FAMOS, ROOT, POOL, MCRunjob), and a Grid-wide Execution Service with an Execution Priority Manager on VDT servers.]
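Because the portal speaks standard protocols, any XML-RPC-capable program can act as an Analysis Client. The sketch below uses only Python's standard xmlrpc.client library against an example Clarens URL; the echo.echo and file.ls calls mirror the Python example on slide 13. A real deployment would additionally require X.509 certificate-based authentication, which is omitted here.

# Minimal Analysis Client sketch using only the Python standard library.
import xmlrpc.client

server = xmlrpc.client.ServerProxy('http://tier2c.cacr.caltech.edu:8080/clarens/')

print(server.echo.echo('alive?'))                 # liveness check
print(server.file.ls('/web/system', '*.html'))    # remote file listing via the portal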
9
Peer-to-Peer System
  • Allow a peer-to-peer configuration to be
    built, with associated robustness and scalability
    features.
  • Discovery of services
  • No single point of failure
  • Robust file download

[Diagram: a client finds a service (e.g. a Catalog) through "discover services" requests that propagate between peers, then queries the catalog for data and downloads the file directly.]
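A minimal sketch of the discover-then-download pattern, assuming a Clarens-style client as in the slide 13 example; only catalog.queryCatalog comes from that slide, while discovery.find_server and file.read are hypothetical stand-ins for the peer's discovery and file services.

# Sketch of peer-to-peer discovery followed by a data query and download.
import Clarens

peer = Clarens.clarens_client('https://rick.dnsalias.net:443/clarens/')

# Ask a known peer where a catalog service lives; any peer can answer,
# so there is no single point of failure.
catalog_url = peer.discovery.find_server('catalog')      # hypothetical call
catalog = Clarens.clarens_client(catalog_url)

matches = catalog.catalog.queryCatalog('cat4', 'val1 LIKE "val*"', 'meta')
payload = catalog.file.read(matches[0])    # robust download: retry another replica on failure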
10
Self Organizing
[Diagram: self-organizing behavior — real-time feedback and trend analysis from monitoring drive job scheduling and job steering, while an autonomous replica manager decides when to replicate data to, or remove it from, sites.]
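As an illustration of the replicate/remove decision the diagram hints at, the sketch below applies a simple threshold rule to access-rate trends; the thresholds, the adjust_replicas function and the replica-manager calls are all hypothetical and not part of the actual GAE services.

# Toy trend-analysis rule for autonomous replica management (hypothetical names).
HOT_ACCESSES_PER_HOUR = 100    # replicate datasets read more often than this
COLD_ACCESSES_PER_HOUR = 1     # remove replicas read less often than this

def adjust_replicas(dataset, access_rate, replica_manager):
    """Replicate popular datasets, retire unused replicas."""
    if access_rate > HOT_ACCESSES_PER_HOUR:
        replica_manager.replicate(dataset)    # spread load across sites
    elif access_rate < COLD_ACCESSES_PER_HOUR:
        replica_manager.remove(dataset)       # reclaim storage
    # otherwise leave the current replica placement unchanged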
11
Development
12
Services Backbone: Clarens Service Framework
  • X.509 certificate-based access
  • Good performance
  • Access control management
  • Remote file access
  • Dynamic discovery of services on a global scale
  • Available in Python and Java
  • Easy to install, as root or normal user, and part
    of the DPE distribution. As root do:
  • wget -q -O - http://hepgrid1.caltech.edu/clarens/setup_clump.sh | sh
  • export opkg_root=/opt/openpkg
  • Interoperability with other web service
    environments such as Globus, through SOAP
  • Interoperability with MonALISA (publication of
    service methods via MonALISA)

[Screenshots: MonALISA monitoring of Clarens parameters and Clarens service publication.]
13
Clarens Cont.
Services hosted behind Clarens: POOL catalog, RefDB/PubDB (CERN), BOSS,
PhEDEx (CERN), MCRunJob/MOPDB (FNAL), Sphinx (grid scheduler) (UFL)
Python
dbsvr = Clarens.clarens_client('http://tier2c.cacr.caltech.edu:8080/clarens/')
dbsvr.echo.echo('alive?')
dbsvr.file.size('index.html')
dbsvr.file.ls('/web/system', '*.html')
dbsvr.file.find('//web', '*', 'all')
dbsvr.catalog.getMetaDataSpec('cat4')
dbsvr.catalog.queryCatalog('cat4', 'val1 LIKE "val*"', 'meta')
dbsvr.refdb.listApplications('cms', 0, 20, 'Simulation')
Clients connecting over http/https to the Clarens web server:
  • Grid Portal (secure cert-based access to services
    through a browser)
  • Java client
  • ROOT (analysis tool)
  • IGUANA (CMS viz. tool)
  • ROOT-CAVES client (analysis sharing tool)
  • CLASH (Giulio)
  • any app that can make XML-RPC/SOAP calls
  • other Clarens servers
14
Clarens Grid Portals
[Screenshots: Clarens grid portals — PDA client, job execution, catalog service, and the Collaborative Analysis Desktop.]
15
MonALISA Integration
  • Query repositories for monitor information
  • Gather and publish access patterns on collections
    of data
  • Publish web services information for discovery in
    other distribution systems
16
SPHINX Grid scheduler
  • Data warehouse
  • Policies, account information, grid weather,
    resource properties and status, request tracking,
    workflows
  • Control process
  • Finite state machine
  • Different modules modify jobs, graphs, workflows
  • Flexible
  • Extensible

[Diagram: a Sphinx Client and VDT Client interact through Clarens with the Sphinx Server; the Chimera Virtual Data System provides the virtual data workflows.]
  • Simple sanity checks
  • 120 canonical virtual data workflows submitted to
    the US-CMS Grid
  • Round-robin strategy
  • Equally distribute work to all sites
  • Upper-limit strategy
  • Makes use of global information (site capacity)
  • Throttles jobs using just-in-time planning
    (both strategies are sketched below)
  • 40% better throughput (given the grid topology)

[Diagram: Sphinx Server internals — request processing, WS backbone, data warehouse, data management and information gathering; jobs are submitted via Condor-G/DAGMan to VDT server sites (Globus resources), using the Replica Location Service and the MonALISA monitoring service.]
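To make the two strategies concrete, the sketch below contrasts a plain round-robin site chooser with an upper-limit chooser that throttles on remaining site capacity; the site names, capacity numbers and function names are invented for illustration and are not Sphinx code.

# Illustrative site-selection strategies: round-robin vs. upper-limit.
import itertools

sites = {'caltech': 20, 'ufl': 50, 'fnal': 100}   # hypothetical free job slots per site

round_robin = itertools.cycle(sites)              # ignores capacity entirely

def next_site_round_robin():
    return next(round_robin)

def next_site_upper_limit(running):
    """Pick a site only if it is below its capacity (just-in-time throttling)."""
    candidates = [s for s, cap in sites.items() if running.get(s, 0) < cap]
    if not candidates:
        return None                               # everything saturated: hold the job back
    return max(candidates, key=lambda s: sites[s] - running.get(s, 0))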
17
Services
  • PubDB (provenance catalog) CERN
  • POOL catalog (data catalog) Caltech/CERN
  • BOSS (job submission) CERN/Caltech/INFN
  • TMDB (transfer catalog) CERN
  • RefDB (data, provenance catalog) CERN/Caltech
  • Monte Carlo processing service FNAL/Caltech
  • MOPDB (Monte Carlo production database) FNAL
  • MCRunJob (Monte Carlo production) FNAL
  • Codesh UFL
  • Sphinx (scheduling) UFL
  • MonALISA (monitoring) Caltech
  • SRM (storage resource management)
  • SRB
  • GROSS (physics analysis job submission)
  • Service discovery, ACL management, VO management,
    file access (Clarens core services)
Diagram legend: on wish list to become a service or to
interoperate with this service; accessible through a web
service; has a JavaScript front end; service being
developed; Clarens core service.
18
Deployment
19
The new Clarens distributions register
automatically with MonALISA. (Notice there are
several entries for the same server, representing
different protocols.)
http://monalisa.cacr.caltech.edu/
[Screenshot: the GAE testbed on the MonALISA map — approximately 20 installations worldwide, including CACR, CERN, Pakistan, UCSD, the UK, and a conference user.]
20
Scenario I
21
Querying for datasets
(1) Discover the POOL catalog, RefDB and grid schedulers via the discovery services
(2) Query for the dataset (the query goes to several catalog/RefDB replicas)
(3) Submit ORCA/ROOT job(s) with the dataset(s) for reconstruction/analysis to a grid scheduler (runjob)
Client code has no knowledge about the location of
services, except for several URLs of discovery
services. Multiple clients will query and submit jobs.
[Diagram: client, discovery service, grid scheduler, runjob and RefDB (replica) services distributed over Hosts 1-7, e.g. https://rick.dnsalias.net:443/clarens/]
22
Scheduling: Push Model
(1) Submit job(s) with dataset(s) for reconstruction/analysis to the scheduler service
(2) The scheduler queries the resource status of each farm
(3) The scheduler submits the job(s) to a selected farm
The push model has limitations once the system becomes resource limited.
[Diagram: a Clarens scheduler service (e.g. https://rick.dnsalias.net:443/clarens/) pushes jobs to several farms (e.g. Farm 4) through a uniform job submission layer (BOSS) on top of local batch systems such as PBS.]
23
Scheduling: Pull Model
(1) Submit ORCA/ROOT job(s) with dataset(s) for reconstruction/analysis to the queue service
(2) A farm with spare capacity asks the queue: "resources are available, give me a job"
(3) The farm pulls the job(s)
Combining push and pull gives better scalability.
[Diagram: farms (e.g. Farm 4) pull jobs from a central queue through a uniform job submission layer (BOSS) on top of local batch systems such as PBS.]
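A minimal sketch of the pull side, assuming a queue service reachable through a Clarens-style client as in slide 13; request_job, report_done and the two local helper functions are hypothetical names used only to illustrate the "give me a job" loop, not BOSS or Clarens APIs.

# Illustrative pull-model worker loop (hypothetical method names).
import time
import Clarens

queue = Clarens.clarens_client('https://rick.dnsalias.net:443/clarens/')

def local_farm_has_free_slots():
    """Placeholder: ask the local batch system (e.g. PBS) for free slots."""
    return True

def run_with_boss(job):
    """Placeholder: hand the job to the uniform submission layer (BOSS)."""
    print('running', job)

while True:
    if local_farm_has_free_slots():
        job = queue.request_job()          # "resources are available, give me a job"
        if job is not None:
            run_with_boss(job)             # executed via BOSS on top of PBS
            queue.report_done(job)
    time.sleep(60)                         # poll the queue again later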
24
Scenario II
25
Client code and the global manager have no knowledge
about the location of services, except for several
URLs of discovery services. Multiple clients query and
submit jobs. The approach is similar to other systems such as PEAC.
(1) Discover a global manager (via the discovery service)
(2) Request a session for a dataset
(3) The global manager discovers a catalog service
(4) Get the list of farms that have this dataset
(5) Reserve process time on those farms
(6) The farms allocate time
(7) Submit job(s), move data to the nodes, and report access statistics to MonALISA
(8) Create the job
(9) "Data ready?" / data moved
(10) Alive signal during processing
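The session steps above can be sketched from the global manager's point of view; every service and method name below (discovery.find, farms_with_dataset, reserve, submit) is a hypothetical placeholder for the interactions in the diagram, not an actual GAE API.

# Sketch of the Scenario II session flow as seen by a global manager.
def start_session(discovery, dataset):
    catalog = discovery.find('catalog')               # (3) discover a catalog service
    farms = catalog.farms_with_dataset(dataset)       # (4) farms that hold this dataset

    reservations = []
    for farm in farms:
        slot = farm.reserve(dataset)                  # (5) reserve process time
        if slot is not None:                          # (6) the farm allocated time
            reservations.append((farm, slot))

    for farm, slot in reservations:
        farm.submit(slot, dataset)                    # (7) submit job(s), move data to nodes
    return reservations                               # caller then polls "data ready?" / alive signals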
26
PEAC test run with MonALISA
27
Scenario III
28
Future Work
  • Migrate the JavaScript interface to Java applets
  • Develop/deploy an accounting service (PPDG activity)
  • Job steering service
  • Autonomous replication
  • Trend analysis using monitoring data
  • Integrate mass storage applications (e.g. SRM)
    into, and interoperate with, the Clarens environment

29
GAE Pointers
  • GAE web page: http://ultralight.caltech.edu/gaeweb/
  • Clarens web page: http://clarens.sourceforge.net
  • Service descriptions: http://hepgrid1.caltech.edu/GAE/services/
  • MonALISA: http://monalisa.cacr.caltech.edu/
  • SPHINX: http://www.griphyn.org/sphinx/Research/research.php