1
Grid Enabled Analysis
  • Conrad Steenberg (conrad@hep.caltech.edu)
  • Frank van Lingen (fvlingen@caltech.edu)

2
Physics analysis
3
LHC Data Grid Hierarchy
[Diagram: LHC Data Grid hierarchy for CMS (144 institutes, 2000 users). The online system feeds the CERN Tier 0 center (PBs of disk, tape robot); Tier 1 centers (FNAL, IN2P3, INFN, RAL) connect over 2.5-10 Gbps links; Tier 2/3 institutes hold physics data caches at 0.1 to 10 Gbps; Tier 4 is end-user workstations. Data rates range from 100-1500 MBytes/sec up to ~PByte/sec at the source. Finding the interesting events in this data is a needle-in-the-haystack problem.]
4
ARCHITECTURE
5
System view
  • Support 100s-1000s of analysis and production tasks
  • Batch and interactive use (interactive should be truly
    interactive!)
  • Chaotic behavior (not only production-like workflows)
  • Resource limited and policy constrained
  • Who is allowed to access what resources?
  • Real-time monitoring and trend analysis
  • Workflow tracking and data provenance
  • Collaborate on analysis (country-wide, world-wide)
  • Provide secure access to data and resources
  • Self organizing (prevent the "100 system
    administrators" nightmare)
  • Detect bottlenecks within the grid (network,
    storage, CPU) and take action without
    human intervention
  • Secure, robust, fast data transfer
  • High-level services: autonomous replication,
    steering of jobs, workflow management (service
    flows, data analysis flows)
  • Create a robust end-to-end system for physics
    analysis
  • No single point of failure
  • Composite services
  • Provide a simple access point for the user, while
    performing complex tasks behind the scenes

6
User (scientist) view
  • Provide a transparent environment for a physicist
    to perform his/her analysis (batch/interactive)
    in a distributed, dynamic environment: identify
    your data (Catalogs), submit your (complex) job
    (Scheduling, Workflow, JDL), get fair access to
    resources (Priority, Accounting), monitor job
    progress (Monitor, Steering), get the results
    (Storage, Retrieval), then repeat the process and
    refine the results (see the sketch below)
  • I want to share my results/code with a selected
    audience!
  • I want access to data as quickly as possible!

[Diagram: simplistic view of the analysis loop — identify and locate data via Catalogs, submit to the Scheduler, execute on a Farm, store results in Storage, monitor/steer the job, and get notified when results are moved back.]
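The loop above can be pictured as a short client-side script. The sketch below is illustrative only: the Clarens.clarens_client call mirrors the Python example on slide 13, while catalog.findDatasets, scheduler.submit, monitor.status and storage.fetch are hypothetical placeholder methods for whatever catalog, scheduling, monitoring and storage services a given deployment exposes.

# Illustrative sketch of the identify / submit / monitor / retrieve loop.
# clarens_client mirrors slide 13; the other method names are hypothetical.
import time
import Clarens

svr = Clarens.clarens_client('http://tier2c.cacr.caltech.edu:8080/clarens/')

datasets = svr.catalog.findDatasets('ttbar')      # identify your data (Catalogs)
job_id = svr.scheduler.submit(datasets[0])        # submit your job (Scheduling, JDL)

while svr.monitor.status(job_id) == 'running':    # monitor job progress (Monitor, Steering)
    time.sleep(30)

results = svr.storage.fetch(job_id)               # get the results (Storage, Retrieval)
# ...inspect the results, refine the query, and repeat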
7
More users
  • System administrators
  • Grid operators

8
Architecture
Example implementations associated with GAE components:
ROOT (analysis tool), Python, COJAC (detector viz.), IGUANA (CMS viz. tool)
Analysis Client
  • Analysis Clients talk standard protocols to the
    Grid Services Web Server, a.k.a. the Clarens
    data/services portal.
  • A simple Web service API allows Analysis Clients
    (simple or complex) to operate in this
    architecture (see the sketch below).
  • Typical clients: ROOT, Web Browser, IGUANA,
    COJAC
  • The Clarens portal hides the complexity of the
    Grid Services from the client, but can expose it
    in as much detail as required, e.g. for monitoring.
  • Key features: Global Scheduler, Catalogs,
    Monitoring, and Grid-wide Execution service.

[Diagram: Analysis Clients (service discovery, ACL management, certificate-based access) talk HTTP, SOAP and XML-RPC to the Clarens Grid Services Web Server. Behind Clarens sit the Scheduler (Sphinx, with fully-abstract, partially-abstract and fully-concrete planners), Catalogs (metadata, virtual data via Chimera, replica, RefDB, MOPDB), Data Management, Monitoring (MonALISA, BOSS), Applications (ORCA, FAMOS, ROOT, POOL, MCRunjob), and a Grid-wide Execution Service with an Execution Priority Manager on VDT servers.]
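Because the portal speaks standard protocols, any XML-RPC-capable program can act as an Analysis Client. The sketch below uses only Python's standard xmlrpc.client library against an example Clarens URL; the echo.echo and file.ls calls mirror the Python example on slide 13. A real deployment would additionally require X.509 certificate-based authentication, which is omitted here.

# Minimal Analysis Client sketch using only the Python standard library.
import xmlrpc.client

server = xmlrpc.client.ServerProxy('http://tier2c.cacr.caltech.edu:8080/clarens/')

print(server.echo.echo('alive?'))                 # liveness check
print(server.file.ls('/web/system', '*.html'))    # remote file listing via the portal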
9
Peer-to-Peer System
  • Allow a peer-to-peer configuration to be
    built, with associated robustness and scalability
    features.
  • Discovery of services
  • No single point of failure
  • Robust file download

[Diagram: a client finds a service (e.g. a Catalog) through "discover services" requests that propagate between peers, then queries the catalog for data and downloads the file directly.]
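A minimal sketch of the discover-then-download pattern, assuming a Clarens-style client as in the slide 13 example; only catalog.queryCatalog comes from that slide, while discovery.find_server and file.read are hypothetical stand-ins for the peer's discovery and file services.

# Sketch of peer-to-peer discovery followed by a data query and download.
import Clarens

peer = Clarens.clarens_client('https://rick.dnsalias.net:443/clarens/')

# Ask a known peer where a catalog service lives; any peer can answer,
# so there is no single point of failure.
catalog_url = peer.discovery.find_server('catalog')      # hypothetical call
catalog = Clarens.clarens_client(catalog_url)

matches = catalog.catalog.queryCatalog('cat4', 'val1 LIKE "val*"', 'meta')
payload = catalog.file.read(matches[0])    # robust download: retry another replica on failure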
10
Self Organizing
[Diagram: self-organizing behavior — real-time feedback and trend analysis from monitoring drive job scheduling and job steering, while an autonomous replica manager decides when to replicate data to, or remove it from, sites.]
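As an illustration of the replicate/remove decision the diagram hints at, the sketch below applies a simple threshold rule to access-rate trends; the thresholds, the adjust_replicas function and the replica-manager calls are all hypothetical and not part of the actual GAE services.

# Toy trend-analysis rule for autonomous replica management (hypothetical names).
HOT_ACCESSES_PER_HOUR = 100    # replicate datasets read more often than this
COLD_ACCESSES_PER_HOUR = 1     # remove replicas read less often than this

def adjust_replicas(dataset, access_rate, replica_manager):
    """Replicate popular datasets, retire unused replicas."""
    if access_rate > HOT_ACCESSES_PER_HOUR:
        replica_manager.replicate(dataset)    # spread load across sites
    elif access_rate < COLD_ACCESSES_PER_HOUR:
        replica_manager.remove(dataset)       # reclaim storage
    # otherwise leave the current replica placement unchanged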
11
Development
12
Services Backbone: Clarens Service Framework
  • X.509 certificate-based access
  • Good performance
  • Access control management
  • Remote file access
  • Dynamic discovery of services on a global scale
  • Available in Python and Java
  • Easy to install, as root or normal user, and part
    of the DPE distribution. As root do:
  • wget -q -O - http://hepgrid1.caltech.edu/clarens/setup_clump.sh | sh
  • export opkg_root=/opt/openpkg
  • Interoperability with other web service
    environments such as Globus, through SOAP
  • Interoperability with MonALISA (publication of
    service methods via MonALISA)

[Screenshots: MonALISA monitoring of Clarens parameters and Clarens service publication.]
13
Clarens Cont.
Services hosted behind Clarens: POOL catalog, RefDB/PubDB (CERN), BOSS,
PhEDEx (CERN), MCRunJob/MOPDB (FNAL), Sphinx (grid scheduler) (UFL)
Python
dbsvr = Clarens.clarens_client('http://tier2c.cacr.caltech.edu:8080/clarens/')
dbsvr.echo.echo('alive?')
dbsvr.file.size('index.html')
dbsvr.file.ls('/web/system', '*.html')
dbsvr.file.find('//web', '*', 'all')
dbsvr.catalog.getMetaDataSpec('cat4')
dbsvr.catalog.queryCatalog('cat4', 'val1 LIKE "val*"', 'meta')
dbsvr.refdb.listApplications('cms', 0, 20, 'Simulation')
Clients connecting over http/https to the Clarens web server:
  • Grid Portal (secure cert-based access to services
    through a browser)
  • Java client
  • ROOT (analysis tool)
  • IGUANA (CMS viz. tool)
  • ROOT-CAVES client (analysis sharing tool)
  • CLASH (Giulio)
  • any app that can make XML-RPC/SOAP calls
  • other Clarens servers
14
Clarens Grid Portals
[Screenshots: Clarens grid portals — PDA client, job execution, catalog service, and the Collaborative Analysis Desktop.]
15
MonALISA Integration
  • Query repositories for monitor information
  • Gather and publish access patterns on collections
    of data
  • Publish web services information for discovery in
    other distribution systems
16
SPHINX Grid scheduler
  • Data warehouse
  • Policies, account information, grid weather,
    resource properties and status, request tracking,
    workflows
  • Control process
  • Finite state machine
  • Different modules modify jobs, graphs, workflows
  • Flexible
  • Extensible

[Diagram: a Sphinx Client and VDT Client interact through Clarens with the Sphinx Server; the Chimera Virtual Data System provides the virtual data workflows.]
  • Simple sanity checks
  • 120 canonical virtual data workflows submitted to
    the US-CMS Grid
  • Round-robin strategy
  • Equally distribute work to all sites
  • Upper-limit strategy
  • Makes use of global information (site capacity)
  • Throttles jobs using just-in-time planning
    (both strategies are sketched below)
  • 40% better throughput (given the grid topology)

[Diagram: Sphinx Server internals — request processing, WS backbone, data warehouse, data management and information gathering; jobs are submitted via Condor-G/DAGMan to VDT server sites (Globus resources), using the Replica Location Service and the MonALISA monitoring service.]
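To make the two strategies concrete, the sketch below contrasts a plain round-robin site chooser with an upper-limit chooser that throttles on remaining site capacity; the site names, capacity numbers and function names are invented for illustration and are not Sphinx code.

# Illustrative site-selection strategies: round-robin vs. upper-limit.
import itertools

sites = {'caltech': 20, 'ufl': 50, 'fnal': 100}   # hypothetical free job slots per site

round_robin = itertools.cycle(sites)              # ignores capacity entirely

def next_site_round_robin():
    return next(round_robin)

def next_site_upper_limit(running):
    """Pick a site only if it is below its capacity (just-in-time throttling)."""
    candidates = [s for s, cap in sites.items() if running.get(s, 0) < cap]
    if not candidates:
        return None                               # everything saturated: hold the job back
    return max(candidates, key=lambda s: sites[s] - running.get(s, 0))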
17
Services
  • PubDB (provenance catalog) CERN
  • POOL catalog (data catalog) Caltech/CERN
  • BOSS (job submission) CERN/Caltech/INFN
  • TMDB (transfer catalog) CERN
  • RefDB (data, provenance catalog) CERN/Caltech
  • Monte Carlo processing service FNAL/Caltech
  • MOPDB (Monte Carlo production database) FNAL
  • MCRunJob (Monte Carlo production) FNAL
  • Codesh UFL
  • Sphinx (scheduling) UFL
  • MonALISA (monitoring) Caltech
  • SRM (storage resource management)
  • SRB
  • GROSS (physics analysis job submission)
  • Service discovery, ACL management, VO management,
    file access (Clarens core services)
Diagram legend: on wish list to become a service or to
interoperate with this service; accessible through a web
service; has a JavaScript front end; service being
developed; Clarens core service.
18
Deployment
19
The new Clarens distributions register
automatically with MonALISA. (Notice there are
several entries for the same server, representing
different protocols.)
http://monalisa.cacr.caltech.edu/
[Screenshot: the GAE testbed on the MonALISA map — approximately 20 installations worldwide, including CACR, CERN, Pakistan, UCSD, the UK, and a conference user.]
20
Scenario I
21
Querying for datasets
(1) Discover the POOL catalog, RefDB and grid schedulers via the discovery services
(2) Query for the dataset (the query goes to several catalog/RefDB replicas)
(3) Submit ORCA/ROOT job(s) with the dataset(s) for reconstruction/analysis to a grid scheduler (runjob)
Client code has no knowledge about the location of
services, except for several URLs of discovery
services. Multiple clients will query and submit jobs.
[Diagram: client, discovery service, grid scheduler, runjob and RefDB (replica) services distributed over Hosts 1-7, e.g. https://rick.dnsalias.net:443/clarens/]
22
Scheduling: Push Model
(1) Submit job(s) with dataset(s) for reconstruction/analysis to the scheduler service
(2) The scheduler queries the resource status of each farm
(3) The scheduler submits the job(s) to a selected farm
The push model has limitations once the system becomes resource limited.
[Diagram: a Clarens scheduler service (e.g. https://rick.dnsalias.net:443/clarens/) pushes jobs to several farms (e.g. Farm 4) through a uniform job submission layer (BOSS) on top of local batch systems such as PBS.]
23
Scheduling: Pull Model
(1) Submit ORCA/ROOT job(s) with dataset(s) for reconstruction/analysis to the queue service
(2) A farm with spare capacity asks the queue: "resources are available, give me a job"
(3) The farm pulls the job(s)
Combining push and pull gives better scalability.
[Diagram: farms (e.g. Farm 4) pull jobs from a central queue through a uniform job submission layer (BOSS) on top of local batch systems such as PBS.]
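A minimal sketch of the pull side, assuming a queue service reachable through a Clarens-style client as in slide 13; request_job, report_done and the two local helper functions are hypothetical names used only to illustrate the "give me a job" loop, not BOSS or Clarens APIs.

# Illustrative pull-model worker loop (hypothetical method names).
import time
import Clarens

queue = Clarens.clarens_client('https://rick.dnsalias.net:443/clarens/')

def local_farm_has_free_slots():
    """Placeholder: ask the local batch system (e.g. PBS) for free slots."""
    return True

def run_with_boss(job):
    """Placeholder: hand the job to the uniform submission layer (BOSS)."""
    print('running', job)

while True:
    if local_farm_has_free_slots():
        job = queue.request_job()          # "resources are available, give me a job"
        if job is not None:
            run_with_boss(job)             # executed via BOSS on top of PBS
            queue.report_done(job)
    time.sleep(60)                         # poll the queue again later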
24
Scenario II
25
Client code and the global manager have no knowledge
about the location of services, except for several
URLs of discovery services. Multiple clients query and
submit jobs. The approach is similar to other systems such as PEAC.
(1) Discover a global manager (via the discovery service)
(2) Request a session for a dataset
(3) The global manager discovers a catalog service
(4) Get the list of farms that have this dataset
(5) Reserve process time on those farms
(6) The farms allocate time
(7) Submit job(s), move data to the nodes, and report access statistics to MonALISA
(8) Create the job
(9) "Data ready?" / data moved
(10) Alive signal during processing
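The session steps above can be sketched from the global manager's point of view; every service and method name below (discovery.find, farms_with_dataset, reserve, submit) is a hypothetical placeholder for the interactions in the diagram, not an actual GAE API.

# Sketch of the Scenario II session flow as seen by a global manager.
def start_session(discovery, dataset):
    catalog = discovery.find('catalog')               # (3) discover a catalog service
    farms = catalog.farms_with_dataset(dataset)       # (4) farms that hold this dataset

    reservations = []
    for farm in farms:
        slot = farm.reserve(dataset)                  # (5) reserve process time
        if slot is not None:                          # (6) the farm allocated time
            reservations.append((farm, slot))

    for farm, slot in reservations:
        farm.submit(slot, dataset)                    # (7) submit job(s), move data to nodes
    return reservations                               # caller then polls "data ready?" / alive signals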
26
PEAC test run with MonALISA
27
Scenario III
28
Future Work
  • Migrate the JavaScript interface to Java applets
  • Develop/deploy an accounting service (PPDG activity)
  • Job steering service
  • Autonomous replication
  • Trend analysis using monitoring data
  • Integrate mass storage applications (e.g. SRM)
    into, and interoperate with, the Clarens environment

29
GAE Pointers
  • GAE web page: http://ultralight.caltech.edu/gaeweb/
  • Clarens web page: http://clarens.sourceforge.net
  • Service descriptions: http://hepgrid1.caltech.edu/GAE/services/
  • MonALISA: http://monalisa.cacr.caltech.edu/
  • SPHINX: http://www.griphyn.org/sphinx/Research/research.php