Title: CMS on the Grid
1. CMS on the Grid
Toward a fully distributed Physics Analysis
- Vincenzo Innocente
- CERN/EP
2. Challenges: Complexity
- Detector
- 2 orders of magnitude more channels than today
- Triggers must correctly choose only 1 event in every 400,000
- Level 2/3 triggers are software-based (so they must be of the highest quality)
- Computing resources will not be available in a single location
3. Challenges: Geographical Spread
- 1700 physicists
- 150 institutes
- 32 countries
- CERN Member States: 55%
- Non-Member States: 45%
- Major challenges associated with
- Communication and collaboration at a distance
- Distributed computing resources
- Remote software development and physics analysis
4. Challenges: b Physics
- Already today, typically the subject of theses and the work of small university groups
- 150 physicists in the CMS heavy-flavour group
- > 40 institutions involved
- Often requires precise, specialized algorithms for vertex reconstruction and particle identification
- Most CMS triggered events include B particles
- High-level software triggers select exclusive channels in events triggered in hardware with inclusive conditions
- Objectives:
- Allow remote physicists to access detailed event information
- Migrate reconstruction and selection algorithms effectively to the HLT
5. HEP Experiment-Data Analysis
[Flow diagram: quasi-online reconstruction is fed by environmental data (detector control) and online monitoring; the event filter / object formatter stores data in the persistent object store manager (database management system); data quality, calibrations, group analysis, simulation and on-demand user analysis each request parts of events and store reconstructed objects and calibrations back, leading eventually to the physics paper.]
6. Analysis Model
- Hierarchy of processes (experiment, analysis groups, individuals):
- Reconstruction: experiment-wide activity (10^9 events); ~3000 SI95 s/event; ~1 job per year plus re-processing ~3 times per year, driven by new detector calibrations or improved understanding
- Monte Carlo production: ~5000 SI95 s/event
- Selection: activity of ~20 groups (10^9 -> 10^7 events); ~25 SI95 s/event; ~20 jobs per month; trigger-based and physics-based refinements; iterative selection roughly once per month
- Analysis (algorithms applied to the data to get results): ~25 individuals per group (10^6-10^8 events); ~10 SI95 s/event; ~500 jobs per day; different physics cuts and MC comparison roughly once per day
7. Data Handling Baseline
- CMS computing in year 2007
- Data model: typical objects 1 kB-1 MB
- 3 PB of storage space
- 10,000 CPUs
- 31 sites (1 Tier-0, 5 Tier-1, 25 Tier-2) all over the world
- I/O rates disk -> CPU: 10,000 MB/s, average 1 MB/s per CPU
- RAW -> ESD generation: 0.2 MB/s of I/O per CPU
- ESD -> AOD generation: 5 MB/s of I/O per CPU
- AOD analysis into histograms: 0.2 MB/s of I/O per CPU
- DPD generation from AOD and ESD: 10 MB/s of I/O per CPU
- Wide-area I/O capacity: of order 700 MB/s aggregate over all payload intercontinental TCP/IP streams
- This implies a system with heavy reliance on access to site-local (cached) data (see the sketch below)
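A quick back-of-envelope check of the baseline numbers above makes the last point concrete; this is only an illustrative sketch using the figures quoted on this slide.

```python
# Back-of-envelope check of the 2007 baseline I/O numbers quoted above.

n_cpus = 10_000                 # CPUs in the 2007 baseline
local_io_per_cpu = 1.0          # average MB/s of disk -> CPU I/O needed per CPU
wan_capacity = 700.0            # MB/s aggregate intercontinental TCP/IP capacity

local_io_total = n_cpus * local_io_per_cpu   # 10,000 MB/s of local demand
wan_share = wan_capacity / local_io_total    # fraction servable over the WAN

print(f"local disk->CPU demand : {local_io_total:,.0f} MB/s")
print(f"wide-area capacity     : {wan_capacity:,.0f} MB/s")
print(f"=> only ~{wan_share:.0%} of the I/O could come from remote sites,")
print("   hence the heavy reliance on site-local (cached) data.")
```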
8. Prototype Computing Installation (T0/T1)
9. Scalability, Regional Centres
- CMS computing in year 2007
- Object data model, typical objects 1 kB-1 MB
- 3 PB of storage space
- 10,000 CPUs
- Regional centres: 31 sites (1 Tier-0, 5 Tier-1, 25 Tier-2) all over the world
- I/O rates disk -> CPU: 10,000 MB/s, average 1 MB/s per CPU just to keep the CPUs busy
- Wide-area I/O capacity: of order 700 MB/s aggregate over all payload intercontinental TCP/IP streams
- This implies a distributed system with heavy reliance on access to site-local (cached) data
- A natural match for Grid technology
10. Analysis Environments
- Real-Time Event Filtering and Monitoring
- Data-driven pipeline
- High reliability
- Pre-emptive Simulation, Reconstruction and Event Classification
- Massively parallel batch-sequential processing
- Excellent error recovery and rollback mechanisms
- Excellent scheduling and bookkeeping systems
- Interactive Statistical Analysis
- Rapid application development environment
- Excellent visualization and browsing tools
- Human-readable navigation
11. Different Challenges
- Centralized quasi-online processing
- Keep up with the rate
- Validate and distribute data efficiently
- Distributed organized processing
- Automation
- Interactive chaotic analysis
- Efficient access to data and metadata
- Management of private data
12. Migration
- Today's Nobel prize becomes tomorrow's trigger (and the day after's background)
- Boundaries between running environments are fuzzy
- Physics analysis algorithms should migrate up to the online to make the trigger more selective
- Robust batch systems should be made available for physics analysis of large data samples
- The results of offline calibrations should be fed back to the online to make the trigger more efficient
13. The Final Challenge
- Beyond the interactive analysis tool (user point of view)
- Data analysis and presentation: n-tuples, histograms, fitting, plotting, ...
- A great range of other activities with fuzzy boundaries (developer point of view):
- Batch
- Interactive work, from point-and-click to Emacs-like power tools to scripting
- Setting up configuration management tools, application frameworks and reconstruction packages
- Data store operations: replicating entire data stores; copying runs, events and event parts between stores; not just copying but also doing something more complicated (filtering, reconstruction, analysis, ...)
- Browsing data stores down to object detail level
- 2D and 3D visualisation
- Moving code across final analysis, reconstruction and triggers
- Today this involves (too) many tools
14. Architecture Overview
[Diagram: a coherent set of basic tools and mechanisms behind a consistent user interface. Generic analysis tools, a data browser, analysis job wizards and a detector/event display sit on top of the CMS tools (ORCA, COBRA, OSCAR, FAMOS, federation wizards, Objectivity tools, software development and installation), all layered over the Grid-based distributed data store and computing infrastructure.]
15. Offline Architecture Requirements at LHC
- Bigger experiment, higher rate, more data
- A larger and more dispersed user community performing non-trivial queries against a large event store
- Make best use of new IT technologies
- Increased demand for both flexibility and coherence:
- the ability to plug in new algorithms
- the ability to run the same algorithms in multiple environments
- guarantees of quality and reproducibility
- high-performance user-friendliness
16. Requirements on Data Processing
- High efficiency
- Processing-site hardware optimization
- Processing-site software optimization
- The job structure depends very much on the hardware setup
- Data quality assurance
- Data validation
- Data history (job bookkeeping)
- Automate
- Input data discovery
- Crash recovery
- Resource monitoring
- Identify bottlenecks and fragile components
17. Analysis Part
- Physics data analysis will be done by hundreds of users
- The analysis part is connected to the same catalogs
- Maintain a global view of all data
- Big analysis jobs can use the production job-handling mechanisms
- Analysis services based on tags
18. [Screenshots of the interactive analysis tools]
- Emacs used to edit a CMS C++ plugin to create and fill histograms
- OpenInventor-based display of a selected event
- Lizard Qt plotter
- ANAPHE histogram extended with pointers to CMS events
- Python shell with Lizard and CMS modules
19. Varied Components and Data Flows
[Diagram: production data flows between Tier 0/1/2 centres; TAG/AOD data flows down to Tier 1/2; physics queries flow from users at Tier 3/4/5.]
20. TODAY
- Data production and analysis exercises
- Granularity (data product): the data-set
- Development and deployment of a distributed data processing system (hardware and software)
- Test and integration of Grid middleware prototypes
- R&D on distributed interactive analysis
21. CMS Production 2000-2002
[Flow diagram: MC production generates HEPEVT ntuples for signal and minimum bias (MB); CMSIM simulation writes Zebra files with hits; the ORCA ooHit formatter loads them into an Objectivity database; ORCA digitization merges signal and MB; in ORCA production the HLT algorithms write new reconstructed objects into the HLT group databases, which are mirrored to regional centres (US, Russia, Italy, ...); catalog imports link the production steps.]
22. Current CMS Production
23. CMS Production Stream
| # | Task           | Application | Input   | Output         | Non-standard requirements               | Resource requirements |
|---|----------------|-------------|---------|----------------|-----------------------------------------|-----------------------|
| 1 | Generation     | Pythia      | None    | Ntuple         | Statically linked exe, geometry files   | Storage               |
| 2 | Simulation     | CMSIM       | Ntuple  | FZ file        | Statically linked exe, geometry files   | Storage               |
| 3 | Hit formatting | ORCA H.F.   | FZ file | DB             | Shared libs, full CMS environment       | Storage               |
| 4 | Digitization   | ORCA Digi.  | DB      | DB             | Shared libs, full CMS environment       | Storage               |
| 5 | User analysis  | ORCA User   | DB      | Ntuple or ROOT | Shared libs, full CMS environment       | Distributed input     |
24. Production 2002: Complexity
- Number of regional centers: 11
- Number of computing centers: 21
- Number of CPUs: 1000
- Largest local center: 176 CPUs
- Number of production passes for each dataset (including analysis-group processing done by production): 6-8
- Number of files: 11,000
- Data size (not including FZ files from simulation): 17 TB
- File transfer by GDMP and by Perl scripts over scp/bbcp: 7 TB toward Tier-1, 4 TB toward Tier-2
25. Spring 2002 CPU Resources
[Chart, 4.4.02: ~700 active CPUs plus 400 CPUs to come, spread over the participating sites: INFN 18, CERN 15, IN2P3 10, FNAL 8, RAL 6, IC 6, UFL 5, Caltech 4, Bristol 3, UCSD 3, HIP 1, plus Wisconsin and Moscow.]
26. Current Data Processing
27. ORCA Db Structure
- One CMSIM job is oo-formatted into multiple databases. For example, one FZ file yields an MC-info container (a few kB/event) and ooHit databases totalling about 300 kB/event: calorimeter/muon hits (~100 kB/event) and tracker hits (~200 kB/event).
- Multiple sets of ooHits are concatenated into a single 2 GB database file, e.g. the MC info from run 1, run 2, run 3, ... of N runs.
- The physical and logical database structures therefore diverge.
28. Production Center Setup
- The most critical task is digitization:
- 300 kB per pile-up event
- 200 pile-up events per signal event -> 60 MB
- 10 s to digitize one full event on a 1 GHz CPU
- 6 MB/s per CPU (12 MB/s per dual-processor client)
- Up to 5 clients per pile-up server (60 MB/s on its network card: Gigabit Ethernet)
- Fast disk access, 5 clients per server (see the sketch below)
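The bandwidth budget above follows directly from the quoted numbers; the short sketch below just reproduces the arithmetic (values taken from this slide, nothing else assumed).

```python
# Rough check of the pile-up bandwidth figures quoted on this slide.

pileup_event_size_mb = 0.3      # 300 kB per pile-up event
pileup_per_signal = 200         # pile-up events mixed into each signal event
digi_time_s = 10.0              # seconds to digitize one full event on a 1 GHz CPU
clients_per_server = 5          # dual-CPU clients fed by one pile-up server

mb_per_signal_event = pileup_event_size_mb * pileup_per_signal   # ~60 MB
mb_per_s_per_cpu = mb_per_signal_event / digi_time_s             # ~6 MB/s
mb_per_s_per_client = 2 * mb_per_s_per_cpu                       # dual processor: ~12 MB/s
server_load = clients_per_server * mb_per_s_per_client           # ~60 MB/s per server

print(f"{mb_per_signal_event:.0f} MB of pile-up per signal event")
print(f"{mb_per_s_per_cpu:.0f} MB/s per CPU, {mb_per_s_per_client:.0f} MB/s per client")
print(f"{server_load:.0f} MB/s on the pile-up server's (Gigabit) network card")
```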
29. INFN-Legnaro Tier-2 Prototype
[Diagram of the farm layout: computational nodes N1..N24 per rack on Fast Ethernet switches, uplinked with 32 Gigabit Ethernet (1000BaseT) links; disk servers S1..S11.]
- 2001: 35 nodes (70 CPUs, 3500 SI95), 8 TB; growing to up to 190 nodes over 2001-2003
- 2001: 11 disk servers (1100 SI95), 2.5 TB
- WAN connection: 34 Mbps in 2001, 155 Mbps in 2002
- Sx disk server node: dual PIII 1 GHz, dual PCI (33 MHz/32-bit and 66 MHz/64-bit), 512 MB RAM, 3x75 GB EIDE RAID 0-5 disks (expandable up to 10), 1x20 GB disk for the OS
- Nx computational node: dual PIII 1 GHz, 512 MB RAM, 3x75 GB EIDE disks, 1x20 GB disk for the OS
30. IMPALA
- Each step in the production chain is split into 3 sub-steps
- Each sub-step is factorized into customizable functions (see the sketch below):
- JobDeclaration: search for something to do
- JobCreation: generate jobs from templates
- JobSubmission: submit jobs to the scheduler
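A minimal sketch of the declaration/creation/submission factorization described above. The function names, the template format and the hooks are hypothetical illustrations of the idea, not actual IMPALA code.

```python
# Illustrative sketch of the three-sub-step factorization: names and hooks
# are hypothetical, not the real IMPALA implementation.

def declare_jobs(todo_source):
    """JobDeclaration: search for something to do."""
    return list(todo_source())

def create_jobs(todo_list, template):
    """JobCreation: generate concrete job commands from a template."""
    return [template.format(**work) for work in todo_list]

def submit_jobs(jobs, submit_hook):
    """JobSubmission: hand each job to the site-specific scheduler hook."""
    for job in jobs:
        submit_hook(job)

if __name__ == "__main__":
    # A site manager plugs in the customizable pieces, e.g. a directory scan
    # or a DB query as the to-do source and the local submit command.
    todo = lambda: [{"run": 1}, {"run": 2}]
    template = "orca_digitize --run {run}"
    submit_jobs(create_jobs(declare_jobs(todo), template),
                submit_hook=lambda cmd: print("submitting:", cmd))
```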
31. Job Declaration and Creation
- Jobs to do are discovered automatically:
- by looking at predefined directory contents for the Fortran steps
- by querying the Objectivity/DB federation for digitization, event selection and analysis
- Once the to-do list is ready, the site manager can generate instances of jobs starting from a template
- Job execution includes validation of the produced data
32. Job Submission
- Thanks to the decomposition of sub-steps into customizable functions, site managers can:
- define the local actions taken to submit the job (is there a job scheduler? Which one? How are the queues organized?)
- define the local actions taken before and after the start of the job (is there a tape library? Do tapes need to be staged before the run?)
- Auto-recovery of crashed jobs:
- when a job is started for the first time, its startup cards are automatically modified so that if the job is re-started it continues from the last analyzed event (see the sketch below)
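A sketch of the auto-recovery idea: before a re-start, the job's startup cards are rewritten so processing resumes after the last analyzed event. The card name (FirstEvent) and file layout here are hypothetical, chosen only to illustrate the mechanism.

```python
# Sketch of startup-card rewriting for crash recovery; the card name and
# file format are hypothetical.

def rewrite_startup_cards(card_file, last_analyzed_event):
    """Set the first event to process to the one after the last completed event."""
    with open(card_file) as f:
        cards = f.readlines()
    with open(card_file, "w") as f:
        for line in cards:
            if line.startswith("FirstEvent"):
                line = f"FirstEvent {last_analyzed_event + 1}\n"
            f.write(line)

# Usage: rewrite_startup_cards("job.cards", last_analyzed_event=41250)
```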
33. BOSS
- Submission of batch jobs to a computing farm
- Independence from the local scheduler (PBS, LSF, Condor, etc.)
- Persistent storage of job information (in a relational DB)
- Job-dependent bookkeeping: monitor different information for different job types
- (e.g. number of events in input, number of events in output, version of software used, internal production software errors, etc.)
34. BOSS Job Submission and Running
[Diagram: boss submit / boss query / boss kill commands go to BOSS, which talks to the BOSS DB and to the local scheduler.]
- Accepts job submissions from users
- Stores info about the job in a DB
- Builds a wrapper around the job (BossExecuter)
- Sends the wrapper to the local scheduler
- The wrapper sends info about the job to the DB (see the sketch below)
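A minimal sketch of what a BOSS-style wrapper does: run the real job and record bookkeeping information in a relational database. sqlite3 is used here as a stand-in for the real MySQL back end, and the table schema is illustrative only, not the actual BOSS schema.

```python
# Sketch of a BossExecuter-like wrapper: run the user's job, then record
# start/stop times and exit status in a DB (sqlite3 as a stand-in for MySQL).
import sqlite3
import subprocess
import time

def run_wrapped(job_id, command, db_path="boss.db"):
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS job (id TEXT, start REAL, stop REAL, status INTEGER)")
    start = time.time()
    status = subprocess.call(command, shell=True)      # the user's executable
    db.execute("INSERT INTO job VALUES (?, ?, ?, ?)", (job_id, start, time.time(), status))
    db.commit()
    db.close()
    return status

# Usage: run_wrapped("prod-0042", "orca_digitize --run 42")
```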
35. Storing Info About a Job
- A registered job has a schema associated with it, listing the relevant information to be stored
- A table is created in the DB to keep this information
36. Getting Info From the Job
- A registered job has scripts associated with it which are able to understand the output of the user's executable (see the sketch below)
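A sketch of such a job-type-specific filter: it scans the job's output for the quantities registered in the job schema (event counts, software version, ...). The log format shown is hypothetical; real jobs would need filters matched to their actual output.

```python
# Sketch of an output filter that extracts bookkeeping quantities from a
# job's log; the log format is hypothetical.
import re

def parse_job_log(lines):
    info = {}
    for line in lines:
        if m := re.match(r"Events read\s*:\s*(\d+)", line):
            info["events_in"] = int(m.group(1))
        elif m := re.match(r"Events written\s*:\s*(\d+)", line):
            info["events_out"] = int(m.group(1))
        elif m := re.match(r"ORCA version\s*:\s*(\S+)", line):
            info["software_version"] = m.group(1)
    return info   # the wrapper would push this dict into the bookkeeping DB

print(parse_job_log(["Events read : 5000", "Events written : 123", "ORCA version : 6_2_0"]))
```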
37. BOSS Logical Diagram
[Diagram: a job specification (executable plus bookkeeping definition) is submitted; BOSS instruments the job for bookkeeping and submits it through a filter interface to the scheduler (Condor vanilla, LSF, FBSNG, Grid scheduler); the executing job updates the bookkeeping DB (MySQL) via SQL UPDATE, while queries, kills and bookkeeping retrieval go through SQL SELECT/UPDATE on the same DB.]
38. TOMORROW
- Map data-sets to Grid data products
- Use the Grid security infrastructure and workload manager
- Deploy a Grid-enabled portal to interactive analysis
- Global monitoring of Grid performance and quality of service
39. Computing
- Ramp up production systems in 2005-2007 (30%, 30%, 40% of the cost in each year)
- Match the computing power available with the LHC luminosity:
- 2006: 200M reconstructed events/month, 100M re-reconstructed events/month, 30k events/s analysis
- 2007: 300M reconstructed events/month, 200M re-reconstructed events/month, 50k events/s analysis
40. Toward ONE Grid
- Build a unique CMS-Grid framework (EU + US)
- EU and US grids are not interoperable today; waiting for help from DataTAG, iVDGL and GLUE
- Work in parallel in EU and US
- Main US activities:
- MOP
- Virtual Data System
- Interactive analysis
- Main EU activities:
- Integration of IMPALA with EDG WP1/WP2 software
- Batch analysis: user job submission, analysis farm
41. PPDG MOP System
- PPDG has developed the MOP system
- It allows CMS production jobs to be submitted from a central location, run at remote locations, and return their results
- It relies on:
- GDMP for replication
- Globus GRAM
- Condor-G and local queuing systems for job scheduling
- IMPALA for job specification
- Being deployed on the USCMS testbed
- Proposed as the basis for the next CMS-wide production infrastructure
42. (No transcript)
43. Prototype VDG System (Production)
[Diagram legend: components marked as existing, implemented using MOP, or not yet coded.]
44. Globally Scalable Monitoring Service
[Diagram: push and pull transports (rsh/ssh, existing scripts, SNMP) feeding the monitoring service.]
45. Optimisation of Tag Databases
- Tags (n-tuples) are small (0.2-1 kB) summary objects for each event
- They are crucial for fast selection of interesting event subsets; this will be an intensive activity (see the sketch below)
- Past work concentrated on three main areas:
- development of Objectivity-based tags integrated with the CMS COBRA framework and Lizard
- investigations of tag bitmap indexing to speed up queries
- comparisons of OO and traditional databases (SQL Server, Oracle 9i, PostgreSQL) as efficient stores for tags
- New work concentrates on tag-based analysis services
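A toy illustration of why small per-event tags pay off: cuts run over the compact tag records, and only the events that pass are fetched from the full event store. The tag attributes used here are invented for the example; they are not the CMS tag schema.

```python
# Toy tag-based selection: filter on small summary records, fetch full
# event data only for the survivors.  Tag attributes are invented.

tags = [
    {"event": 1, "n_muons": 2, "met": 35.0},
    {"event": 2, "n_muons": 0, "met": 80.0},
    {"event": 3, "n_muons": 1, "met": 55.0},
]

def select(tags, cut):
    """Return the event ids passing the cut, without touching the event store."""
    return [t["event"] for t in tags if cut(t)]

interesting = select(tags, lambda t: t["n_muons"] >= 1 and t["met"] > 40.0)
print(interesting)   # only these events would be read back in full
```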
46. CLARENS: a Portal to the Grid
- Grid-enables the working environment for physicists' data analysis
- Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol; this ensures implementation independence (see the client sketch below)
- The server is implemented in C++ to give access to the CMS OO analysis toolkit
- The server will provide a remote API to Grid tools:
- security services provided by the Grid (GSI)
- the Virtual Data Toolkit: object collection access
- data movement between Tier centres using GSI-FTP
- CMS analysis software (ORCA/COBRA)
- The current prototype is running on the Caltech proto-Tier2
- More information, along with a web-based demo, at http://clarens.sourceforge.net
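Because the protocol is commodity XML-RPC over HTTP(S), a thin client needs nothing beyond a standard library. The sketch below uses Python's built-in xmlrpc module; the server URL is a placeholder and only the standard XML-RPC introspection call is used, not the actual Clarens service API.

```python
# Sketch of a thin XML-RPC client talking to a Clarens-style server.
# The URL is a placeholder; only standard XML-RPC introspection is called.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("https://clarens.example.org:8080/")
try:
    methods = server.system.listMethods()   # standard XML-RPC introspection
    print("remote services:", methods)
except (OSError, xmlrpc.client.Error) as err:
    print("could not reach the server:", err)
```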
47. Clarens Architecture
- A common protocol is spoken by all types of clients to all types of services:
- implement each service once, for all clients
- implement client access to a service once per client type, using a common protocol already implemented for all languages (C, Java, Fortran, etc.)
- The common protocol is XML-RPC, with SOAP close to working; CORBA is doable but would require a different server above Clarens (it uses IIOP, not HTTP)
- Handles authentication using Grid certificates, connection management, data serialization and, optionally, encryption
- The implementation uses a stable, well-known server infrastructure (Apache) that has been debugged and audited over a long period by many
- The Clarens layer itself is implemented in Python, but can be reimplemented in C should performance be inadequate
48. Clarens Architecture II
[Diagram: a client talks RPC over http/https to the Clarens layer inside the web server, which forwards requests to the service.]
49. Clarens Architecture (request life cycle)
- Client side: authentication; session initialization; request marshalling, serializing and transmission; result deserializing; session termination
- Server side: authentication; session initialization; request deserializing; worker code invocation; result serializing; session termination (see the sketch below)
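The server-side steps (deserialize the request, invoke the worker code, serialize the result) are what any generic XML-RPC server performs. The minimal sketch below uses Python's standard library server rather than the real Apache-based Clarens stack, and omits the GSI authentication and session handling.

```python
# Minimal server-side sketch of the request cycle listed above, using the
# standard-library XML-RPC server instead of the Apache-based Clarens stack.
from xmlrpc.server import SimpleXMLRPCServer

def echo(payload):
    """A trivial 'worker code' method exposed to clients."""
    return {"you_sent": payload}

if __name__ == "__main__":
    srv = SimpleXMLRPCServer(("localhost", 8080), allow_none=True)
    srv.register_introspection_functions()   # enables system.listMethods
    srv.register_function(echo, "echo")      # (de)serializing handled by the library
    srv.serve_forever()                      # authentication/GSI omitted in this sketch
```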
50.
- Clarens is a simple way to implement web services on the server
- It provides some basic connectivity functionality common to all services
- Uses commodity protocols
- No Globus is needed on the client side, only a certificate
- Simple to implement clients in scripts and compiled code
51. 2007
- Sub-event components map to Grid data products
- Balance of load between network and CPU
- The complete data and software base virtually available at the physicist's desktop
52. Simulation, Reconstruction and Analysis Software System
[Layered diagram, uploadable on the Grid: physics modules (reconstruction algorithms, event filter, physics analysis, data monitoring) plug into a specific framework on top of a Grid-enabled application framework; calibration, event and configuration objects live as Grid-aware data products; below sit the generic application framework, adapters and extensions, and basic services (C++ standard library and extension toolkit, ODBMS, Geant3/4, CLHEP, PAW replacement).]
53. Reconstruction on Demand
- Example: compare the results of two different track reconstruction algorithms.
[Diagram: the event holds hits per detector element; RecHits, calorimeter clusters (CaloCl) and tracks (T1 from Rec T1, T2 from Rec T2) are reconstructed only when the analysis requests them. A minimal sketch of the on-demand mechanism follows.]
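A minimal sketch of the on-demand idea: a product is computed from the hits only when first requested, then cached, so two track-reconstruction algorithms can be compared on the same event. Class and algorithm names are illustrative, not CMS framework code.

```python
# Sketch of reconstruction on demand: compute a product lazily, cache it,
# and let the analysis compare two algorithms on the same hits.
# Names and "algorithms" are illustrative stand-ins, not CMS code.

class OnDemand:
    def __init__(self, event, algorithm):
        self._event, self._algorithm, self._cache = event, algorithm, None

    def get(self):
        if self._cache is None:                  # reconstruct only on first access
            self._cache = self._algorithm(self._event["hits"])
        return self._cache

def track_algo_1(hits):
    return [h for h in hits if h > 0.5]          # stand-in for a real algorithm

def track_algo_2(hits):
    return sorted(hits, reverse=True)[:2]        # stand-in for another algorithm

event = {"hits": [0.2, 0.7, 0.9, 0.4]}
t1, t2 = OnDemand(event, track_algo_1), OnDemand(event, track_algo_2)
print("algo 1 tracks:", t1.get())
print("algo 2 tracks:", t2.get())                # the analysis compares the two results
```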
54. Conclusions
- The Grid is the enabling technology for the effective deployment of a coherent and consistent data processing environment
- This is the only basis for an efficient physics analysis program at the LHC
- CMS is engaged in an active development, test and deployment program for all the software and hardware components that will constitute the future LHC grid