
Transcript and Presenter's Notes

Title: CMS on the Grid


1
CMS on the Grid
Toward a fully distributed Physics Analysis
  • Vincenzo Innocente
  • CERN/EP

2
Challenges: Complexity
  • Detector
  • 2 orders of magnitude more channels than today
  • Triggers must choose correctly only 1 event in
    every 400,000
  • Level 2/3 triggers are software-based (must be of
    highest quality)

Computer resources will not be available in a
single location
3
Challenges: Geographical Spread
  • 1700 Physicists
  • 150 Institutes
  • 32 Countries
  • CERN Member States: 55%
  • Non-Member States (NMS): 45%
  • Major challenges associated with
  • Communication and collaboration at a distance
  • Distributed computing resources
  • Remote software development and physics analysis

4
Challenges: b physics
  • Typically the subject of theses and the work of
    small groups in universities already today
  • 150 physicists in the CMS Heavy-Flavor group
  • > 40 institutions involved
  • Often requires precise and specialized algorithms
    for vertex reconstruction and particle
    identification
  • Most of CMS triggered events include B particles
  • High-level software triggers select exclusive
    channels in events triggered in hardware using
    inclusive conditions
  • Objectives
  • Allow remote physicists to access detailed
    event information
  • Migrate reconstruction and selection algorithms
    effectively to the HLT

5
HEP Experiment-Data Analysis
[Diagram: experiment data flow. The Event Filter / Object Formatter stores raw
events; quasi-online reconstruction stores reconstructed objects and
calibrations, together with environmental data from Detector Control and
Online Monitoring. A Persistent Object Store Manager (Database Management
System) serves "request part of event" queries from Data Quality,
Calibrations, Group Analysis, Simulation and on-demand User Analysis, which
ultimately feed the Physics Paper.]
6
Analysis Model
  • Hierarchy of Processes (Experiment, Analysis
    Groups, Individuals)

  • Reconstruction: experiment-wide activity (10^9 events)
  • 3000 SI95 sec/event; 1 job per year, plus
    re-processing 3 times per year (new detector
    calibrations or improved understanding)
  • Monte Carlo: 5000 SI95 sec/event
  • Selection: activity of 20 groups (10^9 → 10^7 events)
  • 25 SI95 sec/event, ~20 jobs per month
  • Trigger-based and physics-based refinements;
    iterative selection once per month
  • Analysis: ~25 individuals per group (10^6 - 10^8
    events)
  • 10 SI95 sec/event, ~500 jobs per day
  • Different physics cuts and MC comparison ~once
    per day
  • Algorithms applied to data to get results
    (a rough CPU-budget check is sketched below)
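The per-event costs and cadences above translate into rough yearly CPU budgets. A minimal Python sketch of the arithmetic, using only the slide's own figures (the group and user multiplicities of 20 and 25 are deliberately left out, and "once per month" / "once per day" are taken literally):

```python
# Rough CPU-budget check for the analysis-model hierarchy above.
# Units are SI95-seconds; figures are the slide's own.
activities = {
    # name: (events per pass, SI95-sec per event, passes per year)
    "reconstruction": (1e9, 3000, 1),
    "re-processing":  (1e9, 3000, 3),
    "selection":      (1e9,   25, 12),   # once per month (per group)
    "analysis":       (1e8,   10, 365),  # once per day (per individual)
}

for name, (events, cost, passes) in activities.items():
    total = events * cost * passes        # SI95-seconds per year
    print(f"{name:15s} {total:.2e} SI95-sec/year")
```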
7
Data handling baseline
  • CMS computing in year 2007
  • Data model
  • typical objects 1 KB - 1 MB
  • 3 PB of storage space
  • 10,000 CPUs
  • 31 sites (1 Tier-0, 5 Tier-1, 25 Tier-2) all over
    the world
  • I/O rates disk → CPU: 10,000 MB/s, average 1
    MB/s per CPU
  • RAW → ESD generation: 0.2 MB/s I/O per CPU
  • ESD → AOD generation: 5 MB/s I/O per CPU
  • AOD analysis into histograms: 0.2 MB/s I/O per CPU
  • DPD generation from AOD and ESD: 10 MB/s I/O per
    CPU
  • Wide-area I/O capacity: of order 700 MB/s
    aggregate over all payload intercontinental
    TCP/IP streams
  • This implies a system with heavy reliance on
    access to site-local (cached) data (a small
    consistency check is sketched below)
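A quick consistency check of those rates, sketched in Python with the slide's own numbers (nothing else assumed), shows why site-local caching is unavoidable:

```python
# Sanity-check the baseline I/O figures quoted above.
n_cpus = 10_000
avg_rate_per_cpu_mb_s = 1.0          # average disk -> CPU rate per CPU

aggregate_disk_io = n_cpus * avg_rate_per_cpu_mb_s
print(f"aggregate disk -> CPU I/O: {aggregate_disk_io:,.0f} MB/s")  # 10,000 MB/s

wide_area_mb_s = 700                 # quoted intercontinental capacity
fraction_remote = wide_area_mb_s / aggregate_disk_io
print(f"fraction servable over the WAN: {fraction_remote:.0%}")
# ~7% -> the bulk of the data must come from site-local (cached) copies
```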

8
Prototype Computing Installation (T0/T1)
9
Scalability, regional centres
  • CMS computing in year 2007
  • Object data model, typical objects 1 KB - 1 MB
  • 3 PB of storage space
  • 10,000 CPUs
  • Regional centres: 31 sites (1 Tier-0, 5 Tier-1,
    25 Tier-2) all over the world
  • I/O rates disk → CPU: 10,000 MB/s, average 1
    MB/s per CPU, just to keep CPUs busy
  • Wide-area I/O capacity: of order 700 MB/s
    aggregate over all payload intercontinental
    TCP/IP streams
  • This implies a distributed system with heavy
    reliance on access to site-local (cached) data
  • Natural match for Grid technology

10
Analysis Environments
  • Real Time Event Filtering and Monitoring
  • Data driven pipeline
  • High reliability
  • Pre-emptive Simulation, Reconstruction and Event
    Classification
  • Massive parallel batch-sequential process
  • Excellent error recovery and rollback mechanisms
  • Excellent scheduling and bookkeeping systems
  • Interactive Statistical Analysis
  • Rapid Application Development environment
  • Excellent visualization and browsing tools
  • Human readable navigation

11
Different challenges
  • Centralized quasi-online processing
  • Keep up with the rate
  • Validate and distribute data efficiently
  • Distributed organized processing
  • Automation
  • Interactive chaotic analysis
  • Efficient access to data and metadata
  • Management of private data

12
Migration
  • Today's Nobel prize becomes tomorrow's trigger
  • (and the day after's background)
  • Boundaries between running environments are fuzzy
  • Physics-analysis algorithms should migrate up
    to the online to make the trigger more selective
  • Robust batch systems should be made available for
    physics analysis of large data samples
  • The result of offline calibrations should be fed
    back to online to make the trigger more efficient

13
The Final Challenge
  • Beyond the interactive analysis tool (User point
    of view)
  • Data analysis and presentation: N-tuples,
    histograms, fitting, plotting, ...
  • A great range of other activities with fuzzy
    boundaries (Developer point of view)
  • Batch
  • Interactive: from point-and-click to Emacs-like
    power tools to scripting
  • Setting up configuration management tools,
    application frameworks and reconstruction
    packages
  • Data store operations: replicating entire data
    stores; copying runs, events, event parts between
    stores; not just copying but also doing something
    more complicated: filtering, reconstruction,
    analysis, ...
  • Browsing data stores down to object detail level
  • 2D and 3D visualisation
  • Moving code across final analysis, reconstruction
    and triggers
  • Today this involves (too) many tools

14
Architecture Overview
[Diagram: a consistent user interface and a coherent set of basic tools and
mechanisms sit on top of generic analysis tools (Data Browser, analysis job
wizards, Objectivity tools), CMS tools (ORCA, COBRA, OSCAR, FAMOS,
detector/event display, federation wizards, software development and
installation), and the GRID: the distributed data store and computing
infrastructure.]
15
Offline Architecture Requirements at LHC
  • Bigger Experiment, higher rate, more data
  • Larger and dispersed user community performing
    non-trivial queries against a large event store
  • Make best use of new IT technologies
  • Increased demand of both flexibility and
    coherence
  • ability to plug-in new algorithms
  • ability to run the same algorithms in multiple
    environments
  • guarantees of quality and reproducibility
  • high-performance user-friendliness

16
Requirements on data processing
  • High efficiency
  • Processing-sites hardware optimization
  • Processing-sites software optimization
  • job structure depends very much on hardware setup
  • Data quality assurance
  • Data validation
  • Data history (job book-keeping)
  • Automate
  • Input data discovery
  • Crash recovery
  • Resource monitoring
  • Identify bottlenecks and fragile components

17
Analysis part
  • Physics data analysis will be done by 100s of
    users
  • The analysis part is connected to the same catalogs
  • Maintain a global view of all data
  • Big analysis jobs can use production job handling
    mechanisms
  • Analysis services based on tags

18
[Screenshots: Emacs used to edit a CMS C++ plugin to create and fill
histograms; OpenInventor-based display of the selected event; Lizard Qt
plotter; ANAPHE histogram extended with pointers to CMS events; Python shell
with Lizard and CMS modules.]
19
Varied components and data flows
[Diagram: production data flows between Tier 0/1/2 centres, TAGs/AODs data
flows between Tier 1/2 centres, and physics-query flows from users at
Tier 3/4/5.]
20
TODAY
  • Data production and analysis exercises
  • Granularity (Data Product): the Data-Set
  • Development and deployment of a distributed data
    processing system (Hardware & Software)
  • Test and integration of Grid middleware
    prototypes
  • R&D on distributed interactive analysis

21
CMS Production 2000-2002
[Diagram: 2000-2002 production chain. MC production (signal and minimum bias,
MB) writes HEPEVT ntuples; CMSIM turns them into Zebra files with HITS; the
ORCA ooHit Formatter loads the hits into an Objectivity database (with catalog
import); ORCA digitization merges signal and MB into a further Objectivity
database; HLT algorithms then produce new reconstructed objects stored in
HLT-group databases, which are mirrored to other sites (US, Russia, Italy, ...).]
22
Current CMS Production
23
CMS Production stream
Task              Application  Input    Output          Non-standard software         Req. on resources
1 Generation      Pythia       None     Ntuple          (static link) Geometry files  Storage
2 Simulation      CMSIM        Ntuple   FZ file         (static link) Geometry files  Storage
3 Hit Formatting  ORCA H.F.    FZ file  DB              Shared libs, full CMS env.    Storage
4 Digitization    ORCA Digi.   DB       DB              Shared libs, full CMS env.    Storage
5 User analysis   ORCA User    DB       Ntuple or root  Shared libs, full CMS env.    Distributed input
24
Production 2002, Complexity
Number of Regional Centers                                  11
Number of Computing Centers                                 21
Number of CPUs                                              1000
Largest Local Center                                        176 CPUs
Number of production passes for each dataset
  (including analysis-group processing done by production)  6-8
Number of Files                                             11,000
Data Size (not including fz files from simulation)          17 TB
File transfer by GDMP and by perl scripts over scp/bbcp     7 TB toward T1, 4 TB toward T2
25
Spring02 CPU Resources
[Chart: as of 4.4.02, ~700 active CPUs plus 400 CPUs to come. Approximate
shares per site: Wisconsin 18, INFN 18, CERN 15, IN2P3 10, Moscow 10, FNAL 8,
RAL 6, IC 6, UFL 5, Caltech 4, Bristol 3, UCSD 3, HIP 1.]
26
Current data processing
27
ORCA Db Structure
One CMSIM job, oo-formatted into multiple databases. For example: the FZ file
from one CMSIM job (~300 kB/ev) is split into an MC Info container (a few
kB/ev), Calo/Muon ooHit databases (~100 kB/ev) and Tracker Hit databases
(~200 kB/ev). Multiple sets of ooHits are then concatenated into single 2 GB
database files, e.g. the concatenated MC Info from N runs (Run1, Run2,
Run3, ...). Physical and logical database structures therefore diverge
(a file-packing estimate is sketched below).
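A rough file-packing estimate follows from the per-event sizes quoted above; a minimal Python sketch (the "few kB/ev" for MC info is taken as 3 kB, an assumed value):

```python
# How many events fit in a single 2 GB database file, using the
# approximate per-event sizes quoted on the slide.
FILE_SIZE_KB = 2 * 1024 * 1024        # 2 GB expressed in kB

sizes_kb_per_event = {
    "tracker hits":   200,
    "calo/muon hits": 100,
    "MC info":          3,            # "a few kB/ev" -- assumed value
}

for name, size in sizes_kb_per_event.items():
    print(f"{name:15s} ~{FILE_SIZE_KB // size:,} events per 2 GB file")
```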
28
Production center setup
  • The most critical task is digitization
  • 300 KB per pile-up event
  • 200 pile-up events per signal event → 60 MB
  • 10 s to digitize 1 full event on a 1 GHz CPU
  • 6 MB/s per CPU (12 MB/s per dual-processor
    client)
  • Up to 5 clients per pile-up server (~60 MB/s
    on its network card: Gigabit)
  • Fast disk access (the bandwidth arithmetic is
    sketched below)

5 clients per server
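The bandwidth figures follow directly from the slide's numbers; a minimal Python sketch of the arithmetic:

```python
# Digitization bandwidth arithmetic from the slide's figures.
pileup_event_kb = 300            # kB per pile-up event
pileup_per_signal = 200          # pile-up events merged per signal event
digitize_time_s = 10.0           # seconds per full event on a 1 GHz CPU

data_per_signal_mb = pileup_event_kb * pileup_per_signal / 1000   # ~60 MB
rate_per_cpu = data_per_signal_mb / digitize_time_s               # 6 MB/s
rate_per_client = 2 * rate_per_cpu                                # dual CPU
server_load = 5 * rate_per_client                                 # 5 clients/server

print(f"{data_per_signal_mb:.0f} MB of pile-up per signal event")
print(f"{rate_per_cpu:.0f} MB/s per CPU, {rate_per_client:.0f} MB/s per client")
print(f"~{server_load:.0f} MB/s on the pile-up server's Gigabit NIC")
```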
29
INFN-Legnaro Tier-2 prototype
[Diagram: 2001 configuration: 35 nodes (70 CPUs, 3500 SI95) and 8 TB, growing
to up to 190 nodes over 2001-2003, plus 11 servers (1100 SI95, 2.5 TB) in
2001. Computational nodes (N1..N24 per rack) and disk servers (S1..S16) hang
off FastEthernet switches, uplinked via Gigabit Ethernet (1000BaseT) to the
WAN at 34 Mbps in 2001 and 155 Mbps in 2002.
  Sx disk server node: dual PIII 1 GHz, dual PCI (33/32 and 66/64), 512 MB,
3x75 GB EIDE RAID 0-5 disks (expandable up to 10), 1x20 GB disk for the O.S.
  Nx computational node: dual PIII 1 GHz, 512 MB, 3x75 GB EIDE disks,
1x20 GB for the O.S.]
30
IMPALA
  • Each step in the production chain is split into 3
    sub-steps
  • Each sub-step is factorized into customizable
    functions (see the sketch after this list)

  • JobDeclaration: search for something to do
  • JobCreation: generate jobs from templates
  • JobSubmission: submit jobs to the scheduler
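One way to picture the factorization is a pipeline of overridable hooks, sketched below in Python; the class and method names are illustrative and are not IMPALA's actual API.

```python
# Illustrative IMPALA-style sub-step factorization: each sub-step is a
# hook that a site manager can override without touching the others.

class ProductionStep:
    def declare(self):
        """JobDeclaration: discover work to do, e.g. scan a directory
        or query the Objectivity/DB federation."""
        return ["run_001", "run_002"]           # placeholder to-do list

    def create(self, todo):
        """JobCreation: instantiate jobs from a template."""
        return [f"job for {item}" for item in todo]

    def submit(self, jobs):
        """JobSubmission: hand the jobs to the local scheduler."""
        for job in jobs:
            print("submitting:", job)

    def run(self):
        self.submit(self.create(self.declare()))

class MySiteStep(ProductionStep):
    # A site customizes only what differs locally, e.g. the scheduler call.
    def submit(self, jobs):
        for job in jobs:
            print("bsub <", job)                # pretend this site uses LSF

MySiteStep().run()
```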
31
Job declaration and creation
  • Jobs to-do are automatically discovered
  • looking at predefined directory contents for the
    Fortran Steps
  • querying the Objectivity/DB federation for
    Digitization, Event Selection, Analysis
  • Once the to-do list is ready, the site manager
    can actually generate instances of jobs starting
    from a template
  • Job execution includes validation of produced data

32
Job submission
  • Thanks to the sub-step decomposition into
    customizable functions, site managers can
  • Define local actions to be taken to submit the
    job (is there any job scheduler? Which one? How
    are the queues organized?)
  • Define local actions to be taken before and after
    the start of the job (is there a tape library?
    Need to stage tapes before the run?)
  • Auto-recovery of crashed jobs
  • When a job is started for the first time, its
    startup cards are automatically modified so that
    if the job is re-started it continues from the
    last analyzed event (see the sketch below)
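A minimal sketch of the auto-recovery idea, assuming a simple key=value card file and a checkpoint file written by the job as it runs; the card name FIRST_EVENT is hypothetical, not an actual CMS card.

```python
# Illustrative auto-recovery: rewrite the startup cards so a re-started
# job resumes after the last analyzed event.  Card name is invented.
import re

def patch_cards(cards_path, checkpoint_path):
    try:
        last_event = int(open(checkpoint_path).read().strip())
    except FileNotFoundError:
        return                                 # first start: nothing to patch
    text = open(cards_path).read()
    text = re.sub(r"^FIRST_EVENT\s*=.*$",
                  f"FIRST_EVENT = {last_event + 1}",
                  text, flags=re.MULTILINE)
    open(cards_path, "w").write(text)          # job now skips processed events
```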

33
BOSS
  • Submission of batch jobs to a computing farm
  • Independence from the local scheduler (PBS, LSF,
    Condor, etc...)
  • Persistent storage of job information (in an RDBMS)
  • Job-dependent book-keeping: monitor different
    information in different job types
  • (e.g. number of events in input, number of events
    in output, version of software used, internal
    production software errors, etc)

34
BOSS job submission and running
[Diagram: the user's boss submit / boss query / boss kill commands go to BOSS,
which stores job information in the BOSS DB and forwards jobs to the local
scheduler.]
  • Accepts job submission from users
  • Stores info about job in a DB
  • Builds a wrapper around the job (BossExecuter)
  • Sends the wrapper to the local scheduler
  • The wrapper sends info about the job to the DB
    (a minimal wrapper sketch follows)
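A minimal sketch of that wrapper idea, using sqlite3 as a stand-in for the real MySQL book-keeping database; the table layout and job command are illustrative, not BOSS's actual schema.

```python
# Illustrative BossExecuter-style wrapper: run the user's executable and
# record start/stop information in a book-keeping DB (sqlite3 stand-in).
import sqlite3, subprocess, time

def run_wrapped(job_id, command):
    db = sqlite3.connect("bookkeeping.db")
    db.execute("CREATE TABLE IF NOT EXISTS jobs "
               "(id TEXT, start REAL, stop REAL, status INTEGER)")
    start = time.time()
    proc = subprocess.run(command, capture_output=True, text=True)
    db.execute("INSERT INTO jobs VALUES (?, ?, ?, ?)",
               (job_id, start, time.time(), proc.returncode))
    db.commit()
    return proc.stdout              # later parsed by the job-type filters

# run_wrapped("digi_0042", ["./orca_digi.sh"])   # hypothetical job
```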

35
Store info about a job
  • A registered job has a schema associated with it
    listing the relevant information to be stored
  • A table is created in the DB to keep this info

36
Getting info from the job
  • A registered job has scripts associated with it
    which are able to understand the job output
    (a toy filter is sketched below)

User's executable
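In the same spirit, a toy output filter: it parses the job's stdout for the quantities to be book-kept. The log format and field names below are invented for illustration.

```python
# Toy job-type filter: extract book-keeping values from the job output.
import re

def filter_output(stdout_text):
    info = {}
    for key, pattern in [("events_in",  r"events read\s*:\s*(\d+)"),
                         ("events_out", r"events written\s*:\s*(\d+)"),
                         ("sw_version", r"ORCA version\s*:\s*(\S+)")]:
        m = re.search(pattern, stdout_text)
        if m:
            info[key] = m.group(1)
    return info                     # values go into the job's DB table

print(filter_output("events read : 500\n"
                    "events written : 480\n"
                    "ORCA version : 6_2_0"))
```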
37
Boss Logical Diagram
[Diagram: the user supplies a job specification (executable plus book-keeping
definition) and issues submit / query / kill commands. BOSS instruments the
job for book-keeping and submits it through a filter interface to the
scheduler (Condor Vanilla, LSF, FBSNG, Grid scheduler). The executing job
updates the book-keeping DB (MySQL) via SQL UPDATEs, while book-keeping info
retrieval and task modification go through SQL SELECT/UPDATE on the same DB.]
38
TOMORROW
  • Map Data-Sets to Grid Data-Products
  • Use the Grid security infrastructure and workload
    manager
  • Deploy a Grid-enabled portal to interactive
    analysis
  • Global monitoring of Grid performance and
    quality of service

39
Computing
  • Ramp production systems in 2005-2007 (30%, 30%,
    40% of cost each year)
  • Match the computing power available with LHC
    luminosity

2007: 300M Reco ev/mo, 200M Re-Reco ev/mo, 50k ev/s Analysis
2006: 200M Reco ev/mo, 100M Re-Reco ev/mo, 30k ev/s Analysis
40
Toward ONE Grid
  • Build a unique CMS-GRID framework (EU + US)
  • EU and US grids are not interoperable today; wait
    for help from DataTAG, iVDGL and GLUE
  • Work in parallel in EU and US
  • Main US activities
  • MOP
  • Virtual Data System
  • Interactive Analysis
  • Main EU activities
  • Integration of IMPALA with EDG WP1/WP2 software
  • Batch analysis: user job submission to an analysis
    farm

41
PPDG MOP system
  • PPDG-developed MOP system
  • Allows submission of CMS production jobs from a
    central location, running on remote locations,
    with results returned
  • Relies on GDMP for replication
  • Globus GRAM
  • Condor-G and local queuing systems for job
    scheduling
  • IMPALA for job specification
  • Being deployed in the USCMS testbed
  • Proposed as basis for the next CMS-wide production
    infrastructure

42
(No Transcript)
43
Prototype VDG System (production)
[Diagram, with a legend distinguishing components that have no code yet,
components that already exist, and components implemented using MOP.]
44
Globally Scalable Monitoring Service
[Diagram: monitoring agents fed by push and pull transports (rsh, ssh,
existing scripts, SNMP).]
45
Optimisation of Tag Databases
  • Tags (n-tuples) are small (0.2 - 1 kB) summary
    objects for each event
  • Crucial for fast selection of interesting event
    subsets; this will be an intensive activity
  • Past work concentrated on three main areas
  • Development of Objectivity-based Tags integrated
    with the CMS COBRA framework and Lizard
  • Investigations of Tag bitmap indexing to speed up
    queries
  • Comparisons of OO and traditional databases (SQL
    Server, Oracle 9i, PostgreSQL) as efficient
    stores for Tags
  • New work concentrates on tag-based analysis
    services (a toy tag selection is sketched below)
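To make the tag idea concrete, a toy tag-based selection in Python: the cut is evaluated over the small per-event summaries only, and the full event store is touched just for the events that pass. The tag fields are invented for illustration.

```python
# Toy tag-based selection: cuts run over small per-event summaries, so
# only the selected events need fetching from the full event store.
tags = [
    {"event_id": 1, "pt_max": 12.0, "n_muons": 0},
    {"event_id": 2, "pt_max": 45.5, "n_muons": 2},
    {"event_id": 3, "pt_max": 33.1, "n_muons": 1},
]

selected = [t["event_id"] for t in tags
            if t["pt_max"] > 30.0 and t["n_muons"] >= 1]
print("events to fetch from the event store:", selected)    # [2, 3]
```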

46
CLARENS a Portal to the Grid
  • Grid-enabling the working environment for
    physicists' data analysis
  • Clarens consists of a server communicating with
    various clients via the commodity XML-RPC
    protocol. This ensures implementation
    independence.
  • The server is implemented in C++ to give access
    to the CMS OO analysis toolkit.
  • The server will provide a remote API to Grid
    tools
  • Security services provided by the Grid (GSI)
  • The Virtual Data Toolkit: object collection
    access
  • Data movement between Tier centres using GSI-FTP
  • CMS analysis software (ORCA/COBRA)
  • The current prototype is running on the Caltech
    proto-Tier2
  • More information at http://clarens.sourceforge.net,
    along with a web-based demo

47
Clarens Architecture
  • Common protocol spoken by all types of clients to
    all types of services
  • Implement each service once for all clients
  • Implement client access to a service once for each
    client type, using a common protocol already
    implemented for all languages (C++, Java,
    Fortran, etc.)
  • The common protocol is XML-RPC, with SOAP close to
    working; CORBA is doable, but would require a
    different server above Clarens (it uses IIOP, not
    HTTP)
  • Handles authentication using Grid certificates,
    connection management, data serialization and,
    optionally, encryption
  • The implementation uses a stable, well-known server
    infrastructure (Apache) that has been debugged and
    audited over a long period by many
  • The Clarens layer itself is implemented in Python,
    but can be reimplemented in C++ should performance
    be inadequate
48
Clarens Architecture II
  • Diagram: a client issues RPC calls over http/https
    to a web server; the Clarens layer inside the web
    server dispatches them to the requested service.
49
Clarens Architecture
[Diagram: matching client-side and server-side steps of a Clarens call:
authentication, session initialization, request serializing (client) and
deserializing (server), request marshalling and transmission, worker code
invocation, result serializing (server) and deserializing (client), and
session termination on both sides.]
50
  • Clarens is a simple way to implement web services
    on the server
  • Provides some basic connectivity functionality
    common to all services
  • Uses commodity protocols
  • No Globus needed on client side, only certificate
  • Simple to implement clients in scripts and
    compiled code (see the sketch below)
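Since Clarens speaks plain XML-RPC over HTTP, a client can be a few lines of standard-library Python. The server URL and method name below are placeholders, not the actual Clarens API.

```python
# Minimal XML-RPC client in the spirit of a thin Clarens client.
import xmlrpc.client

# Placeholder URL; a real session would first be authenticated with the
# user's Grid certificate.
server = xmlrpc.client.ServerProxy("https://clarens.example.org:8080/")
print(server.echo("hello from a thin client"))   # hypothetical method
```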

51
2007
  • Sub-event components map to Grid Data-Products
  • Balance of load between network and CPU
  • Complete data and software base virtually
    available at the physicist's desktop

52
Simulation, Reconstruction and Analysis Software System
[Diagram: layered software system, uploadable on the Grid. Physics modules
(reconstruction algorithms, data monitoring, event filter, physics analysis)
plug into a specific framework on top of a Grid-enabled application framework
handling calibration, event and configuration objects. Below sit a generic
application framework with Grid-aware Data-Products, adapters and extensions,
and basic services: the C++ standard library and extension toolkit, an ODBMS,
Geant3/4, CLHEP and a PAW replacement.]
53
Reconstruction on Demand
Compare the results of two different track reconstruction algorithms.
[Diagram: from the Event, Hits on each Detector Element are turned into
RecHits; algorithm Rec T1 produces tracks T1, Rec T2 produces tracks T2, and
Rec CaloCl produces calorimeter clusters (CaloCl); the Analysis consumes both
track collections, each object being reconstructed only when requested.]
54
Conclusions
  • Grid is the enabling technology for the effective
    deployment of a coherent and consistent data
    processing environment
  • This is the only basis for an efficient physics
    analysis program at the LHC
  • CMS is engaged in an active development, test and
    deployment program for all the software and
    hardware components that will constitute the
    future LHC grid