Modeling LHC Regional Centers with the MONARC Simulation Tools


1
Modeling LHC Regional Centers with the MONARC
Simulation Tools
  • Irwin Gaines, FNAL
  • for the MONARC collaboration

2
MONARC
  • A joint project (LHC experiments and CERN/IT) to
    understand issues associated with distributed
    data access and analysis for the LHC
  • Examine distributed data plans of current and
    near future experiments
  • Determine characteristics and requirements for
    LHC regional centers
  • Understand details of analysis process and data
    access needs for LHC data
  • Measure critical parameters characterizing
    distributed architectures, especially database
    and network issues
  • Create modeling and simulation tools
  • Simulate a variety of models to understand
    constraints on architectures

3
MONARC
  • Models Of Networked Analysis
    At Regional Centers
  • Caltech, CERN, FNAL, Heidelberg, INFN,
    Helsinki, KEK, Lyon, Marseilles, Munich,
    Orsay, Oxford, RAL, Tufts, ...
  • GOALS
  • Specify the main parameters characterizing the
    Models' performance: throughputs, latencies
  • Determine classes of Computing Models feasible
    for LHC (matched to network capacity and data
    handling resources)
  • Develop Baseline Models in the feasible
    category
  • Verify resource requirement baselines
    (computing, data handling, networks)
  • COROLLARIES
  • Define the Analysis Process
  • Define Regional Center Architectures
  • Provide Guidelines for the final Models

[Diagram: CERN (n×10^7 MIPS, m PByte robot), FNAL (4×10^7 MIPS, 110 TByte robot) and a university (n×10^6 MIPS, m TByte robot), each serving desktops, connected by 622 Mbits/s links (N × 622 Mbits/s at CERN), with optional air freight.]
4
MONARC: a systematic study of LHC regional
center issues
  • This talk will discuss:
  • study of existing and near future experiment
    analysis architectures
    (http://home.fnal.gov/odell/future/future_frame.html)
  • description of regional center services
    (http://home.fnal.gov/butler/rcarchitecture.htm)
  • understanding of the LHC analysis process
  • use of tools to draw conclusions about the
    suitability of different analysis architectures
  • (testbed measurements, and the development and
    verification of modeling tools, are covered in
    other talks at this conference)

5
General Need for distributed data access and
analysis
  • Potential problems of a single centralized
    computing center include:
  • scale of LHC experiments: difficulty of
    accumulating and managing all resources at one
    location
  • geographic spread of LHC experiments:
    providing equivalent, location-independent access
    to data for physicists
  • help desk, support and consulting in the same
    time zone
  • cost of LHC experiments: optimizing use of
    resources located worldwide

6
Motivations for Regional Centers
  • A distributed computing architecture based on
    regional centers offers:
  • A way of utilizing the expertise and resources
    residing in computing centers all over the world
  • Local consulting and support
  • A way to maximize the intellectual contribution
    of physicists all over the world without requiring
    their physical presence at CERN
  • Acknowledgement of possible limitations of
    network bandwidth
  • Freedom for people to choose how they analyze
    data, based on the availability or proximity of
    resources such as CPU, data, or network
    bandwidth

7
Current and Future Experiment Surveys
8
Future Experiment Survey
  • Analysis/Results
  • In the previous survey, we saw that many sites
    contributed to Monte Carlo generation
  • This is now the norm
  • New experiments are trying to use the Regional
    Center concept
  • BaBar has Regional Centers at IN2P3 and RAL
  • STAR has a Regional Center at LBL/NERSC
  • CDF and D0: offsite institutions are paying more
    attention as the run gets closer

9
Future Experiment Survey
  • Other observations/requirements
  • In the last survey, we pointed out the following
    requirements for RCs:
  • 24x7 support
  • software development team
  • diverse body of users
  • good, clear documentation of all s/w and s/w
    tools
  • The following are requirements for the central
    site (i.e. CERN):
  • a central code repository that is easy to use
    and easily accessible for remote sites
  • be sensitive to remote sites in database
    handling, raw data handling and machine flavors
  • provide good, clear documentation of all s/w and
    s/w tools
  • The experiments in this survey that are achieving
    the most in distributed computing are following
    these guidelines

10
Regional Center Characteristics
11
Regional Centers
  • Regional Centers will:
  • Provide all technical services and data services
    required to do the analysis
  • Maintain all (or a large fraction of) the
    processed analysis data, or possibly only large
    subsets based on physics channels. Maintain
    a fixed fraction of fully reconstructed and raw
    data
  • Cache or mirror the calibration constants
  • Maintain excellent network connectivity to CERN
    and excellent connectivity to users in the
    region. Data transfer over the network is
    preferred for all transactions but transfer of
    very large datasets on removable data volumes is
    not ruled out.
  • Share/develop common maintenance, validation, and
    production software with CERN and the
    collaboration
  • Provide services to physicists in the region,
    contribute a fair share to post-reconstruction
    processing and data analysis, collaborate with
    other RCs and CERN on common projects, and
    provide services to members of other regions on a
    best effort basis to further the science of the
    experiment
  • Provide support services, training,
    documentation and troubleshooting to RC and
    remote users in the region

12
Regional center architecture (diagram)
  • Data Import: network from CERN; network from
    Tier 2 and simulation centers; tapes
  • Mass Storage, Disk Servers, Database Servers
  • Production Reconstruction: Raw/Sim --> ESD;
    scheduled, predictable; experiment/physics groups
  • Production Analysis: ESD --> AOD, AOD --> DPD;
    scheduled; physics groups
  • Individual Analysis: AOD --> DPD and plots;
    chaotic; physicists
  • Data Export: CERN; Tier 2; local institutes; tapes
  • Desktops
  • Support Services: Physics Software Development;
    R&D Systems and Testbeds; Info servers; Code
    servers; Web Servers; Telepresence Servers;
    Training; Consulting; Help Desk
13
Regional center storage and data flows (same
diagram as the previous slide, with annotations)
  • Total storage: robotic mass storage, 300 TB
  • Raw data: 50 TB, 5×10^7 events (5% of 1 year)
  • Raw (simulated) data: 100 TB, 10^8 events
  • ESD (reconstructed data): 100 TB, 10^9 events
    (50% of 2 years)
  • AOD (physics object) data: 20 TB, 2×10^9 events
    (100% of 2 years)
  • Tag data: 2 TB (all)
  • Calibration/conditions database: 10 TB (only the
    latest version of most data types kept here)
  • Central disk cache: 100 TB (per user demand)
  • CPU required for AMS database servers:
    ??×10^3 SI95 power
  • Data input rate from CERN: raw data (5%),
    50 TB/yr; ESD data (50%), 50 TB/yr; AOD data
    (all), 10 TB/yr; revised ESD, 20 TB/yr
  • Data input from Tier 2: revised ESD and AOD,
    10 TB/yr
  • Data input from simulation centers: raw data,
    100 TB/yr
  • Data output rate to CERN: AOD data, 8 TB/yr;
    recalculated ESD, 10 TB/yr; simulation ESD data,
    10 TB/yr
  • Data output to Tier 2: revised ESD and AOD,
    15 TB/yr
  • Data output to local institutes: ESD, AOD, DPD
    data, 20 TB/yr
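
As a quick cross-check of the storage figures on this slide, the following Python sketch sums the mass storage components and derives the average event sizes they imply. The volumes and event counts are those quoted above; the dictionary layout, the names, and the use of decimal terabytes are assumptions made for illustration only.

```python
# Storage figures as quoted on the slide: (volume in TB, number of events or None).
TB = 1e12  # assumption: decimal terabytes (10^12 bytes)

mass_storage = {
    "raw": (50, 5e7),
    "raw_simulated": (100, 1e8),
    "esd": (100, 1e9),
    "aod": (20, 2e9),
    "tag": (2, None),          # event count not stated on the slide
    "calibration": (10, None),
}

# Sum of the individual components, to compare with the quoted 300 TB.
total_tb = sum(volume for volume, _ in mass_storage.values())

# Implied average size per event, in kB, where an event count is given.
event_size_kb = {
    name: volume * TB / events / 1e3
    for name, (volume, events) in mass_storage.items()
    if events is not None
}

print(f"sum of components: {total_tb} TB (slide quotes 300 TB)")
for name, size in event_size_kb.items():
    print(f"{name}: about {size:.0f} kB/event")
```

The components add up to slightly less than the quoted 300 TB, and the implied sizes step down by roughly a factor of ten from raw data to ESD to AOD.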
14
Regional center CPU requirements (same diagram as
the previous slides, with annotations)
  • Farms of low cost commodity computers, limited
    I/O rate, modest local disk cache
  • Reconstruction jobs: reprocessing of raw data,
    10^8 events/year (10%); initial processing of
    simulated data, 10^8/year; 1000 SI95-sec/event
    => 10^4 SI95 capacity; 100 processing nodes
    of 100 SI95 power
  • Event selection jobs: 10 physics groups,
    10^8 events (10% samples), 3 times/yr, based on
    ESD and latest AOD data; 50 SI95/evt
    => 5000 SI95 power
  • Physics object creation jobs: 10 physics groups,
    10^7 events (1% samples), 8 times/yr, based on
    selected event sample ESD data; 200 SI95/event
    => 5000 SI95 power
  • Derived physics data creation jobs (physics
    groups): 10 physics groups, 10^7 events,
    20 times/yr, based on selected AOD samples,
    generates canonical derived physics data;
    50 SI95/evt => 3000 SI95 power
  • Total for production analysis: 110 nodes of
    100 SI95 power
  • Derived physics data creation jobs (individual
    analysis): 200 physicists, 10^7 events,
    20 times/yr, based on selected AOD and DPD
    samples; 20 SI95/evt => 30,000 SI95 power
  • Total for individual analysis: 300 nodes of
    100 SI95 power
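
The analysis power figures on this slide can be approximately reproduced with a simple estimate. The Python sketch below assumes that each workload is spread evenly over one calendar year and that the quoted "SI95/evt" costs are SI95-seconds per event; both are assumptions, as are the function and variable names. It is a rough cross-check, not the method used by the authors.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: load spread evenly over a year


def required_si95(actors, events, passes_per_year, si95_sec_per_event):
    """Average SI95 power needed to complete the yearly workload within a year."""
    total_work = actors * events * passes_per_year * si95_sec_per_event
    return total_work / SECONDS_PER_YEAR


# name: (groups or physicists, events per pass, passes per year,
#        SI95-sec per event, SI95 power quoted on the slide)
jobs = {
    "event selection": (10, 1e8, 3, 50, 5000),
    "physics object creation": (10, 1e7, 8, 200, 5000),
    "derived physics data (groups)": (10, 1e7, 20, 50, 3000),
    "derived physics data (physicists)": (200, 1e7, 20, 20, 30000),
}

estimates = {name: required_si95(*spec[:4]) for name, spec in jobs.items()}

for name, spec in jobs.items():
    print(f"{name}: {estimates[name]:.0f} SI95 (slide quotes {spec[4]})")
```

Under these assumptions the estimates land within about 20% of the quoted values, which suggests the slide figures are rounded yearly averages.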
15
Understanding the LHC Analysis Process
16
MONARC Analysis Process Example

17
Model and Simulation parameters
  • Have a new set of parameters common to all
    simulating groups.
  • More realistic values, but still to be
    discussed and agreed on the basis of the
    Experiments' information.

  • Proc_time_RAW: 1000 SI95sec/event (350)
  • Proc_Time_ESD: 25 (2.5)
  • Proc_Time_AOD: 5 (0.5)
  • Analyze_Time_TAG: 3
  • Analyze_Time_AOD: 3
  • Analyze_Time_ESD: 15 (3)
  • Analyze_Time_RAW: 600 (350)
  • Memory of Jobs: 100 MB
  • Proc_Time_Create_RAW: 5000 SI95sec/event (35)
  • Proc_Time_Create_ESD: 1000 (1)
  • Proc_Time_Create_AOD: 25 (1)
18
Example Physics Analysis at Regional Centres
  • Similar data processing jobs are performed
    in several RCs
  • Each Centre has TAG and AOD databases
    replicated.
  • Main Centre provides ESD and RAW data
  • Each job processes AOD data, and also a
    fraction of ESD and RAW.

19
Example Physics Analysis

20
Results of Models of Distributed Architectures
21
Analysis and Reconstruction Simulations
P. Capiluppi, L. Perini, S. Resconi, D. Ugolotti
Dept. of Physics, INFN - Bologna, Milano
  • Preliminary Results for simple Models

Try to stress the System and look for a steady
state (same Jobs repeated every day)
22
Base Model used
  • Basic Jobs
  • Reconstruction of 10^7 events: RAW --> ESD --> AOD
    --> TAG at CERN. This is the production while the
    data are coming from the DAQ (100 days of running,
    collecting a billion events per year)
  • Analysis by 5 Working Groups, each of 25 analyzers,
    on TAG only (no requests to higher-level data
    samples). Every analyzer submits 4 sequential
    jobs on 10^6 events. Each analyzer's start time
    is a flat random choice within a range of 3000
    seconds. Each analyzer's data sample of 10^6 events
    is a random choice from the complete TAG
    database of 10^7 events.
  • Transfer (FTP) of 10^7 events of ESD, AOD and TAG
    from CERN to the RC
  • CERN activities: reconstruction, 5 WG analysis,
    FTP transfer
  • RC activities: 5 (uncorrelated) WG analysis,
    receiving the FTP transfer
  • Jobs "paper estimate"
  • Single analysis job: 1.67 CPU hours = 6000 sec
    at CERN (same at RC)
  • Reconstruction at CERN for 1/500 RAW to ESD:
    3.89 CPU hours = 14000 sec
  • Reconstruction at CERN for 1/500 ESD to AOD:
    0.03 CPU hours = 100 sec
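
The paper estimates above can be reproduced with a few lines of arithmetic. The Python sketch below assumes processing nodes of 500 SI95 (the node power quoted later in the talk) and per-event costs of 3, 350 and 2.5 SI95sec for TAG analysis, RAW to ESD and ESD to AOD respectively (the 350 and 2.5 being the parenthesized values on the parameters slide). Pairing these particular values with this slide is an inference that happens to match the quoted numbers, not something the slides state; the function name is hypothetical.

```python
NODE_POWER_SI95 = 500      # assumption: per-node power quoted later in the talk
TOTAL_EVENTS = 10**7       # size of the reconstructed sample


def job_seconds(events, si95_sec_per_event, node_power=NODE_POWER_SI95):
    """Wall-clock seconds for one job running alone on one node."""
    return events * si95_sec_per_event / node_power


# One analysis job over 10^6 TAG events at 3 SI95sec/event.
analysis = job_seconds(10**6, 3)

# Reconstruction split into 500 jobs, each handling 1/500 of the sample.
events_per_job = TOTAL_EVENTS // 500
raw_to_esd = job_seconds(events_per_job, 350)   # assumed cost per event
esd_to_aod = job_seconds(events_per_job, 2.5)   # assumed cost per event

for name, seconds in [("analysis", analysis),
                      ("RAW to ESD", raw_to_esd),
                      ("ESD to AOD", esd_to_aod)]:
    print(f"{name}: {seconds:.0f} sec = {seconds / 3600:.2f} CPU hours")
```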

23
Resources: LAN speeds ?!
  • In our Models the DB Servers are uncorrelated, and
    thus one activity uses a single Server. The
    bottlenecks are the read and write speeds to
    and from the Server. In order to use the CPU
    power at a reasonable percentage, we need a read
    speed of at least 300 MB/s and a write speed of
    100 MB/s (milestone already met today)
  • We use 100 MB/s in current simulations (10
    Gbits/sec switched LANs in 2005 may be possible).
  • Processing node link speed is negligible in our
    simulations.
  • Of course the real implementation of the Farms
    can be different, but the results of the
    simulation do not depend on the real
    implementation: they are based on usable
    resources.

See following slides
24
Data access speeds
Reconstruction of ESD, AOD and TAG (10^7 events)
at CERN, repeated for 10 days.
DB read speed: 25 MB/s; DB write speed: 15 MB/s;
DB link speed: 100 MB/s; Node link speed: 10 MB/s
  • Poor CPU use (less than 5%)
  • Low job efficiency
  • Jobs span over the following days

25
Data access speeds
Reconstruction of ESD, AOD and TAG (10^7 events)
at CERN, repeated for 10 days.
DB read speed: 100 MB/s; DB write speed: 100 MB/s;
DB link speed: 100 MB/s; Node link speed: 100 MB/s
  • Better CPU use (about 15%)
  • Still low job efficiency
  • Jobs span over the following days

26
More realistic values for CERN and RC
  • Data link speeds at 100 MB/sec (all values),
    except:
  • Node_Link_Speed at 10 MB/sec
  • WAN link speeds at 40 MB/sec
  • CERN:
  • 1000 processing nodes, each of 500 SI95
  • RC:
  • 200 processing nodes, each of 500 SI95

CERN: 1000 processing nodes × 500 SI95 = 500 kSI95,
about the CPU power of the CERN Tier0;
disk space as for the number of DBs
RC: 100 kSI95 processing power = 20% of CERN;
disk space as for the number of DBs
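
To put these values in perspective, the Python sketch below computes the CERN and RC processing capacities and a rough WAN transfer time for 10^7 events. The node counts, node power and WAN speed are those quoted on this slide. The event sizes of 100 kB (ESD) and 10 kB (AOD) are assumptions inferred from the storage figures on the regional center slide (100 TB for 10^9 ESD events, 20 TB for 2×10^9 AOD events), and the estimate ignores protocol overhead and competing traffic.

```python
NODE_POWER_SI95 = 500
cern_si95 = 1000 * NODE_POWER_SI95   # 1000 processing nodes at CERN
rc_si95 = 200 * NODE_POWER_SI95      # 200 processing nodes at the RC
rc_fraction = rc_si95 / cern_si95

WAN_MB_PER_S = 40
EVENTS = 10**7

# Assumption: average event sizes in kB, inferred from the storage slide.
event_size_kb = {"ESD": 100, "AOD": 10}


def transfer_seconds(events, size_kb, link_mb_per_s=WAN_MB_PER_S):
    """Ideal transfer time over a dedicated link, with no overhead."""
    megabytes = events * size_kb / 1e3
    return megabytes / link_mb_per_s


transfer = {kind: transfer_seconds(EVENTS, kb) for kind, kb in event_size_kb.items()}
total_hours = sum(transfer.values()) / 3600

print(f"CERN: {cern_si95} SI95, RC: {rc_si95} SI95 ({rc_fraction:.0%} of CERN)")
for kind, seconds in transfer.items():
    print(f"{kind}: {seconds:.0f} sec")
print(f"ESD + AOD: about {total_hours:.1f} hours")
```

With these assumed sizes the ESD transfer dominates, taking several hours of an otherwise idle 40 MB/sec link.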
27
Analysis on 10^7 events
  • Reconstruction of ESD, AOD and TAG (10^7 events)
    at CERN
  • 5 WG Analysis at CERN
  • 5 WG Analysis at RC
  • Transfer (FTP) of 10^7 events of ESD and AOD to
    the RC
Test7_Model1: 10^7 events per job!
2 days of simulated activities
28
Analysis on 10^7 events
  • Reconstruction of ESD, AOD and TAG (10^7 events)
    at CERN
  • 5 WG Analysis at CERN
  • 5 WG Analysis at RC
  • Transfer (FTP) of 10^7 events of ESD and AOD to
    the RC
Test7_Model1: 10^7 events per job!
2 days of simulated activities
29
Analysis on 10^7 events
  • Reconstruction of ESD, AOD and TAG (10^7 events)
    at CERN
  • 5 WG Analysis at CERN
  • 5 WG Analysis at RC
  • Transfer (FTP) of 10^7 events of ESD and AOD to
    the RC
RC with doubled CPU resources
Test7bis_Model1: 10^7 events per job!
2 days of simulated activities
30
Some Conclusions of Simulations
  • Larger CPU power (of the order of 1000 SI95sec
    per event) for event reconstruction is possible
    at CERN (it may eventually interfere with the
    number of reprocessings per year).
  • A concern: an RC is 20% of CERN, but the full
    Analysis process load of 5 physics groups, if
    fully performed at a single RC, requires more
    than 20% of CERN resources! We need to
    better define the full Analysis process.
  • The role of a Tier2 RC should be coordinated with
    the corresponding Tier1 RC activities, and/or the
    distribution of WGs over all the Centres should
    be revisited.
  • Using 10^7 events for all the Analysis requires a
    re-thinking of the Analysis Model. RCs must have
    room for building Revised data and Monte Carlo
    data.

31
 
SIMULATION OF DAILY ACTIVITIES AT REGIONAL
CENTERS
MONARC Collaboration
Alexander Nazarenko and Krzysztof Sliwa


32
 
  • Each group reads 100% of the TAG events and
    follows 10% to AOD, 1% to ESD and 0.01% to RAW
  • 20 Jobs/Day in total, evenly spread among
    participating RCs
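
The access pattern on this slide can be turned into a rough per-event cost. The Python sketch below applies the follow-through fractions from the slide to a TAG sample and weights each data tier with the Analyze_Time values from the parameters slide (3, 3, 15 and 600 SI95sec/event). Combining those parameters with this model is an assumption made for illustration; the simulation described here may have used different values.

```python
# Fraction of TAG events followed to each data tier (from the slide).
follow_fraction = {"TAG": 1.0, "AOD": 0.10, "ESD": 0.01, "RAW": 0.0001}

# Assumption: analysis cost per event, in SI95sec, from the parameters slide.
analyze_time = {"TAG": 3, "AOD": 3, "ESD": 15, "RAW": 600}


def events_read(tag_events):
    """Expected number of events read from each tier for a given TAG sample."""
    return {tier: tag_events * fraction for tier, fraction in follow_fraction.items()}


def cost_per_tag_event():
    """Average SI95sec spent per TAG event, summed over all tiers."""
    return sum(follow_fraction[tier] * analyze_time[tier] for tier in follow_fraction)


reads = events_read(10**6)
average_cost = cost_per_tag_event()

for tier, count in reads.items():
    print(f"{tier}: {count:.0f} events")
print(f"average cost: {average_cost:.2f} SI95sec per TAG event")
```

Under these assumptions the TAG pass itself accounts for most of the CPU cost, while the rare RAW accesses matter more for data movement than for processing time.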
33
 
  • Five Tier 1 and one Tier 2 Centers, optimized to
    perform the complete set with a 30 MBps WAN and
    an optimized LAN
  • Model1 (fixed values)
  • Model2 (randomized data processing times and
    sizes)
35
Overall Conclusions
  • MONARC simulation tools are:
  • sophisticated enough to allow modeling of complex
    distributed analysis scenarios
  • simple enough to be used by non-experts
  • Initial modeling runs are already showing
    interesting results
  • Future work will help identify bottlenecks and
    understand constraints on architectures