1
DØ RACE
DØ Internal Computing Review, May 9-10, 2002
Jae Yu
  • Introduction 
  • Current Status
  • DØRAM Architecture
  • Regional Analysis Centers
  • Conclusions

2
What Do You Want to Do?
  • John Krane would say, "I want to measure the inclusive jet cross section at my desk at ISU!!"
  • Chip Brock would say, "I want to measure the W cross section at MSU!!"
  • Meena would say, "I want to find the Higgs at BU!!"
  • All of the above should be possible in the near
    future!!!
  • What do we need to do to accomplish the above?

3
What is DØRACE, and Why Do We Need It?
  • DØ Remote Analysis Coordination Efforts
  • In existence to accomplish:
  • Setting up and maintaining remote analysis environments
  • Promoting institutional contributions from remote sites
  • Allowing remote institutions to participate in data analysis
  • Preparing for the future of data analysis:
  • More efficient and faster delivery of multi-PB data
  • More efficient sharing of processing resources
  • Preparation for possible massive re-processing and MC production to expedite the process
  • Expedited physics result production

4
DØRACE (cont'd)
  • Maintain self-sustained support amongst the remote institutions to construct a broader base of knowledge
  • Alleviate the load on experts by sharing knowledge, allowing them to concentrate on preparing for the future
  • Improve communication between the experiment site and the remote institutions
  • Minimize travel around the globe for data access
  • Address sociological issues of HEP people at their home institutions and within the field
  • The primary goal is to allow individual desktop users to make significant contributions without being at the lab

5
From the Nov. Survey
  • Difficulties:
  • Hard time setting up initially
  • Lack of updated documentation
  • Rather complicated setup procedure
  • Lack of experience? No forum to share experiences
  • OS version differences (RH 6.2 vs. 7.1), let alone different OSes
  • Most of the established sites have an easier time updating releases
  • Network problems affecting successful completion of large (4 GB) releases, which take a couple of hours (SA)
  • No specific responsible persons to ask questions
  • Availability of all necessary software via UPS/UPD
  • Time differences between continents affecting efficiency

6
DØRACE Strategy
  • Categorized the remote analysis system setup by functionality:
  • Desktop only
  • A modest analysis server
  • Linux installation
  • UPS/UPD installation and deployment
  • External package installation via UPS/UPD:
  • CERNLIB
  • KAI-lib
  • Root
  • Download and install a DØ release:
  • Tar-ball for ease of initial setup?
  • Use of existing utilities for latest release download
  • Installation of cvs
  • Code development:
  • KAI C++ compiler
  • SAM station setup

Phase 0: Preparation
Phase I: Rootuple Analysis
Phase II: Executables
Phase III: Code Development
Phase IV: Data Delivery
7
What has been accomplished?
  • Regular bi-weekly meetings on "on-week" Thursdays
  • Remote participation through video conferencing (ISDN) → moving toward switching over to VRVS per the VCTF's recommendation
  • Keep up with progress via site reports
  • Provide a forum to share experience
  • DØRACE home page established (http://www-hep.uta.edu/d0race)
  • To lower the barrier posed by the difficulties of the initial setup:
  • Updated and simplified setup instructions available on the web → many institutions have participated in refining the instructions
  • Tools to ease DØ software download and installation made available
  • More tools identified and in the works (need to automate download and installation as much as we can, if possible a one-button operation)

8
  • Release-ready notification system activated
  • Success is defined by the institutions
  • Pull system → you can decide whether to download and install a specific release
  • Build error log and dependency tree utility in place
  • Release packet split to minimize network dependence
  • Automated one-button release download and installation utility in the works
  • Held a DØRACE workshop with a hands-on session in Feb.

9
(No Transcript)
10
Where are we?
  • DØRACE is entering the next stage:
  • Compilation and running
  • Active code development
  • Propagation of the setup to all institutions
  • The instructions seem to be taking shape well:
  • Need to maintain them and keep them up to date
  • Support to help with problems people encounter
  • DØGRID:
  • Prepare SAM and other utilities for transparent and efficient remote contribution
  • Need to establish Regional Analysis Centers

11
Proposed DØRAM Architecture
(Diagram) A tiered hierarchy: Central Analysis Center (CAC) → Regional Analysis Centers (RACs), which provide various services → Institutional Analysis Centers (IACs) → Desktop Analysis Stations.
12
Why do we need a DØRAM?
  • Total Run II data size reaches multiple PB:
  • 300 TB and 2.8 PB of RAW data for Run IIa and Run IIb, respectively
  • 410 TB and 3.8 PB for RAW+DST+TMB
  • 1.0x10^9 / 1.0x10^9 events total
  • At the fully optimized 10 sec/event (40 SpecInt95) reconstruction → 1.0x10^10 seconds for a one-time reprocessing of Run IIa
  • Takes 7.6 months using 500 750 MHz (40 SpecInt95) machines at 100% CPU efficiency
  • 1.5 months with 500 4 GHz machines for Run IIa
  • 7.5 to 9 months with 500 4 GHz machines for Run IIb
  • Time for data transfer, occupying 100% of a gigabit (125 MB/s) network:
  • 3.2x10^6 / 3.2x10^7 seconds to transfer the entire data set (a full year with 100% OC3 bandwidth); a back-of-the-envelope check of these numbers follows below
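A quick back-of-the-envelope check of the Run IIa figures quoted above, as a minimal Python sketch; the 1.0x10^9 events, 10 sec/event, 500-CPU farm, 125 MB/s link, and 410 TB data set are the slide's own numbers, and a 30-day month is assumed:

# Sanity check of the Run IIa reprocessing and transfer estimates above.
SECONDS_PER_MONTH = 30 * 24 * 3600           # assume 30-day months

events_run2a  = 1.0e9        # Run IIa events
reco_per_evt  = 10.0         # sec/event on a 40 SpecInt95 (750 MHz) CPU
n_cpus        = 500          # farm size quoted on the slide
link_rate     = 125e6        # bytes/sec, 100% of a gigabit link
dataset_run2a = 410e12       # bytes, RAW+DST+TMB for Run IIa

cpu_seconds  = events_run2a * reco_per_evt               # ~1.0e10 s of CPU
repro_months = cpu_seconds / n_cpus / SECONDS_PER_MONTH  # ~7.7 months
transfer_sec = dataset_run2a / link_rate                 # ~3.3e6 s

print(f"one-pass reprocessing: {repro_months:.1f} months on {n_cpus} CPUs")
print(f"full Run IIa transfer: {transfer_sec:.1e} s at 125 MB/s")

Scaling the per-event time down by the 750 MHz to 4 GHz clock ratio reproduces the 1.5-month Run IIa figure quoted above.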

13
  • Data should be readily available for expeditious analyses:
  • Preferably disk-resident, so that time for caching is minimized
  • Analysis processing compute power should be available without the users having to rely on the CAC
  • MC generation should be done transparently
  • Should exploit compute resources at remote sites
  • Should exploit human resources at remote sites
  • Minimize resource needs at the CAC
  • Different resources will be needed

14
What is a DØRAC?
  • An institute with large, concentrated, and available computing resources:
  • Many 100s of CPUs
  • Many 10s of TB of disk cache
  • Many 100s of Mbytes of network bandwidth
  • Possibly equipped with HPSS
  • An institute willing to provide services to a few small institutes in the region
  • An institute willing to provide increased infrastructure as the data from the experiment grows
  • An institute willing to provide support personnel if necessary

15
Chip's W cross-section measurement (workflow diagram)
16
What services do we want a DØRAC to provide?
  1. Provide intermediary code distribution
  2. Generate and reconstruct MC data sets
  3. Accept and execute analysis batch job requests
  4. Store data and deliver them upon request
  5. Participate in re-reconstruction of data
  6. Provide database access
  7. Provide manpower support for the above activities

17
Code Distribution Service
  • Current releases are 4 GB total → will grow to >8 GB?
  • Why needed?
  • Downloading 8 GB once every week is not a big load on network bandwidth
  • Efficiency of release updates relies on network stability
  • Exploit remote human resources
  • What is needed?
  • Release synchronization must be done at all RACs every time a new release becomes available (see the sketch after this list)
  • Potentially large disk space needed to keep releases
  • UPS/UPD deployment at RACs:
  • FNAL-specific
  • Interaction with other systems?
  • Need administrative support for bookkeeping
  • Current DØRACE procedure works well, even for individual users → do not see the need for this service
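For illustration only, a minimal sketch of what per-release synchronization from the CAC to the RACs could look like if the release area were mirrored with rsync over ssh; the host names and the release path are hypothetical, and DØ's actual distribution mechanism is UPS/UPD rather than a raw mirror:

# Hypothetical sketch only: mirror a newly built DØ release from the CAC
# release area to each RAC with rsync over ssh.  Host names and the release
# path are placeholders; the actual DØ distribution mechanism is UPS/UPD.
import subprocess

RELEASE_AREA = "/d0dist/releases/current"          # hypothetical path
RACS = ["rac1.example.edu", "rac2.example.edu"]    # hypothetical hosts

def sync_release(rac: str) -> None:
    """Mirror the release tree to one RAC, deleting stale files there."""
    subprocess.run(
        ["rsync", "-az", "--delete",
         RELEASE_AREA + "/", f"{rac}:{RELEASE_AREA}/"],
        check=True,
    )

for rac in RACS:
    sync_release(rac)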

18
Generate and Reconstruct MC Data
  • Currently done 100% at remote sites
  • Why needed?
  • Extremely self-contained:
  • Code distribution done via a tar-ball
  • Demand will grow
  • Exploit available compute resources
  • What is needed?
  • A mechanism to automate request processing
  • A Grid that can (see the sketch after this list):
  • Accept job requests
  • Package the job
  • Identify and locate the necessary resources
  • Assign the job to the located institution
  • Provide status to the users
  • Deliver or keep the results
  • Perhaps the most indisputable task, but do we need a DØRAC?
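An illustrative sketch of the request flow listed above (accept an MC request, match it to a site with free capacity, track its status); the class names and regional-center names are placeholders, not an actual DØ or SAM-Grid interface:

# Illustrative sketch of the request flow listed above: accept an MC request,
# pick a site with free capacity, and record its status.  The class and site
# names are placeholders, not a real DØ or SAM-Grid interface.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class McRequest:
    requester: str
    n_events: int
    status: str = "queued"
    site: Optional[str] = None

@dataclass
class Site:
    name: str
    free_cpus: int

def assign(request: McRequest, sites: List[Site]) -> None:
    """Assign the request to the site with the most free CPUs."""
    best = max(sites, key=lambda s: s.free_cpus)
    request.site = best.name
    request.status = "assigned"

# Usage: one request and two hypothetical regional centers.
req = McRequest(requester="some_user", n_events=100_000)
assign(req, [Site("rac-a", 120), Site("rac-b", 300)])
print(req)  # -> McRequest(..., status='assigned', site='rac-b')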

19
Batch Job Processing
  • Currently relies on FNAL resources:
  • D0mino, ClueD0, CLUBS, etc.
  • Why needed?
  • Bring the compute resources closer to the user
  • Distribute the computing load to available
    resources
  • Allow remote users to process their jobs
    expeditiously
  • Exploit the available compute resources
  • Minimize resource load at CAC
  • Exploit remote human resources

20
Batch Job Processing (cont'd)
  • What is needed?
  • Sufficient computing infrastructure to process
    requests
  • Network
  • CPU
  • Cache storage
  • Access to relevant databases
  • A Grid that can
  • Accept job requests
  • Package the job
  • Identify and locate the necessary resources
  • Assign the job to the located institution
  • Provide status to the users
  • Deliver or keep the results
  • This task definitely needs a DØRAC
  • What do we do with the input? Keep it at the RACs?

21
Data Caching and Delivery
  • Currently only at FNAL
  • Why needed?
  • Limited disk cache at FNAL
  • Tape access needed
  • Latencies involved, sometimes very long
  • Delivering data over the network to all the requests within a reasonable time is imprudent
  • Reduce resource load on the CAC
  • Data should be readily available to the users
    with minimal latency for delivery

22
Data Caching and Delivery (cont'd)
  • What is needed?
  • Need to know what data, and how much of it, we want to store:
  • 100% TMB
  • 10-20% DST?
  • Any RAW data at all?
  • What about MC? 50% of the actual data
  • Should be on disk to minimize data-caching latency
  • How much disk space? (50 TB if 100% TMB and 10% DST for Run IIa; see the sketch after this list)
  • Constant shipment of data to all RACs from the CAC:
  • Constant bandwidth occupation (14 MB/sec for Run IIa RAW)
  • Resources from CAC needed
  • A Grid that can
  • Locate the data (SAM can do this already)
  • Tell the requester about the extent of the
    request
  • Decide whether to move the data or pull the job
    over
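The 50 TB and 14 MB/s figures above are quoted without derivation; the sketch below only shows the shape of that arithmetic. The 100% TMB / 10% DST fractions come from the list above, but the per-event sizes and logging rate are placeholder assumptions, so the printed values approximate rather than reproduce the slide's numbers:

# Shape of the RAC cache arithmetic.  Per-event sizes and the logging rate
# are assumptions for illustration, not the numbers behind the slide's
# 50 TB / 14 MB/s figures.
N_EVENTS   = 1.0e9    # Run IIa events (slide 12)
TMB_BYTES  = 10e3     # bytes/event, thumbnail  (assumed)
DST_BYTES  = 150e3    # bytes/event, DST        (assumed)
RAW_BYTES  = 250e3    # bytes/event, RAW        (assumed)
EVENT_RATE = 50.0     # events/sec to tape      (assumed)

cache_bytes = N_EVENTS * (1.00 * TMB_BYTES + 0.10 * DST_BYTES)  # 100% TMB + 10% DST
raw_rate    = EVENT_RATE * RAW_BYTES                            # steady RAW shipment

print(f"disk cache per RAC : {cache_bytes / 1e12:.0f} TB")
print(f"RAW shipment rate  : {raw_rate / 1e6:.1f} MB/s")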

23
Data Reprocessing Services
  • These include
  • Re-reconstruction of the data
  • From DST?
  • From RAW?
  • Re-streaming of data
  • Re-production of TMB data sets
  • Re-production of root-trees
  • Ab initio reconstruction
  • Currently done only at the CAC offline farm

24
Reprocessing Services (cont'd)
  • Why needed?
  • The CAC offline farm will be busy with fresh data reconstruction:
  • Only 50% of the projected capacity is used for this, but
  • It is going to be harder to re-reconstruct as more data accumulate
  • We will have to:
  • Reconstruct a few times (>2) to improve data
  • Re-stream TMB
  • Re-produce TMBs from DST and RAW
  • Re-produce root-trees
  • It will take many months to re-reconstruct the large amount of data:
  • 1.5 months with 500 4 GHz machines for Run IIa
  • 7.5 to 9 months for full reprocessing of Run IIb
  • Exploit large resources at remote institutions:
  • Expedite re-processing for expeditious analyses
  • Cutting down the time by a factor of 2 to 3 will make a difference
  • Reduce the load on the CAC offline farm
  • Just in case the CAC offline farm is having trouble, the RACs can even help out with ab initio reconstruction

25
Reprocessing Services (cont'd)
  • What is needed?
  • Permanently store the necessary data, because it would take a long time just to transfer them:
  • DSTs
  • RAW
  • Large data storage
  • Constant data transfer from the CAC to the RACs as we take and reconstruct data:
  • Dedicated file server for data distribution to the RACs
  • Constant bandwidth occupation
  • Sufficient buffer storage at the CAC in case the network goes down
  • Reliable and stable network
  • Access to relevant databases
  • Calibration
  • Luminosity
  • Geometry and Magnetic Field Map

26
Database Access Service
  • Currently done only at the CAC
  • Why needed?
  • For data analysis
  • For reconstruction of data
  • To exploit available resources
  • What is needed?
  • Remote DB access software services
  • Some copy of the DB at the RACs
  • A substitute for the Oracle DB at remote sites
  • A means of synchronizing the DBs

27
Reprocessing Services (cont'd)
  • Transfer of new TMB and root-trees to other sites
  • Well-synchronized reconstruction code
  • A grid that can:
  • Identify resources on the net
  • Optimize resource allocation for the most expeditious reproduction
  • Move data around if necessary
  • A dedicated block of time for concentrated CPU usage if disaster strikes
  • Questions:
  • Do we keep copies of all data at the CAC?
  • Do we ship DSTs and TMBs back to the CAC?
  • This service is perhaps the most debatable one, but I strongly believe it is one of the most valuable functionalities of a RAC.

28
Progress on the DØRAC Proposal
  • Working group members:
  • I. Bertram, R. Brock, F. Filthaut, L. Lueking, P. Mattig, M. Narain, P. Lebrun, B. Thooris, J. Yu, C. Zeitnitz
  • A proposal document has been worked on
  • Target release within two weeks, sufficiently prior to the Director's review in June
  • Document at http://www-hep.uta.edu/d0race/d0rac-wg/d0rac-spec-050602.pdf

29
DØRAC Implementation Timescale
  • Implement the first RAC by Oct. 1, 2002:
  • Cluster associated IACs
  • Transfer the thumbnail data set constantly from the CAC to the RAC
  • Workshop on RACs in Nov. 2002
  • Implement the next set of RACs by Apr. 1, 2003

30
Conclusions
  • DØRACE has been rather successful
  • DØ must prepare for the large-data-set era:
  • Need to expedite analyses in a timely fashion
  • Need to distribute data sets throughout the collaboration
  • The DØRAC proposal is almost ready for release
  • Establishing regional analysis centers will be the first step toward a DØ Grid → by the end of Run IIa (2-3 years)