1
STAR grid activities and São Paulo experience
Alexandre A. P. Suaide

2
  • BNL (2 sites)
  • 1100 CPU
  • 400 TB
  • LSF batch

Upgrade project: 50 CPU and 40 TB
3
The size of the raw data
  • STAR AuAu event statistics (raw)
  • 2-3 MB/event
  • 20-40 events/s
  • Total 2004 AuAu
  • 20-30 M events
  • 65 TB
  • CuCu run
  • 70 M events @ 200 GeV
  • 40 M events @ 62 GeV
  • 4 M events @ 22 GeV
  • Plus all the pp, dAu and previous runs
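
A rough consistency check of these numbers: 20-30 M events at 2-3 MB/event works out to about 25 × 10^6 events × 2.6 MB/event ≈ 65 TB, matching the quoted total.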

4
The reconstruction, simulation, etc.
  • Reconstruction
  • Basically done in BNL
  • AuAu is estimated to take 18 months (only 60% is complete)
  • Compare with 1 new run every year
  • A physics-ready production needs ~2 production rounds (calibrations, improvements, etc.)
  • Simulation and embedding
  • Done at PDSF
  • Simulation is transferred to BNL
  • STAR takes more data than it can currently make available for analysis
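
A rough implication of these estimates: if the 18-month figure applies per production round and a physics-ready dataset needs ~2 rounds, each year of AuAu data costs roughly 3 years of reconstruction at the current pace, so the backlog grows with every run.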

5
Analysis
  • Real data analysis is done at RCF
  • Simulation and embedding analysis is done at PDSF
  • Small fractions of datasets are scattered over
    many institutions mainly for analysis development

@ PDSF
6
Why do we need grid?
  • If STAR wants to keep production and analysis running at a speed compatible with data taking, other institutions need to share computing power
  • Next run STAR will take at least one order of
    magnitude more events than last year
  • The RCF/PDSF farm does not grow at the same rate
  • The user's point of view
  • More time available for physics
  • Data will be available earlier
  • More computing power for analysis
  • Analysis will run faster
  • Submit the jobs from your home institution and get the output there
  • No need to know where the data is
  • No need to log on to RCF or PDSF
  • You manage your disk space

7
STAR grid
  • Three level structure
  • Tier0 sites (BNL)
  • Dedicated to reconstruction, simulation and
    analysis
  • Tier1 sites (PDSF)
  • Runs reconstruction on demand
  • Receives all the reconstructed files for analysis
  • Simulations and embedding
  • Tier2 sites (all other facilities, including São
    Paulo)
  • Receives a fraction of files for analysis
  • Occasionally runs reconstruction, depending on demand (a sketch of this policy follows)
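
A minimal sketch of this distribution policy, with hypothetical names (not actual STAR software): Tier1 sites subscribe to the full reconstructed dataset, Tier2 sites to a configurable fraction.

// Hypothetical sketch of the tier distribution policy; not actual STAR code.
#include <functional>
#include <string>
#include <vector>

struct Site {
    std::string name;
    double share;  // fraction of the dataset the site receives (Tier1: 1.0, Tier2: e.g. 0.1)
};

// Pick replication targets for a newly registered file. Hashing the file
// name spreads the Tier2 fraction evenly over the dataset; the real policy
// is whatever the catalog/replication service implements.
std::vector<std::string> replicationTargets(const std::string& file,
                                            const std::vector<Site>& sites) {
    const double u = (std::hash<std::string>{}(file) % 1000) / 1000.0;  // in [0, 1)
    std::vector<std::string> targets;
    for (const Site& s : sites)
        if (u < s.share) targets.push_back(s.name);
    return targets;
}

// Usage (site names and the 10% share are illustrative):
// replicationTargets("some_file.MuDst.root", {{"PDSF", 1.0}, {"SaoPaulo", 0.1}});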

8
Needs
  • Reconstruction and file distribution
  • Tier0 production
  • ALL EVENT files get copied to HPSS at the end of a job
  • The strategy implies IMMEDIATE dataset replication
  • As soon as a file is registered, it becomes
    available for distribution
  • 2 levels of data distribution: local and global
  • Local
  • All analysis files are on disks
  • Notion of distributed disk: a cost-effective solution
  • Global
  • Tier1 (all) and tier2 (partial) sites
  • Cataloging is fundamental
  • Must know where the files are
  • The only central connection between users and
    files
  • Central and local catalogs
  • Database should be updated right after file
    transfer
  • Customized scheduler
  • Finds out where the data is upon user request
  • Redirects jobs to the cluster where the data is stored (see the sketch below)
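
A minimal sketch of the catalog-driven routing described above, with hypothetical structures (the real STAR catalog and scheduler are more elaborate): the catalog maps each file to the sites holding a replica, and requested files are grouped by site so each job is redirected to a cluster that already stores its input.

// Hypothetical sketch of data-aware job routing; not actual STAR code.
#include <map>
#include <string>
#include <vector>

// Central catalog: file name -> sites holding a replica.
using Catalog = std::map<std::string, std::vector<std::string>>;

// Group requested files by a site that already holds them, so each job
// is redirected to the cluster where its input data is stored.
std::map<std::string, std::vector<std::string>>
routeJobs(const std::vector<std::string>& files, const Catalog& catalog) {
    std::map<std::string, std::vector<std::string>> jobsBySite;
    for (const std::string& f : files) {
        auto it = catalog.find(f);
        if (it == catalog.end() || it->second.empty())
            continue;  // file not registered anywhere: nothing to schedule
        jobsBySite[it->second.front()].push_back(f);  // naive: first replica wins
    }
    return jobsBySite;
}

A production scheduler would additionally balance load across replicas and decide how many jobs to create per site, as described on the user-analysis slide below.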

9
What is STAR doing on grid?
  • For STAR, grid computing is used EVERY DAY in production
  • Data transfer using SRM, RRS, ...
  • We run simulation production on the Grid (easy)
  • Resources are reserved for DATA production (still done traditionally)
  • No real technical difficulties
  • Mostly fears related to uncoordinated access and massive transfers
  • User analysis
  • Chaotic in nature; requires accounting, quotas, privileges, etc.
  • Increased interest from some institutions
  • Already successful under controlled conditions

10
STAR jobs in the grid
11
Accomplishments in the last few months
  • Full database mirrors over many institutions
  • Hold detector conditions, calibrations, status,
    etc
  • Highly used during user analysis
  • File catalog and scheduler available outside BNL
  • Users can query files and submit jobs using the grid
  • Still some pitfalls for general user analysis
  • Integration between sites
  • Tools to keep grid certificates, batch systems
    and local catalogs updated
  • Library distribution is done automatically using AFS or a local copy (updated on a daily basis)
  • Full integration of the 3 sites (BNL, PDSF and
    SP) with OSG

12
User analysis in the grid
  • STAR analysis schema
  • 99% based on ROOT applications
  • User develops personal analysis code that processes the data (a minimal sketch follows this list)
  • Steps to properly submit analysis jobs in the
    grid
  • Select the proper cluster in the grid
  • Transfer the analysis code to that cluster and compile it
  • Use the file catalog to select the files
  • Run the jobs (as many as necessary)
  • The node a job runs on and the number of jobs are defined by the scheduler and depend on the cluster size, the number of events, and the time to process each event; all this information is managed by the file catalog
  • Transfer the output to the local site
  • Many of these steps are not yet fully functional, but they are progressing fast
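
Since the analysis schema is ~99% ROOT-based, the personal analysis code above is typically a ROOT macro or class compiled at the selected cluster. A minimal generic sketch, in which the tree, branch, and file names are placeholders rather than a real STAR dataset:

// myAnalysis.C -- minimal ROOT macro sketch; tree, branch and file names are placeholders.
#include "TChain.h"
#include "TFile.h"
#include "TH1F.h"

void myAnalysis(const char* input = "data_*.root") {
    TChain chain("events");            // placeholder tree name
    chain.Add(input);                  // in practice the scheduler supplies the file list

    float pt = 0;
    chain.SetBranchAddress("pt", &pt); // placeholder branch

    TH1F h("hPt", "track p_{T};p_{T} (GeV/c);counts", 100, 0., 5.);
    const Long64_t n = chain.GetEntries();
    for (Long64_t i = 0; i < n; ++i) { // event loop
        chain.GetEntry(i);
        h.Fill(pt);
    }

    // The output file is what gets transferred back to the user's local site.
    TFile out("myAnalysis_out.root", "RECREATE");
    h.Write();
    out.Close();
}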

13
Current status and to do list
  • The GRID between PDSF and RCF works quite well
  • Mainly used for simulation jobs
  • São Paulo, BNL and LBL are fully integrated
  • Libraries, file catalog, scheduler, OSG, etc.
  • Being used to test user analysis under the grid
  • Activities for the next few months
  • Integrate the SGE batch system into the grid framework
  • Still some problems reporting the right numbers to GridCat
  • Problems keeping jobs alive after a few hours
  • Developments of authentication tools
  • RCF (BNL) and PDSF (LBL) are part of DOE labs
  • User analysis