Replica Optimisation Within The EU DataGrid

1
Replica Optimisation Within The EU DataGrid
  • David Cameron
  • e-Science Summer School
  • 16-21 September 2002

2
Summary
  • The need for Grid.
  • Grid architecture.
  • Replica management and optimisation through
    economic models.
  • Grid simulation: OptorSim.
  • Some results.
  • Simulation demo.

3
The Large Hadron Collider
4
Complexity: CPU Requirements
  • Complex events
  • Large number of signals
  • Good signals are covered with background
  • Many events
  • 10^9 events/experiment/year
  • 1-25 MB/event raw data
  • several PB per year
  • Need world-wide: 7×10^6 SPECint95 (3×10^8 MIPS)
    and several PB of storage space
  • GRID computing
5
A Physics Event
  • Gated electronics response from a proton-proton
    collision
  • Raw data: hit addresses, digitally
    converted charges and times
  • Marked by a unique code
  • Proton bunch crossing number, RF bucket
  • Event number
  • Collected, Processed, Analyzed, Archived.
  • Variety of data objects become associated
  • Event migrates through analysis chain
  • may be reprocessed
  • selected for various analyses
  • replicated to various locations.

6
Data Structure
[Diagram; labels: Trigger System, Data Acquisition, Run Conditions,
Level 3 trigger, Calibration Data, Raw Data, Trigger Tags,
Reconstruction, Event Summary Data (ESD), Event Tags]
REAL and SIMULATED data required
7
Data Hierarchy
  • RAW (2 MB/event): detector digitisation. Recorded
    by DAQ; triggered events.
  • ESD (100 kB/event): reconstructed information.
    Pseudo-physical information: clusters, track
    candidates (electrons, muons), etc.
  • AOD (10 kB/event): selected information. Physical
    information: transverse momentum, association of
    particles, jets, (best) id of particles, physical
    info for relevant objects.
  • TAG (1 kB/event): analysis information. Relevant
    information for fast event selection.
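To put the per-event sizes in perspective, the sketch below multiplies each one by an assumed 10^9 events per experiment per year, the event rate quoted on the CPU requirements slide. Decimal units are assumed (1 TB = 10^12 bytes), and the names `EVENT_SIZE_BYTES` and `yearly_volume_tb` are illustrative only.

```python
# Per-event sizes from the Data Hierarchy slide, in bytes (decimal units assumed).
EVENT_SIZE_BYTES = {
    "RAW": 2_000_000,  # 2 MB/event
    "ESD": 100_000,    # 100 kB/event
    "AOD": 10_000,     # 10 kB/event
    "TAG": 1_000,      # 1 kB/event
}

# Assumption: the yearly event count quoted on the CPU requirements slide.
EVENTS_PER_YEAR = 10**9


def yearly_volume_tb(tier: str, events: int = EVENTS_PER_YEAR) -> float:
    """Return the yearly data volume of one tier in terabytes (10^12 bytes)."""
    return EVENT_SIZE_BYTES[tier] * events / 10**12


for tier in EVENT_SIZE_BYTES:
    print(f"{tier}: {yearly_volume_tb(tier):,.0f} TB/year")
```

Under these assumptions RAW data alone comes to about 2,000 TB (2 PB) per year, while TAG data for the same events fits in about 1 TB.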
8
Physics Analysis
[Diagram; labels: ESD Data or Monte Carlo, Event Tags, Event Selection,
Calibration Data, Raw Data, Analysis and Skims, Physics Objects,
Physics Analysis. Tiers: Tier 0,1 (collaboration wide), Tier 2
(analysis groups), Tier 3,4 (physicists). Annotation: INCREASING
DATA FLOW]
9
Tier-0 - CERN
Commodity Processors, IBM (mirrored) EIDE Disks,
Storage Systems.
2004 Scale: ~1,000 CPUs, 1 PBytes
10
UK Tier-1 RAL
New Computing Farm: 4 racks holding 156 dual
1.4 GHz Pentium III CPUs. Each box has 1 GB of
memory, a 40 GB internal disk and 100 Mb ethernet.
Tape Robot: upgraded last year, uses 60 GB STK 9940
tapes; 45 TB current capacity, could hold 330 TB.
50 TByte disk-based Mass Storage Unit after RAID 5
overhead. PCs are clustered on network switches
with up to 8x1000Mb ethernet out of each rack.
2004 Scale: 1000 CPUs, 0.5 PBytes
11
UK Tier-2 ScotGRID
  • 59 IBM X Series 330 dual 1 GHz Pentium III with
    2GB memory
  • IBM X Series 370 PIII Xeon with 512 MB memory 32
    x 512 MB RAM
  • 70 x 73.4 GB IBM FC Hot-Swap HDD

2004 Scale: 300 CPUs, 0.1 PBytes
12
Grid architecture
13
Replica management
  • Replica Manager
  • copyFile()
  • copyAndRegisterFile()
  • listReplicas()
  • deleteFile()
  • Replica Catalogue (LFN → PFNs)
  • registerEntry()
  • unregisterEntry()
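
A minimal sketch of how these two components could fit together is given below. It is not the EU DataGrid API: the slide lists only method names, so the Python spellings, argument lists, and the in-memory `storage` set that stands in for real file transfers are all assumptions made for illustration.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Maps a logical file name (LFN) to the set of its physical file names (PFNs)."""

    def __init__(self):
        self._entries = defaultdict(set)

    def register_entry(self, lfn, pfn):
        self._entries[lfn].add(pfn)

    def unregister_entry(self, lfn, pfn):
        self._entries[lfn].discard(pfn)
        if not self._entries[lfn]:
            del self._entries[lfn]  # drop LFNs that have no replicas left

    def list_replicas(self, lfn):
        return sorted(self._entries.get(lfn, ()))


class ReplicaManager:
    """Hypothetical manager: copies files and keeps the catalogue in step."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.storage = set()  # assumption: stand-in for files physically on disk

    def copy_file(self, source_pfn, dest_pfn):
        if source_pfn not in self.storage:
            raise FileNotFoundError(source_pfn)
        self.storage.add(dest_pfn)  # a real system would transfer the bytes here

    def copy_and_register_file(self, lfn, source_pfn, dest_pfn):
        self.copy_file(source_pfn, dest_pfn)
        self.catalogue.register_entry(lfn, dest_pfn)

    def list_replicas(self, lfn):
        return self.catalogue.list_replicas(lfn)

    def delete_file(self, lfn, pfn):
        self.storage.discard(pfn)
        self.catalogue.unregister_entry(lfn, pfn)
```

In this sketch, copying without registering leaves the new copy invisible to the catalogue, which is why a combined copy-and-register call is convenient.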

14
Submitting a Job
[Diagram; labels: User, Scheduler, The Grid (Site 1, Site 2, Site 3)]
15
Replica Optimisation
  • Optimise use of computing, storage and network
    resources.
  • Short term optimisation
  • Minimise running time of current job.
  • Get me the files for my job as quickly as
    possible
  • Long term optimisation
  • Minimise running time of all jobs.
  • Make sure files are in the best
    places for all my future jobs.
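
Short term optimisation can be illustrated with a small sketch: given several sites that hold a replica, pick the one with the shortest estimated transfer time. The cost model (file size divided by available bandwidth, ignoring latency) and the function names are assumptions for illustration, not the optimiser's actual metric.

```python
def transfer_time_s(size_mb: float, bandwidth_mbit_s: float) -> float:
    """Estimated transfer time in seconds (1 MB = 8 Mbit, latency ignored)."""
    return size_mb * 8 / bandwidth_mbit_s


def best_replica(size_mb: float, bandwidth_by_site: dict) -> str:
    """Return the site from which the file is estimated to arrive soonest."""
    return min(
        bandwidth_by_site,
        key=lambda site: transfer_time_s(size_mb, bandwidth_by_site[site]),
    )


# Hypothetical bandwidths (Mbit/s) from three sites to the local site.
links = {"site1": 100.0, "site2": 1000.0, "site3": 10.0}
print(best_replica(1000.0, links))  # prints "site2"
```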

16
Optimisation Through Economic Models
  • Files represent goods.
  • Bought by Computing Elements for jobs.
  • Bought and sold by Storage Elements to make
    profit.
  • Investment decision based on projected future
    value based on previous file access patterns.
  • Storage Elements can buy popular files
    independently of running jobs.
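
The investment decision can be sketched as follows. A simple access count over a sliding window of recent requests stands in for the projected future value; the real prediction function is not given on the slide, so this is a substitute. The class name, the `window` parameter, and the replacement rule are all assumptions.

```python
from collections import Counter, deque


class StorageElement:
    """Hypothetical Storage Element that buys files based on recent popularity."""

    def __init__(self, capacity: int, window: int = 50):
        self.capacity = capacity             # maximum number of files held
        self.files = set()
        self.history = deque(maxlen=window)  # most recent file requests seen

    def record_access(self, name: str) -> None:
        self.history.append(name)

    def projected_value(self, name: str) -> int:
        # Assumption: future value is estimated by the recent access count.
        return Counter(self.history)[name]

    def consider_buying(self, name: str) -> bool:
        """Buy the file if there is room or if it beats the least valuable file."""
        if name in self.files:
            return False
        if len(self.files) < self.capacity:
            self.files.add(name)
            return True
        least = min(sorted(self.files), key=self.projected_value)
        if self.projected_value(name) > self.projected_value(least):
            self.files.remove(least)  # sell the least valuable file
            self.files.add(name)
            return True
        return False
```

Because `consider_buying` depends only on the access history, it can be called at any time, not just when a local job asks for the file.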

17
Replica optimiser architecture
  • Access Mediator (AM) - contacts replica
    optimisers to locate the cheapest copies of files
    and makes them locally available
  • Storage Broker (SB) - manages files stored in
    storage element, trying to maximise profit for
    the finite amount of storage space available
  • P2P Mediator (P2PM) - establishes and maintains
    P2P communication between grid sites

18
Auction Mechanism
  • Use Vickrey auction
  • Every seller makes a bid lower than the asking
    price.
  • File is sold to lowest bidder at second lowest
    price.
  • Ensures
  • Low price for purchaser.
  • Trading fairness.
  • Minimal messaging
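
The selection rule can be sketched in a few lines: in this reverse (procurement) form of the Vickrey auction, the seller with the lowest bid wins and is paid the second lowest bid. The function name and the example Storage Element names and prices are hypothetical.

```python
def reverse_vickrey(bids: dict) -> tuple:
    """Return (winning seller, price paid) for a sealed-bid reverse Vickrey auction.

    bids maps each seller to its asking price; at least two bids are needed
    so that a second lowest price exists.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    ranked = sorted(bids.items(), key=lambda item: item[1])
    winner = ranked[0][0]       # lowest bidder sells the file
    price_paid = ranked[1][1]   # but receives the second lowest price
    return winner, price_paid


# Hypothetical bids from three Storage Elements for one file.
print(reverse_vickrey({"SE1": 12.0, "SE2": 9.5, "SE3": 15.0}))  # ('SE2', 12.0)
```

Since the winner's payment does not depend on its own bid, a seller gains nothing by misstating its price, and one round of sealed bids is enough to settle the sale.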

19
OptorSim: a replica optimiser simulation
  • Need to tune optimisation algorithms.
  • Develop Grid simulation in Java.
  • Input: network configuration, files and jobs.
  • Job: transfer the files defined in the job
    description to the CE running the job.

20
OptorSim: a replica optimiser simulation
  • Schedule to CE using
  • CEcost queueSize
  • accessCost
  • Files requested according to access pattern.
  • Sequential
  • Random
  • Unitary random walk
  • Gaussian random walk
  • Zipf distribution (not yet implemented).
  • No processing involved, only file transfer.
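
The first four access patterns can be sketched as generators of file indices. This is an illustrative Python sketch rather than OptorSim code (which is written in Java). The wrap-around at the ends of the file list, the `sigma` parameter, and the function names are assumptions.

```python
import random


def sequential(n_files, n_requests):
    """Request files in order, starting again from the first after the last."""
    return [i % n_files for i in range(n_requests)]


def uniform_random(n_files, n_requests, rng):
    """Request files independently and uniformly at random."""
    return [rng.randrange(n_files) for _ in range(n_requests)]


def unitary_random_walk(n_files, n_requests, rng):
    """Each request is exactly one file away from the previous one."""
    position = rng.randrange(n_files)
    requests = []
    for _ in range(n_requests):
        requests.append(position)
        position = (position + rng.choice((-1, 1))) % n_files
    return requests


def gaussian_random_walk(n_files, n_requests, rng, sigma=2.0):
    """Each step is drawn from a Gaussian centred on the previous request."""
    position = rng.randrange(n_files)
    requests = []
    for _ in range(n_requests):
        requests.append(position)
        position = (position + round(rng.gauss(0, sigma))) % n_files
    return requests


rng = random.Random(42)  # fixed seed so a run can be repeated
print(sequential(5, 8))
print(unitary_random_walk(5, 8, rng))
```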

21
OptorSim: a replica optimiser simulation
Data Sample            Number of Files   Total Size (GB)
Central J/ψ            120               1200
High pt electrons      20                200
Inclusive electrons    500               5000
Inclusive muons        140               1400
High Et photons        580               5800
Z0 → b bbar            60                600
  • Input site policies and experiment data files
    (simplified CDF jobs).
  • Tested replication strategies
  • No replication
  • Always Replicate, Delete Oldest File
  • Always Replicate, Delete Least Accessed File
  • Economic Model
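
The two "Always Replicate" strategies differ only in which file is deleted when the local storage is full. The sketch below shows that difference, assuming a storage capacity counted in files rather than bytes; the class name, the `policy` strings, and the tie-breaking rule are hypothetical.

```python
class SiteStorage:
    """Hypothetical site store that always replicates a requested file."""

    def __init__(self, capacity: int, policy: str):
        self.capacity = capacity  # assumption: capacity counted in files
        self.policy = policy      # "oldest" or "least_accessed"
        self.files = {}           # name -> [arrival tick, access count]
        self.tick = 0

    def request(self, name: str) -> None:
        self.tick += 1
        if name in self.files:
            self.files[name][1] += 1  # local hit: just count the access
            return
        if len(self.files) >= self.capacity:
            self._evict()
        self.files[name] = [self.tick, 1]  # replicate the file locally

    def _evict(self) -> None:
        # Sort key: arrival tick for "oldest", access count otherwise;
        # ties are broken by arrival tick so the choice is deterministic.
        index = 0 if self.policy == "oldest" else 1
        victim = min(
            self.files,
            key=lambda f: (self.files[f][index], self.files[f][0]),
        )
        del self.files[victim]


for policy in ("oldest", "least_accessed"):
    store = SiteStorage(capacity=2, policy=policy)
    for name in ["a", "b", "a", "a", "c"]:
        store.request(name)
    print(policy, sorted(store.files))
```

With this request sequence, deleting the oldest file discards the popular file "a", while deleting the least accessed file keeps it.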

22
Results
  • Eco model 40% better for sequential but no better
    for others; expected, since eco model is tuned
    for sequential access.

23
Future Work
  • 3rd party replication
  • SAM access patterns
  • Integration: Optor, Reptor, Testbed

24
Conclusions
  • Simulation shows Eco Model successful.
  • Further simulation will help tune algorithms.
  • Integration into testbed code soon.