title.open ( ) revolution execute - PowerPoint PPT Presentation Transcript

1
title.open ( ) revolution execute
  • LHC Computing Challenge
  • Methodology?
  • Hierarchical Information in a Global Grid
    Supernet
  • Aspiration?
  • HIGGS
  • DataGRID-UK
  • Aspiration?
  • ALL Data Intensive Computation

2
Outline
  • Starting Points
  • Physics Motivation
  • The LHC Computing Challenge
  • ATLAS
  • Data Hierarchy
  • GridPP
  • First Grid Developments
  • Monte Carlo Production
  • Grid Tools
  • LHC Computing Grid Startup
  • Summary

3
Starting Point
Something Missing... Mass Generation via the Higgs Boson
Solution: Build a Collider - the Large Hadron Collider at CERN
Problem: Large Datasets - Petabytes per year (a one-mile-high stack of CDs)
4
The Higgs Mechanism
Theory: vacuum potential
Experiment: direct searches and electroweak fits
(Figure: vacuum potential plot; axis label: Energy)
The LHC aims to: 1. discover a Higgs particle; 2. measure its properties, e.g. mass, spin,
lifetime, branching ratios.
5
LHC: pp collisions, Ecms = 14 TeV
6
ATLAS detector
ATLAS: a large international collaboration to find the Higgs (and much more) in the range
0.1 TeV < mH < 1 TeV
The ATLAS experiment is 26 m long, stands 20 m high, weighs 7000 tons and has 200 million
read-out channels
7
ATLAS Parameters
  • Running conditions at startup:
  • Raw event size: 2 MB (recently revised upwards...)
  • 2.7x10^9 event sample -> 5.4 PB/year, before data processing (a back-of-envelope
    check is sketched after this list)
  • Reconstructed events and Monte Carlo data -> 9 PB/year (2 PB disk)
  • CPU: 2M SpecInt95
  • CERN alone can handle only a fraction of these resources
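The storage figure above follows directly from the event size and sample size quoted on this
slide; a minimal arithmetic sketch, assuming Python and decimal units:

# Back-of-envelope check of the raw-data volume quoted above.
RAW_EVENT_SIZE_MB = 2.0      # raw event size
EVENTS_PER_YEAR = 2.7e9      # 2.7x10^9 event sample

raw_pb_per_year = RAW_EVENT_SIZE_MB * EVENTS_PER_YEAR / 1e9   # MB -> PB (decimal)
print(f"Raw data before processing: {raw_pb_per_year:.1f} PB/year")   # ~5.4 PB/year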

8
LHC Computing Challenge
1 TIPS 25,000 SpecInt95 PC (1999) 15
SpecInt95
PBytes/sec
Online System
100 MBytes/sec
Offline Farm20 TIPS
  • One bunch crossing per 25 ns
  • 100 triggers per second
  • Each event is 1 Mbyte

100 MBytes/sec
Tier 0
CERN Computer Centre gt20 TIPS
Gbits/sec
or Air Freight
HPSS
Tier 1
RAL Regional Centre
US Regional Centre
French Regional Centre
Italian Regional Centre
HPSS
HPSS
HPSS
HPSS
Tier 2
Tier2 Centre 1 TIPS
Tier2 Centre 1 TIPS
Tier2 Centre 1 TIPS
Gbits/sec
Tier 3
Physicists work on analysis channels Each
institute has 10 physicists working on one or
more channels Data for these channels should be
cached by the institute server
Institute 0.25TIPS
Institute
Institute
Institute
Physics data cache
100 - 1000 Mbits/sec
Tier 4
Workstations
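A minimal sketch that encodes the tier model above as plain data, assuming Python; the
capacities and link speeds are the figures quoted in the diagram, while the dictionary layout
itself is only illustrative:

# The tier model from the diagram, as plain data (illustrative layout).
TIPS_IN_SPECINT95 = 25_000        # 1 TIPS = 25,000 SpecInt95

tier_model = {
    "Tier 0": {"site": "CERN Computer Centre", "capacity_tips": 20,
               "feed": "~100 MBytes/sec from the offline farm"},
    "Tier 1": {"sites": ["RAL", "US", "French", "Italian"],
               "storage": "HPSS", "link": "Gbits/sec (or air freight)"},
    "Tier 2": {"sites": "Tier-2 centres", "capacity_tips": 1, "link": "Gbits/sec"},
    "Tier 3": {"sites": "institute servers", "capacity_tips": 0.25,
               "link": "100-1000 Mbits/sec", "role": "physics data cache"},
    "Tier 4": {"sites": "workstations"},
}

# Trigger throughput into the offline farm: ~100 events/s x ~1 MByte/event.
print(f"Online -> offline: ~{100 * 1} MBytes/sec")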
9
A Physics Event
  • Gated electronics response from a proton-proton collision
  • Raw data: hit addresses, digitally converted charges and times
  • Marked by a unique code:
  • proton bunch crossing number, RF bucket
  • event number
  • Collected, processed, analyzed, archived
  • A variety of data objects become associated with the event (one minimal way to model
    this is sketched after this list)
  • The event migrates through the analysis chain:
  • it may be reprocessed
  • selected for various analyses
  • replicated to various locations
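A minimal sketch of the idea above, assuming Python; the class and field names are
placeholders for illustration, not an ATLAS data model:

# Illustrative only: an event tagged by its unique code (bunch crossing, RF
# bucket, event number) picks up associated data objects as it moves through
# the analysis chain.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EventID:
    bunch_crossing: int      # proton bunch crossing number
    rf_bucket: int           # RF bucket
    event_number: int        # event number

@dataclass
class Event:
    event_id: EventID
    raw: bytes = b""                               # digitised detector response
    derived: dict = field(default_factory=dict)    # ESD, AOD, tags, ... added later

evt = Event(EventID(bunch_crossing=1234, rf_bucket=7, event_number=42))
evt.derived["ESD"] = "reconstructed event summary"  # associated during (re)processing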

10
Data Structure
(Data-flow diagram: the Trigger System and Data Acquisition run under recorded Run Conditions;
the Level 3 trigger writes Raw Data and Trigger Tags; Reconstruction, using Calibration Data,
produces Event Summary Data (ESD) and Event Tags.)
Both REAL and SIMULATED data are required.
11
Data Hierarchy
Data tiers: RAW, ESD, AOD, TAG
RAW: recorded by the DAQ (triggered events), detector digitisation, ~2 MB/event
12
Physics Analysis
(Analysis-flow diagram, annotated "INCREASING DATA FLOW": Raw Data and ESD data or Monte
Carlo, with Event Tags, sit at Tier 0/1 (collaboration-wide); Event Selection, using
Calibration Data, and Analysis/Skims at Tier 2 (analysis groups) produce Physics Objects;
Physics Analysis is done by physicists at Tier 3/4.)
13
Making the Grid Work for the Experiments
14
(No Transcript)
15
(No Transcript)
16
GridPP Context
(Context diagram)
  • Provide architecture and middleware
  • Future LHC experiments: use the Grid with simulated data
  • Running US experiments: use the Grid with real data
  • Build Tier-A/prototype Tier-1 and Tier-2 centres in the UK and join the worldwide
    effort to develop middleware for the experiments
17
SR2000 e-Science Allocation
(Organisation/funding diagram: DG Research Councils; E-Science Steering Committee; Director
(Tony Hey); Grid TAG; Director's management role; Director's awareness and co-ordination
role; Neil Geddes.)
  • Generic Challenges: EPSRC (£15m), DTI (£15m)
  • Academic Application Support Programme: Research Councils (£74m), DTI (£5m) -
    PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
  • £80m collaborative projects; Industrial Collaboration (£40m)
18
Grid Architecture
For more info: www.globus.org/research/papers/anatomy.pdf
19
GridPP
£17m 3-year project funded by PPARC
  • LCG
  • (start-up phase)
  • funding for staff and hardware...
  • EDG - UK Contributions
  • Architecture
  • Testbed-1
  • Network Monitoring
  • Certificates / Security
  • Storage Element
  • R-GMA
  • LCFG
  • FTree
  • MDS deployment
  • GridSite
  • SlashGrid
  • Spitfire

http://www.gridpp.ac.uk
  • Applications (start-up phase)
  • BaBar
  • CDF/D0 (SAM)
  • ATLAS/LHCb
  • CMS
  • (ALICE)
  • UKQCD

20
UK Tier1/A Status
Current setup: 14 dual 1 GHz PIII with 500 MB RAM and 40 GB disks, providing a Compute
Element (CE), Storage Element (SE), User Interface (UI) and Information Node (IN).
Central Facilities (non-Grid): 250 CPUs, 10 TB disk, 35 TB tape (capacity 330 TB).
Hardware purchase for delivery in March 2002: 156 dual 1.4 GHz with 1 GB RAM and 30 GB
disks; 26 disk servers (dual 1.266 GHz), each with 1.9 TB of disk; expand the capacity of
the tape robot by about 35 TB.
21
UK Tier-2 Example Site - ScotGRID
  • ScotGrid Processing nodes at Glasgow
  • 59 IBM X Series 330 dual 1 GHz Pentium III with
    2GB memory
  • 2 IBM X Series 340 dual 1 GHz Pentium III with
    2GB memory and dual ethernet
  • 3 IBM X Series 340 dual 1 GHz Pentium III with
    2GB memory and 100/1000 Mbit/s ethernet
  • 1TB disk
  • LTO/Ultrium Tape Library
  • Cisco ethernet switches
  • ScotGrid Storage at Edinburgh
  • IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
  • 70 x 73.4 GB IBM FC Hot-Swap HDD
  • Griddev testrig at Glasgow
  • 4 x 233 MHz Pentium II
  • BaBar UltraGrid System at Edinburgh
  • 4 UltraSparc 80 machines in a rack, 450 MHz CPUs
    in each, 4 MB cache, 1 GB memory
  • Fast Ethernet and MirrorNet switching
  • CDF equipment at Glasgow
  • 8 x 700 MHz Xeon IBM xSeries 370, 4 GB memory, 1 TB disk

One of (currently) 10 GridPP sites running in the
UK
22
(No Transcript)
23
Network
  • Tier 1 internal networking will be a hybrid of:
  • 100 Mbps to the nodes of CPU farms, with 1 Gbps uplinks from the switches
  • 1 Gbps to disk servers
  • 1 Gbps to tape servers
  • UK academic network: SuperJANET4
  • 2.5 Gbps backbone, upgrading to 20 Gbps in 2003
  • RAL has 622 Mbps into SJ4
  • SJ4 has a 2.5 Gbps interconnect to Geant
  • new 2.5 Gbps link to ESnet and Abilene just for research users
  • UK involved in networking development:
  • internal, with Cisco on QoS
  • external, with DataTAG
  • (A rough transfer-time estimate for these link speeds is sketched after this list.)
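For a feel of what these link speeds mean in practice, a minimal sketch assuming Python and
idealised conditions (the full nominal bandwidth available, decimal units):

# Rough transfer-time estimate for the link speeds quoted above (idealised).
def hours_to_move(terabytes, link_mbit_per_s):
    bits = terabytes * 8e12                      # 1 TB = 8x10^12 bits (decimal)
    return bits / (link_mbit_per_s * 1e6) / 3600

print(f"1 TB over RAL's 622 Mbps into SJ4:   {hours_to_move(1, 622):.1f} h")
print(f"1 TB over the 2.5 Gbps SJ4 backbone: {hours_to_move(1, 2500):.1f} h")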

24
Distributed MC Production, Today
Submit jobs remotely via Web
Transfer data to CASTOR mass-store at CERN
Update bookkeeping database (Oracle at CERN)
Execute on farm
Data Quality Check on data stored at CERN
Monitor performance of farm via Web
25
Validation of Middleware via Distributed MC
Production, Tomorrow
Work packages exercised: WP1 job submission tools; WP2 data replication and metadata tools;
WP3 monitoring tools; WP4 environment; WP5 API for mass storage
Submit jobs remotely via Web (WP1 job submission tools)
Transfer data to CASTOR (and HPSS, RAL Datastore) (WP2 data replication, WP5 mass-storage API)
Execute on farm (WP4 environment)
Update bookkeeping database (WP2 metadata tools, WP1 tools)
Online histogram production using GRID pipes (WP3 monitoring tools)
Data Quality Check online
Monitor performance of farm via Web
(The steps and their work-package mapping are listed as plain data in the sketch below.)
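A minimal sketch of the chain above as plain data, assuming Python; the pairing of steps to
work packages follows the annotations on this slide and is illustrative rather than
authoritative:

# The "tomorrow" production chain with its DataGrid work-package annotations.
steps = [
    ("Submit jobs remotely via Web",                    "WP1 job submission tools"),
    ("Transfer data to CASTOR / HPSS / RAL Datastore",  "WP2 data replication, WP5 mass-storage API"),
    ("Execute on farm",                                 "WP4 environment"),
    ("Update bookkeeping database",                     "WP2 metadata tools, WP1 tools"),
    ("Online histogram production using GRID pipes",    "WP3 monitoring tools"),
    ("Data Quality Check online; monitor farm via Web", "WP3 monitoring tools"),
]
for step, wp in steps:
    print(f"{step:50s} <- {wp}")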
26
GANGA: Gaudi ANd Grid Alliance
(Diagram: the GANGA GUI sits between the GAUDI program (JobOptions, Algorithms) and the
collective and resource Grid services, returning histograms, monitoring and results.)
Making the Grid Work for the Experiments
27
CMS Data in 2001
  • Objectivity data: TOTAL ~29 TB
  • Typical event sizes:
  • simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB
  • reconstructed: 1 10^33 event = 1.2 MB; 1 2x10^33 event = 1.6 MB; 1 10^34 event = 5.6 MB
  • Data by site: CERN 14 TB; FNAL 12 TB; Caltech 0.60 TB; Moscow 0.45 TB; INFN 0.40 TB;
    Bristol/RAL 0.22 TB; UCSD 0.20 TB; IN2P3 0.10 TB; Wisconsin 0.05 TB; Helsinki -;
    UFL 0.08 TB

28
A CMS Data Grid Job
2003 CMS data grid system vision
29
ALICE Data Challenge
  • COTS for mass storage?
  • An order-of-magnitude increase in disk access speed is required
    (over 5 years: from 100 MB/s to >1 GB/s)
30
D0
31
CDF
32
Overview of SAM
SAM
(Diagram - shared globally: Database Server(s) with the Central Database, Name Server,
Global Resource Manager(s), Log Server; shared locally: Station 1..n Servers and Mass
Storage System(s). Arrows indicate control and data flow.)
33
Overview of SAM
SAM
SAM and DataGrid should use common (lower)
middleware
34
(No Transcript)
35
Experiment Deployment
36
LHC computing at a glance
  • The investment in LHC computing will be massive
  • the LHC Computing Review estimated 240 MCHF (before the LHC delay)
  • plus 80 MCHF/year afterwards
  • These facilities will be distributed
  • for political as well as sociological and practical reasons

Europe: 267 institutes, 4603 users. Elsewhere: 208 institutes, 1632 users.
37
Access Grid
  • A collection of resources that support group-to-group interaction across the grid
  • Supports large-scale distributed meetings and collaborative work sessions
  • VRVS (VIC/RAT tools) and H.323 are commonly used in GridPP

38
(No Transcript)
39
How Large is Large?
  • Is the LHC Grid
  • just the O(10) Tier 0/1 sites and O(20,000) CPUs?
  • plus the O(50) Tier 2 sites and O(40,000) CPUs?
  • the collective computing power of O(300) LHC institutions, perhaps O(60,000) CPUs in total?
  • Are the LHC Grid users
  • the experiments and their relatively few, well-structured production computing activities?
  • the curiosity-driven work of 1000s of physicists?
  • Depending on our answer, the LHC Grid is either
  • a relatively simple deployment of today's technology, or
  • a significant information technology challenge

40
Service Graph
Allowed? -> Hierarchical Model
All nodes Grid-aware?
Optimisation? Directory: hierarchical? relational? heterogeneous?
41
Resource Discovery/Monitoring
(Diagram: many resources R spread across the network, registered with registries and queried
by dispersed users; the resources are grouped into virtual organisations VO-A and VO-B.)
  • Large numbers of distributed sensors with different properties and varying status
  • Need different views of this information, depending on community membership, security
    constraints, intended purpose, sensor types, etc.

42
R-GMA
43
R-GMA Schema
CPULoad (Global View)

Timestamp        Load   Facility   Site   Country
19055711022002   0.3    CDF        RAL    UK
19055611022002   1.6    ATLAS      RAL    UK
19055811022002   0.4    CDF        GLA    UK
19055611022002   0.5    LHCb       GLA    UK
19055611022002   0.9    ALICE      CERN   CH
19055511022002   0.6    CMS        CERN   CH

CPULoad (Producer3)

19055611022002   1.6    ATLAS      CERN   CH
19055511022002   0.6    CMS        CERN   CH
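R-GMA presents monitoring information through a relational model, so the global view above
can be queried like an ordinary table. A minimal sketch, assuming Python, with an in-memory
SQLite table standing in for the R-GMA consumer interface:

# Illustrative only: mimic a consumer query over the CPULoad global view.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CPULoad (Timestamp TEXT, Load REAL, "
            "Facility TEXT, Site TEXT, Country TEXT)")
rows = [
    ("19055711022002", 0.3, "CDF",   "RAL",  "UK"),
    ("19055611022002", 1.6, "ATLAS", "RAL",  "UK"),
    ("19055811022002", 0.4, "CDF",   "GLA",  "UK"),
    ("19055611022002", 0.5, "LHCb",  "GLA",  "UK"),
    ("19055611022002", 0.9, "ALICE", "CERN", "CH"),
    ("19055511022002", 0.6, "CMS",   "CERN", "CH"),
]
con.executemany("INSERT INTO CPULoad VALUES (?,?,?,?,?)", rows)

# e.g. "what is the CPU load at the UK sites?"
for row in con.execute("SELECT Site, Facility, Load FROM CPULoad "
                       "WHERE Country = 'UK' ORDER BY Site, Facility"):
    print(row)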
44
Overview
Grid Application Layer: Application Management, Database Management, Algorithm Registry,
Job Management, Data Registering, Job Decomposition, Job Prediction, Data Reclustering
Collective Services: Information Monitoring, Replica Manager, Grid Scheduler, Service Index,
Network Monitoring, Time Estimation, Replica Catalog, Grid Information, Load Balancing,
Replica Optimisation
Underlying Grid Services: Remote Job Execution Services (GRAM), Security Services
(Authentication and Access Control), Messaging Services (MPI), File Transfer Services
(GridFTP), SQL Database Service (metadata storage)
45
Overview
46
Document submitted by Wolfgang Hoschek and Gavin McCance
47
Meta Data Service
  • Provides generic RDB access
  • User access via HTTPS
  • Decodes XML with WSDL/SOAP
  • A security servlet maps roles
  • Command translator: from generic to backend-specific commands
  • Backend types: Oracle, PostgreSQL, MySQL
  • (A hypothetical request sketch follows after this list.)
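A minimal sketch of what "generic RDB access over HTTPS with XML" could look like from the
client side, assuming Python; the endpoint URL, XML body and certificate file names are
invented placeholders, not the real service interface:

# Hypothetical client sketch: post an XML command to a metadata service over
# HTTPS using a Grid certificate. Everything named here is a placeholder.
import ssl
import urllib.request

XML_COMMAND = """<?xml version="1.0"?>
<command>
  <select table="Files" where="run='12345'"/>
</command>"""

ctx = ssl.create_default_context()
ctx.load_cert_chain("usercert.pem", "userkey.pem")     # placeholder credentials

req = urllib.request.Request(
    "https://metadata.example.org/service",            # placeholder endpoint
    data=XML_COMMAND.encode(),
    headers={"Content-Type": "text/xml"},
)
with urllib.request.urlopen(req, context=ctx) as resp:
    print(resp.read().decode())                        # XML result set from the backend RDB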

48
Query Optimisation
  • Local minimisation of execution time by replica selection
  • Two-phase minimisation:
  • the RB selects a CE based on a speculative cost
  • the job contacts the RM (RO) and pins the file
  • The P2P starting point is inefficient
  • Optimisation -> economic model: a reverse (Vickrey) auction (sketched after this list)
  • Status: simulation currently under development
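A minimal sketch of the reverse (Vickrey) auction idea, assuming Python; the site names and
bid values are made up for illustration. The cheapest bidder wins but is paid the
second-cheapest price, which is what makes truthful bidding the best strategy:

# Reverse (second-price) auction for supplying a file replica (illustrative).
def reverse_vickrey(bids):
    """bids: {site: cost of delivering the replica}. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1])
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, price

bids = {"CERN": 4.0, "RAL": 2.5, "CNAF": 3.1}   # hypothetical delivery costs
print(reverse_vickrey(bids))                     # ('RAL', 3.1)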

49
(Diagram: Resource Broker, Computing Element, Optor.)
50
DataGrid Demonstrator
  • Sites involved: CERN, CNAF, LYON, NIKHEF, RAL
  • From the user interface (in X), dg-job-submit demo.jdl sends the job to the Workload
    Management System at CERN
  • The WMS selects a site according to the resource attributes given in the Job Definition
    Language (jdl) file and to the resources published via the Information System
    (currently MDS) - the matchmaking step is sketched after this list
  • The job is sent to one of the sites and a data file is written; the file is copied to
    the nearest Mass Storage and replicated on all other sites
  • dg-job-get-output is used to retrieve the files
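A minimal sketch of the matchmaking step described above, assuming Python; the published
attribute values are invented, and real matchmaking evaluates ClassAd-style expressions
rather than Python lambdas:

# Illustrative matchmaking: pick CEs whose published attributes satisfy the
# job's Requirements (demo.jdl asks for other.OpSys == "RH 6.2").
published_ces = {
    "CERN":   {"OpSys": "RH 6.2", "FreeCPUs": 12},
    "CNAF":   {"OpSys": "RH 6.2", "FreeCPUs": 3},
    "NIKHEF": {"OpSys": "RH 7.1", "FreeCPUs": 20},
}

def match(requirements, ces):
    """Return the CEs whose published attributes satisfy the requirements."""
    return [name for name, attrs in ces.items() if requirements(attrs)]

candidates = match(lambda ce: ce["OpSys"] == "RH 6.2", published_ces)
print(candidates)   # ['CERN', 'CNAF']; the WMS then ranks these and picks one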

51
First steps...
(Diagram: a user (client) submits a job from a User Interface node via the Workload Manager
and Information System to the resource-providing SITE-X, where a Computing Element
gatekeeper (Jobmanager-PBS/LSF/BQS) publishes CPU resources and dispatches to Worker Nodes,
and a Storage Element gatekeeper publishes storage resources; a File Catalog server tracks
the files.)
52
demo.jdl:
Executable    = "demo.csh";
Arguments     = "none";
StdInput      = "none";
StdOutput     = "demo.out";
StdError      = "demo.err";
InputSandbox  = {"demo.csh"};
OutputSandbox = {"demo.out", "demo.err", "demo.log"};
Requirements  = other.OpSys == "RH 6.2";
(Diagram: dg-job-submit demo.jdl sends the input sandbox via the Workload Manager /
Information System to a computing element at one of CERN, CNAF, LYON, NIKHEF or RAL; the
data written by the job go to storage and are registered in the File Catalog server, and
dg-job-get-output job-id retrieves the output sandbox to the user interface node.)
53
GRID issues Coordination
  • The technical part is not the only problem
  • Sociological problems: resource sharing
  • short-term productivity loss, but long-term gain
  • Key: communication/coordination between people/centres/countries
  • This kind of worldwide close coordination across multi-national collaborations has
    never been attempted before
  • We need mechanisms to make sure that all centres are part of a global planning
  • in spite of different conditions of funding, internal planning, timescales etc.
  • The Grid organisation mechanisms should be complementary to, and not parallel to or
    in conflict with, the existing experiment organisation
  • LCG - DataGRID - eSC - GridPP
  • BaBar - CDF - D0 - ALICE - ATLAS - CMS - LHCb - UKQCD
  • Local perspective: build upon existing strong PP links in the UK to build a single
    Grid for all experiments

54
Latest Info
55
Summary
Motivation: Experimental Particle Physics
  • Unique funding environment
  • Particle Physics needs the Grid; mutual interest (leads to teamwork)
  • Emphasis on software development
  • CERN lead; unique local identity
  • Extension of Open Source ideas to a Grid culture, spanning academia and industry
  • Multidisciplinary approach on a university/regional basis
  • Use of existing structures
  • Large distributed databases are a common problem/challenge
  • Now: the LHC
  • Early days: an opportunity to be involved in the first Grid prototypes

GRIDPP