Transcript and Presenter's Notes

Title: Tony Doyle University of Glasgow


1
Particle Physics and Grid Development
  • Joint Edinburgh/Glasgow SHEFC JREI-funded project
    to develop a prototype Tier-2 centre for LHC
    computing.
  • UK-wide project to develop a prototype Grid for
    Particle Physics Applications.
  • EU-wide project to develop middleware for Particle Physics, Bioinformatics and Earth Observation applications.
  • Emphasis on local developments.

2
Outline
  • Introduction
  • Grid Computing Context
  • Challenges
  • ScotGrid
  • Starting Point
  • Timelines
  • UK Context: Tier-1 and -2 Centre Resources
  • How Does the Grid Work?
  • Middleware Development
  • Grid Data Management
  • Testbed
  • ScotGrid Tour
  • Hardware
  • Software
  • Web-Based Monitoring
  • Summary(s)

3
Grid Computing Context
  • LHC computing investment will be massive
  • LHC Review estimated 240 MCHF
  • 80 MCHF/year afterwards

Europe: 267 institutes, 4603 users. Elsewhere: 208 institutes, 1632 users.
Hardware, Middleware, Applications
Total investment: £1.5m
4
Rare Phenomena, Huge Background
The Higgs signal lies some 9 orders of magnitude below the rate of all interactions.
"When you are face to face with a difficulty you are up against a discovery." - Lord Kelvin
5
Challenges: Event Selection
All interactions vs. the Higgs: 9 orders of magnitude.
6
Challenges: Complexity
  • Many events
  • 10^9 events/experiment/year
  • >1 MB/event raw data
  • several passes required
  • Worldwide Grid computing requirement (2007)
  • 300 TeraIPS
  • (100,000 of today's fastest processors connected via a Grid)

[Trigger/DAQ data-flow figure: detectors with 16 million channels at a 40 MHz collision rate feed 3 Gigacell buffers (charge, time, pattern); the Level-1 trigger reduces this to 100 kHz (energy, tracks); 1 MByte event data flow at 1 Terabit/s into 200 GByte buffers (50,000 data channels) and 500 readout memories; the event builder moves data over 500 Gigabit/s networks to a 20 TeraIPS event filter; Gigabit/s links feed a PetaByte archive over the service LAN, served by a 300 TeraIPS Grid computing service.]
  • Understand/interpret data via numerically intensive simulations
  • e.g. ATLAS Monte Carlo (gg -> H -> bb): 182 sec/3.5 MB event on a 1 GHz Linux box (current ScotGrid nodes)
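A rough cross-check, not on the original slide but derived from the numbers above: if a sample of order 10^9 events had to be simulated at 182 s/event on a 1 GHz node, that would amount to roughly 1.8 x 10^11 CPU-seconds, i.e. several thousand CPU-years per pass, which is why a distributed facility of order 100,000 processors (~300 TeraIPS) is needed rather than a single farm.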

7
LHC Computing Challenge
[LHC computing model figure:]
One bunch crossing every 25 ns; 100 triggers per second; each event is 1 MByte.
Detector data (PBytes/sec) flow to the Event Builder (500 Gigabit/s) and on to the Event Filter (20 TIPS), which sends 100 Gigabit/s to Tier 0.
Tier 0: CERN Computer Centre, >20 TIPS, with HPSS mass storage.
Tier 1 (Gigabit/s links): RAL, US, French and Italian Regional Centres, each with HPSS.
Tier 2 (Gigabit/s links): Tier-2 centres of about 1 TIPS each.
Tier 3 (100 - 1000 Mbits/sec): institute servers of about 0.25 TIPS with physics data caches. Physicists work on analysis channels; each institute has about 10 physicists working on one or more channels, and data for these channels should be cached by the institute server.
Tier 4: workstations.
8
Starting Point (Dec 2000)
[Timeline figure: 2001 - 2003]
9
Timelines
  • IBM equipment arrived at Edinburgh and Glasgow for Phase 1.
  • Phase 0: equipment is tested and set up in a basic configuration, networking the two sites.
  • Phase 1: prototyping of the integrated local computing fabric, with emphasis on scaling, reliability and resilience to errors.
  • IBM equipment arrives at Edinburgh and Glasgow for Phase 2.
  • LHC Global Grid TDR.
  • 50% prototype (LCG-3) available.
  • ScotGRID: 300 CPUs, 50 TBytes.
  • LCG-1 reliability and performance targets.
  • First Global Grid Service (LCG-1) available.
10
The Spirit of the Project
The JREI funds make it possible to commission and fully exercise a prototype LHC computing centre in Scotland.
  • The Centre would develop, support and test:
  • a technical service based on the Grid
  • a DataStore to handle samples of data for user analysis
  • a significant simulation production capability
  • network connectivity (internal and external)
  • Grid middleware
  • core software within LHCb and ATLAS
  • user applications in other scientific areas
  • This will enable us to answer:
  • Is the Grid a viable solution for the LHC computing challenge?
  • Can a two-site Tier-2 centre be set up and operate effectively?
  • How can network use between Edinburgh, Glasgow, RAL and CERN be improved?

11
Tier-1 and -2 Centre Resource Planning
  • Estimated resources at the start of GridPP2 (Sept. 2004)

Tier-2: ~6,000 CPUs, 400 TB. Tier-1: 1,000 CPUs, 500 TB.
Shared, distributed resources are required to meet experiment requirements, connected by network and Grid.
12
Testbed to Production: Total Resources
  • Dynamic Grid Optimisation via Replica
    Optimisation Service

2004: 7,000 x 1 GHz CPUs, 400 TB disk. 2007: 30,000 x 1 GHz CPUs, 2,200 TB disk.
(Note: x2 scale change on the plot.)
13
Experiment Requirements (UK only)
Total Requirement
14
From Testbed to Production
[Software process figure:]
  • Testbeds: Development (15 CPUs), Certification (40 CPUs), Production (1,000 CPUs).
  • Work Packages (WPs) run individual tests and add unit-tested code to the repository.
  • The build system runs nightly builds and automatic tests, producing tagged packages and release candidates.
  • The integration team integrates tagged releases; a tagged release is selected for certification.
  • The test group carries out Grid certification and overall release tests; problems are fed back to the WPs and fixed.
  • Application representatives perform application certification, giving a certified public release for use by the applications.
  • A certified release is selected for deployment; users run it in production, 24x7, and problem reports feed back into the cycle.
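Purely as an illustration of the nightly-build step in this cycle (not taken from the slides; the repository path, module name, tag and mail address below are all hypothetical), such a step could be driven by a cron entry and a small wrapper script:

  # Hypothetical crontab entry: run the nightly build and automatic tests at 02:00
  #   0 2 * * *  /opt/build/nightly-build.sh >> /var/log/nightly-build.log 2>&1

  #!/bin/sh
  # nightly-build.sh -- illustrative nightly build-and-test driver
  set -e
  REPO=/srv/cvs/repository                  # hypothetical repository location
  WORKDIR=/scratch/nightly/$(date +%Y%m%d)  # fresh build area per night

  mkdir -p "$WORKDIR" && cd "$WORKDIR"
  cvs -d "$REPO" checkout -r nightly middleware   # check out the tagged packages
  if (cd middleware && make && make test); then
      echo "nightly build and tests passed" | mail -s "nightly build OK" integration-team@example.org
  else
      echo "nightly build or tests FAILED"  | mail -s "nightly build FAILED" integration-team@example.org
  fi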
15
How Does the Grid Work?
0. Web user interface.
1. Authentication: grid-proxy-init.
2. Job submission: edg-job-submit.
3. Monitoring and control: edg-job-status, edg-job-cancel, edg-job-get-output.
4. Data publication and replication: Replica Location Service, Replica Optimisation Service.
5. Resource scheduling and use of Mass Storage Systems: JDL, sandboxes, storage elements.
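A minimal command-line sketch of steps 1-3, assuming the EDG User Interface tools are installed and the user holds a valid Grid certificate; the JDL file, executable name and output file names below are illustrative only:

  # 1. Authentication: create a short-lived proxy from the user's X.509 certificate
  grid-proxy-init

  # Illustrative JDL file describing the job and its sandboxes
  cat > myjob.jdl <<'EOF'
  Executable    = "myjob.sh";
  StdOutput     = "myjob.out";
  StdError      = "myjob.err";
  InputSandbox  = {"myjob.sh"};
  OutputSandbox = {"myjob.out", "myjob.err"};
  EOF

  # 2. Job submission: the broker returns a job identifier
  edg-job-submit myjob.jdl

  # 3. Monitoring and control, using the returned job identifier
  edg-job-status <job-id>
  edg-job-get-output <job-id>    # retrieve the output sandbox (or edg-job-cancel <job-id>)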
16
Middleware Development
17
Grid Data Management
  • Secure access to metadata
  • metadata: where are the files on the Grid?
  • database client interface
  • Grid service using standard web services (see the sketch below)
  • developed with the UK e-Science programme
  • input to OGSA-DAI
  • Optimised file replication
  • simulations required
  • economic models using CPU, disk and network inputs
  • OptorSim
  • "Large increases in cost with questionable increases in performance can be tolerated only in race horses and fancy women." - Lord Kelvin
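Purely to illustrate "secure access to metadata ... using standard web services", a client might authenticate with its Grid certificate over HTTPS and query the service; the host name, URL path, query parameter and CA bundle file below are hypothetical and not the actual Spitfire interface:

  # Hypothetical HTTPS query to a metadata service, with mutual authentication
  # using the user's Grid certificate and key.
  curl --cert   ~/.globus/usercert.pem \
       --key    ~/.globus/userkey.pem \
       --cacert /etc/grid-security/ca-bundle.pem \
       "https://metadata.example.org/spitfire/query?lfn=higgs_sample_001.root"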

18
Metadata: Spitfire
[Component figure:] client requests arrive at a Security Servlet running in a servlet container, using an SSLServletSocketFactory and a TrustManager backed by the trusted-CA and revoked-certificate repositories; an Authorization Module checks whether the user specifies a role against the role repository; a Translator Servlet maps the role to a connection id via the role-to-connection mappings; queries then pass through a connection pool to the RDBMS.
Secure? At the level required in Particle Physics.
Glasgow authors: Will Bell, Gavin McCance
19
OptorSim: File Replication Simulation
  • Test peer-to-peer file replication strategies, e.g. economic models
1. Optimisation principles applied to the GridPP 2004 testbed with realistic PP use patterns/policies.
2. Job scheduling: queue access cost takes into account queue length and network connectivity; replicas anticipated to be needed at close sites using three replication algorithms.
3. Realistic JANET background traffic built in.
4. Replication algorithms optimise CPU use/job time as replicas are built up on the Grid.
Glasgow authors: Will Bell, David Cameron, Paul Millar, Caitriana Nicholson
20
Testbed Status: Summer 2003
Tier-2 Regional Centres: ScotGrid, NorthGrid, SouthGrid, London Grid.
UK-wide development using EU DataGrid tools (v1.4.7), deployed during Sept 02-03 and currently being upgraded to v2.0. See http://www.gridpp.ac.uk/map/
21
Sequential Access via Metadata (SAM)
The SAM system went into production mode for CDF on June 3, 2002.
"Treat the WAN as an abundant file transfer resource" - Rick St. Denis, Run II Computing Review (June 4-6, 2002).
Grid theme: metadata is required to enable distributed resources (e.g. CDF@Glasgow) to work coherently.
Glasgow authors: Morag Burgon-Lyon, Rick St. Denis, Stan Thompson
22
Tour of ScotGrid
  • Hardware
  • 59 IBM X Series 330 dual 1 GHz Pentium III with
    2GB memory
  • 2 IBM X Series 340 dual 1 GHz Pentium III with
    2GB memory and dual ethernet
  • 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and 100/1000 Mbit/s ethernet
  • 1TB disk
  • LTO/Ultrium Tape Library
  • Cisco ethernet switches
  • New..
  • IBM X Series 370 PIII Xeon with 32 x 512 MB RAM
  • 5TB FastT500 disk (70 x 73.4 GB IBM FC Hot-Swap HDDs)
  • eDIKT 28 IBM blades dual 2.4 GHz Xeon with 1.5GB
    memory
  • eDIKT 6 IBM X Series 335 dual 2.4 GHz Xeon with
    1.5GB memory
  • CDF 10 Dell PowerEdge 2650 2.4 GHz Xeon with
    1.5GB memory
  • CDF 7.5TB Raid disk

Shared resources: 15 TB disk, 330 x 1 GHz CPUs
23
Tour of ScotGrid
Ongoing Upgrade Programme
  • Software
  • OpenPBS batch system
  • Job description language is shell scripts with special comments (#PBS directives); see the sketch after this list
  • Jobs are submitted using the qsub command
  • Location of job output is determined by the PBS shell script
  • Jobs are scheduled using the Maui plugin to OpenPBS
  • Frontend machines provide a Grid-based entry mechanism to this system
  • Remote users prepare jobs and submit them from e.g. their desktop
  • Users authenticate using X.509 certificates
  • Users do not have personal accounts on e.g. ScotGRID-Glasgow but use pool accounts
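A minimal sketch of such a job script, assuming the OpenPBS/Maui setup described above; the job name, resource requests and executable are illustrative only:

  #!/bin/sh
  # Illustrative OpenPBS job script: the "special comments" are #PBS directives.
  # Job name, resource requests and output locations:
  #PBS -N scotgrid-test
  #PBS -l nodes=1:ppn=2
  #PBS -l walltime=01:00:00
  #PBS -o myjob.out
  #PBS -e myjob.err

  # Run from the directory where qsub was invoked; the executable is hypothetical.
  cd "$PBS_O_WORKDIR"
  ./run_simulation

  # Submitted with:  qsub myjob.sh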

Development/deployment. System Manager: David Martin.
EDG 1.4 frontend: CE (Computing Element), SE (Storage Element), 59 x WN (worker nodes).
24
Web-Based Monitoring
[Monitoring pages: accumulated CPU use, instantaneous CPU use, total disk use, documentation, prototype.]
ScotGrid reached its 500,000th processing hour on Thursday 17th July 2003.
25
Duty cycle is typically 70%, with large fluctuations. Contention control? Via target shares.
Total delivered CPU
LHC targets met. Significant non-LHC application use. Bioinformatics/CDF resources are being integrated.
UKQCD
26
Summary in UK-EU-World context..
  • 50/50 Edin/GU funding model, funded by SHEFC
  • compute-intensive jobs performed at GU
  • data-intensive jobs performed at Edin
  • Leading R&D in Grid Data Management in the UK
  • Open policy on usage and target shares
  • Open monitoring system
  • Meeting real requirements of applications: currently HEP (experiment and theory), Bioinformatics, Computing Science
  • open source research (all code)
  • open source systems (IBM linux-based system)
  • part of a worldwide grid infrastructure through
    GridPP
  • GridPP Project (£17m over three years, to Sep 04)
  • Dedicated people actively developing a Grid, all with personal certificates
  • Using the largest UK grid testbed (16 sites and
    more than 100 servers)
  • Deployed within EU-wide programme
  • Linked to Worldwide Grid testbeds
  • LHC Grid Deployment Programme defined; first international testbed in July
  • Active Tier-1/A Production Centre already meeting
    International Requirements
  • Latent Tier-2 resources being monitored; ScotGRID recognised as leading these developments
  • Significant middleware development programme: importance of Grid Data Management

27
Summary
Hardware
Middleware
Applications
  • PPARC/SHEFC/University strategic investment
  • Software prototyping (Grid Data Management) and
    stress-testing (Applications)
  • Long-term commitment to Grid computing (LHC era)
  • Partnership with Bioinformatics, Computing
    Science, Edinburgh, Glasgow, IBM, Particle
    Physics
  • Working locally as part of National, European and
    International Grid development
  • middleware testbed linked to real applications
    via ScotGrid
  • Development/deployment

ScotGRID
"Radio has no future. X-rays will prove to be a hoax." - Lord Kelvin
28
Where are we?
  • Development/deployment?
  • Depends on your perspective
  • (What tangled webs we weave when first we
    practise.. building grids)