The Promise of Computational Grids in the LHC Era
1
  • The Promise of Computational Grids in the LHC Era
  • Paul Avery, University of Florida, Gainesville, Florida, USA
  • avery@phys.ufl.edu, http://www.phys.ufl.edu/avery/
  • CHEP 2000, Padova, Italy, Feb. 7-11, 2000

2
LHC Computing Challenges
  • Complexity of LHC environment and resulting data
  • Scale: Petabytes of data per year
  • Geographical distribution of people and resources

Example: CMS has 1800 physicists from 150 institutes in 32 countries
3
Dimensioning / Deploying IT Resources
  • LHC computing scale is something new
  • Solution requires directed effort, new
    initiatives
  • Solution must build on existing foundations
  • Robust computing at national centers essential
  • Universities must have resources to maintain
    intellectual strength, foster training, engage
    fresh minds
  • Scarce resources are/will be a fact of life → plan for it
  • Goal: get new resources, optimize deployment of all resources to maximize effectiveness
  • CPU: CERN / national lab / region / institution / desktop
  • Data: CERN / national lab / region / institution / desktop
  • Networks: International / national / regional / local

4
Deployment Considerations
  • Proximity of datasets to appropriate IT resources
  • Massive → CERN, national labs
  • Data caches → Regional centers
  • Mini-summary → Institutional
  • Micro-summary → Desktop
  • Efficient use of network bandwidth
  • Local > regional > national > international
  • Utilizing all intellectual resources
  • CERN, national labs, universities, remote sites
  • Scientists, students
  • Leverage training, education at universities
  • Follow lead of commercial world
  • Distributed data, web servers

5
Solution A Data Grid
  • Hierarchical grid is the best deployment option
  • Hierarchy → optimal resource layout (MONARC studies)
  • Grid → unified system
  • Arrangement of resources
  • Tier 0 → Central laboratory computing resources (CERN)
  • Tier 1 → National center (Fermilab / BNL)
  • Tier 2 → Regional computing center (university)
  • Tier 3 → University group computing resources
  • Tier 4 → Individual workstation/CPU
  • We call this arrangement a Data Grid to reflect the overwhelming role that data plays in deployment (a toy sketch of the hierarchy follows after this list)
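As a rough illustration (not part of the original slides), the tiered arrangement can be modeled as a tree in which each Tier N-1 site serves the Tier N sites below it. The CERN, FNAL, and Tier 2 capacities below are taken from the US-model slide later in the talk; the Tier 3 capacity and site names are placeholders.

```python
# Toy model of the Tier 0-4 arrangement described above.
# Site names and the Tier 3 capacity are illustrative placeholders only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Site:
    name: str
    tier: int
    cpu_si95: float                      # CPU capacity in SpecInt95 units
    children: List["Site"] = field(default_factory=list)

    def total_cpu(self) -> float:
        """Aggregate capacity of this site plus everything it serves."""
        return self.cpu_si95 + sum(c.total_cpu() for c in self.children)

# Each Tier N-1 site serves the Tier N sites below it.
cern = Site("CERN (Tier 0)", 0, 350_000, [
    Site("FNAL (Tier 1)", 1, 70_000, [
        Site("Regional center (Tier 2)", 2, 20_000, [
            Site("University group (Tier 3)", 3, 2_000),   # placeholder size
        ]),
    ]),
])

print(f"Grid-wide CPU: {cern.total_cpu():,.0f} SI95")
```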

6
Layout of Resources
  • Want good impedance match between Tiers
  • Tier N-1 serves Tier N
  • Tier N big enough to exert influence on Tier N-1
  • Tier N-1 small enough not to duplicate Tier N
  • Resources roughly balanced across Tiers

Reasonable balance?
7
Data Grid Hierarchy (Schematic)
[Schematic: Tier 0 (CERN) at the center, surrounded by Tier 1 national centers; each Tier 1 serves several Tier 2 (T2) regional centers, and each Tier 2 serves many Tier 3 sites.]
8
US Model Circa 2005 (Schematic)
  • CERN (CMS/ATLAS): 350k SI95, 350 TBytes disk, robot
  • Tier 1 (FNAL/BNL): 70k SI95, 70 TBytes disk, robot
  • Tier 2 center: 20k SI95, 25 TBytes disk, robot
  • Tier 3: university work groups 1 … M
  • Wide-area links of 2.4 Gbps, N × 622 Mbits/s, and 622 Mbits/s between the tiers; optional air freight for bulk data
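A quick check of how these capacities aggregate across tiers; the number of Tier 2 centers served by a Tier 1 is an assumed value for illustration, not a figure from the slide.

```python
# Rough aggregation of the US-model capacities quoted above (SI95 units).
# The count of Tier 2 centers per Tier 1 is an assumed value, not from the slide.
cern_si95  = 350_000     # Tier 0, shared by CMS/ATLAS
tier1_si95 = 70_000      # one Tier 1 center (FNAL or BNL)
tier2_si95 = 20_000      # one Tier 2 regional center
n_tier2    = 5           # assumed number of Tier 2 centers served by a Tier 1

tier2_total = n_tier2 * tier2_si95
print(f"Aggregate Tier 2 capacity: {tier2_total:,} SI95")              # 100,000 SI95
print(f"Tier 2 total / Tier 1 ratio: {tier2_total / tier1_si95:.1f}")  # ~1.4
print(f"Tier 1 as a fraction of CERN: {tier1_si95 / cern_si95:.0%}")   # 20%
# A handful of 20k SI95 Tier 2 centers collectively rivals a 70k SI95 Tier 1,
# which is the kind of rough inter-tier balance slide 6 asks for.
```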
9
Data Grid Hierarchy (CMS)
  • Scale: 1 TIPS = 25,000 SpecInt95; a PC today is roughly 10-20 SpecInt95
  • Bunch crossings every 25 nsec; 100 triggers per second; each event is 1 MByte in size
  • Online System → Offline Farm (20 TIPS): PBytes/sec off the detector, 100 MBytes/sec into the farm
  • Tier 0: CERN Computer Center, fed at 100 MBytes/sec
  • Tier 1 (622 Mbits/sec links): Fermilab (4 TIPS) plus France, Italy, and Germany Regional Centers
  • Tier 2 regional centers (2.4 Gbits/sec links)
  • Tier 3 (622 Mbits/sec links): institute servers with a physics data cache; physicists work on analysis channels, each institute has about 10 physicists working on one or more channels, and data for these channels is cached by the institute server
  • Tier 4 (1-10 Gbits/sec): workstations
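The 100 MBytes/sec offline stream and the petabyte-per-year scale of slide 2 follow directly from the trigger parameters above. The sketch below assumes a conventional 10^7 effective seconds of running per year, which is not stated on the slide.

```python
# Back-of-the-envelope data volume from the trigger parameters above.
# The 1e7 effective seconds of running per year is an assumed rule of thumb.
trigger_rate_hz  = 100          # events written offline per second
event_size_bytes = 1_000_000    # ~1 MByte per event
live_seconds     = 1e7          # assumed effective running time per year

rate_bytes_per_s = trigger_rate_hz * event_size_bytes
yearly_bytes     = rate_bytes_per_s * live_seconds

print(f"Offline stream: {rate_bytes_per_s / 1e6:.0f} MBytes/sec")   # 100 MBytes/sec
print(f"Raw data per year: {yearly_bytes / 1e15:.1f} PBytes")       # ~1 PByte/year
```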
10
Why a Data Grid: Physical
  • Unified system: all computing resources are part of the grid
  • Efficient resource use (manage scarcity)
  • Averages out spikes in usage
  • Resource discovery / scheduling / coordination
    truly possible
  • The whole is greater than the sum of its parts
  • Optimal data distribution and proximity
  • Labs are close to the data they need
  • Users are close to the data they need
  • No data or network bottlenecks
  • Scalable growth

11
Why a Data Grid: Political
  • Central lab cannot manage / help 1000s of users
  • Easier to leverage resources, maintain control,
    assert priorities regionally
  • Cleanly separates functionality
  • Different resource types in different Tiers
  • Funding complementarity (NSF vs DOE)
  • Targeted initiatives
  • New IT resources can be added naturally
  • Additional matching resources at Tier 2
    universities
  • Larger institutes can join, bringing their own
    resources
  • Tap into new resources opened by IT revolution
  • Broaden community of scientists and students
  • Training and education
  • Vitality of field depends on University / Lab
    partnership

12
Tier 2 Regional Centers
  • Possible Model: CERN : National : Tier 2 → 1/3 : 1/3 : 1/3 (a worked reading of this split follows after this list)
  • Complementary role to Tier 1 lab-based centers
  • Less need for 24 × 7 operation → lower component costs
  • Less production-oriented → respond to analysis priorities
  • Flexible organization, e.g. by physics goals, subdetectors
  • Variable fraction of resources available to
    outside users
  • Range of activities includes
  • Reconstruction, simulation, physics analyses
  • Data caches / mirrors to support analyses
  • Production in support of parent Tier 1
  • Grid R&D
  • ...
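One way to read the 1/3 : 1/3 : 1/3 model is as a capacity target per layer. The experiment total used below is an arbitrary example, not a figure from the talk; only the 20k SI95 Tier 2 unit comes from the earlier strawman figures.

```python
# Reading the CERN : National : Tier 2 = 1/3 : 1/3 : 1/3 split as a capacity
# target. The experiment total below is an arbitrary illustration, not a
# quoted figure; the 20k SI95 Tier 2 unit is the strawman size quoted earlier.
experiment_total_si95 = 300_000
layer_share = experiment_total_si95 / 3       # each layer provides one third

tier2_unit_si95 = 20_000
n_tier2_needed = layer_share / tier2_unit_si95

print(f"Each layer provides: {layer_share:,.0f} SI95")
print(f"Tier 2 centers needed at 20k SI95 each: {n_tier2_needed:.0f}")   # 5
```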

13
Distribution of Tier 2 Centers
  • Tier 2 centers arranged regionally in US model
  • Good networking connections to move data (caches)
  • Location independence of users always maintained
  • Increases collaborative possibilities
  • Emphasis on training, involvement of students
  • High quality desktop environment for remote
    collaboration, e.g., next generation VRVS system

14
Strawman Tier 2 Architecture
  • Linux Farm of 128 Nodes: $0.30M
  • Sun Data Server with RAID Array: $0.10M
  • Tape Library: $0.04M
  • LAN Switch: $0.06M
  • Collaborative Infrastructure: $0.05M
  • Installation and Infrastructure: $0.05M
  • Net Connect to Abilene network: $0.14M
  • Tape Media and Consumables: $0.04M
  • Staff (Ops and System Support): $0.20M
  • Total Estimated Cost (First Year): $0.98M (a quick tally follows after this list)
  • Cost in succeeding years, for evolution, upgrade, and ops: $0.68M
  • 1.5-2 FTE of support required per Tier 2; physicists from the institute also aid in support.
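As a sanity check, the listed items should sum to the quoted first-year total. A quick tally (figures in $M as listed; the currency symbol appears to have been stripped in the transcript):

```python
# Quick consistency check of the first-year cost items listed above (in $M).
items = {
    "Linux Farm of 128 Nodes":           0.30,
    "Sun Data Server with RAID Array":   0.10,
    "Tape Library":                      0.04,
    "LAN Switch":                        0.06,
    "Collaborative Infrastructure":      0.05,
    "Installation and Infrastructure":   0.05,
    "Net Connect to Abilene network":    0.14,
    "Tape Media and Consumables":        0.04,
    "Staff (Ops and System Support)":    0.20,
}
total = sum(items.values())
print(f"First-year total: {total:.2f} M")   # 0.98, matching the slide
```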

15
Strawman Tier 2 Evolution
  • 2000 → 2005
  • Linux Farm: 1,500 SI95 → 20,000 SI95
  • Disks on CPUs: 4 TB → 20 TB
  • RAID Array†: 1 TB → 20 TB
  • Tape Library: 1 TB → 50-100 TB
  • LAN Speed: 0.1-1 Gbps → 10-100 Gbps
  • WAN Speed: 155-622 Mbps → 2.5-10 Gbps
  • Collaborative Infrastructure: MPEG2 VGA (1.5-3 Mbps) → Realtime HDTV (10-20 Mbps)
  • † RAID disk used for higher-availability data
  • Figures reflect lower Tier 2 component costs due to less demanding usage, e.g. simulation. (Growth factors are worked out in the sketch after this list.)
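The 2000 → 2005 step implies roughly an order-of-magnitude growth in most components. The sketch below works out the factors; the comparison with an 18-month doubling time is an added observation, not a claim from the slide.

```python
# Growth factors implied by the 2000 -> 2005 strawman figures above.
growth = {
    "Linux Farm (SI95)":  (1_500, 20_000),
    "Disks on CPUs (TB)": (4, 20),
    "RAID Array (TB)":    (1, 20),
    "Tape Library (TB)":  (1, 75),        # midpoint of the 50-100 TB range
}
for name, (y2000, y2005) in growth.items():
    print(f"{name}: x{y2005 / y2000:.0f}")

# Five years of doubling every 18 months gives 2**(60/18) ~ 10x, comparable
# to the CPU growth factor above.
print(f"18-month-doubling reference: x{2 ** (60 / 18):.0f}")
```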

16
The GriPhyN Project
  • Joint project involving
  • US-CMS, US-ATLAS
  • LIGO Gravity wave experiment
  • SDSS Sloan Digital Sky Survey
  • http://www.phys.ufl.edu/avery/mre/
  • Requesting funds from NSF to build world's first production-scale grid(s)
  • Sub-implementations for each experiment
  • NSF pays for Tier 2 centers, some R&D, some networking
  • Realization of unified Grid system requires
    research
  • Many common problems for different
    implementations
  • Requires partnership with CS professionals

17
R&D Foundations I
  • Globus (Grid middleware)
  • Grid-wide services
  • Security
  • Condor (see M. Livny paper; a toy matchmaking sketch follows after this list)
  • General language for service seekers / service
    providers
  • Resource discovery
  • Resource scheduling, coordination, (co)allocation
  • GIOD (Networked object databases)
  • Nile (Fault-tolerant distributed computing)
  • Java-based toolkit, running on CLEO
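To illustrate what a "general language for service seekers / service providers" buys, here is a toy matchmaker in the spirit of Condor's ClassAds: jobs and machines advertise attributes and requirements, and mutually acceptable pairs are matched. This is purely a conceptual sketch, not Condor's actual syntax or API.

```python
# Toy matchmaker in the spirit of Condor ClassAds: jobs (service seekers)
# and machines (service providers) advertise attributes plus requirements,
# and the matchmaker pairs mutually acceptable ads.
# Purely illustrative; this is not Condor's real language or API.

machines = [
    {"name": "tier2-node01", "os": "linux", "memory_mb": 512, "si95": 40},
    {"name": "tier3-ws07",   "os": "linux", "memory_mb": 128, "si95": 15},
]

jobs = [
    {"name": "cms-reco-001", "needs_memory_mb": 256, "needs_os": "linux"},
    {"name": "cms-sim-042",  "needs_memory_mb": 64,  "needs_os": "linux"},
]

def matches(job, machine):
    """Both sides' requirements must hold for a match."""
    return (machine["os"] == job["needs_os"]
            and machine["memory_mb"] >= job["needs_memory_mb"])

for job in jobs:
    candidates = [m for m in machines if matches(job, m)]
    # Rank candidates by capacity, as a stand-in for a Rank expression.
    best = max(candidates, key=lambda m: m["si95"], default=None)
    print(job["name"], "->", best["name"] if best else "unmatched")
```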

18
R&D Foundations II
  • MONARC
  • Construct and validate architectures
  • Identify important design parameters
  • Simulate an extremely complex, dynamic system (a toy stand-in follows after this list)
  • PPDG (Particle Physics Data Grid)
  • DOE / NGI funded for 1 year
  • Testbed systems
  • Later program of work incorporated into GriPhyN
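The fragment below is a deliberately tiny stand-in for the kind of tier-load study MONARC performs: random job arrivals are routed to the fractionally least-loaded tier. It is not MONARC code, and the workload parameters are invented; only the tier capacities echo the earlier US-model figures.

```python
# Tiny stand-in for the kind of tier-load study MONARC performs.
# Not MONARC code; the workload is invented for illustration.
import random

random.seed(1)
capacity_si95 = {"Tier 1": 70_000, "Tier 2": 20_000}   # from the US model
load_si95     = {tier: 0.0 for tier in capacity_si95}

# Submit 1,000 analysis jobs, each demanding a random slice of CPU,
# and send each to the tier that is currently least loaded fractionally.
for _ in range(1000):
    demand = random.uniform(10, 100)                   # SI95 per job
    target = min(load_si95, key=lambda t: load_si95[t] / capacity_si95[t])
    load_si95[target] += demand

for tier, load in load_si95.items():
    print(f"{tier}: {load / capacity_si95[tier]:.0%} of capacity")
```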

19
The NSF ITR Initiative
  • Information Technology Research Program
  • Aimed at funding innovative research in IT
  • $90M in funds authorized
  • Max of $12.5M for a single proposal (5 years)
  • Requires extensive student support
  • GriPhyN submitted preproposal Dec. 30, 1999
  • Intend that ITR fund most of our Grid research
    program
  • Major costs for people, esp. students / postdocs
  • Minimal equipment
  • Some networking
  • Full proposal due April 17, 2000

20
Summary of Data Grids and the LHC
  • Develop integrated distributed system, while
    meeting LHC goals
  • ATLAS/CMS: production, data-handling oriented
  • (LIGO/SDSS: computation, commodity-component oriented)
  • Build, test the regional center hierarchy
  • Tier 2 / Tier 1 partnership
  • Commission and test software, data handling
    systems, and data analysis strategies
  • Build, test the enabling collaborative
    infrastructure
  • Focal points for student-faculty interaction in
    each region
  • Real-time high-res video as part of the collaborative environment
  • Involve students at universities in building the
    data analysis, and in the physics discoveries at
    the LHC