Grid Tutorial

Transcript and Presenter's Notes

Title: Grid Tutorial


1
Grid Tutorial
  • Norbert Podhorszki

Part I. What are Grids and e-Science?
EGEE is funded by the European Union under
contract IST-2003-508833
2
Acknowledgements
  • This talk is based on a module of the tutorials
    delivered by the EGEE training team and slides
    from
  • Andrew Grimshaw, University of Virginia
  • Bob Jones, EGEE Technical Director
  • Mark Parsons, EPCC
  • the EDG training team
  • Roberto Barbera, INFN
  • Ian Foster, Argonne National Laboratories
  • Jeffrey Grethe, SDSC
  • The National e-Science Centre
  • David Fergusson, ???
  • Peter Kacsuk, MTA SZTAKI

3
Goals of Part I
  • Introduce grid concepts and definitions
  • Why Grids?
  • A brief outline of history leading to EGEE

4
Overview
  • What is different about grids?
  • Characteristics of a grid
  • eScience
  • Applications (what's in it for the working
    scientist)
  • European grids, and the world

5
What is different about grids?
6
What is Grid Computing?
  • A Virtual Organisation is:
      • People from different institutions working
        towards a common goal
      • Sharing distributed processing and data resources
  • Grid infrastructure enables virtual organisations

"Grid computing is coordinated resource sharing
and problem solving in dynamic,
multi-institutional virtual organizations"
(I. Foster)
7
Grids vs. Distributed Computing?
8
A Real World Distributed Application
  • SETI@home
      • 3.8M users in 226 countries
      • 1200 CPU years/day
      • 38 TF sustained (Japanese Earth Simulator is 40
        TF peak)
      • 1.7 Zettaflop over the last 3 years (10^21, beyond
        peta and exa)
      • Highly heterogeneous: >77 different processor
        types

Credit to Fran Berman
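
The SETI@home figures above can be cross-checked with a little arithmetic. The sketch below is only illustrative: it assumes that "1.7 Zettaflop" means a total of 1.7 x 10^21 floating point operations accumulated over 3 years, and that a year has 365.25 days. The function names are made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def average_rate_tflops(total_flop: float, years: float) -> float:
    """Average sustained rate in TF implied by a total operation count."""
    return total_flop / (years * SECONDS_PER_YEAR) / 1e12


def equivalent_busy_cpus(cpu_years_per_day: float) -> float:
    """Number of CPUs that would have to run full-time to deliver
    the given number of CPU-years every day."""
    return cpu_years_per_day * 365.25


if __name__ == "__main__":
    print(f"average rate: {average_rate_tflops(1.7e21, 3):.1f} TF")
    print(f"equivalent busy CPUs: {equivalent_busy_cpus(1200):,.0f}")
```

Under these assumptions the three-year average comes out at roughly 18 TF, which is plausible next to the 38 TF sustained figure if the user base grew over that period.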
9
Grids vs. Distributed Computing
  • Distributed applications already exist, but they
    tend to be specialised systems intended for a
    single purpose or user group
  • Grids go further and take into account:
      • Different kinds of resources
          • Not always the same hardware, data and
            applications
      • Different kinds of interactions
          • User groups or applications want to interact
            with Grids in different ways
      • Dynamic nature
          • Resources and users added/removed/changed
            frequently

10
Grid vs. metacomputing
11
Motivations
metacomputing
  • Grand challenge problems run for weeks or months
    even on supercomputers and clusters
  • Various supercomputers/clusters must be connected
    by wide area networks in order to solve grand
    challenge problems in reasonable time

12
Original meaning of metacomputing
13
Distributed Supercomputing
  • Issues
      • Resource discovery, scheduling
      • Configuration
      • Multiple communication methods
      • Message passing (MPI)
      • Scalability
      • Fault tolerance

Figure labels: Caltech Exemplar, NCSA Origin, Maui SP, Argonne SP
Example: SF-Express Distributed Interactive Simulation (Caltech, USC/ISI)
14
What is a Metacomputer?
  • A metacomputer is a collection of computers that
      • are heterogeneous in every aspect
      • are geographically distributed
      • are connected by a wide-area network
      • form the image of a single computer
  • Metacomputing means network-based distributed
    supercomputing

15
What is a Grid?
  • A Grid is a collection of computers, storage and
    other devices that
      • are heterogeneous in every aspect
      • are geographically distributed
      • are connected by a wide-area network
      • form the image of a single computer
  • Generalised metacomputing means network-based
    distributed computing

16
Distributed Supercomputing
  • Issues
      • Resource discovery, scheduling
      • Configuration
      • Multiple communication methods
      • Message passing (MPI)
      • Scalability
      • Fault tolerance

Figure labels: Caltech Exemplar, NCSA Origin, Maui SP, Argonne SP
Example: SF-Express Distributed Interactive Simulation (Caltech, USC/ISI)
17
High-Throughput Computing
  • Schedule many independent tasks
      • Parameter studies
      • Data analysis
  • Issues
      • Resource discovery
      • Data access
      • Scheduling
      • Reservation
      • Security
      • Accounting
      • Code management

Figure labels: Deadline, Cost, Available Machines
Example: Nimrod-G (Monash University)
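
A parameter study of the kind listed above can be sketched as many independent tasks handed to a pool of workers. This is a minimal local illustration using Python's standard library, not how Nimrod-G or any grid scheduler is implemented: `simulate` is a hypothetical stand-in for a real job, and resource discovery, security, accounting and the other issues listed are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(x: float) -> float:
    """Hypothetical stand-in for one independent job of a parameter study."""
    return x * x - 2.0 * x + 1.0


def run_parameter_study(params, max_workers=4):
    """Run one task per parameter value.

    The tasks share no state, so they can run in any order and on any
    worker; pool.map() still returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(simulate, params))
    return dict(zip(params, results))


if __name__ == "__main__":
    study = run_parameter_study([0.0, 0.5, 1.0, 1.5, 2.0])
    best = min(study, key=study.get)
    print("best parameter:", best, "value:", study[best])
```

Because the tasks are independent, throughput scales with the number of available workers, which is why scheduling and resource discovery dominate the list of issues rather than inter-task communication.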
18
Characteristics of a grid
19
What are the characteristics of a Grid system?
  • Numerous Resources
  • Connected by Heterogeneous, Multi-Level Networks
  • Ownership by Mutually Distrustful Organizations
    and Individuals
  • Different Security Requirements and Policies
    Required
  • Different Resource Management Policies
  • Potentially Faulty Resources
  • Geographically Separated
  • Resources are Heterogeneous
21
How Different 2004 is from 1994
  • Moore's law everywhere
      • Instruments, detectors, sensors, scanners, ...
      • Organising their effective use is the challenge
  • Enormous quantities of data: Petabytes
      • For an increasing number of communities
      • Gating step is not collection but analysis
  • Huge quantities of computing: >100 Top/s
      • Moore's law gives us all supercomputers
      • Organising their effective use is the challenge
  • Ultra-high-speed networks: >10 Gb/s
      • Global optical networks
      • Bottlenecks: last kilometre and firewalls

22
Exponential Growth
Figure: Performance per Dollar Spent against Number of Years (0 to 5),
with doubling time in months for each curve:
  • Optical Fibre (bits per second): Gilder's Law (32X in 4 yrs)
  • Data Storage (bits per sq. inch): Storage Law (16X in 4 yrs)
  • Chip capacity (# transistors): Moore's Law (5X in 4 yrs)
Source: "The Triumph of the Light", Scientific American, Gary
Stix, January 2001
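
The growth factors quoted on this slide (32X, 16X and 5X in 4 years) can be converted into doubling times, the quantity the chart labels in months. The short sketch below assumes steady exponential growth over the whole period; the function name is chosen for this example only.

```python
import math


def doubling_time_months(growth_factor: float, years: float) -> float:
    """Doubling time in months for steady exponential growth that
    multiplies a quantity by `growth_factor` over `years` years."""
    return 12.0 * years / math.log2(growth_factor)


if __name__ == "__main__":
    for name, factor in [("Optical fibre", 32), ("Data storage", 16), ("Chip capacity", 5)]:
        print(f"{name}: doubles every {doubling_time_months(factor, 4):.1f} months")
```

Under that assumption, 32X in 4 years corresponds to a doubling every 9.6 months, 16X to every 12 months, and 5X to roughly every 21 months.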
23
The main drivers behind Grid
  • The relentless increase in microprocessor
    performance
      • you can buy multi-gigaflop systems for less than
        800
  • The availability of reliable high performance
    networking
      • in Europe the GEANT network links 32 countries at
        speeds of up to 10Gbps (and beyond)
      • in the UK we have gone from a 100Mbps to a 10Gbps
        academic backbone since 2000
      • 1Gbps is commonly available to the desktop
  • The desire to push the boundaries of scientific
    discovery by computational analysis and
    simulation: e-Science

24
eScience
25
The Emergence of e-Science
  • Invention and exploitation of advanced
    computational methods
  • To generate, curate and analyse research data
  • From experiments, observations and simulations
  • Quality management, preservation and reliable
    evidence
  • To develop and explore models and simulations
  • Computation and data at extreme scales
  • Trustworthy, economic, timely and relevant
    results
  • To enable dynamic distributed virtual
    organisations
  • Facilitating collaboration with information and
    resource sharing
  • Security, reliability, accountability,
    manageability and agility

26
Why use Grids for Science?
  • Scale of the problems
  • Science increasingly done through distributed
    global collaborations enabled by the internet
  • Grids provide access to
  • Very large data collections
  • Terascale computing resources
  • High performance visualisation
  • Connected by high-bandwidth networks
  • e-Science is more than Grid Technology

It is what you do with it that counts
27
Challenges
  • Must share data between thousands of scientists
    with multiple interests
  • Must ensure that all data is accessible anywhere,
    anytime
  • Must be scalable and remain reliable for more
    than a decade
  • Must cope with different access policies
  • Must ensure data security

28
The Grid Vision
The Grid: networked data processing centres, with
middleware software as the glue between resources.
29
The Emergence of Global Knowledge Communities
Slide from Ian Foster's SSDBM '03 keynote
30
Applications (What's in it for working
scientists)
31
Grid Applications
  • Medical/Healthcare (imaging, diagnosis and
    treatment)
  • Bioinformatics (study of the human genome and
    proteome to understand genetic diseases)
  • Nanotechnology (design of new materials from the
    molecular scale)
  • Engineering (design optimization, simulation,
    failure analysis and remote instrument access and
    control)
  • Natural Resources and the Environment (weather
    forecasting, earth observation, modeling and
    prediction of complex systems)

32
CERN: Data-intensive science in a large
international facility
  • The Large Hadron Collider (LHC)
      • The most powerful instrument ever built to
        investigate elementary particle physics
  • Data Challenge
      • 10 Petabytes/year of data!
      • 20 million CDs each year!
  • Simulation, reconstruction, analysis
      • LHC data handling requires computing power
        equivalent to 100,000 of today's fastest PC
        processors!

Figure labels: Mont Blanc (4810 m), Downtown Geneva
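
The two data volume figures on this slide (10 Petabytes/year and 20 million CDs each year) can be related with simple arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a 365.25-day year; the function names are illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year
PETABYTE = 1e15                        # assumption: decimal petabyte


def implied_cd_capacity_mb(bytes_per_year: float, cds_per_year: float) -> float:
    """Capacity per CD (in MB) implied by a yearly volume and CD count."""
    return bytes_per_year / cds_per_year / 1e6


def sustained_rate_mb_per_s(bytes_per_year: float) -> float:
    """Average data rate (in MB/s) needed to move the yearly volume."""
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


if __name__ == "__main__":
    volume = 10 * PETABYTE
    print(f"implied CD capacity: {implied_cd_capacity_mb(volume, 20e6):.0f} MB")
    print(f"average data rate: {sustained_rate_mb_per_s(volume):.0f} MB/s")
```

With these assumptions the CD comparison implies about 500 MB per disc, and the yearly volume corresponds to an average rate of a little over 300 MB/s around the clock.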
33
CrossGrid
  • 1. Interactive biomedical simulation and
    visualization
  • 2. Flooding crisis team support
  • 3. HEP distributed data analysis
  • 4. Weather forecasting and air pollution
    modelling

34
Connecting People: Access Grid
Figure labels: Remote video, Visualisation, Microphones, Cameras
35
European grids, and the world
36
Major EU GRID projects
  • European DataGrid (EDG): www.edg.org
  • LHC Computing GRID (LCG): cern.ch/lcg
  • CrossGRID: www.crossgrid.org
  • DataTAG: www.datatag.org
  • GridLab: www.gridlab.org
  • EUROGRID: www.eurogrid.org
  • European National Projects
      • INFNGRID
      • UK e-Science Programme
      • NorduGrid

37
EU DataGrid at a glance
Application Testbed
  • 20 regular sites
  • >60,000 jobs submitted (since 09/03, release 2.0)
  • Peak >1000 CPUs
  • 6 Mass Storage Systems
People
  • 500 registered users
  • 12 Virtual Organisations
  • 21 Certificate Authorities
  • >600 people trained
  • 456 person-years of effort, 170 years funded
Software
  • >65 use cases
  • 7 major software releases (>60 in total)
  • >1,000,000 lines of code
Scientific Applications
  • 5 Earth Obs institutes
  • 10 bio-medical apps
  • 6 HEP experiments
38
Grid projects
  • Many Grid development efforts all over the
    world
  • UK: OGSA-DAI, RealityGrid, GeoDise,
    Comb-e-Chem, DiscoveryNet, DAME, AstroGrid,
    GridPP, MyGrid, GOLD, eDiamond, Integrative
    Biology, ...
  • Netherlands: VLAM, PolderGrid
  • Germany: UNICORE, Grid proposal
  • France: Grid funding approved
  • Italy: INFN Grid
  • Eire: Grid proposals
  • Switzerland: Network/Grid proposal
  • Hungary: DemoGrid, Grid proposal
  • Norway, Sweden: NorduGrid
  • NASA Information Power Grid
  • DOE Science Grid
  • NSF National Virtual Observatory
  • NSF GriPhyN
  • DOE Particle Physics Data Grid
  • NSF TeraGrid
  • DOE ASCI Grid
  • DOE Earth Systems Grid
  • DARPA CoABS Grid
  • NEESGrid
  • DOH BIRN
  • NSF iVDGL
  • DataGrid (CERN, ...)
  • EuroGrid (Unicore)
  • DataTag (CERN, ...)
  • Astrophysical Virtual Observatory
  • GRIP (Globus/Unicore)
  • GRIA (Industrial applications)
  • GridLab (Cactus Toolkit)
  • CrossGrid (Infrastructure Components)
  • EGSO (Solar Physics)