Computational Data Grids
Transcript and Presenter's Notes

1
  • Computational Data Grids
  • Solving the Problems of Data Intensive Science

Paul Avery, University of Florida
http://www.phys.ufl.edu/avery/
avery@phys.ufl.edu
FSU Physics Colloquium, Feb. 20, 2001
http://www.phys.ufl.edu/avery/griphyn/talks/avery_fsu_20feb01.ppt
2
What is a Grid?
  • Grid = Geographically distributed computing resources configured for coordinated use
  • Physical resources + networks provide raw capability
  • Middleware software ties it together

3
Why Grids?
  • Resources for complex problems are distributed
  • Advanced scientific instruments (accelerators, telescopes, ...)
  • Large amounts of storage
  • Large amounts of computing
  • Groups of smart people
  • Communities require access to common services
  • Scientific collaborations (physics, astronomy, biology, eng., ...)
  • Government agencies
  • Health care organizations
  • Large corporations
  • Other reasons
  • Resource allocations vary between individual
    institutions
  • Resource configurations change

4
Distributed Computation: SETI@home
  • Community: SETI researchers + enthusiasts
  • Arecibo radio data sent to users (250KB data
    chunks)
  • Over 2M PCs used

5
Distributed Computation: Optimization
  • Community: mathematicians + computer scientists
  • Exact solution of nug30
  • 30-site Quadratic Assignment Problem
  • 2× the computation of USA13509 (traveling salesman, 13,509 cities)
  • ½× the computation of factoring a composite number of ≈10^150
  • A 32-year-old problem
  • Condor-G distributed computing over several institutions
  • Delivered 4005 CPU days in 7 days (650 average, 1009 peak)
  • Parallel computers, workstations, clusters (8 sites, US-Italy)

nug30 solution permutation: 14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25, 22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23
6
Distributed Computation: Evaluation of AIDS Drugs
  • Community:
  • 1000s of home computer users
  • Philanthropic computing vendor (Entropia)
  • Research group (Scripps)
  • Common goal
  • Advance AIDS research

7
Grids: Next Generation Web
Software catalogs
Computers
Grid: Flexible, high-performance access to all significant resources
Sensor nets
Colleagues
Data archives
On-demand creation of powerful virtual computing systems
From Ian Foster
8
Grid Challenges
  • Overall goal
  • Coordinated sharing of resources
  • Numerous technical problems
  • Authentication, authorization, policy, auditing
  • Resource discovery, access, allocation, control
  • Failure detection and recovery
  • Brokering
  • Additional issue: lack of central control and knowledge
  • Need to preserve local site independence
  • Policy discovery and negotiation important
  • Many interesting failure modes

9
Grids: Why Now?
  • Improvements in internet infrastructure
  • Increasing bandwidth
  • Advanced services
  • Increased availability of compute/storage
    resources
  • Dense web server-clusters, supercomputers, etc.
  • Cheap storage at the Terabyte scale
  • Advances in application concepts
  • Collaborative science and engineering
  • Distributed analysis and simulation
  • Advanced scientific instruments
  • Remote control room (UF - Mauna Kea)
  • ...

10
Today's Information Infrastructure
O(10^6) nodes
Network
  • Network-centric
  • Simple, fixed end systems
  • Few embedded capabilities
  • Few services
  • No user-level quality of service

11
Tomorrow's Information Infrastructure
O(10^9) nodes
Caching
Resource Discovery
Processing
QoS
  • Application-centric
  • Heterogeneous, mobile end-systems
  • Many embedded capabilities
  • Rich services
  • User-level quality of service

Qualitatively different, not just faster and more reliable
12
Simple View of Grid Services
Apps
Rich set of applications
App Toolkits
Remote viz toolkit
Remote comp. toolkit
Remote data toolkit
Remote sensors toolkit
Remote collab. toolkit
...
Grid Services
Protocols, authentication, policy, resource
management, instrumentation, data discovery, etc.
Grid Fabric
Archives, networks, computers, display devices, etc. + associated local services

Globus Project: http://www.globus.org/
From Ian Foster
13
Example: Online Instrumentation
Advanced Photon Source
  • real-time collection
  • tomographic reconstruction
  • archival storage
  • wide-area dissemination
  • desktop VR clients with shared controls
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
From Ian Foster
14
Emerging Production Grids: Earthquake Engineering Simulation
  • NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
  • National infrastructure to couple earthquake engineers with experimental facilities, databases, computers
  • On-demand access to experiments, data streams, computing, archives, collaboration

http://www.neesgrid.org/
15
Data Intensive Science
16
Fundamental IT Challenge
  • Scientific communities of thousands, distributed
    globally, and served by networks with bandwidths
    varying by orders of magnitude, need to extract
    small signals from enormous backgrounds via
    computationally demanding (Teraflops-Petaflops)
    analysis of datasets that will grow by at least 3
    orders of magnitude over the next decade from
    the 100 Terabyte to the 100 Petabyte scale.

17
Data Intensive Science 2000-2015
  • Scientific discovery increasingly driven by IT
  • Computationally intensive analyses
  • Massive data collections
  • Rapid access to large subsets
  • Data distributed across networks of varying
    capability
  • Dominant factor: data growth (1 Petabyte = 1000 TB)
  • 0.5 Petabyte in 2000
  • 10 Petabytes by 2005
  • 100 Petabytes by 2010
  • 1000 Petabytes by 2015? (implied growth rate worked out in the sketch below)
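A quick back-of-the-envelope check, as a minimal Python sketch, of the annual growth rate these projections imply (the Petabyte figures are the slide's; the growth-rate calculation is illustrative and not part of the talk):

```python
# Implied compound annual growth of the data-volume projections above.
volumes_pb = {2000: 0.5, 2005: 10, 2010: 100, 2015: 1000}

def annual_growth(v0, v1, years):
    """Compound annual growth factor that takes v0 to v1 over `years` years."""
    return (v1 / v0) ** (1.0 / years)

print(round(annual_growth(volumes_pb[2000], volumes_pb[2010], 10), 2))  # ~1.70, i.e. ~70% per year
print(round(annual_growth(volumes_pb[2000], volumes_pb[2015], 15), 2))  # ~1.66: 3+ orders of magnitude over 15 years
```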

18
Data Intensive Sciences
  • High energy and nuclear physics
  • Gravity wave searches (e.g., LIGO, GEO, VIRGO)
  • Astronomical sky surveys (e.g., Sloan Sky Survey)
  • Virtual Observatories
  • Earth Observing System
  • Climate modeling
  • Geophysics
  • Computational chemistry?

19
Data Intensive Biology and Medicine
  • Radiology data
  • X-ray sources (APS crystallography data)
  • Molecular genomics (e.g., Human Genome)
  • Proteomics (protein structure, activities, ...)
  • Simulations of biological molecules in situ
  • Human Brain Project
  • Global Virtual Population Laboratory (disease
    outbreaks)
  • Telemedicine
  • Etc.

20
Example: High Energy Physics
Compact Muon Solenoid at the LHC (CERN)
Smithsonian standard man (shown for scale)
21
LHC Computing Challenges
  • Events resulting from beam-beam collisions
  • Signal event is obscured by ~20 overlapping uninteresting collisions in the same crossing
  • CPU time does not scale from previous generations

(Figures: 2000 vs. 2006)
22
LHC Higgs Decay into 4 muons
10^9 events/sec, selectivity: 1 in 10^13 (rate arithmetic sketched below)
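As a rough illustration (arithmetic only; the two input numbers are from the slide), the quoted rate and selectivity translate to only a handful of candidate events per day:

```python
# Turning the quoted selectivity into an event rate (illustrative arithmetic).
collision_rate_hz = 1e9   # ~10^9 events/sec (slide)
selectivity = 1e-13       # 1 interesting event in 10^13 (slide)

interesting_per_sec = collision_rate_hz * selectivity          # 1e-4 per second
print(f"~{interesting_per_sec * 86400:.0f} candidate events per day")  # ~9 per day
```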
23
LHC Computing Challenges
  • Complexity of LHC environment and resulting data
  • Scale: Petabytes of data per year (100 PB by 2010)
  • Geographical distribution of people and resources

1800 Physicists, 150 Institutes, 32 Countries
24
Example: National Virtual Observatory
Multi-wavelength astronomy, multiple surveys
25
NVO Data Challenge
  • Digital representation of the sky
  • All-sky deep fields
  • Integrated catalog and image databases
  • Spectra of selected samples
  • Size of the archived data
  • 40,000 square degrees
  • 2 trillion pixels (1/2 arcsecond resolution)
  • One band (2 bytes/pixel): 4 Terabytes (see the arithmetic sketch below)
  • Multi-wavelength: 10-100 Terabytes
  • Time dimension: a few Petabytes
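The archive-size figures above can be checked with a short sketch (the inputs are the slide's numbers; decimal Terabytes assumed):

```python
# Rough check of the NVO archive-size figures quoted above (illustrative).
SQ_DEG = 40_000            # sky coverage (slide)
ARCSEC_PER_DEG = 3600
PIXEL_ARCSEC = 0.5         # 1/2 arcsecond pixels (slide)
BYTES_PER_PIXEL = 2        # one band, 2 bytes/pixel (slide)

pixels_per_sq_deg = (ARCSEC_PER_DEG / PIXEL_ARCSEC) ** 2  # 7200^2 = 51.84 million
total_pixels = SQ_DEG * pixels_per_sq_deg                 # ~2.1e12, the "2 trillion pixels"
one_band_bytes = total_pixels * BYTES_PER_PIXEL           # ~4.1e12 bytes, ~4 Terabytes

print(f"{total_pixels:.2e} pixels, {one_band_bytes / 1e12:.1f} TB for one band")
```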

26
NVO Computing Challenges
  • Large distributed database engines
  • Gbyte/s aggregate I/O speed
  • High speed (10 Gbits/s) backbones
  • Cross-connecting the major archives
  • Scalable computing environment
  • 100s of CPUs for statistical analysis and
    discovery

27
(No Transcript)
28
GriPhyN Institutions
  • U Florida
  • U Chicago
  • Boston U
  • Caltech
  • U Wisconsin, Madison
  • USC/ISI
  • Harvard
  • Indiana
  • Johns Hopkins
  • Northwestern
  • Stanford
  • U Illinois at Chicago
  • U Penn
  • U Texas, Brownsville
  • U Wisconsin, Milwaukee
  • UC Berkeley
  • UC San Diego
  • San Diego Supercomputer Center
  • Lawrence Berkeley Lab
  • Argonne
  • Fermilab
  • Brookhaven

29
GriPhyN = App. Science + CS + Grids
  • GriPhyN = Grid Physics Network
  • US-CMS: High Energy Physics
  • US-ATLAS: High Energy Physics
  • LIGO/LSC: Gravity wave research
  • SDSS: Sloan Digital Sky Survey
  • Strong partnership with computer scientists
  • Design and implement production-scale grids
  • Investigation of Virtual Data concept (fig)
  • Integration into 4 major science experiments
  • Develop common infrastructure, tools and services
  • Builds on existing foundations: PPDG project, Globus tools
  • Multi-year project, ≈$70M total cost → NSF
  • R&D
  • Tier 2 center hardware, personnel (fig)
  • Networking?

30
Data Grid Hierarchy
Tier 0: CERN
Tier 1: National Lab
Tier 2: Regional Center at University
Tier 3: University workgroup
Tier 4: Workstation
  • GriPhyN
  • R&D
  • Tier2 centers
  • Unify all IT resources

31
LHC Grid Hierarchy
(Hierarchy diagram, reconstructed as a list; data-rate arithmetic sketched below)
  • Experiment → Online System: ~PBytes/sec off the detector
  • One bunch crossing per 25 nsec; ~100 triggers per second; each event is ~1 MByte in size
  • Online System → Tier 0 + 1: CERN Computer Center (~20 TIPS, HPSS mass storage) at ~100 MBytes/sec
  • Tier 0/1 → Tier 1 national centers (France, Italy, UK, USA) at 2.5 Gbits/sec
  • Tier 1 → Tier 2 regional centers at 2.5 Gbits/sec
  • Tier 2 → Tier 3 institute servers (~0.25 TIPS each) at 622 Mbits/sec
  • Tier 3 → Tier 4 workstations and other portals (physics data cache) at 100 - 1000 Mbits/sec
  • Physicists work on analysis channels; each institute has ~10 physicists working on one or more channels
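A small sketch of the data-rate arithmetic behind this hierarchy, assuming a commonly used ~10^7 seconds of live running per year (that assumption is not on the slide):

```python
# Back-of-the-envelope data rates for the hierarchy above (illustrative).
TRIGGER_RATE_HZ = 100        # ~100 triggers per second (slide)
EVENT_SIZE_BYTES = 1e6       # each event is ~1 MByte (slide)
LIVE_SECONDS_PER_YEAR = 1e7  # assumed accelerator live time per year

rate_bytes_per_sec = TRIGGER_RATE_HZ * EVENT_SIZE_BYTES          # 1e8 B/s = 100 MBytes/sec into Tier 0
raw_bytes_per_year = rate_bytes_per_sec * LIVE_SECONDS_PER_YEAR  # ~1e15 bytes, ~1 Petabyte/year of raw data

print(f"{rate_bytes_per_sec / 1e6:.0f} MB/s, {raw_bytes_per_year / 1e15:.1f} PB of raw data per year")
```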
32
Tier2 Architecture and Cost (2006)
  • CPU Farm (32K SI95): $150K - $270K
  • RAID Array (150 TB): $215K - $355K
  • Data Server: $60K - $140K
  • LAN Switches: $60K
  • Small Tape Library: $40K
  • Tape Media and Consumables: $20K
  • Installation and Infrastructure: $30K
  • Collaborative Tools and Infrastructure: $20K
  • Software licenses: $40K
  • Total Estimated Cost (First Year): $635K - $955K
  • Requires small (1.5 - 2) FTE support per Tier2

33
Tier 2 Site 2001 (One Version)
(Network diagram) One candidate 2001 Tier 2 layout: a Gigabit Ethernet switch interconnecting several Fast Ethernet switches for the worker nodes, a data server with RAID storage attached via Gigabit Ethernet, a VRVS MPEG2 videoconferencing node, and a router with OC-3 and OC-12 wide-area links.
34
GriPhyN R&D Funded
  • NSF results announced Sep. 13, 2000
  • $11.9M from the NSF Information Technology Research Program
  • $1.4M in matching funds from universities
  • Largest of all ITR awards
  • Scope of ITR funding
  • Major costs for people, esp. students, postdocs
  • 2/3 CS + 1/3 application science
  • Industry partnerships needed to realize scope
  • Microsoft, Intel, IBM, Sun, HP, SGI, Compaq,
    Cisco
  • Education and outreach
  • Reach non-traditional students and other
    constituencies
  • University partnerships
  • Grids natural for integrating intellectual
    resources from all locations

35
GriPhyN Philosophy
  • Fundamentally alters conduct of scientific
    research
  • Old: People, resources flow inward to labs
  • New: Resources, data flow outward to universities
  • Strengthens universities
  • Couples universities to data intensive science
  • Couples universities to national and international labs
  • Brings front-line research to students
  • Exploits intellectual resources of formerly
    isolated schools
  • Opens new opportunities for minority and women
    researchers
  • Builds partnerships to drive new IT/science
    advances
  • Physics + Astronomy
  • Application Science + Computer Science
  • Universities + Laboratories
  • Fundamental sciences + IT infrastructure
  • Research Community + IT industry

36
GriPhyN Research Agenda
  • Virtual Data technologies (fig.)
  • Derived data, calculable via algorithm (e.g.,
    most HEP data)
  • Instantiated 0, 1, or many times
  • Fetch vs execute algorithm
  • Very complex (versions, consistency, cost
    calculation, etc)
  • Planning and scheduling
  • User requirements (time vs. cost)
  • Global and local policies + resource availability
  • Complexity of scheduling in dynamic environment
    (hierarchy)
  • Optimization and ordering of multiple scenarios
  • Requires simulation tools, e.g. MONARC

37
Virtual Data in Action
  • A data request may:
  • Compute locally
  • Compute remotely
  • Access local data
  • Access remote data
  • Scheduling based on:
  • Local policies
  • Global policies
  • Local autonomy
(a minimal planning sketch of this decision follows below)
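A minimal sketch of how such a "fetch vs. execute" decision might look, with hypothetical cost estimates and policy flags (illustrative Python, not GriPhyN planner code):

```python
# A minimal sketch of the "fetch vs. execute" choice for a virtual data request.
# Cost estimates and policy flags are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    est_seconds: float   # estimated time to satisfy the request this way
    allowed: bool        # do local and global policies permit it?

def plan(options):
    """Pick the fastest option that policy allows; local autonomy is modeled
    simply by letting a site mark options as disallowed."""
    feasible = [o for o in options if o.allowed]
    if not feasible:
        raise RuntimeError("no option satisfies local and global policy")
    return min(feasible, key=lambda o: o.est_seconds)

choice = plan([
    Option("access local data",  est_seconds=30,   allowed=True),
    Option("access remote data", est_seconds=600,  allowed=True),
    Option("compute locally",    est_seconds=7200, allowed=False),  # local farm busy
    Option("compute remotely",   est_seconds=3600, allowed=True),
])
print(choice.name)   # -> access local data
```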

38
Research Agenda (cont.)
  • Execution management
  • Co-allocation of resources (CPU, storage, network
    transfers)
  • Fault tolerance, error reporting
  • Agents (co-allocation, execution)
  • Reliable event service across Grid
  • Interaction, feedback to planning
  • Performance analysis
  • Instrumentation and measurement of all grid
    components
  • Understand and optimize grid performance
  • Simulations (MONARC project at CERN)
  • Virtual Data Toolkit (VDT)
  • VDT = virtual data services + virtual data tools
  • One of the primary deliverables of the R&D effort
  • Ongoing activity + feedback from experiments (5-year plan)
  • Technology transfer mechanism to other scientific
    domains

39
GriPhyN PetaScale Virtual Data Grids
(Architecture diagram, reconstructed) Production teams, individual investigators, and workgroups interact through interactive user tools. These sit on top of request planning and scheduling tools, request execution and management tools, and virtual data tools. Below them, resource management services, security and policy services, and other grid services mediate access to transforms, the raw data source, and distributed resources (code, storage, computers, and network).
40
Model Architecture for Data Grids
(Architecture diagram, reconstructed) The application hands an attribute specification to the metadata catalog, which resolves it to a logical collection and logical file name. The replica catalog maps these to multiple physical locations. Replica selection, using performance information and predictions from MDS and NWS, picks the selected replica, and GridFTP commands move the data among the storage systems (disk caches, a disk array, a tape library) at replica locations 1, 2, and 3. A selection sketch follows below.
From Ian Foster
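A minimal sketch of replica selection driven by predicted transfer performance, in the spirit of the diagram above; host names and bandwidth predictions are made up, and this is not Globus, MDS, or NWS code:

```python
# A minimal sketch of replica selection based on performance predictions.
# Hosts, storage kinds, and predicted bandwidths are hypothetical placeholders.
replicas = {
    "replica1.example.org": {"kind": "disk cache",   "pred_MB_per_s": 40.0},
    "replica2.example.org": {"kind": "tape library", "pred_MB_per_s": 5.0},
    "replica3.example.org": {"kind": "disk array",   "pred_MB_per_s": 25.0},
}

def select_replica(file_size_mb, candidates):
    """Return (host, estimated transfer seconds) for the fastest predicted transfer."""
    host, info = min(candidates.items(),
                     key=lambda kv: file_size_mb / kv[1]["pred_MB_per_s"])
    return host, file_size_mb / info["pred_MB_per_s"]

host, eta = select_replica(2000, replicas)   # a 2 GB logical file
print(host, f"~{eta:.0f} s")                 # replica1.example.org, ~50 s
```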
41
Cluster Engineering
  • Cluster management and software
  • Cluster-wide upgrades for software and operating
    system
  • Low-cost and low-personnel management
  • Queuing software
  • Performance monitoring tools (web-based)
  • Fault-tolerance, healing
  • Remote management

42
Cluster Engineering (cont)
  • Cluster performance
  • High-speed and distributed file systems
  • Scaling to very large clusters (100s - 1000s)
  • Evolution of high speed I/O bus technologies
  • Linux vs. other operating systems
  • Multi-processor architectures
  • Lightweight multi-gigabit/s LAN protocols
  • New switching/routing technologies
  • Instrumentation to understand performance

43
Other Grid Research/Engineering
  • General grid research problems
  • Scheduling in complex, distributed environment
  • Wide area networks
  • High performance and distributed databases
  • Replication of databases at remote sites
    (caching)
  • Portable execution environments
  • Enterprise level fault tolerance
  • Monitoring

44
Remote Database Replication (PPDG)
ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC,
U.Wisc/CS
Site-to-Site Data Replication Service at 100 MBytes/sec
PRIMARY SITE: Data Acquisition, CPU, Disk, Tape Robot
SECONDARY SITE: CPU, Disk, Tape Robot
  • First Round Goal: Optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to 1 Petabyte (transfer-time arithmetic sketched below)
  • Matchmaking, Co-Scheduling: SRB, Condor, Globus services, HRM, NWS

Multi-Site Cached File Access Service
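Some quick transfer-time arithmetic for the 100 MBytes/sec target above (sizes are the slide's round numbers; the calculation is illustrative):

```python
# Quick arithmetic on the 100 MBytes/sec site-to-site replication target above.
RATE_BYTES_PER_SEC = 100e6

for label, size_bytes in [("100 Gbytes (cached subset)", 100e9),
                          ("1 Petabyte (full data set)", 1e15)]:
    seconds = size_bytes / RATE_BYTES_PER_SEC
    print(f"{label}: {seconds:,.0f} s, about {seconds / 3600:.1f} hours")

# ~17 minutes for the cached subset vs. ~116 days for the full Petabyte,
# which is why cached read access to subsets, not bulk copying, is the first-round goal.
```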
45
Simulation of Analysis Model (MONARC)
(Analysis-model diagram, reconstructed as a list; CPU arithmetic sketched below)
  • Experiment-wide activity (10^9 events): RAW Data → Reconstruction, 3000 SI95-sec/event, 1-3 jobs/year (re-processing 3 times per year as new detector calibrations or understanding arrive)
  • Monte Carlo: 5000 SI95-sec/event
  • 20 groups' activity (10^9 → 10^7 events): Selection, 25 SI95-sec/event, 20 jobs/month (trigger-based and physics-based refinements; iterative selection once per month)
  • 25 individuals per group activity (10^6 - 10^8 events): Analysis, 10 SI95-sec/event, 500 jobs/day (different physics cuts and MC comparison, once per day)
  • Algorithms applied to data to get results
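An illustrative CPU-time sketch for one reconstruction pass, borrowing the 32K SI95 Tier-2 farm size from the earlier cost slide (the comparison is a sketch, not a MONARC result):

```python
# Illustrative CPU arithmetic for the reconstruction activity above.
EVENTS = 1e9                    # experiment-wide sample
RECO_SI95_SEC_PER_EVENT = 3000  # reconstruction cost per event (slide)
TIER2_FARM_SI95 = 32_000        # one Tier-2 CPU farm (2006 estimate from the cost slide)

total_si95_sec = EVENTS * RECO_SI95_SEC_PER_EVENT    # 3e12 SI95-sec per pass
wall_seconds = total_si95_sec / TIER2_FARM_SI95      # if a single Tier 2 did it alone

print(f"{wall_seconds / (86400 * 365):.1f} years per pass on a single Tier 2")
# ~3 years per pass on one site: hence many sites sharing the load via the grid.
```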
46
Major Data Grid Projects
  • Earth System Grid (DOE Office of Science)
  • Data Grid technologies, climate applications
  • http://www.scd.ucar.edu/css/esg/
  • Particle Physics Data Grid (DOE Science)
  • Data Grid applications for high energy physics
    experiments
  • http://www.ppdg.net/
  • European Data Grid (EU)
  • Data Grid technologies + deployment in the EU
  • http://grid.web.cern.ch/grid/
  • GriPhyN (NSF)
  • Investigation of Virtual Data concept
  • Integration into 4 major science experiments
  • Broad application to other disciplines via
    Virtual Data Toolkit
  • http://www.griphyn.org/

47
The Grid Landscape Today
  • Data Grid hierarchy is the baseline computing assumption for large experiments
  • A recent development (within the last 6 months)
  • Strong interest from other research groups
  • Nuclear physics (ALICE experiment at LHC)
  • VO (Virtual Observatory) community in Europe
  • Gravity wave community in Europe
  • Collaboration of major data grid projects
  • GriPhyN, PPDG, EU DataGrid
  • Develop common infrastructure, collaboration
  • Major meeting in Amsterdam, March 4

48
The Grid Landscape Today (cont)
  • New players
  • CAL-IT2 ($112M)
  • Distributed Teraflop Facility, NSF-0151 ($45M)
  • UK-PPARC ($40M)
  • Other national initiatives
  • iVDGL = international Virtual-Data Grid Laboratory
  • For national and international scale Grid tests and operations
  • Initially US + UK + EU this year
  • Add other world regions later
  • Talks with Japan, Russia, China, South America,
    India, Pakistan
  • New GriPhyN proposal to NSF ITR2001 ($15M)
  • Additional iVDGL deployment
  • Integration of grid tools in applications
  • Toolkit support
  • Deployment of small systems at small colleges
    (E/O)

49
Grid References
  • Grid Book
  • www.mkp.com/grids
  • Globus
  • www.globus.org
  • Global Grid Forum
  • www.gridforum.org
  • PPDG
  • www.ppdg.net
  • EU DataGrid
  • grid.web.cern.ch/grid/
  • GriPhyN
  • www.griphyn.org
  • www.phys.ufl.edu/avery/griphyn

50
Summary
  • Grids will qualitatively and quantitatively
    change the nature of collaborations and
    approaches to computing
  • Grids offer major benefits to data intensive
    sciences
  • Many challenges during the coming transition
  • New grid projects will provide rich experience
    and lessons
  • Difficult to predict situation even 3-5 years
    ahead