IBM and GRID Computing - PowerPoint PPT Presentation

About This Presentation
Title:

IBM and GRID Computing

Description:

How the Linux and Grid Communities can Build the Next-Generation Internet Platform Ian Foster Argonne National Lab University of Chicago Globus Project – PowerPoint PPT presentation

Slides: 29
Provided by: BrianC211
Learn more at: https://www.mcs.anl.gov

Transcript and Presenter's Notes



1
How the Linux and Grid Communities can Build the
Next-Generation Internet Platform
Ian Foster, Argonne National Lab, University of
Chicago, Globus Project
2
Ottawa Linux Symposium, July 24, 2003
  • Linux has gained tremendous traction as a
    server operating system. However, a variety of
    technology trends, the Grid being one, are
    converging to create a service-based future in
    which functions such as computing and storage are
    virtualized and services and resources are
    increasingly integrated within and across
    enterprises. The servers that will power this
    sort of environment will require new capabilities
    including high scalability, integrated resource
    management, and RAS (reliability, availability,
    serviceability). I discuss what I see as the
    development priorities if Linux is to retain its
    leadership role as a server operating system.

3
The (Power) Grid: On-Demand Access to Electricity
[Figure: quality and economies of scale increasing over time]
4
By Analogy, A Computing Grid
  • Decouple production and consumption
  • Enable on-demand access
  • Achieve economies of scale
  • Enhance consumer flexibility
  • Enable new devices
  • On a variety of scales: department, campus,
    enterprise, Internet

5
Requirements
  • Dynamically link resources/services from
    collaborators, customers, and eUtilities (members
    of an evolving virtual organization) into a
    virtual computing system
  • Dynamic, multi-faceted system spanning
    institutions and industries
  • Configured to meet instantaneous needs
  • Multi-faceted QoX for demanding workloads:
    security, performance, reliability, etc.

6
For Example: Real-Time Online Processing
[Diagram: applications (delivery), application services (distribution), servers (execution)]
7
Examples of Linux-Based Grids: High Energy Physics
  • Production run on the Integration Testbed
  • Simulate 1.5 million full CMS events for physics
    studies: 500 sec per event on an 850 MHz processor
  • 2 months of continuous running across 5 testbed
    sites
  • Managed by a single person at the US-CMS Tier 1
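As a rough sanity check on the scale of this run, the slide's numbers can be multiplied out. The sketch below assumes that "2 months" means 60 days and that every processor is fully busy and equivalent to the 850 MHz reference machine; neither assumption is stated on the slide.

```python
# Back-of-envelope budget for the CMS production run.
# Assumptions (not on the slide): "2 months" = 60 days, 100% utilization,
# and all processors equivalent to the 850 MHz reference machine.

EVENTS = 1_500_000        # full CMS events simulated
SECONDS_PER_EVENT = 500   # measured on an 850 MHz processor
RUN_DAYS = 60             # assumed length of "2 months"


def cms_run_budget(events=EVENTS, sec_per_event=SECONDS_PER_EVENT, days=RUN_DAYS):
    """Total CPU time and the number of processors busy for the whole run."""
    cpu_seconds = events * sec_per_event
    wall_seconds = days * 24 * 3600
    return {
        "cpu_seconds": cpu_seconds,
        "cpu_years": cpu_seconds / (365 * 24 * 3600),
        "processors_busy": cpu_seconds / wall_seconds,
    }


budget = cms_run_budget()
print(f"{budget['cpu_seconds']:.3g} CPU-seconds = {budget['cpu_years']:.1f} CPU-years, "
      f"about {budget['processors_busy']:.0f} processors busy for {RUN_DAYS} days")
```

Under those assumptions the run comes to 750 million CPU-seconds, the equivalent of roughly 145 processors running flat out for two months.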

8
Examples of Linux-Based Grids: Earthquake
Engineering
U. Nevada Reno
www.neesgrid.org
9
Grid Technologies Community
  • Grid technologies developed since mid-90s
  • Product of work on resource sharing for
    scientific collaboration; commercial adoption
  • Open source Globus Toolkit has emerged as a de
    facto standard
  • International community of contributors
  • Thousands of deployments worldwide
  • Commercial support providers
  • Global Grid Forum serves as a community and
    standards body
  • Home to recent OGSA work

10
The Emergence of Open Grid Standards
[Figure: from custom solutions toward open Grid standards, 1990 to 2010, with increasing functionality and standardization]
11
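Open Grid Services Infrastructure (OGSI)
[Diagram: OGSI service interactions]
  • Service registry: services are registered
    (Register Service) and located through service
    discovery
  • Service factory: a service requestor (e.g., a
    user application) asks it to create a service
    (Create Service); it performs resource
    allocation and returns a Grid Service Handle
  • Service instances: the requestor interacts with
    them through service invocation, service data,
    keep-alives, and notifications
  • Authentication and authorization are applied to
    all requests
  • Interactions standardized using WSDL and SOAP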
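The interaction pattern in the OGSI diagram (a registry for discovery, a factory that creates instances and returns a handle, lifetimes renewed by keep-alives, and authorization on every request) can be sketched as a small in-memory model. All class and method names here are hypothetical: this is not the OGSI WSDL interface or the Globus Toolkit API, and it omits notifications and the SOAP transport.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    handle: str        # hypothetical stand-in for a Grid Service Handle
    kind: str
    expires_at: float  # soft-state lifetime, extended by keep-alives
    service_data: dict = field(default_factory=dict)  # queryable instance state


class Factory:
    """Creates service instances; every request passes an authorization check."""

    def __init__(self, kind, authorize):
        self.kind = kind
        self.authorize = authorize  # callable(user, operation) -> bool
        self.instances = {}

    def _check(self, user, operation):
        if not self.authorize(user, operation):
            raise PermissionError(f"{user} may not {operation}")

    def create_service(self, user, lifetime_s, now):
        self._check(user, "create")
        handle = f"gsh:{self.kind}:{uuid.uuid4()}"
        self.instances[handle] = ServiceInstance(
            handle, self.kind, now + lifetime_s, {"owner": user})
        return handle

    def keep_alive(self, user, handle, lifetime_s, now):
        self._check(user, "keep_alive")
        self.instances[handle].expires_at = now + lifetime_s

    def reap(self, now):
        """Destroy instances whose lifetime lapsed without a keep-alive."""
        expired = [h for h, s in self.instances.items() if s.expires_at <= now]
        for h in expired:
            del self.instances[h]
        return expired


class Registry:
    """Service discovery: maps a service kind to a factory that can create it."""

    def __init__(self):
        self._factories = {}

    def register(self, kind, factory):
        self._factories[kind] = factory

    def discover(self, kind):
        return self._factories.get(kind)


# Usage: discover a factory, create an instance, keep it alive, let it expire.
registry = Registry()
registry.register("compute", Factory("compute", lambda user, op: user == "alice"))
factory = registry.discover("compute")
handle = factory.create_service("alice", lifetime_s=60, now=0)
factory.keep_alive("alice", handle, lifetime_s=60, now=50)  # now expires at t=110
print(factory.reap(now=100))  # nothing expired yet
print(factory.reap(now=120))  # the instance is reclaimed
```

The soft-state lifetime means a requestor that disappears without cleaning up does not leak resources: its instances simply expire once keep-alives stop.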
12
Open Grid Services Architecture

[Diagram: layered architecture, top to bottom]
  • Users in Problem Domain X
  • Applications in Problem Domain X
  • Application Integration Technology for Problem
    Domain X
  • Generic Virtual Service Access and Integration
    Layer
  • OGSA
  • OGSI: Interface to Grid Infrastructure
  • Compute, Data and Storage Resources
    (Distributed)
Overall label: Virtual Integration Architecture
13
But It's Not Turtles All the Way Down
  • Our ability to deliver virtualized services
    efficiently and with desired QoX ultimately
    depends on the underlying platform!
  • At multiple levels, including but not limited to:
  • Dynamic provisioning and resource management
  • Reliability, availability, manageability
  • Performance and parallelism
  • New demands on the OS in each area

14
(1) Dynamic Provisioning
  • Static provisioning dedicates resources
  • Typical of co-lo hosting
  • Reprovision manually as needed
  • But load is dynamic
  • Must overprovision for surges
  • High variable cost of capacity
  • Need dynamic provisioning to achieve true
    economies of scale
  • Load multiplexing
  • Tradeoff cost vs. quality
  • Service level agreements
  • Dynamic resource recruitment
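Load multiplexing is where dynamic provisioning pays off: services whose peaks fall at different times can share one pool sized for the peak of their combined load, rather than each holding capacity for its own peak. The sketch below compares the two; the hourly traces are invented for illustration and are not measurements.

```python
def static_capacity(traces):
    """Dedicated provisioning: each service holds capacity for its own peak."""
    return sum(max(trace) for trace in traces)


def multiplexed_capacity(traces):
    """Shared pool: capacity only has to cover the peak of the combined load."""
    return max(sum(loads) for loads in zip(*traces))


def utilization(traces, capacity):
    """Average fraction of provisioned capacity actually used over the trace."""
    combined = [sum(loads) for loads in zip(*traces)]
    return sum(combined) / (len(combined) * capacity)


# Hypothetical load (requests/s) sampled at six points in a day.
business = [100, 300, 300, 100, 50, 50]   # workday peak
consumer = [50, 50, 100, 250, 300, 100]   # evening peak
traces = [business, consumer]

for name, cap in [("static", static_capacity(traces)),
                  ("multiplexed", multiplexed_capacity(traces))]:
    print(f"{name:12s} capacity={cap:4d} req/s  utilization={utilization(traces, cap):.0%}")
```

With these made-up traces, dedicated capacity totals 600 requests/s at about 49% average utilization, while the shared pool needs 400 requests/s at about 73%.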

15
Load Is Dynamic
  • ibm.com external site
  • February 2001
  • Daily fluctuations (3x)
  • Workday cycle
  • Weekends off

[Figure: ibm.com request rate by day of the week, Monday to Sunday]
  • World Cup soccer site
  • May-June 1998
  • Seasonal fluctuations
  • Event surges (11x)
  • ita.ee.lbl.gov

[Figure: World Cup site request rate over weeks 6 to 8]
16
For Example: Energy-Conscious Provisioning
  • Light load: concentrate traffic on a minimal set
    of servers
  • Step down surplus servers to a low-power state
    (APM and ACPI)
  • Activate surplus servers on demand (Wake-On-LAN)
  • Browndown: provision for a specified energy
    target
  • Even smarter: also manage air conditioning
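A minimal sketch of the consolidation step, assuming identical servers with a known capacity and treating a fixed headroom percentage as a stand-in for a real service-level target. The helper names are hypothetical; only the Wake-on-LAN packet layout (six 0xFF bytes followed by the target MAC address repeated 16 times) is standard, and UDP port 9 is a common convention rather than a requirement. Putting machines into a low-power state through APM or ACPI is left to platform tools and is not shown.

```python
import socket


def servers_needed(load_rps, capacity_rps, headroom_pct=20, min_servers=1):
    """Smallest server count that carries load_rps with headroom_pct spare.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    demand = load_rps * (100 + headroom_pct)
    per_server = capacity_rps * 100
    return max(min_servers, -(-demand // per_server))


def plan(servers, load_rps, capacity_rps):
    """Split an ordered server list into (keep active, step down)."""
    n = min(len(servers), servers_needed(load_rps, capacity_rps))
    return servers[:n], servers[n:]


def wol_magic_packet(mac):
    """Wake-on-LAN payload: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return b"\xff" * 6 + raw * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP to wake a stepped-down server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(wol_magic_packet(mac), (broadcast, port))


active, idle = plan([f"node{i}" for i in range(1, 7)], load_rps=1000, capacity_rps=300)
print("keep active:", active)
print("step down:  ", idle)
```

At 1000 requests/s on servers rated for 300 requests/s each, 20% headroom calls for four active servers; the remaining two can be stepped down and later woken with wake() when load rises.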

17
Power Management via MUSE: IBM Trace Run (Before)
[Figure: power draw (watts), latency (ms ×50), and throughput (requests/s) over the trace; annotation: 1 ms]
MUSE: Jeff Chase et al., Duke University (SOSP 2001)
18
Power Management via MUSE: IBM Trace Run (After)
[Figure: the same measurements after MUSE is applied; annotation: 1 ms]
19
Dynamic Provisioning OS Issues
  • Hot plug memory, CPU, and I/O
  • For partitioning, core virtualization
    capabilities
  • Security
  • Containment and data integrity in a virtualized
    environment: user-mode Linux?
  • Scheduler improvements for resource and workload
    management
  • Allocate for required resource consumption
  • Dynamic, sub-processor logical partitioning
  • Improved instrumentation and accounting
  • Determine actual resource consumption
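As one small illustration of determining actual resource consumption, Python's standard resource module wraps the POSIX getrusage() call, which reports CPU time and peak memory for the calling process. This is a process-level sketch only, assuming a Unix host; it does not provide the per-partition or per-workload accounting the slide calls for, and busy_work is a hypothetical placeholder workload.

```python
import resource


def measure(fn, *args, **kwargs):
    """Run fn and report the CPU time it added and the process's peak memory."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = fn(*args, **kwargs)
    after = resource.getrusage(resource.RUSAGE_SELF)
    usage = {
        "user_cpu_s": after.ru_utime - before.ru_utime,
        "system_cpu_s": after.ru_stime - before.ru_stime,
        # Peak resident set size of the whole process so far (KiB on Linux).
        "max_rss_kib": after.ru_maxrss,
    }
    return result, usage


def busy_work(n):
    """Hypothetical CPU-bound workload: sum of squares below n."""
    return sum(i * i for i in range(n))


result, usage = measure(busy_work, 200_000)
print(usage)
```

Because RUSAGE_SELF covers the whole process, finer-grained accounting (per thread, per container, per logical partition) would need other kernel interfaces.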

20
(2) Reliability, Availability, Manageability
  • Error log and diagnostics frameworks
  • Foundation for automated error analysis and
    recovery of distributed remote systems
  • Enable problem determination, automated
    reconfiguration, localization of failure
  • Configuration management
  • Determine hardware configuration/inventory
  • Apply/remove service/support patches
  • Isolate failing components quickly

21
(3) Performance and Parallelism, e.g., Data
Integration
  • Assume:
  • Remote data at 1 GB/s
  • 10 local bytes per remote byte
  • 100 operations per byte

[Diagram: remote data reaches the site over a wide-area link (end-to-end switched lambda?) at 1 GB/s; >1 GByte/s is achievable today (FAST, 7 streams, LA to Geneva). On the local network, parallel I/O runs at 10 GB/s and parallel computation at 1000 Gop/s.]
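The slide's three assumptions chain together to give the local rates in the diagram. The sketch below reads "100 operations per byte" as per local byte, which is the reading that matches the 1000 Gop/s figure; that interpretation is ours.

```python
# Assumptions taken from the slide.
REMOTE_GB_PER_S = 1            # remote data over the wide-area link
LOCAL_BYTES_PER_REMOTE = 10    # local bytes read per remote byte
OPS_PER_LOCAL_BYTE = 100       # operations per local byte (our reading)


def required_rates(remote=REMOTE_GB_PER_S,
                   local_per_remote=LOCAL_BYTES_PER_REMOTE,
                   ops_per_byte=OPS_PER_LOCAL_BYTE):
    """Return (parallel I/O in GB/s, computation in Gop/s) needed to keep up."""
    parallel_io = remote * local_per_remote
    computation = parallel_io * ops_per_byte
    return parallel_io, computation


io_gb_s, gops = required_rates()
print(f"parallel I/O: {io_gb_s} GB/s, computation: {gops} Gop/s")
```

Because the chain is linear, doubling the wide-area rate doubles both the required parallel I/O and the required compute.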
22
Performance and Parallelism
  • Distributed/cluster/parallel file systems
  • Optimized TCP/IP stacks
  • Scheduling of computation and communication
  • Web100 configuration and instrumentation
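Optimized TCP stacks matter at these rates because a single TCP connection can keep a long, fast path full only if its window covers the bandwidth-delay product. The sketch below computes that product and requests matching socket buffers; the 180 ms round-trip time is a hypothetical figure for a transatlantic path, not a measurement.

```python
import socket


def bdp_bytes(rate_bytes_per_s, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return rate_bytes_per_s * rtt_ms // 1000


def tuned_socket(buf_bytes):
    """TCP socket with send/receive buffers requested at buf_bytes.

    Set buffers before connect() so the window scale is negotiated to match.
    Linux silently caps requests at net.core.wmem_max / rmem_max and reports
    roughly double the accepted value (it reserves bookkeeping space).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)
    return s


need = bdp_bytes(1_000_000_000, 180)  # 1 GB/s with a hypothetical 180 ms RTT
print(f"window needed: {need / 1e6:.0f} MB")
with tuned_socket(need) as s:
    granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print(f"receive buffer granted: {granted / 1e6:.2f} MB")
```

At 1 GB/s and 180 ms the window must hold about 180 MB, far above typical default buffer limits, so on Linux the net.core.rmem_max/wmem_max and net.ipv4.tcp_rmem/tcp_wmem sysctls would also need raising. Web100-style instrumentation is what tells you whether such limits are the bottleneck.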

23
Web100: Overcoming the TCP/IP Wizard Gap
24
Web100 Kernel Instrument Set
  • Definition
  • Set of instruments designed to collect as much of
    the information as possible to enable a user to
    isolate the performance problems of a TCP
    connection
  • How it is implemented
  • Each instrument is a variable in a "stats"
    structure that is linked through the kernel
    socket structure
  • Linux /proc interface is used to expose these
    instruments outside the kernel

25
For Example
  • Recent transatlantic transfer showed frequent
    drops in data rate
  • But no loss or retransmits
  • Web100 identified the problem as Linux send-stall
    congestion events

26
Grid/Linux Cooperation: We Have Testbeds, Users,
Applications
27
Evolution of the Server
[Figure: servers evolving over time toward increased flexibility (and complexity)]
Significant implications for the underlying
operating system
28
Summary
  • The Grid community is creating middleware for
    distributed resource and service sharing
  • Open source software for resource and service
    virtualization, service management/integration
  • Motivated by wonderful applications
  • But we need help from the OS
  • Linux the next-generation Internet platform?
  • Could be, but significant evolution is required
    to address provisioning/resource management,
    availability and manageability, performance and
    parallelism, and other issues
  • Grid community can provide testbeds, users,
    requirements, applications

29
For More Information
  • The Globus Project
  • www.globus.org
  • Global Grid Forum
  • www.ggf.org
  • Background information
  • www.mcs.anl.gov/foster
  • GlobusWORLD 2004
  • www.globusworld.org
  • Jan 20-23, San Francisco

2nd Edition November 2003