IBM and GRID Computing


Slides: 29

Provided by: BrianC211

Learn more at: https://www.mcs.anl.gov


1
How the Linux and Grid Communities can Build the
Next-Generation Internet Platform
Ian Foster, Argonne National Lab, University of
Chicago, Globus Project
2
Ottawa Linux Symposium, July 24, 2003

Linux has gained tremendous traction as a
server operating system. However, a variety of
technology trends, the Grid being one, are
converging to create a service-based future in
which functions such as computing and storage are
virtualized and services and resources are
increasingly integrated within and across
enterprises. The servers that will power this
sort of environment will require new capabilities
including high scalability, integrated resource
management, and RAS. I discuss what I see as
development priorities if Linux is to retain its
leadership role as a server operating system.

3
The (Power) Grid: On-Demand Access to Electricity
[Figure: quality and economies of scale plotted against time]
4
By Analogy, A Computing Grid

Decouple production and consumption
Enable on-demand access
Achieve economies of scale
Enhance consumer flexibility
Enable new devices
On a variety of scales
Department
Campus
Enterprise
Internet

5
Requirements

Dynamically link resources/services
From collaborators, customers, eUtilities,
(members of an evolving virtual organization)
Into a virtual computing system
Dynamic, multi-faceted system spanning
institutions and industries
Configured to meet instantaneous needs, for
Multi-faceted QoX for demanding workloads
Security, performance, reliability,

6
For Example: Real-Time Online Processing
[Diagram: Applications (Delivery), Application Services (Distribution), Servers (Execution)]
7
Examples of Linux-Based Grids: High Energy Physics

Production Run on the Integration Testbed
Simulate 1.5 million full CMS events for physics
studies: 500 sec per event on an 850 MHz processor
2 months continuous running across 5 testbed
sites
Managed by a single person at the US-CMS Tier 1
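The production-run figures above imply a rough processor count. A back-of-the-envelope sketch, assuming "2 months" means 60 days of uninterrupted running and ignoring failures and scheduling gaps:

```python
events = 1_500_000           # full CMS events simulated
seconds_per_event = 500      # on one 850 MHz processor, per the slide
days = 60                    # assumption: "2 months" taken as 60 days

cpu_seconds = events * seconds_per_event
wall_seconds = days * 24 * 3600
avg_busy_processors = cpu_seconds / wall_seconds

print(f"{cpu_seconds / (365 * 24 * 3600):.1f} CPU-years of work")          # 23.8 CPU-years of work
print(f"about {avg_busy_processors:.0f} processors busy around the clock")  # about 145 processors busy around the clock
```

Split evenly across the five testbed sites (another simplifying assumption), that is roughly 29 continuously busy processors per site.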

8
Examples of Linux-Based Grids: Earthquake Engineering
U. Nevada, Reno
www.neesgrid.org
9
Grid Technologies Community

Grid technologies developed since mid-90s
Product of work on resource sharing for
scientific collaboration; commercial adoption
Open source Globus Toolkit has emerged as a de
facto standard
International community of contributors
Thousands of deployments worldwide
Commercial support providers
Global Grid Forum serves as a community and
standards body
Home to recent OGSA work

10
The Emergence of Open Grid Standards
[Figure: functionality and standardization increasing over time, 1990 to 2010, from custom solutions toward open Grid standards]
11
Open Grid Services Infrastructure (OGSI)
Service requestor (e.g., user application)
Service registry: Register Service, Service discovery
Service factory: Create Service, Resource allocation
Service instances, each named by a Grid Service Handle
Service data, keep-alives, notifications, service invocation
Authentication and authorization are applied to all requests
Interactions standardized using WSDL and SOAP
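The interaction pattern in this diagram (register, discover, create, keep alive) can be sketched in a few lines. This is a hypothetical in-memory model, not the OGSI WSDL/SOAP interfaces: the class and method names, the `gsh://example.org/...` handle format, and the fixed 60-second lifetime are illustrative assumptions, and authentication and authorization are left out.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ServiceInstance:
    handle: str              # stands in for a Grid Service Handle (format is made up)
    service_type: str
    termination_time: float  # soft state: the instance expires unless kept alive


class Registry:
    """Hypothetical registry: factories register; requestors discover them."""

    def __init__(self):
        self._factories = {}

    def register(self, service_type, factory):
        self._factories[service_type] = factory

    def discover(self, service_type):
        return self._factories.get(service_type)


class Factory:
    """Hypothetical factory that creates service instances with a finite lifetime."""

    def __init__(self, service_type, lifetime):
        self.service_type = service_type
        self.lifetime = lifetime
        self.instances = {}
        self._ids = itertools.count(1)

    def create_service(self, now):
        handle = f"gsh://example.org/{self.service_type}/{next(self._ids)}"
        self.instances[handle] = ServiceInstance(handle, self.service_type, now + self.lifetime)
        return handle

    def keep_alive(self, handle, now):
        # Each keep-alive pushes the instance's termination time forward.
        self.instances[handle].termination_time = now + self.lifetime

    def reap(self, now):
        # Destroy instances whose lifetime ran out without a keep-alive.
        expired = [h for h, inst in self.instances.items() if inst.termination_time <= now]
        for h in expired:
            del self.instances[h]
        return expired


registry = Registry()
registry.register("compute", Factory("compute", lifetime=60.0))

factory = registry.discover("compute")      # service discovery
h1 = factory.create_service(now=0.0)        # create two instances at t = 0
h2 = factory.create_service(now=0.0)
factory.keep_alive(h1, now=50.0)            # only h1 is kept alive (until t = 110)
print(factory.reap(now=70.0))               # ['gsh://example.org/compute/2']
```

Soft-state lifetimes mean a requestor that disappears does not leak instances: anything not kept alive is eventually reclaimed.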
12
Open Grid Services Architecture

[Diagram: Virtual Integration Architecture, layers from top to bottom]
Users in Problem Domain X
Applications in Problem Domain X
Application Integration Technology for Problem Domain X
Generic Virtual Service Access and Integration Layer
OGSA
OGSI: Interface to Grid Infrastructure
Compute, Data and Storage Resources (distributed)
13
But It's Not Turtles All the Way Down

Our ability to deliver virtualized services
efficiently and with desired QoX ultimately
depends on the underlying platform!
At multiple levels, including but not limited to
Dynamic provisioning and resource management
Reliability, availability, manageability
Performance and parallelism
New demands on the OS in each area

14
(1) Dynamic Provisioning

Static provisioning dedicates resources
Typical of co-lo hosting
Reprovision manually as needed
But load is dynamic
Must overprovision for surges
High variable cost of capacity
Need dynamic provisioning to achieve true
economies of scale
Load multiplexing
Tradeoff cost vs. quality
Service level agreements
Dynamic resource recruitment
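Load multiplexing is the core of this argument: a statically provisioned setup must cover the sum of every service's peak, while a shared, dynamically provisioned pool only has to cover the peak of the combined load. A minimal sketch with two made-up load traces (the numbers are illustrative, not measurements):

```python
# Hypothetical request rates (requests/s) sampled every four hours for two
# hosted services whose peaks occur at different times of day.
web_shop = [100, 120, 400, 900, 300, 150]
batch_api = [800, 700, 200, 100, 250, 750]


def static_capacity(*traces):
    """Static provisioning: each service gets dedicated capacity for its own peak."""
    return sum(max(trace) for trace in traces)


def multiplexed_capacity(*traces):
    """Dynamic provisioning: a shared pool only has to cover the peak of the combined load."""
    return max(sum(loads) for loads in zip(*traces))


def mean_utilization(capacity, *traces):
    """Average fraction of the provisioned capacity that is actually used."""
    combined = [sum(loads) for loads in zip(*traces)]
    return sum(combined) / len(combined) / capacity


static = static_capacity(web_shop, batch_api)          # 1700
shared = multiplexed_capacity(web_shop, batch_api)     # 1000
print(f"static: {static} req/s, utilization {mean_utilization(static, web_shop, batch_api):.0%}")
print(f"shared: {shared} req/s, utilization {mean_utilization(shared, web_shop, batch_api):.0%}")
```

The saving depends entirely on how far apart the peaks fall; with perfectly aligned peaks the two capacities coincide, which is why cost/quality trade-offs and service level agreements still matter.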

15
Load Is Dynamic

ibm.com external site, February 2001
Daily fluctuations (3x)
Workday cycle
Weekends off
[Figure: load over one week, Monday to Sunday]

World Cup soccer site, May-June 1998 (ita.ee.lbl.gov)
Seasonal fluctuations
Event surges (11x)
[Figure: load over weeks 6 to 8]
16
For Example: Energy-Conscious Provisioning

Light load: concentrate traffic on a minimal set
of servers
Step down surplus servers to a low-power state
APM and ACPI
Activate surplus servers on demand
Wake-On-LAN
Browndown: provision for a specified energy
target
Even smarter: also manage air conditioning
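A minimal sketch of the policy on this slide, under loudly hypothetical numbers (per-server capacity, an 80% target utilization, active and standby wattages): keep just enough servers active for the current load, step the rest down, optionally cap the active count to meet an energy budget (browndown), and build the Wake-on-LAN magic packet used to bring a surplus server back. The function names are illustrative; only the magic-packet layout follows the Wake-on-LAN convention.

```python
import math


def servers_needed(load, per_server_capacity, target_utilization=0.8):
    """Fewest servers that carry `load` without exceeding the target utilization."""
    return max(1, math.ceil(load / (per_server_capacity * target_utilization)))


def plan(load, pool_size, per_server_capacity, active_watts, standby_watts, power_budget=None):
    """Return (active, standby) server counts; the standby servers are stepped down."""
    active = min(pool_size, servers_needed(load, per_server_capacity))
    if power_budget is not None:
        # Browndown: keep total draw within the budget, even if that means
        # running fewer servers than the load calls for (degraded service).
        affordable = (power_budget - pool_size * standby_watts) // (active_watts - standby_watts)
        active = max(0, min(active, affordable))
    return active, pool_size - active


def wol_magic_packet(mac):
    """Wake-on-LAN magic packet: six 0xFF bytes followed by the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return b"\xff" * 6 + mac_bytes * 16


print(plan(load=1000, pool_size=10, per_server_capacity=500, active_watts=200, standby_watts=10))
print(plan(load=2000, pool_size=10, per_server_capacity=500, active_watts=200, standby_watts=10,
           power_budget=1000))
packet = wol_magic_packet("00:11:22:33:44:55")   # hypothetical MAC address
print(len(packet))  # 102
```

The packet is normally sent as a UDP broadcast (port 9 is a common choice); the sketch only builds it. Stepping servers down through APM/ACPI is left to the operating system and is not modeled.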

17
Power Management via MUSE: IBM Trace Run (Before)
[Figure: power draw (watts), latency (ms), and throughput (requests/s) during the trace run]
MUSE: Jeff Chase et al., Duke University (SOSP
2001)
18
Power Management via MUSE: IBM Trace Run (After)
[Figure: the IBM trace run with MUSE power management applied]
19
Dynamic Provisioning: OS Issues

Hot plug memory, CPU, and I/O
For partitioning, core virtualization
capabilities
Security
Containment and data integrity in a virtualized
environment: user-mode Linux?
Scheduler improvements for resource and workload
management
Allocate for required resource consumption
Dynamic, sub-processor logical partitioning
Improved instrumentation and accounting
Determine actual resource consumption
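At user level, Linux already reports per-process consumption through the POSIX `getrusage()` call. A minimal sketch of metering a job's actual CPU time and memory, the kind of figure a provisioning system could schedule or bill against; the workload is hypothetical, and finer-grained kernel accounting (per partition or per service) is the improvement the slide asks for, not something this shows.

```python
import resource
import time


def measure(fn, *args):
    """Run fn(*args) and report the resources it actually consumed."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter()
    result = fn(*args)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_SELF)
    usage = {
        "wall_s": wall,
        "user_cpu_s": after.ru_utime - before.ru_utime,
        "sys_cpu_s": after.ru_stime - before.ru_stime,
        # Process-wide peak resident set size so far; reported in kilobytes on Linux.
        "peak_rss_kb": after.ru_maxrss,
    }
    return result, usage


def workload(n):
    # Hypothetical CPU-bound job standing in for a hosted service's work.
    return sum(i * i for i in range(n))


result, usage = measure(workload, 1_000_000)
for name, value in usage.items():
    print(f"{name:>12}: {value:.3f}" if isinstance(value, float) else f"{name:>12}: {value}")
```

Comparing `wall_s` with the CPU figures separates time spent computing from time spent waiting, which is exactly the distinction an allocator needs when it sizes a workload.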

20
(2) Reliability, Availability, Manageability

Error log and diagnostics frameworks
Foundation for automated error analysis and
recovery of distributed remote systems
Enable problem determination, automated
reconfiguration, localization of failure
Configuration management
Determine hardware configuration/inventory
Apply/remove service/support patches
Isolate failing components quickly
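A deliberately simple sketch of automated failure localization: count error records per (node, component) across remote systems and flag any pair that crosses a threshold. The record format, node and component names, and the threshold are all hypothetical; a real error-log framework would carry far richer diagnostics.

```python
from collections import Counter

# Hypothetical structured records gathered by an error-log framework from
# several remote nodes: (node, component, severity).
records = [
    ("node03", "disk:sdb", "error"),
    ("node07", "nic:eth1", "warning"),
    ("node03", "disk:sdb", "error"),
    ("node11", "mem:dimm2", "error"),
    ("node03", "disk:sdb", "error"),
    ("node07", "nic:eth1", "warning"),
]


def localize_failures(records, threshold=3, severities=("error",)):
    """Return (node, component) pairs whose error count reaches the threshold."""
    counts = Counter(
        (node, component) for node, component, severity in records if severity in severities
    )
    return [key for key, n in counts.most_common() if n >= threshold]


suspects = localize_failures(records)
print(suspects)  # [('node03', 'disk:sdb')]
```

A flagged pair is the natural input to the next steps the slide lists: automated reconfiguration around the component and applying or removing service patches.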

21
(3) Performance and Parallelism, e.g., Data Integration

Assume
Remote data at 1 GB/s
10 local bytes per remote byte
100 operations per byte

>1 GByte/s achievable today (FAST, 7 streams, LA to Geneva)
Local network: parallel computation at 1000 Gop/s, parallel I/O at 10 GB/s
Remote data: wide area link (end-to-end switched lambda?) at 1 GB/s
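The local requirements on this slide follow from its three assumptions. A short check, assuming the 100 operations apply to each local byte (the reading that reproduces the slide's 10 GB/s and 1000 Gop/s figures):

```python
# Assumptions stated on the slide
remote_gb_s = 1             # remote data rate over the wide area link (GB/s)
local_bytes_per_remote = 10
ops_per_byte = 100          # taken here to apply to every local byte

local_io_gb_s = remote_gb_s * local_bytes_per_remote   # parallel I/O the site must sustain
compute_gop_s = local_io_gb_s * ops_per_byte           # parallel computation required

print(f"parallel I/O: {local_io_gb_s} GB/s")    # parallel I/O: 10 GB/s
print(f"computation: {compute_gop_s} Gop/s")    # computation: 1000 Gop/s
```

Each gigabyte per second of wide-area bandwidth therefore multiplies into a tenfold I/O load and a thousandfold compute load locally, which is why the slide pairs the network with parallel I/O and parallel computation.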
22
Performance and Parallelism

Distributed/cluster/parallel file systems
Optimized TCP/IP stacks
Scheduling of computation and communication
Web100 configuration and instrumentation
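Much TCP tuning comes down to buffer sizing: to fill a long, fast path, TCP needs a window of at least bandwidth times round-trip time. A minimal sketch of that calculation and of requesting matching socket buffers; the 150 ms RTT is an assumption for illustration, not a figure from the talk.

```python
import socket


def bandwidth_delay_product(bandwidth_bytes_per_s, rtt_ms):
    """Bytes that must be in flight to keep the path full: bandwidth x round-trip time."""
    return bandwidth_bytes_per_s * rtt_ms // 1000


# Hypothetical path: 1 GB/s with an assumed 150 ms round-trip time.
bdp = bandwidth_delay_product(1_000_000_000, 150)
print(f"window needed: {bdp:,} bytes")  # window needed: 150,000,000 bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request buffers large enough for the whole window. Linux silently caps the
# request at net.core.wmem_max / net.core.rmem_max, and getsockopt() reports
# double the granted value, so check what the kernel actually allowed.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
print("granted SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("granted SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```

On more recent kernels, TCP buffer autotuning often makes this manual step unnecessary, and setting SO_SNDBUF or SO_RCVBUF explicitly turns autotuning off for that socket.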

23
Web100: Overcome the TCP/IP Wizard Gap
24
Web100 Kernel Instrument Set

Definition
Set of instruments designed to collect as much of
the information as possible to enable a user to
isolate the performance problems of a TCP
connection
How it is implemented
Each instrument is a variable in a "stats"
structure that is linked through the kernel
socket structure
Linux /proc interface is used to expose these
instruments outside the kernel
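Web100's own interface is not shown here. As a modern analogue, mainline Linux exposes a smaller per-connection statistics block, `struct tcp_info`, through the `TCP_INFO` socket option. A minimal Linux-only sketch that decodes its first 104 bytes; the field layout follows `<linux/tcp.h>`, and newer kernels append further fields that this ignores.

```python
import socket
import struct

TCP_INFO = getattr(socket, "TCP_INFO", 11)  # 11 is TCP_INFO in <linux/tcp.h>

# Leading part of struct tcp_info: eight one-byte fields, then 24 u32 fields.
_FMT = "8B24I"
_U32_FIELDS = (
    "rto", "ato", "snd_mss", "rcv_mss", "unacked", "sacked", "lost", "retrans",
    "fackets", "last_data_sent", "last_ack_sent", "last_data_recv",
    "last_ack_recv", "pmtu", "rcv_ssthresh", "rtt", "rttvar", "snd_ssthresh",
    "snd_cwnd", "advmss", "reordering", "rcv_rtt", "rcv_space", "total_retrans",
)


def tcp_info(sock):
    """Return per-connection TCP statistics (rtt and rttvar are in microseconds)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, struct.calcsize(_FMT))
    values = struct.unpack(_FMT, raw)
    info = {"state": values[0], "ca_state": values[1], "retransmits": values[2]}
    info.update(zip(_U32_FIELDS, values[8:]))
    return info


# Usage: a loopback connection, so the example needs no external network.
server = socket.create_server(("127.0.0.1", 0))
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()
client.sendall(b"hello")
conn.recv(5)
info = tcp_info(client)
print({k: info[k] for k in ("state", "snd_mss", "snd_cwnd", "rtt", "total_retrans")})
for s in (conn, client, server):
    s.close()
```

Where Web100 exposed its instruments through `/proc`, `TCP_INFO` is read per socket by the application that owns it; `ss -i` reports similar data for all connections.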

25
For Example

A recent transatlantic transfer showed frequent
drops in data rate
But no loss or retransmits
Web100 identified the problem as Linux "send stall"
congestion events
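This diagnosis rests on one inference: a throughput drop without loss or retransmissions points at the sending host rather than the network. A minimal sketch of that reasoning over hypothetical per-second samples; the counters and the 50% drop threshold are illustrative assumptions, not Web100's actual variables.

```python
# Hypothetical per-second samples for one connection:
# (throughput in MB/s, cumulative retransmitted segments, cumulative send stalls).
samples = [
    (110, 0, 0),
    (112, 0, 0),
    (48, 0, 1),
    (109, 0, 1),
    (111, 0, 1),
    (45, 0, 2),
]


def diagnose(samples, drop_ratio=0.5):
    """Flag sharp throughput drops and attribute each to the counter that moved."""
    findings = []
    for i in range(1, len(samples)):
        prev_rate, prev_retx, prev_stalls = samples[i - 1]
        rate, retx, stalls = samples[i]
        if rate >= drop_ratio * prev_rate:
            continue  # no sharp drop in this interval
        if retx > prev_retx:
            cause = "network loss (retransmissions)"
        elif stalls > prev_stalls:
            cause = "local send stall"
        else:
            cause = "unexplained"
        findings.append((i, cause))
    return findings


print(diagnose(samples))  # [(2, 'local send stall'), (5, 'local send stall')]
```

Without a per-connection stall counter, both drops would land in the "unexplained" bucket, which is the gap kernel instrumentation of this kind closes.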

26
Grid/Linux Cooperation: We Have Testbeds, Users, Applications
27
Evolution of the Server
[Figure: increased flexibility (and complexity) over time]
Significant implications for the underlying
operating system
28
Summary

The Grid community is creating middleware for
distributed resource and service sharing
Open source software for resource and service
virtualization, service management/integration
Motivated by wonderful applications
But we need help from the OS
Linux the next-generation Internet platform?
Could be, but significant evolution is required
to address provisioning and resource management,
availability and manageability, performance and
parallelism, and other issues
Grid community can provide testbeds, users,
requirements, applications

29
For More Information