1 From Clusters to Grids
- October 2003, Linköping, Sweden
- Andrew Grimshaw
- Department of Computer Science, University of Virginia
- CTO & Founder, Avaki Corporation
2 Agenda
- Grid Computing Background
- Legion
- Existing Systems & Standards
- Summary
3 Grid Computing
4 First, What is a Grid System?
- A Grid system is a collection of distributed resources connected by a network.
- Examples of distributed resources:
  - Desktops
  - Handheld hosts
  - Devices with embedded processing resources, such as digital cameras and phones
  - Tera-scale supercomputers
5 What is a Grid?
A grid is all about gathering together resources and making them accessible to users and applications.
- A grid enables users to collaborate securely by sharing processing, applications, and data across heterogeneous systems and administrative domains, for collaboration, faster application execution, and easier access to data.
- Compute Grids
- Data Grids
6 What are the characteristics of a Grid system?
- Ownership by Mutually Distrustful Organizations & Individuals
- Connected by Heterogeneous, Multi-Level Networks
- Different Security Requirements & Policies Required
- Different Resource Management Policies
- Potentially Faulty Resources
- Geographically Separated
- Resources are Heterogeneous
8 Technical Requirements of a Successful Grid Architecture
- Simple
- Secure
- Scalable
- Extensible
- Site Autonomy
- Persistence & I/O
- Multi-Language
- Legacy Support
- Single Namespace
- Transparency
- Heterogeneity
- Fault-Tolerance & Exception Management
Manage Complexity!!
9 Implication: Complexity is THE Critical Challenge
How should complexity be addressed?
10 As Application Complexity Increases, Differences Between the Systems Increase Dramatically
[Chart: high-level versus low-level solutions, plotting time/cost against robustness, each ranging from low to high]
11 The Importance of Integration in a Grid Architecture
- If separate pieces are used, then the programmer must integrate the solutions.
- If all the pieces are not present, then the programmer must develop enough of the missing pieces to support the application.
Bottom line: both raise the bar by putting the cognitive burden on the programmer.
12 Misconceptions about Grids
- Grids are simple cycle aggregation
- The current state of the art is essentially scheduling and queuing for CPU cluster management
- These definitions sell short the promise of Grid technology
- AVAKI believes grids are not just about aggregating and scheduling CPU cycles, but also about
  - Virtualizing many types of resources, internally and across domains
  - Empowering anyone to have secure access to any and all resources through easy administration
13 Compute Grid Categories
- Sons of SETI@home
  - United Devices, Entropia, Data Synapse
  - Low-end, desktop cycle aggregation
  - A hard sell in corporate America
- Cluster Load Management
  - LSF, PBS, SGE
  - High end; great for management of local clusters, but not well proven in multi-cluster environments
- As soon as you go outside of the local cluster to cross-domain, multi-cluster environments, the game changes dramatically with the introduction of three major issues:
  - Data
  - Security
  - Administration
To address these issues, you need a fully-integrated solution, or a toolkit to build one.
14 Typical Grid Scenarios
- Global Grids
  - Multiple enterprises, owners, platforms, domains, file systems, locations, and security policies
  - Legion, Avaki, Globus
- Enterprise Grids
  - Single enterprise; multiple owners, platforms, domains, file systems, locations, and security policies
  - Sun SGE EE, Platform MultiCluster
- Cluster / Departmental Grids
  - Single owner, platform, domain, file system, and location
  - Sun SGE, Platform LSF, PBS
- Desktop Cycle Aggregation
  - Desktop only
  - United Devices, Entropia, Data Synapse
15 What are grids being used for today?
- Multiple sites with multiple data sources (public and private)
- Need secure access to data and applications for sharing
- Have partnership relationships with other organizations: internal, partners, or customers
- Computationally challenging applications
- Distributed R&D groups across companies, networks, and geographies
- Staging large files
- Want to utilize and leverage heterogeneous compute resources
- Need for accounting of resources
- Need to handle multiple queuing systems
- Considering purchasing compute cycles for spikes in demand
16 Legion
17 Legion Grid Software
Wide-area access to data, processing, and application resources in a single, uniform operating environment that is secure and easy to administer.
[Diagram: users and applications access the Legion grid, which spans servers, desktops, and clusters (each with local load management and queuing, applications, and data) across Department A, Department B, a partner, and a vendor]
18 Legion Combines Data and Compute Grid
[Diagram: users and applications reach both compute and data resources through the Legion grid, spanning servers, desktops, and clusters across Department A, Department B, a partner, and a vendor]
19 The Legion Data Grid
20 Data Grid
Wide-area access to data at its source location based on business policies, eliminating manual copying and errors caused by accessing out-of-date copies.
[Diagram: users and applications access data on servers, desktops, and clusters through the Legion grid, across Department A, Department B, a partner, and a vendor]
21 Data Grid: Share
The Legion Data Grid transparently handles client and application requests, maps them to the global namespace, and returns the data. Data is mapped into the grid namespace via Legion ExportDir.
[Diagram: Linux, NT, and Solaris hosts at headquarters, an informatics partner, a tools vendor, and a research center all export data into the shared namespace]
22 Data Grid: Access
- Access files using the standard NFS protocol or Legion commands
  - NFS security issues eliminated
  - Caches exploit semantics
- Access files using a global name
- Access based on specified privileges
[Diagram: users and applications at headquarters, an informatics partner, a tools vendor, and a research center access shared files (sequence_a, sequence_b, sequence_c) and applications (App_A, BLAST) on clusters and servers such as HQ-1, RD-2, and PM-1]
23 Data Grid: Access using virtual NFS
- Complexity: servers + clients
- Clients mount the grid
- Servers share files to the grid
- Clients access data using the NFS protocol
- Wide-area access to data outside the administrative domain
[Diagram: clients in Department A and Department B access files (sequence_a, sequence_c) shared by a partner]
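Because the grid is exposed through the standard NFS protocol, unmodified applications see grid files as ordinary local files. A minimal sketch of that idea, using a local temporary directory to stand in for a hypothetical grid mount point (the paths and file names here are invented for illustration):

```python
import os
import tempfile

# Stand-in for an NFS-mounted grid namespace; a real deployment would
# mount the grid at a path such as /grid (hypothetical mount point).
grid_mount = tempfile.mkdtemp(prefix="grid_")

# A server in another administrative domain has shared this file
# into the global namespace.
shared = os.path.join(grid_mount, "sequence_a")
with open(shared, "w") as f:
    f.write(">seq_a\nACGT\n")

# An unmodified client application reads it with plain POSIX I/O;
# location and administrative domain are hidden by the namespace.
with open(shared) as f:
    data = f.read()
assert data.startswith(">seq_a")
```

Because access is plain file I/O, legacy and commercial applications need no code changes, which is the main point of the virtual NFS interface.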
24 Keeping Data in the Grid
- Legion storage servers
  - Data is copied into Legion storage servers that execute on a set of hosts.
  - The particular set of hosts used is a configuration option; here five hosts are used.
  - Access to the different files is completely independent and asynchronous.
  - Very high sustained read/write bandwidth is possible using commodity resources.
[Diagram: five hosts, each with local disk, acting as storage servers]
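The properties above can be illustrated with a small striping sketch. This is not Legion's actual storage-server implementation, just a toy model (all names invented) of why spreading blocks across independent servers permits parallel, asynchronous access and high aggregate bandwidth:

```python
# Toy round-robin striping across independent storage servers.
NUM_SERVERS = 5     # matches the five hosts in the slide
BLOCK_SIZE = 4      # tiny blocks so the example is easy to follow

# Each "server" is just an in-memory dict here (illustration only).
servers = [dict() for _ in range(NUM_SERVERS)]

def write_file(name, payload):
    """Split payload into blocks and spread them round-robin."""
    blocks = [payload[i:i + BLOCK_SIZE]
              for i in range(0, len(payload), BLOCK_SIZE)]
    for idx, block in enumerate(blocks):
        # Block idx lands on server idx mod NUM_SERVERS, so consecutive
        # blocks live on different hosts and can be served in parallel.
        servers[idx % NUM_SERVERS][(name, idx)] = block
    return len(blocks)

def read_file(name, nblocks):
    """Reassemble a file by fetching each block from its server."""
    return b"".join(servers[i % NUM_SERVERS][(name, i)]
                    for i in range(nblocks))

n = write_file("genome.dat", b"ACGTACGTACGTACGTAC")
assert read_file("genome.dat", n) == b"ACGTACGTACGTACGTAC"
```

Since each block request goes to a different server, clients can issue them concurrently, which is how commodity hosts add up to high sustained bandwidth.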
25 I/O Performance
Read performance with NFS, Legion-NFS, and the Legion I/O libraries. The x-axis indicates the number of clients that simultaneously perform 1 MB reads on 10 MB files, and the y-axis indicates total read bandwidth. All results are the average of multiple runs. All clients ran on 400 MHz Intel machines, with the NFS server on an 800 MHz Intel server.
26 Data Grid Benefits
- Easy, convenient, wide-area access to data regardless of location, administrative domain, or platform
- Eliminates time-consuming copying and obtaining accounts on machines where data resides
- Provides access to the most recent data available
- Eliminates confusion and errors caused by inconsistent naming of data
- Caches remote data for improved performance
- Requires no changes to legacy or commercial applications
- Protects data with fine-grained security and limits access privileges to those required
- Eases data administration and management
- Eases migration to new storage technologies
27 The Legion Compute Grid
28 Compute Grid
Wide-area access to processing resources based on business policies, managing utilization of processing resources for fast, efficient job completion.
[Diagram: users and applications submit work through the Legion grid to servers, desktops, and clusters across Department A, Department B, a partner, and a vendor]
29 Compute Grid: Access
The grid:
- Locates resources
- Authenticates and grants access privileges
- Stages applications and data
- Detects failures and recovers
- Writes output to the specified location
- Accounts for usage
[Diagram: users and applications submit jobs (App_A, BLAST) through scheduling, queuing, usage management, accounting, and recovery services to an NT server (PM-1), a Solaris server (RD-2), and a Linux cluster (HQ-1), spanning headquarters, an informatics partner, a tools vendor, and a research center]
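The steps the grid performs can be sketched as a simple orchestration loop. This is only an illustration of the lifecycle (locate, authenticate, stage, run with recovery, account); the host names and failure model are invented, and real Legion scheduling is far more involved:

```python
import random

random.seed(7)                       # deterministic for the example

hosts = ["HQ-1", "RD-2", "linux-cluster"]   # invented host names
usage_log = []                       # accounting records

def locate():                        # 1. locate a resource
    return random.choice(hosts)

def authenticate(user):              # 2. grant access privileges
    return user in {"alice", "bob"}

def stage(app, data, host):          # 3. stage application and data
    return f"{app}+{data}@{host}"

def run(staged):                     # 4. execution may fail
    if random.random() < 0.5:
        raise RuntimeError("host failed")
    return f"output-of-{staged}"

def submit(user, app, data):
    if not authenticate(user):
        raise PermissionError(user)
    while True:                      # 5. detect failure, retry elsewhere
        host = locate()
        try:
            result = run(stage(app, data, host))
            usage_log.append((user, host))   # 6. account for usage
            return result
        except RuntimeError:
            continue

out = submit("alice", "BLAST", "sequence_a")
assert out.startswith("output-of-BLAST")
```

The point is that the retry loop and the accounting record are the grid's responsibility, not the application's: the user only calls `submit`.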
30 Tools: All are cross-platform
- legion_make: remote builds
- Fault-tolerant MPI libraries
- Post-mortem debugger
- Console objects
- Parallel 2D file objects
- Collections
- MPI
- Parameter-space studies (multi-run)
- Parallel C
- Parallel object-based Fortran
- CORBA binding
- Object migration
- Accounting
31 One Favorite
32 Related Work
33 Related Work
- Avaki
- All distributed systems literature
- Globus
- AFS/DFS
- LSF, PBS, …
- Global Grid Forum: OGSA
34 Avaki Company Background
- Grid pioneers: a Legion spin-off
- Over $20M capitalization
- The only commercial grid software provider with a solution that addresses data access, security, and compute power challenges
- Standards efforts leader
[Logos: standards organizations, partners, customers]
35 AFS/DFS Comparison with the Legion Data Grid
- AFS presumes that all files are kept in AFS: no federation with other file systems. Legion allows data to be kept in Legion, or in an NFS, XFS, PFS, or Samba file system.
- AFS presumes all sites use Kerberos and that realms trust each other. Legion assumes nothing about the local authentication mechanism, and there is no need for cross-realm trust.
- AFS semantics are fixed (copy on open). Legion can support multiple semantics; the default is Unix semantics.
- AFS is volume-oriented (sub-trees). Legion can be volume-oriented or file-oriented.
- AFS caching semantics are not extensible. Legion caching semantics are extensible.
36 Legion & Globus GT2
- Projects with many common goals
  - Metacomputing (or the Grid)
  - Middleware for wide-area systems
  - Heterogeneous resource sets
  - Disjoint administrative domains
  - High-performance, large-scale applications
37 Legion-Specific Goals
- Shared collaborative environment, including a shared file system
- Fault-tolerance and high availability
- Both HPC applications and distributed applications
- Complete security model, including access control
- Extensible
- Integrated: create a meta-operating system
38 Many Similar Features
- Resource Management Support
- Message-Passing Libraries (e.g., MPI)
- Distributed I/O Facilities
  - Globus GASS/remote I/O vs. the Avaki Data Grid
- Security Infrastructure
39 Globus
- The toolkit approach
  - Provides services as separate libraries, e.g., Nexus, GASS, LDAP
- Pros
  - Decoupled architecture
  - Easy to add new services into the mix
  - Low buy-in: use only what you like! (In practice, all the pieces use each other.)
- Cons
  - No unifying abstractions
  - Very complex environment to learn in full
  - Composition of services becomes difficult as the number of services grows
  - Interfaces keep changing due to an ever-evolving design
  - Does not cover the space of problems
40 Standards: GGF
- Background
  - Grid standards are now being developed at the Global Grid Forum (GGF)
  - The in-development Open Grid Services Infrastructure (OGSI) standard will extend Web Services (SOAP/XML, WSDL, etc.)
    - Names and a two-level naming scheme
    - Factories and lifetime management
    - A mandatory set of interfaces, e.g., discovery interfaces
- OGSA: Open Grid Services Architecture
  - The over-arching architecture
  - Still in development
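The factory and lifetime-management pattern that OGSI specifies can be sketched as follows. This is only an illustration of the idea (a factory creates service instances with an explicit termination time, after which they can be reclaimed); the class and method names here are invented, not the OGSI interfaces themselves:

```python
import itertools

class ServiceInstance:
    def __init__(self, handle, termination_time):
        self.handle = handle              # stable, abstract name (cf.
        self.termination_time = termination_time  # two-level naming)

    def expired(self, now):
        return now >= self.termination_time

class Factory:
    """Creates instances and reclaims them when their lifetime ends."""
    _ids = itertools.count()

    def __init__(self):
        self.instances = {}

    def create(self, lifetime, now):
        handle = f"svc-{next(self._ids)}"
        inst = ServiceInstance(handle, now + lifetime)
        self.instances[handle] = inst
        return inst

    def sweep(self, now):
        """Destroy instances whose termination time has passed."""
        for h in [h for h, i in self.instances.items() if i.expired(now)]:
            del self.instances[h]

f = Factory()
a = f.create(lifetime=10, now=0)
b = f.create(lifetime=100, now=0)
f.sweep(now=50)          # a has expired, b survives
assert a.handle not in f.instances
assert b.handle in f.instances
```

Explicit, renewable lifetimes mean a grid of services cleans up after client failures instead of leaking instances forever, which is why OGSI makes lifetime management mandatory.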
41 Summary
- Grids are about resource federation and sharing
- Grids are here today. They are being used in production computing in industry to solve real problems and provide real value.
  - Compute Grids
  - Data Grids
- We believe that users want high-level abstractions, and don't want to think about the grid.
  - Need low activation energy and legacy support
- There are a number of challenges to be solved, and different applications and organizations want to solve them differently
  - Policy heterogeneity
  - Strong separation of policy and mechanism
- Several areas where really good policies are still lacking
  - Scheduling
  - Security and security policy interactions
  - Failure recovery (and the interaction of different policies)