Title: Grid Basics CS7803 Notes
1 Grid Basics: CS-780-3 Notes
- Courtesy of Chaman Singh Verma
2Outline
- Introduction to Grid
- Grid applications
- Grid Architecture
- Synergy with other technologies
- Discussion
- Total Slides 49
- Paper authors: Ian Foster, Carl Kesselman,
Steven Tuecke, Jeffrey M. Nick
3Power Generation
- Past
- Until the end of the 19th century, power
generation was considered a local luxury. Only
the rich could generate power in their own backyards.
- Present
- We take electricity for granted, without
knowing its sources or the complexities of
distribution. We use the service and pay for it.
Many countries provide a high quality of service.
- Future
- We want to break the 19th-century model in
computer usage. We want to provide a service model
for computation and storage similar to that of
power generation.
4What is a Grid? Checklist
- A Grid is a system that
- Coordinates resources that are not subject to
centralized control (not for each single node)
- Uses standard, open, general-purpose protocols
and interfaces
- Provides a high quality of service
- Reference: What is the Grid? by Ian Foster
5Grid: A Virtual Organization
- The Grid resource-sharing paradigm has greater
scope than a P2P system. The Grid implicitly allows
direct access to computers, software, data and
any other resources.
- Both providers and consumers define clearly
what they will share, who can share, and the
conditions under which sharing will take place.
- A set of individuals and/or institutions
defined by such sharing rules forms what we call
a Virtual Organization.
6Grid: An Evolution, not a Revolution
Source: IBM Grid Computing
- The Grid can be seen as the latest and most
complete evolution of more familiar developments.
- Like the Web
- The Grid keeps complexity hidden:
multiple users enjoy a single unified experience.
- Unlike the Web
- It enables full collaboration toward
real business goals.
- Like Peer-to-Peer
- It allows users to share files.
- Unlike Peer-to-Peer
- Not only files, but everything that
can be shared.
- Like clusters and distributed computing
- It brings computing resources together.
- Unlike clusters and distributed computing
- The Grid can be geographically distributed
and heterogeneous.
- Like virtualization technologies
- It enables virtualization of IT
resources.
- Unlike virtualization technologies
- It can enable virtualization of vast
and disparate resources.
7Originally Targeted Applications
- What types of applications will the Grid be
used for?
- Distributed Supercomputing
- High-throughput Computing
- Cracking cryptosystems
- On-demand Computing
- NetSolve, large archives
- Data-Intensive Computing
- Sloan Digital Sky Survey, Weather forecasting
- Collaborative Computing
- Insors, GriPhyN, SciRUN
8Grid Problem Defined
- The Grid problem is defined as coordinated resource
sharing and problem solving in dynamic,
multi-institutional virtual organizations.
- This sharing raises many issues that were not
addressed by distributed computing, for example:
- How to structure flexible, transient
relationships.
- How to structure fine-grained access control over
resources while respecting local and global
policies.
- How to agree on quality of service, scheduling
and co-allocation.
9Top 500 Supercomputers (June 2003)
- Earth Simulator, NEC, Yokohama: 35.86 TFlops
- ASCI Q, LANL, Los Alamos, HP AlphaServer SC:
13.88 TFlops
- MCR Linux Cluster, LLNL, Livermore: 7.634 TFlops
- ASCI White, LLNL, Livermore, IBM SP Power3:
7.304 TFlops
- Seaborg, NERSC/LBNL, Berkeley, IBM SP Power3:
7.303 TFlops
Source: http://www.top500.org
10Latest News: Nov 8, 2003
- Virginia Tech's Big Mac has taken over 3rd
position. It consists of 1100 Macintosh computers
and reached 17 TFlops.
11General highlights from Top 500 (June 2003)
- 157 systems reported peak performance
above 1 TFlops.
- Total accumulated performance is 375 TFlops (up
from 293 TFlops).
- Entry-level performance is 245.1 GFlops (up from
195.8 GFlops).
- A total of 119 systems (up from 56) use Intel
processors.
- 149 systems are now labeled as clusters (up from
53).
- 23 of them are self-made (up from 14).
- Among the top 10, 7 are from the US, 2 from Japan
and 1 from France.
12Economics and Control
- These infrastructures are very expensive and
require years of hard work.
- The sheer force of economics requires that
these resources be kept under strict control and
be optimally utilized.
- Freedom is often costly and chaotic.
- This is the starting point of what we call Grid
Computing.
13Changing face of Enterprise Computing
- Most recent enterprise systems are
collections of heterogeneous resources.
- The quality of service traditionally associated with
mainframe-centric computing is now essential to
the effective conduct of e-business across
distributed resources, inside as well as outside
the enterprise.
- Recently there has been an upsurge of service
providers (SPs) of various types, such as web-hosting
SPs, storage SPs and application SPs.
- All of these require standardization.
14Bird's-Eye View
- In the next few slides, we will get a
broader picture, followed by technical details.
15Web Services Architecture
- Universal Description, Discovery and Integration
(UDDI): allows us to find Web Services that meet
certain requirements.
- Web Services Description Language (WSDL): Web
Services must be self-describing and should tell the
invoker which operations they support and how to
invoke them.
- Simple Object Access Protocol (SOAP): message
passing between client and server uses SOAP.
- Note: UDDI, WSDL, SOAP and HTTP are just
examples. Different implementations can use
different technologies.
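To make the SOAP message-passing step concrete, here is a minimal sketch that builds a request envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service namespace `urn:example:matrix`, the operation name `multiply` and the helper `build_request` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:matrix"  # hypothetical service namespace (assumption)


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_request("multiply", {"a": 6, "b": 7})
print(message)
```

In a real deployment the client stub would generate such a message from the WSDL description and send it over HTTP; the sketch only shows the shape of the payload.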
16A Typical Web Service Invocation
17End Users perspective
18Stateless machines
The above model is stateless. It cannot remember
what was done from one invocation to the
next. One client can interfere with another
client's operations.
19Factories
- The concept of factories solves the problems
mentioned earlier.
- Makes the Grid a stateful machine
- Create transient services
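The factory idea can be sketched in a few lines. This is an illustrative model only, not the Globus or OGSI API: the class names `CounterFactory` and `CounterInstance` and the handle format are assumptions. A persistent factory creates one transient instance per client, so each client's state is kept apart.

```python
import itertools


class CounterInstance:
    """A transient, stateful service instance owned by one client."""

    def __init__(self, handle):
        self.handle = handle
        self.total = 0

    def add(self, n):
        self.total += n
        return self.total


class CounterFactory:
    """A persistent service whose only job is to create instances."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create(self):
        # Hand back a unique handle; the client uses it on later calls.
        handle = f"counter-{next(self._ids)}"
        self.instances[handle] = CounterInstance(handle)
        return handle

    def lookup(self, handle):
        return self.instances[handle]

    def destroy(self, handle):
        del self.instances[handle]


factory = CounterFactory()
alice = factory.create()
bob = factory.create()
factory.lookup(alice).add(5)
factory.lookup(alice).add(2)
factory.lookup(bob).add(10)
print(factory.lookup(alice).total, factory.lookup(bob).total)
```

Because each client works through its own handle, one client's calls no longer disturb another's, which is the problem the stateless model could not avoid.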
20Web Service Application
Client and Server stubs are generated
automatically from the specifications.
21Technical Details
- Service: a service is a network-enabled entity
that provides a specific capability (for example,
the ability to move files, create processes or
verify access rights).
- A service is defined by its protocols and behavior.
- Grid services are defined by OGSA (Open Grid
Services Architecture). (Open Grid Forum)
- Grid services are specified by OGSI (Open Grid
Services Infrastructure).
- The Globus Toolkit is the most popular open
implementation of OGSA.
22Major Players in Grid Service World
23Example from NetSolve
- Suppose you want to multiply matrix A and
matrix B. There is a site which provides this
facility. You may want to integrate the function
directly into your software.
- request = netsolve(matmul, a, b)
- C = netsolve(wait, request)
24Nature of Grid Architecture
- Grid architecture is a set of protocols for the
establishment, management and usage of dynamic,
cross-organizational virtual organizations.
- The main issues in the architecture are
- Interoperability
- Standard Protocols
- Services
- Application Programming Interfaces (APIs) and
Software Development Kits (SDKs)
25Hourglass Model
- The narrow neck of the hourglass defines a small
set of core abstractions and protocols. It
consists of protocols for
- Connectivity
- Resource Management
- These protocols must be chosen so as to capture
the fundamental mechanisms of sharing across
many different resource types.
26Grid Architecture
- The Fabric layer implements the local,
resource-specific operations that occur on specific
resources.
- Connectivity protocols are concerned
with communication and authentication.
- Resource protocols are concerned with negotiating
access to individual resources.
- Collective protocols and services are concerned
with coordinating the use of multiple resources.
27General list of services
- Identity Authentication
- Authorization policy
- Resource discovery
- Resource characterization
- Resource allocation
- Co-reservation, workflow
- High-Speed data transfer
- Remote data access
- Performance guarantees
- Monitoring
- Adaptation
- Intrusion detection
- Resource Management
- Accounting and payment
- Fault management
28Resource Management
- At a minimum, the following resources should
be available for query:
- Computational
- Mechanisms for starting programs, monitoring
and controlling execution, advance
reservation, hardware and software
characteristics, and state information such as
current load.
- Storage
- Mechanisms for putting and getting files, and
state information such as available space and
bandwidth utilization.
- Network
- Mechanisms for control over resource
allocation for network transfers, and information
about network characteristics and load.
- Code Repositories
- Management of versioned source and object
code (CVS style).
- Catalogs
29Connectivity Layer
- This layer defines core communication and
authentication protocols.
- Communication protocols enable the exchange of
data between different fabric layers. They include
transport, routing and naming services.
- Authentication protocols build on
communication services to provide
cryptographically secure mechanisms for verifying
the identity of users and resources.
30Authentication Characteristics
- Single sign-on
- A single log-on should be sufficient for
access to multiple Grid resources.
- Delegation
- A program can run on the user's behalf.
- Integration with local security
- Example: Kerberos or Unix security
- User-based trust relationships
- If a user uses services from multiple
service providers at the same time, the security
mechanism should not require the resource
providers to cooperate and interact with
each other.
31Resource layer
- It is built on top of the communication
protocols. It defines protocols for
- Secure negotiation
- Initiation
- Monitoring
- Control
- Accounting
- Payment for sharing resources.
32Resource Layer
- Information protocols are used for obtaining
information about the structure and state of a
resource (current load, usage policy,
configuration, etc.).
- Management protocols are used to negotiate
access to a shared resource, specifying resource
requirements:
- Advance reservation
- Quality of service
- Operations to perform
33Collective Coordinating Multiple Resources
- Directory services
- A user may query for a resource by name and/or
by its attributes, such as type, availability and
load.
- Co-allocation, scheduling and brokering services
- allow VO participants to request
specific resources for a specific purpose and
duration.
- Monitoring and diagnostic services
- allow monitoring for resource failures, attacks,
overload, etc.
- Data replication services
- allow management of VO storage to maximize
data access performance with respect to some
metric such as response time, reliability or
cost.
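As an illustration of the directory-service bullet above, here is a minimal in-memory sketch of a query by name and by attributes. The `Resource` fields, the `Directory` class and the sample entries are all hypothetical; a real VO directory would be a networked service rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str        # e.g. "compute" or "storage"
    available: bool
    load: float      # fraction of capacity in use, 0.0 to 1.0


class Directory:
    """In-memory stand-in for a VO directory service."""

    def __init__(self, resources):
        self._resources = list(resources)

    def query(self, name=None, kind=None, available=None, max_load=None):
        """Return resources matching every criterion that was given."""
        matches = []
        for r in self._resources:
            if name is not None and r.name != name:
                continue
            if kind is not None and r.kind != kind:
                continue
            if available is not None and r.available != available:
                continue
            if max_load is not None and r.load > max_load:
                continue
            matches.append(r)
        return matches


directory = Directory([
    Resource("cluster-a", "compute", True, 0.35),
    Resource("cluster-b", "compute", True, 0.90),
    Resource("archive-1", "storage", True, 0.10),
    Resource("cluster-c", "compute", False, 0.00),
])
idle = directory.query(kind="compute", available=True, max_load=0.5)
print([r.name for r in idle])
```

A broker or co-allocation service could sit on top of such a query, picking among the matches according to VO policy.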
34Collective
- Grid-enabled programming systems
- enable familiar programming models to be used
in a Grid environment, using other Grid services
such as resource discovery and security.
- Example: Globus MPI
- Workload management and collaboration
- allow problem-solving environments.
- Software discovery
- allows selection of the best software
implementation and execution platform. Examples:
NetSolve and Ninf
- Accounting and payment services
- gather usage information for the purposes of
accounting and payment for services.
35Collective
36OGSA
- Building on both the Grid and Web Services
communities, OGSA defines uniform service
semantics, called Grid Services.
- OGSA defines a few persistent and many transient
services.
- OGSA defines interfaces for managing Grid service
instances.
- Factory, registry, discovery, lifetime
- OGSA defines interfaces and behavior for
- Reliable invocation, lifetime management,
discovery, authorization,
notification, upgradeability, concurrency,
manageability
- OGSA also defines WSDL interfaces and associated
conventions.
- Protocols for reliable and secure management of
distributed state.
37Need for service oriented view
- It allows us to address the need for standard
interface definitions, local/remote transparency
and adaptation to the local OS.
- It allows multiple protocol bindings, to
facilitate localized optimization of services.
- It simplifies virtualization, which in turn
allows consistent resource access across multiple
heterogeneous platforms.
- With a service-oriented view, we can partition the
interoperability problem into two sub-problems: the
definition of service interfaces and the
identification of protocols that can be used to
invoke a particular interface.
38Globus Toolkit
- The Globus Toolkit is an open-architecture,
open-source set of services and software
libraries that support Grids and Grid
applications.
- The toolkit addresses issues of security,
information discovery, resource management, data
management, communication, fault detection and
portability.
- GRAM: Grid Resource Allocation and
Management
- MDS: Monitoring and Discovery Service (originally
Metacomputing Directory Service)
- GSI: Grid Security Infrastructure
- The toolkit will be described in detail in
the next presentation, so I will skip any
further description.
39Nature of Service
- Services are location transparent.
- Services are created and destroyed dynamically.
- Services are stateful. Every service is assigned
a globally unique name, called a Grid Service
Handle (GSH).
- Grid services can change during their lifetime
(for example, to support new protocols).
40Web Services
- Web services are the basis for Grid services,
which are the cornerstones of OGSA and OGSI.
- Web services use simple Internet-based protocols
to address heterogeneous distributed computing.
- Web services define a technique for describing
the software components to be accessed, methods for
accessing them, and ways of discovering the
components.
- Web services are neutral with respect to language,
programming model and system software.
41Web Services
- At present, this term is overused and has
become a buzzword.
- There is a distinction between websites and Web
services. Although Web services rely on
web technologies, they have no relation to web
browsers and HTML.
- A website is for humans; Web services are for
software.
42Web Services
- RMI, CORBA, EJB, etc. are oriented towards
tightly coupled distributed systems, where the
client and server depend on each other.
Web services are oriented towards loosely coupled
systems, where the client might have no prior
knowledge of the Web service until it actually
invokes it.
43Web Services: Advantages and Disadvantages
- Web services are platform and language
independent, since they rely on XML.
- Most Web services use HTTP for transmitting
messages, and most Internet proxies and
firewalls do not block HTTP traffic.
- Overhead is high: transmitting XML is
expensive. No real-time application will use Web
services under this model.
- Lack of versatility: Web services currently
provide only basic services compared with CORBA.
44Service Lifetime Management
- Who terminates transient, stateful services?
- Under normal circumstances, termination is
requested by the service invoker, but in a
distributed system this is difficult: components
may fail and messages may be lost.
- OGSA solves this problem using soft state.
Every service is created with a specified
lifetime, which can be extended at the request of
a client or another Grid service. If no request
is received, the service is automatically terminated.
- Soft-state lifetime management avoids
- Explicit client teardown of complex state
- Resource leaks in the hosting environment.
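The soft-state idea can be sketched as follows. This is a simplified model, not the OGSI interface: the names `SoftStateService`, `HostingEnvironment`, `keepalive` and `sweep` are assumptions, and time is passed in explicitly so the behavior is easy to follow.

```python
class SoftStateService:
    """A service instance that lives only until its expiry time."""

    def __init__(self, name, now, lifetime):
        self.name = name
        self.expires_at = now + lifetime

    def keepalive(self, now, extension):
        # A client (or another service) asks for more time.
        self.expires_at = max(self.expires_at, now + extension)


class HostingEnvironment:
    """Holds instances and reclaims the ones nobody renewed."""

    def __init__(self):
        self.services = {}

    def create(self, name, now, lifetime):
        self.services[name] = SoftStateService(name, now, lifetime)
        return self.services[name]

    def sweep(self, now):
        """Terminate every instance whose lifetime has run out."""
        expired = [n for n, s in self.services.items() if s.expires_at <= now]
        for n in expired:
            del self.services[n]
        return expired


env = HostingEnvironment()
env.create("job-a", now=0, lifetime=10)
env.create("job-b", now=0, lifetime=10)
env.services["job-a"].keepalive(now=8, extension=10)  # job-a now expires at 18
first_sweep = env.sweep(now=12)    # job-b was never renewed
second_sweep = env.sweep(now=20)   # job-a's client has gone silent
print(first_sweep, second_sweep)
```

Note that no client ever has to send an explicit teardown message: a crashed client simply stops renewing, and the hosting environment reclaims the instance on its own.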
45Lifetime management
- OGSA has a SetTerminationTime operation within
the GridService interface.
- The use of absolute time in lifetime management
implies the existence of a global clock that is well
synchronized.
- The Network Time Protocol (NTP) provides a
standardized mechanism for clock synchronization
(to within tens of milliseconds).
46Upgradeability
- Services within a complex system must be
independently upgradeable.
- Versioning and compatibility between services
must be managed and expressed so that clients can
discover not only specific service versions
but also compatible services.
- OGSA defines conventions that allow us to
identify when a service changes and when those
changes are backward compatible with respect to
interface and semantics.
47Notification
- The OGSA notification framework allows clients to
register interest in being notified of particular
messages, using asynchronous, one-way delivery.
- OGSA defines common abstractions and interfaces
for NotificationSource and NotificationSink.
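A rough sketch of the source/sink pattern follows. Only the two role names, NotificationSource and NotificationSink, come from the slide; the method names `subscribe`, `notify` and `deliver` are assumptions, and delivery here is a plain in-process call rather than the asynchronous network delivery a real implementation would use.

```python
class NotificationSink:
    """Receives messages; it never replies to the source."""

    def __init__(self):
        self.received = []

    def deliver(self, topic, message):
        self.received.append((topic, message))


class NotificationSource:
    """Keeps track of which sinks are interested in which topics."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, sink):
        self._subscribers.setdefault(topic, []).append(sink)

    def notify(self, topic, message):
        # One-way delivery: the source does not wait for any result.
        for sink in self._subscribers.get(topic, []):
            sink.deliver(topic, message)


source = NotificationSource()
monitor = NotificationSink()
logger = NotificationSink()
source.subscribe("job-finished", monitor)
source.subscribe("job-finished", logger)
source.subscribe("job-failed", logger)

source.notify("job-finished", "job 17 done")
source.notify("job-failed", "job 18 crashed")
print(monitor.received)
print(logger.received)
```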
48Some myths (misunderstanding) about Grid Computing
- Grid is next generation Internet.
- The grid is a source of free cycles.
- Grid requires a distributed operating system.
- Grid requires a new programming model.
- Grid makes high-performance computing
superfluous.
49Distributed Computing Economics (Views of Jim
Gray)
- The following items have roughly the same price:
- one database access
- 10 bytes of Internet traffic
- 100,000 instructions
- 10 bytes of disk storage
- a megabyte of disk bandwidth
- The break-even point is 10,000 instructions per
byte of Internet traffic.
- This serves as a basis for deciding how to do
cost-effective Internet-based computing, such as
grid computing.
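The break-even rule can be written as a small check. The 10,000 instructions/byte threshold and the price equivalence (100,000 instructions for 10 bytes of traffic) come from the slide; the function names and the two sample jobs are hypothetical.

```python
BREAK_EVEN = 10_000  # instructions per byte of Internet traffic (from the slide)


def instructions_per_byte(instructions, bytes_moved):
    """Compute intensity: how much work is done per byte shipped."""
    return instructions / bytes_moved


def worth_moving(instructions, bytes_moved):
    """True when shipping the data costs no more than the computing it buys."""
    return instructions_per_byte(instructions, bytes_moved) >= BREAK_EVEN


# The price equivalence itself: 100,000 instructions vs. 10 bytes of traffic.
print(instructions_per_byte(100_000, 10))

# Hypothetical jobs: a compute-heavy one and a data-heavy one.
print(worth_moving(instructions=1e12, bytes_moved=1e6))  # 1,000,000 instr/byte
print(worth_moving(instructions=1e6, bytes_moved=1e6))   # 1 instr/byte
```

Jobs far above the threshold are good candidates for remote execution; jobs far below it are cheaper to run where the data already is.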
50How are the numbers computed?
- A box with a 2 GHz CPU and 2 GB RAM: $2,000
- A 200 GB disk, 100 accesses/s or 50 MB/s: $200
- A 1 Mbps WAN link: $100/month
- $1 is equivalent to
- 3.24 GB sent over the WAN (7.2 hours)
- 100 Tera CPU instructions (7.2 hours of CPU
time)
- 1 GB of disk
- 2.592 million database accesses (in 7.2 hours)
- 1.296 TB of disk bandwidth (in 7.2 hours)
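Several of these figures can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month and decimal units (1 GB = 10^9 bytes); both are assumptions, chosen because they reproduce the slide's numbers. It covers only the figures that follow directly from the 7.2-hour window that $1 buys on the WAN link.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: a 30-day month

# WAN link: 1 Mbps for $100/month.
wan_bytes_per_second = 1_000_000 / 8
wan_dollars_per_month = 100

# How long $1 keeps the link running.
seconds_per_dollar = SECONDS_PER_MONTH / wan_dollars_per_month
hours_per_dollar = seconds_per_dollar / 3600

# Data moved over the WAN in that window.
wan_gb_per_dollar = wan_bytes_per_second * seconds_per_dollar / 1e9

# Disk: 100 accesses/s or 50 MB/s, over the same window.
db_accesses = 100 * seconds_per_dollar
disk_bandwidth_tb = 50e6 * seconds_per_dollar / 1e12

print(f"{hours_per_dollar:.1f} hours of WAN time per dollar")
print(f"{wan_gb_per_dollar:.2f} GB sent over the WAN")
print(f"{db_accesses / 1e6:.3f} million database accesses")
print(f"{disk_bandwidth_tb:.3f} TB of disk bandwidth")
```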
51Cycle-based Computing is Almost Free
- The accumulated cycles in SETI@home amount to 54
Teraflops.
- Google freely provides a trillion searches a year
over the largest database (2 petabytes).
- Hotmail freely carries a trillion e-mails per
year.
- Amazon.com offers a free book search tool.
- Many well-known media sites offer free news.
- The maintenance costs paid are low and worthwhile.
52What is SETI@home?
- It uses millions of computers in homes and offices
worldwide to analyze radio signals from space.
- SETI, the Search for Extraterrestrial Intelligence,
aims to detect intelligent life outside Earth.
- It uses radio telescopes to listen for (collect)
narrow-bandwidth radio signals from space.
- Data analysis: (1) computing power spectra, (2)
finding candidate signals, (3) eliminating
meaningless signals.
- Embarrassingly parallel: CPU and data intensive,
but with infrequent communication. (The
high-bandwidth interconnects of supercomputers are
not necessary!)
53Who Is Paying for the Free Computing?
- Advertisers pay for it.
- Google, Hotmail and Amazon.com collect $1 from a
company, as profit, if its site is visited 1,000
times via these free services: Cost Per
thousand iMpressions (CPM).
- Big companies are eager to pay for maintenance.
- It is low-cost but very effective promotion.
- A Web site almost becomes the company's only
spokesman.
- SETI@home relies on cycles donated worldwide.
- It provided 1,300 years of free computing on
2/3/03.
54Cases for Grid Computing: at least 10,000 Ins/Byte
- A cryptographic search problem
- only a few KB of input/output, but days of
computing.
- A representative job submitted to SETI@home
- 12 hours of computing on 1/2 MB of input
- A CFD computation at Cornell
- 7 years of computing for 100 MB of input and 10 GB
of output.
- Making the animated movie Toy Story
- a 200 MB image takes several hours to render
(200,000-600,000 Ins/Byte).
55Grid Computing Should Follow the Economics
- Suitable applications can be very limited.
- It is a good solution to send a GB over the
Internet to save years of computing. It is not
economic to send a KB if the result can be computed
locally in a second.
- If Internet cost drops more slowly than Moore's
Law, the analysis becomes stronger.
- Over the past 40 years, network cost has fallen
much more slowly.
- Cluster computing has different economics
- Gbps Ethernet costs $200/port and delivers 50 MBps
- this is comparable to disk bandwidth cost, and
10,000 times lower than Internet costs (so the CFD
computation fits better on clusters).
56Opportunities for challenges
- It seems to me that most of the challenges in Grid
computing are related to the management or
development of applications that need the Grid.
- In my view, there are no challenging issues
that are specific to the Grid. Applications,
networking and Internet protocols are changing
orthogonally. Therefore the success of the Grid
depends on the success of its components.
- How successful will the Grid be in the future?
Well, I will keep mum about the future.