Title: Fundamentals of Grid Computing
1Fundamentals of Grid Computing
- IBM Redbooks paper
- Viktors Berstis
- Presented by
- Saeed Ghanbari
2What is Grid Computing?
- The term "Grid computing" originated in the early
1990s as a metaphor for making computer power as
easy to access as an electric power grid.
- A widely cited definition of a Grid is given
by Ian Foster in his article "What is the Grid?":
- Computing resources are not administered
centrally.
- Open standards are used.
- Non-trivial quality of service is achieved.
- Plaszczak/Wellner define Grid technology as "the
technology that enables resource virtualization,
on-demand provisioning, and service (resource)
sharing between organizations."
- IBM: "A Grid is a type of parallel and
distributed system that enables the sharing,
selection, and aggregation of resources
distributed across multiple administrative
domains based on the resources' availability,
capacity, performance, cost and users'
quality-of-service requirements."
3Topics to be covered
- What grid computing can do
- Grid concepts and components
- Grid construction
- Using a grid
- A user's perspective
- An administrator's perspective
- An application developer's perspective
4What grid computing can do(1)
- Exploiting underutilized resources
- Computing
- Desktops are busy less than 5% of the time
- Even servers in many organizations are often
relatively idle
- Unused disk capacity
- Implications
- the application must be executable remotely and
without undue overhead.
- remote machine must meet any special hardware,
software, or resource requirements
- Parallel CPU capacity
- Subjobs on different machines
- Barriers often exist to perfect scalability.
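The scalability barrier noted above is often illustrated with Amdahl's law. The slides do not name it; it is brought in here only as an illustration. The sketch below assumes a job in which a fixed fraction of the work can be split into subjobs, and it ignores communication overhead. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, machines: int) -> float:
    """Ideal speedup when only `parallel_fraction` of a job can be split
    into subjobs running on `machines` machines (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    # The serial part cannot be spread over the grid, so it bounds the speedup.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / machines)
```

Under these assumptions, a job that is 90% parallel never runs more than 10 times faster, however many machines the grid contributes.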
5What grid computing can do(2)
- Applications
- Grid-enabled applications
- no practical tools for transforming arbitrary
applications to exploit the parallel capabilities
of a grid.
6What grid computing can do(3)
- Virtual resources and virtual organizations for
collaboration
- More capable than distributed computing
- Wider audience
- Open standards, hence highly heterogeneous
systems
- Data, equipment, software, services, licenses
- Several real and virtual organizations
7What grid computing can do(3)
- Access to additional resources
- special equipment, software, licenses, and other
services
- Resource balancing
8What grid computing can do(4)
- Reliability
- Now: redundancy in hardware
- Future: software
- Utilize autonomic computing
- Management
- More dispersed IT infrastructure
- Priority among projects
9Grid concepts and components(1)Types of resources
- Computation
- Storage
- Primary/secondary storage
- Mountable networked file system
- AFS, NFS, DFS, GPFS
- Capacity increase
- Uniform name space
- Data striping
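Data striping, the last storage item above, can be sketched as dealing fixed-size blocks round-robin across storage nodes so that reads and writes can proceed in parallel. This is a minimal in-memory sketch, not how any particular grid file system implements it; the names `stripe` and `unstripe` and the block size are assumptions made for illustration.

```python
def stripe(data: bytes, num_nodes: int, block_size: int) -> list:
    """Split `data` into fixed-size blocks and deal them round-robin
    across `num_nodes` storage nodes (each node is a list of blocks)."""
    if num_nodes < 1 or block_size < 1:
        raise ValueError("num_nodes and block_size must be at least 1")
    nodes = [[] for _ in range(num_nodes)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        nodes[block_index % num_nodes].append(data[offset:offset + block_size])
    return nodes


def unstripe(nodes: list) -> bytes:
    """Reassemble the original data by reading blocks in round-robin order."""
    blocks = []
    longest = max((len(node) for node in nodes), default=0)
    for row in range(longest):
        for node in nodes:
            if row < len(node):
                blocks.append(node[row])
    return b"".join(blocks)
```

Each node holds only every n-th block, so in a real system the blocks of one file could be fetched from several machines at once.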
10Grid concepts and components(2)Types of
resources (cont)
- Communications
- Redundant communication paths
- Software and licenses
- License management software
- Special equipment, capacities, architectures, and
policies
- different architectures, operating systems,
devices, capacities, and equipment.
- Jobs and applications
- Application is a collection of jobs
- Specific dependencies
11Grid concepts and components(3)Types of
resources (cont)
- Scheduling, reservation, and scavenging
- scheduler
- automatically finds the most appropriate machine
on which to run any given job
- scavenging
- each machine reports its idle status to the grid
management node.
- SETI@home: Search for Extraterrestrial
Intelligence at Home
- Reserved
- dedicated resources
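The interplay of scheduling, scavenging, and reservation can be sketched as follows. Machines report their idle status to a management node, dedicated (reserved) machines are always eligible, and the scheduler picks a machine for each job. All class and field names are hypothetical, and "most appropriate" is assumed here to mean the fastest eligible machine with enough free memory; real schedulers weigh many more factors.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Machine:
    name: str
    cpu_mhz: int
    free_memory_mb: int
    idle: bool = False      # updated when the machine reports its status
    reserved: bool = False  # dedicated machines are always eligible


@dataclass
class Job:
    name: str
    min_memory_mb: int


class Scheduler:
    """Toy grid management node that tracks machines and places jobs."""

    def __init__(self) -> None:
        self.machines: Dict[str, Machine] = {}

    def register(self, machine: Machine) -> None:
        self.machines[machine.name] = machine

    def report_idle(self, name: str, idle: bool) -> None:
        # Scavenging: a donor machine tells the management node it is idle.
        self.machines[name].idle = idle

    def pick_machine(self, job: Job) -> Optional[Machine]:
        # Eligible machines are idle or reserved and meet the job's needs.
        candidates = [
            m for m in self.machines.values()
            if (m.idle or m.reserved) and m.free_memory_mb >= job.min_memory_mb
        ]
        if not candidates:
            return None
        # Assumption: the fastest eligible CPU is the most appropriate one.
        return max(candidates, key=lambda m: m.cpu_mhz)
```

A job submitted while no donor is idle lands on a reserved machine, and moves to a faster scavenged machine only once that machine has reported itself idle.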
12Grid concepts and components(4)
- Intragrid to Intergrid
- cluster
- same hardware/software
- Intragrid
- heterogeneous machines/software
- multiple departments/same organization
- Intergrid
- heterogeneous machines/software
- multiple departments/multiple organizations
13Grid construction(1)Grid software components
- Management components
- resource accounting
- load sensors
- resource evaluation
- overall usage patterns
- autonomic computing
- Donor software
- each machine needs to enroll as a member of the
grid and install some software that manages the
grid's use of its resources
- authentication
- monitoring
- checkpointing / resuming
- Submission software
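The checkpointing and resuming duty of donor software can be illustrated with a small sketch. The job saves its progress after every step so that, when the donor machine is reclaimed, the work can continue later (possibly elsewhere) instead of starting over. The function name `run_job`, the JSON checkpoint format, and the `stop_after` parameter that simulates an interruption are all assumptions made for this example.

```python
import json
import os
import tempfile


def run_job(total_steps, checkpoint_path, stop_after=None):
    """Sum the integers 0..total_steps-1, checkpointing after each step.

    `stop_after` simulates the donor machine being reclaimed after that
    many steps in this run; the function then returns None.
    """
    state = {"next_step": 0, "partial_sum": 0}
    if os.path.exists(checkpoint_path):
        # Resume from the last saved state instead of starting over.
        with open(checkpoint_path) as f:
            state = json.load(f)
    steps_done = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_done >= stop_after:
            return None  # interrupted; the checkpoint holds the progress
        state["partial_sum"] += state["next_step"]
        state["next_step"] += 1
        steps_done += 1
        # Write to a temporary file first so an interruption never
        # leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["partial_sum"]
```

Calling `run_job` a second time with the same checkpoint path picks up where the interrupted run stopped.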
14Grid construction(2)Grid software components
(cont.)
- Distributed grid management
- hierarchy of clusters
- Schedulers
- job priority system
- react to immediate load
- monitor the progress of scheduled jobs
- re-submission
- reservation system
- meta-scheduler
- Communications
- jobs communicate with each other.
- The open standard Message Passing Interface (MPI)
15Using a grid A user's perspective(1)
- Enrolling and installing grid software
- authentication for security purposes
- certificate authority
- decide which resources to donate to the grid
- Logging onto the grid
- grid login ID
16Using a grid A user's perspective(2)
- Queries and submitting jobs
- staging the input data
- different architectures: multiple versions of
the program
- job execution
- sandbox
- collect results
- Data configuration
- data replication
- networked file system
- caching feature enabled
17Using a grid A user's perspective(3)
- Monitoring progress and recovery
- Degree of recovery for subjobs that fail
- Failures
- Programming error
- Hardware or power failure
- Communications interruption
- Excessive slowness
- Recovery
- Scheduler
- User
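The split between scheduler-driven and user-driven recovery listed above can be sketched like this. Machine-level failures (hardware, power, communications) are resubmitted automatically on another machine, while a programming error is left to propagate to the user, since rerunning the same faulty code would not help. The exception class `MachineFailure`, the function `run_with_recovery`, and the retry limit are hypothetical, and detection of excessive slowness (timeouts) is omitted for brevity.

```python
class MachineFailure(Exception):
    """Hypothetical error for hardware, power, or communications failures."""


def run_with_recovery(subjob, machines, max_attempts=3):
    """Run `subjob(machine)`, resubmitting on the next machine on failure.

    Returns (result, log). Raises RuntimeError once every attempt has
    failed, at which point recovery is left to the user. Any other
    exception (for example a programming error) is not retried.
    """
    log = []
    for attempt in range(max_attempts):
        # Rotate through the available machines on each resubmission.
        machine = machines[attempt % len(machines)]
        try:
            result = subjob(machine)
        except MachineFailure as exc:
            log.append((machine, f"failed: {exc}"))
            continue
        log.append((machine, "ok"))
        return result, log
    raise RuntimeError(f"subjob failed after {max_attempts} attempts: {log}")
```

The returned log gives the user a record of which machines failed before the subjob finally succeeded.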
18Using a grid An administrator's perspective(1)
- Planning
- Installation
- Managing enrollment of donors and users
- Certificate authority
- It is critical to ensure the highest levels of
security in a grid because the grid is designed
to execute code and not just share data
- Positively identify entities requesting
certificates
- Issuing, removing, and archiving certificates
- Protecting the certificate authority server
- Maintaining a namespace of unique names for
certificate owners
- Serve signed certificates to those needing to
authenticate entities
- Logging activity
19Using a grid An administrator's perspective(2)
- Resource management
- setting permissions
- Tracking resource usage
- Implementing a billing system
- policies to achieve better utilization
20Using a grid An application developer's
perspective(1)
- Applications that are not enabled for using
multiple processors but can be executed on
different machines.
- Applications that are already designed to use
the multiple processors of a grid setting.
- Applications that need to be modified or
rewritten to better exploit a grid
- Tools for debugging and measuring the behavior of
grid applications
21Using a grid An application developer's
perspective(2)
- Globus
- developer's toolkit
- Manage grid operations
- Measurement
- Repair
- Debug grid applications
- Open Grid Services Architecture (OGSA)
22A brief survey
23A quick survey
29Enabling Grids for E-sciencE (EGEE)
- CERN's new particle accelerator
- 15 petabytes (15 million gigabytes) a year
- a stack of CDs more than 20 km high!
- 200 sites around the globe
- Over 20 000 computers
- Running up to 30 000 jobs per day
- Has already been used for
- 300 000 chemical compounds in search of potential
drugs for flu
- Simulations of over 40 million potential drug
molecules against malaria
30Questions