Transcript and Presenter's Notes



1
CS Buzzwords / The Grid and the Future of
Computing
  • Scott A. Klasky
  • sklasky@pppl.gov

2
Why?
  • Why do you have to program in a language that
    doesn't let you program in equations?
  • Why do you have to care about the machine you are
    programming on?
  • Why do you care which machine your code runs on?
  • Why can't you visualize/analyze your data as soon
    as the data is produced?
  • Why do you run your codes at NERSC?
  • A silly question for those who use hundreds or
    thousands of processors.
  • Why don't the results of your analysis always get
    stored in a database?
  • Why can't the computer do the data analysis for
    you, and have it ask you questions?
  • Why are people still talking about vector
    computers?
  • I just don't have TIME!!!
  • COLLABORATION IS THE KEY!

3
Scott's view of computing (HYPE)
  • Why can't we program in high-level languages?
  • RNPL (Rapid Numerical Programming Language)
    http://godel.ph.utexas.edu/Members/marsa/rnpl/users_guide/node4.html
  • Mathematica/Maple
  • Use object-oriented programming to manage memory,
    state, etc.
  • This is the framework for your code.
  • You write modules in this framework.
  • Use F90/F77/C for the modules of the code.
  • These modules can be reused across multiple codes
    and multiple authors.
  • Compute fundamental variables on the main computers,
    other variables on secondary computers.
  • The Cactus code is a good example (2001 Gordon Bell
    Prize winner).
  • What are the benefits?
  • Let the CS people worry about memory management,
    data I/O, visualization, security, machine
    locations.
  • Why should you care about the machine you are
    running on?
  • All you should care about is running your code
    and getting accurate results as fast as
    possible.
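
"Programming in equations" in practice: the sketch below (not from the
slides; it assumes Python with NumPy) contrasts the equation-level view
with index-level loops by writing a 1D heat-equation update as a single
array expression.

    # A minimal sketch (assumption: Python + NumPy, neither named on this
    # slide) of "programming in equations": the 1D heat equation
    # u_t = alpha * u_xx written as one array-level update instead of
    # hand-coded loops over grid indices.
    import numpy as np

    alpha, dx, dt = 1.0, 0.01, 0.00002        # diffusivity, grid spacing, time step
    x = np.linspace(0.0, 1.0, 101)
    u = np.exp(-((x - 0.5) ** 2) / 0.01)      # initial Gaussian temperature profile

    for step in range(1000):
        # The update reads almost like the discretized equation itself.
        u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

    print("peak temperature after 1000 steps:", u.max())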

4
Buzzwords
  • Fortran, HPF, C, C++, Java
  • MPI, MPICH-G2, OpenMP
  • Python, Perl, Tcl/Tk
  • HTML, SGML, XML
  • JavaScript, DHTML
  • FLTK (Fast Light Toolkit)
  • The Grid
  • Globus
  • Web Services
  • Data Mining
  • WireGL, Chromium
  • AccessGrid
  • Portals (Discover Portal)
  • CCA
  • SOAP (Simple Object Access Protocol)
  • A way to create widely distributed, complex
    computing environments that run over the Internet
    using existing infrastructure.
  • It is about applications communicating directly
    with each other over the Internet in a very rich
    way (a minimal sketch follows this list).
  • HTC (High Throughput Computing)
  • Deliver large amounts of processing capacity over
    long periods of time.
  • Condor (http://www.cs.wisc.edu/condor/)
  • Goal: develop, implement, deploy, and evaluate
    mechanisms and policies that support High
    Throughput Computing (HTC) on large collections
    of distributively owned computing resources.
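
The sketch referenced above: at the wire level a SOAP call is just an XML
envelope POSTed over HTTP. This example uses only the Python standard
library; the service URL and the getTemperature method are hypothetical,
not part of any real service named in the slides.

    # Hypothetical endpoint and method; illustrates the wire format only.
    import urllib.request

    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <getTemperature xmlns="urn:example-simulation">
          <timestep>42</timestep>
        </getTemperature>
      </soap:Body>
    </soap:Envelope>"""

    request = urllib.request.Request(
        "http://example.org/simulation-service",       # hypothetical URL
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "urn:example-simulation#getTemperature"},
    )
    # urllib.request.urlopen(request) would send the call and return the
    # XML response; it is left commented out because the endpoint is made up.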

5
Cactus (http://www.cactuscode.org) (Allen,
Dramlitsch, Seidel, Shalf, Radke)
  • Modular, portable framework for parallel,
    multidimensional simulations
  • Construct codes by linking:
  • a small core (flesh) providing management services, and
  • selected modules (thorns): numerical methods, grids and
    domain decompositions, visualization and steering, etc.
  • Custom linking/configuration tools
  • Developed for astrophysics, but not
    astrophysics-specific
  • They have:
  • Cactus Worms
  • Remote monitoring and steering of an application
    from any web browser
  • Streaming of isosurfaces from a simulation, which
    can then be viewed on a local machine
  • Remote visualization of 2D slices of any grid
    function in a simulation, as JPEGs in a web
    browser
  • Accessible MPI-based parallelism for finite
    difference grids
  • Access to a variety of supercomputing
    architectures and clusters
  • Several parallel I/O layers
  • Fixed and adaptive mesh refinement under
    development
  • Elliptic solvers
  • Parallel interpolators and reductions
  • Metacomputing and distributed computing
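
The flesh/thorns split above is the heart of the design. The toy sketch
below (plain Python, not the real Cactus API, which is C/Fortran based)
shows the idea: a tiny core owns shared state and schedules independently
written modules.

    # Not the real Cactus interface; a toy flesh-and-thorns sketch.
    class Flesh:
        """Small core: owns shared state, registers and schedules thorns."""
        def __init__(self):
            self.state = {"time": 0.0, "fields": {}}
            self.thorns = []

        def register(self, thorn):
            self.thorns.append(thorn)

        def evolve(self, steps, dt):
            for _ in range(steps):
                for thorn in self.thorns:      # every thorn sees the shared state
                    thorn(self.state, dt)
                self.state["time"] += dt

    def wave_thorn(state, dt):
        # Stand-in numerical kernel; a real thorn updates grid functions.
        state["fields"]["phi"] = state["fields"].get("phi", 1.0) * (1.0 - 0.1 * dt)

    def monitor_thorn(state, dt):
        # Stands in for Cactus-style monitoring/output thorns.
        print(f"t = {state['time']:.2f}, phi = {state['fields']['phi']:.4f}")

    flesh = Flesh()
    flesh.register(wave_thorn)
    flesh.register(monitor_thorn)
    flesh.evolve(steps=3, dt=0.1)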

6
Discover Portal
  • http://tassl-pc-5.rutgers.edu/discover/main.php
  • Discover is a virtual, interactive, and
    collaborative problem-solving environment (PSE).
  • It enables geographically distributed scientists and
    engineers to collaboratively monitor and control
    high-performance parallel/distributed
    applications using web-based portals.
  • Its primary objective is to transform
    high-performance simulation into true research
    and instructional modalities:
  • Bring large distributed simulations to the
    scientist's/engineer's desktop by providing
    collaborative web-based portals for interaction
    and control.
  • It provides a 3-tier architecture composed of
    detachable thin clients at the front end, a
    network of web servers in the middle, and a
    control network of sensors, actuators, and
    interaction agents superimposed on the
    application at the back end.
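
The back-end "sensors and actuators" idea can be pictured with a few lines
of standard-library Python (this is not the Discover code base; the
parameter names and port are made up): a running application exposes its
parameters over HTTP so a web portal can monitor and steer them.

    # Hypothetical sketch of a steering back end, not Discover itself.
    import json
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    parameters = {"timestep": 0, "viscosity": 0.01}    # shared with the simulation

    class SteeringHandler(BaseHTTPRequestHandler):
        def do_GET(self):                              # "sensor": read current state
            body = json.dumps(parameters).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

        def do_POST(self):                             # "actuator": steer a parameter
            length = int(self.headers.get("Content-Length", 0))
            parameters.update(json.loads(self.rfile.read(length)))
            self.send_response(204)
            self.end_headers()

    threading.Thread(
        target=HTTPServer(("localhost", 8080), SteeringHandler).serve_forever,
        daemon=True,
    ).start()
    # The simulation loop would keep updating parameters["timestep"] while
    # remote portal clients read and adjust the values through this server.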

7
MPICH-G2 (http://www.hpclab.niu.edu/mpi/)
  • What is MPICH-G2?
  • It is a grid-enabled implementation of the MPI v1.1
    standard.
  • Using Globus services (job startup, security),
    MPICH-G2 allows you to couple multiple machines.
  • MPICH-G2 automatically converts data in messages
    sent between machines of different architectures,
    and supports multiprotocol communication by
    automatically selecting TCP for intermachine
    messaging and vendor-supplied MPI for
    intramachine messaging.
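
Because MPICH-G2 implements the standard MPI interface, an ordinary MPI
program needs no changes to run across coupled machines. A minimal example
is sketched below using the mpi4py bindings (an assumption on our part;
the codes discussed here would typically use Fortran or C against the same
API).

    # Standard MPI rank/size and point-to-point messaging via mpi4py.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        for source in range(1, size):
            print("rank 0 received:", comm.recv(source=source, tag=0))
    else:
        comm.send(f"hello from rank {rank} of {size}", dest=0, tag=0)

Launched with an MPI job starter (e.g. mpiexec -n 4 python hello_mpi.py),
the same code runs within one cluster or, under MPICH-G2, across machines,
with TCP or vendor MPI selected per link.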

8
Access Grid
Supporting group-to-group interaction across the
Grid (http://www.accessgrid.org). Over 70 AG sites
(PPPL will be next!)
  • Extending the Computational Grid
  • Group-to-group interactions are different from
    and more complex than individual-to-individual
    interactions.
  • Large-scale scientific and technical
    collaborations often involve multiple teams
    working together.
  • The Access Grid concept complements and extends
    the concept of the Computational Grid.
  • The Access Grid project aims at exploring and
    supporting this more complex set of requirements
    and functions.
  • An Access Grid node involves 3-20 people per
    site.
  • Access Grid nodes are designed spaces that
    support the high-end audio/video technology
    needed to provide a compelling and productive
    user experience.
  • The Access Grid consists of large-format
    multimedia display, presentation, and interaction
    software environments; interfaces to grid
    middleware; and interfaces to remote
    visualization environments.
  • With these resources, the Access Grid supports
    large-scale distributed meetings, collaborative
    teamwork sessions, seminars, lectures, tutorials,
    and training.
  • Providing New Capabilities
  • The Alliance Access Grid project has prototyped a
    number of Access Grid Nodes and uses these nodes
    to conduct remote meetings, site visits, training
    sessions and educational events.
  • Capabilities will include:
  • high-quality multichannel digital video and
    audio,
  • prototypic large-format displays,
  • integrated presentation technologies (PowerPoint
    slides, MPEG movies, shared OpenGL windows),
  • prototypic recording capabilities,
  • integration with Globus for basic services
    (directories, security, network resource
    management),
  • macroscreen management,
  • integration of local desktops into the Grid,
  • multiple-session capability.

9
Access Grid
10
Chromium
  • http://graphics.stanford.edu/humper/chromium_documentation/
  • Chromium is a new system for interactive
    rendering on clusters of workstations.
  • It is a completely extensible architecture, so
    parallel rendering algorithms can be
    implemented on clusters with ease.
  • We are still using WireGL, but will be switching
    to Chromium.
  • Basically, it will allow us to run a program
    that uses OpenGL and have it display on a
    cluster-driven tiled display wall.
  • There are parallel APIs!

11
Common Component Architecture (http://www.acl.lanl.gov/cca/)
  • Goal: provide interoperable components and
    frameworks for rapid construction of complex,
    high-performance applications.
  • The CCA is needed because existing component
    standards (EJB, CORBA, COM) are not designed for
    large-scale, high-performance computing or
    parallel components.
  • The CCA will leverage existing standards'
    infrastructure such as name services, event
    models, builders, security, and tools.
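
A toy illustration of the component idea (this is not the CCA
specification; the class and port names are made up): independently
written pieces declare what they provide and what they use, and a small
framework wires them together, so applications are assembled rather than
hand-coupled.

    # Hypothetical component wiring, in the spirit of (not conforming to) CCA.
    class LinearSolver:
        provides = "solver"
        def solve(self, rhs):
            return [x / 2.0 for x in rhs]      # stand-in for a real solver

    class HeatDriver:
        uses = "solver"
        def run(self):
            print("solution:", self.solver.solve([2.0, 4.0, 6.0]))

    def assemble(components):
        # Connect each declared "uses" dependency to a matching provider.
        providers = {getattr(c, "provides", None): c for c in components}
        for c in components:
            needed = getattr(c, "uses", None)
            if needed:
                setattr(c, needed, providers[needed])
        return components

    solver, driver = LinearSolver(), HeatDriver()
    assemble([solver, driver])
    driver.run()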

12
Requirements of Component Architectures for
High-Performance Computing
  • Component characteristics. The CCA will be used
    primarily for high-performance components of both
    coarse and fine grain, implemented according to
    different paradigms such as SPMD-style as well as
    shared memory multi-threaded models.
  • Heterogeneity. Whenever technically possible, the
    CCA should be able to combine within one
    multi-component application components executing
    on multiple architectures, implemented in
    different languages, and using different run-time
    systems. Furthermore, design priorities should be
    geared towards addressing the software needs most
    common in HPC environments; for example,
    interoperability with languages popular in
    scientific programming, such as Fortran, C, and
    C++, should be given priority.
  • Local and remote components. Whenever possible we
    would like to stage interoperability of both
    local and remote components and be able to
    seamlessly change interactions from local to
    remote. We will address the needs of both remote
    components running over a local area network and
    wide-area-network component applications;
    applications running over the HPC grid should be
    able to satisfy real-time constraints and interact
    with diverse supercomputing schedulers.
  • Integration. We will try to make the integration
    of components as smooth as possible. In general
    it should not be necessary to develop a component
    specially to integrate with the framework, or to
    rewrite an existing component substantially.
  • High performance. It is essential that the set of
    standard features agreed on contain mechanisms
    for supporting high-performance interactions;
    whenever possible we should be able to avoid
    extra copies, extra communication or
    synchronization, and encourage efficient
    implementations such as parallel data transfers.
  • Openness. The CCA specification should be open
    and used with open software. In HPC this
    flexibility is needed to keep pace with the
    ever-changing demands of the scientific
    programming world.

13
The Grid (http://www.globus.org)
  • The Grid Problem
  • Flexible, secure, coordinated resource sharing
    among dynamic collections of individuals,
    institutions, and resources
  • From "The Anatomy of the Grid: Enabling Scalable
    Virtual Organizations"
  • Enable communities (virtual organizations) to
    share geographically distributed resources as
    they pursue common goals -- assuming the absence
    of:
  • a central location,
  • central control,
  • omniscience,
  • existing trust relationships.

14
Elements of the Problem
  • Resource sharing
  • Computers, storage, sensors, networks, ...
  • Sharing is always conditional: issues of trust,
    policy, negotiation, payment, ...
  • Coordinated problem solving
  • Beyond client-server: distributed data analysis,
    computation, collaboration, ...
  • Dynamic, multi-institutional virtual organizations
  • Community overlays on classic organizational structures
  • Large or small, static or dynamic

15
Why Grids?
  • A biochemist exploits 10,000 computers to screen
    100,000 compounds in an hour
  • 1,000 physicists worldwide pool resources for
    petaop analyses of petabytes of data
  • Civil engineers collaborate to design, execute, and
    analyze shake-table experiments
  • Climate scientists visualize, annotate, and analyze
    terabyte simulation datasets
  • An emergency response team couples real-time
    data, weather models, and population data
  • A multidisciplinary analysis in aerospace couples
    code and data in four companies
  • A home user invokes architectural design
    functions at an application service provider
  • An application service provider purchases cycles
    from compute cycle providers
  • Scientists working for a multinational soap
    company design a new product
  • A community group pools members' PCs to analyze
    alternative designs for a local road

16
Online Access to Scientific Instruments
[Diagram: Advanced Photon Source pipeline -- real-time collection,
tomographic reconstruction, archival storage, and wide-area dissemination
to desktop VR clients with shared controls. DOE X-ray grand challenge:
ANL, USC/ISI, NIST, U. Chicago.]
17
Data Grids for High Energy Physics
Image courtesy Harvey Newman, Caltech
18
Broader Context
  • Grid Computing has much in common with major
    industrial thrusts
  • Business-to-business, Peer-to-peer, Application
    Service Providers, Storage Service Providers,
    Distributed Computing, Internet Computing
  • Sharing issues not adequately addressed by
    existing technologies
  • Complicated requirements: run program X at site
    Y subject to community policy P, providing access
    to data at Z according to policy Q
  • High performance: unique demands of advanced
    high-performance systems

19
Why Now?
  • Moore's-law improvements in computing produce
    highly functional end systems
  • The Internet and burgeoning wired and wireless
    networks provide universal connectivity
  • Changing modes of working and problem solving
    emphasize teamwork and computation
  • Network exponentials produce dramatic changes in
    geometry and geography

20
Network Exponentials
  • Network vs. computer performance
  • Computer speed doubles every 18 months
  • Network speed doubles every 9 months
  • Difference: an order of magnitude every 5 years
  • 1986 to 2000
  • Computers x 500
  • Networks x 340,000
  • 2001 to 2010
  • Computers x 60
  • Networks x 4000
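
The "order of magnitude every 5 years" figure follows directly from the
two doubling times quoted above; a quick check of the arithmetic:

    # Using only the doubling times on this slide.
    months = 5 * 12
    computer_growth = 2 ** (months / 18)    # ~10x in five years
    network_growth = 2 ** (months / 9)      # ~100x in five years
    print(f"computers: {computer_growth:.0f}x, "
          f"networks: {network_growth:.0f}x, "
          f"gap: {network_growth / computer_growth:.0f}x")
    # -> roughly a 10x (one order of magnitude) gap every five years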

Moore's Law vs. storage improvements vs. optical
improvements. Graph from Scientific American
(Jan. 2001) by Cleo Vilett; source: Vinod Khosla,
Kleiner, Caufield and Perkins.
21
The Globus Project: Making Grid Computing a
Reality
  • Close collaboration with real Grid projects in
    science and industry
  • Development and promotion of standard Grid
    protocols to enable interoperability and shared
    infrastructure
  • Development and promotion of standard Grid
    software APIs and SDKs to enable portability and
    code sharing
  • The Globus Toolkit: open-source, reference
    software base for building Grid infrastructure
    and applications
  • Global Grid Forum: development of standard
    protocols and APIs for Grid computing

22
One View of Requirements
  • Identity & authentication
  • Authorization policy
  • Resource discovery
  • Resource characterization
  • Resource allocation
  • (Co-)reservation, workflow
  • Distributed algorithms
  • Remote data access
  • High-speed data transfer
  • Performance guarantees
  • Monitoring
  • Adaptation
  • Intrusion detection
  • Resource management
  • Accounting & payment
  • Fault management
  • System evolution
  • Etc.
  • Etc.

23
Three Obstacles to Making Grid Computing Routine
  • New approaches to problem solving
  • Data Grids, distributed computing, peer-to-peer,
    collaboration grids, ...
  • Structuring and writing programs
  • Abstractions, tools
  • Enabling resource sharing across distinct
    institutions
  • Resource discovery, access, reservation,
    allocation; authentication, authorization,
    policy; communication; fault detection and
    notification

24
Programming & Systems Problems
  • The programming problem
  • Facilitate development of sophisticated apps
  • Facilitate code sharing
  • Requires programming environments: APIs, SDKs, tools
  • The systems problem
  • Facilitate coordinated use of diverse resources
  • Facilitate infrastructure sharing, e.g.,
    certificate authorities, information services
  • Requires systems: protocols, services
  • E.g., a port/service/protocol for accessing
    information or allocating resources

25
The Systems Problem: Resource-Sharing Mechanisms
That ...
  • Address security and policy concerns of resource
    owners and users
  • Are flexible enough to deal with many resource
    types and sharing modalities
  • Scale to large numbers of resources, many
    participants, many program components
  • Operate efficiently when dealing with large
    amounts of data and computation

26
Aspects of the Systems Problem
  • Need for interoperability when different groups
    want to share resources
  • Diverse components, policies, mechanisms
  • E.g., standard notions of identity, means of
    communication, resource descriptions
  • Need for shared infrastructure services to avoid
    repeated development and installation
  • E.g., one port/service/protocol for remote access
    to computing, not one per tool/application
  • E.g., Certificate Authorities are expensive to run
  • A common need for protocols and services

27
Hence, a Protocol-Oriented View of Grid
Architecture, that Emphasizes ...
  • Development of Grid protocols and services
  • Protocol-mediated access to remote resources
  • New services, e.g., resource brokering
  • "On the Grid" = speak Intergrid protocols
  • Mostly (extensions to) existing protocols
  • Development of Grid APIs and SDKs
  • Interfaces to Grid protocols and services
  • Facilitate application development by supplying
    higher-level abstractions
  • The (hugely successful) model is the Internet

28
The Data Grid Problem
  • Enable a geographically distributed community
    of thousands to perform sophisticated,
    computationally intensive analyses on petabytes
    of data

29
Major Data Grid Projects
Name (URL, Sponsor): Focus
  • Grid Application Dev. Software (hipersoft.rice.edu/grads, NSF):
    Research into program development technologies for Grid applications
  • Grid Physics Network (griphyn.org, NSF): Technology R&D for data
    analysis in physics experiments (ATLAS, CMS, LIGO, SDSS)
  • Information Power Grid (ipg.nasa.gov, NASA): Create and apply a
    production Grid for aerosciences and other NASA missions
  • International Virtual Data Grid Laboratory (ivdgl.org, NSF): Create an
    international Data Grid to enable large-scale experimentation on Grid
    technologies and applications
  • Network for Earthquake Eng. Simulation Grid (neesgrid.org, NSF):
    Create and apply a production Grid for earthquake engineering
  • Particle Physics Data Grid (ppdg.net, DOE Science): Create and apply
    production Grids for data analysis in high energy and nuclear physics
    experiments
  • TeraGrid (teragrid.org, NSF): U.S. science infrastructure linking four
    major resource sites at 40 Gb/s
  • UK Grid Support Center (grid-support.ac.uk, U.K. eScience): Support
    center for Grid projects within the U.K.
  • Unicore (BMBFT): Technologies for remote access to supercomputers
  • FusionGrid? (???): Link TBs of data from NERSC, generated by fusion
    codes, to clusters at PPPL
30
Data Intensive Issues Include
  • Harness potentially large numbers of data,
    storage, network resources located in distinct
    administrative domains
  • Respect local and global policies governing what
    can be used for what
  • Schedule resources efficiently, again subject to
    local and global constraints
  • Achieve high performance, with respect to both
    speed and reliability
  • Catalog software and virtual data

31
Data-Intensive Computing and Grids
  • The term "Data Grid" is often used
  • Unfortunate, as it implies a distinct
    infrastructure, which it isn't; but it is easy to say
  • Data-intensive computing shares numerous
    requirements with collaboration, instrumentation,
    computation, ...
  • Security, resource management, information services, etc.
  • Important to exploit commonalities, as it is very
    unlikely that multiple infrastructures can be
    maintained
  • Fortunately this seems easy to do!

32
Examples of Desired Data Grid Functionality
  • High-speed, reliable access to remote data
  • Automated discovery of the best copy of the data
  • Manage replication to improve performance
  • Co-schedule compute, storage, and network resources
  • Transparency with respect to delivered performance
  • Enforce access control on data
  • Allow representation of global resource
    allocation policies
  • Central question: how must Grid architecture be
    extended to support these functions?

33
Grid Protocols, Services, Tools: Enabling Sharing
in Virtual Organizations
  • Protocol-mediated access to resources
  • Mask local heterogeneities
  • Extensible to allow for advanced features
  • Negotiate multi-domain security and policy
  • "Grid-enabled" resources speak the protocols
  • Multiple implementations are possible
  • Broad deployment of protocols facilitates the
    creation of services that provide an integrated
    view of distributed resources
  • Tools use protocols and services to enable
    specific classes of applications

34
A Model Architecture for Data Grids
[Diagram: the application presents an attribute specification to a
metadata catalog, which yields a logical collection and logical file
name; a replica catalog maps this to multiple locations (replica
locations 1-3); replica selection, using performance information and
predictions from services such as the Network Weather Service and the
Metacomputing Directory Service, picks the selected replica; GridFTP
control and data channels then move the data among disk caches, disk
arrays, and tape libraries.]
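
The replica-selection box in this architecture reduces to a small
decision: given the candidate copies of a logical file and predicted
bandwidth to each, fetch the one expected to arrive first. The sketch
below uses hypothetical replica URLs and numbers (it is not a Globus
API).

    # Candidate copies of one logical file, with predicted bandwidths
    # such as a Network-Weather-Service-style monitor might supply.
    replicas = {
        "gsiftp://site-a.example.org/data/run42.h5": {"size_gb": 120, "predicted_mb_s": 40.0},
        "gsiftp://site-b.example.org/data/run42.h5": {"size_gb": 120, "predicted_mb_s": 85.0},
        "gsiftp://site-c.example.org/data/run42.h5": {"size_gb": 120, "predicted_mb_s": 12.5},
    }

    def select_replica(candidates):
        # Estimated transfer time = size / predicted bandwidth; take the minimum.
        def seconds(info):
            return info["size_gb"] * 1024 / info["predicted_mb_s"]
        return min(candidates, key=lambda url: seconds(candidates[url]))

    print("fetch from:", select_replica(replicas))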
35
Globus Toolkit Components
  • Two major Data Grid components
  • 1. Data Transport and Access
  • Common protocol
  • Secure, efficient, flexible, extensible data
    movement
  • Family of tools supporting this protocol
  • 2. Replica Management Architecture
  • Simple scheme for managing
  • multiple copies of files
  • collections of files
  • APIs and white papers: http://www.globus.org

36
Motivation for a Common Data Access Protocol
  • Existing distributed data storage systems:
  • DPSS, HPSS: focus on high-performance access;
    utilize parallel data transfer and striping
  • DFS: focus on high-volume usage, dataset
    replication, local caching
  • SRB: connects heterogeneous data collections;
    uniform client interface; metadata queries
  • Problems:
  • Incompatible (and proprietary) protocols
  • Each requires a custom client
  • Each partitions the available data sets and
    storage devices
  • Each protocol has a subset of the desired functionality

37
A Common, Secure, Efficient Data Access Protocol
  • Common, extensible transfer protocol
  • Common protocol means all can interoperate
  • Decouple low-level data transfer mechanisms from
    the storage service
  • Advantages
  • New, specialized storage systems are
    automatically compatible with existing systems
  • Existing systems have richer data transfer
    functionality
  • Interface to many storage systems
  • HPSS, DPSS, file systems
  • Plan for SRB integration

38
A Universal Access/Transport Protocol
  • Suite of communication libraries and related
    tools that support:
  • GSI and Kerberos security
  • Third-party transfers
  • Parameter set/negotiate
  • Partial file access
  • Reliability/restart
  • Large file support
  • Data channel reuse
  • All based on a standard, widely deployed protocol

39
And the Universal Protocol is GridFTP
  • Why FTP?
  • Ubiquity enables interoperation with many
    commodity tools
  • Already supports many desired features, easily
    extended to support others
  • Well understood and supported
  • We use the term GridFTP to refer to:
  • a transfer protocol which meets these requirements, and
  • a family of tools which implement the protocol.
  • Note: GridFTP > FTP
  • Note that, despite the name, GridFTP is not
    restricted to file transfer!
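
Since GridFTP builds on standard FTP, two of the features listed on the
previous slide -- partial file access and reliability/restart -- can be
pictured with plain Python ftplib, which exposes FTP's restart offset.
The host and path below are hypothetical; GSI security and third-party
transfers need real GridFTP clients and are not shown.

    # Resumable (partial) FTP retrieval via the standard REST offset.
    from ftplib import FTP

    def fetch_with_restart(host, remote_path, local_path, resume_at=0):
        with FTP(host) as ftp, open(local_path, "ab") as out:
            ftp.login()                                  # anonymous login
            # 'rest' asks the server to start sending at a byte offset,
            # which is how an interrupted transfer is resumed.
            ftp.retrbinary(f"RETR {remote_path}", out.write, rest=resume_at)

    # fetch_with_restart("ftp.example.org", "/pub/dataset.dat",
    #                    "dataset.dat", resume_at=1_048_576)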

40
Summary
[Summary diagram, elements: supercomputer; PPPL "petrel"; PPPL display
wall; web services running data analysis and data mining; Access Grid
running here with Chromium, XPLIT, SCIRun or VTK; CPU; AVS/Express; IDL;
HTTP/Access Grid docking.]