Transcript and Presenter's Notes



1
CS Buzzwords / The Grid and the Future of
Computing
  • Scott A. Klasky
  • sklasky@pppl.gov

2
Why?
  • Why do you have to program in a language that
    doesn't let you program in equations?
  • Why do you have to care about the machine you are
    programming on?
  • Why do you care which machine your code runs on?
  • Why can't you visualize/analyze your data as soon
    as the data is produced?
  • Why do you run your codes at NERSC?
  • A silly question for those who use hundreds or
    thousands of processors.
  • Why don't the results of your analysis always get
    stored in a database?
  • Why can't the computer do the data analysis for
    you, and have it ask you questions?
  • Why are people still talking about vector
    computers?
  • I just don't have TIME!!!
  • COLLABORATION IS THE KEY!

3
Scott's view of computing (HYPE)
  • Why can't we program in high-level languages?
  • RNPL (Rapid Numerical Programming Language)
    http://godel.ph.utexas.edu/Members/marsa/rnpl/users_guide/node4.html
  • Mathematica/Maple
  • Use object-oriented programming to manage memory,
    state, etc.
  • This is the framework for your code.
  • You write modules in this framework.
  • Use F90/F77/C for the modules of the code.
  • These modules can be reused across multiple codes
    and multiple authors.
  • Compute fundamental variables on the main computers,
    other variables on secondary computers.
  • The Cactus code is a good example (2001 Gordon Bell
    Prize winner).
  • What are the benefits?
  • Let the CS people worry about memory management,
    data I/O, visualization, security, machine
    locations.
  • Why should you care about the machine you are
    running on?
  • All you should care about is running your code
    and getting accurate results as fast as
    possible.
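
"Programming in equations" in practice: the sketch below (not from the
slides; it assumes Python with NumPy) contrasts the equation-level view
with index-level loops by writing a 1D heat-equation update as a single
array expression.

    # A minimal sketch (assumption: Python + NumPy, neither named on this
    # slide) of "programming in equations": the 1D heat equation
    # u_t = alpha * u_xx written as one array-level update instead of
    # hand-coded loops over grid indices.
    import numpy as np

    alpha, dx, dt = 1.0, 0.01, 0.00002        # diffusivity, grid spacing, time step
    x = np.linspace(0.0, 1.0, 101)
    u = np.exp(-((x - 0.5) ** 2) / 0.01)      # initial Gaussian temperature profile

    for step in range(1000):
        # The update reads almost like the discretized equation itself.
        u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

    print("peak temperature after 1000 steps:", u.max())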

4
Buzzwords
  • Fortran, HPF, C, C++, Java
  • MPI, MPICH-G2, OpenMP
  • Python, Perl, Tcl/Tk
  • HTML, SGML, XML
  • JavaScript, DHTML
  • FLTK (Fast Light Toolkit)
  • The Grid
  • Globus
  • Web Services
  • Data Mining
  • WireGL, Chromium
  • AccessGrid
  • Portals (Discover Portal)
  • CCA
  • SOAP (Simple Object Access Protocol)
  • A way to create widely distributed, complex
    computing environments that run over the Internet
    using existing infrastructure.
  • It is about applications communicating directly
    with each other over the Internet in a very rich
    way (a minimal sketch follows this list).
  • HTC (High Throughput Computing)
  • Deliver large amounts of processing capacity over
    long periods of time.
  • Condor (http://www.cs.wisc.edu/condor/)
  • Goal: develop, implement, deploy, and evaluate
    mechanisms and policies that support High
    Throughput Computing (HTC) on large collections
    of distributively owned computing resources.
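
The sketch referenced above: at the wire level a SOAP call is just an XML
envelope POSTed over HTTP. This example uses only the Python standard
library; the service URL and the getTemperature method are hypothetical,
not part of any real service named in the slides.

    # Hypothetical endpoint and method; illustrates the wire format only.
    import urllib.request

    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <getTemperature xmlns="urn:example-simulation">
          <timestep>42</timestep>
        </getTemperature>
      </soap:Body>
    </soap:Envelope>"""

    request = urllib.request.Request(
        "http://example.org/simulation-service",       # hypothetical URL
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "urn:example-simulation#getTemperature"},
    )
    # urllib.request.urlopen(request) would send the call and return the
    # XML response; it is left commented out because the endpoint is made up.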

5
Cactus (http://www.cactuscode.org) (Allen,
Dramlitsch, Seidel, Shalf, Radke)
  • Modular, portable framework for parallel,
    multidimensional simulations
  • Construct codes by linking:
  • a small core (flesh) providing management services, and
  • selected modules (thorns): numerical methods, grids and
    domain decompositions, visualization and steering, etc.
  • Custom linking/configuration tools
  • Developed for astrophysics, but not
    astrophysics-specific
  • They have:
  • Cactus Worms
  • Remote monitoring and steering of an application
    from any web browser
  • Streaming of isosurfaces from a simulation, which
    can then be viewed on a local machine
  • Remote visualization of 2D slices of any grid
    function in a simulation, as JPEGs in a web
    browser
  • Accessible MPI-based parallelism for finite
    difference grids
  • Access to a variety of supercomputing
    architectures and clusters
  • Several parallel I/O layers
  • Fixed and adaptive mesh refinement under
    development
  • Elliptic solvers
  • Parallel interpolators and reductions
  • Metacomputing and distributed computing
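
The flesh/thorns split above is the heart of the design. The toy sketch
below (plain Python, not the real Cactus API, which is C/Fortran based)
shows the idea: a tiny core owns shared state and schedules independently
written modules.

    # Not the real Cactus interface; a toy flesh-and-thorns sketch.
    class Flesh:
        """Small core: owns shared state, registers and schedules thorns."""
        def __init__(self):
            self.state = {"time": 0.0, "fields": {}}
            self.thorns = []

        def register(self, thorn):
            self.thorns.append(thorn)

        def evolve(self, steps, dt):
            for _ in range(steps):
                for thorn in self.thorns:      # every thorn sees the shared state
                    thorn(self.state, dt)
                self.state["time"] += dt

    def wave_thorn(state, dt):
        # Stand-in numerical kernel; a real thorn updates grid functions.
        state["fields"]["phi"] = state["fields"].get("phi", 1.0) * (1.0 - 0.1 * dt)

    def monitor_thorn(state, dt):
        # Stands in for Cactus-style monitoring/output thorns.
        print(f"t = {state['time']:.2f}, phi = {state['fields']['phi']:.4f}")

    flesh = Flesh()
    flesh.register(wave_thorn)
    flesh.register(monitor_thorn)
    flesh.evolve(steps=3, dt=0.1)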

6
Discover Portal
  • http://tassl-pc-5.rutgers.edu/discover/main.php
  • Discover is a virtual, interactive, and
    collaborative problem-solving environment (PSE).
  • It enables geographically distributed scientists and
    engineers to collaboratively monitor and control
    high-performance parallel/distributed
    applications using web-based portals.
  • Its primary objective is to transform
    high-performance simulation into true research
    and instructional modalities:
  • Bring large distributed simulations to the
    scientist's/engineer's desktop by providing
    collaborative web-based portals for interaction
    and control.
  • It provides a 3-tier architecture composed of
    detachable thin clients at the front end, a
    network of web servers in the middle, and a
    control network of sensors, actuators, and
    interaction agents superimposed on the
    application at the back end.
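
The back-end "sensors and actuators" idea can be pictured with a few lines
of standard-library Python (this is not the Discover code base; the
parameter names and port are made up): a running application exposes its
parameters over HTTP so a web portal can monitor and steer them.

    # Hypothetical sketch of a steering back end, not Discover itself.
    import json
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    parameters = {"timestep": 0, "viscosity": 0.01}    # shared with the simulation

    class SteeringHandler(BaseHTTPRequestHandler):
        def do_GET(self):                              # "sensor": read current state
            body = json.dumps(parameters).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

        def do_POST(self):                             # "actuator": steer a parameter
            length = int(self.headers.get("Content-Length", 0))
            parameters.update(json.loads(self.rfile.read(length)))
            self.send_response(204)
            self.end_headers()

    threading.Thread(
        target=HTTPServer(("localhost", 8080), SteeringHandler).serve_forever,
        daemon=True,
    ).start()
    # The simulation loop would keep updating parameters["timestep"] while
    # remote portal clients read and adjust the values through this server.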

7
MPICH-G2 (http://www.hpclab.niu.edu/mpi/)
  • What is MPICH-G2?
  • It is a grid-enabled implementation of the MPI v1.1
    standard.
  • Using Globus services (job startup, security),
    MPICH-G2 allows you to couple multiple machines.
  • MPICH-G2 automatically converts data in messages
    sent between machines of different architectures,
    and supports multiprotocol communication by
    automatically selecting TCP for intermachine
    messaging and vendor-supplied MPI for
    intramachine messaging.
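
Because MPICH-G2 implements the standard MPI interface, an ordinary MPI
program needs no changes to run across coupled machines. A minimal example
is sketched below using the mpi4py bindings (an assumption on our part;
the codes discussed here would typically use Fortran or C against the same
API).

    # Standard MPI rank/size and point-to-point messaging via mpi4py.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        for source in range(1, size):
            print("rank 0 received:", comm.recv(source=source, tag=0))
    else:
        comm.send(f"hello from rank {rank} of {size}", dest=0, tag=0)

Launched with an MPI job starter (e.g. mpiexec -n 4 python hello_mpi.py),
the same code runs within one cluster or, under MPICH-G2, across machines,
with TCP or vendor MPI selected per link.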

8
Access Grid
Supporting group-to-group interaction across the
Grid (http://www.accessgrid.org). Over 70 AG sites
(PPPL will be next!)
  • Extending the Computational Grid
  • Group-to-group interactions are different from
    and more complex than individual-to-individual
    interactions.
  • Large-scale scientific and technical
    collaborations often involve multiple teams
    working together.
  • The Access Grid concept complements and extends
    the concept of the Computational Grid.
  • The Access Grid project aims at exploring and
    supporting this more complex set of requirements
    and functions.
  • An Access Grid node involves 3-20 people per
    site.
  • Access Grid nodes are designed spaces that
    support the high-end audio/video technology
    needed to provide a compelling and productive
    user experience.
  • The Access Grid consists of large-format
    multimedia display, presentation, and interaction
    software environments; interfaces to grid
    middleware; and interfaces to remote
    visualization environments.
  • With these resources, the Access Grid supports
    large-scale distributed meetings, collaborative
    teamwork sessions, seminars, lectures, tutorials,
    and training.
  • Providing New Capabilities
  • The Alliance Access Grid project has prototyped a
    number of Access Grid Nodes and uses these nodes
    to conduct remote meetings, site visits, training
    sessions and educational events.
  • Capabilities will include:
  • high-quality multichannel digital video and
    audio,
  • prototypic large-format displays,
  • integrated presentation technologies (PowerPoint
    slides, MPEG movies, shared OpenGL windows),
  • prototypic recording capabilities,
  • integration with Globus for basic services
    (directories, security, network resource
    management),
  • macroscreen management,
  • integration of local desktops into the Grid,
  • multiple-session capability.

9
Access Grid
10
Chromium
  • http://graphics.stanford.edu/humper/chromium_documentation/
  • Chromium is a new system for interactive
    rendering on clusters of workstations.
  • It is a completely extensible architecture, so
    parallel rendering algorithms can be
    implemented on clusters with ease.
  • We are still using WireGL, but will be switching
    to Chromium.
  • Basically, it will allow us to run a program
    that uses OpenGL and have it display on a
    cluster-driven tiled display wall.
  • There are parallel APIs!

11
Common Component Architecture (http://www.acl.lanl.gov/cca/)
  • Goal: provide interoperable components and
    frameworks for rapid construction of complex,
    high-performance applications.
  • The CCA is needed because existing component
    standards (EJB, CORBA, COM) are not designed for
    large-scale, high-performance computing or
    parallel components.
  • The CCA will leverage existing standards'
    infrastructure such as name services, event
    models, builders, security, and tools.
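
A toy illustration of the component idea (this is not the CCA
specification; the class and port names are made up): independently
written pieces declare what they provide and what they use, and a small
framework wires them together, so applications are assembled rather than
hand-coupled.

    # Hypothetical component wiring, in the spirit of (not conforming to) CCA.
    class LinearSolver:
        provides = "solver"
        def solve(self, rhs):
            return [x / 2.0 for x in rhs]      # stand-in for a real solver

    class HeatDriver:
        uses = "solver"
        def run(self):
            print("solution:", self.solver.solve([2.0, 4.0, 6.0]))

    def assemble(components):
        # Connect each declared "uses" dependency to a matching provider.
        providers = {getattr(c, "provides", None): c for c in components}
        for c in components:
            needed = getattr(c, "uses", None)
            if needed:
                setattr(c, needed, providers[needed])
        return components

    solver, driver = LinearSolver(), HeatDriver()
    assemble([solver, driver])
    driver.run()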

12
Requirements of Component Architectures for
High-Performance Computing
  • Component characteristics. The CCA will be used
    primarily for high-performance components of both
    coarse and fine grain, implemented according to
    different paradigms such as SPMD-style as well as
    shared memory multi-threaded models.
  • Heterogeneity. Whenever technically possible, the
    CCA should be able to combine within one
    multi-component application components executing
    on multiple architectures, implemented in
    different languages, and using different run-time
    systems. Furthermore, design priorities should be
    geared towards addressing the software needs most
    common in HPC environments; for example,
    interoperability with languages popular in
    scientific programming, such as Fortran, C, and
    C++, should be given priority.
  • Local and remote components. Whenever possible we
    would like to stage interoperability of both
    local and remote components and be able to
    seamlessly change interactions from local to
    remote. We will address the needs of both remote
    components running over a local area network and
    wide-area-network component applications;
    applications running over the HPC grid should be
    able to satisfy real-time constraints and interact
    with diverse supercomputing schedulers.
  • Integration. We will try to make the integration
    of components as smooth as possible. In general
    it should not be necessary to develop a component
    specially to integrate with the framework, or to
    rewrite an existing component substantially.
  • High performance. It is essential that the set of
    standard features agreed on contain mechanisms
    for supporting high-performance interactions;
    whenever possible we should be able to avoid
    extra copies, extra communication or
    synchronization, and encourage efficient
    implementations such as parallel data transfers.
  • Openness. The CCA specification should be open
    and used with open software. In HPC this
    flexibility is needed to keep pace with the
    ever-changing demands of the scientific
    programming world.

13
The Grid (http://www.globus.org)
  • The Grid Problem
  • Flexible, secure, coordinated resource sharing
    among dynamic collections of individuals,
    institutions, and resources
  • From "The Anatomy of the Grid: Enabling Scalable
    Virtual Organizations"
  • Enable communities (virtual organizations) to
    share geographically distributed resources as
    they pursue common goals -- assuming the absence
    of:
  • a central location,
  • central control,
  • omniscience,
  • existing trust relationships.

14
Elements of the Problem
  • Resource sharing
  • Computers, storage, sensors, networks, ...
  • Sharing is always conditional: issues of trust,
    policy, negotiation, payment, ...
  • Coordinated problem solving
  • Beyond client-server: distributed data analysis,
    computation, collaboration, ...
  • Dynamic, multi-institutional virtual organizations
  • Community overlays on classic organizational structures
  • Large or small, static or dynamic

15
Why Grids?
  • A biochemist exploits 10,000 computers to screen
    100,000 compounds in an hour
  • 1,000 physicists worldwide pool resources for
    petaop analyses of petabytes of data
  • Civil engineers collaborate to design, execute, and
    analyze shake-table experiments
  • Climate scientists visualize, annotate, and analyze
    terabyte simulation datasets
  • An emergency response team couples real-time
    data, weather models, and population data
  • A multidisciplinary analysis in aerospace couples
    code and data in four companies
  • A home user invokes architectural design
    functions at an application service provider
  • An application service provider purchases cycles
    from compute cycle providers
  • Scientists working for a multinational soap
    company design a new product
  • A community group pools members' PCs to analyze
    alternative designs for a local road

16
Online Access to Scientific Instruments
[Diagram: Advanced Photon Source pipeline -- real-time collection,
tomographic reconstruction, archival storage, and wide-area dissemination
to desktop VR clients with shared controls. DOE X-ray grand challenge:
ANL, USC/ISI, NIST, U. Chicago.]
17
Data Grids for High Energy Physics
Image courtesy Harvey Newman, Caltech
18
Broader Context
  • Grid Computing has much in common with major
    industrial thrusts
  • Business-to-business, Peer-to-peer, Application
    Service Providers, Storage Service Providers,
    Distributed Computing, Internet Computing
  • Sharing issues not adequately addressed by
    existing technologies
  • Complicated requirements: run program X at site
    Y subject to community policy P, providing access
    to data at Z according to policy Q
  • High performance: unique demands of advanced
    high-performance systems

19
Why Now?
  • Moore's-law improvements in computing produce
    highly functional end systems
  • The Internet and burgeoning wired and wireless
    networks provide universal connectivity
  • Changing modes of working and problem solving
    emphasize teamwork and computation
  • Network exponentials produce dramatic changes in
    geometry and geography

20
Network Exponentials
  • Network vs. computer performance
  • Computer speed doubles every 18 months
  • Network speed doubles every 9 months
  • Difference: an order of magnitude every 5 years
  • 1986 to 2000
  • Computers x 500
  • Networks x 340,000
  • 2001 to 2010
  • Computers x 60
  • Networks x 4000
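
The "order of magnitude every 5 years" figure follows directly from the
two doubling times quoted above; a quick check of the arithmetic:

    # Using only the doubling times on this slide.
    months = 5 * 12
    computer_growth = 2 ** (months / 18)    # ~10x in five years
    network_growth = 2 ** (months / 9)      # ~100x in five years
    print(f"computers: {computer_growth:.0f}x, "
          f"networks: {network_growth:.0f}x, "
          f"gap: {network_growth / computer_growth:.0f}x")
    # -> roughly a 10x (one order of magnitude) gap every five years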

Moore's Law vs. storage improvements vs. optical
improvements. Graph from Scientific American
(Jan. 2001) by Cleo Vilett; source: Vinod Khosla,
Kleiner, Caufield and Perkins.
21
The Globus Project: Making Grid Computing a
Reality
  • Close collaboration with real Grid projects in
    science and industry
  • Development and promotion of standard Grid
    protocols to enable interoperability and shared
    infrastructure
  • Development and promotion of standard Grid
    software APIs and SDKs to enable portability and
    code sharing
  • The Globus Toolkit: open-source, reference
    software base for building Grid infrastructure
    and applications
  • Global Grid Forum: development of standard
    protocols and APIs for Grid computing

22
One View of Requirements
  • Identity & authentication
  • Authorization policy
  • Resource discovery
  • Resource characterization
  • Resource allocation
  • (Co-)reservation, workflow
  • Distributed algorithms
  • Remote data access
  • High-speed data transfer
  • Performance guarantees
  • Monitoring
  • Adaptation
  • Intrusion detection
  • Resource management
  • Accounting & payment
  • Fault management
  • System evolution
  • Etc.
  • Etc.

23
Three Obstacles to Making Grid Computing Routine
  • New approaches to problem solving
  • Data Grids, distributed computing, peer-to-peer,
    collaboration grids, ...
  • Structuring and writing programs
  • Abstractions, tools
  • Enabling resource sharing across distinct
    institutions
  • Resource discovery, access, reservation,
    allocation; authentication, authorization,
    policy; communication; fault detection and
    notification

24
Programming & Systems Problems
  • The programming problem
  • Facilitate development of sophisticated apps
  • Facilitate code sharing
  • Requires programming environments: APIs, SDKs, tools
  • The systems problem
  • Facilitate coordinated use of diverse resources
  • Facilitate infrastructure sharing, e.g.,
    certificate authorities, information services
  • Requires systems: protocols, services
  • E.g., a port/service/protocol for accessing
    information or allocating resources

25
The Systems Problem: Resource-Sharing Mechanisms
That ...
  • Address security and policy concerns of resource
    owners and users
  • Are flexible enough to deal with many resource
    types and sharing modalities
  • Scale to large numbers of resources, many
    participants, many program components
  • Operate efficiently when dealing with large
    amounts of data and computation

26
Aspects of the Systems Problem
  • Need for interoperability when different groups
    want to share resources
  • Diverse components, policies, mechanisms
  • E.g., standard notions of identity, means of
    communication, resource descriptions
  • Need for shared infrastructure services to avoid
    repeated development and installation
  • E.g., one port/service/protocol for remote access
    to computing, not one per tool/application
  • E.g., Certificate Authorities are expensive to run
  • A common need for protocols and services

27
Hence, a Protocol-Oriented View of Grid
Architecture, that Emphasizes ...
  • Development of Grid protocols and services
  • Protocol-mediated access to remote resources
  • New services, e.g., resource brokering
  • "On the Grid" = speak Intergrid protocols
  • Mostly (extensions to) existing protocols
  • Development of Grid APIs and SDKs
  • Interfaces to Grid protocols and services
  • Facilitate application development by supplying
    higher-level abstractions
  • The (hugely successful) model is the Internet

28
The Data Grid Problem
  • Enable a geographically distributed community
    of thousands to perform sophisticated,
    computationally intensive analyses on petabytes
    of data

29
Major Data Grid Projects
Name (URL, Sponsor): Focus
  • Grid Application Dev. Software (hipersoft.rice.edu/grads, NSF):
    Research into program development technologies for Grid applications
  • Grid Physics Network (griphyn.org, NSF): Technology R&D for data
    analysis in physics experiments (ATLAS, CMS, LIGO, SDSS)
  • Information Power Grid (ipg.nasa.gov, NASA): Create and apply a
    production Grid for aerosciences and other NASA missions
  • International Virtual Data Grid Laboratory (ivdgl.org, NSF): Create an
    international Data Grid to enable large-scale experimentation on Grid
    technologies and applications
  • Network for Earthquake Eng. Simulation Grid (neesgrid.org, NSF):
    Create and apply a production Grid for earthquake engineering
  • Particle Physics Data Grid (ppdg.net, DOE Science): Create and apply
    production Grids for data analysis in high energy and nuclear physics
    experiments
  • TeraGrid (teragrid.org, NSF): U.S. science infrastructure linking four
    major resource sites at 40 Gb/s
  • UK Grid Support Center (grid-support.ac.uk, U.K. eScience): Support
    center for Grid projects within the U.K.
  • Unicore (BMBFT): Technologies for remote access to supercomputers
  • FusionGrid? (???): Link TBs of data from NERSC, generated by fusion
    codes, to clusters at PPPL
30
Data Intensive Issues Include
  • Harness potentially large numbers of data,
    storage, network resources located in distinct
    administrative domains
  • Respect local and global policies governing what
    can be used for what
  • Schedule resources efficiently, again subject to
    local and global constraints
  • Achieve high performance, with respect to both
    speed and reliability
  • Catalog software and virtual data

31
Data-Intensive Computing and Grids
  • The term "Data Grid" is often used
  • Unfortunate, as it implies a distinct
    infrastructure, which it isn't; but it is easy to say
  • Data-intensive computing shares numerous
    requirements with collaboration, instrumentation,
    computation, ...
  • Security, resource management, information services, etc.
  • Important to exploit commonalities, as it is very
    unlikely that multiple infrastructures can be
    maintained
  • Fortunately this seems easy to do!

32
Examples of Desired Data Grid Functionality
  • High-speed, reliable access to remote data
  • Automated discovery of the best copy of the data
  • Manage replication to improve performance
  • Co-schedule compute, storage, and network resources
  • Transparency with respect to delivered performance
  • Enforce access control on data
  • Allow representation of global resource
    allocation policies
  • Central question: how must Grid architecture be
    extended to support these functions?

33
Grid Protocols, Services, Tools: Enabling Sharing
in Virtual Organizations
  • Protocol-mediated access to resources
  • Mask local heterogeneities
  • Extensible to allow for advanced features
  • Negotiate multi-domain security and policy
  • "Grid-enabled" resources speak the protocols
  • Multiple implementations are possible
  • Broad deployment of protocols facilitates the
    creation of services that provide an integrated
    view of distributed resources
  • Tools use protocols and services to enable
    specific classes of applications

34
A Model Architecture for Data Grids
[Diagram: the application presents an attribute specification to a
metadata catalog, which yields a logical collection and logical file
name; a replica catalog maps this to multiple locations (replica
locations 1-3); replica selection, using performance information and
predictions from services such as the Network Weather Service and the
Metacomputing Directory Service, picks the selected replica; GridFTP
control and data channels then move the data among disk caches, disk
arrays, and tape libraries.]
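
The replica-selection box in this architecture reduces to a small
decision: given the candidate copies of a logical file and predicted
bandwidth to each, fetch the one expected to arrive first. The sketch
below uses hypothetical replica URLs and numbers (it is not a Globus
API).

    # Candidate copies of one logical file, with predicted bandwidths
    # such as a Network-Weather-Service-style monitor might supply.
    replicas = {
        "gsiftp://site-a.example.org/data/run42.h5": {"size_gb": 120, "predicted_mb_s": 40.0},
        "gsiftp://site-b.example.org/data/run42.h5": {"size_gb": 120, "predicted_mb_s": 85.0},
        "gsiftp://site-c.example.org/data/run42.h5": {"size_gb": 120, "predicted_mb_s": 12.5},
    }

    def select_replica(candidates):
        # Estimated transfer time = size / predicted bandwidth; take the minimum.
        def seconds(info):
            return info["size_gb"] * 1024 / info["predicted_mb_s"]
        return min(candidates, key=lambda url: seconds(candidates[url]))

    print("fetch from:", select_replica(replicas))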
35
Globus Toolkit Components
  • Two major Data Grid components
  • 1. Data Transport and Access
  • Common protocol
  • Secure, efficient, flexible, extensible data
    movement
  • Family of tools supporting this protocol
  • 2. Replica Management Architecture
  • Simple scheme for managing
  • multiple copies of files
  • collections of files
  • APIs and white papers: http://www.globus.org

36
Motivation for a Common Data Access Protocol
  • Existing distributed data storage systems:
  • DPSS, HPSS: focus on high-performance access;
    utilize parallel data transfer and striping
  • DFS: focus on high-volume usage, dataset
    replication, local caching
  • SRB: connects heterogeneous data collections;
    uniform client interface; metadata queries
  • Problems:
  • Incompatible (and proprietary) protocols
  • Each requires a custom client
  • Each partitions the available data sets and
    storage devices
  • Each protocol has a subset of the desired functionality

37
A Common, Secure, Efficient Data Access Protocol
  • Common, extensible transfer protocol
  • Common protocol means all can interoperate
  • Decouple low-level data transfer mechanisms from
    the storage service
  • Advantages
  • New, specialized storage systems are
    automatically compatible with existing systems
  • Existing systems have richer data transfer
    functionality
  • Interface to many storage systems
  • HPSS, DPSS, file systems
  • Plan for SRB integration

38
A Universal Access/Transport Protocol
  • Suite of communication libraries and related
    tools that support:
  • GSI and Kerberos security
  • Third-party transfers
  • Parameter set/negotiate
  • Partial file access
  • Reliability/restart
  • Large file support
  • Data channel reuse
  • All based on a standard, widely deployed protocol

39
And the Universal Protocol is GridFTP
  • Why FTP?
  • Ubiquity enables interoperation with many
    commodity tools
  • Already supports many desired features, easily
    extended to support others
  • Well understood and supported
  • We use the term GridFTP to refer to:
  • a transfer protocol which meets these requirements, and
  • a family of tools which implement the protocol.
  • Note: GridFTP > FTP
  • Note that, despite the name, GridFTP is not
    restricted to file transfer!
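
Since GridFTP builds on standard FTP, two of the features listed on the
previous slide -- partial file access and reliability/restart -- can be
pictured with plain Python ftplib, which exposes FTP's restart offset.
The host and path below are hypothetical; GSI security and third-party
transfers need real GridFTP clients and are not shown.

    # Resumable (partial) FTP retrieval via the standard REST offset.
    from ftplib import FTP

    def fetch_with_restart(host, remote_path, local_path, resume_at=0):
        with FTP(host) as ftp, open(local_path, "ab") as out:
            ftp.login()                                  # anonymous login
            # 'rest' asks the server to start sending at a byte offset,
            # which is how an interrupted transfer is resumed.
            ftp.retrbinary(f"RETR {remote_path}", out.write, rest=resume_at)

    # fetch_with_restart("ftp.example.org", "/pub/dataset.dat",
    #                    "dataset.dat", resume_at=1_048_576)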

40
Summary
[Summary diagram, elements: supercomputer; PPPL "petrel"; PPPL display
wall; web services running data analysis and data mining; Access Grid
running here with Chromium, XPLIT, SCIRun or VTK; CPU; AVS/Express; IDL;
HTTP/Access Grid docking.]