Adaptive Computing on the Grid

Slides: 39
Provided by: casa45
Learn more at: https://cseweb.ucsd.edu
Transcript and Presenter's Notes

Title: Adaptive Computing on the Grid


1
Adaptive Computing on the Grid: The AppLeS Project
  • Francine Berman
  • U.C. San Diego

2
Computing Today
Wireless
MPPs
clusters
PCs
Workstations
3
The Computational Grid
  • Computational Grid is a collection of
    distributed, possibly heterogeneous resources
    which can be used as an ensemble to execute
    large-scale applications

4
Grid Computing
  • What is it?
  • Running parallel and distributed programs on
    multiple resources by coordinating tasks and data
  • Running any program on
  • whatever resources are available
  • resources which execute the program best
  • Why is Grid Computing important?
  • Why is Grid Computing hard?

5
Why is Grid Computing Important?
  • Internet/Grid increasingly serving as execution
    platform for large-scale computations
  • Web browsing: large-scale distributed search
    application
  • Seti@home: large-scale distributed data mining
    application
  • Walmart uses network to support massive inventory
    control applications
  • Remote instruments, visualization facilities
    connected to computers for analysis in real-time
    through networks
  • Large distributed databases being developed for
    science and engineering applications (Digital
    Sky, weather prediction, Digital Libraries, etc.)

6
Why is Grid Computing Hard - I
  • Difficult to achieve predictable program
    performance in dynamic, multi-user environments
  • To achieve performance, programs must adapt to
    deliverable resource performance at execution time

7
Why is Grid Computing hard - II
  • Lots of infrastructure needed
  • Basic services (Grid middleware)
  • Single login
  • Authentication
  • File transfer
  • Multi-protocol communication
  • User environments (User-level middleware)
  • Development environments and tools
  • Application scheduling and deployment
  • Performance monitoring, analysis, tuning

8
Grid Computing Lab Research
  • Adaptive Grid Computing
  • The AppLeS Project
  • User-level Middleware
  • APST
  • New Directions: Megacomputing
  • Genome@home
  • and other projects

9
Adaptive Grid Computing with AppLeS
  • Joint project with Rich Wolski (U. Tenn.)
  • Goal
  • To develop self-scheduling Grid programs which
    can adapt to deliverable Grid resource
    performance at execution time
  • Approach
  • Develop adaptive application schedulers which can
  • predict program performance
  • use these predictions to determine the most
    performance-efficient schedule
  • deploy the best schedule on Grid resources
  • within a reasonable timeframe

10
How Does AppLeS Work?
[Diagram: an AppLeS application self-schedules by narrowing accessible
resources to feasible resource sets, evaluating candidate schedules
using Grid middleware and NWS information, and deploying the best
schedule on the resources]
11
Network Weather Service (Wolski, U. Tenn.)
  • NWS
  • monitors current system state
  • provides best forecast of resource load from
    multiple models
  • NWS can provide dynamic resource information for
    AppLeS
  • NWS is stand-alone system

12
An Example AppLeS: Simple SARA
  • SARA: Synthetic Aperture Radar Atlas
  • application developed at JPL and SDSC
  • Goal: assemble/process files for the user's desired
    image
  • Radar organized into tracks
  • User selects track of interest and properties to
    be highlighted
  • Raw data is filtered and converted to an image
    format
  • Image displayed in web browser

13
Simple SARA
  • AppLeS focuses on resource selection problem
    Which site can deliver data the fastest?
  • Code developed by Alan Su

[Diagram: Client, Compute Servers, Data Servers. Network shared by a
variable number of users. Compute server accesses target tracks from
one or more data servers. Data servers may store replicated files.]
14
Simple SARA
  • Simple Performance Model
  • Prediction of available bandwidth provided by
    Network Weather Service
  • User's goal is to optimize performance by
    minimizing file transfer time
  • Common assumptions (A > B means A performs better)
  • vBNS > general internet
  • geographically close sites > geographically far
    sites
  • west coast sites > east coast sites
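The resource-selection step described above can be sketched as a simple minimization over bandwidth forecasts. The function names, server list, and bandwidth figures below are illustrative assumptions, not the actual AppLeS or NWS interfaces.

```python
# Resource selection sketch for Simple SARA: choose the data server with
# the smallest predicted transfer time for the requested track file.

def predict_transfer_time(file_size_mb, predicted_bandwidth_mbps):
    """Estimated seconds to move the file at the forecast bandwidth."""
    return (file_size_mb * 8) / predicted_bandwidth_mbps

def select_server(file_size_mb, forecasts):
    """Pick the server minimizing predicted transfer time.

    forecasts: dict mapping server name -> predicted bandwidth (Mbit/s),
    e.g. obtained from Network Weather Service forecasts.
    """
    return min(forecasts,
               key=lambda s: predict_transfer_time(file_size_mb, forecasts[s]))

# Bandwidth numbers are made up for illustration.
forecasts = {"spin.cacr.caltech.edu": 4.2,
             "perigee.chpc.utah.edu": 7.9,
             "lolland.cc.gatech.edu": 1.1}
best = select_server(3.0, forecasts)  # 3 MB track file
```

Because forecasts change over time, repeating this selection before each transfer is what lets the schedule adapt, which is why a geographically farther site can win at one moment and lose at the next.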

15
Experimental Setup
  • Data for image accessed over shared networks
  • Data sets 1.4 - 3 megabytes, representative of
    SARA file sizes
  • Servers used for experiments
  • lolland.cc.gatech.edu
  • sitar.cs.uiuc
  • perigee.chpc.utah.edu
  • mead2.uwashington.edu
  • spin.cacr.caltech.edu

16
Experimental Results
  • Experiment with larger data set (3 Mbytes)
  • During this time frame, farther sites provided
    data faster than the closer site

17
9/21/98 Experiments
  • Clinton Grand Jury webcast commenced at trial 25
  • At beginning of experiment, general internet
    provides data faster than vBNS

18
Supercomputing 99
  • From the Portland SC99 floor during the
    experimental timeframe, UCSD and UTK were generally
    "closer" (in delivered performance) than the Oregon
    Graduate Institute (OGI) in Portland

19
AppLeS Applications
  • We've developed many AppLeS applications
  • Simple SARA (Su)
  • Jacobi2D (Wolski)
  • PMHD3D (Dail, Obertelli)
  • MCell (Casanova)
  • INS2D (Zagorodnov, Casanova)
  • SOR (Schopf)
  • Tomography (Smallen, Frey, Cirne, Hayes)
  • Mandelbrot, Ray tracing (Shao)
  • Supercomputer AppLeS (Cirne)

20
User-level Middleware
  • AppLeS applications are point solutions
  • What if we want to develop schedulers for
    structurally similar classes of applications?
  • AppLeS templates are user-level middleware
    designed to promote performance and ease-of
    programming for application classes
  • Current GCL template activity
  • APST (Casanova)
  • AMWAT (Shao, Hayes)

21
Example template: APST, the AppLeS Parameter Sweep
Template
  • Parameter Sweeps: a class of applications
    structured as multiple instances of an
    experiment with distinct parameter sets
  • Common application structure used in various
    fields of science and engineering (Monte Carlo
    and other simulations, etc.)
  • Joint work with Henri Casanova
  • Large number of independent tasks
  • First AppLeS Middleware package to be distributed
    to users
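The application structure described above (many independent tasks, one per parameter configuration) can be sketched as follows. The parameter names and the in-process runner are illustrative, not part of APST.

```python
# A parameter sweep as APST views it: one independent task per point in
# the cross-product of the parameter sets.
import itertools

def run_experiment(seed, rate):
    # Stand-in for a real simulation binary invoked with these parameters.
    return {"seed": seed, "rate": rate}

param_space = {"seed": [1, 2, 3], "rate": [0.1, 0.5]}
tasks = [dict(zip(param_space, values))
         for values in itertools.product(*param_space.values())]

# Each task is independent, so each can run on a separate machine;
# here we simply run them in-process.
results = [run_experiment(**t) for t in tasks]
```

The independence of the tasks is what makes the class Grid-friendly: the only coupling comes from shared input files and result collection, which is exactly what the APST scheduler has to reason about.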

22
Example Parameter Sweep Application: MCell
  • MCell: general simulator for cellular
    microphysiology
  • Uses Monte Carlo diffusion and chemical reaction
    algorithm in 3D to simulate complex biochemical
    interactions of molecules
  • Simulation: many experiments conducted on
    different parameter configurations
  • Experiments can be performed on separate machines
  • Driving application for APST middleware

23
APST Programming Model
  • Why isn't scheduling easy?

24
APST Programming Model
  • Why isn't scheduling easy?
  • Staging of large shared files may complicate
    the scheduling process
  • Post-processing must minimize file transfer
    time
  • Adaptive scheduling necessary to account for
    dynamic environment

25
APST Scheduling Approach
  • Contingency Scheduling: allocation developed by
    dynamically generating a Gantt chart for
    scheduling unassigned tasks between scheduling
    events
  • Basic skeleton
  • Compute the next scheduling event
  • Create a Gantt Chart G
  • For each computation and file transfer currently
    underway, compute an estimate of its completion
    time and fill in the corresponding slots in G
  • Select a subset T of the tasks that have not
    started execution
  • Until each host has been assigned enough work,
    heuristically assign tasks to hosts, filling in
    slots in G
  • Implement schedule
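The skeleton above can be sketched in code. Here the Gantt chart is reduced to a per-host earliest-free-time map; `estimate_runtime`, the batch size, and all names are illustrative assumptions, not APST's actual interfaces.

```python
# Sketch of one contingency-scheduling event: fill the chart with work
# already underway, then greedily place a subset of unstarted tasks.

def schedule_event(hosts, running, unstarted, estimate_runtime, now, batch=10):
    """Return {task: host} assignments decided at this scheduling event.

    running: iterable of (task, host, estimated_completion_time) for
    computations/transfers currently underway.
    estimate_runtime(task, host): predicted execution time (seconds).
    """
    # Fill in slots for work already underway on each host.
    free_at = {h: now for h in hosts}
    for _task, host, est_completion in running:
        free_at[host] = max(free_at[host], est_completion)

    # Select a subset T of unstarted tasks and assign heuristically:
    # each task goes to the host where it would finish earliest.
    assignments = {}
    for task in unstarted[:batch]:
        host = min(hosts, key=lambda h: free_at[h] + estimate_runtime(task, h))
        assignments[task] = host
        free_at[host] += estimate_runtime(task, host)
    return assignments
```

Running this repeatedly, once per scheduling event with fresh predictions, is what makes the allocation adaptive: work assigned between events is revisited as the resource forecast changes.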

[Gantt chart G: rows for network links, hosts (cluster 1), and hosts
(cluster 2); the time axis runs between successive scheduling events]
26
APST Scheduling
  • Free Parameters
  • Frequency of scheduling events
  • Accuracy of task completion time estimates
  • Subset T of unexecuted tasks
  • Scheduling heuristic used

27
APST Scheduling Heuristics
Scheduling Algorithms for APST Applications
  • Self-scheduling Algorithms
  • workqueue
  • workqueue w/ work stealing
  • workqueue w/ work duplication
  • ...
  • Gantt chart heuristics
  • Min-min, Max-min
  • Sufferage, XSufferage
  • ...

Self-scheduling: (+) easy to implement and quick (+) no need for
performance predictions (-) insensitive to data placement
Gantt chart heuristics: (-) more difficult to implement (-) need
performance predictions (+) sensitive to data placement
  • Simulation results (HCW 00 paper) show that
  • Heuristics are worth it
  • XSufferage is a good heuristic even when
    predictions are bad
  • Complex environments require better planning
    (Gantt chart)

28
APST Architecture
[Architecture diagram: a command-line APST Client interacts with the
APST Daemon; the Controller triggers the Scheduler; the Actuator and
Metadata Bookkeeper store state about and act on Grid resources and
middleware]
29
APST
  • APST being used for
  • INS2D, INS3D (NASA Fluid Dynamics applications)
  • MCell (Salk, Biological Molecular Modeling
    application)
  • Tphot (SDSC, Proton Transport application)
  • NeuralObjects (NSI, Neural Network simulations)
  • CS simulation applications for our own research
    (Model validation)
  • Actuator's APIs are interchangeable and mixable
  • (NetSolve+IBP), (GRAM+GASS), (GRAM+NFS)
  • Scheduler allows for dynamic adaptation,
    multithreading
  • No Grid software is required
  • However, lack of it (NWS, GASS, IBP) may lead to
    poorer performance
  • Details in SC00 paper
  • Will be released in next 2 months to PACI, IPG
    users

30
How Do We Know the APST Scheduling Heuristics are
Good?
  • Experiments
  • We ran large-sized instances of MCell across a
    distributed platform
  • We compared execution times for both workqueue
    and Gantt chart heuristics.

31
Results
  • Experimental Setting
  • MCell simulation with 1,200 tasks
  • composed of 6 Monte-Carlo simulations
  • input files 1, 1, 20, 20, 100, and 100 MB
  • 4 scenarios
  • Initially
  • (a) all input files are only in Japan
  • (b) 100MB files replicated in California
  • (c) in addition, one 100MB file
  • replicated in Tennessee
  • (d) all input files replicated everywhere

32
New GCL Directions: Megacomputing (Internet
Computing)
  • Grid programs
  • Can reasonably obtain some information about the
    environment (NWS predictions, MDS, HBM, ...)
  • Can assume that login, authentication,
    monitoring, etc. available on target execution
    machines
  • Can assume that programs run to completion on
    execution platform
  • Mega-programs
  • Cannot assume any information about target
    environment
  • Must be structured to treat target device as
    unfriendly host (cannot assume ambient services)
  • Must be structured for throwaway end devices
  • Must be structured to run continuously

33
Success with Megacomputing
  • Seti@home
  • Over 2 million users
  • Sustains over 22 teraflops in production use
  • Entropia.com
  • Can we run non-embarrassingly parallel codes
    successfully at this scale?
  • Computational Biology, Genomics
  • Genome@home

34
Genome@home
  • Joint work with Derrick Kondo, Joy Xin, Matt
    DeVico
  • Application template for peer-to-peer platforms
  • First algorithm (Needleman-Wunsch Global
    Alignment) uses dynamic programming
  • Plan is to use template with additional genomics
    applications
  • Being developed for internet rather than Grid
    environment

      G   T   A   A   G
  A   0   0   1   1   0
  T   0   1   0   1   1
  A   0   0   2   2   1
  C   0   0   1   2   2
  C   0   0   1   2   2
  G   1   0   1   2   3
Optimal alignments determined by traceback
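As a sketch, a global-alignment table like the one above can be filled with Needleman-Wunsch dynamic programming. The weights below (match = 1, mismatch and gap = 0) are a simplification chosen to keep the sketch close to the slide's simple match-counting table; real alignments use mismatch and gap penalties, and the slide's exact scoring scheme is not specified.

```python
# Needleman-Wunsch scoring sketch for global sequence alignment.

def nw_score(s, t, match=1, mismatch=0, gap=0):
    """Fill the dynamic-programming table: F[i][j] is the best score for
    aligning s[:i] with t[:j]. Traceback through F (not shown) recovers
    the optimal alignments."""
    # First row/column stay 0 only because gap = 0 here; a real gap
    # penalty would require initializing them to i * gap and j * gap.
    F = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            diag = F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F

F = nw_score("ATACCG", "GTAAG")
# The bottom-right cell holds the optimal global score.
```

With these degenerate weights the recurrence reduces to the longest-common-subsequence length (3 for the two sequences above); what matters for genome@home is that each cell depends only on its three neighbors, which constrains how the table can be partitioned across peers.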
35
Mega-programs
  • Provide the algorithmic/application counterpart
    for very large scale platforms
  • peer-to-peer platforms, Entropia, etc.
  • Condor flocks
  • Large free agent environments
  • Globus
  • New platforms: networks of low-level devices,
    etc.
  • Different computing paradigm than MPP, Grid

[Diagram: Genome@home / DNA Alignment running over Condor, Entropia,
free agents, and Globus]
36
  • Grid Computing Lab
  • Fran Berman (berman@cs.ucsd.edu)
  • Henri Casanova
  • Walfredo Cirne
  • Holly Dail
  • Matt DeVico
  • Marcio Faerman
  • Jim Hayes
  • Derrick Kondo
  • Graziano Obertelli
  • Gary Shao
  • Otto Sievert
  • Shava Smallen
  • Alan Su
  • Atsuko Takefusa (visiting)
  • Renata Teixeira
  • Nadya Williams
  • Eric Wing
  • Qiao Xin
  • Thanks!
  • NSF, NPACI, NASA IPG, TITECH, UTK
  • Coming soon to a computer near you
  • Release of APST and AMWAT (AppLeS Master/ Worker
    Application Template) v0.1 by NPACI All-hands
    meeting (Feb 01)
  • First prototype of genome@home: 2001
  • GCL software and papers: http://gcl.ucsd.edu

37
Parameter Sweep Heuristics
  • Currently studying scheduling heuristics useful
    for parameter sweeps in Grid environments
  • HCW 2000 paper compares several heuristics
  • Min-Min: the task/resource pair that can complete
    the earliest is assigned first
  • Max-Min: the longest of the tasks' earliest
    completion times is assigned first
  • Sufferage: the task that would suffer most if
    given a poor schedule is assigned first, where a
    task's sufferage is its second-best completion
    time minus its best completion time
  • Extended Sufferage: minimal completion times are
    computed for each task on each cluster, and the
    sufferage heuristic is applied to these
  • Workqueue: a randomly chosen task is assigned
    first
  • Criteria for evaluation
  • How sensitive are heuristics to location of
    shared input files and cost of data transmission?
  • How sensitive are heuristics to inaccurate
    performance information?
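The first of the rules above, and the sufferage measure, can be sketched over a matrix of predicted run times. The data layout and names are illustrative assumptions, not the APST implementation; "completion time" here is a host's ready time plus the task's predicted run time on that host.

```python
# Sketch of the Min-min rule and the sufferage measure.

def min_min(run_time):
    """Min-min: repeatedly pick the (task, host) pair with the smallest
    predicted completion time, then advance that host's ready time.

    run_time[task][host] is the predicted run time of task on host.
    Returns the assignment order as a list of (task, host) pairs."""
    ready = {h: 0.0 for t in run_time for h in run_time[t]}
    order, remaining = [], set(run_time)
    while remaining:
        task, host = min(((t, h) for t in remaining for h in run_time[t]),
                         key=lambda th: ready[th[1]] + run_time[th[0]][th[1]])
        order.append((task, host))
        ready[host] += run_time[task][host]
        remaining.discard(task)
    return order

def sufferage_value(completion_times):
    """Sufferage of one task: second-best minus best completion time.
    Tasks with a large sufferage are assigned first."""
    best, second = sorted(completion_times)[:2]
    return second - best
```

Max-min differs from Min-min only in taking the task whose best completion time is largest; XSufferage computes the completion times cluster by cluster before applying the sufferage rule, which is what makes it sensitive to where shared files are placed.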

38
APST/MCell Simulation Results with Quality of
Information