Provided by: csewe4
Title: Programming Paradigms and Algorithms

Programming Paradigms and Algorithms
  • WA 3.1, 3.2, p. 178, 5.1, 5.3.3, Chapter 6,
    9.2.8, 10.4.1,
  • Kumar 12.1.3
  • Berman, F., Wolski, R., Figueira, S.,
    Schopf, J. and Shao, G., "Application-Level
    Scheduling on Distributed Heterogeneous
    Networks," Proceedings of Supercomputing '96

Common Parallel Programming Paradigms
  • Embarrassingly parallel programs
  • Workqueue
  • Master/Slave programs
  • Monte Carlo methods
  • Regular, Iterative (Stencil) Computations
  • Pipelined Computations
  • Synchronous Computations

Regular, Iterative Stencil Applications
  • Many scientific applications have the format
  • Loop until some condition is true
  • Perform computation which involves
    communicating with N, E, W, S neighbors of a
    point (5-point stencil)
  • Convergence test?

Stencil Example: Jacobi2D
  • The Jacobi algorithm, also known as the method of
    simultaneous corrections, is an iterative method
    for approximating the solution to a system of
    linear equations.
  • Jacobi addresses the problem of solving n linear
    equations in n unknowns, Ax = b, where the ith
    equation is
    $$\sum_{j=1}^{n} a_{ij} x_j = b_i$$
    or alternatively
    $$x_i = \frac{1}{a_{ii}} \Big( b_i - \sum_{j \neq i} a_{ij} x_j \Big)$$
  • The a's and b's are known; we want to solve for
    the x's

Jacobi 2D Strategy
  • Jacobi strategy iterates until the computation
    converges to the exact solution, i.e. in each
    iteration we solve
    $$x_i^{(k)} = \frac{1}{a_{ii}} \Big( b_i - \sum_{j \neq i} a_{ij} x_j^{(k-1)} \Big)$$
  • where the values from the (k-1)st iteration are
    used to compute the values for the kth iteration
  • For important classes of problems, Jacobi
    converges to a good solution after O(log N)
    iterations [Leighton]
  • Typically, the solution is approximated to a
    desired error threshold
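
The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `jacobi`, the tolerance, and the small test system are assumptions chosen so that the exact solution is easy to check by hand.

```python
def jacobi(A, b, tol=1e-10, max_iters=1000):
    """Jacobi iteration for A x = b; returns (x, iterations used)."""
    n = len(b)
    x = list(b)  # initial guess x_i = b_i; any starting guess would do
    for k in range(1, max_iters + 1):
        # Every new value uses only values from the previous iteration.
        new_x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        max_diff = max(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if max_diff < tol:  # stop at the desired error threshold
            return x, k
    return x, max_iters


# Hypothetical diagonally dominant system; its exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, iters = jacobi(A, b)
```

Because all new values depend only on the previous iterate, the list comprehension could be split across processors without changing the result, which is what makes the method attractive for parallel execution.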

Jacobi 2D
  • The equation is most efficient to solve when
    most a's are 0
  • When most a entries are non-zero, A is dense
  • When most a's are 0, A is sparse
  • Sparse matrices are regularly found in many
    scientific applications.

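Laplace's Equation
  • Jacobi strategy can be used effectively to solve
    sparse linear equations.
  • One such equation is Laplace's equation
    $$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$$
  • f is solved over a 2D space having coordinates x
    and y
  • If the distance between points ($\Delta$) is small
    enough, the second derivatives of f can be
    approximated by
    $$\frac{\partial^2 f}{\partial x^2} \approx \frac{1}{\Delta^2} \big[ f(x+\Delta, y) - 2 f(x, y) + f(x-\Delta, y) \big]$$
    $$\frac{\partial^2 f}{\partial y^2} \approx \frac{1}{\Delta^2} \big[ f(x, y+\Delta) - 2 f(x, y) + f(x, y-\Delta) \big]$$
  • These equations reduce to
    $$f(x, y) = \frac{f(x-\Delta, y) + f(x, y-\Delta) + f(x+\Delta, y) + f(x, y+\Delta)}{4}$$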
Laplace's Equation
  • Note the relationship between the parameters
  • This forms a 4 point stencil
  • Any update will involve only local communication!
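
A quick numerical check of the stencil relationship: for a function that satisfies Laplace's equation, the value at a point equals the average of its four neighbors. The sketch below uses f(x, y) = x² - y² as the test function; that function, the sample point, and the helper names are assumptions picked for illustration.

```python
def neighbor_average(f, x, y, d):
    """Average of f at the four neighbors of (x, y) at spacing d."""
    return (f(x - d, y) + f(x, y - d) + f(x + d, y) + f(x, y + d)) / 4.0


def harmonic(x, y):
    # x^2 - y^2 satisfies Laplace's equation: d2f/dx2 + d2f/dy2 = 2 - 2 = 0
    return x * x - y * y


# Difference between the point value and the average of its neighbors.
residual = abs(neighbor_average(harmonic, 1.5, 0.5, 0.25) - harmonic(1.5, 0.5))
```

The update touches only the four adjacent points, so a processor that owns a region of the grid needs data from other processors only along the region's edges.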

Solving Laplace using Jacobi strategy
  • Note that in Laplace's equation, we want to solve
    for all f(x,y), which has 2 parameters
  • In Jacobi, we want to solve for x_i, which has
    only 1 index
  • How do we convert f(x,y) into x_i?
  • Associate the x_i's with the f(x,y)'s by
    distributing them in the f 2D matrix in
    row-major (natural) order
  • For an n x n matrix, there are then n x n x_i's,
    so the A matrix will need to be (n x n) x (n x n)

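Solving Laplace using Jacobi strategy
  • When the x_i's are distributed in the f 2D
    matrix in row-major (natural) order,
    $$f(x, y) = \frac{f(x-\Delta, y) + f(x, y-\Delta) + f(x+\Delta, y) + f(x, y+\Delta)}{4}$$
  • becomes
    $$x_i = \frac{x_{i-n} + x_{i-1} + x_{i+1} + x_{i+n}}{4}$$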
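
The row-major mapping between a grid position and a single index can be sketched as follows. The code uses 0-based indices (an assumption; the slides do not state a base), and the function names are hypothetical. Neighbors that would fall outside the grid are skipped, since those positions hold boundary values rather than unknowns.

```python
def to_index(row, col, n):
    """Row-major (natural) order: grid position -> single index i."""
    return row * n + col


def to_grid(i, n):
    """Inverse mapping: single index i -> (row, col)."""
    return divmod(i, n)


def stencil_neighbors(i, n):
    """Indices i-n, i-1, i+1, i+n that lie inside the n x n grid."""
    row, col = to_grid(i, n)
    nbrs = []
    if row > 0:
        nbrs.append(i - n)   # neighbor in the previous row
    if col > 0:
        nbrs.append(i - 1)   # neighbor in the previous column
    if col < n - 1:
        nbrs.append(i + 1)   # neighbor in the next column
    if row < n - 1:
        nbrs.append(i + n)   # neighbor in the next row
    return nbrs
```

For n = 3, the center point (1, 1) maps to index 4 and its neighbors are indices 1, 3, 5 and 7.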

Working backward
  • Now we want to work backward to find out what the
    A matrix and b vector will be for Jacobi
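  • Our solution to the Laplace equation gives us
    equations of this form
    $$x_i = \frac{x_{i-n} + x_{i-1} + x_{i+1} + x_{i+n}}{4}$$
  • Rewriting, we get
    $$4 x_i - x_{i-n} - x_{i-1} - x_{i+1} - x_{i+n} = 0$$
  • So the b_i are 0; what is the A matrix?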

Finding the A matrix
  • Each row has at most 5 non-zero entries
  • All entries on the diagonal are 4
[Figure: the A matrix for N = 9, n = 3]

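
A minimal sketch of how the A matrix could be assembled for an n x n grid. It assumes 0-based row-major indexing and the sign convention with 4 on the diagonal and -1 for each in-grid neighbor; `build_laplace_matrix` is a hypothetical helper name.

```python
def build_laplace_matrix(n):
    """Dense N x N matrix (N = n*n) for the four-neighbor stencil."""
    N = n * n
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        row, col = divmod(i, n)
        A[i][i] = 4                 # diagonal entry
        if row > 0:
            A[i][i - n] = -1        # neighbor in the previous row
        if col > 0:
            A[i][i - 1] = -1        # neighbor in the previous column
        if col < n - 1:
            A[i][i + 1] = -1        # neighbor in the next column
        if row < n - 1:
            A[i][i + n] = -1        # neighbor in the next row
    return A


A = build_laplace_matrix(3)  # N = 9, n = 3
nonzeros_per_row = [sum(1 for v in r if v != 0) for r in A]
```

Only the center point of the 3 x 3 grid has all four neighbors, so only its row reaches 5 non-zero entries; corner rows have 3. A real implementation would store A in a sparse format, or never form it and apply the stencil directly.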
Jacobi Implementation Strategy
  • An initial guess is made for all the unknowns,
    typically x_i = b_i
  • New values for the x_i's are calculated using
    the iteration equations
  • The updated values are substituted in the
    iteration equations and the process repeats
  • The user provides a "termination condition" to
    end the iteration.
  • An example termination condition is
    $$|x_i^{(k)} - x_i^{(k-1)}| < \text{error}$$

Data Parallel Jacobi 2D Pseudo-code
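    /* Initialize ghost regions */
    for (i = 1; i <= N; i++) {
        x[0][i]   = north[i];
        x[N+1][i] = south[i];
        x[i][0]   = west[i];
        x[i][N+1] = east[i];
    }
    /* Initialize matrix */
    for (i = 1; i <= N; i++)
        for (j = 1; j <= N; j++)
            x[i][j] = initvalue;
    /* Iterative refinement of x until values converge */
    while (maxdiff > CONVERG) {
        /* Update x array */
        for (i = 1; i <= N; i++)
            for (j = 1; j <= N; j++)
                newx[i][j] = 0.25 * (x[i-1][j] + x[i][j+1]
                                     + x[i+1][j] + x[i][j-1]);
        /* Convergence test */
        maxdiff = 0;
        for (i = 1; i <= N; i++)
            for (j = 1; j <= N; j++) {
                maxdiff = max(maxdiff, fabs(newx[i][j] - x[i][j]));
                x[i][j] = newx[i][j];
            }
    }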
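
A runnable serial version of this pseudo-code, written in Python as a minimal sketch. The function name, the default threshold, the iteration cap, and the test boundary values are assumptions added for illustration; the structure (ghost cells, a separate new array, a maximum-difference convergence test) follows the slide.

```python
def jacobi2d(n, north, south, west, east,
             init_value=0.0, converg=1e-6, max_iters=100000):
    """Serial Jacobi2D on an n x n interior with ghost cells at 0 and n+1."""
    x = [[init_value] * (n + 2) for _ in range(n + 2)]
    # Initialize ghost regions from the four boundary vectors.
    for i in range(1, n + 1):
        x[0][i] = north[i]
        x[n + 1][i] = south[i]
        x[i][0] = west[i]
        x[i][n + 1] = east[i]
    iters = 0
    max_diff = converg + 1.0  # force at least one sweep
    while max_diff > converg and iters < max_iters:
        new_x = [row[:] for row in x]
        # Update: each interior point becomes the average of its neighbors.
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                new_x[i][j] = 0.25 * (x[i - 1][j] + x[i][j + 1]
                                      + x[i + 1][j] + x[i][j - 1])
        # Convergence test: largest change over all interior points.
        max_diff = max(abs(new_x[i][j] - x[i][j])
                       for i in range(1, n + 1) for j in range(1, n + 1))
        x = new_x
        iters += 1
    return x, iters


# Hypothetical test case: every boundary value is 1.0, so the interior
# should converge toward 1.0 as well.
n = 4
edge = [1.0] * (n + 2)
grid, iterations = jacobi2d(n, edge, edge, edge, edge)
```

In a data-parallel version each processor would run the same loops over its own block or strip, exchanging ghost rows or columns with its neighbors before each update and combining the local maximum differences for the convergence test.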

Jacobi2D Programming Issues
  • Synchronization
  • Should we synchronize between iterations?
    Between multiple iterations?
  • Should we tag information and let the application
    run asynchronously? (How bad can things get?)
  • How often should we test for convergence?
  • How important is it to know when we're done?
  • How expensive is it?

Jacobi2D Programming Issues
  • Block decomposition or strip decomposition?
  • How big should the blocks or strips be?
  • How should blocks/strips be allocated to
    processors?

[Figures: uniform strip and non-uniform strip decompositions]

HPF-Style Data Decompositions
  • 1D (Processors P0 P1 P2 P3 , tasks
  • Block decomposition (Task i allocated to
    processor floor(i / b), where b = ceil(n / p) is
    the block size for n tasks on p processors)
  • Cyclic decomposition (Task i allocated to
    processor i mod p)
  • Block-Cyclic decomposition (Block i allocated to
    processor i mod p)
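
The three 1D mappings can be compared side by side with a short sketch. The function names are hypothetical, tasks and processors are numbered from 0, and the block mapping assumes a block size of ceil(n / p) for n tasks on p processors.

```python
def block_owner(i, n_tasks, p):
    """Block: contiguous chunks of ceil(n_tasks / p) tasks per processor."""
    block_size = -(-n_tasks // p)  # ceiling division
    return i // block_size


def cyclic_owner(i, p):
    """Cyclic: tasks dealt out round-robin."""
    return i % p


def block_cyclic_owner(i, block_size, p):
    """Block-cyclic: blocks of block_size tasks dealt out round-robin."""
    return (i // block_size) % p


block_map = [block_owner(i, 8, 4) for i in range(8)]
cyclic_map = [cyclic_owner(i, 4) for i in range(8)]
block_cyclic_map = [block_cyclic_owner(i, 2, 2) for i in range(8)]
```

With 8 tasks, the block mapping on 4 processors gives 0 0 1 1 2 2 3 3, the cyclic mapping gives 0 1 2 3 0 1 2 3, and the block-cyclic mapping with blocks of 2 on 2 processors gives 0 0 1 1 0 0 1 1.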

HPF-Style Data Decompositions
  • 2D
  • Each dimension partitioned by block, cyclic,
    block-cyclic or (do nothing)
  • Useful set of uniform decompositions can be

[Figures: (Block, Block) and (*, Cyclic) decompositions]

Jacobi on a Cluster
  • If each partition of Jacobi is executed on a
    processor in a lab cluster, we can no longer
    assume we have dedicated processors and network
  • In particular, the performance exhibited by the
    cluster will vary over time and with load
  • How can we go about developing a
    performance-efficient implementation in a more
    dynamic environment?

Jacobi AppLeS
  • We developed an AppLeS application scheduler
  • AppLeS = Application-Level Scheduler
  • AppLeS is a scheduling agent which integrates
    with the application to form a Grid-aware
    adaptive self-scheduling application
  • Targeted Jacobi AppLeS to a distributed clustered

How Does AppLeS Work?
[Figure: AppLeS + application = self-scheduling application. Labels: accessible resources, feasible resource sets, best schedule, Grid Infrastructure]

Network Weather Service (Wolski, U. Tenn.)
  • The NWS provides dynamic resource information
    for AppLeS
  • NWS is a stand-alone system
  • NWS monitors current system state
  • NWS provides best forecast of resource load
    from multiple models
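
One plausible way to pick a "best forecast from multiple models" is to replay each model over the measurement history and report the prediction of whichever has the lowest accumulated error. The sketch below illustrates that idea only; it is not the NWS interface, and the three toy models, the squared-error measure, and the sample measurements are all assumptions.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def window_median(history, window=5):
    return median(history[-window:])


def best_forecast(measurements, models):
    """Return (name, forecast) of the model with lowest squared error so far."""
    errors = {name: 0.0 for name in models}
    for t in range(1, len(measurements)):
        history = measurements[:t]
        for name, model in models.items():
            # Score each model on how well it predicted measurement t.
            errors[name] += (model(history) - measurements[t]) ** 2
    best = min(errors, key=errors.get)
    return best, models[best](measurements)


models = {
    "last_value": last_value,
    "running_mean": running_mean,
    "window_median": window_median,
}
# Hypothetical CPU-availability readings that rise steadily.
measurements = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
best, forecast = best_forecast(measurements, models)
```

On a steadily rising series the last-value model tracks the data most closely, so it is the one selected here; on a noisy but stationary series an averaging model would tend to win instead.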

Jacobi2D AppLeS Resource Selector
  • Feasible resources determined according to
    application-specific distance metric
  • Choose fastest machine as locus
  • Compute distance D from locus based on unit-sized
    application-specific benchmark
  • $D(\text{locus}, X) = |\text{comp}_{\text{unit},\,\text{locus}} - \text{comp}_{\text{unit},\,X}| + \text{comm}_{W,E\ \text{columns}}$
  • Resources sorted according to distance from
    locus, forming a desirability list
  • Feasible resource sets formed from initial
    subsets of sorted desirability list
  • Next step: plan a schedule for each feasible
    resource set
  • Scheduler will choose schedule with best
    predicted execution time
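
The selection steps above can be sketched as follows, assuming the distance from the locus is the absolute difference in unit benchmark time plus a unit communication cost. The function name and all benchmark numbers are hypothetical; only the machine names echo the platform described in this presentation.

```python
def feasible_resource_sets(comp_unit, comm_unit):
    """Rank machines by distance from the locus and return prefix sets."""
    # Locus = fastest machine = smallest unit-benchmark time.
    locus = min(comp_unit, key=comp_unit.get)

    def distance(name):
        return abs(comp_unit[locus] - comp_unit[name]) + comm_unit[name]

    # Desirability list: closest to the locus first.
    ranked = sorted(comp_unit, key=distance)
    # Feasible sets are the initial subsets of the desirability list.
    return locus, [ranked[:k] for k in range(1, len(ranked) + 1)]


# Hypothetical unit benchmark times (seconds) and unit communication
# costs to the locus (seconds).
comp_unit = {"alpha": 1.0, "sp2": 1.5, "rs6000": 2.0, "sparc": 4.0}
comm_unit = {"alpha": 0.0, "sp2": 0.2, "rs6000": 2.0, "sparc": 0.1}
locus, sets = feasible_resource_sets(comp_unit, comm_unit)
```

With p machines this yields only p candidate sets instead of all 2^p subsets, which keeps the planning step cheap enough to run at launch time.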

Jacobi2D Performance Model and Schedule Planning
  • Execution time for ith strip
  • where load = predicted percentage of CPU time
    available (NWS)
  • comm = time to send and receive messages,
    factored by predicted BW (NWS)
  • AppLeS uses time-balancing to determine best
    partition on a given set of resources
  • Solve
  • for
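
Time-balancing can be illustrated with a simplified model. The sketch assumes the time for strip i is T_i = A_i * P_i / load_i + C_i, where A_i is the strip area in grid points, P_i the dedicated time per point, load_i the predicted fraction of CPU available, and C_i the predicted communication time. This model, the function name, and the numbers are assumptions for illustration, not the exact formula from the presentation.

```python
def time_balance(total_area, time_per_point, load, comm):
    """Choose strip areas so every strip finishes at the same time T.

    Solves T_1 = T_2 = ... = T_p with sum(A_i) = total_area under the
    assumed model T_i = A_i * P_i / load_i + C_i.
    """
    # Delivered rate of each machine in points per second.
    rates = [l / p for l, p in zip(load, time_per_point)]
    # From A_i = (T - C_i) * rate_i and sum(A_i) = total_area.
    t = (total_area + sum(c * r for c, r in zip(comm, rates))) / sum(rates)
    areas = [(t - c) * r for c, r in zip(comm, rates)]
    return t, areas


# Hypothetical two-machine case: the second machine is half as fast
# and only half available.
t, areas = time_balance(
    total_area=1000,
    time_per_point=[0.001, 0.002],
    load=[1.0, 0.5],
    comm=[0.1, 0.1],
)
times = [a * p / l + c
         for a, p, l, c in zip(areas, [0.001, 0.002], [1.0, 0.5], [0.1, 0.1])]
```

Here the faster, fully available machine receives 800 of the 1000 points and both strips finish in about 0.9 seconds. A practical scheduler would also round areas to whole columns and drop any machine whose computed area comes out negative.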

Jacobi2D Experiments
  • Experiments compare
  • Compile-time block HPF partitioning
  • Compile-time irregular strip partitioning (no NWS
    forecasts, no resource selection)
  • Run-time strip AppLeS
  • Runs for different partitioning methods performed
    back-to-back on production systems
  • Average execution time recorded
  • Distributed UCSD/SDSC platform: Sparcs, RS6000,
    Alpha Farm, SP-2

Jacobi2D AppLeS Experiments
  • Representative Jacobi 2D AppLeS experiment
  • Adaptive scheduling leverages deliverable
    performance of contended system
  • Spike occurs when a gateway between PCL and SDSC
    goes down
  • Subsequent AppLeS experiments avoid slow link