Title: Programming Paradigms and Algorithms
1. Programming Paradigms and Algorithms
- WA 3.1, 3.2, p. 178, 5.1, 5.3.3, Chapter 6, 9.2.8, 10.4.1
- Kumar 12.1.3
- Berman, F., Wolski, R., Figueira, S., Schopf, J. and Shao, G., "Application-Level Scheduling on Distributed Heterogeneous Networks," Proceedings of Supercomputing '96 (http://apples.ucsd.edu)
2. Common Parallel Programming Paradigms
- Embarrassingly parallel programs
- Workqueue
- Master/Slave programs
- Monte Carlo methods
- Regular, Iterative (Stencil) Computations
- Pipelined Computations
- Synchronous Computations
3. Regular, Iterative Stencil Applications
- Many scientific applications have the following format:
- Loop until some condition is true
- Perform a computation which involves communicating with the N, E, W, S neighbors of a point (5-point stencil)
- Convergence test?
4. Stencil Example: Jacobi 2D
- The Jacobi algorithm, also known as the method of simultaneous corrections, is an iterative method for approximating the solution to a system of linear equations.
- Jacobi addresses the problem of solving n linear equations in n unknowns, $Ax = b$, where the ith equation is
  $\sum_{j=1}^{n} a_{ij} x_j = b_i$
  or alternatively
  $x_i = \left( b_i - \sum_{j \neq i} a_{ij} x_j \right) / a_{ii}$
- As the a's and b's are known, we want to solve for the x's.
5. Jacobi 2D Strategy
- The Jacobi strategy iterates until the computation converges to a solution, i.e. at each iteration we solve
  $x_i^{(k)} = \left( b_i - \sum_{j \neq i} a_{ij} x_j^{(k-1)} \right) / a_{ii}$
  where the values from the (k-1)st iteration are used to compute the values for the kth iteration.
- For important classes of problems, Jacobi converges to a good solution after O(log N) iterations [Leighton].
- Typically, the solution is approximated to a desired error threshold.
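To make the iteration concrete, here is a minimal C sketch of one Jacobi sweep for a dense system; the function name, array layout, and convergence bookkeeping are illustrative assumptions, not from the original slides.

    #include <math.h>

    /* One Jacobi sweep: compute x^(k) from x^(k-1) for Ax = b.
       Returns the maximum change across all unknowns. */
    double jacobi_sweep(int n, double a[n][n], const double b[n],
                        const double xold[n], double xnew[n])
    {
        double maxdiff = 0.0;
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                if (j != i)
                    sum += a[i][j] * xold[j];   /* off-diagonal terms */
            xnew[i] = (b[i] - sum) / a[i][i];   /* requires a[i][i] != 0 */
            double d = fabs(xnew[i] - xold[i]);
            if (d > maxdiff) maxdiff = d;
        }
        return maxdiff;
    }

The caller repeats the sweep, swapping xold and xnew each time, until the returned maximum change falls below the desired error threshold.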
6. Jacobi 2D
- The system is most efficient to solve when most of the a's are 0.
- When most of the a entries are non-zero, A is dense.
- When most of the a's are 0, A is sparse.
- Sparse matrices are regularly found in many scientific applications.
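Sparsity is what makes large instances tractable: rather than storing all $n \times n$ entries, only the non-zeros are kept. Below is a minimal compressed sparse row (CSR) sketch in C; the struct and field names are illustrative, not from the slides.

    /* Compressed Sparse Row (CSR) storage: only non-zero entries kept.
       For the Laplace A matrix this means at most 5 stored entries
       per row instead of n*n values per row. */
    typedef struct {
        int     n;       /* number of rows                             */
        int    *rowptr;  /* entries of row i: rowptr[i]..rowptr[i+1]-1 */
        int    *colidx;  /* column index of each stored entry          */
        double *val;     /* value of each stored entry                 */
    } csr_matrix;

    /* y = A * x for a CSR matrix: the kernel inside a Jacobi sweep. */
    void csr_matvec(const csr_matrix *A, const double *x, double *y)
    {
        for (int i = 0; i < A->n; i++) {
            double sum = 0.0;
            for (int k = A->rowptr[i]; k < A->rowptr[i+1]; k++)
                sum += A->val[k] * x[A->colidx[k]];
            y[i] = sum;
        }
    }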
7. Laplace's Equation
- The Jacobi strategy can be used effectively to solve sparse linear equations.
- One such equation is Laplace's equation:
  $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0$
- f is solved over a 2D space having coordinates x and y.
- If the distance between points (D) is small enough, f can be approximated by the central differences
  $\frac{\partial^2 f}{\partial x^2} \approx \frac{f(x+D, y) - 2 f(x, y) + f(x-D, y)}{D^2}$
  $\frac{\partial^2 f}{\partial y^2} \approx \frac{f(x, y+D) - 2 f(x, y) + f(x, y-D)}{D^2}$
- These equations reduce to
  $f(x, y) = \frac{f(x+D, y) + f(x-D, y) + f(x, y+D) + f(x, y-D)}{4}$
8. Laplace's Equation
- Note the relationship between the parameters: the updated value at (x, y) is the average of its four nearest neighbors.
- This forms a 4-point stencil.
- Any update will involve only local communication!
9. Solving Laplace's Equation Using the Jacobi Strategy
- Note that in Laplace's equation, we want to solve for all f(x, y), which has 2 parameters.
- In Jacobi, we want to solve for x_i, which has only 1 index.
- How do we convert f(x, y) into x_i?
- Associate the x_i's with the f(x, y)'s by distributing them in the f 2D matrix in row-major (natural) order (see the sketch below).
- For an n x n matrix there are then n x n x_i's, so the A matrix will need to be (n x n) x (n x n).
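A small sketch of the row-major mapping, assuming 0-based indices; the function name is illustrative.

    /* Row-major (natural) ordering of an n x n grid, 0-based:
       grid point f(row, col) <-> unknown x[i] with i = row * n + col. */
    int index_of(int row, int col, int n) { return row * n + col; }

    /* For an interior point i, the flattened N/E/W/S neighbors are
       north = i - n, south = i + n, west = i - 1, east = i + 1. */

Boundary points have fewer neighbors, which is why some rows of A end up with fewer than 5 non-zero entries.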
10. Solving Laplace's Equation Using the Jacobi Strategy
- When the x_i's are distributed in the f 2D matrix in row-major (natural) order,
  $f(x, y) = \frac{f(x+D, y) + f(x-D, y) + f(x, y+D) + f(x, y-D)}{4}$
  becomes
  $x_i = \frac{x_{i+1} + x_{i-1} + x_{i+n} + x_{i-n}}{4}$
11. Working Backward
- Now we want to work backward to find out what the A matrix and b vector will be for Jacobi.
- Our solution to Laplace's equation gives us equations of this form:
  $x_i = \frac{x_{i+1} + x_{i-1} + x_{i+n} + x_{i-n}}{4}$
- Rewriting, we get
  $4 x_i - x_{i+1} - x_{i-1} - x_{i+n} - x_{i-n} = 0$
- So the b_i are all 0. What is the A matrix?
12. Finding the A Matrix
- Each row has at most 5 non-zero entries.
- All entries on the diagonal are 4; each existing N/E/W/S neighbor contributes a -1 off the diagonal.
- (Figure: the A matrix for N = 9, n = 3; a sketch that builds and prints it follows.)
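To see the structure concretely, this sketch (not from the slides) builds the dense A for an n x n grid in row-major order and prints the N = 9, n = 3 example: 4's on the diagonal, -1 for each N/E/W/S neighbor, and at most 5 non-zeros per row.

    #include <stdio.h>

    /* Build the (n*n) x (n*n) Laplace matrix A in row-major order:
       A[i][i] = 4, A[i][j] = -1 for each N/E/W/S neighbor j of i. */
    void build_laplace(int n, double A[n*n][n*n])
    {
        int N = n * n;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                A[i][j] = 0.0;
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++) {
                int i = r * n + c;                  /* row-major index */
                A[i][i] = 4.0;                      /* diagonal        */
                if (r > 0)     A[i][i - n] = -1.0;  /* north neighbor  */
                if (r < n - 1) A[i][i + n] = -1.0;  /* south neighbor  */
                if (c > 0)     A[i][i - 1] = -1.0;  /* west neighbor   */
                if (c < n - 1) A[i][i + 1] = -1.0;  /* east neighbor   */
            }
    }

    int main(void)    /* prints the N = 9 (n = 3) example */
    {
        enum { n = 3 };
        double A[n*n][n*n];
        build_laplace(n, A);
        for (int i = 0; i < n*n; i++) {
            for (int j = 0; j < n*n; j++)
                printf("%3.0f", A[i][j]);
            printf("\n");
        }
        return 0;
    }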
13. Jacobi Implementation Strategy
- An initial guess is made for all the unknowns, typically x_i = b_i.
- New values for the x_i's are calculated using the iteration equations.
- The updated values are substituted in the iteration equations and the process repeats.
- The user provides a "termination condition" to end the iteration.
- An example termination condition is
  $\max_i |x_i^{(k)} - x_i^{(k-1)}| <$ error threshold.
14. Data Parallel Jacobi 2D Pseudo-code

    // Initialize ghost regions
    for (i = 1; i <= N; i++) {
        x[0][i]   = north[i];
        x[N+1][i] = south[i];
        x[i][0]   = west[i];
        x[i][N+1] = east[i];
    }
    // Initialize matrix
    for (i = 1; i <= N; i++)
        for (j = 1; j <= N; j++)
            x[i][j] = init_value;
    // Iterative refinement of x until values converge
    maxdiff = CONVERG + 1;   // force at least one iteration
    while (maxdiff > CONVERG) {
        // Update x array
        for (i = 1; i <= N; i++)
            for (j = 1; j <= N; j++)
                new_x[i][j] = 0.25 * (x[i-1][j] + x[i][j+1] +
                                      x[i+1][j] + x[i][j-1]);
        // Convergence test
        maxdiff = 0;
        for (i = 1; i <= N; i++)
            for (j = 1; j <= N; j++) {
                if (fabs(new_x[i][j] - x[i][j]) > maxdiff)
                    maxdiff = fabs(new_x[i][j] - x[i][j]);
                x[i][j] = new_x[i][j];
            }
    }
15. Jacobi 2D Programming Issues
- Synchronization
- Should we synchronize between iterations? Between multiple iterations?
- Should we tag information and let the application run asynchronously? (How bad can things get?)
- How often should we test for convergence?
- How important is it to know when we're done?
- How expensive is it?
16. Jacobi 2D Programming Issues
- Block decomposition or strip decomposition?
- How big should the blocks or strips be?
- How should blocks/strips be allocated to processors?
- (Figure: Block, Uniform Strip, and Non-uniform Strip decompositions.)
17. HPF-Style Data Decompositions
- 1D (processors P0, P1, P2, P3; tasks 0-15); owner formulas are sketched after the figure below
- Block decomposition (task i allocated to processor floor(i/b) for block size b = n/p; here floor(i/4))
- Cyclic decomposition (task i allocated to processor i mod p)
- Block-cyclic decomposition (block i allocated to processor i mod p)
- (Figure: Block, Cyclic, and Block-cyclic decompositions.)
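Each of these owner mappings reduces to a one-line formula. A hedged C sketch (function names are illustrative; b is the block size, ceil(n/p) in the block case):

    /* Owner of task i under the three 1D decompositions.
       n = number of tasks, p = number of processors, b = block size. */
    int block_owner(int i, int n, int p)        { return i / ((n + p - 1) / p); }
    int cyclic_owner(int i, int p)              { return i % p; }
    int block_cyclic_owner(int i, int b, int p) { return (i / b) % p; }

With n = 16 and p = 4 the block size is 4, so block_owner matches the slide's floor(i/4).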
18. HPF-Style Data Decompositions
- 2D
- Each dimension is partitioned by block, cyclic, block-cyclic, or * (do nothing).
- A useful set of uniform decompositions can be constructed (see the sketch below).
- (Figure: (Block, Block), (Block, *), and (*, Cyclic) decompositions.)
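Since each dimension is partitioned independently, a 2D owner is just the pair of 1D owners. A minimal sketch for the (Block, Block) case, assuming a Pr x Pc processor grid (names illustrative):

    /* (Block, Block) owner of grid point (i, j) on an n x n grid
       mapped to a Pr x Pc processor grid (linearized row-major). */
    int block_block_owner(int i, int j, int n, int Pr, int Pc)
    {
        int br = (n + Pr - 1) / Pr;   /* block height per processor row   */
        int bc = (n + Pc - 1) / Pc;   /* block width per processor column */
        return (i / br) * Pc + (j / bc);
    }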
19. Jacobi on a Cluster
- If each partition of Jacobi is executed on a
processor in a lab cluster, we can no longer
assume we have dedicated processors and network - In particular, the performance exhibited by the
cluster will vary over time and with load - How can we go about developing a
performance-efficient implementation in a more
dynamic environment?
20. Jacobi AppLeS
- We developed an AppLeS application scheduler.
- AppLeS = Application-Level Scheduler.
- An AppLeS is a scheduling agent which integrates with the application to form a Grid-aware, adaptive, self-scheduling application.
- We targeted the Jacobi AppLeS to a distributed clustered environment.
21. How Does AppLeS Work?
(Diagram: the AppLeS agent couples with the application to form a self-scheduling application. From the accessible resources it forms feasible resource sets, evaluates candidate schedules using the Grid Infrastructure and NWS, and dispatches the best schedule to the resources.)
22. Network Weather Service (Wolski, U. Tenn.)
- The NWS provides dynamic resource information for AppLeS.
- NWS is a stand-alone system.
- NWS:
- monitors the current system state
- provides the best forecast of resource load from multiple models
23. Jacobi 2D AppLeS Resource Selector
- Feasible resources are determined according to an application-specific distance metric.
- Choose the fastest machine as the locus.
- Compute the distance D from the locus based on a unit-sized application-specific benchmark:
  D(locus, X) = comp(unit, locus) - comp(unit, X) + comm(W, E columns)
- Resources are sorted according to distance from the locus, forming a desirability list.
- Feasible resource sets are formed from initial subsets of the sorted desirability list (see the sketch below).
- Next step: plan a schedule for each feasible resource set.
- The scheduler will choose the schedule with the best predicted execution time.
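A sketch of how the desirability list might be computed; the struct, field names, and the sign convention in D are assumptions, since the slide gives the metric only schematically.

    #include <stdlib.h>

    typedef struct {
        const char *name;
        double comp_unit;  /* benchmark time for a unit of work  */
        double comm_cols;  /* time to exchange W/E ghost columns */
        double dist;       /* distance D from the locus          */
    } machine;

    static int by_dist(const void *a, const void *b)
    {
        double d = ((const machine *)a)->dist - ((const machine *)b)->dist;
        return (d > 0) - (d < 0);
    }

    /* Sort machines by distance from the locus; m[0] is assumed to be
       the fastest machine (the locus) on entry. Feasible resource
       sets are then the prefixes of the sorted list. */
    void build_desirability_list(machine *m, int count)
    {
        for (int i = 0; i < count; i++)   /* slower or farther => larger D */
            m[i].dist = (m[i].comp_unit - m[0].comp_unit) + m[i].comm_cols;
        qsort(m, count, sizeof(machine), by_dist);
    }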
24. Jacobi 2D Performance Model and Schedule Planning
- Execution time for the ith strip:
  $T_i = comp_i / load_i + comm_i$
  where load_i is the predicted percentage of CPU time available (NWS), and comm_i is the time to send and receive messages, factored by the predicted bandwidth (NWS).
- AppLeS uses time-balancing to determine the best partition on a given set of resources:
- Solve
  $T_1 = T_2 = \cdots = T_p$
  for the strip sizes.
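If the communication term is ignored, time-balancing has a simple closed form: assign each machine rows in proportion to its predicted available speed, which equalizes the T_i. A hedged sketch under that simplification (function and parameter names are assumptions):

    /* Split n rows among p machines so predicted compute times match.
       speed[i] = predicted available speed of machine i, e.g.
       (benchmark rate) * (NWS-predicted fraction of CPU available).
       Simplification: communication cost is ignored here. */
    void time_balance(int n, int p, const double speed[], int rows[])
    {
        double total = 0.0;
        for (int i = 0; i < p; i++) total += speed[i];

        int assigned = 0;
        for (int i = 0; i < p; i++) {
            rows[i] = (int)(n * speed[i] / total);  /* proportional share */
            assigned += rows[i];
        }
        /* Hand leftover rows (from rounding down) to the fastest machine. */
        int fastest = 0;
        for (int i = 1; i < p; i++)
            if (speed[i] > speed[fastest]) fastest = i;
        rows[fastest] += n - assigned;
    }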
25. Jacobi 2D Experiments
- Experiments compare:
- Compile-time block (HPF-style) partitioning
- Compile-time irregular strip partitioning (no NWS forecasts, no resource selection)
- Run-time strip AppLeS partitioning
- Runs for the different partitioning methods were performed back-to-back on production systems.
- Average execution time was recorded.
- Distributed UCSD/SDSC platform: Sparcs, RS6000, Alpha Farm, SP-2.
26. Jacobi 2D AppLeS Experiments
- A representative Jacobi 2D AppLeS experiment.
- Adaptive scheduling leverages the deliverable performance of a contended system.
- The spike occurs when a gateway between the PCL and SDSC goes down.
- Subsequent AppLeS experiments avoid the slow link.