Title: Domain decomposition in parallel computing
1. Domain decomposition in parallel computing
COT 5410, Spring 2004
- Ashok Srinivasan
- www.cs.fsu.edu/asriniva
- Florida State University
2. Outline
- Background
- Geometric partitioning
- Graph partitioning
- Static
- Dynamic
- Important points
3. Background
- Tasks in a parallel computation need access to certain data
- The same datum may be needed by multiple tasks
- Example: In matrix-vector multiplication c = Ab, b_2 is needed for the computation of every c_i, 1 ≤ i ≤ n
- If a process does not own a datum needed by its task, then it has to get it from a process that has it
- This communication is expensive
- Aims of domain decomposition
- Distribute the data in such a manner that the required communication is minimized
- Ensure that the computational loads on processes are balanced
4. Domain decomposition example
- Finite difference computation
- The new value of a node depends on the old values of its neighbors
- We want to divide the nodes amongst the processes so that
- Communication is minimized (this is the measure of partition quality)
- The computational load is evenly balanced
5. Geometric partitioning
- Partition a set of points
- Uses only coordinate information
- Balances the load
- The heuristic tries to ensure that communication costs are low
- Algorithms are typically fast, but the partitions are not of high quality
- Examples
- Orthogonal recursive bisection
- Inertial
- Space filling curves
6. Orthogonal recursive bisection
- Recursively bisect orthogonal to the longest dimension
- Assumes communication is proportional to the surface area of the domain, and that cuts are aligned with the coordinate axes
- Recursive bisection
- Divide into two pieces, keeping the load balanced
- Apply recursively until the desired number of partitions is obtained
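The recursion above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides; the function name `orb` and the assumption that the number of parts is a power of two are mine.

```python
# Sketch of orthogonal recursive bisection (ORB) on d-dimensional points.
# Assumes num_parts is a power of two.

def orb(points, num_parts):
    """Partition points into num_parts balanced groups."""
    if num_parts == 1:
        return [points]
    # Find the longest dimension of the bounding box
    dims = len(points[0])
    extents = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dims)]
    d = extents.index(max(extents))
    # Bisect at the median along that dimension: the load stays balanced
    pts = sorted(points, key=lambda p: p[d])
    mid = len(pts) // 2
    return (orb(pts[:mid], num_parts // 2) +
            orb(pts[mid:], num_parts // 2))
```

For example, a flat 8 x 2 grid of points is first cut across its long (x) axis, then each half is cut again, giving four balanced parts.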
7. Inertial
- ORB may not be effective if cuts along the x, y, or z directions are not good ones
- Inertial bisection
- Recursively bisect orthogonal to the inertial axis (the principal axis of the point set)
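One inertial bisection step can be sketched as follows, assuming NumPy is available; the principal axis is taken as the dominant eigenvector of the covariance matrix of the points. Function and variable names are illustrative.

```python
# Sketch of one inertial bisection step (NumPy assumed available).
import numpy as np

def inertial_bisect(points):
    """Split points into two balanced halves across the inertial axis."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # Inertial axis = eigenvector of the covariance with largest eigenvalue
    cov = centered.T @ centered
    vals, vecs = np.linalg.eigh(cov)
    axis = vecs[:, -1]            # eigh returns eigenvalues in ascending order
    proj = centered @ axis        # project points onto the inertial axis
    order = np.argsort(proj)
    mid = len(pts) // 2
    return pts[order[:mid]], pts[order[mid:]]
```

For points scattered along a diagonal, ORB's axis-aligned cuts are poor, but the inertial axis follows the diagonal and the median split separates the two ends cleanly.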
8. Space filling curves
- A space filling curve is a continuous curve that fills the space
- Order the points based on their relative position on the curve
- Choose a curve that preserves proximity
- Points that are close in space should be close in the ordering too
- Example: Hilbert curve
9. Hilbert curve
- Sources
- http://www.dcs.napier.ac.uk/andrew/hilbert.html
- http://www.fractalus.com/kerry/tutorials/hilbert/hilbert-tutorial.html
10. Domain decomposition with a space filling curve
- Order points based on their position on the curve
- Divide into P parts, where P is the number of processes
- Space filling curves can be used in adaptive computations too
- They can be extended to higher dimensions too
11. Graph partitioning
- Model the problem as graph partitioning
- Graph G = (V, E)
- Each task is represented by a vertex
- A vertex weight can represent the computational effort
- An edge exists between two tasks if one needs data owned by the other
- Weights can be associated with edges too
- Goal
- Partition the vertices into P parts such that each part has equal vertex weight
- Minimize the total weight of the edges cut
- The problem is NP-hard
- Edge cut metric
- Judge the quality of the partitioning by the number of edges cut
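The edge cut metric is simple to compute; here is a small helper (names are illustrative, not from the slides):

```python
# Edge-cut metric: count edges whose endpoints land in different parts.

def edge_cut(edges, part):
    """edges: iterable of (u, v) pairs; part: dict vertex -> part id."""
    return sum(1 for u, v in edges if part[u] != part[v])

# A path 0-1-2-3 split down the middle cuts exactly one edge
path = [(0, 1), (1, 2), (2, 3)]
middle_split = {0: 0, 1: 0, 2: 1, 3: 1}
```

A weighted variant would sum edge weights instead of counting edges.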
12. Static graph partitioning
- Combinatorial
- Levelized nested dissection
- Kernighan-Lin/Fiduccia-Mattheyses
- Spectral partitioning
- Multilevel methods
13. Combinatorial partitioning
- Uses only connectivity information
- Examples
- Levelized nested dissection
- Kernighan-Lin/Fiduccia-Mattheyses
14. Levelized nested dissection (LND)
- The idea is similar to the geometric methods
- But coordinate information cannot be used
- Instead of projecting vertices along the longest axis, order them based on their distance from a vertex that may lie at one extreme of the longest dimension of the graph
- Pseudo-peripheral vertex
- Perform a breadth-first search, starting from an arbitrary vertex
- The vertex encountered last may be a good approximation to a peripheral vertex
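The two BFS passes can be sketched as follows; this is an illustrative sketch for adjacency-list graphs, with names of my choosing.

```python
# LND sketch: find a pseudo-peripheral vertex by BFS, then take the
# first half of the BFS ordering from that vertex as one part.
from collections import deque

def bfs_order(adj, start):
    """Return vertices in breadth-first order from start."""
    seen = {start}
    order = [start]
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                order.append(v)
                q.append(v)
    return order

def lnd_bisect(adj):
    # The last vertex of a BFS from an arbitrary start approximates a
    # peripheral vertex; a second BFS from it orders vertices by level
    start = next(iter(adj))
    pseudo_peripheral = bfs_order(adj, start)[-1]
    order = bfs_order(adj, pseudo_peripheral)
    mid = len(order) // 2
    return set(order[:mid]), set(order[mid:])
```

On a path graph this recovers the obvious split at the middle, since BFS levels from one end are exactly distances along the path.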
15. LND example: Finding a pseudo-peripheral vertex
[Figure: BFS levels 1-3 spreading from an initial vertex; the vertex reached last, at level 4, is taken as the pseudo-peripheral vertex]
16. LND example: Partitioning
[Figure: BFS levels 1-6 from the pseudo-peripheral vertex; the vertices in the first half of the ordering form one partition]
- Recursively bisect the subgraphs
17. Kernighan-Lin/Fiduccia-Mattheyses
- Refines an existing partition
- Kernighan-Lin
- Consider pairs of vertices from different partitions
- Choose the pair whose swap results in the best improvement in partition quality
- The best improvement may actually be a worsening
- Perform several passes
- Choose the best partition among those encountered
- Fiduccia-Mattheyses
- Similar, but more efficient
- Boundary Kernighan-Lin
- Consider only boundary vertices for swapping
- ... and many other variants
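A single Kernighan-Lin pass on an unweighted graph can be sketched as below. This is a deliberately naive illustration (quadratic pair search, adjacency sets); real implementations such as Fiduccia-Mattheyses use bucket structures for efficiency. The swap gain uses the standard formula D(a) + D(b) - 2·w(a,b), where D(v) is the external minus internal degree of v.

```python
# One greedy Kernighan-Lin pass: swap the best pair, lock it, repeat,
# and keep the best partition seen along the way (which may require
# accepting temporarily worse swaps).
import itertools

def cut(adj, A):
    return sum(1 for u in A for v in adj[u] if v not in A)

def kl_pass(adj, A, B):
    A, B = set(A), set(B)
    best = (cut(adj, A), set(A), set(B))
    locked = set()
    for _ in range(min(len(A), len(B))):
        best_swap = None
        for a, b in itertools.product(A - locked, B - locked):
            # D(v) = external degree - internal degree
            da = sum(1 if v in B else -1 for v in adj[a])
            db = sum(1 if v in A else -1 for v in adj[b])
            gain = da + db - 2 * (b in adj[a])
            if best_swap is None or gain > best_swap[0]:
                best_swap = (gain, a, b)
        _, a, b = best_swap
        A.remove(a); B.add(a)
        B.remove(b); A.add(b)
        locked |= {a, b}
        c = cut(adj, A)
        if c < best[0]:
            best = (c, set(A), set(B))
    return best
```

On two triangles joined by a bridge edge, starting from a poor split, one swap already recovers the optimal cut of 1.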
18. Kernighan-Lin example
[Figure: an existing partition with edge cut 4; swapping one pair of vertices yields a better partition with edge cut 3]
19. Spectral method
- Based on the observation that a Fiedler vector of a graph contains connectivity information
- Laplacian of a graph, L
- l_ii = d_i (the degree of vertex i)
- l_ij = -1 if edge (i, j) exists, otherwise 0
- The smallest eigenvalue of L is 0, with the all-ones eigenvector
- All other eigenvalues are positive for a connected graph
- Fiedler vector
- The eigenvector corresponding to the second smallest eigenvalue
20. Fiedler vector
- Consider a partitioning of V into A and B
- Let y_i = 1 if v_i ∈ A, and y_i = -1 if v_i ∈ B
- For load balance, Σ_i y_i = 0
- Also, Σ_{(i,j) ∈ E} (y_i - y_j)^2 = 4 × (number of edges across partitions)
- Also, y^T L y = Σ_i d_i y_i^2 - 2 Σ_{(i,j) ∈ E} y_i y_j = Σ_{(i,j) ∈ E} (y_i - y_j)^2
21. Optimization problem
- The optimal partition is obtained by solving
- Minimize y^T L y
- Constraints
- y_i ∈ {-1, 1}
- Σ_i y_i = 0
- This is NP-hard
- Relaxed problem
- Minimize y^T L y
- Constraints
- Σ_i y_i = 0
- Add a constraint on a norm of y, for example ||y||_2 = n^(1/2)
- Note
- (1, 1, ..., 1)^T is an eigenvector with eigenvalue 0
- For a connected graph, all other eigenvalues are positive, and their eigenvectors are orthogonal to this one, which implies Σ_i y_i = 0
- The objective function is minimized by a Fiedler vector
22. Spectral algorithm
- Find a Fiedler vector of the Laplacian of the graph
- Note that the Fiedler value (the second smallest eigenvalue) yields a lower bound on the communication cost when the load is balanced
- From the Fiedler vector, bisect the graph
- Let all vertices with components in the Fiedler vector greater than the median be in one part, and the rest in the other
- Apply this recursively to each partition
- Note: Finding the Fiedler vector of a large graph can be time consuming
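The algorithm above can be sketched with a dense eigensolver, assuming NumPy; for large graphs one would use an iterative method (e.g. Lanczos) instead of `numpy.linalg.eigh`.

```python
# Spectral bisection sketch: build the Laplacian, take the eigenvector
# of the second smallest eigenvalue (the Fiedler vector), and split the
# vertices at its median component.
import numpy as np

def spectral_bisect(n, edges):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = vecs[:, 1]                # Fiedler vector
    median = np.median(fiedler)
    A = {i for i in range(n) if fiedler[i] > median}
    B = set(range(n)) - A
    return A, B
```

On a path graph the Fiedler vector is monotone along the path, so the median split cuts the path in the middle, which is the optimal bisection.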
23. Multilevel methods
- Idea
- It takes time to partition a large graph
- So partition a small graph instead!
- Three phases
- Graph coarsening
- Combine vertices to create a smaller graph
- Example: Find a suitable matching
- Apply this recursively until a suitably small graph is obtained
- Partitioning
- Use spectral or another partitioning algorithm to partition the small graph
- Multilevel refinement
- Uncoarsen the graph to get a partitioning of the original graph
- At each level, perform some graph refinement
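One coarsening step via a matching can be sketched as follows; this uses a simple greedy maximal matching (real coarseners prefer, e.g., heavy-edge matchings), and all names are illustrative.

```python
# One coarsening step: find a greedy matching, then collapse each
# matched pair into a single coarse vertex whose weight is the sum.

def coarsen(adj, weight):
    """adj: dict vertex -> set of neighbors; weight: dict vertex -> weight.
    Returns (coarse_adj, coarse_weight, vertex_map)."""
    matched = {}
    for u in adj:                       # greedy maximal matching
        if u in matched:
            continue
        for v in adj[u]:
            if v not in matched:
                matched[u] = v
                matched[v] = u
                break
    # Assign each fine vertex (or matched pair) a coarse vertex id
    vmap, nid = {}, 0
    for u in adj:
        if u not in vmap:
            vmap[u] = nid
            if u in matched:
                vmap[matched[u]] = nid
            nid += 1
    coarse_adj = {i: set() for i in range(nid)}
    coarse_w = {i: 0 for i in range(nid)}
    for u in adj:
        coarse_w[vmap[u]] += weight[u]
        for v in adj[u]:
            if vmap[u] != vmap[v]:
                coarse_adj[vmap[u]].add(vmap[v])
    return coarse_adj, coarse_w, vmap
```

Applied to a unit-weight path 0-1-2-3, the pairs (0,1) and (2,3) collapse into two coarse vertices of weight 2 joined by one edge.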
24.-28. Multilevel example (without refinement)
[Figures: a 16-vertex graph is repeatedly coarsened by matching vertices (matched pairs collapse into weighted coarse vertices), the coarsest graph is bisected, and the bisection is projected back through the uncoarsening steps to the original graph]
29. Dynamic partitioning
- We have an initial partitioning
- Now the graph changes
- Determine a good partition, fast
- Also minimize the number of vertices that need to be moved
- Examples
- PLUM
- JOSTLE
- Diffusion
30. PLUM
- Partition based on the initial mesh
- Only the vertex and edge weights change
- Map partitions to processors
- Use more partitions than processors
- Ensures finer granularity
- Compute a similarity matrix based on the data already on each process
- Measures the savings in data redistribution cost for each (process, partition) pair
- Choose an assignment of partitions to processors
- Example: Maximum weight matching
- Duplicate each processor (number of partitions)/P times
- Alternative: Greedy approximation algorithm
- Assign in order of maximum similarity value
- http://citeseer.nj.nec.com/oliker98plum.html
31. JOSTLE
- Uses Hu and Blake's scheme for load balancing
- Solve Lx = b using Conjugate Gradient
- L = Laplacian of the processor graph, b_i = (weight on process P_i) - (average weight)
- Move max(x_i - x_j, 0) weight from P_i to P_j
- Leads to a balanced load
- Equivalent to P_i sending x_i load to each neighbor P_j, and each neighbor P_j sending x_j load to P_i
- Net loss in load for P_i = d_i x_i - Σ_{neighbors j} x_j = L(i) x = b_i
- where L(i) is row i of L, and d_i is the degree of vertex i
- New load for P_i = (weight on P_i) - b_i = average weight
- Leads to the minimum L2 norm of the load moved
- Using max(x_i - x_j, 0)
- Select vertices to move based on relative gain
- http://citeseer.nj.nec.com/walshaw97parallel.html
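The flow computation in Hu and Blake's scheme can be sketched as below, assuming NumPy. For simplicity this solves the singular system Lx = b with the pseudoinverse rather than Conjugate Gradient; names are illustrative.

```python
# Hu-Blake balancing flow sketch: solve Lx = b on the processor graph,
# then the load to move across edge (i, j) is x_i - x_j (positive means
# P_i sends to P_j).
import numpy as np

def balancing_flow(n, edges, load):
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    b = np.asarray(load, dtype=float) - np.mean(load)
    x = np.linalg.pinv(L) @ b          # L is singular: use the pseudoinverse
    return {(i, j): x[i] - x[j] for i, j in edges}, x
```

For two processors with loads 4 and 0, the scheme moves 2 units across the single edge; for a 3-processor path with loads (3, 0, 0) it moves 2 units along the first edge and 1 along the second, balancing every processor at the average.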
32. Diffusion
- Involves only communication with neighbors
- A simple scheme
- Processor P_i repeatedly sends α w_i weight to each neighbor
- w_i = weight on P_i
- w^k = (I - α L) w^{k-1}, where w^k is the weight vector at iteration k
- Simple criteria exist for choosing α to ensure convergence
- Example: α = 0.5/(max_i d_i)
- More sophisticated schemes exist
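The simple scheme is a few lines of Python; this illustrative sketch applies the update w ← (I - αL)w edge by edge, with α = 0.5/(max_i d_i) as on the slide.

```python
# Diffusion sketch: each step, every processor exchanges alpha*(w_i - w_j)
# with each neighbor; for a connected graph and a suitable alpha the
# loads converge to the average.

def diffuse(adj, load, alpha, steps):
    """adj: dict node -> list of neighbors; load: dict node -> weight."""
    w = dict(load)
    for _ in range(steps):
        new = {}
        for i in adj:
            # Equivalent to one row of w <- (I - alpha*L) w
            new[i] = w[i] - alpha * sum(w[i] - w[j] for j in adj[i])
        w = new
    return w
```

On a 3-processor path with all 6 units of load on one end, alpha = 0.5/2 = 0.25 and the loads approach 2 on every processor.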
33. Important points
- Goals of domain decomposition
- Balance the load
- Minimize communication
- Space filling curves
- Graph partitioning model
- Spectral method
- Relax the NP-hard integer optimization to a floating point problem, then discretize to get an approximate integer solution
- Multilevel methods
- Three phases
- Dynamic partitioning has additional requirements
- Use the old solution to find the new one fast
- Minimize the number of vertices moved