Title: CS 290H 24 October Parallel computing and preconditioning
1 CS 290H 24 October: Parallel computing and preconditioning
- Read Chen & Toledo, "Vaidya's preconditioners: Implementation and experimental study" (see references page).
- Homework 2 on web page by end of today, due Mon 7 Nov.
- Parallel CG, matrix-vector multiplication, graph partitioning
- Parallel triangular solves, graph coloring for ILU
- Introduction to sparse approximate inverse preconditioners
2 Preconditioned conjugate gradient iteration
$x_0 = 0$, $r_0 = b$, $d_0 = B^{-1} r_0$, $y_0 = B^{-1} r_0$
for $k = 1, 2, 3, \ldots$
    $\alpha_k = (y_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})$    step length
    $x_k = x_{k-1} + \alpha_k d_{k-1}$    approx solution
    $r_k = r_{k-1} - \alpha_k A d_{k-1}$    residual
    $y_k = B^{-1} r_k$    preconditioning solve
    $\beta_k = (y_k^T r_k) / (y_{k-1}^T r_{k-1})$    improvement
    $d_k = y_k + \beta_k d_{k-1}$    search direction
- Several vector inner products per iteration (easy to parallelize)
- One matrix-vector multiplication per iteration (medium to parallelize)
- One solve with preconditioner per iteration (hard to parallelize)
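The iteration above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's code: the names `matvec`, `dot`, `pcg`, and `apply_Binv` are hypothetical, the matrix is stored densely as a list of rows, a relative-residual stopping test is added, and the Jacobi (diagonal) preconditioner in the usage lines is just one simple choice of B.

```python
def matvec(A, x):
    """Dense matrix-vector product y = A x (A is a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Vector inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def pcg(A, b, apply_Binv, tol=1e-10, max_iter=1000):
    """Preconditioned CG for symmetric positive definite A.

    apply_Binv(r) must return B^{-1} r.  Returns (x, iterations used).
    The stopping test ||r|| <= tol * ||b|| is an assumption of this sketch.
    """
    x = [0.0] * len(b)
    r = list(b)                     # r0 = b because x0 = 0
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    y = apply_Binv(r)               # y0 = B^{-1} r0
    d = list(y)                     # d0 = y0
    yr = dot(y, r)
    for k in range(1, max_iter + 1):
        Ad = matvec(A, d)                                   # one matrix-vector product
        alpha = yr / dot(d, Ad)                             # step length
        x = [xi + alpha * di for xi, di in zip(x, d)]       # approx solution
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]    # residual
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        y = apply_Binv(r)                                   # preconditioning solve
        yr_new = dot(y, r)
        beta = yr_new / yr                                  # improvement
        d = [yi + beta * di for yi, di in zip(y, d)]        # search direction
        yr = yr_new
    return x, max_iter


# Usage: 1D Laplacian tridiag(-1, 2, -1) with a Jacobi preconditioner B = diag(A).
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iters = pcg(A, b, lambda r: [ri / A[i][i] for i, ri in enumerate(r)])
# The exact solution of this system is [2.5, 4.0, 4.5, 4.0, 2.5].
```

The three kinds of work listed on the slide appear as `dot`, `matvec`, and `apply_Binv`; in a parallel code each would be replaced by its distributed counterpart.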
3 Matrix-vector product: Parallel implementation
- Lay out matrix and vectors by rows
- Hard part is the matrix-vector product y = Ax
- Algorithm: each processor j
  - Broadcasts x(j)
  - Computes y(j) = A(j,:) * x
- May send more of x than needed
- Partition / reorder matrix to reduce communication
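To make the "may send more of x than needed" point concrete, the sketch below simulates the row layout serially and counts how many off-processor entries of x each processor actually uses. The function names `block_ranges` and `distributed_matvec`, the dict-of-rows sparse storage, and the contiguous block layout are all assumptions made for illustration; no real message passing takes place.

```python
def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous blocks (as Python ranges)."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for j in range(p):
        size = base + (1 if j < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def distributed_matvec(rows, x, p):
    """Simulate y = A x with rows of A and entries of x laid out by blocks.

    rows[i] is a dict {column: value} for row i of the sparse matrix A.
    Returns (y, needed), where needed[j] is the number of entries of x
    owned by other processors that processor j actually uses.
    """
    n = len(rows)
    y = [0.0] * n
    needed = []
    for owned in block_ranges(n, p):        # one pass per simulated processor
        remote = set()
        for i in owned:
            for col, val in rows[i].items():
                y[i] += val * x[col]
                if col not in owned:
                    remote.add(col)         # x[col] lives on another processor
        needed.append(len(remote))
    return y, needed


# Usage: tridiag(-1, 2, -1) of order 8 on 4 simulated processors.
n, p = 8, 4
rows = [{} for _ in range(n)]
for i in range(n):
    rows[i][i] = 2.0
    if i > 0:
        rows[i][i - 1] = -1.0
    if i < n - 1:
        rows[i][i + 1] = -1.0
x = [1.0] * n
y, needed = distributed_matvec(rows, x, p)
# needed is [1, 2, 2, 1]; a full broadcast would deliver 6 remote entries to each.
```

For this banded matrix the natural ordering already keeps communication low; for a general sparse matrix the counts depend on the ordering, which is what partitioning and reordering try to improve.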
4 Graph partitioning in theory
- If G is a planar graph with n vertices, there exists a set of at most sqrt(6n) vertices whose removal leaves no connected component with more than 2n/3 vertices. (Planar graphs have sqrt(n)-separators.)
- Well-shaped finite element meshes in 3 dimensions have n^(2/3)-separators.
- Also some other classes of graphs: trees, graphs of bounded genus, chordal graphs, bounded-excluded-minor graphs, ...
- Mostly these theorems come with efficient algorithms, but they aren't used much.
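As a small check of the planar bound on one easy case, the sketch below removes the middle column of a k-by-k grid graph (which is planar) and measures the remaining components. The helper names `grid_graph` and `component_sizes` are hypothetical, and the middle column is a hand-picked separator for this example, not the output of a general separator algorithm.

```python
import math
from collections import deque


def grid_graph(k):
    """Adjacency lists of the k-by-k grid graph; vertex (i, j) is numbered i*k + j."""
    adj = {v: [] for v in range(k * k)}
    for i in range(k):
        for j in range(k):
            v = i * k + j
            if i + 1 < k:               # edge to the vertex below
                adj[v].append(v + k)
                adj[v + k].append(v)
            if j + 1 < k:               # edge to the vertex on the right
                adj[v].append(v + 1)
                adj[v + 1].append(v)
    return adj


def component_sizes(adj, removed):
    """Sizes of the connected components left after deleting `removed`."""
    seen = set(removed)
    sizes = []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sizes


k = 9
n = k * k
adj = grid_graph(k)
separator = [i * k + k // 2 for i in range(k)]    # middle column, k = sqrt(n) vertices
sizes = component_sizes(adj, separator)
within_bound = len(separator) <= math.sqrt(6 * n) and max(sizes) <= 2 * n / 3
```

Here the separator has 9 vertices and leaves two components of 36 vertices each, comfortably inside the sqrt(6n) and 2n/3 limits.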
5 Graph partitioning in practice
- Graph partitioning heuristics have been an active research area for many years, often motivated by partitioning for parallel computation. See CS 240A.
- Some techniques:
  - Spectral partitioning (uses eigenvectors of the Laplacian matrix of the graph)
  - Geometric partitioning (for meshes with specified vertex coordinates)
  - Iterative swapping (Kernighan-Lin, Fiduccia-Mattheyses)
  - Breadth-first search (GLN 7.3.3, fast but dated)
- Many popular modern codes (e.g. Metis, Chaco) use multilevel iterative swapping
- Matlab graph partitioning toolbox: see course web page
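Of the techniques listed, breadth-first search is the simplest to sketch. The version below is a deliberately simplified variant (take the first half of the vertices in BFS order), not necessarily the algorithm described in GLN 7.3.3; the names `bfs_bisect` and `grid_graph` are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque


def bfs_bisect(adj, start):
    """Split a connected graph into two halves using BFS order from `start`.

    Returns (part_a, part_b, cut), where cut counts edges between the parts.
    """
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    half = len(order) // 2
    part_a = set(order[:half])          # vertices closest to the start vertex
    part_b = set(order[half:])
    cut = sum(1 for v in part_a for w in adj[v] if w in part_b)
    return part_a, part_b, cut


def grid_graph(k):
    """Adjacency lists of the k-by-k grid graph; vertex (i, j) is numbered i*k + j."""
    adj = {v: [] for v in range(k * k)}
    for i in range(k):
        for j in range(k):
            v = i * k + j
            if i + 1 < k:
                adj[v].append(v + k)
                adj[v + k].append(v)
            if j + 1 < k:
                adj[v].append(v + 1)
                adj[v + 1].append(v)
    return adj


# Usage: bisect a 4-by-4 grid starting from a corner vertex.
adj = grid_graph(4)
part_a, part_b, cut = bfs_bisect(adj, 0)
```

The two parts are balanced by construction, but the cut size depends heavily on the start vertex, which is one reason such a partition is usually refined afterwards by iterative swapping.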
6 Parallel Incomplete Cholesky and ILU: Issues
- Computing the preconditioner
  - Parallel direct methods are well developed
  - But IC/ILU is sparser => harder to speed up
  - Still, you only have to do it once
- Applying the preconditioner
  - Triangular solves are not very parallel
  - Reordering by graph coloring (see example)
  - But the orderings are not great for convergence
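The effect of a coloring on the triangular solve can be illustrated by counting sequential steps. The sketch below assumes a no-fill factor (the triangular factor has the sparsity pattern of A, as in ILU(0)) on a 5-point grid, so each unknown waits only for its earlier-ordered neighbors. The names `grid_graph`, `greedy_coloring`, and `solve_levels` are hypothetical, and the greedy rule is one simple coloring heuristic among many.

```python
def grid_graph(k):
    """5-point-stencil adjacency on a k-by-k grid; vertex (i, j) is numbered i*k + j."""
    adj = {v: [] for v in range(k * k)}
    for i in range(k):
        for j in range(k):
            v = i * k + j
            if i + 1 < k:
                adj[v].append(v + k)
                adj[v + k].append(v)
            if j + 1 < k:
                adj[v].append(v + 1)
                adj[v + 1].append(v)
    return adj


def greedy_coloring(adj, order):
    """Give each vertex the smallest color not used by an already colored neighbor."""
    color = {}
    for v in order:
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def solve_levels(adj, order):
    """Number of sequential steps in a lower triangular solve for this ordering.

    An unknown can be computed one step after the last of its
    earlier-ordered neighbors; unknowns on the same level are independent.
    """
    position = {v: p for p, v in enumerate(order)}
    level = {}
    for v in order:
        earlier = [level[w] for w in adj[v] if position[w] < position[v]]
        level[v] = 1 + max(earlier, default=0)
    return max(level.values())


k = 6
adj = grid_graph(k)
natural = list(range(k * k))
color = greedy_coloring(adj, natural)
by_color = sorted(natural, key=lambda v: (color[v], v))   # all of color 0, then color 1
natural_steps = solve_levels(adj, natural)    # 2k - 1 = 11 sequential steps
colored_steps = solve_levels(adj, by_color)   # 2 steps, one per color
```

On this grid the greedy rule produces the red-black coloring, and the solve shrinks from 11 sequential steps to 2. The sketch measures only parallelism; it says nothing about how the reordered incomplete factorization affects convergence.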
7 Sparse approximate inverses
- Compute $B^{-1} \approx A^{-1}$ explicitly
- Minimize $\|A B^{-1} - I\|_F$ (in parallel, by columns)
- Variants: factored form of $B^{-1}$, more fill, ...
- Good: very parallel, seldom breaks down
- Bad: effectiveness varies widely
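The "in parallel, by columns" remark rests on the fact that the squared Frobenius norm of A M - I is the sum over columns j of ||A m_j - e_j||^2, so each column m_j of M (the explicit approximation to the inverse) is an independent small least-squares problem. The sketch below is one simple instance, assuming a dense list-of-rows matrix, the sparsity pattern of A for M, and a normal-equations solve; the names `solve_dense`, `spai_column`, `spai`, and `frobenius_residual` are hypothetical.

```python
def solve_dense(G, g):
    """Solve G z = g by Gaussian elimination with partial pivoting."""
    m = len(g)
    M = [list(G[i]) + [g[i]] for i in range(m)]     # augmented matrix
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for q in range(c, m + 1):
                M[r][q] -= f * M[c][q]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):                  # back substitution
        s = M[r][m] - sum(M[r][q] * z[q] for q in range(r + 1, m))
        z[r] = s / M[r][r]
    return z


def spai_column(A, j, pattern):
    """Column m minimizing ||A m - e_j||_2 with m nonzero only on `pattern`."""
    n = len(A)
    # Normal equations (A_S^T A_S) z = A_S^T e_j with A_S = A[:, pattern].
    G = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in pattern]
         for p in pattern]
    g = [A[j][p] for p in pattern]
    z = solve_dense(G, g)
    m = [0.0] * n
    for p, zp in zip(pattern, z):
        m[p] = zp
    return m


def spai(A):
    """Approximate inverse with the sparsity pattern of A, built column by column."""
    n = len(A)
    cols = []
    for j in range(n):      # columns are independent, so this loop could run in parallel
        pattern = [i for i in range(n) if A[i][j] != 0.0]
        cols.append(spai_column(A, j, pattern))
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def frobenius_residual(A, M):
    """Frobenius norm of A M - I."""
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(A[i][k] * M[k][j] for k in range(n)) - (1.0 if i == j else 0.0)
            total += entry * entry
    return total ** 0.5


# Usage: tridiag(-1, 4, -1) of order 6, compared with the Jacobi choice diag(A)^{-1}.
n = 6
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
M = spai(A)
jacobi = [[0.25 if i == j else 0.0 for j in range(n)] for i in range(n)]
res_spai = frobenius_residual(A, M)
res_jacobi = frobenius_residual(A, jacobi)
```

Because the tridiagonal pattern contains the diagonal one, `res_spai` cannot exceed `res_jacobi`. Allowing a larger pattern (more fill) lowers the residual further at the cost of a denser M.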