Title: Computational meshes, matrices, conjugate gradients, and mesh partitioning
1. Computational meshes, matrices, conjugate gradients, and mesh partitioning
- Some slides are from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267
2. Parallelism in Regular meshes
- Computing a stencil on a regular mesh
- Need to communicate mesh points near the boundary to neighboring processors
  - Often done with ghost regions, which adds memory overhead (a sketch follows below)
- Surface-to-volume ratio keeps communication down, but it may still be problematic in practice
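
A minimal Python/mpi4py sketch of the ghost-region idea, assuming a 2D mesh whose rows are block-distributed over the processors and a 5-point Jacobi-style stencil; the mesh size N, the random initial data, and the update rule are illustrative choices, not from the slides:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, p = comm.Get_rank(), comm.Get_size()
    N = 64                                  # global mesh is N x N (assumed)
    local = N // p                          # rows owned by this rank (assumes p divides N)

    # local rows plus one ghost row above and below: the memory overhead
    u = np.zeros((local + 2, N))
    u[1:-1, 1:-1] = np.random.rand(local, N - 2)   # arbitrary initial interior values
    u_new = np.zeros_like(u)

    up   = rank - 1 if rank > 0     else MPI.PROC_NULL
    down = rank + 1 if rank < p - 1 else MPI.PROC_NULL

    for it in range(100):
        # fill ghost rows: swap boundary rows with the neighboring processors
        comm.Sendrecv(u[1],  dest=up,   recvbuf=u[-1], source=down)
        comm.Sendrecv(u[-2], dest=down, recvbuf=u[0],  source=up)
        # 5-point stencil (Jacobi sweep) on locally owned interior points
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                    + u[1:-1, :-2] + u[1:-1, 2:])
        u, u_new = u_new, u

Each sweep communicates only the boundary rows (the surface), while the computation touches all owned rows (the volume).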
3. Irregular mesh: NASA Airfoil in 2D
4. Composite Mesh from a Mechanical Structure
5. Adaptive Mesh Refinement (AMR)
- Adaptive mesh around an explosion
- Refinement done by calculating error estimates; refine where the error is large
- Parallelism
  - Mostly between patches, dealt to processors for load balance
  - May exploit some parallelism within a patch (SMP)
6. Adaptive Mesh
- Shock waves in gas dynamics using AMR (Adaptive Mesh Refinement); the figure shows fluid density
- See http://www.llnl.gov/CASC/SAMRAI/
7. Irregular mesh: Tapered Tube (multigrid)
8. Challenges of Irregular Meshes for PDEs
- How to generate them in the first place
  - E.g., Triangle, a 2D mesh generator by Jonathan Shewchuk
  - 3D is harder! E.g., QMG by Stephen Vavasis
- How to partition them
  - E.g., ParMetis, a parallel graph partitioner
- How to design iterative solvers
  - E.g., PETSc, the Portable, Extensible Toolkit for Scientific Computing
  - E.g., Prometheus, a multigrid solver for finite element problems on irregular meshes
- How to design direct solvers
  - E.g., SuperLU, parallel sparse Gaussian elimination
- These are challenges even sequentially, and more so in parallel
9. Converting the Mesh to a Matrix
10. Sparse matrix data structure (stored by rows)
- Example: the full 3-by-3 matrix

    [ 31   0  53 ]
    [  0  59   0 ]
    [ 41  26   0 ]

  is stored in compressed rows as

    values:         31 53 59 41 26
    column indices:  1  3  2  1  2

- Full
  - 2-dimensional array of real or complex numbers
  - (nrows * ncols) memory
- Sparse
  - Compressed row storage (CSR)
  - About (2*nzs + nrows) memory
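
The slide's 3-by-3 example can be reproduced with scipy's CSR type; this sketch just exposes the three underlying arrays (scipy uses 0-based column indices where the slide uses 1-based):

    import numpy as np
    from scipy.sparse import csr_matrix

    A_dense = np.array([[31,  0, 53],
                        [ 0, 59,  0],
                        [41, 26,  0]])
    A = csr_matrix(A_dense)
    print(A.data)      # values:         [31 53 59 41 26]
    print(A.indices)   # column indices: [0 2 1 0 1]  (slide shows 1-based: 1 3 2 1 2)
    print(A.indptr)    # row pointers:   [0 2 3 5]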
11. Distributed row sparse matrix data structure
- The rows are dealt out in contiguous blocks to processors P0, P1, ..., Pn
- Each processor stores
  - the number of local nonzeros
  - the range of local rows it owns
  - its local nonzeros in CSR form
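
As an illustration, the per-processor piece described above might be held in a record like the following Python sketch; the field names are assumptions for this sketch, not any particular library's API:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class LocalCSR:
        row_start: int         # first global row owned by this processor
        row_end: int           # one past the last owned global row
        nnz: int               # number of local nonzeros
        values: np.ndarray     # the nnz nonzero values
        col_idx: np.ndarray    # global column index of each nonzero
        row_ptr: np.ndarray    # local CSR row pointers, length row_end - row_start + 1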
12. Conjugate gradient iteration to solve Ax = b

    x_0 = 0,  r_0 = b,  d_0 = r_0
    for k = 1, 2, 3, . . .
        α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    (step length)
        x_k = x_{k-1} + α_k d_{k-1}                          (approx solution)
        r_k = r_{k-1} - α_k A d_{k-1}                        (residual)
        β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})              (improvement this step)
        d_k = r_k + β_k d_{k-1}                              (search direction)

- One matrix-vector multiplication per iteration
- Two vector dot products per iteration
- Four n-vectors of working storage
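
A direct Python transcription of the iteration above, assuming A is symmetric positive definite and b is a float vector; note the single matvec and the two dot products per iteration, exactly as the slide counts them:

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, maxiter=1000):
        x = np.zeros_like(b)            # x_0 = 0
        r = b.copy()                    # r_0 = b
        d = r.copy()                    # d_0 = r_0
        rTr = r @ r
        for k in range(maxiter):
            Ad = A @ d                  # the one matvec per iteration
            alpha = rTr / (d @ Ad)      # step length (first dot product)
            x = x + alpha * d           # approximate solution
            r = r - alpha * Ad          # residual
            rTr_new = r @ r             # second dot product
            beta = rTr_new / rTr        # improvement this step
            d = r + beta * d            # new search direction
            rTr = rTr_new
            if np.sqrt(rTr) < tol:      # converged?
                break
        return x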
13. Parallel Dense Matrix-Vector Product (Review)
- y = A*x, where A is a dense matrix
- Layout: 1D by rows
- Algorithm
  - For each processor j: broadcast x(j), then compute the contribution A(i,j)*x(j)
- A(i) is the n/p-by-n block row that processor Pi owns
- The algorithm uses the formula

    y(i) = A(i)*x = Σ_j A(i,j)*x(j)

[Figure: x and y laid out in contiguous blocks across processors P0, P1, P2, P3]
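
A minimal mpi4py sketch of the row-layout algorithm, using a single Allgather collective to realize all the "broadcast x(j)" steps at once; the sizes and random data are illustrative assumptions:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, p = comm.Get_rank(), comm.Get_size()
    n = 8 * p                              # global size, assumed divisible by p
    A_local = np.random.rand(n // p, n)    # the n/p-by-n block row A(i) owned here
    x_local = np.random.rand(n // p)       # this rank's piece x(i)

    x = np.empty(n)
    comm.Allgather(x_local, x)             # every rank assembles the full x
    y_local = A_local @ x                  # y(i) = A(i)*x = sum_j A(i,j)*x(j)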
14. Parallel sparse matrix-vector product
- Lay out the matrix and vectors by rows
- y(i) = sum_j A(i,j)*x(j)
- Only compute terms with A(i,j) ≠ 0
- Algorithm
  - Each processor i: broadcast x(i), then compute y(i) = A(i,:)*x
- Optimizations (the first is sketched below)
  - Only send each processor the parts of x it needs, to reduce communication
  - Reorder the matrix for better locality by graph partitioning
  - Worry about balancing the number of nonzeros per processor, if rows have very different nonzero counts
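
A small sketch of the first optimization: from the local CSR column indices, determine which entries of x live on other processors and therefore must actually be communicated. `col_idx`, `row_start`, and `row_end` are assumed inputs (e.g., the fields of the distributed structure on slide 11):

    import numpy as np

    def needed_remote_indices(col_idx, row_start, row_end):
        """Global column indices referenced by the local rows that fall
        outside the locally owned block [row_start, row_end)."""
        cols = np.unique(col_idx)                    # columns touched locally
        return cols[(cols < row_start) | (cols >= row_end)]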
15. Other memory layouts for matrix-vector product
- A column layout of the matrix eliminates the broadcast
  - But it adds a reduction to update the destination: same total communication
- A blocked layout uses a broadcast and a reduction, each on only sqrt(p) processors: less total communication
- The blocked layout has advantages in multicore / Cilk settings too
16. Irregular mesh: NASA Airfoil in 2D
17. Graphs and Sparse Matrices
- A sparse matrix is a representation of a (sparse) graph

[Figure: a 6-by-6 sparse matrix alongside the graph on vertices 1..6 whose edges match its nonzero pattern]
- Matrix entries are edge weights
- Number of nonzeros per row is the vertex degree
- Edges represent data dependencies in matrix-vector multiplication
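
A short scipy sketch of this correspondence: build a sparse adjacency matrix from an edge list (a made-up 6-vertex example, not the slide's figure) and read the vertex degrees off the CSR row pointers:

    import numpy as np
    from scipy.sparse import csr_matrix

    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    i, j = np.array(edges).T
    n = 6
    A = csr_matrix((np.ones(len(edges)), (i, j)), shape=(n, n))
    A = (A + A.T).tocsr()                  # symmetrize: undirected graph
    degree = np.diff(A.indptr)             # nonzeros per row = vertex degree
    print(degree)                          # [2 2 2 3 2 1]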
18. Graph partitioning
- Assigns subgraphs to processors
- Determines parallelism and locality
- Tries to make all subgraphs the same size (load balance)
- Tries to minimize edge crossings (communication)
- Exact minimization is NP-complete (a scoring sketch follows below)
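
Since exact minimization is intractable, heuristic partitioners are judged against the two goals above. A small sketch that scores a given partition on both, assuming a symmetric sparse adjacency matrix `A` (e.g., from the previous sketch) and a vertex-to-processor map `part`:

    import numpy as np

    def partition_quality(A, part):
        """Edge cut and part sizes for partition vector `part` (vertex -> processor)."""
        coo = A.tocoo()
        upper = coo.row < coo.col                    # count each undirected edge once
        edge_cut = int(np.sum(part[coo.row[upper]] != part[coo.col[upper]]))
        sizes = np.bincount(part)                    # vertices per processor
        return edge_cut, sizes

For the 6-vertex example above, `part = np.array([0, 0, 0, 1, 1, 1])` gives an edge cut of 2 and two parts of size 3.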
19. Link analysis of the web
- Web page = vertex
- Link = directed edge
- Link matrix: A(i,j) = 1 if page i links to page j
20. Web graph: PageRank (Google) [Brin, Page]
- An important page is one that many important pages point to
- Markov process: follow a random link most of the time; otherwise, go to any page at random
- Importance = stationary distribution of the Markov process
- Transition matrix is p*A + (1-p)*ones(size(A)), scaled so each column sums to 1
- Importance of page i is the i-th entry in the principal eigenvector of the transition matrix
- But the matrix is 1,000,000,000,000 by 1,000,000,000,000
21. A PageRank Matrix
- Importance ranking of web pages
- Stationary distribution of a Markov chain
- Power method: matvec and vector arithmetic (a sketch follows below)
- Matlab*P page ranking demo (from SC'03) on a web crawl of mit.edu (170,000 pages)
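
A dense Python sketch of the power method on the damped transition matrix from slide 20, assuming A(i,j) = 1 when page i links to page j; a web-scale version must keep A sparse and apply the rank-one (1-p)/n term implicitly instead of forming M:

    import numpy as np

    def pagerank(A, p=0.85, iters=100):
        n = A.shape[0]
        outdeg = np.maximum(A.sum(axis=1), 1)        # links out of each page
        # column-stochastic transition matrix: column i spreads page i's rank
        M = p * (A / outdeg[:, None]).T + (1 - p) / n
        x = np.full(n, 1.0 / n)                      # start from the uniform vector
        for _ in range(iters):
            x = M @ x                                # one matvec per iteration
            x /= x.sum()                             # keep it a probability vector
        return x                                     # stationary distribution

Dangling pages (no outgoing links) are handled crudely here by clamping the out-degree to 1; that is an illustrative shortcut, not part of the slides.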
22. Social Network Analysis in Matlab: 1993
Co-author graph from the 1993 Householder symposium
23. Social Network Analysis in Matlab: 1993
Sparse Adjacency Matrix
- Which author has the most collaborators?

    >> [count, author] = max(sum(A))
    count = 32
    author = 1
    >> name(author,:)
    ans = Golub
24. Social Network Analysis in Matlab: 1993
- Have Gene Golub and Cleve Moler ever been coauthors?

    >> A(Golub, Moler)
    ans = 0

  No.
- But how many coauthors do they have in common?

    >> AA = A^2
    >> AA(Golub, Moler)
    ans = 2

- And who are those common coauthors?

    >> name( find( A(:,Golub) .* A(:,Moler) ), : )
    ans =
        Wilkinson
        VanLoan