Title: Computational meshes, matrices, conjugate gradients, and mesh partitioning
1. Computational meshes, matrices, conjugate gradients, and mesh partitioning
- Some slides are from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS267
2. Parallelism in Regular meshes
- Computing a stencil on a regular mesh
- Need to communicate mesh points near the boundary to neighboring processors
  - Often done with ghost regions, which adds memory overhead (a sketch follows below)
- Surface-to-volume ratio keeps communication down, but it may still be problematic in practice
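
A minimal Python/mpi4py sketch of the ghost-region idea, assuming a 2D mesh whose rows are block-distributed over the processors and a 5-point Jacobi-style stencil; the mesh size N, the random initial data, and the update rule are illustrative choices, not from the slides:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, p = comm.Get_rank(), comm.Get_size()
    N = 64                                  # global mesh is N x N (assumed)
    local = N // p                          # rows owned by this rank (assumes p divides N)

    # local rows plus one ghost row above and below: the memory overhead
    u = np.zeros((local + 2, N))
    u[1:-1, 1:-1] = np.random.rand(local, N - 2)   # arbitrary initial interior values
    u_new = np.zeros_like(u)

    up   = rank - 1 if rank > 0     else MPI.PROC_NULL
    down = rank + 1 if rank < p - 1 else MPI.PROC_NULL

    for it in range(100):
        # fill ghost rows: swap boundary rows with the neighboring processors
        comm.Sendrecv(u[1],  dest=up,   recvbuf=u[-1], source=down)
        comm.Sendrecv(u[-2], dest=down, recvbuf=u[0],  source=up)
        # 5-point stencil (Jacobi sweep) on locally owned interior points
        u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                    + u[1:-1, :-2] + u[1:-1, 2:])
        u, u_new = u_new, u

Each sweep communicates only the boundary rows (the surface), while the computation touches all owned rows (the volume).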
3. Irregular mesh: NASA Airfoil in 2D
4. Composite Mesh from a Mechanical Structure
5. Adaptive Mesh Refinement (AMR)
- Adaptive mesh around an explosion
- Refinement done by calculating error estimates; refine where the error is large
- Parallelism
  - Mostly between patches, dealt to processors for load balance
  - May exploit some parallelism within a patch (SMP)
6. Adaptive Mesh
- Shock waves in gas dynamics using AMR (Adaptive Mesh Refinement); the figure shows fluid density
- See http://www.llnl.gov/CASC/SAMRAI/
7. Irregular mesh: Tapered Tube (multigrid)
8. Challenges of Irregular Meshes for PDEs
- How to generate them in the first place
  - E.g., Triangle, a 2D mesh generator by Jonathan Shewchuk
  - 3D is harder! E.g., QMG by Stephen Vavasis
- How to partition them
  - E.g., ParMetis, a parallel graph partitioner
- How to design iterative solvers
  - E.g., PETSc, the Portable, Extensible Toolkit for Scientific Computing
  - E.g., Prometheus, a multigrid solver for finite element problems on irregular meshes
- How to design direct solvers
  - E.g., SuperLU, parallel sparse Gaussian elimination
- These are challenges even sequentially, and more so in parallel
9. Converting the Mesh to a Matrix
10. Sparse matrix data structure (stored by rows)
- Example: the full 3-by-3 matrix

    [ 31   0  53 ]
    [  0  59   0 ]
    [ 41  26   0 ]

  is stored in compressed rows as

    values:         31 53 59 41 26
    column indices:  1  3  2  1  2

- Full
  - 2-dimensional array of real or complex numbers
  - (nrows * ncols) memory
- Sparse
  - Compressed row storage (CSR)
  - About (2*nzs + nrows) memory
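
The slide's 3-by-3 example can be reproduced with scipy's CSR type; this sketch just exposes the three underlying arrays (scipy uses 0-based column indices where the slide uses 1-based):

    import numpy as np
    from scipy.sparse import csr_matrix

    A_dense = np.array([[31,  0, 53],
                        [ 0, 59,  0],
                        [41, 26,  0]])
    A = csr_matrix(A_dense)
    print(A.data)      # values:         [31 53 59 41 26]
    print(A.indices)   # column indices: [0 2 1 0 1]  (slide shows 1-based: 1 3 2 1 2)
    print(A.indptr)    # row pointers:   [0 2 3 5]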
11. Distributed row sparse matrix data structure
- The rows are dealt out in contiguous blocks to processors P0, P1, ..., Pn
- Each processor stores
  - the number of local nonzeros
  - the range of local rows it owns
  - its local nonzeros in CSR form
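
As an illustration, the per-processor piece described above might be held in a record like the following Python sketch; the field names are assumptions for this sketch, not any particular library's API:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class LocalCSR:
        row_start: int         # first global row owned by this processor
        row_end: int           # one past the last owned global row
        nnz: int               # number of local nonzeros
        values: np.ndarray     # the nnz nonzero values
        col_idx: np.ndarray    # global column index of each nonzero
        row_ptr: np.ndarray    # local CSR row pointers, length row_end - row_start + 1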
12. Conjugate gradient iteration to solve Ax = b

    x_0 = 0,  r_0 = b,  d_0 = r_0
    for k = 1, 2, 3, . . .
        α_k = (r_{k-1}^T r_{k-1}) / (d_{k-1}^T A d_{k-1})    (step length)
        x_k = x_{k-1} + α_k d_{k-1}                          (approx solution)
        r_k = r_{k-1} - α_k A d_{k-1}                        (residual)
        β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})              (improvement this step)
        d_k = r_k + β_k d_{k-1}                              (search direction)

- One matrix-vector multiplication per iteration
- Two vector dot products per iteration
- Four n-vectors of working storage
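
A direct Python transcription of the iteration above, assuming A is symmetric positive definite and b is a float vector; note the single matvec and the two dot products per iteration, exactly as the slide counts them:

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, maxiter=1000):
        x = np.zeros_like(b)            # x_0 = 0
        r = b.copy()                    # r_0 = b
        d = r.copy()                    # d_0 = r_0
        rTr = r @ r
        for k in range(maxiter):
            Ad = A @ d                  # the one matvec per iteration
            alpha = rTr / (d @ Ad)      # step length (first dot product)
            x = x + alpha * d           # approximate solution
            r = r - alpha * Ad          # residual
            rTr_new = r @ r             # second dot product
            beta = rTr_new / rTr        # improvement this step
            d = r + beta * d            # new search direction
            rTr = rTr_new
            if np.sqrt(rTr) < tol:      # converged?
                break
        return x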
13. Parallel Dense Matrix-Vector Product (Review)
- y = A*x, where A is a dense matrix
- Layout: 1D by rows
- Algorithm
  - For each processor j: broadcast x(j), then compute the contribution A(i,j)*x(j)
- A(i) is the n/p-by-n block row that processor Pi owns
- The algorithm uses the formula

    y(i) = A(i)*x = Σ_j A(i,j)*x(j)

[Figure: x and y laid out in contiguous blocks across processors P0, P1, P2, P3]
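
A minimal mpi4py sketch of the row-layout algorithm, using a single Allgather collective to realize all the "broadcast x(j)" steps at once; the sizes and random data are illustrative assumptions:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, p = comm.Get_rank(), comm.Get_size()
    n = 8 * p                              # global size, assumed divisible by p
    A_local = np.random.rand(n // p, n)    # the n/p-by-n block row A(i) owned here
    x_local = np.random.rand(n // p)       # this rank's piece x(i)

    x = np.empty(n)
    comm.Allgather(x_local, x)             # every rank assembles the full x
    y_local = A_local @ x                  # y(i) = A(i)*x = sum_j A(i,j)*x(j)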
14. Parallel sparse matrix-vector product
- Lay out the matrix and vectors by rows
- y(i) = sum_j A(i,j)*x(j)
- Only compute terms with A(i,j) ≠ 0
- Algorithm
  - Each processor i: broadcast x(i), then compute y(i) = A(i,:)*x
- Optimizations (the first is sketched below)
  - Only send each processor the parts of x it needs, to reduce communication
  - Reorder the matrix for better locality by graph partitioning
  - Worry about balancing the number of nonzeros per processor, if rows have very different nonzero counts
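
A small sketch of the first optimization: from the local CSR column indices, determine which entries of x live on other processors and therefore must actually be communicated. `col_idx`, `row_start`, and `row_end` are assumed inputs (e.g., the fields of the distributed structure on slide 11):

    import numpy as np

    def needed_remote_indices(col_idx, row_start, row_end):
        """Global column indices referenced by the local rows that fall
        outside the locally owned block [row_start, row_end)."""
        cols = np.unique(col_idx)                    # columns touched locally
        return cols[(cols < row_start) | (cols >= row_end)]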
15. Other memory layouts for matrix-vector product
- A column layout of the matrix eliminates the broadcast
  - But it adds a reduction to update the destination: same total communication
- A blocked layout uses a broadcast and a reduction, each on only sqrt(p) processors: less total communication
- The blocked layout has advantages in multicore / Cilk settings too
16. Irregular mesh: NASA Airfoil in 2D
17. Graphs and Sparse Matrices
- A sparse matrix is a representation of a (sparse) graph

[Figure: a 6-by-6 sparse matrix alongside the graph on vertices 1..6 whose edges match its nonzero pattern]
- Matrix entries are edge weights
- Number of nonzeros per row is the vertex degree
- Edges represent data dependencies in matrix-vector multiplication
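
A short scipy sketch of this correspondence: build a sparse adjacency matrix from an edge list (a made-up 6-vertex example, not the slide's figure) and read the vertex degrees off the CSR row pointers:

    import numpy as np
    from scipy.sparse import csr_matrix

    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    i, j = np.array(edges).T
    n = 6
    A = csr_matrix((np.ones(len(edges)), (i, j)), shape=(n, n))
    A = (A + A.T).tocsr()                  # symmetrize: undirected graph
    degree = np.diff(A.indptr)             # nonzeros per row = vertex degree
    print(degree)                          # [2 2 2 3 2 1]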
18. Graph partitioning
- Assigns subgraphs to processors
- Determines parallelism and locality
- Tries to make all subgraphs the same size (load balance)
- Tries to minimize edge crossings (communication)
- Exact minimization is NP-complete (a scoring sketch follows below)
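
Since exact minimization is intractable, heuristic partitioners are judged against the two goals above. A small sketch that scores a given partition on both, assuming a symmetric sparse adjacency matrix `A` (e.g., from the previous sketch) and a vertex-to-processor map `part`:

    import numpy as np

    def partition_quality(A, part):
        """Edge cut and part sizes for partition vector `part` (vertex -> processor)."""
        coo = A.tocoo()
        upper = coo.row < coo.col                    # count each undirected edge once
        edge_cut = int(np.sum(part[coo.row[upper]] != part[coo.col[upper]]))
        sizes = np.bincount(part)                    # vertices per processor
        return edge_cut, sizes

For the 6-vertex example above, `part = np.array([0, 0, 0, 1, 1, 1])` gives an edge cut of 2 and two parts of size 3.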
19. Link analysis of the web
- Web page = vertex
- Link = directed edge
- Link matrix: A(i,j) = 1 if page i links to page j
20. Web graph: PageRank (Google) [Brin, Page]
- An important page is one that many important pages point to
- Markov process: follow a random link most of the time; otherwise, go to any page at random
- Importance = stationary distribution of the Markov process
- Transition matrix is p*A + (1-p)*ones(size(A)), scaled so each column sums to 1
- Importance of page i is the i-th entry in the principal eigenvector of the transition matrix
- But the matrix is 1,000,000,000,000 by 1,000,000,000,000
21. A PageRank Matrix
- Importance ranking of web pages
- Stationary distribution of a Markov chain
- Power method: matvec and vector arithmetic (a sketch follows below)
- Matlab*P page ranking demo (from SC'03) on a web crawl of mit.edu (170,000 pages)
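
A dense Python sketch of the power method on the damped transition matrix from slide 20, assuming A(i,j) = 1 when page i links to page j; a web-scale version must keep A sparse and apply the rank-one (1-p)/n term implicitly instead of forming M:

    import numpy as np

    def pagerank(A, p=0.85, iters=100):
        n = A.shape[0]
        outdeg = np.maximum(A.sum(axis=1), 1)        # links out of each page
        # column-stochastic transition matrix: column i spreads page i's rank
        M = p * (A / outdeg[:, None]).T + (1 - p) / n
        x = np.full(n, 1.0 / n)                      # start from the uniform vector
        for _ in range(iters):
            x = M @ x                                # one matvec per iteration
            x /= x.sum()                             # keep it a probability vector
        return x                                     # stationary distribution

Dangling pages (no outgoing links) are handled crudely here by clamping the out-degree to 1; that is an illustrative shortcut, not part of the slides.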
22. Social Network Analysis in Matlab: 1993
Co-author graph from the 1993 Householder symposium
23. Social Network Analysis in Matlab: 1993
Sparse Adjacency Matrix
- Which author has the most collaborators?

    >> [count, author] = max(sum(A))
    count = 32
    author = 1
    >> name(author,:)
    ans = Golub
24. Social Network Analysis in Matlab: 1993
- Have Gene Golub and Cleve Moler ever been coauthors?

    >> A(Golub, Moler)
    ans = 0

  No.
- But how many coauthors do they have in common?

    >> AA = A^2
    >> AA(Golub, Moler)
    ans = 2

- And who are those common coauthors?

    >> name( find( A(:,Golub) .* A(:,Moler) ), : )
    ans =
        Wilkinson
        VanLoan