Title: The Landscape of Axb Solvers
1The Landscape of Axb Solvers
More Robust
Less Storage (if sparse)
D
2Conjugate gradient iteration
x0 0, r0 b, p0 r0 for k 1, 2,
3, . . . ak (rTk-1rk-1) / (pTk-1Apk-1)
step length xk xk-1 ak pk-1
approx solution rk rk-1 ak
Apk-1 residual ßk
(rTk rk) / (rTk-1rk-1) improvement pk
rk ßk pk-1
search direction
- One matrix-vector multiplication per iteration
- Two vector dot products per iteration
- Four n-vectors of working storage
3Conjugate gradient Krylov subspaces
- Eigenvalues Au ?u ?1, ?2 ,
. . ., ?n - Cayley-Hamilton theorem
- (A ?1I)(A ?2I) (A ?nI) 0
- Therefore S ciAi 0 for some ci
- so A-1 S (ci/c0) Ai1
- Krylov subspace
- Therefore if Ax b, then x A-1 b and
- x ? span (b, Ab, A2b, . . ., An-1b) Kn (A, b)
0 ? i ? n
1 ? i ? n
4Conjugate gradient Orthogonal sequences
- Krylov subspace Ki (A, b) span (b, Ab, A2b, .
. ., Ai-1b) - Conjugate gradient algorithm for i 1, 2, 3,
. . . find xi ? Ki (A, b) such that ri
(Axi b) ? Ki (A, b) - Notice ri ? Ki1 (A, b), so ri ? rj for all
j lt i - Similarly, the directions are
A-orthogonal (xi xi-1 )TA (xj xj-1 ) 0 - The magic Short recurrences. . . A is symmetric
gt can get next residual and direction from
the previous one, without saving them all.
5Conjugate gradient Convergence
- In exact arithmetic, CG converges in n steps
(completely unrealistic!!) - Accuracy after k steps of CG is related to
- consider polynomials of degree k that are equal
to 1 at 0. - how small can such a polynomial be at all the
eigenvalues of A? - Thus, eigenvalues close together are good.
- Condition number ?(A) A2 A-12
?max(A) / ?min(A) - Residual is reduced by a constant factor by
O(?1/2(A)) iterations of CG.
6(Matlab demo)
- CG on grid5(15) and bcsstk08
- n steps of CG on bcsstk08
7Conjugate gradient Parallel implementation
- Lay out matrix and vectors by rows
- Hard part is matrix-vector product
y Ax - Algorithm
- Each processor j
- Broadcast x(j)
- Compute y(j) A(j,)x
- May send more of x than needed
- Partition / reorder matrix to reduce communication
8(Matlab demo)
- 2-way partition of eppstein mesh
- 8-way dice of eppstein mesh
9Preconditioners
- Suppose you had a matrix B such that
- condition number ?(B-1A) is small
- By z is easy to solve
- Then you could solve (B-1A)x B-1b instead of Ax
b - B A is great for (1), not for (2)
- B I is great for (2), not for (1)
- Domain-specific approximations sometimes work
- B diagonal of A sometimes works
- Or, bring back the direct methods technology. . .
10(Matlab demo)
- bcsstk08 with diagonal precond
11Incomplete Cholesky factorization (IC, ILU)
- Compute factors of A by Gaussian elimination,
but ignore fill - Preconditioner B RTR ? A, not formed explicitly
- Compute B-1z by triangular solves (in time
nnz(A)) - Total storage is O(nnz(A)), static data structure
- Either symmetric (IC) or nonsymmetric (ILU)
12(Matlab demo)
13Incomplete Cholesky and ILU Variants
- Allow one or more levels of fill
- unpredictable storage requirements
- Allow fill whose magnitude exceeds a drop
tolerance - may get better approximate factors than levels of
fill - unpredictable storage requirements
- choice of tolerance is ad hoc
- Partial pivoting (for nonsymmetric A)
- Modified ILU (MIC) Add dropped fill to
diagonal of U or R - A and RTR have same row sums
- good in some PDE contexts
14Incomplete Cholesky and ILU Issues
- Choice of parameters
- good smooth transition from iterative to direct
methods - bad very ad hoc, problem-dependent
- tradeoff time per iteration (more fill gt more
time) vs of iterations (more
fill gt fewer iters) - Effectiveness
- condition number usually improves (only) by
constant factor (except MIC for some problems
from PDEs) - still, often good when tuned for a particular
class of problems - Parallelism
- Triangular solves are not very parallel
- Reordering for parallel triangular solve by graph
coloring
15(Matlab demo)
16Sparse approximate inverses
- Compute B-1 ? A explicitly
- Minimize B-1A I F (in parallel, by
columns) - Variants factored form of B-1, more fill, . .
- Good very parallel
- Bad effectiveness varies widely
17Support graph preconditioners example
Vaidya
G(A)
G(B)
- A is symmetric positive definite with negative
off-diagonal nzs - B is a maximum-weight spanning tree for A (with
diagonal modified to preserve row sums) - factor B in O(n) space and O(n) time
- applying the preconditioner costs O(n) time per
iteration
18Support graph preconditioners example
G(A)
G(B)
- support each edge of A by a path in B
- dilation(A edge) length of supporting path in B
- congestion(B edge) of supported A edges
- p max congestion, q max dilation
- condition number ?(B-1A) bounded by pq (at most
O(n2))
19Support graph preconditioners example
G(A)
G(B)
- can improve congestion and dilation by adding a
few strategically chosen edges to B - cost of factorsolve is O(n1.75), or O(n1.2) if A
is planar - in recent experiments Chen Toledo, often
better than drop-tolerance MIC for 2D problems,
but not for 3D.
20Domain decomposition (introduction)
A
- Partition the problem (e.g. the mesh) into
subdomains - Use solvers for the subdomains B-1 and C-1 to
precondition an iterative solver on the
interface - Interface system is the Schur complement
S G ET B-1E FT C-1F - Parallelizes naturally by subdomains
21(Matlab demo)
- grid and matrix structure for overlapping 2-way
partition of eppstein
22Multigrid (introduction)
- For a PDE on a fine mesh, precondition using a
solution on a coarser mesh - Use idea recursively on hierarchy of meshes
- Solves the model problem (Poissons eqn) in
linear time! - Often useful when hierarchy of meshes can be
built - Hard to parallelize coarse meshes well
- This is just the intuition lots of theory and
technology
23Other Krylov subspace methods
- Nonsymmetric linear systems
- GMRES for i 1, 2, 3, . . . find xi ? Ki
(A, b) such that ri (Axi b) ? Ki (A,
b)But, no short recurrence gt save old vectors
gt lots more space - BiCGStab, QMR, etc.Two spaces Ki (A, b) and Ki
(AT, b) w/ mutually orthogonal basesShort
recurrences gt O(n) space, but less robust - Convergence and preconditioning more delicate
than CG - Active area of current research
- Eigenvalues Lanczos (symmetric), Arnoldi
(nonsymmetric)
24The Landscape of Sparse Axb Solvers
25Complexity of direct methods
Time and space to solve any problem on any
well-shaped finite element mesh
26Complexity of linear solvers
Time to solve model problem (Poissons equation)
on regular mesh