The Landscape of Axb Solvers - PowerPoint PPT Presentation

About This Presentation
Title:

The Landscape of Axb Solvers

Description:

condition number usually improves (only) by constant factor (except MIC for some ... still, often good when tuned for a particular class of problems. Parallelism ... – PowerPoint PPT presentation

Number of Views:29
Avg rating:3.0/5.0
Slides: 27
Provided by: csU45
Category:
Tags: axb | landscape | solvers

less

Transcript and Presenter's Notes

Title: The Landscape of Axb Solvers


1
The Landscape of Axb Solvers
More Robust
Less Storage (if sparse)
D
2
Conjugate gradient iteration
x0 0, r0 b, p0 r0 for k 1, 2,
3, . . . ak (rTk-1rk-1) / (pTk-1Apk-1)
step length xk xk-1 ak pk-1
approx solution rk rk-1 ak
Apk-1 residual ßk
(rTk rk) / (rTk-1rk-1) improvement pk
rk ßk pk-1
search direction
  • One matrix-vector multiplication per iteration
  • Two vector dot products per iteration
  • Four n-vectors of working storage

3
Conjugate gradient Krylov subspaces
  • Eigenvalues Au ?u ?1, ?2 ,
    . . ., ?n
  • Cayley-Hamilton theorem
  • (A ?1I)(A ?2I) (A ?nI) 0
  • Therefore S ciAi 0 for some ci
  • so A-1 S (ci/c0) Ai1
  • Krylov subspace
  • Therefore if Ax b, then x A-1 b and
  • x ? span (b, Ab, A2b, . . ., An-1b) Kn (A, b)

0 ? i ? n
1 ? i ? n
4
Conjugate gradient Orthogonal sequences
  • Krylov subspace Ki (A, b) span (b, Ab, A2b, .
    . ., Ai-1b)
  • Conjugate gradient algorithm for i 1, 2, 3,
    . . . find xi ? Ki (A, b) such that ri
    (Axi b) ? Ki (A, b)
  • Notice ri ? Ki1 (A, b), so ri ? rj for all
    j lt i
  • Similarly, the directions are
    A-orthogonal (xi xi-1 )TA (xj xj-1 ) 0
  • The magic Short recurrences. . . A is symmetric
    gt can get next residual and direction from
    the previous one, without saving them all.

5
Conjugate gradient Convergence
  • In exact arithmetic, CG converges in n steps
    (completely unrealistic!!)
  • Accuracy after k steps of CG is related to
  • consider polynomials of degree k that are equal
    to 1 at 0.
  • how small can such a polynomial be at all the
    eigenvalues of A?
  • Thus, eigenvalues close together are good.
  • Condition number ?(A) A2 A-12
    ?max(A) / ?min(A)
  • Residual is reduced by a constant factor by
    O(?1/2(A)) iterations of CG.

6
(Matlab demo)
  • CG on grid5(15) and bcsstk08
  • n steps of CG on bcsstk08

7
Conjugate gradient Parallel implementation
  • Lay out matrix and vectors by rows
  • Hard part is matrix-vector product
    y Ax
  • Algorithm
  • Each processor j
  • Broadcast x(j)
  • Compute y(j) A(j,)x
  • May send more of x than needed
  • Partition / reorder matrix to reduce communication

8
(Matlab demo)
  • 2-way partition of eppstein mesh
  • 8-way dice of eppstein mesh

9
Preconditioners
  • Suppose you had a matrix B such that
  • condition number ?(B-1A) is small
  • By z is easy to solve
  • Then you could solve (B-1A)x B-1b instead of Ax
    b
  • B A is great for (1), not for (2)
  • B I is great for (2), not for (1)
  • Domain-specific approximations sometimes work
  • B diagonal of A sometimes works
  • Or, bring back the direct methods technology. . .

10
(Matlab demo)
  • bcsstk08 with diagonal precond

11
Incomplete Cholesky factorization (IC, ILU)
  • Compute factors of A by Gaussian elimination,
    but ignore fill
  • Preconditioner B RTR ? A, not formed explicitly
  • Compute B-1z by triangular solves (in time
    nnz(A))
  • Total storage is O(nnz(A)), static data structure
  • Either symmetric (IC) or nonsymmetric (ILU)

12
(Matlab demo)
  • bcsstk08 with ic precond

13
Incomplete Cholesky and ILU Variants
  • Allow one or more levels of fill
  • unpredictable storage requirements
  • Allow fill whose magnitude exceeds a drop
    tolerance
  • may get better approximate factors than levels of
    fill
  • unpredictable storage requirements
  • choice of tolerance is ad hoc
  • Partial pivoting (for nonsymmetric A)
  • Modified ILU (MIC) Add dropped fill to
    diagonal of U or R
  • A and RTR have same row sums
  • good in some PDE contexts

14
Incomplete Cholesky and ILU Issues
  • Choice of parameters
  • good smooth transition from iterative to direct
    methods
  • bad very ad hoc, problem-dependent
  • tradeoff time per iteration (more fill gt more
    time) vs of iterations (more
    fill gt fewer iters)
  • Effectiveness
  • condition number usually improves (only) by
    constant factor (except MIC for some problems
    from PDEs)
  • still, often good when tuned for a particular
    class of problems
  • Parallelism
  • Triangular solves are not very parallel
  • Reordering for parallel triangular solve by graph
    coloring

15
(Matlab demo)
  • 2-coloring of grid5(15)

16
Sparse approximate inverses
  • Compute B-1 ? A explicitly
  • Minimize B-1A I F (in parallel, by
    columns)
  • Variants factored form of B-1, more fill, . .
  • Good very parallel
  • Bad effectiveness varies widely

17
Support graph preconditioners example
Vaidya
G(A)
G(B)
  • A is symmetric positive definite with negative
    off-diagonal nzs
  • B is a maximum-weight spanning tree for A (with
    diagonal modified to preserve row sums)
  • factor B in O(n) space and O(n) time
  • applying the preconditioner costs O(n) time per
    iteration

18
Support graph preconditioners example
G(A)
G(B)
  • support each edge of A by a path in B
  • dilation(A edge) length of supporting path in B
  • congestion(B edge) of supported A edges
  • p max congestion, q max dilation
  • condition number ?(B-1A) bounded by pq (at most
    O(n2))

19
Support graph preconditioners example
G(A)
G(B)
  • can improve congestion and dilation by adding a
    few strategically chosen edges to B
  • cost of factorsolve is O(n1.75), or O(n1.2) if A
    is planar
  • in recent experiments Chen Toledo, often
    better than drop-tolerance MIC for 2D problems,
    but not for 3D.

20
Domain decomposition (introduction)
A
  • Partition the problem (e.g. the mesh) into
    subdomains
  • Use solvers for the subdomains B-1 and C-1 to
    precondition an iterative solver on the
    interface
  • Interface system is the Schur complement
    S G ET B-1E FT C-1F
  • Parallelizes naturally by subdomains

21
(Matlab demo)
  • grid and matrix structure for overlapping 2-way
    partition of eppstein

22
Multigrid (introduction)
  • For a PDE on a fine mesh, precondition using a
    solution on a coarser mesh
  • Use idea recursively on hierarchy of meshes
  • Solves the model problem (Poissons eqn) in
    linear time!
  • Often useful when hierarchy of meshes can be
    built
  • Hard to parallelize coarse meshes well
  • This is just the intuition lots of theory and
    technology

23
Other Krylov subspace methods
  • Nonsymmetric linear systems
  • GMRES for i 1, 2, 3, . . . find xi ? Ki
    (A, b) such that ri (Axi b) ? Ki (A,
    b)But, no short recurrence gt save old vectors
    gt lots more space
  • BiCGStab, QMR, etc.Two spaces Ki (A, b) and Ki
    (AT, b) w/ mutually orthogonal basesShort
    recurrences gt O(n) space, but less robust
  • Convergence and preconditioning more delicate
    than CG
  • Active area of current research
  • Eigenvalues Lanczos (symmetric), Arnoldi
    (nonsymmetric)

24
The Landscape of Sparse Axb Solvers
25
Complexity of direct methods
Time and space to solve any problem on any
well-shaped finite element mesh
26
Complexity of linear solvers
Time to solve model problem (Poissons equation)
on regular mesh
Write a Comment
User Comments (0)
About PowerShow.com