Cache Aware Multigrid on AMR Hierarchies - PowerPoint PPT Presentation

1 / 74
About This Presentation
Title:

Cache Aware Multigrid on AMR Hierarchies

Description:

Cache Aware Multigrid on AMR Hierarchies – PowerPoint PPT presentation

Number of Views:43
Avg rating:3.0/5.0
Slides: 75
Provided by: nam571
Category:

less

Transcript and Presenter's Notes

Title: Cache Aware Multigrid on AMR Hierarchies


1
Cache Aware Multigrid on AMR Hierarchies
Danny Thorne Computer Science Department Universit
y of Kentucky thorne_at_ccs.uky.edu http//www.ccs.uk
y.edu/thorne
Supported in part by Sandia National Laboratories
and the National Science Foundation.
2
Definition of the Problem
Solve elliptic boundary value problems
(BVPs). using adaptive mesh refinement
(AMR) and multigrid (MG) with cache aware (CA)
optimizations.
3
Motivation for AMR
  • Problems with variations in scale, e.g.
    combustion.

CRF Example
4
Motivation for Cache Aware
http//www.surriel.com/lectures/hierarchy.gif
5
Outline
  • Background.
  • Tools for AMRMG.
  • AMRMG.
  • Cache aware smoothers.
  • Cache aware AMRMG.
  • Numerical Results.
  • Future work.

6
Discretization of PDEs 101
From PDE to Linear System (in 1D).
7
Iterative Methods
For Solving Linear Systems, .
Alternately solve for x1 and x2 as though the
other were correct, and watch the result converge
to u
8
Iterative Methods
For Solving Linear Systems, .
For the following special cases,
  • Jacobi
  • Gauss-Seidel
  • SOR

9
Smoothers
  • Iterative methods are very slow to converge for
    life sized problems.
  • They damp (smooth) the high frequency components
    of the error in the approximation quickly,
    though.
  • Multigrid (next slide) exploits this smoothing
    effect to achieve fast convergence.
  • A smoother is typically a couple (e.g. 2 to 4)
    sweeps of an iterative method like Gauss-Seidel.

10
Cell Centered Multigrid
Illustration in 1D with four grid levels
Smooth A4u4f4.
Smooth A4u4f4.
Smooth A3u3f3.
Smooth A3u3f3.
Smooth A2u2f2.
Smooth A2u2f2.
Solve A1u1f1 directly.
11
Definition of a Grid Hierarchy
12
Transfer Operators
13
Nesting
We require strict nesting of the grids
This requirement can be written in terms of the
domains as
and
14
Internal Patch Boundaries
a.k.a. Coarse/Fine Interfaces
C1 continuity guarantees consistency which is
important because consistency stability
convergence.
Note We use cell-centered grids so that coarse
and fine grids wont share grid points at the
interfaces.
15
Stencils for Constant Coefficients
16
Stencils for Variable Coefficients
17
How to Interpolate Ghost Points
  • First, interpolate values at the blue xs using
    coarse grid values.
  • Then interpolate the red points using fine grid
    values along with the values at the blue xs.
  • The red points are the ghost points that will be
    used by the discrete operator.

18
Flux Matching
1
2
The same approximation for the derivative across
the coarse/fine interface is used for the grid
points on both sides of the interface, so C1
continuity is preserved across the interface.
Notice that these fluxes match the fluxes used in
the stencils for the boundary points on the fine
grid side. Hence, flux matching.
19
Flux Matching versus Simple Taylor Series Flux
Approximation
20
Flux Matching Formula by Taylors Series
21
3D Ghost Point Interpolation
Hard to illustrate in 3D
Interpolate the green points using coarse grid
values and then interpolate the blue points using
the green points.
Interpolate the red points using fine grid values
and the blue points. The red points are the
ghost points.
A piece of a coarse grid thats adjacent to a
fine grid
22
Flux Matching in 3D
Hard to illustrate in 3D
At the interface, average the fluxes across the
four fine grid cell walls using the interpolated
values denoted by the red points.
23
3D GP Interp, Step 1
24
3D GP Interp, Step 2
25
3D Flux Matching
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Flux Matching in 3D
Image created with Povray (http//www.povray.org)
37
AMRMG
An Illustration in 1D
38
Preparation For Illustration
  • Assume the hierarchy is already constructed by a
    calling AMR code.
  • The following sequence of slides illustrates our
    AMR MG algorithm on the smallest possible example
    (in 1D) that captures all the aspects of the
    algorithm
  • Three levels,
  • One refinement patch per level,
  • Four grid points per level.
  • Problem
  • Residual
  • Correction

Fasten your seatbelts, grasp the safety bar,
place head firmly against head rest
39
Illustration in 1D
The dashed lines and dots represent the
completion of the composite grid on each level.
The algorithm is patch-based. The solid lines
represent the domains of the patches, and the
dark dots represent the grid points in the
patches.
40
Illustration in 1D
Start with an initial guess for the solution on
the composite grid.
Represent the grid points with a 2-by-2 array of
spaces to keep track of the status of the
different variables during the algorithm. See
the key to the left.
Solution
Correction
Residual
Temporary
41
Illustration in 1D
Interpolate ghost point(s) so that regular
stencils can be applied at patch boundary points.
Checkerboard pattern denotes values that were
updated in previous steps.
Solution
Correction
Residual
Temp
Temporary
42
Illustration in 1D
Compute the residual on the finest grid level
(using the ghost point(s) to complete the
stencil(s) on the patch boundary points).
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
43
Illustration in 1D
Smooth to get a correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
44
Illustration in 1D
Store temporarily corrected solution, needed for
acquisition of ghost points for the residual
computation on the next level. (The permanently
corrected solution for this level of this V cycle
will be computed on the other side of the V
cycle.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
45
Illustration in 1D
Project the residual onto the part of the coarser
grid that is covered by the finer grid.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
46
Illustration in 1D
Interpolate ghost point(s) for the temporarily
corrected solution. This is needed for residual
computation on the uncovered part of the next
level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
47
Illustration in 1D
Interpolate ghost point(s) so that regular
stencils can be applied at patch boundary points.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
48
Illustration in 1D
Compute the residual on the uncovered nodes. Use
fine grid data (including ghost points) to do
flux matching at the interface nodes. Use the
newly interpolated ghost points to apply a
regular stencil at the boundary points.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
49
Illustration in 1D
Smooth to get a correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
50
Illustration in 1D
Store temporarily corrected solution, needed for
acquisition of ghost points for the residual
computation on the next level. (The permanently
corrected solution for this level of this V cycle
will be computed on the other side of the V
cycle.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
51
Illustration in 1D
Project the residual onto the part of the coarser
grid that is covered by the finer grid.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
52
Illustration in 1D
Compute ghost point(s) for the temporarily
corrected solution.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
53
Illustration in 1D
Compute the residual on the uncovered nodes. Use
fine grid data (including ghost points) to do
flux matching at the interface nodes.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
54
Illustration in 1D
Solve for the correction on the coarsest level.
(We use geometric multigrid here.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
55
Illustration in 1D
Correct the solution on the uncovered part of the
coarsest level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
56
Illustration in 1D
Update the correction on the next finer level
with interpolated values from the coarse level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
57
Illustration in 1D
Update the residual.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
58
Illustration in 1D
Smooth to get a correction for the correction.
We avoid smoothing the current correction because
that would require interpolating ghost points at
the patch boundaries.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
59
Illustration in 1D
Update the original correction with the newly
computed correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
60
Illustration in 1D
Now update the solution on the uncovered nodes
with the latest correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
61
Illustration in 1D
Update the correction on the next finer level
with interpolated values from the coarser level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
62
Illustration in 1D
Update the residual.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
63
Illustration in 1D
Smooth to get a correction for the correction.
We avoid smoothing the current correction because
that would require interpolating ghost points at
the patch boundaries.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
64
Illustration in 1D
Update the original correction with the newly
computed correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
65
Illustration in 1D
Now update the solution on this level with the
latest correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
66
Illustration in 1D
Final solution on the composite grid after one V
cycle.
Can do more V cycles by starting the procedure
again from the beginning with the new solution as
the initial guess.
67
AMR MG Algorithm
Solve on the coarsest grid.
Compute residual on finest grid.
Smooth a correction.
Store temporarily corrected solution.
Project part of the residual.
Compute the other part of the residual.
Recursively apply to coarser levels.
Interpolate and correct.
Update the residual.
Smooth an intermediate correction.
Update the final correction.
Apply the final correction to the solution on
this level.
68
Cache Aware Smoothers
Preparation for Illustration
  • Want data to pass through cache only once per
    application of the smoother.
  • Application of the smoother means multiple
    (e.g. two to four) Gauss-Seidel updates per
    point.
  • Whereas normally one sweeps the entire grid over
    and over to apply successive updates, we want to
    stagger the updates in such a way that we need to
    sweep the grid only once.
  • Want the answer to be exactly the same as the
    normal application of the smoother. (Bit-wise
    equivalence.)
  • The following illustration is of a 2D grid with
    12-by-4 grid points and for three updates per
    point. Updates occur from left to right and
    bottom to top.

69
Cache Aware Smoothers
Note that updating the third row again would
require updated data in the fourth row, so we
have to wait since the fourth row hasnt been
updated yet. Similarly for the second row which
depends on a second update of row three.
Before updating new points, apply more updates to
the current points in cache where possible.
Update points in the first partition.
70
Cache Aware Smoothers
71
Cache Aware Smoothers
72
Cache Aware Smoothers
73
Cache Aware Smoothers
74
2D Blocking
75
Cache Seams
76
Tall Cache Block
To minimize the amount of seam points.
77
Seams with Tall Cache Block
78
3D Cache Block
79
Only Post-smoothing
  • Apparently, the cache aware smoothing technique
    will give better performance with greater numbers
    of updates (within a limit imposed by the cache
    size, anyway).
  • When doing both pre- and post- smoothing, it is
    common to do two pre-smoothing updates and two
    post-smoothing updates.
  • Q What if we instead do four post-smoothing
    updates and no pre-smoothing?
  • A Better cache effects.
  • In addition, doing only post-smoothing
    simplifies our AMR MG algorithm, as illustrated
    on the next slide, and is more conducive to
    making multigrid cache aware.

80
Only Post-Smoothing
  • No pre-smoothing.
  • No need for a temporarily corrected solution.
  • Just project/compute the residual.
  • Only need to compute ghost points once.
  • No residual update.
  • Can avoid intermediate correction by
    interpolating ghost points (good for
    implementation of cache aware multigrid).
  • Number of smoothing updates here should equal
    the number of pre-smoothing updates plus the
    number of post-smoothing updates used in the
    original implementation.

81
Cache Aware Multigrid
82
Numerical Results
  • The following slides contain graphs of numerical
    results.
  • The results are for the 2D case with a variety
    of refinement patterns
  • Laplaces equation with solution function
  • Variable function
  • Dirichlet boundary conditions.

83
Numerical Results
For V(0,4)
gt50, gt100
84
Numerical Results
(Average Speedup per Vcycle)
gt50, gt100
85
Numerical Results
(Speedups for Run Until Convergence)
gt400, gt500
86
Future Work
  • 3D // Have the 3D code and am working on cache
    aware algorithms for it. Much harder to get good
    speedups in the 3D AMR environment than in the 2D
    AMR environment
  • Parallel // Rudimentary parallel mechanisms in
    the code are waiting to be extended and
    generalized. Have incorporated Zoltan for load
    balancing.
  • Clustering Algorithms // To make the code a
    standalone app, need to incorporate clustering
    algorithms for generating the AMR hierarchies
    from scratch instead of relying on hierarchies
    provided by calling routines.

87
Papers
  • Papers
  • C. C. Douglas, J. Hu, J. Ray, D. T. Thorne, and
    R. S. Tuminaro, Fast, Adaptively Refined
    Computational Elements in 3D, in Proceedings of
    the International Conference on Computational
    Science, Amsterdam, 2002, Springer-Verlag,
    Berlin, 2002, 10 pages. (Refereed)
  • C. C. Douglas and D. T. Thorne, A Note on Cache
    Memory Methods for Multigrid in Three Dimensions,
    11 pages, to appear, American Mathematics
    Society, 2002-2003.
  • Cache Aware Multigrid for Variable Coefficient
    Elliptic Problems on Adaptive Mesh Refinement
    Hierarchies, C. C. Douglas, J. Hu, J. Ray, D. T.
    Thorne, and R. Tuminaro, to appear in Journal
    Numerial Linear Algrebra and Applications, 2004.

88
Thank You!
Write a Comment
User Comments (0)
About PowerShow.com