Title: Cache Aware Multigrid on AMR Hierarchies
1Cache Aware Multigrid on AMR Hierarchies
Danny Thorne Computer Science Department Universit
y of Kentucky thorne_at_ccs.uky.edu http//www.ccs.uk
y.edu/thorne
Supported in part by Sandia National Laboratories
and the National Science Foundation.
2Definition of the Problem
Solve elliptic boundary value problems
(BVPs). using adaptive mesh refinement
(AMR) and multigrid (MG).
3Motivation for AMR
- Problems with variations in scale, e.g.
combustion.
CRF Example
4Outline
- Background.
- Tools for AMRMG.
- AMRMG.
- Cache aware smoothers.
- Cache aware AMRMG.
- Numerical Results.
- Future work.
5Discretization of PDEs 101
From PDE to Linear System (in 1D).
6Iterative Methods
For Solving Linear Systems, .
Alternately solve for x1 and x2 as though the
other were correct, and watch the result converge
to u
7Iterative Methods
For Solving Linear Systems, .
For the following special cases,
8Smoothers
- Iterative methods are very slow to converge for
life sized problems. - They damp (smooth) the high frequency components
of the error in the approximation quickly,
though. - Multigrid (next slide) exploits this smoothing
effect to achieve fast convergence. - A smoother is typically a couple (e.g. 2 to 4)
sweeps of an iterative method like Gauss-Seidel.
9Cell Centered Multigrid
Illustration in 1D with four grid levels
Smooth A4u4f4.
Smooth A4u4f4.
Smooth A3u3f3.
Smooth A3u3f3.
Smooth A2u2f2.
Smooth A2u2f2.
Solve A1u1f1 directly.
10Definition of a Grid Hierarchy
11Transfer Operators
12Nesting
We require strict nesting of the grids
This requirement can be written in terms of the
domains as
and
13Internal Patch Boundaries
a.k.a. Coarse/Fine Interfaces
C1 continuity guarantees consistency which is
important because consistency stability
convergence.
Note We use cell-centered grids so that coarse
and fine grids wont share grid points at the
interfaces.
14Stencils for Constant Coefficients
15Stencils for Variable Coefficients
16How to Interpolate Ghost Points
- First, interpolate values at the blue xs using
coarse grid values. - Then interpolate the red points using fine grid
values along with the values at the blue xs. - The red points are the ghost points that will be
used by the discrete operator.
17Flux Matching
1
2
The same approximation for the derivative across
the coarse/fine interface is used for the grid
points on both sides of the interface, so C1
continuity is preserved across the interface.
Notice that these fluxes match the fluxes used in
the stencils for the boundary points on the fine
grid side. Hence, flux matching.
18Flux Matching versus Simple Taylor Series Flux
Approximation
19Flux Matching Formula by Taylors Series
20Flux Matching in 3D
Image created with Povray (http//www.povray.org)
21AMRMG
An Illustration in 1D
22Preparation For Illustration
- Assume the hierarchy is already constructed by a
calling AMR code. - The following sequence of slides illustrates our
AMR MG algorithm on the smallest possible example
(in 1D) that captures all the aspects of the
algorithm - Three levels,
- One refinement patch per level,
- Four grid points per level.
- Problem
- Residual
- Correction
Fasten your seatbelts, grasp the safety bar,
place head firmly against head rest
23Illustration in 1D
The dashed lines and dots represent the
completion of the composite grid on each level.
The algorithm is patch-based. The solid lines
represent the domains of the patches, and the
dark dots represent the grid points in the
patches.
24Illustration in 1D
Start with an initial guess for the solution on
the composite grid.
Represent the grid points with a 2-by-2 array of
spaces to keep track of the status of the
different variables during the algorithm. See
the key to the left.
Solution
Correction
Residual
Temporary
25Illustration in 1D
Interpolate ghost point(s) so that regular
stencils can be applied at patch boundary points.
Checkerboard pattern denotes values that were
updated in previous steps.
Solution
Correction
Residual
Temp
Temporary
26Illustration in 1D
Compute the residual on the finest grid level
(using the ghost point(s) to complete the
stencil(s) on the patch boundary points).
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
27Illustration in 1D
Smooth to get a correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
28Illustration in 1D
Store temporarily corrected solution, needed for
acquisition of ghost points for the residual
computation on the next level. (The permanently
corrected solution for this level of this V cycle
will be computed on the other side of the V
cycle.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
29Illustration in 1D
Project the residual onto the part of the coarser
grid that is covered by the finer grid.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
30Illustration in 1D
Interpolate ghost point(s) for the temporarily
corrected solution. This is needed for residual
computation on the uncovered part of the next
level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
31Illustration in 1D
Interpolate ghost point(s) so that regular
stencils can be applied at patch boundary points.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
32Illustration in 1D
Compute the residual on the uncovered nodes. Use
fine grid data (including ghost points) to do
flux matching at the interface nodes. Use the
newly interpolated ghost points to apply a
regular stencil at the boundary points.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
33Illustration in 1D
Smooth to get a correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
34Illustration in 1D
Store temporarily corrected solution, needed for
acquisition of ghost points for the residual
computation on the next level. (The permanently
corrected solution for this level of this V cycle
will be computed on the other side of the V
cycle.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
35Illustration in 1D
Project the residual onto the part of the coarser
grid that is covered by the finer grid.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
36Illustration in 1D
Compute ghost point(s) for the temporarily
corrected solution.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
37Illustration in 1D
Compute the residual on the uncovered nodes. Use
fine grid data (including ghost points) to do
flux matching at the interface nodes.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
38Illustration in 1D
Solve for the correction on the coarsest level.
(We use geometric multigrid here.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
39Illustration in 1D
Correct the solution on the uncovered part of the
coarsest level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
40Illustration in 1D
Update the correction on the next finer level
with interpolated values from the coarse level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
41Illustration in 1D
Update the residual.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
42Illustration in 1D
Smooth to get a correction for the correction.
We avoid smoothing the current correction because
that would require interpolating ghost points at
the patch boundaries.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
43Illustration in 1D
Update the original correction with the newly
computed correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
44Illustration in 1D
Now update the solution on the uncovered nodes
with the latest correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
45Illustration in 1D
Update the correction on the next finer level
with interpolated values from the coarser level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
46Illustration in 1D
Update the residual.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
47Illustration in 1D
Smooth to get a correction for the correction.
We avoid smoothing the current correction because
that would require interpolating ghost points at
the patch boundaries.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
48Illustration in 1D
Update the original correction with the newly
computed correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
49Illustration in 1D
Now update the solution on this level with the
latest correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
50Illustration in 1D
Final solution on the composite grid after one V
cycle.
Can do more V cycles by starting the procedure
again from the beginning with the new solution as
the initial guess.
51AMR MG Algorithm
Solve on the coarsest grid.
Compute residual on finest grid.
Smooth a correction.
Store temporarily corrected solution.
Project part of the residual.
Compute the other part of the residual.
Recursively apply to coarser levels.
Interpolate and correct.
Update the residual.
Smooth an intermediate correction.
Update the final correction.
Apply the final correction to the solution on
this level.
52Cache Aware Smoothers
Preparation for Illustration
- Want data to pass through cache only once per
application of the smoother. - Application of the smoother means multiple
(e.g. two to four) Gauss-Seidel updates per
point. - Whereas normally one sweeps the entire grid over
and over to apply successive updates, we want to
stagger the updates in such a way that we need to
sweep the grid only once. - Want the answer to be exactly the same as the
normal application of the smoother. (Bit-wise
equivalence.) - The following illustration is of a 2D grid with
12-by-4 grid points and for three updates per
point. Updates occur from left to right and
bottom to top.
53Cache Aware Smoothers
Note that updating the third row again would
require updated data in the fourth row, so we
have to wait since the fourth row hasnt been
updated yet. Similarly for the second row which
depends on a second update of row three.
Before updating new points, apply more updates to
the current points in cache where possible.
Update points in the first partition.
54Cache Aware Smoothers
55Cache Aware Smoothers
56Cache Aware Smoothers
57Cache Aware Smoothers
582D Blocking
59Cache Seams
60Tall Cache Block
To minimize the amount of seam points.
61Seams with Tall Cache Block
623D Cache Block
63Only Post-smoothing
- Apparently, the cache aware smoothing technique
will give better performance with greater numbers
of updates (within a limit imposed by the cache
size, anyway). - When doing both pre- and post- smoothing, it is
common to do two pre-smoothing updates and two
post-smoothing updates. - Q What if we instead do four post-smoothing
updates and no pre-smoothing? - A Better cache effects.
- In addition, doing only post-smoothing
simplifies our AMR MG algorithm, as illustrated
on the next slide, and is more conducive to
making multigrid cache aware.
64Only Post-Smoothing
- No pre-smoothing.
- No need for a temporarily corrected solution.
- Just project/compute the residual.
- Only need to compute ghost points once.
- No residual update.
- Can avoid intermediate correction by
interpolating ghost points (good for
implementation of cache aware multigrid). - Number of smoothing updates here should equal
the number of pre-smoothing updates plus the
number of post-smoothing updates used in the
original implementation.
65Cache Aware Multigrid
66Numerical Results
- The following slides contain graphs of numerical
results. - The results are for the 2D case with a variety
of refinement patterns
- Laplaces equation with solution function
- Variable function
- Dirichlet boundary conditions.
67Numerical Results
For V(0,4)
50, 100
68Numerical Results
(Average Speedup per Vcycle)
50, 100
69Numerical Results
(Speedups for Run Until Convergence)
70Future Work
- Improved cache optimizations (with help from
ROSE?). - 3D
- Parallel
- Clustering Algorithms
- Compare with preconditioned CG.
- Try to apply cache optimizations to
preconditioned CG.
71Thank You!