Title: Paul Fischer
1 - Paul Fischer
- Mathematics and Computer Science Division
- Argonne National Laboratory
fischer_at_mcs.anl.gov www.mcs.anl.gov/fischer
Temperature distribution in an fcc-packed array
of spheres subject to unit-flux boundary
conditions.
2Outline
- Objective Give an overview of motivation, and
some detail, for the development of high-order
methods in scientific applications - Introduction
- Architecture / Algorithm Coupling
- Accuracy
- Stability
- Solvers
- Applications will be interwoven throughout
3Motivation Spectral Element Simulation Examples
- transitional boundary layers
- heat transfer enhancement
M. Greiner (UNR) - convection dynamics in deep atmospheres A. Deane
(U.Md) - pattern formation in nonequilibrium flows
- H.
Greenside (Duke), M. Paul M. Cross (Caltech) - vascular flows F. Loth (UIC), S. Lee
(MIT), H. Bassiouny (UofC) - MHD F. Cattaneo, A.
Obabko, R. Rosner (UofC)
4Heat Transfer in an fcc Lattice of Spheres at ReD
30,000
- E1536 elements of order N14
-
- 10 hours on 64 nodes for 1 flow-through time
- sphere-centric view
- void-centric view
-
- Max temperature fluctuation
- 14 x mean temperature change.
5 Transition for Pulsatile Flow in a Model
Stenosis
Calculation by Sonu Sam Varghese, Steve Frankel,
Purdue.
E1600, N10
6Spatial Discretization Spectral Element Method
(Patera 84, Maday Patera 89)
- Variational method, similar to FEM, using GL
quadrature. - Domain partitioned into E high-order
quadrilateral (or hexahedral) elements
(decomposition may be nonconforming - localized
refinement) - Trial and test functions represented as N
th-order tensor-product polynomials within each
element. (N 4 -- 15, typ.) - EN 3 gridpoints in 3D, EN 2 gridpoints in 2D.
- Converges exponentially fast with N for smooth
solutions.
3D nonconforming mesh for arterio-venous graft
simulations E 6168 elements, N 7
7 PN - PN-2 Spectral Element Method for
Navier-Stokes (MP 89)
Velocity, u in PN , continuous Pressure, p
in PN-2 , discontinuous
Gauss-Lobatto Legendre points (velocity)
Gauss Legendre points (pressure)
8Local Spectral Element Basis in 2D
Nodal bases on the Gauss-Lobatto-Legendre points
N4
N10
9- Architecture / Algorithm Considerations
10Ubiquitous Loosely-Coupled Network of
Processor/Memory Units
- Paralellism easy SP h P S1 h O(1)
even for P? 10,000. - Concurrency
- Reasonable communication requirements
- Difficulty is getting good single processor
performance S1 - Moores Law CPU speed doubles every 18 months
11Moores Law Measured Performance 19862000
12Ubiquitous Loosely-Coupled Network of
Processor/Memory Units
- Paralellism easy SP h P S1 h O(1)
even for P? 10,000. - Concurrency
- Reasonable communication requirements
- Difficulty is getting good single processor
performance - Moores Law CPU speed doubles every 18 months
- Memory speeds double every 120 months
- ? Growing processor-memory gap.
- On current machines, 10 of peak is considered
reasonable.
13Ubiquitous Loosely-Coupled Network of
Processor/Memory Units
- Paralellism easy SP h P S1 h O(1)
even for P? 10,000. - Concurrency
- Reasonable communication requirements
- Difficulty is getting good single processor
performance - Moores Law CPU speed doubles every 18 months
- Memory speeds double every 120 months
- ? Growing processor-memory gap.
- On current machines, 10 of peak is considered
reasonable. - Need algorithms that re-use data, i.e. FLOPS gtgt
memory accesses
14Work vs. Memory Examples
-
ops mem ratio - Vector-Vector a a c b 2n
2n 1 - Matrix-Vector x Ay
2n2 n2 2 - Sparse Mat-Vec x Asy 2mn
mn 2 - Matrix-Matrix W AU 2n3
2n2 n - Amortize cost of accessing A over n vectors
15 Matrix-Matrix Product Performance on
Cached-Based Architectures
- Time for AB vs AB for N10 is
- 2.0 x on DEC Alpha,
- 1.5 x on IBM SP
16-
- Examples of data re-use via matrix-matrix
products in PDE operator evaluation (kernel of
iterative solvers)
17 Example 1 Spectral Element Derivative
Evaluation
- Local tensor-product form (2D),
- allows derivative to be evaluated as
matrix-matrix product - In Rd (N1)d points/element, O(Nd1)
ops/element - Similar complexities hold for all SE operations
18Example 2 Derivative Evaluation for p-type or
DG Triangles
(Teng, Hesthaven, Warburton,)
- Standard uxe Dxe ue , e 1,, E
- O(E p4 ) memory references for O(E p2) gridpoints
- Modified uxe rxe Dre ue sxe Dse ue
- For triangles, metrics rxe and sxe are constants
- Dre ue , Dse ue are matrix-matrix products on
canonical triangle - O(E p2 ) memory references for O(E p2) gridpoints
- In 3D, savings is more significant
s W r
y W e x
19Example 3 Block-Jacobi Preconditioning for
Triangles / Tets
20Summary Architectures
- It is possible to construct high-order methods
having memory-access demands identical to
standard low-order schemes. - The extra work of the high-order methods can be
(potentially) covered by the processor-memory
performance gap.
21- Motivation for High-Order Accuracy
22High-Order Methods for the Incompressible
Navier-Stokes Equations
- Reynolds number Re 1000 - 2000
- small amount of diffusion
- highly nonlinear (small scale structures result)
- High-order methods yield savings for simulations
featuring - long time-integration
- minimal physical dissipation (Regtgt1, no
turbulence modeling) - Examples
- DNS (direct numerical simulation)
- LES (large eddy simulation)
- Chaotic systems
23Exponential Convergence
Exact Navier-Stokes solution due
to Kovazsnay(1948)
24Excellent transport properties, even for
non-smooth solutions
Convection of non-smooth data on a 32x32
grid (K1 x K1 spectral elements of order N).
(cf. Gottlieb Orszag 77)
25Long-time integration 1-D periodic convection
For 0.5 error, only 64 pts. are required for
N8, vs. 256 pts. for N2. Savings is cubed in
3D.
26Relative Phase Error for h vs. p Refinement ut
ux 0
- h-refinement
p-refinement -
- Fraction of accurately resolved modes is
increased only through increasing order ( N or p
)
27- Example Applications of High-Order Methods
28Scattered Field Solution of Maxwells Equations
- Calculation by Jan Hesthaven Tim Warburton
USEMe - Computed in the time domain
- Discontinuous Galerkin in space 250,000 tets,
p4 - jan.hesthaven_at_brown.edu
- timwar_at_math.umn.edu
29Anisotropic Diffusion in Plasma Fusion Reactors
- For the next generation fusion codes, the DOE
Center for Extended Magnetohydrodynamics Modeling
(CEMM) at PPPL has posed three challenging model
problems to test new discretizations. - As part of SciDAC funded research, we have
tackled the first of these.
30Anisotropic Diffusion in a Tokamak (PPPL
Challenge Problem)
31Anisotropic Diffusion in Plasma Fusion Reactors
- Because of the high-degree of anisotropy, this
problem is more like a (difficult) hyperbolic
problem than straightforward diffusion.
Error vs. k and N
32- Stability of High-Order Methods
33Stability Considerations
- Stability for nodal bases essentially comes from
use of Gauss-type nodal distributions avoids
Runge phenomenon - This has been generalized to tets / triangles by
- Hesthaven points based on steady-state charge
distributions - Wingate Taylor Fekete points that minimize
det Vandermonde matrix - Other stability issues also arise, because there
is little numerical or physical dissipation in
the numerical systems. - Problem usually stems from spurious eigenvalues
and can be controlled by - Least-Squares
- Upwinding
- Bubble functions
- Hyperviscosity or spectrally vanishing viscosity
- Filtering
34Filter-Based Stabilization
(Gottlieb et al., Don et al., Vandeven, Boyd, ...)
- At end of each time step,
- Interpolate u onto GLL points for PN-1
- Interpolate back to GLL points for PN
- Results are smoother with linear combination
(F. Mullen 01)
F1 (u) IN-1 u
Fa (u) (1-a) u a IN-1 u (a 0.05
- 0.2)
- Preserves interelement continuity and spectral
accuracy - Equivalent to multiplying by (1-a) the N th
coefficient in the expansion - u(x) S uk fk (x)
(Boyd 98) - fk (x) Lk(x) - Lk-2(x)
35Numerical Stability Test Shear Layer Roll-Up
(Bell et al.
JCP 89, Brown Minion, JCP 95, F. Mullen, CRAS
2001)
2562
2562
1282
2562
2562
1282
36 Error in Predicted Growth Rate for
Orr-Sommerfeld Problem at Re7500
Spatial and Temporal Convergence (FM,
2001)
Base velocity profile and perturbation streamlines
37Filtering permits Red99 gt 700 for transitional
boundary layer calculations
Re 1000
Re 3500
38- Navier-Stokes Time Advancement
- Iterative Linear Solvers
39Navier-Stokes Time Advancement
- Nonlinear term explicit
- k th-order backward difference formula /
extrapolation - characteristics (Pironneau 82, MPR 90)
- Stokes problem pressure/viscous decoupling
- 3 Helmholtz solves for velocity
- (consistent) Poisson equation for pressure
(computationally dominant)
40 PN - PN-2 Spectral Element System Matrices for
Stokes (MP 89)
41Consistent Splitting for Unsteady Stokes(MPR 90,
Blair-Perot 93, Couzy 95)
- E - consistent Poisson operator for pressure, SPD
- Stiffest substep in N-S time advancement
- Most compute-intensive phase
42Pressure Solution Strategy Epn gn
- Projection compute best approximation from
previous time steps - Set p S gi p n-i, projection onto L previous
time steps - Choose gi such that Dp E p n - p E
is minimized - Cost 2 matvecs in E per timestep, L2 vectors of
storage - Multilevel preconditioned CG or GMRES to solve
- E Dp gn - E p
-
43Initial guess for Axn bn via projection (
AE, SPD)
(best fit solution)
44Initial guess for Epn gn via projection onto
previous solutions
(F 93, 98)
- pn - pE O(Dtl) O( etol )
(Chan Wan, 95) - Results with/without projection (1.6 million
pressure nodes) - Similar results for pulsatile carotid artery
simulations
108-fold reduction in initial residual
Pressure Iteration Count
Initial Pressure Residual
45Pressure Solution Strategy Epn gn
- Projection compute best approximation from
previous time steps - Multilevel preconditioned CG or GMRES to solve
- E Dp gn - E p
-
46Two-Level Overlapping Additive Schwarz
Preconditioner for Pressure Solve
(Dryja Widlund 87, Pahl 93, PF 97, FMT 00)
Local Overlapping Solves Poisson problems with
homogeneous Dirichlet boundary conditions.
Coarse Grid Solve Poisson problem using linear
finite elements on entire spectral element mesh
(GLOBAL). (Fast solver Tufo F. 01)
47Overlapping Schwarz SEM Fast Local Solvers
- Exploit local tensor-product structure --
- A is separable in canonical domain
- Fast diagonalization method (FDM) (Lynch et al
64) - Local solve cost is 4d K N(d1)
48Convergence Rates
- Schwarz theory says that convergence rate depends
only on d/H (Helement size), and not on number
of elements or local order N. - In practice, however, d N-2
- Recent work (Lottes F. 04a, 04b) indicates that
N-dependence is diminished if AC-1 is replaced
with a multigrid V-cycle (at nominal cost) using
weighted additive Schwarz smoothing. - Convergence rate is strongly dependent on element
geometry particularly aspect-ratio.
49Two-Level Additive Schwarz Performance
- Iteration count deteriorates in presence of
high-aspect ratio elements - Nonconforming discretizations eliminate
unnecessary elements in the far field and result
in better conditioned systems.
50- Navier-Stokes Case Study
- Blood Flow in Arteriovenous Grafts
51A Study of Transition in Arterio-Venous Grafts
- Provides a port for dialysis patients to obtain
high flow rates and avoid repeated vessel injury - PTFE plastic tubing surgically attached from
artery to vein - short-circuit of high to low pressure results in
very high flow ratestransitional flow - Failure often results after 3 months from
occlusion (intimal hyperplasia) forming
downstream of attachment to vein, where flow is
turbulent
52 In Vitro Validation Transitional Flow,
Arteriovenous Graft
Mesh constructed by Seung Lee, MIT
53Comparison of spanwise vorticity for laminar and
turbulent cases
Re 1696
Re 2912
54Spectral Element Simulation of Transitional Flow
in an Arteriovenous-Graft Model, Re2912
55Experimental/Numerical Comparison at Re2912
LDA measurements CFD results
(Loth, et al. JBME 2002)
56Canine In-Vivo Model of AV Graft
NIH RO1, PI
Bassiouny
57 Canine In-Vitro Model
- CT Scan image (GE Medical Systems)
- Slice thickness 1.25mm (contiguous)
- Pixel size .19x.19mm
A
B(1-4)
C
A
B-1
B-2
B-3
C
C
B-4
CS view
58Canine Graft Pulsatile Flow Simulation
Sang-Wook Lee, UIC
- In vivo turbulence is observed
- CFD does not indicate turbulence
- Frequency peak 30 Hz, not 200-300
- Re too low 1200
- Re in arterial side is 2400
- points to arterial flow as possible controllable
source of turbulence
Venous Anastomosis
59Where is Turbulence Coming From ?
- Suspected that turbulence is generated at
arterial anastomosis and sustained through the
graft and into the PVS
60Simulations of Full Graft Geometry
- Arterial Anastomosis ? PTFE graft ? Venous
Anastomosis
Preliminary CFD results for 3-6 mm transition at
arterial anastomosis 300-400 Hz fluctuation
Spectral Element Mesh for PTFE graft
61Simulations of Full Graft Geometry
- Arterial Anastomosis ? PTFE graft ? Venous
Anastomosis
E4800 Spectral Element Mesh for PTFE graft
N10 ( 4.7 M points ) N14 ( 13.0 M
points )
62Where is Turbulence Coming From ?
- Turbulence generated at arterial anastomosis is
not sustained through the graft and into the
PVS. - Recent detective work by students David Smith and
Sang-Wook Lee in the UIC Biofluids Lab points to
flow-reversal in the DVS as a probable source of
turbulence. - This conjecture is supported by data from
concurrent numerical simulations (Lee) and
in-vitro LDA measurements (Smith).
63Where is Turbulence Coming From ?
- Three simulations with steady inlet conditionsat
ReG 1200 - 100, 85, and 70 flow through PVS
- 70 case shows strong turbulence, despite the
fact that ReDVV 950
100
85
70
64Coherent Structures for 70/30 Split _at_ ReG 1200
65Conclusions
- Highlighted several considerations in the
development of high-order methods for transport
applications - Architectures processor/memory gap
- Accuracy minimal dispersion
- Stability filtering
- Solvers multilevel Schwarz methods
66Future Directions
- Expect to see further development of high-order
methods as new physical problems are addressed. - Recent areas include
- Vascular flows
- Electromagnetics, Hesthaven Warburton (Brown,
UNM) - MHD codes LANL, PPPL, ANL / U of C
- Compressible aero codes D. Mavriplis UWY, D.
Darmofals group MIT - Atmospheric models NCAR
- Geophysics, etc.,
67(No Transcript)
68Conclusions
- Hex-based SEM can be effectively used for
transitional flow problems in moderately complex
domains - Avoid severe mesh distortion impact on accuracy
and conditioning - Gordon-Hall generally sufficient to blend local
element distortion - Sub-parametric mappings and / or over-integration
can be important for treating first-order
derivative terms (i.e., - div u , u.grad u )
- PDE-based approach to mesh generation can be
employed to generate high-quality meshes (dates
back to aero-industry)
69 70(No Transcript)
71(No Transcript)
72(No Transcript)
73Vascular Flow Simulations
F. Loth (UIC), S. Lee (MIT), H. Bassiouny
(UofC)
- Pulsatile carotid artery simulations
- 3 million grid points (K 3000 elements of
order N 10) - 20 hours / 256 processors for single cardiac
cycle (temporal scales span 3 decades) - Fast turn-around time is required for clinical
relevance and for large patient-population
studies ? fast solvers are important
Patient-Specific Simulation Scenario
74Challenges in Vascular Flow Simulation
- Automated mesh creation based on medical image
data is needed. - Boundary conditions must be deduced from
ultrasound data. - The correct flow-split must be imposed in
bifurcating geometries. - Strong turbulence can wreak havoc at outflow
boundaries. - Modeling issues
- Non-Newtonian at certain sites?
- Role of wall compliance?
75Performance pressure solve -- Epngn
- Project solution onto previous solutions ( 6
orders of magnitude residual reduction for
carotid simulations) - p Sk bk pk
- Solve remaining perturbed problem with conjugate
gradients, preconditioned by the two-level
additive Schwarz method of Dryja and Widlund (87).