Paul Fischer

About This Presentation

Title:

Paul Fischer

Description:

Paul Fischer – PowerPoint PPT presentation

Number of Views:147

Avg rating:3.0/5.0

Slides: 76

Provided by: paulfi8

Category:

more less

Transcript and Presenter's Notes

Title: Paul Fischer

1

Paul Fischer
Mathematics and Computer Science Division
Argonne National Laboratory

fischer_at_mcs.anl.gov www.mcs.anl.gov/fischer
Temperature distribution in an fcc-packed array
of spheres subject to unit-flux boundary
conditions.
2
Outline

Objective Give an overview of motivation, and
some detail, for the development of high-order
methods in scientific applications
Introduction
Architecture / Algorithm Coupling
Accuracy
Stability
Solvers
Applications will be interwoven throughout

3
Motivation Spectral Element Simulation Examples

transitional boundary layers
heat transfer enhancement
M. Greiner (UNR)
convection dynamics in deep atmospheres A. Deane
(U.Md)
pattern formation in nonequilibrium flows
H.
Greenside (Duke), M. Paul M. Cross (Caltech)
vascular flows F. Loth (UIC), S. Lee
(MIT), H. Bassiouny (UofC)
MHD F. Cattaneo, A.
Obabko, R. Rosner (UofC)

4
Heat Transfer in an fcc Lattice of Spheres at ReD
30,000

E1536 elements of order N14
10 hours on 64 nodes for 1 flow-through time
sphere-centric view

void-centric view
Max temperature fluctuation
14 x mean temperature change.

5
Transition for Pulsatile Flow in a Model
Stenosis
Calculation by Sonu Sam Varghese, Steve Frankel,
Purdue.
E1600, N10
6
Spatial Discretization Spectral Element Method

(Patera 84, Maday Patera 89)

Variational method, similar to FEM, using GL
quadrature.
Domain partitioned into E high-order
quadrilateral (or hexahedral) elements
(decomposition may be nonconforming - localized
refinement)
Trial and test functions represented as N
th-order tensor-product polynomials within each
element. (N 4 -- 15, typ.)
EN 3 gridpoints in 3D, EN 2 gridpoints in 2D.
Converges exponentially fast with N for smooth
solutions.

3D nonconforming mesh for arterio-venous graft
simulations E 6168 elements, N 7
7
PN - PN-2 Spectral Element Method for
Navier-Stokes (MP 89)
Velocity, u in PN , continuous Pressure, p
in PN-2 , discontinuous
Gauss-Lobatto Legendre points (velocity)
Gauss Legendre points (pressure)
8
Local Spectral Element Basis in 2D
Nodal bases on the Gauss-Lobatto-Legendre points
N4
N10
9

Architecture / Algorithm Considerations

10
Ubiquitous Loosely-Coupled Network of
Processor/Memory Units

Paralellism easy SP h P S1 h O(1)
even for P? 10,000.
Concurrency
Reasonable communication requirements
Difficulty is getting good single processor
performance S1
Moores Law CPU speed doubles every 18 months

11
Moores Law Measured Performance 19862000
12
Ubiquitous Loosely-Coupled Network of
Processor/Memory Units

Paralellism easy SP h P S1 h O(1)
even for P? 10,000.
Concurrency
Reasonable communication requirements
Difficulty is getting good single processor
performance
Moores Law CPU speed doubles every 18 months
Memory speeds double every 120 months
? Growing processor-memory gap.
On current machines, 10 of peak is considered
reasonable.

13
Ubiquitous Loosely-Coupled Network of
Processor/Memory Units

Paralellism easy SP h P S1 h O(1)
even for P? 10,000.
Concurrency
Reasonable communication requirements
Difficulty is getting good single processor
performance
Moores Law CPU speed doubles every 18 months
Memory speeds double every 120 months
? Growing processor-memory gap.
On current machines, 10 of peak is considered
reasonable.
Need algorithms that re-use data, i.e. FLOPS gtgt
memory accesses

14
Work vs. Memory Examples

ops mem ratio
Vector-Vector a a c b 2n
2n 1
Matrix-Vector x Ay
2n2 n2 2
Sparse Mat-Vec x Asy 2mn
mn 2
Matrix-Matrix W AU 2n3
2n2 n
Amortize cost of accessing A over n vectors

15
Matrix-Matrix Product Performance on
Cached-Based Architectures

Time for AB vs AB for N10 is
2.0 x on DEC Alpha,
1.5 x on IBM SP

Examples of data re-use via matrix-matrix
products in PDE operator evaluation (kernel of
iterative solvers)

17
Example 1 Spectral Element Derivative
Evaluation

Local tensor-product form (2D),
allows derivative to be evaluated as
matrix-matrix product
In Rd (N1)d points/element, O(Nd1)
ops/element
Similar complexities hold for all SE operations

18
Example 2 Derivative Evaluation for p-type or
DG Triangles

(Teng, Hesthaven, Warburton,)

Standard uxe Dxe ue , e 1,, E
O(E p4 ) memory references for O(E p2) gridpoints
Modified uxe rxe Dre ue sxe Dse ue
For triangles, metrics rxe and sxe are constants
Dre ue , Dse ue are matrix-matrix products on
canonical triangle
O(E p2 ) memory references for O(E p2) gridpoints
In 3D, savings is more significant

s W r
y W e x
19
Example 3 Block-Jacobi Preconditioning for
Triangles / Tets
20
Summary Architectures

It is possible to construct high-order methods
having memory-access demands identical to
standard low-order schemes.
The extra work of the high-order methods can be
(potentially) covered by the processor-memory
performance gap.

Motivation for High-Order Accuracy

22
High-Order Methods for the Incompressible
Navier-Stokes Equations

Reynolds number Re 1000 - 2000
small amount of diffusion
highly nonlinear (small scale structures result)
High-order methods yield savings for simulations
featuring
long time-integration
minimal physical dissipation (Regtgt1, no
turbulence modeling)
Examples
DNS (direct numerical simulation)
LES (large eddy simulation)
Chaotic systems

23
Exponential Convergence
Exact Navier-Stokes solution due
to Kovazsnay(1948)
24
Excellent transport properties, even for
non-smooth solutions
Convection of non-smooth data on a 32x32
grid (K1 x K1 spectral elements of order N).
(cf. Gottlieb Orszag 77)
25
Long-time integration 1-D periodic convection
For 0.5 error, only 64 pts. are required for
N8, vs. 256 pts. for N2. Savings is cubed in
3D.
26
Relative Phase Error for h vs. p Refinement ut
ux 0

h-refinement
p-refinement
Fraction of accurately resolved modes is
increased only through increasing order ( N or p
)

Example Applications of High-Order Methods

28
Scattered Field Solution of Maxwells Equations

Calculation by Jan Hesthaven Tim Warburton
USEMe
Computed in the time domain
Discontinuous Galerkin in space 250,000 tets,
p4
jan.hesthaven_at_brown.edu
timwar_at_math.umn.edu

29
Anisotropic Diffusion in Plasma Fusion Reactors

For the next generation fusion codes, the DOE
Center for Extended Magnetohydrodynamics Modeling
(CEMM) at PPPL has posed three challenging model
problems to test new discretizations.
As part of SciDAC funded research, we have
tackled the first of these.

30
Anisotropic Diffusion in a Tokamak (PPPL
Challenge Problem)

k 109 k 1

31
Anisotropic Diffusion in Plasma Fusion Reactors

Because of the high-degree of anisotropy, this
problem is more like a (difficult) hyperbolic
problem than straightforward diffusion.

Error vs. k and N
32

Stability of High-Order Methods

33
Stability Considerations

Stability for nodal bases essentially comes from
use of Gauss-type nodal distributions avoids
Runge phenomenon
This has been generalized to tets / triangles by
Hesthaven points based on steady-state charge
distributions
Wingate Taylor Fekete points that minimize
det Vandermonde matrix
Other stability issues also arise, because there
is little numerical or physical dissipation in
the numerical systems.
Problem usually stems from spurious eigenvalues
and can be controlled by
Least-Squares
Upwinding
Bubble functions
Hyperviscosity or spectrally vanishing viscosity
Filtering

34
Filter-Based Stabilization
(Gottlieb et al., Don et al., Vandeven, Boyd, ...)

At end of each time step,
Interpolate u onto GLL points for PN-1
Interpolate back to GLL points for PN
Results are smoother with linear combination
(F. Mullen 01)

F1 (u) IN-1 u
Fa (u) (1-a) u a IN-1 u (a 0.05
- 0.2)

Preserves interelement continuity and spectral
accuracy
Equivalent to multiplying by (1-a) the N th
coefficient in the expansion
u(x) S uk fk (x)

(Boyd 98)
fk (x) Lk(x) - Lk-2(x)

35
Numerical Stability Test Shear Layer Roll-Up
(Bell et al.
JCP 89, Brown Minion, JCP 95, F. Mullen, CRAS
2001)
2562
2562
1282
2562
2562
1282
36
Error in Predicted Growth Rate for
Orr-Sommerfeld Problem at Re7500
Spatial and Temporal Convergence (FM,
2001)
Base velocity profile and perturbation streamlines
37
Filtering permits Red99 gt 700 for transitional
boundary layer calculations
Re 1000
Re 3500
38

Navier-Stokes Time Advancement
Iterative Linear Solvers

39
Navier-Stokes Time Advancement

Nonlinear term explicit
k th-order backward difference formula /
extrapolation
characteristics (Pironneau 82, MPR 90)
Stokes problem pressure/viscous decoupling
3 Helmholtz solves for velocity
(consistent) Poisson equation for pressure
(computationally dominant)

40
PN - PN-2 Spectral Element System Matrices for
Stokes (MP 89)
41
Consistent Splitting for Unsteady Stokes(MPR 90,
Blair-Perot 93, Couzy 95)

E - consistent Poisson operator for pressure, SPD
Stiffest substep in N-S time advancement
Most compute-intensive phase

42
Pressure Solution Strategy Epn gn

Projection compute best approximation from
previous time steps
Set p S gi p n-i, projection onto L previous
time steps
Choose gi such that Dp E p n - p E
is minimized
Cost 2 matvecs in E per timestep, L2 vectors of
storage
Multilevel preconditioned CG or GMRES to solve
E Dp gn - E p

43
Initial guess for Axn bn via projection (
AE, SPD)
(best fit solution)
44
Initial guess for Epn gn via projection onto
previous solutions
(F 93, 98)

pn - pE O(Dtl) O( etol )
(Chan Wan, 95)
Results with/without projection (1.6 million
pressure nodes)
Similar results for pulsatile carotid artery
simulations
108-fold reduction in initial residual

Pressure Iteration Count
Initial Pressure Residual
45
Pressure Solution Strategy Epn gn

Projection compute best approximation from
previous time steps
Multilevel preconditioned CG or GMRES to solve
E Dp gn - E p

46
Two-Level Overlapping Additive Schwarz
Preconditioner for Pressure Solve
(Dryja Widlund 87, Pahl 93, PF 97, FMT 00)
Local Overlapping Solves Poisson problems with
homogeneous Dirichlet boundary conditions.
Coarse Grid Solve Poisson problem using linear
finite elements on entire spectral element mesh
(GLOBAL). (Fast solver Tufo F. 01)
47
Overlapping Schwarz SEM Fast Local Solvers

Exploit local tensor-product structure --
A is separable in canonical domain
Fast diagonalization method (FDM) (Lynch et al
64)
Local solve cost is 4d K N(d1)

48
Convergence Rates

Schwarz theory says that convergence rate depends
only on d/H (Helement size), and not on number
of elements or local order N.
In practice, however, d N-2
Recent work (Lottes F. 04a, 04b) indicates that
N-dependence is diminished if AC-1 is replaced
with a multigrid V-cycle (at nominal cost) using
weighted additive Schwarz smoothing.
Convergence rate is strongly dependent on element
geometry particularly aspect-ratio.

49
Two-Level Additive Schwarz Performance

Iteration count deteriorates in presence of
high-aspect ratio elements
Nonconforming discretizations eliminate
unnecessary elements in the far field and result
in better conditioned systems.

Navier-Stokes Case Study
Blood Flow in Arteriovenous Grafts

51
A Study of Transition in Arterio-Venous Grafts

Provides a port for dialysis patients to obtain
high flow rates and avoid repeated vessel injury
PTFE plastic tubing surgically attached from
artery to vein
short-circuit of high to low pressure results in
very high flow ratestransitional flow
Failure often results after 3 months from
occlusion (intimal hyperplasia) forming
downstream of attachment to vein, where flow is
turbulent

52
In Vitro Validation Transitional Flow,
Arteriovenous Graft
Mesh constructed by Seung Lee, MIT
53
Comparison of spanwise vorticity for laminar and
turbulent cases
Re 1696
Re 2912
54
Spectral Element Simulation of Transitional Flow
in an Arteriovenous-Graft Model, Re2912
55
Experimental/Numerical Comparison at Re2912
LDA measurements CFD results
(Loth, et al. JBME 2002)
56
Canine In-Vivo Model of AV Graft
NIH RO1, PI
Bassiouny
57
Canine In-Vitro Model

CT Scan image (GE Medical Systems)
Slice thickness 1.25mm (contiguous)
Pixel size .19x.19mm

A
B(1-4)
C
A
B-1
B-2
B-3
C
C
B-4
CS view
58
Canine Graft Pulsatile Flow Simulation
Sang-Wook Lee, UIC

In vivo turbulence is observed
CFD does not indicate turbulence
Frequency peak 30 Hz, not 200-300
Re too low 1200
Re in arterial side is 2400
points to arterial flow as possible controllable
source of turbulence

Venous Anastomosis
59
Where is Turbulence Coming From ?

Suspected that turbulence is generated at
arterial anastomosis and sustained through the
graft and into the PVS

60
Simulations of Full Graft Geometry

Arterial Anastomosis ? PTFE graft ? Venous
Anastomosis

Preliminary CFD results for 3-6 mm transition at
arterial anastomosis 300-400 Hz fluctuation
Spectral Element Mesh for PTFE graft
61
Simulations of Full Graft Geometry

Arterial Anastomosis ? PTFE graft ? Venous
Anastomosis

E4800 Spectral Element Mesh for PTFE graft
N10 ( 4.7 M points ) N14 ( 13.0 M
points )
62
Where is Turbulence Coming From ?

Turbulence generated at arterial anastomosis is
not sustained through the graft and into the
PVS.
Recent detective work by students David Smith and
Sang-Wook Lee in the UIC Biofluids Lab points to
flow-reversal in the DVS as a probable source of
turbulence.
This conjecture is supported by data from
concurrent numerical simulations (Lee) and
in-vitro LDA measurements (Smith).

63
Where is Turbulence Coming From ?

Three simulations with steady inlet conditionsat
ReG 1200
100, 85, and 70 flow through PVS
70 case shows strong turbulence, despite the
fact that ReDVV 950

100
85
70
64
Coherent Structures for 70/30 Split _at_ ReG 1200
65
Conclusions

Highlighted several considerations in the
development of high-order methods for transport
applications
Architectures processor/memory gap
Accuracy minimal dispersion
Stability filtering
Solvers multilevel Schwarz methods

66
Future Directions

Expect to see further development of high-order
methods as new physical problems are addressed.
Recent areas include
Vascular flows
Electromagnetics, Hesthaven Warburton (Brown,
UNM)
MHD codes LANL, PPPL, ANL / U of C
Compressible aero codes D. Mavriplis UWY, D.
Darmofals group MIT
Atmospheric models NCAR
Geophysics, etc.,

67
(No Transcript)
68
Conclusions

Hex-based SEM can be effectively used for
transitional flow problems in moderately complex
domains
Avoid severe mesh distortion impact on accuracy
and conditioning
Gordon-Hall generally sufficient to blend local
element distortion
Sub-parametric mappings and / or over-integration
can be important for treating first-order
derivative terms (i.e.,
div u , u.grad u )
PDE-based approach to mesh generation can be
employed to generate high-quality meshes (dates
back to aero-industry)

70
(No Transcript)
71
(No Transcript)
72
(No Transcript)
73
Vascular Flow Simulations
F. Loth (UIC), S. Lee (MIT), H. Bassiouny
(UofC)

Pulsatile carotid artery simulations
3 million grid points (K 3000 elements of
order N 10)
20 hours / 256 processors for single cardiac
cycle (temporal scales span 3 decades)
Fast turn-around time is required for clinical
relevance and for large patient-population
studies ? fast solvers are important

Patient-Specific Simulation Scenario
74
Challenges in Vascular Flow Simulation

Automated mesh creation based on medical image
data is needed.
Boundary conditions must be deduced from
ultrasound data.
The correct flow-split must be imposed in
bifurcating geometries.
Strong turbulence can wreak havoc at outflow
boundaries.
Modeling issues
Non-Newtonian at certain sites?
Role of wall compliance?

75
Performance pressure solve -- Epngn

Project solution onto previous solutions ( 6
orders of magnitude residual reduction for
carotid simulations)
p Sk bk pk
Solve remaining perturbed problem with conjugate
gradients, preconditioned by the two-level
additive Schwarz method of Dryja and Widlund (87).

Write a Comment

User Comments (0)

About PowerShow.com

Paul Fischer - PowerPoint PPT Presentation

Paul Fischer

Paul Fischer – PowerPoint PPT presentation