Title: Multiscale simulations of internal waves and other coastal processes
1Multiscale simulations of internal waves and
other coastal processes
- Oliver Fringer
- Collaborators: Bob Street and Margot Gerritsen
- Students/Postdocs: Y. Chou, S. Jachec, D. Kang, S. Venayagamoorthy, B. Wang, Z. Zhang, G. Zhao
- Environmental Fluid Mechanics Laboratory
- Dept. Civil and Environmental Engineering
- Stanford University
- GRC Coastal Ocean Modeling, 17-22 June 2007
2Outline
- Multiscale physics and overview of the SUNTANS model
- The nonhydrostatic pressure
- Physics and computation
- Examples
- Energetics of internal tides in Monterey Bay
- Nonlinear internal waves in the South China Sea
3Multiscale Internal Waves
O(10^4 m) → O(10^1 m)
Klymak & Moum, 2003
Venayagamoorthy & Fringer, 2004
4Multiscale Eddies
North Puget Sound (Microsoft Virtual earth)
O(10^4 m) → O(10^0 m)
Remote Sensing and Modeling of Coherent Structures in River and Estuarine Flows (COHSTREX: COHerent STRuctures in Rivers and Estuaries eXperiment)
COHSTREX team:
- Remote sensing (UW): Andy Jessup, Kate Edwards, Bill Plant
- In-situ measurements: Derek Fong (Stanford), Stephen Monismith (Stanford), Alex Horner-Devine (UW), Parker MacCready (UW)
- Modeling: Oliver Fringer (Stanford), Robert Street (Stanford)
10 km
5Dealing with multiple scales Unstructured grids
N
50,000 cells have Dx 1 m
Grid spacing histogram (cell count N vs. grid spacing Δx)
Snohomish River grid: Δx_min = 0.80 m, Δx_max = 300 m
6Model Overview
- SUNTANS
- Stanford
- Unstructured
- Nonhydrostatic
- Terrain-following
- Adaptive
- Navier-Stokes
- Simulator
- Free-surface wetting/drying
- Finite-volume prisms using z-levels
- Parallel computing: MPI, ParMetis
- Based on formulation of Casulli et al. (1999)
Side view (z levels)
Top view
7Scaled Nonhydrostatic Equations: Multiscale behavior
- Hydrostatic: No vertical Coriolis, no nonhydrostatic pressure
- Quasi-hydrostatic: Inclusion of all Coriolis terms (Marshall et al., 1997)
- Nonhydrostatic: Quasi-hydrostatic plus nonhydrostatic pressure
8Lock exchange
- Large-scale exchange: ε = 1/8; small-scale billows: ε = 1/1
- Which calculation costs more?
- Nonhydrostatic simulation takes 5 times longer per time step
- Hydrostatic result has vertical velocity that is 16 times larger
- To achieve the same vertical Courant number, the hydrostatic result requires 16 times as many time steps
- Hydrostatic calculation takes 3.2 times longer!
From Fringer et al., Ocean Modelling, 2006
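The 3.2X figure follows from the two factors quoted on this slide: the vertical Courant number scales as w Δt/Δz, so a vertical velocity 16 times larger forces a time step 16 times smaller, while each nonhydrostatic step costs 5 times more. A minimal sketch of that arithmetic (the function and argument names are illustrative and not part of SUNTANS):

```python
def relative_hydrostatic_cost(nonhydro_step_cost, step_count_ratio):
    """Total cost of the hydrostatic run relative to the nonhydrostatic run.

    nonhydro_step_cost: cost of one nonhydrostatic step, in units of one
        hydrostatic step (5 on the slide).
    step_count_ratio: hydrostatic steps needed per nonhydrostatic step to
        reach the same vertical Courant number (16 on the slide).
    """
    return step_count_ratio / nonhydro_step_cost


ratio = relative_hydrostatic_cost(nonhydro_step_cost=5.0, step_count_ratio=16.0)
print(f"Hydrostatic run costs {ratio:.1f} times as much")
```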
9Nonhydrostatic Codes
- Hydrostatic Step
- Operation count: O(N_H N_z)
- Correction Step: 3D Poisson Equation
- Operation count: O(α N_z N_H N_z) → added cost is a factor of O(α N_z)
10Increasing the efficiency of the nonhydrostatic
pressure solver
- Preconditioning (Free lunch 1)
- Parallel computing
- Grid reordering (Free lunch 2)
11Conditioning of the Pressure-Poisson equation
- The 2D x-z Poisson equation is given by
- For large aspect ratio flows,
- To a good approximation,
- The preconditioned equation is then?M-1 is
much better conditioned than .
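The equations on this slide did not survive extraction, so the following is only a sketch of the general idea under stated assumptions, not the SUNTANS implementation. It assumes a 5-point discretization of the 2D x-z Poisson operator with zero values outside the grid, and a block-diagonal preconditioner that inverts only the vertical part of the operator, one water column at a time. All function names and grid values are hypothetical. For a grid aspect ratio Δz/Δx = 0.01 the vertical terms dominate, so preconditioned conjugate gradient (PCG) should need far fewer iterations than plain CG.

```python
import math
import random


def apply_poisson(q, dx, dz):
    """Apply the negative 5-point Laplacian to q[i][k]; q = 0 outside the grid (assumed)."""
    nx, nz = len(q), len(q[0])
    out = [[0.0] * nz for _ in range(nx)]
    for i in range(nx):
        for k in range(nz):
            left = q[i - 1][k] if i > 0 else 0.0
            right = q[i + 1][k] if i < nx - 1 else 0.0
            below = q[i][k - 1] if k > 0 else 0.0
            above = q[i][k + 1] if k < nz - 1 else 0.0
            out[i][k] = ((2.0 * q[i][k] - left - right) / dx**2
                         + (2.0 * q[i][k] - below - above) / dz**2)
    return out


def solve_vertical(r, dz):
    """Block-diagonal preconditioner: invert only the vertical (z) part of the
    operator, column by column, with the Thomas tridiagonal algorithm."""
    nz = len(r[0])
    diag, off = 2.0 / dz**2, -1.0 / dz**2
    out = []
    for col in r:
        c = [0.0] * nz
        y = [0.0] * nz
        c[0] = off / diag
        y[0] = col[0] / diag
        for k in range(1, nz):          # forward elimination
            denom = diag - off * c[k - 1]
            c[k] = off / denom
            y[k] = (col[k] - off * y[k - 1]) / denom
        for k in range(nz - 2, -1, -1):  # back substitution
            y[k] -= c[k] * y[k + 1]
        out.append(y)
    return out


def identity(r):
    """No preconditioning: plain CG."""
    return [row[:] for row in r]


def dot(a, b):
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def pcg(b, dx, dz, precondition, tol=1e-8, max_iter=5000):
    """Preconditioned conjugate gradient; returns (solution, iteration count)."""
    nx, nz = len(b), len(b[0])
    x = [[0.0] * nz for _ in range(nx)]
    r = [row[:] for row in b]
    z = precondition(r)
    p = [row[:] for row in z]
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = apply_poisson(p, dx, dz)
        alpha = rz / dot(p, Ap)
        for i in range(nx):
            for k in range(nz):
                x[i][k] += alpha * p[i][k]
                r[i][k] -= alpha * Ap[i][k]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precondition(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [[z[i][k] + beta * p[i][k] for k in range(nz)] for i in range(nx)]
        rz = rz_new
    return x, max_iter


random.seed(0)
nx, nz, dx, dz = 16, 16, 100.0, 1.0   # hypothetical grid, aspect ratio dz/dx = 0.01
b = [[random.uniform(-1.0, 1.0) for _ in range(nz)] for _ in range(nx)]
_, cg_iters = pcg(b, dx, dz, identity)
_, pcg_iters = pcg(b, dx, dz, lambda r: solve_vertical(r, dz))
print(f"CG iterations: {cg_iters}, PCG iterations: {pcg_iters}")
```

Raising dz toward dx in this sketch makes the horizontal terms comparable to the vertical ones, and the iteration counts of the two solvers move closer together.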
12Effect of the preconditioner
- Internal seiche test problem: fix L (fix Δx), vary D (and Δz), ε = ε_g
- As ε increases, preconditioner becomes less effective
- Using the preconditioner, workload decreases as flow becomes more hydrostatic.
Using CG
Using PCG
Hydrostatic
Nonhydrostatic
From Fringer et al., 2006
13Increasing the efficiency of the nonhydrostatic
pressure solver
- Preconditioning (Free lunch 1)
- Parallel computing
- Grid reordering (Free lunch 2)
14ParMetis Parallel Unstructured Graph Ordering
(Karypis et al., U. Minnesota)
Unordered 1089-node connectivity matrix
Ordered 1089-node connectivity matrix
Ordering increases per-processor performance by up to 20%
15What grid resolution is needed?
16Internal tides in Monterey Bay
- Δx = 300 m to 2 km
- Δz = 10 m to 100 m, ε_g = O(0.1)
- Boundary conditions: OTIS M2
- Initial density field
- Average of 50 CTD casts (Petruncio et al., 2002)
- Bottom drag: C_d = 0.0025
- Eddy viscosity: ν_V = 0.002 m^2 s^-1
- ν_H = 1 m^2 s^-1
- 16 processors, 4.3 million grid cells, 4.8 seconds/time step (2X real time)
- Nonhydrostatic pressure overhead: 6X
- Memory savings with z-levels: 60% (max 120 levels)
17The effect of grid resolution on internal tide
generation
It's all in the GRID!!!
Grid size 3000 m (80K cells) → Umax = 2 cm/s
Grid size 300 m (4M cells) → Umax = 16.1 cm/s
Grid size 60 m (45M cells) → Umax = ?
Field data from Petruncio et al. (1998), ITEX1 Mooring A2
SUNTANS results w/300 m grid: max U = 16.1 cm/s
Simulation time: 8 M2 tides
Jachec et al., 2007
U velocity (cm/s)
18Internal tide generation at critical topography
u
North
Transect 1
v
Generation results from interaction of barotropic flow with critical topography for Fr_i = O(0.1) (Vlasenko et al., 2005)
19Where are internal waves generated?
Energy flux
1 kW/m (across 1000 m of water, E = 1 MW)
Blue vectors: SUNTANS; red/green: Kunze et al., 2002
Jachec et al., 2006
20Nonhydrostatic effects
note scale difference!
- Nonhydrostatic effects on net radiation/dissipation are small.
- Small effects at shelf break near entrance to canyon
- Nonhydrostatic effects to O(ε_g = 0.1) are small.
21Nonhydrostatic effects on internal waves in the
South China Sea
image internalwaveatlas.com
South China Sea
22Idealized simulations with SUNTANS
Levitus stratification (temperature range 3 C to 28 C)
Depth at sill: D_S = 200 m
Ocean depth: D_0 = 3500 m
ε_g = 0.02
Nonhydrostatic overhead: 2.6X
Barotropic forcing at diurnal frequency
Radiation of first-mode baroclinic wave. Sponge layers are also employed to damp internal waves at the boundaries.
23Nonlinear effects (Nonhydrostatic code)
Isotherms 16, 20, 24, 28 degrees C
Fr_sill = 0.27
Fr_sill = 1.60
24Nonhydrostatic effects (Fr_sill = 1.60)
Nonhydrostatic code
Isotherms 16, 20, 24, 28 degrees C
Hydrostatic code
25Nonhydrostatic-Hydrostatic comparison
Dispersion in the hydrostatic model is purely
numerical. The numerical dispersion is much
smaller than the physical, nonhydrostatic
dispersion, which leads to excessive steepening
of the front. The oversteepened front is
diffused due to grid-scale numerical diffusion,
and this causes a reduction in the wave
amplitude. Reduction in the wave
amplitude reduces the amplitude dispersion and
thereby reduces the speed of propagation of the
wavetrain.
15°C isotherm after 3 tidal periods.
L = 8000 m, a = 130 m, ε = a/L = 0.016, ε_g = 0.02 > ε
26Using the KdV equation as a model
Initial wave Half-sine wave (first-mode internal
tidal wavelength) with amplitude 70 m.
27Computed vs. modeled (KdV) results
SUNTANS
KdV
28General observations of nonhydrostatic effects
- If a hydrostatic model is used to simulate inherently nonhydrostatic phenomena (ε = O(1))
- The vertical velocity is always overpredicted
- For linear flows
- Phase speeds are overpredicted (the waves are always treated as shallow-water waves)
- For nonlinear flows
- Phase speeds are underpredicted
- Horizontal momentum is reduced
- Mixing and dissipation are overpredicted
- If a nonhydrostatic model is used to simulate inherently hydrostatic phenomena (ε << O(1))
- See your doctor (M.D., not Ph.D.)
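The linear-flow claim can be illustrated with a textbook case that is not taken from the slides: mode-1 internal waves in a fluid of depth D with constant buoyancy frequency N and no rotation. The nonhydrostatic dispersion relation gives c = N / sqrt(k^2 + m^2) with m = π/D, while the hydrostatic limit drops k^2 and gives c = N/m for every wavelength. The values of N and D below are hypothetical.

```python
import math


def mode1_phase_speed(N, D, wavelength, hydrostatic):
    """Linear mode-1 internal wave phase speed for constant N in depth D,
    neglecting rotation (an idealization chosen for this sketch)."""
    k = 2.0 * math.pi / wavelength   # horizontal wavenumber
    m = math.pi / D                  # mode-1 vertical wavenumber
    if hydrostatic:
        return N / m                 # long-wave limit, independent of k
    return N / math.sqrt(k**2 + m**2)


N, D = 0.01, 100.0   # hypothetical buoyancy frequency (rad/s) and depth (m)
for wavelength in (10000.0, 1000.0, 200.0):
    c_h = mode1_phase_speed(N, D, wavelength, hydrostatic=True)
    c_nh = mode1_phase_speed(N, D, wavelength, hydrostatic=False)
    print(f"L = {wavelength:7.0f} m  D/L = {D / wavelength:5.2f}  c_h/c_nh = {c_h / c_nh:.3f}")
```

The ratio c_h/c_nh equals sqrt(1 + (2D/L)^2), so it is close to 1 for small aspect ratio and grows as D/L approaches O(1).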
29Do I need to compute the nonhydrostatic pressure?
- Computation of the nonhydrostatic pressure incurs considerable overhead, but there is some free lunch if you look for it.
- Should you compute the nonhydrostatic pressure?
- It depends on what you're looking for
- Monterey Bay Internal Tides
- Physical aspect ratio 0.01, grid aspect ratio 0.1
- Nonhydrostatic overhead 6X
- Nonhydrostatic overhead is probably not worth it
- South China Sea Weakly Nonlinear Internal Waves
- Physical aspect ratio 0.01, grid aspect ratio 0.02
- Nonhydrostatic overhead 2.7X
- Nonhydrostatic overhead is worth it
30Acknowledgments
- Funding: ONR Grants N00014-05-1-0294, N00014-05-1-0485
- Stanford University Leavell Family Faculty Scholarship.
- Ph.D. student hackers
- S. Jachec (Monterey Bay), K. Venayagamoorthy (Internal waves on slopes), B. Wang (Snohomish river), Z. Zhang (South China Sea)
- Collaborators: Prof. Margot Gerritsen, Prof. Bob Street
- Monterey Bay field data: E. Petruncio, L. Rosenfeld, J. Paduan
- JVN cluster at the ARL MSRC
- Stanford Center for Computational Earth and Environmental Science (CEES) Cluster
- http://suntans.stanford.edu
island wake test case
31Parallel computing details
Simulation: Monterey Bay (180 km by 100 km)
Finest grid resolution: 30 m
Number of grid cells: 72 million (~424^3)
Computer: Army Research Lab MSRC 2048-processor LinuxNetworx Intel Xeon cluster
8-tide simulation: 50 days on 256 processors (305,000 CPU hours)
Without parallel computing: 20 years of compute time!
32The nonhydrostatic dilemma
Hydrostatic
Nonhydrostatic
Simulation time per time step
Hydrostatic code
Resolution (Fixed Dz)
33Tidal energy budgets
NA West Coast
- Barotropic tides: 14 GW (Egbert & Ray, 2001)
- Internal tide generation: 11.6 GW (83%)
- Barotropic tide dissipation: 2.4 GW (17%)
- Internal tide dissipation: 5.8 GW
- Internal tide radiation: 5.8 GW (41%)
- Total local dissipation: 8.2 GW (59%)
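The percentages on this slide are consistent with the quoted powers if every share is taken relative to the 14 GW barotropic input and local dissipation is read as barotropic plus internal tide dissipation. A short check of that reading (variable names are illustrative; the numbers are those on the slide):

```python
barotropic_input = 14.0           # GW, Egbert & Ray (2001) as quoted on the slide
internal_tide_generation = 11.6   # GW
barotropic_dissipation = 2.4      # GW
internal_tide_dissipation = 5.8   # GW
internal_tide_radiation = 5.8     # GW

# Energy dissipated near the coast: directly from the barotropic tide plus
# the part of the internal tide that does not radiate away.
local_dissipation = barotropic_dissipation + internal_tide_dissipation


def percent(part, whole=barotropic_input):
    """Share of the barotropic input, in percent."""
    return 100.0 * part / whole


for label, value in [
    ("internal tide generation", internal_tide_generation),
    ("barotropic dissipation", barotropic_dissipation),
    ("local dissipation", local_dissipation),
    ("internal tide radiation", internal_tide_radiation),
]:
    print(f"{label:26s} {value:5.1f} GW  {percent(value):3.0f}%")
```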
34Lee wave vs. flood wave
15°C isotherms after 2 tidal periods
First flood
First ebb
Fr = 0.25
Increasing Fr
Stronger ebb pushes peak of depression farther from the crest of the sill.
Fr = 3.00
After 2 tidal periods, the flood wave has
propagated roughly 2.5 mode-1 wavelengths,
while the ebb wave has propagated roughly 1
mode-1 wavelength. The crests of the flood
wave with increasing amplitude have propagated
further because of amplitude dispersion, while
the crests of the ebb wave have been delayed due
to the excursion of the peak of the depression
during the ebb.
35Speedup with the preconditioner when applied to a domain with δ = D/L = 0.01
No preconditioner (22.8X)
Diagonal (8.5X)
Block-diagonal (1 X)
36Parallel Graph Partitioning
- Given a graph, partition with the following constraints
- Balance the workload
- All processors should perform the same amount of work.
- Each graph node is weighted by the local depth.
- Minimize the number of edge cuts
- Processors must communicate information at interprocessor boundaries.
- Graph partitioning must minimize the number of edge cuts in order to minimize cost of communication.
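The two constraints can be made concrete by scoring candidate partitions of a tiny graph. The sketch below does not call ParMetis; it only evaluates the depth-weighted workload per processor and the number of cut edges for two hand-made partitions of a hypothetical 2 x 3 grid of cells.

```python
def workloads(depths, part, n_parts):
    """Sum of node weights (local depths) assigned to each processor."""
    totals = [0.0] * n_parts
    for node, p in enumerate(part):
        totals[p] += depths[node]
    return totals


def edge_cuts(edges, part):
    """Number of graph edges whose endpoints live on different processors."""
    return sum(1 for a, b in edges if part[a] != part[b])


# Hypothetical 2 x 3 grid of cells numbered row by row (0 1 2 / 3 4 5);
# the water gets deeper toward the right-hand column.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
depths = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]

split_by_column = [0, 0, 1, 0, 0, 1]   # deep column alone on processor 1
split_by_row = [0, 0, 0, 1, 1, 1]      # one row per processor

for name, part in [("by column", split_by_column), ("by row", split_by_row)]:
    print(name, workloads(depths, part, 2), "edge cuts:", edge_cuts(edges, part))
```

Both partitions balance the depth-weighted workload, but the column split cuts fewer edges, so it would need less interprocessor communication.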
Delaunay edges Voronoi graph
Voronoi graph of Monterey Bay
37ParMetis Parallel Unstructured Graph
Partitioning (Karypis et al., U. Minnesota)
Five-processor partitioning. Workloads (%): 20.0, 20.2, 19.4, 20.2, 20.2
Original 1089-node graph of Monterey Bay, CA
Use the depths as weights for the workload
38Cache-unfriendly code vs...
- Data is transferred from RAM to cache in blocks before it is loaded for use in the CPU
Consider the simple triangulation shown
RAM (large) → transfer (slow) → cache (small) → load (fast) → CPU
s(1) = s(1) + a*s(0) + b*s(14) + c*s(17)
Machine code:
1) Transfer block s(0),s(1)
2) Load s(0)
3) Load s(1) (cache hit)
4) Transfer block s(14),s(15)
5) Load s(14)
6) Transfer block s(16),s(17)
7) Load s(17)
8) Obtain new s(1)
4 loads, 3 transfers (3 cache misses, 1 hit)
Figure: triangulation with nodes numbered 0 to 17
39Cache-friendly code
- Data is transferred from RAM to cache in blocks before it is loaded for use in the CPU
Consider the simple triangulation shown
RAM (large) → transfer (slow) → cache (small) → load (fast) → CPU
s(1) = s(1) + a*s(0) + b*s(2) + c*s(3)
Machine code:
1) Transfer block s(0),s(1)
2) Load s(0)
3) Load s(1) (cache hit)
4) Transfer block s(2),s(3)
5) Load s(2)
6) Load s(3) (cache hit)
7) Obtain new s(1)
4 loads, 2 transfers (2 cache misses, 2 cache hits)
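The hit and miss counts on the two cache slides can be reproduced with a toy model. The sketch assumes blocks of two array entries, as in the transfers listed on the slides, and a cache that starts empty and never evicts; the function name is illustrative.

```python
def count_cache_traffic(indices, block_size=2):
    """Count block transfers (misses) and cache hits for a sequence of array
    loads, assuming an initially empty cache that never evicts a block."""
    cached_blocks = set()
    misses = hits = 0
    for i in indices:
        block = i // block_size      # s(0),s(1) share block 0; s(2),s(3) block 1; ...
        if block in cached_blocks:
            hits += 1
        else:
            cached_blocks.add(block)  # transfer the whole block from RAM
            misses += 1
    return misses, hits


unordered = count_cache_traffic([0, 1, 14, 17])   # neighbors of node 1 before reordering
reordered = count_cache_traffic([0, 1, 2, 3])     # neighbors of node 1 after reordering
print("unordered (misses, hits):", unordered)
print("reordered (misses, hits):", reordered)
```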
Figure: triangulation with nodes numbered 0 to 17
40When do we need to compute the nonhydrostatic
pressure?
Internal wave lab-scale parameters from
experiments of Michallet and Ivey, 1999
41Increasing the efficiency of the nonhydrostatic
pressure solver
- Preconditioning (Free lunch 1)
- Parallel computing
- Grid reordering (Free lunch 2)
42Parallel Computing Speedup
8-processor partitioning of Monterey Bay
43Ideal vs. Actual Speedup
larger problem size
larger problem size
72 million grid-cell Monterey Bay calculation (30 m resolution, 8 tides, 50 days w/256 procs)
- Amdahl's Law: There is always a portion of a code (f) that will execute sequentially and is not parallelizable
- Number of processors: Np
- As Np → ∞, S → 1/f
- Ideal: S = Np; theoretical maximum: S = 1/f
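Both limits on this slide follow from the standard form of Amdahl's law, S = 1 / (f + (1 - f)/Np). A minimal sketch, with a hypothetical serial fraction of 1% chosen only for illustration:

```python
def amdahl_speedup(f, n_procs):
    """Speedup on n_procs processors when a fraction f of the work is serial."""
    return 1.0 / (f + (1.0 - f) / n_procs)


f = 0.01   # hypothetical serial fraction
for n_procs in (1, 16, 256, 2048):
    print(f"Np = {n_procs:5d}  ideal S = {n_procs:5d}  Amdahl S = {amdahl_speedup(f, n_procs):6.1f}")
print(f"Upper bound 1/f = {1.0 / f:.0f}")
```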
44Effect of reordering
Main memory
Reordered grid
Main memory
45Generation at critical topography
Critical topography: γ = topography slope, s = internal wave beam slope
Critical when γ/s = 1
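The beam slope is not defined further on the slide; in linear theory it follows from the internal wave dispersion relation, s = sqrt((ω^2 - f^2) / (N^2 - ω^2)). The sketch below evaluates it for the M2 tide using an assumed latitude of 36.8°N (roughly Monterey Bay) and a hypothetical buoyancy frequency, then classifies a bottom slope by γ/s.

```python
import math


def beam_slope(omega, f, N):
    """Internal wave beam slope s from the linear dispersion relation."""
    return math.sqrt((omega**2 - f**2) / (N**2 - omega**2))


def classify(gamma, s, tol=1e-6):
    """Classify a topographic slope gamma relative to the beam slope s."""
    ratio = gamma / s
    if abs(ratio - 1.0) <= tol:
        return "critical"
    return "subcritical" if ratio < 1.0 else "supercritical"


omega_m2 = 2.0 * math.pi / (12.42 * 3600.0)             # M2 tidal frequency (rad/s)
f = 2.0 * 7.2921e-5 * math.sin(math.radians(36.8))      # Coriolis parameter, assumed latitude
N = 2.0e-3                                              # hypothetical buoyancy frequency (rad/s)

s = beam_slope(omega_m2, f, N)
print(f"beam slope s = {s:.4f}")
for gamma in (0.02, s, 0.10):                           # hypothetical bottom slopes
    print(f"gamma = {gamma:.4f}  gamma/s = {gamma / s:.2f}  {classify(gamma, s)}")
```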
46log10(W/m^2)
Net: 53 MW
Monterey Bay: 53 MW / 100 km of coastline = 530 W/m
Hawaiian Ridge: 15 GW / 1200 km = 12,500 W/m (24X)
47Net tidal energy budgets
- Monterey Bay (100 km)
- Internal tidal radiation: 53 MW → 530 W/m
- Extrapolated to North American West Coast (11000 km)
- Barotropic tidal dissipation: 14 GW (Egbert & Ray, JGR 2001)
- 8.2 GW dissipated locally (59%)
- 5.8 GW (41%) radiates away as internal tides
- Hawaii Ocean Ridge
- Total tidal dissipation: 18 +/- 6 GW (Egbert and Ray 2000)
- 8-25% is dissipated locally (Klymak et al., 2006)
- 75-92% radiates away as internal tides
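The per-unit-length fluxes and the West Coast extrapolation are straightforward to reproduce from the quoted numbers. The sketch assumes, as the extrapolation on the slide implies, that the Monterey Bay flux per meter of coastline applies uniformly along 11000 km of coast.

```python
monterey_radiation_w = 53e6     # W, net internal tide radiation from Monterey Bay
monterey_coast_m = 100e3        # m of coastline
west_coast_m = 11000e3          # m, North American West Coast
hawaii_radiation_w = 15e9       # W, Hawaiian Ridge
hawaii_length_m = 1200e3        # m

monterey_flux = monterey_radiation_w / monterey_coast_m   # W per m of coastline
hawaii_flux = hawaii_radiation_w / hawaii_length_m        # W per m of ridge
west_coast_radiation_gw = monterey_flux * west_coast_m / 1e9

print(f"Monterey Bay: {monterey_flux:.0f} W/m")
print(f"Hawaiian Ridge: {hawaii_flux:.0f} W/m ({hawaii_flux / monterey_flux:.0f}X)")
print(f"West Coast extrapolation: {west_coast_radiation_gw:.1f} GW")
```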
48Case with Fr_sill = 0.27
Isotherms: 16, 20, 24, 28 degrees C
49Case with Fr_sill = 1.60
Isotherms: 16, 20, 24, 28 degrees C
50Beginning of ebb tide
Generation mechanism "Lee wave release"
(Maxworthy, 1979)
t1 = 0
End of ebb tide
t2 = T/2
L = (T/2) c1
Beginning of flood: lee wave release
t3 = T/2
Arrival at B1
t4 = t3 + L_B1/c1
Δt = L_B1/c1