Title: Multiresolution Adaptive Numerical Scientific Simulation
2. Multiresolution Adaptive Numerical Scientific Simulation
Ariana Beste (1), George I. Fann (1), Robert J. Harrison (1,2), Rebecca Hartman-Baker (1), Shinichiro Sugiki (1)
(1) Oak Ridge National Laboratory, (2) University of Tennessee, Knoxville
In collaboration with Gregory Beylkin (4), Fernando Perez (4), Lucas Monzon (4), Martin Mohlenkamp (5) and others
(4) University of Colorado, (5) Ohio University
harrisonrj_at_ornl.gov
3. The DOE funding
- This work is funded by the U.S. Department of Energy, the division of Basic Energy Science, Office of Science, under contract DE-AC05-00OR22725 with Oak Ridge National Laboratory. This research was performed in part using
- resources of the National Energy Scientific Computing Center, which is supported by the Office of Energy Research of the U.S. Department of Energy under contract DE-AC03-76SF0098,
- and the Center for Computational Sciences at Oak Ridge National Laboratory under contract DE-AC05-00OR22725.
4. Outline
- Multiresolution basics
- Parallel decomposition and tools
- Underlying representation
- Application characteristics
- Current storage strategy
5. Molecular Science Software Project, EMSL / PNNL
PNNL: Yuri Alexeev, Eric Bylaska, Bert deJong, Mahin Hackler, Karol Kowalski, Lisa Pollack, Tjerk Straatsma, Marat Valiev
ORNL: Edo Apra, Robert Harrison, Vincent Meunier
Ames: Ricky Kendall, TL Windus
Gary Black, Brett Didier, Todd Elsenthagen, Sue Havre, Carina Lansing, Bruce Palmer, Karen Schuchardt, Lisong Sun, Erich Vorpagel
Manoj Krishnan, Jarek Nieplocha, Bruce Palmer, Vinod Tipparaju
http://www.emsl.pnl.gov/docs/nwchem/nwchem.html
6. Computational Chemistry Endstation
International collaboration spanning 8 universities and 5 national labs
- Capabilities
- Chemically accurate thermochemistry
- Many-body methods required
- Mixed QM/QM/MM dynamics
- Accurate free-energy integration
- Simulation of extended interfaces
- Families of relativistic methods
- Led out of UT/ORNL
- Focus
- Actinides, Aerosols, Catalysis
- ORNL Cray XT3, ANL BG/L
- NWChem: largest CCSD(T) calculation
- Pollack, EMSL, 2005.
- 1960-processor Itanium2 cluster
- 1468 basis functions (aug-cc-pVQZ)
- Perturbative triples (T)
- 23 hours on 1400 processors
- 75% of peak, 6.3 TFlops.
[Figure: Scaling of MADNESS, 64-4096 CPUs on XT3]
7. Multiresolution chemistry objectives
- Complete elimination of the basis error
- One-electron models (e.g., HF, DFT)
- Pair models (e.g., MP2, CCSD, ...)
- Correct scaling of cost with system size
- General approach
- Readily accessible by students and researchers
- Higher level of composition
- Direct computation of chemical energy differences
- New computational approaches
- Fast algorithms with guaranteed precision
8. How to think multiresolution
- Consider a ladder of function spaces, V0 ⊂ V1 ⊂ ... ⊂ Vn
- E.g., increasing quality atomic basis sets, or finer resolution grids, ...
- Telescoping series: Vn = V0 + (V1 - V0) + ... + (Vn - Vn-1)
- Instead of using the most accurate representation, use the difference between successive approximations
- Representation on V0 is small and dense; the differences are sparse
- Computationally efficient; many possible insights
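The telescoping idea can be tried on a toy problem. The sketch below is not the MADNESS representation. It assumes the simplest possible ladder (piecewise-constant box averages on a dyadic 1-D grid), and the names `decompose` and `reconstruct`, the test function and the 1e-3 threshold are all illustrative choices. It stores the single value on V0 plus the differences between successive levels, then counts how many differences are non-negligible for a sharply peaked function.

```python
import math

def decompose(fine):
    """Turn fine-level box averages into the coarsest average plus per-level differences."""
    levels = []
    s = list(fine)
    while len(s) > 1:
        pairs = [(s[2 * i], s[2 * i + 1]) for i in range(len(s) // 2)]
        levels.append([(a - b) / 2 for a, b in pairs])  # what the coarser level misses
        s = [(a + b) / 2 for a, b in pairs]             # coarser approximation
    return s[0], levels[::-1]  # differences ordered coarse to fine

def reconstruct(v0, levels):
    """Sum the telescoping series: start on V0, add one level of differences at a time."""
    s = [v0]
    for diffs in levels:
        s = [v for avg, d in zip(s, diffs) for v in (avg + d, avg - d)]
    return s

n = 1024  # number of boxes on the finest level (assumed)
fine = [math.exp(-2000.0 * ((i + 0.5) / n - 0.3) ** 2) for i in range(n)]

v0, levels = decompose(fine)
rebuilt = reconstruct(v0, levels)

max_err = max(abs(a - b) for a, b in zip(fine, rebuilt))
total = sum(len(d) for d in levels)
kept = sum(1 for d in levels for c in d if abs(c) > 1e-3)
print(f"round-trip error {max_err:.1e}; {kept} of {total} differences exceed 1e-3")
```

Only the differences near the peak are large, so most of them could be dropped at a chosen precision. A higher-order basis would make the differences decay faster than this piecewise-constant one does.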
10. High-level composition using functions and operators
- Conventional quant. chem. uses explicitly indexed sparse arrays of matrix elements
- Complex, tedious and error prone
- Python classes for Function and Operator
- in 1, 2, 3, 6 and general dimensions
- wide range of operations
- Hpsi = -0.5*Delsq*psi + V*psi
- J = Coulomb.apply(rho)
- All with guaranteed speed and precision
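To show what composing with functions and operators can look like, here is a toy sketch. It reads the composition above as `Hpsi = -0.5*Delsq*psi + V*psi`. It is not the MADNESS Python interface: the `Function` and `Laplacian` classes are hypothetical stand-ins that assume a uniform 1-D grid and central differences, with none of the adaptivity or precision guarantees of the real code.

```python
import math

class Function:
    """Toy 1-D function sampled on a uniform grid with spacing h (hypothetical stand-in)."""
    def __init__(self, values, h):
        self.values = list(values)
        self.h = h

    def __add__(self, other):
        return Function([a + b for a, b in zip(self.values, other.values)], self.h)

    def __mul__(self, other):
        if isinstance(other, Function):  # pointwise product, as in V*psi
            return Function([a * b for a, b in zip(self.values, other.values)], self.h)
        return Function([a * other for a in self.values], self.h)  # scalar multiple

    __rmul__ = __mul__


class Laplacian:
    """Toy operator: second derivative by central differences, zero at the two end points."""
    def __init__(self, scale=1.0):
        self.scale = scale

    def __rmul__(self, c):  # scalar * operator, as in -0.5*Delsq
        return Laplacian(self.scale * c)

    def __mul__(self, f):  # operator * function
        v, h = f.values, f.h
        out = [0.0] * len(v)
        for i in range(1, len(v) - 1):
            out[i] = self.scale * (v[i - 1] - 2.0 * v[i] + v[i + 1]) / (h * h)
        return Function(out, h)


h = 0.01
xs = [-5.0 + i * h for i in range(1001)]
psi = Function([math.exp(-0.5 * x * x) for x in xs], h)  # harmonic oscillator ground state
V = Function([0.5 * x * x for x in xs], h)
Delsq = Laplacian()

Hpsi = -0.5 * Delsq * psi + V * psi  # written as in the slide, no explicit indices

# For this psi and V the exact result is 0.5*psi, so the ratio should be close to 0.5.
mid = len(xs) // 2
print(Hpsi.values[mid] / psi.values[mid])
```

The loop over grid indices lives inside the operator, so the line that states the physics contains no index bookkeeping.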
11. New MADNESS solver
- Total rewrite in C++
- Three levels of parallelism targeting massively parallel computers using multi-processor nodes
- In anticipation of highly-threaded processors
- Ideally targets low latency AM+MPI+threads
- Portable implementation: polling+MPI+threads
- Core math functionality is now running
- 3D functions, real and complex (1-6D functions will be added this FY)
- Scaling demonstrated up to 4096 processors; designed for 100K.
12. 1-D Example: Sub-Tree Parallelism
[Figure: tree for a 1-D function, rows labeled 0 to 6]
Both sub-trees can be done in parallel. In 3-D, nodes split into 8 children; in 6-D there are 64 children.
13. Distributed-memory Cilk-like model
Parameter: MPI rank, probe(), set(), get()
Task: input parameters, output parameters, probe(), run()
Compress(tree, result)
    Parameter left, right
    if (tree.left) Compress(tree.left, left)
    if (tree.right) Compress(tree.right, right)
    AddTask(Op, left, right, result)
    WaitTasks()
Benefits
- Most receives pre-posted, greatly increasing scalability
- Communication latency and transfer time largely hidden
- Much simpler composition than explicit message passing
- Positions code to use intelligent runtimes with work stealing
- Positions code for efficient use of multi-core chips
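The Parameter/Task pattern above can be mimicked in a few lines. The sketch below is a single-process illustration only, not the MADNESS runtime, and it involves no MPI and no threads. `TaskQueue`, `Node`, the `sweeps` counter, and the use of addition in place of the real compression operator `Op` are all assumptions made for the example. A task becomes runnable once every input parameter has been set, so independent sub-trees become ready in the same sweep.

```python
class Parameter:
    """Write-once value that may not be available yet."""
    def __init__(self):
        self._ready = False
        self._value = None

    def probe(self):
        return self._ready

    def set(self, value):
        self._value = value
        self._ready = True

    def get(self):
        if not self._ready:
            raise RuntimeError("parameter not yet assigned")
        return self._value


class Task:
    """Runs op on its inputs and assigns the output once all inputs are ready."""
    def __init__(self, op, inputs, output):
        self.op, self.inputs, self.output = op, inputs, output

    def probe(self):
        return all(p.probe() for p in self.inputs)

    def run(self):
        self.output.set(self.op(*(p.get() for p in self.inputs)))


class TaskQueue:
    def __init__(self):
        self.pending = []
        self.sweeps = 0  # number of passes needed; tasks in one pass are independent

    def add_task(self, op, inputs, output):
        self.pending.append(Task(op, inputs, output))

    def wait_tasks(self):
        while self.pending:
            ready, waiting = [], []
            for t in self.pending:
                (ready if t.probe() else waiting).append(t)
            if not ready:
                raise RuntimeError("no runnable task: some input is never set")
            self.pending = waiting
            for t in ready:  # a real runtime could run these concurrently
                t.run()
            self.sweeps += 1


class Node:
    def __init__(self, value=None, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def compress(tree, result, queue):
    """Recursive task creation following the Compress pseudocode (Op is a stand-in sum)."""
    if tree.left is None and tree.right is None:
        result.set(tree.value)  # leaf data is available immediately
        return
    left, right = Parameter(), Parameter()
    for child, param in ((tree.left, left), (tree.right, right)):
        if child is not None:
            compress(child, param, queue)
        else:
            param.set(0.0)  # missing child contributes nothing in this toy
    queue.add_task(lambda a, b: a + b, [left, right], result)


tree = Node(left=Node(left=Node(1.0), right=Node(2.0)),
            right=Node(left=Node(3.0), right=Node(4.0)))
queue, result = TaskQueue(), Parameter()
compress(tree, result, queue)
queue.wait_tasks()
print(result.get(), queue.sweeps)
```

Tasks are only declared during the recursion. The order in which they run follows from which parameters are ready, not from explicit message passing.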
14. Essential techniques for fast computation
- Multiresolution
- Low-separation rank
- Low-operator rank
15. Separated representations
- Key to computing in higher dimensions
- Analogs of SVD exploit low operator rank
- Generalized form exploits other operator properties
- E.g., these all have full operator rank, but low-separation rank constructions exist
- Identity operator
- Green's functions of many PDEs (Poisson, Helmholtz)
- All-electron Schrödinger Hamiltonian
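One known way to build a low-separation-rank form of the Poisson kernel 1/r is to write it as a sum of Gaussians, each of which factors into a product of 1-D functions of x, y and z. The sketch below assumes the integral identity 1/r = (2/sqrt(pi)) * integral over s of exp(-r^2 e^(2s) + s) ds, discretized with the trapezoid rule. The integration limits, the step size and the target range 0.01 <= r <= 1 are illustrative choices, not values taken from the slides.

```python
import math

def gaussian_terms(s_lo=-20.0, s_hi=8.0, h=0.25):
    """Weights w and exponents t such that 1/r is approximately sum of w * exp(-t * r^2)."""
    n = int(round((s_hi - s_lo) / h)) + 1
    terms = []
    for k in range(n):
        s = s_lo + k * h
        w = 2.0 / math.sqrt(math.pi) * h * math.exp(s)
        if k in (0, n - 1):
            w *= 0.5  # trapezoid rule end points
        terms.append((w, math.exp(2.0 * s)))
    return terms

def coulomb_separated(x, y, z, terms):
    """Evaluate 1/|r| as a sum of products of 1-D Gaussians in x, y and z."""
    total = 0.0
    for w, t in terms:
        total += w * math.exp(-t * x * x) * math.exp(-t * y * y) * math.exp(-t * z * z)
    return total

terms = gaussian_terms()
x, y, z = 0.3, 0.4, 0.5
exact = 1.0 / math.sqrt(x * x + y * y + z * z)
approx = coulomb_separated(x, y, z, terms)
print(len(terms), abs(approx - exact) / exact)
```

The number of terms plays the role of the separation rank. It is set by the requested precision and the range of r, and it does not grow with the number of dimensions the Gaussians are factored over.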
16. [Figure: block structure of an operator in x and y, blocks labeled by |x-y|; r = separation rank]
In 3D, ideally must be one box removed from the diagonal. Diagonal box has full rank. Boxes touching the diagonal (face, edge, or corner) have increasingly low rank. Away from the diagonal, r = O(-log ε).
17. Molecular electronic Schrödinger equation
- A 3N-dimensional, non-separable, second-order differential equation
18. Dynamics of fundamental few electron systems (Krstic and Harrison)
- Electron-atom/molecule scattering; molecules in intense radiation field
- Challenges
- Scattering: highly oscillatory states
- Dissociation: continuum states
- Quantum treatment of light nuclei
- Rydberg states: very large volumes
- In principle, adaptive multiresolution techniques are ideal
- Single basis treats bound and continuum states on equal footing
- Long time steps possible via integral operator for time evolution
- Separated representations provide path to higher dimensions
- Waiting for new production code before we can apply the free-particle propagator efficiently for an implicit scheme (integral kernel is exp(-i x^2/(2t)))
- Need a more strongly band limited basis?
- Want to do this in at least 5-9D; 12D being considered
19. Independent particle models
- Atomic and molecular orbitals
- Each electron feels the mean field of all other electrons (self-consistent field, Hartree-Fock)
- Replaces linear 3N-D Schrödinger w. non-linear 3-D eigen-problem
- Provides the structure of the periodic table and the chemical bond
- Linear combination of atomic orbitals (LCAO)
- E.g., molecular orbitals for water, H2O
20. Density functional theory (DFT)
- Hohenberg-Kohn theorem
- The energy is a functional of the density (3D)
- Kohn-Sham
- Practical approach to DFT, parameterizing the density with orbitals (easier treatment of kinetic energy)
- Very similar computationally to Hartree-Fock, but potentially exact
21. Reduced scaling method
- Eigen-functions (canonical orbitals) can be delocalized
- Limits to O(VN) data and O(VN^2) compute
- Solve instead for localized orbitals that span the same space
- Limits to O(N ln V) data and compute
- Multiresolution representation makes this easy
- Remaining linear algebra has small pre-factor and is sparse
22. Current I/O Strategy
- Looked seriously at HDF and Phil's API
- Substantial effort for adoption; HDF perf. questions
- Substantial benefits from interoperability
- Short-term driver is checkpoint/restart
- Tunable subset of nodes doing I/O
- Currently nodes at a level in tree (in 3D: 1, 8, 64, ...)
- Collect data from other nodes
- Serialize to disk in either binary or text (XML)
- Already want interfaces to viz. tools
- Starting to consider interface to external solvers
- Sundance, PETSc, ...
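One reading of "nodes at a level in tree" is that every box sends its data to the box at a chosen I/O level that contains it. The sketch below illustrates that reading only. The `(n, l)` box labels (level and integer translations), the function names, and the rule for boxes coarser than the I/O level are assumptions, not the MADNESS implementation.

```python
from collections import defaultdict

def io_node_count(io_level, ndim=3):
    """Number of boxes at the I/O level: 1, 8, 64, ... in 3-D."""
    return (2 ** ndim) ** io_level

def io_owner(n, l, io_level):
    """Box at io_level responsible for writing box (n, l); l holds integer translations."""
    if n >= io_level:
        shift = n - io_level
        return io_level, tuple(li >> shift for li in l)  # ancestor at the I/O level
    shift = io_level - n
    return io_level, tuple(li << shift for li in l)      # assumed: lowest-index descendant

def group_for_io(boxes, io_level):
    """Collect boxes under the I/O box that would serialize them."""
    groups = defaultdict(list)
    for n, l in boxes:
        groups[io_owner(n, l, io_level)].append((n, l))
    return dict(groups)

boxes = [(5, (17, 3, 30)), (5, (16, 0, 31)), (4, (15, 15, 0)), (1, (1, 0, 1))]
groups = group_for_io(boxes, io_level=2)
for owner, members in sorted(groups.items()):
    print(owner, members)
```

Raising or lowering `io_level` is the tuning knob: a deeper level gives more writers, each collecting data from fewer boxes.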
23. Summary of MADNESS data
- Discontinuous spectral element
- Legendre polynomials, or
- Approximate prolate spheroidal functions
- Structured, deeply-refined, adaptive mesh
- In higher dimensions
- Separated representations in most elements
- Mix of data types
- Float, double, float-complex, double-complex
- 100s to 10Ks of distinct functions in 3D
- 10s of GB to 10s of TB of data
- Few functions in 6D
- 100s of GB to 10s of TB