Performance and Productivity: NWChem

1
Performance and Productivity: NWChem
  • Robert J. Harrison
  • Oak Ridge National Laboratory,
  • University of Tennessee

NWChem is funded by the U.S. Department of
Energy, Office of Science, Office of Biological
and Environmental Research, under contract
DE-AC06-76RLO 1830 with Battelle Memorial
Institute (Pacific Northwest National Laboratory,
PNNL) as part of the Environmental Molecular
Sciences Laboratory, PNNL.
2
NWChem Citation
T.H. Dunning, Jr., D.A. Dixon, M.F. Guest
  • R. J. Harrison, J. A. Nichols, T. P. Straatsma,
    M. Dupuis, E. J. Bylaska, G. I. Fann, T. L.
    Windus, E. Apra, W. de Jong, S. Hirata, M. T.
    Hackler, J. Anchell, D. Bernholdt, P.
    Borowski, T. Clark, D. Clerc, H. Dachsel, M.
    Deegan, K. Dyall, D. Elwood, H. Fruchtl, E.
    Glendening, M. Gutowski, K. Hirao, A. Hess, J.
    Jaffe, B. Johnson, J. Ju, R. Kendall, R.
    Kobayashi, R. Kutteh, Z. Lin, R. Littlefield, X.
    Long, B. Meng, T. Nakajima, J. Nieplocha, S. Niu,
    M. Rosing, G. Sandrone, M. Stave, H. Taylor, G.
    Thomas, J. van Lenthe, K. Wolinski, A. Wong, and
    Z. Zhang,
  • "NWChem, A Computational Chemistry Package for
    Parallel Computers, Version 4.1" (2002),
  • Pacific Northwest National Laboratory, Richland,
    Washington 99352-0999, USA.

3
NWChem Overview
  • Provides major new modeling and simulation
    capability for molecular science
  • Broad range of molecules, including biomolecules
  • Electronic structure of molecules
    (non-relativistic, relativistic,
    one-/two-component, ECPs, second deriv.)
  • Increasingly extensive solid state capability
  • Molecular dynamics, molecular mechanics
  • Extensible and long-lived
  • Freely distributed; installed at about 1000
    sites worldwide
  • Performance characteristics designed for MPP
  • Single node performance comparable to best serial
    codes
  • Scalability to 1000s of processors
  • Portable: runs on a wide range of computers

4
Molecular Science Software Suite (MS3)
http://www.emsl.pnl.gov/pub/docs/ecce/
http://www.emsl.pnl.gov/pub/docs/parsoft/
http://www.emsl.pnl.gov/pub/docs/nwchem/
5
(No Transcript)
6
(No Transcript)
7
NWChem to go ...
  • Compaq iPAQ
  • Linux (Intimate)
  • 64 Mbyte RAM,
  • 16 Mbyte flash,
  • 2 Gbyte
  • PCMCIA disk
  • Strongarm CPU
  • Sadly, no FPU

8
Higher-level composition
  • Modular, hierarchical design
  • Easy to access high level features
  • Easy to extend with new high level features
  • Standardized interfaces
  • Reuse of low-level functionality without side
    effects
  • Distributed-shared memory parallel programming
    model
  • Non-uniform memory access (NUMA) aware algorithms
  • Python interface
  • Users and developers can write NWChem programs in
    Python
  • Cool stuff now becoming available
  • Automatic code generation: compose with
    many-body theory and/or tensor expressions
    (already in NWChem 4.5)
  • Multiresolution quantum chemistry: compose with
    operators and functions

9
NWChem Architecture
  • Object-oriented design
  • abstraction, data hiding, APIs
  • Parallel programming model
  • non-uniform memory access, global arrays, MPI
  • Infrastructure
  • GA, Parallel I/O, RTDB, MA, ...
  • Program modules
  • communication only through the database
  • persistence for easy restart

10
Issues in Parallel Computing
  • Expressing and managing concurrency
  • The memory hierarchy
  • Efficient sequential execution

11
The Memory Hierarchy
  • Non-uniform memory access - NUMA
  • Your workstation is NUMA - registers, cache, main
    memory, virtual memory
  • Parallel computers just add non-local memory(s)
  • Unites sequential and parallel computation
  • Differ only in expression and management of
    concurrency
  • Distributed data
  • Do not limit calculation by resources of one node
  • Exploit aggregate resources of the whole machine
  • SCF and DFT can distribute all data > O(N)
  • MP2 gradients distribute all data > O(N^2)

12
Parallel Programming Model: Global Arrays and MPI
  • J. Nieplocha
  • Supported by DOE/ASCR/MICS
  • Shared-memory-like model
  • Fast local access
  • NUMA aware and easy to use
  • MIMD and data-parallel modes
  • Inter-operates with MPI,
  • BLAS and linear algebra interface
  • Used by most major chemistry codes, also in
    financial futures forecasting, astrophysics,
    computer graphics,
  • Ported to major parallel machines
  • IBM, Cray, SGI, clusters, ...

http://www.emsl.pnl.gov/pub/docs/global
13
Non-uniform memory access model of computation
[Figure: processes copy data from shared objects to local memory by 1-sided communication, compute/update in local memory, and copy results back to the shared objects.]
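
The copy-in, compute, copy-out cycle of this model can be sketched in a few lines. This is a toy, single-process stand-in and not the Global Arrays API: the class `SharedArray` and its `get`, `put`, and `acc` methods are hypothetical names chosen only for illustration.

```python
# Toy model of the NUMA programming pattern: copy a patch of a shared
# object to local memory, compute on it, then update the shared object.
# SharedArray is a hypothetical stand-in, not the Global Arrays API.

class SharedArray:
    def __init__(self, n):
        self._data = [0.0] * n          # pretend this storage is distributed

    def get(self, lo, hi):
        """One-sided read: copy elements lo..hi-1 into a local buffer."""
        return self._data[lo:hi]

    def put(self, lo, buf):
        """One-sided write: overwrite a patch with a local buffer."""
        self._data[lo:lo + len(buf)] = buf

    def acc(self, lo, buf):
        """One-sided accumulate: add a local buffer into a patch."""
        for offset, value in enumerate(buf):
            self._data[lo + offset] += value


def worker(shared, lo, hi):
    local = shared.get(lo, hi)               # copy to local memory
    local = [2.0 * x + 1.0 for x in local]   # compute/update locally
    shared.put(lo, local)                    # copy back to the shared object


shared = SharedArray(8)
shared.put(0, [float(i) for i in range(8)])
for lo in range(0, 8, 4):                    # two "processes", one patch each
    worker(shared, lo, lo + 4)
shared.acc(0, [0.5, 0.5])                    # accumulate into the first two elements
result = shared.get(0, 8)
```

No worker ever needs the cooperation of the owner of a patch, which is what distinguishes the one-sided style from matched send/receive pairs.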
14
O(1) programmers, O(1000) nodes, O(100,000)
processors, O(10,000,000) threads
  • Expressing/managing concurrency at the petascale
  • It is too trite to say that the parallelism is in
    the physics
  • Must express and discover parallelism at more
    levels
  • Low level tools (MPI, Co-Array Fortran, UPC, ...)
    don't discover parallelism or hide complexity or
    facilitate abstraction
  • Management of the memory hierarchy
  • Sending data from one multiprocessor chip to
    another will be like us taking a trip to Europe
  • Memory will be deeper; less uniformity between
    vendors
  • Need tools to automate and manage this, even at
    runtime

15
Synthesis of High Performance Algorithms for
Electronic Structure Calculations
  • Sadayappan, Baumgartner, Cociorva, Pitzer (OSU);
    Ramanujam (LSU); Bernholdt, Dean, White III,
    Harrison (ORNL); Hirata (PNNL); Nooijen
    (Waterloo)
  • Objective
  • Automate the implementation of optimized parallel
    computer programs for many-electron methods
    expressed as tensor contractions
  • Multi-disciplinary, multi-institution project
  • Collaboration between NSF ITR, DOE SciDAC, and
    ORNL LDRD projects

16
CCSD Doubles Equation
  • hbara,b,i,j sumfb,cti,j,a,c,c
    -sumfk,ctk,bti,j,a,c,k,c
    sumfa,cti,j,c,b,c -sumfk,ctk,ati
    ,j,c,b,k,c -sumfk,jti,k,a,b,k
    -sumfk,ctj,cti,k,a,b,k,c
    -sumfk,itj,k,b,a,k -sumfk,cti,ctj
    ,k,b,a,k,c sumti,ctj,dva,b,c,d,c,d
    sumti,j,c,dva,b,c,d,c,d
    sumtj,cva,b,i,c,c -sumtk,bva,k,i,j
    ,k sumti,cvb,a,j,c,c
    -sumtk,avb,k,j,i,k -sumtk,dti,j,c,b
    vk,a,c,d,k,c,d -sumti,ctj,k,b,dvk,a,
    c,d,k,c,d -sumtj,ctk,bvk,a,c,i,k,c
    2sumtj,k,b,cvk,a,c,i,k,c
    -sumtj,k,c,bvk,a,c,i,k,c
    -sumti,ctj,dtk,bvk,a,d,c,k,c,d
    2sumtk,dti,j,c,bvk,a,d,c,k,c,d
    -sumtk,bti,j,c,dvk,a,d,c,k,c,d
    -sumtj,dti,k,c,bvk,a,d,c,k,c,d
    2sumti,ctj,k,b,dvk,a,d,c,k,c,d
    -sumti,ctj,k,d,bvk,a,d,c,k,c,d
    -sumtj,k,b,cvk,a,i,c,k,c
    -sumti,ctk,bvk,a,j,c,k,c
    -sumti,k,c,bvk,a,j,c,k,c
    -sumti,ctj,dtk,avk,b,c,d,k,c,d
    -sumtk,dti,j,a,cvk,b,c,d,k,c,d
    -sumtk,ati,j,c,dvk,b,c,d,k,c,d
    2sumtj,dti,k,a,cvk,b,c,d,k,c,d
    -sumtj,dti,k,c,avk,b,c,d,k,c,d
    -sumti,ctj,k,d,avk,b,c,d,k,c,d
    -sumti,ctk,avk,b,c,j,k,c
    2sumti,k,a,cvk,b,c,j,k,c
    -sumti,k,c,avk,b,c,j,k,c
    2sumtk,dti,j,a,cvk,b,d,c,k,c,d
    -sumtj,dti,k,a,cvk,b,d,c,k,c,d
    -sumtj,ctk,avk,b,i,c,k,c
    -sumtj,k,c,avk,b,i,c,k,c
    -sumti,k,a,cvk,b,j,c,k,c
    sumti,ctj,dtk,atl,bvk,l,c,d,k,l,c
    ,d -2sumtk,btl,dti,j,a,cvk,l,c,d,k
    ,l,c,d -2sumtk,atl,dti,j,c,bvk,l,c,d
    ,k,l,c,d sumtk,atl,bti,j,c,dvk,l,c
    ,d,k,l,c,d -2sumtj,ctl,dti,k,a,bvk
    ,l,c,d,k,l,c,d -2sumtj,dtl,bti,k,a,c
    vk,l,c,d,k,l,c,d sumtj,dtl,bti,k,c,
    avk,l,c,d,k,l,c,d -2sumti,ctl,dtj,
    k,b,avk,l,c,d,k,l,c,d sumti,ctl,at
    j,k,b,dvk,l,c,d,k,l,c,d sumti,ctl,b
    tj,k,d,avk,l,c,d,k,l,c,d
    sumti,k,c,dtj,l,b,avk,l,c,d,k,l,c,d
    4sumti,k,a,ctj,l,b,dvk,l,c,d,k,l,c,d
    -2sumti,k,c,atj,l,b,dvk,l,c,d,k,l,c,d
    -2sumti,k,a,btj,l,c,dvk,l,c,d,k,l,c,d
    -2sumti,k,a,ctj,l,d,bvk,l,c,d,k,l,c,
    d sumti,k,c,atj,l,d,bvk,l,c,d,k,l,c,d
    sumti,ctj,dtk,l,a,bvk,l,c,d,k,l,c
    ,d sumti,j,c,dtk,l,a,bvk,l,c,d,k,l,c,
    d -2sumti,j,c,btk,l,a,dvk,l,c,d,k,l,c
    ,d -2sumti,j,a,ctk,l,b,dvk,l,c,d,k,l,
    c,d sumtj,ctk,btl,avk,l,c,i,k,l,c
    sumtl,ctj,k,b,avk,l,c,i,k,l,c
    -2sumtl,atj,k,b,cvk,l,c,i,k,l,c
    sumtl,atj,k,c,bvk,l,c,i,k,l,c
    -2sumtk,ctj,l,b,avk,l,c,i,k,l,c
    sumtk,atj,l,b,cvk,l,c,i,k,l,c
    sumtk,btj,l,c,avk,l,c,i,k,l,c
    sumtj,ctl,k,a,bvk,l,c,i,k,l,c
    sumti,ctk,atl,bvk,l,c,j,k,l,c
    sumtl,cti,k,a,bvk,l,c,j,k,l,c
    -2sumtl,bti,k,a,cvk,l,c,j,k,l,c
    sumtl,bti,k,c,avk,l,c,j,k,l,c
    sumti,ctk,l,a,bvk,l,c,j,k,l,c
    sumtj,ctl,dti,k,a,bvk,l,d,c,k,l,c,d
    sumtj,dtl,bti,k,a,cvk,l,d,c,k,l,c,
    d sumtj,dtl,ati,k,c,bvk,l,d,c,k,l,
    c,d -2sumti,k,c,dtj,l,b,avk,l,d,c,k,l
    ,c,d -2sumti,k,a,ctj,l,b,dvk,l,d,c,k,
    l,c,d sumti,k,c,atj,l,b,dvk,l,d,c,k,l
    ,c,d sumti,k,a,btj,l,c,dvk,l,d,c,k,l,
    c,d sumti,k,c,btj,l,d,avk,l,d,c,k,l,c
    ,d sumti,k,a,ctj,l,d,bvk,l,d,c,k,l,c,
    d sumtk,atl,bvk,l,i,j,k,l
    sumtk,l,a,bvk,l,i,j,k,l
    sumtk,btl,dti,j,a,cvl,k,c,d,k,l,c,d
    sumtk,atl,dti,j,c,bvl,k,c,d,k,l,c,
    d sumti,ctl,dtj,k,b,avl,k,c,d,k,l,
    c,d -2sumti,ctl,atj,k,b,dvl,k,c,d,
    k,l,c,d sumti,ctl,atj,k,d,bvl,k,c,d
    ,k,l,c,d sumti,j,c,btk,l,a,dvl,k,c,d,
    k,l,c,d sumti,j,a,ctk,l,b,dvl,k,c,d,
    k,l,c,d -2sumtl,cti,k,a,bvl,k,c,j,k,l
    ,c sumtl,bti,k,a,cvl,k,c,j,k,l,c
    sumtl,ati,k,c,bvl,k,c,j,k,l,c
    va,b,i,j

17
TCE Components
[Flowchart: Tensor Expressions -> Algebraic Transformations -> Sequence of Matrix Products, Element-wise Matrix Operations, Element-wise Function Eval.]
  • Algebraic Transformations
  • Minimize operation count
  • Memory Minimization
  • Reduce intermediate storage
  • Space-Time Transformation
  • Trade storage for recomputation
  • Storage Management and Data Locality Optimization
  • Optimize use of storage hierarchy
  • Data Distribution and Partitioning
  • Optimize parallel layout
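
As a rough illustration of "minimize operation count", the sketch below evaluates one contraction of the kind found in the CCSD doubles equation, X[a,b,i,j] = sum over c,d of t[i,c] t[j,d] v[a,b,c,d], in two ways: directly, and through a stored intermediate. It is a toy under stated assumptions, not TCE output: the sizes `no` and `nv`, the random data, and the dictionary storage are all chosen only to keep the example small and self-contained.

```python
import itertools
import random

random.seed(0)
no, nv = 2, 3   # toy numbers of occupied (i, j) and virtual (a, b, c, d) orbitals
O, V = range(no), range(nv)

t1 = {(i, c): random.random() for i in O for c in V}
v = {idx: random.random() for idx in itertools.product(V, repeat=4)}

# Direct evaluation: one six-deep loop, two multiplications per step.
direct, direct_mults = {}, 0
for a, b, i, j in itertools.product(V, V, O, O):
    total = 0.0
    for c, d in itertools.product(V, V):
        total += t1[i, c] * t1[j, d] * v[a, b, c, d]
        direct_mults += 2
    direct[a, b, i, j] = total

# Factored evaluation: contract over c first, store the intermediate,
# then contract over d. This trades storage for fewer operations.
inter, factored_mults = {}, 0
for a, b, i, d in itertools.product(V, V, O, V):
    total = 0.0
    for c in V:
        total += t1[i, c] * v[a, b, c, d]
        factored_mults += 1
    inter[a, b, i, d] = total

factored = {}
for a, b, i, j in itertools.product(V, V, O, O):
    total = 0.0
    for d in V:
        total += t1[j, d] * inter[a, b, i, d]
        factored_mults += 1
    factored[a, b, i, j] = total

max_diff = max(abs(direct[key] - factored[key]) for key in direct)
```

The intermediate costs extra memory, which is why the operation-count step is followed by memory minimization and space-time trade-offs in the list above.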

[Flowchart, continued: Memory Minimization (input: System Memory Specification) -> Space-Time Trade-Offs -> Storage and Data Locality Management -> Data Distribution and Partitioning (input: Performance Model) -> Parallel Code (Fortran/C/OpenMP/MPI/Global Arrays). Branch labels: "No soln fits disk", "Soln fits disk, not mem.", "Soln fits mem."]
18
Multiresolution Quantum Chemistry
Robert J. Harrison, George I. Fann, Takeshi Yanai, Zhengting Gan
Oak Ridge National Laboratory and University of Tennessee, Knoxville
and
Gregory Beylkin
University of Colorado
harrisonrj_at_ornl.gov
19
The funding
  • This work is funded by the U.S. Department of
    Energy, the division of Basic Energy Science,
    Office of Science, under contract
    DE-AC05-00OR22725 with Oak Ridge National
    Laboratory. This research was performed in part
    using
  • the Molecular Science Computing Facility in the
    Environmental Molecular Sciences Laboratory at
    the Pacific Northwest National Laboratory under
    contract DE-AC06-76RLO 1830 with Battelle
    Memorial Institute,
  • resources of the National Energy Scientific
    Computing Center which is supported by the Office
    of Energy Research of the U.S. Department of
    Energy under contract DE-AC03-76SF0098,
  • and the Center for Computational Sciences at Oak
    Ridge National Laboratory under contract
    DE-AC05-00OR22725 .
  • ORNL LDRD

20
Outline
  • Brief introduction to methodology
  • Practical computation in higher dimensions
  • Separated form for operators
  • Analytic derivatives
  • Initial results
  • Accuracy, timing and scaling
  • MP2
  • Path to basis set limit results?

21
Objectives
  • Complete elimination of the basis error
  • One-electron models (e.g., HF, DFT)
  • Pair models (e.g., MP2, CCSD, ...)
  • Correct scaling of cost with system size
  • General approach
  • Readily accessible by students and researchers
  • Much smaller computer code than Gaussians
  • No two-electron integrals: replaced by fast
    application of integral operators
  • Fast algorithms with guaranteed precision

22
References
  • The (multi)wavelet methods in this work are
    primarily based upon
  • Alpert, Beylkin, Grimes, Vozovoi (J. Comp. Phys.,
    in press)
  • B. Alpert (SIAM Journal on Mathematical Analysis
    24, 246-262, 1993).
  • Beylkin, Coifman, Rokhlin (Communications on Pure
    and Applied Mathematics, 44, 141-183, 1991.)
  • The following are useful further reading
  • Daubechies, Ten lectures on wavelets
  • Walnut, An introduction to wavelets
  • Meyer, Wavelets, algorithms and applications
  • Burrus et al, Wavelets and Wavelet transforms

23
Linear Combination of Atomic Orbitals (LCAO)
  • Molecules are composed of (weakly) perturbed
    atoms
  • Use finite set of atomic wave functions as the
    basis
  • Hydrogen-like wave functions are exponentials
  • E.g., hydrogen molecule (H2)
  • Smooth function of molecular geometry
  • MOs: cusp at nucleus with exponential decay

24
LCAO
  • A fantastic success, but
  • Basis functions have extended support
  • causes great inefficiency in high accuracy
    calculations
  • origin of non-physical density matrix
  • Basis set superposition error (BSSE)
  • incomplete basis on each center leads to
    over-binding as atoms are brought together
  • Linear dependence problems
  • accurate calculations require balanced approach
    to a complete basis on every atom
  • Must extrapolate to complete basis limit
  • unsatisfactory and not feasible for large systems

25
Why think multiresolution?
  • It is everywhere in nature/chemistry/physics
  • Core/valence, high/low frequency, short/long
    range, smooth/non-smooth, atomic/nano/micro/macro
    scale
  • Common to separate just two scales
  • E.g., core orbital heavily contracted, valence
    flexible
  • More efficient, compact, and numerically stable
  • Multiresolution
  • Recursively separate all length/time scales
  • Computationally efficient and numerically stable
  • Coarse-scale models that capture fine-scale detail

26
How to think multiresolution
  • Consider a ladder of function spaces
  • E.g., increasing quality atomic basis sets, or
    finer resolution grids,
  • Telescoping series
  • Instead of using the most accurate
    representation, use the difference between
    successive approximations
  • Representation on V0 small/dense; differences
    sparse
  • Computationally efficient; possible insights
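
The telescoping idea can be tried out with the simplest possible ladder of spaces: piecewise-constant box averages on 2^n pieces of [0, 1]. This is a minimal sketch under stated assumptions (a made-up test function with one sharp feature, midpoint-rule averages, and a hypothetical threshold of 1e-3), not the multiwavelet code described in these slides.

```python
import math

def averages(f, n, samples=64):
    """Box averages of f on the 2**n equal pieces of [0, 1] (midpoint rule)."""
    pieces = 2 ** n
    h = 1.0 / pieces
    out = []
    for l in range(pieces):
        xs = (l * h + (m + 0.5) * h / samples for m in range(samples))
        out.append(sum(f(x) for x in xs) / samples)
    return out

def refine(coarse):
    """Express a level-n piecewise-constant function on level n+1."""
    return [c for c in coarse for _ in range(2)]

def f(x):
    return math.exp(-500.0 * (x - 0.3) ** 2)   # one sharp feature at x = 0.3

N = 6
levels = [averages(f, n) for n in range(N + 1)]

# Differences between successive approximations, stored on the finer level.
diffs = [[fine - c for fine, c in zip(levels[n + 1], refine(levels[n]))]
         for n in range(N)]

# Telescoping sum: coarsest representation plus all the differences.
rebuilt = levels[0]
for d in diffs:
    rebuilt = [r + x for r, x in zip(refine(rebuilt), d)]

max_err = max(abs(a - b) for a, b in zip(rebuilt, levels[N]))
# Count the finest-level differences that are negligible (away from the feature).
small = sum(1 for x in diffs[-1] if abs(x) < 1e-3)
```

The coarse representation is small and dense, while most of the fine-level differences fall below the threshold and could be dropped, which is the sparsity the slide refers to.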

27
Scaling Function Basis
  • Divide domain into 2^n pieces (level n)
  • Adaptive sub-division (local refinement)
  • l-th sub-interval [l 2^-n, (l+1) 2^-n], l = 0, ..., 2^n - 1
  • In each sub-interval define a polynomial basis
  • First k Legendre polynomials
  • Orthonormal, disjoint support
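
A short numerical check of these properties, assuming the usual normalization phi_i(x) = sqrt(2i+1) P_i(2x-1) on [0, 1] and phi_il^n(x) = 2^(n/2) phi_i(2^n x - l) on the l-th box of level n. The function names and the midpoint-rule quadrature are choices made for this sketch only.

```python
import math

def legendre(i, t):
    """Legendre polynomial P_i(t) by the three-term recurrence."""
    p_prev, p = 1.0, t
    if i == 0:
        return p_prev
    for m in range(1, i):
        p_prev, p = p, ((2 * m + 1) * t * p - m * p_prev) / (m + 1)
    return p

def phi(i, x):
    """Scaling function i on [0, 1]: normalized, shifted Legendre polynomial."""
    if not 0.0 <= x < 1.0:
        return 0.0                      # disjoint support: zero outside the box
    return math.sqrt(2 * i + 1) * legendre(i, 2.0 * x - 1.0)

def phi_nl(i, n, l, x):
    """Scaling function i on the l-th sub-interval of level n."""
    return 2.0 ** (n / 2) * phi(i, 2.0 ** n * x - l)

def overlap(i, l, j, m, n, points=4000):
    """Midpoint-rule estimate of the overlap integral over [0, 1]."""
    h = 1.0 / points
    return h * sum(phi_nl(i, n, l, (q + 0.5) * h) * phi_nl(j, n, m, (q + 0.5) * h)
                   for q in range(points))

k, n = 3, 2
# Overlaps of the first k functions within box l = 1 of level n.
same_box = [[overlap(i, 1, j, 1, n) for j in range(k)] for i in range(k)]
# Overlap of functions living in two different boxes.
other_box = overlap(0, 1, 0, 2, n)
```

Within one box the overlap matrix is close to the identity (up to quadrature error), and functions in different boxes do not overlap at all.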

28
Scaling Function Basis - II
[Figure: plots of the scaling functions for i = 0, 1, 2, 3]
29
Multiwavelet Basis
  • Space of polynomials on level n is Vn
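  • Wavelets - an orthonormal basis to span Wn, the
    orthogonal complement of Vn in V(n+1)
  • Currently use Alpert's basis
  • Vanishing moments
  • Critically important property
  • Since Wn is orthogonal to Vn the first k moments
    of functions in Wn vanish, i.e., the integral of
    x^j psi(x) dx is zero for j = 0, ..., k-1 and any
    psi in Wn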
  • Sparse representations of many physically
    important kernels

30
Some Consequences of Vanishing Moments
  • Compact representation of smooth functions
  • Consider the Taylor series: the first k terms
    vanish, and smoothness implies the higher-order
    terms are small
  • Compact representation of integral operators
  • E.g., 1/|r-s|
  • Consider double Taylor series or multipole
    expansion
  • Interaction between wavelets decays as r^(-2k-1)
  • Derivatives at origin vanish in Fourier space
  • Diminishes effect of singularities at that point

31
  • Slice through the grid used to represent the nuclear
    potential for H2 using k=7 to a precision of
    10^-5.
  • Automatically adapts: it does not know a priori
    where the nuclei are.
  • Nuclei at dyadic points on level 5; refinement
    stops at level 8
  • If the nuclei were at non-dyadic points, refinement
    continues (to level ??) but the precision is
    still guaranteed.
  • In future will unevenly subdivide boxes to force
    nuclei to dyadic points.

32
(No Transcript)
33
Integral Formulation
  • E.g., used by Kalos, 1962

34
Integral operators in 3D
  • Non-standard matrix elements easy to evaluate
    from compressed form of kernel K(x)
  • Application in 1-d is fairly efficient
  • O(Nbox k^2) operations
  • In 3-d seems to need O(Nbox k^6) operations
  • Prohibitively expensive
  • Separated form
  • Beylkin, Cramer, Mohlenkamp, Monzon
  • O(Nbox k^4) or better in 3D

35
Low Separation Rank Representation
  • Many functions/operators have short expansions
  • Different from low operator rank
  • E.g., identity has full operator rank, but unit
    separation rank.

36
Separated form for integral operators
  • Approach in current prototype code
  • Represent the kernel over a finite range as a sum
    of Gaussians
  • Only need compute 1D transition matrices (X,Y,Z)
  • SVD the 1-D operators (low rank away from
    singularity)
  • Apply most efficient choice of low/full rank 1-D
    operator
  • Even better algorithms not yet implemented
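
Why the separated form pays off can be seen on a single box: if the 3-D operator is a short sum of products of 1-D matrices, it can be applied one dimension at a time. The sketch below compares that with explicitly forming every 3-D kernel element. It uses random 1-D matrices and a toy order k = 3 and separation rank M = 2, all of which are assumptions made for illustration; it is not the prototype code the slide describes.

```python
import itertools
import random

random.seed(1)
k, M = 3, 2      # toy polynomial order and separation rank
R = range(k)

def rand_matrix():
    return [[random.uniform(-1, 1) for _ in R] for _ in R]

# Operator = sum over M terms of X (x) Y (x) Z, each a k-by-k 1-D matrix.
terms = [(rand_matrix(), rand_matrix(), rand_matrix()) for _ in range(M)]
f = [[[random.uniform(-1, 1) for _ in R] for _ in R] for _ in R]

def apply_direct(terms, f):
    """Form each 3-D kernel element explicitly: O(M k^6) work."""
    g = [[[0.0] * k for _ in R] for _ in R]
    for X, Y, Z in terms:
        for i, j, l, p, q, r in itertools.product(R, repeat=6):
            g[i][j][l] += X[i][p] * Y[j][q] * Z[l][r] * f[p][q][r]
    return g

def apply_separated(terms, f):
    """Apply one dimension at a time: O(M k^4) work."""
    g = [[[0.0] * k for _ in R] for _ in R]
    for X, Y, Z in terms:
        # a[i][q][r] = sum_p X[i][p] f[p][q][r]
        a = [[[sum(X[i][p] * f[p][q][r] for p in R) for r in R] for q in R] for i in R]
        # b[i][j][r] = sum_q Y[j][q] a[i][q][r]
        b = [[[sum(Y[j][q] * a[i][q][r] for q in R) for r in R] for j in R] for i in R]
        for i, j, l in itertools.product(R, repeat=3):
            g[i][j][l] += sum(Z[l][r] * b[i][j][r] for r in R)
    return g

g1 = apply_direct(terms, f)
g2 = apply_separated(terms, f)
max_diff = max(abs(g1[i][j][l] - g2[i][j][l])
               for i, j, l in itertools.product(R, repeat=3))
```

Both routes give the same result; only the separated one scales as k^4 per term, which matches the O(Nbox k^4) estimate quoted earlier.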

37
Accurate Quadratures
  • Trapezoidal quadrature
  • Geometric precision for periodic functions with
    sufficient smoothness.

The kernel for x = 1e-4, 1e-3, 1e-2, 1e-1, 1e0. The
curve for x = 1e-4 is the rightmost.
38
Automatically generated representations
of exp(-30r)/r accurate to 1e-10, 1e-8, 1e-6,
1e-4, and 1e-2 (measured by the weighted error
r(exp(-30r)/r - fit(r))) for r in [1e-8, 1] were
formed with 92, 74, 57, 39 and 21 terms,
respectively. Note the logarithmic dependence
upon precision.
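
A minimal sketch of how trapezoidal quadrature can turn a kernel into a sum of Gaussians. It swaps in the simpler kernel 1/r rather than exp(-30r)/r, and uses the identity 1/r = (2/sqrt(pi)) * integral over s of exp(-r^2 e^(2s) + s). The integration limits, the step h, and the range of r are assumptions picked for this example; they are not the optimized fits quoted above.

```python
import math

def gaussian_fit_1_over_r(s_lo=-20.0, s_hi=10.0, h=0.25):
    """Return (coefficient, exponent) pairs so that
    1/r ~= sum(c * exp(-a * r**2)), from the trapezoidal rule applied to
    1/r = (2/sqrt(pi)) * integral exp(-r**2 * exp(2s) + s) ds."""
    n = int(round((s_hi - s_lo) / h))
    terms = []
    for m in range(n + 1):
        s = s_lo + m * h
        weight = h * (0.5 if m in (0, n) else 1.0)   # trapezoidal weights
        terms.append((2.0 / math.sqrt(math.pi) * weight * math.exp(s),
                      math.exp(2.0 * s)))
    return terms

def evaluate(terms, r):
    return sum(c * math.exp(-a * r * r) for c, a in terms)

terms = gaussian_fit_1_over_r()
radii = [10.0 ** (-3.0 + 0.1 * m) for m in range(31)]       # 1e-3 .. 1
# Weighted error r * |1/r - fit(r)|, the same kind of measure quoted above.
worst = max(r * abs(1.0 / r - evaluate(terms, r)) for r in radii)
```

Because the integrand is smooth and decays at both ends, halving h sharply reduces the error while only doubling the number of Gaussians, which is consistent with the logarithmic dependence on precision noted above.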
39
Smoothed Nuclear Potential
  • u(r/c)/c shifts error to r < c
  • e = 0.00435 Z^5 c^3
  • <V> accurate
  • <T> main source of error

40
Translational Invariance
  • Dyadic
  • 10^-3 -75.9139
  • 10^-5 -75.913564
  • 10^-7 -75.91355634
  • Non-dyadic
  • -75.9139
  • -75.913564
  • -75.91355635
  • Uncontracted aug-cc-pVQZ -75.913002
  • Solving with e = 1e-3, 1e-5, 1e-7 (k = 7, 9, 11)
  • Demonstrates translation invariance and that
    forcing to dyadic points is only an optimization
    and does not change the obtained precision.
  • Average orbital sizes 1.6Mb, 8Mb, 56Mb

41
Analytic Derivatives
  • Hellmann-Feynman theorem applies

42
N2 Hartree-Fock R = 2.0 a.u.
  • Basis Grad.Err. EnergyErr.
  • cc-pVDZ 5e-2 4e-2
  • aug-cc-pVDZ 5e-2 4e-2
  • cc-pVTZ 7e-3 1e-2
  • aug-cc-pVTZ 6e-3 9e-3
  • cc-pVQZ 8e-4 2e-3
  • aug-cc-pVQZ 9e-4 2e-3
  • cc-pV5Z 1e-4 4e-4
  • aug-cc-pV5Z 2e-5 2e-4
  • k=5 6e-3 1e-2
  • k=7 4e-5 2e-5
  • k=9 3e-7 -2e-7
  • k=11 0.0 0.0
  • 0.026839623 -108.9964232

43
Sources of error in the gradient
  • Partially converged orbitals
  • Same as for conventional methods
  • Smoothed potential
  • Numerical errors in the density/potential
  • Higher-order convergence except where the
    functions are not sufficiently smooth
  • Inadequate refinement (clearly adequate for the
    energy, but not necessarily for other properties)
  • Exacerbated by nuclei at non-dyadic points
  • Gradient measures loss of spherical symmetry
    around the nucleus the large value of the
    derivative potential amplifies small errors

44
Dependence on potential smoothing parameter (c)
Absolute errors of derivatives for diatomics with the nuclei at dyadic points.
For energy accuracy of 1e-6: H 0.039, Li 0.0062, B 0.0026, N 0.0015, O 0.0012, F 0.00099
45
Dependence on potential smoothing parameter (c)
Absolute errors of derivatives for diatomics with the nuclei at non-dyadic points.
For energy accuracy of 1e-6: H 0.039, Li 0.0062, B 0.0026, N 0.0015, O 0.0012, F 0.00099
46
Comparison with NUMOL and aug-cc-pVTZ
  • H2, Li2, LiH, CO, N2, Be2, HF, BH, F2, P2, BH3,
    CH2, CH4, C2H2, C2H4, C2H6, NH3, H2O, CO2, H2CO,
    SiH4, SiO, PH3, HCP
  • NUMOL, Dickson and Becke, JCP 99 (1993) 3898
  • Dyadic points (0.001a.u.) Newton correction
  • Agrees with NUMOL to available precision
  • LDA (k=7, 0.002; k=9, 0.0006)
  • k=9 vs. aug-cc-pVTZ rms error
  • Hartree-Fock 0.004 a.u. (0.019 SiO)
  • LDA 0.003 a.u. (0.018 SiO)

47
High-precision Hartree-Fock geometry for water
  • Pahl and Handy, Mol. Phys. 100 (2002) 3199
  • Plane waves and polynomials for the core
  • Finite box (L=18) requires extrapolation
  • Estimated error 3 microhartree, 1e-5 Angstrom
  • k=11, conv. tol. = 1e-8, e = 1e-9, L = 40
  • Max. gradient 3e-8, RMS step = 5e-8
  • Difference to Pahl: 10 microhartree, 4e-6 Angstrom,
    0.0012 degrees
  • Basis OH HOH Energy
  • k=11 0.939594 106.3375 -76.06818006
  • Pahl 0.939598 106.3387 -76.068170
  • cc-pVQZ 0.93980 106.329 -76.066676

48
Energy Timing
  • Water LDA with energy error of 1e-5
  • Initial prototype code with lots of Python
    overhead
  • 450s on 2.4 GHz Pentium IV processor
  • Current version (revised tensor class, integral
    operators)
  • 96s on 2.4 GHz Pentium IV processor
  • Predicted future performance
  • < 30s with known algorithmic improvements
  • faster still with better representations of the
    separated operators, alternative basis sets,
    improved iterative solution

49
Asymptotic Scaling
  • Current implementation
  • Based upon canonical orbitals: O(N) to O(N^2)
    currently dominant (+ O(N^3) linear algebra)
  • Density matrix/spectral projector
  • Well established: O(Natom log^m(e)) to any finite
    precision (Goedecker, Beylkin, ...)
  • This is not possible with conventional AO
    Gaussians
  • Need separated representation for efficiency
  • Gradient
  • each dV/dx requires O(-log(e) log(vol.)) terms
  • All gradients evaluated in
    O(-Natom log(e) log(vol.))

50
Water dimer LDA, aug-cc-pVTZ geometry, kcal/mol.
51
Benzene dimer LDA, aug-cc-pVDZ geometry, kcal/mol.
52
Benzene dimer timings (Sequential Pentium IV 2.4
GHz)
53
Benzene monomer, dimer and trimer
  • (aug-cc-pVDZ LDA geometry)
  • Dimer binding energy -0.96 kcal/mol.
  • Trimer -1.67 kcal/mol.
  • Single processor times for k=9 energy (energy
    accurate to about 1e-6).
  • Monomer 56 minutes
  • Dimer 200 (3.6x = 2^1.84)
  • Trimer 457 (2.3x = (3/2)^2.05)
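
The exponents quoted in parentheses can be reproduced from the timings alone, assuming time grows as (system size)^p between two sizes. The helper name `exponent` is made up for this sketch; the times are the ones listed in the bullets.

```python
import math

# Single-processor times in minutes: number of benzene molecules -> minutes.
times = {1: 56.0, 2: 200.0, 3: 457.0}

def exponent(n_small, n_large):
    """Effective p in time ~ size**p between two system sizes."""
    return (math.log(times[n_large] / times[n_small])
            / math.log(n_large / n_small))

p_dimer = exponent(1, 2)     # monomer -> dimer
p_trimer = exponent(2, 3)    # dimer -> trimer
```

Using the unrounded ratio 457/200 gives a trimer exponent slightly below the slide's 2.05, which was evidently computed from the rounded factor 2.3. Either way the effective scaling over this range is close to quadratic.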

54
Also working
  • Takeshi Yanai
  • Analytic derivatives
  • Fast (O(N)) Hartree-Fock exchange
  • TDDFT within Tamm-Dancoff approximation
  • GGA
  • Abelian point group symmetry (D2h subgroups)
  • Thanks also to
  • So Hirata for guidance with TDDFT
  • Edo Apra for insights into DFT

55
(No Transcript)
56
Putting it all together: A path to O(N) exact
MP2
  • HF provably O(N) to arbitrary finite precision
  • Based upon the density matrix
  • Localized orbitals also possible (Bernholc)
  • Need an MP2 scheme based upon density matrices

57
The Resolvent has low separation rank
  • Already known: Almlöf Laplace factorization
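
The Laplace factorization rests on 1/x = integral from 0 to infinity of exp(-x t) dt: once the integral is replaced by a short quadrature, the MP2 energy denominator 1/(ea + eb - ei - ej) becomes a sum of products of one-index factors. The sketch below is a toy under stated assumptions: the orbital energies are invented, and the quadrature is a plain trapezoidal rule in the variable s = log(t) rather than an optimized one.

```python
import math

def laplace_points(s_lo=-26.0, s_hi=6.0, h=0.5):
    """Quadrature points t and weights w with 1/x ~= sum(w * exp(-x * t)),
    from the trapezoidal rule applied to 1/x = integral exp(-x*e^s + s) ds."""
    n = int(round((s_hi - s_lo) / h))
    return [(math.exp(s_lo + m * h),
             h * (0.5 if m in (0, n) else 1.0) * math.exp(s_lo + m * h))
            for m in range(n + 1)]

# Hypothetical orbital energies (hartree): two occupied, two virtual.
e_occ = [-1.2, -0.6]
e_virt = [0.3, 1.5]
points = laplace_points()

worst = 0.0
for ei in e_occ:
    for ej in e_occ:
        for ea in e_virt:
            for eb in e_virt:
                exact = 1.0 / (ea + eb - ei - ej)
                # Each quadrature term is a product of one-index factors.
                approx = sum(w * math.exp(-ea * t) * math.exp(-eb * t)
                             * math.exp(ei * t) * math.exp(ej * t)
                             for t, w in points)
                worst = max(worst, abs(approx - exact) / exact)
```

Each term of the sum factorizes over the four orbital indices, so the number of quadrature points plays the role of the separation rank.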

58
Density matrix form of MP2
59
Summary
  • Multiresolution provides a general framework for
    computational chemistry
  • Accurate and efficient with a very small code
  • Multiwavelets provide high-order convergence and
    accommodate singularities
  • Familiar orthonormal basis (Legendre polynomials)
  • Compression and reconstruction (c.f., FFT)
  • Fast integral operators (c.f., FMM)
  • Separated form for operators and functions
  • Critical for efficient computation in higher
    dimension
  • Expect speed competitive to Gaussians in near
    future
  • Optimal separated forms for kernels, multi-scale
    non-linear solver, better implementation
  • Real impact will be application to many-body
    models