Title: HIGH PERFORMANCE ELECTRONIC STRUCTURE THEORY
- Mark S. Gordon, Klaus Ruedenberg 
- Ames Laboratory 
- Iowa State University
OUTLINE
- Methods and Strategies 
- Correlated electronic structure methods 
- Distributed Data Interface (DDI) 
- Approaches to efficient HPC in chemistry 
- Scalability with examples
CORRELATED ELECTRONIC STRUCTURE METHODS
- Well-correlated methods needed for:
- Accurate relative energies, dynamics
- Treatment of excited states, photochemistry
- Structures of diradicals, complex species
- Computationally demanding, so scalability is important
- HF is often a reasonable starting point for ground states with small diradical character
- Single-reference perturbation theory
- MP2/MBPT(2): scales as N^5
- Size-consistent
- Higher-order MBPT methods often perform worse
SINGLE REFERENCE COUPLED CLUSTER METHODS
- Cluster expansion is more robust
- Can sum all terms in the expansion
- Size-consistent
- State-of-the-art single-reference method
- CCSD, CCSDT, CCSDTQ, ...
- CCSD(T), CR-CCSD(T): an efficient compromise
- Scales as N^7 (see the cost sketch below)
- Methods often fail for bond breaking: consider N2
- Breaking 3 bonds: 1 σ + 2 π
- Minimal active space: (6,6)
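As a rough, generic illustration of what these formal scalings mean (arbitrary prefactors, not GAMESS timings), a short Python sketch compares how the cost of an N^5 method (MP2) and an N^7 method (CCSD(T)) grows as the basis set size increases:

```python
# Illustration only: formal operation-count scaling with basis size N.
# Prefactors are arbitrary, so only the cost ratios are meaningful.
def relative_cost(n_basis, power, n_ref=100):
    return (float(n_basis) / n_ref) ** power

for n in (100, 200, 400):
    mp2 = relative_cost(n, 5)      # N^5 method
    ccsd_t = relative_cost(n, 7)   # N^7 method
    print(f"N = {n:3d}:  MP2 cost x{mp2:7.0f}   CCSD(T) cost x{ccsd_t:9.0f}")
```

Doubling the basis raises the MP2 cost by about 32x and the CCSD(T) cost by about 128x, which is why scalability matters so much for these methods.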
MCSCF METHODS
- Single-configuration methods can fail for:
- Species with significant diradical character
- Bond-breaking processes
- Excited electronic states (often)
- Unsaturated transition metal complexes
- Then an MCSCF-based method is necessary
- Most common approach:
- Complete active space SCF (CASSCF/FORS)
- Active space = the orbitals and electrons involved in the process
- Full CI within the active space; optimize orbitals and CI coefficients
- Size-consistent
MULTI-REFERENCE METHODS
- Multi-reference methods, based on MCSCF
- Second-order perturbation theory (MRPT2)
- Relatively computationally efficient
- Size consistency depends on the implementation
- Multi-reference configuration interaction (MRCI)
- Very accurate, very time-consuming
- Highly resource-demanding
- Most common is MR(SD)CI
- Generally limited to a (14,14) active space
- Not size-consistent
- How to improve efficiency?
DISTRIBUTED PARALLEL COMPUTING
- Distribute large arrays among available 
 processors
- Distributed Data Interface (DDI) in GAMESS 
- Developed by G. Fletcher, M. Schmidt, R. Olson 
- Based on one-sided message passing 
- Implemented on T3E using SHMEM 
- Implemented on clusters using sockets or MPI, with a paired compute-process/data-server model (a minimal sketch of the one-sided model follows below)
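DDI itself is implemented inside GAMESS (over SHMEM, sockets, or MPI); purely to illustrate the same one-sided get/put/accumulate idea, here is a minimal mpi4py sketch using MPI RMA windows. The array sizes and names are invented for the example and are not part of DDI.

```python
# Minimal sketch of one-sided access to a distributed array, in the
# spirit of DDI's get/put/accumulate, using MPI RMA via mpi4py.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns one patch of the globally distributed array.
local_patch = np.zeros(1000, dtype='d')
win = MPI.Win.Create(local_patch, comm=comm)

# Accumulate a contribution into rank 0's patch without rank 0
# posting a matching receive (one-sided communication).
contribution = np.full(10, float(rank), dtype='d')
win.Lock(0)                                  # passive-target epoch
win.Accumulate(contribution, 0, op=MPI.SUM)  # remote += contribution
win.Unlock(0)

comm.Barrier()
win.Free()
```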
The virtual shared-memory model. Each large box (grey) represents the memory available to a given CPU. The inner boxes represent the memory used by the parallel processes (rank in the lower right). The gold region depicts the memory reserved for the storage of distributed data. The arrows indicate memory access (through any means) for the distributed operations get, put, and accumulate.
Full shared-memory model: all DDI processes within a node attach to all of the shared-memory segments. The accumulate operation shown can then be completed directly through memory (sketched below).
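Again only as a hedged sketch, not the actual DDI code: MPI-3 shared-memory windows illustrate the same intra-node idea, where processes on one node map a common segment and update it with direct loads and stores instead of messages. The sizes and layout below are illustrative assumptions, and all ranks are assumed to share one node.

```python
# Sketch of the intra-node shared-memory idea with an MPI-3 shared
# window (mpi4py). Assumes every rank in COMM_WORLD is on one node.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
itemsize = MPI.DOUBLE.Get_size()
n = 1024

# Rank 0 allocates the node-shared segment; the others attach to it.
size = n * itemsize if comm.rank == 0 else 0
win = MPI.Win.Allocate_shared(size, itemsize, comm=comm)
buf, _ = win.Shared_query(0)
shared = np.ndarray(buffer=buf, dtype='d', shape=(n,))

win.Fence()
# Each rank "accumulates" into its own disjoint slice directly through
# memory; no messages are exchanged at all.
chunk = n // comm.size
shared[comm.rank * chunk:(comm.rank + 1) * chunk] += 1.0
win.Fence()
win.Free()
```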
CURRENTLY DDI ENABLED
- Currently implemented 
- Closed-shell MP2 energies and gradients
- Most efficient closed-shell correlated method when appropriate (single determinant)
- Geometry optimizations 
- Reaction path following 
- On-the-fly direct dynamics 
- Unrestricted open-shell MP2 energies and gradients
- Simplest correlated method for open shells
- Restricted open-shell (ZAPT2) energies and gradients
- Most efficient open-shell correlated method
- No spin contamination through second order
CURRENTLY DDI ENABLED
- CASSCF Hessians 
- Necessary for vibrational frequencies, transition 
 state searches, building potential energy
 surfaces
- MRMP2 energies 
- Most efficient correlated multi-reference method 
- Singles CI energies and gradients
- Simplest qualitative method for excited 
 electronic states
- Full CI energies 
- Exact wavefunction for a given atomic basis 
- Effective fragment potentials 
- Sophisticated model for intermolecular 
 interactions
COMING TO DDI
- In progress 
- Vibronic (derivative) coupling (Tim Dudley) 
- Conical intersections, photochemistry 
- GVVPT2 energies and gradients (Mark Hoffmann)
- ORMAS energies, gradients 
- Joe Ivanic, Andrey Adsatchev 
- Subdivides CASSCF active space into subspaces 
- Coupled cluster methods 
- Ryan Olson, Ian Pimienta, Alistair Rendell 
- Collaboration w/ Piotr Piecuch, Ricky Kendall 
- Key point
- Must grow the problem size to maximize scalability (illustrated in the sketch below)
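A generic illustration of this key point (textbook scaling laws, not GAMESS measurements): with a fixed problem size, the serial fraction caps the speedup (Amdahl's law), whereas growing the problem with the processor count keeps the speedup climbing (Gustafson's scaled speedup). The 5% serial fraction below is an arbitrary example value.

```python
# Why the problem must grow with the machine.
# s = serial fraction of the work; p = number of processors.
def amdahl_speedup(s, p):        # fixed problem size
    return 1.0 / (s + (1.0 - s) / p)

def gustafson_speedup(s, p):     # problem size grown with p
    return s + (1.0 - s) * p

for p in (16, 64, 256):
    print(f"p = {p:3d}   fixed-size speedup: {amdahl_speedup(0.05, p):6.1f}"
          f"   scaled speedup: {gustafson_speedup(0.05, p):6.1f}")
```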
FULL CI (ZHENGTING GAN)
- Full CI = exact wavefunction for a given atomic basis
- Extremely computationally demanding
- Scales exponentially (~e^N)
- Can generally only be applied to atoms and small molecules
- Very important because all other approximate methods can be benchmarked against full CI
- Can expand the size of applicable molecules by making the method highly scalable/parallel
- Also the CI step of FORS/CASSCF
- Parallel performance for FCI on an IBM P3 cluster
- Singlet state of H3COH
- 14 electrons in 14 orbitals
- 11,778,624 determinants
- Singlet state of H2O2
- 14 electrons in 15 orbitals
- 41,409,225 determinants
J. Chem. Phys. 119, 47 (2003)
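The determinant counts quoted above follow directly from binomial coefficients; the short check below (which ignores any spatial-symmetry reduction) reproduces both numbers:

```python
from math import comb

# Number of determinants for n_alpha spin-up and n_beta spin-down
# electrons distributed over n_orb orbitals, before any reduction
# by spatial symmetry.
def n_determinants(n_orb, n_alpha, n_beta):
    return comb(n_orb, n_alpha) * comb(n_orb, n_beta)

print(n_determinants(14, 7, 7))   # 11,778,624 -> 14 electrons in 14 orbitals
print(n_determinants(15, 7, 7))   # 41,409,225 -> 14 electrons in 15 orbitals
```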
- Parallel performance for FCI on the Cray X1 (ORNL)
- O⁻
- aug-cc-pVTZ atomic basis, O 1s orbitals frozen
- 7 valence electrons in 79 orbitals
- 14,851,999,576 determinants; 8-10 Gflops achieved vs. 12.5 Gflops theoretical
- Latest results: aug-cc-pVTZ C2, 8 electrons in 68 orbitals
- 64,931,348,928 determinants, < 4 hours wall time!
- Comparison with Coupled Cluster
Full Potential Energy Surfaces
F2 potential energy curves (cc-pVTZ)
MCSCF HESSIANS (TIM DUDLEY)
- Analytic Hessians are generally superior to numerical or semi-numerical ones
- Finite displacements frequently cause artificial symmetry breaking or root flipping
- Necessary step for derivative coupling
- Computationally demanding, so parallel efficiency is desirable
- DDI-based MCSCF Hessians
- IBM clusters, 64-bit Linux
304 basis functions, small active space: dominated by the calculation of derivative integrals
Large active space, small AO basis: dominated by the calculation of the CI blocks of H
216 basis functions, full π active space: the calculation is a mix of all bottlenecks
ZAPT2 BENCHMARKS
- IBM p640 nodes connected by dual Gigabit Ethernet 
- 4 Power3-II processors at 375 MHz 
- 16 GB memory 
- Tested 
- Au3H4 
- Au3O4 
- Au5H4 
- Ti2Cl2Cp4 
- Fe-porphyrin imidazole
Au3H4
- Basis set:
- aug-cc-pVTZ on H
- uncontracted SBKJC with 3f,2g polarization functions and one diffuse sp function on Au
- 380 spherical harmonic basis functions 
- 31 DOCC, 1 SOCC 
- 9.5 MWords replicated 
- 170 MWords distributed
Au3O4
- Basis set:
- aug-cc-pVTZ on O
- uncontracted SBKJC with 3f,2g polarization functions and one diffuse sp function on Au
- 472 spherical harmonic basis functions 
- 44 DOCC, 1 SOCC 
- 20.7 MWords replicated 
- 562 MWords distributed
Au5H4
- Basis set:
- aug-cc-pVTZ on H
- uncontracted SBKJC with 3f,2g polarization functions and one diffuse sp function on Au
- 572 spherical harmonic basis functions
- 49 DOCC, 1 SOCC 
- 30.1 MWords replicated 
- 1011 MWords distributed
Ti2Cl2Cp4
- Basis set 
- TZV 
- 486 basis functions (N = 486)
- 108 DOCC, 2 SOCC 
- 30.5 MWords replicated 
- 2470 MWords distributed
Fe-porphyrin imidazole
- Two basis sets:
- MIDI with d polarization functions (N = 493)
- TZV with d,p polarization functions (N = 728)
- 110 DOCC, 2 SOCC
- N = 493:
- 32.1 MWords replicated
- 2635 MWords distributed
- N = 728:
- 52.1 MWords replicated
- 5536 MWords distributed
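For orientation only, the MWord figures on these benchmark slides can be converted to bytes; the conversion below assumes 8-byte (64-bit) words, which is an assumption about the word size rather than something stated on the slides.

```python
# Convert MWords to gigabytes, assuming 8-byte (64-bit) words.
BYTES_PER_WORD = 8  # assumption, not stated on the slides

def mwords_to_gb(mwords):
    return mwords * 1.0e6 * BYTES_PER_WORD / 1024**3

# Largest Fe-porphyrin imidazole case (N = 728) from this slide:
print(f"replicated:  {mwords_to_gb(52.1):.1f} GB per process")
print(f"distributed: {mwords_to_gb(5536):.1f} GB across all processes")
```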
Load Balancing
- Au3H4 on 64 processors 
- Total CPU time ranged from 1124 to 1178 sec. 
- Master spent 1165 sec. 
- average 1147 sec. 
- standard deviation 13.5 sec. 
- Large Fe-porphyrin on 64 processors 
- Total CPU time ranged from 50679 to 51448 sec. 
- Master spent 50818 sec. 
- average 51024 sec. 
- standard deviation 162 sec.
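Reading these statistics as a simple load-balance metric (plain arithmetic on the numbers above, nothing beyond them):

```python
# Load-imbalance estimate from the per-process CPU times quoted above:
# a perfectly balanced run would have the maximum time equal to the mean.
def imbalance(t_max, t_mean):
    return (t_max - t_mean) / t_mean

print(f"Au3H4, 64 processors:        {imbalance(1178.0, 1147.0):.1%}")   # ~2.7%
print(f"Fe-porphyrin, 64 processors: {imbalance(51448.0, 51024.0):.1%}") # ~0.8%
```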
THANKS!
- GAMESS Gang 
- DOE SciDAC program 
- IBM SUR grants