Title: HIGH PERFORMANCE ELECTRONIC STRUCTURE THEORY
- Mark S. Gordon, Klaus Ruedenberg 
- Ames Laboratory 
- Iowa State University
OUTLINE
- Methods and Strategies 
- Correlated electronic structure methods 
- Distributed Data Interface (DDI) 
- Approaches to efficient HPC in chemistry 
- Scalability with examples
CORRELATED ELECTRONIC STRUCTURE METHODS
- Well-correlated methods needed for:
- Accurate relative energies, dynamics
- Treatment of excited states, photochemistry
- Structures of diradicals, complex species
- Computationally demanding, so scalability is important
- HF is often a reasonable starting point for ground states with small diradical character
- Single-reference perturbation theory
- MP2/MBPT(2): scales as N^5
- Size-consistent
- Higher-order MBPT methods often perform worse
SINGLE REFERENCE COUPLED CLUSTER METHODS
- Cluster expansion is more robust
- Can sum all terms in the expansion
- Size-consistent
- State-of-the-art single-reference method
- CCSD, CCSDT, CCSDTQ, ...
- CCSD(T), CR-CCSD(T): an efficient compromise
- Scales as N^7 (see the cost sketch below)
- Methods often fail for bond breaking: consider N2
- Breaking 3 bonds: 1 σ + 2 π
- Minimal active space: (6,6)
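As a rough, generic illustration of what these formal scalings mean (arbitrary prefactors, not GAMESS timings), a short Python sketch compares how the cost of an N^5 method (MP2) and an N^7 method (CCSD(T)) grows as the basis set size increases:

```python
# Illustration only: formal operation-count scaling with basis size N.
# Prefactors are arbitrary, so only the cost ratios are meaningful.
def relative_cost(n_basis, power, n_ref=100):
    return (float(n_basis) / n_ref) ** power

for n in (100, 200, 400):
    mp2 = relative_cost(n, 5)      # N^5 method
    ccsd_t = relative_cost(n, 7)   # N^7 method
    print(f"N = {n:3d}:  MP2 cost x{mp2:7.0f}   CCSD(T) cost x{ccsd_t:9.0f}")
```

Doubling the basis raises the MP2 cost by about 32x and the CCSD(T) cost by about 128x, which is why scalability matters so much for these methods.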
MCSCF METHODS
- Single-configuration methods can fail for:
- Species with significant diradical character
- Bond-breaking processes
- Excited electronic states (often)
- Unsaturated transition metal complexes
- Then an MCSCF-based method is necessary
- Most common approach:
- Complete active space SCF (CASSCF/FORS)
- Active space = the orbitals and electrons involved in the process
- Full CI within the active space; optimize orbitals and CI coefficients
- Size-consistent
MULTI-REFERENCE METHODS
- Multi-reference methods, based on MCSCF
- Second-order perturbation theory (MRPT2)
- Relatively computationally efficient
- Size consistency depends on the implementation
- Multi-reference configuration interaction (MRCI)
- Very accurate, very time-consuming
- Highly resource-demanding
- Most common is MR(SD)CI
- Generally limited to a (14,14) active space
- Not size-consistent
- How to improve efficiency?
DISTRIBUTED PARALLEL COMPUTING
- Distribute large arrays among available 
 processors
- Distributed Data Interface (DDI) in GAMESS 
- Developed by G. Fletcher, M. Schmidt, R. Olson 
- Based on one-sided message passing 
- Implemented on T3E using SHMEM 
- Implemented on clusters using sockets or MPI, with a paired compute-process/data-server model (a minimal sketch of the one-sided model follows below)
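DDI itself is implemented inside GAMESS (over SHMEM, sockets, or MPI); purely to illustrate the same one-sided get/put/accumulate idea, here is a minimal mpi4py sketch using MPI RMA windows. The array sizes and names are invented for the example and are not part of DDI.

```python
# Minimal sketch of one-sided access to a distributed array, in the
# spirit of DDI's get/put/accumulate, using MPI RMA via mpi4py.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns one patch of the globally distributed array.
local_patch = np.zeros(1000, dtype='d')
win = MPI.Win.Create(local_patch, comm=comm)

# Accumulate a contribution into rank 0's patch without rank 0
# posting a matching receive (one-sided communication).
contribution = np.full(10, float(rank), dtype='d')
win.Lock(0)                                  # passive-target epoch
win.Accumulate(contribution, 0, op=MPI.SUM)  # remote += contribution
win.Unlock(0)

comm.Barrier()
win.Free()
```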
The virtual shared-memory model. Each large box (grey) represents the memory available to a given CPU. The inner boxes represent the memory used by the parallel processes (rank in the lower right). The gold region depicts the memory reserved for the storage of distributed data. The arrows indicate memory access (through any means) for the distributed operations get, put, and accumulate.
Full shared-memory model: all DDI processes within a node attach to all of the shared-memory segments. The accumulate operation shown can then be completed directly through memory (sketched below).
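Again only as a hedged sketch, not the actual DDI code: MPI-3 shared-memory windows illustrate the same intra-node idea, where processes on one node map a common segment and update it with direct loads and stores instead of messages. The sizes and layout below are illustrative assumptions, and all ranks are assumed to share one node.

```python
# Sketch of the intra-node shared-memory idea with an MPI-3 shared
# window (mpi4py). Assumes every rank in COMM_WORLD is on one node.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
itemsize = MPI.DOUBLE.Get_size()
n = 1024

# Rank 0 allocates the node-shared segment; the others attach to it.
size = n * itemsize if comm.rank == 0 else 0
win = MPI.Win.Allocate_shared(size, itemsize, comm=comm)
buf, _ = win.Shared_query(0)
shared = np.ndarray(buffer=buf, dtype='d', shape=(n,))

win.Fence()
# Each rank "accumulates" into its own disjoint slice directly through
# memory; no messages are exchanged at all.
chunk = n // comm.size
shared[comm.rank * chunk:(comm.rank + 1) * chunk] += 1.0
win.Fence()
win.Free()
```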
CURRENTLY DDI ENABLED
- Currently implemented 
- Closed-shell MP2 energies and gradients
- Most efficient closed-shell correlated method when appropriate (single determinant)
- Geometry optimizations 
- Reaction path following 
- On-the-fly direct dynamics 
- Unrestricted open-shell MP2 energies and gradients
- Simplest correlated method for open shells
- Restricted open-shell (ZAPT2) energies and gradients
- Most efficient open-shell correlated method
- No spin contamination through second order
CURRENTLY DDI ENABLED
- CASSCF Hessians 
- Necessary for vibrational frequencies, transition 
 state searches, building potential energy
 surfaces
- MRMP2 energies 
- Most efficient correlated multi-reference method 
- Singles CI energies and gradients
- Simplest qualitative method for excited 
 electronic states
- Full CI energies 
- Exact wavefunction for a given atomic basis 
- Effective fragment potentials 
- Sophisticated model for intermolecular 
 interactions
COMING TO DDI
- In progress 
- Vibronic (derivative) coupling (Tim Dudley) 
- Conical intersections, photochemistry 
- GVVPT2 energies and gradients (Mark Hoffmann)
- ORMAS energies, gradients 
- Joe Ivanic, Andrey Adsatchev 
- Subdivides CASSCF active space into subspaces 
- Coupled cluster methods 
- Ryan Olson, Ian Pimienta, Alistair Rendell 
- Collaboration w/ Piotr Piecuch, Ricky Kendall 
- Key point
- Must grow the problem size to maximize scalability (illustrated in the sketch below)
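A generic illustration of this key point (textbook scaling laws, not GAMESS measurements): with a fixed problem size, the serial fraction caps the speedup (Amdahl's law), whereas growing the problem with the processor count keeps the speedup climbing (Gustafson's scaled speedup). The 5% serial fraction below is an arbitrary example value.

```python
# Why the problem must grow with the machine.
# s = serial fraction of the work; p = number of processors.
def amdahl_speedup(s, p):        # fixed problem size
    return 1.0 / (s + (1.0 - s) / p)

def gustafson_speedup(s, p):     # problem size grown with p
    return s + (1.0 - s) * p

for p in (16, 64, 256):
    print(f"p = {p:3d}   fixed-size speedup: {amdahl_speedup(0.05, p):6.1f}"
          f"   scaled speedup: {gustafson_speedup(0.05, p):6.1f}")
```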
FULL CI (ZHENGTING GAN)
- Full CI = exact wavefunction for a given atomic basis
- Extremely computationally demanding
- Scales exponentially (~e^N)
- Can generally only be applied to atoms and small molecules
- Very important because all other approximate methods can be benchmarked against full CI
- Can expand the size of applicable molecules by making the method highly scalable/parallel
- Also the CI step of FORS/CASSCF
- Parallel performance for FCI on an IBM P3 cluster
- Singlet state of H3COH
- 14 electrons in 14 orbitals
- 11,778,624 determinants
- Singlet state of H2O2
- 14 electrons in 15 orbitals
- 41,409,225 determinants
J. Chem. Phys. 119, 47 (2003)
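The determinant counts quoted above follow directly from binomial coefficients; the short check below (which ignores any spatial-symmetry reduction) reproduces both numbers:

```python
from math import comb

# Number of determinants for n_alpha spin-up and n_beta spin-down
# electrons distributed over n_orb orbitals, before any reduction
# by spatial symmetry.
def n_determinants(n_orb, n_alpha, n_beta):
    return comb(n_orb, n_alpha) * comb(n_orb, n_beta)

print(n_determinants(14, 7, 7))   # 11,778,624 -> 14 electrons in 14 orbitals
print(n_determinants(15, 7, 7))   # 41,409,225 -> 14 electrons in 15 orbitals
```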
- Parallel performance for FCI on the Cray X1 (ORNL)
- O⁻
- aug-cc-pVTZ atomic basis, O 1s orbitals frozen
- 7 valence electrons in 79 orbitals
- 14,851,999,576 determinants; 8-10 Gflops achieved vs. 12.5 Gflops theoretical
- Latest results: aug-cc-pVTZ C2, 8 electrons in 68 orbitals
- 64,931,348,928 determinants, < 4 hours wall time!
- Comparison with Coupled Cluster
Full Potential Energy Surfaces
F2 potential energy curves (cc-pVTZ)
MCSCF HESSIANS (TIM DUDLEY)
- Analytic Hessians are generally superior to numerical or semi-numerical ones
- Finite displacements frequently cause artificial symmetry breaking or root flipping
- Necessary step for derivative coupling
- Computationally demanding, so parallel efficiency is desirable
- DDI-based MCSCF Hessians
- IBM clusters, 64-bit Linux
304 basis functions, small active space: dominated by the calculation of derivative integrals
Large active space, small AO basis: dominated by the calculation of the CI blocks of H
216 basis functions, full π active space: the calculation is a mix of all bottlenecks
ZAPT2 BENCHMARKS
- IBM p640 nodes connected by dual Gigabit Ethernet 
- 4 Power3-II processors at 375 MHz 
- 16 GB memory 
- Tested 
- Au3H4 
- Au3O4 
- Au5H4 
- Ti2Cl2Cp4 
- Fe-porphyrin imidazole
Au3H4
- Basis set:
- aug-cc-pVTZ on H
- uncontracted SBKJC with 3f,2g polarization functions and one diffuse sp function on Au
- 380 spherical harmonic basis functions 
- 31 DOCC, 1 SOCC 
- 9.5 MWords replicated 
- 170 MWords distributed
Au3O4
- Basis set:
- aug-cc-pVTZ on O
- uncontracted SBKJC with 3f,2g polarization functions and one diffuse sp function on Au
- 472 spherical harmonic basis functions 
- 44 DOCC, 1 SOCC 
- 20.7 MWords replicated 
- 562 MWords distributed
Au5H4
- Basis set:
- aug-cc-pVTZ on H
- uncontracted SBKJC with 3f,2g polarization functions and one diffuse sp function on Au
- 572 spherical harmonic basis functions
- 49 DOCC, 1 SOCC 
- 30.1 MWords replicated 
- 1011 MWords distributed
Ti2Cl2Cp4
- Basis set 
- TZV 
- 486 basis functions (N = 486)
- 108 DOCC, 2 SOCC 
- 30.5 MWords replicated 
- 2470 MWords distributed
Fe-porphyrin imidazole
- Two basis sets:
- MIDI with d polarization functions (N = 493)
- TZV with d,p polarization functions (N = 728)
- 110 DOCC, 2 SOCC
- N = 493:
- 32.1 MWords replicated
- 2635 MWords distributed
- N = 728:
- 52.1 MWords replicated
- 5536 MWords distributed
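For orientation only, the MWord figures on these benchmark slides can be converted to bytes; the conversion below assumes 8-byte (64-bit) words, which is an assumption about the word size rather than something stated on the slides.

```python
# Convert MWords to gigabytes, assuming 8-byte (64-bit) words.
BYTES_PER_WORD = 8  # assumption, not stated on the slides

def mwords_to_gb(mwords):
    return mwords * 1.0e6 * BYTES_PER_WORD / 1024**3

# Largest Fe-porphyrin imidazole case (N = 728) from this slide:
print(f"replicated:  {mwords_to_gb(52.1):.1f} GB per process")
print(f"distributed: {mwords_to_gb(5536):.1f} GB across all processes")
```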
Load Balancing
- Au3H4 on 64 processors 
- Total CPU time ranged from 1124 to 1178 sec. 
- Master spent 1165 sec. 
- average 1147 sec. 
- standard deviation 13.5 sec. 
- Large Fe-porphyrin on 64 processors 
- Total CPU time ranged from 50679 to 51448 sec. 
- Master spent 50818 sec. 
- average 51024 sec. 
- standard deviation 162 sec.
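Reading these statistics as a simple load-balance metric (plain arithmetic on the numbers above, nothing beyond them):

```python
# Load-imbalance estimate from the per-process CPU times quoted above:
# a perfectly balanced run would have the maximum time equal to the mean.
def imbalance(t_max, t_mean):
    return (t_max - t_mean) / t_mean

print(f"Au3H4, 64 processors:        {imbalance(1178.0, 1147.0):.1%}")   # ~2.7%
print(f"Fe-porphyrin, 64 processors: {imbalance(51448.0, 51024.0):.1%}") # ~0.8%
```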
THANKS!
- GAMESS Gang 
- DOE SciDAC program 
- IBM SUR grants