Title: Case Studies: Quantum Chemistry
1Case Studies: Quantum Chemistry
- CS320
- Spring 2003
- Laxmikant Kale
- http://charm.cs.uiuc.edu
- Parallel Programming Laboratory
- Dept. of Computer Science
- University of Illinois at Urbana-Champaign
2All-pairs-meet
- Example
- N atoms distributed across P processors
- Must calculate forces between each pair of atoms
- (Better algorithms exist, but we will focus on explicit calculation for illustration)
- Straightforward implementation
- Each proc broadcasts its atoms to all
- Problems?
3Ring
- Each processor sends its atoms to the next processor
- In each subsequent phase, they forward the forces and atoms to the next proc
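The ring scheme can be checked with a small sequential simulation. This is only a sketch under stated assumptions: the function name `ring_all_pairs`, the atom ids, and the 12-atom, 4-processor layout are illustrative, and the simulation records which pairs meet rather than computing forces.

```python
def ring_all_pairs(blocks):
    """Simulate the ring scheme on P = len(blocks) processors.

    blocks[p] is the list of atom ids owned by processor p.
    Returns the set of unordered atom pairs that met on some processor.
    """
    P = len(blocks)
    visiting = list(blocks)  # phase 0: each processor holds its own block
    met = set()
    for phase in range(P):
        for p in range(P):
            # Processor p pairs its resident atoms with the visiting block.
            for a in blocks[p]:
                for b in visiting[p]:
                    if a != b:
                        met.add(frozenset((a, b)))
        # Every processor forwards its visiting block to the next processor,
        # so processor p now holds what processor p-1 held.
        visiting = [visiting[(p - 1) % P] for p in range(P)]
    return met


atoms = list(range(12))
blocks = [atoms[i::4] for i in range(4)]  # 12 atoms spread over 4 processors
pairs = ring_all_pairs(blocks)
print(len(pairs))  # 66 = 12 * 11 / 2
```

Each processor holds only one visiting block at a time, unlike the broadcast version where every processor must store all N atoms at once. In this naive form each pair meets on both owners; a real code would avoid the duplicate work.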
4Pair-objects
- Let there be a set of k x k objects
- Each responsible for calculating interaction between a subset of pairs of processors
- There are k subgroups of size P/k each
- Object i,j computes interactions between processor subgroup i and j
- A special case: k = sqrt(P)
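A minimal sketch of the pair-object assignment, assuming processors are numbered 0..P-1 and subgroups are contiguous blocks of P/k processors (the slide does not say how subgroups are formed). The names `subgroup` and `pair_object_for` are hypothetical.

```python
def subgroup(p, P, k):
    """Subgroup index of processor p when P processors form k equal subgroups."""
    return p // (P // k)


def pair_object_for(p, q, P, k):
    """The (i, j) pair object that handles the interaction of processors p and q."""
    return (subgroup(p, P, k), subgroup(q, P, k))


P, k = 16, 4  # the special case k = sqrt(P)
load = {}
for p in range(P):
    for q in range(P):
        obj = pair_object_for(p, q, P, k)
        load[obj] = load.get(obj, 0) + 1

print(len(load))           # 16 objects, i.e. k * k
print(set(load.values()))  # {16}: each object handles (P/k)^2 processor pairs
```

With k = sqrt(P) there are exactly P objects, so one pair object can be placed on each processor.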
5Application: Matrix multiplication
- NxN matrices A and B (to calculate C = A x B)
- Distributed by rows (A) and columns (B) to P processors
- Let g = N/P (assume integer), s = sqrt(P)
- Processor I has g rows g*I .. g*(I+1)-1 of A
- And g columns of B
- Pattern of communication
- Notice each pair (A-row-bunch and B-column-bunch) must meet to calculate the corresponding section of C
- Organize procs in a 2D square array procs[s,s]
- Let proc (x,y) compute the interaction
- between rows at procs s*x .. s*(x+1)-1
- and columns at procs s*y .. s*(y+1)-1
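The block assignment can be sketched sequentially as follows. It assumes 0-based indices and reads the slide's ranges as: processor (x, y) handles the A rows held by processors s*x .. s*(x+1)-1 and the B columns held by processors s*y .. s*(y+1)-1. The function names are illustrative.

```python
import math


def block_ranges(x, y, N, P):
    """Rows of A and columns of B that meet on processor (x, y)."""
    g = N // P         # rows (or columns) held per processor
    s = math.isqrt(P)  # side of the square processor array
    rows = range(g * s * x, g * s * (x + 1))
    cols = range(g * s * y, g * s * (y + 1))
    return rows, cols


def multiply_by_blocks(A, B, P):
    """Compute C = A x B one (x, y) block at a time."""
    N = len(A)
    s = math.isqrt(P)
    C = [[0] * N for _ in range(N)]
    for x in range(s):
        for y in range(s):  # each (x, y) iteration is one processor's work
            rows, cols = block_ranges(x, y, N, P)
            for i in rows:
                for j in cols:
                    C[i][j] = sum(A[i][t] * B[t][j] for t in range(N))
    return C


N, P = 8, 4
A = [[i + j for j in range(N)] for i in range(N)]
B = [[i * j + 1 for j in range(N)] for i in range(N)]
C = multiply_by_blocks(A, B, P)
```

Each processor receives s row-bunches and s column-bunches, so it needs about 2*N*N/sqrt(P) matrix elements instead of the full matrices.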
6Case Study: quantum chemistry
- Car-Parrinello ab initio structure determination
- Use first-principles equations to calculate electronic structure
- Used in materials science, biophysics, solid-state physics...
- Car-Parrinello is a particular algorithm
- (based on plane-wave density functional theory...)
- But that we can ignore...
- Structure of the algorithm
- Simplified for this class: ignores nucleus-related computations
7CPAIMD: basic data structures
- Each electronic state (2 electrons in the outer shell) is represented by a 3-D array of coefficients in g-space
- What is g-space? Doesn't matter for us parallelizers
- Real-space: 3D array of coefficients for each state
- Rho-real: 3D array of probability density function
- Aggregated
- Rho-g
- For concreteness: a 32-water-molecule simulation
- 128 states
- 128 100x100x100 arrays of complex numbers
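A quick back-of-the-envelope check of these sizes. The 16 bytes per complex number (two 8-byte doubles) is an assumption, since the slides do not state the precision. The state count follows from water having 8 valence electrons per molecule at 2 electrons per state.

```python
MOLECULES = 32
VALENCE_ELECTRONS_PER_WATER = 8  # 6 from oxygen, 1 from each hydrogen
ELECTRONS_PER_STATE = 2
N = 100                          # grid points per dimension
BYTES_PER_COMPLEX = 16           # assumption: double-precision complex

states = MOLECULES * VALENCE_ELECTRONS_PER_WATER // ELECTRONS_PER_STATE
points_per_state = N ** 3
total_bytes = states * points_per_state * BYTES_PER_COMPLEX

print(states)             # 128
print(points_per_state)   # 1000000
print(total_bytes / 1e9)  # 2.048 (GB for one full set of state arrays)
```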
8Sequential Algorithm
9Parallelization: decomposition
- Decompose each 3D array into planes
- You get 12,800 virtual processors for g-space and real-space
- 100 more each for rho
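The virtual-processor counts follow directly from the plane decomposition. The sketch below assumes that g-space and real-space each get their own 12,800 plane objects, and that "rho" means the two arrays rho-real and rho-g. The tuple labels are illustrative only.

```python
STATES, PLANES = 128, 100

# One virtual processor per (state, plane) for each per-state array.
g_space = [("g", s, x) for s in range(STATES) for x in range(PLANES)]
real_space = [("real", s, x) for s in range(STATES) for x in range(PLANES)]

# Rho is aggregated over states, so it needs only one object per plane.
rho_real = [("rho-real", x) for x in range(PLANES)]
rho_g = [("rho-g", x) for x in range(PLANES)]

print(len(g_space), len(real_space))  # 12800 12800
print(len(rho_real), len(rho_g))      # 100 100
```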
10Parallelization: decomposition
11Optimizations
- Calculation of S matrix
- Mapping
- Parallelization of multiple concurrent FFTs
- Parallelization of the single FFT within rho-density
- Communication operations
12Correlation Matrix S
Calculation of S:
Do I = 1 to S
  Do J = I to S
    S[I,J] = 0.0
    Do x = 1 to N
      Do y = 1 to N
        Do z = 1 to N
          S[I,J] = S[I,J] + G[I,x,y,z] * G[J,x,y,z]
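The loop nest on this slide translates to Python roughly as follows. This is a sketch with 0-based indices and nested lists standing in for the 3D arrays. It takes the plain product of coefficients as the slide appears to, and the tiny 2-state input is made up for illustration.

```python
def correlation_matrix(G):
    """Upper triangle of S, where S[I][J] = sum over x, y, z of G[I] * G[J]."""
    n_states = len(G)
    N = len(G[0])
    S = [[0.0] * n_states for _ in range(n_states)]
    for I in range(n_states):
        for J in range(I, n_states):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    for z in range(N):
                        acc += G[I][x][y][z] * G[J][x][y][z]
            S[I][J] = acc
    return S


# Two states on a 2x2x2 grid: state 0 is all 1.0, state 1 is all 2.0.
G = [[[[float(I + 1)] * 2 for _ in range(2)] for _ in range(2)] for I in range(2)]
S = correlation_matrix(G)
print(S)  # [[8.0, 16.0], [0.0, 32.0]]
```

Only the entries with J >= I are filled, matching the loop bounds on the slide.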
13main
14Calculation of S
Do I = 1 to S
  Do J = I to S
    S[I,J] = 0.0
    Do x = 1 to N
      Do y = 1 to N
        Do z = 1 to N
          S[I,J] = S[I,J] + G[I,x,y,z] * G[J,x,y,z]
G[I,x,*,*] is on one processor. All pairs of G[I,x,..] and G[J,x,..] must meet for each x! Remember the all-pairs-must-meet pattern?
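Because the sum over x separates, each plane index x can contribute a partial S independently once the x-planes of all states have met, and the partials are then added up. A sequential sketch of that idea follows. The function names and the small test array are illustrative, not taken from the course code.

```python
def plane_partial(G, x):
    """Contribution of plane x to every S[I][J] with J >= I.

    Needs only the x-plane of each state, which is the data that must meet.
    """
    n = len(G)
    part = [[0.0] * n for _ in range(n)]
    for I in range(n):
        for J in range(I, n):
            part[I][J] = sum(
                a * b
                for row_i, row_j in zip(G[I][x], G[J][x])
                for a, b in zip(row_i, row_j)
            )
    return part


def correlation_by_planes(G):
    """Sum the per-plane partial results (a reduction over x)."""
    n, N = len(G), len(G[0])
    S = [[0.0] * n for _ in range(n)]
    for x in range(N):  # each x could be handled by a different processor
        part = plane_partial(G, x)
        for I in range(n):
            for J in range(I, n):
                S[I][J] += part[I][J]
    return S


# G[I][x][y][z] = I + 1 + x on a 2x2x2 grid, for 2 states.
G = [[[[I + 1 + x] * 2 for _ in range(2)] for x in range(2)] for I in range(2)]
S = correlation_by_planes(G)
print(S)  # [[20.0, 32.0], [0.0, 52.0]]
```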
15Parallelization of S calculation
16Mapping
- Virtual Processor sets
- Real-Space Planes
- By planes, uniform
- G-space Planes
- By planes
- S-calculators
- By planes
17Problem 1
18Solution
- The middle portion is slow
- because the work in rho-real is larger than expected,
- and is happening only on 100 processors out of 1000
- Parallelize that work further
- But keep the fft/transposes to 100 processors
- Simplifies code changes
- Small calculation time
19Problem 2: concurrent FFTs
- Each processor has 12-13 planes of real-space
- 12800 / 1024
- Each belonging to a different state
- Each participates in a 100-way all-to-all for its FFTs
- Use concurrent asynchronous all-to-alls!
- Overlap computation of one with communication of the other
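The plane count per processor and the overlap idea can be sketched as below. Python's `asyncio` is used here purely as a stand-in for asynchronous message-driven execution, and it is not what the course code uses. `all_to_all` is a hypothetical placeholder that does no real communication.

```python
import asyncio


def planes_per_processor(n_planes, n_procs):
    """Spread n_planes as evenly as possible over n_procs."""
    base, extra = divmod(n_planes, n_procs)
    return [base + 1 if p < extra else base for p in range(n_procs)]


async def all_to_all(plane, log):
    """Placeholder for a non-blocking 100-way all-to-all (transpose)."""
    log.append(("comm-start", plane))
    await asyncio.sleep(0)  # yield, as a real transfer would while in flight
    log.append(("comm-done", plane))


async def process_planes(planes):
    """Start each plane's all-to-all, then move on to the next plane's compute."""
    log = []
    pending = None
    for plane in planes:
        log.append(("compute", plane))  # local FFT work for this plane
        comm = asyncio.create_task(all_to_all(plane, log))
        if pending is not None:
            await pending  # the previous plane's transfer finishes meanwhile
        pending = comm
    if pending is not None:
        await pending
    return log


counts = planes_per_processor(12800, 1024)
print(counts.count(13), counts.count(12))  # 512 512

log = asyncio.run(process_planes([0, 1, 2]))
```

In the resulting log, the compute step for plane 1 is issued before the all-to-all for plane 0 has completed, which is the ordering the slide asks for. With real network transfers the two would genuinely proceed at the same time.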
20Multiple overlapping FFTs