Case Studies: Quantum Chemistry - PowerPoint PPT Presentation

Slides: 20
Provided by: san7196
Learn more at: http://charm.cs.uiuc.edu

1
Case Studies: Quantum Chemistry
  • CS320
  • Spring 2003
  • Laxmikant Kale
  • http://charm.cs.uiuc.edu
  • Parallel Programming Laboratory
  • Dept. of Computer Science
  • University of Illinois at Urbana-Champaign

2
All-pairs-meet
  • Example
  • N atoms distributed across P processors
  • Must calculate forces between each pair of atoms
  • (Better algorithms exist, but we will focus on
    explicit calculation for illustration)
  • Straightforward implementation
  • Each proc broadcasts its atoms to all
  • Problems?

3
Ring
  • Each processor sends its atoms to the next
    processor
  • In each subsequent phase, each processor forwards
    the atoms it received (and the forces accumulated
    on them) to the next proc
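
A minimal sketch of the ring pattern, simulated sequentially in Python. The function name `ring_all_pairs`, the 1-D "atoms", and the toy `force` function are assumptions made for illustration. As a simplification, forces are accumulated only on the resident atoms, so every pair is evaluated twice; the variant on the slide, which also forwards forces along with the atoms, is not modeled here.

```python
def ring_all_pairs(blocks, force):
    """Simulate the ring. blocks[p] is the list of atoms owned by processor p.

    In phase k, processor p holds the block that started on processor
    (p - k) mod P and lets it interact with its own resident atoms.
    """
    P = len(blocks)
    total = [[0.0] * len(b) for b in blocks]
    for phase in range(P):
        for p in range(P):
            src = (p - phase) % P  # original owner of the visiting block
            for i, a in enumerate(blocks[p]):
                for j, b in enumerate(blocks[src]):
                    if src == p and i == j:
                        continue  # an atom exerts no force on itself
                    total[p][i] += force(a, b)
    return total


def toy_force(a, b):
    # Hypothetical 1-D "force", chosen only so that results are easy to check.
    return b - a
```

After P phases (phase 0 is the local block) every resident atom has met every other atom, and each processor only ever holds its own block plus one visiting block.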

4
Pair-objects
  • Let there be a set of k x k objects
  • Each responsible for calculating interaction
    between a subset of pairs of processors
  • There are k subgroups of size P/k each
  • Object (i,j) computes interactions between
    processor subgroups i and j
  • A special case: k = sqrt(P)
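
To make the pair-object idea concrete, here is a small sketch that counts how many processor pairs each object (i,j) is responsible for. It assumes subgroup i consists of the contiguous processors i*(P/k) .. (i+1)*(P/k)-1; that layout and the name `pair_object_load` are illustrative choices, not something the slides specify.

```python
from collections import Counter


def pair_object_load(P, k):
    """Count the ordered processor pairs (a, b) handled by each object (i, j)."""
    if P % k != 0:
        raise ValueError("this sketch assumes k divides P")
    size = P // k  # processors per subgroup
    load = Counter()
    for a in range(P):
        for b in range(P):
            load[(a // size, b // size)] += 1
    return load
```

With P = 16 and k = sqrt(P) = 4 there are 16 objects, and each one handles (P/k)^2 = 16 of the 256 ordered processor pairs, so the work is spread evenly.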

5
Application: Matrix multiplication
  • NxN matrices A and B (to calculate C = A x B)
  • Distributed by rows (A) and columns (B) to P
    processors
  • Let g = N/P (assume integer), s = sqrt(P)
  • Processor I has g rows g*I .. g*(I+1)-1 of A
  • And g columns of B
  • Pattern of communication
  • Notice each pair (A-row-bunch and B-column-bunch)
    must meet to calculate the corresponding section
    of C
  • Organize procs in a 2D square array procs[s,s]
  • Let proc (x,y) compute the interaction
  • between rows at procs s*x .. s*(x+1)-1
  • and columns at procs s*y .. s*(y+1)-1
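
The ownership rule above can be sketched in plain Python as follows. This is a sequential simulation, not a parallel program: the function name `block_owner_matmul` is hypothetical, and the index ranges assume that proc (x,y) handles the A rows held by processors s*x .. s*(x+1)-1 and the B columns held by processors s*y .. s*(y+1)-1.

```python
import math


def block_owner_matmul(A, B, P):
    """Compute C = A x B block by block, one block per proc (x, y)."""
    N = len(A)
    g = N // P          # rows of A (and columns of B) per processor
    s = math.isqrt(P)   # side of the 2D processor array
    if g * P != N or s * s != P:
        raise ValueError("this sketch assumes P divides N and P is a perfect square")
    C = [[0] * N for _ in range(N)]
    for x in range(s):
        for y in range(s):
            # Rows held by procs s*x .. s*(x+1)-1 are matrix rows g*s*x .. g*s*(x+1)-1.
            rows = range(g * s * x, g * s * (x + 1))
            cols = range(g * s * y, g * s * (y + 1))
            for i in rows:
                for j in cols:
                    C[i][j] = sum(A[i][t] * B[t][j] for t in range(N))
    return C
```

Each proc (x,y) needs only s row-bunches and s column-bunches rather than all of A and B, which is the same all-pairs-meet structure as the pair-objects slide with k = sqrt(P).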

6
Case Study: quantum chemistry
  • Car-Parrinello ab initio structure determination
  • Use first-principles equations to calculate
    electronic structure
  • Used in materials science, biophysics, solid-state
    physics...
  • Car-Parrinello is a particular algorithm
  • (based on plane-wave density functional theory...
  • but that we can ignore)
  • Structure of the algorithm
  • Simplified for this class: ignores nucleus-related
    computations

7
CPAIMD: basic data structures
  • Each electronic state (2 electrons in the outer
    shell) is represented by a 3-D array of
    coefficients in g-space
  • What is g-space? Doesn't matter for us
    parallelizers
  • Real-space: a 3D array of coefficients for each
    state
  • Rho-real: a 3D array holding the probability
    density function
  • Aggregated
  • Rho-g
  • For concreteness: a 32-water-molecule simulation
  • 128 states
  • 128 arrays of 100x100x100 complex numbers
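
A quick back-of-the-envelope sketch of what these numbers mean for storage. The 16 bytes per value is an assumption (double-precision complex); the slides do not state the precision, and the helper name `g_space_footprint` is hypothetical. The count of 128 states is consistent with 32 water molecules x 8 valence electrons, at 2 electrons per state.

```python
STATES = 128
GRID = 100
BYTES_PER_COMPLEX = 16  # assumption: two 8-byte floats per complex value


def g_space_footprint(states=STATES, grid=GRID, bytes_per_value=BYTES_PER_COMPLEX):
    """Return (values per state, total values, total bytes) for one set of arrays."""
    values_per_state = grid ** 3
    total_values = states * values_per_state
    return values_per_state, total_values, total_values * bytes_per_value
```

Under that assumption, one full set of state arrays holds 128 million complex values, or roughly 2 GB, and the real-space representation is the same size again.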

8
Sequential Algorithm
9
Parallelization decomposition
  • Decompose each 3D array into planes
  • You get 12,800 virtual processors for g-space and
    real-space
  • 100 more each for rho-real and rho-g
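
The counts follow directly from one virtual processor per plane. The sketch below assumes "12,800" applies to g-space and real-space separately and that a plane is identified by a (state, plane) pair; the function names are hypothetical.

```python
def vp_index(state, plane, planes_per_state=100):
    """Flat index of the virtual processor holding one plane of one state."""
    return state * planes_per_state + plane


def vp_counts(states=128, planes=100):
    """Number of virtual processors per data structure under plane decomposition."""
    return {
        "g_space": states * planes,
        "real_space": states * planes,
        "rho_real": planes,
        "rho_g": planes,
    }
```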

10
Parallelization decomposition
11
Optimizations
  • Calculation of S matrix
  • Mapping
  • Parallelization of multiple concurrent FFTs
  • Parallelization of the single FFT within
    rho-density
  • Communication operations

12
Correlation Matrix S
Calculation of S:
  Do I = 1 to S
    Do J = I to S
      S[I,J] = 0.0
      Do x = 1 to N
        Do y = 1 to N
          Do z = 1 to N
            S[I,J] = S[I,J] + G[I,x,y,z] * G[J,x,y,z]
13
main
14
Calculation of S:
  Do I = 1 to S
    Do J = I to S
      S[I,J] = 0.0
      Do x = 1 to N
        Do y = 1 to N
          Do z = 1 to N
            S[I,J] = S[I,J] + G[I,x,y,z] * G[J,x,y,z]
Each plane G[I,x,*,*] is on one processor. All pairs of G[I,x,..]
and G[J,x,..] must meet for each x! Remember the
all-pairs-must-meet pattern?
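
The observation above can be checked with a small sketch: compute a partial S for each plane x independently, then add the partials. The code follows the formula as written on the slide (a plain product, no complex conjugation) and uses nested Python lists indexed as G[I][x][y][z]; the function names are hypothetical.

```python
def s_direct(G):
    """Upper triangle of S computed with the loop nest from the slide."""
    n_states, n = len(G), len(G[0])
    S = [[0] * n_states for _ in range(n_states)]
    for I in range(n_states):
        for J in range(I, n_states):
            for x in range(n):
                for y in range(len(G[I][x])):
                    for z in range(len(G[I][x][y])):
                        S[I][J] += G[I][x][y][z] * G[J][x][y][z]
    return S


def partial_s_for_plane(G, x):
    """Contribution of plane x: needs plane x of every pair of states (I, J)."""
    n_states = len(G)
    part = [[0] * n_states for _ in range(n_states)]
    for I in range(n_states):
        for J in range(I, n_states):
            part[I][J] = sum(
                G[I][x][y][z] * G[J][x][y][z]
                for y in range(len(G[I][x]))
                for z in range(len(G[I][x][y]))
            )
    return part


def s_by_planes(G):
    """Sum the per-plane partial matrices (a reduction over x)."""
    n_states, n = len(G), len(G[0])
    S = [[0] * n_states for _ in range(n_states)]
    for x in range(n):
        part = partial_s_for_plane(G, x)
        for I in range(n_states):
            for J in range(I, n_states):
                S[I][J] += part[I][J]
    return S
```

Within one plane x, every pair of states must be brought together, which is the all-pairs pattern; across planes the work is independent until the final reduction.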
15
Parallelization of S calculation
16
Mapping
  • Virtual Processor sets
  • Real-Space Planes
  • By planes, uniform
  • G-space Planes
  • By planes
  • S-calculators
  • By planes

17
Problem 1
18
Solution
  • The middle portion is slow
  • because the work in rho-real is larger than
    expected,
  • and is happening on only 100 processors out
    of 1000
  • Parallelize that work further
  • But keep the FFTs/transposes on 100 processors
  • Simplifies code changes
  • Small calculation time

19
Problem 2: concurrent FFTs
  • Each processor has 12-13 planes of real-space
  • 12800 / 1024 = 12.5
  • Each belonging to a different state
  • Each participates in a 100-way all-to-all for its
    FFTs
  • Use concurrent asynchronous all-to-alls!
  • Overlap computation of one with communication of
    the other
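
A sketch of where "12-13 planes, each belonging to a different state" can come from. The round-robin mapping below is an assumption made for illustration (virtual processor v = state*100 + plane is placed on processor v mod 1024); the slides do not say which mapping was actually used.

```python
STATES, PLANES, PROCS = 128, 100, 1024


def assign_round_robin(states=STATES, planes=PLANES, procs=PROCS):
    """Hypothetical mapping of real-space planes to physical processors."""
    per_proc = {p: [] for p in range(procs)}
    for state in range(states):
        for plane in range(planes):
            v = state * planes + plane          # flat virtual-processor index
            per_proc[v % procs].append((state, plane))
    return per_proc


def planes_per_processor(per_proc):
    """Sorted set of distinct plane counts found across processors."""
    return sorted({len(owned) for owned in per_proc.values()})


def states_distinct(per_proc):
    """True if no processor holds two planes of the same state."""
    return all(
        len({state for state, _ in owned}) == len(owned)
        for owned in per_proc.values()
    )
```

Under this mapping every processor holds 12 or 13 planes, all from different states, so each processor takes part in 12-13 independent all-to-alls and can compute on one while the others are still communicating.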

20
Multiple overlapping FFTs