Matt Challacombe, Theoretical Division

1
An Irregular Approach to Parallelism in Linear
Scaling Quantum Chemistry
LA-UR-01-4856
Linear Scaling Electronic Structure
Methods, IPAM, UCLA Spring 2002
Matt Challacombe, Theoretical Division
Overview
  • MondoSCF
  • Full O(N) examples
  • Nearsightedness and the scalability of O(N) methods
  • Data locality
  • Scalable algorithms
  • Data structures
  • Tiling and dynamic load balancing
2
The Nearsighted Principle and Linear Scaling
  • In a local basis φ, quantum effects are short-ranged
    for non-metallic systems
  • Locality manifests as approximately exponential
    decay of P_AB with atomic separation
  • Locality of P may be exploited to achieve O(N)
    algorithms for SCF theory
  • The non-local Coulomb problem can be overcome
    with multiscale methods
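As a toy illustration of why exponential decay leads to O(N) algorithms, the sketch below models the density matrix of a 1-D chain of N atoms as P_AB = exp(-|A-B|/ξ), where the decay length ξ and the threshold τ are hypothetical values, not MondoSCF parameters. Dropping elements below τ leaves a fixed number of elements per row, so the number kept grows linearly with N while the dense size grows as N².

```python
import math

def model_density(n, xi=1.5):
    """Toy density matrix for a 1-D chain of n atoms: P[a][b] = exp(-|a - b| / xi).

    The decay length xi is a hypothetical parameter. Building the full n x n
    matrix is itself O(n^2); a real O(N) code never forms the dropped elements.
    """
    return [[math.exp(-abs(a - b) / xi) for b in range(n)] for a in range(n)]

def count_kept(p, tau=1e-6):
    """Number of elements that survive thresholding (|P_ab| >= tau)."""
    return sum(1 for row in p for x in row if abs(x) >= tau)

for n in (50, 100, 200):
    kept = count_kept(model_density(n))
    print(n, kept, n * n)  # kept elements vs. dense matrix size
```

With these assumed values each interior row keeps 41 elements (|A-B| ≤ 20), so going from N = 100 to N = 200 adds 41 × 100 kept elements while the dense matrix quadruples.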

MondoSCF
3
(No Transcript)
4
MondoSCF Capabilities, Current and Future
(Legend: recently accomplished, unpublished; currently in
development at LANL; currently in development at LLNL)
  • O(N) periodic boundary conditions, C.J. Tymczak
    (LANL)
  • O(N) exact exchange, Eric Schwegler (LLNL)
  • Parallel Fock builds and space filling curves,
    C.K. Gan (LANL)
  • O(N) methods for QM/MM, internal coordinate
    approaches to transition state and geometry
    optimization, Karoly Nemeth (LANL)

5
MondoSCF
6
Sequestration of Carbon Dioxide (Neil Henson and
MC)
  • Lizardite (L) is Mg3Si2O5(OH)4
  • L-OH + H2CO3 → L-HCO3 + H2O
  • Can carbon dioxide dissolved in water carbonate
    the mineral surface?

7
QM/MM Crambin in a Water Droplet (Karoly Nemeth
and MC)
MM from the F90 Dynamo libraries (M. Field et
al.). Seamless QM/MM electrostatics via QCTC:
one simplified density.
8
Additional Thoughts on the Promise of Linear
Scaling
We have sorted out most of the "details" and have a
fully O(N) ab initio code, but...
  • Even with O(N) scaling, going large is costly, and
    size alone is not enough to solve many real-world
    problems.
  • Size matters, but so does speed.
  • Scalable parallel algorithms are needed to realize
    the promise of O(N) ab initio methods.
  • This is challenging because, to scale, five
    leading-edge methods must play well together in
    parallel.
  • We need good, globally applicable paradigms.

9
Scalability of O(N) Methods
  • Locality
      • The nearsighted principle suggests that
        communication costs can be reduced to O(1) with
        respect to p/N
  • Load balance
      • Domain decomposition after locality enhancement
      • Leads to irregular work loads
      • Requires
          • good data structures
          • good algorithms
          • dynamic load balancing

10
Data Locality in Sparse Matrix Algebra
  • Eliminate all-to-all communication
  • Reduce sparse matrices to banded form with
    graph theoretical or geometric methods
  • Radial cutoffs
      • Identical graphs
      • Good for uniform systems (e.g., Si)
      → Graph theoretical methods
  • Thresholding
      • General, transferable
      • Dissimilar graphs
      • Good for inhomogeneous systems
      • Can exploit incremental matrices
      → Geometric methods
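Graph theoretical reordering works directly on the sparsity graph of a matrix. The sketch below uses Reverse Cuthill-McKee (RCM), one classical bandwidth-reducing algorithm, purely as a generic example; it is an assumption here, not necessarily the method used in MondoSCF. Because RCM depends on the graph, it only serves all matrices at once when they share the same graph, as with radial cutoffs.

```python
from collections import deque

def bandwidth(adj, order):
    """Max |pos(i) - pos(j)| over edges (i, j) for a given vertex ordering."""
    pos = {v: k for k, v in enumerate(order)}
    return max((abs(pos[i] - pos[j]) for i in adj for j in adj[i]), default=0)

def reverse_cuthill_mckee(adj):
    """RCM ordering of an undirected graph given as {vertex: set(neighbours)}.

    Each connected component is traversed breadth-first from a minimum-degree
    vertex, visiting neighbours in order of increasing degree; the final
    ordering is reversed.
    """
    visited, order = set(), []
    for start in sorted(adj, key=lambda v: len(adj[v])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v] - visited, key=lambda u: len(adj[u])):
                visited.add(w)
                queue.append(w)
    return order[::-1]

# A chain of 10 atoms whose labels were scrambled (hypothetical data).
path = [7, 2, 9, 0, 5, 3, 8, 1, 6, 4]
adj = {v: set() for v in range(10)}
for a, b in zip(path, path[1:]):
    adj[a].add(b)
    adj[b].add(a)

print(bandwidth(adj, list(range(10))))            # 9: natural label order
print(bandwidth(adj, reverse_cuthill_mckee(adj)))  # 1: banded after RCM
```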

11
Geometric Ordering: Space Filling Curves (SFCs)
  • SFCs map points in space onto a line so that
    points close in space tend to be close on the line.
  • Loopback problem: this is not always
    true.
  • Decay of matrix elements with atom-atom
    separation should then lead to banded matrices.

Loopback problem with Hilbert (H) and Morton (Z)
orderings: points close in space can also be far
apart on the line. Example: 2-D Z curve.
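The loopback problem can be quantified on a small grid. This illustrative sketch (not MondoSCF code) computes 2-D Morton (Z) keys by bit interleaving, with the bits of x assumed to go to even positions and those of y to odd positions, and finds the largest jump along the curve between spatial nearest neighbours on an 8 x 8 grid.

```python
def morton2d(x, y, bits):
    """Z-order index: interleave the bits of x (even positions) and y (odd)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def max_neighbour_gap(bits):
    """Largest distance along the Z curve between two grid points that are
    nearest neighbours in space (a measure of loopback)."""
    n = 1 << bits
    gap = 0
    for x in range(n):
        for y in range(n):
            k = morton2d(x, y, bits)
            if x + 1 < n:
                gap = max(gap, abs(morton2d(x + 1, y, bits) - k))
            if y + 1 < n:
                gap = max(gap, abs(morton2d(x, y + 1, bits) - k))
    return gap

print(morton2d(0, 3, 3), morton2d(0, 4, 3))  # 10 32: neighbours in space
print(max_neighbour_gap(3))                   # 22 positions apart on a 64-point line
```

The worst case occurs across the horizontal midline of the grid, where the Z curve finishes the lower half before entering the upper half.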
12
Bandwidth Reducing (BWR) Curves (C. K. Gan and MC)
Hilbert and BWR orderings for the overlap matrix
of (H2O)350
  • BWR curves
      • Give up heuristics and self-similarity
      • Avoid loopback, gain locality

[Figure: BWR and Hilbert orderings compared; axis label: Frequency]
13
Scalable Algorithms for Electrostatics
For a fixed error, all methods are O(N lg N).
  • Fast Multipole Method: tree-to-tree interactions
    cannot be load balanced
  • Particle Mesh Ewald: FFTs require all-to-all
    communication; hard to treat mixed boundary
    conditions (e.g., 2-D periodic)
  • Tree Code: teraflop performance achieved (M. Warren
    at LANL); uses space filling curves in domain
    decomposition
14
Scalability of O(N) Algorithms for Exact Exchange
  • ONX
      • No permutational symmetry
      • Density driven
      → Maintains data locality of K and P
  • SONX/LinK
      • Permutational symmetry
      • Serial version 1 to 2 times faster than ONX
      • Integral driven
      → Global communication of K and P

15
Data Structures for Early-Onset O(N) and Support of
Parallelism
Trees, trees and more trees!
Fast matrix data structure
  • Row-wise linked list, column-wise k-d tree
  • Leaf nodes contain dense atom-atom blocks
  • Supports skip pointers for efficient leaf-wise
    traversal (no recursion)
  • O(lg N) overhead; useful for summing
    fragment matrices over sub-volumes and from
    different processors
  • Compare with the CSR format, which is O(N)

16
k-d trees for Fast Sparse Matrix Access
  • Skip pointers on
      • Fast access of leaf nodes
      • No recursion

[Figure: k-d tree over O-H, H-H and O-O atom-atom blocks]
  • Skip pointers off
      • Fast recursive searching, insertion, deletion, etc.

[Figure: k-d tree over O-H, H-H and O-O atom-atom blocks]
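The skip-pointer idea can be sketched on a one-dimensional binary tree over the column indices of a single matrix row. This is a hypothetical simplification (plain recursive bisection over sorted keys, no dense atom-atom blocks), not the MondoSCF data structure: each node's skip pointer names the next node to visit once its subtree is finished, so leaves are walked, and non-overlapping subtrees pruned, with a loop instead of recursion. Only the build step is recursive.

```python
class Node:
    """Node of a binary tree over sorted keys (e.g. column indices in one row)."""
    __slots__ = ("lo", "hi", "left", "right", "skip", "key")

    def __init__(self, lo, hi, skip):
        self.lo, self.hi = lo, hi       # smallest/largest key in this subtree
        self.left = self.right = None   # children; both None for a leaf
        self.skip = skip                # next node to visit after this subtree
        self.key = None                 # stored key (leaves only)

def build(keys, skip=None):
    """Recursive bisection of a sorted, non-empty key list; wires skip pointers."""
    node = Node(keys[0], keys[-1], skip)
    if len(keys) == 1:
        node.key = keys[0]
    else:
        mid = len(keys) // 2
        node.right = build(keys[mid:], skip)       # after the right half: parent's skip
        node.left = build(keys[:mid], node.right)  # after the left half: the right half
    return node

def leaves(root):
    """Visit every leaf left to right with a plain loop: no recursion, no stack."""
    out, node = [], root
    while node is not None:
        if node.left is None:   # leaf: record it, then jump past it
            out.append(node.key)
            node = node.skip
        else:                   # internal node: descend
            node = node.left
    return out

def range_query(root, lo, hi):
    """Keys in [lo, hi]; subtrees whose key range misses the query are skipped."""
    out, node = [], root
    while node is not None:
        if node.hi < lo or node.lo > hi:
            node = node.skip
        elif node.left is None:
            out.append(node.key)
            node = node.skip
        else:
            node = node.left
    return out

root = build([0, 3, 4, 7, 9, 12, 15])
print(leaves(root))              # [0, 3, 4, 7, 9, 12, 15]
print(range_query(root, 4, 12))  # [4, 7, 9, 12]
```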
17
Hierarchical Representation of the Density
ρ-tree
  • Hierarchical representation of the density using
    the k-d tree data structure allows for very
    efficient range queries.
  • Recursive bisection on position, width and
    magnitude
  • Range queries enable rapid access of all density
    elements with "overlap".
  • Implemented in F95 with recursive subroutines and
    doubly linked lists using pointers.

[Figure: density tree showing bounding boxes and leaf nodes]
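A minimal sketch of a box-based range query over density elements, assuming each element is summarized by an axis-aligned bounding box (its centre plus or minus a hypothetical extent). Unlike the tree described above, it bisects on position only (not on width or magnitude) and is written in Python rather than F95; the pruning step uses nothing but box-box overlap tests.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Box = Tuple[Vec, Vec]  # (lower corner, upper corner)

def box_of(centre: Vec, extent: float) -> Box:
    """Bounding box of an element with a hypothetical extent around its centre."""
    return (tuple(c - extent for c in centre), tuple(c + extent for c in centre))

def overlap(a: Box, b: Box) -> bool:
    """Axis-aligned box-box overlap test."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))

def union(a: Box, b: Box) -> Box:
    return (tuple(min(a[0][k], b[0][k]) for k in range(3)),
            tuple(max(a[1][k], b[1][k]) for k in range(3)))

@dataclass
class BoxNode:
    box: Box
    index: Optional[int] = None                          # element index (leaf only)
    kids: Optional[Tuple["BoxNode", "BoxNode"]] = None   # children (internal only)

def build(items: List[Tuple[int, Box]]) -> BoxNode:
    """Recursive bisection on box centres along the axis of largest spread."""
    if len(items) == 1:
        return BoxNode(box=items[0][1], index=items[0][0])

    def centre(item, k):
        return 0.5 * (item[1][0][k] + item[1][1][k])

    axis = max(range(3), key=lambda k: max(centre(it, k) for it in items)
                                       - min(centre(it, k) for it in items))
    items = sorted(items, key=lambda it: centre(it, axis))
    mid = len(items) // 2
    left, right = build(items[:mid]), build(items[mid:])
    return BoxNode(box=union(left.box, right.box), kids=(left, right))

def query(node: BoxNode, box: Box, out: List[int]) -> List[int]:
    """Append indices of all elements whose boxes overlap `box`."""
    if not overlap(node.box, box):
        return out  # prune: nothing below this node can overlap
    if node.kids is None:
        out.append(node.index)
    else:
        for kid in node.kids:
            query(kid, box, out)
    return out

# Eight hypothetical elements along the x axis, each of extent 0.4.
elements = [(i, box_of((float(i), 0.0, 0.0), 0.4)) for i in range(8)]
tree = build(elements)
print(sorted(query(tree, box_of((3.0, 0.0, 0.0), 0.7), [])))  # [2, 3, 4]
```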
18
k-d Tree Data Structures Allow Early-Onset Linear
Scaling
  • Use tests involving only box-box overlap
  • Very fast access of minimally essential data
  • True linear scaling for 3-D systems

[Figure: building a hierarchical grid for exchange-correlation
cubature. Panels: cube with integration grid and bounding box;
evaluate density on cube grid; error in each leaf cube < τ;
Coulomb sums with a tree code over the density tree.]
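HiCu-style refinement (bisecting cells until the error estimate in every leaf falls below a tolerance, here called τ) can be sketched with a one-dimensional analogue. This is an illustrative assumption, not HiCu itself: cells are intervals rather than cubes, the "density" is a Gaussian, and the per-cell error estimate is the difference between one Simpson's-rule panel and two half-width panels.

```python
import math

def simpson(f, a, b):
    """Simpson's rule on one panel [a, b]."""
    m = 0.5 * (a + b)
    return (b - a) / 6.0 * (f(a) + 4.0 * f(m) + f(b))

def refine(f, a, b, tau, depth=0, max_depth=30):
    """Bisect [a, b] until each leaf cell's error estimate is below tau.

    Error estimate: |two half-panels - one full panel|.
    Returns (list of leaf cells, integral over [a, b]).
    """
    m = 0.5 * (a + b)
    whole = simpson(f, a, b)
    halves = simpson(f, a, m) + simpson(f, m, b)
    if abs(halves - whole) < tau or depth >= max_depth:
        return [(a, b)], halves
    left_cells, left_sum = refine(f, a, m, tau, depth + 1, max_depth)
    right_cells, right_sum = refine(f, m, b, tau, depth + 1, max_depth)
    return left_cells + right_cells, left_sum + right_sum

def rho(x):
    return math.exp(-x * x)  # a hypothetical 1-D "density"

cells, total = refine(rho, -5.0, 5.0, tau=1e-8)
exact = math.sqrt(math.pi) * math.erf(5.0)
print(len(cells), total, abs(total - exact))
```

The resulting cells are small where the density varies rapidly and large in the tails, which is the property that makes an adaptive grid cheaper than a uniform one.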
19
The HiCu grid for RB3LYP/6-31G (H2O)70
Early onset of O(N) for RB3LYP/6-31G (H2O)N
20
Paradigms for Irregular Parallel Computation
  • ORB (orthogonal recursive bisection) decomposition
      • Recursive bisection to generate (ideally)
        equivalent units of work
  • Tiling
      • Tethering data to a spatial neighborhood and
        processor
      • Limited overlap of data between processors
        allows work to move between neighboring
        processors while retaining locality

[Figure legend: ideal work performed with all volumes
equal (ideal case); additional work that can be
performed to achieve balance; locally essential data
supporting encompassed volumes]
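A minimal sketch of orthogonal recursive bisection (ORB), assuming a per-atom work estimate is available (the weights below are hypothetical) and that the number of domains is a power of two. Each cut is placed at the weighted median along the axis of largest spread, so domains hold equal estimated work rather than equal numbers of atoms; tiling is not modelled.

```python
def orb(points, work, nparts):
    """Orthogonal recursive bisection into nparts domains of ~equal work.

    points: list of (x, y, z); work: hypothetical per-point work estimates;
    nparts: a power of two, assumed <= len(points).
    Returns a list of domains, each a list of point indices.
    """
    def split(idx, parts):
        if parts == 1:
            return [idx]
        # Cut perpendicular to the coordinate axis with the largest spread.
        axis = max(range(3), key=lambda k: max(points[i][k] for i in idx)
                                           - min(points[i][k] for i in idx))
        idx = sorted(idx, key=lambda i: points[i][axis])
        half = 0.5 * sum(work[i] for i in idx)
        acc, cut = 0.0, 0
        # Advance the cut while the left side stays at or below half the work.
        while cut < len(idx) - 1 and acc + work[idx[cut]] <= half:
            acc += work[idx[cut]]
            cut += 1
        cut = max(cut, 1)  # keep both halves non-empty
        return split(idx[:cut], parts // 2) + split(idx[cut:], parts // 2)

    return split(list(range(len(points))), nparts)

# Sixteen atoms on a line; the last four are assumed to cost three times as much.
pts = [(float(i), 0.0, 0.0) for i in range(16)]
w = [1.0] * 12 + [3.0] * 4
domains = orb(pts, w, 4)
print([len(d) for d in domains])                # [6, 6, 2, 2]
print([sum(w[i] for i in d) for d in domains])  # [6.0, 6.0, 6.0, 6.0]
```

The unequal domain sizes show how balancing estimated work produces the irregular decompositions described above.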
21
Paradigms for Irregular Parallel Computation
  • Space filling curves for atom ordering
      • BWR curve yields banded matrices for 3-D systems
      • One curve works for all matrices S, F, P, Z
  • SFCs for volume ordering and decomposition
      • Sectioning yields a locality-preserving
        decomposition

[Figure: ideal work-load decomposition; work
distribution possible with tiling]
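Sectioning along a space filling curve can be sketched as follows, assuming 2-D integer cell coordinates, a Morton (Z) curve standing in for the curves named above, and a hypothetical per-cell work estimate. Cells are sorted by curve index and the curve is cut into contiguous pieces of roughly equal cumulative work, so each processor receives a spatially compact set of cells.

```python
def morton2d(x, y, bits=16):
    """Z-order (Morton) key: bits of x go to even positions, bits of y to odd."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def sfc_sections(cells, work, nproc, bits=16):
    """Order cells along the Z curve, then cut the curve into nproc contiguous
    sections of roughly equal work (cumulative-work sectioning)."""
    order = sorted(range(len(cells)),
                   key=lambda i: morton2d(cells[i][0], cells[i][1], bits))
    total = sum(work)
    sections, current, acc = [], [], 0.0
    for i in order:
        current.append(i)
        acc += work[i]
        # Close a section once the running total reaches the next multiple of total/nproc.
        if len(sections) < nproc - 1 and acc >= total * (len(sections) + 1) / nproc:
            sections.append(current)
            current = []
    sections.append(current)
    return sections

cells = [(x, y) for x in range(4) for y in range(4)]
secs = sfc_sections(cells, [1.0] * len(cells), nproc=4)
for s in secs:
    print(sorted(cells[i] for i in s))  # each section is one 2 x 2 quadrant
```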
22
Why Static Load Balancing Does Not Scale
  • Example: static load balancing that attempts to
    distribute the non-zero elements of a target
    matrix (F or P) evenly
  • One decomposition does not fit all (multiplications
    involving disparate graphs).
  • Requires global algorithms that do not scale (e.g.,
    for solving the bin-packing problem).

23
Paradigms for Irregular Parallel Computation
  • Diffusion-based load balancing
      • Distributed method involving only local
        communication between neighbors
      • Hydrodynamics-like: work percolates between
        processors with unequal loads
      • Proven scalability
      • Does not require exact work-load estimation
      • Data locality enabled with tiling
      • The choice of sender- vs. receiver-initiated
        diffusion depends on granularity

[Figure: estimated work load on processors 0 to 3
relative to a low water mark, and redistribution of
work load between processors 0 and 1 with
receiver-initiated diffusion]
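A minimal sketch of first-order diffusion load balancing, assuming loads are divisible real numbers, a chain of four processors, and a hypothetical diffusion coefficient alpha. It shows only the basic symmetric scheme; the sender- or receiver-initiated variants and the low water mark are not modelled.

```python
def diffuse(loads, neighbours, alpha=0.25, steps=1):
    """First-order diffusion load balancing.

    Each step, processor i exchanges alpha * (load_j - load_i) with every
    neighbour j, so only nearest-neighbour communication is needed and the
    total load is conserved. alpha <= 1 / (2 * max degree) keeps it stable.
    """
    loads = list(loads)
    for _ in range(steps):
        flow = [alpha * sum(loads[j] - loads[i] for j in neighbours[i])
                for i in range(len(loads))]
        loads = [l + f for l, f in zip(loads, flow)]
    return loads

chain = [[1], [0, 2], [1, 3], [2]]  # processors 0-1-2-3 in a line
print(diffuse([40.0, 0.0, 0.0, 0.0], chain))             # [30.0, 10.0, 0.0, 0.0]
print(diffuse([40.0, 0.0, 0.0, 0.0], chain, steps=100))  # close to 10.0 each
```

Work spreads one neighbour per step, which is the percolation behaviour described above; no processor ever needs the global load distribution.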
24
Summary
  • The promise of O(N) ab initio methods will only
    be realized with scalable parallelism.
  • Data locality is central to achieving scaling
    with p/N.
  • This competes with achieving a uniform work load.
  • We are developing and extending globally
    applicable strategies for irregular parallel
    computation to achieve scalability and
    interoperability of the following O(N)
    algorithms:
      • Exchange-Correlation (HiCu)
      • Exact Exchange (ONX)
      • Coulomb Summation (QCTC)
      • SCF Equations (SDMM)
      • Orthogonalization (BlokAINV)
