Large scale simulations of branched Si-nanowires

1
Large scale simulations of branched Si-nanowires
  • Madhu Menon (U. Kentucky),
  • Ernst Richter (Germany),
  • Ingyu Lee, Keita Teranishi and
  • Padma Raghavan (Penn State)

2
Contents
  • Background
  • Computational Methods
  • Generalized Tight-Binding Molecular Dynamics
  • Eigenvalue computation
  • Generalized Eigenvalue Problems
  • Parallel Dense Algorithm using ScaLAPACK
  • Spectrum Slicing for Sparse Matrix
  • Experimental Results
  • Conclusion

3
Nanowire
  • Nanowires
  • Popular because they afford tailoring of their electronic
    properties through selective doping.
  • Nanowires with branching: nanotrees
  • Could be used for controlling the switching
    mechanism, power gain, or other transistor
    applications.
  • Nanowires from Si
  • A natural extension of silicon technology to the
    nanoscale.
  • Enable integration of nanoscale devices into
    traditional large-scale silicon electronic
    technology.

4
Computational Methods
Parameterized
  • Classical MD
  • Newtonian mechanics
  • Tight-Binding MD
  • Semiempirical
  • Ab initio (first-principles)
  • Schrödinger equation
  • Born-Oppenheimer
  • Car-Parrinello

Classical MD is not accurate enough, and ab initio
computations are not feasible.
Ab initio end of the scale: accurate, high computational cost
Classical MD end of the scale: less accurate, computationally cheap
5
Schrödinger Equations
  • Describes atoms as a collection of quantum-mechanical
    particles (nuclei and electrons) governed by the
    Schrödinger equation
  • Born-Oppenheimer approximation
  • Electronic degrees of freedom follow the
    corresponding nuclear positions.

6
Generalized Tight-Binding MD (Menon and
Subbaswamy)
  • Total energy
  • One-electron band-structure energy
  • Repulsive pair potential that depends on the
    interatomic distance
  • U_bond is a constant that merely shifts the zero of
    energy
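The equations on this slide did not survive in the transcript. A hedged reconstruction of the general form, consistent with the three terms listed above, is given here; the symbols (\epsilon_k for the one-electron energies, \chi for the pair potential, r_{ij} for the distance between atoms i and j) are notation assumed for illustration, not taken from the slide.

```latex
U_{\mathrm{tot}} = U_{\mathrm{el}} + U_{\mathrm{rep}} + U_{\mathrm{bond}},
\qquad
U_{\mathrm{el}} = \sum_{k}^{\mathrm{occ}} \epsilon_k,
\qquad
U_{\mathrm{rep}} = \sum_{i} \sum_{j > i} \chi(r_{ij})
```

Here the sum in U_el runs over occupied one-electron levels, and U_bond is the constant energy shift named in the last bullet.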

7
Generalized Tight-Binding MD (Menon and
Subbaswamy)
  • Construct the Hamiltonian
  • Nonorthogonal basis of atomic orbitals
  • Hamiltonian and overlap matrix elements
  • One-electron energies are obtained by solving the
    generalized eigenvalue equation
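The equation itself is missing from the transcript. In a nonorthogonal basis of atomic orbitals \phi_i, the generalized eigenvalue equation usually takes the form below; the symbols \epsilon_n (one-electron energies) and c^n_j (expansion coefficients) are assumed notation.

```latex
\sum_{j} \left( H_{ij} - \epsilon_n S_{ij} \right) c^{n}_{j} = 0,
\qquad
H_{ij} = \langle \phi_i | \hat{H} | \phi_j \rangle,
\qquad
S_{ij} = \langle \phi_i | \phi_j \rangle
```

For an orthogonal basis S reduces to the identity and this becomes a standard eigenvalue problem.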

8
Generalized Tight-Binding MD (Menon and
Subbaswamy)
  • Obtaining H_ij and S_ij
  • Nonorthogonality between two sp3 hybrids.

9
Generalized Tight-Binding MD (Menon and
Subbaswamy)
  • Advantages of semiempirical TBMD
  • The system is still described in a quantum-mechanical
    manner, while the computational effort is kept
    small, since a minimal basis is used and the
    interaction matrix elements can be
    parameterized.
  • GTBMD
  • Computationally efficient because the Hamiltonian,
    H(R_I), can be parameterized.
  • Transferable parameterization scheme that explicitly
    includes the effects of the nonorthogonality of
    the atomic basis.
  • Finds an energy-minimized structure of the
    nanoscale system under consideration without
    symmetry constraints.

10
Generalized Tight-Binding MD (Menon and
Subbaswamy)
  • Advantages
  • Allows for full relaxation of covalent systems
    with no symmetry constraints.
  • Disadvantages
  • Computationally expensive
  • Each iteration requires at least half of the
    eigenvalues and eigenvectors.
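Why half of the spectrum is needed can be sketched as follows: the band-structure energy sums over occupied levels only, so a solver can be asked for just the lowest eigenpairs. This is a minimal sketch using SciPy's dense solver on random test matrices; the function name `occupied_band_energy`, the half-filling assumption (one electron per orbital, two per level), and the test matrices are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.linalg import eigh

def occupied_band_energy(H, S, n_electrons):
    """Sum of occupied one-electron energies (assumes 2 electrons per level)."""
    n_occ = n_electrons // 2
    # Ask the solver only for the lowest n_occ eigenpairs of H c = eps S c.
    eps, C = eigh(H, S, subset_by_index=[0, n_occ - 1])
    return 2.0 * eps.sum(), eps, C

# Hypothetical small test problem: symmetric H, symmetric positive definite S.
rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
H = (M + M.T) / 2
R = rng.standard_normal((n, n))
S = np.eye(n) + 0.05 * (R @ R.T)

# Half filling: n electrons in n orbitals, so n/2 levels are occupied.
E_band, eps, C = occupied_band_energy(H, S, n_electrons=n)
print(E_band)
```

The returned eigenvectors are S-orthonormal, which is the normalization a nonorthogonal tight-binding scheme would typically expect.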

11
Generalized Tight-Binding MD (Menon and
Subbaswamy)
  • Goal: investigate the stability of branched nanowires
    made of Si atoms using the generalized tight-binding
    molecular dynamics (GTBMD) scheme of Menon and
    Subbaswamy.

12
Contents
  • Background
  • Computational Methods
  • Generalized Tight-Binding Molecular Dynamics
  • Eigenvalue computation
  • Generalized Eigenvalue Problems
  • Parallel Dense Algorithm using ScaLAPACK
  • Spectrum Slicing for Sparse Matrix
  • Experimental Results
  • Conclusion

13
Symmetric Eigenvalue Problems
  • Solving the Schrödinger equation
  • In general, expressed as a linear eigenvalue equation
  • AX = BXΛ
  • A is symmetric, B is symmetric positive
    definite.
  • Λ is a diagonal matrix (eigenvalues), X is the set of
    eigenvectors.
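Because B is symmetric positive definite, the generalized problem can be reduced to a standard symmetric one through a Cholesky factorization B = LL^T. The sketch below shows that standard reduction with SciPy; it is an illustration under that assumption, and the function name and random test matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, eigh

def generalized_eigh_via_cholesky(A, B):
    """Solve A X = B X diag(lam) by reducing to C Z = Z diag(lam)."""
    L = cholesky(B, lower=True)                  # B = L L^T
    Y = solve_triangular(L, A, lower=True)       # Y = L^{-1} A
    C = solve_triangular(L, Y.T, lower=True)     # C = L^{-1} A L^{-T}
    C = (C + C.T) / 2                            # remove roundoff asymmetry
    lam, Z = eigh(C)                             # standard symmetric problem
    X = solve_triangular(L, Z, lower=True, trans="T")  # X = L^{-T} Z
    return lam, X

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
R = rng.standard_normal((n, n))
B = np.eye(n) + 0.1 * (R @ R.T)

lam, X = generalized_eigh_via_cholesky(A, B)
print(lam)
```

The eigenvectors returned this way satisfy X^T B X = I, since Z is orthogonal.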

14
Steps for Solving Eigenvalue Problems
  • Consists of three steps
  • Tridiagonalize the matrix by an orthogonal
    transformation
  • T = QAQ^T, where Q is orthogonal
  • Compute the eigenvalues and eigenvectors of the
    tridiagonal matrix T
  • QR iteration
  • Bisection and inverse iteration
  • Divide and conquer
  • Relatively Robust Representations (RRR)
  • Compute the eigenvectors of A using Q
  • Apply the orthogonal factor Q to each
    eigenvector
  • Finding eigenvectors is much more expensive than
    finding eigenvalues.
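The three steps can be sketched with SciPy building blocks. This is an illustration rather than the LAPACK driver itself: `scipy.linalg.hessenberg` is used as a stand-in tridiagonalizer (for a symmetric input the Hessenberg form is tridiagonal up to roundoff), and SciPy's convention is A = Q T Q^T, so its Q is the transpose of the Q in T = QAQ^T above.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

def symmetric_eig_three_steps(A):
    # Step 1: orthogonal reduction, A = Q T Q^T with T tridiagonal.
    T, Q = hessenberg(A, calc_q=True)
    d = np.diag(T).copy()          # main diagonal of T
    e = np.diag(T, k=-1).copy()    # sub-diagonal of T
    # Step 2: eigenpairs of the tridiagonal matrix T.
    lam, Y = eigh_tridiagonal(d, e)
    # Step 3: back-transform the eigenvectors of T to eigenvectors of A.
    X = Q @ Y
    return lam, X

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
lam, X = symmetric_eig_three_steps(A)
print(lam)
```

Step 3 is a dense matrix product, which is one reason computing eigenvectors costs noticeably more than computing eigenvalues alone.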

15
Tridiagonalization
  • A tridiagonal matrix is simple to handle and has
    many useful properties for finding
    eigenvalues and eigenvectors.
  • Householder transformations are used
  • Cache efficient
  • Destroys the sparsity if the matrix is sparse.
  • If A is banded, Givens rotations are used
  • Saves space and operations
  • Not cache efficient
  • Usually slower than the dense method.
  • Both methods need to compute and store another
    dense matrix Q if eigenvectors are computed.

16
Inverse iteration
  • Finds eigenvectors once the eigenvalues are known
  • Solve (A - λI)v = x, where λ is a computed eigenvalue
  • Repeating this operation, v converges to an
    eigenvector corresponding to λ.
  • In a typical direct method, A is tridiagonal
  • Then, back-transformation with Q is performed to obtain
    the eigenvectors of A.
  • If A is a sparse matrix, no back-transformation
    is needed
  • Inverse iteration is implemented using a sparse
    LDL^T factorization.
  • No need to store and update Q.
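A minimal sketch of sparse inverse iteration follows. SciPy does not expose a sparse LDL^T factorization, so this sketch substitutes a sparse LU factorization (`splu`) for it; the test matrix (a 1D Laplacian with known eigenvalues), the small perturbation of the shift, and the function name are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def inverse_iteration(A, shift, iters=5, seed=0):
    """Eigenvector of sparse A for the eigenvalue closest to `shift`."""
    n = A.shape[0]
    # Factor (A - shift*I) once; every iteration reuses the factors.
    lu = splu((A - shift * sp.identity(n, format="csc")).tocsc())
    v = np.random.default_rng(seed).standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = lu.solve(v)            # solve (A - shift*I) v_new = v_old
        v /= np.linalg.norm(v)
    return v

n = 200
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
k = 3
lam = 2.0 - 2.0 * np.cos(k * np.pi / (n + 1))   # exact k-th eigenvalue
# Shift slightly off the eigenvalue so the factored matrix is not singular.
v = inverse_iteration(A, lam + 1e-8)
residual = np.linalg.norm(A @ v - lam * v)
print(residual)
```

Each eigenvector needs its own shift and therefore its own factorization, while no orthogonal factor Q has to be stored.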

17
Data Distribution for ScaLAPACK
  • Column Block Cyclic
  • Generally works well.
  • Does not need to gather eigenvectors.
  • 2D Block Cyclic
  • Faster for dense matrices
  • Eigenvectors must be gathered after the computation.

Column block cyclic (column blocks, left to right):
P1 P2 P3 P4 P1 P2 P3 P4

2D block cyclic (2 x 2 process grid):
P1 P2 P1 P2
P3 P4 P3 P4
P1 P2 P1 P2
P3 P4 P3 P4
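The two layouts in the figure can be expressed as owner maps from block indices to processes. The sketch below assumes four processes, 0-based block indices, and row-major numbering of the 2 x 2 process grid; the function names are hypothetical and this is not ScaLAPACK's own API.

```python
def owner_column_block_cyclic(block_col, n_procs):
    """Process (0-based) owning a column block in a 1D cyclic layout."""
    return block_col % n_procs

def owner_2d_block_cyclic(block_row, block_col, grid_rows, grid_cols):
    """Process (0-based, row-major grid) owning block (block_row, block_col)."""
    return (block_row % grid_rows) * grid_cols + (block_col % grid_cols)

# Eight column blocks over four processes.
cols = ["P%d" % (owner_column_block_cyclic(j, 4) + 1) for j in range(8)]

# A 4 x 4 grid of blocks over a 2 x 2 process grid.
grid = [["P%d" % (owner_2d_block_cyclic(i, j, 2, 2) + 1) for j in range(4)]
        for i in range(4)]

print(" ".join(cols))
for row in grid:
    print(" ".join(row))
```

In the column layout each process holds whole columns, which is why computed eigenvectors do not have to be gathered afterwards.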
18
Solution Time using ScaLAPACK
  • Based on column block cyclic distribution
  • 2.4 GHz, 8 GB memory

19
Matrix Nonzero Pattern
Panels: Si40_stem_1776, Si40_tree_1872
The Si_tree matrix is close to block diagonal.
20
Problem Dimension
Matrix              Dimension   Nonzeros   Ratio (Dense/Sparse)
Si34_stem_1789          7,156    367,084    92.4
Si34_tree_2386          9,544    623,172    96.9
Si40_stem_1776          7,104    411,263    81.3
Si40_tree_1872          8,232    449,781    99.8
Si40_stem_2058          7,488    423,314    87.7
Si46_stem_2198          8,792    459,320   111
C46_tree_2564          10,256    685,322   101
Tetra2_tree_2352        9,408    543,076   108

Integer: 4 bytes, double: 8 bytes
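The ratio column can be checked from the byte sizes given on the slide. The sketch below assumes the dense matrix is stored as N^2 doubles and the sparse matrix in compressed row form with NNZ + N + 1 integers and NNZ doubles; under those assumptions the first row comes out at about 92.4. The function names are illustrative.

```python
INT_BYTES = 4      # integer size stated on the slide
DOUBLE_BYTES = 8   # double size stated on the slide

def dense_bytes(n):
    """Full n x n matrix of doubles."""
    return DOUBLE_BYTES * n * n

def csr_bytes(n, nnz):
    """Compressed row storage: n+1 row pointers, nnz column indices, nnz values."""
    return INT_BYTES * (nnz + n + 1) + DOUBLE_BYTES * nnz

def memory_ratio(n, nnz):
    return dense_bytes(n) / csr_bytes(n, nnz)

# First table row: dimension 7,156 with 367,084 nonzeros.
ratio = memory_ratio(7156, 367_084)
print(round(ratio, 1))
```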
21
Contents
  • Background
  • Computational Methods
  • Generalized Tight-Binding Molecular Dynamics
  • Eigenvalue computation
  • Generalized Eigenvalue Problems
  • Parallel Dense Algorithm using ScaLAPACK
  • Spectrum Slicing for Sparse Matrix
  • Experimental Results
  • Conclusion

22
Sparse Method: All Eigenpairs
  • Transform A into a band matrix, B.
  • Tridiagonalize B by Givens rotations
  • Q is not computed.
  • Find all eigenvalues of T with any method
  • QR, bisection, divide and conquer, RRR.
  • Find all eigenvectors of A by sparse inverse
    iteration, using the eigenvalues obtained in the
    previous step.

23
Band Ordering
  • Use the Reverse Cuthill-McKee (RCM) algorithm to
    transform A into a band matrix.
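A small sketch of band ordering with SciPy's RCM implementation: a tridiagonal matrix is scrambled by a random symmetric permutation, and RCM then finds an ordering with a much smaller bandwidth. The test matrix and the `bandwidth` helper are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Largest |i - j| over the stored nonzeros of A."""
    coo = sp.coo_matrix(A)
    return int(np.max(np.abs(coo.row - coo.col))) if coo.nnz else 0

n = 50
T = sp.diags([1.0, 4.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Scramble rows and columns with the same random permutation.
p = np.random.default_rng(0).permutation(n)
A = T[p][:, p].tocsr()

# RCM returns a permutation that clusters the nonzeros near the diagonal.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)
B = A[perm][:, perm]

print(bandwidth(A), bandwidth(B))
```

The permutation is applied symmetrically, so B has the same eigenvalues as A.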

24
Row Compressed Sparse Matrix Format
  • Row compressed
  • N+1 row pointers
  • NNZ column indices
  • NNZ nonzero values
  • Memory Usage
  • Dense
  • Double: N^2
  • Sparse
  • Integer: NNZ + N + 1
  • Double: NNZ

Example (1-based indices):

    1 1 0 0 1
    1 2 0 0 0
    0 0 3 1 0
    0 0 1 4 1
    1 0 0 1 5

Row pointers:   1 4 6 8 11 14
Column indices: 1 2 5 1 2 3 4 3 4 5 1 4 5
Values:         1 1 1 1 2 3 1 1 4 1 1 1 5
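The 5 x 5 example on this slide can be reproduced with SciPy's CSR type. SciPy stores 0-based indices, so the sketch adds 1 to match the 1-based arrays shown on the slide; the variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

A_dense = np.array([
    [1, 1, 0, 0, 1],
    [1, 2, 0, 0, 0],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 4, 1],
    [1, 0, 0, 1, 5],
], dtype=float)

A = csr_matrix(A_dense)

# Convert SciPy's 0-based arrays to the 1-based convention of the slide.
row_ptr = (A.indptr + 1).tolist()    # N+1 entries
col_idx = (A.indices + 1).tolist()   # NNZ entries
values = A.data.tolist()             # NNZ entries

print(row_ptr)
print(col_idx)
print(values)
```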
25
Empirical Results
  • Coded with ARPACK, DSCPACK (sparse direct solver),
    and LAPACK
  • Test problems
  • Matrix sizes range from 894 to 6000.
  • Bcsstk_
  • Construction problems from the Harwell-Boeing
    Collection
  • dis_, spa_ (img)
  • Image analysis
  • s1rmq4m1 and others (str)
  • Structural mechanics
  • xerox2000_
  • Colloidal modeling at Xerox
  • Results are compared with the best dense and
    band routines in LAPACK

26
Performance of Sparse Inverse Iteration
27
Performance of Sparse Inverse Iteration
28
Performance of Sparse Inverse Iteration
29
Performance of Sparse Inverse Iteration
  • Re-orthogonalization cost is trivial.
  • The sparse matrix already has nearly orthogonal
    columns.
  • Numerical LDL^T factorization is the dominant cost
  • One factorization is required to find each
    eigenvector
  • This is the minimum cost for inverse iteration.
  • Can we find multiple eigenvectors per
    factorization?

30
Lanczos Method
  • Iterative method for sparse symmetric eigenvalue
    problems
  • Dominant cost is sparse matrix-vector
    multiplication (y = Ax).
  • Suitable for finding several eigenvalues of largest
    absolute value and their eigenvectors
  • Not suitable for finding all the eigenvalues.
  • By implicitly using a shifted inverse, the
    method can find eigenvalues near the shift λ
    (eigenvectors of A - λI)
  • Similar in concept to inverse iteration.
  • Called Shifted Inverse Lanczos.
  • Can find several eigenpairs per factorization.
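SciPy's `eigsh` wraps ARPACK and switches to shift-invert mode when `sigma` is given, which makes it a convenient way to sketch the idea: one factorization of A - sigma*I yields several eigenpairs near the shift. The test matrix (a 1D Laplacian with analytically known eigenvalues) and the shift value 0.5 are assumptions chosen for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 400
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

sigma = 0.5   # shift; the eigenvalues closest to it are returned
vals, vecs = eigsh(A, k=4, sigma=sigma, which="LM")
order = np.argsort(vals)
vals, vecs = vals[order], vecs[:, order]

# Analytic spectrum of this test matrix, used only for comparison.
exact = 2.0 - 2.0 * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
nearest = np.sort(exact[np.argsort(np.abs(exact - sigma))[:4]])

print(vals)
print(nearest)
```

With `which="LM"` in shift-invert mode, ARPACK targets the largest values of 1/(λ - sigma), which are the eigenvalues λ closest to the shift.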

31
Shifted Inverse Lanczos Method (with Chao Yang at
LBNL)
  • Eigenvalues near the shift converge very quickly.
  • If all eigenvalues are known, they can be used to
    choose the shifts
  • Separate the eigenvalues into groups according to
    their distribution (spectrum slicing).
  • Select a shift in the middle of each group.
  • Run a Lanczos iteration for each group.
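The slicing procedure can be sketched as a loop over groups of known eigenvalues, with one shift-invert Lanczos run (SciPy's `eigsh`, which calls ARPACK) per group. This is a simplified sketch: fixed-size groups stand in for grouping by distribution, the midpoint of each group is used as the shift, and the tridiagonal test matrix is chosen so the eigenvalues are cheap to obtain up front. The function name and parameters are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigvalsh_tridiagonal
from scipy.sparse.linalg import eigsh

def slice_spectrum(A, eigenvalues, group_size):
    """All eigenpairs of A, one shift-invert Lanczos run per eigenvalue group."""
    lam = np.sort(eigenvalues)
    all_vals, all_vecs = [], []
    for start in range(0, len(lam), group_size):
        group = lam[start:start + group_size]
        shift = 0.5 * (group[0] + group[-1])     # middle of the group
        vals, vecs = eigsh(A, k=len(group), sigma=shift, which="LM")
        order = np.argsort(vals)
        all_vals.append(vals[order])
        all_vecs.append(vecs[:, order])
    return np.concatenate(all_vals), np.hstack(all_vecs)

n = 60
d = np.full(n, 2.0)
e = np.full(n - 1, -1.0)
A = sp.diags([e, d, e], [-1, 0, 1], format="csc")

known = eigvalsh_tridiagonal(d, e)   # eigenvalues computed beforehand
vals, vecs = slice_spectrum(A, known, group_size=10)
print(vals[:5])
```

Each group costs one sparse factorization, so the number of factorizations drops from one per eigenvector to one per group.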

32
Performance of Sparse Method with Shifted Inverse
Lanczos
33
Performance of Sparse Method with Shifted Inverse
Lanczos
34
Performance of Sparse Method with Shifted Inverse
Lanczos
35
Sparse Inverse Eigenvalue Solver
  • Sparse inverse iteration suffers from the cost of
    the sparse factorization
  • A sparse factorization for each eigenvector is
    costly.
  • The Lanczos method works effectively for finding
    eigenvectors
  • Finding eigenvalues does not take much time.
  • Eigenvalues provide good information for choosing
    shifts.
  • Reduced factorization time and sophisticated use
    of re-orthogonalization cut down the overall
    solution time.

Chart axes: time relative to band RRR; time relative to dense RRR
36
Contents
  • Background
  • Computational Methods
  • Generalized Tight-Binding Molecular Dynamics
  • Eigenvalue computation
  • Generalized Eigenvalue Problems
  • Parallel Dense Algorithm using ScaLAPACK
  • Spectrum Slicing for Sparse Matrix
  • Experimental Results
  • Conclusion

37
Performance of Sparse Method with Shifted Inverse
Lanczos
  • Uses ARPACK for the Lanczos method
  • Built upon LAPACK and BLAS.
  • Achieves better performance than other sparse
    methods
  • Close to the best dense method (RRR).
  • Space requirement is slightly higher than for sparse
    inverse iteration.

38
Solution Time Comparison
  • 2.4 GHz, 8 GB memory
  • Based on column block cyclic distribution
  • Solution of one time step
  • Single processor

39
Memory Ratio (Sparse vs. Dense)
Integer: 4 bytes, double: 8 bytes
40
Si Nanowires
  • Most stable Si crystalline phases
  • All four-fold coordinated
  • Include diamond and clathrate phases
  • Investigate Si-nanotrees with diamond or
    clathrate type inner cores.
  • Modelling
  • GTBMD
  • Reliably obtains very good agreement with
    experiment for structural and vibrational
    properties of Si from the dimer to the bulk.
  • Electronic structure analysis
  • sp3s* tight-binding model that correctly
    reproduces the band gap for bulk Si in the
    diamond and clathrate phases.

41
Experiments
  • We consider
  • Si-nanotrees and stems.
  • 2000-2500 atoms.
  • Four-fold coordinated inner core surrounded by an
    outer surface of atoms with three-fold
    coordination.
  • Two categories
  • Tetrahedral: Si(Td)
  • Clathrate (cage-like): Si(fcc), Si(sc)
  • Inner core made of the Si clathrate structure
    consisting of fcc and sc.

42
Results (Structures)
  • Top
  • Nanowire with an inner crystalline core
    consisting of tetrahedral structure.
  • Middle
  • Clathrate structure with a 34-atom basis in a
    face-centered cubic unit cell.
  • Bottom
  • Clathrate structure with a 46-atom basis in a
    simple cubic unit cell.

43
Results (Structures)
  • Surface reconstruction, coupled with branching,
    results in interesting junction regions
  • Smooth for Si(34) and Si(46) nanotrees.
  • Si(Td) junctions appear abrupt.
  • Cage-like arrangements allow seamless connection
    between the stems and the branches
  • Cage-like forms are more isotropic than the
    tetrahedral forms.

44
Results (DOS)
  • Electronic densities of states (DOS), HOMO-LUMO
    gap
  • Stems: 0.57 eV
  • Bulk diamond phase of silicon: 1.1 eV
  • Trees: 0.16 eV
  • More states from the unoccupied levels are pulled
    in towards the Fermi level, E_F (dashed)
  • Enhanced conductivity due to more conduction
    channels being available.

45
Conclusions
  • GTBMD results for structure and stability
    studies using large-scale quantum-mechanical
    simulations of Si nanotrees.
  • Sparse alternative method to reduce memory usage.
  • Si structures are stable; the electronic structure
    shows a narrow HOMO-LUMO gap.

46
Future Research Plans
  • Sparse spectrum slicing technique to reduce memory
    usage
  • Compute eigenpairs from the previous time
    step's eigenpairs, treating the change as a
    low-rank update.
  • Attempt to integrate these into GTBMD.

47
Q & A