Title: Large scale simulations of branched Si-nanowires
1Large scale simulations of branched Si-nanowires
- Madhu Menon (U. Kentucky),
- Ernst Richter (Germany),
- Ingyu Lee, Keita Teranishi and Padma Raghavan (Penn. State)
2Contents
- Background
- Computational Methods
- Generalized Tight-Binding Molecular Dynamics
- Eigenvalue computation
- Generalized Eigenvalue Problems
- Parallel Dense Algorithm using SCALAPACK
- Spectrum Slicing for Sparse Matrix
- Experimental Results
- Conclusion
3Nanowire
- Nanowire
  - Popular because they afford tailoring of their electronic properties through selective doping.
- Nanowire with branching, nanotree
  - Could be used for controlling the switching mechanism, power gain or other transistor applications.
- Nanowires from Si
  - Natural extension of silicon technology to the nanoscale.
  - Enables integration of nanoscale devices into traditional large-scale silicon electronic technology.
4Computational Methods
Parameterized
- Classical MD
  - Newtonian mechanics
- Tight-Binding MD
  - Semiempirical
- Ab initio (first-principles)
  - Schrödinger equations
  - Born-Oppenheimer
  - Car-Parrinello
- Accuracy and cost scale: from less accurate and computationally cheap (classical MD) to accurate with high computational cost (ab initio).
- Classical MD is not accurate enough, and ab initio computations are not feasible.
5Schrödinger Equations
- Describes atoms as a collection of quantum mechanical particles, nuclei and electrons, governed by the Schrödinger equation.
- Born-Oppenheimer
  - Electronic degrees of freedom follow the corresponding nuclear positions.
6Generalized Tight-Binding MD (Menon and
Subbaswamy)
- Total energy is the sum of three terms
  - One-electron band structure energy
  - Repulsive pair potential, which depends on the interatomic distance
  - Ubond, a constant that merely shifts the zero of energy
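The three terms listed on this slide can be written out as follows. This is a sketch in conventional tight-binding notation: the symbols $U_{el}$, $U_{rep}$, $n_k$, $\varepsilon_k$, $\chi$ and $r_{ij}$ are assumptions introduced here, since the slide's own equations did not survive extraction.

```latex
U_{tot} = U_{el} + U_{rep} + U_{bond}, \qquad
U_{el} = \sum_{k} n_k \, \varepsilon_k, \qquad
U_{rep} = \sum_{i} \sum_{j > i} \chi(r_{ij})
```

Here $n_k$ is the occupation of the one-electron level $\varepsilon_k$ and $\chi(r_{ij})$ is the pair potential between atoms $i$ and $j$ at distance $r_{ij}$.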
7Generalized Tight-Binding MD(Menon and
Subbaswamy)
- Construct Hamiltonian
- Nonorthogonal basis of atomic orbitals
- Hamiltonian and overlap matrix elements
- One-electron energies are obtained by solving the
generalized eigenvalue equation
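To make the role of the overlap matrix concrete, here is a minimal sketch that solves the generalized eigenvalue equation for two orbitals in closed form. The function name and the numerical values are illustrative assumptions, not Si parameters from the GTBMD scheme.

```python
import math


def generalized_eig_2x2(h, s):
    """Solve det(H - E*S) = 0 for symmetric 2x2 H and S (S positive definite).

    Returns the two one-electron energies in ascending order.
    """
    (h11, h12), (_, h22) = h
    (s11, s12), (_, s22) = s
    # det(H - E*S) expands to a*E^2 + b*E + c
    a = s11 * s22 - s12 * s12
    b = -(h11 * s22 + h22 * s11 - 2.0 * h12 * s12)
    c = h11 * h22 - h12 * h12
    disc = math.sqrt(b * b - 4.0 * a * c)
    return sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))


# Hypothetical two-orbital system with overlap 0.25 between the orbitals.
H = [[-5.0, -2.0], [-2.0, -5.0]]
S = [[1.0, 0.25], [0.25, 1.0]]
energies = generalized_eig_2x2(H, S)
```

With an orthogonal basis (S equal to the identity) the same H would give -7 and -3; the nonzero overlap moves the levels to -5.6 and -4.0, which is why the overlap matrix cannot simply be dropped.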
8Generalized Tight-Binding MD(Menon and
Subbaswamy)
- Obtaining Hij and Sij
- Nonorthogonality between two sp3 hybrids.
9Generalized Tight-Binding MD(Menon and
Subbaswamy)
- Advantage of semiempirical TBMD
  - System is still described in a quantum-mechanical manner, while the computational effort is kept small, since a minimal basis is used and the interaction matrix elements can be parameterized.
- GTBMD
  - Computationally efficient because we can parameterize the Hamiltonian, H(RI).
  - Transferable parametrization scheme by including explicitly the effects of the nonorthogonality of the atomic basis.
  - Finds an energy-minimized structure of the nanoscale system under consideration without symmetry constraints.
10Generalized Tight-Binding MD(Menon and
Subbaswamy)
- Advantages
  - Allows for full relaxation of covalent systems with no symmetry constraints.
- Disadvantages
  - Computationally expensive
  - Each iteration requires at least half of the eigenvalues and eigenvectors.
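The "half of the eigenvalues" requirement follows from level filling: with four orbitals and four valence electrons per Si atom, and two electrons per level, the lower half of the spectrum is occupied. A minimal sketch of the resulting band structure energy, assuming zero-temperature filling (the function name is hypothetical):

```python
def band_structure_energy(eigenvalues, n_electrons):
    """Sum one-electron energies over occupied levels, two electrons per level."""
    levels = sorted(eigenvalues)
    n_full, remainder = divmod(n_electrons, 2)
    energy = 2.0 * sum(levels[:n_full])
    if remainder:  # one unpaired electron sits in the next level
        energy += levels[n_full]
    return energy


# Four levels and four electrons: only the two lowest levels contribute.
energy = band_structure_energy([-3.0, 3.0, -1.0, 1.0], 4)
```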
11Generalized Tight-Binding MD(Menon and
Subbaswamy)
- Investigate the stability of branched nanowires made of Si atoms using the generalized tight-binding molecular dynamics (GTBMD) scheme of Menon and Subbaswamy.
12Contents
- Background
- Computational Methods
- Generalized Tight-Binding Molecular Dynamics
- Eigenvalue computation
- Generalized Eigenvalue Problems
- Parallel Dense Algorithm using SCALAPACK
- Spectrum Slicing for Sparse Matrix
- Experimental Results
- Conclusion
13Symmetric Eigenvalue Problems
- Solving Schrödinger equations
- In general, expressed as a linear equation
  - $AX = BX\Lambda$
  - A is symmetric, B is symmetric positive definite.
  - $\Lambda$ is a diagonal matrix (eigenvalues), X is the set of eigenvectors.
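Because B is symmetric positive definite, the generalized problem can be reduced to a standard symmetric one through a Cholesky factorization. The derivation below is the textbook reduction, given as a sketch; the symbols $L$, $C$ and $Y$ are introduced here for illustration and do not come from the slides.

```latex
B = LL^{T}, \qquad
C = L^{-1} A L^{-T}, \qquad
C Y = Y \Lambda, \qquad
X = L^{-T} Y
\quad\Longrightarrow\quad
A X = A L^{-T} Y = L C Y = L Y \Lambda = B X \Lambda
```

The matrix $C$ is symmetric, so the steps for the standard symmetric problem apply to it unchanged.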
14Steps for Solving Eigenvalue Problems
- Consists of 3 steps
  - Tridiagonalize the matrix by an orthogonal transformation
    - $T = QAQ^T$ where Q is orthogonal
  - Compute eigenvalues and eigenvectors of the tridiagonal matrix T
    - QR iteration
    - Bisection and inverse iteration
    - Divide and conquer
    - Relatively Robust Representations (RRR)
  - Compute eigenvectors of A using Q
    - Multiply each eigenvector by the orthogonal factor Q
- Finding eigenvectors is much more expensive than finding eigenvalues.
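Of the tridiagonal solvers listed, bisection is the simplest to show in full. The sketch below counts eigenvalues below a trial value from the pivot signs of $T - xI$ and then bisects on a Gershgorin interval. It is a plain-Python illustration with hypothetical function names, not the LAPACK routine.

```python
def count_below(diag, off, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix that are < x.

    Counts negative pivots in the LDL^T factorization of T - x*I.
    """
    count = 0
    d = 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = (a - x) - b2 / d
        if d == 0.0:
            d = 1e-300  # nudge an exact zero pivot so the recurrence stays defined
        if d < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection."""
    n = len(diag)
    radius = [
        (abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    # Gershgorin bounds enclose the whole spectrum.
    lo = min(a - r for a, r in zip(diag, radius))
    hi = max(a + r for a, r in zip(diag, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) > k:
            hi = mid  # at least k + 1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Example: 5 x 5 matrix with 2 on the diagonal and -1 on the off-diagonals.
smallest = kth_eigenvalue([2.0] * 5, [-1.0] * 4, 0)
```

Each eigenvalue is found independently of the others, which is what makes bisection easy to parallelize and easy to restrict to a chosen part of the spectrum.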
15Tridiagonalization
- A tridiagonal matrix is simple to handle and has many useful properties for finding eigenvalues and eigenvectors.
- Householder transformation is used
  - Cache efficient
  - Destroys the sparsity if the matrix is sparse.
- If A is banded, Givens rotations are used
  - Saves space and operations
  - Not cache efficient
  - Usually slower than the dense method.
- Both methods need to compute and store another dense matrix Q if eigenvectors are computed.
16Inverse iteration
- Find eigenvectors once eigenvalues are found
  - Solve $(A - \lambda I)v = x$ where $\lambda$ is an eigenvalue
  - Repeating this operation, v becomes the eigenvector corresponding to $\lambda$.
- In a typical direct method, A is tridiagonal
  - Then, back-transformation with Q is performed to obtain the eigenvectors of A.
- If A is a sparse matrix, no back-transformation is needed
  - Inverse iteration is implemented using sparse $LDL^T$ factorization.
  - No need to store and update Q.
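The iteration described on this slide can be sketched in a few lines. Note the substitution: this illustration uses dense Gaussian elimination with partial pivoting in place of the sparse $LDL^T$ factorization, and it re-solves from scratch at every step instead of reusing one factorization. Function names are hypothetical, and the shift is assumed to be close to, but not exactly equal to, an eigenvalue.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - tail) / a[i][i]
    return x


def inverse_iteration(matrix, shift, steps=5):
    """Approximate the unit eigenvector whose eigenvalue is closest to `shift`."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n  # fixed start vector; a random one is more robust
    for _ in range(steps):
        v = solve(shifted, v)
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
    return v


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
# 0.5858 approximates the smallest eigenvalue 2 - sqrt(2) of A.
v = inverse_iteration(A, 0.5858)
```

The closer the shift is to the true eigenvalue, the fewer steps are needed, which is why accurate eigenvalues from the previous stage make this step cheap apart from the factorization itself.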
17Data Distribution for SCALAPACK
- Column Block Cyclic
- Generally works well.
- Does not need to gather eigenvectors.
- 2D Block Cyclic
- Faster for dense matrix
- Should gather eigenvectors after computation.
Column block cyclic (4 processes):
P1 P2 P3 P4 P1 P2 P3 P4

2D block cyclic (2 x 2 process grid):
P1 P2 P1 P2
P3 P4 P3 P4
P1 P2 P1 P2
P3 P4 P3 P4
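The two layouts in the diagram follow a simple index rule, sketched below. The function names are hypothetical, and the row-major numbering of the process grid (P1 P2 on the first grid row, P3 P4 on the second) is an assumption read off the diagram rather than a statement about SCALAPACK defaults.

```python
def owner_1d(col, block, nprocs):
    """Process (0-based) owning global column `col` in a column block-cyclic layout."""
    return (col // block) % nprocs


def owner_2d(row, col, block, prow, pcol):
    """Grid coordinates of the process owning entry (row, col) in a 2D block-cyclic layout."""
    return (row // block) % prow, (col // block) % pcol


def layout_2d(nblocks, prow, pcol):
    """Label every block of an nblocks x nblocks block matrix with its process name."""
    return [
        ["P%d" % ((r % prow) * pcol + (c % pcol) + 1) for c in range(nblocks)]
        for r in range(nblocks)
    ]


# 8 block columns over 4 processes, and a 4 x 4 block matrix on a 2 x 2 grid.
columns = [owner_1d(j, 1, 4) for j in range(8)]
grid = layout_2d(4, 2, 2)
```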
18Solution Time using SCALAPACK
- Based on column block cyclic distribution
- 2.4 GHz, 8GB Memory
19Matrix Nonzero Pattern
Si40_stem_1776
Si40_tree_1872
The Si_tree matrix is close to a block diagonal matrix
20Problem Dimension
Matrix              Dimension   Nonzeros   Ratio (Dense/Sparse)
Si34_stem_1789          7,156    367,084    92.4
Si34_tree_2386          9,544    623,172    96.9
Si40_stem_1776          7,104    411,263    81.3
Si40_tree_1872          8,232    449,781    99.8
Si40_stem_2058          7,488    423,314    87.7
Si46_stem_2198          8,792    459,320   111
C46_tree_2564          10,256    685,322   101
Tetra2_tree_2352        9,408    543,076   108
Integer: 4 bytes, Double: 8 bytes
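The ratio column can be reproduced from the dimension and nonzero count. The sketch below assumes the row-compressed layout used for the sparse matrices (NNZ + N + 1 integers and NNZ doubles) against a full N x N array of doubles; the function names are hypothetical.

```python
INT_BYTES = 4
DOUBLE_BYTES = 8


def dense_bytes(n):
    """Storage for a full n x n matrix of doubles."""
    return DOUBLE_BYTES * n * n


def sparse_bytes(n, nnz):
    """Storage for row-compressed format: integer indices plus double values."""
    # nnz column indices and n + 1 row pointers as integers, nnz values as doubles
    return INT_BYTES * (nnz + n + 1) + DOUBLE_BYTES * nnz


def memory_ratio(n, nnz):
    return dense_bytes(n) / sparse_bytes(n, nnz)


# First table row: dimension 7,156 with 367,084 nonzeros.
ratio = memory_ratio(7156, 367084)
```

For the first row this gives a ratio of about 92.4, in agreement with the table.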
21Contents
- Background
- Computational Methods
- Generalized Tight-Binding Molecular Dynamics
- Eigenvalue computation
- Generalized Eigenvalue Problems
- Parallel Dense Algorithm using SCALAPACK
- Spectrum Slicing for Sparse Matrix
- Experimental Results
- Conclusion
22Sparse Method All Eigenpairs
- Transform A into a band matrix, B.
- Tridiagonalize B by Givens rotations
  - Q is not computed.
- Find all eigenvalues of T with any method
  - QR, bisection, divide and conquer, RRR.
- Find all eigenvectors of A by sparse inverse iteration, using the eigenvalues obtained in the previous step.
23Band Ordering
- Use the Reverse Cuthill-McKee (RCM) algorithm to reorder A into a band matrix.
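A minimal sketch of RCM on an adjacency-list sparsity pattern is given below. It is a textbook breadth-first variant with a simple minimum-degree starting rule (production codes usually pick a pseudo-peripheral start vertex), and the function names are hypothetical.

```python
from collections import deque


def reverse_cuthill_mckee(adjacency):
    """Return an RCM ordering for a symmetric sparsity pattern.

    `adjacency[i]` lists the column indices j != i with a nonzero A[i][j].
    """
    n = len(adjacency)
    degree = [len(neigh) for neigh in adjacency]
    visited = [False] * n
    order = []
    # Handle each connected component, starting from a minimum-degree vertex.
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit unvisited neighbours in order of increasing degree.
            for w in sorted(adjacency[v], key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]  # reversing the Cuthill-McKee order gives RCM


def bandwidth(adjacency, order):
    """Largest |i - j| over the nonzeros when `order` is used as the new numbering."""
    position = {v: k for k, v in enumerate(order)}
    return max((abs(position[i] - position[j])
                for i, neigh in enumerate(adjacency) for j in neigh), default=0)


# A chain 0-3-1-4-2 numbered badly: bandwidth 3 before reordering.
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
order = reverse_cuthill_mckee(adj)
```

On this small chain the reordering brings the bandwidth from 3 down to 1, which is the tridiagonal form the chain naturally has.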
24Row Compressed Sparse Matrix Format
- Row compressed
  - (N+1) row index entries
  - NNZ column index entries
  - NNZ nonzero values
- Memory usage
  - Dense
    - Double: N^2
  - Sparse
    - Integer: NNZ + N + 1
    - Double: NNZ
Example (1-based indices):
Matrix:
1 1 0 0 1
1 2 0 0 0
0 0 3 1 0
0 0 1 4 1
1 0 0 1 5
Row index: 1 4 6 8 11 14
Column index: 1 2 5 1 2 3 4 3 4 5 1 4 5
Nonzero values: 1 1 1 1 2 3 1 1 4 1 1 1 5
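The three arrays for the 5 x 5 example on this slide can be generated with a short conversion routine. This is a sketch with a hypothetical function name; the `base` argument selects 1-based indices to match the slide.

```python
def to_csr(dense, base=1):
    """Convert a dense matrix (list of rows) to row index, column index and value arrays."""
    row_ptr = [base]
    col_idx = []
    values = []
    for row in dense:
        for j, entry in enumerate(row):
            if entry != 0:
                col_idx.append(j + base)
                values.append(entry)
        # Each row pointer marks where the next row starts in col_idx / values.
        row_ptr.append(len(values) + base)
    return row_ptr, col_idx, values


A = [
    [1, 1, 0, 0, 1],
    [1, 2, 0, 0, 0],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 4, 1],
    [1, 0, 0, 1, 5],
]
row_ptr, col_idx, values = to_csr(A)
```

With N = 5 and NNZ = 13 this stores 6 + 13 integers and 13 doubles, against 25 doubles for the dense matrix; the saving only becomes significant at the sizes of the nanowire matrices.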
25Empirical Results
- Coded with ARPACK, DSCPACK (sparse direct solver) and LAPACK
- Test problems
  - Matrix sizes range from 894 to 6000.
  - Bcsstk_
    - Construction problems from the Harwell-Boeing Collection
  - dis_, spa_ (img)
    - Image analysis
  - s1rmq4m1 and others (str)
    - Structural mechanics
  - xerox2000_
    - Colloidal modeling at Xerox
- The results are compared with the best dense and band routines in LAPACK
26Performance of Sparse Inverse Iteration
27Performance of Sparse Inverse Iteration
28Performance of Sparse Inverse Iteration
29Performance of Sparse Inverse Iteration
- Re-orthogonalization cost is trivial.
  - Sparse matrix already has nearly orthogonal columns.
- Numerical factorization for $LDL^T$ is dominant
  - One factorization is required to find each eigenvector
  - Minimum cost for inverse iteration.
- Can we find multiple eigenvectors per factorization?
30Lanczos Method
- Iterative method for sparse symmetric eigenvalue problems
  - Dominant cost is sparse matrix-vector multiplication ($Ax = y$).
  - Suitable for finding several eigenvalues and eigenvectors with maximum absolute values
  - Not suitable for finding all the eigenvalues.
- By using a shifted inverse implicitly, the method can find eigenvalues near the shift (eigenvectors of $A - \lambda I$)
  - Similar concept to inverse iteration.
  - Called Shifted Inverse Lanczos.
  - Can find several eigenpairs per factorization.
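A bare-bones Lanczos recurrence is sketched below to show where the matrix-vector product, or the shifted solve, enters. This is textbook Lanczos with full reorthogonalization, not ARPACK's implicitly restarted algorithm; `lanczos`, `matvec` and `apply_op` are hypothetical names.

```python
import math


def matvec(matrix, v):
    """Dense matrix-vector product, standing in for a sparse one."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def lanczos(apply_op, start, steps):
    """Run up to `steps` Lanczos iterations with full reorthogonalization.

    `apply_op(v)` returns the operator applied to v: A*v for plain Lanczos, or
    the solution w of (A - shift*I) w = v for the shifted inverse variant.
    Returns the diagonal (alpha) and off-diagonal (beta) of the tridiagonal T.
    """
    norm = math.sqrt(sum(t * t for t in start))
    basis = [[t / norm for t in start]]
    alpha, beta = [], []
    for _ in range(steps):
        w = apply_op(basis[-1])
        alpha.append(sum(a * b for a, b in zip(w, basis[-1])))
        # Remove the components along every stored Lanczos vector.
        for q in basis:
            proj = sum(a * b for a, b in zip(w, q))
            w = [a - proj * b for a, b in zip(w, q)]
        residual = math.sqrt(sum(t * t for t in w))
        if residual < 1e-12 or len(alpha) == steps:
            break  # invariant subspace found, or requested steps done
        beta.append(residual)
        basis.append([t / residual for t in w])
    return alpha, beta


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
alpha, beta = lanczos(lambda v: matvec(A, v), [1.0, 0.0, 0.0], 3)
```

Passing a solver for $(A - \sigma I)w = v$ as `apply_op` turns the same loop into the shifted inverse variant: the factorization is done once per shift and reused in every step, which is how several eigenpairs are obtained per factorization.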
31Shifted Inverse Lanczos Method(w/ Chao Yang in
LBNL)
- Eigenvalues near the shift converge very quickly.
- If all eigenvalues are known, they can be used to choose shifts
  - Separate eigenvalues into groups according to their distribution (spectrum slicing).
  - Select a shift in the middle of each group.
  - Run a Lanczos iteration for each group.
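The grouping and shift selection can be sketched as follows. The slides do not state the grouping rule, so the fixed maximum group size used here is an assumption, as is the function name; only "one shift in the middle of each group" comes from the slide.

```python
def slice_spectrum(eigenvalues, max_group):
    """Split sorted eigenvalues into consecutive groups and pick one shift per group.

    Returns a list of (shift, group) pairs; the shift is the midpoint between
    the smallest and largest eigenvalue of the group.
    """
    levels = sorted(eigenvalues)
    slices = []
    for start in range(0, len(levels), max_group):
        group = levels[start:start + max_group]
        shift = 0.5 * (group[0] + group[-1])
        slices.append((shift, group))
    return slices


slices = slice_spectrum([1.0, 0.1, 3.0, 0.2, 1.1, 0.4], 3)
```

Each (shift, group) pair then drives one factorization and one Lanczos run. A shift that coincides with an eigenvalue makes the shifted matrix singular, so a practical code would nudge such a shift slightly.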
32Performance of Sparse Method with Shifted Inverse Lanczos
33Performance of Sparse Method with Shifted Inverse Lanczos
34Performance of Sparse Method with Shifted Inverse Lanczos
35Sparse Inverse Eigenvalue Solver
- Sparse inverse iteration suffers from the cost of the sparse factorization
  - Sparse factorization for each eigenvector is costly.
- Lanczos method works effectively for finding eigenvectors
  - Finding eigenvalues does not take much time.
  - Eigenvalues are good information for choosing shifts.
- Reduced factorization time and sophisticated use of re-orthogonalization cut down the whole solution time.
Figure panels: time relative to band RRR; time relative to dense RRR
36Contents
- Background
- Computational Methods
- Generalized Tight-Binding Molecular Dynamics
- Eigenvalue computation
- Generalized Eigenvalue Problems
- Parallel Dense Algorithm using SCALAPACK
- Spectrum Slicing for Sparse Matrix
- Experimental Results
- Conclusion
37Performance of Sparse Method with Shifted Inverse Lanczos
- Use ARPACK for the Lanczos method
  - Built upon LAPACK and BLAS.
- Achieves better performance than other sparse methods
  - Close to the best dense method (RRR).
- Space requirement is a little more than for sparse inverse iteration.
38Solution Time Comparison
- 2.4 GHz, 8GB Memory
- Based on column block cyclic distribution
- 1 time step solution
- Single processor
39Memory Ratio (Sparse vs. Dense)
Integer 4bytes, Double 8bytes
40Si Nanowires
- Most stable Si crystalline phases
  - All four-fold coordinated
  - Include diamond, clathrate phases
- Investigate Si-nanotrees with diamond or clathrate type inner cores.
- Modelling
  - GTBMD
    - Reliable in obtaining very good agreement with experiment for structural and vibrational properties of Si from dimer to the bulk.
  - Electronic structure analysis
    - sp3s* tight-binding model that correctly reproduces the band gap for bulk Si in the diamond and clathrate phases.
41Experiments
- We consider
  - Si-nanotrees and stems.
  - 2000-2500 atoms.
  - Four-fold coordinated inner core surrounded by an outer surface of atoms with three-fold coordination.
- Two categories
  - Tetrahedral Si(Td)
  - Clathrate (cage-like) Si(fcc), Si(sc)
    - Inner core made of the Si clathrate structure consisting of fcc and sc.
42Results (Structures)
- Top
  - Nanowire with an inner crystalline core consisting of tetrahedral structure.
- Middle
  - Clathrate structure with a 34-atom basis in a face-centered cubic unit cell.
- Bottom
  - Clathrate structure with a 46-atom basis in a simple cubic unit cell.
43Results (Structures)
- Surface reconstruction, coupled with branching, results in interesting junction regions
  - Smooth for Si(34), Si(46) nanotrees.
  - Si(Td) junctions appear abrupt.
- Cage-like arrangement allows seamless connection between the stems and the branches
  - Cage-like forms are more isotropic than the tetrahedral forms.
44Results (DOS)
- Electronic densities of states (DOS), HOMO-LUMO gap
  - Stems: 0.57 eV
  - Bulk diamond phase of silicon: 1.1 eV
  - Trees: 0.16 eV
- More states from the unoccupied levels are pulled in towards the Fermi level, EF (dashed)
  - Enhanced conductivity due to more conduction channels being available.
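Given the one-electron energies from the eigenvalue solver, the HOMO-LUMO gap is the spacing between the highest occupied and the lowest unoccupied level. A minimal sketch, assuming two electrons per level and zero-temperature filling; the function name and the sample levels are hypothetical, not simulation output.

```python
def homo_lumo_gap(eigenvalues, n_electrons):
    """Gap between the highest occupied and lowest unoccupied one-electron level."""
    levels = sorted(eigenvalues)
    n_occupied = (n_electrons + 1) // 2  # two electrons per level
    if n_occupied == 0 or n_occupied >= len(levels):
        raise ValueError("need at least one occupied and one unoccupied level")
    homo = levels[n_occupied - 1]
    lumo = levels[n_occupied]
    return lumo - homo


# Made-up levels in eV, chosen only to illustrate the calculation.
gap = homo_lumo_gap([-2.0, 1.5, -0.3, 0.27], 4)
```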
45Conclusions
- GTBMD computation results for structure and stability studies using large scale quantum mechanical simulations of nano-trees from Si.
- Sparse alternative method to reduce memory usage.
- Si structures are stable; the electronic structure shows a narrow HOMO-LUMO gap.
46Future Research Plans
- Sparse spectrum slicing technique reduces memory usage
- Compute eigenpairs based on the previous time step eigenpairs for a low rank update.
- Attempt to integrate this into GTBMD.
47Q & A