1
Data Mining Approaches in Atomistic Modeling
Dane Morgan MIT
AMASS Seminar 7/25/03
2
Outline
  • Introduction
  • Ex 1 Intergranular Embrittlement of Fe
  • Ex 2 Catalytic Activity - Hydrogenation
  • Ex 3 Stainless Steel CrxNiyFe(1-x-y)
  • Ex 4 Conductivity T7 7xxx Al Alloys
  • Ex 5 Boiling Points
  • Ex 6 Crystal Structure Prediction - open
    questions

3
Predicting Properties with Atomistic Modeling
4
Power of Data Mining
Use known data to establish R
Use R to predict new data
  • Does not require complete and accurate multiscale
    theories
  • New physics in relationships R
  • Quick, cheap screening for desired properties,
    errors, etc.; can be qualitative

5
Key Issues
  • Descriptors accessible to modeling
  • Descriptors optimally chosen
      • Use known relationships/physics
      • Optimize from large set of possibilities
  • Descriptors → Property relationship is robust
      • Sensible choice of methods
      • Tested with cross validation, test sets
  • Data
      • Large enough
      • Clean enough
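
The "tested with cross validation, test sets" point can be illustrated with a minimal sketch. It assumes a single descriptor, a straight-line relationship R, and synthetic data; the function names `fit_line` and `k_fold_rmse` are made up for this illustration and do not come from the talk.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one descriptor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def k_fold_rmse(xs, ys, k=5, seed=0):
    """Root-mean-square prediction error accumulated over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit only on the training part, then score on the held-out part.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((a * xs[i] + b - ys[i]) ** 2 for i in held_out)
    return (sq_err / len(xs)) ** 0.5

# Synthetic data (assumption): property = 2*descriptor + 1 plus small noise.
rng = random.Random(1)
descriptor = [i / 10 for i in range(40)]
prop = [2.0 * d + 1.0 + rng.gauss(0.0, 0.1) for d in descriptor]
cv_error = k_fold_rmse(descriptor, prop)
print(f"5-fold CV RMSE: {cv_error:.3f}")
```

Because every point is predicted by a model that never saw it, the reported error estimates how R behaves on new data rather than how well it reproduces the data it was fit to.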

6
Ex 1 Intergranular Embrittlement of Fe
  • Property: Fe embrittlement
  • Descriptors → Property relationship: Embrittlement
    ~ Grain boundary segregation E - Free surface
    segregation E (EGB - EFS) (Rice 89)
  • Descriptors: (EGB - EFS) (calculated ab initio)
  • Data: Embrittling potency for B, C, P, S.
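
A minimal sketch of how the descriptor EGB - EFS could be turned into a label. The sign convention (energies in eV, more negative meaning stronger binding, positive difference meaning embrittler) is an assumption made for this illustration, and the solutes "X" and "Y" with their energies are hypothetical, not values from the cited papers.

```python
def embrittling_potency(e_gb, e_fs):
    """Descriptor E_GB - E_FS in eV (assumed: more negative = stronger binding)."""
    return e_gb - e_fs

def classify(e_gb, e_fs):
    # Assumed convention: a positive difference means the solute binds more
    # strongly at the free surface than at the grain boundary, which is
    # labeled here as embrittling.
    if embrittling_potency(e_gb, e_fs) > 0:
        return "embrittler"
    return "cohesion enhancer"

# Hypothetical segregation energies (eV) as (E_GB, E_FS) for made-up solutes.
examples = {"X": (-3.0, -3.8), "Y": (-4.5, -3.9)}
labels = {name: classify(e_gb, e_fs) for name, (e_gb, e_fs) in examples.items()}
print(labels)
```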

7
Ex 1 Intergranular Embrittlement of Fe
(Wu, et al., Phys. Rev. B., 96)
Also correctly predicts effect of Mn and Mo on P
embrittlement!
(Zhong, et al., Phys Rev B, 97, Geng, et al.,
Solid State Comm., 01)
8
Ex 2 Catalytic Activity - Hydrogenation
  • Property: Reaction rates (hydrogenation of
    ethene, benzene on 3d transition metal M)
  • Descriptors → Property relationship:
      • Adapted Brønsted-Evans-Polanyi free energy
      • Langmuir-Hinshelwood rate equations
      • → Rate = R(EMC, 12 fitting constants
        independent of M)
  • Descriptors:
      • EMC: M-C bond strength in bulk NaCl structure
        (calculated ab initio)
      • 12 fitting constants (fit to experimental data
        for each reaction)
  • Data: 10-20 reaction rates for each of ethene and
    benzene

9
Ex 2 Catalytic Activity - Hydrogenation
(Toulhoat, et al. 02)
10
Ex 3 Stainless Steel CrxNiyFe(1-x-y)
  • Property: High hardness and ductility
  • Descriptors → Property relationship:
      • Hardness ~ shear modulus G
      • Ductility ~ bulk modulus/shear modulus B/G
  • Descriptors: B, G (from ab initio)
  • Data: Not clearly defined
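
Since high G and high B/G pull in opposite directions, one way to screen candidates is to keep only those that no other candidate beats on both measures. The sketch below assumes that reading; the composition names and (B, G) values are hypothetical, and `pareto_front` is an illustrative helper, not the method of the cited paper.

```python
def pugh_ratio(bulk, shear):
    """B/G, used in the slides as the ductility indicator."""
    return bulk / shear

def pareto_front(candidates):
    """Names of candidates not beaten on both G and B/G by another candidate."""
    scored = {name: (g, pugh_ratio(b, g)) for name, (b, g) in candidates.items()}
    front = []
    for name, (g, ratio) in scored.items():
        dominated = any(
            g2 >= g and r2 >= ratio and (g2 > g or r2 > ratio)
            for other, (g2, r2) in scored.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)

# Hypothetical (B, G) pairs in GPa for made-up compositions.
candidates = {
    "A": (160.0, 80.0),
    "B": (170.0, 70.0),
    "C": (150.0, 65.0),
    "D": (165.0, 75.0),
}
front = pareto_front(candidates)
print(front)
```

Candidate "C" drops out because "B" has both a higher G and a higher B/G; the remaining candidates represent different trade-offs between hardness and ductility.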

11
Hardness vs. Shear Modulus
(Teter, MRS Bulletin, 98)
12
Ex 3 Stainless Steel CrxNiyFe(1-x-y)
(Vitos, et al., Nature Materials, 02)
  • Optimal at Cr18Ni24Fe58 (multiple patents)
  • Predict improved mechanical properties for Ir, Os
    doping

High G (hard)
Conflict!
High B/G (ductile)
13
Ex 4 Conductivity T7 7xxx Al Alloys
  • Property: Electrical conductivity σ
  • Descriptors → Property relationship:
      • Linear: σ = V·d (requires only fitting)
      • Neurofuzzy: σ = NF(d) (requires only fitting)
      • Physical: σ = P(d) (requires thermodynamic models
        of relevant phases, Rayleigh-Maxwell equation for
        resistivity with dispersed particles,
        Starink-Zahra equation for precipitation, 1D
        diffusion equation, Matthiessen's rule for
        resistivity with dissolved elements)
  • Descriptors: Concentrations, ageing time → d =
    (xZn, xMg, xCu, xZr, xFe, xSi, t)
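
One ingredient of the physical model, Matthiessen's rule for dissolved elements, can be sketched on its own: the resistivity of the matrix plus a contribution from each solute proportional to its concentration, with conductivity as the inverse. The matrix resistivity and per-element coefficients below are illustrative placeholders, not fitted values from Starink et al., and the other ingredients (precipitation, dispersed particles, diffusion) are left out.

```python
def resistivity(rho_matrix, solute_fractions, specific_resistivities):
    """Matthiessen's rule: rho = rho_matrix + sum_i r_i * x_i."""
    rho = rho_matrix
    for element, x in solute_fractions.items():
        rho += specific_resistivities[element] * x
    return rho

def conductivity(rho_matrix, solute_fractions, specific_resistivities):
    """Conductivity as the inverse of the summed resistivity."""
    return 1.0 / resistivity(rho_matrix, solute_fractions, specific_resistivities)

# Placeholder numbers (assumptions): resistivity in nOhm*m, solute in at.%.
rho_al = 27.0
coeff = {"Zn": 1.0, "Mg": 5.0, "Cu": 8.0}      # nOhm*m per at.% (hypothetical)
in_solution = {"Zn": 2.0, "Mg": 1.0, "Cu": 0.5}

rho_total = resistivity(rho_al, in_solution, coeff)
sigma_total = conductivity(rho_al, in_solution, coeff)
sigma_pure = conductivity(rho_al, {}, coeff)
print(rho_total, sigma_total, sigma_pure)
```

In this picture, ageing raises conductivity because precipitation removes solute from the matrix, which shrinks the concentration terms.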

14
Ex 4 Conductivity T7 7xxx Al Alloys
σ measured for 36 concentration/ageing time
samples
(Starink, et al., 00)
15
Ex 5 Boiling Points
(Quantitative Structure-Property Relationships
QSPR)
  • Property: Boiling point TB
  • Descriptors → Property relationship: Neural network
    (10181, sigmoid, backpropagation)
  • Descriptors: Electrostatic and structural
    properties (calculated with semiempirical VAMP
    AM1)
  • Data: TB for 6629 molecules containing elements
    H, B, C, N, O, F, Al, Si, P, S, Cl, Zn, Ge, Br,
    Sn, I, Hg
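
To make "sigmoid, backpropagation" concrete, here is a toy feed-forward network with one sigmoid hidden layer and a linear output, trained by per-sample gradient descent. It is a sketch under assumptions: the class `TinyNet`, the layer sizes, the learning rate, and the synthetic target function are all chosen for illustration and are unrelated to the network or data of Chalk et al.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """n_in inputs -> n_hidden sigmoid units -> 1 linear output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train_step(self, x, y, lr):
        h, out = self.forward(x)
        err = out - y  # derivative of 0.5*(out - y)^2 with respect to out
        for j, hj in enumerate(h):
            # Backpropagate the error through the sigmoid of hidden unit j.
            grad_h = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err

def mse(net, data):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

# Synthetic "descriptor -> property" data (assumption): y = x0^2 + 0.5*x1.
rng = random.Random(1)
data = []
for _ in range(50):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((x, x[0] ** 2 + 0.5 * x[1]))

net = TinyNet(n_in=2, n_hidden=4)
mse_before = mse(net, data)
for epoch in range(200):
    for x, y in data:
        net.train_step(x, y, lr=0.05)
mse_after = mse(net, data)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```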

16
Data Mining Descriptors → Property Relationships
  • Many general approaches
      • Graphical
      • Linear regressions (normal least squares,
        principal component regression, partial least
        squares, ...)
      • Neural networks (perceptrons, feed-forward,
        radial-basis, ...)
      • Clustering (k-means, nearest-neighbor, ...)
  • Many choices in each approach
      • Neural networks
      • Number of neurons/layers 341
      • Transfer functions: step, sigmoid, tansig, etc.
      • Training method: backpropagation algorithms

  • Thousands of possible approaches!
      • Many yield similar results
      • Appropriate for different situations
      • Problem dependent - much art!!
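
As one example from the clustering family, a minimal k-means sketch (Lloyd's iteration) is given below. The starting centers are supplied by the caller, and the six points are made up; this is an illustration of the general technique, not of any specific tool mentioned in the talk.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, n_iter=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    labels = []
    for _ in range(n_iter):
        labels = [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        centers = new_centers
    return labels, centers

# Made-up two-descriptor data with two obvious groups.
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(points, centers=[(1.0, 1.0), (4.0, 4.0)])
print(labels, centers)
```

The result depends on the starting centers and on the number of clusters, which is one small instance of the "many choices in each approach" point above.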

17
Descriptors
Charged partial surface area descriptors,
Accelrys QSAR module
  • Partial positive surface area (sum of the surface
    area of positive atoms)
  • Partial negative surface area (sum of the surface
    area of negative atoms)
  • Total charge weighted positive surface area
    (descriptor 1 multiplied by the total positive
    charge)
  • Total charge weighted negative surface area
    (descriptor 2 multiplied by the total negative
    charge)
  • Atomic charge weighted positive surface area
    (sum of SASA × charge for all positive atoms)
  • Atomic charge weighted negative surface area (sum
    of SASA × charge for all negative atoms)
  • Difference in charged surface areas (descriptor
    1 - descriptor 2)
  • Difference in total charge weighted surface areas
    (descriptor 3 - descriptor 4)
  • Difference in atomic charge weighted surface
    areas (descriptor 5 - descriptor 6)
  • Fractional charged partial surface areas (6
    descriptors divided by total surface area)
  • "
  • "
  • "
  • "
  • "
  • Surface weighted charged partial surface areas (6
    descriptors multiplied by total surface area)
  • "
  • "
  • "

(http://www.accelrys.com/cerius2/descriptor.html#list)
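
The definitions in this list can be written out directly. The sketch below takes per-atom surface areas and partial charges and builds the six base descriptors, the three differences, and the fractional and surface-weighted variants. The dictionary keys and the function name `cpsa_descriptors` are made up for this illustration and are not the identifiers used by the Accelrys module; the three-atom input is hypothetical.

```python
def cpsa_descriptors(areas, charges):
    """Charged partial surface area descriptors from per-atom areas and charges."""
    pos = [(a, q) for a, q in zip(areas, charges) if q > 0]
    neg = [(a, q) for a, q in zip(areas, charges) if q < 0]
    total_area = sum(areas)

    ppsa = sum(a for a, _ in pos)   # descriptor 1: partial positive surface area
    pnsa = sum(a for a, _ in neg)   # descriptor 2: partial negative surface area
    q_pos = sum(q for _, q in pos)  # total positive charge
    q_neg = sum(q for _, q in neg)  # total negative charge

    d = {
        "PPSA": ppsa,
        "PNSA": pnsa,
        "total_q_weighted_pos": ppsa * q_pos,                # descriptor 3
        "total_q_weighted_neg": pnsa * q_neg,                # descriptor 4
        "atomic_q_weighted_pos": sum(a * q for a, q in pos),  # descriptor 5
        "atomic_q_weighted_neg": sum(a * q for a, q in neg),  # descriptor 6
    }
    base = list(d)  # the six descriptors above
    d["diff_1_2"] = d["PPSA"] - d["PNSA"]
    d["diff_3_4"] = d["total_q_weighted_pos"] - d["total_q_weighted_neg"]
    d["diff_5_6"] = d["atomic_q_weighted_pos"] - d["atomic_q_weighted_neg"]
    for name in base:
        d["fractional_" + name] = d[name] / total_area
        d["surface_weighted_" + name] = d[name] * total_area
    return d

# Hypothetical three-atom molecule: areas in square angstroms, charges in e.
desc = cpsa_descriptors(areas=[10.0, 20.0, 30.0], charges=[0.2, -0.1, -0.1])
for key, value in desc.items():
    print(f"{key}: {value:.4f}")
```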
18
Descriptors
  • Many broad categories: composition, topological,
    electronic, physical-chemical properties, ...
  • Thousands of possible descriptors
  • Use physical knowledge to choose relevant ones
    (e.g., QSAR principle)
  • Use numerical methods to choose important
    descriptors

19
Ex 5 Boiling Point Descriptors
(Chalk, et al., J Chem. Inf. Comput. Sci, 01)
20
Ex 5 Atomistic Modeling Methods
  • Use VAMP AM1 and PM3 Hamiltonians
  • Semi-empirical molecular orbital based
  • Quantum mechanical, but matrix elements are fit
    to experimental data
  • Can calculate optimized geometries, electronic
    structure (charge properties)
  • Fairly accurate (known failings) and fast

21
Ex 5 Boiling Points
Training set (6000): ±17° (max -119°)
Test set (629): ±19° (max -94°)
(Chalk, et al., J Chem. Inf. Comput. Sci, 01)
  • Large errors often due to
      • Incorrect experimental measurements of TB (low
        pressure)
      • Incorrect experimental structures (tautomer
        misidentification)
      • Failure of atomistic modeling method
        (approximation errors)

22
Ex 6 Crystal Structure Prediction
  • Property: Stable crystal structure
  • Descriptors → Property relationship: Neighbor
    clustering algorithm (Euclidean metric)
  • Descriptors: Chemical scale (empirically assigned
    value for each element) (Pettifor, J. Phys. C,
    86)
  • Data: All intermetallic binary alloys (thousands)
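
A structure map with a Euclidean neighbor rule can be sketched as a k-nearest-neighbor vote: each known binary compound is a point whose coordinates are the chemical-scale values of its two elements, and a new compound takes the majority structure of its closest neighbors. The coordinates below are hypothetical placeholders, not Pettifor's values, and `predict_structure` is an illustrative name.

```python
import math
from collections import Counter

def predict_structure(known, query, k=3):
    """Majority structure type among the k nearest known compounds.

    known: list of ((chi_A, chi_B), structure_label)
    query: (chi_A, chi_B) of the compound to predict
    """
    ranked = sorted(known, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical map: two descriptor values per AB compound and its structure.
known = [
    ((0.20, 0.80), "CsCl"), ((0.30, 0.90), "CsCl"), ((0.25, 0.70), "CsCl"),
    ((0.10, 2.00), "NaCl"), ((0.15, 2.10), "NaCl"), ((0.20, 1.90), "NaCl"),
]
first = predict_structure(known, (0.25, 0.85))
second = predict_structure(known, (0.12, 2.00))
print(first, second)
```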

23
Structure Maps
CsCl
NaCl
(Rodgers, CRYSTMET, 03)
24
Ex 6 Crystal Structure Prediction
  • Powerful: structure maps can give 90-95%
    predictive accuracy
  • Many descriptors: 50 have been tried, based on
    size, atomic number, cohesive energy,
    electrochemistry, valence electrons
  • Can't be extended: accurate maps require 40% of
    the possible systems to be known (80% of binaries
    known, 0.1% of quaternaries)
  • Can atomistic modeling help?
      • Fill in data for multicomponent systems
      • Provide optimal descriptors

(Villars, Intermetallic Compounds, 94)
25
Conclusions
  • Atomistic modeling and data mining can provide
    valuable predictive ability when physical
    theories are incomplete
  • Key issues are data quality, descriptors, and
    descriptor → property relationship
  • Dangers of overfitting and tuning
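
The danger of tuning over a large descriptor pool can be demonstrated with pure noise. In this sketch the "property" and all 1000 "descriptors" are independent random numbers, so no real relationship exists; the sample sizes and the helper names are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def best_random_descriptor(n_train=20, n_test=200, n_desc=1000, seed=0):
    """Pick the noise descriptor that correlates best on the training samples,
    then report its correlation on samples it was not selected on."""
    rng = random.Random(seed)
    n = n_train + n_test
    prop = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best_train, best_test = 0.0, 0.0
    for _ in range(n_desc):
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r_train = pearson(d[:n_train], prop[:n_train])
        if abs(r_train) > abs(best_train):
            best_train = r_train
            best_test = pearson(d[n_train:], prop[n_train:])
    return best_train, best_test

r_train, r_test = best_random_descriptor()
print(f"selected descriptor: r(train) = {r_train:.2f}, r(test) = {r_test:.2f}")
```

The winning descriptor looks strongly correlated on the 20 samples used to select it, and the correlation collapses on the held-out samples. Selecting from many candidates without a separate test set produces this kind of apparent signal.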

26
Bible Code
Are these words closer than by chance? Can the
Bible predict future events?
Some say yes (Witztum, et al., Stat. Sci.,
94). Some say no (McKay, et al., Stat. Sci., 99).
  • Many articles
  • >60 books on Bible Codes on Amazon
  • 1 major motion picture (Omega Code)

Be careful with your statistics!
27
The First and Greatest Example of Atomic Level
Data Mining
28
END