Title: STATISTICAL LEARNING METHODS FOR MICROSTRUCTURES
1. STATISTICAL LEARNING METHODS FOR MICROSTRUCTURES
Veera Sundararaghavan and Prof. Nicholas Zabaras
Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering
188 Frank H. T. Rhodes Hall, Cornell University, Ithaca, NY 14853-3801
Email: vs85_at_cornell.edu, zabaras_at_cornell.edu
URL: http://mpdc.mae.cornell.edu/
2. WHAT IS STATISTICAL LEARNING?
- Statistical learning automates the search for patterns in large-scale statistics.
- Which patterns are interesting?
- Mathematical techniques for associating input data with desired attributes and identifying correlations.
- A powerful tool for designing materials.
3. FOR MICROSTRUCTURES?
- Properties of a material are affected by the underlying microstructure.
- Microstructural attributes are related to specific properties. Examples: correlation functions → elastic moduli; orientation distribution → yield stress in polycrystals.
- Attributes evolve during processing (thermomechanical, chemical, solidification, etc.).
- Can we identify specific patterns in these relationships?
- Is it possible to probabilistically predict the best microstructure and the best processing paths for optimizing properties, based on available structural attributes?
4. TERMINOLOGY
- A microstructure can be represented in terms of typical attributes.
- Examples: volume fractions, probability functions, shape/size attributes, grain orientations, cluster functions, lineal measures, and so on.
- All these attributes affect physical properties.
- Attributes evolve during processing of a microstructure.
- Attributes are represented in discrete (vector) form as features: a feature is a vector x_k, k = 1,…,n, where n is the dimensionality of the feature.
- Each distinct feature is written x_k^(i), where the superscript denotes the i-th feature of interest.
5. TERMINOLOGY
Given a data set of computational or experimental microstructures, can we learn the functional differences between them based on features? We denote microstructures that are similar in attributes by a class label y, y = 1,…,k, where k is the number of classes. Classes are organized into hierarchies, with each level represented by a feature x^(i). Structure-based classes are affiliated with processes and properties: a powerful tool for exploring the complex microstructure design space.
6. APPLICATIONS
7. MICROSTRUCTURE LIBRARIES FOR REPRESENTATION
Sundararaghavan and Zabaras, Acta Materialia, 2004
Pipeline: input microstructure → pre-processing → feature detection (employ lower-order features) → classifier (identify and add new classes) → quantification and mining of associations.
8. MICROSTRUCTURE RECONSTRUCTION
Sundararaghavan and Zabaras, Computational Materials Science, 2005
Flow: process → 2D imaging techniques → feature extraction → pattern recognition / vision → database (with microstructure evolution models) → 3D realizations → microstructure analysis (FEM / bounding theory); the process parameters can also be reverse-engineered.
9. STATISTICAL LEARNING TOOLBOX
Training samples come from numerical simulation of material response (multi-length-scale analysis, polycrystalline plasticity); inputs may be images, ODFs, or pole figures.
Toolbox functions: classification methods; identification of new classes.
Outputs: associate data with a class, update classes, update the data in the library; drive a process controller.
10. DESIGNING PROCESSES FOR MICROSTRUCTURES
Sundararaghavan and Zabaras, Acta Materialia, 2005
DATABASE: process sequence 1 (process parameters, ODF history, reduced basis); process sequence 2 (new process parameters, ODF history, reduced basis); new datasets are added over time.
Given a desired texture/property: classifier → probable process sequences and initial parameters → adaptive basis selection → reduced-basis process optimization (stage 1, stage 2) → optimum parameters.
11. THIS LECTURE WILL COVER…
- The mathematics behind statistical learning, through two very useful techniques: support vector machines and Bayesian clustering.
- Applications to microstructure representation, reconstruction, and process design.
- We will only skim the physics and some of the important computational tools behind these problems.
12. STATISTICAL LEARNING TECHNIQUES
13. STATISTICAL LEARNING TECHNIQUES (this lecture's topics highlighted)
14. STATISTICAL LEARNING TECHNIQUES (this lecture's topics highlighted)
Function approximation: useful for prediction in regions that are computationally unreachable (not covered in this lecture).
15. PRELIMINARIES OF SUPERVISED CLASSIFIERS
A two-class problem: the classes of the test specimens are known a priori (+1 denotes high strength, −1 denotes low strength), plotted against features such as pore density and volume fraction. Aim: to predict the strength of a new microstructure.
16. SUPPORT VECTOR MACHINES
f(x, w, b) = sign(w · x − b), with classes denoted +1 and −1.
How would you classify this data?
17. OCCAM'S RAZOR
"Plurality should not be assumed without necessity." William of Ockham, Surrey (England), 1285-1347 AD, theologian.
- Simpler models are more likely to be correct than complex ones.
- Nature prefers simplicity.
- A principle of uncertainty maximization.
18. SUPPORT VECTOR MACHINES
f(x, w, b) = sign(w · x − b)
How would you classify this data?
19. SUPPORT VECTOR MACHINES
f(x, w, b) = sign(w · x − b)
Any of these would be fine… but which is best?
20. SUPPORT VECTOR MACHINES
f(x, w, b) = sign(w · x − b)
Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a datapoint.
21. SUPPORT VECTOR MACHINES
f(x, w, b) = sign(w · x − b)
The maximum-margin linear classifier is the linear classifier with, um, the maximum margin. This is the simplest kind of SVM (called a linear SVM, or LSVM). Support vectors are the datapoints that the margin pushes up against.
22. SUPPORT VECTOR MACHINES
M = margin width. How do we compute M in terms of w and b?
- Plus-plane: {x : w · x + b = +1}, the boundary of the "predict class +1" zone.
- Minus-plane: {x : w · x + b = −1}, the boundary of the "predict class −1" zone; the decision boundary itself is w · x + b = 0.
- The vector w is perpendicular to the plus-plane. Why? Let u and v be two vectors on the plus-plane; what is w · (u − v)? (Zero, so w is normal to the plane.) And so of course the vector w is also perpendicular to the minus-plane.
- Let x⁻ be a point on the minus-plane and x⁺ the closest point to it on the plus-plane. Claim: x⁺ = x⁻ + λw for some value of λ. Why? (The shortest path between the planes runs along the normal w.)
23. SUPPORT VECTOR MACHINES
Computing the margin width. What we know:
- w · x⁺ + b = +1
- w · x⁻ + b = −1
- x⁺ = x⁻ + λw
- |x⁺ − x⁻| = M
Then w · (x⁻ + λw) + b = 1 ⇒ (w · x⁻ + b) + λ w·w = 1 ⇒ −1 + λ w·w = 1 ⇒ λ = 2/(w·w).
It's now easy to get M in terms of w and b: M = |x⁺ − x⁻| = λ|w| = 2/√(w·w).
24. SUPPORT VECTOR MACHINES
Learning the maximum-margin classifier: since M = 2/√(w·w), maximizing the margin means minimizing w·w. What are the constraints?
- w · x_k + b ≥ +1 if y_k = +1
- w · x_k + b ≤ −1 if y_k = −1
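Collecting the pieces, the learning problem on this slide in standard form (just a restatement of the minimization and constraints above, with the margin derived on slide 23; the conventional factor 1/2 does not change the minimizer):

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\mathbf{w}\cdot\mathbf{w}
\quad\text{s.t.}\quad y_k\,(\mathbf{w}\cdot\mathbf{x}_k + b)\ \ge\ 1 \ \ \text{for all } k,
\qquad M = \frac{2}{\sqrt{\mathbf{w}\cdot\mathbf{w}}}
```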
25. SUPPORT VECTOR MACHINES
(A dataset that is not linearly separable.) This is going to be a problem! What should we do? Minimize w·w + C × (distance of error points to their correct place).
26. SUPPORT VECTOR MACHINES
Introduce a slack variable ε_k for each error point (the figure labels ε2, ε7, and ε11). Minimize w·w/2 + C Σ_k ε_k. Constraints?
- w · x_k + b ≥ +1 − ε_k if y_k = +1
- w · x_k + b ≤ −1 + ε_k if y_k = −1
- ε_k ≥ 0 for all k
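A minimal sketch of this soft-margin trade-off using scikit-learn (the two-blob dataset is hypothetical; the parameter C is exactly the error-penalty weight above: small C tolerates more slack, large C penalizes violations heavily):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not perfectly separable, so slack is needed
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # smaller C -> wider margin, more support vectors, more tolerated errors
    print(C, clf.n_support_, clf.score(X, y))
```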
27. SUPPORT VECTOR MACHINES
A harder, 1-dimensional dataset (no single threshold separates the classes). What can be done about this?
28. SUPPORT VECTOR MACHINES
Quadratic basis functions: map each point x to (x, x²). In the lifted feature space the harder 1-D dataset becomes linearly separable. (The x = 0 mark on the slide is the origin of the original axis.)
29. SUPPORT VECTOR MACHINES WITH KERNELS
Map the features through a basis expansion Φ: x → φ(x). Minimize w·w/2 + C Σ_k ε_k. Constraints?
- w · Φ(x_k) + b ≥ +1 − ε_k if y_k = +1
- w · Φ(x_k) + b ≤ −1 + ε_k if y_k = −1
- ε_k ≥ 0 for all k
30. SUPPORT VECTOR MACHINES: QUADRATIC PROGRAMMING
Maximize the dual objective (the standard form, reconstructed here since the slide's equations were figures): Σ_k α_k − ½ Σ_k Σ_l α_k α_l y_k y_l Φ(x_k)·Φ(x_l), subject to 0 ≤ α_k ≤ C and Σ_k α_k y_k = 0.
Then define w = Σ_k α_k y_k Φ(x_k). Datapoints with α_k > 0 will be the support vectors.
Then classify with f(x, w, b) = sign(w · Φ(x) − b).
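Because the QP involves the data only through dot products Φ(x_k)·Φ(x_l), a kernel K(x_k, x_l) can replace them without ever forming Φ explicitly. A sketch using scikit-learn on a 1-D dataset like slide 27's (the data are hypothetical; a degree-2 polynomial kernel plays the role of the quadratic basis functions of slide 28):

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data where the middle band is one class: no linear threshold separates it,
# but a quadratic feature map (x, x^2) does
X = np.array([[-3.], [-2.], [-1.], [0.], [1.], [2.], [3.]])
y = np.array([+1, +1, -1, -1, -1, +1, +1])

clf = SVC(kernel="poly", degree=2, coef0=1, C=10.0).fit(X, y)
print(clf.predict([[-2.5], [0.5], [2.5]]))  # expect outer points +1, middle -1
print(clf.support_)                         # indices of support vectors (alpha_k > 0)
```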
31. MULTI-CLASS CLASSIFICATION (p = 3)
(Figure: pairwise decision boundaries separating data labeled A, B, and C into Class-A, Class-B, and Class-C regions.)
32. MULTIPLE FEATURES: HIERARCHICAL LIBRARIES (a.k.a. DIVISIVE CLUSTERING)
33. DYNAMIC MICROSTRUCTURE LIBRARY CONCEPTS
- The space of all possible microstructures is partitioned, using distance measures, into classes of microstructures (e.g., equiaxial grains).
- Hierarchical sub-classes (e.g., medium grains); expandable class partitions (retraining); new class partitions added when new classes are discovered.
- Dynamic representation: when a new microstructure is added, the axes used for representation are updated.
34. QUANTIFICATION OF DIVERSE MICROSTRUCTURES
A common framework for quantification of diverse microstructures:
- Qualitative representation: "equiaxed grains, grain size small".
- Lower-order descriptor approach: a grain size distribution (number of grains versus grain size number) within the space of equiaxial grain microstructures.
- Quantitative approach: the microstructure is represented by a set of numbers (e.g., 1.4, 2.6, 4.0, 0.9, …) in a representation space of all possible polyhedral microstructures.
35. BENEFITS
- A data-abstraction layer for describing microstructural information.
- An unbiased representation for comparing simulations and experiments, and for evaluating correlations between microstructure and properties.
- A self-organizing database of valuable microstructural information that can be associated with processes and properties.
- Data mining: process-sequence selection for obtaining desired properties; identification of multiple process paths leading to the same microstructure.
- Adaptive selection of a basis for reduced-order microstructural simulations.
- Hierarchical libraries for real-time 3D microstructure reconstruction by matching multiple lower-order features.
- Quality control: allows machine inspection and unambiguous quantitative specification of microstructures.
36. PRINCIPAL COMPONENT ANALYSIS
- Let I_1, …, I_n be n images.
- Vectorize the input images.
- Create an average image, and subtract it to generate the training images.
- Create the correlation matrix (L_mn).
- Find the eigenbasis (v_i) of the correlation matrix.
- Eigen-microstructures (u_i) are generated from the basis (v_i).
- Any new image can be transformed into its eigen-microstructure components through n coefficients (w_k).
The data points are thus represented by coefficients over a reduced basis.
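A minimal NumPy sketch of the procedure above, using the small n×n correlation-matrix trick familiar from eigenfaces (the function names and the choice of 5 components are illustrative, not from the original):

```python
import numpy as np

def pca_basis(images, n_components=5):
    """Eigen-microstructure basis via the small (n x n) correlation matrix."""
    X = np.stack([im.ravel().astype(float) for im in images])  # n x p
    mean = X.mean(axis=0)
    A = X - mean                       # training images (average removed)
    L = A @ A.T                        # n x n correlation matrix
    evals, V = np.linalg.eigh(L)       # eigenbasis of L
    order = np.argsort(evals)[::-1][:n_components]
    U = A.T @ V[:, order]              # p x k eigen-microstructures
    U /= np.linalg.norm(U, axis=0)     # normalize each basis column
    return mean, U

def project(image, mean, U):
    """Representation coefficients w_k of a new image over the reduced basis."""
    return U.T @ (image.ravel().astype(float) - mean)
```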
37. REQUIREMENTS OF A REPRESENTATION SCHEME
A set of numbers that completely represents a microstructure within its class (e.g., 2.7, 3.6, 1.2, 0.1, … versus 8.4, 2.1, 5.7, 1.9, …) and that differentiates it from other cases (i.e., is statistically representative). We need a technique that is autonomous, applicable to a variety of microstructures, computationally feasible, and provides complete representation.
38. PCA REPRESENTATION OF MICROSTRUCTURE: AN EXAMPLE
Input microstructures are quantified by representation coefficients (× 0.001) over the eigen-microstructures (basis 1 through basis 5). For example, image 1 is quantified by 5 coefficients over the eigen-microstructures.

Image   | Basis 1 | Basis 2 | Basis 3 | Basis 4 | Basis 5
Image-1 |  0.0125 |  1.3142 | −4.23   |  4.5429 | −1.6396
Image-2 | −0.8406 |  0.8463 | −3.0232 |  0.3424 |  2.6752
Image-3 |  3.943  | −4.2162 | −0.6817 | −0.9718 |  1.9268
Image-4 |  1.1796 | −1.3354 | −2.8401 |  6.2064 | −3.2106
Image-5 |  5.8294 |  5.2287 | −3.7972 | −3.6095 | −3.6515
39. EIGENVALUES AND RECONSTRUCTION OVER THE BASIS
Significant eigenvalues capture most of the image features. Reconstruction of microstructures over fractions of the basis:
1. Reconstruction with 100% of the basis
2. Reconstruction with 80% of the basis
3. Reconstruction with 60% of the basis
4. Reconstruction with 40% of the basis
40. INCREMENTAL PCA METHOD
- Used to update the representation basis in real time as new microstructures are added.
- The basis update is driven by an error measure between the original microstructure and its reconstruction over the existing basis.
IPCA: given the eigenbasis for 9 microstructures, the update in the basis for the 10th microstructure is based on a PCA of 10 × 1 coefficient vectors instead of 16384 × 1 microstructure vectors.
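The authors' incremental update is described only at a high level here, so the following is an assumption-laden sketch: scikit-learn's IncrementalPCA provides the same kind of real-time basis refresh, and the reconstruction error can drive the decision to update:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Maintain a 10-component reduced basis for vectorized microstructure images.
ipca = IncrementalPCA(n_components=10)

def add_batch(images):
    """Update the basis with a new batch (each batch needs >= n_components images)."""
    X = np.stack([im.ravel() for im in images])
    ipca.partial_fit(X)

def reconstruction_error(image):
    """Relative error of the image reconstructed over the current basis."""
    x = image.ravel()[None, :]
    x_hat = ipca.inverse_transform(ipca.transform(x))
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```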
41. ROSE OF INTERSECTIONS: FEATURE ALGORITHM
(Saltykov, 1974)
- Identify the intercepts of lines with grain boundaries, plotted within a circular domain.
- Count the number of intercepts over several lines placed at various angles.
- The total number of intercepts of lines at each angle is given as a polar plot, called the rose of intersections.
42. GRAIN SHAPE FEATURE EXAMPLES
43. GRAIN SIZE PARAMETER
Several lines are superimposed on the microstructure, and the intercept lengths of the lines with the grain boundaries are recorded (Vander Voort, 1993). The histogram of intercept length (x-axis) versus number of lines (y-axis) is used as the measure of grain size.
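A sketch of this intercept-length feature on a binary grain-boundary image (horizontal scan lines only, for brevity; the original uses lines at several angles, and the function name and bin count are illustrative):

```python
import numpy as np

def intercept_histogram(boundary, n_lines=64, bins=20):
    """Grain-size feature: histogram of intercept lengths along scan lines
    of a binary grain-boundary image (True = boundary pixel)."""
    rows = np.linspace(0, boundary.shape[0] - 1, n_lines).astype(int)
    lengths = []
    for r in rows:
        cuts = np.flatnonzero(boundary[r])    # boundary crossings on this line
        if len(cuts) > 1:
            lengths.extend(np.diff(cuts))     # intercept lengths between crossings
    hist, _ = np.histogram(lengths, bins=bins, range=(0, boundary.shape[1]))
    return hist  # feature vector: number of intercepts per length bin
```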
44. GRAIN SIZE FEATURE EXAMPLES
45. SVM TRAINING FORMAT
Grain features given as input to the SVM training algorithm (one data point per row):

Class | Feature number | Feature value | Feature number | Feature value
1     | 1              | 23.32         | 2              | 21.52
2     | 1              | 24.12         | 2              | 31.52

Classification success:

Total images | Number of classes | Training images | Highest success rate | Average success rate
375          | 11                | 40              | 95.82%               | 92.53%
375          | 11                | 100              | 98.54%               | 95.80%
46. CLASS HIERARCHY
Level 1: grain shapes (Class 1, Class 2).
Level 2: subclasses based on grain sizes (Class 1(a), 1(b), 1(c); Class 2(a), 2(b), 2(c)).
New classes are detected from the distance of an image's feature vector from the average feature vector of each class.
47. IPCA QUANTIFICATION WITHIN CLASSES
The library stores, per class, the quantification and image representation. Example classes: Class-i microstructures (elongated 45 degrees, small grain size) and Class-j microstructures (equiaxial grains, medium grain size).

Representation matrix (components in the basis vectors):
Basis vector  | Image-1 | Image-2 | Image-3
1             | 123     | 23      | 38
2             | 91      | 54      | −85
3             | −54     | 90      | 12
Average image | 21      | 23      | 24

Eigenbasis stored as vectors (e.g., 0.9, 0.84, 0.23, …; 0.54, 0.21, 0.74, …).
48. REPRESENTATION FORMAT FOR MICROSTRUCTURE
Example record: Date 1/12, 02:23 PM, basis updated. Shape class 3 (oriented 40 degrees, elongated); size class 1 (large grains); coefficients in the basis: 2.42, 12.35, −4.14, 1.95, 1.96, −1.25.
Classification improves microstructure representation: compare a reconstruction with 6 coefficients (24% of the basis) in a class with 25 images against a reconstruction with the same 6 coefficients (only 10% of the basis) in a class of 60 images, and against the original image and a reconstruction over 15 coefficients.
49. A DYNAMIC LIBRARY APPROACH
- Classify microstructures based on lower-order descriptors.
- Create a common basis for representing images in each class at the last level of the class hierarchy.
- Represent 3D microstructures as coefficients over a reduced basis in the base classes.
- Dynamically update the basis and the representation as new microstructures arrive.
(Plot annotation: the eigenvalue spectrum does not decay to zero.)
A COMMON BASIS FOR MICROSTRUCTURE REPRESENTATION
50. Basis components: project a microstructure onto the basis and reconstruct it using two basis components with coefficients 5.89 and 14.86, followed by pixel-value round-off. Representation using just 2 coefficients: (5.89, 14.86).
51. Creation of 3D microstructure models from 2D images
- 3D imaging requires time and effort; we need real-time methodologies for generating 3D realizations.
- Make intelligent use of available information from computational models and experiments.
Flow: 2D imaging techniques → pattern recognition / vision → database → microstructure analysis.
52. Available methods are optimization-based: features of a 2D image are matched to those of a 3D microstructure by posing an optimization problem. Drawbacks: (1) they do not make use of available information (experimental/simulated data); (2) they cannot perform reconstructions in real time. One needs to take into account the processes that create these microstructures (Oren and Bakke, 2003) to correctly model the geometric connectivity.
Key assumptions employed for 3D image reconstruction from a single 2D image:
- Randomness assumption (Ohser and Mücklich, 2000).
- Grains in a polyhedral microstructure are assumed to be of similar shape but of different sizes.
- Two-phase microstructures can be characterized using rotationally-invariant probability functions.
53. PATTERN RECOGNITION: A DATA-DRIVEN OPTIMIZATION TOOL
Real-time feature matching for the reconstruction of 3D microstructures:
- DATABASE CREATION: datasets of microstructures from experiments or physical models.
- FEATURE EXTRACTION: extraction of statistical features from the database.
- TRAINING: creation of a microstructure class hierarchy using classification methods.
- PREDICTION: prediction of 3D reconstructions, process paths, etc.
54. Algorithm (one Monte Carlo step):
- Compute the free energy of a randomly selected node (H_i).
- Randomly choose a new crystallographic orientation for the node.
- Recompute the free energy of the element (H_f).
- The orientation that minimizes the energy (min(H_f, H_i)) is retained.
Potts Hamiltonian (standard form, consistent with the slide's definitions): H = Σ_{i=1}^{N_s} Σ_{j=1}^{N_n(i)} (1 − δ(q_i, q_j)), where N_s is the total number of nodes and N_n(i) the number of neighbors of node i.
The resulting microstructure database is organized into classes based on the grain-size feature.
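A minimal sketch of one such Monte Carlo step on a voxelized orientation field (the zero-temperature accept rule follows the list above; the 6-neighbor Potts energy and periodic boundaries are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def potts_energy(spins, i, j, k):
    """Local Potts energy of voxel (i,j,k): one unit per unlike 6-neighbor."""
    s, (I, J, K) = spins[i, j, k], spins.shape
    nbrs = [spins[(i + d) % I, j, k] for d in (-1, 1)] + \
           [spins[i, (j + d) % J, k] for d in (-1, 1)] + \
           [spins[i, j, (k + d) % K] for d in (-1, 1)]
    return sum(1 for q in nbrs if q != s)

def mc_step(spins, n_orient):
    """One MC step: flip a random voxel to a random new orientation,
    keeping the flip only if the local energy does not increase."""
    i, j, k = (rng.integers(n) for n in spins.shape)
    h_i = potts_energy(spins, i, j, k)
    old = spins[i, j, k]
    spins[i, j, k] = rng.integers(n_orient)  # trial orientation
    h_f = potts_energy(spins, i, j, k)
    if h_f > h_i:                            # keep the lower-energy state
        spins[i, j, k] = old
```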
55. Slice: the intercept lengths of a parallel network of lines with the grain boundaries are recorded at several angles. The histogram of intercept length (x-axis) versus number of lines (y-axis) is the measure of grain size (the Heyn intercept histogram).
56. FEATURE-BASED CLASSIFICATION
3D microstructures are classified in two levels: Level 1 uses the rose of intersections (Class 1, Class 2), and Level 2 uses the Heyn intercept histogram (Class 1, Class 2, Class 3, Class 4).
57. RECONSTRUCTION OF POLYHEDRAL MICROSTRUCTURE
Polarized-light micrographs of aluminum alloy AA3002 representing the rolling plane (Wittridge and Knutsen, 1999); a reconstructed 3D image; comparison of the average feature of the 3D class with that of the 2D image.
58. The stereological integral equation estimates the 3D grain-size distribution from a 2D image for polyhedral microstructures.
- N_a, F_a(s): density of grains and grain-size distribution in the 2D image.
- N_v, F_v(u): density of grains and grain-size distribution in the 3D microstructure.
- The rotation average of the size of a particle with maximum size 1.
- G_u(s): size distribution function of the section profiles under the condition that a random size U equals the 3D particle mean size (u).
Remark: sizes are defined as the maximum calliper diameter of a grain.
59. STEREOLOGICAL DISTRIBUTIONS (GEOMETRICAL)
(Figure: a 2D grain profile versus a 3D grain, and a 3D reconstruction.) N_a, F_a(s): density of grains and grain-size distribution in the 2D image; N_v, F_v(u): density of grains and grain-size distribution in the 3D microstructure.
60. Rotationally invariant probability functions (S_N^i) can be interpreted as the probability of finding the N vertices of a polyhedron, separated by relative distances x1, x2, …, xN, in phase i when the polyhedron is tossed into the microstructure without regard to orientation.
MC sampling: computing the three-point probability function S3(r, s, t) of a 3D microstructure (40×40×40 microns) with r = s = t = 2; 5000 initial points, 4 samples at each initial point.
61. SOLIDIFICATION MODEL
- The microstructure is represented using voxels.
- The probability of solidification (P) depends on:
  1) the net weight (w) of the number of neighbors of a solid voxel: if w > 8.6568 the voxel solidifies (P = 1); if 3.8284 < w < 8.6568, P = 0.1; if w < 3.8284, the voxel remains liquid (P = 0);
  2) the solute concentration: a linear probability distribution with P = 0 at the critical concentration and P = 1 at zero concentration.
When a voxel solidifies, liquid is expelled to its neighbors, creating solute-concentration (c_{i,j,k}) gradients. The movement of solute to minimize concentration gradients is modeled using Fick's law, where (i,j,k) is a voxel coordinate, n is the time step, and D is the diffusion coefficient. The final state is the resulting two-phase microstructure.
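The discrete Fick's-law update itself was a figure on the original slide; a plausible explicit form, assuming a forward-Euler step over the six face-neighbors N(i,j,k) of each voxel:

```latex
c^{\,n+1}_{i,j,k} \;=\; c^{\,n}_{i,j,k} \;+\; D\,\Delta t
\sum_{(i',j',k')\,\in\,\mathcal{N}(i,j,k)} \bigl( c^{\,n}_{i',j',k'} - c^{\,n}_{i,j,k} \bigr)
```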
62. TWO-PHASE MICROSTRUCTURE CLASS HIERARCHY
Level 1 feature: the autocorrelation function (plotted as g versus distance r), separating the 3D microstructures into Class 1 and Class 2. Level 2 feature vector: the three-point probability function.
63. EXAMPLE: 3D RECONSTRUCTION USING SVMs
Ag-W composite (Umekawa, 1969) and a reconstructed 3D microstructure; features: the autocorrelation function and the three-point probability function.
64. MICROSTRUCTURE ELASTIC PROPERTIES
Comparison of the experimental image and the 3D image derived through pattern recognition.
65. WHAT IS MICROSTRUCTURE DESIGN?
Direct problem: known operating conditions + initial microstructure → property? (Use finite elements, experiments, etc.)
Design problems:
- Design for best processes: given an initial microstructure and a final microstructure/property, what is the processing sequence?
- Design for best microstructure: given known operating conditions and known property limits, what is the microstructure?
66. SUPERVISED VS. UNSUPERVISED LEARNING
Supervised classification for design:
- Classify microstructures based on known process-sequence classes.
- Given a desired microstructure, identify the required processing stages through classification.
- Drawback: this identifies a unique process sequence, but we find that many processing paths lead to similar properties!
Unsupervised classification:
- Identify classes purely based on structural attributes.
- Associate processes and properties through databases.
- Explores the structural-attribute space for similarities and unearths non-unique processing paths leading to similar microstructural properties.
67. K-MEANS
Suppose the coordinates of points drawn randomly from this dataset are transmitted, and you can install decoding software at the receiver. You're only allowed to send two bits per point, so it'll have to be a lossy transmission. Loss = sum-squared error between the decoded coordinates and the original coordinates. What encoder/decoder will lose the least information?
68. K-MEANS
Idea one: break the plane into a 2×2 grid and decode each bit-pair (00, 01, 10, 11) as the middle of its grid cell.
Questions:
- What are we trying to optimize?
- Are we sure it will find an optimal clustering?
A better idea: break into a grid and decode each bit-pair as the centroid of all data in that grid cell.
69. K-MEANS
Find the cluster centers c_1, c_2, …, c_k such that the sum of the squared 2-norm distances between each feature x_i, i = 1,…,n, and its nearest cluster center c_h is minimized. Cost function (minimized by transmitting centroids):
J = Σ_{i=1}^{n} min_h ‖x_i − c_h‖²
70. THE EXPECTATION-MAXIMIZATION (EM) ALGORITHM
- What can be changed about the centers c_1, c_2, …, c_k when the distortion is not minimized?
- Expectation step: compute expected centers, i.e., change the encoding so that each x_i is encoded by its nearest center.
- Maximization step: compute maximum-likelihood values of the centers, i.e., set each center to the centroid of the points it owns.
- There's no point applying either operation twice in succession, but it can be profitable to alternate. And that's k-means!
(The EM algorithm proper will be dealt with later.)
71-75. K-MEANS (the algorithm, built up over slides 71-75)
- Ask the user how many clusters they'd like (e.g., k = 5).
- Randomly guess k cluster-center locations.
- Each datapoint finds out which center it's closest to (thus each center "owns" a set of datapoints).
- Each center finds the centroid of the points it owns… and jumps there.
- Repeat until terminated!
Note: k is often unknown (it depends on the features used for microstructure representation).
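A plain-NumPy sketch of the loop just described (Lloyd's algorithm; initializing by sampling k data points is one common choice, not prescribed by the slides):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: X is (n, d); returns cluster centers and labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # E-step: each point is owned by its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # M-step: move each center to the centroid of the points it owns
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # converged: centers stopped moving
            break
        centers = new
    return centers, labels
```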
76. SHORTCOMINGS OF K-MEANS AND REMEDIES
- K-means gives hyper-spherical clusters, which is not always the case with data.
- The number of classes must be known a priori, which defeats the purpose of unsupervised clustering: we do not know anything about the classes in the data.
- It may converge to local optima (not so bad).
We will discuss new strategies for obtaining improved clusters of microstructural features: Gaussian mixture models and Bayesian clustering, and later an improved k-means algorithm called x-means, which uses a Bayesian information criterion.
77. PROBABILITY PRELIMINARIES
- A is a Boolean-valued random variable if A denotes an event and there is some degree of uncertainty as to whether A occurs.
- Examples: A = "you win the toss"; A = "a structure fails".
Discrete random variables:
0 ≤ P(A) ≤ 1; P(True) = 1; P(False) = 0
P(A or B) = P(A) + P(B) − P(A and B)
P(A) + P(¬A) = 1
P(B) = P(B, A) + P(B, ¬A)
78. PROBABILITY PRELIMINARIES
Definition of conditional probability: P(A|B) = P(A, B) / P(B).
Corollary, the chain rule: P(A, B) = P(A|B) P(B).
Bayes' rule: P(B|A) = P(A, B) / P(A) = P(A|B) P(B) / P(A).
79. PROBABILITY PRELIMINARIES
- MLE (maximum likelihood estimator): class of data = argmax_i P(data | class_i). But what if Y = v itself is very unlikely?
- MAP (maximum a-posteriori estimator): class of data = argmax_i P(class_i | data). Includes the P(Y = v) information through Bayes' rule (P(Y = v) is called the prior).
80. PROBABILITY PRELIMINARIES
MAP (maximum a-posteriori estimator), continued: class of data = argmax_i P(class_i | data) = argmax_i P(data | class_i) P(class_i).
81. PROBABILITY PRELIMINARIES
Bayes classifiers in a nutshell:
1. Learn the distribution over inputs for each value of Y.
2. This gives P(X1, X2, …, Xm | Y = v_i).
3. Estimate P(Y = v_i) as the fraction of records with Y = v_i.
4. For a new prediction: Y_pred = argmax_v P(Y = v | X1, …, Xm) = argmax_v P(X1, …, Xm | Y = v) P(Y = v).
82. NAÏVE BAYES CLASSIFIER
In the case of the naïve Bayes classifier, this can be simplified using the independent-features assumption: P(X1, …, Xm | Y) = Π_j P(Xj | Y).
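A minimal sketch with scikit-learn's Gaussian naïve Bayes (the two features and the tiny dataset are hypothetical, echoing the strength example of slide 15):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Features: [pore density, volume fraction]; labels: 0 = low, 1 = high strength
X = np.array([[0.12, 0.30], [0.15, 0.28], [0.02, 0.55], [0.03, 0.60]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)            # per-class mean/variance for each feature
print(clf.predict([[0.05, 0.50]]))      # expected: class 1 (high strength)
print(clf.predict_proba([[0.05, 0.50]]))
```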
83. IS THE NAÏVE BAYES CLASSIFIER AN SVM?
The naïve Bayes classifier, after a notation change, can be rewritten as a new Bayes classifier.
84. IS THE NAÏVE BAYES CLASSIFIER AN SVM?
Bayes classifier with feature weighting: a two-class classifier whose decision is given by the sign of f_WBC, a weighted combination of the per-feature terms with weights w_j. Setting w_j = 1 recovers naïve Bayes; but features may be correlated!
85. IS THE NAÏVE BAYES CLASSIFIER AN SVM?
Learning the weights in the feature space of a naïve Bayes classifier with a maximum-margin criterion yields an SVM classifier!
86. INTRO TO BAYESIAN UNSUPERVISED CLASSIFICATION
Gaussian mixture models: assume that each feature vector is generated as follows. Pick a class at random, choosing class i with probability P(w_i); then sample the feature from a Gaussian distribution N(µ_i, Σ_i).
87. GAUSSIAN MIXTURE MODEL
A probabilistic extension of k-means:
- There are k components; the i-th component is called y_i.
- Component y_i has an associated mean vector µ_i.
- Each component generates data from a Gaussian with mean µ_i and covariance matrix Σ_i.
Assuming the features in each class can be modeled by a Gaussian distribution, identify the parameters (means, variances, etc.) of the distributions.
88. GAUSSIAN MIXTURE MODEL
- We have features x_1, x_2, …, x_n of a microstructure.
- We have P(y_1), …, P(y_k), and we have the σ's.
- We can define, for any x, P(x | y_i, µ_1, µ_2, …, µ_k).
- Can we define P(x | µ_1, µ_2, …, µ_k)? (Yes: sum over classes, Σ_i P(x | y_i, µ_i) P(y_i).)
- Can we define P(x_1, x_2, …, x_n | µ_1, µ_2, …, µ_k)? (Yes: for independent draws, the product over i.)
89. GAUSSIAN MIXTURE MODEL
Given a guess at µ_1, µ_2, …, µ_k, we can obtain the probability of the unlabeled data given those µ's. Inverse problem: find the µ's given the points x_1, x_2, …, x_n.
The normal max-likelihood trick: set d log Prob(·)/dµ_i = 0 and solve for the µ_i's. Using gradient descent is slow but doable; instead we use a much faster and recently very popular method: EM.
90. THE EM ALGORITHM REVISITED
- We have unlabeled microstructural features x_1, x_2, …, x_R.
- We know there are k classes, and we know P(y_1), P(y_2), …, P(y_k).
- We don't know µ_1, µ_2, …, µ_k.
- We can write P(data | µ_1, …, µ_k) and maximize this likelihood.
91. GAUSSIAN MIXTURE MODEL
Maximizing the likelihood gives n nonlinear equations in the µ_j's. If, for each x_i, we knew the probability P(y_j | x_i, µ_1, …, µ_k) that it belongs to class y_j, then we could easily compute µ_j. Conversely, if we knew each µ_j, then we could easily compute P(y_j | x_i, µ_1, …, µ_k) for each y_j and x_i. EM alternates between these two half-solved problems.
92. GAUSSIAN MIXTURE MODEL
Iterate. On the t-th iteration let our estimates be µ_1(t), µ_2(t), …, µ_c(t).
E-step: compute the expected class memberships of all datapoints for each class, P(y_j | x_k, µ(t)); this is just a Gaussian evaluated at x_k, weighted by the class prior and normalized.
M-step: compute the maximum-likelihood µ given our data's class-membership distributions, µ_j(t+1) = Σ_k P(y_j | x_k) x_k / Σ_k P(y_j | x_k).
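A compact sketch of the E- and M-steps above for a full-covariance Gaussian mixture (the initialization and the small regularization term are implementation choices, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture: returns means, covariances, weights, responsibilities."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]          # initial means
    cov = [np.cov(X.T) + 1e-6 * np.eye(d)] * k       # initial covariances
    w = np.full(k, 1.0 / k)                          # mixing weights P(y_j)
    for _ in range(n_iter):
        # E-step: responsibilities P(y_j | x_i) via Bayes rule
        R = np.column_stack([w[j] * multivariate_normal.pdf(X, mu[j], cov[j])
                             for j in range(k)])
        R = np.clip(R, 1e-300, None)                 # guard against underflow
        R /= R.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood update of weights, means, covariances
        Nj = R.sum(axis=0)
        w = Nj / n
        mu = (R.T @ X) / Nj[:, None]
        cov = [((R[:, j, None] * (X - mu[j])).T @ (X - mu[j])) / Nj[j]
               + 1e-6 * np.eye(d) for j in range(k)]
    return mu, cov, w, R
```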
93. GAUSSIAN MIXTURE MODEL: DENSITY ESTIMATION
A mixture captures a complex PDF of the feature space (illustrated with features in 2D). Uses: classification, probabilistic quantification of results, ambiguity and anomaly detection. Very popular in genome mapping.
94. DATABASE FOR POLYCRYSTAL MICROSTRUCTURES
Multi-scale microstructure evolution models feed a meso-scale database. Components: feature extraction → statistical learning (driven by distance-based or probabilistic clustering) → divisive clustering → class hierarchies → class prediction.
95-96. DATABASE FOR POLYCRYSTAL MICROSTRUCTURES
(The same pipeline, with successive components highlighted.)
97. ORIENTATION DISTRIBUTION FUNCTION
The orientation distribution function (ODF), A(r, t):
- Determines the volume fraction of crystals within a region R′ of the fundamental region R.
- Gives the probability of finding a crystal orientation within a region R′ of the fundamental region.
- Characterizes texture evolution.
ODF evolution equation (Eulerian description): conservation of the ODF under the reorientation velocity.
Any macroscale property ⟨χ⟩ can be expressed as an expectation value if the corresponding single-crystal property χ(r, t) is known.
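Written out, the two relations referenced above (standard forms consistent with the slide's definitions; v(r, t) denotes the reorientation velocity, and the evolution equation is the usual conservation statement for the ODF):

```latex
\frac{\partial A(\mathbf{r},t)}{\partial t} + \nabla\cdot\bigl(A(\mathbf{r},t)\,\mathbf{v}(\mathbf{r},t)\bigr) = 0,
\qquad
\langle \chi \rangle(t) = \int_{R} \chi(\mathbf{r},t)\,A(\mathbf{r},t)\,dv
```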
98. FEATURES OF AN ODF: ORIENTATION FIBERS
For a particular crystal direction h, the pole figure takes values P(h, y) at locations y on a unit sphere (e.g., the {1,2,3} pole figure at the sample-axis point y = (1,0,1)). P(h, y) is obtained by integrating over all points r of the (h, y) fiber in the fundamental region, i.e., over all orientations for which crystal direction h is aligned with sample direction y. Example fiber: h = (1,2,3), y = (1,0,1).
99. SIGNIFICANCE OF ORIENTATION FIBERS
Important fiber families: <110> for uniaxial compression, plane-strain compression, and simple shear; <111> for torsion; <100> and <411> fibers for tension; the α fiber (ND <110>) and β fiber for FCC metals under plane-strain compression. Fibers have a close affiliation with processes, and fiber development is predictable: a uniaxial (z-axis) compression texture produces the z-axis <110> (B-B), z-axis <111> (C-C), and z-axis <100> (A-A) fibers.
100. LIBRARY FOR TEXTURES
Uniaxial (z-axis) compression texture: the feature is the <110> fiber family, i.e., the fiber path q corresponding to crystal direction h and sample direction y (the z-axis <110> fiber, B-B).
101. SUPERVISED CLASSIFICATION USING SUPPORT VECTOR MACHINES
Multi-stage classification, with each class affiliated with a unique process (e.g., stage 1: tension (T); stage 2; stage 3). Given an ODF/texture, this identifies a unique processing sequence, but it fails to capture the non-uniqueness in the solution.
102. UNSUPERVISED CLASSIFICATION
Find the cluster centers C_1, C_2, …, C_k such that the sum of the squared 2-norm distances between each feature x_i, i = 1,…,n, and its nearest cluster center C_h is minimized (the k-means cost function). Identify clusters in the feature space of the database of ODFs; each class is affiliated with multiple processes.
103. ODF CLASSIFICATION
- Automatic class discovery without class labels.
- A hierarchical classification model.
- Association of classes with processes, to facilitate data mining.
- Can be used to identify multiple process routes for obtaining a desired ODF: e.g., ODFs 2, 12, 32, and 97 fall in one class, so one ODF corresponds to several process paths. Data mining for process information with ODF classification.
104. PROCESS PARAMETERS LEADING TO DESIRED PROPERTIES
ODF classification over the database of ODFs, followed by property extraction, identifies multiple solutions (velocity gradients): different processes, similar properties.
105. K-MEANS ALGORITHM FOR UNSUPERVISED CLASSIFICATION
Lloyd's algorithm:
1. Start with k randomly initialized centers.
2. Change the encoding so that each x_i is owned by its nearest center.
3. Reset each center to the centroid of the points it owns.
4. Alternate steps 2 and 3 until converged.
The user needs to provide k, the number of clusters; but the number of clusters is unknown for the texture classification problem.
106. SCHWARZ CRITERION FOR IDENTIFYING THE NUMBER OF CLUSTERS
Compute the maximum-likelihood estimate of the variance assuming a Gaussian data distribution, the probability of a point belonging to cluster i, and the log-likelihood of the data in a cluster; the Schwarz (Bayesian information) criterion penalizes this likelihood by the number of free parameters (a sketch follows slide 107).
107. CENTROID SPLIT TESTS
X-means algorithm:
- Start with the k clusters found through the k-means algorithm.
- Split each centroid into two centroids, and move the new centroids a distance proportional to the cluster size in an arbitrarily chosen direction.
- Run local k-means (k = 2) in each cluster.
- Accept the split in a region if BIC(k = 1) < BIC(k = 2) for that region.
- Test various initial values of k and select the k with the maximum overall BIC.
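A sketch of the BIC score of slide 106 and the split test above, under the x-means-style spherical-Gaussian assumption (the function names are illustrative):

```python
import numpy as np

def bic_kmeans(X, centers, labels):
    """BIC for a k-means clustering viewed as a spherical Gaussian mixture
    (higher BIC = better model)."""
    n, d = X.shape
    k = len(centers)
    # shared spherical variance, maximum-likelihood estimate
    var = max(((X - centers[labels]) ** 2).sum() / (n * d), 1e-12)
    # log-likelihood: each point drawn from its cluster's Gaussian, size-weighted
    ll = sum((labels == j).sum() * np.log((labels == j).sum() / n)
             for j in range(k) if (labels == j).sum())
    ll += -0.5 * n * d * np.log(2 * np.pi * var) \
          - ((X - centers[labels]) ** 2).sum() / (2 * var)
    n_params = k * d + (k - 1) + 1          # means + weights + shared variance
    return ll - 0.5 * n_params * np.log(n)

def accept_split(X_cluster, c_parent, centers2, labels2):
    """Split test: keep two children if they score a higher BIC than the parent."""
    one = bic_kmeans(X_cluster, c_parent[None, :], np.zeros(len(X_cluster), int))
    two = bic_kmeans(X_cluster, centers2, labels2)
    return two > one
```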
108. COMPARISON OF K-MEANS AND X-MEANS
- A local optimum produced by the k-means algorithm with k = 4.
- A cluster configuration produced by k-means with k = 6 over-estimates the natural number of clusters.
- The configuration produced by the x-means algorithm (input range of k: 2 to 15): x-means found 4 clusters in the data set based on the Bayesian information criterion.
109. MULTIPLE PROCESS ROUTES (found via classification)
Desired Young's modulus distribution:
- Route 1: stage 1, tension (α = 0.9495); stage 2, rotation-1 (α = −0.2408).
- Route 2: stage 1, tension (α = 0.9699); stage 2, shear-1 (α = 0.3384).
Desired magnetic hysteresis-loss distribution:
- Route 1: stage 1, shear-1 (α = 0.9580); stage 2, plane-strain compression (α = −0.1597).
- Route 2: stage 1, shear-1 (α = 0.9454); stage 2, rotation-1 (α = −0.2748).
110. LIMITATIONS OF STATISTICAL-LEARNING-BASED DESIGN SOLUTIONS
- Classification alone does not yield the final design solution. Why? Because it is impossible to explore the infinite design space within a database of reasonable size.
- Use statistical learning to provide an initial class of solutions.
- Then use local optimization schemes (details not given in this presentation) to identify the exact solutions.
(Figure: a response surface of the objective to be minimized over microstructure attributes, with the statistical-learning design solutions as starting points.)
111. DESIGN FOR A DESIRED ODF: A MULTI-STAGE PROBLEM
Optimal reduced-order control for a desired ODF: stage 1, plane-strain compression (α1 = 0.9472); stage 2, compression (α2 = −0.2847); initial guess α1 = 0.65, α2 = −0.1. The full-order ODF is then computed from the reduced-order control parameters.
112. DESIGN FOR A DESIRED MAGNETIC PROPERTY
The crystal <100> direction is the easy direction of magnetization (zero power loss) relative to the external magnetization direction h. Solution: stage 1, shear-1 (α1 = 0.9745); stage 2, tension (α2 = 0.4821).
113. DESIGN FOR A DESIRED YOUNG'S MODULUS
Given the stiffness of FCC Cu in the crystal frame, the elastic modulus is found using the polycrystal average ⟨C⟩ over the ODF. Solution: stage 1, shear (α1 = −0.03579); stage 2, tension (α2 = 0.17339).
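The polycrystal average referenced above has the same expectation-value form as the property average on slide 97:

```latex
\langle \mathbf{C} \rangle(t) \;=\; \int_{R} \mathbf{C}(\mathbf{r})\,A(\mathbf{r},t)\,dv
```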
114. WHAT WE SHOULD KNOW
Appreciate the uses, and understand the limitations, of statistical learning applied to materials:
- How to learn microstructure/process/property relationships given computational and experimental data.
- Be comfortable with probabilistic tools: Bayesian analytics and Gaussian mixture models.
- Understand simple tools, like k-means, that can be readily used.
- Understand SVMs as a versatile statistical learning tool for both feature selection and classification.
- Apply statistical learning to make real-time decisions under high degrees of uncertainty.
115. USEFUL REFERENCES
- Andrew Moore's statistical learning course online: http://www-2.cs.cmu.edu/~awm/tutorials/
- Books:
  - R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification (2nd ed.), John Wiley and Sons, New York (2001).
- Example papers on microstructure/materials-related applications of the tools presented in this talk:
  - V. Sundararaghavan and N. Zabaras, "A dynamic material library for the representation of single-phase polyhedral microstructures", Acta Materialia, Vol. 52/14, pp. 4111-4119, 2004.
  - V. Sundararaghavan and N. Zabaras, "Classification of three-dimensional microstructures using support vector machines", Computational Materials Science, Vol. 32, pp. 223-239, 2005.
  - V. Sundararaghavan and N. Zabaras, "On the synergy between classification of textures and deformation process sequence selection", Acta Materialia, Vol. 53/4, pp. 1015-1027, 2005.
  - T.J. Sabin, C.A.L. Bailer-Jones and P.J. Withers, "Accelerated learning using Gaussian process models to predict static recrystallization in an Al-Mg alloy", Modelling Simul. Mater. Sci. Eng., 8 (2000), pp. 687-706.
  - C.A.L. Bailer-Jones, H.K.D.H. Bhadeshia and D.J.C. MacKay, "Gaussian Process Modelling of Austenite Formation in Steel", Materials Science and Technology, Vol. 15, 1999, pp. 287-294.
116. THANK YOU