Title: Chapter 4: Linear Methods for Classification
Chapter 4: Linear Methods for Classification
- Linear regression of an indicator matrix
- Linear discriminant analysis
- Logistic regression
- Separating hyperplanes
In this chapter, decision boundaries are linear.
4.2. Linear regression of an indicator matrix
Indicator response matrix: a response G with K classes is coded as an N x K indicator matrix Y, with Y_{ik} = 1 if observation i is in class k and 0 otherwise.
Example: 2 groups (K = 2) and 5 observations (N = 5), with observations 1 and 5 in group 1 and observations 2, 3 and 4 in group 2, giving
Y = (1 0; 0 1; 0 1; 0 1; 1 0)  (one row per observation, one column per group).
Fit a linear regression model to each column of Y (see Chapter 3 for linear regression):
Ŷ = X (X^T X)^{-1} X^T Y = X B,
where X is the model matrix (a column of 1's followed by the p inputs) and B is the (p+1) x K matrix of least-squares coefficients.
Classification of a new observation x:
- compute the fitted output f(x)^T = (1, x^T) B, a vector with one component per class
- identify the largest component and classify accordingly: Ĝ(x) = argmax_k f_k(x)
Justification: the regression fit is an estimate of the conditional expectation E(Y_k | X = x) = Pr(G = k | X = x), so it is natural to put x in group k when f_k(x) is the maximum over the classes G_k.
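A minimal numpy sketch of this procedure on the K = 2, N = 5 example above (the single input variable and its values are made up purely for illustration):

```python
import numpy as np

# Toy inputs for the 5 observations (one feature); values are illustrative only.
x = np.array([0.2, 1.1, 1.4, 0.9, 0.1])
X = np.column_stack([np.ones_like(x), x])        # model matrix with intercept column

# Indicator response matrix Y: observations 1 and 5 in group 1, observations 2-4 in group 2.
Y = np.array([[1, 0], [0, 1], [0, 1], [0, 1], [1, 0]], dtype=float)

# Least-squares fit to each column of Y: B = (X^T X)^{-1} X^T Y.
B, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Classify a new observation: compute the fitted output and take the largest component.
x_new = np.array([1.0, 0.5])                     # (1, x) for a new point x = 0.5
f_hat = x_new @ B                                # length-K vector of fitted values
print("fitted values:", f_hat, "-> class", np.argmax(f_hat) + 1)
```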
Is linear regression a good estimate of the conditional expectation?
Problem: the fitted values can be negative or greater than 1 (in particular if the prediction is made outside the hull of the training data), so they are awkward to interpret as probabilities.
BUT the resulting classifications are often good.
Solution: linear regression onto a basis expansion h(X) of the inputs (see Chapter 5).
A more simplistic viewpoint: construct a target t_k for each class, where t_k is the kth column of the K x K identity matrix, fit the linear model by least squares,
min_B Σ_i || y_i - [(1, x_i^T) B]^T ||^2,
and classify a new observation to the closest target: Ĝ(x) = argmin_k || f(x) - t_k ||^2.
Problem: with K ≥ 3 classes, some classes can be masked by others (their fitted function never dominates).
Solution: a quadratic rather than linear fit (more generally, polynomial terms up to degree K - 1), as in the sketch below.
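A small sketch of the masking effect, assuming three well-separated classes along a single input (the data are simulated only to illustrate the point): with the linear fit the middle class is essentially never predicted, while adding a quadratic term recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three classes spread along a line: class means at -4, 0 and +4 (illustrative data).
x = np.concatenate([rng.normal(m, 1.0, 50) for m in (-4.0, 0.0, 4.0)])
g = np.repeat([0, 1, 2], 50)
Y = np.eye(3)[g]                                        # indicator response matrix

def fit_and_classify(H):
    """Least-squares fit of the indicator matrix on basis H, then argmax classification."""
    B, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return np.argmax(H @ B, axis=1)

H_lin = np.column_stack([np.ones_like(x), x])           # linear basis
H_quad = np.column_stack([np.ones_like(x), x, x**2])    # basis with quadratic term added

for name, H in [("linear", H_lin), ("quadratic", H_quad)]:
    pred = fit_and_classify(H)
    # With the linear basis the middle class is (almost) never predicted: it is masked.
    print(name, "training error:", np.mean(pred != g))
```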
4.3. Linear discriminant analysis
Let f_k(x) be the density of X in class G = k and π_k the prior probability of class k (Σ_k π_k = 1); by Bayes' theorem, Pr(G = k | X = x) = f_k(x) π_k / Σ_l f_l(x) π_l.
If the f_k(x) are Gaussian and the classes have a common covariance matrix Σ, the log-ratio
log [ Pr(G = k | X = x) / Pr(G = l | X = x) ] = log(π_k / π_l) - (1/2)(μ_k + μ_l)^T Σ^{-1} (μ_k - μ_l) + x^T Σ^{-1} (μ_k - μ_l)
is linear in x, so the decision boundaries are linear.
Discriminant function: δ_k(x) = x^T Σ^{-1} μ_k - (1/2) μ_k^T Σ^{-1} μ_k + log π_k.
Classification: Ĝ(x) = argmax_k δ_k(x), with π_k, μ_k and Σ estimated from the training data.
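A numpy sketch of LDA with plug-in estimates of π_k, μ_k and Σ; the function and variable names are illustrative, not from the book.

```python
import numpy as np

def lda_fit(X, g):
    """Estimate class priors, class means and the pooled covariance from training data."""
    classes = np.unique(g)
    N, p = X.shape
    priors = np.array([np.mean(g == k) for k in classes])
    means = np.array([X[g == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance (denominator N - K).
    Sigma = sum((X[g == k] - m).T @ (X[g == k] - m) for k, m in zip(classes, means))
    Sigma /= (N - len(classes))
    return classes, priors, means, Sigma

def lda_predict(x, classes, priors, means, Sigma):
    """delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 mu_k^T Sigma^{-1} mu_k + log pi_k."""
    Sinv = np.linalg.inv(Sigma)
    deltas = [x @ Sinv @ m - 0.5 * m @ Sinv @ m + np.log(pi)
              for m, pi in zip(means, priors)]
    return classes[np.argmax(deltas)]
```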
Remarks
- with 2 classes, linear discriminant analysis gives the same coefficient direction as classification by linear least squares
- with more than 2 classes, LDA avoids the masking problems of the regression approach
- if the classes do not share a common covariance matrix, the quadratic terms no longer cancel and we obtain quadratic discriminant analysis (QDA)
Regularized discriminant analysis (RDA)
A compromise between linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA).
Regularized covariance matrices: Σ_k(α) = α Σ_k + (1 - α) Σ, with α in [0, 1], where Σ is the pooled covariance matrix used in LDA.
α is determined by cross-validation.
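A short sketch of the regularized covariance matrices of RDA; the choice of α by cross-validation is only indicated in a comment.

```python
import numpy as np

def rda_covariances(X, g, alpha):
    """Sigma_k(alpha) = alpha * Sigma_k + (1 - alpha) * Sigma_pooled, alpha in [0, 1]."""
    classes = np.unique(g)
    Sigma_k = {k: np.cov(X[g == k], rowvar=False) for k in classes}
    # Pooled covariance as used by LDA (weighted by class sizes).
    N = len(g)
    Sigma = sum((np.sum(g == k) - 1) * Sigma_k[k] for k in classes) / (N - len(classes))
    return {k: alpha * Sigma_k[k] + (1 - alpha) * Sigma for k in classes}

# alpha = 1 gives the separate QDA covariances, alpha = 0 the pooled LDA covariance;
# in practice alpha would be chosen by cross-validation over a grid such as np.linspace(0, 1, 11).
```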
Computations
Computations for LDA and QDA are simplified by diagonalising the covariance matrices (eigen-decomposition Σ = U D U^T).
Algorithm:
- sphere the data X (using the eigen-decomposition of the common covariance matrix): X* = D^{-1/2} U^T X, so that the common covariance estimate becomes the identity
- classify in the transformed space to the closest class centroid, after correcting for the class priors log π_k
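A sketch of the sphering step based on the eigen-decomposition Σ = U D U^T (names are illustrative).

```python
import numpy as np

def sphere(X, Sigma):
    """Transform the rows of X so that the common covariance estimate becomes the identity."""
    eigvals, U = np.linalg.eigh(Sigma)           # Sigma = U diag(eigvals) U^T
    W = U @ np.diag(1.0 / np.sqrt(eigvals))      # whitening matrix: apply U^T then D^{-1/2}
    return X @ W                                 # X* with identity covariance

# In the sphered space, classification assigns x* to the closest class centroid,
# after correcting for the class prior probabilities log(pi_k).
```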
Reduced-rank linear discriminant analysis
Fisher: find the linear combination Z = a^T X such that the between-class variance is maximized relative to the within-class variance, i.e. maximize the Rayleigh quotient
max_a (a^T B a) / (a^T W a),
where B is the between-class covariance matrix and W the within-class covariance matrix.
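One way to compute the discriminant directions is as generalized eigenvectors of (B, W); this sketch uses scipy's symmetric generalized eigensolver rather than the explicit sphering of W used in the book, but the resulting directions are the same.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_directions(X, g):
    """Maximize a^T B a / a^T W a: generalized eigenvectors of (B, W)."""
    classes = np.unique(g)
    mu = X.mean(axis=0)
    means = np.array([X[g == k].mean(axis=0) for k in classes])
    # Between-class scatter B and within-class scatter W.
    B = sum(np.sum(g == k) * np.outer(m - mu, m - mu) for k, m in zip(classes, means))
    W = sum((X[g == k] - m).T @ (X[g == k] - m) for k, m in zip(classes, means))
    eigvals, A = eigh(B, W)                      # solves B a = lambda W a
    order = np.argsort(eigvals)[::-1]            # largest Rayleigh quotient first
    return A[:, order]                           # columns are the discriminant directions
```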
4.4. Logistic regression
The model is specified by K - 1 log-odds or logit transformations:
log [ Pr(G = k | X = x) / Pr(G = K | X = x) ] = β_{k0} + β_k^T x, for k = 1, ..., K - 1.
Fitting logistic regression models
Usually by maximum likelihood, with the Newton-Raphson algorithm used to solve the score equations.
Example, K = 2 (2 groups): write p(x; β) = Pr(G = 1 | X = x) and encode the response as y_i = 1 when g_i = 1 and y_i = 0 when g_i = 2. The log-likelihood is
ℓ(β) = Σ_i [ y_i β^T x_i - log(1 + e^{β^T x_i}) ].
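A compact sketch of the Newton-Raphson iterations for the two-class case, maximizing the log-likelihood above (this is the iteratively reweighted least squares form discussed under inference below).

```python
import numpy as np

def logistic_fit(X, y, n_iter=25):
    """Newton-Raphson for binary logistic regression; X includes a column of 1's."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities p(x_i; beta)
        W = p * (1 - p)                          # diagonal of the weight matrix
        # Score: X^T (y - p); Hessian: -X^T W X. Newton update:
        H = X.T @ (X * W[:, None])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta
```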
Example: South African heart disease
Because of correlation between the set of predictors, there are some surprising results: variables that look important on their own are not included in the fitted logistic model.
Quadratic approximations and inference
- the quadratic approximation to the deviance gives the Pearson chi-square statistic
- if the model is correct, the maximum-likelihood estimator is consistent (it converges to the true β) and asymptotically normally distributed
- model building can use the Rao score test and the Wald test
Connection with least squares: the maximum-likelihood estimates of the logistic regression coefficients are the coefficients of a weighted least squares fit, β = (X^T W X)^{-1} X^T W z, with adjusted response z = Xβ + W^{-1}(y - p) and weights W = diag{p(x_i)(1 - p(x_i))}.
Differences between LDA and logistic regression
The two models have the same (linear logit) form, BUT they differ in the way the coefficients are estimated.
Logistic regression is more general and makes fewer assumptions (the marginal density of X is left arbitrary), so it is more robust; BUT the two give very similar results in practice.
4.5. Separating hyperplanes
Perceptrons: classifiers that compute a linear combination of the inputs and return its sign.
A hyperplane or affine set L is defined by the equation f(x) = β_0 + β^T x = 0 (a line in R^2).
Properties:
- β / ||β|| is the vector normal to the surface L
- for any point x_0 in L, β^T x_0 = -β_0
- the signed distance of any point x to L is given by (β^T x + β_0) / ||β|| = f(x) / ||f'(x)||
Rosenblatt's perceptron learning algorithm
Tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary:
minimize D(β, β_0) = -Σ_{i in M} y_i (x_i^T β + β_0),
where M is the index set of misclassified points (responses coded y_i in {-1, +1}).
The algorithm uses stochastic gradient descent to minimize this piecewise linear criterion.
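A sketch of the perceptron updates under stochastic gradient descent, assuming responses coded y_i in {-1, +1} and a fixed learning rate ρ.

```python
import numpy as np

def perceptron(X, y, n_epochs=100, rho=1.0):
    """Stochastic gradient descent on D(beta, beta0) = -sum_{i in M} y_i (x_i^T beta + beta0)."""
    beta = np.zeros(X.shape[1])
    beta0 = 0.0
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ beta + beta0) <= 0:    # point is currently misclassified (in M)
                beta = beta + rho * yi * xi      # gradient step for this single point
                beta0 = beta0 + rho * yi
    return beta, beta0
```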
Optimal separating hyperplanes
Find the hyperplane that separates the two classes and maximizes the margin, i.e. the distance from the decision boundary to the closest training point of either class.
Advantages over Rosenblatt's algorithm:
- unique solution
- better classification performance on test data
(Figure: comparison with the least squares boundary and with 2 solutions found by the perceptron algorithm from different random starts.)
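The optimal separating hyperplane can be computed in practice as a hard-margin support-vector classifier; as an illustrative sketch (not the book's own computation), a linear SVM with a very large cost parameter approximates it.

```python
import numpy as np
from sklearn.svm import SVC

# Separable toy data: two Gaussian clouds (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.repeat([-1, 1], 50)

# A linear SVM with a very large C approximates the hard-margin (maximum-margin) solution.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
beta, beta0 = clf.coef_[0], clf.intercept_[0]
margin = 1.0 / np.linalg.norm(beta)              # distance from the boundary to the closest point
print("beta:", beta, "beta0:", beta0, "margin:", margin)
```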