Basics of discriminant analysis - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Basics of discriminant analysis
  • Purpose
  • Various situations
  • Examples
  • R commands
  • Demonstration

2
Purpose
  • Assume that we have several groups and we know
    that each observation must belong to one of these
    groups. For example, there may be several diseases
    and from the symptoms we want to decide which
    disease we are dealing with. Or we may have
    several species of plants, and when we observe
    various characteristics of a specimen we want to
    know to which species it belongs.
  • We want to divide our space into regions so that,
    when we receive an observation, we can decide
    which region it falls into. Each region is
    assigned to one of the classes. If an observation
    falls into region number k then we say that this
    observation belongs to class number k.
  • In the picture we have 3 regions. If an
    observation falls into region 1 then we decide
    that it is a member of class 1.
  • Discriminant analysis is widely used in many
    fields. For example, it is an integral part of
    neural networks.

(Figure: 2D example showing the plane divided into three regions labelled 1, 2 and 3.)
3
Various situations
  • There can be several situations.
  • We know the distribution for each class (an
    unrealistic assumption). Then the problem becomes
    easy: for a given observation, calculate the
    probability of this observation using the formula
    for each class; whichever class gives the maximum
    value wins.
  • We know the form of the distributions but not
    their parameters. For example, we may know that
    the distribution for each class is normal but not
    know the means and variances of these
    distributions. Then we need representatives
    (training samples) for each class. Once we have
    representatives we can estimate the parameters of
    the distributions (mean and variance in the normal
    case). When a new observation arrives we treat
    these estimates as the true parameters and
    calculate the probabilities. Again the largest
    probability wins.
  • We may also have prior probabilities. E.g. in the
    case of diseases we may know that one of them has
    prior probability 0.7 and the other has prior
    probability 0.3. In this case we use these priors
    when we calculate the probability of the
    observation, by simply multiplying each class
    likelihood by its prior (see the sketch below).
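
A minimal R sketch of this rule (the means, standard deviations and prior probabilities below are made-up numbers, not from the slides):

    # maximum likelihood rule with prior probabilities
    means  <- c(5, 8)        # class means
    sds    <- c(1, 1)        # class standard deviations
    priors <- c(0.7, 0.3)    # prior probabilities of the two classes
    x      <- 6.2            # new observation
    post   <- priors * dnorm(x, mean = means, sd = sds)
    which.max(post)          # the class with the largest value wins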

4
Various situations: unknown parameters
  • If we know that the probability distributions are
    normal then there are two cases.
  • Variances of the distributions are the same.
  • In this case the space is divided by hyperplanes.
    In the one dimensional case with two classes we
    have one point that divides the line into two
    regions; this point lies midway between the means
    of the two distributions. In the two dimensional
    case with two classes we have a line that divides
    the plane into two regions; this line intersects
    the segment joining the two means at its midpoint.
    In three dimensional space we have planes.
  • Variances are different.
  • In this case the space is divided into regions by
    surfaces defined by quadratic forms. In the one
    dimensional case we have two points (see the
    numerical sketch below). In the two dimensional
    case we may have an ellipse, hyperbola, parabola
    or two lines; the form of these curves depends on
    the differences between the variances. In the
    three dimensional case we can have an ellipsoid,
    hyperboloid, etc.
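
The two boundary points in the one dimensional, unequal variance case can be found numerically; a small R sketch with made-up parameters (means 5 and 6, standard deviations 1 and 3):

    # points where the two class densities are equal
    f <- function(x) dnorm(x, 5, 1) - dnorm(x, 6, 3)   # class 1 minus class 2
    uniroot(f, c(2, 5))$root    # lower boundary point
    uniroot(f, c(5, 9))$root    # upper boundary point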

5
Maximum likelihood discriminant analysis
  • Let us assume that we have g populations
    (groups). Each population has a probability
    distribution Li(x). For a given observation the
    likelihood under each population is calculated and
    the population with the largest likelihood is
    taken. If two populations have the same likelihood
    then either of them can be chosen. Let us assume
    that we are dealing with one dimensional
    populations, that their distributions are normal,
    and that there are only two populations. Then x is
    allocated to population 1 when L1(x) ≥ L2(x),
    which for normal densities becomes
    (x − μ1)²/σ1² − (x − μ2)²/σ2² ≤ 2 log(σ2/σ1).
  • This quadratic inequality divides the real line
    into two regions. When the inequality is satisfied
    the observation belongs to class 1, otherwise it
    belongs to class 2. When the variances are equal
    we have a linear inequality: if μ1 > μ2 and
    x > (μ1 + μ2)/2 then this rule puts x into group 1.
  • Multidimensional cases are similar to the one
    dimensional case except that the inequalities are
    multidimensional. When the variances are equal the
    space is divided by a hyperplane (a line in the
    two dimensional case).
  • If the parameters of the distributions are not
    known they are estimated from the given
    observations (see the sketch below).
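
A short R sketch of the last point, with hypothetical training samples x1 and x2 and a made-up new observation:

    # estimate the parameters of each class from training data,
    # then allocate a new observation by maximum likelihood
    x1 <- rnorm(30, mean = 5, sd = 1)    # training sample, class 1
    x2 <- rnorm(40, mean = 8, sd = 2)    # training sample, class 2
    m  <- c(mean(x1), mean(x2))          # estimated means
    s  <- c(sd(x1), sd(x2))              # estimated standard deviations
    which.max(dnorm(7.1, mean = m, sd = s))   # class with the larger likelihood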

6
Distributions with equal and known variances: 1D example
  • The probability distributions for the classes are
    known and normal. The variances of both are 1; one
    has mean 5 and the other has mean 8. Anything
    below 6.5 belongs to class 1 and anything above
    6.5 belongs to class 2. An observation with value
    exactly 6.5 can be assigned to either class (a
    short R check follows the figure below).
  • The observations a and b will be assigned to
    class 1 and the observations c and d will be
    assigned to class 2. Anything smaller than the
    midpoint of the two means is assigned to class 1
    and anything bigger than this value belongs to
    class 2.

(Figure: the two normal density curves for class 1 and class 2, the discrimination point at 6.5, and new observations a, b, c, d.)
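
A two-line R check of the discrimination point used in this example:

    # the boundary between N(5, 1) and N(8, 1) lies midway between the means
    g <- function(x) dnorm(x, 5, 1) - dnorm(x, 8, 1)
    uniroot(g, c(5, 8))$root    # approximately 6.5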
7
Distributions with known but different variances: 1D example
  • Assume that we have two classes. The probability
    distributions of both are normal, and the means
    and variances are known. One of the distributions
    is much sharper than the other. In this case the
    probability of observation b under class 2 is
    higher than under class 1. The probability of c
    under class 1 is higher than under class 2. The
    probability of observation a, although very small,
    is higher under class 1 than under class 2. Thus
    the observations a, c and d are assigned to class
    1 and the observation b to class 2. Very small and
    very large observations belong to class 1 and
    medium observations to class 2.

(Figure: density curves for class 1 and the much sharper class 2, the interval assigned to class 1, and new observations a, b, c, d.)
8
Two dimensional example
  • In the two dimensional case we want to divide the
    whole plane into two (or more) regions. When a new
    observation falls into one of these regions we
    decide its class number. In the figure the red dot
    is in the region corresponding to class 1 and the
    blue dot is in the region corresponding to class 2.
  • The parameters of the distributions are estimated
    using sample points (shown by small black dots);
    there are 50 observations for each class. If the
    variances of the distributions turn out to be
    equal then the discrimination is linear; if the
    variances are unequal then the discrimination is
    quadratic (the boundary curves are quadratic). A
    short R sketch of this setting follows the figure.

(Figure: sample points for class 1 and class 2, the discrimination line between them, and two new observations.)
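
An R sketch of this setting (the group means used here are made up; the lda and predict commands are described on the R commands slide):

    # simulate 50 training points per class with a common covariance,
    # fit a linear discriminant and classify two new observations
    library(MASS)
    x  <- rbind(mvrnorm(50, c(0, 0), diag(2)),
                mvrnorm(50, c(3, 3), diag(2)))
    cl <- factor(rep(1:2, each = 50))
    fit <- lda(x, grouping = cl)
    newobs <- matrix(c(0.5, 0.5, 2.8, 3.1), ncol = 2, byrow = TRUE)
    predict(fit, newobs)$class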
9
Likelihood ratio discriminant analysis
  • The likelihood ratio discriminant rule puts the
    given observation into each group in turn, the
    parameters of that group are re-estimated, and the
    likelihood of the whole sample is calculated. This
    is done for each group, and the observation is
    allocated to the group that gives the largest
    likelihood.
  • This technique tends to put an observation into a
    population that has a larger sample size.
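
A rough R sketch of this rule for two one dimensional normal groups (the function name and the use of the sample mean and standard deviation as the re-estimated parameters are illustrative choices, not from the slides):

    # x1, x2 are hypothetical training vectors, x is the new observation
    lr_rule <- function(x, x1, x2) {
      loglik <- function(v) sum(dnorm(v, mean(v), sd(v), log = TRUE))
      l1 <- loglik(c(x1, x)) + loglik(x2)   # x added to group 1
      l2 <- loglik(x1) + loglik(c(x2, x))   # x added to group 2
      if (l1 >= l2) 1 else 2
    }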

10
Fisher's discriminant function
  • Fisher's discrimination rule maximises the ratio
    of the between-groups sum of squares to the
    within-groups sum of squares, a'Ba / a'Wa, over
    directions a.
  • Here W is the within-groups sum of squares,
    W = sum over groups i and observations j of
    (xij − x̄i)(xij − x̄i)', n is the total number of
    observations, g is the number of groups, x̄i is the
    mean of group i and ni is the number of
    observations in group i. There are several ways of
    calculating the between-groups sum of squares; one
    popular way is the weighted form
    B = sum over i of ni (x̄i − x̄)(x̄i − x̄)'.
  • The problem of finding the discrimination rule
    then reduces to finding the maximum eigenvalue and
    corresponding eigenvector a of the matrix W⁻¹B. A
    new observation x is put into group i if
    |a'(x − x̄i)| < |a'(x − x̄j)| for all j ≠ i (a
    sketch of these computations follows).
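
A sketch of these computations on made-up two dimensional data with two groups of 20 points (in practice the lda command on the later slide does this for you):

    set.seed(1)
    X  <- rbind(matrix(rnorm(40, 0), ncol = 2),
                matrix(rnorm(40, 2), ncol = 2))
    cl <- rep(1:2, each = 20)
    gm <- t(sapply(split(as.data.frame(X), cl), colMeans))   # group means
    W  <- Reduce(`+`, lapply(1:2, function(i) {
            d <- sweep(X[cl == i, ], 2, gm[i, ]); t(d) %*% d
          }))                                                # within-groups SS
    B  <- Reduce(`+`, lapply(1:2, function(i)
            20 * tcrossprod(gm[i, ] - colMeans(X))))         # weighted between-groups SS
    a  <- Re(eigen(solve(W) %*% B)$vectors[, 1])             # leading eigenvector of W^-1 B
    xnew <- c(1.8, 2.1)
    which.min(abs(drop(gm %*% a) - sum(a * xnew)))           # group with nearest projected mean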

11
When parameters of distributions are unknown
  • In general the problem consists of two parts.
  • Classification. At this stage the space is
    divided into regions and each region is assigned
    to one class. In some sense it means that we need
    to find a function or inequalities that divide the
    space into parts. This is usually done using the
    probability distribution of each class. In a way
    this stage can be considered as rule generation.
  • Discrimination. Once the space has been
    partitioned, or the rules have been generated, new
    observations are assigned to classes using these
    rules.
  • Note that if each observation belongs to exactly
    one class then the rule is deterministic. There
    are other kinds of rules too, one of them being
    fuzzy rules, where each observation has a degree
    of membership in each class. For example, an
    observation may belong to class 1 with degree 0.7
    and to class 2 with degree 0.3.
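
The posterior probabilities returned by lda's predict (see the R commands slide) can be read as exactly this kind of degree of membership; z and newobservations are as on that slide:

    predict(z, newobservations)$posterior   # e.g. 0.7 for class 1, 0.3 for class 2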

12
Probability of misclassification
  • Let us assume we have g groups (classes). The
    probability of misclassification is defined as the
    probability of putting an observation into class i
    when it is from class j; it is denoted p_ij. In
    particular, the probability of correct allocation
    for class i is p_ii and the probability of
    misclassification for this class is 1 − p_ii.
  • Assume that we have two discriminant rules, d and
    d'. The rule d' is said to be as good as d if
  • p'_ii ≥ p_ii for i = 1, ..., g
  • and d' is better than d if the inequality is
    strict in at least one case. If there is no rule
    better than d then d is called an admissible rule.
  • In general it may not be possible to compare two
    rules: for example, it may happen that
    p'_11 > p_11 but p'_22 < p_22.
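
In R these probabilities can be estimated from a table of true against allocated classes; a sketch with hypothetical label vectors truth and pred:

    p <- prop.table(table(truth = truth, allocated = pred), margin = 1)
    diag(p)        # estimated probabilities of correct allocation p_ii
    1 - diag(p)    # estimated misclassification probabilities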

13
Resampling and misclassification
  • Resubstitution: estimate the discriminant rule
    and then, for each observation in the sample,
    check whether it is misclassified. The problem
    with this technique is that it gives, as expected,
    an optimistic estimate.
  • Jackknife: from each class one observation is
    removed in turn, the discriminant rule is
    re-estimated and the removed observation is
    predicted. The probability of misclassification is
    then calculated as n_i1/n_1, where n_1 is the
    number of observations in the first group and n_i1
    is the number of cases in which an observation
    from group 1 was classified as belonging to group
    i. Similar misclassification probabilities are
    calculated for each class.
  • Bootstrap: resample the sample of observations.
    There are several techniques that apply the
    bootstrap; one of them is described here.
  • First calculate the misclassification
    probabilities using resubstitution and denote them
    e_ai. Then resample, either all observations
    simultaneously or each group separately (i.e. take
    a sample of n_1 points from group 1, etc). Define
    the discrimination rule on the bootstrap sample
    and estimate the probabilities of misclassification
    for the bootstrap sample and for the original
    sample; denote them e_pib and p_ib. Calculate the
    differences d_ib = e_pib − p_ib. Repeat this B
    times and average; the average <d> is the
    bootstrap bias correction. The probability of
    misclassification is then estimated as e_ai − <d>.
    A leave-one-out sketch is given below.
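
A sketch using MASS (x and cl as in the earlier two dimensional example): the CV = TRUE option of lda carries out the leave-one-out computation described above.

    cv <- lda(x, grouping = cl, CV = TRUE)   # each observation left out in turn
    mean(cv$class != cl)                     # cross-validated misclassification rate
    fit <- lda(x, grouping = cl)
    mean(predict(fit)$class != cl)           # optimistic resubstitution rate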

14
R commands for discriminant analysis
  • Commands for discriminant analysis are in the
    library MASS. This library should be loaded first:
  • library(MASS)
  • The necessary commands are:
  • lda: linear discriminant analysis. Using the
    given observations this command calculates the
    discrimination lines (hyperplanes).
  • qda: quadratic discriminant analysis. This
    command calculates the necessary equations; it
    does not assume equality of the variances.
  • predict: for new observations it decides to
    which class each belongs.
  • Example of use:
  • z <- lda(data, grouping = groupings)
  • predict(z, newobservations)
  • Similarly for quadratic discriminant analysis:
  • z <- qda(data, grouping = groupings)
  • predict(z, newobservations)$class
  • data is the data matrix given to us for
    calculating the discrimination rule; it can be
    considered a training data set. groupings defines
    which observation belongs to which class. A
    complete worked example is given below.
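
A worked example on the built-in iris data set (three species of plants, as in the Purpose slide; the data set itself is not part of the original slides):

    library(MASS)
    z <- lda(Species ~ ., data = iris)
    predict(z, iris[c(1, 60, 120), ])$class   # predicted species for three rows
    q <- qda(Species ~ ., data = iris)
    predict(q, iris[c(1, 60, 120), ])$class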

15
References
  • Krzanowski, W.J. and Marriott, F.H.C. (1994)
    Multivariate Analysis. Kendall's Library of
    Statistics.
  • Mardia, K.V., Kent, J.T. and Bibby, J.M. (2003)
    Multivariate Analysis.