Basics of discriminant analysis - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Basics of discriminant analysis
  • Purpose
  • Various situations
  • Examples
  • R commands
  • Demonstration

2
Purpose
  • Assume that we have several groups and we know
    that each observation must belong to one of these
    groups. For example, there may be several diseases
    and from the symptoms we want to decide which
    disease we are dealing with. Or we may have
    several species of plants, and when we observe
    various characteristics of a specimen we want to
    know to which species it belongs.
  • We want to divide our space into regions so that,
    when we receive an observation, we can decide
    which region it falls into. Each region is
    assigned to one of the classes. If an observation
    falls into region number k then we say that this
    observation belongs to class number k.
  • In the picture we have 3 regions. If an
    observation falls into region 1 then we decide
    that it is a member of class 1.
  • Discriminant analysis is widely used in many
    fields. For example, it is an integral part of
    neural networks.

(Figure: 2D example showing the plane divided into three regions labelled 1, 2 and 3.)
3
Various situations
  • There can be several situations.
  • We know the distribution for each class (an
    unrealistic assumption). Then the problem becomes
    easy: for a given observation, calculate the
    probability of this observation using the formula
    for each class; whichever class gives the maximum
    value wins.
  • We know the form of the distributions but not
    their parameters. For example, we may know that
    the distribution for each class is normal but not
    know the means and variances of these
    distributions. Then we need representatives
    (training samples) for each class. Once we have
    representatives we can estimate the parameters of
    the distributions (mean and variance in the normal
    case). When a new observation arrives we treat
    these estimates as the true parameters and
    calculate the probabilities. Again the largest
    probability wins.
  • We may also have prior probabilities. E.g. in the
    case of diseases we may know that one of them has
    prior probability 0.7 and the other has prior
    probability 0.3. In this case we use these priors
    when we calculate the probability of the
    observation, by simply multiplying each class
    likelihood by its prior (see the sketch below).
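
A minimal R sketch of this rule (the means, standard deviations and prior probabilities below are made-up numbers, not from the slides):

    # maximum likelihood rule with prior probabilities
    means  <- c(5, 8)        # class means
    sds    <- c(1, 1)        # class standard deviations
    priors <- c(0.7, 0.3)    # prior probabilities of the two classes
    x      <- 6.2            # new observation
    post   <- priors * dnorm(x, mean = means, sd = sds)
    which.max(post)          # the class with the largest value wins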

4
Various situations: unknown parameters
  • If we know that the probability distributions are
    normal then there are two cases.
  • Variances of the distributions are the same.
  • In this case the space is divided by hyperplanes.
    In the one dimensional case with two classes we
    have one point that divides the line into two
    regions; this point lies midway between the means
    of the two distributions. In the two dimensional
    case with two classes we have a line that divides
    the plane into two regions; this line intersects
    the segment joining the two means at its midpoint.
    In three dimensional space we have planes.
  • Variances are different.
  • In this case the space is divided into regions by
    surfaces defined by quadratic forms. In the one
    dimensional case we have two points (see the
    numerical sketch below). In the two dimensional
    case we may have an ellipse, hyperbola, parabola
    or two lines; the form of these curves depends on
    the differences between the variances. In the
    three dimensional case we can have an ellipsoid,
    hyperboloid, etc.
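
The two boundary points in the one dimensional, unequal variance case can be found numerically; a small R sketch with made-up parameters (means 5 and 6, standard deviations 1 and 3):

    # points where the two class densities are equal
    f <- function(x) dnorm(x, 5, 1) - dnorm(x, 6, 3)   # class 1 minus class 2
    uniroot(f, c(2, 5))$root    # lower boundary point
    uniroot(f, c(5, 9))$root    # upper boundary point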

5
Maximum likelihood discriminant analysis
  • Let us assume that we have g populations
    (groups). Each population has a probability
    distribution Li(x). For a given observation the
    likelihood under each population is calculated and
    the population with the largest likelihood is
    taken. If two populations have the same likelihood
    then either of them can be chosen. Let us assume
    that we are dealing with one dimensional
    populations, that their distributions are normal,
    and that there are only two populations. Then x is
    allocated to population 1 when L1(x) ≥ L2(x),
    which for normal densities becomes
    (x − μ1)²/σ1² − (x − μ2)²/σ2² ≤ 2 log(σ2/σ1).
  • This quadratic inequality divides the real line
    into two regions. When the inequality is satisfied
    the observation belongs to class 1, otherwise it
    belongs to class 2. When the variances are equal
    we have a linear inequality: if μ1 > μ2 and
    x > (μ1 + μ2)/2 then this rule puts x into group 1.
  • Multidimensional cases are similar to the one
    dimensional case except that the inequalities are
    multidimensional. When the variances are equal the
    space is divided by a hyperplane (a line in the
    two dimensional case).
  • If the parameters of the distributions are not
    known they are estimated from the given
    observations (see the sketch below).
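
A short R sketch of the last point, with hypothetical training samples x1 and x2 and a made-up new observation:

    # estimate the parameters of each class from training data,
    # then allocate a new observation by maximum likelihood
    x1 <- rnorm(30, mean = 5, sd = 1)    # training sample, class 1
    x2 <- rnorm(40, mean = 8, sd = 2)    # training sample, class 2
    m  <- c(mean(x1), mean(x2))          # estimated means
    s  <- c(sd(x1), sd(x2))              # estimated standard deviations
    which.max(dnorm(7.1, mean = m, sd = s))   # class with the larger likelihood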

6
Distributions with equal and known variances: 1D example
  • The probability distributions for the classes are
    known and normal. The variances of both are 1; one
    has mean 5 and the other has mean 8. Anything
    below 6.5 belongs to class 1 and anything above
    6.5 belongs to class 2. An observation with value
    exactly 6.5 can be assigned to either class (a
    short R check follows the figure below).
  • The observations a and b will be assigned to
    class 1 and the observations c and d will be
    assigned to class 2. Anything smaller than the
    midpoint of the two means is assigned to class 1
    and anything bigger than this value belongs to
    class 2.

(Figure: the two normal density curves for class 1 and class 2, the discrimination point at 6.5, and new observations a, b, c, d.)
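
A two-line R check of the discrimination point used in this example:

    # the boundary between N(5, 1) and N(8, 1) lies midway between the means
    g <- function(x) dnorm(x, 5, 1) - dnorm(x, 8, 1)
    uniroot(g, c(5, 8))$root    # approximately 6.5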
7
Distributions with known but different variances: 1D example
  • Assume that we have two classes. The probability
    distributions of both are normal, and the means
    and variances are known. One of the distributions
    is much sharper than the other. In this case the
    probability of observation b under class 2 is
    higher than under class 1. The probability of c
    under class 1 is higher than under class 2. The
    probability of observation a, although very small,
    is higher under class 1 than under class 2. Thus
    the observations a, c and d are assigned to class
    1 and the observation b to class 2. Very small and
    very large observations belong to class 1 and
    medium observations to class 2.

(Figure: density curves for class 1 and the much sharper class 2, the interval assigned to class 1, and new observations a, b, c, d.)
8
Two dimensional example
  • In the two dimensional case we want to divide the
    whole plane into two (or more) regions. When a new
    observation falls into one of these regions we
    decide its class number. In the figure the red dot
    is in the region corresponding to class 1 and the
    blue dot is in the region corresponding to class 2.
  • The parameters of the distributions are estimated
    using sample points (shown by small black dots);
    there are 50 observations for each class. If the
    variances of the distributions turn out to be
    equal then the discrimination is linear; if the
    variances are unequal then the discrimination is
    quadratic (the boundary curves are quadratic). A
    short R sketch of this setting follows the figure.

(Figure: sample points for class 1 and class 2, the discrimination line between them, and two new observations.)
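
An R sketch of this setting (the group means used here are made up; the lda and predict commands are described on the R commands slide):

    # simulate 50 training points per class with a common covariance,
    # fit a linear discriminant and classify two new observations
    library(MASS)
    x  <- rbind(mvrnorm(50, c(0, 0), diag(2)),
                mvrnorm(50, c(3, 3), diag(2)))
    cl <- factor(rep(1:2, each = 50))
    fit <- lda(x, grouping = cl)
    newobs <- matrix(c(0.5, 0.5, 2.8, 3.1), ncol = 2, byrow = TRUE)
    predict(fit, newobs)$class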
9
Likelihood ratio discriminant analysis
  • The likelihood ratio discriminant rule puts the
    given observation into each group in turn, the
    parameters of that group are re-estimated, and the
    likelihood of the whole sample is calculated. This
    is done for each group, and the observation is
    allocated to the group that gives the largest
    likelihood.
  • This technique tends to put an observation into a
    population that has a larger sample size.
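
A rough R sketch of this rule for two one dimensional normal groups (the function name and the use of the sample mean and standard deviation as the re-estimated parameters are illustrative choices, not from the slides):

    # x1, x2 are hypothetical training vectors, x is the new observation
    lr_rule <- function(x, x1, x2) {
      loglik <- function(v) sum(dnorm(v, mean(v), sd(v), log = TRUE))
      l1 <- loglik(c(x1, x)) + loglik(x2)   # x added to group 1
      l2 <- loglik(x1) + loglik(c(x2, x))   # x added to group 2
      if (l1 >= l2) 1 else 2
    }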

10
Fisher's discriminant function
  • Fisher's discrimination rule maximises the ratio
    of the between-groups sum of squares to the
    within-groups sum of squares, a'Ba / a'Wa, over
    directions a.
  • Here W is the within-groups sum of squares,
    W = sum over groups i and observations j of
    (xij − x̄i)(xij − x̄i)', n is the total number of
    observations, g is the number of groups, x̄i is the
    mean of group i and ni is the number of
    observations in group i. There are several ways of
    calculating the between-groups sum of squares; one
    popular way is the weighted form
    B = sum over i of ni (x̄i − x̄)(x̄i − x̄)'.
  • The problem of finding the discrimination rule
    then reduces to finding the maximum eigenvalue and
    corresponding eigenvector a of the matrix W⁻¹B. A
    new observation x is put into group i if
    |a'(x − x̄i)| < |a'(x − x̄j)| for all j ≠ i (a
    sketch of these computations follows).
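
A sketch of these computations on made-up two dimensional data with two groups of 20 points (in practice the lda command on the later slide does this for you):

    set.seed(1)
    X  <- rbind(matrix(rnorm(40, 0), ncol = 2),
                matrix(rnorm(40, 2), ncol = 2))
    cl <- rep(1:2, each = 20)
    gm <- t(sapply(split(as.data.frame(X), cl), colMeans))   # group means
    W  <- Reduce(`+`, lapply(1:2, function(i) {
            d <- sweep(X[cl == i, ], 2, gm[i, ]); t(d) %*% d
          }))                                                # within-groups SS
    B  <- Reduce(`+`, lapply(1:2, function(i)
            20 * tcrossprod(gm[i, ] - colMeans(X))))         # weighted between-groups SS
    a  <- Re(eigen(solve(W) %*% B)$vectors[, 1])             # leading eigenvector of W^-1 B
    xnew <- c(1.8, 2.1)
    which.min(abs(drop(gm %*% a) - sum(a * xnew)))           # group with nearest projected mean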

11
When parameters of distributions are unknown
  • In general the problem consists of two parts.
  • Classification. At this stage the space is
    divided into regions and each region is assigned
    to one class. In some sense it means that we need
    to find a function or inequalities that divide the
    space into parts. This is usually done using the
    probability distribution of each class. In a way
    this stage can be considered as rule generation.
  • Discrimination. Once the space has been
    partitioned, or the rules have been generated, new
    observations are assigned to classes using these
    rules.
  • Note that if each observation belongs to exactly
    one class then the rule is deterministic. There
    are other kinds of rules too, one of them being
    fuzzy rules, where each observation has a degree
    of membership in each class. For example, an
    observation may belong to class 1 with degree 0.7
    and to class 2 with degree 0.3.
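
The posterior probabilities returned by lda's predict (see the R commands slide) can be read as exactly this kind of degree of membership; z and newobservations are as on that slide:

    predict(z, newobservations)$posterior   # e.g. 0.7 for class 1, 0.3 for class 2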

12
Probability of misclassification
  • Let us assume we have g groups (classes). The
    probability of misclassification is defined as the
    probability of putting an observation into class i
    when it is from class j; it is denoted p_ij. In
    particular, the probability of correct allocation
    for class i is p_ii and the probability of
    misclassification for this class is 1 − p_ii.
  • Assume that we have two discriminant rules, d and
    d'. The rule d' is said to be as good as d if
  • p'_ii ≥ p_ii for i = 1, ..., g
  • and d' is better than d if the inequality is
    strict in at least one case. If there is no rule
    better than d then d is called an admissible rule.
  • In general it may not be possible to compare two
    rules: for example, it may happen that
    p'_11 > p_11 but p'_22 < p_22.
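
In R these probabilities can be estimated from a table of true against allocated classes; a sketch with hypothetical label vectors truth and pred:

    p <- prop.table(table(truth = truth, allocated = pred), margin = 1)
    diag(p)        # estimated probabilities of correct allocation p_ii
    1 - diag(p)    # estimated misclassification probabilities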

13
Resampling and misclassification
  • Resubstitution: estimate the discriminant rule
    and then, for each observation in the sample,
    check whether it is misclassified. The problem
    with this technique is that it gives, as expected,
    an optimistic estimate.
  • Jackknife: from each class one observation is
    removed in turn, the discriminant rule is
    re-estimated and the removed observation is
    predicted. The probability of misclassification is
    then calculated as n_i1/n_1, where n_1 is the
    number of observations in the first group and n_i1
    is the number of cases in which an observation
    from group 1 was classified as belonging to group
    i. Similar misclassification probabilities are
    calculated for each class.
  • Bootstrap: resample the sample of observations.
    There are several techniques that apply the
    bootstrap; one of them is described here.
  • First calculate the misclassification
    probabilities using resubstitution and denote them
    e_ai. Then resample, either all observations
    simultaneously or each group separately (i.e. take
    a sample of n_1 points from group 1, etc). Define
    the discrimination rule on the bootstrap sample
    and estimate the probabilities of misclassification
    for the bootstrap sample and for the original
    sample; denote them e_pib and p_ib. Calculate the
    differences d_ib = e_pib − p_ib. Repeat this B
    times and average; the average <d> is the
    bootstrap bias correction. The probability of
    misclassification is then estimated as e_ai − <d>.
    A leave-one-out sketch is given below.
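
A sketch using MASS (x and cl as in the earlier two dimensional example): the CV = TRUE option of lda carries out the leave-one-out computation described above.

    cv <- lda(x, grouping = cl, CV = TRUE)   # each observation left out in turn
    mean(cv$class != cl)                     # cross-validated misclassification rate
    fit <- lda(x, grouping = cl)
    mean(predict(fit)$class != cl)           # optimistic resubstitution rate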

14
R commands for discriminant analysis
  • Commands for discriminant analysis are in the
    library MASS. This library should be loaded first:
  • library(MASS)
  • The necessary commands are:
  • lda: linear discriminant analysis. Using the
    given observations this command calculates the
    discrimination lines (hyperplanes).
  • qda: quadratic discriminant analysis. This
    command calculates the necessary equations; it
    does not assume equality of the variances.
  • predict: for new observations it decides to
    which class each belongs.
  • Example of use:
  • z <- lda(data, grouping = groupings)
  • predict(z, newobservations)
  • Similarly for quadratic discriminant analysis:
  • z <- qda(data, grouping = groupings)
  • predict(z, newobservations)$class
  • data is the data matrix given to us for
    calculating the discrimination rule; it can be
    considered a training data set. groupings defines
    which observation belongs to which class. A
    complete worked example is given below.
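
A worked example on the built-in iris data set (three species of plants, as in the Purpose slide; the data set itself is not part of the original slides):

    library(MASS)
    z <- lda(Species ~ ., data = iris)
    predict(z, iris[c(1, 60, 120), ])$class   # predicted species for three rows
    q <- qda(Species ~ ., data = iris)
    predict(q, iris[c(1, 60, 120), ])$class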

15
References
  • Krzanowski, W.J. and Marriott, F.H.C. (1994)
    Multivariate Analysis. Kendall's Library of
    Statistics.
  • Mardia, K.V., Kent, J.T. and Bibby, J.M. (2003)
    Multivariate Analysis.