Discrimination and Classification - PowerPoint PPT Presentation

1
Discrimination and Classification
2
Discrimination
  • Situation
  • We have two or more populations π1, π2, etc.
    (possibly p-variate normal).
  • The populations are known (or we have data from
    each population).
  • We have data for a new case (population unknown)
    and we want to identify the population to which
    the new case belongs.

3
The Basic Problem
  • Suppose that the data from a new case x1, …, xp
    has joint density function either
  • g(x1, …, xp), if the case comes from π1, or
  • h(x1, …, xp), if the case comes from π2.
  • We want to make the decision

D1: Classify the case in π1 (g is the correct
distribution), or D2: Classify the case in π2 (h
is the correct distribution).
4
The Two Types of Errors
  • Misclassifying the case in π1 when it actually
    lies in π2.
  • Let P(1|2) = P[D1 | π2] = the probability of this
    type of error.
  • Misclassifying the case in π2 when it actually
    lies in π1.
  • Let P(2|1) = P[D2 | π1] = the probability of this
    type of error.

This is similar to Type I and Type II errors in
hypothesis testing.
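To make the two error probabilities concrete, here is a minimal Python sketch (not from the slides) that estimates P(1|2) and P(2|1) by simulation for a simple hypothetical rule: classify a univariate case into π1 whenever x falls below a threshold. The two normal populations and the threshold are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical populations: pi1 = N(0, 1), pi2 = N(2, 1),
# and a simple rule: decide D1 (classify into pi1) when x < 1.
threshold = 1.0
n = 100_000

x_from_pi1 = rng.normal(loc=0.0, scale=1.0, size=n)
x_from_pi2 = rng.normal(loc=2.0, scale=1.0, size=n)

# P(2|1): the case is from pi1 but we decide D2 (x >= threshold)
p_2_given_1 = np.mean(x_from_pi1 >= threshold)
# P(1|2): the case is from pi2 but we decide D1 (x < threshold)
p_1_given_2 = np.mean(x_from_pi2 < threshold)

print(f"P(2|1) ~ {p_2_given_1:.3f}, P(1|2) ~ {p_1_given_2:.3f}")
```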
5
Note
A discrimination scheme is defined by splitting
p-dimensional space into two regions.
  • C1 = the region where we make the decision D1
    (the decision to classify the case in π1).
  • C2 = the region where we make the decision D2
    (the decision to classify the case in π2).

6
There are several approaches to determining the
regions C1 and C2, all concerned with taking into
account the probabilities of misclassification
P(2|1) and P(1|2).
  • Set up the regions C1 and C2 so that one of the
    probabilities of misclassification, P(2|1) say,
    is at some low acceptable value α, and accept
    the resulting level β of the other probability
    of misclassification, P(1|2).

7
  • Set up the regions C1 and C2 so that the total
    probability of misclassification

P[Misclassification] = P1 P(2|1) + P2 P(1|2)

is minimized, where
P1 = P[the case belongs to π1]
P2 = P[the case belongs to π2]
8
  • Set up the regions C1 and C2 so that the total
    expected cost of misclassification

E[Cost of Misclassification] = ECM
  = c(2|1) P1 P(2|1) + c(1|2) P2 P(1|2)

is minimized, where
P1 = P[the case belongs to π1]
P2 = P[the case belongs to π2]
c(2|1) = the cost of misclassifying the case in π2
when the case belongs to π1.
c(1|2) = the cost of misclassifying the case in π1
when the case belongs to π2.
9
The Optimal Classification Rule
  • Suppose that the data x1, …, xp has joint
    density function
  • f(x1, …, xp; θ)
  • where θ is either θ1 or θ2.
  • Let
  • g(x1, …, xp) = f(x1, …, xp; θ1) and
  • h(x1, …, xp) = f(x1, …, xp; θ2)
  • We want to make the decision
  • D1: θ = θ1 (g is the correct distribution)
    against
  • D2: θ = θ2 (h is the correct distribution)

10
then the optimal regions (minimizing ECM, the
expected cost of misclassification) for making
the decisions D1 and D2 respectively are

C1 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) ≥ k }

and

C2 = { (x1, …, xp) : g(x1, …, xp) / h(x1, …, xp) < k },

where

k = [c(1|2) P2] / [c(2|1) P1].
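As an illustration, here is a minimal Python sketch of this minimum-ECM rule for two hypothetical univariate normal populations; the densities, costs and prior probabilities are assumptions chosen only to show how the threshold k = c(1|2) P2 / [c(2|1) P1] is used.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical setup (not from the slides):
# pi1 density g = N(0, 1), pi2 density h = N(2, 1)
g = norm(loc=0.0, scale=1.0).pdf
h = norm(loc=2.0, scale=1.0).pdf

P1, P2 = 0.6, 0.4        # prior probabilities of pi1 and pi2
c21, c12 = 1.0, 5.0      # c(2|1) and c(1|2): misclassification costs

k = (c12 * P2) / (c21 * P1)  # threshold from the optimal rule

def classify(x):
    """Decide D1 (pi1) when the likelihood ratio g/h is at least k."""
    return "D1 (pi1)" if g(x) / h(x) >= k else "D2 (pi2)"

for x in (-1.0, 0.8, 1.5, 3.0):
    print(x, "->", classify(x))
```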
11
ECM = E[Cost of Misclassification]
    = c(2|1) P1 P(2|1) + c(1|2) P2 P(1|2)
  • Proof
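The equations carrying the proof were on slide images that are not in this transcript; the following LaTeX sketch reconstructs the standard argument, writing x = (x1, …, xp) and dx for the p-fold integral.

```latex
\begin{align*}
\mathrm{ECM}
  &= c(2\mid 1)\,P_1 \int_{C_2} g(\mathbf{x})\,d\mathbf{x}
   + c(1\mid 2)\,P_2 \int_{C_1} h(\mathbf{x})\,d\mathbf{x} \\
  &= c(2\mid 1)\,P_1 \left[ 1 - \int_{C_1} g(\mathbf{x})\,d\mathbf{x} \right]
   + c(1\mid 2)\,P_2 \int_{C_1} h(\mathbf{x})\,d\mathbf{x} \\
  &= c(2\mid 1)\,P_1
   + \int_{C_1} \left[ c(1\mid 2)\,P_2\,h(\mathbf{x})
   - c(2\mid 1)\,P_1\,g(\mathbf{x}) \right] d\mathbf{x}.
\end{align*}
```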

12
  • Therefore, ECM equals the constant c(2|1) P1 plus
    an integral over C1.

Thus ECM is minimized if C1 contains all of the
points (x1, …, xp) at which the integrand is
negative, that is, all points where
c(1|2) P2 h(x1, …, xp) < c(2|1) P1 g(x1, …, xp),
which is exactly the region C1 given above.
13
  • Fisher's Linear Discriminant Function
  • Suppose that x1, …, xp is data from a p-variate
    Normal distribution with mean vector either μ1
    (if the case comes from population π1) or μ2
    (if the case comes from population π2).
  • The covariance matrix Σ is the same for both
    populations π1 and π2.
14
The Neyman-Pearson Lemma states that we should
classify into populations π1 and π2 using the
likelihood ratio
λ = g(x1, …, xp) / h(x1, …, xp).
That is, make the decision D1 (population is π1)
if λ > k.
15
Substituting the two normal densities and taking
logarithms, λ > k becomes

-(1/2)(x - μ1)'Σ⁻¹(x - μ1) + (1/2)(x - μ2)'Σ⁻¹(x - μ2) > ln k

or, after expanding and simplifying,

(μ1 - μ2)'Σ⁻¹ x - (1/2)(μ1 - μ2)'Σ⁻¹(μ1 + μ2) > ln k,

a linear inequality in x1, …, xp.
16
Finally, we make the decision D1 (population is
π1) if

a'x = a1 x1 + … + ap xp ≥ c,

where

a = Σ⁻¹(μ1 - μ2)

and

c = (1/2)(μ1 - μ2)'Σ⁻¹(μ1 + μ2) + ln k, with
k = [c(1|2) P2] / [c(2|1) P1].

Note: k = 1 and ln k = 0 if c(1|2) = c(2|1) and
P1 = P2.
17
The function

ℓ(x) = a'x = (μ1 - μ2)'Σ⁻¹ x

is called Fisher's linear discriminant function.
18
In the case where the population parameters are
unknown but estimated from data, μ1, μ2 and Σ are
replaced by the sample mean vectors x̄1, x̄2 and the
pooled sample covariance matrix S_pooled, giving the
sample version of Fisher's linear discriminant
function,

ℓ(x) = (x̄1 - x̄2)' S_pooled⁻¹ x.
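A minimal NumPy sketch of this sample-based rule is given below; the two small data matrices are hypothetical and only illustrate how x̄1, x̄2, S_pooled, the coefficient vector a and the cutoff are computed (equal costs and equal priors are assumed, so ln k = 0).

```python
import numpy as np

# Hypothetical training samples (rows = cases, columns = variables)
X1 = np.array([[2.0, 3.1], [2.4, 2.8], [1.9, 3.5], [2.7, 3.0]])  # from pi1
X2 = np.array([[0.8, 1.2], [1.1, 0.9], [0.6, 1.5], [1.3, 1.1]])  # from pi2

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
n1, n2 = len(X1), len(X2)

# Pooled sample covariance matrix
S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
            (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

a = np.linalg.solve(S_pooled, xbar1 - xbar2)   # a = S_pooled^-1 (xbar1 - xbar2)
cutoff = 0.5 * a @ (xbar1 + xbar2)             # midpoint cutoff (ln k = 0)

def classify(x):
    """Assign a new case to pi1 when a'x >= cutoff, otherwise to pi2."""
    return "pi1" if a @ np.asarray(x) >= cutoff else "pi2"

print(classify([2.2, 3.0]))   # a point near xbar1 -> pi1
print(classify([0.9, 1.0]))   # a point near xbar2 -> pi2
```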
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
  • Example 2
  • Annual financial data are collected for firms
    approximately 2 years prior to bankruptcy and for
    financially sound firms at about the same point
    in time. The data on the four variables
  • x1 = CF/TD = (cash flow)/(total debt),
  • x2 = NI/TA = (net income)/(total assets),
  • x3 = CA/CL = (current assets)/(current
    liabilities), and
  • x4 = CA/NS = (current assets)/(net sales)
  • are given in the following table.

23
  • The data are given in the following table
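The bankruptcy table itself is not reproduced in this transcript, but a sketch of how such data could be analysed in Python with scikit-learn's LinearDiscriminantAnalysis is shown below; the data rows used here are made-up placeholders, not values from the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Placeholder rows (x1 = CF/TD, x2 = NI/TA, x3 = CA/CL, x4 = CA/NS);
# the real table from the slides should be substituted here.
X = np.array([
    [-0.20, -0.12, 1.25, 0.43],   # bankrupt firm (label 0)
    [-0.31, -0.25, 0.98, 0.31],   # bankrupt firm (label 0)
    [-0.08, -0.30, 1.42, 0.52],   # bankrupt firm (label 0)
    [-0.42, -0.06, 0.87, 0.26],   # bankrupt firm (label 0)
    [ 0.43,  0.12, 2.51, 0.63],   # sound firm    (label 1)
    [ 0.51,  0.23, 2.28, 0.48],   # sound firm    (label 1)
    [ 0.28,  0.09, 2.76, 0.71],   # sound firm    (label 1)
    [ 0.62,  0.28, 2.05, 0.39],   # sound firm    (label 1)
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

new_firm = np.array([[0.05, 0.01, 1.75, 0.45]])
print(lda.predict(new_firm))          # predicted group (0 = bankrupt, 1 = sound)
print(lda.predict_proba(new_firm))    # posterior probabilities for each group
```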

24
Examples using SPSS
25
Classification or Cluster Analysis
  • Have data from one or several populations

26
Situation
  • Have multivariate (or univariate) data from one
    or several populations (the number of populations
    is unknown)
  • Want to determine the number of populations and
    identify the populations

27
Example
28
(No Transcript)
29
Hierarchical Clustering Methods
  • The following are the steps in the agglomerative
    hierarchical clustering algorithm for grouping N
    objects (items or variables).
  1. Start with N clusters, each consisting of a
    single entity, and an N × N symmetric matrix
    (table) of distances (or similarities) D = {dij}.
  2. Search the distance matrix for the nearest (most
    similar) pair of clusters. Let the distance
    between the "most similar" clusters U and V be
    dUV.
  3. Merge clusters U and V. Label the newly formed
    cluster (UV). Update the entries in the distance
    matrix by
    (a) deleting the rows and columns corresponding to
    clusters U and V, and
    (b) adding a row and column giving the distances
    between cluster (UV) and the remaining clusters.

30
  4. Repeat steps 2 and 3 a total of N - 1 times. (All
    objects will be in a single cluster at termination
    of this algorithm.) Record the identity of clusters
    that are merged and the levels (distances or
    similarities) at which the mergers take place, as
    in the sketch following this list.
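As a rough illustration of these steps (not code from the presentation), the following Python sketch runs the agglomerative algorithm on a distance matrix, with the inter-cluster distance rule passed in as a function; min, max and mean of the pairwise distances correspond to single, complete and average linkage respectively.

```python
import numpy as np

def agglomerative(D, linkage=min):
    """Agglomerative clustering on an N x N distance matrix D.

    linkage combines the pairwise distances between two clusters:
    min -> single linkage, max -> complete linkage,
    (lambda ds: sum(ds) / len(ds)) -> average linkage.
    Returns the list of merges as (cluster_a, cluster_b, distance).
    """
    D = np.asarray(D, dtype=float)
    clusters = [(i,) for i in range(len(D))]   # step 1: N singleton clusters
    merges = []
    while len(clusters) > 1:                   # step 4: repeat N - 1 times
        # step 2: find the nearest (most similar) pair of clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage([D[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # step 3: merge them; the distance update is implicit because the
        # inter-cluster distance is recomputed from the original matrix
        merges.append((clusters[a], clusters[b], d))
        clusters = ([c for k, c in enumerate(clusters) if k not in (a, b)]
                    + [clusters[a] + clusters[b]])
    return merges
```

With the single-linkage rule (linkage=min) and a distance matrix consistent with the next example, this reproduces the merge sequence worked through on the following slides.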

31
Different methods of computing inter-cluster
distance
32
Example
  • To illustrate the single linkage algorithm, we
    consider the hypothetical distance matrix between
    pairs of five objects given below

33
  • Treating each object as a cluster, the clustering
    begins by merging the two closest items (3 and 5).
  • To implement the next level of clustering we need
    to compute the distances between cluster (35) and
    the remaining objects:
  • d(35)1 = min{3, 11} = 3
  • d(35)2 = min{7, 10} = 7
  • d(35)4 = min{9, 8} = 8
  • The new distance matrix becomes

34
  • The new distance matrix becomes

The next two closest clusters, (35) and 1, are
merged to form cluster (135). Distances between
this cluster and the remaining clusters become
35
Distances between this cluster and the remaining
clusters become
d(135)2 = min{7, 9} = 7
d(135)4 = min{8, 6} = 6
The distance matrix now becomes

Continuing, the next two closest clusters (2 and 4)
are merged to form cluster (24).
36
Distances between this cluster and the remaining
clusters become
d(135)(24) = min{d(135)2, d(135)4} = min{7, 6} = 6
The final distance matrix now becomes

At the final step, clusters (135) and (24) are
merged to form the single cluster (12345) of all
five items.
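For readers who want to reproduce this example in Python, the sketch below uses scipy's hierarchical clustering routines. The full 5 × 5 distance matrix is not shown in this transcript, so the matrix used here is an assumption chosen to be consistent with the distances quoted above (in particular d35 = 2 and d24 = 5 are assumed values).

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Assumed symmetric distance matrix for objects 1..5, consistent with the
# merge distances worked through above (e.g. d31 = 3, d51 = 11, d32 = 7).
D = np.array([
    [ 0,  9,  3,  6, 11],
    [ 9,  0,  7,  5, 10],
    [ 3,  7,  0,  9,  2],
    [ 6,  5,  9,  0,  8],
    [11, 10,  2,  8,  0],
], dtype=float)

# Single-linkage clustering on the condensed form of the distance matrix
Z = linkage(squareform(D), method="single")
print(Z)   # each row: the two clusters merged and the merge distance

# Summarize the merges graphically as a dendrogram (cf. the next slide)
dendrogram(Z, labels=["1", "2", "3", "4", "5"])
plt.show()
```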
37
The results of this algorithm can be summarized
graphically in the following dendrogram.
38
Dendrograms
  • for clustering the 11 languages on the basis of
    the ten numerals

39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
Dendrogram: Cluster Analysis of N = 22 Utility
companies, Euclidean distance, Average Linkage
45
Dendrogram: Cluster Analysis of N = 22 Utility
companies, Euclidean distance, Single Linkage
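The last two dendrograms differ only in the linkage rule. Assuming the utility data were clustered on Euclidean distances between (standardized) financial variables, which the slides do not reproduce, the comparison could be sketched along these lines; the data matrix below is a random placeholder.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Placeholder for the 22-company utility data matrix (rows = companies,
# columns = standardized variables); the real values are not in this transcript.
rng = np.random.default_rng(1)
X = rng.normal(size=(22, 8))

d = pdist(X, metric="euclidean")   # pairwise Euclidean distances

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, method in zip(axes, ("average", "single")):
    dendrogram(linkage(d, method=method), ax=ax)
    ax.set_title(f"{method} linkage")
plt.tight_layout()
plt.show()
```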