Title: Discrimination and Classification
Discrimination
- Situation
- We have two or more populations π1, π2, etc. (possibly p-variate normal).
- The populations are known (or we have data from each population).
- We have data for a new case (population unknown) and we want to identify the population of which the new case is a member.
The Basic Problem
- Suppose that the data from a new case, x1, …, xp, has joint density function either
  - π1: g(x1, …, xp), or
  - π2: h(x1, …, xp).
- We want to make the decision
  - D1: classify the case in π1 (g is the correct distribution), or
  - D2: classify the case in π2 (h is the correct distribution).
The Two Types of Errors
- Misclassifying the case in π1 when it actually lies in π2. Let P[1|2] = P[D1 | π2] = probability of this type of error.
- Misclassifying the case in π2 when it actually lies in π1. Let P[2|1] = P[D2 | π1] = probability of this type of error.
This is similar to Type I and Type II errors in hypothesis testing.
Note
A discrimination scheme is defined by splitting p-dimensional space into two regions:
- C1, the region where we make the decision D1 (the decision to classify the case in π1).
- C2, the region where we make the decision D2 (the decision to classify the case in π2).
There can be several approaches to determining the regions C1 and C2, all concerned with taking into account the probabilities of misclassification, P[2|1] and P[1|2]:
- Set up the regions C1 and C2 so that one of the probabilities of misclassification, P[2|1] say, is at some low acceptable value α. Accept the resulting level of the other probability of misclassification, P[1|2] = β.
- Set up the regions C1 and C2 so that the total probability of misclassification
P[Misclassification] = P1 P[2|1] + P2 P[1|2]
is minimized, where
P1 = P[the case belongs to π1] and
P2 = P[the case belongs to π2].
- Set up the regions C1 and C2 so that the total expected cost of misclassification
E[Cost of Misclassification] = ECM = c21 P1 P[2|1] + c12 P2 P[1|2]
is minimized, where
P1 = P[the case belongs to π1],
P2 = P[the case belongs to π2],
c21 = the cost of misclassifying the case in π2 when the case belongs to π1, and
c12 = the cost of misclassifying the case in π1 when the case belongs to π2.
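For illustration, with assumed values P1 = 0.7, P2 = 0.3, c21 = 10, c12 = 5, P[2|1] = 0.1 and P[1|2] = 0.2, we get ECM = (10)(0.7)(0.1) + (5)(0.3)(0.2) = 0.7 + 0.3 = 1.0.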
The Optimal Classification Rule
- Suppose that the data x1, …, xp has joint density function f(x1, …, xp; θ), where θ is either θ1 or θ2.
- Let
  - g(x1, …, xp) = f(x1, …, xp; θ1), and
  - h(x1, …, xp) = f(x1, …, xp; θ2).
- We want to make the decision
  - D1: θ = θ1 (g is the correct distribution), against
  - D2: θ = θ2 (h is the correct distribution).
Then the optimal regions (minimizing ECM, the expected cost of misclassification) for making the decisions D1 and D2 respectively are
\[ C_1 = \left\{ (x_1, \dots, x_p) : \lambda = \frac{g(x_1, \dots, x_p)}{h(x_1, \dots, x_p)} \ge k \right\} \]
and
\[ C_2 = \left\{ (x_1, \dots, x_p) : \lambda = \frac{g(x_1, \dots, x_p)}{h(x_1, \dots, x_p)} < k \right\}, \]
where
\[ k = \frac{c_{12} P_2}{c_{21} P_1}. \]
ECM = E[Cost of Misclassification] = c21 P1 P[2|1] + c12 P2 P[1|2].
Writing the misclassification probabilities as integrals over the classification regions,
\[ \mathrm{ECM} = c_{21} P_1 \int_{C_2} g(\mathbf{x})\, d\mathbf{x} + c_{12} P_2 \int_{C_1} h(\mathbf{x})\, d\mathbf{x} = c_{21} P_1 + \int_{C_1} \left[ c_{12} P_2 h(\mathbf{x}) - c_{21} P_1 g(\mathbf{x}) \right] d\mathbf{x}. \]
Thus ECM is minimized if C1 contains exactly those points (x1, …, xp) for which the integrand is negative, that is, the points where c21 P1 g(x1, …, xp) > c12 P2 h(x1, …, xp), which gives the rule above.
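A minimal Python sketch of this minimum-ECM rule (not part of the slides; the densities g and h, the priors P1, P2 and the costs c21, c12 are assumed supplied by the user, and the function name is hypothetical):

```python
from scipy.stats import norm

def classify_min_ecm(x, g, h, P1, P2, c21, c12):
    """Minimum-ECM rule: decide D1 (classify in pi1) when the likelihood
    ratio g(x)/h(x) is at least k = (c12 * P2) / (c21 * P1)."""
    k = (c12 * P2) / (c21 * P1)
    return "D1" if g(x) >= k * h(x) else "D2"

# Hypothetical univariate illustration: pi1 = N(0, 1), pi2 = N(2, 1),
# equal priors and equal costs, so k = 1.
decision = classify_min_ecm(0.7, norm(0, 1).pdf, norm(2, 1).pdf,
                            0.5, 0.5, 1, 1)
print(decision)  # "D1", since x = 0.7 lies closer to the pi1 mean
```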
Fisher's Linear Discriminant Function
- Suppose that x1, …, xp is data from a p-variate Normal distribution with mean vector either μ1 (population π1) or μ2 (population π2).
- The covariance matrix Σ is the same for both populations π1 and π2.
The Neyman-Pearson Lemma states that we should classify into populations π1 and π2 using the likelihood ratio
\[ \lambda = \frac{g(x_1, \dots, x_p)}{h(x_1, \dots, x_p)}. \]
That is, make the decision D1 (population is π1) if λ > k.
Substituting the p-variate normal densities with means μ1, μ2 and common covariance Σ,
\[ \lambda = \exp\left\{ -\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) + \tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) \right\}, \]
or, taking logarithms, λ > k becomes
\[ \ln \lambda = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} \mathbf{x} - \tfrac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) > \ln k. \]
Finally, we make the decision D1 (population is π1) if
\[ \mathbf{a}'\mathbf{x} = a_1 x_1 + \cdots + a_p x_p > c, \]
where
\[ \mathbf{a} = \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \]
and
\[ c = \ln k + \tfrac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2). \]
Note: k = 1 and ln k = 0 if c12 = c21 and P1 = P2.
The function
\[ \ell(x_1, \dots, x_p) = \mathbf{a}'\mathbf{x} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} \mathbf{x} \]
is called Fisher's linear discriminant function.
In the case where the population parameters are unknown but estimated from data, μ1, μ2 and Σ are replaced by the sample mean vectors x̄1, x̄2 and the pooled sample covariance matrix Spooled, giving the estimated Fisher's linear discriminant function
\[ \hat{\ell}(x_1, \dots, x_p) = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' S_{\text{pooled}}^{-1} \mathbf{x}. \]
A sketch of this computation is given below.
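A minimal Python sketch of the estimation (not from the slides; the names fisher_ldf and classify are hypothetical, and the cutoff assumes ln k = 0, i.e. equal costs and equal priors):

```python
import numpy as np

def fisher_ldf(X1, X2):
    """Estimate Fisher's linear discriminant from two training samples.
    X1 is an (n1, p) array of cases from pi1, X2 an (n2, p) array from pi2.
    Returns the coefficient vector a and the midpoint cutoff c."""
    n1, n2 = len(X1), len(X2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance: weighted average of the two sample covariance matrices
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S_pooled, xbar1 - xbar2)  # a = S^{-1}(xbar1 - xbar2)
    c = 0.5 * a @ (xbar1 + xbar2)                 # cutoff for ln k = 0
    return a, c

def classify(x, a, c):
    """Decision D1 (classify in pi1) if a'x >= c, otherwise D2."""
    return "D1" if a @ x >= c else "D2"
```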
Example 2
- Annual financial data are collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on the four variables
  - x1 = CF/TD = (cash flow)/(total debt),
  - x2 = NI/TA = (net income)/(total assets),
  - x3 = CA/CL = (current assets)/(current liabilities), and
  - x4 = CA/NS = (current assets)/(net sales)
are given in the following table.
The data are given in the following table.
Examples using SPSS
Classification or Cluster Analysis
- Have data from one or several populations.
Situation
- Have multivariate (or univariate) data from one or several populations (the number of populations is unknown).
- Want to determine the number of populations and identify the populations.
Example
Hierarchical Clustering Methods
- The following are the steps in the agglomerative hierarchical clustering algorithm for grouping N objects (items or variables):
1. Start with N clusters, each consisting of a single entity, and an N × N symmetric matrix (table) of distances (or similarities) D = (dij).
2. Search the distance matrix for the nearest (most similar) pair of clusters. Let the distance between the "most similar" clusters U and V be dUV.
3. Merge clusters U and V. Label the newly formed cluster (UV). Update the entries in the distance matrix by
  - deleting the rows and columns corresponding to clusters U and V, and
  - adding a row and column giving the distances between cluster (UV) and the remaining clusters.
4. Repeat steps 2 and 3 a total of N − 1 times. (All objects will be in a single cluster at termination of this algorithm.) Record the identity of clusters that are merged and the levels (distances or similarities) at which the mergers take place.

A sketch of this algorithm, for the single linkage case, is given below.
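A minimal Python sketch of steps 1-4 (not from the slides; single linkage is assumed for the step-3 update, matching the worked example later in this deck, and single_linkage is a hypothetical name):

```python
import numpy as np

def single_linkage(D, labels):
    """Agglomerative hierarchical clustering with single linkage.
    D is an N x N symmetric distance matrix and labels the N object names.
    Returns the merges in order as (merged pair, merge distance)."""
    D = np.asarray(D, dtype=float).copy()
    clusters = list(labels)
    np.fill_diagonal(D, np.inf)                # ignore self-distances
    merges = []
    while len(clusters) > 1:
        # Step 2: find the nearest pair of clusters (U, V)
        i, j = divmod(int(np.argmin(D)), len(clusters))
        if i > j:
            i, j = j, i
        merges.append(((clusters[i], clusters[j]), D[i, j]))
        # Step 3: merge U and V; single-linkage update d_(UV)W = min(d_UW, d_VW)
        new_row = np.minimum(D[i], D[j])
        D[i, :], D[:, i] = new_row, new_row
        D[i, i] = np.inf
        D = np.delete(np.delete(D, j, axis=0), j, axis=1)
        clusters[i] = "(" + clusters[i] + clusters[j] + ")"
        del clusters[j]
    return merges                              # step 4: N - 1 merges recorded
```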
Different methods of computing inter-cluster distance:
- Single linkage: the distance between two clusters is the minimum distance between a member of one cluster and a member of the other.
- Complete linkage: the distance between two clusters is the maximum such distance.
- Average linkage: the distance between two clusters is the average of all between-cluster distances.
These three rules are sketched in code below.
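A minimal sketch under the standard definitions assumed above: D is a square distance matrix, U and V are lists of 0-based object indices, and the function names are hypothetical.

```python
import numpy as np

def single_link(D, U, V):
    # Nearest neighbour: minimum distance between the two clusters
    return min(D[u][v] for u in U for v in V)

def complete_link(D, U, V):
    # Farthest neighbour: maximum distance between the two clusters
    return max(D[u][v] for u in U for v in V)

def average_link(D, U, V):
    # Mean of all between-cluster distances
    return float(np.mean([D[u][v] for u in U for v in V]))
```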
Example
- To illustrate the single linkage algorithm, we consider the hypothetical distance matrix between pairs of five objects given below:

        1    2    3    4    5
  1     0
  2     9    0
  3     3    7    0
  4     6    5    9    0
  5    11   10    2    8    0
- Treating each object as a cluster, the clustering begins by merging the two closest items, 3 and 5 (d35 = 2).
- To implement the next level of clustering we need to compute the distances between cluster (35) and the remaining objects:
  d(35)1 = min{d31, d51} = min{3, 11} = 3
  d(35)2 = min{d32, d52} = min{7, 10} = 7
  d(35)4 = min{d34, d54} = min{9, 8} = 8
- The new distance matrix becomes:

        (35)   1    2    4
  (35)   0
  1      3     0
  2      7     9    0
  4      8     6    5    0

The next two closest clusters, (35) and 1 (at distance 3), are merged to form cluster (135).
Distances between this cluster and the remaining clusters become
  d(135)2 = min{d(35)2, d12} = min{7, 9} = 7
  d(135)4 = min{d(35)4, d14} = min{8, 6} = 6
The distance matrix now becomes:

        (135)  2    4
  (135)  0
  2      7     0
  4      6     5    0

Continuing, the next two closest clusters, 2 and 4 (d24 = 5), are merged to form cluster (24).
Distances between this cluster and the remaining clusters become
  d(135)(24) = min{d(135)2, d(135)4} = min{7, 6} = 6
The final distance matrix now becomes:

        (135)  (24)
  (135)  0
  (24)   6     0

At the final step, clusters (135) and (24) are merged to form the single cluster (12345) of all five items.
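As a cross-check (not part of the slides), SciPy reproduces this merge sequence from the distance matrix above; scipy.cluster.hierarchy.dendrogram(Z) would also draw the dendrogram shown next.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# The 5 x 5 distance matrix from the worked example above
D = np.array([[ 0,  9,  3,  6, 11],
              [ 9,  0,  7,  5, 10],
              [ 3,  7,  0,  9,  2],
              [ 6,  5,  9,  0,  8],
              [11, 10,  2,  8,  0]], dtype=float)

Z = linkage(squareform(D), method="single")
print(Z)
# Rows give (0-based) merged clusters and merge levels:
# objects 3 and 5 merge at 2, then object 1 joins at 3,
# objects 2 and 4 merge at 5, and the two groups merge at 6.
```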
The results of this algorithm can be summarized graphically in the following dendrogram.
Dendrograms
- for clustering the 11 languages on the basis of the ten numerals
Dendrogram: Cluster Analysis of N = 22 Utility Companies (Euclidean distance, Average Linkage)
Dendrogram: Cluster Analysis of N = 22 Utility Companies (Euclidean distance, Single Linkage)