Title: Discrimination and Classification
Discrimination
- Situation
- We have two or more populations π1, π2, etc. (possibly p-variate normal).
- The populations are known (or we have data from each population).
- We have data for a new case (population unknown) and we want to identify the population of which the new case is a member.
The Basic Problem
- Suppose that the data from a new case, x1, …, xp, has joint density function either
  - π1: g(x1, …, xp), or
  - π2: h(x1, …, xp).
- We want to make the decision
  - D1: classify the case in π1 (g is the correct distribution), or
  - D2: classify the case in π2 (h is the correct distribution).
The Two Types of Errors
- Misclassifying the case in π1 when it actually lies in π2. Let P[1|2] = P[D1 | π2] = probability of this type of error.
- Misclassifying the case in π2 when it actually lies in π1. Let P[2|1] = P[D2 | π1] = probability of this type of error.
This is similar to Type I and Type II errors in hypothesis testing.
Note
A discrimination scheme is defined by splitting p-dimensional space into two regions:
- C1, the region where we make the decision D1 (the decision to classify the case in π1).
- C2, the region where we make the decision D2 (the decision to classify the case in π2).
There can be several approaches to determining the regions C1 and C2, all concerned with taking into account the probabilities of misclassification, P[2|1] and P[1|2]:
- Set up the regions C1 and C2 so that one of the probabilities of misclassification, P[2|1] say, is at some low acceptable value α. Accept the resulting level of the other probability of misclassification, P[1|2] = β.
- Set up the regions C1 and C2 so that the total probability of misclassification
P[Misclassification] = P1 P[2|1] + P2 P[1|2]
is minimized, where
P1 = P[the case belongs to π1] and
P2 = P[the case belongs to π2].
- Set up the regions C1 and C2 so that the total expected cost of misclassification
E[Cost of Misclassification] = ECM = c21 P1 P[2|1] + c12 P2 P[1|2]
is minimized, where
P1 = P[the case belongs to π1],
P2 = P[the case belongs to π2],
c21 = the cost of misclassifying the case in π2 when the case belongs to π1, and
c12 = the cost of misclassifying the case in π1 when the case belongs to π2.
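For illustration, with assumed values P1 = 0.7, P2 = 0.3, c21 = 10, c12 = 5, P[2|1] = 0.1 and P[1|2] = 0.2, we get ECM = (10)(0.7)(0.1) + (5)(0.3)(0.2) = 0.7 + 0.3 = 1.0.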
The Optimal Classification Rule
- Suppose that the data x1, …, xp has joint density function f(x1, …, xp; θ), where θ is either θ1 or θ2.
- Let
  - g(x1, …, xp) = f(x1, …, xp; θ1), and
  - h(x1, …, xp) = f(x1, …, xp; θ2).
- We want to make the decision
  - D1: θ = θ1 (g is the correct distribution), against
  - D2: θ = θ2 (h is the correct distribution).
Then the optimal regions (minimizing ECM, the expected cost of misclassification) for making the decisions D1 and D2 respectively are
\[ C_1 = \left\{ (x_1, \dots, x_p) : \lambda = \frac{g(x_1, \dots, x_p)}{h(x_1, \dots, x_p)} \ge k \right\} \]
and
\[ C_2 = \left\{ (x_1, \dots, x_p) : \lambda = \frac{g(x_1, \dots, x_p)}{h(x_1, \dots, x_p)} < k \right\}, \]
where
\[ k = \frac{c_{12} P_2}{c_{21} P_1}. \]
ECM = E[Cost of Misclassification] = c21 P1 P[2|1] + c12 P2 P[1|2].
Writing the misclassification probabilities as integrals over the classification regions,
\[ \mathrm{ECM} = c_{21} P_1 \int_{C_2} g(\mathbf{x})\, d\mathbf{x} + c_{12} P_2 \int_{C_1} h(\mathbf{x})\, d\mathbf{x} = c_{21} P_1 + \int_{C_1} \left[ c_{12} P_2 h(\mathbf{x}) - c_{21} P_1 g(\mathbf{x}) \right] d\mathbf{x}. \]
Thus ECM is minimized if C1 contains exactly those points (x1, …, xp) for which the integrand is negative, that is, the points where c21 P1 g(x1, …, xp) > c12 P2 h(x1, …, xp), which gives the rule above.
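A minimal Python sketch of this minimum-ECM rule (not part of the slides; the densities g and h, the priors P1, P2 and the costs c21, c12 are assumed supplied by the user, and the function name is hypothetical):

```python
from scipy.stats import norm

def classify_min_ecm(x, g, h, P1, P2, c21, c12):
    """Minimum-ECM rule: decide D1 (classify in pi1) when the likelihood
    ratio g(x)/h(x) is at least k = (c12 * P2) / (c21 * P1)."""
    k = (c12 * P2) / (c21 * P1)
    return "D1" if g(x) >= k * h(x) else "D2"

# Hypothetical univariate illustration: pi1 = N(0, 1), pi2 = N(2, 1),
# equal priors and equal costs, so k = 1.
decision = classify_min_ecm(0.7, norm(0, 1).pdf, norm(2, 1).pdf,
                            0.5, 0.5, 1, 1)
print(decision)  # "D1", since x = 0.7 lies closer to the pi1 mean
```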
Fisher's Linear Discriminant Function
- Suppose that x1, …, xp is data from a p-variate Normal distribution with mean vector either μ1 (population π1) or μ2 (population π2).
- The covariance matrix Σ is the same for both populations π1 and π2.
The Neyman-Pearson Lemma states that we should classify into populations π1 and π2 using the likelihood ratio
\[ \lambda = \frac{g(x_1, \dots, x_p)}{h(x_1, \dots, x_p)}. \]
That is, make the decision D1 (population is π1) if λ > k.
Substituting the p-variate normal densities with means μ1, μ2 and common covariance Σ,
\[ \lambda = \exp\left\{ -\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) + \tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) \right\}, \]
or, taking logarithms, λ > k becomes
\[ \ln \lambda = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} \mathbf{x} - \tfrac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) > \ln k. \]
Finally, we make the decision D1 (population is π1) if
\[ \mathbf{a}'\mathbf{x} = a_1 x_1 + \cdots + a_p x_p > c, \]
where
\[ \mathbf{a} = \Sigma^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \]
and
\[ c = \ln k + \tfrac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2). \]
Note: k = 1 and ln k = 0 if c12 = c21 and P1 = P2.
The function
\[ \ell(x_1, \dots, x_p) = \mathbf{a}'\mathbf{x} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)' \Sigma^{-1} \mathbf{x} \]
is called Fisher's linear discriminant function.
In the case where the population parameters are unknown but estimated from data, μ1, μ2 and Σ are replaced by the sample mean vectors x̄1, x̄2 and the pooled sample covariance matrix Spooled, giving the estimated Fisher's linear discriminant function
\[ \hat{\ell}(x_1, \dots, x_p) = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' S_{\text{pooled}}^{-1} \mathbf{x}. \]
A sketch of this computation is given below.
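A minimal Python sketch of the estimation (not from the slides; the names fisher_ldf and classify are hypothetical, and the cutoff assumes ln k = 0, i.e. equal costs and equal priors):

```python
import numpy as np

def fisher_ldf(X1, X2):
    """Estimate Fisher's linear discriminant from two training samples.
    X1 is an (n1, p) array of cases from pi1, X2 an (n2, p) array from pi2.
    Returns the coefficient vector a and the midpoint cutoff c."""
    n1, n2 = len(X1), len(X2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance: weighted average of the two sample covariance matrices
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S_pooled, xbar1 - xbar2)  # a = S^{-1}(xbar1 - xbar2)
    c = 0.5 * a @ (xbar1 + xbar2)                 # cutoff for ln k = 0
    return a, c

def classify(x, a, c):
    """Decision D1 (classify in pi1) if a'x >= c, otherwise D2."""
    return "D1" if a @ x >= c else "D2"
```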
Example 2
- Annual financial data are collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on the four variables
  - x1 = CF/TD = (cash flow)/(total debt),
  - x2 = NI/TA = (net income)/(total assets),
  - x3 = CA/CL = (current assets)/(current liabilities), and
  - x4 = CA/NS = (current assets)/(net sales)
are given in the following table.
The data are given in the following table.
Examples using SPSS
Classification or Cluster Analysis
- Have data from one or several populations.
Situation
- Have multivariate (or univariate) data from one or several populations (the number of populations is unknown).
- Want to determine the number of populations and identify the populations.
Example
Hierarchical Clustering Methods
- The following are the steps in the agglomerative hierarchical clustering algorithm for grouping N objects (items or variables):
1. Start with N clusters, each consisting of a single entity, and an N × N symmetric matrix (table) of distances (or similarities) D = (dij).
2. Search the distance matrix for the nearest (most similar) pair of clusters. Let the distance between the "most similar" clusters U and V be dUV.
3. Merge clusters U and V. Label the newly formed cluster (UV). Update the entries in the distance matrix by
  - deleting the rows and columns corresponding to clusters U and V, and
  - adding a row and column giving the distances between cluster (UV) and the remaining clusters.
4. Repeat steps 2 and 3 a total of N − 1 times. (All objects will be in a single cluster at termination of this algorithm.) Record the identity of clusters that are merged and the levels (distances or similarities) at which the mergers take place.

A sketch of this algorithm, for the single linkage case, is given below.
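A minimal Python sketch of steps 1-4 (not from the slides; single linkage is assumed for the step-3 update, matching the worked example later in this deck, and single_linkage is a hypothetical name):

```python
import numpy as np

def single_linkage(D, labels):
    """Agglomerative hierarchical clustering with single linkage.
    D is an N x N symmetric distance matrix and labels the N object names.
    Returns the merges in order as (merged pair, merge distance)."""
    D = np.asarray(D, dtype=float).copy()
    clusters = list(labels)
    np.fill_diagonal(D, np.inf)                # ignore self-distances
    merges = []
    while len(clusters) > 1:
        # Step 2: find the nearest pair of clusters (U, V)
        i, j = divmod(int(np.argmin(D)), len(clusters))
        if i > j:
            i, j = j, i
        merges.append(((clusters[i], clusters[j]), D[i, j]))
        # Step 3: merge U and V; single-linkage update d_(UV)W = min(d_UW, d_VW)
        new_row = np.minimum(D[i], D[j])
        D[i, :], D[:, i] = new_row, new_row
        D[i, i] = np.inf
        D = np.delete(np.delete(D, j, axis=0), j, axis=1)
        clusters[i] = "(" + clusters[i] + clusters[j] + ")"
        del clusters[j]
    return merges                              # step 4: N - 1 merges recorded
```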
Different methods of computing inter-cluster distance:
- Single linkage: the distance between two clusters is the minimum distance between a member of one cluster and a member of the other.
- Complete linkage: the distance between two clusters is the maximum such distance.
- Average linkage: the distance between two clusters is the average of all between-cluster distances.
These three rules are sketched in code below.
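A minimal sketch under the standard definitions assumed above: D is a square distance matrix, U and V are lists of 0-based object indices, and the function names are hypothetical.

```python
import numpy as np

def single_link(D, U, V):
    # Nearest neighbour: minimum distance between the two clusters
    return min(D[u][v] for u in U for v in V)

def complete_link(D, U, V):
    # Farthest neighbour: maximum distance between the two clusters
    return max(D[u][v] for u in U for v in V)

def average_link(D, U, V):
    # Mean of all between-cluster distances
    return float(np.mean([D[u][v] for u in U for v in V]))
```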
Example
- To illustrate the single linkage algorithm, we consider the hypothetical distance matrix between pairs of five objects given below:

        1    2    3    4    5
  1     0
  2     9    0
  3     3    7    0
  4     6    5    9    0
  5    11   10    2    8    0
- Treating each object as a cluster, the clustering begins by merging the two closest items, 3 and 5 (d35 = 2).
- To implement the next level of clustering we need to compute the distances between cluster (35) and the remaining objects:
  d(35)1 = min{d31, d51} = min{3, 11} = 3
  d(35)2 = min{d32, d52} = min{7, 10} = 7
  d(35)4 = min{d34, d54} = min{9, 8} = 8
- The new distance matrix becomes:

        (35)   1    2    4
  (35)   0
  1      3     0
  2      7     9    0
  4      8     6    5    0

The next two closest clusters, (35) and 1 (at distance 3), are merged to form cluster (135).
Distances between this cluster and the remaining clusters become
  d(135)2 = min{d(35)2, d12} = min{7, 9} = 7
  d(135)4 = min{d(35)4, d14} = min{8, 6} = 6
The distance matrix now becomes:

        (135)  2    4
  (135)  0
  2      7     0
  4      6     5    0

Continuing, the next two closest clusters, 2 and 4 (d24 = 5), are merged to form cluster (24).
Distances between this cluster and the remaining clusters become
  d(135)(24) = min{d(135)2, d(135)4} = min{7, 6} = 6
The final distance matrix now becomes:

        (135)  (24)
  (135)  0
  (24)   6     0

At the final step, clusters (135) and (24) are merged to form the single cluster (12345) of all five items.
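As a cross-check (not part of the slides), SciPy reproduces this merge sequence from the distance matrix above; scipy.cluster.hierarchy.dendrogram(Z) would also draw the dendrogram shown next.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# The 5 x 5 distance matrix from the worked example above
D = np.array([[ 0,  9,  3,  6, 11],
              [ 9,  0,  7,  5, 10],
              [ 3,  7,  0,  9,  2],
              [ 6,  5,  9,  0,  8],
              [11, 10,  2,  8,  0]], dtype=float)

Z = linkage(squareform(D), method="single")
print(Z)
# Rows give (0-based) merged clusters and merge levels:
# objects 3 and 5 merge at 2, then object 1 joins at 3,
# objects 2 and 4 merge at 5, and the two groups merge at 6.
```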
The results of this algorithm can be summarized graphically in the following dendrogram.
Dendrograms
- for clustering the 11 languages on the basis of the ten numerals
Dendrogram: Cluster Analysis of N = 22 Utility Companies (Euclidean distance, Average Linkage)
Dendrogram: Cluster Analysis of N = 22 Utility Companies (Euclidean distance, Single Linkage)