Title: Discriminant Analysis
1. Discriminant Analysis
2. The goal of discriminant analysis is the same as that of MANOVA: development of a linear combination of variables that best separates groups. The key difference is that in MANOVA the groups are usually constructed by the researcher and have some clear structure (e.g., a 2 x 2 factorial design). In discriminant analysis, the groups usually have no particular structure and their formation is not under experimental control. A superficial difference is which variables are labeled dependent and which independent.
3. The linear combinations that discriminant analysis constructs maximize the ratio of between-groups variance to within-groups variance for a linear combination of the predictors. If more than one linear combination can be formed, each subsequent combination is independent of the prior ones and accounts for as much of the remaining group variation as possible.
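The weights of these linear combinations can be illustrated with a short computational sketch (not part of the original slides). Assuming NumPy and SciPy are available, the weights that maximize the between- to within-groups variance ratio are the leading eigenvectors of the generalized eigenproblem Sb v = lambda Sw v, where Sb and Sw are the between- and within-groups SSCP matrices; `discriminant_weights` is a hypothetical helper name.

    import numpy as np
    from scipy.linalg import eigh

    def discriminant_weights(X, groups):
        """X: (n, p) predictor matrix; groups: length-n array of group labels."""
        grand_mean = X.mean(axis=0)
        p = X.shape[1]
        Sw = np.zeros((p, p))   # within-groups SSCP
        Sb = np.zeros((p, p))   # between-groups SSCP
        for g in np.unique(groups):
            Xg = X[groups == g]
            d = Xg - Xg.mean(axis=0)
            Sw += d.T @ d
            m = (Xg.mean(axis=0) - grand_mean)[:, None]
            Sb += Xg.shape[0] * (m @ m.T)
        # Solve Sb v = lambda Sw v; eigh returns eigenvalues in ascending order,
        # so reverse to get the combination with the largest ratio first.
        # Only min(number of groups - 1, p) of the eigenvalues are non-trivial.
        eigvals, eigvecs = eigh(Sb, Sw)
        order = np.argsort(eigvals)[::-1]
        return eigvals[order], eigvecs[:, order]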
4. In this hypothetical example, data from 500 graduate students seeking jobs were examined. Available for each student were three predictors: GRE (V+Q), Years to Finish the Degree, and Number of Publications. The outcome measure was categorical: got a job versus did not get a job. Half of the sample was used to determine the best linear combination for discriminating the job categories. The second half of the sample was used for cross-validation.
5. DISCRIMINANT
  /GROUPS=job(1 2)
  /VARIABLES=gre pubs years
  /SELECT=sample(1)
  /ANALYSIS ALL
  /SAVE=CLASS SCORES PROBS
  /PRIORS SIZE
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV TABLE CROSSVALID
  /PLOT=COMBINED SEPARATE MAP
  /PLOT=CASES
  /CLASSIFY=NONMISSING POOLED .
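For readers who want to reproduce the flavor of this run outside SPSS, here is a rough Python analogue (an illustration, not the original analysis). The scikit-learn calls and the fabricated stand-in data are assumptions; only the variable names (gre, pubs, years, job, sample) follow the slides.

    import numpy as np
    import pandas as pd
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Fabricated stand-in data with the same layout as the slides' file:
    # gre, pubs, years as predictors; job (1/2) as the group; sample (1/2) as the split.
    rng = np.random.default_rng(1)
    n = 500
    df = pd.DataFrame({
        "gre": rng.normal(1200, 100, n),
        "pubs": rng.poisson(2, n),
        "years": rng.normal(5, 1, n),
        "job": rng.integers(1, 3, n),
        "sample": np.repeat([1, 2], n // 2),
    })

    train, test = df[df["sample"] == 1], df[df["sample"] == 2]
    X_cols = ["gre", "pubs", "years"]

    lda = LinearDiscriminantAnalysis()            # default priors come from group sizes, like /PRIORS SIZE
    lda.fit(train[X_cols], train["job"])

    scores = lda.transform(test[X_cols])          # discriminant function scores (cf. /SAVE SCORES)
    classes = lda.predict(test[X_cols])           # predicted group membership (cf. /SAVE CLASS)
    posteriors = lda.predict_proba(test[X_cols])  # posterior probabilities (cf. /SAVE PROBS)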
13. The pooled within-groups matrix is used under the assumption that the separate group matrices being combined are homogeneous.
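As a concrete reference, the pooling itself is just a degrees-of-freedom-weighted average of the separate group covariance matrices. A minimal sketch, assuming NumPy, with `pooled_within_cov` as a hypothetical helper:

    import numpy as np

    def pooled_within_cov(X, groups):
        """S_pooled = sum_k (n_k - 1) S_k / (N - g)."""
        p = X.shape[1]
        pooled = np.zeros((p, p))
        df_total = 0
        for g in np.unique(groups):
            Xg = X[groups == g]
            pooled += (Xg.shape[0] - 1) * np.cov(Xg, rowvar=False)  # (n_k - 1) * S_k
            df_total += Xg.shape[0] - 1
        return pooled / df_total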
14. Casual inspection suggests some differences in the two group matrices.
15. The heterogeneity is confirmed by this test (Box's M). The test is highly sensitive in general, however, and is also sensitive to violations of multivariate normality. Tests of significance in discriminant analysis are robust to moderate violations of the homogeneity assumption.
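A hedged sketch of Box's M with its usual chi-square approximation is shown below (SPSS reports an F approximation, so the values will differ slightly); `box_m` is a hypothetical helper, assuming NumPy and SciPy.

    import numpy as np
    from scipy.stats import chi2

    def box_m(X, groups):
        labels = np.unique(groups)
        g, p, N = len(labels), X.shape[1], X.shape[0]
        covs, S_pooled, inv_df_sum = [], np.zeros((p, p)), 0.0
        for lab in labels:
            Xk = X[groups == lab]
            nk, Sk = Xk.shape[0], np.cov(Xk, rowvar=False)
            covs.append((nk, Sk))
            S_pooled += (nk - 1) * Sk
            inv_df_sum += 1.0 / (nk - 1)
        S_pooled /= (N - g)
        # M = (N - g) ln|S_pooled| - sum_k (n_k - 1) ln|S_k|
        M = (N - g) * np.log(np.linalg.det(S_pooled))
        M -= sum((nk - 1) * np.log(np.linalg.det(Sk)) for nk, Sk in covs)
        # standard small-sample correction and chi-square approximation
        c = ((2 * p**2 + 3 * p - 1) / (6.0 * (p + 1) * (g - 1))) * (inv_df_sum - 1.0 / (N - g))
        stat = M * (1 - c)
        df = p * (p + 1) * (g - 1) / 2.0
        return stat, df, chi2.sf(stat, df)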
17. These matrices are interpreted in the same way as in MANOVA and canonical correlation analysis. What discriminates the groups?
18. This is the raw canonical discriminant function. The group means on this function can be used to establish cut-off points for classification.
19. Classification can be based on distance from the group centroids and can take into account the prior probability of group membership.
20. Of historical interest only: Fisher's original approach to classification was based on the calculation of a classification function score for each group. The highest score determined the group to which a person was assigned.
22. Two modes?
25. An analysis of variance can be conducted on the discriminant function scores. This univariate approach should produce the same test of significance as the multivariate approach.
26. When just a single variable is examined, Wilks' Λ is the ratio of the within-groups to the total sum of squares:
27. Λ = 248.000 / 419.985 = .590
28. Squared canonical correlation (with a single discriminant function, this equals 1 − Λ)
29. Violation of the homogeneity assumption can affect the classification. To check, the analysis can be conducted using separate group covariance matrices.
30. No noticeable change in the accuracy of classification.
31. The group that did not get a job was actually composed of two subgroups: those who got interviews but did not land a job, and those who were never interviewed. This accounts for the bimodality in the discriminant function scores. A discriminant analysis of the three groups allows one more discriminant function to be derived, perhaps indicating the characteristics that separate those who get interviews from those who don't, or those whose interviews are successful from those whose interviews do not produce a job offer.
34. DISCRIMINANT
  /GROUPS=group(1 3)
  /VARIABLES=gre pubs years
  /SELECT=sample(1)
  /ANALYSIS ALL
  /SAVE=CLASS SCORES PROBS
  /PRIORS SIZE
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV TABLE CROSSVALID
  /PLOT=COMBINED SEPARATE MAP
  /PLOT=CASES
  /CLASSIFY=NONMISSING POOLED .
37. Separating the three groups produces better homogeneity of the covariance matrices.
38. The test is still significant, but just barely; not enough to worry about.
39. Two significant linear combinations can be derived, but they are not of equal importance.
40. What do the linear combinations mean now?
45. [Territorial map: Canonical Discriminant Function 2 plotted against Canonical Discriminant Function 1 (both axes run from -6.0 to 6.0), showing the classification regions for the three groups. Symbols used in the territorial map: 1 = Unemployed, 2 = Got a Job, 3 = Interview Only; * indicates a group centroid.]
50. DISCRIMINANT
  /GROUPS=group(1 3)
  /VARIABLES=gre pubs years
  /SELECT=sample(1)
  /ANALYSIS ALL
  /SAVE=CLASS SCORES PROBS
  /PRIORS SIZE
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV TABLE
  /PLOT=COMBINED SEPARATE MAP
  /PLOT=CASES
  /CLASSIFY=NONMISSING SEPARATE .
51. [Territorial map for the classification using separate group covariance matrices: Canonical Discriminant Function 2 plotted against Canonical Discriminant Function 1 (both axes run from -6.0 to 6.0). Symbols used in the territorial map: 1 = Unemployed, 2 = Got a Job, 3 = Interview Only; * indicates a group centroid.]
53. Classification
- Classification of cases in a new sample is often the goal of a discriminant analysis (but not of MANOVA; why?) and is one way of judging the quality, or goodness of fit, of the solution.
- Classification can be carried out in several ways that typically converge on the same solution:
  - Fisher's classification functions
  - Raw discriminant function coefficients
  - Mahalanobis distance
54. In Fisher's method, a classification function is derived for each group. The original data are used to compute a classification score for each person for each group. The person is then assigned to the group that produces the largest classification score.
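A minimal sketch of this approach (an illustration, not SPSS's internal code), assuming NumPy. The classification function for group k is c_k(x) = m_k' S⁻¹ x − ½ m_k' S⁻¹ m_k + ln(prior_k), using the pooled within-groups covariance S; `fisher_classify` is a hypothetical helper.

    import numpy as np

    def fisher_classify(X, centroids, S_pooled, priors=None):
        """centroids: (g, p) group means; S_pooled: (p, p) pooled covariance."""
        g = centroids.shape[0]
        priors = np.full(g, 1.0 / g) if priors is None else np.asarray(priors)
        S_inv = np.linalg.inv(S_pooled)
        W = centroids @ S_inv                             # coefficient row for each group
        const = -0.5 * np.sum(W * centroids, axis=1) + np.log(priors)
        scores = X @ W.T + const                          # (n, g) classification scores
        return np.argmax(scores, axis=1), scores          # winning group index, all scores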
55. The method based on raw discriminant function scores chooses an optimal cut-point along each discriminant variate, assigning cases to groups according to whether their discriminant scores fall above or below the cut-off values on one or more variates. The cut-points are established using weighted averages of the group means on the discriminant functions: for each pair of centroids, a cut-point is placed between the two means.
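The slide does not show the formula itself, so the sketch below should be read as an illustration only. One common textbook convention places the cut-point at the simple midpoint of the two centroids when the groups are the same size and, when they are not, at a weighted average in which each centroid is weighted by the other group's n; `cut_point` is a hypothetical helper.

    def cut_point(mean_a, n_a, mean_b, n_b, weighted=True):
        """Cut-point between two group centroids on one discriminant function."""
        if weighted:
            # weighted form: each centroid weighted by the size of the other group
            return (n_b * mean_a + n_a * mean_b) / (n_a + n_b)
        return (mean_a + mean_b) / 2.0   # simple midpoint for equal group sizes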
56. n = 54, n = 71, n = 125
58. An alternative and more informative approach was developed by Mahalanobis. This distance-based approach assigns a case to the group whose centroid is closest in the discriminant space. Unlike Euclidean distance, the Mahalanobis distance takes the variances and covariances of the measures into account.
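A minimal sketch of the distance calculation, assuming NumPy (`mahalanobis_sq` is a hypothetical helper). The pooled within-groups covariance matrix supplies the variance and covariance information that plain Euclidean distance ignores.

    import numpy as np

    def mahalanobis_sq(x, centroids, S_pooled):
        """Squared Mahalanobis distance from case x to each row of centroids."""
        S_inv = np.linalg.inv(S_pooled)
        diffs = centroids - x                            # (g, p) differences from each centroid
        return np.einsum("gi,ij,gj->g", diffs, S_inv, diffs)

    # Assignment rule: the case goes to the group with the smallest distance, e.g.
    # group_index = np.argmin(mahalanobis_sq(x, centroids, S_pooled))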
59. In a Cartesian coordinate system, the axes are displayed as orthogonal. In this case, that is not really true to the nature of the data.
60. Even if the data are standardized to remove differences in scale, the orthogonal reference system is not accurate.
61. θ = 29.9°, cos(θ) = .867. We could make use of information about the correlation between the variables to construct reference vectors that better match the data. With standardized data, the correlation is the cosine of the angle between the reference vectors defined by the variables.
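A quick numerical check of the angle quoted above (plain Python, no special libraries):

    import math

    r = 0.867                              # correlation between the standardized variables
    theta = math.degrees(math.acos(r))     # angle between the reference vectors; about 29.9 degrees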
65. The squared distance between each case and each group centroid can be calculated, and assignment to a group can be based on the shortest distance.
66. Under the assumption of multivariate normality, the probability of a particular profile of scores, given a particular group centroid and the pooled covariance matrix, can be estimated.
67. The probability of each case given each group can be calculated. Using Bayes's theorem, this information can be combined to calculate the posterior probability of group membership given a particular profile of scores.
68. The posterior probabilities can be modified to take the prior probability of group membership into account.
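A minimal sketch of the posterior calculation described in the last three slides, assuming NumPy. Under multivariate normality with a pooled covariance matrix, P(x | group) is proportional to exp(−D²/2), where D² is the squared Mahalanobis distance to that group's centroid; Bayes's theorem then combines this with the priors (`posterior_probs` is a hypothetical helper).

    import numpy as np

    def posterior_probs(d_squared, priors):
        """d_squared: (g,) squared Mahalanobis distances; priors: (g,) prior probabilities."""
        likelihood = np.exp(-0.5 * np.asarray(d_squared))   # P(x | group), up to a common constant
        unnormalized = likelihood * np.asarray(priors)       # numerator of Bayes's theorem
        return unnormalized / unnormalized.sum()             # P(group | x)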
69. The case is assigned to the group with the highest posterior probability. The advantages of this approach over the other classification procedures are:
- An unusual case is easily identified by a low probability of the case given membership in any of the groups, P(x|Group).
- The confidence in a classification can be gauged by the magnitude of the posterior probability, P(Group|x).
72. Is the classification better than would be expected by chance? We first need to know the number of correct classifications that would occur by chance.
74. The total number of correct classifications that would occur by chance (94.328, or 37.7%) can be tested against the actual number of correct classifications given the discriminant analysis model (214, or 85.6%). A t-test can be calculated, where the denominator is the standard error of the number of correct classifications expected by chance.
75. The difference between chance-expected and actual classification can also be tested with a chi-square. Because this is a single-degree-of-freedom test, t² = χ².
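The arithmetic behind these two slides can be sketched as follows, using the counts the slides report (n = 250 cases in the classified half) and assuming the denominator is the binomial standard error of the chance-correct count:

    import math

    n = 250                       # cases classified (the training half)
    expected_correct = 94.328     # correct by chance, from the slide (37.7% of 250)
    observed_correct = 214        # correct under the model, from the slide (85.6% of 250)

    p_chance = expected_correct / n
    se = math.sqrt(n * p_chance * (1 - p_chance))   # SE of the chance-correct count
    t = (observed_correct - expected_correct) / se  # the slide's t (about 15.6)
    chi_square = t ** 2                             # single-df test, so t^2 = chi^2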
76. MANOVA, discriminant analysis, and multiple regression are all special cases of canonical correlation analysis. This can be demonstrated by taking the same data set and showing the parallel results produced by each approach. The following analyses were conducted on the full sample (N = 500) for the data set that included GRE (V+Q), Years to Complete Degree, and Number of Publications as continuous variables and Job Status (Employed, Unemployed) as a categorical variable.
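One piece of that equivalence can be demonstrated numerically: for a two-group outcome, the R² from regressing a 0/1 group indicator on the predictors equals the squared canonical correlation, which is 1 − Wilks' Λ from the discriminant/MANOVA run. The sketch below uses simulated stand-in data (not the slides' data set), assuming NumPy.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=(n, 3))                       # stand-ins for gre, pubs, years
    job = (X @ np.array([0.2, -0.9, 0.3]) + rng.normal(size=n) > 0).astype(float)

    # (1) R^2 from regressing the 0/1 group indicator on the predictors
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, job, rcond=None)
    resid = job - X1 @ beta
    r2 = 1.0 - resid.var() / job.var()

    # (2) 1 - Wilks' lambda from the two-group discriminant problem
    Sw, Sb = np.zeros((3, 3)), np.zeros((3, 3))
    grand_mean = X.mean(axis=0)
    for g in (0.0, 1.0):
        Xg = X[job == g]
        d = Xg - Xg.mean(axis=0)
        Sw += d.T @ d
        m = (Xg.mean(axis=0) - grand_mean)[:, None]
        Sb += Xg.shape[0] * (m @ m.T)
    eigval = np.linalg.eigvals(np.linalg.solve(Sw, Sb)).real.max()
    sq_canon_corr = eigval / (1.0 + eigval)           # = 1 - Wilks' lambda

    print(round(r2, 4), round(sq_canon_corr, 4))      # the two values agree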
77. Discriminant Analysis

DISCRIMINANT
  /GROUPS=job(1 2)
  /VARIABLES=gre pubs years
  /ANALYSIS ALL
  /PRIORS EQUAL
  /STATISTICS=MEAN STDDEV UNIVF BOXM COEFF RAW CORR COV GCOV TCOV
  /CLASSIFY=NONMISSING POOLED .
80. MANOVA

manova gre, pubs, years by job(1,2)
  /print cellinfo(means) parameters signif(singledf multiv dimenr eigen univ hypoth)
    homogeneity error(cor sscp) transform
  /discrim stan corr alpha(1)
  /power exact
  /design .
81. Multivariate Tests of Significance (S = 1, M = 1/2, N = 247)

 Test Name       Value     Exact F   Hypoth. DF   Error DF   Sig. of F
 Pillais        .41038   115.07312         3.00     496.00        .000
 Hotellings     .69601   115.07312         3.00     496.00        .000
 Wilks          .58962   115.07312         3.00     496.00        .000
 Roys           .41038

 Note: F statistics are exact.

 Observed Power at .0500 Level
 TEST NAME    Noncent.   Power
 (All)         345.219    1.00

 Eigenvalues and Canonical Correlations
 Root No.   Eigenvalue      Pct.   Cum. Pct.   Canon Cor.
 1                .696   100.000     100.000         .641
82. Standardized discriminant function coefficients

              Function No.
 Variable              1
 GRE                .338
 PUBS              -.958
 YEARS              .382

 Correlations between DEPENDENT and canonical variables

              Canonical Variable
 Variable              1
 GRE               -.039
 PUBS              -.877
 YEARS              .452
83. Multiple Regression

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI BCOV R ANOVA COLLIN TOL CHANGE ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT job
  /METHOD=ENTER gre pubs years .
86. Canonical Correlation Analysis

manova gre, pubs, years with job
  /print cellinfo(means) parameters signif(singledf multiv dimenr eigen univ hypoth)
    homogeneity error(cor sscp) transform
  /discrim stan corr alpha(1)
  /power exact
  /design .
87. Multivariate Tests of Significance (S = 1, M = 1/2, N = 247)

 Test Name       Value     Exact F   Hypoth. DF   Error DF   Sig. of F
 Pillais        .41038   115.07312         3.00     496.00        .000
 Hotellings     .69601   115.07312         3.00     496.00        .000
 Wilks          .58962   115.07312         3.00     496.00        .000
 Roys           .41038

 Note: F statistics are exact.

 Observed Power at .0500 Level
 TEST NAME    Noncent.   Power
 (All)         345.219    1.00

 Eigenvalues and Canonical Correlations
 Root No.   Eigenvalue      Pct.   Cum. Pct.   Canon Cor.   Sq. Cor
 1                .696   100.000     100.000         .641      .410
88. Standardized canonical coefficients for DEPENDENT variables

              Function No.
 Variable              1
 GRE                .259
 PUBS              -.912
 YEARS              .314

 Correlations between DEPENDENT and canonical variables

              Function No.
 Variable              1
 GRE               -.051
 PUBS              -.922
 YEARS              .551