Title: Multivariate Normality
Underlying multivariate analyses is the
assumption of multivariate normality. This
assumption extends the idea of bivariate
normality to more than two dimensions. In
bivariate normality, the distribution of one
variable is normal for all values of the other
variable.
Bivariate normality can exist even when the
variables are strongly correlated.
In univariate statistics, the normality
assumption underlies significance testing. It is
with reference to sampling from some theoretical
distribution that we can make claims about the
likelihood of results occurring by chance or
under the null hypothesis. Similarly, the
establishment of confidence intervals depends on
distributional assumptions.
Many multivariate procedures rely on maximum
likelihood estimation. The importance of the
normality assumption is easy to demonstrate there
as well. In maximum likelihood, the parameter
estimates maximize the probability of the data.
The maximum likelihood estimates for the mean
and variance are the values of μ and σ² that
maximize the following likelihood:

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$
Note the explicit assumption that the data are
normally distributed. If that assumption is in
error, then the normal probability density
function will not provide an optimal solution to
the problem.
Provided the data are normally distributed, the
maximum likelihood estimates for μ and σ make the
obtained data more likely than any other
parameter estimates. The estimation process also
produces standard errors, making hypothesis tests
possible as well. But the validity of these
hypothesis tests rests on the validity of the
normality assumption.
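As a rough illustration of the idea above, the following sketch computes the closed-form maximum likelihood estimates of the mean and standard deviation for a normal sample and compares the log-likelihood at nearby parameter values. The simulated sample and the function names (`normal_log_likelihood`, `normal_mle`) are assumptions made for this example; they are not part of the original slides.

```python
import numpy as np

def normal_log_likelihood(x, mu, sigma):
    """Log-likelihood of i.i.d. normal data for given mu and sigma."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (-0.5 * n * np.log(2.0 * np.pi * sigma**2)
            - np.sum((x - mu) ** 2) / (2.0 * sigma**2))

def normal_mle(x):
    """Closed-form ML estimates: the sample mean and the n-denominator SD."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))  # divides by n, not n - 1
    return mu_hat, sigma_hat

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=200)   # hypothetical simulated sample

mu_hat, sigma_hat = normal_mle(x)
ll_best = normal_log_likelihood(x, mu_hat, sigma_hat)

# Moving either parameter away from its ML estimate lowers the log-likelihood.
ll_shifted_mean = normal_log_likelihood(x, mu_hat + 1.0, sigma_hat)
ll_inflated_sd = normal_log_likelihood(x, mu_hat, sigma_hat * 1.1)

# Standard error of the mean under the normal model.
se_mu = sigma_hat / np.sqrt(x.size)
print(mu_hat, sigma_hat, se_mu)
print(ll_best, ll_shifted_mean, ll_inflated_sd)
```

The standard error in the last step is only trustworthy to the extent that the normal model fits the data, which is the point the slide makes.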
The approach can be extended to multivariate data
as well. We could seek the maximum likelihood
estimates for a bivariate normal distribution.
Three bivariate normal distributions varying only
in the value of ρ. The validity of estimates of ρ
relies on the validity of the assumption of
bivariate normality.
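A minimal sketch of the bivariate case, assuming simulated data with a hypothetical true correlation of 0.6: the maximum likelihood estimates of the means, standard deviations, and correlation all come from the sample mean vector and the n-denominator covariance matrix. The function name `bivariate_normal_mle` is illustrative only.

```python
import numpy as np

def bivariate_normal_mle(xy):
    """ML estimates of the means, SDs, and correlation for two columns."""
    xy = np.asarray(xy, dtype=float)
    mean = xy.mean(axis=0)
    centered = xy - mean
    cov = centered.T @ centered / xy.shape[0]   # ML covariance divides by n
    sd = np.sqrt(np.diag(cov))
    rho = cov[0, 1] / (sd[0] * sd[1])
    return mean, sd, rho

rng = np.random.default_rng(1)
true_rho = 0.6                                   # hypothetical population value
cov_true = np.array([[1.0, true_rho], [true_rho, 1.0]])
sample = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=500)

mean_hat, sd_hat, rho_hat = bivariate_normal_mle(sample)
print(mean_hat, sd_hat, rho_hat)
```

The correlation estimate is the same as the ordinary Pearson coefficient, because the choice of n or n - 1 in the denominator cancels in the ratio.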
The maximum likelihood idea is easily extended to
more than two variables, and depending on the
multivariate problem, large numbers of parameters
may be estimated. Underlying the estimation,
however, is the assumption of multivariate
normality.
Assessing univariate normality and bivariate
normality is reasonably easy, in large part
because they can be inspected visually.
Assessing multivariate normality is a bit
trickier. When multivariate normality holds:
- All marginal distributions will be normal.
- All pairs of variables will be bivariate normal.
- All linear combinations will be normal.
- All pairs of linear combinations will be
  bivariate normal.
- Squared distances from the population centroid
  will be chi-square distributed with k (k = number
  of variables) degrees of freedom.
Violating any of these is a violation of
multivariate normality.
The example data come from a 4 x 4 design: 4
groups, each measured on 4 outcome measures.
All marginal distributions will be normal
All pairs of variables will be bivariate normal
Looks pretty bad so far, but what did we miss?
We forgot to remove the variability due to
groups. We need to examine the residuals.
The consequences of forgetting about the group
variability can be considerable.
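To make the point concrete, here is a small sketch of removing group variability by subtracting each group's mean vector from its own cases. The data are simulated and the group means are hypothetical; the slides' actual data set is not reproduced here. In this setup two measures are unrelated within groups, yet they look strongly correlated in the pooled data because their group means move together.

```python
import numpy as np

def group_residuals(data, groups):
    """Subtract each group's mean vector from that group's rows."""
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    resid = np.empty_like(data)
    for g in np.unique(groups):
        rows = groups == g
        resid[rows] = data[rows] - data[rows].mean(axis=0)
    return resid

rng = np.random.default_rng(2)
groups = np.repeat(np.arange(4), 50)             # 4 groups of 50 cases
# Hypothetical group means on 4 outcome measures.
group_means = np.array([[0, 0, 0, 0],
                        [3, 0, 3, 0],
                        [0, 3, 0, 3],
                        [3, 3, 3, 3]], dtype=float)
data = group_means[groups] + rng.normal(size=(groups.size, 4))

resid = group_residuals(data, groups)

# Correlation between measures 1 and 3, before and after removing groups.
r_original = np.corrcoef(data.T)[0, 2]
r_residual = np.corrcoef(resid.T)[0, 2]
print(round(r_original, 3), round(r_residual, 3))
```

The same residuals would be the input to every later check (marginal, bivariate, linear combinations, distances).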
(Figures on two slides, each with panels labeled
Original and Residuals.)
RELIABILITY ANALYSIS - SCALE (ALPHA)
Reliability Coefficients

             N of Cases   N of Items   Alpha
Original        200.0          4       .5200
Residuals       200.0          4       .7326
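The alpha coefficients above can be recomputed by hand. The sketch below implements the standard Cronbach's alpha formula and applies it to simulated data before and after removing group means. The group offsets are hypothetical and were chosen so that group differences pull pairs of items in opposite directions, which lowers alpha in the pooled data; the slides' own data may behave differently.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_cases, n_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_vars / total_var)

rng = np.random.default_rng(3)
groups = np.repeat(np.arange(4), 50)             # 4 groups of 50 cases
# Hypothetical group means that move pairs of items in opposite directions.
offsets = np.array([[1.5, -1.5, 0.0, 0.0],
                    [-1.5, 1.5, 0.0, 0.0],
                    [0.0, 0.0, 1.5, -1.5],
                    [0.0, 0.0, -1.5, 1.5]])
common = rng.normal(size=(200, 1))               # shared within-group factor
data = offsets[groups] + common + rng.normal(size=(200, 4))

# Residuals: subtract each group's mean vector from its own cases.
resid = data.copy()
for g in range(4):
    resid[groups == g] -= data[groups == g].mean(axis=0)

alpha_original = cronbach_alpha(data)
alpha_residual = cronbach_alpha(resid)
print(round(alpha_original, 3), round(alpha_residual, 3))
```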
All marginal distributions will be normal
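One way to check the marginal distributions numerically, alongside histograms or normal probability plots, is a Shapiro-Wilk test plus skew and kurtosis for each measure. This is a sketch on simulated residuals, with one column made deliberately skewed for contrast; it uses `scipy.stats.shapiro`, `scipy.stats.skew`, and `scipy.stats.kurtosis`, and the choice of test is an assumption, not necessarily what the slides used.

```python
import numpy as np
from scipy import stats

def marginal_normality(resid):
    """Shapiro-Wilk W, p-value, skew, and excess kurtosis for each column."""
    resid = np.asarray(resid, dtype=float)
    rows = []
    for j in range(resid.shape[1]):
        col = resid[:, j]
        w, p = stats.shapiro(col)
        rows.append((j + 1, w, p, stats.skew(col), stats.kurtosis(col)))
    return rows

rng = np.random.default_rng(4)
resid = rng.normal(size=(200, 4))                 # hypothetical residuals
resid[:, 3] = rng.exponential(size=200) - 1.0     # one deliberately skewed measure

results = marginal_normality(resid)
for var, w, p, skew, kurt in results:
    print(f"measure {var}: W={w:.3f} p={p:.4f} skew={skew:.2f} kurtosis={kurt:.2f}")
```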
All pairs of variables will be bivariate normal
All linear combinations will be normal
It is not practical to test all linear
combinations; there are an infinite number of
them. But testing a small number of commonly
used linear combinations is important. The most
commonly tested:
- Sum of all measures
- Pair-wise differences
- Principal components
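The three kinds of linear combinations listed above can be built in a few lines. The sketch below assumes a matrix of residuals (simulated here), forms the sum, every pair-wise difference, and the principal component scores from the eigenvectors of the covariance matrix, then runs a Shapiro-Wilk test on each. The helper name `linear_combinations` and the use of Shapiro-Wilk are illustrative choices.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def linear_combinations(resid):
    """Sum, pair-wise differences, and principal component scores."""
    resid = np.asarray(resid, dtype=float)
    k = resid.shape[1]
    combos = {"sum": resid.sum(axis=1)}
    for i, j in combinations(range(k), 2):
        combos[f"diff_{i + 1}_{j + 1}"] = resid[:, i] - resid[:, j]
    # Principal component scores from the covariance eigenvectors,
    # ordered from largest to smallest eigenvalue.
    centered = resid - resid.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    scores = centered @ eigvecs[:, order]
    for c in range(k):
        combos[f"pc_{c + 1}"] = scores[:, c]
    return combos

rng = np.random.default_rng(5)
resid = rng.normal(size=(200, 4))                # hypothetical residuals

combos = linear_combinations(resid)
for name, values in combos.items():
    w, p = stats.shapiro(values)
    print(f"{name}: W={w:.3f} p={p:.3f}")
```

With 4 measures this yields 1 sum, 6 differences, and 4 components, so some correction for running 11 tests may be worth considering.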
All pairs of linear combinations will be
bivariate normal
Here too the tests must be restricted, usually to
the pairs of linear combinations tested in the
previous step.
Squared distances from the population centroid
will be chi-square distributed with k (k = number
of variables) degrees of freedom
This requirement tests the normality of the
multivariate variance. The trick is to make sure
that group variability does not contaminate the
calculation of the Mahalanobis distances.
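A sketch of that calculation, under stated assumptions: the data are simulated with hypothetical group shifts, group means are removed first, and the covariance matrix is the pooled within-group estimate with n minus the number of groups in the denominator. The sorted squared distances are then compared with chi-square quantiles (k degrees of freedom) via `scipy.stats.chi2.ppf`. Because the covariance is estimated, the chi-square reference is only approximate in finite samples.

```python
import numpy as np
from scipy import stats

def squared_mahalanobis(resid, n_groups):
    """Squared Mahalanobis distances of residuals from their centroid (zero)."""
    resid = np.asarray(resid, dtype=float)
    n = resid.shape[0]
    # Pooled within-group covariance; group means were already removed.
    cov = resid.T @ resid / (n - n_groups)
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", resid, inv_cov, resid)

rng = np.random.default_rng(6)
groups = np.repeat(np.arange(4), 50)
data = rng.normal(size=(200, 4)) + 2.0 * groups[:, None]   # hypothetical group shifts

resid = data.copy()
for g in range(4):
    resid[groups == g] -= data[groups == g].mean(axis=0)

d2 = np.sort(squared_mahalanobis(resid, n_groups=4))
k = resid.shape[1]

# Chi-square quantiles for a Q-Q comparison.
probs = (np.arange(1, d2.size + 1) - 0.5) / d2.size
expected = stats.chi2.ppf(probs, df=k)
qq_corr = np.corrcoef(d2, expected)[0, 1]
print(round(d2.mean(), 3), round(qq_corr, 3))
```

Plotting `d2` against `expected` should give a roughly straight line when multivariate normality holds; computing the distances from the raw data instead of the residuals would mix group differences into the distances.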
One additional test of multivariate normality can
be obtained from LISREL in the PRELIS
pre-processor. Multivariate measures of skew and
kurtosis, developed by Mardia, can be used as an
additional index of multivariate normality.
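For readers without PRELIS, Mardia's measures can be computed directly. The following is a sketch of the textbook formulas (skewness as the mean of cubed cross-products of standardized deviations, kurtosis as the mean of squared Mahalanobis distances, both using the n-denominator covariance) with their usual large-sample tests. It is not PRELIS output and may differ from it in small-sample corrections; the simulated data sets are hypothetical.

```python
import numpy as np
from scipy import stats

def mardia(x):
    """Mardia's multivariate skewness and kurtosis with asymptotic tests."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    centered = x - x.mean(axis=0)
    s = centered.T @ centered / n                 # ML covariance (divides by n)
    g = centered @ np.linalg.inv(s) @ centered.T  # n x n matrix of cross-products
    skew = np.sum(g ** 3) / n**2
    kurt = np.mean(np.diag(g) ** 2)
    # Skewness: n * b1 / 6 is approximately chi-square with k(k+1)(k+2)/6 df.
    skew_stat = n * skew / 6.0
    skew_df = k * (k + 1) * (k + 2) / 6.0
    skew_p = stats.chi2.sf(skew_stat, skew_df)
    # Kurtosis: approximately normal with mean k(k+2), variance 8k(k+2)/n.
    kurt_z = (kurt - k * (k + 2)) / np.sqrt(8.0 * k * (k + 2) / n)
    kurt_p = 2.0 * stats.norm.sf(abs(kurt_z))
    return {"skew": skew, "skew_p": skew_p,
            "kurtosis": kurt, "kurtosis_z": kurt_z, "kurtosis_p": kurt_p}

rng = np.random.default_rng(7)
normal_data = rng.normal(size=(200, 4))           # hypothetical normal residuals
skewed_data = rng.exponential(size=(200, 4))      # hypothetical non-normal data

normal_result = mardia(normal_data)
skewed_result = mardia(skewed_data)
print(normal_result)
print(skewed_result)
```

For 4 variables the expected kurtosis under normality is k(k + 2) = 24, so values far from 24 point to a violation.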
Violations of multivariate normality can be
handled in multiple ways:
- Transformations
- Robust methods
- Bootstrapping
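As one illustration of the last option, here is a minimal sketch of a percentile bootstrap confidence interval for a correlation, which avoids relying on bivariate normality for the interval. The skewed simulated data, the number of resamples, and the function name `bootstrap_corr_ci` are all assumptions made for this example.

```python
import numpy as np

def bootstrap_corr_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a Pearson correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot, [tail, 1.0 - tail])
    return lower, upper

rng = np.random.default_rng(8)
x = rng.exponential(size=200)                     # hypothetical skewed measure
y = x + rng.exponential(size=200)                 # related, also skewed

r_hat = np.corrcoef(x, y)[0, 1]
lower, upper = bootstrap_corr_ci(x, y)
print(round(r_hat, 3), round(lower, 3), round(upper, 3))
```

Whole cases are resampled, so the pairing between the two measures is preserved in every bootstrap sample.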