Title: Multivariate Analysis
1Multivariate Analysis
- Muhammad Qaiser Shahbaz
- Department of Statistics
- GC University, Lahore
2Multivariate Analysis
- Multivariate Analysis is a study of several
dependent random variables simultaneously. - These analysis are straight generalization of
univariate analysis. - Certain distributional assumptions are required
for proper analysis. - The mathematical framework is relatively complex
as compared with the univariate analysis. - These analysis are being used widely around the
world.
3Some Multivariate Distributions
- The Multivariate Normal Distribution
- Generalization of famous Normal Distribution
- The Wishart Distribution
- Generalization of the ChiSquare Distribution
- The Hotellings T2Statistic and Distribution
- Generalization of square of Studentst statistic
and distribution - The Willks Lambda Statistic
- Generalization of ratio of two ChiSquare
statistic
4Some Multivariate Measures
- The Mean Vector
- Collection of the means of the variables under
study - The Covariance Matrix
- Collection of the Variances and Covariances of
the variables under study - The Correlation Matrix
- Collection of Correlation Coefficients of the
variables involved under study - The Generalized Variance
- Determinant of the Covariance Matrix
5Some Multivariate Tests of Significance
- Testing significance of a single mean vector
- Testing equality of two mean vectors
- Testing equality of several mean vectors
- Testing significance of a single covariance
matrix - Testing equality of two covariance matrices
- Testing equality of several covariance matrices
- Testing independence of sets of variates
- Testing independence of variates
6Some Multivariate Techniques
- The Hotellings T2 Statistic
- The Multivariate Analysis of Variance and
Covariance - The Multivariate Experimental Designs
- The Multivariate Profile Analysis
- The Multivariate Regression Analysis
- The Generalized Multivariate Analysis of Variance
- The Principal Component Analysis
- The Factor Analysis
7Some Multivariate Techniques
- The Canonical Correlation Analysis
- The Discriminatory Analysis
- The Cluster Analysis
- The Multidimensional Scaling
- The Correspondence Analysis
- The Classification Trees
- The Path Analysis
- The Structural Equations Models
- The Seemingly Unrelated Regression Models
8The Factor Analysis
- Deals with the grouping of like variables in
sets. - Sets are formed in decreasing order of
importance. - Sets are relatively independent from each other.
- Two types are commonly used
- The Exploratory Factor Analysis
- The Confirmatory Factor Analysis
- One of the most commonly used technique in social
and psychological sciences
9The Exploratory Factor Analysis
- This technique deals with exploring the structure
of the data. - The variables involved under the study are
equally important. - Variables are grouped together on the basis of
their closeness. - Groups are generally formed so that they are
orthogonal to each other but this assumption can
be relaxed. - This technique exactly explains the Covariances
of the variables.
10Exploratory Factor Analysis
- Two major types of exploratory factor analysis
are available based upon the underlying
assumptions. - The Orthogonal Factor Analysis
- Based upon the assumption that the established
factor are orthogonal so they can not be further
factorized. - The Oblique Factor Analysis
- Based upon the assumption that the established
factors are not orthogonal and so can be further
factorized.
11Orthogonal Factor Model
- Based upon the model for each of the underlying
variable. - The Factor Analysis model is
12Structure of Covariance Matrix
- The model for all the variables is
- The Covariance Matrix is decomposed as
13Some Measures in Factor Analysis
- The Factor Analysis Model is
- The quantity is loading of ith variable
on jth factor and measures the degree of
dependence of a variable on a factor. - The ith communality that measures the portion
of variation of ith variable explained by jth
factor is given as
14Structure of Variance and Covariance
- The Factor Analysis Model is
- Variance of ith variable is given as
- The Covariance between ith and kth variable is
15Extraction Methods
- Principal Component Extraction
- Extract the Factors to Maximize the Variance
- Maximum Likelihood Extraction
- Maximize the probability of observing the
underlying correlation matrix - Generalized Least Square Extraction
- Minimize the difference between offdiagonal
elements of observed and reproduced correlation
matrix - Principal Axis Factoring
- Uses estimated communalities instead of 1s in
diagonal element of the correlation matrix - Alpha Factoring
- Used to obtain the consistent factors in repeated
samples
16Factor Rotation
- Rotation is done to simplify the solution of
factor analysis. - Interpretations can be easily done from rotated
solution. - Two types of rotations are available
- Orthogonal Rotation factors formed are
orthogonal - Oblique Rotation factors formed are correlated
17Orthogonal Rotations
- Rotated factors are orthogonal.
- Several rotation methods are available depending
upon their work. Most common are given. - Varimax Rotation (Kaiser1958)
- Minimize complexity of factors by maximizing
variance of loading on each factor. - Quartimax Rotation (Mulaik1972)
- Minimize complexity of variables by maximizing
variance of loading on each variable. - Equamax Rotation (Harman1976)
- Simplify both variables and factors. Compromise
between Varimax and Quartimax.
18Oblique Rotation
- Rotated factors are correlated.
- Several methods are available depending upon the
permitted amount of correlation in rotated
factors. - Direct Oblimin Rotation
- Simplify factors by minimizing cross products of
loading. Depends upon the provided value of
correlation among factors. - Promax Rotation
- Orthogonal factors rotated to oblique position.
Orthogonal factor loadings are raised to a
positive power.
19Test for Model Fit
- The goodness of fitted factor analysis model can
be tested by using the ChiSquare statistic. The
null hypothesis to be tested is - The test statistic for this test has been
developed by Lawley (1940) and Bartlett (1954). - The Measure of Sampling Adequacy, given by Kaiser
and Rice (1974) provide some sort of evidence
about model fit.
20The Factor Scores
- The values of factors for given values of
variables are Factor Scores. - Various methods are available for factor scores
- Bartletts Method (Bartlett 1938)
- The Regression Method (Thompson 1934)
21Example1
- Data from 100 individuals was collected on 10
dimensions to see the satisfaction level. A
portion of data is given below
22Specifying the Analysis
23Specifying the Analysis
24Specifying the Analysis
25The Output
26The Output
27The Output
28The Output
29The Output
30The Output
31Canonical Correlation Analysis
- Deals with the study of relationship between two
sets of variates. - Goal is to find the Linear Combination of
variables that are maximally correlated with each
other. - It is an extension of Principal Component
Analysis. - The linear combination of variables are obtained
under certain constraints. - The sets of variates can treated as dependent and
independent sets.
32Canonical Correlation Analysis
- The primary purpose is to find the pairs of
linear combination of variables so that they are
highly correlated. - As many pairs are obtained as there are variables
in the set with smaller number of variables. - The pairs are obtained so that they have
correlation in decreasing order. - Can be used to test the independence of sets of
variates in case of Multivariate Normality.
33Canonical Correlation Analysis
- The aim in canonical correlation analysis is to
obtain the maximum correlation between sets of
variates called Canonical Correlation. - Another aim is to obtain the vectors of
coefficients to obtain the linear combination of
variables called Canonical Variates. - Predictive Validity of Multivariate Regression
can also be judged using canonical correlation
analysis.
34Theoretical Framework
- The Canonical Correlation Analysis is based upon
the Joint Covariance (Joint Correlation) matrix
of two sets of variates. - The joint covariance matrix of two sets of
variates containing p and q variates is given
as
35Theoretical Framework
- The Canonical Correlations are obtained by
solving the determinantial equation - The ith pair of Canonical Variates is given as
- The coefficient vectors are obtained by solving
36Testing Significance of Canonical Correlations
- Several tests of significance can be tested by
using canonical correlation analysis. - A test of overall independence of two sets of
variates can be based upon the testing of the
hypothesis that all the Canonical Correlations
are simultaneously zero. This test was developed
by Willks (1932). The null hypothesis here is - Testing this hypothesis is equivalent to testing
the Significance of Regression Matrix in
Multivariate Regression.
37Testing Significance of Canonical Correlations
- A more general test of significance is to test
that first k canonical correlations are
nonzero whereas the last sk canonical
correlations are zero. The null hypothesis is - The test statistic for testing this hypothesis is
38Measures of Association
- Certain measures of association are available in
Canonical Correlation Analysis to decide about
the model fit. Some are given - Generalized Measure of Association
(Rozeboom1965) - Generalized Coefficient of Determination
(Yanai1974)
39Proportion of Variation in Canonical Correlation
Analysis
- The Correlation between actual variates and
canonical variate is given as - Proportion of Variation of a variable explained
by the canonical variate is
40A Glimpse of STATISTICA
41Example2
- Data from 100 individuals was collected on 10
dimensions to see the satisfaction level. A
portion of data is given below. We will see the
relationship is two sets of variates.
42Specifying the Analysis
43Specifying the Analysis
44The Output
45The Output
46The Output
47The Output
48The Canonical Variates
- The Canonical Variates for first set are
49The Canonical Variates
- The Canonical Variates for second set are
50The Structural Equation Models
- One of the most powerful techniques in
Statistical Analysis. - Deals with modeling of different types of
variables. - The variables may be discrete or continuous.
- Allows wide variety of variables that can be
included. - Allows the use of variables as well as factors as
dependent and independent variables.
51The Structural Equation Models
- Combination of Regression Analysis and
Exploratory Factor Analysis. - Some other names of the technique are Causal
Models, Simultaneous Equation Models, Path
Analysis, Confirmatory Factor Analysis, Latent
Variables Modeling. - In fact all the alternative names are special
cases of this technique.
52Terminology of Structural Equation Models
- Latent Variables
- The unobserved variables or factors in the
analysis, either dependent or independent. - Manifest Variables
- The observed variables in the analysis, either
dependent or independent. - Path Diagram
- The diagrammatic presentation of Structural
Equation Model.
53Notions of Path Diagram
- The Manifest variables are represented by the
squares or rectangles. - The Latent variables are represented by the
circles or ovals. - The relationships between variables are
represented by single sided arrows. - Direction of the arrow shows the direction of the
relationship. - Double sided relationships are represented by
double sided arrows.
54Some Rules of Path Diagram
- All the dependent (endogenous) variables have
arrows that are directing to them. - All the independent (exogenous) variables have
their variances and covariances represented
explicitly or implicitly. If variances and
covariances are not represented explicitly then - For latent variables, variances not explicitly
represented in the diagram are assumed to be 1.0,
and covariances not explicitly represented are
assumed to be 0. - For manifest variables, variances and covariances
not explicitly represented are assumed to be free
parameters.
55A Simple Path Diagram
56Two Special Cases of Structural Equation Models
- Two special cases of structural equation models
are widely used on the basis of their
applicability. These are - Confirmatory Factor Analysis
- Path Analysis
57Confirmatory Factor Analysis
- It is like Exploratory Factor Analysis but with
the exception that number of factors are
specified in advance. - The model contain one latent and one manifest
variable. - Uses the same model as used by exploratory factor
analysis. - The confirmatory factor analysis model is
generally under identified.
58The Path Analysis
- Widely used in Economics.
- All the variables included in the analysis are
manifest. - Also known as the Simultaneous Equation Models.
- The underlying model of the analysis is
59Theoretical Framework
- The General Structural Equation Model is
60Estimation of Model Parameter
- Estimation of parameters in Structural Equation
Models is not easy. - The Maximum Likelihood or Generalized Least
Squares Method can be used for estimation
purpose. - One very important thing in parameter estimation
is to decide whether parameters are estimable or
not. - If parameters are estimable then model is exactly
or over identified.
61Identification of the Model
- A simple rule for identification of the model
parameters and hence for identification of the
structural equation model is given below - Any parameter of the structural equation model
that can be represented as a function of one or
more elements of the variancecovariance matrix
of the structural model is identifiable. If all
parameters are identifiable then the model is
identified.
62General Use of Structural Equation Models
- The general use of Structural Equation Models is
to test a theory, postulated for a given
framework. The theory is tested by using the
estimate of population covariance matrix. - A theory is said to be acceptable if its
generated estimate of the covariance matrix is
most consistent with the population covariance
matrix.
63Testing Adequacy of the Model
- The adequacy of the Structural Equations Model
can be tested by using the ChiSquare statistic
that is based upon the minimum of the residual
function when convergence is achieved. - An insignificant result of this test indicates
that the model fits the data reasonably well and
hence is adequate.
64Some Useful Measures
- Normed Fit Index (Bentler Bonett1980)
-
- Value of greater than 0.9 indicates good fit.
This index may underestimate the fit of a good
fitting model. - NonNormed Fit Index
- This index may go outside the (01) range. In
small samples this may be too small.
65Some Useful Measures
- Incremental Fit Index (Bollen1989)
-
- This index is less variable as compared with the
NonNormed Fit Index. - Comparative Fit Index (Bentler1988)
66Some Useful Measures
- Absolute Fit Index (McDonald Marsh1990)
-
- This index depends only on the model under
study. - Goodness of Fit Indices (Bentler1983)
- This index is similar to the R2 of Regression
Analysis.
67Some Useful Measures
- Parsimony Fit Index (Mulaik et al.1989)
-
- Depends upon number of estimated parameters.
- Akaike Information Criterion (Akaike1987)
- Small values of these indices indicates good
fit. - Root Mean Square Residual
68ExampleConfirmatory Factor Analysis
- In a study by Jöreskog and Lawley (1968), nine
psychological tests were administered to 72
students of seventh and eight grade. The
Correlation Matrix of the scores is given. We
will run a Confirmatory Factor Analysis of the
data.
69Path Diagram of the Model
70Specifying the Analysis
71The Output
72The Output
73ExampleStructural Model
- Jöreskog and Sörbom (1982) used following data
on Home Environment and Mathematics achievement.
Following correlation matrix was used. We will
fit structural model on this data.
74Path Diagram of the Model
75Specifying the Analysis
76The Output
77The Output
78