Multivariate Analysis - PowerPoint PPT Presentation

Provided by: muhammadqa


1
Multivariate Analysis

Muhammad Qaiser Shahbaz
Department of Statistics
GC University, Lahore

2
Multivariate Analysis

Multivariate Analysis is the simultaneous study of
several dependent random variables.
These analyses are direct generalizations of
univariate analyses.
Certain distributional assumptions are required
for proper analysis.
The mathematical framework is relatively complex
compared with univariate analysis.
These analyses are widely used around the
world.

3
Some Multivariate Distributions

The Multivariate Normal Distribution
Generalization of the univariate Normal Distribution
The Wishart Distribution
Generalization of the Chi-Square Distribution
Hotelling's T² Statistic and Distribution
Generalization of the square of Student's t statistic
and distribution
Wilks' Lambda Statistic
Generalization of the ratio of two Chi-Square
statistics

4
Some Multivariate Measures

The Mean Vector
Collection of the means of the variables under
study
The Covariance Matrix
Collection of the Variances and Covariances of
the variables under study
The Correlation Matrix
Collection of the Correlation Coefficients of the
variables under study
The Generalized Variance
Determinant of the Covariance Matrix
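
The four measures above can be sketched numerically. This is a minimal illustration, assuming NumPy is available; the data matrix `X` (rows are observations, columns are variables) is made up for the example and does not come from the presentation.

```python
import numpy as np

# Hypothetical data: 5 observations on 3 variables (illustrative values only).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.0, 2.0],
    [4.0, 7.0, 2.0],
    [5.0, 8.0, 4.0],
    [6.0, 11.0, 3.0],
])

mean_vector = X.mean(axis=0)                      # means of the variables
cov_matrix = np.cov(X, rowvar=False)              # variances and covariances (divisor n - 1)
corr_matrix = np.corrcoef(X, rowvar=False)        # correlation coefficients
generalized_variance = np.linalg.det(cov_matrix)  # determinant of the covariance matrix
```

The generalized variance summarizes the whole covariance matrix in one number, so two quite different covariance structures can share the same value.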

5
Some Multivariate Tests of Significance

Testing significance of a single mean vector
Testing equality of two mean vectors
Testing equality of several mean vectors
Testing significance of a single covariance
matrix
Testing equality of two covariance matrices
Testing equality of several covariance matrices
Testing independence of sets of variates
Testing independence of variates

6
Some Multivariate Techniques

Hotelling's T² Statistic
The Multivariate Analysis of Variance and
Covariance
The Multivariate Experimental Designs
The Multivariate Profile Analysis
The Multivariate Regression Analysis
The Generalized Multivariate Analysis of Variance
The Principal Component Analysis
The Factor Analysis

7
Some Multivariate Techniques

The Canonical Correlation Analysis
The Discriminant Analysis
The Cluster Analysis
The Multidimensional Scaling
The Correspondence Analysis
The Classification Trees
The Path Analysis
The Structural Equation Models
The Seemingly Unrelated Regression Models

8
The Factor Analysis

Deals with the grouping of like variables in
sets.
Sets are formed in decreasing order of
importance.
Sets are relatively independent of each other.
Two types are commonly used:
The Exploratory Factor Analysis
The Confirmatory Factor Analysis
One of the most commonly used techniques in the
social and psychological sciences

9
The Exploratory Factor Analysis

This technique deals with exploring the structure
of the data.
The variables under study are treated as
equally important.
Variables are grouped together on the basis of
their closeness.
Groups are generally formed so that they are
orthogonal to each other but this assumption can
be relaxed.
This technique aims to explain the covariances
among the variables.

10
Exploratory Factor Analysis

Two major types of exploratory factor analysis
are available based upon the underlying
assumptions.
The Orthogonal Factor Analysis
Based upon the assumption that the established
factors are orthogonal, so they cannot be
factorized further.
The Oblique Factor Analysis
Based upon the assumption that the established
factors are not orthogonal and so can be
factorized further.

11
Orthogonal Factor Model

Based upon a model for each of the underlying
variables.
The Factor Analysis model is
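
The equation on this slide was an image and is missing from the transcript. In the usual textbook notation (an assumption here, since the author's own symbols are not recoverable), the orthogonal factor model for the ith of p variables with m common factors is commonly written as:

```latex
X_i - \mu_i = \ell_{i1} F_1 + \ell_{i2} F_2 + \cdots + \ell_{im} F_m + \varepsilon_i,
\qquad i = 1, 2, \ldots, p
```

Here $\mu_i$ is the mean of the ith variable, $\ell_{ij}$ is the loading of the ith variable on the jth factor, $F_1, \ldots, F_m$ are the unobserved common factors, and $\varepsilon_i$ is the specific factor of the ith variable.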

12
Structure of Covariance Matrix

The model for all the variables is
The Covariance Matrix is decomposed as
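
Both equations on this slide are missing from the transcript. A standard way of writing them, with assumed symbols ($\mathbf{L}$ for the $p \times m$ loading matrix, $\mathbf{F}$ for the common factors, $\boldsymbol{\Psi}$ for the diagonal matrix of specific variances), is:

```latex
\mathbf{X} - \boldsymbol{\mu} = \mathbf{L}\,\mathbf{F} + \boldsymbol{\varepsilon},
\qquad
\boldsymbol{\Sigma} = \operatorname{Cov}(\mathbf{X}) = \mathbf{L}\mathbf{L}' + \boldsymbol{\Psi}
```

The decomposition relies on the usual orthogonal-model assumptions: $E(\mathbf{F}) = \mathbf{0}$, $\operatorname{Cov}(\mathbf{F}) = \mathbf{I}$, $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Psi}$ diagonal, and $\mathbf{F}$ uncorrelated with $\boldsymbol{\varepsilon}$.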

13
Some Measures in Factor Analysis

The Factor Analysis Model is
The loading of the ith variable on the jth
factor measures the degree of dependence of
the variable on the factor.
The ith communality, which measures the portion
of the variation of the ith variable explained by
the common factors, is given as
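
The formula itself is missing from the transcript. Writing $\ell_{ij}$ for the loading of the ith variable on the jth of m common factors (assumed notation), the communality is usually defined as the sum of squared loadings in the ith row of the loading matrix:

```latex
h_i^2 = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{im}^2 = \sum_{j=1}^{m} \ell_{ij}^2
```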

14
Structure of Variance and Covariance

The Factor Analysis Model is
Variance of ith variable is given as
The Covariance between ith and kth variable is
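
The two expressions on this slide are missing from the transcript. Under the orthogonal factor model in its usual form, the variance of the ith variable is its communality plus its specific variance, and the covariance between the ith and kth variables is the sum over factors of the products of their loadings. The sketch below checks both statements numerically; the loading matrix is hypothetical and the variables are assumed to be standardized.

```python
import numpy as np

# Hypothetical loadings of p = 4 variables on m = 2 factors (illustrative only).
L = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.1, 0.6],
])

communalities = (L ** 2).sum(axis=1)   # h_i^2: row sums of squared loadings
psi = 1.0 - communalities              # specific variances for standardized variables
Sigma = L @ L.T + np.diag(psi)         # covariance matrix implied by the model

var_1 = communalities[0] + psi[0]      # Var(X_1) = h_1^2 + psi_1
cov_13 = np.dot(L[0], L[2])            # Cov(X_1, X_3) = sum over j of l_1j * l_3j
```

The specific variances affect only the diagonal of the implied matrix, so the off-diagonal covariances are determined entirely by the loadings.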

15
Extraction Methods

Principal Component Extraction
Extract the Factors to Maximize the Variance
Maximum Likelihood Extraction
Maximize the probability of observing the
underlying correlation matrix
Generalized Least Squares Extraction
Minimize the difference between the off-diagonal
elements of the observed and reproduced
correlation matrices
Principal Axis Factoring
Uses estimated communalities instead of 1s on
the diagonal of the correlation matrix
Alpha Factoring
Used to obtain the consistent factors in repeated
samples
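
As an illustration of the first method in the list, principal component extraction can be sketched with an eigendecomposition of the correlation matrix: the loadings of the first m factors are the leading eigenvectors scaled by the square roots of their eigenvalues. This is a minimal sketch assuming NumPy; the function name and the correlation matrix are hypothetical.

```python
import numpy as np

def principal_component_loadings(R, m):
    """Loadings of the first m factors extracted from a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh handles symmetric matrices
    order = np.argsort(eigvals)[::-1][:m]       # indices of the m largest eigenvalues
    return eigvecs[:, order] * np.sqrt(eigvals[order])

# Hypothetical 3 x 3 correlation matrix (illustrative only).
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

L = principal_component_loadings(R, m=1)
communalities = (L ** 2).sum(axis=1)   # variance of each variable explained by one factor
residual = R - L @ L.T                 # part of R that one factor does not reproduce
```

Retaining all p components reproduces the correlation matrix exactly; the other extraction methods differ in what they put on the diagonal or in the criterion they optimize.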

16
Factor Rotation

Rotation is done to simplify the solution of
factor analysis.
Interpretations can be easily done from rotated
solution.
Two types of rotation are available:
Orthogonal Rotation: factors formed are
orthogonal
Oblique Rotation: factors formed are correlated

17
Orthogonal Rotations

Rotated factors are orthogonal.
Several rotation methods are available, differing
in the criterion they optimize. The most common are:
Varimax Rotation (Kaiser, 1958)
Minimize complexity of factors by maximizing the
variance of the loadings on each factor.
Quartimax Rotation (Mulaik, 1972)
Minimize complexity of variables by maximizing the
variance of the loadings on each variable.
Equamax Rotation (Harman, 1976)
Simplify both variables and factors. Compromise
between Varimax and Quartimax.
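
The rotations above belong to one family that differs only in a weight parameter. The sketch below is a commonly used SVD-based iteration for this family, not necessarily the algorithm used by the software shown in the presentation; it assumes NumPy, and the function name, the parameter `gamma` and the loadings are hypothetical. Setting `gamma = 1` gives the varimax criterion and `gamma = 0` gives quartimax.

```python
import numpy as np

def orthomax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a p x m loading matrix L.

    gamma = 1 corresponds to varimax, gamma = 0 to quartimax.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, m = L.shape
    T = np.eye(m)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ T
        column_ss = np.diag((rotated ** 2).sum(axis=0))   # column sums of squared loadings
        target = rotated ** 3 - (gamma / p) * rotated @ column_ss
        u, s, vt = np.linalg.svd(L.T @ target)
        T = u @ vt                                        # closest orthogonal matrix
        if s.sum() < previous * (1.0 + tol):              # stop when no further improvement
            break
        previous = s.sum()
    return L @ T, T

# Hypothetical unrotated loadings of 4 variables on 2 factors (illustrative only).
L = np.array([
    [0.7, 0.5],
    [0.6, 0.5],
    [0.6, -0.5],
    [0.5, -0.6],
])
rotated, T = orthomax(L, gamma=1.0)
```

Because the rotation matrix is orthogonal, the communalities (row sums of squared loadings) are the same before and after rotation; only the distribution of variance across factors changes.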

18
Oblique Rotation

Rotated factors are correlated.
Several methods are available depending upon the
permitted amount of correlation in rotated
factors.
Direct Oblimin Rotation
Simplify factors by minimizing cross products of
loadings. Depends upon a user-supplied value that
controls the permitted correlation among factors.
Promax Rotation
Orthogonal factors rotated to oblique position.
Orthogonal factor loadings are raised to a
positive power.

19
Test for Model Fit

The goodness of fit of the factor analysis model
can be tested by using the Chi-Square statistic.
The null hypothesis to be tested is
The test statistic for this test was
developed by Lawley (1940) and Bartlett (1954).
The Measure of Sampling Adequacy, given by Kaiser
and Rice (1974), provides some evidence
about model fit.
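
The hypothesis and statistic are missing from the transcript. In the standard large-sample treatment of the m-factor model for p variables (assumed notation: $\hat{\mathbf{L}}$ and $\hat{\boldsymbol{\Psi}}$ are maximum likelihood estimates, $\mathbf{S}_n$ is the sample covariance matrix with divisor n), the test with Bartlett's correction is usually written as:

```latex
H_0 : \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}' + \boldsymbol{\Psi}
\qquad
\left[ n - 1 - \frac{2p + 4m + 5}{6} \right]
\ln \frac{\left| \hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}} \right|}{\left| \mathbf{S}_n \right|}
\;\sim\; \chi^2_{\nu} \ \text{(approximately)},
\qquad
\nu = \tfrac{1}{2}\left[ (p - m)^2 - p - m \right]
```

A large value of the statistic relative to the Chi-Square distribution suggests that m factors are not enough to reproduce the covariance matrix.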

20
The Factor Scores

The values of factors for given values of
variables are Factor Scores.
Various methods are available for factor scores
Bartlett's Method (Bartlett, 1938)
The Regression Method (Thomson, 1934)

21
Example 1

Data were collected from 100 individuals on 10
dimensions to assess satisfaction level. A
portion of the data is given below.

22
Specifying the Analysis
23
Specifying the Analysis
24
Specifying the Analysis
25
The Output
26
The Output
27
The Output
28
The Output
29
The Output
30
The Output
31
Canonical Correlation Analysis

Deals with the study of the relationship between
two sets of variates.
The goal is to find linear combinations of the
variables that are maximally correlated with each
other.
It is an extension of Principal Component
Analysis.
The linear combinations of variables are obtained
under certain constraints.
The sets of variates can be treated as dependent
and independent sets.

32
Canonical Correlation Analysis

The primary purpose is to find pairs of
linear combinations of variables so that they are
highly correlated.
As many pairs are obtained as there are variables
in the set with the smaller number of variables.
The pairs are obtained so that their
correlations are in decreasing order.
Can be used to test the independence of sets of
variates under Multivariate Normality.
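
The points above can be sketched numerically. One common way to compute sample canonical correlations (not necessarily what the software in the presentation does) is to whiten each set with a Cholesky factor of its covariance matrix and take the singular values of the whitened cross-covariance. The sketch assumes NumPy; the function name and the simulated data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and the columns of Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    S11 = Xc.T @ Xc / (n - 1)          # covariance matrix of the first set
    S22 = Yc.T @ Yc / (n - 1)          # covariance matrix of the second set
    S12 = Xc.T @ Yc / (n - 1)          # cross-covariance between the sets
    C1 = np.linalg.cholesky(S11)       # S11 = C1 C1'
    C2 = np.linalg.cholesky(S22)       # S22 = C2 C2'
    M = np.linalg.solve(C1, S12) @ np.linalg.inv(C2).T
    return np.linalg.svd(M, compute_uv=False)   # singular values, largest first

# Simulated data: 100 cases, a first set of 3 variates and a second set of 2 variates.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X[:, :2] + rng.normal(size=(100, 2))
r = canonical_correlations(X, Y)
```

The result has as many entries as the smaller set has variables, and the entries come out in decreasing order, matching the description on the slide.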

33
Canonical Correlation Analysis

The aim in canonical correlation analysis is to
obtain the maximum correlation between sets of
variates called Canonical Correlation.
Another aim is to obtain the vectors of
coefficients to obtain the linear combination of
variables called Canonical Variates.
Predictive Validity of Multivariate Regression
can also be judged using canonical correlation
analysis.

34
Theoretical Framework

The Canonical Correlation Analysis is based upon
the Joint Covariance (Joint Correlation) matrix
of two sets of variates.
The joint covariance matrix of two sets of
variates containing p and q variates is given
as
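
The matrix itself is missing from the transcript. With the first set containing p variates and the second q variates, the joint covariance matrix is conventionally partitioned as follows (the subscript notation is an assumption):

```latex
\boldsymbol{\Sigma} =
\begin{pmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\
\boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}
\end{pmatrix},
\qquad
\boldsymbol{\Sigma}_{11} : p \times p, \quad
\boldsymbol{\Sigma}_{22} : q \times q, \quad
\boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21}' : p \times q
```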

35
Theoretical Framework

The Canonical Correlations are obtained by
solving the determinantal equation
The ith pair of Canonical Variates is given as
The coefficient vectors are obtained by solving

36
Testing Significance of Canonical Correlations

Several tests of significance can be carried out
using canonical correlation analysis.
A test of overall independence of two sets of
variates can be based upon testing the
hypothesis that all the Canonical Correlations
are simultaneously zero. This test was developed
by Wilks (1932). The null hypothesis here is
Testing this hypothesis is equivalent to testing
the Significance of the Regression Matrix in
Multivariate Regression.
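
The null hypothesis is missing from the transcript; it is usually stated as all s canonical correlations being zero, which is the same as a zero cross-covariance matrix between the sets. A minimal sketch of the test follows, assuming the statistic is Wilks' Lambda with Bartlett's large-sample Chi-Square approximation (the slide does not show which form the author used). The function name and the input values are hypothetical.

```python
import math

def wilks_lambda_test(canonical_corrs, n, p, q):
    """Wilks' Lambda and Bartlett's chi-square approximation.

    Tests H0: all canonical correlations between a set of p variates
    and a set of q variates are zero, from n observations.
    """
    wilks_lambda = math.prod(1.0 - r ** 2 for r in canonical_corrs)
    chi_square = -(n - 1 - (p + q + 1) / 2.0) * math.log(wilks_lambda)
    degrees_of_freedom = p * q
    return wilks_lambda, chi_square, degrees_of_freedom

# Hypothetical sample canonical correlations from n = 100 cases, p = 3, q = 2.
lam, chi2, df = wilks_lambda_test([0.5, 0.2], n=100, p=3, q=2)
```

The returned statistic is compared with a Chi-Square distribution on `p * q` degrees of freedom; small values of Lambda (large values of the statistic) count against independence.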

37
Testing Significance of Canonical Correlations

A more general test of significance is to test
that the first k canonical correlations are
nonzero whereas the last s - k canonical
correlations are zero. The null hypothesis is
The test statistic for testing this hypothesis is
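
Both the hypothesis and the statistic are missing from the transcript. With $s = \min(p, q)$ canonical correlations, sample values $r_i$ and n observations, the standard sequential form with Bartlett's approximation (an assumption about what the slide showed) is:

```latex
\begin{aligned}
&H_0^{(k)} : \rho_1 \neq 0, \ldots, \rho_k \neq 0, \quad
  \rho_{k+1} = \cdots = \rho_s = 0 \\
&-\left[ n - 1 - \tfrac{1}{2}(p + q + 1) \right]
  \ln \prod_{i=k+1}^{s} \left( 1 - r_i^2 \right)
  \;\sim\; \chi^2_{(p-k)(q-k)} \ \text{(approximately)}
\end{aligned}
```

Setting $k = 0$ recovers the overall test that all canonical correlations are zero.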

38
Measures of Association

Certain measures of association are available in
Canonical Correlation Analysis to assess
the model fit. Some are given below:
Generalized Measure of Association
(Rozeboom, 1965)
Generalized Coefficient of Determination
(Yanai, 1974)

39
Proportion of Variation in Canonical Correlation
Analysis

The Correlation between actual variates and
canonical variate is given as
Proportion of Variation of a variable explained
by the canonical variate is

40
A Glimpse of STATISTICA
41
Example 2

Data were collected from 100 individuals on 10
dimensions to assess satisfaction level. A
portion of the data is given below. We will examine
the relationship between two sets of variates.

42
Specifying the Analysis
43
Specifying the Analysis
44
The Output
45
The Output
46
The Output
47
The Output
48
The Canonical Variates

The Canonical Variates for first set are

49
The Canonical Variates

The Canonical Variates for second set are

50
The Structural Equation Models

One of the most powerful techniques in
Statistical Analysis.
Deals with modeling of different types of
variables.
The variables may be discrete or continuous.
Allows a wide variety of variables to be
included.
Allows the use of variables as well as factors as
dependent and independent variables.

51
The Structural Equation Models

Combination of Regression Analysis and
Exploratory Factor Analysis.
Some other names of the technique are Causal
Models, Simultaneous Equation Models, Path
Analysis, Confirmatory Factor Analysis, Latent
Variables Modeling.
In fact, the techniques behind these alternative
names are all special cases of this technique.

52
Terminology of Structural Equation Models

Latent Variables
The unobserved variables or factors in the
analysis, either dependent or independent.
Manifest Variables
The observed variables in the analysis, either
dependent or independent.
Path Diagram
The diagrammatic presentation of Structural
Equation Model.

53
Notation of the Path Diagram

The Manifest variables are represented by
squares or rectangles.
The Latent variables are represented by
circles or ovals.
The relationships between variables are
represented by single-headed arrows.
The direction of the arrow shows the direction of
the relationship.
Two-way relationships are represented by
double-headed arrows.

54
Some Rules of Path Diagram

All the dependent (endogenous) variables have
arrows pointing to them.
All the independent (exogenous) variables have
their variances and covariances represented
explicitly or implicitly. If variances and
covariances are not represented explicitly, then:
For latent variables, variances not explicitly
represented in the diagram are assumed to be 1.0,
and covariances not explicitly represented are
assumed to be 0.
For manifest variables, variances and covariances
not explicitly represented are assumed to be free
parameters.

55
A Simple Path Diagram
56
Two Special Cases of Structural Equation Models

Two special cases of structural equation models
are widely used on the basis of their
applicability. These are
Confirmatory Factor Analysis
Path Analysis

57
Confirmatory Factor Analysis

It is like Exploratory Factor Analysis, except
that the number of factors is
specified in advance.
The model contains both latent and manifest
variables.
Uses the same model as exploratory factor
analysis.
The confirmatory factor analysis model is
generally under-identified unless restrictions
are imposed on its parameters.

58
The Path Analysis

Widely used in Economics.
All the variables included in the analysis are
manifest.
Also known as Simultaneous Equation Models.
The underlying model of the analysis is

59
Theoretical Framework

The General Structural Equation Model is
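
The model equations are missing from the transcript, and the notation the author used cannot be recovered. One widely used formulation, given here only as an assumed illustration, writes the model in three parts: a structural part relating the latent variables, and two measurement parts linking latent to manifest variables.

```latex
\begin{aligned}
\boldsymbol{\eta} &= \mathbf{B}\boldsymbol{\eta} + \boldsymbol{\Gamma}\boldsymbol{\xi} + \boldsymbol{\zeta} \\
\mathbf{y} &= \boldsymbol{\Lambda}_y \boldsymbol{\eta} + \boldsymbol{\varepsilon} \\
\mathbf{x} &= \boldsymbol{\Lambda}_x \boldsymbol{\xi} + \boldsymbol{\delta}
\end{aligned}
```

Here $\boldsymbol{\eta}$ and $\boldsymbol{\xi}$ are the dependent and independent latent variables, $\mathbf{y}$ and $\mathbf{x}$ are their manifest indicators, and $\boldsymbol{\zeta}$, $\boldsymbol{\varepsilon}$, $\boldsymbol{\delta}$ are error terms. Confirmatory factor analysis keeps only a measurement part, while path analysis keeps only a structural part written in manifest variables.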

60
Estimation of Model Parameters

Estimation of parameters in Structural Equation
Models is not easy.
The Maximum Likelihood or Generalized Least
Squares Method can be used for estimation
purposes.
One very important step in parameter estimation
is to decide whether the parameters are estimable
or not.
If the parameters are estimable, then the model is
exactly identified or over-identified.

61
Identification of the Model

A simple rule for identification of the model
parameters and hence for identification of the
structural equation model is given below
Any parameter of the structural equation model
that can be represented as a function of one or
more elements of the variance-covariance matrix
of the structural model is identifiable. If all
parameters are identifiable then the model is
identified.
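
A quick preliminary check that is often applied before the rule above is the counting rule: a model cannot be identified if it has more free parameters than there are distinct variances and covariances. This is a necessary condition only, not the rule stated on the slide, and the function name and parameter counts below are hypothetical.

```python
def model_degrees_of_freedom(num_observed, num_free_parameters):
    """Distinct variances and covariances minus the number of free parameters."""
    distinct_moments = num_observed * (num_observed + 1) // 2
    return distinct_moments - num_free_parameters

# Hypothetical model with 9 observed variables and 21 free parameters.
df = model_degrees_of_freedom(9, 21)
```

A negative result means the model is under-identified; zero is consistent with exact identification and a positive value with over-identification, provided every parameter is also identifiable in the sense described above.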

62
General Use of Structural Equation Models

The general use of Structural Equation Models is
to test a theory, postulated for a given
framework. The theory is tested by using the
estimate of population covariance matrix.
A theory is said to be acceptable if its
generated estimate of the covariance matrix is
most consistent with the population covariance
matrix.

63
Testing Adequacy of the Model

The adequacy of the Structural Equation Model
can be tested by using the Chi-Square statistic
that is based upon the minimum of the residual
function when convergence is achieved.
A non-significant result of this test indicates
that the model fits the data reasonably well and
hence is adequate.
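
The slide does not name the function being minimized. Assuming maximum likelihood estimation, the usual choice is the ML discrepancy between the sample covariance matrix S and the model-implied matrix, and the Chi-Square statistic is (n - 1) times its minimum. A minimal sketch with NumPy follows; the function names and matrices are hypothetical, and the implied matrix is simply supplied rather than obtained by fitting a model.

```python
import numpy as np

def ml_discrepancy(S, Sigma):
    """ML discrepancy between a sample matrix S and a model-implied matrix Sigma."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

def chi_square_statistic(S, Sigma, n):
    """Chi-square statistic: (n - 1) times the discrepancy at the fitted Sigma."""
    return (n - 1) * ml_discrepancy(S, Sigma)

# Hypothetical sample correlation matrix and an independence model (identity matrix).
S = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
Sigma_independence = np.eye(3)
chi2 = chi_square_statistic(S, Sigma_independence, n=100)
```

The discrepancy is zero when the implied matrix reproduces S exactly and grows as the two matrices diverge, which is why a small, non-significant statistic is read as adequate fit.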

64
Some Useful Measures

Normed Fit Index (Bentler & Bonett, 1980)
A value greater than 0.9 indicates good fit.
This index may underestimate the fit of a
well-fitting model.
Non-Normed Fit Index
This index may go outside the (0, 1) range. In
small samples this may be too small.

65
Some Useful Measures

Incremental Fit Index (Bollen, 1989)
This index is less variable than the
Non-Normed Fit Index.
Comparative Fit Index (Bentler, 1988)
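
The Normed, Non-Normed, Incremental and Comparative Fit Indices all compare the Chi-Square of the fitted model with that of a baseline (independence) model. The sketch below uses their commonly cited definitions; the slides do not show the formulas, so treat these as assumed, and the function name and input values are hypothetical.

```python
def incremental_fit_indices(chi2_model, df_model, chi2_baseline, df_baseline):
    """Fit indices comparing a fitted model with a baseline (independence) model."""
    nfi = (chi2_baseline - chi2_model) / chi2_baseline
    ratio_baseline = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    nnfi = (ratio_baseline - ratio_model) / (ratio_baseline - 1.0)
    ifi = (chi2_baseline - chi2_model) / (chi2_baseline - df_model)
    # Noncentrality estimates, truncated at zero, drive the comparative fit index.
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    cfi = 1.0 - d_model / d_baseline if d_baseline > 0 else 1.0
    return {"NFI": nfi, "NNFI": nnfi, "IFI": ifi, "CFI": cfi}

# Hypothetical chi-square values for a fitted model and its baseline model.
indices = incremental_fit_indices(
    chi2_model=30.0, df_model=24, chi2_baseline=400.0, df_baseline=36
)
```

The Non-Normed index works with Chi-Square per degree of freedom, which is why it can fall outside the (0, 1) range, while the truncation in the Comparative index keeps it within that range.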

66
Some Useful Measures

Absolute Fit Index (McDonald & Marsh, 1990)
This index depends only on the model under
study.
Goodness of Fit Indices (Bentler, 1983)
This index is similar to the R² of Regression
Analysis.

67
Some Useful Measures

Parsimony Fit Index (Mulaik et al., 1989)
Depends upon the number of estimated parameters.
Akaike Information Criterion (Akaike, 1987)
Small values of this criterion indicate good
fit.
Root Mean Square Residual

68
Example: Confirmatory Factor Analysis

In a study by Jöreskog and Lawley (1968), nine
psychological tests were administered to 72
students of the seventh and eighth grades. The
Correlation Matrix of the scores is given. We
will run a Confirmatory Factor Analysis of the
data.

69
Path Diagram of the Model
70
Specifying the Analysis
71
The Output
72
The Output
73
Example: Structural Model

Jöreskog and Sörbom (1982) used the following data
on Home Environment and Mathematics achievement.
The following correlation matrix was used. We will
fit a structural model to these data.

74
Path Diagram of the Model
75
Specifying the Analysis
76
The Output
77
The Output
78