Data Analysis with SPSS: Introducing Exploratory Factor Analysis

1
Data Analysis with SPSS: Introducing Exploratory Factor Analysis
  • Bidin Yatim
  • PhD (2005, Exeter)
  • MSc (1984, Aston)
  • BSc (1982, Nottingham)
  • Department of Statistics
  • Faculty of Quantitative Sciences

Topic 9
2
Exploratory Factor Analysis: Introduction
  • Factor analysis attempts to identify underlying
    variables, or factors, that explain the pattern
    of correlations within a set of observed
    variables. Factor analysis is often used in data
    reduction to identify a small number of factors
    that explain most of the variance observed in a
    much larger number of manifest variables.
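One way to make this concrete is to write the model out. A standard statement of the common factor model for standardized variables (the notation here is mine, not the slide's):

```latex
% Common factor model: each of p observed variables x_i is a linear
% combination of m < p common factors F_j plus a unique factor u_i.
x_i = \lambda_{i1} F_1 + \lambda_{i2} F_2 + \dots + \lambda_{im} F_m + u_i ,
\qquad i = 1, \dots, p
% The \lambda_{ij} are the factor loadings; the u_i are assumed
% uncorrelated with each other and with the common factors.
```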

3
Exploratory Factor Analysis: Introduction
  • A statistical technique for dealing with interdependencies among
    multiple variables, i.e., when variables are interrelated without
    designating some as dependent and others as independent.
  • Many variables are reduced (grouped) into a smaller number of
    factors (a dimension-reduction method).
  • It accomplishes much the same objective as PCA/MDS.

4
The factor analysis procedure offers a high degree of flexibility:
  • Seven methods of factor extraction.
  • Five methods of rotation.
  • Three methods of computing factor scores; scores can be saved as
    variables for further analysis.

5
Alternative Methods of Factor Extraction
  • Principal Component Analysis
  • Maximum likelihood method
  • Principal axis
  • Image
  • Alpha
  • Generalized least squares
  • Unweighted least squares

6
Factor Analysis Extraction
  • Method: specifies the method of factor extraction.
  • Principal Components Analysis: forms uncorrelated linear
    combinations of the observed variables. The first component has
    maximum variance. Successive components explain progressively
    smaller portions of the variance and are all uncorrelated with each
    other. It is used to obtain the initial factor solution, and it can
    be used when a correlation matrix is singular. (A minimal sketch of
    this extraction step follows this list.)
  • Unweighted Least-Squares Method: minimizes the sum of the squared
    differences between the observed and reproduced correlation
    matrices, ignoring the diagonals.
  • Generalized Least-Squares Method: minimizes the sum of the squared
    differences between the observed and reproduced correlation
    matrices. Correlations are weighted by the inverse of their
    uniqueness, so that variables with high uniqueness are given less
    weight than those with low uniqueness.
  • Maximum-Likelihood Method: produces the parameter estimates that
    are most likely to have produced the observed correlation matrix if
    the sample is from a multivariate normal distribution. The
    correlations are weighted by the inverse of the uniqueness of the
    variables, and an iterative algorithm is employed.
  • Principal Axis Factoring: extracts factors from the original
    correlation matrix with squared multiple correlation coefficients
    placed in the diagonal as initial estimates of the communalities.
    These factor loadings are used to estimate new communalities that
    replace the old communality estimates in the diagonal. Iterations
    continue until the changes in the communalities from one iteration
    to the next satisfy the convergence criterion for extraction.
  • Alpha: considers the variables in the analysis to be a sample from
    the universe of potential variables. It maximizes the alpha
    reliability of the factors.
  • Image Factoring: developed by Guttman and based on image theory.
    The common part of a variable, called the partial image, is defined
    as its linear regression on the remaining variables, rather than as
    a function of hypothetical factors.
  • Analyze: specifies either a correlation matrix or a covariance
    matrix.
  • Extract: either retain all factors whose eigenvalues exceed a
    specified value or retain a specific number of factors.
  • Display: requests the unrotated factor solution and a scree plot of
    the eigenvalues.
  • Scree plot: a plot of the variance associated with each factor,
    used to determine how many factors should be kept. Typically the
    plot shows a distinct break between the steep slope of the large
    factors and the gradual trailing off of the rest (the scree).
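As referenced above, a minimal sketch of principal-component extraction from a correlation matrix, written in plain NumPy rather than SPSS (the function and variable names are my own):

```python
# Illustrative principal-component extraction -- not SPSS's own code.
import numpy as np

def pca_extract(R, n_factors):
    """Return the first n_factors unrotated loadings of correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loadings = eigenvectors scaled by sqrt(eigenvalue), so the squared
    # loadings of a component sum to the variance it explains.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return loadings, eigvals
```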

7
Factor Analysis Rotation
  • Method: allows you to select the method of factor rotation.
  • Orthogonal:
  • Varimax: minimizes the number of variables with high loadings on
    each factor. It simplifies the interpretation of the factors. (A
    short sketch of varimax follows this list.)
  • Quartimax Method: minimizes the number of factors needed to explain
    each variable. It simplifies the interpretation of the observed
    variables.
  • Equamax Method: a combination of the varimax method, which
    simplifies the factors, and the quartimax method, which simplifies
    the variables. The number of variables that load highly on a factor
    and the number of factors needed to explain a variable are both
    minimized.
  • Oblique (non-orthogonal):
  • Direct Oblimin Method: when delta equals 0 (the default), solutions
    are most oblique. As delta becomes more negative, the factors
    become less oblique. To override the default delta of 0, enter a
    number less than or equal to 0.8.
  • Promax Rotation: allows factors to be correlated. It can be
    calculated more quickly than a direct oblimin rotation, so it is
    useful for large datasets.
  • Display: allows you to include output on the rotated solution, as
    well as loading plots for the first two or three factors.
  • Factor Loading Plot: a three-dimensional plot of the loadings on
    the first three factors. For a two-factor solution, a
    two-dimensional plot is shown. The plot is not displayed if only
    one factor is extracted. Plots display rotated solutions if
    rotation is requested.
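As referenced above, a compact sketch of varimax rotation using the standard SVD-based algorithm; this illustrates the idea and is not SPSS's implementation:

```python
# Varimax rotation of a p x k loading matrix L (gamma=1 gives varimax).
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    p, k = L.shape
    T = np.eye(k)                      # accumulated orthogonal rotation
    obj = 0.0
    for _ in range(max_iter):
        Lr = L @ T                     # currently rotated loadings
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0))))
        T = u @ vt                     # update rotation toward the criterion
        if s.sum() < obj * (1 + tol):  # stop when the criterion plateaus
            break
        obj = s.sum()
    return L @ T
```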

8
Factor Analysis Scores
  • Save as variables Creates one new variable for
    each factor in the final solution using-
  • Bartlett Scores The scores produced have a mean
    of 0. The sum of squares of the unique factors
    over the range of variables is minimized.
  • Anderson-Rubin Method modification of the
    Bartlett method which ensures orthogonality of
    the estimated factors. The scores produced have a
    mean of 0, a standard deviation of 1, and are
    uncorrelated.
  • Display factor score coefficient matrix Shows
    the coefficients by which variables are
    multiplied to obtain factor scores. Also shows
    the correlations between factor scores.
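As referenced above, a minimal sketch of the Bartlett (weighted least squares) score estimator; `Z`, `L`, and `psi` are my own names for the standardized data, the loading matrix, and the uniquenesses:

```python
# Bartlett factor scores: F = (L' Psi^-1 L)^-1 L' Psi^-1 z for each case.
import numpy as np

def bartlett_scores(Z, L, psi):
    """Z: n x p standardized data, L: p x k loadings, psi: p uniquenesses."""
    W = L / psi[:, None]               # Psi^{-1} L (Psi is diagonal)
    B = np.linalg.solve(L.T @ W, W.T)  # k x p scoring coefficient matrix
    return Z @ B.T                     # n x k estimated factor scores
```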

9
Types of Factor Analysis
  • Exploratory
  • Confirmatory

10
Uses of Factor Analysis
  • Instrument Development
  • Theory Development
  • Data Reduction
  • Model Testing
  • Comparing Models

11
Example
  • What underlying attitudes lead people to
    respond to the questions on a political survey as
    they do?
  • Examining the correlations among the survey items reveals that
    there is significant overlap among various subgroups of items:
    questions about taxes tend to correlate with each other, questions
    about military issues correlate with each other, and so on.
  • With factor analysis, you can investigate the
    number of underlying factors and, in many cases,
    you can identify what the factors represent
    conceptually. Additionally, you can compute
    factor scores for each respondent, which can then
    be used in subsequent analyses. For example, you
    might build a logistic regression model to
    predict voting behavior based on factor scores.

12
Assumptions
  • Interval- or ratio-level data.
  • A bivariate normal distribution for each pair of variables;
    observations should be independent.
  • Linear relationships.
  • Substantial correlations among variables (can be tested using
    Bartlett's sphericity test).
  • Categorical data (such as religion or country of origin) are not
    suitable for factor analysis. Data for which Pearson correlation
    coefficients can sensibly be calculated should be suitable.

13
Assumptions
  • The factor analysis model specifies that variables are determined
    by common factors (the factors estimated by the model) and unique
    factors (which do not overlap between observed variables); the
    computed estimates are based on the assumption that all unique
    factors are uncorrelated with each other and with the common
    factors.

14
Sample Size
  • 10 subjects per variable; to some, the subject-to-variable ratio
    (STV) should be at least 5:1.
  • Every analysis should have 100 to 200 subjects.

15
Steps
  • Obtain the correlation matrix for the data.
  • Apply EFA.
  • Decide on the number of factors/components to be retained.
  • Interpret the factors/components, using rotation if necessary.
  • Obtain factor scores for further analysis.

16
Two concepts about variables crucial in
understanding EFA
  • Common factor
  • a hypothetical construct that affects at least
    two of our measurement variables
  • We want to estimate the common factors that
    contribute to the variance in our variables.
  • Unique variance
  • a factor that contributes to the variance in only one variable.
  • There is only one unique factor for each variable.
  • Unique factors are unrelated to one another and unrelated to the
    common factors.
  • We want to exclude these unique factors from our solution.

17
Two concepts about variables crucial in
understanding EFA
  • Communality (h²) = 1 − uniqueness: the sum over all factors of the
    squared factor loadings for a variable. It indicates the portion of
    the variance of the variable that is accounted for by the set of
    factors (i.e., the variance the variable has in common with the
    other variables in the analysis). Small values indicate a lack of
    shared variance. (A short computation follows this list.)
  • Uniqueness = specific variance + error variance: the portion of the
    total variance that is unrelated to the other variables.
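In code, both quantities fall straight out of the loading matrix; a sketch, where the argument is a p x k loadings array such as the one returned by the extraction sketch above:

```python
import numpy as np

def communality(loadings):
    """loadings: p x k factor loading matrix (e.g., from pca_extract above)."""
    h2 = (loadings**2).sum(axis=1)  # variance each variable shares with factors
    return h2, 1.0 - h2             # communality and uniqueness per variable
```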

18
Total Variance
  • Total variance = error variance + common variance + specific
    variance.
  • NOTE: If we have 10 original variables and the variables are
    standardized, total variance = 10.

19
Eigenvalue
  • Indicates the portion of the total variance of a
    correlation matrix that is explained by a factor
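A sketch of how the eigenvalues and their shares of variance can be computed from a correlation matrix `R` (the names are mine):

```python
# Eigenvalues of a p x p correlation matrix and the proportion of total
# standardized variance (which equals p) that each one explains.
import numpy as np

def eigen_shares(R):
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first
    return eigvals, eigvals / R.shape[0]            # value and proportion
```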

20
Iterated Principal Factors Analysis
  • The most common type of FA.
  • Also known as principal axis FA.
  • We eliminate the unique variance by replacing the 1s on the main
    diagonal of the correlation matrix with estimates of the
    communalities.
  • Initial estimate of a communality: the R² between one variable and
    all the others (the squared multiple correlation). (A sketch of the
    full iteration follows.)
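A minimal sketch of the whole iterated procedure described here and on the "Iterate!" slides below, in plain NumPy (not SPSS's implementation):

```python
# Iterated principal axis factoring on a correlation matrix R.
import numpy as np

def iterated_paf(R, n_factors, max_iter=100, tol=1e-6):
    # Initial communality estimates: squared multiple correlations,
    # computed as 1 - 1/diag(R^{-1}).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)               # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(Rr)
        order = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0))
        h2_new = (loadings**2).sum(axis=1)     # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:  # stop when changes are trivial
            break
        h2 = h2_new
    return loadings, h2_new
```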

21
Let's Do It: Analyze > Data Reduction > Factor > Extraction
  • Using the CerealFA data, change the extraction method to principal
    axis.

22
SPSS Factor Analysis: Options
  • Missing values:
  • Exclude cases listwise
  • Exclude cases pairwise
  • Replace with mean
  • Coefficient display format:
  • Sorted by size
  • Suppress absolute values less than .10

23
Correlation Matrix
  • Examine matrix
  • Correlations should be .30 or higher
  • Kaiser-Meyer-Olkin (KMO) Measure of Sampling
    Adequacy
  • Bartlett's Test of Sphericity

24
Correlation Matrix
  • Bartlett's Test of Sphericity
  • Tests the hypothesis that the correlation matrix is an identity
    matrix:
  • diagonals are ones,
  • off-diagonals are zeros.
  • A significant result indicates the matrix is not an identity
    matrix, and therefore EFA can be used. (A sketch of the test
    statistic follows.)
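A sketch of the usual chi-square approximation for Bartlett's test of sphericity; `R` is the correlation matrix and `n` the number of cases (names are mine):

```python
# Bartlett's test of sphericity: H0 says R is an identity matrix.
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, chi2.sf(stat, df)   # chi-square statistic and p-value
```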

25
Correlation Matrix
  • Kaiser-Meyer-Olkin (KMO):
  • a measure of sampling adequacy;
  • an index comparing the magnitudes of the observed correlation
    coefficients to the magnitudes of the partial correlation
    coefficients;
  • small values indicate that correlations between pairs of variables
    cannot be explained by the other variables. (A sketch of the
    computation follows.)
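A sketch of the overall KMO index, with the partial correlations obtained from the inverse of the correlation matrix (illustrative names, not SPSS code):

```python
# KMO = sum of squared correlations / (that sum + sum of squared partial
# correlations), taken over the off-diagonal entries.
import numpy as np

def kmo(R):
    p = R.shape[0]
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)  # matrix of partial correlations
    off = ~np.eye(p, dtype=bool)     # mask selecting off-diagonal entries
    r2 = (R[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)
```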

26
Kaiser-Meyer-Olkin (KMO)
  • Marvelous: .90s
  • Meritorious: .80s
  • Middling: .70s
  • Mediocre: .60s
  • Miserable: .50s
  • Unacceptable: below .50

27
Look at the KMO and Bartlett's test
  • Bartlett's test of sphericity is significant, i.e., the null
    hypothesis that the correlation matrix is an identity is rejected.
  • The Kaiser-Meyer-Olkin measure of sampling adequacy is > 0.8:
    meritorious.
  • Factor analysis is therefore appropriate.

28
Look at the Initial Communalities
  • They sum to
  • We have eliminated 25 units of unique
    variance.


29
Iterate!
  • Using the estimated communalities, obtain a
    solution.
  • Take the communalities from the first solution
    and insert them into the main diagonal of the
    correlation matrix.
  • Solve again.
  • Take communalities from this second solution and
    insert into correlation matrix.


30
Solve again
  • Repeat this, over and over, until the changes in
    communalities from one iteration to the next are
    trivial.
  • Our final communalities sum to .
  • After excluding units of unique variance, we
    have extracted units of common variance.
  • That is / 25 of the total variance in our
    25 variables.

31
How many factors to retain?
32
Criteria For Retention Of Factors
  • Eigenvalue greater than 1
  • (a single standardized variable has variance equal to 1)
  • Plot of total variance: the scree plot
  • The gradual trailing off of the variance accounted for is called
    the scree.
  • Note the cumulative % of variance of the rotated factors. (A sketch
    of the eigenvalue rule follows.)
33
We have packaged that 58.05% of the variance into 4 factors
34
Before rotation
35
Rotated factor loadings
36
Rotation produces:
  • Factor Pattern Matrix
  • high and low factor loadings are more apparent;
  • generally used for interpretation.
  • Factor Structure Matrix
  • the correlations between factors and variables.

37
Interpretation of Rotated Matrix
  • Look for loadings of .40 or higher.
  • Name each factor based on the 3 or 4 variables with the highest
    loadings.
  • Do not expect a perfect conceptual fit of all variables.

38
  • SPSS will not only give you the scoring coefficients, but will also
    compute the estimated factor scores for you.
  • In the Factor Analysis window, click Scores and select Save As
    Variables, Regression, and Display Factor Score Coefficient Matrix.
    (A sketch of the regression estimator follows.)
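A sketch of what the Regression scoring method computes (the Thurstone estimator); `Z` is the standardized data, `R` the correlation matrix, `L` the loadings, all names mine:

```python
# Regression-method factor scores: scoring coefficients B = R^{-1} L,
# then scores = Z B (L doubles as the structure matrix when the factors
# are orthogonal).
import numpy as np

def regression_scores(Z, R, L):
    B = np.linalg.solve(R, L)  # p x k factor score coefficient matrix
    return Z @ B               # n x k estimated factor scores
```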

39
Here are the scoring coefficients. Look back at the data sheet and you
will see the estimated factor scores.
40
Use the Factor Scores
  • in multiple regression;
  • in an independent-samples t test to compare groups on mean factor
    scores;
  • or even in ANOVA. (A t-test sketch follows.)
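For instance, a sketch of the t-test use, assuming `scores` is the n x k score array from the scoring sketches above and `group` a binary NumPy array of group labels (both names are hypothetical):

```python
# Compare two groups on their mean factor-1 scores.
from scipy import stats

# scores and group are assumed to be defined elsewhere (hypothetical names).
t_stat, p_value = stats.ttest_ind(scores[group == 0, 0],
                                  scores[group == 1, 0])
```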

41
Required Number of Subjects and Variables
  • Rules of thumb (not very useful):
  • 100 or more subjects.
  • At least 10 times as many subjects as you have variables.
  • As many subjects as you can get; the more, the better.


42
  • Start out with at least 6 variables per expected
    factor.
  • Each factor should have at least 3 variables that
    load well.
  • If loadings are low, need at least 10 variables
    per factor.
  • Need at least as many subjects as variables. The
    more of each, the better.
  • When there are overlapping factors (variables
    loading well on more than one factor), need more
    subjects than when structure is simple.

43
  • If communalities are low, need more subjects.
  • If communalities are high (> .6), you can get by with fewer than
    100 subjects.
  • With moderate communalities (.5), need 100-200
    subjects.
  • With low communalities and only 3-4 high loadings
    per factor, need over 300 subjects.
  • With low communalities and poorly defined
    factors, need over 500 subjects.

44
What I Have Not Covered Today
  • LOTS.
  • For a general introduction to measurement (reliability and
    validity), see
    http://core.ecu.edu/psyc/wuenschk/docs2210/Research-3-Measurement.doc


45
Multivariate Analysis Summary
  • Multivariate analysis is hard, but useful if it
    is important to extract as much information from
    the data as possible.
  • For classification problems, the common methods
    provide different approximations to the Bayes
    discriminant.
  • There is considerable empirical evidence that, as yet, no uniformly
    most powerful method exists. Therefore, be wary of claims to the
    contrary!

46
Further reading
  • Hair, Anderson, Tatham & Black (HATB), Multivariate Data Analysis,
    5th edn.

47
That's All, Friends
  • See you again some other day.
  • Have a nice time with SPSS.