Title: Multivariate statistics and Market segmentation: Principal Components Analysis and Cluster Analysis
1. Multivariate statistics and Market segmentation: Principal Components Analysis and Cluster Analysis
- AE B37 - Week 7, 19 February 2003, MM
2. Further readings
- Malhotra, Chapters 19 and 20
- Churchill and Iacobucci, Chapter 17
- Aaker et al., Chapter 21
3. Lecture outline
- Basic statistical concepts
- Factor analysis and Principal Components Analysis
  - Data reduction and summarisation
- Cluster Analysis
  - Grouping similar statistical units
- Joint application of PCA and CA
- SPSS application
4. Basic statistical concepts
- Variance
- Covariance
- Correlation and covariance
- Standardisation
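As a minimal illustration (not part of the original deck), these four concepts map directly onto NumPy operations; the data values below are invented:

```python
import numpy as np

# Invented example data: two variables observed on four individuals
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

var_x = x.var(ddof=1)                     # sample variance of x
cov_xy = np.cov(x, y)[0, 1]               # sample covariance of x and y
corr_xy = np.corrcoef(x, y)[0, 1]         # correlation = covariance of the standardised variables
z = (x - x.mean()) / x.std(ddof=1)        # standardisation: mean 0, variance 1

print(var_x, cov_xy, corr_xy, z.mean(), z.var(ddof=1))
```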
5. Factor Analysis
- A statistical procedure for data reduction, i.e. summarising a given set of variables into a reduced set of unrelated variables that explains most of the original variability
- Objectives of Factor Analysis:
  - Identification of a smaller set of unrelated variables replacing the original set
  - Identification of underlying factors explaining the correlation among variables
  - Selection of a smaller set of salient variables
6. Factor Analysis and marketing research
- Identification of customer characteristics prior to clustering into groups (market segmentation)
- Identification of product/brand attributes that influence consumer choice
- Understanding the correlation between target consumers and media consumption habits
7. Some notation
- p variables have been recorded on n individuals
- Xj indicates the generic variable j
- xij refers to the value of the j-th variable as recorded on the i-th individual
- Xj = {xij}, i = 1, 2, …, n; j = 1, 2, …, p
- ΣX denotes the variance-covariance matrix of X
8. Week7.sav: variable view
- p = 9 (all variables but the first, custid)
9. Week7.sav: data view (columns X1 to X9)
10. The correlation matrix
11. Factor analysis model
- X1 = μ1 + γ11 F1 + γ12 F2 + … + γ1m Fm + e1
- X2 = μ2 + γ21 F1 + γ22 F2 + … + γ2m Fm + e2
- …
- Xj = μj + γj1 F1 + γj2 F2 + … + γjm Fm + ej
- …
- Xp = μp + γp1 F1 + γp2 F2 + … + γpm Fm + ep
In matrix form: X = μ + ΓF + e, where
- Fi (i = 1, 2, …, m) are uncorrelated random variables (the common factors), with m ≤ p
- μi (i = 1, 2, …, p) are constants unique to each variable (the means)
- ei (i = 1, 2, …, p) are error random variables, uncorrelated with each other and with F; they represent the residual error due to the use of common factors
12. Factor analysis model (factors view)
- F1 = b11 X1 + b12 X2 + … + b1p Xp
- F2 = b21 X1 + b22 X2 + … + b2p Xp
- …
- Fj = bj1 X1 + bj2 X2 + … + bjp Xp
- …
- Fm = bm1 X1 + bm2 X2 + … + bmp Xp
In matrix form: F = BX
The common factors are linear combinations of the original variables.
13. Estimation
- There is no unique solution (set of common factors): any orthogonal rotation of a solution is also acceptable (factor rotation)
- Variables in X need to be standardised prior to analysis
- Factor analysis estimates the following quantities:
  - The simple correlations (covariances) between each factor i and each original variable j, i.e. the coefficients γij (the factor loadings, collected in the factor or component matrix)
  - The values of each common factor for each of the statistical units (the factor scores)
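A hedged sketch of these estimation steps, assuming scikit-learn's FactorAnalysis as the estimator and random stand-in data in place of Week7.sav; m = 4 factors is an illustrative choice, and varimax is one orthogonal rotation among the many acceptable ones:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))             # stand-in for the 9 Week7.sav variables

Z = StandardScaler().fit_transform(X)     # standardise prior to analysis
fa = FactorAnalysis(n_components=4, rotation="varimax")
scores = fa.fit_transform(Z)              # factor scores: one row per statistical unit
loadings = fa.components_.T               # p x m factor loadings (the gamma_ij)
```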
14. Summarising covariance
- The original set of variables X is characterised by a p × p variance-covariance matrix
15. Covariance matrix of the residual variables
- By summarising the original data through m factors we commit an error, measured by the residuals ei, whose variance-covariance matrix is diagonal (the residual variables are uncorrelated)
16. The fundamental relationship of Factor analysis
- Original variance = communality + residual variance; for standardised variables, Var(Xi) = hi² + ψi
- The communality of Xi, hi² = γi1² + γi2² + … + γim², is the portion of the variance of Xi explained by the m factors
- Communalities make it possible to identify which of the variables are best explained by the selected factors
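Continuing the FactorAnalysis sketch above, the communalities follow directly from the loadings matrix:

```python
import numpy as np

# `loadings` is the p x m matrix from the previous sketch
communalities = (loadings ** 2).sum(axis=1)       # h_i^2, one value per variable
residual_var = 1.0 - communalities                # unique variance (data are standardised)
ranked = np.argsort(communalities)[::-1]          # variables, best explained first
```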
17. Principal Component Analysis
- A special case / estimation method of factor analysis
- The components are built so that the first component has the maximum possible amount of explained variance
- All of the original variance is considered, whereas in factor analysis the estimates are based only on the common variance
- Component scores can be computed exactly, whilst factor scores are estimated: there is no guarantee that estimated factor scores will actually be uncorrelated with each other
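A PCA counterpart of the sketch above, reusing the standardised matrix Z; the component scores are exact, and the explained-variance shares decrease from the first component onwards:

```python
from sklearn.decomposition import PCA

pca = PCA()                               # keep all p components for now
pc_scores = pca.fit_transform(Z)          # exact component scores, no estimation step
print(pca.explained_variance_ratio_)      # first entry is the largest by construction
```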
18. Choice of the number of principal components
- Level of explained variance
  - Usually the m components explaining 70-80% of the total variability
- Eigenvalues of the data correlation matrix
  - The eigenvalue corresponding to each component represents the amount of variance it explains; the sum of the eigenvalues equals the original number of variables
  - Retain eigenvalues larger than 1 (components explaining more variance than the average original variable)
- Scree diagram
19. Scree diagram (elbow rule)
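The three selection rules can be checked numerically; this sketch (again reusing Z from the running example) is illustrative, not the deck's SPSS procedure:

```python
import numpy as np
import matplotlib.pyplot as plt

R = np.corrcoef(Z, rowvar=False)          # p x p correlation matrix of the data
eigvals = np.linalg.eigvalsh(R)[::-1]     # eigenvalues, largest first
print(eigvals.sum())                      # equals p, the number of variables
print(np.cumsum(eigvals) / eigvals.sum()) # cumulative share: apply the 70-80% rule
m_kaiser = int((eigvals > 1).sum())       # Kaiser rule: eigenvalues larger than 1

plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")   # scree diagram
plt.xlabel("Component"); plt.ylabel("Eigenvalue"); plt.show()
```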
20. The component scores
- F1 = b11 X1 + b12 X2 + … + b1p Xp
- The component scores are computed for each case and for each of the m principal components
- The values of the component scores (standardised to have mean 0 and variance 1) can be used to summarise the data (in plots or subsequent analyses)
- The essential characteristic of the components is their lack of correlation with each other
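A short check of that essential characteristic, continuing the PCA sketch (m = 4 is assumed here, matching the four components interpreted later): standardise the retained scores and inspect their correlation matrix:

```python
import numpy as np

F = pc_scores[:, :4]                      # retain the first m = 4 components
F = (F - F.mean(axis=0)) / F.std(axis=0)  # standardise: mean 0, variance 1
print(np.round(np.corrcoef(F, rowvar=False), 2))  # approximately the identity matrix
```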
21. SPSS
22. SPSS output (1)
23. (Screenshot, no transcript)
24. The factor scores
25. Interpreting the component matrix
1. Family supermarket shopper
2. Family quality shopper
3. Single frequent cost-conscious shopper
4. Vegetarian shopper
26. Cluster Analysis
- A class of techniques used to classify cases into groups that are relatively homogeneous within themselves and heterogeneous between each other, on the basis of a defined set of variables. These groups are called clusters.
27. Cluster Analysis and marketing research
- Market segmentation, e.g. clustering consumers according to their attribute preferences
- Understanding buyers' behaviours: consumers with similar behaviours/characteristics are clustered
- Identifying new product opportunities: clusters of similar brands/products can help identify competitors and market opportunities
- Reducing data, e.g. in preference mapping
28. Steps to conduct a Cluster Analysis
- Select a distance measure
- Select a clustering algorithm
- Determine the number of clusters
- Validate the analysis
29. (Screenshot, no transcript)
30. Defining distance: the Euclidean distance
- Dij = √(Σk (xki − xkj)²) is the distance between cases i and j
- xki is the value of variable Xk for case i
- Problems:
  - Variables measured on different scales carry different weights
  - Correlation between variables (double counting)
- Solution: Principal Component Analysis
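A sketch of the distance computation with SciPy, reusing the component scores F from the PCA example, which is exactly the solution suggested above (uncorrelated inputs on a common scale):

```python
from scipy.spatial.distance import pdist, squareform

D = squareform(pdist(F, metric="euclidean"))   # D[i, j] = Euclidean distance between cases i and j
```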
31. Clustering procedures
- Hierarchical procedures
  - Agglomerative (start from n clusters and merge down to 1 cluster)
  - Divisive (start from 1 cluster and split up to n clusters)
- Non-hierarchical procedures
  - K-means clustering
32. Agglomerative clustering
33. Agglomerative clustering
- Linkage methods
  - Single linkage (minimum distance)
  - Complete linkage (maximum distance)
  - Average linkage
- Ward's method
  - Compute the sum of squared distances within clusters
  - Aggregate the pair of clusters giving the minimum increase in the overall sum of squares
- Centroid method
  - The distance between two clusters is defined as the distance between their centroids (cluster averages)
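These linkage methods correspond one-to-one to method names in scipy.cluster.hierarchy; a sketch, again on the component scores F from the running example:

```python
from scipy.cluster.hierarchy import linkage

# Each call returns the full agglomeration history: n - 1 merges, from n clusters down to 1
for method in ("single", "complete", "average", "ward", "centroid"):
    merge_tree = linkage(F, method=method)
```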
34. K-means clustering
1. The number k of clusters is fixed in advance
2. An initial set of k seeds (aggregation centres) is provided, e.g. the first k elements, or other seeds
3. Given a certain threshold, all units are assigned to the nearest cluster seed
4. New seeds are computed
5. Go back to step 3 until no reclassification is necessary
- Units can be reassigned in successive steps (optimising partitioning)
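A k-means sketch with scikit-learn following the same steps; k = 4 is an arbitrary illustrative choice, and n_init asks for several different initial seed sets:

```python
from sklearn.cluster import KMeans

km = KMeans(n_clusters=4, n_init=10, random_state=0)  # k fixed in advance
labels = km.fit_predict(F)                # each case assigned to its nearest seed
centres = km.cluster_centers_             # final seeds = cluster averages
```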
35. Hierarchical vs non-hierarchical methods
- Hierarchical clustering
  - No decision needed about the number of clusters
  - Problems when the data contain a high level of error
  - Can be very slow
  - Initial decisions are more influential (one-step only)
- Non-hierarchical clustering
  - Faster, more reliable
  - Need to specify the number of clusters (arbitrary)
  - Need to set the initial seeds (arbitrary)
36. Suggested approach
- First perform a hierarchical method to define the number of clusters
- Then use the k-means procedure to actually form the clusters
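The two-step approach as a hedged sketch, continuing the running example; k = 4 stands in for the number of clusters read off the dendrogram:

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import KMeans

merge_tree = linkage(F, method="ward")    # step 1: hierarchical run to choose k
dendrogram(merge_tree)                    # inspect the tree (or use the elbow rule below)

k = 4                                     # assumed choice read off the dendrogram
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(F)  # step 2
```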
37. Defining the number of clusters: the elbow rule (1)
38. Elbow rule (2): the scree diagram
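A common way to draw this diagram (an assumption on our part, not the deck's SPSS output) is to plot the within-cluster sum of squares, exposed by k-means as inertia_, against the number of clusters:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 11)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(F).inertia_ for k in ks]

plt.plot(list(ks), wss, marker="o")       # look for the elbow / scree point
plt.xlabel("Number of clusters k"); plt.ylabel("Within-cluster sum of squares"); plt.show()
```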
39. Validating the analysis
- Impact of the initial seeds / order of cases
- Impact of the selected method
- Consider the relevance of the chosen set of variables
40. SPSS example
41. (Screenshot, no transcript)
42. Number of clusters: 10, 6, 4
43. (Screenshot, no transcript)