Correlation and regression analysis - PowerPoint PPT Presentation

Provided by: anon304


1
Correlation and regression analysis
  • Week 8
  • Research Methods Data Analysis

2
Lecture outline
  • Correlation
  • Regression Analysis
  • The least squares estimation method
  • SPSS and regression output
  • Task overview

3
Correlation
  • Correlation measures to what extent two (or
    more) variables are related
  • Correlation expresses a relationship that is not
    necessarily precise (e.g. height and weight)
  • Positive correlation indicates that the two
    variables move in the same direction
  • Negative correlation indicates that they move in
    opposite directions

4
Covariance
  • Covariance measures the joint variability of two
    variables: Cov(x, y) = E[(x - E(x))(y - E(y))]
  • If two variables are independent, then the
    covariance is zero (however, Cov = 0 does not mean
    that two variables are independent)
  • Where E() indicates the expected value (i.e.
    average value)
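
As a minimal sketch of the point about independence, the Python below computes the covariance as an average of cross-deviations. The helper names and the data are hypothetical. The data were chosen so that y depends entirely on x while the covariance is still zero.

```python
def mean(values):
    """Arithmetic mean, used here as the sample analogue of E()."""
    return sum(values) / len(values)


def covariance(x, y):
    """Covariance as the mean of (x - E(x)) * (y - E(y))."""
    mx, my = mean(x), mean(y)
    return mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])


# Hypothetical data: y is completely determined by x (y = x squared),
# yet the covariance is zero because x is symmetric around 0.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

cov_xy = covariance(x, y)
print(cov_xy)
```

Zero covariance here only rules out a linear relationship, not a relationship of any kind.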

5
Correlation coefficient
  • The correlation coefficient r gives a measure (in
    the range [-1, 1]) of the relationship between two
    variables
  • r = 0 means no correlation
  • r = 1 means perfect positive correlation
  • r = -1 means perfect negative correlation
  • Perfect correlation indicates that the points lie
    exactly on a straight line: a given variation in x
    always corresponds to a proportional variation in y
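
The three reference values of r can be illustrated with a short Python sketch. The function name `pearson_r` and all data are assumptions made for illustration. They do not come from the slides.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * xi + 1 for xi in x])   # exact increasing line
r_neg = pearson_r(x, [10 - 3 * xi for xi in x])  # exact decreasing line
r_mid = pearson_r(x, [2, 1, 4, 3, 5])            # increasing, but not exactly linear
print(r_pos, r_neg, r_mid)
```

The first two cases lie exactly on a line and give the extreme values. The third moves in the same direction as x without being exact, so it falls strictly between 0 and 1.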

6
Correlation coefficient and covariance
Pearson correlation coefficient
Correlation coefficient - POPULATION
SAMPLE
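
In standard notation (the symbols below are assumptions, not taken from the slide), the population and sample versions of the Pearson coefficient can be written as:

```latex
\rho_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}
\qquad\qquad
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here \sigma_x and \sigma_y are the population standard deviations, and \bar{x} and \bar{y} are the sample means.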
7
Bivariate and multivariate correlation
  • Bivariate correlation: 2 variables, Pearson
    correlation coefficient
  • Partial correlation: the correlation between two
    variables after allowing for the effect of other
    control variables
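
A partial correlation with a single control variable can be sketched from the three pairwise correlations. This is a simplified illustration. It handles only one control variable z, and the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after controlling for a single variable z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: x and y both rise with z, which inflates their raw correlation.
z = [1, 2, 3, 4, 5, 6]
x = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
y = [0.8, 2.3, 2.9, 4.2, 4.8, 6.3]

bivariate = pearson_r(x, y)
partial = partial_r(x, y, z)
print(bivariate, partial)
```

In this made-up example the bivariate correlation is high mostly because both variables follow z. Once z is controlled for, the partial correlation is much lower.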

8
Significance level in correlation
  • Level of correlation (the value of the correlation
    coefficient): indicates to what extent the two
    variables move together
  • Significance of correlation (the p value): since
    the correlation coefficient is computed on a
    sample, indicates whether the relationship appears
    to be statistically significant
  • Examples
  • Correlation is 0.50 but not significant: the
    sampling error is so high that the actual
    correlation could even be 0
  • Correlation is 0.10 and highly significant: the
    level of correlation is very low, but we can be
    confident in the value of such a correlation
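
The two examples can be reproduced approximately in Python. The sketch assumes a Fisher z transformation with a normal approximation, whereas the usual exact test is based on the t distribution. The sample sizes are hypothetical, chosen only to mirror the two cases.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Approximate two-tailed p value for H0: the true correlation is 0.

    Assumption of this sketch: Fisher z transformation plus a normal
    approximation, which needs n > 3.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical sample sizes
p_small_sample = correlation_p_value(0.50, n=10)    # sizeable r, few observations
p_large_sample = correlation_p_value(0.10, n=2000)  # small r, many observations
print(p_small_sample, p_large_sample)
```

With only 10 observations, r = 0.50 does not reach the 5% significance level. With 2000 observations, r = 0.10 is significant at well below the 1% level.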

9
Correlation and covariance in SPSS
Choose between bivariate and partial
10
Bivariate correlation
Select the variables you want to analyse
Request the significance level (two-tailed)
Ask for additional statistics (if necessary)
11
Bivariate correlation output
12
Partial correlations
List of variables to be analysed
Control variables
13
Partial correlation output
- - -  P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S  - - -

Controlling for..    SIZE    STYLE

             AMTSPENT     USECOUP         ORG

AMTSPENT       1.0000       .2677      -.0116
              (    0)      (  775)     (  775)
              P= .         P= .000     P= .746

USECOUP         .2677      1.0000       .0500
              (  775)      (    0)     (  775)
              P= .000      P= .        P= .164

ORG            -.0116       .0500      1.0000
              (  775)      (  775)     (    0)
              P= .746      P= .164     P= .

(Coefficient / (D.F.) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed

Partial correlations still measure the
correlation between two variables, but eliminate
the effect of other variables, i.e. the
correlations are computed on consumers shopping
in stores of identical size and with the same
shopping style
14
Bivariate and partial correlations
  • Correlation between Amount spent and Use of
    coupon
  • Bivariate correlation: 0.291 (p value 0.00)
  • Partial correlation: 0.268 (p value 0.00)
  • The amount spent is positively correlated with
    the use of coupons (0 = no use, 1 = from newspaper,
    2 = from mailing, 3 = both)
  • The level of correlation does not change much
    after accounting for different shop sizes and
    shopping styles

15
Linear regression analysis
y = a + b x + e
  • y: dependent variable
  • a: intercept
  • b: regression coefficient
  • x: independent variable (explanatory variable,
    regressor)
  • e: error
16
Regression analysis
[Figure: plot of y (vertical axis) against x (horizontal axis)]
17
Example
  • We want to investigate if there is a
    relationship between cholesterol and age on a
    sample of 18 people
  • The dependent variable is the cholesterol level
  • The explanatory variable is age

18
What regression analysis does
  • Determines whether a relationship exists between
    the dependent and explanatory variables
  • Determines how much of the variation in the
    dependent variable is explained by the
    independent variable (goodness of fit)
  • Allows us to predict the values of the dependent
    variable

19
Regression and correlation
  • Correlation: there is no causal relationship
    assumed
  • Regression: we assume that the explanatory
    variables cause the dependent variable
  • Bivariate: one explanatory variable
  • Multivariate: two or more explanatory variables

20
How to estimate the regression coefficients
  • The objective is to estimate the population
    parameters a and b from our data sample
  • A good way to estimate them is by minimising the
    error e_i, which represents the difference between
    the actual observation and the estimated
    (predicted) one

21
The objective is to identify the line (i.e. the a
and b coefficients) that minimises the distance
between the actual points and the fitted line
22
The least squares method
  • This is based on minimising the square of the
    distance (error) rather than the distance
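
For a single explanatory variable, the line that minimises the sum of squared errors has a closed-form solution. The Python below is a minimal sketch of that calculation. The function names and the data are hypothetical, and the model is written as y = a + b*x.

```python
def least_squares_fit(x, y):
    """Return (a, b) minimising the sum of squared errors of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx      # slope (regression coefficient)
    a = my - b * mx    # intercept
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_fit(x, y)
best = sum_squared_errors(x, y, a, b)
shifted = sum_squared_errors(x, y, a + 0.5, b)   # same slope, different intercept
tilted = sum_squared_errors(x, y, a, b + 0.1)    # same intercept, different slope
print(a, b, best, shifted, tilted)
```

Moving the intercept or the slope away from the fitted values increases the sum of squared errors, which is the sense in which the fitted line is the best one.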

23
Bivariate regression in SPSS
24
Regression dialog box
Dependent variable
Explanatory variable
Leave this unchanged!
25
Regression output
Statistical significance: is the coefficient
different from 0?
Value of the coefficients
26
Model diagnostics: goodness of fit
The value of the R-square lies between 0
and 1 and represents the proportion of total
variation that is explained by the regression
model
27
R-square
Total variation
Variation explained by regression
Residual variation
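
The three quantities on this slide can be computed directly for a bivariate least squares fit. This is a sketch on hypothetical data. It takes R-square to be the explained variation divided by the total variation, and the function name is an assumption.

```python
def variation_decomposition(x, y):
    """Return (total, explained, residual) variation for a least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    total = sum((yi - my) ** 2 for yi in y)                     # total variation
    explained = sum((fi - my) ** 2 for fi in fitted)            # explained by regression
    residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # left unexplained
    return total, explained, residual


# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

total, explained, residual = variation_decomposition(x, y)
r_square = explained / total
print(total, explained, residual, r_square)
```

For a least squares fit with an intercept, the explained and residual variation add up to the total, so R-square can equally be written as 1 minus residual over total.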
28
Multivariate regression
  • The principle is identical to bivariate
    regression, but there are more explanatory
    variables
  • The goodness of fit can be measured through the
    adjusted R-square, which takes into account the
    number of explanatory variables
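
The usual adjusted R-square formula can be sketched as a one-line function. The function name and the figures passed to it are hypothetical. They only show how the penalty grows with the number of explanatory variables.

```python
def adjusted_r_square(r_square, n_obs, n_explanatory):
    """Adjusted R-square: penalises R-square for each extra explanatory variable."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_explanatory - 1)


# Hypothetical figures: same R-square and sample size, different model sizes
adj_few = adjusted_r_square(0.19, n_obs=100, n_explanatory=2)
adj_many = adjusted_r_square(0.19, n_obs=100, n_explanatory=10)
print(adj_few, adj_many)
```

With the same raw R-square, the model with more explanatory variables gets the lower adjusted value.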

29
Multivariate regression in SPSS
  • Analyze / Regression / Linear

Simply select more than one explanatory variable
30
Output
31
Coefficient interpretation
  • The constant (296.5) represents the amount spent
    when all other variables are 0
  • The coefficients for health food stores, size of
    store and being vegetarian are not significantly
    different from 0
  • Gender: coeff. = -69.6. On average, being a woman
    (G = 1) implies spending 69.6 less
  • Shopping style: coeff. = 22.8 × S
  • S = 1 (shops for himself): 22.8
  • S = 2 (shops for himself and spouse): 45.6
  • S = 3 (shops for himself and family): 68.4
  • Coupon use: coeff. = 30.4 × C
  • C = 1 (does not use coupons): 30.4
  • C = 2 (coupons from newspapers): 60.8
  • C = 3 (coupons from mailings): 91.2
  • C = 4 (coupons from both): 121.6

Categorization problems?
32
Prediction
  • On average, how much will someone with the
    following characteristics spend?
  • Male (G = 0)
  • Shopping for family (S = 3)
  • Not using coupons (C = 1)
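
The prediction can be sketched in Python using the coefficients quoted on the coefficient interpretation slide. The coefficients for health food stores, size of store and being vegetarian are not given, so the sketch assumes they contribute nothing. The names are hypothetical.

```python
# Coefficients as quoted on the "Coefficient interpretation" slide
CONSTANT = 296.5
GENDER_COEFF = -69.6   # G = 1 for women, G = 0 for men
STYLE_COEFF = 22.8     # multiplied by the shopping style code S
COUPON_COEFF = 30.4    # multiplied by the coupon use code C


def predicted_spend(gender, style, coupon):
    """Predicted amount spent.

    Assumption: the regressors reported as not significantly different
    from 0 are left out, i.e. treated as contributing 0.
    """
    return (
        CONSTANT
        + GENDER_COEFF * gender
        + STYLE_COEFF * style
        + COUPON_COEFF * coupon
    )


# Male (G = 0), shopping for family (S = 3), not using coupons (C = 1)
prediction = predicted_spend(gender=0, style=3, coupon=1)
print(round(prediction, 1))
```

Under these assumptions the predicted amount is 296.5 + 68.4 + 30.4, or roughly 395.3.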

33
How good is the model?
  • The regression model explains less than 19% of
    the total variation in the amount spent

34
Task A
  • Examine the relationship between the amount spent
    and the following customer characteristics
  • Being male/female
  • Being vegetarian
  • Shopping for himself / for himself and others
  • Shopping style (weekly, bi-weekly, etc.)
  • Potential methods
  • Battery of hypothesis tests / analysis of
    variance
  • Regression Analysis

35
Task B
  • Examine the relationship between the amount spent
    and the following customer characteristics
  • Hypothesis: the average amount spent in
    health-oriented shops is higher than in
    other shops. True or false?
  • Test the same hypothesis accounting for different
    shop sizes
  • Potential methods
  • Battery of hypothesis tests / analysis of
    variance
  • Regression Analysis

36
Task C
  • Find a relationship between the average amount
    spent per store and the following store
    characteristics
  • Size of store
  • Health-oriented store
  • Store organisation
  • Potential methods
  • Transform the customer data set into a store
    data set
  • Battery of ANOVA
  • Regression Analysis

37
Task D
  • Hypothesis: is the amount spent by those who use
    coupons significantly higher?
  • What is the most effective way of distributing
    coupons?
  • By mail
  • In newspapers
  • Both
  • Potential methods
  • Recode the variable into 1 = not using coupons and
    2 = using coupons
  • Hypothesis testing
  • Analysis of variance