Multiple Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Multiple Regression

Description:

What is the predicted salary for someone with 10 GCSEs? Relating the values to the plot ... Number of GCSEs. Y axis. Salary in pounds ... – PowerPoint PPT presentation

Slides: 36
Provided by: mike65

Transcript and Presenter's Notes

Title: Multiple Regression


1
Multiple Regression
  • Network of Hope

2
Multiple Regression
  • Regression is an extension of Correlations (i.e.
    tests of relationship)
  • Allows a way of summarising multiple correlations
    (relationships)
  • It allows a predictive model to be constructed
  • Correlations are expressed as a value between
    -1 and +1

3
When to use regression
  • Correlational studies, aiming to find a set of
    variables that predict another variable, or a
    model which helps to explain the role of a
    variable.

[Figure: diagrams relating the predictors X1, X2 and X3 to Y, with regions labelled a, b, c and d]
4
When to use Regression
  • Parametric data (usual criteria)
  • A ratio of 15 rows (cases) for each variable, so a
    4-variable regression should have 60 rows.
  • Linearity i.e. you should be able to plot a
    straight line through the data.
  • Homoscedasticity, not heteroscedasticity

5
Comparison of homoscedasticity with heteroscedasticity
Homoscedasticity: points evenly spread along line of regression
Heteroscedasticity: points grouped in two separate clusters
6
Extending Correlations
  • Relationships are expressed as an r value of between
    -1 and +1
  • One person's score on two variables

7
Fit and Error
  • A line of fit is calculated by using the least
    squares method
  • The error is also known as the residual
    (differences between actual plotted scores and
    predicted line of fit)
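
As a minimal sketch of the least squares method and the residuals it leaves behind, the Python below fits a line to a small made-up data set. The data values and variable names are illustrative assumptions, not figures from the presentation.

```python
import numpy as np

# Hypothetical data (assumption): number of GCSEs (x) and salary in pounds (y).
x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([13500, 15000, 18500, 20000, 22500, 24500], dtype=float)

# Least squares chooses the slope b and intercept a that minimise
# the sum of squared residuals.
b, a = np.polyfit(x, y, deg=1)

predicted = a + b * x
residuals = y - predicted  # error: actual score minus score on the line of fit

print(f"a = {a:.1f}, b = {b:.1f}")
print("residuals:", np.round(residuals, 1))
print("sum of squared residuals:", round(float(np.sum(residuals ** 2)), 1))
```

Any other slope or intercept gives a larger sum of squared residuals for the same data, which is what makes this the line of best fit.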

8
Line of Fit: Fitting a line to the data
  • If we know the degree of relationship between
    two variables for a set of participants
  • And we know the score of one participant on one
    variable, we can predict the score on the other

9
Line Formula
  • The formula for a straight line is used to
    predict (hence linear regression)
  • Y = a + bx
  • Y = value of DV (what we want to predict, or the
    criterion)
  • a = intercept (value of Y where x = 0)
  • b = slope of line
  • x = number of units of the IV or predictor
    variable

10
[Figure: scatterplot with a fitted regression line]
Y axis = DV or criterion
X axis = IV or predictor
Slope (b) = amount of change in Y per unit of X
Intercept (a) = point at which the line intersects the y axis
11
Some examples to work out
  • In a simple linear regression looking at salary
    and GCSEs, Y is the predicted salary, and x is
    the number of GCSEs. We know that a = 10,000
    and b = 1,500
  • What is the predicted salary for someone with 10
    GCSEs?

12
Relating the values to the plot
[Figure: scatterplot of salary against number of GCSEs with a fitted regression line]
Y axis = Salary in pounds
X axis = Number of GCSEs
Slope (b) = amount of change in Y per unit of X, or increase in salary for each GCSE
Intercept (a) = point at which the line intersects the y axis = 10,000
13
Calculation
  • Y = a + bx
  • Y = 10,000 + (1,500 x 10)
  • Y = 10,000 + 15,000
  • Y = 25,000
  • Predicted salary = £25,000
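
The same calculation can be written as a short Python sketch. The function name predict is a hypothetical helper introduced here for illustration; the values of a, b and x are the ones given on the slide.

```python
def predict(a, b, x):
    """Return the predicted Y on the straight line Y = a + b*x."""
    return a + b * x

# Values from the slide: intercept 10,000, slope 1,500 per GCSE, 10 GCSEs.
salary = predict(a=10_000, b=1_500, x=10)
print(salary)  # 25000
```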

14
Now try it for multiple regression, i.e. when
there is more than one predictor or x variable
  • So imagine we also look at the effects of
    drinking on final salary (a second predictor
    variable or X2)
  • Y = a + b1x1 + b2x2
  • The constant, a, is still 10,000, b1 = 1,500 (as
    before) and b2 = -50 (minus 50 is the number of
    pounds income is reduced for each unit of alcohol
    drunk per week)
  • What would the salary be for someone with 5 GCSEs
    (X1) who drinks 20 units of alcohol per week?
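
One way to check an answer to this exercise is the sketch below, which extends the straight-line formula to any number of predictors. The helper name predict_multiple is an assumption made for illustration; the constant, the two coefficients and the two scores are taken from the slide.

```python
def predict_multiple(a, coefficients, values):
    """Return Y = a + b1*x1 + b2*x2 + ... for paired coefficients and scores."""
    return a + sum(b * x for b, x in zip(coefficients, values))

# Slide values: a = 10,000, b1 = 1,500 per GCSE, b2 = -50 per unit of alcohol.
# Person: 5 GCSEs (x1) and 20 units of alcohol per week (x2).
salary = predict_multiple(a=10_000, coefficients=[1_500, -50], values=[5, 20])
print(salary)  # 16500
```

Each predictor adds its own coefficient multiplied by its own score, so the 5 GCSEs add 7,500 and the 20 units of alcohol subtract 1,000.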

15
Summarising Multiple Regression
  • A model for prediction
  • Criterion or DVs
  • Predictors or IVs
  • Predict a variable (criterion or DV) from a set
    of related variables (predictors or IVs)
  • Y <- X1, X2, X3, X4, X5, ...

16
Shared Variance
  • Consider the negative relationship indicated
  • We have a negative correlation between height and
    hair length
  • Why might that be?

[Figure: Height and Hair length linked by r = -0.63]
17
Sharing Variance between several variables
[Figure: Study time, Intelligence and Exam Performance linked pairwise by r = 0.37, r = 0.72 and r = 0.48]
These relationships can be represented as Venn
diagrams.
18
Shared Variance: R²
[Figure: Intelligence and Exam performance]
Correlation can't tell you the multiple value, or
R²
19
Partial Correlation
  • Partial correlations examine the unique
    contributions of each x variable in predicting y
  • Partials are correlations between an X variable
    (adjusted for all the other X variables) and Y
    (adjusted for all the other X variables)
  • It is a purer representation of the unique
    relationships between two variables
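
A rough sketch of the adjustment idea, assuming simulated data rather than anything from the presentation: the variable z is removed from both x and y by regression, and the leftover residuals are then correlated. The names residualise, raw_r and partial_r are hypothetical.

```python
import numpy as np

# Simulated scores (assumption: not data from the presentation).
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)             # the variable to partial out
x = 0.8 * z + rng.normal(size=n)   # predictor that depends partly on z
y = 0.8 * z + rng.normal(size=n)   # criterion that depends partly on z

def residualise(target, control):
    """Remove the part of target that a straight line on control predicts."""
    slope, intercept = np.polyfit(control, target, deg=1)
    return target - (intercept + slope * control)

# Ordinary correlation, then the correlation with z adjusted out of both.
raw_r = np.corrcoef(x, y)[0, 1]
partial_r = np.corrcoef(residualise(x, z), residualise(y, z))[0, 1]

print(f"raw r = {raw_r:.3f}, partial r = {partial_r:.3f}")
```

Here x and y are related only through z, so the partial correlation should come out much closer to zero than the raw correlation.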

20
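Original relationships
[Figure: Study time, Intelligence and Exam performance with correlations .37, .72 and .48, and the overlap labelled R²]
Relationship with intelligence removed
[Figure: Study time, Intelligence and Exam performance, with the remaining value 0.54 or 54%]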
21
Shared Variance: Hair length and height
[Figure: Gender, Height and Hair length]
Correlation between Hair length and Height was
-0.63. But how much was accounted for by gender?
22
Correlation matrix for gender, hair length and
height
  • Hair length is highly correlated with height
    (-0.63)
  • However we can also see that gender is highly
    correlated with both hair length (0.77) and
    height (0.86)
  • If we partial out gender (i.e. keep it constant)
    then we will see that there is not such a strong
    relationship between height and hair length

23
Correlations with gender partialled out
                 HEIGHT      HAIR LENGTH
  HEIGHT         1.0000      .1034
                 (0)         (25)
                 p = .       p = .304
  HAIR LENGTH    .1034       1.0000
                 (25)        (0)
                 p = .304    p = .

24
Types of multiple regression
  • Standard or direct (includes all x variables in
    order or record)
  • Hierarchical (includes all x variables in blocks
    decided by the researcher)
  • Forward Stepwise (includes all x variables which
    significantly increase R², in order of
    contribution)
  • Backward stepwise (removes all x
    variables which do not significantly reduce R²,
    in order of least contribution)

25
Components of Regression
  • R = multiple correlation coefficient (ranges
    between 0 and 1)
  • R² = coefficient of determination (the square of
    the value above, e.g. if R = .5 then R² = .25, meaning
    25% of the variance is shared between the variables
    in the solution)
  • Beta = beta weights, the standardised regression
    coefficients (have direction and magnitude like
    correlation coefficients)
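
The three components can be sketched in Python on simulated data. Everything below is an illustrative assumption: the data are generated, not taken from the presentation, and the helper names fit and zscore are hypothetical. Beta weights are obtained here by refitting the same model on z scores.

```python
import numpy as np

# Simulated data (assumption): salary built from GCSEs, alcohol units and noise.
rng = np.random.default_rng(1)
n = 60
gcses = rng.integers(0, 12, size=n).astype(float)
units = rng.uniform(0, 40, size=n)
salary = 10_000 + 1_500 * gcses - 50 * units + rng.normal(0, 2_000, size=n)
X = np.column_stack([gcses, units])

def fit(X, y):
    """Ordinary least squares with an intercept: (coefficients, fitted values)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

def zscore(v):
    """Convert scores to z scores (mean 0, standard deviation 1), per column."""
    return (v - v.mean(axis=0)) / v.std(axis=0, ddof=1)

coef, fitted = fit(X, salary)

# R² = proportion of variance in the criterion explained by the predictors.
r_squared = 1 - np.sum((salary - fitted) ** 2) / np.sum((salary - salary.mean()) ** 2)
multiple_r = np.sqrt(r_squared)

# Beta weights: the slopes when every variable is expressed in z scores.
beta, _ = fit(zscore(X), zscore(salary))

print(f"R = {multiple_r:.3f}, R squared = {r_squared:.3f}")
print("beta weights:", np.round(beta[1:], 3))
```

Because the beta weights are on a common scale, their sizes can be compared directly even though GCSEs and alcohol units are measured in different units.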

26
Components of regression
  • F = F ratio, as in ANOVA. Assumes that if your
    choice of variables x1, x2, x3 etc. is random and
    not systematically related to Y, then the ratio will
    be roughly 1 to 1
  • Outliers and regression: outliers are data
    scores that lie considerably outside of the
    normal distribution. This means they can distort
    your findings. In such cases it is advisable to
    identify them and omit them if necessary. Why?

27
Outlier example
28
Multiple Regression can...
  • Determine the effect of multiple IVs on a single
    DV
  • Isolate the effect of a single IV
  • Indicate the combined effect of all the IVs
  • Order the IVs in terms of strength of association
    with DV
  • Find the optimum number of IVs

29
Predicting record sales
  • The variables to be assessed (i.e. the predictor
    variables) are
  • Advertising budget
  • Number of plays on Radio 1
  • Attractiveness of the band

30
Descriptives and Scattergrams
ZRESID (y axis) against ZPRED (x axis)
31
Correlations (look for the highest correlations of
variables with record sales, as sales is the
variable of interest, or criterion variable)
32
Model summary and ANOVA
Use Adjusted R Square
Look to see if model predicting significantly
above chance
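
As a sketch of what these two checks compute, the standard formulas for adjusted R² and the F ratio are written out below. The function names and the example figures (R² = .25, 60 cases, 3 predictors) are assumptions chosen for illustration and are not taken from the record sales output.

```python
def adjusted_r_squared(r_squared, n, k):
    """Shrink R squared to allow for n cases and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

def f_ratio(r_squared, n, k):
    """F ratio for the whole model: explained over unexplained variance,
    each divided by its degrees of freedom (k and n - k - 1)."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Hypothetical example values.
r2, n_cases, n_predictors = 0.25, 60, 3
print(round(adjusted_r_squared(r2, n_cases, n_predictors), 3))
print(round(f_ratio(r2, n_cases, n_predictors), 2))
```

Adjusted R² is always a little lower than R² when there is at least one predictor, and the F ratio is judged against the F distribution with k and n - k - 1 degrees of freedom to decide whether the model predicts above chance.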
33
Collinearity statistics
Check the VIF and tolerance to see if there is
cause for concern
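
A minimal sketch of how tolerance and VIF are defined, assuming simulated predictors rather than the actual record sales data: each predictor is regressed on the others, tolerance is 1 minus the resulting R², and VIF is 1 divided by tolerance. The function name tolerance_and_vif and the generated variables are hypothetical.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) arrays, one entry per column of X."""
    n, k = X.shape
    tolerance = np.empty(k)
    for j in range(k):
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        resid = X[:, j] - design @ coef
        # 1 - R squared of predictor j regressed on the other predictors
        tolerance[j] = resid.var() / X[:, j].var()
    return tolerance, 1 / tolerance

# Simulated predictors (assumption): plays is partly driven by advertising.
rng = np.random.default_rng(2)
n = 100
advertising = rng.normal(size=n)
plays = 0.6 * advertising + rng.normal(size=n)
attractiveness = rng.normal(size=n)
X = np.column_stack([advertising, plays, attractiveness])

tol, vif = tolerance_and_vif(X)
print("tolerance:", np.round(tol, 3))
print("VIF:", np.round(vif, 3))
```

A commonly quoted rule of thumb treats a VIF above about 10 (tolerance below about 0.1) as a cause for concern, though the threshold is a convention rather than a strict test.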
34
Beta Weights
Beta weights are the values obtained when the
regression equation is calculated using z scores.
This allows comparison of different types of
data.
So the best predictors are advertising budget and
number of plays on Radio 1, as they have the highest
values of beta.
t and p values tell you whether each variable is
predicting above chance or not.
35
Partial correlations?
Note how the correlation between advertising
budget and number of plays is reduced when the
attractiveness of the band is removed.