Multiple Regression - PowerPoint PPT Presentation


Provided by: mike65


Transcript and Presenter's Notes

Title: Multiple Regression

1
Multiple Regression

Network of Hope

2
Multiple Regression

Regression is an extension of correlation (i.e.
tests of relationship)
It provides a way of summarising multiple correlations
(relationships)
It allows a predictive model to be constructed
Correlations are expressed as a value between
-1 and +1

3
When to use regression

Correlational studies, aiming to find a set of
variables that predict another variable, or a
model which helps to explain the role of a
variable.

[Figure: diagrams of the predictors X1, X2 and X3 and the outcome Y, with parts labelled a, b, c and d]
4
When to use Regression

Parametric data (usual criteria)
A ratio of 15 rows (cases) for each variable. So for a
4-variable regression, there should be 60 rows.
Linearity, i.e. you should be able to plot a
straight line through the data.
Homoscedasticity, not heteroscedasticity

5
Comparison of homoscedasticity with heteroscedasticity
Homoscedasticity: points evenly spread along the line of regression
Heteroscedasticity: points grouped in two separate clusters
6
Extending Correlations

Relationships are expressed as an r value between
-1 and +1
Each point is one person's score on two variables

7
Fit and Error

A line of fit is calculated by using the least
squares method
The error is also known as the residual
(differences between actual plotted scores and
predicted line of fit)
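
The least squares fit and its residuals can be sketched in a few lines. This is a minimal illustration with made-up scores; the function names `least_squares_fit` and `residuals` are assumptions chosen for this example, not part of the presentation.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Actual score minus the score predicted by the line of fit."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up scores, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_fit(xs, ys)
errors = residuals(xs, ys, a, b)
print(a, b)
print(errors)
```

With an intercept in the model, the residuals of a least squares line sum to zero, which is a quick sanity check on any fit.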

8
Line of Fit: Fitting a line to the data

If we know the degree
of relationship between
two variables for a set of participants,
and we know the score of one participant on one
variable, we can predict their score on the other

9
Line Formula

The formula for a straight line is used to
predict (hence linear regression)
Y = a + bx
Y = value of the DV (what we want to predict, or the
criterion)
a = intercept (value of Y where x = 0)
b = slope of the line
x = number of units of the IV or predictor
variable

10
[Figure: scatterplot with a fitted line of regression]
Y axis: DV or criterion
X axis: IV or predictor
Slope: amount of change in Y per unit of X (b)
Intercept: point at which the line intersects the Y axis (a)
11
Some examples to work out

In a simple linear regression looking at salary
and GCSEs, Y is the predicted salary, and x is
the number of GCSEs. Suppose we know that a = 10,000
and b = 1,500.
What is the predicted salary for someone with 10
GCSEs?

12
Relating the values to the plot

[Figure: scatterplot of salary against number of GCSEs, with a fitted line of regression]
Y axis: Salary in pounds
X axis: Number of GCSEs
Slope: amount of change in Y per unit of X (b),
or the increase in salary for each GCSE
Intercept: point at which the line intersects the Y
axis (a = 10,000)
13
Calculation

Y = a + bx
Y = 10,000 + (1,500 x 10)
Y = 10,000 + 15,000
Y = 25,000
Predicted salary = £25,000

14
Now try it for multiple regression, i.e. when
there is more than one predictor or x variable

So imagine we also look at the effects of
drinking on final salary (a second predictor
variable, or x2)
Y = a + b1x1 + b2x2
The constant, a, is still 10,000, b1 = 1,500 (as
before) and b2 = -50 (minus 50 is the number of
pounds by which income is reduced for each unit of
alcohol drunk per week)
What would the salary be for someone with 5 GCSEs
(x1) who drinks 20 units of alcohol per week (x2)?
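
Both worked examples (one predictor, then two) follow the same pattern: the constant plus each slope times its predictor value. A minimal sketch, using the coefficients given on the slides; the helper name `predict` is an assumption made for this illustration.

```python
def predict(a, bs, xs):
    """Y = a + b1*x1 + b2*x2 + ... for matching lists of slopes and predictor values."""
    if len(bs) != len(xs):
        raise ValueError("need exactly one slope per predictor value")
    return a + sum(b * x for b, x in zip(bs, xs))


# Simple regression: a = 10,000, b = 1,500, 10 GCSEs
simple = predict(10_000, [1_500], [10])

# Multiple regression: add b2 = -50 per unit of alcohol, 5 GCSEs, 20 units per week
multiple = predict(10_000, [1_500, -50], [5, 20])

print(simple, multiple)
```

For the second case the sum is 10,000 + 7,500 - 1,000, so the predicted salary is £16,500.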

15
Summarising Multiple Regression

A model for prediction
Criterion or DVs
Predictors or IVs
Predict a variable (criterion or DV) from a set
of related variables (predictors or IVs)
Y <- X1, X2, X3, X4, X5 ...

16
Shared Variance

Consider the negative relationship indicated
We have a negative correlation between height and
hair length
Why might that be?

[Figure: Height and Hair length, r = -0.63]
17
Sharing Variance between several variables
[Figure: Study time, Intelligence and Exam Performance, with pairwise correlations r = 0.37, r = 0.72 and r = 0.48]
These relationships can be represented as Venn
diagrams.
18
Shared Variance: R²
[Figure: overlap between Intelligence and Exam performance]
Correlation can't tell you the multiple value, or
R²
19
Partial Correlation

Partial correlations examine the unique
contribution of each x variable in predicting y
A partial is the correlation between an X variable
(adjusted for all the other X variables) and Y (adjusted
for all the other X variables)
It is a purer representation of the unique
relationship between two variables
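
One way to read "adjusted" here is: regress each variable on the one being controlled for, keep the residuals, and correlate those residuals. The sketch below does this for a single control variable. The scores are made up, and the names `pearson`, `residualise` and `partial_correlation` are assumptions for this illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residualise(ys, zs):
    """Residuals of ys after a least squares regression of ys on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    b = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    a = my - b * mz
    return [y - (a + b * z) for y, z in zip(ys, zs)]


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys with zs held constant."""
    return pearson(residualise(xs, zs), residualise(ys, zs))


# Made-up scores, for illustration only
study = [2, 4, 3, 6, 5, 8, 7, 9]
exam = [40, 50, 45, 65, 55, 80, 70, 85]
iq = [95, 100, 98, 110, 105, 120, 112, 125]

raw = pearson(study, exam)
partial = partial_correlation(study, exam, iq)
print(raw, partial)
```

With several control variables the same idea applies, but each residualising step becomes a multiple regression rather than a simple one.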

20
Original relationships
[Figure: Study time, Intelligence and Exam performance, with values .37, .72 and .48, and R² marked]
Relationship with intelligence removed
[Figure: Study time, Intelligence and Exam performance, with value 0.54 or 54%]
21
Shared Variance: Hair length and height
[Figure: Gender, Height and Hair length]
The correlation between hair length and height was
-0.63. But how much was accounted for by gender?
22
Correlation matrix for gender, hair length and
height

Hair length is highly correlated with height
(-0.63)
However, we can also see that gender is highly
correlated with both hair length (0.77) and
height (0.86)
If we partial out gender (i.e. keep it constant)
then we will see that there is not such a strong
relationship between height and hair length
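
For a single control variable, the partial correlation can also be computed directly from the three pairwise correlations. The sketch below applies that standard first-order formula to the values quoted above. One loud assumption: the slide reports both gender correlations as positive, but the sign depends on how gender is coded, and a negative height and hair length correlation driven by gender implies gender relates to the two in opposite directions, so the gender and height value is entered here as -0.86. The function name `partial_r` is also an assumption.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_height_hair = -0.63
r_gender_hair = 0.77
# ASSUMPTION: sign set to negative for this sketch (the slide shows 0.86);
# the sign depends on how gender was coded in the original data.
r_gender_height = -0.86

result = partial_r(r_height_hair, r_gender_hair, r_gender_height)
print(round(result, 3))
```

Under that assumption the result is close to 0.1, in the same region as the .1034 reported in the table of correlations with gender partialled out; the small difference is what would be expected from working with rounded inputs.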

23
Correlations with gender partialled out

              HEIGHT      HAIR LENGTH
HEIGHT        1.0000      .1034
              (0)         (25)
              p = .       p = .304
HAIR LENGTH   .1034       1.0000
              (25)        (0)
              p = .304    p = .

24
Types of multiple regression

Standard or direct (includes all x variables in
order or record)
Hierarchical (includes all x variables in blocks
decided by the researcher)
Forward Stepwise (includes all x variables which
significantly increase R², in order of
contribution)
Backward stepwise (removes all x
variables which do not significantly reduce R²,
in order of least contribution)

25
Components of Regression

R = multiple correlation coefficient (ranges
between 0 and 1)
R² = coefficient of determination (the square of
the value above, e.g. if R = .5 then R² = .25, meaning
25% of the variance is shared between the variables in
the solution)
Beta = beta weights, the standardised regression
coefficients (have direction and magnitude, like
correlation coefficients)
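
R² can be computed from the observed scores and the scores predicted by the model, and R is then its square root. A minimal sketch with made-up numbers; it assumes the predictions come from a least squares fit that includes a constant, which is what makes 1 - SS_residual / SS_total equal to R². The name `r_squared` is an assumption for this example.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Proportion of variance in the observed scores accounted for by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Made-up scores; the predictions come from the least squares line
# y = 2.2 + 0.6x evaluated at x = 1..5
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(observed, predicted)
big_r = sqrt(r2)
print(r2, big_r)
```

Here R² is 0.6, so 60% of the variance in the observed scores is accounted for by the fitted line.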

26
Components of regression

F = F ratio, as in ANOVA. Assumes that if your
choice of variables x1, x2, x3 etc. is random and
not systematically related to Y, then the ratio will
be roughly 1 to 1
Outliers and regression: outliers are data
scores that lie considerably outside of the
normal distribution. This means they can distort
your findings. In such cases it is advisable to
identify them and omit them if necessary. Why?
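
One simple way to screen for outliers is to convert scores to z scores and flag any that lie far from the mean. The sketch below is only an illustration: the data are made up, the cut-off of 2.5 standard deviations is an assumption rather than a value from the presentation, and `flag_outliers` is a hypothetical helper name.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.5):
    """Return the values whose z score is larger in size than the threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return [v for v in values if abs((v - m) / s) > threshold]


# Made-up scores with one extreme value
scores = [10, 12, 11, 13, 12, 11, 10, 12, 13, 40]

flagged = flag_outliers(scores)
print(flagged)
```

Note that an extreme score inflates the standard deviation it is judged against, so in very small samples a z score cut-off can miss outliers; plotting the data remains a useful check.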

27
Outlier example
28
Multiple Regression can...

Determine the effect of multiple IVs on a single
DV
Isolate the effect of a single IV
Indicate the combined effect of all the IVs
Order the IVs in terms of strength of association
with DV
Find the optimum number of IVs

29
Predicting record sales

The variables to be assessed (i.e. the predictor
variables) are
Advertising budget
Number of plays on Radio 1
Attractiveness of the band

30
Descriptives and Scattergrams
ZRESID (y axis) against ZPRED (x axis)
31
Correlations (look for the highest correlations of
variables with record sales, as sales is the
variable of interest, or criterion variable)
32
Model summary and ANOVA
Use Adjusted R Square
Look to see if the model is predicting significantly
above chance
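
Adjusted R² and the F ratio in the ANOVA table can both be derived from R², the number of cases and the number of predictors. The sketch below uses the standard formulas with hypothetical values (R² = .5, 60 cases, 3 predictors); none of these numbers come from the record sales output, and the function names are assumptions.

```python
def adjusted_r_squared(r2, n, k):
    """Shrink R² to allow for the number of predictors k and the number of cases n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_ratio(r2, n, k):
    """Variance explained per predictor over unexplained variance per residual df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical model summary values, for illustration only
r2, n, k = 0.5, 60, 3

adj = adjusted_r_squared(r2, n, k)
f = f_ratio(r2, n, k)
print(round(adj, 4), round(f, 2))
```

Judging whether the model predicts above chance means comparing F against the F distribution with k and n - k - 1 degrees of freedom; that step is left to the statistics package and is not reproduced here.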
33
Collinearity statistics
Check the VIF and tolerance to see if there is
cause for concern
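
Tolerance and VIF are two views of the same quantity: how much of one predictor's variance is explained by the other predictors. A minimal sketch under stated assumptions: `r2_j` stands for the R² obtained when predictor j is regressed on the remaining predictors, and the correlation of .6 between two predictors is hypothetical.

```python
def tolerance(r2_j):
    """Proportion of predictor j's variance NOT shared with the other predictors."""
    return 1 - r2_j


def vif(r2_j):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / (1 - r2_j)


# With only two predictors, r2_j is simply their squared correlation.
r_between_predictors = 0.6  # hypothetical value
r2_j = r_between_predictors ** 2

tol = tolerance(r2_j)
inflation = vif(r2_j)
print(round(tol, 4), round(inflation, 4))
```

Low tolerance (and correspondingly high VIF) signals that a predictor is largely redundant with the others. Commonly quoted rules of thumb treat a VIF above about 10 as a cause for concern, but the cut-off is a convention rather than a strict test.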
34
Beta Weights
Beta weights are the values obtained when the
regression equation is calculated using z scores.
This allows comparison of different types of
data.
So the best predictors are advertising budget and
number of plays on Radio 1, as they have the highest
values of beta
t and p values tell you whether each variable is
predicting above chance or not
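
The link between ordinary slopes and beta weights can be seen by fitting the same line twice, once on raw scores and once on z scores. The sketch below does this for a single predictor with made-up numbers; the variable names `budget` and `sales` are only labels for this illustration and the values are not taken from the record sales data.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise scores to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def slope(xs, ys):
    """Least squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up scores, for illustration only
budget = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

b = slope(budget, sales)                         # unstandardised slope
beta = slope(z_scores(budget), z_scores(sales))  # slope on z scores
print(b, beta)
```

The beta weight equals the raw slope multiplied by the ratio of the predictor's standard deviation to the criterion's, which is why betas can be compared across predictors measured in different units. With a single predictor, beta is the same as the correlation r.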
35
Partial correlations?
Note how the correlation between advertising
budget and number of plays is reduced when the
attractiveness of the band is removed.
