STAT131 Week 3 Lecture 2 Least Squares regression - PowerPoint PPT Presentation

Slides: 40
Provided by: AP39

1
STAT131 Week 3 Lecture 2 Least Squares regression
  • Anne Porter
  • alp_at_uow.edu.au

2
Bivariate relationships
  • Examples of questions regarding bivariate data
    are:
  • Is there a relationship between one quantitative
    variable and another quantitative variable?
  • Is there a correlation between oxygen uptake
    and heart rate?

3
Scatterplots
  • These are used to portray the visual
    relationship.
  • For example there may be a positive relationship
    between two quantitative variables, oxygen intake
    and heart rate.
  • Meaning: as heart rate increases, oxygen intake
    also increases.
  • Scatterplots reveal: 1. 2. 3. 4.

4
Scatterplots
  • These are used to portray the visual
    relationship.
  • For example there may be a positive relationship
    between two quantitative variables, oxygen intake
    and heart rate.
  • Meaning: as heart rate increases, oxygen intake
    also increases.
  • Scatterplots reveal
  • Whether there is evidence of nonlinearity
  • Approximate strength and direction of the
    relationship
  • Whether there are any bivariate outliers
  • Whether any points are likely to be influential

5
Pearson's correlation coefficient r
  • Measures the direction and strength of the
    straight line relationship
  • r can take on values from -1 to 1
  • a random scatter of points will give a
    correlation close to 0
  • a scatterplot must be used to check whether using
    the correlation is appropriate, i.e. that there is
    no evidence of nonlinearity
  • there are many equivalent formulae for
    calculating r
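
As a rough illustration of one of these equivalent formulae, the sketch below computes r from sums of deviations about the means. The function name `pearson_r` and the three small data sets are hypothetical and are not taken from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data (not from the lecture).
perfect_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # exact line, positive slope
perfect_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])    # exact line, negative slope
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2, strongly nonlinear
print(perfect_up, perfect_down, curved)
```

The curved data set has an exact relationship (y = x squared) yet gives r = 0, which is why the scatterplot has to be inspected before r is interpreted.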

6
Fitting a line through AIDS incidence data
Why might we want to do this?
7
Prediction
  • Models of existing data
  • May be useful for prediction

BUT
8
Many models may fit the data
So how will we choose?
AIDS incidence in NZ
9
Many models may fit the data
So how will we choose?
This will in part depend upon the discipline
experts and what they know about the behaviour
of the variable.
AIDS incidence in NZ
10
Form of a relationship
  • If variables can be classified as a response (y
    axis) and an explanatory variable (x axis)
  • Then we may want to describe the mathematical
    form of the relationship
  • The simplest form being the equation of a
    straight line

(In SPSS and the social sciences, the response and
explanatory variables are referred to as the dependent
and independent variables, not to be confused with
independence in probability!)
11
  • Fitting Lines to Data, Unit 12, Decisions through
    Data

12

How do we put a straight line through data?
13
Possible procedure
  • Put in a line,
  • measure the vertical distances from the points to
    the line,
  • square them,
  • sum the squared distances,
  • repeat the process until we have the best fitting
    line
  • that is the line with the smallest sum of squared
    distances.
  • Fortunately we do not have to go through an
    iterative process to do this; we can use the
    mathematics of fitting lines to data.
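
The contrast between the trial-and-error procedure and the direct calculation can be sketched as below. The data are the four (siblings, bedrooms) pairs from the simple example later in this lecture. The grid of candidate lines, the helper names and the symbols a (intercept) and b (slope) are assumptions made for illustration.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def closed_form_fit(xs, ys):
    """Least squares intercept and slope computed directly, with no searching."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

xs = [0, 2, 3, 2]  # siblings, from the simple example
ys = [1, 4, 5, 3]  # bedrooms, from the simple example

# Trial and error: try every intercept and slope from 0.00 to 2.00 in steps of 0.01
# (a hypothetical search grid) and keep the line with the smallest sum of squares.
candidates = [(i / 100, j / 100) for i in range(201) for j in range(201)]
grid_a, grid_b = min(candidates, key=lambda ab: sum_squared_residuals(ab[0], ab[1], xs, ys))

# Direct calculation.
a, b = closed_form_fit(xs, ys)

print(f"grid search : a = {grid_a:.2f}, b = {grid_b:.2f}")
print(f"closed form : a = {a:.4f}, b = {b:.4f}")
```

The grid search examines over 40,000 candidate lines and still only approximates the answer, while the direct calculation gives it in a handful of arithmetic steps.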

14
Least squares regression line
Minimise the sum of the squared residuals
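
Written out, the quantity being minimised could be sketched as follows. The symbols are assumptions rather than the lecture's own notation: a for the intercept, b for the slope, and \hat{y}_i = a + b x_i for the fitted value at x_i.

```latex
\[
\mathrm{SSE}(a, b) \;=\; \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
\;=\; \sum_{i=1}^{n} \bigl(y_i - (a + b\,x_i)\bigr)^2
\]
```

Each term is the squared vertical distance from one data point to the line.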
15
Caution: extrapolation
  • Take care when extrapolating beyond the domain of
    the data upon which the model was developed.
  • Do not use models built on one population to
    predict for another, e.g. models built for males
    are probably not suitable for females.

16

Fig 1. Relationship between the Australian $ and
the English pound (both in US$)
  • What is the nature of the relationship?

As the value of the pound (in US$) increases, the
value of the Australian $ (in US$) decreases.
[Scatterplot; x axis: English Pound (Value in US$)]
17
Correlation coefficients: what does the output mean?
19

Fig 1. Relationship between the Australian $ and
the English pound (both in US$)
  • What is the form of the relationship?
  • What does the slope look like it will be?

Negative: as one variable increases, the other
decreases.
[Scatterplot; x axis: English Pound (Value in US$)]
20

Fig 1. Relationship between the Australian $ and
the English pound
  • What is the intercept with the y axis? Imagine
    the line.

[Scatterplot with the value 2 marked; x axis: English Pound (Value in US$)]
21

Fig 1. Relationship between the Australian $ and
the English pound
  • What is the intercept with the y axis? Imagine
    the line.

About 2.
[Scatterplot with the value 2 marked; x axis: English Pound (Value in US$)]
22

Interpreting the output: use it to write the
equation of the line predicting the Australian $
------------------ Variables in the Equation ------------------
23

OUTPUT
------------------ Variables in the Equation ------------------
25

OUTPUT
Variable(s) Entered on Step Number 1.. E_POUND
Multiple R           .76471
R Square             .58477
Adjusted R Square    .56817
Standard Error       .16689
Analysis of Variance
F 35.20823           Signif F .0000
Annotations:
  • R Square: proportion of variation in the
    Australian $ (y) that can be explained in terms
    of the English pound (x)
  • Measure of variation
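
The figures in this output are tied together by standard formulas, and the sketch below checks them against one another. The sample size is not given in the transcript; n = 27 is an assumption, chosen because it is the value that reproduces the reported Adjusted R Square and F for a single explanatory variable.

```python
multiple_r = 0.76471  # Multiple R, from the output
n = 27                # ASSUMPTION: sample size inferred, not stated in the lecture
k = 1                 # one explanatory variable (E_POUND)

# R Square is the square of Multiple R.
r_square = multiple_r ** 2

# Adjusted R Square penalises R Square for the number of explanatory variables.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# F compares explained variation with unexplained variation, each per degree of freedom.
f_stat = (r_square / k) / ((1 - r_square) / (n - k - 1))

print(f"R Square          = {r_square:.5f}  (output: .58477)")
print(f"Adjusted R Square = {adjusted_r_square:.5f}  (output: .56817)")
print(f"F                 = {f_stat:.3f}  (output: 35.20823)")
```

Small differences in the last decimal places are expected because the calculation starts from the rounded value of Multiple R.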
26

It is easy to get it wrong by specifying the
wrong variables to be x and y
------------------ Variables in the Equation ------------------
The intercept is not near the 2.0 that the graph suggested
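
Why the roles of x and y matter can be seen in a small sketch. The currency data are not in the transcript, so this uses the four (siblings, bedrooms) pairs from the simple example later in the lecture. The helper name `fit_line` is hypothetical.

```python
def fit_line(explanatory, response):
    """Least squares intercept and slope for predicting response from explanatory."""
    n = len(explanatory)
    mean_e = sum(explanatory) / n
    mean_r = sum(response) / n
    s_er = sum((e - mean_e) * (r - mean_r) for e, r in zip(explanatory, response))
    s_ee = sum((e - mean_e) ** 2 for e in explanatory)
    slope = s_er / s_ee
    intercept = mean_r - slope * mean_e
    return intercept, slope

siblings = [0, 2, 3, 2]
bedrooms = [1, 4, 5, 3]

# Correct roles for predicting bedrooms: x = siblings, y = bedrooms.
a_yx, b_yx = fit_line(siblings, bedrooms)

# Roles swapped by mistake: x = bedrooms, y = siblings.
a_xy, b_xy = fit_line(bedrooms, siblings)

print(f"bedrooms on siblings: intercept = {a_yx:.3f}, slope = {b_yx:.3f}")
print(f"siblings on bedrooms: intercept = {a_xy:.3f}, slope = {b_xy:.3f}")
```

The two fits have different intercepts and slopes, and the second is not simply the first line rearranged, so the output only matches the graph when the variables are entered in the intended roles.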
27
Regression
  • Correct results need correct explanatory (x) and
    response (y) variables.
  • If appropriate, the response variable can be
    predicted from the equation of the straight line
  • There are assumptions we need to check to assess
    adequacy of the model.
  • If Signif F is not very small, say < .05, then the
    equation/model is not useful.

28
Regression
  • The correlation squared is the proportion of
    variation in one variable that can be explained
    in terms of the other.
  • If r = 0.9 then r² = 0.81.
  • 81% of the variation in y can be explained in
    terms of x.
  • 19% of the variation in y scores cannot be
    explained.
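
The same split into explained and unexplained variation can be sketched numerically. The data are the four (siblings, bedrooms) pairs from the simple example later in the lecture. The variable names are assumptions made for illustration.

```python
import math

xs = [0, 2, 3, 2]  # siblings (x)
ys = [1, 4, 5, 3]  # bedrooms (y)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least squares line and correlation coefficient.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

# Total variation in y, and the part the line leaves unexplained.
total_variation = syy
unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
proportion_explained = 1 - unexplained / total_variation

print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
print(f"proportion of variation explained   = {proportion_explained:.4f}")
print(f"proportion of variation unexplained = {1 - proportion_explained:.4f}")
```

For these four points r squared and the proportion explained agree at about 0.94, leaving about 6% of the variation in y unexplained.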

29
Regression
  • If the regression line is good
  • Explains a large proportion of variation
  • Explains more variation than other studies
  • F is significant
  • The value of x is in the domain on which the
    model is based
  • Then we can use it to predict responses with the
    equation

30

Mathematics of fitting lines through many points
Equation of the line

Slope
Intercept on y axis
You MUST correctly specify y, the response
variable, and x, the explanatory variable
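
One standard way of writing these three pieces is sketched here. The symbols a (intercept), b (slope), and \bar{x} and \bar{y} (the sample means) are assumptions and may differ from the notation used on the slide. The sums of x^2 and xy correspond to the column headings used in the simple example.

```latex
\[
\hat{y} = a + b\,x, \qquad
b = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum x^{2} - n\,\bar{x}^{2}}, \qquad
a = \bar{y} - b\,\bar{x}
\]
```

Swapping x and y changes the denominator of b, which is why the two variables must be assigned to the correct roles.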
31
Simple example
  • Find the form of the relationship between the
    number of siblings (x) and the number of bedrooms
    (y)

Siblings (x)   Bedrooms (y)
0              1
2              4
3              5
2              3
What should we do first?
32
Simple example
  • Find the form of the relationship between the
    number of siblings (x) and the number of bedrooms
    (y)

Siblings (x)   Bedrooms (y)
0              1
2              4
3              5
2              3
[Scatterplot of the four points, each marked x]
What should we do first?
33
Simple example
  • Doing the mathematics to find
  • and
  • to substitute in

       Siblings (x)   Bedrooms (y)   x²   xy
       0              1
       2              4
       3              5
       2              3
Sum    7              13
34
Simple example
  • Doing the mathematics to find
  • and
  • to substitute in

       Siblings (x)   Bedrooms (y)   x²   xy
       0              1              0    0
       2              4              4    8
       3              5              9    15
       2              3              4    6
Sum    7              13             17   29
35
Simple example
  • Doing the mathematics to find
  • and
  • to substitute in

Mean of x: 7/4 = 1.75
Mean of y: 13/4 = 3.25

       Siblings (x)   Bedrooms (y)   x²   xy
       0              1              0    0
       2              4              4    8
       3              5              9    15
       2              3              4    6
Sum    7              13             17   29
36
Simple example
  • Doing the mathematics to find
  • and
  • to substitute in

       Siblings (x)   Bedrooms (y)   x²   xy
       0              1              0    0
       2              4              4    8
       3              5              9    15
       2              3              4    6
Sum    7              13             17   29
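
The arithmetic in the table can be reproduced with a short sketch. It assumes the usual sums-based formulas for the least squares slope and intercept, which may be written differently on the slide. The variable names are illustrative.

```python
siblings = [0, 2, 3, 2]  # x, from the table
bedrooms = [1, 4, 5, 3]  # y, from the table

n = len(siblings)
sum_x = sum(siblings)
sum_y = sum(bedrooms)
sum_x2 = sum(x * x for x in siblings)
sum_xy = sum(x * y for x, y in zip(siblings, bedrooms))

mean_x = sum_x / n  # 7/4
mean_y = sum_y / n  # 13/4

# Assumed standard formulas: slope from the sums, intercept from the means.
slope = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
intercept = mean_y - slope * mean_x

print(f"sums : x = {sum_x}, y = {sum_y}, x^2 = {sum_x2}, xy = {sum_xy}")
print(f"means: x = {mean_x}, y = {mean_y}")
print(f"fitted line: y = {intercept:.3f} + {slope:.3f} x")
```

Under these assumptions the fitted line comes out at roughly y = 0.947 + 1.316x.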
37
Simple example
  • We can use
  • And
  • To predict the value of y given a value of x by
    substituting in
  • When x = 1 sibling we predict
  • When x = 2 we predict
    bedrooms
  • When x = 10 we are predicting outside the range of
    the data that was used to develop the model.
    Caution!
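
The substitution step, together with a guard against extrapolation, might be sketched as follows. The line is refitted from the four data points using the standard least squares formulas. The helper names `fit_line` and `predict` are hypothetical, and the printed values may be rounded differently from those on the slide.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(x, intercept, slope, x_min, x_max):
    """Return the predicted y and whether x lies inside the observed range of x."""
    return intercept + slope * x, x_min <= x <= x_max

siblings = [0, 2, 3, 2]
bedrooms = [1, 4, 5, 3]
intercept, slope = fit_line(siblings, bedrooms)
x_min, x_max = min(siblings), max(siblings)

for x in (1, 2, 10):
    y_hat, inside = predict(x, intercept, slope, x_min, x_max)
    note = "" if inside else "  <- outside the observed range of x, treat with caution"
    print(f"x = {x}: predicted bedrooms = {y_hat:.2f}{note}")
```

The model still returns a number for x = 10, but nothing in the data (0 to 3 siblings) supports the straight line continuing that far.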

39
Next Lecture
  • Probability
  • Assignment 1 may now be completed