Linear Regression and the Coefficient of Determination

Provided by: Odysseas


1
Section 4.2
  • Linear Regression and the Coefficient of
    Determination

2
The Least Squares Line
  • When there appears to be a linear relationship
    between x and y we attempt to fit a line to
    the scatter diagram.

Least Squares Criterion
The sum of the squares of the vertical distances
from the points to the line is made as small as
possible.
3
Least Squares Criterion
  • d represents the difference between the y
    coordinate of the data point and the
    corresponding y coordinate on the line.
  • Thus if the data point lies above the line, d is
    positive, but if the data point lies below the
    line, d is negative.
  • As a result, the sum of the d values can be small
    even if the points are widely spread in the
    scatter diagram.
  • However, the squares cannot be negative.
  • By minimizing the sum of the squares, we are, in
    effect, not allowing positive and negative d
    values to cancel out one another in the sum.
  • In this way we meet the least-squares criterion:
    minimizing the sum of the squares of the vertical
    distances between the points and the line, over
    all points in the scatter diagram.
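
A small numeric sketch of the cancellation described above. The four points and the candidate line y = 10 are made-up values chosen for illustration; they are not data from the presentation.

```python
# Hypothetical points scattered widely around a candidate line.
points = [(1, 15), (2, 5), (3, 17), (4, 3)]

def line(x):
    """Candidate line for this sketch (an assumption): y = 10 for every x."""
    return 10.0

# d = observed y minus the y coordinate on the line
# (positive above the line, negative below it).
d_values = [y - line(x) for x, y in points]

sum_d = sum(d_values)                          # positive and negative d cancel
sum_d_squared = sum(d * d for d in d_values)   # squares cannot cancel

print(d_values)       # [5.0, -5.0, 7.0, -7.0]
print(sum_d)          # 0.0
print(sum_d_squared)  # 148.0
```

The sum of the d values is zero even though every point is 5 or 7 units from the line, while the sum of squares reflects the spread.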

4
Equation of the Least Squares Line
  • y = a + bx

a = the y-intercept
b = the slope
5
Finding the Equation of the Least Squares Line
  • Obtain a random sample of n data pairs (x, y).
  • 1. Using the data pairs, compute Σx, Σy, Σx²,
    Σy², and Σxy.
  • Compute the sample means: x̄ = Σx / n and
    ȳ = Σy / n.

6
Finding the Slope
  • 2. Use the following formula for the slope:
    b = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²)
  • Finding the y-intercept: a = ȳ − b·x̄
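
The computation can be sketched in Python, assuming the standard computational formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄. The function name `least_squares_line` and the sample pairs are illustrative only; the pairs are hypothetical and lie exactly on y = 1 + 2x so the result is easy to check.

```python
def least_squares_line(pairs):
    """Return (a, b) for the line y = a + b*x fitted to (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)

    # Sample means
    x_bar = sum_x / n
    y_bar = sum_y / n

    # Slope, then intercept
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
a, b = least_squares_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(a, b)  # 1.0 2.0
```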

7
Example: Find the Least Squares Line
8
Example cont.: Finding the Slope
9
Example cont.: Finding the y-intercept
The equation of the least squares line is
y = a + bx, that is, y = 2.77 + 1.70x
10
Graph the Least-Squares Line
  • We can use the slope-intercept method of algebra,
    but this may not always be convenient if the
    intercept is not within the range of the sample
    data values.
  • It is better to select two x values in the range
    of the x data values and then use the
    least-squares line to compute the two
    corresponding y values.
  • The point (x̄, ȳ) is always on the
    least-squares line.
  • To find another point, give x a value and find
    the y.
  • In our example, (x̄, ȳ) = (8.3, 16.9).

Try x = 5. Compute y: y = 2.8 + 1.7(5) = 11.3
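
As a quick check of the two plotting points, here is a minimal sketch. The helper name `y_hat` is illustrative; the coefficients are the ones quoted in the example (2.77 and 1.70, or 2.8 and 1.7 after rounding).

```python
def y_hat(x, a, b):
    """Height of the line y = a + b*x at a given x."""
    return a + b * x

# The point (8.3, 16.9) from the example, using the unrounded coefficients
print(round(y_hat(8.3, 2.77, 1.70), 1))  # 16.9

# A second point at x = 5, using the rounded coefficients from the slide
print(round(y_hat(5, 2.8, 1.7), 1))      # 11.3
```

Plotting (8.3, 16.9) and (5, 11.3) and joining them gives the least-squares line.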
11
Graphing the least squares line
  • Using two values in the range of x, compute two
    corresponding y values.
  • Plot these points.
  • Join the points with a straight line.

12
Sketching the Line
13
Meaning of Slope
  • In the equation y = a + bx, the slope b tells us
    how many units y changes for each unit change in
    x.
  • In our example regarding the miles traveled and
    the time in minutes:
  • y = 2.77 + 1.70x
  • The slope 1.70 tells us that each additional mile
    adds, on average, 1.70 minutes to the trip.
  • The slope of the least-squares line tells how
    many units the response variable is expected to
    change for each unit change in the explanatory
    variable. The number of units of change in the
    response variable for each unit change in the
    explanatory variable is called the marginal
    change of the response variable.

14
Using the Equation of the Least Squares Line to
Make Predictions
  • Choose a value for x (within the range of x
    values).
  • Substitute the selected x in the least squares
    equation.
  • Determine corresponding value of y.

15
Predict the time to make a trip of 14 miles
  • Equation of least squares line
  • y = 2.8 + 1.7x
  • Substitute x = 14
  • y = 2.8 + 1.7(14)
  • y = 26.6
  • According to the least squares equation, a trip
    of 14 miles would take 26.6 minutes.

16
Interpolation
  • Using the least squares line to predict y values
    for x values that are between observed x values
    in the data set.

Extrapolation
  • Using the least squares line to predict y values
    for x values that are beyond observed x values in
    the data set.
17
Extreme Data Points
  • The least squares line can be greatly affected by
    extreme or influential data points.
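
To see how strongly one extreme point can pull the line, here is a sketch with made-up data: five points lying exactly on y = 2x, then the same points plus one hypothetical outlier at (10, 0). The helper `fit_line` is an illustrative name and uses the usual least-squares formulas.

```python
def fit_line(pairs):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

base = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]  # hypothetical, exactly y = 2x
with_outlier = base + [(10, 0)]                   # one hypothetical extreme point

a0, b0 = fit_line(base)
a1, b1 = fit_line(with_outlier)
print(a0, b0)                      # 0.0 2.0
print(round(a1, 3), round(b1, 3))  # 6.23 -0.295
```

A single added point changes the slope from 2 to a negative value in this example.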

18
The least squares line
  • Is developed from sample data pairs (x, y).
  • May not reflect the relationship between x and y
    for values of x outside the data range.
  • For example, there is a fairly high correlation
    between height and age for boys ages 1 year to 10
    years. In general, the older the boy, the taller
    the boy. A least-squares line based on such data
    gives good predictions of height for ages 1 to
    10.
  • However, it would be fairly meaningless to use
    the same linear regression line to predict the
    heights of 20- to 50-year-olds.

19
The least squares line
  • Each different sample of data will produce a
    slightly different equation for the least-squares
    line.
  • The least-squares line developed with x as the
    explanatory variable and y as the response
    variable can be used only to predict y values
    from specified x values.



20
A statistic related to r
  • If the sample correlation coefficient is r, then
    the coefficient of determination is r².

How good is the least squares line as an
instrument of regression? The answer is given by
the coefficient of determination.
Coefficient of Determination
A measure of the proportion of the variation in y
that is explained by the regression line, using x
as the predicting variable.
21
Interpretation of r²
  • If r = 0.9753643, then what percent of the
    variation in minutes (y) is explained by the
    linear relationship with x, miles traveled?
  • What percent is unexplained?
  • If r = 0.9753643, then r² = 0.9513355.
  • Approximately 95 percent of the variation in
    minutes (y) is explained by the linear
    relationship with x, miles traveled.
  • Approximately 5 percent is unexplained (due to
    random chance or the possibility of lurking
    variables that influence y).
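
The arithmetic behind these percentages, as a short sketch. The variable names are illustrative; the only input is the r value quoted on the slide.

```python
r = 0.9753643  # sample correlation coefficient from the slide

r_squared = r ** 2                        # coefficient of determination
explained_pct = 100 * r_squared           # percent of variation in y explained
unexplained_pct = 100 * (1 - r_squared)   # percent left unexplained

print(round(r_squared, 7))        # 0.9513355
print(round(explained_pct, 1))    # 95.1
print(round(unexplained_pct, 1))  # 4.9
```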

Assignments 7, 8 and 9
22
Correlation Coefficient, r, and Coefficient of
Determination, r² (calc)
  • The correlation coefficient, r, and the
    coefficient of determination, r², will appear
    on the screen that shows the regression equation
    information. Be sure the Diagnostics are turned
    on: press 2nd CATALOG (above 0), arrow down to
    DiagnosticOn, and press ENTER twice.
  • In addition to appearing with the regression
    information, the values r and r² can be found
    under VARS, 5: Statistics → EQ, 7: r and 8: r².

23
Linear Regression (calc)
  • A linear regression is also known as the "line of
    best fit".
  • Side note: Although commonly used when dealing
    with "sets" of data, the linear regression can
    also be used simply to find the equation of the
    line between two points.
  • Find the equation of the line passing through
    (-1, 1) and (-4, 7). Entering the information as
    described in the example below, we see the
    following screens.
  • The equation is y = -2x - 1. The correlation
    coefficient is -1 since both points are "on" the
    line and the line slopes negatively.
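
The two-point case can be checked off the calculator with a short sketch. The function name `linreg` is illustrative; it applies the usual least-squares and correlation formulas to the two points given above.

```python
from math import sqrt

def linreg(pairs):
    """Return intercept a, slope b, and correlation r for y = a + b*x."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    return a, b, r

a, b, r = linreg([(-1, 1), (-4, 7)])
print(a, b, r)  # -1.0 -2.0 -1.0
```

With only two points the line passes through both, so r is exactly -1 or +1 depending on the sign of the slope.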

24
Linear Regression Model Example (calc)
  • Let's examine an example of the linear regression
    as it pertains to a "set" of data. 
  • Data: Is there a relationship between Math SAT
    scores and the number of hours spent studying for
    the test? A study was conducted involving 20
    students as they prepared for and took the Math
    section of the SAT Examination.
  • Let x be the Hours Spent Studying and y be the
    Math SAT Score.

     x    y       x    y       x    y
     4   390     22   790     10   690
     9   580      1   350     11   690
    10   650      3   400     16   770
    14   730      8   590     13   700
     4   410     11   640     13   730
     7   530      5   450     10   640
    12   600      6   520

25
Linear Regression Model Example cont.
  • Task
  • a) Determine a linear regression model equation
    to represent this data.
  • b) Graph the new equation.
  • c) Decide whether the new equation is a "good
    fit" to represent this data.
  • d) Interpolate data: If a student studied for
    15 hours, based upon this study, what would be
    the expected Math SAT score?
  • e) Interpolate data: If a student obtained a
    Math SAT score of 720, based upon this study, how
    many hours did the student most likely spend
    studying?
  • f) Extrapolate data: If a student spent 100
    hours studying, what would be the expected Math
    SAT score? Discuss this answer.
  • Any answers in relation to this problem are to
    be rounded to the nearest tenth. If rounding is
    not indicated in a problem, leave the full
    calculator entries as answers.

26
Linear Regression Model Example cont.
  • Step 1. Enter the data into the lists.
  • Step 2. Create a scatter plot of the data.
    Go to STATPLOT (2nd Y=) and choose the first
    plot. Turn the plot ON, set the icon to Scatter
    Plot (the first one), set Xlist to L1 and Ylist
    to L2 (assuming that is where you stored the
    data), and select a Mark of your choice.
  • Step 3. Choose the Linear Regression Model.
    Press STAT, arrow right to CALC, and arrow down
    to 4: LinReg(ax+b). Hit ENTER. When LinReg
    appears on the home screen, type the parameters
    L1, L2, Y1. The Y1 will put the equation into Y=
    for you. (Y1 comes from VARS → Y-VARS,
    Function, Y1.)

27
Linear Regression Model Example cont.
  • Step 4. Graph the Linear Regression Equation
    from Y1. Use ZOOM 9: ZoomStat to see the graph.
    (answer to part b)
  • Step 5. Is this model a "good fit"? The
    correlation coefficient, r, is .9336055153, which
    places the correlation into the "strong"
    category. (0.8 or greater is a "strong"
    correlation.) The coefficient of determination,
    r², is .8716192582, which means that 87% of the
    total variation in y can be explained by the
    relationship between x and y. The other 13%
    remains unexplained. Yes, it is a "good fit".
    (answer to part c)
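
For readers without the calculator, the regression and the two diagnostics can be reproduced with a short sketch. The list names `hours` and `scores` are illustrative; the values are the 20 data pairs from the example, and the formulas are the usual least-squares ones rather than anything specific to the calculator.

```python
from math import sqrt

# The 20 (hours studied, Math SAT score) pairs from the example
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3,
         8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
scores = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
          590, 640, 450, 520, 690, 690, 770, 700, 730, 640]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in scores)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))

b = sxy / sxx              # slope
a = y_bar - b * x_bar      # y-intercept
r = sxy / sqrt(sxx * syy)  # correlation coefficient
r_squared = r ** 2         # coefficient of determination

print(f"y = {a:.4f} + {b:.4f}x")              # y = 353.1649 + 25.3265x
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")  # r = 0.9336, r^2 = 0.8716
```

The r and r² values agree with the calculator figures quoted in Step 5.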

28
Linear Regression Model Example cont.
  • Step 6. Interpolate (within the data set): If a
    student studied for 15 hours, based upon this
    study, what would be the expected Math SAT
    score? From the graph screen, hit TRACE, arrow
    up to obtain the linear equation at the top of
    the screen, type 15, hit ENTER, and the answer
    will appear at the bottom of the screen.
    (answer to part d: Math SAT score of 733.1)

29
Linear Regression Model Example cont.
  • Step 7. Interpolate (within the data set): If a
    student obtained a Math SAT score of 720, based
    upon this study, how many hours did the student
    most likely spend studying? Go to TBLSET (above
    WINDOW) and set the TblStart to 13 (since 13
    hours gives a score of 700). Set the delta Tbl
    to a decimal setting of your choice. Go to TABLE
    and arrow up or down to find your desired score
    of 720 in the Y1 column.
  • (answer to part e: approx. 14.5 hours)
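
Parts d and e can be checked with a short sketch that refits the line from the 20 data pairs. The helper names (`fit_line`, `predict_score`, `hours_for_score`) are illustrative. Note that `hours_for_score` reads the same y-on-x line backwards, mirroring the TABLE lookup; it is not a separate regression of hours on score.

```python
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3,
         8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
scores = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
          590, 640, 450, 520, 690, 690, 770, 700, 730, 640]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = fit_line(hours, scores)

def predict_score(h):
    """Expected score for h hours of study (part d)."""
    return a + b * h

def hours_for_score(s):
    """Hours at which the fitted line reaches score s (part e)."""
    return (s - a) / b

print(round(predict_score(15), 1))     # 733.1
print(round(hours_for_score(720), 1))  # 14.5
```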

30
Linear Regression Model Example cont.
  • Step 8. Extrapolate data (beyond the data set):
    If a student spent 100 hours studying, what
    would be the expected Math SAT score? Discuss
    this answer.
  • With your linear equation in Y1, go to the home
    screen and type Y1(100). Press ENTER. (Y1 comes
    from VARS → Y-VARS, Function, Y1.)
  • Our equation shows that if a student studies 100
    hours, he/she should score 2885.8 on the Math
    section of the SAT examination. The only problem
    with this answer is that the highest score that
    can be obtained is 800. So why is this score so
    outrageous?
  • ANSWER: When you extrapolate data, the further
    you move away from the data set, the less
    accurate your information becomes. In this
    problem, the largest number of hours in the data
    set was 22 hours, but the extrapolation tried to
    jump to 100 hours. (answer to part f)
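
One way to guard against this kind of mistake is to flag any prediction whose x value lies outside the observed range. The sketch below refits the line from the 20 data pairs; `predict_with_flag` is an illustrative helper, not a calculator feature.

```python
hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 3,
         8, 11, 5, 6, 10, 11, 16, 13, 13, 10]
scores = [390, 580, 650, 730, 410, 530, 600, 790, 350, 400,
          590, 640, 450, 520, 690, 690, 770, 700, 730, 640]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = fit_line(hours, scores)

def predict_with_flag(h):
    """Return (predicted score, True if h is outside the observed hours)."""
    is_extrapolation = h < min(hours) or h > max(hours)
    return a + b * h, is_extrapolation

score, extrapolated = predict_with_flag(100)
print(round(score, 1), extrapolated)  # 2885.8 True

score15, extrapolated15 = predict_with_flag(15)
print(round(score15, 1), extrapolated15)  # 733.1 False
```

The flag only reports that 100 hours is beyond the 1 to 22 hours observed; it does not know that SAT scores are capped at 800.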

31
Example: Linear Regression with Biological
Data (or the realities of working with real-life
data)
  • Pierce (1949) measured the frequency (the number
    of wing vibrations per second) of chirps made by
    a ground cricket, at various ground
    temperatures. Since crickets are ectotherms
    (cold-blooded), the rate of their physiological
    processes and their overall metabolism are
    influenced by temperature. Consequently, there
    is reason to believe that temperature would have
    a profound effect on aspects of their behavior,
    such as chirp frequency.

32
Example cont.
  • Chirps/Second Temperature (º F)
  • 20.0 88.6
  • 16.0 71.6
  • 19.8 93.3
  • 18.4 84.3
  • 17.1 80.6
  • 15.5 75.2
  • 14.7 69.7
  • 17.1 82.0
  • 15.4 69.4
  • 16.2 83.3
  • 15.0 78.6
  • 17.2 82.6
  • 16.0 80.6
  • 17.0 83.5
  • 14.1 76.3

33
Example cont.
  • Task
  • a) Determine a linear regression model equation
    to represent this data.
  • b) Graph the new equation.
  • c) Decide whether the new equation is a "good
    fit" to represent this data.
  • d) Extrapolate data: If the ground temperature
    reached 95º, then at what approximate rate would
    you expect the crickets to be chirping?
  • e) Interpolate data: With a listening device,
    you discovered that on a particular morning the
    crickets were chirping at a rate of 18 chirps per
    second. What was the approximate ground
    temperature that morning?
  • f) If the ground temperature should drop to
    freezing (32º F), what happens to the
    crickets' chirping?
  • Answers in this problem are to be rounded to the
    nearest thousandth.

34
Example cont.
  • Step 1. Enter the data into the lists.
  • Step 2. Create a scatter plot of the data.
    Go to STATPLOT (2nd Y=) and choose the first
    plot. Turn the plot ON, set the icon to Scatter
    Plot (the first one), set Xlist to L1 and Ylist
    to L2 (assuming that is where you stored the
    data), and select a Mark of your choice.
    Obviously, there is some scatter to this data.
    This variability is the norm, rather than the
    exception, when working with biological data
    sets. Real-life data seldom fall on a nice
    straight line.
  • Step 3. Choose the Linear Regression Model.
    Press STAT, arrow right to CALC, and arrow down
    to 4: LinReg(ax+b). Hit ENTER. When LinReg
    appears on the home screen, type the parameters
    L1, L2, Y1. The Y1 will put the equation into Y=
    for you. (Y1 comes from VARS → Y-VARS,
    Function, Y1.)

35
Example cont.
  • Step 4. Graph the Linear Regression Equation
    from Y1. Use ZOOM 9: ZoomStat to see the graph.
    (answer to part b)
  • Step 5. Is this model a "good fit"? The
    correlation coefficient, r, is .8364792791, which
    just barely places the correlation into the
    "strong" category. (0.8 or greater is a "strong"
    correlation.) The coefficient of determination,
    r², is .6996975844, which means that 70% of the
    total variation in y can be explained by the
    relationship between x and y. The other 30%
    remains unexplained. Yes, it is somewhat of a
    "good fit". (answer to part c)
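
The cricket regression can be reproduced with a short sketch. It assumes, as the later steps imply, that chirps per second is x (list L1) and temperature is y (list L2); the list names `chirps` and `temps` are illustrative, and the values are the 15 data pairs from the table.

```python
from math import sqrt

# The 15 (chirps per second, temperature in ºF) pairs from the table
chirps = [20.0, 16.0, 19.8, 18.4, 17.1, 15.5, 14.7, 17.1,
          15.4, 16.2, 15.0, 17.2, 16.0, 17.0, 14.1]
temps = [88.6, 71.6, 93.3, 84.3, 80.6, 75.2, 69.7, 82.0,
         69.4, 83.3, 78.6, 82.6, 80.6, 83.5, 76.3]

n = len(chirps)
x_bar = sum(chirps) / n
y_bar = sum(temps) / n
sxx = sum((x - x_bar) ** 2 for x in chirps)
syy = sum((y - y_bar) ** 2 for y in temps)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(chirps, temps))

b = sxy / sxx              # slope: degrees F per (chirp per second)
a = y_bar - b * x_bar      # y-intercept
r = sxy / sqrt(sxx * syy)  # correlation coefficient
r_squared = r ** 2         # coefficient of determination

print(f"y = {a:.3f} + {b:.3f}x")              # y = 26.012 + 3.244x
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = 0.836, r^2 = 0.700
```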

36
Example cont.
  • Step 6. Extrapolate (beyond the data set): If
    the ground temperature reached 95º, then at what
    approximate rate would you expect the crickets
    to be chirping? Go to TBLSET (above WINDOW) and
    set the TblStart to 20 (since the highest
    temperature in the data set had 19.8
    chirps/second). Set the delta Tbl to a decimal
    setting of your choice. Go to TABLE (above
    GRAPH) and arrow up or down to find your desired
    temperature, 95º, in the Y1 column.
    (answer to part d: approx. 21.265 chirps per
    second)

37
Example cont.
  • Step 7. Interpolate (within the data set): With
    a listening device, you discovered that on a
    particular morning the crickets were chirping at
    a rate of 18 chirps per second.
  • What was the approximate ground temperature that
    morning? From the graph screen, hit TRACE,
    arrow up to obtain the linear equation, type 18,
    hit ENTER, and the answer will appear at the
    bottom of the screen. (answer to part e: the
    ground temperature will be approx. 84.407º F)
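
Both lookups (95º in Step 6 and 18 chirps per second in Step 7) can be checked with a short sketch that refits the line from the table, taking chirps per second as x and temperature as y. The helper names are illustrative; `chirps_for_temp` reads the fitted line backwards, which is what scrolling the Y1 column of the TABLE amounts to.

```python
chirps = [20.0, 16.0, 19.8, 18.4, 17.1, 15.5, 14.7, 17.1,
          15.4, 16.2, 15.0, 17.2, 16.0, 17.0, 14.1]
temps = [88.6, 71.6, 93.3, 84.3, 80.6, 75.2, 69.7, 82.0,
         69.4, 83.3, 78.6, 82.6, 80.6, 83.5, 76.3]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = fit_line(chirps, temps)

def temp_for_chirps(c):
    """Temperature on the fitted line at c chirps per second (part e)."""
    return a + b * c

def chirps_for_temp(t):
    """Chirp rate at which the fitted line reaches temperature t (part d)."""
    return (t - a) / b

print(round(temp_for_chirps(18), 3))  # 84.407
print(round(chirps_for_temp(95), 3))  # 21.265
```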

38
Example cont.
  • Step 8.  If the ground temperature should drop to
    freezing (32º F), what happens to the cricket's
    chirping?
  • The TABLE tells us that at 32º F there are 1.85
    chirps per second.  So, what does this really
    mean?  Are the crickets cold?
  • These findings are a bit deceiving.  At 32º F,
    the crickets are dead.  The lifespan of a cricket
    in a cold climate is very short.  The crickets
    spend the winter as eggs laid in the soil.  These
    eggs hatch in late spring or early summer, and
    tiny immature crickets called nymphs emerge. 
    Nymphs develop into adults within approximately
    90 days. The adults mate and lay eggs in late
    summer before succumbing to old age or freezing
    temperatures in the fall.
  • Also, remember that the further you extrapolate
    away from the data set, the less reliable the
    information will be.
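
The 32º F figure can be reproduced, together with a flag showing how far outside the data it lies. This sketch refits the line with chirps per second as x and temperature as y; `chirps_for_temp_with_flag` is an illustrative helper.

```python
chirps = [20.0, 16.0, 19.8, 18.4, 17.1, 15.5, 14.7, 17.1,
          15.4, 16.2, 15.0, 17.2, 16.0, 17.0, 14.1]
temps = [88.6, 71.6, 93.3, 84.3, 80.6, 75.2, 69.7, 82.0,
         69.4, 83.3, 78.6, 82.6, 80.6, 83.5, 76.3]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

a, b = fit_line(chirps, temps)

def chirps_for_temp_with_flag(t):
    """Return (chirp rate on the line at temperature t, True if t is unobserved)."""
    outside = t < min(temps) or t > max(temps)
    return (t - a) / b, outside

rate, extrapolated = chirps_for_temp_with_flag(32)
print(round(rate, 2), extrapolated)  # 1.85 True
print(min(temps), max(temps))        # 69.4 93.3
```

The observed temperatures run from 69.4º F to 93.3º F, so 32º F is far below anything the line was fitted to.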