What Do We Learn from How the Data Vary Around the Regression Line?

Slides: 35
Provided by: KateM170

1
Section 11.4
  • What Do We Learn from How the Data Vary Around
    the Regression Line?

2
Residuals and Standardized Residuals
  • A residual is a prediction error: the difference
    between an observed outcome and its predicted
    value
  • The magnitude of these residuals depends on the
    units of measurement for y
  • A standardized version of the residual does not
    depend on the units

3
Standardized Residuals
  • Standardized residual = $(y - \hat{y}) / se(y - \hat{y})$
  • The se formula is complex, so we rely on software
    to find it
  • A standardized residual indicates how many
    standard errors a residual falls from 0
  • Often, observations with standardized residuals
    larger than 3 in absolute value represent
    outliers
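
A minimal sketch of what the software computes, assuming a straight-line fit and the usual leverage-based standard error of a residual (the deck does not give the se formula, so that part is an assumption). The data are hypothetical, with one point deliberately pulled 3 units below the trend.

```python
import math

def standardized_residuals(x, y):
    """Return residual / se(residual) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    out = []
    for xi, e in zip(x, residuals):
        # Assumed se of a residual: s * sqrt(1 - leverage)
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        out.append(e / (s * math.sqrt(1 - leverage)))
    return out

# Hypothetical data: a linear trend with small alternating noise and one low point
x = list(range(20))
y = [2 + 0.5 * xi + (0.1 if xi % 2 == 0 else -0.1) for xi in x]
y[10] -= 3.0

z = standardized_residuals(x, y)
# Flag observations more than 3 standard errors from 0
flagged = [i for i, zi in enumerate(z) if abs(zi) > 3]
print(flagged)
```

Only the altered observation is flagged here; with real data, a flagged point still needs a look at the scatterplot before it is called an outlier.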

Typo on Pg 553 of Text. Corrected Version?
4
Example: Detecting an Underachieving College Student
  • Data were collected on a sample of 59 students at
    the University of Georgia
  • Two of the variables were:
  • CGPA: College Grade Point Average
  • HSGPA: High School Grade Point Average

Example 13 in Text
5
Example: Detecting an Underachieving College Student
  • A regression equation was created from the data
  • x = HSGPA
  • y = CGPA
  • Equation

6
Example: Detecting an Underachieving College Student
  • MINITAB highlights observations that have
    standardized residuals with absolute value larger
    than 2

7
Example: Detecting an Underachieving College Student
  • Consider the reported standardized residual of
    -3.14
  • This indicates that the residual is 3.14 standard
    errors below 0
  • This student's actual college GPA is quite far
    below what the regression line predicts

8
Analyzing Large Standardized Residuals
  • Does the observation fall well away from the linear
    trend that the other points follow?
  • Does it have too much influence on the results?
  • Note: Some large standardized residuals may
    occur just because of ordinary random variability
9
Histogram of Residuals
  • A histogram of residuals or standardized
    residuals is a good way of detecting unusual
    observations
  • A histogram is also a good way of checking the
    assumption that the conditional distribution of y
    at each x value is normal
  • Look for a bell-shaped histogram

10
Histogram of Residuals
  • Suppose the histogram is not bell-shaped
  • The distribution of the residuals is not normal
  • However:
  • Two-sided inferences about the slope parameter
    still work quite well
  • The t-inferences are robust

11
The Residual Standard Deviation
  • For statistical inference, the regression model
    assumes that the conditional distribution of y at
    a fixed value of x is normal, with the same
    standard deviation at each x
  • This standard deviation, denoted by σ, refers to
    the variability of y values for all subjects with
    the same x value

12
The Residual Standard Deviation
  • The estimate of σ, obtained from the data, is
    $s = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$
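
As a rough illustration, the residual standard deviation s (the estimate of σ) is the square root of the sum of squared residuals divided by n - 2. The sketch below fits the line by least squares first; the four data points are hypothetical.

```python
import math

def residual_sd(x, y):
    """Estimate sigma by s = sqrt(sum of squared residuals / (n - 2))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

# Hypothetical data: fitted line is y-hat = 0.2 + 0.2x, residual SS = 0.8
s = residual_sd([0, 1, 2, 3], [0, 1, 0, 1])
print(round(s, 4))
```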

13
Example: How Variable Are the Athletes' Strengths?
  • From MINITAB output, we obtain s, the residual
    standard deviation of y
  • For any given x value, we estimate the mean y
    value using the regression equation and we
    estimate the standard deviation using s: s = 8.0

14
Confidence Interval for µy
  • We estimate µy, the population mean of y at a
    given value of x, by $\hat{y} = a + bx$
  • We can construct a 95% confidence interval for
    µy using $\hat{y} \pm t_{.025}(\text{se of } \hat{y})$
15
Prediction Interval for y
  • The estimate $\hat{y} = a + bx$ for the mean of y
    at a fixed value of x is also a prediction for an
    individual outcome y at the fixed value of x
  • Most regression software will form this interval
    within which an outcome y is likely to fall
  • This is called a prediction interval for y

(See Figure 11.10)
16
The Residual Standard Deviation
  • Difference in limit of CI and s

17
Prediction Interval for y vs Confidence Interval
for µy
  • The prediction interval for y is an inference
    about where individual observations fall
  • Use a prediction interval for y if you want to
    predict where a single observation on y will fall
    for a particular x value

18
Prediction Interval for y vs Confidence Interval
for µy
  • The confidence interval for µy is an inference
    about where a population mean falls
  • Use a confidence interval for µy if you want to
    estimate the mean of y for all individuals having
    a particular x value
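
The contrast between the two intervals can be sketched numerically. This is an illustration under the standard simple-regression formulas (an assumption, since the deck relies on MINITAB output): both intervals are centered at the predicted value, but the prediction interval adds the variability of a single observation. The data are hypothetical, and `t_crit` is a t-score with df = n - 2 that the caller looks up.

```python
import math

def mean_ci_and_prediction_interval(x, y, x0, t_crit):
    """Return (y_hat, ci, pi) at x0 for the least-squares line.

    t_crit is the t-score (t_.025 for 95% intervals) with df = n - 2,
    supplied by the caller from a t table or software.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))          # residual standard deviation
    y_hat = a + b * x0
    shared = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(shared)       # se for estimating the mean of y
    se_pred = s * math.sqrt(1 + shared)   # se for predicting one new y
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

# Hypothetical data (not the bench press data); n = 6, so df = 4 and t_.025 = 2.776
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
y_hat, ci, pi = mean_ci_and_prediction_interval(x, y, x0=3.5, t_crit=2.776)
print(ci, pi)
```

The prediction interval is always the wider of the two, because a single observation varies around the mean that the confidence interval is trying to locate.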

19
Example: Predicting Maximum Bench Press and Estimating its Mean
20
Example: Predicting Maximum Bench Press and Estimating its Mean
  • Use the MINITAB output to find and interpret a
    95% CI for the population mean of the maximum
    bench press values for all female high school
    athletes who can do x = 11 sixty-pound bench
    presses
  • For all female high school athletes who can do 11
    sixty-pound bench presses, we estimate the mean
    of their maximum bench press values falls between
    78 and 82 pounds

21
Example: Predicting Maximum Bench Press and Estimating its Mean
  • Use the MINITAB output to find and interpret a
    95% prediction interval for a single new
    observation on the maximum bench press for a
    randomly chosen female high school athlete who
    can do x = 11 sixty-pound bench presses
  • For all female high school athletes who can do 11
    sixty-pound bench presses, we predict that 95% of
    them have maximum bench press values between 64
    and 96 pounds

22
Decomposing the Error
Or: Regression SS + Residual SS = Total SS
F = (MS Reg)/(MSE). The F test is more general than the t test (in the
cases studied in this class, F is effectively t squared). However, in more
complicated models (more explanatory variables), the difference and utility
of this become apparent.
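
A small numerical check of both statements, assuming a one-predictor least-squares fit: the regression and residual sums of squares add up to the total, and F equals the square of the t statistic for the slope. The data are hypothetical.

```python
import math

def sums_of_squares(x, y):
    """Decompose the total SS for a straight-line fit and return F and t."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    regression_ss = sum((fi - y_bar) ** 2 for fi in fitted)
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = residual_ss / (n - 2)           # mean square error
    f_stat = (regression_ss / 1) / mse    # MS Reg has 1 df with one predictor
    t_stat = b / math.sqrt(mse / sxx)     # slope divided by its standard error
    return total_ss, regression_ss, residual_ss, f_stat, t_stat

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
total_ss, regression_ss, residual_ss, f_stat, t_stat = sums_of_squares(x, y)
print(f_stat, t_stat ** 2)
```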
23
Section 11.5
  • Exponential Regression: A Model for Nonlinearity

24
Nonlinear Regression Models
  • If a scatterplot indicates substantial curvature
    in a relationship, then equations that provide
    curvature are needed
  • Occasionally a scatterplot has a parabolic
    appearance: as x increases, y increases, then it
    goes back down
  • More often, y tends to continually increase or
    continually decrease but the trend shows
    curvature

25
Example: Exponential Growth in Population Size
  • Since 2000, the population of the U.S. has been
    growing at a rate of 2% a year
  • The population size in 2000 was 280 million
  • The population size in 2001 was 280 x 1.02
  • The population size in 2002 was 280 x (1.02)^2
  • The population size in 2010 is estimated to be
    280 x (1.02)^10
  • This is called exponential growth
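
The arithmetic on this slide can be reproduced in a few lines. The function name and parameter names below are illustrative only; the starting size (280 million) and the yearly growth factor (1.02) come from the slide.

```python
def population_millions(years_since_2000, start=280.0, growth_factor=1.02):
    """Population in millions after multiplying by the growth factor once per year."""
    return start * growth_factor ** years_since_2000

for years in (0, 1, 2, 10):
    print(2000 + years, round(population_millions(years), 1))
```

Each extra year multiplies the previous value by 1.02, which is what separates exponential growth from a straight-line trend that adds a fixed amount each year.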

26
Exponential Regression Model
  • An exponential regression model has the formula
    $\mu_y = \alpha \beta^x$
  • for the mean µy of y at a given value of x, where
    α and β are parameters

27
Exponential Regression Model
  • In the exponential regression equation, the
    explanatory variable x appears as the exponent of
    a parameter
  • The mean µy and the parameter β can take only
    positive values
  • As x increases, the mean µy increases when β > 1
  • It continually decreases when 0 < β < 1

28
Exponential Regression Model
  • For exponential regression, the logarithm of the
    mean is a linear function of x
  • When the exponential regression model holds, a
    plot of the log of the y values versus x should
    show an approximate straight-line relation with x
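
Taking logs of the model µy = α β^x gives log(µy) = log(α) + (log β) x, which is a straight line in x. One common way to fit the model, sketched here as an assumption rather than as what any particular package does, is to regress log(y) on x and then transform the intercept and slope back. The data are hypothetical and lie exactly on y = 3 × 1.5^x.

```python
import math

def fit_exponential(x, y):
    """Fit y-hat = alpha * beta**x by least squares on log(y). Requires y > 0."""
    log_y = [math.log(yi) for yi in y]
    n = len(x)
    x_bar = sum(x) / n
    ly_bar = sum(log_y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y)) / sxx
    intercept = ly_bar - slope * x_bar
    # Back-transform: intercept estimates log(alpha), slope estimates log(beta)
    return math.exp(intercept), math.exp(slope)

# Hypothetical data generated from y = 3 * 1.5**x with no noise
x = [0, 1, 2, 3, 4, 5]
y = [3 * 1.5 ** xi for xi in x]
alpha_hat, beta_hat = fit_exponential(x, y)
print(alpha_hat, beta_hat)
```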

29
Example: Explosion in Number of People Using the Internet
30
Example: Explosion in Number of People Using the Internet
31
Example: Explosion in Number of People Using the Internet
32
Example: Explosion in Number of People Using the Internet
  • Using regression software, we can create the
    exponential regression equation
  • x = the number of years since 1995. Start with
    x = 0 for 1995, then x = 1 for 1996, etc.
  • y = number of Internet users (in millions)
  • Equation: $\hat{y} = 20.38 \times 1.7708^x$

33
Interpreting Exponential Regression Models
  • In the exponential regression model $\mu_y = \alpha \beta^x$,
  • the parameter α represents the mean value of y
    when x = 0
  • The parameter β represents the multiplicative
    effect on the mean of y for a one-unit increase
    in x

34
Example: Explosion in Number of People Using the Internet
  • In this model:
  • The predicted number of Internet users in 1995
    (for which x = 0) is 20.38 million
  • The predicted number of Internet users in 1996 is
    20.38 times 1.7708
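
These predictions follow from the fitted values 20.38 and 1.7708 quoted above. A short sketch (the function name is illustrative) shows that each additional year multiplies the predicted count by 1.7708, so the 1996 prediction is about 36.09 million.

```python
def predicted_users_millions(year, a=20.38, b=1.7708, base_year=1995):
    """Predicted Internet users (millions): a * b**x, where x = years since 1995."""
    return a * b ** (year - base_year)

for year in (1995, 1996, 1997):
    print(year, round(predicted_users_millions(year), 2))
```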