Simple Linear Regression and Correlation: Inferential Method - PowerPoint PPT Presentation

Provided by: shish

1
Chapter 13
  • Simple Linear Regression and Correlation
    Inferential Method

2
13.1 Simple Linear Regression Model
  • Deterministic Model and Probabilistic Model
  • Deterministic Model: The value of y is completely
    determined by the value of an independent
    variable x.
  • y = f(x).
  • Probabilistic Model: The variables of interest (y
    and x) are not deterministically related. The
    equation of the additive probabilistic model is
  • y = deterministic function of x + random
    deviation
  •   = f(x) + e

3
The Simple Linear Regression Model
  • The simple linear regression model assumes that
    there is a line with y-intercept α and slope β,
    called the true or population regression line.
    When a value of the independent variable x is
    fixed and an observation on the dependent
    variable y is made,
  • y = α + βx + e
  • Without the random deviation e, all observed (x,
    y) points would fall exactly on the population
    regression line. The inclusion of e in the model
    equation recognizes that points will deviate from
    the line.

4
Two observations resulting from the simple linear
regression model
5
Basic Assumptions of the Simple Linear Regression
Model
  • The distribution of e at any particular x value
    has mean value 0. That is, μ_e = 0.
  • The standard deviation of e (which describes the
    spread of its distribution) is the same for any
    particular value of x. This standard deviation is
    denoted by σ.
  • The distribution of e at any particular x value
    is normal.
  • The random deviations e_1, e_2, ..., e_n associated
    with different observations are independent of
    one another.

6
The distribution of y values in repeated sampling
  • For any fixed x value, y itself has a normal
    distribution, with
  • μ_y = (mean y value for fixed x) = (height of the
    population regression line above x) = α + βx
    and
  • σ_y = (standard deviation of y for a fixed x) = σ
  • The slope β of the population regression line is
    the average change in y associated with a 1-unit
    increase in x.
  • The y-intercept α is the height of the population
    line when x = 0.
  • The value of σ determines the extent to which (x,
    y) observations deviate from the population line.
  • When σ is small, most observations are quite
    close to the line.
  • When σ is large, there are likely to be some
    substantial deviations.
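
The behavior of y in repeated sampling at a fixed x can be checked with a small simulation. This is only a sketch: the parameter values (α = 2, β = 0.5, σ = 1) and the choice x = 10 are made up for illustration and do not come from the slides.

```python
import random
import statistics

# Hypothetical parameter values, chosen only for illustration.
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0

def simulate_y(x, n, rng):
    """Draw n observations y = ALPHA + BETA*x + e, with e ~ Normal(0, SIGMA)."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for _ in range(n)]

rng = random.Random(13)  # fixed seed so the run is repeatable
ys = simulate_y(x=10.0, n=20000, rng=rng)

# The sample mean should be close to the height of the population line
# at x = 10, and the sample standard deviation should be close to SIGMA.
print("height of population line at x = 10:", ALPHA + BETA * 10.0)
print("sample mean of y:", round(statistics.mean(ys), 3))
print("sample sd of y:", round(statistics.stdev(ys), 3))
```

Rerunning with a larger SIGMA spreads the simulated y values further from the line, which is the point made in the last two bullets.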

7
Simple Linear Regression Model
8
Example: Stand on Your Head to Lose Weight?
  • Amateur wrestlers who are overweight near the end
    of the weight certification period, but just
    barely so, have been known to stand on their
    heads for a minute or two, get on their feet,
    step back on the scale, and establish that they
    are in the desired weight class. Using a head
    stand as the method of last resort has become a
    fairly common practice in amateur wrestling. Does
    this really work?
  • Data were collected in an experiment where weight
    loss was recorded for each wrestler after
    exercising for 15 min and then doing a headstand
    for 1 min 45 sec.
  • Based on the data, it was concluded that there
    was in fact a demonstrable weight loss that was
    greater than that for a control group that
    exercised for 15 min but did not do the headstand.

Continued on next page
9
Example: Stand on Your Head to Lose Weight?
  • Let y = weight loss (in pounds) and x = body
    weight before exercise and headstand (in pounds).
  • The author concluded that a simple linear
    regression model was a reasonable way to relate y
    and x. Suppose the actual model equation has
    α = 0, β = 0.001, and σ = 0.09.
  • If the distribution of the random errors at
    any fixed weight (x value) is normal, then the
    variable y = weight loss is normally distributed
    with
  • μ_y = 0 + 0.001x, σ_y = 0.09.
  • What is the expected weight loss (in pounds) for
    a 190-lb wrestler?

μ_y = 0.001(190) = 0.19 lb
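
The same calculation can be sketched in Python using the parameter values stated in the example. The extra probability at the end (that a 190-lb wrestler shows a positive weight loss) is not asked for on the slide; it is included only to show how the normal distribution of y at a fixed x can be used.

```python
from statistics import NormalDist

# Values stated in the example: alpha = 0, beta = 0.001, sigma = 0.09.
ALPHA, BETA, SIGMA = 0.0, 0.001, 0.09

def mean_weight_loss(body_weight_lb):
    """Height of the population regression line at the given body weight."""
    return ALPHA + BETA * body_weight_lb

mu = mean_weight_loss(190)
print(f"expected weight loss: {mu:.2f} lb")

# Illustrative extra (not on the slide): y at x = 190 is Normal(mu, SIGMA),
# so P(y > 0) is the area to the right of 0 under that curve.
dist = NormalDist(mu=mu, sigma=SIGMA)
p_positive = 1 - dist.cdf(0.0)
print(f"P(weight loss > 0): {p_positive:.3f}")
```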
10
Some commonly encountered patterns in scatterplots
In practice, the judgment of whether the simple
linear regression model is appropriate must be based
on how the data were collected and on a
scatterplot of the data.
Figure (a): Pattern consistent with the simple
linear regression model
Figure (b): Pattern consistent with a nonlinear
probabilistic model
Figure (c): Pattern suggesting that variability in
y changes with x
11
(No Transcript)
12
Estimating the Population Regression Line
  • The estimated regression line is just the
    least-squares line ŷ = a + bx.
  • Let x* denote a specified value of the predictor
    variable x. Then a + bx* has two different
    interpretations:
  • It is a point estimate of the mean y value when
    x = x*, and
  • It is a point prediction of an individual y value
    to be observed when x = x*.
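
A minimal sketch of the least-squares computation follows. It uses the standard formulas b = S_xy / S_xx and a = ȳ - b·x̄; the five data points and the value x* = 3.5 are made up for illustration, and the function name is hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
x_star = 3.5
y_hat = a + b * x_star

# The single number y_hat is both the point estimate of the mean y value
# at x* and the point prediction of an individual y value at x*.
print(f"a = {a:.2f}, b = {b:.2f}, a + b*x* = {y_hat:.3f}")
```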

13
Example: Mother's Age and Baby's Birth Weight
  • Medical researchers have noted that adolescent
    females are much more likely to deliver
    low-weight babies than are adult females. An
    article gives the data in the table on the right
    with
  • x = maternal age (in years) and
  • y = birth weight of baby (in grams)
  • (a) Find the equation of the estimated regression
    line.
  • (b) Find the point estimate of the average birth
    weight of babies born to 18-year-old mothers.
  • (c) Predict the birth weight of a baby to be born
    to a particular 18-year-old mother.

14
Solution to Example: Mother's Age and Baby's
Birth Weight
  • (a) The equation of the estimated linear
    regression line is the least-squares line.

The least-squares line is ŷ = a + bx.
Excel solution on next slides
15
The estimated regression line is just the
least-squares line, and therefore we can use
Excel (as in Chapter 5) to find the estimated
regression line: Data → Data Analysis → Regression
16
Input x and y ranges. (y range comes first in the
Regression dialog box.)
17
In the Excel output, we find a and b in the
Coefficients column: a = -1163.45 and b =
245.15. The estimated regression line is then
ŷ = -1163.45 + 245.15x.
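
Parts (b) and (c) of the example both come down to evaluating the estimated line at x = 18. The sketch below uses the coefficients quoted from the Excel output; the function name is hypothetical.

```python
# Coefficients quoted from the Excel output above.
A, B = -1163.45, 245.15

def predicted_birth_weight(age_years):
    """Evaluate the estimated regression line a + b*x at the given age."""
    return A + B * age_years

estimate = predicted_birth_weight(18)

# (b) point estimate of the mean birth weight for 18-year-old mothers
# (c) point prediction for one particular 18-year-old mother
# Both use the same value.
print(f"{estimate:.2f} g")
```

Under these coefficients the value is -1163.45 + 245.15(18) = 3249.25 grams.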
18
(No Transcript)
19
Some Remarks about s_e
  • In simple linear regression, estimation of α and
    β results in a loss of 2 degrees of freedom,
    leaving n - 2 as the number of degrees of freedom
    for SSResid, s_e², and s_e.
  • The coefficient of determination,
    r² = 1 - SSResid/SSTo,
    can be interpreted as the proportion of observed
    y variation that can be explained by the model
    relationship.
  • s_e is the magnitude of a typical sample
    deviation (residual) from the least-squares line.
    The smaller the value of s_e, the closer the points
    in the sample fall to the line and the better the
    line does in predicting y from x.
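
As a sketch of how these quantities fit together, the function below computes SSResid, SSTo, s_e = sqrt(SSResid / (n - 2)), and r² = 1 - SSResid/SSTo. The data set and the function name are made up for illustration.

```python
import math

def fit_summary(xs, ys):
    """Least-squares fit plus SSResid, SSTo, s_e and r^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Residual sum of squares: squared vertical deviations from the line.
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: squared deviations from the mean of y.
    ss_to = sum((y - y_bar) ** 2 for y in ys)
    s_e = math.sqrt(ss_resid / (n - 2))   # n - 2 degrees of freedom
    r2 = 1 - ss_resid / ss_to
    return {"a": a, "b": b, "ss_resid": ss_resid,
            "ss_to": ss_to, "s_e": s_e, "r2": r2}

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

result = fit_summary(xs, ys)
print(f"SSResid = {result['ss_resid']:.3f}, SSTo = {result['ss_to']:.3f}")
print(f"s_e = {result['s_e']:.3f}, r^2 = {result['r2']:.4f}")
```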

20
Example: Woodpecker Hole Depth
  • Woodpeckers are a valuable forest asset. An
    article reported on a study of how woodpeckers
    behaved when provided with polystyrene cylinders
    as an alternative roost and nest cavity substrate
    at different ambient temperatures. (See data on
    next slide.)
  • Let
  • x = ambient temperature (°C) and
  • y = cavity depth (in centimeters)
  • (a) Find the estimated linear regression line.
  • (b) Does the model appear to be useful for
    estimation and prediction?

The scatterplot shows a negative linear
relationship between x and y.
Solution: The estimated linear regression line is
obtained from the Excel output.
Data on next slide and Excel output on the slide
after next
21
Data for Example Woodpecker Hole Depth
22
  • r² = 0.767 indicates that 76.7% of the observed
    variation in cavity depth y can be attributed to
    the probabilistic linear relationship with
    ambient temperature.
  • The estimated standard deviation s_e = 2.33 is the
    magnitude of a typical sample deviation from the
    least-squares line, which is reasonably small
    compared to the y values. So the model appears to
    be useful.

23
13.2 Inference about β (the slope of the
population regression line)
  • The slope β in the simple linear regression model
    is the average or expected change in y associated
    with a 1-unit increase in x.
  • The value of β is almost always unknown; it has
    to be estimated from the slope b of the
    least-squares line.
  • The value of the statistic b may vary from sample
    to sample, so how accurately does b estimate β?
  • We need some facts about the sampling
    distribution of b:
  • Where is the curve centered relative to β?
  • How much does the curve spread out about its
    center?

24
Properties of the Sampling Distribution of b
  • When the four basic assumptions of the simple
    linear regression model are satisfied, the
    following conditions are met:
  • The mean value of b is β. That is, μ_b = β, so the
    sampling distribution of b is always centered at
    the value of β.
  • The standard deviation of the statistic b is
    σ_b = σ / √S_xx, where S_xx = Σ(x - x̄)².
  • The statistic b has a normal distribution (a
    consequence of the model assumption that the
    random deviation e is normally distributed).
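
These properties can be illustrated by simulating many samples from one fixed model and looking at the resulting slopes. This is a sketch under made-up assumptions: the parameter values, the six x values, and the number of repetitions are all hypothetical. The comparison uses the standard result that the standard deviation of b is σ divided by the square root of S_xx = Σ(x - x̄)².

```python
import math
import random
import statistics

# Hypothetical model parameters and fixed x values, for illustration only.
ALPHA, BETA, SIGMA = 1.0, 2.0, 0.5
XS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

def slope(xs, ys):
    """Least-squares slope b = S_xy / S_xx."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / s_xx

rng = random.Random(24)  # fixed seed so the run is repeatable
slopes = []
for _ in range(5000):
    # One simulated sample: same x values, fresh random deviations.
    ys = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in XS]
    slopes.append(slope(XS, ys))

x_bar = sum(XS) / len(XS)
s_xx = sum((x - x_bar) ** 2 for x in XS)
theoretical_sd = SIGMA / math.sqrt(s_xx)

print("mean of simulated slopes:", round(statistics.mean(slopes), 4))
print("sd of simulated slopes:  ", round(statistics.stdev(slopes), 4))
print("sigma / sqrt(S_xx):      ", round(theoretical_sd, 4))
```

The mean of the simulated slopes should sit close to BETA, and their standard deviation close to the theoretical value, up to simulation noise.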

25
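The estimated standard deviation of b
The estimated standard deviation of the statistic b is
s_b = s_e / √S_xx, where S_xx = Σ(x - x̄)².
When the four basic assumptions of the simple
linear regression model are satisfied, the
probability distribution of the standardized
variable
t = (b - β) / s_b
is the t distribution with df = n - 2.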
26
Confidence Interval for ß
  • When the four basic assumptions of the simple
    linear regression model are satisfied, a
    confidence interval for β, the slope of the
    population regression line, has the form
  • b ± (t critical value)(s_b)
  • where the t critical value is based on
    df = n - 2.

27
Example: Athletic Performance and Cardiovascular
Fitness
  • Is cardiovascular fitness (as measured by time to
    exhaustion from running on a treadmill) related
    to an athlete's performance in a 20-km ski race?
  • Let x = treadmill time to exhaustion (in
    minutes) and
  • y = 20-km ski time (in minutes).
  • Construct a 95% confidence interval for β, the
    slope of the population regression line.
  • Solution: The slope β is the average change in
    ski time associated with a 1-minute increase in
    treadmill time.
  • Assumption: The distribution of errors at
    any given x is approximately normal.
  • The t critical value based on df = n - 2 = 11 -
    2 = 9 is 2.26 (from Appendix Table 3).

Continued on next slide
28
From the Excel output below, b = -2.3335 and s_b =
0.591. The 95% confidence interval for β is
b ± (t critical value)(s_b) = -2.3335 ±
(2.26)(0.591) = -2.3335 ± 1.336 = (-3.671, -0.999).
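
The interval arithmetic can be sketched as a small function. The inputs are the values quoted on the slide; the function name is hypothetical. Because the inputs are rounded, the endpoints computed this way can differ from the slide's in the third decimal place.

```python
def slope_confidence_interval(b, s_b, t_crit):
    """Return (lower, upper) for b +/- (t critical value)(s_b)."""
    margin = t_crit * s_b
    return b - margin, b + margin

# Values quoted on the slide: b, s_b, and the t critical value for df = 9.
lower, upper = slope_confidence_interval(b=-2.3335, s_b=0.591, t_crit=2.26)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")

# Both endpoints are negative, so the interval excludes 0.
print("interval excludes 0:", upper < 0)
```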
29
Hypothesis Tests Concerning β
  • Null hypothesis: H0: β = hypothesized value
  • Test statistic: t = (b - hypothesized value) / s_b
    (The test is based on df = n - 2.)
  • Alternative hypothesis and corresponding P-value:
  • Ha: β > hypothesized value: area to the right
    of the computed t under the appropriate t
    curve
  • Ha: β < hypothesized value: area to the left of
    the computed t under the appropriate t curve
  • Ha: β ≠ hypothesized value: 2 × (area to the
    right of t) if t > 0, or
  • 2 × (area to the left of t) if t < 0
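
The three P-value rules can be sketched in Python without a statistics library by integrating the t density numerically. This is an illustration, not the slides' method (which reads areas from a table): the function names are hypothetical, the areas come from Simpson's rule, and the numbers at the end reuse b and s_b from the ski example with an assumed hypothesized value of 0.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and symmetry about 0."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t, df, alternative):
    """P-value for 'greater', 'less', or 'two-sided' alternatives."""
    if alternative == "greater":
        return 1 - t_cdf(t, df)          # area to the right of t
    if alternative == "less":
        return t_cdf(t, df)              # area to the left of t
    if alternative == "two-sided":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("unknown alternative: " + alternative)

# Ski example values, with hypothesized value 0 assumed for illustration.
b, s_b, hypothesized, df = -2.3335, 0.591, 0.0, 9
t_stat = (b - hypothesized) / s_b
print(f"t = {t_stat:.3f}")
print(f"two-sided P-value = {p_value(t_stat, df, 'two-sided'):.4f}")
```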

30
Model Utility Test for Simple Linear Regression
  • The model utility test for simple linear
    regression is the test of
  • H0: β = 0 versus Ha: β ≠ 0
  • The null hypothesis specifies that there is no
    useful linear relationship between x and y,
    whereas the alternative hypothesis specifies that
    there is a useful linear relationship between x
    and y.
  • If H0 is rejected, we conclude that the simple
    linear regression model is useful for predicting
    y.
  • The test procedure in the previous box (with
    hypothesized value 0) is used to carry out the
    model utility test; in particular, the test
    statistic is the t ratio t = b / s_b.

31
Example: University Graduation Rates
  • The data on the right present the six-year
    graduation rate (%), student-related expenditure
    per full-time student, and median SAT score for a
    random sample of 15 primarily undergraduate
    public universities and colleges in the US with
    enrollment between 10,000 and 20,000 students.

32
  • Part (a) of Example: University Graduation Rates
  • Is there a useful linear relation between
    graduation rate (y) and median SAT score (x)?
  • Conduct a model utility test using α =
    .05.
  • Solution: By the definition of slope, β = the
    true average change in y (graduation rate)
    associated with an increase of 1 point in x
    (median SAT score).
  • H0: β = 0, Ha: β ≠ 0
  • Significance level: α = 0.05.
  • Assumption: Assuming that the distribution of
    errors at any given x value is approximately
    normal, the assumptions of the simple linear
    regression model are appropriate.

Excel output on next slide
33
  • Excel output: b = 0.132, a = 91.31, r² = 0.576,
    s_b = 0.031

34
Solution to Part (a) of Example: University
Graduation Rates
  • Because r² = .576, about 57.6% of
    observed variation in graduation rates can be
    explained by the simple linear regression model.
    (The correlation coefficient r = 0.76.) It
    appears that there is a useful linear relation
    between x and y, but a confirmation requires a
    formal model utility test.
  • H0: β = 0, Ha: β ≠ 0
  • Significance level: α = 0.05
  • P-value = 2(.001) = .002 < α (Ha: β ≠ 0
    requires a two-tailed test.)
  • Conclusion: Since P-value < α, we reject H0. We
    conclude that there is a useful linear
    relationship between graduation rate and median
    SAT score.
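
The t ratio behind this conclusion can be sketched from the rounded Excel values b = 0.132 and s_b = 0.031 with n = 15. The critical value 2.160 (two-tailed, 0.05 level, df = 13) is taken from a standard t table and is not quoted on the slides; since the inputs are rounded, the t ratio here may differ slightly from the one in the Excel output.

```python
# Rounded values from the Excel output, with n = 15 universities.
b, s_b, n = 0.132, 0.031, 15

t_ratio = b / s_b   # model utility test statistic (hypothesized value 0)
df = n - 2

# Assumption: two-tailed critical value for alpha = 0.05 and df = 13,
# read from a standard t table rather than from the slides.
T_CRIT_05_DF13 = 2.160

reject_h0 = abs(t_ratio) > T_CRIT_05_DF13
print(f"t = {t_ratio:.2f}, df = {df}, reject H0: {reject_h0}")
```

Rejecting H0 when |t| exceeds the critical value is equivalent to the P-value being smaller than the significance level, so this agrees with the conclusion above.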

35
Exercise: Part (b) of Example: University
Graduation Rates
  • Is there a useful linear relation between
    graduation rate (y) and expenditure per full-time
    student (x)?
  • Let β = the true average change in y
    (graduation rate, in %) associated with an
    increase of 1 unit in x (expenditure per
    full-time student).
  • Conduct a model utility test using
    α = .05.

Answer: P-value = .092 > α. We fail to reject H0:
β = 0. There is no convincing evidence of a
linear relationship between graduation rate and
expenditure per full-time student.