Section 7.3 ~ Best-Fit Lines and Prediction

Slides: 12
Provided by: Sandi97

Transcript and Presenter's Notes

1
Section 7.3 Best-Fit Lines and Prediction
  • Introduction to Probability and Statistics
  • Ms. Young

2
Objective
Sec. 7.3
  • After this section you will become familiar with
    the concept of a best-fit line for a correlation,
    recognize when such lines have predictive value
    and when they may not, understand how the square
    of the correlation coefficient is related to the
    quality of the fit, and qualitatively understand
    the use of multiple regression.

3
Line of Best-Fit
  • The best-fit line (or regression line) on a
    scatterplot is a line that lies closer to the
    data points than any other possible line
  • This can be useful to make predictions based on
    existing data
  • The line of best-fit should have approximately
    the same number of points above it as it has
    below it and it does not have to start at the
    origin
  • The precise line of best-fit can be calculated by
    hand, but doing so is tedious, so it is often
    estimated by eye or found with a calculator
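
The "closer to the data points than any other possible line" idea is usually made precise with the least-squares criterion: choose the line that minimizes the sum of squared vertical distances to the points. The slides do not give the formula, so the following is only a minimal Python sketch under that assumption. The function name best_fit_line and the sample data are made up for illustration.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x about its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, not taken from the slides.
sample_x = [1, 2, 3, 4, 5]
sample_y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = best_fit_line(sample_x, sample_y)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

Note that the intercept is computed from the data rather than forced to zero, which matches the point above that the line does not have to start at the origin.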

4
Cautions in Making Predictions from Best-Fit Lines
  • Don't expect a best-fit line to give a good
    prediction unless the correlation is strong and
    there are many data points
  • If the sample points lie very close to the
    best-fit line, the correlation is very strong and
    the prediction is more likely to be accurate
  • If the sample points lie away from the best-fit
    line by substantial amounts, the correlation is
    weak and predictions tend to be much less
    accurate

5
Cautions in Making Predictions from Best-Fit Lines
  • Don't use a best-fit line to make predictions
    beyond the bounds of the data points to which the
    line was fit
  • Ex. The diagram below represents the
    relationship between candle length and burning
    time. All of the candles in the data were between
    2 in. and 4 in. long, so using the line of best
    fit to make a prediction for lengths far outside
    this range would most likely be inappropriate.
  • According to the line of best-fit, a candle with
    a length of 0 in. burns for 2 minutes, which is
    impossible
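
One way to apply this caution in practice is to have a prediction routine refuse inputs outside the range of the fitted data. The sketch below is illustrative only: the function name predict_within_range is hypothetical, the intercept of 2 minutes and the 2 in. to 4 in. range come from the candle example, and the slope of 10 minutes per inch is a made-up value because the slide does not give one.

```python
def predict_within_range(x, slope, intercept, x_min, x_max):
    """Predict y from the best-fit line, but only inside the fitted x range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the fitted range [{x_min}, {x_max}]; "
            "the best-fit line should not be trusted there"
        )
    return slope * x + intercept


# Candle example: intercept and range from the slide, slope is hypothetical.
SLOPE_MIN_PER_INCH = 10.0   # assumed value, for illustration only
INTERCEPT_MIN = 2.0         # burn time the line gives at 0 in.
MIN_LENGTH_IN, MAX_LENGTH_IN = 2.0, 4.0

print(predict_within_range(3.0, SLOPE_MIN_PER_INCH, INTERCEPT_MIN,
                           MIN_LENGTH_IN, MAX_LENGTH_IN))

try:
    predict_within_range(0.0, SLOPE_MIN_PER_INCH, INTERCEPT_MIN,
                         MIN_LENGTH_IN, MAX_LENGTH_IN)
except ValueError as err:
    print("Refused:", err)
```

Raising an error is a deliberately strict choice; a gentler design might return the prediction together with a warning.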

6
Cautions in Making Predictions from Best-Fit Lines
  • A best-fit line based on past data is not
    necessarily valid now and might not result in
    valid predictions of the future
  • Ex. Economists studying historical data found
    a strong correlation between unemployment and the
    rate of inflation. According to this
    correlation, inflation should have risen
    dramatically in recent years when the
    unemployment rate fell below 6%. But inflation
    remained low, showing that the correlation from
    old data did not continue to hold.
  • Don't make predictions about a population that is
    different from the population from which the
    sample data were drawn
  • Ex. You cannot expect that the correlation
    between aspirin consumption and heart attacks in
    an experiment involving only men will also apply
    to women
  • Remember that a best-fit line is meaningless when
    there is no significant correlation or when the
    relationship is nonlinear
  • Ex. There is no correlation between shoe size
    and IQ, so even though you can draw a line of
    best-fit, it is useless for drawing any conclusions

7
Example 1
  • State whether the prediction (or implied
    prediction) should be trusted in each of the
    following cases, and explain why or why not.
  • You've found a best-fit line for a correlation
    between the number of hours per day that people
    exercise and the number of calories they consume
    each day. You've used this correlation to predict
    that a person who exercises 18 hours per day
    would consume 15,000 calories per day.
  • This prediction would be beyond the bounds of the
    data collected and should therefore not be
    trusted
  • There is a well-known but weak correlation
    between SAT scores and college grades. You use
    this correlation to predict the college grades of
    your best friend from her SAT scores.
  • Since the correlation is weak, that means that
    there is much scatter in the data and you should
    not expect great accuracy in the prediction
  • Historical data have shown a strong negative
    correlation between birth rates and affluence
    across countries. That is, countries with greater
    affluence tend to have lower birth rates. These
    data predict a high birth rate in Russia.
  • We cannot automatically assume that the
    historical data still apply today. In fact,
    Russia currently has a very low birth rate,
    despite also having a low level of affluence.

8
Example 1 (Cont'd)
  • A study in China has discovered correlations that
    are useful in designing museum exhibits that
    Chinese children enjoy. A curator suggests using
    this information to design a new museum exhibit
    for Atlanta-area school children.
  • The suggestion to use information from the
    Chinese study for an Atlanta exhibit assumes that
    predictions made from correlations in China also
    apply to Atlanta. However, given the cultural
    differences between China and Atlanta, the
    curator's suggestion should not be considered
    without more information to back it up.
  • Scientific studies have shown a very strong
    correlation between children's ingestion of lead
    and mental retardation. Based on this
    correlation, paints containing lead were banned.
  • Given the strength of the correlation and the
    severity of the consequences, this prediction and
    the ban that followed seem quite reasonable. In
    fact, later studies established lead as an actual
    cause of mental retardation, making the rationale
    behind the ban even stronger.

9
The Correlation Coefficient and Best-Fit Lines
  • Recall that the correlation coefficient (r)
    measures the strength of a correlation
  • The correlation coefficient can also be used to
    say something about the validity of predictions
    with best-fit lines
  • The coefficient of determination, r², is the
    proportion of the variation in a variable that is
    accounted for by the best-fit line
  • Ex. The correlation coefficient for the diamond
    weight and price from the scatterplot on p.307 is
    r = 0.777, so r² = 0.604. This means that about
    60% of the variation in the diamond prices is
    accounted for by the best-fit line relating
    weight and price, and the remaining 40% of the
    variation in price must be due to other factors.
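
To see why r² can be read as a "proportion of variation accounted for", it may help to compute it two ways on a small data set: by squaring r, and as 1 minus the ratio of leftover (residual) variation to total variation about the mean. The sketch below assumes a least-squares best-fit line. The function names and the five data points are made up for illustration; they are not the diamond data.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def explained_fraction(xs, ys):
    """Fraction of the variation in y accounted for by the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # Variation left over after using the line (residuals) ...
    residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # ... compared with the total variation of y about its mean.
    total = sum((y - my) ** 2 for y in ys)
    return 1.0 - residual / total


# Hypothetical data, for illustration only.
demo_x = [1, 2, 3, 4, 5]
demo_y = [2, 1, 4, 3, 5]
r = correlation(demo_x, demo_y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(f"explained fraction = {explained_fraction(demo_x, demo_y):.3f}")

# The slide's arithmetic for the diamond example: 0.777 squared.
print(round(0.777 ** 2, 3))  # 0.604
```

For a straight-line fit with one predictor, the two calculations agree, which is what justifies the "60% accounted for, 40% due to other factors" reading.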

10
Example 2
  • You are the manager of a large department store.
    Over the years, you've found a reasonably strong
    positive correlation between your September sales
    and the number of employees you'll need to hire
    for peak efficiency during the holiday season.
    The correlation coefficient is 0.950. This year
    your September sales are fairly strong. Should
    you start advertising for help based on the
    best-fit line?
  • r² ≈ 0.903, which means that about 90% of the
    variation in the number of peak employees can be
    accounted for by a linear relationship with
    September sales, leaving only 10% unaccounted for
  • Because 90% is so high, it is a good idea to
    predict the number of employees you'll need using
    the best-fit line

11
Multiple Regression
  • Multiple regression is a technique that allows us
    to find a best-fit equation relating one variable
    to more than one other variable
  • Ex. Relating the price of diamonds to carat
    weight, cut, clarity, and color
  • The coefficient of determination (R²) is the most
    common measure in a multiple regression
  • This tells us how much of the scatter in the data
    is accounted for by the best-fit equation
  • If R² is close to 1, the best-fit equation should
    be very useful for making predictions within the
    range of the data
  • If R² is close to 0, the predictions are
    essentially useless
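
The slides treat multiple regression only qualitatively. For readers who want to see the mechanics, here is a rough Python sketch that assumes the best-fit equation is found by least squares, solving the normal equations with plain Gaussian elimination. The function names and the data are hypothetical, and the predictors must be numeric (qualities such as cut or color would first need to be coded as numbers, which is not shown).

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_regression(rows, ys):
    """Fit y = b0 + b1*x1 + ... by least squares.

    rows is a list of predictor tuples. Returns (coefficients, r_squared),
    where coefficients[0] is the intercept b0.
    """
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coeffs, d)) for d in design]
    mean_y = sum(ys) / len(ys)
    residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    total = sum((y - mean_y) ** 2 for y in ys)
    return coeffs, 1.0 - residual / total


# Hypothetical data: two numeric predictors per observation.
demo_rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
demo_ys = [8.1, 7.9, 18.2, 16.8, 24.9]
coeffs, r_squared = multiple_regression(demo_rows, demo_ys)
print("coefficients:", [round(c, 3) for c in coeffs])
print("R squared:", round(r_squared, 3))
```

In practice a statistics package would normally be used instead of hand-rolled elimination; the point here is only to show where R² comes from.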