Descriptive methods in regression and correlation - PowerPoint PPT Presentation

Provided by: grif54
Transcript and Presenter's Notes


1
MATH 1530 Elements of Statistics
Dr. Kirsten Boyd
  • Chapter 4
  • Descriptive methods in regression and correlation

Slides adapted from Ms Smyth, Dr. Griffy and the
Weiss Text
3
Sec. 4.1 Linear Equations with One Independent
Variable
  • The graph of the equation y = b0 + b1x is a straight
    line with y-intercept b0 and slope b1. This is the
    same as the y = mx + b slope-intercept form you
    probably saw in algebra class, but with different
    letters.
  • Example: y = 5 + 2x
  • y-intercept, b0, is?
  • slope, b1, is?
  • Graph this line

4
Slope
Word-problem interpretation of slope: whenever
x increases by one unit, y increases by b1 units
(or decreases if b1 is negative, or stays the
same if b1 is zero).
5
Problem 4.6, page 160
  • A repair shop charges $55 per hour plus a $30
    service charge. Let x denote the number of hours
    required for the job and let y denote the total
    cost to the customer.
  • Part a. Find the equation that expresses y in
    terms of x.
  • Part b. Determine b0 and b1 .
  • Part c. Construct a table like Table 4.1 on p.
    157 for the x-values 0.5, 1, and 2.25 hours.
  • Part d. Draw the graph of the equation from Part
    a. by plotting the points from Part c. and
    connecting with a line.
  • Part e. Use the graph from Part d. to estimate
    visually the cost of a job that takes 1.75 hours.
    Then calculate the cost exactly, using the
    equation from Part a.
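
As a quick numerical check on Parts a through e, the cost equation can be sketched in Python. This assumes the charges are in dollars (a $30 flat fee plus $55 per hour); the function name total_cost is only illustrative.

```python
def total_cost(hours, b0=30.0, b1=55.0):
    """Total cost y = b0 + b1 * x for a job lasting `hours` hours."""
    return b0 + b1 * hours

# Costs for the x-values in Part c, plus the 1.75-hour job from Part e.
for x in (0.5, 1, 2.25, 1.75):
    print(f"x = {x:>4} hours  ->  y = {total_cost(x):.2f}")
```

Here b0 is the service charge (the cost at x = 0) and b1 is the hourly rate (the extra cost per additional hour).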

6
Sec. 4.2 The Regression Equation
  • Regression Equations explain a (linear) pattern
    in scatterplot data
  • x is the explanatory or predictor variable
  • y is the response variable

7
Scatterplot (Table 4.2 and Fig. 4.7, page 162)
8
Regression Equation
  • The goal is to construct the line for which the sum
    of the squared vertical distances from the data
    points to the corresponding points on the line is
    as small as possible.
  • This line is the graph of the regression
    equation.

9
Example 4.3 (p. 163) Which Line Is Better?
10
Example 4.3 Comparing Lines
Sum of the squared errors is less for B than for
A. Line B is a better fit than A.
Error: e = y - ŷ
Notation: for any x-value, y is the actual value and
ŷ is the value from the line.
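
The comparison of two candidate lines by their sum of squared errors can be sketched as follows. The data and both lines are made up for illustration; they are not the values from Example 4.3.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Sum of e^2, where e = y - y_hat and y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, not taken from the textbook.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

sse_a = sum_squared_errors(xs, ys, b0=1.0, b1=1.0)  # Line A: y_hat = 1 + x
sse_b = sum_squared_errors(xs, ys, b0=0.5, b1=1.4)  # Line B: y_hat = 0.5 + 1.4x

better = "B" if sse_b < sse_a else "A"
print(f"SSE for A = {sse_a:.2f}, SSE for B = {sse_b:.2f}, better fit: {better}")
```

The line with the smaller sum of squared errors is the better fit for that data set.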
11
Best Fitting Line Possible
12
Computing the Regression Equation
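
The formulas on this slide did not survive in the transcript. A minimal sketch, assuming the slide used the standard least-squares formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, is given below; the function name and the data are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: sums of products and squares of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data, not taken from the textbook.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
b0, b1 = least_squares_line(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f}x")
```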
13
Use of Regression Equations
  • The regression equation models the data (not
    perfectly)
  • x-values (explanatory variable) predict values of
    y (response variable)

WARNING: You can predict accurately only within
the spread of the x-values.
14
Extrapolation
  • Using an x-value outside the range of data is
    unacceptable because the trend could change.
    Making predictions for x-values outside the range
    is called extrapolation, and should be avoided.
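
One way to enforce this rule in code is to refuse any prediction whose x-value falls outside the observed range. The sketch below is an assumption about how such a guard might look; the function name, data, and coefficients are made up.

```python
def predict_within_range(x, xs, b0, b1):
    """Return y_hat = b0 + b1 * x, refusing to extrapolate beyond the observed x-values."""
    lo, hi = min(xs), max(xs)
    if not lo <= x <= hi:
        raise ValueError(
            f"x = {x} is outside the observed range [{lo}, {hi}]: extrapolation"
        )
    return b0 + b1 * x

# Made-up x-values and regression coefficients.
xs = [1, 2, 3, 4]
print(predict_within_range(2.5, xs, b0=0.5, b1=1.4))  # inside the range: allowed

try:
    predict_within_range(10, xs, b0=0.5, b1=1.4)      # outside the range: refused
except ValueError as err:
    print(err)
```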

15
Extrapolation
16
Outliers and Influential Observations
  • An outlier is a data point that lies vertically far
    from the regression line relative to the other
    points
  • An influential observation is a data point that lies
    horizontally far from the rest of the data and
    whose removal will considerably change the slope
    of the regression line
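
The effect of an influential observation can be seen by fitting the line with and without the suspect point. This is a sketch on made-up data; the helper slope is hypothetical and uses the standard least-squares formula b1 = Sxy / Sxx.

```python
def slope(xs, ys):
    """Least-squares slope b1 = Sxy / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Made-up data: the last point lies horizontally far from the rest.
xs = [1, 2, 3, 4, 20]
ys = [2, 3, 5, 6, 8]

slope_with = slope(xs, ys)
slope_without = slope(xs[:-1], ys[:-1])
print(f"slope with the point: {slope_with:.3f}, without it: {slope_without:.3f}")
```

In this made-up data set, removing the single far-right point changes the slope substantially, which is what marks it as influential.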

17
Outliers and Influential Observations (Fig. 4.12,
p. 169)
18
Data Must Be Linear
19
Problem 4.53, page 174
20
4.53 a
21
4.53 b
22
4.53 c-g
  • Emission increases as weight increases
  • Δy/Δx = ΔEmissions/ΔPotato Weight = 0.16/1
  • For each additional gram of potato plant weight,
    the emissions increase by 0.16 hundred nanograms,
    which is 16 nanograms.
  • ŷ = 3.524 + 0.1628 × 75 = 15.73
  • Predictor: x = weight of potato plant
  • Response: y = emission quantity
  • none


23
Finding the Regression Equation Using Your
Calculator
  • 1. Enter x-values in L1 and y-values in L2 (or
    alternatively, use INS to insert new lists with
    whatever names you want)
  • 2. Stat > Calc > 8: LinReg(a+bx)
  • 3. Use LIST to enter L1, L2 (or whatever the
    names of your lists are; don't forget the comma
    between them), then press Enter
  • 4. Calculator tells you a and b, which
    correspond to b0 and b1 in the book (equation is
    y = a + bx = b0 + b1x)

Be sure to turn your diagnostic on: Catalog >
Diagnostic On > Enter (Catalog is the 2nd
function above the 0 key)
24
Sec. 4.3 The Coefficient of Determination
  • The coefficient of determination is denoted r²
  • Always between 0 and 1
  • Measures how well the regression equation
    describes the relationship between x and y
  • Close to 0 (0 to 0.4) => regression is not useful
  • Close to 1 (0.6 to 1) => regression is useful

25
Formulas for r²
26
SSE
Error sum of squares, SSE, is the variation in
the observed values of y that is not explained by
the regression.
SSE = SST - SSR
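
The relationship between the three sums of squares can be checked numerically. The sketch below assumes the standard definitions SST = Σ(y - ȳ)², SSR = Σ(ŷ - ȳ)², SSE = Σ(y - ŷ)² and r² = SSR/SST, since the formulas slide itself is not in the transcript; the data are made up.

```python
# Made-up data and its least-squares line y_hat = 0.5 + 1.4x.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
b0, b1 = 0.5, 1.4

y_bar = sum(ys) / len(ys)
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation in y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation explained by the regression
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # variation left unexplained

r2 = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, r^2 = {r2:.2f}")
```

The identity SSE = SST - SSR holds here because the line used is the least-squares line for the data; it does not hold for an arbitrary line.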
27
Coefficient of Determination, r²
  • To calculate r², use your calculator; follow the
    same instructions as for obtaining the regression
    equation.
  • We will not calculate r² by hand.

Be sure to turn your diagnostic on: Catalog >
Diagnostic On > Enter (Catalog is the 2nd
function above the 0 key)
28
Percentage of Variation
  • The percentage of variation in the y-values that
    is explained by the variation in the x-values is
  • r² × 100%

29
4.91 (p. 185, same data as 4.53)
  • Ignore (a) (computing SSR, SST, SSE) in all of
    Sec 4.3
  • r² = 0.1096
  • r² × 100 = 10.96%
  • Not useful

30
Sec. 4.4 Linear Correlation
31
The Linear Correlation Coefficient, r
  • The sign of r indicates the direction of the slope
    of the data
  • A positive r-value means a positive relationship
  • A negative r-value means a negative relationship
  • The magnitude of r indicates the strength of the
    linear relationship (magnitude = how far from
    zero)

Weak => between -0.6 and 0.6
Strong => less than -0.75 or more than 0.75
32
The Linear Correlation Coefficient, r
  • Always between -1 and 1, inclusive
  • Sign of r is same as sign of b1 (slope of
    regression line)
  • Square r and you should always get r²
  • To find r, use your calculator; follow the same
    instructions as for finding the regression
    equation and r²

Be sure to turn your diagnostic on: Catalog >
Diagnostic On > Enter (Catalog is the 2nd
function above the 0 key)
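
The properties listed on this slide can be checked on a small example. The sketch assumes the standard formula r = Sxy / sqrt(Sxx · Syy); the data are made up and chosen to have a negative slope.

```python
import math

# Made-up data with a decreasing trend.
xs = [1, 2, 3, 4]
ys = [6, 5, 3, 2]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

b1 = s_xy / s_xx                    # slope of the regression line
r = s_xy / math.sqrt(s_xx * s_yy)   # linear correlation coefficient

print(f"b1 = {b1:.2f}, r = {r:.4f}, r squared = {r ** 2:.4f}")
print("same sign:", (b1 < 0) == (r < 0))
```

Both b1 and r share the sign of Sxy, which is why the sign of r always matches the sign of the slope.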
33
Linear Relationships
34
Interpreting r
  • r only has meaning if the data is linear
  • r can be computed for nonlinear data, but data
    may not be linear even if r is strong

35
4.125 (p. 194, same data as 4.53 and 4.91)
  • Ignore using the computing formula part of (a)
    in all of Sec. 4.4 and use your calculator
  • r = 0.3311
  • Weak, positive, linear relationship
  • Very scattered, not close to line
  • r² = 0.3311² = 0.1096 (the square of the correlation coefficient)