Ch 2 and 9.1 Relationships Between 2 Variables - PowerPoint PPT Presentation

Slides: 32
Provided by: hint9

1
Ch 2 and 9.1: Relationships Between 2 Variables
  • More than one variable can be measured on each
    individual.
  • Examples
  • Gender and Height
  • Size and Cost
  • Eye color and Major
  • We want to look at the relationship among these
    variables.
  • Is there an association between these two
    variables?
  • Two variables measured on the same individuals
    are associated if some values of one variable
    tend to occur more often with some values of the
    second variable than with other values of that
    variable.

2
Relationships Between 2 Variables
  • If we expect one variable to influence another,
    we call it the ___________ variable.
  • Explains or influences changes in the response
    variable
  • The variable that is influenced is called the
    ____________ variable.
  • Measures an outcome of a study
  • In each of the following examples, identify the
    explanatory and response variables
  • Gender and blood pressure
  • Class attendance and course grade
  • Number of beers and BAC

3
Relationships Between 2 Variables
  • We may be interested in relationships of
    different types of variables.
  • Categorical and Numeric
  • Categorical and Categorical
  • Numeric and Numeric

4
Relationships between Categorical and Numeric
Variables
  • We are interested in comparing the numerical
    variable across each of the levels of the
    categorical variable.
  • Examples
  • Compare high speeds for 4 different car brands
  • Compare sucrose levels for 5 different types of
    fruit
  • Compare GPR for 20 different majors

5
Relationships between Categorical and Numeric
Variables
  • Graphical Comparison
  • Example: Sucrose levels of fruits (fictitious
    data)

6
Relationships between Categorical and Numeric
Variables
  • Numerical Comparison
  • We could also look at summary statistics for each
    group.
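
A numerical comparison like this can be sketched in a few lines of Python. This is only an illustration: the sucrose values, the fruit names, and the helper `summarize` are all made up here and do not come from the slides.

```python
from statistics import mean, median, stdev

# Fictitious sucrose levels by fruit type (made-up numbers for illustration).
sucrose = {
    "apple": [2.1, 2.5, 1.9, 2.3],
    "banana": [6.4, 6.9, 7.1, 6.6],
    "grape": [0.2, 0.4, 0.1, 0.3],
}

def summarize(groups):
    """Return {group: (n, mean, median, sample std dev)} for each group."""
    return {
        name: (len(values), mean(values), median(values), stdev(values))
        for name, values in groups.items()
    }

summary = summarize(sucrose)
for name, (n, avg, med, sd) in summary.items():
    print(f"{name:8s} n={n} mean={avg:.2f} median={med:.2f} sd={sd:.2f}")
```

Comparing the rows side by side plays the same role as comparing side-by-side boxplots: the numeric variable is summarized separately for each level of the categorical variable.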

7
Ch 9.1: Relationships Between Two Categorical
Variables
  • Depending on the situation, one of the variables
    is the explanatory variable and the other is the
    response variable.
  • In this case, we look at the percentages of one
    variable for each level of the other variable.
  • Examples
  • Gender and Soda Preference
  • Country of Origin and Marital Status
  • Smoking Habits and Socioeconomic Status

8
Two-Way Tables
  • Two-way tables come about when we are interested
    in the relationship between two categorical
    variables.
  • One of the variables is the _____________.
  • The other is the _______________.
  • The combination of a row variable and a column
    variable is a ______________.

9
Two-Way Tables
  • Example

10
Relationships between two categorical variables
  • Example: Gender and Highest Degree Obtained
  • Joint Distribution: How likely are you to have a
    bachelor's degree and be a male? _____________
  • Marginal Distribution: What is the least likely
    highest degree obtained? _____________
  • Conditional Distribution: If you are a female,
    how likely are you to have obtained a graduate
    degree? ______________
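
The three kinds of distribution differ only in what is divided by what. A minimal sketch, assuming a small two-way table of hypothetical counts (the numbers and the function names `joint`, `marginal_col`, and `conditional` are invented for illustration and are not the table from the slides):

```python
# Hypothetical counts: rows are gender, columns are highest degree obtained.
counts = {
    "male":   {"high school": 30, "bachelors": 20, "graduate": 10},
    "female": {"high school": 20, "bachelors": 15, "graduate": 5},
}

# Grand total of the table (100 for these made-up counts).
total = sum(sum(row.values()) for row in counts.values())

def joint(row, col):
    """Joint proportion: one cell divided by the grand total."""
    return counts[row][col] / total

def marginal_col(col):
    """Marginal proportion: one column total divided by the grand total."""
    return sum(row[col] for row in counts.values()) / total

def conditional(col, given_row):
    """Conditional proportion: one cell divided by its row total."""
    return counts[given_row][col] / sum(counts[given_row].values())

print(joint("male", "bachelors"))        # male AND bachelor's degree
print(marginal_col("graduate"))          # graduate degree, ignoring gender
print(conditional("graduate", "female")) # graduate degree GIVEN female
```

The joint and marginal proportions share the grand total as their denominator; the conditional proportion uses only the total of the row being conditioned on.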

11
Relationships between two categorical variables
  • Shows the percentages for the joint, marginal,
    and conditional distributions.

12
Ch 2: Relationships Between 2 Numeric Variables
  • Depending on the situation, one of the variables
    is the explanatory variable and the other is the
    response variable.
  • There is not always an explanatory-response
    relationship.
  • Examples
  • Height and Weight
  • Income and Age
  • SAT scores on math exam and on verbal exam
  • Amount of time spent studying for an exam and
    exam score

13
Relationships between 2 numeric variables
  • Scatterplots
  • Look for overall pattern and any striking
    deviations from that pattern.
  • Look for outliers, values falling outside the
    overall pattern of the relationship
  • You can describe the overall pattern of a
    scatterplot by the form, direction, and strength
    of the relationship.
  • Form: linear or clusters
  • Direction
  • Two variables are _____________________ when
    above-average values of one tend to accompany
    above-average values of the other and likewise
    below-average values also tend to occur together.
  • Two variables are _____________________ when
    above-average values of one variable accompany
    below-average values of the other variable, and
    vice-versa.
  • Strength: how close the points lie to a line

14
Relationships between 2 numeric variables
  • Example
  • Response: MPG
  • Explanatory: Weight

[Scatterplot: response variable on the y-axis, explanatory variable on the x-axis]
15
Relationships between 2 numeric variables
  • Example
  • Vehicle Weight
  • Horsepower

16
Relationships between 2 numeric variables
  • ___________ or r measures the direction and
    strength of the linear relationship between two
    numeric variables
  • General Properties
  • It must be between -1 and 1 (-1 ≤ r ≤ 1).
  • If r is negative, the relationship is negative.
  • If r = -1, there is a perfect negative linear
    relationship (extreme case).
  • If r is positive, the relationship is positive.
  • If r = 1, there is a perfect positive linear
    relationship (extreme case).
  • If r is 0, there is no linear relationship.
  • r measures the strength of the linear
    relationship.
  • If explanatory and response are switched, r
    remains the same.
  • r has no units of measurement associated with it
  • Scale changes do not affect r
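
Several of these properties can be checked numerically. The sketch below computes r as the average product of standardized values (dividing by n - 1); the vehicle weights and MPG figures are hypothetical, and `correlation` is a helper written for this illustration rather than anything from the slides.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    z_products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                  for xi, yi in zip(x, y))
    return sum(z_products) / (n - 1)

# Hypothetical data: weight in thousands of pounds, fuel economy in MPG.
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [34, 30, 27, 22, 19]

r = correlation(weight, mpg)
r_swapped = correlation(mpg, weight)            # explanatory/response switched
weight_lb = [w * 1000 for w in weight]          # same weights, new units
r_rescaled = correlation(weight_lb, mpg)

print(f"r = {r:.3f}, swapped = {r_swapped:.3f}, rescaled = {r_rescaled:.3f}")
```

All three printed values agree: heavier vehicles go with lower MPG (r is negative), and neither swapping the variables nor changing pounds to thousands of pounds changes r.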

17
(No Transcript)
18
Relationships between 2 numeric variables
  • Examples of extreme cases

r = 1
r = 0
r = -1
19
Relationships between 2 numeric variables
  • Match each correlation to its scatterplot

r = 0.04
r = 0.43
r = -0.84
r = 0.76
r = 0.21
20
Relationships between 2 numeric variables
  • It is possible for there to be a strong
    relationship between two variables and still have
    r = 0.
  • Example
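
One way to see this is a perfect curved relationship. In the sketch below, y is exactly x squared over x values that are symmetric about zero, so y is completely determined by x, yet the linear correlation comes out as zero. The data and the helper `correlation` are made up for this illustration.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    z_products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                  for xi, yi in zip(x, y))
    return sum(z_products) / (n - 1)

# y depends on x exactly, but along a parabola rather than a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = correlation(x, y)
print(abs(r) < 1e-9)  # True: no linear relationship despite a perfect curve
```

The positive products on the right half of the parabola cancel the negative products on the left half, which is why r only describes linear relationships.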

21
Relationships between 2 numeric variables
  • Important notes
  • Association does not imply causation
  • Correlation does not imply causation
  • Slope is not correlation
  • A scale change does not change the correlation.
  • Correlation doesn't measure the strength of a
    non-linear relationship

22
Regression Line
  • A regression line is a straight line that
    describes how a response variable y changes as an
    explanatory variable x changes.
  • A regression line summarizes the relationship
    between two variables, but only in a specific
    setting when one of the variables helps explain
    or predict the other.
  • We often use a regression line to predict the
    value of y for a given value of x.
  • Regression, unlike correlation, requires that we
    have an explanatory variable and a response
    variable

23
Regression Line
  • Fitting a line to data means drawing a line that
    comes as close as possible to the points.
  • Extrapolation: the use of a regression line for
    prediction far outside the range of values of the
    explanatory variable x that you used to obtain
    the line.
  • Such predictions are often not accurate.

24
Least-Squares Regression Line
  • The least-squares regression line of y on x is
    the line that makes the sum of squares of the
    vertical distances of the data points from the
    line as small as possible.
  • These vertical distances are called the
    residuals, or the errors in prediction, because
    they measure how far each point is from the line:
    residual $= y - \hat{y}$, where $y$ is the
    observed value and $\hat{y}$ is the predicted
    value.

25
Least-Squares Regression Line
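  • The equation of the least-squares regression line
    of y on x is $\hat{y} = b_0 + b_1 x$, with slope
    $b_1 = r \, s_y / s_x$ and intercept
    $b_0 = \bar{y} - b_1 \bar{x}$, where $s_x$ and
    $s_y$ are the standard deviations of x and y.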

26
Least-Squares Regression Line
  • The expression for the slope, $b_1$, says that
    along the regression line, a change of one
    standard deviation in x corresponds to a change
    of r standard deviations in y.
  • The slope, $b_1$, is the amount by which the
    predicted y changes when x increases by one unit.
  • The intercept, $b_0$, is the predicted value of y
    when $x = 0$.
  • The least-squares regression line ALWAYS passes
    through the point $(\bar{x}, \bar{y})$.
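
These facts can be checked on a small data set. The sketch below assumes the standard formulas for the least-squares line (slope equal to r times s_y / s_x, intercept equal to the mean of y minus slope times the mean of x); the five data points and the helper names `correlation` and `least_squares` are invented for illustration.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

def least_squares(x, y):
    """Return (b0, b1) for the least-squares line of y on x."""
    b1 = correlation(x, y) * stdev(y) / stdev(x)   # slope
    b0 = mean(y) - b1 * mean(x)                    # intercept
    return b0, b1

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observed - predicted
at_mean = b0 + b1 * mean(x)                                # prediction at x-bar

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")              # b0 = 2.20, b1 = 0.60
print(f"prediction at mean x = {at_mean:.2f}")      # 4.00, which is mean(y)
print(f"sum of residuals = {abs(sum(residuals)):.2f}")  # 0.00
```

The prediction at the mean of x equals the mean of y, so the fitted line passes through the point of means, and the residuals balance out to zero.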

27
r² in Regression
  • The square of the correlation, r², is the
    fraction of the variation in the values of y that
    is explained by the least-squares regression of y
    on x.
  • Use r² as a measure of how successfully the
    regression explains the response.
  • Interpret r² as the percent of variation
    explained
  • For Simple Linear Regression, r² is simply the
    square of the correlation coefficient.
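
The two descriptions of r² (square of the correlation, and fraction of variation explained) can be compared directly. This is a sketch on made-up data; the function name `r_squared_two_ways` is hypothetical, and the slope is computed as Sxy / Sxx, which is algebraically the same as r times s_y / s_x.

```python
from statistics import mean

def r_squared_two_ways(x, y):
    """Return (r squared, 1 - SSE/SST) for the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)       # total variation in y (SST)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    r = sxy / (sxx * syy) ** 0.5                   # correlation coefficient
    b1 = sxy / sxx                                 # least-squares slope
    b0 = y_bar - b1 * x_bar                        # least-squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2           # unexplained variation
              for xi, yi in zip(x, y))
    return r ** 2, 1 - sse / syy

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

squared_r, explained = r_squared_two_ways(x, y)
print(f"{squared_r:.2f} {explained:.2f}")  # 0.60 0.60
```

For this data the line explains 60% of the variation in y, and the same number comes from squaring r.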

28
Relationships between 2 numeric variables
  • Example
  • How much of the variation is explained by the
    least-squares line of y on x? ______
  • What is the correlation coefficient? ______

Horsepower = -10.78 + 0.04 × weight (equation of
the line)
__________ y-value or response (horsepower) where
the line crosses the y-axis.
_______ increase in response for a unit increase
in the explanatory variable.
So if weight increases by one pound, predicted
horsepower increases by 0.04 units (on average).
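
The fitted equation can be used directly for prediction. In the sketch below the two coefficients are copied from the slide; the function name and the 3000-pound input are chosen only for illustration, and it is an assumption that such a weight lies inside the range of the original data (otherwise the prediction would be an extrapolation).

```python
# Coefficients copied from the slide: Horsepower = -10.78 + 0.04 * weight.
INTERCEPT = -10.78
SLOPE = 0.04

def predicted_horsepower(weight_lb):
    """Predicted horsepower for a vehicle weighing weight_lb pounds."""
    return INTERCEPT + SLOPE * weight_lb

hp_3000 = predicted_horsepower(3000)
hp_3001 = predicted_horsepower(3001)

print(f"{hp_3000:.2f}")            # 109.22
print(f"{hp_3001 - hp_3000:.2f}")  # 0.04, the slope
```

Adding one pound changes the prediction by exactly the slope, which is the interpretation given above.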
29
Relationships between 2 variables
  • Lurking Variable: A variable that is not among
    the explanatory or response variables in a study
    and yet may influence the interpretation of
    relationships among those variables.
  • Simpson's Paradox: An association or comparison
    that holds for all of several groups can reverse
    direction when the data are combined to form a
    single group. This reversal is called Simpson's
    Paradox. It can happen when a lurking variable
    is present. Please see Examples 9.9 and 9.10 in
    the text.
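
The reversal is easiest to see with numbers. The counts below are hypothetical (they are not Examples 9.9 or 9.10 from the text): treatment A has the higher success rate within each severity group, yet treatment B looks better once the groups are combined, because severity acts as a lurking variable.

```python
# Hypothetical (successes, trials) for two treatments within two groups.
data = {
    "mild cases":   {"A": (9, 10),   "B": (80, 100)},
    "severe cases": {"A": (30, 100), "B": (2, 10)},
}

def group_rate(group, treatment):
    """Success rate of one treatment inside one group."""
    successes, trials = data[group][treatment]
    return successes / trials

def combined_rate(treatment):
    """Success rate of one treatment after pooling all groups."""
    successes = sum(g[treatment][0] for g in data.values())
    trials = sum(g[treatment][1] for g in data.values())
    return successes / trials

for group in data:
    print(group, group_rate(group, "A"), group_rate(group, "B"))
print("combined", round(combined_rate("A"), 3), round(combined_rate("B"), 3))
```

Treatment A was given mostly to severe cases, which have low success rates under either treatment, so pooling the groups drags A's overall rate down.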

30
Outliers and Influential Observations in
Regression
  • An outlier is an observation that lies outside
    the overall pattern of the other observations.
  • An observation is influential for a statistical
    calculation if removing it would markedly change
    the result of the calculation.
  • Points that are outliers in the x direction of a
    scatterplot are often influential for the
    least-squares regression line.
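
A quick way to check whether a point is influential is to fit the line with and without it. The sketch below uses made-up data in which the last observation is an outlier in the x direction; `fit` is a helper written for this illustration.

```python
from statistics import mean

def fit(points):
    """Return (slope, r squared) of the least-squares line for (x, y) pairs."""
    x = [p[0] for p in points]
    y = [p[1] for p in points]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Made-up data: five ordinary points plus one point far out in x.
ordinary = [(1, 2.0), (2, 1.5), (3, 2.5), (4, 1.8), (5, 2.2)]
outlier = (15, 8.0)

slope_with, r2_with = fit(ordinary + [outlier])
slope_without, r2_without = fit(ordinary)

print(f"with outlier:    slope = {slope_with:.2f}, r^2 = {r2_with:.2f}")
print(f"without outlier: slope = {slope_without:.2f}, r^2 = {r2_without:.2f}")
```

Removing the single extreme point changes both the slope and r² markedly, which is what makes it influential.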

31
Outliers and Influential Observations in
Regression
Child 18 is an outlier in the x direction.
Because of its extreme position on the age scale,
this point has a strong influence on the position
of the regression line.
r² is also affected by the influential
observation. With Child 18, r² = 41%, but without
Child 18, r² = 11%. The apparent strength of the
association was largely due to a single
influential observation.
The dashed line was calculated leaving out Child
18. The solid line is with Child 18.