Basic Quantitative Methods in the Social Sciences (AKA Intro Stats)




1
Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)
  • 02-250-01
  • Lecture 9

2
Assignment Due and Course Evaluations
  • All four modules of the assignment are due in the
    first 5 minutes of class. NO assignment will be
    accepted after 4:05 PM.
  • Course evaluations will be completed during the
    first 10 minutes of class.

3
Correlation
  • We are often interested in knowing about the
    relationship between two variables.
  • Consider the following research questions:
  • Does the incidence of crime (X) vary with the
    outdoor temperature (Y) in Detroit?
  • Does pizza consumption (X) have anything to do
    with how much time one spends surfing the web
    (Y)?
  • Does severity of depression (X) vary as a
    function of Ecstasy use (Y)?
  • Does the occurrence of pimples (X) increase as air
    pollution increases (Y) in Windsor?

4
Correlation
  • These are all examples of relationships.
  • In each case, we are asking whether one variable
    (X) is related to another variable (Y). Stated
    differently: Are X and Y correlated?
  • More specifically: Are changes in one variable
    reliably accompanied by changes in the other?
  • Correlation coefficients can be calculated so
    that we can measure the degree to which two
    variables are related to each other.

5
Scatter Plot Used to Describe Correlation
  • We can plot the X and Y points on a scatter plot.
  • We plot the Y scores on the vertical axis and the
    X scores on the horizontal axis.
  • We then can draw a straight line to try to
    represent or describe the points on our scatter
    plot.

6
Graphing Relationships
  • When our height and weight scores are plotted, we
    see some irregularity.
  • We can draw a straight line through these points
    to summarize the relationship.
  • The line provides an average statement about
    change in one variable associated with changes in
    the other variable.

r = .770
7
Correlation
[Scatter plot; axis labels: AGE, WEIGHT]
8
Imagine if...
  • All of the dots fell exactly on the line? What
    would that mean?
  • All of the dots clustered close to the line, but
    few fell on the line? What would that mean?
  • The dots were widely dispersed around the line,
    such that the line is only a vague representation
    of how the scatterplot looks. What would that
    mean?

9
Correlation: Positive r
  • Let's look at some different scatter plots.
  • A positive relationship.

10
Various degrees of linear correlation
11
Correlation: Negative r
  • Let's look at some different scatter plots.
  • A negative relationship.

12
Various degrees of linear correlation
13
Correlation: No Relationship
  • Let's look at some different scatter plots.
  • No relationship.

14
What Direction Relationship Is Described in This
Scatter Plot?
15
Logic Dictates
  • We can measure the distance between each dot and
    the line.
  • A perfect correlation (r = ±1.000) is represented
    by all of the dots falling on the line, while
    dots that vary around the line indicate a weaker
    correlation.
  • The degree to which the two variables are
    correlated can be thought of in terms of the mean
    distance between the dots and the line: the smaller
    the distance, the stronger the correlation. This is
    calculated algebraically.

16
Covariance
  • Conceptually, the correlation between X and Y is
    based on covariance, a statistic representing
    the degree to which two variables vary together.
  • Like variance, covariance is based on deviations
    from the mean.
  • r is calculated as
  • But wait! Just like calculating variance, there
    is an easier formula
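For reference, one standard textbook form of the definitional formula expresses r as the covariance of X and Y divided by the product of their standard deviations. This is an assumption about the form intended here, since the slide's own equation is not part of the transcript; $s_X$ and $s_Y$ denote the sample standard deviations computed with N - 1 in the denominator.

$$\mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}, \qquad r = \frac{\mathrm{cov}_{XY}}{s_X \, s_Y}$$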

17
The Pearson Product-Moment Correlation
Coefficient (r)
  • r is a quantitative expression of the degree to
    which two variables are correlated in a linear
    relationship.
  • Linear relationship: This means that the
    scatterplot points are clustered more or less
    symmetrically about a straight line, such that
    the line is an adequate representation of the
    relationship.
  • Non-linear or curvilinear relationship: The
    scatterplot points do not cluster around a
    straight line. Example? Arousal/performance

18
Characteristics of r
  • r has two components
  • The degree of relationship
  • The direction of relationship
  • r ranges from -1.000 to +1.000

19
Are X and Y Correlated?
20
The Pearson r
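$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$
Note: This formula really is the same as the one
in the book, just slightly rearranged.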
21
We Need
  • Sum of the Xs: $\sum X$
  • Sum of the Ys: $\sum Y$
  • Sum of the Xs, squared: $(\sum X)^2$
  • Sum of the Ys, squared: $(\sum Y)^2$
  • Sum of the squared Xs: $\sum X^2$
  • Sum of the squared Ys: $\sum Y^2$
  • Sum of the Xs times the Ys: $\sum XY$
  • Number of subjects: N
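The quantities in this list can be tallied mechanically from paired scores. The following is a minimal Python sketch; the function name `pearson_sums` and the example scores are hypothetical, invented for illustration, and are not the lecture's data.

```python
def pearson_sums(xs, ys):
    """Return the sums needed for the computational formula for r."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of scores")
    return {
        "sum_x": sum(xs),                              # sum of the Xs
        "sum_y": sum(ys),                              # sum of the Ys
        "sum_x_squared": sum(xs) ** 2,                 # (sum of the Xs), squared
        "sum_y_squared": sum(ys) ** 2,                 # (sum of the Ys), squared
        "sum_of_x_squares": sum(x * x for x in xs),    # sum of the squared Xs
        "sum_of_y_squares": sum(y * y for y in ys),    # sum of the squared Ys
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),  # sum of X times Y
        "n": len(xs),                                  # number of paired scores
    }


# Hypothetical paired scores (not taken from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sums = pearson_sums(xs, ys)
for name, value in sums.items():
    print(name, value)
```

Note the difference between the square of a sum and the sum of squares: for these made-up X scores the first is 225 and the second is 55.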

22
Correlation Arithmetic
23
The Pearson r
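$$r = \frac{57 - \frac{(15)(17)}{5}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$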
24
The Pearson r
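$$r = \frac{57 - \frac{255}{5}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$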
25
The Pearson r
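$$r = \frac{57 - 51}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$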
26
The Pearson r
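$$r = \frac{6}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$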
27
The Pearson r
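$$r = \frac{6}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$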
28
The Pearson r
$$r = \frac{6}{\sqrt{(55 - 45)(63 - 57.8)}}$$
29
The Pearson r
$$r = \frac{6}{\sqrt{(10)(5.2)}}$$
30
The Pearson r
$$r = \frac{6}{\sqrt{52}}$$
31
The Pearson r
$$r = \frac{6}{7.2111}$$
32
The Pearson r
$$r = .832$$
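The arithmetic in this worked example can be checked with a short Python sketch that plugs the slide totals (sum of X = 15, sum of Y = 17, sum of squared Xs = 55, sum of squared Ys = 63, sum of XY = 57, N = 5) into the computational formula. The function name `pearson_r_from_sums` is an assumption made for illustration.

```python
import math


def pearson_r_from_sums(sum_x, sum_y, sum_x2, sum_y2, sum_xy, n):
    """Pearson r from the summary totals used in the computational formula."""
    numerator = sum_xy - (sum_x * sum_y) / n  # 57 - 255/5 = 6
    ss_x = sum_x2 - sum_x ** 2 / n            # 55 - 45 = 10
    ss_y = sum_y2 - sum_y ** 2 / n            # 63 - 57.8 = 5.2
    return numerator / math.sqrt(ss_x * ss_y)


# Totals taken from the worked example on the slides.
r = pearson_r_from_sums(sum_x=15, sum_y=17, sum_x2=55, sum_y2=63, sum_xy=57, n=5)
print(round(r, 3))  # 0.832
```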
33
Hypothesis Testing with Correlations
  • H0: $\rho = 0$ ($\rho$ = rho, the population correlation
    coefficient)
  • Ha: $\rho \ne 0$ (there is a significant relationship
    between X and Y)
  • Technically, you could do a one-tailed test for
    correlations ($\rho < 0$ or $\rho > 0$), but for our purposes
    we will always test whether there simply is a
    relationship; therefore, we will always do a
    two-tailed test for correlations.
  • Find the critical value for $\alpha = .05$ with df = N - 2
    (where N is the number of paired observations) in
    Table E.2 (p. 440).

34
The Pearson r
$$r = .832$$
Is an r of .832 significant?
See Table E.2 (p. 440) for N - 2 df (5 - 2 = 3
df) and an alpha ($\alpha$) of .05.
35
The Pearson r
$$r = .832$$
Is an r of .832 significant?
The critical r = .878 > r = .832. Therefore, the
correlation is NOT significant.
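Table E.2 is not reproduced in this transcript, but a critical r can be recovered from a critical t through the standard relationship between the two statistics. The sketch below assumes the two-tailed critical t of 3.182 for 3 df at alpha = .05, a value taken from a standard t table rather than from the lecture; the function names are hypothetical.

```python
import math


def critical_r(t_critical, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


def r_to_t(r, n):
    """t statistic for testing H0: rho = 0 with n paired observations."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


# Assumption: two-tailed critical t for alpha = .05 and df = 3 (standard t table).
T_CRIT_DF3 = 3.182

r, n = 0.832, 5
r_crit = critical_r(T_CRIT_DF3, n - 2)
significant = abs(r) > r_crit
print(round(r_crit, 3), significant)  # 0.878 False
```

Comparing `r_to_t(r, n)` against the critical t leads to the same decision as comparing r against the critical r; the table of critical r values simply saves the conversion step.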
36
Popcorn Consumption
  • Researcher X hypothesizes that popcorn
    consumption varies as a function of stress. He
    gives a random sample of 5 people a self-report
    measure of stress that produces scores ranging
    from 1 (little or no stress) to 10 (very
    stressed), and then has them watch a movie. He
    measures how many kernels of popcorn each of them
    eats. Is popcorn consumption correlated with
    stress?

37
Are X and Y Correlated?
[Data table; column headings: Stress Ratings, # of Kernels]
38
The Pearson r
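$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$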
39
We Need
  • Sum of the Xs: $\sum X$
  • Sum of the Ys: $\sum Y$
  • Sum of the Xs, squared: $(\sum X)^2$
  • Sum of the Ys, squared: $(\sum Y)^2$
  • Sum of the squared Xs: $\sum X^2$
  • Sum of the squared Ys: $\sum Y^2$
  • Sum of the Xs times the Ys: $\sum XY$
  • Number of subjects: N

40
Correlation Arithmetic
41
The Pearson r
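$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$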
42
The Pearson r
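$$r = \frac{256 - \frac{(29)(40)}{5}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$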
43
The Pearson r
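$$r = \frac{256 - \frac{1160}{5}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$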
44
The Pearson r
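$$r = \frac{256 - 232}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$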
45
The Pearson r
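$$r = \frac{24}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$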
46
The Pearson r
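$$r = \frac{24}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}}$$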
47
The Pearson r
$$r = \frac{24}{\sqrt{(189 - 168.2)(370 - 320)}}$$
48
The Pearson r
$$r = \frac{24}{\sqrt{(20.8)(50)}}$$
49
The Pearson r
$$r = \frac{24}{\sqrt{1040}}$$
50
The Pearson r
$$r = \frac{24}{32.2490}$$
51
The Pearson r
$$r = .744$$
52
The Pearson r
$$r = .744$$
Is an r of .744 significant?
See Table E.2 (p. 440) for N - 2 df (5 - 2 = 3
df) and an alpha ($\alpha$) of .05.
53
The Pearson r
$$r = .744$$
Is an r of .744 significant?
The critical r = .878 > r = .744. Therefore, the
correlation is NOT significant.
54
A Useful Means of Interpretation: Variance
  • r is not the most useful interpretation of a
    correlation.
  • $r^2$ is more useful. $r^2$ is the proportion of the
    variance of the Y scores that is accounted for by
    X.
  • You need a certain amount of information in order
    to make an error-free prediction of Y. $r^2$ is
    roughly equal to the proportion of that information
    that you possess just by knowing X.

55
Why Do Some People Have High Self-Esteem While
Others Have Low Self-Esteem?
  • Say 100 people are given a self-esteem inventory
    (e.g., "I think I am a person of worth," from
    1 = strongly disagree to 5 = strongly agree).
  • They are also asked to fill out measures of
    body-satisfaction ("I think I have a good body"),
    social-esteem ("I think I am a good friend"), and
    academic-esteem ("I am a good student").
  • Correlations are calculated between overall
    self-esteem and the other variables (3
    correlations).

56
Explaining Self-Esteem
  • The entire pie: Overall self-esteem
  • The different pieces represent different
    variables that explain the variability (or
    variance) in self-esteem scores (in other words,
    these variables explain why some people have high
    self-esteem, low self-esteem, very low, etc. etc.)

57
So
  • Body-esteem accounts for (or explains) 16% of the
    variance in overall self-esteem.
  • Social-esteem explains? (.540)(.540) = .2916, so
    it explains about 29% of the variance in overall
    self-esteem.
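Squaring r to get the proportion of variance accounted for is a one-line calculation. The Python sketch below applies it to the correlations that appear in this lecture (.832, .744, and .540) and works backward from the 16% figure for body-esteem; the function name `variance_explained` is hypothetical.

```python
import math


def variance_explained(r):
    """Proportion of variance in Y accounted for by X (r squared)."""
    return r ** 2


# Correlations taken from the lecture's examples.
examples = [
    ("first worked example", 0.832),
    ("popcorn and stress", 0.744),
    ("social-esteem and self-esteem", 0.540),
]
for label, r in examples:
    print(f"{label}: r = {r:.3f}, r^2 = {variance_explained(r):.3f}")

# Working backward: 16% of the variance corresponds to a correlation of
# magnitude .40 (the sign cannot be recovered from r squared alone).
implied_r = math.sqrt(0.16)
print(f"body-esteem: |r| = {implied_r:.2f}")
```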

58
Correlation: Errors in Interpreting r
  • Common errors in interpreting a correlation
    coefficient:
  • Interpreting r in direct proportion to its size
  • Not a percentage
  • Not proportionate across the range (r = .2 is not
    half of r = .4)
  • The correlation coefficient is an ordinal
    statistic. So r = 0.750 represents a stronger
    relationship than r = 0.520.
  • Interpreting in terms of arbitrary descriptive
    labels
  • Small, medium, large

59
More Errors Interpreting Correlation
  • Correlation does NOT imply causation!
  • X causes Y to change. Examples?
  • Y causes X to change. Examples?
  • W causes changes in X and Y! Examples?
  • So body-esteem might account for 16% of the
    variance in self-esteem, but this does not mean
    that body-esteem causes self-esteem.
  • For the trip to Hawaii and the Samsonite luggage:
    Psychologists used to think that having been
    sexually abused causes bulimia. How could
    researchers demonstrate that this is true?

60
Factors that affect the size of a correlation
  • Nature of the relationship between X and Y.
  • Heterogeneous subsamples: if the sample could be
    subdivided into 2 distinct sets based on another
    variable (e.g., males vs. females)
  • Truncated range.
  • Range restricted in size.
  • May cause correlation to appear lower than it
    really is (or higher than it is for non-linear
    relationships)
  • Without the full range of scores it is not
    possible to calculate the correlation accurately.
    Let's look at why.
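The effect of a truncated range can be illustrated with a small Python sketch. The data are synthetic and deterministic, constructed only for this illustration (X runs from 0 to 99, and Y is X plus a repeating pattern of "noise" between -15 and +15); they are not the lecture's data, and the helper name `pearson_r` is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson r via the computational (raw-score) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    numerator = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    return numerator / math.sqrt(ss_x * ss_y)


# Synthetic linear relationship with bounded, repeating scatter.
xs = list(range(100))
ys = [x + 3 * ((x * 37) % 11 - 5) for x in xs]

# Correlation across the full range of X.
r_full = pearson_r(xs, ys)

# Correlation when only the middle of the X range (40 to 59) is sampled.
pairs = [(x, y) for x, y in zip(xs, ys) if 40 <= x <= 59]
r_truncated = pearson_r([x for x, _ in pairs], [y for _, y in pairs])

print(round(r_full, 2), round(r_truncated, 2))
```

The scatter around the line is the same size in both cases, but in the truncated sample X varies so little that the scatter dominates, so the truncated r comes out noticeably lower than the full-range r.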

61
Underlying Assumptions for r
  • X and Y need to be adequately represented by a
    straight line function. Stated differently, the
    relationship must be linear.
  • If r is to be used inferentially:
  • Homoscedasticity: The variabilities of X at
    different values of Y are equal. E.g.,
    variability in weight for 6'5" people is equal to
    variability in weight for 5'5" people.
  • Normality: X is normally distributed at all
    values of Y (e.g., weight is normally distributed
    for 6'5" people and for 5'5" people).
  • Vice versa as well (Y at values of X)

62
Work on it
  • Say we're interested in knowing whether exam
    grades are related to number of hours spent
    studying. Ten students report how many hours they
    studied for an exam. Here are the data:

63
Work on it!
  • State the H0 and Ha.
  • Test the hypothesis.