Title: Basic Quantitative Methods in the Social Sciences AKA Intro Stats
1Basic Quantitative Methods in the Social
Sciences (AKA Intro Stats)
2Assignment Due and Course Evaluations
- All four modules of the assignment are due in the
first 5 minutes of class. NO assignment will be
accepted after 4:05 PM.
- Course evaluations will be completed during the
first 10 minutes of class.
3Correlation
- We are often interested in knowing about the
relationship between two variables.
- Consider the following research questions:
- Does the incidence of crime (X) vary with the
outdoor temperature (Y) in Detroit?
- Does pizza consumption (X) have anything to do
with how much time one spends surfing the web
(Y)?
- Does severity of depression (X) vary as a
function of Ecstasy use (Y)?
- Does the occurrence of pimples (X) increase as air
pollution (Y) increases in Windsor?
4Correlation
- These are all examples of relationships.
- In each case, we are asking whether one variable
(X) is related to another variable (Y). Stated
differently: Are X and Y correlated?
- More specifically: Are changes in one variable
reliably accompanied by changes in the other?
- Correlation coefficients can be calculated so
that we can measure the degree to which two
variables are related to each other.
5Scatter Plot Used to Describe Correlation
- We can plot the X and Y points on a scatter plot.
- We plot the Y scores on the vertical axis and the
X scores on the horizontal axis.
- We then can draw a straight line to try to
represent or describe the points on our scatter
plot.
6Graphing Relationships
- When our height and weight scores are plotted, we
see some irregularity.
- We can draw a straight line through these points
to summarize the relationship.
- The line provides an average statement about
change in one variable associated with changes in
the other variable.
r = .770
7Correlation
[Scatter plot: axes labeled AGE and WEIGHT]
8Imagine if...
- All of the dots fell exactly on the line? What
would that mean?
- All of the dots clustered close to the line, but
few fell on the line? What would that mean?
- The dots were widely dispersed around the line,
such that the line is only a vague representation
of how the scatterplot looks? What would that
mean?
9Correlation Positive R
- Let's look at some different scatter plots.
- A positive relationship.
10Various degrees of linear correlation
11Correlation Negative R
- Let's look at some different scatter plots.
- A negative relationship.
12Various degrees of linear correlation
13Correlation No Relationship
- Let's look at some different scatter plots.
- No Relationship
14What Direction Relationship Is Described in This
Scatter Plot?
15Logic Dictates
- We can measure the distance between each dot and
the line.
- A perfect correlation (±1.000) is represented
by all of the dots falling on the line, while a
line whose dots vary around it indicates a weaker
correlation.
- The degree to which the two variables are
correlated can therefore be thought of as the mean
distance between the dots and the line. This is
calculated algebraically.
16Covariance
- Conceptually, the correlation between X and Y is
based on covariance: a statistic representing
the degree to which two variables vary together.
- Like variance, covariance is based on deviations
from the mean.
- r is calculated as:
- But wait! Just like calculating variance, there
is an easier formula.
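The formula shown on this slide is not reproduced in the text, so what follows is only a minimal sketch that assumes the standard definitional form: covariance is the sum of the cross-products of deviations divided by N - 1, and r is the covariance divided by the product of the two standard deviations. The function names and the five score pairs are hypothetical and serve only as an illustration.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r_definitional(xs, ys):
    """r = cov(X, Y) / (s_X * s_Y); the variance of X is cov(X, X)."""
    s_x = sqrt(covariance(xs, xs))
    s_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)

# Hypothetical scores (an assumption, not data from the lecture).
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 4, 5]
print(covariance(x, y))
print(round(pearson_r_definitional(x, y), 3))
```

Because the same N - 1 divisor appears in the covariance and in both standard deviations, it cancels out of r, which is why the "easier" computational formula can work directly from sums.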
17The Pearson Product-Moment Correlation
Coefficient (r)
- r is a quantitative expression of the degree to
which two variables are correlated in a linear
relationship.
- Linear relationship: This means that the
scatterplot points are clustered more or less
symmetrically about a straight line, such that
the line is an adequate representation of the
relationship.
- Non-linear or curvilinear relationship: The
scatterplot points do not cluster around a
straight line. Example? Arousal/performance.
18Characteristics of r
- r has two components
- The degree of relationship
- The direction of relationship
- r ranges from -1.000 to +1.000
19Are X and Y Correlated?
20The Pearson r
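$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
Note: This formula really is the same as the one
in the book, just slightly rearranged.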
21We Need
- Sum of the Xs: $\sum X$
- Sum of the Ys: $\sum Y$
- Sum of the Xs, squared: $(\sum X)^2$
- Sum of the Ys, squared: $(\sum Y)^2$
- Sum of the squared Xs: $\sum X^2$
- Sum of the squared Ys: $\sum Y^2$
- Sum of the Xs times the Ys: $\sum XY$
- Number of subjects: $N$
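The list above can be sketched in code as follows. This is an illustration only: the function name, the dictionary keys, and the raw scores are hypothetical. The scores are not the lecture's data table; they were chosen so that the resulting sums are small and easy to check by hand. The main point is the difference between squaring a sum and summing squares.

```python
def correlation_sums(xs, ys):
    """Collect the quantities the computational formula for r requires."""
    return {
        "sum_x": sum(xs),                              # ΣX
        "sum_y": sum(ys),                              # ΣY
        "sum_x_then_squared": sum(xs) ** 2,            # (ΣX)²: add, then square
        "sum_y_then_squared": sum(ys) ** 2,            # (ΣY)²: add, then square
        "sum_of_squared_x": sum(x * x for x in xs),    # ΣX²: square, then add
        "sum_of_squared_y": sum(y * y for y in ys),    # ΣY²: square, then add
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),  # ΣXY
        "n": len(xs),                                  # N (number of pairs)
    }

# Hypothetical raw scores (an assumption, not data from the lecture).
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 4, 5]
sums = correlation_sums(x, y)
for name, value in sums.items():
    print(f"{name}: {value}")
```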
22Correlation Arithmetic
23The Pearson r
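$$r = \frac{57 - \frac{(15)(17)}{5}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
24The Pearson r
$$r = \frac{57 - \frac{255}{5}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
25The Pearson r
$$r = \frac{57 - 51}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
26The Pearson r
$$r = \frac{6}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
27The Pearson r
$$r = \frac{6}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
28The Pearson r
$$r = \frac{6}{\sqrt{(55 - 45)(63 - 57.8)}}$$
29The Pearson r
$$r = \frac{6}{\sqrt{(10)(5.2)}}$$
30The Pearson r
$$r = \frac{6}{\sqrt{52}}$$
31The Pearson r
$$r = \frac{6}{7.2111}$$
32The Pearson r
$$r = .832$$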
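The arithmetic on the preceding slides can be checked with a short function. This is a sketch, not the course's own code: the function and parameter names are hypothetical, while the numbers passed in (N = 5, ΣX = 15, ΣY = 17, ΣX² = 55, ΣY² = 63, ΣXY = 57) are the ones that appear in the worked steps.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula for r, using only sums and N."""
    numerator = sum_xy - (sum_x * sum_y) / n   # ΣXY - (ΣX)(ΣY)/N
    ss_x = sum_x2 - sum_x ** 2 / n             # ΣX² - (ΣX)²/N
    ss_y = sum_y2 - sum_y ** 2 / n             # ΣY² - (ΣY)²/N
    return numerator / sqrt(ss_x * ss_y)

r = pearson_r_from_sums(n=5, sum_x=15, sum_y=17, sum_x2=55, sum_y2=63, sum_xy=57)
print(round(r, 3))  # 0.832
```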
33Hypothesis Testing with Correlations
- $H_0: \rho = 0$ ($\rho$ = rho, the population correlation
coefficient)
- $H_a: \rho \neq 0$ (there is a significant relationship
between X and Y)
- Technically, you could do a one-tailed test for
correlations ($\rho < 0$ or $\rho > 0$), but for our purposes
we will always test whether there simply is a
relationship; therefore, we will always do a
two-tailed test for correlations.
- Find the critical value for $\alpha = .05$ with $df = N - 2$
(where N is the number of paired observations) in
Table E.2, p. 440.
34The Pearson r
$$r = .832$$
Is an r of .832 significant?
See Table E.2 (p. 440) for $df = N - 2 = 5 - 2 = 3$
and an alpha ($\alpha$) of .05.
35The Pearson r
$$r = .832$$
Is an r of .832 significant?
The critical r is .878, and .832 < .878. Therefore, the
correlation is NOT significant.
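The decision rule used here can be sketched as a comparison of |r| with the tabled critical value. The function names are hypothetical. The critical r of .878 comes from the slide; the conversion of r to a t statistic with N - 2 degrees of freedom is a standard equivalent test that the slides do not show, so treat it as an added assumption.

```python
from math import sqrt

CRITICAL_R_DF3 = 0.878  # critical r from the slide: df = 3, alpha = .05, two-tailed

def is_significant(r, critical_r):
    """Two-tailed rule: reject H0 when |r| reaches the critical value."""
    return abs(r) >= critical_r

def r_to_t(r, n):
    """Equivalent t statistic for testing rho = 0, with n - 2 df (not in the slides)."""
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r ** 2)

print(is_significant(0.832, CRITICAL_R_DF3))  # False
print(round(r_to_t(0.832, 5), 2))
```

The t value of roughly 2.60 falls short of 3.182, the two-tailed .05 critical t for 3 df in a standard t table, so both routes lead to the same conclusion.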
36Popcorn Consumption
- Researcher X hypothesizes that popcorn
consumption varies as a function of stress. He
gives a random sample of 5 people a self-report
measure of stress that produces scores ranging
from 1 (little or no stress) to 10 (very
stressed), and then has them watch a movie. He
measures how many kernels of popcorn each of them
eats. Is popcorn consumption correlated with
stress?
37Are X and Y Correlated?
Table columns: Stress Ratings (X), Number of Kernels (Y)
38The Pearson r
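$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$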
39We Need
- Sum of the Xs: $\sum X$
- Sum of the Ys: $\sum Y$
- Sum of the Xs, squared: $(\sum X)^2$
- Sum of the Ys, squared: $(\sum Y)^2$
- Sum of the squared Xs: $\sum X^2$
- Sum of the squared Ys: $\sum Y^2$
- Sum of the Xs times the Ys: $\sum XY$
- Number of subjects: $N$
40Correlation Arithmetic
41The Pearson r
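$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$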
42The Pearson r
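$$r = \frac{256 - \frac{(29)(40)}{5}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
43The Pearson r
$$r = \frac{256 - \frac{1160}{5}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
44The Pearson r
$$r = \frac{256 - 232}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
45The Pearson r
$$r = \frac{24}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
46The Pearson r
$$r = \frac{24}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$
47The Pearson r
$$r = \frac{24}{\sqrt{(189 - 168.2)(370 - 320)}}$$
48The Pearson r
$$r = \frac{24}{\sqrt{(20.8)(50)}}$$
49The Pearson r
$$r = \frac{24}{\sqrt{1040}}$$
50The Pearson r
$$r = \frac{24}{32.2490}$$
51The Pearson r
$$r = .744$$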
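The popcorn calculation can be traced one intermediate value at a time. This is a sketch with hypothetical variable names. The sums (N = 5, ΣX = 29, ΣY = 40, ΣX² = 189, ΣY² = 370, ΣXY = 256) are read off the slides, where X is the stress rating and Y is the number of kernels eaten.

```python
from math import sqrt

# Sums taken from the popcorn slides (X = stress rating, Y = kernels eaten).
n, sum_x, sum_y = 5, 29, 40
sum_x2, sum_y2, sum_xy = 189, 370, 256

cross = sum_x * sum_y / n          # (29)(40) / 5 = 232
numerator = sum_xy - cross         # 256 - 232 = 24
ss_x = sum_x2 - sum_x ** 2 / n     # 189 - 168.2 = 20.8
ss_y = sum_y2 - sum_y ** 2 / n     # 370 - 320 = 50
denominator = sqrt(ss_x * ss_y)    # square root of 1040
r = numerator / denominator

for label, value in [("numerator", numerator), ("ss_x", ss_x), ("ss_y", ss_y),
                     ("denominator", denominator), ("r", r)]:
    print(f"{label}: {value:.4f}")
```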
52The Pearson r
$$r = .744$$
Is an r of .744 significant?
See Table E.2 (p. 440) for $df = N - 2 = 5 - 2 = 3$
and an alpha ($\alpha$) of .05.
53The Pearson r
$$r = .744$$
Is an r of .744 significant?
The critical r is .878, and .744 < .878. Therefore, the
correlation is NOT significant.
54A Useful Means of Interpretation: Variance
- r is not the most useful interpretation of a
correlation.
- $r^2$ is more useful. $r^2$ is the proportion of the
variance of the Y scores that is accounted for by
X.
- You need a certain amount of information in order
to make an error-free prediction of Y. $r^2$ is
roughly equal to the percentage of that information
that you possess just by knowing X.
55Why Do Some People Have High Self-Esteem While
Others Have Low Self-Esteem?
- Say 100 people are given a self-esteem inventory
(e.g., "I think I am a person of worth," from
1 = strongly disagree to 5 = strongly agree).
- They are also asked to fill out measures of
body-satisfaction ("I think I have a good body"),
social-esteem ("I think I am a good friend"), and
academic-esteem ("I am a good student").
- Correlations are calculated between overall
self-esteem and the other variables (3
correlations).
56Explaining Self-Esteem
- The entire pie = overall self-esteem.
- The different pieces represent different
variables that explain the variability (or
variance) in self-esteem scores (in other words,
these variables explain why some people have high
self-esteem, low self-esteem, very low, etc.).
57So
- Body-esteem accounts for (or explains) 16% of the
variance in overall self-esteem.
- Social-esteem explains? (.540)(.540) = .2916, so
it explains about 29% of the variance in overall
self-esteem.
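The step from r to "variance explained" is just squaring, as this small sketch shows. The function name is hypothetical; the .540 correlation for social-esteem is the value given on the slide.

```python
def variance_explained(r):
    """Proportion of variance in Y accounted for by X (r squared)."""
    return r ** 2

social = variance_explained(0.540)
print(f"{social:.4f}")  # 0.2916
print(f"{social:.0%}")  # 29%
```

Squaring also removes the sign, so a correlation of -.540 would account for the same share of variance.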
58Correlation Errors in Interpreting r
- Common errors in interpreting a correlation
coefficient:
- Interpreting r in direct proportion to its size:
- r is not a percentage.
- r is not proportionate across the range (.2 is not
half of .4).
- The correlation coefficient is an ordinal
statistic. So r = 0.750 represents a stronger
relationship than r = 0.520.
- Interpreting r in terms of arbitrary descriptive
labels:
- Small, medium, large.
59More Errors Interpreting Correlation
- Correlation does NOT imply causation!
- X causes Y to change. Examples?
- Y causes X to change. Examples?
- W causes changes in X and Y! Examples?
- So body-esteem might account for 16% of the
variance in self-esteem, but this does not mean
that body-esteem causes self-esteem.
- For the trip to Hawaii and the Samsonite luggage:
Psychologists used to think that having been
sexually abused causes bulimia. How could
researchers demonstrate that this is true?
60Factors that affect the size of a correlation
- Nature of the relationship between X and Y.
- Heterogeneous subsamples: if the sample could be
subdivided into 2 distinct sets based on another
variable (e.g., males vs. females).
- Truncated range.
- Range restricted in size.
- May cause correlation to appear lower than it
really is (or higher than it is for non-linear
relationships).
- Without the full range of scores it is not
possible to calculate the correlation accurately.
Let's look at why.
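A small numerical sketch of the truncated-range effect follows. The ten score pairs are hypothetical, constructed only so that the pattern is easy to see: across the full range of X the relationship is strong, but when only the middle X values (5 through 8) are kept, the same data produce a much weaker correlation. The helper name `pearson_r` is also an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the computational (sums) formula."""
    n = len(xs)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: Y rises with X, with a fixed +2 / -2 wobble.
x = list(range(1, 11))
y = [3, 0, 5, 2, 7, 4, 9, 6, 11, 8]

r_full = pearson_r(x, y)

# Truncate the range: keep only the pairs with X from 5 through 8.
kept = [(xi, yi) for xi, yi in zip(x, y) if 5 <= xi <= 8]
r_truncated = pearson_r([xi for xi, _ in kept], [yi for _, yi in kept])

print(round(r_full, 3))       # 0.788
print(round(r_truncated, 3))  # 0.124
```

Within the narrow band the wobble is as large as the trend, so the line no longer summarizes the points well even though the underlying relationship has not changed.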
61Underlying Assumptions for r
- X and Y need to be adequately represented by a
straight line function. Stated differently, the
relationship must be linear.
- If r is to be used inferentially:
- Homoscedasticity: The variabilities of X at
different values of Y are equal. E.g.,
variability in weight for 6'5" people is equal to
variability in weight for 5'5" people.
- Normality: X is normally distributed at all
values of Y (e.g., weight is normally distributed
for 6'5" people and for 5'5" people).
- Vice versa as well (Y at values of X).
62Work on it
- Say we're interested in knowing whether exam
grades are related to number of hours spent
studying. Ten students report how many hours they
studied for an exam. Here are the data:
63Work on it!
- State the $H_0$ and $H_a$.
- Test the hypothesis.