Title: Basic Statistical Concepts
- Psych 231 Research Methods in Psychology
- Turn in Journal Summary 2 in class on Wednesday (moved from last week's labs)
Properties of distributions: Center
- There are three main measures of center
- Mean (M): the arithmetic average
  - Add up all of the scores and divide by the total number of scores
  - The most used measure of center
- Median (Mdn): the middle score in terms of location
  - The score that separates the top 50% of the scores from the bottom 50%
  - Good for skewed distributions (e.g., net worth)
- Mode: the most frequent score
  - Good for nominal scales (e.g., eye color)
  - A must for multi-modal distributions
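As a rough illustration of the three measures above, here is a minimal Python sketch using the standard-library `statistics` module. The scores are hypothetical (an invented, skewed "net worth" list, not data from the course); they are chosen so that one extreme value pulls the mean away from the median.

```python
from statistics import mean, median, mode

# Hypothetical net-worth scores (in thousands); the last value is an extreme score.
scores = [20, 30, 30, 40, 1000]

center_mean = mean(scores)      # add up all of the scores, divide by how many there are
center_median = median(scores)  # the middle score once the scores are sorted
center_mode = mode(scores)      # the most frequent score

print(center_mean, center_median, center_mode)
```

With a skewed list like this one, the mean lands far above most of the scores while the median stays with the bulk of them, which is why the median is suggested for skewed distributions.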
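The Mean
- The most commonly used measure of center
- The arithmetic average
- Computing the mean
  - The formula for the population mean (a parameter) is $\mu = \frac{\sum X}{N}$, where N is the number of scores in the population
  - The formula for the sample mean (a statistic) is $M = \frac{\sum X}{n}$, where n is the number of scores in the sample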
Spread (Variability)
- How similar are the scores?
- Range: the maximum value minus the minimum value
  - Only takes two scores from the distribution into account
  - Influenced by extreme values (outliers)
- Standard deviation (SD): (essentially) the average amount that the scores in the distribution deviate from the mean
  - Takes all of the scores into account
  - Also influenced by extreme values (but not as much as the range)
- Variance: the standard deviation squared
Variability
- Low variability: the scores are fairly similar
  - 50, 51, 48, 54, 52, 47, 45
- High variability: the scores are fairly dissimilar
  - 30, 51, 38, 64, 52, 47, 65
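To make the contrast concrete, the sketch below computes the range and the standard deviation for the two sets of scores listed above. It assumes each set is treated as a whole population (hence `pstdev`), and `value_range` is a helper name introduced here only for illustration.

```python
from statistics import mean, pstdev

low = [50, 51, 48, 54, 52, 47, 45]   # scores are fairly similar
high = [30, 51, 38, 64, 52, 47, 65]  # scores are fairly dissimilar

def value_range(scores):
    """Range: the maximum value minus the minimum value."""
    return max(scores) - min(scores)

for label, scores in (("low", low), ("high", high)):
    # pstdev treats the scores as a population (an assumption made for this sketch)
    print(label, round(mean(scores), 2), value_range(scores), round(pstdev(scores), 2))
```

Both sets have the same mean, so the difference between them shows up only in the measures of spread.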
Standard deviation
- The standard deviation is the most popular and most important measure of variability.
- The standard deviation measures how far off all of the individuals in the distribution are from a standard, where that standard is the mean of the distribution.
  - Essentially, the average of the deviations.
An Example: Computing the Mean
Our population: 2, 4, 6, 8
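The computation for this population can be written out as follows, assuming the usual population-mean formula (the sum of the scores divided by the number of scores, N):

```latex
\[
\mu = \frac{\sum X}{N} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5
\]
```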
An Example: Computing Standard Deviation (population)
- Step 1: To get a measure of the deviation, we need to subtract the population mean from every individual in our distribution.
Our population: 2, 4, 6, 8
$X - \mu$ = deviation score
2 - 5 = -3
4 - 5 = -1
6 - 5 = 1
8 - 5 = 3
Notice that if you add up all of the deviations, they must equal 0.
An Example: Computing Standard Deviation (population)
- Step 2: Because the deviations always sum to 0, we have to get rid of the negative signs. We do this by squaring the deviations and adding them up to get the sum of the squared deviations (SS). The square root is taken later, in Step 4.
$SS = \sum (X - \mu)^2$
$= (-3)^2 + (-1)^2 + (1)^2 + (3)^2$
$= 9 + 1 + 1 + 9 = 20$
An Example: Computing Standard Deviation (population)
- Step 3: Compute the variance, which is simply the average of the squared deviations.
  - To get this average, we divide the SS by the number of individuals in the population (N).
variance $= \sigma^2 = \frac{SS}{N} = \frac{20}{4} = 5.0$
An Example: Computing Standard Deviation (population)
- Step 4: Compute the standard deviation
  - To get this we need to take the square root of the population variance.
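For the example population (2, 4, 6, 8), whose variance works out to 5.0, this last step can be written as follows, using $\sigma$ for the population standard deviation:

```latex
\[
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{SS}{N}} = \sqrt{\frac{20}{4}} = \sqrt{5.0} \approx 2.24
\]
```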
An Example: Computing Standard Deviation (population)
- To review:
  - Step 1: Compute the deviation scores
  - Step 2: Compute the SS
  - Step 3: Determine the variance
    - Take the average of the squared deviations
    - Divide the SS by N
  - Step 4: Determine the standard deviation
    - Take the square root of the variance
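The four steps above can be sketched in a few lines of Python. This is only an illustration using the example population 2, 4, 6, 8; the variable names are chosen here and are not part of the course materials.

```python
from math import sqrt

population = [2, 4, 6, 8]
N = len(population)
mu = sum(population) / N                    # population mean

deviations = [x - mu for x in population]   # Step 1: deviation scores (X - mu)
ss = sum(d ** 2 for d in deviations)        # Step 2: sum of squared deviations (SS)
variance = ss / N                           # Step 3: average squared deviation
sd = sqrt(variance)                         # Step 4: square root of the variance

print(deviations, ss, variance, round(sd, 2))
```

The deviation scores sum to zero, which is why they are squared before being averaged.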
An Example: Computing Standard Deviation (sample)
- To review:
  - Step 1: Compute the deviation scores
  - Step 2: Compute the SS
  - Step 3: Determine the variance
    - Take the average of the squared deviations
    - Divide the SS by n - 1, where n is the number of scores in the sample
  - Step 4: Determine the standard deviation
    - Take the square root of the variance
- Dividing by n - 1 rather than by the number of scores is done because samples are biased to be less variable than the population. This correction factor increases the sample's SD (making it a better estimate of the population's SD).
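To see the effect of the correction, the sketch below treats the scores 2, 4, 6, 8 as a sample rather than a population (an assumption made only for this comparison) and computes the SD both ways. The standard-library `statistics.stdev` function also divides by n - 1, so it is used as a cross-check.

```python
from math import sqrt
from statistics import stdev

sample = [2, 4, 6, 8]
n = len(sample)
m = sum(sample) / n                       # sample mean
ss = sum((x - m) ** 2 for x in sample)    # sum of squared deviations

sd_divide_by_n = sqrt(ss / n)             # no correction (population formula)
sd_divide_by_n_minus_1 = sqrt(ss / (n - 1))  # sample formula with the correction

print(round(sd_divide_by_n, 2), round(sd_divide_by_n_minus_1, 2))
print(round(stdev(sample), 2))            # library version of the sample SD
```

Dividing by the smaller number always gives the larger SD, which is the direction of the correction described above.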
Relationships between variables
- Example: Suppose that you notice that the more you study for an exam, the better your score typically is.
  - This suggests that there is a relationship between study time and test performance.
  - We call this relationship a correlation.
Relationships between variables
- Properties of a correlation
  - Form (linear or non-linear)
  - Direction (positive or negative)
  - Strength (none, weak, strong, perfect)
- To examine this relationship you should
  - Make a scatterplot
  - Compute the correlation coefficient
Scatterplot
- Plots one variable against the other
- Useful for seeing the relationship
  - Form, direction, and strength
- Each point corresponds to a different individual
- Imagine a line through the data points
Scatterplot
Hours of study (X)    Exam performance (Y)
6                     6
1                     2
5                     6
3                     4
3                     2
Correlation Coefficient
- A numerical description of the relationship between two variables
- For the relationship between two continuous variables we use Pearson's r
- It basically tells us how much our two variables vary together
- As X goes up, what does Y typically do?
  - X goes up, Y goes up
  - X goes up, Y goes down
  - X goes up, Y shows no consistent change
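As a sketch of how the coefficient can be computed, the code below applies the standard definition of Pearson's r (the sum of products of deviations divided by the square root of the product of the two sums of squared deviations) to the hours-of-study and exam-performance pairs from the scatterplot example. The formula itself is not given in these slides, and `pearson_r` is a helper name introduced here.

```python
from math import sqrt

hours = [6, 1, 5, 3, 3]   # X: hours of study
exam = [6, 2, 6, 4, 2]    # Y: exam performance

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: SP / sqrt(SS_x * SS_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: how much X and Y vary together
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

r = pearson_r(hours, exam)
print(round(r, 3))
```

For these five pairs r comes out positive and fairly close to 1, matching the pattern that more hours of study go with higher exam performance.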
Form
Direction
Positive
- As X goes up, Y goes up
- X and Y vary in the same direction
- Positive Pearson's r
Negative
- As X goes up, Y goes down
- X and Y vary in opposite directions
- Negative Pearson's r
Strength
- Zero means no relationship.
- The farther r is from zero (in either direction), the stronger the relationship
- The strength of the relationship: the spread of the points around the line (note the axis scales)
Strength
r = -1.0: perfect negative correlation
Strength
Rel A: r = -0.8
Rel B: r = 0.5
Which relationship is stronger? Rel A: -0.8 is stronger than 0.5, because strength depends on how far r is from zero, not on its sign.
Regression
- Compute the equation for the line that best fits the data points
$Y = (X)(\text{slope}) + (\text{intercept})$
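One common way to find such a line is ordinary least squares. The slides do not name a fitting method, so treating "best fits" as least squares is an assumption of this sketch, as are the helper name `fit_line` and the reuse of the hours-of-study data from the scatterplot example.

```python
hours = [6, 1, 5, 3, 3]   # X: hours of study
exam = [6, 2, 6, 4, 2]    # Y: exam performance

def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = (X)(slope) + (intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = fit_line(hours, exam)
print(round(slope, 3), round(intercept, 3))
```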
Regression
- Can make specific predictions about Y based on X
X = 5, Y = ?
$Y = (X)(.5) + (2.0)$
$Y = (5)(.5) + (2.0) = 2.5 + 2 = 4.5$
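The same prediction can be expressed as a small function. The slope (.5) and intercept (2.0) are the values from the example above; `predict` is just an illustrative name.

```python
def predict(x, slope=0.5, intercept=2.0):
    """Predicted Y for a given X on the line Y = (X)(slope) + (intercept)."""
    return x * slope + intercept

print(predict(5))  # 4.5
```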
Regression
- Also need a measure of error
$Y = X(.5) + (2.0) + \text{error}$
- Same line, but different relationships (strength difference)
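A rough way to see "same line, different strength" is to measure how far the points fall from the line. The sketch below uses the root-mean-square error as that measure, which is a choice made here since the slides do not name a specific one. Both Y lists are hypothetical and were constructed so that the best-fitting least-squares line is Y = X(.5) + 2.0 in each case.

```python
from math import sqrt

SLOPE, INTERCEPT = 0.5, 2.0   # the line from the example: Y = X(.5) + 2.0

def rmse(xs, ys):
    """Root-mean-square distance of the observed Y values from the line."""
    errors = [y - (x * SLOPE + INTERCEPT) for x, y in zip(xs, ys)]
    return sqrt(sum(e ** 2 for e in errors) / len(errors))

xs = [1, 2, 3, 4, 5]
tight = [2.6, 2.8, 3.5, 4.2, 4.4]   # hypothetical: points close to the line
loose = [3.0, 2.0, 3.5, 5.0, 4.0]   # hypothetical: points far from the line

print(round(rmse(xs, tight), 3), round(rmse(xs, loose), 3))
```

The larger error for the second list corresponds to a weaker relationship, even though the line is identical.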
Cautions with correlation and regression
- Don't make causal claims
- Don't extrapolate
- Extreme scores (outliers) can strongly influence the calculated relationship
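The last caution can be illustrated with a small sketch. It takes the five hours-of-study pairs from the scatterplot example and adds a single hypothetical extreme score (X = 10, Y = 0, invented for this illustration), then compares Pearson's r with and without it. `pearson_r` is a helper defined here using the standard formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviation scores: SP / sqrt(SS_x * SS_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

hours = [6, 1, 5, 3, 3]
exam = [6, 2, 6, 4, 2]

r_without = pearson_r(hours, exam)
# Add one hypothetical outlier: a lot of study time paired with a very low score
r_with = pearson_r(hours + [10], exam + [0])

print(round(r_without, 2), round(r_with, 2))
```

In this made-up case one extra point is enough to turn a strong positive r into a weak negative one.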