Title: Psychology 9
- Quantitative Methods in Psychology
- Jack Wright
- Brown University
- Session 7
Note. These lecture materials are intended
solely for the private use of Brown University
students enrolled in Psychology 9, Spring
Semester, 2002-03. All other uses, including
duplication and redistribution, are unauthorized.
2. Agenda
- Working with percentiles
- Central tendencies
- Variability
- Assignment: remainder of Chapter 4
3. Percentiles: reprise and preview
- A useful way of describing how high X is relative to the sample in question
- To say that X is at the 90th percentile is to say that 90% of the sample falls at or below X
- Percentiles provide the foundation for many of the methods we will develop later
- E.g., making inferences about normal distributions
- Making inferences about sample means
4. Finding percentiles using summary frequency distributions

Interval (X)   f   cf    %    c%
1-2            1    1   10    10
3-4            2    3   20    30
5-6            4    7   40    70
7-8            2    9   20    90
9-10           1   10   10   100

[Figure: histogram of cases by interval]

- Note:
- 1. Intervals extend to the upper and lower real limits (e.g., 4.5-6.5 for the interval 5-6)
- E.g., the percentile rank for 6.5 is 70
- For other values of X, you must interpolate
- E.g., what is the percentile rank for X = 5.5?
- See next slide
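The cf, % and c% columns can all be rebuilt from the f column alone. A minimal Python sketch, assuming the five intervals and frequencies in the table above; `cumulative_table` is a hypothetical helper name, not part of the course materials:

```python
# Hypothetical helper: rebuild cf, % and c% from the interval frequencies f.
def cumulative_table(intervals, freqs):
    n = sum(freqs)                       # total number of cases (N)
    rows = []
    running = 0
    for interval, f in zip(intervals, freqs):
        running += f                     # cumulative frequency (cf)
        pct = 100 * f / n                # percentage of cases in this interval
        cum_pct = 100 * running / n      # cumulative percentage (c%)
        rows.append((interval, f, running, pct, cum_pct))
    return rows


intervals = ["1-2", "3-4", "5-6", "7-8", "9-10"]
freqs = [1, 2, 4, 2, 1]
for interval, f, cf, pct, cum_pct in cumulative_table(intervals, freqs):
    print(f"{interval:>5} {f:>3} {cf:>3} {pct:>6.1f} {cum_pct:>6.1f}")
```

The c% column is what a percentile rank reads off directly at each upper real limit.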
5. Working with percentiles
(Frequency distribution as on slide 4.)
- What is the percentile rank for X = 5.5?
- How far into the interval? (5.5 - 4.5)/2 = 1/2 = .50
- How many cases that far in? .50 × 4 = 2
- Get the cumulative percentage: (3 + 2)/10 = 5/10 = 50%
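The interpolation steps can be collected into one small function. This is a sketch, assuming equal-width intervals (width 2 here) with real lower limits 0.5, 2.5, 4.5, 6.5, 8.5; `percentile_rank` and its parameter names are illustrative:

```python
def percentile_rank(x, lower_limits, width, freqs):
    """Percentile rank of x in a grouped frequency distribution.

    Interpolates linearly inside the interval that contains x.
    lower_limits: real lower limits of the intervals, in order
    width: common interval width (assumed equal for every interval)
    freqs: frequency f of each interval
    """
    n = sum(freqs)
    if x <= lower_limits[0]:
        return 0.0
    cases_below = 0                          # cf of the intervals below x's interval
    for lower, f in zip(lower_limits, freqs):
        if x <= lower + width:               # x falls in this interval
            fraction = (x - lower) / width   # how far into the interval
            cases_in = fraction * f          # how many cases that far in
            return 100 * (cases_below + cases_in) / n
        cases_below += f
    return 100.0                             # x is above the top real limit


# Real lower limits and frequencies for the slide 4 table.
lower_limits = [0.5, 2.5, 4.5, 6.5, 8.5]
freqs = [1, 2, 4, 2, 1]
print(percentile_rank(5.5, lower_limits, 2, freqs))   # 50.0
```

The same call with x = 5 returns 40.0, and with x = 6.5 (an exact upper real limit) it returns 70.0, matching the c% column.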
6. A less obvious case of percentiles (optional)
(Frequency distribution as on slide 4.)
- What is the percentile rank for X = 5?
- How far into the interval? (5 - 4.5)/2 = .5/2 = .25
- How many cases that far in? .25 × 4 = 1
- Get the cumulative percentage: (3 + 1)/10 = 4/10 = 40%
7. Reversing the problem: find X for some percentile
(Frequency distribution as on slide 4.)
- What score would put you at the 70th percentile?
- Answer: when the exact percentile is already available in the table
- Simply read off the value of X using the true limits
- X (at the 70th percentile) = 6.5
8. A less obvious case of finding X (optional)
(Frequency distribution as on slide 4; N = 10.)
- What score would put you at the 80th percentile?
- Get the n needed to reach that percentile
- .80 × N = 8, or 1 case into the next interval
- Get how much distance to cover in the interval to reach this n
- (1/freq. in interval) × interval width = (1/2) × 2 = 1
- Get X by adding this distance to the lower limit: X = 6.5 + 1 = 7.5
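Reversing the computation runs the same steps in the opposite direction: find the n needed, find how far into the interval that n lies, and add that distance to the real lower limit. A sketch under the same assumptions as before (equal widths, real lower limits 0.5, 2.5, ...); `score_at_percentile` is a hypothetical name:

```python
def score_at_percentile(p, lower_limits, width, freqs):
    """Score X at percentile p (0-100) in a grouped frequency distribution."""
    n = sum(freqs)
    target = p * n / 100                     # n needed to reach that percentile
    cases_below = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cases_below + f >= target:
            needed = target - cases_below    # cases still needed inside this interval
            distance = (needed / f) * width  # distance to cover in this interval
            return lower + distance          # add to the real lower limit
        cases_below += f
    return lower_limits[-1] + width          # only reached if p exceeds 100


lower_limits = [0.5, 2.5, 4.5, 6.5, 8.5]
freqs = [1, 2, 4, 2, 1]
print(score_at_percentile(80, lower_limits, 2, freqs))   # 7.5
print(score_at_percentile(70, lower_limits, 2, freqs))   # 6.5
```

The 70th percentile case lands exactly on a real limit, so the "interpolation" covers the whole 5-6 interval and returns its upper limit, 6.5.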
9. Note about the text
- Page 123, problem 3
- The frequency distribution is incorrect: an interval is missing
- Answer 4 is incorrect and/or is the solution to a table other than the table in problem 3
10. From graphical to numeric summaries
- So far, we have considered graphical methods
- Advantages: rich, flexible, capitalize on the power of the visual cortex
- Disadvantages: cumbersome, inefficient
- Descriptive statistics and numeric summaries
- Aim to summarize central tendency, variability, and shape
- Aim to do so more efficiently than graphical methods
- Tradeoff
- Easier to be misled if we are careless
- Easier to forget what we are measuring
11. Three measures of central tendency
Mode: the most common value(s); the most populous interval(s)

0l | 1
0t | 2 3
0f | 4 4 4 5
0s | 6 6 7
0h | 8
1l | 1
1t |

Median: the value that divides the sample into 2 equal parts (50th percentile) = 4.5
Mean: ΣX/N = 61/12 = 5.08
Note: the decimal point is one place to the right of the stem (after the leaf), so 0f | 4 = 4 and 1l | 1 = 11.
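Python's standard `statistics` module computes all three measures. The score list below is read off the stem-and-leaf display; the largest score is taken to be 11, an assumption chosen because it is the value consistent with the slide's ΣX = 61 and N = 12:

```python
import statistics

# Scores read off the stem-and-leaf display (decimal point after the leaf).
# Assumption: the last score is 11, inferred from the slide's totals (sum 61, N = 12).
scores = [1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 11]

print(statistics.multimode(scores))         # [4]
print(statistics.median(scores))            # 4.5
print(round(statistics.mean(scores), 2))    # 5.08
```

`multimode` returns a list because, as the mode summary notes, the mode is not necessarily unique.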
12. Footnote on the mode
- Two uses
- 1. The most frequent exact value
- E.g., 4 in the last example
- Useful with discrete variables
- 2. The most populous interval
- E.g., the interval 4-5 in the last example
- The interval interpretation is necessary when using continuous variables
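The second use of the mode, the most populous interval, can be sketched by grouping scores into two-point intervals (0-1, 2-3, 4-5, ...). That grouping is an assumption chosen to match the l/t/f/s/h lines of the display; a different interval width could give a different modal interval:

```python
from collections import Counter

scores = [1, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 11]

# Assumed grouping: two-point intervals keyed by their lower end (0, 2, 4, ...),
# matching the l/t/f/s/h lines of the stem-and-leaf display.
interval_counts = Counter(2 * (x // 2) for x in scores)
lower, count = interval_counts.most_common(1)[0]
print(f"most populous interval: {lower}-{lower + 1} ({count} cases)")
```

This prints the interval 4-5 with 4 cases, the same modal interval read from the 0f line.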
13. Strengths and weaknesses of these measures: Ex. 1
Modes: 2.298, 2.310

2.29h | 8 8 8 8 8 8
2.30l | 1 1
2.30t |
2.30f |
2.30s |
2.30h | 9
2.31l | 0 0 0 0 0 0
2.31t | 2

Mean: 2.304
Median: 2.305
Note: neither the mean nor the median is a good description of central tendency here, because there is no single central tendency (the distribution has two modes).
14. Strengths and weaknesses: Ex. 2

2.29h | 8 8 8 8 8 8
2.30l | 1 1
2.30t |
2.30f |
2.30s |
2.30h | 9
2.31l | 0 0 0 0 0 0
2.31t | 2
2.31f |
. . . etc. . . .
5.00 | 0

Modes: 2.298, 2.310 (not changed)
Median: 2.309 (almost no change)
Mean: 2.46 (now lies above every value but one)
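The robustness contrast can be reproduced with the `statistics` module. The lists are read off the two displays; Ex. 2 is assumed to be Ex. 1 plus the single extreme score 5.000, which reproduces the slide's median of 2.309 and mean of 2.46:

```python
import statistics

# Ex. 1 values read off the stem-and-leaf display (N = 16).
ex1 = [2.298] * 6 + [2.301] * 2 + [2.309] + [2.310] * 6 + [2.312]
# Ex. 2: assumed to be Ex. 1 plus one extreme score, 5.000 (N = 17).
ex2 = ex1 + [5.000]

for label, data in (("Ex. 1", ex1), ("Ex. 2", ex2)):
    print(label,
          "modes:", statistics.multimode(data),
          "median:", round(statistics.median(data), 3),
          "mean:", round(statistics.mean(data), 3))
```

One extreme score shifts the median by a single step in the ordered data but pulls the mean from about 2.304 to about 2.463.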
15. Summary
- Mode
- Strengths
- Easy to identify (usually)
- But when based on intervals, the mode will change as the intervals are changed
- Weaknesses
- Not necessarily unique
- Not necessarily central
16. Summary
- Median
- Strengths
- Uses more information than the mode
- Not disturbed by extreme scores (robust)
- Weaknesses
- Does not use all the information in the sample: ranks matter, but distances do not
- Awkward to compute (e.g., when ties exist)
- Insensitive to bimodality
- The medians of two samples cannot be combined to determine the median of the combined samples
17. Monday 6.10.03: ended here
18. Summary
- Mean
- Strengths
- Uses all of the information in the sample: distances matter
- Mathematically appealing
- E.g., the means of two samples can be combined to determine the mean of the combined samples
- Weaknesses
- Disturbed by extreme scores: not robust
- Therefore, can be misleading when data are skewed
- Like the median, insensitive to bimodality
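The combining contrast between the mean and the median can be checked on two small illustrative samples (the data are chosen here for the demonstration, not taken from the slides). Sample means combine exactly when each is weighted by its sample size; no such average of the two medians recovers the combined median:

```python
import statistics

a = [1, 2, 9]           # illustrative sample A
b = [4, 5, 6, 7, 8]     # illustrative sample B
combined = a + b

# Means combine: weight each sample mean by its sample size.
pooled_mean = (len(a) * statistics.mean(a) + len(b) * statistics.mean(b)) / len(combined)
print(pooled_mean, statistics.mean(combined))            # 5.25 5.25

# Medians do not: neither the simple nor the size-weighted average of the
# two medians matches the median of the combined sample here.
med_a, med_b = statistics.median(a), statistics.median(b)
print((med_a + med_b) / 2)                                # 4.0
print((len(a) * med_a + len(b) * med_b) / len(combined))  # 4.5
print(statistics.median(combined))                        # 5.5
```

The pooled mean needs only each sample's mean and N; the combined median depends on how the two samples interleave, which the two medians do not record.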
19. Thinking about central tendencies
[Figure: histogram of cases by interval]
- Where is the middle?
- How would we know if one description of the middle is better than another?
20. Thinking about central tendencies
- Clearly, we want our estimate of the middle (M) to be as near as possible to the data
- How are we going to define "near"?
- Perhaps the average distance from M to each datum?
- Snag: negative distances
- E.g., for X = 3, 4, 5, M = 6 and M = 2 are equally far from the data
- But the distances are positive in one case and negative in the other
- So what to do?
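The snag is easy to see numerically. In this sketch (the function names are illustrative), M = 6 and M = 2 are the same average distance from the data, yet their average signed deviations have opposite signs, so signed distances cannot serve as a measure of "how near":

```python
data = [3, 4, 5]

def mean_signed_deviation(m, xs):
    """Average of (X - M): positive and negative deviations cancel."""
    return sum(x - m for x in xs) / len(xs)

def mean_absolute_deviation(m, xs):
    """Average of |X - M|: every deviation counts as a distance."""
    return sum(abs(x - m) for x in xs) / len(xs)

for m in (6, 2):
    print(m, mean_signed_deviation(m, data), mean_absolute_deviation(m, data))
# 6 -2.0 2.0
# 2 2.0 2.0
```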
21. Thinking about central tendencies
- Two options
- 1. Take the absolute value of the deviations
- Data: 1, 2, 9
- If guess = 9: |X - 9| = 8, 7, 0; sum = 15
- If guess = 3: |X - 3| = 2, 1, 6; sum = 9
- If guess = 2: |X - 2| = 1, 0, 7; sum = 8
- 2. Take the square of the deviations
- Data: 1, 2, 9
- If guess = 9: (X - 9)² = 64, 49, 0; sum = 113
- If guess = 3: (X - 3)² = 4, 1, 36; sum = 41
- If guess = 2: (X - 2)² = 1, 0, 49; sum = 50
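Both criteria are one-line sums. A short sketch that reproduces the six sums above for the data 1, 2, 9 (`sum_abs_dev` and `sum_sq_dev` are illustrative names):

```python
data = [1, 2, 9]

def sum_abs_dev(guess, xs):
    """Option 1: sum of absolute deviations |X - guess|."""
    return sum(abs(x - guess) for x in xs)

def sum_sq_dev(guess, xs):
    """Option 2: sum of squared deviations (X - guess)^2."""
    return sum((x - guess) ** 2 for x in xs)

for guess in (9, 3, 2):
    print(guess, sum_abs_dev(guess, data), sum_sq_dev(guess, data))
# 9 15 113
# 3 9 41
# 2 8 50
```

The two criteria already disagree: guess 2 beats guess 3 on absolute deviations, but guess 3 beats guess 2 on squared deviations.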
22. Thinking about central tendencies
- Now just imagine that we make many guesses about where the middle is (data: 1, 2, 9)
- For each guess, assess how far it is from the data by these two criteria:
- Sum of absolute differences
- Sum of squared differences
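23. Absolute distances
[Figure: sum of absolute distances plotted against guess of where the middle is; the minimum falls at the median]
24. Squared distances
[Figure: sum of squared distances plotted against guess of where the middle is; the minimum falls at the mean]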
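The "many guesses" experiment behind the two plots can be approximated with a grid search. This sketch assumes guesses from 0 to 10 in steps of 0.01 (built from integers so that values such as 2.0 and 4.0 are exact) and picks the guess with the smallest sum under each criterion:

```python
import statistics

data = [1, 2, 9]

def sum_abs_dev(guess, xs):
    return sum(abs(x - guess) for x in xs)     # criterion 1: absolute distances

def sum_sq_dev(guess, xs):
    return sum((x - guess) ** 2 for x in xs)   # criterion 2: squared distances

# Assumed grid of guesses: 0.00, 0.01, ..., 10.00.
guesses = [i / 100 for i in range(0, 1001)]

best_abs = min(guesses, key=lambda g: sum_abs_dev(g, data))
best_sq = min(guesses, key=lambda g: sum_sq_dev(g, data))

print(best_abs, statistics.median(data))   # 2.0 2
print(best_sq, statistics.mean(data))      # 4.0 4
```

The absolute-distance curve bottoms out at the median (2) and the squared-distance curve at the mean (4), as in the two figures.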
25. Thinking about central tendencies
- Implications
- The middle we select depends on how we operationalize the distance from the data to the middle
- The median minimizes the sum of absolute distances
- The mean minimizes the sum of squared distances
- Therefore, the mean is known as the least-squares estimate of central tendency
- Why prefer the mean?
- The erratic behavior of the sum of absolute distances, as we have seen
- Other problems already noted (e.g., combining samples)
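The least-squares property of the mean can be checked with one step of calculus (a standard result, sketched here rather than taken from the slides): set the derivative of the sum of squared distances S(M) to zero and solve for M.

```latex
S(M) = \sum_{i=1}^{N} (X_i - M)^2
\qquad
\frac{dS}{dM} = -2 \sum_{i=1}^{N} (X_i - M) = 0
\;\Longrightarrow\;
\sum_{i=1}^{N} X_i = N M
\;\Longrightarrow\;
M = \frac{\sum_{i=1}^{N} X_i}{N}
\qquad
\frac{d^2 S}{dM^2} = 2N > 0
```

For the data 1, 2, 9 this gives M = 12/3 = 4, where the sum of squares is 9 + 4 + 25 = 38, smaller than the 41 and 50 obtained for guesses 3 and 2.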