Title: Lecture 5 How much variety is there? Measures of variation
1 Lecture 5: How much variety is there? Measures of variation
- Sociology 549
- Paul von Hippel
2 Measures of variation
- Sensitive (to extreme values)
- standard deviation ($s$)
- variance ($s^2$)
- range
- Robust
- interquartile range (IQR)
- Standard deviation
- is basis for standardization
3 Center vs. variation
- Two distributions can have same center
- But differ with respect to variation
- 2 basketball teams
- Clippers
- Knicks
- Similar mean height, between 6'6" and 6'7"
- But they don't match up well
4 Basketball teams: Mean heights
5 Variance and standard deviation
- Most common measures are
- Variance ($s^2$)
- Standard deviation (s)
- To understand
- Must understand deviation
6 Deviations from the mean
- Deviation from the mean is $Y - \bar{Y}$
- $Y$ is the value for a particular case
- $Y = 65$ inches for Earl Boykins
- $\bar{Y}$ is the mean over all the cases
- Deviation = 65 - 78.54 = -13.54 inches for Earl Boykins
- Interpretation: He is 13.54 inches shorter than the team mean
7 Variance: Calculation
- Variance is (roughly) the mean of the squared deviations.
- Formula: $s^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}$
- Steps
- Calculate the deviation for each case
- Square each deviation
- Sum the squared deviations
- Divide by N - 1 (not N)
- Remember: N is the sample size, the number of cases
- Why N - 1?
- If N = 1
- you can't see variation
- and you can't divide by N - 1 = 0
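The four steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_variance` and the heights in the example are made up here, not taken from the Clippers roster.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations divided by N - 1."""
    n = len(values)
    if n < 2:
        # With N = 1 there is no variation to see, and N - 1 = 0.
        raise ValueError("need at least two cases")
    mean = sum(values) / n
    deviations = [y - mean for y in values]   # step 1: deviation for each case
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    total = sum(squared)                      # step 3: sum the squared deviations
    return total / (n - 1)                    # step 4: divide by N - 1


# Hypothetical heights in inches (illustrative only).
heights = [70, 72, 75, 79]
print(sample_variance(heights))
```

Python's standard library offers `statistics.variance`, which also divides by N - 1, so it can be used to check a hand calculation.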
8 Variance: Example
- Note influence of extreme cases (esp. Boykins)
9 Variance: Interpretation
- More variety → larger variance
- Beyond that, not easy to interpret
- Variance is in squared units
- Need to un-square them
10 Standard deviation: Calculation
- How can we un-square the variance?
- Take the square root!
- Square root of variance is standard deviation
- Example. For Clippers,
- $s = (23.10)^{1/2} = 4.81$ inches
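As a quick check of the Clippers example, the square root can be taken directly. The helper name `standard_deviation` is an assumption made for this sketch; the variance 23.10 is the figure quoted on the slide.

```python
import math


def standard_deviation(variance):
    """Un-square the variance: the SD is its square root."""
    return math.sqrt(variance)


clippers_variance = 23.10  # inches squared, from the slide
clippers_sd = standard_deviation(clippers_variance)
print(round(clippers_sd, 2))  # 4.81 (inches)
```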
11 Standard deviation: Interpretation
- Variance is in squared units
- Variance of Clipper heights is 23.10 inches-squared
- Standard deviation is in original units
- SD of Clipper heights = 4.81 inches
- Deviations from mean also in inches
- Boykins's deviation = -13.54 inches
- Can compare
- Standard deviation is a standard to which deviations are compared
12 Standardization: Deviation vs. standard deviation
- Earl Boykins has a deviation of -13.54 inches
- The standard deviation is 4.81 inches
- So Earl Boykins is -13.54/4.81 = -2.81 standard deviations from the mean height for his team
- This is a standard score, or Z score
- General formula: $Z = \frac{Y - \bar{Y}}{s_Y}$
- Interpretation
- The case is Z standard deviations from the mean
- E.g., Boykins is 2.81 standard deviations below the mean height
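The Boykins calculation can be reproduced with a one-line function. The name `standard_score` is chosen here for illustration; the inputs (65, 78.54, 4.81) are the values quoted on the slides.

```python
def standard_score(y, mean, sd):
    """Z score: how many standard deviations y lies from the mean."""
    return (y - mean) / sd


# Earl Boykins: height 65 inches, team mean 78.54, team SD 4.81.
z = standard_score(65, 78.54, 4.81)
print(round(z, 2))  # -2.81
```

The negative sign carries the direction: a negative Z means the case is below the mean.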
13 Standard scores: Interpretation
- Extreme values → extreme standard scores
- It's rare to find Z > 2 or Z < -2
14 Exercise
- For exam scores below
- mean = 70.2
- standard deviation = 25
- Calculate and interpret the standard score of the most extreme value.
15 Reversing standardization
- Given
- standard score $Z$
- mean $\bar{Y}$
- standard deviation $s_Y$
- You can get back the raw score: $Y = \bar{Y} + Z s_Y$
- This is just a rearrangement of the standardization formula
16 Reversing standardization: Example
- Earl Boykins is 2.81 standard deviations below the mean for his team.
- His team has a mean height of 78.54 inches, and a standard deviation of 4.81 inches
- What is Earl Boykins's height again?
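One way to work the example is to rearrange the standardization formula so that the raw score equals the mean plus Z times the standard deviation. The function name `raw_score` is an assumption for this sketch; the numbers come from the slide.

```python
def raw_score(z, mean, sd):
    """Reverse standardization: raw score = mean + Z * SD."""
    return mean + z * sd


# Boykins is 2.81 SDs below the mean, so Z = -2.81.
height = raw_score(-2.81, 78.54, 4.81)
print(round(height, 1))  # 65.0 (inches)
```

The small gap between the unrounded result and exactly 65 comes from Z having been rounded to two decimals.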
17 Dummy variables: review
- Suppose Y is a dummy variable
- e.g. Y = 1 if a student is female, Y = 0 if male
- Some proportion p have Y = 1 (female)
- p is also the mean, i.e. $\bar{Y} = p$
18 Dummy variables: variance and SD
- Can calculate variance (and SD) in usual way
- But there's a shortcut
- $s^2 = p(1-p)$
- $s = (s^2)^{1/2}$
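The shortcut can be checked numerically. The sketch below uses a made-up group of five students (three female, two male). It also shows one caveat that holds in general: p(1 - p) equals the mean squared deviation when dividing by N, while the N - 1 version is larger by a factor of N/(N - 1), a difference that is negligible in large samples.

```python
def dummy_variance(p):
    """Shortcut variance for a dummy variable with proportion p of ones."""
    return p * (1 - p)


# Hypothetical data: 1 = female, 0 = male.
y = [1, 1, 1, 0, 0]
n = len(y)
p = sum(y) / n  # proportion of ones, which is also the mean

sum_sq_dev = sum((v - p) ** 2 for v in y)
var_divide_by_n = sum_sq_dev / n          # matches the shortcut
var_divide_by_n_minus_1 = sum_sq_dev / (n - 1)

print(dummy_variance(p), var_divide_by_n, var_divide_by_n_minus_1)
```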
19 Dummy variance and SD: Examples
Makes sense: colleges with more gender variety have larger variance (and SD).
20 Other measures of variation
- In addition to variance and SD
- Range
- Inter-Quartile Range (IQR)
21 Range: Calculation
- Largest minus smallest value
- E.g., Clippers
- shortest = 65 inches (Boykins)
- tallest = 84 inches (Olowokandi)
- Range = 84 - 65 = 19 inches
22 Range: Interpretation
- Really easy
- All the player heights fit in a 19-inch range, from 65 to 84 inches.
- But
- Sensitive to extreme values
- Uses only extreme values!
- Tends to increase with N (adding cases can widen the range but never narrow it)
23 Interquartile range: Motivation
- Less sensitive to extreme values
- Could be called "trimmed range"
- Range ignoring extreme scores
- Recipe
- 3rd quartile - 1st quartile
24 Finding the quartiles
- Quartiles split the distribution into quarters
- Split the distribution in half
- at the median (2nd quartile)
- 1st quartile = median of smaller half
- 3rd quartile = median of larger half
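The median-of-halves recipe can be sketched as follows. One detail is an assumption, since the slides do not state it: when N is odd, this version leaves the median itself out of both halves. Other conventions exist and give slightly different quartiles. The function names and the example data are hypothetical.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves recipe.

    Assumption: for odd N the middle case is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two cases")
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


def iqr(values):
    """Interquartile range: 3rd quartile minus 1st quartile."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


# Hypothetical values (illustrative only).
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
print(iqr([1, 2, 3, 4, 5, 6, 7]))
```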
25 IQR: Example with odd N
[Figure: sorted player heights with the median marked]
IQR = 81.5 - 77 = 4.5 inches
Interpretation: About half the players (7 of 13) have heights within a 4.5-inch range, between 77 inches (6'5") and 81.5 inches (6'9.5").
26 IQR: Example with even N
[Figure: sorted player heights with the median marked]
IQR = 81 - 75 = 6 inches
Interpretation: About half the players (6 or 8 of 14) have heights within a 6-inch range, between 75 inches (6'3") and 81 inches (6'9").
27 Comparing measures of variation
- Using range, variance, or SD, Clippers look more variable.
- But using IQR, Knicks look more variable.
- Why?
28 Influence of extreme values
One extreme height (Boykins) expands range and variance of Clippers, but can't affect IQR.

Centrality measure    Extreme values
Mean                  Influential
Trimmed mean          Less influential
Median                Not influential

Variation measure     Extreme values
Range                 Very influential
Variance (and SD)     Influential
IQR                   Less influential
29 Formulas for frequency tables
30 IQR from a frequency table
- Tricky to get a recipe that's always right.
- Rough method, usually right for large N.
- Q1 = first value with cumulative percent c > 25%
- Q3 = first value with cumulative percent c > 75%
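The rough method can be sketched as a scan down the cumulative percentages. The function name `quartile_from_table` and the household counts below are hypothetical; they are not the survey data from the lecture.

```python
def quartile_from_table(table, cutoff):
    """First value whose cumulative percent exceeds `cutoff`.

    table: list of (value, frequency) pairs; cutoff: a percent such as 25 or 75.
    """
    rows = sorted(table)
    total = sum(freq for _, freq in rows)
    cumulative = 0
    for value, freq in rows:
        cumulative += freq
        if 100 * cumulative / total > cutoff:
            return value
    raise ValueError("cutoff must be below 100")


# Hypothetical frequency table: (residents per household, number of households).
households = [(1, 30), (2, 50), (3, 15), (4, 5)]

q1 = quartile_from_table(households, 25)
q3 = quartile_from_table(households, 75)
print(q1, q3, q3 - q1)
```

With these made-up counts the cumulative percents are 30, 80, 95, and 100, so the scan stops at 1 for Q1 and at 2 for Q3.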
31 IQR from a frequency table: Example
- Q1 = 1, Q3 = 2, IQR = Q3 - Q1 = 1
- Interpretation
- More than 50% of surveyed households had 1 to 2 residents.
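32 Variance from a frequency table
- Data set
- Mean: $\bar{Y} = \frac{\sum Y}{N}$
- Variance (mean squared deviation): $s^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}$
- Frequency table (each value $Y$ occurs with frequency $f$)
- Mean: $\bar{Y} = \frac{\sum f Y}{N}$
- Variance (mean squared deviation): $s^2 = \frac{\sum f (Y - \bar{Y})^2}{N - 1}$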
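A short sketch can confirm that weighting by frequency gives the same answer as working from the raw data. It assumes the frequency-weighted forms of the mean and variance (each value counted f times, dividing by N - 1). The function names and numbers are hypothetical.

```python
def variance_from_raw(values):
    """Sample variance from a list of raw values (divides by N - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)


def variance_from_table(table):
    """Sample variance from (value, frequency) pairs (divides by N - 1)."""
    n = sum(freq for _, freq in table)
    mean = sum(freq * y for y, freq in table) / n
    return sum(freq * (y - mean) ** 2 for y, freq in table) / (n - 1)


# The same hypothetical data in both forms.
raw = [1, 1, 2, 2, 2, 4]
table = [(1, 2), (2, 3), (4, 1)]

print(variance_from_raw(raw), variance_from_table(table))
```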
33 Variance from frequency table: Example
- Same answer as from raw data.
34 Summary
- "What's typical?" isn't the whole story
- Lots of cases aren't typical
- Some important cases may be very atypical
- Measures of variation
- Variance $s^2$, standard deviation $s$
- IQR
- Variance and s.d. are sensitive, IQR is robust
- Remaining lectures use variance and s.d.
- S.D. is basis for standardization
35 Bonus slides
36 Exercise
- Given exam scores
- Calculate variance, standard deviation, range and IQR
- Interpret range and IQR
37 Answer
Q1 = (31 + 65)/2 = 48
Q3 = (88 + 95)/2 = 91.5
IQR = 91.5 - 48 = 43.5
- Interpretation
- All the scores fit in a 64-point range (31 to 95).
- But over half the scores fit in a 43.5-point range.
- (Here even the IQR is influenced by the lowest score.)
38 Warning
- There are other formulas for IQR!
- Your textbook's is the worst (not symmetric).
- For Excel's formula, see http://www.staff.city.ac.uk/r.j.gerrard/excelfaq/faq.html#qtls
- But we won't get fussy
- Discrepancies are small in large samples
39 Variance of a dummy: Proof
p is the proportion with Y = 1; (1 - p) is the proportion with Y = 0.
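The body of the proof is not preserved in this text. One standard derivation, offered here as a sketch and assuming the variance is taken as the mean squared deviation with divisor N, runs as follows: a proportion p of cases have deviation 1 - p, and a proportion 1 - p have deviation 0 - p.

```latex
\begin{align*}
\bar{Y} &= \frac{1}{N}\sum Y = p \\
\frac{1}{N}\sum (Y - \bar{Y})^2
  &= p\,(1 - p)^2 + (1 - p)\,(0 - p)^2 \\
  &= p\,(1 - p)\,\bigl[(1 - p) + p\bigr] \\
  &= p\,(1 - p)
\end{align*}
```

With the N - 1 divisor the result is multiplied by N/(N - 1), which is close to 1 in large samples.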