Title: Lecture 5: How much variety is there? Measures of variation
1. Lecture 5: How much variety is there? Measures of variation
- Sociology 549
- Paul von Hippel
2. Measures of variation
- Sensitive (to extreme values)
- standard deviation (s)
- variance ($s^2$)
- range
- Robust
- interquartile range (IQR)
- Standard deviation
- is the basis for standardization
3. Center vs. variation
- Two distributions can have the same center
- But differ with respect to variation
- 2 basketball teams
- Clippers
- Knicks
- Similar mean height, between 6'6" and 6'7"
- But they don't match up well
4. Basketball teams: Mean heights
5. Variance and standard deviation
- Most common measures are
- Variance ($s^2$)
- Standard deviation (s)
- To understand these
- We must first understand deviation
6. Deviations from the mean
- Deviation from the mean is $Y - \bar{Y}$
- Y is the value for a particular case
- Y = 65 inches for Earl Boykins
- $\bar{Y}$ ("Y bar") is the mean over all the cases
- Deviation = 65 - 78.54 = -13.54 inches for Earl Boykins
- Interpretation: He is 13.54 inches shorter than the team mean
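A minimal Python sketch of the deviation, using the Boykins numbers from these slides (Y = 65 inches, team mean 78.54 inches). The function name `deviation` is just an illustrative choice.

```python
def deviation(y: float, mean: float) -> float:
    """Deviation from the mean: Y minus Y-bar (negative means below the mean)."""
    return y - mean

# Earl Boykins: height 65 inches, team mean 78.54 inches
boykins = deviation(65, 78.54)
print(round(boykins, 2))  # -13.54
```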
7. Variance: Calculation
- Variance is the mean of the squared deviations (except that we divide by N-1 rather than N).
- Formula: $s^2 = \frac{\sum (Y - \bar{Y})^2}{N-1}$
- Steps
- Calculate the deviation for each case
- Square each deviation
- Sum the squared deviations
- Divide by N-1 (not N)
- Remember: N is the sample size (number of cases)
- Why N-1?
- If N = 1
- you can't see variation
- and you can't divide by N-1 = 0
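The four steps above can be sketched in Python as follows. The function name and the three heights are hypothetical and chosen only to keep the arithmetic easy to follow. The standard library's `statistics.variance` uses the same N-1 denominator, so it serves as a cross-check.

```python
import statistics

def sample_variance(values: list[float]) -> float:
    """Variance with the N-1 denominator, following the slide's steps."""
    n = len(values)
    if n < 2:
        # With N = 1 there is no variation to see, and N - 1 = 0.
        raise ValueError("need at least 2 cases")
    mean = sum(values) / n
    deviations = [y - mean for y in values]   # step 1: deviation for each case
    squared = [d * d for d in deviations]     # step 2: square each deviation
    total = sum(squared)                      # step 3: sum the squared deviations
    return total / (n - 1)                    # step 4: divide by N-1, not N

heights = [70, 74, 78]                 # hypothetical heights in inches
print(sample_variance(heights))        # 16.0
print(statistics.variance(heights))    # same value, from the standard library
```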
8. Variance: Example
- Note the influence of extreme cases (esp. Boykins)
9. Variance: Interpretation
- More variety → larger variance
- Beyond that, not easy to interpret
- Variance is in squared units
- Need to un-square them
10. Standard deviation: Calculation
- How can we un-square the variance?
- Take the square root!
- Square root of variance is standard deviation
- Example. For Clippers,
- $s = (23.10)^{1/2} = 4.81$ inches
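The un-squaring step is a single square root. This sketch uses the Clippers variance from the slide (23.10 square inches) and Python's `math.sqrt`.

```python
import math

variance_clippers = 23.10   # inches squared, from the slide
sd_clippers = math.sqrt(variance_clippers)   # back to inches
print(round(sd_clippers, 2))  # 4.81
```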
11. Standard deviation: Interpretation
- Variance is in squared units
- Variance of Clipper heights is 23.10 inches-squared
- Standard deviation is in original units
- SD of Clipper heights = 4.81 inches
- Deviations from the mean are also in inches
- Boykins's deviation = -13.54 inches
- So the two can be compared
- Standard deviation is a
- standard to which
- deviations are compared
12. Standardization: Deviation vs. standard deviation
- Earl Boykins has a deviation of -13.54 inches
- The standard deviation is 4.81 inches
- So Earl Boykins is -13.54/4.81 = -2.81 standard deviations from the mean height for his team
- This is a standard score, or Z score
- General formula: $Z = \frac{Y - \bar{Y}}{s_Y}$
- Interpretation
- The case is Z standard deviations from the mean
- E.g., Boykins is 2.81 standard deviations below the mean height
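The general formula $Z = (Y - \bar{Y}) / s_Y$ as a small Python function, checked against the Boykins numbers (65 inches, mean 78.54, SD 4.81). The name `z_score` is illustrative.

```python
def z_score(y: float, mean: float, sd: float) -> float:
    """Standard score: how many SDs y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

z_boykins = z_score(65, 78.54, 4.81)
print(round(z_boykins, 2))  # -2.81
```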
13. Standard scores: Interpretation
- Extreme values → extreme standard scores
- It's rare to find Z > 2 or Z < -2
14. Exercise
- For exam scores below
- mean 70.2
- standard deviation (25)
- Calculate and interpret the standard score of
the most extreme value.
15. Reversing standardization
- Given
- standard score Z
- mean $\bar{Y}$
- standard deviation $s_Y$
- You can get back the raw score: $Y = \bar{Y} + Z s_Y$
- This is just a rearrangement of the standardization formula
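A sketch of the rearranged formula $Y = \bar{Y} + Z s_Y$. It uses the Boykins figures from the example slide (Z = -2.81, mean 78.54, SD 4.81). The function name `raw_from_z` is hypothetical.

```python
def raw_from_z(z: float, mean: float, sd: float) -> float:
    """Reverse standardization: Y = mean + Z * SD."""
    return mean + z * sd

height = raw_from_z(-2.81, 78.54, 4.81)
print(round(height, 2))  # 65.02
```

The result is 65.02 rather than exactly 65 because Z was rounded to two decimals before being reversed.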
16. Reversing standardization: Example
- Earl Boykins is 2.81 standard deviations below the mean for his team.
- His team has a mean height of 78.54 inches, and a standard deviation of 4.81 inches
- What is Earl Boykins's height again?
17. Dummy variables: Review
- Suppose Y is a dummy variable
- e.g., Y = 1 if a student is female, Y = 0 if male
- Some proportion p have Y = 1 (female)
- p is also the mean, i.e., $\bar{Y} = p$
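18. Dummy variables: Variance & SD
- Can calculate variance (& SD) in the usual way
- But there's a shortcut
- $s^2 = p(1-p)$ (exact when dividing by N; dividing by N-1 gives $p(1-p)\frac{N}{N-1}$, nearly the same for large N)
- $s = (s^2)^{1/2}$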
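A quick Python check of the shortcut on a hypothetical list of ten dummies (1 = female, 0 = male). It shows that $p(1-p)$ equals the divide-by-N variance (`statistics.pvariance`) and that multiplying by N/(N-1) gives the N-1 version (`statistics.variance`). The function name is illustrative.

```python
import statistics

def dummy_variance_shortcut(values: list[int]) -> float:
    """Shortcut s^2 = p(1 - p), where p (the proportion of 1s) is also the mean.

    This matches the divide-by-N variance; the N - 1 version is N/(N - 1) times larger.
    """
    p = sum(values) / len(values)
    return p * (1 - p)

# Hypothetical data: 1 = female, 0 = male (6 of 10 are 1s, so p = 0.6)
y = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
n = len(y)
print(round(dummy_variance_shortcut(y), 4))                # 0.24
print(round(statistics.pvariance(y), 4))                   # 0.24   (divides by N)
print(round(statistics.variance(y), 4))                    # 0.2667 (divides by N - 1)
print(round(dummy_variance_shortcut(y) * n / (n - 1), 4))  # 0.2667
```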
19. Dummy variance & SD: Examples
Makes sense: colleges with more gender variety have larger variance (& SD)
20. Other measures of variation
- In addition to variance and SD
- Range
- Inter-Quartile Range (IQR)
21. Range: Calculation
- Largest value minus smallest value
- E.g., Clippers
- shortest 65 inches (Boykins)
- tallest 84 inches (Olowokandi)
- Range = 84 - 65 = 19 inches
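The range needs only the two extremes. This sketch uses the Clippers' shortest and tallest heights from the slide. A third, hypothetical in-between height shows that middle values do not change the result. The name `value_range` avoids shadowing Python's built-in `range`.

```python
def value_range(values: list[float]) -> float:
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

print(value_range([65, 84]))      # 19
print(value_range([65, 79, 84]))  # 19, since a middle value has no effect
```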
22. Range: Interpretation
- Really easy to interpret
- All the player heights fit in a 19-inch range, from 65 to 84 inches.
- But
- Sensitive to extreme values
- Uses only extreme values!
- Tends to grow with N: adding cases can widen the range but never narrow it
23. Interquartile range: Motivation
- Less sensitive to extreme values
- Could be called a "trimmed range"
- The range ignoring extreme scores
- Recipe
- IQR = 3rd quartile - 1st quartile
24. Finding the quartiles
- Quartiles split the distribution into quarters
- Split the distribution in half
- at the median (2nd quartile)
- 1st quartile = median of the smaller half
- 3rd quartile = median of the larger half
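The median-of-halves recipe can be sketched in Python as below. One assumption matters here: when N is odd, the median is left out of both halves. Other conventions exist and can give slightly different quartiles. The five scores are hypothetical, since the exercise data are not reproduced in this text.

```python
import statistics

def quartiles(values: list[float]) -> tuple[float, float, float]:
    """Q1, median, Q3 by the median-of-halves recipe (needs at least 2 values).

    Assumption: for odd N, the median itself is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # smaller half
    upper = ordered[half + (n % 2):]      # larger half (skips the median if N is odd)
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

def iqr(values: list[float]) -> float:
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

scores = [31, 65, 72, 88, 95]   # hypothetical exam scores (odd N)
print(quartiles(scores))        # (48.0, 72, 91.5)
print(iqr(scores))              # 43.5
```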
25. IQR: Example with odd N
[Figure: sorted heights with the median marked]
IQR = 81.5 - 77 = 4.5
Interpretation: About half the players (7 of 13) have heights within a 4.5-inch range, between 77 inches (6'5") and 81.5 inches (6'9.5").
26. IQR: Example with even N
[Figure: sorted heights with the median marked]
IQR = 81 - 75 = 6
Interpretation: About half the players (6 or 8 of 14) have heights within a 6-inch range, between 75 inches (6'3") and 81 inches (6'9").
27. Comparing measures of variation
- Using range, variance, or SD, Clippers look more variable.
- But using IQR, Knicks look more variable.
- Why?
28. Influence of extreme values
One extreme height (Boykins) expands the range and variance of the Clippers, but can't affect the IQR.
29. Formulas for frequency tables
30. IQR from a frequency table
- Tricky to get a recipe that's always right.
- Rough method, usually right for large N:
- Q1 = first value with cumulative percent (c%) > 25
- Q3 = first value with c% > 75
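The rough cumulative-percent rule can be sketched as follows. It assumes the table is a list of (value, frequency) pairs sorted by value. The household-size frequencies are hypothetical, and the helper name `first_value_above` is illustrative.

```python
def first_value_above(table: list[tuple[float, int]], cutoff: float) -> float:
    """First value whose cumulative percent exceeds the cutoff (table sorted by value)."""
    total = sum(freq for _, freq in table)
    cumulative = 0
    for value, freq in table:
        cumulative += freq
        if 100 * cumulative / total > cutoff:   # cumulative percent c%
            return value
    raise ValueError("cutoff must be below 100")

# Hypothetical household sizes: (residents, number of households)
households = [(1, 30), (2, 50), (3, 12), (4, 8)]
q1 = first_value_above(households, 25)   # c% = 30 at 1 resident
q3 = first_value_above(households, 75)   # c% = 80 at 2 residents
print(q1, q3, q3 - q1)                   # 1 2 1
```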
31. IQR from a frequency table: Example
- Q1 = 1, Q3 = 2, IQR = Q3 - Q1 = 1
- Interpretation
- More than 50% of surveyed households had 1 to 2 residents.
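32. Variance from a frequency table
- Data set
- Mean: $\bar{Y} = \frac{\sum Y}{N}$
- Variance (mean squared deviation): $s^2 = \frac{\sum (Y - \bar{Y})^2}{N-1}$
- Frequency table (f is the frequency of each value, and $N = \sum f$)
- Mean: $\bar{Y} = \frac{\sum f Y}{N}$
- Variance (mean squared deviation): $s^2 = \frac{\sum f (Y - \bar{Y})^2}{N-1}$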
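A sketch of the frequency-table formulas. Each squared deviation is weighted by its frequency f, and N is the sum of the frequencies. The table is hypothetical. Expanding it back to raw data and using the `statistics` module reproduces the same mean and variance.

```python
import statistics

def freq_mean_variance(table: list[tuple[float, int]]) -> tuple[float, float]:
    """Mean and variance (N - 1 denominator) from (value, frequency) pairs."""
    n = sum(f for _, f in table)                       # N = sum of frequencies
    mean = sum(f * y for y, f in table) / n            # weighted mean
    ss = sum(f * (y - mean) ** 2 for y, f in table)    # weighted squared deviations
    return mean, ss / (n - 1)

table = [(1, 3), (2, 4), (3, 1)]                 # hypothetical (value, frequency)
raw = [y for y, f in table for _ in range(f)]    # expand back to raw data
print(freq_mean_variance(table))                 # (1.75, 0.5)
print(statistics.mean(raw), statistics.variance(raw))   # same answer from raw data
```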
33. Variance from frequency table: Example
- Same answer as from raw data.
34. Summary
- "What's typical?" isn't the whole story
- Lots of cases aren't typical
- Some important cases may be very atypical
- Measures of variation
- Variance s2, Standard deviation s
- IQR
- Variance and s.d. are sensitive, IQR is robust
- Remaining lectures use variance and s.d.
- S.D. is the basis for standardization
35. Bonus slides
36. Exercise
- Given exam scores
- Calculate variance, standard deviation, range and IQR
- Interpret range and IQR
37. Answer
Q1 = (31 + 65)/2 = 48
Q3 = (88 + 95)/2 = 91.5
IQR = 91.5 - 48 = 43.5
- Interpretation
- All the scores fit in a 64-point range (31 to 95).
- But over half the scores fit in a 43.5-point range.
- (Here even the IQR is influenced by the lowest score.)
38. Warning
- There are other formulas for IQR!
- Your textbook's formula is the worst (not symmetric).
- For Excel's formula, see http://www.staff.city.ac.uk/r.j.gerrard/excelfaq/faq.html#qtls
- But we won't get fussy
- Discrepancies are small in large samples
39. Variance of a dummy: Proof
p is the proportion with Y = 1; (1 - p) is the proportion with Y = 0
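The original slide's own proof does not survive in this text. This is a standard derivation of the shortcut, dividing by N, and its presentation may differ from the slide's.

```latex
\bar{Y} = p \cdot 1 + (1-p) \cdot 0 = p

\frac{1}{N}\sum (Y - \bar{Y})^2
  = p\,(1-p)^2 + (1-p)\,(0-p)^2
  = p(1-p)\,\big[(1-p) + p\big]
  = p(1-p)
```

Dividing by N-1 instead of N multiplies this result by N/(N-1), which is close to 1 when N is large.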