Lecture 5: How much variety is there? Measures of variation


1
Lecture 5: How much variety is there? Measures of
variation
  • Sociology 549
  • Paul von Hippel

2
Measures of variation
  • Sensitive (to extreme values)
    • standard deviation ($s$)
    • variance ($s^2$)
    • range
  • Robust
    • interquartile range (IQR)
  • Standard deviation
    • is basis for standardization

3
Center vs. variation
  • Two distributions can have same center
  • But differ with respect to variation
  • 2 basketball teams
    • Clippers
    • Knicks
  • Similar mean height, between 6'6" and 6'7"
  • But they don't match up well

4
Basketball teams: Mean heights
5
Variance and standard deviation
  • Most common measures are
    • Variance ($s^2$)
    • Standard deviation ($s$)
  • To understand them
    • Must understand deviation

6
Deviations from the mean
  • Deviation from the mean is $Y - \bar{Y}$
    • $Y$ is the value for a particular case
    • $Y = 65$ for Earl Boykins
    • $\bar{Y}$ ("Y bar") is the mean over all the cases
  • Deviation = $65 - 78.54 = -13.54$ for Earl Boykins
  • Interpretation: He is 13.54 inches shorter than
    the team mean

7
Variance: Calculation
  • Variance is the mean of the squared deviations.
  • Formula: $s^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}$
  • Steps
    • Calculate the deviation for each case
    • Square each deviation
    • Sum the squared deviations
    • Divide by N-1 (not N)
  • Remember N is the sample size, the number of cases
  • Why N-1?
    • If N = 1
    • you can't see variation
    • and you can't divide by N-1 = 0
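
The four steps above can be sketched in Python. This is only an illustration: the function name `sample_variance` and the heights are hypothetical (the team roster is not part of this transcript).

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations divided by N - 1."""
    n = len(values)
    if n < 2:
        # With N = 1 there is no visible variation and N - 1 = 0.
        raise ValueError("need at least 2 cases")
    mean = sum(values) / n
    deviations = [y - mean for y in values]   # step 1: deviation per case
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    total = sum(squared)                      # step 3: sum the squares
    return total / (n - 1)                    # step 4: divide by N - 1


# Hypothetical heights in inches, chosen only for illustration.
heights = [65, 76, 78, 80, 81]
print(sample_variance(heights))  # 41.5
```

Note how the single short player contributes 121 of the 166 squared inches in the sum, which is the sensitivity to extreme cases that the lecture describes.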

8
Variance: Example
  • Note influence of extreme cases (esp. Boykins)

9
Variance: Interpretation
  • More variety → larger variance
  • Beyond that, not easy to interpret
  • Variance is in squared units
  • Need to un-square them

10
Standard deviation: Calculation
  • How can we un-square the variance?
  • Take the square root!
  • Square root of variance is standard deviation
  • Example. For Clippers,
    • $s = (23.10)^{1/2} = 4.81$ inches
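
The Clippers figure can be checked in a couple of lines of Python. The variance of 23.10 is taken from the slide; the variable names are illustrative.

```python
import math

variance = 23.10            # squared inches, from the slide
sd = math.sqrt(variance)    # back to the original units (inches)
print(round(sd, 2))         # 4.81
```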

11
Standard deviation: Interpretation
  • Variance is in squared units
    • Variance of Clipper heights is 23.10
      inches-squared
  • Standard deviation is in original units
    • SD of Clipper heights = 4.81 inches
  • Deviations from mean also in inches
    • Boykins's deviation = -13.54 inches
  • Can compare
  • Standard deviation is a standard to which
    deviations are compared

12
Standardization: Deviation vs. standard deviation
  • Earl Boykins has a deviation of -13.54 inches
  • The standard deviation is 4.81 inches
  • So Earl Boykins is -13.54/4.81 = -2.81 standard
    deviations from the mean height for his team
  • This is a standard score, or Z score
  • General formula: $Z = \frac{Y - \bar{Y}}{s_Y}$
  • Interpretation
    • The case is Z standard deviations from the mean
    • E.g., Boykins is 2.81 standard deviations below
      the mean height
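
A minimal sketch of the standard score, using the numbers quoted in the lecture (height 65, team mean 78.54, standard deviation 4.81). The function name `z_score` is hypothetical.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies from the mean."""
    return (y - mean) / sd


z = z_score(65, 78.54, 4.81)
print(round(z, 2))  # -2.81
```

The negative sign carries the direction: the case lies below the mean.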

13
Standard scores: Interpretation
  • Extreme values → extreme standard scores
  • It's rare to find Z > 2 or Z < -2

14
Exercise
  • For exam scores below
  • mean 70.2
  • standard deviation (25)
  • Calculate and interpret the standard score of
    the most extreme value.

15
Reversing standardization
  • Given
    • standard score $Z$
    • mean $\bar{Y}$
    • standard deviation $s_Y$
  • You can get back the raw score: $Y = \bar{Y} + Z s_Y$
  • This is just a rearrangement of the
    standardization formula

16
Reversing standardization: Example
  • Earl Boykins is 2.81 standard deviations below
    the mean for his team.
  • His team has a mean height of 78.54 inches, and a
    standard deviation of 4.81 inches
  • What is Earl Boykins's height again?
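
One way to work this example is to multiply the standard score by the standard deviation and add the mean. The sketch below assumes Z = -2.81 (2.81 standard deviations below the mean); the function name `raw_score` is hypothetical.

```python
def raw_score(z, mean, sd):
    """Undo standardization: raw score = mean + Z * standard deviation."""
    return mean + z * sd


height = raw_score(-2.81, 78.54, 4.81)
print(round(height, 1))  # 65.0
```

The result is about 65 inches, matching the height used earlier in the lecture; the small leftover (65.02 rather than 65) comes from rounding Z to two decimals.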

17
Dummy variables: review
  • Suppose Y is a dummy variable
    • e.g. Y = 1 if a student is female, Y = 0 if male
  • Some proportion p have Y = 1 (female)
  • p is also the mean, i.e. $\bar{Y} = p$

18
Dummy variables: variance & SD
  • Can calculate variance (& SD) in usual way
  • But there's a shortcut
    • $s^2 = p(1-p)$
    • $s = (s^2)^{1/2}$
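
The shortcut can be compared with the usual calculation on a small made-up sample. The data below are hypothetical. One caveat worth labelling: p(1-p) matches the variance exactly when the sum of squared deviations is divided by N; with the N-1 divisor used earlier in the lecture, the result is larger by a factor of N/(N-1), which is negligible for large N.

```python
import math
import statistics

# Hypothetical sample: 1 = female, 0 = male (6 of 10 are female).
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
n = len(y)

p = sum(y) / n                          # proportion with Y = 1, also the mean
shortcut = p * (1 - p)                  # p(1 - p)
var_n = statistics.pvariance(y)         # usual formula, dividing by N
var_n_minus_1 = statistics.variance(y)  # usual formula, dividing by N - 1

print(math.isclose(shortcut, var_n))                      # True
print(math.isclose(shortcut * n / (n - 1), var_n_minus_1))  # True
print(round(math.sqrt(shortcut), 3))                      # SD from the shortcut
```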

19
Dummy variance & SD: Examples
Makes sense: Colleges with more gender variety
have larger variance (& SD).
20
Other measures of variation
  • In addition to variance and sd
  • Range
  • Inter-Quartile Range (IQR)

21
Range: Calculation
  • Largest minus smallest value
  • E.g., Clippers
    • shortest 65 inches (Boykins)
    • tallest 84 inches (Olowokandi)
    • Range = 84 - 65 = 19 inches

22
Range: Interpretation
  • Really easy
    • All the player heights fit in a 19-inch range,
      from 65 to 84 inches.
  • But
    • Sensitive to extreme values
    • Uses only extreme values!
    • Increases with N
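
The last point can be illustrated with a small simulation. Everything here is an assumption made for illustration: the heights are random draws (mean 78, SD 4), not real players, and `value_range` is a hypothetical helper. Because each larger sample contains the smaller one, its range can only stay the same or grow.

```python
import random


def value_range(values):
    """Largest minus smallest value."""
    return max(values) - min(values)


print(value_range([65, 84]))  # 19, the Clippers extremes from the lecture

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(78, 4) for _ in range(1000)]  # simulated heights

# Range of the first 10, 100 and 1000 simulated heights.
ranges = [value_range(sample[:n]) for n in (10, 100, 1000)]
print([round(r, 1) for r in ranges])
```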

23
Interquartile range: Motivation
  • Less sensitive to extreme values
  • Could be called trimmed range
    • Range ignoring extreme scores
  • Recipe
    • 3rd quartile - 1st quartile

24
Finding the quartiles
  • Quartiles split the distribution into quarters
  • Split the distribution in half
  • at the median (2nd quartile)
  • 1st quartile = median of smaller half
  • 3rd quartile = median of larger half
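
A sketch of this recipe in Python. One detail is an assumption rather than something the slides state outright: when N is odd, the median itself is left out of both halves. The function name `quartiles` and the heights are hypothetical.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves recipe."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least 2 cases")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd N the middle case belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


# Hypothetical heights in inches, even N then odd N.
q1, med, q3 = quartiles([75, 76, 77, 78, 79, 80, 81, 82])
print(q1, med, q3, q3 - q1)   # 76.5 78.5 80.5 4.0

q1_odd, med_odd, q3_odd = quartiles([75, 76, 77, 78, 79, 80, 81])
print(q1_odd, med_odd, q3_odd, q3_odd - q1_odd)   # 76 78 80 4
```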

25
IQR: Example with odd N
IQR = 4.5 inches
Interpretation: About half the players (7 of 13)
have heights within a 4.5-inch range, between 77
inches (6'5") and 81.5 inches (6'9.5").
26
IQR: Example with even N
IQR = 81 - 75 = 6 inches
Interpretation: About half the players (6 or 8 of
14) have heights within a 6-inch range, between 75
inches (6'3") and 81 inches (6'9").
27
Comparing measures of variation
  • Using range, variance, or SD, Clippers look more
    variable.
  • But using IQR, Knicks look more variable.
  • Why?

28
Influence of extreme values
One extreme height (Boykins) expands range and
variance of Clippers, but can't affect IQR.

Centrality measure    Extreme values
Mean                  Influential
Trimmed mean          Less influential
Median                Not influential

Variation measure     Extreme values
Range                 Very influential
Variance (& SD)       Influential
IQR                   Less influential
29
Formulas for frequency tables
30
IQR from a frequency table
  • Tricky to get a recipe that's always right.
  • Rough method, usually right for large N.
    • Q1 = first value with cumulative percent c% > 25
    • Q3 = first value with cumulative percent c% > 75
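
The rough method can be sketched as a scan over cumulative percentages. The household counts below are hypothetical (the lecture's frequency table is not part of this transcript), and `iqr_from_frequency_table` is an illustrative name.

```python
def iqr_from_frequency_table(table):
    """Rough Q1, Q3 and IQR from a {value: frequency} table.

    Q1 is the first value whose cumulative percent exceeds 25,
    Q3 the first value whose cumulative percent exceeds 75.
    """
    total = sum(table.values())
    if total <= 0:
        raise ValueError("table has no cases")
    cumulative = 0
    q1 = q3 = None
    for value in sorted(table):
        cumulative += table[value]
        c_pct = 100 * cumulative / total
        if q1 is None and c_pct > 25:
            q1 = value
        if q3 is None and c_pct > 75:
            q3 = value
    return q1, q3, q3 - q1


# Hypothetical counts of households by number of residents.
households = {1: 30, 2: 50, 3: 12, 4: 8}
print(iqr_from_frequency_table(households))  # (1, 2, 1)
```

With these made-up counts the cumulative percentages are 30, 80, 92 and 100, so Q1 = 1 and Q3 = 2.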

31
IQR from a frequency table: Example
  • Q1 = 1, Q3 = 2, IQR = Q3 - Q1 = 1
  • Interpretation
    • More than 50% of surveyed households had 1 to 2
      residents.

32
Variance from a frequency table
  • Data set
  • Mean
  • Variance (mean squared deviation)
  • Frequency table
  • Mean
  • Variance (mean squared deviation)
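
The slide's formulas did not survive in this transcript, so the sketch below uses the standard frequency-weighted versions as an assumption: each value's contribution is multiplied by its frequency f, and the variance divides by N-1 as elsewhere in the lecture. The table and the function name are hypothetical.

```python
import math
import statistics


def mean_and_variance_from_table(table):
    """Mean and sample variance from a {value: frequency} table."""
    n = sum(table.values())
    if n < 2:
        raise ValueError("need at least 2 cases")
    mean = sum(f * y for y, f in table.items()) / n
    variance = sum(f * (y - mean) ** 2 for y, f in table.items()) / (n - 1)
    return mean, variance


# Hypothetical frequency table: value -> number of cases.
table = {1: 3, 2: 5, 3: 2}
# Expand the table back into raw data to compare the two routes.
raw = [y for y, f in table.items() for _ in range(f)]

mean, variance = mean_and_variance_from_table(table)
print(math.isclose(mean, statistics.mean(raw)))          # True
print(math.isclose(variance, statistics.variance(raw)))  # True
```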

33
Variance from frequency table: Example
  • Same answer as from raw data.

34
Summary
  • "What's typical?" isn't the whole story
    • Lots of cases aren't typical
    • Some important cases may be very atypical
  • Measures of variation
    • Variance $s^2$, standard deviation $s$
    • IQR
  • Variance and s.d. are sensitive, IQR is robust
  • Remaining lectures use variance and s.d.
    • S.D. is basis for standardization

35
Bonus slides
36
Exercise
  • Given exam scores
  • Calculate variance, standard deviation, range and
    IQR
  • Interpret range and IQR

37
Answer
Q1 = (31 + 65)/2 = 48
Q3 = (88 + 95)/2 = 91.5
IQR = 91.5 - 48 = 43.5
  • Interpretation
  • All the scores fit in a 64-point range (31 to
    95).
  • But over half the scores fit in a 43.5-point
    range.
  • (Here even the IQR is influenced by the lowest
    score.)
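
These answers can be checked with a short sketch. The scores 31, 65, 88 and 95 appear in the answer above; the middle score of 72 is an inference from the mean of 70.2 quoted in the earlier exercise and is not stated in this transcript, so treat the data set as an assumption.

```python
import statistics

# 72 is inferred (see the note above), not given in the transcript.
scores = [31, 65, 72, 88, 95]  # already in ascending order

mean = statistics.mean(scores)            # 70.2
variance = statistics.variance(scores)    # divides by N - 1
sd = statistics.stdev(scores)
score_range = max(scores) - min(scores)   # 95 - 31

# Median-of-halves quartiles, leaving the median (72) out of both halves.
q1 = statistics.median(scores[:2])        # (31 + 65) / 2
q3 = statistics.median(scores[3:])        # (88 + 95) / 2
iqr = q3 - q1

print(round(variance, 1), round(sd, 2))   # 624.7 24.99
print(score_range, q1, q3, iqr)           # 64 48.0 91.5 43.5
```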

38
Warning
  • There are other formulas for IQR!
    • Your textbook's is the worst (not symmetric).
    • For Excel's formula, see
      http://www.staff.city.ac.uk/r.j.gerrard/excelfaq/faq.html#qtls
  • But we won't get fussy
    • Discrepancies are small in large samples

39
Variance of a dummy: Proof
p is the proportion with Y = 1; (1-p) is the
proportion with Y = 0.
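
The proof itself did not survive in this transcript. A sketch of the standard argument follows, under the assumption that the sum of squared deviations is divided by N (with N-1 the result is multiplied by N/(N-1)). A fraction p of the cases has deviation 1-p, and a fraction 1-p has deviation 0-p:

```latex
\bar{Y} = p \cdot 1 + (1-p) \cdot 0 = p

\frac{1}{N} \sum (Y - \bar{Y})^2
  = p\,(1-p)^2 + (1-p)\,(0-p)^2
  = p\,(1-p)\,\bigl[(1-p) + p\bigr]
  = p\,(1-p)
```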