Describing Distributions with Numbers

Slides: 35
Provided by: jamesmaysm2
Learn more at: https://www.sjsu.edu

Transcript and Presenter's Notes



1
Chapter 2
  • Describing Distributions with Numbers

2
Numerical Summaries
  • Center of distribution
  • mean
  • median
  • Spread of distribution
  • five-point summary (& interquartile range)
  • standard deviation (& variance)

3
Mean (Arithmetic Average)
  • Traditional measure of center
  • Notation: $\bar{x}$ ("x-bar")
  • Sum the values and divide by the sample size (n)

4
Mean: Illustrative Example, Metabolic Rate
  • Data: Metabolic rates, 7 men (cal/day)
  • 1792 1666 1362 1614 1460 1867 1439
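The mean of these seven values can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the slides.

```python
# Metabolic rates (cal/day) for the 7 men listed on this slide
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

# Sum the values and divide by the sample size n
n = len(rates)
total = sum(rates)
mean = total / n

print(n, total, mean)  # 7 11200 1600.0
```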

5
Median (M)
  • Half of the ordered values are less than or equal
    to the median value
  • Half of the ordered values are greater than or
    equal to the median value
  • If n is odd, the median is the middle ordered
    value
  • If n is even, the median is the average of the
    two middle ordered values

6
Median
  • Example 1 data: 2 4 6
  • Median (M) = 4
  • Example 2 data: 2 4 6 8
  • Median = 5 (average of
    4 and 6)
  • Example 3 data: 6 2 4
  • Median ≠ 2
  • (order the values: 2 4 6, so Median = 4)
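The odd/even rule and the three examples above can be sketched in Python as follows. The function name `median_of` is hypothetical; the standard library's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Median by the odd/even rule; sorting first means input order does not matter."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the middle ordered value
        return ordered[mid]
    # n even: the average of the two middle ordered values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([2, 4, 6]))     # 4
print(median_of([2, 4, 6, 8]))  # 5.0
print(median_of([6, 2, 4]))     # 4 (not 2: the values are ordered first)
```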

7
Location of the Median L(M)
  • Location of the median: L(M) = (n + 1) / 2, where n =
    sample size.
  • Example: If 25 data values are recorded, the
    median is located at position (25 + 1) / 2 = 13 in
    the ordered array.

8
Median Illustrative Example
  • Data: Metabolic rates, n = 7
  • 1792 1666 1362 1614 1460 1867 1439
  • L(M) = (7 + 1) / 2 = 4
  • Ordered array: 1362 1439 1460 1614 1666 1792 1867
  • The median is the 4th ordered value: 1614
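The location rule L(M) = (n + 1) / 2 can be traced in code for this example. This is only a sketch: `median_location` is a hypothetical helper name, and the index arithmetic assumes n is odd, as it is here.

```python
def median_location(n):
    """1-based position of the median in the ordered array."""
    return (n + 1) / 2

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
ordered = sorted(rates)

loc = median_location(len(ordered))
# n is odd, so loc is a whole number; subtract 1 for Python's 0-based indexing
m = ordered[int(loc) - 1]

print(loc, m)               # 4.0 1614
print(median_location(25))  # 13.0
```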
9
Comparing the Mean & Median
  • Mean = median when data are symmetrical
  • Mean ≠ median when data are skewed or have outliers
    (mean pulled toward the tail), while the median is
    more resistant

If we switch this: 1362 1439 1460 1614 1666 1792 1867
to this:           1362 1439 1460 1614 1666 1792 9867
the median is still 1614, but the mean goes from 1600 to 2742.9
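The effect of the single changed value can be reproduced directly. This sketch uses `statistics.median` from the Python standard library; the variable names are illustrative.

```python
from statistics import median

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # 1867 replaced by 9867

mean_original = sum(original) / len(original)
mean_outlier = sum(with_outlier) / len(with_outlier)

# The mean is pulled toward the outlier; the median does not move
print(mean_original, median(original))               # 1600.0 1614
print(round(mean_outlier, 1), median(with_outlier))  # 2742.9 1614
```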
10
Question
  • The average salary at a high tech company is
    250K / year
  • The median salary is 60K.
  • How can this be?
  • Answer There are some very highly paid
    executives, but most of the workers make modest
    salaries

11
Spread = Variability
  • Variability = the amount by which values spread above and
    below the center
  • Can be measured in several ways
  • range (rarely used)
  • 5-point summary & inter-quartile range
  • variance and standard deviation

12
Range
  • Based on smallest (minimum) and largest (maximum)
    values in the data set
  • Range = max − min
  • The range is not a reliable measure of spread
    (affected by outliers, biased)
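To see how strongly one outlier affects the range, the sketch below reuses the metabolic-rate data from the earlier slides, with and without the 9867 value. The helper name `value_range` is hypothetical.

```python
def value_range(values):
    """Range = max - min."""
    return max(values) - min(values)

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]

print(value_range(original))      # 505
print(value_range(with_outlier))  # 8505
```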

13
Quartiles
  • Three numbers which divide the ordered data into
    four equal sized groups.
  • Q1 has 25% of the data below it.
  • Q2 has 50% of the data below it. (Median)
  • Q3 has 75% of the data below it.

14
Obtaining the Quartiles
  • Order the data.
  • Find the median
  • This is Q2
  • Look at the lower half of the data (those below
    the median)
  • The median of this lower half = Q1
  • Look at the upper half of the data
  • The median of this upper half = Q3

15
Illustrative example: 10 ages
  • AGE (years) values, ordered array (n = 10)
  • 05 11 21 24 27 28 30 42 50 52
  • (arrows mark Q1 at 21, Q2 between 27 and 28, and Q3 at 42)
  • Q1 = 21
  • Q2 = average of 27 and 28 = 27.5
  • Q3 = 42
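The median-of-halves procedure can be sketched as a small function and checked against the ages example. The function name `quartiles` is hypothetical. The sketch assumes that when n is odd the median itself is left out of both halves, which matches the location arithmetic used for the weight data in these slides.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) by the median-of-halves rule; needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # values below the median position
    upper = ordered[n - half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(quartiles(ages))  # (21, 27.5, 42)
```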

16
Weight Data: Sorted, n = 53
  • Median: L(M) = (53 + 1) / 2 = 27, placing it at 165
  • L(Q1) = (26 + 1) / 2 = 13.5, placing it between 127
    and 128 (127.5)
  • L(Q3) = 13.5 from the top, placing it between 185 and 185
  • Q1 = 127.5, Q2 = 165, Q3 = 185
17
Weight Data: Quartiles
10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0
Q1 = 127.5
Q2 = 165
Q3 = 185
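As a check on the quartile locations, the stemplot can be expanded back into the 53 weights and the values read off at L(M), L(Q1), and L(Q3). This is a sketch under stated assumptions: each stem is read as hundreds and tens and each leaf as the ones digit, the helper `value_at` is hypothetical, and the location arithmetic assumes n is odd, as it is here.

```python
# Stem-and-leaf rows from the slide (stem = hundreds and tens, leaf = ones digit)
stemplot = {
    10: "0166", 11: "009", 12: "0034578", 13: "00359", 14: "08",
    15: "00257", 16: "555", 17: "000255", 18: "000055567", 19: "245",
    20: "3", 21: "025", 22: "0", 23: "", 24: "", 25: "", 26: "0",
}
weights = sorted(
    stem * 10 + int(leaf) for stem, leaves in stemplot.items() for leaf in leaves
)

def value_at(ordered, location):
    """Value at a 1-based location; a location ending in .5 averages its two neighbours."""
    lo = int(location)
    if location == lo:
        return ordered[lo - 1]
    return (ordered[lo - 1] + ordered[lo]) / 2

n = len(weights)            # 53 values
loc_m = (n + 1) / 2         # 27.0
below = int(loc_m) - 1      # 26 values lie below the median
loc_q1 = (below + 1) / 2    # 13.5 from the bottom
loc_q3 = n + 1 - loc_q1     # 13.5 from the top, i.e. 40.5 from the bottom

q1 = value_at(weights, loc_q1)
m = value_at(weights, loc_m)
q3 = value_at(weights, loc_q3)
print(n, q1, m, q3)  # 53 127.5 165 185.0
```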
18
Five-Number Summary
  • minimum = 100
  • Q1 = 127.5
  • M = 165
  • Q3 = 185
  • maximum = 260

IQR gives the spread of the middle 50% of the data
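The IQR is the distance between the quartiles, Q3 − Q1. A minimal sketch using the five-number summary above (the dictionary layout is just one convenient way to hold it):

```python
# Five-number summary of the weight data, as given on this slide
five_number = {"min": 100, "Q1": 127.5, "M": 165, "Q3": 185, "max": 260}

# Interquartile range: spread of the middle half of the data
iqr = five_number["Q3"] - five_number["Q1"]
print(iqr)  # 57.5
```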
19
Boxplot
  • Central box spans Q1 and Q3.
  • A line in the box marks the median M.
  • Lines extend from the box out to the minimum and
    maximum.

20
Weight Data: Boxplot
21
Quartile extrapolation
  • Quartiles divide the data set into 4 segments: bottom,
    bottom middle, top middle, top
  • With small data sets: extrapolate values
  • Illustrative data: 2, 4, 6, 8
  • 2    4    6    8
  •   Q1   Q2   Q3
  • Q1 = average of 2 and 4, which is 3
  • Q2 = average of 4 and 6, which is 5
  • Q3 = average of 6 and 8, which is 7
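The same hand calculation in Python, as a minimal sketch for an even-sized data set. The last line calls `statistics.quantiles` (Python 3.8+) only as a point of comparison: library routines use their own interpolation conventions, so their results for very small data sets need not match the hand method.

```python
from statistics import median, quantiles

data = [2, 4, 6, 8]
half = len(data) // 2      # n is even, so the data split cleanly into two halves

q1 = median(data[:half])   # average of 2 and 4
q2 = median(data)          # average of 4 and 6
q3 = median(data[half:])   # average of 6 and 8
print(q1, q2, q3)  # 3.0 5.0 7.0

# For comparison only: the library's default convention may give different quartiles
print(quantiles(data, n=4))
```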

22
Boxplots: useful for comparing two groups (text
p. 39)
23
Variance & Standard Deviation
  • The most common measures of spread
  • Based on deviations around the mean
  • Each data value has a deviation, defined as: deviation = $x_i - \bar{x}$

24
Fig 2.3: Metabolic rate for 7 men, with their
mean ($\bar{x}$) and two deviations shown
25
Variance
  • Find the mean
  • Find the deviation of each value
  • Square the deviations
  • Sum the squared deviations; we call this the sum
    of squares, or SS
  • Divide the SS by n − 1
  • (gives typical squared deviation from mean)

26
Variance Formula
$s^2 = \frac{SS}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
27
Standard Deviation = square root of the variance: $s = \sqrt{s^2}$
28
Variance and Standard Deviation: Illustrative
Example
  • Data: Metabolic rates, 7 men (cal/day)
  • 1792 1666 1362 1614 1460 1867 1439

29
Variance and Standard Deviation: Illustrative
Example (cont.)

Observations   Deviations             Squared deviations
1792           1792 − 1600 =  192     (192)²  = 36,864
1666           1666 − 1600 =   66     (66)²   =  4,356
1362           1362 − 1600 = −238     (−238)² = 56,644
1614           1614 − 1600 =   14     (14)²   =    196
1460           1460 − 1600 = −140     (−140)² = 19,600
1867           1867 − 1600 =  267     (267)²  = 71,289
1439           1439 − 1600 = −161     (−161)² = 25,921
               sum = 0                SS = 214,870
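The table can be reproduced step by step, then carried through to the variance and standard deviation by dividing SS by n − 1 and taking the square root. This is a minimal sketch with illustrative variable names; the rounded results in the final comment come from this calculation, not from the slides.

```python
import math

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(rates)

mean = sum(rates) / n                    # 1600.0
deviations = [x - mean for x in rates]   # each value minus the mean
squared = [d ** 2 for d in deviations]   # squared deviations
ss = sum(squared)                        # sum of squares, SS

variance = ss / (n - 1)                  # divide SS by n - 1
sd = math.sqrt(variance)                 # standard deviation

print(sum(deviations), ss)               # 0.0 214870.0
print(round(variance, 2), round(sd, 2))  # 35811.67 189.24
```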
30
Variance and Standard Deviation: Illustrative
Example (cont.)
Notes:
(1) Use the standard deviation s for descriptive purposes
(2) Variance & standard deviation are calculated by calculator
or computer in practice
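As one example of letting the computer do the work, Python's standard `statistics` module provides `variance` and `stdev`, both of which use the n − 1 divisor. A short sketch on the metabolic-rate data:

```python
from statistics import stdev, variance

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

s2 = variance(rates)  # sample variance (divides by n - 1)
s = stdev(rates)      # sample standard deviation

print(round(s2, 2), round(s, 2))  # 35811.67 189.24
```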
31
Summary Statistics
  • Two main measures of central location
  • Mean ($\bar{x}$)
  • Median (M)
  • Two main measures of spread
  • Standard deviation (s)
  • 5-point summary (interquartile range)

32
Choosing Summary Statistics
  • Use the mean and standard deviation for
    reasonably symmetric distributions that are free
    of outliers.
  • Use the median and IQR (or 5-point summary) when
    data are skewed or when outliers are present.

33
Example: Number of Books Read
L(M) = (52 + 1) / 2 = 26.5
34
Illustrative example: Books read
  • 5-point summary: 0, 1, 3, 5.5, 99
  • Note: highly asymmetric distribution

$\bar{x}$ = 7.06, s = 14.43
The mean and standard deviation give a false impression
with asymmetric data