Title: Measures of Location and Variability
1. Measures of Location and Variability
- Chapter 2.4 Summary Measures of Location
  - mean
  - median
  - quartiles
- Chapter 2.5 Summary Measures of Variability
  - range
  - standard deviation (sd)
  - interquartile range (IQR)
2. Measures of Location
- Chapter 2.4 Summary Measures of Location
  - mean
  - median
  - trimmed mean
3. Summary Measurements
- A parameter is a numerical summary measure of a population distribution (it refers to the entire population).
- A statistic is a numerical quantity calculated from the observations in a sample (it is obtained from information in the sample).
4. Mean
- The population mean, denoted by μ, is the balance point of the population distribution, also called the center of mass of the population distribution.
5. Sample mean
- The sample mean is the average of all the observations. It estimates the population mean. If a sample consists of observations y1, y2, ..., yn, then the sample mean is
  $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
6. Example 2.4.1
- Here is the net worth of 10 residents of Washington state (in thousands of dollars): 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320. Compute the sample mean of the net worth.
- Solution: Sample mean = 10390/10 = 1039. The average net worth of the 10 residents is 1039 thousand dollars.
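The computation in Example 2.4.1 can be sketched in Python. The function and variable names are illustrative and not part of the course material. The ten values are those of the ordered list in the median example, which reproduce the stated mean of 1039.

```python
# Net worth of the 10 residents, in thousands of dollars.
net_worth = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]


def sample_mean(values):
    """Return the arithmetic average of a non-empty list of numbers."""
    return sum(values) / len(values)


mean_10 = sample_mean(net_worth)
print(mean_10)  # 1039.0
```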
7. Continued
- What happens if we add Bill Gates' net worth of 40.5 billion dollars, which is 40500000 thousands of dollars?
- This value is an outlier (a number that stands apart from the remainder of the data).
- The sample mean becomes 3,682,763 thousand dollars.
8. The net worth of residents
[Figure: net worth of the residents, with the values 40500000 and 710 labeled]
9. Median
- The population median, denoted by ? , is the
numerical value that divides the population
distribution in half. It is also called the
second quartile.
[Figure: population distribution divided at the median, with 50% on each side]
10. Median
- The sample median, denoted by M, is the middle observation if n is odd, or the average of the two middle observations if n is even. In either case, the median is located at position (n + 1)/2 in the ordered data set.
- Example 5: 1, 2, 2, 3, 6, 7, 8
- Example 6: 8, 9, 10, 2, 6, 10
11. Example 2.4.1 (continued): 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320
- Steps to find the median
- Step 1: Order the observations from smallest to largest.
  - 25 100 250 320 670 750 1000 1575 2500 3200
- Step 2: Count the observations and denote the total number as n. Here n = 10.
12.
- Step 3: Find the location of the median, which is the (n + 1)/2 th position.
  - If n is odd, the median is the middle value.
  - If n is even, the median is the average of the middle two values.
  - Here (10 + 1)/2 = 5.5, so the median is (670 + 750)/2 = 710.
13. Exercise: Including Bill Gates' net worth, what is the median of the net worth?
- 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320, 40500000
- Solution
  - 25 100 250 320 670 750 1000 1575 2500 3200 40500000
  - n = 11, (11 + 1)/2 = 6
  - The median is 750.
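The median steps, and the contrast with the mean once the outlier is added, can be sketched as follows. The helper name `sample_median` is an illustrative choice. The data are the ordered values listed in the solution above.

```python
def sample_median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)          # Step 1: order the observations
    n = len(ordered)                  # Step 2: count them
    mid = n // 2                      # Step 3: locate position (n + 1)/2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


residents = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]
with_outlier = residents + [40_500_000]

print(sample_median(residents))                       # 710.0
print(sample_median(with_outlier))                    # 750
print(sum(residents) / len(residents))                # 1039.0
print(round(sum(with_outlier) / len(with_outlier)))   # 3682763
```

Adding the single outlier moves the median from 710 to 750, while the mean jumps from 1039 to about 3.68 million (all in thousands of dollars).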
14. Example 1
- Data: -1, 1
- Data: -2, 1, 1
- Data: -3, -2, -1, 1, 1, 1, 1, 1, 1
- Example 2
- 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
- 1, 2, 1, 2, 1, 2, 1, 2, 1, 20
15. Trimmed mean
- Motivation
- A p% trimmed sample mean drops the smallest p% and the largest p% of the ordered observations and averages the rest.
- Olympic game rating system
  - uses a 1/9 trimmed mean
16. Trimmed mean
- Example 3: Calculate the 5% trimmed mean of the above example.
- 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
- 1, 2, 1, 2, 1, 2, 1, 2, 1, 20
Answer: n = 20 observations and 5% × 20 = 1, so one observation is removed from each end. The remaining data set is 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1. Answer _____.
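A minimal sketch of the trimming procedure in Example 3. The function name and the integer `percent` argument are illustrative. Rounding down when p% of n is not a whole number is an assumption, since the slides only show a case where it is exact.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the observations from each end.

    `percent` is a whole number such as 5 or 10.
    Assumption: if percent% of n is not a whole number, the count
    dropped from each end is rounded down.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = (n * percent) // 100      # observations removed from each end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
        1, 2, 1, 2, 1, 2, 1, 2, 1, 20]

print(sum(data) / len(data))   # 2.4  (ordinary mean, pulled up by 20)
print(trimmed_mean(data, 5))   # 1.5  (one value dropped from each end)
```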
17. Exercise
- A stem-and-leaf plot is given (n = 10):
  1 | 078
  2 | 02457
  3 | 14
- Find the 10% trimmed sample mean. _____
18. Quartiles
- The first quartile, denoted by ? 1, is the numerical value that divides the lower half of the population in half. The first sample quartile, Q1, estimates it.
- The third quartile, denoted by ? 3, is the numerical value that divides the upper half of the population in half. The third sample quartile, Q3, estimates it.
- The first and third sample quartiles, Q1 and Q3, are similarly defined for samples. The median is the second quartile, Q2.
19. Quartiles
- Q3 = upper quartile = median of the upper half (include the median if n is odd)
- Q1 = lower quartile = median of the lower half (include the median if n is odd)
- Q2 = median
20. Example 1
Data (sorted!): 35 37 45 46 49 56 57 57 59 61 62 64 68 71 72 76 80 89 94
Calculate Max, Min, n, Mean, Median, Q1 and Q3.
- Max = 94, Min = 35, n = 19, Mean = 62, Median = 61
- Upper half (median included, since n is odd)
  - 61 62 64 68 71 72 76 80 89 94
  - Q3 = (71 + 72)/2 = 71.5
- Lower half (median included, since n is odd)
  - 35 37 45 46 49 56 57 57 59 61
  - Q1 = (49 + 56)/2 = 52.5
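The quartile rule used in this example (median of each half, with the overall median included in both halves when n is odd) can be sketched as below. The function names are illustrative. Library routines often use other quartile conventions and may give slightly different values.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2            # each half includes the median when n is odd
    lower = ordered[:half]
    upper = ordered[n - half:]
    return (median_of_sorted(lower),
            median_of_sorted(ordered),
            median_of_sorted(upper))


data = [35, 37, 45, 46, 49, 56, 57, 57, 59, 61,
        62, 64, 68, 71, 72, 76, 80, 89, 94]

print(min(data), max(data), len(data), sum(data) / len(data))  # 35 94 19 62.0
print(quartiles(data))                                         # (52.5, 61, 71.5)
```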
21. Example 2
- Researchers have investigated lead absorption in children of parents who worked in a factory where lead is used to make batteries. A stem-and-leaf plot is given (n = 10):
   4 | 07
   5 |
   6 | 14
   7 | 1349
   8 |
   9 | 2
  10 | 3
- Compute the following quantities:
  - the sample mean and the 10% trimmed mean,
  - the sample median M, the first quartile Q1 and the third quartile Q3.
22. Chapter 2.5 Summary Measures of Variability
- range
- standard deviation (sd)
- interquartile range (IQR) (Q-spread)
23. One open question
- The following two data sets are the scores of student A and student B on some tests.
  - A: 60, 60, 80, 80, 80, 90, 90
  - B: 30, 50, 80, 80, 80, 100, 120
- Can the location measures tell the difference between them?
24. A: 60, 60, 80, 80, 80, 90, 90; B: 30, 50, 80, 80, 80, 100, 120
25.
- Range = H - L, the highest observation minus the lowest.
- The Q-spread is the distance between the first and third sample quartiles, Q3 - Q1.
- The corresponding population q-spread is similarly defined using the population quartiles in place of the sample quartiles. (This measure of variability is resistant to the influence of outliers.)
- The standard deviation is the most widely used measure of variability.
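To illustrate the open question about students A and B, the sketch below computes location and spread measures for both score sets. The function names are illustrative. The quartiles follow the median-of-halves rule from the quartile slides, which is assumed to be the intended convention here.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def value_range(values):
    """Range = highest observation minus lowest observation."""
    return max(values) - min(values)


def q_spread(values):
    """Q3 - Q1, with the median included in both halves when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2
    q1 = median_of_sorted(ordered[:half])
    q3 = median_of_sorted(ordered[n - half:])
    return q3 - q1


a = [60, 60, 80, 80, 80, 90, 90]
b = [30, 50, 80, 80, 80, 100, 120]

# Location: both students have the same mean (540/7) and the same median (80).
print(sum(a) == sum(b), median_of_sorted(sorted(a)), median_of_sorted(sorted(b)))

# Variability: B's scores are far more spread out.
print(value_range(a), q_spread(a))   # 30 15.0
print(value_range(b), q_spread(b))   # 90 25.0
```

The location measures agree for the two students, so only the measures of variability separate them.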
26.
- The sample variance, denoted by s², is the average squared distance of all measurements from the sample mean:
  $$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
- A small question: why do we square the distances?
- The expression in the numerator is referred to as a sum of squares.
27. Standard deviation
The standard deviation is the positive square root of the variance.
The population standard deviation is denoted by σ; the sample standard deviation is denoted by s.
28. Example
- The data set is given as follows:
- 3 4 10 7 6
- mean, median
- variance
- standard deviation
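The quantities asked for in this example can be computed as in the sketch below. It assumes the usual n - 1 denominator for the sample variance, which is consistent with the value Sd = 7.2 reported in the launch-temperature example later on. Variable names are illustrative.

```python
import math

data = [3, 4, 10, 7, 6]
n = len(data)

mean = sum(data) / n
median = sorted(data)[n // 2]          # n is odd, so take the middle value

# Sum of squares: squared distances of each observation from the mean.
sum_of_squares = sum((y - mean) ** 2 for y in data)

variance = sum_of_squares / (n - 1)    # assumed n - 1 denominator
sd = math.sqrt(variance)               # positive square root of the variance

print(mean, median)                    # 6.0 6
print(sum_of_squares, variance)        # 30.0 7.5
print(round(sd, 3))                    # 2.739
```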
29. Interpreting the standard deviation s
- If we have two samples, a larger value of s in one sample reflects greater variation of the observations from the mean than in the other sample.
30.
- If we have one sample, then once we know the standard deviation we can tell what percent of the data is within a specified number of standard deviations of the mean. E.g., what percent of the distribution is within one standard deviation of the mean? The answer depends on the shape of the distribution.
31. Variability: The standard deviation
- The standard deviation is also meaningful when used with only one sample. The proportion of measurements that fall within 1, 2 and 3 standard deviations of the mean is described by the following two rules:
  - Chebyshev's rule
  - Empirical rule
- Chebyshev's rule applies to any set of data.
- The empirical rule applies only to bell-shaped, symmetric distributions of data.
32. Empirical rule
- Approximately 68% of the measurements fall within 1 standard deviation of the mean.
- Approximately 95% of the measurements fall within 2 standard deviations of the mean.
- Essentially all the measurements will fall within 3 standard deviations of the mean.
33. Chebyshev's rule
- Chebyshev's rule (regardless of the shape of the distribution)
  - (1) At least 3/4 of the measurements will fall within two standard deviations of the mean.
  - (2) At least 8/9 of the measurements will fall within three standard deviations of the mean.
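Both statements are instances of the general form of Chebyshev's rule. It is given here as a standard result rather than something taken from the slides: for any k > 1, at least 1 - 1/k² of the measurements fall within k standard deviations of the mean.

```latex
\[
\text{proportion within } k \text{ standard deviations of the mean}
\;\ge\; 1 - \frac{1}{k^{2}}, \qquad k > 1
\]
\[
k = 2:\; 1 - \tfrac{1}{4} = \tfrac{3}{4},
\qquad
k = 3:\; 1 - \tfrac{1}{9} = \tfrac{8}{9}
\]
```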
34. Example
- The recorded temperatures for the 24 launches before the Challenger accident are given here in a stem-and-leaf plot. Calculate the mean and the standard deviation and use them to give an interpretation of the amount of variability in the data using either the empirical rule or Chebyshev's rule (page 111).
  5 | 378
  6 | 3677789
  7 | 000023556689
  8 | 01
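35. Answer
- Mean = 70
- Sd = 7.2
- Within 1 standard deviation of the mean: 17/24 = 70.8%, close to 68%
- Within 2 standard deviations of the mean: 23/24 = 95.8%, close to 95%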
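The figures in this answer can be reproduced with the sketch below. The temperatures are read off the stem-and-leaf plot (stems are tens, leaves are units). The helper name `share_within` is illustrative, and the n - 1 denominator is assumed for the sample standard deviation.

```python
import math

# Launch temperatures read from the stem-and-leaf plot.
temps = [53, 57, 58,
         63, 66, 67, 67, 67, 68, 69,
         70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79,
         80, 81]

n = len(temps)
mean = sum(temps) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in temps) / (n - 1))


def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    inside = [t for t in temps if abs(t - mean) <= k * sd]
    return len(inside) / n


print(n, mean, round(sd, 1))             # 24 70.0 7.2
print(round(100 * share_within(1), 1))   # 70.8  (empirical rule: about 68)
print(round(100 * share_within(2), 1))   # 95.8  (empirical rule: about 95)
```

The observed shares also satisfy Chebyshev's rule, which only guarantees at least 75% within two standard deviations.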
36. z-score
- In the above example, we observed that 31 degrees is unusually low. When 31 is included in the data set, mean = 68.44 and StDev = 10.53. How low is it? To evaluate a single score, we calculate its z-score.
- The z-score corresponding to a particular observation x is given by
  - z = (observation - mean)/standard deviation
37. z-score
- A negative z-score indicates that the observation is below the mean. It is generally assumed that any observation with a z-score greater than 3 in absolute value is an outlier.
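Applying the z-score formula to the 31-degree observation, with the mean and standard deviation quoted on the previous slide, gives the sketch below. The function name is illustrative.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd


# Mean and standard deviation of the data set with 31 included.
z = z_score(31, 68.44, 10.53)
is_outlier = abs(z) > 3            # rule of thumb from the slide

print(round(z, 2), is_outlier)     # -3.56 True
```

The negative sign shows that 31 is below the mean, and its size of more than 3 flags the observation as an outlier under the stated rule.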
38. Exercise
- We have a data set of the ages of 10 students at one university.
- 22 21 27 32 19 20 22 23 18 25
- Draw the stem-and-leaf plot and the histogram.
- Compute the sample mean and the 10% trimmed mean.
- Compute the range and the Q-spread.