Measures of Location and variability - PowerPoint PPT Presentation

About This Presentation

Title:

Measures of Location and variability

Description:

A parameter is a numerical summary measure of a population distribution. ... sample median M, first quantile Q1 and third quantile Q3. ... – PowerPoint PPT presentation

Slides: 39

Provided by: mingx

Transcript and Presenter's Notes

1
Measures of Location and variability

Chapter 2.4 Summary Measures of Location
mean
median
quartiles
Chapter 2.5 Summary Measures of variability
range
standard deviation (sd)
interquartile range (IQR)

2
Measures of Location

Chapter 2.4 Summary Measures of Location
mean
median
trimmed mean

3
Summary Measurements

A parameter is a numerical summary measure of a
population distribution. (It refers to the entire
population.)
A statistic is a numerical quantity calculated
from the observations in a sample. (It is obtained
from information in the sample.)

4
Mean

The population mean, denoted by μ, is the balance
point of the population distribution, also called
the center of mass of the population
distribution.

5
sample mean

The sample mean is the average of all the
observations. It estimates the population mean.
If a sample consists of observations
y1, y2, ..., yn, then the sample mean is
ȳ = (y1 + y2 + ... + yn)/n

6
Example 2.4.1

Here is the net worth of 10 residents of
Washington state (in thousands of dollars): 100,
1000, 250, 25, 750, 1575, 2500, 3200, 670, 320.
Compute the sample mean of the net worth.
Solution: Sample mean = (100 + 1000 + 250 + 25 + 750
+ 1575 + 2500 + 3200 + 670 + 320)/10 = 10390/10 = 1039

The average net worth of the 10 residents is 1039
thousand dollars.
7
Continued

What happens if we add Bill Gates' net worth of
40.5 billion dollars, which is 40500000
thousands of dollars?
This value is an outlier (a number that stands
apart from the remainder of the data).
The sample mean of the 11 values jumps to about
3,682,763 thousand dollars.
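
A minimal Python sketch of the two sample means, assuming the sixth listed value is 1575 (as it appears in the ordered list of the median example); that value is the one consistent with the stated results 1039 and 3,682,763. Variable names are illustrative.

```python
from statistics import mean

# Net worth in thousands of dollars; 1575 is an assumption (see the note above).
net_worth = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]
gates = 40_500_000  # 40.5 billion dollars, expressed in thousands

mean_without = mean(net_worth)           # sample mean of the 10 residents
mean_with = mean(net_worth + [gates])    # sample mean once the outlier is added

print(mean_without)
print(round(mean_with))
```

A single extreme observation is enough to move the sample mean far away from every other value in the sample.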

8
the net worth of residents

[Figure: net worth values, with the outlier 40500000 and the median 710 labeled]
9
Median

The population median, denoted by ? , is the
numerical value that divides the population
distribution in half. It is also called the
second quartile.

[Figure: population distribution with 50% of the area on each side of the median]
10
Median

The sample median, denoted by M, is the middle
observation if n is odd, or the average of the
two middle observations if n is even. In either
case, the median is located at position
(n+1)/2 in the ordered data set.
Example 5. 1, 2, 2, 3, 6, 7, 8
Example 6. 8, 9, 10, 2, 6, 10
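
The definition can be sketched in Python as follows. The function name `sample_median` is illustrative; the standard library's `statistics.median` follows the same rule.

```python
def sample_median(data):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle ones

example_5 = [1, 2, 2, 3, 6, 7, 8]      # n = 7, odd
example_6 = [8, 9, 10, 2, 6, 10]       # n = 6, even, not yet ordered

print(sample_median(example_5))
print(sample_median(example_6))
```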

11
Example 2.4.1 (continued): 100, 1000, 250, 25, 750,
1575, 2500, 3200, 670, 320

Steps to find the median
Step 1. Order the observations from smallest to
largest.
25 100 250 320 670 750 1000 1575
2500 3200
Step 2. Count the observations and denote the total
number by n. Here n = 10.

Step 3. Find the location of the median, which is
the (n+1)/2-th position.
If n is odd, the median is the middle value.
If n is even, the median is the average of the
middle two values.
(10+1)/2 = 5.5, so the median is the average of
the 5th and 6th values: (670 + 750)/2 = 710

13
Exercise: Including Bill Gates' net worth, what
is the median of the net worth?

100, 1000, 250, 25, 750, 1575, 2500, 3200, 670,
320, 40500000
Solution
25 100 250 320 670 750 1000 1575
2500 3200 40500000
n = 11, (11+1)/2 = 6
The median is the 6th ordered value, 750.
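
As a quick check, the median with and without the outlier can be computed with Python's standard library. This is a sketch that uses the value 1575 from the ordered list above; variable names are illustrative.

```python
from statistics import median

# Net worth in thousands of dollars; 1575 is taken from the ordered list above.
net_worth = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]
with_gates = net_worth + [40_500_000]

median_without = median(net_worth)   # n = 10: average of the 5th and 6th ordered values
median_with = median(with_gates)     # n = 11: the 6th ordered value

print(median_without, median_with)
```

Unlike the sample mean, the median barely moves when the outlier is added.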

14
Example 1

Data: -1, 1
Data: -2, 1, 1
Data: -3, -2, -1, 1, 1, 1, 1, 1, 1
Example 2
1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2, 1, 2, 1, 20

15
Trimmed mean

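Motivation: a few extreme observations can pull
the sample mean away from the bulk of the data.
A p% trimmed sample mean: remove the smallest p%
and the largest p% of the observations, then
average the rest.
Olympic game rating system:
uses a 1/9 trimmed mean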

16
Trimmed mean

Example 3. Calculate the 5% trimmed mean of the
above example.
1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2, 1, 2, 1, 20

Answer: n = 20 observations and 5% x 20 = 1, so
remove one observation from each end. The remaining
data set is 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
2, 1, 2, 1. Answer _____.
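
The trimming step in this example can be sketched in Python. The helper `trimmed_mean` is illustrative; rounding n x p/100 down when it is not a whole number is an assumption, since the slides only cover the whole-number case.

```python
def trimmed_mean(data, percent):
    """Drop `percent`% of the observations from each end, then average the rest."""
    ordered = sorted(data)
    n = len(ordered)
    k = n * percent // 100          # observations removed per end (rounded down: assumption)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

# Example 2 data: nine (1, 2) pairs followed by 1 and 20.
data = [1, 2] * 9 + [1, 20]

print(sum(data) / len(data))        # ordinary sample mean
print(trimmed_mean(data, 5))        # 5% trimmed mean: smallest and largest value dropped
```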
17
Exercise

A stem-and-leaf display is given (n = 10):
1 | 0 7 8
2 | 0 2 4 5 7
3 | 1 4
Find the 10% trimmed sample mean. _____

18
Quartiles

The first quartile, denoted by ? 1 , is the
numerical value that divides the lower half of
the population in half. The first sample
quartile, Q1 can estimate it.
The third quartile, denoted by ? 3 , is the
numerical value that divides the upper half of
the population in half. The third sample quartile
Q3 can estimate it.
The first and third sample quartiles, Q1 and Q3,
are similarly defined for samples. The median is
the second quartile, Q2.

19
Quartiles

Q3 = upper quartile = median of the upper half
(include the median if n is odd)
Q1 = lower quartile = median of the lower half
(include the median if n is odd)
Q2 = median

20
Example 1
Data (sorted!): 35 37 45 46 49 56 57 57 59
61 62 64 68 71 72 76 80 89 94
Calculate Max, Min, n, Mean, Median, Q1 and
Q3.

Max = 94, Min = 35, n = 19, Mean = 62, Median = 61
Upper half (median included, since n is odd):
61 62 64 68 71 72 76 80 89 94
Q3 = (71 + 72)/2 = 71.5
Lower half (median included, since n is odd):
35 37 45 46 49 56 57 57 59 61
Q1 = (49 + 56)/2 = 52.5
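
A short Python sketch of this quartile rule (halves that include the median when n is odd). The function name `quartiles` is illustrative. Library routines often use other quartile conventions and may return slightly different values.

```python
from statistics import mean, median

def quartiles(data):
    """Return (Q1, median, Q3); each half includes the median when n is odd."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2              # size of each half
    lower = ordered[:half]
    upper = ordered[n - half:]
    return median(lower), median(ordered), median(upper)

data = [35, 37, 45, 46, 49, 56, 57, 57, 59, 61,
        62, 64, 68, 71, 72, 76, 80, 89, 94]

q1, m, q3 = quartiles(data)
print(max(data), min(data), len(data), mean(data), m, q1, q3)
```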

21
Example 2

Researchers have investigated lead absorption in
children of parents who worked in a factory where
lead is used to make batteries. A stem-and-leaf
display is given (n = 10):
 4 | 0 7
 5 |
 6 | 1 4
 7 | 1 3 4 9
 8 |
 9 | 2
10 | 3
Compute the following quantities:
the sample mean ȳ, the 10% trimmed mean, the
sample median M, the first quartile Q1 and the
third quartile Q3.
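
One way to carry out these computations in Python, as a sketch. It assumes the leaf unit is 1 (so stem 4 with leaf 7 is 47) and uses the quartile rule from the Quartiles slide; the dictionary layout and names are illustrative.

```python
from statistics import mean, median

# Stem-and-leaf display from the example; leaf unit assumed to be 1.
stem_and_leaf = {4: "07", 5: "", 6: "14", 7: "1349", 8: "", 9: "2", 10: "3"}

data = sorted(10 * stem + int(leaf)
              for stem, leaves in stem_and_leaf.items()
              for leaf in leaves)
n = len(data)

k = n * 10 // 100                  # 10% of n removed from each end
trimmed = data[k:n - k]

half = (n + 1) // 2                # halves include the median only when n is odd
q1 = median(data[:half])
q3 = median(data[n - half:])

print(data)
print(mean(data), mean(trimmed), median(data), q1, q3)
```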

22
Chapter 2.5 Summary Measure of Variability

range,
standard deviation (sd)
interquartile range (IQR) (Q-spread)

23
One open question

The following two data sets are scores of student
A and student B in some tests.
A: 60, 60, 80, 80, 80, 90, 90
B: 30, 50, 80, 80, 80, 100, 120
Can the location measures tell the difference
between them ?

24
A: 60, 60, 80, 80, 80, 90, 90
B: 30, 50, 80, 80, 80, 100, 120
25

Range = H - L, the highest observation minus the
lowest.
The Q-spread is the distance between the first and
third sample quartiles, Q3 - Q1.
The corresponding population Q-spread is defined
in the same way, using the population quartiles in
place of the sample quartiles. (This measure of
variability is resistant to the influence of
outliers.)
The standard deviation is the most widely used
measure of variability.
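
For the scores of students A and B, a small Python sketch shows that the location measures agree while the range and Q-spread do not. The quartiles use the "include the median if n is odd" rule from the Quartiles slide; function and variable names are illustrative.

```python
from statistics import mean, median

def q_spread(data):
    """Q3 - Q1, with each half including the median when n is odd."""
    ordered = sorted(data)
    n = len(ordered)
    half = (n + 1) // 2
    return median(ordered[n - half:]) - median(ordered[:half])

scores = {
    "A": [60, 60, 80, 80, 80, 90, 90],
    "B": [30, 50, 80, 80, 80, 100, 120],
}

summary = {}
for name, s in scores.items():
    summary[name] = {
        "mean": mean(s),
        "median": median(s),
        "range": max(s) - min(s),
        "q_spread": q_spread(s),
    }
    print(name, summary[name])
```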

The sample variance, denoted by s^2, is
(approximately) the average squared distance of
all measurements from the sample mean:
s^2 = [(y1 - ȳ)^2 + (y2 - ȳ)^2 + ... + (yn - ȳ)^2] / (n - 1)
A small question: why do we square the distances?
The expression in the numerator is referred to as
a sum of squares.

27
Standard deviation
The standard deviation is the positive square root
of the variance.
The population standard deviation is denoted by
σ; the sample standard deviation is denoted by
s.
28
Example

The data set is given as follows:
3 4 10 7 6
mean = ____, median = ____
variance = ____
standard deviation = ____
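
The quantities for this data set can be worked out step by step in Python. The sketch assumes the sample variance uses the usual n - 1 divisor, which is also the divisor that reproduces the Sd of 7.2 quoted in the launch-temperature example; `statistics.variance` uses the same convention.

```python
from math import sqrt
from statistics import mean, median, variance

data = [3, 4, 10, 7, 6]
n = len(data)
y_bar = mean(data)

sum_of_squares = sum((y - y_bar) ** 2 for y in data)   # numerator of s^2
s2 = sum_of_squares / (n - 1)                           # sample variance (n - 1 divisor)
s = sqrt(s2)                                            # sample standard deviation

print(y_bar, median(data), sum_of_squares, s2, s)
print(variance(data))                                   # library cross-check of s2
```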

29
Interpreting the standard deviation s

If we have two samples, a larger value of s
in one sample reflects greater variation of its
observations about the mean than in the other sample.

If we have only one sample, the standard deviation
tells us what percent of the data lies within a
specified number of standard deviations of the
mean. E.g., what percent of the
distribution is within one standard deviation of
the mean? The answer depends on the shape of the
distribution.

31
Variability- The standard deviation

The standard deviation is also meaningful when used
with only one sample. The proportion of measurements
that fall within 1, 2 and 3 standard deviations
of the mean is described by the following two
rules:
- Chebyshev's rule
- Empirical rule
Chebyshev's rule applies to any set of data.
The empirical rule applies only to bell-shaped,
symmetrical distributions of data.

Empirical rule

- Approximately 68% of the measurements fall
within 1 standard deviation of the mean.
- Approximately 95% of the measurements fall
within 2 standard deviations of the mean.
- Essentially all the measurements fall
within 3 standard deviations of the mean.
33
Chebyshev's rule

Chebyshev's rule (regardless of the shape of the
distribution)
(1) At least 3/4 of the measurements will fall
within two standard deviations of the mean.
(2) At least 8/9 of the measurements will fall
within three standard deviations of the mean.
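
Both statements are instances of the general bound 1 - 1/k^2 for k standard deviations (k > 1). The general form is the standard statement of Chebyshev's rule and is not spelled out on the slide. A minimal sketch, with an illustrative function name:

```python
from fractions import Fraction

def chebyshev_bound(k):
    """Minimum fraction of measurements within k standard deviations (integer k > 1)."""
    return 1 - Fraction(1, k * k)

for k in (2, 3):
    print(k, chebyshev_bound(k))
```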

34
Example

The temperatures recorded at the 24 launches
before the Challenger accident are given
here in a stem-and-leaf plot. Calculate the mean
and the standard deviation, and use them to
interpret the amount of variability in
the data using either the empirical rule or
Chebyshev's rule (page 111).
5 | 3 7 8
6 | 3 6 7 7 7 8 9
7 | 0 0 0 0 2 3 5 5 6 6 8 9
8 | 0 1

35
Answer

Mean = 70
Sd = 7.2
Within 1 standard deviation: 17/24 = 70.8% ≈ 68%
Within 2 standard deviations: 23/24 = 95.8% ≈ 95%
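
These figures can be reproduced from the stem-and-leaf plot with a short Python sketch. It assumes the leaf unit is 1 degree and uses the sample standard deviation with the n - 1 divisor (`statistics.stdev`); names are illustrative.

```python
from statistics import mean, stdev

# Launch temperatures read off the stem-and-leaf plot; leaf unit assumed to be 1.
stem_and_leaf = {5: "378", 6: "3677789", 7: "000023556689", 8: "01"}
temps = [10 * stem + int(leaf)
         for stem, leaves in stem_and_leaf.items()
         for leaf in leaves]

m = mean(temps)
s = stdev(temps)          # sample standard deviation (n - 1 divisor)

def count_within(k):
    """Number of temperatures within k standard deviations of the mean."""
    return sum(1 for t in temps if abs(t - m) <= k * s)

print(len(temps), m, round(s, 1))
for k in (1, 2, 3):
    print(k, count_within(k), count_within(k) / len(temps))
```

The observed shares are close to the empirical rule's 68% and 95%, and comfortably above the Chebyshev lower bounds.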

36
z-score

In the above example, we observed that 31 degrees
is unusually low. When 31 is included in the data
set, mean = 68.44 and standard deviation = 10.53.
How low is it? To evaluate a single score, we
calculate its z-score.
The z-score corresponding to a particular
observation x is given by
z = (observation - mean)/standard deviation

37
z-score

A negative z-score indicates that the observation
is below the mean. It is generally assumed that
any observation with a z-score greater than 3 in
absolute value is an outlier.
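
A sketch of the z-score calculation for the 31-degree observation. It uses the 24 launch temperatures from the stem-and-leaf plot (leaf unit assumed to be 1) plus the value 31, with the sample standard deviation; the function name `z_score` is illustrative.

```python
from statistics import mean, stdev

def z_score(x, m, s):
    """z = (observation - mean) / standard deviation."""
    return (x - m) / s

temps = [53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
         70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81]
with_31 = temps + [31]

m = mean(with_31)
s = stdev(with_31)        # sample standard deviation (n - 1 divisor)
z = z_score(31, m, s)

print(round(m, 2), round(s, 2), round(z, 2))
print(abs(z) > 3)         # the rule of thumb for flagging an outlier
```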

38
Exercise