Measures of Location and variability - PowerPoint PPT Presentation

Title:

Measures of Location and variability

Description:

A parameter is a numerical summary measure of a population distribution. ... sample median M, first quantile Q1 and third quantile Q3. ... – PowerPoint PPT presentation

Slides: 39
Provided by: mingx

Transcript and Presenter's Notes


1
Measures of Location and variability
  • Chapter 2.4 Summary Measures of Location
  • mean
  • median
  • quartiles
  • Chapter 2.5 Summary Measures of variability
  • range
  • standard deviation (sd)
  • interquartile range (IQR)

2
Measures of Location
  • Chapter 2.4 Summary Measures of Location
  • mean
  • median
  • trimmed mean

3
Summary Measurements
  • A parameter is a numerical summary measure of a
    population distribution (it refers to the entire
    population).
  • A statistic is a numerical quantity calculated
    from the observations in a sample (it is obtained
    from information in the sample).

4
Mean
  • The population mean, denoted by μ, is the balance
    point of the population distribution, also called
    the center of mass of the population
    distribution.

5
sample mean
  • The sample mean is the average of all the
    observations. It gives an approximate value of
    the population mean. If a sample consists of
    observations y1, y2, ..., yn, then the sample mean
    is $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$.

6
Example 2.4.1
  • Here is the net worth of 10 residents of
    Washington state (in thousands of dollars): 100,
    1000, 250, 25, 750, 1575, 2500, 3200, 670, 320.
    Compute the sample mean of the net worth.
  • Solution: the sample mean is
    $\bar{y} = (100 + 1000 + \dots + 320)/10 = 10390/10 = 1039$.

The average net worth of the 10 residents is 1039
thousand dollars.

7
Continued
  • What happens if we add Bill Gates' net worth of
    40.5 billion dollars, which is 40,500,000
    thousand dollars?
  • This value is an outlier (a number that stands apart
    from the remainder of the data).
  • The sample mean becomes about 3,682,763 thousand dollars.
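
The effect of the outlier on the mean can be checked with a minimal Python sketch. The names `net_worth` and `sample_mean` are illustrative only, and the sixth value is taken as 1575, as in the ordered list used later for the median, because that value reproduces the stated mean of 1039.

```python
# Net worth in thousands of dollars. The value 1575 is an assumption taken
# from the ordered list on the median slides; it reproduces the mean 1039.
net_worth = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]

def sample_mean(values):
    """Return the sum of the observations divided by their count."""
    return sum(values) / len(values)

mean_without = sample_mean(net_worth)
mean_with = sample_mean(net_worth + [40_500_000])  # add the outlier

print(mean_without)      # 1039.0
print(round(mean_with))  # 3682763
```

A single extreme observation moves the mean from about one million dollars to several billion, which is why the mean is not resistant to outliers.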

8
[Figure: the net worth of residents, with the values 40500000 and 710 marked]

9
Median
  • The population median, denoted by ? , is the
    numerical value that divides the population
    distribution in half. It is also called the
    second quartile.

[Figure: population distribution divided at the median, with 50% of the population on each side]

10
Median
  • The sample median, denoted by M, is the middle
    observation if n is odd, or the average of the
    two middle observations if n is even. In either
    case, the median is located at position
    (n+1)/2 in the ordered data set.
  • Example 5: 1, 2, 2, 3, 6, 7, 8
  • Example 6: 8, 9, 10, 2, 6, 10

11
Example 2.4.1 (continued): 100, 1000, 250, 25, 750,
1575, 2500, 3200, 670, 320
  • Steps to find the median
  • Step 1: Order the observations from smallest to
    largest.
  • 25 100 250 320 670 750 1000 1575
    2500 3200
  • Step 2: Count the observations and denote the total
    number by n. Here n = 10.

12
  • Step 3: Find the location of the median, which is
    in the (n+1)/2-th position.
  • If n is odd, the median is the middle value.
  • If n is even, the median is the average of the
    middle two values.
  • (10+1)/2 = 5.5, so the median is
  • (670 + 750)/2 = 710
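
The three steps above can be sketched in Python as follows. The function name `sample_median` is hypothetical, and the data use 1575 as in the ordered list of Step 1.

```python
def sample_median(values):
    """Median via the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)       # Step 1: order the observations
    n = len(ordered)               # Step 2: count them
    position = (n + 1) / 2         # Step 3: 1-based location of the median
    if n % 2 == 1:
        # n odd: the position is a whole number, take the middle value
        return ordered[int(position) - 1]
    # n even: the position ends in .5, average the two neighbouring values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

net_worth = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]
print(sample_median(net_worth))                  # 710.0
print(sample_median(net_worth + [40_500_000]))   # 750
```

Adding the outlier moves the median only from 710 to 750, in contrast to its effect on the mean.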

13
Exercise: Including Bill Gates' net worth, what
is the median of the net worth?
  • 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670,
    320, 40500000
  • Solution
  • 25 100 250 320 670 750 1000 1575
    2500 3200 40500000
  • n = 11, (11+1)/2 = 6
  • the median is 750

14
Example 1
  • data: -1, 1
  • data: -2, 1, 1
  • data: -3, -2, -1, 1, 1, 1, 1, 1, 1
  • Example 2
  • 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
  • 1, 2, 1, 2, 1, 2, 1, 2, 1, 20

15
Trimmed mean
  • Motivation
  • A p% trimmed sample mean drops the smallest p% and
    the largest p% of the observations and averages the rest.
  • Olympic game rating system
  • uses a 1/9 trimmed mean

16
Trimmed mean
  • Example 3: Calculate the 5% trimmed mean of the
    example above.
  • 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
  • 1, 2, 1, 2, 1, 2, 1, 2, 1, 20

Answer: n = 20 observations and 5% × 20 = 1, so one
observation is dropped from each end. The remaining data
set is 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,
2, 1, 2, 1. Answer: _____.
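
The trimming in Example 3 can be sketched in Python. This is a minimal illustration, not the textbook's definition: `trimmed_mean` is a hypothetical helper, and rounding the number of dropped observations down when p% of n is not a whole number is an assumption.

```python
def trimmed_mean(values, percent):
    """Drop percent% of the observations from each end, then average.

    Assumption: when percent% of n is not a whole number, the count
    dropped from each end is rounded down.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = (n * percent) // 100            # observations dropped per end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

data = [1, 2] * 9 + [1, 20]             # the 20 observations of Example 3

print(sum(data) / len(data))            # 2.4  (ordinary mean)
print(trimmed_mean(data, 5))            # 1.5  (5% trimmed mean)
```

Dropping the smallest value (1) and the largest (20) leaves nine 1s and nine 2s, so the trimmed mean is much less affected by the single large value than the ordinary mean.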
17
Exercise
  • A stem-and-leaf plot is given (n = 10):
  • 1 | 078
  • 2 | 02457
  • 3 | 14
  • Find the 10% trimmed sample mean. _____

18
Quartiles
  • The first quartile, denoted by ? 1 , is the
    numerical value that divides the lower half of
    the population in half. The first sample
    quartile, Q1 can estimate it.
  • The third quartile, denoted by ? 3 , is the
    numerical value that divides the upper half of
    the population in half. The third sample quartile
    Q3 can estimate it.
  • The first and third sample quartiles, Q1 and Q3,
    are similarly defined for samples. The median is
    the second quartile, Q2.

19
Quartiles
  • Q3 = upper quartile = median of the upper half
    (include the median if n is odd)
  • Q1 = lower quartile = median of the lower half
    (include the median if n is odd)
  • Q2 = median

20
Example 1
Data (sorted): 35 37 45 46 49 56 57 57 59
61 62 64 68 71 72 76 80 89 94
Calculate Max, Min, n, Mean, Median, Q1 and
Q3.
  • Max = 94, Min = 35, n = 19, Mean = 62, Median = 61
  • Upper half (n is odd, so the median 61 is included):
  • 61 62 64 68 71 72 76 80 89 94
  • Q3 = (71 + 72)/2 = 71.5
  • Lower half (median included):
  • 35 37 45 46 49 56 57 57 59 61
  • Q1 = (49 + 56)/2 = 52.5
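
The quartile rule used in this example (median of each half, with the median included in both halves when n is odd) can be sketched in Python. The function name `quartiles` is hypothetical; note that other textbooks and software use different quartile conventions, so library routines may give slightly different values.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd, the median is included in both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2                 # size of each half
    lower = ordered[:half]
    upper = ordered[n - half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

scores = [35, 37, 45, 46, 49, 56, 57, 57, 59, 61,
          62, 64, 68, 71, 72, 76, 80, 89, 94]

q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)                       # 52.5 61 71.5
print(statistics.mean(scores))          # 62
```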

21
Example 2
  • Researchers have investigated lead absorption in
    children of parents who worked in a factory where
    lead is used to make batteries. A stem-and-leaf
    plot is given (n = 10):
  • 4 | 07
  • 5 |
  • 6 | 14
  • 7 | 1349
  • 8 |
  • 9 | 2
  • 10 | 3
  • Compute the following quantities:
  • the sample mean, the 10% trimmed mean,
  • the sample median M, the first quartile Q1 and the
    third quartile Q3.

22
Chapter 2.5 Summary Measure of Variability
  • range
  • standard deviation (sd)
  • interquartile range (IQR) (Q-spread)

23
One open question
  • The following two data sets are scores of student
    A and student B in some tests.
  • A: 60, 60, 80, 80, 80, 90, 90
  • B: 30, 50, 80, 80, 80, 100, 120
  • Can the location measures tell the difference
    between them ?

24
A: 60, 60, 80, 80, 80, 90, 90
B: 30, 50, 80, 80, 80, 100, 120

25
  • Range = H - L, the highest observation minus the lowest.
  • The Q-spread is the distance between the first and
    third sample quartiles, Q3 - Q1.
  • The corresponding population q-spread is similarly
    defined, using the population quartiles in place
    of the sample quartiles. (This measure of
    variability is resistant to the influence of
    outliers.)
  • The standard deviation is the most widely used measure.
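
As a rough illustration, the range and Q-spread can be computed for the two students' scores from the open question. This Python sketch assumes the median-of-halves quartile rule from the Quartiles slides; the helper names are hypothetical.

```python
import statistics

def q_spread(values):
    """Q3 - Q1, with the median included in both halves when n is odd."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[n - half:])
    return q3 - q1

def data_range(values):
    """Highest observation minus lowest observation."""
    return max(values) - min(values)

a = [60, 60, 80, 80, 80, 90, 90]
b = [30, 50, 80, 80, 80, 100, 120]

# Both students have the same median, so location alone cannot separate them.
print(statistics.median(a), statistics.median(b))   # 80 80
print(data_range(a), data_range(b))                 # 30 90
print(q_spread(a), q_spread(b))                     # 15.0 25.0
```

Both variability measures are larger for student B, which the location measures alone do not reveal.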

26
  • The sample variance, denoted by $s^2$, is the
    average squared distance of all measurements from
    the sample mean:
    $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$.
  • A small question: why do we square the distances?
  • The expression in the numerator is referred to as
    a sum of squares.

27
Standard deviation
Standard deviation is the positive square root of
the variance.
The population standard deviation is denoted by
σ; the sample standard deviation is denoted by
s.

28
Example
  • The data set is given as follows:
  • 3 4 10 7 6
  • mean = _____, median = _____
  • variance = _____
  • standard deviation = _____
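
One way to check the quantities for this small data set is the Python sketch below. It assumes the sample variance divides the sum of squares by n - 1, which is consistent with the standard deviation of 7.2 reported later for the launch temperatures.

```python
import math
import statistics

data = [3, 4, 10, 7, 6]
n = len(data)

mean = sum(data) / n
median = statistics.median(data)

# Sum of squares: squared distances of the observations from the mean.
sum_of_squares = sum((y - mean) ** 2 for y in data)

variance = sum_of_squares / (n - 1)     # assumed divisor: n - 1
sd = math.sqrt(variance)

print(mean, median)        # 6.0 6
print(sum_of_squares)      # 30.0
print(variance)            # 7.5
print(round(sd, 2))        # 2.74
```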

29
Interpreting the standard deviation s
  • If we have two samples, a larger value of s
    in one sample reflects greater variation of the
    observations from the mean than in the other sample.

30
  • If we have only one sample, then once we know the
    standard deviation we can tell what percent of
    the data lies within a specified number of
    standard deviations of the mean. E.g., what percent
    of the distribution is within one standard deviation of
    the mean? The answer depends on the shape of the
    distribution.

31
Variability- The standard deviation
  • The standard deviation is also meaningful when used
    with only one sample. The proportion of measurements
    that fall within 1, 2 and 3 standard deviations
    of the mean is described by the following two
    rules:
  • Chebyshev's rule
  • Empirical rule
  • Chebyshev's rule applies to any set of data.
  • The empirical rule applies only to bell-shaped,
    symmetrical distributions of data.

32
  • Empirical rule
  • Approximately 68% of the measurements fall
    within 1 standard deviation of the mean.
  • Approximately 95% of the measurements fall
    within 2 standard deviations of the mean.
  • Essentially all the measurements fall
    within 3 standard deviations of the mean.

33
Chebyshev's rule
  • Chebyshev's rule (regardless of the shape of the
    distribution)
  • (1) At least 3/4 of the measurements will fall
    within two standard deviations of the mean.
  • (2) At least 8/9 of the measurements will fall
    within three standard deviations of the mean.

34
Example
  • The temperatures recorded at the 24 launches
    prior to the Challenger accident are given
    here in a stem-and-leaf plot. Calculate the mean
    and the standard deviation and use them to give
    an interpretation of the amount of variability in
    the data using either the empirical rule or
    Chebyshev's rule (page 111).
  • 5 | 378
  • 6 | 3677789
  • 7 | 000023556689
  • 8 | 01

35
Answer
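  • Mean = 70
  • Sd = 7.2
  • 17/24 = 70.8% of the temperatures lie within 1 sd
    of the mean (empirical rule: 68%)
  • 23/24 = 95.8% lie within 2 sd of the mean
    (empirical rule: 95%)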
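
These figures can be reproduced from the stem-and-leaf plot with a short Python sketch. The dictionary `stems` is just one convenient way to transcribe the plot, and `statistics.stdev` uses the n - 1 divisor.

```python
import statistics

# Stem-and-leaf plot of the 24 launch temperatures: stem -> leaves.
stems = {5: "378", 6: "3677789", 7: "000023556689", 8: "01"}
temps = [10 * stem + int(leaf) for stem, leaves in stems.items() for leaf in leaves]

mean = statistics.mean(temps)
sd = statistics.stdev(temps)            # sample sd, divisor n - 1

def count_within(k):
    """Number of temperatures within k standard deviations of the mean."""
    return sum(1 for t in temps if abs(t - mean) <= k * sd)

print(len(temps), mean, round(sd, 1))   # 24 70 7.2
print(count_within(1), count_within(2)) # 17 23
```

The observed proportions, 17/24 and 23/24, are close to the 68% and 95% suggested by the empirical rule and comfortably above the Chebyshev lower bound of 3/4 for two standard deviations.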

36
z-score
  • In the above example, we observed that 31 degrees
    is unusually low. When 31 is included in the data
    set, mean = 68.44 and stDev = 10.53. How low is it? To
    evaluate a single score, we calculate its
    z-score.
  • The z-score corresponding to a particular
    observation x is given by
  • z = (observation - mean) / standard deviation

37
z-score
  • A negative z-score indicates that the observation
    is below the mean. It is generally assumed that
    any observation with a z-score greater than 3 in
    absolute value is an outlier.
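
Applying the z-score formula to the 31-degree observation gives the following, as a minimal Python sketch using the summary values quoted on the previous slide (mean 68.44, standard deviation 10.53). The function name `z_score` is illustrative.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

z = z_score(31, 68.44, 10.53)

print(round(z, 2))      # -3.56
print(abs(z) > 3)       # True: flagged as an outlier by the rule above
```

The temperature of 31 degrees lies more than 3.5 standard deviations below the mean, so it counts as an outlier under the rule of thumb.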

38
Exercise
  • We have a data set of the ages of 10 students at one
    university.
  • 22 21 27 32 19 20 22 23 18 25
  • Draw the stem-and-leaf plot and histogram.
  • Compute the sample mean and the 10% trimmed mean.
  • Compute the range and the Q-spread.