SLIDES PREPARED - PowerPoint PPT Presentation

About This Presentation
Title:

SLIDES PREPARED

Description:

A measure of location or position for a collection of data values is a number ... The most commonly used measures of location for sample data are the: z-score, ... – PowerPoint PPT presentation

Slides: 63
Provided by: lloy53

Transcript and Presenter's Notes


1
STATISTICS for the Utterly Confused, 2nd ed.
  • SLIDES PREPARED
  • By
  • Lloyd R. Jaisingh Ph.D.
  • Morehead State University
  • Morehead KY

2
Chapter 4
  • Data Description: Numerical Measures of Position
    for Ungrouped Univariate Data

3
Outline
  • Do I Need to Read This Chapter?
  • 4-1 The z-Score or Standard Score
  • 4-2 Percentiles
  • It's a Wrap

4
Objectives
  • Introduction of some basic statistical
    measurements of position.
  • Introduction of some graphical displays to
    explain these measures of position.

5
Introduction
  • A measure of location or position for a
    collection of data values is a number that is
    meant to convey the idea of the relative position
    of a data value in the data set.
  • The most commonly used measures of location for
    sample data are the z-score and percentiles.

6
4-1 The z-Score
  • Explanation of the term z-score: The z-score
    for a sample value in a data set is obtained by
    subtracting the mean of the data set from the
    value and dividing the result by the standard
    deviation of the data set.
  • NOTE: When computing the value of the z-score,
    the data values can be population values or
    sample values.
  • Hence we can compute either a population z-score
    or a sample z-score.

7
4-1 The z-Score
  • The Sample z-score for a value x is given by the
    following formula:
    $z = \dfrac{x - \bar{x}}{s}$
  • Where $\bar{x}$ is the sample mean and $s$ is the sample
    standard deviation.

8
4-1 The z-Score
  • The Population z-score for a value x is given by
    the following formula:
    $z = \dfrac{x - \mu}{\sigma}$
  • Where $\mu$ is the population mean and $\sigma$ is the
    population standard deviation.

9
Quick Tip
  • The z-score is the number of standard deviations
    the data value falls above (positive z-score) or
    below (negative z-score) the mean for the data
    set.

10
Quick Tip
  • The z-score is affected by an outlying value in
    the data set, since the outlier (very small or
    very large value relative to the size of the
    other values in the data set) directly affects
    the value of the mean and the standard deviation.
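A minimal sketch of this effect, using Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration; `sample_z` is a helper name introduced here, not something from the slides.

```python
from statistics import mean, stdev

def sample_z(data, x):
    """Sample z-score of x: (x - sample mean) / sample standard deviation."""
    return (x - mean(data)) / stdev(data)

# Hypothetical data, made up for this illustration only.
values = [10, 12, 11, 13, 12]
with_outlier = values + [60]  # 60 is an artificial outlier

# The outlier pulls the mean upward and inflates the standard deviation.
print(mean(values), stdev(values))
print(mean(with_outlier), stdev(with_outlier))

# The same data value, 13, lies above the mean without the outlier
# (positive z-score) and below the mean once the outlier is included
# (negative z-score).
print(sample_z(values, 13))
print(sample_z(with_outlier, 13))
```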

11
The z-Score -- Example
  • Example: What is the z-score for the value of 14
    in the following sample values?
  • 3, 8, 6, 14, 4, 12, 7, 10

12
The z-score -- Example (Continued)
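  • Solution: The sample mean is 8 and the sample
    standard deviation is s ≈ 3.8173, so
    $z = \dfrac{14 - 8}{3.8173} \approx 1.57$
  • Thus, the data value of 14 is 1.57 standard
    deviations above the mean of 8, since the z-score
    is positive.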

13
The z-Score: Why do we use the z-score as a
measure of relative position?
[Figure: Dot plot of the data points with the location of
the mean and the data value of 14.]
14
The z-score
  • Observe that the distance between the mean of 8
    and the value of 14 is 1.57 × s = 5.99 ≈ 6.
  • Observe that if we add the mean of 8 to this
    value of 6, we will get 8 + 6 = 14, the data
    value.
  • Thus, this shows that the value of 14 is 1.57
    standard deviations above the mean value of 8.
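The arithmetic in this example can be checked with a short sketch. It assumes the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; the variable names are illustrative only.

```python
from statistics import mean, stdev

sample = [3, 8, 6, 14, 4, 12, 7, 10]
x = 14

x_bar = mean(sample)   # sample mean: 8
s = stdev(sample)      # sample standard deviation, about 3.8173

z = (x - x_bar) / s
print(round(z, 2))     # 1.57

# Distance from the mean expressed as (rounded z-score) x (standard deviation)
distance = round(z, 2) * s
print(round(distance, 2))      # 5.99, i.e. roughly 6
print(round(x_bar + distance)) # 14, the original data value
```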

15
The z-score
  • That is, the z-score gives us an idea of how
    far away the data value is from the mean, and so
    it gives us an idea of the position of the data
    value relative to the mean.

16
The z-Score -- Example
  • Example: What is the z-score for the value of 95
    in the following sample values?
  • 96, 114, 100, 97, 101, 102, 99, 95, 90

17
The z-Score -- Example (Continued)
  • Solution: First compute the sample mean and
    sample standard deviation. These values are,
    respectively, 99.3333 and 6.5955. Verify.
  • Thus, z-score = (95 − 99.3333)/6.5955
    = −0.6570 ≈ −0.66.
  • Thus, the data value of 95 is located 0.66
    standard deviation below the mean value of
    99.3333, since the z-score is negative.
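One way to carry out the "Verify" step is the sketch below, again assuming the sample standard deviation as computed by `statistics.stdev`.

```python
from statistics import mean, stdev

sample = [96, 114, 100, 97, 101, 102, 99, 95, 90]

x_bar = mean(sample)      # sample mean
s = stdev(sample)         # sample standard deviation (divisor n - 1)
z = (95 - x_bar) / s      # z-score for the data value 95

print(round(x_bar, 4))    # 99.3333
print(round(s, 4))        # 6.5955
print(round(z, 2))        # -0.66
```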

18
4-2 Percentiles
  • Explanation of the term percentiles:
    Percentiles are numerical values that divide an
    ordered data set into 100 groups of values with
    at most 1% of the data values in each group.
  • When we discuss percentiles, we generally present
    the discussion through the kth percentile.
  • Let the kth percentile be denoted by Pk.

19
4-2 Percentiles
  • Explanation of the term kth percentile: The kth
    percentile for an ordered array of numerical
    data is a numerical value Pk (say) such that at
    most k% of the data values are smaller than Pk,
    and at most (100 − k)% of the data values
    are larger than Pk.
  • The idea of the kth percentile is illustrated on
    the next slide.

20
The kth Percentile
Illustration of the kth percentile.
21
Quick Tip
  • In order for a percentile to be determined, the
    data set first must be ordered from the smallest
    to the largest value.
  • There are 99 percentiles in a data set.

22
Display of the 99th Percentile
Illustration of the 99th percentile.
23
Percentile Corresponding to a Given Data Value
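  • The percentile corresponding to a given data
    value, say x, in a set is obtained by using the
    following formula:
    $\text{Percentile} = \dfrac{(\text{number of values below } x) + 0.5}{\text{total number of values in the data set}} \times 100\%$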

24
Percentile Corresponding to a Given Data Value
  • Example: The shoe sizes, in whole numbers, for a
    sample of 12 male students in a statistics class
    were as follows: 13, 11, 10, 13, 11, 10, 8, 12,
    9, 9, 8, and 9.
  • What is the percentile rank for a shoe size of
    12?

25
Percentile Corresponding to a Given Data Value
  • Solution: First, we need to arrange the values
    from smallest to largest.
  • The ordered array is given below: 8, 8, 9, 9, 9,
    10, 10, 11, 11, 12, 13, 13.
  • Observe that the number of values below the value
    of 12 is 9.

26
Percentile Corresponding to a Given Data Value
  • Solution (continued): The total number of values
    in the data set is 12.
  • Thus, using the formula, the corresponding
    percentile is
    $\dfrac{9 + 0.5}{12} \times 100\% \approx 79.17\%$
  • The value of 12 corresponds to approximately
    the 79th percentile.
27
Percentile Corresponding to a Given Data Value
  • Example: In the previous example, what is the
    percentile rank for a shoe size of 10?
  • Recall, the ordered array was: 8, 8, 9, 9, 9, 10,
    10, 11, 11, 12, 13, 13.
  • Observe that the number of values below the value
    of 10 is 5.

28
Percentile Corresponding to a Given Data Value
  • Solution (continued): Recall, the total number of
    values in the data set was 12.
  • Thus, using the formula, the corresponding
    percentile is
    $\dfrac{5 + 0.5}{12} \times 100\% \approx 45.83\%$
  • The value of 10 corresponds to approximately
    the 46th percentile.
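Both shoe-size results can be reproduced with the sketch below. It assumes the percentile formula is (number of values below x + 0.5) / (total number of values) × 100, which is consistent with the 79th and 46th percentiles quoted in the two examples. The function name `percentile_rank` is introduced here for illustration.

```python
def percentile_rank(data, x):
    """Percentile corresponding to x: (count of values below x + 0.5) / n * 100."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) / len(data) * 100

shoe_sizes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]

print(round(percentile_rank(shoe_sizes, 12)))  # 79
print(round(percentile_rank(shoe_sizes, 10)))  # 46
```

Sorting is not needed here because the function only counts how many values are smaller than x.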
29
Procedure for Finding a Data Value for a Given
Percentile
  • Assume that we want to determine what data value
    falls at some general percentile Pk.
  • The following steps will enable you to find a
    general percentile Pk for a data set.
  • Step 1: Order the data set from smallest to
    largest.
  • Step 2: Compute the position c of the percentile.
    To compute the value of c, use the following
    formula:
    $c = \dfrac{n \cdot k}{100}$, where n is the number of values in the data set.

30
Procedure for Finding a Data Value for a Given
Percentile
  • Step 3.1: If c is not a whole number, round up to
    the next whole number.
  • Locate this position in the ordered set.
  • The value in this location is the required
    percentile.

31
Procedure for Finding a Data Value for a Given
Percentile
  • Step 3.2: If c is a whole number, find the
    average of the values in the c and (c + 1)
    positions in the ordered set.
  • This average value will be the required
    percentile.
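The steps above can be sketched as a small function. This is an illustration under stated assumptions: k is a whole number from 1 to 99, the position is c = n × k / 100, and the data list is hypothetical. Integer arithmetic (`divmod`) is used so that the "is c a whole number?" test does not depend on floating-point rounding.

```python
def percentile_value(data, k):
    """Data value at the kth percentile (k a whole number from 1 to 99)."""
    ordered = sorted(data)                              # Step 1
    whole, remainder = divmod(len(ordered) * k, 100)    # Step 2: c = n*k/100
    if remainder != 0:
        # Step 3.1: c is not whole, so round up to position whole + 1.
        # Positions are 1-based, list indexes are 0-based.
        return ordered[whole]
    # Step 3.2: c is whole, so average the values in positions c and c + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

data = [6, 2, 10, 4, 8]  # hypothetical values, n = 5

print(percentile_value(data, 50))  # c = 2.5, round up to position 3 -> 6
print(percentile_value(data, 40))  # c = 2, average of positions 2 and 3 -> 5.0
```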

32
Percentile Corresponding to a Given Data Value
  • Example: The data given below represent the 19
    countries with the largest numbers of total
    Olympic medals, excluding the United States
    (which had 101 medals), for the 1996 Atlanta
    games. Find the 65th percentile for the data set.
  • 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17,
    20, 19, 22, 15, 15, 15, 15.

33
Percentile Corresponding to a Given Data Value
  • Solution: First, we need to arrange the data set
    in order. The ordered set is:
  • 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25,
    27, 35, 37, 41, 50, 63, 65.
  • Next, compute the position of the percentile.
  • Here n = 19, k = 65.
  • Thus, c = (19 × 65)/100 = 12.35.
  • We need to round up to 13.

34
Percentile Corresponding to a Given Data Value
  • Solution (continued): Thus, the 13th value in the
    ordered data set will correspond to the 65th
    percentile.
  • That is, P65 = 27.
  • Question: Why does a percentile measure relative
    position?

35
Question: Why does a percentile measure relative
position?

[Figure: Display of the 65th percentile along with the
data values.]
36
Question: Why does a percentile measure relative
position?
  • Referring to the diagram on the previous
    slide, observe that the value of 27 is such that at
    most 65% of the data values are smaller than 27
    and at most 35% of the values are larger than 27.
  • This shows that the percentile value of 27 is a
    measure of location.
  • Thus, the percentile gives us an idea of the
    relative position of a value in an ordered data
    set.

37
Percentile Corresponding to a Given Data Value
  • Example: Find the 25th percentile for the
    following data set:
  • 6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10,
    10, 2, 14.
  • Solution: First, we need to arrange the data set
    in order. The ordered set is:
  • 2, 6, 8, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13,
    14, 16, 18.

38
Percentile Corresponding to a Given Data Value
  • Solution (continued):
  • Next, compute the position of the percentile.
  • Here n = 16, k = 25.
  • Thus, c = (16 × 25)/100 = 4.0.
  • Thus, the 25th percentile will be the average of
    the values located at the 4th and 5th positions
    in the ordered set.
  • Thus, P25 = (10 + 10)/2 = 10.
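The two worked examples (P65 for the medal counts and P25 for this data set) can be checked with the sketch below. It assumes the position formula c = n × k / 100 with the round-up and averaging rules described in the procedure; `percentile_steps` is a name introduced here, and it returns the position c alongside the percentile so the intermediate step is visible.

```python
def percentile_steps(data, k):
    """Return (c, value): the position c = n*k/100 and the kth percentile."""
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(n * k, 100)
    c = n * k / 100
    if remainder != 0:
        # c is not whole: round up and take the value in that position.
        value = ordered[whole]
    else:
        # c is whole: average the values in positions c and c + 1.
        value = (ordered[whole - 1] + ordered[whole]) / 2
    return c, value

medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15]
values = [6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14]

print(percentile_steps(medals, 65))  # (12.35, 27)
print(percentile_steps(values, 25))  # (4.0, 10.0)
```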

39
Special Percentiles: Deciles and Quartiles
  • Deciles and quartiles are special percentiles.
  • Deciles divide an ordered data set into 10 equal
    parts.
  • Quartiles divide the ordered data set into 4
    equal parts.
  • We usually denote the deciles by D1, D2, D3, ...,
    D9.
  • We usually denote the quartiles by Q1, Q2, and
    Q3.

40
Deciles
  • Nine deciles.
  • At most 10% of the values are in each group.

41
Quartiles
  • Three quartiles.
  • At most 25% of the values are in each group.

42
Quick Tip
  • There are 9 deciles and 3 quartiles.
  • Q1 = first quartile = P25
  • Q2 = second quartile = P50
  • Q3 = third quartile = P75
  • D1 = first decile = P10
  • D2 = second decile = P20 . . .
  • D9 = ninth decile = P90

43
Quick Tip
  • P50 = D5 = Q2 = median
  • i.e. the 50th percentile, the 5th decile, the
    2nd quartile, and the median are all equal to one
    another.
  • Finding deciles and quartiles is equivalent
    to finding the corresponding percentiles.

44
OUTLIERS
  • Recall that an outlier is an extremely small or
    extremely large data value when compared with the
    rest of the data values.
  • The following procedure allows us to check
    whether a data value can be considered as an
    outlier.

45
Procedure to Check for OUTLIERS
  • The following steps will allow us to check
    whether a given value in a data set can be
    classified as an outlier.
  • Step 1: Arrange the data in order from smallest
    to largest.
  • Step 2: Determine the first quartile Q1 and the
    third quartile Q3. (Recall: Q1 = P25 and Q3 = P75.)

46
Procedure to Check for OUTLIERS
  • Step 3: Find the interquartile range (IQR). IQR
    = Q3 − Q1.
  • Step 4: Compute (Q1 − 1.5 × IQR) and (Q3 + 1.5 × IQR).

47
Procedure to Check for OUTLIERS
  • Step 5: Let x be the data value that is being
    checked to determine whether it is an outlier.
  • (a) If the value of x is smaller than (Q1 −
    1.5 × IQR), then x is classified as an outlier.
  • (b) If the value of x is larger than (Q3 +
    1.5 × IQR), then x is classified as an
    outlier.

48
Procedure to Check for OUTLIERS
49
  • Example: The data below represent the 20
    countries with the largest number of total
    Olympic medals, including the United States,
    which had 101 medals for the 1996 Atlanta games.
    Determine whether the number of medals won by the
    United States is an outlier relative to the
    numbers for the other countries.
  • The data is given on the next slide.

50
  • Example (continued): Data values: 63, 65, 50,
    37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22,
    15, 15, 15, 15, 101.
  • Solution: First, we need to arrange the data set
    in order. The ordered set is: 15, 15, 15, 15,
    17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41,
    50, 63, 65, 101.
  • Next we need to determine the first and third
    quartiles.
  • Verify that Q1 = P25 = 17 and Q3 = P75 = 39.

51
  • Example (continued): Thus the IQR = 39 − 17 = 22.
  • Now, Q1 − 1.5 × IQR = 17 − (1.5 × 22) = −16,
  • and Q3 + 1.5 × IQR = 39 + (1.5 × 22) = 72.
  • Since 101 > 72, the value of 101 is an outlier
    relative to the rest of the values in the data
    set (based on the procedure presented here).
  • That is, the number of medals won by the United
    States is an outlier relative to the numbers won
    by the other 19 countries for the 1996 Atlanta
    Olympic Games.
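The whole outlier check for the medal data can be sketched as follows. The quartiles are computed with the percentile procedure from earlier in the chapter (position c = n × k / 100, round up if c is not whole, otherwise average positions c and c + 1); the function names are introduced here for illustration and are not from the slides.

```python
def percentile_value(data, k):
    """kth percentile using position c = n*k/100 (k a whole number, 1 to 99)."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder != 0:
        return ordered[whole]                          # round c up
    return (ordered[whole - 1] + ordered[whole]) / 2   # average c and c + 1

def outlier_limits(data):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for the data set."""
    q1 = percentile_value(data, 25)
    q3 = percentile_value(data, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15, 101]

low, high = outlier_limits(medals)
flagged = [x for x in medals if x < low or x > high]

print(low, high)  # -16.0 72.0
print(flagged)    # [101]
```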

52
Pictorial Representation for the OUTLIER of the
Number of Olympic Medals Won by the United States
in 1996 Atlanta Games.
[Figure: the limits −16 and 72 are marked, and the value 101 is labeled OUTLIER.]
53
BOX PLOTS
  • Explanation of the term box plot: A box plot is
    a graphical display that involves a five-number
    summary of a distribution of values, consisting
    of the minimum value, the first quartile, the
    median, the third quartile, and the maximum value.

54
BOX PLOTS
  • A horizontal box plot is constructed by drawing a
    box between the quartiles Q1 and Q3.
  • Horizontal lines are then drawn from the middle
    of the sides of the box to the minimum and
    maximum values.

55
BOX PLOTS
  • These horizontal lines are called whiskers.
  • A vertical line inside the box marks the median.
  • Outliers are usually indicated by a dot or an
    asterisk.

56
Example of a Box Plot for the Olympic (1996)
Medal Count Data
57
Information That Can Be Obtained From a Box Plot
58
Information That Can Be Obtained From a Box Plot:
Looking at the Median
  • If the median is close to the center of the box,
    the distribution of the data values will be
    approximately symmetrical.
  • If the median is to the left of the center of the
    box, the distribution of the data values will be
    positively skewed.
  • If the median is to the right of the center of
    the box, the distribution of the data values will
    be negatively skewed.

59
Information That Can Be Obtained From a Box Plot:
Looking at the Length of the Whiskers
  • If the whiskers are approximately the same
    length, the distribution of the data values will
    be approximately symmetrical.
  • If the right whisker is longer than the left
    whisker, the distribution of the data values will
    be positively skewed.
  • If the left whisker is longer than the right
    whisker, the distribution of the data values will
    be negatively skewed.
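The two rules of thumb (median position and whisker length) can be sketched as simple checks on a five-number summary. Several things here are assumptions for illustration: the function names, the tolerance `tol` that decides what counts as "close" or "approximately the same", and the median of 22.5 for the 20 medal counts, which is not stated in the slides but follows from the chapter's percentile procedure (average of the 10th and 11th ordered values, 22 and 23).

```python
def skew_from_median(q1, median, q3, tol=0.1):
    """Shape hint from where the median sits inside the box.

    tol is an assumed tolerance: the fraction of the box width within
    which the median is treated as "close to the center".
    """
    center = (q1 + q3) / 2
    if abs(median - center) <= tol * (q3 - q1):
        return "approximately symmetrical"
    return "positively skewed" if median < center else "negatively skewed"

def skew_from_whiskers(minimum, q1, q3, maximum, tol=0.1):
    """Shape hint from comparing the left and right whisker lengths."""
    left = q1 - minimum
    right = maximum - q3
    longest = max(left, right)
    if longest == 0 or abs(right - left) <= tol * longest:
        return "approximately symmetrical"
    return "positively skewed" if right > left else "negatively skewed"

# Five-number summary for the 20 medal counts: min, Q1, median, Q3, max.
minimum, q1, median, q3, maximum = 15, 17, 22.5, 39, 101

print(skew_from_median(q1, median, q3))              # positively skewed
print(skew_from_whiskers(minimum, q1, q3, maximum))  # positively skewed
```

These checks only mirror the visual reading of a box plot; they are not a formal test of skewness.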

60
Box Plot Displaying Positive Skewness
61
Box Plot Displaying a Symmetrical Distribution
62
Box Plot Displaying Negative Skewness