Title: STATISTICS for the Utterly Confused, 2nd ed.
- Slides prepared by Lloyd R. Jaisingh, Ph.D.
- Morehead State University
- Morehead, KY
Chapter 4
- Data Description: Numerical Measures of Position for Ungrouped Univariate Data
Outline
- Do I Need to Read This Chapter?
- 4-1 The z-Score or Standard Score
- 4-2 Percentiles
- It's a Wrap
Objectives
- Introduction of some basic statistical measurements of position.
- Introduction of some graphical displays to explain these measures of position.
Introduction
- A measure of location or position for a collection of data values is a number that is meant to convey the idea of the relative position of a data value in the data set.
- The most commonly used measures of location for sample data are the z-score and percentiles.
4-1 The z-Score
- Explanation of the term z-score: The z-score for a sample value in a data set is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set.
- NOTE: When computing the value of the z-score, the data values can be population values or sample values.
- Hence we can compute either a population z-score or a sample z-score.
4-1 The z-Score
- The sample z-score for a value x is given by the following formula: $z = \dfrac{x - \bar{x}}{s}$
- Where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
4-1 The z-Score
- The population z-score for a value x is given by the following formula: $z = \dfrac{x - \mu}{\sigma}$
- Where $\mu$ is the population mean and $\sigma$ is the population standard deviation.
Quick Tip
- The z-score is the number of standard deviations the data value falls above (positive z-score) or below (negative z-score) the mean for the data set.
Quick Tip
- The z-score is affected by an outlying value in the data set, since the outlier (a very small or very large value relative to the size of the other values in the data set) directly affects the values of the mean and the standard deviation.
The z-Score -- Example
- Example: What is the z-score for the value of 14 in the following sample values?
- 3 8 6 14 4 12 7 10
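The z-Score -- Example (Continued)
- Solution: The sample mean is 8 and the sample standard deviation is approximately 3.8173, so z = (14 - 8)/3.8173 ≈ 1.57.
- Thus, the data value of 14 is 1.57 standard deviations above the mean of 8, since the z-score is positive.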
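The arithmetic behind this solution can be checked with a minimal Python sketch. It assumes the sample standard deviation uses the usual n - 1 divisor; the variable names are illustrative only.

```python
import math

# Sample values from the example above.
data = [3, 8, 6, 14, 4, 12, 7, 10]
x = 14

n = len(data)
mean = sum(data) / n                              # 64 / 8 = 8.0
squared_devs = [(v - mean) ** 2 for v in data]    # squared deviations sum to 102.0
s = math.sqrt(sum(squared_devs) / (n - 1))        # sample standard deviation (n - 1 divisor)
z = (x - mean) / s                                # sample z-score for x = 14

print(f"mean = {mean}, s = {s:.4f}, z = {z:.2f}")
# mean = 8.0, s = 3.8173, z = 1.57
```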
The z-Score: Why do we use the z-score as a measure of relative position?
[Figure: Dot plot of the data points with the location of the mean and the data value of 14.]
The z-Score
- Observe that the distance between the mean of 8 and the value of 14 is 1.57 × s = 5.99 ≈ 6.
- Observe that if we add the mean of 8 to this value of 6, we will get 8 + 6 = 14, the data value.
- Thus, this shows that the value of 14 is 1.57 standard deviations above the mean value of 8.
The z-Score
- That is, the z-score gives us an idea of how
far away the data value is from the mean, and so
it gives us an idea of the position of the data
value relative to the mean.
The z-Score -- Example
- Example: What is the z-score for the value of 95 in the following sample values?
- 96 114 100 97 101 102 99 95 90
The z-Score -- Example (Continued)
- Solution: First compute the sample mean and sample standard deviation. These values are 99.3333 and 6.5955, respectively. Verify.
- Thus, z-score = (95 - 99.3333)/6.5955 = -0.6570 ≈ -0.66.
- Thus, the data value of 95 is located 0.66 standard deviations below the mean value of 99.3333, since the z-score is negative.
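The "Verify" step can be done with Python's standard `statistics` module. The sketch below also computes a population z-score for comparison, on the assumption (not made in the slides) that the same nine values are treated as a whole population. It only illustrates the earlier note that either kind of z-score can be computed.

```python
import statistics

# Sample values from the example above.
data = [96, 114, 100, 97, 101, 102, 99, 95, 90]
x = 95

mean = statistics.mean(data)
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)   # population standard deviation (divides by n)

sample_z = (x - mean) / s
population_z = (x - mean) / sigma  # assumption: data treated as a full population

print(f"mean = {mean:.4f}, s = {s:.4f}, sample z = {sample_z:.2f}")
# mean = 99.3333, s = 6.5955, sample z = -0.66
print(f"population z = {population_z:.2f}")
```

Because the population divisor is larger, the population standard deviation is smaller and the population z-score lies slightly farther from zero than the sample z-score.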
4-2 Percentiles
- Explanation of the term percentiles: Percentiles are numerical values that divide an ordered data set into 100 groups of values with at most 1% of the data values in each group.
- When we discuss percentiles, we generally present the discussion through the kth percentile.
- Let the kth percentile be denoted by Pk.
4-2 Percentiles
- Explanation of the term kth percentile: The kth percentile for an ordered array of numerical data is a numerical value Pk (say) such that at most k% of the data values are smaller than Pk, and at most (100 - k)% of the data values are larger than Pk.
- The idea of the kth percentile is illustrated on the next slide.
The kth Percentile
[Figure: Illustration of the kth percentile.]
Quick Tip
- In order for a percentile to be determined, the data set first must be ordered from the smallest to the largest value.
- There are 99 percentiles in a data set.
Display of the 99th Percentile
[Figure: Illustration of the 99th percentile.]
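Percentile Corresponding to a Given Data Value
- The percentile corresponding to a given data value, say x, in a set is obtained by using the following formula: $\text{percentile} = \dfrac{(\text{number of values below } x) + 0.5}{\text{total number of values in the data set}} \times 100\%$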
Percentile Corresponding to a Given Data Value
- Example: The shoe sizes, in whole numbers, for a sample of 12 male students in a statistics class were as follows: 13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, and 9.
- What is the percentile rank for a shoe size of 12?
Percentile Corresponding to a Given Data Value
- Solution: First, we need to arrange the values from smallest to largest.
- The ordered array is given below: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
- Observe that the number of values below the value of 12 is 9.
Percentile Corresponding to a Given Data Value
- Solution (continued): The total number of values in the data set is 12.
- Thus, using the formula, the corresponding percentile is (9 + 0.5)/12 × 100% ≈ 79.17%.
- The value of 12 corresponds to approximately the 79th percentile.
Percentile Corresponding to a Given Data Value
- Example: In the previous example, what is the percentile rank for a shoe size of 10?
- Recall, the ordered array was 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13.
- Observe that the number of values below the value of 10 is 5.
Percentile Corresponding to a Given Data Value
- Solution (continued): Recall, the total number of values in the data set was 12.
- Thus, using the formula, the corresponding percentile is (5 + 0.5)/12 × 100% ≈ 45.83%.
- The value of 10 corresponds to approximately the 46th percentile.
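Both shoe-size results can be reproduced with a short Python sketch. It assumes the percentile formula is (number of values below x + 0.5) / (total number of values) × 100, which is the form that yields the 79th and 46th percentiles reported above. The function name `percentile_rank` is illustrative, not taken from the slides.

```python
def percentile_rank(data, x):
    """Percentile of x, assuming (values below x + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)   # count values strictly below x
    return (below + 0.5) / len(data) * 100

# Shoe sizes from the example; ordering is not needed for counting.
shoe_sizes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]

print(round(percentile_rank(shoe_sizes, 12)))  # 79
print(round(percentile_rank(shoe_sizes, 10)))  # 46
```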
Procedure for Finding a Data Value for a Given Percentile
- Assume that we want to determine what data value falls at some general percentile Pk.
- The following steps will enable you to find a general percentile Pk for a data set.
- Step 1: Order the data set from smallest to largest.
- Step 2: Compute the position c of the percentile. To compute the value of c, use the following formula: $c = \dfrac{n \cdot k}{100}$, where n is the number of values in the data set.
Procedure for Finding a Data Value for a Given Percentile
- Step 3.1: If c is not a whole number, round up to the next whole number.
- Locate this position in the ordered set.
- The value in this location is the required percentile.
Procedure for Finding a Data Value for a Given Percentile
- Step 3.2: If c is a whole number, find the average of the values in the c and c + 1 positions in the ordered set.
- This average value will be the required percentile.
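The three steps can be sketched as one Python function. This is a minimal illustration, assuming the position formula c = n × k / 100 and a whole-number k between 1 and 99. The function name and the five-value data set are hypothetical and not from the slides.

```python
def percentile_value(data, k):
    """Return the kth percentile Pk by the three-step procedure.

    Assumes k is a whole number between 1 and 99.
    """
    ordered = sorted(data)                  # Step 1: order the data
    n = len(ordered)
    whole, remainder = divmod(n * k, 100)   # Step 2: c = n * k / 100
    if remainder:
        # Step 3.1: c is not a whole number, so round up to position whole + 1
        position = whole + 1
        return ordered[position - 1]        # positions are 1-based
    # Step 3.2: c is a whole number, so average positions c and c + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

values = [50, 10, 40, 20, 30]   # hypothetical data, not from the slides
print(percentile_value(values, 50))  # c = 2.5, round up to 3 -> 30
print(percentile_value(values, 40))  # c = 2, average of 20 and 30 -> 25.0
```

Using `divmod` on the integer product n × k avoids deciding whether a floating-point c is "whole", which is a design choice of this sketch rather than part of the procedure.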
Percentile Corresponding to a Given Data Value
- Example: The data given below represent the 19 countries with the largest numbers of total Olympic medals, excluding the United States, which had 101 medals, for the 1996 Atlanta games. Find the 65th percentile for the data set.
- 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15.
Percentile Corresponding to a Given Data Value
- Solution: First, we need to arrange the data set in order. The ordered set is:
- 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65.
- Next, compute the position of the percentile.
- Here n = 19, k = 65.
- Thus, c = (19 × 65)/100 = 12.35.
- We need to round up to a value of 13.
Percentile Corresponding to a Given Data Value
- Solution (continued): Thus, the 13th value in the ordered data set will correspond to the 65th percentile.
- That is, P65 = 27.
- Question: Why does a percentile measure relative position?
Question: Why does a percentile measure Relative Position?
[Figure: Display of the 65th percentile along with the data values.]
Question: Why does a percentile measure Relative Position?
- Referring to the diagram on the previous page, observe that the value of 27 is such that at most 65% of the data values are smaller than 27 and at most 35% of the values are larger than 27.
- This shows that the percentile value of 27 is a measure of location.
- Thus, the percentile gives us an idea of the relative position of a value in an ordered data set.
Percentile Corresponding to a Given Data Value
- Example: Find the 25th percentile for the following data set:
- 6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14.
- Solution: First, we need to arrange the data set in order. The ordered set is:
- 2, 6, 8, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 16, 18.
Percentile Corresponding to a Given Data Value
- Solution (continued)
- Next, compute the position of the percentile.
- Here n = 16, k = 25.
- Thus, c = (16 × 25)/100 = 4.0.
- Thus, the 25th percentile will be the average of the values located at the 4th and 5th positions in the ordered set.
- Thus, P25 = (10 + 10)/2 = 10.
Special Percentiles: Deciles and Quartiles
- Deciles and quartiles are special percentiles.
- Deciles divide an ordered data set into 10 equal parts.
- Quartiles divide the ordered data set into 4 equal parts.
- We usually denote the deciles by D1, D2, D3, ..., D9.
- We usually denote the quartiles by Q1, Q2, and Q3.
Deciles
- Nine deciles.
- At most 10% of the values are in each group.
Quartiles
- Three quartiles.
- At most 25% of the values are in each group.
Quick Tip
- There are 9 deciles and 3 quartiles.
- Q1 = first quartile = P25
- Q2 = second quartile = P50
- Q3 = third quartile = P75
- D1 = first decile = P10
- D2 = second decile = P20 . . .
- D9 = ninth decile = P90
Quick Tip
- P50 = D5 = Q2 = median
- i.e., the 50th percentile, the 5th decile, the 2nd quartile, and the median are all equal to one another.
- Finding deciles and quartiles is equivalent to finding the corresponding percentiles.
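These equivalences can be checked numerically with a minimal Python sketch. It restates the chapter's percentile procedure in a compact helper (named `percentile_value` here for illustration, assuming c = n × k / 100) and reuses the 16-value data set from the 25th-percentile example.

```python
import statistics

def percentile_value(data, k):
    """kth percentile by the chapter's procedure, assuming c = n * k / 100."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder:                       # c not whole: round up to position whole + 1
        return ordered[whole]           # 0-based index of that 1-based position
    return (ordered[whole - 1] + ordered[whole]) / 2   # average of positions c and c + 1

# Data set from the 25th-percentile example earlier in the chapter.
data = [6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14]

# Quartiles are P25, P50, P75; deciles are P10, P20, ..., P90.
quartiles = {f"Q{i}": percentile_value(data, 25 * i) for i in (1, 2, 3)}
deciles = {f"D{i}": percentile_value(data, 10 * i) for i in range(1, 10)}

print(quartiles)  # {'Q1': 10.0, 'Q2': 11.5, 'Q3': 13.0}
print(deciles["D5"] == quartiles["Q2"] == statistics.median(data))  # True
```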
OUTLIERS
- Recall that an outlier is an extremely small or extremely large data value when compared with the rest of the data values.
- The following procedure allows us to check whether a data value can be considered as an outlier.
Procedure to Check for OUTLIERS
- The following steps will allow us to check whether a given value in a data set can be classified as an outlier.
- Step 1: Arrange the data in order from smallest to largest.
- Step 2: Determine the first quartile Q1 and the third quartile Q3. (Recall Q1 = P25 and Q3 = P75.)
Procedure to Check for OUTLIERS
- Step 3: Find the interquartile range (IQR). IQR = Q3 - Q1.
- Step 4: Compute (Q1 - 1.5 × IQR) and (Q3 + 1.5 × IQR).
Procedure to Check for OUTLIERS
- Step 5: Let x be the data value that is being checked to determine whether it is an outlier.
- (a) If the value of x is smaller than (Q1 - 1.5 × IQR), then x is classified as an outlier.
- (b) If the value of x is larger than (Q3 + 1.5 × IQR), then x is classified as an outlier.
Procedure to Check for OUTLIERS
- Example: The data below represent the 20 countries with the largest number of total Olympic medals, including the United States, which had 101 medals, for the 1996 Atlanta games. Determine whether the number of medals won by the United States is an outlier relative to the numbers for the other countries.
- The data are given on the next slide.
- Example (continued): Data values: 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15, 101.
- Solution: First, we need to arrange the data set in order. The ordered set is: 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65, 101.
- Next we need to determine the first and third quartiles.
- Verify that Q1 = P25 = 17 and Q3 = P75 = 39.
- Example (continued): Thus the IQR = 39 - 17 = 22.
- Now, Q1 - 1.5 × IQR = 17 - (1.5 × 22) = -16,
- and Q3 + 1.5 × IQR = 39 + (1.5 × 22) = 72.
- Since 101 > 72, the value of 101 is an outlier relative to the rest of the values in the data set (based on the procedure presented here).
- That is, the number of medals won by the United States is an outlier relative to the numbers won by the other 19 countries for the 1996 Atlanta Olympic Games.
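The outlier check can be sketched in Python as follows. The quartiles Q1 = 17 and Q3 = 39 are taken as given from the example rather than recomputed, and the function names are illustrative only.

```python
def outlier_limits(q1, q3):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    iqr = q3 - q1                           # Step 3: interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr   # Step 4: the two cutoff values

def is_outlier(x, q1, q3):
    """Step 5: x is an outlier if it falls outside the two cutoffs."""
    lower, upper = outlier_limits(q1, q3)
    return x < lower or x > upper

# Medal counts from the example, including the United States (101).
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15, 101]
q1, q3 = 17, 39   # quartiles as stated in the example

print(outlier_limits(q1, q3))                         # (-16.0, 72.0)
print([m for m in medals if is_outlier(m, q1, q3)])   # [101]
```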
Pictorial Representation for the OUTLIER of the Number of Olympic Medals Won by the United States in the 1996 Atlanta Games
[Figure: Number line marking the values -16 and 72, with the OUTLIER at 101.]
BOX PLOTS
- Explanation of the term box plot: A box plot is a graphical display that involves a five-number summary of a distribution of values, consisting of the minimum value, the first quartile, the median, the third quartile, and the maximum value.
BOX PLOTS
- A horizontal box plot is constructed by drawing a box between the quartiles Q1 and Q3.
- Horizontal lines are then drawn from the middle of the sides of the box to the minimum and maximum values.
BOX PLOTS
- These horizontal lines are called whiskers.
- A vertical line inside the box marks the median.
- Outliers are usually indicated by a dot or an
asterisk.
Example of a Box Plot for the Olympic (1996) Medal Count Data
Information That Can Be Obtained From a Box Plot: Looking at the Median
- If the median is close to the center of the box, the distribution of the data values will be approximately symmetrical.
- If the median is to the left of the center of the box, the distribution of the data values will be positively skewed.
- If the median is to the right of the center of the box, the distribution of the data values will be negatively skewed.
Information That Can Be Obtained From a Box Plot: Looking at the Length of the Whiskers
- If the whiskers are approximately the same length, the distribution of the data values will be approximately symmetrical.
- If the right whisker is longer than the left whisker, the distribution of the data values will be positively skewed.
- If the left whisker is longer than the right whisker, the distribution of the data values will be negatively skewed.
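The two reading rules can be sketched in Python from a five-number summary. The slides do not define "close" or "approximately the same", so the 10% tolerance `tol` is an assumption of this sketch, as are the function names. The whiskers run to the minimum and maximum as described earlier, and Q1 = 17 and Q3 = 39 are taken from the Olympic medal example.

```python
import statistics

def skew_from_median(q1, median, q3, tol=0.1):
    """Compare the median with the center of the box (tol is an assumed tolerance)."""
    center = (q1 + q3) / 2
    if abs(median - center) <= tol * (q3 - q1):
        return "approximately symmetrical"
    # Median left of center -> positive skew; right of center -> negative skew.
    return "positively skewed" if median < center else "negatively skewed"

def skew_from_whiskers(minimum, q1, q3, maximum, tol=0.1):
    """Compare whisker lengths, with whiskers drawn to the minimum and maximum."""
    left, right = q1 - minimum, maximum - q3
    if abs(right - left) <= tol * max(left, right):
        return "approximately symmetrical"
    return "positively skewed" if right > left else "negatively skewed"

# Olympic medal counts (1996), with Q1 and Q3 as stated in the outlier example.
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15, 101]
q1, q3 = 17, 39
median = statistics.median(medals)   # 22.5

print(skew_from_median(q1, median, q3))                   # positively skewed
print(skew_from_whiskers(min(medals), q1, q3, max(medals)))  # positively skewed
```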
Box Plot Displaying Positive Skewness
Box Plot Displaying a Symmetrical Distribution
Box Plot Displaying a Negative Skewness