Title: Chapter 3 - Describing Data Measures of Central Tendency
1Ch 3, Descriptive Statistics Numerical Measures
2Chapter Topics
- Measures of Location
- Mean, Median, Mode, Percentiles and Quartiles
- Measures of Variability
- Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
- Shape, Relative Location, and Detecting Outliers
- Symmetric or Skewed
- Z-scores, Chebyshev, Empirical Rule, Detecting Outliers
3How would you describe this distribution to
someone without using a picture?
4The Mean, Median and Mode are Measures of Central
Location
5Half of the observations are above the Median and
half are below it.
Median 80
6The Mode is the most popular value.
7Normal Distribution
- The Mean, Median and Mode are all equal.
- Normal Distribution
- Bell Shaped and Symmetrical
8Is the mean useful in describing this data?
9How useful is the mean? Does it make a
difference if the data (grades) are bunched up or
spread out?
10Measures of Variability
- Range
- Variance
- Standard Deviation
- Coefficient of Variation
- Interquartile Range
11Measure of Shape
- Symmetry and Skewness
- Pearson's Coefficient of Skewness
12Mean (Arithmetic Mean)
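- Mean (arithmetic mean) of data values
- Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size
- Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size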
13Mean (Arithmetic Mean)
(continued)
- The most common measure of central tendency
- Affected by extreme values (outliers)
- The Mean is pulled in the direction of skewness
or toward the outlier(s)
14Review Fall 2006
- If I ask you to Describe the Data, what 4 major categories will you use?
- What do the following variables represent?
- N, n,
- Normally distributed data is ______ and _____.
- If data is normally distributed, what is the relationship between the mean, median, and mode?
- What is one major drawback of the mean?
- Give examples of numerical and categorical data.
15Bonus for ALL 5 Executives (000).
- 14, 15, 17, 16, 15
- Population Mean: $\mu = \frac{\sum X}{N}$
- $\mu$ = population mean
- $X$ = value of an observation
- $N$ = number of observations in population
- $\sum X$ = sum the values of X
16Mean Bonus for ALL 5 Executives
$\mu = \frac{\sum X}{N} = \frac{14 + 15 + 17 + 16 + 15}{5}$
17Mean Bonus for ALL 5 Executives
$\mu = \frac{77}{5} = 15.4$ (thousand)
18Bonus For a Sample of 5 Executives
- Bonus in (000): 14, 15, 17, 16, 15
19Mean Bonus For a Sample of 5 Execs
Bonus in (000): 14, 15, 17, 16, 15
$\bar{x} = \frac{\sum x_i}{n} = \frac{77}{5} = 15.4$
20Is The Mean Representative of the Center of the
Data?
14, 15, 15, 16, 17
Mean = 15.4
21How Sensitive is the Mean?
- Suppose there were six executives in the sample with the following bonuses: 14, 15, 17, 16, 15, 43
- What is the sample mean now?
- Is the computed mean representative of the data? Why?
22How Sensitive is the Mean?
$\bar{x} = \frac{14 + 15 + 17 + 16 + 15 + 43}{6} = \frac{120}{6} = 20$
23Is The Mean Representative of the Center of the
Data?
14, 15, 15, 16, 17, 43
Mean = 20
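The pull of a single extreme bonus on the mean can be checked numerically. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

# Bonuses in thousands of dollars, as listed on the slides.
five_execs = [14, 15, 17, 16, 15]
six_execs = five_execs + [43]      # same data plus the one extreme bonus

mean_five = mean(five_execs)       # 77 / 5
mean_six = mean(six_execs)         # 120 / 6
shift = mean_six - mean_five       # how far one outlier moved the mean

print(f"mean of five: {mean_five}")
print(f"mean of six:  {mean_six}")
print(f"shift caused by the outlier: {shift:.1f}")
```

Five of the six bonuses are 17 or less, yet the mean of the six values sits above all of them, which is why it is a poor description of the center here.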
24Median
- Robust measure of central tendency
- Not affected by extreme values
- In an ordered array, the median is the middle number
- If n or N is odd, the median is the middle number
- If n or N is even, the median is the average of the two middle numbers
[Figure: two dot plots, one on a 0 to 10 axis and one on an axis extended to 14; Median = 5 in both]
25The Mode
- A Measure of Central Tendency
- Value that Occurs Most Often
- Not Affected by Extreme Values
- There May Not be a Mode
- There May be Several Modes
- Used for Either Numerical or Categorical Data
[Figure: two dot plots; the first, on a 0 to 6 axis, has No Mode, and the second, on a 0 to 14 axis, has Mode = 9]
26Years of Service Sample of 6 Employees
- 16, 12, 8, 15, 8, 23
- Compute the mean, median, and mode
- Which of these measures is most representative of the data?
- Why?
27Years of Service Mean
$\bar{x} = \frac{16 + 12 + 8 + 15 + 8 + 23}{6} = \frac{82}{6} \approx 13.7$
28Years of Service cont...
8, 8, 12, 15, 16, 23
Mean = 13.7
Median = 13.5
Mode = 8
Don't use the Mode for ungrouped data. It is not
reliable. It is only by chance that it will be
representative.
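The three measures for the years-of-service sample can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic, not part of the original slides; `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

years = [16, 12, 8, 15, 8, 23]        # years of service for the 6 employees

ordered = sorted(years)               # the ordered array
avg = round(mean(years), 1)           # arithmetic mean, one decimal place
mid = median(years)                   # n is even: average of the two middle values
modes = multimode(years)              # list of the most frequent value(s)

print(ordered, avg, mid, modes)
```

Here the mean and median are close together, while the mode sits at the bottom of the array, which illustrates the warning about using the mode on ungrouped data.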
29- Show Doctor's Pay Article
30Review
- What are three measures of central tendency? Define them.
- Which measure is least useful? Why?
- Which measure is most affected by outliers? An outlier is an extreme value.
- When are measures of central tendency not adequate, by themselves, to describe the data?
- When are all of the measures the same?
31Percentiles
- A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
- Admission test scores for colleges and universities are frequently reported in terms of percentiles.
32Percentiles
- The pth percentile of a data set is a value such
that at least p percent of the items take on this
value or less and at least (100 - p) percent of
the items take on this value or more.
33Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile:
i = (p/100)n
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
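The index rule above can be sketched in Python. This is a minimal illustration, assuming p is a whole number between 1 and 99; the function name `percentile` and the sample `scores` are hypothetical and not taken from the slides.

```python
def percentile(values, p):
    """p-th percentile by the index rule i = (p/100) * n (p a whole number, 1..99)."""
    ordered = sorted(values)                 # step 1: ascending order
    n = len(ordered)
    # Step 2: i = (p/100) * n, kept as quotient and remainder so that
    # "is i an integer?" is decided without floating-point round-off.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (1-based).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1 (1-based).
    return (ordered[whole - 1] + ordered[whole]) / 2

# Hypothetical data, already in ascending order.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
p80 = percentile(scores, 80)   # i = 8, an integer: average of the 8th and 9th values
p25 = percentile(scores, 25)   # i = 2.5, not an integer: the 3rd value
print(p80, p25)
```

With n = 70 and p = 80 the same rule gives i = 56, so the function would average the 56th and 57th ordered values.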
34 80th Percentile
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
Note: Data is in ascending order.
35 80th Percentile
At least 20% of the items take on a value of 542 or more.
At least 80% of the items take on a value of 542 or less.
56/70 = .8 or 80%
14/70 = .2 or 20%
36 80th Percentile
- Using Excel's Percentile Function
The formula Excel uses to compute the location
(Lp) of the pth percentile is
Lp = (p/100)n + (1 - p/100)
Excel would compute the location of the 80th
percentile for the apartment rent data as
follows:
L80 = (80/100)70 + (1 - 80/100) = 56 + .2 = 56.2
The 80th percentile would be
535 + .2(549 - 535) = 535 + 2.8 = 537.8
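The location-and-interpolate calculation described for Excel can be sketched as follows. This is an illustration of the arithmetic on the slide, not Excel's actual implementation; the function name and the five-value sample are hypothetical.

```python
def excel_style_percentile(values, p):
    """Interpolated percentile at location Lp = (p/100) * n + (1 - p/100)."""
    ordered = sorted(values)
    n = len(ordered)
    location = (p / 100) * n + (1 - p / 100)   # Lp, a 1-based position
    lower = int(location)                      # whole part of Lp
    fraction = location - lower                # fractional part of Lp
    if lower >= n:                             # Lp landed on the last value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    # Move the fractional part of the way from one value to the next.
    return below + fraction * (above - below)

sample = [10, 20, 30, 40, 50]                  # hypothetical data
p80 = excel_style_percentile(sample, 80)       # L80 = 4.2: 40 + .2(50 - 40)

# The slide's own final step, using its 56th and 57th rent values.
slide_value = 535 + 0.2 * (549 - 535)
print(round(p80, 1), round(slide_value, 1))
```

Because this rule interpolates instead of averaging, it gives 537.8 for the rent data where the round-up-or-average rule gives 542; both are accepted conventions.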
37 80th Percentile
80th percentile
Use the Insert Function.
It is not necessary to put the data in ascending order.
Note: Rows 7-71 are not shown.
38Quartiles
- Quartiles are specific percentiles.
- First Quartile = 25th Percentile
- Second Quartile = 50th Percentile = Median
- Third Quartile = 75th Percentile
39Third Quartile
- Using Excel's Quartile Function
Excel computes the locations of the 1st, 2nd, and
3rd quartiles by first converting the quartiles
to percentiles and then using the following
formula to compute the location (Lp) of the pth
percentile:
Lp = (p/100)n + (1 - p/100)
Excel would compute the location of the 3rd
quartile (75th percentile) for the rent data as
follows:
L75 = (75/100)70 + (1 - 75/100) = 52.5 + .25 = 52.75
The 3rd quartile would be
515 + .75(525 - 515) = 515 + 7.5 = 522.5
40Quartiles
[Figure: Q1, Q2, and Q3 divide the ordered data into four parts, each containing 25% of the observations]
The position of the quartile is i = (p/100)n
41A sample of 30 light trucks using diesel fuel
revealed these mileages per gallon of fuel used.
42Frequency Distribution
43Compute the mean, median, mode, and quartiles for
the diesel truck fuel mileage.
- Mean = 18.1 mpg
- Median = 18.0 mpg
- Mode = 17.0 mpg and 20 mpg
- First Quartile = 16 mpg
- Third Quartile = 20 mpg
i = (p/100)n
See the next slide.
44i = (p/100)n
- Position of the first quartile: i = (25/100)30 = 7.5
- Round up to the 8th value in the array.
- The 8th value is the first quartile, or 16 mpg.
- Position of the third quartile: i = (75/100)30 = 22.5
- Round up to the 23rd value in the array.
- The 23rd value is the third quartile, or 20 mpg.
See page 96 for rules.
45Interpretation of the Mean- truck mileage data
- The average mileage was 18.1 mpg.
46Interpretation of the Median- truck mileage data
- Half the trucks got more than 18 miles per
gallon, and half got less than that amount.
47Interpretation of the Mode- truck mileage data
- The Mode is the value that appears most frequently, or you might use the midpoint of the modal class.
- The modal class is the class with the highest frequency: the 16 to 18 mpg class.
- There are two modes, 17 mpg and 20 mpg.
- Doesn't it make sense to talk in terms of the modal class as opposed to the mode?
48Interpretation of Quartiles-truck mileage data
- 25% of the trucks got less than 16 mpg.
- 25% of the trucks got more than 20 mpg.
- 75% of the trucks got less than 20 mpg.
- 50% of the trucks got between 16 and 20 mpg.
49Shape of Truck Mileage Data
52Measures of Variation
- Range
- Interquartile Range
- Variance
  - Population Variance
  - Sample Variance
- Standard Deviation
  - Population Standard Deviation
  - Sample Standard Deviation
- Coefficient of Variation
53Range
- Measure of variation
- Difference between the largest and the smallest observations
- Ignores the way in which data are distributed
[Figure: two differently shaped distributions on the same 7 to 12 axis, each with Range = 12 - 7 = 5]
54Interquartile Range
- Difference between the first and third quartiles
- Spread in the middle 50%
- Not affected by extreme values
Data in Ordered Array: 11 12 13 16 16 17 17 18 21
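As a worked check, the interquartile range of the ordered array can be computed in Python. This sketch assumes the quartile positions come from the index rule i = (p/100)n with rounding up; other quartile conventions can give a different result for the same data. The helper name `value_at_percentile` is hypothetical.

```python
def value_at_percentile(ordered, p):
    """Value at index i = (p/100) * n; round up if i is not an integer."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]                          # position whole + 1 (1-based)
    return (ordered[whole - 1] + ordered[whole]) / 2   # average of positions i and i + 1

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]   # ordered array from the slide

q1 = value_at_percentile(data, 25)   # i = 2.25, round up to the 3rd value
q3 = value_at_percentile(data, 75)   # i = 6.75, round up to the 7th value
iqr = q3 - q1                        # spread of the middle 50%
print(q1, q3, iqr)
```

Replacing the largest value, 21, with a much larger number leaves `q1`, `q3`, and `iqr` unchanged, which is the sense in which the interquartile range is not affected by extreme values.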
55Variance
- Important measure of variation
- Shows variation about the mean
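- Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$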
56Standard Deviation
- Most important measure of variation
- Shows variation about the mean
- Has the same units as the original data
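- Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
- Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$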
57Comparing Standard Deviations
[Figure: dot plots of three data sets on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
58Coefficient of Variation
- Measures relative variation
- Always in percentage (%)
- Shows variation relative to mean
- Is used to compare two or more sets of data measured in different units, or sets of data with the same units and different means.
- $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
59Comparing Coefficient of Variation
- Stock A
- Average price last year = $50
- Standard deviation = $5
- Stock B
- Average price last year = $100
- Standard deviation = $5
- Coefficient of variation
- Stock A: CV = (5/50) 100% = 10%
- Stock B: CV = (5/100) 100% = 5%
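The comparison of the two stocks can be reproduced in a few lines of Python. This is a minimal sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)     # Stock A: average price 50, std dev 5
cv_b = coefficient_of_variation(5, 100)    # Stock B: average price 100, std dev 5

print(f"Stock A: {cv_a:.0f}%")
print(f"Stock B: {cv_b:.0f}%")
```

Both stocks have the same standard deviation, but relative to its average price Stock A varies twice as much as Stock B.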
60Class Room Exercises
- A sample of five recent accounting graduates revealed the following starting salaries (000).
- 17, 26, 18, 20, 19
- Compute the range, variance and standard deviation
- Write a paragraph in which you describe the data by interpreting each of the statistics you have computed.
61Range = Highest Value - Lowest Value
62Range
- Arrayed data
- 17, 18, 19, 20, 26
- Range = 26 - 17 = 9 thousand
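63Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
64Variance
$\bar{x} = \frac{17 + 26 + 18 + 20 + 19}{5} = \frac{100}{5} = 20$
65Variance
$\sum (x_i - \bar{x})^2 = (-3)^2 + 6^2 + (-2)^2 + 0^2 + (-1)^2 = 9 + 36 + 4 + 0 + 1 = 50$
66Variance
$s^2 = \frac{50}{5 - 1} = 12.5$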
67Standard Deviation
- The standard deviation is the square root of the
variance
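The class room exercise can be checked with Python's standard `statistics` module. This is a sketch for verifying the arithmetic; `variance` and `stdev` are the sample versions, which divide by n - 1.

```python
from statistics import mean, stdev, variance

salaries = [17, 26, 18, 20, 19]        # starting salaries in thousands of dollars

data_range = max(salaries) - min(salaries)   # highest value minus lowest value
sample_mean = mean(salaries)
sample_variance = variance(salaries)   # sum of squared deviations / (n - 1)
sample_std = stdev(salaries)           # square root of the sample variance

print(f"range    = {data_range}")
print(f"mean     = {sample_mean}")
print(f"variance = {sample_variance}")
print(f"std dev  = {sample_std:.2f}")
```

The variance is in squared units (thousands of dollars, squared), while the standard deviation is back in the units of the data, thousands of dollars.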
68Description of the Data
- The salaries of a sample of 5 recent accounting graduates varied from $17,000 to $26,000, a range of $9,000.
69Description of the Data Continued...
- The variance of 12.5 (in thousands of dollars, squared) is not useful in describing the data.
- Another measure, a kind of average deviation from the mean, is the standard deviation of about $3,500 (the square root of 12.5 is about 3.54, in thousands of dollars). This is not very useful to us YET!!
70Shape of a Distribution
- Describes how data is distributed
- Measures of shape
- Symmetric or skewed
- Left-Skewed: Mean < Median < Mode
- Symmetric: Mean = Median = Mode
- Right-Skewed: Mode < Median < Mean
71Chapter 3 Measures of Distribution Shape,
Relative Location, and Detecting Outliers
72Skewness
- Excel will compute a numerical value for skewness. You will have to develop a feeling for what numerical value indicates if a distribution is moderately or heavily skewed.
- Excel's SKEW function can be used to compute the skewness of a data set.
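To build that feeling for the numbers, skewness can also be computed by hand. The sketch below uses the adjusted sample formula n/((n - 1)(n - 2)) times the sum of cubed standardized deviations, which to my understanding is the formula behind Excel's SKEW; treat that correspondence as an assumption. The function name `sample_skewness` is hypothetical, and the data are the bonus figures from the earlier slides.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted sample skewness; needs at least 3 values and a nonzero std dev."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)                                   # sample standard deviation
    cubed = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

symmetric = [1, 2, 3, 4, 5]                  # hypothetical, perfectly symmetric
bonuses = [14, 15, 17, 16, 15, 43]           # six-executive bonus data, one large value

skew_symmetric = sample_skewness(symmetric)  # essentially zero
skew_bonuses = sample_skewness(bonuses)      # positive: long right tail
print(round(skew_symmetric, 2), round(skew_bonuses, 2))
```

The sign follows the direction of the long tail: the one large bonus produces a positive value well above 1, which the slides would describe as highly skewed right.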
73Distribution Shape Skewness
- Symmetric (not skewed)
- Skewness is zero.
- Mean and median are equal.
[Figure: relative frequency histogram, Skewness = 0]
74Distribution Shape Skewness
- Moderately Skewed Left
- Skewness is negative.
- Mean will usually be less than the median.
Skewness = -.31
75Distribution Shape Skewness
- Moderately Skewed Right
- Skewness is positive.
- Mean will usually be more than the median.
Skewness = .31
76Distribution Shape Skewness
- Highly Skewed Right
- Skewness is positive (often above 1.0).
- Mean will usually be more than the median.
Skewness = 1.25
77z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean:
$z_i = \frac{x_i - \bar{x}}{s}$
78z-Scores
- An observation's z-score is a measure of the relative location of the observation in a data set.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.
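The three sign rules can be seen directly by computing z-scores for a small sample. This minimal sketch reuses the starting-salary data from the class room exercise and assumes the sample standard deviation; the function name `z_scores` is illustrative.

```python
from statistics import mean, stdev

def z_scores(values):
    """Number of sample standard deviations each value lies from the sample mean."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]

salaries = [17, 26, 18, 20, 19]     # starting salaries (000); the sample mean is 20
scores = z_scores(salaries)

for salary, z in zip(salaries, scores):
    print(f"{salary}: z = {z:+.2f}")
```

The salary of 20 equals the mean and gets a z-score of zero, the salaries below 20 get negative z-scores, and 26 gets a positive one.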
79Empirical or Normal Rule
- For a symmetrical, bell-shaped frequency distribution:
- About 68%, 95%, and 99.7% of the observations will lie within plus and minus one, two, and three standard deviations of the mean, respectively.
80Empirical Rule
[Figure: bell curve over x, with the axis marked at μ, μ ± 1σ, μ ± 2σ, and μ ± 3σ]
81Students Grades Through Fall 93
ARRAYED DATA
82Students Grades Through Fall 93
Mean = 79.3, Standard Dev. = 9.7
83Empirical or Normal Rule
- 79.3 ± 9.7
- Between 69.6 and 89.0
- % of grades
- 79.3 ± 2(9.7)
- Between 59.9 and 98.7
- % of grades
- 79.3 ± 3(9.7)
- Between 50.2 and 108.4
84Students Grades Through Fall 93
ARRAYED DATA
85Empirical or Normal Rule
- 79.3 ± 9.7
- Between 69.6 and 89.0
- % of grades: 106 out of 148, or about 72%
- 79.3 ± 2(9.7)
- Between 59.9 and 98.7
- % of grades: 142 out of 148, or about 96%
- 79.3 ± 3(9.7)
- Between 50.2 and 108.4: 100% of the grades
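The intervals and percentages for the grade data can be recomputed as a check. This sketch uses only the summary figures quoted on the slides (mean, standard deviation, and the counts inside each interval); the count of 148 for three standard deviations is inferred from the stated 100%, and the function name `interval` is illustrative.

```python
def interval(mean, std_dev, k):
    """Bounds of mean ± k standard deviations, rounded to one decimal place."""
    return (round(mean - k * std_dev, 1), round(mean + k * std_dev, 1))

mean_grade, std_dev, total_grades = 79.3, 9.7, 148
counted = {1: 106, 2: 142, 3: 148}   # grades inside each interval (3: inferred from 100%)
rule = {1: 68, 2: 95, 3: 99.7}       # percentages stated by the Empirical Rule

for k in (1, 2, 3):
    low, high = interval(mean_grade, std_dev, k)
    observed = round(100 * counted[k] / total_grades)
    print(f"within {k} std dev: {low} to {high}, observed {observed}%, rule {rule[k]}%")
```

The observed percentages land close to, but not exactly on, the rule's 68%, 95%, and 99.7%, which is what one expects from real data that are only roughly bell-shaped.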
86Detecting Outliers
- An outlier is an unusually small or unusually large value in a data set.
- A data value with a z-score less than -3 or greater than 3 might be considered an outlier.
- It might be
- an incorrectly recorded data value
- a data value that was incorrectly included in the data set
- a correctly recorded data value that belongs in the data set
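The z-score screen for outliers can be sketched in Python. The `readings` list is hypothetical data made up for this illustration, and `flag_outliers` is an illustrative name; the threshold of 3 is the one given on the slide.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(values)
    s = stdev(values)                      # sample standard deviation
    return [x for x in values if abs((x - x_bar) / s) > threshold]

# Hypothetical sample of 20 readings with one suspicious value at the end.
readings = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51,
            49, 52, 48, 50, 51, 49, 50, 52, 48, 120]
suspects = flag_outliers(readings)         # values worth a second look

# The six-executive bonus data from the earlier slides, for comparison.
bonus_suspects = flag_outliers([14, 15, 17, 16, 15, 43])
print(suspects, bonus_suspects)
```

A flagged value is only a candidate for review; as the slide notes, it may still be a correctly recorded value that belongs in the data set. In very small samples the rule can miss obvious extremes: the bonus of 43 has a z-score of only about 2, because a single extreme value also inflates the standard deviation.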
87Summary of Chapter Topics
- Measures of Central Tendency
- Mean, Median, Mode
- Quartile
- Measures of Variation
- The Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation
- Shape
- Symmetric, Skewed
88Summary of Chapter Topics cont.
- Empirical rule
- Pitfalls in numerical descriptive measures and
ethical considerations