Chapter 3 - Describing Data Measures of Central Tendency - PowerPoint PPT Presentation

Provided by: MCMar2
Transcript and Presenter's Notes

1
Ch 3, Descriptive Statistics: Numerical Measures
2
Chapter Topics
  • Measures of Location
  • Mean, Median, Mode, Percentiles and Quartiles
  • Measures of Variability
  • Range, Interquartile Range, Variance, Standard
    Deviation, Coefficient of Variation
  • Shape, Relative Location, and Detecting Outliers
  • Symmetric or Skewed
  • Z-scores, Chebyshev, Empirical Rule, Detecting
    Outliers

3
How would you describe this distribution to
someone without using a picture?
4
The Mean, Median and Mode are Measures of Central
Location
5
Half of the observations are above the Median and
half are below it.
Median = 80
6
The Mode is the most popular value.
7
Normal Distribution
  • The Mean, Median and Mode are all equal.
  • Normal Distribution
  • Bell Shaped and Symmetrical

8
Is the mean useful in describing this data?
9
How useful is the mean? Does it make a
difference if the data (grades) are bunched up or
spread out?
10
Measures of Variability
  • Range
  • Variance
  • Standard Deviation
  • Coefficient of Variation
  • Interquartile Range

11
Measure of Shape
  • Symmetry and Skewness
  • Pearson's Coefficient of Skewness

12
Mean (Arithmetic Mean)
  • Mean (arithmetic mean) of data values
  • Sample mean: $\bar{X} = \frac{\sum X}{n}$, where n is the sample size
  • Population mean: $\mu = \frac{\sum X}{N}$, where N is the population size

13
Mean (Arithmetic Mean)
(continued)
  • The most common measure of central tendency
  • Affected by extreme values (outliers)
  • The Mean is pulled in the direction of skewness
    or toward the outlier(s)

14
Review Fall 2006
  • If I ask you to Describe the Data, what 4 major
    categories will you use?
  • What do the following variables represent?
  • N, n,
  • Normally distributed data are ______ and ______.
  • If data is normally distributed, what is the
    relationship between the mean, median, and mode?
  • What is one major drawback of the mean?
  • Give examples of numerical and categorical data.

15
Bonus for ALL 5 Executives ($000)
  • 14, 15, 17, 16, 15
  • Population mean: $\mu = \frac{\sum X}{N}$
  • X: value of an observation
  • N: number of observations in the population
  • $\sum X$: sum of the values of X

16
Mean Bonus for ALL 5 Executives
$\mu = \frac{\sum X}{N} = \frac{14 + 15 + 17 + 16 + 15}{5}$
17
Mean Bonus for ALL 5 Executives
$\mu = \frac{77}{5} = 15.4$
18
Bonus For a Sample of 5 Executives
  • Bonus in ($000): 14, 15, 17, 16, 15

19
Mean Bonus For a Sample of 5 Execs
Bonus in ($000): 14, 15, 17, 16, 15
$\bar{X} = \frac{\sum X}{n} = \frac{77}{5} = 15.4$
20
Is The Mean Representative of the Center of the
Data?
14, 15, 15, 16, 17
Mean = 15.4
21
How Sensitive is the Mean?
  • Suppose there were six executives in the sample
    with the following bonuses ($000):
  • 14, 15, 17, 16, 15, 43
  • What is the sample mean now?
  • Is the computed mean representative of the data?
    Why?

22
How Sensitive is the Mean?
23
Is The Mean Representative of the Center of the
Data?
14, 15, 15, 16, 17, 43
Mean = 20
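A minimal Python sketch of the comparison on the last few slides, using the bonus figures above and the standard library `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median

# Bonuses in thousands, copied from the slides.
bonuses = [14, 15, 17, 16, 15]
bonuses_with_outlier = bonuses + [43]

mean_before = mean(bonuses)                   # 77 / 5
mean_after = mean(bonuses_with_outlier)       # 120 / 6
median_before = median(bonuses)               # middle of 14, 15, 15, 16, 17
median_after = median(bonuses_with_outlier)   # average of 15 and 16

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

The single bonus of 43 moves the mean from 15.4 to 20, which is above five of the six observations, while the median only moves from 15 to 15.5.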
24
Median
  • Robust measure of central tendency
  • Not affected by extreme values
  • In an ordered array, the median is the middle
    number
  • If n or N is odd, the median is the middle number
  • If n or N is even, the median is the average of
    the two middle numbers

[Figure: two ordered arrays on number lines, one running from 0 to 10 and one running from 0 to 14; Median = 5 in both]
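The odd and even cases above can be sketched in a few lines of Python. The function name `median_of` and both data sets are hypothetical, chosen only to illustrate the rule.

```python
def median_of(values):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle numbers

odd_example = [7, 1, 5, 3, 9]          # hypothetical data, n = 5
even_example = [7, 1, 5, 3, 9, 100]    # same data plus one extreme value, n = 6

print(median_of(odd_example))    # ordered: 1 3 5 7 9, middle value 5
print(median_of(even_example))   # ordered: 1 3 5 7 9 100, average of 5 and 7
```

Adding the extreme value 100 shifts the median only from 5 to 6, which is the sense in which the median is robust.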
25
The Mode
  • A Measure of Central Tendency
  • Value that Occurs Most Often
  • Not Affected by Extreme Values
  • There May Not be a Mode
  • There May be Several Modes
  • Used for Either Numerical or Categorical Data

[Figure: two dot plots, one on a 0 to 6 axis labeled No Mode and one on a 0 to 14 axis labeled Mode = 9]
26
Years of Service Sample of 6 Employees
  • 16, 12, 8, 15, 8, 23
  • Compute the mean, median, and mode
  • Which of these measures is most representative of
    the data?
  • Why?

27
Years of Service Mean
28
Years of Service cont...
8, 8, 12, 15, 16, 23
Mean = 13.7
Median = 13.5
Mode = 8
Don't use the mode for ungrouped data. It is not
reliable: it is representative only by chance.
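As a check on these three values, here is a short Python sketch using the years-of-service data. The helper `modes` is a hypothetical name; it returns an empty list when no value repeats, and several values when there are ties, matching the "no mode" and "several modes" cases described on the mode slide.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

years = [16, 12, 8, 15, 8, 23]

print(round(mean(years), 1))   # 82 / 6, rounded to one decimal
print(median(years))           # average of 12 and 15
print(modes(years))            # 8 is the only repeated value
```

The mode of 8 is also the smallest value in the sample, which illustrates why it is a poor summary here.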
29
  • Show Doctor's Pay article

30
Review
  • What are three measures of central tendency?
    Define them.
  • Which measure is least useful? Why?
  • Which measure is most affected by outliers? An
    outlier is an extreme value.
  • When are measures of central tendency not
    adequate, by themselves, to describe the data?
  • When are all of the measures the same?

31
Percentiles
  • A percentile provides information about how the
    data are spread over the interval from the
    smallest value to the largest value.
  • Admission test scores for colleges and
    universities are frequently reported in terms of
    percentiles.

32
Percentiles
  • The pth percentile of a data set is a value such
    that at least p percent of the items take on this
    value or less and at least (100 - p) percent of
    the items take on this value or more.

33
Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth
percentile:
i = (p/100)n
If i is not an integer, round up. The pth
percentile is the value in the ith position.
If i is an integer, the pth percentile is the
average of the values in positions i and i + 1.
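The steps above can be sketched as a small Python function. The name `percentile_by_position` and the sample data are hypothetical, and the sketch assumes p is a whole number strictly between 0 and 100 so that the integer test can be done without floating-point rounding.

```python
def percentile_by_position(values, p):
    """pth percentile using the position rule i = (p/100) * n."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                 # arrange in ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # i = whole + remainder / 100
    if remainder != 0:
        # i is not an integer: round up to position whole + 1 (counting from 1)
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

data = list(range(1, 31))   # hypothetical sample: the 30 values 1 through 30

q1 = percentile_by_position(data, 25)   # i = 7.5, so the 8th value
q2 = percentile_by_position(data, 50)   # i = 15, so average the 15th and 16th values
q3 = percentile_by_position(data, 75)   # i = 22.5, so the 23rd value
```

Other textbooks and software use different position rules, so results from this sketch need not match a spreadsheet's built-in percentile function.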
34
80th Percentile
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
Note: Data are in ascending order.
35
80th Percentile
At least 20% of the items take on a value of
542 or more.
At least 80% of the items take on a value
of 542 or less.
56/70 = .8 or 80%
14/70 = .2 or 20%
36
80th Percentile
  • Using Excel's Percentile Function

The formula Excel uses to compute the location
(Lp) of the pth percentile is
$L_p = \frac{p}{100} n + \left(1 - \frac{p}{100}\right)$
Excel would compute the location of the 80th
percentile for the apartment rent data as
follows:
$L_{80} = \frac{80}{100}(70) + \left(1 - \frac{80}{100}\right) = 56 + .2 = 56.2$
The 80th percentile would be
$535 + .2(549 - 535) = 535 + 2.8 = 537.8$
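The location-then-interpolate arithmetic on this slide can be reproduced with a short Python sketch. Only the two rent values quoted above (535 and 549, the 56th and 57th ordered values) are used, because the full data set is not reproduced in this transcript; the function names are hypothetical.

```python
import math

def excel_location(p, n):
    """Location L_p = (p/100) * n + (1 - p/100), counted from position 1."""
    return (p / 100) * n + (1 - p / 100)

def interpolate(lower_value, upper_value, fraction):
    """Move `fraction` of the way from lower_value to upper_value."""
    return lower_value + fraction * (upper_value - lower_value)

location = excel_location(80, 70)        # about 56.2
position = math.floor(location)          # 56: the lower of the two neighboring positions
fraction = location - position           # about 0.2
p80 = interpolate(535, 549, fraction)    # about 537.8

print(round(location, 1), round(p80, 1))
```

This gives 537.8 rather than the 542 found by averaging the 56th and 57th values, which is why the two methods should not be mixed within one analysis.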
37
80th Percentile
80th percentile
  • Excel Formula Worksheet

Use the Insert Function
It is not necessary to put the data in
ascending order.
Note: Rows 7-71 are not shown.
38
Quartiles
  • Quartiles are specific percentiles.
  • First Quartile = 25th Percentile
  • Second Quartile = 50th Percentile = Median
  • Third Quartile = 75th Percentile

39
Third Quartile
  • Using Excel's Quartile Function

Excel computes the locations of the 1st, 2nd, and
3rd quartiles by first converting the quartiles
to percentiles and then using the following
formula to compute the location (Lp) of the pth
percentile:
$L_p = \frac{p}{100} n + \left(1 - \frac{p}{100}\right)$
Excel would compute the location of the 3rd
quartile (75th percentile) for the rent data as
follows:
$L_{75} = \frac{75}{100}(70) + \left(1 - \frac{75}{100}\right) = 52.5 + .25 = 52.75$
The 3rd quartile would be
$515 + .75(525 - 515) = 515 + 7.5 = 522.5$
40
Quartiles
[Figure: the ordered data divided by Q1, Q2, and Q3 into four parts, each holding 25% of the observations]
The position of the quartile is i = (p/100)n

41
A sample of 30 light trucks using diesel fuel
revealed these mileages (miles per gallon of fuel used).
42
Frequency Distribution
43
Compute the mean, median, mode, and quartiles for
the diesel truck fuel mileage.
  • Mean = 18.1 mpg
  • Median = 18.0 mpg
  • Mode = 17.0 mpg and 20 mpg
  • First Quartile = 16 mpg
  • Third Quartile = 20 mpg

i = (p/100)n
See the next slide.
44
i = (p/100)n
Position of the first quartile: i = (25/100)30 = 7.5.
Round up to the 8th value in the array. The 8th
value is the first quartile, or 16 mpg.
Position of the third quartile: i = (75/100)30 = 22.5.
Round up to the 23rd value in the array. The 23rd
value is the third quartile, or 20 mpg.
See page 96 for rules.
45
Interpretation of the Mean- truck mileage data
  • The average mileage was 18.1 mpg.

46
Interpretation of the Median- truck mileage data
  • Half the trucks got more than 18 miles per
    gallon, and half got less than that amount.

47
Interpretation of the Mode- truck mileage data
  • The mode is the value that appears most
    frequently, or you might use the midpoint of
    the modal class.
  • The modal class is the class with the highest
    frequency: the 16 to 18 mpg class.
  • There are two modes: 17 mpg and 20 mpg.
  • Doesn't it make sense to talk in terms of the
    modal class as opposed to the mode?

48
Interpretation of Quartiles-truck mileage data
  • 25% of the trucks got less than 16 mpg.
  • 25% of the trucks got more than 20 mpg.
  • 75% of the trucks got less than 20 mpg.
  • 50% of the trucks got between 16 and 20 mpg.

49
Shape of Truck Mileage Data
50
Shape of Truck Mileage Data
18
52
Measures of Variation
  • Range
  • Interquartile Range
  • Variance: Population Variance, Sample Variance
  • Standard Deviation: Population Standard Deviation,
    Sample Standard Deviation
  • Coefficient of Variation

53
Range
  • Measure of variation
  • Difference between the largest and the smallest
    observations
  • Ignores the way in which data are distributed

[Figure: two different data sets plotted on a 7 to 12 axis; each has Range = 12 - 7 = 5]
54
Interquartile Range
  • Difference between the third and first quartiles:
    IQR = Q3 - Q1
  • Spread in the middle 50% of the data
  • Not affected by extreme values

Data in Ordered Array: 11 12 13 16 16 17 17 18 21
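A rough Python sketch of the interquartile range for the ordered array above. It assumes the position rule i = (p/100)n from the percentile slides; the original slide may have used a different quartile convention, and other conventions can give different quartiles. The function name and the "stretched" data set are hypothetical.

```python
def quartile_by_position(ordered, p):
    """Quartile (as the pth percentile) of already-ordered data, using i = (p/100) * n."""
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        return ordered[whole]                          # round i up to the next position
    return (ordered[whole - 1] + ordered[whole]) / 2   # average positions i and i + 1

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

q1 = quartile_by_position(data, 25)   # i = 2.25, so the 3rd value
q3 = quartile_by_position(data, 75)   # i = 6.75, so the 7th value
iqr = q3 - q1

# Hypothetical variation: replace the largest value with an extreme one.
stretched = data[:-1] + [210]
stretched_iqr = quartile_by_position(stretched, 75) - quartile_by_position(stretched, 25)
stretched_range = max(stretched) - min(stretched)

print(q1, q3, iqr)
print(stretched_iqr, stretched_range)
```

Under this rule the quartiles are 13 and 17, so the IQR is 4. Replacing 21 with 210 leaves the IQR at 4 while the range jumps from 10 to 199.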
55
Variance
  • Important measure of variation
  • Shows variation about the mean
  • Sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
  • Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

56
Standard Deviation
  • Most important measure of variation
  • Shows variation about the mean
  • Has the same units as the original data
  • Sample standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
  • Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

57
Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
[Figure: dot plots of Data A, Data B, and Data C, each on an 11 to 21 axis]
58
Coefficient of Variation
  • Measures relative variation
  • Always expressed as a percentage (%)
  • Shows variation relative to the mean:
    $CV = \frac{s}{\bar{X}} \times 100\%$
  • Is used to compare two or more sets of data
    measured in different units, or sets of data
    with the same units and different means.

59
Comparing Coefficient of Variation
  • Stock A
  • Average price last year: $50
  • Standard deviation: $5
  • Stock B
  • Average price last year: $100
  • Standard deviation: $5
  • Coefficient of variation
  • Stock A: CV = (5/50)(100%) = 10%
  • Stock B: CV = (5/100)(100%) = 5%
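The stock comparison can be checked with a few lines of Python, assuming the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. The function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean_value

cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B

print(f"Stock A: {cv_a}%  Stock B: {cv_b}%")
```

Both stocks have the same standard deviation, but relative to its average price Stock A is twice as variable as Stock B.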

60
Classroom Exercises
  • A sample of five recent accounting graduates
    revealed the following starting salaries ($000).
  • 17, 26, 18, 20, 19
  • Compute the range, variance and standard
    deviation
  • Write a paragraph in which you describe the data
    by interpreting each of the statistics you have
    computed.

61
Range = Highest value - Lowest value
62
Range
  • Arrayed data
  • 17, 18, 19, 20, 26
  • Range = 26 - 17 = 9 thousand

63
Variance
64
Variance
65
Variance
66
Variance
67
Standard Deviation
  • The standard deviation is the square root of the
    variance
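A short Python sketch of the classroom exercise, using the starting salaries (in thousands) from the exercise slide and the standard library `statistics` module. Its `variance` and `stdev` functions use the sample (n - 1) formulas.

```python
from statistics import mean, stdev, variance

salaries = [17, 26, 18, 20, 19]   # starting salaries in thousands

salary_range = max(salaries) - min(salaries)   # 26 - 17
sample_mean = mean(salaries)                   # 100 / 5
sample_variance = variance(salaries)           # sum of squared deviations / (n - 1) = 50 / 4
sample_std_dev = stdev(salaries)               # square root of the sample variance

print(salary_range, sample_mean, sample_variance, round(sample_std_dev, 2))
```

The deviations from the mean of 20 are -3, 6, -2, 0, and -1, so the squared deviations sum to 50 and the sample variance is 50 / 4 = 12.5 (in thousands squared). The standard deviation is the square root of that, roughly 3.54 thousand.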

68
Description of the Data
  • The salaries of a sample of 5 recent accounting
    graduates varied from $17,000 to $26,000, a range
    of $9,000.

69
Description of the Data Continued...
  • The variance of 12.5 (in thousands of dollars,
    squared) is not useful in describing the data.
  • The standard deviation, roughly an average
    deviation from the mean, is about $3,540. This is
    not very useful to us YET!

70
Shape of a Distribution
  • Describes how data is distributed
  • Measures of shape
  • Symmetric or skewed

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
71
Chapter 3 Measures of Distribution Shape,
Relative Location, and Detecting Outliers
72
Skewness
  • Excel will compute a numerical value for
    skewness. You will have to develop a feeling for
    what numerical value indicates if a distribution
    is moderately or heavily skewed.
  • Excel's SKEW function can be used to compute the
    skewness of a data set.
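For readers working outside Excel, here is a hedged Python sketch of a skewness calculation. It assumes the adjusted sample formula n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, which is the formula Excel's documentation gives for SKEW; the function name and the example data are illustrative. It requires at least three values that are not all equal.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are required")
    x_bar = mean(values)
    s = stdev(values)   # sample standard deviation (n - 1 in the denominator)
    total = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total

symmetric = [1, 2, 3, 4, 5]                # hypothetical symmetric data
right_skewed = [14, 15, 17, 16, 15, 43]    # the six bonuses from the earlier slides
left_skewed = [-x for x in right_skewed]   # mirror image of the bonuses

print(round(sample_skewness(symmetric), 2))
print(round(sample_skewness(right_skewed), 2))
print(round(sample_skewness(left_skewed), 2))
```

The symmetric data give a skewness of essentially zero, the bonuses with the 43 outlier give a positive value, and the mirrored data give the same magnitude with a negative sign.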

73
Distribution Shape Skewness
  • Symmetric (not skewed)
  • Skewness is zero.
  • Mean and median are equal.

Skewness = 0
[Figure: relative frequency histogram of a symmetric distribution]
74
Distribution Shape Skewness
  • Moderately Skewed Left
  • Skewness is negative.
  • Mean will usually be less than the median.

Skewness = -.31
75
Distribution Shape Skewness
  • Moderately Skewed Right
  • Skewness is positive.
  • Mean will usually be more than the median.

Skewness = .31
76
Distribution Shape Skewness
  • Highly Skewed Right
  • Skewness is positive (often above 1.0).
  • Mean will usually be more than the median.

Skewness = 1.25
77
z-Scores
The z-score is often called the standardized
value.
It denotes the number of standard deviations a
data value $x_i$ is from the mean:
$z_i = \frac{x_i - \bar{x}}{s}$
78
z-Scores
  • An observation's z-score is a measure of the
    relative location of the observation in a data
    set.
  • A data value less than the sample mean will
    have a z-score less than zero.
  • A data value greater than the sample mean will
    have a z-score greater than zero.
  • A data value equal to the sample mean will
    have a z-score of zero.
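The three sign rules above can be seen in a small Python sketch. It assumes the z-score is the deviation from the sample mean divided by the sample standard deviation, and it reuses the starting-salary data from the classroom exercise; the function name is illustrative.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardized value (x - mean) / s for every observation."""
    x_bar = mean(values)
    s = stdev(values)   # sample standard deviation
    return [(x - x_bar) / s for x in values]

salaries = [17, 26, 18, 20, 19]   # in thousands; the sample mean is 20
scores = z_scores(salaries)

for x, z in zip(salaries, scores):
    print(x, round(z, 2))
```

The salary of 17 sits below the mean and gets a negative z-score, 26 sits above it and gets a positive one, and 20 equals the mean, so its z-score is exactly zero.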

79
Empirical or Normal Rule
  • For a symmetrical, bell-shaped frequency
    distribution, about 68%, 95%, and 99.7% of the
    observations will lie within plus and minus one,
    two, and three standard deviations of the mean,
    respectively.

80
Empirical Rule
[Figure: bell curve over x, with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, and μ + 3σ]
81
Students Grades Through Fall 93
ARRAYED DATA
82
Students Grades Through Fall 93
Mean = 79.3, Standard Dev. = 9.7
83
Empirical or Normal Rule
  • 79.3 ± 9.7
  • Between 69.6 and 89.0
  • ___% of grades
  • 79.3 ± 2(9.7)
  • Between 59.9 and 98.7
  • ___% of grades
  • 79.3 ± 3(9.7)
  • Between 50.2 and 108.4

84
Students Grades Through Fall 93
ARRAYED DATA
85
Empirical or Normal Rule
  • 79.3 ± 9.7
  • Between 69.6 and 89.0
  • % of grades: 106 out of 148, or about 72%
  • 79.3 ± 2(9.7)
  • Between 59.9 and 98.7
  • % of grades: 142 out of 148, or about 96%
  • 79.3 ± 3(9.7)
  • Between 50.2 and 108.4: 100% of the grades.
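The interval arithmetic and the comparison with the empirical rule can be laid out in a short Python sketch. The mean, standard deviation, and counts come from the slides; the count of 148 within three standard deviations is inferred from the statement that all grades fall in that interval, and the variable names are illustrative.

```python
mean_grade = 79.3
std_dev = 9.7
total_grades = 148

observed_within = {1: 106, 2: 142, 3: 148}    # grades inside mean ± k standard deviations
expected_pct = {1: 68.0, 2: 95.0, 3: 99.7}    # empirical rule percentages

rows = []
for k in (1, 2, 3):
    low = round(mean_grade - k * std_dev, 1)
    high = round(mean_grade + k * std_dev, 1)
    observed_pct = round(100 * observed_within[k] / total_grades, 1)
    rows.append((k, low, high, observed_pct, expected_pct[k]))

for k, low, high, observed_pct, expected in rows:
    print(f"±{k} s: {low} to {high}, observed {observed_pct}%, rule says about {expected}%")
```

The observed shares are close to, but not exactly, 68%, 95%, and 99.7%, which is what one would expect for real grades that are only roughly bell-shaped.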

86
Detecting Outliers
  • An outlier is an unusually small or unusually
    large value in a data set.
  • A data value with a z-score less than -3 or
    greater than 3 might be considered an outlier.
  • It might be
  • an incorrectly recorded data value
  • a data value that was incorrectly included in
    the data set
  • a correctly recorded data value that belongs in
    the data set
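A minimal Python sketch of the z-score screen described above. The data set is hypothetical (it is not from the slides), and the function name and the default threshold of 3 are illustrative choices. The function only flags candidates; deciding whether a flagged value is an error or a legitimate observation still takes judgment.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Values whose z-score is below -threshold or above +threshold."""
    x_bar = mean(values)
    s = stdev(values)   # sample standard deviation
    return [x for x in values if abs((x - x_bar) / s) > threshold]

# Hypothetical data: fifteen values near 50 and one value of 90.
data = [50, 51, 49, 52, 48, 50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 90]

print(flag_outliers(data))
```

Here the mean is 52.5 and the sample standard deviation is about 10.1, so 90 has a z-score of about 3.7 and is the only value flagged. In very small samples no value can reach a z-score of 3, so this screen is only informative with a reasonable number of observations.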

87
Summary of Chapter Topics
  • Measures of Central Tendency
  • Mean, Median, Mode
  • Quartile
  • Measures of Variation
  • The Range, Interquartile Range, Variance and
    Standard Deviation, Coefficient of Variation
  • Shape
  • Symmetric, Skewed

88
Summary of Chapter Topics cont.
  • Empirical rule
  • Pitfalls in numerical descriptive measures and
    ethical considerations