MATHSTAT 231 - PowerPoint PPT Presentation

About This Presentation
Title:

MATHSTAT 231

Description:

A graph, such as a histogram, gives us a picture of our data which can show us ... For symmetric and unimodal distributions, the mean, median and mode are approximately equal. ...

Slides: 35
Provided by: msu141

Transcript and Presenter's Notes

Title: MATHSTAT 231


1
MATH/STAT 231
  • Chapter 2-II
  • Univariate Data Description
  • Numerical Description

2
Introduction
  • A graph, such as a histogram, gives us a picture
    of our data that can show the shape of the
    distribution, the number of modes and their
    locations, and various other features. However,
    we also want quantitative information about
    the distribution. The most commonly desired
    quantities are a value describing the center of
    the data and a value describing how spread out
    the data are.
  • 1. Measures of central tendency
  • 2. Measures of Dispersion (variation)

3
(No Transcript)
4
Measures of central tendency
  • The three common measures of center are mean,
    median and mode.
  • Mean
  • The mean (arithmetic average) is the most
    common measure of central tendency. It is simply
    the sum of the values divided by the number of
    values.

5
  • Example: At a ski rental shop, data were collected
    on the number of rentals on each of 10
    consecutive Saturdays: 44, 50, 38, 96, 42, 47,
    40, 39, 46, 50. Find the mean.
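The ski rental mean can be checked with a short Python sketch. The variable names are illustrative only; the data are the ten Saturday counts from the slide.

```python
from statistics import mean

# Rentals on 10 consecutive Saturdays (from the slide).
rentals = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

# Mean = sum of the values divided by the number of values.
by_hand = sum(rentals) / len(rentals)

print(f"{by_hand:.1f}")        # 49.2
print(f"{mean(rentals):.1f}")  # 49.2 (same result from the standard library)
```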

6
  • Example: On his first 5 biology tests, Bob
    received the following scores: 72, 86, 92, 63,
    and 77. What test score must Bob earn on his
    sixth test so that his average (mean score) for
    all six tests will be 80?
  • (72 + 86 + 92 + 63 + 77 + x) / 6 = 80
  • Cross multiply: (80)(6) = 390 + x
  • 480 = 390 + x
  • x = 90
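The same rearrangement (target mean times the number of tests, minus the points already earned) can be sketched in Python. The function name `score_needed` is hypothetical and chosen only for this illustration.

```python
def score_needed(scores, target_mean):
    """Score required on the next test so the overall mean equals target_mean."""
    n_total = len(scores) + 1          # tests taken so far plus the next one
    return target_mean * n_total - sum(scores)

bob = [72, 86, 92, 63, 77]            # Bob's first five scores (from the slide)
print(score_needed(bob, 80))          # 90
```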

7
  • Median
  • The median is the value that occupies the
    middle position when the data are sorted from
    smallest to largest. Half the data values are
    above the median and half are below the median.
  • Find the median
  • 1. Order the observations from the smallest to
    the largest.
  • 2. Select the middle point.
  • If there is an odd number (N) of values, the
    median M is the (N + 1)/2-th value (the middle
    value). If there is an even number of values,
    the median is the average of the two middle
    values (add the two middle values and divide
    by 2).
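The two-step rule above can be written as a small Python function. This is a minimal sketch; the two short lists at the end are made-up data used only to exercise the odd and even cases.

```python
def median(values):
    """Median by the sort-then-pick-the-middle rule."""
    ordered = sorted(values)              # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2                          # 0-based index: middle (odd N) or upper middle (even N)
    if n % 2 == 1:
        return ordered[mid]               # odd N: the (N + 1)/2-th value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even N: average the two middle values

print(median([7, 1, 4]))       # 4
print(median([7, 1, 4, 10]))   # 5.5
```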

8
  • Example: Test scores for a class of 17 students
    are as follows: 93, 84, 97, 98, 100, 78, 86, 82,
    85, 92, 72, 55, 91, 90, 75, 94, 83. What is the
    median score?
  • Step 1: Sort the data from smallest to
    largest.
  • 55 72 75 78 82 83 84 85 86 90 91
    92 93 94 97 98 100
  • Step 2: Locate the middle position.
  • N = 17
  • Middle position = (17 + 1)/2 = 9
  • The median = 86 (the 9th number, the value at
    the middle position).

9
  • Example: At a ski rental shop, data were collected
    on the number of rentals on each of 10
    consecutive Saturdays: 44, 50, 38, 96, 42, 47,
    40, 39, 46, 50. Find the median.
  • Step 1: Sort the data from smallest to
    largest.
  • 38 39 40 42 44 46 47 50 50 96
  • Step 2: Locate the middle position.
  • N = 10
  • The middle position is between the 5th and
    6th values.
  • The median (average of the two middle numbers)
  • 38 39 40 42 44 | M | 46 47 50 50 96
  • M = (44 + 46)/2 = 45

10
  • Example The ages of the 667 people participating
    in a large workshop (to the nearest year) are
    summarized as follows
  • What is the median age of the 667 people?

11
  • Mode
  • The value that occurs most often in a dataset
    is called the mode. It is sometimes said to be
    the most typical case.
  • No mode: each value occurs only once.
  • Example: For individuals having the following
    ages: 18, 18, 19, 20, 20, 20, 21, and 23, the
    modal age is 20.
  • Example: Find the mode for the following data:
    5, 15, 10, 15, 5, 10, 10, 20, 25, 15.
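A rough Python sketch of finding the mode by counting how often each value occurs. The helper name `modes` is hypothetical. It returns every value tied for the highest count and, following the slide's rule, an empty list when each value occurs only once.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often (empty list if there is no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                      # each value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == top)

ages = [18, 18, 19, 20, 20, 20, 21, 23]
print(modes(ages))                     # [20]

data = [5, 15, 10, 15, 5, 10, 10, 20, 25, 15]
print(modes(data))                     # [10, 15]
```

In the second data set, 10 and 15 each occur three times, so the set has two modes.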

12
  • Try to find the mode based on graphs.

  • Data values of blood type: A B B AB O O O B AB B
    B B O A O A O O O AB AB A O B A
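The graph for this slide did not survive in the transcript, but the tally behind it can be rebuilt from the listed values. The sketch below counts each blood type and prints a crude text bar chart. It is only an illustration of reading the mode off the tallest bar, not the original figure.

```python
from collections import Counter

# The 25 blood-type observations listed on the slide.
blood = ("A B B AB O O O B AB B B B O A "
         "O A O O O AB AB A O B A").split()

counts = Counter(blood)
for blood_type, count in counts.most_common():
    print(f"{blood_type:>2} {'#' * count} {count}")

# The mode is the category with the tallest bar.
mode_type, mode_count = counts.most_common(1)[0]
print(mode_type, mode_count)   # O 9
```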
13
Comments
  • For symmetric distributions, the mean and median
    are approximately equal. For symmetric and unimodal
    distributions, the mean, median and mode are
    approximately equal.
  • In skewed distributions, the mean is farther out
    in the long tail than is the median. So, if the
    distribution is skewed to the right, then typically
    mean > median > mode. If the distribution is skewed
    to the left, then typically mean < median < mode.

14
  • The mean is sensitive to the influence of a few
    extreme observations. The median and mode are
    more resistant than the mean.
  • For nominal data (such as sex or race), the mode
    is the only valid measure.
  • For ordinal data (such as salary categories),
    only the mode and median can be used.
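The sensitivity of the mean to extreme observations can be seen with the ski rental data used earlier in these slides, where one Saturday (96 rentals) is far above the rest. This sketch compares the mean and median with and without that value. Dropping the value is done here only for illustration, not as a recommended way to handle outliers.

```python
from statistics import mean, median

rentals = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
without_extreme = [x for x in rentals if x != 96]   # drop the one extreme Saturday

print(f"{mean(rentals):.1f} {median(rentals):.1f}")                  # 49.2 45.0
print(f"{mean(without_extreme):.1f} {median(without_extreme):.1f}")  # 44.0 44.0
```

Removing a single observation moves the mean by about 5 rentals but the median by only 1.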

15
(No Transcript)
16
Measuring dispersion (variation)
  • Range
  • Standard deviation
  • Range
  • Range = Max - Min
  • Example: At a ski rental shop, data were collected
    on the number of rentals on each of 10
    consecutive Saturdays: 44, 50, 38, 96, 42, 47,
    40, 39, 46, 50. Find the range.
  • Range = 96 (the largest value) - 38 (the
    smallest value) = 58
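As a quick check, the range of the rental data in Python:

```python
rentals = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]

# Range = largest value minus smallest value.
data_range = max(rentals) - min(rentals)
print(data_range)   # 58
```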

17
  • Standard deviation
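  • Assume we have a dataset with n values
    x1, x2, ..., xn.
  • x̄ = (x1 + x2 + ... + xn) / n is the mean of the
    data set.
  • Formula for the standard deviation
  • Sample variance:
    s^2 = ((x1 - x̄)^2 + (x2 - x̄)^2 + ... + (xn - x̄)^2) / (n - 1)
  • or, in more compact notation:
    s^2 = Σ (xi - x̄)^2 / (n - 1)
  • Sample standard deviation: s = sqrt(s^2)
  • Variance and standard deviation
  • Variance = s x s = s^2
  • s = square root (Variance)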

18
  • Example: Suppose we wished to find the standard
    deviation of the data set consisting of the
    values 3, 7, 7, and 19.
  • Step 1: Find the mean (average) of 3, 7, 7, and 19:
    (3 + 7 + 7 + 19) / 4 = 9.
  • Step 2: Find the deviation of each number from
    the mean: 3 - 9 = -6, 7 - 9 = -2, 7 - 9 = -2,
    19 - 9 = 10.
  • Step 3: Square each of the deviations, which
    amplifies large deviations and makes negative
    values positive: (-6)^2 = 36, (-2)^2 = 4,
    (-2)^2 = 4, 10^2 = 100.
  • Step 4: Find the variance (sum of squared
    deviations / (n - 1)):
    s^2 = (36 + 4 + 4 + 100) / (4 - 1) = 48.
  • Step 5: Take the non-negative square root of the
    variance: s = sqrt(48) ≈ 6.93.
  • So, the standard deviation of the set is
    about 6.93.
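The five steps of this example translate almost line for line into Python. The sketch below follows them with the same data and then compares the result with `statistics.stdev`, which also uses the n - 1 divisor.

```python
from math import sqrt
from statistics import stdev

data = [3, 7, 7, 19]
n = len(data)

mean = sum(data) / n                       # step 1: 9.0
deviations = [x - mean for x in data]      # step 2: [-6.0, -2.0, -2.0, 10.0]
squared = [d ** 2 for d in deviations]     # step 3: [36.0, 4.0, 4.0, 100.0]
variance = sum(squared) / (n - 1)          # step 4: 48.0
s = sqrt(variance)                         # step 5

print(f"{s:.2f}")                          # 6.93
print(abs(s - stdev(data)) < 1e-12)        # True
```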
19
  • Another formula to calculate the variance
  • Example: Find the standard deviation
    of the data set consisting of the values 3, 7, 7,
    and 19.
  • Step 1: Find the mean: (3 + 7 + 7 + 19) / 4 = 9.
    Then V2 = n x mean^2 = 4 x 9^2 = 324.
  • Step 2: Square each of the data values and add
    all of them: V1 = 3^2 + 7^2 + 7^2 + 19^2 = 468.
  • Step 3: s^2 = (V1 - V2) / (4 - 1) = 48.
  • The standard deviation = sqrt(s^2) = sqrt(48)
    ≈ 6.93.
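A sketch of this shortcut formula in Python, keeping the slide's names V1 (sum of squared values) and V2 (n times the squared mean). The function name is hypothetical.

```python
from math import sqrt

def sample_variance_shortcut(values):
    """Sample variance via (V1 - V2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    v1 = sum(x * x for x in values)    # V1: sum of the squared data values
    v2 = n * mean ** 2                 # V2: n times the squared mean
    return (v1 - v2) / (n - 1)

data = [3, 7, 7, 19]
s2 = sample_variance_shortcut(data)
print(s2)                    # 48.0
print(f"{sqrt(s2):.2f}")     # 6.93
```

With floating-point data whose values are large compared with their spread, V1 and V2 can be nearly equal and the subtraction may lose precision, so the deviation-based formula is usually the safer choice in software.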
20
  • Comments
  • 1. A large standard deviation indicates that
    the data points are far from the center (Mean)
    and a small standard deviation indicates that
    they are clustered closely around the center
    (Mean).
  • 2. The standard deviation has the same units
    as the data points themselves.

21
  • Practical use of the standard deviation
  • If you repeat a measurement several times on
    the same object over the period of measurement,
    you may get a series of readings that differ from
    each other. The cause may be small differences in
    how you use the instrument each time. The
    differences could also be due to random changes
    in the instrument, and they could be due to small
    changes in the object you are measuring. Whatever
    the cause, you would be inclined to take the
    average of the readings as the best value you can
    quote or use. You can get an idea of the
    variability from the standard deviation of the
    readings. The bigger the standard deviation
    (variance) is, the less precise the readings are,
    and vice versa.

22
Measures of position
  • How do we compare two data values from different
    groups?
  • 1. Five students have taken different forms of
    the spelling test. The difficulty level of the
    different forms could be different. How do we
    compare their performance?
  • 2. A training program only accepts the top 25%
    of students according to a standard test. What
    grade does a student need to be in the top 25%?

23
  • Measures of position are used to determine the
    relative position of a specified data value
    within a group of data values.
  • Percentile
  • Quantile
  • z-score

24
Percentiles and quartiles
  • Percentiles divide the data values into 100 equal
    groups. The pth percentile of a distribution is
    the value such that p percent of the data values
    fall at or below it.
  • The median is the 50th percentile.
  • The first quartile Q1 (lower quartile)
    is the 25th percentile, the third quartile Q3
    (upper quartile) is the 75th percentile. The
    median is the second quartile. Quartiles divide
    the distribution into four equal groups,
    separated by Q1, Q2, and Q3.

25
  • To calculate the quartiles
  • 1. Arrange the data values in increasing
    order and locate the median.
  • 2. Use the median to divide the ordered data
    values into two halves. If the median is itself
    a data value, do not include it in either half.
  • 3. The lower quartile (Q1) is the median of
    the lower half of the data. The upper quartile
    (Q3) is the median of the upper half of the data.

26
  • Example: Suppose a group of 10 students have the
    following heights (in inches): 60, 72, 64, 67, 70,
    68, 71, 68, 73, 59. Find Q1, Q3 and IQR.
  • 1. Sort the data from smallest to largest:
  • 59, 60, 64, 67, 68, 68, 70, 71, 72,
    73.
  • 2. Divide the observations into a lower
    half and an upper half at the median: 59, 60, 64,
    67, 68 | M | 68, 70, 71, 72, 73.
  • 3. Q1 is the median of the first half of the
    data, and Q3 is the median of the second half.
  • First half: 59, 60, 64, 67, 68, so Q1 = 64
  • Second half: 68, 70, 71, 72, 73, so Q3 = 71
  • IQR = Q3 - Q1 = 71 - 64 = 7
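The median-of-halves rule used in this example can be sketched in Python as follows. The function names are hypothetical. The middle value is left out of both halves when the number of observations is odd, as the procedure describes.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return _median(lower), _median(ordered), _median(upper)

heights = [60, 72, 64, 67, 70, 68, 71, 68, 73, 59]
q1, med, q3 = quartiles(heights)
print(q1, med, q3)   # 64 68.0 71
print(q3 - q1)       # 7
```

Statistical software often uses other quartile conventions based on interpolation, so its Q1 and Q3 may differ slightly from the values found by this rule.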

27
  • Related Measures
  • Midquartile
  • midquartile = (Q1 + Q3)/2
  • The Interquartile Range
  • IQR = Q3 - Q1
  • The IQR is essentially the range of the middle
    50% of the data.

28
Standardized values or z-scores
  • If x is an observation (data value) from a
    population (a group of data) that has mean µ and
    standard deviation σ, the z-score (standardized
    value) of x is
    z = (x - µ) / σ
  • The z-score for a data value indicates how far,
    and in what direction, that value deviates from
    the mean, expressed in units of the standard
    deviation.

29
  • Example
  • 1. Find the z-score corresponding to a raw
    score of 132 from a group of data with mean 100
    and standard deviation 15.
  • 2. The lengths of an adult South American rain
    forest beetle species are distributed with mean
    5.6 cm and standard deviation 0.32 cm. What is
    the z-score for a beetle of length 5.1 cm?
  • 3. A z-score of 1.7 was found from an
    observation coming from a population with mean 14
    and standard deviation 3. Find the value of the
    observation.
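The three exercises can be worked with two small helper functions, assuming the usual definition z = (x - mean) / (standard deviation). The function names are hypothetical and chosen only for this sketch.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, in units of the standard deviation."""
    return (x - mean) / sd

def from_z(z, mean, sd):
    """Invert the z-score: recover the observation from z."""
    return mean + z * sd

print(f"{z_score(132, 100, 15):.2f}")    # about 2.13: well above the mean
print(f"{z_score(5.1, 5.6, 0.32):.2f}")  # about -1.56: below the mean
print(f"{from_z(1.7, 14, 3):.1f}")       # 19.1
```

A positive z-score places the value above the mean and a negative one places it below.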

30
  • The z-score transformation is especially useful
    when seeking to compare the relative standings of
    items from distributions with different means
    and/or different standard deviations.
  • Five students have taken different forms of the
    spelling test. The scores of different forms are
    distributed with different mean and standard
    deviation. How to compare their performance?

31
The five-number summary and Boxplot
  • Five number summary
  • Minimum, Q1, Median, Q3, Maximum.
  • These five numbers offer a reasonably
    complete description of the distribution.
  • Boxplot (Box and whiskers display)
  • Visual version of five number summary. The
    five number summary is easier to understand when
    it is displayed in a graph.

32
  • Here are the steps you follow to draw a
    boxplot
  • 1. Draw a box from the 25th (Q1) to the
    75th (Q3) percentile.
  • 2. Split the box with a line at the median.
  • 3. Draw a thin line from the 75th
    percentile up to the maximum value. Draw another
    thin line from the 25th percentile down to the
    minimum value.
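Before drawing, the five numbers that a boxplot displays have to be computed. The sketch below does this for the ski rental data from the earlier slides, using the median-of-halves rule for Q1 and Q3. The function names are hypothetical, and no plotting library is assumed.

```python
def _median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1]

rentals = [44, 50, 38, 96, 42, 47, 40, 39, 46, 50]
print(five_number_summary(rentals))   # (38, 40, 45.0, 50, 96)
```

Here the box would run from 40 to 50 with a line at 45. The upper whisker, reaching 96, would be much longer than the lower one, reaching 38, which suggests a distribution skewed to the right.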

33
Information obtained from a Boxplot
__________
_________
___________
Symmetry versus Skewness
34
  • Example The following boxplot is of the birth
    weights (in ounces) of a sample of 160 infants
    born in a local hospital.

  • 1. Describe the shape of the distribution.
  • 2. Find the five number summary.
  • 3. About 40 of the infants had birth weights
    below ____?
  • 4. IQR = ?