A Review of Descriptive Statistics - PowerPoint PPT Presentation

About This Presentation
Slides: 32
Provided by: SteveK221

Transcript and Presenter's Notes

Title: A Review of Descriptive Statistics


1
Introduction
  • A Review of Descriptive Statistics

2
Charts
  • When dealing with a larger set of data values,
    it may be clearer to summarize the data by
    presenting a graphical image

3
Intervals
  • Numerical data values may be grouped or
    classified by defining class intervals

Suppose the following data values represent the
ACT test scores for 30 individuals:
8, 10, 11, 13, 13, 14, 14, 15, 15, 16,
16, 17, 17, 18, 18, 18, 18, 19, 20, 20,
21, 21, 21, 22, 22, 23, 25, 26, 28, 30
Define intervals so that each of the values
falls into exactly one of the intervals.
4
Frequency
  • Determine how many data scores fall in each of
    the intervals (the "frequency").
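The counting step can be sketched in a few lines of Python. The slides do not say which class intervals were used, so the width of 5 and the starting point of 5 below are assumptions chosen only for illustration; `interval_of` is a hypothetical helper name.

```python
from collections import Counter

scores = [8, 10, 11, 13, 13, 14, 14, 15, 15, 16,
          16, 17, 17, 18, 18, 18, 18, 19, 20, 20,
          21, 21, 21, 22, 22, 23, 25, 26, 28, 30]

WIDTH = 5   # assumed class width; the slides do not specify one
START = 5   # assumed lower bound of the first interval

def interval_of(score):
    """Return the inclusive (low, high) class interval containing an integer score."""
    low = START + ((score - START) // WIDTH) * WIDTH
    return (low, low + WIDTH - 1)

# Frequency = number of scores that land in each interval
frequency = Counter(interval_of(s) for s in scores)

for (low, high), count in sorted(frequency.items()):
    print(f"{low:2d}-{high:2d}: {count}")
```

Because every score maps to exactly one interval, the frequencies add up to the 30 data values.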

5
Histogram
Draw a bar chart (or "histogram") with the height
of the bar on each interval determined by the
frequency
6
Relative Frequency
  • Alternatively, give the percentage of scores or
    "relative frequency".
  • That is, if 5 of the 30 values fall in the
    interval, then the relative frequency is
    5/30 ≈ 0.1667.
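As a rough sketch, relative frequencies for the 30 ACT scores can be computed by dividing each count by the number of values. The interval boundaries here are assumed for illustration and are not taken from the slides.

```python
scores = [8, 10, 11, 13, 13, 14, 14, 15, 15, 16,
          16, 17, 17, 18, 18, 18, 18, 19, 20, 20,
          21, 21, 21, 22, 22, 23, 25, 26, 28, 30]

# Assumed class intervals (inclusive bounds); the slides do not list them
intervals = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]

n = len(scores)
relative = {}
for low, high in intervals:
    count = sum(1 for s in scores if low <= s <= high)
    relative[(low, high)] = count / n   # share of all scores in this interval

for (low, high), share in relative.items():
    print(f"{low:2d}-{high:2d}: {share:.4f}")
```

The relative frequencies always sum to 1, which is why the relative-frequency histogram has the same shape as the frequency histogram.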

7
Relative to each other, the bars are the same
height and the histograms have the same shape.
8
Cumulative Frequency
  • Or we could keep a running total, called a
    cumulative frequency, as we go from one
    interval to the next.
  • If there are 2 values in the first interval and 5
    in the next, then the cumulative frequency is
    2 + 5 = 7 for the second interval.
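A running total like this is what `itertools.accumulate` produces. The sketch below applies it first to the two-interval example above and then to frequencies for the ACT scores under assumed intervals 5-9, 10-14, ..., 30-34 (an illustrative choice, not the slides' own).

```python
from itertools import accumulate

# The example from the slide: 2 values, then 5 more
example = list(accumulate([2, 5]))

# Frequencies of the 30 ACT scores under the assumed intervals
frequencies = [1, 6, 11, 8, 3, 1]
cumulative = list(accumulate(frequencies))

print(example)
print(cumulative)
```

The last cumulative frequency always equals the total number of data values, here 30.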

9
Cumulative Graph
  • The increase in the height of the bar shows how
    many data values were contributed by a given
    interval.

10
The Middle
  • In addition to the graphical summary, we can
    also give numerical measurements which describe
    the distribution of the data

The middle?
11
Set of Heights
  • The height (in inches) of 30 third graders:
    47.5 48.5 50 52 52 53 53 54 54 54
    54.5 54.5 55 55 55 55.5 55.5 55.5 56 56
    56 56.5 56.5 57 57 57 57 57.5 58 58
  • How should we describe the "middle height"?
  • For numerical data, we commonly compute the
    "arithmetic average" of the values, also called
    the mean value.

12
The Mean Value
  • To compute the average, find the sum of the
    values and divide by the number of values in
    the set.
  • For our 30 third-graders, we find the sum of the
    30 heights and then divide by 30

Compare this to the middle of the histogram.
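A minimal sketch of this computation for the 30 heights, using only built-in functions:

```python
heights = [47.5, 48.5, 50, 52, 52, 53, 53, 54, 54, 54,
           54.5, 54.5, 55, 55, 55, 55.5, 55.5, 55.5, 56, 56,
           56, 56.5, 56.5, 57, 57, 57, 57, 57.5, 58, 58]

total = sum(heights)          # sum of the 30 heights
mean = total / len(heights)   # divide by the number of values

print(total, round(mean, 2))
```

The sum of the heights is 1641 inches, so the mean is 1641 / 30 = 54.7 inches.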
13
The Middle Weight
  • Looks to be in the middle!

Mean = 54.7 inches
14
Sampling a Population
  • We distinguish between a sample and the entire
    population.
  • A population consists of all the members of the
    set under consideration (e.g., all third-graders
    in the United States)
  • A sample consists of a subset of members selected
    from a population (e.g., the 30 third-graders in
    our example)

15
Notation
  • The notation used depends on whether we're using
    the entire population or a sample.

16
Median Value
  • The median value is literally defined to be the
    middle data value. You may need to "split the
    difference" by averaging two middle values.
  • Half the data lies at or below the median and the
    other half lies at or above the median.
  • Median is another measure of the middle but is
    less affected by non-typical data values.

17
Median third-grader?
  • Consider our previous data for 30 third-graders:
    47.5 48.5 50 52 52 53 53 54 54 54
    54.5 54.5 55 55 55 55.5 55.5 55.5 56 56
    56 56.5 56.5 57 57 57 57 57.5 58 58
  • An even number of data values, so we average the
    two middle values.
  • The median is (55 + 55.5)/2 = 55.25 inches.
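The rule of taking the middle value, or averaging the two middle values when the count is even, can be sketched as follows. The function name `median` is just an illustrative choice; Python's standard `statistics.median` applies the same rule and is used here as a cross-check.

```python
import statistics

heights = [47.5, 48.5, 50, 52, 52, 53, 53, 54, 54, 54,
           54.5, 54.5, 55, 55, 55, 55.5, 55.5, 55.5, 56, 56,
           56, 56.5, 56.5, 57, 57, 57, 57, 57.5, 58, 58]

def median(values):
    """Middle value of the sorted data; average the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median(heights))
print(statistics.median(heights))
```

With 30 values, the two middle values are the 15th and 16th in sorted order (55 and 55.5).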

18
Mean vs. Median
  • In smaller samples, the median value is often a
    better measure: it is less affected by a
    non-typical score and is more representative of
    the middle.
  • Suppose test scores were 23, 58, 64, 68, 75,
    79, 83, 85, 87, 91, 94
  • Median is 79
  • Mean equals about 73.36
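The comparison can be reproduced with Python's standard `statistics` module. As an extra illustration that is not part of the slides, the sketch also recomputes the mean with the non-typical score of 23 left out, to show how much that one value pulls the mean down.

```python
import statistics

scores = [23, 58, 64, 68, 75, 79, 83, 85, 87, 91, 94]

mean_all = statistics.mean(scores)
median_all = statistics.median(scores)

# Illustration only: drop the non-typical score and recompute the mean
typical = [s for s in scores if s != 23]
mean_typical = statistics.mean(typical)

print(round(mean_all, 2), median_all, round(mean_typical, 2))
```

The median of the full set (79) sits much closer to the mean of the typical scores (78.4) than the full mean (about 73.36) does.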

19
The Spread
  • Another characteristic of a data set is how
    widely the data values are spread.
  • Find a way to measure how widely the values vary.
  • The measurement we use is called the "standard
    deviation".

20
The Deviations
  • Having determined the mean value, we can measure
    how far each data value varies from the
    middle.
  • The difference, or "deviation" from the middle, is
    computed as (data value) - (mean value).
  • Our goal is to compute a sort of average of these
    deviations from the middle.

21
16 ounce drink
  • Suppose a sample of 8 medium colas was measured.
    The volumes, measured in ounces, are given by
    the data below.
    16.2 16.5 15.9 15.7 15.9 16.1 16.3 15.8

22
Deviations in Colas
  • Recall the contents of our 8 colas, where the mean
    value is 16.05 ounces.

    data value    deviation from middle
    15.7          15.7 - 16.05 = -0.35
    15.8          15.8 - 16.05 = -0.25
    15.9          15.9 - 16.05 = -0.15
    15.9          15.9 - 16.05 = -0.15
    16.1          16.1 - 16.05 =  0.05
    16.2          16.2 - 16.05 =  0.15
    16.3          16.3 - 16.05 =  0.25
    16.5          16.5 - 16.05 =  0.45
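The deviations for the 8 colas can be recomputed with a short sketch. Values are rounded to two decimals only to hide floating-point noise.

```python
volumes = [16.2, 16.5, 15.9, 15.7, 15.9, 16.1, 16.3, 15.8]

mean = sum(volumes) / len(volumes)

# Deviation = data value minus mean, listed from smallest to largest volume
deviations = [round(v - mean, 2) for v in sorted(volumes)]

print(round(mean, 2))
print(deviations)
```

Note that the negative deviations total -0.90 and the positive ones total 0.90, so a plain average of the deviations would come out to zero no matter how spread out the data are.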
23
Squared Deviations
  • To prevent the negative and positive values from
    cancelling each other out, we square them.

    data value    deviation from middle    deviation squared
    15.7          15.7 - 16.05 = -0.35     (-0.35)^2 = 0.1225
    15.8          15.8 - 16.05 = -0.25     (-0.25)^2 = 0.0625
    15.9          15.9 - 16.05 = -0.15     (-0.15)^2 = 0.0225
    15.9          15.9 - 16.05 = -0.15     (-0.15)^2 = 0.0225
    16.1          16.1 - 16.05 =  0.05     ( 0.05)^2 = 0.0025
    16.2          16.2 - 16.05 =  0.15     ( 0.15)^2 = 0.0225
    16.3          16.3 - 16.05 =  0.25     ( 0.25)^2 = 0.0625
    16.5          16.5 - 16.05 =  0.45     ( 0.45)^2 = 0.2025

24
Avg. of Squared Deviations
  • To average the deviations, add the squared
    deviations and divide by one less than the
    number of data values in the sample.
  • Finally, we "undo the squaring" by computing the
    square root.

25
    data value    deviation squared
    15.7          0.1225
    15.8          0.0625
    15.9          0.0225
    15.9          0.0225
    16.1          0.0025
    16.2          0.0225
    16.3          0.0625
    16.5          0.2025
    total         0.5200   (sum of squared deviations)

26
Average Spread
s = sqrt(0.5200 / 7) ≈ 0.2726 ounces is a sort of
average of how far the data values vary from the
middle
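The steps for the cola sample (square the deviations, add them, divide by one less than the number of values, take the square root) can be sketched as below. The standard library's `statistics.stdev`, which computes the sample standard deviation, is included as a cross-check.

```python
import math
import statistics

volumes = [16.2, 16.5, 15.9, 15.7, 15.9, 16.1, 16.3, 15.8]

n = len(volumes)
mean = sum(volumes) / n

# Sum of squared deviations from the mean
sum_sq = sum((v - mean) ** 2 for v in volumes)

# Divide by one less than the number of values, then undo the squaring
variance = sum_sq / (n - 1)
s = math.sqrt(variance)

print(round(sum_sq, 4), round(s, 4))
print(round(statistics.stdev(volumes), 4))
```

Both routes give a standard deviation of about 0.2726 ounces.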
27
Notation
  • As with the mean value, notation depends on
    whether the data represents the population or a
    sample.

28
Compare
  • The standard deviation describes the
    distribution of the data.
  • Which of the following distributions would you
    expect to have the larger standard deviation?

29
Match the statistics with the histograms
30
Bell-shaped Distribution
  • For reasonably large random samples, we often
    observe a "bell-shaped" distribution.
  • In such cases, we expect to find about 68% of the
    data within one std. dev. of the mean.

Also, about 95% of the data is expected to lie
within 2 standard deviations of the mean.
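These percentages can be checked against an idealized bell curve. The sketch below assumes the data follow an exact normal distribution, which real samples only approximate, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2):
    print(f"within {k} std. dev.: {share_within(k):.4f}")
```

For the ideal curve the shares are roughly 0.68 and 0.95, in line with the figures quoted above.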
31
Empirical Rule