Chapter 2 Summarizing and Graphing Data - PowerPoint PPT Presentation

Description:

Title: Introduction to Statistics Author: CA Last modified by: Columbus State University Created Date: 1/19/2003 2:23:17 PM Document presentation format – PowerPoint PPT presentation

1
Chapter 2 Summarizing and Graphing Data
  • Overview
  • Frequency Distributions
  • Statistical Graphics (stemplots, dotplots, etc)
  • Histograms

2
Frequency Distributions
  • The amount of data collected in some real-world
    situations can be overwhelming.
  • By suitably organizing data, we can often make a
    large and complicated set of data more compact
    and easier to understand.
  • Here, we discuss frequency distributions, which
    involve putting data into groups rather than
    treating each observation individually.

3
Grouping Quantitative Data
  • Days to Maturity for Short-Term Investments: The
    table below displays the number of days to
    maturity for 40 short-term investments.

4
Grouping Quantitative Data
  • Getting a clear picture of the data in the table
    is difficult.

5
Grouping Quantitative Data
  • By grouping the data into categories, or classes,
    we can make the data easier to comprehend.

6
Grouping Quantitative Data
  • The first step is to decide on the classes.
  • A convenient way to group these data is by 10s.

7
Grouping Quantitative Data
  • Since the shortest maturity period is 36 days,
    our first class is for maturity periods from 30
    days up to, but not including, 40 days.

8
Grouping Quantitative Data
  • The longest maturity period is 99 days, so
    grouping by 10s results in seven classes.

9
Grouping Quantitative Data
  • The final step for grouping the data is to find
    the number of investments in each class.
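
As a rough sketch of this counting step in Python: the table of 40 maturities is not reproduced in this transcript, so the `days` list below is a hypothetical stand-in, and `group_by_tens` is an illustrative helper name, not something from the presentation.

```python
from collections import Counter

def group_by_tens(values):
    """Return {lower limit: frequency} for classes of width 10."""
    # Integer division sends 36 -> 30 and 99 -> 90, so each observation
    # lands in exactly one class ("30 up to, but not including, 40").
    counts = Counter((v // 10) * 10 for v in values)
    # Walk from the lowest to the highest class so empty classes appear too.
    return {lo: counts.get(lo, 0)
            for lo in range(min(counts), max(counts) + 10, 10)}

# Hypothetical maturities in days (assumption, not the slide's data).
days = [36, 38, 47, 51, 55, 56, 62, 64, 64, 70, 71, 78, 85, 89, 99]

table = group_by_tens(days)
for lo, freq in table.items():
    print(f"{lo} to under {lo + 10}: {freq}")
```

With a shortest value of 36 and a longest of 99, this yields the same seven classes described above, whatever the individual counts turn out to be.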

12
Grouping Quantitative Data
  • In the previous example, we used a commonsense
    approach to grouping data into classes. Some of
    that common sense can be used as guidelines for
    grouping.
  • Three of the most important guidelines are the
    following.

13
Grouping Quantitative Data
  • The number of classes should be small enough to
    provide an effective summary but large enough to
    display the relevant characteristics of the data
    (in general, between 5 and 20).
  • Each observation must belong to one, and only
    one, class.
  • Whenever feasible, all classes should have the
    same width.

14
Frequency Distributions
  • The number of observations that fall into a
    particular class is called the frequency or count
    of that class.
  • A table that provides all classes and their
    frequencies is called a frequency distribution.

15
Frequency Distributions
  • The frequency distribution in this example is

16
Relative-Frequency Distributions
  • In addition to the frequency of a class, we
    are often interested in the percentage of a class.

17
Relative-Frequency Distributions
The relative frequency is the proportion (or percent) of
observations within a category and is found using
the formula
relative frequency = frequency / (sum of all frequencies).
A relative frequency distribution lists the relative
frequency of each category of data.
18
Terms Used in Grouping
  • Lower class limits are the smallest numbers that
    can actually belong to different classes.
  • Upper class limits are the largest numbers that
    can actually belong to different classes.
  • Class boundaries are the numbers used to
    separate classes, but without the gaps created by
    class limits.
  • Class midpoints are the midpoints of the classes
    and can be found by adding the lower class limit
    to the upper class limit and dividing the sum by
    two.
  • Class width is the difference between two
    consecutive lower class limits.
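
These definitions can be checked with a few lines of Python. This is only a sketch: it assumes integer-valued data grouped into the classes 30-39, 40-49, ..., 90-99, and the variable names are illustrative.

```python
# Assumed classes for integer-valued data: 30-39, 40-49, ..., 90-99.
lower_limits = list(range(30, 100, 10))
upper_limits = [lo + 9 for lo in lower_limits]

# Class width: difference between two consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoints: (lower class limit + upper class limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between one class's upper limit and
# the next class's lower limit, and extend half a gap past each end.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print("width:", class_width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```

Note that seven classes give eight boundaries, and that consecutive boundaries are also one class width apart.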

19
Reasons for Constructing Frequency Distributions
  • Large data sets can be summarized.
  • We can gain some insight into the nature of data.
  • We have a basis for constructing important graphs.

20
Another Example of Frequency Distribution
Illustrating these Concepts
21
Frequency Distribution: Ages of Best Actresses
Frequency Distribution
Original Data
22
Lower Class Limits
23
Upper Class Limits
24
Class Boundaries
25
Class Midpoints
26
Class Width
27
Relative Frequency Distribution
28/76 ≈ 37%, 30/76 ≈ 39%, etc.
Total Frequency 76
28
Cumulative Frequency Distribution
Cumulative Frequencies
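
A cumulative frequency is a running total of the class frequencies. The sketch below uses `itertools.accumulate` from the Python standard library; the `frequencies` list is hypothetical, since the table for this slide is not included in the transcript.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class upward.
frequencies = [2, 5, 8, 4, 1]

# Each entry is the count of observations in that class or any lower class.
cumulative = list(accumulate(frequencies))

for freq, cum in zip(frequencies, cumulative):
    print(f"frequency {freq:>2}   cumulative {cum:>2}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.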
29
Frequency Tables
30
Single-Value Grouping
If the data are discrete, the categories of data
will be the observed values themselves (when the
number of distinct values is relatively small).
Consider the following example: The following
data represent the number of available cars in a
household based on a random sample of 50
households. Construct a frequency and relative
frequency distribution.
31
Single-Value Grouping
3 0 1 2 1 1 1 2 0 2 4 2 2 2 1 2 2 0 2 4 1 1 3 2 4
1 2 1 2 2 3 3 2 1 2 2 0 3 2 2 2 3 2 1 2 2 1 1 3 5
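
One way to build both distributions for these 50 observations is sketched below, using `collections.Counter` from the Python standard library. The variable names are illustrative; the data are the values listed on this slide.

```python
from collections import Counter

# Number of available cars in each of the 50 sampled households.
cars = [3, 0, 1, 2, 1, 1, 1, 2, 0, 2, 4, 2, 2, 2, 1, 2, 2, 0, 2, 4, 1, 1, 3, 2, 4,
        1, 2, 1, 2, 2, 3, 3, 2, 1, 2, 2, 0, 3, 2, 2, 2, 3, 2, 1, 2, 2, 1, 1, 3, 5]

freq = Counter(cars)   # frequency of each distinct value
n = len(cars)          # sum of all frequencies

# relative frequency = frequency / (sum of all frequencies)
rel_freq = {value: freq[value] / n for value in sorted(freq)}

print("cars  frequency  relative frequency")
for value, rel in rel_freq.items():
    print(f"{value:>4}  {freq[value]:>9}  {rel:>18.2f}")
```

Because each distinct value is its own class, no class limits or midpoints are needed here.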
32
Grouping Qualitative Data
  • The concepts of class limits and midpoints are
    not appropriate for qualitative data.
  • For instance, if we have data that categorize
    people as male or female, then the classes are
    male and female.
  • We can still group qualitative data and compute
    frequencies and relative frequencies for classes.
  • For qualitative data, the classes coincide with
    the observed values of the corresponding variable.

33
Grouping Qualitative Data
Example: The data on the next slide represent the
color of M&Ms in a bag of plain M&Ms. Construct
a frequency distribution of the color of plain
M&Ms.
34
Yellow Orange Brown Green Green
Blue Brown Red Brown Brown
Orange Brown Red Brown Red
Green Brown Red Green Yellow
Yellow Red Red Brown Orange
Yellow Orange Red Orange Blue
Brown Red Yellow Brown Red
Brown Yellow Yellow Blue Yellow
Yellow Brown Yellow Green Orange
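
For qualitative data the classes are simply the observed categories, so the tally can be sketched the same way as for single-value grouping. The snippet below is an illustration with assumed variable names, applied to the 45 colors listed on this slide.

```python
from collections import Counter

# Colors exactly as listed on the slide, row by row.
colors = """
Yellow Orange Brown Green Green
Blue Brown Red Brown Brown
Orange Brown Red Brown Red
Green Brown Red Green Yellow
Yellow Red Red Brown Orange
Yellow Orange Red Orange Blue
Brown Red Yellow Brown Red
Brown Yellow Yellow Blue Yellow
Yellow Brown Yellow Green Orange
""".split()

color_freq = Counter(colors)
total = len(colors)

# List the classes from most to least frequent.
for color, count in color_freq.most_common():
    print(f"{color:<7} {count:>3}  {count / total:.3f}")
```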
36
Frequency Tables
37
Distributions of Data Sets
The distribution of a data set is a table, graph,
or formula that provides the values of the
observations and how often they occur.
38
Displaying Distributions
Qualitative Variable: Frequency Table, Pie Chart, Bar Graph
Quantitative Variable: Frequency Table, Stem-and-Leaf Plot, Histogram, Dot Plot
39
Displaying Distributions of Qualitative Data
40
  • Example: How well educated are 30-something young
    adults? Here is the distribution of the highest
    level of education for people aged 25 to 34 years.

41
Bar Graph
42
  • Label each category of data on a horizontal axis
    and the frequency or relative frequency of the
    category on the vertical axis.

43
  • A rectangle of equal width is drawn for each
    category whose height is equal to the category's
    frequency or relative frequency.

44
  • The bar graph quickly compares the sizes of the
    five education groups. The heights of the bars
    show the counts in the five categories.
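
The idea of a bar graph can be sketched in plain text. The education table is not included in this transcript, so the example below instead uses the color counts tallied from the M&M data listed earlier in the presentation; `text_bar_graph` is a hypothetical helper name. The bars are drawn sideways for convenience, so the categories run down the page rather than along a horizontal axis.

```python
def text_bar_graph(counts):
    """Return one line per category; bar length equals the frequency."""
    width = max(len(label) for label in counts)
    lines = []
    for label, freq in counts.items():
        # Every bar uses the same character, so only its length varies.
        lines.append(f"{label:<{width}} | {'#' * freq} {freq}")
    return lines

# Frequencies tallied from the M&M color data.
color_counts = {"Brown": 12, "Yellow": 10, "Red": 9,
                "Orange": 6, "Green": 5, "Blue": 3}

bars = text_bar_graph(color_counts)
print("\n".join(bars))
```

Replacing each frequency by its relative frequency changes the scale of the bars but not their relative lengths.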

45
Pie Chart
46
  • The pie chart helps us see what part of the whole
    each group forms. For example, the HS grad
    slice makes up 30.7% of the pie because 30.7% of
    young adults have only a high school education.

47
Displaying Distributions of Quantitative Data
48
Stemplots
  • A stemplot (also called a stem-and-leaf plot)
    gives a quick picture of the shape of a
    distribution while including the actual numerical
    values in the graph.
  • Stemplots work best for small numbers of
    observations that are all greater than 0.

49
How to Make a Stemplot
  1. Separate each observation into a stem consisting
    of all but the final (rightmost) digit and a
    leaf, the final digit. Stems may have as many
    digits as needed, but each leaf contains only a
    single digit.
  2. Write the stems in a vertical column with the
    smallest at the top, and draw a vertical line at
    the right of this column.
  3. Write each leaf in the row to the right of its
    stem, in increasing order out from the stem.

50
  • Here are the numbers of home runs that Babe Ruth
    hit in each of his 15 years with the New York
    Yankees, 1920 to 1934.
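
The three stemplot steps can be sketched in Python as follows. The slide's own table is not included in this transcript; the list below uses Ruth's season totals for 1920 to 1934 as they are commonly published, which agree with the 15 seasons and the range of 22 to 60 mentioned in this presentation. The function name `stemplot` is illustrative, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    # Step 1: split each observation into a stem and a one-digit leaf.
    # Sorting first means the leaves arrive in increasing order (step 3).
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    # Step 2: list the stems from smallest to largest, with a vertical bar.
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}")
    return rows

# Home runs per season, 1920 to 1934 (commonly published totals).
ruth = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

ruth_plot = stemplot(ruth)
print("\n".join(ruth_plot))
```

Stems with no leaves are still printed, so gaps in the data remain visible in the plot.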

51
  • Stemplots do not work well for large data sets,
    where each stem must hold a large number of
    leaves.

Possible Modifications
  • We can increase the number of stems in a plot by
    splitting each stem into two: one with leaves 0
    to 4 and the other with leaves 5 to 9.
  • When the observed values have many digits, it is
    often best to round the numbers to just a few
    digits before making a stemplot.
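
Both modifications can be combined in a short sketch: round first, then split each stem in two. The `amounts` list is hypothetical and `split_stemplot` is an illustrative name; the code assumes non-negative values.

```python
def split_stemplot(values):
    """Stemplot with each stem split in two: leaves 0-4, then leaves 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # The second element of the key is True for the upper half (5-9).
        rows.setdefault((stem, leaf >= 5), []).append(leaf)
    lines = []
    stem, upper = min(rows)
    last = max(rows)
    while (stem, upper) <= last:
        leaves = "".join(str(leaf) for leaf in rows.get((stem, upper), []))
        lines.append(f"{stem} | {leaves}")
        # Move from the lower half to the upper half, then to the next stem.
        stem, upper = (stem, True) if not upper else (stem + 1, False)
    return lines

# Hypothetical purchase amounts in dollars (assumption, for illustration).
amounts = [3.40, 8.75, 11.20, 13.80, 17.60, 18.35, 22.10, 24.90, 27.30, 34.15]

# Round to the nearest dollar before plotting.
rounded = [round(a) for a in amounts]

split_plot = split_stemplot(rounded)
print("\n".join(split_plot))
```

Each stem now appears on two rows, which spreads the leaves out when a plain stemplot would be too crowded.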

52
  • To make a stemplot of this distribution:
  • First, round the purchases to the nearest dollar.
  • Then use tens of dollars as stems and dollars as
    leaves.

56
To see the shape of a distribution more clearly,
turn the stemplot on its side so that the larger
values lie to the right.
57
Remarks About Stemplots
  • Stemplots display the actual values of the
    observations.
  • This feature makes stemplots awkward for large
    data sets.
  • Constructing a stemplot is a quick and easy
    way to sort data (arrange them in order), and
    sorting is required for some other statistical
    procedures.

58
Dot Plot
  • A dot plot is a graph in which each data value is
    plotted as a point (or dot) along a scale of
    values.

59
Histograms
  • A histogram is a bar graph of the frequency or
    relative-frequency distribution of
    quantitative data.
  • The horizontal scale represents classes of data
    values and the vertical scale represents
    frequency or the relative frequency.
  • The heights of the bars correspond to frequency
    or the relative frequency values, and the bars
    are drawn adjacent to each other (without gaps).

60
Example: Percent of Hispanics in the adult
population, by state (2000)
61
Step 1: Divide the range of the data into classes
of equal width.
  • The data in the table range from 0.6 to 38.7.
  • We choose 8 classes of width 5, which is enough
    to cover this range.

63
Step 2: Count the number of individuals in
each class.
  • These counts are called frequencies.
  • A table of frequencies for all classes is a
    frequency table.

64
Step 3: Draw the histogram.
  • First mark the scale for the variable whose
    distribution you are displaying on the horizontal
    axis. In this case, that is the percent of adults
    who are Hispanic.
  • The vertical axis contains the scale of counts.
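
The three steps can be sketched in Python under some stated assumptions. The state-by-state table is not included in this transcript, so `percents` is a hypothetical sample that merely shares the stated minimum (0.6) and maximum (38.7). The sketch also assumes the 8 classes start at 0 and that each class includes its upper endpoint but not its lower one; neither choice is confirmed by the slides.

```python
import math

def frequency_table(values, start, width, n_classes):
    """Count values in classes (start, start+width], (start+width, ...], ..."""
    counts = [0] * n_classes
    for x in values:
        # Assumed convention: lower endpoint excluded, upper endpoint included.
        k = math.ceil((x - start) / width) - 1
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[k] += 1
    return counts

# Hypothetical percentages (assumption, not the slide's data).
percents = [0.6, 1.5, 2.1, 2.8, 4.7, 5.2, 6.9, 8.0,
            10.7, 13.3, 21.3, 25.1, 28.1, 38.7]

# Step 1: 8 classes of width 5. Step 2: count the values in each class.
hist_counts = frequency_table(percents, start=0, width=5, n_classes=8)

# Step 3: draw the histogram (sideways, one row per class, no gaps).
for k, count in enumerate(hist_counts):
    print(f"{5 * k:>2} to {5 * (k + 1):>2} | {'#' * count}")
```

Dividing each count by `len(percents)` would turn the same picture into a relative-frequency histogram.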

67
Two Types of Histograms
68
Histograms: Another Example
  • Days to Maturity for Short-Term Investments
  • Frequency histogram

70
Histograms: Another Example
  • Days to Maturity for Short-Term Investments
  • Relative-frequency histogram

72
Examining distributions
  • In any graph of data, we look for the overall
    pattern and for striking deviations from that
    pattern.
  • We can describe the overall pattern of a
    distribution by its shape, center, and spread.
  • An important kind of deviation is an outlier, an
    individual value that falls outside the overall
    pattern.

73
Distribution Shapes
  • An important aspect of the distribution of a
    quantitative data set is its shape.
  • The shape of a distribution plays a role in
    determining the appropriate method of statistical
    analysis.
  • To identify the shape of a distribution, the best
    approach usually is to use a smooth curve that
    approximates the overall shape.

74
Distribution Shapes
In later chapters, there will be frequent
reference to data with a normal distribution.
One key characteristic of a normal distribution
is that it has a bell shape.
  • The frequencies start low, then increase to some
    maximum frequency, then decrease to a low
    frequency.
  • The distribution should be approximately
    symmetric.

75
  • For instance, the following table displays a
    frequency and a relative-frequency distribution
    for the heights of the 3264 female students who
    attend a Midwestern college.

This is a good example of a normal distribution.
76
  • The figure displays a relative-frequency
    histogram for the same data. Notice the bell
    shape.
  • Included is a smooth curve that approximates the
    overall shape of the distribution.

78
  • Both the histogram and the smooth curve show that
    this distribution of heights is bell-shaped, but
    the smooth curve makes seeing the shape a little
    easier.

79
Advantages of using smooth curves
  • We do not need to worry about minor differences
    in shape.
  • We can concentrate on overall patterns, which, in
    turn, allows us to classify most distributions by
    designating relatively few shapes.
  • Most importantly, smooth curves allow us to use
    the tools of calculus.

80
Some Common Distribution Shapes
81
The relative-frequency histogram for household
size in the United States
The distribution of household sizes is right
skewed.
82
The distribution of Babe Ruth's home run counts
is symmetric and unimodal. The range is from 22 to
60. There are no outliers.
The distribution of supermarket spending is
skewed to the right. The range is from $3 to
$93. Outliers?
83
Population and Sample Distributions
  • The data set obtained by observing the values of
    a variable for an entire population is called
    population data or census data.
  • The data set obtained by observing the values of
    a variable for a sample of the population is
    called sample data.
  • To distinguish their distributions, we use the
    terminology population distribution and sample
    distribution.

84
Population and Sample Distributions
  • For a particular population and variable, sample
    distributions vary from sample to sample.
  • However, there is only one population
    distribution, namely, the distribution of the
    variable under consideration on the population
    under consideration.

85
Population distribution and six sample
distributions for household size
86
Population and Sample Distributions
  • In practice, we usually do not know the
    population distribution.
  • We can use the distribution of a simple random
    sample from the population to get a rough idea of
    the population distribution.
  • The larger the sample, the better the sample
    distribution will approximate the population
    distribution.
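
A small simulation can illustrate this. Everything in the sketch below is an assumption made for illustration: the population of 1,000 household sizes is hypothetical, and the helper names are invented. It draws simple random samples (without replacement) of increasing size and reports the largest difference between the sample and population relative frequencies.

```python
import random
from collections import Counter

def relative_frequencies(values):
    """Relative-frequency distribution as {value: proportion}."""
    n = len(values)
    return {v: c / n for v, c in sorted(Counter(values).items())}

def max_gap(sample_dist, population_dist):
    """Largest absolute difference in relative frequency over all values."""
    return max(abs(sample_dist.get(v, 0.0) - p)
               for v, p in population_dist.items())

# Hypothetical population of 1,000 household sizes (assumption).
population = [1] * 270 + [2] * 330 + [3] * 160 + [4] * 140 + [5] * 70 + [6] * 30
pop_dist = relative_frequencies(population)

rng = random.Random(0)  # fixed seed so a run can be repeated
for n in (10, 100, 1000):
    sample = rng.sample(population, n)  # simple random sample of size n
    gap = max_gap(relative_frequencies(sample), pop_dist)
    print(f"n = {n:>4}: largest gap = {gap:.3f}")
```

Any single small sample may happen to land close to the population distribution, so the gap is not guaranteed to shrink at every step. When the sample contains the whole population, as in the last case, the gap is exactly zero.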