Chapter 2 Summarizing and Graphing Data

Title: Introduction to Statistics
Author: CA
Last modified by: Columbus State University
Created Date: 1/19/2003 2:23:17 PM
Document presentation format: PowerPoint PPT presentation

Slides: 87

Provided by: CA16

Transcript and Presenter's Notes

1
Chapter 2 Summarizing and Graphing Data

Overview
Frequency Distributions
Statistical Graphics (stemplots, dotplots, etc.)
Histograms

2
Frequency Distributions

The amount of data collected in some real-world
situations can be overwhelming.
By suitably organizing data, we can often make a
large and complicated set of data more compact
and easier to understand.
Here we discuss frequency distributions, which
involve putting data into groups rather than
treating each observation individually.

3
Grouping Quantitative Data

Days to Maturity for Short-Term Investments: The
table below displays the number of days to
maturity for 40 short-term investments.

4
Grouping Quantitative Data

Getting a clear picture of the data in the table
is difficult.

5
Grouping Quantitative Data

By grouping the data into categories, or classes,
we can make the data easier to comprehend.

6
Grouping Quantitative Data

The first step is to decide on the classes.
A convenient way to group these data is by 10s.

7
Grouping Quantitative Data

Since the shortest maturity period is 36 days,
our first class is for maturity periods from 30
days up to, but not including, 40 days.

8
Grouping Quantitative Data

The longest maturity period is 99 days, so
grouping by 10s results in seven classes, the
last running from 90 days up to, but not
including, 100 days.

9
Grouping Quantitative Data

The final step for grouping the data is to find
the number of investments in each class.
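
The grouping steps above can be sketched in Python. This is only an
illustration: the 40 investment values were not transcribed, so the
`days` list is hypothetical, and the helper name `group_into_classes`
is made up for this sketch. Each class includes its lower edge and
excludes its upper edge, matching "30 days up to, but not including,
40 days".

```python
def group_into_classes(values, start, width, n_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)  # position of the class holding v
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the classes")
        counts[index] += 1
    # Pair each class, written as (lower, upper), with its frequency.
    return [((start + i * width, start + (i + 1) * width), counts[i])
            for i in range(n_classes)]


# Hypothetical maturity periods in days (NOT the slide's 40 values).
days = [36, 47, 51, 55, 64, 68, 70, 75, 83, 89, 95, 99]

table = group_into_classes(days, start=30, width=10, n_classes=7)
for (lower, upper), freq in table:
    print(f"{lower} to under {upper}: {freq}")
```

Because every value lands in exactly one class, the frequencies add up
to the number of observations.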

12
Grouping Quantitative Data

In the previous example, we used a commonsense
approach to grouping data into classes. Some of
that common sense can be used as guidelines for
grouping.
Three of the most important guidelines are the
following.

13
Grouping Quantitative Data

The number of classes should be small enough to
provide an effective summary but large enough to
display the relevant characteristics of the data
(in general, between 5 and 20).
Each observation must belong to one, and only
one, class.
Whenever feasible, all classes should have the
same width.

14
Frequency Distributions

The number of observations that fall into a
particular class is called the frequency or count
of that class.
A table that provides all classes and their
frequencies is called a frequency distribution.

15
Frequency Distributions

The frequency distribution in this example is

16
Relative-Frequency Distributions

In addition to the frequency of a class, we
are often interested in the percentage of
observations that fall in that class.

17
Relative-Frequency Distributions
The relative frequency is the proportion (or
percent) of observations within a category and is
found using the formula
relative frequency = frequency / sum of all frequencies
A relative frequency distribution lists the
relative frequency of each category of data.
18
Terms Used in Grouping

Lower class limits are the smallest numbers that
can actually belong to different classes.
Upper class limits are the largest numbers that
can actually belong to different classes.
Class boundaries are the numbers used to
separate classes, but without the gaps created by
class limits.
Class midpoints are the midpoints of the classes
and can be found by adding the lower class limit
to the upper class limit and dividing the sum by
two.
Class width is the difference between two
consecutive lower class limits.
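
The terms above can be made concrete with a short Python sketch. The
class limits used here are hypothetical, and the function name
`class_summary` is invented for this example. It assumes the usual
convention that a class boundary lies halfway between one class's
upper limit and the next class's lower limit.

```python
def class_summary(limits):
    """Given [(lower_limit, upper_limit), ...], return boundaries, midpoints, width."""
    # Gap between one class's upper limit and the next class's lower limit.
    gap = limits[1][0] - limits[0][1]
    # Boundaries sit halfway across each gap, so classes meet without gaps.
    boundaries = [limits[0][0] - gap / 2] + [upper + gap / 2 for _, upper in limits]
    # Midpoint = (lower class limit + upper class limit) / 2.
    midpoints = [(lower + upper) / 2 for lower, upper in limits]
    # Width = difference between two consecutive lower class limits.
    width = limits[1][0] - limits[0][0]
    return boundaries, midpoints, width


# Hypothetical class limits, chosen only for illustration.
limits = [(21, 30), (31, 40), (41, 50)]
boundaries, midpoints, width = class_summary(limits)
print(boundaries)  # [20.5, 30.5, 40.5, 50.5]
print(midpoints)   # [25.5, 35.5, 45.5]
print(width)       # 10
```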

19
Reasons for Constructing Frequency Distributions

Large data sets can be summarized.
We can gain some insight into the nature of data.
We have a basis for constructing important graphs.

20
Another Example of Frequency Distribution
Illustrating these Concepts
21
Frequency Distribution: Ages of Best Actresses
Original Data and Frequency Distribution
22
Lower Class Limits
23
Upper Class Limits
24
Class Boundaries
25
Class Midpoints
26
Class Width
27
Relative Frequency Distribution
28/76 = 37%, 30/76 = 39%, etc.
Total Frequency = 76
28
Cumulative Frequency Distribution
Cumulative Frequencies
29
Frequency Tables
30
Single-Value Grouping
If the data are discrete and take relatively few
distinct values, the categories of data are the
observed values themselves.
Consider the following example: The following
data represent the number of available cars in a
household based on a random sample of 50
households. Construct a frequency and relative
frequency distribution.
31
Single-Value Grouping
3 0 1 2 1 1 1 2 0 2 4 2 2 2 1 2 2 0 2 4 1 1 3 2 4
1 2 1 2 2 3 3 2 1 2 2 0 3 2 2 2 3 2 1 2 2 1 1 3 5
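
One way to build the requested distributions is sketched below in
Python, using the 50 observations exactly as listed on the slide. The
variable names are illustrative, and relative frequency is taken as
frequency divided by the total number of observations.

```python
from collections import Counter

# The 50 observations from the slide, in the order given.
cars = [int(x) for x in (
    "3 0 1 2 1 1 1 2 0 2 4 2 2 2 1 2 2 0 2 4 1 1 3 2 4 "
    "1 2 1 2 2 3 3 2 1 2 2 0 3 2 2 2 3 2 1 2 2 1 1 3 5"
).split()]

frequency = Counter(cars)  # single-value grouping: one class per observed value
n = len(cars)
# Relative frequency = frequency / sum of all frequencies.
relative = {value: count / n for value, count in frequency.items()}

print("cars  freq  rel.freq")
for value in sorted(frequency):
    print(f"{value:>4}  {frequency[value]:>4}  {relative[value]:>8.2f}")
```

The frequencies sum to 50 and the relative frequencies sum to 1, which
is a quick check that no observation was missed or counted twice.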
32
Grouping Qualitative Data

The concepts of class limits and midpoints are
not appropriate for qualitative data.
For instance, if we have data that categorize
people as male or female, then the classes are
male and female.
We can still group qualitative data and compute
frequencies and relative frequencies for classes.
For qualitative data, the classes coincide with
the observed values of the corresponding variable.

33
Grouping Qualitative Data
Example: The data on the next slide represent the
color of M&Ms in a bag of plain M&Ms. Construct
a frequency distribution of the color of plain
M&Ms.
34
Yellow Orange Brown Green Green
Blue Brown Red Brown Brown
Orange Brown Red Brown Red
Green Brown Red Green Yellow
Yellow Red Red Brown Orange
Yellow Orange Red Orange Blue
Brown Red Yellow Brown Red
Brown Yellow Yellow Blue Yellow
Yellow Brown Yellow Green Orange
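
A possible way to tally these colors in Python is sketched below. For
qualitative data the classes are simply the observed colors, so no
class limits are needed. The variable names are illustrative.

```python
from collections import Counter

# The 45 colors from the slide, row by row.
colors = """
Yellow Orange Brown Green Green
Blue Brown Red Brown Brown
Orange Brown Red Brown Red
Green Brown Red Green Yellow
Yellow Red Red Brown Orange
Yellow Orange Red Orange Blue
Brown Red Yellow Brown Red
Brown Yellow Yellow Blue Yellow
Yellow Brown Yellow Green Orange
""".split()

frequency = Counter(colors)
# most_common() lists the classes from most to least frequent.
for color, count in frequency.most_common():
    print(f"{color:<7}{count:>3}")
```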
36
Frequency Tables
37
Distributions of Data Sets
The distribution of a data set is a table, graph,
or formula that provides the values of the
observations and how often they occur.
38
Displaying Distributions
Qualitative Variable: Frequency Table, Bar Graph,
Pie Chart
Quantitative Variable: Frequency Table, Stem and
Leaf Plot, Dot Plot, Histogram
39
Displaying Distributions of Qualitative Data
40

Example: How well educated are 30-something young
adults? Here is the distribution of the highest
level of education for people aged 25 to 34 years.

41
Bar Graph
42

Label each category of data on a horizontal axis
and the frequency or relative frequency of the
category on the vertical axis.

For each category, a rectangle of equal width is
drawn whose height is equal to the category's
frequency or relative frequency.

The bar graph quickly compares the sizes of the
five education groups. The heights of the bars
show the counts in the five categories.

45
Pie Chart
46

The pie chart helps us see what part of the whole
each group forms. For example, the HS grad
slice makes up 30.7% of the pie because 30.7% of
young adults have only a high school education.

47
Displaying Distributions of Quantitative Data
48
Stemplots

A stemplot (also called a stem-and-leaf plot)
gives a quick picture of the shape of a
distribution while including the actual numerical
values in the graph.
Stemplots work best for small numbers of
observations that are all greater than 0.

49
How to Make a Stemplot

Separate each observation into a stem consisting
of all but the final (rightmost) digit and a
leaf, the final digit. Stems may have as many
digits as needed, but each leaf contains only a
single digit.
Write the stems in a vertical column with the
smallest at the top, and draw a vertical line at
the right of this column.
Write each leaf in the row to the right of its
stem, in increasing order out from the stem.

Here are the numbers of home runs that Babe Ruth
hit in each of his 15 years with the New York
Yankees, 1920 to 1934.
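
The stemplot steps above can be sketched in Python. The slide's table
of home run counts did not survive transcription, so the list below
uses Ruth's 1920 to 1934 season totals as commonly published; treat
them as an assumption and check them against the original slide. The
function name `stemplot` is made up for this sketch, and it handles
non-negative whole numbers only.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):            # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include every stem from smallest to largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


# Assumed data: Ruth's home runs per season, 1920 to 1934 (not in the transcript).
home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

for row in stemplot(home_runs):
    print(row)
# 2 | 2 5
# 3 | 4 5
# 4 | 1 1 6 6 6 7 9
# 5 | 4 4 9
# 6 | 0
```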

Stemplots do not work well for large data sets,
where each stem must hold a large number of
leaves.

Possible Modifications

We can increase the number of stems in a plot by
splitting each stem into two: one with leaves 0
to 4 and the other with leaves 5 to 9.
When the observed values have many digits, it is
often best to round the numbers to just a few
digits before making a stemplot.

To make a stemplot of this distribution, first
round the purchases to the nearest dollar.
Then use tens of dollars as stems and dollars as
leaves.

56
To see the shape of a distribution more clearly,
turn the stemplot on its side so that the larger
values lie to the right.
57
Remarks About Stemplots

Stemplots display the actual values of the
observations.
This feature makes stemplots awkward for large
data sets.
The construction of a stemplot is a quick and easy
way to sort data (arrange them in order), and
sorting is required for some other statistical
procedures.

58
Dot Plot

A dot plot is a graph in which each data value is
plotted as a point (or dot) along a scale of
values.

59
Histograms

A histogram is a bar graph of the frequency or
the relative frequency distribution of a
quantitative data set.
The horizontal scale represents classes of data
values and the vertical scale represents
frequency or the relative frequency.
The heights of the bars correspond to frequency
or the relative frequency values, and the bars
are drawn adjacent to each other (without gaps).

60
Example Percent of Hispanics in the adult
population, by state (2000)
61
Step 1: Divide the range of the data into classes
of equal width.

The data in the table range from 0.6 to 38.7.
We choose 8 intervals of length 5 as our classes.
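
The count of 8 classes can be checked with a small Python sketch. The
slide's own class list was not transcribed, so the endpoint convention
here is an assumption: each class includes its lower edge and excludes
its upper edge. The function name `class_edges` is invented for this
example.

```python
import math


def class_edges(lowest, highest, width):
    """Equal-width class edges covering every value from lowest to highest.

    Each class includes its lower edge and excludes its upper edge.
    """
    # Start at the largest multiple of width that does not exceed lowest.
    start = math.floor(lowest / width) * width
    # Number of classes needed so that highest falls inside the last one.
    n_classes = math.floor((highest - start) / width) + 1
    return [start + i * width for i in range(n_classes + 1)]


edges = class_edges(0.6, 38.7, 5)
print(len(edges) - 1)  # 8 classes
print(edges)           # [0, 5, 10, 15, 20, 25, 30, 35, 40]
```

The same helper reproduces the earlier investment example:
`class_edges(36, 99, 10)` gives edges from 30 to 100, that is, seven
classes of width 10.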

62
Example Percent of Hispanics in the adult
population, by state (2000)
63
Step 2: Count the number of individuals in
each class.

These counts are called frequencies.
A table of frequencies for all classes is a
frequency table.

64
Step 3: Draw the histogram.

First mark the scale for the variable whose
distribution you are displaying on the horizontal
axis, in this case the percent of adults who are
Hispanic.
The vertical axis contains the scale of counts.

67
Two Types of Histograms
68
Histograms Another Example

Days to Maturity for Short-Term Investments
Frequency histogram

70
Histograms Another Example

Days to Maturity for Short-Term Investments
Relative-frequency histogram

72
Examining distributions

In any graph of data, we look for the overall
pattern and for striking deviations from that
pattern.
We can describe the overall pattern of a
distribution by its shape, center, and spread.
An important kind of deviation is an outlier, an
individual value that falls outside the overall
pattern.

73
Distribution Shapes

An important aspect of the distribution of a
quantitative data set is its shape.
The shape of a distribution plays a role in
determining the appropriate method of statistical
analysis.
To identify the shape of a distribution, the best
approach usually is to use a smooth curve that
approximates the overall shape.

74
Distribution Shapes
In later chapters, there will be frequent
reference to data with a normal distribution.
One key characteristic of a normal distribution
is that it has a bell shape.

The frequencies start low, then increase to some
maximum frequency, then decrease to a low
frequency.
The distribution should be approximately
symmetric.

For instance, the following table displays a
frequency and a relative-frequency distribution
for the heights of the 3264 female students who
attend a Midwestern college.

This is a good example of an approximately normal
distribution.
76

The figure displays a relative-frequency
histogram for the same data. Notice the bell
shape.
Included is a smooth curve that approximates the
overall shape of the distribution.

Both the histogram and the smooth curve show that
this distribution of heights is bell shaped, but
the smooth curve makes seeing the shape a little
easier.

79
Advantages of using smooth curves

We do not need to worry about minor differences
in shape.
We can concentrate on overall patterns, which, in
turn, allows us to classify most distributions by
designating relatively few shapes.
Most importantly, smooth curves allow us to use
the tools of calculus.

80
Some Common Distribution Shapes
81
The relative-frequency histogram for household
size in the United States
The distribution of household sizes is right
skewed.
82
The distribution of Babe Ruth's home run counts
is symmetric and unimodal. The range is from 22 to
60. There are no outliers.
The distribution of supermarket spending is
skewed to the right. The range is from $3 to
$93. Outliers?
83
Population and Sample Distributions

The data set obtained by observing the values of
a variable for an entire population is called
population data or census data.
The data set obtained by observing the values of
a variable for a sample of the population is
called sample data.
To distinguish their distributions, we use the
terminology population distribution and sample
distribution.

84
Population and Sample Distributions

For a particular population and variable, sample
distributions vary from sample to sample.
However, there is only one population
distribution, namely, the distribution of the
variable under consideration on the population
under consideration.

85
Population distribution and six sample
distributions for household size
86
Population and Sample Distributions

In practice, we usually do not know the
population distribution.
We can use the distribution of a simple random
sample from the population to get a rough idea of
the population distribution.
The larger the sample, the better the sample
distribution will approximate the population
distribution.
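
The effect of sample size can be explored with a small simulation in
Python. Everything here is an assumption made for illustration: the
population proportions for household size are invented, the function
names are hypothetical, and independent draws with replacement stand in
for a simple random sample from a very large population.

```python
import random
from collections import Counter

# Hypothetical population distribution of household size (illustrative only).
population = {1: 0.27, 2: 0.33, 3: 0.17, 4: 0.14, 5: 0.09}


def sample_distribution(n, rng):
    """Relative-frequency distribution of a random sample of size n."""
    draws = rng.choices(list(population), weights=list(population.values()), k=n)
    counts = Counter(draws)
    return {size: counts[size] / n for size in population}


def max_gap(dist):
    """Largest absolute difference from the population relative frequencies."""
    return max(abs(dist[size] - p) for size, p in population.items())


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (20, 200, 20000):
    print(n, round(max_gap(sample_distribution(n, rng)), 3))
```

Individual runs vary, but the printed gap typically shrinks as the
sample size grows, in line with the statement above.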
