Title: Chapter 2 Summarizing and Graphing Data
1Chapter 2 Summarizing and Graphing Data
- Overview
- Frequency Distributions
- Statistical Graphics (stemplots, dotplots, etc)
- Histograms
2Frequency Distributions
- The amount of data collected in some real-world
situations can be overwhelming.
- By suitably organizing data, we can often make a
large and complicated set of data more compact
and easier to understand.
- Here, we discuss frequency distributions, which
involve putting data into groups rather than
treating each observation individually.
3Grouping Quantitative Data
- Days to Maturity for Short-Term Investments: The
table below displays the number of days to
maturity for 40 short-term investments.
4Grouping Quantitative Data
- Getting a clear picture of the data in the table
is difficult.
5Grouping Quantitative Data
- By grouping the data into categories, or classes,
we can make the data easier to comprehend.
6Grouping Quantitative Data
- The first step is to decide on the classes.
- A convenient way to group these data is by 10s.
7Grouping Quantitative Data
- Since the shortest maturity period is 36 days,
our first class is for maturity periods from 30
days up to, but not including, 40 days.
8Grouping Quantitative Data
- The longest maturity period is 99 days, so
grouping by 10s results in seven classes, the last
covering 90 days up to, but not including, 100 days.
9Grouping Quantitative Data
- The final step for grouping the data is to find
the number of investments in each class.
12Grouping Quantitative Data
- In the previous example, we used a commonsense
approach to grouping data into classes. Some of
that common sense can be used as guidelines for
grouping.
- Three of the most important guidelines are the
following.
13Grouping Quantitative Data
- The number of classes should be small enough to
provide an effective summary but large enough to
display the relevant characteristics of the data
(in general, between 5 and 20).
- Each observation must belong to one, and only
one, class.
- Whenever feasible, all classes should have the
same width.
14Frequency Distributions
- The number of observations that fall into a
particular class is called the frequency or count
of that class.
- A table that provides all classes and their
frequencies is called a frequency distribution.
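The grouping-by-10s procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `days` list holds made-up values (the 40 investments from the slide are not reproduced in this transcript), and the helper name `frequency_distribution` is hypothetical. Each class runs from its lower value up to, but not including, the next multiple of 10, so every observation lands in exactly one class.

```python
from collections import Counter

def frequency_distribution(values, width=10):
    """Count how many values fall in each class [lower, lower + width)."""
    lowers = [(v // width) * width for v in values]
    counts = Counter(lowers)
    first, last = min(lowers), max(lowers)
    # Include empty classes so the table has no gaps.
    return {(lo, lo + width): counts.get(lo, 0)
            for lo in range(first, last + width, width)}

# Hypothetical days-to-maturity values, NOT the 40 values from the slide.
days = [36, 38, 47, 51, 55, 62, 64, 64, 70, 78, 89, 99]

table = frequency_distribution(days)
for (lower, upper), freq in table.items():
    print(f"{lower} to under {upper}: {freq}")
```

Because the smallest made-up value is 36 and the largest is 99, the sketch also ends up with seven classes of equal width.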
15Frequency Distributions
- The frequency distribution in this example is
16Relative-Frequency Distributions
- In addition to the frequency of a class, we
are often interested in the percentage of a class.
17Relative-Frequency Distributions
The relative frequency is the proportion (or percent) of
observations within a category and is found using
the formula
relative frequency = frequency / sum of all frequencies
A relative frequency
distribution lists the relative frequency of each
category of data.
18Terms Used in Grouping
- Lower class limits are the smallest numbers that
can actually belong to different classes.
- Upper class limits are the largest numbers that
can actually belong to different classes.
- Class boundaries are the numbers used to
separate classes, but without the gaps created by
class limits.
- Class midpoints are the midpoints of the classes
and can be found by adding the lower class limit
to the upper class limit and dividing the sum by
two.
- Class width is the difference between two
consecutive lower class limits.
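A small Python sketch may help tie these terms together. The classes below are hypothetical (chosen only for illustration, with integer-valued data assumed), and the boundary rule used here, halfway across the gap between one class's upper limit and the next class's lower limit, is one common convention rather than something stated on the slide.

```python
# Hypothetical classes as (lower limit, upper limit); integer data assumed.
classes = [(21, 30), (31, 40), (41, 50), (51, 60)]

lower_limits = [lo for lo, _ in classes]
upper_limits = [hi for _, hi in classes]

# Class width: difference between two consecutive lower class limits.
width = lower_limits[1] - lower_limits[0]

# Class midpoint: (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Class boundaries: split the gap between adjacent classes in half
# (assumed convention), so the boundaries leave no gaps.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print("width:", width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```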
19Reasons for Constructing Frequency Distributions
- Large data sets can be summarized.
- We can gain some insight into the nature of data.
- We have a basis for constructing important graphs.
20Another Example of Frequency Distribution
Illustrating these Concepts
21Frequency Distribution Ages of Best Actresses
Frequency Distribution
Original Data
22Lower Class Limits
23Upper Class Limits
24Class Boundaries
Editor Substitute Table 2-2
25Class Midpoints
26Class Width
Editor Substitute Table 2-2
27Relative Frequency Distribution
28/76 ≈ 37%, 30/76 ≈ 39%, etc.
Total Frequency: 76
28Cumulative Frequency Distribution
Cumulative Frequencies
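Cumulative frequencies are running totals of the class frequencies. In the minimal sketch below, the first two frequencies (28 and 30) and the total of 76 come from the relative-frequency slide above; the remaining four frequencies and the upper class limits are placeholders assumed only so that the total comes to 76.

```python
from itertools import accumulate

# 28, 30 and the total 76 are from the slides; the rest are assumed placeholders.
frequencies = [28, 30, 12, 2, 2, 2]
upper_limits = [30, 40, 50, 60, 70, 80]  # hypothetical upper class limits

# Each cumulative frequency is the sum of this class and all earlier classes.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_limits, cumulative):
    print(f"{upper} or less: {total}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.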
29Frequency Tables
30Single-Value Grouping
If the data are discrete and the number of distinct
values is relatively small, each distinct value can
serve as its own class.
Consider the following example: The following
data represent the number of available cars in a
household based on a random sample of 50
households. Construct a frequency and relative
frequency distribution.
31Single-Value Grouping
3 0 1 2 1 1 1 2 0 2 4 2 2 2 1 2 2 0 2 4 1 1 3 2 4
1 2 1 2 2 3 3 2 1 2 2 0 3 2 2 2 3 2 1 2 2 1 1 3 5
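One way to build the requested distributions is sketched below in Python, using the 50 values exactly as listed above. The variable names are illustrative; the relative frequency of each value is its frequency divided by the total number of observations.

```python
from collections import Counter

# The 50 observations from the slide (cars available per household).
cars = [
    3, 0, 1, 2, 1, 1, 1, 2, 0, 2, 4, 2, 2, 2, 1, 2, 2, 0, 2, 4, 1, 1, 3, 2, 4,
    1, 2, 1, 2, 2, 3, 3, 2, 1, 2, 2, 0, 3, 2, 2, 2, 3, 2, 1, 2, 2, 1, 1, 3, 5,
]

freq = Counter(cars)   # single-value grouping: one class per distinct value
n = len(cars)
rel_freq = {value: freq[value] / n for value in sorted(freq)}

print("cars  frequency  relative frequency")
for value in sorted(freq):
    print(f"{value:>4}  {freq[value]:>9}  {rel_freq[value]:>18.2f}")
```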
32Grouping Qualitative Data
- The concepts of class limits and midpoints are
not appropriate for qualitative data.
- For instance, if we have data that categorize
people as male or female, then the classes are
male and female.
- We can still group qualitative data and compute
frequencies and relative frequencies for classes.
- For qualitative data, the classes coincide with
the observed values of the corresponding variable.
33Grouping Qualitative Data
Example: The data on the next slide represent the
color of M&Ms in a bag of plain M&Ms. Construct
a frequency distribution of the color of plain
M&Ms.
34Yellow Orange Brown Green Green
Blue Brown Red Brown Brown
Orange Brown Red Brown Red
Green Brown Red Green Yellow
Yellow Red Red Brown Orange
Yellow Orange Red Orange Blue
Brown Red Yellow Brown Red
Brown Yellow Yellow Blue Yellow
Yellow Brown Yellow Green Orange
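For qualitative data like these, the classes are simply the observed colors, so the frequency distribution is a tally. A minimal Python sketch using the 45 colors listed above (variable names are illustrative):

```python
from collections import Counter

# The 45 observations from the slide, in the order listed.
colors = """
Yellow Orange Brown Green Green
Blue Brown Red Brown Brown
Orange Brown Red Brown Red
Green Brown Red Green Yellow
Yellow Red Red Brown Orange
Yellow Orange Red Orange Blue
Brown Red Yellow Brown Red
Brown Yellow Yellow Blue Yellow
Yellow Brown Yellow Green Orange
""".split()

freq = Counter(colors)

# Print frequency and relative frequency, most common color first.
for color, count in freq.most_common():
    print(f"{color:<7} {count:>2}  {count / len(colors):.3f}")
```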
36Frequency Tables
37Distributions of Data Sets
The distribution of a data set is a table, graph,
or formula that provides the values of the
observations and how often they occur.
38Displaying Distributions
Qualitative Variable: Frequency Table, Pie Chart, Bar Graph
Quantitative Variable: Frequency Table, Stem-and-Leaf Plot, Histogram, Dot Plot
39Displaying Distributions of Qualitative Data
40- Example: How well educated are 30-something young
adults? Here is the distribution of the highest
level of education for people aged 25 to 34 years.
41Bar Graph
42- Label each category of data on a horizontal axis
and the frequency or relative frequency of the
category on the vertical axis.
43- A rectangle of equal width is drawn for each
category whose height is equal to the category's
frequency or relative frequency.
44- The bar graph quickly compares the sizes of the
five education groups. The heights of the bars
show the counts in the five categories.
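The bar-graph recipe can be sketched in plain Python as a text chart. Two caveats: the education counts are not reproduced in this transcript, so the sketch uses the color tallies of the 45 M&Ms listed earlier in the deck, and the bars are drawn horizontally (categories down the side) because that is easier to print as text. The function name `bar_graph` is hypothetical.

```python
def bar_graph(frequencies, scale=1):
    """Return one text bar per category; bar length is proportional to frequency."""
    label_width = max(len(label) for label in frequencies)
    lines = []
    for label, freq in frequencies.items():
        bar = "#" * round(freq * scale)
        lines.append(f"{label:<{label_width}} | {bar} {freq}")
    return lines

# Color tallies from the M&M data earlier in the deck.
counts = {"Brown": 12, "Yellow": 10, "Red": 9, "Orange": 6, "Green": 5, "Blue": 3}

lines = bar_graph(counts)
print("\n".join(lines))
```

Every bar uses the same character width per unit of frequency, which mirrors the rule that the rectangles have equal width and differ only in height.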
45Pie Chart
46- The pie chart helps us see what part of the whole
each group forms. For example, the HS grad
slice makes up 30.7% of the pie because 30.7% of
young adults have only a high school education.
47Displaying Distributions of Quantitative Data
48Stemplots
- A stemplot (also called a stem-and-leaf plot)
gives a quick picture of the shape of a
distribution while including the actual numerical
values in the graph.
- Stemplots work best for small numbers of
observations that are all greater than 0.
49How to Make a Stemplot
- Separate each observation into a stem consisting
of all but the final (rightmost) digit and a
leaf, the final digit. Stems may have as many
digits as needed, but each leaf contains only a
single digit.
- Write the stems in a vertical column with the
smallest at the top, and draw a vertical line at
the right of this column.
- Write each leaf in the row to the right of its
stem, in increasing order out from the stem.
50- Here are the numbers of home runs that Babe Ruth
hit in each of his 15 years with the New York
Yankees, 1920 to 1934.
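The stemplot steps can be sketched in Python as follows. The home run counts are not reproduced in this transcript; the list below gives Ruth's season totals for 1920 to 1934 as commonly reported in the historical record, so treat it as an assumption (it is at least consistent with the range of 22 to 60 quoted later in the deck). The function name `stemplot` is hypothetical, and the sketch assumes non-negative integers.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in increasing order
        leaves[v // 10].append(v % 10)
    rows = []
    # List every stem from smallest to largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

# Assumed data: Babe Ruth's home runs per season, 1920 to 1934.
home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]

rows = stemplot(home_runs)
print("\n".join(rows))
```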
51- Stemplots do not work well for large data sets,
where each stem must hold a large number of
leaves.
Possible Modifications
- We can increase the number of stems in a plot by
splitting each stem into two: one with leaves 0
to 4 and the other with leaves 5 to 9.
- When the observed values have many digits, it is
often best to round the numbers to just a few
digits before making a stemplot.
52- To make a stemplot of this distribution,
- First round the purchases to the nearest dollar.
- Then use tens of dollars as stems and dollars as
leaves.
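Both modifications, rounding first and then splitting each stem in two, can be combined in a short Python sketch. The purchase amounts below are hypothetical (the supermarket data are not reproduced in this transcript), the function name is illustrative, and amounts are rounded half up to the nearest dollar. For brevity the sketch omits half-stems that have no leaves.

```python
def split_stemplot(amounts):
    """Round to whole dollars, then plot with each stem split into 0-4 and 5-9."""
    rows = {}
    for amount in amounts:
        dollars = int(amount + 0.5)        # round half up; assumes amount >= 0
        stem, leaf = divmod(dollars, 10)   # tens of dollars, dollars
        half = 0 if leaf <= 4 else 1       # first or second half of the stem
        rows.setdefault((stem, half), []).append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in sorted(leaves))}"
            for (stem, _), leaves in sorted(rows.items())]

# Hypothetical purchase amounts in dollars, NOT the data from the slide.
purchases = [3.11, 8.88, 9.26, 10.81, 12.69, 13.78, 15.23, 17.00, 18.36, 19.27]

rows = split_stemplot(purchases)
print("\n".join(rows))
```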
56To see the shape of a distribution more clearly,
turn the stemplot on its side so that the larger
values lie to the right.
57Remarks About Stemplots
- Stemplots display the actual values of the
observations.
- This feature makes stemplots awkward for large
data sets.
- The construction of a stemplot is a quick and easy
way to sort data (arrange them in order), and
sorting is required for some other statistical
procedures.
58Dot Plot
- A dot plot is a graph in which each data value is
plotted as a point (or dot) along a scale of
values.
59Histograms
- A histogram is a bar graph of the frequency or
the relative frequency distribution of
quantitative data.
- The horizontal scale represents classes of data
values and the vertical scale represents
frequency or the relative frequency.
- The heights of the bars correspond to frequency
or the relative frequency values, and the bars
are drawn adjacent to each other (without gaps).
60Example Percent of Hispanics in the adult
population, by state (2000)
61Step 1: Divide the range of the data into classes
of equal width.
- The data in the table range from 0.6 to 38.7 percent.
- We choose 8 intervals of length 5, that is,
- We choose our classes as follows
62Example Percent of Hispanics in the adult
population, by state (2000)
63Step 2: Count the number of individuals in
each class.
- These counts are called frequencies.
- A table of frequencies for all classes is a
frequency table.
64Step 3: Draw the histogram.
- First mark the scale for the variable whose
distribution you are displaying on the horizontal
axis. In this case, that is the percent of adults who are
Hispanic.
- The vertical axis contains the scale of counts.
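The three steps can be sketched in Python with a text histogram. This is an illustration under stated assumptions: the percentages below are made up (the state-by-state table is not reproduced in this transcript), each class is taken to include its lower endpoint but not its upper one (the slides do not say which convention they use), and the function name `class_counts` is hypothetical.

```python
def class_counts(values, start, width, n_classes):
    """Step 2: count values in classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return counts

# Hypothetical percentages, NOT the state data from the slide.
percents = [0.6, 1.5, 2.1, 2.8, 3.4, 4.7, 5.2, 6.9, 8.0, 13.3, 19.5, 28.1, 38.7]

# Step 1: 8 classes of equal width 5, starting at 0.
start, width, n_classes = 0, 5, 8
counts = class_counts(percents, start, width, n_classes)

# Step 3: one adjacent bar per class; bar length is the frequency.
for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower:>2} to under {lower + width:<2} | {'#' * count}")
```

Classes with a count of zero still get a row, which keeps the bars adjacent and makes gaps in the data visible.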
67Two Types of Histograms
68Histograms Another Example
- Days to Maturity for Short-Term Investments
- Frequency histogram
70Histograms Another Example
- Days to Maturity for Short-Term Investments
- Relative-frequency histogram
72Examining distributions
- In any graph of data, we look for the overall
pattern and for striking deviations from that
pattern.
- We can describe the overall pattern of a
distribution by its shape, center, and spread.
- An important kind of deviation is an outlier, an
individual value that falls outside the overall
pattern.
73Distributions Shapes
- An important aspect of the distribution of a
quantitative data set is its shape.
- The shape of a distribution plays a role in
determining the appropriate method of statistical
analysis.
- To identify the shape of a distribution, the best
approach usually is to use a smooth curve that
approximates the overall shape.
74Distributions Shapes
In later chapters, there will be frequent
reference to data with a normal distribution.
One key characteristic of a normal distribution
is that it has a bell shape.
- The frequencies start low, then increase to some
maximum frequency, then decrease to a low
frequency.
- The distribution should be approximately
symmetric.
75- For instance, the following table displays a
frequency and a relative-frequency distribution
for the heights of the 3264 female students who
attend a Midwestern college.
This is a good example of an approximately normal distribution.
76- The figure displays a relative-frequency
histogram for the same data. Notice the bell
shape.
- Included is a smooth curve that approximates the
overall shape of the distribution.
78- Both the histogram and the smooth curve show that
this distribution of heights is bell shaped, but
the smooth curve makes seeing the shape a little
easier.
79Advantages of using smooth curves
- We do not need to worry about minor differences
in shape.
- We can concentrate on overall patterns, which, in
turn, allows us to classify most distributions by
designating relatively few shapes.
- Most importantly, a smooth curve allows us to use
the tools of calculus.
80Some Common Distributions Shapes
81The relative-frequency histogram for household
size in the United States
The distribution of household sizes is right
skewed.
82The distribution of Babe Ruth's home run counts
is symmetric and unimodal. The range is from 22 to
60. There are no outliers.
The distribution of supermarket spending is
skewed to the right. The range is from $3 to
$93. Outliers?
83Population and Sample Distributions
- The data set obtained by observing the values of
a variable for an entire population is called
population data or census data.
- The data set obtained by observing the values of
a variable for a sample of the population is
called sample data.
- To distinguish their distributions, we use the
terminology population distribution and sample
distribution.
84Population and Sample Distributions
- For a particular population and variable, sample
distributions vary from sample to sample.
- However, there is only one population
distribution, namely, the distribution of the
variable under consideration on the population
under consideration.
85Population distribution and six sample
distributions for household size
86Population and Sample Distributions
- In practice, we usually do not know the
population distribution.
- We can use the distribution of a simple random
sample from the population to get a rough idea of
the population distribution.
- The larger the sample, the better the sample
distribution tends to approximate the population
distribution.
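A small simulation can illustrate the last point. The sketch below is an assumption-laden illustration, not the deck's own example: the population distribution of household size is hypothetical, sampling is done with replacement from that distribution (a reasonable stand-in for a simple random sample from a very large population), and the function names are illustrative. For each sample size it reports the largest gap between a sample relative frequency and the corresponding population proportion.

```python
import random
from collections import Counter

# Hypothetical population distribution of household size (size -> proportion).
population = {1: 0.27, 2: 0.34, 3: 0.16, 4: 0.14, 5: 0.06, 6: 0.03}

def sample_distribution(n, rng):
    """Relative-frequency distribution of a random sample of size n."""
    sizes = list(population)
    weights = list(population.values())
    sample = rng.choices(sizes, weights=weights, k=n)
    counts = Counter(sample)
    return {size: counts[size] / n for size in sizes}

def max_error(dist):
    """Largest absolute gap between sample and population proportions."""
    return max(abs(dist[size] - population[size]) for size in population)

rng = random.Random(0)  # fixed seed so repeated runs give the same samples
for n in (10, 100, 1000, 100_000):
    dist = sample_distribution(n, rng)
    print(f"n = {n:>6}: largest gap = {max_error(dist):.3f}")
```

Individual small samples can be lucky or unlucky, so the gap need not shrink at every step, but it tends to shrink as the sample size grows.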