Title: Chapter 1 Looking at Data
Chapter 3 Looking at Data: Distributions
Introduction
3.1 Displaying Distributions with Graphs
3.1 Displaying Distributions with Graphs
- Variables
- Examining Distributions of Variables
- Graphs for Categorical Variables
- Bar graphs
- Pie charts
- Graphs for Quantitative Variables
- Histograms
- Stemplots
- Time plots
Statistics
Statistics is the science of learning from
data. The first step in dealing with data is to
organize your thinking about the data. An
exploratory data analysis is the process of using
statistical tools and ideas to examine data in
order to describe their main features.
Exploring Data
- Begin by examining each variable by itself. Then move on to study the relationships among the variables.
- Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.
Variables
We construct a set of data by first deciding
which cases or observations or individuals or
units we want to study. For each case, we record
information about characteristics that we call
variables.
Individual: an object described by data.
Variable: a characteristic of the individual.
Categorical variable: places an individual into one of several groups or categories.
Quantitative variable: takes numerical values for which arithmetic operations make sense.
Quantitative Variables
- Quantitative variables can be counts, measurements, or rates.
- SEE EXAMPLE 3.6 ON PAGE 84 (IN THE CHAPTER 3 INTRODUCTION) FOR WHY RATES ARE IMPORTANT.
- A RATE is the number of occurrences of an event per X members of the population of all possible occurrences, where X is a large number (e.g., 10,000 or 100,000): $\text{rate per } X = \dfrac{\text{number of occurrences}}{\text{population size}} \times X$.
- Murder rate in NH County = (murders in NHC / possible murders, i.e., the population) × a large number (like 100,000).
- In 2012 there were 9 murders and the population estimate was 209,234, so $9/209{,}234 \approx 0.00004301$. Multiply by 100,000 to get the rate per 100,000: $0.00004301 \times 100{,}000 \approx 4.3$ murders per 100,000 people (equivalently, about 0.000043 murders per capita, i.e., per person).
- Guilford County: $26/500{,}879 \times 100{,}000 \approx 5.2$ murders per 100,000 people, etc.
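The rate arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the helper name `rate_per` is hypothetical, and the counts and populations are the ones quoted above.

```python
def rate_per(count, population, per=100_000):
    """Occurrences per `per` members of the population: (count / population) * per."""
    return count / population * per

nh = rate_per(9, 209_234)          # NH County, 2012
guilford = rate_per(26, 500_879)   # Guilford County

print(f"NH County: {nh:.1f} per 100,000")              # prints 4.3 per 100,000
print(f"Guilford County: {guilford:.1f} per 100,000")  # prints 5.2 per 100,000
```

Setting `per=1` gives the per-capita (per-person) rate, about 0.000043 for NH County.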
Distribution of a Variable
To examine a single variable, we graphically display its distribution.
- The distribution of a variable tells us what values it takes and how often it takes these values.
- Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the type of variable and how easy the graph is to draw. JMP makes easy work of graphing!
- Categorical variable: pie chart (I don't recommend pie charts!) or bar graph (these are fine!)
- Quantitative variable: histogram or stemplot
Categorical Variables
- The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into each category.
- Pie charts show the distribution of a categorical variable as a pie whose slices are sized by the counts or percents for the categories. They are hard to draw and hard to interpret!
- Bar graphs represent each category as a bar whose height shows the category count or percent.
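A minimal Python sketch of a categorical distribution: count how many individuals fall into each category, then convert the counts to percents. The helper `categorical_distribution` and the survey responses are made up for illustration.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)}, most common category first."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, 100 * c / n) for cat, c in counts.most_common()}

# Hypothetical answers to "How do you get to class?" (made-up data)
responses = ["Bus", "Car", "Walk", "Car", "Bike", "Car", "Walk", "Bus"]
for cat, (count, pct) in categorical_distribution(responses).items():
    print(f"{cat:<5} {count:>2} {pct:5.1f}%")
```

The percents always add to 100 (up to rounding), which is what lets them be drawn as bar heights.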
Pie Charts and Bar Graphs
Material                    Weight (million tons)    Percent of total
Food scraps                 25.9                     11.2
Glass                       12.8                     5.5
Metals                      18.0                     7.8
Paper, paperboard           86.7                     37.4
Plastics                    24.7                     10.7
Rubber, leather, textiles   15.8                     6.8
Wood                        12.7                     5.5
Yard trimmings              27.7                     11.9
Other                       7.5                      3.2
Total                       231.9                    100.0
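Since bar graphs are the recommended display, here is a rough text-only bar graph of the percents in this table, sketched in Python without any plotting library. `text_bar_graph` and its `scale` parameter are hypothetical names; in practice JMP (or a plotting library) would draw the real graph.

```python
# Percent of total for each material, copied from the table
percents = {
    "Food scraps": 11.2,
    "Glass": 5.5,
    "Metals": 7.8,
    "Paper, paperboard": 37.4,
    "Plastics": 10.7,
    "Rubber, leather, textiles": 6.8,
    "Wood": 5.5,
    "Yard trimmings": 11.9,
    "Other": 3.2,
}

def text_bar_graph(data, scale=1.0):
    """One line per category; bar length is round(value * scale) '#' characters."""
    width = max(len(name) for name in data)
    return [f"{name:<{width}} | {'#' * round(value * scale)} {value}"
            for name, value in data.items()]

for line in text_bar_graph(percents, scale=0.5):
    print(line)
```

Every bar starts at the same baseline, so comparing categories only means comparing lengths, which is why bar graphs are easier to read than pie slices.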
Quantitative Variables
- The distribution of a quantitative variable tells us what values the variable takes on and how often it takes on those values.
- Histograms show the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class.
- Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while keeping the original values of the variable; unlike in a histogram, the original values are not hidden.
- Time plots plot each observation (on the vertical axis) against the time at which it was measured (on the horizontal axis).
Stemplots
- To construct a stemplot:
- Separate each observation into a stem (the first part of the number, usually all but the final digit) and a leaf (the remaining part, usually the final digit).
- Write the stems in a vertical column and draw a vertical line to the right of the stems.
- Write each leaf in the row to the right of its stem; order the leaves if desired.
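The construction steps can be sketched in Python, assuming non-negative whole-number data so that the stem is all but the last digit and the leaf is the last digit. `stemplot` is a hypothetical helper and the values are made up, not the class weight data.

```python
from collections import defaultdict

def stemplot(data):
    """Stemplot for non-negative integers: stem = all but the last digit,
    leaf = last digit. Leaves are sorted and empty stems are kept."""
    leaves = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)      # e.g. 138 -> stem 13, leaf 8
        leaves[stem].append(leaf)
    w = len(str(max(leaves)))           # width of the stem column
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        lines.append(f"{stem:>{w}} | {row}".rstrip())
    return lines

for line in stemplot([120, 135, 138, 142, 150, 155, 161]):
    print(line)
```

Keeping stems with no leaves matters: an empty row is a visible gap in the distribution.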
Stemplots
[Figure: stemplot of weight data, Introductory Statistics class]
Stemplots
- If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.
- Example: If all of the data values were between 150 and 179, then we may choose to use the following stems: 15, 15, 16, 16, 17, 17.
Leaves 0 through 4 would go on each upper stem (the first 15), and leaves 5 through 9 would go on each lower stem (the second 15).
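A sketch of split stems under the same assumptions (non-negative whole numbers, leaf = last digit): each stem is written twice, with leaves 0 through 4 on the first copy and 5 through 9 on the second. `split_stemplot` and the data values are hypothetical.

```python
def split_stemplot(data):
    """Stemplot with each stem split in two: leaves 0-4 on the first copy,
    leaves 5-9 on the second. Assumes non-negative integers."""
    rows = {}
    for x in data:
        stem, leaf = divmod(x, 10)
        half = 0 if leaf <= 4 else 1        # 0 = first copy, 1 = second copy
        rows.setdefault((stem, half), []).append(leaf)
    first, last = min(rows), max(rows)
    lines = []
    stem, half = first
    while (stem, half) <= last:             # walk every half-stem, even empty ones
        leaves = "".join(str(d) for d in sorted(rows.get((stem, half), [])))
        lines.append(f"{stem} | {leaves}".rstrip())
        stem, half = (stem, 1) if half == 0 else (stem + 1, 0)
    return lines

# Made-up values between 150 and 179
for line in split_stemplot([150, 153, 157, 158, 162, 166, 171, 174, 179]):
    print(line)
```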
Histograms
- For quantitative variables that take many values and/or large datasets:
- Divide the possible values into classes of equal width.
- Count how many observations fall into each class (the counts may be changed to percents).
- Draw a picture representing the distribution: bar heights equal the number (or percent) of observations in each class.
- JMP does all three of the above with a couple of clicks: Analyze > Distribution > choose the variable to be plotted.
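The three histogram steps (equal-width classes, counts or percents, bars) can be sketched in Python. `histogram_table` is a hypothetical helper; it assumes each class includes its lower endpoint and excludes its upper one, so a value exactly on a boundary goes in the upper class. The weights are made up, not the class data.

```python
def histogram_table(data, start, width, n_classes):
    """Count observations in equal-width classes [start + k*width, start + (k+1)*width).

    Returns (lower, upper, count, percent) for each class.
    """
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)       # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} is outside the chosen classes")
        counts[k] += 1
    n = len(data)
    return [(start + k * width, start + (k + 1) * width, c, 100 * c / n)
            for k, c in enumerate(counts)]

# Hypothetical weights (made-up data)
weights = [120, 135, 138, 142, 150, 155, 161]
for lo, hi, count, pct in histogram_table(weights, start=120, width=10, n_classes=5):
    print(f"{lo} to <{hi}: {'#' * count} ({pct:.1f}%)")
```

Changing `width` changes the picture, so it is worth trying a few class widths, much as JMP lets you adjust the bins.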
Histograms
[Figure: histogram of weight data, Introductory Statistics class]
Examining Distributions
- In any graph of data, look for the overall pattern and for striking deviations from that pattern.
- You can describe the overall pattern by its shape, center, and spread.
- An important kind of deviation is an outlier, an individual that falls outside the overall pattern.
Examining Distributions: Shape
- A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
- A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
- A distribution is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure: example distributions labeled Symmetric, Skewed-left, and Skewed-right]
Outliers
- An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
This overall pattern is fairly symmetric, except for two states that clearly do not belong to the main pattern. Alaska and Florida have unusual shares of elderly residents in their populations: unusually low in Alaska and unusually high in Florida. Large gaps in the distribution are places to look for outliers.
[Figure: histogram of the state data, with Alaska and Florida marked as outliers]
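Scanning for large gaps can be automated in a rough way: sort the values and look at the biggest jumps between neighbors. This is only a screening aid, not a formal outlier rule, and it does not replace looking at the graph. `largest_gaps` and the sample values are hypothetical, not the state data.

```python
def largest_gaps(values, k=2):
    """Return the k largest gaps between neighboring sorted values
    as (lower value, upper value, gap size), largest first."""
    xs = sorted(values)
    gaps = [(lo, hi, hi - lo) for lo, hi in zip(xs, xs[1:])]
    return sorted(gaps, key=lambda g: g[2], reverse=True)[:k]

# Made-up values with one low and one high straggler
sample = [4.0, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 19.0]
print(largest_gaps(sample))
```

Values on the far side of a big gap (here 4.0 and 19.0) are the ones to investigate and, if possible, explain.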
Time Plots
- A time plot shows the behavior of a quantitative variable over time.
- Time is always on the horizontal axis, and the variable being plotted is on the vertical axis.
- Look for an overall pattern (trend) and for deviations from this trend. Connecting the data points by lines may emphasize this trend.
- Look for patterns that repeat at known regular intervals (seasonal variations).
- Go over the US Regular Retail Gas Prices in JMP.
Time Plots
Look at the gas price data.
HW
- Use JMP whenever possible to draw the graphs.
- Begin reading the Introduction to Chapter 3 and Section 3.1, work through the Examples in 3.1, and do Exercises 3.7, 3.10-3.14, 3.21, 3.24, 3.25, 3.27, 3.32, 3.33-3.36, 3.38 (JMP), and 3.39.