Title: Chapter 1: Exploring Data
1Chapter 1 Exploring Data 1.1 Displaying
Distributions with Graphs
2Types of Graphs
Categorical: Bar Chart, Pie Chart
Quantitative: Dotplot, Stemplot, Histogram, Ogive, Time Plot
3Bar graph
Displays categorical variables
Title
How to construct a bar graph
Step 1 Label your axes and title graph
Step 2 Scale your axes
Step 3 Leave spaces between bars
4Side-by-Side bar graph
Compares two variables of one individual
Title
5Example 1 The table shows results of a poll
asking adults whether they were looking forward
to the Super Bowl game, the commercials, or
didn't plan to watch.
              Male   Female   Total
Game           279      200     479
Commercials     81      156     237
Won't Watch    132      160     292
Total          492      516    1008
Construct a side-by-side bar chart of
preference by gender. Note any trends that
appear.
6Reason Looking Forward to Super Bowl
[Side-by-side bar graph: counts from 0 to 300 for Game, Commercials, and Won't Watch, grouped by Female and Male]
Males overwhelmingly watch the Super Bowl for the
game, whereas females seem mixed as to why they want
to watch it.
7Describing Quantitative Distributions
When describing a Graph -- CUSS
C - Center
Mean: Average value; add up the values, then divide by how many there are.
Mode: Most frequent number. There can be many modes.
Median: Number in the center when the data is lined up in order.
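The three measures of center can be checked with a short Python sketch. The data list is only an illustration (the values are borrowed from Ex9 later in these notes), and the variable names are assumptions made for this example.

```python
from statistics import mean, median, multimode

# Illustrative data (same values as Ex9 later in these notes).
data = [6, 4, 4, 3, 2, 6, 10]

center_mean = mean(data)        # add up the values, then divide by how many there are
center_median = median(data)    # middle value once the data are lined up in order
center_modes = multimode(data)  # most frequent value(s); there can be more than one

print("mean:", center_mean)
print("median:", center_median)
print("modes:", center_modes)
```

This list has two modes, which matches the note that there can be many modes.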
8Calculator Tip
To calculate mean and median:
STAT > EDIT, type in data, exit, STAT > CALC >
1-Var Stats > L1
9Describing Quantitative Distributions
When describing a Graph -- CUSS
U - Unusual points
Any data points that stand out as different.
Don't call them outliers yet!
10Describing Quantitative Distributions
When describing a Graph -- CUSS
S - Shape
Symmetric: Fold in half, it matches up
Bell/Normal: Special case, don't say yet!
Uniform: All the same frequencies
11S - Shape
Unimodal: One peak in the data
Bimodal: Two peaks in the data
12S - Shape
Gaps: Space between the data
Cluster: Several data points grouped together
13S - Shape
Skewed Right: Tail or unusual point to the right
Skewed Left: Tail or unusual point to the left
14Describing Quantitative Distributions
When describing a Graph -- CUSS
S - Spread
Range: Distance between largest and smallest values.
Range = Maximum - Minimum
Homogeneous: Data is all in a similar space (small spread)
16Dotplot
Dots are used to keep count of the frequency of
each number
How to construct a dotplot
Step 1 Label your axis and title your graph.
Step 2 Mark a dot above the corresponding value
TITLE
17Example 2 The data below give the number of
hurricanes classified as major hurricanes in the
Atlantic Ocean each year from 1944 through 2006,
as reported by NOAA.
3 2 1 2 4 3 7 2 3 3 2 5 2 2 4 2 2
6 0 2 5 1 3 1 0 3 2 1 0 1 2 3 2 1
2 2 2 3 1 1 1 3 0 1 3 2 1 2 1 1 0
5 6 1 3 5 3 3 2 3 6 7 2 6 8
a. Make a dotplot of the data.
18[Dotplot: Number of Hurricanes Classified as a Major
Hurricane (1944-2006), axis from 0 to 8]
b. Describe what you see in a few sentences.
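A dotplot is just a tally of how often each value occurs, so it can be sketched in a few lines of Python. This is a minimal text version, using the hurricane counts exactly as listed in Example 2; the function name `text_dotplot` is an assumption for this illustration, and the dots run sideways rather than stacking upward.

```python
from collections import Counter

# Major hurricanes per year, as listed in Example 2.
hurricanes = [
    3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2,
    6, 0, 2, 5, 1, 3, 1, 0, 3, 2, 1, 0, 1, 2, 3, 2, 1,
    2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0,
    5, 6, 1, 3, 5, 3, 3, 2, 3, 6, 7, 2, 6, 8,
]

counts = Counter(hurricanes)  # frequency of each value

def text_dotplot(counts):
    """Return one line per value, with one dot per observation."""
    lines = []
    for value in range(min(counts), max(counts) + 1):
        lines.append(f"{value} | " + "." * counts.get(value, 0))
    return lines

for line in text_dotplot(counts):
    print(line)
```

The longest row of dots marks the mode, and rows that trail off toward the larger values suggest a skew to the right.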
19- A dotplot is a simple display. It just places a
dot along an axis for each case in the data.
- The dotplot to the right shows Kentucky Derby
winning times, plotting each race as its own dot.
- You might see a dotplot displayed horizontally or
vertically.
20Guidelines for constructing Stemplots (stem and
leaf)
1. Put data in order from smallest to largest
2. Separate each value into a STEM and a LEAF. The
leaf is a single digit and it is the rightmost
digit of the number. The stem consists of
everything else to the left of the leaf.
3. Stems go in a vertical column from small to
large and a vertical line is drawn to the
right of the stems
4. Leaves are written to the right of their
stems from small to large.
21Back-to-Back Stemplots
Used to compare two different sets of data.
Split Stemplots
Used to spread out the data to see more trends if the
values are grouped together. Each stem is split into two rows:
leaves 0-4 and leaves 5-9.
22Example 3 The data below give the amount of
caffeine content (in milligrams) for an 8-ounce
serving of popular soft drinks.
20 15 23 29 23 15 23 31 28 35 37 27 24 26 47 28 24 28 28
16 38 36 35 37 27 33 37 25 47 27 29 26 43 43 28 35 31 25
a. Construct a stemplot.
b. Construct a split stemplot.
23Caffeine per 8oz of soda
a.
1 | 5 5 6
2 | 0 3 3 3 4 4 5 5 6 6 7 7 7 8 8 8 8 8 9 9
3 | 1 1 3 5 5 5 6 7 7 7 8
4 | 3 3 7 7
Key: 1 | 5 = 15 mg
b.
1 | 5 5 6
2 | 0 3 3 3 4 4
2 | 5 5 6 6 7 7 7 8 8 8 8 8 9 9
3 | 1 1 3
3 | 5 5 5 6 7 7 7 8
4 | 3 3
4 | 7 7
c. Differences?
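The stemplot guidelines (sort, split each value into stem and leaf, optionally split each stem into 0-4 and 5-9) can be sketched in Python as follows. This is a rough illustration for whole-number data only; the function name `stemplot` is an assumption, and unlike a hand-drawn stemplot it does not print rows for stems that have no leaves.

```python
from collections import defaultdict

# Caffeine (mg) per 8-ounce serving, as listed in Example 3.
caffeine = [
    20, 15, 23, 29, 23, 15, 23, 31, 28, 35, 37, 27, 24, 26, 47, 28, 24, 28, 28,
    16, 38, 36, 35, 37, 27, 33, 37, 25, 47, 27, 29, 26, 43, 43, 28, 35, 31, 25,
]

def stemplot(values, split=False):
    """Return stemplot rows as (stem, leaves) pairs for whole-number data.

    The leaf is the ones digit and the stem is everything to its left.
    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)
    return [(stem, leaves) for (stem, _), leaves in sorted(rows.items())]

for stem, leaves in stemplot(caffeine):
    print(stem, "|", " ".join(map(str, leaves)))
print()
for stem, leaves in stemplot(caffeine, split=True):
    print(stem, "|", " ".join(map(str, leaves)))
```

Splitting the stems turns four rows into seven, which makes it easier to see that most drinks sit in the upper half of the 20s.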
24www.whfreeman.com/tps3e
1-Var Stats
26Most people believe that you need to drink coffee
or an energy drink to get a good buzz from the
caffeine. Below is a table with common caffeine
levels of tea, coffee, and energy drinks.
Coffee
133 160 150 103 150 93 150 115 75 75 40
Energy Drink
160 144 100 100 95 83 80 80 80 79
74 50 48
d. Make a back-to-back stemplot. Comment on the
difference in caffeine levels between coffee and
energy drinks.
27Energy Drink | Stem | Coffee
          8 |  4 | 0
          0 |  5 |
            |  6 |
        9 4 |  7 | 5 5
    3 0 0 0 |  8 |
          5 |  9 | 3
        0 0 | 10 | 3
            | 11 | 5
            | 12 |
            | 13 | 3
          4 | 14 |
            | 15 | 0 0 0
          0 | 16 | 0
Key: 1 | 5 = 15 mg
28http://www.cspinet.org/new/cafchart.htm
30Stem | leaf splits shown on the slide:
562, 56.2, 5.62: 56 | 2
562: 5 | 6
50: 5 | 0
2: 0 | 2
31Back-to-Back Stemplots
To compare two different sets of data
Data: 565 562 572 580 577 565
5 2 | 56 | 5
    | 57 | 2 7
  0 | 58 |
32Split Stemplots
To spread out the data to see more trends if they
are grouped together. Leaves will split from 0-4
and 5-9.
Data: 565 562 572 580 577 565
56 | 2
56 | 5 5
57 | 2
57 | 7
58 | 0
33Count towards median
median
Count towards median
34Calculator Tip
Sort values from smallest to largest:
STAT > EDIT, type in data, exit, STAT > SortA >
L1
35Calculator Tip
Clearing Lists
All lists: MEM > ClrAllLists > ENTER
One list: STAT > EDIT, highlight the list name,
CLEAR
36Calculator Tip
Deleted a list?
STAT SetUpEditor Enter
37Calculator Tip
Save a list?
L1 STO→ any name or letter
To retrieve later: 2nd LIST
38Calculator Tip
Remove a number from list?
Line up number you want to delete, hit DEL
39Histogram
1. Divide the range of data into classes of
equal width.
2. Count the number of observations in each
class. Ensure no one number falls into two
classes
3. Label and scale the axes and title your graph.
4. Draw a bar that represents the count in each
class. The base of a bar should cover its
class, and the bar height is the class
count. Leave no horizontal space between
the bars unless the class is empty.
40Make a histogram. Pg. 59
Calculator Tip
STAT > EDIT, type in data, exit, STATPLOT > 1 >
On > histogram > L1 > Freq 1, then ZOOM > ZoomStat (9)
41To adjust the classes: WINDOW
Xmin = Lowest value
Xmax = Highest value
Xscl = Scale on x-axis (width of bars)
Ymin = -0.2 typically
Ymax = Highest frequency (height of bars)
Yscl = Scale on y-axis
42Ex. 4 Describe the distribution of the graph.
C: 4-5 words
U: 12 words
S (shape): Unimodal, slight skew right
S (spread): 1 to 12, Range = 11
43Example 5 An executive finds that the subscriptions
(in millions of people) of the 20 leading
American magazines are as follows:
Reader's Digest            17.9    Ladies' Home Journal   5.3
TV Guide                   17.1    National Enquirer      4.7
National Geographic        10.6    Time                   4.6
Modern Maturity             9.3    Playboy                4.2
AARP News Bulletin          8.8    Redbook                4
Better Homes and Gardens    8      The Star               3.7
Family Circle               7.2    Penthouse              3.5
Woman's Day                 7      Newsweek               3
McCall's                    6.4    Cosmopolitan           3
Good Housekeeping           5.4    People Weekly          2.8
Make a histogram of the number of subscriptions
using classes of width 2 (million), with the
frequency of each class on the vertical axis. Then describe the
graph.
44Circulation in millions of people of American
Magazines
[Histogram: Frequency (1 to 8) on the vertical axis, Circulation (in millions) from 2 to 20 on the horizontal axis]
Describe the features of the graph in detail.
C: mean 6.825, median 5.35
U: 17.1, 17.9
S (shape): Skewed to the right, unimodal
S (spread): 2.8 to 17.9, range of 15.1
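The histogram steps (equal-width classes, each value counted in exactly one class) can be sketched in Python using the magazine data from Example 5. The starting point of 2 and the choice of eight classes are assumptions made for this illustration, and `class_counts` is a hypothetical helper name.

```python
from statistics import mean, median

# Circulation (millions) of the 20 magazines in Example 5.
circulation = [
    17.9, 17.1, 10.6, 9.3, 8.8, 8, 7.2, 7, 6.4, 5.4,
    5.3, 4.7, 4.6, 4.2, 4, 3.7, 3.5, 3, 3, 2.8,
]

def class_counts(values, start, width, n_classes):
    """Count observations in classes [start, start+width), [start+width, ...).

    Each class includes its left endpoint only, so no value lands in two classes.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        counts[index] += 1
    return counts

counts = class_counts(circulation, start=2, width=2, n_classes=8)
for i, c in enumerate(counts):
    low = 2 + 2 * i
    print(f"{low:>2} to under {low + 2:<2} | " + "#" * c)

print("mean:", round(mean(circulation), 3))
print("median:", round(median(circulation), 2))
print("range:", round(max(circulation) - min(circulation), 1))
```

The two empty classes between 12 and 16 show up as blank rows, which is how the gap before the two largest magazines appears in the histogram.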
45Height of NBA Players
46http://bcs.whfreeman.com/tps3e
Page 50 applets: One-variable Statistical
calculator
- How do you determine how many classes to make?
- When is it good to split the stems on a stemplot?
47HW
P2 1.1 Types of Graphs Bar graph Dotplot Stemplot Histogram Describing a Graph 19 7, 9
P2 1.1 Types of Graphs Bar graph Dotplot Stemplot Histogram Describing a Graph 47 57-58 109 3(ab only) 11 51
48Day 3
1.1 1.2
49Relative Cumulative Frequency Graph (Ogive)
Shows relative standing of an observation
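As a rough sketch of how the points on an ogive are computed, the Python below finds the percent of observations that fall below each class boundary. It reuses the magazine circulation data from Example 5 with boundaries every 2 million; the boundaries and the function name `relative_cumulative` are assumptions made for this illustration.

```python
# Circulation (millions) of the 20 magazines in Example 5.
circulation = [
    17.9, 17.1, 10.6, 9.3, 8.8, 8, 7.2, 7, 6.4, 5.4,
    5.3, 4.7, 4.6, 4.2, 4, 3.7, 3.5, 3, 3, 2.8,
]
upper_bounds = [2, 4, 6, 8, 10, 12, 14, 16, 18]

def relative_cumulative(values, upper_bounds):
    """Percent of observations below each class boundary."""
    n = len(values)
    return [100 * sum(v < b for v in values) / n for b in upper_bounds]

ogive = relative_cumulative(circulation, upper_bounds)
for bound, pct in zip(upper_bounds, ogive):
    print(f"below {bound:>2}: {pct:5.1f}%")
```

Plotting these percents against the boundaries and connecting the points gives the ogive. A flat stretch means no observations fall in that class.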
50Example 6 The President of the United States
has to be at least 35 years old and be born in
America. Below is an ogive showing the relative
cumulative frequency of the ages at which previous
presidents were inaugurated.
a. What percent of presidents were younger than 60?
80%
b. What percent of presidents were between 50 and
55?
30%
c. There is a horizontal line between 35 and 40
years of age. What does that mean?
No presidents were less than 40 years old
d. What is the median age of the presidents
at inauguration?
55
e. President Obama was 47 when he was
inaugurated. What percent of presidents were
older than him?
85%
55Time Plots
Plots each observation against the time at which
it was measured. Always mark the time scale on
the horizontal axis and the variable being
measured on the vertical axis.
Trend: A common overall pattern.
Seasonal Variations: A pattern that repeats itself at regular time
intervals.
56Ex. 7 Identify any trends and describe the
time plot.
Seems to fluctuate, peaking in 1983
58Chapter 1 Exploring Data 1.2 Describing
Distributions with Numbers
59Mean: The average of a set of data. Add
the values in the data set and divide by the
number of observations.
For n observations,
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
or
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
60Ex8 Find the mean for the two sets of data.
Data set A: 1 1 2 2 3
Data set B: 1 1 2 2 500,000
Data set A: mean = 9/5 = 1.8
Data set B: mean = 500,006/5 = 100,001.2
What happened?
The mean is strongly influenced by unusual values.
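61Variance
Average of the squares of the deviations of the
observations from their mean
$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$
or
$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
62Standard Deviation
The square root of the variance
Measures the average distance the values are away
from the mean.
Degrees of Freedom
Dividing by n - 1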
63Calculator Tip
Standard Deviation
1-var stats L1
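64Ex9 Calculate the Standard Deviation by Hand
Data Set: 6, 4, 4, 3, 2, 6, 10
Mean = 5
Sum of squared deviations:
(6-5)^2 + (4-5)^2 + (4-5)^2 + (3-5)^2 + (2-5)^2 + (6-5)^2 + (10-5)^2
= (1)^2 + (-1)^2 + (-1)^2 + (-2)^2 + (-3)^2 + (1)^2 + (5)^2
= 42
Standard deviation = sqrt(42 / (7 - 1)) = sqrt(7) = 2.64575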
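The hand calculation in Ex9 can be retraced step by step in Python. This is a minimal sketch that uses the sample formula (dividing by n - 1); the variable names are assumptions for this illustration.

```python
from math import sqrt

data = [6, 4, 4, 3, 2, 6, 10]

n = len(data)
x_bar = sum(data) / n                            # mean
squared_devs = [(x - x_bar) ** 2 for x in data]  # squared deviation of each value
variance = sum(squared_devs) / (n - 1)           # divide by n - 1 (degrees of freedom)
s = sqrt(variance)                               # back to the original units

print("mean:", x_bar)
print("sum of squared deviations:", sum(squared_devs))
print("variance:", variance)
print("standard deviation:", round(s, 5))
```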
65Example 10 Using the numbers 1-10, choose 4
numbers so the standard deviation will be the
smallest. Then choose 4 numbers so the standard
deviation will be the largest. (Repeats are ok)
Smallest: 1, 1, 1, 1 (Sx = 0)
Largest: 1, 1, 10, 10 (Sx = 5.196)
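One way to check the answers to Example 10 is to try every possible choice of four numbers from 1 to 10 (repeats allowed) and compare their standard deviations. The brute-force search below is only a sketch for verification, not the reasoning the example is after.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every way to choose 4 numbers from 1-10 when repeats are allowed.
choices = list(combinations_with_replacement(range(1, 11), 4))

smallest = min(choices, key=stdev)  # first choice found with the smallest Sx
largest = max(choices, key=stdev)   # choice with the largest Sx

print("smallest:", smallest, "Sx =", round(stdev(smallest), 3))
print("largest:", largest, "Sx =", round(stdev(largest), 3))
```

Any four equal numbers give Sx = 0, so 1, 1, 1, 1 is just the first of ten equally good answers for the smallest case.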
66http://www.stat.tamu.edu/west/ph/stddev.html
67Example 11 Which graph will have the larger
standard deviation? Why?
a. b. c.
d. e.
68Properties of the standard deviation and variance
- Sensitive to outliers.
- Some deviations are positive and some are
negative (that's why we square them!). Otherwise,
they would add up to zero and tell us nothing
about the spread around the mean. Then, to get
back to the original units, we take the square root.
69Properties of the standard deviation and variance
- Standard deviation is zero or greater, but never negative.
- Values that are very close together have a
small standard deviation and those far
apart have a large standard deviation.
701.1 1.2 Ogives Time Plot Mean Variance Standard Deviation 64-69 89 101 13(ab only), 22, 23, 26 39, 43 54 Curriculum Night
71Day 4 1.2
72Median: The midpoint, or value where half of the
data is above the median and half is below the
median (50% mark).
- To find the median:
- Put all the data in order from smallest to
largest.
- Cancel off the end data points until you find
the middle.
73Resistant measure
Good estimate even when there are very unusual
values.
74Ex12 Find the median for the two sets of data.
Data set A: 1 1 2 2 3
Data set B: 1 1 2 2 500,000
Data set A: M = 2
Data set B: M = 2
Which one is a resistant measure? Mean or Median?
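A quick Python comparison of the two data sets shows which measure resists the unusual value. This is a small sketch; the variable names are assumptions for this illustration.

```python
from statistics import mean, median

data_a = [1, 1, 2, 2, 3]
data_b = [1, 1, 2, 2, 500_000]  # same as A except for one very unusual value

for name, data in (("A", data_a), ("B", data_b)):
    print(f"Data set {name}: mean = {mean(data)}, median = {median(data)}")
```

The mean jumps from 1.8 to 100,001.2 while the median stays at 2, so the median is the resistant measure.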
75pth percentile
The value such that p percent of the observations fall at or below it
76Quartiles
25th percentile = first quartile = Q1
50th percentile = median = Q2
75th percentile = third quartile = Q3
Five-Number Summary
Min, Q1, M, Q3, Max
77Boxplot
Uses the five-number summary. A box is drawn
connecting Q1 and Q3 with a line through the
median. Whiskers are drawn to the max and min.
[Boxplot above a number line, labeled min, Q1, med, Q3, max; each of the four sections holds 25% of the data]
78Interquartile Range
IQR = Q3 - Q1
Outliers
Data that is away from the majority of points
To Determine
Lower Outlier: below Q1 - 1.5(IQR)
Upper Outlier: above Q3 + 1.5(IQR)
All values that are not outliers should be between these two numbers
79Outliers
[Modified boxplot above a number line, labeled min, Q1, med, Q3, max, with outliers marked as separate points]
Keep in mind, you don't know how much data is in
a boxplot!
81Calculator Tip
Boxplots.
Pg. 81
STAT > EDIT, type in data, exit, STATPLOT > 1 > On >
boxplot (with or without outliers) > L1
82Calculator Tip
5-Number Summary
Pg. 81
STAT > CALC > 1-Var Stats > L1
83Ex 13 The fuel economy of 2004 vehicles is
given.
13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28
28 29 32 66
a. Determine the 5-number summary.
Min = 13, Q1 = 18, Med = 23, Q3 = 28, Max = 66
84b. Calculate the range and IQR.
Range = 66 - 13 = 53
IQR = 28 - 18 = 10
85c. Make a box plot using the 5-number summary.
[Boxplot on an axis from 10 to 70]
d. Describe the shape, center, and spread.
C: Median = 23
U: 66
S (shape): Skewed right
S (spread): Range = 53, IQR = 10
86e. Are there any potential outliers using the
1.5(IQR) criterion?
Q1 - 1.5(IQR) = 18 - 1.5(10) = 18 - 15 = 3
Q3 + 1.5(IQR) = 28 + 1.5(10) = 28 + 15 = 43
Yes, 66 is above 43.
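The five-number summary and the 1.5(IQR) check for Ex 13 can be sketched in Python. The quartiles here are taken as the medians of the lower and upper halves of the sorted data, which reproduces the values above; other software may use a slightly different quartile rule. The function name `five_number_summary` is an assumption for this illustration.

```python
from statistics import median

# Fuel economy data from Ex 13.
mpg = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23, 23,
       24, 25, 25, 26, 28, 28, 28, 29, 32, 66]

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half.

    When the count is odd, the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

minimum, q1, med, q3, maximum = five_number_summary(mpg)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr   # anything below this is a potential outlier
high_fence = q3 + 1.5 * iqr  # anything above this is a potential outlier
outliers = [x for x in mpg if x < low_fence or x > high_fence]

print("five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", iqr, "fences:", low_fence, high_fence)
print("potential outliers:", outliers)
```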
87f. Construct a modified boxplot to account for
the outlier.
[Modified boxplot on an axis from 10 to 70]
88Ozone and Outliers: The 'ozone hole' above
Antarctica provides the setting for one of the
most infamous outliers in recent history. It is a
great story to tell students who wantonly delete
outliers from a dataset merely because they are
outliers. In 1985 three researchers (Farman,
Gardiner and Shanklin) were puzzled by some data
gathered by the British Antarctic Survey showing
that ozone levels for Antarctica had dropped 10%
below normal January levels. The puzzle was why
the Nimbus 7 satellite, which had instruments
aboard for recording ozone levels, hadn't
recorded similarly low ozone concentrations. When
they examined the data from the satellite it
didn't take long to realize that the satellite
was in fact recording these low concentration
levels and had been doing so for years. But
because the ozone concentrations recorded by the
satellite were so low they were being treated as
outliers by a computer program and discarded! The
Nimbus 7 satellite had in fact been gathering
evidence of low ozone levels since 1976. The
damage to our atmosphere caused by
chlorofluorocarbons went undetected and untreated
for up to nine years because outliers were
discarded without being examined. Moral: Don't
just toss out outliers, as they may be the most
valuable members of a dataset.
90Weight of NBA Players
92- Compare the histogram and boxplot for daily wind
speeds.
- How does each display represent the distribution?
93Matching Histograms and Boxplots Match each
histogram with its boxplot, by writing the letter
of the boxplot in the space provided.
941.
D
95A
2.
963.
C
974.
E
985.
B
991970 Draft
Was the draft fair?
1001971 Draft
1051.2 Percentile Median Quartiles Boxplot IQR Determine outlier 82-84 106-107 33, 36, 37 61(a only), 62
106Day 5 1.2
107Comparing Distributions
Make sure you actually compare!
Don't just state CUSS, but compare the values.
109Percent change in population from 1990 to 2000
110http://www.ruf.rice.edu/lane/stat_sim/descriptive/
Mean and median applet.
www.whfreeman.com/tps3e
Pg. 73
Mean and median applet.
111If the data is uniform or symmetric, use:
Center: Mean
Spread: Standard deviation
If the data is skewed, use:
Center: Median
Spread: Five-number summary, Range, IQR
113Who's Counting It's Mean to Ignore the Median
Reading Economic Numbers from Democratic,
Republican Points of View Aug. 6, 2006 -
Believe it or not, the difference in the way the
Democrats and Republicans react to the
performance of the U.S. economy is clarified by a
mathematical distinction studied in elementary
school. The distinction is between the mean,
which the Republicans emphasize, while the
Democrats prefer the median. The relevance of
this distinction is apparent in the just-released
figures on the U.S. economy for 2004, the latest
year for which there is complete data. The
Republicans chortle that the economy grew at a
healthy rate of 4.2 percent. (It's slowed since
then.) The Democrats point to data from the
Census Bureau for the same year (and earlier as
well), indicating that the real median family
income fell and that poverty increased.
114Example 14 Should you use the mean or median to
discuss the center?
- Average price of home: Median
- Average age: Mean
- Average height: Mean
- Average gas mileage for all cars: Median
115Linear Transformation
Change in the measurement unit where you add or
multiply the data
116Matching Histograms and Summary
Statistics Match each histogram with a set of
summary statistics, by writing the letter in the
space provided.
117D. mean 10.2 standard deviation
4.1 median 11.9 IQR 6.8
1.
D
118A. mean 10.5 standard deviation
1.4 median 10.7 IQR 2.0
2.
A
119B. mean 10.1 standard deviation
2.7 median 10.1 IQR 4.2
3.
B
120E. mean 8.8 standard deviation
2.8 median 8.0 IQR 1.9
4.
E
121C. mean 10.2 standard deviation
2.1 median 10.5 IQR 2.5
5.
C
122Example 15 Consider the following data set: 1,
1, 1, 3, 4, 4, 5, 5. Transform the data.
           Original Data   Multiply by 3   Add 4
Mean       3               9               7
Median     3.5             10.5            7.5
S.D.       1.77            5.32            1.77
Q1         1               3               5
Q3         4.5             13.5            8.5
IQR        3.5             10.5            3.5
Range      4               12              4
[Dotplots and boxplots: original data on an axis from 1 to 5, data multiplied by 3 on an axis from 3 to 15, data with 4 added on an axis from 1 to 9]
131Conclusion
Multiply: Changes both center and spread.
Add: Changes the mean and five-number summary. Spread doesn't
change.
Middle always Moves
Spread Sometimes Shifts
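The conclusion can be checked on the Example 15 data with a short Python sketch. The helper name `summary` is an assumption for this illustration, and it reports only the mean, median, standard deviation, and range.

```python
from statistics import mean, median, stdev

original = [1, 1, 1, 3, 4, 4, 5, 5]
times_3 = [3 * x for x in original]  # multiply every value by 3
plus_4 = [x + 4 for x in original]   # add 4 to every value

def summary(values):
    """Center (mean, median) and spread (standard deviation, range)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "sd": round(stdev(values), 2),
        "range": max(values) - min(values),
    }

for label, values in (("original", original),
                      ("multiply by 3", times_3),
                      ("add 4", plus_4)):
    print(label, summary(values))
```

Multiplying scales the mean, median, standard deviation, and range by 3. Adding moves the mean and median by 4 and leaves the standard deviation and range alone.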
132Mean
Standard Deviation
133Example 16 True or False.
a. If you add 7 to each entry on a list, that adds
7 to the mean. TRUE
b. If you add 7 to each entry on a list, that adds
7 to the standard deviation. FALSE
c. If you double each entry on a list, that
doubles the mean. TRUE
134Example 16 True or False.
d. If you double each entry on a list, that
doubles the standard deviation. TRUE
e. Multiplying each entry on a list changes the
mean. TRUE
f. Multiplying each entry on a list changes the
standard deviation. TRUE
135Example 16 True or False.
g. Adding to each entry on a list changes the
mean. TRUE
h. Adding to each entry on a list changes the
standard deviation. FALSE
136Example 17 A college professor gave a test to
his students. The test had five questions, each
worth 20 points. The summary statistics for the
students' scores on the test are below. After
grading the test, the professor realized that,
because he had made a typographical error in
question number 2, no student was able to answer
the question. So he decided to adjust the
students' scores by adding 20 points to each one.
What will be the summary statistics for the new,
adjusted scores?
Summary Statistics for Scores   Original   NEW
Mean                            62         82
Median                          60         80
Range                           45         45
Standard Deviation              8          8
Q1                              48         68
Q3                              71         91
IQR                             23         23
137Example 18 The summary statistics for the
property tax per property collected by one county
are below. This year, county residents voted to
increase property taxes by 2 percent to support
the local school system. What will be the
summary statistics for the new, increased
property taxes?
Summary Statistics for Property Tax   Original   NEW
Mean                                  12,000     12,240
Median                                8,000      8,160
Range                                 30,000     30,600
Standard Deviation                    5,000      5,100
Q1                                    5,000      5,100
Q3                                    14,000     14,280
IQR                                   9,000      9,180
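Examples 17 and 18 apply the same rule to summary statistics instead of raw data, which can be sketched in Python. The function name `transform_summary` and the dictionary keys are assumptions for this illustration, and Q1 and Q3 are taken as the smaller and larger of the two quartile values listed in each example.

```python
def transform_summary(stats, multiply_by=1.0, add=0.0):
    """Apply new = multiply_by * old + add to a dict of summary statistics.

    Measures of position (mean, median, quartiles) are scaled and shifted.
    Measures of spread (range, SD, IQR) are only scaled; adding does not change them.
    Assumes multiply_by is positive.
    """
    spread = {"range", "sd", "iqr"}
    new = {}
    for name, value in stats.items():
        if name in spread:
            new[name] = multiply_by * value
        else:
            new[name] = multiply_by * value + add
    return new

# Example 17: add 20 points to every score.
scores = {"mean": 62, "median": 60, "range": 45, "sd": 8,
          "q1": 48, "q3": 71, "iqr": 23}
# Example 18: increase every tax bill by 2 percent.
taxes = {"mean": 12000, "median": 8000, "range": 30000, "sd": 5000,
         "q1": 5000, "q3": 14000, "iqr": 9000}

new_scores = transform_summary(scores, add=20)
new_taxes = transform_summary(taxes, multiply_by=1.02)

print({name: round(value) for name, value in new_scores.items()})
print({name: round(value) for name, value in new_taxes.items()})
```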
1381.2 Mean vs. Median Describing a Graph Choosing a Summary Linear Transformations 55-57 74-75 82 89 97 102 110-111 7, 10 27, 31, 32 35 40, 42 45, 46 58 68, 70
Research Project Due Soon!