Title: Graphs, Good and Bad
Chapter 10
Chapters 10-11: Summarizing Data
- Presentation of data
- Frequency Tables
- Pictures of Data
- Numerical Summaries (Center and Variation)
Summarizing Data
- Data is often collected in order to answer some question or address a specific issue.
- When analyzing a data set, we should first consider whether the data comes from a complete population (remember, this means taking a census; for example, the test scores of everyone in this class) or from a sample.
- Methods of descriptive statistics are used to summarize the important characteristics of a set of data.
Important Characteristics of Data
- The following characteristics of data are usually important:
- Center: an average value that indicates where the middle of the data set is located.
- Variation/Spread: a measure of the amount of variation in the data (average variation from the center).
- Distribution: the shape of the distribution of the data (symmetric, uniform, or skewed).
- Outliers: sample values that lie far away from the vast majority of the other values.
- Time Trend: changing characteristics of the data over time.
Frequency Tables
- A frequency table lists data values (either individually or by groups of intervals called classes), along with the number of items that fall into each class (frequency).
- Example:
  Test Score   Frequency
  0-4          3
  5-9          10
  10-14        12
  15-19        35
  20-24        20
  25-29        15
  30-34        5
- This frequency table has 7 classes (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34). The frequency represents the number of students whose score falls in that class.
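As a quick check on the table above, here is a minimal sketch that stores the classes as (low, high) pairs, totals the frequency column, and finds the class for an individual score. It assumes whole-number scores and classes of width 5 starting at 0; `TABLE` and `class_of` are hypothetical names chosen for this example.

```python
# Test-score frequency table from the slide: (low, high) class limits -> frequency.
# Assumes whole-number scores and classes of width 5 starting at 0.
TABLE = {
    (0, 4): 3,
    (5, 9): 10,
    (10, 14): 12,
    (15, 19): 35,
    (20, 24): 20,
    (25, 29): 15,
    (30, 34): 5,
}


def class_of(score, width=5):
    """Return the (low, high) limits of the class containing a whole-number score."""
    low = (score // width) * width
    return (low, low + width - 1)


total = sum(TABLE.values())
print("Number of classes:", len(TABLE))
print("Total number of students:", total)
print("A score of 17 falls in class", class_of(17))
```

Summing the frequency column shows that the table describes 100 students in total.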
Example
- The heights (in inches) of 30 students are as follows:
  68 64 70 67 67 68 64 65 68 64 70 72 71 69 72
  64 63 70 71 63 68 67 67 65 69 65 67 66 61 65
- Create a frequency table for the above data using the classes 60-61, 62-63, 64-65, etc.
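One way to check a hand-built frequency table is to let a short program do the tallying. This is a minimal sketch, assuming whole-number heights that are all at least as large as the first class's lower limit; `frequency_table` is a hypothetical helper, not a library function.

```python
from collections import Counter

# Heights (in inches) of the 30 students, as listed on the slide.
HEIGHTS = [68, 64, 70, 67, 67, 68, 64, 65, 68, 64, 70, 72, 71, 69, 72,
           64, 63, 70, 71, 63, 68, 67, 67, 65, 69, 65, 67, 66, 61, 65]


def frequency_table(data, start, width):
    """Tally whole-number data into classes start..start+width-1, and so on.

    Assumes every value is >= start. Classes run from `start` up to the
    last nonempty class; empty classes get a frequency of 0.
    """
    counts = Counter((x - start) // width for x in data)  # class index -> count
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        table.append((f"{low}-{low + width - 1}", counts[k]))
    return table


for label, freq in frequency_table(HEIGHTS, start=60, width=2):
    print(f"{label}\t{freq}")
```

With the classes 60-61 through 72-73, the frequencies come out as 1, 2, 8, 6, 6, 5, and 2, which add to 30.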
Relative Frequency Tables
- Relative frequency = frequency / total number of items
- The relative frequency gives the proportion of items in each class; multiply by 100 to express it as a percent.
- A relative frequency table is a frequency table with a column for the relative frequencies. The relative frequencies might not add to exactly 1 (100%) due to rounding.
- Example: Construct a relative frequency table for our last example.
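A minimal sketch of the relative frequency calculation, assuming the height frequencies tallied from the 30 heights in the previous example and rounding to three decimal places; `relative_frequencies` is a hypothetical helper.

```python
# Frequencies for the height classes 60-61, 62-63, ..., 72-73
# (tallied from the 30 heights in the previous example).
FREQS = {"60-61": 1, "62-63": 2, "64-65": 8, "66-67": 6,
         "68-69": 6, "70-71": 5, "72-73": 2}


def relative_frequencies(freqs, digits=3):
    """Return each class's frequency divided by the total, rounded to `digits` places."""
    total = sum(freqs.values())
    return {label: round(f / total, digits) for label, f in freqs.items()}


rel = relative_frequencies(FREQS)
for label, r in rel.items():
    # Columns: class, frequency, relative frequency, relative frequency as a percent
    print(f"{label}\t{FREQS[label]}\t{r:.3f}\t{r:.1%}")
print("Sum of rounded relative frequencies:", round(sum(rel.values()), 3))
```

Because each value is rounded to three places, the rounded relative frequencies add to 1.001 rather than exactly 1, which is the rounding effect described above.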
Distribution
- Tells what values a variable takes and how often it takes these values.
- Can be a table, graph, or function.
Graphs for Categorical Data
- A picture (a good one) is worth a thousand words.
- Bar Graph
- Horizontal axis represents the categories.
- Vertical axis represents the frequencies.
- A bar whose height is proportional to the frequency is drawn centered at each category.
Example
- The following table gives the grade distribution of a Math 161 test:
  Grade   Frequency
  A       5
  B       7
  C       12
  D       5
  F       3
- Draw a bar graph for the data.
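Plotting libraries such as matplotlib draw bar graphs directly. To stay dependency-free, this minimal sketch prints a text bar graph instead, with one bar per grade whose length equals the frequency. Note that it prints the bars sideways (categories down the left, bar length across the page), so the axes are swapped relative to the description above; `text_bar_graph` is a hypothetical helper.

```python
# Grade frequencies for the Math 161 test from the slide.
GRADES = {"A": 5, "B": 7, "C": 12, "D": 5, "F": 3}


def text_bar_graph(freqs, symbol="#"):
    """Return one line per category; each bar's length equals its frequency."""
    width = max(len(c) for c in freqs)  # pad category labels to a common width
    return [f"{c:<{width}} | {symbol * f} {f}" for c, f in freqs.items()]


for line in text_bar_graph(GRADES):
    print(line)
```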
Graphs for Categorical Data
- Double (Side-by-side) Bar Graphs
- Used to compare two different distributions.
- For each category, draw two adjacent bars (one for each distribution).
Example
- Suppose we now have the grades for two sections of Math 161. Grade frequencies:
  Grade   Section 1   Section 2
  A       5           3
  B       7           5
  C       12          9
  D       5           4
  F       3           1
- Draw a double bar graph for the data.
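The same text-based approach extends to a double bar graph: for each grade, two adjacent bars are printed, one per section, using a different symbol for each. This is a minimal, dependency-free sketch that prints bars sideways; `text_double_bar_graph` and the choice of symbols are assumptions made for illustration.

```python
# Grade frequencies (Section 1, Section 2) from the slide.
SECTIONS = {
    "A": (5, 3),
    "B": (7, 5),
    "C": (12, 9),
    "D": (5, 4),
    "F": (3, 1),
}


def text_double_bar_graph(data, symbols=("#", "=")):
    """For each category, return two adjacent bars: one per distribution."""
    lines = []
    for category, counts in data.items():
        for sym, count in zip(symbols, counts):
            lines.append(f"{category} | {sym * count} {count}")
        lines.append("")  # blank line separates categories
    return lines


print("Legend: '#' = Section 1, '=' = Section 2")
for line in text_double_bar_graph(SECTIONS):
    print(line)
```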
Example
- Are the frequency bar graphs the right way to compare the performance of the two sections? Note that the class sizes are not the same. What would be a better way to compare the two sections?
- Grade frequencies:
  Grade   Section 1   Section 2
  A       5           3
  B       7           5
  C       12          9
  D       5           4
  F       3           1
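One common way to compare groups of different sizes is to use relative frequencies (the proportion of each section earning each grade) instead of raw counts. A minimal sketch, assuming exactly two sections; `section_relative_frequencies` is a hypothetical helper.

```python
# Grade frequencies (Section 1, Section 2) from the slide.
SECTIONS = {"A": (5, 3), "B": (7, 5), "C": (12, 9), "D": (5, 4), "F": (3, 1)}


def section_relative_frequencies(data):
    """Return each section's size and, per grade, the fraction of that section.

    Assumes each value in `data` is a pair (Section 1 count, Section 2 count).
    """
    totals = [sum(counts[i] for counts in data.values()) for i in range(2)]
    relative = {
        grade: tuple(count / total for count, total in zip(counts, totals))
        for grade, counts in data.items()
    }
    return totals, relative


totals, relative = section_relative_frequencies(SECTIONS)
print("Section sizes:", totals)
for grade, (r1, r2) in relative.items():
    print(f"{grade}\t{r1:.1%}\t{r2:.1%}")
```

The sections have 32 and 22 students, so raw counts partly reflect class size. On the relative scale, for example, 15.6% of Section 1 earned an A compared with 13.6% of Section 2.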
Graphs for Categorical Data
- Pie Chart
- Shows the whole group of categories in a circle.
- Shows the parts of some whole.
- The area of the sector representing a category is proportional to the frequency of the category.
Example
- The following table gives the grade distribution of a Math 161 test:
  Grade   Frequency
  A       5
  B       7
  C       12
  D       5
  F       3
- Draw a pie chart for the data.
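A sector with central angle θ (in degrees) covers θ/360 of the circle's area, so making each sector's area proportional to its frequency is the same as setting θ = 360 × frequency / total. A minimal sketch of that calculation for the grade table; `pie_angles` is a hypothetical helper.

```python
# Grade frequencies for the Math 161 test from the slide.
GRADES = {"A": 5, "B": 7, "C": 12, "D": 5, "F": 3}


def pie_angles(freqs):
    """Return the central angle (in degrees) of each category's sector."""
    total = sum(freqs.values())
    return {c: f * 360 / total for c, f in freqs.items()}


angles = pie_angles(GRADES)
for c, a in angles.items():
    print(f"{c}: {a:.2f} degrees")
```

With 32 students in total, the angles are 56.25, 78.75, 135, 56.25, and 33.75 degrees, which add to a full 360.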
Pictographs
- A picture of a set of small figures or icons used to represent data, and often to represent trends.
- Usually, the icons are suggestively related to the data being represented.
- They can be misleading.
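Pictographs
- Double the length, width, and height of a cube, and its volume increases by a factor of eight (2 × 2 × 2 = 8). An icon enlarged in every dimension to show a doubled quantity therefore looks far more than twice as big.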
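The arithmetic behind this warning: scaling every length of a figure by a factor k multiplies its area by k² and its volume by k³. A minimal sketch; `area_factor` and `volume_factor` are hypothetical names.

```python
def area_factor(k):
    """Factor by which a flat figure's area grows when every length is scaled by k."""
    return k ** 2


def volume_factor(k):
    """Factor by which a solid's volume grows when every length is scaled by k."""
    return k ** 3


for k in (1.5, 2, 3):
    print(f"scale {k}: area x{area_factor(k)}, volume x{volume_factor(k)}")
```

So an icon drawn twice as tall and twice as wide to represent a doubled quantity covers four times the area on the page, and looks eight times as large if the reader perceives it as a solid.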
Line Graphs
- A line graph shows behavior over time.
- Time is always on the horizontal axis.
- Look for an overall pattern (trend).
- Look for patterns that repeat at known regular intervals (seasonal variations).
- Look for any striking deviations that might indicate unusual occurrences.
Misleading Graphs
- Changing the scale of a line graph or a bar graph can make increases or decreases appear more rapid.
- Both graphs plot the same data. Which one makes the increase in cancer deaths appear more rapid? Which graph would a cancer advocate use?
Salaries of People with Bachelor's Degrees and with High School Diplomas: Which graph is misleading?
- Graphs (a) and (b) are bar graphs of the same two salaries: Bachelor's Degree 40,500 and High School Diploma 24,400.
- In (a) the vertical axis runs from 0 to 40,000 (tick marks every 10,000); in (b) it runs from 20,000 to 40,000 (tick marks every 5,000).
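The effect of where the vertical axis starts can be quantified. If the axis starts at a baseline b, a bar for a value v is drawn with visible height v - b, so the apparent ratio of two bars is (v1 - b) / (v2 - b). A minimal sketch using the two salaries and the two axis starting points (0 and 20,000) from the salary graphs; `apparent_ratio` is a hypothetical helper, and the sketch assumes bars are drawn from the bottom of the axis.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of drawn bar heights for values a and b when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


BACHELOR = 40_500
HIGH_SCHOOL = 24_400

print(round(apparent_ratio(BACHELOR, HIGH_SCHOOL, 0), 2))
print(round(apparent_ratio(BACHELOR, HIGH_SCHOOL, 20_000), 2))
```

With the axis starting at 0, the Bachelor's bar is about 1.66 times as tall as the High School bar, matching the actual ratio of the salaries. With the axis starting at 20,000, the same data produce a Bachelor's bar about 4.66 times as tall.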
Important Skills
- Computer programs will construct plots and calculate summary statistics automatically.
- The important skills for people are:
- Knowing what to use when.
- Interpretation.
- The tools used to analyze and summarize data depend upon the type of variable one is interested in.
Principles for Plots
- The way plots are used depends upon the purpose for which they are being used.
- Exploration
- Principle: Look at the data in as many different ways as possible, searching for its important features.
- Communication to others (follows exploration)
- Principle: Be selective. Choose the displays that best show a reader the features you have observed.
Making Good Graphs
- Title your graph.
- Make sure labels and legends describe variables and their measurement units. Be careful with the scales used.
- Make the data stand out. Avoid distracting grids, artwork, etc.
- Pay attention to what the eye sees. Avoid pictograms and tacky effects.
Key Concepts
- Categorical and Quantitative Variables
- Distributions
- Pie Charts
- Bar Graphs
- Line Graphs
- Techniques for Making Good Graphs
Chapter 11
- Displaying Distributions with Graphs
Stemplots (Stem-and-Leaf Plots)
- For quantitative variables.
- Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number).
- Usually, the last digit is used as the leaf and the remaining digits form the stem.
- If using the last digits as they are produces too many stems, we could round the numbers to more convenient values first.
- Write the stems in a vertical column; draw a vertical line to the right of the stems.
- Write each leaf in the row to the right of its stem; order the leaves if desired.
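The steps above can be sketched in a few lines. The weight data used in the slides is not reproduced in this transcript, so this minimal sketch reuses the 30 student heights from the frequency-table example. It assumes non-negative whole numbers; `stemplot` is a hypothetical helper.

```python
from collections import defaultdict

# Heights (in inches) of the 30 students from the earlier example.
HEIGHTS = [68, 64, 70, 67, 67, 68, 64, 65, 68, 64, 70, 72, 71, 69, 72,
           64, 63, 70, 71, 63, 68, 67, 67, 65, 69, 65, 67, 66, 61, 65]


def stemplot(data):
    """Return 'stem | leaves' rows for non-negative whole numbers.

    The last digit is the leaf and the remaining digits form the stem.
    Leaves are sorted, and stems with no leaves are kept so gaps stay visible.
    """
    groups = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        groups[stem].append(leaf)
    rows = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(groups[stem]))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


for row in stemplot(HEIGHTS):
    print(row)
```

The same function handles three-digit values such as weights: 203 becomes stem 20 and leaf 3.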
Weight Data
- Weights (in pounds) for a group of 40 students.
Weight Data: Stemplot (Stem-and-Leaf Plot)
- Stems (10s of pounds) run from 10 to 25; leaves are 1s (pounds).
- Key: 20 | 3 means 203 pounds.
Stem-and-Leaf Plots
- Double-stemmed (expanded) stem-and-leaf plots
- If there are a lot of leaves on one stem, we could break it up into two stems: one for the leaf digits 0-4 and the other for the leaf digits 5-9.
- Back-to-back stem-and-leaf plots: used to compare two different sets of data.
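The student heights are a good case for a double-stemmed plot, since almost all of them share the stem 6. This minimal sketch splits each stem into a line for leaves 0-4 followed by a line for leaves 5-9. It assumes non-negative whole numbers; `split_stemplot` is a hypothetical helper.

```python
from collections import defaultdict

# Heights (in inches) of the 30 students from the earlier example.
HEIGHTS = [68, 64, 70, 67, 67, 68, 64, 65, 68, 64, 70, 72, 71, 69, 72,
           64, 63, 70, 71, 63, 68, 67, 67, 65, 69, 65, 67, 66, 61, 65]


def split_stemplot(data):
    """Return 'stem | leaves' rows with each stem split into two lines:
    the first holds leaves 0-4, the second holds leaves 5-9."""
    groups = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        groups[(stem, leaf >= 5)].append(leaf)  # key: (stem, is high half)
    rows = []
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        for high_half in (False, True):
            leaves = "".join(str(leaf) for leaf in sorted(groups[(stem, high_half)]))
            rows.append(f"{stem} | {leaves}".rstrip())
    return rows


for row in split_stemplot(HEIGHTS):
    print(row)
```

Splitting stem 6 into two lines separates the 7 heights from 61 to 64 from the 16 heights from 65 to 69, which shows the shape of the data more clearly than one long row.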
Histogram
- A histogram is a bar graph in which:
- the horizontal axis represents the classes (intervals of values of the variable).
- the vertical axis represents the frequencies.
- the heights of the bars are proportional to the frequencies.
- There are usually no gaps between the bars (unless some classes have 0 frequencies).
- To draw a histogram, we first need to construct a frequency table.
- Example: draw a histogram for our weights example.
- The number of classes can affect the shape of the histogram.
- http://www.stat.sc.edu/west/javahtml/Histogram.html
Weight Data: Frequency Table
- The left endpoint is included in each class; the right endpoint is not.
Weight Data: Histogram
- The left endpoint is included in each class; the right endpoint is not.
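The "left endpoint included, right endpoint not" rule means each class is a half-open interval [low, low + width). The weight data itself is not reproduced in this transcript, so this minimal sketch reuses the 30 student heights and also shows how the number of classes changes the counts. `histogram_counts` is a hypothetical helper.

```python
# Heights (in inches) of the 30 students from the earlier example.
HEIGHTS = [68, 64, 70, 67, 67, 68, 64, 65, 68, 64, 70, 72, 71, 69, 72,
           64, 63, 70, 71, 63, 68, 67, 67, 65, 69, 65, 67, 66, 61, 65]


def histogram_counts(data, start, width, n_classes):
    """Count observations in classes [start, start+width), [start+width, start+2*width), ...

    The left endpoint belongs to the class; the right endpoint does not.
    Raises ValueError for values outside all classes.
    """
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if not 0 <= k < n_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[k] += 1
    return counts


print(histogram_counts(HEIGHTS, start=60, width=2, n_classes=7))
print(histogram_counts(HEIGHTS, start=60, width=4, n_classes=4))
```

For whole-number heights, classes of width 2 reproduce the earlier table (60-61, 62-63, ...), while 4 wider classes give counts of 3, 14, 11, and 2, a coarser picture of the same data. A value exactly on a boundary, such as 64 with width 4, goes into the class that starts at 64.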
Shape of the Data
- Symmetric
- bell-shaped
- other symmetric shapes
- Asymmetric
- skewed to the right
- skewed to the left
- Unimodal, bimodal
Symmetric Distributions
- Bell-Shaped
- Mound-Shaped
Symmetric Distributions: Uniform
Asymmetric Distributions
- Skewed to the Left
- Skewed to the Right
Number of Books Read for Pleasure
Outliers
- Extreme values, far from the rest of the data.
- May occur naturally.
- May occur due to error in recording.
- May occur due to error in measuring.
- Observational unit may be fundamentally different.
Key Concepts
- Displays (Stemplots, Histograms)
- Graph Shapes
- Symmetric
- Skewed to the Right
- Skewed to the Left
- Outliers