Lecture Unit 2: Graphical and Numerical Summaries of Data

Provided by: tomre7

Transcript and Presenter's Notes

1
Lecture Unit 2: Graphical and Numerical Summaries
of Data
  • UNIT OBJECTIVES
  • At the conclusion of this unit you should be able
    to
  • 1) Construct graphs that appropriately describe
    data
  • 2) Calculate and interpret numerical summaries
    of a data set.
  • 3) Combine numerical methods with graphical
    methods to analyze a data set.
  • 4) Apply graphical methods of summarizing data
    to choose appropriate numerical summaries.
  • 5) Apply software and/or calculators to automate
    graphical and numerical summary procedures.

2
Displaying Qualitative Data
  • Section 2.1
  • Sometimes you can see a lot just by looking.
  • Yogi Berra
  • Hall of Fame Catcher, NY Yankees

3
The three rules of data analysis won't be
difficult to remember
  • 1. Make a picture. A display reveals aspects not
    obvious in the raw data and enables you to think
    clearly about the patterns and relationships that
    may be hiding in your data.
  • 2. Make a picture. A display shows important
    features of and patterns in the data. You may
    also see things that you did not expect:
    extraordinary (possibly wrong) data values or
    unexpected patterns.
  • 3. Make a picture. The best way to tell others
    about your data is with a well-chosen picture.

4
Bar Charts: show counts or relative frequencies
for each category
  • Example: Titanic passenger/crew distribution

5
Pie Charts: show proportions of the whole in
each category
  • Example: Titanic passenger/crew distribution

6
Example: Top 10 causes of death in the United
States, 2001
For each individual who died in the United States
in 2001, we record the cause of death.
The table is a summary of that information.
7
Top 10 causes of death: bar graph. Each category
is represented by one bar. The bar's height shows
the count (or sometimes the percentage) for that
particular category.
Top 10 causes of death in the United States, 2001
8
Top 10 causes of death in the United States, 2001
Bar graph sorted by rank → Easy to analyze
Sorted alphabetically → Much less useful
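The effect of bar order can be sketched in a few lines of Python. The category names and counts below are made-up placeholders, not the 2001 mortality figures, and `text_bar_chart` is a hypothetical helper written only for this illustration.

```python
# Hypothetical counts for illustration only (not the 2001 mortality data).
counts = {"Cause A": 120, "Cause B": 700, "Cause C": 45, "Cause D": 310}


def text_bar_chart(items, scale=20):
    """Return one text bar per (label, count) pair; each '#' is about `scale` units."""
    return [f"{label:<8} {'#' * round(count / scale)} {count}" for label, count in items]


# Sorted by rank (largest count first): the ordering is visible at a glance.
by_rank = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Sorted alphabetically: same bars, but the ranking must be read off the heights.
by_name = sorted(counts.items())

print("\n".join(text_bar_chart(by_rank)))
print()
print("\n".join(text_bar_chart(by_name)))
```

Both charts contain the same bars. Only the rank-sorted version lets the reader see the ordering of the categories without comparing bar heights one by one.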
9
Top 10 causes of death: pie chart. Each slice
represents a piece of one whole. The size of a
slice depends on what percent of the whole this
category represents.
Percent of people dying from top 10 causes of
death in the United States in 2001
10
Make sure your labels match the data. Make
sure all percents add up to 100%.
Percent of deaths from top 10 causes
Percent of deaths from all causes
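A quick check of the "add up to 100" rule might look like the sketch below. The percentages are invented for illustration, and both the function name `percents_are_complete` and the rounding tolerance are assumptions.

```python
def percents_are_complete(percents, tolerance=0.5):
    """True if the percentages sum to 100 within a small rounding tolerance."""
    return abs(sum(percents) - 100.0) <= tolerance


# Hypothetical slice percentages for illustration only.
top_causes_only = [40.0, 30.0, 10.0]         # sums to 80: not a complete whole
with_other_slice = top_causes_only + [20.0]  # an "all other causes" slice completes it

print(percents_are_complete(top_causes_only))   # False
print(percents_are_complete(with_other_slice))  # True
```

A pie chart labelled as showing all causes needs the second list. The first list only describes the share taken by the listed categories.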
11
Child poverty before and after government
intervention (UNICEF, 1996)
  • What does this chart tell you?
  • The United States has the highest rate of child
    poverty among developed nations (22% of those
    under 18).
  • Its government does the least, through taxes and
    subsidies, to remedy the problem (size of orange
    bars and percent difference between orange/blue
    bars).
  • Could you transform this bar graph to fit in one
    pie chart? In two pie charts? Why?

The poverty line is defined as 50% of national
median income.
12
Unnecessary dimension in a pie chart
13
Contingency Tables: Categories for Two Variables
  • Example: Survival and class on the Titanic

Marginal distributions
14
Contingency Tables: Categories for Two Variables
(cont.)
  • Conditional distributions.
  • Given the class of a passenger, what is the
    chance the passenger survived?

15
Contingency Tables: Categories for Two Variables
(cont.)
  • Questions
  • What percent of survivors were in second class?
    118/710
  • What percent were in second class and survivors?
    118/2201
  • What percent of the second-class passengers
    survived? 118/285
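The three fractions above share a numerator and differ only in the denominator, which is what separates a conditional percentage from a joint one. A minimal sketch using only the counts quoted on this slide (the variable names are illustrative):

```python
# Counts quoted on the slide.
second_class_survivors = 118
all_survivors = 710
all_second_class = 285
everyone_on_board = 2201


def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# Conditional on surviving: share of survivors who were in second class.
pct_of_survivors = percent(second_class_survivors, all_survivors)

# Joint: share of everyone on board who was both second class and a survivor.
pct_of_everyone = percent(second_class_survivors, everyone_on_board)

# Conditional on class: share of second-class passengers who survived.
pct_of_second_class = percent(second_class_survivors, all_second_class)

print(f"{pct_of_survivors:.1f}%")     # 16.6%
print(f"{pct_of_everyone:.1f}%")      # 5.4%
print(f"{pct_of_second_class:.1f}%")  # 41.4%
```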
16
3-Way Tables
  • Example: Georgia death-sentence data

17
UC Berkeley Lawsuit
18
LAWSUIT (cont.)
19
Simpson's Paradox
  • The reversal of the direction of a comparison or
    association when data from several groups are
    combined to form a single group.
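The reversal can be reproduced with a small sketch. The flight counts below are hypothetical numbers chosen to produce the effect. They are not the airline data from the slides, and the airline and airport names are placeholders.

```python
# Hypothetical (on_time, total) flight counts, chosen to show the reversal.
flights = {
    "Airline X": {"Airport 1": (90, 100), "Airport 2": (700, 1000)},
    "Airline Y": {"Airport 1": (850, 1000), "Airport 2": (60, 100)},
}

# On-time rate within each airport (the separate groups).
per_airport = {
    airline: {airport: on_time / total for airport, (on_time, total) in airports.items()}
    for airline, airports in flights.items()
}

# On-time rate after combining the airports into a single group.
combined = {
    airline: sum(on_time for on_time, _ in airports.values())
    / sum(total for _, total in airports.values())
    for airline, airports in flights.items()
}

for airline in flights:
    print(airline, per_airport[airline], round(combined[airline], 3))
```

Airline X has the higher on-time rate at each airport (0.90 vs 0.85 and 0.70 vs 0.60), yet Airline Y has the higher combined rate (about 0.827 vs 0.718). Most of X's flights use the airport where both airlines do worse, and that drags its combined rate down.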

20
Fly Alaska Airlines, the on-time airline!
21
America West Wins! You're a Hero!
22
Section 2.2: Displaying Quantitative Data
  • Histograms
  • Stem and Leaf Displays

23
Relative Frequency Histogram of Exam Grades
[Figure: histogram with Grade (40 to 100) on the x-axis and Relative frequency (0 to .30) on the y-axis]
24
Frequency Histograms
25
Frequency Histograms
  • A histogram shows three general types of
    information
  • It provides visual indication of where the
    approximate center of the data is.
  • We can gain an understanding of the degree of
    spread, or variation, in the data.
  • We can observe the shape of the distribution.

26
All 200 m Races: 20.2 secs or less
27
Histograms Showing Different Centers
28
Histograms - Same Center, Different Spread
29
Frequency and Relative Frequency Histograms
  • identify the smallest and largest values in the
    data set
  • divide the interval between the largest and
    smallest values into between 5 and 20
    subintervals called classes
  • each data value falls in one and only one class
  • no data value is on a boundary

30
How Many Classes?
31
Histogram Construction (cont.)
  • compute frequency or relative frequency of
    observations in each class
  • x-axis: class boundaries
  • y-axis: frequency or relative frequency scale
  • over each class draw a rectangle with height
    corresponding to the frequency or relative
    frequency in that class

32
Ex. No. of daily employee absences from work
  • 106 obs.; approx. no. of classes:
  • $(2 \cdot 106)^{1/3} = 212^{1/3} \approx 5.96$
  • $1 + \log(106)/\log(2) = 1 + 6.73 = 7.73$
  • There is no single correct answer for the
    number of classes
  • For example, you can choose 6, 7, 8, or 9
    classes; don't choose 15 classes
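The two rules of thumb used above for n = 106 observations can be evaluated directly. This is a minimal sketch, and the function names are illustrative rather than taken from the slides.

```python
import math


def classes_cube_root(n):
    """Cube-root rule from the slide: (2n)^(1/3)."""
    return (2 * n) ** (1 / 3)


def classes_log2(n):
    """Logarithmic rule from the slide: 1 + log(n)/log(2)."""
    return 1 + math.log(n) / math.log(2)


n = 106
print(round(classes_cube_root(n), 2))  # 5.96
print(round(classes_log2(n), 2))       # 7.73
```

The two rules bracket the range of reasonable choices (roughly 6 to 8 classes here). They are guides, not exact answers.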

33
EXCEL Histogram
34
Absences from Work (cont.)
  • 6 classes
  • class width: (158 - 121)/6 = 37/6 = 6.17,
    rounded up to 7
  • 6 classes, each of width 7: classes span
    6(7) = 42 units
  • data spans 158 - 121 = 37 units
  • classes overlap the span of the actual data
    values by 42 - 37 = 5 units
  • lower boundary of 1st class: (1/2)(5) units
    below 121, i.e. 121 - 2.5 = 118.5
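The boundary calculation above can be sketched as a short function. It assumes, as in the example, that the class width is rounded up to a whole number and that the extra span is split evenly between the two ends of the data. The helper name `class_boundaries` is hypothetical.

```python
import math


def class_boundaries(smallest, largest, num_classes):
    """Return the num_classes + 1 boundaries, centred on the span of the data."""
    width = math.ceil((largest - smallest) / num_classes)  # 37/6 = 6.17 -> 7
    overlap = num_classes * width - (largest - smallest)   # 42 - 37 = 5
    lower = smallest - overlap / 2                         # 121 - 2.5 = 118.5
    return [lower + i * width for i in range(num_classes + 1)]


boundaries = class_boundaries(121, 158, 6)
print(boundaries)
# [118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5]
```

Because the data are whole numbers and every boundary ends in .5, no observation can fall on a boundary.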

35
EXCEL histogram
36
Grades on a statistics exam
  • Data
  • 75 66 77 66 64 73 91 65 59 86 61 86 61
  • 58 70 77 80 58 94 78 62 79 83 54 52 45
  • 82 48 67 55

37
Frequency Distribution of Grades
Class Limits    Frequency
40 up to 50     2
50 up to 60     6
60 up to 70     8
70 up to 80     7
80 up to 90     5
90 up to 100    2
Total           30
38
Relative Frequency Distribution of Grades
Class Limits    Relative Frequency
40 up to 50     2/30 = .067
50 up to 60     6/30 = .200
60 up to 70     8/30 = .267
70 up to 80     7/30 = .233
80 up to 90     5/30 = .167
90 up to 100    2/30 = .067
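Both distributions can be recomputed from the 30 exam grades listed on slide 36. The sketch below assumes that "40 up to 50" means 40 ≤ grade < 50. The function name is illustrative.

```python
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]
edges = [40, 50, 60, 70, 80, 90, 100]


def frequency_distribution(data, edges):
    """Count the values falling in each class [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


frequencies = frequency_distribution(grades, edges)
relative = [round(f / len(grades), 3) for f in frequencies]

for lo, hi, f, r in zip(edges, edges[1:], frequencies, relative):
    print(f"{lo} up to {hi}: {f:2d}  {r:.3f}")
```

The relative frequencies are the heights of the bars in the relative frequency histogram of these grades.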
39
Relative Frequency Histogram of Grades
[Figure: histogram with Grade (40 to 100) on the x-axis and Relative frequency (0 to .30) on the y-axis]
40
Stem and leaf displays
  • Have the following general appearance
  • stem | leaf
  • 1 | 8 9
  • 2 | 1 2 8 9 9
  • 3 | 2 3 8 9
  • 4 | 0 1
  • 5 | 6 7
  • 6 | 4

41
Stem and Leaf Displays
  • Partition each number in the data into a stem
    and a leaf
  • Constructing a stem and leaf display
  • 1) determine the stem and leaf partition (5-20
    stems)
  • 2) write the stems in a column with the smallest
    stem at the top; include all stems in the range
    of the data
  • 3) use only 1 digit in the leaves; drop digits or
    round off
  • 4) record the leaf for each number in the
    corresponding stem row; ordering the leaves in
    each row helps

42
Example: employee ages at a small company
  • 18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
  • stem = 10s digit, leaf = 1s digit
  • 18: stem = 1, leaf = 8, written 1 | 8
  • stem | leaf
  • 1 | 8 9
  • 2 | 1 2 8 9 9
  • 3 | 2 3 8 9
  • 4 | 0 1
  • 5 | 6 7
  • 6 | 4

43
Suppose a 95 yr. old is hired
  • stem | leaf
  • 1 | 8 9
  • 2 | 1 2 8 9 9
  • 3 | 2 3 8 9
  • 4 | 0 1
  • 5 | 6 7
  • 6 | 4
  • 7 |
  • 8 |
  • 9 | 5

44
Number of TD passes by NFL teams, 2000
season (stems are 10s digit)
45
Pulse Rates, n = 138
46
Advantages/Disadvantages of Stem-and-Leaf Displays
  • Advantages
  • 1) each measurement displayed
  • 2) ascending order in each stem row
  • 3) relatively simple (data set not too large)
  • Disadvantages
  • display becomes unwieldy for large data sets

47
Population of 185 US cities with populations
between 100,000 and 500,000
  • Multiply stems by 100,000

48
Back-to-back stem-and-leaf displays. TD passes by
NFL teams, 1998 and 2000 (multiply stems by 10)
49
Interpreting Graphical Displays: Shape
  • A distribution is symmetric if the right and left
    sides of the histogram are approximately mirror
    images of each other.

50
Outliers
  • An important kind of deviation is an outlier.
    Outliers are observations that lie outside the
    overall pattern of a distribution. Always look
    for outliers and try to explain them.

The overall pattern is fairly symmetrical except
for 2 states clearly not belonging to the main
trend. Alaska and Florida have unusual
representation of the elderly in their
population. A large gap in the distribution is
typically a sign of an outlier.
51
Other Graphical Methods for Economic Data
  • Time plots
  • plot observations in time order, with time on
    the horizontal axis and the variable on the
    vertical axis
  • Time series
  • measurements are taken at regular intervals
    (monthly unemployment, quarterly GDP, weather
    records, electricity demand, etc.)

52
Winning Times 100 M Dash
53
Annual Mean Temperature
54
End of Section 2.2