Displaying Distributions with Graphs

Provided by: samuel59

1
Displaying Distributions with Graphs
  • Chapter 11

2
Where does my school stand?
  • A student at Chicago State University pays $6,834
    in tuition, and she is curious how this compares
    to what students at other universities in
    Illinois pay
  • There are 121 colleges and universities in
    Illinois!
  • They charge between $1,536 and $30,729 per year
    in tuition
  • Comparing Chicago State to the other 120 in a big
    list would not be practical, so we use a histogram

3
(No Transcript)
4
How much does my school cost?
  • We can also look at the same data using another
    type of graph: the stemplot
  • To make a stemplot:
  • Round the tuition charges to the nearest $100:
    $6,834 becomes 68
  • Then put the thousands digit to the left of a
    vertical line (from smallest to largest
    top-to-bottom)
  • And finally hang the hundreds digits one-by-one
    to the right of the line on the row that
    corresponds to their thousands digit
  • This modified histogram contains more detail
  • Chicago State is the 58th most expensive school
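
The stemplot steps above can be sketched in a few lines of Python. This is only an illustration: the function name `tuition_stemplot` and the sample list are assumptions, and only 6,834 and 1,536 are tuition figures taken from the slides (the other values are made up).

```python
from collections import defaultdict

def tuition_stemplot(charges):
    """Build stemplot rows: thousands as the stem, hundreds digit as the leaf."""
    rows = defaultdict(list)
    for charge in charges:
        hundreds = (charge + 50) // 100    # round to the nearest 100: 6834 -> 68
        stem, leaf = divmod(hundreds, 10)  # 68 -> stem 6, leaf 8
        rows[stem].append(leaf)
    lines = []
    # Stems run from smallest to largest, top to bottom; empty stems are kept.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in sorted(rows[stem]))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical sample: only 6834 and 1536 come from the slides.
sample = [1536, 2210, 2280, 4875, 6834, 6790, 7450]
for line in tuition_stemplot(sample):
    print(line)
```

Keeping the empty stems in the output preserves the shape of the distribution, which is what lets the stemplot be read like a histogram turned on its side.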

5
(No Transcript)
6
Histograms
  • Remember that categorical variables record group
    membership, and we used pie charts and bar
    graphs to display the distributions of
    populations according to categorical variables
  • This works fine for categorical variables with
    relatively few categories
  • What if we have a quantitative variable (like SAT
    scores) that can (and does) take many different
    values?
  • In this case we display the distribution of the
    quantitative variable using a histogram that
    counts values that are similar to each other
    together in one class

7
(No Transcript)
8
Example 1: How to make a histogram
  • Divide the range of the data into classes of
    equal width. The data in Table 11.1 range from
    6.3 to 17.0, so we choose classes
  • 6.0 ≤ percentage over 65 < 7.0
  • 7.0 ≤ percentage over 65 < 8.0
  • ...
  • 17.0 ≤ percentage over 65 < 18.0
  • Count the number of individuals in each class
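
As a rough sketch of these two steps (choose equal-width classes, then count), the following Python assigns each value to the class whose lower bound it meets or exceeds. The function name `class_counts` is an assumption, and Table 11.1 is not reproduced here: only 6.3, 8.5 and 17.0 appear in the slides, and the remaining values are made up for illustration.

```python
import math

def class_counts(values, start, width, n_classes):
    """Count values per class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = math.floor((v - start) / width)  # index of the class containing v
        if not 0 <= i < n_classes:
            raise ValueError(f"{v} falls outside the chosen classes")
        counts[i] += 1
    return counts

# Hypothetical data: only 6.3, 8.5 and 17.0 come from the slides.
percent_over_65 = [6.3, 8.5, 11.9, 12.4, 12.8, 13.1, 13.3, 13.6, 14.2, 17.0]

# Twelve classes of width 1.0, from 6.0 up to (but not including) 18.0.
counts = class_counts(percent_over_65, start=6.0, width=1.0, n_classes=12)
for i, count in enumerate(counts):
    low = 6.0 + i
    print(f"{low:4.1f} <= x < {low + 1:4.1f}: {count}")
```

Because each class includes its lower bound and excludes its upper bound, a value of exactly 7.0 lands in the second class, so no observation is counted twice.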

9
Example 1: How to make a histogram
  • Draw the histogram
  • Label the horizontal axis with the scale for the
    variable whose distribution you are displaying:
    percentage of residents ages 65 and over
  • Label the classes or bins on your scale from the
    lowest to highest values: 4 to 20 in our case
  • Label the vertical axis with the scale of the
    counts: number of states
  • Label the vertical axis tick marks with the count
    values: 0 to 15 in our case
  • Place bars on the histogram for each class whose
    height corresponds to the count in each class,
    leaving no space between the bars

10
(No Transcript)
11
Example 1: How to make a histogram
  • Keep the bars the same width and don't leave
    space between them unless a class is empty
  • Choosing the number of classes or bins is up to
    you
  • Too many and there will be lots of empty classes
  • Too few and you won't see enough detail
  • The best choice depends on your distribution
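
To make the trade-off concrete, here is a small sketch that bins the same values with three different class widths and reports how many classes end up empty. The helper `counts_for_width` and the data are assumptions for illustration (only 6.3, 8.5 and 17.0 come from the slides).

```python
def counts_for_width(values, low, high, width):
    """Count values in equal-width classes covering [low, high]."""
    n_classes = round((high - low) / width)
    counts = [0] * n_classes
    for v in values:
        # A value equal to `high` is placed in the last class.
        i = min(int((v - low) / width), n_classes - 1)
        counts[i] += 1
    return counts

# Hypothetical data: only 6.3, 8.5 and 17.0 come from the slides.
values = [6.3, 8.5, 11.9, 12.4, 12.8, 13.1, 13.3, 13.6, 14.2, 17.0]

for width in (0.5, 1, 6):
    counts = counts_for_width(values, 6, 18, width)
    print(f"width {width}: {len(counts)} classes, {counts.count(0)} empty")
```

With this small sample, the narrowest classes leave most bins empty, while the widest setting collapses everything into two bars and hides both the peak and the outliers.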

12
Interpreting histograms
  • Making an accurate, pretty graph is an
    accomplishment in itself, but it's not the final
    step
  • After you have the graph you need to examine it
    and interpret it: see what it tells you
  • First, look for patterns and deviations from the
    patterns

13
(No Transcript)
14
Interpreting histograms
  • In our example histogram, let's begin with the
    obvious deviations at the low and high end of the
    histogram. There are two states that are
    separated from the bulk in the middle, one with
    6.3% and one with 17.0% of people 65 and older
  • These two outlier states are Alaska and Florida

15
(No Transcript)
16
(No Transcript)
17
Interpreting histograms
  • Once we have seen these on the histogram, it's
    easy to pick them out of the list as well
  • What about a state like Utah with 8.5% 65 and
    older?
  • This is a matter of judgment. Utah is certainly
    unusually low according to this histogram, but it
    may not qualify as an outlier in the same sense
    as Alaska
  • Once we have identified the outliers, the next
    step is to examine them closely and try to figure
    out why they occur
  • Commonly they are due to data problems, such as
    typing errors like 40 instead of 4.0
  • If data problems are ruled out, we need more
    information to explain the outlier

18
Interpreting histograms
  • In this case, we know that Florida is a
    destination for retirees so it makes sense that
    Florida has a large elderly population
  • We probably know less about Alaska, but it is a
    northern frontier area that is generally not a
    retirement destination so perhaps this makes
    sense too

19
Interpreting histograms
  • Now we move on to see the overall pattern of the
    histogram
  • To do this we ignore the outliers and focus on
    the bulk of the histogram

20
(No Transcript)
21
Interpreting histograms
  • The center of the distribution is the point on
    the horizontal axis at which roughly half the
    observations are to the left and half the
    observations are to the right
  • We can describe the spread of the distribution by
    giving the smallest and largest values ignoring
    the outliers
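
One way to read these two definitions numerically is sketched below: the center is taken as the median (half the observations on each side), and the spread as the smallest and largest values once the outliers are set aside. The function name `center_and_spread` is an assumption, and the data are made up apart from 6.3, 8.5 and 17.0, which appear in the slides.

```python
import statistics

def center_and_spread(values, outliers=()):
    """Return (center, (smallest, largest)), ignoring outliers for the spread."""
    center = statistics.median(values)  # half the observations on each side
    kept = sorted(v for v in values if v not in outliers)
    return center, (kept[0], kept[-1])

# Hypothetical data: only 6.3, 8.5 and 17.0 come from the slides.
percent_over_65 = [6.3, 8.5, 11.9, 12.1, 12.4, 12.8, 13.1, 13.3, 13.6, 14.2, 17.0]

center, spread = center_and_spread(percent_over_65, outliers=(6.3, 17.0))
print("center:", center)
print("spread:", spread)
```

Which observations count as outliers is still a judgment call made by looking at the graph; the code only applies that decision once it has been made.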

22
Example 2: Describing distributions
  • Let's describe our example histogram
  • Shape: the histogram is roughly symmetric and has
    a single peak
  • Center: the midpoint of the distribution is close
    to the peak at about 13%
  • Spread: if we ignore outliers, the distribution
    runs from about 8% to about 16%
  • If we compare this to the histogram of the
    distribution of tuition at Illinois colleges, we
    see that the tuition distribution is quite
    different: it is skewed

23
(No Transcript)
24
(No Transcript)
25
Interpreting histograms
  • To describe a distribution, look for major
    features
  • Get an idea of the midpoint of the distribution
    and its spread
  • Look for symmetry and skewness

26
Example 3: Sampling again
  • The values a statistic takes in many random
    samples from the same population follow a regular
    pattern
  • The next histogram displays a distribution we
    went over in Chapter 3
  • Take an SRS of 2,527 adults
  • Ask each whether they favor a constitutional
    amendment that would define marriage as between a
    man and woman
  • The proportion who say yes is the sample
    proportion p-hat
  • Do this 1,000 times and collect the 1,000 sample
    proportions p-hat from the 1,000 samples
  • Make a histogram (next slide)
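
The repeated-sampling procedure can be imitated with a short simulation. This is a sketch under stated assumptions: the slides do not give the population proportion, so `p = 0.5` is hypothetical, and independent random draws stand in for an SRS from a very large population. The function name `simulate_sample_proportions` is also an assumption.

```python
import random

def simulate_sample_proportions(p, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(repetitions):
        yes = sum(rng.random() < p for _ in range(n))  # count "yes" answers
        proportions.append(yes / n)
    return proportions

# Hypothetical population proportion; 2,527 and 1,000 come from the slides.
proportions = simulate_sample_proportions(p=0.5, n=2527, repetitions=1000)
print("smallest p-hat:", min(proportions))
print("largest p-hat:", max(proportions))
```

A histogram of `proportions` should be roughly symmetric and single-peaked around the assumed population proportion, which is the regular pattern the slide refers to.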

27
(No Transcript)
28
Example 4: Shakespeare's words
  • The next slide has a histogram that shows the
    distribution of lengths of words used in
    Shakespeare's plays
  • It has a single peak and is skewed to the right
  • Lots of shorter words and a few longer ones
  • The center is about 4 (4 letters)
  • The spread is from 1 to 12 letters
  • Notice the vertical scale:
  • Not the count of the words, but the percentage of
    all the words
  • This is convenient when the counts are very large
    and when we want to compare distributions with
    different counts
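
Converting counts to percentages is a one-line calculation, sketched here with made-up numbers. The two count lists are hypothetical (they are not Shakespeare's actual word counts), and `to_percentages` is an assumed helper name.

```python
def to_percentages(counts):
    """Convert class counts to percentages of the total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

# Hypothetical counts for two texts of very different length.
short_text = [10, 30, 40, 20]
long_text = [100, 300, 400, 200]

print(to_percentages(short_text))
print(to_percentages(long_text))
```

The two texts differ tenfold in total count but produce the same percentages, which is why a percentage scale makes distributions of different sizes directly comparable.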

29
(No Transcript)
30
Interpreting histograms
  • The overall shape of a distribution is important
    information about a variable
  • Some types of data regularly produce
    distributions with a similar shape, for example:
  • Sizes of living things (like height of humans)
    tend to produce symmetric distributions
  • Data on incomes tend to be skewed right: many
    moderate incomes and a few very large incomes
  • It is common for data to be skewed right when
    there is a hard minimum like 1 for letters in
    words or 0 for income

31
Interpreting histograms
  • It is also common for distributions to be skewed
    left when there is a hard maximum like test
    scores
  • There are many other shapes for distributions as
    well, that are neither skewed nor symmetric
  • There may be double-peaked data
  • Evenly distributed data, etc.
  • Use your eyes and describe exactly what you see
  • Here's another example

32
(No Transcript)
33
Stemplots
  • For small data sets a stemplot may be more
    informative than a histogram

34
Example 5: Stemplot of the 65 and over data
  • For the 65 and older data, the whole number part
    of the percentage is the stem, and the tenths
    digit is the leaf
  • Stems can have as many digits as needed, but
    leaves must have only one digit
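
For data recorded to one decimal place, the stem and leaf split described here can be sketched as follows. The function name `stemplot` is an assumption, the code assumes non-negative values with a single decimal digit, and only 6.3, 8.5 and 17.0 in the sample come from the slides.

```python
from collections import defaultdict

def stemplot(values):
    """Stem = whole-number part, leaf = tenths digit (one digit per leaf)."""
    rows = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # 12.8 -> 128, avoids float residue
        stem, leaf = divmod(tenths, 10)  # 128 -> stem 12, leaf 8
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems
        leaves = "".join(str(d) for d in sorted(rows[stem]))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical data: only 6.3, 8.5 and 17.0 come from the slides.
percent_over_65 = [6.3, 8.5, 11.9, 12.1, 12.4, 12.8, 13.1, 13.3, 13.6, 14.2, 17.0]
for line in stemplot(percent_over_65):
    print(line)
```

Note that the stem here has one or two digits (6 through 17), while every leaf is exactly one digit, matching the rule in the slide.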

35
(No Transcript)
36
(No Transcript)
37
Stemplots
  • The advantage of stemplots is that they display
    the actual values of the variable in the
    distribution
  • You can choose the classes for a histogram, but
    the stems are given to you in a stemplot
  • If the data have too many digits and would
    produce a huge number of stems, you can round to
    the nearest hundreds or thousands or whatever to
    reduce the number of digits
  • Because there is no choice of classes, stemplots
    are easier to make but can quickly become
    clumsy and generally don't work well for large
    sets of data

38
Summary
  • The distribution of a variable tells us what
    values the variable takes, and how often each
    value is taken
  • Use a histogram or stemplot to display the
    distribution of a quantitative variable
  • When looking at a graph, look for
  • The overall pattern,
  • Deviations from the overall pattern, and
  • Outliers

39
Summary
  • To describe the overall pattern of a histogram or
    stemplot, describe the
  • Shape,
  • Center, and
  • Spread
  • Some distributions have simple shapes that are
    either symmetric or skewed, others have more
    complicated shapes