Title: 2: Frequency distributions
1 2: Frequency distributions
- Stemplot, frequency tables, histograms
2 Stem-and-leaf plots (stemplots)
- Analyses start by exploring data with pictures
- My favorite technique is the stemplot: a histogram-like display of data points
"You can observe a lot by looking." (Yogi Berra)
3 Illustrative example: sample.sav
- An SRS (simple random sample) of AGE (in years)
- Data as an ordered array (n = 10)
- 05 11 21 24 27 28 30 42 50 52
- Divide each data point into
- Stem values = first one or two digits
- Leaf values = next digit
- In this example
- Stem values = tens place
- Leaf values = ones place
- e.g., 21 has a stem value of 2 and a leaf value of 1
4 Stemplot (cont.)
- Draw stem-like axis from lowest to highest stem
- Note the axis multiplier, x10 (important!)
- Place leaves next to stem
- 21 plotted (animation)
0|
1|
2|1
3|
4|
5|
(x10)
5 Continue plotting
- Rearrange leaves in rank order
0|5
1|1
2|1478
3|0
4|2
5|02
(x10)
- For discussion, let's rotate the plot
        8
        7
        4           2
5   1   1   0   2   0
----------------------
0   1   2   3   4   5   (x10)
----------------------
Rotated stemplot
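The plotting steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `stemplot` is hypothetical, and the sketch assumes non-negative whole-number data with the tens place as stem and the ones place as leaf.

```python
def stemplot(values):
    """Build a text stemplot: stem = tens place, leaf = ones place."""
    leaves_by_stem = {}
    for v in sorted(values):              # sorting puts leaves in rank order
        stem, leaf = divmod(v, 10)        # e.g., 21 -> stem 2, leaf 1
        leaves_by_stem.setdefault(stem, []).append(str(leaf))
    lines = []
    # Draw every stem from lowest to highest, even one with no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        lines.append(f"{stem}|{''.join(leaves_by_stem.get(stem, []))}")
    lines.append("(x10)")                 # axis multiplier
    return lines

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
plot_lines = stemplot(ages)
print("\n".join(plot_lines))
```

Empty stems are kept on purpose, since gaps in the axis are part of the shape of the distribution.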
6 Interpreting frequency distributions
- Central location
- Gravitational center = mean
- Middle value = median
- Spread
- Range and interquartile range
- Standard deviation and variance (next week)
- Shape
- Symmetry
- Modality
- Kurtosis
7 Mean = arithmetic average
- Eyeball method: visualize where the plot would balance
- Arithmetic method: total divided by n
        8
        7
        4           2
5   1   1   0   2   0
----------------------
0   1   2   3   4   5   (x10)
----------------------
          ^
     Grav. center
Eyeball method: balances around 25 to 30
Actual arithmetic average = 29.0
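As a quick check of the arithmetic method, the total divided by n can be computed directly from the ten ages in the illustrative data set. The variable names are illustrative only.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]

total = sum(ages)        # 290
n = len(ages)            # 10
mean_age = total / n     # total divided by n

print(total, n, mean_age)  # 290 10 29.0
```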
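8 Middle point = median
- Count from top to a depth of (n + 1) / 2
- For illustrative data
- n = 10
- Depth of median = (10 + 1) / 2 = 5.5
- A depth of 5.5 falls halfway between the 5th and 6th values (27 and 28), so the median is 27.5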
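The depth rule can be written out as a small sketch. The helper name `median_by_depth` is hypothetical; it assumes that a fractional depth such as 5.5 means averaging the two ordered values on either side of it.

```python
import math

def median_by_depth(values):
    """Return (depth, median) using the (n + 1) / 2 depth rule."""
    data = sorted(values)
    n = len(data)
    depth = (n + 1) / 2
    lower = data[math.floor(depth) - 1]   # value at the depth rounded down (1-indexed)
    upper = data[math.ceil(depth) - 1]    # value at the depth rounded up
    return depth, (lower + upper) / 2     # the two coincide when n is odd

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
depth, median_age = median_by_depth(ages)
print(depth, median_age)  # 5.5 27.5
```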
9 Spread = variability
- Easiest way to describe spread is by stating its range, e.g., from 5 to 52 (not the best way)
- A better way is to divide the data into a low group and a high group
- Quartile 1 = median of low group
- Quartile 3 = median of high group
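A minimal sketch of the low-group / high-group approach follows. The function name `quartiles_by_halves` is hypothetical. One assumption is labeled loudly: when n is odd, this sketch leaves the middle value out of both groups. The slides do not say how to handle that case, and other conventions exist.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1 = median of the low half, Q3 = median of the high half."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data points")
    half = len(data) // 2     # ASSUMPTION: for odd n the middle value is excluded
    low_group = data[:half]
    high_group = data[-half:]
    return median(low_group), median(high_group)

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, q3 = quartiles_by_halves(ages)
iqr = q3 - q1
print(q1, q3, iqr)  # 21 42 21
```

For the age data, the low group is 5, 11, 21, 24, 27 and the high group is 28, 30, 42, 50, 52, so the interquartile range runs from 21 to 42.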
10 Shape = visual pattern
- Skyline silhouette of plot
- Symmetry
- Mounds
- Outliers (if any)
- When n is small, it's too difficult to describe shape accurately
        X
        X
        X           X
X   X   X   X   X   X
----------------------
0   1   2   3   4   5
----------------------
11 What to look for in shape
- Idealized shape = density curve
- Look for
- General pattern
- Symmetry
- Outliers
12 Symmetrical shapes
13 Asymmetrical shapes
14 Modality (no. of peaks)
15 Kurtosis (steepness of peak)
- Mesokurtic (medium)
- Platykurtic (flat): skinny tails
- Leptokurtic (steep): fat tails
Kurtosis can NOT be easily judged by eye
16 Second example (n = 8)
- Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42
- Truncate extra digit (e.g., 1.47 becomes 1.4)
- Stem = ones place
- Leaves = tenths place
- Do not plot decimal
1|4
2|03
3|4779
4|4
(x1)
- Center: between 3.4 and 3.7
- Spread: 1.4 to 4.4
- Shape: mound, no outliers
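The truncation rule in this example (drop the extra digit rather than round it) can be sketched as follows. The function name `truncated_stemplot` is hypothetical, and the sketch assumes non-negative values. `Decimal` is used so that a value like 1.47 is scaled exactly instead of picking up binary floating-point error.

```python
from decimal import Decimal

def truncated_stemplot(values):
    """Stem = ones place, leaf = tenths place; extra digits are truncated, not rounded."""
    rows = {}
    for v in sorted(values):
        tenths = int(Decimal(str(v)) * 10)   # 1.47 -> 14.7 -> 14 (truncated)
        stem, leaf = divmod(tenths, 10)      # 14 -> stem 1, leaf 4
        rows.setdefault(stem, []).append(str(leaf))
    lines = [f"{s}|{''.join(rows.get(s, []))}"
             for s in range(min(rows), max(rows) + 1)]
    lines.append("(x1)")                     # axis multiplier
    return lines

data = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]
decimal_plot = truncated_stemplot(data)
print("\n".join(decimal_plot))
```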
17 Third example (pollution.sav)
Regular stem:
1|4789
2|223466789
3|000123445678
(x1)
- Regular stemplot (top): too squished
- Split-stem (bottom)
- First 1 on stem: leaves 0 to 4
- Second 1 on stem: leaves 5 to 9
Split-stem:
1|4
1|789
2|2234
2|66789
3|00012344
3|5678
(x1)
Note negative skew
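A split-stem plot can be sketched by giving each stem two rows, one for leaves 0 to 4 and one for leaves 5 to 9. The values below are read back from the regular stemplot as stem-and-leaf pairs (14 stands for stem 1, leaf 4); they are not taken from pollution.sav itself, and the function name `split_stemplot` is hypothetical.

```python
def split_stemplot(stem_leaf_values):
    """Each value is 10 * stem + leaf. Every stem gets two rows: leaves 0-4, then 5-9."""
    rows = {}
    for v in sorted(stem_leaf_values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1          # 0 = first row of the stem, 1 = second row
        rows.setdefault((stem, half), []).append(str(leaf))
    stems = [key[0] for key in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):
            lines.append(f"{stem}|{''.join(rows.get((stem, half), []))}")
    return lines

# Stem-and-leaf pairs transcribed from the regular stemplot (25 values).
pairs = [14, 17, 18, 19,
         22, 22, 23, 24, 26, 26, 27, 28, 29,
         30, 30, 30, 31, 32, 33, 34, 34, 35, 36, 37, 38]
split_lines = split_stemplot(pairs)
print("\n".join(split_lines))
```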
18 How many stem values?
- Start with between 4 and 12 stem values
- Then use trial and error to draw out the shape for the most informative plot (use judgment)
19 Body weight (n = 53)
- Data range from 100 to 260 lb
- A x100 lb multiplier seems too broad (only two stem values)
- A x100 lb multiplier with split stem values is still too broad (only 4 stem values)
- Try a x10 lb stem multiplier
20 Body weight (n = 53)
10|0166
11|009
12|0034578
13|00359
14|08
15|00257
16|555
17|000255
18|000055567
19|245
20|3
21|025
22|0
23|
24|
25|
26|0
(x10)
10|0 means 100
Shape: positive skew, high outlier (260)
Location: median = 165
Spread: from 100 to 260
21 Quintuple split: body weight data (n = 53)
1*|0000111
1t|222222233333
1f|4455555
1s|666777777
1.|888888888999
2*|0111
2t|2
2f|
2s|6
(x100)
- Codes
- * for leaves 0 and 1
- t for leaves two and three
- f for leaves four and five
- s for leaves six and seven
- . for leaves eight and nine
- Example
- 2t|2 means a value of 220 (x100)
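The row codes can be expressed as a lookup on the leaf digit. This is a sketch only: the helper name `quintuple_row` is hypothetical, and the `*` symbol for leaves 0 and 1 is an assumption (the usual convention), since that character is missing from the extracted slide text. Weights are assumed to be whole pounds, with the ones digit truncated.

```python
CODES = "*tfs."   # index = leaf // 2; "*" for leaves 0-1 is an ASSUMED symbol

def quintuple_row(weight_lb):
    """Return (row label, leaf) for a weight in pounds with a x100 multiplier."""
    tens = weight_lb // 10            # truncate the ones digit: 226 -> 22
    stem, leaf = divmod(tens, 10)     # 22 -> stem 2, leaf 2
    return f"{stem}{CODES[leaf // 2]}", leaf

for w in (100, 165, 189, 220, 260):
    print(w, quintuple_row(w))
```

Each stem is split into five rows of two leaf digits each, which is why the leaf is divided by 2 to pick its code.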
22 Frequency counts (SPSS plot)
Age of participants. SPSS provides frequency counts w/ stemplot.
Frequency    Stem   Leaf
    2.00        3 . 0
    9.00        4 . 0000
   28.00        5 . 00000000000000
   37.00        6 . 000000000000000000
   54.00        7 . 000000000000000000000000000
   85.00        8 . 000000000000000000000000000000000000000000
   94.00        9 . 00000000000000000000000000000000000000000000000
   81.00       10 . 0000000000000000000000000000000000000000
   90.00       11 . 000000000000000000000000000000000000000000000
   57.00       12 . 0000000000000000000000000000
   43.00       13 . 000000000000000000000
   25.00       14 . 000000000000
   19.00       15 . 000000000
   13.00       16 . 000000
    8.00       17 . 0000
    9.00 Extremes (>=18)
Stem width: 1
Each leaf: 2 case(s)
3 . 0 means 3.0 years
Because of large n, each leaf represents 2 observations
23 Frequency tables
AGE     Freq   Rel.Freq (%)   Cum.Freq (%)
------------------------------------------
  3        2        0.3            0.3
  4        9        1.4            1.7
  5       28        4.3            6.0
  6       37        5.7           11.6
  7       54        8.3           19.9
  8       85       13.0           32.9
  9       94       14.4           47.2
 10       81       12.4           59.6
 11       90       13.8           73.4
 12       57        8.7           82.1
 13       43        6.6           88.7
 14       25        3.8           92.5
 15       19        2.9           95.4
 16       13        2.0           97.4
 17        8        1.2           98.6
 18        6        0.9           99.5
 19        3        0.5          100.0
------------------------------------------
Total    654      100.0
- Frequency = count
- Relative frequency = proportion or percentage
- Cumulative frequency = percentage less than or equal to current value
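The relative and cumulative columns follow directly from the counts. Below is a minimal sketch using the counts from the AGE table; the variable names are illustrative. Cumulative percentages are computed from running counts rather than by adding rounded relative frequencies, which appears to be how the table was built (for age 6, adding the rounded values 6.0 and 5.7 would give 11.7, while the table shows 11.6).

```python
from itertools import accumulate

ages = list(range(3, 20))                         # 3, 4, ..., 19
freqs = [2, 9, 28, 37, 54, 85, 94, 81, 90, 57, 43, 25, 19, 13, 8, 6, 3]

n = sum(freqs)                                    # total number of observations
rel = [100 * f / n for f in freqs]                # relative frequency (%)
cum = [100 * c / n for c in accumulate(freqs)]    # cumulative frequency (%)

for age, f, r, c in zip(ages, freqs, rel, cum):
    print(f"{age:>3} {f:>5} {r:>6.1f} {c:>6.1f}")
print(f"Total {n}")
```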
24 Class intervals
- When data are sparse, group data into class intervals
- Classes can be uniform or non-uniform
25 Uniform class intervals
- Create 4 to 12 class intervals
- Set end-point convention: include left boundary and exclude right boundary
- e.g., first class interval includes 0 and excludes 10 (0 to 9.99 years of age)
- Tally frequencies
- Calculate relative frequency
- Calculate cumulative frequency (demo)
26 Here's age data in sample.sav
Class          Freq   Rel. Freq. (%)   Cum. Freq. (%)
 0 to  9.99      1         10               10
10 to 19.99      1         10               20
20 to 29.99      4         40               60
30 to 39.99      1         10               70
40 to 49.99      1         10               80
50 to 59.99      2         20              100
Total           10        100               --
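The tally for uniform class intervals can be sketched as below, using the ten ages from sample.sav and the include-left, exclude-right convention. The interval width of 10 and the 0 to 60 span are taken from the table; the variable names are illustrative.

```python
ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
width = 10
n = len(ages)

rows = []
cumulative = 0
for lower in range(0, 60, width):
    upper = lower + width
    # Include the left boundary, exclude the right boundary.
    freq = sum(1 for a in ages if lower <= a < upper)
    cumulative += freq
    rows.append((lower, upper, freq, 100 * freq / n, 100 * cumulative / n))

for lower, upper, freq, rel, cum in rows:
    print(f"{lower:>2} to {upper - 0.01:5.2f}  {freq:>2}  {rel:5.0f}  {cum:5.0f}")
```

With this convention an age of exactly 10 would fall in the second class, not the first.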
27 Histogram for quantitative data
Bars are contiguous
28 Bar chart for categorical data
Bars are discrete