2: Frequency distributions - PowerPoint PPT Presentation

Slides: 29
Provided by: budger
Learn more at: https://www.sjsu.edu


1
2: Frequency distributions
  • Stemplots, frequency tables, histograms

2
Stem-and-leaf plots (stemplots)
  • Analyses start by exploring data with pictures
  • My favorite technique is the stemplot: a
    histogram-like display of data points

"You can observe a lot by looking." (Yogi Berra)
3
Illustrative example: sample.sav
  • An SRS (simple random sample) of AGE (in years)
  • Data as an ordered array (n = 10):
    05 11 21 24 27 28 30 42 50 52
  • Divide each data point into:
      • Stem values → first one or two digits
      • Leaf values → next digit
  • In this example:
      • Stem values → tens place
      • Leaf values → ones place
      • e.g., 21 has a stem value of 2 and a leaf value of 1
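The stem/leaf split above amounts to integer division by 10. A minimal Python sketch, assuming non-negative whole-number data; the function name stem_and_leaf and its leaf_unit parameter are illustrative, not part of the slides:

```python
# Sketch: split a non-negative integer into (stem, leaf).
# Assumption: leaf_unit is the place value of the leaf digit (1 -> ones, 10 -> tens);
# digits below the leaf are truncated, never rounded.

def stem_and_leaf(value: int, leaf_unit: int = 1) -> tuple[int, int]:
    """Return (stem, leaf); e.g., 21 -> (2, 1) when the leaf is the ones digit."""
    scaled = value // leaf_unit  # drop digits below the leaf place
    return divmod(scaled, 10)    # stem = everything left of the leaf digit


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]  # AGE values listed on the slide
for age in ages:
    stem, leaf = stem_and_leaf(age)
    print(f"{age:02d}: stem {stem}, leaf {leaf}")
```

With leaf_unit=10 the same function gives a 100-unit stem, e.g., 260 → stem 2, leaf 6.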

4
Stemplot (cont.)
  • Draw a stem-like axis from the lowest to the highest stem
  • Stems 0 to 5; x10 is the axis multiplier (important!)
  • Place each leaf next to its stem
  • e.g., 21 is plotted as leaf 1 on stem 2:

    0 |
    1 |
    2 | 1
    3 |
    4 |
    5 |
    (x10)
5
Continue plotting
  • Rearrange the leaves in rank order

    0 | 5
    1 | 1
    2 | 1478
    3 | 0
    4 | 2
    5 | 02
    (x10)

  • For discussion, let's rotate the plot

        8
        7
        4     2
    5 1 1 0 2 0
    -----------
    0 1 2 3 4 5  (x10)
    Rotated stemplot
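Putting the steps together (group leaves by stem, rank them, keep empty stems, label the multiplier) gives a text stemplot. This is a sketch under the same assumption as before (non-negative integers); stemplot is a hypothetical helper name:

```python
from collections import defaultdict


def stemplot(values, leaf_unit=1):
    """Return stemplot lines: one line per stem (empty stems included), leaves in rank order."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v // leaf_unit, 10)  # truncate, then split off the leaf digit
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):  # keep gaps visible
        leaf_str = "".join(str(d) for d in sorted(leaves[stem]))
        lines.append(f"{stem} | {leaf_str}".rstrip())
    lines.append(f"(x{leaf_unit * 10})")  # axis multiplier: value of one stem unit
    return lines


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print("\n".join(stemplot(ages)))
```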
6
Interpreting frequency distributions
  • Central location
      • Gravitational center → mean
      • Middle value → median
  • Spread
      • Range and inter-quartile range
      • Standard deviation and variance (next week)
  • Shape
      • Symmetry
      • Modality
      • Kurtosis

7
Mean: arithmetic average
  • Eyeball method → visualize where the plot would
    balance
  • Arithmetic method → sum of the values divided by n

        8
        7
        4     2
    5 1 1 0 2 0
    -----------
    0 1 2 3 4 5  (x10)
         ^ gravitational center
Eyeball method → balances around 25 to 30
Actual arithmetic average = 29.0
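The arithmetic method, plus a quick check of why the mean is the plot's balance point: the deviations from the mean sum to zero. A minimal sketch; mean is a hypothetical helper (Python's statistics.mean gives the same result):

```python
def mean(values):
    """Arithmetic method: total divided by n."""
    return sum(values) / len(values)


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
m = mean(ages)
print(m)  # 29.0

# Why "gravitational center": deviations below the mean exactly cancel those above it.
print(sum(x - m for x in ages))  # 0.0
```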
8
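Middle point → median
  • Count from the top to a depth of (n + 1) / 2
  • For the illustrative data:
      • n = 10
      • Depth of median = (10 + 1) / 2 = 5.5
      • A depth of 5.5 falls between the 5th value (27) and
        the 6th value (28), so median = (27 + 28) / 2 = 27.5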
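Counting to depth (n + 1) / 2 can be written directly; a fractional depth means averaging the two neighboring values. A sketch with the hypothetical helper median_by_depth, counting from the low end (the top of a vertical stemplot):

```python
import math


def median_by_depth(values):
    """Return (depth, median) using depth = (n + 1) / 2, counted from the low end."""
    data = sorted(values)
    n = len(data)
    depth = (n + 1) / 2
    # A fractional depth (e.g., 5.5) means "halfway between the 5th and 6th values".
    lower = data[math.floor(depth) - 1]  # -1 converts a 1-based depth to a 0-based index
    upper = data[math.ceil(depth) - 1]
    return depth, (lower + upper) / 2


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(median_by_depth(ages))  # (5.5, 27.5)
```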

9
Spread → variability
  • The easiest way to describe spread is to state the
    range, e.g., from 5 to 52 (not the best way)
  • A better way is to divide the data into a low
    group and a high group
      • Quartile 1 = median of the low group
      • Quartile 3 = median of the high group
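The low-group/high-group method for quartiles, sketched in Python. Conventions differ between textbooks and software (numpy's percentile interpolation, for example, can give different values), and this sketch assumes that for odd n the median is left out of both groups; quartiles is a hypothetical helper:

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3); Q1/Q3 are medians of the low and high groups."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    low = data[:half]
    # Assumption: when n is odd, the overall median is left out of both groups.
    high = data[half + n % 2:]
    return median(low), median(data), median(high)


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, med, q3 = quartiles(ages)
print(q1, med, q3, "IQR =", q3 - q1)  # 21 27.5 42 IQR = 21
```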

10
Shape → visual pattern
  • Skyline silhouette of the plot:
      • Symmetry
      • Mounds
      • Outliers (if any)
  • When n is small, it's too difficult to describe
    shape accurately

        X
        X
        X     X
    X X X X X X
    -----------
    0 1 2 3 4 5
11
What to look for in shape
  • Idealized shape → density curve
  • Look for:
      • General pattern
      • Symmetry
      • Outliers

12
Symmetrical shapes
13
Asymmetrical shapes
14
Modality (no. of peaks)
15
Kurtosis (steepness of peak)
  • Mesokurtic (medium)
  • Platykurtic (flat) → skinny tails
  • Leptokurtic (steep) → fat tails
Kurtosis can NOT be easily judged by eye
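Since kurtosis is hard to judge by eye, it is usually computed. One common definition is the moment-based excess kurtosis, m4 / m2**2 - 3, which is about 0 for a normal (mesokurtic) curve, positive for leptokurtic shapes and negative for platykurtic ones. A sketch assuming that definition (excess_kurtosis is a hypothetical helper); statistical packages often report a small-sample-adjusted version, so their values can differ for small n:

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (about 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment (variance, divisor n)
    m4 = sum((x - mean) ** 4 for x in values) / n  # 4th central moment
    return m4 / m2 ** 2 - 3


print(excess_kurtosis([1, 2, 3, 4, 5]))            # about -1.3: flat, platykurtic
print(excess_kurtosis([0] * 8 + [-10, 10]))        # 2.0: steep peak, fat tails, leptokurtic
```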
16
Second example (n = 8)
  • Data: 1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94,
    4.42
  • Truncate the extra digit (e.g., 1.47 → 1.4)
  • Stem → ones place
  • Leaves → tenths place
  • Do not plot the decimal point

    1 | 4
    2 | 03
    3 | 4779
    4 | 4
    (x1)

  • Center: between 3.4 and 3.7 (the 4th and 5th of the 8 values)
  • Spread: 1.4 to 4.4
  • Shape: mound, no outliers
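Truncation for decimal data, sketched in Python. The helper decimal_stemplot is hypothetical and assumes non-negative values recorded to two decimals; it converts to integer hundredths first so binary floating-point error cannot shift a value across a truncation boundary:

```python
from collections import defaultdict


def decimal_stemplot(values):
    """Stem = ones place, leaf = tenths place; the hundredths digit is truncated, not rounded."""
    leaves = defaultdict(list)
    for v in values:
        # Assumption: non-negative values with at most 2 decimals.
        hundredths = round(v * 100)                # e.g., 1.47 -> 147
        stem, leaf = divmod(hundredths // 10, 10)  # // 10 drops the hundredths digit: 147 -> 14 -> (1, 4)
        leaves[stem].append(leaf)
    return {s: "".join(map(str, sorted(leaves[s]))) for s in range(min(leaves), max(leaves) + 1)}


data = [1.47, 2.06, 2.36, 3.43, 3.74, 3.78, 3.94, 4.42]
for stem, leaf_str in decimal_stemplot(data).items():
    print(f"{stem} | {leaf_str}")
print("(x1)")
```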

17
Third example (pollution.sav)
Regular stemplot:
    1 | 4789
    2 | 223466789
    3 | 000123445678
    (x1)
  • Regular stemplot (top) → too squished
  • Split-stem (bottom):
      • First 1 on stem → leaves 0 to 4
      • Second 1 on stem → leaves 5 to 9

Split-stem plot:
    1 | 4
    1 | 789
    2 | 2234
    2 | 66789
    3 | 00012344
    3 | 5678
    (x1)
Note the negative skew
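Splitting each stem into a 0-4 line and a 5-9 line is just a regrouping of the leaves. A sketch, assuming the regular stemplot is given as a mapping from stem to leaf digits (the values below are read off the plot above) and that every stem is present; split_stems is a hypothetical helper:

```python
def split_stems(rows, parts=2):
    """Split each stem of a regular stemplot into `parts` lines.

    rows maps stem -> leaf digits as a string. With parts=2 the first line holds
    leaves 0-4 and the second leaves 5-9; parts must divide 10 (2 or 5).
    """
    width = 10 // parts  # leaf digits per line
    out = []
    for stem in sorted(rows):
        for p in range(parts):
            low, high = p * width, (p + 1) * width - 1
            leaves = "".join(sorted(d for d in rows[stem] if low <= int(d) <= high))
            out.append((stem, leaves))
    return out


# Regular stemplot of pollution.sav, read off the slide (multiplier x1).
regular = {1: "4789", 2: "223466789", 3: "000123445678"}
for stem, leaves in split_stems(regular):
    print(f"{stem} | {leaves}".rstrip())
```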
18
How many stem-values?
  • Start with between 4 and 12 stem values
  • Then use trial and error to bring out the shape and
    find the most informative plot (use judgment)

19
Body weight (n = 53)
  • Data range from 100 to 260 lbs.
  • A 100-lb. multiplier seems too broad (only two stem
    values)
  • A 100-lb. multiplier with split stem values is still
    too broad (only 4 stem values)
  • Try a 10-lb. stem multiplier
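The arithmetic behind choosing a multiplier: the number of stem lines depends on the multiplier and on how many lines each stem is split into. A sketch with the hypothetical helper count_stem_lines, counting empty lines too. The 4 to 12 range is only the starting rule; the body-weight example judges 4 lines too broad and settles on 17:

```python
import math


def count_stem_lines(lo, hi, stem_unit, parts=1):
    """Number of stemplot lines from lo to hi, counting empty lines.

    stem_unit is the axis multiplier (value of one stem step); parts is how many
    lines each stem is split into (1 = regular, 2 = split, 5 = quintuple).
    """
    line_width = stem_unit / parts  # range of values covered by one line
    return math.floor(hi / line_width) - math.floor(lo / line_width) + 1


# Body weight: data range from 100 to 260 lbs.
for unit, parts in [(100, 1), (100, 2), (10, 1), (100, 5)]:
    n_lines = count_stem_lines(100, 260, unit, parts)
    flag = "within 4-12" if 4 <= n_lines <= 12 else "outside 4-12"
    print(f"multiplier {unit}, {parts} line(s) per stem: {n_lines} lines ({flag})")
```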
20
Body weight (n = 53)
    10 | 0166
    11 | 009
    12 | 0034578
    13 | 00359
    14 | 08
    15 | 00257
    16 | 555
    17 | 000255
    18 | 000055567
    19 | 245
    20 | 3
    21 | 025
    22 | 0
    23 |
    24 |
    25 |
    26 | 0
    (x10)
    10 | 0 means 100
  • Shape: positive skew, high outlier (260)
  • Location: median = 165 (the 27th of the 53 values)
  • Spread: from 100 to 260
21
Quintuple split: body weight data (n = 53)
    1* | 0000111
    1t | 222222233333
    1f | 4455555
    1s | 666777777
    1. | 888888888999
    2* | 0111
    2t | 2
    2f |
    2s | 6
    (x100)
  • Codes:
      • * for leaves 0 and 1
      • t for leaves two and three
      • f for leaves four and five
      • s for leaves six and seven
      • . for leaves eight and nine
  • Example: 2t | 2 means a value of 220 (x100)
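The quintuple codes can be generated mechanically: with a 100-lb. multiplier the leaf is the tens digit, and leaf // 2 picks the line (*, t, f, s, .). A sketch with the hypothetical helper quintuple_stemplot; the weights below are read back from the 10-lb. body-weight stemplot, assuming whole-pound data:

```python
CODES = "*tfs."  # line codes: * = leaves 0-1, t = 2-3, f = 4-5, s = 6-7, . = 8-9


def quintuple_stemplot(values, stem_unit=100):
    """Stemplot with each stem split into five lines of two leaf digits each."""
    leaf_unit = stem_unit // 10
    lines = {}
    for v in values:
        stem, leaf = divmod(v // leaf_unit, 10)  # truncate below the leaf digit
        lines.setdefault(stem * 5 + leaf // 2, []).append(leaf)
    out = []
    for key in range(min(lines), max(lines) + 1):  # include empty lines such as 2f
        stem, part = divmod(key, 5)
        leaves = "".join(str(d) for d in sorted(lines.get(key, [])))
        out.append(f"{stem}{CODES[part]} | {leaves}".rstrip())
    return out


# Body weights (lbs.) as read off the 10-lb. stemplot for the n = 53 data.
weights = [100, 101, 106, 106, 110, 110, 119, 120, 120, 123, 124, 125, 127, 128,
           130, 130, 133, 135, 139, 140, 148, 150, 150, 152, 155, 157, 165, 165,
           165, 170, 170, 170, 172, 175, 175, 180, 180, 180, 180, 185, 185, 185,
           186, 187, 192, 194, 195, 203, 210, 212, 215, 220, 260]
print("\n".join(quintuple_stemplot(weights)))
```

As a cross-check, the 27th of the 53 sorted weights is 165, the median read off the 10-lb. plot.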

22
Frequency counts (SPSS plot)
Age of participants: SPSS provides frequency counts with its stemplot

    Frequency    Stem    Leaf

         2.00       3 .  0
         9.00       4 .  0000
        28.00       5 .  00000000000000
        37.00       6 .  000000000000000000
        54.00       7 .  000000000000000000000000000
        85.00       8 .  000000000000000000000000000000000000000000
        94.00       9 .  00000000000000000000000000000000000000000000000
        81.00      10 .  0000000000000000000000000000000000000000
        90.00      11 .  000000000000000000000000000000000000000000000
        57.00      12 .  0000000000000000000000000000
        43.00      13 .  000000000000000000000
        25.00      14 .  000000000000
        19.00      15 .  000000000
        13.00      16 .  000000
         8.00      17 .  0000
         9.00 Extremes    (>=18)

    Stem width:    1
    Each leaf:     2 case(s)
    3 . 0 means 3.0 years

Because of the large n, each leaf represents 2
observations
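How the leaf counts relate to the frequencies: with a stem width of 1 the leaf is the tenths digit, which is 0 for whole-year ages, and each leaf stands for 2 cases. A sketch, assuming leftover cases from odd counts are simply not drawn (this matches the leaf counts shown above but is not a description of SPSS internals); spss_style_rows is a hypothetical helper:

```python
# Age counts (whole years) for stems 3-17, from the SPSS output above (extremes >= 18 excluded).
age_counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81,
              11: 90, 12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8}
CASES_PER_LEAF = 2  # "Each leaf: 2 case(s)"


def spss_style_rows(counts, cases_per_leaf):
    """Return (frequency, stem, leaves) rows; every leaf is 0 because ages are whole years."""
    rows = []
    for stem, freq in counts.items():
        # Assumption: an odd leftover case gets no leaf symbol.
        rows.append((freq, stem, "0" * (freq // cases_per_leaf)))
    return rows


for freq, stem, leaves in spss_style_rows(age_counts, CASES_PER_LEAF):
    print(f"{freq:8.2f} {stem:5d} . {leaves}")
```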
23
Frequency tables
AGE   Freq   Rel.Freq (%)   Cum.Freq (%)
----------------------------------------
3        2            0.3            0.3
4        9            1.4            1.7
5       28            4.3            6.0
6       37            5.7           11.6
7       54            8.3           19.9
8       85           13.0           32.9
9       94           14.4           47.2
10      81           12.4           59.6
11      90           13.8           73.4
12      57            8.7           82.1
13      43            6.6           88.7
14      25            3.8           92.5
15      19            2.9           95.4
16      13            2.0           97.4
17       8            1.2           98.6
18       6            0.9           99.5
19       3            0.5          100.0
----------------------------------------
Total  654          100.0
  • Frequency → count
  • Relative frequency → proportion or percentage
  • Cumulative frequency → percentage of values less than
    or equal to the current value
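Relative and cumulative frequencies follow directly from the counts. A sketch with the hypothetical helper frequency_table; cumulative percentages are computed from the running count rather than by adding rounded percentages:

```python
def frequency_table(counts):
    """Return (total, rows) where rows are (value, freq, relative %, cumulative %)."""
    total = sum(counts.values())
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rel = 100 * counts[value] / total
        cum = 100 * running / total  # from the raw running count, so rounding errors don't pile up
        rows.append((value, counts[value], round(rel, 1), round(cum, 1)))
    return total, rows


age_counts = {3: 2, 4: 9, 5: 28, 6: 37, 7: 54, 8: 85, 9: 94, 10: 81, 11: 90,
              12: 57, 13: 43, 14: 25, 15: 19, 16: 13, 17: 8, 18: 6, 19: 3}
total, rows = frequency_table(age_counts)
for value, freq, rel, cum in rows:
    print(f"{value:<5}{freq:>5}{rel:>8.1f}{cum:>8.1f}")
print(f"Total{total:>5}")
```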

24
Class intervals
  • When data are sparse → group them into class
    intervals
  • Classes can be uniform or non-uniform

25
Uniform class intervals
  • Create 4 to 12 class intervals
  • Set an end-point convention: include the left boundary
    and exclude the right boundary
      • e.g., the first class interval includes 0 and
        excludes 10 (0 to 9.99 years of age)
  • Tally frequencies
  • Calculate relative frequency
  • Calculate cumulative frequency (demo)

26
Here's the AGE data from sample.sav
Class          Freq   Rel. Freq (%)   Cum. Freq (%)
0 to 9.99         1              10              10
10 to 19.99       1              10              20
20 to 29.99       4              40              60
30 to 39.99       1              10              70
40 to 49.99       1              10              80
50 to 59.99       2              20             100
Total            10             100              --
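The same recipe for uniform class intervals, with the left boundary included and the right boundary excluded, sketched in Python (class_interval_table is a hypothetical helper, assuming numeric data and a fixed width). For the sample.sav ages it reproduces the frequencies and cumulative percentages in the table above:

```python
def class_interval_table(values, width=10, start=0):
    """Tally values into uniform classes [start + k*width, start + (k+1)*width).

    Left boundary included, right boundary excluded (e.g., 0 to 9.99 for width 10).
    """
    n = len(values)
    tallies = {}
    for v in values:
        k = (v - start) // width  # floor division puts 10 in the 10-19.99 class, not 0-9.99
        tallies[k] = tallies.get(k, 0) + 1
    rows, running = [], 0
    for k in range(min(tallies), max(tallies) + 1):  # include empty classes
        freq = tallies.get(k, 0)
        running += freq
        low = start + k * width
        rows.append((low, low + width, freq, 100 * freq / n, 100 * running / n))
    return rows


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
for low, high, freq, rel, cum in class_interval_table(ages):
    print(f"{low} to <{high}: {freq}  {rel:.0f}%  {cum:.0f}%")
```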
27
Histogram for quantitative data
Bars are contiguous
28
Bar chart for categorical data
Bars are discrete