Describe Distributions With Numbers - PowerPoint PPT Presentation

Provided by: jinb

1
Describe Distributions With Numbers
  • Working With Quantitative Data

2
The Mean
  • The GPAs of 4 students are
  • 4 2 3 3 (observations).
  • The mean (the average) GPA is (4 + 2 + 3 + 3) / 4 = 3.
  • The mean is the average value against which any
    individual observation can be compared. It can be
    considered the center of the data.
  • Notation: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
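
The definition above can be sketched in a few lines of Python. The function name `mean` is only illustrative; it is not part of the original slides.

```python
def mean(observations):
    """Return the arithmetic mean: the sum of the observations divided by their count."""
    if not observations:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / len(observations)


gpas = [4, 2, 3, 3]
print(mean(gpas))  # 3.0
```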

3
  • Midterm grades of 4 students are 70 60 80 70, so
    the mean is (70 + 60 + 80 + 70) / 4 = 70.

4
The Median
  • The incomes of 5 people (in thousands) are
  • 40 10 50 60 40.
  • What's the midpoint? (# of people above that
    income = # of people below that income)
  • Sort the data (observations) from small to big:
  • 10 40 40 50 60.
  • The number in the middle is 40. Two of the other
    people make no more than 40k and two make more
    than 40k.
  • This middle number in the sorted data is called
    the median of the distribution. The median can
    also be considered the center.

5
The Median
  • To find the median (M),
  • (I) sort the data from small to big;
  • (II) M = the number in the middle.
  • Midterm grades: 70 60 80 90 50 85 81
  • Sorting: 50 60 70 80 81 85 90
  • M = 80
  • Midterm grades: 80 60 95 90
  • Sorting: 60 80 90 95
  • M = 85. (With an even number of observations, the
    middle lies halfway between 80 and 90, i.e. M is
    the average of the middle two.)
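
The two-step recipe above (sort, then take the middle) could be written as the following Python sketch. The function name `median` is illustrative, and the even-count rule follows the slide: average the middle two values.

```python
def median(observations):
    """Median by the slide's recipe: sort, then take the middle value."""
    ordered = sorted(observations)           # step (I): sort from small to big
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                           # odd count: a single middle value
        return ordered[mid]
    # even count: average of the middle two values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([70, 60, 80, 90, 50, 85, 81]))  # 80
print(median([80, 60, 95, 90]))              # 85.0
```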

6
More Examples
  • Data: 6 2 4
  • M ≠ 2 (the data must be sorted first)
  • Sort: 2 4 6
  • M = 4
  • Data: 6 4 2 8
  • Sort: 2 4 6 8
  • M = 5 (the average of the middle two numbers)

7
Comparison Mean and Median
  • Incomes: 10 40 40 50 60
  • Mean = 40, M = 40
  • Incomes: 10 40 40 50 600
  • Mean = 148, M = 40

8
Comparison Mean and Median
  • If the richest person's income rises a lot, the
    mean rises a lot. However, the median stays
    the same.
  • Both are measures of center, but the median is a
    resistant measure of center while the mean is
    not. So in that sense the median is a better
    measure of center.
  • Say, in a little town of 4 families, the mean
    family income is 100k. But the state census
    bureau still labels the town as poor. How can
    that be?
  • It turns out the incomes of the families are
  • 5k 10k 5k 380k.
  • One really rich family, three extremely poor
    families.
  • M = 7.5k reflects the hard facts far better.
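
To see the resistance of the median numerically, here is a small sketch using Python's standard `statistics` module. The helper name `center` is hypothetical; the data sets are the income examples from the slides, in thousands.

```python
from statistics import mean, median


def center(incomes):
    """Return (mean, median) so the two measures of center can be compared."""
    return mean(incomes), median(incomes)


original = [10, 40, 40, 50, 60]
with_outlier = [10, 40, 40, 50, 600]   # the richest person's income rises a lot
small_town = [5, 10, 5, 380]           # three poor families, one rich family

for label, data in [("original", original),
                    ("with outlier", with_outlier),
                    ("small town", small_town)]:
    m, med = center(data)
    print(f"{label}: mean = {m}, median = {med}")
```

The mean moves from 40 to 148 when one income changes, while the median stays at 40. In the town example the mean is 100 but the median is 7.5.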

9
Mean and Median in Distributions That Are Close to Symmetric

10
Quartiles: Measurement of Spread
  • To analyze data, just knowing the center (mean or
    median) is not enough. We also want to know how
    far the individuals are from the center, i.e.
    the spread.
  • Quartiles measure the spread of the middle half
    of the population.
  • To find the quartiles:
  • (I) sort the data from small to big;
  • (II) find three numbers:
  • The 2nd quartile = M (the median).
  • The 1st quartile Q1 = the median of the data
    before M.
  • The 3rd quartile Q3 = the median of the data
    after M.
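
A minimal Python sketch of this quartile rule follows. It assumes that, when the number of observations is odd, the median itself is left out of both halves, which matches the worked examples on the following slides. The function names are illustrative only.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using the median-of-halves rule from the slide."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]                                   # data before M
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # data after M
    return median(lower), median(ordered), median(upper)


print(quartiles([28, 29, 24, 25, 28, 30, 31]))  # (25, 28, 30)
```

Other quartile conventions exist (spreadsheets and statistics packages often interpolate), so results from software may differ slightly from this hand rule.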

11
Examples
  • Highway mileage (miles per gallon) of 7 midsize cars:
  • 28 29 24 25 28 30 31
  • Sort: 24 25 28 28 29 30 31
  • Minimum = 24, Maximum = 31
  • M (2nd quartile) = 28
  • Q1 = the median of 24 25 28 = 25
  • Q3 = the median of 29 30 31 = 30

12
Examples
  • Ages of 5 presidents at inauguration:
  • 52 69 64 46 54
  • (Carter, Reagan, Bush I, Clinton, Bush II)
  • Sort: 46 52 54 64 69
  • M = 54, Q1 = 49, Q3 = 66.5
  • SSHA scores (Survey of Study Habits and
    Attitudes) of 6 college women: 154 109 137 115 152 140
  • Sort: 109 115 137 140 152 154
  • M = 138.5, Q1 = 115, Q3 = 152

13
The Five-Number Summary and Boxplots
  • The five-number summary: minimum, Q1, M, Q3, maximum
  • Review: M (the median) is a good measure of the
    center; the quartiles are good measures of the
    spread of the middle half of the data around M.
  • The minimum and the maximum measure the
    spread of the whole data set.
  • The first row of data on the next slide is the
    sorted 2001 earnings of 16 randomly chosen people
    who have a high school diploma but no college.
    The five-number summary is
  • 5 19.5 24.5 41.5 67
  • The second row is the sorted earnings of 15 college
    graduates. The five-number summary is
  • 4 30 35 55 110

14
  • First row (high school diploma, no college):
    5 6 12 19 20 21 22 24 25 31 32 40 43 43 47 67
  • Second row (college graduates):
    4 25 30 30 30 31 32 35 50 50 50 55 60 74 110
  • This visual representation of the five-number
    summary is called a boxplot.
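
The two five-number summaries quoted on the previous slide can be reproduced from these rows with a short Python sketch. It assumes the median-of-halves quartile rule from the quartiles slide; `five_number_summary` is an illustrative name, not something from the deck.

```python
def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")

    def middle(values):
        # median of an already sorted, non-empty list
        k = len(values)
        mid = k // 2
        return values[mid] if k % 2 else (values[mid - 1] + values[mid]) / 2

    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], middle(lower), middle(ordered), middle(upper), ordered[-1]


high_school = [5, 6, 12, 19, 20, 21, 22, 24, 25, 31, 32, 40, 43, 43, 47, 67]
college = [4, 25, 30, 30, 30, 31, 32, 35, 50, 50, 50, 55, 60, 74, 110]

print(five_number_summary(high_school))  # (5, 19.5, 24.5, 41.5, 67)
print(five_number_summary(college))      # (4, 30, 35, 55, 110)
```
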
15
Measuring Spread: The Standard Deviation
  • Another way to measure spread is to see how far
    each observation is from the mean. These
    distances are called deviations.
  • Grades (out of 10):
  • 6 4 9 8 3

16
The Standard Deviation (s) Is the Square Root of
the Variance
  • The number of observations: n = 5.
  • Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
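
As a worked illustration, here is the calculation for the five grades from the previous slide (6 4 9 8 3). It assumes the n − 1 divisor that the step list later in the deck prescribes.

```latex
\begin{align*}
\bar{x} &= \frac{6 + 4 + 9 + 8 + 3}{5} = \frac{30}{5} = 6 \\
x_i - \bar{x} &: \quad 0,\ -2,\ 3,\ 2,\ -3 \\
s^2 &= \frac{0^2 + (-2)^2 + 3^2 + 2^2 + (-3)^2}{5 - 1} = \frac{26}{4} = 6.5 \\
s &= \sqrt{6.5} \approx 2.55
\end{align*}
```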

17
(No Transcript)
18
Variance and Standard Deviation: Example from Text
  • Metabolic rates of 7 men (cal./24 hr.):
  • 1792 1666 1362 1614 1460 1867 1439

19
(No Transcript)
20
Variance and Standard Deviation: Example from Text

21
Variance and Standard Deviation: Example from Text

22
Variance (s^2) and Standard Deviation (s)
  • Find the mean
  • Find the deviation of each value from the mean
  • Square the deviations
  • Sum the squared deviations
  • Divide the sum by n - 1 to get the variance s^2
  • Take the square root to obtain the standard
    deviation s.
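
The six steps above translate almost line for line into Python. The sketch below applies them to the metabolic-rate data from the earlier example slide; the function name `variance_and_sd` is illustrative.

```python
import math


def variance_and_sd(observations):
    """Return (variance, standard deviation) by following the six steps."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(observations) / n                  # 1. find the mean
    deviations = [x - mean for x in observations] # 2. deviation of each value
    squared = [d ** 2 for d in deviations]        # 3. square the deviations
    total = sum(squared)                          # 4. sum the squared deviations
    variance = total / (n - 1)                    # 5. divide by n - 1
    return variance, math.sqrt(variance)          # 6. take the square root


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
variance, sd = variance_and_sd(rates)
print(round(variance, 2), round(sd, 2))  # 35811.67 189.24
```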

23
(No Transcript)
24
Five-Number Summary or Mean-Standard Deviation Summary

25
The Five Numbers Based on a Histogram
  • The five numbers for quantitative variables:
    minimum, Q1, M, Q3, maximum
  • We generally cannot determine these numbers
    exactly just by looking at a histogram, but we
    CAN determine in which bin each of the five
    numbers lies. The mean and the standard
    deviation s usually cannot be located this way.



Answer: The median is in bin 3 or 4, its value between 4
and 8. Q1 is in bin 2, its value between 2 and
4. Q3 is in bin 4, its value between 6 and 8.
Answer: The median grade is between 2 and 4. Q1 is
between 2 and 4. Q3 is between 6 and 8.

26
Excel Instructions With Examples
  • Preparation: download the file bonds.xls to the
    A drive. This file contains the number of home
    runs hit in each of the last 6 years by Barry
    Bonds of the San Francisco Giants. The data appear
    in the block B2:B7.
  • Double-click bonds.xls with the left mouse button
    to open the file.
  • Note: activate a cell to enter data, then click
    anywhere else so that the data will be entered.
  • Excel computes a different five-number summary,
    but in statistical terms the difference is not
    important.
  • Put labels next to the numbers computed, as in
    "mean = 4", "median = 5".

27
Step-by-Step: Five-Number Summary
  • Find the minimum: activate an empty cell by
    clicking in it (say D4) → enter =min(B2:B7).
  • Find the 1st quartile Q1: activate another empty
    cell (say D5) → enter =quartile(B2:B7,1).
  • Find the median M: activate another empty cell by
    clicking in it (D6) → enter =median(B2:B7).
  • Find the 3rd quartile Q3: activate another empty
    cell (D7) → enter =quartile(B2:B7,3).
  • Find the maximum: activate another empty cell
    (D8) → enter =max(B2:B7).
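
For readers without Excel, the same five cells can be approximated in Python. This is a sketch under one assumption: that Excel's `quartile` function interpolates between data points in the same way as the `"inclusive"` method of Python's `statistics.quantiles`. Because the home-run data in bonds.xls is not reproduced in this transcript, the highway mileage data from the quartile example is used instead.

```python
from statistics import median, quantiles


def five_number_inclusive(observations):
    """Five-number summary with interpolated ("inclusive") quartiles."""
    q1, _, q3 = quantiles(observations, n=4, method="inclusive")
    return min(observations), q1, median(observations), q3, max(observations)


mileage = [28, 29, 24, 25, 28, 30, 31]
print(five_number_inclusive(mileage))  # (24, 26.5, 28, 29.5, 31)
```

The hand rule from the quartile slides gives Q1 = 25 and Q3 = 30 for the same data, which illustrates how software can report a slightly different five-number summary.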

28
Step-by-Step: Mean and the Standard Deviation
  • Compute the mean: activate an empty cell (say F2)
    → enter =average(B2:B7) → click somewhere else.
  • Compute the variance: activate an empty cell (say
    F3) → enter =var(B2:B7) → click somewhere else.
  • Compute the standard deviation: activate an empty
    cell (say F4) → enter =stdev(B2:B7) → click
    somewhere else.
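
A rough Python counterpart of these three cells uses the standard `statistics` module. Its `variance` and `stdev` functions divide by n - 1, which is assumed here to match Excel's `var` and `stdev`. The grades from the standard deviation slide stand in for the spreadsheet data, which is not included in this transcript.

```python
from statistics import mean, stdev, variance

grades = [6, 4, 9, 8, 3]

grade_mean = mean(grades)          # counterpart of =average(B2:B7)
grade_variance = variance(grades)  # counterpart of =var(B2:B7), divides by n - 1
grade_sd = stdev(grades)           # counterpart of =stdev(B2:B7)

print(grade_mean, grade_variance, round(grade_sd, 2))  # 6 6.5 2.55
```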

29
(No Transcript)