Title: Describe Distributions With Numbers
1Describe Distributions With Numbers
- Working With Quantitative Data
2The Mean
- The GPAs of 4 students are
- 4 2 3 3 (observations).
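- The mean (the average) GPA is (4 + 2 + 3 + 3)/4 = 3.
- The mean is the average value against which any individual observation can be compared. It can be considered the center of the data.
- Notation: for observations x1, x2, ..., xn, the mean is x̄ = (x1 + x2 + ... + xn)/n.
3- Midterm grades of 4 students are 70 60 80 70, then x̄ = (70 + 60 + 80 + 70)/4 = 70.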
4The Median
- The incomes of 5 people are
- 40 10 50 60 40.
- What's the midpoint? (The number of people above that income = the number of people below that income.)
- Sort the data (observations) from small to big:
- 10 40 40 50 60.
- The number in the middle is 40: in the sorted list, two incomes come before it and two come after it.
- This middle number in the sorted data is called the median of the distribution. The median can also be considered the center.
5The Median
- To find the median (M),
- (I) sort the data from small to big
- (II) M = the number in the middle.
- Midterm grades: 70 60 80 90 50 85 81
- Sorting: 50 60 70 80 81 85 90
- M = 80
- Midterm grades: 80 60 95 90
- Sorting: 60 80 90 95
- M = 85. (The middle lies halfway between 80 and 90, i.e., the average of the middle two.)
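The two steps above can be sketched in a few lines of Python. This is only an illustration of the sort-then-take-the-middle rule; the function name `median` is chosen here and does not come from the slides.

```python
def median(observations):
    """Return the median M of a list of numbers."""
    data = sorted(observations)      # step (I): sort from small to big
    n = len(data)
    if n == 0:
        raise ValueError("the median of no observations is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return data[mid]
    # Even count: average the two middle values.
    return (data[mid - 1] + data[mid]) / 2


print(median([70, 60, 80, 90, 50, 85, 81]))  # 80
print(median([80, 60, 95, 90]))              # 85.0
```

Sorting inside the function means the caller can pass the grades in any order.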
6More Examples
- Data: 6 2 4
- M = 2? No, the data must be sorted first.
- Sort: 2 4 6
- M = 4
- Data: 6 4 2 8
- Sort: 2 4 6 8
- M = 5 (the average of the middle two numbers)
7Comparison: Mean and Median
- Incomes:
- 10 40 40 50 60
- Mean = 40, M = 40
- Incomes:
- 10 40 40 50 600
- Mean = 148, M = 40
8Comparison: Mean and Median
- If the richest person's income rises a lot, the average rises a lot. However, the median stays the same.
- Both are measures of center, but the median is a resistant measure of center while the mean is not. So, in this sense, the median is a better measure of center.
- Say, in a little town of 4 families, the mean income of the families is 100k. But the state census bureau still labels the town as poor. How can that be?
- It turns out the incomes of the families are
- 5k 10k 5k 380k.
- One really rich family, three extremely poor families.
- M = 7.5k reflects the hard fact.
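The resistance of the median can be checked numerically. The sketch below uses Python's standard `statistics` module on the income lists from these slides; the variable names are chosen here for illustration.

```python
from statistics import mean, median

incomes = [10, 40, 40, 50, 60]
incomes_with_outlier = [10, 40, 40, 50, 600]   # richest income multiplied by 10

# The mean jumps from 40 to 148, while the median stays at 40.
print(mean(incomes), median(incomes))
print(mean(incomes_with_outlier), median(incomes_with_outlier))

# The little town: incomes in thousands of dollars.
town = [5, 10, 5, 380]
# The mean is 100, but the median is only 7.5.
print(mean(town), median(town))
```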
9Mean and Median in Distributions That Are Close to Symmetric
10Quartiles: Measurement of Spread
- To analyze data, just knowing the center (mean or median) is not enough. We also want to know how far the individuals are from the center, i.e., the spread.
- Quartiles measure the spread of the middle half of the population.
- To find the quartiles:
- (I) Sort the data from small to big.
- (II) Locate the three quartiles:
- The 2nd quartile = M (the median).
- The 1st quartile Q1 = the median of the data before M in the sorted list.
- The 3rd quartile Q3 = the median of the data after M in the sorted list.
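As a rough sketch of this rule, the function below splits the sorted data at the position of M and takes the median of each part. The name `quartiles` and the choice to leave M out of both halves when the count is odd are assumptions made here to match the hand method on these slides; software packages often use other rules.

```python
from statistics import median

def quartiles(observations):
    """Return (Q1, M, Q3) using the medians of the data before and after M."""
    data = sorted(observations)                 # step (I): sort
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]                         # observations before M
    if n % 2 == 1:
        upper = data[half + 1:]                 # odd count: skip M itself
    else:
        upper = data[half:]                     # even count: M falls between two values
    return median(lower), median(data), median(upper)


# Highway mileage of 7 midsize cars
print(quartiles([28, 29, 24, 25, 28, 30, 31]))  # (25, 28, 30)
```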
11Examples
- Highway mileage/gallon of 7 midsize cars:
- 28 29 24 25 28 30 31
- Sort: 24 25 28 28 29 30 31
- Minimum = 24, Maximum = 31
- M (2nd quartile) = 28
- Q1 = the median of 24 25 28 = 25
- Q3 = the median of 29 30 31 = 30
12Examples
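- Ages of 5 presidents at inauguration:
- 52 69 64 46 54
- (Carter, Reagan, Bush I, Clinton, Bush II)
- Sort: 46 52 54 64 69
- M = 54
- Q1 = the median of 46 52 = 49
- Q3 = the median of 64 69 = 66.5
- 6 college women's SSHA scores (Survey of Study Habits and Attitudes):
- 154 109 137 115 152 140
- Sort: 109 115 137 140 152 154
- M = (137 + 140)/2 = 138.5, Q1 = the median of 109 115 137 = 115, Q3 = the median of 140 152 154 = 152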
13The Five-number Summary and Boxplots
- The five-number summary: minimum, Q1, M, Q3, maximum.
- Review: M (the median) is a good measure of the center; the quartiles are good measures of the spread of the middle half of the data around M.
- The minimum and the maximum are measures of the spread of the whole data.
- The first row of data shows the 2001 earnings, after sorting, of 16 randomly chosen people who have a high school diploma but no college. The five-number summary is
- 5 19.5 24.5 41.5 67
- The second row shows the earnings of 15 college graduates, after sorting. The five-number summary is
- 4 30 35 55 110
14- 5 6 12 19 20 21 22 24 25 31 32 40 43 43 47 67
- 4 25 30 30 30 31 32 35 50 50 50 55 60 74 110
- This visual representation of the five-number summary is called a boxplot.
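The two five-number summaries quoted for the earnings data can be reproduced with a short sketch. The function name `five_number_summary` is chosen here for illustration, and it assumes the hand rule from these slides (Q1 and Q3 are the medians of the sorted data before and after M).

```python
from statistics import median

def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum)."""
    data = sorted(observations)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]          # the half of the data before M
    upper = data[n - half:]      # the half of the data after M
    return data[0], median(lower), median(data), median(upper), data[-1]


high_school = [5, 6, 12, 19, 20, 21, 22, 24, 25, 31, 32, 40, 43, 43, 47, 67]
college = [4, 25, 30, 30, 30, 31, 32, 35, 50, 50, 50, 55, 60, 74, 110]

print(five_number_summary(high_school))  # (5, 19.5, 24.5, 41.5, 67)
print(five_number_summary(college))      # (4, 30, 35, 55, 110)
```

A boxplot draws exactly these five values: the box runs from Q1 to Q3 with a line at M, and the whiskers reach to the minimum and the maximum.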
15Measuring Spread: The Standard Deviation
- Another way to measure spread is to see how far each observation is from the mean (these distances are called deviations).
- Grades (/10):
- 6 4 9 8 3
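16The Standard Deviation (s) Is the Square Root of the Variance
- The number of observations is n = 5, and the mean is x̄ = (6 + 4 + 9 + 8 + 3)/5 = 6.
- Variance: s² = [(6 - 6)² + (4 - 6)² + (9 - 6)² + (8 - 6)² + (3 - 6)²]/(5 - 1) = 26/4 = 6.5
- Standard deviation: s = √6.5 ≈ 2.55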
18Variance and Standard Deviation: Example from Text
- Metabolic rates of 7 men (cal/24 hr):
- 1792 1666 1362 1614 1460 1867 1439
22Variance (s²) and Standard Deviation (s)
- Find the mean.
- Find the deviation of each value from the mean.
- Square the deviations.
- Sum the squared deviations.
- Divide the sum by n - 1 to get the variance s².
- Take the square root to obtain the standard deviation s.
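The six steps translate almost line for line into code. The sketch below is one possible way to write them; the function name `variance_and_sd` is an illustrative choice, and the data are the grades and metabolic rates given earlier in these slides.

```python
from math import sqrt

def variance_and_sd(observations):
    """Return (s squared, s) following the six steps above."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(observations) / n                    # find the mean
    deviations = [x - x_bar for x in observations]   # deviation of each value
    squared = [d ** 2 for d in deviations]           # square the deviations
    total = sum(squared)                             # sum the squared deviations
    variance = total / (n - 1)                       # divide by n - 1
    return variance, sqrt(variance)                  # square root gives s


grades = [6, 4, 9, 8, 3]
var_grades, sd_grades = variance_and_sd(grades)
print(var_grades, round(sd_grades, 2))   # 6.5 2.55

rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
var_rates, sd_rates = variance_and_sd(rates)
print(round(sd_rates, 2))                # 189.24
```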
23(No Transcript)
24Five-Number Summary Or Mean-Standard Deviation
Summary
25The Five Numbers Based On Histogram
- The five numbers for quantitative variables: minimum, Q1, M, Q3, maximum.
- We generally cannot determine these numbers accurately by only looking at the histogram, but we CAN determine in which bin each of the five numbers resides. The same is usually not possible for the mean or the standard deviation s.
Answer: The median is in bin 3 or 4, so its value is between 4 and 8. Q1 is in bin 2, so its value is between 2 and 4. Q3 is in bin 4, so its value is between 6 and 8.
Answer: The median grade is between 2 and 4. Q1 is between 2 and 4. Q3 is between 6 and 8.
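Locating the median's bin amounts to adding up the bar heights until the middle position is reached. The sketch below illustrates that idea. The bin counts are hypothetical (the histograms from the slides are not reproduced here), and the function names are chosen for illustration.

```python
from itertools import accumulate

def bin_of_rank(counts, rank):
    """Return the 1-based number of the bin holding the rank-th smallest value."""
    if rank < 1:
        raise ValueError("rank must be at least 1")
    for bin_number, running_total in enumerate(accumulate(counts), start=1):
        if rank <= running_total:
            return bin_number
    raise ValueError("rank exceeds the number of observations")

def median_bins(counts):
    """Return the bins holding the one or two middle observations."""
    n = sum(counts)
    # Odd n: both ranks coincide. Even n: the median lies between ranks n/2 and n/2 + 1.
    low_rank, high_rank = (n + 1) // 2, (n + 2) // 2
    return bin_of_rank(counts, low_rank), bin_of_rank(counts, high_rank)


# Hypothetical bar heights for bins 1, 2, 3, ... (not taken from the slides)
print(median_bins([2, 5, 8, 4, 1]))  # (3, 3): the median is inside bin 3
print(median_bins([3, 7, 6, 4]))     # (2, 3): the median is in bin 2 or 3
```

When the two middle observations fall in different bins, the histogram alone can only say that the median is in one bin or the next, which is the situation in the first answer above.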
26Excel Instructions With Examples
- Preparation: download the file bonds.xls to the A drive. This file contains the number of home runs hit in each of the last 6 years by Barry Bonds of the San Francisco Giants. The data appear in the block B2:B7.
- Double-click bonds.xls to open the file.
- Note: activate a cell to enter data, then click anywhere else so that the data will be entered.
- Excel computes a slightly different five-number summary than the hand method, but the difference is not important statistically.
- Put labels next to the numbers computed, as in "mean = 4", "median = 5".
27Step-by-step: Five-number Summary
- Find the minimum: activate an empty cell by clicking in it (say D4) → enter =min(B2:B7).
- Find the 1st quartile Q1: activate another empty cell (say D5) → enter =quartile(B2:B7,1).
- Find the median M: activate another empty cell by clicking in it (D6) → enter =median(B2:B7).
- Find the 3rd quartile Q3: activate another empty cell (D7) → enter =quartile(B2:B7,3).
- Find the maximum: activate another empty cell (D8) → enter =max(B2:B7).
28Step-by-step: Mean and the Standard Deviation
- Compute the mean: activate an empty cell (say F2) → enter =average(B2:B7) → click somewhere else.
- Compute the variance: activate an empty cell (say F3) → enter =var(B2:B7) → click somewhere else.
- Compute the standard deviation: activate an empty cell (say F4) → enter =stdev(B2:B7) → click somewhere else.
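For readers without Excel, the same quantities can be computed with Python's standard `statistics` module (Python 3.8 or later). This is a sketch, not a reproduction of the spreadsheet: the home run data in bonds.xls are not listed in these slides, so the highway mileage example is used instead. It also assumes that the "inclusive" interpolation method mirrors what Excel's quartile function does, which would explain why the quartiles differ from the hand method.

```python
from statistics import mean, median, quantiles, stdev, variance

mileage = [28, 29, 24, 25, 28, 30, 31]   # highway mileage example from the slides

# Quartiles by linear interpolation (assumed to correspond to =quartile(range, k)).
q1, m, q3 = quantiles(mileage, n=4, method="inclusive")
summary = (min(mileage), q1, m, q3, max(mileage))
print(summary)             # (24, 26.5, 28.0, 29.5, 31)

# The hand method on the slides gives Q1 = 25 and Q3 = 30 for the same data.
print(median(mileage))     # 28

# Sample versions (divide by n - 1), like =var(...) and =stdev(...).
print(mean(mileage))
print(variance(mileage))
print(stdev(mileage))
```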