Title: Histogram
1Histogram
- The histogram is a graphical means of displaying numerical data. If we slice up the entire span of values covered by the quantitative variable into equal-width piles called bins (classes), a histogram plots the bin counts (class counts) as the heights of bars.
- It can be constructed from the stem-and-leaf plot: each stem defines an interval of values as a class. The class limits are the smallest and largest possible values for the interval. Now go back to Example 2.3.
2Grouped frequency table Example of 2.4
3Constructed Histogram
4 - Steps of construction
- find class limits and class boundaries
- find class frequency and construct grouped frequency table
- label horizontal axis using continuous scale
- label vertical axis for (relative) frequency
- draw bars using class boundaries and (relative) frequency
5Histogram - example
Question The heights of 325 students were
measured to the nearest cm
- Draw a histogram to illustrate the above data.
- Answer (a)
- find class limits and class boundaries
- label horizontal axis using continuous scale
- find class frequency
- label vertical axis for (relative) frequency
- draw bars using class boundaries and
(relative) frequency
6Histogram - example
Answer (a) find class limits
7Histogram - example
Answer (a) find class boundaries
E.g., the boundary between 154 and 155 is 154.5.
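The step from class limits to class boundaries can be sketched in a few lines. This is only an illustration: the class limits below are hypothetical (the original frequency table did not survive), and only the 154/155 pair comes from the slide. The function name `class_boundaries` and the `unit` parameter are assumptions made for this sketch.

```python
def class_boundaries(limits, unit=1.0):
    """Turn (lower, upper) class limits into class boundaries.

    `unit` is the measurement precision (1 cm here), so each boundary
    sits half a unit beyond the stated limit.
    """
    half = unit / 2
    return [(lo - half, hi + half) for lo, hi in limits]


# Hypothetical class limits; only the 154/155 pair comes from the slide.
limits = [(145, 154), (155, 164), (165, 174)]
boundaries = class_boundaries(limits)

for (lo, hi), (b_lo, b_hi) in zip(limits, boundaries):
    print(f"limits {lo}-{hi}  ->  boundaries {b_lo}-{b_hi}")
```

Adjacent classes share a boundary (154.5 here), which is what lets the bars of the histogram touch on a continuous scale.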
9Histogram - example
Answer (a) find class boundaries
[Figure: plot titled "heights of 325 students"; horizontal axis: Height (cm)]
10Histogram - example
Answer (a) label horizontal axis using continuous scale
[Figure: plot titled "heights of 325 students"; horizontal axis: Height (cm), marked from 140 to 200 in steps of 10]
11Histogram - example
Answer (a) find class widths
[Figure: plot titled "heights of 325 students"; surviving labels: 120, 10]
14Histogram - example
Answer (a) find class widths
[Figure: plot titled "heights of 325 students"; surviving labels: 120, 50, 10]
17Relative frequency histogram
18Stem-and-Leaf Display: cholest
Stem-and-leaf of cholest   N = 62
Leaf Unit = 10
    1   1  6
    4   1  899
   13   2  001111111
   30   2  22223333333333333
  (11)  2  44444555555
   21   2  66666677777
   10   2  88
    8   3  000
    5   3  2
19Exercise 2.4: create a histogram for this data set.
- The following is the concentration of mercury in 30 lake trout caught in a major lake:
- 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2 3.7 3.5
- 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3 3.3 3.6
- 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4 3.4 3.8
- Use boundaries 0.95-1.45, 1.45-1.95, 1.95-2.45, 2.45-2.95, 2.95-3.45, 3.45-3.95.
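The grouped frequency table behind this histogram can be tallied as follows. This is a minimal sketch using only the 30 values and the six boundaries given in the exercise; the function name `grouped_frequencies` is an assumption, and the text bars are a stand-in for a drawn histogram.

```python
mercury = [
    2.2, 3.4, 3.0, 2.6, 3.8, 1.8, 2.8, 3.2, 3.7, 3.5,
    1.4, 2.7, 3.6, 1.9, 2.2, 3.0, 3.3, 2.3, 3.3, 3.6,
    1.7, 2.6, 3.5, 3.0, 2.9, 3.4, 3.1, 2.4, 3.4, 3.8,
]
boundaries = [0.95, 1.45, 1.95, 2.45, 2.95, 3.45, 3.95]


def grouped_frequencies(data, boundaries):
    """Count how many observations fall between consecutive boundaries."""
    counts = [0] * (len(boundaries) - 1)
    for x in data:
        for i in range(len(counts)):
            if boundaries[i] <= x < boundaries[i + 1]:
                counts[i] += 1
                break
    return counts


counts = grouped_frequencies(mercury, boundaries)
n = len(mercury)

# One row per class: boundaries, frequency, relative frequency, text bar.
for i, c in enumerate(counts):
    lo, hi = boundaries[i], boundaries[i + 1]
    print(f"{lo:.2f}-{hi:.2f}  freq={c:2d}  rel={c / n:.3f}  {'#' * c}")
```

Because the boundaries end in .x5 and the data are recorded to one decimal place, no observation can land exactly on a boundary, so the choice of `<=` versus `<` does not affect the counts here.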
20Solution of exercise 2.5
21Population Frequency Curve (approximation of
histogram)
22Summary Measurements
- A parameter is a numerical summary measure of a population distribution (it refers to the entire population).
- A statistic is a numerical quantity calculated from the observations in a sample (it is obtained from information in the sample).
23Measures of Location and variability
- Chapter 2.4 Summary Measures of Location
- mean
- median
- quartiles
- Chapter 2.5 Summary Measures of variability
- range
- standard deviation (sd)
- Q-spread
24Mean
- The population mean, denoted by μ, is the balance point of the population distribution, also called the center of mass of the population distribution.
25Mean
- The sample mean is the average of all the observations. If a sample consists of observations $y_1, y_2, \ldots, y_n$, then the sample mean is $\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
26Example 2.4.1
- Here is the net worth of 10 residents of Washington state (in thousands of dollars): 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320. Compute the sample mean of the net worth.
- Solution: Sample mean = (100 + 1000 + 250 + 25 + 750 + 1575 + 2500 + 3200 + 670 + 320)/10 = 10390/10 = 1039. The average net worth of the 10 residents is 1039 thousand dollars.
27Continued
- What happens if we add Bill Gates' net worth of 40.5 billion dollars, which is 40500000 thousands of dollars?
- It is an outlier (a number that stands apart from the remainder of the data).
- The sample mean becomes about 3,682,763 thousand dollars.
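The pull of a single outlier on the mean can be checked directly. The sketch below uses the ten net-worth values as they appear in the ordered list of this example (with 1575, which is the value consistent with the stated mean of 1039); the helper name `sample_mean` is an assumption.

```python
# Net worth in thousands of dollars (values as in the example's ordered list).
net_worth = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]


def sample_mean(values):
    """Average of all observations."""
    return sum(values) / len(values)


mean_without = sample_mean(net_worth)
mean_with = sample_mean(net_worth + [40_500_000])  # add the outlier

print(mean_without)       # 1039.0
print(round(mean_with))   # 3682763
```

One observation out of eleven moves the mean from about one million dollars to several billion, which is why the mean is described as sensitive to outliers.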
28the net worth of residents
[Figure: surviving labels: 40500000, 710]
29Median
- The population median, denoted by ? , is the
numerical value that divides the population
distribution in half. It is also called the
second quartile.
[Figure: population distribution divided at the median, with 50% on each side]
30- The sample median, denoted by M, is the middle observation if n is odd, or the average of the two middle observations if n is even. In either case, the median is located at the position (n+1)/2 in the ordered data set.
31Example 2.4.1 (continued): 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320
- Steps to find median
- Step 1. Order observations from smallest to largest.
- 25 100 250 320 670 750 1000 1575 2500 3200
- Step 2. Count the observations, denote the total number as n. n = 10
32- Step 3. Find the location of the median, which is in the (n+1)/2 th position.
- If n is odd, the median is the middle value.
- If n is even, the median is the average of the middle two values (position n/2 + 1/2).
- (10+1)/2 = 5.5, so the median is
- (670 + 750)/2 = 710
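The three steps above can be sketched as a small function. This is an illustration rather than a reference implementation; the name `sample_median` is an assumption, and the data are the ten values from the ordered list in Step 1.

```python
def sample_median(values):
    """Median via the (n+1)/2 position rule."""
    ordered = sorted(values)          # Step 1: order the observations
    n = len(ordered)                  # Step 2: count them
    mid = n // 2                      # Step 3: locate the middle
    if n % 2 == 1:
        return ordered[mid]           # odd n: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two


net_worth = [100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320]
print(sample_median(net_worth))                  # 710.0
print(sample_median(net_worth + [40_500_000]))   # 750
```

Adding the outlier shifts the median only from 710 to 750, in contrast to its effect on the mean.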
33Exercise: Including Bill Gates' net worth, what is the median of the net worth?
- 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670, 320, 40500000
- Solution
- 25 100 250 320 670 750 1000 1575 2500 3200 40500000
- n = 11, (11+1)/2 = 6
- the median = 750
34Remark
- Skewness to the right pulls the mean somewhat to the right of the median.
- Skewness to the left pulls the mean somewhat to the left of the median.
35 Population Quartiles
36Sample quartile
37Quartiles
- The first quartile, denoted by ? 1 , is the numerical value that divides the lower half of the population in half. The first sample quartile, Q1, can estimate it.
- The third quartile, denoted by ? 3 , is the numerical value that divides the upper half of the population in half. The third sample quartile, Q3, can estimate it.
- The first and third sample quartiles, Q1 and Q3, are similarly defined for samples. The median is the second quartile, Q2.
38Example 2.4.3: Find the quartiles Q1 and Q3 of the data 3 5 8 2 11 5 4 8 8 6 9 7
- The first quartile is 4.5.
39How to find quartiles
- Step 1. Order the data and calculate the position of the median, (n+1)/2.
- 2 3 4 5 5 6 7 8 8 8 9 11
- n = 12, (12+1)/2 = 6.5
- Step 2. Determine the position of the quartile: truncate the median position to a whole number, add 1, and divide by 2.
- 6.5 truncates to 6, (6+1)/2 = 3.5
40- Step 3. Q1 (Q3) is found by counting from the lower (higher) end to the observation in the quartile position.
- Note: if the quartile position has a .5 decimal part, we average the two observations on either side.
- Q1 = (4+5)/2 = 4.5
- Q3 = (8+8)/2 = 8
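The quartile procedure in Steps 1 to 3 can be sketched as below. Note that this follows the truncated-median-position rule of these slides, which is only one of several quartile conventions, so library routines may return slightly different values. The names `quartiles` and `value_at` are assumptions.

```python
def quartiles(values):
    """Q1 and Q3 by the truncated-median-position rule from the slides."""
    ordered = sorted(values)
    n = len(ordered)
    median_pos = (n + 1) / 2               # Step 1, e.g. 6.5 for n = 12
    q_pos = (int(median_pos) + 1) / 2      # Step 2, e.g. (6 + 1)/2 = 3.5

    def value_at(position, data):
        """Observation at a 1-based position; average neighbours at .5."""
        k = int(position)
        if position == k:
            return data[k - 1]
        return (data[k - 1] + data[k]) / 2

    q1 = value_at(q_pos, ordered)          # Step 3: count from the low end
    q3 = value_at(q_pos, ordered[::-1])    # Step 3: count from the high end
    return q1, q3


data = [3, 5, 8, 2, 11, 5, 4, 8, 8, 6, 9, 7]
print(quartiles(data))   # (4.5, 8.0)
```

Counting from the high end is done by reversing the ordered list, so the same position is used for both quartiles.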
41Exercise 2.4.1
- data: -1, 1
- data: -2, 1, 1
- data: -3, -2, -1, 1, 1, 1, 1, 1, 1
- example 2:
- 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
- 1, 2, 1, 2, 1, 2, 1, 2, 1, 20
42Exercise 2.4.2
- Data (sorted!)
- 35 37 45 46 49 56 57 57 59 61 62 64 68 71 72 76 80 89 94
- Max = 94, Min = 35, n = 19, Mean = 62, Median = 61
- Q3 = Upper quartile = middle of upper half (include median if n is odd)
- Q1 = Lower quartile = middle of lower half (include median if n is odd)
- Upper half
- 61 62 64 68 71 72 76 80 89 94
- Q3 = (71+72)/2 = 71.5
- Lower half
- 35 37 45 46 49 56 57 57 59 61
- Q1 = (49+56)/2 = 52.5
43Exercise 2.4.3
- Researchers have investigated lead absorption in children of parents who worked in a factory where lead is used to make batteries. A stem-and-leaf plot is given (n = 10).
- 4 07
- 5
- 6 14
- 7 1349
- 8
- 9 2
- 10 3
- Compute the following quantities:
- The sample mean,
- the sample median M, first quartile Q1 and third quartile Q3.
- Hint: (10+1)/2 = 5.5, so the median is the average of the 5th and 6th numbers, which is ____. The 3rd number is Q1, which is _____, since the location for Q1 is (5+1)/2 = 3. Symmetrically, the 8th is Q3, which is ______.
- Low Q1 M Q3 High
44Example 2.4.4
- Two classes take an exam
- The first class has scores of 73, 74, 75, 76, 77
- The second class has scores of 50, 60, 75, 90, 100
- Compare the performance of the two classes. Mean = 75 for both classes.
45Chapter 2.5 Summary Measure of Variability
- range,
- standard deviation (sd)
- Q spread (inter quartile range) (IQR)
46- Range = H - L: the difference between the highest (maximum) measurement and the lowest (minimum) measurement. (The definition is the same for a population and a sample.)
- Class 1: 77 - 73 = 4
- Class 2: 100 - 50 = 50
47Variance and Standard deviation
- Attempt 1. Compute the deviations of the data from the mean and add them.
- For class 1, the deviations are
- 73 - 75 = -2, 74 - 75 = -1,
- 75 - 75 = 0, 76 - 75 = 1, 77 - 75 = 2
- (-2) + (-1) + 0 + 1 + 2 = 0
48- Attempt 2. Square the deviations to make them positive.
- For class 1, the squared deviations are
- 4, 1, 0, 1, 4; SS = 10
- Attempt 3. Take the average of them (divide by n-1).
- (4+1+0+1+4)/4 = 2.5 (variance of the scores of class 1)
49- The sample variance, denoted by $s^2$, is the average squared distance of all measurements from the sample mean: $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$, where $\bar{y}$ is the sample mean.
- The expression in the numerator is referred to as a sum of squares.
- Attempt 4. Take the square root to get back to the original units.
- For class 1, $s = \sqrt{2.5} \approx 1.58$
50- Standard deviation is the positive square root of the variance.
- The population standard deviation is denoted by σ; the sample standard deviation is denoted by s or SD (StDev).
- Exercise 2.4.3: Calculate the SD of the scores in class 2.
- 20.6155
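The four "attempts" (deviations, squared deviations, division by n-1, square root) can be traced in code. This is a minimal sketch using the two classes from Example 2.4.4; the function names are assumptions.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((y - mean) ** 2 for y in values)
    return sum_of_squares / (n - 1)


def sample_sd(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


class1 = [73, 74, 75, 76, 77]
class2 = [50, 60, 75, 90, 100]

print(sample_variance(class1))       # 2.5
print(round(sample_sd(class1), 2))   # 1.58
print(round(sample_sd(class2), 4))   # 20.6155
```

Both classes have the same mean of 75, so the difference between 1.58 and 20.6155 reflects spread alone.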
51Q-spread = Q3 - Q1
- It is the distance between the first and third sample quartiles.
- The corresponding population Q-spread is similarly defined using the population quartiles in place of the sample quartiles.
- For class 1, Q-spread = 76 - 74 = 2
- For class 2, Q-spread = 90 - 60 = 30
52Exercise 2.4.5
- Data set is given as follows
- 3 4 10 7 6
- mean median
- variance
- standard deviation
53Variability - The standard deviation
- The standard deviation is also meaningful when used with a single sample. The number of measurements that fall within 1, 2 and 3 standard deviations of the mean can be estimated by the following two rules:
- Empirical rule
- Chebyshev's rule
- The empirical rule applies only to bell-shaped, symmetrical distributions of data.
- Chebyshev's rule applies to any set of data.
54- Approximately 68% of the measurements fall within 1 standard deviation of the mean.
- Approximately 95% of the measurements fall within 2 standard deviations of the mean.
- Approximately 99.7% of the measurements fall within 3 standard deviations of the mean.
55Methods for Describing Sets of Data
- At least 3/4 of the measurements fall within two standard deviations of the mean, i.e. in the interval (mean - 2s, mean + 2s).
- At least 8/9 of the measurements fall within three standard deviations of the mean, i.e. in the interval (mean - 3s, mean + 3s).
- In general, for k > 1, at least (1 - 1/k^2) of the measurements fall within k standard deviations of the mean, i.e. in the interval (mean - ks, mean + ks).
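Chebyshev's lower bound is a one-line formula, sketched here so the 3/4 and 8/9 figures can be reproduced for other values of k. The function name `chebyshev_bound` is an assumption.

```python
def chebyshev_bound(k):
    """Minimum share of measurements within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_bound(k):.3f} of the measurements")
```

The bound is a guaranteed minimum for any data set, so the observed share in a particular sample is usually larger.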
56Exercise 2.4.5
- The recorded temperatures on the 24 launches previous to the Challenger accident are given here in a stem-and-leaf plot. Calculate the mean and the standard deviation and use them to give an interpretation of the amount of variability in the data using either the empirical rule or Chebyshev's rule (page 111).
- 5 378
- 6 3677789
- 7 000023556689
- 8 01
- Hint: it appears that the data are somewhat bell-shaped, so we apply the empirical rule. Mean = _____, StDev = _____. Based on the empirical rule, check our answer with this data set: how many observations are within (62.8, 77.2)? ____. What is the percentage? _____. How many are within (55.6, 84.4)? _______. What is the percentage? ________.
57Answer
- Mean = 70
- SD = 7.2
- 17/24 = 70.8% ≈ 68%
- 23/24 = 95.8% ≈ 95%
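These figures can be checked against the data. In the sketch below the 24 temperatures are read off the stem-and-leaf plot in the exercise (stem = tens, leaf = units); the helper name `share_within` is an assumption.

```python
import math

# Launch temperatures read from the stem-and-leaf plot.
temps = [
    53, 57, 58,
    63, 66, 67, 67, 67, 68, 69,
    70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79,
    80, 81,
]

n = len(temps)
mean = sum(temps) / n
sd = math.sqrt(sum((t - mean) ** 2 for t in temps) / (n - 1))


def share_within(k):
    """Count and share of observations within k SDs of the mean."""
    lo, hi = mean - k * sd, mean + k * sd
    count = sum(lo < t < hi for t in temps)
    return count, count / n


print(mean, round(sd, 1))
for k in (1, 2):
    count, share = share_within(k)
    print(f"within {k} SD: {count}/{n} = {share:.1%}")
```

The observed shares are close to the 68% and 95% of the empirical rule, which supports treating the data as roughly bell-shaped.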
58z-score
- In the above example, we observed that 31 degrees is unusually low. When 31 is included in the data set, mean = 68.44, StDev = 10.53. How low is it? To evaluate a single score, we calculate its z-score.
- The z-score corresponding to a particular observation is given by
- z = (observation - mean)/standard deviation
59z-score
- A negative z-score indicates that the observation is below the mean. It is generally assumed that any observation with a z-score greater than 3 in absolute value is an outlier.
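Applied to the 31-degree observation, the z-score formula gives the following. The sketch uses the mean and standard deviation quoted on the previous slide (68.44 and 10.53); the function name `z_score` is an assumption.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations an observation lies from the mean."""
    return (observation - mean) / sd


z = z_score(31, 68.44, 10.53)
print(round(z, 2))   # -3.56
print(abs(z) > 3)    # True: flagged as an outlier by the 3-SD convention
```

The negative sign shows the temperature is below the mean, and the magnitude above 3 is what marks it as unusual.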
60Exercise 2.4.6
- Here are the mean and SD of 800 m runs and long jumps:
- 800 m: mean = 137 sec, SD = 5 sec
- Long jump: mean = 6 m, SD = 0.3 m
- If Bacher's 800 m time was 129 seconds and Prokhorova's winning long jump was 6.6 m, which performance deserves more points?
61Exercise 2.4.7
- We have a data set of ages of 11 students in one university.
- 22 21 27 32 19 20 22 23 18 25
- Draw the stem-and-leaf plot and histogram
- Compute the sample mean and median
- Compute the range and Q-spread .