Histogram - PowerPoint PPT Presentation

About This Presentation
Title:

Histogram

Description:

draw bars using class boundaries and (relative) frequency. Histogram - example ... Draw a histogram to illustrate the above data. ... – PowerPoint PPT presentation

Slides: 62
Provided by: JJ16
Category:
Tags: draw | histogram


Transcript and Presenter's Notes

Title: Histogram


1
Histogram
  • The histogram is a graphical means of
    displaying numerical data. If we slice up the
    entire span of values covered by the quantitative
    variable into equal-width piles called bins
    (classes), a histogram plots the bin counts
    (class counts) as the heights of bars.
  • It can be constructed from the stem-and-leaf
    plot: each stem defines an interval of values as
    a class. The class limits are the smallest and
    largest possible values for the interval. Now go
    back to Example 2.3.

2
Grouped frequency table Example of 2.4
3
Constructed Histogram
4
  • Steps of construction
  • find class limits and class boundaries
  • find class frequency and construct grouped
    frequency table
  • label horizontal axis using continuous scale
  • label vertical axis for (relative) frequency
  • draw bars using class boundaries and (relative)
    frequency

5
Histogram - example
Question The heights of 325 students were
measured to the nearest cm
  • Draw a histogram to illustrate the above data.
  • Answer (a)
  • find class limits and class boundaries
  • label horizontal axis using continuous scale
  • find class frequency
  • label vertical axis for (relative) frequency
  • draw bars using class boundaries and
    (relative) frequency

6
Histogram - example
Question The heights of 325 students were
measured to the nearest cm
  • Draw a histogram to illustrate the above data.

Answer (a) find class limits
7
Histogram - example
Question The heights of 325 students were
measured to the nearest cm
  • Draw a histogram to illustrate the above data.

Answer (a) find class boundaries
E.g., the boundary between 154 and 155 is 154.5.
9
Histogram - example
Question The heights of 325 students were
measured to the nearest cm
  • Draw a histogram to illustrate the above data.

Answer (a) find class boundaries
[Figure: heights of 325 students; horizontal axis: Height (cm)]
10
Histogram - example
Question The heights of 325 students were
measured to the nearest cm
  • Draw a histogram to illustrate the above data.

Answer (a) label horizontal axis using a
continuous scale
[Figure: heights of 325 students; horizontal axis: Height (cm), ticks at 140, 150, 160, 170, 180, 190, 200]
11
Histogram - example
Question The heights of 325 students were
measured to the nearest cm
  • Draw a histogram to illustrate the above data.

Answer (a) find class widths
[Figure: heights of 325 students]
17
Relative frequency histogram
18
Stem-and-Leaf Display: cholest
Stem-and-leaf of cholest   N = 62   Leaf Unit = 10
    1   1  6
    4   1  899
   13   2  001111111
   30   2  22223333333333333
 (11)   2  44444555555
   21   2  66666677777
   10   2  88
    8   3  000
    5   3  2
19
Exercise 2.4: Create a histogram for this data set.
  • The following is the concentration of mercury
    in 30 lake trout caught in a major lake:
  • 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2
    3.7 3.5
  • 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3
    3.3 3.6
  • 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4
    3.4 3.8
  • Use boundaries 0.95-1.45, 1.45-1.95, 1.95-2.45,
    2.45-2.95, 2.95-3.45, 3.45-3.95.
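
The grouping step of this exercise can be checked with a short Python sketch. It counts how many of the 30 readings fall between each pair of the class boundaries given above and prints a rough text histogram. The variable names and the text rendering are illustrative choices, not part of the original slides.

```python
# Mercury concentrations of the 30 lake trout, as listed in the exercise.
mercury = [
    2.2, 3.4, 3.0, 2.6, 3.8, 1.8, 2.8, 3.2, 3.7, 3.5,
    1.4, 2.7, 3.6, 1.9, 2.2, 3.0, 3.3, 2.3, 3.3, 3.6,
    1.7, 2.6, 3.5, 3.0, 2.9, 3.4, 3.1, 2.4, 3.4, 3.8,
]

# Class boundaries given in the exercise (no observation lies on a boundary).
boundaries = [0.95, 1.45, 1.95, 2.45, 2.95, 3.45, 3.95]

# Class frequency: number of observations between consecutive boundaries.
counts = [
    sum(1 for x in mercury if lower < x < upper)
    for lower, upper in zip(boundaries, boundaries[1:])
]

# Relative frequency: class frequency divided by the number of observations.
relative = [c / len(mercury) for c in counts]

# Text histogram: one row per class, bar length equal to the class frequency.
for lower, upper, c, r in zip(boundaries, boundaries[1:], counts, relative):
    print(f"{lower:.2f}-{upper:.2f} | {'*' * c} ({c}, {r:.3f})")
```

Because the boundaries carry one more decimal place than the data, every reading falls strictly inside exactly one class.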

20
Solution of exercise 2.5
21
Population Frequency Curve (approximation of
histogram)
22
Summary Measurements
  • A parameter is a numerical summary measure of a
    population distribution (it refers to the entire
    population).
  • A statistic is a numerical quantity calculated
    from the observations in a sample (it is obtained
    from information in the sample).

23
Measures of Location and variability
  • Chapter 2.4 Summary Measures of Location
  • mean
  • median
  • quartiles
  • Chapter 2.5 Summary Measures of variability
  • range
  • standard deviation(sd)
  • Q-spread

24
Mean
  • The population mean, denoted by μ, is the balance
    point of the population distribution, also called
    the center of mass of the population
    distribution.

25
Mean
  • The sample mean is the average of all the
    observations. If a sample consists of
    observations y1, y2, ..., yn, then the sample mean
    is $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.

26
Example 2.4.1
  • Here is the net worth of 10 residents of
    Washington state (in thousands of dollars): 100,
    1000, 250, 25, 750, 1575, 2500, 3200, 670, 320.
    Compute the sample mean of the net worth.
  • Solution: Sample mean
    $\bar{y} = (100 + 1000 + 250 + 25 + 750 + 1575 + 2500 + 3200 + 670 + 320)/10 = 10390/10 = 1039$

The average net worth of the 10 residents is 1039
thousand dollars.
27
Continued
  • What happens if we add Bill Gates' net worth of
    40.5 billion dollars, which is 40500000
    thousands of dollars?
  • It is an outlier (a number that stands apart from
    the remainder of the data).
  • The sample mean becomes 40510390/11, about
    3,682,763 thousand dollars.
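
A minimal Python sketch of how a single outlier moves the sample mean. It assumes the ten values are those shown in the ordered list used for the median in this example (they include 1575 and reproduce the stated mean of 1039). The helper name `sample_mean` is illustrative.

```python
# Net worth of the 10 residents (thousands of dollars), taken from the
# ordered list in the example; an assumption about the intended data.
net_worth = [25, 100, 250, 320, 670, 750, 1000, 1575, 2500, 3200]

# Bill Gates' net worth in thousands of dollars, as stated on the slide.
gates = 40_500_000


def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_without = sample_mean(net_worth)
mean_with = sample_mean(net_worth + [gates])

print(mean_without)      # mean of the 10 residents
print(round(mean_with))  # mean after adding the outlier
```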

28
[Figure: the net worth of residents; labeled values 40500000 and 710]
29
Median
  • The population median, denoted by ? , is the
    numerical value that divides the population
    distribution in half. It is also called the
    second quartile.

[Figure: the population median divides the distribution into two halves of 50% each]
30
  • The sample median, denoted by M, is the middle
    observation if n is odd, or the average of the
    two middle observations if n is even. In either
    case, the median is located at the position
    (n+1)/2 in the ordered data set.

31
Example 2.4.1 (continued): 100, 1000, 250, 25, 750,
1575, 2500, 3200, 670, 320
  • Steps to find the median
  • Step 1. Order the observations from smallest to
    largest.
  • 25 100 250 320 670 750 1000 1575
    2500 3200
  • Step 2. Count the observations and denote the total
    number as n. Here n = 10.

32
  • Step 3. Find the location of the median, which is
    in the (n+1)/2 th position.
  • If n is odd, the median is the middle value.
  • If n is even, the median is the average of the
    middle two values (position n/2 + 1/2).
  • (10+1)/2 = 5.5, so the median is
  • (670 + 750)/2 = 710

33
Exercise: Including Bill Gates' net worth, what
is the median of the net worth?
  • 100, 1000, 250, 25, 750, 1575, 2500, 3200, 670,
    320, 40500000
  • Solution
  • 25 100 250 320 670 750 1000 1575
    2500 3200 40500000
  • n = 11, (11+1)/2 = 6
  • the median = 750
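
The three steps for the median can be sketched in Python as follows. The function name `sample_median` is illustrative, and the data are the ordered values shown in the solution.

```python
def sample_median(values):
    """Median located at position (n+1)/2 of the ordered data (1-based)."""
    ordered = sorted(values)             # Step 1: order the observations
    n = len(ordered)                     # Step 2: count them
    position = (n + 1) / 2               # Step 3: locate the median
    lower = ordered[int(position) - 1]   # observation at or just below the position
    if position == int(position):        # n odd: a single middle value
        return lower
    upper = ordered[int(position)]       # n even: average the two middle values
    return (lower + upper) / 2


residents = [25, 100, 250, 320, 670, 750, 1000, 1575, 2500, 3200]
median_without = sample_median(residents)
median_with = sample_median(residents + [40_500_000])

print(median_without, median_with)
```

Adding the outlier moves the median only from one middle observation to the next, while the mean changes by several orders of magnitude.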

34
Remark
  • Skewness to the right pulls the mean somewhat
    to the right of the median.
  • Skewness to the left pulls the mean somewhat to
    the left of the median.

35
Population Quartiles
36
Sample quartile
37
Quartiles
  • The first quartile, denoted by ? 1 , is the
    numerical value that divides the lower half of
    the population in half. The first sample
    quartile, Q1 can estimate it.
  • The third quartile, denoted by ? 3 , is the
    numerical value that divides the upper half of
    the population in half. The third sample quartile
    Q3 can estimate it.
  • The first and third sample quartiles, Q1 and Q3,
    are similarly defined for samples. The median is
    the second quartile, Q2.

38
Example 2.4.3: Find the quartiles Q1 and Q3 of
the data 3 5 8 2 11 5 4
8 8 6 9 7
  • The first quartile is 4.5.

39
How to find quartiles
  • Step 1. Order the data and calculate the position of
    the median, (n+1)/2.
  • 2 3 4 5 5 6 7 8 8
    8 9 11
  • n = 12, (12+1)/2 = 6.5
  • Step 2. Determine the position of the quartile:
    drop any fractional part of the median position,
    add 1, and divide by 2.
  • 6.5 becomes 6, (6+1)/2 = 3.5

40
  • Step 3. Q1 (Q3) is found by counting from the lower
    (higher) end to the observation in the quartile
    position.
  • Note: if the quartile position has a .5 decimal
    part, we average the two observations on either
    side.
  • Q1 = (4+5)/2 = 4.5
  • Q3 = (8+8)/2 = 8
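
The quartile steps can be sketched in Python as below. This follows the counting rule on these slides (truncate the median position, add 1, divide by 2), which can differ from the quartile conventions used by statistical software. The function names are illustrative.

```python
import math


def value_at(ordered, position):
    """Observation at a 1-based position; a .5 position averages its neighbours."""
    low = math.floor(position)
    high = math.ceil(position)
    return (ordered[low - 1] + ordered[high - 1]) / 2


def quartiles(values):
    """Return (Q1, Q3) using the counting rule described on the slides."""
    ordered = sorted(values)                              # Step 1: order the data
    median_position = (len(ordered) + 1) / 2
    quartile_position = (math.floor(median_position) + 1) / 2   # Step 2
    q1 = value_at(ordered, quartile_position)             # Step 3: count from the low end
    q3 = value_at(ordered[::-1], quartile_position)       # Step 3: count from the high end
    return q1, q3


data = [3, 5, 8, 2, 11, 5, 4, 8, 8, 6, 9, 7]
q1, q3 = quartiles(data)
print(q1, q3)
```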

41
Exercise 2.4.1
  • data -1, 1
  • data -2, 1,1
  • data -3, -2, -1, 1, 1, 1, 1, 1, 1
  • example 2
  • 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
  • 1, 2, 1, 2, 1, 2, 1, 2, 1, 20

42
Exercise 2.4.2
  • Data (sorted!)
  • 35 37 45 46 49 56 57 57 59 61 62 64 68 71 72 76
    80 89 94
  • Max = 94, Min = 35, n = 19, Mean = 62,
    Median = 61
  • Q3 = Upper quartile = middle of upper half
    (include median if n is odd)
  • Q1 = Lower quartile = middle of lower half
    (include median if n is odd)
  • Upper half
  • 61 62 64 68 71 72 76 80 89 94
  • Q3 = (71 + 72)/2 = 71.5
  • Lower half
  • 35 37 45 46 49 56 57 57 59 61
  • Q1 = (49 + 56)/2 = 52.5

43
Exercise 2.4.3
  • Researchers have investigated lead absorption in
    children of parents who worked in a factory where
    lead is used to make batteries. A stem-and-leaf
    plot is given (n = 10):
     4 | 0 7
     5 |
     6 | 1 4
     7 | 1 3 4 9
     8 |
     9 | 2
    10 | 3
  • Compute the following quantities:
  • The sample mean,
  • the sample median M, the first quartile Q1 and the
    third quartile Q3.
  • Hint: (10+1)/2 = 5.5, so the median is the average of
    the 5th and 6th numbers, which is ____. The 3rd
    number is Q1, which is _____, since the location
    for Q1 is (5+1)/2 = 3. Symmetrically, the 8th is
    Q3, which is ______.
  • Low Q1 M Q3 High

44
Example 2.4.4
  • Two classes take an exam
  • The first class has score of 73,74,75,76,77
  • The second class has score of 50,60,75,90,100
  • Compare the performance of the two classes.

Mean = 75 for both classes
45
Chapter 2.5 Summary Measure of Variability
  • range,
  • standard deviation (sd)
  • Q spread (inter quartile range) (IQR)

46
  • Range = H - L: the difference between the highest
    (maximum) measurement and the lowest (minimum)
    measurement. (Defined the same way for a population
    and a sample.)
  • Class 1: 77 - 73 = 4
  • Class 2: 100 - 50 = 50

47
Variance and Standard deviation
  • Attempt 1. Compute the deviations of the data from
    the mean and add them.
  • For class 1, the deviations are
  • 73 - 75 = -2, 74 - 75 = -1,
  • 75 - 75 = 0, 76 - 75 = 1, 77 - 75 = 2
  • (-2) + (-1) + 0 + 1 + 2 = 0

48
  • Attempt 2. Square the deviations to make them
    positive.
  • For class 1, the squared deviations are
  • 4, 1, 0, 1, 4; SS = 10
  • Attempt 3. Take the average of them (divide by n-1).
  • (4 + 1 + 0 + 1 + 4)/4 = 2.5 (variance of the scores of class 1)

49
  • The sample variance, denoted by $s^2$, is the
    average squared distance of all measurements from
    the sample mean:
    $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$
  • The expression in the numerator is referred to as
    a sum of squares.
  • Attempt 4. Take the square root to get back to
    the original units.
  • For class 1, $s = \sqrt{2.5} = 1.58$
50
  • Standard deviation is the positive square root of
    the variance.
  • The population standard deviation is denoted by
    σ; the sample standard deviation is denoted by s
    or SD (stDev).
  • Exercise 2.4.3: Calculate the SD of the scores in
    class 2.
  • s = 20.6155
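
The four "attempts" can be traced in a short Python sketch using the two classes from Example 2.4.4. The function name `summarize` and the returned tuple layout are illustrative.

```python
import math

class1 = [73, 74, 75, 76, 77]
class2 = [50, 60, 75, 90, 100]


def summarize(scores):
    """Return (sum of deviations, sum of squares, sample variance, sample SD)."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [y - mean for y in scores]   # Attempt 1: deviations always sum to 0
    ss = sum(d ** 2 for d in deviations)      # Attempt 2: sum of squared deviations
    variance = ss / (n - 1)                   # Attempt 3: divide by n - 1
    sd = math.sqrt(variance)                  # Attempt 4: back to the original units
    return sum(deviations), ss, variance, sd


dev_sum1, ss1, var1, sd1 = summarize(class1)
dev_sum2, ss2, var2, sd2 = summarize(class2)

print(dev_sum1, ss1, var1, round(sd1, 2))
print(dev_sum2, ss2, var2, round(sd2, 4))
```

Both classes have the same mean, so only the measures of variability separate them.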

51
Q-spread = Q3 - Q1
  • is the distance between the first and third
    sample quartiles.
  • The corresponding population Q-spread is
    similarly defined using the population quartiles
    in place of the sample quartiles.
  • For class 1, Q-spread = 76 - 74 = 2
  • For class 2, Q-spread = 90 - 60 = 30

52
Exercise 2.4.5
  • Data set is given as follows
  • 3 4 10 7 6
  • mean median
  • variance
  • standard deviation

53
Variability- The standard deviation
  • The standard deviation is also meaningful when
    used with only one sample. The proportion of
    measurements that fall within 1, 2 and 3 standard
    deviations of the mean is described by the
    following two rules:
  • - Empirical rule
  • - Chebyshev's rule
  • The empirical rule applies only to bell-shaped,
    symmetrical distributions of data.
  • Chebyshev's rule applies to any set of data.

54
  • Empirical rule

-Approximately 68% of the measurements fall
within 1 std of the mean.
-Approximately 95% of the measurements fall
within 2 std of the mean.
-Approximately 99.7% of the measurements
fall within 3 std of the mean.
55
Methods for Describing Sets of Data
  • Chebyshev's rule

-At least 3/4 of the measurements fall within
two standard deviations of the mean, i.e., in $(\bar{y} - 2s,\ \bar{y} + 2s)$.
-At least 8/9 of the measurements fall within
three standard deviations of the mean, i.e., in $(\bar{y} - 3s,\ \bar{y} + 3s)$.
-In general, for k > 1, at least (1 - 1/k^2) of the
measurements fall within k standard deviations of
the mean, i.e., in $(\bar{y} - ks,\ \bar{y} + ks)$.
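
A small Python sketch of the general bound, compared with the observed fraction for one data set. The scores of class 2 from Example 2.4.4 are used only as an illustration, and the function names are hypothetical.

```python
import statistics


def chebyshev_bound(k):
    """Lower bound 1 - 1/k^2 on the fraction within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed fraction of values less than k sample SDs from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (divides by n - 1)
    inside = [y for y in values if abs(y - mean) < k * sd]
    return len(inside) / len(values)


scores = [50, 60, 75, 90, 100]   # class 2 from Example 2.4.4
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(scores, k))
```

The bound is deliberately conservative: it holds for any data set, so the observed fraction is often much larger.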
56
Exercise 2.4.5
  • The recorded temperatures on the 24 launches
    previous to the Challenger accident are given
    here in a stem-and-leaf plot. Calculate the mean
    and the standard deviation and use them to give
    an interpretation of the amount of variability in
    the data using either the empirical rule or
    Chebyshev's rule (page 111).
    5 | 378
    6 | 3677789
    7 | 000023556689
    8 | 01
  • Hint: it appears that the data are somewhat
    bell-shaped, so we apply the empirical rule.
    Mean = _____, stDev = _____. To check the empirical
    rule against this data set: how
    many observations are within (62.8, 77.2)? ____.
    What is the percentage? _____. How many are
    within (55.6, 84.4)? _______. What is the
    percentage? ________.

57
Answer
  • Mean = 70
  • Sd = 7.2
  • 17/24 = 70.8%, close to 68%
  • 23/24 = 95.8%, close to 95%
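
These answers can be reproduced with a short Python sketch. It assumes the stem-and-leaf plot has a leaf unit of 1 degree, so that "5 | 378" reads as 53, 57, 58. The helper `count_within` is illustrative.

```python
import statistics

# Launch temperatures read from the stem-and-leaf plot (assumed leaf unit: 1).
temps = [
    53, 57, 58,
    63, 66, 67, 67, 67, 68, 69,
    70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79,
    80, 81,
]

mean = statistics.mean(temps)
sd = statistics.stdev(temps)   # sample SD (divides by n - 1)


def count_within(values, center, spread, k):
    """Number of values less than k * spread away from center."""
    return sum(1 for y in values if abs(y - center) < k * spread)


within1 = count_within(temps, mean, sd, 1)
within2 = count_within(temps, mean, sd, 2)

print(mean, round(sd, 1))
print(within1, round(100 * within1 / len(temps), 1))
print(within2, round(100 * within2 / len(temps), 1))
```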

58
z-score
  • In the above example, we observed that 31 degrees
    is unusually low. When 31 is included in the data
    set, mean = 68.44 and stDev = 10.53. How low is it? To
    evaluate a single score, we calculate its
    z-score.
  • The z-score corresponding to a particular
    observation is given by
  • z = (observation - mean)/standard deviation

59
z-score
  • A negative z-score indicates that the observation
    is below the mean. It is generally assumed that
    any observation with a z-score greater than 3 in
    absolute value is an outlier.
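
As a quick check, the z-score of the 31-degree observation can be computed in Python from the mean and standard deviation quoted for the data set that includes it. The function name `z_score` is illustrative.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd


# Mean and standard deviation quoted on the slide for the data including 31.
z = z_score(31, 68.44, 10.53)

# Rule of thumb from the slide: |z| > 3 flags an outlier.
is_outlier = abs(z) > 3

print(round(z, 2), is_outlier)
```

The negative sign shows that 31 lies below the mean, and its magnitude exceeds 3, so the rule of thumb flags it as an outlier.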

60
Exercise 2.4.6
  • Here are the mean and SD of 800 m runs and long
    jumps:
  • 800 m: mean = 137 sec, sd = 5 sec
  • Long jump: mean = 6 m, sd = 0.3 m
  • If Bacher's 800 m time was 129 seconds and
    Prokhorova's winning long jump was
  • 6.6 m, which performance deserves more points?

61
Exercise2.4.7
  • We have a data set of ages of 11 students in one
    university.
  • 22 21 27 32 19 20 22 23 18 25
  • Draw the stem-and-leaf plot and histogram
  • Compute the sample mean and median
  • Compute the range and Q-spread .