Title: Population Distributions
Chapter 7 Population Distributions
- 7.1 Describing the Distribution of Values in a Population
- 7.2 Population Models for Continuous Numerical Variables
- 7.3 Normal Distributions
7.1 Describing the Distribution of Values in a Population
- An Example: How well does an online registration system for college courses perform?
- Population: All the students who use the system
- Variable of Interest: Time to complete registration
- A variable associates a value with each individual or object in a population.
- The distribution of all the values of a numerical variable, or of all the categories of a categorical variable, is called a population distribution.
Categorical Variables Example
- In a study of factors related to air quality, monitors were posted at every entrance to a California university campus on October 10, 2003. From 6 a.m. to 10 p.m., monitors recorded the mode of transportation for each person entering the campus. Based on the information collected, the population distribution of the variable x = mode of transportation was constructed.
Numerical Variables
- Numerical variables can be either discrete or continuous.
- A discrete numerical variable is one whose possible values are isolated points along the number line.
- A continuous numerical variable is one whose possible values form an interval along the number line.
- A discrete numerical variable can be summarized by a relative frequency histogram.
- A continuous numerical variable can be summarized by a density histogram, in which the height of each rectangle is density = (relative frequency)/(interval width).
Therefore, relative frequency = (density)(interval width).
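The relationship between density and relative frequency can be checked numerically. The following is a minimal sketch in Python; the data values, the interval edges, and the helper name `density_histogram` are hypothetical and are not taken from the slides.

```python
def density_histogram(values, edges):
    """Return (relative frequencies, densities) for the intervals defined by edges."""
    n = len(values)
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Intervals are [left, right); the last one also includes its right edge.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                counts[i] += 1
                break
    rel_freqs = [c / n for c in counts]
    # density = relative frequency / interval width
    densities = [rf / (edges[i + 1] - edges[i]) for i, rf in enumerate(rel_freqs)]
    return rel_freqs, densities


# Hypothetical observations, grouped into intervals of unequal width.
values = [0.5, 1.2, 1.8, 2.5, 3.1, 4.4, 5.0, 7.5, 8.0, 9.9]
edges = [0, 2, 4, 10]

rel_freqs, densities = density_histogram(values, edges)
areas = [d * (edges[i + 1] - edges[i]) for i, d in enumerate(densities)]

print(rel_freqs)   # relative frequency of each interval
print(densities)   # rectangle heights
print(sum(areas))  # total area of the rectangles
```

Each rectangle's area (density times width) reproduces the interval's relative frequency, so the areas add up to 1 even when the intervals have different widths.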
Example: Pet Ownership
- The Department of Animal Regulation released information on pet ownership for the population of all households in a particular county. The variable considered was x = number of licensed dogs or cats for a household.
- Summarize the population distribution in a relative frequency histogram.
- Is x discrete or continuous?
- What is the probability of observing a household with 3 or more licensed dogs or cats?
Answer on the next slide
Answer to the Example: Pet Ownership
- x = number of licensed dogs or cats for a household
- Possible x values are 0, 1, 2, 3, 4, and 5. Because these are isolated points along the number line, x is a discrete variable.
- The probability of observing a household with 3 or more licensed dogs or cats is
- P(x ≥ 3) = P(x = 3) + P(x = 4) + P(x = 5) = .09 + .03 + .01 = .13
- Exercises
- What is the probability of observing a household with at most 2 licensed dogs or cats?
- What is the probability of observing a household with at least 2 licensed dogs or cats?
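For a discrete variable, the probability of an event is the sum of the probabilities of the individual values it contains. A minimal Python sketch of that addition follows, using only the three probabilities that sum to 0.13 in the answer above (read here as 0.09, 0.03, and 0.01); the helper name `prob_at_least` is an assumption made for illustration.

```python
# Probabilities for x = 3, 4, 5, read from the answer above as 0.09, 0.03, 0.01.
# The probabilities for x = 0, 1, 2 come from a figure that is not reproduced here.
upper_values = {3: 0.09, 4: 0.03, 5: 0.01}


def prob_at_least(pmf, k):
    """Add the probabilities of all listed values that are >= k."""
    return round(sum(p for value, p in pmf.items() if value >= k), 2)


p_3_or_more = prob_at_least(upper_values, 3)
p_4_or_more = prob_at_least(upper_values, 4)

print(p_3_or_more)
print(p_4_or_more)
```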
Example: Birth Weights
- Birth weight was recorded for all full-term babies born during 2002 in a semirural county. The variable x = birth weight for a full-term baby in this county is an example of a continuous numerical variable.
- We can construct a density histogram to describe the population distribution of x values.
- Density = the height of each rectangle
- Relative frequency = Probability
- = (density)(interval width)
- = (height)(interval width) = area
The Mean (µ) and Standard Deviation (σ) of a Numerical Variable
- The mean value of a numerical variable x, denoted by µ, describes where the population distribution of x is centered.
- The standard deviation of a numerical variable x, denoted by σ, describes variability in the population distribution.
- When σ is near 0, the values of x tend to be close to µ (little variability); when σ is large, there is more variability in the population of x values.
- Which distribution has the largest standard deviation?
- Which distribution has the largest mean?
- Which distribution has a mean of about 5?
- Which distribution has the smallest standard deviation?
- The area of any rectangle in a density histogram can be interpreted as the probability of observing a variable value in that interval.
- In (a), P(4.5 < x < 5.5) = height (density) × width = (.05)(1) = .05.
A smooth curve specifies a continuous probability distribution
- Observing the four density histograms on the preceding slide shows that
- A density histogram based on a small number of intervals can be quite jagged.
- As the number of intervals increases, the resulting histograms become much smoother in appearance.
- A smooth curve superimposed over a density histogram, such as the one shown on the right, is called a continuous probability distribution.
Figure: a smooth curve superimposed over the density histogram (d) of the preceding slide
7.2 Population Models for Continuous Numerical Variables
- A continuous probability distribution is a smooth curve, called a density curve, that serves as a model for the population distribution of a continuous variable.
- Properties of continuous probability distributions are
- The total area under the curve is equal to 1.
- The area under the curve and above any particular interval is interpreted as the (approximate) probability of observing a value in the corresponding interval when an individual or object is selected at random from the population.
- Let x be a continuous numerical variable.
- For any particular number a, P(x = a) = 0.
- (Because there is 0 area under the density curve above a single x value.)
- For any particular numbers a and b,
- P(x ≤ b) = P(x < b)
- P(x ≥ a) = P(x > a)
- P(a < x < b) = P(a ≤ x ≤ b)
- The above are NOT true for discrete numerical variables!
Example: Departure Delays of a Commuter Train
- The length of time that elapses between the scheduled departure time and the actual departure time is recorded on 200 occasions, and the resulting observations are summarized in the density histogram.
Because the histogram in (a) is fairly flat, a reasonable model for the population distribution is the uniform distribution in (b). The height of the density curve is chosen to be 0.1 so that the total area under the curve is equal to 1.
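Under a uniform model, every probability is the area of a rectangle. The sketch below is illustrative only: a height of 0.1 implies an interval of width 10, but the endpoints 0 and 10 (in minutes) are an assumption, since the figure is not reproduced here, and `uniform_probability` is a hypothetical helper name.

```python
def uniform_probability(a, b, low=0.0, high=10.0):
    """P(a < x < b) for a uniform density on [low, high] (endpoints assumed)."""
    height = 1.0 / (high - low)           # chosen so that the total area is 1
    left, right = max(a, low), min(b, high)
    width = max(0.0, right - left)        # no area outside [low, high]
    return height * width


total_area = uniform_probability(0, 10)    # whole interval
p_2_to_5 = uniform_probability(2, 5)       # rectangle of width 3 and height 0.1
p_single = uniform_probability(4, 4)       # a single value has zero area

print(total_area, p_2_to_5, p_single)
```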
Example 7.6: Priority Mail Package Weights
- 200 packages shipped using the Priority Mail rate for packages under 2 lb were weighed, resulting in the following density histogram.
- Let x = package weight (in pounds).
- 1. Develop a reasonable model for the population distribution.
- The shape of the sample density histogram suggests that a reasonable model for the population is a triangular distribution. (See next slide.)
- 2. How should the height of the triangle be chosen so that the total area under the probability distribution curve = 1?
- Total area of triangle = ½ (base)(height) = 1.
- With base = 2.0, the height must be equal to 1.
Example: Priority Mail Package Weights
Figure: (a) histogram of package weight values; (b) continuous probability distribution for package weight.
- Example: Find the probability that a package selected at random weighs less than 1.5 lb.
- First, use similar triangles to find the height h of the density curve at x = 1.5: h = 0.75.
- Then P(x < 1.5) = ½ (1.5)(0.75) = .5625.
- Exercise Find P( x 1.5 ) and P( x 1.5 ).
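The similar-triangles calculation can be written as a short function. This sketch assumes the density rises in a straight line from 0 at x = 0 to height 1 at x = 2, which is consistent with the height of 0.75 at x = 1.5 found above; the function name `triangular_cdf` is hypothetical.

```python
def triangular_cdf(x, base=2.0):
    """P(X < x) for a density rising linearly from 0 at 0 to its peak at x = base."""
    peak = 2.0 / base                 # makes 1/2 * base * peak equal to 1
    if x <= 0:
        return 0.0
    if x >= base:
        return 1.0
    height_at_x = peak * x / base     # similar triangles
    return 0.5 * x * height_at_x      # area of the smaller triangle


p_less_than_1_5 = triangular_cdf(1.5)
print(p_less_than_1_5)
```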
Example: Airline Reservation Service Times
- An airline's toll-free reservation number recorded the length of time required to provide service to each of 500 callers. Let x = service time.
- What is the population?
- Develop a model for the population distribution.
Example: Airline Reservation Service Times
Figure: (a) histogram of service times; (b) continuous distribution of service times.
- 1. What is the probability that the service time for a randomly selected caller is less than 3 minutes?
- P(x < 3) = (height)(width) = (1/24)(3) = 1/8
- 2. What is the probability that the service time for a randomly selected caller is greater than 8 minutes?
- 3. What is the probability that the service time for a randomly selected caller is between 2 and 4 minutes?
Example: Telephone Registration Times
- Students at a university use a telephone registration system to register for courses. The variable is x = length of time required for a student to register.
- The general form of the density histogram can be described as bell shaped and symmetric.
- The probability model for this problem is an example of a type of symmetric bell-shaped distribution known as a normal probability distribution.
The superimposed smooth curve is a reasonable model for the population distribution.
7.3 Normal Distributions
- Normal distributions are widely used for two reasons:
- They provide a reasonable approximation to the distribution of many different variables.
- They play a central role in many inferential procedures.
- Normal distributions are distinguished by two important parameters:
- The mean µ: where the normal curve is centered.
- The standard deviation σ: how much the curve spreads out around the center.
Normal Distributions
- Normal distributions are continuous probability distributions that are (1) bell shaped and (2) symmetric, with (3) two tails that die out quickly.
The Standard Normal Distribution
- The standard normal distribution is the normal distribution with µ = 0 and σ = 1.
- The term z curve is used for the standard normal curve.
- P(z < z*) = the cumulative area under the z curve to the left of z*.
Using Appendix Table 2 on pages 706 and 707 (also inside the back cover) to find standard normal curve areas
- For any number z* between -3.89 and 3.89, rounded to two decimal places, Appendix Table 2 gives
- (area under z curve to the left of z*) = P(z < z*) = P(z ≤ z*)
- where z represents a variable whose distribution is the standard normal distribution.
- To find this probability, locate the following:
- The row labeled with the sign of z* and the digits on either side of the decimal point (e.g., -1.7, 0.5, 3.6, etc.)
- The column identified with the second digit to the right of the decimal point in z* (e.g., .06 if z* = -1.76)
- The number at the intersection of this row and column is the desired probability, P(z < z*).
- Example: Find P(z < -1.76).
- Because -1.76 is a negative number, we use the table on page 706. First locate the row labeled -1.7 in the first column (the z* column). Then we find P(z < -1.76) = .0392 at the intersection of this row and the column labeled .06.
- Note: You may use an online z table instead of Table 2, but make sure you understand how to read it, because it may have a different layout.
Using Table 2 (pp. 706-707) to Find Standard Normal Probabilities (Cumulative z Curve Areas)
- Examples: Find the following probabilities.
- 1. P(z < -1.76) = .0392
- Exercise: P(z ≤ 0.58)
- P(z < -4.12)
- P(z < 4.18)
- 2. P(z > 1.96) = 1 - P(z < 1.96) = 1 - .9750 = .0250
- Another method: P(z > 1.96) = P(z < -1.96) = .0250
- Exercise: P(z > -1.28)
- P(z > 3.9)
- 3. P(-1.76 < z < 0.58) = P(z ≤ 0.58) - P(z < -1.76) = .7190 - .0392 = .6798
- Exercise: P(-2.00 < z < 2.00)
Answers to the exercises: P(z ≤ 0.58) = .7190; P(z < -4.12) ≈ 0; P(z < 4.18) ≈ 1; P(z > -1.28) = .8997; P(z > 3.9) ≈ 0; P(-2.00 < z < 2.00) = .9544.
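The table lookups in the three worked examples can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for Appendix Table 2; this is an assumption about tooling, not part of the slides, and the results agree with the table only after rounding to four decimal places.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

# 1. Cumulative area to the left of -1.76
p_left = z.cdf(-1.76)

# 2. Area to the right of 1.96, found as 1 minus the cumulative area
p_right = 1 - z.cdf(1.96)

# 3. Area between -1.76 and 0.58, found as a difference of cumulative areas
p_between = z.cdf(0.58) - z.cdf(-1.76)

print(round(p_left, 4), round(p_right, 4), round(p_between, 4))
```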
Example: Identifying Extreme Values
- Find z* such that P(z < z*) = .02.
- Figure (a) shows that the cumulative area for z* is .02. (Because the area .02 < .5, z* must be a negative number.) Therefore, we look for an area of .02 in the body of Appendix Table 2. The closest area in the table is .0202, in the -2.0 row and .05 column. So z* = -2.05.
- Find z* such that P(z > z*) = .05.
- P(z > z*) = .05 indicates that the area to the right of z* is .05 in Figure (b). The area to the left of z* is 1 - .05 = .95. In Table 2 on page 707, .95 falls exactly between .9495 (corresponding to a z value of 1.64) and .9505 (corresponding to a z value of 1.65). So z* = ½(1.64 + 1.65) = 1.645.
- Exercise: Find the values that make up the most extreme 5% of the standard normal distribution.
- We need to separate the middle 95% from the extreme 5%. Because the standard normal distribution is symmetric, the most extreme 5% is equally divided between the high side and the low side of the distribution, resulting in an area of .025 for each of the tails of the z curve. Symmetry about 0 implies that if z* denotes the value that separates the largest 2.5%, the value that separates the smallest 2.5% is simply -z*. Complete the problem by finding z*.
Answer: z* = 1.96
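Going from a cumulative area back to a z value is the inverse of the table lookup. As a hedged sketch, `NormalDist.inv_cdf` from Python's standard `statistics` module can play the role of reading the body of the table; its results differ slightly from the table-based answers because the table is rounded to two decimal places in z.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

z_low = z.inv_cdf(0.02)        # cumulative area .02 (table answer: -2.05)
z_high = z.inv_cdf(1 - 0.05)   # area .05 to the right (table answer: 1.645)
z_extreme = z.inv_cdf(0.975)   # area .025 in the upper tail (table answer: 1.96)

print(round(z_low, 3), round(z_high, 3), round(z_extreme, 3))
```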
Other Normal Distributions
- Let x be a variable whose behavior is described by a normal distribution with mean µ and standard deviation σ. To calculate probabilities for x, we standardize the relevant values and then use the table for z curve areas.
- P(x < b) = P(z < b*)
- P(a < x) = P(a* < z) (equivalently, P(x > a) = P(z > a*))
- P(a < x < b) = P(a* < z < b*),
- where a* = (a - µ)/σ and b* = (b - µ)/σ.
Example: Children's Heights
- The height of a randomly selected 5-year-old child is normally distributed with mean µ = 100 cm and standard deviation σ = 6 cm. What proportion of the heights is between 94 and 112 cm?
- Let x = the height of a randomly selected 5-year-old child.
- P(94 < x < 112) = P(-1.00 < z < 2.00) = .9772 - .1587 = .8185
About 82% of 5-year-old children have heights between 94 and 112 cm.
Exercise: What is the probability that a randomly selected 5-year-old child will be taller than 110 cm?
Answer to Exercise: 4.75%
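The standardize-then-look-up procedure for the height example can be sketched in a few lines of Python. The function name `normal_prob_between` is hypothetical, and `NormalDist` from the standard `statistics` module stands in for the z table, so the result matches the table answer only after rounding.

```python
from statistics import NormalDist


def normal_prob_between(a, b, mu, sigma):
    """P(a < x < b) for a normal variable x, computed by standardizing a and b."""
    a_star = (a - mu) / sigma   # standardized lower endpoint
    b_star = (b - mu) / sigma   # standardized upper endpoint
    z = NormalDist()            # standard normal curve
    return z.cdf(b_star) - z.cdf(a_star)


# Heights of 5-year-old children: mu = 100 cm, sigma = 6 cm
p_heights = normal_prob_between(94, 112, mu=100, sigma=6)
print(round(p_heights, 4))
```

Standardizing first keeps the calculation parallel to the table method: 94 and 112 become -1.00 and 2.00, and the probability is a difference of two cumulative z curve areas.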
Example: IQ Scores
- A commonly used IQ scale has a mean of 100 and a standard deviation of 15, and scores are approximately normally distributed. (IQ score is actually a discrete variable, but its population distribution closely resembles a normal curve.)
- 1. What proportion of the population would qualify for Mensa membership, which requires an IQ score above 130?
- 2. What proportion of the population has an IQ score below 80?
- 3. What proportion of the population has an IQ score between 75 and 125?
Solution to Example: IQ Scores
Let x = IQ score of a randomly selected individual. Given µ = 100 and σ = 15.
- 1. P(x > 130) = P(z > 2.00) = 1 - .9772 = .0228
- 2. P(x < 80) = P(z < -1.33) = .0918
- 3. For your exercise.
Answer to 3: P(75 < x < 125) = .9050
Example: Registration Times
- The length of time (in minutes) required for students to complete telephone registration at a particular university can be well approximated by a normal distribution with mean µ = 12 min and standard deviation σ = 2 min. The university would like to disconnect students automatically after some amount of time has elapsed. Determine the amount of time that should be allowed before disconnecting a student if the university wants only the 1% of students with the longest registration times to be disconnected.
Solution of Registration Time Example
- Let x be the length of registration time for a randomly selected student.
- Given µ = 12 minutes and σ = 2 minutes. Let x* be the time (in minutes) after which a student should be disconnected.
- P(x > x*) = 1%, so P(x < x*) = 99%.
- The z value corresponding to the cumulative area .99 is z* = 2.33.
- Therefore, x* = µ + z*σ = 12 + (2.33)(2) = 16.66 minutes.
Example: Motor Vehicle Emissions
- The EPA has determined that the emissions of nitrogen oxides, which are major constituents of smog, can be modeled using a normal distribution with µ = 1.6 and σ = 0.4. Suppose that the EPA wants to offer some sort of incentive to get the worst polluters off the road. What emission levels constitute the worst 10% of the vehicles?
Solution of Vehicle Emission Example
- Let x be the emission level of pollutant for a randomly selected vehicle.
- Given µ = 1.6 and σ = 0.4. Let x* be the emission level that marks off the worst 10% (the highest 10% of emission levels).
- P(x > x*) = 10%, so P(x < x*) = 90%.
- The z value corresponding to area .9 is z* = 1.28.
- Therefore, x* = µ + z*σ = 1.6 + (1.28)(0.4) = 2.112. The worst 10% of vehicles are those with emission levels above about 2.11.
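Both cutoff problems (registration times and vehicle emissions) follow the same pattern: find the z value z* for the required cumulative area, then convert back to the original scale with x* = µ + z*σ. The sketch below assumes Python's standard `statistics.NormalDist` in place of the z table; the helper name `upper_tail_cutoff` is hypothetical, and the results differ slightly from table-based answers because the table rounds z* to two decimal places.

```python
from statistics import NormalDist


def upper_tail_cutoff(mu, sigma, tail_area):
    """Return x* such that P(x > x*) = tail_area for a normal(mu, sigma) variable."""
    z_star = NormalDist().inv_cdf(1 - tail_area)  # z value with area 1 - tail_area to its left
    return mu + z_star * sigma                    # convert back to the x scale


# Registration times: mu = 12 min, sigma = 2 min, longest 1%
registration_cutoff = upper_tail_cutoff(12, 2, 0.01)

# Nitrogen oxide emissions: mu = 1.6, sigma = 0.4, worst 10%
emission_cutoff = upper_tail_cutoff(1.6, 0.4, 0.10)

print(round(registration_cutoff, 2), round(emission_cutoff, 2))
```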