Title: Examples of continuous probability distributions:
1 Examples of continuous probability distributions
- The normal and standard normal
2 The Normal Distribution
Changing µ shifts the distribution left or right.
Changing σ increases or decreases the spread.
[Figure: normal density f(X) plotted against X, centered at µ with spread σ]
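3 The Normal Distribution as mathematical function (pdf)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Note constants: π = 3.14159, e = 2.71828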
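4 The Normal PDF
- It's a probability density function, so no matter what the values of µ and σ, it must integrate to 1!
$$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx = 1$$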
5 Normal distribution is defined by its mean and standard dev.
- E(X) = µ
- Var(X) = σ²
- Standard Deviation(X) = σ
6 The beauty of the normal curve
No matter what µ and σ are, the area between µ−σ and µ+σ is about 68%; the area between µ−2σ and µ+2σ is about 95%; and the area between µ−3σ and µ+3σ is about 99.7%. Almost all values fall within 3 standard deviations.
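7 68-95-99.7 Rule
[Figure: normal curve with the areas within 1, 2, and 3 standard deviations of the mean]
8 68-95-99.7 Rule in Math terms
$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx \approx .68$$
$$\int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx \approx .95$$
$$\int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\, dx \approx .997$$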
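The rule can be checked numerically. The sketch below is not from the slides; it assumes Python 3.8+ and uses the standard library's `statistics.NormalDist`, with `area_within` as a hypothetical helper name.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under Normal(mu, sigma) between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # The second call uses a different mean and SD to show that
    # the area depends only on k, not on mu or sigma.
    print(k, round(area_within(k), 4), round(area_within(k, mu=500, sigma=50), 4))
```

Both columns should agree for every k, which is the point of the rule: the fractions are the same for any normal curve.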
9 How good is the rule for real data?
- Check some example data
- The mean of the weight of the women = 127.8 lbs
- The standard deviation (SD) = 15.5 lbs
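10 68% of 120 = .68 × 120 ≈ 82 runners. In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean.
[Figure: distribution of runner weights, mean marked at 127.8]
11 95% of 120 = .95 × 120 = 114 runners. In fact, 115 runners fall within 2 SDs of the mean.
[Figure: distribution of runner weights, mean marked at 127.8]
12 99.7% of 120 = .997 × 120 ≈ 119.6 runners. In fact, all 120 runners fall within 3 SDs of the mean.
[Figure: distribution of runner weights, mean marked at 127.8]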
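The expected-versus-observed comparison on these three slides can be tabulated in a few lines. This is an illustrative sketch, not part of the lecture: `expected_within` is a hypothetical helper, and it uses the exact normal areas rather than the rounded 68/95/99.7 figures, so the expected counts differ slightly from the slide arithmetic.

```python
from statistics import NormalDist

def expected_within(k, n):
    """Expected number of n observations within k SDs of the mean, if normal."""
    std = NormalDist()  # standard normal
    return n * (std.cdf(k) - std.cdf(-k))

n_runners = 120                      # sample size from the slides
observed = {1: 79, 2: 115, 3: 120}   # counts reported on the slides

for k, seen in observed.items():
    print(f"within {k} SD: expected {expected_within(k, n_runners):.1f}, observed {seen}")
```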
13 Example
- Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation of 50, then:
- 68% of students will have scores between 450 and 550
- 95% will be between 400 and 600
- 99.7% will be between 350 and 650
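14 Example
- BUT...
- What if you wanted to know the math SAT score corresponding to the 90th percentile (90% of students are lower)?
- P(X ≤ Q) = .90
$$\int_{-\infty}^{Q} \frac{1}{50\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\, dx = .90$$
Solve for Q? Yikes!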
15 The Standard Normal (Z): "Universal Currency"
- The formula for the standardized normal probability density function is
$$p(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}$$
16 The Standard Normal Distribution (Z)
- All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation:
$$Z = \frac{X - \mu}{\sigma}$$
Somebody calculated all the integrals for the standard normal and put them in a table! So we never have to integrate! Even better, computers now do all the integration.
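The conversion between raw units and Z units is simple arithmetic in both directions. A minimal sketch, using the SAT parameters from the earlier example (mean 500, SD 50); `to_z` and `to_x` are hypothetical helper names, not anything from the lecture.

```python
def to_z(x, mu, sigma):
    """Standardize: how many SDs is x above (+) or below (-) the mean?"""
    return (x - mu) / sigma

def to_x(z, mu, sigma):
    """Undo the standardization: convert a Z value back to raw units."""
    return mu + z * sigma

print(to_z(550, 500, 50))   # one SD above the mean
print(to_x(-2, 500, 50))    # two SDs below the mean, in SAT points
```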
17 Comparing X and Z units
[Figure: two aligned axes. X axis (µ = 100, σ = 50) with 100 and 200 marked; Z axis (µ = 0, σ = 1) with 0 and 2.0 marked]
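18 Example
- For example: What's the probability of getting a math SAT score of 575 or less, µ = 500 and σ = 50?
$$Z = \frac{575 - 500}{50} = 1.5$$
- i.e., a score of 575 is 1.5 standard deviations above the mean
$$P(X \le 575) = \int_{-\infty}^{575} \frac{1}{50\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\, dx$$
Yikes! But to look up Z = 1.5 in a standard normal chart (or enter it into SAS)? No problem! = .9332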
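The slides mention SAS; as an alternative sketch (an assumption on my part, not the lecture's tooling), Python's standard library gives the same number either directly on the SAT scale or via the Z value.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=50)

# Route 1: ask the Normal(500, 50) curve directly.
p_direct = sat.cdf(575)

# Route 2: standardize first, then use the standard normal curve.
z = (575 - 500) / 50
p_via_z = NormalDist().cdf(z)

print(round(p_direct, 4), round(p_via_z, 4))  # 0.9332 0.9332
```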
19 Looking up probabilities in the standard normal table
What is the area to the left of Z = 1.50 in a standard normal curve?
Area is 93.32%
20 Looking up probabilities in the standard normal table
What is the area to the left of Z = 1.51 in a standard normal curve?
Area is 93.45%
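A printed Z table is organized with the first decimal of Z down the side and the second decimal across the top. The sketch below regenerates one such row; `z_table_row` is a hypothetical helper, and the layout is the common textbook convention rather than anything specified on the slides.

```python
from statistics import NormalDist

def z_table_row(z_tenths):
    """Left-tail areas for z_tenths + 0.00, 0.01, ..., 0.09, as in a printed Z table."""
    std = NormalDist()
    return [round(std.cdf(z_tenths + h / 100), 4) for h in range(10)]

row = z_table_row(1.5)
print(row[0])  # entry for Z = 1.50
print(row[1])  # entry for Z = 1.51
```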
21 Probit function: the inverse
- Φ⁻¹(area) = Z: gives the Z-value that goes with the probability you want
- For example, recall the SAT math scores example. What's the score that corresponds to the 90th percentile?
- In the table, find the Z-value that corresponds to an area of 90%...
22 90% area corresponds to a Z score of about 1.28.
23 Probit function: the inverse
- Z = 1.28; convert back to raw SAT score:
- 1.28 = (X − 500)/50, so X − 500 = 1.28(50)
- X = 1.28(50) + 500 = 564 (1.28 standard deviations above the mean!)
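The table lookup and back-conversion can be reproduced with an inverse CDF. A minimal sketch assuming Python's `statistics.NormalDist`, whose `inv_cdf` method plays the role of the probit function here.

```python
from statistics import NormalDist

# Probit: the Z value with 90% of the standard normal area to its left.
z90 = NormalDist().inv_cdf(0.90)
print(round(z90, 4))  # 1.2816

# Convert back to the SAT scale (mean 500, SD 50).
score = 500 + z90 * 50

# Equivalent one-step version on the SAT scale itself.
score_direct = NormalDist(500, 50).inv_cdf(0.90)
print(round(score), round(score_direct))
```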
24 Practice problem
- If birth weights in a population are normally distributed with a mean of 109 oz and a standard deviation of 13 oz,
- a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?
- b. What is the chance of obtaining a birth weight of 120 oz or lighter?
25 Answer
- a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?
$$Z = \frac{141 - 109}{13} = 2.46$$
26 Area to the left of Z = 2.46 is .9931. Area to the right of 2.46 is 1 − .9931 = .0069, or .69%.
27 Answer
- b. What is the chance of obtaining a birth weight of 120 oz or lighter?
$$Z = \frac{120 - 109}{13} = 0.85$$
28 Area to the left of Z = 0.85 is .8023, or 80.23%.
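Both parts of the birth-weight problem can be checked without a table. This is a sketch under the problem's stated assumptions (normal, mean 109 oz, SD 13 oz); because it does not round Z to two decimals, part b comes out slightly below the table answer of .8023.

```python
from statistics import NormalDist

birth_weight = NormalDist(mu=109, sigma=13)  # ounces

# a. 141 oz or heavier: area to the right of 141.
p_heavy = 1 - birth_weight.cdf(141)

# b. 120 oz or lighter: area to the left of 120.
p_light = birth_weight.cdf(120)

print(f"P(X >= 141) = {p_heavy:.4f}")
print(f"P(X <= 120) = {p_light:.4f}")
```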
29 Practice problem 2: DSST (a measure of cognitive function) is a normally distributed trait
Mean = 28 points; Standard deviation = 10 points
30 Practice problem 2
- a. What percent of people have values of DSST above 38?
- b. What percent of people have values of DSST below 8?
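31 Answers
- a. What percent of people have values of DSST above 38?
$$Z = \frac{38 - 28}{10} = +1$$
About 68% of values fall within 1 SD of the mean, so 32% fall outside, half of them above. Thus, 16% of people have DSSTs above 38.
32 Answers
- b. What percent of people have values of DSST below 8?
$$Z = \frac{8 - 28}{10} = -2$$
About 95% of values fall within 2 SDs of the mean, so 5% fall outside, half of them below. Thus, 2.5% of people have DSSTs below 8.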
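The 16% and 2.5% answers come from the rounded 68-95-99.7 rule. As a hedged cross-check (not part of the slides), the exact tail areas are slightly different:

```python
from statistics import NormalDist

dsst = NormalDist(mu=28, sigma=10)  # parameters from the practice problem

p_above_38 = 1 - dsst.cdf(38)   # right tail beyond Z = +1
p_below_8 = dsst.cdf(8)         # left tail beyond Z = -2

print(f"above 38: {p_above_38:.1%}")
print(f"below 8:  {p_below_8:.1%}")
```

The rule-of-thumb answers are within about a quarter of a percentage point of these values, which is usually close enough for hand calculation.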
33 Review question 1
- The probability that a standardized normal variable Z is positive is ____.
- 100%
- 50%
- 10%
- 0%
34 Review question 2
- The probability that Z is between -2 and -1 is _____.
- 50%
- 34%
- 25.5%
- 13.5%
35 Review question 3
- The probability that Z values are larger than _____ is 0.6985.
- Z = 1
- Z = 0
- Z = -.5
- Z = .5
36 Review question 4
- 27% of Z values are smaller than ____.
- Z = 0
- Z = 1
- Z = -.6
- Z = .6
37 Are my data normal?
- Not all continuous random variables are normally distributed!!
- It is important to evaluate how well the data are approximated by a normal distribution.
38 Are my data normally distributed?
- Look at the histogram! Does it appear bell shaped?
- Compute descriptive summary measures: are mean, median, and mode similar?
- Do 2/3 of observations lie within 1 std dev of the mean? Do 95% of observations lie within 2 std dev of the mean?
- Look at a normal probability plot: is it approximately linear?
- Run tests of normality (such as Kolmogorov-Smirnov). But be cautious: these are highly influenced by sample size!
39 Data from our class
Median = 8; Mean = 8.8; Mode = 0
SD = 8.3; Range = 0 to 32 (≈ 4σ)
40 Data from our class
Median = 45; Mean = 41; Mode = 6
SD = 23; Range = 0 to 83 (≈ 3.5σ)
41 Data from our class
Median = 4; Mean = 3.7; Mode = 4
SD = 1.8; Range = 0.5 to 7 (≈ 3.5σ)
42 Data from our class
Median = 18; Mean = 20; Mode = 20
SD = 16; Range = 2 to 70 (≈ 4σ)
43 Data from our class
8.8 ± 8.3 = 0.5 to 17.1
44 Data from our class
8.8 ± 2(8.3) = 0 to 25.4
45 Data from our class
8.8 ± 3(8.3) = 0 to 33.7
46 Data from our class
41 ± 23 = 18 to 64
47 Data from our class
41 ± 2(23) = 0 to 87
48 Data from our class
41 ± 3(23) = 0 to 100
49 Data from our class
3.7 ± 1.8 = 1.9 to 5.5
50 Data from our class
3.7 ± 2(1.8) = 0.1 to 7.3
51 Data from our class
3.7 ± 3(1.8) = 0 to 9.1
52 Data from our class
20 ± 16 = 4 to 36
53 Data from our class
20 ± 2(16) = 0 to 52
54 Data from our class
20 ± 3(16) = 0 to 68
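The intervals on these slides are mean ± k SDs, with the ends apparently cut off at the limits of the measurement scale. A minimal sketch of that arithmetic follows; `sd_interval` is a hypothetical helper, and the clipping bounds (0 at the bottom, 100 at the top for the second variable) are an assumption inferred from the slide values, not something the lecture states.

```python
def sd_interval(mean, sd, k, lo=None, hi=None):
    """Return (lower, upper) for mean +/- k*sd, optionally clipped to [lo, hi]."""
    lower, upper = mean - k * sd, mean + k * sd
    if lo is not None:
        lower = max(lower, lo)   # assumed floor of the scale
    if hi is not None:
        upper = min(upper, hi)   # assumed ceiling of the scale
    return round(lower, 1), round(upper, 1)

for k in (1, 2, 3):
    print(k, sd_interval(8.8, 8.3, k, lo=0), sd_interval(41, 23, k, lo=0, hi=100))
```

Counting how many observations fall inside each interval, and comparing against 68%, 95%, and 99.7%, is the check the slides are walking through.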
55 The Normal Probability Plot
- Normal probability plot
- Order the data.
- Find corresponding standardized normal quantile values.
- Plot the observed data values against normal quantile values.
- Evaluate the plot for evidence of linearity.
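The steps above can be sketched without any plotting library. Everything named here is illustrative: the plotting positions (i − 0.5)/n are one common convention (statistical packages differ), `skewed` is made-up data rather than the class data, and the correlation between the two axes is used only as a rough numeric stand-in for eyeballing linearity.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Pair each ordered observation with its standard normal quantile."""
    ordered = sorted(data)                       # step 1: order the data
    n = len(ordered)
    std = NormalDist()
    # step 2: quantiles at assumed plotting positions (i - 0.5) / n
    quantiles = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))         # step 3: points to plot

def pearson_r(points):
    """Correlation of the plotted points; near 1 suggests a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

skewed = [0, 0, 1, 1, 2, 3, 4, 6, 9, 14, 22, 32]  # hypothetical right-skewed sample
points = normal_plot_points(skewed)
print(round(pearson_r(points), 3))                # step 4: evaluate linearity
```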
56 Normal probability plot: coffee
Right-skewed! (concave up)
57 Normal probability plot: love of writing
A wiggly line!
58 Norm prob. plot: Exercise
Mostly a straight line!
59 Norm prob. plot: Wake up time
Right-skewed! (concave up)
60 Formal tests for normality
- Results:
- Coffee: Moderate evidence of non-normality (p = .008 to p = .11)
- Writing love: No evidence of non-normality (all p > .15)
- Exercise: No evidence of non-normality (all p > .15)
- Homework: Strong evidence of non-normality (all p < .01)
61 Review question 5
- Which of the following does NOT support the conclusion that your data are normally distributed?
- The histogram is bell-shaped.
- The normal probability plot is approximately a straight line.
- The mean and the median are far apart.
- Formal tests of normality (with fancy Russian names) yield high p-values.
62 Normal approximation to the binomial
- When you have a binomial distribution where the expected value is greater than 5 (np > 5), then the binomial starts to look like a normal distribution.
- Recall: If the probability of being a smoker among a group of cases with lung cancer is .6, what's the probability that in a group of 8 cases you have fewer than 2 smokers?
63 Normal approximation to the binomial
- When you have a binomial distribution where n is large and p isn't too small (rule of thumb: mean > 5), then the binomial starts to look like a normal distribution.
- Recall the smoking example...
64 Normal approximation to binomial
What is the probability of fewer than 2 smokers?
Exact binomial probability (from before) = .00065 + .008 = .00865
Normal approximation probability: µ = 4.8, σ = 1.39
$$Z = \frac{2 - 4.8}{1.39} \approx -2$$
P(Z < −2) = .022
65
- A little off, but in the right ballpark. We could also use the value to the left of 1.5 (as we really wanted to know less than but not including 2; this is called the "continuity correction"):
$$Z = \frac{1.5 - 4.8}{1.39} = -2.37$$
P(Z ≤ −2.37) = .0089
A fairly good approximation of the exact probability, .00865.
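The three numbers in the smoking example (exact binomial, plain normal approximation, continuity-corrected approximation) can be compared side by side. A minimal sketch using only the standard library; the variable names are illustrative. Note that the unrounded exact sum is about .0085, slightly below the slide's .00865, which adds the rounded term .008.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 8, 0.6   # 8 lung cancer cases, P(smoker) = .6

# Exact binomial: P(X = 0) + P(X = 1), i.e. fewer than 2 smokers.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(2))

# Normal curve with the binomial's mean and standard deviation.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

plain = approx.cdf(2)        # area to the left of 2
corrected = approx.cdf(1.5)  # continuity correction: area to the left of 1.5

print(f"exact     {exact:.5f}")
print(f"plain     {plain:.5f}")
print(f"corrected {corrected:.5f}")
```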
66 Practice problem
- 1. You are performing a cohort study. If the probability of developing disease in the exposed group is .25 for the study duration, then if you (randomly) sample 500 exposed people, what's the probability that at most 120 people develop the disease?
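67 Answer
Exact binomial:
$$P(X \le 120) = \sum_{k=0}^{120} \binom{500}{k} (.25)^{k} (.75)^{500-k}$$
OR use the normal approximation: µ = np = 500(.25) = 125 and σ² = np(1−p) = 93.75; σ = 9.68
$$Z = \frac{120 - 125}{9.68} = -.52$$
P(Z < −.52) = .3015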
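With n = 500 the exact sum is tedious by hand but trivial by computer, so both routes can be compared. This is a sketch under the problem's assumptions (independent subjects, p = .25); the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 500, 0.25

# Exact binomial: P(X <= 120), summing the 121 terms k = 0..120.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(121))

# Normal approximation with mu = np = 125 and sigma = sqrt(np(1-p)).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
plain = approx.cdf(120)        # no continuity correction
corrected = approx.cdf(120.5)  # with continuity correction

print(f"exact {exact:.4f}  plain {plain:.4f}  corrected {corrected:.4f}")
```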
68 Review question 6
- If you flip a coin 1600 times, what is the approximate probability that you will get fewer than 860 heads?
- 25%
- 2.5%
- 0.5%
- 0.005%
69 Review Problem 7
- Which of the following about the normal distribution is NOT true?
- Theoretically, the mean, median, and mode are the same.
- About 2/3 of the observations fall within 1 standard deviation from the mean.
- It is a discrete probability distribution.
- Its parameters are the mean, µ, and standard deviation, σ.
70 Proportions
- The binomial distribution forms the basis of statistics for proportions.
- A proportion is just a binomial count divided by n.
- For example, if we sample 200 cases and find 60 smokers, X = 60 but the observed proportion = .30.
- Statistics for proportions are similar to binomial counts, but differ by a factor of n.
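71 Stats for proportions
For binomial:
$$\mu_{X} = np, \qquad \sigma_{X}^{2} = np(1-p), \qquad \sigma_{X} = \sqrt{np(1-p)}$$
For proportion:
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}}^{2} = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$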
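The "factor of n" relationship can be seen numerically with the 200-case example. This is an illustrative sketch; plugging the observed proportion .30 in place of the unknown true p is an assumption made here for the sake of the arithmetic.

```python
from math import sqrt

n, x = 200, 60
p_hat = x / n                              # observed proportion, .30

# Count scale: mean and SD of the binomial count X.
mean_count = n * p_hat
sd_count = sqrt(n * p_hat * (1 - p_hat))

# Proportion scale: the same quantities divided by n.
mean_prop = mean_count / n
sd_prop = sqrt(p_hat * (1 - p_hat) / n)

print(f"count:      mean {mean_count:.0f}, SD {sd_count:.2f}")
print(f"proportion: mean {mean_prop:.2f}, SD {sd_prop:.4f}")
```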
72 It all comes back to Z
- Statistics for proportions are based on a normal distribution, because the binomial can be approximated as normal if np > 5.
73 Homework
- Problem Set 3
- Reading: Vickers 10-15
- Journal article/article review sheet