Title: Central Limit Theorem
1 Central Limit Theorem
- Example (note that the answer is corrected compared to notes5.ppt)
- 5 chemists independently synthesize a compound 1 time each.
- Each reaction should produce 10 ml of a substance.
- Historically, the amount produced by each reaction has been normally distributed with std dev 0.5 ml.
- 1. What's the probability that less than 49.8 ml of the substance is made in total?
- 2. What's the probability that the average amount produced is more than 10.1 ml?
- 3. Suppose the average amount produced is more than 11.0 ml. Is that a rare event? Why or why not? If the average is more than 11.0 ml, what might that suggest?
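2 Answer
- Central limit theorem
- If $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$ for all $i$ (and the $X_i$ are independent), then approximately (exactly, if the $X_i$ are normal)
- $X_1 + \dots + X_n \sim N(n\mu,\ n\sigma^2)$
- $(X_1 + \dots + X_n)/n \sim N(\mu,\ \sigma^2/n)$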
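The two distributional statements above can be checked numerically. The following is a minimal simulation sketch, assuming the numbers from the chemist example ($n = 5$, $\mu = 10$, $\sigma = 0.5$); the seed and the number of repetitions are arbitrary choices and are not part of the slides.

```python
import random
import statistics

# Assumed values, taken from the chemist example: n = 5, mean 10, std dev 0.5.
n, mu, sigma = 5, 10.0, 0.5
reps = 20_000            # arbitrary number of simulated experiments
rng = random.Random(0)   # fixed seed so the run is reproducible

# Each experiment: the total of n independent normal draws.
totals = [sum(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]
averages = [t / n for t in totals]

# Compare simulated moments with the CLT predictions:
# total ~ N(n*mu, n*sigma^2) and average ~ N(mu, sigma^2/n).
print("mean of totals  :", statistics.fmean(totals), "vs", n * mu)
print("var of totals   :", statistics.variance(totals), "vs", n * sigma**2)
print("mean of averages:", statistics.fmean(averages), "vs", mu)
print("var of averages :", statistics.variance(averages), "vs", sigma**2 / n)
```

The simulated means and variances should land close to the predicted values, with small differences due to sampling noise.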
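3 Lab
- Let $Y$ = total amount made. $Y \sim N(5 \cdot 10,\ 5 \cdot 0.5^2) = N(50, 1.25)$ (by CLT), so the std dev of $Y$ is $\sqrt{1.25} \approx 1.12$.
  $\Pr(Y < 49.8) = \Pr\big((Y - 50)/1.12 < (49.8 - 50)/1.12\big) = \Pr(Z < -0.18) = 0.43$
- Let $W$ = average amount made. $W \sim N(10,\ 0.5^2/5)$ (by CLT), so the std dev of $W$ is $0.5/\sqrt{5} \approx 0.22$.
  $\Pr(W > 10.1) = \Pr\big(Z > (10.1 - 10)/0.22\big) = \Pr(Z > 0.45) = 0.33$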
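As a cross-check on the table lookups, the same two probabilities can be computed directly. This is a small sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not from the slides.

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 5, 10.0, 0.5   # values from the chemist example

# NormalDist takes the mean and the standard deviation (not the variance).
total = NormalDist(n * mu, sigma * sqrt(n))    # Y ~ N(50, 1.25)
average = NormalDist(mu, sigma / sqrt(n))      # W ~ N(10, 0.05)

p_total = total.cdf(49.8)            # Pr(Y < 49.8)
p_average = 1 - average.cdf(10.1)    # Pr(W > 10.1)

print(f"Pr(Y < 49.8) = {p_total:.4f}")
print(f"Pr(W > 10.1) = {p_average:.4f}")
```

The results agree with 0.43 and 0.33 to two decimals; any difference in later digits comes from rounding the z-values before the table lookup.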
4 Lab (continued)
- One definition of rare: it's a rare event if $\Pr(W > 11.0)$ is small (i.e. if the probability of seeing 11.0 or something more extreme is small).
  $\Pr(W > 11) = \Pr\big(Z > (11 - 10)/0.22\big) = \Pr(Z > 4.55) \approx 0$
- This suggests that perhaps either the true mean is not 10 or the true std dev is not 0.5 (or the amounts are not normally distributed).
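To see how small "approximately zero" is here, the tail probability can be evaluated numerically. A minimal sketch, assuming the same $N(10,\ 0.5^2/5)$ model for the average:

```python
from math import sqrt
from statistics import NormalDist

average = NormalDist(10.0, 0.5 / sqrt(5))   # W ~ N(10, 0.5^2/5)
p_rare = 1 - average.cdf(11.0)              # Pr(W > 11.0)

print(f"Pr(W > 11.0) = {p_rare:.2e}")
```

The value is on the order of a few in a million, which is why an observed average above 11.0 ml casts doubt on the assumed mean, std dev, or normality.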
5 Sample size: 1006 (source: gallup.com)
6
- Let $X_i = 1$ if person $i$ thinks the President is hiding something and 0 otherwise.
- Suppose $E(X_i) = p$ and $\mathrm{Var}(X_i) = p(1-p)$ and each person's opinion is independent.
- Let $Y$ = total number of yesses $= X_1 + \dots + X_{1006}$
- $Y \sim \mathrm{Bin}(1006, p)$
- Suppose $p = 0.36$ (this is the estimate)
- What is $\Pr(Y < 352)$?
Note that this definition turns three outcomes into two outcomes.
7 Normal Approximation to the binomial CDF
$\Pr(Y < 352) = \Pr(Y = 0) + \dots + \Pr(Y = 351)$, where $\Pr(Y = k) = \binom{1006}{k}\, 0.36^k\, 0.64^{1006-k}$
- Even with computers, as $n$ gets large, computing things like this can become difficult. (1006 is OK, but how about 1,000,000?)
- Idea: use the central limit theorem to approximate this probability
- $Y$ is approximately $N\big(1006 \cdot 0.36,\ (0.36)(0.64) \cdot 1006\big) = N(362.16,\ 231.8)$ (by central limit theorem), so the std dev is $\sqrt{231.8} \approx 15.2$
- $\Pr\big((Y - 362.16)/15.2 < (352 - 362.16)/15.2\big) = \Pr(Z < -0.67) = 0.25$
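For $n = 1006$ the exact binomial sum is still cheap to compute, so it can be compared with the normal approximation. The sketch below uses only the standard library (`math.comb`, `statistics.NormalDist`); the variable names are illustrative. It applies the plain CLT approximation, without a continuity correction, to match the calculation on this slide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1006, 0.36   # poll size and estimated proportion from the slides

# Exact: Pr(Y < 352) = Pr(Y = 0) + ... + Pr(Y = 351)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(352))

# CLT approximation: Y is approximately N(np, np(1-p))
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(352)

print(f"exact binomial sum   = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

The two numbers should differ only in the second decimal place. For much larger $n$ the individual terms become awkward to evaluate directly, which is the motivation for the approximation.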
8 Normal Approximation to the binomial CDF
Black step function is a plot of the Bin(1006, 0.36) pdf versus Y (integers).
Blue line is a plot of the Normal(362.16, 231.8) pdf.
9 Normal Approximation to the binomial CDF
Area under the blue curve to the left of 352 is approximately equal to the sum of the areas of the rectangles (black step function) to the left of 352.
10 Comments about normal approximation of the binomial
- Rule of thumb is that it's OK if $np > 5$ and $n(1-p) > 5$.
- Continuity correction: $Y$ is binomial, so it takes only integer values.
- If we use the normal approximation to the probability that $Y \le k$, we should calculate $\Pr(Y \le k + 0.5)$.
- If we use the normal approximation to the probability that $Y \ge k$, we should calculate $\Pr(Y \ge k - 0.5)$ (see picture on board).
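The continuity correction can be written as two small helpers: add 0.5 to the cutoff for an "at most $k$" probability and subtract 0.5 for an "at least $k$" probability. This is a sketch only; the function names `approx_at_most` and `approx_at_least` are hypothetical, and the poll numbers ($n = 1006$, $p = 0.36$) are reused for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_at_most(k, n, p):
    """Normal approximation to Pr(Y <= k), Y ~ Bin(n, p), with continuity correction."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return normal.cdf(k + 0.5)

def approx_at_least(k, n, p):
    """Normal approximation to Pr(Y >= k), Y ~ Bin(n, p), with continuity correction."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return 1 - normal.cdf(k - 0.5)

n, p = 1006, 0.36
assert n * p > 5 and n * (1 - p) > 5   # rule of thumb from the slide

lower = approx_at_most(351, n, p)      # Pr(Y <= 351), i.e. Pr(Y < 352)
upper = approx_at_least(352, n, p)     # Pr(Y >= 352)
print(f"Pr(Y <= 351) = {lower:.4f}, Pr(Y >= 352) = {upper:.4f}")
```

Because both helpers cut the normal curve at 351.5, the two complementary probabilities sum to 1, just as they do for the binomial itself.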
11 Probability meaning of 6 sigma
- Even if you shift the process mean from the center of the specifications by 1.5 standard deviations toward one of the specifications, you will expect no more than 3.4 out of a million defects outside of the specification toward which you shifted.
- (I know it's convoluted, but that's the definition)
12 What does 6 sigma mean? (example)
- Suppose a product has a quantitative specification. Ex: make the gap between the car door and the car body between 3.4 and 4.6 mm.
- When cars are actually made, the std dev of the car door gap is 0.1 mm, i.e. $X_1, \dots, X_n$ are gap widths and $\sqrt{\text{sample variance of } X_1, \dots, X_n} = 0.1$ mm.
13 Statistically, six sigma means that $\text{Upper Spec} - \text{Lower Spec} \ge 12\sigma$ (i.e. specs are fixed; lower the manufacturing process variability).
[Figure: distribution of gap widths, with the lower specification at 3.4 mm, the upper specification at 4.6 mm, the center of spec at a 4 mm gap, and the shifted mean at a 3.85 mm gap]
$4.6 - 3.4 = 1.2 = 12 \cdot 0.1 = 12\sigma$
Probability of being out here (below the lower specification) is $\Pr(\text{gap} < 3.4) = \Pr\big((\text{gap} - 3.85)/0.1 < (3.4 - 3.85)/0.1\big) = \Pr(Z < -4.5) = 3.4/1{,}000{,}000$
3.4/1,000,000 is the arbitrary magic number for $6\sigma$.
14 Probability meaning of 6 sigma
- In general
- Assume the process mean is 1.5 standard deviations toward the lower spec, i.e. $E(X) = 4 - 1.5\sigma$, and assume $X$ has a normal distribution.
- When the process is in control enough so that the distance between the center of the specs and the lower spec is at least $6\sigma$, then
- $\Pr(X \text{ below lower spec}) \le \Pr(X < 4 - 6\sigma) = \Pr\big((X - (4 - 1.5\sigma))/\sigma < (4 - 6\sigma - (4 - 1.5\sigma))/\sigma\big) = \Pr(Z < -4.5) = 3.4/1{,}000{,}000$
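The 3.4-per-million figure is just the standard normal tail below $-4.5$. A quick numerical check, using the door-gap numbers from the example (shifted mean 3.85 mm, std dev 0.1 mm, lower spec 3.4 mm); the variable names are illustrative:

```python
from statistics import NormalDist

# General statement: Pr(Z < -4.5) for a standard normal Z.
p_defect = NormalDist().cdf(-4.5)
print(f"Pr(Z < -4.5) = {p_defect * 1_000_000:.1f} per million")

# Door-gap example: gap ~ N(3.85, 0.1^2), lower specification 3.4 mm.
p_gap = NormalDist(3.85, 0.1).cdf(3.4)
print(f"Pr(gap < 3.4) = {p_gap * 1_000_000:.1f} per million")
```

Both lines give the same probability, since $(3.4 - 3.85)/0.1 = -4.5$.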
15 Control Charts
- Let $\bar{X}$ = an average of $n$ measurements.
- Each measurement has mean $\mu$ and variance $\sigma^2$.
- Fact
- By the central limit theorem, almost all observations of $\bar{X}$ fall in the interval $\mu \pm 3\sigma/\sqrt{n}$ (i.e. mean $\pm$ 3 standard deviations)
- $\sigma/\sqrt{n}$ is also called $\sigma_{\bar{X}}$ or standard error
16 Use the fact to detect changes in production quality
- Idea: let $\bar{x}_i$ = average door gap from the $n$ cars made by shift $i$ at the car plant
[Figure: control chart of $\bar{x}_1, \dots, \bar{x}_8$ plotted against shift, with center line $\mu$, Upper Control Limit $\mu + 3\sigma/\sqrt{n}$, and Lower Control Limit $\mu - 3\sigma/\sqrt{n}$]
Points outside the $\pm 3$ std error bounds are called out of control. They are evidence that $\mu$ and/or $\sigma$ are not the true mean and std dev any more, and the process needs to be readjusted.
Calculate the false alarm rate (about 26/10,000).
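The control-limit rule and the false alarm rate can be sketched in a few lines. Everything numeric below is hypothetical and only for illustration: the in-control mean of 4 mm and std dev of 0.1 mm echo the door-gap example, while $n = 25$ cars per shift and the eight shift averages are made up. The helper name `control_limits` is also an assumption, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist

def control_limits(mu, sigma, n):
    """Return (lower, upper) control limits mu -/+ 3 standard errors."""
    se = sigma / sqrt(n)   # standard error of an average of n measurements
    return mu - 3 * se, mu + 3 * se

# False alarm rate: chance that an in-control average falls outside +/- 3 std errors.
false_alarm_rate = 2 * NormalDist().cdf(-3)
print(f"false alarm rate = {false_alarm_rate:.4f}")

# Hypothetical setup: mu = 4 mm, sigma = 0.1 mm, n = 25 cars per shift.
lcl, ucl = control_limits(4.0, 0.1, 25)

# Made-up shift averages, for illustration only.
shift_averages = [4.01, 3.98, 4.03, 3.92, 3.99, 4.05, 4.08, 4.00]
out_of_control = [i for i, x in enumerate(shift_averages, start=1)
                  if x < lcl or x > ucl]
print("out-of-control shifts:", out_of_control)
```

The computed false alarm rate is about 0.0027; the 26/10,000 figure corresponds to doubling a table value of 0.0013 for $\Pr(Z < -3)$.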
17 Assume 100 new people are polled. Assume true Pr(a new person says yes) = 0.36. Let $\hat{P}$ = P-hat = (number who say yes)/100. What's an approximation to the distribution of P-hat? Use the approximation to determine a number so that Pr(P-hat < that number) = 0.95.
18 EXAMPLE OF SAMPLING DISTRIBUTION OF P-HAT
$X_k = 1$ if person $k$ says yes and 0 if not.
Note that $E(X_k) = 0.36 = p$ and $\mathrm{Var}(X_k) = 0.36 \cdot 0.64 = p(1-p)$.
Note that $X_k$ is binomial(1, 0.36).
$\hat{P} = (X_1 + \dots + X_{100})/100$.
By CLT, $\hat{P}$ is approximately $N(0.36,\ 0.36 \cdot 0.64/100)$. (Rule of thumb is that this approximation is good if $np > 5$ and $n(1-p) > 5$.)
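19
- Suppose true $p$ is 0.36.
- If the survey is conducted again on 100 people, then $\hat{P} \sim N\big(0.36,\ (0.36)(0.64)/100\big) = N(0.36,\ 0.002304)$, so the std dev of $\hat{P}$ is $\sqrt{0.002304} = 0.048$.
- Want $p_0$ so that $\Pr(\hat{P} < p_0) = 0.95$.
- $\Pr(\hat{P} < p_0) = 0.95$ means $\Pr\big(Z < (p_0 - 0.36)/0.048\big) = 0.95$.
- Since $\Pr(Z < 1.645) = 0.95$:
  $(p_0 - 0.36)/0.048 = 1.645$
  $p_0 - 0.36 = 0.07896$
  $p_0 = 0.43896$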
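The same cutoff can be obtained from the inverse normal CDF instead of a table. A minimal sketch with the standard library; the variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.36, 100

# Sampling distribution of P-hat by the CLT: N(p, p(1-p)/n), std dev 0.048.
p_hat = NormalDist(p, sqrt(p * (1 - p) / n))

# p0 such that Pr(P-hat < p0) = 0.95
p0 = p_hat.inv_cdf(0.95)
print(f"p0 = {p0:.5f}")
```

The result matches 0.43896 up to the rounding of the 1.645 table value.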
20
- Suppose true $p$ is 0.40.
- If the survey is conducted again on 49 people, what's the probability of seeing 38% to 44% favorable responses?
- $\Pr(0.38 < \hat{P} < 0.44)$
- $= \Pr\big((0.38 - 0.40)/\sqrt{0.40 \cdot 0.60/49} < Z < (0.44 - 0.40)/\sqrt{0.40 \cdot 0.60/49}\big)$
- $= \Pr(-0.29 < Z < 0.57) = \Pr(Z < 0.57) - \Pr(Z < -0.29) = 0.7157 - 0.3859 = 0.3298$
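The interval probability can also be computed without rounding the z-values first. A short sketch, assuming the CLT approximation $\hat{P} \sim N(p,\ p(1-p)/n)$ with $p = 0.40$ and $n = 49$:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 49

# Sampling distribution of P-hat by the CLT: N(p, p(1-p)/n).
p_hat = NormalDist(p, sqrt(p * (1 - p) / n))

# Pr(0.38 < P-hat < 0.44) as a difference of two CDF values.
prob = p_hat.cdf(0.44) - p_hat.cdf(0.38)
print(f"Pr(0.38 < P-hat < 0.44) = {prob:.4f}")
```

The answer is close to, but not identical with, 0.3298, because the hand calculation rounds the z-values to two decimals before the table lookup.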