Title: More Probability
More Probability: Some (Small) Samples
http://www.shodor.org/interactivate/activities/AdjustableSpinner/
http://www.rossmanchance.com/iscat/petowner.html
Dichotomous Populations
- Each observation falls into exactly 1 of 2 categories.
- Success (S = 1) and Failure (F = 0).
- The proportion of Successes is p. This is a parameter: it summarizes a population.
- p is a special case of a mean.
Example 1
- Dave has 15 DVDs of movies, 9 of which are comedies (the remaining 6 are dramas).
- Choosing a comedy is a Success (1).
- Choosing a drama is a Failure (0).
- The assignment could be done the other way.
Example 1
- 9 of Dave's 15 movie DVDs are comedies: p = 9/15 = 0.60.
- The comedies are labeled 1 and the dramas 0.
- Dave's collection is a population (relative to Dave selecting samples for actual viewing).
- p is a special case of a mean: the mean of the fifteen 0/1 labels is (9 × 1 + 6 × 0)/15 = 0.60.
Example 1
- 9 of Dave's 15 movie DVDs are comedies: p = 9/15 = 0.60.
- Choose one DVD; X is its value (1 for a comedy, 0 for a drama).
- P(X = 1) = 0.60: if the random selection of a movie from all 15 is repeated over and over, then in the long run, 0.60 will be the proportion of times that a comedy is selected.
Two Sampling Schemes
- Sampling randomly without replacement
- Sampling randomly with replacement
Example 1
- Dave intends to take 2 movies on a trip and will randomly select them. (2 is the sample size, n. There are 2 trials.)
- This is sampling without replacement.
- A: 1st selection is a comedy
- P(A) = 9/15
- B: 2nd selection is a comedy
- P(B) = ???
Example 2
- Consider 3 comedies and 2 dramas; we'll attend two of them, chosen at random.
- C = College Road Trip, K = Kite Runner, W = Witless Protection, S = Semi-Pro, N = No Country for Old Men (comedies: C, W, S; dramas: K, N)
- A: 1st is a comedy, Pr(A) = 3/5 = 0.6
- B: 2nd is a comedy, Pr(B) = ????
- All 20 ordered pairs (1st choice, 2nd choice):
- CK CW CS CN KW KS KN WS WN SN
- KC WC SC NC WK SK NK SW NW NS
Example 2
- Consider 3 comedies and 2 dramas; we'll attend two of them, chosen at random.
- C = College Road Trip, K = Kite Runner, W = Witless Protection, S = Semi-Pro, N = No Country for Old Men (comedies: C, W, S; dramas: K, N)
- A: 1st is a comedy, Pr(A) = 3/5 = 0.6
- B: 2nd is a comedy, Pr(B) = 12/20 = 0.6 (12 of the 20 ordered pairs end in C, W, or S)
- CK CW CS CN KW KS KN WS WN SN
- KC WC SC NC WK SK NK SW NW NS
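A short Python sketch can confirm the count of 12 out of 20 by listing every ordered pair. It assumes that C, W and S are the three comedies and K and N the two dramas, which is consistent with the slide's 12/20. The dictionary name `movies` is illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Letters from the slide; True marks a comedy, False a drama (assumed from the titles).
movies = {"C": True, "K": False, "W": True, "S": True, "N": False}

# All ordered pairs (1st choice, 2nd choice) without replacement: 5 * 4 = 20.
pairs = list(permutations(movies, 2))

# Keep the pairs whose 2nd choice is a comedy (event B).
second_is_comedy = [pair for pair in pairs if movies[pair[1]]]

pr_b = Fraction(len(second_is_comedy), len(pairs))
print(len(pairs), len(second_is_comedy), pr_b)  # 20 12 3/5
```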
Example 1
- Dave intends to take 2 movies on a trip and will randomly select them. (2 is the sample size, n. There are 2 trials.)
- This is sampling without replacement.
- A: 1st selection is a comedy
- P(A) = 9/15
- B: 2nd selection is a comedy
- P(B) = ???
- (9/15. But don't try convincing anyone of this.)
Example 1
- Dave intends to take 2 movies on a trip and will randomly select them. (2 is the sample size, n. There are 2 trials.)
- This is sampling without replacement.
- A: 1st selection is a comedy
- P(A) = 9/15
- B: 2nd selection is a comedy
- P(B) = 9/15
- (But many people won't like this.)
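A brute-force check may help anyone who doubts that P(B) = 9/15. This Python sketch treats the 15 DVDs as distinct items and lists all 15 × 14 = 210 equally likely ordered selections. It then counts how often each position holds a comedy. The names `dvds` and `pairs` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Dave's collection: 9 comedies (1) and 6 dramas (0); list positions keep DVDs distinct.
dvds = [1] * 9 + [0] * 6

# Every ordered pair of two different DVDs: 15 * 14 = 210 equally likely outcomes.
pairs = list(permutations(range(len(dvds)), 2))

p_a = Fraction(sum(dvds[i] for i, _ in pairs), len(pairs))  # 1st is a comedy
p_b = Fraction(sum(dvds[j] for _, j in pairs), len(pairs))  # 2nd is a comedy
print(len(pairs), p_a, p_b)  # 210 3/5 3/5
```

Both counts come to 126 of 210. By symmetry, the 2nd position is just as likely as the 1st to hold a comedy.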
Example 1
- The probability the 1st selection is not a comedy is 6/15.
- P(is comedy) + P(isn't comedy) = 9/15 + 6/15 = 1
Complements
- Let A be any event. The event $\bar{A}$ ("not A") is the event that A does not occur. Since either A occurs or it does not occur, $P(A) + P(\bar{A}) = 1$.
Conditional Probability
- Consider a sample of size 2. Let A be the event that the first trial is a Success, and B the event that the second is a Success.
- $P(B \mid A)$ is the probability that B occurs given that A is known to have occurred. This type of probability is called a conditional probability.
Example 1
- A: 1st choice is a comedy
- B: 2nd choice is a comedy
- $P(B \mid A) = 8/14 \approx 0.5714$ is the conditional probability that the 2nd choice is a comedy, given that the first choice is known to be a comedy. Taking out 1 comedy leaves 8 comedies among the 14 remaining DVDs.
Example 1
- $\bar{A}$: 1st choice is not a comedy
- B: 2nd choice is a comedy
- $P(B \mid \bar{A}) = 9/14 \approx 0.6429$ is the conditional probability that the 2nd choice is a comedy, given that the first choice is known to be a drama (not a comedy). Taking out 1 drama leaves 9 comedies among the 14 remaining DVDs.
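The same enumeration idea shows where 8/14 and 9/14 come from. Conditioning on the first choice simply restricts attention to the ordered pairs in which that first choice happened. The helper `conditional` is a hypothetical name used only for illustration.

```python
from fractions import Fraction
from itertools import permutations

dvds = [1] * 9 + [0] * 6  # 9 comedies (1), 6 dramas (0)
pairs = list(permutations(range(len(dvds)), 2))  # 210 ordered pairs

def conditional(first_value):
    """P(2nd is a comedy | 1st has the given value), by restricting the outcomes."""
    given = [(i, j) for i, j in pairs if dvds[i] == first_value]
    return Fraction(sum(dvds[j] for _, j in given), len(given))

print(conditional(1))  # 4/7, i.e. 8/14 (72 of 126 pairs)
print(conditional(0))  # 9/14 (54 of 84 pairs)
```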
Example 3: Toddler Matching
- 3 toddlers are pictured. People are invited to match the toddler to her father (who is known to the people).
- Assume that everyone guesses, and consider the guesses of 2 people.
- A: 1st person guesses right
- B: 2nd person guesses right
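Example 3
- A: 1st person guesses right
- B: 2nd person guesses right
- $P(A) = 1/3$ and $P(B) = 1/3$. The 2nd person's guess is unaffected by the 1st person's, so $P(B \mid A) = 1/3 = P(B)$.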
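A small enumeration sketch shows the independence. With 3 toddlers and two people guessing independently at random, all 9 ordered pairs of guesses are equally likely. Which toddler is correct is an arbitrary assumption here (`correct = 2`); any choice gives the same result.

```python
from fractions import Fraction
from itertools import product

toddlers = [1, 2, 3]
correct = 2  # hypothetical: the right answer; any of 1, 2, 3 gives the same probabilities

# Independent random guesses: all 3 * 3 = 9 ordered pairs are equally likely.
outcomes = list(product(toddlers, repeat=2))

p_a = Fraction(sum(g1 == correct for g1, _ in outcomes), len(outcomes))
p_b = Fraction(sum(g2 == correct for _, g2 in outcomes), len(outcomes))

# Condition on A by keeping only the outcomes where the 1st person is right.
given_a = [(g1, g2) for g1, g2 in outcomes if g1 == correct]
p_b_given_a = Fraction(sum(g2 == correct for _, g2 in given_a), len(given_a))

print(p_a, p_b, p_b_given_a)  # 1/3 1/3 1/3
```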
Contrasting Examples
- Toddler Matching: sampling with replacement, independent trials
- Choosing DVDs: sampling without replacement, dependent trials
Dependent and Independent Events
Dependent trials (sampling without replacement): $P(B \mid A) \ne P(B)$
Independent trials (sampling with replacement): $P(B \mid A) = P(B)$
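Multiplication Rule
$P(A \text{ and } B) = P(A) \times P(B \mid A)$
This is the probability that A is the outcome of the first trial and B is the outcome of the second. For independent trials, $P(B \mid A) = P(B)$, so it reduces to $P(A) \times P(B)$. This multiplication rule extends to 3 or more trials to give the probability of a particular sequence of results on trials 1, ..., n. Multiply the probability for each trial, conditioned on the results of the earlier trials.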
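The rule can be sketched as a Python function that multiplies trial-by-trial probabilities for one sequence of Successes (1) and Failures (0). `sequence_probability` is a hypothetical helper, not something from the slides. Without replacement, each factor is the conditional probability given the earlier results. With replacement, the counts never change, so every factor for a Success is the same.

```python
from fractions import Fraction

def sequence_probability(successes, failures, sequence, replace):
    """Probability of one particular sequence of results (1 = Success, 0 = Failure)."""
    prob = Fraction(1)
    s, f = successes, failures  # items of each kind still available
    for outcome in sequence:
        available = s if outcome == 1 else f
        if available == 0:
            return Fraction(0)  # that kind of item has run out: impossible sequence
        prob *= Fraction(available, s + f)  # P(this result | earlier results)
        if not replace:  # the drawn item is not put back
            if outcome == 1:
                s -= 1
            else:
                f -= 1
    return prob

# Dave's collection: 9 comedies (Successes) and 6 dramas (Failures).
print(sequence_probability(9, 6, [1, 1], replace=False))        # 12/35
print(sequence_probability(9, 6, [0, 0], replace=False))        # 1/7
print(sequence_probability(9, 6, [1, 1, 1, 1], replace=False))  # 6/65, about 0.0923
print(sequence_probability(9, 6, [1, 1, 1, 1], replace=True))   # 81/625 = 0.1296
```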
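Example 1
- A: 1st choice is a comedy
- B: 2nd choice is a comedy
$P(A \text{ and } B) = P(A)\,P(B \mid A) = \frac{9}{15} \times \frac{8}{14} = \frac{72}{210} = \frac{12}{35} \approx 0.3429$
is the probability that the 1st choice is a comedy and the 2nd choice is a comedy. In other words, it is the probability that both (or all) are comedies.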
Example 1
The probability that the 1st choice is a comedy and the 2nd choice is a comedy is 0.3429. That is, the probability that both are comedies is 0.3429. This means: if the selection of 2 films is repeated over and over again, then in the very long run, the relative frequency of times for which both selections are comedies is 0.3429.
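This long-run interpretation can be checked by simulation. The Python sketch repeatedly draws 2 of the 15 DVDs without replacement using `random.Random.sample` and reports the relative frequency of "both comedies". The function name and the seed are arbitrary choices. With many repetitions the result should land near 12/35 ≈ 0.3429, though the exact value depends on the seed.

```python
import random

def simulate_both_comedies(trials, seed=0):
    """Relative frequency of 'both comedies' over many random selections of 2 DVDs."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    dvds = [1] * 9 + [0] * 6   # 9 comedies (1), 6 dramas (0)
    hits = 0
    for _ in range(trials):
        first, second = rng.sample(dvds, 2)  # 2 distinct DVDs: without replacement
        hits += first == 1 and second == 1
    return hits / trials

print(simulate_both_comedies(100_000))  # close to 12/35, about 0.3429
```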
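Example 1
$P(\bar{A} \text{ and } \bar{B}) = P(\bar{A})\,P(\bar{B} \mid \bar{A}) = \frac{6}{15} \times \frac{5}{14} = \frac{30}{210} = \frac{1}{7} \approx 0.1429$
is the probability that the 1st choice is not a comedy and the 2nd choice is not a comedy. In other words, it is the probability that neither is a comedy (both are dramas).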
Example 1
What is the probability that at least one of the choices is a comedy? When it comes to counts of things, "at least one" and "none" are complements:
Pr(at least one) + Pr(none) = 1
Pr(at least one) + 1/7 = 1
Pr(at least one) = 6/7 ≈ 0.8571
"One or more" is another way to say this.
Example 1
Pr(2 comedies) = 12/35 ≈ 0.3429
Pr(0 comedies) = 1/7 = 5/35 ≈ 0.1429
Pr(1 comedy) = ?? The rest: 1 - 12/35 - 5/35 = 18/35 ≈ 0.5143
Example 1
Pr(2 comedies) = 12/35 ≈ 0.3429
Pr(0 comedies) = 1/7 = 5/35 ≈ 0.1429
Pr(1 comedy) = 18/35 ≈ 0.5143
Example 1
X (number of comedies)   Probability
0                        0.1429
1                        0.5143
2                        0.3429
This is a probability distribution summarizing the number of comedies in the sample.
Over all possible selections of 2 DVDs, the mean is 0(5/35) + 1(18/35) + 2(12/35) = 42/35 = 1.2 comedies (out of 2). NOTE: 1.2 out of 2 gives 1.2/2 = 0.6 = p!
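The whole distribution, its mean, and the complement Pr(at least one) = 1 - Pr(none) can be recomputed in a few lines of Python. This sketch counts unordered samples with `math.comb`. That counting formula is usually called the hypergeometric distribution, a name the slides do not use. `dist` and `mean` are illustrative names.

```python
from fractions import Fraction
from math import comb

N, comedies, n = 15, 9, 2  # population size, comedies in it, sample size

# P(X = k): choose k of the 9 comedies and n - k of the 6 dramas, divided by
# the number of equally likely unordered samples of size n.
dist = {
    k: Fraction(comb(comedies, k) * comb(N - comedies, n - k), comb(N, n))
    for k in range(n + 1)
}
mean = sum(k * prob for k, prob in dist.items())

# Prints: 0 1/7 0.1429, then 1 18/35 0.5143, then 2 12/35 0.3429
for k, prob in dist.items():
    print(k, prob, round(float(prob), 4))
print(mean, mean / n)  # 6/5 3/5: 1.2 comedies out of 2, and 1.2/2 = 0.6 = p
print(1 - dist[0])     # 6/7: Pr(at least one comedy)
```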
Example 4
- Dave takes 4 DVDs on vacation with him. What is the probability all four are comedies?
- Sampling without replacement (dependent trials).
Example 4
- Dave's DVD player randomly chooses movies. Consider the next four Saturday nights, when Dave always watches a film. What is the probability all four are comedies?
- Sampling with replacement (independent trials).
Example 4
- With replacement: (9/15)^4 = 0.6^4 = 0.1296
- Without replacement: (9/15)(8/14)(7/13)(6/12) ≈ 0.0923
Example 5
- Change the collection to have 150 movies, 90 of which are comedies. (Larger population to sample from.)
- Dave will still choose 4 DVDs. (Same sample size, n = 4.)
- The population still has a proportion p = 90/150 = 0.60 of comedies.
Example 5
- Dave takes 4 DVDs on vacation with him. What is the probability all four are comedies?
- Sampling without replacement (dependent).
- Lots of computing: (90/150)(89/149)(88/148)(87/147).
Example 5
- Dave's DVD player randomly chooses movies. Consider the next four Saturday nights, when Dave always watches a film. What is the probability all four are comedies?
- Sampling with replacement (independent).
- Pretty quick/simple to compute.
Examples 4 & 5: Choosing 4 DVDs, the Probability All 4 Are Comedies
                          Without replacement      With replacement
Smaller (15) population   0.0923 (tedious)         0.1296 (easy)
Larger (150) population   0.1261 (very tedious)    0.1296 (easy)
For the larger population, the two answers are very similar.
The with-replacement answers don't require knowing the population size. All you need to know is the proportion of comedies, 0.6. Then 0.6^4 = 0.1296.
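To see whether the pattern in the table continues, a Python sketch can compare the two calculations as the population grows while p = 0.6 and n = 4 stay fixed. The populations of 1500 and 15000 are hypothetical extensions beyond the slides, and `all_successes` is an illustrative helper.

```python
from math import prod

def all_successes(N, p, n, replace):
    """P(all n draws are Successes) from a population of N with Success proportion p."""
    successes = round(N * p)  # e.g. 9 of 15, 90 of 150
    if replace:
        return p ** n  # independent trials: the population size drops out
    # Dependent trials: each draw removes one Success from both the count and the total.
    return prod((successes - i) / (N - i) for i in range(n))

# The slides report 0.0923 and 0.1261 without replacement for N = 15 and 150.
for N in (15, 150, 1500, 15000):  # 1500 and 15000 are hypothetical
    exact = all_successes(N, 0.6, 4, replace=False)
    approx = all_successes(N, 0.6, 4, replace=True)
    print(f"N = {N:>5}: without replacement {exact:.4f}, with replacement {approx:.4f}")
```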
In Practice
- Most statistical populations are huge relative to the size of the sample. Further, since
- the precise size of many populations is unknown (yet the proportion of Successes is reasonable to know), and
- knowing the population size leads to virtually undoable monster calculations when sampling without replacement,
- we prefer to handle sampling without replacement (dependent trials) by approximating probabilities using sampling with replacement (independent trials).
The 20 Times Rule
- If you are sampling without replacement (dependent trials) from a finite population, and
- the population size N is at least 20 times the sample size n ($N \ge 20n$),
- then you may proceed as if the trials are independent (sampling with replacement). Probabilities computed this way will be quite accurate approximations.
- This makes calculations much simpler.
- Also said like this: the sample size is no more than 5% of the population size.
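The rule itself is a one-line check. The sketch below uses the hypothetical function name `twenty_times_rule` and applies it to the two DVD collections from the earlier examples.

```python
def twenty_times_rule(population_size, sample_size):
    """True when N >= 20n, i.e. the sample is no more than 5% of the population."""
    return population_size >= 20 * sample_size

print(twenty_times_rule(15, 4))   # False: 15 < 80
print(twenty_times_rule(150, 4))  # True: 150 >= 80
```

Consistent with the rule, the 150-DVD case gave 0.1261 without replacement, close to the approximation 0.1296. The 15-DVD case gave 0.0923, which is not close.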
The 20 Times Rule
- Loosely speaking:
- When sampling without replacement, if the population is large relative to the sample, then the particular items drawn early have a small and ignorable effect on the mix of items that remain to be drawn later. Treating the situation as sampling with replacement therefore yields accurate approximations to probabilities.
Example 6: Acceptance Sampling
- Setting: grocery store
- A shipment of 2000 oranges just arrived.
- 10 oranges are randomly sampled, without replacement.
- If all 10 oranges are good, the shipment is accepted.
- Otherwise, under an agreement with the grower, the shipment is rejected and returned to the grower.
- Suppose that, unknown to the store, 250 of the oranges are bruised.
Acceptance Sampling
- Since 250 of 2000 oranges are bruised, the defective rate is p = 250/2000 = 0.125. The good rate is 1 - p = 0.875.
- What is the probability that the store accepts such a shipment? (If all 10 oranges are good, accept the shipment.)
- Sampling without replacement: P(accept) = (1750/2000)(1749/1999) ... (1741/1991) ≈ 0.2622
Acceptance Sampling
- In a real problem, the population size (here, 2000) would not be known (although a 0.125 bruise rate could well apply), and this computation couldn't even be started.
- In a real problem, the sample size would be much larger than 10. Even if the population size were known, the computation would be enormous.
Acceptance Sampling
- Since $2000 \ge 20 \times 10 = 200$
- (the population size is 200 times the sample size),
- probabilities may be approximated by assuming sampling with replacement (independent trials): P(accept) ≈ (0.875)^10 ≈ 0.2631
- (Compare to 0.2622.)
- This makes calculations much simpler, even if the sample were much larger.
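Both numbers can be reproduced in a short Python sketch. The exact value multiplies the 10 conditional probabilities of drawing a good orange without replacement. The approximation treats the draws as independent trials with P(good) = 0.875. Variable names are illustrative.

```python
from math import prod

N, bruised, n = 2000, 250, 10
good = N - bruised  # 1750 good oranges

# Exact: without replacement, the i-th draw is good with probability (1750 - i)/(2000 - i).
exact = prod((good - i) / (N - i) for i in range(n))
# Approximation: independent trials, P(good) = 0.875 on every draw.
approx = (good / N) ** n

print(round(exact, 4), round(approx, 4))  # 0.2622 0.2631
```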
Acceptance Sampling
- 0.2631 is the (approximate) probability that all 10 oranges are good.
- What is the probability that 9 of the 10 are good?
- The probability that the 1st orange is bad and the remaining 9 are good is (0.125)(0.875)^9 ≈ 0.037582.
- This is also the probability of getting a bruised orange in any one of the 10 positions (1st, 2nd, ..., 10th) in the sampling order, and good oranges in the other 9 positions.
Acceptance Sampling
- 0.2631 is the (approximate) probability that all 10 oranges are good.
- What is the probability that 9 of the 10 are good?
- The probability that exactly 1 orange (in any position) is bad, with the remaining 9 good, is
- 10 × 0.037582 ≈ 0.3758
- The probability there are at least 9 (9 or 10) good oranges is
- 0.2631 + 0.3758 = 0.6389
Guessing the Father's Toddler
Guessing the Father's Toddler
Toddler   Number   Percent
1         68       33.5
2         62       30.5
3         73       36.0
- If people are randomly guessing (from 1, 2, 3), is such a result possible? YES
- Is such a result at all likely? ????
Guessing the Father's Toddler
Toddler   Number   Percent
1         150      73.9
2         13       6.4
3         40       19.7
- If people are randomly guessing, is such a result possible? YES
- Is such a result at all likely? ????
Guessing the Father's Toddler
          Spring 08           Fall 08
Toddler   Number   Percent    Number   Percent
1         90       80.2       60       66.7
2         6        5.2        7        7.8
3         17       14.7       23       25.6
- What happened over the summer?
- Is such a seasonal difference at all likely to occur just due to randomness? ????
Guessing the Father's Toddler
Toddler   % of Males   % of Females
1         60.7         79.3
2         9.8          3.4
3         29.5         17.2
- Do men and women have different abilities at this task?
- Is such a difference at all likely to occur just due to randomness? ????
- How does one arrive at a single value expressing the difference between men and women?