Title: Probability and Sampling Distributions
1Chapter 4
- Probability and Sampling Distributions
2Random Variable
- Definition: A random variable is a variable whose
value is a numerical outcome of a random
phenomenon.
- The statistic calculated from a randomly chosen
sample is an example of a random variable.
- We don't know the exact outcome beforehand.
- A statistic from a random sample will take
different values if we take more samples from the
same population.
3Section 4.4
- The Sampling Distribution of a Sample Mean
4Introduction
- A statistic from a random sample will take
different values if we take more samples from the
same population
- The values of a statistic do not vary haphazardly
from sample to sample but have a regular pattern
in many samples
- We already saw the sampling distribution
- We're going to discuss an important sampling
distribution: the sampling distribution of the
sample mean, x-bar
5Example
- Suppose that we are interested in the workout
times of ISU students at the Recreation center.
- Let's assume that µ is the average workout time
of all ISU students
- To estimate µ, let's take a simple random sample of
100 students at ISU
- We will record each student's workout time (x)
- Then we find the average workout time for the 100
students
- The population mean µ is the parameter of
interest.
- The sample mean, x-bar, is the statistic (which is
a random variable).
- Use x-bar to estimate µ (this seems like a sensible
thing to do).
6Example
- A SRS should be a fairly good representation of
the population, so x-bar should be somewhere
near µ.
- x-bar from a SRS is an unbiased estimate of µ due
to the randomization
- We don't expect x-bar to be exactly equal to µ
- There is variability in x-bar from sample to
sample
- If we take another simple random sample (SRS) of
100 students, then x-bar will probably be
different.
- Why, then, can I use the results of one sample to
estimate µ?
7Statistical Estimation
- If x-bar is rarely exactly right and varies from
sample to sample, why is x-bar a reasonable
estimate of the population mean µ?
- Answer: if we keep on taking larger and larger
samples, the statistic x-bar is guaranteed to get
closer and closer to the parameter µ
- We have the comfort of knowing that if we can
afford to keep on measuring more subjects,
eventually we will estimate the mean amount of
workout time for ISU students very accurately
8The Law of Large Numbers
- Law of Large Numbers (LLN)
- Draw independent observations at random from any
population with finite mean µ
- As the number of observations drawn increases,
the mean x-bar of the observed values gets closer
and closer to the mean µ of the population
- If n is the sample size, then x-bar → µ as n gets large
- The Law of Large Numbers holds for any
population, not just for special classes such as
Normal distributions
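The statement above can be checked with a small simulation. This is only a sketch: the Exponential population with mean µ = 1, the seed, and the name running_means are assumptions made for illustration, and any population with a finite mean could be substituted.

```python
import random

random.seed(1)   # fixed seed so the run is repeatable (illustrative choice)
MU = 1.0         # mean of the assumed Exponential population

def running_means(n_obs):
    """Return the mean of the first k observations for k = 1..n_obs."""
    total = 0.0
    means = []
    for k in range(1, n_obs + 1):
        total += random.expovariate(1 / MU)  # one independent draw
        means.append(total / k)
    return means

means = running_means(100_000)
for k in (1, 10, 100, 1_000, 10_000, 100_000):
    print(f"n = {k:>6}: x-bar = {means[k - 1]:.4f}")
```

The early values of x-bar can be far from µ; the later ones should sit close to 1.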
9Example
- Suppose we have a bowl with 21 small pieces of
paper inside. Each paper is labeled with a number
0-20. We will draw several random samples out of
the bowl of size n and record the sample means,
x-bar for each sample.
- What is the population?
- Since we know the values for each individual in
the population (i.e. for each paper in the bowl),
we can actually calculate the value of µ, the
true population mean. µ = 10
- Draw a random sample of size n = 1.
- Calculate x-bar for this sample.
10Example
- Draw a second random sample of size n = 5.
Calculate x-bar for this sample.
- Draw a third random sample of size n = 10.
Calculate x-bar for this sample.
- Draw a fourth random sample of size n = 15.
Calculate x-bar for this sample.
- Draw a fifth random sample of size n = 20.
Calculate x-bar for this sample.
- What can we conclude about the value of x-bar as
the sample size increases?
- THIS IS CALLED THE LAW OF LARGE NUMBERS.
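The bowl experiment can be imitated in Python. This sketch assumes each sample is drawn without replacement from the full bowl of 21 slips; the seed and variable names are illustrative only.

```python
import random
from statistics import mean

random.seed(4)                 # illustrative seed
bowl = list(range(21))         # slips labeled 0, 1, ..., 20
mu = mean(bowl)                # true population mean

sample_means = {}
for n in (1, 5, 10, 15, 20):
    sample = random.sample(bowl, n)   # n slips drawn without replacement
    sample_means[n] = mean(sample)
    print(f"n = {n:>2}: x-bar = {sample_means[n]:.2f} (mu = {mu})")
```

With n = 20 only one slip is left in the bowl, so x-bar cannot be more than 0.5 away from µ = 10.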
11Another Example
- Example: Suppose we know that the average height
of all high school students in Iowa is 5.70
feet.
- We get SRSs from the population and calculate
the mean height.
[Figure: mean of first n observations]
12Example 4.21 From Book
- Sulfur compounds such as dimethyl sulfide (DMS)
are sometimes present in wine
- DMS causes off-odors in wine, so winemakers
want to know the odor threshold
- What is the lowest concentration of DMS that the
human nose can detect?
- Different people have different thresholds, so we
start by asking about the mean threshold µ in the
population of all adults
- µ is a parameter that describes this population
13Example 4.21 From Text
- To estimate µ, we present tasters with both
natural wine and the same wine spiked with DMS at
different concentrations to find the lowest
concentration at which they can identify the
spiked wine
- The odor thresholds for 10 randomly chosen
subjects (in micrograms/liter):
- 28 40 28 33 20 31 29 27 17 21
- The mean threshold for these subjects is x-bar = 27.4
- x-bar is a statistic calculated from this sample
- A statistic, such as the mean of a random sample
of 10 adults, is a random variable.
14Example
- Suppose µ = 25 is the true value of the parameter
we seek to estimate
- The first subject had threshold 28, so the line
starts there
- The second point is the mean of the first two
subjects, (28 + 40)/2 = 34
- This process continues many many times, and our
line begins to settle around µ = 25
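The first few points of that line can be reproduced from the 10 thresholds listed in Example 4.21. This is a minimal sketch with illustrative variable names; with only 10 subjects the running mean ends at 27.4, and many more subjects would be needed before it settles near 25.

```python
thresholds = [28, 40, 28, 33, 20, 31, 29, 27, 17, 21]  # micrograms/liter

running = []
total = 0
for k, x in enumerate(thresholds, start=1):
    total += x
    running.append(total / k)   # mean of the first k subjects

for k, m in enumerate(running, start=1):
    print(f"after {k:>2} subjects: x-bar = {m:.2f}")
```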
15Example 4.21 From Book
The law of large numbers in action: as we take
more observations, the sample mean x-bar always
approaches the mean µ of the population
16The Law of Large Numbers
- The law of large numbers is the foundation of
business enterprises such as casinos and
insurance companies
- The winnings (or losses) of a gambler on a few
plays are uncertain -- that's why gambling is
exciting(?)
- But the house plays tens of thousands of times
- So the house, unlike individual gamblers, can
count on the long-run regularity described by the
Law of Large Numbers
- The average winnings of the house on tens of
thousands of plays will be very close to the mean
of the distribution of winnings
- Hence, because that mean favors the house, the LLN
all but guarantees the house a profit!
17Thinking about the Law of Large Numbers
- The Law of Large Numbers says broadly that the
average results of many independent observations
are stable and predictable
- A grocery store deciding how many gallons of milk
to stock and a fast-food restaurant deciding how
many beef patties to prepare can predict demand
even though their customers make independent
decisions
- The Law of Large Numbers says that the many
individual decisions will produce a stable result
18The Law of Small Numbers or Averages
- The Law of Large Numbers describes the regular
behavior of chance phenomena in the long run
- Many people believe in an incorrect law of small
numbers
- We falsely expect even short sequences of random
events to show the kind of average behavior that
in fact appears only in the long run
19The Law of Small Numbers or Averages
- Example: Pretend you have an average free throw
success rate of 70%. One day on the free throw
line, you miss 8 shots in a row. Should you hit
the next shot by the mythical law of averages?
- No. The law of large numbers tells us that the
long run average will be close to 70%. Missing 8
shots in a row simply means you are having a bad
day. 8 shots is hardly the long run.
Furthermore, the law of large numbers says
nothing about the next event. It only tells us
what will happen if we keep track of the long run
average.
20The Hot Hand Debate
- In some sports, if a player makes several
consecutive good plays, like a few good golf
shots in a row, they often claim to have the hot
hand, which generally implies that their next
shot is likely to be a good one.
- There have been studies that suggest that runs
of golf shots, good or bad, are no more frequent in
golf than would be expected if each shot were
independent of the player's previous shots
- Players perform consistently, not in streaks
- Our perception of hot or cold streaks simply
shows that we don't perceive random behavior very
well!
21The Gambling Hot Hand
- Gamblers often follow the hot-hand theory,
betting that a lucky run will continue
- At other times, however, they draw the opposite
conclusion when confronted with a run of outcomes
- If a coin gives 10 straight heads, some gamblers
feel that it must now produce some extra tails to
get back into the average of half heads and half
tails
- Not true! If the next 10,000 tosses give about
50% tails, those 10 straight heads will be
swamped by the later thousands of heads and
tails.
- No short run compensation is needed to get back
to the average in the long run.
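The arithmetic behind "swamped" is short enough to write out. This sketch assumes, for illustration, that exactly half of the next 10,000 tosses come up heads; the variable names are hypothetical.

```python
heads_run = 10          # the 10 straight heads already observed
later_tosses = 10_000   # further tosses
later_heads = 5_000     # assumption: exactly half of them are heads

total_heads = heads_run + later_heads
total_tosses = heads_run + later_tosses
proportion = total_heads / total_tosses
print(f"proportion of heads = {proportion:.4f}")
```

The overall proportion is 5,010/10,010, about 0.5005, even though no extra tails appeared to cancel the early run.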
22Need for Law of Large Numbers
- Our inability to accurately distinguish random
behavior from systematic influences points out
the need for statistical inference to supplement
exploratory analysis of data
- Probability calculations can help verify that
what we see in the data is more than a random
pattern
23How Large is a Large Number?
- The Law of Large Numbers says that the actual
mean outcome of many trials gets close to the
distribution mean µ as more trials are made
- It doesn't say how many trials are needed to
guarantee a mean outcome close to µ
- That depends on the variability of the random
outcomes
- The more variable the outcomes, the more trials
are needed to ensure that the mean outcome x-bar
is close to the distribution mean µ
24More Laws of Large Numbers
- The Law of Large Numbers is one of the central
facts about probability
- LLN explains why casinos and insurance
companies make money
- LLN assures us that statistical estimation will
be accurate if we can afford enough observations
- The basic Law of Large Numbers applies to
independent observations that all have the same
distribution
- Mathematicians have extended the law to many more
general settings
25What if Observations are not Independent
- You are in charge of a process that manufactures
video screens for computer monitors
- Your equipment measures the tension on the metal
mesh that lies behind each screen and is critical
to its image quality
- You want to estimate the mean tension µ for the
process by the average x-bar of the measurements
- The tension measurements are not independent
26AYK 4.82
- Use the Law of Large Numbers applet on the
textbook website
27Sampling Distributions
- The Law of Large Numbers assures us that if we
measure enough subjects, the statistic x-bar will
eventually get very close to the unknown
parameter µ
28Sampling Distributions
- What if we don't have a large sample?
- Take a large number of samples of the same size
from the same population
- Calculate the sample mean for each sample
- Make a histogram of the sample means
- The histogram of values of the statistic
approximates the sampling distribution that we
would see if we kept on sampling forever
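The steps above can be sketched as a simulation. The population used here (Normal with mean 25 and standard deviation 7), the sample size 10, the 2,000 repetitions, and the seed are all assumptions chosen for illustration, and the histogram is a rough text version.

```python
import random
from statistics import mean, stdev

random.seed(7)                          # illustrative seed
MU, SIGMA, N, REPS = 25, 7, 10, 2000    # assumed population and design

# Sample mean of each of REPS independent samples of size N
xbars = [mean([random.gauss(MU, SIGMA) for _ in range(N)])
         for _ in range(REPS)]

# Text histogram with bins one unit wide
bins = {}
for x in xbars:
    lo = int(x // 1)
    bins[lo] = bins.get(lo, 0) + 1
for lo in sorted(bins):
    print(f"{lo:>3}-{lo + 1:<3} {'*' * (bins[lo] // 10)}")

print(f"mean of the sample means = {mean(xbars):.3f}")
print(f"sd of the sample means   = {stdev(xbars):.3f}")
```

The histogram should be centered near 25 and be much narrower than the population itself.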
29
- The idea of a sampling distribution is the
foundation of statistical inference
- The laws of probability can tell us about
sampling distributions without the need to
actually choose or simulate a large number of
samples
30Mean and Standard Deviation of a Sample Mean
- Suppose that x-bar is the mean of a SRS of size n
drawn from a large population with mean µ and
standard deviation σ
- The mean of the sampling distribution of x-bar is
µ and its standard deviation is σ/√n
- Notice: averages are less variable than
individual observations!
31Mean and Standard Deviation of a Sample Mean
- The mean of the statistic x-bar is always the
same as the mean µ of the population
- The sampling distribution of x-bar is centered at
µ
- In repeated sampling, x-bar will sometimes fall
above the true value of the parameter µ and
sometimes below, but there is no systematic
tendency to overestimate or underestimate the
parameter
- Because the mean of x-bar is equal to µ, we say
that the statistic x-bar is an unbiased estimator
of the parameter µ
32Mean and Standard Deviation of a Sample Mean
- An unbiased estimator is correct on the average
in many samples
- How close the estimator falls to the parameter in
most samples is determined by the spread of the
sampling distribution
- If individual observations have standard
deviation σ, then sample means x-bar from samples
of size n have standard deviation σ/√n
- Again, notice that averages are less variable
than individual observations
33Mean and Standard Deviation of a Sample Mean
- Not only is the standard deviation of the
distribution of x-bar smaller than the standard
deviation of individual observations, but it gets
smaller as we take larger samples
- The results of large samples are less variable
than the results of small samples
- Remember, we divide by the square root of n
34Mean and Standard Deviation of a Sample Mean
- If n is large, the standard deviation of x-bar is
small and almost all samples will give values of
x-bar that lie very close to the true parameter µ
- The sample mean from a large sample can be
trusted to estimate the population mean
accurately
- Notice that the standard deviation of the sampling
distribution gets smaller only at the rate √n
- To cut the standard deviation of x-bar in half,
we must take four times as many observations, not
just twice as many (square root of 4 is 2)
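A short numerical check of the square-root rate. The helper name sd_of_mean and the value σ = 7 are illustrative choices, not part of the original slides.

```python
from math import sqrt

def sd_of_mean(sigma, n):
    """Standard deviation of x-bar for a SRS of size n: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 7   # illustrative population standard deviation
for n in (10, 20, 40, 160):
    print(f"n = {n:>3}: sd of x-bar = {sd_of_mean(sigma, n):.4f}")
```

Going from n = 10 to n = 20 shrinks the standard deviation by a factor of only √2; it takes n = 40 to halve it and n = 160 to quarter it.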
35Example
- Suppose we take samples of size 15 from a
distribution with mean 25 and standard deviation
7
- For the distribution of x-bar:
- the mean of x-bar is 25
- the standard deviation of x-bar is 7/√15 ≈ 1.80739
36What About Shape?
- We have described the center and spread of the
sampling distribution of a sample mean x-bar, but
not its shape
- The shape of the distribution of x-bar depends on
the shape of the population distribution
37Sampling Distribution of a Sample Mean
- If a population has the N(µ, σ) distribution,
then the sample mean x-bar of n independent
observations has the N(µ, σ/√n) distribution
38Example
- Adults differ in the smallest amount of dimethyl
sulfide they can detect in wine
- Extensive studies have found that the DMS odor
threshold of adults follows roughly a Normal
distribution with mean µ = 25 micrograms per
liter and standard deviation σ = 7 micrograms per
liter
39Example
- Because the population distribution is Normal,
the sampling distribution of x-bar is also Normal
- If n = 10, what is the distribution of x-bar?
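One way to answer the question is with NormalDist from Python's standard library, using the result that x-bar is N(µ, σ/√n). The probability at the end is an extra illustration, not something the slides ask for.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 25, 7, 10                     # DMS threshold example
xbar_dist = NormalDist(mu, sigma / sqrt(n))  # N(mu, sigma / sqrt(n))
print(f"x-bar is N({xbar_dist.mean:.0f}, {xbar_dist.stdev:.3f})")

# Illustrative follow-up: chance that x-bar lands within 2 units of mu
p_within = xbar_dist.cdf(mu + 2) - xbar_dist.cdf(mu - 2)
print(f"P(23 < x-bar < 27) = {p_within:.3f}")
```

The standard deviation of x-bar is 7/√10, about 2.21 micrograms per liter.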
40What if the Population Distribution is not
Normal?
- As the sample size increases, the distribution of
x-bar changes shape
- The distribution looks less like that of the
population and more like a Normal distribution
- When the sample is large enough, the distribution
of x-bar is very close to Normal
- This result is true no matter what the shape of the
population distribution, as long as the population
has a finite standard deviation σ
41Central Limit Theorem
- Draw a SRS of size n from any population with
mean µ and finite standard deviation σ
- When n is large, the sampling distribution of the
sample mean x-bar is approximately Normal
- x-bar is approximately N(µ, σ/√n)
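A rough simulation can show the theorem at work. The sketch below assumes a strongly skewed Exponential population with µ = σ = 1 and asks how often x-bar falls within σ/√n of µ; for a Normal distribution that share is about 0.683. The function name, seed, and repetition count are illustrative.

```python
import random
from math import sqrt

random.seed(11)        # illustrative seed
MU = SIGMA = 1.0       # Exponential population with mean 1 has sd 1
REPS = 5000            # simulated samples per sample size

def fraction_within_one_sd(n):
    """Share of simulated sample means within sigma/sqrt(n) of mu."""
    half_width = SIGMA / sqrt(n)
    hits = 0
    for _ in range(REPS):
        xbar = sum(random.expovariate(1 / MU) for _ in range(n)) / n
        if abs(xbar - MU) < half_width:
            hits += 1
    return hits / REPS

results = {n: fraction_within_one_sd(n) for n in (1, 5, 50)}
for n, frac in results.items():
    print(f"n = {n:>2}: fraction within one sd = {frac:.3f}")
```

For n = 1 the share is well above 0.683 because the population is far from Normal; by n = 50 it should be close to the Normal value.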
42Central Limit Theorem
- More general versions of the central limit
theorem say that the distribution of a sum or
average of many small random quantities is close
to Normal
- The central limit theorem suggests why the Normal
distributions are common models for observed data
43How Large a Sample is Needed?
- The sample size needed depends on whether the population
distribution is close to Normal
- We require more observations if the shape of the
population distribution is far from Normal
44Example
- The time X that a technician requires to perform
preventive maintenance on an air-conditioning
unit is governed by the Exponential distribution
(figure 4.17 (a)) with mean time µ = 1 hour and
standard deviation σ = 1 hour
- Your company operates 70 of these units
- The distribution of the mean time your company
spends on preventive maintenance is approximately
N(1, 1/√70) = N(1, 0.12)
45Example
- What is the probability that your company's units'
average maintenance time exceeds 50 minutes?
- 50/60 = 0.83 hour
- So we want to know P(x-bar > 0.83)
- Use Normal distribution calculations we learned
in Chapter 2!
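The Normal calculation can be sketched with NormalDist from Python's standard library, assuming the central limit theorem approximation N(1, 1/√70) for x-bar and the rounded cutoff 0.83 hour used above.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 70                  # hours; 70 units
xbar_dist = NormalDist(mu, sigma / sqrt(n))  # approximate distribution of x-bar

p = 1 - xbar_dist.cdf(0.83)                  # P(x-bar > 0.83)
print(f"sd of x-bar     = {xbar_dist.stdev:.4f}")
print(f"P(x-bar > 0.83) = {p:.4f}")
```

The probability comes out at roughly 0.92, so an average above 50 minutes is very likely.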
464.86 ACT scores
- The scores of students on the ACT college
entrance examination in a recent year had the
Normal distribution with mean µ = 18.6 and
standard deviation σ = 5.9
474.86 ACT scores
- What is the probability that a single student
randomly chosen from all those taking the test
scores 21 or higher?
484.86 ACT scores
- About 34% of students (from this population)
scored a 21 or higher on the ACT
- The probability that a single student randomly
chosen from this population would have a score of
21 or higher is 0.34
494.86 ACT scores
- Now take a SRS of 50 students who took the test.
What are the mean and standard deviation of the
sample mean score x-bar of these 50 students?
- Mean = 18.6 (same as µ)
- Standard deviation = σ/√50 = 5.9/√50 ≈ 0.8344
504.86 ACT scores
- What is the probability that the mean score x-bar
of these students is 21 or higher?
514.86 ACT scores
- About 0.2% of all random samples of size 50
(from this population) would have a mean score
x-bar of 21 or higher.
- The probability of having a mean score x-bar of
21 or higher from a sample of 50 students (from
this population) is 0.002.
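Both ACT probabilities can be reproduced with NormalDist from Python's standard library. This is a sketch of the calculation under the stated N(18.6, 5.9) population; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18.6, 5.9, 50

single = NormalDist(mu, sigma)            # one randomly chosen student
xbar = NormalDist(mu, sigma / sqrt(n))    # mean score of a SRS of 50

p_single = 1 - single.cdf(21)             # P(X >= 21)
p_mean = 1 - xbar.cdf(21)                 # P(x-bar >= 21)

print(f"sd of x-bar     = {xbar.stdev:.4f}")
print(f"P(X >= 21)      = {p_single:.3f}")
print(f"P(x-bar >= 21)  = {p_mean:.4f}")
```

A single score of 21 or higher is fairly common, while a sample mean that high is rare, because averages are less variable than individual observations.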
52Section 4.4 Summary
- When we want information about the population
mean µ for some variable, we often take a SRS and
use the sample mean x-bar to estimate the unknown
parameter µ.
53Section 4.4 Summary
- The Law of Large Numbers states that the actually
observed mean outcome x-bar must approach the
mean µ of the population as the number of
observations increases.
54Section 4.4 Summary
- The sampling distribution of x-bar describes how
the statistic x-bar varies in all possible
samples of the same size from the same population.
55Section 4.4 Summary
- The mean of the sampling distribution is µ, so
that x-bar is an unbiased estimator of µ.
56Section 4.4 Summary
- The standard deviation of the sampling
distribution of x-bar is sigma over the square
root of n for a SRS of size n if the population
has standard deviation sigma. That is, averages
are less variable than individual observations.
57Section 4.4 Summary
- If the population has a Normal distribution, so
does x-bar.
58Section 4.4 Summary
- The Central Limit Theorem states that for large n
the sampling distribution of x-bar is
approximately Normal for any population with
finite standard deviation sigma. That is,
averages are more Normal than individual
observations. We can use the fact that x-bar has
a known Normal distribution to calculate
approximate probabilities for events involving
x-bar.