Title: Sampling
1. Sampling
2. Rules for Expectation
- Examples
- Mean: E(X) = Σ x p(x)
- Variance: E[(X - μ)²] = Σ (x - μ)² p(x)
- Covariance: E[(X - μx)(Y - μy)] = Σ Σ (x - μx)(y - μy) p(x,y)
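The three definitions above can be checked numerically on a small joint distribution. The sketch below is illustrative only: the table `joint` and the helper `expect` are hypothetical names and values, not taken from the slides.

```python
# Hypothetical joint distribution p(x, y); the probabilities sum to 1.
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}

def expect(g):
    """Return the probability-weighted sum of g(x, y) over all outcomes."""
    return sum(g(x, y) * pxy for (x, y), pxy in joint.items())

mu_x = expect(lambda x, y: x)                          # mean of X
mu_y = expect(lambda x, y: y)                          # mean of Y
var_x = expect(lambda x, y: (x - mu_x) ** 2)           # variance of X
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))  # covariance of X and Y

print(f"mean X = {mu_x:.2f}, mean Y = {mu_y:.2f}")
print(f"var X = {var_x:.2f}, cov(X, Y) = {cov_xy:.2f}")
```

A positive covariance here reflects that the outcome (1, 1) is more likely than it would be if X and Y were independent.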
3. Rules for Expectation
- E(X + Y) = E(X) + E(Y)
- E(aX + bY) = aE(X) + bE(Y)
- Example: E(R), where R = 10 + X + Y
- E(10 + X + Y) = 10 + E(X) + E(Y)
- E[g(X,Y)] = Σx Σy g(x,y) p(x,y)
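As a rough check of these rules, the sketch below computes each expectation directly from the general formula for E[g(X,Y)] and compares the two sides. The joint distribution `p` and the constants `a` and `b` are hypothetical values chosen for illustration; exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(x, y), chosen only for illustration.
p = {(1, 2): F(1, 4), (1, 5): F(1, 4), (3, 2): F(1, 8), (3, 5): F(3, 8)}

def E(g):
    """E[g(X, Y)] = sum over x and y of g(x, y) p(x, y)."""
    return sum(g(x, y) * pxy for (x, y), pxy in p.items())

e_x = E(lambda x, y: x)
e_y = E(lambda x, y: y)

# E(10 + X + Y) computed directly, then via the rule 10 + E(X) + E(Y).
e_r = E(lambda x, y: 10 + x + y)
print(e_r, 10 + e_x + e_y)

# E(aX + bY) computed directly, then via the rule aE(X) + bE(Y).
a, b = 2, -3  # hypothetical constants
e_lin = E(lambda x, y: a * x + b * y)
print(e_lin, a * e_x + b * e_y)
```

Note that X and Y are not independent in this table, and the rules still hold: linearity of expectation does not require independence.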
4. Sampling
- What can we expect of a random sample drawn from a known population?
- Can we generalize findings from our random sample to the population?
- This is the heart of inferential statistics.
5. Definitions
- Population: The total collection of objects to be studied.
- Each individual observation in a random sample has the population probability distribution p(x). See Table 6-1, p. 190.
- Random Sample: A sample in which each individual has an equal chance of being selected.
6. Definitions (continued)
- The sample mean is not as extreme (doesn't vary so widely) as the individual values in the sample because it represents an average.
- In other words, extreme observations are diluted by more typical observations. See Figure 6-2.
- A sample is representative if it has the same characteristics as the population; random samples are much more likely to be representative.
7. Sampling with or without replacement
- When the population is much larger than the sample, these are practically equivalent.
- A very simple random sample (VSRS) is a sample whose n observations X1, X2, ..., Xn are independent. The distribution of each X is the population distribution p(x), that is:
- p(x1) = p(x2) = ... = p(xn) = p(x)
8. Small Samples
- The exception to this rule occurs when the sample is drawn from a small population, where sampling without replacement significantly changes the probabilities of the remaining X values (see page 216).
- Example: calculating the probability of various poker hands.
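To see how much the lack of replacement matters in a 52-card deck, the sketch below computes the chance that all 5 cards of a poker hand share a suit (a flush, counting straight flushes), first as the cards are actually dealt and then under a hypothetical scheme where each card is returned to the deck before the next draw. The choice of hand is an assumption made for illustration; the slides do not name one.

```python
from math import comb

# Without replacement: pick one of 4 suits, then 5 of its 13 cards,
# out of all possible 5-card hands from 52 cards.
p_without = 4 * comb(13, 5) / comb(52, 5)

# With replacement (hypothetical comparison): each of the 5 draws
# independently lands in a given suit with probability 13/52.
p_with = 4 * (13 / 52) ** 5

print(f"without replacement: {p_without:.5f}")
print(f"with replacement:    {p_with:.5f}")
print(f"ratio:               {p_with / p_without:.2f}")
```

The two answers differ by roughly a factor of two, because each card dealt noticeably changes the makeup of the remaining deck.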
9. How Reliable is the Sample?
- Suppose we calculate the sample mean (M), and we want to know how close it comes to μ, the population mean.
- Imagine collecting many different samples, getting a sample mean for each sample. We could build the sampling distribution of M, denoted p(M).
- Example: everyone flip a coin 10 times and tell me how many heads you flipped.
10. How Reliable is the Sample?
- Rather than actually sampling, we can simulate this sampling on a computer, which is called Monte Carlo sampling (or simulation).
- We can also derive mathematical formulas for the sampling distribution of M.
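The classroom coin-flip exercise can be simulated along these lines. This is a minimal sketch, not the course's own program: the function name `simulate_heads`, the number of simulated students, and the seed are all assumptions.

```python
import random
from collections import Counter

def simulate_heads(n_students=10_000, n_flips=10, seed=0):
    """Simulate students each flipping a fair coin n_flips times.

    Returns the relative frequency of each possible number of heads.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_students):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        counts[heads] += 1
    return {k: counts[k] / n_students for k in range(n_flips + 1)}

dist = simulate_heads()
for heads, freq in dist.items():
    # Crude text histogram: one '#' per percentage point.
    print(f"{heads:2d} heads: {freq:.3f} {'#' * round(freq * 100)}")
```

The printed histogram approximates the sampling distribution of the number of heads; dividing each count of heads by 10 gives the sampling distribution of the sample proportion.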
11. Moments of the Sample Mean
- Recall our objective is to estimate a population mean, μ. If we take a random sample of observations from the population and calculate the sample mean, how good will M be as an estimator of its target, μ?
- We start with the definition of the sample mean:
- M = (1/n)(X1 + X2 + ... + Xn)
12. Moments of the Sample Mean
- We start by calculating the expectation of the sample mean:
- E(M) = (1/n)[E(X1) + E(X2) + ... + E(Xn)]
- Remember that each observation X has the population distribution p(x) with mean μ. Thus E(X1) = E(X2) = ... = E(Xn) = μ
- E(M) = (1/n)[μ + μ + ... + μ]
- = (1/n)(nμ) = μ
13. Moments of the Sample Mean
- We can see that E(M) = μ.
- On average, the sample mean will be on target, that is, equal to μ.
- Of course, an individual sample mean is likely to be a little above or below its target (think of the coin flips we did).
- The key question is: how much above or below? We must find the variance of M.
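14. Moments of the Sample Mean
- Because the observations are independent, the variance of their sum is the sum of their variances:
- Var(M) = (1/n²)[var(X1) + var(X2) + ... + var(Xn)]
- Each observation X has the population distribution p(x) with variance σ², so
- Var(M) = (1/n²)[σ² + σ² + ... + σ²]
- = (1/n²)(nσ²) = σ²/n
- Standard deviation(M) = σ/√n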
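Both moments can be verified exactly for a small case by listing every possible sample. The sketch below assumes a hypothetical population (a fair six-sided die) and independent draws, and enumerates all samples of size n to compute E(M) and Var(M) with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population distribution p(x): a fair six-sided die.
p = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * px for x, px in p.items())
sigma2 = sum((x - mu) ** 2 * px for x, px in p.items())

def moments_of_sample_mean(n):
    """Exact E(M) and Var(M), enumerating all samples of n independent draws."""
    e_m = Fraction(0)
    e_m2 = Fraction(0)
    for sample in product(p, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= p[x]  # independence: probabilities multiply
        m = Fraction(sum(sample), n)
        e_m += m * prob
        e_m2 += m * m * prob
    return e_m, e_m2 - e_m ** 2

for n in (1, 2, 3):
    e_m, var_m = moments_of_sample_mean(n)
    print(f"n={n}: E(M)={e_m}, Var(M)={var_m}, sigma^2/n={sigma2 / n}")
```

For every n the enumerated mean equals the population mean and the enumerated variance equals the population variance divided by n.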
15. Standard error
- This typical deviation of M from its target, μ, represents the estimation error, and is commonly called the standard error.
- What happens as n increases?
- The standard error decreases; thus the larger the sample, the more accurately M estimates μ!
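A short calculation shows how slowly the standard error shrinks. The value of `sigma` below is a hypothetical population standard deviation, and `standard_error` is simply a named wrapper around the formula for the standard deviation of M.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma divided by sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 9.0  # hypothetical population standard deviation
for n in (1, 4, 16, 64, 256):
    print(f"n = {n:3d}   SE = {standard_error(sigma, n):.4f}")
```

Each row quadruples the sample size and only halves the standard error, since the error falls with the square root of n rather than with n itself.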
16. The shape of the sampling distribution
- Figure 6-3 shows 3 different parent population distributions. We see that as n increases, the sampling distribution takes on an approximately normal shape, whatever the shape of the parent population.
- Central Limit Theorem: In random samples of size n, M fluctuates around μ with a standard error of σ/√n. Thus as n increases, the sampling distribution of M concentrates more and more around μ and becomes normal (bell-shaped).
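The theorem can be explored by simulation. The sketch below assumes a strongly skewed parent population (exponential with mean 1 and standard deviation 1), which is not one of the populations in Figure 6-3, and tracks two things as n grows: the spread of the sample means against σ/√n, and the share of sample means above μ, which should approach one half as the distribution becomes symmetric.

```python
import math
import random

MU, SIGMA = 1.0, 1.0  # mean and SD of the assumed exponential population

def simulate_means(n, reps=10_000, seed=1):
    """Return `reps` sample means, each from n independent exponential draws."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

for n in (1, 5, 30):
    means = simulate_means(n)
    avg = sum(means) / len(means)
    sd = math.sqrt(sum((m - avg) ** 2 for m in means) / len(means))
    share_above = sum(m > MU for m in means) / len(means)
    print(f"n={n:2d}  mean of M={avg:.3f}  SD of M={sd:.3f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f}  share above mu={share_above:.3f}")
```

With n = 1 the share above μ is well below one half because of the skew; by n = 30 it is much closer, while the spread of M tracks σ/√n throughout.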
17. Normal approximation rule
- If we know the normal approximation rule, or Central Limit Theorem, we can find the probability of particular values (or ranges) of M using the standard normal table.
- Example: Suppose a population of men on a large southern campus has a mean height of μ = 69 inches with a standard deviation σ = 3.22 inches.
18. Normal approximation rule
- If a random sample of n = 10 men is drawn, what is the chance that the sample mean M will be within 2 inches of the population mean?
- E(M) = μ = 69
- SE = σ/√n = 3.22/√10 = 1.02
- We want to find the probability that M is within 2 inches, or between 67 and 71.
19. Normal approximation rule
- Z = (M - μ)/SE = (M - μ)/(σ/√n)
- Z = (71 - 69)/1.02 = 1.96
- Thus a sample mean of 71 is nearly 2 standard errors above its expected value of 69.
- P(Z > 1.96) = .025; likewise P(Z < -1.96) = .025
20. Normal approximation rule
- Probability(67 < M < 71) = 1 - .025 - .025 = .95
- We can conclude that there is a 95% chance that the sample mean will be within 2 inches of the population mean.
- Note that there are 2 formulas for Z-scores: one for individual values of X, Z = (X - μ)/σ, and one for sample means M, Z = (M - μ)/(σ/√n).
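The height example can be reproduced without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the standard normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69, 3.22, 10      # population mean, population SD, sample size

se = sigma / sqrt(n)             # standard error of the sample mean
z = (71 - mu) / se               # Z-score for a sample mean of 71

# Sampling distribution of M under the normal approximation rule.
sampling_dist = NormalDist(mu, se)
p_within = sampling_dist.cdf(71) - sampling_dist.cdf(67)

print(f"SE = {se:.2f}")
print(f"Z  = {z:.2f}")
print(f"P(67 < M < 71) = {p_within:.3f}")
```

Working with the sampling distribution directly (mean μ, standard deviation SE) gives the same probability as converting to Z and using the standard normal distribution.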
21. Another Example
- Suppose a large statistics class has marks normally distributed with μ = 72 and σ = 9.
- What is the probability that an individual student drawn at random will have a mark over 80?
- Here we are comparing a single student's score to the distribution of scores.
22. Another Example
- Z = (X - μ)/σ = (80 - 72)/9 = .89
- Pr(Z > .89) = .187
- What is the probability that a random sample of 10 students will have a sample mean over 80?
- In this case, we are comparing the sample mean to all possible sample means, the sampling distribution.
23. Another Example
- Z = (M - μ)/(σ/√n) = (80 - 72)/(9/√10) = 2.81
- Pr(Z > 2.81) = .002
- This sample mean is very unlikely. This shows that taking averages tends to reduce the extremes.
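The contrast between the two Z-score formulas in this example can be sketched as follows, again assuming `statistics.NormalDist` as a stand-in for the normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 72, 9, 10

# One student: compare X with the population distribution of marks.
z_individual = (80 - mu) / sigma
p_individual = 1 - NormalDist(mu, sigma).cdf(80)

# Mean of 10 students: compare M with the sampling distribution of M.
se = sigma / sqrt(n)
z_mean = (80 - mu) / se
p_mean = 1 - NormalDist(mu, se).cdf(80)

print(f"individual:  Z = {z_individual:.2f}, P(X > 80) = {p_individual:.3f}")
print(f"sample mean: Z = {z_mean:.2f}, P(M > 80) = {p_mean:.4f}")
```

The same cutoff of 80 is less than one standard deviation out for an individual but almost three standard errors out for a mean of 10.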
24. Proportions
- We often express our data as proportions, such as the proportion of heads in a sample of 10 coin flips.
- Normal Approximation Rule for Proportions: In random samples of size n, the sample proportion P fluctuates around the population proportion π with a standard error of √(π(1 - π)/n).
25. Proportions
- We can see again that as n increases, our sample proportion gets closer to the population proportion.
- Example: A population of voters has 60% Republicans and 40% Democrats. What is the chance that a sample of 100 will produce a minority of Republicans (less than 50%)?
26. Proportion Example
- Z = (P - π)/SE = (P - π)/√(π(1 - π)/n)
- Z = (.5 - .6)/√(.6(1 - .6)/100) ≈ -.10/.05 = -2.00 (rounding the standard error of .049 to .05)
- Pr(Z < -2.00) = .023, or about 2%
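The voter example can be recomputed without rounding the standard error. The sketch below assumes `statistics.NormalDist` in place of the normal table; the slide's Z of -2.00 corresponds to rounding the standard error to .05.

```python
from math import sqrt
from statistics import NormalDist

pi_pop, n = 0.6, 100            # population proportion and sample size

se = sqrt(pi_pop * (1 - pi_pop) / n)   # standard error of the sample proportion
z = (0.5 - pi_pop) / se                # Z-score for a sample proportion of 50%
p_minority = NormalDist().cdf(z)       # P(Z < z) under the standard normal

print(f"SE = {se:.4f}")
print(f"Z  = {z:.2f}")
print(f"P(P < .5) = {p_minority:.3f}")
```

Either way, the chance of a Republican minority in the sample is about 2%.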
27. Normal Approximation to the Binomial
- Of your first 10 grandchildren, what is the chance there will be more than 7 boys?
- This is the same as asking for the chance that the proportion of boys is more than 7/10.
- We could use the binomial to solve this problem.
- Assume p(boy) = .5
28. Normal Approximation to the Binomial
- P(S > 7) = P(S = 8) + P(S = 9) + P(S = 10)
- You could calculate this or just use the cumulative binomial table on pages 670-671.
- P(S > 7) = .044 + .010 + .001 = .055
- We can also solve this problem using what we know about proportions: their sampling distribution is approximately normal.
29. Normal Approximation to the Binomial
- We want to know the probability of getting more than 7 boys. We must calculate this using P = 7/10 because we are dealing with a continuous distribution (normal), so everything between 7 and 8 must be included.
- Z = (P - π)/√(π(1 - π)/n) = (.7 - .5)/√(.5(1 - .5)/10) = 1.26
- Pr(Z > 1.26) = .104
30. Normal Approximation to the Binomial
- Obviously, this involves some error. We can correct it with the continuity correction, where we take the halfway point between 7 and 8, that is, P = 7.5/10 = .75.
- Z = (P - π)/√(π(1 - π)/n) = (.75 - .5)/√(.5(1 - .5)/10) = 1.58
- Pr(Z > 1.58) = .057
31. Normal Approximation to the Binomial
- Note that this is very close to the value calculated from the binomial distribution, .055!
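The three answers to the grandchildren question can be compared side by side. The sketch below computes the exact binomial probability with `math.comb` and the two normal approximations with `statistics.NormalDist`; both stand in for the printed tables.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi_boy = 10, 0.5

# Exact binomial: P(S > 7) = P(S = 8) + P(S = 9) + P(S = 10).
exact = sum(comb(n, s) * pi_boy**s * (1 - pi_boy)**(n - s) for s in range(8, n + 1))

se = sqrt(pi_boy * (1 - pi_boy) / n)
std_normal = NormalDist()

# Normal approximation starting at P = .70 (no continuity correction).
approx_plain = 1 - std_normal.cdf((0.70 - pi_boy) / se)

# Normal approximation starting at P = .75 (continuity correction).
approx_corrected = 1 - std_normal.cdf((0.75 - pi_boy) / se)

print(f"exact binomial:             {exact:.3f}")
print(f"normal, no correction:      {approx_plain:.3f}")
print(f"normal, with correction:    {approx_corrected:.3f}")
```

Starting the tail at .75 rather than .70 removes most of the error, which is the point of the continuity correction.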
32. Monte Carlo Simulations
- A Monte Carlo simulation is a computer program that repeats sampling and constructs a sampling distribution.
- This approach is particularly useful for providing sampling distributions that cannot easily be derived theoretically.
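As one illustration, the sampling distribution of the sample median has no formula as simple as the one for the mean, but it is easy to simulate. The sketch below is an assumption-laden example rather than anything from the slides: it reuses a normal population with mean 72 and standard deviation 9, and the helper name `sampling_distribution`, the number of repetitions, and the seed are all hypothetical.

```python
import random
import statistics

def sampling_distribution(stat, n, reps=5_000, seed=42):
    """Apply `stat` to each of `reps` simulated random samples of size n."""
    rng = random.Random(seed)
    # Assumed population: normal with mean 72 and standard deviation 9.
    return [stat([rng.gauss(72, 9) for _ in range(n)]) for _ in range(reps)]

# The same seed is used for both, so both statistics see the same samples.
medians = sampling_distribution(statistics.median, n=10)
means = sampling_distribution(statistics.fmean, n=10)

print(f"median: centre = {statistics.fmean(medians):.2f}, "
      f"SE = {statistics.pstdev(medians):.2f}")
print(f"mean:   centre = {statistics.fmean(means):.2f}, "
      f"SE = {statistics.pstdev(means):.2f}")
```

Both statistics centre near 72, and in this simulation the median fluctuates more from sample to sample than the mean does.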