Title: 45-733: lecture 7 (chapter 6)
145-733 lecture 7 (chapter 6)
2Samples from populations
- There is some population we are interested in
- Families in the US
- Products coming off our assembly line
- Consumers in our product's market segment
- Employees
3Samples from populations
- We are interested in some quantitative
information (called variables) about these
populations
- Income of families in the US
- Defects in products coming off our assembly line
- Perception of consumers of our product
- Productivity of our employees
4Samples from populations
- All the information (accessible to statistics)
about a quantity in a population is contained in
its distribution function
- Real-world distribution functions are complicated
things
- In real life, we usually know little or nothing
about the distribution functions of the variables
we are interested in
5Samples from populations
- Because distribution functions are complex, we
only try to find out about certain aspects of
them (parameters)
- Average income of families in the US
- Rate of defects coming off our production line
- Percentage of customers who view our product favorably
- Average pieces/hour finished by a worker
6Samples from populations
- Of course, we do not begin by knowing even these
quantities
- One possibility is to measure the whole
population
- Allows us to answer any question about the
distribution or parameters, using the techniques
of chapter 2
- However, this is almost always expensive and
often infeasible
7Samples from populations
- Instead, we take a sample
- Taking a sample
- We select only a few of the members of the
population
- We measure the variables of interest for those
members we select
- Examples
- Phone survey
- Take 1 out of each 10,000 units off our production line
8Samples from populations
- The whole of statistics is figuring out what we
can learn about the population from a sample
- What can we say about the distribution of a
variable from the information in a sample?
- What can we say about the parameters we are
interested in from our sample?
- How good is the information in our sample about
the population?
9Samples from populations
- Example
- We are interested in how favorably our product is
viewed by customers
- We do a phone survey of our 5 good friends and
ask them if they view our product favorably or
unfavorably
- All 5 say favorably
- What can we conclude?
10Samples from populations
- Example
- We are interested in how favorably our product is
viewed by customers
- We do a phone survey of 500 people who have
purchased our product before and ask them if they
view our product favorably or unfavorably
- 466 say they view our product favorably
- What can we conclude?
11Samples from populations
- Example
- We are interested in how favorably our product is
viewed by customers
- We do a phone survey of 500 random adults and
ask them if they view our product favorably or
unfavorably
- 351 say they view our product favorably
- What can we conclude?
12Samples and statistics
- As a practical matter, we are usually interested
in using our sample to say something about a
parameter of the distribution we care about
- To get at this parameter, we construct a variable
called an estimator or statistic
13Samples and statistics
- Example
- If we want to know the average income of families
in the US, we draw a sample from a random phone
survey of 1000 families
- We ask, among other things, for their family
income
- To estimate E(I), we calculate the estimator or
statistic called the sample mean
14Samples and statistics
- Example
- But what does the sample mean of income tell us
about E(I)?
- Answering this question is the subject of the
rest of the course, and of statistics in general
15Random sampling
- There are different ways to sample a population,
different sampling schemes
- The simplest sampling scheme is called simple
random sampling or just random sampling
- If there is a population of size N from which we
are to draw a sample of size n, random sampling
just says that the probability of any one of the
N members of the population being drawn is just
1/N, and that the draws are independent.
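As a rough illustration of this definition, simple random sampling can be sketched in Python. The population values and the helper name `simple_random_sample` are made up for this example, and the draws are made with replacement so that every draw is independent and gives each member probability 1/N.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n independent members, each with probability 1/N on every draw."""
    rng = random.Random(seed)
    # rng.choice picks uniformly; drawing with replacement keeps draws independent
    return [rng.choice(population) for _ in range(n)]

# Hypothetical population of N = 10 family incomes (in $1000s)
population = [32, 45, 51, 58, 64, 70, 77, 85, 96, 140]
sample = simple_random_sample(population, n=4, seed=1)
print(sample)
```

Because the draws are with replacement, the same member can appear more than once in a sample.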
16Statistic or estimator
- A statistic (or estimator) is any function of a
sample
- It is an algorithm which tells us what we would
do given a sample
- Example
- Sample mean
- Sample variance
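To make "a function of a sample" concrete, here is a minimal sketch of the two example statistics as Python functions. The function names and the sample values are illustrative, and the n - 1 divisor in the variance is assumed to match the course's definition.

```python
def sample_mean(xs):
    """Sample mean: the sum of the observations divided by the sample size."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the n - 1 divisor (assumed convention)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical sample of four observations
sample = [2.0, 4.0, 4.0, 6.0]
print(sample_mean(sample))
print(sample_variance(sample))
```

Each function takes a sample and returns a number, so a different sample gives a different value of the statistic.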
17Statistic as random variable
- A statistic is a random variable!!
18Statistic as random variable
- A simple example
- Consider the Bernoulli random variable X with
parameter p
- We are interested in p, the probability of a
success
- To estimate p, we will calculate the sample mean
of X
19Statistic as random variable
- A simple example
- First, with a sample size of n = 1
20Statistic as random variable
- A simple example
- Next, with a sample size of n = 2
21Statistic as random variable
- A simple example
- Next, with a sample size of n = 3
22Statistic as random variable
- The statistic is a random variable
- It has a distribution
- Probability function or density
- Cumulative distribution function
- It has an expectation
- It has a variance / standard deviation
23Statistic as random variable
- For the Bernoulli example
- Expectation, variance with n = 1
24Statistic as random variable
- For the Bernoulli example
- Expectation, variance with n = 2
25Statistic as random variable
- For the Bernoulli example
- Expectation, variance with n = 3
26Statistic as random variable
- For the Bernoulli example
- Probability function, n = 1
[Figure: probability function of the sample mean for n = 1, with probability 1-p at 0 and p at 1]
27Statistic as random variable
- For the Bernoulli example
- Probability function, n = 2
[Figure: probability function of the sample mean for n = 2, with possible values 0, 1/2, and 1]
28Statistic as random variable
- For the Bernoulli example
- Probability function, n = 3
[Figure: probability function of the sample mean for n = 3, with possible values 0, 1/3, 2/3, and 1]
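The Bernoulli example can be worked out exactly. The sketch below enumerates the probability function of the sample mean for n = 1, 2, 3 and computes its expectation and variance. The value p = 3/10 and the helper name `sample_mean_distribution` are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def sample_mean_distribution(p, n):
    """Exact probability function of the mean of n independent Bernoulli(p) draws."""
    # k successes out of n gives a sample mean of k/n, with binomial probability
    return {Fraction(k, n): comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(n + 1)}

p = Fraction(3, 10)  # hypothetical value of p
for n in (1, 2, 3):
    dist = sample_mean_distribution(p, n)
    mean = sum(x * pr for x, pr in dist.items())
    var = sum((x - mean) ** 2 * pr for x, pr in dist.items())
    print(n, [str(x) for x in sorted(dist)], mean, var)
```

For every n the expectation comes out equal to p, while the variance equals p(1-p)/n and so shrinks as n grows.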
29Sample mean
- As we have discussed before, the sample mean of a
random variable X from a sample of size n is
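A minimal statement of the usual definition, writing X_1, ..., X_n for the n sampled values (this notation is an assumption, not taken from the slide):

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}
```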
30Sample mean
- The sample mean is a random variable!!
- Sample mean is made out of n random variables
therefore, it is a random variable
31Sample mean
- Let's suppose X is a random variable with mean μ_X
and standard deviation σ_X, and let's consider the
sample mean
32Sample mean
- Since the sample mean is a random variable, we
can ask about its expectation
33Sample mean
- Since the sample mean is a random variable, we
can ask about its expectation
34Sample mean
- The expectation of the sample mean is equal to
the expectation of the underlying random variable
- On average, the sample mean is equal to the mean
of the underlying random variable
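The step behind this result is linearity of expectation. A short derivation, assuming each X_i in the sample has the same mean as X (written here as \mu_X):

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i) \\
           &= \frac{1}{n}\, n\, \mu_X = \mu_X
\end{aligned}
```

Note that this step does not require the draws to be independent.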
35Sample mean
- We can also ask about the variance of the sample
mean
36Sample mean
- If it is an independent, random sample then the
covariances are all zero
37Sample mean
- The variance of the sample mean is less than the
variance of the underlying random variable
- The variance of the sample mean gets smaller as
the sample size increases
- The variance of the sample mean goes to zero as
the sample size goes to infinity
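These statements follow from a short calculation. The sketch below assumes each X_i has variance \sigma_X^2 and that the sample is independent, so every covariance term vanishes:

```latex
\begin{aligned}
\mathrm{Var}(\bar{X})
  &= \frac{1}{n^2}\,\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\left[\sum_{i=1}^{n}\mathrm{Var}(X_i)
     + \sum_{i \neq j}\mathrm{Cov}(X_i, X_j)\right] \\
  &= \frac{1}{n^2}\, n\, \sigma_X^2 = \frac{\sigma_X^2}{n},
  \qquad
  \mathrm{SD}(\bar{X}) = \frac{\sigma_X}{\sqrt{n}}
\end{aligned}
```

For n > 1 this is smaller than \sigma_X^2, and it tends to zero as n grows.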
38Sample mean
39Sample mean
- Say that
- On average, the sample mean is equal to the mean
of the underlying random variable, regardless of
sample size
- As the sample size grows, the variance of the
sample mean shrinks, eventually approaching zero
40Sample mean
- What would happen if the sample size got to
infinity?
- Then the sample mean would no longer be a random
variable, it would literally equal the population
mean, E(X)
41Sample mean
[Figure: distribution of the sample mean for n = 1 and n = 100]
42Sample mean
[Figure: distribution of the sample mean for n = 1, n = 100, and n = 1000]
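The shrinking spread of the sample mean as n grows can be checked with a small simulation. This is only a sketch: the Uniform(0, 1) population, the number of repetitions, and the helper name `simulate_sample_means` are assumptions, not the distribution used for the slides.

```python
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Return `reps` sample means, each from n independent Uniform(0, 1) draws."""
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)
variances = {}
for n in (1, 100, 1000):
    means = simulate_sample_means(n, reps=1000, rng=rng)
    variances[n] = statistics.variance(means)
    # Theory: Var(X-bar) = (1/12) / n for a Uniform(0, 1) population
    print(n, round(statistics.fmean(means), 3), variances[n], (1 / 12) / n)
```

The simulated means stay centered near 0.5 for every n, while their variance falls roughly in proportion to 1/n.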
43Sample mean
- Finite sample correction
- What has gone before has assumed either that you
sample with replacement or that the population
you are sampling from is very large (infinite)
- Just as we needed to use the hypergeometric rather
than the binomial when sampling from a small population
without replacement, so here we need a correction
44Sample mean
- Finite sample correction
- For a population of size N, sampled without
replacement by a sample of size n
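The standard form of this correction multiplies the usual variance by (N - n)/(N - 1). The sketch below checks that formula exactly on a tiny made-up population by enumerating every possible sample. The population values and the function name are hypothetical, and sigma^2 here is the population variance with divisor N.

```python
from fractions import Fraction
from itertools import combinations

def var_of_mean_without_replacement(sigma2, n, N):
    """Var(X-bar) = (sigma^2 / n) * (N - n) / (N - 1), sampling without replacement."""
    return Fraction(sigma2) / n * Fraction(N - n, N - 1)

# Hypothetical population, small enough to list every possible sample
population = [1, 2, 3, 4, 5]
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance (divisor N)

# Every size-n subset is equally likely when sampling without replacement
means = [Fraction(sum(s), n) for s in combinations(population, n)]
exact = sum((m - mu) ** 2 for m in means) / len(means)

print(exact, var_of_mean_without_replacement(sigma2, n, N))
```

When N is much larger than n the factor (N - n)/(N - 1) is close to 1, which is why the correction can be ignored for very large populations.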
45Sample mean
- Normal variables and X-bar
- If X is normal, then so is X-bar
- If X is normal, then
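Combining normality with the expectation and variance results from the earlier slides gives the usual statement, written here under the assumption of an independent random sample of size n:

```latex
X \sim N\!\left(\mu_X, \sigma_X^2\right)
\;\Longrightarrow\;
\bar{X} \sim N\!\left(\mu_X, \frac{\sigma_X^2}{n}\right)
```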
46Sample mean
- Central limit theorem and X-bar
- As long as X comes from an independent random
sample
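A rough way to see the central limit theorem at work is to standardize sample means from a clearly non-normal population and check how often they fall within +/- 1.96. The Exponential(1) population, the sample size of 50, and the helper name `standardized_mean` are assumptions chosen for this sketch.

```python
import random
import statistics

def standardized_mean(n, rng):
    """Standardized mean of n independent Exponential(1) draws (mean 1, sd 1)."""
    xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    return (xbar - 1.0) / (1.0 / n ** 0.5)

rng = random.Random(0)
z = [standardized_mean(50, rng) for _ in range(5000)]
# If X-bar is approximately normal, about 95% of these lie within +/- 1.96
share = sum(abs(v) <= 1.96 for v in z) / len(z)
print(round(share, 3))
```

Even though the exponential distribution is strongly skewed, the share should come out close to 0.95 at this sample size.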
47Sample proportion
- Consider W a Bernoulli and an independent random
sample of size n
- Observe that X = W1 + W2 + ... + Wn is distributed
Binomial (and therefore approx normal)
48Sample proportion
- The sample mean (i.e., the sample proportion) is
- Just a binomial divided by n
- Also approx normal
49Sample proportion
- To emphasize that we are estimating the p
parameter of the Bernoulli, we may write
50Sample proportion
- Just as before, the sample mean has the same
expectation as the underlying Bernoulli random
variable
51Sample proportion
- Just as before, the sample mean has the variance
of the underlying Bernoulli random variable over
n
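As a worked instance, the sketch below computes the sample proportion and an estimated standard error for the random-adult survey from the earlier example (351 favorable out of 500). Replacing the unknown p with the sample proportion inside sqrt(p(1-p)/n) is a plug-in assumption, and the function names are illustrative.

```python
from math import sqrt

def sample_proportion(successes, n):
    """Sample mean of n Bernoulli observations: the share of successes."""
    return successes / n

def estimated_standard_error(p_hat, n):
    # sqrt(p(1-p)/n) with the unknown p replaced by p_hat (plug-in assumption)
    return sqrt(p_hat * (1 - p_hat) / n)

p_hat = sample_proportion(351, 500)
se = estimated_standard_error(p_hat, 500)
print(p_hat, round(se, 4))
```

The standard error is the square root of the variance on this slide, so quadrupling n would roughly halve it.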
52Sample proportion
- Just as before, if there is a finite population
sampled without replacement
53Sample variance
- As we have discussed before, the sample variance
and sample standard deviation are given by
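The usual definitions, assuming the course uses the n - 1 divisor (consistent with the chi-squared result used later in the lecture):

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2,
\qquad
s = \sqrt{s^2}
```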
54Sample variance
- Sometimes these are written
55Sample variance
56Sample variance
57Sample variance
- It turns out that, if X is distributed normal
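The standard result for an independent random sample of size n from a normal population with variance \sigma_X^2 is stated below; it is assumed to be the one this slide refers to.

```latex
\frac{(n-1)\,s^2}{\sigma_X^2} \sim \chi^2_{n-1}
```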
58Sample variance
- It turns out that (by the CLT), if X is from an
independent random sample
59Sample variance
- Discuss Chi-Squared distribution
60Sample variance
- Example (problem 46, page 251)
- A drug company manufactures pills
- These pills have normally distributed weight
- The drug company wants the variance of weight to be
smaller than 1.5 milligrams squared
- The drug company collects a sample of size 20
- The sample variance is 2.05
- How likely is it that a sample variance this
high or higher would be found if the true
variance is 1.5?
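One way to answer this, assuming the chi-squared result for normal samples applies, is to compute (n - 1)s^2/sigma^2 and find the upper-tail probability of a chi-squared variable with n - 1 = 19 degrees of freedom. The sketch below avoids external libraries by integrating the density numerically with Simpson's rule. The function names and step count are choices made for this example, and the integration is only meant for df >= 2.

```python
from math import exp, gamma

def chi2_pdf(x, df):
    """Chi-squared density with df degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

def chi2_upper_tail(x, df, steps=20000):
    """P(chi2_df >= x), via Simpson's rule on the density over [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1.0 - total * h / 3

n, s2, sigma2 = 20, 2.05, 1.5
statistic = (n - 1) * s2 / sigma2  # 19 * 2.05 / 1.5
p_value = chi2_upper_tail(statistic, n - 1)
print(round(statistic, 3), round(p_value, 3))
```

The printed probability is the chance of seeing a sample variance of 2.05 or higher when the true variance is 1.5. A statistics library or a chi-squared table could be used in place of the numerical integration.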