Title: 730 Lecture 20
Today's lecture
Computer Intensive Methods
Sampling distributions
Population distribution $F$
Sampling distribution $F_S$
The basic idea (iid case)
$F \rightarrow S \rightarrow F_S$
Population distribution $F$ + statistic $S$ = sampling distribution $F_S$
Examples
Sampling distributions: the alternatives
- Derived from theory (if we can), or
- Can use simulation! E.g.
  - Simulate $X_1, \ldots, X_n$ from Expo($\theta$)
  - Compute $s_1$ = sample mean
  - Repeat $N = 10{,}000$ times
  - Get a sample $s_1, \ldots, s_N$ from the sampling distribution
  - Display graphically, calculate std dev, etc.
R code
    n <- 10
    N <- 10000
    theta <- 5
    s <- numeric(N)
    for (i in 1:N) {
      # generate a sample from the population distribution (exponential, mean theta)
      x <- rgamma(n, 1) * theta
      # calculate the statistic
      s[i] <- mean(x)
    }
    sqrt(var(s))
    # [1] 1.560461   (correct value is 5/sqrt(10) = 1.5811)
R code: graphs
    par(mfrow = c(1, 2))
    hist(s, breaks = 50)
    # the mean of n Expo(theta) values is Gamma(shape n, scale theta/n)
    alphas <- ((1:N) - 0.5) / N
    gamma.quantiles <- theta * qgamma(alphas, n) / n
    plot(sort(s), gamma.quantiles, xlab = "order statistics")
    abline(0, 1)
Graphs
The big problem
- In practice we don't know $F$!
- What can we do?
- Estimate $F$!
- How?
Estimating F
- Two methods:
  - Non-parametric: use the empirical distribution function (EDF)
  - Parametric: assume the form of $F$ is known, but $F$ depends on unknown parameters
Method 1: the non-parametric method
- Estimate $F$ by the EDF $F_n(x)$
- $F_n(x)$ = proportion of the sample that is $\le x$
- $\max_x |F_n(x) - F(x)| \to 0$ in probability
- $\sqrt{n}\,(F_n(x) - F(x)) \to N(0,\ F(x)(1 - F(x)))$ in distribution
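A minimal R sketch of the first property, assuming a N(0,1) population so that $F$ is `pnorm` and known exactly. The helper `sup.distance` is a hypothetical name introduced only for this illustration; it computes $\max_x |F_n(x) - F(x)|$, which is the one-sample Kolmogorov-Smirnov statistic, so it can be cross-checked against `ks.test`. The seed is arbitrary.

```r
# sup.distance() is a hypothetical helper: max_x |F_n(x) - F(x)|
# for a sample x and a fully specified continuous df `cdf`.
sup.distance <- function(x, cdf = pnorm) {
  n <- length(x)
  Fx <- cdf(sort(x))
  # F_n jumps from (i-1)/n to i/n at the i-th order statistic, so the
  # largest gap occurs just before or at one of the data points
  max(pmax((1:n) / n - Fx, Fx - (0:(n - 1)) / n))
}

set.seed(1)
x.small <- rnorm(10)
x.large <- rnorm(10000)
d.small <- sup.distance(x.small)
d.large <- sup.distance(x.large)
c(d.small, d.large)   # the distance shrinks as n grows

# cross-check against the Kolmogorov-Smirnov statistic
unname(ks.test(x.large, "pnorm")$statistic)
```

Because $F_n$ is a step function and $F$ is continuous and increasing, the maximum only needs to be checked at the order statistics, so the cost is dominated by the sort.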
Empirical distribution function (cont)
The EDF jumps up by $1/n$ at each data value, e.g. for $n = 3$ each jump has height $1/3$.
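The jumps can be checked numerically with base R's `ecdf()`, which returns the EDF as a step function. The three data values here are made up purely for illustration.

```r
x <- c(2.1, 0.7, 1.5)     # an illustrative sample with n = 3
Fn <- ecdf(x)             # base R's empirical distribution function
Fn(c(0, sort(x)))         # 0, 1/3, 2/3, 1: a jump of 1/3 at each data value
Fn(1.0)                   # between data values F_n is flat: 1/3
mean(x <= 1.0)            # same value, straight from the definition
```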
EDF example
EDF of a N(0,1) sample of 50
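A sketch that draws this kind of picture in R. The seed is an arbitrary assumption, so the sample will not match the one on the slide; the true N(0,1) df is overlaid for comparison.

```r
set.seed(730)                        # arbitrary seed, for reproducibility only
x <- rnorm(50)                       # a N(0,1) sample of size 50
Fn <- ecdf(x)
plot(Fn, main = "EDF of a N(0,1) sample of 50", xlab = "x", ylab = "F(x)")
curve(pnorm(x), add = TRUE, col = "red")   # true df for comparison
```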
Sampling from the EDF
- The EDF of a sample $x_1, \ldots, x_n$ is the df of a discrete distribution that has probability mass $1/n$ at each data point of the sample.
- Thus, to draw a sample of size $N$ from this distribution, we draw a random sample of size $N$ with replacement from $x_1, \ldots, x_n$.
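In R, resampling with replacement is a single call to `sample()`. A sketch with made-up data, checking that each data value is drawn with frequency close to $1/n$:

```r
set.seed(2)
x <- c(4.2, 0.3, 7.9, 1.1, 2.6)   # illustrative data, n = 5
N <- 100000
# drawing from the EDF = resampling the data with replacement
draws <- sample(x, N, replace = TRUE)
table(draws) / N                  # each value appears with frequency close to 1/5
```

One R-specific caution: if `x` has length 1 and is a number $\ge 1$, `sample(x, ...)` samples from `1:x` instead; `x[sample.int(length(x), N, replace = TRUE)]` avoids that edge case.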
Method 2: the parametric method
- If we assume that the df of the population is $F(x, \theta)$, where $F$ is known but $\theta$ is not, estimate $F$ by $F(x, \hat\theta)$, where $\hat\theta$ is an estimate of $\theta$.
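A sketch of the parametric route for the exponential model used in this lecture, assuming $F(x, \theta) = 1 - e^{-x/\theta}$ (mean $\theta$) and estimating $\theta$ by the sample mean, which is the MLE for this model. The fitted df is then `pexp` with rate $1/\hat\theta$, and drawing from it uses `rexp`. The names `theta.hat` and `F.hat` are illustrative.

```r
set.seed(3)
theta <- 5
x <- rexp(10, rate = 1 / theta)    # observed sample; theta is treated as unknown
theta.hat <- mean(x)               # estimate of theta (the MLE for the exponential)
F.hat <- function(q) pexp(q, rate = 1 / theta.hat)   # estimated df F(x, theta.hat)
F.hat(theta.hat)                   # 1 - exp(-1), about 0.632, for any theta.hat
# a sample of size n from the estimated df
y <- rexp(length(x), rate = 1 / theta.hat)
```

This draws from the same distribution as `rgamma(n, 1) * theta.hat`, the form used in the R code later in the lecture; `rexp` with `rate = 1/theta.hat` is just a more explicit way to write it.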
The bootstrap
- To estimate the standard error of a statistic $S$:
  - Estimate the population df.
  - Draw a random sample of size $n$ from the estimated $F$ and calculate $S$ from the sample.
  - Repeat $N$ times, to get $s_1, \ldots, s_N$.
  - Calculate the std dev of the $N$ values $s_1, \ldots, s_N$.
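The four steps translate almost line for line into R. This is a sketch, not a library function: `boot.se`, `rF.edf` and `rF.expo` are hypothetical names, and the exponential model in `rF.expo` is an assumption chosen to match the example in this lecture. The estimated $F$ enters only through a function that draws samples of size $n$, so the same code covers both the non-parametric and the parametric bootstrap.

```r
# boot.se() is a hypothetical helper implementing the bootstrap steps.
#   x      : the observed sample
#   stat   : function computing the statistic S from a sample
#   rF.hat : function(n) drawing a sample of size n from the estimated df
#   N      : number of bootstrap replications
boot.se <- function(x, stat, rF.hat, N = 1000) {
  n <- length(x)
  s <- numeric(N)
  for (i in 1:N) {
    s[i] <- stat(rF.hat(n))    # draw from the estimated F, compute S
  }
  sd(s)                        # std dev of s_1, ..., s_N
}

# non-parametric version: the estimated F is the EDF
rF.edf <- function(x) {
  force(x)
  function(n) x[sample.int(length(x), n, replace = TRUE)]
}
# parametric version, assuming an exponential model with estimated mean
rF.expo <- function(x) {
  theta.hat <- mean(x)
  function(n) rexp(n, rate = 1 / theta.hat)
}

set.seed(4)
x <- rexp(10, rate = 1 / 5)
se.np <- boot.se(x, mean, rF.edf(x), N = 5000)
se.p <- boot.se(x, mean, rF.expo(x), N = 5000)
c(se.np, se.p)
```

For the sample mean both answers can be checked exactly: the non-parametric bootstrap targets $\hat\sigma/\sqrt{n}$ with $\hat\sigma^2 = \frac{1}{n}\sum (x_i - \bar x)^2$, and the exponential parametric bootstrap targets $\bar x/\sqrt{n}$.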
The bootstrap (cont)
Example
- Suppose we want to estimate the standard error of the sample variance. The population distribution is exponential and $n = 10$.
R code
    n <- 10
    N <- 1000
    theta <- 5    # theta is the true value
    # generate a sample
    x <- rgamma(n, 1) * theta
    # now do non-parametric bootstrap (use EDF)
    s <- numeric(N)
    for (i in 1:N) {
      bootstrap.sample <- sample(x, n, replace = TRUE)
      s[i] <- var(bootstrap.sample)
    }
    sqrt(var(s))
    # [1] 22.19461
R code (cont)
    # now do parametric bootstrap (use exponential with estimated mean)
    xbar <- mean(x)
    s <- numeric(N)
    for (i in 1:N) {
      bootstrap.sample <- rgamma(n, 1) * xbar
      s[i] <- var(bootstrap.sample)
    }
    sqrt(var(s))
    # [1] 35.26925
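Theory
Using tedious algebra, one can show that for an iid sample of size $n$ with central moments $\mu_k = E[(X - \mu)^k]$, the sample variance $s^2$ satisfies
$$\mathrm{Var}(s^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\,\mu_2^2\right).$$
For the exponential, $\mu_4 = 9\theta^4$ and $\mu_2 = \theta^2$. Thus
$$\mathrm{Var}(s^2) = \frac{\theta^4}{n}\left(9 - \frac{n-3}{n-1}\right) = \frac{\theta^4(8n-6)}{n(n-1)}.$$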
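The formula is easy to evaluate, and a brute-force simulation gives an independent check. In this sketch `var.s2` is a hypothetical helper name, and the Monte Carlo settings (seed, 100,000 replications) are arbitrary choices; all samples are stored in one matrix so that the sample variances can be computed without a loop.

```r
# var.s2() is a hypothetical helper: exact Var(s^2) for an iid sample of size n
# from an exponential with mean theta, using (mu4 - (n-3)/(n-1) * mu2^2) / n
var.s2 <- function(theta, n) {
  mu2 <- theta^2
  mu4 <- 9 * theta^4
  (mu4 - (n - 3) / (n - 1) * mu2^2) / n
}
sqrt(var.s2(5, 10))   # 25 * sqrt(74/90), about 22.669

# Monte Carlo check: many sample variances from samples of size 10
set.seed(6)
n <- 10
reps <- 100000
m <- matrix(rexp(n * reps, rate = 1 / 5), nrow = reps)   # one sample per row
s2 <- rowSums((m - rowMeans(m))^2) / (n - 1)             # sample variance of each row
sd(s2)                 # should be close to sqrt(var.s2(5, 10))
```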
Results
- For $\theta = 5$, $n = 10$, the exact standard error is $\theta^2\sqrt{74/90} = 25\sqrt{74/90} = 22.66912$
- The nonparametric bootstrap did very well (22.19461)
- The parametric bootstrap was not very good (35.26925)
- Any ideas why?