Title: Biostatistics and Computer Applications
1. Biostatistics and Computer Applications
Sampling mean; difference between means; Student's t distribution; chi-square distribution; F distribution; SAS programming
12/30/02
2. Recap (Probability and variable distribution)
- Event and probability: $P(E) \approx n/N$, where the event occurs n times in N trials and N is large
- Additive rule and multiplication rule
- Binomial distribution
- Poisson distribution
- Normal distribution
3. Binomial Distribution
Poisson Distribution
4. Normal Distribution
5. Standard Normal Distribution
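In general, 68.27% of the data lie within $\mu \pm 1\sigma$, 95.45% within $\mu \pm 2\sigma$, and 99.73% within $\mu \pm 3\sigma$. Equivalently, 95% of the data lie within $\mu \pm 1.96\sigma$ and 99% within $\mu \pm 2.58\sigma$.
[Figure: standard normal curve with an area of 0.025 in each tail beyond $\pm 1.96$]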
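The coverage figures above can be checked numerically. The following is a minimal sketch in Python (used here only for illustration), relying on the standard library's `statistics.NormalDist`; the helper name `coverage` is an assumption introduced for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3, 1.96, 2.58):
    print(f"within {k} sigma: {coverage(k):.4f}")

# Area in one tail beyond 1.96 (about 0.025)
print(f"upper tail beyond 1.96: {1 - std_normal.cdf(1.96):.4f}")
```

Because the normal curve is symmetric, the two tails beyond 1.96 standard deviations together hold about 5% of the area.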
6. Distribution of the Sample Mean
Parameters are constant; statistics are variable. This is a critically important concept!
[Figure: a population $N(\mu, \sigma^2)$ from which repeated samples of size n are drawn. What is the distribution of the sample means, obtained either by mathematical derivation or by sampling?]
7. Random Samples
- Infer the characteristics of a population from a representative sample
- The usual strategy is to draw a random sample
- Random sample
  - Each object has an equal probability of being selected (for example, by random number)
- Two methods
  - Exhaustive sampling (all $N^n$ possible samples) from a small population (lecture)
  - A large number of samples from an infinite population (SAS program)
8. Sampling Distributions
- Definition
  - The distribution of all possible values that can be assumed by some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic.
- Statistic: mean, difference between two means, variance, etc.
9. Sampling Distributions
- Construction
  - From a finite population of size N, randomly draw all possible samples of size n (sampling with replacement, the total number of samples is $N^n$).
  - Compute the statistic of interest for each sample (mean, standard deviation).
  - Investigate the distribution of that statistic and relate it to the population parameter.
10. Sampling Distribution of mean
- Example: an approximately normally distributed population with N = 4, $X_i = 2, 3, 3, 4$
- n = 2, independent random sampling
- The total number of samples is $N^n = 4^2 = 16$.
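The exhaustive sampling in this example can be sketched in a few lines. This is an illustrative Python sketch (not part of the original lecture), assuming ordered samples drawn with replacement, which is what the count of $4^2 = 16$ implies; the variable names are chosen for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 3, 3, 4]   # N = 4, from the example above
n = 2                       # sample size

# All N**n ordered samples drawn with replacement
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Frequency distribution of the sample means
frequency = Counter(sample_means)
for mean in sorted(frequency):
    print(f"mean = {float(mean):.1f}  frequency = {frequency[mean]}")

print("number of samples:", len(samples))
```

Changing `n` to 4 enumerates the 256 samples of size 4 in the same way.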
11. Sampling Distribution of mean
Mean of sampling statistics
Unbiased estimate
12. Sampling Distribution of mean
Frequency distribution of sample means
If we draw samples of size n = 4, the total number of samples is $4^4 = 256$.
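13. Distribution of Sample Mean
- 1. The sampling distribution of the mean has the same mean as the original population:
  $\mu_{\bar{x}} = \mu$
- 2. The variance of the sampling distribution of the mean is the population variance divided by the sample size:
  $\sigma_{\bar{x}}^2 = \sigma^2 / n$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
  - $\sigma_{\bar{x}}$ is called the standard error of the mean. It measures the sampling error of $\bar{x}$ and decreases as n increases.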
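As a worked check of properties 1 and 2, take the earlier example population (2, 3, 3, 4) with n = 2. The calculation below treats those four values as the whole population, so the population variance divides by N; that choice is an assumption of this sketch.

```latex
\begin{aligned}
\mu &= \frac{2 + 3 + 3 + 4}{4} = 3, &
\sigma^2 &= \frac{(2-3)^2 + (3-3)^2 + (3-3)^2 + (4-3)^2}{4} = 0.5, \\
\mu_{\bar{x}} &= \mu = 3, &
\sigma_{\bar{x}}^2 &= \frac{\sigma^2}{n} = \frac{0.5}{2} = 0.25, \qquad
\sigma_{\bar{x}} = \sqrt{0.25} = 0.5 .
\end{aligned}
```

Enumerating all 16 samples gives sample means 2, 2.5, 3, 3.5 and 4 with frequencies 1, 4, 6, 4 and 1. Their mean is 3 and their variance is 0.25, in agreement with both properties.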
14. Distribution of Sample Mean
- 3. If $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2 / n)$.
- 4. Given a population of unknown or nonnormal distribution with mean $\mu$ and variance $\sigma^2$, the sampling distribution of $\bar{x}$ will be approximately normal with mean $\mu$ and variance $\sigma^2 / n$ when n is large (Central Limit Theorem).
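The Central Limit Theorem can be illustrated by simulation. The Python sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and variance 1 (a clearly nonnormal choice made for this example), and the seed, replicate count and function name are arbitrary.

```python
import random
import statistics

random.seed(2002)  # fixed seed so the run is repeatable

def sample_means(n: int, replicates: int = 5000) -> list:
    """Means of `replicates` samples of size n from an exponential population (mean 1, variance 1)."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]

for n in (2, 30):
    means = sample_means(n)
    print(
        f"n = {n:2d}: mean of sample means = {statistics.fmean(means):.3f}, "
        f"variance = {statistics.pvariance(means):.4f}, expected sigma^2/n = {1 / n:.4f}"
    )
```

The mean of the sample means stays near the population mean and their variance stays near $\sigma^2 / n$; a histogram of `means` for n = 30 would look close to a normal curve even though the population is strongly skewed.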
15. Implications of Central Limit Theorem
- We are assured of an approximately normal sampling distribution for:
  - a normally distributed population
  - a nonnormal population with large sample size
  - a population of unknown functional form with large sample size
- How large is large for sample size?
  - At least 30, but as many as practical (20 may be acceptable if the population is not strongly skewed)
16. Application of sample distribution
- Example 1. Suppose we have a normal population with mean 12.5 years and variance 0.78 year². If we draw a random sample with n = 10, what is the probability that the sample mean is larger than 14.7 years?
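Example 1 can be worked through by standardizing the sample mean with the standard error $\sigma / \sqrt{n}$. The Python sketch below is an illustration using the values stated in the example; the upper-tail probability is computed with `math.erfc` because the tail is extremely small.

```python
import math

mu = 12.5          # population mean (years)
variance = 0.78    # population variance (years squared)
n = 10             # sample size
threshold = 14.7   # sample-mean value of interest (years)

standard_error = math.sqrt(variance / n)
z = (threshold - mu) / standard_error

# Upper-tail probability P(Z > z); erfc keeps precision for very small tails
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.3f}")
print(f"P(sample mean > {threshold}) = {p_upper:.3e}")
```

With these numbers the threshold lies almost 8 standard errors above the population mean, so the probability is practically zero.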
17. Application of sample distribution
- Example 2. Cotton fiber length (mm). What is the probability that, for a sample with n = 4, the difference between the sample mean and the population mean is larger than 0.5 mm? What if n = 25?
18. Distribution of the Sample Mean
Another example: suppose that for Oklahoma sixth grade students the mean number of missed school days is 5.4 days with a standard deviation of 2.8 days. What is the probability that a random sample of size 49 will have a mean number of missed days greater than 6 days?
$\mu = 5.4$ days, $\sigma = 2.8$ days, n = 49
19. Distribution of the Sample Mean
[Figure: sampling distribution of $\bar{x}$ (for samples of size 49) overlaid on the population distribution of individual X values; the shaded area to the right of 6 days is 0.0668]
What is the probability that a random sample (size 49) from this population has a mean between 4 and 6 days? Check that $P(4 < \bar{x} < 6) \approx 0.9330$.
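Both probabilities in the missed-school-days example follow from treating the sample mean as normal with mean 5.4 and standard error 2.8 / √49 = 0.4. A minimal Python sketch of that calculation (illustrative only, using the standard library's `NormalDist`):

```python
import math
from statistics import NormalDist

mu, sigma, n = 5.4, 2.8, 49

# Sampling distribution of the mean: normal with mean mu and standard error sigma / sqrt(n)
x_bar = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

p_greater_than_6 = 1 - x_bar.cdf(6)
p_between_4_and_6 = x_bar.cdf(6) - x_bar.cdf(4)

print(f"standard error = {x_bar.stdev:.2f}")
print(f"P(mean > 6) = {p_greater_than_6:.4f}")
print(f"P(4 < mean < 6) = {p_between_4_and_6:.4f}")
```

The value 6 is 1.5 standard errors above the mean and the value 4 is 3.5 standard errors below it, which is why nearly all of the remaining probability falls between them.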
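20. Sampling Distributions of $\bar{x}_1 - \bar{x}_2$
n = 2
21. Sampling Distributions of $\bar{x}_1 - \bar{x}_2$
22. Distribution of $\bar{x}_1 - \bar{x}_2$
- 1. $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
- 2. $\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \sigma_1^2 / n_1 + \sigma_2^2 / n_2$ and $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}$
  - $\sigma_{\bar{x}_1 - \bar{x}_2}$ is called the standard error of the difference between two means.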
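The behaviour of the difference between two sample means can be checked by simulation. The Python sketch below assumes the standard results that the difference has mean $\mu_1 - \mu_2$ and variance $\sigma_1^2 / n_1 + \sigma_2^2 / n_2$ for independent samples; the population values, sample sizes, seed and replicate count are illustrative choices, not values from the lecture.

```python
import random
import statistics

random.seed(1230)  # fixed seed so the run is repeatable

mu1, sigma1, n1 = 100.0, 10.0, 20   # first population and sample size (illustrative)
mu2, sigma2, n2 = 200.0, 10.0, 20   # second population and sample size (illustrative)
replicates = 2000

differences = []
for _ in range(replicates):
    mean1 = statistics.fmean(random.gauss(mu1, sigma1) for _ in range(n1))
    mean2 = statistics.fmean(random.gauss(mu2, sigma2) for _ in range(n2))
    differences.append(mean1 - mean2)

expected_mean = mu1 - mu2
expected_variance = sigma1**2 / n1 + sigma2**2 / n2

print(f"mean of differences     = {statistics.fmean(differences):.2f} (expected {expected_mean:.2f})")
print(f"variance of differences = {statistics.pvariance(differences):.2f} (expected {expected_variance:.2f})")
```

The square root of the simulated variance is the empirical counterpart of the standard error of the difference between two means.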
23. Distribution of $\bar{x}_1 - \bar{x}_2$
24. Summary
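25. Student's t-distribution
What happens if $\sigma$ is unknown? $z = (\bar{x} - \mu) / (\sigma / \sqrt{n})$ is normally distributed; $t = (\bar{x} - \mu) / (s / \sqrt{n})$ is not (quite)!
W.S. Gosset worked for the Guinness brewery in Dublin, Ireland. He was forced to publish under the pseudonym "Student". In 1908 he derived the distribution of $t = (\bar{x} - \mu) / (s / \sqrt{n})$, which is now known as Student's t-distribution.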
26. Normal distribution and t distributions
When $\sigma$ is unknown we replace it with its estimate, s. The statistic $t = (\bar{x} - \mu) / (s / \sqrt{n})$ has a t-distribution with n-1 degrees of freedom.
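The effect of replacing the population standard deviation by the sample estimate s can be seen in a small simulation. The Python sketch below is illustrative only: it assumes a standard normal population, and the helper names, seed and replicate count are choices made for this example. It counts how often the statistic $(\bar{x} - \mu) / (s / \sqrt{n})$ falls beyond the normal cutoff of 1.96.

```python
import math
import random
import statistics

random.seed(1908)  # fixed seed so the run is repeatable

def t_statistic(n: int, mu: float = 0.0, sigma: float = 1.0) -> float:
    """t = (x_bar - mu) / (s / sqrt(n)) for one normal sample of size n."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
    return (x_bar - mu) / (s / math.sqrt(n))

def tail_fraction(n: int, cutoff: float = 1.96, replicates: int = 10000) -> float:
    """Fraction of simulated t statistics with |t| beyond the normal 5% cutoff."""
    return sum(abs(t_statistic(n)) > cutoff for _ in range(replicates)) / replicates

for n in (5, 30):
    print(f"n = {n:2d}: P(|t| > 1.96) is about {tail_fraction(n):.3f} (normal gives 0.050)")
```

For small n the tails are noticeably heavier than the normal curve, and the excess shrinks as the degrees of freedom grow.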
27. Student's t Distribution
- Cumulative function F(t)
- Right-tail probability
- Two-tailed probability (the t distribution is symmetric)
- Student's t-distribution table (two-sided; selected t values for certain probabilities, e.g. 0.05 and 0.01)
28. Student's t distribution table
29. Distribution of sample variance
- Same method as for the sample mean
- N = 4, $X_i = 2, 3, 3, 4$; sampling with n = 2 and n = 4
30. Distribution of sample variance
- Right-skewed distribution (the peak lies to the left), neither a z nor a t distribution
- Central Limit Theorem: for a normally distributed population, when n > 100, the distribution of the sample standard deviation s is approximately normal with mean $\sigma$ and standard deviation $\sigma_s$
- Standard error of the sample standard deviation: $\sigma_s \approx \sigma / \sqrt{2n}$
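31. Chi-square distribution
- Definition (population): $\chi^2 = \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right)^2$, with df = n
- Sample: $\chi^2 = \frac{(n-1) s^2}{\sigma^2}$, with df = n-1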
32. Chi-square distribution
- Probability density function
- Cumulative function
- Right-side (tail) probability
33. Properties of the Chi-Square Distribution
- 1. The distribution is not symmetric (as the df increases, the distribution becomes more symmetric).
- 2. All values are > 0 (positive).
- 3. The distribution is different for each df (df = n-1). As df increases (df > 30), the distribution approaches a normal distribution.
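These properties can be observed by simulating the sample statistic $(n-1) s^2 / \sigma^2$, which is assumed here to follow a chi-square distribution with n-1 degrees of freedom for samples from a normal population. The Python sketch below is illustrative; the function name, seed, sample sizes and replicate count are choices made for this example.

```python
import random
import statistics

random.seed(33)  # fixed seed so the run is repeatable

def chi_square_values(n: int, sigma: float = 1.0, replicates: int = 5000) -> list:
    """Simulated values of (n - 1) * s^2 / sigma^2 for normal samples of size n."""
    values = []
    for _ in range(replicates):
        sample = [random.gauss(0.0, sigma) for _ in range(n)]
        values.append((n - 1) * statistics.variance(sample) / sigma**2)
    return values

for n in (5, 31):
    values = chi_square_values(n)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    print(
        f"df = {n - 1:2d}: min = {min(values):.3f}, mean = {mean:.2f}, "
        f"median = {median:.2f}, (mean - median) / mean = {(mean - median) / mean:.3f}"
    )
```

All simulated values are positive, the mean is close to the degrees of freedom, and the gap between mean and median (a simple sign of asymmetry) becomes relatively smaller as df increases.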
34. Example
- For a normally distributed population with variance 2 and a sample of size n = 5, calculate the probability that [expression missing] and the probability that [expression missing].
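35. Chi-square table
36. F distribution
- Draw two independent random samples, of sizes $n_1$ and $n_2$, from a normal distribution with variance $\sigma^2$
- Definition: $F = s_1^2 / s_2^2$, with $df_1 = n_1 - 1$ and $df_2 = n_2 - 1$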
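The F statistic can be simulated directly as the ratio of two independent sample variances, $s_1^2 / s_2^2$, drawn from the same normal population. The Python sketch below is illustrative only; the sample sizes (chosen to give 5 and 20 degrees of freedom), the seed, the replicate count and the function name are assumptions of this example.

```python
import random
import statistics

random.seed(36)  # fixed seed so the run is repeatable

def f_ratio(n1: int, n2: int, sigma: float = 1.0) -> float:
    """Ratio of two independent sample variances from the same normal population."""
    s1_sq = statistics.variance([random.gauss(0.0, sigma) for _ in range(n1)])
    s2_sq = statistics.variance([random.gauss(0.0, sigma) for _ in range(n2)])
    return s1_sq / s2_sq

n1, n2 = 6, 21            # illustrative sizes giving df1 = 5 and df2 = 20
replicates = 10000
ratios = [f_ratio(n1, n2) for _ in range(replicates)]

# Empirical upper 5% point of the simulated ratios
ratios.sort()
upper_5_percent = ratios[int(0.95 * replicates)]

print(f"mean of F ratios = {statistics.fmean(ratios):.3f}")
print(f"empirical upper 5% point = {upper_5_percent:.2f}")
```

The empirical upper 5% point can be compared with the tabulated right-tail value for 5 and 20 degrees of freedom, which is approximately 2.71.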
37. F distribution
- Probability density function
- Cumulative function
- Right-tail probability
38. F distribution table
- Only F values for P = 0.95, 0.99, etc.
- Two degrees of freedom ($df_1$, $df_2$)
- Right-side (tail) probability, arranged for $P(F > F_i) = 0.05$ or 0.01, denoted $F_{0.05}$ and $F_{0.01}$.
39. F distribution table
40. F distribution table
41. SAS programming
- Distribution of the sample mean
- Distribution of the difference of two means
- Student's t distribution
- Chi-square distribution
- F distribution
42. SAS functions for sampling distribution
- Usage: x = function_name(arguments);
- Probability (cumulative distribution) functions
  - PROBT(x, df<, nc>): t distribution
  - PROBCHI(x, df): chi-square distribution
  - PROBF(x, ndf, ddf<, nc>): F distribution
- Quantile functions
  - CINV(p, df<, nc>): chi-square distribution
  - FINV(p, ndf, ddf<, nc>): F distribution
  - TINV(p, df<, nc>): t distribution
  - PROBIT(p): standard normal distribution
43. Sampling distribution of mean
data sample_mean;
  miu = 100;
  sigma = 10;
  do k = 1 to 1000;           /* 1000 samples */
    do i = 1 to 100 by .5;    /* 199 observations per sample */
      z = rannor(0);          /* standard normal random number */
      x = miu + z*sigma;
      output;
    end;
  end;
run;
proc sort;
  by k;
run;
proc means noprint;
  var x;
  by k;
  output out=mean_sample mean=sample_mean;
run;
proc print data=mean_sample;
run;
proc univariate data=mean_sample;
  var sample_mean;
run;
44. Sampling distribution of difference between two means (1/3)
/* This program draws two samples from two normal distributions
   and shows the distribution of the difference between sample means */
data sample_mean;
  miu1 = 100; sigma1 = 10;
  miu2 = 200; sigma2 = 10;
  do k = 1 to 1000;
    do i = 1 to 100 by .5;
      z1 = rannor(0); z2 = rannor(0);
      x1 = miu1 + z1*sigma1; x2 = miu2 + z2*sigma2;
      output;
    end;
  end;
run;
45. Sampling distribution of difference between two means (2/3)
proc sort;
  by k;
run;
proc means noprint;
  var x1 x2;
  by k;
  output out=two_means mean=mean1 mean2;
run;
data dmeans;
  set two_means;
  dmean = mean1 - mean2;
run;
46. Sampling distribution of difference between two means (3/3)
proc print data=dmeans;
run;
proc univariate data=dmeans;
  var mean1 mean2 dmean;
run;
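47. SAS program (t distribution)
options nodate;
data tdistribution;
  df = 15;
  do i = -5 to 5 by .1;
    prob_z = probnorm(i);     /* cumulative standard normal probability */
    prob_t = probt(i, df);    /* cumulative t probability with df degrees of freedom */
    output;
  end;
run;
proc print;
  var i prob_z prob_t df;
run;
proc plot;
  plot prob_z*i prob_t*i;
run;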
48. SAS program (Chi-square and F distributions)
data distribution;
  df = 15; df1 = 5; df2 = 20;
  do i = 0 to 50 by .5;
    prob_chi2 = probchi(i, df);
    prob_f = probf(i, df1, df2);
    output;
  end;
run;
proc print;
  var i prob_chi2 prob_f df df1 df2;
run;
proc plot;
  plot prob_chi2*i prob_f*i;
run;
49. SAS program (distribution table)
data sampling;
  df = 30; df1 = 5; df2 = 10;
  /* p = 0 and p = 1 are outside the valid range of PROBIT and TINV */
  do p = 0.01 to 0.99 by 0.01;
    normal = probit(p);
    t = tinv(p, df);
    chi_square = cinv(p, df);
    f = finv(p, df1, df2);
    output;
  end;
run;
proc print;
  var normal t chi_square f p df df1 df2;
run;
proc plot;
  plot p*normal p*t p*chi_square p*f;
run;
quit;