Estimation 2: 1 - PowerPoint PPT Presentation

Provided by: penelop9
Learn more at: http://people.umass.edu
Transcript and Presenter's Notes

1
Estimation: More Applications
2
  • So far, we have
  • defined appropriate confidence interval estimates
    for a single population mean, $\mu$.
  • Confidence interval estimators are valuable
    because they
  • indicate the width of the central $(1-\alpha)$ portion of
    the sampling distribution of the estimator
  • provide an idea of how much the estimate might
    differ if another study were done.

3
  • Next step
  • extend the principles of confidence interval
    estimation to develop CI estimates for other
    parameters.
  • The important things to keep track of FOR EACH
    PARAMETER
  • What is the appropriate probability distribution
    to describe the spread of the point estimator of
    the parameter?
  • What underlying assumptions about the data are
    necessary?
  • How is the confidence interval calculated?

4
  • The confidence interval estimates of interest
    are
  • Confidence Interval calculation for the
    difference between two means, $\mu_1 - \mu_2$, for
    comparing two independent groups.
  • Confidence Interval calculation for the mean
    difference, $\mu_d$, for paired data.
  • Population variance, $\sigma^2$, when the underlying
    distribution is Normal. We will introduce the
    $\chi^2$ (Chi-square) distribution.
  • The ratio of two variances, for comparing
    variances of 2 independent groups,
    introducing the F-distribution.

5
  • Confidence Interval Estimation for
  • Population proportion, $\pi$, using the Normal
    approximation for a Binomial proportion.
  • The difference between two proportions, $\pi_1 - \pi_2$, for
    two independent groups.

6
1. Confidence Interval calculation for the
difference between two means, $\mu_1 - \mu_2$, for
Two Independent groups
  • We are often interested in comparing two groups
  • What is the difference in mean blood pressure
    between males and females?
  • What is the difference in body mass index (BMI)
    between breast cancer cases versus non-cancer
    patients?
  • How different is the length of stay (LOS) for
    CABG patients at hospital A compared to hospital
    B?
  • We are interested in the similarity of the two
    groups.

7
  • Statistically, we focus on
  • the difference between the means of the two
    groups.
  • Similar groups will have small differences, or
    no difference, between means.
  • Thus, we focus on estimating the difference in
    means, $\mu_1 - \mu_2$.

An obvious point estimator is the difference
between sample means, $\bar{x}_1 - \bar{x}_2$.
8
To compute a confidence interval for this
difference, we need to know the standard error of
the difference.
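Suppose we take independent random samples from
two different groups.
We know the sampling distribution of the mean for
each group:
$$\bar{x}_1 \sim N\!\left(\mu_1,\ \sigma_1^2/n_1\right), \qquad \bar{x}_2 \sim N\!\left(\mu_2,\ \sigma_2^2/n_2\right)$$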
9
What is the distribution of the difference
between the sample means, $(\bar{x}_1 - \bar{x}_2)$?
  • The sum (or difference) of normal RVs will be
    normal. What will be the mean and variance?
  • This is a linear combination of two independent
    random variables. As a result, its mean and variance
    follow from the rules for linear combinations.

10
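In general, for any constants $a$ and $b$,

$$a\bar{x}_1 + b\bar{x}_2 \sim N\!\left(a\mu_1 + b\mu_2,\ a^2\frac{\sigma_1^2}{n_1} + b^2\frac{\sigma_2^2}{n_2}\right)$$

  • That is, for the distribution of the sum of $a\bar{x}_1$ and
    $b\bar{x}_2$,
  • The mean is the sum of $a\mu_1$ and $b\mu_2$
  • The variance is the sum of $(a^2)$(var of sampling
    distribution of $\bar{x}_1$) and $(b^2)$(var of sampling
    distribution of $\bar{x}_2$)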

11
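Letting $a = 1$ and $b = -1$, we have
$$\bar{x}_1 - \bar{x}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
Thus, the standard error of the difference
between means is
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$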
12
  • Once we have
  • a point estimate
  • its standard error,
  • we know how to compute a confidence interval
    estimate.

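Confidence Interval Estimate = Point Estimate ± (Confidence Coefficient × Std Error)

  • Point estimate: $\bar{x}_1 - \bar{x}_2$
  • $\sigma^2$ known ⇒ confidence coefficient is a percentile from N(0,1), used with the std error
  • $\sigma^2$ estimated from samples ⇒ confidence coefficient is a percentile from $t_{df}$, used with an estimate of the std error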
13
Example: Data are available on the weight gain
of weanling rats fed either of two diets. The
weight gain in grams was recorded for each rat,
and the mean for each group computed.
Diet Group 1: $n_1 = 12$ rats, $\bar{x}_1 = 120$ gms
Diet Group 2: $n_2 = 7$ rats, $\bar{x}_2 = 101$ gms
What is the difference in weight gain between rats fed on the
2 diets, and a 99% CI for the difference?
14
  • We will assume
  • a. The rats were selected via independent simple
    random samples from two populations.
  • b. The variance of weight gain of weanling rats
    is known, and is the same for both diet groups:
  • $\sigma_1^2 = \sigma_2^2 = 400$
  • Construct a 99% confidence interval estimate of
    the difference in mean weight gain, $\mu_1 - \mu_2$.
  • 1. Point Estimate: $\bar{x}_1 - \bar{x}_2 = 120 - 101 = 19$ gms
  • 2. Std error of point estimate:

$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{400}{12} + \frac{400}{7}} = \sqrt{90.48} \approx 9.51 \text{ gms}$$

15
  • 3. With known variance, use a percentile of
    N(0,1).
  • For $(1-\alpha) = .99$, $z_{.995} = 2.576$
  • The 99% CI for $\mu_1 - \mu_2$ is $19 \pm 2.576\,(9.51)$
  • $(-5.5, 43.5)$ gms
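
The arithmetic for the known-variance interval can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `ci_diff_means_known_var` is made up for illustration and is not part of the lecture.

```python
import math
from statistics import NormalDist

def ci_diff_means_known_var(xbar1, xbar2, var1, var2, n1, n2, conf=0.99):
    """CI for mu1 - mu2 when both population variances are known."""
    point = xbar1 - xbar2                      # point estimate
    se = math.sqrt(var1 / n1 + var2 / n2)      # standard error of the difference
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # percentile of N(0,1)
    return point - z * se, point + z * se

# Weanling rat example: n1 = 12, n2 = 7, known variance 400 in both groups.
lower, upper = ci_diff_means_known_var(120, 101, 400, 400, 12, 7, conf=0.99)
print(f"99% CI: ({lower:.1f}, {upper:.1f}) gms")
```

The lower limit is negative because the margin of error (about 24.5 gms) exceeds the point estimate of 19 gms.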

16
How do we interpret this interval, $(-5.5, 43.5)$ gms?
With different samples, we will have
different estimates of the true difference in
gains. The endpoints of the confidence interval
indicate how wide the difference in estimates is
expected to be 99% of the time. Alternatively,
if we repeatedly selected samples and computed a
CI, then 99% of the intervals computed would
include the true difference in gains.
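
The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes hypothetical "true" means (120 and 100 gms), chosen only so that there is a known difference to cover; the known variance of 400 and the sample sizes follow the rat example.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)

# Hypothetical "true" population values, chosen only for this simulation.
MU1, MU2 = 120.0, 100.0
SIGMA = 20.0            # known sd, so sigma^2 = 400 as in the rat example
N1, N2 = 12, 7
REPS = 2000             # number of repeated studies

z = NormalDist().inv_cdf(0.995)                  # 99% confidence coefficient
se = math.sqrt(SIGMA**2 / N1 + SIGMA**2 / N2)    # known standard error

covered = 0
for _ in range(REPS):
    xbar1 = fmean([random.gauss(MU1, SIGMA) for _ in range(N1)])
    xbar2 = fmean([random.gauss(MU2, SIGMA) for _ in range(N2)])
    diff = xbar1 - xbar2
    # Does this study's interval contain the true difference?
    if diff - z * se <= MU1 - MU2 <= diff + z * se:
        covered += 1

coverage = covered / REPS
print(f"Fraction of intervals containing the true difference: {coverage:.3f}")
```

The printed fraction should land close to 0.99, though it will vary a little with the random seed and the number of repetitions.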
17
  • How do we compute a confidence interval when
  • we don't know the population variance(s),
  • but must estimate them from our samples?
  • If $\sigma_1^2$ and $\sigma_2^2$ are UNknown
  • Is it reasonable to assume that the variances of
    the two groups are the same?
  • That is, is it OK to assume unknown $\sigma_1^2 = \sigma_2^2$?
  • Questions to consider
  • Do data arise from the same measurement process?
  • Have we sampled from the same population?
  • Does the difference in groups lead us to expect
    different variability as well as different mean
    levels?

18
  • If OK to assume variances equal: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
  • We have 2 estimates of the same parameter, $\sigma^2$
  • One from each sample: $s_1^2$ and $s_2^2$
  • We can create a pooled estimate, $s_p^2$
  • This is a weighted average of the 2 estimates of
    the variance
  • Weighting by $(n_i - 1)$ for the $i$th sample:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
19
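The standard error of the difference in means, $\bar{x}_1 - \bar{x}_2$, is then
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
That is, $s_p^2$ is used as an
estimator of the variance of $x_1$ and of $x_2$,
rather than the two separate sample estimates.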
20
  • Use the t-distribution to compute percentiles, because
  • we are estimating the variance from the samples
  • One degree of freedom is lost for each sample
    mean we estimated, resulting in
  • df = $(n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$
  • Thus, our confidence interval estimator when the
    variance is unknown, but assumed equal for the
    two groups, is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2;\,1-\alpha/2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
21
  • Example: Weanling Rats Revisited
  • Diet Group 1: $n_1 = 12$ rats, $\bar{x}_1 = 120$ gms, $s_1^2 = 457.25$ g²
  • Diet Group 2: $n_2 = 7$ rats, $\bar{x}_2 = 101$ gms, $s_2^2 = 425.33$ g²
  • Is there a difference in mean weight gain among
    rats fed on the 2 diets?
  • Use sample estimates of the variance
  • Assume that the variances are equal since
  • the rats in each group come from the same breed
  • they were fed the same number of calories on their
    different diets
  • the same scale was used in weighing.

22
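  • Assuming $\sigma_1^2 = \sigma_2^2$, equal but unknown, construct
    a 99% CI for the difference in means, $\mu_1 - \mu_2$.
  • Point Estimate: $\bar{x}_1 - \bar{x}_2 = 120 - 101 = 19$ gms
  • Std error of point estimate, Step 1: compute $s_p^2$

$$s_p^2 = \frac{(11)(457.25) + (6)(425.33)}{17} = \frac{7581.73}{17} \approx 445.98$$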
23
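  • Step 2: SE of point estimate $= \sqrt{445.98\,(1/12 + 1/7)} \approx 10.04$
  • Confidence Coefficient for 99% CI
  • df = $n_1 + n_2 - 2 = 12 + 7 - 2 = 17$
  • $1-\alpha = .99 \Rightarrow \alpha/2 = .005 \Rightarrow 1-\alpha/2 = .995$
  • $t_{17;\,.995} = 2.898$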

24
  • Confidence interval estimate of $(\mu_1 - \mu_2)$: $19 \pm 2.898\,(10.04)$
  • $(-10.1, 48.1)$
  • Again, we can conclude that if we repeated the
    study many times, and looked at how widely the
    sample mean differences were spread out (99% of
    the time), then the width would be equal to the
    confidence interval width.
  • Notice that the interval is wider than when we know
    the variance.
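
The pooled-variance calculation for the rat example can be reproduced step by step. This is a sketch only: the helper name `pooled_variance` is illustrative, and the t percentile 2.898 is typed in from the slide rather than computed, since the Python standard library has no t-distribution.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of two sample variances, weights (n_i - 1)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

n1, xbar1, s1_sq = 12, 120.0, 457.25
n2, xbar2, s2_sq = 7, 101.0, 425.33
t_crit = 2.898   # t percentile for df = 17 at .995, as given on the slide

sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))     # SE of the difference in means
point = xbar1 - xbar2
lower, upper = point - t_crit * se, point + t_crit * se
print(f"sp^2 = {sp_sq:.2f}, SE = {se:.2f}, 99% CI = ({lower:.1f}, {upper:.1f})")
```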

25
  • What if it is not reasonable to assume that the
    variances of the two groups are the same?
  • When it seems likely that $\sigma_1^2 \neq \sigma_2^2$
  • For example
  • we have used a different measuring process
  • we have other reasons to believe both the mean
    level and variability are different between the
    two populations
  • Then
  • Use separate estimates of the variance from each
    sample, $s_1^2$ and $s_2^2$
  • Compute Satterthwaite's df and the appropriate
    t-value

26
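Satterthwaite's Formula for Degrees of freedom
$$f = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
  • horrible: avoid computing by hand!
  • Note
  • it is a function both of the sample sizes and the
    variance estimates.
  • When in fact the variances and sample sizes are
    similar, the df will be similar to the pooled
    variance df.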
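
Since the formula is unpleasant by hand, a few lines of code are a natural alternative. The sketch below uses the usual form of Satterthwaite's approximation, $f = (v_1 + v_2)^2 / \left(v_1^2/(n_1-1) + v_2^2/(n_2-1)\right)$ with $v_i = s_i^2/n_i$; the function name is illustrative.

```python
def satterthwaite_df(s1_sq, n1, s2_sq, n2):
    """Approximate df for the unequal-variance t interval."""
    v1 = s1_sq / n1          # estimated variance of the first sample mean
    v2 = s2_sq / n2          # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Weanling rat data: s1^2 = 457.25 (n1 = 12), s2^2 = 425.33 (n2 = 7).
f = satterthwaite_df(457.25, 12, 425.33, 7)
print(f"Satterthwaite df = {f:.2f}")

# With equal variances and equal sample sizes, the result matches
# the pooled df, n1 + n2 - 2 (made-up numbers for illustration).
f_equal = satterthwaite_df(4.0, 10, 4.0, 10)
print(f"Equal-case df = {f_equal:.1f}")
```

In practice the result is rounded down to a whole number before looking up the t percentile.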

27
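Putting it all together yields the CI estimator
when $\sigma_1^2 \neq \sigma_2^2$ are UNknown:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{f;\,1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Note: use separate
estimates of the standard error of the sample mean for
each sample.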
28
Example: Weanling Rats Once Again! Assume that
the population variances are not equal: $\sigma_1^2 \neq \sigma_2^2$.
Diet Group 1: $n_1 = 12$ rats, $\bar{x}_1 = 120$ gms, $s_1^2 = 457.25$ g²
Diet Group 2: $n_2 = 7$ rats, $\bar{x}_2 = 101$ gms, $s_2^2 = 425.33$ g²
Is there a
difference in mean weight gain among rats fed on
the 2 diets? Compute a 99% CI for the
difference in the group means, assuming $\sigma_1^2 \neq \sigma_2^2$.
29
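  • Point Estimate: $\bar{x}_1 - \bar{x}_2 = 120 - 101 = 19$ gms
  • Std error of point estimate: $\sqrt{457.25/12 + 425.33/7} = \sqrt{98.87} \approx 9.94$
  • Confidence Coefficient for 99% CI
  • df = $f$ = 13.08 ⇒ use 13
  • $1-\alpha = .99 \Rightarrow \alpha/2 = .005 \Rightarrow 1-\alpha/2 = .995$
  • $t_{13;\,.995} = 3.012$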

30
  • Confidence interval estimate of $(\mu_1 - \mu_2)$
  • $(-11.0, 49.0)$
  • The interpretation of this confidence interval is
    the same. Notice that it is somewhat wider than
    the previous two intervals, indicating a wider
    variation in the sample mean difference when
    variances are not equal between groups.

31
  • Since the unit is common to the two measures, we
    expect
  • the two responses to the unit to be similar in
    some respects
  • We expect the 1st and 2nd responses within a unit
    to be related.
  • Studies use this design to reduce the effects of
    subject-to-subject variability
  • This variability can be reduced by subtracting
    the common part out.
  • We do this by taking the difference between the 2
    measures, on the same subject or unit.

32
  • Analysis of Paired Data Focuses on
  • difference = Response 2 − Response 1
  • for each subject, or paired unit.
  • Work with the differences
  • as if we never saw the individual paired responses
  • and see only the differences as our data set
  • The data set comprised of differences has been
    reduced to a one-sample set of data.
  • We already know how to work with this.

33
Unit   1st Response    2nd Response    Difference (2nd − 1st)
1      $x_1 = 10$      $y_1 = 12$      $d_1 = 12 - 10 = 2$
⋮
i      $x_i$           $y_i$           $d_i = y_i - x_i$
⋮
n      $x_n = 14$      $y_n = 11$      $d_n = 11 - 14 = -3$
  • Note
  • The order in which you take differences is
    arbitrary, but it must be consistent. If you
    choose $y_i - x_i$, then compute that way for all
    pairs.
  • Direction is important. Keep track of positive
    and negative differences.


34
Confidence Interval Calculations for the mean
difference, $\mu_d$
Preliminaries
  1. Compute the sample of differences, $d_1, \ldots, d_n$, where
    $n$ = number of paired measures.
  2. Obtain the sample mean and sample variance of the
    differences:
    $$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2$$
  3. Treat like any other 1-sample case for estimating
    a mean, $\mu$ (here, a mean difference).

35
Example: Reaction times in seconds to 2
different stimuli are given below for 8
individuals. Estimate the average difference in
reaction time, with a 95% CI. Does there appear
to be a difference in reaction time to the 2
stimuli?

Subject   X1   X2   Difference (X2 − X1)
1         1    4     3
2         3    2    -1
3         2    3     1
4         1    3     2
5         2    1    -1
6         1    2     1
7         3    3     0
8         2    3     1
36
  • We have paired data
  • each subject was measured for each stimulus
  • we focus on the within-subject difference.
  • Since I have subtracted in the direction $X_2 - X_1$
  • a positive difference means a longer reaction time
    for stimulus 2
  • a negative difference means a longer reaction
    time for stimulus 1.
  • We can compute the mean and standard deviation of
    the differences:
  • $\bar{d} = .75$ and $s_d = 1.39$

37
  • For a 95% confidence interval,
  • using my sample estimate of standard error,
  • use the t-distribution.
  • The confidence interval is
  • $\bar{d} \pm t_{n-1;\,1-\alpha/2}\,(s_d/\sqrt{n}) = .75 \pm t_{7;\,.975}\,(s_d/\sqrt{8})$
  • $= .75 \pm 2.36\,(1.39/\sqrt{8})$
  • 95% CI is $(-0.41, 1.91)$
  • The results indicate that repeating the study may
    produce an estimate quite different from that
    observed, and even possibly a negative estimate.
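
The paired calculation reduces to a one-sample problem on the differences, which the following sketch makes explicit for the reaction-time data. The t percentile 2.365 for 7 df is entered by hand (the slide rounds it to 2.36), since the standard library has no t-distribution.

```python
import math
from statistics import mean, stdev

x1 = [1, 3, 2, 1, 2, 1, 3, 2]   # reaction times, stimulus 1
x2 = [4, 2, 3, 3, 1, 2, 3, 3]   # reaction times, stimulus 2

# Work only with the within-subject differences, X2 - X1.
d = [b - a for a, b in zip(x1, x2)]
n = len(d)

d_bar = mean(d)                  # sample mean of the differences
s_d = stdev(d)                   # sample sd of the differences (n - 1 divisor)
se = s_d / math.sqrt(n)          # standard error of the mean difference

t_crit = 2.365                   # t percentile for df = 7 at .975 (table value)
lower, upper = d_bar - t_crit * se, d_bar + t_crit * se
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```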

38
  • Notes
  • It is a common error to fail to recognize paired
    data, and therefore fail to compute the
    appropriate confidence interval.
  • The mean difference $\mu_d$ is equal to the difference
    in means, $\mu_2 - \mu_1$: if we ignore pairs, your point
    estimate will be correct.
  • However, the variance of the mean difference does
    NOT equal the variance of the difference in means,
    so the confidence interval will not be
    correctly estimated if you neglect to use a
    paired data approach.
  • $s_d^2/n = (s_1^2/n) + (s_2^2/n) - 2\,\mathrm{Cov}(x_1, x_2)/n$
39
Confidence Interval Estimation of the Variance,
$\sigma^2$, Standard Deviation, $\sigma$, and Ratio of Variances
of 2 groups
40
3. Confidence Interval for the variance,
$\sigma^2$: Introducing the $\chi^2$ Distribution
  • What if our interest lies in estimation of the
    variance, $\sigma^2$?
  • Some common examples are
  • Standardization of equipment: repeated
    measurement of a standard should have small
    variability
  • Evaluation of technicians: are the results from
    person $i$ too variable?
  • Comparison of measurement techniques: is a new
    method more variable than a standard method?

41
We have an obvious point estimator of $\sigma^2$: $s^2$,
which we have shown earlier is an unbiased
estimator (when using simple random sampling with
replacement). How do we get a
confidence interval? We will define a new
standardized variable, based upon the way in
which $s^2$ is computed:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
That is, $(n-1)s^2/\sigma^2$
follows a chi-square distribution with $n-1$
degrees of freedom.
42
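A quick and dirty derivation: We defined the
sample variance as
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
Multiplying each side by $(n-1)$:
$$(n-1)s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2$$
Note: the left side is the numerator of the $\chi^2$ variable.
The right side is the sum of squared deviations from
the mean.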
43
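Recall, for $X \sim N(\mu, \sigma^2)$, we can standardize
as
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
If we square this, we have a squared standard
normal variable:
$$Z^2 = \left(\frac{X - \mu}{\sigma}\right)^2 \sim \chi^2_1$$
That is, a squared standard normal variable
follows a chi-square distribution with 1 degree
of freedom; this is the definition of a
chi-square with df = 1.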
44
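If we sum $n$ such independent random variables, we define a
chi-square distribution with $n$ degrees of freedom:
$$\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2 \sim \chi^2_n$$
However, if we first estimate $\mu$ from the data
using $\bar{x}$, we reduce the degrees of freedom:
$$\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma}\right)^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$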
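
A small simulation can make the loss of one degree of freedom concrete. A chi-square variable has mean equal to its df, so the average of $(n-1)s^2/\sigma^2$ over many samples should be near $n-1$, not $n$. The population values below ($\mu = 50$, $\sigma = 2$, $n = 5$) are hypothetical and chosen only for illustration.

```python
import random
from statistics import fmean

random.seed(1)

# Hypothetical Normal population and sample size, for illustration only.
MU, SIGMA, N = 50.0, 2.0, 5
REPS = 20000

values = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Sum of squared deviations from the sample mean equals (n - 1) * s^2.
    ss = sum((x - xbar) ** 2 for x in sample)
    values.append(ss / SIGMA**2)

avg = fmean(values)
print(f"Average of (n-1)s^2/sigma^2 over {REPS} samples: {avg:.2f} (n - 1 = {N - 1})")
print(f"Smallest simulated value: {min(values):.4f}")
```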
45
Features of the Chi Square Distribution
  • Chi-squared variables are sums of squared
    Normally distributed variables.
  • Chi-squared random variables are always positive.
    (Why? A square is always positive.)
  • The distribution is NOT symmetric. A typical
    shape is

[Figure: typical chi-square density curve, with the horizontal axis starting at 0]
46
Features of the Chi Square Distribution
  • Each degree of freedom defines a different
    distribution.
  • The shape is less skewed as n increases.

[Figure: chi-square density with df = 100]
47
How to Use the Chi-Square Table (Table 6,
Rosner): The format is the same as for the Student
t-tables.

df    $\chi^2_{.005}$    $\chi^2_{.01}$    $\chi^2_{.025}$    …    $\chi^2_{.995}$
1                                                                  7.88
2                                                                  10.60
5                                                                  16.75

Each row gives information for a separate chi-square
distribution, defined by the degrees of
freedom.
The column heading tells you which percentile
will be given to you in the body of the table.
The body of the table is comprised of the values
of the percentile.
48
[Figure: $\chi^2$ distribution with 5 df; the area to the left of 16.750 is .995]
$\Pr(\chi^2_5 \le 16.750) = .995$
  • Note: Because the $\chi^2$ distribution is not
    symmetric,
  • we will often need to look up both upper and lower
    percentiles of the distribution

49

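Confidence Interval for $\sigma^2$
To obtain a $(1-\alpha)$ confidence interval, we
want to find percentiles of the $\chi^2$ distribution
so that
$$\Pr\left(\chi^2_{n-1;\,\alpha/2} \le \chi^2 \le \chi^2_{n-1;\,1-\alpha/2}\right) = 1 - \alpha$$
[Figure: $\chi^2$ density with area $\alpha/2$ in each tail and area $(1-\alpha)$ between $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$]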
50
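Substitute for $\chi^2$ in the middle of the inequality:
$$\Pr\left(\chi^2_{n-1;\,\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{n-1;\,1-\alpha/2}\right) = 1 - \alpha$$
A little algebra yields the confidence interval
formula:
$$\frac{(n-1)s^2}{\chi^2_{n-1;\,1-\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1;\,\alpha/2}}$$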
51
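Confidence Interval for $\sigma^2$
Lower limit of the $(1-\alpha)$ CI: $(n-1)s^2 / \chi^2_{n-1;\,1-\alpha/2}$
Upper limit of the $(1-\alpha)$ CI: $(n-1)s^2 / \chi^2_{n-1;\,\alpha/2}$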
52
Exercise
A precision instrument is guaranteed to read
accurately to within ± 2 units. A sample of 4
readings on the same object yields 353, 351, 351,
and 355. Find a 95% confidence interval estimate
for the population variance, $\sigma^2$, and also for the
population standard deviation, $\sigma$.
53
Solution
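  • 1. Point Estimate: We must first estimate the mean,
  • $\bar{x} = 352.5$, and then the variance, $s^2 = 3.67$

2. Since $n = 4$, the correct chi-square
distribution has df = $n - 1$ = 3.
3. For a $(1-\alpha) = .95$ CI: $\alpha = .05 \Rightarrow \alpha/2 = .025$
and $1 - \alpha/2 = .975$. We want $\chi^2_{3;\,.025}$ and
$\chi^2_{3;\,.975}$.

[Figure: $\chi^2_3$ density with area .025 in each tail and area .95 between $\chi^2_{3;\,.025}$ and $\chi^2_{3;\,.975}$]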
54
  • 4. Using Table 6 in Rosner (page 758)
  • Using the column labeled .025, read down to the df = 3 row
    ⇒ $\chi^2_{3;\,.025} = .216$
  • Using the column labeled .975, read down to the df = 3 row
    ⇒ $\chi^2_{3;\,.975} = 9.35$
  • (or use Minitab or another program)

55
Using Minitab: Calc → Prob Dist → Chi sq
Inverse Cumulative Probability
Degrees of freedom: df = n − 1
Input desired percentiles, e.g., .025, .975

Inverse Cumulative Distribution Function
Chi-Square with 3 DF
P( X < x )        x
    0.0250   0.2158
    0.9750   9.3484
56
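5. Compute Limits
Lower limit of the 95% CI: $(n-1)s^2/\chi^2_{3;\,.975} = (3)(3.67)/9.35 = 1.18$
Upper limit of the 95% CI: $(n-1)s^2/\chi^2_{3;\,.025} = (3)(3.67)/.216 = 50.97$
The 95% CI for $\sigma^2$ is $(1.18, 50.97)$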
57
  • 6. To compute a confidence interval for the
    standard deviation, $\sigma$
  • always compute a CI for the variance
  • then take the square root of the upper and lower
    limits
  • 95% CI for $\sigma$ = $(\sqrt{1.18}, \sqrt{50.97})$ = $(1.09, 7.14)$
  • Point estimate for $\sigma$: $s = \sqrt{3.67} = 1.92$
  • Does this precision instrument meet its
    guarantee to accuracy within ± 2 units?
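
The exercise can be retraced in code. This sketch takes the two chi-square percentiles for 3 df (0.2158 and 9.3484) from the Minitab output quoted earlier instead of computing them, because the Python standard library has no chi-square quantile function.

```python
import math
from statistics import variance

readings = [353, 351, 351, 355]
n = len(readings)
s_sq = variance(readings)            # sample variance, n - 1 divisor

# Percentiles of chi-square with df = 3, from the Minitab output above.
chi2_025, chi2_975 = 0.2158, 9.3484

# The upper percentile gives the LOWER limit, and vice versa.
var_lower = (n - 1) * s_sq / chi2_975
var_upper = (n - 1) * s_sq / chi2_025

# CI for the standard deviation: square roots of the variance limits.
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(f"s^2 = {s_sq:.2f}")
print(f"95% CI for variance: ({var_lower:.2f}, {var_upper:.2f})")
print(f"95% CI for sd:       ({sd_lower:.2f}, {sd_upper:.2f})")
```

Carrying full precision gives a lower limit for the standard deviation of about 1.08. The 1.09 above comes from taking the square root of the already rounded 1.18.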

58
  • Note that the confidence intervals for $\sigma$
  • are wide
  • are not symmetric about the point estimate
  • Only with very large n
  • will you find relatively narrow confidence
    interval estimates for the variance and standard
    deviation.

LL = 1.09, $s$ = 1.92, UL = 7.14
59
Confidence Interval calculation for the ratio of
two variances: introducing the F-distribution
  • We are often interested in comparing the
    variances of 2 groups.
  • This may be the primary question of interest:
  • I have a new measurement procedure: are the
    results more variable than the standard
    procedure?
  • Comparison of variances may also be a preliminary
    analysis to determine whether it is appropriate
    to compute a pooled variance estimate or not,
    when the goal is comparing the mean levels of two
    groups.

60
  • For comparing variances, we use a RATIO rather
    than a difference.
  • We look at the ratio of variances, $\sigma_x^2/\sigma_y^2$
  • If this ratio is 1 ⇒ the variances are the same
  • If it is far from 1 ⇒ the variances differ.
  • In order to
  • make probability statements about ratios of
    variances
  • compute confidence intervals
  • we need to introduce another distribution, known
    as the
  • F-distribution

61
A Definition of the F Distribution
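IF $x_1, \ldots, x_{n_x}$ are each independent Normal $(\mu_x, \sigma_x^2)$
and $y_1, \ldots, y_{n_y}$ are each independent Normal $(\mu_y, \sigma_y^2)$,
and if we calculate sample variances
in the usual way,
$$s_x^2 = \frac{1}{n_x - 1}\sum_{i=1}^{n_x}(x_i - \bar{x})^2, \qquad s_y^2 = \frac{1}{n_y - 1}\sum_{i=1}^{n_y}(y_i - \bar{y})^2$$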
62
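THEN
$$F = \frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \sim F_{n_x - 1,\; n_y - 1}$$
The ratio follows an F-distribution with two
degree-of-freedom specifications, one for the
numerator and one for the denominator:
numerator df = $n_x - 1$, denominator df = $n_y - 1$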
63
Percentiles of the F-distribution are tabulated,
as in the Appendix of Rosner, Table 9, pages
762-764.
  • Using the Table
  • Each row defines a different percentile of the
    distribution for a given denominator df.
  • Each column defines a different numerator df.
  • The body of the table gives values of the
    F-distribution.
  • Only the upper-tail percentiles (.90, …, .999) of
    the distribution are tabulated.

64
                               df for numerator
df for denominator    p      7      8      12     24     ∞
20                    .90    2.04   2.00   1.89   1.77   1.61
                      .95    2.51   2.45   2.28   2.08   1.84
30                    .90
                      .95
  • Example: Find the 95th percentile of an
    F-distribution with df = 12, 20. (num, den is the
    standard order)
  • Under the denominator df col, find the row for
    df = 20
  • Under p, the percentile, find the row for p = .95
  • Find the column headed by numerator df = 12.
  • Read the value at their intersection: $F_{12,20;\,.95} = 2.28$.

65
For lower-tail percentiles (.005, …, .10), which
are not tabulated, we use the fact that the
percentiles of an F with numerator df = a,
denominator df = b are related to the
percentiles of an F with numerator df = b,
denominator df = a as
$$F_{a,b;\,p} = \frac{1}{F_{b,a;\,1-p}}$$
66
Example: What is the 5th percentile of an
F-distribution with df = 12, 20?


We have already looked up this value.
Lower and upper tail percentiles can be computed
directly using Minitab; no need to invert.
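
The reciprocal relation between lower and upper percentiles can be seen in a simulation: if a ratio follows an F with df = 12, 20, its reciprocal follows an F with df = 20, 12, so sorting one sample and its reciprocals lines up the 5th percentile of the first with the 95th percentile of the second. The sketch below uses simulated values only; the helper names are illustrative, and the chi-square variates are drawn as Gamma variables with shape df/2 and scale 2.

```python
import math
import random

random.seed(2)

def chi_square(df):
    # A chi-square with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return random.gammavariate(df / 2, 2)

def f_variate(df_num, df_den):
    # Ratio of two independent chi-squares, each divided by its df.
    return (chi_square(df_num) / df_num) / (chi_square(df_den) / df_den)

REPS = 20000
f_12_20 = sorted(f_variate(12, 20) for _ in range(REPS))
f_20_12 = sorted(1 / x for x in f_12_20)   # reciprocals follow F with df = 20, 12

k = REPS // 20                             # 5% of the simulated values
p05 = f_12_20[k - 1]                       # empirical 5th percentile, df = 12, 20
p95_recip = f_20_12[REPS - k]              # matching upper percentile, df = 20, 12

print(f"Empirical 5th percentile of F(12,20):    {p05:.3f}")
print(f"1 / empirical 95th percentile of F(20,12): {1 / p95_recip:.3f}")
```

The two printed values agree because they are the same order statistic viewed from both directions, which is the reason the table only needs upper-tail percentiles.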
67
Use Minitab: Calc → Probability Distributions
→ F
Inverse Cumulative Prob
Numerator and Denominator df
Desired Percentile
68
Inverse Cumulative Distribution Function
F distribution with 12 DF in numerator and 20 DF in
denominator
P( X < x )        x
    0.9500   2.2776
69
Confidence Interval for the ratio of 2 Variances,
$\sigma_x^2/\sigma_y^2$
IF $x_1, \ldots, x_{n_x}$ are each independent Normal $(\mu_x, \sigma_x^2)$
and $y_1, \ldots, y_{n_y}$ are each independent Normal $(\mu_y, \sigma_y^2)$,
a point estimate of $\sigma_x^2/\sigma_y^2$ is
$s_x^2/s_y^2$
70
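A $(1-\alpha)$ Confidence Interval Estimate has
Lower limit: $\dfrac{s_x^2/s_y^2}{F_{n_x-1,\,n_y-1;\,1-\alpha/2}}$
Upper limit: $\dfrac{s_x^2/s_y^2}{F_{n_x-1,\,n_y-1;\,\alpha/2}}$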
71
Example: Pelicans were exposed to DDT. Is there
a difference in variability of residue found in
juvenile and nestling birds?
Juvenile Pelicans: $n_1 = 10$, $s_1 = .017$
Nestling Pelicans: $n_2 = 13$, $s_2 = .006$
Compute a
95% Confidence Interval for the ratio of true
variances, $\sigma_1^2/\sigma_2^2$.
72
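  • Point estimate
  • $s_1^2/s_2^2 = (.017)^2 / (.006)^2 = 8.03$
  • Percentiles of F
  • $F_{9,12;\,.975} = 3.44$
  • $F_{9,12;\,.025} = 1/F_{12,9;\,.975} = 1/3.87 = .258$
  • Confidence Limits: lower $= 8.03/3.44 = 2.33$, upper $= 8.03/.258 = 31.07$ (carrying unrounded values)
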
73
A 95% CI for the variance ratio is $(2.33, 31.07)$.
Notice that the width of the sampling
distribution of variance ratios is very broad.
Since 1 is not in the interval, it appears that
the variances of the groups are different, with
juvenile pelicans having a greater variability in
DDT residue. Note: always work with variances,
not standard deviations.
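
The pelican interval can be rechecked with a few lines. In this sketch the upper percentile 3.44 comes from the example itself, while the lower percentile is obtained through the reciprocal relation using 3.87 as the .975 percentile of an F with df = 12, 9. That 3.87 is a table value assumed here, not one quoted in the slides.

```python
n1, s1 = 10, 0.017      # juvenile pelicans: sample size and sample sd
n2, s2 = 13, 0.006      # nestling pelicans: sample size and sample sd

# Always work with variances, so square the standard deviations first.
ratio = s1**2 / s2**2

f_975 = 3.44            # .975 percentile of F with df = 9, 12 (from the example)
f_025 = 1 / 3.87        # .025 percentile via 1 / F(12, 9; .975); 3.87 is an assumed table value

lower = ratio / f_975   # the upper percentile gives the lower limit
upper = ratio / f_025   # the lower percentile gives the upper limit

print(f"Point estimate of the variance ratio: {ratio:.2f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
print("1 inside the interval?", lower <= 1 <= upper)
```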