Title: The sample size calculation for heterogeneous variances
1 The sample size calculation for heterogeneous variances
- Wei-ming Luh
- National Cheng Kung University,
- Tainan, Taiwan
2 Selecting an insufficient sample size yields a study with inadequate sensitivity, whereas selecting an excessive sample size wastes resources.
3 Sufficient sample size is important.
- Textbooks: Mace (1974), Cohen (1988), Kraemer & Thiemann (1987), and Desu & Raghavarao (1990).
- Computer programs: Gorman & Primavera (1995), Lenth (2000), Morse (1999), SAS Institute (1999).
- Thomas (1998) even provided a comprehensive list of power-analysis software:
//www.forestry.ubc.ca/conservation/power/
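4 One-sample t test: given α, β
- The minimum sample size needed is
- $n = (z_{1-\alpha} + z_{1-\beta})^2 / \delta^2$ (one-sided test)
- $\delta = (\mu - \mu_0)/\sigma$ is the standardized effect size, which is the difference between the actual and hypothesized means in standard deviation, σ, units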
5 Guenther's (1981) modification
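The two formulas on slides 4 and 5 can be sketched as follows. This is a rough illustration only: it assumes the usual normal-approximation formula n = (z_{1-α} + z_{1-β})² / δ² for a one-sided test, and the commonly cited form of Guenther's (1981) correction, which adds z²_{1-α} / 2. The transcript confirms neither formula, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_conventional(delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a one-sided one-sample test.

    Assumed form: n = (z_{1-alpha} + z_{1-beta})^2 / delta^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # critical value of the one-sided test
    z_beta = z.inv_cdf(power)        # quantile matching power = 1 - beta
    return (z_alpha + z_beta) ** 2 / delta ** 2


def n_guenther(delta, alpha=0.05, power=0.80):
    """Assumed form of Guenther's correction: add z_{1-alpha}^2 / 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return n_conventional(delta, alpha, power) + z_alpha ** 2 / 2


# Always round up to the next integer before reporting a sample size.
print(math.ceil(n_conventional(1.0)), math.ceil(n_guenther(1.0)))
```

Under these assumptions, δ = 1 with α = .05 and power = .80 gives 7 subjects from the uncorrected formula and 8 with the correction.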
6 Caution!
- Conventional formulas are based on the assumption
of normality and variance homogeneity.
7 Assumption violation: heavy-tailed and asymmetric distributions
8 In the context of long-tailed distributions and heterogeneous variances
- Trimmed mean method to correct non-normality
- Approximate test to take care of heterogeneity (Behrens-Fisher problem)
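9 1. Trimmed mean
Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be the order statistics of a random sample $X_1, \dots, X_n$.
Let $\gamma$ be the proportion of trimming in each tail of the distribution, so that $g = [\gamma n]$ observations are removed from each tail. The effective sample size is $h = n - 2g$.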
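10 Yuen's method (1974)
The statistic $t = (\bar{X}_t - \mu_0) / \sqrt{s_w^2 / h}$, where $\bar{X}_t$ is the trimmed mean, $s_w^2$ is the Winsorized variance, and $h$ is the effective sample size, is distributed approximately as Student's t with $h - 1$ degrees of freedom.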
11 Winsorized variance (replacing each trimmed value with the nearest retained observation)
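A minimal sketch of the one-sample trimmed-mean test may help here. It assumes the usual Yuen-type construction: trim g = [γn] observations from each tail, Winsorize by replacing them with the nearest retained value, and scale the Winsorized sum of squared deviations by h(h - 1). The function name and the default γ = 0.2 are illustrative choices, not taken from the slides.

```python
import math


def trimmed_t(sample, mu0, gamma=0.2):
    """Return (t, df) for a one-sample trimmed-mean t test (assumed form)."""
    x = sorted(sample)
    n = len(x)
    g = math.floor(gamma * n)        # observations trimmed from each tail
    h = n - 2 * g                    # effective sample size
    kept = x[g:n - g]
    trimmed_mean = sum(kept) / h

    # Winsorize: each trimmed value is replaced by the nearest retained one.
    wins = [kept[0]] * g + kept + [kept[-1]] * g
    wins_mean = sum(wins) / n
    ssd_w = sum((v - wins_mean) ** 2 for v in wins)

    se = math.sqrt(ssd_w / (h * (h - 1)))
    return (trimmed_mean - mu0) / se, h - 1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # one extreme value in the upper tail
t, df = trimmed_t(data, mu0=4.5)
print(t, df)
```

With γ = 0 nothing is trimmed and the statistic reduces to the ordinary one-sample t.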
12 Effective sample size
13 Original sample size
Always round up to the next integer.
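The step from effective to original sample size can be illustrated as below. The conversion n = h / (1 - 2γ) is an assumption here: it follows from h = n - 2[γn] when the integer part is ignored, and may not be the exact formula on the slide. The function name is hypothetical.

```python
import math


def original_sample_size(h, gamma=0.2):
    """Convert an effective sample size h to an original sample size n.

    Assumes n = h / (1 - 2 * gamma), rounded up to the next integer.
    The small tolerance keeps floating-point noise from pushing an
    exact integer result up by one.
    """
    return math.ceil(h / (1 - 2 * gamma) - 1e-9)


for h in (6, 7, 10):
    print(h, original_sample_size(h))
```

With 20% trimming in each tail, an effective sample size of 6 corresponds to 10 original observations and 7 corresponds to 12.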
14 Monte Carlo simulation
- 1. We generated data by using the SAS RANNOR function to create standard normal observations (Z)
- 2. We used the g-and-h distributions (Hoaglin, 1985) to transform Z to reflect the target distribution shapes
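The two data-generation steps above can be sketched in Python. This uses the standard g-and-h transformation, X = ((exp(gZ) - 1) / g) · exp(hZ² / 2), with X = Z · exp(hZ² / 2) when g = 0. Python's `random.gauss` stands in for SAS RANNOR, and the function names are illustrative only.

```python
import math
import random


def g_and_h(z, g=0.0, h=0.0):
    """Transform a standard normal value z to a g-and-h variate."""
    if g == 0:
        core = z                              # limit of (exp(g*z) - 1) / g as g -> 0
    else:
        core = (math.exp(g * z) - 1) / g      # g controls skewness
    return core * math.exp(h * z * z / 2)     # h controls tail heaviness


def simulate(n, g=0.0, h=0.0, seed=None):
    """Draw n observations from the g-and-h distribution."""
    rng = random.Random(seed)
    return [g_and_h(rng.gauss(0, 1), g, h) for _ in range(n)]


sample = simulate(1000, g=0.5, h=0.0, seed=1)   # an asymmetric shape
print(min(sample), max(sample))
```

Setting g = 0 and h = 0 returns the normal observations unchanged, which gives a convenient baseline condition.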
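15 Five distribution shapes
- Each shape is given as (g, h), followed by the corresponding (skew, kurtosis)
- Normal: (g = 0, h = 0); (0, 0)
- Heavy-tailed: (g = 0, h = .1); (0, 5.5)
- Heavy-tailed: (g = 0, h = .2); (0, 36.22)
- Asymmetric: (g = 0.5, h = 0); (1.75, 8.9)
- Asymmetric: (g = 0.5, h = .2); (13.16, 42895)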
16 Sample size table (for δ = 1, one-sided test, power = .80)
17 Simulation results (α = .05)
18 Conclusion
- The heavier the distribution tail, the fewer the subjects needed for the trimmed mean method.
- The trimmed sample size can achieve the desired statistical power, while the conventional sample size formulas result in over-sampling.
19 2. Unequal variances
Yuen's method (1974)
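For the two-sample case, a minimal sketch of Yuen's trimmed-mean test with Welch-type degrees of freedom is given below. It assumes the standard form of the statistic, with d_j = SSD_wj / (h_j (h_j - 1)) for each group. The helper and function names are hypothetical, and no p-value is computed so that the example needs only the standard library.

```python
import math


def _trim_parts(sample, gamma):
    """Return (trimmed mean, d, h) for one group."""
    x = sorted(sample)
    n = len(x)
    g = math.floor(gamma * n)            # observations trimmed from each tail
    h = n - 2 * g                        # effective sample size
    kept = x[g:n - g]
    wins = [kept[0]] * g + kept + [kept[-1]] * g   # Winsorized sample
    wins_mean = sum(wins) / n
    ssd_w = sum((v - wins_mean) ** 2 for v in wins)
    return sum(kept) / h, ssd_w / (h * (h - 1)), h


def yuen_two_sample(a, b, gamma=0.2):
    """Return (t, df) for Yuen's two-sample test without assuming equal variances."""
    m1, d1, h1 = _trim_parts(a, gamma)
    m2, d2, h2 = _trim_parts(b, gamma)
    t = (m1 - m2) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df


group1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
group2 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 101]
print(yuen_two_sample(group1, group2))
```

With γ = 0 the statistic and degrees of freedom reduce to those of Welch's approximate t test.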
20 Schouten (1999)
21 Monte Carlo simulation
23 Generalized Behrens-Fisher problem
- Schwertman (1987)
- The largest mean difference can be tested as
24 More questions
- How much to trim?
- When variances are unknown and possibly unequal, the usual unbiased estimate can be used.
- How about confidence intervals?
- How about cost constraints?
25 More sample size formulas should be developed for robust statistics.
- Thank you for listening.
luhwei_at_mail.ncku.edu.tw