Title: Estimation
1. Estimation and hypothesis testing (F-test, χ² (Chi2)-test, t-tests)
Estimation and hypothesis testing
- Introduction
- t-tests
- Outlier tests (k SD, Grubbs, Dixon's Q)
- F-test, χ² (Chi2)-test (= 1-sample F-test)
- Tests and confidence limits
- Analysis of variance (ANOVA)
- Introduction
- Model I ANOVA
- Performance strategy
- Testing of outliers
- Testing of variances (Cochran "C", Bartlett)
- Model II ANOVA
- Applications
CINHST CINHST-EXCEL CINHST-Exercise
Grubbs, free download from http://www.graphpad.com/articles/outlier.htm
CochranBartlett ANOVA
Power
2. Introduction
Introduction
- When we have a set/sets of data ("sample"), we often want to know whether a statistical estimate thereof (e.g., a difference between 2 means, or the difference of an SD from a target) is pure coincidence or whether it is "statistically significant". We can approach this problem in the following way:
- The null hypothesis H0 (no difference) is tested against the alternative hypothesis H1 (there is a difference) on the basis of the collected data. The decision to accept or reject the hypothesis is made with a certain probability, most often 95% (statistical significance).
- Because we usually have only a limited set of data ("sample"), we extrapolate the estimates from our sample to the underlying populations by use of statistical distribution theory, and we assume random sampling.
- Hypothesis testing example
- Is the difference between the means of two data sets real or only accidental?
- Statistical significance in more detail
- In statistics, the words "significant" and "significance" have specific meanings. A significant difference means a difference that is unlikely to have occurred by chance. A significance test shows up differences that are unlikely to arise from purely random variation. Whether one set of results is significantly different from another depends not only on the magnitude of the difference in the means but also on the amount of data available and its spread.
3. Significance testing: qualitative investigation
Introduction
- Adapted from Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK: Understanding the Structure of Scientific Data, LC GC Europe Online Supplement.
- Probably not different, and would 'pass' the t-test (t_crit > t_calc).
- Probably different, and would 'fail' the t-test (t_crit < t_calc).
- Could be different, but there are not enough data to say for sure (i.e., would 'pass' the t-test, t_crit > t_calc).
- Practically identical means, but with so many data points there is a small but statistically significant ('real') difference, and so would 'fail' the t-test (t_crit < t_calc).
- Spreads in the data, as measured by the variances, are similar: would 'pass' the F-test (F_crit > F_calc).
- Spreads in the data, as measured by the variances, are different: would 'fail' the F-test (F_crit < F_calc).
- Could be a different spread, but there are not enough data to say for sure: would 'pass' the F-test (F_crit > F_calc).
4. General remarks
Introduction
- General requirements for parametric tests
- Random sampling
- Normally distributed data
- Homogeneity of variances, when applicable
- Note on the testing of means
- When we test means, the central limit theorem is of great importance because it favours the use of parametric statistics.
- Central limit theorem (see also "sampling statistics")
- The means of independent observations tend to be normally distributed, irrespective of the primary type of distribution (a small simulation sketch follows below).
- Implications of the central limit theorem
- When dealing with mean values, the type of primary distribution is of limited importance, e.g. for the t-test for comparison of means.
- When dealing with percentiles, e.g. reference intervals, the type of distribution is indeed important.
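The theorem is easy to see in a small simulation. Below is a minimal sketch (the sample size, number of samples, and the exponential distribution are assumed for illustration, not slide data): means of samples from a strongly skewed distribution are themselves nearly normally distributed.

```python
# Minimal central-limit-theorem demo (illustrative values, not slide data).
import numpy as np

rng = np.random.default_rng(42)
N = 30        # observations per sample (assumed)
K = 10_000    # number of simulated samples (assumed)

# Exponential "primary" distribution: clearly non-normal (skewed)
means = rng.exponential(scale=1.0, size=(K, N)).mean(axis=1)

# The means cluster normally around 1 with SD close to sigma/sqrt(N)
print(f"mean of means: {means.mean():.3f}  (population mean = 1)")
print(f"SD of means:   {means.std(ddof=1):.3f}  (theory: {1/np.sqrt(N):.3f})")
```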
5. Overview of test procedures (parametric)
Introduction
- Testing levels
- 1-sample t-test: comparison of a mean value with a target or limit
- t-test: comparison of mean values (unpaired); perform the F-test first
- t-test with equal variances
- t-test with unequal variances
- paired t-test: comparison of paired measurements (x, y)
- Testing outliers
- k SD, Grubbs (http://www.graphpad.com/articles/outlier.htm)
- Dixon's Q (Annex; n = 3 to 25)
- Testing dispersions
- F-test for comparison of variances: F = s_2²/s_1²
- χ² (Chi2)-test, or 1-sample F-test
- Testing variances (several groups)
- Cochran "C"
6. t-tests
t-tests
- Difference between a mean and a target ("one-sample" t-test)
- With 95% CI: x_m ± t_{0.05,ν} · s/√N (s/√N = standard error)
- → t = (µ_0 − x_m)/(s/√N)
- For t:
- degrees of freedom ν = N − 1
- probability α = 0.05
- Important → t-distribution (see "sampling statistics" before)
- Difference between two means
- Perform the F-test first and, depending on its outcome, use the t-test with equal or unequal variances.
- Given independence, the difference between two variables that are normally distributed is also normally distributed.
- The variance of the difference is the sum of the individual variances:
- t = (x_m2 − x_m1)/[s²/N_1 + s²/N_2]^0.5
- where s² is a common estimate of the variance ("pooled variance"):
- s² = [(N_1 − 1)s_1² + (N_2 − 1)s_2²]/(N_1 + N_2 − 2)
- A sketch of both tests follows below.
7. t-test with different variances
t-tests
- The difference is still normally distributed given σ_1 ≠ σ_2, and the difference of the means has the variance σ_1²/N_1 + σ_2²/N_2, which is estimated as s_1²/N_1 + s_2²/N_2.
- However, the t value t = (x_m2 − x_m1)/[s_1²/N_1 + s_2²/N_2]^0.5 does not strictly follow the t-distribution. The problem is mainly of academic interest, and special tables for t have been provided (Behrens, Fisher, Welch).
- → Perform the F-test before the t-test!
- Paired t-test: comparison of mean values (paired data)
- Example: measurements before and after treatment in patients. When testing for a difference with paired measurements, the paired t-test is preferable. This is because such measurements are correlated, and pairing the data reduces the random variation. Thereby, it increases the probability of detecting a difference.
- Calculations (a sketch follows below)
- The individual paired differences are computed:
- dif_i = x_2i − x_1i
- The mean and standard deviation of the N (= N_1 = N_2) differences are computed:
- dif_m = Σ dif_i / N
- s_dif = [Σ (dif_i − dif_m)²/(N − 1)]^0.5
- SE_dif = s_dif/N^0.5
- Testing whether the mean paired difference deviates from zero:
- t = (dif_m − 0)/SE_dif (N − 1 degrees of freedom)
8. Outliers
Outliers
- Outliers have a great influence on parametric statistical tests. Therefore, it is desirable to investigate the data for outliers (see Figure, for example).
- Testing for outliers can be done with the following techniques:
- k SD, Grubbs (http://www.graphpad.com/articles/outlier.htm)
- Dixon's Q (Annex; n = 3 to 25)
- All assume normally distributed data.
- The k SD method (an outlier is a point > k SD away from the mean)
- With this method, it is important to know that the statistical chance of finding an outlier increases with the number of data points investigated.
- Figure: the upper point is an outlier according to the Grubbs test (P < 0.05). A Grubbs sketch follows below.
9. F-test: comparing variances
F-test, χ² (Chi2)-test
- If we have two data sets, we may want to compare the dispersions of the distributions.
- Given normal distributions, the ratio between the variances is considered.
- The variance ratio test was developed by Fisher; therefore, the ratio is usually referred to as the F-ratio and related to tables of the F-distribution.
- Calculation
- F = s_2²/s_1²
- Note: the greater value should be in the numerator → F ≥ 1!
- Example (reproduced in the sketch below)
- F = s_2²/s_1² = (0.228)²/(0.182)² = 1.6, n.s.
- Degrees of freedom:
- df_2 (numerator) = 14 − 1 = 13
- df_1 (denominator) = 21 − 1 = 20
- Critical (0.05) F = 2.25
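The slide's example can be checked with scipy; the SDs and sample sizes below are taken from the example above.

```python
from scipy import stats

s1, n1 = 0.182, 21      # smaller SD -> denominator
s2, n2 = 0.228, 14      # larger SD -> numerator, so that F >= 1

F = s2**2 / s1**2       # ~1.57, i.e. the slide's 1.6 after rounding
F_crit = stats.f.ppf(0.95, dfn=n2 - 1, dfd=n1 - 1)   # alpha = 0.05
print(f"F = {F:.2f}, F_crit(0.05; 13, 20) = {F_crit:.2f}")
print("significant" if F > F_crit else "not significant (n.s.)")
```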
10. F-test (ctd.)
F-test, χ² (Chi2)-test
- χ² (Chi2)-test (or 1-sample F-test)
- Comparing a variance with a target or limit:
- χ²_exp = s_exp² · ν / s_claim² (the claim being a target or limit, e.g. a manufacturer's claim)
- Test whether χ²_exp ≥ χ²_critical (1-sided, 0.05).
- One-sided, because we test versus a target or a limit.
- The χ²-test is used in the CLSI EP5 protocol.
- Relationships between F, t, and χ²
- Relationship between χ² and F:
- χ²/ν = F_{ν,∞} (ν = degrees of freedom)
- Relationship between F and t:
- The one-tailed F-test with 1 and n degrees of freedom is equivalent to the t-test with n degrees of freedom. The relationship t² = F holds for both calculated and tabular values of these two distributions: t(12, 0.05) = 2.17881; F(1, 12, 0.05) = 4.7472 = 2.17881².
- Peculiarities and problems with the EXCEL F-test
- A sketch of the 1-sample test and of the t/F identity follows below.
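A minimal sketch of the 1-sample variance (χ²) test plus a numerical check of the t/F identity; s_exp, the claimed SD, and ν are assumed example values.

```python
from scipy import stats

s_exp, s_claim, nu = 0.25, 0.20, 19        # assumed: found SD, claim, df

chi2_exp = s_exp**2 * nu / s_claim**2
chi2_crit = stats.chi2.ppf(0.95, df=nu)    # 1-sided, alpha = 0.05
print(f"chi2_exp = {chi2_exp:.2f}, chi2_crit = {chi2_crit:.2f}")

# Relationship between t and F: t(df)**2 == F(1, df)
t = stats.t.ppf(1 - 0.05 / 2, 12)          # t(12, 0.05) = 2.1788
F = stats.f.ppf(0.95, 1, 12)               # F(1, 12, 0.05) = 4.7472
print(f"t^2 = {t**2:.4f}, F = {F:.4f}")
```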
11. Interpretation of the P-value
P-values
- A test for statistical significance (at a certain probability P) tests whether a hypothesis, for example the null hypothesis, has to be rejected or not.
- The null hypothesis of the F-test is that the 2 variances are not different, or that an experimentally found difference is due to chance alone.
- The null hypothesis of the F-test is not rejected when the calculated probability P_exp is greater than or equal to the chosen probability P (usually chosen as 0.05 = 5%), or, equivalently, when the experimental F_exp value is smaller than or equal to the critical F_crit value.
- Example (see also the sketch below)
- F_exp (calculated) = 1.554
- Critical value F_crit = 2.637
- P_exp (from experiment) = 0.182
- Chosen probability P = 0.05
- Observation
- The calculated P-value (0.182 = 18%) is greater than the chosen P-value (0.05 = 5%). Correspondingly, the experimental F-value is < the critical F-value.
- Conclusion
- The null hypothesis is not rejected; this means that the difference of the variances is due to chance alone.
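The equivalence of the two decision criteria can be verified numerically. In this sketch the degrees of freedom are hypothetical (the slide does not state them), so the numbers illustrate the rule rather than reproduce the example exactly.

```python
from scipy import stats

F_exp, alpha = 1.554, 0.05
dfn, dfd = 10, 10                             # hypothetical df (assumed)

P_exp = stats.f.sf(F_exp, dfn, dfd)           # 1-sided P for F_exp
F_crit = stats.f.ppf(1 - alpha, dfn, dfd)
print(f"P_exp = {P_exp:.3f}, F_crit = {F_crit:.3f}")
assert (P_exp >= alpha) == (F_exp <= F_crit)  # the two criteria agree
```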
12. Tests and confidence limits
Tests and confidence limits
- We have seen for the 1-sample t-test the close relationship between confidence intervals and significance testing. In many situations, either can be used for the same purpose. Confidence intervals have the advantage that they can be shown in graphs and that they provide information about the spread of an estimate (e.g., of a mean).
- The tables below give an overview of the concordance between CIs and significance testing for means and variances (SDs); a small demonstration follows below.
- Notes to the tables:
- t: 2-sided, or 1-sided (1-sided for comparison with claims).
- When a stable s is known, z may be chosen instead of t.
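A sketch of this concordance for the 1-sample t-test, on assumed data: the target lies outside the 95% CI exactly when P < 0.05.

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 5.4, 5.1, 5.6, 5.3, 5.5])    # assumed sample
target = 5.0                                    # assumed target value

t, p = stats.ttest_1samp(x, popmean=target)
lo, hi = stats.t.interval(0.95, len(x) - 1,
                          loc=x.mean(), scale=stats.sem(x))
outside = target < lo or target > hi
print(f"P = {p:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
assert (p < 0.05) == outside                    # two sides of the same coin
```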
13. Exercises
CINHST, CINHST-EXCEL
- This tutorial/EXCEL template explains the connection between significance tests and confidence intervals when the purpose is null hypothesis significance testing (NHST). Indeed, for the specific purpose of NHST, P-values as well as CIs can be used (look whether the null or target value lies inside or outside the CI); they are just two sides of the same coin.
- Examples are the comparison of
- i) a standard deviation (SD) with a target value,
- ii) two standard deviations,
- iii) a mean with a target value,
- iv) two means, and
- v) a mean paired difference with a target value.
- The statistical tests involved are the 1-sample F-test, F-test, 1-sample t-test, t-test, and the paired t-test, respectively, with the CIs of the SD, F, mean, mean difference, and mean paired difference.
- Another exercise shows how NHST is influenced by
- the magnitude of the difference,
- the number of data points,
- the magnitude of the SD.
- Please follow the guidance given in the "Exercise Icons" and read the comments.
CINHST-Exercise
Grubbs
15. Analysis of variance (ANOVA)
ANOVA
- The three universal assumptions of analysis of variance:
- 1. Independence
- 2. Normality
- 3. Homogeneity of variance
- Overview of the concepts
- Model I (assessing treatment effects)
- Comparison of the mean values of several groups.
- Model II (random effects)
- Study of variances: analysis of components of variance.
- Models I and II: identical computations, but different purposes and interpretations!
- Why ANOVA?
- Model I (assessing treatment effects)
- ANOVA is an extension of the commonly used t-test for comparing the means of two groups.
- The aim is a comparison of the mean values of several groups.
- The tool is an assessment of variances.
16. Introduction: types of ANOVA
ANOVA
- One-way: only one type of classification, e.g. into various treatment groups.
- Example: study of serum cholesterol levels in various treatment groups.
- Two-way: subclassification within treatment groups, e.g. according to gender.
- Example: do various treatments influence serum cholesterol in the same way in men and women? (Not considered further here.)
- Principle of one-way ANOVA (figure): distances within (- - -) and between groups are squared and summed, and finally compared.
- Case 1, null hypothesis valid: no significant difference between groups; the red (between-group) distances are small, and the main source of variation is within groups.
- Case 2, significant difference between groups: the red (between-group) distances are large, and the main source of variation is between groups.
17. Introduction: mathematical model
ANOVA
- One-way ANOVA
- Mathematical model (example: treatment):
- Y_ij = grand mean + α_j (treatment, between-group effect) + ε_ij (within-group error)
- Null hypothesis: the treatment group effects are zero.
- Alternative hypothesis: treatment group effects are present.
- Avoiding some of the pitfalls of ANOVA
- In ANOVA it is assumed that the data are normally distributed. Usually we do not have a large amount of data in an ANOVA, so it is difficult to prove any departure from normality. It has been shown, however, that even quite large deviations do not affect the decisions made on the basis of the F-test.
- A more important assumption of ANOVA is that the variance (spread) between groups is homogeneous (homoscedastic). The best way to avoid this pitfall is, as ever, to plot the data. There also exist a number of tests for heteroscedasticity (e.g., Bartlett's test and Levene's test; a sketch follows below). It may be possible to overcome this type of problem in the data structure by transforming it, such as by taking logs. If the variability within a group is correlated with its mean value, then ANOVA may not be appropriate and/or this may indicate the presence of outliers in the data. Cochran's test can be used to test for variance outliers.
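Both heteroscedasticity tests named above are available in scipy; a minimal sketch on assumed example groups, where the third group is given a deliberately larger spread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(10, 1.0, 15)    # assumed groups; g3 has a larger spread
g2 = rng.normal(10, 1.1, 15)
g3 = rng.normal(10, 2.5, 15)

stat_b, p_b = stats.bartlett(g1, g2, g3)
stat_l, p_l = stats.levene(g1, g2, g3)
print(f"Bartlett: P = {p_b:.4f}   Levene: P = {p_l:.4f}")
# A small P suggests heterogeneous variances -> consider a log transform.
```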
18. Model I ANOVA: violation of assumptions
ANOVA
19. Model I ANOVA: short summary
ANOVA
- Plot your data!
- Generally, the procedure is robust towards deviations from normality.
- However, it is indeed sensitive to outliers, i.e. investigate for outliers within groups.
- When the variance within groups is not constant, e.g. being proportional to the level, a logarithmic transformation may be appropriate.
- Testing for variance homogeneity may be carried out with Bartlett's test.
- Cochran's test can be used to test for variance outliers.
- When F is significant → supplementary analyses (not addressed in more detail here):
- Maximum against minimum (Student-Newman-Keuls procedure)
- Pairwise comparisons with control of the type I error (Tukey)
- Post test for trend (regression analysis)
- Control versus others (Dunnett)
- Control group (C) versus treatment groups: often, the focus is on effects in the treatment groups versus the control group.
20. Model II (random effects) ANOVA
ANOVA
Example: ranges of serum cholesterol in different subjects.
- Model II (random effects) ANOVA
- (analysis of components of variation)
- Mathematical model:
- Y_ij = grand mean + between-group variation α_j (σ_B) + within-group variation ε_ij (σ_W)
21. Total variance (total standard deviation)
ANOVA
- The standard deviation (s) of calculated results (propagation of s); a sketch follows below.
- 1. Sums and differences:
- y = a(s_a) + b(s_b) + c(s_c) → s_y = SQRT(s_a² + s_b² + s_c²) (SQRT = square root)
- Do not propagate CVs!
- 2. Products and quotients:
- y = a(s_a) · b(s_b) / c(s_c) → s_y/y = SQRT[(s_a/a)² + (s_b/b)² + (s_c/c)²]
- 3. Exponents (the x in the exponent is error-free):
- y = a(s_a)^x → s_y/y = x · s_a/a
- Addition of variances: s_tot = SQRT(s_1² + s_2²)
- A large component will dominate.
- This forms the basis for the suggestion by Cotlove et al.: SD_A < 0.5 × SD_I
- A = analytical variation
- I = within-individual biological variation
- → In a monitoring situation, the total random variation of changes is then increased by only up to 12% as long as this relation holds true.
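A short sketch of the three propagation rules and of the Cotlove criterion; all input values are assumed for illustration.

```python
import math

# 1. Sums/differences: propagate absolute SDs
sa, sb, sc = 0.2, 0.3, 0.1
s_sum = math.sqrt(sa**2 + sb**2 + sc**2)

# 2. Products/quotients: propagate relative SDs (CVs)
a, b, c = 10.0, 5.0, 2.0
y = a * b / c
s_prod = y * math.sqrt((sa/a)**2 + (sb/b)**2 + (sc/c)**2)

# 3. Exponents (error-free x): relative SD scales with x
x = 2
s_pow_rel = x * sa / a

# Cotlove: with SD_A = 0.5 * SD_I the total random variation of changes
# rises only ~12% above the biological component alone
SD_I, SD_A = 1.0, 0.5
increase = math.sqrt(SD_I**2 + SD_A**2) / SD_I - 1
print(f"s_sum={s_sum:.3f}  s_prod={s_prod:.3f}  increase={increase:.1%}")
```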
22. Software output
ANOVA
- One-way ANOVA: output of statistical programs
- Variances within and between groups are evaluated.
- Interpretation of model I ANOVA: the F-ratio
- If the ratio of the between- to the within-mean square exceeds a critical F-value (refer to a table, or look at the P-value), a significant difference between the group means has been disclosed.
- F: Fisher published the ANOVA approach in 1918.
- Components of variation
- Relation to the standard output of statistics programs (a sketch follows below):
X_GP = group mean; X_GM = grand mean
df = degrees of freedom (mean square = variance = squared SD)
F = MSB/MSW = (n · SD_B² + SD_W²)/SD_W²
For unequal group sizes, a sort of average n is calculated according to a special formula:
n_0 = [1/(K − 1)] · [N − Σn_i²/N]
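A sketch relating these quantities for simulated balanced data; the group count, group size, and the true variance components are assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
K, n = 4, 10                                # assumed design
sd_between, sd_within = 2.0, 1.0            # assumed true components
groups = [rng.normal(rng.normal(100, sd_between), sd_within, n)
          for _ in range(K)]

F, p = stats.f_oneway(*groups)              # F = MSB / MSW

# Mean squares and model II variance components (equal group sizes)
grand = np.concatenate(groups).mean()
MSB = n * sum((g.mean() - grand)**2 for g in groups) / (K - 1)
MSW = np.mean([g.var(ddof=1) for g in groups])
var_between = max((MSB - MSW) / n, 0.0)     # estimate of SD_B^2
print(f"F = {F:.2f} (P = {p:.4f}), SD_B = {var_between**0.5:.2f}, "
      f"SD_W = {MSW**0.5:.2f}")
```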
23. Conclusion
ANOVA
- Model I ANOVA
- A general tool for assessing differences between group means.
- Model II ANOVA
- Useful for assessing components of variation.
- Nonparametric ANOVA (a sketch follows below)
- Kruskal-Wallis test: a generalization of the Mann-Whitney test to deal with > 2 groups.
- Friedman's test: a generalization of Wilcoxon's paired rank test to more than two repeats.
- The study of components of variation is not suitable for nonparametric analysis.
- Software
- ANOVA is included in standard statistical packages (SPSS, BMDP, StatView, STATA, StatGraphics, etc.).
- Variance components may be given, or they may be derived from the mean squares as outlined in the tables.
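Both nonparametric tests are available in scipy; a minimal sketch on assumed data.

```python
from scipy import stats

a = [1.1, 2.3, 1.9, 2.8]   # assumed groups / repeated measurements
b = [3.0, 2.9, 3.8, 3.3]
c = [2.0, 2.2, 2.5, 2.1]

print(stats.kruskal(a, b, c))            # > 2 independent groups
print(stats.friedmanchisquare(a, b, c))  # > 2 repeated measurements
```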
24. Exercises
CochranBartlett
- Many statistical programs do not include the Cochran or Bartlett test. Therefore, they have been elaborated in an EXCEL file.
- The CochranBartlett file contains the formulas for
- the Cochran test for an outlying variance (including the critical values; a sketch follows below),
- the Bartlett test for variance homogeneity.
- Both are important for ANOVA.
- A calculation example is included.
- More experienced EXCEL users may be able to adapt this template to their own applications.
- This tutorial contains interactive exercises for self-education in analysis of variance (ANOVA).
- ANOVA can be used for 2 purposes:
- Model I (assessing treatment effects): comparison of the MEAN values of several groups.
- Model II (random effects)
ANOVA
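Cochran's C itself is simple to compute; the critical value below uses the standard F-based relation, stated here as an assumption to be checked against the tables in the EXCEL file.

```python
from scipy import stats

def cochran_C(variances):
    """Cochran's C: largest variance relative to the sum of all variances."""
    return max(variances) / sum(variances)

def cochran_C_crit(k, n, alpha=0.05):
    """Upper critical value for k groups of n replicates each (assumed
    F-based relation; verify against published Cochran tables)."""
    f = stats.f.ppf(1 - alpha / k, n - 1, (k - 1) * (n - 1))
    return 1.0 / (1.0 + (k - 1) / f)

s2 = [0.12, 0.15, 0.10, 0.75, 0.13]      # assumed group variances
C, crit = cochran_C(s2), cochran_C_crit(k=len(s2), n=5)
print(f"C = {C:.3f}, C_crit = {crit:.3f}")
print("outlying variance" if C > crit else "no outlying variance")
```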
27. The statistical power concept and sample-size calculations
Power and sample size
- When testing statistical hypotheses, we can make 2 types of errors: the so-called type I (or α) error and the type II (or β) error. The power of a statistical test is defined as 1 − β. The power concept is demonstrated in the figure below, denoting the probability of the α-error by p and that of the β-error by q. Like significance testing, power calculations can be done 1- and 2-sided.
- Purpose of power analysis and sample-size calculation
- Some key decisions in planning any experiment are: "How precise will my parameter estimates tend to be if I select a particular sample size?" and "How big a sample do I need to attain a desirable level of precision?"
- Power analysis and sample-size calculation allow you to decide (a) how large a sample is needed to enable statistical judgments that are accurate and reliable, and (b) how likely your statistical test will be to detect effects of a given size in a particular situation.
28. The statistical power concept and sample-size calculations (ctd.)
Power and sample size
- Calculations (a sketch follows below)
- Definitions
- z_p/2 = normal deviate for the null hypothesis probability
- (usually 95%, 1- or 2-sided; e.g. z = 1.65 or 1.96)
- z_1−q = normal deviate for the alternative hypothesis probability
- (usually 90%, always 1-sided; e.g. z_1−q = 1.28)
- N = number of measurements to be performed
- Mean versus a target value:
- N = [SD/(mean − target)]² × (z_p/2 + z_1−q)²
- Detecting a relevant difference (gives the number required in each group):
- N = (SD_Delta/Delta)² × (z_p/2 + z_1−q)²
- Delta = difference to be detected
- SD_Delta = SQRT(SD_x² + SD_y²); usually SD_x = SD_y → SD_Delta = √2 × SD
- (requires previous knowledge of the SD)
- Example: difference between 2 groups
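A minimal sketch of both formulas; the α/β conventions follow the slide (α 2-sided, β 1-sided), while SD, the means, and Delta are assumed example inputs.

```python
from scipy import stats

alpha, beta = 0.05, 0.10
z_a = stats.norm.ppf(1 - alpha / 2)      # 1.96 (2-sided alpha)
z_b = stats.norm.ppf(1 - beta)           # 1.28 (1-sided beta)

# Mean versus a target value
SD, mean, target = 2.0, 10.0, 9.0        # assumed values
N_target = (SD / (mean - target))**2 * (z_a + z_b)**2
print(f"N (vs target) = {N_target:.1f} -> round up")

# Difference between 2 groups (per group); SD_Delta = sqrt(2) * SD
delta = 1.0                              # assumed relevant difference
SD_delta = (2 * SD**2) ** 0.5
N_group = (SD_delta / delta)**2 * (z_a + z_b)**2
print(f"N per group = {N_group:.1f} -> round up")
```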
29. Exercises
Power
- This file contains 2 worksheets that explain the power concept and allow simple sample-size calculations.
- Please use dedicated software for routine power calculations.
- Concept
- Use the respective "Spinners" to change the values (or enter the values directly in the blue cells) for:
- Mean
- SD
- For comparison of a sample mean versus a target, use the sample SD.
- For comparison of 2 sample means with the same SD, use SD_Delta = SQRT(2) × SD.
- Sample size
- Significance level (only with the Spinner!)
- Limited to the same value for the α- and β-error!
- NOTE: α 2-sided, β 1-sided!
- → Observe the effect on the power.
- Calculations