Title: Estimation
1. Estimation and hypothesis testing (F-test, χ² (Chi2)-test, t-tests)
Estimation and hypothesis testing
- Introduction
- t-tests
- Outlier tests (k SD, Grubbs, Dixon's Q)
- F-test, χ² (Chi2)-test (= 1-sample F-test)
- Tests and confidence limits
- Analysis of variance (ANOVA)
- Introduction
- Model I ANOVA
- Performance strategy
- Testing of outliers
- Testing of variances (Cochran "C", Bartlett)
- Model II ANOVA
- Applications
CINHST CINHST-EXCEL CINHST-Exercise
Grubbs, free download from http://www.graphpad.com/articles/outlier.htm
CochranBartlett ANOVA
Power
2. Introduction
Introduction
- When we have a set/sets of data ("sample"), we often want to know whether a statistical estimate thereof (e.g., a difference between 2 means, or the difference of an SD from a target) is pure coincidence or whether it is "statistically significant". We can approach this problem in the following way:
- The null hypothesis H0 (no difference) is tested against the alternative hypothesis H1 (there is a difference) on the basis of the collected data. The decision to accept or reject the hypothesis is made with a certain probability, most often 95% (statistical significance).
- Because we usually have only a limited set of data ("sample"), we extrapolate the estimates from our sample to the underlying populations by use of statistical distribution theory, and we assume random sampling.
- Hypothesis testing example
- Is the difference between the means of two data sets real or only accidental?
- Statistical significance in more detail
- In statistics, the words "significant" and "significance" have specific meanings. A significant difference means a difference that is unlikely to have occurred by chance. A significance test shows up differences that are unlikely to arise from purely random variation. Whether one set of results is significantly different from another depends not only on the magnitude of the difference in the means but also on the amount of data available and its spread.
3. Significance testing: qualitative investigation
Introduction
- Adapted from Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK: Understanding the Structure of Scientific Data, LC GC Europe Online Supplement.
- Probably not different, and would 'pass' the t-test (t_crit > t_calc).
- Probably different, and would 'fail' the t-test (t_crit < t_calc).
- Could be different, but there are not enough data to say for sure (i.e., would 'pass' the t-test, t_crit > t_calc).
- Practically identical means, but with so many data points there is a small but statistically significant ('real') difference, and so would 'fail' the t-test (t_crit < t_calc).
- Spreads in the data, as measured by the variances, are similar: would 'pass' the F-test (F_crit > F_calc).
- Spreads in the data, as measured by the variances, are different: would 'fail' the F-test (F_crit < F_calc).
- Could be a different spread, but there are not enough data to say for sure: would 'pass' the F-test (F_crit > F_calc).
4. General remarks
Introduction
- General requirements for parametric tests
- Random sampling
- Normally distributed data
- Homogeneity of variances, when applicable
- Note on the testing of means
- When we test means, the central limit theorem is of great importance because it favours the use of parametric statistics.
- Central limit theorem (see also "sampling statistics")
- The means of independent observations tend to be normally distributed, irrespective of the primary type of distribution (a small simulation sketch follows below).
- Implications of the central limit theorem
- When dealing with mean values, the type of primary distribution is of limited importance, e.g. for the t-test for comparison of means.
- When dealing with percentiles, e.g. reference intervals, the type of distribution is indeed important.
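The theorem is easy to see in a small simulation. Below is a minimal sketch (the sample size, number of samples, and the exponential distribution are assumed for illustration, not slide data): means of samples from a strongly skewed distribution are themselves nearly normally distributed.

```python
# Minimal central-limit-theorem demo (illustrative values, not slide data).
import numpy as np

rng = np.random.default_rng(42)
N = 30        # observations per sample (assumed)
K = 10_000    # number of simulated samples (assumed)

# Exponential "primary" distribution: clearly non-normal (skewed)
means = rng.exponential(scale=1.0, size=(K, N)).mean(axis=1)

# The means cluster normally around 1 with SD close to sigma/sqrt(N)
print(f"mean of means: {means.mean():.3f}  (population mean = 1)")
print(f"SD of means:   {means.std(ddof=1):.3f}  (theory: {1/np.sqrt(N):.3f})")
```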
5. Overview of test procedures (parametric)
Introduction
- Testing levels
- 1-sample t-test: comparison of a mean value with a target or limit
- t-test: comparison of mean values (unpaired); perform the F-test first
- t-test with equal variances
- t-test with unequal variances
- paired t-test: comparison of paired measurements (x, y)
- Testing outliers
- k SD, Grubbs (http://www.graphpad.com/articles/outlier.htm)
- Dixon's Q (Annex; n = 3 to 25)
- Testing dispersions
- F-test for comparison of variances: F = s_2²/s_1²
- χ² (Chi2)-test, or 1-sample F-test
- Testing variances (several groups)
- Cochran "C"
6. t-tests
t-tests
- Difference between a mean and a target ("one-sample" t-test)
- With 95% CI: x_m ± t_{0.05,ν} · s/√N (s/√N = standard error)
- → t = (µ_0 − x_m)/(s/√N)
- For t:
- degrees of freedom ν = N − 1
- probability α = 0.05
- Important → t-distribution (see "sampling statistics" before)
- Difference between two means
- Perform the F-test first and, depending on its outcome, use the t-test with equal or unequal variances.
- Given independence, the difference between two variables that are normally distributed is also normally distributed.
- The variance of the difference is the sum of the individual variances:
- t = (x_m2 − x_m1)/[s²/N_1 + s²/N_2]^0.5
- where s² is a common estimate of the variance ("pooled variance"):
- s² = [(N_1 − 1)s_1² + (N_2 − 1)s_2²]/(N_1 + N_2 − 2)
- A sketch of both tests follows below.
7. t-test with different variances
t-tests
- The difference is still normally distributed given σ_1 ≠ σ_2, and the difference of the means has the variance σ_1²/N_1 + σ_2²/N_2, which is estimated as s_1²/N_1 + s_2²/N_2.
- However, the t value t = (x_m2 − x_m1)/[s_1²/N_1 + s_2²/N_2]^0.5 does not strictly follow the t-distribution. The problem is mainly of academic interest, and special tables for t have been provided (Behrens, Fisher, Welch).
- → Perform the F-test before the t-test!
- Paired t-test: comparison of mean values (paired data)
- Example: measurements before and after treatment in patients. When testing for a difference with paired measurements, the paired t-test is preferable. This is because such measurements are correlated, and pairing the data reduces the random variation. Thereby, it increases the probability of detecting a difference.
- Calculations (a sketch follows below)
- The individual paired differences are computed:
- dif_i = x_2i − x_1i
- The mean and standard deviation of the N (= N_1 = N_2) differences are computed:
- dif_m = Σ dif_i / N
- s_dif = [Σ (dif_i − dif_m)²/(N − 1)]^0.5
- SE_dif = s_dif/N^0.5
- Testing whether the mean paired difference deviates from zero:
- t = (dif_m − 0)/SE_dif (N − 1 degrees of freedom)
8. Outliers
Outliers
- Outliers have a great influence on parametric statistical tests. Therefore, it is desirable to investigate the data for outliers (see Figure, for example).
- Testing for outliers can be done with the following techniques:
- k SD, Grubbs (http://www.graphpad.com/articles/outlier.htm)
- Dixon's Q (Annex; n = 3 to 25)
- All assume normally distributed data.
- The k SD method (an outlier is a point > k SD away from the mean)
- With this method, it is important to know that the statistical chance of finding an outlier increases with the number of data points investigated.
- Figure: the upper point is an outlier according to the Grubbs test (P < 0.05). A Grubbs sketch follows below.
9. F-test: comparing variances
F-test, χ² (Chi2)-test
- If we have two data sets, we may want to compare the dispersions of the distributions.
- Given normal distributions, the ratio between the variances is considered.
- The variance ratio test was developed by Fisher; therefore, the ratio is usually referred to as the F-ratio and related to tables of the F-distribution.
- Calculation
- F = s_2²/s_1²
- Note: the greater value should be in the numerator → F ≥ 1!
- Example (reproduced in the sketch below)
- F = s_2²/s_1² = (0.228)²/(0.182)² = 1.6, n.s.
- Degrees of freedom:
- df_2 (numerator) = 14 − 1 = 13
- df_1 (denominator) = 21 − 1 = 20
- Critical (0.05) F = 2.25
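The slide's example can be checked with scipy; the SDs and sample sizes below are taken from the example above.

```python
from scipy import stats

s1, n1 = 0.182, 21      # smaller SD -> denominator
s2, n2 = 0.228, 14      # larger SD -> numerator, so that F >= 1

F = s2**2 / s1**2       # ~1.57, i.e. the slide's 1.6 after rounding
F_crit = stats.f.ppf(0.95, dfn=n2 - 1, dfd=n1 - 1)   # alpha = 0.05
print(f"F = {F:.2f}, F_crit(0.05; 13, 20) = {F_crit:.2f}")
print("significant" if F > F_crit else "not significant (n.s.)")
```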
10. F-test (ctd.)
F-test, χ² (Chi2)-test
- χ² (Chi2)-test (or 1-sample F-test)
- Comparing a variance with a target or limit:
- χ²_exp = s_exp² · ν / s_claim² (the claim being a target or limit, e.g. a manufacturer's claim)
- Test whether χ²_exp ≥ χ²_critical (1-sided, 0.05).
- One-sided, because we test versus a target or a limit.
- The χ²-test is used in the CLSI EP5 protocol.
- Relationships between F, t, and χ²
- Relationship between χ² and F:
- χ²/ν = F_{ν,∞} (ν = degrees of freedom)
- Relationship between F and t:
- The one-tailed F-test with 1 and n degrees of freedom is equivalent to the t-test with n degrees of freedom. The relationship t² = F holds for both calculated and tabular values of these two distributions: t(12, 0.05) = 2.17881; F(1, 12, 0.05) = 4.7472 = 2.17881².
- Peculiarities and problems with the EXCEL F-test
- A sketch of the 1-sample test and of the t/F identity follows below.
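A minimal sketch of the 1-sample variance (χ²) test plus a numerical check of the t/F identity; s_exp, the claimed SD, and ν are assumed example values.

```python
from scipy import stats

s_exp, s_claim, nu = 0.25, 0.20, 19        # assumed: found SD, claim, df

chi2_exp = s_exp**2 * nu / s_claim**2
chi2_crit = stats.chi2.ppf(0.95, df=nu)    # 1-sided, alpha = 0.05
print(f"chi2_exp = {chi2_exp:.2f}, chi2_crit = {chi2_crit:.2f}")

# Relationship between t and F: t(df)**2 == F(1, df)
t = stats.t.ppf(1 - 0.05 / 2, 12)          # t(12, 0.05) = 2.1788
F = stats.f.ppf(0.95, 1, 12)               # F(1, 12, 0.05) = 4.7472
print(f"t^2 = {t**2:.4f}, F = {F:.4f}")
```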
11. Interpretation of the P-value
P-values
- A test for statistical significance (at a certain probability P) tests whether a hypothesis, for example the null hypothesis, has to be rejected or not.
- The null hypothesis of the F-test is that the 2 variances are not different, or that an experimentally found difference is due to chance alone.
- The null hypothesis of the F-test is not rejected when the calculated probability P_exp is greater than or equal to the chosen probability P (usually chosen as 0.05 = 5%), or, equivalently, when the experimental F_exp value is smaller than or equal to the critical F_crit value.
- Example (see also the sketch below)
- F_exp (calculated) = 1.554
- Critical value F_crit = 2.637
- P_exp (from experiment) = 0.182
- Chosen probability P = 0.05
- Observation
- The calculated P-value (0.182 = 18%) is greater than the chosen P-value (0.05 = 5%). Correspondingly, the experimental F-value is < the critical F-value.
- Conclusion
- The null hypothesis is not rejected; this means that the difference of the variances is due to chance alone.
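The equivalence of the two decision criteria can be verified numerically. In this sketch the degrees of freedom are hypothetical (the slide does not state them), so the numbers illustrate the rule rather than reproduce the example exactly.

```python
from scipy import stats

F_exp, alpha = 1.554, 0.05
dfn, dfd = 10, 10                             # hypothetical df (assumed)

P_exp = stats.f.sf(F_exp, dfn, dfd)           # 1-sided P for F_exp
F_crit = stats.f.ppf(1 - alpha, dfn, dfd)
print(f"P_exp = {P_exp:.3f}, F_crit = {F_crit:.3f}")
assert (P_exp >= alpha) == (F_exp <= F_crit)  # the two criteria agree
```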
12. Tests and confidence limits
Tests and confidence limits
- We have seen for the 1-sample t-test the close relationship between confidence intervals and significance testing. In many situations, either can be used for the same purpose. Confidence intervals have the advantage that they can be shown in graphs and that they provide information about the spread of an estimate (e.g., of a mean).
- The tables below give an overview of the concordance between CIs and significance testing for means and variances (SDs); a small demonstration follows below.
- Notes to the tables:
- t: 2-sided, or 1-sided (1-sided for comparison with claims).
- When a stable s is known, z may be chosen instead of t.
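A sketch of this concordance for the 1-sample t-test, on assumed data: the target lies outside the 95% CI exactly when P < 0.05.

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 5.4, 5.1, 5.6, 5.3, 5.5])    # assumed sample
target = 5.0                                    # assumed target value

t, p = stats.ttest_1samp(x, popmean=target)
lo, hi = stats.t.interval(0.95, len(x) - 1,
                          loc=x.mean(), scale=stats.sem(x))
outside = target < lo or target > hi
print(f"P = {p:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
assert (p < 0.05) == outside                    # two sides of the same coin
```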
13. Exercises
CINHST, CINHST-EXCEL
- This tutorial/EXCEL template explains the connection between significance tests and confidence intervals when the purpose is null hypothesis significance testing (NHST). Indeed, for the specific purpose of NHST, P-values as well as CIs can be used (look whether the null or target value lies inside or outside the CI); they are just two sides of the same coin.
- Examples are the comparison of
- i) a standard deviation (SD) with a target value,
- ii) two standard deviations,
- iii) a mean with a target value,
- iv) two means, and
- v) a mean paired difference with a target value.
- The statistical tests involved are the 1-sample F-test, F-test, 1-sample t-test, t-test, and the paired t-test, respectively, with the CIs of the SD, F, mean, mean difference, and mean paired difference.
- Another exercise shows how NHST is influenced by
- the magnitude of the difference,
- the number of data points,
- the magnitude of the SD.
- Please follow the guidance given in the "Exercise Icons" and read the comments.
CINHST-Exercise
Grubbs
15. Analysis of variance (ANOVA)
ANOVA
- The three universal assumptions of analysis of variance:
- 1. Independence
- 2. Normality
- 3. Homogeneity of variance
- Overview of the concepts
- Model I (assessing treatment effects)
- Comparison of the mean values of several groups.
- Model II (random effects)
- Study of variances: analysis of components of variance.
- Models I and II: identical computations, but different purposes and interpretations!
- Why ANOVA?
- Model I (assessing treatment effects)
- ANOVA is an extension of the commonly used t-test for comparing the means of two groups.
- The aim is a comparison of the mean values of several groups.
- The tool is an assessment of variances.
16. Introduction: types of ANOVA
ANOVA
- One-way: only one type of classification, e.g. into various treatment groups.
- Example: study of serum cholesterol levels in various treatment groups.
- Two-way: subclassification within treatment groups, e.g. according to gender.
- Example: do various treatments influence serum cholesterol in the same way in men and women? (Not considered further here.)
- Principle of one-way ANOVA (figure): distances within (- - -) and between groups are squared and summed, and finally compared.
- Case 1, null hypothesis valid: no significant difference between groups; the red (between-group) distances are small, and the main source of variation is within groups.
- Case 2, significant difference between groups: the red (between-group) distances are large, and the main source of variation is between groups.
17. Introduction: mathematical model
ANOVA
- One-way ANOVA
- Mathematical model (example: treatment):
- Y_ij = grand mean + α_j (treatment, between-group effect) + ε_ij (within-group error)
- Null hypothesis: the treatment group effects are zero.
- Alternative hypothesis: treatment group effects are present.
- Avoiding some of the pitfalls of ANOVA
- In ANOVA it is assumed that the data are normally distributed. Usually we do not have a large amount of data in an ANOVA, so it is difficult to prove any departure from normality. It has been shown, however, that even quite large deviations do not affect the decisions made on the basis of the F-test.
- A more important assumption of ANOVA is that the variance (spread) between groups is homogeneous (homoscedastic). The best way to avoid this pitfall is, as ever, to plot the data. There also exist a number of tests for heteroscedasticity (e.g., Bartlett's test and Levene's test; a sketch follows below). It may be possible to overcome this type of problem in the data structure by transforming it, such as by taking logs. If the variability within a group is correlated with its mean value, then ANOVA may not be appropriate and/or this may indicate the presence of outliers in the data. Cochran's test can be used to test for variance outliers.
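Both heteroscedasticity tests named above are available in scipy; a minimal sketch on assumed example groups, where the third group is given a deliberately larger spread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(10, 1.0, 15)    # assumed groups; g3 has a larger spread
g2 = rng.normal(10, 1.1, 15)
g3 = rng.normal(10, 2.5, 15)

stat_b, p_b = stats.bartlett(g1, g2, g3)
stat_l, p_l = stats.levene(g1, g2, g3)
print(f"Bartlett: P = {p_b:.4f}   Levene: P = {p_l:.4f}")
# A small P suggests heterogeneous variances -> consider a log transform.
```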
18. Model I ANOVA: violation of assumptions
ANOVA
19. Model I ANOVA: short summary
ANOVA
- Plot your data!
- Generally, the procedure is robust towards deviations from normality.
- However, it is indeed sensitive to outliers, i.e. investigate for outliers within groups.
- When the variance within groups is not constant, e.g. being proportional to the level, a logarithmic transformation may be appropriate.
- Testing for variance homogeneity may be carried out with Bartlett's test.
- Cochran's test can be used to test for variance outliers.
- When F is significant → supplementary analyses (not addressed in more detail here):
- Maximum against minimum (Student-Newman-Keuls procedure)
- Pairwise comparisons with control of the type I error (Tukey)
- Post test for trend (regression analysis)
- Control versus others (Dunnett)
- Control group (C) versus treatment groups: often, the focus is on effects in the treatment groups versus the control group.
20. Model II (random effects) ANOVA
ANOVA
Example: ranges of serum cholesterol in different subjects.
- Model II (random effects) ANOVA
- (analysis of components of variation)
- Mathematical model:
- Y_ij = grand mean + between-group variation α_j (σ_B) + within-group variation ε_ij (σ_W)
21. Total variance (total standard deviation)
ANOVA
- The standard deviation (s) of calculated results (propagation of s); a sketch follows below.
- 1. Sums and differences:
- y = a(s_a) + b(s_b) + c(s_c) → s_y = SQRT(s_a² + s_b² + s_c²) (SQRT = square root)
- Do not propagate CVs!
- 2. Products and quotients:
- y = a(s_a) · b(s_b) / c(s_c) → s_y/y = SQRT[(s_a/a)² + (s_b/b)² + (s_c/c)²]
- 3. Exponents (the x in the exponent is error-free):
- y = a(s_a)^x → s_y/y = x · s_a/a
- Addition of variances: s_tot = SQRT(s_1² + s_2²)
- A large component will dominate.
- This forms the basis for the suggestion by Cotlove et al.: SD_A < 0.5 × SD_I
- A = analytical variation
- I = within-individual biological variation
- → In a monitoring situation, the total random variation of changes is then increased by only up to 12% as long as this relation holds true.
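A short sketch of the three propagation rules and of the Cotlove criterion; all input values are assumed for illustration.

```python
import math

# 1. Sums/differences: propagate absolute SDs
sa, sb, sc = 0.2, 0.3, 0.1
s_sum = math.sqrt(sa**2 + sb**2 + sc**2)

# 2. Products/quotients: propagate relative SDs (CVs)
a, b, c = 10.0, 5.0, 2.0
y = a * b / c
s_prod = y * math.sqrt((sa/a)**2 + (sb/b)**2 + (sc/c)**2)

# 3. Exponents (error-free x): relative SD scales with x
x = 2
s_pow_rel = x * sa / a

# Cotlove: with SD_A = 0.5 * SD_I the total random variation of changes
# rises only ~12% above the biological component alone
SD_I, SD_A = 1.0, 0.5
increase = math.sqrt(SD_I**2 + SD_A**2) / SD_I - 1
print(f"s_sum={s_sum:.3f}  s_prod={s_prod:.3f}  increase={increase:.1%}")
```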
22. Software output
ANOVA
- One-way ANOVA: output of statistical programs
- Variances within and between groups are evaluated.
- Interpretation of model I ANOVA: the F-ratio
- If the ratio of the between- to the within-mean square exceeds a critical F-value (refer to a table, or look at the P-value), a significant difference between the group means has been disclosed.
- F: Fisher published the ANOVA approach in 1918.
- Components of variation
- Relation to the standard output of statistics programs (a sketch follows below):
X_GP = group mean; X_GM = grand mean
df = degrees of freedom (mean square = variance = squared SD)
F = MSB/MSW = (n · SD_B² + SD_W²)/SD_W²
For unequal group sizes, a sort of average n is calculated according to a special formula:
n_0 = [1/(K − 1)] · [N − Σn_i²/N]
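A sketch relating these quantities for simulated balanced data; the group count, group size, and the true variance components are assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
K, n = 4, 10                                # assumed design
sd_between, sd_within = 2.0, 1.0            # assumed true components
groups = [rng.normal(rng.normal(100, sd_between), sd_within, n)
          for _ in range(K)]

F, p = stats.f_oneway(*groups)              # F = MSB / MSW

# Mean squares and model II variance components (equal group sizes)
grand = np.concatenate(groups).mean()
MSB = n * sum((g.mean() - grand)**2 for g in groups) / (K - 1)
MSW = np.mean([g.var(ddof=1) for g in groups])
var_between = max((MSB - MSW) / n, 0.0)     # estimate of SD_B^2
print(f"F = {F:.2f} (P = {p:.4f}), SD_B = {var_between**0.5:.2f}, "
      f"SD_W = {MSW**0.5:.2f}")
```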
23. Conclusion
ANOVA
- Model I ANOVA
- A general tool for assessing differences between group means.
- Model II ANOVA
- Useful for assessing components of variation.
- Nonparametric ANOVA (a sketch follows below)
- Kruskal-Wallis test: a generalization of the Mann-Whitney test to deal with > 2 groups.
- Friedman's test: a generalization of Wilcoxon's paired rank test to more than two repeats.
- The study of components of variation is not suitable for nonparametric analysis.
- Software
- ANOVA is included in standard statistical packages (SPSS, BMDP, StatView, STATA, StatGraphics, etc.).
- Variance components may be given, or they may be derived from the mean squares as outlined in the tables.
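Both nonparametric tests are available in scipy; a minimal sketch on assumed data.

```python
from scipy import stats

a = [1.1, 2.3, 1.9, 2.8]   # assumed groups / repeated measurements
b = [3.0, 2.9, 3.8, 3.3]
c = [2.0, 2.2, 2.5, 2.1]

print(stats.kruskal(a, b, c))            # > 2 independent groups
print(stats.friedmanchisquare(a, b, c))  # > 2 repeated measurements
```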
24. Exercises
CochranBartlett
- Many statistical programs do not include the Cochran or Bartlett test. Therefore, they have been elaborated in an EXCEL file.
- The CochranBartlett file contains the formulas for
- the Cochran test for an outlying variance (including the critical values; a sketch follows below),
- the Bartlett test for variance homogeneity.
- Both are important for ANOVA.
- A calculation example is included.
- More experienced EXCEL users may be able to adapt this template to their own applications.
- This tutorial contains interactive exercises for self-education in analysis of variance (ANOVA).
- ANOVA can be used for 2 purposes:
- Model I (assessing treatment effects): comparison of the MEAN values of several groups.
- Model II (random effects)
ANOVA
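Cochran's C itself is simple to compute; the critical value below uses the standard F-based relation, stated here as an assumption to be checked against the tables in the EXCEL file.

```python
from scipy import stats

def cochran_C(variances):
    """Cochran's C: largest variance relative to the sum of all variances."""
    return max(variances) / sum(variances)

def cochran_C_crit(k, n, alpha=0.05):
    """Upper critical value for k groups of n replicates each (assumed
    F-based relation; verify against published Cochran tables)."""
    f = stats.f.ppf(1 - alpha / k, n - 1, (k - 1) * (n - 1))
    return 1.0 / (1.0 + (k - 1) / f)

s2 = [0.12, 0.15, 0.10, 0.75, 0.13]      # assumed group variances
C, crit = cochran_C(s2), cochran_C_crit(k=len(s2), n=5)
print(f"C = {C:.3f}, C_crit = {crit:.3f}")
print("outlying variance" if C > crit else "no outlying variance")
```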
27. The statistical power concept and sample-size calculations
Power and sample size
- When testing statistical hypotheses, we can make 2 types of errors: the so-called type I (or α) error and the type II (or β) error. The power of a statistical test is defined as 1 − β. The power concept is demonstrated in the figure below, denoting the probability of the α-error by p and that of the β-error by q. Like significance testing, power calculations can be done 1- and 2-sided.
- Purpose of power analysis and sample-size calculation
- Some key decisions in planning any experiment are: "How precise will my parameter estimates tend to be if I select a particular sample size?" and "How big a sample do I need to attain a desirable level of precision?"
- Power analysis and sample-size calculation allow you to decide (a) how large a sample is needed to enable statistical judgments that are accurate and reliable, and (b) how likely your statistical test will be to detect effects of a given size in a particular situation.
28. The statistical power concept and sample-size calculations (ctd.)
Power and sample size
- Calculations (a sketch follows below)
- Definitions
- z_p/2 = normal deviate for the null hypothesis probability
- (usually 95%, 1- or 2-sided; e.g. z = 1.65 or 1.96)
- z_1−q = normal deviate for the alternative hypothesis probability
- (usually 90%, always 1-sided; e.g. z_1−q = 1.28)
- N = number of measurements to be performed
- Mean versus a target value:
- N = [SD/(mean − target)]² × (z_p/2 + z_1−q)²
- Detecting a relevant difference (gives the number required in each group):
- N = (SD_Delta/Delta)² × (z_p/2 + z_1−q)²
- Delta = difference to be detected
- SD_Delta = SQRT(SD_x² + SD_y²); usually SD_x = SD_y → SD_Delta = √2 × SD
- (requires previous knowledge of the SD)
- Example: difference between 2 groups
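A minimal sketch of both formulas; the α/β conventions follow the slide (α 2-sided, β 1-sided), while SD, the means, and Delta are assumed example inputs.

```python
from scipy import stats

alpha, beta = 0.05, 0.10
z_a = stats.norm.ppf(1 - alpha / 2)      # 1.96 (2-sided alpha)
z_b = stats.norm.ppf(1 - beta)           # 1.28 (1-sided beta)

# Mean versus a target value
SD, mean, target = 2.0, 10.0, 9.0        # assumed values
N_target = (SD / (mean - target))**2 * (z_a + z_b)**2
print(f"N (vs target) = {N_target:.1f} -> round up")

# Difference between 2 groups (per group); SD_Delta = sqrt(2) * SD
delta = 1.0                              # assumed relevant difference
SD_delta = (2 * SD**2) ** 0.5
N_group = (SD_delta / delta)**2 * (z_a + z_b)**2
print(f"N per group = {N_group:.1f} -> round up")
```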
29. Exercises
Power
- This file contains 2 worksheets that explain the power concept and allow simple sample-size calculations.
- Please use dedicated software for routine power calculations.
- Concept
- Use the respective "Spinners" to change the values (or enter the values directly in the blue cells) for:
- Mean
- SD
- For comparison of a sample mean versus a target, use the sample SD.
- For comparison of 2 sample means with the same SD, use SD_Delta = SQRT(2) × SD.
- Sample size
- Significance level (only with the Spinner!)
- Limited to the same value for the α- and β-error!
- NOTE: α 2-sided, β 1-sided!
- → Observe the effect on the power.
- Calculations