Title: Statistics
1. Statistics
- Hypothesis testing, Part 2
- richard.bailey_at_ouce.ox.ac.uk
2. The world is highly variable, and definite, clear predictions/tests are often hard to come by. There are also practical and theoretical difficulties with falsification; variability is one such problem. Hence: statistical inference and probability statements.
3. Last week: comparison of differences between means
- Required interval-scale data; approximately Normally-distributed parent populations for application to small samples (n < 30); any distribution for larger samples (n > 30)
- Based on the distribution of differences in means we would see if we were to draw two large random samples from a single population, many times (a simulation sketch follows the figure)
[Figure: sampling distribution of the difference between means. y-axis: expected probability or frequency; x-axis: difference between the means, centred on 0.]
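The idea behind the figure can be reproduced with a short simulation. The sketch below is not part of the original slides; the population parameters, sample size, and number of repeats are invented purely for illustration.

# Sketch: simulating the sampling distribution of the difference between
# two sample means drawn repeatedly from one population.
# All numbers (population mean/sd, sample size) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=50.0, scale=10.0, size=100_000)

diffs = []
for _ in range(5_000):
    a = rng.choice(population, size=30)   # first random sample, n = 30
    b = rng.choice(population, size=30)   # second random sample, n = 30
    diffs.append(a.mean() - b.mean())

diffs = np.array(diffs)
print(f"mean difference : {diffs.mean():.3f}  (centred near 0)")
print(f"spread of diffs : {diffs.std():.3f}  (what chance sampling alone produces)")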
4. Difficulties
- Not always obvious what specific difference we are looking for: the mean/variance are not always the most interesting parameters
- Sometimes comparing the average isn't meaningful, e.g. a bimodal and a unimodal grain-size distribution with the same mean
- The observed distribution may not even follow a known theoretical distribution; in nature many samples have distributions with no theoretical basis
5. Data may not be interval scale
- The average doesn't mean anything here: an average colour?
- We might need to compare these to other similar samples and look for differences; we can't use a t-test
- We still need to look for differences, in some/all aspects of the distribution shape
- It would be useful to test whether samples are the same (from the same parent distribution), irrespective of what that distribution is, and irrespective of what the difference between the two samples is
- Hence distribution-free test statistics
6. Parametric and non-parametric statistics
- Parametric statistics
- Independent random samples; interval-scale data
- Values of interest are close to normally distributed, i.e. the sampling distribution can be reduced to a small (and known) number of parameters: mean, standard deviation, standard error, etc.
- Parametric statistics are used to analyze such data sets; more powerful than non-parametric tests
- Non-parametric (distribution-free) statistics
- Independent random samples; nominal or ordinal data scales (i.e. non-numerical), so parameters such as mean and standard deviation are meaningless
- Alternatively, non-normally-distributed interval-scale data (e.g. the distribution of wages)
- No assumption is made with regard to the underlying distribution; applicable in more situations but less powerful than parametric tests
7. Tests for normality of data
- Histogram shape: mean ≈ median ≈ mode (approximately)
- Data proportions (from normal distribution tables, z-scores)
- More sophisticated tests, e.g. the Shapiro-Wilk W-test
- Four choices for non-parametric data:
- Apply parametric statistics anyway, hoping it won't cause too much of a problem; there is a lot of evidence to suggest that parametric tests are quite robust (i.e. insensitive to moderate violations of their assumptions)
- Transform the data (e.g. a log-transformation) then use a parametric test; often quite straightforward and may work well, but there are many cases where this is not possible
- Use a non-parametric test; less powerful but not bound by strict requirements/assumptions, and works for lower levels of data (i.e. ordinal and nominal, as well as interval); power-efficiency is not as high as for parametric tests, so more data are needed to reach an equivalent level of certainty
- Apply both parametric and non-parametric tests to the same data set; can be useful if both agree, but a source of further concern if they don't! (A sketch of the normality check and log-transform route follows.)
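As a minimal sketch of the normality check and the transformation option (not from the slides; the wage-like data are invented), scipy's Shapiro-Wilk test can be applied before and after a log-transformation:

# Sketch: normality check with the Shapiro-Wilk W-test, then a
# log-transformation of skewed interval data. Data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
wages = rng.lognormal(mean=3.0, sigma=0.6, size=40)   # skewed, wage-like data

w_raw, p_raw = stats.shapiro(wages)            # raw data
w_log, p_log = stats.shapiro(np.log(wages))    # after log-transformation

print(f"raw: W = {w_raw:.3f}, p = {p_raw:.3f}")   # small p: normality doubtful
print(f"log: W = {w_log:.3f}, p = {p_log:.3f}")   # larger p: closer to Normal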
8. Back to hypothesis testing.
- Average not necessarily meaningful
- Underlying distribution unknown
- Comparison of DISTRIBUTIONS rather than single descriptors (e.g. mean, variance)
- Often faced with nominal or ordinal data
What we need: a distribution of expected differences between multiple samples, of any data type, having any distribution and showing any kind of difference (!)
9. χ² (chi-squared) tests for difference
- No assumptions made regarding the underlying population sampled
- Any observed distribution may be compared to any other empirical or theoretical distribution
- Works on frequency data, so it is possible to compare data in different measurement units (e.g. concentrations of pollutants with species counts), or ordinal data
- This test summarises any and all differences between samples, as represented by their frequency distributions: differences in mean, variance, skewness, etc. are all summarised in a single test statistic
10. The χ² distribution (chi-squared)
Comparison of the test statistic with a critical value from tables (χ²calc and χ²crit).
- The χ² distribution is positively skewed, ranging from 0 to infinity; its form depends on the degrees of freedom ν (commonly ν = k − 1, where k is the number of classes the data have been grouped into). As k increases, the χ² distribution more closely resembles the Normal distribution
- The value of χ²calc would be zero for two identical samples; larger values indicate greater difference between the samples (always 1-tailed tests; a look-up sketch follows)
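The table look-up step can be reproduced with scipy's χ² distribution. This is a sketch assuming scipy is available; the α and ν values are only examples:

# Sketch: chi-squared critical values for a 1-tailed test at significance alpha.
from scipy import stats

alpha = 0.10                     # example significance level
for nu in (1, 3, 5, 10, 30):     # example degrees of freedom, nu = k - 1
    crit = stats.chi2.ppf(1 - alpha, df=nu)
    print(f"nu = {nu:2d}   chi2_crit = {crit:6.2f}")
# chi2_calc = 0 for two identical samples; reject H0 when chi2_calc > chi2_crit.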
11. Formal procedure (as for the t-test)
- Formulate the null hypothesis (H0) and alternative hypothesis (H1) (always a 1-tailed test)
- Decide on the level of significance, α
- Look up the critical value (the threshold for rejecting H0)
- Calculate the test statistic and compare it to the critical value
- If the test statistic is beyond the critical value, reject H0 at α, i.e. with a level of confidence equal to 100(1 − α)%
(A sketch of this procedure in code follows.)
12. Difference between observed and expected
- Observed values
- Frequency data: already organized into classes (e.g. days of the week)
- Interval (continuous) data, which must be put into classes (e.g. chemical concentrations, put into classes 0-1 ppm, 1-2 ppm, etc.)
- Ordinal data, again put into classes
- The total number of observations is n, divided into a number of different categories/classes (k)
- Expected values: what we would expect if there were no difference in the POPULATIONS
- Uniform distribution (no dependence on class in the population), e.g. 50 absent each week
- Expected value for each class = n / k (an even spread over all classes)
- Formally, Ei = n / k (see the binning sketch below)
13. Further considerations
- k must be chosen so that > 5 observations are expected in each class, ideally with k > 10 (by implication, n > 50 for expected uniform distributions)
- The boundaries of each class cannot overlap, but they can be adjusted freely and need not be of equal size
[Figure: χ² distribution with the total amount of difference on the x-axis: near-zero difference and enormous difference are both very unlikely; a moderate difference is most likely.]
14. Example: one-sample χ² test for difference
- Are there any significant frequency differences between the classes?
- Are any of the tills significantly different in terms of the number of flint pebbles they contain?
H0: there is no difference in frequencies between any of the classes (the same population was sampled, or at least one with the same frequency distribution).
H1: there is a difference between the classes (different populations).
χ²calc = Σ (Oi − Ei)² / Ei, summed over the k classes
15. ν = h − 1 = 4 − 1 = 3
χ²crit = 6.25 (obtained from χ² tables: 90% confidence, α = 0.1, ν = 3)
χ²calc = 6.5
- Reject H0 with 90% confidence: a significant difference has been observed between the observed and expected frequency data; there is only a 10% chance that this observation results from chance sampling. It doesn't tell us what/where the differences are. (A sketch of this calculation follows, with hypothetical counts.)
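The calculation on this slide could be reproduced as below. The slide's pebble counts are not shown in this transcript, so the counts used here are hypothetical, chosen only to give a χ²calc of similar size:

# Sketch: one-sample chi-squared test, four classes, alpha = 0.1.
# The flint-pebble counts below are hypothetical, not the slide's data.
import numpy as np
from scipy import stats

observed = np.array([33, 17, 29, 21])                  # hypothetical counts, n = 100
chi2_calc, p_value = stats.chisquare(observed)         # expected defaults to n / k

nu = observed.size - 1                                 # nu = h - 1 = 3
chi2_crit = stats.chi2.ppf(1 - 0.10, df=nu)            # 90% confidence

print(f"chi2_calc = {chi2_calc:.2f}, chi2_crit = {chi2_crit:.2f}, p = {p_value:.3f}")
# chi2_calc > chi2_crit here, so H0 is rejected at 90% confidence for these counts.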
16. Multiple samples and categories
- Simultaneously compare the frequency proportions in many samples
- Are there any differences in the proportions in any of the categories?
- We need to calculate expected frequencies that are 'adjusted' to take account of (i) the size of each sample, and (ii) the 'average' frequency in each category.
17. Multiple samples, multiple categories
- k samples (Tills), h categories (lithologies) (see the contingency-table sketch below)
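A sketch of the multiple-sample case. The counts are invented; scipy's chi2_contingency computes the adjusted expected values described on slide 16, Eij = (row total × column total) / grand total:

# Sketch: k samples (tills) x h categories (lithologies), hypothetical counts.
import numpy as np
from scipy import stats

table = np.array([[30, 12, 8],     # till 1: counts in each lithology class
                  [25, 20, 5],     # till 2
                  [18, 15, 17]])   # till 3

chi2_calc, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2_calc = {chi2_calc:.2f}, dof = {dof}, p = {p_value:.4f}")
print(expected)   # E_ij = (row total * column total) / grand total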
19. Test for a non-uniform distribution
- A uniform expected distribution is not a requirement for the expected values
- E.g. we may want to test whether frequencies follow a particular non-uniform distribution, e.g. a Normal distribution (see the sketch below)
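A sketch of testing against a Normal expectation. The class boundaries and counts are invented; if the Normal's parameters were estimated from the data themselves, the degrees of freedom would need further reduction (the ddof argument of chisquare):

# Sketch: chi-squared test against a non-uniform (Normal) expected distribution.
import numpy as np
from scipy import stats

edges = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf])   # class boundaries (z-scores)
observed = np.array([14, 36, 33, 17])                  # hypothetical counts, n = 100

p_class = np.diff(stats.norm.cdf(edges))               # Normal proportion per class
expected = p_class * observed.sum()                    # non-uniform expected values

chi2_calc, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi2_calc = {chi2_calc:.2f}, p = {p_value:.3f}")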
20. Example research questions
- Tree species in different forest plots
- H0: no difference between species distributions in different plots
- H1: there is a difference
- Unemployment in different age groups in different cities
- H0: no difference between unemployment patterns in different cities
- H1: there is a difference
- The ability to test the basic ideas that underpin more general theories by testing specific hypotheses
21. Limitations of the χ² test
- Data must be in the form of frequencies (can be converted from interval data)
- The contingency table must have 2 or more categories (columns)
- Expected frequencies should be > 5 (20% can be < 5 if the table is larger than 2x2, but none < 1)
- Samples are assumed to be independent and randomly chosen
22. Other non-parametric tests for difference between samples
- Two-sample test: Kolmogorov-Smirnov test, D-statistic
- Takes the place of chi-squared where sample sizes are low
- Requires both samples to have the same n
- Mann-Whitney two-sample test, U-statistic
- Similar to the two-sample chi-squared and Kolmogorov-Smirnov tests
- Particularly sensitive to differences between means (the others are sensitive to any kind of difference, e.g. dispersion or skewness)
- Samples can have different n
- The most powerful (non-parametric) alternative to the t-test
- Multiple samples: Kruskal-Wallis test, H-statistic
- Difference between matched samples: Wilcoxon test, T-statistic
(Sketches of these tests in scipy follow.)
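These tests are all available in scipy; the sketch below (with invented samples) shows the corresponding calls:

# Sketch: the other non-parametric tests listed above, via scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(10, 2, size=15)     # hypothetical samples
b = rng.normal(12, 2, size=15)
c = rng.normal(11, 2, size=20)

print(stats.ks_2samp(a, b))        # Kolmogorov-Smirnov two-sample test, D-statistic
print(stats.mannwhitneyu(a, b))    # Mann-Whitney test, U-statistic (n may differ)
print(stats.kruskal(a, b, c))      # Kruskal-Wallis test, H-statistic (multiple samples)
print(stats.wilcoxon(a, b))        # Wilcoxon matched-samples test, T-statistic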