Title: Hypothesis Tests
- We collect data in order to learn about the processes and systems those data represent. Often we have prior ideas (hypotheses) of how the system behaves.
- Statistical tests are the most quantitative way to determine whether hypotheses can be substantiated, or whether they must be modified or rejected outright.
- One important use of hypothesis tests is to evaluate and compare groups of data, e.g. comparing water quality between 2 or more aquifers, and making statements as to which are different.
- Hypothesis tests have at least two advantages over educated opinions:
- 1. They ensure that every analyst of a data set using the same methods will arrive at the same result. Computations can be checked and agreed to by others.
- 2. They provide a measure of the strength of the evidence (the p-value). The decision to reject a hypothesis is augmented by the risk (α) of that decision being incorrect.
Classification of Hypothesis Tests
- Hypothesis tests can be classified into five major types based on the measurement scales of the data being tested.
- Within these types, the distributional shape of the data determines which of the two major divisions of hypothesis tests, parametric or nonparametric, is appropriate for use.
- i.e. type of data + objectives → test procedure to use
Classification Based on Measurement Scales
- Y axis: response variable or dependent variable.
- X axis: explanatory variable; explains why and how the response variable changes.
- Measurement scales can be either continuous or categorical.
- Examples of continuous variables: concentration, stream flow, porosity, temperature, etc.
- Examples of categorical variables: aquifer type, month, well, land use, group, station number, etc.
- Categorical variables used as response variables include above/below a detection limit (0 or 1), presence or absence of a particular species, etc.
Classification Based on Data Distribution
- Parametric tests: assume a particular distribution, usually normal.
- Nonparametric tests: distribution free (any distribution, including censored data).
- Parametric tests are more efficient only if the data truly follow the assumed distribution. If they do not, the resulting test can reach an incorrect conclusion because it lacks the power to detect real effects.
- Nonparametric (NP) tests are only 5 to 15% less efficient than the parametric procedures if the data are truly normal. The NP tests are far more efficient (2 to 3 times more efficient) if the data are skewed.
- Therefore, in general, NP tests should be the default for environmental data. This is because environmental data are usually skewed, censored, have outliers, etc.
- Also, NP tests are invariant to monotonic data transformations. E.g. the logs of the data and the original values have the same ranks, so they give the same results.
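
The invariance claim can be checked with a minimal sketch that uses only the Python standard library. The two groups are hypothetical concentrations invented for illustration, and `midranks` and `rank_sum` are helper names introduced here, not taken from the slides. Because the log is a monotonic transformation, the joint ranks, and therefore the rank-sum statistic, should not change.

```python
import math

def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(x, y):
    """Rank-sum statistic: sum of the joint ranks of the first group."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])

# Hypothetical, right-skewed concentrations (illustrative values only).
group_a = [0.4, 1.1, 2.3, 2.9, 15.0]
group_b = [0.9, 3.5, 4.8, 7.2, 60.0, 110.0]

w_raw = rank_sum(group_a, group_b)
w_log = rank_sum([math.log(v) for v in group_a],
                 [math.log(v) for v in group_b])
print("rank sum on raw data:", w_raw)
print("rank sum on logs:    ", w_log)
```

A t-test on the same data would not share this property, since the group means and variances change under the log transformation.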
Versions of Nonparametric Tests
- i. Exact test
- The p-values computed are exactly correct. Usually used for small sample sizes.
- ii. Large sample approximation test
- Approximate p-values are obtained by assuming that the distribution of the test statistic (not the data) can be approximated by some common distribution, e.g. normal.
- iii. Rank transformation test
- Use parametric procedures on the ranks of the data instead of the data themselves. This method is useful if the computer software does not have nonparametric procedures, or when no theoretical nonparametric procedures are available, e.g. multi-factor ANOVA.
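
To make the difference between versions i and ii concrete, here is a hedged sketch for the rank-sum test using only the Python standard library. The exact version enumerates every possible assignment of ranks to the first group; the large-sample version approximates the distribution of the rank sum W by a normal distribution with a continuity correction. The data are hypothetical, the function names are introduced here for illustration, and the sketch assumes there are no tied values. The rank transformation version (iii) is not shown because it needs a t distribution, which the standard library does not provide.

```python
import math
from itertools import combinations
from statistics import NormalDist

def rank_sum_exact(x, y):
    """Return (W, exact two-sided p-value) for the rank-sum test; assumes no ties."""
    n, m = len(x), len(y)
    joint = sorted(list(x) + list(y))
    rank_of = {value: rank for rank, value in enumerate(joint, start=1)}
    w_obs = sum(rank_of[value] for value in x)
    mean_w = n * (n + m + 1) / 2          # expected W when Ho is true
    dev_obs = abs(w_obs - mean_w)
    total = extreme = 0
    # Under Ho every choice of n ranks for the first group is equally likely.
    for ranks in combinations(range(1, n + m + 1), n):
        total += 1
        if abs(sum(ranks) - mean_w) >= dev_obs:
            extreme += 1
    return w_obs, extreme / total

def rank_sum_normal_approx(w_obs, n, m):
    """Two-sided large-sample p-value for W, with a continuity correction."""
    mean_w = n * (n + m + 1) / 2
    sd_w = math.sqrt(n * m * (n + m + 1) / 12)
    z = max(abs(w_obs - mean_w) - 0.5, 0.0) / sd_w
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical concentrations for two groups (illustrative values only).
group_a = [0.4, 1.1, 2.3, 2.9, 15.0]
group_b = [0.9, 3.5, 4.8, 7.2, 60.0, 110.0]

w, p_exact = rank_sum_exact(group_a, group_b)
p_approx = rank_sum_normal_approx(w, len(group_a), len(group_b))
print(f"W = {w}, exact p = {p_exact:.4f}, approximate p = {p_approx:.4f}")
```

Enumeration is only practical for small samples, which matches the slide's note that exact tests are usually used for small sample sizes.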
Structure of Hypothesis Tests
- Hypothesis tests are performed following the structure below.
- 1. Choose the appropriate test.
- 2. Establish the null and alternate hypotheses.
- 3. Decide on an acceptable error rate α.
- 4. Compute the test statistic for the data.
- 5. Compute the p-value.
- 6. Reject the null hypothesis if p ≤ α.
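
The six steps above can be sketched end to end for one case: matched pairs tested with an exact Wilcoxon signed-rank test. This is an illustration under stated assumptions, not a prescribed implementation: the paired differences are hypothetical, `signed_rank_exact` is a name introduced here, and the sketch assumes no zero differences and no tied absolute differences.

```python
from itertools import product

def signed_rank_exact(differences):
    """Return (W+, exact two-sided p-value) for the signed-rank test."""
    n = len(differences)
    abs_sorted = sorted(abs(d) for d in differences)
    rank_of = {a: rank for rank, a in enumerate(abs_sorted, start=1)}
    # Step 4: test statistic = sum of the ranks of the positive differences.
    w_plus = sum(rank_of[abs(d)] for d in differences if d > 0)
    mean_w = n * (n + 1) / 4              # expected W+ when Ho is true
    dev_obs = abs(w_plus - mean_w)
    # Step 5: under Ho each rank is equally likely to carry a + or - sign,
    # so enumerate all 2**n sign patterns and count those at least as extreme.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if abs(w - mean_w) >= dev_obs:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Step 1: matched pairs, continuous response -> signed-rank test.
# Step 2: Ho: median difference = 0; Ha: median difference != 0 (two-sided).
# Step 3: acceptable error rate.
alpha = 0.05
# Hypothetical "after minus before" differences (illustrative values only).
differences = [1.2, 0.8, 2.5, -0.3, 1.9, 3.1, 0.6, 1.4]

w_plus, p_value = signed_rank_exact(differences)
# Step 6: decision.
decision = "reject Ho" if p_value <= alpha else "do not reject Ho"
print(f"W+ = {w_plus}, p = {p_value:.4f}, decision: {decision}")
```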
Step 1: Choose the Appropriate Test
- The test is chosen based on the measurement scales of the data, the objective of the test, and the distribution of the data.
- e.g. testing differences in central values of two or more groups of data with continuous response variables.
- Here, either the parametric t-test (only if the data are normally distributed and have equal variances) or the nonparametric rank-sum test might be selected.
Guide to classification of some hypothesis tests

PARAMETRIC | NONPARAMETRIC (exact, approximation) | RANK TRANSFORM
- Two Independent Data Groups
  two-sample t-test | rank-sum test (or Mann-Whitney, or Wilcoxon-Mann-Whitney) | t-test on ranks
- Matched Pairs of Data
  paired t-test | (Wilcoxon) signed-rank test | t-test on signed ranks
- More than Two Independent Data Groups
  1-way Analysis of Variance (ANOVA) | Kruskal-Wallis test | 1-way ANOVA on ranks
- More than Two Dependent Variables
  Two-way ANOVA | Friedman test | 2-way ANOVA on ranks
  Multi-way ANOVA | - | Multi-way ANOVA on ranks
- Correlation between Two Continuous Variables
  Pearson's r (or linear correlation) | Kendall's tau | Spearman's rho (Pearson's r on ranks)
- Relation between Two Continuous Variables
  Linear regression (test for slope = 0) | Mann-Kendall (test for slope = 0) | regression on ranks (test for monotonic change)
Step 2: Establish the Null and Alternate Hypotheses
- These should be established prior to collecting the data.
- The null hypothesis (Ho) is what is assumed to be true about the system under study prior to data collection, until indicated otherwise.
- The alternate hypothesis (Ha or H1) is the situation anticipated to be true if the evidence (the data) shows that the null hypothesis is unlikely.
Types of Hypothesis Tests
- Two-sided tests occur when evidence in either direction from the null hypothesis (larger or smaller, positive or negative) would cause the null hypothesis to be rejected in favour of the alternative hypothesis. Is there a change, or is there any difference?
- One-sided tests occur when departures in only one direction from the null hypothesis would cause the null hypothesis to be rejected in favour of the alternative hypothesis. Is there an increase, or is there a decrease?
- If it cannot be stated prior to looking at any data that departures from Ho in only one direction are of interest, a two-sided test should be performed.
- Two-sided tests are more common when dealing with environmental problems. However, examples where one-sided tests would be appropriate include:
- 1. testing for decreased annual floods or downstream sediment loads after completion of a flood control dam,
- 2. testing for decreased nutrient loads or concentrations due to a new sewage treatment plant or best management practice,
- 3. testing for an increase in concentration when comparing a suspected contaminated site to an upstream or upgradient control site. E.g. have barium concentrations gone up since the GBS went into operation?
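
The practical effect of choosing a one-sided or two-sided test shows up in the p-value. The sketch below assumes a test statistic z that is standard normal under Ho (as in a large-sample approximation); the value z = 1.8 and the function name are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """p-values for a statistic that is standard normal when Ho is true."""
    phi = NormalDist().cdf
    return {
        "two-sided": 2 * (1 - phi(abs(z))),   # Is there any difference?
        "one-sided (increase)": 1 - phi(z),   # Is there an increase?
        "one-sided (decrease)": phi(z),       # Is there a decrease?
    }

# Hypothetical test statistic (illustrative value only).
p = p_values_from_z(1.8)
for alternative, value in p.items():
    print(f"{alternative}: p = {value:.4f}")
```

For the same data, the one-sided p-value in the direction of the observed departure is half the two-sided one, which is why the direction has to be fixed before looking at the data.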
Step 3: Decide on an Acceptable Error Rate α
- The α-value, or significance level, is the probability of incorrectly rejecting the null hypothesis (rejecting Ho when it is in fact true, called a Type I error). Traditionally, α = 5%, but other values such as 20% or 10% can be used.
- This is only one of four possible outcomes of a hypothesis test.
- Since α represents one type of error, why not keep it as small as possible? One way to do this would be to never reject Ho; α would then equal zero. This is like throwing away the batteries in a fire alarm.
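
The trade-off behind the fire alarm analogy can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: a two-sided z-test of Ho: mean = 0 with known standard deviation 1, samples of size 20, a hypothetical true shift of 0.5 for the "real effect" case, and a fixed random seed. With Ho true, the rejection rate estimates the Type I error rate; with the shift present, it estimates the power.

```python
import random
from statistics import NormalDist

def rejection_rate(alpha, true_mean, n=20, trials=2000, seed=1):
    """Fraction of simulated two-sided z-tests of Ho: mean = 0 that reject Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5  # sigma = 1 is assumed known
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

results = {}
for alpha in (0.01, 0.05, 0.20):
    type_i = rejection_rate(alpha, true_mean=0.0)  # Ho true: false alarms
    power = rejection_rate(alpha, true_mean=0.5)   # Ho false: detections
    results[alpha] = (type_i, power)
    print(f"alpha = {alpha:.2f}: Type I rate ~ {type_i:.3f}, power ~ {power:.3f}")
```

Shrinking α lowers the false alarm rate but also lowers the chance of detecting a real effect, which is the cost the analogy points at.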
Step 4: Compute the Test Statistic From the Data
- Once a decision is made as to an acceptable Type I risk α, two steps can be taken to concurrently reduce the risk of a Type II error (failing to reject Ho when it is in fact false):
- 1. Increase the sample size n.
- 2. Use the test procedure with the greatest power for the type of data being analyzed, i.e. the choice of parametric vs. nonparametric tests is very important. Power loss in parametric tests increases as skewness and the number of outliers increase.
- Test statistics summarize the information contained in the data. If the test statistic is not unusually different from what is expected to occur if the null hypothesis is true, the null hypothesis is not rejected.
- If the test statistic is a value unlikely to occur when Ho is true, the null hypothesis is rejected. The p-value measures how unlikely the test statistic is when Ho is true.
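
The effect of sample size on the Type II risk can be sketched analytically for one simple case. The assumptions here go beyond the slides: a one-sided z-test for an increase in the mean, normally distributed data with known standard deviation sigma, and a hypothetical true shift delta. Under those assumptions the power is 1 - Φ(z_(1-α) - delta·√n / sigma), where Φ is the standard normal CDF.

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test to detect a true mean shift of size delta."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)  # critical value for a one-sided test
    return 1 - std.cdf(z_crit - delta * n ** 0.5 / sigma)

# Hypothetical shift and spread (illustrative values only).
sample_sizes = (5, 10, 20, 40, 80)
powers = [z_test_power(delta=0.5, sigma=1.0, n=n) for n in sample_sizes]
for n, power in zip(sample_sizes, powers):
    print(f"n = {n:3d}: power = {power:.3f}, Type II risk = {1 - power:.3f}")
```

With α held fixed, the Type II risk falls as n grows. When there is no shift at all (delta = 0), the same formula returns α, the Type I risk.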
Step 5: Compute the p-Value
- The p-value is the probability of obtaining the computed test statistic, or one even less likely, when the null hypothesis is true. The smaller the p-value, the less likely the observed test statistic is when Ho is true, and the stronger the evidence for rejection of the null hypothesis.
- The p-value is also called the attained significance level: the significance level attained by the data.
- The α-level does not depend on the data, but states the risk of making a Type I error that is acceptable a priori to the scientist. The α-value is the critical value which allows a yes/no decision to be made.
- The p-value provides more information: the strength of the scientific evidence (e.g. p = 0.049 vs. p = 0.0001). Reporting the p-value allows someone with a different risk tolerance (different α) to make their own yes/no decision.
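
A short sketch of why reporting the p-value is more useful than reporting only the yes/no decision: the same two p-values from the slide are compared against several α-levels. The α-levels and the `decide` helper are chosen here for illustration.

```python
def decide(p_value, alpha):
    """Yes/no decision for a given risk tolerance alpha."""
    return "reject Ho" if p_value < alpha else "do not reject Ho"

decisions = {}
for p_value in (0.049, 0.0001):
    for alpha in (0.10, 0.05, 0.01):
        decisions[(p_value, alpha)] = decide(p_value, alpha)
        print(f"p = {p_value}, alpha = {alpha}: {decisions[(p_value, alpha)]}")
```

A reader who requires α = 0.01 reaches a different decision for p = 0.049 than one who accepts α = 0.05, while p = 0.0001 leads to rejection at all three levels.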
Step 6: Make the Decision to Reject Ho or Not
- Reject Ho when p-value < α-level.
- When the p-value is greater than α, Ho is not rejected. The null hypothesis is never accepted, or proven to be true. It is assumed to be true until proven otherwise, and is not rejected when there is insufficient evidence to do so.
- This is similar to a court case: not guilty is not the same as innocent! There just isn't enough evidence to prove guilt (e.g. the O.J. Simpson case).
Example of Hypothesis Testing
- Testing for Normality of a Data Set
- Ho: The data are normally distributed.
- Ha: The data are not normally distributed.
- Test statistic: r, the correlation coefficient between the data and their normal quantiles.
- r is tested to see if it is significantly less than 1.0.
- Reject Ho when r < r* (the critical value of r) at the given α-level,
- or when p-value < α-level.
- Use of a larger α-level will increase the power to detect non-normality.
- This is recommended when testing for normality, especially for small sample sizes.
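
The test statistic r in this example can be computed with a few lines of standard-library Python. This is a sketch under assumptions: the slides do not say which plotting positions are used for the normal quantiles, so the Blom positions (i - 0.375) / (n + 0.25) are an assumption made here, and both data sets are hypothetical. The critical value r* has to come from published tables for the chosen n and α and is not reproduced here.

```python
import math
from statistics import NormalDist

def normal_quantile_correlation(data):
    """Correlation r between the sorted data and their normal quantiles."""
    n = len(data)
    std = NormalDist()
    # Blom plotting positions (an assumption; other conventions exist).
    quantiles = [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x = sorted(data)
    mean_x = sum(x) / n
    mean_q = sum(quantiles) / n
    sxq = sum((a - mean_x) * (b - mean_q) for a, b in zip(x, quantiles))
    sxx = sum((a - mean_x) ** 2 for a in x)
    sqq = sum((b - mean_q) ** 2 for b in quantiles)
    return sxq / math.sqrt(sxx * sqq)

# Hypothetical data sets (illustrative values only).
symmetric = [4.1, 4.8, 5.0, 5.3, 5.6, 5.9, 6.1, 6.4, 6.7, 7.0, 7.6]
skewed = [0.2, 0.3, 0.5, 0.6, 0.9, 1.2, 1.8, 3.0, 5.5, 12.0, 40.0]

r_symmetric = normal_quantile_correlation(symmetric)
r_skewed = normal_quantile_correlation(skewed)
r_logged = normal_quantile_correlation([math.log(v) for v in skewed])
print(f"r (symmetric data) = {r_symmetric:.3f}")
print(f"r (skewed data)    = {r_skewed:.3f}")
print(f"r (logs of skewed) = {r_logged:.3f}")
```

Values of r close to 1.0 are consistent with normality; the further r falls below 1.0, the stronger the evidence against Ho.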
Summary
- 1. What are hypothesis tests?
- 2. Structure of a hypothesis test: a 6-step procedure.
- a. Objective + data distribution + measurement scale → type of test to use.
- b. Set up null and alternate hypotheses.
- c. Decide on error rate α.
- d. Compute test statistic.
- e. Compute p-value.
- f. Decision: p-value < α-level → reject the null hypothesis.
- 3. Power of the test: nonparametric tests are more powerful if the data are not truly normal. More on this later.