Title: HYPOTHESIS
Slide 1: HYPOTHESIS TESTING
All of the figures in this PowerPoint are from Statistics without Math (Magnusson & Mourão).
Slide 2: The concept of accepting a hypothesis through the rejection of a null hypothesis is largely credited to Karl Popper. A null hypothesis is a statement about how the world would be if our conjecture were wrong.
Difference in means: observed is 3.8 - 7.7 = -3.9; null is 7.7 - 7.0 = 0.7. Is this enough to reject the null hypothesis?
Slide 3: We can assure ourselves that the difference between our observed distribution and that of the null is not accidental by generating a large number of predictions based on the null hypothesis.
Slide 4: We could repeat this exercise many (say 100) times, then determine the percentage of predicted outcomes that have, in this case, a difference between the means as big as or bigger than our observed value of -3.9. If fewer than 5 of the 100 do (i.e. less than 5%), we reject the null hypothesis. William S. Gosset standardized this process by dividing the difference in means by its standard error to derive the t statistic (Student, 1908), now known as Student's t-test.
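A minimal sketch of this randomization procedure in Python, using two small hypothetical samples whose means echo the slide's 3.8 and 7.7 (the data and number of repetitions are illustrative, not from the book):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical samples; means are roughly 3.8 and 7.7 as on slide 2.
group_a = np.array([3.1, 4.2, 3.5, 4.0, 4.2])
group_b = np.array([7.0, 8.1, 7.5, 8.2, 7.7])
observed = group_a.mean() - group_b.mean()   # about -3.9

# Under the null hypothesis the group labels are interchangeable, so
# shuffle the labels many times and record the difference in means.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
null_diffs = np.empty(10_000)
for i in range(null_diffs.size):
    rng.shuffle(pooled)
    null_diffs[i] = pooled[:n_a].mean() - pooled[n_a:].mean()

# Fraction of null outcomes as big as or bigger than the observed
# difference (in absolute value); reject the null if below 0.05.
p_value = np.mean(np.abs(null_diffs) >= abs(observed))
print(f"observed = {observed:.2f}, P = {p_value:.4f}")
```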
Slide 5: It is conventional to represent the distributions of data horizontally, rather than vertically.
Slide 6: Assuming the data are normally distributed, we can use the theoretical characteristics of the distribution as the basis for testing the differences between the means.
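For comparison, a short sketch using SciPy's implementation of Student's t-test, which takes its P value from the theoretical t distribution instead of physically resampling (same hypothetical samples as above):

```python
import numpy as np
from scipy import stats

group_a = np.array([3.1, 4.2, 3.5, 4.0, 4.2])
group_b = np.array([7.0, 8.1, 7.5, 8.2, 7.7])

# Assumes normally distributed data; the P value comes from the
# theoretical t distribution rather than from simulated null samples.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, P = {p_value:.4f}")
```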
Slide 7: This is, basically, what mathematicians/statisticians do: they don't physically sample their null populations.
Slide 8: Type I error: falsely rejecting the null hypothesis and deciding that a phenomenon exists when it does not. Type II error: accepting the null hypothesis when it is false; its probability is generally inversely related to the probability of making a Type I error. We avoid Type I errors by setting the bar high for rejecting the null hypothesis (P < 0.05). This is very important to the progress of science: we don't want to build knowledge on faulty previous work. The ability of a statistical test to reject the null hypothesis when it is indeed false is called the power of the test.
Slide 9: Now let's look at an example where we compare more than two groups: streams with carnivorous fish, streams without fish, and streams with herbivorous fish. When we do pairwise comparisons using t-tests, we compound Type I errors. Sir Ronald Fisher's solution was the comparison of variances.
Slide 10: If the variability (e.g. range) is similar in each category and the means differ, then the total variability will be greater than the variability within any one category: Vi < VT, and the ratio VR = Vi / VT < 1. With no difference among the means, Vi = VT and VR = 1. Fisher used the ratio of the variances, now known as the F-statistic.
Slide 11: Computer programs create the null distribution based on the mean variability (variance). If the variance is grossly dissimilar among groups, the null variance will be underestimated and we may be led to commit a Type I error in Fisher's test (aka analysis of variance, or ANOVA). It is very important to visually inspect one's data (and to test for similarity of variances, e.g. with Levene's test).
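A sketch of this workflow with SciPy, using invented densities for the three stream types of slide 9: Levene's test checks the similarity-of-variances assumption, then a single one-way ANOVA replaces the compounded pairwise t-tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical response values for the three stream types.
carnivorous = rng.normal(5.0, 1.0, 10)
no_fish     = rng.normal(7.0, 1.0, 10)
herbivorous = rng.normal(6.5, 1.0, 10)

# Levene's test: are the variances similar enough for ANOVA?
w_stat, p_levene = stats.levene(carnivorous, no_fish, herbivorous)
print(f"Levene: W = {w_stat:.2f}, P = {p_levene:.3f}")

# One F-test for all three groups at once, avoiding compounded
# Type I errors from multiple pairwise t-tests.
f_stat, p_anova = stats.f_oneway(carnivorous, no_fish, herbivorous)
print(f"ANOVA: F = {f_stat:.2f}, P = {p_anova:.4f}")
```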
Slide 12: The variability between the means of the two groups is due to the factor (in this case, fish). The difference between this and the total variability is the residual variability (not attributed to any particular cause). When we do this with variance, it's called partitioning the variance.
Slide 13: F is the ratio of the factor mean square to the residual mean square. Mean squares are the sums of squares divided by their degrees of freedom (df), and are analogous to variances. Ignoring a few constants, F = (σ²Factor + σ²Residual) / σ²Residual. When the variance due to the factor is zero (the null hypothesis is correct), F = 1.
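To make the partitioning concrete, here is a small illustrative function (with made-up data) that computes the factor and residual sums of squares, divides each by its df to get the mean squares, and returns their ratio F:

```python
import numpy as np

def one_way_f(groups):
    """Partition the total variability and return F = factor MS / residual MS."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()

    # Factor SS: deviations of the group means from the grand mean.
    ss_factor = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Residual SS: deviations of the points from their own group mean.
    ss_residual = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_factor = len(groups) - 1
    df_residual = all_vals.size - len(groups)

    # Mean squares are sums of squares divided by their df.
    return (ss_factor / df_factor) / (ss_residual / df_residual)

groups = [np.array([3.1, 4.2, 3.5]),
          np.array([7.0, 8.1, 7.5]),
          np.array([5.5, 6.0, 5.2])]
print(f"F = {one_way_f(groups):.2f}")
```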
Slide 14: You can run into problems when you categorize a continuous phenomenon. Plotting the results of the narrow sampling regime at two temperatures:
Slide 15: John estimates a probability of 0.78 that the null hypothesis is correct. Mary estimates a probability of 0.035 that the null hypothesis is correct, so she rejects the null hypothesis. VResidual + VFactor + VLevels = VTotal.
Slide 16: Plotting the results of the wide sampling regime at two temperatures:
Slide 17: John now rejects the null hypothesis (P = 0.013), and Mary accepts the null hypothesis (P = 0.22). VResidual + VFactor + VLevels + VWidth = VTotal.
Slide 18: We might expect a direct relationship between the number of trees and the area of a given reserve.
Slide 19: Here, our plot shows how closely the data conform to our hypothetical relationship (Y = a + bX, where a = 0 and b = 1).
Slide 20: With insect activity and temperature, we may not know the true relationship, so we draw a line that represents what the relationship may be, based on the distribution of the data.
Slide 21: We can fit the line to the data by minimizing the distances of the points to the line (A), by minimizing the area of the triangles formed by the horizontal and vertical lines from the points to the line (B), or by minimizing the sum of the squared vertical distances of the points to the line (least-squares regression) (C), as in the sketch below.
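A brief sketch of method (C), with invented insect-activity data; numpy.polyfit chooses the intercept and slope that minimize the sum of squared vertical distances from the points to the line:

```python
import numpy as np

# Hypothetical insect-activity counts over a range of temperatures.
temperature = np.array([10.0, 14.0, 18.0, 22.0, 26.0, 30.0])
activity    = np.array([4.0, 6.5, 8.2, 11.0, 12.8, 15.1])

# Least-squares regression: minimize the summed squared residuals.
b, a = np.polyfit(temperature, activity, deg=1)   # slope, then intercept
print(f"activity = {a:.2f} + {b:.2f} x temperature")
```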
Slide 22: The same basic concepts apply to regression analysis as to ANOVA.
Slide 23: If the residual variation approaches the total variation, we assume that there is no effect of the measured variable, i.e. VFactor = 0 (the null hypothesis is correct).
Slide 24: If we had studied a smaller temperature range, we would decrease the variation due to the factor while maintaining the residual variation, and we would lose our ability to detect an effect.
Slide 25: [Figure: monkeys plotted against trees]
Slide 26: Vertical lines represent the variability not explained by the linear model. We can use these residuals to calculate the partial regressions.
Slide 27:
Monkeys = -0.667 x trees
Monkeys = 1.667 x shrubs
Monkeys = 0.33 + (-0.667 x trees) + (1.667 x shrubs), the equation for the multiple regression.
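As an illustration, the multiple-regression coefficients can be estimated by ordinary least squares; the sketch below generates hypothetical tree and shrub data around the slide's coefficients and recovers them with numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data built around the slide's equation plus noise.
trees   = rng.uniform(0, 10, 30)
shrubs  = rng.uniform(0, 10, 30)
monkeys = 0.33 - 0.667 * trees + 1.667 * shrubs + rng.normal(0, 0.5, 30)

# Design matrix with an intercept column; least squares solves for the
# coefficients that minimize the sum of squared residuals.
X = np.column_stack([np.ones_like(trees), trees, shrubs])
(b0, b_trees, b_shrubs), *_ = np.linalg.lstsq(X, monkeys, rcond=None)
print(f"Monkeys = {b0:.2f} + ({b_trees:.3f} x trees) + ({b_shrubs:.3f} x shrubs)")
```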
Slide 28: THINKING ABOUT YOUR DATA, BIOLOGICALLY
Slide 29: Computer-Generated Phantom Variables
In this theoretical example (in which the data were randomly generated), a graduate student is appalled that none of the factors his advisor suggested are significant.
Slide 30: So, he tests for all possible interactions. He finds a significant interaction (P = 0.001) between snags and herbivorous fish. There are enough plausible biological explanations for an interaction of snags and herbivorous fish on crayfish density to allow an in-depth discussion in the thesis. The problem is that this example is based on random data: with 25 possible effects and interactions in an ANOVA, we would expect about one to be significant at the 0.05 level by chance alone.
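A quick simulation makes the point. With 25 tests per experiment run on pure noise, most experiments yield at least one spuriously "significant" result (all numbers here are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

n_experiments, n_tests, n_obs = 1000, 25, 20
hits = 0
for _ in range(n_experiments):
    for _ in range(n_tests):
        # Both samples come from the same distribution, so any
        # "significant" difference is a Type I error by construction.
        a = rng.normal(size=n_obs)
        b = rng.normal(size=n_obs)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
            break   # the student stops at the first "discovery"

# Expected around 1 - 0.95**25, i.e. roughly 72% of experiments.
print(f"Experiments with a spurious P < 0.05: {hits / n_experiments:.0%}")
```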
Slide 31: Data for 30 lakes:
- Pollution: heavy-metal concentration in ppb
- Fish: mean catch per gill-net per hour
- Phytoplankton: chlorophyll concentration
- Crayfish: number per trap-hour
The problem is that the data are all on different scales. Divide each variable by its own s.d., effectively putting all of them in units of s.d.: e.g. an increase of one s.d. in pollution leads to a decrease of so many s.d.s in the number of crayfish. These are called standardized estimates of the parameters. We can use these standardized coefficients in path analysis.
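A minimal sketch of the standardization step, with invented lake data; after dividing by the standard deviation, each variable has an s.d. of exactly 1:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical measurements for 30 lakes, each on its own scale.
pollution = rng.uniform(1.0, 50.0, 30)    # heavy-metal conc. (ppb)
crayfish  = rng.uniform(0.1, 2.0, 30)     # number per trap-hour

# Divide each variable by its own s.d. so that all variables are
# expressed in units of standard deviations.
pollution_std = pollution / pollution.std(ddof=1)
crayfish_std  = crayfish / crayfish.std(ddof=1)
print(pollution_std.std(ddof=1), crayfish_std.std(ddof=1))  # 1.0, 1.0
```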
Slide 32: Run a multiple regression on the standardized variables:
Crayfish = 0.0 - 0.16 x pollution + 0.39 x fish + 0.55 x phytoplankton
The effect of pollution is not significant (P = 0.53), that of fish is questionable (P = 0.07), and there is a strong effect of phytoplankton (P = 0.01). This is counterintuitive, because a simple regression indicates a significant (P = 0.03) positive effect of pollution on crayfish (slope = 0.41). This is because Fig. 10.2 only shows direct effects and doesn't represent the system biologically.
Slide 33: We need to look at indirect effects as well. Get standardized regressions for each direct effect by simple regression. Calculate indirect effects by multiplying the path coefficients along each path. To get the overall effect of pollution, we add the direct and indirect effects: (-0.16) + (0.26) + (0.31) = 0.41, which is the simple-regression value.
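The arithmetic can be sketched as follows. The crayfish-ward coefficients (0.39 and 0.55) are the multiple-regression estimates from slide 32; the pollution-ward coefficients are hypothetical values chosen so that the products reproduce the slide's indirect effects:

```python
# Direct effect of pollution on crayfish (from slide 32).
direct = -0.16

# Path coefficients along the two indirect routes. The first value in
# each pair (pollution -> fish, pollution -> phytoplankton) is assumed
# for illustration; the second comes from the multiple regression.
via_fish  = 0.667 * 0.39    # pollution -> fish -> crayfish,  ~0.26
via_phyto = 0.564 * 0.55    # pollution -> phyto -> crayfish, ~0.31

# Overall effect = direct + sum of indirect effects.
print(f"overall = {direct + via_fish + via_phyto:.2f}")   # ~0.41
```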
Slide 34: Predicting the mass of a tree from its diameter is not a linear function, but may conform to a power function of the form Biomass = a x Diameter^b + e1.
Slide 35: Ignoring the error term, if we take the logs of both sides, we transform the equation into a linear one that can be treated with ordinary least-squares methods:
log10(biomass) = log10(a) + b x log10(diameter) + e2
log10(biomass) = -0.775 + 2.778 x log10(diameter)
Slide 36: Take the antilog of the equation to yield Biomass = 0.168 x Diameter^2.778.
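A short sketch of the whole procedure, fitting the linearized model to hypothetical tree data generated around the slide's fitted power function:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical trees following Biomass = 0.168 x Diameter^2.778,
# with multiplicative noise.
diameter = np.linspace(5.0, 60.0, 40)
biomass = 0.168 * diameter ** 2.778 * rng.lognormal(0.0, 0.1, 40)

# Fit the linear model on log-log axes:
# log10(biomass) = log10(a) + b x log10(diameter)
b, log_a = np.polyfit(np.log10(diameter), np.log10(biomass), deg=1)
a = 10 ** log_a   # antilog recovers the coefficient a
print(f"Biomass = {a:.3f} x Diameter^{b:.3f}")
```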