Title: Numbers, statistics, and Biology
1 BIODIVERSITY and ENVIRONMENTAL MANAGEMENT
Dr Ole R. Vetaas, ole.vetaas_at_global.uib.no
UNIFOB - Global, University of Bergen, www.global.uib.no
Ecology and Environmental Change Research Group, Dept. of Biology, UiB, www.eecrg.uib.no
2Numbers, statistics, and Biology
- WHY statistical analyses?
3SCIENCE BASED MANAGEMENT
- The value of the degree you may obtain depends on
- GOOD THESIS
- Good research question
- A good research question relies on theory and observations
- Facts: factual observations
- Theory: facts linked in a causal, logical framework
4STRUCTURE OF A THESIS
- INTRODUCTION
- MATERIALS AND METHODS
- RESULTS text, tables, and figures
- DISCUSSION
- References
- Appendices, raw data
5STRUCTURE OF A THESIS as a basic introduction
to philosophy of science
- INTRODUCTION
- Other researchers' observations
- Theory
- Models
- Deduction
- Hypotheses
- Aims: to test or evaluate hypotheses about nature
6Hypothesis formulation
- Everything is connected to everything
- Yes, but the strength of the connections differs
- From strong causal links to no effect at all
- Science is to find the strongest connections and evaluate the plausible causations
- If we know the causal links, we understand the system better
7HYPOTHESIS
- Hypotheses are potential statements that answer research questions
- Example
- Are there more birds in the mid-hills than in the tropical lowland?
- Hypothesis: there are more birds in the mid-hills than in the tropical lowland.
- Where is the theory?
8THEORY
9THEORY
- Habitat diversity or heterogeneity theory
- Heterogeneity in topography increases surface area
- Microclimate: south- and north-exposed slopes
- Creates many habitats
- Mixture of forest and open meadows suitable for birds
- Model: topography -> microclimate -> many habitats
- DEDUCE a HYPOTHESIS
- Hypothesis: MORE BIRDS in the mid-hills than in FLAT AREAS
- (H0, the null hypothesis: no difference in bird richness due to habitat diversity)
10TESTING the HYPOTHESIS
- Hypothesis: there are more birds in the mid-hills than in the tropical lowland
- Prediction: more birds in the mid-hills than in the tropical lowland
- Collection of data with a scientific method (repeatable by others)
- FIGURE B represents bird richness along the elevation gradient (axis from low to high)
- Our hypothesis is falsified, or rejected
- IS THIS GOOD?
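One way to confront the prediction "more birds in the mid-hills" with field data is an exact permutation test. The sketch below is only an illustration: the species counts are invented, the function name is hypothetical, and the permutation test is one of several possible tests, not necessarily the one used for Figure B. It asks how often a random relabelling of the plots gives a mid-hill total at least as large as the observed one.

```python
from itertools import combinations


def one_sided_permutation_p(group_a, group_b):
    """Exact p-value for H1: group_a tends to have larger values than group_b.

    With fixed group sizes, ranking relabellings by the sum of group A is
    equivalent to ranking them by the difference in group means.
    """
    pooled = list(group_a) + list(group_b)
    observed = sum(group_a)
    n_a = len(group_a)
    as_extreme = 0
    total = 0
    # Enumerate every way of choosing which plots are labelled "group A".
    for idx in combinations(range(len(pooled)), n_a):
        total += 1
        if sum(pooled[i] for i in idx) >= observed:
            as_extreme += 1
    return as_extreme / total


# Hypothetical bird species counts per plot (invented for illustration).
mid_hills = [31, 35, 38, 40, 42]
lowland = [22, 25, 27, 30, 33]

p_value = one_sided_permutation_p(mid_hills, lowland)
print(f"one-sided p = {p_value:.4f}")
```

A small p-value means the null hypothesis of no difference is hard to maintain; a large one means the data give no support for the prediction. Full enumeration is only practical for small samples.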
11Hypothetico-deductive method
- A hypothesis should be deduced from a model
- The model is based on rational theory that describes a certain phenomenon
- It must be possible to falsify these hypotheses
- That is, they can be wrong
- A hypothesis that is always correct is not a good hypothesis
12What is science?
Karl Popper
- Science and non-science
- Real science should be able to formulate testable hypotheses
- It must be possible to falsify these hypotheses
- The falsification criterion!
13History of the philosophy of science
- 20th century
- The hypothesis-testing programme
- Numerical methods aimed at testing hypotheses were developed by
- Pearson (correlation)
- R. A. Fisher (analysis of variance; extended the t-test introduced by W. S. Gosset, "Student")
- Karl Popper laid the theoretical basis for hypothesis testing, which drew on developments in statistical science.
14STRUCTURE OF A THESIS
- INTRODUCTION
- MATERIALS AND METHODS
- RESULTS text, tables, and figures
- DISCUSSION
- References
- Appendices, raw data
15Field study
- Several different causes
- Several interactions
- Feedback loops: cause and effect reinforce the original cause
- True interdisciplinary work is a new paradigm
16Data that may indicate causation
- Isolate the potential causes
- Effect of Land use on Biodiversity
- Same slope inclination
- Same aspect
- Same type of soil
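Isolating the effect of land use can be sketched as a comparison within strata: plots are compared only when they share slope, aspect, and soil type. The plot records, field names, and land-use classes below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical plots: field names and values are invented for illustration.
plots = [
    {"land_use": "forest", "slope": "steep", "aspect": "south", "soil": "loam", "richness": 20},
    {"land_use": "forest", "slope": "steep", "aspect": "south", "soil": "loam", "richness": 22},
    {"land_use": "grazed", "slope": "steep", "aspect": "south", "soil": "loam", "richness": 15},
    {"land_use": "grazed", "slope": "steep", "aspect": "south", "soil": "loam", "richness": 17},
    {"land_use": "forest", "slope": "gentle", "aspect": "north", "soil": "clay", "richness": 30},
    {"land_use": "grazed", "slope": "gentle", "aspect": "north", "soil": "clay", "richness": 27},
    {"land_use": "forest", "slope": "steep", "aspect": "north", "soil": "loam", "richness": 25},
]


def land_use_effect_within_strata(plots, use_a="forest", use_b="grazed"):
    """Mean richness difference (use_a - use_b) per (slope, aspect, soil) stratum."""
    strata = defaultdict(lambda: defaultdict(list))
    for plot in plots:
        key = (plot["slope"], plot["aspect"], plot["soil"])
        strata[key][plot["land_use"]].append(plot["richness"])
    effects = {}
    for key, by_use in strata.items():
        # Only strata that contain both land uses allow a like-for-like comparison.
        if by_use[use_a] and by_use[use_b]:
            effects[key] = mean(by_use[use_a]) - mean(by_use[use_b])
    return effects


effects = land_use_effect_within_strata(plots)
for stratum, diff in effects.items():
    print(stratum, diff)
```

The stratum with forest plots only is dropped, because it says nothing about land use once the other factors are held constant.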
17STRUCTURE OF A THESIS
- INTRODUCTION
- MATERIALS AND METHODS
- RESULTS text, tables, and figures illustrating statistical results
- DISCUSSION
- References
- Appendices, raw data
18NUMERICAL METHODS: What is statistical analysis?
- Statistics is a branch of mathematics, which is a special language
- This special language is international
- Statistical methods should be viewed as a tool for analysing and presenting data in a standardised way
19Statistical expressions elucidate the results for an international audience
- make the analysis repeatable by other researchers
- make the procedure open to argument
- support the progressive scientific process
- enhance the degree of objectivity in the analysis
20STRUCTURE OF A THESIS
- INTRODUCTION
- MATERIALS AND METHODS
- RESULTS text, tables, figures and graphs that illustrate statistical results
- DISCUSSION
- References
- Appendices, raw data
21Discussion
- What did I find?
- This agrees with other findings
- Other researchers found contrasting results
- What is the reason for agreement and disagreement?
- What is the main causal factor?
- INTERPRETATION
- Ecologically sound reasoning
- Statistical significance may differ from biological significance (species richness example)
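The gap between statistical and biological significance can be illustrated with a small sketch. The species counts are invented, the function name is hypothetical, and the p-value uses a normal approximation, which is rough for small samples. The same mean difference of 0.2 species is tested once with 10 plots per group and once with 10000.

```python
import math
from statistics import mean, variance


def mean_difference_and_p(a, b):
    """Return (mean(b) - mean(a), two-sided p-value).

    The p-value uses a normal approximation to the test statistic; a
    t distribution would be the usual choice for small samples.
    """
    diff = mean(b) - mean(a)
    std_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = diff / std_error
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, p


# Invented species counts per plot; the patterns differ by 0.2 species on average.
lowland_pattern = [18, 19, 20, 21, 22]
mid_hill_pattern = [18, 19, 20, 21, 23]

results = {}
for repeats in (2, 2000):
    lowland = lowland_pattern * repeats
    mid_hills = mid_hill_pattern * repeats
    results[len(lowland)] = mean_difference_and_p(lowland, mid_hills)

for n_plots, (diff, p) in results.items():
    print(f"n = {n_plots:5d} per group: difference = {diff:.2f} species, p = {p:.3g}")
```

The effect size is identical in both runs; only the sample size changes the p-value. Whether a difference of 0.2 species out of about 20 matters is an ecological judgement, not a statistical one.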
22INTERPRETATION
- QUANTITATIVE METHODS DO NOT NECESSARILY GIVE CLEAR-CUT RESULTS
- INTERPRETATION
- QUALITATIVE EVALUATION OF PLAUSIBLE EXPLANATIONS
- YOU WILL ALSO LEARN QUALITATIVE METHODS IN BERGEN
23Universities, faculties and tradition
Natural Sciences
FIELD SCIENCES: BIOLOGY, GEOLOGY, GEOGRAPHY
Arts and Humanities
24The Scientific Field-method
- RESEARCH PROCEDURE IN MOST FIELD STUDIES
- 1. THEORY: state of the art, what do we know?
- 2. Research question
- 3. Hypothesis or hypotheses
- 4. Collect data in the field
- 5. Qualitative and quantitative analyses: confronting the hypothesis with the field data
- 6. Interpret the result and explain it to the scientific community and the public
25Good results depend on
- A testable hypothesis
- A good sampling design
- A reduced set of potential causal factors
26Numerical methods aimed at testing hypotheses
- must satisfy certain assumptions in order to be valid tests
- these assumptions are difficult to fulfil when the hypothesis relates to processes in the real world, i.e. the field situation
27 THE DIFFERENCE BETWEEN IDEAL AND REAL WORLD
IDEAL WORLD (LAB.), ASSUMPTIONS | REAL WORLD, IN NATURE AND LANDSCAPE
------------------------------------------------------------------------
ONE RESPONSE VARIABLE | MANY INTERACTIVE RESPONSE VARIABLES
ONE OR A SET OF FEW INDEPENDENT EXPLANATORY VARIABLES | MANY INTERACTIVE EXPLANATORY VARIABLES
NORMAL DISTRIBUTION OF VARIABLES AND ERROR | SKEWED DISTRIBUTION OF VARIABLES AND ERROR
REPRESENTATIVE SAMPLE | GEOGRAPHICAL LIMITATION
INDEPENDENT SAMPLES | SPATIALLY OR TEMPORALLY AUTOCORRELATED SAMPLES (DEPENDENT)
INDEPENDENT VARIABLES | PHYLOGENETIC RELATIONSHIPS (DEPENDENT)
28Statistics originally made for experimental situations
Most species on the north or the south slope?
29A test of the null hypothesis by inferential
statistics is in a strict sense only valid if
- There is only one causal factor that causes the investigated changes in the response variable, or there is a set of a few independent causal factors that cause the response.
30Statistics originally made for experimental situations
Test tubes with green algae: add a toxic element to test its effect
Add the toxic element in different doses
Control
31Real world
Many factors influence species richness
32Normal or Gaussian distribution
- The variables of interest have a normal distribution, or the residual error after a regression should have a normal distribution.
- Biological variables may take many forms of distribution, e.g. skewed or bimodal.
33NORMAL DISTRIBUTION
- MISCONCEPTION REGARDING THE EXPLANATORY VARIABLE
- EXPERIMENTAL DESIGN: UNIFORM DISTRIBUTION
- RESPONSE VARIABLES HAVE TO BE NORMAL IF WE ARE COMPARING MEANS OF TWO OR MORE POPULATIONS BY t-TEST OR ANOVA
- IN CLASSICAL ANOVA AND REGRESSION, RESIDUALS HAVE TO BE NORMALLY DISTRIBUTED
- GENERALIZED LINEAR MODELS OR GENERALIZED ADDITIVE MODELS CAN COPE WITH VARIOUS DISTRIBUTIONS
34[Figure: dry weight of wheat (g per m2) against mean growing season temperature]
35ALWAYS CHECK RESIDUALS AFTER REGRESSION!
[Figure: two panels of residuals after regression (axes from -2 to 2), one labelled "Missing factor or wrong factor", the other "ok"]
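A residual check can be sketched in a few lines. The data below are invented: one response is hump-shaped along the gradient, so a straight line misses a factor, and the other is linear with small alternating noise. Counting sign changes in the ordered residuals is a crude stand-in for looking at a residual plot, and the variable and function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sign_changes(res):
    """Number of times consecutive residuals switch sign along the gradient."""
    return sum(1 for r0, r1 in zip(res, res[1:]) if r0 * r1 < 0)


temperature = list(range(11))
# Invented responses: a hump-shaped one and a linear one with alternating noise.
hump_yield = [50 - (t - 5) ** 2 for t in temperature]
linear_yield = [3 + 2 * t + (1 if i % 2 == 0 else -1) for i, t in enumerate(temperature)]

hump_res = residuals(temperature, hump_yield)
linear_res = residuals(temperature, linear_yield)

print("hump:  ", "".join("+" if r > 0 else "-" for r in hump_res))
print("linear:", "".join("+" if r > 0 else "-" for r in linear_res))
```

Long runs of residuals with the same sign, as in the hump-shaped case, suggest a missing or wrong factor; residuals that switch sign without pattern are what one hopes to see.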
36IN CLASSICAL ANOVA AND REGRESSION, RESIDUALS HAVE TO BE NORMALLY DISTRIBUTED
GENERALIZED LINEAR MODELS OR GENERALIZED ADDITIVE MODELS CAN COPE WITH VARIOUS DISTRIBUTIONS
37NORMALITY: t-TEST, DIFFERENCE OF MEANS
[Figure: frequency of plots with x number of species, for south and north slopes, with the mean number of species marked; equal variance]
38NORMALITY: t-TEST, DIFFERENCE OF MEANS
[Figure: frequency of plots with x number of species, for south and north slopes, with the mean number of species marked; non-equal variance]
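The difference between the equal-variance and non-equal-variance cases can be sketched by computing both versions of the t statistic. The species counts are invented and the function names are hypothetical. The pooled version is Student's t-test; the unpooled version is Welch's t-test with the Welch-Satterthwaite degrees of freedom. Only the statistic and degrees of freedom are computed here; the p-value would come from a t distribution with those degrees of freedom.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    q1, q2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


# Invented species counts per plot: narrow spread on one slope, wide on the other.
south = [14, 15, 15, 16, 16, 17]
north = [4, 8, 10, 12, 13, 15, 18, 24]

t_student, df_student = student_t(south, north)
t_welch, df_welch = welch_t(south, north)
print(f"Student: t = {t_student:.3f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.3f}, df = {df_welch:.2f}")
```

When the variances differ, Welch's version uses fewer degrees of freedom than n1 + n2 - 2, so the same difference in means is judged more cautiously.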
39Representative sample
- The objects sampled are a representative sample of the total population. Normally, only a very small fraction of the total population of the target organism is sampled in biological field studies. Thus, in a strict sense, the analyses will be site-conditional, and the result cannot be assumed to be valid for the total population.
40REPRESENTATIVE SAMPLE
- Random sampling
- State how big the total population is that one aims to infer about
- ESTIMATE THE NICHE OF AN ORGANISM WITH DATA FROM A LIMITED REGION
- CAN WE INFER THAT THE RESULT IS VALID FOR THE SPECIES?
- HYPOTHESIS ABOUT THE TOTAL POPULATION, I.E. THE SPECIES
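Random sampling from a stated population can be sketched as drawing cells from a grid laid over the study area. The grid size, sample size, and function name below are hypothetical. The point is that the population one infers about (here 100 grid cells) is stated explicitly, and every cell has the same chance of being chosen.

```python
import random


def sample_grid_cells(n_rows, n_cols, n_samples, seed=None):
    """Draw n_samples distinct (row, column) cells at random from a grid."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable by others
    cells = [(row, col) for row in range(n_rows) for col in range(n_cols)]
    return rng.sample(cells, n_samples)


# Hypothetical study area: a 10 x 10 grid, of which 15 cells are visited.
chosen = sample_grid_cells(10, 10, 15, seed=1)
print(f"sampling fraction = {len(chosen) / 100:.2f}")
print(sorted(chosen))
```

Any conclusion then applies, strictly speaking, to the 100 cells of this grid and not automatically to the whole range of the species.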
41Popperian falsification
- Classical Popperian falsification cannot be done without some ad hoc adjustment of the degrees of freedom
- MAJOR PROBLEM IN CAUSAL ANALYSES, NOT IN DESCRIPTIVE ANALYSES BY MEANS OF NUMERICAL METHODS
- SOFTWARE AND NUMERICAL OPTIONS EXIST TO COPE WITH THIS PROBLEM
- SAMPLING IS MOST IMPORTANT: MAKE A GRID!
- Comparative biology: phylogenetic relationships among the objects
42SPATIAL AUTOCORRELATION: DISTANCE DECAY
- All objects have a geographical distribution; objects located close to each other are on average more similar than those located further apart.
- Hence the similarity among objects is a function of their distance to each other: spatial dependency.
- Hence the objects are not independent of each other.
- The degrees of freedom are not equal to the number of samples minus one (df = n - 1 is not true).
- Thus the uncorrected statistical p-value cannot be used to evaluate whether the hypothesis is falsified or not.
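How strongly spatial dependency can reduce the information in a sample can be sketched for plots along a transect. The species counts are invented and the function names are hypothetical. The correction n_eff = n(1 - r)/(1 + r), with r the lag-1 autocorrelation, is a common first-order approximation that assumes an autoregressive structure with positive r; it is one option among several and not necessarily the adjustment the slides have in mind.

```python
from statistics import mean


def lag1_autocorrelation(values):
    """Correlation between each value and its neighbour along the transect."""
    m = mean(values)
    numerator = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    denominator = sum((v - m) ** 2 for v in values)
    return numerator / denominator


def effective_sample_size(values):
    """Approximate number of independent samples, assuming AR(1) dependence."""
    n = len(values)
    r = lag1_autocorrelation(values)
    return n * (1 - r) / (1 + r)


# Invented species counts in 12 neighbouring plots along a transect.
richness = [10, 11, 13, 14, 15, 15, 14, 12, 11, 10, 9, 9]

r = lag1_autocorrelation(richness)
n_eff = effective_sample_size(richness)
print(f"n = {len(richness)}, lag-1 autocorrelation = {r:.2f}, effective n = {n_eff:.1f}")
```

With neighbouring plots this similar, the effective number of independent samples is far below 12, so a test that uses df = n - 1 would overstate the evidence.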