Title: ANALYSIS OF BIOLOGICAL DATA BIOL4062/5062
1ANALYSIS OF BIOLOGICAL DATA BIOL4062/5062
2- Introduction
- Assignments
- Tentative schedule
- Analysis of biological data
3Introduction
- Instructors
- Purpose of class
- Related classes
- Books
- Computer programs
http://myweb.dal.ca/hwhitehe/BIOL4062/handout4062.htm
4- Instructor: Hal Whitehead
- LSC3076 (Ph: 3723; email: hwhitehe_at_dal.ca)
- Best times: 8:00-9:00 a.m.
- Teaching Assistant: ?
- Other instructors
- Dr David Lusseau
5Why Analysis of Biological Data?
- Biologists
- increasingly using quantitative techniques
- to analyze larger and larger data sets
- need skills in data analysis
- especially in broad area of ecology
- BIOL4062/5062
- introduce techniques for analysis of biological data
- emphasis will be on the practical use and abuse of techniques, not derivations or mathematical formulae
- in assignments students explore real and realistic data sets
6Related classes
- Design of Biological Experiments (BIOL4061/5061)
- most useful for those who work with systems that can be manipulated
- Courses in Statistics
- more emphasis on the mathematical side
7Some books (on reserve)
- Legendre, L. and P. Legendre. Numerical Ecology (2nd edition). Elsevier (1998)
- Manly, B.F.J. Multivariate statistical methods: a primer (2nd edition). Chapman & Hall (1994)
- Other books
- Many, do not need to be right up to date
9Computer programs
- MINITAB
- SPSS
- SYSTAT
- SAS
- MATLAB (Statistics toolbox)
- S-plus (freely available at Dal.?)
- R (freely available on the web)
- on GS.DAL.CA
- in Biology-Earth Sciences computer lab
10Assignments
- Type 1
- artificial data sets for trying different techniques
- Type 2
- real data set to try a real analysis
11Type 1 assignments
- Five assignments, sent by email (next few days)
- Each 10% of final mark
- Artificial but realistic data sets
- Different data sets to each student, but structurally similar
- More analyses expected for graduate students (BIOL5062)
- Analyze using a computer statistical package
12Type 1 assignments
- Hand in a short write-up, explaining clearly
- what you did
- what you found
- what you think the results might mean biologically
- Beware of
- Rubbish!
- Check the results against patterns in the original data to make sure they make sense.
- Over-interpreting the results
- Not answering the questions posed
13Type 1 assignments
- Five assignments
- Multiple regression 10%
- Log-linear models 10%
- Principal components analysis 10%
- Discriminant function analysis 10%
- Cluster analysis, multidimensional scaling, network analysis 10%
14Type 2 assignment
- Find a biological data set, and then analyze it
- The analysis should not be
- part of past, present, or future Honours, MSc or PhD thesis, or used for another class: self-plagiarism
- the same as, or a repeat of, an analysis done by someone else: plagiarism
15Type 2 assignment
- The analysis can
- use same data as in thesis or another course, but totally different analysis
- use data collected by your supervisor, or someone else, but you should ask them
- use a data set that you find on the web, or somewhere else, but you should check that it is OK
- be submitted for publication, but you must check that you have all necessary permissions
16Type 2 assignment
- Minimum sizes of data set (ask Hal for exceptions or in case of uncertainty)
- For undergraduates (BIOL4062)
- >50 units x >3 variables
- For graduates (BIOL5062)
- >50 units x >5 variables
- either, two types of variables
- e.g. Dependent/Independent, Species/Environment
- or, link two data sets with one at least as large as the undergraduate data set
- Must address at least 3 biological questions (BIOL4062), or 4 questions (BIOL5062)
17Type 2 assignment (4 steps)
- a) Short meeting with Hal or to discuss your proposed data set and proposed analysis: feedback
- bring draft of 2b assignment
- b) Description of data set and proposed analysis.
- where it came from
- its structure(s) (number of variables, units, names of variables, types of variables, ...)
- proposed biological questions
- proposed analytical methods
- possible problems
- Example on web
18Type 2 assignment (4 steps)
- c (i) Presentation of results to the class by graduate students
- biological questions being addressed
- brief description of the data set
- how you analyzed it
- conclusions
- Example in Class
- c (ii) Undergraduate students should go to graduate presentations and will be tested on general issues arising from them on last day
19Type 2 assignment (4 steps)
- d) Write-up of your analysis as for a scientific journal paper
- Max 5 pages (4062) or 7 pages (5062) single-spaced
- excluding references, tables, figures
- Explain biological question, methods in sufficient detail for someone to replicate them, problems, and biological conclusions
- Show graphically, or in tables, the major effects
- Do not just present summaries of ordinations or significance levels of hypothesis tests
- Introduction and Discussion can be shorter and less detailed than in published paper
- sufficient to give a good feel for biological issue being examined and the potential biological significance of the results
- Example on web
20Type 2 assignment
- Marks
- 2b Description of data set and proposed analysis 5%
- 2c 15%
- (i) Presentation of results by graduate students (BIOL5062)
- (ii) Test on general principles from graduate student presentations (BIOL4062)
- 2d Write-up of results 30%
21Tentative schedule
22SYSTAT demo. at end of lectures
23Analysis of Biological Data
- Types of biological data
- History (very abbreviated!)
- The process of biological data analysis
- why garbage may come out
- Hypothesis testing and data analysis
- assumptions
- other issues
24Types of biological data
- Morphometric
- Community ecology
- organism distribution and environmental variation
- Genetic data for ecological and evolutionary questions
- Population data for management, conservation, evolutionary questions
- Behavioural, physiological, ...
25Development of biological data analysis
- >1850: Displays
- >1900: ANOVAs, regression, correlation
- without computers
- >1930: Non-parametric methods
- >1970: Multiple regression and multivariate analysis
- matrix algebra using computers
- >1980: Robust methods: bootstraps, jackknives, permutations
- need powerful computers
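To make the "robust methods" entry concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is an illustration only, not a method prescribed by the class: the function name bootstrap_ci, its parameters, and the body-length values are all made up for the example.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    n = len(sample)
    # Resample the data with replacement n_boot times and recompute the statistic.
    boots = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = boots[int((alpha / 2) * n_boot)]
    upper = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up measurements, purely for illustration (not real data).
lengths = [12.1, 13.4, 11.8, 14.2, 12.9, 13.1, 15.0, 12.4, 13.7, 12.2]
low, high = bootstrap_ci(lengths)
print(f"mean = {statistics.mean(lengths):.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

Each interval needs thousands of recomputations of the statistic, which is why these methods only became practical with powerful computers.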
27Garbage in -> Garbage out
- Good data + Errors -> Garbage in -> Garbage out
- Check data entry
- Good data + Errors in routine -> Garbage out
- Check results, run routines on data with known answer
- run on 2 routines
- Good data + Wrong model -> Garbage out
- Think about, read about and discuss model
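The advice to run routines on data with a known answer, and to run two routines, can be sketched as follows. This is an illustrative example under stated assumptions: both functions are hypothetical stand-ins for two independent statistical routines, and the data are constructed so that the true line is exactly y = 2x + 1.

```python
def fit_line_deviations(x, y):
    """Routine 1: least-squares line from sums of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_line_normal_equations(x, y):
    """Routine 2: solve the 2x2 normal equations directly."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy * sxx - sx * sxy) / det
    return slope, intercept

# Data with a known answer: y = 2x + 1 exactly, no noise.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]

fit1 = fit_line_deviations(x, y)
fit2 = fit_line_normal_equations(x, y)
print("routine 1 (slope, intercept):", fit1)
print("routine 2 (slope, intercept):", fit2)
```

If either routine fails to recover slope 2 and intercept 1, or the two disagree, the problem lies in the routine rather than in the data.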
28Hypothesis Testing / Data Analysis
- Hypothesis testing
- Hypothesis, Experimental Design, Experiment, Analysis, Conclusion
- ANOVA, T-test
- Agriculture, Experimental ecology, Physiology, Animal behaviour
- Data analysis
- Data Collection, Data Analysis, Hypothesis
- scatter plots, box plots, most multivariate analyses
- Fisheries, Community ecology, Paleontology
29Some assumptions
- Normality
- can only be properly examined on large data sets
- mainly a problem on small ones
- an important issue for hypothesis testing
- normality desirable in data analysis
- Linearity
- makes hypothesis testing easier
- makes data analysis easier
- Independence
- major problem for hypothesis testing
- no problem, or advantage, in data analysis
30Other issues in data analysis
- Missing data
- Often present in ecological data
- Outliers
- What do we do with apparent outliers?
- Remove them?
- Multiple comparisons
- Major issue with hypothesis testing
- Not an issue with data analysis
- although patterns appear in random data
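The point that patterns appear in random data can be illustrated with a small simulation. This is a sketch under stated assumptions: 20 unrelated random variables are measured on 30 units, and each pairwise correlation is compared with 0.361, the approximate two-tailed 5% critical value of Pearson's r for n = 30. All names and sizes are chosen for the example.

```python
import itertools
import math
import random

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(4062)     # arbitrary fixed seed so the run is repeatable
n_units, n_vars = 30, 20
# Pure noise: no variable is related to any other by construction.
data = [[rng.gauss(0.0, 1.0) for _ in range(n_units)] for _ in range(n_vars)]

R_CRIT = 0.361                # approx. critical |r| for n = 30, alpha = 0.05
correlations = [pearson_r(a, b) for a, b in itertools.combinations(data, 2)]
n_pairs = len(correlations)
n_sig = sum(abs(r) > R_CRIT for r in correlations)
print(f"{n_sig} of {n_pairs} correlations look 'significant' in pure noise")
```

With 190 pairs tested at the 5% level, roughly 9 or 10 apparent "patterns" are expected by chance alone, so an exploratory analysis that scans many relationships will find some even when none exist.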
31Next class
- Inference in ecology and evolution
- Null hypothesis statistical tests
- Effect size statistics
- Bayesian statistics
- Information theoretic model comparisons
33Performance in BIOL4062/5062
- Graduate students (BIOL5062)
- some do well with rather little effort
- some do well with a lot of effort
- Undergraduate students (BIOL4062)
- most do well with some effort
- adequate statistical background
- some do poorly
- inadequate statistical background or effort