ANALYSIS OF BIOLOGICAL DATA BIOL4062/5062

1
ANALYSIS OF BIOLOGICAL DATA BIOL4062/5062
  • Hal Whitehead

2
  • Introduction
  • Assignments
  • Tentative schedule
  • Analysis of biological data

3
Introduction
  • Instructors
  • Purpose of class
  • Related classes
  • Books
  • Computer programs

http://myweb.dal.ca/hwhitehe/BIOL4062/handout4062.htm
4
  • Instructor: Hal Whitehead
      • LSC3076 (Ph. 3723; email hwhitehe_at_dal.ca)
      • Best times: 8:00-9:00 a.m.
  • Teaching Assistant: ?
  • Other instructors
      • Dr David Lusseau

5
Why Analysis of Biological Data?
  • Biologists
      • increasingly using quantitative techniques
      • to analyze larger and larger data sets
      • need skills in data analysis
      • especially in the broad area of ecology
  • BIOL4062/5062
      • introduces techniques for analysis of biological
        data
      • emphasis will be on the practical use and abuse
        of techniques, not derivations or mathematical
        formulae
      • in assignments students explore real and
        realistic data sets

6
Related classes
  • Design of Biological Experiments (BIOL4061/5061)
      • most useful for those who work with systems that
        can be manipulated
  • Courses in Statistics
      • more emphasis on the mathematical side

7
Some books (on reserve)
  • Legendre, P. and L. Legendre. Numerical Ecology
    (2nd English edition). Elsevier (1998)
  • Manly, B.F.J. Multivariate statistical methods: a
    primer (2nd edition). Chapman & Hall (1994)
  • Other books
      • Many; they do not need to be right up to date

9
Computer programs
  • MINITAB
  • SPSS
  • SYSTAT
  • SAS
  • MATLAB (Statistics toolbox)
  • S-plus (freely available at Dal.?)
  • R (freely available on the web)
  • on GS.DAL.CA
  • in Biology-Earth Sciences computer lab

10
Assignments
  • Type 1
      • artificial data sets for trying different
        techniques
  • Type 2
      • real data set to try a real analysis

11
Type 1 assignments
  • Five assignments, sent by email (next few days)
  • Each 10% of final mark
  • Artificial but realistic data sets
  • Different data sets to each student, but
    structurally similar
  • More analyses expected for graduate students
    (BIOL5062)
  • Analyze using a computer statistical package

12
Type 1 assignments
  • Hand in a short write-up, explaining clearly
      • what you did
      • what you found
      • what you think the results might mean
        biologically
  • Beware of
      • Rubbish!
          • Check the results against patterns in the
            original data to make sure they make sense.
      • Over-interpreting the results
      • Not answering the questions posed

13
Type 1 assignments
  • Five assignments
      • Multiple regression: 10%
      • Log-linear models: 10%
      • Principal components analysis: 10%
      • Discriminant function analysis: 10%
      • Cluster analysis, multidimensional scaling,
        network analysis: 10%

14
Type 2 assignment
  • Find a biological data set, and then analyze it
  • The analysis should not be
      • part of a past, present, or future Honours, MSc or
        PhD thesis, or used for another class
        (self-plagiarism)
      • that, or a repeat of that, done by someone else
        (plagiarism)

15
Type 2 assignment
  • The analysis can
      • use the same data as in a thesis or another course,
        but with a totally different analysis
      • use data collected by your supervisor, or someone
        else, but you should ask them
      • use a data set that you find on the web, or
        somewhere else, but you should check that it is
        OK
      • be submitted for publication, but you must check
        that you have all necessary permissions

16
Type 2 assignment
  • Minimum sizes of data set (ask Hal for exceptions
    or in case of uncertainty)
      • For undergraduates (BIOL4062)
          • >50 units x >3 variables
      • For graduates (BIOL5062)
          • >50 units x >5 variables
          • either, two types of variables
            (e.g. Dependent and Independent, or Species and
            Environment)
          • or, link two data sets with one at least as large
            as the undergraduate data set
  • Must address at least 3 biological questions
    (BIOL4062), or 4 questions (BIOL5062)

17
Type 2 assignment (4 steps)
  • a) Short meeting with Hal or to discuss your
    proposed data set and proposed analysis: feedback
      • bring draft of 2b assignment
  • b) Description of data set and proposed analysis
      • where it came from
      • its structure(s) (number of variables, units,
        names of variables, types of variables, ...)
      • proposed biological questions
      • proposed analytical methods
      • possible problems
      • Example on web

18
Type 2 assignment (4 steps)
  • c) (i) Presentation of results to the class by
    graduate students
      • biological questions being addressed
      • brief description of the data set
      • how you analyzed it
      • conclusions
      • Example in class
  • c) (ii) Undergraduate students should go to
    graduate presentations and will be tested on
    general issues arising from them on the last day

19
Type 2 assignment (4 steps)
  • d) Write-up of your analysis as for a scientific
    journal paper
      • Max 5 pages (4062) or 7 pages (5062),
        single-spaced, excluding references, tables,
        figures
      • Explain biological question, methods in
        sufficient detail for someone to replicate them,
        problems, and biological conclusions
      • Show graphically, or in tables, the major effects
      • Do not just present summaries of ordinations or
        significance levels of hypothesis tests
      • Introduction and Discussion can be shorter and
        less detailed than in a published paper:
        sufficient to give a good feel for the biological
        issue being examined and the potential biological
        significance of the results

Example on web
20
Type 2 assignment
  • Marks
      • 2b Description of data set and proposed analysis:
        5%
      • 2c: 15%
          • (i) Presentation of results by graduate students
            (BIOL5062)
          • (ii) Test on general principles from graduate
            student presentations (BIOL4062)
      • 2d Write-up of results: 30%

21
Tentative schedule
22
SYSTAT demo. at end of lectures
23
Analysis of Biological Data
  • Types of biological data
  • History (very abbreviated!)
  • The process of biological data analysis
      • why garbage may come out
  • Hypothesis testing and data analysis
      • assumptions
      • other issues

24
Types of biological data
  • Morphometric
  • Community ecology
      • organism distribution and environmental variation
  • Genetic data for ecological and evolutionary
    questions
  • Population data for management, conservation,
    evolutionary questions
  • Behavioural, physiological, ...

25
Development of biological data analysis
  • >1850: Displays
  • >1900: ANOVAs, regression, correlation
      • without computers
  • >1930: Non-parametric methods
  • >1970: Multiple regression and multivariate
    analysis
      • matrix algebra using computers
  • >1980: Robust methods: bootstraps, jackknives,
    permutations
      • need powerful computers

27
Garbage in => Garbage out
  • Good data + Errors => Garbage in => Garbage out
      • Check data entry
  • Good data + Errors in routine => Garbage out
      • Check results, run routines on data with a known
        answer, run on 2 routines
  • Good data + Wrong model => Garbage out
      • Think about, read about and discuss the model
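
The advice to run routines on data with a known answer, and to run the same data through 2 routines, might be sketched as follows. This is only an illustration in Python (not one of the packages listed for the class): the two function names are hypothetical, and the data set y = 2 + 3x is made up so that the correct intercept and slope are known in advance.

```python
import math

def fit_line_centered(x, y):
    """Least-squares line from centred sums: slope = Sxy / Sxx."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_line_raw_sums(x, y):
    """Same fit computed from raw sums (a second, separately written routine)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - slope * sx) / n, slope

# Artificial data with a known answer: intercept 2, slope 3, no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * value for value in x]

fit_a = fit_line_centered(x, y)
fit_b = fit_line_raw_sums(x, y)

# Check 1: does the routine recover the known answer?
recovers_known = all(math.isclose(est, true, abs_tol=1e-9)
                     for est, true in zip(fit_a, (2.0, 3.0)))
# Check 2: do the two routines agree with each other?
routines_agree = all(math.isclose(p, q, abs_tol=1e-9)
                     for p, q in zip(fit_a, fit_b))

print("recovers known answer:", recovers_known)
print("two routines agree:", routines_agree)
```

If either check fails, the problem lies in the routine (or in how it was called) rather than in the biology, which is the point of testing on data where the answer is not in doubt.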

28
Hypothesis Testing and Data Analysis
  • Hypothesis testing
      • Hypothesis
      • Experimental Design
      • Experiment
      • Analysis
      • Conclusion
      • ANOVA, t-test
      • Agriculture
      • Experimental ecology
      • Physiology
      • Animal behaviour
  • Data analysis
      • Data Collection
      • Data Analysis
      • Hypothesis
      • scatter plots, box plots, most multivariate
        analyses
      • Fisheries
      • Community ecology
      • Paleontology

29
Some assumptions
  • Normality
      • can only be properly examined on large data sets
      • mainly a problem on small ones
      • an important issue for hypothesis testing
      • normality desirable in data analysis
  • Linearity
      • makes hypothesis testing easier
      • makes data analysis easier
  • Independence
      • major problem for hypothesis testing
      • no problem, or an advantage, in data analysis
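
One way to see why normality can only be properly examined on large data sets is a small simulation. The sketch below is an illustration, not a method from the class: it draws samples from a truly normal distribution and looks at how much the sample skewness varies from sample to sample. The sample sizes (10 and 1000), the number of replicates and the random seed are all arbitrary assumptions.

```python
import random
import statistics

def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_spread(sample_size, replicates, rng):
    """Standard deviation of skewness across repeated samples from a normal distribution."""
    skews = [
        sample_skewness([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(replicates)
    ]
    return statistics.stdev(skews)

rng = random.Random(4062)  # fixed seed so the run is repeatable
spread_small = skewness_spread(10, 200, rng)
spread_large = skewness_spread(1000, 200, rng)

print("spread of skewness, n = 10:  ", round(spread_small, 3))
print("spread of skewness, n = 1000:", round(spread_large, 3))
```

The true skewness is zero in both cases, but with only 10 units the estimate swings widely, so a small sample from a normal distribution can easily look skewed, and a small sample from a skewed distribution can easily look normal.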

30
Other issues in data analysis
  • Missing data
      • Often present in ecological data
  • Outliers
      • What do we do with apparent outliers?
      • Remove them?
  • Multiple comparisons
      • Major issue with hypothesis testing
      • Not an issue with data analysis
      • although patterns appear in random data
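
The warning that patterns appear in random data can be illustrated with a short simulation. This is a sketch under stated assumptions: 20 unrelated random variables measured on 30 units, with every pair screened for correlation. The threshold 0.361 is the approximate tabled two-tailed 5% critical value of |r| for 30 units, and the seed and dimensions are arbitrary.

```python
import itertools
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(5062)  # fixed seed so the run is repeatable
n_units, n_variables = 30, 20

# Purely random data: no variable is related to any other.
data = [[rng.gauss(0.0, 1.0) for _ in range(n_units)] for _ in range(n_variables)]

R_CRITICAL = 0.361  # assumed: approximate 5% two-tailed critical |r| for n = 30

abs_r = [
    abs(pearson_r(data[i], data[j]))
    for i, j in itertools.combinations(range(n_variables), 2)
]
n_pairs = len(abs_r)
n_spurious = sum(r > R_CRITICAL for r in abs_r)
max_abs_r = max(abs_r)

print("pairs examined:", n_pairs)
print("pairs 'significant' at 5%:", n_spurious)
print("largest |r| found:", round(max_abs_r, 3))
```

With 190 pairs and a 5% threshold, roughly 9 or 10 pairs would be expected to pass by chance alone, so an apparent pattern found by scanning many comparisons needs more support than its individual significance level.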

31
Next class
  • Inference in ecology and evolution
  • Null hypothesis statistical tests
  • Effect size statistics
  • Bayesian statistics
  • Information theoretic model comparisons

33
Performance in BIOL4062/5062
  • Graduate students (BIOL5062)
      • some do well with rather little effort
      • some do well with a lot of effort
  • Undergraduate students (BIOL4062)
      • most do well with some effort
          • adequate statistical background
      • some do poorly
          • inadequate statistical background or effort