Title: Masters course Bioinformatics Data Analysis and Tools
1. Masters course: Bioinformatics Data Analysis and Tools
- Lecture 1: Introduction
- Centre for Integrative Bioinformatics
- FEW/FALW
- heringa_at_cs.vu.nl
2. Course objectives
- There are two extremes in bioinformatics work:
- Tool users (biologists) know how to press the buttons and know the biology, but have no clue what happens inside the program
- Tool shapers (informaticians) know the algorithms and how the tool works, but have no clue about the biology
- Both extremes are dangerous; we need a breed that can do both
3. At the end of this course
- You will have seen a couple of algorithmic examples
- You will have an overview of the methods used in the field
- You will have a firm basis in the physics and thermodynamics behind many processes and methods
- You will have an idea of, and some experience with, what it takes to shape a bioinformatics tool
4. Bioinformatics
Studying informatic processes in biological systems (Hogeweg)
Information technology applied to the management and analysis of biological data (Attwood and Parry-Smith)
Applying algorithms with mathematical formalisms in biology (genomics) (USA)
5. This course
- General theory of crucial algorithms (genetic algorithms (GA), neural networks (NN), hidden Markov models (HMM), etc.)
- Method examples
- Research projects within own group
- Repeats
- Contact alignment
- Domain boundary prediction
- Physical basis of biological processes and tools
6. Bioinformatics
[Figure: fields of science arranged along a scale from "Large, external (integrative)" to "Small, internal (individual)". Labels on the diagram: Science; Human; Planetary Science; Cultural Anthropology; Population Biology; Sociology; Sociobiology; Psychology; Systems Biology; Biology; Medicine; Molecular Biology; Chemistry; Physics.]
7. Genomic Data Sources
- DNA/protein sequence
- Expression (microarray)
- Proteome (X-ray, NMR, mass spectrometry)
- Metabolome
- Physiome (spatial, temporal)
Integrative bioinformatics
8. Protein structural data explosion
Protein Data Bank (PDB): 14500 structures (6 March 2001), of which 10900 X-ray crystallography, 1810 NMR, 278 theoretical models, others...
9. Algorithms in bioinformatics
- string algorithms
- dynamic programming
- machine learning (NN, k-NN, SVM, GA, ..)
- Markov chain models
- hidden Markov models
- Markov Chain Monte Carlo (MCMC) algorithms
- stochastic context free grammars
- EM algorithms
- Gibbs sampling
- clustering
- tree algorithms
- text analysis
- hybrid/combinatorial techniques and more
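To make one entry of this list concrete: dynamic programming is the technique behind global sequence alignment (Needleman-Wunsch). The block below is a minimal sketch, not the course's own implementation. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b (Needleman-Wunsch).

    The scoring scheme is an illustrative assumption, not taken from the lecture.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
```

Each cell is computed once from three neighbouring cells, so the running time is proportional to the product of the two sequence lengths. Real alignment tools typically use substitution matrices and affine gap penalties instead of the flat scores assumed here.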
10. Integrative bioinformatics @ VU
- Studying informational processes at biological system level
- From gene sequence to intercellular processes
- Computers necessary
- We have biology, statistics, computational intelligence (AI), HTC, ..
- VUMC microarray facility
- Enabling technology: new glue to integrate
- New integrative algorithms
- Goals: understanding cells in terms of genomes, fighting disease (VUMC)
11. Bioinformatics @ VU
- Progression:
- DNA: gene prediction, predicting regulatory elements
- mRNA expression
- Proteins: docking, domain prediction
- Metabolic pathways: metabolic control
- Cell-cell communication
13. Bioinformatics @ VU
- Qualitative challenges:
- High quality alignments (alternative splicing)
- In-silico structural genomics
- In-silico functional genomics: reliable annotation
- Protein-protein interactions
- Metabolic pathways: assign the edges in the networks
- Cell-cell communication: find membrane-associated components
- New algorithms
14. Bioinformatics @ VU
- Quantitative challenges:
- Understanding mRNA expression levels
- Understanding resulting protein activity
- Time dependencies
- Spatial constraints, compartmentalisation
- Are classical differential equation models adequate, or do we need more individual modeling (e.g. macromolecular crowding and activity at oligomolecular level)?
- Metabolic pathways: calculate fluxes through time
- Cell-cell communication: tissues, hormones, innervations
Need complete experimental data for a good biological model system to learn to integrate
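As a hedged illustration of what a "classical differential equation model" of a flux through time can look like, the sketch below integrates a single hypothetical reaction S -> P with mass-action kinetics, dS/dt = -k*S and dP/dt = +k*S, using the forward Euler method. The reaction, the rate constant k, the step size and the function name are all assumptions made for this example and do not come from the lecture.

```python
def simulate_conversion(s0, k, dt, steps):
    """Forward-Euler integration of dS/dt = -k*S, dP/dt = +k*S.

    Hypothetical one-step pathway S -> P (illustrative assumption).
    Returns a list of (time, S, P) tuples, starting at time 0.
    """
    s, p = s0, 0.0
    trajectory = [(0.0, s, p)]
    for step in range(1, steps + 1):
        flux = k * s        # instantaneous flux through the reaction
        s -= flux * dt      # substrate consumed during this time step
        p += flux * dt      # the same amount appears as product
        trajectory.append((step * dt, s, p))
    return trajectory


if __name__ == "__main__":
    # Print every 50th time point of a 200-step run.
    for t, s, p in simulate_conversion(1.0, 0.5, 0.01, 200)[::50]:
        print(f"t={t:.2f}  S={s:.3f}  P={p:.3f}")
```

A model of this kind treats concentrations as continuous, deterministic quantities. The question raised on the slide is whether that assumption still holds when only a few molecules are present or when crowding matters, in which case individual-based or stochastic modeling may be needed.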
15. Bioinformatics @ VU
- VUMC
- Neuropeptide: addiction
- Oncogenes: disease patterns
- Rheumatic diseases
16. Bioinformatics @ VU
- Quantitative challenges:
- How much protein is produced from a single gene?
- What time dependencies?
- What spatial constraints (compartmentalisation)?
- Metabolic pathways: assign the edges in the networks
- Cell-cell communication: find membrane-associated components
17. Integrative bioinformatics
- Integrate data sources
- Integrate methods
- Integrate data through method integration (biological model)
18. Integrative bioinformatics: Data integration
[Diagram labels: Data; Algorithm; tool; Biological Interpretation (model)]
19. Integrative bioinformatics: Data integration
[Diagram labels: Data 1; Data 2; Data 3]
20. Integrative bioinformatics: Data integration
[Diagram labels: Data 1, Data 2, Data 3; Algorithm 1, Algorithm 2, Algorithm 3; tool; Biological Interpretation (model) 1, 2, 3]
21. Bioinformatics
- "Nothing in Biology makes sense except in the light of evolution" (Theodosius Dobzhansky, 1900-1975)
- "Nothing in Bioinformatics makes sense except in the light of Biology"