Title: Computational biology and computational biologists
1Computational biology and computational
biologists
- Tandy Warnow, UT-Austin
- Department of Computer Sciences
- Institute for Cellular and Molecular Biology
- Program in Evolution, Ecology, and Behavior
- Center for Computational Biology and
Bioinformatics
2Two computational biologists
- One computational biologist needs to know a lot
of biology
- Another needs to know a lot of mathematics
3Another two computational biologists
- Craig Benham: mathematics of stressed DNA
(understanding regulation)
- Gene Myers: whole genome sequencing and BLAST
4Two different types of computational biologists
- One works on mathematical or computational
problems (derived from biology) that are well
posed and hard to solve; these need
significant computer science/math/statistics
- One works on biological problems that are not
well posed, and where the computer
science/math/statistics needed may be easier
- Both can be problems that are important to
biologists, and which they cannot solve without
computational biologists' involvement
5My view of Pasteur's Quadrant
[Figure: quadrant diagram. One axis runs from easy math to hard math, the other from not applicable to easily applicable.]
6My view of Pasteur's Quadrant
[Figure: quadrant diagram. One axis runs from easy math to hard math, the other from not applicable to easily applicable. Label shown: What computational scientists want.]
7My view of Pasteur's Quadrant
[Figure: quadrant diagram. One axis runs from easy math to hard math, the other from not applicable to easily applicable. Labels shown: What computational scientists want; What computational scientists do.]
8My view of Pasteur's Quadrant
[Figure: quadrant diagram. One axis runs from easy math to hard math, the other from not applicable to easily applicable. Labels shown: What computational scientists want; What computational scientists do; What biologists want.]
9Phylogeny
From the Tree of Life website, University of
Arizona
[Figure: phylogenetic tree of Orangutan, Human, Gorilla, and Chimpanzee]
10DNA Sequence Evolution
11Molecular Systematics
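[Figure: five taxa (U, V, W, X, Y) with example DNA sequences TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, and AGGGCAT, and a tree whose leaves appear in the order X, U, Y, V, W]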
12Computational challenges for Assembling the Tree
of Life
- 8 million species for the Tree of Life, but we cannot
currently analyze more than a few hundred (and
even this can take years)
- We need new methods for inferring large
phylogenies: hard optimization problems!
- We need new software for visualizing large trees
- We need new database technology
- Not all phylogenies are trees, so we need methods
for inferring phylogenetic networks
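
One reason large phylogenies lead to hard optimization problems is the size of the search space. The sketch below uses a standard combinatorial fact that is not stated on the slide: the number of unrooted binary trees on n labeled leaves is (2n-5)!!, the product of the odd numbers from 1 to 2n-5. The function name is an illustrative assumption.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary trees on n labeled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


for n in (5, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))
```

Even at a few dozen leaves the count is far too large for exhaustive search, which is why heuristics are used in practice.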
13Time is a bottleneck for MP and ML
- Systematists tend to prefer trees with the
optimal maximum parsimony (MP) score or optimal
maximum likelihood (ML) score; however, both problems
are hard to solve
- (Our experimental studies show that polynomial-time
methods do not do as well as MP or ML
heuristics when trees are big and have high
rates of evolution)
[Figure: MP score plotted over the space of phylogenetic trees, showing a local optimum and the global optimum]
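
The hard part of MP is the search over trees; scoring one fixed tree is easy. As a minimal sketch, the code below computes the parsimony score of a given rooted binary tree with Fitch's algorithm (a standard method, not named in the talk). It uses the five sequences from the Molecular Systematics slide; the two tree shapes and all names (`fitch_site`, `parsimony_score`, `TREE_A`, `TREE_B`) are hypothetical choices for illustration.

```python
# Sequences from the Molecular Systematics slide. The pairing of taxa to
# sequences is not assumed here, so the leaves are left unnamed.
S1, S2, S3, S4, S5 = "TAGCCCA", "TAGACTT", "TGCACAA", "TGCGCTT", "AGGGCAT"


def fitch_site(tree, site):
    """Return (candidate ancestral states, substitution count) for one column."""
    if isinstance(tree, str):  # leaf: the observed character, no cost
        return {tree[site]}, 0
    left, right = tree
    left_states, left_cost = fitch_site(left, site)
    right_states, right_cost = fitch_site(right, site)
    shared = left_states & right_states
    if shared:  # the children can agree: no new substitution
        return shared, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1


def first_leaf(tree):
    """Walk down the left-most branch to reach a leaf sequence."""
    while not isinstance(tree, str):
        tree = tree[0]
    return tree


def parsimony_score(tree):
    """Sum the per-column Fitch costs over all alignment columns."""
    n_sites = len(first_leaf(tree))
    return sum(fitch_site(tree, site)[1] for site in range(n_sites))


# Two hypothetical topologies on the same five sequences (nested pairs).
TREE_A = ((S1, S2), ((S3, S4), S5))
TREE_B = ((S2, S4), (S3, (S1, S5)))

print(parsimony_score(TREE_A), parsimony_score(TREE_B))
```

Different topologies give different scores on the same data; an MP heuristic looks for the topology with the lowest one.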
14MP/ML heuristics
[Figure: fake (illustrative) study of the performance of a hill-climbing heuristic; MP score of best trees plotted against time]
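
How a hill-climbing heuristic can stall at a local optimum can be seen in a toy sketch. Real MP/ML heuristics search over tree topologies using rearrangement moves; here the search space is a hypothetical one-dimensional list of scores (lower is better, as with MP), and the names `hill_climb` and `LANDSCAPE` are assumptions made for illustration only.

```python
def hill_climb(scores, start):
    """Greedy local search: move to the better neighbor until none improves.

    Returns the final position and the list of scores visited along the way.
    """
    current = start
    history = [scores[current]]
    while True:
        neighbors = [i for i in (current - 1, current + 1)
                     if 0 <= i < len(scores)]
        if not neighbors:
            return current, history
        best = min(neighbors, key=lambda i: scores[i])
        if scores[best] >= scores[current]:
            return current, history  # no neighbor improves: an optimum
        current = best
        history.append(scores[current])


# Toy landscape: local optima at index 1 and index 4, global optimum at index 7.
LANDSCAPE = [9, 7, 8, 6, 4, 5, 3, 2, 6]

for start in (0, 3, 5):
    print(start, hill_climb(LANDSCAPE, start))
```

Whether the search reaches the global optimum depends entirely on where it starts, which is one reason such heuristics can take a long time on large datasets.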
15DCM-boosting: Speeding up MP/ML heuristics
[Figure: fake (illustrative) study of the performance of a hill-climbing heuristic; MP score of best trees plotted against time, with a second curve marked Desired Performance]
16Characteristics
- The research can be published in
mathematics/statistics/computer science journals
and conferences, and evaluated along these lines
- These people can be faculty in
Math/Statistics/Computer Science departments, and
maybe in some biology departments
- Substantive improvements are hard, but if
achieved will have enormous impact on many
biologists
- Why? These are old problems, endorsed by
biologists, of a computational nature.
17The other type
- Deals with problems like protein fold
prediction, inferring metabolic or regulatory
networks, finding genes within genomes, or even
computing a good multiple sequence alignment
- Needs to know a lot of biology to pose
appropriate computational problems
- Resultant algorithms may not (in some cases) make
for interesting or publishable mathematics
- Note: these are generally new problems, arising
from new data
18What's needed (for all types)
- Ability to collaborate with a variety of people,
and learn what they want to achieve
- Ability to be flexible in terms of how one
evaluates research results (e.g., real vs.
simulated data, theory vs. experiment)
- Ability to communicate research results to
different types of researchers
- Ability to use a variety of techniques to solve
biological problems
- Ability to model and pose appropriate
computational approaches for biological problems
19Difficult questions
- What departments should have computational
biologists (especially of the second type)?
- Should there be departments of computational
biology?
- Should there be PhD programs in computational
biology?
- How to evaluate a computational biologist of
either type?
20Some issues for academic computational biologists
- Journal versus conference papers, and number of
each
- Experimental/empirical versus theoretical work
- Software versus papers
- Authorship order within publications
- Promotion and Tenure in two departments?
- Biggest issue: how to predict future success?