Title: Challenges for computer science as a part of Systems Biology
1 Challenges for computer science as a part of Systems Biology
- Benno Schwikowski, Institute for Systems Biology, Seattle, WA
2 Towards integrative models
- Protein interaction
  - Interaction partner
  - Direct/indirect
  - Affinity
  - Effect
- DNA
  - Sequence
  - Genomic locus
  - Domain content
  - Intron/exon structure
  - Regulatory motifs
  - Chemical modifications
  - SNPs
  - Splice variants
  - Accessibility
  - Variation
- mRNA
  - Abundance
  - Regulatory information
  - Initiation/termination signals
- Protein
  - Abundance
  - State
  - Localization
  - 3D structure
  - Functional characterization
  - Half-life
  - Active sites
  - Biochemical function
  - Cellular role
3 Challenge: Integrative models
- Across genes and proteins: many genes involved (e.g., multifactorial diseases)
- Across model systems: lack of experimental platforms in target system
- Across levels of biological organization (e.g., gene regulatory processes involving phosphorylation)
- Across experiments: robustness against errors in mass spectrometry, mRNA measurements
- Across timescales
4 Challenge: Capturing evolutionary constraints
DNA, RNA, Proteins, Modules, Organelles, Cells, Organs, Individuals, Populations, Ecologies
"Nothing in biology makes sense except in the light of evolution."
- Theodosius Dobzhansky
5 Challenge: Which tools and experiments to use
6 Challenge: Choosing experiments
- Machine learning: determine the most likely classification/parameterization on the basis of a randomly sampled dataset
- Active learning: allow an algorithm to query selected data points, using the results of previous queries
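The contrast between the two settings can be illustrated on a toy problem: locating an unknown threshold on the interval [0, 1]. This is a minimal sketch and not part of the talk; the oracle `label`, the value `TRUE_THRESHOLD`, and both estimator functions are hypothetical names chosen for illustration. The passive learner sees labels for randomly sampled points, while the active learner picks each query from the answers to earlier ones (here, by bisection).

```python
import random

def passive_estimate(label, n_queries, rng):
    """Estimate a threshold from randomly sampled, labeled points."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.random()
        if label(x):              # x lies at or above the threshold
            hi = min(hi, x)
        else:                     # x lies below the threshold
            lo = max(lo, x)
    return (lo + hi) / 2

def active_estimate(label, n_queries):
    """Estimate a threshold by choosing each query from earlier answers."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = (lo + hi) / 2         # query the midpoint of the uncertain interval
        if label(x):
            hi = x
        else:
            lo = x
    return (lo + hi) / 2

TRUE_THRESHOLD = 0.3141           # hypothetical ground truth, hidden from the learners

def label(x):
    """Oracle standing in for an experiment: is x at or above the threshold?"""
    return x >= TRUE_THRESHOLD

rng = random.Random(0)
passive_error = abs(passive_estimate(label, 20, rng) - TRUE_THRESHOLD)
active_error = abs(active_estimate(label, 20) - TRUE_THRESHOLD)
```

With the same budget of 20 queries, bisection halves the uncertain interval at every step, whereas random sampling only shrinks it when a sample happens to fall inside it. In an experimental setting each query would correspond to one experiment, which is why the choice of queries matters.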
7 Challenge: Relations between system variables can be quite complex
- Yuh, Bolouri, Davidson, Science, 1998
8 Challenge: Relations between system variables can be quite complex
- Yuh, Bolouri, Davidson, Science, 1998
9 Challenge: Develop models that allow extremely efficient algorithms
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
10 CLUSTALW (1.74) multiple sequence alignment

Cotton    ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea       GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco   TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip    ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat     TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed  TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch     TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton    CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea       C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco   AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip    CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat     GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed  ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch     TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton    ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea       GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco   GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip    CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat     CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed  TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch     CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton    T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea       TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco   CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch     TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip    TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat     GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed  CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
11 Challenge: Developing models that allow extremely efficient algorithms
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
ACGT
ACGT
ACGT
ACGG
Parsimony score: 1
J. Comp Biol. 2002
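To make the parsimony score concrete, the following sketch counts the minimum number of substitutions needed to explain the four strings listed above when they sit at the leaves of a tree. It uses Fitch's small-parsimony algorithm, a standard method that the slide itself does not name, and the tree topology is a hypothetical choice made only for illustration.

```python
def leaves(tree):
    """Collect the leaf strings of a tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def fitch_column(tree, column):
    """Return (candidate states, substitutions) for one string position."""
    if isinstance(tree, str):                  # leaf: its character is fixed
        return {tree[column]}, 0
    left, right = tree
    left_states, left_cost = fitch_column(left, column)
    right_states, right_cost = fitch_column(right, column)
    common = left_states & right_states
    if common:                                 # children can agree: no new change
        return common, left_cost + right_cost
    # Children disagree: one substitution is needed on some edge below.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree):
    """Sum the per-position substitution counts over all positions."""
    length = len(leaves(tree)[0])
    return sum(fitch_column(tree, c)[1] for c in range(length))

# Hypothetical topology over the four strings shown on the slide.
tree = (("ACGT", "ACGT"), ("ACGT", "ACGG"))
score = parsimony_score(tree)
```

Only the last position differs, and only in a single leaf, so one substitution suffices on any topology over these four leaves.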
12 An Exact Algorithm (generalizing Sankoff and Rousseau 1975)
W_u[s]: best parsimony score for the subtree rooted at node u, if u is labeled with string s.
4^k entries
AGTCGTACGTG
ACGGGACGTGC
ACGTGAGATAC
GAACGGAGTAC
TCGTGACGGTG
ACGG: 0, ACGT: ∞, ...
J. Comp Biol. 2002
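One plausible reading of this table-based algorithm is sketched below. It assumes that a leaf table holds 0 for every k-mer occurring in the leaf's sequence and infinity otherwise, and that an internal node's entry for s adds up, over its children, the cheapest child label t plus the Hamming distance between s and t. The function names and the tree topology are hypothetical; only the five sequences come from the slide.

```python
from itertools import product

ALPHABET = "ACGT"

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

def score_table(node, k, kmers):
    """Return {s: W_u[s]} for the subtree rooted at `node`.

    A leaf is a sequence string; an internal node is a tuple of children.
    """
    if isinstance(node, str):
        present = {node[i:i + k] for i in range(len(node) - k + 1)}
        # A leaf can only be labeled with a k-mer it actually contains.
        return {s: 0 if s in present else float("inf") for s in kmers}
    child_tables = [score_table(child, k, kmers) for child in node]
    # For each label s, every child independently picks its cheapest label t.
    return {
        s: sum(min(w + hamming(s, t) for t, w in table.items())
               for table in child_tables)
        for s in kmers
    }

k = 4
kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]   # all 4^k labels

# Hypothetical tree topology over the five sequences shown on the slide.
tree = (("AGTCGTACGTG", "ACGGGACGTGC"),
        ("ACGTGAGATAC", ("GAACGGAGTAC", "TCGTGACGGTG")))

root_table = score_table(tree, k, kmers)
best_score = min(root_table.values())
best_labels = [s for s in kmers if root_table[s] == best_score]
```

Each node stores one entry per k-mer, and every parent-child edge compares all pairs of k-mers, which is where the 4^k and 4^(2k) factors come from. On this assumed topology the best score is 1, obtained for example by labeling the root ACGT and the last two leaves ACGG.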
13 What are good challenges to tackle?
- Biological/medical questions asked
- Experimental technologies to acquire a lot of relevant data
- Available datasets with a formalized notion of data quality
14 Memory complexity: O(k · 4^(2k)) per node
Time complexity: total time O(n k (4^(2k) + l))
- n: number of species
- k: motif length
- l: average sequence length
J. Comp Biol. 2002
15 Technology-based challenges: Universal DNA Tag Systems
- Existing applications in high-throughput technologies
  - Universal DNA arrays
  - Padlock probes
  - LYNX mRNA technology
16 Formalization
Define weight(A/T) = 1, weight(C/G) = 2
weight(AACTTG) = 1+1+2+1+1+2 = 8
melting temperature(AACTTG) ≈ 2 · weight
l-u code problem: Given two integers l < u, find the largest set of tags such that
- each tag has weight ≥ u
- each string of weight ≥ l occurs at most once
J. Comp Biol. 2000, 2003
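The weight function and the two conditions of the l-u code problem can be checked mechanically. The sketch below reads both conditions as lower bounds (tags of weight at least u, repeated substrings restricted to weight below l), which is an assumption about the slide's intent, and it only verifies a given tag set rather than searching for a largest one. The function names and the example tags are hypothetical.

```python
from collections import Counter

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}

def weight(s):
    """Sum of base weights: A/T count 1, C/G count 2."""
    return sum(WEIGHT[base] for base in s)

def is_lu_code(tags, l, u):
    """Check the two l-u code conditions for a given set of tags."""
    # Condition 1: every tag is heavy enough.
    if any(weight(tag) < u for tag in tags):
        return False
    # Condition 2: no substring of weight >= l occurs more than once,
    # counting occurrences across all tags and all positions.
    counts = Counter(
        tag[i:j]
        for tag in tags
        for i in range(len(tag))
        for j in range(i + 1, len(tag) + 1)
        if weight(tag[i:j]) >= l
    )
    return all(count <= 1 for count in counts.values())

example_weight = weight("AACTTG")                     # 1+1+2+1+1+2
valid = is_lu_code(["AACTTG", "CCGGAA"], l=4, u=8)    # no heavy substring repeats
shared = is_lu_code(["AACTTG", "GAACTA"], l=4, u=8)   # both contain "AAC" (weight 4)
too_light = is_lu_code(["AACT"], l=4, u=8)            # weight 5 is below u
```

Bounding tag weight from below keeps melting temperatures high enough for the intended hybridization, while forbidding repeated heavy substrings limits stable cross-hybridization between different tags.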
17 Challenge: Visualization
Andrea Weston et al. at ISB; Cytoscape
18 Challenge: Visualization
Cytoscape, pre-release 2.0
19 A computer scientist's perspective
- "Biology is so digital, and incredibly complicated ... I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level."
- Donald Knuth, 7 Dec 1993