Challenges for computer science as a part of Systems Biology PowerPoint PPT Presentation


Title: Challenges for computer science as a part of Systems Biology


1
Challenges for computer science as a part of
Systems Biology
  • Benno Schwikowski, Institute for Systems
    Biology, Seattle, WA

2
Towards integrative models
  • Protein interaction
  • Interaction partner
  • Direct/indirect
  • Affinity
  • Effect
  • DNA
  • Sequence
  • Genomic locus
  • Domain content
  • Intron/exon structure
  • Regulatory motifs
  • Chemical modifications
  • SNPs
  • Splice variants
  • Accessibility
  • Variation
  • mRNA
  • Abundance
  • Regulatory information
  • Initiation/termination signals
  • Protein
  • Abundance
  • State
  • Localization
  • 3D structure
  • Functional characterization
  • Half-life
  • Active sites
  • Biochemical function
  • Cellular role

3
Challenge: Integrative models
  • Across genes and proteins: Many genes involved
    (e.g., multifactorial diseases)
  • Across model systems: Lack of experimental
    platforms in target system
  • Across levels of biological organization (e.g.,
    gene regulatory processes involving
    phosphorylation)
  • Across experiments: Robustness against errors in
    mass spectrometry, mRNA measurements
  • Across timescales

4
Challenge: Capturing evolutionary constraints
DNA, RNA, Proteins, Modules, Organelles, Cells, Organs,
Individuals, Populations, Ecologies

"Nothing in biology makes sense except in the
light of evolution." Theodosius Dobzhansky
5
Challenge: Which tools and experiments to use
6
Challenge: Choosing experiments
  • Machine Learning: Determine most likely
    classification/parameterization on the basis of a
    randomly sampled dataset
  • Active Learning: Allow an algorithm to query
    selected data points, using the results of
    previous queries.
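
The contrast between the two bullets can be illustrated with a toy problem that is not from the slides: learning an unknown threshold on the interval [0, 1]. In this minimal sketch, `make_oracle` stands in for a hypothetical experiment that labels a point, `passive_learn` uses randomly sampled points, and `active_learn` picks each query from the results of the previous ones (here by bisection). All names and the threshold model are assumptions made for illustration.

```python
import random


def make_oracle(threshold):
    """Hypothetical experiment: returns label 1 if x >= threshold, else 0."""
    def oracle(x):
        return 1 if x >= threshold else 0
    return oracle


def passive_learn(oracle, n_queries, rng):
    """Random sampling: bracket the threshold using randomly chosen points."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.random()
        if oracle(x):
            hi = min(hi, x)   # threshold is at or below x
        else:
            lo = max(lo, x)   # threshold is above x
    return lo, hi


def active_learn(oracle, n_queries):
    """Active learning: each query is chosen from the previous answers."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        mid = (lo + hi) / 2   # most informative point given what is known
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return lo, hi


if __name__ == "__main__":
    oracle = make_oracle(0.37)
    print("passive:", passive_learn(oracle, 10, random.Random(0)))
    print("active: ", active_learn(oracle, 10))
```

In this toy setting, each active query halves the remaining uncertainty, so ten queries narrow the interval to a width of 2^-10, whereas random sampling typically shrinks it much more slowly.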

7
Challenge: Relations between system variables can
be quite complex
  • Yuh, Bolouri, Davidson, Science, 1998

8
Challenge: Relations between system variables can
be quite complex
  • Yuh, Bolouri, Davidson, Science, 1998

9
Challenge: Develop models that allow extremely
efficient algorithms
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
10
CLUSTAL W (1.74) multiple sequence alignment

Cotton      ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea         GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco     TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant   TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip      ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat       TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed    TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch       TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton      CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea         C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco     AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant   ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip      CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat       GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed    ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch       TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton      ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea         GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco     GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant   GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip      CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat       CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed    TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch       CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton      T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea         TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco     CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant   TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch       TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip      TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat       GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed    CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
11
Challenge: Developing models that allow extremely
efficient algorithms
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
ACGT
ACGT
ACGT
ACGG
Parsimony score: 1
J. Comp Biol. 2002
12
An Exact Algorithm (generalizing Sankoff and
Rousseau 1975)
W_u[s]: best parsimony score for the subtree rooted
at node u, if u is labeled with string s.
4^k entries per table (k = motif length)
AGTCGTACGTG
ACGGGACGTGC
ACGTGAGATAC
GAACGGAGTAC
TCGTGACGGTG
ACGG: 0   ACGT: ?   ...
J. Comp Biol. 2002
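
The table W_u[s] described on this slide can be filled bottom-up with a Sankoff-style dynamic program. The following is a minimal sketch under stated assumptions, not the published implementation: substitutions are counted with Hamming distance, a leaf table holds 0 for every k-mer that occurs in the leaf's sequence and infinity otherwise, and the tree is given as nested tuples. The tree shape in the usage example is hypothetical, since the slide does not state it; all function names are illustrative.

```python
from itertools import product

INF = float("inf")


def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def all_kmers(k):
    """All 4^k strings of length k over the DNA alphabet."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def leaf_table(sequence, k):
    """W_u[s] = 0 if s occurs in the leaf sequence, infinity otherwise."""
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return {s: (0 if s in present else INF) for s in all_kmers(k)}


def node_table(tree, k):
    """Compute W_u for a tree given as a leaf string or a tuple of subtrees."""
    if isinstance(tree, str):
        return leaf_table(tree, k)
    child_tables = [node_table(child, k) for child in tree]
    kmers = all_kmers(k)
    table = {}
    for s in kmers:
        # Each child independently picks its best label t given parent label s.
        table[s] = sum(
            min(w_v[t] + hamming(s, t) for t in kmers) for w_v in child_tables
        )
    return table


def best_parsimony(tree, k):
    """Best score at the root and the root labels that achieve it."""
    table = node_table(tree, k)
    score = min(table.values())
    return score, sorted(s for s, w in table.items() if w == score)


if __name__ == "__main__":
    # Hypothetical tree over the five sequences shown on the slide.
    tree = ((("AGTCGTACGTG", "ACGGGACGTGC"), "ACGTGAGATAC"),
            ("GAACGGAGTAC", "TCGTGACGGTG"))
    print(best_parsimony(tree, 4))
```

The inner minimum ranges over all 4^k candidate labels for each of the 4^k parent labels, which is where a 4^(2k) factor per node comes from in this straightforward version.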
13
What are good challenges to tackle?
  • Biological/medical questions asked
  • Experimental technologies to acquire a lot of
    relevant data
  • Available datasets with a formalized notion of
    data quality

14
Memory complexity: O(k · 4^(2k)) per node
Time complexity: total time O(n · k · (4^(2k) + l))
n: number of species
k: motif length
l: average sequence length
J. Comp Biol. 2002
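
To get a feel for how quickly these bounds grow with the motif length, the arithmetic can be evaluated directly. This is a minimal sketch that assumes the total time bound reads n · k · (4^(2k) + l), with n the number of species, k the motif length, and l the average sequence length; the function names and the example values of n and l are illustrative only, and the results are operation counts up to a constant factor, not measured running times.

```python
def table_entries(k):
    """Number of candidate labels (k-mers over A, C, G, T) per node table."""
    return 4 ** k


def total_time_bound(n, k, l):
    """Evaluate n * k * (4^(2k) + l), ignoring constant factors."""
    return n * k * (4 ** (2 * k) + l)


if __name__ == "__main__":
    # Illustrative values: 5 species, average sequence length 1000.
    for k in range(4, 11, 2):
        print(k, table_entries(k), total_time_bound(5, k, 1000))
```

Because the 4^(2k) term dominates, each additional motif position multiplies the bound by roughly 16.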
15
Technology-based challenges: Universal DNA Tag
Systems
  • Existing applications in high-throughput
    technologies
  • Universal DNA arrays
  • Padlock probes
  • LYNX mRNA technology

16
Formalization
Define weight(A/T) = 1, weight(C/G) = 2
weight(AACTTG) = 1+1+2+1+1+2 = 8 => melting temperature
(AACTTG) = 2 · weight
l-u code problem: Given two integers l < u, find
the largest set of tags such that
  • Each tag has weight ≥ u
  • Each string of weight ≥ l occurs at most once
    (as a substring of the tags)
J. Comp Biol. 2000, 2003
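
The definitions on this slide translate directly into a small checker. The sketch below assumes the weights stated on the slide (A/T count 1, C/G count 2), the usual 2-4 rule of thumb for the melting temperature (twice the weight, in °C), and the reading that every tag must have weight at least u while no string of weight at least l may occur more than once as a substring across all tags. The function names are illustrative, and the sketch only verifies a given tag set; it does not search for a largest one.

```python
from collections import Counter

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s):
    """Sum of per-base weights: 1 for A/T, 2 for C/G."""
    return sum(WEIGHT[base] for base in s)


def melting_temperature(s):
    """2-4 rule estimate in degrees Celsius: 2 * weight."""
    return 2 * weight(s)


def is_lu_code(tags, l, u):
    """Check the two l-u code conditions for a candidate tag set."""
    if any(weight(tag) < u for tag in tags):
        return False
    occurrences = Counter()
    for tag in tags:
        for i in range(len(tag)):
            for j in range(i + 1, len(tag) + 1):
                sub = tag[i:j]
                if weight(sub) >= l:
                    occurrences[sub] += 1  # count every occurrence position
    return all(count <= 1 for count in occurrences.values())


if __name__ == "__main__":
    print(weight("AACTTG"), melting_temperature("AACTTG"))
    print(is_lu_code(["AACTTG", "CCGGAA"], l=4, u=8))
```

Lowering l makes the condition stricter: with l = 2, the shared substring "AA" in the two example tags already violates it.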
17
Challenge: Visualization
Andrea Weston et al. @ ISB, Cytoscape
18
Challenge: Visualization
Cytoscape, pre-release 2.0
19
A computer scientist's perspective
  • "Biology is so digital, and incredibly
    complicated ... I can't be as confident about
    computer science as I can about biology. Biology
    easily has 500 years of exciting problems to work
    on, it's at that level."
  • Donald Knuth, 7 Dec 1993
