Universitat Aut - PowerPoint PPT Presentation

Provided by: Rode48

Transcript and Presenter's Notes



1
explosion of biological data
2
genome technologies
  • DNA sequencing
  • DNA microarrays
  • mass spectrometry and 2-D gels
  • yeast two-hybrid
  • X-ray crystallography and NMR

3
growth of sequence data
4
Moore's law
5
Google hits for "X-informatics"
term               hits
bioinformatics     2,270,000
chemoinformatics   10,600
astroinformatics   31
neuroinformatics   49,300
socioinformatics   318
geoinformatics     38,000
meteoinformatics   2
econoinformatics   83
ecoinformatics     36,400
biology            17,000,000
6
decoding the genome
the genome sequence
  • ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGA
    AGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTA
    GCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACT
    CAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGG
    GACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAA
    GGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCC
    CCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTG
    TCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAG
    CCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGA
    AAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGA
    GGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGG
    GGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAG
    GCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAG
    GGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGT
    TGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGT
    TGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAG
    TTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTG
    TGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCT
    CGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCC
    CATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGA
    GGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAG
    CGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCA
    GCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCA
    GCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGC
    CTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTT
    TTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCT
    CTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAA
    TTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGT
    TAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGAT
    GAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGT
    TCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGC
    CATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCA
    CCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACC
    ATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCT
    TCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCT
    GGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGAC
    AGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACA
    CAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATC
    CCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGA
    GACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAA
    AACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGG
    CTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCC
    GAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGT
    CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTA
    GGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGG
    GAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGA
    GACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACAC
    CTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCC
    AGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCT
    GGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGA
    TTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATT
    TGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTC
    AGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCA
    AGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAG
    GCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTG
    GGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGC
    TGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGG
    CCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCAT
    TCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGC
    TTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCT
    CTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGT
    CCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAG
    ATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAG
    CGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCA
    CCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCG
    TCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTC
    ACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCC
    TTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC

7
the genome sequence
(same sequence as slide 6)
8
the genome sequence
(same sequence as slide 6)
9
(No Transcript)
10
gagttttatcgcttccatgacgcagaagttaacactttcggatatttctg
atgagtcgaaaaattatcttgataaagcaggaattactactgcttgttta
cgaattaaatcgaagtggactgctggcggaaaatgagaaaattcgaccta
tccttgcgcagctcgagaagctcttactttgcgacctttcgccatcaact
aacgattctgtcaaaaactgacgcgttggatgaggagaagtggcttaata
tgcttggcacgttcgtcaaggactggtttagatatgagtcacattttgtt
catggtagagattctcttgt
MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFY
TPKARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQL
ENYCN
11
probabilistic patterns in gene prediction
Roderic Guigó Serra, Robert Castelo
  • (IMIM-UPF-CRG)

12
decoding the genome
the genome sequence
(same sequence as slide 6)

13
the amino acid sequence of the proteins
  • QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQES
    KPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERI
    EKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDL
    FIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSP
    ESEQFRADHPFLFLIKHNPTNTIVYFGRYWS

14
eukaryotic gene structure
15
(No Transcript)
16
eukaryotic gene structure
(figure labels: donor and acceptor splice sites)
17
modeling donor sites
GGG GTGAGCCCAG
GTG GTAAGAGACA
TAG GTGAGTGTGA
GCG GTAGGTACTC
CAG GTAATTTTCT
AAG GTAGGCTCTG
AGG GTGAGTCCAG
GAG GTGGGTCACA
CAG GTCAGTCTTT
ACG GTAAGACCTG
CAG GTGGGTGCTG
CAG GTAAGCAGTG
AGG GTGAGTTCAG
CAG GTAAGCATTG
AGG GTGAGTTCAG
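The donor sites listed above can be turned into a position weight matrix (PWM), the simplest site model, which treats every position as independent. A minimal sketch, assuming a pseudocount of 1 and a uniform background of 0.25 per base (both illustrative choices, not taken from the slides); each site is written as 3 exonic plus 10 intronic nucleotides, as on the slide.

```python
import math

ALPHABET = "ACGT"

# The 15 donor sites from the slide: 3 exonic + 10 intronic nucleotides each.
DONOR_SITES = [
    "GGGGTGAGCCCAG", "GTGGTAAGAGACA", "TAGGTGAGTGTGA", "GCGGTAGGTACTC",
    "CAGGTAATTTTCT", "AAGGTAGGCTCTG", "AGGGTGAGTCCAG", "GAGGTGGGTCACA",
    "CAGGTCAGTCTTT", "ACGGTAAGACCTG", "CAGGTGGGTGCTG", "CAGGTAAGCAGTG",
    "AGGGTGAGTTCAG", "CAGGTAAGCATTG", "AGGGTGAGTTCAG",
]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2 odds} dict per position (assumed pseudocount and background)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in ALPHABET})
    return pwm


def pwm_score(pwm, seq):
    """Sum of per-position log-odds: each position contributes independently."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix length")
    return sum(column[base] for column, base in zip(pwm, seq.upper()))


pwm = build_pwm(DONOR_SITES)
print(pwm_score(pwm, "CAGGTAAGTATTT"))  # candidate with the canonical GT
print(pwm_score(pwm, "CAGCTAAGTATTT"))  # same candidate with the G of GT mutated
```

Every training site carries GT at intron positions +1 and +2, so those two columns dominate the score; the independence assumption is what the following slides on dependencies relax.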
18
the donor site pattern reflects underlying
biological constraints
19
the donor site pattern reflects underlying
biological constraints
20
the donor site pattern
21
prediction of splice sites
22
modeling dependencies
23
modeling dependencies: first-order Markov models
Weight Array Models (WAM), Zhang and Marr (1993)
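A weight array model replaces the independent columns of a PWM with position-specific first-order conditional probabilities, P(x_i | x_{i-1}). A minimal sketch trained on the donor sites from slide 17; the pseudocount of 0.5 and the uniform 0.25 background are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"

# Donor sites from slide 17 (3 exonic + 10 intronic nucleotides).
DONOR_SITES = [
    "GGGGTGAGCCCAG", "GTGGTAAGAGACA", "TAGGTGAGTGTGA", "GCGGTAGGTACTC",
    "CAGGTAATTTTCT", "AAGGTAGGCTCTG", "AGGGTGAGTCCAG", "GAGGTGGGTCACA",
    "CAGGTCAGTCTTT", "ACGGTAAGACCTG", "CAGGTGGGTGCTG", "CAGGTAAGCAGTG",
    "AGGGTGAGTTCAG", "CAGGTAAGCATTG", "AGGGTGAGTTCAG",
]


def normalize(counts):
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def train_wam(sites, pseudocount=0.5):
    """Position 0: P(x_0); positions i >= 1: P(x_i | x_{i-1}), estimated separately per position."""
    length = len(sites[0])
    first = normalize({b: pseudocount + sum(s[0] == b for s in sites) for b in ALPHABET})
    conditionals = []
    for i in range(1, length):
        table = {}
        for prev in ALPHABET:
            counts = {b: pseudocount for b in ALPHABET}
            for s in sites:
                if s[i - 1] == prev:
                    counts[s[i]] += 1
            table[prev] = normalize(counts)
        conditionals.append(table)
    return first, conditionals


def wam_log_odds(model, seq, background=0.25):
    """log2 P(seq | WAM) minus log2 P(seq | uniform background)."""
    first, conditionals = model
    seq = seq.upper()
    if len(seq) != len(conditionals) + 1:
        raise ValueError("sequence length must match the model length")
    log_p = math.log2(first[seq[0]])
    for i in range(1, len(seq)):
        log_p += math.log2(conditionals[i - 1][seq[i - 1]][seq[i]])
    return log_p - len(seq) * math.log2(background)


model = train_wam(DONOR_SITES)
print(wam_log_odds(model, "CAGGTAAGTATTT"))
print(wam_log_odds(model, "CAGCTAAGTATTT"))
```

With only 15 training sites, most conditional distributions rest on a handful of observations, which is why larger training sets or smoothing across Markov orders matter.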
24
(No Transcript)
25
extending the Markov order
  • Salzberg et al. (1998), Interpolated Markov
    Models
  • Cawley (2000), Variable-Length Markov Models
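Interpolated Markov models mix predictions of several orders, trusting a long context only when it has been observed often enough. The sketch below is a simplified version under stated assumptions: the interpolation weight is min(1, n/threshold), where n is how often the context was seen and `threshold` is a hypothetical parameter. Glimmer-style IMMs (Salzberg et al.) derive their weights from context counts combined with a chi-square test, so treat this as an illustration of the idea rather than that method.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


class SimpleIMM:
    """Interpolated Markov model with a simplified, count-based interpolation weight."""

    def __init__(self, max_order=3, threshold=40):
        self.max_order = max_order
        self.threshold = threshold          # hypothetical: contexts seen this often get full weight
        self.counts = defaultdict(Counter)  # context string -> counts of the following base

    def train(self, sequences):
        for seq in sequences:
            seq = seq.upper()
            for i, base in enumerate(seq):
                # record the base after every context of length 0..max_order that fits
                for k in range(min(i, self.max_order) + 1):
                    self.counts[seq[i - k:i]][base] += 1
        return self

    def prob(self, context, base):
        """P(base | context), interpolating from order 0 up to the longest usable context."""
        p = 1.0 / len(ALPHABET)  # below order 0: uniform distribution
        for k in range(min(len(context), self.max_order) + 1):
            counts = self.counts.get(context[len(context) - k:])
            n = sum(counts.values()) if counts else 0
            if n == 0:
                break  # a longer context cannot have been seen if its suffix was not
            weight = min(1.0, n / self.threshold)
            p = weight * counts[base] / n + (1.0 - weight) * p
        return p

    def log_prob(self, seq):
        seq = seq.upper()
        return sum(
            math.log2(self.prob(seq[max(0, i - self.max_order):i], base))
            for i, base in enumerate(seq)
        )


imm = SimpleIMM(max_order=3, threshold=40).train(["ACGT" * 12])
print(imm.log_prob("ACGTACGT"), imm.log_prob("AAAAAAAA"))
```

Each interpolation step is a convex combination of two distributions, so the result stays a proper probability distribution over the four bases.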

26
modeling non-local dependencies in splice sites
  • Burge and Karlin, 1997. Maximal Dependence
    Decomposition (MDD)
  • Agarwal and Bafna, 1998
  • Yeo and Burge, 2003
  • Zhao et al., 2004. Permuted Variable Length
    Markov Models (PVLMM)
  • Cai et al., 2000; Dash and Gopalakrishnan, 2001.
    Bayesian Networks
  • Castelo and Guigó, 2004. Inclusion-Driven Learned
    Bayesian Networks (idlBNs)
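Maximal dependence decomposition (Burge and Karlin, 1997) repeatedly splits the training sites on the position whose agreement with the consensus is most strongly associated with the nucleotides at the other positions, measured with chi-square statistics. The sketch below shows only that selection step; using a single most-frequent base as the consensus and summing all chi-square values (with no significance cutoff or minimum subset size) are simplifying assumptions.

```python
from collections import Counter


def consensus(sites):
    """Most frequent base at each position."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*sites)]


def chi_square(indicator, column):
    """Pearson chi-square statistic for the 2 x (observed bases) contingency table."""
    n = len(indicator)
    table = Counter(zip(indicator, column))
    row_totals = Counter(indicator)
    col_totals = Counter(column)
    if len(row_totals) < 2:
        return 0.0  # every site agrees (or disagrees) with the consensus: no information
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (table[(r, c)] - expected) ** 2 / expected
    return stat


def mdd_split_position(sites):
    """Position i maximizing the sum over j != i of chi2(C_i, X_j)."""
    cons = consensus(sites)
    columns = list(zip(*sites))

    def total_dependence(i):
        matches = [s[i] == cons[i] for s in sites]
        return sum(chi_square(matches, columns[j]) for j in range(len(cons)) if j != i)

    return max(range(len(cons)), key=total_dependence)


# toy sites: positions 0 and 1 always agree, position 2 varies independently
toy_sites = ["AAC", "AAG", "AAT", "AAA", "CCC", "CCG", "CCA"]
print(mdd_split_position(toy_sites))
```

In the full procedure the sites are then divided into those that match the consensus at the chosen position and those that do not, and the step is repeated on each subset.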

27
idlBNs
  • Bayesian networks allow one to learn from the
    data those (in)dependencies that conform to a
    directed acyclic graph (DAG).
  • Inclusion-driven structure learning algorithms
    (Castelo and Kocka, 2003): assuming that the data
    are sampled from a DAG distribution, in the limit
    of large sample size they learn a correct DAG
    structure when used with a consistent scoring
    metric.
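Structure-learning methods such as the inclusion-driven algorithm above compare candidate DAGs with a scoring metric. The sketch below does not implement the idlBN search; it only computes one well-known consistent score, BIC, for a DAG over sequence positions, given as a mapping from each position to its parent positions (an assumed input format). Acyclicity is not checked.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def bic_score(sequences, parents):
    """BIC of aligned sequences under a DAG {position: [parent positions]} (acyclicity not checked)."""
    n = len(sequences)
    log_likelihood = 0.0
    n_params = 0
    for node, pa in parents.items():
        # counts of (parent configuration, node value) and of the parent configuration alone
        joint = Counter((tuple(s[p] for p in pa), s[node]) for s in sequences)
        config = Counter(tuple(s[p] for p in pa) for s in sequences)
        # maximum-likelihood log-likelihood contribution of this node
        log_likelihood += sum(c * math.log(c / config[cfg]) for (cfg, _), c in joint.items())
        # free parameters: (r - 1) per parent configuration, r = alphabet size
        n_params += (len(ALPHABET) - 1) * len(ALPHABET) ** len(pa)
    return log_likelihood - 0.5 * math.log(n) * n_params


# toy data: position 1 always copies position 0
copied = ["AA", "CC", "GG", "TT"] * 10
print(bic_score(copied, {0: [], 1: []}))   # independence model
print(bic_score(copied, {0: [], 1: [0]}))  # edge 0 -> 1
```

The penalty term grows with the number of parameters, so an edge is only rewarded when the gain in likelihood outweighs it; that trade-off is what makes BIC consistent as the sample grows.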

28
(No Transcript)
29
prediction of splice sites vs. gene prediction
30
the gene prediction problem
(figure labels: sites, exons, genes; exons e1 and e8)
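Once candidate exons have been scored (for example with splice-site and coding statistics), a gene prediction program has to pick the best-scoring set of compatible exons. A minimal sketch of that assembly step as weighted interval scheduling solved by dynamic programming; it assumes each candidate is a (start, end, score) tuple with inclusive coordinates, and it ignores the reading-frame, strand and exon-type (initial, internal, terminal) constraints that real gene finders enforce.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Highest-scoring set of non-overlapping exons.

    candidates: list of (start, end, score) with inclusive coordinates (assumed format).
    Returns (total_score, chain) with the chain ordered by position.
    """
    exons = sorted(candidates, key=lambda e: e[1])
    ends = [e[1] for e in exons]
    best = [0.0] * (len(exons) + 1)    # best[i]: optimum using only the first i exons
    take = [False] * (len(exons) + 1)  # whether exon i-1 is part of that optimum

    def compatible_prefix(i):
        # number of exons among the first i-1 that end before exon i-1 starts
        return bisect_right(ends, exons[i - 1][0] - 1, 0, i - 1)

    for i in range(1, len(exons) + 1):
        with_exon = best[compatible_prefix(i)] + exons[i - 1][2]
        if with_exon > best[i - 1]:
            best[i], take[i] = with_exon, True
        else:
            best[i] = best[i - 1]

    # trace back the chosen exons
    chain, i = [], len(exons)
    while i > 0:
        if take[i]:
            chain.append(exons[i - 1])
            i = compatible_prefix(i)
        else:
            i -= 1
    return best[-1], chain[::-1]


candidates = [(1, 10, 5.0), (5, 20, 2.0), (12, 30, 4.0), (25, 40, 6.0)]
print(best_exon_chain(candidates))
```

Sorting by end coordinate lets a binary search find the last compatible exon, so the whole assembly runs in O(n log n) for n candidates.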
31
(No Transcript)
32
gene prediction accuracy
BG-570 data set
model   SN     SP     (SN+SP)/2
PWM     0.36   0.35   0.355
FMM     0.38   0.43   0.405
idlBN   0.45   0.37   0.410
SN: fraction of true exons predicted correctly
SP: fraction of predicted exons that are correct
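The exon-level measures defined above can be computed directly by comparing exon coordinates. A minimal sketch, assuming exons are given as (start, end) pairs and that a predicted exon counts as correct only when both of its boundaries match a true exon exactly.

```python
def exon_accuracy(true_exons, predicted_exons):
    """Exon-level SN, SP and their average; an exon is correct only if both ends match."""
    true_set, predicted_set = set(true_exons), set(predicted_exons)
    correct = len(true_set & predicted_set)
    sn = correct / len(true_set) if true_set else 0.0
    sp = correct / len(predicted_set) if predicted_set else 0.0
    return sn, sp, (sn + sp) / 2


true_exons = [(1, 10), (20, 30), (40, 50)]
predicted = [(1, 10), (20, 31)]  # second prediction misses the end by one base
print(exon_accuracy(true_exons, predicted))
```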
33
(codon usage table)
34
coding statistics
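A common family of coding statistics compares codon frequencies in coding sequence with a background model. A minimal sketch, assuming codon usage is estimated from in-frame training sequences with a pseudocount of 1 and compared against a uniform background of 1/64 per codon; this is an illustrative choice, and gene finders typically use richer in-frame models.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(coding_seqs, pseudocount=1.0):
    """Codon frequencies from in-frame coding sequences (reading frame starts at position 0)."""
    counts = Counter({c: pseudocount for c in CODONS})
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # skip codons containing N or other symbols
                counts[codon] += 1
    total = sum(counts.values())
    return {c: counts[c] / total for c in CODONS}


def coding_log_odds(seq, usage):
    """Sum over in-frame codons of log2(P_coding(codon) / (1/64))."""
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in usage:
            score += math.log2(usage[codon] * 64)
    return score


usage = codon_usage(["ATGGCTGCTGCTAAA", "ATGGCTAAAAAAGCT"])
print(coding_log_odds("GCTGCT", usage), coding_log_odds("TTTTTT", usage))
```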
35
the real accuracy
Accuracy on human chromosome 22
program    sensitivity   specificity
genscan    0.79          0.53
twinscan   0.80          0.62
SGP        0.79          0.66
36
search for additional patterns
  • real exons with weak splice sites, Fairbrother
    et al., 2002
  • pseudoexons with strong splice sites, Zhang and
    Chasin, 2004

37
Fairbrother et al., 2002. splicing enhancers in
exons with weak sites
38
Zhang and Chasin, 2004. splicing silencers in
pseudoexons with strong sites
39
Bioinformatic approach scheme
40
G-rich motifs are able to influence 5' splice
site recognition
41
in collaboration with Juan Valcárcel, CRG
42
INHIBITORY EFFECT OF … ON 5'SS RECOGNITION BY U1
snRNP
(1) weak 5'ss followed by a G-rich element
(2) deletion of the G-rich element in (1)
(3) strong 5'ss
43
TIA-1 promotes U1 snRNP binding to weak 5' splice
sites followed by uridine-rich sequences
44
(No Transcript)
45
the second genetic code
  • genetic code
  • mapping of nucleotide triplets into the
    twenty amino acids
  • highly deterministic: a given triplet always
    codes for the same amino acid
  • splicing code
  • mapping of nucleotide sequences into 3' and 5'
    intron boundaries
  • inherently stochastic: the probability that a
    splicing sequence participates in the
    definition of an intron boundary ranges from zero
    to one, and it is conditioned on very many
    different factors (which could be other sequences)
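Because the genetic code is deterministic, translation reduces to a table lookup. A minimal sketch using the standard genetic code (other codes, such as the vertebrate mitochondrial one, differ at a few codons); stopping at the first stop codon is an illustrative default.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate DNA in frame 0; an incomplete trailing codon is ignored."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("atggcctggtaa"))  # prints MAW (ATG GCC TGG, then stop codon TAA)
```

The splicing code has no such table: the same sequence can act as a splice site in one context and not in another, which is why the models on the earlier slides are probabilistic.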