Title: Bioinformatics, Autumn School of Computational Chemistry
1. Bioinformatics
Autumn School of Computational Chemistry, Prague 2006
Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic
2. Bioinformatics
Informatics applied to biological molecules (data). Bioinformatics extracts a molecular information system for molecular biology. It is molecular biology conceptualized in physico-chemical terms, to which informatics (derived from mathematical informatics and statistics) is applied.
Applications: theory, biotechnology, pharmacy, medicine, genetic engineering
3. Bioinformatics
structured data (databases), hypotheses
experimental data
computational analysis
4. Genome sizes
Organism                    Chromosomes   Genome size
Mycoplasma genitalium       -             0.58 Mbp
Escherichia coli            -             4.6 Mbp
Saccharomyces cerevisiae    16            11.2 Mbp
Arabidopsis thaliana        5             115.4 Mbp
Drosophila melanogaster     5             137.0 Mbp
Homo sapiens                24            3.3 Gbp
5. Central dogma of molecular genetics
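The flow DNA -> RNA -> protein can be illustrated with a minimal sketch. It assumes the standard genetic code and a DNA string that is the coding strand read 5' to 3'; the function names `transcribe` and `translate` are illustrative and do not come from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

The sketch ignores reading-frame selection, introns, and alternative genetic codes.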
6. DNA
evolutionary relationships between genes and organisms
function
genes
structure
proteins
7. Sequence
8. Sequence
gtjana (4797 nt)
GAATTCGCCGCGGGGCTGCGCATCACCGATGCCG
CCACCATCGAGATCGTCGAGATGGTACTGGCCGGCTCGATCAACAAGCAG
CTCGTCGGCTACATCA ACGAAGCGGGCGGCAAGGCCGTCGGCCTGTGCG
GCAAGGACGGCAACATGGTGTCCGCCACCAAGGCGACGCGCACCATGGTC
GATCCGGATTCGCGGAT CGAAGAGGTGATCGACCTCGGTTTCGTCGGCG
AGCCGGAGAAGGTCGACCTCACCCTGCTCAACCAGCTGATCGGCCACGAG
TTGATCCCGGTGCTGGCG CCGCTGGCGACCTCCGCGTCGGGCCAGACCT
TCAACGTCAATGCCGACACCTTTGCAGGTGCGGTTGCCGGTGCGCTGCGG
GCCAAGCGCCTGCTGCTGC TGACCGACGTGCCGGGCGTGCTCGACCAGA
ACAAGAAGCTGATCCCCGAACTGTCGATCAAGGATGCCCGCAAGCTGATC
GCAGACGGCACCATCTCGGG CGGCATGATCCCCAAGGTCGAGACCTGCA
TCTACGCGCTCGAACAGGGCGTCGAAGGCGTCGTCATCCTCGACGGCAAG
GTCCCGCACGCAGTGCTGCTC GAATTGTTCACCAACCAGGGCACCGGCA
CGCTGATCCACAAGTGATGCGAGGCTGCGGCGACAACATCCGTCATGGCC
GGGCTCGTCCCGGCCATCCACG TCTTTCCGGCGGTTTTCTCAGCAAGAC
GTGGATGCCCGGCACAAGGCCGGGCATGACGGGGTGGAGATCGCGCGCCC
TCGCCGCCATTGTCACCACCCTC GCCCTCACCTCCGCCGCCCACGCCGA
CCTCAAGCTCTGCAACCGCATGAGCTACGTGGTCGAGACGGCGATCGGGG
TCGATTCCAACGGCACCACCGCCT CGCGCGGATGGCTGCGGATTGATCC
GGCGCAATGCCGGGTCGTGGTGCAAGGCGCGCTCAACGCCGACCGCATCA
TGCTGAATGCCCGCGCGCTGGCGGT GTACGGCGTCTCGCCGCTGCCGCA
GAACGGCACTGACCGGCTGTGCATTGCCGAAGACAATTTCGTCATCGCCG
CCGCGCGGCAATGCCGCGGCGGCCAA ACGCTCGCCGCCTTCACCGAGAT
CAAGCCCACCGACACCGAGGACGGCAACAAGATCGCTTATCTGGCGGAAG
ACTCCGGCTACGACGACGAACAGGCCA AACTCGCCGCGATCCAGCGGCT
GCTGGTGATCGCCGGTTACGACGCCTCGCCGATCGACGGCGTCGACGGCC
CGAAGACGCAGGCCGCGCTGTCCGCCTT CCTCAAGAGCCGAGGCCTGAA
GCCCGAGATCGTCGATGCGCCGGATTTCTTCGACGTGATGATCAAGGCAG
TGCAGCAGCCGTCCGGCAGCGGGCTGACC TGGTGCAACGACACCAAGTA
CAAGATCATGGCGGCCGTCGGCGAAGACGACGGCAAGACTGTCACCAGCC
GCGGCTGGTACGGTGTTGCGCCCGGCCAAT GCCTGCGCCCCGACCTCGG
CGCACAGCCGAAGCGGGTGTTCAGCTTCGCCGAAGCGGTCGACGGCAGCG
GCAGGCCGGTGACCATCAAGGGCCGTGCGCT GAACTGGGGCGGCGGCGT
GACGCTGTGCACGCGTGACAGCAAGTTCGAGATCGGCGAGCAAGGCGATT
GCGCGGCGCGCGGCCTCGCCGCCACCGGCTTC GCCGCCGTCGATCTCAG
TAGCGGCAAGACATTGAGGTTGTCCGCCCCATGATGCAGCTCGGCAAACG
CGGCTTCGATCACGTCGAGACCTGGGTGTTCGA TCTCGACAACACGCTG
TACCCGCATCACCTCAACCTATGGCAGCAGGTCGATGCGCGGATCCGCGA
CTTCGTCGCCGACTGGCTGAAGGTTTCGCCGGAA GAAGCCTTCCGTATC
CAGAAGGATTACTACAAGCGCTACGGCACCACGATGCGCGGGATGATGAC
CGAGCACGGCGTTCACGCCGACGACTACCTGGCTT ATGTCCACGCCATC
GACCATTCGCCGCTGCAGCCGAATCCGGCGATGGGCGATGCGATCGAGCG
ACTGCCGGGCCGCAAGCTGATCCTGACCAACGGCTC GACCGCCCATGCG
GGCAAGGTGCTGGAGCGGCTCGGCATCGGCCATCATTTCGAGGCGGTGTT
CGACATCATTGCGGCCGACCTCGAGCCGAAGCCGGCG CCGCAGACCTAC
CGCCGTTTTCTCGATCGCCATGGTGTCGACCCGGCCCGCGCCGCGATGTT
CGAAGACCTCGCCCGCAACCTCACCGTGCCGCACCAGC TCGGCATGACC
ACCGTGCTGGTGGTGCCTGACGATAGCCAGGACGTGGTCCGCGAAGATTG
GGAGCTTGAAGGCCGCGACGCCGCCCACGTCGATCACGT GACTGATGAT
TTGACAGGGTTCTTGGGGAAGCTGAGTTCGCTGTAGGCCGGGGACGCCTC
CCAAGCGTCAATCGTCATCGCCGCCGGATGCAAGGCGGCT AGGTATTGC
GGAGCGCTCGCGATCTTCCGTCCAATGCCCTGGGATACTGGATCGCCCGG
ACGAGCCGGGCGACGACGTTGAAGAGAGATGACGTGGCGTC ACCACATC
CCCCGCCGTCATCGCCCGCGCAGGCGGGCGATGACTTGGCGGACGGGGCG
GCGCCTTGACTCCGACCCGGCGAATCCGGACAACACTCCGCA AAACTCT
CCCTGAAATCAGCCTCCCAAGGACCCGTCGATGCCGCTCACCGCCCTGGA
ATCTACCATCAACGCCGCTTTCGACGCGCGCGACACCGTTACC GCGGCG
ACGCAGGGCGAGATTCGTCAGGCCGTCGAGGATGCGCTCGATCTGCTCGA
CCAGGGCAAGGTGCGGGTGGCGCGGCGCGACGACTCCGGCGCCT GGACG
GTCAATCAGTGGCTGAAGAAAGCAGTGCTGCTGTCGTTCCGGCTCAACGA
CATGGGCGTGATCGCCGGCGGCCCGGGCGGCGCCAACTGGTGGGA CAAG
GTGCCGTCGAAGTTCGAGGGCTGGGGTGAGAACCGCTTCCGCGAGGCCGG
CTTCCGCGCCGTGCCGGGCCGATCGTCGCGCGTCGGCCTTTATCGC CAA
GACGCGGTACTGATCCGTCCTTCGTCAATCTCGGCGCTTACGTCGATGAA
AGCACCATGGTCGAACACCTGGGCGACCGTCGGCTCCTGCGCCCAGA TC
GGCAAGCGCGTGCACATCTCCGGCGGTGCCGGCATCGGCGGCGTGCTCGA
GCCGCTGCAGGCCGGCCCGGTGATCATCGAGGACGACTGCTTCATCGG C
GCCCGCTCCGAAGTCGCCGAAGGCGTGATCGTGCGCAAGGGTGCGGTGCT
GGCGATGGGCGTTTTCCTCGGCGCCTCGACCAAGATCGTCGACCGCGAG
ACCGGCGAAATCTTCGTCGGCGAAGTGCCGGAATATGCCGTGCTGGTGCC
CGGCACCCTGCCCGGCAAGCCGATGAAGAACGGCGCCCCCGGCCCAGCCA
CCGCCTGCGCGGTGATCGTCAAGCGCGTCGACGAGCGCACCCGTTCCAA
GACCTCGATCAACGAATTGCTGCGGGACTGACACCTGTAGGAGGCGCGAA
T GGACTGGACCACGCTGTTCTTCAGCTTTCGAGGTCGGATCAATCGCGC
CAAATACTGGCTGGTCGGACTGATCTACGTCGCCGCCTGGATGG .
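A raw sequence like the one above is usually summarized first by simple statistics. The following is a minimal sketch, assuming the sequence is held as a plain string that may contain spaces and line breaks; the helper names are illustrative, and the example string is only the first fragment of the sequence shown on the slide.

```python
from collections import Counter


def clean_sequence(raw):
    """Remove whitespace and upper-case the sequence."""
    return "".join(raw.split()).upper()


def base_counts(raw):
    """Count occurrences of each base."""
    return Counter(clean_sequence(raw))


def gc_content(raw):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = clean_sequence(raw)
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


fragment = "GAATTCGCCGCGGGGCTGCGCATCACCGATGCCG"
print(len(fragment), base_counts(fragment), round(gc_content(fragment), 3))
```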
9. General analysis
What can be found in DNA?
structural and organizational elements
evolutionary relationships
genes
promoters and other regulatory elements
foreign DNA
10. How to find genes?
genes
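One common first step in gene finding, not necessarily the method used in the lecture, is to scan for open reading frames (ORFs). The sketch below makes simplifying assumptions: forward strand only, ATG as the only start codon, and a hypothetical `min_codons` threshold that counts the start codon plus the coding codons before the stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    Positions are 0-based; end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATGACC"))
```

Real gene finders also use the reverse strand, codon usage, and regulatory signals such as promoters.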
11. Genes
Leucine codon usage
Codon   Rhodobacter capsulatus   Rhodobacter capsulatus   Escherichia coli
        count                    %                        %
CUA     3                        <1                       4
CUC     119                      16                       9
CUG     458                      60                       52
CUU     157                      20                       10
UUA     0                        0                        11
UUG     27                       3                        13
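A table of this kind can be produced by counting codons in coding sequences. The sketch below assumes an in-frame coding sequence given as DNA or RNA; the function name and the toy input are illustrative and are not data from the slide.

```python
from collections import Counter

LEUCINE_CODONS = ("CUA", "CUC", "CUG", "CUU", "UUA", "UUG")


def leucine_codon_usage(cds):
    """Return {codon: (count, percent)} for the six leucine codons."""
    rna = cds.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    counts = Counter(c for c in codons if c in LEUCINE_CODONS)
    total = sum(counts.values())
    return {
        c: (counts[c], 100.0 * counts[c] / total if total else 0.0)
        for c in LEUCINE_CODONS
    }


# Toy coding sequence: ATG CTG CTG CTC TTG TAA
for codon, (count, percent) in leucine_codon_usage("ATGCTGCTGCTCTTGTAA").items():
    print(codon, count, f"{percent:.0f}%")
```

Strongly biased usage, as for CUG in both organisms in the table, is one of the signals that helps distinguish coding from non-coding regions.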
12. Genes
13. What proteins do the genes encode?
alignment
14. Alignment
1:1   Dot plot
      SSEARCH, BLITZ
      SSEARCH: ftp://ftp.virginia.edu/pub/fasta
      BLITZ: ... http://www.ebi.ac.uk
1:n   FASTA, BLAST
n:n   PSI-BLAST, HMMER
      ClustalW, MultAlign
n
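The dot plot listed under 1:1 comparison can be sketched in a few lines. This is a minimal text version, assuming exact matching of words of length `window`; the function name and the output characters are illustrative choices.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return rows of '*' (match) and '.' (mismatch).

    Row i, column j compares the word of length `window` starting at
    seq_a[i] with the word starting at seq_b[j].
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            match = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if match else "."
        rows.append(row)
    return rows


for line in dot_plot("GATTACA", "GATCACA", window=2):
    print(line)
```

Similar regions show up as diagonal runs of matches; a larger `window` suppresses random single-letter matches.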
15. Alignment
1:1   Dot plot
      SSEARCH, BLITZ
1:n   FASTA, BLAST
      FASTA: http://www.ebi.ac.uk
      BLAST: http://ncbi.nlm.nih.gov/blast
n:n   PSI-BLAST, HMMER
      ClustalW, MultAlign
n
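Searching one query against many database sequences (1:n) is made fast in FASTA and BLAST by first looking for short shared words. The sketch below shows only that seeding idea in a heavily simplified form: it counts exact shared k-mers and does not extend or score alignments as the real programs do. The function names and the toy database are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database, k=3):
    """Map every k-mer to the names of the sequences that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def rank_hits(query, index, k=3):
    """Rank database sequences by the number of query k-mers they share."""
    scores = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            scores[name] += 1
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


database = {"seq_a": "ACGTACGT", "seq_b": "TTTTTTTT"}
print(rank_hits("ACGTT", build_kmer_index(database)))
```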
16. Alignment
1:1   Dot plot
      SSEARCH, BLITZ
1:n   FASTA, BLAST
n:n   PSI-BLAST, HMMER
      PSI-BLAST: http://ncbi.nlm.nih.gov
      ClustalW, MultAlign
n
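Profile methods such as PSI-BLAST and HMMER summarize a family of aligned sequences position by position. The sketch below illustrates the idea with a simple position-specific scoring matrix. It assumes an ungapped alignment of equal-length sequences, a uniform background distribution, and a hypothetical `pseudocount` parameter; it is not the scoring scheme of either program.

```python
import math
from collections import Counter


def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    background = 1.0 / len(alphabet)
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            residue: math.log2(((counts[residue] + pseudocount) / total) / background)
            for residue in alphabet
        })
    return pssm


def score(pssm, seq):
    """Sum the column scores of seq against the profile."""
    return sum(column[residue] for column, residue in zip(pssm, seq))


profile = build_pssm(["ACG", "ACG", "ATG"])
print(round(score(profile, "ACG"), 2), round(score(profile, "TTT"), 2))
```

A sequence that matches the conserved columns scores higher than an unrelated one, which is what lets a profile detect more distant relatives than a single query sequence.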