Title: Bioinformatics I
1 Swiss Institute of Bioinformatics
Bioinformatics I
2003 - 2004
Mihaela Zavolan, Michael Primig, Erik van Nimwegen, Torsten Schwede
2 For lecture notes, articles, references, links, etc. see Teaching at
http://www.bioz.unibas.ch/personal/schwede/
http://www.bioz.unibas.ch/personal/primig/
3
- 21. 10. Introduction (M.P., M.Z., E.vN.)
- 28. 10. Sequence Comparison I: pairwise alignments, dynamic programming (M. Zavolan)
- 04. 11. Sequence Comparison II: multiple sequence alignments (A. Brüngger, Novartis)
- 11. 11. Alignment Programs and Sequence Database Searching (A. Brüngger, Novartis)
- 18. 11. Evolution of Genes and Genomes (M. Primig)
- 25. 11. Evolution of Protein Families (T. Schwede)
- 02. 12. no lecture (SAB meeting)
- 09. 12. Models of Molecular Evolution (E. van Nimwegen)
- 16. 12. Genefinding I: protein-coding genes, prokaryotic genome annotation (M. Zavolan)
- 06. 01. Genefinding II: eukaryotic genome annotation (M. Zavolan)
- 13. 01. Pharmacogenomics: SNPs (T. Schwede)
- 20. 01. Clustering (E. van Nimwegen)
- 27. 01. Microarray Data Computation (M. Peitsch)
- 03. 02. Microarray Data Analysis (M.P., T.S.)
- 10. 02. WebXam
4 What is Bioinformatics?
An operational definition: the application of computer science to
molecular biology, in particular to the study of macromolecules such as
proteins and nucleic acids.
- Some Synonyms
- Molecular Bioinformatics
- Computational Biology
- Biocomputing
5 What is not Bioinformatics?
- Bio-inspired computer sciences (e.g. artificial life, neural networks, genetic algorithms)
- Genome Sequencing (e.g. genetic mapping and contig assembly)
- Biomathematics or bio-statistics
- Modeling of biological systems (Ecological or population modeling)
- Protein Structure Refinement (X-Ray, NMR)
- DNA Computers
6 Why do we need Bioinformatics?
7 Why do we need Bioinformatics?
8 Why do we need Bioinformatics?
Genome Projects need to store and organize DNA sequences.
9 DNA databases...
gggtctctcttgttagaccagatctgagcctgggagctctctggctaact
agggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaag
tagtgtgtgcccgtctgttgtgtgactctgatagctagagatcccttcag
accaaatttagtcagtgtgaaaaatctctagcagtggcgcctgaacaggg
acttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcgg
cttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagta
cgccaaaattttgactagcggaggctagaaggagagagatgggtgcgaga
gcgtcgatattaagcgggggaggattagatagatgggaaaaaattcggtt
aaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaa
gcagggagctagaacgattcgcagtcaatcctggcctattagaaacatca
gaaggttgtagacaaatactgggacaactacaaccagcccttcagacagg
atcagaagaacttagatcattatataatacagtagcaaccctctattgtg
tgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagata
gaggaagagcaaaacaaaagtaagaaaaaagcacagcaagcagcagctga
cacaggaaatagcagccaggtcagccaaaattaccccatagtgcagaaca
tccaggggcaaatggtacatcaggccatatcacctagaactttaaatgca
tgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccat
gttttcagcattatcagaaggagccaccccacaagatttaaacaccatgc
taaacacagtggggggacatcaagcagccatgcaaatgttaaaagagacc
atcaatgaggaagctgcagaatgggatagattgcatccagtgcatgcagg
gcctcatccaccaggccagatgagagaaccaaggggaagtgacatagcag
gaactactagtacccttcaggaacaaatagcatggatgacaaataatcca
cctatcccagtaggagaaatctataagagatggataatcctgggattaaa
taaaatagtaaggatgtatagccctaccagcattctggacataaaacaag
gaccaaaggaaccctttagagactatgtagaccggttctataagactcta
agagccgagcaagcttcacaggaggtaaaaaattggatgacagaaacctt
gttggtccaaaatgcgaacccagattgtaagactattttaaaagcattgg
gaccagcagctacactagaagaaatgatgacagcatgtcagggagtggga
ggacccggccataaagcaagagttttggcagaagcaatgagccaagtaac
aaattcagctaccataatgatgcagaaaggcaattttaggaaccaaagaa
aaattgttaagtgtttcaattgtggcaaagaagggcacatagccaaaaat
tgcagggcccctaggaaaaggggctgttggaaatgtggaaaggagggaca
ccaaatgaaagattgtactgagagacaggctaattttttagggaaaatct
ggccttcccacaggggaaggccagggaattttcctcagaacagactagag
ccaacagccccaccagccccaccagaagagagcttcaggtttggggaaga
gacaacaactccctctcagaagcaggagctgatagacaaggaactgtatc
cttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataa
agataggggggcaactaaaggaagctctattagatacaggagcagatgat
acagtattagaagaaataaatttgccaggaagatggaaaccaaaaatgat
agggggaattggaggttttatcaaagtaagacagtatgatcaaatactcg
tagaaatctgtggacataaagctataggtacagtattagtaggacctaca
cctgtcaacataattggaagaaatctgttgactcagattggttgcacttt
aaattttcccattagtcctattgaaactgtaccagtaaaattaaagccag
gaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaata
aaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaatttc
aaaaatcgggcctgaaaatccatataatactccagtatttgccataaaga
aaaaagacagtactaaatggagaaaattagtagatttcagagaacttaat
aagaaaactcaagacttctgggaagttcaattaggaataccacatcccgc
agggttaaaaaagaaaaaatcagtaacagtactggatgtgggtgatgcat
atttttcagttcccttagataaagaattcaggaagtacactgcatttacc
atacctagtataaacaatgagacaccagggattagatatcagtacaatgt
gcttccacagggatggaaaggatcaccagcaatattccaaagcagcatga
caaaaatcttagagccttttagaaaacaaaatccagacatagttatctat
caatacatggacgatttgtatgtaggatctgacttagaaatagggcagca
tagaacaaaaatagaggaactgagacaacatctgttgaagtggggattta
ccacaccagacaaaaaacatcagaaagaacctccattcctttggatgggt
tatgaactccatcctgataaatggacagtacagcctatagtgctgccaga
aaaggacagctggactgtcaatgacatacagaagttagtgggaaaattga
attgggcaagtcagatttacccagggattaaagtaaagcaattatgtaga
ctccttaggggaaccaaggcactaacagaagtaataccactaacaaaaga
agcagagctagaactggcagaaaacagggaaattctaaaagaaccagtac
atggagtgtattatgacccatcaaaagacttaatagcggaaatacagaag
caggggcaaggtcaatggacatatcaaatttatcaagagccatttaaaaa
tctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatg
taaaacaattaacagaggcagtgcaaaaaataaccacagaaagcatagta
atatggggaaagactcctaaatttaaactacccatacaaaaagaaacatg
ggaaacatggtggacagagtattggcaagccacctggattcctgagtggg
agtttgtcaatacccctcccttagtaaaattatggtaccagttagagaaa
gaacccataataggagcagaaactttctatgtagatggggcagctaacag
ggagactaaattaggaaaagcaggatatgttactaacaaagggagacaaa
aagttgtctccataactgacacaacaaatcagaagactgagttacaagca
attcttctagcattacaggattctggattagaagtaaacatagtaacaga
ctcacaatatgcattaggaatcattcaagcacaaccagataaaagtgaat
cagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtc
tacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagt
agataaattagtcagtactggaatcaggaaagtactctttttagatggaa
tagataaagcccaagaagaacatgaaaaatatcacagtaattggagggca
atggctagtgattttaacctgccacctgtggtagcaaaagagatagtagc
cagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtag
actgtagtccaggaatatggcaactagattgtacacatttagaaggaaaa
attatcctggtagcagttcatgtagccagtggatatatagaagcagaagt
tattccagcagaaacagggcaggaaacagcatactttctcttaaaattag
caggaagatggccagtaaaaacagtacatacagacaatggcagcaatttc
accagtactacagttaaggccgcctgttggtgggcaggaatcaagcagga
atttggcattccctacaatccccaaagtcaaggagtagtagaatctataa
ataaagaattaaagaaagttataggacagataagagatcaggctgaacat
cttaagacagcagtacaaatggcagtattcatccacaattttaaaagaaa
aggggggattggggggtacagtgcaggggaaagaatagtagacataatag
caacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaa
aattttcgggtttattacagggacagcagagatccactttggaaaggacc
agcaaagcttctctggaaaggtgaaggggcagtagtaatacaagataata
gtgacataaaagtagtgccaagaagaaaagcaaagatcattagggattat
ggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgagga
ttagaacatggaaaagtttagtaaaacaccatatgtatgtttcaaggaaa
gctaagggatggttttatagacatcactatgaaagtactcatccgagaat
aagttcagaagtacacatcccactagggaatgcaaaattggtaataacaa
catattggggtctacatacaggagaaagagactggcatttgggtcaagga
gtctccatagaattgaggaaaaggagatatagcacacaattagaccctaa
cctagcagaccaactaattcatctgcattactttgattgtttttcagaat
ctgctataagaaatgccatattaggacatatagttagccctaggtgtgaa
tatcaagcaggacataacaaggtaggatctctacagtacttggcactaac
agcattagtaagaccaagaaaaaagataaagccacctttgcctagtgtta
caaaactgacagaggatagatggaacaagccccagaagaccaagggccac
aaagggaaccatacaatgaatggacactagaacttttagaggagctcaag
aatgaagctgttagacattttcctaggatatggctccatagcttagggca
acatatctatgaaacttatggagatacttgggcaggagtggaagccataa
taagaattctgcaacaactgctgtttattcatttcagaattgggtgtcaa
catagcagaatagacattcttcgacgaaggagagcaagaaatggagccag
tagatcctagactagagccctggaagcatccaggaagtcagcctaggact
gcttgtaccaattgctattgtaaaaagtgttgctttcattgccaagtttg
tttcataacaaaaggcttaggcatctcctatggcaggaagaagcggagac
agcgacgaagagctcctcaagacagtcagactcatcaagtttctctatca
aagcagtaagtagtacatgtaatgcaatctttacaaatattagcagtagt
agcattagtagtagcagcaataatagcaatagttgtgtggtccatagtat
tcatagaatataggaaaataagaagacaaaacaaaatagaaaggttgatt
gatagaataatagaaagagcagaagacagtggcaatgagagtgacggaga
tcaggaagaattatcagcacttgtggaaatggggcacgatgctccttggg
atgttaatgatctgtaaagctgcagaaaatttgtgggtcacagtttatta
tggggtacctgtgtggaaagaagcaaccaccactctattttgtgcctcag
atgctaaagcgtatgatacagaggtacataatgtttgggccacacatgcc
tgtgtacccacagaccccaacccacaagaagtagaactgaagaatgtgac
agaaaattttaacatgtggaaaaataacatggtagaccaaatgcatgagg
atataattagtttatgggatcaaagcctaaagccatgtgtaaaattaacc
ccactctgtgttactttaaattgcactgattatgggaatgatactaacac
caataatagtagtgctactaaccccactagtagtagcgggggaatggagg
ggagaggagaaataaaaaattgctctttcaatatcaccagaagcataaga
gataaagtgaagaaagaatatgcacttttttatagtcttgatgtaatacc
aataaaagatgataatactagctataggttgagaagttgtaacacctcag
tcattacacaggcctgtccaaaggtatcctttgaaccaattcccatacat
tattgtgccccggctggttttgcgattctaaagtgtaatgataaaaagtt
caatggaaaaggaccatgtacaaatgtcagcacagtacaatgtacacatg
gaattaggccagtagtatcaactcaactgctgttaaatggcagtctagca
gaagaagaggtagtaattagatcagacaatttctcggacaatgctaaagt
cataatagtacatctgaatgaatctgtagaaattaattgtacaagactca
acaacattacaaggagaagtatacatgtaggacatgtaggaccaggcaga
gcaatttatacaacaggaataataggaaaaataagacaagcacattgtaa
cattagtagagcaaaatggaataacactttaaaacagatagttacaaaat
taagagaacaatttaagaataaaacaatagtctttaatcaatcctcagga
ggggacccagaaattgtaatgcacagttttaattgtggaggggaattttt
ctactgtaattcaacacaactgtttaacagtacttggaatggtactgcat
ggtcaaataacactgaaggaaatgaaaatgacacaatcacactcccatgc
agaataaaacaaattataaacatgtggcaggaagtaggaaaagcaatgta
tgcacctcccatcagaggacaaattagatgttcatcaaatattacagggc
tgatattaacaagagatggtggtattaaccagaccaacaccaccgagatt
ttcaggcctggaggaggagatatgaaggacaattggagaagtgaattata
taaatataaagtagtaaaaattgaaccattaggagtagcacccaccaagg
caaagagaagagtggtgcaaagagaaaaaagagcagtgggaataatagga
gctatgctccttgggttcttgggagcagcaggaagcactatgggcgcagc
gtcaatgacgctgacggtacaggccagacaattattgtctggtatagtgc
aacagcagaacaatttgctgagggctattgaggcgcaacagcatctgttg
cacctcacagtctggggcatcaagcagctccaagcaagagtcctggctgt
ggaaagatacctaagggatcaacagctcctggggttttggggttgctctg
gaaaactcatttgcaccactgctgtgccttggaatactagttggagtaat
aaatctctgagtcagatttgggataacatgacctggatgcagtgggaaag
ggaaattgataattacacaagcttaatatacaacttaattgaagaatcgc
aaaaccaacaagaaaagaatgaacaagagttattggaattagataactgg
gcaagtttgtggaattggtttagcataacaaattggctgtggtatataaa
aatattcataatgatagtaggaggcttggtaggtttaagaatagttttta
ctgtactttctatagtaaatagagttaggcagggatactcaccattgtcg
tttcagacgcgcctcccagccaggaggggacccgacaggcccgaaggaat
cgaagaagaaggtggagagagagacagagacagatccggtcaattagtgg
atggattcttagcaattatctgggtcgacctgcggagcctgtgcctcttc
agctaccaccgcttgagagacttactcttgattgtaacgaggattgtgga
acttctgggacgcagggggtgggaagccctcaaatattggtggaatctcc
tacaatattggattcaggaactaaagaatagtgctgttagcttgctcaac
gccacagccatagcagtagctgagggaactgatagggttatagaagtatt
acaaagagcttgtagagctattctccacatacctagaagaataagacagg
gcttagaaagggctttgcaataagatgggtggtaagtggtcaaaaagtag
taaaattggatggcctactgtaagggaaagaatgagaagagctgagccag
cagcagatggggtgggagcagtatctcgagacctggaaaaacatggagca
atcacaagtagtaatacagcaactaacaatgctgattgtgcctggctaga
agcacaagaggaggaggaggtgggttttccagtcagacctcaggtacctt
taagaccaatgacttacaagggagcgttagatcttagccactttttaaaa
gaaaaggggggactggaagggctaatttggtcccagaaaagacaagacat
ccttgatttgtgggtccaccacacacaaggctacttccctgattggcaga
actacacaccagggccagggatcagatatccactgacctttggttggtgc
ttcaagctagtaccagttgagccagagaaggtagaagaggccaatgaagg
agagaacaacagattgttacaccctgtgagcctgcatgggatggaggacc
cggagaaagaagtgttagtatggaggtttgacagccgcctagtactccgt
cacatggcccgagagctgcatccggagtactacaaggactgctgacactg
agctttctacaagggactttccgctggggactttccagggaggcgtggcc
tgggcgggactggggagtggcgagccctcagatgctgcatataagcagct
gctttttgcctgtactgggtctctcttgttagaccagatctgagcctggg
agctctctggctaactagggaacccactgcttaagcctcaataaagcttg
ccttgagtgcttca
Human immunodeficiency virus type 1, complete genome, 9214 bp
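Storing a sequence like the one above starts with simple bookkeeping: joining the wrapped lines, checking the alphabet, and counting bases. The following is a minimal Python sketch under the assumption that the sequence arrives as plain text lines; the function names `join_sequence_lines` and `base_composition` are illustrative and do not belong to any database toolkit.

```python
from collections import Counter

def join_sequence_lines(lines):
    """Concatenate wrapped sequence lines into one lowercase string."""
    seq = "".join(line.strip() for line in lines).lower()
    # Assumption: only a, c, g, t and the ambiguity code n are allowed.
    unknown = set(seq) - set("acgtn")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return seq

def base_composition(seq):
    """Return a dict mapping each base to its count in seq."""
    return dict(Counter(seq))

# Toy input: the first two 50-column lines of the sequence shown above.
lines = [
    "gggtctctcttgttagaccagatctgagcctgggagctctctggctaact",
    "agggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaag",
]
seq = join_sequence_lines(lines)
print(len(seq))  # 100
print(base_composition(seq))
```

The same arithmetic checks the caption: 184 full lines of 50 bases plus a last line of 14 bases give 184 * 50 + 14 = 9214 bp.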
10 DNA databases
- GenBank (USA) http://www.ncbi.nlm.nih.gov/Genbank/
- EMBL (Europe) http://www.ebi.ac.uk/embl/
- DDBJ (Japan) http://www.ddbj.nig.ac.jp/
12 EMBL/GenBank/DDBJ Annotations
Warning !
- DNA database annotations are full of errors
- In sequences, in annotations, in CDS attribution
- No consistency of annotations
- Most annotations are done by the scientists who submit the data
- Heterogeneity of data quality and updating
13 Some interesting sequence annotations
- FT source 1..124
- FT /db_xref="taxon:4097"
- FT /organelle="plastid:chloroplast"
- FT /organism="Nicotiana tabacum"
- FT /isolate="Cuban cahibo cigar, gift from President Fidel
- FT Castro"
- Or
- FT source 1..17084
- FT /chromosome="complete mitochondrial genome"
- FT /db_xref="taxon:9267"
- FT /organelle="mitochondrion"
- FT /organism="Didelphis virginiana"
- FT /dev_stage="adult"
- FT /isolate="fresh road killed individual"
- FT /tissue_type="liver"
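Annotations like these are free text typed by the submitter, which is why they are hard to process consistently. As a rough illustration, here is a minimal Python sketch that collects the qualifiers of one feature, assuming EMBL-style `FT` lines with `/key="value"` qualifiers. The function name `parse_qualifiers` is hypothetical, and the parser is deliberately simplified: it ignores unquoted values and multi-line locations that real records contain.

```python
import re

def parse_qualifiers(ft_lines):
    """Collect /key="value" qualifiers from EMBL-style FT lines.

    Simplified sketch: handles quoted values and their wrapped
    continuation lines; every other line is skipped.
    """
    qualifiers = {}
    open_key = None  # key whose quoted value has not been closed yet
    for line in ft_lines:
        text = line[2:].strip() if line.startswith("FT") else line.strip()
        match = re.match(r'/(\w+)="(.*)$', text)
        if match:
            key, value = match.group(1), match.group(2)
        elif open_key is not None:
            # Continuation of a value that was wrapped onto the next line.
            key, value = open_key, qualifiers[open_key] + " " + text
        else:
            continue  # feature key and location, not a qualifier
        closed = value.endswith('"')
        qualifiers[key] = value[:-1] if closed else value
        open_key = None if closed else key
    return qualifiers

record = [
    'FT   source          1..124',
    'FT                   /db_xref="taxon:4097"',
    'FT                   /organism="Nicotiana tabacum"',
    'FT                   /isolate="Cuban cahibo cigar, gift from President Fidel',
    'FT                   Castro"',
]
print(parse_qualifiers(record)["isolate"])
```

A parser can recover the text of the `isolate` field, but nothing in the format tells it that the value describes a cigar rather than a plant isolate; that is the consistency problem the warning slide refers to.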
14 Taxonomy Browser at EBI
http://www.ebi.ac.uk/newt/
15 Why do we need Bioinformatics?
How do we find protein coding regions, introns
and exons in genomic DNA sequences?
16 (DNA sequence of slide 9 shown again)
Finding genes in prokaryotic genomes ...
17 Finding genes in prokaryotic genomes ...
- Intrinsic Information
- Transcription signals
- Codon usage
- GC content
- External Information
- Similarity to known proteins in other organisms
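Two of the intrinsic signals listed above can be computed directly from the sequence. The following Python sketch is a toy illustration, not a gene finder: it reports the GC content and scans the forward strand for open reading frames (ATG to the first in-frame stop codon). The function names, the `min_codons` parameter and the example sequence are assumptions made for this sketch; real programs also use codon usage statistics, transcription signals and the reverse strand.

```python
STOP_CODONS = {"taa", "tag", "tga"}

def gc_content(seq):
    """Fraction of g and c bases in seq (0.0 for an empty sequence)."""
    seq = seq.lower()
    if not seq:
        return 0.0
    return (seq.count("g") + seq.count("c")) / len(seq)

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs of forward-strand ORFs, 0-based, end exclusive.

    An ORF starts at atg and ends with the first in-frame stop codon.
    min_codons counts the start codon but not the stop codon.
    """
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs

toy = "ccatgaaacccgggtaaatt"  # made-up sequence with one short ORF
print(find_orfs(toy))    # [(2, 17)]
print(gc_content(toy))   # 0.45
```

In a real prokaryotic genome, `min_codons` would be set much higher (for example around 100), because short ORFs occur by chance in random sequence.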
18 Finding genes in eukaryotic genomes ...
- Some of the problems
- Complexity of genome increases with complexity of organisms
- 1-2% coding sequence
- Intron-exon structure
- Alternative splicing (30% of all genes)
- Pseudogenes
19 Why do we need Bioinformatics?
Under which conditions is a certain gene
transcribed?
20 Gene Expression Analysis using DNA Microarrays
- Microarrays are ordered sets of DNA molecules of known or unknown sequence immobilized on a support (e.g. glass).
- DNA
- DNA replication
- RNA
- Genome-wide Protein-DNA interactions
21 PCR Microarrays
22 GeneChips
24 Examples of Microarray images
PCR Microarray
GeneChip
25 Analysis and interpretation: making sense of the expression data
- The most frequent problem is to find sets of genes that show correlated expression profiles during a process (cell cycle, development), a pathological condition (cancer, myopathy) or an external stimulus (temperature, medium, drug)
- Computing raw data
- Filter genes (up- or downregulated)
- Cluster expression profiles
http://www.biozentrum.unibas.ch/primig/
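The filter and cluster steps above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the analysis pipeline used in the course: genes are filtered by a log2 fold-change threshold against their first measurement, and correlated profiles are detected with the Pearson coefficient. The gene names and expression values are made up, and the function names are hypothetical.

```python
import math

def log2_fold_changes(profile, reference):
    """Per-condition log2 ratio of expression against a reference value."""
    return [math.log2(x / reference) for x in profile]

def is_regulated(profile, reference, threshold=1.0):
    """True if any condition changes by at least `threshold` log2 units."""
    return any(abs(v) >= threshold for v in log2_fold_changes(profile, reference))

def pearson(a, b):
    """Pearson correlation of two equally long profiles.

    Note: fails with a division by zero if a profile is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up expression values for three hypothetical genes over four time points.
profiles = {
    "geneA": [100.0, 200.0, 400.0, 800.0],
    "geneB": [50.0, 100.0, 200.0, 400.0],
    "geneC": [100.0, 105.0, 95.0, 100.0],
}

# Filter: keep genes that change at least twofold relative to the first point.
regulated = {g: p for g, p in profiles.items() if is_regulated(p, p[0])}
print(sorted(regulated))  # ['geneA', 'geneB']

# Cluster criterion: geneA and geneB rise together, so they correlate strongly.
print(pearson(profiles["geneA"], profiles["geneB"]))
```

A clustering program would compute such a correlation (or a distance derived from it) for every pair of filtered genes and then group genes whose profiles are similar.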
26 Where do we need Bioinformatics?
What do we know about a specific protein?
27 Minimal content of a protein sequence db
- Sequences !!
- Accession number (AC)
- Taxonomic data
- References
- ANNOTATION/CURATION
- Keywords
- Cross-references
- Documentation
28 SWISS-PROT/TrEMBL
- Collaboration between the SIB (CH) and EMBL/EBI (UK)
- SWISS-PROT: fully annotated (manually), non-redundant, cross-referenced, documented protein sequence database.
- TrEMBL is automatically generated (from annotated EMBL coding sequences (CDS)) and annotated using software tools.
http://www.expasy.org/sprot/
29 ExPASy Web Server
ExPASy: Expert Protein Analysis System
http://www.expasy.org/
30 Some databases in the field of molecular biology
- AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD,
Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF,
BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST,
dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract,
ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB,
ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, GenBank, GeneCards,
Genline, GenLink, GENOTK, GenProtEC, GermOnline, GIFTS, GPCRDB, GRAP,
GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,
HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB,
IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,
Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP,
MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase,
OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase,
PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE,
PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP,
SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList,
SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB,
TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB,
WDCM, WIT, WormPep, YEPD, YPD, YPM, ... etc.
31 Why do we need Bioinformatics?
How can we compare protein sequences?
32 Sequence alignments and comparison
1 MYTAILORISRICH
2 MONTAILLEURESTRICHE
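Two strings like these are compared by aligning them, inserting gaps where needed. A standard way to do this is global alignment by dynamic programming (the Needleman-Wunsch algorithm). The Python sketch below uses a deliberately simple scoring scheme (match +1, mismatch -1, gap -1), which is an assumption made for illustration; protein alignments normally use a substitution matrix and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b); gaps are shown as '-'.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))

total, top, bottom = needleman_wunsch("MYTAILORISRICH", "MONTAILLEURESTRICHE")
print(total)
print(top)
print(bottom)
```

Several alignments can share the best score; the traceback returns only one of them, and which one depends on the order in which the three moves are tested.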
33
HBA_CHICK   VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL  48
HBAD_CHICK  ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL  48
HBPI_CHICK  AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV  48
HBB_CHICK   VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL  48
HBE_CHICK   VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL  48
HBRH_CHICK  VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL  48
MYG_CHICK   GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL  49

HBA_CHICK   SH-----GSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRV  93
HBAD_CHICK  SP-----GSDQVRGHGKKVLGALGNAVKNVDNLSQAMAELSNLHAYNLRV  93
HBPI_CHICK  SQ-----GSVQLRGHGSKVLNAIGEAVKNIDDIRGALAKLSELHAYILRV  93
HBB_CHICK   SSPTAILGNPMVRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHV  98
HBE_CHICK   SSPTAIMGNPRVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCDKLHV  98
HBRH_CHICK  SSPTAIIGNPKVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHV  98
MYG_CHICK   KTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKI  99

HBA_CHICK   DPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR--  141
HBAD_CHICK  DPVNFKLLSQCIQVVLAVHMGKDYTPEVHAAFDKFLSAVSAVLAEKYR--  141
HBPI_CHICK  DPVNFKLLSHCILCSVAARYPSDFTPEVHAEWDKFLSSISSVLTEKYR--  141
HBB_CHICK   DPENFRLLGDILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH--  146
HBE_CHICK   DPENFRLLGDILIIVLASHFARDFTPACQFAWQKLVNVVAHALARKYH--  146
HBRH_CHICK  DPENFRLLGNILIIVLAAHFTKDFTPTCQAVWQKLVSVVAHALAYKYH--  146
MYG_CHICK   PVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEF  149

HBA_CHICK   ----  141
HBAD_CHICK  ----  141
HBPI_CHICK  ----  141
HBB_CHICK   ----  146
HBE_CHICK   ----  146
HBRH_CHICK  ----  146
MYG_CHICK   GFQG  153

Consensus length: 154
Identity: 19 (12.3%)
Similarity: 51 (33.1%)
Character to show that a position in the alignment is perfectly conserved: '*'
Character to show that a position is well conserved: '.'
Multiple Sequence Alignment (MSA)
- Programs
- CLUSTALW
- T_COFFEE
- MULTALIGN
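The identity figure reported for a multiple alignment is a simple column count. The Python sketch below shows the idea on a toy alignment: the first 20 columns of the three beta-type chains (HBB, HBE, HBRH) shown above. The function names are illustrative; the "well conserved" category is not computed because it depends on a residue similarity grouping that the slide does not specify.

```python
def conserved_columns(alignment):
    """Indices of columns where every sequence has the same residue (gaps excluded)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i for i in range(length)
        if alignment[0][i] != "-"
        and all(row[i] == alignment[0][i] for row in alignment)
    ]

def percent_identity(alignment):
    """Perfectly conserved columns as a percentage of the alignment length."""
    return 100.0 * len(conserved_columns(alignment)) / len(alignment[0])

toy_alignment = [
    "VHWTAEEKQLITGLWGKV--",  # HBB_CHICK, columns 1-20
    "VHWSAEEKQLITSVWSKV--",  # HBE_CHICK
    "VHWSAEEKQLITSVWSKV--",  # HBRH_CHICK
]
print(len(conserved_columns(toy_alignment)))  # 14
print(percent_identity(toy_alignment))        # 70.0
```

The numbers on the slide follow the same arithmetic: 19 perfectly conserved columns out of a consensus length of 154 give 12.3%, and 51 out of 154 give 33.1%.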
34 NCBI BLAST http://www.ncbi.nlm.nih.gov/blast/
BLAST: Basic Local Alignment Search Tool
35 Why do we need Bioinformatics?
Can we predict protein structures?
36 Protein Structure Modeling
- Ab initio modeling
- Threading / Fold Recognition
- Homology Modeling
?
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN
AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR
NAKLKPVYDS LDAVRRCALI NMVFQMGETG
VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI
TTFRTGTWDA YKNL
37 Some remarks on bioinformatics in general ...
- It does not replace but complements experimental research
- It helps plan experiments
- Good bioinformatics studies take a long time
- Like anywhere else: some garbage in, a lot of garbage out