Title: KVL bioinformatics.
1 KVL bioinformatics. Introduction. KVL, 31
August 2004, 8-9.20. Henrik Christensen, Ph.D.,
Department of Veterinary Pathobiology, KVL,
Stigbøjlen 4, 1870 Frederiksberg C. Email:
hech_at_kvl.dk
2 Use of bioinformatics at KVL. Bioinformatics
is applied in relation to the larger subject
areas and degree programmes (agronomy,
biotechnology, food science, veterinary
medicine). Only a little work is done on
developing methods for bioinformatics
(algorithms, program development, computer
technology). Experimental laboratory work in
biochemistry and molecular biology will often
require the use of bioinformatics.
3 Use of bioinformatics at KVL.
 Organism level. Eukaryotes: livestock,
agricultural plants (Arabidopsis), fungi.
Prokaryotes: Bacteria. Viruses: pathogenic in
animals and plants; bacteriophages.
4 Use of bioinformatics at KVL.
 Molecular level: genomics, microarrays,
proteomics, protein structure.
5 Biological background: Evolution of organisms.
1. All organisms are related by common descent
2. and have evolved by Natural Selection.
"probably all the organic beings .. have
descended from some one primordial form."
"This preservation of favourable variations
and the rejection of injurious variations, I
call Natural Selection." (The Origin of
Species. C. Darwin).
6 Biological background: Evolution of genes.
All genes are related (Zuckerkandl and
Pauling, 1960s). Genes sometimes evolve
independently of the organisms (formulated as
The Selfish Gene. R. Dawkins 1976). Only a few
thousand gene families exist (One thousand
families for the molecular biologist. Cyrus
Chothia 1992. Nature 357, 543-4).
7 Evolution of genes. Orthologs (same
function). Paralogs (different functions).
[Figure: tree of the globin gene family.
Labels: Ancestral globin gene; Duplication;
Globin ancestor; Myoglobin (mouse, man);
Hemoglobin; ? globin (mouse and others).]
8 Biological background: Homology. Sequences are
homologous if they have shared a common ancestor.
Orthology (ortho-, exact). Two sequences (DNA
or protein) are orthologous if derived from a
common ancestor (homologous) and encoding the
same function in different species. Paralogy
(para-, in parallel). Two sequences (DNA or
protein) are paralogous if derived from a common
ancestor (homologous) but encoding different
functions in the same species. They represent
duplicated genes. Two proteins may appear
similar because they descend with divergence from
a common ancestral gene. Fitch 1970. Syst. Zool.
19, 99-113.
9 The most frequent error in bioinformatics:
"67% homologous". Homology (including orthology
and paralogy) is a property that cannot be
measured. Sequences are either homologous or not.
The degree to which sequences are related can be
measured by similarity or identity. For
example: the two protein sequences are homologous,
with 65% similarity but 61% identity.
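To make the distinction concrete, the following minimal Python sketch computes percent identity and percent similarity for two sequences that are already aligned. The function name and the grouping of "similar" amino acids are assumptions made for illustration only; real tools score similarity with a substitution matrix such as BLOSUM62.

```python
# Hypothetical groups of chemically similar amino acids (an assumption of
# this sketch; alignment programs use a scoring matrix instead).
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def percent_identity_and_similarity(a, b):
    """Return (identity %, similarity %) for two aligned sequences.

    Both sequences must have the same length; '-' marks a gap.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns in which both sequences have a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no usable columns")

    def similar(x, y):
        # Identical residues count as similar; gaps never do.
        if x == "-" or y == "-":
            return False
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    identical = sum(1 for x, y in columns if x == y and x != "-")
    alike = sum(1 for x, y in columns if similar(x, y))
    n = len(columns)
    return 100.0 * identical / n, 100.0 * alike / n


identity, similarity = percent_identity_and_similarity("MKVLA", "MRVIA")
print(f"{identity:.0f}% identity, {similarity:.0f}% similarity")
```

Both numbers describe how alike the sequences are. Neither is a "percent homology": the two sequences either share a common ancestor or they do not.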
10 Some fundamental terms and concepts. The
central dogma (Crick). General transfer of
information: from DNA to DNA, from DNA to RNA,
from RNA to protein. The genetic code. Start: ATG
(Met). Stop: TAA, TAG, TGA (OCH, AMB, OPA;
ochre, amber, opal). ORF: Open Reading Frame
(from Start to Stop). CDS: Coding Sequence.
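The definition of an ORF (from Start to Stop) can be sketched as a small Python function. This is an illustrative sketch under stated assumptions, not a gene finder: the name `find_orfs` is hypothetical, only the forward strand is scanned, and the coordinates are 0-based with an exclusive end.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # ochre, amber, opal


def find_orfs(dna):
    """Return (start, end, sequence) for each ORF on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon,
    stop codon included. Results are grouped by reading frame (0, 1, 2).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop is found.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs


print(find_orfs("CCATGAAATAGGG"))  # [(2, 11, 'ATGAAATAG')]
```

An ATG with no in-frame stop codon downstream is not reported, since such a reading frame is never closed.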
11 Prediction of proteins from DNA
sequences. Sequenced proteins
(tens of thousands). Predicted but manually
curated (approx. 150,000, e.g. Swiss-Prot).
Predicted with similarity to known proteins
("Putative -", "Predicted -", hypothetical
proteins, COGs; millions). ORFs without known
function (ORFans).
12 Differences between eukaryotes, prokaryotes and
viruses. Eukaryotes. Coding DNA sequences are
separated into regions of exons and introns.
Introns are spliced out of the primary
transcript and only the exons are translated
into protein. EST (Expressed Sequence Tags):
mRNA is reverse-transcribed into cDNA and
sequenced. Only the exons are obtained and
proteins can be predicted directly.
Prokaryotes. Coding genes normally lack introns,
so the DNA sequence can be translated directly
into protein.
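Splicing can be illustrated with a short Python sketch: given a primary transcript and the positions of its exons, the mature sequence is the concatenation of the exons. The function name, the toy sequence and the coordinate convention (0-based, end exclusive) are assumptions chosen for this example.

```python
def splice(primary_transcript, exons):
    """Join the exon segments of a primary transcript, dropping the introns.

    `exons` is a list of (start, end) pairs, 0-based with `end` exclusive.
    """
    return "".join(primary_transcript[start:end] for start, end in sorted(exons))


# Toy transcript: exon 1, a 12-base intron, exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "AAAUAA"
print(splice(pre_mrna, [(0, 6), (18, 24)]))  # AUGGCCAAAUAA
```

For a prokaryotic gene without introns the exon list would contain a single segment covering the whole coding sequence.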
13 Differences between eukaryotes, prokaryotes and
viruses. Viruses. The genetic information is
stored in ssRNA, dsRNA or DNA. Reverse
transcription of the genomes of RNA viruses into
DNA is needed before sequencing. The mutation
rate is many times higher in viruses than in
eukaryotes and prokaryotes (lack of a mismatch
repair system), giving high variation in DNA and
protein sequences.
14 Differences between programs, databases and
servers. A program is the code facilitating
computer communication and data analysis. BLAST
is a program. Databases are files storing the
sequences. Databases can be downloaded and used
locally or accessed through servers. GenBank is
a database. A server is a computer facilitating
access to programs and databases. Internet
servers provide access to daily updated
information. Servers are located at NCBI, for
example.
15 Other important terms. Contig (contiguous clone
map): piece of genome assembled after shotgun
cloning, sequencing and assembly. Genomic
sequencing: the DNA sample is mechanically
sheared by sonication; the fragments are cloned
and sequenced; assembly is based on overlapping
sequences. [Figure: contig]
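Assembly based on overlapping sequences can be sketched as a greedy merge in Python. This is a toy illustration under stated assumptions, not how production assemblers work: the function names and the `min_len` threshold are hypothetical, reads are assumed error-free, and only suffix-to-prefix overlaps on the same strand are considered.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def assemble(fragments, min_len=3):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


print(assemble(["ATGCGT", "CGTACC", "ACCTTG"]))  # ['ATGCGTACCTTG']
```

Fragments that share no sufficiently long overlap stay separate, so the result may contain several contigs.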
16 Own research areas. Applied bioinformatics in
relation to bacteria and viruses. Various DNA
sequence comparisons. Phylogeny for
classification, taxonomy and epidemiology.
Recombination and prediction of selection on
genes, e.g. antibiotic resistance.