Introduction to bioinformatics Lecture 2 Genes and Genomes - PowerPoint PPT Presentation

1
Introduction to bioinformatics
Lecture 2
Genes and Genomes
2
Organisational
  • Course website: http://ibi.vu.nl/teaching/mnw_2year/mnw2_2007.php
  • or click on http://ibi.vu.nl (> teaching > Introduction to Bioinformatics)
  • Course book: Bioinformatics and Molecular
    Evolution by Paul G. Higgs and Teresa K. Attwood
    (Blackwell Publishing), 2005, ISBN (Pbk)
    1-4051-0683-2
  • Lots of information about Bioinformatics can be
    found on the web.

3
DNA sequence
.....acctc ctgtgcaaga acatgaaaca nctgtggttc
tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg
ggcccaggac tggggaagcc tccagagctc aaaaccccac
ttggtgacac aactcacaca tgcccacggt gcccagagcc
caaatcttgt gacacacctc ccccgtgccc acggtgccca
gagcccaaat cttgtgacac acctccccca tgcccacggt
gcccagagcc caaatcttgt gacacacctc ccccgtgccc
ccggtgccca gcacctgaac tcttgggagg accgtcagtc
ttcctcttcc ccccaaaacc caaggatacc cttatgattt
cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag
ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac
ggcgtggagg tgcataatgc caagacaaag ctgcgggagg
agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac
cgtcctgcac caggactggc tgaacggcaa ggagtacaag
tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg
cctggtcaaa ggcttctacc ccagcgacat cgccgtggag
tgggagagca atgggcagcc ggagaacaac tacaacacca
cgcctcccat gctggactcc gacggctcct tcttcctcta
cagcaagctc accgtggaca agagcaggtg gcagcagggg
aacatcttct catgctccgt gatgcatgag gctctgcaca
accgctacac gcagaagagc ctctc.....
4
Genome size
Organism                    Number of base pairs
ΦX-174 virus                5,386
Epstein-Barr virus          172,282
Mycoplasma genitalium       580,000
Haemophilus influenzae      1.8 × 10^6
Yeast (S. cerevisiae)       12.1 × 10^6
Human                       3.2 × 10^9
Wheat                       16 × 10^9
Lilium longiflorum          90 × 10^9
Salamander                  100 × 10^9
Amoeba dubia                670 × 10^9
5
Four DNA nucleotide building blocks
G-C is more strongly hydrogen-bonded than A-T
6
A gene codes for a protein
DNA:     CCTGAGCCAACTATTGATGAA
mRNA:    CCUGAGCCAACUAUUGAUGAA
Protein: PEPTIDE
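
The DNA, mRNA and peptide lines on this slide can be reproduced in a few lines of Python. This is a minimal sketch and not part of the original slides: the names `transcribe` and `translate` are illustrative, the codon table is the standard genetic code written in the compact TCAG ordering, and the DNA is assumed to be the coding strand read from its first base.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna):
    """Return the mRNA matching the coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate coding-strand DNA codon by codon, starting at position 0."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    # Codons containing ambiguous bases (e.g. 'N') are reported as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


gene = "CCTGAGCCAACTATTGATGAA"
print(transcribe(gene))  # CCUGAGCCAACUAUUGAUGAA
print(translate(gene))   # PEPTIDE
```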
7
Central Dogma of Molecular Biology
Transcription
Translation
Replication
DNA
mRNA
Protein
Transcription is carried out by RNA polymerase (II)
Translation is performed on ribosomes
Replication is carried out by DNA polymerase
Reverse transcriptase copies RNA into DNA
Transcription + Translation = Expression
8
But DNA can also be transcribed into non-coding
RNA
  • tRNA (transfer): transfer of amino acids to
    the ribosome during protein synthesis.
  • rRNA (ribosomal): essential component of the
    ribosomes (complex with rProteins).
  • snRNA (small nuclear): mainly involved in
    RNA splicing (removal of introns). snRNPs.
  • snoRNA (small nucleolar): involved in chemical
    modifications of ribosomal RNAs and other RNA
    genes. snoRNPs.
  • SRP RNA (signal recognition particle): forms an
    RNA-protein complex involved in protein secretion.
  • Further: microRNA, eRNA, gRNA, tmRNA etc.

9
Eukaryotes have spliced genes
  • Promoter: involved in transcription initiation
    (TF/RNApol-binding sites)
  • TSS: transcription start site
  • UTRs: un-translated regions (important for
    translational control)
  • Exons: will be spliced together by removal of the
    introns
  • Poly-adenylation site: important for transcription
    termination (but also mRNA stability,
    export of mRNA from the nucleus, etc.)

10
DNA makes mRNA makes Protein
11
DNA makes RNA makes Protein
yet another picture to appreciate the above
statement
12
Some facts about human genes
  • There are about 20,000-25,000 genes in the
    human genome (about 3% of the genome)
  • Average gene length is 8,000 bp
  • Average of 5-6 exons per gene
  • Average exon length is 200 bp
  • Average intron length is 2,000 bp
  • 8% of the genes have a single exon
  • Some exons can be as small as 1 or 3 bp

13
DMD: the largest known human gene
  • The largest known human gene is DMD, the gene
    that encodes dystrophin: 2.4 million bp over 79
    exons
  • X-linked recessive disease (affects boys)
  • Two variants: Duchenne-type (DMD) and Becker-type
    (BMD)
  • Duchenne-type: more severe, frameshift mutations
  • Becker-type: milder phenotype, in-frame mutations

Posture changes during progression of Duchenne
muscular dystrophy
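
The difference between frameshift and in-frame mutations can be illustrated on a toy sequence. This is a sketch under stated assumptions: the sequence `toy` is made up (it is not part of the DMD gene), and the helper names `codons`, `delete` and `deletion_type` are hypothetical. A deletion whose length is a multiple of 3 removes whole codons and leaves the downstream reading frame intact; any other length shifts every downstream codon.

```python
def codons(seq):
    """Split a sequence into complete codons, dropping a partial trailing codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq, start, length):
    """Return seq with `length` bases removed, beginning at index `start`."""
    return seq[:start] + seq[start + length:]


def deletion_type(length):
    """Classify a deletion by its effect on the reading frame."""
    return "in-frame" if length % 3 == 0 else "frameshift"


toy = "ATGGCTGAAAAACTGTAA"  # hypothetical sequence, not from DMD

print(codons(toy))                 # original reading frame
print(codons(delete(toy, 3, 3)))   # 3-bp deletion: one codon lost, rest unchanged
print(codons(delete(toy, 3, 1)))   # 1-bp deletion: all downstream codons change
print(deletion_type(3), deletion_type(1))
```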
14
Nucleic acid basics
  • Nucleic acids are polymers

nucleotide
nucleoside
  • Each monomer consists of 3 moieties: a base, a sugar and a phosphate group

15
Nucleic acid basics (2)
  • A base can be one of 5 ring structures (A, G, C, T or U)
  • Purines and pyrimidines can base-pair
    (Watson-Crick pairs)

Watson and Crick, 1953
16
Nucleic acid as hetero-polymers
  • Nucleosides, nucleotides
  • DNA and RNA strands

(Ribose sugar, RNA precursor)
(2'-deoxyribose sugar, DNA precursor)
  • REMEMBER
  • DNA: deoxyribonucleotides; RNA: ribonucleotides
    (OH groups at the 2' position)
  • Note the directionality of DNA (5'-3' and 3'-5') or
    RNA (5'-3')
  • DNA: A, G, C, T; RNA: A, G, C, U

(2'-deoxythymidine triphosphate, nucleotide)
17
So
  • DNA

RNA
18
Stability of base-pairing
  • C-G base pairing is more stable than A-T (A-U)
    base pairing (why?)
  • 3rd codon position has freedom to evolve
    (synonymous mutations)
  • Species can therefore optimise their G-C content
    (e.g. thermophiles are GC rich) (consequences for
    codon use?)

Thermocrinis ruber, heat-loving bacteria
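
The freedom of the 3rd codon position can be checked directly against the standard genetic code. The sketch below is an illustration added to the transcript, not the lecturer's own analysis: for each codon position it counts what fraction of all single-base substitutions leave the encoded amino acid (or stop signal) unchanged. The function name `synonymous_fraction` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fraction(position):
    """Fraction of single-base changes at `position` (0, 1 or 2) that are synonymous."""
    same = total = 0
    for codon, amino_acid in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == amino_acid
    return same / total


fractions = [synonymous_fraction(p) for p in range(3)]
for position, fraction in enumerate(fractions, start=1):
    print(f"codon position {position}: {fraction:.2f} of substitutions are synonymous")
```

Because most changes at the 3rd position are silent, a genome can shift its G-C content there with little effect on its proteins.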
19
(No Transcript)
20
DNA codons                      Single Letter Code   Amino Acid
ATT, ATC, ATA                   I                    Isoleucine
CTT, CTC, CTA, CTG, TTA, TTG    L                    Leucine
GTT, GTC, GTA, GTG              V                    Valine
TTT, TTC                        F                    Phenylalanine
ATG                             M, Start             Methionine
TGT, TGC                        C                    Cysteine
GCT, GCC, GCA, GCG              A                    Alanine
GGT, GGC, GGA, GGG              G                    Glycine
CCT, CCC, CCA, CCG              P                    Proline
ACT, ACC, ACA, ACG              T                    Threonine
TCT, TCC, TCA, TCG, AGT, AGC    S                    Serine
TAT, TAC                        Y                    Tyrosine
TGG                             W                    Tryptophan
CAA, CAG                        Q                    Glutamine
AAT, AAC                        N                    Asparagine
CAT, CAC                        H                    Histidine
GAA, GAG                        E                    Glutamic acid
GAT, GAC                        D                    Aspartic acid
AAA, AAG                        K                    Lysine
CGT, CGC, CGA, CGG, AGA, AGG    R                    Arginine
TAA, TAG, TGA                   Stop                 Stop codons
21
DNA compositional biases
  • Base compositions of genomes: GC (and therefore
    also AT) content varies between different
    genomes
  • The GC content is sometimes used to classify
    organisms in taxonomy
  • High GC content bacteria: Actinobacteria, e.g. in
    Streptomyces coelicolor it is 72%
  • Low GC content: Plasmodium falciparum (about 20%)
  • Other examples
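
GC content is simple to compute from a sequence. The following is a minimal sketch, not taken from the slides: `gc_content` and `gc_profile` are illustrative names, ambiguous symbols such as `n` are ignored, and the window and step sizes are arbitrary defaults.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq (0.0 if none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    return gc / total if total else 0.0


def gc_profile(seq, window=100, step=50):
    """GC content in sliding windows, to expose local compositional bias."""
    last_start = max(len(seq) - window + 1, 1)
    return [gc_content(seq[i:i + window]) for i in range(0, last_start, step)]


fragment = "acctcctgtgcaagaacatgaaacanctgtggttc"  # start of the sequence on slide 3
print(f"GC content: {gc_content(fragment):.2f}")
print(gc_profile(fragment, window=10, step=5))
```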



22
Genetic diseases cystic fibrosis
  • Known since very early on (Celtic gene)
  • Autosomal, recessive, hereditary disease (Chr.
    7)
  • Symptoms
  • Exocrine glands (which produce sweat and mucus)
  • Abnormal secretions
  • Respiratory problems
  • Reduced fertility and (male) anatomical anomalies

23
cystic fibrosis (2)
  • Gene product: CFTR (cystic fibrosis transmembrane
    conductance regulator)
  • CFTR is an ABC (ATP-binding cassette) transporter
    or traffic ATPase.
  • These proteins transport molecules such as
    sugars, peptides, inorganic phosphate, chloride,
    and metal cations across the cellular membrane.
  • CFTR transports chloride ions (Cl-) across
    the membranes of cells in the lungs, liver,
    pancreas, digestive tract, reproductive tract,
    and skin.

24
cystic fibrosis (3)
  • CF gene: CFTR has a 3-bp deletion leading to Del508
    (Phe) in the 1480 aa protein (epithelial Cl-
    channel)
  • Protein is degraded in the endoplasmic reticulum (ER)
    instead of being inserted into the cell membrane

Theoretical model of NBD1. PDB identifier 1NBD, as
viewed in Protein Explorer (http://proteinexplorer.org)
Diagram depicting the five domains of the CFTR
membrane protein (Sheppard 1999).
The deltaF508 deletion is the most common cause
of cystic fibrosis. The isoleucine (Ile) at amino
acid position 507 remains unchanged because both
ATC and ATT code for isoleucine
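
The effect of the 3-bp deletion can be traced codon by codon. This is a hedged sketch: the five-codon window `wild_type` (Ile-Ile-Phe-Gly-Val) is an assumption chosen to be consistent with the statement above that ATC becomes ATT. It is believed to match the region around codon 508, but treat it as illustrative rather than as a verified excerpt of the CFTR sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate coding-strand DNA codon by codon, starting at position 0."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Assumed window: ATC ATC TTT GGT GTT = Ile Ile Phe Gly Val.
wild_type = "ATCATCTTTGGTGTT"

# Delete 3 bases spanning two codons: the final C of the second ATC
# and the first two T's of TTT (indices 5, 6 and 7).
mutant = wild_type[:5] + wild_type[8:]

print(wild_type, translate(wild_type))  # ATCATCTTTGGTGTT IIFGV
print(mutant, translate(mutant))        # ATCATTGGTGTT IIGV
```

The deletion removes exactly one amino acid (Phe) and keeps the reading frame, because the remaining AT and T join to form ATT, which still encodes isoleucine.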
25
Let's return to DNA and RNA structure
  • Unlike the three-dimensional structures of proteins,
    DNA molecules assume simple double-helical
    structures independent of their sequences.
  • Three kinds of double helix have been observed
    in DNA: type A, type B, and type Z,
    which differ in their geometries.
  • RNA, on the other hand, can adopt structures as
    diverse as those of proteins, as well as a simple
    double helix of type A.
  • The ability to be both informational and
    diverse in structure suggests that RNA was the
    prebiotic molecule that could function in both
    replication and catalysis (the RNA World
    Hypothesis).
  • In fact, some viruses encode their genetic
    material in RNA (e.g. retroviruses)

26
Three dimensional structures of double helices
Side view: A-DNA, B-DNA, Z-DNA
Space-filling models of A-, B- and Z-DNA
Top view: A-DNA, B-DNA, Z-DNA
27
Major and minor grooves



28
Forces that stabilize nucleic acid double helix
  • There are two major forces that contribute to
    stability of helix formation
  • Hydrogen bonding in base-pairing
  • Hydrophobic interactions in base stacking

Same strand stacking
cross-strand stacking


29
Types of DNA double helix
  • Type A: major conformation of RNA, minor
    conformation of DNA, right-handed helix
  • Type B: major conformation of DNA,
    right-handed helix
  • Type Z: minor conformation of DNA,
    left-handed helix


30
Secondary structures of Nucleic acids
  • DNA is primarily in duplex form
  • RNA is normally single-stranded and can adopt
    diverse secondary structures other than a
    duplex.


31
Non B-DNA Secondary structures
  • Cruciform DNA
  • Slipped DNA
  • Triple helical DNA


Hoogsteen basepairs
Source: Van Dongen et al. (1999), Nature
Structural Biology 6, 854-859
32
More Secondary structures
  • RNA pseudoknots
  • Cloverleaf rRNA structure


16S rRNA Secondary Structure Based
on Phylogenetic Data
Source: Cornelis W. A. Pleij in Gesteland, R. F.
and Atkins, J. F. (1993) THE RNA WORLD. Cold
Spring Harbor Laboratory Press.
33
3D structures of RNA transfer-RNA structures
  • Secondary structure of tRNA (cloverleaf)
  • Tertiary structure of tRNA


34
3D structures of RNA ribosomal-RNA structures
  • Secondary structure of large rRNA (16S)
  • Tertiary structure of large rRNA subunit


35
3D structures of RNA Catalytic RNA
  • Secondary structure of self-splicing RNA
  • Tertiary structure of self-splicing RNA


36
Some structural rules
  • Base-pairing is stabilizing
  • Un-paired sections (loops) destabilize
  • 3D conformation with interactions makes up for
    this


37
Three main principles
  • DNA makes RNA makes Protein
  • Structure more conserved than sequence
  • Sequence → Structure → Function

38
How to go from DNA to protein sequence
A piece of double-stranded DNA:
5' attcgttggcaaatcgcccctatccggc 3'
3' taagcaaccgtttagcggggataggccg 5'
DNA direction is from 5' to 3'
39
How to go from DNA to protein sequence
6-frame conceptual translation using the codon
table
5' attcgttggcaaatcgcccctatccggc 3'
3' taagcaaccgtttagcggggataggccg 5'
So, there are six possibilities to make a protein
from an unknown piece of DNA, only one of which
might be a natural protein
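
A 6-frame conceptual translation of this fragment can be sketched as follows. The code is an illustration added to the transcript: the function names and the frame labels `+1` to `-3` are assumptions, and the standard genetic code is used. The three reverse frames are read from the reverse complement, since the lower strand also runs 5' to 3'.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon from position 0; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translation(dna):
    """Translate all three offsets on both strands."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("attcgttggcaaatcgcccctatccggc").items():
    print(name, protein)
```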
40
Remark
  • Identifying (annotating) human genes, i.e.
    finding what they are and what they do, is a
    difficult problem
  • First, the gene should be delineated on the
    genome
  • Gene finding methods should be able to tell a
    gene region from a non-gene region
  • Start, stop codons, further compositional
    differences
  • Then, a putative function should be found for the
    gene located
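
The simplest signal mentioned above, start and stop codons, already gives a crude gene finder. The sketch below is a deliberately naive illustration, not a real gene-finding method: it scans only the three forward frames for an ATG followed in frame by a stop codon, ignores introns and compositional signals, and uses hypothetical names (`find_orfs`, `min_codons`).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons before the stop (start codon included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


toy = "CCATGAAATAGGG"  # hypothetical sequence containing one short ORF
for start, end, frame in find_orfs(toy):
    print(frame, toy[start:end])
```

For eukaryotic genes this is far from sufficient, since exons are interrupted by introns; it only illustrates why start and stop codons are the first thing a gene finder looks at.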

41
Evolution and three-dimensional protein structure
information
Isocitrate dehydrogenase: the distance from the
active site (in yellow) determines the rate of
evolution (red = fast evolution, blue = slow
evolution)
Dean, A. M. and G. B. Golding, Pacific Symposium
on Bioinformatics 2000
42
Genomic Data Sources
  • DNA/protein sequence
  • Expression (microarray)
  • Proteome (X-ray, NMR, mass spectrometry)
  • Metabolome
  • Physiome (spatial, temporal)

Integrative bioinformatics
43
Genomic Data Sources Vertical Genomics
genome
transcriptome
proteome
metabolome
physiome
Dinner discussion Integrative Bioinformatics
Genomics VU
44
DNA makes RNA makes Protein (reminder)
45
DNA makes RNA makes Protein: Expression data
  • More copies of mRNA for a gene lead to more
    protein
  • mRNA can now be measured for all the genes in a
    cell at once through microarray technology
  • Can have 60,000 spots (genes) on a single gene
    chip
  • Colour change gives intensity of gene expression
    (over- or under-expression)

46
(No Transcript)
47
Proteomics
  • Elucidating all 3D structures of proteins in the
    cell
  • This is also called Structural Genomics
  • Finding out what these proteins do
  • This is also called Functional Genomics

48
(No Transcript)
49
Protein-protein interaction networks
50
Metabolic networks: Glycolysis and
Gluconeogenesis
KEGG database (Japan)
51
High-throughput Biological Data
  • Enormous amounts of biological data are being
    generated by high-throughput capabilities, and
    even more are coming
  • genomic sequences
  • arrayCGH (Comparative Genomic Hybridization)
    data, gene expression data
  • mass spectrometry data
  • protein-protein interaction data
  • protein structures
  • ......

52
Protein structural data explosion
Protein Data Bank (PDB): 14,500 structures (6
March 2001): 10,900 X-ray crystallography, 1,810
NMR, 278 theoretical models, others...
53
Dickerson's formula: equivalent to Moore's law
n = e^{0.19(y - 1960)}, with y the year.
On 27 March 2001 there were 12,123 3D protein
structures in the PDB. Dickerson's formula
predicts 12,066 (within 0.5%)!
54
Sequence versus structural data
  • Structural genomics initiatives are now in full
    swing and growth is still exponential.
  • However, growth of sequence data is even more
    rapid. There are now more than 500 completely
    sequenced genomes publicly available.
  • Increasing gap between structural and sequence
    data ("Mind the gap")

55
Bioinformatics
[Figure: sciences arranged on a scale from large, external (integrative) to small, internal (individual)]
Large, external (integrative)
Science Human Planetary Science, Cultural Anthropology,
Population Biology, Sociology, Sociobiology, Psychology,
Systems Biology, Biology, Medicine, Molecular Biology,
Chemistry, Physics
Small, internal (individual)
56
Bioinformatics
  • Offers an ever more essential input to
  • Molecular Biology
  • Pharmacology (drug design)
  • Agriculture
  • Biotechnology
  • Clinical medicine
  • Anthropology
  • Forensic science
  • Chemical industries (detergent industries, etc.)