Inside the Genome - PowerPoint PPT Presentation

Provided by: TalP2
1
Inside the Genome
2
2001: The Human Genome
The club resident J.D. Watson, back2back with DJ Venter
International Human Genome Sequencing Consortium,
Nature 409: 860-921 (2001)
Venter et al., Science 291: 1304-1351 (2001)
3
Prologue: the RNA world, the dark matter of genomics
  • How many coding genes in the human genome?
  • The Bet of 2000
  • Mean: 61,710
  • Range: 30,000 to 150,000
  • By the end of the genome project the estimated
    number of human protein-coding genes declined to
    only 25,000
  • What is the source of that discrepancy?
  • EST-based estimation vs. whole-genome annotation

4
RNA revolution
  • The majority of the transcriptional output comes
    from non-coding RNA
  • An average of 10% of the human genome (compared
    with 1.5% exonic sequence) is transcribed
    (Cheng et al., 2005)
  • Or even more... 62% of the mouse genome is
    transcribed (FANTOM3, Science 2005)

5
Various RNAs: a partial list
  • Messenger RNA (mRNA)
  • Ribosomal RNA (rRNA)
  • Transfer RNA (tRNA)
  • Small nuclear RNA (snRNA)
  • Small nucleolar RNA (snoRNA)
  • Short interfering RNA (siRNA)
  • Micro RNA (miRNA)

6
RNAs are not merely the intermediary cousins of
proteins: the central dogma of molecular biology
revisited
[Diagram labels: Genome, Transcriptome, Proteome,
miRNA, regulation by proteins, regulation by RNA]
7
Research in Biology is complex
  • Deciphering Biological Systems
  • The advantage (what makes this quest feasible)
    and the hindrance (what makes this quest
    inherently difficult) are both explained by
    evolution.

8
The hindrance: topological entanglement of
functional interconnections
  • The difficulties in our research fundamentally
    owe their complexity to the designer: natural
    selection.
  • What is it: a robot or a UFO?
  • The reason lies in the profound difference
    between systems designed by natural selection
    and those designed by intelligent engineers
    (Langton, 1989, Artificial Life).

9
  • Bottom line: we investigate an outrageously
    complex weave of interconnections
  • The textbook networks represent only the tip of
    the iceberg.
  • miRNAs and regulomics
  • microRNAs are expected to represent 1% of
    predicted genes (Lim et al., 2003)
  • Lewis et al. (2003) estimate an average of five
    targets per miRNA
  • Many targets are transcription factors: miRNAs
    regulate the regulators

10
The advantage: universal homology, thus
enabling comparative biology.
  • Bottom line: research in biology advances
    through a reductionist approach, using simple
    model organisms to infer the functionality of
    homologous systems.

11
Human genome statistics
  • 2.91 billion base pairs
  • 24,000 protein-coding genes (>30,000 non-coding
    genes ???)
  • 1.5% exons (127 nucleotides)
  • 24% introns (3,000 nucleotides)
  • 75% intergenic (no genes)
  • Repetitive elements rule (45% dispersed repeats)
  • Average size of a gene is 27,894 bases
  • A gene contains an average of 8.8 exons; Titin
    contains 234 exons
  • Average of 4 different proteins per gene
    (alternative splicing)
12
Detecting genes in the human genome
  • Gene finding methods
  • Ab initio: use general knowledge of gene
    structure (rules and statistics). The challenge:
    small exons in a sea of introns
  • Homology-based. The problem: will not detect
    novel genes

13
Genscan (ab initio)
  • Based on a probabilistic model of a gene
    structure
  • Takes into account: promoters, gene composition
    (exons/introns), GC content, splice signals
  • Goes over all 6 reading frames

Burge and Karlin, 1997, Prediction of complete
gene structure in human genomic DNA, J. Mol.
Biol. 268
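
To make "all 6 reading frames" concrete, here is a minimal Python sketch. It is not Genscan's probabilistic model: it only enumerates the three forward and three reverse-complement frames and measures the longest stop-free stretch in each. The function names and the toy sequence are illustrative assumptions.

```python
# Illustrative sketch only: enumerates reading frames, does not model genes.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Yield (strand, offset, codons) for the 3 forward and 3 reverse frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Split into complete codons only; a trailing partial codon is dropped.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            yield strand, offset, codons


def longest_open_stretch(codons):
    """Length, in codons, of the longest run that contains no stop codon."""
    best = run = 0
    for codon in codons:
        if codon in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best


if __name__ == "__main__":
    for strand, offset, codons in six_frames("ATGAAATAGCCC"):
        print(strand, offset, codons, longest_open_stretch(codons))
```

A real gene finder combines such frame information with the other signals listed above (promoters, composition, splice signals) rather than relying on open stretches alone.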
14
Splicing
15
Eukaryotic splice sites
Poly-pyrimidine tract
16
CpG islands: another signal
  • CpG islands are regions of the genome with a
    higher frequency of CG dinucleotides (not
    base-pairs!) than the rest of the genome
  • CpG islands often occur near the beginning of
    genes, possibly related to the binding of the
    transcription factor Sp1
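
The definition above (a window with a higher CG dinucleotide frequency than the sequence as a whole) can be sketched in a few lines of Python. The window size, step and fold threshold below are arbitrary illustrative assumptions, not values from the slides, and real CpG island finders use additional criteria.

```python
def cg_dinucleotide_frequency(seq):
    """Fraction of adjacent base pairs along one strand that read 'CG'."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    count = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return count / (len(seq) - 1)


def cpg_enriched_windows(seq, window=200, step=100, fold=2.0):
    """Return (start, end, frequency) for windows whose CG frequency is at
    least `fold` times the frequency over the whole input sequence.

    window, step and fold are hypothetical defaults chosen for illustration.
    """
    background = cg_dinucleotide_frequency(seq)
    hits = []
    if background == 0:
        return hits
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        freq = cg_dinucleotide_frequency(chunk)
        if freq >= fold * background:
            hits.append((start, start + len(chunk), freq))
    return hits


if __name__ == "__main__":
    toy = "AT" * 50 + "CG" * 10 + "AT" * 50  # CG-rich block in an AT background
    print(cpg_enriched_windows(toy, window=20, step=20))
```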

17
Gene Ontology
  • GO describes proteins in terms of biological
    process (e.g. induction of apoptosis by external
    signals), cellular component (e.g. membrane
    fraction) and molecular function (e.g. protein
    kinase)

18
Comparative proteome analysis
Functional categories based on GO
19
Comparative proteome analysis
  • Humans have more proteins involved in
    cytoskeleton, immune defense, and transcription

20
Evolutionary conservation of human proteins
???
21
Horizontal (lateral) gene transfer
  • Lateral Gene Transfer (LGT) is any process in
    which an organism transfers genetic material to
    another organism that is not its offspring

22
  • Mechanisms
  • Transformation
  • Transduction (phages/viruses)
  • Conjugation

23
Bacteria to vertebrate LGT detection
  • E-value of bacterial homolog X9 better than
    eukaryal homolog

Human query
Hit              E-value
Frog             4e-180
Mouse            1e-164
E. coli          7e-124
Streptococcus    9e-71
Worm             0.1
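
The screening idea can be sketched as follows: compare the best bacterial E-value for a human query with the best non-vertebrate eukaryotic E-value, and flag the query when the bacterial hit is better by some factor. This is a hedged sketch, not the consortium's pipeline: the group labels, the function names and the `fold` argument are assumptions, and the value 1e9 used in the example is only one possible reading of the threshold on the slide.

```python
def best_evalue(hits, group):
    """Smallest E-value among hits in the given group, or None if no hit."""
    values = [evalue for (_, g, evalue) in hits if g == group]
    return min(values) if values else None


def is_lgt_candidate(hits, fold):
    """Flag a query whose best bacterial hit beats its best non-vertebrate
    eukaryotic hit by at least a factor of `fold` in E-value.

    hits: list of (organism, group, evalue); group labels are illustrative.
    A query with no non-vertebrate hit at all is also flagged (an assumption).
    """
    bacterial = best_evalue(hits, "bacteria")
    eukaryal = best_evalue(hits, "non-vertebrate")
    if bacterial is None:
        return False
    if eukaryal is None:
        return True
    return bacterial * fold <= eukaryal


if __name__ == "__main__":
    # E-values taken from the table above; group assignments added by hand.
    hits = [
        ("Frog", "vertebrate", 4e-180),
        ("Mouse", "vertebrate", 1e-164),
        ("E. coli", "bacteria", 7e-124),
        ("Streptococcus", "bacteria", 9e-71),
        ("Worm", "non-vertebrate", 0.1),
    ]
    print(is_lgt_candidate(hits, fold=1e9))  # fold value is an assumption
```

Vertebrate hits are ignored on purpose: the question is whether the gene looks bacterial while being absent from non-vertebrate eukaryotes.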
24
Bacteria to vertebrate LGT
Non-vertebrates
Bacteria
vertebrates
26
Bacteria to vertebrate LGT??
  • Hundreds of sequenced bacterial genomes vs. a
    handful of eukaryotes
  • Gene finding in bacteria is much easier than in
    eukaryotes
  • On the practical side: rigid mechanical barriers
    to LGT in eukaryotes (nucleus, germ line)

27
Repetitive Elements in the Human Genome
28
Repeats statistics
  • The human genome is 45% dispersed repeats
  • 20% LINEs (AT-rich)
  • 13% SINEs (11% Alu) (GC-rich)
  • 8% LTR (retrovirus-like)
  • 2% DNA transposons
  • Another 3% is tandem simple sequence repeats
    (e.g. triplet)
  • And another 3-5% is segmentally duplicated at
    high similarity (over 1 kb, over 90% identity)
  • Identifying and screening these out is essential
    to avoid spurious matches

29
LINEs and SINEs
  • Highly successful elements in eukaryotes
  • LINE: Long Interspersed Nuclear Element (>5,000
    bp)
  • SINE: Short Interspersed Nuclear Element (<500
    bp)
  • SINEs are free riders on the backs of LINEs:
    they encode no proteins

30
The C-value paradox
  • Genome size does not correlate with organism
    complexity

                   Amoeba        Rice          Human       Yeast
Genome size (bp)   670 billion   4.3 billion   3 billion   12 million
Number of genes    ?             30,000        20-25,000   6,275
31
Repetitive elements
  • The C-value mystery was partially resolved when
    it was found that large portions of genomes
    contain repetitive elements

32
Are Alus functional??
  • SINEs are transcribed under stress
  • SINE RNAs may bind a protein kinase and thereby
    promote translation under stress
  • Need to be in regions which are highly
    transcribed
  • Role in alternative splicing

33
Segmental duplications
  • 1,077 segmental duplications detected
  • Several genes in the duplicated regions
    associated with diseases (may be related to
    homologous recombination)
  • Most are recent duplications (conservation of
    entire segment, versus conservation of coding
    sequences only)

34
Genome-wide studies
35
Sequenced genomes
36
  • 481 segments >200 bp absolutely conserved (100%
    identity) between human, rat and mouse
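
As a rough illustration of what "absolutely conserved" means computationally, the sketch below scans an already-computed multiple alignment for gap-free runs of columns in which every sequence carries the same base. It is an assumption-laden toy, not the method used to obtain the 481 segments: the function name is hypothetical, the input is taken to be pre-aligned strings of equal length, and the 200-column default is applied as "at least 200".

```python
def ultraconserved_runs(aligned, min_len=200):
    """Return half-open (start, end) column intervals where all aligned
    sequences are identical and gap-free for at least `min_len` columns.

    aligned: list of equal-length strings from an existing alignment,
    with '-' marking gaps. min_len=200 mirrors the slide's cutoff.
    """
    if not aligned:
        return []
    seqs = [s.upper() for s in aligned]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None
    # Iterate one past the end so a run reaching the last column is closed.
    for i in range(length + 1):
        identical = (
            i < length
            and seqs[0][i] != "-"
            and all(s[i] == seqs[0][i] for s in seqs)
        )
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


if __name__ == "__main__":
    human = "ACGTACGTAC"
    mouse = "ACGTACCTAC"  # one mismatch at column 6
    rat = "ACGTACGTAC"
    print(ultraconserved_runs([human, mouse, rat], min_len=5))
```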

37
Comparison with a neutral substitution rate
  • Compare the substitution rate in any 1 Mb region
  • Probability of 10^-22 of obtaining 1
    ultraconserved element (UE) by chance

38
481 UEs
  • 111 overlap a known mRNA (exonic UEs)
  • 256 show no overlap (non-exonic UEs):
    100 intronic and 156 intergenic
  • 114 inconclusive
39
Who are the genes?
Type 1: exonic
Type 2: genes which are near non-exonic UEs (???)
40
Intergenic UEs
  • Genes which flank intergenic UEs are enriched for
    early developmental genes
  • Are UEs distal enhancers of these genes?

41
Gene enhancer
  • A short region of DNA, usually quite distant from
    a gene (due to chromatin complex folding), which
    binds an activator
  • An activator recruits transcription factors to
    the gene

42
Experimental studies of UEs
Tested 167 UEs (both mouse-human UEs and
fish-human UEs) for enhancer activity: each was
cloned in front of a reporter gene to test its
activity. 45% functioned as enhancers.
43
A bioinformatic success
  • Ultraconservation can predict highly important
    function!

44
BUT
Ahituv et al., PLoS Biol. 2007 Sep; 5(9): e234
Chose 4 UEs which are near specific genes: genes
which show a specific phenotype when knocked out.
Performed complete deletion of these UEs: the mice
were viable and did not show any different
phenotype.
45
Conclusions
  • Ultraconservation can be indicative of important
    function
  • And sometimes not: gene redundancy, long-range
    phenotypes, laboratories cannot mimic life