Genome Organization overview

Provided by: Gloria75

1
  • Genome Organization overview
  • Eukaryotic genomes are complex and DNA amounts
    and organization vary widely between species
  • C-value paradox: the amount of DNA in the haploid
    cell of an organism is not related to its
    evolutionary complexity or number of genes

2
C-value Paradox
Drosophila has a 20x smaller genome than human and
2x fewer genes. Newt and lungfish genomes are ~5x
and ~50x larger than human.
3
Number of genes does increase in higher organisms
4
Re-association Kinetics
Cot = C0 x t (initial DNA concentration x time)
Complexity: summed length of all single-copy
(unique) sequences in a genome
5
  • There are different classes of eukaryotic DNA,
    based on sequence complexity,
    revealed by re-association kinetics
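As a rough illustration of what a Cot curve measures, ideal second-order re-association is commonly written as C/C0 = 1 / (1 + k x C0 x t), where C/C0 is the fraction of DNA still single-stranded. The sketch below assumes that idealized form. The rate constant k, its example values, and the function names are illustrative and do not come from the slides.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded at a given Cot value.

    Assumes ideal second-order kinetics: C/C0 = 1 / (1 + k * Cot).
    k is a hypothetical rate constant chosen for illustration.
    """
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which half of the DNA has re-associated (equals 1 / k)."""
    return 1.0 / k


# Two made-up sequence classes: a fast one and a slow one.
for label, k in [("fast class", 100.0), ("slow class", 0.01)]:
    print(label, "Cot1/2 =", cot_half(k))
```

Under this model a class with a larger Cot1/2 re-associates more slowly, which is how repeated and single-copy fractions separate on a Cot curve.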

6
A time line of genomics research
7
Status of plant genome sequences as of January
2006
8
The human genome
  • Two versions of the human genome sequence were
    published in February 2001. DNA sequences that
    encode proteins make up only 5% of the genome
  • 50% of sequences are transposable elements;
    clusters of gene-rich regions are separated by
    gene deserts
  • Chromosome 19 has the highest gene density;
    chromosomes 13 and Y show the lowest gene density

9
The human genome
  • Gene total estimated at 30,000-40,000 (now maybe
    25,000), with an average gene size of 27 kb
  • Hundreds of genes share homology with those of
    bacteria
  • The number of introns varies greatly (from 0 for
    histone to 234 for titin)

10
The human genome
  • Genes are larger and contain more and larger
    introns compared to those in invertebrates (the
    dystrophin gene is 2.5 Mb)
  • Genes are not evenly spaced on chromosomes
  • The most common genes include those involved in
    nucleic acid metabolism (7.5%), receptors (5%),
    protein kinases (2.8%), and cytoskeletal
    structural proteins (2.8%)

11
The human genome: predicted gene function
12
Any 2 human genomes are roughly 99.9% identical
On average 0.1%
Chr - chromosome; n - number of samples examined;
bp - number of base pairs sequenced; S - number of
polymorphic sites; p - nucleotide divergence
Przeworski, M., et al. (2000) Trends Genet 16,
296-302.
13
Yet phenotypic differences abound!
14
Genome organization in plants
  • Size of genome varies widely (100 Mb-5,500 Mb)
  • Many tandem gene duplications; larger
    duplications and some interchromosomal
    duplications also observed
  • Large-genome plants also have genes clustered
    with long stretches of intergenic DNA
  • In maize, the intergenic sequences are composed
    mainly of transposons

17
  • Genome Organization
  • gene identification
  • Genes can be difficult to identify/predict based
    on genome sequence
  • The human genome appears to contain fewer genes
    than originally predicted but an estimated
    35,000 genes produce an estimated 150,000 proteins

18
  • Genome Organization
  • gene identification
  • No one-to-one correspondence between:
  • Genome (all genes of an organism)
  • Transcriptome (all transcripts of an organism)
  • Proteome (all proteins of an organism)

19
Variable estimates of human gene content
20
Gene identification: the simple view
21
Gene identification: the challenges
from Klug & Cummings 1997
22
Gene identification: the challenges
  • Non coding sequences
  • Promoters and enhancers of gene expression can be
    distant from the coding region itself
  • Genes can have alternative promoters
  • Genes can have alternative terminators

24
Gene identification: the challenges
  • Introns and exons
  • Most eukaryotic genes have introns
  • Introns are often much longer than exons
  • Often many introns so mRNA much shorter than
    genomic DNA
  • Intron size can vary between the same gene of
    different species
  • Splice junctions are difficult to predict
  • Alternative splicing

25
Gene identification: the challenges
Introns and exons
  • Eukaryotes only
  • Removal of internal parts of the newly
    transcribed RNA.
  • Takes place in the cell nucleus

26
Introns and exons
27
Introns: numerous and longer than exons
28
  • Variable intron size
  • same gene, different organism

29
Introns: alternative splicing
  • Different splice patterns from the same sequence,
    therefore different products from the same gene.

30
One gene, many proteins: alternative splicing and
3' cleavage
31
  • Exon shuffling
  • Different genes having similar exons

32
Why genome, transcriptome and proteome don't
correlate in size
  • More sophisticated regulation of expression
  • Proteome vastly larger than genome
  • Alternate splicing, promoters, terminators (59%
    of genes with an average of 3 different products)
  • RNA editing
  • Post-translational modifications
  • Moonlighting: same protein, different function
    depending on cellular location

33
Gene Identification
  • Open reading frames
  • Sequence conservation: database searches, synteny
  • Sequence features: CpG islands
  • Evidence for transcription: ESTs, microarrays
  • Gene inactivation: transformation, TEs, RNAi

34
Gene identification - Open reading frames
  • 5'atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa
  • frame 1
  • atg ccc aag ctg aat agc gta gag ggg ttt
  • M   P   K   L   N   S   V   E   G   F
  • tca tca ttt gag gac gat gta taa
  • S   S   F   E   D   D   V   *
  • frame 2
  • tgc cca agc tga ata gcg tag agg ggt ttt cat
  • C   P   S   *   I   A   *   R   G   F   H
  • cat ttg agg acg atg tat
  • H   L   R   T   M   Y
  • (* = stop codon)
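The frame-by-frame reading above can be reproduced with a short script. This is a minimal sketch, assuming the standard genetic code and forward-strand frames only. The function name `translate` and the table layout are illustrative choices, not something taken from the slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string in one forward frame (0, 1 or 2).

    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


seq = "atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa"
for frame in range(2):
    print("frame", frame + 1, translate(seq, frame))
```

Frame 1 reads through to a single terminal stop, while frame 2 hits stop codons early; that contrast is the signal an open-reading-frame scan looks for.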

35
Gene identification - Database searches
36
Gene identification - Synteny
Mouse-human synteny and sequence conservation. A)
Blocks of synteny between mouse chromosome 11 and
parts of 5 different human chromosomes. B)
Enlarged block with perfect correspondence in
order, orientation and spacing of 23 putative
genes, and 245 conserved sequence blocks of >100
bp with >70% identity, many in noncoding
regions. Caution! Even regions of high synteny
may not show perfect gene-for-gene
correspondence. From Gibson & Muse (2002) A
Primer of Genome Science, Sinauer Inc.
37
Gene identification: CpG islands
  • Defined as regions of DNA of at least 200 bp in
    length that have a GC content above 50% and a
    ratio of observed vs. expected CpGs close to or
    above 0.6
  • Used to help predict gene sequences, especially
    promoter regions.
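The three criteria above can be checked for a single sequence window as follows. This is a sketch under stated assumptions: the observed/expected ratio is computed as (CpG count x length) / (C count x G count), which is one common convention that the slides do not spell out, and "close to or above 0.6" is simplified to a hard cutoff of 0.6. Function names are hypothetical.

```python
def cpg_stats(dna):
    """Return (length, GC fraction, observed/expected CpG ratio).

    Observed/expected is (CpG count * length) / (C count * G count);
    this formula is an assumption, not taken from the slides.
    """
    dna = dna.upper()
    n = len(dna)
    c, g = dna.count("C"), dna.count("G")
    cpg = dna.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(dna, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, GC-content and CpG-ratio thresholds to one window."""
    n, gc_fraction, obs_exp = cpg_stats(dna)
    return n >= min_len and gc_fraction > min_gc and obs_exp >= min_obs_exp


print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

A real scan would slide such a window along the genome and merge overlapping hits; this example only evaluates one window.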

38
Gene identification: evidence of transcription
  • Sequencing libraries of cDNA clones yields
    expressed sequence tags (ESTs), which are not
    necessarily full-length

39
  • Genome Organization
  • duplicated genes
  • Gene families
  • paralogs
  • orthologs (homologs)
  • Pseudogenes

40
Duplicated genes
  • Paralogs evolved one from another through gene
    duplication
  • Encode closely related proteins
  • Formed by duplication of an ancestral gene
    followed by mutation
  • Five functional genes and two pseudogenes ?

41
Pseudogenes
  • Nonfunctional copies of genes
  • Formed by duplication of ancestral gene, or by
    reverse transcription and integration of the cDNA
  • Not expressed due to mutations that produce a
    stop codon (nonsense or frameshift), or mutations
    that prevent mRNA transcription or processing

42
Duplicated genes
  • Can be clustered as in ?-globin cluster, or
    dispersed in genome as seen for entire globin
    family in humans

43
Duplicated genes
  • Paralogs vs orthologs (or homologs)
  • Different members of the globin gene family are
    paralogs, having evolved one from another through
    gene duplication. Paralogs are separated by a
    gene duplication event.
  • Each specific family member (e.g. ? globin ?
    human) is an ortholog (homolog) of the same
    family member in another species. Both evolved
    from an ancestral ? globin ? gene. Orthologs
    (homologs) are separated by a speciation event.
  • It is not always easy to distinguish true
    orthologs from paralogs when comparing large
    multigene families between species. Especially in
    polyploid organisms!
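The paralog/ortholog distinction can be expressed as a toy rule. The sketch below assumes an idealized family in which every duplication predates the speciation event, so that family-member labels line up cleanly across species. Real orthology assignment needs phylogenetic analysis, as the caution above notes. The `Gene` record, the member labels, and the `relationship` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    species: str
    family: str   # e.g. "globin"
    member: str   # illustrative family-member label


def relationship(a, b):
    """Classify two genes under the idealized assumption stated above."""
    if a.family != b.family:
        return "unrelated"
    if a == b:
        return "same gene"
    if a.member == b.member and a.species != b.species:
        # Same family member in two species: separated by speciation.
        return "orthologs"
    # Different family members: separated by a gene duplication.
    return "paralogs"


human_a = Gene("human", "globin", "alpha")
human_b = Gene("human", "globin", "beta")
mouse_a = Gene("mouse", "globin", "alpha")
print(relationship(human_a, human_b))
print(relationship(human_a, mouse_a))
```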

44
  • Genome Organization
  • transcripts that do not encode proteins (ncRNA)
  • <5% of higher eukaryotic genome is protein
    coding
  • 97-98% of the transcriptional output of the
    human genome is ncRNA
  • Introns
  • Transfer RNAs (tRNA)
  • 500 tRNA genes in human genome
  • Ribosomal RNAs
  • Tandem arrays on several chromosomes
  • 150-200 copies of 28S/5.8S/18S cluster
  • 200-300 copies of 5S cluster

45
  • Genome Organization- ncRNA
  • 97-98% of the transcriptional output of the
    human genome is ncRNA
  • Small nucleolar RNAs (snoRNAs)
  • Single genes
  • Modify rRNAs
  • Small nuclear RNAs (snRNAs)
  • Spliceosomes
  • Small regulatory RNAs
  • Micro RNAs (miRNA)
  • Short interfering RNAs (siRNA)
  • Participate in transcriptional and
    non-transcriptional gene silencing, regulation
    of translation
  • Many come from intergenic regions recently
    recognized as transcribed

46
  • Genome Organization - ncRNA
  • 97-98% of the transcriptional output of the human
    genome is ncRNA
  • Longer regulatory RNAs
  • ncRNAs derived from introns of protein-coding
    genes and introns and exons of non-protein-coding
    genes constitute the majority of the genomic
    programming in higher organisms
  • Explains why very different organisms show little
    difference in protein coding sequence

48
  • Genome Organization
  • repetitive DNA
  • 50% of human genome
  • Moderately repeated DNA
  • Tandemly repeated rRNA, tRNA and histone genes
    (gene products needed in high amounts)
  • Large duplicated gene families
  • Mobile DNA
  • Segmental duplications

49
Repetitive DNA - Segmental duplications
  • Found especially around centromeres and telomeres
  • Often come from nonhomologous chromosomes
  • Many can come from the same source
  • Tend to be large (10 to 50 kb)
  • Unique to humans?

50
Repetitive DNA - Segmental duplications
51
Repetitive DNA - Transposon-derived repeats
  • Most of the moderately repeated DNA sequences
    found throughout higher eukaryotic genomes (45%
    of human genome)
  • Some encode enzymes that catalyze movement
  • Long interspersed elements (LINE)
    retrotransposons
  • Short interspersed elements (SINE)
    retrotransposons
  • LTR (long terminal repeat) retrotransposons
  • DNA transposons

52
Repetitive DNA - Transposon-derived repeats
53
Repetitive DNA - Transposon-derived repeats
  • Different regions of the genome differ in density
    of repeats
  • Most LINEs accumulate in AT rich regions
  • Alu elements accumulate in GC rich regions

54
  • Genome Organization
  • repetitive DNA
  • Simple-sequence Repeats
  • 3% of genome
  • Highly repeated short sequences found in
    centromeres and telomeres
  • Variable numbers of tandem repeats (VNTR)
    dispersed throughout the genome

55
Repetitive DNA - Highly repetitive satellite DNA
56
Repetitive DNA - VNTRs
  • dispersed throughout the genome
  • 1-13 base repeat unit
  • microsatellite, SSR
  • includes trinucleotide repeats in protein coding
    genes
  • 14-500 repeats
  • minisatellites
  • Used as mapping and fingerprinting markers
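The size classes above can be turned into a small classifier plus a naive tandem-repeat finder. This is a sketch only: it reads both size ranges (1-13 and 14-500) as repeat-unit lengths in bases, which is an assumption about the slide, and it detects perfect repeats of one fixed unit length at a time. Function names and the `min_copies` default are hypothetical.

```python
import re


def repeat_class(unit_length):
    """Map a repeat-unit length (in bases) to the class named on the slide."""
    if 1 <= unit_length <= 13:
        return "microsatellite"
    if 14 <= unit_length <= 500:
        return "minisatellite"
    return "unclassified"


def find_tandem_repeats(dna, unit_length, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats.

    Only exact copies of a unit of the given length are detected.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_length, min_copies - 1))
    hits = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // unit_length
        hits.append((match.start(), unit, copies))
    return hits


print(find_tandem_repeats("AAACAGCAGCAGCAGTTT", 3))
print(repeat_class(3))
```

The example sequence contains a trinucleotide repeat (CAG x 4), the kind of microsatellite the slide mentions as occurring in protein-coding genes.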

57
Overview of human genome composition