BB30055: Genes and genomes

Provided by: bssagjb
1
BB30055 Genes and genomes
Major insights from the HGP
2
Major insights from the HGP
  1. Gene size, content and distribution
  2. Proteome content
  3. SNP identification
  4. Distribution of GC content
  5. CpG islands
  6. Recombination rates
  7. Repeat content

Nature (2001) 15 Feb, Vol 409 special issue, pp 814 and 875-914
3
1) Gene size
4
Gene content
  • More genes: twice as many as Drosophila /
    C. elegans
  • Uneven gene distribution: gene-rich and
    gene-poor regions
  • More paralogs: some gene families have expanded
    the number of paralogs, e.g. the olfactory gene
    family has 1000 genes
  • More alternative transcripts: increased RNA
    splice variants are produced, thereby expanding
    the primary proteins by 5-fold (e.g. neurexin genes)

5
Gene distribution
Genes generally dispersed (1 gene per 100 kb)
Class III complex at HLA 6p21.3
Overlapping genes (transcribed from 2 DNA strands): rare
Genes within genes, e.g. NF1 gene
(HMG3 Fig 9.8)
6
Uneven gene distribution
  • Gene-rich regions
    • E.g. MHC on chromosome 6 has 60 genes with a GC
      content of 54%
  • Gene-poor regions
    • 82 gene deserts identified
    • Large or unidentified genes?
  • What is the functional significance of these
    variations?

7
2) Proteome content
  • Proteome more complex than invertebrates

Protein domains (sections with identifiable
shape/function)
Domain arrangements in humans
  • Largest total number of domains is 130
  • Largest number of domain types per protein is 9
  • Mostly identical arrangement of domains
[Figure: domain arrangement of Protein X, built from domains A, B and C]
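The distinction between the total number of domains and the number of domain types can be made concrete with a small sketch. The architecture below is a made-up example, not a real protein; the letters simply stand for domain types.

```python
from collections import Counter

# Hypothetical domain architecture, read from N- to C-terminus.
# Each letter is one domain; repeated letters are repeated domain types.
architecture = ["A", "A", "B", "C", "C"]

total_domains = len(architecture)        # every domain counted once per occurrence
domain_types = len(set(architecture))    # distinct kinds of domain
copies_per_type = Counter(architecture)  # how often each type occurs

print("Total domains:", total_domains)
print("Domain types:", domain_types)
print("Copies per type:", dict(copies_per_type))
```

A protein can therefore have many domains but few domain types, which is why the two maxima quoted on the slide differ so much.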
8
Proteome more complex than invertebrates
  • No huge difference in domain number in humans
  • BUT the frequency of domain sharing is very high in
    human proteins (structural proteins and proteins
    involved in signal transduction and immune
    function)
  • However, there are only 3 cases where a combination
    of 3 domain types is shared by human and yeast proteins
  • E.g. carbamoyl-phosphate synthase (involved in the
    first 3 steps of de novo pyrimidine biosynthesis)
    has 7 domain types; this arrangement occurs once in
    human and yeast but twice in Drosophila


9
3) SNPs (single nucleotide polymorphisms)
  • Sites that result from point mutations in
    individual base pairs
  • Biallelic
  • 60,000 SNPs lie within exons and untranslated
    regions (85% of exons lie within 5 kb of a SNP)
  • May or may not affect the ORF
  • Most SNPs may be regulatory
  • More than 1.4 million SNPs identified
  • One every 1.9 kb on average
  • Densities vary over regions and chromosomes
  • E.g. the HLA region has a high SNP density,
    reflecting maintenance of diverse haplotypes over
    many millions of years

Nature (2001) 15 Feb, Vol 409 special issue, pp 821-823 and 928
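The two headline SNP figures can be cross-checked against each other with simple arithmetic. The sketch below uses only the numbers quoted on the slide (1.4 million SNPs, one per 1.9 kb on average); the variable names are illustrative.

```python
# Figures quoted on the slide.
snp_count = 1_400_000   # "more than 1.4 million SNPs"
spacing_bp = 1_900      # "one every 1.9 kb on average"

# Length of sequence implied by that count and average spacing.
implied_length_bp = snp_count * spacing_bp

# The same spacing expressed as a density per megabase.
snps_per_mb = 1_000_000 / spacing_bp

print(f"Implied sequence length: {implied_length_bp / 1e9:.2f} Gb")
print(f"Average density: {snps_per_mb:.0f} SNPs per Mb")
```

The implied length comes out at a little under 3 Gb, which is in line with the size of the human genome, so the two figures are consistent with each other.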
10
How does one distinguish sequence errors from
polymorphisms?
  • Sequence errors
    • Each piece of genome sequenced at least 10 times
      to reduce error rate (0.01%)
  • Polymorphisms
    • Sequence variation between individuals is 0.1%
    • To be defined as a polymorphism, the altered
      sequence must be present in a significant
      proportion of the population
    • Rate of polymorphisms in diploid human genome is
      about 1 in 500 bp

Nature (2001) 15 Feb, Vol 409 special issue, pp 821-823 and 928
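To see why the error rate matters, it helps to compare how many sequencing errors and how many true differences would be expected in the same stretch of DNA. This is a rough sketch that assumes the slide's rates are percentages per base (0.01% and 0.1%); the 100 kb region size and the function names are made up for illustration.

```python
def one_in_n(percent: float) -> int:
    """Convert a per-base rate in percent to the N of '1 in N bases'."""
    return round(100 / percent)

def expected_sites(percent: float, region_bp: int) -> float:
    """Expected number of affected bases in a region of region_bp bases."""
    return region_bp * percent / 100

error_rate_pct = 0.01   # sequencing error rate from the slide (assumed to be a percentage)
variation_pct = 0.1     # variation between individuals from the slide (assumed to be a percentage)
region_bp = 100_000     # hypothetical region size chosen for illustration

print("Errors: 1 in", one_in_n(error_rate_pct), "bases")
print("Variation: 1 in", one_in_n(variation_pct), "bases")
print("Expected errors in region:", round(expected_sites(error_rate_pct, region_bp)))
print("Expected differences in region:", round(expected_sites(variation_pct, region_bp)))
```

Under these assumptions true differences outnumber errors roughly tenfold, which is why a candidate variant still needs to be seen in more than one individual before it is accepted as a polymorphism.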
11
SNPs and disease
12
SNPs and risk of disease
N(291)S
13
SNPs and pharmacogenomics
14
4) Distribution of GC content
  • Genome-wide average of 41%
  • Huge regional variations exist
  • E.g. the distal 48 Mb of chromosome 1p has 47%, but
    chromosome 13 has only 36%
  • Confirms cytogenetic staining with G-bands
    (Giemsa)
    • Dark G-bands: low GC content (37%)
    • Light G-bands: high GC content (45%)

Nature (2001) 15 Feb, Vol 409 special issue, pp 876-877
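Regional variation in GC content is usually measured by sliding a window along the sequence and computing the percentage of G and C in each window. The following is a minimal sketch on a toy sequence; the function names, window size and step are illustrative choices, not the settings used in the genome paper.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the A, C, G, T bases in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored
    return 100.0 * gc / total if total else 0.0

def windowed_gc(seq: str, window: int, step: int):
    """Yield (start, GC%) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])

# Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATATATTAAT" * 3 + "GCGCGGCCGC" * 3

for start, gc in windowed_gc(toy, window=20, step=10):
    print(f"window at {start}: {gc:.0f}% GC")
```

On real chromosomes the windows would be tens of kilobases or more, and the resulting profile is what gets compared with the dark and light G-band pattern.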
15
5) CpG islands
[Figure: CpG is methylated at C to give methyl-CpG; deamination converts methyl-CpG to TpG. CpG islands show no methylation.]
  • Significance of CpG islands
    • Non-methylated CpG islands are associated with
      the 5' ends of genes
    • Usually overlap the promoter region
    • Aberrant methylation of CpG islands is linked to
      pathologies like cancer or epigenetic diseases
      like Rett syndrome

http://www.sanger.ac.uk/HGP/cgi.shtml
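Because methylated CpG tends to decay to TpG, most of the genome has fewer CpG dinucleotides than its base composition predicts, and CpG islands stand out as regions where this depletion is absent. A common way to quantify this is the observed/expected CpG ratio. The sketch below assumes the widely used Gardiner-Garden and Frommer style thresholds (at least 200 bp, GC content of at least 50%, observed/expected ratio of at least 0.6); these thresholds are not stated on the slide, and the function names are illustrative.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 50.0, min_ratio: float = 0.6) -> bool:
    """Apply the assumed length, GC% and obs/exp thresholds to one sequence."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc_percent = 100.0 * (seq.count("C") + seq.count("G")) / len(seq)
    return gc_percent >= min_gc and cpg_obs_exp(seq) >= min_ratio

# Toy sequences: an unmethylated CpG-rich stretch, and the same stretch
# after every CpG has decayed to TpG (methylation followed by deamination).
island_like = "CG" * 100
decayed = island_like.replace("CG", "TG")

print(cpg_obs_exp(island_like), looks_like_cpg_island(island_like))
print(cpg_obs_exp(decayed), looks_like_cpg_island(decayed))
```

A real scan would apply this test in a sliding window and merge overlapping hits, but the single-sequence version is enough to show why deamination erases the signal.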
16
Inheritance of CpG methylation
17
Epigenetic disease: Rett syndrome
  • Characterised by neurodevelopmental problems
    after birth
  • Caused by mutations in a gene on the X chromosome,
    MECP2 (methyl CpG-binding protein 2), whose protein
    normally binds to methylated CpG and represses
    gene expression
  • RS symptoms are associated with the failure of
    mutated MECP2 to regulate transcription of a
    specific gene, DLX5, one allele of which is
    normally imprinted. Without the MeCP2 protein,
    production of the Dlx5 protein is increased,
    which influences production of the
    neurotransmitter GABA in the brain
[Figure: DLX5]
18
CpG islands
  • Greatly under-represented in human genome
  • 28,890 in number (5 times less than expected)
  • 56% of human genes and 47% of mouse genes
    have CpG islands
  • Variable density
    • E.g. the Y chromosome has 2.9/Mb, but
      chromosomes 16, 17 and 22 have 19-22/Mb
    • Average is 10.5/Mb

Nature (2001) 15 Feb, Vol 409 special issue, pp 877-888
19
6) Recombination rates
  • 2 main observations
    • Recombination rate increases with decreasing arm
      length
    • Recombination rate is suppressed near the
      centromeres and increases towards the distal
      20-35 Mb

20
7) Repeat content
  • Age distribution
  • Comparison with other genomes
  • Variation in distribution of repeats
  • Distribution by GC content
  • Y chromosome

Nature (2001) 409 pp 881-891
21
Repeat content.
a) Age distribution
  • Most interspersed repeats predate eutherian
    radiation (confirms the slow rate of clearance of
    nonfunctional sequence from vertebrate genomes)
  • LINEs and SINEs have extremely long lives
  • 2 major peaks of transposon activity
  • No DNA transposition in the past 50 Myr
  • LTR retroposons teetering on the brink of
    extinction

22
a) Age distribution
  • Overall decline in interspersed repeat activity
    in the hominid lineage in the past 35-40 Myr
  • In contrast, the mouse genome appears younger
    and more dynamic

23
b) Comparison with other genomes
  • Higher density of transposable elements in the
    euchromatic portion of the genome
  • Higher abundance of ancient transposons
  • 60% of interspersed repeats (IR) are made up of
    LINE1 and Alu repeats
  • whereas DNA transposons represent only 6%
  • (a few human genes appear likely to have
    resulted from horizontal transfer from
    bacteria!!)

24
c) Variation in distribution of repeats
  • Some regions show either
    • High repeat density
      • e.g. a 525 kb region of chromosome Xp11 shows
        89% repeat density
    • Low repeat density
      • e.g. HOX homeobox gene cluster (<2% repeats)
      • (indicative of regulatory elements which have
        low tolerance for insertions)

25
d) Distribution by GC content
  • High GC: gene rich; high AT: gene poor
  • LINEs abundant in AT-rich regions
  • SINEs lower in AT-rich regions
  • Alu repeats in particular are retained in actively
    transcribed GC-rich regions, e.g. chromosome 19 has
    5 Alus compared to Y chromosome

26
e) The Y chromosome !
  • Unusually young genome (high tolerance to gaining
    insertions)
  • Mutation rate is 2.1X higher in male germline
  • Possibly due to cell division rates or different
    repair mechanisms

27
  • Working draft published Feb 2001
  • Finished sequence April 2003
  • Annotation of genes ongoing
  • (see International Human Genome Sequencing
    Consortium. Finishing the euchromatic sequence of
    the human genome. Nature 21 October 2004, doi
    10.1038/nature03001)

28
References
  • HMG 3 by Strachan and Read, Chapter 9 pp 265-268
  • Genetics: from genes to genomes by Hartwell et al
    (2/e), Chapter 10 pp 339-348
  • Nature (2001) 409 pp 879-891