Title: BIO341 Gene Discovery Section 3 Genome Organisation
1BIO341 Gene Discovery, Section 3 - Genome Organisation
- Jasper Rees
- Department of Biochemistry, UWC
- www.biotechnology.uwc.ac.za/teaching/BIO341
2Eukaryote Genome organisation
- Genome sizes
- Cot analysis and genome complexity
- Repetitive and unique sequences
- mRNA expression
- How much coding sequence in a genome
- Rot analysis of mRNA expression
- Tissue specific genes
3Genome sizes in different phyla
4Cot analysis of genomic DNA
- Anneal DNA and measure double stranded component
- Annealing rate depends on initial concentration (Co) and time (t)
- Speed of annealing is dependent on sequence complexity
- Sequence complexity is the length of unique sequence
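The dependence on Co and t is commonly modelled as ideal second-order kinetics, in which the fraction of DNA still single stranded is 1 / (1 + k x Cot). The sketch below assumes that idealised model; the rate constants are invented for illustration and are not measured values.

```python
def fraction_single_stranded(cot, k):
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which half of the DNA has reannealed (C/C0 = 0.5)."""
    return 1.0 / k


# Hypothetical rate constants: a low-complexity genome reanneals
# faster (larger k) than a high-complexity genome.
k_simple, k_complex = 10.0, 0.01

for cot in (0.01, 1.0, 100.0):
    print(cot,
          round(fraction_single_stranded(cot, k_simple), 3),
          round(fraction_single_stranded(cot, k_complex), 3))
```

In this model Cot1/2 is simply 1/k, so a more complex genome (smaller k) has a larger Cot1/2.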
5Complexity of Genomes
- Chemical complexity: amount of DNA determined by chemical analysis
- Kinetic complexity: amount of DNA determined by Cot analysis
- Generally values agree well, except for polyploid genomes
6Several classes of sequence complexity in
eukaryote genomes
- Cot analysis shows three classes of sequence complexity
- Highly repetitive, mostly from centromeres
- Middle repetitive, mostly from longer repeat sequences
- Unique sequences
7Repetitive and unique sequence in eukaryote
genomes
- As genomes get larger, proportion of repetitive sequence increases
- Amount of unique sequence is still high even in large genomes
- Distribution of the three classes of sequence is not revealed by Cot analysis
8mRNA as a product of genomic DNA
- Using mRNA as a probe for reassociation kinetics
- Shows that majority of mRNA is from unique DNA sequences
- Not all unique DNA sequences are transcribed
9Estimating the amount of coding sequence in a
genome
- Saturation of reassociation experiment shows the total proportion of DNA transcribed is small
- The result will vary depending on the tissue source of mRNA
- This is with cytoplasmic (spliced) mRNA
10Measurement of the sequence complexity of mRNA
expressed
- Rot analysis shows the abundance of mRNA
- Chick oviduct expresses large amount of ovalbumin (50% of mRNA)
- About 10 other abundant genes (15% of mRNA, including lysozyme)
- And 35% of the mRNA represents all other transcribed genes in the tissue
11Overall expression of mRNA in different tissues
12Organisation of genes in genomes
- Overview of exon and intron size and numbers
- Differences between small genomes and large genomes
- Variations in gene strategy
- Implications for genome organisation
13Overview of splicing of introns
14Identification of introns: gene and mRNA are not contiguous
15Globin gene structure: common structure to whole family of genes
- Position of introns in coding sequences is constant.
- Length of introns varies
- All globin genes organised the same way
- Implies ancient evolutionary relationship
16Mammalian DHFR genes have common organisation
- Dihydrofolate reductase required for purine nucleotide biosynthesis
- Exon organisation is common to all mammals (to all vertebrates?)
- Introns vary greatly in length and sequence between species
- Typical of other genes
17Intron and exon length distribution
18Distribution of exon numbers
- Saccharomyces: small genome, little repetitive sequence, very few introns, highly compact
- Flies and mammals: larger genomes, more repetitive sequence, more introns. Genes dispersed
19Overall gene sizes when there are more introns
- Gene sizes follow a normal distribution in yeast
- But skewed distribution in flies, mammals (plants etc)
- Increase in gene size related to increase in presence of introns and number of introns
20mRNA and gene sizes
- Yeasts: few introns, mRNA mostly same size as genes
- Flies, mammals, plants etc: mRNA much smaller than genes.
- hnRNA has same size as genes (from which it is transcribed)
- hnRNA spliced into mRNA
21Alternative functions in genes
- Promoters
- Termination
- Splicing
22Human Chromosome 16 - DWNN gene shows complex
organisation
- DWNN gene is 35 000 bases long
23Exons are conserved, introns vary
- Dotplots show similarity graphically
- Can show for DNA or protein sequences
- For two mouse alpha globin genes, can show exons are conserved, but introns do not show strong sequence relationship.
- Alpha globin genes recently duplicated
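A dotplot is easy to sketch in code: compare every short word in one sequence with every short word in the other, and mark the matches. The version below is a minimal illustration using exact word matches; the window size and the toy sequences are assumptions, not data from the mouse globin genes.

```python
def dotplot(seq_a, seq_b, window=3):
    """Return a grid where cell [i][j] is True when the window-length
    words starting at seq_a[i] and seq_b[j] are identical."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [[seq_a[i:i + window] == seq_b[j:j + window]
             for j in range(cols)]
            for i in range(rows)]


def render(grid):
    """Draw the grid as text: '*' for a match, '.' for no match."""
    return "\n".join("".join("*" if hit else "." for hit in row)
                     for row in grid)


# Invented toy sequences sharing the block ACGTAC, with unrelated flanks.
a = "ACGTACGGG"
b = "TTTACGTAC"
print(render(dotplot(a, b)))
```

Conserved regions (such as exons) appear as diagonal runs of matches, while diverged regions (such as introns) leave only scattered dots.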
24Exons show similarity between species
- Zoo Blots used to detect related DNA sequences in different species
- Detect exons by hybridisation
- Based on conservation of coding sequence
- More highly conserved genes imply more limitations on protein sequence variation
- Basis for computational identification of genes between species
25Generation of new genes
- Diversification of species occurs by addition of new genes
- New genes occur from mutation and selection of existing genes
- Or gene duplication followed by mutation and selection
- Or recombination between two genes (gene shuffling)
- Or horizontal transfer of genes from other species
- New genes are very, very rarely (almost never) de novo events
26Gene duplications
- Occur as the result of duplicating a region of the genome by recombination mechanisms
- From small duplication, to very large
- One gene duplicated or many
- Gene duplication generally not bad for organism
- Duplication of genes allows for divergence
27cDNA to gene duplication
- Occurs by reverse transcription of mRNA into DNA by retroviral enzymes, in germ line
- Integration of cDNA into germline gives spliced copy in DNA
- Generally not functional copy of gene (no promoter)
- Provides sequences that could be fused to other genes
28Pseudogenes
- Non-functioning copies of genes
- Promoter, splicing, termination, translational mutations
- Various types, can be one or many mutations.
- Some transcribed, some not.
- Generated by recombination or cDNA insertion
- Can be close or distant in genome
29Gene families
- Group of genes with related sequences and functions
- May be small or large family
- May be closely related (sequence/function)
- Or very distant
- Generated by duplication by recombination
- Level of relationship of products depends on extent of divergence
30Protein Families
- The result of gene family divergence
- Must have related sequences, and thus related functions.
- May be related functions
- Thus may all be kinases, but all have different substrates
- Examples: kinases, proteases, globins, DNA binding proteins, ATPases, NADP reductases
31Protein Superfamilies
- Large family relationship
- May only be domains of the protein that are related
- Functions may be very different
- Evolutionarily very divergent
- Often the result of exon-shuffling
- Examples: Immunoglobulin superfamily, protein kinase superfamilies, blood clotting enzymes
32Homologs, orthologs, paralogs
33Exons and domains
- Can often relate exons to domains in proteins
- Domains are independently folding structures in proteins
- When exon boundaries coincide with the boundaries of the domains, exons can be plugged together to create modular proteins
- Need the same reading frame at each exon boundary
- Many good examples of this, but not for every gene or domain.
34Exon shuffling
- If exons have the same reading frame
- because splicing is independent of coding sequence
- Can splice together whatever set of exons is transcribed.
- If splicing assembles fused open reading frame
- Then can translate protein
- If DNA recombination assembles two exons together and they can translate into a protein that can fold correctly into its domains, then this protein is highly likely to be functional
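The reading-frame requirement can be checked with simple arithmetic: a shuffled-in exon leaves the downstream frame untouched only if its length is a multiple of three. The sketch below is a simplified illustration that treats exons as lengths in bases; the function names and exon lengths are hypothetical.

```python
def downstream_frame(exon_lengths):
    """Reading-frame offset (0, 1 or 2) at the end of the joined exons."""
    return sum(exon_lengths) % 3


def insertion_preserves_frame(exon_lengths, position, new_exon_length):
    """True if inserting a new exon at `position` leaves the reading
    frame of everything downstream unchanged."""
    shuffled = (exon_lengths[:position]
                + [new_exon_length]
                + exon_lengths[position:])
    return downstream_frame(shuffled) == downstream_frame(exon_lengths)


# Hypothetical gene with three exons (lengths in bases).
gene = [90, 120, 60]
print(insertion_preserves_frame(gene, 1, 150))  # length divisible by 3
print(insertion_preserves_frame(gene, 1, 100))  # length not divisible by 3
```

This ignores everything except length; whether the resulting protein folds into working domains is a separate question.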
35Uniqueness of Exon shuffling
- Exons only in eukaryotes
- Exon shuffling only in eukaryotes
- One mechanism for acceleration of gene diversification in eukaryotes
- Provides engine to power evolutionary change
- Is a selective advantage for presence of introns/exons
36Globin gene family
- In humans have α and β globin gene clusters
- In adults form haemoglobin, α2β2 hetero-tetramers which bind oxygen in the blood
- Foetal and embryonic forms bind oxygen more tightly to transfer oxygen from maternal blood
- Gene expression developmentally regulated
37Developmental expression of globins
- Embryonic: ζ2ε2, ζ2γ2 and α2ε2
- Foetal: α2γ2
- Adult: α2δ2 and α2β2
- Sequential replacement of gene expression
- Genes activated along α and β clusters (see figure 23.2)
38Beta-globin genes in other vertebrates
- Find varying numbers of genes
- However, have embryonic, foetal and adult forms
- Find pseudogenes in many clusters
- Can trace evolutionary history of genes by
analysing sequence relationships
39Evolution of globin gene families
- Evolution of both α and β globin gene families is by duplication
- Result of unequal crossing over events
- Fixation of additional genes in populations allows divergence of function
- Loss of function results in pseudogenes
- Divergence of function resulted in development of three different types (embryonic, foetal and adult)
- See figure 23.7 for examples of divergence
40Why increased globin gene diversity?
- To generate globin proteins with different oxygen binding affinities
- Selective advantage to get more oxygen to developing embryo/foetus
- As development becomes more complex and embryo develops in maternal body, then need more complex oxygen transport, and thermodynamics
41Globin genetic diseases: mutations
- Find many different mutations in α and β globins
- Class of diseases termed thalassaemias
- Point mutations that affect oxygen binding, and regulation of oxygen binding (Hill effect etc)
- Also promoter, splicing and poly A mutations
- Must be able to generate enough haemoglobin to transport oxygen at all times though
42Globin genetic diseases: deletions
- Also have major deletion events in α and β globin clusters
- Result from deletions occurring in unequal crossing over events, between regions of homology in globin clusters
- Deletion of different combinations of genes gives different phenotypes
- See figures 23.5 and 23.6
43Why so many globin mutations
- In certain human populations globin mutations are extremely common
- Reduced haemoglobin function causes red blood cells to be fragile
- Fragile RBCs break open more easily when infected with malaria parasite (Plasmodium)
- So people with thalassaemia have increased resistance to malaria, when heterozygous for globin mutations
44Heterozygous Advantage
- Termed Heterozygous Advantage, so selects for people with globin mutations in presence of malaria
- Results in very high frequency of thalassaemia in West Africa, Mediterranean area, South East Asia.
- Is a directly observable effect of selection pressure on human populations resulting in evolutionary divergence on historical timescales
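Heterozygous advantage can be illustrated with the standard one-locus selection model, in which the mutant allele settles at an intermediate frequency instead of being lost or fixed. The sketch below assumes that textbook model; the selection coefficients s and t are invented for illustration and are not estimates for any real population.

```python
def next_frequency(q, s, t):
    """Mutant allele frequency after one generation of selection.

    Assumed relative fitnesses: normal homozygote 1 - s,
    heterozygote 1, mutant homozygote 1 - t.
    """
    p = 1.0 - q
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * q + q * q * (1 - t)) / mean_fitness


def equilibrium_frequency(s, t):
    """Stable mutant allele frequency under heterozygote advantage."""
    return s / (s + t)


# Hypothetical coefficients: malaria costs normal homozygotes 10%,
# disease costs mutant homozygotes 80%.
s, t = 0.1, 0.8
q = 0.01
for _ in range(500):
    q = next_frequency(q, s, t)
print(round(q, 4), round(equilibrium_frequency(s, t), 4))
```

Where there is no malaria (s = 0) the equilibrium frequency is zero, which fits the geographical distribution noted above.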
45Recombination events
- Crossing over - homologous recombination occurring normally during meiosis
- Unequal crossing over resulting from mismatching regions of homology, that are incorrectly aligned during meiosis
- Results in generation of additional sequence in one chromosome, and loss of sequence in the other.
46Homologous Recombination
- Two genes in parental chromosomes
- Unequal crossing over
- Results in generation of 1 and 3 genes
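The 1-and-3 outcome can be sketched by treating each chromosome as a list of gene labels and exchanging the arms at two different break points. This is a toy model only; the function name and gene labels are hypothetical.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the right-hand arms of two chromosomes.

    break_a and break_b are the numbers of genes kept from the left
    arm of each chromosome. Equal values model normal crossing over;
    different values model a misaligned (unequal) event.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


parent_a = ["gene1", "gene2"]
parent_b = ["gene1", "gene2"]

# Misalignment: gene2 of parent_a pairs with gene1 of parent_b.
long_copy, short_copy = unequal_crossover(parent_a, parent_b, 2, 1)
print(long_copy)   # three genes
print(short_copy)  # one gene
```

The total number of genes is conserved across the two products; one chromosome gains what the other loses.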
47Gene duplications and divergence
- Generation of 2 genes from 1 gene occurs when sequences outside gene allow unequal crossing over
- Once two copies of a gene exist on a chromosome selection can occur
- Multiple copies must be compatible with life, true for many genes
- Then copies can diverge by accumulating mutations and developing new functions
48Divergence and selective advantage
- With divergence of function, have the possibility that new function will provide advantage to organism
- Most mutations will destroy function of gene
- Small proportion will improve it
- These can be selected for in populations
- Many only give real advantage in unusual environments
- May gain frequency from population bottleneck
49Evolutionary or molecular clock
- Rate at which molecular changes are fixed in a population is measurable
- Can measure length of time since divergence of species or genes using molecular clock
- Can compare data with fossil record and isotopic dating systems to calibrate clock
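The clock calculation can be sketched as follows, assuming the simplest model: a constant substitution rate, with differences accumulating independently along both lineages (hence the factor of 2). All numbers are invented for illustration.

```python
def calibrate_rate(differences_per_site, known_divergence_years):
    """Substitutions per site per year per lineage, from a pair of
    species whose divergence date is known (e.g. from fossils)."""
    return differences_per_site / (2.0 * known_divergence_years)


def divergence_time(differences_per_site, rate_per_site_per_year):
    """Years since two sequences shared a common ancestor."""
    return differences_per_site / (2.0 * rate_per_site_per_year)


# Hypothetical calibration: 10% sequence difference between two
# species known to have diverged 50 million years ago.
rate = calibrate_rate(0.10, 50e6)

# Apply the calibrated clock to a pair differing at 20% of sites.
print(divergence_time(0.20, rate))
```

The sketch takes observed differences at face value. Real analyses correct for multiple substitutions at the same site, which matters most for older divergences.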
50Replacement Rates
- Replacement rates for non-coding positions faster than for coding positions
- Effect of selection on protein function
- Use non-coding for recent divergence, coding sequence for older divergence
51Evolutionary divergence of globins
- Can measure evolution of globin gene clusters, in terms of molecular clocks, and which species have which globin genes
- Increased complexity of globin gene clusters in higher vertebrates
- Present in lower vertebrates, invertebrates, and distantly related protein found in plants
- See Figures 23.7 and 23.9
52Gene correction
- Poorly understood mechanism by which duplicated genes are corrected against each other
- Can result in the maintenance of many identical copies of a gene over many generations
- Correction event may be rare. May replace many divergent copies with one specific type.
- Loss of gene correction will allow evolution of duplicated genes
53Divergence or gene death
- When genes diverge can get loss of function: creation of a pseudogene.
- Or generation of mutated functional version that can be selected for or against
- Selection over long period can result in the emergence of a novel protein with changed function
- Duplication is essential for this divergence of gene function
54Gene duplication central to evolution
- Require increased genetic complement to allow for selection of novel biological functions
- Gene duplication is the simplest and most common way to create this
- When selective pressure on one copy of the gene is lost, then can accumulate mutations, and select for new functions
- Overall result is diversification of function and creation of gene families
55Repetitive DNA Sequences
- Classes of repetitive sequences found in all higher eukaryotes
- Simple sequences found in satellite regions and centromeres
- Middle repetitive sequences in several classes
- Many repetitive sequences are mobile genetic elements (eg transposons)
56Repetitive sequences expand in higher eukaryote
genomes
- As genomes get larger, proportion of repetitive sequence increases
- Repetitive sequences fall into various classes by sequence relationship and mode of amplification
- Amount of unique sequence is still high even in large genomes
57Simple sequence Satellites
- Defined by unique density caused by unusual sequence composition
- Made up of very short sequence repeats
58Satellites localise to centromeres
- Sequences found in satellite regions
- And centromeres
- Localised by in situ hybridisation
- Define centromeric structure and function
59Mammalian satellites
- Mouse satellites made up of 238 bp repeat
- This is made up of internally repeated sequences
- Result from repeated duplication of 9 bp unit
60Mini and micro satellites
- Generated from repeated sequences
- Repeat units from 2 to >50 bp
- May result in many alleles at a locus
- May be coding or non-coding
- Valuable for genetic markers and genotyping experiments
- Used in DNA fingerprinting (eg in forensics)
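Short tandem repeats of this kind can be located with a regular expression back-reference. The sketch below is a minimal illustration for perfect repeats only; the size limits and the test sequence are assumptions, and real genotyping tools also handle imperfect repeats.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each perfect tandem repeat.

    The pattern captures a short unit, then requires that same unit
    to follow at least (min_repeats - 1) more times.
    """
    pattern = re.compile(r"([ACGT]{%d,%d}?)\1{%d,}"
                         % (min_unit, max_unit, min_repeats - 1))
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(seq)]


# Invented sequence: six copies of CAG flanked by TT on each side.
allele = "TT" + "CAG" * 6 + "TT"
print(find_microsatellites(allele))
```

Different alleles at a locus differ in the copy number reported in the third field, which is what makes these loci useful as markers.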
61Microsatellite allele analysis
62Microsatellites and disease
- Microsatellites with 3 bp repeats common in human genetic disease, where amplification generates expanded protein sequences
- Causes expansion of regions of poly-glutamine, which cause protein precipitation and cell death
- Expansion observed in families
- Disease gets worse with additional generations as the microsatellite expands
- Examples: Huntington's disease, myotonic dystrophy
63Middle Repeat Sequences
- Can be caused by expansion of classes of larger sequence elements
- 300 - 20 000 bp
- Scattered through the genome
- Not caused by unequal crossing over
- Generally transcribed into RNA and converted to DNA and integrated into the genome
- Some elements can insert and excise, and are therefore considered to be mobile elements
64Alu sequences
- Largest class of middle repetitive sequences in humans
- About 300 bp long
- Have polyA tract at one end
- Are related to 7SL RNA sequence
- Contain internal RNA Pol III promoter
- Thus when reverse transcribed and reintegrated create a new Pol III promoter
- Can cause mutations by integration into genes
65LINEs
- Genetic structure similar to retrovirus
- But lack the long terminal repeats (LTRs) found at each end of a retrovirus
- Internal sequences code for reverse transcriptase and endonuclease proteins
- Transcribed by RNA Pol II
- Reverse transcribed and integrated into genome
- Cannot be excised
- Do not appear to be packaged as a virus
66Transposons
- Similar to LINEs and retroviruses, except that can insert and excise from genome
- Some excisions leave small number of bases inserted into genome
- Others excise cleanly
- These are true mobile elements
- Can be the cause of frequently mutating and reverting genotypes
- Responsible for classic maize colour mutations
67Spontaneous mutations in pears
68Distribution of repetitive sequences
- Distribution of middle repetitive sequences in animal and plant genomes is completely random
- Occur in introns, inter-genic regions, highly transcribed and transcriptionally inactive regions
- May be transcriptionally inactive or active
- May be complete copies or partial copies
69Function of repetitive sequences
- junk DNA - implies has no function?
- Mobile elements generate:
- Genome rearrangements
- Duplications
- Mutations and gene inactivation
- Gene activation
- Generation of additional DNA in genome
70BLAST and repetitive human sequences
- Repetitive sequences in your sequence will result in large numbers of matches to genome data
- Solution: check the "filter human repeats" check box in the search set up.
- This will remove repeats based on a database of repeat sequences.
- With the switch on, you can detect the repeats and identify them
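The idea behind repeat filtering can be sketched as masking: any part of the query that matches a known repeat is replaced with N so that it no longer produces matches. The toy version below uses exact string matching against a made-up repeat list; it is not how the BLAST filter itself is implemented, and real repeat masking relies on approximate matching against a curated repeat database.

```python
def mask_repeats(query, repeat_library):
    """Replace every exact occurrence of a library repeat with N's."""
    masked = list(query)
    for repeat in repeat_library:
        start = query.find(repeat)
        while start != -1:
            masked[start:start + len(repeat)] = "N" * len(repeat)
            start = query.find(repeat, start + 1)
    return "".join(masked)


# Hypothetical repeat library and query sequence.
library = ["GGCC"]
print(mask_repeats("AAGGCCTTAA", library))
```

The positions of the N runs also show where the repeats were, which is how masking doubles as a way to detect and identify them.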