Title: Chasing Ghosts: Detecting Repeat-mediated Deletions in the Human Genome
1Chasing Ghosts: Detecting Repeat-mediated
Deletions in the Human Genome
- Benjamin Good
- Supervisor: Dixie Mager
2Big Picture
- Improve the understanding of the basic mechanisms
of evolution.
3Hypothesis 1
- Non-allelic homologous recombination (NAHR) plays
an important role in evolution by mediating
genomic rearrangements.
4Biological mechanisms
- Homologous recombination
- A normal and ubiquitous process occurring during
meiosis
- Chromosome length is conserved
5NAHR
- Non-allelic homologous recombination
- Indels
- Exchanges
- Inversions
6Duplications and Deletions
[Figure: chromosome segment diagrams (segments A-F and a-g) showing the Deletion product and the Duplication product]
7Hypothesis 2
- Transposable elements increase the likelihood of
rearrangements by making NAHR much more likely.
8Ideal Approach
- Catalogue all examples of NAHR within the genome.
- Quantify the relationship between transposable
elements and these putative evolutionary events.
9Can TSDs be used?
[Figure: sequence segments A-F shown before insertion and after insertion]
10No
- Original hope was that the absence of target site
duplications in the flanking sequences of the
repeats could be used to infer recombination.
- Problems are that
- RepeatMasker does not reliably label the edges
correctly
- The TSDs may accumulate mutations
- Repeats often insert on top of one another
forming complex mosaics that are extremely
difficult to characterize computationally.
- In any case, genome comparisons offer more
conclusive evidence
11So, focus on what is possible
- Deletions in the human genome are visible through
comparisons with other genomes.
- Identify putative deletions based on gaps in
pair-wise alignments
- Check the borders of the gaps for repetitive
elements and make inferences about the causes of
the gaps.
12Comparative Approach
[Figure: baboon sequence (segments S, T, U and A-F) compared with the human sequence (segments S, T, U and A-C only); repeat marked]
- If repeats overlap the edges of a gap, it is strong
evidence for a NAHR-mediated deletion in that
genome
- Differing TSDs are additional evidence for NAHR
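The TSD comparison mentioned above can be sketched as a simple flank check. This is a minimal illustration, not the project's code: the function names, the flank length `k`, and the mismatch threshold are all hypothetical, and the check inherits the problems already noted for TSDs (imprecise repeat edges, accumulated mutations).

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def flanks_look_like_tsd(seq: str, rep_start: int, rep_end: int,
                         k: int = 10, max_mismatch: int = 2) -> bool:
    """Compare the k bases left of a repeat with the k bases right of it.

    seq is the genomic sequence and [rep_start, rep_end) the repeat
    (0-based, half-open). k and max_mismatch are hypothetical thresholds.
    Returns True when the two flanks are near-identical (an intact TSD).
    """
    if rep_start < k or rep_end + k > len(seq):
        return False  # not enough flanking sequence to decide
    left = seq[rep_start - k:rep_start]
    right = seq[rep_end:rep_end + k]
    return hamming(left, right) <= max_mismatch


# Toy sequences: an element flanked by matching vs. differing 10-mers.
element = "GGCCGGGCGC" * 3
intact = "TTTT" + "AAGCTTGACC" + element + "AAGCTTGACC" + "CCCC"
differing = "TTTT" + "AAGCTTGACC" + element + "TCCAGGATTA" + "CCCC"
print(flanks_look_like_tsd(intact, 14, 44))     # matching flanks
print(flanks_look_like_tsd(differing, 14, 44))  # differing flanks
```

A repeat whose flanks fail this check is only a weak hint of recombination on its own; the alignment gap remains the primary evidence.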
132003 Primate Comparisons
- Bailey 2003 mentions the discovery of 9 Alu-Alu,
5 L1-L1, and 1 L2-L2 mediated deletions. These
deletions were identified using an alignment
between baboon and a 4.8 Mb region of human
chromosome 7.
- An investigation of the Alus revealed that only
one of them was a deletion within the human.
- This element was the inspiration for the rest of
the project.
14Human Baboon Alignment
- Generated using BL2SEQ at NCBI
- Human chimeric Alu
- Baboon sequence from Eichler lab.
- E -47
- E -23
Alu repStart 1 repLeft -97
AluSp/q repStart 251 repLeft -13
15Chimeric Alus
- A search for Alu-chimeras corresponding to this
structure yielded 91 examples in human chromosome
7
- But how to prove that they are associated with
deletions?
- Back to genome comparisons.
16Site of a deletion in the Human Genome
Intron
Mouse Alignment
17Mouse View
Gap in mouse human alignment
18Mouse Baboon
Significant alignment between the mouse and the
baboon that has been lost in the human sequence.
19So What Now
- The deletion discovered by Bailey could have been
discovered using only the mouse.
- Build a system that can scan the entire human
genome for deletions based on the human-mouse
alignment from UCSC
20Protocol 1 Accumulate local data
- 1) Download the HumanNet table containing
coordinates of gaps in the mouse-human alignment.
This is the mouse-human alignment from the
perspective of the mouse.
- 2) Download the RepeatMasker annotation for the human
genome.
21Protocol 2 Main Script
- Take all gaps that are at least 500 bp larger than
their corresponding region in human.
- Find all human repeats whose left and right edges
are close to the edges of the human coordinates
for the gap.
- Find chimeric elements that span the gaps
- Characterize the contents of the mouse sequence
corresponding to the gap.
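The first two steps of the main script can be sketched as follows. This is an illustrative outline under stated assumptions, not the original script: the `Gap` and `Repeat` records, their field names, and the edge tolerance `tol` are hypothetical, and only the 500 bp size difference comes from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gap:
    """One alignment gap (hypothetical record layout)."""
    chrom: str
    human_start: int   # human coordinates corresponding to the gap
    human_end: int
    mouse_size: int    # length of the mouse sequence in the gap


@dataclass
class Repeat:
    """One annotated human repeat (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    name: str


def candidate_gaps(gaps: List[Gap], min_excess: int = 500) -> List[Gap]:
    """Keep gaps whose mouse side is at least min_excess bp larger than human."""
    return [g for g in gaps
            if g.mouse_size - (g.human_end - g.human_start) >= min_excess]


def repeats_at_gap_edges(gap: Gap, repeats: List[Repeat],
                         tol: int = 20) -> List[Repeat]:
    """Human repeats whose left and right edges lie within tol bp of the gap edges."""
    return [r for r in repeats
            if r.chrom == gap.chrom
            and abs(r.start - gap.human_start) <= tol
            and abs(r.end - gap.human_end) <= tol]


def scan(gaps: List[Gap], repeats: List[Repeat],
         min_excess: int = 500, tol: int = 20) -> List[Tuple[Gap, List[Repeat]]]:
    """Pair each candidate gap with the repeats bordering it, if any."""
    results = []
    for gap in candidate_gaps(gaps, min_excess):
        hits = repeats_at_gap_edges(gap, repeats, tol)
        if hits:
            results.append((gap, hits))
    return results


# Toy input: a 3000 bp mouse region opposite a 287 bp human Alu, plus a
# gap that is too small to qualify. Coordinates are made up.
gaps = [Gap("chr7", 1000, 1287, 3000), Gap("chr7", 5000, 5400, 600)]
repeats = [Repeat("chr7", 995, 1290, "AluJb"), Repeat("chr7", 5000, 5400, "L1")]
for gap, hits in scan(gaps, repeats):
    print(gap.chrom, gap.human_start, gap.human_end, [r.name for r in hits])
```

The linear scan over all repeats is chosen here for clarity; a genome-wide run would need repeats sorted or indexed by position.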
22Searching For
Protein Q8C5C0 Signal transduction
Mouse 3000 bp
Human AluJb 287 bp
23Results
- Scanned entire genome in 8.5 hours
- Checked 271,690 gaps
- Identified 456 potentially repeat-mediated
deletions.
- Of these, 43 were chimeric and 297 were derived
from human Alu repeats.
24Chimeras Only 1 good
- Only one of the chimeras obeys the repeat
structure discussed previously (front repeat
missing the end and back repeat missing the
start). This is the Bailey deletion.
- The others seem to be mostly complete repeats in
close proximity to each other.
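The repeat structure test (front repeat missing its end, back repeat missing its start) can be sketched using the repStart and repLeft values from the RepeatMasker annotation. This is a hedged illustration: the `RepeatHit` record and the thresholds `min_missing` and `max_slop` are hypothetical, and the code assumes plus-strand hits where repLeft is zero or a negative count of consensus bases remaining after the match, as in the Alu values shown earlier.

```python
from dataclasses import dataclass


@dataclass
class RepeatHit:
    """Consensus coordinates of one repeat annotation (hypothetical record)."""
    name: str
    rep_start: int  # first consensus position covered by the match (1-based)
    rep_left: int   # consensus bases left after the match, stored as <= 0


def missing_from_start(hit: RepeatHit) -> int:
    """Consensus bases absent before the match begins."""
    return hit.rep_start - 1


def missing_from_end(hit: RepeatHit) -> int:
    """Consensus bases absent after the match ends."""
    return abs(hit.rep_left)


def is_chimera_pair(front: RepeatHit, back: RepeatHit,
                    min_missing: int = 50, max_slop: int = 20) -> bool:
    """True if front lacks its end and back lacks its start.

    min_missing and max_slop are hypothetical thresholds: the truncated
    side must lack at least min_missing bases, and the intact side may
    lack at most max_slop bases.
    """
    front_ok = (missing_from_start(front) <= max_slop
                and missing_from_end(front) >= min_missing)
    back_ok = (missing_from_start(back) >= min_missing
               and missing_from_end(back) <= max_slop)
    return front_ok and back_ok


# Values from the human chimeric Alu in the human-baboon alignment.
front = RepeatHit("Alu", rep_start=1, rep_left=-97)
back = RepeatHit("AluSp/q", rep_start=251, rep_left=-13)
print(is_chimera_pair(front, back))

# Two nearly complete repeats side by side (made-up values).
print(is_chimera_pair(RepeatHit("AluY", 1, 0), RepeatHit("AluSx", 1, -2)))
```

A check of this kind separates the single well-formed chimera from pairs of mostly complete repeats that merely sit close together.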
25Repeat Content SINE enrichment
Classes of repeats associated with mouse gaps
Classes of repeats in the human genome
26Alu Families as expected
Types of Alus associated with mouse gaps
Types of Alus in the human genome
27Gap Content
- 18 gaps contained exons from known genes. 9 of
these correspond to mouse L1s
- 78 gaps contained ESTs
28The Human Net (Kent 2003)
- 400 level 1, 49 level 2, 4 level 3
29Gaps and higher net levels
30Summary
- In this project, we have designed and implemented
a method for cataloguing potential examples of
repeat-mediated deletions within the human genome
through comparative genomics. Many of the
examples identified are from transcribed
sequences and hence could have significant
phenotypic effects.
31Future Directions
- Other primate genomes.
- 3-way comparisons for validation.
- Deletions in regulatory regions?
32Thanks to the Mager Lab!
Somewhere in the Philippines