Title: Human Genome Structural Variation
1. Human Genome Structural Variation
Evan Eichler, Howard Hughes Medical Institute, University of Washington
June 3rd, 2006, HGM, Helsinki
2. Structural Variation and Disease
- Insertions, Deletions, Inversions, Duplications, Translocations
- Three different models of human genetic disease
- Rare Mendelian Disease: rare structural variants that segregate and cause disease (e.g., Parkinson's, Alzheimer's disease)
- Recurrent Genomic Disorders: recurrent microdeletion or microduplication syndromes mediated by duplication (most de novo) (e.g., PWS, CMT1A)
- Common Disease Susceptibility: common copy-number polymorphisms that predispose to disease (LPA: coronary heart disease; FCGR3: lupus; CCL3L1: AIDS susceptibility)
3. Genome-Wide Screens of Normal Variation
Large (100 kb)
Small (7 kb)
- 1503 variants, 115 Mb, 800 genes structurally variant
- Non-randomly distributed
- Environmental interaction/segmental duplications
- Normal individuals differ by thousands of events
Eichler (2006) Nat. Genet.
http://humanparalogy.gs.washington.edu/structuralvariation
4. Structural Variation Disease Hypothesis
- Structural variants are common, more likely to be under selective constraint, more recurrent, and associated with environmental interaction genes.
- We hypothesize that they will have a significant impact on rare and common disease in the population that cannot be tracked by LD association mapping.
- Approach: Genomic Architecture Approach. Target dynamic regions and then follow up with disease.
5. Objectives
- Discovery of Novel Genomic Disorders
- Sequence-Based Resolution of Structural Variation
6. Model of Genomic Disease/Variation
[Figure: two copies of the labels A, B, C, TEL; Aberrant Recombination; GAMETES; Human Disease]
Triplosensitive, Haploinsufficient and Imprinted Genes
- Hypothesis: Mechanism underlying Uncharacterized Mental Retardation?
7. Duplication Map of Human Genome
- 130 candidate regions (298 Mb)
- 23 associated with genetic disease
- Target patients: array CGH
Bailey et al. (2002), Science 297:1003-1007
8. Array Comparative Genomic Hybridization
12 mm
- High-throughput detection of large-scale variation (>50 kb)
- CNV (Copy-Number Variation)
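Later slides plot array CGH results as a log2 test/reference ratio. As a minimal sketch of how to read those plots, the function below gives the idealized ratio for a given copy number, assuming a two-copy reference and a perfectly linear signal. The function name and the diploid default are illustrative assumptions, and real arrays typically show compressed ratios.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) for a locus (assumes linear signal)."""
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a log ratio")
    return math.log2(test_copies / reference_copies)


# Heterozygous deletion: 1 copy in the test sample vs 2 in the reference.
deletion = expected_log2_ratio(1)      # -1.0
# No change: 2 copies vs 2 copies.
neutral = expected_log2_ratio(2)       # 0.0
# Single-copy gain: 3 copies vs 2 copies (about 0.58).
duplication = expected_log2_ratio(3)
```

Under these assumptions a deletion moves the ratio further from zero than a single-copy duplication does, so gains are generally the harder class to detect.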
9. Duplication Microarray Experimental Design
BACs
TEL
dist: >50 kb, <5 Mb; prop: 95% identity, 10 kb
- 130 regions of the human genome
- 2178 BACs, or on average 10-12 BACs per region
- Perform array CGH: reciprocal dye-swap experiments
- Study Population
- Normal: 269 samples + 75 individual samples = 344 total (284 unrelated + 60 trios)
- Idiopathic Mental Retardation: 291 probands (Flint cohort, negative for FraX, normal karyotype)
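The region-selection criteria on this slide (duplications separated by more than 50 kb and less than 5 Mb, with 95% identity and 10 kb length) can be sketched as a simple filter. This is an illustration only: the `DupPair` record, its field names, and the reading of the identity and length figures as minimum thresholds are assumptions, not details taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class DupPair:
    """Hypothetical record for a pair of duplicated segments on one chromosome."""
    chrom: str
    end_a: int        # end coordinate of the first copy
    start_b: int      # start coordinate of the second copy
    length_bp: int    # aligned length of the duplication
    identity: float   # sequence identity as a fraction (0.95 means 95%)


def is_candidate_region(pair: DupPair,
                        min_dist: int = 50_000,
                        max_dist: int = 5_000_000,
                        min_identity: float = 0.95,
                        min_length: int = 10_000) -> bool:
    """True if the sequence between the two copies meets the slide's criteria.

    Assumption: identity and length are treated as minimum thresholds.
    """
    gap = pair.start_b - pair.end_a
    return (min_dist < gap < max_dist
            and pair.identity >= min_identity
            and pair.length_bp >= min_length)


# Example: two 20 kb copies at 98% identity separated by 500 kb.
example = DupPair("chr17", 1_000_000, 1_500_000, 20_000, 0.98)
keep = is_candidate_region(example)
```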
10. Study Populations
- Normal unaffected (diversity panel and HapMap samples). Target: 800 samples. Completed: 75 + 269 samples = 344 total (284 unrelated + 60 trios).
- Idiopathic Mental Retardation
11. Large-Scale CNPs among HapMap Samples
108 novel CNP BACs
149 CNP BACs previously identified
Altered CNV Frequency
- Total of 384 CNV BACs among HapMap DNAs (n=263 HapMap samples passed criteria)
- A total of 257 CNP BACs were identified (≥2 individuals)
- 147 CNPs observed previously in our study of a sample of 55 or published by others
- 3.1% (9/257) were population-specific
- 127 singleton BAC observations
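The split between CNP BACs and singleton observations can be sketched as a count of distinct individuals per BAC. The cutoff is garbled in the transcript; this sketch assumes "variant in two or more individuals" for a CNP, which is consistent with 257 CNP BACs + 127 singletons = 384 CNV BACs. The input layout (one `(bac_id, sample_id)` pair per variant call) is a hypothetical format chosen for illustration.

```python
def classify_cnv_bacs(calls, min_individuals=2):
    """Split variant BACs into CNPs and singletons.

    calls: iterable of (bac_id, sample_id) pairs, one per variant call.
    Assumption: a BAC is a CNP if variant in >= min_individuals samples.
    """
    samples_per_bac = {}
    for bac_id, sample_id in calls:
        # A set ignores repeat calls (e.g. dye swaps) from the same sample.
        samples_per_bac.setdefault(bac_id, set()).add(sample_id)

    cnps = sorted(b for b, s in samples_per_bac.items()
                  if len(s) >= min_individuals)
    singletons = sorted(b for b, s in samples_per_bac.items()
                        if len(s) == 1)
    return cnps, singletons


# Toy example with made-up identifiers.
toy_calls = [("BAC1", "s1"), ("BAC1", "s2"), ("BAC2", "s1"),
             ("BAC3", "s3"), ("BAC3", "s3")]
cnps, singletons = classify_cnv_bacs(toy_calls)
# cnps == ["BAC1"]; singletons == ["BAC2", "BAC3"]
```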
12. Validation using Nimblegen High-Density Oligo Arrays
[Figure: Log2 T/R; Deletion; Duplication]
- n=9, high-density oligonucleotide array (385,000)
- Overall, validation for 194/257 (75.4% true positives).
Locke, Sharp, McCarroll et al., in press
13. Study Populations
- Normal unaffected (diversity panel and HapMap samples). Target: 800 samples. Completed: 75 + 269 samples = 344 total. Identified additional 257 CNPs.
- Idiopathic Mental Retardation
- Target: 700 samples (300 samples Flint/Knight, 400 CWRU samples); 291 complete
14. Evidence for Novel Recurrent Microdeletion Syndromes from the Screening of 291 Mental Retardation/MCA Cases
4 individuals with an apparently identical microdeletion never seen in >300 normals
15. Refinement of the Breakpoints of 17q21.3 Microdeletion
- Customized oligonucleotide microarray (n=11,000)
[Figure: Log2 Relative Hyb intensity (axis from -1.0 to 1.0) for IMR103 Father, IMR103 Mother, and IMR103, with a Seg Dups track; (1.5 STDEV)]
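The "(1.5 STDEV)" label suggests a standard-deviation cutoff on the log2 ratios. The sketch below shows one way such a cutoff could work, flagging probes more than 1.5 sample standard deviations from the mean. How the threshold was actually applied in the study is not stated, so the function name and the choice to estimate mean and SD from the same probe set are assumptions.

```python
from statistics import mean, stdev


def flag_probes(log2_ratios, n_sd=1.5):
    """Return (index, call) for probes beyond n_sd standard deviations.

    Assumption: mean and SD are estimated from the probes themselves;
    a real analysis might use control hybridizations instead.
    """
    mu = mean(log2_ratios)
    sd = stdev(log2_ratios)  # sample standard deviation (n - 1)
    flagged = []
    for i, ratio in enumerate(log2_ratios):
        if abs(ratio - mu) > n_sd * sd:
            flagged.append((i, "loss" if ratio < mu else "gain"))
    return flagged


# Made-up ratios: nine probes near zero and one strongly negative probe.
ratios = [0.0, 0.05, -0.05, 0.02, -0.02, 0.01, -0.9, -0.01, 0.03, -0.03]
calls = flag_probes(ratios)  # [(6, "loss")]
```

A single extreme probe inflates the SD estimate, which is one reason a real pipeline might require several consecutive probes to agree before calling a deletion.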
16. 17q21.3 Microdeletion is Recurrent
[Figure: panels labeled Normal, Normal, Affected IMR103, Affected IMR253, Affected IMR255, Affected IMR376, Normal, Normal]
- 4/291 patients; estimate 1% of mental retardation
17. Clinical Phenotypes of 17q21.3 Microdeletion Patients
IMR253: moderate developmental delay, sparse eyebrows, protruding tongue, low-set lop ears, large nose, seizures (4 yrs), markedly blonde hair
IMR669: delayed speech, hypotonia, mild dysmorphic features, seizures/fits until 16 months, upslanting palpebral fissures, prominent philtrum, bulbous nose, fair
IMR376: severe learning difficulties, marked hypotonia, hypopigmented, Mongolian slant, pale blue, almond-shaped eyes, protruding tongue, extensible joints
[Image: IMR669]
19. 1q21.1 Duplication-Mediated Microdeletion
IMR43
Father
Mother
2.4 Mb
20. 15q24.1-24.2 Duplication-Mediated Microdeletion
IMR349
Father
Mother
4.7 Mb
21. Variation in IMR
- 291 IMR samples (Oxford Cohort) screened to date
- 23 (n=31) novel sites of variation defined by >2 BACs
- 5 are seen in more than one unrelated patient
- 7/9 events are de novo
- Novel Genomic Disorder Candidates
22. Objectives
- Discovery of Novel Genomic Disorders
- Sequence-Based Resolution of Structural Variation
23. Intermediate-Size Structural Variation (ISV) and Inversions
Gene | Phenotype | Size | Locus | Freq. | Type | Dup (size/identity)
GSTT1 | halothane/epoxide sensitivity | 54.3 kb | 22q11.2 | 20% -/- | Deletion | 17 kb/94%
DEF3A-OR | heart disease susceptibility | 5 Mb | 8p23 | 26% -/+ | Inversion | 400 kb/98.9%
EMD/FLN | none | 219 kb | Xq28 | 33% -/+ | Inversion | 48 kb/99%
HERC2 | susceptibility to Angelman syndrome | 3 Mb | 15q11.2 | 4% -/+ | Inversion | >300 kb/99.8%
IGVH26 | immune response variation | Variable | 14q32.3 | 4-15% +/- | Deletion/Dup | 91-97%
GSTM1 | toxin resistance, cancer susceptibility | 18 kb | 1p13.3 | 50% -/- | Deletion | 24 kb/95.6%
CYP2D6 | antidepressant drug resistance | 5 kb | 22q13.1 | 1-29% | Duplication | 5.4 kb/91-97%
CYP21A2 | congenital adrenal hyperplasia | 35 kb | 6p21.3 | 1.6% +/- | Duplication | 0
CYP2A6 | nicotine metabolism | 7 kb | 19q13.2 | 1.3% +/- | Duplication | 24 kb/96.2%
SMN2 | SMA susceptibility | >100 kb | 5q13 | 50% +/- | Duplication | 88.7/99.8%
Adapted from Buckland, Ann Med
24. Genome-wide Detection of Structural Variation (>8 kb) by Fosmid End-Sequence Pairs
- Identified 295 potential candidates: deletions, insertions, and inversions
- Enriched among environmental interaction genes
Tuzun et al., 2005
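The end-sequence-pair idea can be sketched as follows: both ends of a fosmid clone are placed on the reference assembly, and a clone whose ends map too far apart, too close together, or in an inconsistent orientation is discordant. The insert-size mean, standard deviation, and 3-SD cutoff below are placeholder assumptions for illustration, not values taken from the slides, and the function name is hypothetical.

```python
def classify_fosmid(span_bp, orientation_ok,
                    mean_bp=40_000, sd_bp=3_000, n_sd=3.0):
    """Classify one fosmid by how its two end reads map to the reference.

    span_bp: distance between the mapped ends on the reference assembly.
    orientation_ok: True if the end reads face each other as expected.
    Assumption: mean_bp, sd_bp and n_sd are illustrative defaults.
    """
    if not orientation_ok:
        return "inversion"
    if span_bp > mean_bp + n_sd * sd_bp:
        # Ends map too far apart: the clone lacks sequence that the
        # reference has, i.e. a deletion in the fosmid source genome.
        return "deletion"
    if span_bp < mean_bp - n_sd * sd_bp:
        # Ends map too close: the clone carries extra sequence.
        return "insertion"
    return "concordant"


# Made-up examples.
labels = [
    classify_fosmid(40_500, True),    # "concordant"
    classify_fosmid(62_000, True),    # "deletion"
    classify_fosmid(25_000, True),    # "insertion"
    classify_fosmid(40_000, False),   # "inversion"
]
```

A single discordant clone can be a cloning or mapping artifact, which is why candidate sites in this approach are supported by clusters of two or more discordant fosmids.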
25. Fine-Scale Structural Variation Map (build35 vs. Fosmids)
- 1.3% discordant fosmids
- Identify 295 clusters (2 or more)
- 246 supported by second haplotype
- 147 inserts, 93 deletions, 57 inverts
- 18 putative L1 events: 10 deletions and 8 insertions (6 kb insertion)
- 89 locate within gene regions
- 138 unique regions of the genome
- 159 duplicated regions of the genome
Insertion(Fosmid)
Deletion
Inversions
Heterochromatic regions
Duplicated regions
26. Sequenced Structural Variation of APOBEC3B
- 24.5 kb deletion eliminates most of APOBEC3B but creates a fusion gene
- Fusion APOBEC3A/3B: <1% frequency in Africans, >35% in Papua New Guineans
27. Sequenced Structural Variation of DEFA1
>25 kb duplication of DEFA1, additional exon 3
28. Sequenced Structural Variation of LCE1E/D
Oligo ArrayCGH
Log2 Relative Signal Intensity
- 9.2 kb deletion of the LCE1D gene creates a fusion gene LCE1D/E
- Confirmed in 3 unrelated individuals by oligonucleotide microarray technology
http://humanparalogy.gs.washington.edu/structuralvariation
29. Sequencing Genic Structural Variation
30. PCR Breakpoint Genotyping Assays for Structural Variation
- Tested 11 structural variants (5 insertions, 4 deletions, and 2 inversions)
- 7 successful assays (6 with >20% minor allele frequency)
31. Genotyping: Illumina Golden-Gate Assays for Binary Events
Newman et al., 2006
32. A Human Genome Structural Variation Initiative
- 2 scientific meetings (2005)
- 2 working groups (AHG, MSWG) (12/05)
- Coordinating Committee (1/06)
- NIH Council (2/06)
- Press Release (3/15/06)
- Goal: Complete Characterization of Structural Variation in 48 HapMap Samples
CEPH
33. A Structural Variation Map of the Human Genome (<1 Mb)
ABC9 Japanese
ABC8 African
ABC7 African
G248 Hispanic
Insertion(Fosmid)
Deletion
Inversions
Heterochromatic regions
Duplicated regions
Putative Structural Variants from Four Individuals
35. Summary
- Genomic architecture approach: systematically identify dynamic regions of structural variation >>>> phenotype
- Large-Scale Variation
- Normals: Identified 257 CNPs using a microarray targeted to duplicated regions
- IMR: Identified 31 sites (>2 BACs) unique to patients (n=291 probands) (5 are recurrent and 3 are confirmed de novo)
- Goal: Discovery of Novel Genomic Disorders
- Fine-Scale Variation: Developed an approach to map and sequence common fine-scale variation within the human genome
- 1000 differences >8 kb from 4 individuals
- Goal: Disease association with common disease/susceptibility
36. Acknowledgements
Nimblegen: Rebecca Selzer, Peggy Eis
MIT: Steve McCarroll, David Altshuler
Eichler Lab: Andy Sharp, Devin Locke, Sierra Hansen, Sean McGrath, Eray Tuzun, Matthew Johnson, Zhaoshi Jiang, Jon Bleyhl, Tera Newman, Jeff Bailey, Anne Morrison, Lisa Pertz, Ze Cheng, Xinwei She, James Sprague
UCSF: Dan Pinkel, Donna Albertson
UWGSC: Maynard Olson, Rajinder Kaul, Hillary Hayden, Eric Haugen
CWRU/UChicago: Stuart Schwartz, Laurie Christ
Agencourt: Doug Smith
Oxford: Jonathan Flint, Samantha Knight
NHGRI: Jim Mullikin, AHG/MSWG
UW: Mark Rieder, Debbie Nickerson
37. Capturing Smaller Variants (<8 kb)
a)
Select >2 STDEV Clones
96 clones
Fingerprint MCD
Select 32 variant clones
Sequence
38. The Missing Human Genome
39. Finding Novel Human Sequence
40. Fosmid Pairs that Fail to Map to build35
- 1573 fosmid paired-end sequences fail to map to build 35.
- 644 have 150 bp >Q30 at either end and have >100 bp unique sequence
- 565 of these have no hit to HTGS BAC sequence
- Four independent restriction enzymes (EcoRI, HindIII, BglII and NsiI)
- 26 contigs (constructed from Composite Mutual Overlap Statistic (CMOS)) and 94 singletons
- Range in size from 208 kb to 38 kb (based on fingerprint data).
- Do they represent human sequence?
41. FISH Summary of Orphan Fosmids
- 119 contigs/singletons tested by FISH
- 22 contigs (1,296 kb) acrocentric, 12 contigs (458 kb) pericentromeric, 48 contigs (2,577 kb) subtelomeric
- 32 interstitial euchromatin (9 corresponding to known gaps)
42. Hybridization
[Figure: Log2 Hybridization Relative Intensity (Test/Reference) plotted across BAC probes for samples R921, D3767, and R1080]
43. Genomic Variation
Forms of genetic variation.
Sequence
- Single base-pair changes: point mutations
- Small insertions/deletions: frameshift, microsatellite, minisatellite
- Mobile elements: retroelement insertions (300 bp-10 kb in size)
- Large-scale genomic variation (>10 kb)
- Large-scale Deletions
- Segmental Duplications
- Chromosomal variation: translocations, inversions, fusions.
Cytogenetics