Title: Reconstructing Genomic Architectures of Tumor Genomes
- Pavel Pevzner and Ben Raphael
Department of Computer Science and Engineering
University of California, San Diego (joint work
with Colin Collins lab at UCSF Cancer Center)
Chromosome Painting: Normal Cells
Chromosome Painting: Tumor Cells
Rearrangements in Tumors
- Change the gene structure and regulatory wiring of the genome.
- Create "bad" novel fusion genes and break "good" old genes.
- Example: translocation in leukemia.
[Figure: a translocation between chromosome 9 (promoter, ABL gene) and chromosome 22 (promoter, BCR gene) joins the BCR promoter and BCR gene to ABL, creating the BCR-ABL oncogene.]
- Gleevec™ (Novartis, 2001) targets the BCR-ABL oncogene.
Complex Tumor Genomes
- What are the detailed architectures of tumor genomes?
- What rearrangements/duplications produce these architectures, and what is the order of these events?
- What are the novel fusion genes and the broken old genes?
Genome Rearrangements
[Figure: mouse X chromosome and human X chromosome, both derived from an unknown ancestor 80 million years ago.]
- What are the architectural blocks forming the existing genomes, and how do we find them?
- What is the architecture of the ancestral genome?
- What is the evolutionary scenario for transforming one genome into the other?
History of Chromosome X
Rat Consortium, Nature, 2004
Inversions
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
- Blocks represent conserved genes.
- In the course of evolution or in a clinical context, blocks 1, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
- Evolution: such inversions occurred about once or twice every million years.
- Cancer: such rearrangements may occur every month.
The inversion introduced two breakpoints (disruptions in order): between 3 and -8, and between -4 and 9.
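The inversion above can be sketched in Python, assuming blocks are encoded as signed integers (a minus sign marks reversed orientation). The names `reverse_segment` and `breakpoints` are illustrative, not from the source; breakpoints are counted after framing the block order with 0 and n + 1, so an adjacency is any consecutive pair (k, k + 1).

```python
from typing import Tuple

Perm = Tuple[int, ...]


def reverse_segment(perm: Perm, i: int, j: int) -> Perm:
    """Invert blocks at positions i..j (0-based, inclusive): reverse order and flip signs."""
    middle = tuple(-b for b in reversed(perm[i:j + 1]))
    return perm[:i] + middle + perm[j + 1:]


def breakpoints(perm: Perm) -> int:
    """Count consecutive pairs (a, b) with b != a + 1, after framing perm with 0 and n + 1."""
    framed = (0,) + perm + (len(perm) + 1,)
    return sum(1 for a, b in zip(framed, framed[1:]) if b != a + 1)


identity = tuple(range(1, 11))
inverted = reverse_segment(identity, 3, 7)  # invert blocks 4..8
print(inverted)               # (1, 2, 3, -8, -7, -6, -5, -4, 9, 10)
print(breakpoints(inverted))  # 2: between 3 and -8, and between -4 and 9
```

Note that -8, -7 counts as an adjacency: reading the inverted segment, each block is still followed by its successor.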
Turnip vs. Cabbage: Different Gene Order
[Figure: gene order before and after.]
Evolution is manifested as divergence in gene order.
Human-Mouse-Rat Phylogeny
Bourque et al., Genome Research, 2004
Comparative Genomics of Cancer
[Figure: human genome → tumor genome through mutation and selection.]
- Identify recurrent aberrations.
  - Mitelman Database: > 40,000 aberrations.
- Identify the temporal sequence of aberrations.
  - Linear model: colorectal cancer (Vogelstein, 1988): -5q → 12p → -17p → -18q.
  - Tree model (Desper et al., 1999).
Measuring Structural Changes in Tumors: Cytogenetics
- Directly visualize (fluorescently) labeled chromosomes.
  - Chromosome banding, mFISH, SKY.
- Weaknesses:
  - Physical locations of chromosomal junctions are not revealed; low resolution.
  - No or little information about copy number changes.
  - Requires metaphase chromosomes.
Measuring Copy Number Changes in Tumors: CGH, Array CGH
- Weakness:
  - No information about structural rearrangements (inversions, translocations) or about the positions of duplicated material.
End Sequence Profiling (ESP): C. Collins et al. (UCSF Cancer Center)
1) Pieces of tumor genome: clones (100-250 kb).
2) Sequence ends of clones (500 bp).
3) Map end sequences to the human genome.
[Figure: a clone of tumor DNA whose end sequences map to positions x and y in human DNA.]
Each clone corresponds to a pair of end sequences (ES pair) (x, y).
ES Pairs
- Order each ES pair so that $x < y$.
- ES pair $(x, y)$ is
  - valid if
    - $x$ and $y$ are on the same chromosome,
    - $l \le y - x \le L$, where $l$ ($L$) is the minimum (maximum) clone size, and
    - $x$ and $y$ have opposite, convergent orientations;
  - invalid otherwise.
    - Results from rearrangement or experimental noise.
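A minimal sketch of the validity test, assuming each end is encoded as (chromosome, position, strand) with strand +1 meaning the end sequence reads toward larger coordinates. `is_valid_es_pair` is a hypothetical name, and the example takes $l$ and $L$ from the 100-250 kb clone sizes quoted for ESP.

```python
from typing import Tuple

# (chromosome, position, strand); strand +1 = end reads toward larger coordinates (assumed).
End = Tuple[str, int, int]


def is_valid_es_pair(a: End, b: End, l: int, L: int) -> bool:
    """Apply the three validity conditions after ordering the ends so that x <= y."""
    (chr_x, x, strand_x), (chr_y, y, strand_y) = sorted((a, b), key=lambda e: e[1])
    same_chromosome = chr_x == chr_y
    plausible_size = l <= y - x <= L
    convergent = strand_x == +1 and strand_y == -1  # the two ends face each other
    return same_chromosome and plausible_size and convergent


l, L = 100_000, 250_000  # minimum and maximum clone size, from the 100-250 kb range
print(is_valid_es_pair(("chr1", 1_000_000, +1), ("chr1", 1_150_000, -1), l, L))  # True
print(is_valid_es_pair(("chr1", 1_000_000, +1), ("chr1", 1_150_000, +1), l, L))  # False
print(is_valid_es_pair(("chr1", 1_000_000, +1), ("chr9", 1_150_000, -1), l, L))  # False
```

The second pair fails only on orientation and the third only on chromosome, which mirrors how a rearrangement can break any one of the conditions.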
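Tumor Genome Reconstruction Puzzle
[Figure: the known human genome (blocks A, B, C, D, E) is transformed by an unknown sequence of rearrangements into the unknown tumor genome, which contains blocks A, B, E and inverted blocks -C, -D.]
- Map ES pairs to the human genome; the locations of ES pairs in the human genome are known.
- Reconstruct the tumor genome.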
Tumor Genome Reconstruction
[Figure: the tumor genome (blocks A, B, E and inverted blocks -C, -D) drawn against the human genome (blocks A to E); ES pairs (x1, y1), (x2, y2), (x3, y3), (x4, y4) span junctions between tumor blocks and are mapped back to the human genome.]
ESP Plot
[Figure: ESP plot with the human genome (blocks A to E) on both axes; ES pairs (x1, y1) to (x4, y4) appear as points.]
- 2D representation of ESP data.
- Each point is an ES pair.
- Can we reconstruct the tumor genome from the positions of the ES pairs?
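To see where ESP points come from, here is a toy simulation under stated assumptions: the tumor genome is a hypothetical list of signed human intervals in tumor order, a clone's left end reads rightward and its right end reads leftward in tumor coordinates, and `tumor_to_human` / `es_pair` are illustrative names. Clones inside one block give convergent pairs near the diagonal; clones spanning a junction give invalid points.

```python
from typing import List, Tuple

# Hypothetical toy layout: tumor blocks in tumor order, each given as
# (human_start, human_end, sign) with half-open human interval [start, end).
Block = Tuple[int, int, int]
End = Tuple[int, int]  # (human position, strand)


def tumor_to_human(layout: List[Block], u: int, tumor_strand: int) -> End:
    """Map tumor coordinate u (and the strand of an end read there) to the human genome."""
    offset = 0
    for start, end, sign in layout:
        length = end - start
        if u < offset + length:
            d = u - offset
            pos = start + d if sign > 0 else end - 1 - d  # inverted blocks run backwards
            return pos, tumor_strand * sign
        offset += length
    raise ValueError("position lies beyond the tumor genome")


def es_pair(layout: List[Block], u: int, v: int) -> Tuple[End, End]:
    """ES pair for a tumor clone spanning [u, v]: left end reads rightward, right end leftward."""
    return tumor_to_human(layout, u, +1), tumor_to_human(layout, v, -1)


# Human blocks A=[0,10), B=[10,20), C=[20,30); tumor order A, -B, C.
layout = [(0, 10, +1), (10, 20, -1), (20, 30, +1)]
print(es_pair(layout, 1, 8))   # ((1, 1), (8, -1)): convergent and short, a valid point
print(es_pair(layout, 5, 14))  # ((5, 1), (15, 1)): same strand, an invalid point
```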
Reconstructed Tumor Genome
[Figure: the tumor genome reconstructed from the ESP plot against the human genome, containing blocks A, B, E and inverted blocks -C, -D.]
Real Data: Noisy and Incomplete!
Computational Framework
- Use knowledge of known rearrangement mechanisms,
  - e.g., inversions, translocations.
- Find the simplest explanation for the data, given these mechanisms.
- Motivation: Sorting by Reversals.
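ESP Sorting Problem
- $G = [0, M]$, a unichromosomal genome.
- Reversal: $\rho_{s,t}(x) = x$ if $x < s$ or $x > t$, and $\rho_{s,t}(x) = t - (x - s)$ otherwise.
[Figure: applying $\rho_{s,t}$ to genome $G$ (blocks A, B, C) inverts the segment between $s$ and $t$, giving $\rho G$; the ES pair endpoints $x_1, y_1, x_2, y_2$ are shown before and after.]
- Given ES pairs $(x_1, y_1), \ldots, (x_n, y_n)$.
- Find the minimum number of reversals $\rho_{s_1,t_1}, \ldots, \rho_{s_k,t_k}$ such that, if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_k,t_k}$, then $(\rho(x_1), \rho(y_1)), \ldots, (\rho(x_n), \rho(y_n))$ are valid ES pairs.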
[Figure: a sequence of reversals applied to blocks A, B, C (yielding -C, -B), after which all ES pairs $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ are valid.]
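The reversal map $\rho_{s,t}$ can be applied directly to ES pair endpoints. This sketch assumes a single chromosome, encodes each end as (position, strand), flips the strand of ends that fall inside $[s, t]$, and applies reversals in list order; all function names are hypothetical. It only checks whether a proposed set of reversals makes pairs valid; it does not search for the minimum set.

```python
from typing import List, Tuple

End = Tuple[int, int]       # (position, strand); +1 points right, -1 points left (assumed)
ESPair = Tuple[End, End]


def apply_reversal(s: int, t: int, end: End) -> End:
    """rho_{s,t}: positions outside [s, t] are fixed; inside they map to t - (x - s) and flip strand."""
    x, strand = end
    if x < s or x > t:
        return end
    return t - (x - s), -strand


def apply_all(reversals: List[Tuple[int, int]], pair: ESPair) -> ESPair:
    """Apply the reversals to both ends, in list order (an assumed composition convention)."""
    a, b = pair
    for s, t in reversals:
        a, b = apply_reversal(s, t, a), apply_reversal(s, t, b)
    return a, b


def is_valid(pair: ESPair, l: int, L: int) -> bool:
    """Unichromosomal validity: l <= y - x <= L and convergent orientation."""
    (x, sx), (y, sy) = sorted(pair)
    return l <= y - x <= L and sx == +1 and sy == -1


pair = ((35, +1), (55, +1))            # toy pair spanning an inversion breakpoint
print(is_valid(pair, 5, 15))           # False
fixed = apply_all([(40, 60)], pair)
print(fixed, is_valid(fixed, 5, 15))   # ((35, 1), (45, -1)) True
```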
Sources of Invalid Pairs
- Chimeric clone: random joining of pieces from the tumor genome (noise!).
  - Two chimeric clones rarely use DNA from the same genomic regions.
  - Corresponds to an isolated invalid ES pair $(x, y)$: $d(x, x') + d(y, y') > 2L$ for all $(x', y') \neq (x, y)$.
- Composite clone: contains a rearrangement breakpoint.
  - Gives clusters of invalid ES pairs.
[Figure: ES pairs $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ shown in human and tumor coordinates.]
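A sketch of the isolation test, assuming positions lie on a single concatenated axis so that $d$ is absolute distance, and comparing each invalid pair only against the other invalid pairs (an assumption; valid pairs sit near the diagonal). `isolated_pairs` is a hypothetical helper, and the quadratic scan is for illustration only.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (x, y) positions of an invalid ES pair on one concatenated axis


def isolated_pairs(invalid: List[Pair], L: int) -> List[Pair]:
    """Pairs with d(x, x') + d(y, y') > 2L to every other invalid pair (d = absolute distance)."""
    result = []
    for i, (x, y) in enumerate(invalid):
        if all(abs(x - x2) + abs(y - y2) > 2 * L
               for j, (x2, y2) in enumerate(invalid) if j != i):
            result.append((x, y))
    return result


invalid = [(35, 55), (32, 57), (37, 52), (500, 900)]  # toy data
print(isolated_pairs(invalid, L=15))  # [(500, 900)]: likely a chimeric clone
```

The three nearby pairs support each other and would be kept as a cluster, consistent with a composite clone.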
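ESP Genome Reconstruction: Discrete Approximation
- Remove isolated invalid ES pairs $(x, y)$.
- Divide the genome into synteny blocks using clusters $C_1, \ldots, C_k$.
  - For each pair $(x_i, y_i)$ in a cluster with junctions $s, t$: $l \le |x_i - s| + |y_i - t| \le L$.
- Locations of block junctions are underdetermined.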
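The junction constraint can be explored by brute force. This hypothetical `feasible_junctions` enumerates integer $(s, t)$ with $0 \le s < t \le M$ on a toy genome and keeps those satisfying the inequality for every pair in one cluster; it enforces only that inequality, not any ordering between the $x_i$, $y_i$ and $s$, $t$. Getting more than one solution illustrates why junction locations are underdetermined.

```python
from typing import List, Tuple


def feasible_junctions(cluster: List[Tuple[int, int]], l: int, L: int,
                       M: int) -> List[Tuple[int, int]]:
    """All integer (s, t), 0 <= s < t <= M, with l <= |x_i - s| + |y_i - t| <= L for every pair."""
    return [(s, t)
            for s in range(M + 1)
            for t in range(s + 1, M + 1)
            if all(l <= abs(x - s) + abs(y - t) <= L for x, y in cluster)]


cluster = [(35, 55), (32, 57), (37, 52)]  # toy cluster from an inversion at (40, 60)
region = feasible_junctions(cluster, l=5, L=15, M=100)
print((40, 60) in region, len(region) > 1)  # True True
```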
Discrete Approximation
[Figure: clusters $C_1$, $C_2$, $C_3$ divide the human genome into blocks A to G.]
Reversals (also called inversions)
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
- Classically, blocks represent conserved genes.
- In the course of evolution or in a clinical context, blocks 1, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
  - Clinical: occurs in many cancers.
  - Evolution: occurred about once or twice every million years on the evolutionary path between human and mouse.
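Sorting Signed Permutations
- Sorting by reversals (Sankoff et al., 1990).
Signed permutation $\pi = \pi_1 \pi_2 \cdots \pi_n$.
Reversal $\rho(i, j)$: $\pi \cdot \rho(i, j) = \pi_1 \cdots \pi_{i-1}\; {-\pi_j} \cdots {-\pi_i}\; \pi_{j+1} \cdots \pi_n$.
Given $\pi$, find a series of reversals $\rho_1, \ldots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \ldots, n)$ and $t$ is minimal.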
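For very small $n$, the minimal $t$ can be found by breadth-first search over signed permutations, applying every $\rho(i, j)$ at each step. This is an exponential brute-force sketch for checking toy examples, not the polynomial-time Hannenhalli-Pevzner algorithm; `reversal_distance` is an illustrative name.

```python
from collections import deque
from typing import Dict, Tuple

Perm = Tuple[int, ...]


def reverse(perm: Perm, i: int, j: int) -> Perm:
    """rho(i, j) with 0-based inclusive i..j: reverse the segment and negate each element."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(perm: Perm) -> int:
    """Exact reversal distance by BFS; the state space grows as 2^n * n!, so keep n tiny."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist: Dict[Perm, int] = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for i in range(len(p)):
            for j in range(i, len(p)):
                q = reverse(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    raise ValueError("input is not a signed permutation of 1..n")


print(reversal_distance((1, 2, 3, -8, -7, -6, -5, -4, 9, 10)))  # 1
print(reversal_distance((2, 1)))                                # 3
```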
Sorting by Reversals: Most Parsimonious Scenarios
The reversal distance is the minimum number of reversals required to transform $\pi$ into $\gamma$. Here, the reversal distance is $d = 4$.
Breakpoint Graph
- Breakpoint Graph (Bafna and Pevzner, FOCS 94)
- Duality Theorem (Hannenhalli-Pevzner, STOC 1995): $d = n + 1 - c + h + f$, where $c$ is the number of cycles, and $h$ and $f$ are rather complicated but can be computed from the graph in polynomial time.
- Here, $d = 8 + 1 - 5 + 0 + 0 = 4$.
Breakpoint Graph
- Duality Theorem for Sorting by Reversals: simple but imprecise version.
  - Reversal distance ≈ number of elements + 1 - number of cycles, i.e., $d \approx n + 1 - c$.
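A sketch of the cycle count behind this estimate, using the usual unsigned encoding of a signed permutation (+i becomes 2i-1, 2i; -i becomes 2i, 2i-1; framed by 0 and 2n+1). Black edges join neighbours in the permutation and gray edges join 2k and 2k+1. It computes only $n + 1 - c$ and ignores the $h$ and $f$ terms; the function names are illustrative.

```python
from typing import List, Sequence


def breakpoint_graph_cycles(perm: Sequence[int]) -> int:
    """Count alternating black/gray cycles in the breakpoint graph of a signed permutation."""
    n = len(perm)
    # Unsigned image: +i -> (2i-1, 2i), -i -> (2i, 2i-1), framed by 0 and 2n+1.
    ext: List[int] = [0]
    for v in perm:
        ext += [2 * v - 1, 2 * v] if v > 0 else [-2 * v, -2 * v - 1]
    ext.append(2 * n + 1)
    # Black edges join ext[2k] and ext[2k+1] (neighbours in the permutation).
    black = {}
    for k in range(n + 1):
        a, b = ext[2 * k], ext[2 * k + 1]
        black[a], black[b] = b, a

    def gray(x: int) -> int:
        """Gray edges join 2k and 2k+1 (neighbours in the identity)."""
        return x + 1 if x % 2 == 0 else x - 1

    seen = set()
    cycles = 0
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:  # alternate black, gray, black, ... until back at start
            w = black[v]
            seen.update((v, w))
            v = gray(w)
    return cycles


def simple_duality_estimate(perm: Sequence[int]) -> int:
    """n + 1 - c: the 'simple but imprecise' estimate."""
    return len(perm) + 1 - breakpoint_graph_cycles(perm)


print(simple_duality_estimate((1, 2, 3, 4, 5)))                      # 0
print(simple_duality_estimate((1, 2, 3, -8, -7, -6, -5, -4, 9, 10)))  # 1
print(simple_duality_estimate((2, 1)))                               # 2
```

For (2, 1) the estimate is 2, while exhaustive search shows that 3 reversals are needed; gaps of this kind are what the $h$ and $f$ terms account for.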
Complexity of Reversal Distance
GRIMM-Synteny on X Chromosome: From Anchors to Synteny Blocks
GRIMM-Synteny on X Chromosome: 2-Dimensional Breakpoint Graph
Breakpoint Graph
Signed permutation $\pi$ = A -C F -D B -E G
[Figure: breakpoint graph on the vertices start, A, -C, F, -D, B, -E, G, end.]
Black edges: adjacent elements of $\pi$. Gray edges: adjacent elements of the identity $i$ = A B C D E F G.
For $\pi = \pi_1 \pi_2 \cdots \pi_n$: $d(\pi) = n + 1 - c(\pi) + h(\pi) + f(\pi)$.
The discrete approximation to ESP constructs the breakpoint graph from clusters.
Multichromosomal Extension
- Concatenate chromosomes.
- Translocations are modeled by reversals in the concatenation.
- Fissions/fusions are modeled by reversals with empty chromosomes.
- A minimal sequence can be found in polynomial time (Hannenhalli and Pevzner, 1995; Tesler, 2003; Ozery-Flato and Shamir, 2003).
[Figure: the translocation of chromosomes A1 A2 and B1 B2 into A1 B2 and B1 A2 corresponds to a reversal in the concatenation A1 A2 -B2 -B1.]
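The concatenation trick can be checked on the translocation example with signed block labels (the integer labels are hypothetical). Reading chromosome B backwards in the concatenation, a single reversal of the segment A2 -B2 produces the two translocated chromosomes once the concatenation is cut back apart.

```python
from typing import Tuple

Genome = Tuple[int, ...]


def flip(chrom: Genome) -> Genome:
    """Read a chromosome (or segment) in the opposite orientation."""
    return tuple(-b for b in reversed(chrom))


def reverse_segment(genome: Genome, i: int, j: int) -> Genome:
    """Reversal of positions i..j (0-based, inclusive)."""
    return genome[:i] + flip(genome[i:j + 1]) + genome[j + 1:]


# Hypothetical signed block labels for the two chromosomes.
A1, A2, B1, B2 = 1, 2, 3, 4
concat = (A1, A2) + flip((B1, B2))      # A1 A2 -B2 -B1
after = reverse_segment(concat, 1, 2)   # reverse the segment A2 -B2 -> A1 B2 -A2 -B1
chrom1, chrom2 = after[:2], after[2:]   # cut back into two chromosomes
print(chrom1 == (A1, B2), flip(chrom2) == (B1, A2))  # True True
```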
Breast Cancer Tumor Genome
- MCF7 is a human breast cancer cell line.
- Cytogenetic analysis (low-resolution) suggests a complex architecture.
  - Many translocations, inversions.
From Kytölä et al., Genes, Chromosomes & Cancer 28:308-317 (2000).
ESP Data from MCF7 Tumor Genome
(Concatenation of 23 human chromosomes)
- Each point (x, y) is an ES pair.
- 15005 clones (Oct. 2003)
- 11240 ES pairs
  - 10453 valid (black)
  - 737 invalid
    - 489 isolated (red)
    - 248 form 70 clusters (blue)
Breast Cancer MCF7 Cell Line
[Figure: human chromosomes vs. reconstructed MCF7 chromosomes.]
5 inversions, 15 translocations.
Raphael et al., 2003.
Sparse Data Assumptions
1. Each cluster results from a single reversal.
2. Each clone contains at most one breakpoint.
Complications with MCF7 Chromosomes 1, 3, 17, 20
33/70 clusters; total length 31 Mb.
Rearrangement Signatures
[Figure: an inversion with endpoints s and t transforms human A B C into tumor A -B C.]
Complex Tumor Genomes
Structure of Duplications in Tumors?
- Duplicated segments may co-localize (Guan et al., Nat. Genet. 1994).
[Figure: human genome vs. tumor genome.]
- Mechanisms not well understood.
Tumor Amplisomes
Analyzing Duplications
[Figure: a duplication copies a segment of genome A B C D E, the duple, to a new location; points u, v, w are marked.]
v, w are boundary elements of the duple.
Analyzing Duplications
[Figure: the genome after the duplication, with duple boundary elements v, w and point u marked.]
A path between the boundary elements resolves the duple.
Duplication Complications
[Figure: ambiguous duple configurations with boundary elements u, v, w.]
These configurations are frequent in MCF7 ESP data.
Resolving Duplication Complications
[Figure: duple configuration over blocks A to E with boundary elements u, v, w.]
A path between the boundary elements resolves the duple.
Resolving Duplication Complications
[Figure: duple configuration with boundary elements u, v, w.]
Multiple paths between duple boundary elements.
Many Paths in MCF7!
Duplication by Amplisome
Gives a single model for all duplications.
Amplisome Reconstruction Problem
- Assume:
  - Tumor genome sequence is known.
  - Insertions are independent, i.e., no insertions within insertions.
- Approach:
  - Identify duplicated sequences $A_1, \ldots, A_m$.
  - Amplisome is the shortest common superstring of $A_1, \ldots, A_m$.
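Shortest common superstring is NP-hard in general, so this sketch swaps in the common greedy heuristic (repeatedly merge the two strings with the largest suffix-prefix overlap), which need not return the shortest answer. It assumes duplicated sequences are strings of block labels read in one orientation, so inverted copies are ignored; `greedy_superstring` and the toy inputs are hypothetical.

```python
from typing import List


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(pieces: List[str]) -> str:
    """Greedy approximation to the shortest common superstring (not guaranteed optimal)."""
    # Drop exact duplicates and pieces already contained in another piece.
    unique = list(dict.fromkeys(pieces))
    strs = [p for p in unique if not any(p != q and p in q for q in unique)]
    while len(strs) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(strs):
            for j, b in enumerate(strs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = strs[best_i] + strs[best_j][best_k:]
        strs = [s for idx, s in enumerate(strs) if idx not in (best_i, best_j)] + [merged]
    return strs[0] if strs else ""


# Hypothetical duplicated segments, written as strings of block labels.
print(greedy_superstring(["ABC", "CDE", "BCD"]))  # ABCDE
```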
ESP Graph
- Black edges:
  - ES pairs.
  - Adjacent blocks in the human genome.
- Red edges:
  - Blocks in the human genome.
Reconstruction with Amplisomes
Amplisome Reconstruction Problem
- Problem: find the amplisome whose insertions best explain the ESP data.
- Approach:
  - Represent ESP data as a graph.
  - Identify sites of duplication (duples).
  - Search in graph to select amplisome structure.
    - Find the shortest path containing subpaths between each pair of duple boundary elements $(v_i, w_i)$.
Alternating Superpath Problem
- Given:
  - Edge-colored ESP graph $H$ (red/black edges).
  - Pairs of vertices $(v_1, w_1), \ldots, (v_m, w_m)$ in $H$.
- Find:
  - Shortest alternating red/black path (cycle) $A$ in $H$ such that $A$ contains an alternating path from $v_i$ to $w_i$ for each $i = 1, \ldots, m$ (or for the largest number of $i$).
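One building block of this search is finding an alternating red/black route between a single pair of boundary elements. This sketch runs breadth-first search over (vertex, color of last edge) states, so it returns a shortest alternating walk (vertices may repeat); it does not solve the full superpath problem, which must cover all pairs at once. The undirected edge-list format, the string vertex names, and the function name are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (u, v, color) with color "red" or "black"
State = Tuple[str, Optional[str]]  # (vertex, color of the edge used to reach it)


def shortest_alternating_walk(edges: Iterable[Edge], source: str,
                              target: str) -> Optional[List[str]]:
    """BFS over (vertex, last color); consecutive edges must differ in color."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for u, v, color in edges:
        adj.setdefault(u, []).append((v, color))
        adj.setdefault(v, []).append((u, color))
    start: State = (source, None)
    parent: Dict[State, Optional[State]] = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, last = state
        if node == target and last is not None:  # require at least one edge
            walk = []
            cur: Optional[State] = state
            while cur is not None:
                walk.append(cur[0])
                cur = parent[cur]
            return walk[::-1]
        for nxt, color in adj.get(node, []):
            nxt_state = (nxt, color)
            if color != last and nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


edges = [("v", "a", "black"), ("a", "b", "red"), ("b", "w", "black"), ("a", "w", "black")]
print(shortest_alternating_walk(edges, "v", "w"))  # ['v', 'a', 'b', 'w']
```

The shortcut v, a, w is rejected because it uses two black edges in a row.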
Reconstructed MCF7 Amplisome
[Figure: amplisome assembled from pieces of chromosomes 1, 3, 17, and 20, colored by chromosome.]
33 clusters; total length 31 Mb.
Explains 24/33 invalid clusters.
Raphael and Pevzner, 2004.
Sequencing Tumor Clones Confirms Complex Mosaic Structure
What's Next?
- Human Genome Project, 2001
- Mouse Genome Project, 2002
- Rat Genome Project, 2003
- Chicken Genome Project, 2004
- Chimp Genome Project, 2005
- ???
Tumor Genome Projects
[Figure: human genome → tumor genome through mutation and selection.]
- Identify recurrent aberrations
- Identify temporal sequence of aberrations
- Use these data for tumor diagnostics and
therapeutics
Current/Future Projects
- Unified model:
  - Duplications and rearrangements.
  - Combine ESP and array-CGH data.
- Annotation/experimental verification of genome/amplisome structure.
- Primary tumors: breast, brain, prostate, ovary.
Open Problems
- Analysis/algorithms for the ESP Sorting Problem.
- Amplisome reconstruction when insertions within insertions are allowed.
- Other models?
Acknowledgements
University of California, San Diego
Colin Collins, Stas Volik, Joe Gray (Cancer Research Center, University of California, San Francisco)