Reconstructing Genomic Architectures of Tumor Genomes

1
Reconstructing Genomic Architectures of Tumor
Genomes
  • Pavel Pevzner and Ben Raphael

Department of Computer Science & Engineering
University of California, San Diego (joint work
with Colin Collins lab at UCSF Cancer Center)
2
Chromosome Painting Normal Cells
3
Chromosome Painting Tumor Cells
4
Rearrangements in Tumors
  • Change gene structure and regulatory wiring of
    the genome.
  • Create bad novel fusion genes and break good
    old genes.
  • Example: translocation in leukemia.

[Figure: chromosome 9 (promoter, ABL gene) and chromosome 22 (promoter, BCR gene) are joined by a translocation, giving promoter + BCR-ABL oncogene.]
  • Gleevec™ (Novartis, 2001) targets the BCR-ABL
    oncogene.

5
Complex Tumor Genomes
  • What are detailed architectures of tumor genomes?
  • What rearrangements/duplications produce these
    architectures and what is the order of these
    events?
  • What are the novel fusion genes and old broken
    genes?

6
Genome rearrangements
Mouse (X chrom.)
Unknown ancestor 80 million years ago
Human (X chrom.)
  • What are the architectural blocks forming
    the existing genomes, and how do we find them?
  • What is the architecture of the ancestral genome?
  • What is the evolutionary scenario for
    transforming one genome into the other?

7
History of Chromosome X
Rat Consortium, Nature, 2004
8
Inversions
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
  • Blocks represent conserved genes.
  • In the course of evolution or in a clinical
    context, blocks 1, 2, ..., 10 could be misread as
    1, 2, 3, -8, -7, -6, -5, -4, 9, 10.

9
Inversions
[Figure: blocks 1 to 10, with the segment of blocks 4 to 8 inverted.]
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
  • Blocks represent conserved genes.
  • In the course of evolution or in a clinical
    context, blocks 1, ..., 10 could be misread as 1, 2,
    3, -8, -7, -6, -5, -4, 9, 10.
  • Evolution: occurred once or twice every million
    years.
  • Cancer: may occur every month.

10
Inversions
[Figure: blocks 1 to 10, with the segment of blocks 4 to 8 inverted.]
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
The inversion introduced two breakpoints (disruptions in order).
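The inversion and its two breakpoints can be checked with a few lines of Python. This is a minimal sketch: the function names `reverse_segment` and `count_breakpoints` are illustrative, and a breakpoint is counted here as any adjacent pair (a, b) with b - a ≠ 1 after framing the permutation with 0 and n + 1.

```python
def reverse_segment(perm, i, j):
    """Reverse blocks perm[i..j] (0-based, inclusive) and flip their signs."""
    flipped = [-b for b in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


def count_breakpoints(perm):
    """Count adjacent pairs (a, b) with b - a != 1 in 0, perm, n+1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


blocks = list(range(1, 11))
inverted = reverse_segment(blocks, 3, 7)   # invert blocks 4..8
print(inverted)                            # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(count_breakpoints(inverted))         # 2
```

The two breakpoints sit between 3 and -8 and between -4 and 9; every other adjacency still increases by one.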
11
Turnip vs Cabbage: Different Gene Order
12
Turnip vs Cabbage: Different Gene Order
13
Turnip vs Cabbage: Different Gene Order
14
Turnip vs Cabbage: Different Gene Order
15
Turnip vs Cabbage: Different Gene Order
  • Gene order comparison

[Figure: gene order before and after rearrangement.]
Evolution is manifested as the divergence in gene order.
16
Human-Mouse-Rat Phylogeny
Bourque et al., Genome Research, 2004
17
Comparative Genomics of Cancer
[Figure: human genome → (mutation, selection) → tumor genome.]
  • Identify recurrent aberrations
    • Mitelman Database, >40,000 aberrations
  • Identify temporal sequence of aberrations
    • Linear model: colorectal cancer (Vogelstein,
      1988): -5q → 12p → -17p → -18q
    • Tree model (Desper et al., 1999)

18
Measuring Structural Changes in Tumors: Cytogenetics
  • Directly visualize (fluorescently) labeled
    chromosomes.
  • Chromosome banding, mFISH, SKY.
  • Weaknesses:
    • Physical location of chromosomal junctions not
      revealed. Low resolution.
    • No/little information about copy number changes.
    • Requires metaphase chromosomes.

19
Measuring Copy Number Changes in Tumors: CGH,
array CGH
  • Weakness:
    • No information about structural rearrangements
      (inversions, translocations) or about the
      positions of duplicated material.

20
End Sequence Profiling (ESP): C. Collins et al.
(UCSF Cancer Center)
  1. Pieces of tumor genome: clones (100-250 kb).
  2. Sequence ends of clones (500 bp).
  3. Map end sequences to human genome.

[Figure: a clone from tumor DNA whose two end sequences map to positions x and y in human DNA.]
Each clone corresponds to a pair of end sequences
(ES pair) (x, y).
21
ES Pairs
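  • Order each ES pair such that $x < y$.
  • ES pair (x, y) is
    • valid if
      • x, y are on the same chromosome, and
      • $l \le y - x \le L$, where l (L) is the min (max) size of a clone, and
      • x, y have opposite, convergent orientations;
    • invalid otherwise.
  • An invalid pair results from rearrangement or
    experimental noise.

[Figure: end sequences x and y on the human genome, at most L apart.]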
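The validity test above can be written as a small predicate. This is a sketch under stated assumptions: the `EndSeq` record, the '+'/'-' strand encoding, and reading "convergent" as left end on '+' and right end on '-' are illustrative choices, and the default bounds simply reuse the 100-250 kb clone sizes quoted for ESP.

```python
from dataclasses import dataclass

MIN_CLONE = 100_000   # l: assumed lower bound on clone size (bp)
MAX_CLONE = 250_000   # L: assumed upper bound on clone size (bp)


@dataclass(frozen=True)
class EndSeq:
    chrom: str    # chromosome the end sequence maps to
    pos: int      # mapped position in the human genome (bp)
    strand: str   # '+' or '-'


def is_valid(a, b, l=MIN_CLONE, L=MAX_CLONE):
    """Return True if the ES pair satisfies all three validity conditions."""
    x, y = sorted((a, b), key=lambda e: e.pos)          # order so that x < y
    same_chrom = x.chrom == y.chrom
    size_ok = l <= y.pos - x.pos <= L
    convergent = x.strand == '+' and y.strand == '-'    # ends point at each other
    return same_chrom and size_ok and convergent


print(is_valid(EndSeq("chr1", 1_000_000, "+"), EndSeq("chr1", 1_150_000, "-")))
print(is_valid(EndSeq("chr1", 1_000_000, "+"), EndSeq("chr20", 1_150_000, "-")))
```

Any pair failing one of the three checks is invalid and becomes evidence either of a rearrangement or of experimental noise.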
22
Tumor Genome Reconstruction Puzzle
Human genome (known): blocks A, B, C, D, E.
Unknown sequence of rearrangements.
Tumor genome (unknown): the same blocks rearranged, with C and D inverted (-C, -D).
Map ES pairs to human genome.
Reconstruct tumor genome.
Location of ES pairs in human genome (known).
23
Tumor Genome Reconstruction
[Figure: tumor genome blocks (A, B, -C, -D, E) mapped onto human genome blocks (A, B, C, D, E).]
24
Tumor Genome Reconstruction
25
Tumor Genome Reconstruction
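[Figure: ES pairs (x1,y1), (x2,y2), (x3,y3), (x4,y4) drawn on the tumor genome (blocks A, B, -C, -D, E), with the locations of their ends marked on the human genome (blocks A, B, C, D, E).]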
26
ESP Plot
[Figure: ESP plot with human genome blocks A to E on both axes and the points (x1,y1), (x2,y2), (x3,y3), (x4,y4).]
  • 2D Representation of ESP Data
  • Each point is ES pair.
  • Can we reconstruct the tumor genome from the
    positions of the ES pairs?
35
[Figure: ESP plot with human genome blocks A to E on both axes, annotated with the tumor blocks, including the inverted blocks -C and -D.]
Reconstructed Tumor Genome
36
Real data is noisy and incomplete!
37
Computational Framework
  • Use knowledge of known rearrangement mechanisms,
    e.g. inversions, translocations, etc.
  • Find simplest explanation for data, given these
    mechanisms.
  • Motivation: Sorting by Reversals

38
ESP Sorting Problem
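  • $G = [0, M]$, a unichromosomal genome.
  • Reversal: $\rho_{s,t}(x) = x$ if $x < s$ or $x > t$,
    and $\rho_{s,t}(x) = t - (x - s)$ otherwise.

[Figure: genome G (blocks A, B, C) with ES pairs (x1, y1), (x2, y2) and reversal endpoints s, t; after the reversal, $G' = \rho G$ contains -B.]
  • Given: ES pairs $(x_1, y_1), \ldots, (x_n, y_n)$.
  • Find: a minimum number of reversals $\rho_{s_1,t_1}, \ldots, \rho_{s_k,t_k}$
    such that, if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_k,t_k}$, then
    $(\rho x_1, \rho y_1), \ldots, (\rho x_n, \rho y_n)$ are valid ES pairs.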
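A toy example shows how one reversal can turn an invalid ES pair into a valid one. This is a sketch, not the authors' implementation: ends are (position, strand) tuples with strand +1 or -1, and flipping the strand of every end inside [s, t] is an assumption added here, since the slide defines the reversal only on positions. The bounds l = 100 and L = 250 are toy units.

```python
def rho(x, s, t):
    """Position map of the reversal with endpoints s <= t."""
    return x if x < s or x > t else t - (x - s)


def apply_reversal(end, s, t):
    """Move an end by rho and flip its strand if it lies inside [s, t]."""
    pos, strand = end
    if s <= pos <= t:
        return (rho(pos, s, t), -strand)
    return end


def is_valid(pair, l, L):
    """Valid: distance within [l, L] and ends convergent (+ then -)."""
    (x, sx), (y, sy) = sorted(pair)
    return l <= y - x <= L and sx == 1 and sy == -1


pair = ((400, 1), (750, 1))                       # same orientation, 350 apart
after = tuple(apply_reversal(e, 450, 800) for e in pair)
print(is_valid(pair, 100, 250))                   # False
print(after)                                      # ((400, 1), (500, -1))
print(is_valid(after, 100, 250))                  # True
```

The ESP Sorting Problem asks for the fewest such reversals that make every pair valid at once; the sketch only checks a given reversal rather than searching for one.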

39
[Figure: genome A, B, C with ES pairs (x1, y1), (x2, y2), (x3, y3) and reversal endpoints s, t; the reversal gives A, -C, -B.]
Sequence of reversals.
All ES pairs valid.
40
Sources of Invalid Pairs
  • Chimeric clone: random joining of pieces from
    the tumor genome (noise!).
    • Chimeric clones rarely use DNA from the same genomic regions.
    • Corresponds to an isolated, invalid ES pair (x, y):
      $d(x, x') + d(y, y') > 2L$ for all $(x', y') \ne (x, y)$.
  • Composite clone: contains a rearrangement
    breakpoint.
    • Composite clones give clusters of invalid BES pairs.

[Figure: ES pairs (x1, y1), (x2, y2), (x3, y3) shown in the human genome and in the tumor genome.]
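The split between isolated pairs and clusters can be sketched as single-linkage grouping. Several things here are assumptions rather than the authors' procedure: the distance rule d(x, x') + d(y, y') ≤ 2L is read as the complement of the isolation condition, positions are treated as coordinates on one concatenated genome, and the helper name `split_invalid_pairs` is illustrative.

```python
def split_invalid_pairs(pairs, L):
    """Return (isolated, clusters) for a list of invalid ES pairs (x, y)."""
    n = len(pairs)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (x, y), (x2, y2) = pairs[i], pairs[j]
            if abs(x - x2) + abs(y - y2) <= 2 * L:   # close enough to share a cluster
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(pairs[i])
    isolated = [g[0] for g in groups.values() if len(g) == 1]
    clusters = [g for g in groups.values() if len(g) > 1]
    return isolated, clusters


pairs = [(1_000_000, 50_000_000), (1_100_000, 50_200_000), (90_000_000, 20_000_000)]
isolated, clusters = split_invalid_pairs(pairs, L=250_000)
print(isolated)   # [(90000000, 20000000)]
print(clusters)   # [[(1000000, 50000000), (1100000, 50200000)]]
```

The quadratic pairwise loop is fine for a few hundred invalid pairs; a larger data set would call for sorting or a spatial index first.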
41
ESP Genome Reconstruction: Discrete Approximation
  • Remove isolated invalid ES pairs (x, y).
  • Divide genome into synteny blocks using clusters
    C1, ..., Ck.

$l \le |x_i - s| + |y_i - t| \le L$
Locations of block junctions are underdetermined.
42
Discrete Approximation
[Figure: clusters C1, C2, C3 drawn over synteny blocks A, B, C, D, E, F, G.]
43
Reversals (also called inversions)
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
  • Classically, blocks represent conserved genes.
  • In the course of evolution or in a clinical
    context, blocks 1, ..., 10 could be misread as 1, 2,
    3, -8, -7, -6, -5, -4, 9, 10.
  • Clinical: occurs in many cancers.
  • Evolution: occurred about once or twice every
    million years on the evolutionary path between
    human and mouse.

44
Reversals (also called inversions)
[Figure: blocks 1 to 10, with the segment of blocks 4 to 8 inverted.]
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
  • Classically, blocks represent conserved genes.
  • In the course of evolution or in a clinical
    context, blocks 1, ..., 10 could be misread as 1, 2,
    3, -8, -7, -6, -5, -4, 9, 10.
  • Clinical: occurs in many cancers.
  • Evolution: occurred once or twice every million
    years on the evolutionary path between human and
    mouse.

45
Reversals (also called inversions)
[Figure: blocks 1 to 10, with the segment of blocks 4 to 8 inverted.]
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
The inversion introduced two breakpoints (disruptions in order).
46
Sorting signed permutations
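  • Sorting by reversals (Sankoff et al., 1990)

Signed permutation: $\pi = \pi_1 \pi_2 \cdots \pi_n$
Reversal $\rho(i,j)$: $\pi \cdot \rho(i,j) = \pi_1 \cdots \pi_{i-1} \, {-\pi_j} \cdots {-\pi_i} \, \pi_{j+1} \cdots \pi_n$
Given $\pi$, find a series of reversals $\rho_1, \ldots, \rho_t$
such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \ldots, n)$ and $t$ is
minimal.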
47
Sorting by reversals: Most parsimonious scenarios
The reversal distance is the minimum number of
reversals required to transform $\pi$ into $\gamma$. Here,
the reversal distance is $d = 4$.
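For very small permutations the reversal distance can be found by exhaustive breadth-first search over all signed permutations. This brute-force sketch is only meant to make the definition concrete; it is not the polynomial-time method cited in the talk, and the function names are illustrative. The state space has $2^n n!$ elements, so it is practical only for n up to about 6.

```python
from collections import deque


def neighbors(perm):
    """Yield every signed permutation reachable from perm by one reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            middle = tuple(-b for b in reversed(perm[i:j + 1]))
            yield perm[:i] + middle + perm[j + 1:]


def reversal_distance(perm):
    """Minimum number of reversals turning perm into (1, 2, ..., n)."""
    start = tuple(perm)
    target = tuple(range(1, len(perm) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for nxt in neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None


print(reversal_distance([1, -3, -2, 4]))   # 1
print(reversal_distance([1, 2, 3]))        # 0
```

Because breadth-first search explores permutations in order of increasing number of reversals, the first time the identity is dequeued its recorded distance is minimal.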
48
Breakpoint graph
  • Breakpoint Graph (Bafna and Pevzner, FOCS 94)
  • Duality Theorem (Hannenhalli-Pevzner, STOC 1995):
    $d = n + 1 - c + h + f$, where
    $c$ is the number of cycles; $h$, $f$ are rather complicated, but
    can be computed from the graph in polynomial time.
  • Here, $d = 8 + 1 - 5 + 0 + 0 = 4$.

49
Breakpoint graph
  • Duality Theorem for Sorting by Reversals: simple
    but imprecise version.
  • reversal distance ≈ (number of elements) + 1 −
    (number of cycles)

50
Complexity of reversal distance
51
GRIMM-Synteny on X chromosome: From anchors to
synteny blocks
52
GRIMM-Synteny on X chromosome: 2-dimensional
breakpoint graph
53
GRIMM-Synteny on X chromosome: 2-dimensional
breakpoint graph
54
GRIMM-Synteny on X chromosome: 2-dimensional
breakpoint graph
55
GRIMM-Synteny on X chromosome: 2-dimensional
breakpoint graph
56
Breakpoint Graph
Signed permutation $\pi$ = A -C F -D B -E G
[Figure: breakpoint graph on the path start, A, -C, F, -D, B, -E, G, end.]
Black edges: adjacent elements of $\pi$.
Gray edges: adjacent elements of $i$ = A B C D E F G.
For $\pi = \pi_1 \pi_2 \cdots \pi_n$, $d(\pi) = n + 1 - c(\pi) + h(\pi) + f(\pi)$.
Discrete approximation to ESP constructs
breakpoint graph from clusters.
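The cycle count $c(\pi)$ is the easy part of the formula to compute. The sketch below uses the standard doubling construction: each block x becomes two endpoints, (2x-1, 2x) if positive and (2x, 2x-1) if negative, framed by 0 and 2n+1. Encoding A..G as 1..7 and the function name `count_cycles` are illustrative choices; the terms $h$ and $f$ are not computed, so $n + 1 - c$ is only the simple lower bound.

```python
def count_cycles(perm):
    """Number of alternating black/gray cycles in the breakpoint graph."""
    n = len(perm)
    ends = [0]
    for x in perm:
        a = abs(x)
        ends += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    ends.append(2 * n + 1)

    # Black edges join neighboring endpoints of consecutive blocks in perm.
    black = {}
    for k in range(0, len(ends), 2):
        u, v = ends[k], ends[k + 1]
        black[u], black[v] = v, u

    def gray(u):
        # Gray edges join 2i and 2i+1, the neighbors in the identity.
        return u + 1 if u % 2 == 0 else u - 1

    seen, cycles = set(), 0
    for start in ends:
        if start in seen:
            continue
        cycles += 1
        u = start
        while u not in seen:          # walk black edge, then gray edge
            seen.add(u)
            v = black[u]
            seen.add(v)
            u = gray(v)
    return cycles


pi = [1, -3, 6, -4, 2, -5, 7]          # A -C F -D B -E G with A..G = 1..7
c = count_cycles(pi)
print(c, len(pi) + 1 - c)              # 5 3
```

For the identity permutation every black edge coincides with a gray edge, so $c = n + 1$ and the bound is 0, as expected.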
57
Multichromosomal Extension
  • Concatenate chromosomes.
  • Translocations are modeled by reversals in the
    concatenate.
  • Fissions/fusions are modeled by reversals with
    empty chromosomes.
  • Minimal sequence: polynomial time (Hannenhalli and
    Pevzner, 1995; Tesler, 2003; Ozery-Flato and
    Shamir, 2003).

[Figure: a translocation turns chromosomes (A1, A2) and (B1, B2) into (A1, B2) and (B1, A2); on the concatenation A1, A2, -B2, -B1 the same event is a single reversal.]
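The translocation-as-reversal trick can be replayed on block labels. This is a minimal sketch with illustrative helper names (`flip`, `reverse_segment`); the second chromosome is concatenated in reversed orientation, which is one of the two possible ways to concatenate.

```python
def flip(block):
    """Toggle the orientation sign of a block label."""
    return block[1:] if block.startswith("-") else "-" + block


def reverse_segment(seq, i, j):
    """Reverse seq[i..j] (inclusive) and flip each block's orientation."""
    return seq[:i] + [flip(b) for b in reversed(seq[i:j + 1])] + seq[j + 1:]


chrom_a = ["A1", "A2"]
chrom_b = ["B1", "B2"]

# Concatenate chromosome A with the reverse complement of chromosome B.
concat = chrom_a + reverse_segment(chrom_b, 0, len(chrom_b) - 1)
print(concat)                                  # ['A1', 'A2', '-B2', '-B1']

# One reversal over the middle two blocks models the translocation.
after = reverse_segment(concat, 1, 2)
print(after)                                   # ['A1', 'B2', '-A2', '-B1']

new_a = after[:2]                              # ['A1', 'B2']
new_b = reverse_segment(after[2:], 0, 1)       # ['B1', 'A2']
print(new_a, new_b)
```

Reading the second half of the concatenate backwards recovers the chromosome (B1, A2), so the single reversal has exchanged the chromosome ends A2 and B2.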
58
Breast Cancer Tumor Genome
  • MCF7 is a human breast cancer cell line.
  • Cytogenetic analysis (low-resolution) suggests
    complex architecture.
  • Many translocations, inversions.

From Kytölä et al., Genes, Chromosomes and Cancer
28:308-317 (2000).
59
ESP Data from MCF7 tumor genome
(Concatenation of 23 human chromosomes)
  • Each point (x,y) is ES pair.
  • 15005 clones (Oct. 2003)
  • 11240 ES pairs
    • 10453 valid (black)
    • 737 invalid
      • 489 isolated (red)
      • 248 form 70 clusters (blue)

60
Breast Cancer MCF7 Cell Line
[Figure: human chromosomes compared with MCF7 chromosomes.]
5 inversions, 15 translocations.
Raphael et al. 2003.
61
Sparse Data Assumptions
  1. Each cluster results from a single reversal.
  2. Each clone contains at most one breakpoint.
62
Complications with MCF7: Chromosomes 1, 3, 17, 20
33/70 clusters. Total length 31 Mb.
63
Rearrangement Signatures
[Figure: inversion signature. Human: A, B, C with breakpoints s and t; tumor: A, -B, C.]
64
Complex Tumor Genomes
65
Structure of Duplications in Tumors?
  • Duplicated segments may co-localize
    (Guan et al., Nat. Genet. 1994).

[Figure: duplicated segments in the human genome and their locations in the tumor genome.]
  • Mechanisms not well understood.

66
Tumor Amplisomes
67
Analyzing Duplications
[Figure: a duplication on blocks A, B, C, D, E, with vertices u, v, w and the duple marked.]
v, w are boundary elements of the duple.
68
Analyzing Duplications
[Figure: a duplication on blocks A, B, C, D, E, with vertices u, v, w and a path joining v and w.]
Path between boundary elements resolves duple.
69
Duplication Complications
[Figure: duplication configuration on blocks A, B, C, E with vertices u, v, w, marked "????".]
These configurations are frequent in MCF7 ESP
data.
70
Duplication Complications
[Figure: duplication configuration on blocks A, B, C, D with vertices u, v, w, marked "??".]
These configurations are frequent in MCF7 ESP
data.
71
Resolving Duplication Complications
[Figure: blocks A, B, C, D, E with vertices u, v, w and a path joining v and w.]
Path between boundary elements resolves duple.
72
Resolving Duplication Complications
[Figure: blocks A, B, C, E with vertices u, v, w and several paths joining v and w.]
Multiple paths between duple boundary elements.
73
Many Paths in MCF7!
74
Duplication by Amplisome
Gives single model for all duplications
75
Amplisome Reconstruction Problem
  • Assume:
    • Tumor genome sequence is known.
    • Insertions are independent,
      i.e. no insertions within insertions.
  • Approach:
    • Identify duplicated sequences A1, ..., Am.
    • Amplisome is the shortest common superstring of A1,
      ..., Am.
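Finding a shortest common superstring exactly is NP-hard in general, so the sketch below swaps in the usual greedy heuristic: repeatedly merge the two segments with the largest suffix-prefix overlap. It is an illustration of the idea, not the authors' procedure; duplicated sequences are written as strings of single-letter block labels, and `greedy_superstring` is an illustrative name.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_superstring(segments):
    """Greedy approximation to the shortest common superstring."""
    unique = set(segments)
    # Drop segments already contained in another segment.
    segs = sorted(s for s in unique
                  if not any(s != t and s in t for t in unique))
    while len(segs) > 1:
        best = (-1, 0, 1)
        for i, a in enumerate(segs):
            for j, b in enumerate(segs):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = segs[i] + segs[j][k:]
        segs = [s for idx, s in enumerate(segs) if idx not in (i, j)] + [merged]
    return segs[0] if segs else ""


print(greedy_superstring(["ABC", "BCD", "CDE"]))   # ABCDE
```

Every input segment is guaranteed to appear in the result, but the result is not guaranteed to be the shortest possible superstring.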

76
ESP Graph
  • Black edges:
    • ES pairs.
    • Adjacent blocks in human genome.
  • Red edges:
    • Blocks in human genome.

78
Reconstruction with Amplisomes
79
Amplisome Reconstruction Problem
  • Problem:
    • Find amplisome whose insertions best explain ESP
      data.
  • Approach:
    • Represent ESP data as a graph.
    • Identify sites of duplication (duples).
    • Search in graph to select amplisome structure.

80
ESP Graph
  • Black edges:
    • ES pairs.
    • Adjacent blocks in human genome.
  • Red edges:
    • Blocks in human genome.
82
Amplisome Reconstruction Problem
  • Problem:
    • Find amplisome whose insertions best explain ESP
      data.
  • Approach:
    • Search in graph to select amplisome structure.
    • Find shortest path containing subpaths between
      each pair of duple boundary elements (vi, wi).

83
Alternating Superpath Problem
  • Given:
    • Edge-colored ESP graph H (red/black edges).
    • Pairs of vertices (v1, w1), ..., (vm, wm) in H.
  • Find:
    • Shortest alternating red/black path (cycle) A
      in H such that A contains an alternating path
      from vi to wi for each i = 1, ..., m
      (or for the largest number of i).
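One building block of this problem is finding a single alternating path between a pair (v, w). The sketch below does that with breadth-first search over (vertex, color of last edge) states. It is only the subroutine, not a solution to the full superpath problem; path length is counted in edges, and the edge-list format and function name are assumptions made for illustration.

```python
from collections import deque


def alternating_path(edges, source, target):
    """Shortest path (in edges) from source to target with alternating colors."""
    adj = {}
    for u, v, color in edges:
        adj.setdefault(u, []).append((v, color))
        adj.setdefault(v, []).append((u, color))

    start = (source, None)              # no edge used yet
    prev = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, last = state
        if node == target and last is not None:
            path = []
            while state is not None:    # walk predecessors back to the source
                path.append(state[0])
                state = prev[state]
            return path[::-1]
        for nxt, color in adj.get(node, []):
            if color != last and (nxt, color) not in prev:
                prev[(nxt, color)] = state
                queue.append((nxt, color))
    return None


edges = [("v", "a", "red"), ("a", "b", "black"),
         ("b", "w", "red"), ("a", "w", "red")]
print(alternating_path(edges, "v", "w"))   # ['v', 'a', 'b', 'w']
```

The shortcut v, a, w is rejected because both of its edges are red; tracking the last edge color in the search state is what enforces the alternation.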
85
Amplisome Reconstruction Problem
  • Problem:
    • Find amplisome whose insertions best explain ESP
      data.
  • Approach:
    • Represent ESP data as a graph.
    • Identify sites of duplication.
    • Search in graph to select amplisome structure.

88
Reconstructed MCF7 amplisome
[Figure: reconstructed amplisome, colored by chromosome of origin (1, 3, 17, 20).]
33 clusters. Total length 31 Mb.
Explains 24/33 invalid clusters.
Raphael and Pevzner, 2004.
89
Sequencing Tumor Clones Confirms
Complex Mosaic Structure
90
What's Next?
  • Human Genome Project, 2001
  • Mouse Genome Project, 2002
  • Rat Genome Project, 2003
  • Chicken Genome Project, 2004
  • Chimp Genome Project, 2005
  • ???

91
Tumor Genomes Projects
[Figure: human genome → (mutation, selection) → tumor genome.]
  • Identify recurrent aberrations
  • Identify temporal sequence of aberrations
  • Use these data for tumor diagnostics and
    therapeutics

92
Current/Future Projects
  • Unified model:
    • Duplications and rearrangements.
    • Combine ESP and array-CGH data.
  • Annotation/experimental verification of
    genome/amplisome structure.
  • Primary tumors: breast, brain, prostate, ovary.

93
Open Problems
  • Analysis/algorithms for ESP Sorting Problem.
  • Amplisome reconstruction when insertions
    within insertions are allowed.
  • Other models?

94
Acknowledgements
University of California, San Diego
  • Ray Brown

Cancer Research Center, University of California,
San Francisco
  • Colin Collins
  • Stas Volik
  • Joe Gray