Title:
Alexes of all nations, unite! Epicenter Analysis in Cancer
- Alex Krasnitz, CSHL
- Search and knowledge building for biological datasets, UCLA, Nov. 26-30, 2007
- Input: segmented data from (ROMA) CGH.
- A predictive signal: a whole-genome biomarker for survival.
- Pinning algorithm.
- Pins find cancer genes.
- Pins predict tissue of origin.
- Pins and progression.
(ROMA) CGH in vitro and in silico
A method for measuring relative copy numbers of short fragments in a genome (ROMA: representational oligonucleotide microarray analysis).
- A multistep process consisting of:
  - Digestion with a restriction enzyme (BglII)
  - PCR: short (0.2-1.2 kb) fragments are selected
  - Hybridization to an oligonucleotide (50-mer probes) microarray (85K probe format used in the present study; higher-resolution work in progress)
  - Gridding
  - Normalization
  - Segmentation
  - Thresholding
  - CNP masking
  - Horizontal slicing
Raw and segmented ROMA profile; FISH validation of copy number variations detected by ROMA
Segmentation algorithm (B. Lakshmi, M. Wigler): replace a raw profile by a piecewise-constant function minimizing variance.
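The slide names the Lakshmi-Wigler segmentation without giving its details. As a rough illustration of the stated goal only (a piecewise-constant fit that minimizes variance), the sketch below substitutes plain recursive binary segmentation: split wherever the within-segment sum of squares drops most, and stop when the drop falls below `min_gain`. `min_gain` and `min_len` are hypothetical parameters, and this is not the CSHL algorithm.

```python
from typing import List, Tuple

def sse(x: List[float]) -> float:
    """Sum of squared deviations from the mean (the variance being minimized)."""
    if not x:
        return 0.0
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x)

def best_split(x: List[float], min_len: int) -> Tuple[int, float]:
    """Best single split point of x and the SSE reduction it achieves (-1 if none)."""
    total = sse(x)
    best_i, best_gain = -1, 0.0
    # O(n^2): every allowed split point, each scored by recomputing both halves
    for i in range(min_len, len(x) - min_len + 1):
        gain = total - sse(x[:i]) - sse(x[i:])
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain

def segment(x: List[float], min_gain: float = 1.0,
            min_len: int = 2) -> List[Tuple[int, int, float]]:
    """Recursive binary segmentation; returns (start, end, mean), end exclusive.

    min_gain and min_len are hypothetical stopping parameters.
    """
    out: List[Tuple[int, int, float]] = []
    if not x:
        return out

    def rec(lo: int, hi: int) -> None:
        i, gain = best_split(x[lo:hi], min_len)
        if i < 0 or gain < min_gain:
            seg = x[lo:hi]
            out.append((lo, hi, sum(seg) / len(seg)))
            return
        rec(lo, lo + i)
        rec(lo + i, hi)

    rec(0, len(x))
    return out

profile = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 0.0, 0.05, -0.05, 0.0]
for start, end, mean in segment(profile, min_gain=0.5):
    print(start, end, round(mean, 3))
```

On this toy profile with a gained block in the middle, the sketch returns three segments with means near 0, 1 and 0. The exhaustive split search is quadratic per split, acceptable for illustration but not for 85K probes.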
Cancer-free female vs. cancer-free reference male
CNPs and SNPs are genetic markers.
(Figure: heterozygous and homozygous CNPs, i.e., copy number polymorphisms; SNPs detected by ROMA)
Typical tumor genomes are NOT normal. Still, they may contain CNPs that must be filtered out.
Genomic rearrangements in cancer (Bayani et al., Seminars in Cancer Biology 17, 5, 2007)
CNP masking: determine the positions of frequent CNPs from a set of cancer-free genomes (500 cases); excise these from cancer profiles in a minimally intrusive fashion.
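The slide does not say how the "minimally intrusive" excision is done. A minimal sketch, assuming each normal genome is summarized as a per-probe boolean CNP call and that excision simply drops the flagged probes from a tumor profile; the input layout, the drop-probes rule and the `min_freq` cutoff are all assumptions.

```python
from typing import List, Sequence, Tuple

def cnp_mask(normal_calls: Sequence[Sequence[bool]], min_freq: float = 0.05) -> List[bool]:
    """Flag probes that carry a copy-number call in at least min_freq of normal genomes.

    normal_calls[g][j] is True if normal genome g has a CNP call at probe j
    (hypothetical input layout); min_freq is a hypothetical cutoff.
    """
    n_genomes = len(normal_calls)
    n_probes = len(normal_calls[0])
    return [sum(calls[j] for calls in normal_calls) / n_genomes >= min_freq
            for j in range(n_probes)]

def apply_mask(positions: Sequence[int], values: Sequence[float],
               mask: Sequence[bool]) -> Tuple[List[int], List[float]]:
    """Drop masked probes from a tumor profile; all other probes are left untouched."""
    kept = [(p, v) for p, v, m in zip(positions, values, mask) if not m]
    return [p for p, _ in kept], [v for _, v in kept]

normals = [
    [False, False, True, False, False],
    [False, False, True, False, False],
    [False, False, False, False, False],
    [False, False, False, False, False],
]
mask = cnp_mask(normals, min_freq=0.25)  # probe 2 is a CNP in half the normals
print(apply_mask([10, 20, 30, 40, 50], [0.0, 0.0, 0.8, 0.0, 0.0], mask))
```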
Event identification: horizontal slicing
- Allow multiple events at a locus in a profile.
- Select vertically non-overlapping segments of maximal total length. These define tiers.
- Assign each remaining segment to the closest tier.
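Selecting non-overlapping segments of maximal total length is an instance of weighted interval scheduling, with each segment's length as its weight. A sketch of that selection step, assuming segments are half-open genomic intervals and reading "vertically non-overlapping" as not overlapping along the genome (our interpretation). Assigning the remaining segments to the closest tier is left out because the distance used is not stated.

```python
import bisect
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end), half-open genomic interval

def max_length_tier(segments: List[Segment]) -> List[Segment]:
    """Pairwise non-overlapping segments with maximal total length (weighted interval scheduling)."""
    segs = sorted(segments, key=lambda s: s[1])
    ends = [e for _, e in segs]
    n = len(segs)
    best = [0] * (n + 1)  # best[i]: optimal total length using the first i segments
    for i, (s, e) in enumerate(segs, start=1):
        # number of earlier segments ending at or before s, i.e. compatible with this one
        j = bisect.bisect_right(ends, s, 0, i - 1)
        best[i] = max(best[i - 1], best[j] + (e - s))
    # trace back which segments were taken
    chosen, i = [], n
    while i > 0:
        s, e = segs[i - 1]
        j = bisect.bisect_right(ends, s, 0, i - 1)
        if best[j] + (e - s) >= best[i - 1]:
            chosen.append((s, e))
            i = j
        else:
            i -= 1
    return chosen[::-1]

print(max_length_tier([(0, 10), (5, 20), (20, 30), (25, 26)]))
```

Here the tier is (5, 20) and (20, 30), total length 25, which beats the 20 obtained from (0, 10) and (20, 30).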
Breast cancer study
- 257 frozen tissue samples of Scandinavian origin (140 Swedish, 117 Norwegian).
- Accompanied by clinical documentation.
Cohort | Total | Node (pos/neg) | Median age at diag. | Grade I/II/III | Size (mm) <20/>20 | PR (+/-) | ER (+/-) | ERBB2 amp/norm
Karolinska Inst., Sweden:
Diploid (survival >7 yr) | 60 | 28/31 | 52 | 8/11/33 | 19/41 | 41/9 | 43/7 | 3/57
Diploid (survival <7 yr) | 39 | 14/25 | 57 | 3/12/16 | 11/25 | 20/13 | 24/8 | 9/30
Aneuploid | 41 | 28/13 | 49 | 0/2/22 | 21/20 | 14/19 | 25/10 | 15/26
Oslo Micrometastasis Study (OMS) | 123 | 52/46 | 63 | 10/50/41 | 44/55 | 43/57 | 58/44 | 27/76
Progesterone (PR) and estrogen (ER) receptors measured by ligand binding; positive if >0.5 fg/mg protein. ERBB2 amplification scored by ROMA as a segmented ratio greater than 0.1 above baseline.
A heuristic classification of breast cancer profiles: simplex, sawtooth and firestorm
- Simplex: small number of events, overall or per chromosome
- Sawtooth: multiple events, no clustering
- Firestorm: multiple clustered events
Fisher's exact test: strong association with survival, no association with any clinical parameter except age at diagnosis.
Initial observation: firestorms lead to poor survival. Quantify the presence of firestorms by F, the sum over inverse average lengths of adjacent segments.
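A literal reading of the definition of F, as a sketch: for every pair of adjacent segments on the same chromosome, add the inverse of their average length. The transcript gives no units or normalization, so values from this function are not comparable to the Fd thresholds in the table; `firestorm_index` and its input layout are hypothetical.

```python
from typing import Dict, List

def firestorm_index(segment_lengths: Dict[str, List[float]]) -> float:
    """Sum of 1 / (average length) over adjacent segment pairs on each chromosome.

    segment_lengths maps a chromosome name to its segment lengths in genomic
    order (hypothetical layout; units are whatever the lengths are measured in).
    """
    total = 0.0
    for lengths in segment_lengths.values():
        for a, b in zip(lengths, lengths[1:]):
            total += 2.0 / (a + b)  # inverse of the pair's average length (a + b) / 2
    return total

quiet = {"chr1": [100.0], "chr17": [50.0, 50.0]}
stormy = {"chr1": [100.0], "chr17": [10.0, 2.0, 2.0, 10.0]}
print(firestorm_index(quiet), firestorm_index(stormy))
```

A chromosome with a single segment contributes nothing; many short adjacent segments (a firestorm) drive F up quickly.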
Is F a predictor of survival, and if so, is it
independent of clinical parameters?
Fd value | Clinical parameter | Discriminating principle | p-value from Fisher's test | Odds ratio
0.08 | Survival | Above or below 7 yr | 2.8 × 10^-7 | 0.073
0.09 | Survival | Above or below 7 yr | 5.9 × 10^-7 | 0.070
0.1 | Survival | Above or below 7 yr | 8.2 × 10^-6 | 0.073
0.09 | Grade | 2 vs 3 | 0.39 | 0.58
0.09 | Node condition | Negative or positive | 1.0 | 0.96
0.09 | Size | Smaller or larger than 20 mm | 0.40 (0.38 for 29) | 0.62 (0.62 for 29)
0.09 | ER status | Above or below 0.05 fg/mg prot. | 0.73 | 0.77
0.09 | PR status | Above or below 0.05 fg/mg prot. | 0.75 | 0.70
0.09 | HER2 amplification | Above or below segment threshold | 1.0 | 0.86
0.09 | Age at diagnosis | Above or below 57 years | 0.0066 | 0.26
0.09 | Adjuvant therapy | +/- | 0.44 | 0.64
0.09 | Radiation therapy | +/- | 1.0 | 1.1
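Each row of the table comes from a 2x2 Fisher's exact test (F above or below Fd versus the clinical dichotomy). The cell counts are not in the transcript, so the sketch below uses made-up counts. It computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, plus the sample odds ratio. `scipy.stats.fisher_exact` implements the same test.

```python
from math import comb
from typing import Tuple

def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio a*d / (b*c), p-value). The p-value sums the
    hypergeometric probabilities of every table with the observed margins that
    is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # probability that the top-left cell equals x, given all margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # small relative tolerance so ties affected by rounding still count
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    if b * c:
        odds = a * d / (b * c)
    else:
        odds = float("nan") if a * d == 0 else float("inf")
    return odds, min(p, 1.0)

# Hypothetical counts: rows = F above / below Fd, columns = survival < / > 7 yr
print(fisher_exact_2x2(3, 1, 1, 3))
```

For the hypothetical table [[3, 1], [1, 3]] this gives an odds ratio of 9 and a p-value of 34/70 (about 0.49).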
Kaplan-Meier (KM) plots for the Swedish diploid subset (no significant change when adjusted for age at diagnosis)
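KM curves like these come from the Kaplan-Meier product-limit estimator. A self-contained sketch, assuming per-patient follow-up times and a flag marking whether the endpoint was observed or censored (the study's data are not reproduced here); for the slide's comparison one would run it separately on the low-F and high-F groups.

```python
from typing import List, Sequence, Tuple

def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct time with an observed event.

    events[i] is True if the endpoint was observed at times[i], False if censored.
    Patients censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```

With these toy inputs the curve steps to 0.8, 0.6 and 0.3; the patient censored at time 2 still counts as at risk at that time.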
Search for epicenters
- Key assumption: observed amplifications and deletions are more likely than not to confer a selective advantage upon a neoplastic cell.
- If so, expect frequently amplified regions of the genome to be enriched in oncogenes.
- Require methods for detecting such regions.
- Frequency plot inadequate.
Potential benefits
- Massive data reduction (O(10^5) probes to 100 epicenters): a manageable set of predictors
- Disentanglement
- Target selection for functional studies (cancer gene finding)
Pinning
- Consider a smallest unit of the genome containing all its events (a chromosome).
- For a given N, find N positions within that unit that best explain the observed set of (amplification or deletion) events, i.e., N positions such that the highest number k(N) of events contain at least one of them.
- Multiple solutions occur, either due to a fuzzy pin or due to N being too low.
- Increment N until the increment I(N) = k(N) - k(N-1) reaches a pre-set minimal value. Note that I(N) is a non-increasing function of N.
- Pinning is convergent: it is guaranteed to recover the epicenters given enough data.
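The procedure above can be sketched for one chromosome as follows, assuming events are closed integer intervals and that the search stops (keeping the previous N) once I(N) drops below a hypothetical `min_increment`. Candidate pin positions are restricted to event right ends, which loses nothing: a pin can always slide right to the smallest right end among the events it hits and still hit all of them. The cap of 5 pins mirrors the slide's "up to 5 pins per chromosome".

```python
from itertools import combinations
from typing import Sequence, Tuple

Event = Tuple[int, int]  # closed interval [start, end] on one chromosome

def score(pins: Sequence[int], events: Sequence[Event]) -> int:
    """k: number of events containing at least one pin."""
    return sum(any(s <= p <= e for p in pins) for s, e in events)

def best_pins(events: Sequence[Event], n: int) -> Tuple[int, Tuple[int, ...]]:
    """Exhaustive search over all n-pin configurations; returns (k(n), one optimum)."""
    candidates = sorted({e for _, e in events})  # event right ends suffice
    best: Tuple[int, Tuple[int, ...]] = (0, ())
    for pins in combinations(candidates, min(n, len(candidates))):
        k = score(pins, events)
        if k > best[0]:
            best = (k, pins)
    return best

def pin_chromosome(events: Sequence[Event], min_increment: int, max_pins: int = 5):
    """Add pins while I(N) = k(N) - k(N-1) stays at or above min_increment (hypothetical rule)."""
    k_prev, chosen = 0, ()
    for n in range(1, max_pins + 1):
        k, pins = best_pins(events, n)
        if k - k_prev < min_increment:
            break
        k_prev, chosen = k, pins
    return k_prev, chosen

events = [(1, 3), (1, 6), (1, 6), (4, 9), (4, 9), (7, 9)]
print(pin_chromosome(events, min_increment=1))
```

On these six toy events the sketch returns k = 6 with pins (3, 9): the second pin adds two events and a third would add none.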
Greedy pinning is not optimal
Greedy, N = 2 (5 out of 6)
Non-greedy, N = 2 (6 out of 6)
- Requires exhaustive enumeration of all possible N-pin configurations.
- Pin positions: a fixed grid or determined by break points in the data.
- In the present data set: up to 5 pins per chromosome, O(100) pin positions.
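A toy configuration of our own (not the slide's figure) that reproduces the 5-out-of-6 versus 6-out-of-6 outcome for N = 2: the greedy first pin at 6 hits four events but leaves two disjoint ones, while exhaustive search finds pins at 3 and 9 that hit all six.

```python
from itertools import combinations

def covered(pins, events):
    """Indices of events (closed intervals) containing at least one pin."""
    return {i for i, (s, e) in enumerate(events) if any(s <= p <= e for p in pins)}

def greedy_pins(events, n, candidates):
    """Add one pin at a time, each time hitting the most not-yet-hit events."""
    pins = []
    for _ in range(n):
        hit = covered(pins, events)
        pins.append(max(candidates, key=lambda c: len(covered([c], events) - hit)))
    return pins, len(covered(pins, events))

def exhaustive_pins(events, n, candidates):
    """Try every n-subset of candidate positions and keep the best."""
    best = max(combinations(candidates, n), key=lambda ps: len(covered(ps, events)))
    return list(best), len(covered(best, events))

events = [(1, 3), (1, 6), (1, 6), (4, 9), (4, 9), (7, 9)]
candidates = sorted({e for _, e in events})    # [3, 6, 9]
print(greedy_pins(events, 2, candidates))      # ([6, 3], 5)
print(exhaustive_pins(events, 2, candidates))  # ([3, 9], 6)
```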
Test of significance
- For the optimal N-pin solutions, determine the event score k(N) and the gain I(N) = k(N) - k(N-1).
- Perform multiple whole-genome shuffles of the events, including those of the opposite sign. For each shuffle, find its I(N). Estimate a p-value by comparison to the true I(N).
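A sketch of the shuffle test, simplified to a single chromosome: each event keeps its length and is placed uniformly at random, and the p-value is the add-one fraction of shuffles whose gain reaches the observed one. The slide's version shuffles events genome-wide and includes events of the opposite sign, which this sketch omits; `chrom_length`, `n_shuffles` and the uniform placement model are assumptions.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Event = Tuple[int, int]  # closed interval [start, end] on one chromosome

def k_opt(events: Sequence[Event], n: int) -> int:
    """k(n): most events n pins can hit, by exhaustive search over event right ends."""
    cands = sorted({e for _, e in events})
    if n == 0 or not cands:
        return 0
    return max(sum(any(s <= p <= e for p in pins) for s, e in events)
               for pins in combinations(cands, min(n, len(cands))))

def gain(events: Sequence[Event], n: int) -> int:
    """I(n) = k(n) - k(n - 1)."""
    return k_opt(events, n) - k_opt(events, n - 1)

def shuffle_events(events: Sequence[Event], chrom_length: int,
                   rng: random.Random) -> List[Event]:
    """Place each event uniformly at random within [0, chrom_length], keeping its length."""
    out = []
    for s, e in events:
        width = e - s
        start = rng.randint(0, chrom_length - width)
        out.append((start, start + width))
    return out

def shuffle_p_value(events: Sequence[Event], n: int, chrom_length: int,
                    n_shuffles: int = 200, seed: int = 0) -> float:
    """Add-one estimate of P(shuffled I(n) >= observed I(n))."""
    rng = random.Random(seed)
    observed = gain(events, n)
    hits = sum(gain(shuffle_events(events, chrom_length, rng), n) >= observed
               for _ in range(n_shuffles))
    return (hits + 1) / (n_shuffles + 1)

stacked = [(45, 55)] * 20  # twenty events sharing one locus
print(shuffle_p_value(stacked, n=1, chrom_length=1000, n_shuffles=100))
```

Twenty stacked events give the smallest attainable p-value, 1/(n_shuffles + 1), because random placement practically never piles all twenty onto one point.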
Interpretation of results: consider only the top-scoring pin configurations. Then, for pin i in a top-scoring configuration, compute at each coordinate x the sum over the inverse lengths of the events pinned by i and containing x.
Example: 17q, 5 pins
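The profile formula itself did not survive extraction. Read literally from the text, for pin i it is P_i(x) = sum of 1/L_e over the events e pinned by i that contain x, where L_e is the event length. A sketch, assuming an event is "pinned by i" when it contains pin i and that lengths are counted in probes; short, focal events then weigh most near the pin.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int]  # closed interval [start, end], in probe coordinates (assumption)

def pin_profile(pin: int, events: Sequence[Event], xs: Sequence[int]) -> List[float]:
    """At each coordinate x, sum 1/length over events that contain both the pin and x."""
    pinned = [(s, e) for s, e in events if s <= pin <= e]
    return [sum(1.0 / (e - s + 1) for s, e in pinned if s <= x <= e) for x in xs]

events = [(4, 6), (0, 9), (20, 30)]
print(pin_profile(5, events, range(11)))
```

For a pin at 5, the profile is 0.1 away from the pin and 1/3 + 1/10 on positions 4-6; the event (20, 30) does not contain the pin and is ignored.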
Lung cancer deletions: known tumor suppressors and novel elements (213 cases, courtesy of S. Powers)
Estimates of utility
- Goal: select the most promising 10% of the genome to focus functional studies on.
- Is pinning useful in this sense?
- A test: how enriched is the top-scoring 10% quantile in known genetic elements implicated in breast cancer?
- We hit major known oncogenes, so we can expect good results. More formally, perform a database search (top 10%, 17q).
Estimates of utility
Database | Hits in region | Hits in top 10%
Atlas of Genetics and Cytogenetics in Oncology and Haematology | 8 (annotated as amplified and/or overexpressed) | 8 (p = 10^-8)
NCBI Map Viewer | 184 (hits on breast cancer) | 64 (p = 2 × 10^-16), likely overly conservative
NCBI Map Viewer | 47 (genes implicated in breast cancer) | 10 (p = 0.016), likely overly conservative
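The p-values in this table are consistent with a simple binomial null model in which each database hit in the region independently lands in the top-scoring 10% with probability 0.1. Under that assumption (our reading; the slide does not state the test) 8 of 8 hits gives 0.1^8 = 10^-8 and 10 of 47 gives about 0.016. A sketch:

```python
from math import comb

def upper_tail(hits_in_top: int, hits_total: int, fraction: float = 0.1) -> float:
    """P(X >= hits_in_top) for X ~ Binomial(hits_total, fraction).

    Null model (an assumption): each hit falls into the top-scoring quantile
    independently, with probability equal to the quantile's share of the region.
    """
    return sum(comb(hits_total, k) * fraction**k * (1 - fraction)**(hits_total - k)
               for k in range(hits_in_top, hits_total + 1))

print(upper_tail(8, 8))    # 8 of 8 Atlas hits in the top 10%
print(upper_tail(10, 47))  # 10 of 47 implicated genes in the top 10%
```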
Gene enrichment
Epicenters are enriched in (CCDS) genes compared to the genome and to the copy number events because (a) epicenters bracket genes and (b) genes are clustered.
Organ, polarity of epicenters | No. of epicenters | Gene count | Enr. vs genome | Enr. vs events | Enr. vs gene brackets | p (genome) | p (events)
Breast amp | 37 | 167 | 1.92 | 1.65 | 0.87 | 0.02 | 0.054
Breast del | 24 | 251 | 2.85 | 1.99 | 1.03 | 0.002 | 0.008
Lung amp | 32 | 232 | 3.05 | 2.54 | 1.22 | <0.002 | 0.002
Lung del | 37 | 425 | 2.30 | 1.63 | 1.00 | 0.002 | 0.028
Colon amp | 23 | 231 | 2.33 | 1.82 | 0.98 | 0.006 | 0.038
Colon del | 16 | 292 | 3.21 | 2.45 | 1.17 | 0.006 | 0.01
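The enrichment ratios can be read as gene density inside epicenters divided by a reference density. A sketch of the "vs genome" ratio, assuming each gene is represented by its start coordinate, epicenters are non-overlapping half-open intervals on one coordinate axis, and density is genes per base pair. The slide does not say how the ratios or p-values were computed, so `enrichment_vs_genome` is hypothetical.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs

def genes_in(intervals: Sequence[Interval], gene_starts: Sequence[int]) -> int:
    """Number of genes whose start coordinate falls inside any interval."""
    return sum(any(s <= g < e for s, e in intervals) for g in gene_starts)

def enrichment_vs_genome(epicenters: Sequence[Interval], gene_starts: Sequence[int],
                         genome_length: int) -> float:
    """Gene density inside the epicenters divided by genome-wide gene density.

    Assumes the epicenters do not overlap, so their lengths can be summed.
    """
    covered = sum(e - s for s, e in epicenters)
    density_inside = genes_in(epicenters, gene_starts) / covered
    density_genome = len(gene_starts) / genome_length
    return density_inside / density_genome

print(enrichment_vs_genome([(0, 50)], [10, 20, 30, 500, 900], genome_length=1000))
```

In this toy genome three of five genes fall in 5% of the sequence, an enrichment of 12.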
Application: predicting tissue of origin
Random forest classifier using joint sets of epicenters as predictors
Application: early events in breast cancer
- Compute frequency weighted by inverse number of events for contiguous groups of epicenters.
Outliers: FISH-validated early 16p-1q translocation.
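A sketch of the weighted frequency under one plausible reading: a profile with n events contributes 1/n to every epicenter group it hits, and totals are divided by the number of profiles, so groups hit in genomes with few events (candidate early lesions) score highest. The input layout, the normalization and the group names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

def weighted_frequency(hits: Dict[str, Set[str]], n_events: Dict[str, int]) -> Dict[str, float]:
    """Frequency of each epicenter group, each profile weighted by 1 / (its number of events).

    hits[profile] is the set of epicenter groups the profile hits; n_events[profile]
    is its total number of copy-number events (hypothetical input layout).
    """
    totals: Dict[str, float] = defaultdict(float)
    for profile, groups in hits.items():
        for g in groups:
            totals[g] += 1.0 / n_events[profile]
    return {g: t / len(hits) for g, t in totals.items()}

hits = {"p1": {"g1"}, "p2": {"g1", "g2", "g3"}, "p3": {"g2"}}
n_events = {"p1": 1, "p2": 10, "p3": 5}
print(weighted_frequency(hits, n_events))
```

Group g1, hit by a profile with a single event, scores 1.1/3, well ahead of g2 and g3, which are hit only in busier genomes.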
Summary
- Pinning is a method for finding copy number variation epicenters in (cancer) genomes.
- Applied to a set of 257 FISH-validated breast cancer genome profiles, and to lung and colon cancer sets.
- The epicenters found by pinning are significantly enriched in genes.
- Epicenters predict tissue of origin.
- Epicenters detect early lesions.
ROMA-based Cancer Biology at CSHL: Mike Wigler, Jim Hicks, Rob Lucito, Scott Powers, David Mu
FISH primer selection program, probes: Nicholas Navin
ROMA: Michael Riggs, Diane Esposito, Joan Alexander, Jen Troge, Evan Leibu
Bioinformatics: Lakshmi Muthuswamy, Boris Yamrom, AK, Vlad Grubor, Yoon-Ha Lee
Tony Leotta, Jude Kendall, Deepa Pai, Andy Reiner, John Healy
FISH (Karolinska): Susanne Maner, Par Lundin
Statistics: Xiaoyue Zhao, Chris Yoon
FACS/Database: Linda Rodgers
Collaborators: Anders Zetterberg (Karolinska Inst.), Anne-Lise Børresen-Dale (Norway Radium Hosp.), Kenny Ye (Albert Einstein Sch. Med.), Thea Tlsty (UCSF), Larry Norton (MSKCC)