Title: Genome-wide Copy Number Analysis
1 Genome-wide Copy Number Analysis
- Qunyuan Zhang, Ph.D.
- Division of Statistical Genomics
- Department of Genetics
- Center for Genome Sciences
- Washington University School of Medicine
- 02-08-2006
- Course M 21-621 Computational Statistical Genetics
2 Four Questions
- What is Copy Number?
- What can Copy Number tell us?
- How to measure/quantify Copy Number?
- How to analyze Copy Number?
3 What is Copy Number?
- Gene Copy Number
- The gene copy number (also "copy number
variants" or CNVs) is the number of copies of a
particular gene in the genotype of an individual.
Recent evidence shows that the gene copy number
can be elevated in cancer cells. For instance,
the EGFR copy number can be higher than normal in
non-small cell lung cancer. Elevating the copy
number of a particular gene can increase the
expression of the protein that it encodes.
- From Wikipedia (www.wikipedia.org)
4 DNA Copy Number
- A Copy Number Variant (CNV) represents a copy
number change involving a DNA fragment that is 1
kilobase or larger.
- From Nature Reviews Genetics, Feuk et al. 2006
- DNA Copy Number ≠ DNA Tandem Repeat Number
(e.g. microsatellites, < 10 bases)
- DNA Copy Number ≠ RNA Copy Number
- RNA Copy Number = Gene Expression Level
- DNA → (transcription) → mRNA
- Copy Number is the number of copies of a
particular fragment of a nucleic acid chain.
It refers to DNA Copy Number in most
publications.
5 What can Copy Number tell us?
- Genetic Diversity/Polymorphisms
- - restriction fragment length polymorphism (RFLP)
- - amplified fragment length polymorphism (AFLP)
- - random amplification of polymorphic DNA (RAPD)
- - variable number of tandem repeats (VNTR, e.g.,
mini- and microsatellites)
- - single nucleotide polymorphism (SNP)
- - presence/absence of transposable elements
- - structural alterations (e.g., deletions,
duplications, inversions)
- - DNA copy number variant (CNV)
- Association with phenotypes/diseases
genes/genetic factors
6 Genetic Alterations in Tumor Cells (DNA Copy Number Changes)
7 How to measure/quantify Copy Number?
8 Microarray: From Image to Copy Number
9 How to Analyze Copy Number?
10 General Procedures for Copy Number Analysis
11 Background Adjustment/Correction
- Reduces unevenness of a single chip
- Makes intensities of different positions on a chip comparable
- (Figure: chip before adjustment and after adjustment)
- Corrected Intensity (S') = Observed Intensity (S) - Background Intensity (B)
- For each region i, B(i) = mean of the lowest 2% of intensities in region i
- Method: Affymetrix MAS 5.0
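The region-wise background rule on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the MAS 5.0 implementation: it treats the chip as a flat list split into equal blocks, whereas the real algorithm works on a 2-D grid of zones and smooths the background between them. The function name `background_correct`, the parameter names, and the toy intensities are assumptions made for the example; the fraction 0.02 assumes the lowest 2% of intensities per region.

```python
def background_correct(intensities, n_regions=4, low_fraction=0.02):
    """Subtract a per-region background from a flat list of probe intensities.

    The background of each region is the mean of its lowest `low_fraction`
    of intensities (at least one probe is always used).
    """
    if not intensities:
        return []
    n = len(intensities)
    size = -(-n // n_regions)  # ceiling division: probes per region
    corrected = []
    for start in range(0, n, size):
        region = intensities[start:start + size]
        k = max(1, int(len(region) * low_fraction))
        background = sum(sorted(region)[:k]) / k
        corrected.extend(s - background for s in region)
    return corrected


# Hypothetical toy chip with two regions of four probes each.
chip = [120.0, 135.0, 110.0, 500.0, 220.0, 260.0, 210.0, 900.0]
corrected = background_correct(chip, n_regions=2)
print(corrected)
```

After correction the dimmest probe of every region sits at zero, which is what makes positions in different regions of the same chip comparable.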
13 Normalization
- Reduces technical variation between chips
- Makes intensities from different chips comparable
- (Figure: before normalization and after normalization)
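The slide does not say which normalization method was used. As one common and very simple possibility, the sketch below rescales every chip so that all chips share the same median intensity. The function name `median_scale` and the toy data are assumptions for illustration; quantile normalization or an invariant-set method would be equally plausible choices.

```python
from statistics import median


def median_scale(chips, target=None):
    """Rescale each chip (a list of intensities) to a common median.

    If no target is given, the mean of the per-chip medians is used.
    """
    medians = [median(chip) for chip in chips]
    if target is None:
        target = sum(medians) / len(medians)
    return [[x * target / m for x in chip] for chip, m in zip(chips, medians)]


# Two hypothetical chips; the second is uniformly twice as bright.
chips = [[100.0, 200.0, 300.0], [200.0, 400.0, 600.0]]
scaled = median_scale(chips)
print(scaled)
```

In this toy case the two chips become identical after scaling, because they differed only by a constant brightness factor.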
15 Raw Copy Number Data
16 Individual Level Analysis
- Analysis for each individual sample (or each sample pair)
- Significance test of CN amplification and deletion
- Boundary finding (smoothing and segmentation)
- CN estimation
17 Intensities and Raw CNs, Chr. 1 (Pair 101)
Black = Normal, Red = Tumor, Green = Tumor - Normal
18 Significance Test for Copy Number Changes: -log(p) values, Chr. 1, Pair 101
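The slides do not state which test produced these -log(p) values. The sketch below shows one way such a profile could be computed for a single tumor/normal pair: raw CN is taken as the log2 ratio of tumor to normal intensity, and each window of consecutive SNPs is tested against zero with a normal-approximation z-test. The function names, the window size, and the use of a z-test with a chromosome-wide standard deviation are all assumptions for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def raw_copy_number(tumor, normal):
    """Raw CN per SNP as the log2 ratio of tumor to normal intensity."""
    return [math.log2(t / n) for t, n in zip(tumor, normal)]


def window_neg_log_p(raw_cn, window=5):
    """-log10(p) of a two-sided z-test of each window mean against zero."""
    sd = stdev(raw_cn)  # noise estimated from the whole profile (crude)
    scores = []
    for i in range(len(raw_cn) - window + 1):
        z = mean(raw_cn[i:i + window]) / (sd / math.sqrt(window))
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        scores.append(-math.log10(max(p, 1e-300)))  # avoid log(0)
    return scores


# Hypothetical intensities: the second half of the region is amplified.
normal = [100.0] * 20
tumor = [95.0, 105.0] * 5 + [190.0, 210.0] * 5
scores = window_neg_log_p(raw_copy_number(tumor, normal))
print([round(s, 2) for s in scores])
```

Windows over the unchanged half give scores near zero, while windows inside the amplified half give large scores, which is the pattern a -log(p) plot along a chromosome is meant to reveal.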
19 Genome-wide Raw CN Changes (Pair 105)
20 Genome-wide Window-based Test of CN Changes (Pair 105)
Y axis: -log(p)
21 Segmentation
- Bioconductor R Packages (www.bioconductor.org)
- - GLAD package, adaptive weights smoothing (AWS) method
- - DNAcopy package, circular binary segmentation method
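To give a feel for what segmentation does, here is a minimal sketch of plain recursive binary segmentation. It is a simplified stand-in, not the circular binary segmentation implemented in DNAcopy (which searches for segments with two breakpoints and uses permutation tests) and not the AWS method of GLAD. The function names, the fixed `threshold`, and `min_len` are assumptions for illustration.

```python
import math


def best_split(values):
    """Return (index, score) of the split that best separates two means."""
    n = len(values)
    total = sum(values)
    best_i, best_score = None, 0.0
    left = 0.0
    for i in range(1, n):
        left += values[i - 1]
        mean_left = left / i
        mean_right = (total - left) / (n - i)
        # Difference in means, weighted by how balanced the split is.
        score = abs(mean_left - mean_right) * math.sqrt(i * (n - i) / n)
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def segment(values, start=0, threshold=0.5, min_len=2):
    """Recursively split into (start, end, mean) segments; end is exclusive."""
    whole = [(start, start + len(values), sum(values) / len(values))]
    if len(values) < 2 * min_len:
        return whole
    i, score = best_split(values)
    if i is None or score < threshold or i < min_len or len(values) - i < min_len:
        return whole
    return (segment(values[:i], start, threshold, min_len)
            + segment(values[i:], start + i, threshold, min_len))


# Hypothetical noise-free raw CN profile with one gained region.
profile = [0.0] * 6 + [1.0] * 6 + [0.0] * 6
segments = segment(profile)
print(segments)
```

On real, noisy data the threshold would need to be calibrated, for example by permutation as DNAcopy does, rather than fixed by hand.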
22 CN Estimation: Hidden Markov Model (HMM)
- Software: CNAT (www.affymetrix.com), dChip (www.dchip.org), CNAG (www.genome.umin.jp)
- (Diagram: at each position, a hidden status (unknown CN) gives rise to an observed status (raw CN = log ratio of intensities))
- CN estimation: finding a sequence of CN values which maximizes the likelihood of the observed raw CN.
- Algorithm: Viterbi algorithm (can be iterative)
- Information/assumptions below are needed:
- - Background probabilities: overall probabilities of possible CN values. P(CN = x), x = -2, -1, 0, 1, 2, 3, ..., n (usually n < 10)
- - Transition probabilities: probabilities of CN values of each SNP conditional on the previous one. P(CN_{i+1} = x | CN_i = y), x = -2, -1, 0, 1, 2, 3, ..., or n; y = -2, -1, 0, 1, 2, 3, ..., or n
- - Emission probabilities: probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status. P(log ratio < x | CN = y) = f(x | CN = y), x = any real number; y = -2, -1, 0, 1, 2, 3, ..., or n
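The three ingredients listed on this slide (background, transition, and emission probabilities) are exactly what the Viterbi algorithm consumes. The sketch below is a minimal log-space Viterbi decoder with Gaussian emissions. It is not the CNAT, dChip, or CNAG implementation. The three-state model, the uniform background probabilities, the 0.9 "stay" probability, the emission standard deviation, and the reading of the state values as copy number changes relative to two normal copies are all assumptions for illustration.

```python
import math


def viterbi(obs, states, log_prior, log_trans, log_emit):
    """Most likely hidden-state sequence for a list of observations."""
    score = [{s: log_prior[s] + log_emit(obs[0], s) for s in states}]
    back = []
    for x in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            cur[s] = prev[best] + log_trans[best][s] + log_emit(x, s)
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


STATES = [-1, 0, 1]  # assumed: CN change relative to 2 normal copies
SD = 0.25            # assumed emission noise on the log2 ratio


def log_emit(x, state):
    """Log density of a Gaussian centered on the expected log2 ratio."""
    mu = math.log2((2 + state) / 2)
    return -0.5 * ((x - mu) / SD) ** 2 - math.log(SD * math.sqrt(2 * math.pi))


log_prior = {s: math.log(1 / len(STATES)) for s in STATES}
log_trans = {r: {s: math.log(0.9 if r == s else 0.05) for s in STATES}
             for r in STATES}

# Hypothetical raw CN (log2 ratios) along a chromosome.
raw_cn = [0.05, -0.1, 0.0, 0.6, 0.5, 0.65, 0.55, -0.05, 0.1, -0.9, -1.1, -1.0]
path = viterbi(raw_cn, STATES, log_prior, log_trans, log_emit)
print(path)
```

The high "stay" probability is what lets the model ignore isolated noisy SNPs while still following sustained shifts in the log ratio. The state -2 (both copies lost) is left out here because its expected log2 ratio is undefined without a floor on the intensity ratio.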
23 HMM Estimation of CN for Chr. 1 (Pair 101)
Black = Normal Intensities, Red = Tumor Intensities, Green = Tumor - Normal, Blue = HMM-estimated CNs in Tumor Tissue
24 Population Level Analysis
- Analysis for the whole group (or sub-group) of samples
- Overall significance test
- Amplification and deletion frequency summarization
- Common/concurrent region finding
- Associations (with mutations, LOHs, clinical variables)
25 Genome-wide Raw CN Changes (average over 400 pairs)
26 Raw CN Changes of Chr. 14 (average over 400 pairs)
27 Sliding Window Analysis
28 Genome-wide Raw Copy Number Changes (sliding window plot, averaged over 400 pairs)
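A sliding window plot of this kind can be produced in two steps: average the raw CN at each position over all pairs, then smooth the averaged profile with a moving mean. The sketch below shows both steps. The function names, the window size, and the tiny two-pair data set are assumptions for illustration, and the slides do not say which window size or weighting was actually used.

```python
def average_over_pairs(raw_cn_by_pair):
    """Position-wise mean of raw CN across all tumor/normal pairs."""
    n_pairs = len(raw_cn_by_pair)
    return [sum(column) / n_pairs for column in zip(*raw_cn_by_pair)]


def sliding_window_mean(values, window=3):
    """Moving average over consecutive positions (one value per window)."""
    if len(values) < window:
        return []
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide by one position
        smoothed.append(running / window)
    return smoothed


# Two hypothetical pairs measured at four positions.
pairs = [[0.0, 1.0, 2.0, 3.0], [2.0, 1.0, 0.0, 1.0]]
averaged = average_over_pairs(pairs)
smoothed = sliding_window_mean(averaged, window=2)
print(averaged, smoothed)
```

Larger windows suppress more SNP-level noise but blur the boundaries of short amplified or deleted regions.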
29 Sliding Window Test of Significance of CN Changes: -log(p) values, based on 400 pairs
30 CN Change Frequencies in Population (Chr. 14, 400 pairs)
Black = Freq.(CN > 0), Red = Freq.(CN > 0, significant amplification at 0.01 level), Green = Freq.(CN < 0, significant deletion at 0.01 level)
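The three frequency curves in this legend can be computed position by position from per-pair CN changes and their p-values. The sketch below assumes both are available as one list per pair; the function name `change_frequencies`, the data layout, and the toy numbers are assumptions for illustration.

```python
def change_frequencies(cn_by_pair, p_by_pair, alpha=0.01):
    """Per-position frequencies across pairs.

    Returns three lists: fraction of pairs with CN > 0, fraction with a
    significant gain (CN > 0 and p < alpha), and fraction with a
    significant loss (CN < 0 and p < alpha).
    """
    n_pairs = len(cn_by_pair)
    gain, sig_gain, sig_loss = [], [], []
    for cn_col, p_col in zip(zip(*cn_by_pair), zip(*p_by_pair)):
        gain.append(sum(c > 0 for c in cn_col) / n_pairs)
        sig_gain.append(
            sum(c > 0 and p < alpha for c, p in zip(cn_col, p_col)) / n_pairs)
        sig_loss.append(
            sum(c < 0 and p < alpha for c, p in zip(cn_col, p_col)) / n_pairs)
    return gain, sig_gain, sig_loss


# Four hypothetical pairs measured at two positions.
cn = [[0.5, -0.4], [0.3, -0.6], [0.1, 0.2], [-0.2, -0.5]]
pvals = [[0.001, 0.005], [0.2, 0.001], [0.5, 0.3], [0.4, 0.02]]
gain, sig_gain, sig_loss = change_frequencies(cn, pvals)
print(gain, sig_gain, sig_loss)
```

Comparing the raw gain frequency with the significant-gain frequency shows how many of the apparent gains at a position survive the per-pair test.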
31 Population Level Segmentation Analysis (400 pairs)
- Circular Binary Segmentation approach, Bioconductor package DNAcopy
32 Segmentation of Chr. 14 (average result of 400 pairs)
33 Visualization of Concurrent Regions of Chr. 14 (400 pairs)
Axes: samples, positions
34 Group-specific Analysis
Black = non-smokers, Red = non-smokers
35 Separate Tumor Samples from Normal Samples Using Six Chromosomal Peaks with Significant CN Changes (Classification Based on Raw CN)
Groups shown: Tumor, Normal
37 Software
- Affymetrix Chips (www.affymetrix.com)
- Illumina Chips (www.illumina.com)
- CNAT (www.affymetrix.com)
- dChip (www.dchip.org)
- CNAG (www.genome.umin.jp)
- GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
- Bioconductor R Packages (www.bioconductor.org)
- - GLAD package, adaptive weights smoothing (AWS) method
- - DNAcopy package, circular binary segmentation method
- Windows?
- Unix?
- Parallel Computation?
38 References
- R Gentleman et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005
- JL Freeman et al. Genome Research 2006; 16:949-961
- J Huang et al. Hum Genomics 2004; 1(4):287-99
- X Zhao et al. Cancer Research 2004; 64:3060-3071
- Y Nannya et al. Cancer Research 2005; 65:6071-6079
- see Google
39 Acknowledgements
- Aldi Kraja
- Li Ding
- Ingrid Borecki
- John Osborne
- Michael Province
- Ken Chen
- Division of Statistical Genomics
- Medical Sequencing Group
- Center for Genome Sciences
- Washington University School of Medicine