Genome-wide Copy Number Analysis presentation

1
Genome-wide Copy Number Analysis

Qunyuan Zhang, Ph.D.
Division of Statistical Genomics
Department of Genetics, Center for Genome Sciences
Washington University School of Medicine
02-08-2006
Course M 21-621: Computational Statistical Genetics

2
Four Questions

What is Copy Number?
What can Copy Number tell us?
How to measure/quantify Copy Number?
How to analyze Copy Number?

3
What is Copy Number?

Gene Copy Number
The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in non-small cell lung cancer. Elevating the copy number of a particular gene can increase the expression of the protein that it encodes.
From Wikipedia (www.wikipedia.org)

4
DNA Copy Number
A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is 1 kilobase or larger.
From Nature Reviews Genetics, Feuk et al. 2006
DNA Copy Number ≠ DNA Tandem Repeat Number (e.g., microsatellites, repeat units < 10 bases)
DNA Copy Number ≠ RNA Copy Number
RNA Copy Number = Gene Expression Level
DNA -> (transcription) -> mRNA
Copy number is the number of copies of a particular fragment of a nucleic acid molecule. In most publications it refers to DNA copy number.

5
What can Copy Number tell us?

Genetic Diversity/Polymorphisms
- restriction fragment length polymorphism (RFLP)
- amplified fragment length polymorphism (AFLP)
- random amplification of polymorphic DNA (RAPD)
- variable number of tandem repeats (VNTR; e.g., mini- and microsatellites)
- single nucleotide polymorphism (SNP)
- presence/absence of transposable elements
- structural alterations (e.g., deletions, duplications, inversions)
- DNA copy number variant (CNV)
Association with phenotypes/diseases
genes/genetic factors

6
Genetic Alterations in Tumor Cells (DNA Copy Number Changes)
7
How to measure/quantify Copy Number?
8
Microarray: From Image to Copy Number
9
How to Analyze Copy Number?

A Real Example

10

General Procedures for Copy Number Analysis

11
Background Adjustment/Correction
Reduces unevenness within a single chip
Makes intensities at different positions on a chip comparable
Figure: before adjustment vs. after adjustment
Corrected Intensity (S') = Observed Intensity (S) - Background Intensity (B)
For each region i, B(i) = mean of the lowest 2% of intensities in region i
Affymetrix MAS 5.0
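The region-wise subtraction described on this slide can be sketched as follows. This is a simplified illustration, not the full MAS 5.0 procedure: the function name, the grid size, and the flat per-region subtraction (no smoothing between regions, no floor on negative values) are assumptions made for the example.

```python
def background_correct(chip, zone_rows=2, zone_cols=2, low_frac=0.02):
    """Subtract a per-region background from a 2-D list of intensities.

    The background of each region is the mean of its lowest `low_frac`
    fraction of intensities. Grid size and flat subtraction are
    illustrative assumptions.
    """
    n_rows, n_cols = len(chip), len(chip[0])
    corrected = [row[:] for row in chip]
    for zr in range(zone_rows):
        r0 = zr * n_rows // zone_rows
        r1 = (zr + 1) * n_rows // zone_rows
        for zc in range(zone_cols):
            c0 = zc * n_cols // zone_cols
            c1 = (zc + 1) * n_cols // zone_cols
            values = sorted(chip[r][c]
                            for r in range(r0, r1)
                            for c in range(c0, c1))
            if not values:
                continue
            # Always use at least one value so tiny regions still work.
            k = max(1, int(len(values) * low_frac))
            background = sum(values[:k]) / k
            for r in range(r0, r1):
                for c in range(c0, c1):
                    corrected[r][c] = chip[r][c] - background
    return corrected


# Toy chip: left half is dim, right half sits on a high background.
toy_chip = [[10, 12, 100, 110],
            [14, 16, 120, 130]]
toy_corrected = background_correct(toy_chip, zone_rows=1, zone_cols=2)
```

After correction the two halves of the toy chip are on a comparable scale, which is the stated purpose of the step.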
13
Normalization
Reduces technical variation between chips
Makes intensities from different chips comparable
Figure: before normalization vs. after normalization
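The slide does not say which normalization method was used. As one common choice, quantile normalization forces every chip to share the same intensity distribution. The sketch below is an assumption for illustration only; the function name is hypothetical and ties are not treated specially.

```python
def quantile_normalize(chips):
    """Quantile-normalize a list of equal-length intensity lists (one per chip).

    Each value is replaced by the mean, across chips, of the values
    holding the same rank. Ties are broken by probe order (assumption).
    """
    n = len(chips[0])
    # orders[j][k] = index of the probe with the k-th smallest value on chip j
    orders = [sorted(range(n), key=lambda i, c=chip: c[i]) for chip in chips]
    # Reference distribution: mean of the k-th smallest value over all chips.
    reference = [
        sum(chip[order[k]] for chip, order in zip(chips, orders)) / len(chips)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy chips measured on very different scales.
toy_normalized = quantile_normalize([[1, 3, 2], [20, 10, 30]])
```

Within each chip the ranking of probes is preserved; only the scale is made common.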
15
Raw Copy Number Data
16
Individual Level Analysis

Analysis for each individual sample (or each
sample pair)
Significance test of CN amplification and
deletion
Boundary finding (smoothing and segmentation)
CN estimation

17
Intensities and Raw CNs, Chr. 1 (Pair101)
Black: Normal, Red: Tumor, Green: Tumor - Normal
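The green track (Tumor - Normal) can be read as a difference of log intensities, i.e. a log ratio. A minimal sketch follows, assuming base-2 logs and already background-corrected, normalized, strictly positive intensities; the function name and the choice of base are assumptions.

```python
import math


def raw_copy_number(tumor, normal):
    """Per-SNP raw CN as log2(tumor / normal) for a tumor-normal pair."""
    return [math.log2(t / n) for t, n in zip(tumor, normal)]


# Doubling gives +1, no change gives 0, halving gives -1.
toy_raw_cn = raw_copy_number([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
```

Positive values suggest amplification in the tumor, negative values suggest deletion.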
18
Significance Test for Copy Number Changes
-log(p) values, chr. 1, pair101
19
Genome-wide Raw CN Changes (Pair105)
20
Genome-wide Window-based Test of CN Changes (Pair105)
-log(p)
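A window-based test of this kind can be sketched as below. The slides do not state which test was used; this example assumes a two-sided z-test (normal approximation) of the null hypothesis that the mean raw CN in each window is zero, and reports -log10(p). The function name, window size, and step are hypothetical, and the window must contain at least two values.

```python
import math
from statistics import mean, stdev


def window_neg_log_p(raw_cn, window=5, step=5):
    """Return -log10(p) per window for H0: mean raw CN in the window is 0.

    Uses a normal approximation; a t-test would be more appropriate
    for small windows (assumption made to stay within the stdlib).
    """
    scores = []
    for start in range(0, len(raw_cn) - window + 1, step):
        values = raw_cn[start:start + window]
        m, sd = mean(values), stdev(values)
        if sd == 0:
            scores.append(0.0 if m == 0 else math.inf)
            continue
        z = m / (sd / math.sqrt(window))
        # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
        p = math.erfc(abs(z) / math.sqrt(2))
        scores.append(-math.log10(p) if p > 0 else math.inf)
    return scores


# First window fluctuates around 0, second is shifted to about +1.
toy_scores = window_neg_log_p(
    [0.1, -0.1, 0.05, -0.05, 0.0, 1.0, 1.1, 0.9, 1.05, 0.95]
)
```

Setting `step` smaller than `window` turns the same function into an overlapping sliding-window scan.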
21
Segmentation
Bioconductor R Packages (www.bioconductor.org)
GLAD package, adaptive weights smoothing (AWS) method
DNAcopy package, circular binary segmentation method
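To illustrate what segmentation does, here is a plain recursive binary segmentation. It is a deliberately simplified stand-in: it is not the circular variant implemented in DNAcopy and not the AWS method in GLAD. The function names, the score, and the threshold are assumptions for the example.

```python
import math
from statistics import mean


def best_split(values):
    """Return (index, score) of the split that best separates two means."""
    n = len(values)
    total = sum(values)
    best_k, best_score = None, 0.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += values[k - 1]
        diff = left_sum / k - (total - left_sum) / (n - k)
        # Mean difference scaled by its standard error (unit variance).
        score = abs(diff) / math.sqrt(1 / k + 1 / (n - k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


def segment(values, threshold, offset=0):
    """Split recursively until no split scores at or above `threshold`.

    Returns a list of (start, end, mean) tuples with `end` exclusive.
    """
    k, score = best_split(values)
    if k is None or score < threshold:
        return [(offset, offset + len(values), mean(values))]
    return (segment(values[:k], threshold, offset)
            + segment(values[k:], threshold, offset + k))


# Six SNPs at raw CN 0 followed by six SNPs at raw CN 1.
toy_segments = segment([0.0] * 6 + [1.0] * 6, threshold=1.0)
```

Plain binary segmentation looks for one change point at a time, so a short altered region in the middle of a chromosome can be missed; the circular variant tests an interior arc against the rest, which addresses that case.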
22
CN Estimation: Hidden Markov Model (HMM)
CNAT (www.affymetrix.com), dChip (www.dchip.org), CNAG (www.genome.umin.jp)
Diagram: position; hidden status (unknown CN); observed status (raw CN = log ratio of intensities)
CN estimation: finding a sequence of CN values which maximizes the likelihood of the observed raw CN.
Algorithm: Viterbi algorithm (can be iterative)
The information/assumptions below are needed:
Background probabilities: overall probabilities of possible CN values.
P(CN = x), x = -2, -1, 0, 1, 2, 3, ..., n (usually n < 10)
Transition probabilities: probabilities of CN values of each SNP conditional on the previous one.
P(CN_{i+1} = x | CN_i = y), x = -2, -1, 0, 1, 2, 3, ..., or n; y = -2, -1, 0, 1, 2, 3, ..., or n
Emission probabilities: probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status.
P(log ratio < x | CN = y) = f(x | CN = y), x = any real number; y = -2, -1, 0, 1, 2, 3, ..., or n
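The three ingredients listed on this slide (background, transition, and emission probabilities) are exactly what the Viterbi algorithm consumes. The sketch below works in log space with Gaussian emissions. The three-state model, the emission means and standard deviation, and all probabilities are hypothetical values chosen for the example; they are not taken from CNAT, dChip, or CNAG.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log density of a normal distribution, used as the emission model."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def viterbi(observations, states, log_prior, log_trans, emission_mean, sigma):
    """Most likely hidden CN state sequence for observed raw CN values."""
    # score[s] = best log-probability of any path ending in state s
    score = {s: log_prior[s]
             + gaussian_log_pdf(observations[0], emission_mean[s], sigma)
             for s in states}
    back = []
    for x in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            pointers[s] = best
            score[s] = (prev[best] + log_trans[best][s]
                        + gaussian_log_pdf(x, emission_mean[s], sigma))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical three-state model: loss (-1), no change (0), gain (+1).
toy_states = [-1, 0, 1]
toy_prior = {s: math.log(1 / 3) for s in toy_states}
toy_trans = {r: {s: math.log(0.9 if r == s else 0.05) for s in toy_states}
             for r in toy_states}
toy_means = {-1: -1.0, 0: 0.0, 1: 1.0}
toy_path = viterbi([0.05, -0.1, 0.9, 1.1, 1.0, 0.1, -0.05],
                   toy_states, toy_prior, toy_trans, toy_means, sigma=0.3)
```

The high self-transition probability is what smooths the estimate: a state change is only accepted when the observed raw CN values support it strongly enough to outweigh the transition penalty.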
23
HMM Estimation of CN for Chr. 1 (Pair101)
Black: Normal Intensities, Red: Tumor Intensities, Green: Tumor - Normal, Blue: HMM-estimated CNs in Tumor Tissue
24
Population Level Analysis

Analysis for the whole group (or sub-group) of
samples
Overall significance test
Amplification and deletion frequencies
summarization
Common/concurrent region finding
Associations (with mutations, LOHs, clinical variables)

25
Genome-wide Raw CN Changes (average over 400 pairs)
26
Raw CN Changes of Chr. 14 (average over 400 pairs)
27
Sliding Window Analysis
28
Genome-wide Raw Copy Number Changes (sliding window plot, averaged over 400 pairs)
29
Sliding Window Test of Significance of CN Changes
-log(p) values, based on 400 pairs
30
CN Change Frequencies in Population (Chr. 14, 400 pairs)
Black: Freq.(CN > 0); Red: Freq.(CN > 0, significant amplification at 0.01 level); Green: Freq.(CN < 0, significant deletion at 0.01 level)
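The three frequency tracks in this plot can be computed per position from a samples-by-positions matrix of raw CN changes and a matching matrix of p-values. The sketch below assumes those two matrices are already available; the function name and input layout are hypothetical.

```python
def change_frequencies(raw_cn, p_values, alpha=0.01):
    """Per-position frequencies across samples.

    raw_cn[i][j] and p_values[i][j] refer to sample i at position j.
    Returns one tuple per position:
    (freq of CN > 0, freq of significant gain, freq of significant loss).
    """
    n_samples = len(raw_cn)
    n_positions = len(raw_cn[0])
    result = []
    for j in range(n_positions):
        gain = sig_gain = sig_loss = 0
        for i in range(n_samples):
            significant = p_values[i][j] < alpha
            if raw_cn[i][j] > 0:
                gain += 1
                sig_gain += significant
            elif raw_cn[i][j] < 0:
                sig_loss += significant
        result.append((gain / n_samples,
                       sig_gain / n_samples,
                       sig_loss / n_samples))
    return result


# Four toy samples, two positions.
toy_freqs = change_frequencies(
    [[0.5, -0.4], [0.2, -0.6], [-0.1, 0.3], [0.7, -0.5]],
    [[0.001, 0.2], [0.5, 0.005], [0.3, 0.9], [0.002, 0.001]],
)
```

In the toy data the first position is mostly gained and the second is mostly lost, mirroring how recurrent amplifications and deletions would stand out in the population plot.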
31
Population Level Segmentation Analysis (400 pairs)
Circular Binary Segmentation approach, Bioconductor Package DNAcopy
32
Segmentation of Chr. 14 (average result of 400 pairs)
33
Visualization of Concurrent Regions of Chr. 14 (400 pairs)
Axes: samples, positions
34
Group-specific Analysis
Black: non-smokers, Red: non-smokers
35
Separate Tumor Samples from Normal Samples Using Six Chromosomal Peaks with Significant CN Changes (Classification Based on Raw CN)
Labels: Tumor, Normal
37
Software

Affymetrix Chips (www.affymetrix.com)
Illumina Chips (www.illumina.com)
CNAT (www.affymetrix.com)
dChip (www.dchip.org)
CNAG (www.genome.umin.jp)
GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
Bioconductor R Packages (www.bioconductor.org)
GLAD package, adaptive weights smoothing (AWS) method
DNAcopy package, circular binary segmentation method
Windows?
Unix?
Parallel Computation?

38
References

R Gentleman et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005
JL Freeman et al. Genome Research 2006; 16: 949-961
J Huang et al. Hum Genomics 2004; 1(4): 287-299
X Zhao et al. Cancer Research 2004; 64: 3060-3071
Y Nannya et al. Cancer Research 2005; 65: 6071-6079
See Google

39
Acknowledgements

Aldi Kraja
Li Ding
Ingrid Borecki
John Osborne
Michael Province
Ken Chen
Division of Statistical Genomics
Medical Sequencing Group
Center for Genome Sciences
Washington University School of Medicine
