Genome-wide Copy Number Analysis - PowerPoint PPT Presentation

1
Genome-wide Copy Number Analysis
  • Qunyuan Zhang, Ph.D.
  • Division of Statistical Genomics
  • Department of Genetics and Center for Genome
    Sciences
  • Washington University School of Medicine
  • 02-08-2006
  • Course M 21-621 Computational Statistical
    Genetics

2
Four Questions
  • What is Copy Number?
  • What can Copy Number tell us?
  • How to measure/quantify Copy Number?
  • How to analyze Copy Number?

3
What is Copy Number?
  • Gene Copy Number
  • The gene copy number (also "copy number
    variants" or CNVs) is the number of copies of a
    particular gene in the genotype of an individual.
    Recent evidence shows that the gene copy number
    can be elevated in cancer cells. For instance,
    the EGFR copy number can be higher than normal in
    non-small cell lung cancer. Elevating the copy
    number of a particular gene can increase the
    expression of the protein that it encodes.
  • From Wikipedia (www.wikipedia.org)

4
  • DNA Copy Number
  • A Copy Number Variant (CNV) represents a copy
    number change involving a DNA fragment that is 1
    kilobase or larger.
  • From Nature Reviews Genetics, Feuk et al. 2006
  • DNA Copy Number ≠ DNA Tandem Repeat Number
    (e.g., microsatellites, < 10 bases)
  • DNA Copy Number ≠ RNA Copy Number
  • RNA Copy Number = Gene Expression Level
  • DNA → (transcription) → mRNA
  • Copy Number is the number of copies of a
    particular fragment of a nucleic acid molecular
    chain. It refers to DNA Copy Number in most
    publications.

5
What can Copy Number tell us?
  • Genetic Diversity/Polymorphisms
  • - restriction fragment length polymorphism (RFLP)
  • - amplified fragment length polymorphism (AFLP)
  • - random amplification of polymorphic DNA (RAPD)
  • - variable number of tandem repeat (VNTR e.g.,
    mini- and microsatellite)
  • - single nucleotide polymorphism (SNP)
  • - presence/absence of transposable elements
  • - structural alterations (e.g., deletions,
    duplications, inversions)
  • - DNA copy number variant (CNV)
  • Association with phenotypes/diseases →
    genes/genetic factors

6
Genetic Alterations in Tumor Cells (DNA
Copy Number Changes)
7
How to measure/quantify Copy Number?
8
Microarray: From Image to Copy Number
9
How to Analyze Copy Number?
  • A Real Example

10
  • General Procedures for Copy Number Analysis

11
Background Adjustment/Correction
  • Reduces unevenness of a single chip
  • Makes intensities of different positions on a chip
    comparable
  • Figure panels: before adjustment, after adjustment
  • Corrected Intensity (S') = Observed Intensity (S)
    - Background Intensity (B)
  • For each region i, B(i) = mean of the lowest 2% of
    intensities in region i
  • Affymetrix MAS 5.0
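
The region-wise subtraction described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the full MAS 5.0 procedure (which also smooths the background between regions): the function names, the toy data, and the `floor` parameter are assumptions made for the example.

```python
from statistics import mean

def zone_background(intensities, fraction=0.02):
    """Background B(i): mean of the lowest `fraction` of intensities in a region."""
    ordered = sorted(intensities)
    k = max(1, round(len(ordered) * fraction))  # keep at least one value
    return mean(ordered[:k])

def background_correct(zones, fraction=0.02, floor=0.5):
    """Subtract each region's background from its own intensities.

    `floor` is a hypothetical lower bound that keeps corrected values
    positive so they can be log-transformed later.
    """
    corrected = []
    for zone in zones:
        b = zone_background(zone, fraction)
        corrected.append([max(s - b, floor) for s in zone])
    return corrected

# Two toy regions of 100 probes each; the second has a higher background.
zone_a = [100 + i for i in range(100)]
zone_b = [300 + i for i in range(100)]
corrected = background_correct([zone_a, zone_b])
```

After correction, the brightest probes of the two toy regions end up on the same scale even though their raw intensities differ by 200 units.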
13
Normalization
  • Reduces technical variation between chips
  • Makes intensities from different chips comparable
  • Figure panels: before normalization, after
    normalization
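
The slide does not name a normalization method. As one simple possibility, the sketch below rescales every chip to a common median intensity; median scaling is an assumption chosen for brevity, and `median_scale` and the toy chips are hypothetical.

```python
from statistics import median

def median_scale(chips, target=None):
    """Rescale each chip so that its median intensity equals `target`.

    If no target is given, the median of the per-chip medians is used.
    """
    medians = [median(chip) for chip in chips]
    if target is None:
        target = median(medians)
    return [[value * target / m for value in chip]
            for chip, m in zip(chips, medians)]

chip_1 = [100.0, 200.0, 300.0]
chip_2 = [200.0, 400.0, 600.0]   # same pattern, twice as bright
chip_3 = [50.0, 100.0, 150.0]    # same pattern, half as bright
normalized = median_scale([chip_1, chip_2, chip_3])
```

The three toy chips differ only by an overall brightness factor, so after scaling they become identical.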
15
Raw Copy Number Data
16
Individual Level Analysis
  • Analysis for each individual sample (or each
    sample pair)
  • Significance test of CN amplification and
    deletion
  • Boundary finding (smoothing and segmentation)
  • CN estimation

17
Intensities and Raw CNs, Chr. 1 (Pair 101)
Black: Normal, Red: Tumor, Green: Tumor - Normal
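
The green "Tumor - Normal" track corresponds to a difference of log intensities, that is, a log ratio per SNP. A minimal sketch follows, assuming base-2 logarithms and matched tumor/normal intensities in the same SNP order; the function name and the numbers are illustrative.

```python
import math

def raw_copy_number(tumor, normal):
    """Per-SNP raw CN: log2(tumor) - log2(normal) = log2(tumor / normal)."""
    return [math.log2(t / n) for t, n in zip(tumor, normal)]

normal = [100.0, 100.0, 100.0, 100.0]
tumor = [100.0, 200.0, 50.0, 150.0]
raw_cn = raw_copy_number(tumor, normal)
```

On this scale 0 means no change, positive values suggest amplification, and negative values suggest deletion.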
18
Significance Test for Copy Number Changes
-log(p) values, Chr. 1, Pair 101
19
Genome-wide Raw CN Changes (Pair 105)
20
Genome-wide Window-based Test of CN Changes
(Pair 105)
-log(p)
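
The slides do not state which test produces the -log(p) values. One way to obtain such a track is sketched below: each non-overlapping window of SNPs is tested against a mean raw CN of zero, using a normal approximation to the one-sample t statistic. The test choice, the window layout, and all names are assumptions.

```python
import math
from statistics import mean, stdev

def window_neg_log_p(values, size):
    """-log10 of a two-sided p-value for each non-overlapping window.

    The null hypothesis is a mean raw CN of 0 (no copy number change).
    """
    scores = []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        m = mean(window)
        se = stdev(window) / math.sqrt(size)
        if se == 0:
            # Degenerate window: no spread, so the test is undefined.
            scores.append(0.0 if m == 0 else float("inf"))
            continue
        z = m / se
        p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
        scores.append(-math.log10(max(p, 1e-300)))
    return scores

# First window scatters around 0, second window around 1.
raw_cn = [0.1, -0.1, 0.05, -0.05, 0.0, 1.1, 0.9, 1.05, 0.95, 1.0]
scores = window_neg_log_p(raw_cn, size=5)
```

With only a handful of SNPs per window a t distribution would be more appropriate than the normal approximation used here.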
21
Segmentation
  • BioConductor R Packages (www.bioconductor.org)
  • GLAD package, adaptive weights smoothing (AWS)
    method
  • DNAcopy package, circular binary segmentation
    method
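
To make the idea of boundary finding concrete, here is a sketch of plain recursive binary segmentation on raw CN values. It is deliberately simpler than the circular binary segmentation in DNAcopy (no circular splits, no permutation test) and is not the AWS method of GLAD; the thresholds `min_size` and `min_gain` are hypothetical tuning parameters.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(values, min_size):
    """Return (gain, index) of the split that most reduces squared error."""
    total = sse(values)
    best = (0.0, None)
    for k in range(min_size, len(values) - min_size + 1):
        gain = total - sse(values[:k]) - sse(values[k:])
        if gain > best[0]:
            best = (gain, k)
    return best

def segment(values, min_size=3, min_gain=0.5, offset=0):
    """Recursively split the sequence; return sorted breakpoint indices."""
    if len(values) < 2 * min_size:
        return []
    gain, k = best_split(values, min_size)
    if k is None or gain < min_gain:
        return []
    left = segment(values[:k], min_size, min_gain, offset)
    right = segment(values[k:], min_size, min_gain, offset + k)
    return left + [offset + k] + right

raw_cn = [0.0, 0.1, -0.1, 0.0, 0.1,
          1.0, 1.1, 0.9, 1.0, 1.1,
          0.0, -0.1, 0.1, 0.0]
breakpoints = segment(raw_cn)
```

For the toy sequence the two recovered breakpoints bracket the amplified block in the middle.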
22
CN Estimation: Hidden Markov Model (HMM)
CNAT (www.affymetrix.com), dChip (www.dchip.org),
CNAG (www.genome.umin.jp)
Figure labels: position; hidden status (unknown CN);
observed status (raw CN = log ratio of intensities)
  • CN estimation: finding the sequence of CN values
    that maximizes the likelihood of the observed raw
    CN.
  • Algorithm: Viterbi algorithm (can be iterative)
  • The information/assumptions below are needed:
  • Background probabilities: overall probabilities
    of the possible CN values.
    P(CN = x), x = -2, -1, 0, 1, 2, 3, ..., n
    (usually n < 10)
  • Transition probabilities: probabilities of the CN
    value of each SNP conditional on the previous one.
    P(CN_{i+1} = x | CN_i = y),
    x, y = -2, -1, 0, 1, 2, 3, ..., or n
  • Emission probabilities: probabilities of the
    observed raw CN value of each SNP conditional on
    the hidden/unknown/true CN status.
    P(log ratio < x | CN = y) = f(x | CN = y),
    x = any real number; y = -2, -1, 0, 1, 2, 3, ..., or n
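
The three ingredients listed above (background, transition, and emission probabilities) are exactly what the Viterbi algorithm consumes. The sketch below works in log space with a hypothetical three-state model and Gaussian emissions; the state set, the probabilities, the emission means, and the standard deviation are all assumptions for illustration and are not taken from CNAT, dChip, or CNAG.

```python
import math

def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(observations, states, log_prior, log_trans, log_emit):
    """Most likely hidden state path for a sequence of observations."""
    score = {s: log_prior[s] + log_emit(observations[0], s) for s in states}
    back = []
    for x in observations[1:]:
        pointers, new_score = {}, {}
        for s in states:
            # Best previous state for arriving in state s.
            prev = max(states, key=lambda r: score[r] + log_trans[r][s])
            pointers[s] = prev
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit(x, s)
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical model: deletion (-1), no change (0), gain (+1).
states = [-1, 0, 1]
log_prior = {-1: math.log(0.1), 0: math.log(0.8), 1: math.log(0.1)}
stay, move = math.log(0.9), math.log(0.05)
log_trans = {r: {s: (stay if r == s else move) for s in states} for r in states}
emit_mean = {-1: -1.0, 0: 0.0, 1: 0.58}   # assumed mean log2 ratio per state

def log_emit(x, s):
    return gaussian_log_pdf(x, emit_mean[s], 0.3)

raw_cn = [0.05, -0.1, 0.1, 0.6, 0.5, 0.7, 0.55, 0.0, 0.1, -0.05]
cn_path = viterbi(raw_cn, states, log_prior, log_trans, log_emit)
```

The high self-transition probability is what discourages the path from switching state on a single noisy SNP.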
23
HMM Estimation of CN for Chr. 1 (Pair 101)
Black: Normal Intensities, Red: Tumor Intensities,
Green: Tumor - Normal, Blue: HMM-estimated CNs in
Tumor Tissue
24
Population Level Analysis
  • Analysis for the whole group (or sub-group) of
    samples
  • Overall significance test
  • Summarization of amplification and deletion
    frequencies
  • Common/concurrent region finding
  • Associations (with mutations, LOHs, clinical
    variables)

25
Genome-wide Raw CN Changes (average over 400
pairs)
26
Raw CN Changes of Chr. 14 (average over 400
pairs)
27
Sliding Window Analysis
28
Genome-wide Raw Copy Number Changes (sliding
window plot, averaged over 400 pairs)
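
A sliding window plot of this kind replaces each value with the mean of its neighbourhood. The sketch below assumes the input is the per-SNP raw CN change already averaged over sample pairs, and uses a window that advances one SNP at a time; the window size and the function name are illustrative.

```python
def sliding_window_mean(values, size):
    """Mean of each run of `size` consecutive values (step of one)."""
    if size < 1 or size > len(values):
        return []
    window_sum = sum(values[:size])
    means = [window_sum / size]
    for i in range(size, len(values)):
        # Slide by one: add the entering value, drop the leaving one.
        window_sum += values[i] - values[i - size]
        means.append(window_sum / size)
    return means

mean_change = [0.0, 0.0, 1.5, 3.0, 1.5, 0.0, 0.0]
smoothed = sliding_window_mean(mean_change, size=3)
```

Larger windows give smoother curves at the cost of blurring the boundaries of altered regions.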
29
Sliding Window Test of Significance of CN
Changes
-log(p) values, based on 400 pairs
30
CN Change Frequencies in Population (Chr. 14, 400
pairs)
Black: Freq.(CN > 0), Red: Freq.(CN > 0,
significant amplification at 0.01 level), Green:
Freq.(CN < 0, significant deletion at 0.01 level)
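
The three frequency tracks can be computed position by position from a samples-by-positions table of CN changes and their p-values. The sketch below assumes such tables already exist (for example from an individual-level test); the data layout, the names, and the toy numbers are hypothetical.

```python
def change_frequencies(cn_changes, p_values, alpha=0.01):
    """Per-position frequencies across sample pairs.

    cn_changes[s][j] and p_values[s][j] hold the CN change and its
    p-value for sample pair s at position j. Returns one tuple per
    position: (freq CN > 0, freq significant gain, freq significant loss).
    """
    n_samples = len(cn_changes)
    n_positions = len(cn_changes[0])
    result = []
    for j in range(n_positions):
        gain = sig_gain = sig_loss = 0
        for s in range(n_samples):
            change, p = cn_changes[s][j], p_values[s][j]
            if change > 0:
                gain += 1
                if p < alpha:
                    sig_gain += 1
            elif change < 0 and p < alpha:
                sig_loss += 1
        result.append((gain / n_samples,
                       sig_gain / n_samples,
                       sig_loss / n_samples))
    return result

# Four sample pairs, two positions.
cn_changes = [[0.8, -0.9], [0.2, -0.7], [0.9, 0.1], [-0.1, -0.8]]
p_values = [[0.001, 0.002], [0.3, 0.005], [0.004, 0.5], [0.6, 0.2]]
freqs = change_frequencies(cn_changes, p_values)
```

In the toy data the first position is mostly amplified and the second is mostly deleted.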
31
Population Level Segmentation Analysis (400
pairs)
Circular Binary Segmentation approach,
Bioconductor Package DNAcopy
32
Segmentation of Chr. 14 (average result of 400
pairs)
33
Visualization of Concurrent Regions of Chr. 14
(400 pairs)
Figure axes: samples, positions
34
Group-specific Analysis
Black: non-smokers, Red: non-smokers
35
Separate Tumor Samples from Normal Samples Using
Six Chromosomal Peaks with Significant CN
Changes (Classification Based on Raw CN)
Figure labels: Tumor, Normal
37
Software
  • Affymetrix Chips (www.affymetrix.com)
  • Illumina Chips (www.illumina.com)
  • CNAT(www.affymetrix.com)
  • dChip (www.dchip.org)
  • CNAG (www.genome.umin.jp)
  • GenePattern
    (www.broad.mit.edu/cancer/software/genepattern/)
  • BioConductor R Packages (www.bioconductor.org)
  • GLAD package, adaptive weights smoothing (AWS)
    method
  • DNAcopy package, circular binary segmentation
    method
  • Windows?
  • Unix?
  • Parallel Computation?

38
References
  • R Gentleman et al. Bioinformatics and
    Computational Biology Solutions Using R and
    Bioconductor. Springer, 2005
  • JL Freeman et al. Genome Research 2006; 16:
    949-961
  • J Huang et al. Hum Genomics 2004; 1(4): 287-99
  • X Zhao et al. Cancer Research 2004; 64:
    3060-3071
  • Y Nannya et al. Cancer Research 2005; 65:
    6071-6079
  • see google

39
Acknowledgements
  • Aldi Kraja, Ingrid Borecki, Michael Province
    (Division of Statistical Genomics)
  • Li Ding, John Osborne, Ken Chen
    (Medical Sequencing Group)
  • Center for Genome Sciences
  • Washington University School of Medicine