Title: Analysis of Exon Arrays
Slide 1: Analysis of Exon Arrays
- Slides provided by Dr. Yi Xing
Slide 2: Outline
- Design of exon arrays
- Background correction
- Probe selection, expression index computation
- Evaluation of gene level index
- Exon level analysis
- Conclusion
Slide 3: 1. Basic design of Exon Array
Slide 4: Exon Array Probesets Classified by Annotational Confidence
- Core probesets target exons supported by RefSeq mRNAs.
- Extended probesets target exons supported by ESTs or partial mRNAs.
- Full probesets target exons supported purely by computational predictions.
Slide 5: 2. Background modeling: predict non-specific hybridization from probe sequence
- Wu and Irizarry (2005) use probe effect modeling to obtain a more accurate expression index on 3' arrays
- Johnson et al. (2006) use probe effect modeling to detect ChIP peaks on tiling arrays
- Kapur et al. (2007) use probe effect modeling to correct background on exon arrays
Slide 6: Background modeling in Exon Arrays
- $\log B_i = \alpha\, n_{iT} + \sum_{j}\sum_{k} \beta_{jk} I_{ijk} + \sum_{k} \gamma_k n_{ik}^2 + \varepsilon_i$
- Estimate parameters from either
  - Background probes (n = 37,687)
  - Full probes (n ≈ 400,000)
- Test on a different array (with a single scaling constant)
- Full probes are useful for modeling background
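A minimal sketch of how a sequence-based background model of this kind could be fit by ordinary least squares. It assumes the model has a T-count term, position-specific indicators for bases A, C and G, and squared base-count terms. The function names, the plain least-squares fit and the feature ordering are assumptions for illustration, not the published implementation.

```python
import numpy as np

BASES = "ACGT"

def design_row(seq):
    """Feature vector for one probe sequence (assumed layout, for illustration)."""
    counts = [seq.count(b) for b in BASES]
    row = [float(counts[3])]                      # n_iT: number of T's in the probe
    for base in seq:                              # I_ijk: base k at position j, k in {A, C, G}
        row.extend(1.0 if base == k else 0.0 for k in "ACG")
    row.extend(float(c) ** 2 for c in counts)     # n_ik^2: squared count of each base
    return row

def fit_background(seqs, log_intensity):
    """Least-squares estimate of the model coefficients from training probes."""
    X = np.array([design_row(s) for s in seqs])
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_intensity, dtype=float), rcond=None)
    return coef

def predict_background(coef, seqs):
    """Predicted log background for new probes (e.g. on a different array)."""
    X = np.array([design_row(s) for s in seqs])
    return X @ coef
```

In this sketch, training on background or full probes and then predicting for other probes only requires their sequences; a per-array scaling constant would be applied on top of the prediction.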
Slide 7: Promoter array may be used to train exon array background
Slide 8: Preliminary conclusions
- Background correction based on background probe effect modeling can greatly reduce background noise.
- Model parameters are similar for different ChIP-DNA samples, or for different RNA samples, but not across DNA and RNA.
- The data may be rich enough to support learning of more complex models with even better predictive power.
Slide 9: 3. Probe selection and expression index computation
Slide 10: Gene-level visualization: heatmap of intensities
[Figure: heatmap of probe intensities for major histocompatibility complex, class II, DM beta. Labels: Probes, Core probes, Samples.]
Slide 11: Heatmap of Pairwise Correlations
[Figure: probe-by-probe correlation heatmap for HLA_DMB. Both axes: Probes.]
Slide 12: First observations
- Heatmap of correlations is a useful complement to heatmap of intensities
- Core probes have higher intensity than extended and full probes
Slide 13: Probe selection for gene-level expression
- Most full and extended probes are not suitable for estimating gene-level expression
  - Probes may target false exon predictions
- Even some core probes may not be suitable
  - Bad probes with low affinity, or that cross-hybridize
  - Probes targeting differentially spliced exons
- Probe selection
  - Select a suitably large subset of good probes targeting constitutively spliced regions of the gene
  - Use only the selected probes to estimate gene expression
Slide 14: Heatmap of CD44 core probes (ordered by genomic location)
[Figure: heatmap of CD44 core probes. Region labels along the gene: constitutive, alternatively spliced, constitutive.]
Slide 15: ataxin 2-binding protein 1
Slide 16: These examples motivated our Probe Selection Strategy
- Probe selection procedure (on core probes)
  - Hierarchical clustering of the probe intensities across 11 tissues (33 samples), cutting the tree at various heights (0.1, 0.2, ..., 1.0).
  - Choose a height cutoff to strike a balance between the size of the largest sub-group and the correlation within the sub-group.
  - Iteratively remove probes if they do not correlate well with the current expression index.
  - At least 11 core probes need to be chosen.
  - If the total number of core probes is less than 11 for the entire transcript cluster, we skip probe selection.
(Xing Y, Kapur K, Wong WH. PLoS ONE. 2006; 1:e88)
Slide 17: Hierarchical Clustering of CD44 Core Probes (distance = 1 - corr, average linkage)
h = 0.1: 44 (42) probes
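The clustering step can be sketched as follows, assuming a probes-by-samples intensity matrix for one gene. It uses distance = 1 - correlation with average linkage, cuts the tree at a given height, and returns the largest probe cluster. The function name and the choice of SciPy are assumptions; the iterative removal of poorly correlated probes and the 11-probe minimum are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def largest_cluster(intensity, height):
    """Return indices of the largest probe cluster after cutting the tree.

    intensity: array of shape (n_probes, n_samples) for one transcript cluster.
    height:    cut height on the 1 - correlation scale (e.g. 0.1).
    """
    dist = 1.0 - np.corrcoef(intensity)            # pairwise probe distances
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)  # enforce symmetry, no negatives
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=height, criterion="distance")
    biggest = np.bincount(labels).argmax()         # label of the most populated cluster
    return np.flatnonzero(labels == biggest)
```

Running this over a grid of heights and recording the size of the returned cluster gives the size-versus-correlation trade-off from which a cutoff can be chosen.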
Slide 18: Computation of gene level expression index
Pipeline:
1. Background correction
2. Normalization (linear scaling or none)
3. Probe selection
4. Computation of overall gene expression indexes (dChip type model)
5. Gene-level quantile normalization (optional)
GeneBASE: Gene-level Background Adjusted Selected probe Expression
Download: http://biogibbs.stanford.edu/kkapur/GeneBASE/
Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
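As a rough illustration of a dChip-type model, the sketch below fits the multiplicative form y[i, j] ≈ theta[i] * phi[j] (sample expression index times probe affinity) by alternating least squares, with the probe affinities scaled so that their squared sum equals the number of probes. This is a simplified stand-in, assuming background-corrected intensities of the selected probes as input; the function name is hypothetical and the outlier handling found in dChip itself is omitted.

```python
import numpy as np

def fit_multiplicative_model(y, n_iter=50):
    """Fit y[i, j] ~ theta[i] * phi[j] for one gene.

    y: array of shape (n_samples, n_probes).
    Returns (theta, phi) with sum(phi**2) == n_probes.
    """
    n_probes = y.shape[1]
    phi = np.ones(n_probes)
    for _ in range(n_iter):
        theta = y @ phi / (phi @ phi)              # best theta for fixed phi
        phi = theta @ y / (theta @ theta)          # best phi for fixed theta
        phi *= np.sqrt(n_probes / (phi @ phi))     # identifiability constraint
    theta = y @ phi / (phi @ phi)                  # final theta under the constraint
    return theta, phi
```

Here theta plays the role of the gene-level expression index for each sample.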
Slide 19: In most cases selection does not affect fold changes
Slide 20: Sometimes, selection changes fold-change significantly
Example: spectrin, beta, non-erythrocytic 4 (SPTBN4). BetaIV spectrins are essential for membrane stability and the molecular organization of nodes of Ranvier along neuronal axons.
Slide 21: 4. Evaluations of gene level index
Slide 22: 1st evaluation: tissue fold change
Fold-change of liver over muscle, in 438 genes with high fold-change in 3' expression array data
[Figure panels: After selection, Before selection]
Slide 23: Probe selection allows more sensitive detection of fold-changes
[Figure, zoom-in. Panels: After selection, Before selection]
Slide 24: FC of muscle over liver, in 500 genes detected to be overexpressed in muscle over liver by 3' array
[Figure panels: After selection, Before selection]
Slide 25: FC of muscle over liver
[Figure, zoom-in. Panels: After selection, Before selection]
Slide 26: 2nd evaluation: Presence/Absence calls
- Use SAGE data to construct gold-standard
  - Presence in tissue if >100 tags per million (tpm)
  - Absence if no tags in given tissue but >100 tpm in at least another tissue
- Exon array A/P calls use sum of z-scores for core probes (z-score is computed based on background model)
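A small sketch of both sides of this evaluation. The SAGE labeling follows the rule stated on the slide, with the 100 tpm threshold exposed as a parameter. The z-score definition is an assumption: each core probe's log intensity is compared with its predicted log background and divided by a residual standard deviation `sigma`. The slide does not spell out the formula, so the function names and the extra standardized score are illustrative only.

```python
import math

def sage_label(tpm, tissue, threshold=100.0):
    """Gold-standard label from SAGE tag counts (tags per million) per tissue."""
    if tpm[tissue] > threshold:
        return "present"
    others_high = any(v > threshold for t, v in tpm.items() if t != tissue)
    if tpm[tissue] == 0 and others_high:
        return "absent"
    return None                                    # ambiguous genes are left unlabeled

def presence_score(log_intensity, log_background, sigma):
    """Sum of per-probe z-scores, plus the sum scaled by sqrt(n) (an addition here)."""
    z = [(y - b) / sigma for y, b in zip(log_intensity, log_background)]
    total = sum(z)
    return total, total / math.sqrt(len(z))
```

Sweeping a cutoff on the score and comparing calls against the SAGE labels would produce ROC curves of the kind used in this evaluation.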
Slide 27:
[Figure: ROC curves. Panels: (a) Cerebellum, (b) Heart, (c) Kidney]
ROC curves show that background correction improves A/P calls.
Red: Exon, Z-score call. Blue: Exon, Affy call. Brown: 3' Affy call, max probeset. Purple: 3' Affy call, min probeset.
Slide 28: 3rd evaluation: Cross-species conservation
- 3' and Exon array data for six adult tissues in both human and mouse
- Expression computed for about 10,000 human-mouse ortholog pairs
Slide 29: Similarity of gene expression profiles in six human tissues and six corresponding mouse tissues. For each ortholog pair we calculated the Pearson correlation coefficient (PCC) of expression indexes across six tissues (solid line). We also permuted ortholog relationships and calculated the PCC for random human-mouse gene pairs (dashed line).
[Figure panels: 3' arrays, Exon arrays]
(Xing Y, Ouyang Z, Kapur K, Scott MP, Wong WH. Mol Biol Evol. April 2007)
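The comparison described in this caption can be sketched as below, assuming two matrices of expression indexes with one row per ortholog pair and one column per matched tissue. The function names and the fixed random seed are illustrative assumptions.

```python
import numpy as np

def rowwise_pcc(a, b):
    """Pearson correlation between matching rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

def ortholog_vs_random(human, mouse, seed=0):
    """PCC for true ortholog pairs and for pairs with permuted ortholog relationships."""
    rng = np.random.default_rng(seed)
    observed = rowwise_pcc(human, mouse)
    shuffled = rowwise_pcc(human, mouse[rng.permutation(len(mouse))])
    return observed, shuffled
```

Plotting the distributions of `observed` and `shuffled` corresponds to the solid and dashed lines, respectively.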
Slide 30:
[Figure panels: 3' arrays scatter plot, Exon arrays scatter plot, 3' arrays correlations, Exon arrays correlations]
Exon arrays also reveal conservation of absolute abundance of transcripts in individual tissues!
Slide 31: 4th evaluation: q-PCR
On log scale, exon array fold change estimate is correlated with qPCR fold change (correlation 0.9)
Slide 32: 5. Issues in exon level analysis
Slide 33: Challenges
- The experimental validation rates in several published exon array studies are highly variable.
  - Gardina et al. BMC Genomics 7:325, 21%
  - Kwan et al. Genome Res 17:1210, 45%
  - Hung et al. RNA 14:284, 22-56%
  - Clark et al. Genome Biol 8:R64, 84%
- Most exons are targeted by no more than four probes. No probes for splice junctions.
- Noise in observed probe intensities (due to background, cross-hybridization) can make the inferred splicing pattern unreliable.
Slide 34: MADS: Microarray Analysis of Differential Splicing
1. Correction for background (non-specific hybridization)
References:
1. Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
2. Xing, Kapur, Wong. PLoS ONE. 2006; 1:e88
3. Xing et al., RNA, 2008, 14(8):1470-1479
Slide 35: Splicing Index = Corrected Probe Intensity / Estimated Gene Expression Level
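A minimal sketch of the splicing index as a ratio of background-corrected probe intensity to the estimated gene expression level in the same sample, with a simple log2 comparison between two conditions. The function names and the use of group means are assumptions for illustration; the statistical test used by MADS is not reproduced here.

```python
import numpy as np

def splicing_index(corrected_intensity, gene_expression):
    """Per-sample splicing index for one probe: corrected intensity / gene level."""
    return np.asarray(corrected_intensity, dtype=float) / np.asarray(gene_expression, dtype=float)

def log2_si_change(si_a, si_b):
    """log2 ratio of mean splicing index in condition B over condition A."""
    return np.log2(np.mean(si_b)) - np.log2(np.mean(si_a))
```

For example, a probe whose index drops from about 1 in control samples to about 0.25 after knockdown gives a log2 change of -2, suggesting reduced inclusion of the targeted exon relative to the gene.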
Slide 36: Analysis of gold-standard alternative splicing data via PTB knockdown experiments
- Our gold-standard: a list of exons with pre-determined inclusion/exclusion profiles in response to PTB depletion (Boutz P, et al. Genes Dev. 2007, 21(13):1636-52).
- We used shRNA to knock down PTB, generated Exon array data, and analyzed data on gold-standard exons.
- MADS detected all exons with large changes (>25%) in transcript inclusion levels, and offered improvement over Affymetrix's analysis procedure.
Collaboration with Douglas Black (UCLA)
Slide 37: MADS sensitivity correlates with the magnitude of change in exon inclusion levels of gold-standard exons
Xing et al., RNA, 2008, 14(8):1470-1479
Slide 38: Exon array detection of novel PTB-dependent splicing events
[Figure panels: control, shRNA knockdown of splicing repressor PTB]
Slide 39: Detection of alternative 3'-UTR and Poly-A sites of Ncam1
30 differentially spliced exons were tested; 27 were validated. Validation rate: 27/30 = 90%.
Slide 40: Cross-Hybridization
- Probes are designed to hybridize to their target transcripts
- Often probes have 0, 1, 2, or 3 base pair mismatches to non-target transcripts
- Cross-hyb seriously complicates exon-level analysis.
Slide 41: Mapping mismatches to probes
- 6,000,000 probes
- Each 25bp long
- 3,000,000,000bp genome sequence
- For 1-bp mismatch, a naïve search needs O(6M x 3G x 25) operations, i.e. years of CPU time
- Fast matching algorithm (by Hui Jiang) makes this feasible in hours
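To make the scale concrete, the sketch below estimates the naïve cost under an assumed rate of 10^9 base comparisons per second, and then shows one generic way to avoid the all-against-all scan: index both halves of every probe, since a window with at most one mismatch must match at least one half exactly. This pigeonhole index is a textbook technique used here for illustration; it is not a description of Hui Jiang's algorithm, and all names are hypothetical.

```python
from collections import defaultdict

def naive_cpu_years(n_probes=6_000_000, genome_len=3_000_000_000,
                    probe_len=25, ops_per_sec=1e9):
    """CPU years for comparing every probe at every genome position (assumed rate)."""
    ops = n_probes * genome_len * probe_len
    return ops / ops_per_sec / (3600 * 24 * 365)

def hits_with_one_mismatch(genome, probes, probe_len=25):
    """Return {(probe_id, position, mismatches)} for all hits with <= 1 mismatch."""
    half = probe_len // 2
    index = defaultdict(list)
    for pid, probe in enumerate(probes):           # index left and right halves
        index[("L", probe[:half])].append(pid)
        index[("R", probe[half:])].append(pid)
    hits = set()
    for pos in range(len(genome) - probe_len + 1):
        window = genome[pos:pos + probe_len]
        candidates = set(index.get(("L", window[:half]), []))
        candidates |= set(index.get(("R", window[half:]), []))
        for pid in candidates:                     # verify only the few candidates
            mismatches = sum(a != b for a, b in zip(window, probes[pid]))
            if mismatches <= 1:
                hits.add((pid, pos, mismatches))
    return hits
```

With the default arguments, `naive_cpu_years()` comes out at roughly 14 years, while the indexed search makes a single pass over the genome and verifies only candidate probes.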
Slide 42: Distribution of Number of Cross-hyb Transcripts
[Figure panels: Full Probes, Core Probes]
Slide 43: Correction of sequence-specific cross-hybridization to off-target transcripts
Slide 44: Conclusion
- Gene level index is accurate and reflects absolute abundance.
- We show that sequence-specific modeling of microarray noise (background and cross-hybridization) improves the precision of exon-level analysis of exon array data.
- Overall, our data demonstrate that exon array design is an effective approach to study gene expression and differential splicing.
- Development of future probe-rich exon arrays, with increased probe density on exons and inclusion of splice junction probes, will offer more powerful tools for global or targeted analysis of alternative splicing.