Gene Level Expression Profiling Using Affymetrix Exon Arrays

About This Presentation
Description:

Title: Slide 1
Author: Affymetrix, Inc.
Last modified by: Alan Williams
Created Date: 9/27/2004 12:57:55 AM
Document presentation format: On-screen Show

Slides: 28
Provided by: Affy5

Transcript and Presenter's Notes

Title: Gene Level Expression Profiling Using Affymetrix Exon Arrays


1
Gene Level Expression Profiling Using Affymetrix
Exon Arrays
  • Alan Williams, Ph.D., Director, Chip Design,
    Affymetrix, Inc.

2
Exon Array Design Strategy: GeneChip Human Exon
1.0 ST
  • All content is projected onto the genome
  • Content has hard edges and soft edges
      • Hard edges partition regions into multiple probe
        selection regions
      • Soft edges infer a probe selection region, but
        can be extended into a larger region by other
        content
  • Hard Edges
      • Internal splice site boundaries
      • PolyA sites
      • CDS Start and Stop Positions
  • Soft Edges
      • Transcript start and stop positions (except when
        there is evidence of a PolyA site)
      • Internal splice site boundaries for aligned cDNAs
        when there are unaligned cDNA bases
      • All splice site boundaries from syntenic cDNA
        content
  • Introducing some new concepts
      • Probe Selection Region (PSR)
      • Exon cluster
      • Transcript cluster (gene locus)
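
The hard-edge and soft-edge rules above can be illustrated with a small interval sketch. This is a simplified illustration, not the actual Affymetrix design pipeline: the function names `merge_overlapping` and `split_into_psrs`, the half-open coordinate convention, and the example coordinates are all assumptions made for the example.

```python
def merge_overlapping(intervals):
    """Soft edges: overlapping content is extended into one larger region."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # This interval overlaps (or touches) the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_into_psrs(region_start, region_end, hard_edges):
    """Hard edges: cut the half-open region [start, end) at every hard edge."""
    # Only edges strictly inside the region create a new boundary.
    cuts = sorted({e for e in hard_edges if region_start < e < region_end})
    bounds = [region_start, *cuts, region_end]
    return list(zip(bounds[:-1], bounds[1:]))


# Two overlapping pieces of content with soft edges become one region ...
regions = merge_overlapping([(100, 300), (250, 500), (700, 800)])
print(regions)  # [(100, 500), (700, 800)]

# ... which hard edges (e.g. internal splice sites) then partition into PSRs.
print(split_into_psrs(100, 500, hard_edges=[250, 400, 400, 900]))
# [(100, 250), (250, 400), (400, 500)]
```

In this toy model a hard edge always wins: once a position is marked as a hard edge, no overlapping content can merge across it.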

3
Probe Coverage: Exon vs 3' Array Gene Coverage
RefSeq
HG-U133 2.0 Plus
Human Genome 1.0 ST
4
Content Sources: GeneChip Human Exon 1.0 ST
  • Core Gene Annotations
      • RefSeq alignments
      • GenBank annotated full length alignments
  • Extended Gene Annotations
      • cDNA alignments
      • Ensembl annotations (Hubbard, T. et al.)
      • Mapped syntenic mRNA from rat and mouse
      • microRNA annotations
      • MitoMAP annotations
      • Vegagene (The HAVANA group, Hillier et al.,
        Heilig et al.)
      • VegaPseudogene (The HAVANA group, Hillier et al.,
        Heilig et al.)
  • Full Gene Annotations
      • Geneid (Grup de Recerca en Informàtica Biomèdica)
      • Genscan (Burge, C. et al.)
      • GenscanSubopt (Burge, C. et al.)
      • Exoniphy (Siepel et al.)
      • RNAgene (Sean Eddy Lab)
      • SgpGene (Grup de Recerca en Informàtica
        Biomèdica)
      • Twinscan (Korf, I. et al.)

5
Probes per RefSeq Transcript
Probes per transcript   Transcripts   Percent
> 10 Probes             19849         98.40%
> 20 Probes             18541         91.9%
> 30 Probes             15645         77.60%
> 40 Probes             12789         63.40%
> 50 Probes             9868          48.90%
HG-U133 Plus 2.0
6
Gene Level Summaries
  • With exon arrays we can combine exon-level
    probesets to obtain better gene-level estimates.
  • More probes for greater sensitivity
  • Gene level signal estimates based on expression
    throughout the locus rather than a single point
  • Simplified bioinformatics
  • More flexibility in restructuring probe groupings
    based on expert knowledge
  • There is a variety of well-established tools
    (including R/BioConductor) and methods for
    secondary analysis of gene level array data
  • Challenge
      • Non-constitutive exons
      • Discovery/Speculative content

7
Gene Level Analysis on Exon Arrays
  • Sketch Normalization (Quantile-like)
  • PM-GCBG
  • IterPLIER
      • using Extended Meta Probeset File groupings
  • Users may want to do post-summarization
    operations
      • Normalization
      • Log transform
      • Variance stabilization by adding positive bias
        (i.e., PLIER+16)
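
Two of the steps above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the slides do not spell out how sketch normalization works, so plain quantile normalization stands in for it here (ties are not handled specially), and the helper names `quantile_normalize` and `log2_offset` are hypothetical. The offset of 16 follows the positive bias mentioned on the slide.

```python
import numpy as np


def quantile_normalize(x):
    """Give every array (column) the same intensity distribution.

    x has shape (n_probes, n_arrays). Each value is replaced by the mean of
    the values that share its rank across arrays.
    """
    order = np.argsort(x, axis=0)
    ranks = np.argsort(order, axis=0)         # rank of each value in its column
    target = np.sort(x, axis=0).mean(axis=1)  # mean distribution across arrays
    return target[ranks]


def log2_offset(signal, offset=16.0):
    """Log transform after adding a positive bias to stabilize low signals."""
    return np.log2(signal + offset)


intensities = np.array([[1.0, 20.0],
                        [3.0, 10.0],
                        [2.0, 40.0]])
normalized = quantile_normalize(intensities)
print(normalized)

# A signal of 0 maps to log2(16) = 4 instead of minus infinity.
print(log2_offset(np.array([0.0, 16.0, 48.0])))  # [4. 5. 6.]
```

Normalization is applied to probe intensities before summarization, while the offset and log transform are applied to the summarized gene-level signal afterwards.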

8
Different Meta Probeset Lists
Core-Constitutive
9
IterPLIER
  • Start by generating PLIER signal estimate using
    all the probes
  • Pick 22 probes which are best correlated to the
    PLIER signal
  • Run PLIER on just the 22 probes
  • Pick 11 probes which are best correlated to the
    PLIER signal
  • Generate a final PLIER estimate with the 11
    probes
  • Corollary
      • If the meta probeset has 11 or fewer probes, then
        only 1 run of PLIER is performed and the result
        is equal to a regular PLIER result
      • If the meta probeset has more than 11 but 22 or
        fewer probes, then PLIER is run twice: once on
        the full set of probes and once on the best 11
10
Correlation of Different Gene Level Estimates
11
Adding Low-signal Decoys
Correlation with original estimates as Genscan
Subopt probesets are added (996 loci with 4-11
probesets).
Correlation with original estimates as mRNA
probesets are added (996 loci with 4-11
probesets).
Legend: Iterative PLIER, Regular PLIER
12
Gene Level Performance
  • HuEx 1.0 ST vs HG-U133 Plus 2.0

13
Platform Concordance: Probe Set Pairs vs.
Correlation Coefficient (1-way ANOVA p < 10^-8)
60% of matched probe sets have correlation ≥ 0.8
14
High Correlation: GLYAT, r = 0.9902
Log2(sig+16)
15
Moderate Correlation: TSN, r = 0.6575
16
Poor Correlation: SREBF1, r = 0.0482
17
Platform Gene Level Sensitivity
Human Exon 1.0 ST (23% overall)
Significant Probesets
HG-U133 Plus 2.0 (21% overall)
Exons
18
One Array, Two functions
  • Gene Level Expression and Transcript Diversity

19
TPM2
Heart
Muscle
20
Data Courtesy of Millennium
21
Data Courtesy of Millennium
22
Splicing Index defined
23
Splicing Index Examples
24
Alternative Splicing Detection
  • PAttern-based Correlation (PAC)
      • Test whether exons correlate with each other
  • ANOVA-based (MiDAS)
      • Test a log-linear model
  • For more information see the Alternative
    Transcript Analysis Methods for Exon Arrays
    whitepaper
  • http://www.affymetrix.com/support/technical/whitepapers/exon_alt_transcript_analysis_whitepaper.pdf

e_{i,j,k}: exon signal for the i-th probeset, tissue k, gene j
g_{j,k}: gene signal for tissue k and gene j
a_{i,k}: log coupling for exon and gene signals
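
The formula from the "Splicing Index defined" slide did not survive in this transcript. The sketch below therefore assumes the commonly described form: each exon signal is divided by its gene signal to give a normalized intensity, and the splicing index is the log2 ratio of normalized intensities between two tissues. The function names and example values are hypothetical.

```python
import numpy as np


def normalized_intensity(exon, gene):
    """Exon signal relative to the gene signal in the same tissue."""
    return exon / gene


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """log2 ratio of normalized exon intensities, tissue A versus tissue B."""
    ni_a = normalized_intensity(exon_a, gene_a)
    ni_b = normalized_intensity(exon_b, gene_b)
    return np.log2(ni_a / ni_b)


# Three exons of one gene; the gene-level signal is 400 in both tissues.
exons_a = np.array([200.0, 400.0, 800.0])
exons_b = np.array([400.0, 400.0, 800.0])

si = splicing_index(exons_a, 400.0, exons_b, 400.0)
print(si)  # [-1.  0.  0.]
```

A value near 0 means the exon tracks the gene in both tissues. The first exon has half the relative inclusion in tissue A, which gives a splicing index of -1.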
25
ROC Curves
  • PAC method not suitable for a two group data set
  • No filter on input data
  • Synthetic Data
      • Tissues: mix exons across genes
      • Cancer: mix in low expression exons

26
Alternative Splicing Detection: Active Area of
Research
  • Exon Array Workshop
  • 45 attendees
  • 11 presentations
  • New alternative splicing algorithms
  • New confidence in using Exon Arrays for
    Gene-Level expression profiling
  • New directions for filtering data for more robust
    results
  • http://www.affymetrix.com/corporate/events/2006_exon_tiling_workshop.affx

27
Resources
  • Human, Mouse, Rat array content and annotation
    information
      • Array Support Page on Affymetrix.com
  • Various Analysis Whitepapers
      • Array Support Page on Affymetrix.com
  • Sample Data Sets
      • Sample Data section under Support
      • Colon cancer data set with 10 paired samples
      • Tissue data set
          • 11 tissues in triplicate
          • 4 different mixture levels for 3 tissues
          • Includes HG-U133 Plus 2.0 and Human Exon 1.0 ST
  • Analysis Software
      • Affymetrix Power Tools (APT)
      • ExACT