Notes and statistics on base level expression

1
Notes and statistics on base level expression
May 2009
Don Gilbert
Biology Dept., Indiana University, gilbertd_at_indiana.edu
2
2007 Tile expression
DrosMel tiled by Affymetrix finds new genes (blue) and known genes (orange).
3
Precision improves 06-09
Measuring expression over gene structures: Nimblegen (08) has higher precision than Affy (06/07), and RNA-Seq (09) has higher precision than Nimblegen.
4

microarray statistics for base level expression?
5
Gene or Base expression?
  • Base-level expression (tiles, RNA-seq): calculate like gene differential expression (DE)
  • Per tile, per RNA-seq contig, or per base: treatment - control
  • Combine for tiles over gene
  • Independent (technically) observations, but biologically related
  • Increase DF and power with longer gene
  • How to combine?
  • As independent replicates: gene > (tiles, technical, bio replicates)?
  • As nested block: gene > tiles > replicates?
  • As gene average: gene mean(tiles) > replicates?
  • Compare with gene-level stats
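
A minimal sketch of three ways to combine tiles over one gene, to make the options in the list above concrete. The scores are hypothetical log2 values (4 tiles x 3 replicates), and the choice of a Welch t statistic and a paired one-sample t on tile differences is an assumption for illustration, not necessarily the test used for these slides.

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(a, b):
    """Welch t statistic for two samples (unequal variances allowed)."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))

# Hypothetical log2 scores: one row per tile along the gene, one column per replicate.
treatment = [[8.1, 8.3, 8.0], [7.2, 7.4, 7.1], [9.0, 9.2, 8.9], [6.5, 6.6, 6.4]]
control = [[7.6, 7.8, 7.5], [6.8, 7.0, 6.7], [8.4, 8.6, 8.3], [6.0, 6.2, 5.95]]

# Option 1: treat every tile x replicate value as an independent replicate.
flat_trt = [v for row in treatment for v in row]
flat_con = [v for row in control for v in row]
t_flat = welch_t(flat_trt, flat_con)

# Option 2: gene average, mean(tiles) per replicate, then test across replicates.
gene_trt = [mean(col) for col in zip(*treatment)]
gene_con = [mean(col) for col in zip(*control)]
t_gene = welch_t(gene_trt, gene_con)

# Option 3: tiles as blocks, test the per-tile (treatment - control) differences.
tile_diff = [mean(t) - mean(c) for t, c in zip(treatment, control)]
t_tiles = mean(tile_diff) / (stdev(tile_diff) / sqrt(len(tile_diff)))

print(f"flat: {t_flat:.2f}  gene average: {t_gene:.2f}  tile blocks: {t_tiles:.2f}")
```

All three use the same estimated effect (the mean treatment - control difference); they differ in what counts as noise. In this made-up data, tile-to-tile differences in baseline expression swamp the flat test, while blocking by tile removes them.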


6
Gene or Base expression?
Base-level tests find expression better than the gene average.
Base-level sensitivity: 42; gene-level sensitivity: 38. Both have specificity 37.
Sensitivity = 1 - false rejection
Specificity = 1 - false discovery
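
As a sketch of the two definitions above, the rates can be computed from counts of calls against a set of known genes. The counts below are hypothetical, and the function names are illustrative. Note that "1 - false discovery" as defined on this slide corresponds to what is more often called precision.

```python
def sensitivity(true_pos, false_neg):
    """1 - false rejection rate: share of truly expressed items that were detected."""
    return true_pos / (true_pos + false_neg)

def specificity_as_defined_here(true_pos, false_pos):
    """1 - false discovery rate: share of detections that are correct."""
    return true_pos / (true_pos + false_pos)

# Hypothetical counts, not taken from the slides.
tp, fn, fp = 30, 70, 50
sens = sensitivity(tp, fn)
spec = specificity_as_defined_here(tp, fp)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```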
7
Gene or Base expression?
DE is consistent over the gene span even though average expression changes; a gene-level measure can miss this.
Expression over gene span, treatment (red) vs control (green), with 3 replicates.
8

gene structures expression
9
Sequence normalizing?
Idea is to remove sequence (GC) effects on probe hybridization score.
TileScope: Royce TE, Rozowsky JS, and Gerstein MB (2007). Assessing the need for sequence-based normalization in tiling microarray experiments. Bioinformatics, 23, 988-997.
10
Sequence normalizing?
Sequence normalizing also removes the exon/intron signal!
Don't use it (TileScope's quantilenorm), or other sequence adjustments of expression, unless gene structure signals are included.
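
A contrived sketch of why this can happen. It assumes exon probes are GC-richer than intron probes and uses a simple per-GC-bin median centering as a stand-in for sequence normalization; this is not TileScope's actual procedure, and the probe values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical probes: (GC percent, region, hybridization score).
probes = [
    (55, "exon", 10.0), (60, "exon", 10.4), (55, "exon", 9.8), (60, "exon", 10.2),
    (35, "intron", 6.0), (40, "intron", 6.3), (35, "intron", 5.8), (40, "intron", 6.1),
]

def exon_intron_gap(rows):
    """Mean exon score minus mean intron score."""
    exon = [s for _, region, s in rows if region == "exon"]
    intron = [s for _, region, s in rows if region == "intron"]
    return mean(exon) - mean(intron)

# Group scores by 10-percent GC bin and subtract each bin's median.
by_bin = defaultdict(list)
for gc, _, score in probes:
    by_bin[gc // 10].append(score)
bin_median = {b: median(scores) for b, scores in by_bin.items()}
normalized = [(gc, region, score - bin_median[gc // 10]) for gc, region, score in probes]

gap_before = exon_intron_gap(probes)
gap_after = exon_intron_gap(normalized)
print(f"exon - intron gap before: {gap_before:.2f}, after: {gap_after:.2f}")
```

Because GC content and region are fully confounded in this toy data, centering by GC bin removes the whole exon/intron difference. Real data would be less extreme, but the direction of the effect is the point.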
11
Intron-Exon Detection
Nimblegen and Solexa tile/base expression detects
gene structure, on average, fairly well.
12
Intron-Exon Update
Newest RNA-Seq finds intron/exon structure very well (stranded RNA-Seq, modEncode Gingeras lab, March 2009).
13
Differential expression
Gene end (3') has more expression, but constant differential over gene span, on average.
[Figure labels: example genes, exons, introns]
Green is treatment, red is control. Line style shows 3 replicates of Daphnia tiled expression.
14
Diff. Expr. distributions
[Figure panels: Genes, Introns, TARs; experiments: Pred, Metal, Sex]
Introns show a null DE distribution; genes and TAR regions are wider. Use introns as baseline for statistics?
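
One way the intron baseline could be used, as a sketch: treat intron DE values as an empirical null and ask how often an intron value is at least as extreme as an observed gene value. The intron values below are hypothetical, and the add-one correction is a common convention rather than something stated in the slides.

```python
def empirical_p(stat, null_stats):
    """Two-sided empirical p-value: share of null values at least as extreme as stat."""
    extreme = sum(1 for s in null_stats if abs(s) >= abs(stat))
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (len(null_stats) + 1)

# Hypothetical intron DE values (treatment - control, log2), centered near zero.
intron_de = [-0.3, 0.1, -0.1, 0.2, 0.0, -0.2, 0.15, 0.05, -0.05]

p_strong = empirical_p(1.2, intron_de)  # gene with a large differential
p_weak = empirical_p(0.1, intron_de)    # gene within the intron spread
print(p_strong, p_weak)
```

With only nine null values the smallest attainable p-value is 0.1; a genome-wide intron set would give much finer resolution.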
15

multiple testing corrections
16
Multiple statistic tests
  • Problem: perform 20,000 tests and p-values hit the laws of chance. Pr < 0.05 can happen 1,000 times by chance (false discovery, FDR).
  • DrosMel Affy line t-tests: 2,284,383 / 5,395,023 = 0.42 Sig
  • Bonferroni (conservative): 0.03 Sig
  • Benjamini-Hochberg, p.adjust(p, "BH"): 0.35 Sig
  • qvalue(p), distribution based: 0.41 Sig
  • Storey JD and Tibshirani R, 2003. Statistical significance for genomewide studies. PNAS 100: 9440-9445
  • SAM permutation qvalue
  • However, p.adjust is meant for 100s of tests, not millions
  • DrosMel modEncode case: 1900 pairwise Affy cell line (62 cells) DE comparisons x 14,000 genes = 26,600,000 t-tests
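
A small pure-Python sketch of the two simplest corrections listed above, to show what they do to a vector of p-values. The Benjamini-Hochberg function is intended to follow the same step-up adjustment as R's p.adjust with method "BH"; the p-values are hypothetical, and q-values and SAM permutations are not covered here.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Step-up FDR adjustment: p * m / rank, made monotone from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Expected number of chance hits at Pr < 0.05 across 20,000 null tests.
expected_false_hits = 20000 * 0.05

# Hypothetical p-values for five tests.
pvals = [0.01, 0.04, 0.03, 0.005, 0.5]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print(expected_false_hits, bonf, bh)
```

Bonferroni scales every p-value by the full test count, which is why it becomes so conservative at millions of tests; the Benjamini-Hochberg adjustment scales by test count over rank, so it depends on the whole distribution of p-values.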

17
Multiple DE tests Daphnia
  • Much different corrections for experiments on the same genes
  • Daphnia DE: 3 experiments (trt - con), 25,000 genes, 3 replicates
  • Predate and Metal genes have low expression, important to detect

18
Multiple statistic tests
  • Statisticians have turned p-value corrections
    into an industry, but they are really more of a
    band-aid than a solution
  • What about false rejection (FRR, type II error)?
  • Balance errors; false rejection may be more important
  • Solution 1: test fewer, directed hypotheses
  • Solution 2: measure error rate on knowns, e.g. prediction of known genes
  • Solution 3: known null hypothesis, e.g. introns

http://www.bioconductor.org/workshops/2009/SeattleApr09/DiffExpr/
19
1900 pairwise Affy cell line DE comparisons x 14,000 genes = 26,600,000 t-tests
20
Hypotheses of interest are fewer: 100s cells x 14,000 genes, about 2 million tests
21
Summary
  • Base-level expression (tiles, rna-seq) measures
    gene expression better
  • Balances sensitivity (false rejection) with
    specificity (false discovery)
  • Base-level expression measures gene structures well
  • On average; precision is improving for individual genes.
  • Multiple test corrections are needed but problematic
  • False discovery corrections for millions of tests lead to false rejections.
  • Determine empirical error rates where possible

22
End note
  • Summary pages
  • wfleabase.org/genome-summaries/tile-expression/
  • insects.eugenes.org/species/data/dmel5/modencode/
  • Genome expression maps
  • insects.eugenes.org:8091/gbrowse/cgi-bin/gbrowse/drosmelme/
  • expression in 52 cell lines (Affy) and more precise Solexa and Nimblegen for a few cell lines
  • insects.eugenes.org:8091/gbrowse/cgi-bin/gbrowse/daphnia_pulex8/
  • expression among 4 treatment groups (sex, metal stress, biotic predator), Nimblegen

23
Differential expression
Gene models miss much expression
Known sex genes capture DE, but unknown regions
capture environmental stress expression, in
Daphnia.