Gene Set Enrichment Analysis - PowerPoint PPT Presentation


Title: Gene Set Enrichment Analysis


1
Gene Set Enrichment Analysis
  • Eugen Lounkine
  • BFB Workshop 2. / 3. April 2009

2
Problems with single gene analysis
  • Differential expression analysis of individual
    genes has four major limitations:
  • Significance threshold after correcting for
    multiple hypothesis testing might be too high
  • List of statistically significant genes might not
    be connected in a biologically meaningful way
  • Misses effects on pathways (many subtle
    expression changes can be more effective than a
    single drastic one)
  • Overlap in significant genes between studies is
    sometimes very small

3
Gene Set Enrichment Analysis
  • Calculates a score for the enrichment of a whole
    gene set rather than single genes
  • Hits are more reliable because they are
    interpreted in a biological context
  • Gene sets are knowledge-based
  • Some of the sets are manually curated
  • Chromosome locations
  • Coexpression
  • Pathways

4
GSEA methodology
  • Input
  • List of genes, sorted by correlation with one of
    two phenotypes (e.g. signal-to-noise ratio)
  • Collection(!) of gene sets
  • Calculation of Enrichment Score
  • Test for significance
  • Computes nominal p-Value
  • Adjustment for multiple hypothesis testing
  • Necessary if comparing multiple gene sets
  • Computes FDR (false discovery rate)
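
The ranking step of the input can be sketched as follows. This is a minimal illustration, not the GSEA implementation: it assumes the signal-to-noise ratio is (mean_A - mean_B) / (sd_A + sd_B), and the function names, the dictionary layout, and the toy data are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(values_a, values_b):
    """Signal-to-noise ratio of one gene: (mean_A - mean_B) / (sd_A + sd_B)."""
    # Raises ZeroDivisionError if both groups have zero spread; not handled here.
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))

def rank_genes(expression, labels):
    """Sort genes from most A-correlated to most B-correlated.

    expression : dict mapping gene name -> list of expression values per sample
    labels     : list of "A"/"B" phenotype labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: 3 genes, 4 samples (2 per phenotype)
expression = {
    "g1": [5.0, 6.0, 1.0, 2.0],   # higher in phenotype A
    "g2": [1.0, 2.0, 5.0, 6.0],   # higher in phenotype B
    "g3": [3.0, 4.0, 3.0, 4.0],   # no difference
}
labels = ["A", "A", "B", "B"]
ranked = rank_genes(expression, labels)
```

The sorted list of (gene, score) pairs is what the enrichment score calculation walks through from top to bottom.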

5
Input
6
GCT Files
7
Enrichment Score calculation
ES = max deviation from 0
Gene set genes (hits)
Leading Edge Genes
Correlation
If a gene is a hit, increase the running sum by a score weighted by its correlation;
otherwise decrease it by 1 / (number of non-hits)
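
A minimal sketch of this running sum, under stated assumptions: hit increments are |correlation|^weight normalized so that all hits together add up to 1, and the function name, argument names, and the default weight of 1.0 are choices made for this illustration only.

```python
def enrichment_score(ranked_genes, correlations, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes : gene names, sorted by correlation with the phenotype
    correlations : ranking metric for each gene, in the same order
    gene_set     : set of gene names to test
    weight       : exponent applied to |correlation| for hits (assumed 1.0)
    Returns (es, peak_index, running_sums).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(c) ** weight for c, h in zip(correlations, hits) if h)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need at least one non-hit and hits with non-zero correlation")

    running, running_sums = 0.0, []
    for c, h in zip(correlations, hits):
        if h:
            running += abs(c) ** weight / hit_total   # step up, weighted by correlation
        else:
            running -= 1.0 / n_miss                   # step down by 1 / number of non-hits
        running_sums.append(running)

    # ES is the maximum deviation from zero, keeping its sign
    peak_index = max(range(len(running_sums)), key=lambda i: abs(running_sums[i]))
    return running_sums[peak_index], peak_index, running_sums

# Hypothetical ranked list: the gene set sits at the top, so ES is positive
es, peak, sums = enrichment_score(
    ["a", "b", "c", "d", "e", "f"], [3, 2, 1, -1, -2, -3], {"a", "b"}
)
```

Because the hit steps sum to 1 and the non-hit steps sum to -1, the running sum always returns to 0 at the end of the list; the hits found up to the peak are the leading edge genes.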
8
Enrichment plots
ES = 0.43
ES = -0.45
Evenly distributed, low enrichment
9
Significance
  • Is the observed ES statistically significant or
    could it occur at random?
  • Multiple iterations with permutations of the
    phenotype labels
  • Preserves gene correlation
  • ES is calculated for each iteration
  • True score is compared to random distribution
  • p-Value = probability that a score at least this
    extreme is achieved at random
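
The permutation procedure can be sketched as below. This is an illustration under assumptions, not the GSEA code: genes are re-ranked by signal-to-noise ratio for every shuffled labelling, the observed ES is compared only against null scores of the same sign, and all names and the toy data are hypothetical.

```python
import random
from statistics import mean, stdev

def snr(values, labels):
    """Signal-to-noise ratio of one gene for a given sample labelling."""
    a = [v for v, lab in zip(values, labels) if lab == "A"]
    b = [v for v, lab in zip(values, labels) if lab == "B"]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))

def es_for_labels(expression, labels, gene_set):
    """Rank genes for the given labels, then return the enrichment score."""
    ranked = sorted(((snr(v, labels), g) for g, v in expression.items()), reverse=True)
    hit_total = sum(abs(s) for s, g in ranked if g in gene_set)
    n_miss = sum(1 for _, g in ranked if g not in gene_set)
    running, best = 0.0, 0.0
    for s, g in ranked:
        running += abs(s) / hit_total if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running          # keep the largest deviation from zero
    return best

def nominal_p_value(expression, labels, gene_set, n_perm=1000, seed=0):
    """Return (observed ES, nominal p-value, list of null ES)."""
    rng = random.Random(seed)
    observed = es_for_labels(expression, labels, gene_set)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)       # permute sample labels; each gene's values stay together
        null.append(es_for_labels(expression, shuffled, gene_set))
    # Assumption: compare only against null scores with the same sign as the observed ES
    same_sign = [e for e in null if (e >= 0) == (observed >= 0)]
    extreme = sum(1 for e in same_sign if abs(e) >= abs(observed))
    p = extreme / len(same_sign) if same_sign else float("nan")
    return observed, p, null

# Hypothetical toy data: 20 genes, 10 samples; genes g0..g4 are higher in phenotype A
data_rng = random.Random(1)
labels = ["A"] * 5 + ["B"] * 5
expression = {
    f"g{i}": [data_rng.gauss(4.0 if (i < 5 and lab == "A") else 0.0, 1.0) for lab in labels]
    for i in range(20)
}
observed, p, null = nominal_p_value(expression, labels, {f"g{i}" for i in range(5)}, n_perm=200)
```

Shuffling the sample labels rather than the genes leaves every sample's expression profile intact, which is why the gene correlation is preserved.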

10
Nominal p-Values
p = 0.024
Random ES distribution
Actual ES
11
False Discovery Rate
  • When testing multiple gene sets (which is usually
    the case), one has to
  • Normalize the enrichment scores (-> NES)
  • Correct for multiple hypothesis testing
  • The FDR is the estimated probability that a gene
    set with a given NES represents a false positive
    finding
  • FDR values < 25% are considered interesting
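
A simplified sketch of both steps, under assumptions that are not spelled out on the slide: the NES is taken as the ES divided by the mean of the same-signed null scores of that gene set, and the FDR for a given NES is estimated as the fraction of null NES at least as extreme divided by the fraction of observed NES at least as extreme. Function names and the toy numbers are hypothetical.

```python
from statistics import mean

def normalize(es, null_es):
    """NES: divide an ES by the mean of the same-signed null scores of its gene set."""
    same_sign = [abs(e) for e in null_es if (e >= 0) == (es >= 0)]
    if not same_sign or mean(same_sign) == 0:
        raise ValueError("no usable same-signed null scores for normalization")
    return es / mean(same_sign)

def fdr_q_values(observed_es, null_es):
    """Estimate an FDR value per gene set.

    observed_es : dict gene set name -> observed ES
    null_es     : dict gene set name -> list of ES from permutations
    Returns (observed NES per set, FDR per set).
    """
    obs_nes = {s: normalize(es, null_es[s]) for s, es in observed_es.items()}
    # Normalize every null score against its own gene set's null distribution
    null_nes = [normalize(e, null_es[s]) for s in null_es for e in null_es[s]]
    q = {}
    for s, nes in obs_nes.items():
        positive = nes >= 0
        null_side = [n for n in null_nes if (n >= 0) == positive]
        obs_side = [n for n in obs_nes.values() if (n >= 0) == positive]
        null_frac = sum(abs(n) >= abs(nes) for n in null_side) / len(null_side)
        obs_frac = sum(abs(n) >= abs(nes) for n in obs_side) / len(obs_side)
        q[s] = min(1.0, null_frac / obs_frac)
    return obs_nes, q

# Hypothetical scores for two gene sets with four permutations each
observed_es = {"set1": 0.8, "set2": 0.25}
null_es = {"set1": [0.2, 0.4, -0.3, 0.6], "set2": [0.3, -0.2, 0.1, 0.2]}
nes, q = fdr_q_values(observed_es, null_es)
interesting = [s for s in q if q[s] < 0.25]   # threshold from the slide
```

Normalizing first matters because gene sets of different sizes produce null distributions of different widths, so raw ES values are not directly comparable across sets.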

12
Leading Edge Analysis
13
Leading Edge Analysis
  • Analyzes multiple gene sets for common genes
  • These genes are of special interest, because many
    of the sets they belong to are enriched
  • Thus, Leading Edge Analysis makes it possible to
    go from gene sets back to individual genes that
    are of high interest
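
One way to sketch this step: take the leading edge of each enriched set as the hits found up to the ES peak (or from the peak onward for a negative ES), then count how many leading edges each gene appears in. The function names, the min_sets parameter, and the toy inputs are assumptions made for this illustration.

```python
from collections import Counter

def leading_edge(ranked_genes, gene_set, peak_index, es):
    """Hits at or before the ES peak (positive ES) or at or after it (negative ES)."""
    if es >= 0:
        window = ranked_genes[: peak_index + 1]
    else:
        window = ranked_genes[peak_index:]
    return [g for g in window if g in gene_set]

def shared_leading_edge_genes(leading_edges, min_sets=2):
    """Map each gene to the number of leading-edge subsets it occurs in.

    leading_edges : dict gene set name -> list of leading edge genes
    min_sets      : report only genes found in at least this many subsets
    """
    counts = Counter(g for genes in leading_edges.values() for g in set(genes))
    return {g: n for g, n in counts.most_common() if n >= min_sets}

# Hypothetical ranked list and two enriched gene sets with known ES peaks
ranked = ["a", "b", "c", "d", "e", "f"]
edges = {
    "set1": leading_edge(ranked, {"a", "b", "f"}, peak_index=1, es=0.7),
    "set2": leading_edge(ranked, {"b", "c"}, peak_index=2, es=0.6),
}
shared = shared_leading_edge_genes(edges)
```

Genes that recur across many leading edges are the candidates to follow up individually.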

14
Links to GSEA
  • http://www.broad.mit.edu/gsea/
  • http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats
  • http://www.broad.mit.edu/gsea/doc/GSEAUserGuideFrame.html
  • Original Paper: http://dx.doi.org/10.1073/pnas.0506580102