Hotspot algorithm - PowerPoint PPT Presentation

Description:

The standard deviation for the expected number of tags in the smaller window is ... FDR Calculations Using Random Tags. FDR(z-score = T) = # of random peaks with z =T ... – PowerPoint PPT presentation

Slides: 14
Provided by: robertt67
Transcript and Presenter's Notes

1
Hotspot algorithm
Idea: gauge enrichment of tags relative to a
local background model based on the number of
tags in a 50kb surrounding window.
chr5:131,975,056-132,012,092
2
Hotspot algorithm
Enrichment is measured as a z-score based on the
binomial distribution null model.
[Figure: a 250 bp window containing n tags, inside a 50kb window containing N tags]
Each tag in the large window is considered an
experiment, with probability of success p
(landing in the smaller window), adjusted for
uniquely mapping bases.
Given N tags in the large window, the expected number
of tags in the smaller window is
$\mu = N p$
3
Hotspot algorithm
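The standard deviation for the expected number of
tags in the smaller window is
$\sigma = \sqrt{N p (1 - p)}$
And the z-score for the observed number of tags n
in the smaller window is
$z = (n - N p) / \sigma$
Here N is the number of tags in the large window and
p is the probability that a tag lands in the smaller window.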
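The binomial z-score above can be sketched in a few lines of Python. This is only an illustration: the function name `binomial_zscore` is hypothetical, and p is assumed to be the plain ratio of window sizes (250 bp / 50kb), without the adjustment for uniquely mapping bases that the slides mention.

```python
import math

def binomial_zscore(n, N, small_bp=250, large_bp=50_000):
    """z-score of n tags in the small window, given N tags in the large one."""
    # Assumption: p is the unadjusted ratio of window sizes. The slides also
    # adjust p for uniquely mapping bases, which this sketch leaves out.
    p = small_bp / large_bp
    mu = N * p                          # expected tags in the small window
    sigma = math.sqrt(N * p * (1 - p))  # binomial standard deviation
    if sigma == 0:
        return 0.0                      # no tags in the large window
    return (n - mu) / sigma

# Example: 10 tags in the 250 bp window, 400 tags in the 50kb window.
z = binomial_zscore(n=10, N=400)
```

With these toy numbers the expected count is 2 tags, so 10 observed tags sit well above the z > 2 cutoff used for hotspots.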
4
Hotspot algorithm
hotspot
  • Each tag gets a z-score for the 250bp and 50kb
    windows centered on it.
  • A hotspot is a succession of tags within a 250bp
    window, each of whose z-scores is greater than 2.
  • The hotspot is scored with the z-score for the
    250bp window centered on those tags.
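One way to read the hotspot definition above is as a grouping of consecutive high-scoring tags. The sketch below assumes per-tag z-scores have already been computed; the name `call_hotspots` and its parameters are hypothetical, and a lone qualifying tag is treated as a run of length one, which the slides do not specify.

```python
def call_hotspots(positions, zscores, z_min=2.0, width=250):
    """Group consecutive tags with z > z_min that fit inside one window.

    positions: sorted tag coordinates on one chromosome.
    zscores:   per-tag z-scores, in the same order as positions.
    Returns a list of (first_tag, last_tag) runs.
    """
    hotspots = []
    run = []  # positions of the current run of qualifying tags
    for pos, z in zip(positions, zscores):
        qualifies = z > z_min
        if qualifies and (not run or pos - run[0] < width):
            run.append(pos)          # extend the current run
            continue
        if run:
            hotspots.append((run[0], run[-1]))  # close the finished run
        run = [pos] if qualifies else []
    if run:
        hotspots.append((run[0], run[-1]))
    return hotspots

runs = call_hotspots([100, 150, 200, 900, 1000, 1400],
                     [3.0, 4.0, 2.5, 1.0, 5.0, 6.0])
```

Each returned run would then be scored with the z-score of the 250bp window centered on it, as the last bullet describes.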

5
Examples of different kinds of hotspots
  • Monsters
  • Noisy regions

6
Shadowed hotspots
Problem: regions of very high enrichment can
inflate the background for neighboring regions,
deflating their z-scores.
Same as above, rescaled
These would be highly significant in isolation,
but are missed due to shadowing by the monster.
chr1:604,351-609,350
7
Shadowed hotspots
Solution: implement a two-pass hotspot detection
scheme.
  • Run first pass of hotspot detection
  • Delete all tags falling in the first-pass
    hotspots
  • Compute new hotspots with deleted background
  • Combine hotspots from first and second passes,
    and re-score all using the deleted background:
    all 50kb windows will only include tags from the
    deleted background.

8
Hotspots are robust to regions of duplication
Called peaks (height = z-score)
Disparate peak heights, but comparable z-scores
9
Random Tags
As a null model for doing FDR calculations, we
generate tags uniformly over the uniquely
mappable (for 27-mers) bases of the genome. We
use the same number of tags for observed and
random data.
[Figure tracks: observed tags, observed hotspots, random tags, random hotspots]
The random data still coalesce into hotspots.
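Generating the random tags can be sketched as uniform sampling over a set of mappable intervals. The interval representation (half-open `(start, end)` pairs on a single chromosome) and the name `random_tags` are assumptions made for illustration; how the uniquely mappable 27-mer positions are actually stored is not given in the slides.

```python
import bisect
import itertools
import random

def random_tags(mappable, n_tags, seed=None):
    """Draw n_tags positions uniformly from half-open mappable intervals."""
    rng = random.Random(seed)
    lengths = [end - start for start, end in mappable]
    cum = list(itertools.accumulate(lengths))  # running total of mappable bases
    tags = []
    for _ in range(n_tags):
        k = rng.randrange(cum[-1])             # index among all mappable bases
        i = bisect.bisect_right(cum, k)        # interval that contains base k
        offset = k - (cum[i] - lengths[i])     # position of k inside interval i
        tags.append(mappable[i][0] + offset)
    return sorted(tags)

# Toy example: two mappable stretches, 1000 random tags.
toy_mappable = [(0, 10), (100, 105)]
sample = random_tags(toy_mappable, 1000, seed=1)
```

Passing the number of observed tags as `n_tags` matches the slides' choice of equal tag counts for observed and random data.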
10
Properties of Random Tags
  • Still lots of hotspots!
  • 146,752 in random data with same number of tags
    as observed
  • 395,433 in observed (GM)

11
Properties of Random Tags
[Plot: average tag density vs. distance to Tx start sites]
Enriched in promoters?! (Yes, slightly, since
uniquely mappable 27-mers are enriched in
promoters.)
12
FDR Calculations Using Random Tags
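$\mathrm{FDR}(z \ge T) = \dfrac{\#\ \text{of random peaks with}\ z \ge T}{\#\ \text{of observed peaks with}\ z \ge T}$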
This is probably conservative, since numerator is
likely an overestimate of the number of false
positives in the observed data.
13
Extending to multiple cell types
  • Call a location multi-cell verified (MCV) if
    hotspot peaks from different cell types overlap
    there (after fattening peaks to 300bp).
  • Score these MCV zones with the maximum z-score
    over the cell type peaks.
  • MCV peaks are then identified by looking at the
    summed density in the zones.
  • Repeat with multiple random datasets to get
    random MCV peaks for FDR calcs.
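The first two MCV steps could be sketched as an interval merge. Several details here are assumptions: peaks are given as `(position, z-score, cell type)` tuples, fattening means a 300bp interval centered on the peak, and overlapping fattened peaks are chained into one zone. The name `mcv_zones` is hypothetical, and the summed-density step is not covered.

```python
def mcv_zones(peaks, width=300):
    """Find zones where fattened peaks from different cell types overlap.

    peaks: iterable of (position, zscore, cell_type) on one chromosome.
    Returns (start, end, max_zscore) for zones seen in at least two cell types.
    """
    half = width // 2
    # Fatten each peak to a fixed-width interval and sort by start coordinate.
    spans = sorted((pos - half, pos + half, z, cell) for pos, z, cell in peaks)
    zones = []
    current = None  # [start, end, max_z, set of cell types]
    for start, end, z, cell in spans:
        if current is not None and start < current[1]:
            current[1] = max(current[1], end)   # extend the zone
            current[2] = max(current[2], z)     # keep the maximum z-score
            current[3].add(cell)
        else:
            if current is not None:
                zones.append(current)
            current = [start, end, z, {cell}]
    if current is not None:
        zones.append(current)
    return [(s, e, z) for s, e, z, cells in zones if len(cells) >= 2]

toy_peaks = [(1000, 5.0, "GM"), (1100, 7.5, "BJ"),
             (5000, 9.0, "GM"), (5200, 4.0, "GM")]
zones = mcv_zones(toy_peaks)
```

Running the same function on peaks called from random tag sets would give the random MCV zones needed for the FDR step in the last bullet.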

chr5:131,585,550-131,597,894 (GM and BJ)