Title: Multiscale correlations in continuous genomic data
1. Multi-scale correlations in continuous genomic data
- Bob Thurman, Ph.D.
- Research Scientist
- Stamatoyannopoulos and Noble Labs
- Department of Genome Sciences
- University of Washington
- 6 January, 2008
- PSB 2008, The Big Island of Hawaii
2. Wavelet correlations
- A technique using wavelets to uncover correlations between continuous genomic datasets at multiple scales.
- Two case studies from the ENCODE project. Multiple measures of functionality. To what extent do they agree or disagree?
- DNaseI sensitivity (chromatin accessibility) vs. histone modifications.
- H3K4me2 (activating) vs. H3K27me3 (repressive)
3. Wavelet representation of continuous data
DNaseI
H3K4me2
A wavelet coefficient measures the strength of the change in the signal at a given position, when considered at a given scale. Wavelets are a tool for time-frequency analysis.
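To make the idea of a coefficient "at a given position and scale" concrete, here is a minimal sketch. It assumes a simple undecimated Haar-style filter and a plain Pearson correlation of coefficients; the talk does not say which wavelet or correlation estimator was used, and the names `haar_coeffs` and `wavelet_correlation` are hypothetical.

```python
import numpy as np

def haar_coeffs(signal, scale):
    """Undecimated Haar-style detail coefficients at one scale (in samples).

    Each coefficient is half the difference between the mean of the `scale`
    samples starting at a position and the mean of the `scale` samples just
    before it, so it is large where the signal changes at that scale.
    """
    x = np.asarray(signal, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    # Positions t where both windows [t - scale, t) and [t, t + scale) fit.
    t = np.arange(scale, len(x) - scale + 1)
    right = (csum[t + scale] - csum[t]) / scale
    left = (csum[t] - csum[t - scale]) / scale
    return 0.5 * (right - left)

def wavelet_correlation(a, b, scale):
    """Pearson correlation of two signals' coefficients at one scale."""
    wa, wb = haar_coeffs(a, scale), haar_coeffs(b, scale)
    return float(np.corrcoef(wa, wb)[0, 1])

# Synthetic stand-ins for two tracks: a shared slow component plus
# independent noise (an assumption, not ENCODE data).
rng = np.random.default_rng(0)
n = 4096
slow = np.sin(2 * np.pi * np.arange(n) / 512)
a = slow + rng.normal(size=n)
b = slow + rng.normal(size=n)

for scale in (1, 8, 128):
    print(scale, round(wavelet_correlation(a, b, scale), 3))
```

In this toy setup the two tracks look unrelated at the finest scale and strongly correlated at the coarse scale, which is the kind of scale dependence the method is meant to expose.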
4. Significance of correlations
5. Statistical significance via sampling
0/1000 samples with that enrichment (p < 0.001)
Sampling strategy based on correlations between
large chunks randomly sampled separately from
each dataset.
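The sampling strategy above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the talk: the statistic here is a plain Pearson correlation between chunks (the slide speaks of an "enrichment" that is not defined in the text), and the function name and parameters are hypothetical.

```python
import numpy as np

def chunk_sampling_pvalue(a, b, observed, chunk_len, n_samples=1000, seed=0):
    """Empirical p-value from chunks sampled separately from each dataset.

    A chunk of length `chunk_len` is drawn at an independent random offset
    from each dataset, so any positional relationship between the two is
    broken while the structure within each chunk is preserved.
    Returns (number of samples reaching `observed`, that number / n_samples).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_samples):
        i = rng.integers(0, len(a) - chunk_len + 1)
        j = rng.integers(0, len(b) - chunk_len + 1)
        r = np.corrcoef(a[i:i + chunk_len], b[j:j + chunk_len])[0, 1]
        if r >= observed:
            exceed += 1
    return exceed, exceed / n_samples

# Toy usage on unrelated noise tracks (synthetic, for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = rng.normal(size=5000)
count, p = chunk_sampling_pvalue(x, y, observed=0.9, chunk_len=200)
# With zero exceeding samples, report p < 1 / n_samples rather than p = 0.
print(count, p)
```

The smallest p-value that can be resolved is bounded by the number of samples drawn, so more samples are needed to report values below 1/1000.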
6. Sample size matters
For a 500kb region, p = 0.0004 (2/5000 exceeding samples)
7. High correlation correlates with gene density
Each point corresponds to one of the 31 ENCODE
regions of size 500kb
8. Case 2: H3K27me3 vs. H3K4me2
[Figure: scale axis, 2kb to 32kb]
9. Significance of pattern of positive and negative correlations
p < 0.001 for the observed fraction of correlation values above 0.5 and below -0.5
10. Use 2-state HMMs to identify hi/hi regions
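One way to picture the hi/hi step is a minimal sketch in which each track is decoded separately into lo/hi states and the hi states are intersected. It assumes Gaussian emissions with fixed, hand-picked parameters and Viterbi decoding; the talk does not state the emission model, how parameters were fit, or how the two tracks were combined, and the function names are hypothetical.

```python
import numpy as np

def viterbi_two_state(x, means, sd=1.0, stay=0.99):
    """Most likely lo (0) / hi (1) state path for a 2-state Gaussian HMM."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(means, dtype=float)
    # Log emission scores per position and state (shared constant dropped).
    log_emit = -0.5 * ((x[:, None] - mu[None, :]) / sd) ** 2
    log_trans = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))
    n = len(x)
    score = np.empty((n, 2))
    back = np.zeros((n, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: state i -> j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]
    # Trace back from the best final state.
    path = np.empty(n, dtype=int)
    path[-1] = score[-1].argmax()
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def segments(mask):
    """Half-open [start, end) runs where the boolean mask is True."""
    m = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    edges = np.flatnonzero(m[1:] != m[:-1])
    return list(zip(edges[::2].tolist(), edges[1::2].tolist()))

# Toy tracks with overlapping high blocks (synthetic, not ENCODE data).
k27 = np.zeros(100)
k27[20:60] = 3.0
k4 = np.zeros(100)
k4[40:80] = 3.0
hi_hi = (viterbi_two_state(k27, (0.0, 3.0)) == 1) & \
        (viterbi_two_state(k4, (0.0, 3.0)) == 1)
print(segments(hi_hi))  # → [(40, 60)]
```

Summing `end - start` over the returned segments gives the total bases covered, which is how a segment count and coverage figure can be derived from the decoded states.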
11. H3K27me3/H3K4me2 hi/hi regions
- 329 segments, covering 3425074 total bp, or approximately 10% of ENCODE.
- Over-represented GO categories
- Six transcription-related terms
- Regulation of cellular, physiological and biological processes
- Development
12. Summary
- Tool for exploring correlations between any two continuous genomic datasets
- Multiple scales explored simultaneously
- Correlation values at each position enable local, regional and global analyses.
13. Acknowledgments
- ENCODE project for funding and data
- John Stam, Bill Noble, and all my homies in the Noble and Stam labs
- Don Percival (UW) for help with wavelets
14. Scale matters
Related recent work: successful partition of Barski modifications at domain scale.
41kb scale
Input resolution