Multiscale correlations in continuous genomic data presentation

1
Multi-scale correlations in continuous genomic
data

Bob Thurman, Ph.D.
Research Scientist
Stamatoyannopoulos and Noble Labs
Department of Genome Sciences
University of Washington
6 January, 2008
PSB 2008, The Big Island of Hawaii

2
Wavelet correlations

A technique using wavelets to uncover
correlations between continuous genomic datasets
at multiple scales.
Two case studies from ENCODE project. Multiple
measures of functionality. To what extent do
they agree or disagree?
DNaseI sensitivity (chromatin accessibility) vs.
histone modifications.
H3K4me2 (activating) vs. H3K27me3 (repressive)

3
Wavelet representation of continuous data
DNaseI
H3K4me2
Wavelet coefficient measures the strength of
the change of the signal at the given position,
when considered at the given scale. A tool for
time-frequency analysis.
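As a rough illustration of what a wavelet coefficient measures, the sketch below computes undecimated Haar-style coefficients at a single scale. The Haar filter and the function name `haar_coefficients` are assumptions made for this example; the slides do not say which wavelet family or transform was used.

```python
def haar_coefficients(signal, scale):
    """Undecimated Haar-style wavelet coefficients at one scale (illustrative).

    The coefficient at position t is half the difference between the mean of
    the `scale` samples starting at t and the mean of the `scale` samples
    ending just before t, so it is large where the signal changes sharply
    when viewed at that scale. Positions without a full window on both
    sides are left at 0.0.
    """
    n = len(signal)
    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)

    coeffs = [0.0] * n
    for t in range(scale, n - scale + 1):
        right_mean = (prefix[t + scale] - prefix[t]) / scale
        left_mean = (prefix[t] - prefix[t - scale]) / scale
        coeffs[t] = (right_mean - left_mean) / 2.0
    return coeffs


# Toy signal: a step from 0 to 1 at position 8.
step = [0.0] * 8 + [1.0] * 8
coeffs = haar_coefficients(step, scale=4)
# Position with the largest absolute coefficient.
peak = max(range(len(step)), key=lambda t: abs(coeffs[t]))
```

Calling the function once per scale (for example 2, 4, 8, ...) gives one coefficient track per scale, which is the kind of position-by-scale representation the slide describes.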
4
Significance of correlations
5
Statistical significance via sampling
0/1000 samples with that enrichment (p < .001)
Sampling strategy based on correlations between
large chunks randomly sampled separately from
each dataset.
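The sampling idea can be sketched as follows: draw a chunk from each dataset independently, correlate the two chunks, and count how often the sampled value reaches the observed one. This is a minimal sketch, assuming plain Pearson correlation as the statistic and chunks the same length as the region of interest; the helper names `pearson` and `sampling_p_value` and the toy data are hypothetical, not taken from the talk.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def sampling_p_value(x, y, start, length, n_samples=1000, seed=0):
    """Compare the observed correlation over [start, start + length) with
    correlations of chunks sampled separately from each dataset.

    Returns (observed, exceed, n_samples). With exceed == 0 the result only
    bounds the p-value below 1 / n_samples.
    """
    rng = random.Random(seed)
    observed = pearson(x[start:start + length], y[start:start + length])
    exceed = 0
    for _ in range(n_samples):
        # Independent start positions break any real alignment between x and y.
        i = rng.randrange(len(x) - length + 1)
        j = rng.randrange(len(y) - length + 1)
        if pearson(x[i:i + length], y[j:j + length]) >= observed:
            exceed += 1
    return observed, exceed, n_samples


# Toy data (made up): two noise tracks that agree only in positions 500..699.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [rng.gauss(0, 1) for _ in range(2000)]
for t in range(500, 700):
    y[t] = x[t] + 0.1 * rng.gauss(0, 1)

observed, exceed, n_samples = sampling_p_value(x, y, start=500, length=200)
```

Sampling whole chunks, rather than shuffling individual positions, keeps the local autocorrelation of each track intact within a sample.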
6
Sample size matters
For 500kb region, p = 0.0004 (2/5000 exceeding
samples)
7
High correlation correlates with gene density
Each point corresponds to one of the 31 ENCODE
regions of size 500kb
8
Case 2: H3K27me3 vs. H3K4me2
[Figure: scale axis from 2kb to 32kb]
9
Significance of pattern of positive and negative
correlations
p < 0.001 for the observed fraction of correlation
values above 0.5 and below -0.5
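The statistic on this slide can be written down in a few lines. The sketch below assumes the two tails are combined into a single fraction, which the slide does not state explicitly; the correlation values and null fractions are made-up toy numbers, and `strong_fraction` is a hypothetical name.

```python
def strong_fraction(correlations, threshold=0.5):
    """Fraction of correlation values above +threshold or below -threshold."""
    above = sum(1 for r in correlations if r > threshold)
    below = sum(1 for r in correlations if r < -threshold)
    return (above + below) / len(correlations)


# Toy correlation values (made up for illustration).
observed_fraction = strong_fraction([0.8, -0.7, 0.1, 0.6, -0.2, -0.9, 0.3, 0.55])

# Fractions computed the same way on sampled data would form the null set;
# these four values are placeholders, not real results.
null_fractions = [0.25, 0.125, 0.375, 0.0]
exceed = sum(1 for f in null_fractions if f >= observed_fraction)
```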
10
Use 2-state HMMs to identify hi/hi regions
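One way such a segmentation could work is to label each track as lo or hi with a 2-state HMM and then intersect the hi runs. The sketch below is an assumption about the procedure, not the method from the talk: it uses Gaussian emissions with hand-set parameters (no training) and Viterbi decoding, and the names `viterbi_two_state` and `hi_hi_segments` are hypothetical.

```python
import math


def viterbi_two_state(obs, means=(0.0, 1.0), sd=0.5, stay=0.95):
    """Most likely lo (0) / hi (1) state path for a 2-state Gaussian HMM."""
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)

    def log_emit(x, state):
        # Gaussian log-density up to a constant shared by both states.
        return -((x - means[state]) ** 2) / (2.0 * sd * sd)

    # score[s] is the best log-probability of any path ending in state s.
    score = [math.log(0.5) + log_emit(obs[0], s) for s in (0, 1)]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_emit(x, s))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def hi_hi_segments(path_a, path_b):
    """Half-open (start, end) runs where both paths are in the hi state."""
    segments, start = [], None
    n = min(len(path_a), len(path_b))
    for i in range(n):
        both_hi = path_a[i] == 1 and path_b[i] == 1
        if both_hi and start is None:
            start = i
        elif not both_hi and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, n))
    return segments


# Toy tracks (made up): track A is hi on 5..9, track B is hi on 3..11.
track_a = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
track_b = [0.0] * 3 + [1.0] * 9 + [0.0] * 3
path_a = viterbi_two_state(track_a)
path_b = viterbi_two_state(track_b)
segments = hi_hi_segments(path_a, path_b)
```

The `stay` probability controls how readily the path switches state: values closer to 1 suppress short runs and give longer segments.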
11
H3K27me3/H3K4me2 hi/hi regions

329 segments, covering 3,425,074 total bp, or
approximately 10% of ENCODE.
Over-represented GO categories
Six transcription-related terms
Regulation of cellular, physiological and
biological processes
Development

12
Summary

Tool for exploring correlations between any two
continuous genomic datasets
Multiple scales explored simultaneously
Correlation values at each position enable local,
regional and global analyses.
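To make the local, regional and global readings concrete, the sketch below computes a correlation value at each position from a window centred there, then averages those values over a region and over the whole track. The sliding-window definition and the name `local_correlation` are assumptions for illustration; the slides do not specify how per-position values were obtained.

```python
import math


def local_correlation(u, v, half_window):
    """Pearson correlation of u and v in a window centred on each position."""
    n = min(len(u), len(v))
    out = []
    for t in range(n):
        lo, hi = max(0, t - half_window), min(n, t + half_window + 1)
        a, b = u[lo:hi], v[lo:hi]
        m = hi - lo
        mean_a, mean_b = sum(a) / m, sum(b) / m
        cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
        var_a = sum((p - mean_a) ** 2 for p in a)
        var_b = sum((q - mean_b) ** 2 for q in b)
        # A constant window has no defined correlation; report 0.0 there.
        out.append(cov / (var_a * var_b) ** 0.5 if var_a > 0 and var_b > 0 else 0.0)
    return out


# Toy tracks (made up): v follows u in the first half and mirrors it in the second.
u = [math.sin(0.7 * t) for t in range(200)]
v = u[:100] + [-value for value in u[100:]]

local = local_correlation(u, v, half_window=10)      # one value per position
regional = sum(local[20:80]) / len(local[20:80])     # average over a region
global_mean = sum(local) / len(local)                # average over the whole track
```

In this toy case the local values separate the agreeing and opposing halves, while the global average hides both, which is the motivation for looking at per-position values.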

13
Acknowledgments

ENCODE project for funding and data
John Stam, Bill Noble, and all my homies in the
Noble and Stam labs
Don Percival (UW) for help with wavelets

14
Scale matters
Related recent work: successful partition of
Barski modifications at domain scale.
[Figure panels: 41kb scale; input resolution]
