Multiscale correlations in continuous genomic data

1
Multi-scale correlations in continuous genomic data
  • Bob Thurman, Ph.D.
  • Research Scientist
  • Stamatoyannopoulos and Noble Labs
  • Department of Genome Sciences
  • University of Washington
  • 6 January 2008
  • PSB 2008, The Big Island of Hawaii

2
Wavelet correlations
  • A technique using wavelets to uncover
    correlations between continuous genomic datasets
    at multiple scales.
  • Two case studies from the ENCODE project, each
    comparing different measures of functionality.
    To what extent do they agree or disagree?
      • DNaseI sensitivity (chromatin accessibility) vs.
        histone modifications.
      • H3K4me2 (activating) vs. H3K27me3 (repressive).

3
Wavelet representation of continuous data
[Figure: wavelet representations of the DNaseI and H3K4me2 signals]
A wavelet coefficient measures the strength of the change in
the signal at a given position, considered at a given scale.
Wavelets are a standard tool for time-frequency analysis; here
genomic position plays the role of time.
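The slides do not say which wavelet or transform was used. As a minimal sketch, assuming an undecimated Haar transform (in the spirit of the MODWT), the level-j coefficient at position t is half the difference between the mean of the h = 2^(j-1) most recent bins and the mean of the h bins before them, so it is large exactly where the signal changes at that scale. The helper names `haar_detail` and `haar_decompose` are hypothetical.

```python
import numpy as np

def haar_detail(x, level):
    """Undecimated Haar detail coefficients at one level (hypothetical helper).

    W[t] = 0.5 * (mean of the h most recent bins - mean of the h bins before them),
    with h = 2**(level - 1). Positions without a full window of 2h bins are NaN.
    """
    x = np.asarray(x, dtype=float)
    h = 2 ** (level - 1)
    csum = np.concatenate(([0.0], np.cumsum(x)))  # csum[i] = sum of x[:i]
    w = np.full(x.shape, np.nan)
    t = np.arange(2 * h - 1, len(x))  # positions that have a full window
    recent = (csum[t + 1] - csum[t + 1 - h]) / h          # mean of x[t-h+1 .. t]
    older = (csum[t + 1 - h] - csum[t + 1 - 2 * h]) / h   # mean of x[t-2h+1 .. t-h]
    w[t] = 0.5 * (recent - older)
    return w

def haar_decompose(x, levels):
    """Stack detail coefficients for levels 1..levels into a (levels, len(x)) array."""
    return np.vstack([haar_detail(x, j) for j in range(1, levels + 1)])

# A single step from 0 to 1 at index 32.
step = np.array([0.0] * 32 + [1.0] * 32)
w1 = haar_detail(step, 1)
w3 = haar_detail(step, 3)
```

On this step, the level-1 coefficient peaks at index 32 and the level-3 coefficient (h = 4) peaks at index 35, both with value 0.5: because this sketch uses a backward-looking window, the peak lands at the end of the window rather than on the step itself.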
4
Significance of correlations
5
Statistical significance via sampling
0 of 1000 samples showed that enrichment (p < 0.001)
Sampling strategy: correlations are computed between large
chunks sampled at random, separately, from each dataset.
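A minimal sketch of this sampling scheme, assuming the null statistic is computed on two equal-length chunks taken at independent random offsets in each dataset, which breaks any positional pairing between them. The chunk length, the number of samples and the statistic are parameters the slides do not fix; `pearson` is only a stand-in for whatever enrichment statistic was actually used, and all function names are hypothetical. The p-value is the plain fraction k/N of null samples at least as large as the observed value, so 0 of 1000 reads as p < 0.001.

```python
import numpy as np

def pearson(u, v):
    """Stand-in statistic: Pearson correlation of two equal-length arrays."""
    return float(np.corrcoef(u, v)[0, 1])

def chunk_null_distribution(a, b, chunk_len, n_samples, stat, rng):
    """Null statistics from chunks drawn at independent random offsets in a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    null = np.empty(n_samples)
    for i in range(n_samples):
        # Offsets are drawn separately for each dataset, so the chunks are unaligned.
        start_a = rng.integers(0, len(a) - chunk_len + 1)
        start_b = rng.integers(0, len(b) - chunk_len + 1)
        null[i] = stat(a[start_a:start_a + chunk_len], b[start_b:start_b + chunk_len])
    return null

def empirical_p(observed, null):
    """k/N: fraction of null samples at least as large as the observed statistic."""
    return float(np.mean(np.asarray(null) >= observed))

# Synthetic example: two correlated tracks; the observed statistic uses aligned chunks.
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + 0.5 * rng.normal(size=5000)
observed = pearson(a[:500], b[:500])
null = chunk_null_distribution(a, b, chunk_len=500, n_samples=1000, stat=pearson, rng=rng)
p = empirical_p(observed, null)
```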
6
Sample size matters
For a 500kb region, p = 0.0004 (2 of 5000 samples exceeded
the observed value)
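Assuming these p-values are the plain fraction k/N of the N null samples that reach the observed value, the two reported figures check out, and they show that with N samples the smallest nonzero estimate is 1/N:

```latex
\hat{p} = \frac{k}{N}, \qquad
\frac{2}{5000} = 4 \times 10^{-4} = 0.0004, \qquad
k = 0,\; N = 1000 \;\Rightarrow\; p < \frac{1}{1000} = 0.001
```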
7
High correlation is associated with gene density
Each point corresponds to one of the 31 ENCODE
regions of size 500kb
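The slide shows only the scatter plot, not how each region's value was computed. As a hypothetical sketch, assume each region is summarized by the fraction of its positions whose local correlation exceeds 0.5 (the threshold used on a later slide), and that this summary is then correlated with gene density across regions. `region_high_fraction` is a hypothetical name, and the example numbers are made up.

```python
import numpy as np

def region_high_fraction(local_corr, region_labels, threshold=0.5):
    """Per-region fraction of positions with local correlation above threshold.

    NaN positions (e.g. at window edges) are ignored. Regions come back in
    sorted label order, as returned by np.unique.
    """
    corr = np.asarray(local_corr, dtype=float)
    labels = np.asarray(region_labels)
    regions = np.unique(labels)
    fractions = []
    for r in regions:
        c = corr[labels == r]
        c = c[~np.isnan(c)]
        fractions.append(np.mean(c > threshold))
    return regions, np.array(fractions)

# Made-up example: three regions of four positions each, plus one NaN in region 2.
local_corr = [0.9, 0.1, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1, np.nan]
labels = [0] * 4 + [1] * 4 + [2] * 4 + [2]
regions, frac = region_high_fraction(local_corr, labels)
gene_density = np.array([1.0, 2.0, 3.0])  # hypothetical values, one per region
r = float(np.corrcoef(frac, gene_density)[0, 1])
```

With all 31 regions, each entry of `frac` would be one point on the scatter plot.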
8
Case 2: H3K27me3 vs. H3K4me2
[Figure: correlations at the 2kb and 32kb scales]
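The slides do not spell out how a correlation value is assigned to each position. One plausible reading, sketched here as an assumption, is a Pearson correlation between the two datasets' wavelet coefficients at a single scale, computed in a window centered on each position; repeating it per scale gives one correlation track per scale, such as the 2kb and 32kb tracks in the figure. `local_correlation` and `half_window` are hypothetical.

```python
import numpy as np

def local_correlation(u, v, half_window):
    """Pearson correlation of u and v in a window centered at each position.

    Positions whose window runs off either end, contains NaN, or has zero
    variance are left as NaN.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    n = len(u)
    out = np.full(n, np.nan)
    for t in range(half_window, n - half_window):
        a = u[t - half_window:t + half_window + 1]
        b = v[t - half_window:t + half_window + 1]
        if np.isnan(a).any() or np.isnan(b).any():
            continue
        sa, sb = a.std(), b.std()
        if sa == 0 or sb == 0:
            continue
        # Covariance over product of standard deviations (both with ddof=0).
        out[t] = np.mean((a - a.mean()) * (b - b.mean())) / (sa * sb)
    return out

# Toy coefficient tracks at one scale: identical, then sign-flipped.
track = np.sin(np.linspace(0, 6 * np.pi, 200))
same = local_correlation(track, track, half_window=10)
opposite = local_correlation(track, -track, half_window=10)
```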
9
Significance of the pattern of positive and negative
correlations
p < 0.001 for the observed fraction of correlation
values above 0.5 and below -0.5
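Given a per-position correlation track, the statistic on this slide reduces to two fractions. A minimal sketch, assuming strict thresholds and that positions without a correlation value (NaN) are dropped; `tail_fractions` is a hypothetical name. Its significance would then come from the chunk-sampling scheme of slide 5, applied to this statistic.

```python
import numpy as np

def tail_fractions(local_corr, threshold=0.5):
    """Fractions of positions with correlation above +threshold and below -threshold."""
    c = np.asarray(local_corr, dtype=float)
    c = c[~np.isnan(c)]  # ignore positions with no correlation value
    return float(np.mean(c > threshold)), float(np.mean(c < -threshold))

above, below = tail_fractions([0.9, 0.6, 0.0, -0.7, np.nan])
```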
10
Use 2-state HMMs to identify hi/hi regions
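The slide does not give the HMMs' input or parameters. As a sketch under stated assumptions: one 2-state (lo/hi) HMM per mark with Gaussian emissions and fixed, hand-picked parameters (no Baum-Welch training), decoded with the Viterbi algorithm; a bin is hi/hi when both marks' decoded paths are hi, and runs of hi/hi bins form the segments. `viterbi_two_state` and `hi_segments` are hypothetical names.

```python
import numpy as np

def viterbi_two_state(x, means, sd, p_stay):
    """Most likely lo (0) / hi (1) state path for signal x under a 2-state Gaussian HMM.

    Hypothetical fixed parameters: state means, one shared emission SD, the same
    self-transition probability p_stay for both states, equal start probabilities.
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    n = len(x)
    # Gaussian log-likelihoods, dropping the constant shared by both states.
    log_emit = -0.5 * ((x[:, None] - means[None, :]) / sd) ** 2  # shape (n, 2)
    log_trans = np.log(np.array([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]))
    score = np.log(0.5) + log_emit[0]
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans  # cand[i, j]: best path in state i, then i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace the best final state back to the start.
    path = np.zeros(n, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def hi_segments(path):
    """Half-open (start, end) index pairs of the runs where path == 1."""
    padded = np.concatenate(([0], (np.asarray(path) == 1).astype(int), [0]))
    steps = np.diff(padded)
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1)
    return list(zip(starts.tolist(), ends.tolist()))

# Synthetic example: two marks whose hi blocks overlap in bins 25-34.
mark_a = np.array([0.0] * 20 + [5.0] * 15 + [0.0] * 20)
mark_b = np.array([0.0] * 25 + [5.0] * 20 + [0.0] * 10)
path_a = viterbi_two_state(mark_a, means=(0.0, 5.0), sd=1.0, p_stay=0.99)
path_b = viterbi_two_state(mark_b, means=(0.0, 5.0), sd=1.0, p_stay=0.99)
segments = hi_segments(path_a & path_b)  # hi/hi bins
```

With a bin size of b bp, total coverage would be b times the summed segment lengths, which is the kind of tally (number of segments, total bp) reported on the next slide.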
11
H3K27me3/H3K4me2 hi/hi regions
  • 329 segments, covering 3,425,074 bp in total, or
    approximately 10% of the ENCODE regions.
  • Over-represented GO categories:
      • Six transcription-related terms
      • Regulation of cellular, physiological and
        biological processes
      • Development
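The slide does not name the enrichment test or the background gene set. A common choice, sketched here as an assumption, is a one-sided hypergeometric test per GO term: of n_total background genes (for example, all genes in the ENCODE regions), n_category carry the term, n_selected overlap hi/hi segments, and k of those carry the term. The function name is hypothetical, and in practice the p-values would also need correction for testing many GO terms.

```python
from math import comb

def hypergeom_enrichment_p(k, n_category, n_selected, n_total):
    """One-sided P(X >= k) for X ~ Hypergeometric(n_total, n_category, n_selected).

    X counts genes carrying the GO term among n_selected genes drawn without
    replacement from n_total genes, n_category of which carry the term.
    """
    upper = min(n_category, n_selected)
    if k > upper:
        return 0.0
    # math.comb returns 0 when asked to choose more items than exist.
    tail = sum(
        comb(n_category, i) * comb(n_total - n_category, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_selected)
```

For instance, `hypergeom_enrichment_p(5, 5, 5, 10)` is 1/252: the only way to get all 5 annotated genes is to draw exactly those 5 out of C(10, 5) = 252 equally likely sets.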

12
Summary
  • Tool for exploring correlations between any two
    continuous genomic datasets
  • Multiple scales explored simultaneously
  • Correlation values at each position enable local,
    regional and global analyses.

13
Acknowledgments
  • ENCODE project for funding and data
  • John Stam, Bill Noble, and all my homies in the
    Noble and Stam labs
  • Don Percival (UW) for help with wavelets

14
Scale matters
Related recent work: successful partition of the Barski et al.
histone modifications at the domain scale.
[Figure panels: 41kb scale; input resolution]