Title: ENCODE Chromatin Replication Subgroup
1ENCODE Chromatin / Replication Subgroup Summary
of Preliminary Analyses July 18, 2005
2Major Questions
How can chromatin/replication analysis deepen
and extend the current annotation of the human
genome? How can large-scale chromatin/replication
analyses illuminate the biology of human gene
regulation and relate it to the physical
organization of the genome?
3Overall Goals
Chromatin Accessibility (DNaseI)
Histone Modifications
Genes / Transcription
1° Sequence Features
Conservation
Replication
4The ENCODE Chromatin/Replication Workshop Team
Shamil Sunyaev (Harvard) Conservation stats
Chris Spencer (Oxford) Recomb hotspot stats
Rob Andrews (Sanger) Histone mods
Greg Crawford (NHGRI) HS stats
Jay Greenbaum (BU) ORCHID stats
Terry Furey (Duke) CpG effects
Scot Kuehn (U Wash) Annotation pipeline
Ian Dunham (Sanger) Histone mods
Chris Taylor (UVa) Replication stats
Ankit Malhotra (UVa) Replication stats
Anindya Dutta (UVa) Replication stats
John Stam (Regulome) DNaseI stats
Bob Thurman (U Wash) Wavelet correlation pipeline
Mike Hawrylycz (Regulome/AIBS) Mutual information correlations
5The Data Sets
DNaseI sensitivity / hypersensitivity (UW/Regulome, NHGRI)
Histone modifications (Sanger)
DNA Replication (UVa)
Transcription (Affy and Yale)
OH radical cleavage prediction (BU)
Recombination rate (Oxford)
Gencode (ENCODE Genes & Transcripts group/Havana)
6Specific Aims and Approach
- Define the union/intersection of major experimental data sets
  and genomic features/annotations
  - Approach: Genomic feature annotation pipeline (>30 features)
- Segment continuous data types using a standardized approach
  - Approach: 2-, 3-, and 4-state HMM segmentation pipeline
- Examine the short- and long-range correlations between
  major experimental data sets and genomic features
  - Approaches: Wavelet heatmap and correlation pipeline
  - Mutual information correlation pipeline
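As a rough illustration of the HMM segmentation aim, the sketch below labels a continuous signal with a 2-state HMM using Viterbi decoding. It is not the subgroup's pipeline: the function name `viterbi_segment`, the Gaussian emissions, the fixed state means, `stdev`, and `stay_prob` are all assumptions chosen for the example, and no parameter training is shown.

```python
import math

def viterbi_segment(signal, means, stdev=1.0, stay_prob=0.95):
    """Label each value of `signal` with its most likely hidden state.

    Hypothetical setup: Gaussian emissions with one mean per state and a
    shared stdev; every state keeps itself with probability `stay_prob`.
    Requires a non-empty signal and at least two states.
    """
    n_states = len(means)
    stay = math.log(stay_prob)
    switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emit(x, s):
        # Gaussian log-density up to a constant shared by all states
        return -0.5 * ((x - means[s]) / stdev) ** 2

    # Uniform prior over the starting state (a constant, so omitted)
    score = [log_emit(signal[0], s) for s in range(n_states)]
    back = []
    for x in signal[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda p: score[p] + (stay if p == s else switch),
            )
            trans = stay if best_prev == s else switch
            new_score.append(score[best_prev] + trans + log_emit(x, s))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Synthetic signal: low, then high, then low again
labels = viterbi_segment([0.1, -0.2, 0.0, 5.1, 4.9, 5.2, 0.1, 0.0], means=[0.0, 5.0])
print(labels)
```

Adding a third or fourth entry to `means` gives the 3- and 4-state variants under the same assumptions.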
7Distribution of DNaseI HSs vs. TSS in Different
Gene Annotations
8DNaseI HSs, CpG islands, and Conservation
CpG
CNSs
HSs
9-22 (depending on definition of CNS)
25-64 (depending on definition of CpG island)
9Sorting Out CpG Effects
Histone Mods
Tissue HSs
Gene Annotations
Transfrags TARs
10Functionality of CpG islands
conservative criteria
11Frequency of different histone codes at TSSs
H4Ac H3Ac H3K4Me3 H3K4Me2 H3K4Me1
12Histone H3 Methylation Code at Transcription
Start Sites
13Histone H3 Methylation Combinatorial Code for
DNaseI HSs
14DNA Replication Chromatin
15Replication dynamics of chromosomes
Time
16TR50 Calculation
TR50 - Time at which 50% of the locus is
replicated. In the example below, probe A has
a TR50 of 1.25hr (80% at 2hr, 0% at 0hr); probe
B has a TR50 of 6.33hr (100% at 8hr, 40% at 6hr).
Example: probe A and probe B (figure)
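The two probe values in the example are consistent with linear interpolation between the time points that bracket 50% replication. The sketch below reproduces them under that assumption; the function name `tr50` and the (hour, percent) input layout are illustrative, not taken from the slides.

```python
def tr50(timepoints):
    """Return the time (hr) at which 50% of a locus is replicated.

    `timepoints` is a list of (hour, percent_replicated) pairs sorted by hour.
    Linear interpolation between bracketing points is an assumption inferred
    from the worked example, not a documented method.
    """
    for (t0, p0), (t1, p1) in zip(timepoints, timepoints[1:]):
        if p0 <= 50 <= p1 and p1 > p0:
            return t0 + (50 - p0) / (p1 - p0) * (t1 - t0)
    raise ValueError("replication never crosses 50% in the given time points")

probe_a = [(0, 0), (2, 80)]    # 0% at 0hr, 80% at 2hr
probe_b = [(6, 40), (8, 100)]  # 40% at 6hr, 100% at 8hr
print(round(tr50(probe_a), 2))  # 1.25
print(round(tr50(probe_b), 2))  # 6.33
```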
17- TR50 improves the analysis of the data
  - For segregating pan-S and various temporally specific segments
  - For defining chromatin domains
  - For predicting origins
18Specific Classification
- Within a specific region, we classify sub-regions
  as early, mid, or late based on the average TR50
- For ENCODE regions
  - 23% Pan-S
  - 77% Specific
    - 35% Early
    - 38% Mid
    - 27% Late
Taylor, Malhotra
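A minimal sketch of the early/mid/late call from an average TR50 follows. The slides do not state the cutoffs, so `early_cutoff` and `late_cutoff` are hypothetical placeholders, as is the function name `classify_subregion`; the pan-S versus specific distinction is not modeled here.

```python
from statistics import mean

def classify_subregion(probe_tr50s, early_cutoff=4.0, late_cutoff=6.0):
    """Classify a sub-region by the average TR50 (hr) of its probes.

    The cutoffs are hypothetical; the slides do not give the values used.
    """
    avg = mean(probe_tr50s)
    if avg < early_cutoff:
        return "early"
    if avg < late_cutoff:
        return "mid"
    return "late"

print(classify_subregion([1.25, 2.0]))   # early
print(classify_subregion([4.5, 5.5]))    # mid
print(classify_subregion([6.33, 7.5]))   # late
```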
19Examples
ENm005
ENm012
20TR50 for defining chromosomal domains
21ENm005 - Temporal Profile of Replication
ENm005 Replicates with Possible origins
(figure labels: A, B, C, D, E, F, G, H, I, J, K, L)
22Time of replication confirmed by interphase FISH
Replicated
Unreplicated
23Confirmation of replication time by interphase
FISH
Time points: 0hr, 2hr, 4hr, 6hr, 8hr, 10hr
(figure labels: early, late)
Karnani
24Chromosomal domain transcription
25TR50 for defining origins
26Sequence features of known metazoan replicators
(figure: six elements labeled IR)
27
     Conf. Start   Conf. End   Difference   Avg. Pred.
A    5,065,180     5,092,935   27,755       5,082,768
B    5,178,000     5,219,000   41,000       5,178,000-5,219,000
C    5,263,570     5,292,750   29,180       5,271,110
D    5,366,290     5,460,645   94,355       5,399,952
E    5,543,905     5,568,880   24,975       5,558,825
F    5,650,750     5,681,275   30,525       5,667,011
(figure labels: A, B, C, D, E, F)
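The table values can be sanity-checked with a few lines of arithmetic: each Difference should equal Conf. End minus Conf. Start, and each single-valued Avg. Pred. can be tested for falling inside the Conf. Start to Conf. End interval. The sketch below does both; the dictionary name `ORIGINS` and the helper `check_row` are illustrative, and row B is given `None` because the slide lists a range rather than a single value.

```python
# Rows copied from the table: (Conf. Start, Conf. End, Difference, Avg. Pred.)
ORIGINS = {
    "A": (5065180, 5092935, 27755, 5082768),
    "B": (5178000, 5219000, 41000, None),  # slide gives a range, not one value
    "C": (5263570, 5292750, 29180, 5271110),
    "D": (5366290, 5460645, 94355, 5399952),
    "E": (5543905, 5568880, 24975, 5558825),
    "F": (5650750, 5681275, 30525, 5667011),
}

def check_row(start, end, difference, avg_pred):
    """Return (difference matches end - start, prediction inside interval or None)."""
    length_ok = (end - start) == difference
    inside = None if avg_pred is None else start <= avg_pred <= end
    return length_ok, inside

for label, row in ORIGINS.items():
    print(label, check_row(*row))
```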
28LCR
29- Correlation of replication dynamics with
  - DNA features
  - Chromatin features
30High gene-density correlates with early
replication
TR50
Gene Density (50 kb window)
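One way to obtain a gene density track for comparison with TR50 is to count genes in fixed 50 kb windows. The sketch below assumes density means the number of gene start positions per window; the slide does not define it, and the function name `gene_density` and its inputs are hypothetical.

```python
def gene_density(gene_starts, region_start, region_end, window=50_000):
    """Count gene start positions in consecutive windows across a region.

    Assumes half-open coordinates [region_start, region_end) and that
    'density' means gene starts per window (an assumption for this sketch).
    """
    n_windows = (region_end - region_start + window - 1) // window
    counts = [0] * n_windows
    for pos in gene_starts:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // window] += 1
    return counts

# Synthetic positions, not real annotation
print(gene_density([10, 49_999, 50_000, 120_000], 0, 150_000))  # [2, 1, 1]
```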
31High AT content correlates with late replication
TR50
AT Content
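AT content can be computed per window directly from sequence before comparing it with TR50. The sketch below is a minimal version assuming non-overlapping windows; the function name `at_content` and the window handling are illustrative choices, and any base other than A or T (including N) counts as non-AT.

```python
def at_content(sequence, window):
    """Fraction of A/T bases in consecutive non-overlapping windows.

    A trailing partial window is dropped (an assumption for this sketch).
    """
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        fractions.append(sum(base in "AT" for base in chunk) / window)
    return fractions

print(at_content("ATGCAATT", 4))  # [0.5, 1.0]
```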
32Predicted origins distributed equally in all
temporal segments
33DNaseI hypersensitive sites correlate with early
replication (and with pan-S replication)
Taylor, Malhotra, Stam, Crawford, Collins, Kuehn,
Noble
34DNaseI hypersensitive sites correlate with early
replication (and with pan-S replication)
35Histone modification marks correlate with early
replication (and pan-S)
Taylor, Malhotra, Dunham, Stam, Kuehn, Noble
36Are HeLa cell replication dynamics saying
something general about DNA/chromatin structure
across cell lines?
Why is pan-S replication correlated with DNaseI
hypersensitive sites, histone modifications, and
recombination hot spots?
Do predicted origins correlate with something:
proximity to genes, motifs, MCS, etc.?
37Chromatin Conservation
38Conservation Patterns in CpG Islands
39Conservation Patterns in HSs and HS-CpG Islands
40Region-specific Variation in Conservation Patterns
CpG
-CpG
ENm006
ENm010
Without CpG islands
With CpG islands
41Correlating Chromatin Features
42Visualizing and quantifying higher-order
chromatin features using wavelet analysis
DNaseI Sensitivity
Scale of feature (kb)
1Mb
Wavelet analyses allow simultaneous visualization
of features and the scale over which they occur
43Visualizing and quantifying higher-order
chromatin features using wavelet analysis
DNaseI Sensitivity
Gencode annotation
Scale of feature (kb)
1Mb
Wavelet analyses allow simultaneous visualization
of features and the scale over which they occur
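To make the idea of separating a signal by scale concrete, the sketch below computes a Haar wavelet pyramid, the simplest wavelet decomposition. The slides do not say which wavelet the pipeline uses, so the Haar choice, the unnormalized coefficients, and the function names `haar_step` and `haar_pyramid` are assumptions.

```python
def haar_step(values):
    """One level of an unnormalized Haar transform: pairwise means and half-differences."""
    pairs = list(zip(values[0::2], values[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail

def haar_pyramid(signal):
    """Return (detail coefficients per scale, finest first; overall mean).

    The signal length must be a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    levels = []
    current = list(signal)
    while len(current) > 1:
        current, detail = haar_step(current)
        levels.append(detail)
    return levels, current[0]

levels, overall_mean = haar_pyramid([1, 3, 5, 7])
print(levels)        # [[-1.0, -1.0], [-2.0]]
print(overall_mean)  # 4.0
```

Each entry of `levels` describes variation at one scale, so plotting coefficient magnitude by position and level gives a heatmap of feature versus scale.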
44Chromatin accessibility (DNaseI) vs. Histone
modifications
DNaseI
H3K4Me3
H3K4Me2
H3K4Me1
45Chromatin accessibility (DNaseI) vs. Histone
modifications
DNaseI
H3K4Me3
H3Ac
H4Ac
46Chromatin accessibility (DNaseI) vs. Histone
modifications
DNaseI
H3K4Me3
Correlation coefficient (-1 to 1)
H3K4Me2
H3K4Me1
Scale over which correlation occurs (kb)
47Cross-Correlating Experimental and Genomic
Feature Sets
48Wavelet correlations over 72 experimental/feature
pairs
Ubiquitous correlations (DNaseI vs. Histone Mods)
Region-specific correlations (high correlation is
specific to subset of ENCODE regions)
correlation coefficient > 0.7 (at one or more
scales)
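A scale-by-scale correlation between two tracks can be sketched by correlating their wavelet detail coefficients at each level and flagging levels above the 0.7 cutoff. This is only an illustration of the idea: the Haar wavelet, the Pearson coefficient, the `min_coeffs` limit, and the function names are assumptions, not the pipeline's documented method. Signal lengths are assumed to be equal powers of two.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def haar_details(signal):
    """Unnormalized Haar detail coefficients per level, finest first."""
    levels, current = [], list(signal)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        levels.append([(a - b) / 2 for a, b in pairs])
        current = [(a + b) / 2 for a, b in pairs]
    return levels

def scalewise_correlation(x, y, min_coeffs=4):
    """Return [(scale in samples, correlation)] for levels with enough coefficients."""
    out = []
    for level, (dx, dy) in enumerate(zip(haar_details(x), haar_details(y))):
        if len(dx) >= min_coeffs:
            out.append((2 ** (level + 1), pearson(dx, dy)))
    return out

# Synthetic tracks: y is a scaled, shifted copy of x
x = [0, 1, 0, 1, 0, 3, 0, 3, 2, 0, 2, 0, 5, 1, 5, 1]
y = [2 * v + 1 for v in x]
for scale, r in scalewise_correlation(x, y):
    flag = "high" if r > 0.7 else ""
    print(scale, r, flag)
```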
49Negative correlation between recombination rate
and chromatin accessibility
P < 0.005
50Mutual Information (MI) correlation analysis:
In progress
DNaseI
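For orientation, mutual information between two tracks can be estimated by binning both signals and comparing the joint histogram with the product of the marginals. The sketch below is a plain histogram estimator under assumed equal-width bins; the function name `mutual_information` and the `bins` parameter are illustrative, and the analysis in progress may use a different estimator.

```python
import math
from collections import Counter

def mutual_information(x, y, bins=4):
    """Histogram estimate of mutual information (bits) between equal-length signals."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )

print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1], bins=2))  # 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2))  # 0.0
```

Unlike a correlation coefficient, this measure also picks up non-linear dependence, at the cost of sensitivity to the bin count.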
51Next Steps
- Unraveling region-specific correlations: What
  features are driving the correlation?
- Patterns of conservation in DNaseI HSs:
  Regional or focal?
- Quantification of chromatin domains/features
  correlation with the gene/transcript annotation
- Needed
  - Further methods development (segmentation,
    correlation)
  - More experimental data from the same cell
    type(s)!