Title: Comparative sequence analysis for ENCODE functional elements
1. Comparative sequence analysis for ENCODE functional elements
- Shamil Sunyaev
- Genetics Division, Brigham and Women's Hospital,
- Harvard Medical School
- Harvard-M.I.T. Division of HST
2. Features of a good score
- Reflects phylogeny
- Based on a good model of evolution/mutation
- Robust to missing information
3. General strategy
- Reject the null model of neutrality
- Develop a score that partitions alignment columns based on their deviation from neutrality
- Compute a p-value for the neutrality of each site based on the heuristic score
4. Probabilistic conservation score
- The score is position-specific
- The score is based on a complex mutation model
- The score is expressed as a p-value for the null hypothesis that a given nucleotide position evolves neutrally
5. Determining mutation rates through phylogenetic inference
human   GATCTATGGTGTGGGTTTGCATAACAGGAGAGGA
chimp   GATCCATGGTGTGGGTTTGCATAATAGGAGAGGA
baboon  GTTCCATGGTGTGGGTTTGCATAATAGGGGAGGA
Probability of T→C: 1/10; probability of C→T: 1/3
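The two rates above can be reproduced from the three-sequence alignment with a simple counting rule. The sketch below assumes that rule; it is not necessarily the author's procedure. A human-lineage change X→Y is called where chimp and baboon agree on X and human shows Y, and the count is divided by the number of X sites in the baboon (outgroup) sequence. The function name `human_lineage_rates` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def human_lineage_rates(human, chimp, outgroup):
    """Estimate per-base substitution probabilities on the human lineage.

    Assumed counting rule (hypothetical): a change X->Y is inferred where
    chimp and the outgroup both carry X and human carries Y; the denominator
    is the number of X sites in the outgroup sequence.
    """
    assert len(human) == len(chimp) == len(outgroup)
    changes = Counter()
    for h, c, o in zip(human, chimp, outgroup):
        if c == o and h != c:  # ancestral state supported by both relatives
            changes[(c, h)] += 1
    base_counts = Counter(outgroup)
    return {(x, y): Fraction(n, base_counts[x]) for (x, y), n in changes.items()}

human = "GATCTATGGTGTGGGTTTGCATAACAGGAGAGGA"
chimp = "GATCCATGGTGTGGGTTTGCATAATAGGAGAGGA"
baboon = "GTTCCATGGTGTGGGTTTGCATAATAGGGGAGGA"

rates = human_lineage_rates(human, chimp, baboon)
print(rates)  # {('C', 'T'): Fraction(1, 3), ('T', 'C'): Fraction(1, 10)}
```

On a real genome the same tally would run over millions of aligned sites, so the estimates are far less noisy than in this 34-base toy.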
6. [Figure: Human AGC, Chimp ACC, Baboon AGC; inferred ancestor AGC, G→C substitution on the chimp lineage]
Mutation rates are modeled as asymmetric and context-specific. The model incorporates insertions and deletions.
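A minimal sketch of context-specific, asymmetric counting, assuming the ancestral sequence is already known (AGC in the example above). The hypothetical `context_rates` keys each change by its flanking bases and its direction, so G→C between A and C is tallied separately from C→G. Insertions and deletions, which the model does include, are not handled in this sketch.

```python
from collections import Counter

def context_rates(ancestor, descendant):
    """Substitution frequencies keyed by (left, ancestral, right, derived).

    Each rate is the number of changes of the middle base in a given
    trinucleotide context divided by how often that trinucleotide occurs
    in the ancestral sequence. Indels are ignored here.
    """
    assert len(ancestor) == len(descendant)
    # How often each trinucleotide context occurs in the ancestor.
    contexts = Counter(ancestor[i - 1:i + 2] for i in range(1, len(ancestor) - 1))
    changes = Counter()
    for i in range(1, len(ancestor) - 1):
        if ancestor[i] != descendant[i]:
            changes[(ancestor[i - 1], ancestor[i], ancestor[i + 1], descendant[i])] += 1
    return {key: n / contexts["".join(key[:3])] for key, n in changes.items()}

# Slide example: ancestral AGC becomes ACC on the chimp lineage.
print(context_rates("AGC", "ACC"))  # {('A', 'G', 'C', 'C'): 1.0}
```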
- Scores
- Empirical score: $\ln(l_{\mathrm{cons}}/l_{\mathrm{mut}})$
- ML estimate of the fixation probability
- Log-likelihood ratio score: $\ln(l_{\mathrm{ML}}/l_{\mathrm{neutral}})$
- P-value
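To make the log-likelihood ratio score concrete, here is a toy version under an assumed Poisson model that is not taken from the talk. The number of substitutions k at a site is treated as Poisson with mean f·m, where m is the expected count under neutrality and f is a relative fixation probability (f = 1 is neutral). The ML estimate of f is capped at 1 so that only conservation raises the score. The function name `llr_score` is hypothetical.

```python
import math

def llr_score(k, m):
    """Toy ln(L_ML / L_neutral) for k observed substitutions at one site.

    Assumption: k ~ Poisson(f * m), with m the neutral expectation and f a
    relative fixation probability. The ML estimate k / m is capped at 1.
    """
    f_hat = min(k / m, 1.0)

    def log_lik(f):
        lam = f * m
        # Poisson log-likelihood without ln(k!), which cancels in the ratio.
        return (k * math.log(lam) if k > 0 else 0.0) - lam

    return log_lik(f_hat) - log_lik(1.0)

print(llr_score(0, 2.0))  # 2.0: no substitutions where 2 were expected
print(llr_score(3, 2.0))  # 0.0: at least as many changes as expected
```

The score alone is not a p-value; one still needs its distribution under neutrality, for example by simulation.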
7. Extending substitution rates to other species
Instantaneous rate matrix $Q$ of nucleotide substitutions; substitution probabilities after time $t$: $P(t) = e^{Qt}$
- Ignores mutation rate heterogeneity
- Assumes uniformity between species
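$P(t) = e^{Qt}$ has to be evaluated numerically for a general (here asymmetric) rate matrix. The sketch below uses a pure-Python scaling-and-squaring Taylor approximation so that it has no dependencies; in practice `scipy.linalg.expm` computes the same matrix exponential. The entries of `Q` are made up for illustration, and `transition_matrix` is a hypothetical name.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transition_matrix(q, t, terms=20, squarings=10):
    """Approximate P(t) = exp(Q t) by scaling and squaring a Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]  # A = Q t / 2^s
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # exp(A)^(2^s) = exp(Q t)
    return result

# Hypothetical asymmetric rate matrix over A, C, G, T (each row sums to zero).
Q = [
    [-0.30, 0.05, 0.20, 0.05],
    [0.05, -0.40, 0.05, 0.30],
    [0.30, 0.05, -0.40, 0.05],
    [0.05, 0.20, 0.05, -0.30],
]
P = transition_matrix(Q, t=0.5)
for row in P:
    print([round(x, 4) for x in row])  # each row is a probability distribution
```

The two caveats on the slide show up directly here: one `Q` is applied at every site and on every branch.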
8. Computing a heuristic score
[Figure: example alignment column G, G, T, G, T]
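The slide does not spell out the heuristic score. As a stand-in only, the sketch below scores a column by its Fitch parsimony count, the minimum number of substitutions on a fixed tree, so lower means more conserved. Both the scoring choice and the five-species topology are hypothetical.

```python
def fitch(tree, column):
    """Fitch small parsimony on a rooted binary tree.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    column: dict mapping leaf name -> base.
    Returns (candidate ancestral bases, minimum number of substitutions).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch(tree[0], column)
    right_set, right_cost = fitch(tree[1], column)
    common = left_set & right_set
    if common:  # children can agree: no extra substitution needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical topology; leaves carry the column from the slide.
leaves = ["sp1", "sp2", "sp3", "sp4", "sp5"]
tree = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))
column = dict(zip(leaves, "GGTGT"))
print(fitch(tree, column)[1])  # 2
```

A likelihood-based score under the context-dependent model would weight substitutions by their rates; plain parsimony counts them all equally.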
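9. Computing a p-value
[Figure: alignment column G, C, T, C, T]
$P(X \le S) = n(x \le S)/N$: the fraction of $N$ columns generated under the neutral model whose score does not exceed the observed score $S$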
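The empirical p-value can be sketched by simulating N columns under a neutral model and counting how many score at or below the observed S. Assumptions, all hypothetical: the parsimony count serves as the score, a uniform per-branch substitution probability `p_sub` replaces the context-dependent model, and the tree topology is made up.

```python
import random

BASES = "ACGT"

def parsimony_score(tree, column):
    """Fitch parsimony count, used here as a stand-in heuristic score."""
    def walk(node):
        if isinstance(node, str):
            return {column[node]}, 0
        (ls, lc), (rs, rc) = walk(node[0]), walk(node[1])
        common = ls & rs
        if common:
            return common, lc + rc
        return ls | rs, lc + rc + 1
    return walk(tree)[1]

def simulate_neutral_column(tree, p_sub, rng):
    """Evolve one site from a random root base down every branch of the tree."""
    column = {}
    def walk(node, base):
        if isinstance(node, str):
            column[node] = base
            return
        for child in node:
            child_base = base
            if rng.random() < p_sub:  # substitution on the branch to this child
                child_base = rng.choice([b for b in BASES if b != base])
            walk(child, child_base)
    walk(tree, rng.choice(BASES))
    return column

def empirical_p_value(observed, tree, p_sub, n_sim=10000, seed=1):
    """Estimate P(X <= S) as n(x <= S) / N from N neutral simulations."""
    rng = random.Random(seed)
    hits = sum(
        parsimony_score(tree, simulate_neutral_column(tree, p_sub, rng)) <= observed
        for _ in range(n_sim)
    )
    return hits / n_sim

tree = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))  # hypothetical topology
print(empirical_p_value(0, tree, p_sub=0.2))  # chance a neutral column is fully conserved
```

With N simulations the smallest non-zero p-value is 1/N, so N must be large when very strong conservation has to be resolved.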
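10. Identifying blocks of conserved positions
We can transform a site-specific score into a method for identifying conserved regions:
$S_i = \sum_{j=0}^{i} \left(-\ln p_j + \ln t\right)$, where $p_j$ is the p-value of position $j$ and $t$ is a p-value threshold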
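Each site contributes $-\ln p_j + \ln t = \ln(t/p_j)$, which is positive when $p_j < t$ and negative otherwise, so conserved blocks are stretches where the cumulative sum rises. The sketch below extracts such stretches with a running sum that is reset to zero whenever it drops to zero or below, a local-segment variant that is our assumption rather than the slide's exact procedure. The function name `conserved_blocks` is hypothetical.

```python
import math

def conserved_blocks(p_values, t):
    """Segments where the running sum of ln(t / p_j) stays positive.

    Assumed variant: the running total is reset to zero when it drops to
    zero or below, and the highest-scoring prefix of each positive run is
    reported. Returns (start, end, score) with 0-based inclusive positions.
    """
    blocks = []
    running, start = 0.0, 0
    best, best_end = 0.0, -1
    for i, p in enumerate(p_values):
        running += math.log(t) - math.log(p)  # -ln(p_j) + ln(t)
        if running > best:
            best, best_end = running, i
        if running <= 0.0:
            if best > 0.0:
                blocks.append((start, best_end, best))
            running, best, best_end = 0.0, 0.0, -1
            start = i + 1
    if best > 0.0:
        blocks.append((start, best_end, best))
    return blocks

p = [0.5, 0.01, 0.02, 0.5, 0.5, 0.5, 0.001]
print([(s, e) for s, e, _ in conserved_blocks(p, t=0.1)])  # [(1, 2), (6, 6)]
```

The threshold t sets the trade-off: a larger t admits weaker sites into blocks, and a smaller t splits blocks more readily.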
12. Analysis of ENCODE tracks
- Distributions of the score for individual positions
- Element- and region-specific p-values
- Overlaps of functional regions with highly conserved regions (also for PhastCons, BinCons and GERP)
- Length/conservation distribution
- Conservation only in a subset of the phylogeny
- Alignability of regions
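For the overlap analysis, the basic quantity is the number of bases shared by two sets of genomic intervals. A minimal sketch with hypothetical coordinates and function names, using half-open [start, end) intervals on a single chromosome; on real BED files, `bedtools intersect` performs this kind of operation.

```python
def merge(intervals):
    """Merge overlapping half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlap_bp(a, b):
    """Number of bases covered by both interval sets (linear sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return total

def fraction_conserved(functional, conserved):
    """Share of functional bases that fall inside conserved regions."""
    covered = sum(end - start for start, end in merge(functional))
    return overlap_bp(functional, conserved) / covered

functional = [(100, 200), (300, 400)]  # hypothetical element coordinates
conserved = [(150, 320), (390, 500)]   # hypothetical conserved blocks
print(overlap_bp(functional, conserved), fraction_conserved(functional, conserved))  # 80 0.4
```

Interpreting such a fraction requires comparing it with what randomly placed regions of the same lengths would give.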
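16. Additive score for the region
The per-site scores are summed over the region; this additive score is expected to follow an approximately normal distribution, which yields a p-value for the region.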
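A sketch of the additive region score and its normal-approximation p-value. Assumptions: per-site scores are independent under neutrality, higher means more conserved, and the neutral mean and variance are known. The defaults fit scores of the form $-\ln p$ with $p$ uniform on (0, 1), which are Exponential(1) with mean 1 and variance 1. The function name `region_p_value` is hypothetical.

```python
import math

def region_p_value(site_scores, mean=1.0, var=1.0):
    """One-sided p-value for the summed score of a region.

    Assumes independent per-site scores under neutrality, so the sum of n
    scores is approximately Normal(n * mean, n * var) by the central limit
    theorem.
    """
    n = len(site_scores)
    z = (sum(site_scores) - n * mean) / math.sqrt(n * var)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)

print(region_p_value([1.0] * 100))  # 0.5: exactly the neutral expectation
print(region_p_value([2.0] * 100))  # z = 10, far in the upper tail
```

Neighbouring sites are not truly independent, so this approximation can overstate significance for long regions.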
18. Overlaps
20. [Figure: score versus length of region]
22. Analysis of ENCODE tracks
- Distributions of the score for individual positions
- Region-specific p-values
- Overlaps of functional regions with conserved regions (also for PhastCons, BinCons and GERP)
- Length/conservation distribution
- Conservation only in a subset of the phylogeny
- Alignability of regions