Comparative sequence analysis for ENCODE functional elements

Provided by: nobleGsWa

1
Comparative sequence analysis for ENCODE functional elements
  • Shamil Sunyaev
  • Genetics Division, Brigham and Women's Hospital
  • Harvard Medical School
  • Harvard-M.I.T. Division of HST

2
Features of a good score
  • Reflects phylogeny
  • Based on a good model of evolution/mutation
  • Robust to missing information

3
General strategy
  • Reject the null model of neutrality
  • Develop a score that partitions alignment columns
    by their deviation from neutrality
  • Compute a p-value for the neutrality of each site
    from the heuristic score

4
Probabilistic conservation score
  • The score is position specific
  • The score is based on a complex mutation model
  • The score is expressed as a p-value for the hypothesis
    that a given nucleotide position evolves neutrally

5
Determining mutation rates through phylogenetic inference
human   GATCTATGGTGTGGGTTTGCATAACAGGAGAGGA
chimp   GATCCATGGTGTGGGTTTGCATAATAGGAGAGGA
baboon  GTTCCATGGTGTGGGTTTGCATAATAGGGGAGGA
Probability of T→C = 1/10; probability of C→T = 1/3
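A minimal sketch of the counting behind these probabilities, assuming a simple parsimony rule that the slide does not spell out: wherever chimp and baboon agree, their base is taken as the ancestral state, and a different human base counts as one substitution on the human lineage. The function name `lineage_substitution_rates` is hypothetical.

```python
from collections import Counter


def lineage_substitution_rates(target, sister, outgroup):
    """Per-ancestral-base substitution frequencies on the target lineage.

    Hypothetical parsimony rule: if sister and outgroup agree, their base is
    the ancestral state; a differing target base is a substitution.
    """
    ancestral_counts = Counter()
    substitutions = Counter()
    for t, s, o in zip(target, sister, outgroup):
        if s != o:
            continue  # ancestral state is ambiguous under this simple rule
        ancestral_counts[s] += 1
        if t != s:
            substitutions[(s, t)] += 1
    # P(X -> Y) = (# X -> Y changes) / (# sites with ancestral base X)
    return {(a, b): n / ancestral_counts[a] for (a, b), n in substitutions.items()}


human = "GATCTATGGTGTGGGTTTGCATAACAGGAGAGGA"
chimp = "GATCCATGGTGTGGGTTTGCATAATAGGAGAGGA"
baboon = "GTTCCATGGTGTGGGTTTGCATAATAGGGGAGGA"
rates = lineage_substitution_rates(human, chimp, baboon)
```

On these three sequences the rule gives C→T = 1/3, matching the slide, and T→C = 1/9, so the slide's 1/10 must rest on a different count of ancestral T sites than this rule uses. A real estimate would pool many aligned sites and, as the next slide notes, condition on the neighboring bases.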
6
[Figure: Human AGC, Chimp ACC, Baboon AGC; ancestral context AGC, with a G→C change leading to the chimp ACC]
Mutation rates are modeled as asymmetric and context specific. The model also incorporates insertions and deletions.
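Scores, each converted to a p-value:
  • Empirical score: $\ln(\lambda_{\mathrm{cons}}/\lambda_{\mathrm{mut}})$
  • ML estimate of the fixation probability
  • Log-likelihood ratio score: $\ln(\lambda_{\mathrm{ML}}/\lambda_{\mathrm{Neutral}})$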
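One way to read the fixation-probability and log-likelihood-ratio scores, sketched under a hypothetical model that is not taken from the slides: each branch b has a neutral substitution probability p_b, selection multiplies it by a fixation factor f in [0, 1], and f = 1 is neutrality. The ML estimate of f is found by a grid search, and the score is the log ratio of the maximized likelihood to the neutral one. The names `log_likelihood` and `llr_score` are illustrative.

```python
import math


def log_likelihood(f, probs, subs):
    """ln L(f) = sum_b ln[(f p_b)^x_b (1 - f p_b)^(1 - x_b)].

    probs: hypothetical neutral substitution probability per branch.
    subs: 1 if a substitution was inferred on that branch, else 0.
    """
    total = 0.0
    for p, x in zip(probs, subs):
        q = f * p  # substitution probability scaled by the fixation factor
        if x:
            if q == 0.0:
                return -math.inf  # an observed change is impossible when f = 0
            total += math.log(q)
        else:
            total += math.log1p(-q)
    return total


def llr_score(probs, subs, grid=1000):
    """Return (f_hat, ln(L(f_hat) / L(1))) with f_hat found on a grid over [0, 1]."""
    f_hat = max((k / grid for k in range(grid + 1)),
                key=lambda f: log_likelihood(f, probs, subs))
    return f_hat, log_likelihood(f_hat, probs, subs) - log_likelihood(1.0, probs, subs)
```

Because f = 1 is on the grid, the score is never negative. A column with no inferred substitutions gets an estimated f of 0 and a score of $-\sum_b \ln(1 - p_b)$, so more opportunities for change make perfect conservation more significant.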
7
Extending substitution rates to other species
Instantaneous rate matrix Q of transitions between nucleotides
$P(t) = e^{Qt}$
  • Ignores mutation rate heterogeneity
  • Assumes the same rates in all species
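To make $P(t) = e^{Qt}$ concrete, here is a dependency-free sketch that exponentiates a 4×4 rate matrix by scaling and squaring with a truncated Taylor series; with SciPy available, `scipy.linalg.expm` does the same job. The helper names are hypothetical.

```python
import math


def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, terms=20):
    """Matrix exponential: scale M down by 2^s, sum the Taylor series, square s times."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    a = [[x / 2 ** s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mult(term, a)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mult(result, result)  # undo the scaling: e^M = (e^(M/2^s))^(2^s)
    return result


def transition_matrix(q, t):
    """P(t) = exp(Q t) for an instantaneous rate matrix Q whose rows sum to 0."""
    return expm([[x * t for x in row] for row in q])
```

For a Jukes-Cantor matrix (every off-diagonal rate equal to α) the result can be checked against the closed form $P_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$. An asymmetric Q, for example one with a higher C→T than T→C rate, goes through the same call; the limitations listed on the slide apply equally to this sketch.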

8
Computing a heuristic score
[Figure: example alignment column with bases G, G, T, G, T]
9
Computing a p-value
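[Figure: example alignment column with bases G, C, T, C, T]
$P(X < S) = n(x < S)/N$, where n(x < S) is the number of the N reference scores x that fall below the observed score S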
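A minimal Monte Carlo sketch of an empirical p-value. Everything model-specific is a hypothetical stand-in: a star tree in which each leaf independently changes away from a known ancestral base with probability p_b, and a heuristic score equal to $-\sum_b \ln P(\text{leaf base} \mid \text{ancestral base})$, so lower scores look more conserved (for p_b < 0.75). The sketch counts ties, n(x ≤ S)/N, because this toy score is discrete and a fully conserved column would otherwise always get p = 0.

```python
import math
import random

BASES = "ACGT"


def simulate_neutral_column(ancestral, branch_probs, rng):
    """Hypothetical neutral model: leaf b changes with probability p_b to a random other base."""
    others = [b for b in BASES if b != ancestral]
    return [rng.choice(others) if rng.random() < p else ancestral
            for p in branch_probs]


def heuristic_score(column, ancestral, branch_probs):
    """Hypothetical heuristic: -sum ln P(leaf | ancestral); lower means more conserved."""
    score = 0.0
    for base, p in zip(column, branch_probs):
        score -= math.log1p(-p) if base == ancestral else math.log(p / 3)
    return score


def empirical_p_value(observed, ancestral, branch_probs, n_sim=10000, seed=0):
    """Fraction of simulated neutral columns scoring at or below the observed one."""
    rng = random.Random(seed)
    s_obs = heuristic_score(observed, ancestral, branch_probs)
    hits = sum(
        heuristic_score(simulate_neutral_column(ancestral, branch_probs, rng),
                        ancestral, branch_probs) <= s_obs
        for _ in range(n_sim))
    return hits / n_sim
```

For example, `empirical_p_value(list("GGTGT"), "G", [0.1] * 5)` compares the column from the previous slide with 10,000 simulated neutral columns.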
10
Identifying blocks of conserved positions
We can transform a site-specific score into a
method for identifying conserved regions
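$S_i = \sum_{j=0}^{i} \left(-\ln p_j + \ln t\right)$, where $p_j$ is the p-value of position j; a position raises $S_i$ when $p_j < t$ and lowers it otherwise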
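A sketch of turning per-site p-values into blocks with the cumulative score $S_i$ on this slide. The default threshold t = 0.05 and the restart-at-zero scan (a simple maximal-segment heuristic) are assumptions chosen for illustration, not necessarily the procedure used in the talk.

```python
import math


def site_scores(p_values, t):
    """Per-position terms of S_i: -ln(p_j) + ln(t), positive exactly when p_j < t."""
    return [-math.log(p) + math.log(t) for p in p_values]


def conserved_blocks(p_values, t=0.05, min_score=0.0):
    """Running-sum scan: accumulate site scores, restart when the sum drops to
    zero or below, and report each stretch up to its peak.

    Returns (start, end, score) tuples with inclusive 0-based positions.
    """
    blocks = []
    running, best, start, best_end = 0.0, 0.0, None, None
    for j, s in enumerate(site_scores(p_values, t)):
        if running + s > 0:
            if start is None:
                start = j  # first informative position of a new block
            running += s
            if running > best:
                best, best_end = running, j
        else:
            if best_end is not None and best > min_score:
                blocks.append((start, best_end, best))
            running, best, start, best_end = 0.0, 0.0, None, None
    if best_end is not None and best > min_score:
        blocks.append((start, best_end, best))
    return blocks
```

Trailing positions that lower the running sum are trimmed off at the peak, so a block ends at its last net-informative site; raising `min_score` drops short or weak blocks.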
11
(No Transcript)
12
Analysis of ENCODE tracks
  • Distributions of the score for individual
    positions
  • Element- and region-specific p-values
  • Overlaps of functional regions with highly
    conserved regions (also for PhastCons, BinCons
    and GERP)
  • Length/conservation distribution
  • Conservation only in a subset of the phylogeny
  • Alignability of regions

13
Analysis of ENCODE tracks

14
(No Transcript)
15
Analysis of ENCODE tracks

16
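Additive score for the region
  • The region score is a sum of site scores, so its distribution is expected to be approximately normal
  • The p-value is computed from that normal distribution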
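A sketch of the normal approximation for an additive region score. It assumes site scores are independent under neutrality; the defaults `null_mean = 1` and `null_var = 1` hold if each site score is $-\ln p_j$ with $p_j$ uniform under the null, since $-\ln U$ is exponential with mean and variance 1. The function name is hypothetical.

```python
import math


def region_p_value(site_scores, null_mean=1.0, null_var=1.0):
    """Normal approximation for the sum of independent site scores.

    Higher scores mean more conserved, so the p-value is the upper tail.
    Returns (total, z, p_upper).
    """
    n = len(site_scores)
    total = sum(site_scores)
    z = (total - n * null_mean) / math.sqrt(n * null_var)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) for a standard normal Z
    return total, z, p_upper
```

For short regions the normal approximation is rough; under the same independence assumption, Fisher's method, which compares $-2\sum_j \ln p_j$ with a $\chi^2$ distribution on 2n degrees of freedom, is exact.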
17
Analysis of ENCODE tracks

18
Overlaps
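A sketch of the base-pair overlap between functional elements and conserved regions, assuming half-open [start, end) coordinates on a single sequence; the helper names are hypothetical. The same functions apply whichever conservation track (this score, PhastCons, BinCons or GERP) supplies the conserved intervals.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_bp(a, b):
    """Number of bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_covered(functional, conserved):
    """Fraction of functional-element bases that fall in conserved regions."""
    length = sum(e - s for s, e in merge(functional))
    return overlap_bp(functional, conserved) / length if length else 0.0
```

Comparing this fraction with the fraction of the whole region that is conserved gives a simple enrichment ratio.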
19
Analysis of ENCODE tracks

20
[Figure: score versus length of region]
21
Analysis of ENCODE tracks

22
Analysis of ENCODE tracks
  • Distributions of the score for individual
    positions
  • Region-specific p-values
  • Overlaps of functional regions with conserved
    regions (also for PhastCons, BinCons and GERP)
  • Length/conservation distribution
  • Conservation only in a subset of the phylogeny
  • Alignability of regions