PhyloGibbs: Siddharthan et al., PLoS Comp Bio 2005

1
PhyloGibbs: Siddharthan et al., PLoS Comp Bio 2005
  • Presented by
  • Saurabh Sinha

2
PhyloGibbs
  • Motif finding combined with model of evolution
  • Neutral evolution plus selection operating on
    binding sites
  • Gibbs sampling approach

3
Recall Gibbs sampling
  • Motif described by start positions of sites X =
    (x1, x2, ..., xN), where xi is the location of the
    site in the ith sequence
  • Motif model θ is the PWM from which each site is
    sampled
  • Goal: to sample from the posterior P(θ, X | data)
  • In reality, sample from Pr(X | θ, data)
  • Sample X using Gibbs sampling: sample xi from the
    conditional distribution Pr(xi | X − xi)
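
The resampling step in the last bullet can be sketched as follows. This is a minimal illustration of a classic single-species site sampler, not the PhyloGibbs code: the function names, the uniform background of 0.25 and the pseudocount of 1 are all assumptions made for the example.

```python
import random

ALPHABET = "ACGT"

def pwm_from_sites(sites, width, pseudocount=1.0):
    """Estimate a PWM (one dict per position) from equal-length sites."""
    pwm = []
    for i in range(width):
        counts = {a: pseudocount for a in ALPHABET}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({a: counts[a] / total for a in ALPHABET})
    return pwm

def conditional_weights(seq, width, pwm, background=0.25):
    """Unnormalised Pr(x_i = p | other sites) for every start p in seq.
    The uniform background is an assumption of this sketch."""
    weights = []
    for p in range(len(seq) - width + 1):
        ratio = 1.0
        for i in range(width):
            ratio *= pwm[i][seq[p + i]] / background
        weights.append(ratio)
    return weights

def gibbs_sweep(seqs, starts, width, rng):
    """Resample each x_i in turn, holding the other start positions fixed."""
    for i, seq in enumerate(seqs):
        others = [s[x:x + width]
                  for j, (s, x) in enumerate(zip(seqs, starts)) if j != i]
        pwm = pwm_from_sites(others, width)
        weights = conditional_weights(seq, width, pwm)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

Calling `gibbs_sweep` repeatedly yields a chain of start-position vectors X; each sweep changes one xi at a time, exactly as in the bullet above.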

4
PhyloGibbs
  • Input is a multiple alignment of each sequence
    (from several species)
  • Site is replaced by window of fixed length on
    multiple alignment

5
Basics: placing motif windows
ttttCGTGAT-GCGTCGtttttttttt
gagaCGTGATcGCGTCGagaat atata
cccc-------------CCAAGATCAGAccc
aaata------------CCAACATCAGAaaa
Multiple alignment from DIALIGN. Vertically aligned caps are
evolutionarily related bases.
3 legal windows, 1 illegal window. All possible legal windows
are identified in preprocessing.

6
Motif windows
  • A legal window does not have gaps
  • A legal window can span one species or multiple
    species
  • If spanning multiple species, the sites in the
    window are assumed to be evolutionarily related
    binding sites
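
A minimal sketch of the gap test, assuming the alignment is stored as a list of strings with "-" as the gap character. The function names are hypothetical, and the sketch checks gaps only; it ignores the upper/lower-case distinction that DIALIGN uses to mark aligned bases.

```python
GAP = "-"

def is_legal_window(alignment, rows, start, width):
    """True if the window covers `width` gap-free columns in every chosen row."""
    if start < 0 or width <= 0:
        return False
    for r in rows:
        segment = alignment[r][start:start + width]
        if len(segment) < width or GAP in segment:
            return False
    return True

def legal_windows(alignment, rows, width):
    """All start columns at which the chosen rows admit a legal window."""
    longest = max(len(alignment[r]) for r in rows)
    return [s for s in range(longest - width + 1)
            if is_legal_window(alignment, rows, s, width)]
```

Passing a single row index gives the single-species windows; passing several gives windows that span multiple species.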

7
Sampling probability
  • For a single-species motif window, sampling
    probability is
  • For a window spanning multiple species, sampling
    probability is
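
A sketch of what the two probabilities plausibly look like, assuming the standard PWM product form and the w_αi notation of the later slides. The window length ℓ, the base s_i at position i and the alignment column S_i are labels introduced here for illustration.

```latex
% Single-species window s = s_1 ... s_l sampled from PWM w
\[
  P(s \mid w) = \prod_{i=1}^{\ell} w_{s_i i}
\]
% Multi-species window S: positions are independent; each alignment
% column S_i is scored by the evolutionary model
\[
  P(S \mid w) = \prod_{i=1}^{\ell} P(S_i \mid w_i)
\]
```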

8
Evolutionary model
  • The multi-species sampling probability is given by
    the evolutionary model

9
Evolutionary model
  • Evolving binding site must bind the same protein
  • All bases mutate at a fixed rate μ
  • When a base at position i of a binding site is
    mutated to letter α, the probability that selection
    will fix this mutation is given by the PWM
    component w_αi.
  • Recall that under strong selection s, the probability
    of fixation of a new allele (starting copy
    number = 1) is s.
  • Then, the probability that a base at position i will
    mutate from β to α in time t is given by
  • where q = e^(−μt)
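
A reconstruction of the transition probability, hedged: the form below is the one consistent with the stated definition of q and with the large-t limit on the next slide. The symbol choices (β for the original base, α for the new base, μ for the mutation rate, δ for the Kronecker delta) are assumed here.

```latex
\[
  P_t(\alpha \mid \beta) = q\,\delta_{\alpha\beta} + (1 - q)\,w_{\alpha i},
  \qquad q = e^{-\mu t}
\]
```

With probability q no mutation has been fixed and the base is unchanged; otherwise the base is drawn afresh from the PWM column w_i.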

10
Evolutionary model
  • As q → 0 (i.e., large t), this becomes w_αi
  • so the equilibrium distribution is w_i
  • The model is multiplicative, i.e.,
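
A small numerical check of both claims, assuming the transition probability has the form q·[α = β] + (1 − q)·w_α. That form is a reconstruction (the slide's equation is not shown), and the function names are hypothetical.

```python
ALPHABET = "ACGT"

def transition_matrix(w, q):
    """T[b][a] = Pr(base b becomes a) = q*[a == b] + (1 - q)*w[a]."""
    return {b: {a: q * (a == b) + (1 - q) * w[a] for a in ALPHABET}
            for b in ALPHABET}

def compose(t1, t2):
    """Chain two evolutionary steps: first t1, then t2 (matrix product)."""
    return {b: {a: sum(t1[b][m] * t2[m][a] for m in ALPHABET)
                for a in ALPHABET}
            for b in ALPHABET}
```

Under this form, evolving with q1 and then with q2 is the same as a single step with q = q1·q2, which is one way to read "multiplicative"; at q = 0 every row of the matrix equals w.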

11
Evolutionary model
  • For a star topology,
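
One way the star-topology probability of a single window position could be computed, as a sketch under stated assumptions: the unobserved ancestral base is drawn from the PWM column w, each species descends from it independently, and each branch uses the transition form q·[same base] + (1 − q)·w. None of these details are given on the slide, and the function name is hypothetical.

```python
ALPHABET = "ACGT"

def star_column_prob(bases, qs, w):
    """Pr(observed bases at one window position) under a star phylogeny.

    bases: one letter per species
    qs:    per-species q = exp(-mu * t) for the branch to that species
    w:     PWM column as a dict letter -> probability
    """
    total = 0.0
    for anc in ALPHABET:              # sum over the unobserved ancestor
        term = w[anc]                 # ancestor assumed drawn from w
        for b, q in zip(bases, qs):
            term *= q * (b == anc) + (1 - q) * w[b]
        total += term
    return total
```

With all q = 0 the species become independent draws from w; with all q = 1 they must all equal the ancestor.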

12
Configurations
  • Given a set of sequences, a configuration is a
    set of (legally placed) motif windows
  • AND a color assigned to each window
  • Could have multiple windows assigned with the
    same color
  • Different colors represent different
    transcription factors
  • Goal is to find multiple types of motifs
    together, by using different colors
  • Color 0 is background (random) sequence
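
A configuration as described above could be represented roughly as follows. The class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Window:
    start: int        # first alignment column of the window
    width: int        # fixed motif width
    species: tuple    # species (rows) the window spans

@dataclass
class Configuration:
    # Maps each placed window to its color (> 0); anything absent is background.
    colors: dict = field(default_factory=dict)

    def color_of(self, window):
        return self.colors.get(window, 0)   # color 0 = background

    def windows_of(self, color):
        return [w for w, c in self.colors.items() if c == color]

    def n_colored(self):
        return len(self.colors)
```

Several windows may share a color, and each color stands for one transcription factor.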

13
Sampling
  • The algorithm samples configurations from their
    posterior probability distribution
  • Prior on configurations
  • n(C): number of colored windows in C

The prior is chosen to avoid too many sites/windows

14
Probabilities
  • P(S|C) is the probability that,
  • for each color c in C,
  • all windows of color c were sampled from the same
    PWM,
  • though that PWM is unknown

P(S∉C | B) is for background windows (color 0)
P(Sc) is for all windows of color c ∈ C

15
Probabilities
  • P(Sc) is the probability that all windows of
    color c were sampled from the same PWM, though
    that PWM is unknown
  • Integration over all PWMs (for single species case)

16
Probabilities
  • P(Sc) is the probability that all windows of
    color c were sampled from the same PWM, though
    that PWM is unknown
  • Integration over all PWMs (for single species case)
  • Dirichlet prior
  • Its parameter is usually 1, so
  • Γ(·) reduces to a factorial: Γ(n + 1) = n!
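
The integral over PWMs has a closed form under a symmetric Dirichlet prior. The sketch below assumes the standard Dirichlet-multinomial result, per column Γ(4γ)/Γ(n + 4γ) · Π_α Γ(n_α + γ)/Γ(γ), where n_α counts base α in that column over the n sites. The name `gamma` for the prior parameter and the function name are assumptions.

```python
from math import lgamma

ALPHABET = "ACGT"

def log_prob_same_pwm(sites, gamma=1.0):
    """log P(S_c): sites drawn from one unknown PWM, integrated over a
    symmetric Dirichlet(gamma) prior on every PWM column."""
    width = len(sites[0])
    n = len(sites)
    k = len(ALPHABET)
    logp = 0.0
    for i in range(width):
        logp += lgamma(k * gamma) - lgamma(n + k * gamma)
        for a in ALPHABET:
            n_a = sum(1 for s in sites if s[i] == a)   # count of base a in column i
            logp += lgamma(n_a + gamma) - lgamma(gamma)
    return logp
```

With gamma = 1 each column contributes 3! · Π_α n_α! / (n + 3)!, which is where the factorials come from. Working in log space avoids underflow when many sites share a color.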

17
Probabilities
  • What about Pr(Sc) in the multiple species case?
  • We'll come to this later

18
Sampling
  • We have to sample configuration C from the
    posterior distribution P(C|S)
  • Markov chain Monte Carlo (MCMC)
  • Take the current configuration C and move
    probabilistically to a new configuration C' such
    that detailed balance holds
  • P(C|S) P(C → C') = P(C'|S) P(C' → C)

19
Move set
  • What kinds of moves are allowed?
  • Moves are
  • shift one or more colored windows, or
  • add or remove a colored window, or
  • recolor a randomly chosen window from current
    configuration

20
Move set
  • A move takes the current configuration C and
    constructs the set X of all configurations C' that
    differ from C by a single change
  • It then chooses one of the C' with probability
    P(C'|S) / Σ_{C''∈X} P(C''|S)
  • Window shift move: takes a single window and
    resamples its position
  • Gibbs sampling: sample a joint probability
    distribution by resampling one variable at a time,
    while keeping the others fixed.
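
The selection rule above can be sketched as a "heat-bath" step. This is an illustration with hypothetical function names; it assumes the candidate set includes the current configuration (as it does when a window's position is resampled), so that the same set is seen from every member.

```python
import random

def heat_bath_probs(weights):
    """Pr(choose candidate j) = weight_j / sum of weights over the candidate set."""
    total = sum(weights)
    return [w / total for w in weights]

def heat_bath_move(candidates, posterior, rng):
    """Pick the next configuration from `candidates` in proportion to its
    unnormalised posterior. `posterior` is any function returning P(C|S)
    up to a constant factor."""
    probs = heat_bath_probs([posterior(c) for c in candidates])
    return rng.choices(candidates, weights=probs)[0]
```

Because the probability of moving to C' does not depend on where the move started, P(C|S)·P(C → C') and P(C'|S)·P(C' → C) are both P(C|S)·P(C'|S) divided by the same sum, so detailed balance holds within the candidate set.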

21
Summarizing the samples
  • After many moves, each configuration will be
    sampled with probability P(C|S)
  • Wish to report all relevant features that are
    shared by configurations with high posterior
    probability
  • Would like to identify groups of windows that
    with high probability share a color
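
One simple summary in this spirit is the fraction of sampled configurations in which two windows share a color. The sketch below assumes each sample is a dict mapping a window identifier to its color, with 0 for background; the representation and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_color_frequency(samples):
    """Fraction of sampled configurations in which each pair of windows
    shares a non-background color."""
    counts = Counter()
    for config in samples:
        colored = sorted(w for w, c in config.items() if c != 0)
        for a, b in combinations(colored, 2):
            if config[a] == config[b]:
                counts[(a, b)] += 1
    return {pair: k / len(samples) for pair, k in counts.items()}
```

Only agreement between the two windows is counted, so the result does not depend on which color label a motif happens to carry in a given sample.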

22
Anneal and track
  • Simulated annealing to find the maximum likelihood
    configuration C*
  • Sample from P(C|S)^β
  • Slowly increase β over time
  • This provides a reference set of windows that
    will be tracked
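
A toy illustration of the annealing idea, assuming the sampler targets the posterior raised to a power β that grows over time. It works over a short explicit list of posterior values rather than real configurations, and the function names and schedule are hypothetical.

```python
import random

def tempered_weights(posteriors, beta):
    """Weights proportional to P(C|S)**beta; beta = 1 is ordinary sampling."""
    top = max(posteriors)
    # Rescale by the largest value so large beta does not underflow everything.
    return [(p / top) ** beta for p in posteriors]

def anneal(posteriors, betas, rng):
    """Resample the configuration index once per beta in the schedule and
    return the final index."""
    idx = rng.randrange(len(posteriors))
    for beta in betas:
        idx = rng.choices(range(len(posteriors)),
                          weights=tempered_weights(posteriors, beta))[0]
    return idx
```

As β grows, the tempered distribution concentrates on the highest-scoring configuration, which then serves as the reference.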

23
Tracking
  • Fix the reference configuration C*
  • Continue sampling from P(C|S), and for each
    window w in C*, track the probability p(w,c) that
    window w belongs to the same motif (color) as the
    windows of color c in the reference configuration C*

24
One final point
  • What is Pr(Sc) in the multiple species case?