Motif Finding in Biological Sequences

1
Motif Finding in Biological Sequences
  • Xin He
  • Dep. of Computer Science, UIUC
  • 02/15/06

2
Problem
Given a collection of genes that share a common
transcription factor (TF) binding-site motif, find the motif
and where it occurs in each sequence.

3
Word Enumeration
  • Idea: a true motif occurs in all input sequences,
    so it should be overrepresented in the input
  • Algorithm: compute the statistical significance
    of all words in the input and output the most
    significant ones
  • Statistical significance: use the background
    frequency as the control
  • Extension: allow variations of words
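
The word-enumeration idea can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: it assumes exact k-mers (no variations), an independent-base background model, and a simple binomial z-score as the significance measure. The function name `kmer_zscores` and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_zscores(sequences, k, background):
    """Score every observed k-mer by how far its count exceeds the background expectation."""
    counts = Counter()
    n_windows = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
            n_windows += 1
    scores = {}
    for word, observed in counts.items():
        # Probability of the word under an independent-base background (assumption).
        p = 1.0
        for base in word:
            p *= background[base]
        expected = n_windows * p
        # Binomial approximation; overlaps between neighbouring windows are ignored.
        scores[word] = (observed - expected) / sqrt(n_windows * p * (1 - p))
    return scores

# Toy input: three sequences that all contain the word TTGAC.
seqs = ["ACGTTGACGT", "TTGACGAACC", "GGTTGACGCA"]
uniform = {b: 0.25 for b in "ACGT"}
scores = kmer_zscores(seqs, 5, uniform)
top = sorted(scores, key=scores.get, reverse=True)[:3]
print(top)
```

With a uniform background every word has the same expected count, so the ranking reduces to raw counts; a realistic background would use base or dinucleotide frequencies estimated from non-target sequence.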

4
Combinatorial Searching
  • Idea: suppose we know where the motifs occur; then
    the motif sites should be similar to each other
  • Algorithm: define a similarity score for the
    alignment, then maximize it

5
Finding the Best Alignment
  • Let $A = (a_1, a_2, \ldots, a_N)$ be the starting
    positions of the motif in each input sequence, and let $I(A)$
    be the similarity of the motif sites. Then
  • $A^* = \arg\max_A I(A)$

6
CONSENSUS
  • $f_{\alpha k}$: frequency of base $\alpha$ at position $k$
  • $q_\alpha$: frequency of base $\alpha$ in the background
  • Information content of an alignment
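
The formula for the information content did not survive in this transcript. The sketch below assumes the usual relative-entropy form, $I = \sum_k \sum_\alpha f_{\alpha k} \log_2 (f_{\alpha k} / q_\alpha)$, where $f_{\alpha k}$ is the frequency of base $\alpha$ in column $k$ of the alignment and $q_\alpha$ is its background frequency. The function name and the example sites are hypothetical, and no pseudocounts or small-sample corrections are applied.

```python
from collections import Counter
from math import log2

def information_content(sites, background):
    """Relative entropy of aligned sites against the background, in bits."""
    n = len(sites)
    width = len(sites[0])
    total = 0.0
    for k in range(width):
        column = Counter(site[k] for site in sites)
        # Bases absent from the column contribute 0 (0 * log 0 is taken as 0).
        for base, count in column.items():
            f = count / n  # column frequency of this base
            total += f * log2(f / background[base])
    return total

uniform = {b: 0.25 for b in "ACGT"}
identical = information_content(["TTGACG", "TTGACG", "TTGACG"], uniform)
mixed = information_content(["TTGACG", "ATGCCG", "TAGACA"], uniform)
print(identical, mixed)
```

A perfectly conserved column scores 2 bits against a uniform background, so identical sites of width 6 reach the maximum of 12 bits; any disagreement between sites lowers the score.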

7
Bayesian Inference Overview
  • Perspective: the data are generated by some
    (unknown) random process; infer the underlying
    process from the observed data
  • Procedure: let $\theta$ be the unknown and $D$ be the data
  • Set up a full probability model: the joint
    distribution of all observed and unobserved
    quantities
  • Posterior distribution: $P(\theta \mid D)$
  • Model evaluation: how well the model fits the data

8
Bayesian Missing Data Problem
  • Problem: let $\theta$ be the unknown parameter(s), $S$ be
    the observed data and $A$ be the missing data; find the
    posterior distribution $P(\theta, A \mid S)$ or $P(A \mid S)$
  • Step 1: probability model $P(S, A \mid \theta)$
  • Step 2: posterior distribution
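
The expression for the posterior in Step 2 is missing from the transcript. Under the standard Bayesian setup it follows from Bayes' rule, as sketched below. The prior $P(\theta)$ is notation assumed here, since the slide text does not name it.

```latex
P(\theta, A \mid S)
  = \frac{P(S, A \mid \theta)\, P(\theta)}{P(S)}
  \;\propto\; P(S, A \mid \theta)\, P(\theta),
\qquad
P(A \mid S) = \int P(\theta, A \mid S)\, d\theta
```

The second expression integrates out the parameters, leaving a posterior over the missing data alone.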

9
Gibbs Sampling
  • Idea: to sample from the joint distribution
    $P(A, \theta \mid S)$, repeat two steps:
  • Sample $A^{(t)} \sim P(A \mid S, \theta^{(t-1)})$
  • Sample $\theta^{(t)} \sim P(\theta \mid S, A^{(t)})$
  • Can be generalized to the joint distribution of any
    number ($n$) of random variables: fix $n-1$
    variables, sample the remaining one, and cycle through
    the $n$ variables
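
As a toy illustration of the alternating scheme (unrelated to motifs), the sketch below runs a two-variable Gibbs sampler on a standard bivariate normal with correlation `rho`, where each conditional is itself normal: $x \mid y \sim N(\rho y, 1 - \rho^2)$ and symmetrically for $y$. The target distribution, function name, and burn-in length are assumptions chosen for the example.

```python
import random
from math import sqrt

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Alternate draws from the two full conditionals of a standard bivariate normal."""
    rng = random.Random(seed)
    sd = sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for t in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # sample x given the current y
        y = rng.gauss(rho * x, sd)  # sample y given the new x
        if t >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
# For unit-variance marginals, E[xy] equals the correlation rho.
mean_xy = sum(x * y for x, y in samples) / len(samples)
print(round(mean_xy, 2))
```

Successive samples are correlated, so more draws are needed than with independent sampling to reach the same accuracy.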

10
Gibbs Motif Sampler
  • Motif representation: position weight matrix
    (PWM) $\theta = (\theta_{ij})$, where
    $\theta_{ij} = P(\text{symbol } j \text{ at position } i)$
  • Probability model for any sequence $S_i$:
  • Select a position for the motif start $a_i$ with
    probability $1/L$, where $L$ is the sequence length
  • Plant the motif sequence in positions $[a_i, a_i + w - 1]$
    using PWM $\theta$, where $w$ is the motif width
  • Plant background sequence in the other positions
    using the background distribution $\theta_0$
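
The generative model above can be simulated directly. The following is a sketch under stated assumptions: the PWM values, the uniform background, and the function name `plant_motif` are made up for illustration, and the start position is drawn uniformly from the positions where the whole motif fits rather than from all $L$ positions.

```python
import random

BASES = "ACGT"

def plant_motif(length, pwm, background, rng):
    """Generate one sequence: background everywhere, PWM draws inside the motif window."""
    w = len(pwm)
    # Assumption of this sketch: only starts that leave room for the full motif.
    start = rng.randrange(length - w + 1)
    seq = []
    for pos in range(length):
        if start <= pos < start + w:
            weights = pwm[pos - start]   # row of the PWM for this motif position
        else:
            weights = background         # background distribution
        seq.append(rng.choices(BASES, weights=weights)[0])
    return "".join(seq), start

rng = random.Random(1)
# Hypothetical PWM favouring the word ACGT; each row sums to 1.
pwm = [[0.85, 0.05, 0.05, 0.05],
       [0.05, 0.85, 0.05, 0.05],
       [0.05, 0.05, 0.85, 0.05],
       [0.05, 0.05, 0.05, 0.85]]
background = [0.25, 0.25, 0.25, 0.25]
seq, start = plant_motif(30, pwm, background, rng)
print(start, seq)
```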

11
Gibbs Motif Sampler
  • Posterior distribution: sample from $P(A \mid S)$
  • Gibbs sampling procedure:
  • Repeat: choose a sequence $z$, then sample
  • $a_z \sim P(a_z \mid S, A \setminus a_z)$
  • Approximation of the conditional probability: find
    the PWM $\theta$ using $A \setminus a_z$ and $S$, then sample $a_z$ using
    this value of $\theta$
  • Predictive update: $\theta = \arg\max_\theta P(\theta \mid S, A \setminus a_z)$
  • Sampling: $a_z \sim P(a_z \mid S, \theta)$
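
The two-step procedure can be sketched as a small site sampler. This is an illustrative sketch, not the published implementation: it assumes exactly one site per sequence, a uniform background, and a pseudocount-smoothed frequency estimate standing in for the argmax in the predictive update. All function names and the toy sequences are hypothetical.

```python
import random

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def estimate_pwm(seqs, starts, w, skip, pseudo=0.5):
    """Predictive update: PWM from all current sites except sequence `skip`."""
    counts = [[pseudo] * 4 for _ in range(w)]
    for i, (seq, a) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(w):
            counts[j][IDX[seq[a + j]]] += 1
    return [[c / sum(row) for c in row] for row in counts]

def site_weights(seq, pwm, background):
    """Unnormalised probability of each start: motif/background likelihood ratio."""
    w = len(pwm)
    weights = []
    for a in range(len(seq) - w + 1):
        ratio = 1.0
        for j in range(w):
            b = IDX[seq[a + j]]
            ratio *= pwm[j][b] / background[b]
        weights.append(ratio)
    return weights

def gibbs_motif_sampler(seqs, w, n_sweeps=200, seed=0):
    rng = random.Random(seed)
    background = [0.25] * 4  # assumed uniform background
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]  # random initial alignment
    for _ in range(n_sweeps):
        for z in range(len(seqs)):
            pwm = estimate_pwm(seqs, starts, w, skip=z)        # uses A without a_z
            weights = site_weights(seqs[z], pwm, background)
            # Draw the new a_z in proportion to the weights.
            starts[z] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy input: each sequence contains the word ACGTAG somewhere.
seqs = ["GTCAACGTAGGTCC", "CCTAGACGTAGTTA", "TTACGTAGCCGATG", "GGCATTACGTAGCA"]
print(gibbs_motif_sampler(seqs, w=6))
```

Because the sampler is stochastic, a single run can settle on a shifted version of the motif; practical implementations typically add restarts or a phase-shift step.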

12
Extensions of Basic Model
  • Multiple or zero occurrences of the motif per
    sequence
  • Multiple types of motifs: Stochastic Dictionary
    model
  • Regulatory modules, i.e. a cluster of motifs (multiple
    types, multiple occurrences of each type):
    CisModule
  • Regulatory modules with motif dependency:
    EMCModule