Motif Discovery - PowerPoint PPT Presentation

Slides: 14
Provided by: chase8

Transcript and Presenter's Notes

Title: Motif Discovery


1
Motif Discovery
  • Presented by
  • Chase A. Jahn
  • CS329E
  • April 29, 2008

2
Outline
  • Motif discovery problem
  • Search methods
  • Exhaustive
  • Gibbs Sampling

3
What is a motif?
Problem Description
  • Short sequences of DNA or RNA (or amino acids)
  • Often consist of 5-16 residues
  • May contain gaps
  • Examples include
    • Transcription factor binding sites
      • Aid in identifying gene networks
    • Splice sites
    • Start/stop codons
    • Phosphorylation sites
    • Coiled-coil domains

4
Motif Problem
Problem Description
  • Given: a set of sequences
  • Find: the motif(s), including
    • The number of motifs
    • The width of each motif
    • The locations of motif occurrences

5
Inherent Difficulties
Problem Description
  • Very long input sequences
    • Thousands to millions of residues
  • Length variation
  • Motif may be short, or only partially conserved
    across the input sequences

6
Exhaustive Searching
Exhaustive
  • BruteForceMotifSearch(DNA, t, n, l)
    • bestScore ← 0
    • For each s = (s1, s2, ..., st) from (1, 1, ..., 1) to (n-l+1, ..., n-l+1)
      • If Score(s, DNA) > bestScore
        • bestScore ← Score(s, DNA)
        • bestMotif ← (s1, s2, ..., st)
    • Return bestMotif
  • Running time
    • Number of position combinations: (n-l+1)^t
    • Score(s, DNA): O(l)
    • Overall: O(l (n-l+1)^t) = O(l n^t)
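The exhaustive search can be sketched in Python roughly as follows. This is a minimal illustration, not the presenter's code: it reads DNA as t sequences and l as the motif width, uses 0-based start positions, and assumes a consensus score (for each motif column, the count of the most common residue), since the slide does not define Score. The names `score` and `brute_force_motif_search` are hypothetical.

```python
from collections import Counter
from itertools import product


def score(starts, dna, l):
    """Consensus score (an assumed scoring function): for each of the l
    columns, add the count of the most common residue in that column."""
    total = 0
    for j in range(l):
        column = [seq[s + j] for seq, s in zip(dna, starts)]
        total += Counter(column).most_common(1)[0][1]
    return total


def brute_force_motif_search(dna, l):
    """Try every combination of start positions and keep the best one."""
    # One range of valid 0-based starts per sequence.
    ranges = [range(len(seq) - l + 1) for seq in dna]
    best_score, best_starts = 0, None
    for starts in product(*ranges):
        current = score(starts, dna, l)
        if current > best_score:
            best_score, best_starts = current, starts
    return best_starts, best_score


# Toy input: three sequences of length 8 that all contain "ACGT".
dna = ["ACGTACGT", "TTACGTAA", "GGGACGTC"]
starts, best = brute_force_motif_search(dna, 4)
motifs = [seq[s:s + 4] for seq, s in zip(dna, starts)]
```

With t = 3, n = 8 and l = 4, the loop already visits (8 - 4 + 1)^3 = 125 combinations of start positions, which is why this approach does not scale to realistic inputs.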

7
Gibbs Sampling
  • Stochastically select starting positions for N
    l-mers (N = 5)

8
Gibbs Sampling
  • Stochastically select one sequence
  • Generate profile P from the remaining (N-1)
    l-mers
  • For each position in the chosen sequence,
    calculate the probability that the l-mer starting
    at that position is generated by P
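A rough Python sketch of this step, under stated assumptions: the alphabet is DNA, the profile stores per-column residue frequencies, and a pseudocount of 1 is added to every cell so that no probability is zero (the slide does not mention pseudocounts). The function names are hypothetical.

```python
ALPHABET = "ACGT"


def build_profile(lmers, pseudocount=1.0):
    """Per-column residue frequencies from the remaining l-mers.
    The pseudocount is an assumption that keeps every entry non-zero."""
    profile = []
    for j in range(len(lmers[0])):
        counts = {r: pseudocount for r in ALPHABET}
        for lmer in lmers:
            counts[lmer[j]] += 1
        total = sum(counts.values())
        profile.append({r: c / total for r, c in counts.items()})
    return profile


def lmer_probability(lmer, profile):
    """Probability that the profile generates this l-mer."""
    p = 1.0
    for residue, column in zip(lmer, profile):
        p *= column[residue]
    return p


def position_probabilities(seq, profile):
    """One (unnormalized) probability per valid start position in seq."""
    l = len(profile)
    return [lmer_probability(seq[i:i + l], profile)
            for i in range(len(seq) - l + 1)]


# l-mers from the N - 1 sequences that were not chosen (toy data).
others = ["ACGT", "ACGT", "ACGA", "ACGT"]
profile = build_profile(others)
probs = position_probabilities("TTACGTAA", profile)
best_position = probs.index(max(probs))
```

The values in `probs` are not normalized to sum to 1; they only need to be comparable across positions of the chosen sequence.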

9
Gibbs Sampling
  • Scoring: profile P is a position weight matrix
    (PWM) that contains log-likelihood weights for
    computing a match score

MacIsaac, Practical Strategies for Discovering Regulatory DNA Sequence Motifs, 2006.
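One way to turn the profile into a PWM of log weights is sketched below. It is an illustration only and is not taken from the cited paper: it assumes log-odds weights (base 2) against a uniform background of 0.25 per nucleotide and a pseudocount of 1, neither of which is specified on the slide. The names `log_odds_pwm` and `match_score` are hypothetical.

```python
import math

ALPHABET = "ACGT"


def log_odds_pwm(lmers, background=None, pseudocount=1.0):
    """Build a PWM whose entries are log2(frequency / background).
    Uniform background and pseudocount are assumptions."""
    if background is None:
        background = {r: 0.25 for r in ALPHABET}
    pwm = []
    for j in range(len(lmers[0])):
        counts = {r: pseudocount for r in ALPHABET}
        for lmer in lmers:
            counts[lmer[j]] += 1
        total = sum(counts.values())
        pwm.append({r: math.log2((counts[r] / total) / background[r])
                    for r in ALPHABET})
    return pwm


def match_score(lmer, pwm):
    """Sum the per-column weights; higher means a better match."""
    return sum(column[residue] for residue, column in zip(lmer, pwm))


pwm = log_odds_pwm(["ACGT", "ACGT", "ACGA", "ACGT"])
good = match_score("ACGT", pwm)
poor = match_score("TTTT", pwm)
```

Because the weights are logarithms, the match score of an l-mer is a sum over columns rather than a product, and a positive score means the l-mer looks more like the motif than like background.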
10
Gibbs Sampling
  • Choose a new starting position in the chosen
    sequence randomly
  • Compute its score and accept the replacement
    depending on the score
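One common reading of this step is to draw the new start position with probability proportional to its score, so that high-scoring positions are usually, but not always, chosen. The sketch below assumes that reading; `sample_start` is a hypothetical helper and the weights are made-up values.

```python
import random


def sample_start(weights, rng=random):
    """Draw a 0-based start position with probability proportional
    to its weight (weights need not sum to 1)."""
    positions = range(len(weights))
    return rng.choices(positions, weights=weights, k=1)[0]


# Made-up per-position weights for one sequence; position 2 dominates.
rng = random.Random(0)
weights = [0.001, 0.001, 0.12, 0.001, 0.001]
draws = [sample_start(weights, rng) for _ in range(1000)]
```

Sampling instead of always taking the maximum is what lets the procedure occasionally move away from a poor local choice.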

11
Review
[Figure: Best Location, New Location]
  • Start with random motif locations and calculate
    a motif model
  • Randomly select a sequence, remove its motif, and
    recalculate a temporary model
  • With the temporary model, calculate the probability of
    a motif at each position in the sequence
  • Select a new position based on this distribution
  • Update the model and iterate
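Putting the reviewed steps together, a minimal Gibbs sampler might look like the sketch below. It is an illustration under assumptions, not the presenter's implementation: a DNA alphabet, a frequency profile with pseudocount 1, a fixed number of iterations instead of a convergence test, and toy input sequences. All function names are hypothetical.

```python
import random

ALPHABET = "ACGT"


def build_profile(lmers, pseudocount=1.0):
    """Per-column residue frequencies; the pseudocount is an assumption."""
    profile = []
    for j in range(len(lmers[0])):
        counts = {r: pseudocount for r in ALPHABET}
        for lmer in lmers:
            counts[lmer[j]] += 1
        total = sum(counts.values())
        profile.append({r: c / total for r, c in counts.items()})
    return profile


def lmer_probability(lmer, profile):
    """Probability that the profile generates this l-mer."""
    p = 1.0
    for residue, column in zip(lmer, profile):
        p *= column[residue]
    return p


def gibbs_sampler(dna, l, iterations=1000, seed=0):
    """Return one 0-based motif start position per sequence."""
    rng = random.Random(seed)
    # Start with random motif locations.
    starts = [rng.randrange(len(seq) - l + 1) for seq in dna]
    for _ in range(iterations):
        # Randomly select a sequence and leave its l-mer out of the model.
        i = rng.randrange(len(dna))
        others = [seq[s:s + l]
                  for k, (seq, s) in enumerate(zip(dna, starts)) if k != i]
        profile = build_profile(others)  # temporary model
        # Probability of the motif at each position of the chosen sequence.
        seq = dna[i]
        weights = [lmer_probability(seq[p:p + l], profile)
                   for p in range(len(seq) - l + 1)]
        # Select the new position from this distribution, then iterate.
        starts[i] = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return starts


# Toy input: five sequences, each with a planted copy of "GACGTC".
dna = ["TTGACGTCAAGG", "CCAGACGTCTTA", "GACGTCTTTTAA",
       "AATTCCGACGTC", "TGGACGTCCATA"]
starts = gibbs_sampler(dna, 6, iterations=200, seed=1)
```

Because the result depends on the random seed, a practical run would typically repeat the sampler from several random starts and keep the best-scoring set of positions.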

12
Summary
  • Gibbs sampling works well in practice and is far more
    efficient than exhaustive search, whose running time
    grows exponentially with the number of sequences
  • May converge to local optima if motifs are subtle
    or residue distribution is skewed

13
Questions?