Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo

Provided by: Tan
Learn more at: http://www.cs.uiuc.edu

1
Computational detection of genomic cis-regulatory
modules applied to body patterning in the early
Drosophila embryo
  • N. Rajewsky, M. Vergassola, U. Gaul and E. Siggia
  • Presented by Bin Tan

2
Cis-regulatory modules (CRM)
  • In higher eukaryotes, many genes show complex
    spatial-temporal expression patterns.
  • The transcriptional regulation apparatus is
    largely organized in the form of separable
    cis-regulatory modules.
  • A module integrates inputs from several
    transcription factors and regulates another
    gene's expression, forming a regulatory network.

3
Structural features of modules
  • Hundreds of nucleotides in length
  • Contains binding sites for as many as 4-5
    different transcription factors
  • Possibly multiple binding sites for the same
    transcription factor
  • Certain combinations of binding sites
    (correlations between different transcription
    factors)

4
Why computational methods?
  • Purely experimental methods such as promoter
    bashing are tedious.
  • It is easier to screen a modest list of
    candidates suggested by a computational method.

5
About this paper
  • Uses data on body patterning of the early
    Drosophila embryo
  • Makes statistically significant predictions of
    regulatory modules using three different levels
    of prior information:
    • Binding sites (motifs)
    • Several related modules
    • Only genome

6
Three levels of prior information
  1. Binding sites (motifs)
  2. Several related modules
  3. Only genome

7
The Ahab algorithm
  • Uses known binding-site (motif) information
  • Scans the genome in windows
  • Scores each window according to how well the
    sequence can be stochastically generated from the
    motifs
  • Outputs windows with high ranks

8
Ahab features
  • (As compared to Mobydick)
  • Uses positional weight matrices as the motif
    model
  • Introduces a local background to remove influence
    from local variations in sequence composition
  • Allows binding sites to overlap
  • Allows weak binding sites to contribute to the
    score
  • No parameter tuning (other than the window size)

9
Algorithm details
  • Background model: k-th order Markov chain (each
    nucleotide depends only on the preceding k
    nucleotides)
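
A k-th order Markov background can be sketched as follows. This is a minimal illustration, not the implementation from the paper: the function names, the pseudocount, and the choice to train on a single string are all assumptions made for the example.

```python
import math
from collections import defaultdict

def train_markov_background(sequence, k, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next base | preceding k bases) from a training sequence.

    Returns a function prob(context, base). The pseudocount (an assumption
    of this sketch) keeps unseen transitions from getting probability zero.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(k, len(sequence)):
        context = sequence[i - k:i]
        counts[context][sequence[i]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(alphabet)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob

def background_log_prob(sequence, k, prob):
    """Log-probability of sequence[k:] given its first k bases."""
    return sum(
        math.log(prob(sequence[i - k:i], sequence[i]))
        for i in range(k, len(sequence))
    )
```

With k = 0 the context is the empty string and the model reduces to plain base frequencies.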

10
Algorithm details (cont.)
  • Sequence S = s1 s2 ...
  • Weight matrices w1, w2, ... for motifs
  • Background wb
  • Probabilistic generation of S:
    • Choose a motif or background wk (k = 1, 2, ..., b) with
      probability pk
    • Sample a sequence according to wk and append it to S
    • Repeat until S reaches a certain length
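
The generative process on this slide can be sketched in a few lines. The sketch makes simplifying assumptions that are not from the slides: the background emits one base at a time from fixed base frequencies (a zeroth-order model), a weight matrix is a list of per-column probability dictionaries, and the key "b" stands for the background. All function names are hypothetical.

```python
import random

def sample_from_pwm(pwm, rng):
    """Draw one site from a weight matrix given as a list of {base: prob} columns."""
    return "".join(
        rng.choices(list(col.keys()), weights=list(col.values()))[0]
        for col in pwm
    )

def generate_sequence(pwms, background, probs, length, seed=0):
    """Build S by repeatedly choosing a motif or the background and appending its output.

    pwms:       dict name -> weight matrix
    background: {base: prob}, emitted one base at a time (assumed zeroth order)
    probs:      dict name -> p_k, with the key "b" for the background
    The result may overshoot `length` by less than one motif width.
    """
    rng = random.Random(seed)
    names = list(probs)
    weights = [probs[name] for name in names]
    pieces, total = [], 0
    while total < length:
        k = rng.choices(names, weights=weights)[0]
        if k == "b":
            piece = sample_from_pwm([background], rng)  # a single background base
        else:
            piece = sample_from_pwm(pwms[k], rng)       # a full motif site
        pieces.append((k, piece))
        total += len(piece)
    return "".join(piece for _, piece in pieces), pieces
```

Returning the list of (source, piece) pairs alongside the sequence makes the hidden segmentation visible, which is exactly what the fitting step has to sum over.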

11
Algorithm details (cont.)
  • Unknown parameters: p1, p2, ..., pb
  • Maximize the probability of generating S with
    respect to these parameters
  • Conjugate descent or EM algorithm
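
One way to fit the probabilities is EM over all segmentations of the window, sketched below. It is an illustration under stated assumptions rather than the published procedure: the background is zeroth order, the key "b" denotes the background, and no numerical rescaling is done, so it is only suitable for short sequences. The function names are hypothetical.

```python
def site_prob(pwm, site):
    """Probability that the weight matrix emits exactly this site."""
    p = 1.0
    for col, base in zip(pwm, site):
        p *= col[base]
    return p

def forward_backward(seq, pwms, bg, p):
    """Sum over all ways to parse seq into motif sites and background bases."""
    n = len(seq)
    F = [0.0] * (n + 1)  # F[i]: total weight of all parses of seq[:i]
    F[0] = 1.0
    for i in range(1, n + 1):
        F[i] = p["b"] * bg[seq[i - 1]] * F[i - 1]
        for k, pwm in pwms.items():
            w = len(pwm)
            if i >= w:
                F[i] += p[k] * site_prob(pwm, seq[i - w:i]) * F[i - w]
    B = [0.0] * (n + 1)  # B[i]: total weight of all parses of seq[i:]
    B[n] = 1.0
    for i in range(n - 1, -1, -1):
        B[i] = p["b"] * bg[seq[i]] * B[i + 1]
        for k, pwm in pwms.items():
            w = len(pwm)
            if i + w <= n:
                B[i] += p[k] * site_prob(pwm, seq[i:i + w]) * B[i + w]
    return F, B

def em_step(seq, pwms, bg, p):
    """One EM update; returns (new p, likelihood Z under the old p)."""
    F, B = forward_backward(seq, pwms, bg, p)
    n, Z = len(seq), F[len(seq)]
    counts = {k: 0.0 for k in p}  # expected number of segments of each type
    for i in range(n):
        counts["b"] += F[i] * p["b"] * bg[seq[i]] * B[i + 1] / Z
        for k, pwm in pwms.items():
            w = len(pwm)
            if i + w <= n:
                counts[k] += F[i] * p[k] * site_prob(pwm, seq[i:i + w]) * B[i + w] / Z
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}, Z

def fit(seq, pwms, bg, n_iter=50):
    """Start from uniform probabilities and iterate EM a fixed number of times."""
    p = {k: 1.0 / (len(pwms) + 1) for k in list(pwms) + ["b"]}
    for _ in range(n_iter):
        p, _ = em_step(seq, pwms, bg, p)
    return p, forward_backward(seq, pwms, bg, p)[0][-1]
```

Because the forward and backward sums range over every parse, overlapping candidate sites and weak sites all contribute to the likelihood without any score threshold.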

12
Experiment setup
  • Input: weight matrices for 8 transcription
    factors constructed from 11 modules
  • Window size: 500 bp
  • 27 modules known to receive maternal/gap gene
    input

13
Results
  • 146 highly significant modules found
  • Of the 27 known modules:
    • 11 + 6 recovered
    • 3 lost when filtering for at least 3 different factors
    • 3 missed because they contain only other factors
    • 4 ranked very low (700)
  • For 15 novel predictions, one of the adjacent genes is
    patterned in the blastoderm

14
Estimation of positive rate
  • Scrambling the columns of the weight matrices yields
    half as many predictions: 50% false positive rate
  • (6 + 15) * 3 / (146 - 11): about 50% positive rate
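
The second estimate can be checked numerically if the expression is read as (6 + 15) * 3 / (146 - 11), on the assumption that the plus signs were lost when the slide was transcribed. The variable names below are illustrative labels inferred from the neighbouring slides, and the factor 3 is taken from the slide as-is without interpretation.

```python
# Assumed reading of the slide's expression; names are illustrative only.
recovered_known = 6     # known modules recovered beyond the 11 used to build the matrices
supported_novel = 15    # novel predictions next to a gene patterned in the blastoderm
factor = 3              # multiplier as written on the slide (meaning not stated there)
predictions = 146       # highly significant modules found
training_modules = 11   # modules the weight matrices were constructed from

rate = (recovered_known + supported_novel) * factor / (predictions - training_modules)
print(f"{rate:.2f}")    # 63 / 135
```

The result, 63 / 135, is a little under one half, consistent with the roughly 50% figure quoted on the slide.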

15
Experiment variations
  • Remove the least specific matrix (tailless) from
    the input
  • 75% of the predictions made without tailless are
    also present in the list of 146
  • Vary the window size to 700 bp
  • 58% of the list of 146 are also among the top 200
    of the 700 bp set
  • Interesting new predictions

16
Three levels of prior information
  1. Binding sites (motifs)
  2. Several related modules
  3. Only genome

17
Motivation
  • For most transcription factors, binding site
    information is rarely known
  • Modules obtained by experimental methods (e.g.
    promoter bashing) are more common

18
The method
  • Uses standard motif finders to recover weight
    matrices from input modules
  • Feed the motifs to Ahab to find similarly
    regulated genes

19
The method (cont.)
  • Gibbs sampler algorithm
  • Lawrence et al., "Detecting subtle sequence
    signals: a Gibbs sampling strategy for multiple
    alignment" (presented by Xin He)
  • Customizations:
    • Search for only one binding site at a time.
    • Mask only the central 1-2 bases of each motif
      before iterating.
    • - Results are more reproducible between runs.
    • - Motifs are allowed to overlap.
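
For orientation, a basic Gibbs site sampler can be sketched as below. This is the textbook one-site-per-sequence variant with a uniform background, not the customized sampler described on this slide: it does not mask central bases or search for successive motifs. Function names, the pseudocount, and the iteration count are assumptions of the sketch.

```python
import random

BASES = "ACGT"

def build_profile(sites, width, pseudocount=0.5):
    """Column-wise base frequencies of the aligned sites, with pseudocounts."""
    profile = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile

def gibbs_site_sampler(seqs, width, n_iter=200, seed=0):
    """Sample one site position per sequence; returns (positions, final profile)."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # Profile built from the current sites of all other sequences.
            others = [
                s[p:p + width]
                for j, (s, p) in enumerate(zip(seqs, pos))
                if j != i
            ]
            profile = build_profile(others, width)
            # Weight each candidate start by its likelihood ratio
            # against a uniform background (0.25 per base).
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    w *= profile[j][seq[start + j]] / 0.25
                weights.append(w)
            # Resample this sequence's site position in proportion to the weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    sites = [s[p:p + width] for s, p in zip(seqs, pos)]
    return pos, build_profile(sites, width)
```

The returned profile plays the role of a weight matrix that could then be handed to a module scanner such as Ahab.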

20
Experiment results
  • Testing on modules with known binding site
    information
  • Gibbs sampling predicts 30-50% of the sequence is
    covered by motifs
  • Gibbs motifs have higher specificity
  • Recovers half of the known motifs
  • Predicts several interesting new motifs

21
Experiment results (cont.)
  • Input: 3 modules receiving inputs from 6
    transcription factors
  • 6 highly significant weight matrices found:
    Kr, Kni, (Hb + Cad), and 3 new
  • Ahab finds 63 highly significant modules
    • 4 overlap with the input modules
    • 13 are contiguous to genes patterned in the
      blastoderm
  • Comparable positive rates

22
Three levels of prior information
  1. Binding sites (motifs)
  2. Several related modules
  3. Only genome

23
The Argos algorithm
  • Only uses the genome data (Unsupervised)
  • Motivation: Is the redundancy of binding sites
    inside modules strong enough to predict modules
    alone?
  • The first successful attempt to do this for a
    metazoan genome

24
The Argos algorithm
  • To determine whether a motif is locally
    overrepresented, score its frequency in the
    sequence against its expected frequency
    (according to the genome-wide background).
  • Enumerate all possible motifs of length 8.
  • Compute their frequency in the genome (background
    counts), allowing 2 mutations.
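
Counting a motif while tolerating mutations can be sketched as follows, assuming that "mutations" means substitutions (Hamming distance) and that a motif's count is the sum of the exact counts of all its neighbours. Both are assumptions of this sketch, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def kmer_counts(seq, k):
    """Exact counts of every k-mer occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def neighbors(motif, max_mismatches, alphabet="ACGT"):
    """All strings within Hamming distance max_mismatches of motif (itself included)."""
    result = {motif}
    for d in range(1, max_mismatches + 1):
        for positions in combinations(range(len(motif)), d):
            # At each chosen position, try every base that differs from the motif.
            choices = [[b for b in alphabet if b != motif[p]] for p in positions]
            for subs in product(*choices):
                chars = list(motif)
                for p, b in zip(positions, subs):
                    chars[p] = b
                result.add("".join(chars))
    return result

def fuzzy_count(motif, counts, max_mismatches=2):
    """Occurrences of motif in the counted sequence, allowing substitutions."""
    return sum(counts[m] for m in neighbors(motif, max_mismatches))
```

For length 8 and 2 substitutions each motif has 277 neighbours, so precomputing the exact k-mer table once and summing over neighbours keeps the enumeration of all 4^8 motifs manageable.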

25
The Argos algorithm (cont.)
  • Move a sliding window S over the genome
  • Compute each motif's local count c in S
  • Compute the motif's expected count from the
    background
  • Rank the motifs by their Poisson scores
  • The motifs are often related to each other
  • Greedily select the top motif and eliminate
    related ones (under shifts and up to 4 mutations)
  • Repeat until 5 motifs have been produced
  • Use the sum of the selected motifs' scores as the
    score for S
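
The window-scoring steps can be sketched as below. The slides do not give the exact formulas, so two choices here are assumptions: the Poisson score is taken as the negative log of the tail probability P(X >= c), and two motifs count as related when some shift aligns them with at most 4 mismatches, with each unaligned overhang position counted as a mismatch. All names are hypothetical.

```python
import math

def poisson_tail(c, lam):
    """P(X >= c) for X ~ Poisson(lam)."""
    if c <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(c):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(1.0 - cdf, 1e-300)  # floor avoids log(0) for extreme counts

def poisson_score(c, lam):
    """Higher means the local count c is more surprising given expectation lam."""
    return -math.log(poisson_tail(c, lam))

def related(a, b, max_mismatches=4):
    """Assumed relatedness rule: some shift aligns a and b with few mismatches."""
    n = len(a)
    for shift in range(-(n - 1), n):
        overlap = [(a[i], b[i - shift]) for i in range(n) if 0 <= i - shift < n]
        mismatches = abs(shift) + sum(x != y for x, y in overlap)
        if mismatches <= max_mismatches:
            return True
    return False

def score_window(local_counts, expected, n_motifs=5):
    """Greedily pick up to n_motifs mutually unrelated motifs and sum their scores.

    local_counts: motif -> count inside the window
    expected:     motif -> expected count from the background
    """
    ranked = sorted(
        local_counts,
        key=lambda m: poisson_score(local_counts[m], expected[m]),
        reverse=True,
    )
    chosen = []
    for motif in ranked:
        if all(not related(motif, c) for c in chosen):
            chosen.append(motif)
        if len(chosen) == n_motifs:
            break
    total = sum(poisson_score(local_counts[m], expected[m]) for m in chosen)
    return total, chosen
```

Computing the tail as one minus the cumulative sum loses precision for very small probabilities, which is acceptable for a sketch but would need a more careful tail computation at genome scale.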

26
Experiment results
  • For a certain set of modules, Argos recovers half
    of them: 50% false negative rate
  • For several genes with 15 known modules, Argos
    recovers 7 when looking over 10 kbp upstream of
    the translation start
  • Genome wide, roughly one module per gene

27
Experiment results