Computational detection of genomic cisregulatory modules applied to body patterning in the early Dro - PowerPoint PPT Presentation

About This Presentation

Title:

Computational detection of genomic cisregulatory modules applied to body patterning in the early Dro


Provided by: Tan

Learn more at: http://www.cs.uiuc.edu

Transcript and Presenter's Notes

Title: Computational detection of genomic cisregulatory modules applied to body patterning in the early Dro

1
Computational detection of genomic cis-regulatory
modules applied to body patterning in the early
Drosophila embryo

N. Rajewsky, M. Vergassola, U. Gaul and E. Siggia
Presented by Bin Tan

2
Cis-regulatory modules (CRM)

In higher eukaryotes, many genes show complex
spatio-temporal expression patterns.
The gene transcription regulation apparatus is
largely organized in the form of separable
cis-regulatory modules.
A module integrates inputs from several
transcription factors and regulates another
gene's expression, forming a regulatory network.

3
Structural features of modules

Hundreds of nucleotides in length
Contains binding sites for as many as 4-5
different transcription factors
Possibly multiple binding sites for the same
transcription factor
Certain combinations of binding sites
(correlations between different transcription
factors)

4
Why computational methods?

Purely experimental methods such as promoter
bashing are tedious.
It is easier to screen a modest list of
candidates suggested by a computational method.

5
About this paper

Uses data on body patterning of the early
Drosophila embryo
Makes statistically significant predictions of
regulatory modules using three different levels
of prior information
Binding sites (motifs)
Several related modules
Only genome

6
Three levels of prior information
1. Binding sites (motifs)
2. Several related modules
3. Only genome

7
The Ahab algorithm

Uses known binding sites (motifs) information
Scans the genome in windows
Scores each window according to how well the
sequence can be stochastically generated from the
motifs
Outputs windows with high ranks

8
Ahab features

(As compared to Mobydick)
Uses positional weight matrices as the motif
model
Introduces a local background to remove influence
from local variations in sequence composition
Allows binding sites to overlap
Allows weak binding sites to contribute to the
score
No parameter tuning (other than the window size)

9
Algorithm details

Background model: k-th order Markov chain (each
nucleotide depends only on the preceding k
nucleotides)
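
A minimal sketch of such a background model, assuming it is estimated by simple counting with a pseudocount. The function names `train_markov_background` and `background_log_likelihood`, and the pseudocount value, are illustrative assumptions and not taken from the paper.

```python
import math
from collections import defaultdict

def train_markov_background(seq, k, pseudocount=1.0):
    """Estimate P(base | preceding k bases) by counting in seq.

    Returns a function prob(context, base). The pseudocount is an
    assumption added so that unseen contexts still get a probability.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + 4 * pseudocount
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob

def background_log_likelihood(seq, k, prob):
    """Log-probability of seq under the chain (the first k bases are skipped)."""
    return sum(math.log(prob(seq[i - k:i], seq[i])) for i in range(k, len(seq)))
```

With k = 0 this reduces to single-nucleotide frequencies. A local background, as on the Ahab features slide, would train the same model on the window being scored rather than on the whole genome.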

10
Algorithm details (cont.)

Sequence: S = s1 s2 ...
Weight matrices: w1, w2, ... for motifs
Background: wb
Probabilistic generation of S:
Choose a motif or background wk (k = 1, 2, ..., b) with
probability pk
Sample a sequence according to wk and append it to
S
Repeat until S reaches a certain length
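
The generation procedure above can be sketched as follows. This is an illustration under simplifying assumptions: weight matrices are lists of per-position dictionaries, the background is a one-column matrix (zero-order, not the k-th order chain of the previous slide), and the output is trimmed to the requested length. The names `sample_from_pwm` and `generate_sequence` are hypothetical.

```python
import random

BASES = "ACGT"

def sample_from_pwm(pwm, rng):
    """Draw one base per column of the weight matrix."""
    return "".join(
        rng.choices(BASES, weights=[col[b] for b in BASES])[0] for col in pwm
    )

def generate_sequence(pwms, probs, length, seed=0):
    """Repeatedly pick matrix k with probability probs[k] and append a sample.

    pwms[-1] is assumed to be the one-column background matrix.
    """
    rng = random.Random(seed)
    parts, total = [], 0
    while total < length:
        k = rng.choices(range(len(pwms)), weights=probs)[0]
        piece = sample_from_pwm(pwms[k], rng)
        parts.append(piece)
        total += len(piece)
    # Trim in case the last motif overshoots the target length.
    return "".join(parts)[:length]
```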

11
Algorithm details (cont.)

Unknown parameters: p1, p2, ..., pb
Maximize the likelihood of S over these probabilities
Conjugate descent or EM algorithm
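
A rough sketch of the quantity being maximized, assuming the likelihood of S is the sum over all ways of cutting S into motif and background segments. It uses a forward recursion in log-space and, for one motif plus background, a crude grid search in place of the conjugate descent or EM step named on the slide. All names are illustrative, weight-matrix entries are assumed strictly positive, and the background is again simplified to a one-column matrix.

```python
import math

def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(seq, pwms, probs):
    """log P(seq), summed over all segmentations into motifs and background.

    log_z[i] holds the log-likelihood of the prefix seq[:i]. The last
    matrix is assumed to have width 1 so that every prefix can be reached.
    """
    log_z = [0.0] + [-math.inf] * len(seq)
    for i in range(1, len(seq) + 1):
        terms = []
        for pwm, p in zip(pwms, probs):
            width = len(pwm)
            if width <= i and log_z[i - width] > -math.inf:
                seg = sum(math.log(col[seq[i - width + j]])
                          for j, col in enumerate(pwm))
                terms.append(log_z[i - width] + math.log(p) + seg)
        log_z[i] = _logsumexp(terms)
    return log_z[-1]

def best_motif_weight(seq, motif, background, steps=99):
    """Grid search over p (motif) with 1 - p for the background."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda p: log_likelihood(seq, [motif, background], [p, 1 - p]))
```

A window rich in good matches to the motif drives the fitted motif probability up, which is what lets weak sites contribute to the window score.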

12
Experiment setup

Input: weight matrices for 8 transcription
factors, constructed from 11 modules
Window size: 500 bp
27 modules known to receive maternal/gap gene
input

13
Results

146 highly significant modules found
Of the 27 known modules:
11 + 6 recovered
3 lost when filtering for at least 3 different factors
3 missed because they contain only other factors
4 ranked very low (700)
For 15 novel predictions, one of the adjacent genes
is patterned in the blastoderm

14
Estimation of positive rate

Scrambling the columns in the weight matrices gives half
as many predictions - 50% false positive rate
(6 + 15) * 3 / (146 - 11) - about 50% positive rate

15
Experiment variations

Remove the least specific matrix (tailless) from
input
75 of the predictions without using tailless are
also present in the list of 146
Vary window size to 700bp
58 in the list of 146 are also among the top 200
of the 700bp set
Interesting new predictions

16
Three levels of prior information
1. Binding sites (motifs)
2. Several related modules
3. Only genome

17
Motivation

For most transcription factors, binding site
information is not known
Modules obtained by experimental methods (e.g.
promoter bashing) are more common

18
The method

Uses standard motif finders to recover weight
matrices from input modules
Feeds the motifs to Ahab to find similarly
regulated genes

19
The method (cont.)

Gibbs sampler algorithm
Lawrence et al. Detecting subtle sequence
signals: a Gibbs sampling strategy for multiple
alignment. (Presented by Xin He)
Customizations
Search for only one binding site at a time.
Mask only the central 1-2 bases of each motif
before iterating.
- Results are more reproducible between runs.
- Motifs are allowed to overlap.
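
For orientation, a minimal sketch of a plain Gibbs site sampler in the spirit of Lawrence et al., assuming one site per input sequence and a uniform background. It does not implement the customizations listed above (one motif at a time with central-base masking), and the names `build_pwm` and `gibbs_site_sampler` are hypothetical.

```python
import random

BASES = "ACGT"

def build_pwm(seqs, starts, width, skip=None, pseudo=0.5):
    """Column frequencies from the current sites, leaving out sequence `skip`."""
    counts = [{b: pseudo for b in BASES} for _ in range(width)]
    for idx, (s, st) in enumerate(zip(seqs, starts)):
        if idx == skip:
            continue
        for j in range(width):
            counts[j][s[st + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]

def gibbs_site_sampler(seqs, width, iterations=200, seed=0):
    """Resample each sequence's site in turn from the motif built on the others."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, s in enumerate(seqs):
            pwm = build_pwm(seqs, starts, width, skip=i)
            weights = []
            for st in range(len(s) - width + 1):
                w = 1.0
                for j in range(width):
                    # Odds of motif versus a uniform background (assumed).
                    w *= pwm[j][s[st + j]] / 0.25
                weights.append(w)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, build_pwm(seqs, starts, width)
```

The returned weight matrix is the kind of object that would then be passed on to Ahab.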

20
Experiment results

Testing on modules with known binding site
information
Gibbs sampling predicts 30-50% of the sequence is
covered by motifs
Gibbs motifs have higher specificity
Recovers half of the known motifs
Predicts several new interesting motifs

21
Experiment results (cont.)

Input: 3 modules receiving inputs from 6
transcription factors
6 highly significant weight matrices found
Kr, Kni, (Hb+Cad), plus 3 new
Ahab finds 63 highly significant modules
4 overlap with the input modules
13 are contiguous to genes patterned in the
blastoderm
Comparable positive rates

22
Three levels of prior information
1. Binding sites (motifs)
2. Several related modules
3. Only genome

23
The Argos algorithm

Only uses the genome data (Unsupervised)
Motivation: Is the redundancy of binding sites
inside modules strong enough to predict modules
alone?
The first successful attempt to do this for a
metazoan genome

24
The Argos algorithm

To determine whether a motif is locally
overrepresented: score its frequency in the
sequence against its expected frequency
(according to genome-wide background).
Enumerate all possible motifs of length 8.
Compute their frequency in the genome (background
counts), allowing 2 mutations

25
The Argos algorithm (cont.)

Move a sliding window S over the genome
Compute a motif's local count c in S
Compute the motif's expected count from
background
Rank the motifs by their Poisson scores
The motifs are often related to each other
Greedily select the top motif and eliminate
related ones (under shifts and up to 4 mutations)
Repeat until 5 motifs have been produced
Use the sum of the selected motifs' scores as the
score for S
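
The window-scoring steps above can be sketched as follows. Several pieces are assumptions made for illustration: the Poisson score is taken to be the negative log of the tail probability P(X >= c), local counts use exact matches only, related motifs are eliminated by Hamming distance without considering shifts, and `background_freq` is a hypothetical dictionary giving each motif's genome-wide frequency per position.

```python
import math
from collections import Counter

def poisson_score(count, expected, n_terms=200):
    """-log P(X >= count) for X ~ Poisson(expected), with expected > 0."""
    if count == 0:
        return 0.0
    term = math.exp(-expected + count * math.log(expected) - math.lgamma(count + 1))
    tail = 0.0
    for j in range(count, count + n_terms):  # truncated upper-tail sum
        tail += term
        term *= expected / (j + 1)
    return -math.log(max(tail, 1e-300))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def score_window(window, background_freq, width=8, n_motifs=5, max_related=4):
    """Sum of the top Poisson scores after greedily dropping related motifs."""
    n_pos = len(window) - width + 1
    counts = Counter(window[i:i + width] for i in range(n_pos))
    scored = sorted(
        ((poisson_score(c, background_freq.get(m, 1e-6) * n_pos), m)
         for m, c in counts.items()),
        reverse=True,
    )
    chosen = []
    for score, motif in scored:
        # Keep a motif only if it is far from every motif already chosen.
        if all(hamming(motif, prev) > max_related for _, prev in chosen):
            chosen.append((score, motif))
        if len(chosen) == n_motifs:
            break
    return sum(s for s, _ in chosen), [m for _, m in chosen]
```

Sliding this function over the genome and ranking windows by the returned sum gives the candidate list.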

26
Experiment results

For a certain set of modules, Argos recovers half
of them - 50% false negative rate
For several genes with 15 known modules, Argos
recovers 7 when looking over 10 kbp upstream of
translation start
Genome wide, roughly one module per gene

27
Experiment results
