Cis-regulatory module

Provided by: gcyuan

1
Cis-regulatory module
  • 10/24/07

2
TFs often work synergistically
(Harbison 2004)
3
Combinatorial control
4
λ phage
E. coli
lytic growth
lysogenic growth
(source Gary Kaiser)
5
λ operon
cro
cI
OR
6
λ operon
lysogenic growth
on
off
cro
cI
OR
7
λ operon
lytic growth
off
on
cro
cI
OR
OR1
OR2
OR3
8
λ operon
lysogenic
Pol II
cro
cI
9
Cis-regulatory module (CRM)
  • A CRM is a DNA segment, typically a few hundred
    base pairs in length and containing multiple
    binding sites, that recruits several cooperating
    factors to a particular genomic location
  • Ji and Wong (2006)

10
Statistical Methods
  • Predict modules when the motifs are known.
    (simpler)
  • LRA, by Wasserman and Fickett (1998)
  • Predict modules when the motifs also need to be
    discovered. (more difficult)
  • CisModule, by Zhou and Wong (2004)
  • EMCModule, by Gupta and Liu (2005)

11
LRA
12
LRA
Basic idea: true regulatory regions are likely to
have multiple motif sites.
Probability for being regulatory
13
LRA
Probability $p$ for being a regulatory region:
$\operatorname{logit}(p) = \beta_0 + \sum_i \beta_i x_i$
$\beta_i$: regression coefficient
$x_i$: highest motif matching score within a given
sequence
  • Training data contain a subset of known
    regulatory and control regions.
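
A minimal sketch of how such a model turns motif scores into a probability, assuming the standard logistic form. The function name, the coefficient values, and the scores are illustrative assumptions, not values from Wasserman and Fickett.

```python
import math

def regulatory_probability(scores, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    if len(scores) != len(coefficients):
        raise ValueError("need one coefficient per motif score")
    linear = intercept + sum(b * x for b, x in zip(coefficients, scores))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical inputs: the highest matching score of three motifs in one sequence,
# and made-up regression coefficients (in practice these are fitted to training data).
example_scores = [0.9, 0.2, 0.7]
example_coefficients = [2.0, 1.0, 1.5]
example_intercept = -2.5

p = regulatory_probability(example_scores, example_coefficients, example_intercept)
print(f"P(regulatory) = {p:.3f}")
```

A sequence with several strong motif matches gets a larger linear score and therefore a probability closer to 1.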

14
Application: skeletal-muscle gene regulation
  • 5 muscle-specific TFs are known
  • Mef-2, Myf, SRF, Tef, Sp-1
  • 29 regulatory regions are known.
  • Can we predict the regulatory regions just from
    sequence motif information?

15
Computational Procedure
  • Motif matrices are identified by Gibbs sampling
    using sequence information from the 29 regulatory
    regions.
  • For some TFs, motifs cannot be found by the de
    novo approach; literature motifs are used instead.
  • Top two matching scores for each TF are included
    as covariates.
  • Apply the LRA model. Use leave-one-out
    cross-validation to evaluate model performance.
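
The last two steps can be sketched as follows, on toy data only. The PWM, the eight sequences, and the plain gradient-ascent fit are assumptions made for illustration; the original study used motif matrices from Gibbs sampling or the literature and its own fitting procedure.

```python
import math

BASES = "ACGT"

def consensus_pwm(word, hit=0.85):
    """Hypothetical helper: a PWM favoring `word`, one ACGT probability row per position."""
    miss = (1.0 - hit) / 3.0
    return [[hit if base == target else miss for base in BASES] for target in word]

def top_two_scores(sequence, pwm, background=0.25):
    """Two highest log-odds PWM scores over all windows of the sequence."""
    width = len(pwm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scores.append(sum(math.log(pwm[i][BASES.index(b)] / background)
                          for i, b in enumerate(window)))
    return sorted(scores, reverse=True)[:2]

def predict(weights, row):
    """Logistic probability; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, steps=500, rate=0.05):
    """Fit logistic regression by plain gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = y - predict(weights, row)
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w + rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def leave_one_out_accuracy(rows, labels):
    """Hold out each example in turn, train on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(rows)):
        weights = fit_logistic(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        guess = 1 if predict(weights, rows[i]) >= 0.5 else 0
        correct += guess == labels[i]
    return correct / len(rows)

# Toy data: "regulatory" sequences carry the made-up motif TATA, controls do not.
toy_pwm = consensus_pwm("TATA")
toy_sequences = ["GGTATACC", "CTATAGGC", "TATATATA", "ACGTATAG",
                 "GGGGCCCC", "CCCGGGCC", "GCGCGCGC", "CGGCCGGC"]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_rows = [top_two_scores(s, toy_pwm) for s in toy_sequences]
print("leave-one-out accuracy:", leave_one_out_accuracy(toy_rows, toy_labels))
```

With several TFs, the covariate row for a sequence would simply concatenate the top two scores of each motif.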

16
Results
  • Single motifs are highly non-specific.
  • Simple multi-site analysis improves specificity
    at the cost of reduced sensitivity.

17
Results
  • Single motifs are highly non-specific.
  • Simple multi-site analysis improves specificity
    at the cost of reduced sensitivity.

18
Results
  • Single motifs are highly non-specific.
  • Simple multi-site analysis improves specificity
    at the cost of reduced sensitivity.
  • Logistic regression further improves specificity
    at a smaller cost in sensitivity.

19
Limitations of LRA
  • Motifs must be known in advance.
  • When known regulatory sequences are few, it is
    difficult to identify motifs by using traditional
    methods.
  • Objective: integrate motif discovery and module
    finding in a single statistical model.

20
De novo module identification
  • Two tasks
  • Identify TF motifs
  • Identify CRMs.

21
Why a module approach can help motif discovery
  • Due to poor specificity, a short sequence can be
    enriched simply by chance.
  • The probability for random matches is much
    smaller for motif co-occurrence.
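
A rough back-of-the-envelope calculation of this effect. The motif length (6), window length (200), uniform base frequencies, exact-match criterion, and independence between positions and between motifs are all simplifying assumptions chosen for illustration.

```python
def chance_of_match(motif_length, window_length, base_prob=0.25):
    """Approximate chance that a fixed word occurs at least once in a random window.

    Treats each start position as an independent trial, which is an approximation.
    """
    per_position = base_prob ** motif_length
    positions = window_length - motif_length + 1
    return 1.0 - (1.0 - per_position) ** positions

# One 6-mer appearing by chance somewhere in a 200 bp window.
single = chance_of_match(motif_length=6, window_length=200)
# Two different 6-mers both appearing by chance in the same window (assumed independent).
pair = single * chance_of_match(motif_length=6, window_length=200)

print(f"one motif by chance:  {single:.4f}")
print(f"two motifs by chance: {pair:.4f}")
```

Under these assumptions the chance of a spurious co-occurrence is the square of the single-motif chance, so requiring two motifs in the same window cuts random hits by more than an order of magnitude.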

22
cisModule
  • Basic idea
  • A two-level hierarchical mixture model (HMx).
  • Level 1: modules → sequences

(Zhou and Wong 2004)
23
cisModule
  • Basic idea
  • A two-level hierarchical mixture model (HMx).
  • Level 1: modules → sequences
  • Level 2: motifs → modules

(Zhou and Wong 2004)
24
HMx Model as a Stochastic Process
  • Treat the HMx model as a stochastic machine that
    generates sequences.
  • Starting from the first sequence position, make a
    series of random decisions: either initiate a
    module of length l or generate a letter from the
    background model.
  • Inside a module, if a site for the kth motif is
    initiated at position n, generate w_k letters
    from its PWM and place them at positions
    n, ..., n + w_k - 1; otherwise generate a letter
    from the background.
  • After reaching the end of the current module,
    decide whether to sample from the background or
    to initiate a new module.
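
The generative process above can be simulated with a short sketch. The parameter names (`module_prob`, `site_probs`), the toy PWMs, and the rule that a motif which does not fit in the rest of the module falls back to a background letter are assumptions for illustration, not the parameterization of Zhou and Wong (2004).

```python
import random

BASES = "ACGT"

def consensus_pwm(word, hit=0.85):
    """Hypothetical helper: a PWM favoring `word`, one ACGT probability row per position."""
    miss = (1.0 - hit) / 3.0
    return [[hit if base == target else miss for base in BASES] for target in word]

def draw(rng, probs):
    """Draw one base according to a row of ACGT probabilities."""
    return rng.choices(BASES, weights=probs, k=1)[0]

def generate_sequence(length, module_length, pwms, module_prob, site_probs, background, rng):
    """Return (sequence, labels): 'b' = background outside modules,
    'm' = background inside a module, digit k = letter emitted by motif k."""
    letters, labels = [], []
    outcomes = list(range(len(pwms) + 1))          # last index means "background letter"
    weights = list(site_probs) + [1.0 - sum(site_probs)]
    pos = 0
    while pos < length:
        # Level 1: decide whether to initiate a module here.
        if pos + module_length <= length and rng.random() < module_prob:
            end = pos + module_length
            while pos < end:
                # Level 2: inside the module, start a motif site or emit background.
                k = rng.choices(outcomes, weights=weights, k=1)[0]
                if k < len(pwms) and pos + len(pwms[k]) <= end:
                    for column in pwms[k]:
                        letters.append(draw(rng, column))
                        labels.append(str(k))
                    pos += len(pwms[k])
                else:
                    letters.append(draw(rng, background))
                    labels.append("m")
                    pos += 1
        else:
            letters.append(draw(rng, background))
            labels.append("b")
            pos += 1
    return "".join(letters), "".join(labels)

toy_pwms = [consensus_pwm("TATA"), consensus_pwm("CGCGC")]
sequence, labels = generate_sequence(
    length=80, module_length=20, pwms=toy_pwms, module_prob=0.05,
    site_probs=[0.1, 0.1], background=[0.25] * 4, rng=random.Random(0))
print(sequence)
print(labels)
```

Inference runs this process in reverse: given only the sequences, it recovers the hidden labels and the PWMs.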

(Zhou and Wong 2004)
25
Model inference: Gibbs sampling
given model parameters, update module/motif
locations
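
To illustrate the alternating update, here is a much simpler stand-in: a classic single-motif Gibbs site sampler with exactly one site per sequence and no module layer. It is not the CisModule sampler; the function names, toy sequences, pseudocount, and uniform background are all assumptions.

```python
import random

BASES = "ACGT"

def estimate_pwm(sequences, starts, width, skip, pseudocount=1.0):
    """Column-wise base frequencies from the current sites, leaving out sequence `skip`."""
    counts = [[pseudocount] * 4 for _ in range(width)]
    for idx, (seq, start) in enumerate(zip(sequences, starts)):
        if idx == skip:
            continue
        for offset in range(width):
            counts[offset][BASES.index(seq[start + offset])] += 1
    return [[c / sum(col) for c in col] for col in counts]

def sample_start(seq, pwm, rng, background=0.25):
    """Draw a site start with probability proportional to its PWM/background likelihood ratio."""
    width = len(pwm)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for offset in range(width):
            ratio *= pwm[offset][BASES.index(seq[start + offset])] / background
        weights.append(ratio)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def gibbs_site_sampler(sequences, width, sweeps, rng):
    """Alternate: re-estimate the motif from the other sites, then resample one site location."""
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(sweeps):
        for i, seq in enumerate(sequences):
            pwm = estimate_pwm(sequences, starts, width, skip=i)
            starts[i] = sample_start(seq, pwm, rng)
    return starts

# Toy sequences, each containing the made-up motif TATA at a different offset.
toy_sequences = ["GGCTATACG", "TATAGGCCG", "CCGGCTATA", "GCTATAGCC"]
found = gibbs_site_sampler(toy_sequences, width=4, sweeps=50, rng=random.Random(0))
print(found)
```

Because the sampler is stochastic, different seeds can settle on different site sets, which is why several runs from new seeds are usually compared.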
26
A numerical experiment
  • Merge the 29 regulatory regions with a set of
    sequences randomly selected from ENSEMBL
    promoters.
  • Test the ability of cisModule to identify motifs
    in a noisy environment.

27
Results
28
Limitations of CisModule
  • The module length and the number of motifs must
    be provided externally.
  • Convergence can be slow. Multiple runs are
    needed, each starting from a new seed.
  • It assumes that combinations of different motifs
    are independent.

29
EMCModule
  • Gupta and Liu (2005) developed a similar approach
    called EMCModule.
  • Main differences
  • They use the collection of literature motifs as
    initial seeds for motif discovery.
  • Their method improves the convergence speed.
  • Their definition of CRMs is a little different:
    the number of motifs is fixed within one module,
    but the order of and distance between different
    motifs can vary.

30
Further issues
  • A comparative genomic approach can also be
    incorporated into module discovery (Zhou and
    Wong 2007).
  • The modules identified by these methods can be
    viewed as belonging to one type. New methods
    need to be developed to discover multiple module
    types.
  • While a module-based approach is helpful for
    finding cooperative motifs, it may hurt discovery
    of single motifs.

31
(Yuh et al. 1998)
32
(Yuh et al. 1998)
33
(Yuh et al. 1998)
34
(Yuh et al. 1998)
35
Reading List
  • Wasserman and Fickett (1998)
  • LRA. One of the first works on cis-regulatory
    modules.
  • Zhou and Wong (2004)
  • cisModule. A statistical method to identify
    cis-regulatory modules without prior knowledge
    of the motifs.
  • Yuh et al. (1998)
  • An influential biological paper on how
    information can be integrated from different
    modules to regulate gene expression.