MotifBooster - PowerPoint PPT Presentation
1
MotifBooster: A Boosting Approach for
Constructing TF-DNA Binding Classifiers
Pengyu Hong, 10/06/2005
2
Motivation
  • Understand transcriptional regulation

(Figure: a TF bound upstream of Gene X)
  • Model transcriptional regulatory networks

3
Motivation
Previous work on motif finding
  • AlignACE (Hughes et al 2000)
  • ANN-Spec (Workman et al 2000)
  • BioProspector (Liu et al 2001)
  • Consensus (Hertz et al 1999)
  • Gibbs Motif Sampler (Lawrence et al 1993)
  • LogicMotif (Keles et al 2004)
  • MDScan (Liu et al 2002)
  • MEME (Bailey and Elkan 1995)
  • Motif Regressor (Conlon et al 2003)

4
Motivation
A widely used model: the motif weight matrix
(Stormo et al 1982)
Position    1      2      3      4      5      6      7      8
A         0.19   1.11  -0.17   1.65  -2.65  -2.66  -1.98   0.92
C        -0.14  -0.49   1.89  -1.81   1.70   2.32   2.14  -2.07
G        -1.39   0.25  -1.22  -1.07  -2.07  -2.07  -2.07   1.13
T         0.86  -1.39  -2.65  -2.65   0.41  -2.65  -1.16  -1.80

Score of the site: 10.84, vs. threshold.

A sequence is a target if it contains a binding
site (score > threshold).
Computational cost << molecular (experimental) cost.
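The weight-matrix scoring just described can be sketched in Python. This is a minimal sketch: the matrix values come from the slide, while the example site and threshold are illustrative.

```python
# Position weight matrix from the slide: log-odds scores for an
# 8-bp motif, one row per base (A, C, G, T).
PWM = {
    "A": [0.19, 1.11, -0.17, 1.65, -2.65, -2.66, -1.98, 0.92],
    "C": [-0.14, -0.49, 1.89, -1.81, 1.70, 2.32, 2.14, -2.07],
    "G": [-1.39, 0.25, -1.22, -1.07, -2.07, -2.07, -2.07, 1.13],
    "T": [0.86, -1.39, -2.65, -2.65, 0.41, -2.65, -1.16, -1.80],
}

def site_score(site):
    """Sum the per-position weights for an 8-bp candidate site."""
    return sum(PWM[base][i] for i, base in enumerate(site))

def is_target(sequence, threshold):
    """A sequence is a target if any window scores above the threshold."""
    w = len(PWM["A"])
    return any(site_score(sequence[i:i + w]) > threshold
               for i in range(len(sequence) - w + 1))

# The best-scoring site under this matrix is TACACCCG.
print(round(site_score("TACACCCG"), 2))  # -> 12.8
```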
5
Motivation
Non-linear binding effects, e.g., different
binding modes.

Preferred binding:
  • Mode 1: CACCCATACAT
  • Mode 2: CATCCGTACAT

Non-preferred binding:
  • Mode 3: CACCCGTACAT
  • Mode 4: CATCCATACAT

All four share the degenerate consensus CA[C/T]CC[A/G]TACAT.
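All four modes match the same degenerate consensus, which is exactly why a single linear model struggles to separate preferred from non-preferred binding. A quick check (site strings taken from the slide; the mode labels are annotations):

```python
import re

# Degenerate consensus from the slide: CA[C/T]CC[A/G]TACAT.
consensus = re.compile(r"CA[CT]CC[AG]TACAT")

modes = {
    "mode 1 (preferred)":     "CACCCATACAT",
    "mode 2 (preferred)":     "CATCCGTACAT",
    "mode 3 (non-preferred)": "CACCCGTACAT",
    "mode 4 (non-preferred)": "CATCCATACAT",
}

# All four sites match the consensus, so a position-independent model
# cannot tell preferred from non-preferred modes: the preference
# depends on the combination of bases at the two degenerate positions.
for name, site in modes.items():
    print(name, bool(consensus.fullmatch(site)))
```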
6
Modeling
Model a TF-DNA binding classifier as an ensemble
model, a weighted sum of base classifiers:

F(S) = Σm αm qm(S)

where qm is the mth base classifier and αm is its weight.
7
Modeling
The mth base classifier qm
Sequence scoring function: combines the site scores
fm(sik), where fm is a site scoring function (weight
matrix + threshold) applied to the kth candidate site of Si.
The scoring function considers (a) the number of
matching sites and (b) the degree of matching.
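The transcript does not give the exact functional form, so the following is only one plausible instantiation: a sequence score that grows with both the number of above-threshold sites and each site's margin over the threshold.

```python
def sequence_score(seq, site_score, width, threshold):
    """Hypothetical sequence scoring function: sum the above-threshold
    margins of all candidate sites, so both (a) the number of matching
    sites and (b) the degree of matching raise the score."""
    total = 0.0
    for i in range(len(seq) - width + 1):
        s = site_score(seq[i:i + width])  # site scoring function f_m
        if s > threshold:
            total += s - threshold        # degree of matching
    return total
```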
8
Training Boosting
Modify the confidence-rated boosting (CRB)
algorithm (Schapire et al. 1999) to train
ensemble models
9
Why Boosting?
Boosting is a Newton-like technique that
iteratively adds base classifiers to minimize the
upper bound on the training error.
(Schapire et al. 1998)
10
Challenges
Positive sequences: targets of a TF
Negative sequences: non-targets of the TF
  • Sequences are labeled, but not the sites in the
    sequences.
  • The two classes cannot be well separated by the
    (linear) weight matrix model.
  • Number of negative sequences >> number of
    positive sequences.

11
Boosting
Initialization
(Figure: positive and negative training samples with initial weights)
  • Total weight of the positive samples = total
    weight of the negative samples.
  • Since the motif must be an enriched pattern in
    the positive sequences, use Motif Regressor to
    find a seed motif matrix W0.
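The balanced initialization above can be sketched as follows (a minimal sketch assuming ±1 labels; the function name is illustrative):

```python
def init_weights(labels):
    """Initialize sample weights so that the total weight of positive
    samples equals the total weight of negative samples.
    Labels are +1 (positive sequence) or -1 (negative sequence)."""
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    # Each class receives half of the total weight mass.
    return [0.5 / n_pos if y > 0 else 0.5 / n_neg for y in labels]
```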

12
Boosting
Train a base classifier (BC)
(Figure: training samples with base classifier BC 1)
  • Use the seed matrix W0 to initialize the mth
    base classifier qm and set αm = 1.
  • Refine αm and the parameters of qm to minimize
    the weighted exponential loss

Σi dim exp(-yi αm qm(Si))

where yi is the label of Si and dim is the weight
of Si in the mth round.
  • Negative information is explicitly used to train
    qm and αm.
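Assuming the CRB-style weighted exponential loss described above (the exact objective is not spelled out in the transcript), the quantity refined in each round can be computed as:

```python
import math

def weighted_exp_loss(d, y, alpha, q_out):
    """Weighted exponential loss: sum_i d_i * exp(-y_i * alpha * q(S_i)).
    d: per-sample weights, y: +1/-1 labels, q_out: base-classifier outputs.
    Refining alpha and the base classifier drives this loss down."""
    return sum(di * math.exp(-yi * alpha * qi)
               for di, yi, qi in zip(d, y, q_out))
```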

13
Boosting
Adjust the sample weights, giving higher weights to
previously misclassified samples.
(Figure: reweighted samples with base classifier BC 1)
  • yi is the label of Si.
  • dim is the weight of Si in the mth round.
  • di,m+1 is the new weight of Si.
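This reweighting step can be sketched in the usual AdaBoost/CRB style (a sketch under the assumption of ±1 labels and the exponential update; the normalization constant is standard but not stated in the transcript):

```python
import math

def update_weights(d, y, alpha, q_out):
    """Reweighting: d_{i,m+1} proportional to
    d_{i,m} * exp(-y_i * alpha_m * q_m(S_i)), renormalized to sum to 1.
    Misclassified samples (y_i * q_m(S_i) < 0) gain weight."""
    new = [di * math.exp(-yi * alpha * qi)
           for di, yi, qi in zip(d, y, q_out)]
    z = sum(new)  # normalization constant
    return [w / z for w in new]
```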
14
Boosting
Add a new base classifier
(Figure: samples with base classifiers BC 1 and BC 2)
15
Boosting
Add a new base classifier
(Figure: samples with the combined decision boundary)
16
Boosting
Adjust sample weights again
(Figure: reweighted samples with the current decision boundary)
17
Boosting
Add one more base classifier
(Figure: samples with base classifier BC 3)
18
Boosting
Add one more base classifier
(Figure: samples with the updated decision boundary)
19
Boosting
  • Stop if the result is perfect or the performance
    on the internal validation sequences drops.

(Figure: final decision boundary on the training samples)
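The outer loop with this stopping rule might look like the following sketch (`train_base` and `evaluate` are hypothetical callbacks, not part of the original method description):

```python
def boost(train_base, evaluate, max_rounds):
    """Hypothetical outer loop: keep adding base classifiers until
    training is perfect or validation performance drops (early stopping)."""
    ensemble = []
    best_val = float("-inf")
    for m in range(max_rounds):
        ensemble.append(train_base(ensemble))
        train_acc, val_acc = evaluate(ensemble)
        if train_acc >= 1.0:      # perfect on the training data
            break
        if val_acc < best_val:    # validation performance drops
            ensemble.pop()        # discard the harmful classifier
            break
        best_val = val_acc
    return ensemble
```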
20
Results
Data: ChIP-chip data of Saccharomyces cerevisiae
(Lee et al. 2002)
  • Positive sequences: p-value < 0.001; number of
    positive sequences ≥ 25.
  • Negative sequences: p-value ≥ 0.05; ratio ≥ 1.

This yielded 40 TFs.
21
Results
Leave-one-out test results: boosted models vs. seed weight matrices.
Vertical axis: improvement in specificity; horizontal axis: TFs.
22
Results
Capture Position-Correlation

RAP1
(Figure: RAP1 motifs for the seed weight matrix, base classifiers 1-3, and the boosted model)
23
Results
Capture Position-Correlation
REB1
(Figure: REB1 motifs for the seed weight matrix, base classifiers 1-2, and the boosted model)