Title: MotifBooster
Slide 1: MotifBooster: A Boosting Approach for Constructing TF-DNA Binding Classifiers
Pengyu Hong, 10/06/2005
Slide 2: Motivation
- Understand transcriptional regulation
[Figure: a TF binding upstream of Gene X]
- Model transcriptional regulatory networks
Slide 3: Motivation
Previous work on motif finding:
- AlignACE (Hughes et al 2000)
- ANN-Spec (Workman et al 2000)
- BioProspector (Liu et al 2001)
- Consensus (Hertz et al 1999)
- Gibbs Motif Sampler (Lawrence et al 1993)
- LogicMotif (Keles et al 2004)
- MDScan (Liu et al 2002)
- MEME (Bailey and Elkan 1995)
- Motif Regressor (Conlon et al 2003)
Slide 4: Motivation
A widely used model: the motif weight matrix (Stormo et al 1982)
Pos:    1      2      3      4      5      6      7      8
A:    0.19   1.11  -0.17   1.65  -2.65  -2.66  -1.98   0.92
C:   -0.14  -0.49   1.89  -1.81   1.70   2.32   2.14  -2.07
G:   -1.39   0.25  -1.22  -1.07  -2.07  -2.07  -2.07   1.13
T:    0.86  -1.39  -2.65  -2.65   0.41  -2.65  -1.16  -1.80
Score of the site: 10.84, compared against a threshold.
A sequence is a target if it contains a binding site (score > threshold).
Computational << Molecular
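The weight-matrix scoring on this slide (sum the per-position entries for each base of a candidate site, then compare against a threshold) can be sketched as follows. The matrix values are the ones in the table above; the scanned sequence and threshold below are hypothetical examples, not the slide's 10.84 case.

```python
# Weight matrix from the slide: rows are bases, columns are positions 1-8.
WEIGHT_MATRIX = {
    "A": [0.19, 1.11, -0.17, 1.65, -2.65, -2.66, -1.98, 0.92],
    "C": [-0.14, -0.49, 1.89, -1.81, 1.70, 2.32, 2.14, -2.07],
    "G": [-1.39, 0.25, -1.22, -1.07, -2.07, -2.07, -2.07, 1.13],
    "T": [0.86, -1.39, -2.65, -2.65, 0.41, -2.65, -1.16, -1.80],
}

def score_site(site):
    """Sum the per-position scores of an 8-bp candidate site."""
    return sum(WEIGHT_MATRIX[base][pos] for pos, base in enumerate(site))

def best_site_score(sequence, width=8, threshold=0.0):
    """Scan every window; the sequence is a target if any window's
    score exceeds the threshold (the linear model on this slide)."""
    best = max(score_site(sequence[i:i + width])
               for i in range(len(sequence) - width + 1))
    return best, best > threshold
```

Because the model is a single linear score per window, one threshold decides all sites, which is exactly the limitation the next slide addresses.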
Slide 5: Motivation
Non-linear binding effects, e.g., different binding modes.
Consensus: CA[C/T]CC[A/G]TACAT
Preferred binding:
- Mode 1: CACCCATACAT
- Mode 2: CATCCGTACAT
Non-preferred binding:
- Mode 3: CACCCGTACAT
- Mode 4: CATCCATACAT
Slide 6: Modeling
Model a TF-DNA binding classifier as an ensemble model:
F(S) = Σ_m α_m q_m(S)
where q_m is the mth base classifier, α_m is its weight, and F is the ensemble model.
Slide 7: Modeling
The mth base classifier q_m is built on a sequence scoring function, where f_m(s_ik) is a site scoring function (a weight matrix with a threshold).
The sequence scoring function considers (a) the number of matching sites and (b) the degree of matching.
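The slide states only the two criteria, not the exact formula, so the sketch below is one hedged reading: a sequence score that grows both with how many sites pass the threshold and with how strongly each one passes it, by summing each site's excess over the threshold.

```python
def sequence_score(site_scores, threshold):
    """One possible sequence scoring function (an assumption, not the
    paper's exact formula): every site whose score f_m(s_ik) exceeds the
    threshold contributes its excess, so both the number of matching
    sites and their degree of matching raise the sequence score."""
    return sum(s - threshold for s in site_scores if s > threshold)
```

Under this reading, two weak matches can outscore one strong match, which a single best-site threshold rule cannot express.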
Slide 8: Training: Boosting
Modify the confidence-rated boosting (CRB) algorithm (Schapire et al. 1999) to train ensemble models.
Slide 9: Why Boosting?
Boosting is a Newton-like technique that iteratively adds base classifiers to minimize an upper bound on the training error (Schapire et al. 1998).
Slide 10: Challenges
Positive sequences: targets of a TF. Negative sequences: non-targets.
- Sequences are labeled, but the sites within the sequences are not.
- The two classes cannot be well separated by the (linear) weight matrix model.
- Number of negative sequences >> number of positive sequences.
Slide 11: Boosting
Initialization:
- Total weight of the positive samples = total weight of the negative samples.
- Since the motif must be an enriched pattern in the positive sequences, use Motif Regressor to find a seed motif matrix W0.
[Figure: positive and negative samples]
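The balanced initialization above can be sketched in a few lines. Splitting each class's half of the weight uniformly across its samples is an assumption; the slide only requires the two class totals to be equal.

```python
def init_weights(labels):
    """Give each class total weight 0.5, spread uniformly within the
    class (uniform split is an assumption), so the many negatives
    cannot dominate the few positives."""
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    return [0.5 / n_pos if y > 0 else 0.5 / n_neg for y in labels]
```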
Slide 12: Boosting
Train a base classifier (BC):
- Use the seed matrix W0 to initialize the mth base classifier q_m(θ_m) and set the weight α_m = 1.
- Refine α_m and the parameters θ_m of q_m to minimize Z_m = Σ_i d_i^m exp(-y_i α_m q_m(S_i)), where y_i is the label of S_i and d_i^m is the weight of S_i in the mth round.
- Negative information is explicitly used to train q_m(θ_m) and α_m.
[Figure: positive and negative samples; base classifier BC 1]
Slide 13: Boosting
Adjust the sample weights, giving higher weights to previously misclassified samples:
d_i^{m+1} ∝ d_i^m exp(-y_i α_m q_m(S_i))
- y_i is the label of S_i.
- d_i^m is the weight of S_i in the mth round; d_i^{m+1} is the new weight.
[Figure: positive and negative samples; base classifier BC 1]
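The confidence-rated weight update described on this slide can be sketched directly: samples the current base classifier gets wrong (y_i and q_m(S_i) disagree in sign) have their weights multiplied by a factor greater than one, then the weights are renormalized.

```python
import math

def update_weights(weights, labels, scores, alpha):
    """d_i^{m+1} ∝ d_i^m * exp(-alpha * y_i * q_m(S_i)), renormalized.
    `scores` holds the base classifier outputs q_m(S_i); `labels` holds
    the y_i in {-1, +1}. Misclassified samples get larger new weights."""
    new = [d * math.exp(-alpha * y * q)
           for d, y, q in zip(weights, labels, scores)]
    z = sum(new)  # normalization factor Z_m
    return [d / z for d in new]
```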
Slide 14: Boosting
Add a new base classifier.
[Figure: positive and negative samples with base classifiers BC 1 and BC 2]
Slide 15: Boosting
Add a new base classifier.
[Figure: positive and negative samples with the updated decision boundary]
Slide 16: Boosting
Adjust the sample weights again.
[Figure: positive and negative samples with the decision boundary]
Slide 17: Boosting
Add one more base classifier.
[Figure: positive and negative samples with base classifier BC 3]
Slide 18: Boosting
Add one more base classifier.
[Figure: positive and negative samples with the updated decision boundary]
Slide 19: Boosting
- Stop if the result is perfect or the performance on the internal validation sequences drops.
[Figure: positive and negative samples with the final decision boundary]
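The training procedure walked through on the boosting slides (balanced initialization, round-by-round base classifier fitting, weight updates, validation-based stopping) can be sketched as one loop. The helper names `train_base_classifier` and `evaluate` are hypothetical placeholders for the paper's inner steps, not its API.

```python
import math

def boost(samples, labels, train_base_classifier, evaluate, max_rounds=50):
    """Sketch of the boosting loop: train_base_classifier(samples,
    labels, weights) -> (q, alpha) and evaluate(ensemble) -> validation
    score are assumed callables, not the paper's actual functions."""
    # initialization: total positive weight equals total negative weight
    n_pos = sum(1 for y in labels if y > 0)
    n_neg = len(labels) - n_pos
    weights = [0.5 / n_pos if y > 0 else 0.5 / n_neg for y in labels]
    ensemble, best_val = [], -float("inf")
    for m in range(max_rounds):
        q, alpha = train_base_classifier(samples, labels, weights)
        ensemble.append((alpha, q))
        # up-weight misclassified samples, then renormalize
        weights = [d * math.exp(-alpha * y * q(s))
                   for d, y, s in zip(weights, labels, samples)]
        z = sum(weights)
        weights = [d / z for d in weights]
        val = evaluate(ensemble)
        if val < best_val:   # stop when validation performance drops
            ensemble.pop()
            break
        best_val = val
        if val == 1.0:       # or when the result is perfect
            break
    return ensemble
```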
Slide 20: Results
Data: ChIP-chip data of Saccharomyces cerevisiae (Lee et al. 2002)
- Positive sequences: binding p-value < 0.001; number of positive sequences ≥ 25.
- Negative sequences: p-value ≥ 0.05; negative-to-positive ratio ≈ 1.
This yielded 40 TFs.
Slide 21: Results
Leave-one-out test results: boosted models vs. seed weight matrices.
[Figure: vertical axis shows the improvement in specificity; horizontal axis lists the TFs]
Slide 22: Results
Capturing position correlation: RAP1
[Figure: motif representations of the seed weight matrix, base classifiers 1-3, and the boosted model]
Slide 23: Results
Capturing position correlation: REB1
[Figure: motif representations of the seed weight matrix, base classifiers 1-2, and the boosted model]