Title: Description and Recognition of Regulatory DNA Sequences
Rainer Pudimat, Bioinformatics, FSU Jena
1. Project Objectives
- Examining discriminating properties of regulatory DNA sequences
- Applying alternative model approaches for description and recognition
  - Local: Bayesian networks as binding site models
  - Global: decision trees for explaining local matches
[Figure: gene with promoter, introns and exons; labels: architectural proteins, Sp1, TATA box]
2. The 2-Step Annotation Tool
Input: mammalian upstream sequence
[Figure: scan of the input sequence; labels: E12, Sp1, MyoD, MEF-2, TBP, SRF, Score]
Step 1: Scanning with Bayesian networks
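The scanning step can be illustrated with a minimal sketch. It assumes a chain-structured network in which each position of a binding site depends only on the preceding position, and it scores each window by its log-odds against a uniform background. The model, its probabilities, the helper names (`chain_log_prob`, `scan`, `table`) and the test sequence are all hypothetical; the slides do not specify the network structure or the scoring function used by the tool.

```python
import math

BASES = "ACGT"

def chain_log_prob(window, first, trans):
    """Log-probability of a window under a chain-structured network:
    P(x1) * prod_i P(x_i | x_{i-1}), with position-specific tables."""
    logp = math.log(first[window[0]])
    for i in range(1, len(window)):
        logp += math.log(trans[i - 1][window[i - 1]][window[i]])
    return logp

def scan(sequence, first, trans, background=0.25):
    """Slide the model over the sequence; return (offset, log-odds) pairs."""
    width = len(trans) + 1
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        log_odds = chain_log_prob(window, first, trans) - width * math.log(background)
        scores.append((start, log_odds))
    return scores

def table(preferred, strong=0.85):
    """Hypothetical distribution that favours one base."""
    weak = (1.0 - strong) / 3
    return {b: (strong if b == preferred else weak) for b in BASES}

uniform = {b: 0.25 for b in BASES}

# Hypothetical 3-position toy model that prefers the word "TAT".
first = table("T")
trans = [
    # Position 2 given position 1: prefers A regardless of the previous base.
    {prev: table("A") for prev in BASES},
    # Position 3 given position 2: prefers T only if position 2 was A.
    {prev: (table("T") if prev == "A" else uniform) for prev in BASES},
]

hits = scan("GGCTATAAAGG", first, trans)
best_offset, best_score = max(hits, key=lambda h: h[1])
```

Here `best_offset` marks the window that the toy model rates highest; in a real scan, one such score track would be produced per binding site model.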
3. Bayesian Network Classifiers
- Directed acyclic graph (DAG)
- Nodes: random variables
- Edges: probabilistic dependencies
- Each node contains a conditional probability distribution
- The network represents the joint distribution of the variables
[Figure: example network with nodes A, B, C, D]
$P(a,b,c,d) = P(a)\,P(b \mid a)\,P(c \mid a)\,P(d \mid a,c)$
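The factorization built up on this slide, P(a,b,c,d) = P(a) P(b|a) P(c|a) P(d|a,c), can be checked numerically. The sketch below assumes binary variables and uses made-up conditional probability tables; only the factorization itself comes from the slide.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
p_a = {True: 0.3, False: 0.7}                      # P(A = a)
p_b_given_a = {True: 0.9, False: 0.2}              # P(B = True | a)
p_c_given_a = {True: 0.6, False: 0.1}              # P(C = True | a)
p_d_given_ac = {(True, True): 0.95, (True, False): 0.5,
                (False, True): 0.4, (False, False): 0.05}  # P(D = True | a, c)

def bernoulli(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d):
    """Joint probability as the product of the per-node conditionals."""
    return (p_a[a]
            * bernoulli(p_b_given_a[a], b)
            * bernoulli(p_c_given_a[a], c)
            * bernoulli(p_d_given_ac[(a, c)], d))

# Summing over all 16 assignments should give 1 for a valid joint distribution.
total = sum(joint(*values) for values in product([True, False], repeat=4))
```

The network needs 1 + 2 + 2 + 4 = 9 free parameters here, compared with 15 for an unstructured joint table over four binary variables.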
Learning the Network
4. Common Scoring of Different Classifiers
5. Application of Decision Trees
Should we play tennis?
- Each inner node corresponds to a test of some attribute
- Each outgoing edge corresponds to one possible value of that attribute
- Classification by moving down the tree according to the attribute values
[Figure: decision tree]
Outlook
  Sunny: Humidity
    High: N
    Normal: Y
  Overcast: Y
  Rain: Wind
    Strong: N
    Weak: Y
Example instance: <Outl=Sunny, Temp=Hot, Hum=High, Wind=Strong>
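Classification by walking down the tree can be sketched in a few lines. The tree below is read off the figure labels on this slide, assuming the usual layout of the tennis example (Sunny leads to a Humidity test, Overcast to Y, Rain to a Wind test); the nested-tuple representation and the `classify` function are illustrative choices, not part of the original tool.

```python
# Inner nodes are (attribute, {value: subtree}) pairs; leaves are "Y" or "N".
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "N", "Normal": "Y"}),
    "Overcast": "Y",
    "Rain": ("Wind", {"Strong": "N", "Weak": "Y"}),
})

def classify(node, instance):
    """Follow the edge matching the instance's attribute value until a leaf."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node

# The example instance from the slide.
instance = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Strong"}
label = classify(tree, instance)
```

For this instance only Outlook and Humidity are tested; Temp and Wind are never consulted on the path to the leaf.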
Possible questions for evaluating a putative binding site
Composite Elements
- MyoD present in neighbourhood? (p > x / p < x)
- Distance of next MyoD? (> d / < d)
Helical Parameters
- Helical twist at -20, 5? (> 36 / < 36)
- Periodic highly bendable motifs? (YES / NO)
Misc. Questions
- Is in conserved region? (YES / NO)
- Tissue information of input? (Liver / Muscle)
- "Whereof one cannot speak, thereof one must be silent." (Ludwig Wittgenstein)
- All tested knowledge has to be presentable in a numeric manner!
6. Issues on Adequate Learning Samples
- Required to learn these global promoter characteristics:
  - Fully annotated and experimentally proven promoter sequences
  - A number of negative samples
  - A catalogue of valid questions
- Learning from false positives
  - could be useful to detect the "real" sufficient properties of regulatory sequences
Building a Tree
- Which test best separates true positives (TP) from false positives (FP)?
- Including the attributes of this test in the question catalogue
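One common way to decide which test best separates TP from FP is information gain, as used in ID3-style tree learning. The slides do not name the splitting criterion, so this is an assumption, and the sample matches and attribute names (`myod_nearby`, `conserved`) below are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(samples, labels, test):
    """Entropy reduction obtained by splitting the samples on a test's outcome."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(test(sample), []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical local matches, each labelled as true or false positive.
samples = [
    {"myod_nearby": True, "conserved": True},
    {"myod_nearby": True, "conserved": False},
    {"myod_nearby": True, "conserved": True},
    {"myod_nearby": False, "conserved": True},
    {"myod_nearby": False, "conserved": False},
    {"myod_nearby": False, "conserved": True},
]
labels = ["TP", "TP", "TP", "FP", "FP", "FP"]

# Two questions from the catalogue, expressed as functions of a sample.
tests = {
    "MyoD present in neighbourhood?": lambda s: s["myod_nearby"],
    "Is in conserved region?": lambda s: s["conserved"],
}
gains = {name: information_gain(samples, labels, t) for name, t in tests.items()}
best_test = max(gains, key=gains.get)
```

In this toy data the MyoD question splits the matches perfectly, while the conservation question leaves both groups as mixed as before, so the former would become the root test.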
A plea for a database of "proven false positives"!
7. Credits
- Programming and data
  - Java APIs: jBNC, JavaBayes
  - TRANSFAC
  - TRRD
- Talk References
  - Mitchel, Alberts
- Contact
  - rpudimat_at_informatik.uni-jena.de