Do Funnels Exist - PowerPoint PPT Presentation

1
Do Funnels Exist?
First Rotation Group Meeting
Noa Rappaport, 23/1/04
2
Lecture Outline
  • What is a funnel? Challenges and ideas
  • Analysis Methods
  • Analysis Performed
  • Simulation
  • Signal Smoothing
  • Cebp Promoter Funnel
  • Plans for the future

3
DNA Binding Transcriptional Regulators
  • Bind DNA in order to influence mRNA
    transcription.
  • They control cell growth, cell development and
    differentiation.
  • Function by binding to specific DNA sequences
    located upstream to the gene and induce or
    repress gene expression.
  • These sequences are usually short (5-15 bp) and
    frequently degenerate, which confers different
    levels of activity upon different promoters
    (Bulyk et al.)

4
DNA Binding Transcriptional Regulators
  • Grouped into families according to sequence and
    structural homologies, such as
  • Helix-Turn-Helix
  • Zinc finger
  • Helix-Loop-Helix
  • Beta ribbon

5
DNA Binding Transcriptional Regulators
  • Protein DNA interactions are governed by
  • Amino acid base pair interaction
  • Van der Waals interactions
  • Water mediated protein-DNA hydrogen bonds
  • Binding of small ligands
  • Homo and hetero protein dimerization
  • Binding of an associated transcription factor
  • Translational modifications.
  • (Marmorstein et al.)

6
DNA Binding Transcriptional Regulators
(Jacobson, 1997)
7
What is a Funnel?
[Figure: two funnels in the binding free energy ΔG along the DNA strand]
AGGTTGCAATTTCTTTTTCTATTAGTAGCTAAAAATGGGTCACGTGATCT
ATATTCGAAAGGGGCGGTTGCCTCAGGAA
8
Challenges and ideas
  • Creating the energy layout of the sequence
  • Understanding the hopping/sliding mechanism
  • Smoothing of energy signal
  • Funnel Qualification
  • Multiplicity of Sites
  • Same motif different promoters
  • Conservation of funnel
  • Adi Shamir's problem
  • Simulations

9
Funnel - Expected Features
[Figure: expected features of a funnel along the DNA strand:
finding time Δt(finding), escape time Δt(escape), touch-down
point, orthologs/paralogs]
Noise Problems
10
First Evidence of Funnels
  • In the literature, one piece of negative evidence
    (Gerland et al., 2002).
  • A recently suspected funnel for the TF cebp
    in the IL18BP promoter.
  • Some qualitative observations in yeast.
11
Some Problems
  • Very few TFs have experimentally verified binding
    sites.
  • Binding sites are usually predicted as the positions
    with the highest score, although this does not
    necessarily have to be so.
  • Very few (around 100) protein-DNA structures have
    been solved by crystallography and NMR.
  • Much information exists regarding PSSMs, but it
    contains errors, and their correction is a theory of
    its own (pseudo counts).

Analysis has to be performed on predicted sites
and a predicted affinity signal.
12
Methods of Producing the Affinity Signal
  • Exact calculation: possible, but limited by the
    small DB of solved protein-DNA structures
    (ProNIT).

13
Methods of Producing the Affinity Signal
PSSM Scoring
AGGTTGCAATTTCTTTTTCTATTAGTAGCTAAAAATGGGTCACGTGATCT
ATATTCGA
Score = -log(0.5 × 0.2 × 0.25 × 0.7 × 0.1 × 0.25 × 0.1)
Score = -log(0.05 × 0.2 × 0.25 × 0.7 × 0.2 × 0.25 × 0.5)
Score = -log(0.05 × 0.3 × 0.25 × 0.1 × 0.6 × 0.25 × 0.5)
Scores Vector: 4.36, 4.98, 4.56
And we can keep going
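The sliding-window scoring on this slide can be sketched in Python as follows. This is a minimal illustration, not the code used for the talk: the 3-column matrix `PSSM` and the function names are hypothetical, and a base-10 logarithm is assumed because, taking the first score on the slide as the product 0.5 × 0.2 × 0.25 × 0.7 × 0.1 × 0.25 × 0.1, base 10 reproduces the listed 4.36.

```python
import math

# Hypothetical 3-position PSSM: one dict of base probabilities per position.
# The numbers are made up for illustration; they are not from the slides.
PSSM = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.10, "T": 0.70},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]

def window_score(window, pssm):
    """Return -log10 of the product of the per-position probabilities."""
    return -sum(math.log10(col[base]) for col, base in zip(pssm, window))

def scores_vector(sequence, pssm):
    """Score every window of len(pssm) bases along the sequence."""
    width = len(pssm)
    return [window_score(sequence[i:i + width], pssm)
            for i in range(len(sequence) - width + 1)]

scores = scores_vector("AGGTTGCAATTT", PSSM)

# The first product on the slide, evaluated with a base-10 log (about 4.36).
slide_example = -sum(math.log10(p) for p in [0.5, 0.2, 0.25, 0.7, 0.1, 0.25, 0.1])
```

Summing logs instead of multiplying probabilities avoids underflow for long motifs. A zero entry in the matrix would still raise an error here, which is the problem pseudo counts are meant to address.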
14
Methods of Producing the Affinity Signal
PSSM Scoring
  • Assumes positions are independent; doesn't
    allow for logic.
  • Might be problematic for values which are zero.
  • Some PSSMs are defective.
  • Multiplication of the probabilities correlates
    with the thermodynamic constant K.
  • Taking the log correlates with the calculation of ΔG.
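The last two points can be written out as a short derivation. It is a sketch under an assumption not stated on the slide: that the binding constant K of a site is proportional to the product of its per-position PSSM probabilities p_i(b_i), with b_i the base at position i of a motif of length L.

```latex
K = c \prod_{i=1}^{L} p_i(b_i),
\qquad
\Delta G = -RT \ln K
\;\Longrightarrow\;
\Delta G = RT \Bigl( -\sum_{i=1}^{L} \ln p_i(b_i) \Bigr) - RT \ln c .
```

Under that assumption the PSSM score (the negative log of the product) equals ΔG up to a positive scale factor and an additive constant, so a lower score corresponds to a lower, more favourable, binding free energy.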

15
Methods of Producing the Affinity Signal
Bayesian Network Models
(Barash et al., 2003)
16
Possible Approaches
Given a relatively reliable affinity signal,
there could be a few approaches to attack the
problem:
  • Simulations: giving the kinetic aspect
      • Working on real/smoothed signals.
      • Scan the space of possible theoretical funnels
        for our 4 demands.
  • Analytical calculations
      • Working on real/smoothed signals.
      • Scan the space of possible theoretical funnels
        for our 4 demands.

17
Monte Carlo Modeling
  • Any method which solves a problem by generating
    suitable random numbers and observing that
    fraction of the numbers obeying some property or
    properties.
  • The method is useful for obtaining numerical
    solutions to problems which are too complicated
    to solve analytically.
  • The name "Monte Carlo" was given by Metropolis
    during the Manhattan Project of World War II,
    after Monte Carlo in Monaco, a center for
    gambling.

18
Monte Carlo Modeling
  • The only requirement is that the physical (or
    mathematical) system can be described by
    probability density functions (pdf's).
  • Once the pdf's are known, the Monte Carlo
    simulation can proceed by random sampling from
    the pdf's.
  • Many simulations are then performed, and the
    desired result is taken as an average over the
    number of observations.
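As a small illustration of "sample from the pdf, then average", the sketch below estimates π from the fraction of uniform random points that fall inside a quarter circle. The example problem and the function names are chosen here for illustration and are not taken from the talk.

```python
import math
import random

def monte_carlo_mean(sample, n_trials, rng):
    """Average n_trials draws of sample(rng); return (mean, standard error)."""
    draws = [sample(rng) for _ in range(n_trials)]
    mean = sum(draws) / n_trials
    var = sum((x - mean) ** 2 for x in draws) / (n_trials - 1)
    return mean, math.sqrt(var / n_trials)

def inside_quarter_circle(rng):
    """1.0 if a uniform point in the unit square lies inside the quarter circle."""
    x, y = rng.random(), rng.random()
    return 1.0 if x * x + y * y <= 1.0 else 0.0

rng = random.Random(0)
fraction, err = monte_carlo_mean(inside_quarter_circle, 100_000, rng)
pi_estimate = 4.0 * fraction  # area of the quarter circle is pi / 4
```

The standard error shrinks as one over the square root of the number of trials, which is why many repetitions are needed for a precise average.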

19
Analysis Methods - Simulation
  • Represent the space by a discrete lattice.
  • Represent the DNA energy layout as a topographic
    terrain.
  • Represent the TF as a moving particle.
  • Take discrete points in time.
  • The lattice represents the cell, and the
    possibility of attaching to other exposed sites on
    the DNA.

(Halford et al., 2002)
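A stripped-down version of such a simulation can be sketched as below. It is not the simulation from the talk: it keeps only sliding along a one-dimensional strand (no surrounding lattice, no dissociation), and it assumes a Metropolis acceptance rule, reflecting ends, and two hypothetical terrains made up for the comparison.

```python
import math
import random

def finding_time(terrain, start, target, kT, rng, max_steps=1_000_000):
    """Count time steps until a walker on a 1D energy terrain reaches target.

    Each step proposes a move to a random neighbour and accepts it with the
    Metropolis probability min(1, exp(-(E_new - E_old) / kT)).
    Returns None if the target is not reached within max_steps.
    """
    pos = start
    for step in range(max_steps):
        if pos == target:
            return step
        new = pos + rng.choice((-1, 1))
        if not 0 <= new < len(terrain):
            continue  # reflecting ends: a move off the strand is rejected
        dE = terrain[new] - terrain[pos]
        if dE <= 0 or rng.random() < math.exp(-dE / kT):
            pos = new
    return None

# Hypothetical terrains: a flat strand with one deep site, and a linear
# funnel sloping down to the same site.
n, site = 101, 50
flat = [0.0] * n
flat[site] = -5.0
funnel = [-5.0 + 0.1 * abs(i - site) for i in range(n)]

rng = random.Random(1)
starts = [rng.randrange(n) for _ in range(200)]
mean_flat = sum(finding_time(flat, s, site, 1.0, rng) for s in starts) / len(starts)
mean_funnel = sum(finding_time(funnel, s, site, 1.0, rng) for s in starts) / len(starts)
```

Averaging over many initial points, as here, gives the mean finding time that can be compared between terrains.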
20
Analysis Methods - Simulation
[Figure: lattice snapshots at T = 0 through T = 5, showing the TF moving until it reaches the site ("Eureka!")]
21
Probability distribution functions
Intermediate State
  • Treat the DNA-TF interactions as a set of second
    order elementary reactions.

A + B ⇌ A-B complex (rates ka and kd, free energy change ΔG)
22
Probability distribution functions
  • We have three possible transitions:
  • Sliding on the DNA: linear diffusion

[Figure: pairs of adjacent site energies, (-10, -5) and (-5, -10)]
  • Dissociation-reassociation

23
Probability distribution functions
  • In the first stage we can assume that the energy
    barrier when moving from the higher energy state
    to the intermediate state is uniform.
  • Sliding on the DNA: linear diffusion

24
Parameters checked with the simulation
Touch down
Real Promoter vs. Shuffled
25
Problems with the simulation
  • Activation energy for:
      • Sliding
      • Attachment-detachment from the DNA
      • Lattice movement
  • Lattice size
  • Lattice energy
  • Assumption of second order elementary reactions.

(Ferreiro et al., 2003)
26
Simulation - Results
[Figures: the energy terrain (PSSM score vs. coordinate on the DNA) and the number of time steps vs. coordinate on the DNA]
27
Artificial Funnel - Results
[Figures: step function terrain and funnel terrain; number of time steps for the step function in comparison to the funnel, 200 initial points]
28
Multi-Width-Depth Analysis
200 initial positions; stop at endpoint.
Compare across funnel width and distracter depth.
29
Two Parabolas Analysis - Results
Funnel width: 0.1; distracter width: 0.1; funnel depth: 7;
distracter depth: 5
[Figure: number of time steps; kinetic verification]
30
Effect of Funnel Width on Finding Time
1000 initial points
[Figure: average number of time steps vs. funnel width]
31
Effect of Distracter Depth on Finding Time
Funnel width 0.1; 100 initial points.
[Figure: average number of time steps vs. distracter depth]
32
Step Function Terrain
100 initial points
Finding time mean: 2.9736e+004
Finding time std: 3.3282e+004
Error in mean finding time: 3.3282e+003
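The quoted error in the mean is consistent with the standard error of the mean, that is, the standard deviation divided by the square root of the number of initial points. The sketch below shows that calculation; the function name is hypothetical and the slides do not say how the error was actually computed.

```python
import math
import statistics

def mean_and_standard_error(finding_times):
    """Return (mean, sample std, std / sqrt(n)) for a list of finding times."""
    n = len(finding_times)
    mean = statistics.fmean(finding_times)
    std = statistics.stdev(finding_times)
    return mean, std, std / math.sqrt(n)

# Check against the numbers quoted on the slide: std 3.3282e4 over 100 runs.
slide_error = 3.3282e4 / math.sqrt(100)
```

Note that the standard deviation here is larger than the mean, so individual finding times vary widely and many repetitions are needed for a stable average.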
33
Verifying exponential search time in a flat
surface
[Figures: step function terrain; average number of time steps vs. distance from end point and vs. log(distance) from end point]
34
Verifying linear search time in an all-funnel
surface
[Figure: funnel terrain; average number of time steps vs. distance from end point]
35
Distribution of Finding Times starting from the
same IP
The simulation was run over the following
terrain. Three initial points were tested, each
repeated 200 times.
[Figures: finding time distributions for the three initial points: funnel edge, right, left]
36
Does a Funnel Improve Capturing of the TF in its
Vicinity?
  • This question can be addressed by the simulation.
    The TF was placed at time zero at the funnel's
    minimum and the following factors were checked:
    1. The first time the TF escaped the funnel's
       vicinity.
    2. The maximal distance from the funnel's
       minimum that the TF reached.
  • These factors were checked for a set of
    combinations of different widths and depths of
    the funnel. Each simulation was run for
    100000 time steps, and each such run was
    repeated 100 times.

37
Results: First Escape Time
Escape time was found to increase with the funnel's
depth, as expected. It can also be seen that
for increasing funnel width the first escape time
increases for the same distracter depth.
[Figure: average number of time steps vs. distracter depth]
38
Results: First Escape Time
  • The number of time steps until the first escape
    was compared to the number on a stair-step
    terrain. The following graph was obtained.
  • The number drops on deeper funnels.
  • The vicinity was taken to be all the area
    contained between the right and left borders,
    including the lattice.

[Figure: mean F.E.T. funnel / mean F.E.T. flat vs. funnel depth]
39
Results: Max Distance
  • The maximal distance the TF reached on the DNA was
    measured for a few funnel depths and widths.

[Figure: max distance vs. funnel depth]
40
Yeast Analysis
  • The non-coding region of the yeast genome is
    relatively compact, the complete genome is available,
    and the phylogeny is well characterized.
  • Working on a dataset of 4483 promoter sequences
    in yeast. For each TF it is
    possible to get a prediction of the promoters it
    is likely to be found on by ScanACE. Then, we can
    generate a score for each point of the promoter.

41
Gal4 Analysis
  • One of the binding sites of Gal4 was taken. It is
    found in the promoter of the galactose permease.
    The area around it was expanded.

42
Gal4 Transcription Factor
  • Zinc finger transcription factor
  • Binds to the DNA as a homodimer
  • Recognizes inverted CGG half-site repeats with
    an 11 base pair spacer

43
Gal4 Binding Site
44
Finding Time Analysis
[Figure panels: kT = 4, kT = 4, kT = 6, kT = 5]
45
Finding Time Analysis
[Figure panels: kT = 7, kT = 8, kT = 15, kT = 20]
46
Vicinity Analysis
[Figure: mean FT promoter / mean FT shuffled vs. kT]
47
Same IP analysis, kT = 4
[Figures: original vs. random promoter, for a window on the edge of the funnel (680-700) and a window at the far end (100-120)]
48
Same IP analysis: Distribution
[Figures: original promoter, close; shuffled promoter, close]
49
Same IP analysis: Distribution
[Figures: original promoter, far; shuffled promoter, far]
50
Using Fourier transform for signal smoothing
  • The fast discrete Fourier transform was used to
    transform between the spatial domain and the
    frequency domain.
  • In the frequency domain, high frequencies were
    zeroed.
  • Transforming back to the spatial domain resulted
    in a smoother signal.
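The three steps above amount to a low-pass filter, which can be sketched with NumPy's FFT routines. The cutoff `keep` and the synthetic test signal are assumptions made for illustration; the slides do not say which frequencies were zeroed.

```python
import numpy as np

def fourier_smooth(signal, keep):
    """Low-pass filter: keep the `keep` lowest frequencies, zero the rest."""
    spectrum = np.fft.rfft(signal)          # spatial domain -> frequency domain
    spectrum[keep:] = 0.0                   # zero the high frequencies
    return np.fft.irfft(spectrum, n=len(signal))  # back to the spatial domain

# Hypothetical noisy score vector: a slow wave plus random noise.
rng = np.random.default_rng(0)
x = np.arange(512)
clean = np.sin(2 * np.pi * x / 256)
noisy = clean + 0.5 * rng.standard_normal(x.size)
smoothed = fourier_smooth(noisy, keep=8)
```

A hard cutoff like this can introduce ringing near sharp features, so the choice of cutoff affects how a narrow binding site appears in the smoothed signal.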

51
Using Fourier transform for signal smoothing -
Example
52
IL18BP - Promoter
Scanned with the cebpβ PSSM
[Figures: real signal and smoothed signal]
53
Conservation map for cebp signal
54
An Unexpected Result
Alignment of Mouse and Human Promoters
Calculate Similarity Percent
Produce random sequences with the same similarity
percent
Generate scores vector for each random sequence
  • Took the mouse and human promoters. Check the
    sequence similarity. Produce random sequences
    with the same percent similarity
    (by randomly changing the same X percent of their
    positions). Generate a scores vector for each
    random sequence. Check the correlation of each
    random sequence with the human funnel.
    Generate the distribution of the correlation
    coefficient (number of sequences per coefficient
    value). Check where the
    mouse-human coefficient is situated on the
    distribution. If it is on this side, it means
    that the conservation of the funnel is due to more
    than sequence similarity alone, i.e. to funnel
    similarity.
  • Results
  • Comparison between the conservation pattern
    of the human and mouse promoters and that of
    the human promoter and my random promoter
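The randomization procedure described on this slide can be sketched as follows. Several things here are assumptions: the slides do not define the p-value, so it is taken as the fraction of random sequences whose correlation does not exceed the observed one (which fits the statement that a value on the right-hand side of the distribution gives a high p-value); the GC-count `gc_window_score` is only a toy stand-in for a PSSM scan; and the Fourier smoothing step is omitted.

```python
import random

def mutate(seq, fraction, rng):
    """Return a copy of seq with `fraction` of its positions changed to a different base."""
    seq = list(seq)
    for i in rng.sample(range(len(seq)), round(fraction * len(seq))):
        seq[i] = rng.choice([b for b in "ACGT" if b != seq[i]])
    return "".join(seq)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def left_tail_fraction(human, observed_r, fraction, score, n_random, rng):
    """Fraction of mutated copies of `human` whose score signal correlates
    with the human signal no better than observed_r."""
    human_signal = score(human)
    hits = 0
    for _ in range(n_random):
        r = pearson(human_signal, score(mutate(human, fraction, rng)))
        if r <= observed_r:
            hits += 1
    return hits / n_random

def gc_window_score(seq, width=8):
    """Toy stand-in for a PSSM scan: GC count in each window."""
    return [sum(b in "GC" for b in seq[i:i + width])
            for i in range(len(seq) - width + 1)]

rng = random.Random(0)
human = "".join(rng.choice("ACGT") for _ in range(300))
p = left_tail_fraction(human, observed_r=0.99, fraction=0.3,
                       score=gc_window_score, n_random=200, rng=rng)
```

In a real analysis `observed_r` would be the mouse-human correlation and `fraction` the measured percent of differing positions in the alignment.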

55
An Unexpected Result
Generate scores vector for each random sequence
Smooth it by Fourier
Check the correlation for each of the random
sequences with the human funnel
56
An Unexpected Result
Check where the mouse-human correlation
coefficient is situated on the distribution.
Expectation: it will be localized on the right
hand side of the distribution, having a high
p-value, meaning the reason for the funnels'
similarity is sequence similarity.
  • Result

The p-value for the mouse-human signals is zero.
57
Scanning human promoters with cebp PSSM
58
Scanning IL18BP promoter with human PSSMs
Original Promoter
Shuffled Promoter
59
Scanning IL18BP promoter with Statx and IRF PSSMs
60
Methods of Producing the Affinity Signal
MatInspector Algorithm
On the Motif
Employ an alignment algorithm
Calculate the nucleotide distribution matrix
Calculate ci value for each position
Define a core region
61
Methods of Producing the Affinity Signal
MatInspector Algorithm
Scanning
Calculate the core similarity for each position
of the sequence
Calculate the matrix similarity if core
similarity reaches threshold
Binding Sites - the sequences that reach the
minimum core and matrix similarity thresholds.
62
For the Future
  • Add hopping
  • Large Scale Analysis for all yeast promoters
  • Analytical Calculation for hopefully reducing
    run-time
  • Optimization of the scoring method
  • Use of DB of known binding sites verified
    experimentally with high credibility.
  • Analysis over the parameters space

63
For the Future
  • Develop a method for MFA
  • Check all the criteria with other methods
    of shuffling, e.g. k-mer preserving.
  • Analytical calculation for specific protein and
    DNA interaction.
  • Use Gillespie algorithm
  • If funnels exist, check for common features of
    promoters containing them.

64
Acknowledgments
Thanks to Tzachi, Ran, , Reut, Arren and
everyone else !
Igor
65
PSSM
PSSM: Position Specific Scoring Matrix.
[Figure: example matrix with rows A, T, C, G]
66
ScanAce
  • ScanACE (Scans for Nucleic Acid Conserved
    Elements) is a program which scans DNA sequence
    for elements which match a DNA motif.

67
Simulations Methods
Gillespie Algorithm
  • An algorithm for simulating a system with multiple
    reaction channels and multiple chemical species.
  • Consider a system of r chemical reactions.
  • At each time step, the system is in exactly one
    state.
  • A transition occurs when a reaction is executed,
    and then the state changes.
  • Gillespie's direct method calculates which
    reaction occurs next and when it occurs.

68
Simulations Methods
Gillespie Algorithm
  • The probability for reaction channel µ to be the
    next reaction
  • Derive from the probability distribution
    for times
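In the standard formulation of the direct method, channel µ fires next with probability a_µ / a_0, where a_µ is its propensity and a_0 the sum of all propensities, and the waiting time is exponentially distributed with rate a_0. The sketch below applies this to a reversible binding reaction A + B ⇌ AB. The function name, rate constants and molecule counts are hypothetical and are not taken from the talk.

```python
import math
import random

def gillespie_binding(n_a, n_b, n_ab, ka, kd, t_end, rng):
    """Direct-method simulation of A + B <-> AB; returns a list of (time, n_ab)."""
    t = 0.0
    trajectory = [(t, n_ab)]
    while True:
        a_bind = ka * n_a * n_b      # propensity of A + B -> AB
        a_unbind = kd * n_ab         # propensity of AB -> A + B
        a_total = a_bind + a_unbind
        if a_total == 0.0:
            break
        # Waiting time to the next reaction is exponential with rate a_total.
        t += -math.log(1.0 - rng.random()) / a_total
        if t > t_end:
            break
        # Channel mu fires with probability a_mu / a_total.
        if rng.random() * a_total < a_bind:
            n_a, n_b, n_ab = n_a - 1, n_b - 1, n_ab + 1
        else:
            n_a, n_b, n_ab = n_a + 1, n_b + 1, n_ab - 1
        trajectory.append((t, n_ab))
    return trajectory

trajectory = gillespie_binding(n_a=50, n_b=50, n_ab=0, ka=0.01, kd=0.1,
                               t_end=100.0, rng=random.Random(0))
```

Unlike a fixed-time-step simulation, each iteration here advances time by a random amount, so no steps are spent on intervals in which nothing happens.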

69
Monte Carlo Modeling
  • Probability distribution functions (pdf's) - the
    physical (or mathematical) system must be
    described by a set of pdf's.
  • Random number generator - a source of random
    numbers uniformly distributed on the unit
    interval must be available.
  • Sampling rule - a prescription for sampling from
    the specified pdf's, assuming the availability of
    random numbers on the unit interval, must be
    given.

70
Monte Carlo Modeling
  • Scoring (or tallying) - the outcomes must be
    accumulated into overall tallies or scores for
    the quantities of interest.
  • Error estimation - an estimate of the statistical
    error (variance) as a function of the number of
    trials and other quantities must be determined.
  • Variance reduction techniques - methods for
    reducing the variance in the estimated solution
    to reduce the computational time for Monte Carlo
    simulation
  • Parallelization and vectorization - algorithms to
    allow Monte Carlo methods to be implemented
    efficiently on advanced computer architectures.

71
Probability distribution functions
Intermediate State
A + B ⇌ A-B → C complex (rates ka and kd, free energy change ΔG)