Prediction of Coordination Number and Relative Solvent Accessibility in Proteins - PowerPoint PPT Presentation

About This Presentation

Title:

Prediction of Coordination Number and Relative Solvent Accessibility in Proteins

Description:

There is no known algorithm for predicting solvent ... structure prediction, Baldi et al. Artificial Intelligence - a modern approach, Russel, Norvig. ... – PowerPoint PPT presentation

Number of Views:65

Avg rating:3.0/5.0

Slides: 35

Provided by: csta3

Category:

more less

Transcript and Presenter's Notes

Title: Prediction of Coordination Number and Relative Solvent Accessibility in Proteins

1
Prediction of Coordination Number and Relative
SolventAccessibility in Proteins

Computational Aspects

Yacov Lifshits 13.04.2005
2
Introduction
There is no known algorithm for predicting
solvent accessibility or coordination
number. Many different approaches were tried,
and most of them utilized the concept of neural
networks. We shall discuss what these networks
are, how do they work, and how we use them for
our cause.
3
Neurons
The human brain consists has about 100 billion
(1011) neurons and 100 trillion (1014)
connections (synapses) between them. Many
highly specialized types of neurons exist, and
these differ widely in appearance.
Characteristically, neurons are highly asymmetric
in shape.
4
Neurons cont.
Here is what a typical neuron looks like
5
Neural Networks
Like the brain, an neural network is a massively
parallel collection of small and simple
processing units where the interconnections form
a large part of the network's intelligence.
6
Neural Networks cont.
Artificial neural networks, however, are quite
different from the brain in terms of structure.
For example, a neural network is much smaller
than the brain. Also, the units used in a neural
network are typically far simpler than
neurons. Nevertheless, certain functions that
seem exclusive to the brain, such as learning,
have been replicated on a simpler scale with
neural networks.
7
Neural Networks Features

Large number of very simple units
Connections through weighted links
There is no program. The program is the
architecture
of the network.
There is no central control. If a portion of the
network is damaged, the network is still
functional. Hopefully.
A human observer cant understand what is going
on inside the network. It is a sort of a black
box.

8
Neural Networks - intuition
Neural networks are our way to model a
brain. The good thing about them is that you do
not program them, but instead teach them, like
children. For example, if you show a kid a
couple of trees, he will know to identify other
trees. Few humans can actually explain how to
identify a tree as such, and even fewer (if any)
can write a program to do that. So when you
dont know how the algorithm to do something,
neural networks are your second-best solution.
9
Neural Networks uses

Speech recognition
Speech synthesis
Image recognition
Pattern recognition
Stock market prediction
Robot control and navigation

10
Artificial Neurons
The neuron is a basically a simple calculator.
It calculates a weighted sum of the inputs
(plus a bias input), and applies an activation
function to the result.
11
Artificial Neurons example
The above example proves that neural networks are
universal you can implement any function with
them.
12
Training

How do we train a neural network?
The task is similar to teaching a student.
How do we do that?
First, we show him some examples.
After that, we ask him to solve some problems.
Finally, we correct him, and start the whole
process again.
Hopefully, hell get it right after a couple of
rounds

13
Training Simple case
Training the network means adjusting its weights
so that it gives correct output.
Its rather easy to train a network that has no
hidden layers (called a perceptron). This is
done by the gradient descent algorithm.
14
Training Gradient descent
The idea behind this algorithm is to adjust the
weights to minimize some measure of the error.
The classical measure is sum of squared
errors ( h(x) is the real output, and y is
the true (needed) value) This is an
optimization search in a weight space.
15
Training Gradient descent
First, we compute the gradient function And
then we update the weight as follows (a is
the learning rate)
16
Training Back Propagation
In most real-life problems we need to add a
hidden level. This turns out to be a problem
the training data does not specify what the
hidden layers output should be.
It turns out that we can back-propagate the error
from the output layer to the hidden ones. It
works like a generalized gradient descent
algorithm.
17
Training Back Propagation

Problems of back propagation
The algorithm is slow, if it does converge.
It may get stuck in a local minimum.
It is sensitive to initial conditions.
It may start oscillating.

18
Training OBD
A different kind of training algorithm is
Optimal Brain Damage, which tries to remove
unneeded neurons. One of its uses is to prune
the network after training. The algorithm begins
with a fully connected network and removes
connections from it. After the network is
trained, the algorithm identifies an optimal
selection of connections that can be
dropped. The network is retrained, and if the
performance has not decreased, the process is
repeated.
19
Training cont.

Two main training modes are
Batch update weights as a sum over all the
training set.
Incremental update weights after each pattern
presented.
Incremental mode requires less space complexity,
and is less likely to get stuck in a local
minimum.
Each cycle through the training set is called an
epoch.

20
Training General problems
Overfitting If we give a lot of unnecessary
parameters to the network, or train it for too
long, it may get lost. An extreme case of
overfitting is a network that has 100 success
rate on the training set, but behaves randomly on
other values. Poor data set To train a network
well, we must include all possible cases (but not
too much). The network learns the easiest
feature first!
21
Back to the problem
Until now, we discussed neural networks where all
connections go forward only (called
feed-forward networks). They are not suitable
for problem we are trying to solve, however. The
major weakness of FF networks is the use of a
local input window of fixed size, which
cannot provide any access to long-range
information.
22
Back to the problem cont.
Networks for contact prediction, for instance,
have windows of size 115 amino acids. Larger
windows usually do not work, in part because the
increase in the number of parameters leads to
overfitting. There are ways to prevent it, but it
is no the main problem. The main problem is that
long-range signals are very weak compared to the
additional noise introduced by a larger window.
Thus, larger windows tend to dilute sparse
information present in the input that is relevant
for the prediction.
23
Recurrent Neural Networks
One of the methods used to try to overcome these
limitations consists of using bidirectional
recurrent neural networks (BRNNs). An RNN is a
neural network that allows backward
connections. This means, that it can re-process
the output. Our brain, as you can surely guess,
is a recurrent network. Sadly, the issues of
RNN training and architecture are too complicated
for this lecture. An intuitive explanation will
be presented, however.
24
The Network used
25
The BRNN
The input, vector It, encodes the external input
at time t. In the most simple case, it encodes
one amino acid, using orthogonal encoding. The
output prediction has the functional form Ot
?(Ft , Bt , It) and depends on the forward
(upstream) context Ft, the backward (downstream
context) Bt, and the input It at time t.
26
The Past and the Future
Ft and Bt store information about the past
and the future of the sequence. They make the
whole difference, because now we can utilize
global information. The funtions satisfy the
recurrent bidirectional equations Ft f(Ft-1,
It) Bt ß(Bt1, It) where f() and ß() are
learnable nonlinear state transition functions,
implemented by two NNs (left and right
subnetworks in the picture).
27
The Past and the Future
Intuitively, we can think of Ft and Bt as
wheels that can be rolled along the protein.
To predict the class at position t, we roll
the wheels in opposite directions from the N- and
C-terminus up to position t and then combine what
is read on the wheels with It to calculate the
proper output using ?.
28
The difference it makes
Hence the probabilistic output Ot depends on the
input It and on the contextual information from
the entire protein, summarized by the pair (Ft ,
Bt)! In contrast, in a conventional NN
approach, the probability distribution depends
only on a relatively short subsequence of amino
acids.
29
Training the BRNNs
The BRNNs are trained by back-propagation on the
relative entropy error between the output and
target distributions. The authors use a hybrid
between online and batch training, with 300 batch
blocks (23 proteins each) per training set.
Thus, weights are updated 300 times per epoch
after each block. When the error does not
decrease for 50 consecutive epochs, the learning
rate is divided by 2, and training is restarted.
Training stops after 8 or more reductions, which
usually happens after 1,5002,500 epochs.
30
The experiment
The authors used a network of 16 Sun Microsystems
UltraSparc workstations for training and testing,
roughly equivalent to 2 years of a single CPU.
Seven different BRNNs were trained, and the
final result was the average of their outputs.
31
The results
The predictors achieve performances within
the 7174 range for contact numbers, depending
on radius, and greater than 77 for accessibility
in the most interesting range. These improve
the previous results by 3-7.
32
The results cont.
The improvements are believed to be due to both
an increase in the size of the training sets and
in the architectures developed. The most
important thing about the architectures is their
ability to capture long-range interactions that
are beyond the reach of conventional feed-forward
neural networks, with their relatively small and
fixed input window sizes.
33
The End
34
References