Title: Bioinformatics and Data Mining with an Ant Colony
Algorithm
The aim of the project is to discover new
knowledge about post-synaptic protein activity
with a new data mining method. Proteins are made
up of a sequence of amino acids. Predicting the
function of a protein from its sequence is
very difficult. A synapse is the point where two
nerve cells communicate with each other by
transmitting a chemical known as a
neurotransmitter. Studying post-synaptic protein
activity is important for understanding the
nervous system.
Data mining is performed on a large set of
bioinformatics data. Classification in data
mining is the process of discovering, from
training data, a set of rules that predict the
class of a record. In this project a record
consists of data describing a protein, and the
class to be predicted is the presence or absence
of post-synaptic activity in that protein. These
rules can then be applied to new, unclassified
data to classify it. Rules are of the form:
IF <Term1, Term2, ..., Termn> THEN <class value>
An example of a rule predicting post-synaptic
activity:
IF (NEUROTR_ION_CHANNEL is present)
THEN (post-synaptic activity is present)
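The way such rules are applied to unclassified data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the Rule class, the classify function, the first-matching-rule order, the default class and the HYPOTHETICAL_MOTIF attribute are all made up for the example. Only NEUROTR_ION_CHANNEL comes from the rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF <all terms are present> THEN <predicted class> (hypothetical structure)."""
    terms: frozenset        # attribute names that must all be present in the record
    predicted_class: bool   # True = post-synaptic activity is present


def classify(record, rules, default=False):
    """Return the class of the first rule whose terms all appear in the record.

    Assumption: rules are tried in order and a default class is used when
    no rule matches. The source does not state how conflicts are resolved.
    """
    for rule in rules:
        if rule.terms <= record:  # every term of the rule is present
            return rule.predicted_class
    return default


# The example rule from the text.
rules = [Rule(frozenset({"NEUROTR_ION_CHANNEL"}), True)]

# Two made-up protein records, each a set of attributes that are present.
protein_a = {"NEUROTR_ION_CHANNEL", "HYPOTHETICAL_MOTIF"}
protein_b = {"HYPOTHETICAL_MOTIF"}

print(classify(protein_a, rules))
print(classify(protein_b, rules))
```

Representing a record as the set of attributes that are present keeps each rule readable on its own, which matches the aim of producing rules a biologist can interpret independently.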
The new data mining algorithm is based on the
behaviour of ants. To find the shortest path to
food, or around an obstacle, ants release
pheromones as they move towards their goal, and
other ants are attracted to this
pheromone. Although there may be more than one
path to a goal, the shortest path will accumulate
the largest amount of pheromone within a given
time, therefore becoming more attractive to
subsequent ants. An abstract model of this
principle has been used in data mining to
discover classification rules.
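The pheromone principle described above can be illustrated with a small deterministic simulation. It is a toy model of the path-finding behaviour only, not the rule-discovery algorithm used in the project. The function name simulate_paths, the evaporation rate, and the choice to make the deposit inversely proportional to path length are all assumptions made for the sketch.

```python
def simulate_paths(lengths, steps=50, evaporation=0.1):
    """Return the final share of pheromone on each path.

    Assumptions (not from the source): ants choose a path with probability
    proportional to its pheromone, a path of length L receives deposit
    proportional to 1/L per time step (shorter paths are completed more
    often), and a fixed fraction of pheromone evaporates each step.
    """
    pheromone = [1.0] * len(lengths)  # all paths start equally attractive
    for _ in range(steps):
        total = sum(pheromone)
        shares = [p / total for p in pheromone]  # fraction of ants per path
        deposits = [s / length for s, length in zip(shares, lengths)]
        pheromone = [(1.0 - evaporation) * p + d
                     for p, d in zip(pheromone, deposits)]
    total = sum(pheromone)
    return [p / total for p in pheromone]


# A short path (length 1) competing with a path twice as long.
shares = simulate_paths([1.0, 2.0])
print(shares)
```

In this model the short path's share grows at every step, because it attracts more ants and each of them reinforces it more strongly per unit of time. That is the positive feedback the text describes.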
Results
The algorithm discovered a set of
classification rules that were able to classify
the data with an accuracy of over 98%. More
importantly, the rules that were discovered are
very easily interpretable by biologists, as they have
few terms and can be understood independently
from one another. Creating comprehensible rules
is a key aim of data mining.
Run   Mean Accuracy (%)   Mean Rule Count   Term To Rule Ratio
1     98.26 ± 0.16        101.5 ± 0.19      1.21
2     98.26 ± 0.12        101.3 ± 0.17      1.21
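As a rough guide to how figures like those in the table could be computed, the sketch below calculates an accuracy percentage and a term-to-rule ratio. The source does not define either measure. The sketch assumes that accuracy is the percentage of records classified correctly and that the ratio is the total number of terms divided by the number of rules. The function names and the sample data are hypothetical.

```python
def accuracy_percent(predicted, actual):
    """Percentage of records whose predicted class equals the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def term_to_rule_ratio(rules):
    """Total number of terms divided by the number of rules (assumed definition)."""
    return sum(len(terms) for terms in rules) / len(rules)


# Made-up data: four predictions and four small rules given as lists of terms.
predicted = [True, False, True, True]
actual = [True, False, False, True]
example_rules = [["A"], ["B", "C"], ["D"], ["E"]]

print(accuracy_percent(predicted, actual))
print(term_to_rule_ratio(example_rules))
```

A ratio close to 1 means that most rules contain a single term, which is consistent with the rules having few terms.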
This project was conducted by James Smaldon and
Dr. Alex A. Freitas. For more details, please
contact James Smaldon (JS37_at_kent.ac.uk).