Title: Convolution and Its Applications to Sequence Analysis
1. Convolution and Its Applications to Sequence Analysis
- Student: Bo-Hung Wu
- Advisors: Professor Herng-Yow Chen and Professor R. C. T. Lee
- Department of Computer Science and Information Engineering, National Chi Nan University
2. The Definition of Convolution in the Continuous Case
Example
Reference: Lecture notes, Introduction to Communication, R. C. T. Lee et al.
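The formula on this slide did not survive extraction. For reference, the standard textbook definition of continuous convolution is given below; the symbols $f$, $g$, $t$, and $\tau$ are our notation and may differ from the slide's.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
```

Replacing the integral with a sum over integer indices gives the discrete form $z_k = \sum_{i+j=k} x_i y_j$, which the discrete-case definition generalizes by allowing other functions in place of multiplication and addition.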
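4. Exact String-Matching Problem
Input: a text string $T = T_1 T_2 \cdots T_n$ and a pattern string $P = P_1 P_2 \cdots P_m$, where $T_i, P_i \in \Sigma$ (the alphabet) and $m < n$.
Output: all locations $i$ in $T$ such that $T_i T_{i+1} T_{i+2} \cdots T_{i+m-1} = P_1 P_2 \cdots P_m$.
String matching is closely related to convolution: each candidate location $i$ compares $P$, character by character, against a length-$m$ window of $T$.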
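As a baseline for the convolution formulation, here is a minimal Python sketch of the direct sliding-window matcher implied by the problem statement. It reports 1-based locations to match the slide's indexing; the function name `naive_matches` is hypothetical.

```python
def naive_matches(text: str, pattern: str) -> list[int]:
    """Return every 1-based location i with T_i .. T_{i+m-1} equal to the pattern."""
    n, m = len(text), len(pattern)
    hits = []
    # Slide a window of length m across the text and compare character by character.
    for s in range(n - m + 1):
        if all(text[s + p] == pattern[p] for p in range(m)):
            hits.append(s + 1)  # convert the 0-based start to the slide's 1-based index
    return hits


print(naive_matches("abcabcab", "abc"))  # [1, 4]
```

This costs $O(nm)$ character comparisons in the worst case.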
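5. Convolution in the Discrete Case
Definition: Let $X = \langle x_0, \ldots, x_m \rangle$ and $Y = \langle y_0, \ldots, y_n \rangle$ be two given vectors, with $x_i, y_i \in D$. Let $\otimes$ and $\oplus$ be two given functions, where $\otimes$ combines an element of $X$ with an element of $Y$ and $\oplus$ aggregates the results.
Then the convolution of $X$ and $Y$ with respect to $\otimes$ and $\oplus$ is
$$z_k = \bigoplus_{i+j=k} \left( x_i \otimes y_j \right), \quad k = 0, \ldots, m+n.$$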
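The definition fixes only the shape of the computation: output $z_k$ aggregates $x_i \otimes y_j$ over all index pairs with $i + j = k$, while the two functions are free parameters. A minimal Python sketch, assuming the pairwise function ($\otimes$) and an associative aggregate ($\oplus$) are passed in as callables; the name `general_convolution` is hypothetical. With multiplication and addition it reduces to the ordinary convolution.

```python
import operator
from functools import reduce
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def general_convolution(
    x: Sequence[A],
    y: Sequence[B],
    otimes: Callable[[A, B], R],
    oplus: Callable[[R, R], R],
) -> list[R]:
    """Return z_0 .. z_{m+n}, where z_k is the oplus-reduction of otimes(x_i, y_j) over i + j = k.

    x has m + 1 elements and y has n + 1 elements; both must be non-empty.
    """
    m, n = len(x) - 1, len(y) - 1
    z: list[R] = []
    for k in range(m + n + 1):
        # Only pairs with 0 <= i <= m and 0 <= j = k - i <= n contribute to z_k.
        terms = [otimes(x[i], y[k - i]) for i in range(max(0, k - n), min(k, m) + 1)]
        z.append(reduce(oplus, terms))
    return z


# Ordinary convolution: otimes = multiplication, oplus = addition.
print(general_convolution([1, 2, 3], [1, 1], operator.mul, operator.add))  # [1, 3, 5, 3]
# Same shape with different functions: otimes = addition, oplus = max.
print(general_convolution([1, 2, 3], [1, 1], operator.add, max))  # [2, 3, 4, 4]
```

Because every $k$ in $0, \ldots, m+n$ has at least one contributing pair, the reduction never needs an identity element for $\oplus$.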
6. Consider the exact string-matching problem: how can we use convolution to solve it? [FP74]
First, we reverse $Y$ to obtain $\langle y_n, y_{n-1}, \ldots, y_0 \rangle$.
Second, we define the two functions of the convolution as follows:
Note that this convolution performs the same comparisons as the sliding-window approach. [FP74]
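The definitions of the two functions on this slide did not survive extraction. Below is a minimal Python sketch under one common choice, assumed here rather than taken from the slide: $a \otimes b = 1$ if $a = b$ and $0$ otherwise, and $\oplus$ = addition. After the pattern is reversed, output $z_k$ counts how many pattern characters agree with the text window ending at index $k$, so a window is an exact match when that count equals $m$. The helper names `convolve` and `convolution_matches` are hypothetical. The direct double loop makes exactly the sliding-window comparisons, $O(nm)$ in total, and does not attempt the fast-multiplication speedups associated with [FP74].

```python
def convolve(x, y, otimes, oplus):
    """Generalized convolution: z[k] is the oplus-reduction of otimes(x[i], y[j]) over i + j = k."""
    z = [None] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            term = otimes(xi, yj)
            z[i + j] = term if z[i + j] is None else oplus(z[i + j], term)
    return z


def convolution_matches(text: str, pattern: str) -> list[int]:
    """1-based locations where pattern occurs exactly in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Step 1: reverse the pattern so each output lines it up with one text window.
    reversed_pattern = pattern[::-1]
    # Step 2 (assumed choice, not from the slide): otimes = equality indicator,
    # oplus = addition, so z[k] counts agreeing characters.
    z = convolve(text, reversed_pattern, lambda a, b: int(a == b), lambda s, t: s + t)
    # z[k] for k = m-1 .. n-1 compares the pattern with text[k-m+1 .. k] (0-based),
    # i.e. the window starting at 1-based location k - m + 2.
    return [k - m + 2 for k in range(m - 1, n) if z[k] == m]


print(convolution_matches("abcabcab", "abc"))  # [1, 4]
```

Only outputs with $m - 1 \le k \le n - 1$ correspond to full windows; the others mix in partial overlaps at the ends of the text and are ignored.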
7. Applying Convolution to Sequence Analysis
- The common substring with k mismatches allowed problem
- The common substrings with k mismatches allowed among multiple sequences problem
- Determining the similarity of two DNA sequences
- Searching in a DNA sequence database
- Finding repeating groups in a DNA sequence
- An aid for detecting transpositions
- An aid for detecting insertions/deletions
- An aid for detecting overlapping segments produced by shotgun sequencing
- The corresponding pair-wise nucleotides in a DNA sequence
- An aid for looking for similar regions in a DNA sequence under a distance constraint
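The slides do not define the k-mismatch problem precisely. Assuming it asks whether sequences x and y share substrings of a given length that differ in at most k positions, convolution yields a simple negative-answer filter: convolving x with reversed y under $\otimes$ = equality indicator and $\oplus$ = addition counts the matches along every relative offset, and an offset whose total is below length - k cannot host such a pair. Surviving offsets still need a direct check. The function names are hypothetical, and this filter is our illustration rather than the authors' algorithm.

```python
def offset_match_counts(x: str, y: str) -> dict[int, int]:
    """For every offset d, count positions i with x[i] == y[i - d].

    Computed as a convolution of x with reversed y (otimes = equality indicator,
    oplus = addition): z[k] holds the count for offset d = k - (len(y) - 1).
    """
    ry = y[::-1]
    z = [0] * (len(x) + len(ry) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(ry):
            z[i + j] += int(xi == yj)
    return {k - (len(y) - 1): c for k, c in enumerate(z)}


def k_mismatch_candidate_offsets(x: str, y: str, length: int, k: int) -> list[int]:
    """Negative-answer filter: keep only offsets that could still host a common
    substring of the given length with at most k mismatches."""
    counts = offset_match_counts(x, y)
    # Such a substring pair lies on a single offset and contributes at least
    # length - k matches there, so offsets with fewer matches are safely discarded.
    return sorted(d for d, c in counts.items() if c >= length - k)


print(k_mismatch_candidate_offsets("acgtacgt", "acgt", length=4, k=0))  # [0, 4]
```

The filter can only discard offsets; it never reports a false negative, which is what makes it safe to run before a more expensive exact check.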
8. The Corresponding Pair-wise Nucleotides in a DNA Sequence
Substitution rule: A → T, T → A, C → G, G → C
Example: S = acttgacgtgaac
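The problem statement for this slide was lost. One plausible reading, assumed here: find positions that pair up under the substitution rule, as in reverse-complementary (stem-like) regions. Convolving S with itself under $\otimes$ = "is the complement of" and $\oplus$ = addition counts, for each $k$, the complementary pairs $(i, j)$ with $i + j = k$, that is, pairs placed symmetrically around the center $k/2$. The names `COMPLEMENT` and `complementary_pair_counts` are hypothetical.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}  # substitution rule from the slide


def complementary_pair_counts(s: str) -> list[int]:
    """z[k] = number of pairs i < j with i + j = k and s[j] the complement of s[i]."""
    s = s.lower()
    z = [0] * (2 * len(s) - 1)
    for i, a in enumerate(s):
        for j, b in enumerate(s):
            # otimes: 1 when the two nucleotides are complementary; oplus: addition.
            z[i + j] += int(COMPLEMENT.get(a) == b)
    # The self-convolution counts (i, j) and (j, i) separately, and no base is its own
    # complement, so halving gives the number of unordered pairs.
    return [c // 2 for c in z]


print(complementary_pair_counts("acgt"))  # [0, 0, 0, 2, 0, 0, 0]
```

A large value at some $k$ suggests that the bases on either side of position $k/2$ pair up under the rule; applied to the slide's example S, the function returns one count per center.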
9. Experiments
- We applied convolution to DNA sequences and English compositions to measure their similarity.
- In the following experiments, we used the DNA sequences below as input data. (The clustering was known in advance, for evaluation.)
  - C1 (0-25): Hepatitis B virus
  - C2 (26-162): Human mitochondrion
  - C3 (163-1041): Other viruses
11. Experiment: Comparison of English Compositions
- We applied convolution to two English compositions to detect whether they are similar.
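The slides do not say how the compositions were encoded or how similarity was scored. A hypothetical Python sketch: treat each composition as a sequence of lowercase words, use $\otimes$ = word equality and $\oplus$ = addition, and score similarity as the best offset's match count divided by the shorter composition's length. The tokenization, the score, and the function names are our assumptions, not the authors' procedure.

```python
import re


def words(text: str) -> list[str]:
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def composition_similarity(a: str, b: str) -> float:
    """Hypothetical score: max over offsets of matching words / shorter length."""
    x, y = words(a), words(b)
    if not x or not y:
        return 0.0
    ry = y[::-1]
    z = [0] * (len(x) + len(ry) - 1)
    for i, wx in enumerate(x):
        for j, wy in enumerate(ry):
            z[i + j] += int(wx == wy)  # otimes = word equality, oplus = addition
    return max(z) / min(len(x), len(y))


print(round(composition_similarity("The cat sat on the mat.", "A cat sat on the mat today."), 3))  # 0.833
```

Working over words instead of characters only changes the alphabet; the convolution itself is the same as in the DNA case.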
13. Conclusion and Future Work
- We have shown that several sequence-analysis applications we identified can be solved by means of convolution.
- Convolution can be used as a negative-answer filter: it rules out candidates that cannot be solutions.
- On the practical side, we conducted experiments, and the results confirm that this approach is feasible.
- By choosing appropriate operations as the functions in the convolution, we can solve more problems related to sequence analysis.
- For example, we hope to apply convolution to help solve protein structure comparison.
14. Thank you.