Title: Suffix trees
1Suffix trees
- ALGGEN Algorithmics and genetics group
- Dep. Llenguatges i Sistemes Informàtics
- Universitat Politècnica de Catalunya
Dr. Xavier Messeguer
http://www.lsi.upc.es/alggen
2Suffix trees
Given string ababaas
Suffixes
3 abaas
1 ababaas
4 baas
2 babaas
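A minimal sketch of how the suffix list can be produced (the helper name `suffixes` is made up for illustration). It pairs every suffix with its 1-based starting position and sorts the pairs alphabetically; the slide shows four of the seven entries.

```python
def suffixes(text):
    """Return (start, suffix) pairs, 1-based, sorted alphabetically by suffix."""
    pairs = [(i + 1, text[i:]) for i in range(len(text))]
    return sorted(pairs, key=lambda pair: pair[1])


for start, suffix in suffixes("ababaas"):
    print(start, suffix)
# 5 aas
# 3 abaas
# 1 ababaas
# 6 as
# 4 baas
# 2 babaas
# 7 s
```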
What kind of queries?
3Queries on Suffix trees
- Does the sequence ababaas contain any occurrence
of the patterns abab, aab, and ab?
- Find repeats within the sequence ababaas.
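Both queries can be sketched on a naive suffix trie stored as nested dictionaries. This is an illustration only: the trie is uncompressed and takes quadratic space, unlike a real suffix tree, and the function names are hypothetical. It assumes the last character (here s) occurs nowhere else, so every suffix ends in its own leaf.

```python
def build_suffix_trie(text):
    """Naive suffix trie: every substring of text is a path from the root."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """A pattern occurs in the text iff it can be spelled from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_leaves(node):
    """Number of suffixes that pass through this node."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


def repeats(node, prefix=""):
    """Substrings that occur at least twice: nodes with two or more leaves below."""
    found = []
    for ch, child in sorted(node.items()):
        if count_leaves(child) >= 2:
            found.append(prefix + ch)
            found.extend(repeats(child, prefix + ch))
    return found


trie = build_suffix_trie("ababaas")
print([contains(trie, p) for p in ("abab", "aab", "ab")])  # [True, False, True]
print(repeats(trie))  # ['a', 'ab', 'aba', 'b', 'ba']
```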
What about MUMs?
4Search for MUMs
Given strings ababaabs and aabaat
1st Bottom-up traversal
(Through the tree)
List of UMs: aab, abaa, baa.
2nd Search for maximals
(through the list of UM)
MUMs: aab, abaa.
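The two steps can be checked with a brute-force sketch that does not use the tree at all. It assumes that a unique match (UM) is a substring occurring exactly once in each string, and that a MUM is a UM not contained in a longer UM. The function names are made up for illustration, and the cost is far from linear.

```python
def occurrences(text, sub):
    """Count (possibly overlapping) occurrences of sub in text."""
    return sum(1 for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i))


def unique_matches(s1, s2):
    """Substrings that occur exactly once in s1 and exactly once in s2."""
    subs = {s1[i:j] for i in range(len(s1)) for j in range(i + 1, len(s1) + 1)}
    return sorted(w for w in subs
                  if occurrences(s1, w) == 1 and occurrences(s2, w) == 1)


def maximal(ums):
    """Keep only the unique matches that are not inside a longer unique match."""
    return [w for w in ums if not any(w != u and w in u for u in ums)]


ums = unique_matches("ababaabs", "aabaat")
print(ums)           # ['aab', 'abaa', 'baa']
print(maximal(ums))  # ['aab', 'abaa']
```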
5Suffix tree implementation
Given sequence ababaas
E. Ukkonen implementation: MUMmer, MGA
MALGEN implementation
On-line linear insertion algorithm!
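To show the shape of the resulting tree, here is a sketch that swaps the on-line linear algorithm for plain suffix-by-suffix insertion, which is quadratic in the worst case. It is not Ukkonen's algorithm. Edge labels are stored as index pairs into the text, leaves carry the 1-based suffix number, and the last character is assumed to be a terminator that occurs nowhere else. The class and function names are hypothetical.

```python
class Node:
    def __init__(self, start=0, end=0, leaf=None):
        self.start, self.end = start, end  # edge label is text[start:end]
        self.children = {}                 # first character of edge -> child
        self.leaf = leaf                   # 1-based suffix number, leaves only


def build_suffix_tree(text):
    """Naive compressed suffix tree; text must end in a unique terminator."""
    root = Node()
    for i in range(len(text)):
        node, pos = root, i
        while True:
            child = node.children.get(text[pos])
            if child is None:              # no edge starts with this character
                node.children[text[pos]] = Node(pos, len(text), leaf=i + 1)
                break
            k = child.start                # walk along the edge while it matches
            while k < child.end and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:             # whole edge matched: go one level down
                node = child
                continue
            mid = Node(child.start, k)     # mismatch inside the edge: split it
            node.children[text[child.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            mid.children[text[pos]] = Node(pos, len(text), leaf=i + 1)
            break
    return root


def leaves(node):
    """Leaf numbers in alphabetical depth-first order."""
    if node.leaf is not None:
        return [node.leaf]
    out = []
    for ch in sorted(node.children):
        out.extend(leaves(node.children[ch]))
    return out


print(leaves(build_suffix_tree("ababaas")))  # [5, 3, 1, 6, 4, 2, 7]
```

Reading the leaves in alphabetical order gives the suffixes in sorted order, which is a quick sanity check on the construction.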
6Meaning of suffix-links
α
aα
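A suffix link connects the node whose path label is a character a followed by a string α to the node whose path label is α. The sketch below attaches such links to an uncompressed suffix trie, where every substring has its own node; this is an assumption made to keep the example short, since a real suffix tree stores links only at internal nodes of the compressed tree. All names are hypothetical.

```python
class Node:
    def __init__(self, label):
        self.label = label   # substring spelled on the path from the root
        self.children = {}   # next character -> child Node
        self.link = None     # suffix link (stays None for the root)


def build_trie_with_links(text):
    """Build an uncompressed suffix trie and attach suffix links."""
    root = Node("")
    nodes = {"": root}       # label -> node, used to resolve the links
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            if ch not in node.children:
                child = Node(node.label + ch)
                node.children[ch] = child
                nodes[child.label] = child
            node = node.children[ch]
    for label, node in nodes.items():
        if label:            # drop the first character to find the target
            node.link = nodes[label[1:]]
    return nodes


nodes = build_trie_with_links("ababaas")
for label in ("aba", "ba", "a"):
    print(repr(label), "->", repr(nodes[label].link.label))
# 'aba' -> 'ba'
# 'ba' -> 'a'
# 'a' -> ''
```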
7Suffix links
Given Suffix tree of ababaas
8Search for MUMs
Given Suffix tree of ababaas
and the sequence
b b a b b a a a b a a ...
9Search for MUMs
Quadratic cost!
10Search for MUMs
Linear cost!
Quadratic cost!
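One way to read the two cost labels: restarting the search from the root for every position of the second sequence repeats work, while following a suffix link after each mismatch keeps the characters already matched. The sketch below compares both strategies on an uncompressed trie with suffix links (an assumption made for brevity; the names are hypothetical), using only the eleven characters of the second sequence that the slides show, and counts the steps taken in the trie.

```python
class Node:
    def __init__(self):
        self.children = {}
        self.link = None


def build_trie(text):
    """Uncompressed suffix trie with suffix links."""
    root = Node()
    by_label = {"": root}
    for i in range(len(text)):
        node, label = root, ""
        for ch in text[i:]:
            label += ch
            if ch not in node.children:
                node.children[ch] = by_label[label] = Node()
            node = node.children[ch]
    for label, node in by_label.items():
        if label:
            node.link = by_label[label[1:]]
    return root


def match_restart(root, query):
    """Longest match at each position, restarting from the root every time."""
    lengths, steps = [], 0
    for j in range(len(query)):
        node, k = root, j
        while k < len(query) and query[k] in node.children:
            node = node.children[query[k]]
            k += 1
            steps += 1
        lengths.append(k - j)
    return lengths, steps


def match_with_links(root, query):
    """Same result, but a mismatch is followed by one suffix-link jump."""
    lengths, steps = [], 0
    node, i = root, 0                 # node spells query[j:i]
    for j in range(len(query)):
        while i < len(query) and query[i] in node.children:
            node = node.children[query[i]]
            i += 1
            steps += 1
        lengths.append(i - j)
        if node is root:
            i += 1                    # nothing matched at j: skip this character
        else:
            node = node.link          # now spells query[j + 1:i]
            steps += 1
    return lengths, steps


root = build_trie("ababaas")
query = "bbabbaaabaa"
print(match_restart(root, query))
print(match_with_links(root, query))
```

In the second function the right end of the match never moves backwards, so the number of steps is at most twice the length of the second sequence. On such a short example the two counts are close; the gap grows with the length of the matches.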
11Search for MUMs
Linear cost
Quadratic cost!
Two improvements
12Search for MUMs
Linear cost
Quadratic cost!
Two improvements
13TSuffix tree
Given Suffix tree of ababaas
and the sequence
b b a b b a a a b a a ...
Linear cost
Quadratic cost!
Two improvements
14Searching MUMs on-line
Number of the leaf
Length of the MUM
Position of its first character in the second sequence
2, 3(bab), 2
b b a b b a a a b a b b
15Searching MUMs on-line
4, 3(baa), 5
16Searching MUMs on-line
4, 3(baa), 5
5, 2(aa), 6
17Searching MUMs on-line
4, 3(baa), 5
5, 2(aa), 6
1, 4(abab), 8
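The triples can be reproduced with a brute-force sketch. It assumes a simplified criterion: at each position of the second sequence, take the longest match with the first sequence and report it when it occurs exactly once there, so that it identifies a single leaf. This scans from scratch at every position and is not the on-line linear search of the slides; the function name is made up for illustration.

```python
def unique_longest_matches(first, second):
    """Report (leaf, length, match, position) for longest matches unique in first."""
    triples = []
    for j in range(len(second)):
        length = 0
        while j + length < len(second) and second[j:j + length + 1] in first:
            length += 1
        if length == 0:
            continue
        match = second[j:j + length]
        start = first.find(match)
        if first.find(match, start + 1) == -1:   # exactly one occurrence
            triples.append((start + 1, length, match, j + 1))
    return triples


for triple in unique_longest_matches("ababaas", "bbabbaaababb"):
    print(triple)
# (2, 3, 'bab', 2)
# (4, 3, 'baa', 5)
# (5, 2, 'aa', 6)
# (5, 2, 'aa', 7)
# (1, 4, 'abab', 8)
# (2, 3, 'bab', 9)
```

The four triples listed on the slides appear in this output. The simplified criterion also reports matches at positions 7 and 9, which the slides do not show.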
18Methodology for a preview with two genomes
- Construct the TSuffix tree of the first genome
- Search for the MUMs with respect to the other genome
Construction of TSuffix tree
Reading the second sequence
Only one TSuffix tree
-50
What about more genomes?
19Computational and biological background (3)
- Chlamydophila pneumoniae AR39: 1.247.420 bps
- Chlamydia pneumoniae: 1.247.805 bps
- Chlamydia muridarum: 1.084.689 bps
- Chlamydia trachomatis: 1.057.413 bps
20Alignment revisited
- Pyrococcus abyssi: 1.790.334 bps
- Pyrococcus horikoshii: 1.763.341 bps