Title: PAT Trees
1- PAT Trees
- Index for arbitrary character sequences in text
- Gonnet (1983), based on the Patricia Tree (Morrison 68)
- Used for indexing the OED
- SISTRINGS: Semi-Infinite Strings
  - pos
  - A 13219 .I rise on a point of order which
  - B 41131 .I rise on a point of objection to
  - B < A in sistring order
- What if we encountered all sistrings and sorted them?
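The sistring comparison above can be sketched in a few lines. This is only an illustration: the `text` below is a hypothetical miniature stand-in (the offsets 13219 and 41131 on the slide refer to a larger text that is not shown), and `sistring` is a helper name introduced here.

```python
def sistring(text: str, pos: int) -> str:
    """Semi-infinite string: the text from position `pos` to the end."""
    return text[pos:]

# Hypothetical stand-in text containing both phrases from the slide.
text = ".I rise on a point of order which ... .I rise on a point of objection to ..."

a = text.index(".I rise on a point of order")
b = text.index(".I rise on a point of objection")

# Sistrings compare character by character; "ob..." sorts before "or...".
b_before_a = sistring(text, b) < sistring(text, a)
print(b_before_a)
```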
2 STEP 1 SISTRINGS
STRING: .can.a.can.can.cans?

Offset     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Character  .  c  a  n  .  a  .  c  a  n  .  c  a  n  .  c  a  n  s  ?

SISTRING               OFFSET
.can.a.can.can.cans?   0
can.a.can.can.cans?    1
an.a.can.can.cans?     2
n.a.can.can.cans?      3
.a.can.can.cans?       4
a.can.can.cans?        5
.can.can.cans?         6
can.can.cans?          7
an.can.cans?           8
n.can.cans?            9
.can.cans?             10
can.cans?              11
an.cans?               12
n.cans?                13
.cans?                 14
cans?                  15
ans?                   16
ns?                    17
s?                     18
?                      19
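As a minimal sketch, the Step 1 table can be produced by slicing the string at every offset. The names `TEXT` and `sistrings` are introduced here for illustration only.

```python
TEXT = ".can.a.can.can.cans?"

def sistrings(text):
    """Return (offset, sistring) pairs, one per character position."""
    return [(i, text[i:]) for i in range(len(text))]

table = sistrings(TEXT)
for offset, s in table:
    print(f"{s:<22}{offset}")
```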
3 STEP 2 Sort and find minimal distinguishing prefixes
STRING: .can.a.can.can.cans?

Offset     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Character  .  c  a  n  .  a  .  c  a  n  .  c  a  n  .  c  a  n  s  ?

SISTRING               OFFSET   MINIMAL DISTINGUISHING PREFIX
.a.can.can.cans?       4        .a
.can.a.can.can.cans?   0        .can.a
.can.can.cans?         6        .can.can.
.can.cans?             10       .can.cans
.cans?                 14       .cans
?                      19       ?
a.can.can.cans?        5        a.
an.a.can.can.cans?     2        an.a
an.can.cans?           8        an.can.
an.cans?               12       an.cans
ans?                   16       ans
can.a.can.can.cans?    1        can.a
can.can.cans?          7        can.can.
can.cans?              11       can.cans
cans?                  15       cans
n.a.can.can.cans?      3        n.a
n.can.cans?            9        n.can.
n.cans?                13       n.cans
ns?                    17       ns
s?                     18       s
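One way to compute the Step 2 table, sketched under the assumption that the text ends in a unique terminator (here `?`) so that no sistring is a prefix of another. After sorting, a sistring's minimal distinguishing prefix is one character longer than the longest prefix it shares with either sorted neighbor. The helper names `lcp` and `minimal_prefixes` are introduced here.

```python
TEXT = ".can.a.can.can.cans?"

def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def minimal_prefixes(text):
    """Return (sistring, offset, minimal distinguishing prefix) in sorted order."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    rows = []
    for k, off in enumerate(order):
        s = text[off:]
        shared = 0
        if k > 0:
            shared = max(shared, lcp(s, text[order[k - 1]:]))
        if k + 1 < len(order):
            shared = max(shared, lcp(s, text[order[k + 1]:]))
        rows.append((s, off, s[:shared + 1]))
    return rows

rows = minimal_prefixes(TEXT)
for s, off, prefix in rows:
    print(f"{s:<22}{off:<4}{prefix}")
```

Sorting whole slices is quadratic in the worst case; it is used here only because the example string is tiny.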
4 STEP 4 Create a Digital Trie from Prefixes
[Figure: digital trie built from the minimal distinguishing prefixes; each leaf is labeled with the offset at which its substring begins]
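A rough sketch of this step, using nested dictionaries as trie nodes and an integer offset as each leaf. The `PREFIXES` table is copied from the minimal distinguishing prefixes of `.can.a.can.can.cans?`; the representation and the name `build_trie` are assumptions made for illustration, not the slide's own data structure.

```python
# Minimal distinguishing prefix -> offset of its sistring.
PREFIXES = {
    ".a": 4, ".can.a": 0, ".can.can.": 6, ".can.cans": 10, ".cans": 14,
    "?": 19, "a.": 5, "an.a": 2, "an.can.": 8, "an.cans": 12, "ans": 16,
    "can.a": 1, "can.can.": 7, "can.cans": 11, "cans": 15,
    "n.a": 3, "n.can.": 9, "n.cans": 13, "ns": 17, "s": 18,
}

def build_trie(prefixes):
    """Insert each prefix character by character; the last character maps to the offset."""
    root = {}
    for prefix, offset in prefixes.items():
        node = root
        for ch in prefix[:-1]:
            node = node.setdefault(ch, {})
        node[prefix[-1]] = offset  # leaf: labeled with where the substring begins
    return root

trie = build_trie(PREFIXES)
print(sorted(trie))            # first-level branches
print(trie["c"]["a"]["n"]["s"])  # leaf reached by spelling "cans"
```

Because the prefixes are minimal and distinguishing, none is a prefix of another, so a leaf never collides with an internal node.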
5 STEP 5 Simplify tree with use of skipped bits
[Figure: simplified trie over the characters . c a n s ?; nodes marked x; branches annotated with the number of missing letters; leaves labeled with sistring offsets]
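The simplification can be sketched as collapsing every chain of single-child nodes and recording how many letters were skipped. This sketch counts skipped letters, as in the figure annotation, rather than bits; the tuple layout `(skipped, children)` and the function names are assumptions introduced here.

```python
PREFIXES = {
    ".a": 4, ".can.a": 0, ".can.can.": 6, ".can.cans": 10, ".cans": 14,
    "?": 19, "a.": 5, "an.a": 2, "an.can.": 8, "an.cans": 12, "ans": 16,
    "can.a": 1, "can.can.": 7, "can.cans": 11, "cans": 15,
    "n.a": 3, "n.can.": 9, "n.cans": 13, "ns": 17, "s": 18,
}

def build_trie(prefixes):
    """Nested-dict trie; the last character of each prefix maps to its offset."""
    root = {}
    for prefix, offset in prefixes.items():
        node = root
        for ch in prefix[:-1]:
            node = node.setdefault(ch, {})
        node[prefix[-1]] = offset
    return root

def compress(node, skipped=0):
    """Collapse single-child chains.

    Returns an int for a leaf, or (skipped, children) for a branching node,
    where `skipped` is the number of letters passed over to reach it.
    """
    if isinstance(node, int):
        return node
    if len(node) == 1:
        (child,) = node.values()
        return compress(child, skipped + 1)
    return (skipped, {ch: compress(child) for ch, child in node.items()})

skip, children = compress(build_trie(PREFIXES))
print(children["c"])  # the "c" branch skips "a" and "n" before it splits
```

Since skipped letters are not stored, a search in such a tree has to verify a candidate leaf against the text at the returned offset.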
6 Patricia Tree
- Binary digital trie
- Convert characters to ASCII bits and make a trie
[Figure: binary trie with branches labeled 0 and 1, shown for the characters a, c, n]
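A minimal sketch of the binary version, assuming 8-bit ASCII codes and a unique terminator (`?`) so that no sistring is a prefix of another. Each internal node stores the index of the bit it tests, which is how a Patricia-style tree skips over bits on which all sistrings below it agree. The node layout and the names `to_bits`, `build`, and `lookup` are introduced here for illustration.

```python
TEXT = ".can.a.can.can.cans?"

def to_bits(s):
    """Concatenate the 8-bit ASCII code of every character."""
    return "".join(format(ord(ch), "08b") for ch in s)

def build(offsets, bits, pos=0):
    """Node is a leaf offset, or (bit_index, zero_subtree, one_subtree)."""
    if len(offsets) == 1:
        return offsets[0]
    # Skip ahead to the first bit on which these sistrings disagree.
    while len({bits[o][pos] for o in offsets}) == 1:
        pos += 1
    zeros = [o for o in offsets if bits[o][pos] == "0"]
    ones = [o for o in offsets if bits[o][pos] == "1"]
    return (pos, build(zeros, bits, pos + 1), build(ones, bits, pos + 1))

def lookup(node, key_bits):
    """Follow only the tested bits; the caller should verify the result against the text."""
    while not isinstance(node, int):
        pos, zero, one = node
        node = one if key_bits[pos] == "1" else zero
    return node

BITS = {i: to_bits(TEXT[i:]) for i in range(len(TEXT))}
tree = build(list(BITS), BITS)
print(to_bits("a"))
print(lookup(tree, to_bits("cans?")))
```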
7- Applications
- Prefix Searching
  - Follow the branches for the query, e.g. c, a, n
  - If no branch for next character, then fail.
  - Enumerate all leaves sharing prefix
  - O(height) + O(return set)
  - O(log n)
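Prefix searching can be sketched on a plain character trie: walk one branch per query character, fail if a branch is missing, then enumerate every leaf below the node reached. For brevity this sketch inserts whole sistrings into an uncompressed trie, which is not how a PAT tree is stored; `LEAF`, `build`, `leaves`, and `prefix_search` are names introduced here.

```python
TEXT = ".can.a.can.can.cans?"
LEAF = "offset"  # hypothetical marker key; cannot clash with one-character branches

def build(text):
    """Uncompressed trie of all sistrings; the end of each sistring stores its offset."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node[LEAF] = i
    return root

def leaves(node):
    """Yield every offset stored in the subtree rooted at node."""
    for key, child in node.items():
        if key == LEAF:
            yield child
        else:
            yield from leaves(child)

def prefix_search(root, pattern):
    node = root
    for ch in pattern:
        if ch not in node:
            return []  # no branch for the next character: fail
        node = node[ch]
    return sorted(leaves(node))  # all leaves sharing the prefix

root = build(TEXT)
print(prefix_search(root, "can"))
print(prefix_search(root, "cat"))
```

The walk costs time proportional to the query length; the enumeration costs time proportional to the subtree visited.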
8 Applications
- Longest Repetition Search
  - .can.can
  - Farthest internal node from root
  - Simplify on-line calculation by storing a bit to show which direction the longest subtree goes
- Most Frequent N-gram
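Both applications can be illustrated without building the tree, which is a deliberate substitution: the deepest internal node corresponds to the longest prefix shared by two neighbors in sorted sistring order, and n-gram frequencies can be tallied directly. The function names below are introduced for this sketch.

```python
from collections import Counter

TEXT = ".can.a.can.can.cans?"

def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def longest_repetition(text):
    """Longest substring occurring at least twice (occurrences may overlap)."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    best = ""
    for prev, cur in zip(order, order[1:]):
        n = lcp(text[prev:], text[cur:])
        if n > len(best):
            best = text[cur:cur + n]
    return best

def most_frequent_ngram(text, n):
    """Return (n-gram, count) for the most common substring of length n."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return counts.most_common(1)[0]

print(longest_repetition(TEXT))
print(most_frequent_ngram(TEXT, 4))
```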
9 Problem with PAT Tree
- Enumerating a subtree is costly
10 Create Suffix Arrays
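A minimal sketch of the suffix-array alternative: store the sistring offsets in sorted order, so that every sistring sharing a prefix lies in one contiguous block that two binary searches can locate. The construction below simply sorts slices, which is fine for a short string but not an efficient construction method; `suffix_array` and `find` are names introduced here.

```python
TEXT = ".can.a.can.can.cans?"

def suffix_array(text):
    """Offsets of all sistrings, in sorted sistring order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find(text, sa, pattern):
    """Return the offsets of all sistrings that start with pattern."""
    m = len(pattern)
    # Lower bound: first sistring whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first sistring whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sa[start:lo]

sa = suffix_array(TEXT)
print(sa)
print(find(TEXT, sa, "can"))
```

Returning the matches is then a slice of the array rather than a subtree traversal.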