PAT Trees - PowerPoint PPT Presentation

About This Presentation
Title: PAT Trees


Slides: 11
Provided by: Yan151
Learn more at: https://www.cs.jhu.edu
Tags: pat | ascii | trees


Transcript and Presenter's Notes

Title: PAT Trees


1
  • PAT Trees
  • Index for arbitrary character sequence in text
  • Gonnet (1983), based on Patricia Tree (Morrison 68)
  • Used for indexing OED
  • SISTRINGS: Semi-Infinite Strings
  • Example sistrings and their starting positions (pos):
      A  13219  .I rise on a point of order which
      B  41131  .I rise on a point of objection to
  • B < A in sistring order
  • What if we encountered all sistrings and sorted them?

2
STEP 1 SISTRINGS
STRING  .can.a.can.can.cans?

  char    .  c  a  n  .  a  .  c  a  n  .  c  a  n  .  c  a  n  s  ?
  offset  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19

  SISTRING               OFFSET
  .can.a.can.can.cans?    0
  can.a.can.can.cans?     1
  an.a.can.can.cans?      2
  n.a.can.can.cans?       3
  .a.can.can.cans?        4
  a.can.can.cans?         5
  .can.can.cans?          6
  can.can.cans?           7
  an.can.cans?            8
  n.can.cans?             9
  .can.cans?             10
  can.cans?              11
  an.cans?               12
  n.cans?                13
  .cans?                 14
  cans?                  15
  ans?                   16
  ns?                    17
  s?                     18
  ?                      19
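The enumeration in Step 1 can be sketched in a few lines of Python. This is only an illustration: the names `TEXT` and `sistrings` are made up here, and each sistring is cut off at the end of the text rather than being truly semi-infinite.

```python
TEXT = ".can.a.can.can.cans?"  # example string from the slide

def sistrings(text):
    """Return (offset, sistring) pairs, one per starting position in text."""
    return [(i, text[i:]) for i in range(len(text))]

# Print the same SISTRING / OFFSET listing as the slide.
for offset, s in sistrings(TEXT):
    print(f"{s:<22}{offset:>2}")
```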


3
STEP 2 Sort and find minimal distinguishing prefixes
STRING  .can.a.can.can.cans?

  SISTRING               OFFSET  MINIMAL DISTINGUISHING PREFIX
  .a.can.can.cans?        4      .a
  .can.a.can.can.cans?    0      .can.a
  .can.can.cans?          6      .can.can.
  .can.cans?             10      .can.cans
  .cans?                 14      .cans
  ?                      19      ?
  a.can.can.cans?         5      a.
  an.a.can.can.cans?      2      an.a
  an.can.cans?            8      an.can.
  an.cans?               12      an.cans
  ans?                   16      ans
  can.a.can.can.cans?     1      can.a
  can.can.cans?           7      can.can.
  can.cans?              11      can.cans
  cans?                  15      cans
  n.a.can.can.cans?       3      n.a
  n.can.cans?             9      n.can.
  n.cans?                13      n.cans
  ns?                    17      ns
  s?                     18      s
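One way to reproduce Step 2 is to sort the sistring offsets and compare each sistring with its two neighbours in sorted order, since only neighbours can share its longest common prefix. The sketch below assumes plain character-code ordering (so "." sorts before "?" and both sort before letters) and relies on the unique final "?" so that no sistring is a prefix of another. All function names are illustrative.

```python
TEXT = ".can.a.can.can.cans?"  # example string from the slide

def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def minimal_prefixes(text):
    """Return (sistring, offset, prefix) rows in sorted sistring order."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    rows = []
    for k, i in enumerate(order):
        s = text[i:]
        longest = 0
        if k > 0:
            longest = max(longest, common_prefix_len(s, text[order[k - 1]:]))
        if k + 1 < len(order):
            longest = max(longest, common_prefix_len(s, text[order[k + 1]:]))
        # One character past the longest shared prefix is enough to be unique.
        rows.append((s, i, s[:longest + 1]))
    return rows

for s, offset, prefix in minimal_prefixes(TEXT):
    print(f"{s:<22}{offset:>2}  {prefix}")
```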

4
STEP 4 Create a Digital Trie from Prefixes
[Figure: digital trie built from the minimal distinguishing prefixes. Label each leaf with the offset at which its substring begins.]
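A character-level trie of this kind can be sketched with nested dictionaries. The version below is an assumption-laden illustration, not the slide's construction: it inserts sistrings one at a time and pushes an existing leaf down only as far as needed, so each leaf's path ends up being its minimal distinguishing prefix and each leaf stores the offset where its substring begins. The names `insert`, `build_trie`, and `leaves` are hypothetical.

```python
TEXT = ".can.a.can.can.cans?"  # example string; the unique "?" ends the text

def insert(root, text, offset):
    """Add the sistring starting at offset, using as few characters as possible."""
    node, depth = root, 0
    while True:
        ch = text[offset + depth]
        child = node.get(ch)
        if child is None:
            node[ch] = offset  # free branch: store the offset as a leaf
            return
        if not isinstance(child, dict):
            # The branch ends in another leaf: push that leaf one level down.
            other, child = child, {}
            node[ch] = child
            child[text[other + depth + 1]] = other
        node, depth = child, depth + 1

def build_trie(text):
    root = {}
    for offset in range(len(text)):
        insert(root, text, offset)
    return root

def leaves(node, path=""):
    """Yield (prefix, offset) for every leaf, in sorted order."""
    for ch in sorted(node):
        child = node[ch]
        if isinstance(child, dict):
            yield from leaves(child, path + ch)
        else:
            yield path + ch, child

for prefix, offset in leaves(build_trie(TEXT)):
    print(f"{prefix:<10}{offset:>2}")
```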
5
STEP 5 Simplify tree with use of skipped bits
[Figure: simplified tree over the alphabet . c a n s ? in which an x marks missing (skipped) letters; leaves are labeled with sistring offsets.]
6
Patricia Tree
Binary digital trie
Convert character to ASCII bits and make trie
[Figure: binary trie over the ASCII bits of a, c, n, with branches labeled 0 and 1.]
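The conversion to bits can be illustrated as follows. This is a rough sketch that assumes 8-bit codes (the slide may have used 7-bit ASCII); `to_bits` and `first_diff_bit` are made-up helper names. The first differing bit is the position a binary trie would branch on when separating two keys.

```python
def to_bits(s, width=8):
    """Concatenate the fixed-width binary codes of the characters in s."""
    return "".join(format(ord(ch), f"0{width}b") for ch in s)

def first_diff_bit(a, b):
    """Index of the first bit where a and b differ, or None if none is found."""
    for i, (x, y) in enumerate(zip(to_bits(a), to_bits(b))):
        if x != y:
            return i
    return None

for ch in "acn":
    print(ch, to_bits(ch))
# a 01100001
# c 01100011
# n 01101110
```

With these codes, "a" and "c" first differ at bit 6, while "a" and "n" already differ at bit 4, so "n" separates from the other two higher up in the binary trie.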
7
  • Applications
  • Prefix Searching
  • If no branch for next character, then fail.
  • Example search path: c, a, n
  • Enumerate all leaves sharing prefix
  • O(height) + O(return set)
  • O(log n)
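The search procedure above can be sketched on a plain, uncompressed trie of all sistrings. This is a simplification chosen for brevity, not the PAT tree itself: it uses far more nodes, but the descend-then-enumerate logic is the same. The `None` key used to store an offset at the end of each sistring is an implementation choice made here.

```python
TEXT = ".can.a.can.can.cans?"  # example string from the earlier slides

def build(text):
    """Uncompressed trie of all sistrings; key None holds the start offset."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node[None] = i
    return root

def prefix_search(root, prefix):
    """Return the sorted offsets of all sistrings that start with prefix."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []  # no branch for the next character: fail
        node = node[ch]
    found, stack = [], [node]
    while stack:  # enumerate every leaf below the matched node
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)

root = build(TEXT)
print(prefix_search(root, "can"))  # [1, 7, 11, 15]
print(prefix_search(root, "cat"))  # []
```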

8
Applications
Longest Repetition Search: .can.can
  farthest internal node from root
  Simplify on-line calculation by storing a bit to show which direction the longest subtree goes
Most Frequent N-gram
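The "farthest internal node from root" idea can be checked on the example string with a small sketch. It again uses an uncompressed trie of sistrings rather than a real PAT tree, and treats any node with two or more children as an internal (branching) node; this works here because the unique final "?" keeps any sistring from being a prefix of another. The function names are illustrative.

```python
TEXT = ".can.a.can.can.cans?"  # example string from the earlier slides

def build(text):
    """Uncompressed trie containing every sistring of text."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root

def longest_repetition(root):
    """Path to the deepest branching node, i.e. the longest repeated substring."""
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(node) >= 2 and len(path) > len(best):
            best = path
        for ch, child in node.items():
            stack.append((child, path + ch))
    return best

print(longest_repetition(build(TEXT)))  # .can.can
```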
9
Problem with Pat Tree
Enumerating subtree costly
10
Create Suffix Arrays
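A suffix array avoids subtree enumeration because all sistrings sharing a prefix sit in one contiguous range of the sorted order. The sketch below is a naive illustration: it builds the array by sorting whole sistrings, which is fine for a 20-character example but not how large texts would be indexed, and it uses hand-written binary searches. The names `suffix_array` and `find` are hypothetical.

```python
TEXT = ".can.a.can.can.cans?"  # example string from the earlier slides

def suffix_array(text):
    """Offsets of all sistrings, in sorted sistring order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find(text, sa, prefix):
    """Return the sorted offsets of all sistrings that start with prefix."""
    m = len(prefix)
    lo, hi = 0, len(sa)
    while lo < hi:  # first entry whose first m characters are >= prefix
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < prefix:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first entry whose first m characters are > prefix
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= prefix:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])  # the matches are one contiguous slice

sa = suffix_array(TEXT)
print(sa)                     # [4, 0, 6, 10, 14, 19, 5, 2, 8, 12, 16, 1, 7, 11, 15, 3, 9, 13, 17, 18]
print(find(TEXT, sa, "can"))  # [1, 7, 11, 15]
```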