Succinct Data Structures for Permutations, Functions and Suffix Arrays PowerPoint PPT Presentation

presentation player overlay
1 / 19
About This Presentation
Transcript and Presenter's Notes

Title: Succinct Data Structures for Permutations, Functions and Suffix Arrays


1
Succinct Data Structures for Permutations,
Functions and Suffix Arrays
  • Ian Munro
  • University of Waterloo
  • Joint work with F. Fich, M. He, J. Horton, A.
    López-Ortiz, S. Srinivasa Rao, Rajeev Raman,
    Venkatesh Raman
  • How do we encode a permutation or
  • generalization function or
  • specialization suffix array
  • in a small amount of space
  • and still perform queries in constant time ???

2
Permutations a Shortcut Notation
  • Let P be a simple array giving p Pi pi
  • Also have Bi be a pointer t positions back in
    (the cycle of) the permutation
  • Bi p-ti .. But only define B for every tth
    position in cycle. (t is a constant ignore cycle
    length round-off)
  • So array representation
  • P 8 4 12 5 13 x x 3 x 2 x 10 1
  • 1 2 3 4
    5 6 7 8 9 10 11
    12 13

2
4
5
13
1
8
3
12
10
3
Representing Shortcuts
  • In a cycle there is a B every t positions
  • But these positions can be in arbitrary order
  • Which is have a B, and how do we store it?
  • Keep a vector of all positions
  • 0 indicates no B 1 indicates a B
  • Rank gives the position of Bi in B array
  • So p(i) and p -1(i) in O(1) time (1e)n lg n
    bits
  • Theorem Under a pointer machine model with space
    (1 e) n references, we need time 1/e to answer p
    and p -1 queries i.e. this is as good as it gets.

4
Getting n lg n Bits an Aside
  • This is the best we can do for O(1) operations
  • But using Benes networks
  • 1-Benes network is a 2 input/2 output switch
  • r1-Benes network join tops to tops

1 2 3 4 5 6 7 8
3 5 7 8 1 6 4 2
R-Benes Network
R-Benes Network
5
A Benes Network
  • Realizing the permutation
  • (3 5 7 8 1 6 4 2)

1 2 3 4 5 6 7 8
3 5 7 8 1 6 4 2
6
What can we do with it?
  • Divide into blocks of lg lg n gates encode
    their actions in a word. Taking advantage of
    regularity of address mechanism
  • and also
  • Modify approach to avoid power of 2 issue
  • Can trace a path in time O(lg n/(lg lg n)
  • This is the best time we are able get for p and
    p-1 in minimum space.
  • Observe This method violates the pointer
    machine lower bound by using micropointers.

7
Back to the main track Powers of p
  • Consider the cycles of p
  • ( 2 6 8)( 3 5 9 10)( 4 1 7)
  • Keep a bit vector to indicate the start of each
    cycle
  • ( 2 6 8 3 5 9 10 4 1 7)
  • Ignoring parentheses, view as new permutation, ?.
  • Note ?-1(i) is position containing i
  • So we have ? and ?-1 as before
  • Use ?-1(i) to find i, then bit vector (rank,
    select) to find pk or p-k

8
Functions
  • Now consider arbitrary functions n?n
  • A function is just a hairy permutation
  • All tree edges lead to a cycle

9
Challenges here
  • Essentially write down the components in a
    convenient order and use the n lg n bits to
    describe the mapping (as per permutations)
  • To get fk(i)
  • Find the level ancestor (k levels up) in a tree
  • Or
  • Go up to root and apply f the remaining number of
    steps around a cycle

10
Level Ancestors
  • There are several level ancestor techniques using
  • O(1) time and O(n) WORDS.
  • Adapt Bender Farach-Colton to work in O(n) bits
  • But going the other way

11
f-k is a set
  • Moving Down the tree requires care
  • f-3( ) ( )
  • The trick
  • Report all nodes on a given level of a tree in
    time proportional to the number of nodes, and
  • Dont waste time on trees with no answers

12
Final Function Result
  • Given an arbitrary function f n?n
  • With an n lg n O(n) bit representation we can
    compute fk(i) in O(1) time and f-k(i) in time O(1
    size of answer).

13
Back to Text And Suffix Arrays
  • Text T1..n over (a,b) (altltb)
  • There are 2n-1 such texts, which of the n! suffix
    arrays are valid?
  • 1 2 3 4 5 6 7 8
  • SA 4 7 5 1 8 3 6 2
  • is
  • a b b a a b a
  • SA-1 4 8 6 1 3 7 2 5
  • M 4 7 1 5 8 2 3 6 isnt ..why?

14
Ascending to Max
  • M is a permutation so M-1 is its inverse
  • i.e. M-1i says where i is in M
  • Ascending-to-Max ? 1 ? i ? n-2
  • M-1i lt M-1n and M-1i1 lt M-1n ? M-1i
    lt M-1i1
  • M-1i gt M-1n and M-1i1 gt M-1n ? M-1i
    gt M-1i1
  • 4 7 5 1 8 3 6 2 OK
  • 4 7 1 5 8 2 3 6 NO

15
Non-Nesting
  • Non-Nesting? 1 ? i,j ? n-1 and M-1iltM-1j
  • M-1i lt M-1i1 and M-1j lt M-1j1 ?
    M-1i1 lt M-1j1
  • M-1i gt M-1i1 and M-1j gt M-1j1 ?
    M-1i1 lt M-1j1
  • 4 7 5 1 8 3 6 2 OK
  • 4 7 1 5 8 2 3 6 NO

16
Characterization Theorem for Suffix Arrays on
Binary Texts
  • Theorem Ascending to Max Non-nesting ? Suffix
    Array
  • Corollary Clean method of breaking SA into
    segments
  • Corollary Linear time algorithm to check whether
    SA is valid

17
Cardinality Queries
  • T a b a a a b b a a a b a a b b
  • Remember lengths longest run of as and of bs
  • SA (broken by runs, but not stored explicitly)
  • 8 3 9 4 12 1 10 5 1316 7 2 11
    156 14
  • Ba, ? bit vector .. If SA-1i-1 in an a
    section store 1 in Ba,SA-1i, else 0
  • Ba 0 0 1 1 0 0 1 1 1 0 0
    1 1 0 1 1
  • Create rank structure on Ba, and similarly Bb,
    (Note these are reversed except at )
  • Algorithm Count(T,P)
  • s ? 1 e ?n i ? m
  • while igt0 and s?e do
  • if Pia then
  • s? rank1(Ba,s-1)1 e?rank1(Ba,e)
  • else
  • s? na 2 rank1(Bb,s-1) e?na 1
    rank1(Bb,e)
  • i ? i-1
  • Return max(e-s1,0)
  • Time O(length of query)

18
Listing Queries
  • Complex methods
  • Key idea for queries of length at least d, index
    every dth position .. For T and forT(reversed)
  • So we have matches for Ti..n and T1,i-1
  • View these as points in 2 space (Ferragina
    Manzini and Grossi Vitter)
  • Do a range query (Alstrup et al)
  • Variety of results follow

19
General Conclusion
  • Interesting, and useful, combinatorial objects
    can be
  • Stored succinctly O(lower bound) o()
  • So that
  • Natural queries are performed in O(1) time (or at
    least very close)
  • This can make the difference between using them
    and not
Write a Comment
User Comments (0)
About PowerShow.com