Sets of Digital Data - PowerPoint PPT Presentation

Slides: 25
Provided by: eileenk2

Transcript and Presenter's Notes

Title: Sets of Digital Data


1
Sets of Digital Data
  • CSCI 2720
  • Fall 2005
  • Kraemer

2
Digital Data
  • In earlier work with BSTs and various balanced
    trees, we compared keys for order or equality
  • Here, we take advantage of structure of key
  • Use it as an index, or
  • Decompose string key into characters, or
  • Treat key as numerical quantity on which we can
    perform operations

3
Assumptions
  • We will construct and manipulate sets that
  • Are drawn from a universe U of size N
  • U = {u_0, ..., u_{N-1}}
  • A relatively simple procedure exists by which we
    can compute, for an element u ∈ U, the index i
    such that u = u_i.
  • Easy if U is set of integers
  • Also easy if U is set of characters with
    character codes in a contiguous interval

4
Bit Vector
  • Used to represent a subset S ⊆ U
  • A table of N bits, Bits[0 .. N-1]
  • Bits[i] = 1 if u_i ∈ S
  • Bits[i] = 0 if u_i ∉ S
  • Example: today's attendance

student number:  0 1 2 3 4 5 6 ...
bit:             1 1 0 1 0 1 1
(1 = present, 0 = absent)

5
Bit Vectors
  • Assume
  • determining element index takes constant time
  • accessing position in table takes constant time
  • May actually take several ops, and depend
    somewhat on N (size of universe), but not on size
    of set represented
  • Then
  • Insert, Delete, Member are constant time ops
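
The constant-time operations above can be sketched as follows. This is a minimal illustration, not the course's implementation: the class name `BitVector`, the byte-packed layout, and the assumption that elements are already given as indices 0..N-1 are all choices made here.

```python
class BitVector:
    """Subset of a universe {u_0, ..., u_{N-1}}, stored as N bits."""

    def __init__(self, n):
        self.n = n
        self.bits = bytearray((n + 7) // 8)  # every bit starts at 0

    def insert(self, i):
        # Set bit i: pick the byte with i >> 3, the bit within it with i & 7.
        self.bits[i >> 3] |= 1 << (i & 7)

    def delete(self, i):
        # Clear bit i; the 0xFF mask keeps the value inside one byte.
        self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def member(self, i):
        return bool((self.bits[i >> 3] >> (i & 7)) & 1)


# Attendance example: students 0, 1, 3, 5 and 6 are present.
attendance = BitVector(7)
for student in (0, 1, 3, 5, 6):
    attendance.insert(student)
```

Each operation touches one byte regardless of how many elements the set holds, which is the sense in which the cost does not depend on the size of the set represented.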

6
Bit Vectors
  • A subset of a set of size N always takes N bits
    to represent, independent of size of subset
  • Makes sense if
  • N is not too large
  • need to represent sets of size comparable to N

7
Storage Efficiency
  • Bit Vector vs. Binary Trees
  • Binary Tree, set of size n
  • Requires n(2p + K) bits
  • K ≥ lg N, size of field to represent key value
  • p = number of bits in a pointer
  • Bit Vector, takes N bits
  • If n ≈ N, then bit vector more efficient
  • If p = K = 32, then tree becomes more space
    efficient when n/N ≪ 1
  • Actually, the break-even point is n(2p + K) = N,
    which is when n/N = 1/96
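
The break-even arithmetic can be checked with a few lines of code. The function names below are made up for this sketch; the formulas are the ones on the slide, with a tree node costing two pointers plus one key field.

```python
from fractions import Fraction


def tree_bits(n, p, K):
    """Bits used by a binary tree holding n keys: two pointers and a key per node."""
    return n * (2 * p + K)


def break_even_density(p, K):
    """The ratio n/N at which n(2p + K) = N, so tree and bit vector cost the same."""
    return Fraction(1, 2 * p + K)


# With p = K = 32, each node costs 96 bits.
density = break_even_density(32, 32)
```

For example, with N = 9600 a tree of n = 100 keys uses exactly 9600 bits, the same as the bit vector; below that density the tree is smaller.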

8
When to use Bit Vectors?
  • When universe is relatively small
  • When sets are large in relation to size of
    universe

9
Advantages of Bit Vectors
  • O(1) implementation of Insert, Delete, Member
  • Union and Intersection easy
  • Implement via Boolean AND and OR operations
  • May actually take less than one op/element, as
    operations are performed on full machine words
  • If the machine word is 32 bits, then one machine
    operation handles 32 potential elements of the set
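
A word-at-a-time union and intersection might look like the sketch below. The 32-bit word size matches the slide; the helper names and the list-of-integers representation are assumptions for illustration (Python integers are not fixed-width, so each list entry merely stands in for one machine word).

```python
WORD = 32  # assumed machine word size in bits


def to_words(elements, n):
    """Pack element indices from a universe of size n into 32-bit words."""
    words = [0] * ((n + WORD - 1) // WORD)
    for i in elements:
        words[i // WORD] |= 1 << (i % WORD)
    return words


def union(a, b):
    # One OR per word covers 32 potential elements at once.
    return [x | y for x, y in zip(a, b)]


def intersection(a, b):
    # One AND per word covers 32 potential elements at once.
    return [x & y for x, y in zip(a, b)]


def to_elements(words):
    """Unpack words back into a sorted list of element indices."""
    return [w * WORD + b
            for w, word in enumerate(words)
            for b in range(WORD)
            if (word >> b) & 1]


a = to_words([1, 5, 40], 64)
b = to_words([5, 40, 63], 64)
```

Here a universe of 64 elements needs only two OR or AND operations per set operation.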

10
Disadvantages of Bit Vectors
  • On some computers access to individual bits can
    require shifting and masking operations
    (expensive)
  • Result is that Member may be much more expensive
    than Union
  • Initialization takes Θ(N) -- zero all the bits in
    the vector
  • But can use constant time initialization
    algorithm
  • But that makes storage requirement go to 2p + 1
    bits per element
  • So, in practice, just use machine ops to set to
    zero, which are efficient
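
One well-known way to get constant-time initialization is to pair each bit with a pointer into a stack of "positions touched so far", where each stack entry points back at its position. The sketch below assumes that scheme is the one meant; the class and field names are invented here. Python cannot hand out truly uninitialized memory, so the arrays are filled with arbitrary values to stand in for it (that fill is itself linear time and is only part of the simulation).

```python
import random


class LazyBitVector:
    """Bit vector that is logically empty without zeroing its storage."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        # Stand-ins for uninitialized memory: arbitrary contents.
        self.bit = [rng.randrange(2) for _ in range(n)]      # 1 bit per element
        self.where = [rng.randrange(n) for _ in range(n)]    # p bits per element
        self.stack = [rng.randrange(n) for _ in range(n)]    # p bits per element
        self.top = 0  # the only field that really needs initializing

    def _valid(self, i):
        # Position i has been written only if its stack slot is in use
        # and that slot points back at i.
        j = self.where[i]
        return j < self.top and self.stack[j] == i

    def _set(self, i, value):
        if not self._valid(i):
            self.where[i] = self.top
            self.stack[self.top] = i
            self.top += 1
        self.bit[i] = value

    def insert(self, i):
        self._set(i, 1)

    def delete(self, i):
        self._set(i, 0)

    def member(self, i):
        return self._valid(i) and self.bit[i] == 1


v = LazyBitVector(100, seed=1)
```

The two extra pointer-sized fields per element are where the 2p + 1 bits per element come from.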

11
Tries and Digital Search Trees
  • If the key can be decomposed into characters,
    then the characters of the key can be used as
    indices
  • Tries are based on this idea
  • "trie" is the middle syllable of "retrieval", a pun
    on "tree", but pronounced "try"

12
Tries
  • Assume k possible character values
  • A trie is a (k+1)-ary tree
  • Each node is a table of k+1 pointers
  • One pointer for each possible character
  • One for the end-of-string character

13
Trie Example

14
Tries
  • Path for key of m characters is length m, with
    pointer at the end-of-string entry
  • Don't need to store key itself: it is the path
    followed.
  • Info field might be pointed to by the
    end-of-string element
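
A table-per-node trie along these lines can be sketched as below. The alphabet 'A'..'Z' (so k = 26), the function names, and the use of slot k for the end-of-string pointer are assumptions made for this example.

```python
K = 26    # assumed alphabet: 'A'..'Z'
END = K   # slot k of each node holds the end-of-string pointer


def new_node():
    """A node is a table of k+1 pointers, all initially null."""
    return [None] * (K + 1)


def insert(root, key, info):
    node = root
    for ch in key:
        i = ord(ch) - ord("A")      # the character itself is the index
        if node[i] is None:
            node[i] = new_node()
        node = node[i]
    node[END] = info                # info hangs off the end-of-string slot


def lookup(root, key):
    """Return the info stored for key, or None if key is absent."""
    node = root
    for ch in key:
        node = node[ord(ch) - ord("A")]
        if node is None:
            return None
    return node[END]


root = new_node()
insert(root, "TURING", "info for TURING")
insert(root, "MENDEL", "info for MENDEL")
insert(root, "MENDELEEV", "info for MENDELEEV")
```

The key is never stored: `lookup` succeeds only by following one pointer per character and finding a non-null end-of-string entry.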

15
Tries Analysis
  • Let
  • n be the number of keys stored in a trie
  • l be the length (in characters) of the longest key
  • s be the number of nodes in the trie
  • k be the size of the alphabet
  • Pro
  • Access time is O(l), independent of k, n and s
  • Con
  • Size -- requires (k+1) · s · p bits, where p is the
    number of bits in a pointer
  • Most pointers are null, so lots of wasted space

16
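Strategies for reducing storage requirements of tries
  • Implement a k-ary trie with m nodes as a 2-D, m
    by k table

[Table figure: columns indexed by character (A, B, C, D, E, ..., M, ..., P, ..., T, ..., end-of-string); rows indexed by node number (0 through 5 shown)]
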
17
Table approach
  • Number the nodes in the diagram of slide 13 from
    1 to m
  • The table entry corresponding to jth child of ith
    node is the index of the child node
  • How does that save space? Just as many nodes and
    elements as on slide 13
  • Each table entry needs only ⌈lg m⌉ bits, which is
    smaller than a pointer
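
The table representation can be sketched as follows. Several details are assumptions for illustration: the '$' end-of-string marker, rows numbered from 0 with the root at row 0 (so that entry 0 can double as "no child", since nothing ever points back to the root), and an ordinary row allocated for the end-of-string child.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ$"   # '$' is an assumed end-of-string marker
COL = {ch: j for j, ch in enumerate(ALPHABET)}


def build_table(keys):
    """Build the 2-D table: table[i][j] is the row index of node i's j-th child."""
    table = [[0] * len(ALPHABET)]          # row 0 is the root; 0 means "no child"
    for key in keys:
        row = 0
        for ch in key + "$":
            j = COL[ch]
            if table[row][j] == 0:
                table.append([0] * len(ALPHABET))
                table[row][j] = len(table) - 1
            row = table[row][j]
    return table


def member(table, key):
    row = 0
    for ch in key + "$":
        row = table[row][COL[ch]]
        if row == 0:
            return False
    return True


def bits_per_entry(table):
    """Bits needed to store any row index 0..m-1, i.e. ceil(lg m)."""
    return (len(table) - 1).bit_length()


table = build_table(["TURING", "MENDEL", "MENDELEEV"])
```

With these three keys the table has 19 rows, so each entry fits in 5 bits instead of a full pointer.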

18
Patricia Tree: Another strategy for reducing
space in a trie
  • Patricia tree
  • Practical Algorithm to Retrieve Information Coded
    in Alphanumeric
  • Eliminate nodes with only one nonempty child
  • Can now skip right from T to the end-of-string
    character in TURING in our example
  • Skip from MA . to E or the end-of-string character
    in the MENDEL, MENDELEEV chain
  • But need to store with each node the index of the
    character on which it discriminates
  • And need to store the key itself at the leaf

19
Patricia tree

20
de la Briandais trees
  • Another strategy to save space vs. standard tries
  • Use a linked list instead of a table at the node
    level
  • Each pointer labeled with the character it
    indexes
  • Longer search time than tries; depends on size of
    character set
  • saves significant amounts of memory
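
A linked-list node level can be sketched with a first-child / next-sibling layout, as below. The `Node` fields, the '$' end-of-string marker, and inserting new siblings at the front of the list are choices made for this example.

```python
END = "$"  # assumed end-of-string marker


class Node:
    def __init__(self, ch):
        self.ch = ch          # the character this link is labeled with
        self.child = None     # head of the list for the next character
        self.sibling = None   # next alternative for the same position


def find(parent, ch):
    """Scan the child list of parent for the node labeled ch."""
    node = parent.child
    while node is not None and node.ch != ch:
        node = node.sibling
    return node


def insert(root, key):
    parent = root
    for ch in key + END:
        node = find(parent, ch)
        if node is None:
            node = Node(ch)
            node.sibling = parent.child   # link in at the front of the list
            parent.child = node
        parent = node


def member(root, key):
    parent = root
    for ch in key + END:
        parent = find(parent, ch)
        if parent is None:
            return False
    return True


root = Node(None)
for key in ("TURING", "MENDEL", "MENDELEEV"):
    insert(root, key)
```

Only the characters actually in use get nodes, which is the memory saving; the list scan in `find` is where the dependence on the character set size enters the search time.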

21
de la Briandais

22
Another strategy
  • Use tries at the first few levels
  • Use ordinary BSTs or de la Briandais at the lower
    levels
  • reasoning
  • speed advantage at the top, but not too much
    extra memory required
  • save space at lower levels

23
Digital Search Trees
  • Treat keys as bit strings
  • (strings over the alphabet {0, 1})
  • Binary tree; search directed left on 0, right on
    1
  • Each node contains not only two pointers, but
    also contains a key that matches that string
    prefix
  • Compare for equality before searching left or
    right
  • If frequencies are known, store higher frequency
    keys nearer root
  • Can be grown dynamically
  • Expected Search time O(log n)

24
Digital Search Tree