An%20Approach%20to%20Generalized%20Hashing - PowerPoint PPT Presentation

About This Presentation
Title:

An%20Approach%20to%20Generalized%20Hashing

Description:

Also many hash functions designed, including several universal families ... Multiple data items can be crammed into a word, so let's take advantage of that. ... – PowerPoint PPT presentation

Number of Views:13
Avg rating:3.0/5.0
Slides: 12
Provided by: michael598
Category:

less

Transcript and Presenter's Notes

Title: An%20Approach%20to%20Generalized%20Hashing


1
An Approach to Generalized Hashing
  • Michael Klipper
  • With
  • Dan Blandford
  • Guy Blelloch

2
Hashing techniques currently available
  • Many hashing algorithms out there
  • Separate chaining
  • Cuckoo hashing
  • FKS perfect hashing
  • Also many hash functions designed, including
    several universal families
  • Good O(1) expected amortized time for updates,
    and many have O(1) worst case time for searches
  • Bad Require fixed-length keys and fixed-length
    data

3
Whats so bad about fixed length?
  • Easy to waste a lot of space
  • Every hash bucket must be as large as the largest
    item to be stored in the table.
  • This is a large problem for sparsely-filled
    tables, or tables where large items occur
    infrequently.
  • Hash tables are often building blocks to more
    complicated structures, so optimizing them pays
    off in a lot of places.

4
Example A Graph Layout Where We Store Edges in a
Hash Table
  • Lets say u is a vertex of degree d and v1, vd
    are its neighbors. Lets say that v0 vd1 u
    by convention. Then the entry representing the
    edge (u, vi) has key (u, vi) and data (vi-1,
    vi1).

Hash Table
This extra entry starts the list.
u
u
u
u
v2
v1
u
v3
v1
u
v4
v2
v3
v2
v1
v4
Degree of Vertex
4
5
An Idea for Compression
  • Instead of ((u, vi), (vi-1, vi1)) in the table,
    we will store
    ((u, vi u), (vi-1 u, vi1 u)).
  • With this representation, we need O(kn) space
    where k S(u,v)ÎE log u v.
  • A good labeling of the vertices will make many of
    these differences small! But not all of them.
    The following paper has details

D. Blandford, G. E. Blelloch, and I. Kash.
Compact Representations of Separable Graphs. In
SODA, 2003, pages 342-351.
6
First, a simpler problem
  • Variable-length data stored in arrays
  • Its like a hash table except that the indices
    now are in the fixed range 0n-1 for n items in
    the array.

Well use the following data for our example in
these slides (0, 10110) (1, 0110) (2, 11111) (3,
0101) (4, 1100) (5, 010) (6, 11011) (7,
00001111) Well assume that the word size of the
machine is 2 bytes.
7
Key Idea BLOCKS
  • Multiple data items can be crammed into a word,
    so lets take advantage of that.
  • Two words in a block one with data, one marking
    off separations of strings
  • If the first index in a block is i, well label
    the block as bi

0 1 1 0
b0
1 0 1 1 0
1 1 1 1 1
1st word
2nd word
1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0
This is the block containing strings s0 through
s2 from our example.
8
Organization of Blocks
  • Index structure (regular array) Ai 1 if and
    only if string i starts a block
  • Hash table (one of the regular kind) if string
    i starts a block, H(i) address of bi
  • Note that it is easy to split and merge blocks.

Key size invariant two adjacent blocks (like b0
and b3 in the example) must have their sizes sum
to greater than the word size of the machine
H(0)
b0
1
0
0
H(3)
b3
1
A
0
0
0
H(7)
b7
1
9
A Rough Look at Space and Time Bounds for this
Array Structure
  • Lets say we have n items, and w is the word size
    in bits of the machine. WLOG all data strings
    are nonempty.
  • Let m Si si.

At most w strings
1
0
lt w apart
0
On avg block is ½ full
1
0
Lookup tables cut the time down to constant time
for finding a block and the string inside it,
since they can operate on entire words at
once. Indexing structures hash table use O(w)
bits per block. Each block is on average half
full due to the invariant. O(m w) bits used
and operations are O(1) time!
0
0
O(m/w 1) blocks!
1
10
Briefly, how we proceed from there
  • We can finally implement our generalized hash
    table using an array of the type we just
    described as the hash table.
  • There are more details the following paper
    explains this.

D. Blandford and G. E. Blelloch. Storing
Variable-Length Keys in Arrays, Sets, and
Dictionaries, with Applications. In Symposium on
Discrete Algorithms (SODA), 2005 (hopefully)
11
Great, but if theres a paper written on the
subject already, then what do I do?
  • A lot of this code isnt yet written. We havent
    yet checked to see that the code we have fulfills
    the theoretical bounds, since we have to make
    sure that any cutting corners done for the
    programming is theoretically safe.
  • My job is to get a lot of this running and look
    for optimizations.
  • Also, once this is running, well want to run
    experiments to see how well it runs, especially
    in modeling graphs.
Write a Comment
User Comments (0)
About PowerShow.com