Architecture-Conscious Hashing

1
Architecture-Conscious Hashing
  • Marcin Zukowski
  • TTT, CWI, 2006.06.01

2
Overview
  • Introduction
  • Hashing application
  • Basic implementation
  • Hashing in MonetDB/X100
  • CPU optimizations
  • Cache optimizations
  • Conclusions and discussion

3
Dictionary task
  • Two basic functions
  • insert(key, tuple) -> void
  • lookup(key) -> tuple
  • Example application: duplicate removal
  • for each tuple
  • key = make_key(tuple)
  • if not lookup(key)
  • insert(key, tuple)

4
Dictionary in DBMS
  • Used almost everywhere
  • Joins
  • Aggregation and duplicate removal
  • Set operations (intersection etc.)
  • Indices
  • Data compression

5
Dictionary implementations
  • Arrays
  • Ordered: O(log N) lookup, O(N) insert
  • Unordered: O(N) lookup, O(1) insert
  • Trees
  • O(log N) lookup and insert
  • Hash-tables
  • O(1) lookup and insert

6
Hash-tables
  • Hash function: key -> hash
  • Table offset: hash mod table size
  • Sparse buckets and dense values
  • Collisions: e.g. bucket chaining

7
Bucket-chained lookup
  • offset = HASH(key) % size
  • group_id = buckets[offset]
  • while (group_id && keys[group_id] != key)
  • group_id = next[group_id]
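
The lookup loop above can be placed in a small, complete table to show how the sparse buckets and the dense keys/next arrays fit together. This is only a sketch under stated assumptions: the table sizes, the multiplicative hash_u32 and the names chained_table, chained_lookup and chained_insert are hypothetical and do not come from the slides. As in the slides, group_id 0 is reserved to mean "empty" or "miss".

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 8   /* number of buckets (hypothetical, tiny for illustration) */
#define MAX_GROUPS 16  /* capacity of the dense arrays; slot 0 is never used */

typedef struct {
    uint32_t buckets[TABLE_SIZE]; /* sparse: first group_id of each chain, 0 = empty */
    uint32_t keys[MAX_GROUPS];    /* dense: key stored for each group_id */
    uint32_t next[MAX_GROUPS];    /* dense: next group_id in the chain, 0 = end */
    uint32_t count;               /* number of group_ids handed out so far */
} chained_table;

/* Placeholder hash (an assumption of this sketch); any integer hash would do. */
static uint32_t hash_u32(uint32_t key) { return key * 2654435761u; }

/* Returns the group_id of key, or 0 on a miss. */
static uint32_t chained_lookup(const chained_table *t, uint32_t key)
{
    uint32_t offset = hash_u32(key) % TABLE_SIZE;
    uint32_t group_id = t->buckets[offset];
    /* Data-dependent loop: each step needs the result of the previous load. */
    while (group_id && t->keys[group_id] != key)
        group_id = t->next[group_id];
    return group_id;
}

/* Inserts key if it is absent and returns its group_id. */
static uint32_t chained_insert(chained_table *t, uint32_t key)
{
    uint32_t group_id = chained_lookup(t, key);
    if (group_id)
        return group_id;
    assert(t->count + 1 < MAX_GROUPS); /* the sketch does not grow the table */
    uint32_t offset = hash_u32(key) % TABLE_SIZE;
    group_id = ++t->count;
    t->keys[group_id] = key;
    t->next[group_id] = t->buckets[offset]; /* push on the front of the chain */
    t->buckets[offset] = group_id;
    return group_id;
}
```

Starting from a zero-initialized chained_table, duplicate removal amounts to calling chained_insert for every key: the number of distinct keys is the final count.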

8
Hashing CPU problems
  • Pipelined execution
  • Control dependencies: if-then-else, loops
  • Data dependencies
  • Bucket-chained hashing
  • while (group_id && keys[group_id] != key)
  • group_id = next[group_id]
  • Nested loop with complex condition

9
Hashing memory problems
  • Memory hierarchy
  • Fast but small (16KB-1MB) cache memories
  • Large but slow (hundreds of cycles) main memory
  • Hashing: random memory accesses

10
Hashing - summary
  • What is hashing?
  • Basic dictionary functionality
  • Highly useful
  • Good complexity
  • But!
  • Performance problems on modern CPUs
  • How to use it in MonetDB/X100?

11
Hashing in MonetDB/X100
  • MonetDB/X100 architecture
  • Hashing
  • Vectorized Cuckoo Hashing
  • Best-Effort Partitioning

12
X100 Architecture
13
X100 execution layer
14
X100 vectors
  • Data stored in columns
  • Vector: a vertical slice of a column
  • Internally a simple array
  • Vector sizes tuned so that all vectors fit in the CPU
    cache

15
X100 primitives
  • Process entire vectors at a time
  • Reduced interpretation overhead
  • Simple, type-specific, branch-free code
  • Highly efficient
  • void map_add_int_col_int_col(int n,
  • int *res, int *col1, int *col2) {
  • for (int i = 0; i < n; i++)
  • res[i] = col1[i] + col2[i];
  • }

16
X100 hashing challenges
  • CPU-friendly hashing
  • Vectorized hashing functions
  • Vectorized hash-table lookup
  • Compound key handling
  • Cache-friendly hashing

17
Vectorized hashing function
  • Simple, speed optimized, for example
  • for (i = 0; i < n; i++)
  • h = input[i]
  • h = (h) ... (h << 15)
  • h = h ... (h >> 11)
  • h = h ... (h << 3)
  • output[i] = h
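
The transcript of the loop above does not preserve the operators that combine the shifted terms, so the following is a sketch rather than the slide's exact function. It keeps the slide's three shifts (left 15, right 11, left 3) and fills the gaps with an assumed add/xor mix. The primitive name map_hash_u32_col and its signature are hypothetical, modelled on the map_add_int_col_int_col primitive shown earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hashes a whole vector of 32-bit keys in one tight, branch-free loop.
 * The combining operators (~, +, ^) are an assumption of this sketch;
 * only the shift amounts are taken from the slide. */
static void map_hash_u32_col(size_t n, uint32_t *output, const uint32_t *input)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t h = input[i];
        h = ~h + (h << 15);
        h = h ^ (h >> 11);
        h = h + (h << 3);
        output[i] = h;
    }
}
```

Each of the three steps is invertible on 32-bit values, so distinct inputs always yield distinct hash values before the modulo by the table size is applied. The loop body has no branches and no dependencies between iterations, which is the property the vectorized primitives rely on.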

18
Vectorized hash lookup
  • Bucket-chained hashing
  • Conflict list traversal
  • Inherently impossible to vectorize
  • Idea
  • Avoid the conflict list
  • Cuckoo

19
Cuckoo Hashing
  • Two hash functions, two offsets
  • Insert a value at one of its two offsets
  • If it is taken, move the old value to its other
    offset. Repeat.
  • During lookup, check both offsets
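
The insertion rule above can be sketched as a short displacement loop. Everything concrete here is an assumption made for illustration: keys are stored directly in the slots with 0 meaning "empty", the two multiplicative hashes are placeholders, and the kick limit CUCKOO_MAX_KICKS is a hypothetical way to stop an endless chain of displacements. None of these details are given in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CUCKOO_SIZE 16      /* number of slots (hypothetical, tiny for illustration) */
#define CUCKOO_MAX_KICKS 32 /* displacement limit before giving up (assumed) */

typedef struct {
    uint32_t slots[CUCKOO_SIZE]; /* stored keys; 0 marks an empty slot */
} cuckoo_table;

/* Two placeholder hash functions mapped to table offsets. */
static uint32_t cuckoo_offset1(uint32_t key) { return ((key * 2654435761u) >> 16) % CUCKOO_SIZE; }
static uint32_t cuckoo_offset2(uint32_t key) { return ((key * 2246822519u) >> 16) % CUCKOO_SIZE; }

/* During lookup, check both offsets. Keys must be non-zero. */
static bool cuckoo_contains(const cuckoo_table *t, uint32_t key)
{
    assert(key != 0);
    return t->slots[cuckoo_offset1(key)] == key
        || t->slots[cuckoo_offset2(key)] == key;
}

/* Returns true once every key has a slot. Returns false if the kick limit
 * is reached; the key held at that point is no longer in the table, so a
 * caller would have to rebuild the table with different hash functions. */
static bool cuckoo_insert(cuckoo_table *t, uint32_t key)
{
    if (cuckoo_contains(t, key))
        return true;
    uint32_t pos = cuckoo_offset1(key);
    for (int kick = 0; kick < CUCKOO_MAX_KICKS; kick++) {
        uint32_t evicted = t->slots[pos];
        t->slots[pos] = key;
        if (evicted == 0)
            return true;
        /* The old value moves to whichever of its two offsets it was not in. */
        uint32_t first = cuckoo_offset1(evicted);
        pos = (pos == first) ? cuckoo_offset2(evicted) : first;
        key = evicted;
    }
    return false;
}
```

Because every stored key always sits at one of its own two offsets, a lookup never needs more than two probes, whatever happened during insertion.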

20
Cuckoo lookup (naïve)
  • offset1 = HASH1(key) % size
  • offset2 = HASH2(key) % size
  • index1 = buckets[offset1]
  • index2 = buckets[offset2]
  • if (index1 && keys[index1] == key)
  • group_id = index1
  • else if (index2 && keys[index2] == key)
  • group_id = index2
  • else
  • group_id = 0 // miss

21
Branch-free Cuckoo lookup
  • offset1 = HASH1(key) % size
  • offset2 = HASH2(key) % size
  • index1 = buckets[offset1]
  • index2 = buckets[offset2]
  • mask1 = -(keys[index1] == key)
  • mask2 = -(keys[index2] == key)
  • group_id =
  • (mask1 & index1) | (mask2 & index2)
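
Wrapped in a loop over a vector of probe keys, the branch-free lookup could look roughly like the sketch below. The primitive name map_cuckoo_lookup_u32_col, its parameter order and the two placeholder hashes lookup_hash1 and lookup_hash2 are assumptions, not the X100 code. Index 0 is reserved for "empty", so keys[0] is a dummy entry and a group_id of 0 signals a miss.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder hash functions (assumptions of this sketch). */
static uint32_t lookup_hash1(uint32_t key) { return key * 2654435761u; }
static uint32_t lookup_hash2(uint32_t key) { return key * 2246822519u + 1u; }

/* For each probe key, writes the matching index into keys[], or 0 on a miss.
 * buckets[] has `size` entries mapping a table offset to an index into keys[]. */
static void map_cuckoo_lookup_u32_col(size_t n, uint32_t *group_ids,
                                      const uint32_t *probe,
                                      const uint32_t *buckets, uint32_t size,
                                      const uint32_t *keys)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t key = probe[i];
        uint32_t index1 = buckets[lookup_hash1(key) % size];
        uint32_t index2 = buckets[lookup_hash2(key) % size];
        /* A comparison yields 0 or 1; negating it gives 0 or all ones. */
        uint32_t mask1 = -(uint32_t)(keys[index1] == key);
        uint32_t mask2 = -(uint32_t)(keys[index2] == key);
        group_ids[i] = (mask1 & index1) | (mask2 & index2);
    }
}
```

An empty bucket needs no special case: its index is 0, so even if the probe key happens to equal the dummy keys[0], the masked result is still 0. The price of removing the branches is that both offsets are always read, even when the first one already matches.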

22
Compound keys
  • Primitives type-optimized for a single column
  • keys[index] == key
  • Multi-column keys: how?
  • Pass an array of vectors and comparison functions:
    SLOW!
  • Exploit the hash values?

23
Compound keys (cont.)
  • Assume hash values are unique
  • Vectorized lookup using hash values
  • Vectorized check for conflicting values
  • Look up conflicts in a slow way (rare!)
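
One way the vectorized conflict check might look for a two-column key is sketched below. It assumes a previous hash-value lookup has already produced a candidate group_id per tuple (0 for a miss), and it collects the positions where the real key columns disagree with the candidate, so that only those tuples go through the slow path. The function name check_conflicts_2col, the two-column layout and the output format are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Compares both key columns of each probe tuple with the stored keys of its
 * candidate group. Positions of mismatching tuples are appended to conflicts[],
 * which must have room for n entries. Returns the number of conflicts.
 * keys_a[0] and keys_b[0] are dummy entries, since group_id 0 means "miss". */
static size_t check_conflicts_2col(size_t n, const uint32_t *group_ids,
                                   const uint32_t *probe_a, const uint32_t *probe_b,
                                   const uint32_t *keys_a, const uint32_t *keys_b,
                                   uint32_t *conflicts)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t g = group_ids[i];
        uint32_t differs = (keys_a[g] != probe_a[i]) | (keys_b[g] != probe_b[i]);
        /* Branch-free append: always write, advance only on a real conflict. */
        conflicts[m] = (uint32_t)i;
        m += differs & (g != 0);
    }
    return m;
}
```

If hash values are nearly unique, the returned count is almost always zero, and the per-column comparisons stay simple and type-specific instead of going through an array of comparison functions.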

24
CPU performance
25
CPU optimizations - summary
  • Fully vectorized hash-table processing
  • Hash functions
  • Lookup
  • Compound keys
  • What if the hash-table is big?
  • Cache-size exceeded
  • Main-memory access costs hundreds of CPU cycles

26
Cache-optimized hashing
  • Idea: partition a hash-table into smaller
    hash-tables and process them locally
  • Outline
  • Traditional hashing
  • Best-Effort Partitioning
  • Benchmarks

27
Traditional partitioning
  • Both I/O and memory optimization
  • Basic idea
  • Spread the data into multiple partitions looking
    at the hash value
  • Process each partition locally
  • Problem
  • All partitions need to be fully saved before
    processing.
  • What if there's no space?

28
Best-Effort Partitioning (BEP)
  • Idea: partition as much as possible, then process
  • Rationale: even with fewer tuples the hash-table
    locality can be exploited
  • Benefit: reduced memory requirements

29
BEP algorithm
30
BEP example
  • Problem
  • Find 256K unique values out of 100M 8-byte-wide
    tuples
  • Available memory: 70MB
  • Cache is 64KB, with 64-byte cache lines
  • BEP settings
  • Hash table: 4MB (512K * 4 + 256K * 8)
  • 128 partitions: 32KB sub-tables, 512 cache lines each
  • 66MB for partitioned data: 512KB per partition,
    64K tuples
  • When processing, 64K tuples are looked up in 512
    cache lines
  • Each cache line is touched 128 times: access cost
    amortized
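
The numbers in this example can be re-derived with a few lines of arithmetic. The sketch below reads the hash-table size as 512K four-byte bucket entries plus 256K eight-byte values, which is an interpretation of the slide rather than something it states. The count of touches per cache line further assumes that lookups spread evenly and that each lookup touches one line. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t table_bytes;          /* whole hash table */
    uint64_t sub_table_bytes;      /* share of the table owned by one partition */
    uint64_t lines_per_sub_table;  /* cache lines covered by one sub-table */
    uint64_t buffered_bytes;       /* partition buffers, all partitions together */
    uint64_t tuples_per_partition; /* tuples buffered per partition */
    uint64_t touches_per_line;     /* average lookups landing on one cache line */
} bep_sizes;

static bep_sizes bep_example_sizes(void)
{
    const uint64_t K = 1024;
    const uint64_t partitions = 128, cache_line_bytes = 64, tuple_bytes = 8;
    bep_sizes s;
    s.table_bytes = 512 * K * 4 + 256 * K * 8;                           /* 4MB */
    s.sub_table_bytes = s.table_bytes / partitions;                      /* 32KB */
    s.lines_per_sub_table = s.sub_table_bytes / cache_line_bytes;        /* 512 */
    s.buffered_bytes = partitions * 512 * K;                             /* 64MB */
    s.tuples_per_partition = 512 * K / tuple_bytes;                      /* 64K */
    s.touches_per_line = s.tuples_per_partition / s.lines_per_sub_table; /* 128 */
    return s;
}
```

Under these assumptions the 128 buffers of 512KB take 64MB, which fits in the 66MB left after the 4MB table, and each 32KB sub-table fits in the 64KB cache while its partition is processed.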

31
BEP performance
32
BEP memory size impact
33
BEP conclusions
  • Provides cache-resident processing
  • Reduces memory requirements over full
    partitioning
  • Possible to use on multiple storage levels (disk,
    memory)

34
Conclusions
  • Two major hashing optimizations
  • CPU-friendly vectorized processing
  • Cache-friendly Best-Effort Partitioning
  • Future work
  • Applying to various relational operators (hash
    join!)
  • Test in a bigger scenario (TPC-H)

35
Thank you!
  • Questions?