1
Architecture-Conscious Hashing

Marcin Zukowski
TTT, CWI, 2006.06.01

2
Overview

Introduction
Hashing application
Basic implementation
Hashing in MonetDB/X100
CPU optimizations
Cache optimizations
Conclusions and discussion

3
Dictionary task

Two basic functions
  insert(key, tuple) -> void
  lookup(key) -> tuple
Example application: duplicate removal
  for each tuple
    key = make_key(tuple)
    if not lookup(key)
      insert(key, tuple)

4
Dictionary in DBMS

Used almost everywhere
Joins
Aggregation and duplicate removal
Set operations (intersection etc.)
Indices
Data compression

5
Dictionary implementations

Arrays
  Ordered: O(log N) lookup, O(N) insert
  Unordered: O(N) lookup, O(1) insert
Trees
  O(log N) lookup and insert
Hash-tables
  O(1) lookup and insert

6
Hash-tables

Hash function: key -> hash
Table offset: hash mod table size
Sparse buckets and dense values
Collisions: e.g. bucket chaining

7
Bucket-chained lookup

offset = HASH(key) % size
group_id = buckets[offset]
while (group_id && keys[group_id] != key)
  group_id = next[group_id]
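
The lookup loop above only reads the table. As a minimal sketch of how the same layout could be built and probed, assuming 64-bit integer keys, group_id 0 reserved as the "empty / end of chain" marker, and a hypothetical multiplicative hash (none of these names come from the slides):

```c
#include <stdint.h>
#include <stdlib.h>

/* Sparse buckets[] hold the first group_id of each chain; dense keys[] and
 * next[] are indexed by group_id. group_id 0 means "empty" / "end of chain",
 * so slot 0 of keys[] and next[] is never used for data. */
typedef struct {
    uint32_t  size;      /* number of buckets */
    uint32_t  capacity;  /* slots in keys[]/next[], including reserved slot 0 */
    uint32_t  count;     /* groups stored so far */
    uint32_t *buckets;
    uint32_t *next;
    int64_t  *keys;
} ChainedTable;

/* Hypothetical hash function (multiplicative); any reasonable hash works. */
uint32_t chained_hash(int64_t key)
{
    return (uint32_t)(((uint64_t)key * 0x9E3779B97F4A7C15ULL) >> 32);
}

/* Returns 1 on success, 0 if an allocation failed. */
int chained_init(ChainedTable *t, uint32_t size, uint32_t max_groups)
{
    t->size = size;
    t->capacity = max_groups + 1;
    t->count = 0;
    t->buckets = calloc(size, sizeof *t->buckets);
    t->next = calloc(t->capacity, sizeof *t->next);
    t->keys = calloc(t->capacity, sizeof *t->keys);
    return t->buckets && t->next && t->keys;
}

/* Walks the chain of the key's bucket; returns its group_id, or 0 on a miss. */
uint32_t chained_lookup(const ChainedTable *t, int64_t key)
{
    uint32_t offset = chained_hash(key) % t->size;
    uint32_t group_id = t->buckets[offset];
    while (group_id && t->keys[group_id] != key)
        group_id = t->next[group_id];
    return group_id;
}

/* Returns the group_id of key, inserting it at the head of its chain if it
 * is new. Returns 0 only when the table is full. */
uint32_t chained_insert(ChainedTable *t, int64_t key)
{
    uint32_t group_id = chained_lookup(t, key);
    if (group_id)
        return group_id;
    if (t->count + 1 >= t->capacity)
        return 0;
    uint32_t offset = chained_hash(key) % t->size;
    group_id = ++t->count;
    t->keys[group_id] = key;
    t->next[group_id] = t->buckets[offset];
    t->buckets[offset] = group_id;
    return group_id;
}
```

Calling chained_insert for every tuple gives duplicate removal directly: a tuple is new exactly when the returned group_id is larger than any seen before.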

8
Hashing CPU problems

Pipelined execution
  Control dependencies: if-then-else, loops
  Data dependencies
Bucket-chained hashing
  while (group_id && keys[group_id] != key)
    group_id = next[group_id]
  Nested loop with complex condition

9
Hashing memory problems

Memory hierarchy
  Fast but small (16KB-1MB) cache memories
  Large but slow (100s of cycles) main memory
Hashing
  Random memory accesses

10
Hashing - summary

What is hashing?
  Basic dictionary functionality
  Highly useful
  Good complexity
But!
  Performance problems on modern CPUs
How to use it in MonetDB/X100?

11
Hashing in MonetDB/X100

MonetDB/X100 architecture
Hashing
  Vectorized Cuckoo Hashing
  Best-Effort Partitioning

12
X100 Architecture

13
X100 execution layer

14
X100 vectors

Data stored in columns
Vector: a vertical slice of a column
Internally a simple array
Vector sizes tuned such that all fit in the CPU cache

15
X100 primitives

Process entire vectors at a time
Reduced interpretation overhead
Simple, type-specific, branch-free code
Highly efficient
void map_add_int_col_int_col(int n,
    int *res, int *col1, int *col2)
{
  for (int i = 0; i < n; i++)
    res[i] = col1[i] + col2[i];
}

16
X100 hashing challenges

CPU-friendly hashing
  Vectorized hashing functions
  Vectorized hash-table lookup
  Compound key handling
Cache-friendly hashing

17
Vectorized hashing function

Simple, speed optimized, for example:
for (i = 0; i < n; i++) {
  h = input[i];
  h = (h) ... (h << 15);
  h = h ... (h >> 11);
  h = h ... (h << 3);
  output[i] = h;
}
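
The operators that combine the shifted values did not survive in this transcript. The following is a runnable sketch of a primitive with the same shape, where the `~`, `+` and `^` are assumptions borrowed from common shift-add-xor integer mixers rather than what the slide necessarily showed. The name map_hash_int_col is hypothetical too.

```c
#include <stdint.h>

/* Vectorized hash primitive: hashes n input values in one call, with no
 * branches inside the loop body.
 * ASSUMPTION: the ~, + and ^ operators below are guesses; the transcript
 * only preserves the shift amounts (15, 11, 3). */
void map_hash_int_col(int n, uint32_t *output, const uint32_t *input)
{
    for (int i = 0; i < n; i++) {
        uint32_t h = input[i];
        h = (~h) + (h << 15);
        h = h ^ (h >> 11);
        h = h + (h << 3);
        output[i] = h;
    }
}
```

With these particular operators every step is invertible modulo 2^32, so distinct 32-bit inputs always produce distinct hash values.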

18
Vectorized hash lookup

Bucket-chained hashing
  Conflict list traversal
  Inherently impossible to vectorize
Idea
  Avoid the conflict list
  Cuckoo

19
Cuckoo Hashing

Two hash functions, two offsets
Insert the value in one of the offsets
If it is taken, move the old value to its second offset. Repeat.
During lookup, check both offsets
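
The insert procedure described above can be sketched as a bounded "kick-out" loop. This is a minimal illustration, not the X100 implementation: the table size, the kick limit, and the two toy hash functions (chosen only so the behaviour is easy to trace by hand) are all assumptions.

```c
#include <stdint.h>

#define CUCKOO_SIZE      8   /* number of buckets (hypothetical) */
#define CUCKOO_MAX_KICKS 32  /* give up after this many moves (hypothetical) */

typedef struct {
    uint32_t buckets[CUCKOO_SIZE];   /* 0 = empty, else index into keys[] */
    int64_t  keys[CUCKOO_SIZE + 1];  /* slot 0 is reserved */
    uint32_t count;
} CuckooTable;

/* Toy hash functions for illustration only; real ones should mix the bits. */
uint32_t toy_hash1(int64_t key) { return (uint32_t)key; }
uint32_t toy_hash2(int64_t key) { return (uint32_t)((uint64_t)key >> 3); }

/* Returns the index assigned to key, or 0 if the kick-out chain got too
 * long; in that case one entry is left unplaced and a real table would
 * have to be rebuilt with new hash functions or a larger size. */
uint32_t cuckoo_insert(CuckooTable *t, int64_t key)
{
    if (t->count >= CUCKOO_SIZE)
        return 0;
    uint32_t new_index = ++t->count;
    t->keys[new_index] = key;

    uint32_t moving = new_index;
    uint32_t offset = toy_hash1(key) % CUCKOO_SIZE;
    for (int kick = 0; kick < CUCKOO_MAX_KICKS; kick++) {
        uint32_t evicted = t->buckets[offset];
        t->buckets[offset] = moving;
        if (evicted == 0)
            return new_index;
        /* The old occupant moves to its other offset. */
        int64_t evicted_key = t->keys[evicted];
        uint32_t o1 = toy_hash1(evicted_key) % CUCKOO_SIZE;
        uint32_t o2 = toy_hash2(evicted_key) % CUCKOO_SIZE;
        offset = (offset == o1) ? o2 : o1;
        moving = evicted;
    }
    return 0;
}

/* Checks both offsets; returns the index of key, or 0 on a miss. */
uint32_t cuckoo_lookup(const CuckooTable *t, int64_t key)
{
    uint32_t index1 = t->buckets[toy_hash1(key) % CUCKOO_SIZE];
    uint32_t index2 = t->buckets[toy_hash2(key) % CUCKOO_SIZE];
    if (index1 && t->keys[index1] == key)
        return index1;
    if (index2 && t->keys[index2] == key)
        return index2;
    return 0;
}
```

Inserting keys 1, 9 and 17, which all share the first offset, shows the chain of moves: 9 pushes 1 to its second offset, and 17 ends up in its own second offset after 9 bounces back into the shared one.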

20
Cuckoo lookup (naïve)

offset1 = HASH1(key) % size
offset2 = HASH2(key) % size
index1 = buckets[offset1]
index2 = buckets[offset2]
if (index1 && keys[index1] == key)
  group_id = index1
else if (index2 && keys[index2] == key)
  group_id = index2
else
  group_id = 0 // miss

21
Branch-free Cuckoo lookup

offset1 = HASH1(key) % size
offset2 = HASH2(key) % size
index1 = buckets[offset1]
index2 = buckets[offset2]
mask1 = -(keys[index1] == key)
mask2 = -(keys[index2] == key)
group_id = (mask1 & index1) | (mask2 & index2)
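
Because the lookup has no branches, it can be wrapped in a loop over a whole vector of probe keys. A sketch under stated assumptions: index 0 is the reserved "miss" value, the masks are combined with bitwise AND/OR, and the two toy hash functions are placeholders, not the ones used in X100.

```c
#include <stdint.h>

/* Toy hash functions for illustration only. */
uint32_t toy_hash1(int64_t key) { return (uint32_t)key; }
uint32_t toy_hash2(int64_t key) { return (uint32_t)((uint64_t)key >> 3); }

/* Looks up n probe keys in one call; group_ids[i] is 0 on a miss.
 * An empty bucket holds index 0, so keys[0] is read but can never produce
 * a hit: its mask is ANDed with index 0. */
void cuckoo_lookup_vec(int n, uint32_t *group_ids, const int64_t *probe,
                       const uint32_t *buckets, const int64_t *keys,
                       uint32_t size)
{
    for (int i = 0; i < n; i++) {
        int64_t key = probe[i];
        uint32_t index1 = buckets[toy_hash1(key) % size];
        uint32_t index2 = buckets[toy_hash2(key) % size];
        /* A comparison yields 0 or 1; negating gives 0 or all one bits. */
        uint32_t mask1 = -(uint32_t)(keys[index1] == key);
        uint32_t mask2 = -(uint32_t)(keys[index2] == key);
        group_ids[i] = (mask1 & index1) | (mask2 & index2);
    }
}
```

Using OR rather than addition keeps the result correct when both offsets of a key coincide and therefore both masks select the same index.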

22
Compound keys

Primitives type-optimized for a single column
  keys[index] == key
Multi-column keys: how?
  Pass an array of vectors and comparison functions
    SLOW!
  Exploit the hash values?

23
Compound keys (cont.)

Assume hash values unique
Vectorized lookup using hash values
Vectorized check for conflicting values
Lookup conflicts in a slow way (rare!)
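
The middle step, the vectorized check for conflicting values, might look like the sketch below. It assumes a two-column integer key, candidate group_ids already produced by a lookup on the hash values (0 = miss), and the stored key columns laid out as plain arrays indexed by group_id. All names are hypothetical.

```c
#include <stdint.h>

/* For every probe tuple whose hash value matched a stored group, compare
 * the real key columns. Positions where the hash matched but the keys
 * differ are written to conflicts[] (room for n entries is required);
 * the number of conflicts is returned. The append is branch-free: the slot
 * is always written and the cursor advances only on a mismatch. */
int check_conflicts(int n, const uint32_t *group_ids,
                    const int32_t *probe_a, const int32_t *probe_b,
                    const int32_t *stored_a, const int32_t *stored_b,
                    int *conflicts)
{
    int k = 0;
    for (int i = 0; i < n; i++) {
        uint32_t g = group_ids[i];
        int mismatch = (g != 0) &
                       ((stored_a[g] != probe_a[i]) |
                        (stored_b[g] != probe_b[i]));
        conflicts[k] = i;
        k += mismatch;
    }
    return k;
}
```

Only the positions listed in conflicts[] need the slow, per-tuple lookup; if hash values are nearly unique, that list stays short.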

24
CPU performance

25
CPU optimizations - summary

Fully vectorized hash-table processing
  Hash functions
  Lookup
  Compound keys
What if the hash-table is big?
  Cache-size exceeded
  Main-memory access cost: hundreds of CPU cycles

26
Cache-optimized hashing

Idea: partition a hash-table into smaller hash-tables and process them locally
Outline
  Traditional hashing
  Best-Effort Partitioning
  Benchmarks

27
Traditional partitioning

Both I/O and memory optimization
Basic idea
  Spread the data into multiple partitions looking at the hash value
  Process each partition locally
Problem
  All partitions need to be fully saved before processing.
  What if there's no space?
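
A minimal sketch of traditional partitioning as a histogram pass followed by a scatter pass, assuming the hash values are already computed and the partition is chosen as hash modulo the partition count (the function name and layout are hypothetical). Note that out[] must hold all n tuples, which is exactly the space problem raised above.

```c
#include <stdint.h>
#include <stddef.h>

/* Copies tuples into out[] grouped by partition. On return, partition p
 * occupies out[starts[p]] .. out[starts[p + 1] - 1].
 * starts[] needs num_partitions + 1 entries; out[] needs n entries. */
void partition_by_hash(size_t n, const uint32_t *hashes, const int64_t *tuples,
                       uint32_t num_partitions, size_t *starts, int64_t *out)
{
    /* Pass 1: count the tuples of partition p in starts[p + 1]. */
    for (uint32_t p = 0; p <= num_partitions; p++)
        starts[p] = 0;
    for (size_t i = 0; i < n; i++)
        starts[hashes[i] % num_partitions + 1]++;

    /* Prefix sum: starts[p] becomes the first output slot of partition p. */
    for (uint32_t p = 0; p < num_partitions; p++)
        starts[p + 1] += starts[p];

    /* Pass 2: scatter, using starts[p] as the write cursor of partition p. */
    for (size_t i = 0; i < n; i++)
        out[starts[hashes[i] % num_partitions]++] = tuples[i];

    /* Each cursor now sits at the end of its partition; shift them back. */
    for (uint32_t p = num_partitions; p > 0; p--)
        starts[p] = starts[p - 1];
    starts[0] = 0;
}
```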

28
Best-Effort Partitioning (BEP)

Idea: Partition as much as possible, then process
Rationale: Even with fewer tuples the hash-table locality can be exploited
Benefit: Reduced memory requirements
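
One way to read "partition as much as possible, then process" is sketched below: tuples are buffered per partition until a fixed memory budget is used up, then one partition is handed to a processing callback to free space. Choosing the fullest partition is an assumption made for this sketch, as are the tiny constants and every name in the code.

```c
#include <stdint.h>
#include <stddef.h>

#define BEP_PARTITIONS 4   /* hypothetical, tiny values for illustration */
#define BEP_BUDGET     8   /* max tuples buffered across all partitions */

/* Callback that processes one partition's buffered tuples, e.g. by probing
 * the cache-sized sub-table that belongs to that partition. */
typedef void (*bep_process_fn)(int partition, const int64_t *tuples,
                               size_t n, void *state);

typedef struct {
    /* Each buffer is sized for the whole budget to keep the sketch short;
     * a real implementation would share one memory pool. */
    int64_t buf[BEP_PARTITIONS][BEP_BUDGET];
    size_t  fill[BEP_PARTITIONS];
    size_t  buffered;
} Bep;

void bep_flush(Bep *b, int p, bep_process_fn process, void *state)
{
    process(p, b->buf[p], b->fill[p], state);
    b->buffered -= b->fill[p];
    b->fill[p] = 0;
}

/* Buffers one tuple; if the budget is exhausted, first processes the
 * fullest partition (assumed policy) to make room. */
void bep_add(Bep *b, uint32_t hash, int64_t tuple,
             bep_process_fn process, void *state)
{
    if (b->buffered == BEP_BUDGET) {
        int fullest = 0;
        for (int p = 1; p < BEP_PARTITIONS; p++)
            if (b->fill[p] > b->fill[fullest])
                fullest = p;
        bep_flush(b, fullest, process, state);
    }
    int p = (int)(hash % BEP_PARTITIONS);
    b->buf[p][b->fill[p]++] = tuple;
    b->buffered++;
}

/* Processes whatever is still buffered once the input is exhausted. */
void bep_finish(Bep *b, bep_process_fn process, void *state)
{
    for (int p = 0; p < BEP_PARTITIONS; p++)
        if (b->fill[p] > 0)
            bep_flush(b, p, process, state);
}

/* Example callback: only counts batches and tuples. */
typedef struct { size_t calls; size_t tuples; } BepStats;

void bep_count(int partition, const int64_t *tuples, size_t n, void *state)
{
    (void)partition;
    (void)tuples;
    BepStats *stats = state;
    stats->calls++;
    stats->tuples += n;
}
```

Each callback invocation sees a batch of tuples that all belong to one sub-table, so even a partially filled partition gets some of the locality benefit without the full data set ever being materialized.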

29
BEP algorithm

30
BEP example

Problem
  Find 256K unique values out of 100M 8-byte wide tuples
  Available memory: 70MB
  Cache is 64KB, 64-byte cache lines
BEP settings
  Hash table: 4MB (512K*4 + 256K*8)
  128 partitions: 32KB sub-tables, 512 cache lines
  66MB for partitioned data: 512KB per partition, 64K tuples
  When processing, 64K tuples are looked up in 512 cache lines
  Each cache line touched 128 times: access cost amortized
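
The numbers in this example follow from a few divisions, written out below as compile-time constants. Reading "512K*4 + 256K*8" as 512K four-byte bucket entries plus 256K eight-byte values is an assumption, and the constant names are made up for the sketch.

```c
enum {
    KB = 1024,
    MB = 1024 * KB,

    CACHE_LINE_BYTES = 64,
    TUPLE_BYTES      = 8,
    NUM_PARTITIONS   = 128,

    /* Assumed reading: 512K 4-byte bucket entries + 256K 8-byte values. */
    HASH_TABLE_BYTES    = 512 * KB * 4 + 256 * KB * 8,          /* 4MB  */
    SUB_TABLE_BYTES     = HASH_TABLE_BYTES / NUM_PARTITIONS,    /* 32KB */
    LINES_PER_SUB_TABLE = SUB_TABLE_BYTES / CACHE_LINE_BYTES,   /* 512  */

    PARTITION_BUFFER_BYTES = 512 * KB,
    TUPLES_PER_PARTITION   = PARTITION_BUFFER_BYTES / TUPLE_BYTES,       /* 64K */
    /* Average, assuming lookups spread evenly over the sub-table. */
    TOUCHES_PER_LINE       = TUPLES_PER_PARTITION / LINES_PER_SUB_TABLE, /* 128 */
    ALL_BUFFERS_BYTES      = NUM_PARTITIONS * PARTITION_BUFFER_BYTES     /* 64MB */
};
```

The 128 buffers of 512KB add up to 64MB, which fits in the 66MB left after the 4MB hash table is taken out of the 70MB available, and each 32KB sub-table fits in the 64KB cache.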

31
BEP performance

32
BEP memory size impact

33
BEP conclusions

Provides cache-resident processing
Reduces memory requirements over full partitioning
Possible to use on multiple storage levels (disk, memory)

34
Conclusions

Two major hashing optimizations
  CPU-friendly vectorized processing
  Cache-friendly Best-Effort Partitioning
Future work
  Applying to various relational operators (hash join!)
  Test in a bigger scenario (TPC-H)

35
Thank you!

Questions?
