Exploiting Multithreaded Architectures to Improve the Hash Join Operation - PowerPoint PPT Presentation


1
Exploiting Multithreaded Architectures to Improve the Hash Join Operation

Layali Rashid, Wessam M. Hassanein, and Moustafa A. Hammad
The Advanced Computer Architecture Group @ U of C (ACAG)
Department of Electrical and Computer Engineering
Department of Computer Science
University of Calgary

2
Outline

The SMT and the CMP Architectures
The Hash Join Database Operation
Motivation
Architecture-Aware Hash Join
Experimental Methodology
Timing and Memory Analysis
Conclusions

3
The SMT and the CMP Architectures

Simultaneous Multithreading (SMT): multiple threads run simultaneously on a single processor, sharing its execution resources and caches.
Chip Multiprocessor (CMP): more than one processor (core) is integrated on a single chip.

4
The Hash Join Database Operation

The hash join process
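The build and probe steps of a basic in-memory hash join can be sketched in C, the language the algorithms were implemented in. This is an illustrative sketch, not the authors' code. The tuple layout (`tuple_t` with a 32-bit key and payload), the chained table `hash_table_t`, the mixing function `hash32`, and the names `ht_build`, `ht_probe`, and `ht_free` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuple layout: a 32-bit join key plus a 32-bit payload. */
typedef struct { uint32_t key; uint32_t payload; } tuple_t;

/* Chained hash table that stores indices into the build relation R. */
typedef struct {
    size_t nbuckets;
    int *head;   /* head[b]: first R index in bucket b, or -1 if empty */
    int *next;   /* next[i]: next R index in the same chain, or -1 */
} hash_table_t;

/* MurmurHash3 32-bit finalizer: spreads the key bits across the result. */
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16; x *= 0x85ebca6bu;
    x ^= x >> 13; x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Build phase: insert every tuple of the build relation R. Returns 0 on success. */
static int ht_build(hash_table_t *ht, const tuple_t *r, size_t nr, size_t nbuckets) {
    assert(nbuckets > 0);
    ht->nbuckets = nbuckets;
    ht->head = malloc(nbuckets * sizeof *ht->head);
    ht->next = malloc((nr ? nr : 1) * sizeof *ht->next);
    if (!ht->head || !ht->next) {
        free(ht->head);
        free(ht->next);
        ht->head = ht->next = NULL;
        return -1;
    }
    for (size_t b = 0; b < nbuckets; b++) ht->head[b] = -1;
    for (size_t i = 0; i < nr; i++) {
        size_t b = hash32(r[i].key) % nbuckets;
        ht->next[i] = ht->head[b];   /* push tuple i onto the front of its chain */
        ht->head[b] = (int)i;
    }
    return 0;
}

/* Probe phase: look up each S tuple and count key matches
   (a real join would emit the matching (R, S) pair instead). */
static size_t ht_probe(const hash_table_t *ht, const tuple_t *r,
                       const tuple_t *s, size_t ns) {
    size_t matches = 0;
    for (size_t j = 0; j < ns; j++) {
        size_t b = hash32(s[j].key) % ht->nbuckets;
        for (int i = ht->head[b]; i != -1; i = ht->next[i])
            if (r[i].key == s[j].key) matches++;
    }
    return matches;
}

static void ht_free(hash_table_t *ht) {
    free(ht->head);
    free(ht->next);
}
```

Every probe jumps to an effectively random bucket and chain entry. Once the table is larger than the cache, most of those accesses miss in L2.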

The partition-based hash join algorithm
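A partition-based join works in two steps. First, it splits both relations with the same hash function, so matching tuples always land in partitions with the same number. Then it joins each pair of partitions on its own, using a small hash table.

The in-memory C sketch below makes several simplifications:
- It assumes copy partitioning: tuples are copied into contiguous partitions, as in the "PT" variant.
- It counts matches instead of emitting result tuples.
- A disk-based Grace join would write partitions to disk rather than to memory buffers.

The names `partition`, `grace_join`, and `hash32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuple layout: a 32-bit join key plus a 32-bit payload. */
typedef struct { uint32_t key; uint32_t payload; } tuple_t;

/* MurmurHash3 32-bit finalizer, used for both partitioning and bucketing. */
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16; x *= 0x85ebca6bu;
    x ^= x >> 13; x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Copy-partition n tuples of `in` into `out`. On return, partition p occupies
   out[start[p] .. start[p+1]); `start` must have nparts + 1 entries. */
static void partition(const tuple_t *in, size_t n, tuple_t *out,
                      size_t *start, size_t nparts) {
    for (size_t p = 0; p <= nparts; p++) start[p] = 0;
    for (size_t i = 0; i < n; i++)              /* 1. histogram, shifted by one */
        start[hash32(in[i].key) % nparts + 1]++;
    for (size_t p = 0; p < nparts; p++)         /* 2. prefix sum: start[p] = first slot */
        start[p + 1] += start[p];
    for (size_t i = 0; i < n; i++)              /* 3. scatter; start[p] advances */
        out[start[hash32(in[i].key) % nparts]++] = in[i];
    for (size_t p = nparts; p > 0; p--)         /* 4. restore start[p] = first slot */
        start[p] = start[p - 1];
    start[0] = 0;
}

/* Grace-style join: partition R and S with the same hash function, then join
   each (R_p, S_p) pair with a chained hash table. Returns the number of
   matching pairs, or -1 if allocation fails. */
static long grace_join(const tuple_t *r, size_t nr, const tuple_t *s, size_t ns,
                       size_t nparts) {
    assert(nparts > 0);
    tuple_t *rp = malloc((nr ? nr : 1) * sizeof *rp);
    tuple_t *sp = malloc((ns ? ns : 1) * sizeof *sp);
    size_t *rs = malloc((nparts + 1) * sizeof *rs);
    size_t *ss = malloc((nparts + 1) * sizeof *ss);
    int *head = malloc((nr ? nr : 1) * sizeof *head);   /* reused per partition */
    int *next = malloc((nr ? nr : 1) * sizeof *next);
    long matches = -1;
    if (rp && sp && rs && ss && head && next) {
        partition(r, nr, rp, rs, nparts);
        partition(s, ns, sp, ss, nparts);
        matches = 0;
        for (size_t p = 0; p < nparts; p++) {
            size_t rlen = rs[p + 1] - rs[p];
            if (rlen == 0) continue;                     /* nothing can match */
            size_t nb = rlen;                            /* one bucket per build tuple */
            for (size_t b = 0; b < nb; b++) head[b] = -1;
            for (size_t i = 0; i < rlen; i++) {          /* build on R_p */
                /* Divide out the partition bits so keys spread over all buckets. */
                size_t b = (hash32(rp[rs[p] + i].key) / nparts) % nb;
                next[i] = head[b];
                head[b] = (int)i;
            }
            for (size_t j = ss[p]; j < ss[p + 1]; j++) { /* probe with S_p only */
                size_t b = (hash32(sp[j].key) / nparts) % nb;
                for (int i = head[b]; i != -1; i = next[i])
                    if (rp[rs[p] + i].key == sp[j].key) matches++;
            }
        }
    }
    free(rp); free(sp); free(rs); free(ss); free(head); free(next);
    return matches;
}
```

Partitioning pays off when `nparts` is large enough that each partition's hash table fits in the cache. The result does not depend on `nparts`; only the memory behavior does.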

5
Motivation
Characterizing the Grace hash join on a multithreaded machine

Multithreaded architectures create new opportunities for improving essential DBMS operations.
Hash join is one of the most important operations in current commercial DBMSs.
The L2 cache load miss rate is a critical factor in main-memory hash join performance.
Therefore, we have two goals:
Utilize the multiple threads.
Decrease the L2 miss rate.

6
Architecture-Aware Hash Join (AA_HJ)

The R-relation index partition phase:
Tuples are divided equally among the threads; each thread has its own set of L2-cache-sized clusters.
The build and S-relation index partition phase:
One thread builds a hash table from each key range.
The other threads index-partition the probe relation.
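The R-relation index partition phase can be sketched with OpenMP. This is one possible reading of the slide, not the authors' implementation:
- Each worker takes an equal slice of R and records tuple indices, rather than tuple copies, in its own private clusters.
- The cluster count is derived from a hypothetical L2 budget parameter, `l2_bytes`.
- The names `clusters_for_l2` and `index_partition` and the flat `start` layout are assumptions.

Without OpenMP the pragma is ignored and the workers run one after another.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuple layout: a 32-bit join key plus a 32-bit payload. */
typedef struct { uint32_t key; uint32_t payload; } tuple_t;

/* MurmurHash3 32-bit finalizer. */
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16; x *= 0x85ebca6bu;
    x ^= x >> 13; x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Hypothetical sizing rule: use enough clusters that one key range of R
   (about n * tuple_bytes / C bytes under a uniform hash) fits in l2_bytes. */
static size_t clusters_for_l2(size_t n, size_t tuple_bytes, size_t l2_bytes) {
    assert(l2_bytes > 0);
    size_t c = (n * tuple_bytes + l2_bytes - 1) / l2_bytes;   /* ceiling division */
    return c ? c : 1;
}

/* Index-partition R with nthreads workers. Worker t owns the slice
   [t*n/T, (t+1)*n/T) of both r and idx and writes the indices of its tuples
   into its own C clusters: cluster c of worker t is
   idx[start[t*(C+1) + c] .. start[t*(C+1) + c + 1]).
   No locks are needed because the workers' output regions never overlap. */
static void index_partition(const tuple_t *r, size_t n, int nthreads,
                            size_t nclusters, uint32_t *idx, size_t *start) {
    assert(nthreads > 0 && nclusters > 0);
    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (int t = 0; t < nthreads; t++) {
        size_t lo = (size_t)t * n / (size_t)nthreads;
        size_t hi = (size_t)(t + 1) * n / (size_t)nthreads;
        size_t *st = start + (size_t)t * (nclusters + 1);
        for (size_t c = 0; c <= nclusters; c++) st[c] = 0;
        for (size_t i = lo; i < hi; i++)              /* histogram, shifted by one */
            st[hash32(r[i].key) % nclusters + 1]++;
        st[0] = lo;                                   /* clusters live in this slice */
        for (size_t c = 0; c < nclusters; c++)        /* prefix sum: st[c] = first slot */
            st[c + 1] += st[c];
        for (size_t i = lo; i < hi; i++)              /* scatter indices, not tuples */
            idx[st[hash32(r[i].key) % nclusters]++] = (uint32_t)i;
        for (size_t c = nclusters; c > 0; c--)        /* undo the cursor advance */
            st[c] = st[c - 1];
        st[0] = lo;
    }
}
```

Key range c is the union of cluster c across all workers. In the next phase, a single thread can build one hash table from those clusters.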

7
Architecture-Aware Hash Join (contd)

The probe phase:
Random accesses to a hash table on every search for a potential match are a challenge.
Threads simultaneously probe hash tables with similar key ranges to increase temporal and spatial locality.
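The probe phase can be sketched with OpenMP, reading "similar key range" as the same key-range cluster. For each key range, a hash table is built over R's cluster. All threads then probe that one table with S's cluster, so they share it in the cache instead of each touching a different table.

This is a simplified sketch, not the authors' code:
- The clustered index arrays (`ridx`/`rstart`, `sidx`/`sstart`) and the function name `probe_by_key_range` are hypothetical.
- The table is built single-threaded right before probing, rather than overlapped with S partitioning as the slides describe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuple layout: a 32-bit join key plus a 32-bit payload. */
typedef struct { uint32_t key; uint32_t payload; } tuple_t;

/* MurmurHash3 32-bit finalizer. */
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16; x *= 0x85ebca6bu;
    x ^= x >> 13; x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* R and S are already clustered into nranges key ranges: range c of R is the
   tuples r[ridx[k]] for k in [rstart[c], rstart[c+1]); likewise for S.
   Returns the number of matching pairs, or -1 on allocation failure. */
static long probe_by_key_range(const tuple_t *r, const uint32_t *ridx, const size_t *rstart,
                               const tuple_t *s, const uint32_t *sidx, const size_t *sstart,
                               size_t nranges) {
    size_t maxlen = 1;
    for (size_t c = 0; c < nranges; c++) {
        assert(rstart[c + 1] >= rstart[c]);           /* offsets must not decrease */
        if (rstart[c + 1] - rstart[c] > maxlen) maxlen = rstart[c + 1] - rstart[c];
    }
    int *head = malloc(maxlen * sizeof *head);        /* reused for every range */
    int *next = malloc(maxlen * sizeof *next);
    if (!head || !next) { free(head); free(next); return -1; }

    long matches = 0;
    for (size_t c = 0; c < nranges; c++) {
        const uint32_t *rc = ridx + rstart[c];
        size_t rlen = rstart[c + 1] - rstart[c];
        size_t nb = rlen ? rlen : 1;
        /* Build the table for this key range (single-threaded in this sketch). */
        for (size_t b = 0; b < nb; b++) head[b] = -1;
        for (size_t i = 0; i < rlen; i++) {
            size_t b = hash32(r[rc[i]].key) % nb;
            next[i] = head[b];
            head[b] = (int)i;
        }
        /* Probe: every thread reads the same head/next/R data for range c. */
        long local = 0;
        #pragma omp parallel for reduction(+:local) schedule(static)
        for (long j = (long)sstart[c]; j < (long)sstart[c + 1]; j++) {
            uint32_t key = s[sidx[j]].key;
            for (int i = head[hash32(key) % nb]; i != -1; i = next[i])
                if (r[rc[i]].key == key) local++;
        }
        matches += local;
    }
    free(head);
    free(next);
    return matches;
}
```

Opening one parallel region per key range keeps the sketch short. A tuned version would more likely keep a single parallel region and synchronize between ranges.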

8
Experimental Methodology

We ran our algorithms on two machines with the
following specifications

9
Experimental Methodology (contd)

All algorithms are implemented in C.
We employed the built-in OpenMP C/C++ library to manage parallelism.
For Machine 1, we used a 50 MB build relation and a 100 MB probe relation.
For Machine 2, we used a 250 MB build relation and a 500 MB probe relation.
We used the Intel VTune Performance Analyzer for Linux 9.0 to collect the hardware events.

10
AA_HJ Timing Results

We achieved speedups ranging from 2 to 4.6 over Grace hash join on a Quad Intel Xeon Dual Core server (Machine 2).
Speedups on the Pentium 4 with Hyper-Threading (HT) range from 2.1 to 2.9 over Grace hash join.

PT: copy-partitioning hash join
NPT: non-partitioning hash join
Index PT: index-partitioning hash join
2, 4, 8, 12, or 16: number of threads

11
Memory-Analysis for Multithreaded AA_HJ

The decrease in the L2 load miss rate is due to cache-sized index partitioning, constructive cache sharing, and group prefetching.
The L1 data cache load miss rate increases slightly, from 1.5% to 4%, on Machine 2.
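Group prefetching, named above as one source of the L2 improvement, hides cache-miss latency. It splits the probe of a group of tuples into stages and issues prefetches for the whole group before the data is needed.

The sketch below is a simplified three-stage version, not the authors' code:
- It assumes the GCC/Clang builtin `__builtin_prefetch`, falling back to a no-op on other compilers.
- The group size `GROUP` and the helpers `build_table` and `probe_grouped` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)(addr))
#endif

#define GROUP 16   /* hypothetical group size; would be tuned per machine */

/* Hypothetical tuple layout: a 32-bit join key plus a 32-bit payload. */
typedef struct { uint32_t key; uint32_t payload; } tuple_t;

/* MurmurHash3 32-bit finalizer. */
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16; x *= 0x85ebca6bu;
    x ^= x >> 13; x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Build a chained table over R: head[nb] bucket heads, next[nr] chain links.
   Returns 0 on success; the caller frees *head and *next. */
static int build_table(const tuple_t *r, size_t nr, size_t nb, int **head, int **next) {
    assert(nb > 0);
    *head = malloc(nb * sizeof **head);
    *next = malloc((nr ? nr : 1) * sizeof **next);
    if (!*head || !*next) { free(*head); free(*next); *head = *next = NULL; return -1; }
    for (size_t b = 0; b < nb; b++) (*head)[b] = -1;
    for (size_t i = 0; i < nr; i++) {
        size_t b = hash32(r[i].key) % nb;
        (*next)[i] = (*head)[b];
        (*head)[b] = (int)i;
    }
    return 0;
}

/* Probe S in groups of GROUP tuples so that the cache misses of a whole
   group overlap instead of being taken one tuple at a time. */
static size_t probe_grouped(const tuple_t *r, const int *head, const int *next,
                            size_t nb, const tuple_t *s, size_t ns) {
    size_t matches = 0;
    size_t bucket[GROUP];
    int first[GROUP];
    for (size_t g = 0; g < ns; g += GROUP) {
        size_t m = ns - g < (size_t)GROUP ? ns - g : (size_t)GROUP;
        for (size_t k = 0; k < m; k++) {        /* stage 1: hash, prefetch bucket heads */
            bucket[k] = hash32(s[g + k].key) % nb;
            PREFETCH(&head[bucket[k]]);
        }
        for (size_t k = 0; k < m; k++) {        /* stage 2: read heads, prefetch first R tuple */
            first[k] = head[bucket[k]];
            if (first[k] != -1) PREFETCH(&r[first[k]]);
        }
        for (size_t k = 0; k < m; k++)          /* stage 3: walk chains, compare keys */
            for (int i = first[k]; i != -1; i = next[i])
                if (r[i].key == s[g + k].key) matches++;
    }
    return matches;
}
```

The result is the same as a plain probe loop; only the order of memory accesses changes. How much latency the prefetches actually hide depends on the group size and the hardware.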

12
Conclusions

Revisiting the join implementation to take
advantage of state-of-the-art hardware
improvements is an important direction to boost
the performance of DBMSs.
We reaffirmed previous findings that the hash join is bound by L2 cache miss rates, which range from 29% to 62%.
We proposed an Architecture-Aware Hash Join
(AA_HJ) that relies on sharing critical
structures between working threads at the cache
level.
We find that AA_HJ decreases the L2 cache miss rate from 62% to 11% for 20-byte tuples and from 29% to 15% for 140-byte tuples.