Performance Tradeoffs in ReadOptimized Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Performance Tradeoffs in ReadOptimized Databases

Description:

WHERE a1 yields variable selectivity. massachusetts institute of technology. 11 ... Narrow tuple: 10% selectivity. Memory stalls disappear in narrow tuples ... – PowerPoint PPT presentation

Number of Views:43
Avg rating:3.0/5.0
Slides: 23
Provided by: nmsCsa
Learn more at: http://nms.csail.mit.edu
Category:

less

Transcript and Presenter's Notes

Title: Performance Tradeoffs in ReadOptimized Databases


1
Performance Tradeoffs in Read-Optimized Databases
  • Stavros Harizopoulos
  • MIT CSAIL
  • joint work with
  • Velen Liang, Daniel Abadi, and Sam Madden

massachusetts institute of technology
2
Read-optimized databases
1
2
Joe

SQL Server DB2 Oracle
Sybase IQ MonetDB CStore
Sue
45

37

column stores
row stores
Read optimizations
Materialized views, multiple indices, compression
How does column-orientation affect performance?
3
Rows vs. columns
column data
row data
seek
1 Joe 45
2 Sue 37
single file

Study performance tradeoffs solely in data storage
4
Performance study
  • Methodology
  • Built storage manager from scratch
  • Sequential scans
  • Analyze CPU, disk, memory
  • Findings
  • Columns are generally more I/O efficient
  • Competing traffic favors columns
  • Conditions where columns are CPU-constrained
  • Conditions where rows are MemBW-constrained

5
Talk outline
  • System architecture
  • Workload and Experiments
  • Analysis
  • Conclusions

6
System architecture
  • Block-iterator operators
  • Single-threaded, C, Linux AIO
  • No buffer pool
  • Use filesystem, bypass OS cache
  • Compression
  • Dense-pack

100 full
60 full
7
Compression methods
  • Dictionary
  • Bit-pack
  • Pack several attributes inside a 4-byte word
  • Use as many bits as max-value
  • Delta
  • Base value per page
  • Arithmetic differences

low high low normal
00 10 00 01
8
Storage engine
row scanner
column scanner
SELECT name, age WHERE age gt 40
Joe 45

Joe 45

S
apply predicate(s)
S
POS 45
POS
name
apply predicate 1
S
age
9
Platform
CPU
L2
RAM
DISKS
(striped)
3.2 GB/sec
3.2GHz
180 MB/sec
1GB
1MB
prefetching
10
Workload
  • LINEITEM (wide)
  • 60m rows ? 9.5 GB
  • ORDERS (narrow)
  • 60m rows ? 1.9 GB
  • Query

150 bytes
50 bytes
32 bytes
12 bytes
SELECT a1, a2, a3, WHERE a1 yields variable
selectivity
11
Wide tuple 10 selectivity
Column
Row
time (sec)
Column (CPU only)
Row (CPU only)
selected bytes per tuple
  • Large prefetch hides disk seeks in columns

12
Wide tuple 10 sel. (CPU)
attributes selected column store
  • Row-CPU suffers from memory stalls

13
Wide tuple 10 sel. (CPU)
0.1
attributes selected column store
  • Column-CPU efficiency with lower selectivity

14
Narrow tuple 10 selectivity
time (sec)
row store
column store
selected bytes per tuple
attributes selected
  • Memory stalls disappear in narrow tuples
  • Compression similar to narrow (not shown)

15
Varying prefetch size
no competingdisk traffic
Column 2
Column 8
time (sec)
Column 16
Column 48 (x 128KB)
Row (any prefetch size)
selected bytes per tuple
  • No prefetching hurts columns in single scans

16
Varying prefetch size
no competingdisk traffic
with competing disk traffic
time (sec)
selected bytes per tuple
  • No prefetching hurts columns in single scans
  • Under competing traffic, columns outperform rows
    for any prefetch size

17
Analysis
  • Central parameter in analysis
  • cycles per disk byte (cpdb)
  • What can it model
  • More / fewer disks
  • More / fewer CPUs
  • CPU / disk competing traffic
  • Trends in cpdb
  • 10 ? 30 from 1995 to 2006
  • Further increase with multicore chips

18
Analysis
speedup of cols over rows
2
10 selectivity
50 projection
1.6 2
1.2 1.6
(cpdb)
cycles per disk byte
0.8 1.2
0.4 0.8
tuple width
  • Rows favored by narrow tuples and low cpdb
  • Disk-bound workloads have higher cpdb

19
See our paper for the rest
  • CPU time breakdowns, L2 prefetcher
  • Disk prefetching implementation
  • Compression results
  • Non-pipelined column scanner
  • Analysis

20
Conclusions
  • Given enough space for prefetching, columns
    outperform rows in most workloads
  • Competing traffic favors columns
  • Memory-bandwidth bottleneck in rows
  • Future work
  • Column scanners, random I/O, write performance

21
Thank you
db.csail.mit.edu/projects/cstore
22
Analysis
parameter
what it can model
Write a Comment
User Comments (0)
About PowerShow.com