Title: CStore: A Columnoriented DBMS
1C-Store A Column-oriented DBMS
- By
- New England Database Group
2 Current DBMS Gold Standard
- Store fields in one record contiguously on disk
- Use B-tree indexing
- Use small (e.g. 4K) disk blocks
- Align fields on byte or word boundaries
- Conventional (row-oriented) query optimizer and
executor (technology from 1979) - Aries-style transactions
3Terminology -- Row Store
Record 1
Record 2
Record 3
Record 4
E.g. DB2, Oracle, Sybase, SQLServer,
4 Row Stores are Write Optimized
- Can insert and delete a record in one physical
write - Good for on-line transaction processing (OLTP)
- But not for read mostly applications
- Data warehouses
- CRM
5Elephants Have Extended Row Stores
- With Bitmap indices
- Better sequential read
- Integration of datacube products
- Materialized views
But there may be a better idea.
6Column Stores
7At 100K Feet.
- Ad-hoc queries read 2 columns out of 20
- In a very large warehouse, Fact table is rarely
clustered correctly - Column store reads 10 of what a row store reads
8C-Store (Column Store) Project
- Brandeis/Brown/MIT/UMass-Boston project
- Usual suspects participating
- Enough coded to get performance numbers for some
queries - Complete status later
9We Build on Previous Pioneering Work.
- Sybase IQ (early 90s)
- Monet (see CIDR 05 for the most recent
description)
10C-Store Technical Ideas
- Code the columns to save space
- No alignment
- Big disk blocks
- Only materialized views (perhaps many)
- Focus on Sorting not indexing
- Automatic physical DBMS design
11C-store (Column Store) Technical Ideas
- Optimize for grid computing
- Innovative redundancy
- Xacts but no need for Mohan
- Data ordered on anything, Not just time
- Column optimizer and executor
12How to Evaluate This Paper.
- None of the ideas in isolation merit publication
- Judge the complete system by its (hopefully
intelligent) choice of - Small collection of inter-related powerful ideas
- That together put performance in a new sandbox
13Code the Columns
- Work hard to shrink space
- Use extra space for multiple orders
- Fundamentally easier than in a row store
- E.g. RLE works well
14No Alignment
- Densepack columns
- E.g. a 5 bit field takes 5 bits
- Current CPU speed going up faster than disk
bandwidth - Faster to shift data in CPU than to waste disk
bandwidth
15Big Disk Blocks
- Tunable
- Big (minimum size is 64K)
16Only Materialized Views
- Projection (materialized view) is some number of
columns from a fact table - Plus columns in a dimension table with a 1-n
join between Fact and Dimension table - Stored in order of a storage key(s)
- Several may be stored!!!!!
- With a permutation, if necessary, to map between
them
17Only Materialized Views
- Table (as the user specified it and sees it) is
not stored! - No secondary indexes (they are a one column
sorted MV plus a permutation, if you really want
one)
18Example
User view EMP (name, age, salary, dept) Dept
(dname, floor) Possible set of MVs MV-1 (name,
dept, floor) in floor order MV-2 (salary, age) in
age order MV-3 (dname, salary, name) in salary
order
19Different Indexing
20Automatic Physical DBMS Design
- Not enough 4-star wizards to go around
- Accept a training set of queries and a space
budget - Choose the MVs auto-magically
- Re-optimize periodically based on a log of the
interactions
21Optimize for Grid Computing
- I.e. shared-nothing
- Dewitt (Gamma) was right
- Horizontal partitioning and intra-query
parallelism as in Gamma
22Innovative Redundancy
- Hardly any warehouse is recovered by a redo from
the log - Takes too long!
- Store enough MVs at enough places to ensure
K-safety - Rebuild dead objects from elsewhere in the
network - K-safety is a DBMS-design problem!
23XACTS No Mohan
- Undo from a log (that does not need to be
persistent) - Redo by rebuild from elsewhere in the network
24XACTS No Mohan
- Snapshot isolation (run queries as of a tunable
time in the recent past) - To solve read-write conflicts
- Distributed Xacts
- Without a prepare message (no 2 phase commit)
25Storage (sort) Key(s) is not Necessarily Time
- That would be too limiting
- So how to do fast updates to densepack column
storage that is not in entry sequence?
26Solution a Hybrid Store
Write-optimized Column store
(Much like Monet)
(Batch rebuilder)
Tuple mover
Read-optimized Column store
(What we have been talking about so far)
27Column Executor
- Column operations not row operations
- Columns remain coded if possible
- Late materialization of columns
28Column Optimizer
- Chooses MVs on which to run the query
- Most important task
- Build in snowflake schemas
- Which are simple to optimize without exhaustive
search - Looking at extensions
29Current Performance
- 100X popular row store in 40 of the space
- 10X popular column store in 70 of the space
- 7X popular row store in 1/6th of the space
- Code available with BSD license
30Structure Going Forward
- Vertica
- Very well financed start-up to commercialize
C-store - Doing the heavy lifting
- University Research
- Funded by Vertica
31Vertica
- Complete alpha system in December 05
- Everything, including DBMS designer
- With current performance!
- Looking for early customers to work with (see me
if you are interested)
32University Research
- Extension of algorithms to non-snowflake schemas
- Study of L2 cache performance
- Study of coding strategies
- Study of executor options
- Study of recovery tactics
- Non-cursor interface
- Study of optimizer primitives