Title: Data Warehousing
1Data Warehousing
- Donald Kossmann
- Carsten Binnig
- http//systems.ethz.ch
2All about me
- What is the Truth? ?
- 1997-2001 Bachelor Student in Karlsruhe
- 2001-2004 Software Developer at Research Center
in Karlsruhe - 2002-2004 Master Student in Karlsruhe
- 2004-2007 PhD Student in Heidelberg
- From 2008 Senior Researcher at ETH Zurich
- Working with SAP on main-memory column-stores for
DWH - gt 50 in Walldorf (Germany)
3Organization
- Please raise your hands if yes ? !!!
- Exercises Who of you will attend the exercise
group at - 11-12
- 12-13
- Who of you knows the following basics?
- Relational algebra
- Query processing and Query Optimization
(join-ordering, query plans, iterator-model ) - Index-structures (single-attribute, composite- or
multidimensional-indexes, )
4Query Processing and Indexing
5Some Literature (1)
- 1 Chee Yong Chan, Yannis E. Ioannidis Bitmap
Index Design and Evaluation. SIGMOD Conference
1998 355-366 - 2 Patrick E. O'Neil, Goetz Graefe Multi-Table
Joins Through Bitmapped Join Indices. SIGMOD
Record 24(3) 8-11 (1995) - 3 George P. Copeland, Setrag Khoshafian A
Decomposition Storage Model. SIGMOD Conference
1985 268-279 - 4 Stavros Harizopoulos, Velen Liang, Daniel J.
Abadi, Samuel Madden Performance Tradeoffs in
Read-Optimized Databases. VLDB 2006 487-498 - 5 Daniel J. Abadi, Samuel Madden, Nabil Hachem
Column-stores vs. row-stores how different are
they really? SIGMOD Conference 2008 967-980 - 6 Nicolas Bruno Teaching an Old Elephant New
Tricks. CIDR 2009 - 7 Philippe Cudré-Mauroux, Eugene Wu, Samuel
Madden The Case for RodentStore An Adaptive,
Declarative Storage System. CIDR 2009 - 8 Ravishankar Ramamurthy, David J. DeWitt, Qi
Su A Case for Fractured Mirrors. VLDB 2002
430-441
6Some Literature (2)
- 9 Anastassia Ailamaki, David J. DeWitt, Mark D.
Hill, Marios Skounakis Weaving Relations for
Cache Performance. VLDB 2001 169-180 - 10 Daniel J. Abadi, Samuel Madden, Miguel
Ferreira Integrating compression and execution
in column-oriented database systems. SIGMOD 2006
671-682 - 11 Jun Rao, Kenneth A. Ross Cache Conscious
Indexing for Decision-Support in Main Memory.
VLDB 1999 78-89 - 12 Jun Rao, Kenneth A. Ross Making B-Trees
Cache Conscious in Main Memory. SIGMOD Conference
2000 475-486 - 13 Wolfgang Lehner Datenbanktechnologie für
Data-Warehouse-Systeme. Konzepte und Methoden
(Book). Dpunkt - 14 George Eadon, Eugene Inseok Chong, Shrikanth
Shankar, Ananth Raghavan, Jagannathan Srinivasan,
Souripriya Das Supporting table partitioning by
reference in oracle. SIGMOD Conference 2008
1111-1122 - 15 Surajit Chaudhuri, Umeshwar Dayal An
Overview of Data Warehousing and OLAP Technology.
SIGMOD Record 26(1) 65-74(1997) - 16 A. Silberschatz, H.F. Korth, S. Sudarshan
"Database System Concepts" (Fifth Edition),
McGraw-Hill. (Chapters 18, 20-22) - 17 Namik Hrle Disruptive Technology Trends In
Information Management. ETH Zurich Computer
Science Colloquium, March 9, 2009
7Motivation
- OLAP Online Analytical Processing
- Use Case Analyze data by aggregating measures
over multiple dimensions (drill-down, roll-up) - Example Sales of Amazon
- What is the total price of all products
- in product group Computer
- sold in quarter 04/2008
- in country Switzerland
- grouped by region?
Fact table
Dimensions
8OLAP vs. OLTP
9Star Schema (Example)
Dimension D3 Region
Dimension D2 Time
1
1
Fact Table F Sales
n
n
n
n
1
1
Dimension D1 Payment
Dimension D4 Product
10Star Schema (Example)
1
1
Ref. to dimensions
Measure(s)
n
n
n
n
1
1
11Star Query
SELECT ltdimensionsgt, ltaggregation-function(measur
e)gt FROM F, D1, D2, ...., Dn WHERE
ltjoin-conditionsgt AND ltfilter-conditionsgt GROUP
BY ltdimensionsgt
SELECT D3.region, SUM(F.price) FROM F, D2, D3,
D4 WHERE F.d2 D2.idAND F.d3 D3.id AND F.d4
D4.id AND D2.quarter 4 AND D2.year 2008 AND
D3.country'Switzerland' AND D4.group'Computer' G
ROUP BY D3.region
12Query Processing in a Nutshell
SELECT SUM(F.price), D3.region FROM F, D3 WHERE
F.d3 D3.id AND D3.country'Switzerland' GROUP
BY D3.region
Declarative Query
Scanner /Parser
Canonical Relational Algebra Expression (Tree)
Optimizer
Optimal Execution Plan (Code)
Execution Engine
Query Result
13Query Optimization in a Nutshell
Canonical Relational Algebra Expression (Tree)
Query Rewrite
Optimizer
Cost-based Opt.
Optimal Execution Plan (Code)
- Query Rewrite (Logical Optimization)
- Use rules and heuristics to rewrite relational
algebra expression (e.g. pushdown of projection,
pushdown of group-by, ) - Cost-based Optimization (Physical Optimization)
- Use a cost model to find optimal plan (e.g.,
determine concrete join-algorithm) with lowest
costs ( of disk pages that must be read)
14Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
15Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
16(1) Naïve Join-Strategy
SELECT SUM(F.price), D3.region FROM F, D2, D3, D4
WHERE F.D2 D2.id AND F.d3 D3.id AND F.d4
D4.id AND D2.quarter4 AND D1.year2008 AND
D3.country'Switzerland' AND D4.group'Computer' G
ROUP BY D3.region
- Idea Iteratively join dimension tables to fact
table (OLTP-style) - Optimizer should reorder joins and start joining
those dimensions first that have highest
selectivity for fact table F - The first join could use a single attribute index
on the dimension key of fact table F
17Example Naïve Join-Strategy
1
1
2
2
2
3
1
3
18Problems with Naïve Join-Strategy
- Fact table is usually much bigger than dimension
tables - If the first dimension does not effectively
reduce the tuples that need to be read from disk
for table F then - Using an index scan on F (which might produce
random disk accesses) might be slower than a full
table scan on F - A full table scan of F might also be too
expensive for big a F table - Example Fact table F has size 100GB on disk and
300MB/s can be read from disk (e.g., SATA 3) - The total time to read the whole fact table from
disk is 333s which might not be acceptable for
online reporting
19(2) Cross Product Plan
- Idea First create cross product for all
dimensions and then join result with fact table
using all dimension keys - Works well, if all dimensions are small
(otherwise this is very exp.) - Final join effectively reduces data from F and
could use compound index on all dimensions keys
in table F
20Example Cross Product Plan
1
1
4
1
2
2
2
1
21(3) Semi-Join Plan
- Idea Restrict tuples that are read from F before
joining dimensions - A pre-selection of RIDs is created using
semi-joins of dimensions with F gt
single-attribute indexes on dimension keys in F
can be leveraged - Single RID lists are intersected (e.g., using
hashing) - Read tuples from F that are in the resulting
RID-list
22Example Semi-Join Plan
1
1
1
2
1
1
1,2,3
0,1,2
0,1,3
23What have we learned so far?
- Different strategies for executing a star-join
- Naïve (including join-reordering)
- Cross-product plan
- Semi-join plan
- Question How to implement these plans
efficiently?
24What are possible solutions?
- General optimizations
- Materialized Views (logical access path)
- Horizontal-partitioning (logical access path)
- Optimizations for the scan / selection
operations - Bitmap-Indexes (physical access path)
- Optimizations for the join / semi-join
operations - Join-Indexes (physical access path)
25Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
26Materialized Views (see 13)
- Idea Pre-compute aggregated query results that
can efficiently answer other queries over star
schema - Compared to normal views materialized views are
stored on disk - Query optimizer chooses to scan materialized
views instead of tables to speed-up query
processing (e.g., by avoiding joins) - Trade-off speed-up of query processing ? costs
of keeping and maintaining materialized views
27Example Materialized View
Query
Materialized View (M)
SELECT D3.region, SUM(F.price) as
totalsales2 FROM F, D2, D3, D4 WHERE F.d2
D2.idAND F.d3 D3.id AND F.d4 D4.id AND
D2.quarter 4 AND D2.year 2008 AND D4.group
'Computer' GROUP BY D3.region
SELECT D2.quarter D3.region, D4.group, SUM(F.price
) as totalsales FROM F, D2, D3, D4 WHERE F.d2
D2.idAND F.d3 D3.id AND F.d4 D4.id AND
D2.year 2008 GROUP BY D2.quarter, D3.region,
D4.group
SELECT D3.region, SUM(totalsales) as totalsales2
FROM M WHERE D2.quarter 4 AND D4.group
'Computer' GROUP BY D3.region
28Materialized Views Questions
- Which materialized view(s) can be used to answer
a query? - How to maintain a materialized view if the base
tables are updated? - Which materialized views should be created? gt
Workload - How to rewrite a query using a materialized view?
gt Optimizer
29Derivability
- Question Can a materialized view M be used to
answer a query Q? - Sufficient conditions (see 13)
- Selection predicates PQ of Q are subsumed in
selection predicates PM of M - Group-by attributes GQ of Q are subsumed in
group-by attributes GM of M - Aggregation functions FQ of Q are compatible to
aggregation functions FM of M - All tables RM referenced by M must also be
referenced by Q (and be joined using the same
join predicate as joins are not lossless) - gt View M can be used to answer Query Q
30Derivability Selection Predicates
- Problem
- In propositional logic, subsumption of predicates
is undecidable for two arbitrary predicates - Solution
- Restrict predicates to use only simple predicates
(i.e., a op c a is an attribute, op is a comp.
operator and c is a constant) - Translate predicate into DNF
- Example
- Selection predicate PQ year2008 AND quarter4
- View predicate PM year2008
- gt PQ is subsumed in PM
31Derivability Group-by Lattice
quarter
region
group
quarter, region
quarter, group
region, group
quarter, region, group
- Arrows indicate which queries / views can be
derived from a view - Number of views for n functional independent
dimension attributes 2n - For functional dependent attributes only a
polynomial number of views (e.g. for country,
region, city we only need 3 views grouping by
country, region, or city as the func. dep.
county-gtregion-gtcity hold - gt Not all views may be materialized Select a
good subset!
32Example Group-by Lattice
quarter
region
group
quarter, region
quarter, group
region, group
M2
M1
quarter, region, group
- Assume we create two materialized views (M1, M2)
using the group-by attributes quarter, region
and region, group - Queries that group-by quarter, region or
can be derived from M1 - Queries that group-by region, group or can
be derived from M2
33Derivability Aggregation Functions
- Classification aggregation functions
- Additive F(X1?X2) F(F(X1), F(X2)) and F-1
exists gt SUM(a1, a2, a3) SUM( SUM(a1, a2),
SUM(a3) ) and SUM(a3) SUM(a1, a2, a3) - SUM(a1,
a2) - Semi-additive Same but F-1() does not exists gt
MIN(a1, a2, a3) MIN( MIN(a1, a2), MIN(a3) ) - Additive-computable F(x) F(F1(X), F2(X), ,
Fi(X) ) where F1, F2,, Fi are (semi-)additive
functions gt AVG SUM / CNT - All these classes of aggregation functions can be
used by a view to answer other queries - Other aggregation functions (e.g., median) do not!
34Example Aggregation Functions
Query
Materialized View (M)
SELECT MIN(F.price) FROM F, D2, D3, D4 WHERE F.d2
D2.idAND F.d3 D3.id AND F.d4 D4.id AND
D2.quarter 4 AND D2.year 2008 AND D3.region
'Zurich'
SELECT D2.quarter D3.region, D4.group, MIN(F.price
) as minprice FROM F, D2, D3, D4 WHERE F.d2
D2.idAND F.d3 D3.id AND F.d4 D4.id AND
D2.year 2008 GROUP BY D2.quarter, D3.region,
D4.group
SELECT MIN(F.price) FROM M WHERE D2.quarter
4 AND D3.region 'Zurich'
35Derivability Set of Relations
- All tables RM referenced by M must also be
referenced by Q and be joined using the same join
predicate - Problem Joins are not lossless if they are not
equi-joins along a foreign-key relationship - If lossless joins are used then not all tables of
M need to be referred in Q
36Example Set of Relations
- Materialized view M holds all suppliers that are
also customers (indicated by same primary key
value) - Query Q asks for all suppliers in region Zurich
Supplier
Customer
Materialized View (M)
Query (Q)
SELECT COUNT() FROM Supplier s, Customer c WHERE
s.suppkey c.custkey
SELECT COUNT() FROM Supplier s WHERE
s.region'Zurich'
gt Question Is Q allowed to use M ?
37View Maintenance
- Question How to maintain a materialized view if
new data is loaded into the DWH? - Different alternatives
- Update strategy Rebuild vs. incremental-update
- Freshness Immediate vs. on commit vs. deferred
38View Maintenance Update
- Rebuild
- Rebuild view(s) completely from base tables or
from other views when data changes - Multi-Query-Optimization
- Different views might be based on the same base
tables - Rebuild each view individually or combine rebuild
(e.g., first create an intermediate view M3 in
order to update view M1 and M2)? - Incremental-Update
- Compute a view delta by looking at the updates
only (and not at the base tables) and merge delta
with view - Views must satisfy certain properties (e.g., only
additive aggregation functions to support insert
and delete operations) - gt Question (for you) What about semi-additive
agg. functions?
39Example Rebuild
- Two materialized views (M1 and M2) exist that
contain - Total sales of 2008 in Switzerland (SUM)
- M1 groups-by quarter, region
- M2 groups-by region, group
- Rebuild M1 and M2 gt First build a view M3 that
- Aggregates the sales of 2008 in Switzerland and
- Groups-by quarter, region, group
- Example (Views M3 and M1)
View M3
View M1
40Example Incremental-update
- Materialized view M1 contains
- Total sales of 2008 in Switzerland (SUM)
- M1 groups-by quarter, region
- New data for quarter 4 is loaded into fact table
F - Reversals of products
- New sales of the day
- Incremental update of M1
- Derive a delta on the granularity of the view
- Merge delta with view
- Example (View M1 and Delta)
View M1
Delta
41Example Incremental-update
- Materialized view M1 contains
- Total sales of 2008 in Switzerland (SUM)
- M1 groups-by quarter, region
- New data for quarter 4 is loaded into fact table
F - Reversals of products
- New sales of the day
- Incremental update of M1
- Derive a delta on the granularity of the view
- Merge delta with view
- Example (View M1 and Delta)
Merged View M1 Delta
42Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
43Horizontal-Partitioning (see 12)
- Idea Split a table into n disjoint parts with
the same schema gt Union of all parts equals
original table - Benefits
- Skip certain partitions based on query predicates
when executing a filter operation (e.g., year
2008) gt pruning - Distribute partitions (to different
machines/cores) gt Parallelize certain query
operations (e.g., parallel hash joins) - Add / remove huge data sets (logically
corresponding together) in one operation - Partitioning Schemes Range, Hash, Round-Robin
44Horizontal-Partitioning By-Range
- Idea Partition a table by the value-ranges of a
certain attribute (called partitioning attribute)
of that table - A tuple belongs to a partition if the value of
the partitioning attribute falls into a certain
range of values - Problems Skew in data gt partitions with
different sizes - Example Fact table F contains an attribute year
- Create a partition for each relevant year F2006,
F2007, F2008, F2009 - A query filtering by F.year may be restricted to
read the corresponding partitions only - Example A query filtering by F.year 2008 only
has to read partition F2008
45Horizontal-Partitioning By-Hash
- Idea Partition a table by the hash-buckets of a
certain attribute (called partitioning attribute) - The partition a tuple belongs to is determined by
computing partition hash(attribute value)
number of partitions - Good hash-function Avoids skew in partition
sizes - Problems Range-queries need to read more
partitions - Example Fact table F contains an attribute year
- Create partitions F0, F1, F2, F3 and use hash
function F.year 4 - A query filtering by F.year may be restricted to
read the corresponding partitions only - Example A query filtering by F.year 2008 only
has to read partition F0 (2008 4 0)
46Horizontal-Partitioning By-Reference
- Idea Equi-partition tables that are connected by
a parent-child relationship (i.e., a 1m
relationship) - Example Partition the fact table and dimension
table by the same attribute (e.g., D2.year) - Using range-partitioning, the attribute D2.year
has to be replicated into the fact table gt
updates of dimensions are critical - Instead use reference (i.e., foreign-key
relationship) from fact-table to dimension table
in order to partition fact table by D2.year - New in Oracle 11gR1 (see 14)
47Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
48Indexing Basics
- Idea Find tuples of a table that have a given
value for a certain attribute (without executing
a table scan) - Example Index on attribute year (logical view)
Table D2
Index D2.year -gt RID
gt Typical Index Implementation B-tree (single
attribute)
49Example B-tree Index
Logical view on index
Nodes hold separators and pointers to children
B-tree index
Leaves hold keys andpointers to RID lists
31,33,
28,29,30,32
gt How does a lookup for year 2008 work? And
for 2006 year 2008?
50OLAP Indexing
- OLAP allows different indexing techniques than
OLTP - Update performance of indexes does not matter!
- Data is loaded in bulks (e.g., once a day)
- Afterwards, read-mostly data access
(multidimensional value-based access not
key-based access) - gt Indexes should be optimized for reads (not for
updates) - Multidimensional indexes are important (e.g., to
support queries with filters like D2.year2008
and D2.quarter4) - Main memory resident indexes increase read
performance (i.e., compression and
cache-optimizations are important) - gt Number / Size of indexes is only limited by
- Available disk space and time to load new data /
build indexes
51Bitmap-Indexes Motivation
- Problem Traditional indexes have problems to
support multiple dimensions - B-tree with (composite key)
- Concatenate values of different attributes and
index this value - gt Problem The major attribute determines the
index organization - R-tree (multidimensional index)
- Nodes hold n-dimensional overlapping regions
- gt Problem Too many overlapping regions make
lookup expensive - UB-tree (multidimensional index)
- Map a n-dimensional point to a z-value gt use
Btree - gt Problem Combinatorial explosion of possible
z-values
52Example R-Tree
- Example R-Tree for two attributes D2.year and
D2.quarter
Logical view (quarter, year) -gt RID
D2.quarter
Q4
Q3
Q2
Q1
2005
2006
D2.year
2007
2008
53Example R-Tree
- Example R-Tree for two attributes D2.year and
D2.quarter
R-tree index (quarter, year) -gt RID
D2.quarter
Q4
Q3
Q2
D2.quarter
Q1
19
18
22
23
2005
2006
D2.year
2007
2008
gt How does a lookup for (Q2, 2006) work here?
54Bitmap-Indexes Idea (see 1)
- Idea A bitmap-index stores a bit-vector Bv for
each distinct value v of a single attribute (e.g,
2 values for gender) - Each bit-vector has the size of the table (of
tuples in table) - The bit at position n of one bit-vector is set to
1 if the tuple with RIDn has the value
represented by that bit-vector - Bitmap indexes can efficiently support
multidimensional queries - Use one bitmap-index per attribute
- Boolean operations to combine bitmap-indexes
(AND, OR, NOT) are efficient because 32/64 bits
can be combined in 1 CPU cycle
55Example Bitmap-Index
Index D4.brand -gt RID
Bitmap Index D4.brand
Bitmap Index D4.group
Index D4.group -gt RID
56Example Bitmap-Index
- Query Sales in group Computers for brands
Dell, Lenovo)
SELECT SUM(F.price) FROM F, D4 WHERE F.D4
D4.id AND D4.group 'Computer' AND (D4.brand
'Dell' OR D4.brand 'Lenovo')
Bitmap Index D4.brand
- In order to find all relevant RIDs of dimension
D4 for that query - Calculate B BCom ? ( BDell ? BLen )
- B indicates RIDs that need to be read
Bitmap Index D4.group
B 110000 ? (100100 ? 010000)
110000
gt RIDs 0,1 of D4 need to be read
57Bitmap-Indexes Problems
- Bitmap-indexes for attributes with large domains
consume a lot of space - Example Assume that the attribute D4.brand has
1000 distinct values and the D4 table holds 1m
products - gt Space needed by bitmap-index for attribute
D4.brand 1bn bits 125m bytes ? 120Mb or 125
bytes per tuple (which is a lot) - For range-queries over attributes with large
domains many bit-vectors need to be read from
disk - Example Assume that we want to select all tuples
of table D2 (time) where 1999 year 2008 - gt 10 bit-vectors need to be read for attribute
year
58Decomposed Bitmap-Indexes
- Idea express all attribute values v as a linear
combination v c1v1 cnvn (c1cn are
constants) - Create a single bitmap-index for each variable
v1vn - gt Reduce space overhead for large domains
- Example Assume that we want to index an
attribute a with values 0999 - A standard bitmap-index would need 1000
bit-vectors - Decomposition v v3100 v210 v11 (where
vi 09 ) - We create 3 bitmap-indexes each with 10 different
bit-vectors for the values 09 gt we need 30
bit-vectors instead of 1000 - However, a point-query (e.g. a578) needs to read
all three bit-vectors (v1, v2, v3) and AND-them
59Example Decomposed Bitmap-Index
60Example Decomposed Bitmap-Index
61Example Decomposed Bitmap-Index
62Example Decomposed Bitmap-Index
63Example Decomposed Bitmap-Index
a576510071061
Bv3,5 ?Bv2,7 ?Bv1,6 00100
gt RID 3, ...
64Range-encoded Bitmap-Indexes
- Idea The bit at position n of a bit-vector Bv is
set to 1 if the value of the tuple at RIDn is
less equal v - We do not need a bit-vector for the max. value of
a domain (this is a tautology all tuples hold
values less equal max. value) - gt Reduce number of bit-vectors to be read for
range queries - Example Assume that we want to index the
attribute D2.year with values 1999 2009 - We need bit-vectors for the value 19992008
B1999, , B2008 - The range-query 2001 year 2004 needs to read
the two bit-vectors (B2000, B2004) and calculate
B2004 AND (NOT B2000)
65Example Range-encoded Bitmap-Indexes
2001 D2.year 2004
B2004 ? NOT B2000 01010
gt How to process a point query like
?
D2.year2004
66Bitmap-Indexes Compression
- Many entries of a bit map may be zero (if the
attribute uses a lot of different values) - Therefore compression techniques may be applied
- gt At query time bitmaps need to be decompressed
- Compression Techniques
- Decomposed Bitmap-Indexes
- Run length encoding (RLE)
- Approximate bitmaps ( e.g., similar to
bloom-filters)
67Approximate Bitmap-Indexes
- Idea For table of size n (i.e., having n rows),
we can use a bitmap-index that uses bit-vectors
of size m ?? n - Apply hash function to determine the index i into
the bit-vector for a given rid i hash(rid)
(e.g., i rid m) - Standard Attribute a has value v at pos. rid ?
Bvrid 1 - Approx. Attribute a has value v at pos. rid ?
Bvhash(rid) 1 - This means
- Bvhash(rid) 1 Attribute a could have value
v at position rid - Bvhash(rid) 0 Attribute a does not have
value v at position rid - Approx. bitmaps may return a superset of
potential RIDs - False positives have to be post-filtered using
the predicate
68Example Approximate Bitmap-Indexes
- Example Table D2 has 10 rows that are mapped to
bit-vectors of size 4 by using rid 4 as a hash
function - For a query with D2.year2008, we know
- B200811 gt tuples at pos. 5, 9, 13, might have
value 2008 - gt Not all hash-functions have an
inverse-functions -gt use probing
For rid9 with D2.year2008, we set B200894
B20081 1
69Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
70Join-Indexes (see 2)
- Special case of a materialized view using RIDs
- For two tables R and S a join index may have
three different types - Type 1 Join value -gt (R.RID, S.RID) i.e.,
all rows of R and S that have the specified join
value - Type 2 R.RID -gt S.RID, i.e., all rows of S
that join with row R.RID - Type 3 R.attribute -gt S.RID, i.e., all rows of
S that join with those rows in R having the
specified attribute value (not the join key)
71Example Join-Index (Type 12)
D3 Region
F Sales
n
1
- Type 1 Join val.-gt(D3.RID, F.RID)
- 1 -gt (0, 0, 5)
- 2 -gt (1, 2, 3)
- 3 -gt (2, 1, 4)
- 4 -gt (3, 6)
- Type 2 D3.RID -gt F.RID
- 0 -gt 0, 5
- 1 -gt 2, 3
- 2 -gt 1, 4
- 3 -gt 6
- gt Can be derived from Type 1
72Example Join-Index (Type 3)
D3 Region
F Sales
n
1
- Type 2 D3.RID -gt F.RID
- 0 -gt 0, 5
- 1 -gt 2, 3
- 2 -gt 1, 4
- 3 -gt 6
- Type 3 D3.region -gt F.RID
- Zurich -gt 0, 5
- Basel -gt 2, 3
- Luzern -gt 1, 4
- Bavaria -gt 6
- gt Can be derived from Type 2
gt How to create a type 3 index from type 2 (if
the attribute values are not distinct)?
73Example Join-Index (Type 3)
D3 Region
F Sales
n
1
- Type 2 D3.RID -gt F.RID
- 0 -gt 0, 5
- 1 -gt 2, 3
- 2 -gt 1, 4
- 3 -gt 6
- Type 3 D3.country -gt F.RID
- Switzerland -gt 0, 1, 2, 3, 4, 5
- Germany -gt 6
- gt Can be derived from join-index (type 2) by
- Joining index with D3 on RID
- Merging F.RID lists with same values
74Join-Index Implementation (1)
- Question How to represent RID lists?
- As compact lists (Similar to inverted lists in
IR) - Uncompressed
- value -gt int,int,int, terminator or
- value -gt number of RIDS, int,int,int
- Compressed (using some compression scheme)
- value -gt int, diff to previous value, diff to
previous value, term. - As bitmaps
- Works well if total number of RIDs is not too
large and the cardinality of the indexed
attribute is low
75Join-Index Implementation (2)
- Question Which RID-list representation should be
used? - Depending on the cardinality of an attribute
RID-list representations may be switched - Attribute value frequent (i.e., one value is used
in many tuples) bitmaps may be a good choice - Attribute value rare (i.e., one value is used
only in some tuples) RID lists may be better - A clever index implementation may switch between
the two representations at any time - Therefore, an ordered RID list can be regarded as
a way to compress a bitmap (and vice-versa)
76Example Bitmap Join-Index (Type 2)
D3 Region
F Sales
n
1
- Type 2 D3.RID -gt F.RID
- 0 -gt 0, 5
- 1 -gt 2, 3
- 2 -gt 1, 4
- 3 -gt 6
- of bit-vectors D3
- Bit-vector length F
77Example Bitmap Join-Index (Type 3)
D3 Region
F Sales
n
1
- Type 3 D3.country -gt F.RID
- Switzerland -gt 0, 1, 2, 3, 4, 5
- Germany -gt 6
- gt Can be derived from Type 2 which maps D3.RID
-gt F.RID
Bitmap (Type 2)
B0 ? B1 ? B2 BSwitzerland
78Star-Join using Bitmap Indexes
- Semi-Join plan Read only those tuples from F
that contribute to the result by pre-selecting
the relevant RIDs using semi-joins - Execution Plan
- Selections Use bitmap-indexes
- Semi-join Use join-index (type 2)
- Alternatively Plan
- Selection and Semi-JoinUse join-index (type 3)
- Intersection of RIDs
- AND all bit-vectorsreturned from semi-joins
- Read tuples from F
- Use index-nested-loop join on RID
79Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
80Wait-Time in DBMS
from 17
- Problem Even in a well-tuned DBMS wait time can
dominate! Why?
81Memory Hierarchy
CapacityAccess Time
Control Granularity
Bytes1-5ns
Register
Compiler 1-8 Bytes
K-MBytes2-20ns
Cache Controller 8-128 Bytes (Cache Line)
Cache lt-gt Main- Memory Gap
M-GBytes50-100ns
Operating System 1-16 Kbytes(Disk Block, Page)
Main-Memory lt-gt HD Gap
GBytesmsec
Operator Files (Mbytes)
TBytessec
Operator Files (Mbytes)
TBytessec-min
82 and things are getting worse
- HD capacity vs. HD access time
- CPU clock rate vs.
- Main memory access time
Improvement
Improvement
10000
100
1000
100
10
10
1
1
1970
1980
1990
2000
1970
1980
1990
2000
gt We can store more and more data while
accessing the data relatively gets slower
83Wait-Time in DBMS due to I/O
from 17
- Wait time of CPU processes in current DBMS due to
I/O - Bring data from hard disk (rotating) into main
memory gt disk seek - Bring data from main memory to CPU caches gt
cache misses - gt Goal Increase temporal/spatial locality of
data in memory hierarchy
84N-ary Storage Model in DBMS (NSM)
NSM (Disk Page)
D3 Region
Header
1
2
Zurich
Switzerland
3
Basel
Switzerland
4
Luzern
Switzerland
5
Bavaria
Germany
Baden-W.
Germany
o
o
o
o
o
- In a DBMS data is stored clustered by pages on a
disk - Page Layout N-ary Storage Model / Slotted Page
- All tuples are stored sequentially
- All attributes of a tuple are stored together
- gt Question What happens in the memory hierarchy?
85NSM in Memory Hierarchy
SELECT id FROM D3 WHERE country'Switzerland'
CPU Cache
Disk
Main Memory
Cache Line 1
Cache Line 2
Cache Line 3
Cache Line 4
- () NSM optimized for full-tuple access (as in
OLTP) - (-) Partial-tuple access hurts at all memory
levels
86Vertical Partitioning
- Idea Divide a table into multiple tables that
contain fewer columns (each stored on disk using
the NSM) - Cluster attributes into one table that are
accessed together - Unique key column is replicated in all tables for
joining - When scanning a table only relevant data must be
read from disk into main memory and then into CPU
caches
D3 Region
D3 Region (vertically partitioned)
87Decomposed Storage Model (DSM)
- Idea Extreme Case of vertical partitioning (see
3) - Divide a table into a set of two-column tables
(RID, attribute) gt one table for each attribute - Alternatively divide a table into one-column
tables storing the RID implicitly (array-like
representation) - Store resulting tables on separate disk pages
D3 Region
D3 Region (decomposed)
88Decomposed Storage Model (DSM)
D3 Region (decomposed)
Disk Pages
Header
Header
Header
1
2
3
Zurich
Switzerland
5
4
Basel
Switzerland
Luzern
Switzerland
Bavaria
Germany
Baden-W.
Germany
- () Very efficient when only a few attributes of
a table are accessed - (-) Full-tuple access needs to read data from
different disk pages
89Column-Stores
- Specialized stores that use DSM to layout data on
disk - Used to speed up OLAP (i.e., read-only) workloads
- Questions What is different to traditional
row-oriented stores (as todays DBMS) that use
vertical partitioning? - Late materialization columns that are read from
disk are joined together into rows as late as
possible in a query plan gt avoid reading data
that is not relevant for the query result - Vectorized Query Processing Multiple values from
a column are passed as a block from one operator
to the next rather than using per-tuple iterators
gt inefficient for column-stores - Lightweight compression Reduce I/O to read data
from disk and enable query processing on
compressed data (e.g, RLE)
90Column-Stores vs. Row-Stores (1)
- Recent Study (see 4) Scanning a Table (NSM vs.
DSM) - Tuple width 150 bytes, 16 attributes, 9.5GB
table - Read 10 data of table using a filter operation
and vary the number of selected attributes (i.e.,
selected bytes per tuple)
91Column-Stores vs. Row-Stores (2)
- Two other studies 5, 6 Simulating a
column-store using a row-store - Both use Star Schema Benchmark for performance
evaluation - 5 compares a column-store to another row-store
using - Vertically-partitioned tables
- Tables which have one index per attribute using
index-only plans - Materialized views (as optimal case)
- 6 compares scan of relevant attributes (i.e.,
optimal column-store) to - A vertical partitioned row-store using a special
logical schema (explicit RLE) - (Generalized) Materialized views
- Fight between column-store and row-store industry
- 5 A row-store can not reach the performance of
a column-store for read-mostly workloads because
column-stores use additional optimizations - 6 We can teach a row-store the same tricks
92Other Storage Models
- PAX (Partition Attributes Across) (see 9)
- Store all attributes of a tuple together in one
disk page - Inside one disk page store tuples partitioned
column-wise - Fractured-mirrors (see 8)
- Keep both representations row-store and column
store - Depending on type of query pick appropriate store
- Rodent-store (see 7)
- Declaratively specify storage layout A table
uses different storage layout for different
attributes (row-wise, column-wise, ) - Generate code for data access optimized for that
layout
93Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
94Compression Query Processing
- Heavyweight compression schemes
- Data is stored highly compressed on disk
- When data is read from disk it is decompressed
immediately - Afterwards query processing operations (i.e.,
filters, joins, ) execute on decompressed data - gt Improve I/O performance (i.e., load data from
disk into main memory) but high CPU-costs - Lightweight compression schemes
- Data is sub-optimally compressed on disk
- When data is read from disk it is not
decompressed immediately - Many query processing operations (i.e., filters,
joins, ) can be executed on compressed data - Optimally, data is only decompressed when the
final query result is built - gt Improve I/O performance and query execution
performance
95Lightweight Compression Schemes (in
Column-Stores)
- Run-length-encoding
- Compresses runs of the same value in a column to
a compact singular representation (value, run
length) - Example
- Store lt3, 3, 3, 3, 10, 10, 4, 4gt as lt(3,4),
(10,2), (4,2)gt - Aggregation (SUM) 3410242 instead of
3333101044 - Dictionary-encoding
- Replace long values (e.g., strings) with smaller
codes - Example
- Replace values female and male of column
gender with 0 and 1 - A column ltmale, male, male, female,
female, male, female, malegt can be stored
as lt1,1,1,0,0,1,0,1gt - Filter gendermale (string compare) is
rewritten as gender1 (integer compare) - Many other compression schemes (see 10)
96Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
97Optimizing for CPU caches
- Memory references do not have uniform costs
anymore - Gap between cache access and main memory access
- gtImprove cache behavior (increase temporal and
spatial cache locality of data and instructions) - Cache conscious data structures and algorithms
- Exploit cache with detailed knowledge about cache
sizes - Specialized approach
- Cache oblivious data structures and algorithms
- Exploit cache without detailed knowledge about
cache sizes - General approach
98CSS-Tree (see 11)
- Cache-Sensitive-Search-Tree Building a
cache-conscious directory over a sorted array of
elements - Improve on B trees
- Store only separators in nodes and no pointers
- gt More separators per node (higher fan-out,
lower height) - All nodes are stored in one contiguous array
- gt Identify array offsets of child nodes by
arithmetic operations - Node size should not exceed cache line
- gt Minimize cache misses when loading a node
- Supports very efficient read operations und bulk
loads but updates are very expensive gt rebuild
tree
99Example CSS-Tree (1)
- Logical representation (Keys per node m3)
- Physical Representation(contiguous array)
Internal nodes
Leaf nodes (not stored)
Sorted Array
E.g., leaf 13 element 22 (sorted array)
Sorted Array
Node 2 (values)
Child Node
9
10
11
12
100Example CSS-Tree (2)
- Lookup value v on node n
- Binary search on node -gt key at pos. k
- Compute offset of child node into array c
n(m1) (k1) - If c gt array of internal nodes then offset into
sorted array of leaves must be calculated (see
11) - Example
- Lookup value 253 on node n2 -gt k1
- Next child node offset -gt c 24 2 10
Logical
Physical
101CSB-Tree (see 12)
- Similar to B-tree
- Store all children of one node in one contiguous
array - Store only one pointer for all children of a node
- Lookup more efficient than for B-tree gt more
keys per node - Compared to CSS-tree
- Lookup is little less efficient
- Updates are much cheaper
- Lookup value v on node n
- Binary search on node -gt key k
- Child node offset c k(m1)
- Lookup value 25 on root -gt Key k 1
- Child node offset-gt c 1 3 3
102Agenda
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans
103Collaborative Scans
- Idea Share scan cursors among queries that have
to scan the same table - Example
- Q1 SELECT SUM(F.price) FROM F, D3 WHERE
F.d3D3.id AND D3.country 'Germany' - Q2SELECT AVG(F.price), D3.region FROM F, D3
WHERE F.d3D3.id AND D3.country 'Germany'
GROUP BY D3.region - gt Q1 and Q2 can share the same scan cursors for
F and D3
104Summary
- Query Processing (over Star Schema)
- Implementation Techniques in RDBMS (ROLAP)
- Materialized Views
- Horizontal-Partitioning
- Bitmap-Indexes
- Join-Indexes
- Optimizations for Memory Hierarchy
- Vertical-Partitioning / DSM
- Compression
- CSS-/CSB-Tree
- Collaborative Scans