Title: Improved Query Performance With Variant Indexes
1Improved Query Performance With Variant Indexes
- Patrick ONeil, Dallan Quass
Presented by Bo Han
2Motivation
- 1. Speed up the queries on data warehoues
- data warehoues are large
- read-mostly
- always perform queries of aggregate, filter, and
group the data
3Motivation
- 2. The first rigorous examination of variant
- indexes in the literature
- Advantages over traditional Value-List
- indexes for certain classes of queries
- More than one type of index available on
- a column
4Motivation
- 3. Introducing a new indexing approach to support
OLAP-type queries - Datacube
- Multi-dimensional query
- Depends on summary tables
5Value-List Index (B tree)
Mianus
Leaf Node
Brighton Downtown
A212 Brighton 750
A101 Downtown 500
A110 Downtown 600
RID
Problem A key values will have large number of
associated RIDs!
6Bitmap Indexes
Bitmap for Brand Dell 1001000 HP
0100101 Sony 0010000 IBM 0000010
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
Bitmap for Size 12 0000011 13
0100000 14 1001100 15 0010000
- A Bitmap for a value an array of bits. The ith
bit is set to 1 - if the ith record has the value
- A Bitmap index consists of one bitmap for
each value that - attribute can take
- A Bitmap is an alternate method of representing
RID-lists in - a Value-List index (low-cardinality)
7Bitmap Indexes
- 1. More space efficient than RID lists in a
Value-List index - No compression
- RID32bits, rown, distinct valuem
- If mlt32 mnlt32n
- Compression Run-length encoding
- 2. More CPU efficient for may functions
- Boolean operations
- ex1 Select Brand From Product
- Where BrandHP and Size13
(AND) - ex2 Select Pid From Product
- Where Sizegt12 and Sizelt15
(OR)
8Bitmap Indexes
- 2. More CPU efficient for may functions
- Count
- Select count() From Product
- Where BrandDell and Sizegt14
- Each individual bitmap is small and frequently
used ones can be cached in memory - 4. Available in most major commercial DBMS
9Projection Index
A projection index for column duplicates all
column values for lookup by ordinal number .
Col1 Col2 v1 v2 . . . vk Col3 Col4
Col2 v1 v2 . . . vk
- Easy to locate
- N1000ps (p page, s slot)
- Few disk I/O
projection index for col2
10Bit-Sliced Index
A set of bitmap slices which are orthogonal to
the data held in a projection index. (i.e. a
bitwise vertical partition)
0 1 0 1 0 1 0 1
1 1 1 1 0 0 0 1
0 0 0 1 1 0 0 0
1 1 1 1 0 0 0 0
0 0 0 1 1 1 0 0
0 0 0 0 0 0 1 1
B5
bit-slice
B4
B3
B2
B1
B0
Col2 20 52 20 62 10 34 1 49
Bnn bitmap representing set of non null
values in the indexed column
11Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
Select SUM(doloar_sales) From Sales Where
condition Analyze the disk page I/O cost Plan 1
Direct access to the rows to calculate the Sum
100million rows, Len(row)200B,
page4K 20rows/page,
Foundset2million rows
Plan 2 Calculating Sum through a Projection
Index Len(doloar_sales)4B,
1000rows/page 100,000 pages
12Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
Plan 3 Calculating the Sum through a
Value-List(Bitmap) Index if (COUNT
(Bf AND Bnn) 0) Return
null SUM 0.0 for each
non-null value v in the index for C
Designate the set of rows with value v as
Bv SUM v COUNT(Bf AND Bv)
Return SUM
Bf 100,000,000bits12,500,000B?3125
pages Bv 100,000,000RIDs of 4 bytes
each?100,000 pages Total
103,125pages
13Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
- Plan 4 Calculating the SUM through a Bit-Sliced
Index - if (COUNT (Bf AND Bnn) 0)
Return null SUM 0.0
for i 0 to N SUM COUNT(Bi
AND Bf) Return SUM - Bf 100,000,000bits12,500,000B?3125
pages - 2 million rows
?21Bitmaps - Total22312568,750 pages
14Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
Method I/O CPU contribution
Add from Rows 1,341 K I/O 2M(25 ins)
Projection index 100K I/O 2M(10 ins)
Value-List index 103K I/O 100M(10 ins)
Bit-Sliced index 69K I/O 197M(1 ins)
15Evaluating Aggregate Function
Aggregate Value-List Index Projection Index Bit-Sliced Index
COUNT Not needed Not needed Not needed
SUM Not bad Good Best
AVG Not bad Good Best
MAX/MIN Best Slow Slow
MEDIAN,N-Tile Usually Best Not Useful Sometimes Best
Column-Product Very Slow Best Very Slow
16Range Evaluation Performance
Range Evaluation Value-List Index Projection Index Bit-Sliced Index
Narrow Range Best Good Good
Wide Range Not Bad Good Best
17Evaluating OLAP-style Queries
- OLAP approach creates precalculates results of
some Grouped Queries and stores them in summary
tables. - The expected set of queries is known in
advance? - Size of data in summary tables grows as the
product of the number of values in the
independent dimensions (space requirement?) -
- How to speed up Join and Group By ?
- Join Indexes and Bitmap-Join-Indexes
18Join Indexes
A join index an index on one table that involves
a column value from different table through a
commonly encountered join.
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
Cid Pid Dollar_sales Unit
0100 0000001 1200 1
0101 0000003 2600 2
0110 0000002 1600 1
0111 0000001 1200 1
Cid Size
0100 14
0111 14
0110 13
0101 15
19Bitmap Join Index
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
A Bitmap Join Index spans multiple tables and
improves query performance between the joined
tables.
Cid Pid Dollar_sales Unit
0100 0000001 1200 1
0101 0000003 2600 2
0110 0000002 1600 1
0111 0000001 1200 1
Dell HP Sony IBM
1 0 0 0
0 0 1 0
0 1 0 0
1 0 0 0
20Bitmap Join Index
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
Select Sum(Dollar_sales) From Sales S
Natural Join Product P
Natural Join Customer C Where
P.BrandDell AND
C.StatePA
Cid State
0100 CA
0101 NY
0110 CA
0111 PA
Cid Pid Dollar_sales Unit
0100 0000001 1200 1
0101 0000003 2600 2
0110 0000002 1600 1
0111 0000001 1200 1
1
0
0
1
0
0
0
1
0
0
0
1
and
21Calculating Groupset Aggregates
- Select Sum(F.A) From S,D1,D2,D3
- Where condition
- Group by D1.d1, D2.d2, D3.d3
- Using Value-List index to determine Groupset
(F.diDi.di, without join!) - Using Projection index on F.A to get SUM(F.A)
22Improved Grouping Efficiency
Problem Groupsets and rows are randomly placed
on disk. Segmentation Partition rows in F into
Segments. Query
evaluation one segment at a time. Clustering
Cluster the fact table F
D1 d1-1 111111111111111111111110000000000000
000000 d1-2 0000000000000000000000011
11111111111111000 D2 d2-1
111111000000000000000001111110000000000000
d2-2 000000111111100000000000000001111111100
000 D3 d3-1 11000011000000000000
0001111110000000000000 d3-2
001100001100000000000000000001111111100000
d3-n3 00001100000110000000000000000
1111111100000 (d1-1, d2-1, d3-1)
11000000000000000000000000000000000000 (d1-1,
d2-1, d3-2) 0011000000000000000000000000000000000
0
Groupset Indexes Keyvalues are a concatenation
of the
dimensional primary-key values
23Conclusion
- Analyze Value-List index, Bitmap index,
Projection index and Bit-Sliced index - Combine Bitmap indexing and physical row
clustering to evaluate OLAP queries involving
aggregation and grouping
24Reference
- Improved Query Performance With Variant Indexes
Patrick ONeil and Dallan Quass, Proc. ACM SIGMOD
Conf. 1997, Pages 38-49. - Bitmap Index Design and Evaluation C.Y. Chan
and Y.E. Ioannidis 1998. 6 - Database System Implementation Hector Garcia
M., Jeffrey D.U. and Jennifer W., Prentice Hall,
2000 - Encoded Bitmap Indexing for Data Warehouses
M.C. Wu and A.P. Buchmann 1998. 2 - An Efficient Bitmap Encoding Scheme for Selection
Queries C.Y. Chan and Y.E. Ioannidis 1998. 6 - Multidimensional Indexing and Query Coordination
for Tertiary Storage Management A. Shoshani and
L.M. Bernardo, etc. 1999. 10 - Multi-Table Joins Through Bitmapped Join Indices
P. ONeil and G. Graefe 1995. 9