Improved Query Performance With Variant Indexes - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Improved Query Performance With Variant Indexes

Description:

Improved Query Performance With Variant Indexes Patrick O Neil, Dallan Quass Presented by Bo Han Motivation 1. Speed up the queries on data warehoues data warehoues ... – PowerPoint PPT presentation

Number of Views:91
Avg rating:3.0/5.0
Slides: 25
Provided by: Gues5246
Learn more at: https://cis.temple.edu
Category:

less

Transcript and Presenter's Notes

Title: Improved Query Performance With Variant Indexes


1
Improved Query Performance With Variant Indexes
  • Patrick ONeil, Dallan Quass

Presented by Bo Han
2
Motivation
  • 1. Speed up the queries on data warehoues
  • data warehoues are large
  • read-mostly
  • always perform queries of aggregate, filter, and
    group the data

3
Motivation
  • 2. The first rigorous examination of variant
  • indexes in the literature
  • Advantages over traditional Value-List
  • indexes for certain classes of queries
  • More than one type of index available on
  • a column

4
Motivation
  • 3. Introducing a new indexing approach to support
    OLAP-type queries
  • Datacube
  • Multi-dimensional query
  • Depends on summary tables

5
Value-List Index (B tree)
Mianus
Leaf Node
Brighton Downtown
A212 Brighton 750
A101 Downtown 500
A110 Downtown 600

RID
Problem A key values will have large number of
associated RIDs!
6
Bitmap Indexes
Bitmap for Brand Dell 1001000 HP
0100101 Sony 0010000 IBM 0000010
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
Bitmap for Size 12 0000011 13
0100000 14 1001100 15 0010000
  • A Bitmap for a value an array of bits. The ith
    bit is set to 1
  • if the ith record has the value
  • A Bitmap index consists of one bitmap for
    each value that
  • attribute can take
  • A Bitmap is an alternate method of representing
    RID-lists in
  • a Value-List index (low-cardinality)

7
Bitmap Indexes
  • 1. More space efficient than RID lists in a
    Value-List index
  • No compression
  • RID32bits, rown, distinct valuem
  • If mlt32 mnlt32n
  • Compression Run-length encoding
  • 2. More CPU efficient for may functions
  • Boolean operations
  • ex1 Select Brand From Product
  • Where BrandHP and Size13
    (AND)
  • ex2 Select Pid From Product
  • Where Sizegt12 and Sizelt15
    (OR)

8
Bitmap Indexes
  • 2. More CPU efficient for may functions
  • Count
  • Select count() From Product
  • Where BrandDell and Sizegt14
  • Each individual bitmap is small and frequently
    used ones can be cached in memory
  • 4. Available in most major commercial DBMS

9
Projection Index
A projection index for column duplicates all
column values for lookup by ordinal number .

Col1 Col2 v1 v2 . . . vk Col3 Col4
Col2 v1 v2 . . . vk
  • Easy to locate
  • N1000ps (p page, s slot)
  • Few disk I/O

projection index for col2
10
Bit-Sliced Index
A set of bitmap slices which are orthogonal to
the data held in a projection index. (i.e. a
bitwise vertical partition)
0 1 0 1 0 1 0 1
1 1 1 1 0 0 0 1
0 0 0 1 1 0 0 0
1 1 1 1 0 0 0 0
0 0 0 1 1 1 0 0
0 0 0 0 0 0 1 1
B5
bit-slice
B4
B3
B2
B1
B0
Col2 20 52 20 62 10 34 1 49
Bnn bitmap representing set of non null
values in the indexed column
11
Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
Select SUM(doloar_sales) From Sales Where
condition Analyze the disk page I/O cost Plan 1
Direct access to the rows to calculate the Sum
100million rows, Len(row)200B,
page4K 20rows/page,
Foundset2million rows
Plan 2 Calculating Sum through a Projection
Index Len(doloar_sales)4B,
1000rows/page 100,000 pages
12
Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
Plan 3 Calculating the Sum through a
Value-List(Bitmap) Index if (COUNT
(Bf AND Bnn) 0) Return
null SUM 0.0 for each
non-null value v in the index for C
Designate the set of rows with value v as
Bv SUM v COUNT(Bf AND Bv)
Return SUM
Bf 100,000,000bits12,500,000B?3125
pages Bv 100,000,000RIDs of 4 bytes
each?100,000 pages Total
103,125pages
13
Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
  • Plan 4 Calculating the SUM through a Bit-Sliced
    Index
  • if (COUNT (Bf AND Bnn) 0)
    Return null SUM 0.0
    for i 0 to N SUM COUNT(Bi
    AND Bf) Return SUM
  • Bf 100,000,000bits12,500,000B?3125
    pages
  • 2 million rows
    ?21Bitmaps
  • Total22312568,750 pages

14
Comparison of Indexes (evaluating Single-Column
Sum Aggregates)
Method I/O CPU contribution
Add from Rows 1,341 K I/O 2M(25 ins)
Projection index 100K I/O 2M(10 ins)
Value-List index 103K I/O 100M(10 ins)
Bit-Sliced index 69K I/O 197M(1 ins)
15
Evaluating Aggregate Function
Aggregate Value-List Index Projection Index Bit-Sliced Index
COUNT Not needed Not needed Not needed
SUM Not bad Good Best
AVG Not bad Good Best
MAX/MIN Best Slow Slow
MEDIAN,N-Tile Usually Best Not Useful Sometimes Best
Column-Product Very Slow Best Very Slow
16
Range Evaluation Performance
Range Evaluation Value-List Index Projection Index Bit-Sliced Index
Narrow Range Best Good Good
Wide Range Not Bad Good Best
17
Evaluating OLAP-style Queries
  • OLAP approach creates precalculates results of
    some Grouped Queries and stores them in summary
    tables.
  • The expected set of queries is known in
    advance?
  • Size of data in summary tables grows as the
    product of the number of values in the
    independent dimensions (space requirement?)
  • How to speed up Join and Group By ?
  • Join Indexes and Bitmap-Join-Indexes

18
Join Indexes
A join index an index on one table that involves
a column value from different table through a
commonly encountered join.
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
Cid Pid Dollar_sales Unit
0100 0000001 1200 1
0101 0000003 2600 2
0110 0000002 1600 1
0111 0000001 1200 1
Cid Size
0100 14
0111 14
0110 13
0101 15
19
Bitmap Join Index
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
A Bitmap Join Index spans multiple tables and
improves query performance between the joined
tables.
Cid Pid Dollar_sales Unit
0100 0000001 1200 1
0101 0000003 2600 2
0110 0000002 1600 1
0111 0000001 1200 1
Dell HP Sony IBM
1 0 0 0
0 0 1 0
0 1 0 0
1 0 0 0
20
Bitmap Join Index
Pid Brand P_type Size
0000001 Dell P1 14
0000002 HP P2 13
0000003 Sony P1 15
0000004 Dell P1 14
0000005 HP P2 14
0000006 IBM P3 12
0000007 HP P2 12
Select Sum(Dollar_sales) From Sales S
Natural Join Product P
Natural Join Customer C Where
P.BrandDell AND
C.StatePA
Cid State
0100 CA
0101 NY
0110 CA
0111 PA
Cid Pid Dollar_sales Unit
0100 0000001 1200 1
0101 0000003 2600 2
0110 0000002 1600 1
0111 0000001 1200 1
1
0
0
1
0
0
0
1
0
0
0
1

and
21
Calculating Groupset Aggregates
  • Select Sum(F.A) From S,D1,D2,D3
  • Where condition
  • Group by D1.d1, D2.d2, D3.d3
  • Using Value-List index to determine Groupset
    (F.diDi.di, without join!)
  • Using Projection index on F.A to get SUM(F.A)

22
Improved Grouping Efficiency
Problem Groupsets and rows are randomly placed
on disk. Segmentation Partition rows in F into
Segments. Query
evaluation one segment at a time. Clustering
Cluster the fact table F
D1 d1-1 111111111111111111111110000000000000
000000 d1-2 0000000000000000000000011
11111111111111000 D2 d2-1
111111000000000000000001111110000000000000
d2-2 000000111111100000000000000001111111100
000 D3 d3-1 11000011000000000000
0001111110000000000000 d3-2
001100001100000000000000000001111111100000
d3-n3 00001100000110000000000000000
1111111100000 (d1-1, d2-1, d3-1)
11000000000000000000000000000000000000 (d1-1,
d2-1, d3-2) 0011000000000000000000000000000000000
0
Groupset Indexes Keyvalues are a concatenation
of the
dimensional primary-key values
23
Conclusion
  • Analyze Value-List index, Bitmap index,
    Projection index and Bit-Sliced index
  • Combine Bitmap indexing and physical row
    clustering to evaluate OLAP queries involving
    aggregation and grouping

24
Reference
  • Improved Query Performance With Variant Indexes
    Patrick ONeil and Dallan Quass, Proc. ACM SIGMOD
    Conf. 1997, Pages 38-49.
  • Bitmap Index Design and Evaluation C.Y. Chan
    and Y.E. Ioannidis 1998. 6
  • Database System Implementation Hector Garcia
    M., Jeffrey D.U. and Jennifer W., Prentice Hall,
    2000
  • Encoded Bitmap Indexing for Data Warehouses
    M.C. Wu and A.P. Buchmann 1998. 2
  • An Efficient Bitmap Encoding Scheme for Selection
    Queries C.Y. Chan and Y.E. Ioannidis 1998. 6
  • Multidimensional Indexing and Query Coordination
    for Tertiary Storage Management A. Shoshani and
    L.M. Bernardo, etc. 1999. 10
  • Multi-Table Joins Through Bitmapped Join Indices
    P. ONeil and G. Graefe 1995. 9
Write a Comment
User Comments (0)
About PowerShow.com