Title: Variant Indexes
1Variant Indexes
2Specialized Indexes?
- Data warehouses are large databases with data
integrated from many independent sources. - Queries are often complex.
- Updated in a batch fashion.
- Goal develop specialized indexes to speed up
query evaluation.
3Bitmap Indexes
- Value-list index same row may appear under many
key values in index - RID-list list a key value once at head, which
is followed by a long list of RIDs for rows with
this key value. - Bitmap alternate form for representing
RID-lists in a Value-list index - Bitmaps are more space efficient than RID-lists
when the number of key values is low (1)
4Creating Bitmap Index
- For n rows of T r1,,rn, do 1-1 mapping from
T to ZM, the 1st M positive integers. - For a given row r with row j, the table page
number accessed to retrieve row r is j/p and the
page slot is jp.
5- A Bitmap B is a sequence of M bits. Each row r
with row j that has property P, we set bit j in
B to 1, all other bits are zero. - A Bitmap index for a column C with values v1,,vk
is a B tree with entries having these key values
and associated data portions that contain Bitmaps
for the properties Cv1,,Cvk.
6- Dense Bitmaps if the proportion of 1-bits in
the Bitmap is large. - A Bitmap index for a column with 32 values will
have average density 1/32. Space needed for
Bitmap is comparable to disk space needed for
RID-list index, which requires 32 bits for each
RID present. - For a column index with a very small of values,
the Bitmaps will have high densities and disk
savings is enormous.
7- Boolean operations are extremely fast.
- Can use a bunch of for loops to execute. For long
Bitmaps, take fragments out from disk. (2) - Loops for Bitmap AND, OR, NOT, or COUNT are
extremely fast compared to loop operations on RID
lists. (3)
8Segmentation
- To optimize Bitmap index access, can be broken
into fragments of equal sizes - RIDs can be small bytes in length, they specify
position in segment. At beginning, high order
bits can be specified. (4) - If one of predicate indexes has no pointer to a
Bitmap Fragment for a segment, then the segment
fragments for the other indexes can be ignored as
well, can save a lot of I/O.
9- Bitmap representations and RID-list
representations are interchangeable, when Bitmap
representations are dense, Bitmaps are much more
efficient than RID-lists, for storage use and
Boolean operations. When too sparse, Bitmap index
can contain RID-list.
10Projection Indexes
- Say C is a column of a table T then the
Projection index on C consists of a stored
sequence of column values from C, in order by row
number in T from which the values are extracted.
11- Projection index very efficient when column
values must be retrieved from all rows of a found
set. - For example, say column values are 4 bytes in
length, 1000 values will fit on a 4KB page. Say
each row is 200 bytes, then using index requires
1/20 I/0s of getting whole rows.
12- Like vertical partitioning, good when small of
columns are retrieved by most SELECT statements,
bad when most queries retrieve most of the
columns. (5)
13Bit-Sliced Indexes
- A Bit-Sliced index stores a set of Bitmap
slices which are orthogonal to the data held in
a Projection index. Good for calculating
aggregates in a found set. - Each individual Bitmap Bi is called a Bit-slice
of the column. There is one Bitmap for Bnn for
not null and one Bitmap for Bn for null (6)
14Comparing Indexes for Aggregate Evaluation
- SELECT SUM (dollar_sales)
- FROM SALES
- WHERE condition
15- Say found set Bf (2 million rows) is already
determined. - SALES table has 100 million rows, which are each
200 bytes in length, stored 20 rows on a 4KB disk
page.
16Query Plan 1
- Direct access to rows to calculate SUM
- Each disk page contains 20 rows, have 5,000,000
disk pages occupied by SALES table (100 millon
rows / 20 rows). Since Bf represents only 1/50
rows of SALES table, disk pages occupies is
5,000,000(1 e-2,000,000/5,000,000)
1,648,4000.
17- This means that, if one disk arm retrieves 100
disk pages per second, you will need 4 hours of
disk arm use! (1,648,400/100 16,484 seconds 4
hours)
18Query Plan 2
- Get SUM with Projection Index
- The dollar_sales Projection index has 1000 values
per 4KB disk page. Requires 100,000 disk pages.
This equals 1000 seconds (17 minutes)
19Query Plan 3
- Get SUM with Value-list index on dollar_sales.
- Can calculate SUM(dollar_sales) for our Foundset
by ranging through all possible values in the
index and determining the rows with each value,
the determining how many rows with each value are
in the Foundset, and finally multiplying that
count by the value and adding to the SUM.
20- IF (COUNT (Bf AND Bnm) 0)
- Return null
- SUM 0.00
- For each non-null value v in index for C
- Designate set of rows with value v as Bv
- SUM v COUNT(Bf AND Bv)
-
- return SUM
21- Good if entire Bitmaps are in memory.
- Value-list index evaluation of SUM ( C ) requires
10,000 Bitmap ANDs and 10,000 COUNTS. - Requires 103,125 I/Os and a loop of length
100,000,000. - Requires 1,340,640 disk pages
22Query Plan 4
- Get SUM with Bit-Sliced index
- IF (COUNT (Bf AND Bnn) 0)
- Return null
- SUM 0.00
- For I 0 to N
- SUM 2 (i power) COUNT (Bi AND Bf)
- return SUM
23- Perform 21 ANDs and 21 COUNTs of 100,000,000 bit
Bitmaps. Each Bitmap is 12.5 Mbytes in length,
requiring 3125 I/Os, but we assume that Bf can
remain in memory. Thus need to read 22 Bitmaps
from disk 223125 68,750 I/Os. For CPU, need
to AND 21 pairs of Bitmaps, total of loop
passes 21(100,000,000/32) 65,625, 000. Need
to perform 21 COUNTs, with 131,250,000 passes.
24Comparing Algorithm Performance METHOD
I/0 CPU contributions Add from
rows 1,342 K I/O 2M(25 ins) Projection
index 100 K I/O 2M(10 ins) Value-List
index 103 K I/O 100M(10 ins) Bit-Sliced
index 69 K I/O 197M(1 ins)
25Aggregate Functions Index Types for
Evaluation Aggregate
Value-List Projection Bit-Sliced COUNT Not
needed Not needed Not needed SUM Not
Bad Good Best AVG (SUM/COUNT) Not
Bad Good Best MAX MIN Best Slow Slow MEDIAN,
N-TILE Best No Sometimes Column-Product Slow
Best Slow
26Range Predicates
- Consider
- SELECT target-list FROM T
- WHERE C-range AND ltconditiongt
- Want to further restrict Foundset Bf so that the
compound predicate C-range AND ltconditiongt
holds. This will be represented by BF.
27Using Projection Index
- Can create BF by accessing each C value in the
index corresponding to a row number in BF and
testing whether it lies within the specified
range.
28Value-List Index
- Br the empty set
- For each entry v in the Index for C that
satisfies the range specified - Designate the set of rows with the value v as
Bv - Br Br OR Bv
- BF Bf AND Br
- (Br needs to be in memory all times for
efficiency)
29Bit-Sliced Index
- The algorithm is efficient, it comprises a bunch
of Boolean operations for each bit-slice of a
column C.
30Range Evaluation Performance by Index Type
- Range Evaluation Value-List Projection Bit-Sliced
- Narrow Range Best Good Good
- Wide Range Not Bad Good Best
31Join Indexes
- A Join index is an index on one table for a
quantity that involves a column value of a
different table through a commonly encountered
join. - Join indexes can be used to avoid actual joins of
tables, or to greatly reduce the volume of data
that must be joined, by performing restrictions
in advance.
32Calculating Groupset Aggregates
- Once the Foundset has been computed from the
Where clause, the bits in Foundset must be
partitioned into Groupsets. Any aggregate
functions are then evaluated separately over
these different Groupsets. - How to compute Groupset aggregates using our
different index types.
33Projection Indexes
- Assume Projection indexes exist for each of the
group-by columns. - For each Foundset, read appropriate Projection
indexes. Read values of the columns to be
aggregated from index, and aggregate the result
into the proper cell of the memory-resident array.
34Value-List Indexes
- Grouping by columns D1.A, D2.B using Value-List
index -
- For each entry v1 in the Value-List index for
D1.A - For each entry v2 in the Value-List index for
D2.B - Bg Bv1 AND Bv2 AND Bf
- Evaluate AGG( F.C ) on Bg
35Conclusion
- For read-only systems, we could use more complex
index structures to speed up queries. The Bitmap,
Projection, and Bit-Slice indexes serve this
purpose.