Variant Indexes - PowerPoint PPT Presentation

About This Presentation
Title:

Variant Indexes

Description:

Bitmap alternate form for representing RID-lists in a Value-list index ... To optimize Bitmap index access, can be broken into fragments of equal sizes ... – PowerPoint PPT presentation

Number of Views:68
Avg rating:3.0/5.0
Slides: 36
Provided by: jst118
Category:

less

Transcript and Presenter's Notes

Title: Variant Indexes


1
Variant Indexes
2
Specialized Indexes?
  • Data warehouses are large databases with data
    integrated from many independent sources.
  • Queries are often complex.
  • Updated in a batch fashion.
  • Goal develop specialized indexes to speed up
    query evaluation.

3
Bitmap Indexes
  • Value-list index same row may appear under many
    key values in index
  • RID-list list a key value once at head, which
    is followed by a long list of RIDs for rows with
    this key value.
  • Bitmap alternate form for representing
    RID-lists in a Value-list index
  • Bitmaps are more space efficient than RID-lists
    when the number of key values is low (1)

4
Creating Bitmap Index
  • For n rows of T r1,,rn, do 1-1 mapping from
    T to ZM, the 1st M positive integers.
  • For a given row r with row j, the table page
    number accessed to retrieve row r is j/p and the
    page slot is jp.

5
  • A Bitmap B is a sequence of M bits. Each row r
    with row j that has property P, we set bit j in
    B to 1, all other bits are zero.
  • A Bitmap index for a column C with values v1,,vk
    is a B tree with entries having these key values
    and associated data portions that contain Bitmaps
    for the properties Cv1,,Cvk.

6
  • Dense Bitmaps if the proportion of 1-bits in
    the Bitmap is large.
  • A Bitmap index for a column with 32 values will
    have average density 1/32. Space needed for
    Bitmap is comparable to disk space needed for
    RID-list index, which requires 32 bits for each
    RID present.
  • For a column index with a very small of values,
    the Bitmaps will have high densities and disk
    savings is enormous.

7
  • Boolean operations are extremely fast.
  • Can use a bunch of for loops to execute. For long
    Bitmaps, take fragments out from disk. (2)
  • Loops for Bitmap AND, OR, NOT, or COUNT are
    extremely fast compared to loop operations on RID
    lists. (3)

8
Segmentation
  • To optimize Bitmap index access, can be broken
    into fragments of equal sizes
  • RIDs can be small bytes in length, they specify
    position in segment. At beginning, high order
    bits can be specified. (4)
  • If one of predicate indexes has no pointer to a
    Bitmap Fragment for a segment, then the segment
    fragments for the other indexes can be ignored as
    well, can save a lot of I/O.

9
  • Bitmap representations and RID-list
    representations are interchangeable, when Bitmap
    representations are dense, Bitmaps are much more
    efficient than RID-lists, for storage use and
    Boolean operations. When too sparse, Bitmap index
    can contain RID-list.

10
Projection Indexes
  • Say C is a column of a table T then the
    Projection index on C consists of a stored
    sequence of column values from C, in order by row
    number in T from which the values are extracted.

11
  • Projection index very efficient when column
    values must be retrieved from all rows of a found
    set.
  • For example, say column values are 4 bytes in
    length, 1000 values will fit on a 4KB page. Say
    each row is 200 bytes, then using index requires
    1/20 I/0s of getting whole rows.

12
  • Like vertical partitioning, good when small of
    columns are retrieved by most SELECT statements,
    bad when most queries retrieve most of the
    columns. (5)

13
Bit-Sliced Indexes
  • A Bit-Sliced index stores a set of Bitmap
    slices which are orthogonal to the data held in
    a Projection index. Good for calculating
    aggregates in a found set.
  • Each individual Bitmap Bi is called a Bit-slice
    of the column. There is one Bitmap for Bnn for
    not null and one Bitmap for Bn for null (6)

14
Comparing Indexes for Aggregate Evaluation
  • SELECT SUM (dollar_sales)
  • FROM SALES
  • WHERE condition

15
  • Say found set Bf (2 million rows) is already
    determined.
  • SALES table has 100 million rows, which are each
    200 bytes in length, stored 20 rows on a 4KB disk
    page.

16
Query Plan 1
  • Direct access to rows to calculate SUM
  • Each disk page contains 20 rows, have 5,000,000
    disk pages occupied by SALES table (100 millon
    rows / 20 rows). Since Bf represents only 1/50
    rows of SALES table, disk pages occupies is
    5,000,000(1 e-2,000,000/5,000,000)
    1,648,4000.

17
  • This means that, if one disk arm retrieves 100
    disk pages per second, you will need 4 hours of
    disk arm use! (1,648,400/100 16,484 seconds 4
    hours)

18
Query Plan 2
  • Get SUM with Projection Index
  • The dollar_sales Projection index has 1000 values
    per 4KB disk page. Requires 100,000 disk pages.
    This equals 1000 seconds (17 minutes)

19
Query Plan 3
  • Get SUM with Value-list index on dollar_sales.
  • Can calculate SUM(dollar_sales) for our Foundset
    by ranging through all possible values in the
    index and determining the rows with each value,
    the determining how many rows with each value are
    in the Foundset, and finally multiplying that
    count by the value and adding to the SUM.

20
  • IF (COUNT (Bf AND Bnm) 0)
  • Return null
  • SUM 0.00
  • For each non-null value v in index for C
  • Designate set of rows with value v as Bv
  • SUM v COUNT(Bf AND Bv)
  • return SUM

21
  • Good if entire Bitmaps are in memory.
  • Value-list index evaluation of SUM ( C ) requires
    10,000 Bitmap ANDs and 10,000 COUNTS.
  • Requires 103,125 I/Os and a loop of length
    100,000,000.
  • Requires 1,340,640 disk pages

22
Query Plan 4
  • Get SUM with Bit-Sliced index
  • IF (COUNT (Bf AND Bnn) 0)
  • Return null
  • SUM 0.00
  • For I 0 to N
  • SUM 2 (i power) COUNT (Bi AND Bf)
  • return SUM

23
  • Perform 21 ANDs and 21 COUNTs of 100,000,000 bit
    Bitmaps. Each Bitmap is 12.5 Mbytes in length,
    requiring 3125 I/Os, but we assume that Bf can
    remain in memory. Thus need to read 22 Bitmaps
    from disk 223125 68,750 I/Os. For CPU, need
    to AND 21 pairs of Bitmaps, total of loop
    passes 21(100,000,000/32) 65,625, 000. Need
    to perform 21 COUNTs, with 131,250,000 passes.

24
Comparing Algorithm Performance METHOD
I/0 CPU contributions Add from
rows 1,342 K I/O 2M(25 ins) Projection
index 100 K I/O 2M(10 ins) Value-List
index 103 K I/O 100M(10 ins) Bit-Sliced
index 69 K I/O 197M(1 ins)
25
Aggregate Functions Index Types for
Evaluation Aggregate
Value-List Projection Bit-Sliced COUNT Not
needed Not needed Not needed SUM Not
Bad Good Best AVG (SUM/COUNT) Not
Bad Good Best MAX MIN Best Slow Slow MEDIAN,
N-TILE Best No Sometimes Column-Product Slow
Best Slow
26
Range Predicates
  • Consider
  • SELECT target-list FROM T
  • WHERE C-range AND ltconditiongt
  • Want to further restrict Foundset Bf so that the
    compound predicate C-range AND ltconditiongt
    holds. This will be represented by BF.

27
Using Projection Index
  • Can create BF by accessing each C value in the
    index corresponding to a row number in BF and
    testing whether it lies within the specified
    range.

28
Value-List Index
  • Br the empty set
  • For each entry v in the Index for C that
    satisfies the range specified
  • Designate the set of rows with the value v as
    Bv
  • Br Br OR Bv
  • BF Bf AND Br
  • (Br needs to be in memory all times for
    efficiency)

29
Bit-Sliced Index
  • The algorithm is efficient, it comprises a bunch
    of Boolean operations for each bit-slice of a
    column C.

30
Range Evaluation Performance by Index Type
  • Range Evaluation Value-List Projection Bit-Sliced
  • Narrow Range Best Good Good
  • Wide Range Not Bad Good Best

31
Join Indexes
  • A Join index is an index on one table for a
    quantity that involves a column value of a
    different table through a commonly encountered
    join.
  • Join indexes can be used to avoid actual joins of
    tables, or to greatly reduce the volume of data
    that must be joined, by performing restrictions
    in advance.

32
Calculating Groupset Aggregates
  • Once the Foundset has been computed from the
    Where clause, the bits in Foundset must be
    partitioned into Groupsets. Any aggregate
    functions are then evaluated separately over
    these different Groupsets.
  • How to compute Groupset aggregates using our
    different index types.

33
Projection Indexes
  • Assume Projection indexes exist for each of the
    group-by columns.
  • For each Foundset, read appropriate Projection
    indexes. Read values of the columns to be
    aggregated from index, and aggregate the result
    into the proper cell of the memory-resident array.

34
Value-List Indexes
  • Grouping by columns D1.A, D2.B using Value-List
    index
  • For each entry v1 in the Value-List index for
    D1.A
  • For each entry v2 in the Value-List index for
    D2.B
  • Bg Bv1 AND Bv2 AND Bf
  • Evaluate AGG( F.C ) on Bg

35
Conclusion
  • For read-only systems, we could use more complex
    index structures to speed up queries. The Bitmap,
    Projection, and Bit-Slice indexes serve this
    purpose.
Write a Comment
User Comments (0)
About PowerShow.com