Review for Midterm 2

1 / 29
About This Presentation
Title:

Review for Midterm 2

Description:

Get the Value-List for 'Big Mac' using the B -tree, obtain RID-List1. ... W is an adjustable weighting factor between I/O and CPU. ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 30
Provided by: vishalt

less

Transcript and Presenter's Notes

Title: Review for Midterm 2


1
Review for Midterm 2
  • Shahram Ghandeharizadeh

2
Midterm 2
  • Scheduled for April 30th
  • 4 papers
  • Variant indexes.
  • Access path selection.
  • Overview of query optimization.
  • Mining Association Rules.
  • Paper on cache management is not included because
    it was covered by your project.
  • Midterm 2 is worth 35 of your grade.

3
Variant Indexes
  • A read-mostly database that is updated
    infrequently.
  • Complex indexes to speedup queries.
  • Focuses on physical designs to enhance
    performance.

4
Example Data Warehouse
  • Key Observations
  • A handful of products, a PROD table with tens of
    rows.
  • Many millions of rows for SALES tables.

SALES
TIME
Cid Pid Day Amt dollar_cost Unit_sales
PROD
Day Week Month Year Holliday Weekday
Pid Name Size Weight Package_type
5
A B-Tree on Major Holidays
  • A B-tree index on different holidays of the
    SALES table.

B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
6
A B-Tree on Major Holidays
  • A B-tree index on different holidays of the
    SALES table.

B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
RID List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
7
Conjunctive Queries
  • Count number of Big Mac Sales on Presidents
    Day assuming a B-tree on product (pid) and day
    of SALES
  • With RID-Lists
  • Get the Value-List for Big Mac using the
    B-tree, obtain RID-List1.
  • Get the Value-List for Presidents Day using
    the B-tree, obtain RID-List2.
  • Compute set-intersect of RID-List1 and RID-List2
  • Count the number of RIDs in the intersection set.
  • Is there a better way?
  • Yes, use bit-maps and logical bit-wise operands.

8
Bitmap Indexes
  • Use a bitmap to represent the existence of a
    record with a certain attribute value.
  • Example If a record has the indexed attribute
    value Big Mac then its corresponding entry in
    the bitmap is set to one. Otherwise, it is a
    zero.

9
A Bitmap
  • A Bitmap B is defined on T as a sequence of M
    bits.
  • For each row r with row number j that has the
    property P, we set bit j in B to one all other
    bits are set to zero.
  • Assuming fix sized disk pages that hold p
    records, RID of record j is (j/p, jp). Page is
    j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 0
10
A Bitmap
  • A Bitmap B is defined on T as a sequence of M
    bits.
  • For each row r with row number j that has the
    property P, we set bit j in B to one all other
    bits are set to zero.
  • Assuming fix sized disk pages that hold p
    records, RID of record j is (j/p, jp). Page is
    j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 1
11
A Bitmap
  • A Bitmap B is defined on T as a sequence of M
    bits.
  • For each row r with row number j that has the
    property P, we set bit j in B to one all other
    bits are set to zero.
  • Assuming fix sized disk pages that hold p
    records, RID of record j is (j/p, jp). Page is
    j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 2
12
A B-Tree on Major Holidays
  • A B-tree index on different holidays of the
    SALES table.

B-tree Leaf page
(Pres day, 01111111.
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
13
Conjunctive Queries
  • Count number of Big Mac Sales on Presidents
    Day assuming a B-tree on product (pid) and day
    of SALES
  • With RID
  • Get the Value-List for Big Mac using the
    B-tree, obtain RID-List1.
  • Get the Value-List for Presidents Day using
    the B-tree, obtain RID-List2.
  • Compute set-intersect of RID-List1 and RID-List2
  • Count the number of RIDs in the intersection set.
  • With bit maps
  • Get the Value-List for Big Mac using the
    B-tree, obtain bit-map1.
  • Get the Value-List for Presidents Day using
    the B-tree, obtain bit-map2.
  • Recall Existence Bitmap (EBM) identify rows that
    exist.
  • Let RES logical AND of bit-map1, bit-map2, and
    EBM.
  • Count the number of bits set to one to identify
    how many Big Macs were sold on Presidents Day.

14
Variant Indexes
  • Midterm 2 ignores
  • MEDIAN, N-TILE, Column-Product as aggregates.
  • Section 5.

15
Access Path Selection
  • Formulates a cost prediction for each access
    plan, using the following cost formula
  • COST Page fetches W (RSI Calls)
  • W is an adjustable weighting factor between I/O
    and CPU.
  • RSI calls is an approximation for CPU
    utilization.
  • Assumptions
  • WHERE tree is considered to be in conjunctive
    normal form,
  • Every disjunct is called a boolean factor.

16
How?
  • Enumerating the different execution plans,
  • Estimate the cost of performing each plan,
  • Pick the cheapest plan.
  • Definition of cost is as follows
  • COST Page fetches W (RSI Calls)

17
Clustered B-Tree
  • A B-tree on the gpa attribute

3.6
(3.7, (3, 1))
(3.9, (4,1))
(2.3, (1, 1))
(3, (2,1))
(3.8, (3,2))
(3.9, (4,2))
(2.5, (1,2))
(3.1, (2,2))
(3.8, (3,3))
(4, (4,3))
(2.8, (1,3))
(3.2, (2,3)
(3.8, (3,4))
(2.8, (1,4))
(4, (4,4))
(3.5, (2,4))
Bob, 21, 3.7, CS
Mary, 24, 3, ECE
Chris, 22, 3.9, CS
Chad, 28, 2.3, LS
Kathy, 18, 3.8, LS
Vera, 17, 3.9, EE
James, 24, 3.1, ME
Chang, 18, 2.5, CS
Kane, 19, 3.8, ME
Lam, 22, 2.8, ME
Louis, 32, 4, LS
Tom, 20, 3.2, EE
Martha, 29, 3.8, CS
Pat, 19, 2.8, EE
Leila, 20, 3.5, LS
Shideh, 16, 4, CS
18
Non-Clustered B-Tree
  • A random I/O for every qualifying record

3.6
(3.7, (1, 1))
(3.9, (4,1))
(2.3, (4, 2))
(3, (1,2))
(3.8, (3,2))
(3.9, (2,4))
(2.5, (2,3))
(3.1, (3,3))
(3.8, (2,1))
(4, (3,1))
(2.8, (2,2))
(3.2, (1,3)
(3.8, (1,4))
(2.8, (3,4))
(4, (4,4))
(3.5, (4,3))
Bob, 21, 3.7, CS
Kane, 19, 3.8, ME
Louis, 32, 4, LS
Chris, 22, 3.9, CS
Mary, 24, 3, ECE
Lam, 22, 2.8, ME
Martha, 29, 3.8, CS
Chad, 28, 2.3, LS
Tom, 20, 3.2, EE
Chang, 18, 2.5, CS
James, 24, 3.1, ME
Leila, 20, 3.5, LS
Kathy, 18, 3.8, LS
Vera, 17, 3.9, EE
Pat, 19, 2.8, EE
Shideh, 16, 4, CS
19
Questions
  • How are relations and segments related? If slide
    4, you state segments may contain more than one
    relation, and then the next bullet says at most
    one relation per segment.. What is going on?

20
Questions
  • How are relations and segments related? If slide
    4, you state segments may contain more than one
    relation, and then the next bullet says at most
    one relation per segment.. What is going on?
  • Best clarified with an example
  • Segment 1 may contain the Emp, Dept, and Revenues
    tables/relations.
  • The Emp relation can be assigned to Segment 1
    only. It may NOT be assigned to both Segments 1
    and 2.

21
Questions
  • NINDX(I), the number of pages in index I.
  • TCARD(T), the number of pages in the segment that
    hold tuples of relation T.
  • The cost of retrieving a range of records from a
    clustered B-tree Should not this be
    (depth_of_B-tree F(pred) TCARD) or something
    related to the depth of the tree rather than
    NINDX, since you only have to navigate through to
    the leaf nodes once and do a record scan once
    youve reached the correct leaf node?

22
Question
  • NINDX(I), the number of pages in index I.
  • TCARD(T), the number of pages in the segment that
    hold tuples of relation T.
  • Should not the cost for a non-clustered B-tree
    also involve the depth of the tree rather than
    NINDX?

23
Question
  • NINDX(I), the number of pages in index I.
  • TCARD(T), the number of pages in the segment that
    hold tuples of relation T.
  • Should not the cost for a non-clustered B-tree
    also involve the depth of the tree rather than
    NINDX?
  • It should include the depth of the tree.
  • NINDX must be included because the leaf pages of
    the B-tree must be visited for the qualifying
    records.

24
Questions
  • NINDX(I), the number of pages in index I.
  • TCARD(T), the number of pages in the segment that
    hold tuples of relation T.
  • Why is the cost of a merge-scan NINDX(R)
    NINDX(S) rather than the sum of segment scans of
    R and S, since you still have to visit every page
    that contains tuples of R and S?

25
Questions
  • NINDX(I), the number of pages in index I.
  • TCARD(T), the number of pages in the segment that
    hold tuples of relation T.
  • Why is the cost of a merge-scan NINDX(R)
    NINDX(S) rather than the sum of segment scans of
    R and S, since you still have to visit every page
    that contains tuples of R and S?
  • The merge-scan employs the sorted order of the
    entries in the leaf pages of the B-tree index
    structures.

26
Overview of Query Optimization
  • Extends discussion to
  • Correlation queries, use of outer-join to
    flatten nested queries.
  • Raises more questions than providing answers.
  • A good starting point for
  • A practitioner who wants to build an optimizer
    for a relational DBMS.
  • A Ph.D. student interested in writing a
    dissertation in the area of query optimization
    techniques.

27
Mining Association Rules
  • Objective Discover association Rule over basket
    data.
  • Motivation valuable for cross-marketing and
    attached mailing applications.
  • Example 98 of customers who purchase tires and
    auto accessories also get automotive services
    done.
  • Key contributions
  • Fast algorithms
  • Apriori, AprioriTid, and AprioriHybrid
  • Pay attention to terminology, definitions, and
    the general framework.

28
Database Systems
  • This course has introduced you to
  • A storage manager and its use, BDB.
  • Spatial indexing, R-Trees.
  • Parallel DBMS.
  • Alternative technologies for applications that do
    not require ACID transactions.
  • Google FS, MapReduce, etc.
  • Optimization techniques for relational DBMSs.
  • Knowledge Discovery.
  • Efficient query processing techniques.
  • RAID and Use of flash memory in enterprises
  • Steve Kleimans lecture.

29
Database Systems
  • Many important topics remain
  • Data mining, Data cubes, Data visualization
    techniques.
  • Papers are from 1990s Neumerous follow-on
    papers!
Write a Comment
User Comments (0)