Review for Midterm 2 presentation

About This Presentation

Transcript and Presenter's Notes

Title: Review for Midterm 2

1
Review for Midterm 2

Shahram Ghandeharizadeh

2
Midterm 2

Scheduled for April 30th
4 papers
Variant indexes.
Access path selection.
Overview of query optimization.
Mining Association Rules.
Paper on cache management is not included because
it was covered by your project.
Midterm 2 is worth 35 of your grade.

3
Variant Indexes

A read-mostly database that is updated
infrequently.
Complex indexes to speedup queries.
Focuses on physical designs to enhance
performance.

4
Example Data Warehouse

Key Observations
A handful of products, a PROD table with tens of
rows.
Many millions of rows for SALES tables.

SALES
TIME
Cid Pid Day Amt dollar_cost Unit_sales
PROD
Day Week Month Year Holliday Weekday
Pid Name Size Weight Package_type
5
A B-Tree on Major Holidays

A B-tree index on different holidays of the
SALES table.

B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
6
A B-Tree on Major Holidays

A B-tree index on different holidays of the
SALES table.

B-tree Leaf page
(Pres day, (1,2), (1, 3), (1, 4), (2,1), .
Value List
RID List
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
7
Conjunctive Queries

Count number of Big Mac Sales on Presidents
Day assuming a B-tree on product (pid) and day
of SALES
With RID-Lists
Get the Value-List for Big Mac using the
B-tree, obtain RID-List1.
Get the Value-List for Presidents Day using
the B-tree, obtain RID-List2.
Compute set-intersect of RID-List1 and RID-List2
Count the number of RIDs in the intersection set.
Is there a better way?
Yes, use bit-maps and logical bit-wise operands.

8
Bitmap Indexes

Use a bitmap to represent the existence of a
record with a certain attribute value.
Example If a record has the indexed attribute
value Big Mac then its corresponding entry in
the bitmap is set to one. Otherwise, it is a
zero.

9
A Bitmap

A Bitmap B is defined on T as a sequence of M
bits.
For each row r with row number j that has the
property P, we set bit j in B to one all other
bits are set to zero.
Assuming fix sized disk pages that hold p
records, RID of record j is (j/p, jp). Page is
j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 0
10
A Bitmap

A Bitmap B is defined on T as a sequence of M
bits.
For each row r with row number j that has the
property P, we set bit j in B to one all other
bits are set to zero.
Assuming fix sized disk pages that hold p
records, RID of record j is (j/p, jp). Page is
j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 1
11
A Bitmap

A Bitmap B is defined on T as a sequence of M
bits.
For each row r with row number j that has the
property P, we set bit j in B to one all other
bits are set to zero.
Assuming fix sized disk pages that hold p
records, RID of record j is (j/p, jp). Page is
j/p, slot number is jp.

Pres Day, 0100001100111111110000011001..
Record 2
12
A B-Tree on Major Holidays

A B-tree index on different holidays of the
SALES table.

B-tree Leaf page
(Pres day, 01111111.
Joe, Big Mac, Lab day,
Jane, Happy Meal, Pres day,
Shideh, Happy Meal, Pres day,
Mary, Fries, Pres day,
Kam, Happy Meal, Pres day,
Harry, Big Mac, Pres day,
Henry, Big Mac, Pres day,
Bob, Big Mac, Pres day,
13
Conjunctive Queries

Count number of Big Mac Sales on Presidents
Day assuming a B-tree on product (pid) and day
of SALES
With RID
Get the Value-List for Big Mac using the
B-tree, obtain RID-List1.
Get the Value-List for Presidents Day using
the B-tree, obtain RID-List2.
Compute set-intersect of RID-List1 and RID-List2
Count the number of RIDs in the intersection set.
With bit maps
Get the Value-List for Big Mac using the
B-tree, obtain bit-map1.
Get the Value-List for Presidents Day using
the B-tree, obtain bit-map2.
Recall Existence Bitmap (EBM) identify rows that
exist.
Let RES logical AND of bit-map1, bit-map2, and
EBM.
Count the number of bits set to one to identify
how many Big Macs were sold on Presidents Day.

14
Variant Indexes

Midterm 2 ignores
MEDIAN, N-TILE, Column-Product as aggregates.
Section 5.

15
Access Path Selection

Formulates a cost prediction for each access
plan, using the following cost formula
COST Page fetches W (RSI Calls)
W is an adjustable weighting factor between I/O
and CPU.
RSI calls is an approximation for CPU
utilization.
Assumptions
WHERE tree is considered to be in conjunctive
normal form,
Every disjunct is called a boolean factor.

16
How?

Enumerating the different execution plans,
Estimate the cost of performing each plan,
Pick the cheapest plan.
Definition of cost is as follows
COST Page fetches W (RSI Calls)

17
Clustered B-Tree

A B-tree on the gpa attribute

3.6
(3.7, (3, 1))
(3.9, (4,1))
(2.3, (1, 1))
(3, (2,1))
(3.8, (3,2))
(3.9, (4,2))
(2.5, (1,2))
(3.1, (2,2))
(3.8, (3,3))
(4, (4,3))
(2.8, (1,3))
(3.2, (2,3)
(3.8, (3,4))
(2.8, (1,4))
(4, (4,4))
(3.5, (2,4))
Bob, 21, 3.7, CS
Mary, 24, 3, ECE
Chris, 22, 3.9, CS
Chad, 28, 2.3, LS
Kathy, 18, 3.8, LS
Vera, 17, 3.9, EE
James, 24, 3.1, ME
Chang, 18, 2.5, CS
Kane, 19, 3.8, ME
Lam, 22, 2.8, ME
Louis, 32, 4, LS
Tom, 20, 3.2, EE
Martha, 29, 3.8, CS
Pat, 19, 2.8, EE
Leila, 20, 3.5, LS
Shideh, 16, 4, CS
18
Non-Clustered B-Tree

A random I/O for every qualifying record

3.6
(3.7, (1, 1))
(3.9, (4,1))
(2.3, (4, 2))
(3, (1,2))
(3.8, (3,2))
(3.9, (2,4))
(2.5, (2,3))
(3.1, (3,3))
(3.8, (2,1))
(4, (3,1))
(2.8, (2,2))
(3.2, (1,3)
(3.8, (1,4))
(2.8, (3,4))
(4, (4,4))
(3.5, (4,3))
Bob, 21, 3.7, CS
Kane, 19, 3.8, ME
Louis, 32, 4, LS
Chris, 22, 3.9, CS
Mary, 24, 3, ECE
Lam, 22, 2.8, ME
Martha, 29, 3.8, CS
Chad, 28, 2.3, LS
Tom, 20, 3.2, EE
Chang, 18, 2.5, CS
James, 24, 3.1, ME
Leila, 20, 3.5, LS
Kathy, 18, 3.8, LS
Vera, 17, 3.9, EE
Pat, 19, 2.8, EE
Shideh, 16, 4, CS
19
Questions

How are relations and segments related? If slide
4, you state segments may contain more than one
relation, and then the next bullet says at most
one relation per segment.. What is going on?

20
Questions

How are relations and segments related? If slide
4, you state segments may contain more than one
relation, and then the next bullet says at most
one relation per segment.. What is going on?
Best clarified with an example
Segment 1 may contain the Emp, Dept, and Revenues
tables/relations.
The Emp relation can be assigned to Segment 1
only. It may NOT be assigned to both Segments 1
and 2.

21
Questions

NINDX(I), the number of pages in index I.
TCARD(T), the number of pages in the segment that
hold tuples of relation T.

The cost of retrieving a range of records from a
clustered B-tree Should not this be
(depth_of_B-tree F(pred) TCARD) or something
related to the depth of the tree rather than
NINDX, since you only have to navigate through to
the leaf nodes once and do a record scan once
youve reached the correct leaf node?

22
Question

NINDX(I), the number of pages in index I.
TCARD(T), the number of pages in the segment that
hold tuples of relation T.

Should not the cost for a non-clustered B-tree
also involve the depth of the tree rather than
NINDX?

23
Question

NINDX(I), the number of pages in index I.
TCARD(T), the number of pages in the segment that
hold tuples of relation T.

Should not the cost for a non-clustered B-tree
also involve the depth of the tree rather than
NINDX?
It should include the depth of the tree.
NINDX must be included because the leaf pages of
the B-tree must be visited for the qualifying
records.

24
Questions

NINDX(I), the number of pages in index I.
TCARD(T), the number of pages in the segment that
hold tuples of relation T.

Why is the cost of a merge-scan NINDX(R)
NINDX(S) rather than the sum of segment scans of
R and S, since you still have to visit every page
that contains tuples of R and S?

25
Questions

NINDX(I), the number of pages in index I.
TCARD(T), the number of pages in the segment that
hold tuples of relation T.

Why is the cost of a merge-scan NINDX(R)
NINDX(S) rather than the sum of segment scans of
R and S, since you still have to visit every page
that contains tuples of R and S?
The merge-scan employs the sorted order of the
entries in the leaf pages of the B-tree index
structures.

26
Overview of Query Optimization

Extends discussion to
Correlation queries, use of outer-join to
flatten nested queries.
Raises more questions than providing answers.
A good starting point for
A practitioner who wants to build an optimizer
for a relational DBMS.
A Ph.D. student interested in writing a
dissertation in the area of query optimization
techniques.

27
Mining Association Rules

Objective Discover association Rule over basket
data.
Motivation valuable for cross-marketing and
attached mailing applications.
Example 98 of customers who purchase tires and
auto accessories also get automotive services
done.
Key contributions
Fast algorithms
Apriori, AprioriTid, and AprioriHybrid
Pay attention to terminology, definitions, and
the general framework.

28
Database Systems

This course has introduced you to
A storage manager and its use, BDB.
Spatial indexing, R-Trees.
Parallel DBMS.
Alternative technologies for applications that do
not require ACID transactions.
Google FS, MapReduce, etc.
Optimization techniques for relational DBMSs.
Knowledge Discovery.
Efficient query processing techniques.
RAID and Use of flash memory in enterprises
Steve Kleimans lecture.

29
Database Systems

Many important topics remain
Data mining, Data cubes, Data visualization
techniques.
Papers are from 1990s Neumerous follow-on
papers!

Write a Comment

User Comments (0)

Review for Midterm 2 PowerPoint PPT Presentation