CPSC 404: Advanced Relational Databases
Assignment 2. Due Tuesday, March 17, 2009.
IMPORTANT Show all your work. Be sure to
explain your answer.
Question 1. Consider the following linear hashing index.

[Figure: linear hashing index with buckets Bkt 0 through Bkt 4.]

The Next pointer currently points to bucket number 1. By "current bucket", we mean the bucket pointed to by Next. The current round number (i.e., Level) is 0. Assume that every time there is an overflow, the current bucket is split. Now answer the following questions.

(i) Identify those buckets that have yet to be split in the current round.

(ii) Give an example of an insertion that would cause the current bucket to be split. What happens to the Next pointer?

(iii) After the insertion in question (ii), suppose the data entry 42 is inserted. Show what happens to the linear hash structure, including the Next pointer.
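As a reference for the split mechanics used in (i)-(iii), here is a toy linear-hashing table in Python. It is a sketch only: it assumes the textbook hash family h_i(k) = k mod (2^i * N0) and the question's policy of splitting the current bucket (the one Next points to) on any overflow. N0 and CAP are illustrative choices, not taken from the figure, and bucket contents here are made up.

```python
N0 = 4    # initial number of buckets (assumed, not from the figure)
CAP = 4   # entries per primary bucket (assumed)

class LinearHash:
    def __init__(self):
        self.level = 0
        self.next = 0                      # the Next pointer
        self.buckets = [[] for _ in range(N0)]

    def _addr(self, key):
        b = key % (N0 << self.level)       # h_Level
        if b < self.next:                  # bucket already split this round
            b = key % (N0 << (self.level + 1))  # use h_{Level+1} instead
        return b

    def insert(self, key):
        b = self._addr(key)
        self.buckets[b].append(key)
        # Overflow anywhere triggers a split of the *current* bucket,
        # which need not be the bucket that overflowed; until that
        # bucket's turn comes, its extra entries model an overflow page.
        if len(self.buckets[b]) > CAP:
            self._split()

    def _split(self):
        old, new = self.next, len(self.buckets)
        self.buckets.append([])
        h2 = N0 << (self.level + 1)        # h_{Level+1}
        entries = self.buckets[old]
        self.buckets[old] = [k for k in entries if k % h2 == old]
        self.buckets[new] = [k for k in entries if k % h2 == new]
        self.next += 1
        if self.next == N0 << self.level:  # round finished: advance Level
            self.level += 1
            self.next = 0
```

For example, inserting five keys that all hash to the same bucket overflows it, which splits the bucket Next points to and advances Next by one.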
Question 2. Consider the relation Employees(Name, Address, Rank, Salary, Perks, yearOfHire), which contains 10 million tuples of 250 bytes each. Suppose we create a Btree index on Name (the primary key), which is 40 bytes long. Suppose we allocate 10 bytes for a pointer. Assume a page size of 4K and a buffer with 400 pages. Make a uniform distribution assumption for the following questions. For questions (a)-(c), consider the range query Name < 'H'.

(a) Briefly explain how you will answer the range query above.

(b) Estimate the cost in page I/Os for your strategy, assuming the index is clustered.

(c) If the index were unclustered, would it make a difference to your calculation above? Why (not)?

For questions (d) and (e), consider the range query yearOfHire > 1992.
(d) Suppose the field yearOfHire is 4 bytes and we have an unclustered Btree index on it. The yearOfHire values range from 1981 to 2000. Estimate the cost of answering the above range query using your strategy from (a) applied to this query.

(e) Can you think of any way of improving the performance of the above method? Explain your idea and estimate the cost of the improved method.
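The parameters in Question 2 pin down a few page counts that any cost estimate starts from. This sketch just does that arithmetic, assuming simple floor/ceiling packing; it ignores page headers, fill factor, and non-leaf index levels, and it is a starting point, not an answer to (a)-(e).

```python
PAGE = 4 * 1024            # 4K page
N_TUPLES = 10_000_000
TUPLE_SIZE = 250           # bytes per Employees tuple
ENTRY_SIZE = 40 + 10       # Name key (40 B) + pointer (10 B)

tuples_per_page = PAGE // TUPLE_SIZE            # 16 tuples per data page
data_pages = -(-N_TUPLES // tuples_per_page)    # ceil: 625,000 data pages

entries_per_page = PAGE // ENTRY_SIZE           # 81 entries per leaf page
leaf_pages = -(-N_TUPLES // entries_per_page)   # ceil: 123,457 leaf pages

print(tuples_per_page, data_pages, entries_per_page, leaf_pages)
```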
Question 3. Answer the following questions about extendible hashing.

(a) Why do we need both local depth and global depth?

(b) After an insertion that doubles the directory size, how many buckets have just one directory entry pointing to them?

(c) For retrieving a record with a given key value, do we ever need to make more than one disk access? Explain.

(d) Suppose an insertion triggers a doubling of the directory size. At this point, how many buckets with local depth = global depth should we examine? Why?
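A toy extendible-hashing directory helps fix the two depths in mind (all names and contents here are made up for illustration): the global depth sizes the directory and determines how many hash bits index into it, while each bucket's local depth records how many bits actually distinguish its contents. A bucket with local depth < global depth is shared by several directory entries, so splitting it does not require doubling the directory.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = []

GLOBAL_DEPTH = 2
shared = Bucket(local_depth=1)           # pointed to by two entries
directory = [shared, Bucket(2), shared, Bucket(2)]

def lookup(key):
    # index with the last GLOBAL_DEPTH bits of the hash
    # (for simplicity, the key itself serves as its own hash here)
    return directory[key & ((1 << GLOBAL_DEPTH) - 1)]
```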
Question 4. Consider the following SQL query:

SELECT S.sname, S.year, R.uid
FROM Ratings R, Songs S
WHERE R.sid = S.sid AND R.rating > 5 AND S.genre = 'love'

The two relations have the following schemas: Ratings(uid (8 bytes), sid (7 bytes), time (9 bytes), rating (1 byte)) and Songs(sid (7 bytes), sname (30 bytes), year (2 bytes), genre (11 bytes)). Primary keys are underlined. In addition, the functional dependency sname, year → sid holds for Songs. Each page is 2K bytes. The available buffer pool amounts to 100 pages. Ratings contains a million tuples and Songs contains 100,000 tuples. There are 20 genres, and songs are rated on a scale of 1-10.

(a) Assuming no indexes are available, draw the plan corresponding to page-based nested loops (for the join), with all other operations done on the fly, after the join. Estimate the cost. Which relation would you put on the outer loop, and why?

(b) Push the projection and selection on Ratings only. Draw the plan where the result of selection/projection on Ratings is stored in a temp table and Songs is streamed (i.e., pipelined) into the join, and the join is chunk-based nested loops (BNL). Estimate the cost, using the maximum possible chunk size.

(c) A Btree index on sid is available on Ratings. Push the select and project down on Songs only. Do those operations by a table scan on Songs and stream the result into an indexed nested loops join that probes Ratings for every distinct sid coming in. Suppose the index is unclustered. Estimate the cost of the plan.
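The stated tuple sizes imply the raw page counts that all three plan costs build on. This sketch only does that arithmetic (floor packing per page, ignoring page headers); it is not the answer to (a)-(c).

```python
PAGE = 2 * 1024                   # 2K page

ratings_tuple = 8 + 7 + 9 + 1     # uid + sid + time + rating = 25 bytes
songs_tuple = 7 + 30 + 2 + 11     # sid + sname + year + genre = 50 bytes

# ceil(tuples / floor(page / tuple size))
ratings_pages = -(-1_000_000 // (PAGE // ratings_tuple))  # 12,346 pages
songs_pages = -(-100_000 // (PAGE // songs_tuple))        # 2,500 pages

print(ratings_pages, songs_pages)
```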
Question 5. Consider the join R ⋈ S on the condition R.a = S.b. R contains 20,000 tuples with 20 tuples/page. S contains 4,000 tuples with 20 tuples/page. S.b is the primary key of S. Both relations are unsorted and have no index. The buffer size is 52 pages. Estimate the cost of each of the following algorithms, in number of page I/Os, ignoring the output cost. In each case, what is the least buffer size required for your estimate to be valid?

(a) Page-oriented nested loops join.

(b) Block-oriented nested loops join.

(c) Sort-merge join.

(d) Hash join.

What is the minimum and maximum number of tuples in the output?
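For orientation, these are the standard textbook I/O formulas for the four algorithms with Question 5's sizes plugged in, taking the smaller relation (S) as the outer. The two-pass assumptions noted in the comments are exactly what the minimum-buffer sub-questions ask you to check, so this sketch does not settle those.

```python
from math import ceil

B = 52                 # buffer pages
M = 20_000 // 20       # R: 1,000 pages
N = 4_000 // 20        # S: 200 pages (smaller relation, used as outer)

page_nlj = N + N * M                    # outer pages + one inner scan per outer page
block_nlj = N + ceil(N / (B - 2)) * M   # B-2 pages per outer block
sort_merge = 3 * (M + N)                # if each relation sorts in two passes
hash_join = 3 * (M + N)                 # if B exceeds roughly sqrt(N)

print(page_nlj, block_nlj, sort_merge, hash_join)
```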