CPSC 404: Advanced Relational Databases, Assignment - PowerPoint PPT Presentation

1 / 7
About This Presentation
Title:

CPSC 404: Advanced Relational Databases, Assignment

Description:

Songs(sid (7 bytes), sname (30 bytes), year (2 bytes), genre (11 bytes) ... There are 20 genres and songs are rated on a scale of 1-10. ... – PowerPoint PPT presentation

Number of Views:26
Avg rating:3.0/5.0
Slides: 8
Provided by: lakslak
Category:

less

Transcript and Presenter's Notes

Title: CPSC 404: Advanced Relational Databases, Assignment


1
CPSC 404 Advanced Relational Databases,
Assignment 2. Due Tuesday, March 17, 2009.
IMPORTANT Show all your work. Be sure to
explain your answer.
  • Question 1. Consider the following linear hashing
    index.

Bkt 0
Bkt 4
2
  • The Next pointer currently points to bucket
    number 1. By current bucket, we mean the bucket
    pointed to by Next. The current round number
    (i.e., Level) is 0. Assume that every time there
    is an overflow, the current bucket is split. Now,
    answer the following questions.
  • (i) Identify those buckets that have yet to be
    split in the current round.
  • (ii) Give an example of an insertion that would
    cause the current bucket to be split. What
    happens to the Next pointer?
  • (iii) After the insertion in question (ii),
    suppose the data entry 42 is inserted. Show what
    happens to the linear hash structure, including
    the Next pointer.

3
  • Question 2. Consider the relation Employees(Name,
    Address, Rank, Salary, Perks, yearOfHire) that
    contains 10 million tuples of 250 bytes each.
    Suppose we create a Btree index on Name (primary
    key), which is 40 bytes long. Suppose we allocate
    10 bytes for a pointer. Assume a page size of 4K
    and a buffer with 400 pages. Make a uniform
    distribution assumption for the following
    questions. For questions (a)-(c), consider the
    range query Name lt H.
  • (a) Briefly explain how you will answer the range
    query above.
  • (b) Estimate the cost in page I/Os for your
    strategy, assuming the index is clustered.
  • (c) If the index was unclustered, would it make a
    difference to your calculation above? Why (not)?
  • Consider the range query yearOfHire gt 1992 for
    questions (d) and (e).

4
(d) Suppose the field yearOfHire is 4 bytes and
we have an unclustered Btree index on it. The
yearOfHire ranges from 1981--2000. Estimate the
cost of answering the above range query using
your strategy in (a) applied to this query. (e)
Can you think of any way of improving the
performance of the above method? Explain your
idea and estimate the cost of the improved
method.
5
Question 3. Answer the following questions about
extendible hashing. (a) why do we need both
local depth and global depth? (b) after an
insertion that doubles the directory size, how
many buckets have just one directory entry
pointing to them? (c) for retrieving a record
with a given key value, do we ever need to make
more than one disk access? Explain. (d) suppose
an insertion triggers doubling of directory size.
At this point, how many buckets with local depth
global depth should we examine? Why?
6
  • Question 4. Consider the following SQL query
  • SELECT S.sname, S.year, R.uid
  • FROM Ratings R, Songs S
  • WHERE R.sid S.sid AND R.rating gt 5 AND S.genre
    love
  • The two relations have the following schema
    Ratings(uid (8 bytes), sid (7 bytes), time (9
    bytes), rating (1 byte)). Songs(sid (7 bytes),
    sname (30 bytes), year (2 bytes), genre (11
    bytes)). Primary keys are underlined. In
    addition, sname, year ? sid holds for Songs.
    Each page is 2K bytes. Available buffer pool
    amounts to 100 pages. Ratings contains a million
    tuples and Songs contains 100,000 tuples. There
    are 20 genres and songs are rated on a scale of
    1-10.
  • Assuming no indexes are available, draw the plan
    corresponding to page-based nested loops (for
    join) with all other operations done on the fly,
    after join. Estimate the cost. Which relation
    would you put on the outer loop and why?
  • Push the projection and selection on Ratings
    only. Draw the plan where the result of
    selection/projection on Ratings is stored in a
    temp table and Songs is streamed (i.e.,
    pipelined) into the join and the join is
    chunk-based nested loops (BNL). Estimate the
    cost, using the maximum possible chunk size.
  • A Btree index on sid is available on Ratings.
    Push the select and project down on Songs only.
    Do those operations by a table scan on Songs and
    stream the result into an indexed nested loops
    join that probes Ratings for every distinct sid
    that is coming in. Suppose the index is
    unclustered. Estimate the cost of the plan.

7
  • Question 5. Consider R R.aS.b S. R
    contains 20,000 tuples with 20
  • tuples/page. S contains 4000 tuples with 20
    tuples/page. S.b is the
  • primary key of S. Both relations are unsorted and
    have no index.
  • Buffer size is 52 pages. Estimate the cost of the
    following algorithms, in
  • no. of page I/Os, ignoring the output cost.
  • In each of the following cases, what is the least
    size of the buffer required
  • for your estimate to be valid?
  • page oriented nested loops join.
  • (b) block oriented nested loops join.
  • (c) sort-merge join.
  • (d) hash join.
  • What is the min. and max. no. of tuples of the
    output?
Write a Comment
User Comments (0)
About PowerShow.com