Database Design and The Entity-Relationship Model - PowerPoint PPT Presentation

1 / 31

About This Presentation

Title:

Database Design and The Entity-Relationship Model

Description:

passes = 1 Ceil(log(B-1)(N/B)) = 1 Ceil(log(95)(10,000/96)) = 1 2 = 3 ... M Ceil(M/(B-2)) * N ... Join cost: 10,000 Ceil(10/(100-2)) * (550) = 10,550 ... – PowerPoint PPT presentation

Number of Views:24

Avg rating:3.0/5.0

Slides: 32

Provided by: RaghuRama129

Category:

more less

Transcript and Presenter's Notes

Title: Database Design and The Entity-Relationship Model

1
Database Design andThe Entity-Relationship Model

Lecture 21
RG - Chapter 2

A relationship, I think, is like a shark, you
know? It has to constantly move forward or it
dies. And I think what we got on our hands is a
dead shark. Woody Allen (from Annie Hall, 1979)
2
Administriva

Homework 4 Due One Week From Today
Midterm is graded

3
Exam Summary

Average 74.5
Max 93.5
Min 36.5
Std Dev 11.6

4
Exam Details

Section Low High / Total (Average)
1 External Sorting 10 24 / 24 (18)
2 Query Execution 17 43 / 44 (36)
3 Query Optim. 6.5 29 / 32 (20)

5
Section 1 External Sorting

For all questions in this section, assume that
you have 96 pages in the buffer pool, and you
wish to sort EMP by Ename.
Using the general external sort, how many sorting
passes are required? (6pts)
passes 1 Ceil(log(B-1)(N/B))
1 Ceil(log(95)(10,000/96))
1 2
3
What is the I/O cost of doing this sort? (6pts)
I/O Cost 2N ( passes)
20,000 3

6
Section 1 (cont)

Database systems sometimes use blocked I/O
because reading a block of continuous pages is
more efficient that doing a separate I/O for each
page. Assume that in our computer system, it is
much more efficient to read and write blocks of
32 pages at a time, so all reads and writes from
files must be in blocks of 32 pages (and if a
file has less than 32 pages, it is padded with
blank pages). Consider doing external sort with
blocked I/O, which gets faster I/O at the expense
of more sorting passes. If the database must read
and write 32 pages at a time, how many sorting
passes are required? Show either the formula you
use, or the temp file sizes after each pass.
(4pts)
Pass 0 105 runs of 96 pages
Pass 1 2-way merge 96 96 -gt result is 53 runs
of 192 pages
Pass 2 2-way merge of 192 192 -gt result is 27
runs of 384 pages
Pass 3 -gt result is 14 runs of 768 pages,
Pass 4 -gt result is 7 runs 1536 pages,
Pass 5 -gt result is 4 runs 3072 pages,
Pass 6 -gt result is 2 runs of 6144,
Pass 7 -gt result is 10,000
8 passes total

7
Section 1 (cont)

Blocked I/O is used because reading a 32-page
block is faster than 32 separate 1-page I/Os.
For each sorting pass, you still must read and
write every page in the file, but instead of
doing 10,000 1-page I/Os, instead you do
(10,000/32) block I/Os. Assume that in our
system, we can read a 32-page block in the time
it would normally take to do 8 single-page I/Os.
Is the blocked I/O sort faster or slower than a
regular sort from question 2? By approximately
what ratio? (4pts)
Assume that instead of a heap file, the records
from EMP are stored a clustered B-Tree index,
whose key is Ename, using alternative 1 (i.e.,
the full data records are stored in the leaves of
the tree). The B-tree has depth 3. Assuming the
B-Tree already exists, what is the approximate
I/O cost to use the B-tree to get the records in
sorted order? (4pts)

8
Section 2 Relational Operators

Consider the operation ? (EID lt 5000)EMP
What is the I/O cost of this operation if there
is no index? (4pts)
Lecture 15 slide 9 no index -gt sequential scan,
N I/Os, meaning 10,000 I/Os
What is the reduction factor? (4pts)
5000/100,000 0.05
Assume that instead of a heap file, the records
from EMP are stored in a clustered B-Tree index,
whose key is Ename, using alternative 1 (i.e.,
the full data records are stored in the leaves of
the tree). The B-tree has depth 3. Assuming
the B-Tree already exists, what is the I/O cost
of this selection operation? (4pts)
Due to a copy paste error on my part, the B-Tree
Index is not useful for the query. I gave
credit either way, whether people tried to use
the index as an index, or for sequential scan.
In considering the cost of using the index,
either as an index or for a scan, full credit
required knowing that B-Trees have an overhead of
approximately 50, i.e., there are 50 more
leaves than the number of pages in a heap file.

9
Relational Operators (cont)

Consider the query select distinct Ename from
EMP order by Ename
If you had 96 pages of memory, what algorithm
would you use to execute this query and why?
(4pts)
Order by suggests sort-merge to remove dups.
If, instead, the query were select distinct
Ename from EMP and you had 101 pages of memory,
what algorithm would you use and why? (4pts)
Could use hashing to remove dups, which would
have a similar I/O cost but be more CPU efficient.

10
Relational Operators (cont)

Consider the join In_Dept ? Dept
Assuming 100 pages of memory, what is the I/O
cost of this join using Blocked Nested Loops?
(4pts)
Lec 15, slide 27 Blocked nested loops M
Ceil(M/(B-2)) N
550 Ceil(550/100)) 20 550 6 20 670
What is the I/O cost of this using Index Nested
Loops, with an unclustered Hash index on
Dept.DID? Assume an average cost of 1.2 I/Os to
get to the right hash bucket. (4pts)
Lec 15, slide 27 index nested loops M tuples
in M index cost
550 110,000 (1.2 1) 242,000
Consider the join Dept ? In_Dept. Assuming 100
pages of memory, what is the I/O cost of this
using Sort-Merge-Join, optimized so that the last
merge pass is combined with the join if possible?
(4pts)
Best 1 pass over In_Dept, to make 6 runs, sort
Dept in memory, merge. Cost 5503 20 1670
By the book 3(M N) 3570 1710
Assuming 100 pages of memory, what is the I/O
cost of this using regular (not hybrid) hash
join? (4pts)
Best partition In_Dept (2 550), hash Dept in
memory, match. Cost 550 3 20 1670
By the book 3(M N) 1710

11
Relational Operators (cont)

Consider the join (? (EID lt 100)EMP) ?
In_Building Assume there is no index on EMP, and
an unclustered hash index on In_Building.EID, and
plan operators are pipelined where possible. You
have 100 pages of memory. Assume that it takes
on average 1.2 I/Os get reach the appropriate
hash bucket.
What is the I/O cost of this using Blocked Nested
Loops? (4pts)
M Ceil(M/(B-2)) N
If you do not push select down10,000
Ceil(10,000/(100-2)) 550 10,000 103550
66,650
If you do push select, pipeline result Size of
selection in pages is S (I/100,000)10,000
I/10 10
Join cost 10,000 Ceil(10/(100-2)) (550)
10,550
What is the I/O cost of this using Index Nested
Loops? (4pts)
If you do push select, pipeline result Size of
selection in pages is
S (I/100,000)10,000 I/10 10
each EID will have 1.1 matching tuples.
Join cost M ( tuples index cost)
10,000 100 (1.2 1.1) 10,230

12
Section 3 Query Optimization

There are several parts of an SQL query that can
cause a column (or columns) to be considered an
"interesting order", meaning that an ordered
input can lower the cost of evaluating that part
of the query. Briefly describe three of these,
and for each one briefly explain why. (6 pts)
Order By, Group By, Joins, Distinct, some
aggregations (e.g. Max, Min)
Write the reduction factor that would be used by
a System R style query optimizer for each of the
following predicates (6pts)
Building.BID gt 150
BID ranges from 1 to 200, so this would be ¼.
EMP.Ename Joe
We dont know about distribution of names, so
1/10.
IN_DEPT.EID 003
Since there are 100,000 employees, this is
1/100,000
Given the following query, where X is the join
operator
p(EMP.Ename) ?(Dept.Budget gt 500000) (Emp X
In_Dept X Dept)
Mark whether each of the following queries are
equivalent (True/False).
__T___p(EMP.Ename) ?(Dept.Budget gt 500000) (Emp X
Dept X In_Dept)
__T___p(EMP.Ename) ?(Dept.Budget gt 500000) (Dept
X In_Dept X Emp)
__T___p(EMP.Ename) (Emp X In_Dept X ?(Dept.Budget
gt 500000)(Dept))
__F___?(Dept.Budget gt 500000) (p(EMP.Ename) (Emp)
X In_Dept X Dept)

13
Query Optimization (cont)

Given the schema on the first page of the exam,
assume the following
There are unclustered hash indexes on both
Emp.EID and Dept.DID.
There are clustered B-Tree Indexes (alt 1 data
records store in the B-Tree leaves) on both
Emp.EID and Dept.DID.
There is an unclustered Btree index on
(Emp.Salary, Emp.EID)
You can assume that Btree-Indexes have heights of
3. And the cost for getting to the data record
using a hash index is 2.
What is the best access plan for the following
query (4pts).
SELECT E.eid, E.Salary
FROM Emp E
WHERE E.Salary gt 100,000
Very best is unclustered B-tree, index only plan
Draw all possible join orders considered by a
System-R style query optimizer for the following
query. (4pts)
SELECT E.eid, D.dname
FROM Emp E, In_Dept I, Dept D
WHERE E.sal 64,000 AND D.budget gt 500,000 AND
E.eid I.eid AND I.did D.did
(E X I) X D)
(I X E) X D)

14
Where are we?

This week Database Design (me)
The ER Model as a design tool
Next Week Concurrency Control Recovery (Prof.
Roth)
Week after More Database Design (me)

15
Today and Thursday The ER Model

A different data model from Relational
Most commonly used for database design
Today Details of the ER Model
Thursday Translating ER Schemas to Relational

16
Databases Model the Real World

Data Model translates real world things into
structures computers can store
Many models
Relational, E-R, O-O, Network, Hierarchical, etc.
Relational
Rows Columns
Keys Foreign Keys to link Relations

Enrolled
Students
17
Database Design

The process of modelling things in the real world
into elements of a data model.
I.E., describing things in the real world using a
data model.
E.G., describing students and enrollments using
various tables with key/foreign key relationships
The Relational model is not the only model in use

18
A Problem with the Relational Model
CREATE TABLE Students (sid CHAR(20), name
CHAR(20), login CHAR(10), age INTEGER, gpa
FLOAT)
CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2))

With complicated schemas, it may be hard for a
person to understand the structure from the data
definition.

Enrolled
Students
19
One Solution The E-R Model

Instead of relations, it has
Entities and Relationships
These are described with diagrams,
both structure, notation more obvious to humans
A visual language for describing schemas

20
Steps in Database Design

Requirements Analysis
user needs what must database do?
Conceptual Design
high level descr (often done w/ER model)
Logical Design
translate ER into DBMS data model
Schema Refinement (in 2 weeks)
consistency, normalization
Physical Design (discussed already)
indexes, disk layout
Security Design
who accesses what, and how

21
Conceptual Design

Define enterprise entities and relationships
What information about entities and relationships
should be in database?
What are the integrity constraints or business
rules that hold?
A database schema in the ER Model is
represented pictorially (ER diagrams).
Can map an ER diagram into a relational schema.

22
ER Model Basics

Entity
Real-world thing, distinguishable from other
objects.
Entity described by set of attributes.
Entity Set A collection of similar entities.
E.g., all employees.
All entities in an entity set have the same set
of attributes. (Until we consider hierarchies,
anyway!)
Each entity set has a key (underlined).
Each attribute has a domain.

23
ER Model Basics (Contd.)

Relationship Association among two or more
entities. E.g., Attishoo works in Pharmacy
department.
relationships can have their own attributes.
Relationship Set Collection of similar
relationships.
An n-ary relationship set R relates n entity sets
E1 ... En each relationship in R involves
entities e1 ? E1, ..., en ? En

24
ER Model Basics (Cont.)
subor-dinate
super-visor

Same entity set can participate in different
relationship sets, or in different roles in the
same set.

25
Key Constraints

An employee can work in many departments a dept
can have many employees.

In contrast, each dept has at most one manager,
according to the key constraint on Manages.
26
Participation Constraints

Does every employee work in a department?
If so, this is a participation constraint
the participation of Employees in Works_In is
said to be total (vs. partial)
What if every department has an employee working
in it?
Basically means at least one

since
since
name
dname
name
dname
ssn
lot
budget
did
budget
did
Departments
Employees
Manages
Works_In
since
27
Weak Entities

A weak entity can be identified uniquely only by
considering the primary key of another (owner)
entity.
Owner entity set and weak entity set must
participate in a one-to-many relationship set
(one owner, many weak entities).
Weak entity set must have total participation in
this identifying relationship set.

Weak entities have only a partial key (dashed
underline)
28
Binary vs. Ternary Relationships

If each policy is owned by just 1 employee

Bad design
Key constraint on Policies would mean policy
can only cover 1 dependent!

Think through all the constraints in the 2nd
diagram!

29
Binary vs. Ternary Relationships (Contd.)

Previous example illustrated case when two binary
relationships were better than one ternary
relationship.
Opposite example a ternary relation Contracts
relates entity sets Parts, Departments and
Suppliers, and has descriptive attribute qty. No
combination of binary relationships is an
adequate substitute.