Database Design and The Entity-Relationship Model - PowerPoint PPT Presentation

1 / 31
About This Presentation
Title:

Database Design and The Entity-Relationship Model

Description:

passes = 1 Ceil(log(B-1)(N/B)) = 1 Ceil(log(95)(10,000/96)) = 1 2 = 3 ... M Ceil(M/(B-2)) * N ... Join cost: 10,000 Ceil(10/(100-2)) * (550) = 10,550 ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 32
Provided by: RaghuRama129
Category:

less

Transcript and Presenter's Notes

Title: Database Design and The Entity-Relationship Model


1
Database Design andThe Entity-Relationship Model
  • Lecture 21
  • RG - Chapter 2

A relationship, I think, is like a shark, you
know? It has to constantly move forward or it
dies. And I think what we got on our hands is a
dead shark. Woody Allen (from Annie Hall, 1979)
2
Administriva
  • Homework 4 Due One Week From Today
  • Midterm is graded

3
Exam Summary
  • Average 74.5
  • Max 93.5
  • Min 36.5
  • Std Dev 11.6

4
Exam Details
  • Section Low High / Total (Average)
  • 1 External Sorting 10 24 / 24 (18)
  • 2 Query Execution 17 43 / 44 (36)
  • 3 Query Optim. 6.5 29 / 32 (20)

5
Section 1 External Sorting
  • For all questions in this section, assume that
    you have 96 pages in the buffer pool, and you
    wish to sort EMP by Ename.
  • Using the general external sort, how many sorting
    passes are required? (6pts)
  • passes 1 Ceil(log(B-1)(N/B))
  • 1 Ceil(log(95)(10,000/96))
  • 1 2
  • 3
  • What is the I/O cost of doing this sort? (6pts)
  • I/O Cost 2N ( passes)
  • 20,000 3

6
Section 1 (cont)
  • Database systems sometimes use blocked I/O
    because reading a block of continuous pages is
    more efficient that doing a separate I/O for each
    page. Assume that in our computer system, it is
    much more efficient to read and write blocks of
    32 pages at a time, so all reads and writes from
    files must be in blocks of 32 pages (and if a
    file has less than 32 pages, it is padded with
    blank pages). Consider doing external sort with
    blocked I/O, which gets faster I/O at the expense
    of more sorting passes. If the database must read
    and write 32 pages at a time, how many sorting
    passes are required? Show either the formula you
    use, or the temp file sizes after each pass.
    (4pts)
  • Pass 0 105 runs of 96 pages
  • Pass 1 2-way merge 96 96 -gt result is 53 runs
    of 192 pages
  • Pass 2 2-way merge of 192 192 -gt result is 27
    runs of 384 pages
  • Pass 3 -gt result is 14 runs of 768 pages,
  • Pass 4 -gt result is 7 runs 1536 pages,
  • Pass 5 -gt result is 4 runs 3072 pages,
  • Pass 6 -gt result is 2 runs of 6144,
  • Pass 7 -gt result is 10,000
  • 8 passes total

7
Section 1 (cont)
  • Blocked I/O is used because reading a 32-page
    block is faster than 32 separate 1-page I/Os.
    For each sorting pass, you still must read and
    write every page in the file, but instead of
    doing 10,000 1-page I/Os, instead you do
    (10,000/32) block I/Os. Assume that in our
    system, we can read a 32-page block in the time
    it would normally take to do 8 single-page I/Os.
    Is the blocked I/O sort faster or slower than a
    regular sort from question 2? By approximately
    what ratio? (4pts)
  • Assume that instead of a heap file, the records
    from EMP are stored a clustered B-Tree index,
    whose key is Ename, using alternative 1 (i.e.,
    the full data records are stored in the leaves of
    the tree). The B-tree has depth 3. Assuming the
    B-Tree already exists, what is the approximate
    I/O cost to use the B-tree to get the records in
    sorted order? (4pts)

8
Section 2 Relational Operators
  • Consider the operation ? (EID lt 5000)EMP
  • What is the I/O cost of this operation if there
    is no index? (4pts)
  • Lecture 15 slide 9 no index -gt sequential scan,
    N I/Os, meaning 10,000 I/Os
  • What is the reduction factor? (4pts)
  • 5000/100,000 0.05
  • Assume that instead of a heap file, the records
    from EMP are stored in a clustered B-Tree index,
    whose key is Ename, using alternative 1 (i.e.,
    the full data records are stored in the leaves of
    the tree). The B-tree has depth 3. Assuming
    the B-Tree already exists, what is the I/O cost
    of this selection operation? (4pts)
  • Due to a copy paste error on my part, the B-Tree
    Index is not useful for the query. I gave
    credit either way, whether people tried to use
    the index as an index, or for sequential scan.
  • In considering the cost of using the index,
    either as an index or for a scan, full credit
    required knowing that B-Trees have an overhead of
    approximately 50, i.e., there are 50 more
    leaves than the number of pages in a heap file.

9
Relational Operators (cont)
  • Consider the query select distinct Ename from
    EMP order by Ename
  • If you had 96 pages of memory, what algorithm
    would you use to execute this query and why?
    (4pts)
  • Order by suggests sort-merge to remove dups.
  • If, instead, the query were select distinct
    Ename from EMP and you had 101 pages of memory,
    what algorithm would you use and why? (4pts)
  • Could use hashing to remove dups, which would
    have a similar I/O cost but be more CPU efficient.

10
Relational Operators (cont)
  • Consider the join In_Dept ? Dept
  • Assuming 100 pages of memory, what is the I/O
    cost of this join using Blocked Nested Loops?
    (4pts)
  • Lec 15, slide 27 Blocked nested loops M
    Ceil(M/(B-2)) N
  • 550 Ceil(550/100)) 20 550 6 20 670
  • What is the I/O cost of this using Index Nested
    Loops, with an unclustered Hash index on
    Dept.DID? Assume an average cost of 1.2 I/Os to
    get to the right hash bucket. (4pts)
  • Lec 15, slide 27 index nested loops M tuples
    in M index cost
  • 550 110,000 (1.2 1) 242,000
  • Consider the join Dept ? In_Dept. Assuming 100
    pages of memory, what is the I/O cost of this
    using Sort-Merge-Join, optimized so that the last
    merge pass is combined with the join if possible?
    (4pts)
  • Best 1 pass over In_Dept, to make 6 runs, sort
    Dept in memory, merge. Cost 5503 20 1670
  • By the book 3(M N) 3570 1710
  • Assuming 100 pages of memory, what is the I/O
    cost of this using regular (not hybrid) hash
    join? (4pts)
  • Best partition In_Dept (2 550), hash Dept in
    memory, match. Cost 550 3 20 1670
  • By the book 3(M N) 1710

11
Relational Operators (cont)
  • Consider the join (? (EID lt 100)EMP) ?
    In_Building Assume there is no index on EMP, and
    an unclustered hash index on In_Building.EID, and
    plan operators are pipelined where possible. You
    have 100 pages of memory. Assume that it takes
    on average 1.2 I/Os get reach the appropriate
    hash bucket.
  • What is the I/O cost of this using Blocked Nested
    Loops? (4pts)
  • M Ceil(M/(B-2)) N
  • If you do not push select down10,000
    Ceil(10,000/(100-2)) 550 10,000 103550
    66,650
  • If you do push select, pipeline result Size of
    selection in pages is S (I/100,000)10,000
    I/10 10
  • Join cost 10,000 Ceil(10/(100-2)) (550)
    10,550
  • What is the I/O cost of this using Index Nested
    Loops? (4pts)
  • If you do push select, pipeline result Size of
    selection in pages is
  • S (I/100,000)10,000 I/10 10
  • each EID will have 1.1 matching tuples.
  • Join cost M ( tuples index cost)
  • 10,000 100 (1.2 1.1) 10,230

12
Section 3 Query Optimization
  • There are several parts of an SQL query that can
    cause a column (or columns) to be considered an
    "interesting order", meaning that an ordered
    input can lower the cost of evaluating that part
    of the query. Briefly describe three of these,
    and for each one briefly explain why. (6 pts)
  • Order By, Group By, Joins, Distinct, some
    aggregations (e.g. Max, Min)
  • Write the reduction factor that would be used by
    a System R style query optimizer for each of the
    following predicates (6pts)
  • Building.BID gt 150
  • BID ranges from 1 to 200, so this would be ¼.
  • EMP.Ename Joe
  • We dont know about distribution of names, so
    1/10.
  • IN_DEPT.EID 003
  • Since there are 100,000 employees, this is
    1/100,000
  • Given the following query, where X is the join
    operator
  • p(EMP.Ename) ?(Dept.Budget gt 500000) (Emp X
    In_Dept X Dept)
  • Mark whether each of the following queries are
    equivalent (True/False).
  • __T___p(EMP.Ename) ?(Dept.Budget gt 500000) (Emp X
    Dept X In_Dept)
  • __T___p(EMP.Ename) ?(Dept.Budget gt 500000) (Dept
    X In_Dept X Emp)
  • __T___p(EMP.Ename) (Emp X In_Dept X ?(Dept.Budget
    gt 500000)(Dept))
  • __F___?(Dept.Budget gt 500000) (p(EMP.Ename) (Emp)
    X In_Dept X Dept)

13
Query Optimization (cont)
  • Given the schema on the first page of the exam,
    assume the following
  • There are unclustered hash indexes on both
    Emp.EID and Dept.DID.
  • There are clustered B-Tree Indexes (alt 1 data
    records store in the B-Tree leaves) on both
    Emp.EID and Dept.DID.
  • There is an unclustered Btree index on
    (Emp.Salary, Emp.EID)
  • You can assume that Btree-Indexes have heights of
    3. And the cost for getting to the data record
    using a hash index is 2.
  • What is the best access plan for the following
    query (4pts).
  • SELECT E.eid, E.Salary
  • FROM Emp E
  • WHERE E.Salary gt 100,000
  • Very best is unclustered B-tree, index only plan
  • Draw all possible join orders considered by a
    System-R style query optimizer for the following
    query. (4pts)
  • SELECT E.eid, D.dname
  • FROM Emp E, In_Dept I, Dept D
  • WHERE E.sal 64,000 AND D.budget gt 500,000 AND
  • E.eid I.eid AND I.did D.did
  • (E X I) X D)
  • (I X E) X D)

14
Where are we?
  • This week Database Design (me)
  • The ER Model as a design tool
  • Next Week Concurrency Control Recovery (Prof.
    Roth)
  • Week after More Database Design (me)

15
Today and Thursday The ER Model
  • A different data model from Relational
  • Most commonly used for database design
  • Today Details of the ER Model
  • Thursday Translating ER Schemas to Relational

16
Databases Model the Real World
  • Data Model translates real world things into
    structures computers can store
  • Many models
  • Relational, E-R, O-O, Network, Hierarchical, etc.
  • Relational
  • Rows Columns
  • Keys Foreign Keys to link Relations

Enrolled
Students
17
Database Design
  • The process of modelling things in the real world
    into elements of a data model.
  • I.E., describing things in the real world using a
    data model.
  • E.G., describing students and enrollments using
    various tables with key/foreign key relationships
  • The Relational model is not the only model in use

18
A Problem with the Relational Model
CREATE TABLE Students (sid CHAR(20), name
CHAR(20), login CHAR(10), age INTEGER, gpa
FLOAT)
CREATE TABLE Enrolled (sid CHAR(20), cid
CHAR(20), grade CHAR(2))
  • With complicated schemas, it may be hard for a
    person to understand the structure from the data
    definition.

Enrolled
Students
19
One Solution The E-R Model
  • Instead of relations, it has
  • Entities and Relationships
  • These are described with diagrams,
  • both structure, notation more obvious to humans
  • A visual language for describing schemas

20
Steps in Database Design
  • Requirements Analysis
  • user needs what must database do?
  • Conceptual Design
  • high level descr (often done w/ER model)
  • Logical Design
  • translate ER into DBMS data model
  • Schema Refinement (in 2 weeks)
  • consistency, normalization
  • Physical Design (discussed already)
  • indexes, disk layout
  • Security Design
  • who accesses what, and how

21
Conceptual Design
  • Define enterprise entities and relationships
  • What information about entities and relationships
    should be in database?
  • What are the integrity constraints or business
    rules that hold?
  • A database schema in the ER Model is
    represented pictorially (ER diagrams).
  • Can map an ER diagram into a relational schema.

22
ER Model Basics
  • Entity
  • Real-world thing, distinguishable from other
    objects.
  • Entity described by set of attributes.
  • Entity Set A collection of similar entities.
    E.g., all employees.
  • All entities in an entity set have the same set
    of attributes. (Until we consider hierarchies,
    anyway!)
  • Each entity set has a key (underlined).
  • Each attribute has a domain.

23
ER Model Basics (Contd.)
  • Relationship Association among two or more
    entities. E.g., Attishoo works in Pharmacy
    department.
  • relationships can have their own attributes.
  • Relationship Set Collection of similar
    relationships.
  • An n-ary relationship set R relates n entity sets
    E1 ... En each relationship in R involves
    entities e1 ? E1, ..., en ? En

24
ER Model Basics (Cont.)
subor-dinate
super-visor
  • Same entity set can participate in different
    relationship sets, or in different roles in the
    same set.

25
Key Constraints
  • An employee can work in many departments a dept
    can have many employees.

In contrast, each dept has at most one manager,
according to the key constraint on Manages.
26
Participation Constraints
  • Does every employee work in a department?
  • If so, this is a participation constraint
  • the participation of Employees in Works_In is
    said to be total (vs. partial)
  • What if every department has an employee working
    in it?
  • Basically means at least one

since
since
name
dname
name
dname
ssn
lot
budget
did
budget
did
Departments
Employees
Manages
Works_In
since
27
Weak Entities
  • A weak entity can be identified uniquely only by
    considering the primary key of another (owner)
    entity.
  • Owner entity set and weak entity set must
    participate in a one-to-many relationship set
    (one owner, many weak entities).
  • Weak entity set must have total participation in
    this identifying relationship set.

Weak entities have only a partial key (dashed
underline)
28
Binary vs. Ternary Relationships
  • If each policy is owned by just 1 employee

Bad design
Key constraint on Policies would mean policy
can only cover 1 dependent!
  • Think through all the constraints in the 2nd
    diagram!

29
Binary vs. Ternary Relationships (Contd.)
  • Previous example illustrated case when two binary
    relationships were better than one ternary
    relationship.
  • Opposite example a ternary relation Contracts
    relates entity sets Parts, Departments and
    Suppliers, and has descriptive attribute qty. No
    combination of binary relationships is an
    adequate substitute.

30
Binary vs. Ternary Relationships (Contd.)
qty
Departments
Parts
Contract
VS.
Suppliers
Parts
Departments
needs
can-supply
deals-with
Suppliers
  • S can-supply P, D needs P, and D
    deals-with S does not imply that D has agreed
    to buy P from S.
  • How do we record qty?

31
Summary so far
  • Entities and Entity Set (boxes)
  • Relationships and Relationship sets (diamonds)
  • binary
  • n-ary
  • Key constraints (1-1,1-M, M-M, arrows on 1 side)
  • Participation constraints (bold for Total)
  • Weak entities - require strong entity for key
  • Next, a couple more advanced concepts
Write a Comment
User Comments (0)
About PowerShow.com