COMP 30053905 Database Systems Data Modelling Week 7 - PowerPoint PPT Presentation

1 / 91
About This Presentation
Title:

COMP 30053905 Database Systems Data Modelling Week 7

Description:

each attribute of the entity becomes an attribute (or column) of the relation ... A - B and B - C. decomposes with lossless join to. R1(A, B) and R2(B, C) ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 92
Provided by: greg99
Category:

less

Transcript and Presenter's Notes

Title: COMP 30053905 Database Systems Data Modelling Week 7


1
COMP 3005/3905Database Systems - Data Modelling
Week 7
2
From Conceptual Model to Relational Design
3
Conversion of E-R Design
  • Simplest solution involves
  • creating a relation for each entity type
  • each attribute of the entity becomes an attribute
    (or column) of the relation
  • creating a relation for each relationship type
  • where identifiers of the related entities become
    attributes (columns) of the relation
  • and the attributes of the relationship type
    become attributes (columns) in the relation

4
Alternative Conversions
  • A many-to-one binary relationship can be
    represented by having the identifying attributes
    of the one side added as columns in the relation
    representing the entity type on the many side
  • If participation is partial, use null values

5
Alternative Conversions
  • e.g. Employee works-in Department (N1)
  • each employee works-in at most one department
  • Employee relation becomesEmp(StaffID, Name,
    ..., DeptID)

6
Converting Relationships
  • 11 - Both Mandatory Entities

7
Converting Relationships
  • 11 - One optional entity

8
Converting Relationships
  • 11 - Both optional

9
Converting Relationships
  • 1M - Both mandatory

10
Converting Relationships
  • 1M Both optional

11
Converting Relationships
  • MM Both optional

12
Converting Relationships
  • 11 Both Mandatory - Recursive

13
Converting Relationships
  • 11 Both optional - Recursive

14
Converting Relationships
  • 1M - One optional - Recursive

15
Converting Relationships
  • 1M - Both mandatory - Recursive

16
Converting Relationships
  • MM -Both Mandatory - Recursive

17
Converting Relationships
  • 111 - All mandatory - Ternary

18
Converting Relationships
  • KMN - All mandatory - Ternary

19
Converting Relationships
  • 1MN - All mandatory - Ternary

20
Converting Relationships
  • 11N - All mandatory - Ternary

21
Converting Relationships
  • Generalisation Hierarchy

22
Converting Relationships
  • Subset Hierarchy

23
Converting Extended E-R Designs
  • Composite (record) attributes flatten out so that
    each element of the composite record becomes a
    column in the relation for the entity type
  • Multivalued (set) attributes are modelled as a
    separate relation with columns for the identifier
    of the entity and an element of the attribute set

24
Converting Extended E-R Designs
  • Specialisation and generalisation alternatives
  • have a single table for the superclass
  • columns for all attributes from all subclasses
  • many entries are null
  • have a separate table for each subclass
  • columns for attributes of superclass as well as
    subclass attributes
  • have a table for the superclass and also tables
    for each subclass
  • superclass id is one column in each subclass

25
(No Transcript)
26
Normalisation - Week 8
27
Logical Definitions
  • To precisely define the normal forms which
    represent good design we need careful definition
    of characteristics of the data, in particular
  • data dependencies
  • keys, superkeys and candidate keys

28
Functional Dependency
  • Given R(A, B, C, D, E) then A -gt B if there are
    no tuples in R
  • with the same A value
  • and different B values
  • We say A determines B or B depends on A
  • The functional dependency expresses properties of
    the real world. It has to be decided based on
    interpretation of the real world constraints

29
Functional Dependency
  • In the definition above, A and B can be sets of
    attributes as well as individual attributes
  • In this case we describe A -gt B as a full
    functional dependency if there is no subset of A,
    which also determines B
  • A relation may have several functional
    dependencies constraining it
  • A functional dependency can involve all the
    attributes of a relation e.g. A, C -gt B, E, D

30
Example Functional Dependency
  • Supplies(PartNo, SuppName, SuppAddress)
  • SuppName -gt SuppAddress
  • as each supplier has a single address (in this
    example)
  • So the anomalous case of having two different
    addresses entered for a supplier in this relation
    would contradict that functional dependency

31
Another Example
  • Duty(ID, Project, Location)
  • ID, Project -gt Location
  • each combination of ID and Project has a single
    location
  • Location -gt Project
  • no more than one project can occur at each
    location

32
Keys
  • Once the functional dependencies are known for a
    given relation we can determine the keys
  • Superkey
  • a set of attributes on which all the attributes
    of the relation are functionally dependent
  • i.e. no two tuples can have the same values for
    the superkey attributes
  • Candidate Key
  • a set of attributes that is a superkey where no
    subset forms a superkey

33
Keys
  • Database designer should choose one candidate key
    and use it as the identifier for each relation -
    the primary key
  • Most relational database systems provide system
    support for nomination of keys

34
Examples
  • Supplies(PartNo, SuppName, SuppAddress)
  • SuppName -gt SuppAddress
  • candidate key is (PartNo, SuppName)
  • Duty(ID, Project, Location)
  • ID, Project -gt Location Location -gt Project
  • two candidate keys
  • (ID, Project)
  • (ID, Location)

35
Derived Functional Dependencies
  • Given a relational schema and a list of
    functional dependencies, we want to see if other
    dependencies can be derived
  • If PartNo -gt SuppName andSuppName -gt
    SuppAddressthen PartNo -gt SuppAddress

36
Derived Functional Dependencies
  • Shipment
  • (PartNo, Colour, SuppName, SuppAddress)
  • if PartNo -gt Colour and SuppName -gt Addressthen
    PartNo, SuppName -gt Colour, Address
  • Such derivations have been formalised

37
Armstrongs Axioms
38
Normal Forms
39
First Normal Form
  • A table is in first normal form (1NF) if it has
    no multi-valued (repeating) fields
  • i.e. the table obeys the definition of a
    relation
  • All standard relational database products enforce
    this as a matter of course

40
Second Normal Form
  • Given a relation R, and a set of functional
    dependencies F on Rif every non-prime attribute
    is fully functionally dependent on each candidate
    key then the relation is in second normal form
    (2NF).
  • i.e. no partial key dependency

41
Third Normal Form
  • A relation R is said to be in third normal form
    (3NF) if for any functional dependency X -gt A,
    where A is a single attribute not in X,then
    either
  • X is a superkey for R
  • or A is a prime attribute in R
  • i.e. No transitive dependency

42
Boyce-Codd Normal Form
  • Suppose we are given a relationR(X1, X2, X3,
    ..., Xn) and a list of the functional
    dependencies which hold for Rthen R is in
    Boyce-Codd normal form (BCNF) if for every
    functional dependency X -gt Y
  • either Y is a subset of X
  • or X is a superkey for R

43
BCNF Example
  • Supplies(PartNo, SuppName, SuppAddress)
  • Functional dependencies
  • PartNo -gt SuppNameSuppName -gt SuppAddress
  • This is not in BCNF as SuppName is on the left
    hand side of a functional dependency and is not a
    superkey

44
BCNF Example
  • Duty(EmpNo, Project, Location)
  • Functional dependencies EmpNo, Project -gt
    Location
  • This is in BCNF as EmpNo, Project is a
    superkey for the relation Duty

45
Improving a Design
  • If a database conceptual schema includes one or
    more relations that are not in BCNF, then seek
    another design for the same collection of facts
  • One common approach for dealing with the problem
    of a relation not being in BCNF is decomposition

46
Decomposition
  • If relation R is not in the desired normal form,
    decompose it into a set of equivalent relations
    which contain between them the same set of
    attributes as R
  • The contents of each of the new decomposed
    relations is a projection of some subset of
    attributes of R

47
Decomposition
  • Not all ways of decomposing a relation are
    appropriate
  • Some queries may be answered in the original
    design, but not in the decomposed relations
  • e.g. consider
  • Supplies(PartNo, SuppName, SuppAddress)
  • PartNo -gt SuppName
  • SuppName -gt SuppAddress

48
Decomposition
  • Decompose this relation into R1(PartNo,
    SuppAddress)R2(SuppName, SuppAddress)
  • Valid decompositions in terms of projections, but
    we can no longer answer the query Which
    supplier supplies part 514?

49
Lossless Join Condition
  • We say a decomposition has the lossless join
    property provided that for every legal state of
    the initial relation R, the natural join of the
    corresponding decomposed relations is equal to
    the original state
  • Note that the natural join always includes
    theoriginal state. The lossless join condition
    says that no spurious tuples are produced

50
Theorem
  • This important theorem is presented here without
    proofFor every relation R there exists a
    decomposition that has lossless join and where
    every new relation is in BCNF

51
Example
  • Note that you have to consider the whole set of
    dependencies presente.g. R(A, B, C) with
    functional dependencies
  • A -gt B and B -gt C
  • decomposes with lossless join to R1(A,
    B) and R2(B, C)

52
Example
  • But the same relation with only the functional
    dependencyA -gt Bdecomposes without lossless
    join to R1(A, B) and R2(B, C)

53
Preservation of Dependencies
  • A decomposition preserves dependencies if the set
    of dependencies generated by those holding in the
    new relations is equal to the set of dependencies
    held in the original relation
  • We have to consider all dependencies in the
    original, not just a generating, or covering, set

54
When to stop normalisation?
  • The standard 3NF decomposition eliminates most
    anomalies and makes it possible to verify
    efficiently that desired functional dependencies
    remain valid when the database is updated
  • Over normalisation, decomposing into more tables
    than necessary should be avoided because of the
    overheads arising from joins on queries

55
Further Normal Forms
  • Further normal forms exist and deal with
    particular unusual anomalies or generalise the
    theory
  • 4NF is based on a new type of dependency called a
    multi-valued dependency
  • Other normal forms are 5NF and Projection/Join
    Normal Form

56
Why Further Normal Forms?
  • Consider a relational database which models a
    system about teachers, the textbooks they
    recommend, and the courses they teach, with the
    following constraints
  • a teacher can teach many courses
  • a course can be taught by many teachers
  • a teacher recommends the same set of textbooks
    for all courses they teach
  • not all teachers recommend the same set of
    textbooks

57
Why Further Normal Forms?
58
Dependencies and Keys?
  • Are there any non-trivial functional
    dependencies in this relation?
  • No, none of the constraints map to functional
    dependencies.
  • What are the superkeys for the relation?
  • You need each of the three attributes to forma
    key for the relation
  • What normal form is the relation in?
  • BCNF, as the only functional dependencies are
    those like
  • Teacher, Textbook, Course -gt Teacher

59
So what is the problem?
  • Consider the following update scenario
  • fekete wants to promote his new book as a
    recommended textbook in all of the courses he
    teaches
  • What table updates are required?
  • there are as many row inserts required as there
    are courses which fekete teaches
  • Wherever there is such duplication we have the
    potential for anomalous information to be stored
    in error
  • i.e. data which contravenes the constraints

60
Multivalued Dependency
  • The problem with this particular relation is
    caused by the presence of multivalued
    dependencies (MVDs).
  • Informally
  • the set of textbooks a teacher recommends is
    independent of which courses the teacher is
    teaching
  • the set of courses a teacher teaches is
    independent of the textbooks that teacher
    recommends

61
Multivalued Dependency
  • These facts are expressed as the multivalued
    dependencies
  • Teacher -gt-gt Textbook
  • Teacher -gt-gt Course
  • Notice the complementary nature of the
    multivalued dependencies
  • The functional dependencies we defined earlier
    are a special case of a multivalued dependency

62
Fourth Normal Form
  • Fourth Normal Form (4NF) parallels BCNF, but uses
    the extended idea of multivalued dependencies
  • Informally, a relation is in Fourth Normal Form
    if all the non-trivial multivalued dependencies
    are the results of keys for the relation
  • Recall that a functional dependency is a special
    case of a multivalued dependency, so 4NF includes
    BCNF

63
How far to go with normalisation
  • There are further normal forms defined which will
    not be considered in this course
  • How far should one go with normalisation?
  • 3NF? BCNF? 4NF?
  • Entity relationship methodology and translation
    to the relational model will usually provide a
    good enough relational implementation
  • usually in 3NF, BCNF, often in 4NF
  • Normalisation can then be used for dealing with
    difficult dependencies or to justify the schema
    chosen

64
E-R Design and Normalisation
  • A relational implementation can be approached
    either from a conceptual model using the
    Entity-Relationship approach, or by the process
    of normalisation
  • A good E-R model produces a good relation
    implementation, where goodness is now measured in
    terms of the normal form to which the model
    complies

65
Integrity and Constraints
  • Integrityensuring that the database state
    satisfies all constraints after each update

66
Approaches
  • One can place checks of integrity into each
    database application
  • Usually better to have the DBMS make these checks
    internally, as it is generally
  • safer
  • easier to manage
  • easier to modify
  • Again, this is usually seen as a job of the DBA

67
Constraints
  • SQL as a data definition language includes
    commands that declare constraints
  • Current SQL standard dictates that integrity
    constraints are specified in the create table
    command
  • Constraints may be assigned a name, which is used
    in error reporting when an update would
    contravene a constraint
  • Constraints are specified in a non-procedural way
  • Key constraints maintain referential integrity of
    a database

68
Key Constraint
  • Declare that a column of a relation is not to
    have duplicate or null values
  • create table Part (
  • PartNo integer unique not null,Description char
    (20),Weight integer,Colour char(10),
  • Instock integer
  • )

69
Alternative Syntax
  • create table Part (
  • PartNo integer primary key,Description char(20)
    ,Weight integer,Colour char(10),
  • Instock integer
  • )
  • Note that some systems do not enforce primary key
    unless an index is present on that attribute

70
Alternative Syntax
  • create table Part (
  • PartNo integer
  • constraint pk_Part primary key,Description char(
    20),Weight integer,Colour char(10),
  • Instock integer
  • )
  • In this case the constraint is assigned a name
    which becomes available in error reporting

71
Simple Constraints
  • SQL92 allows flexible expression of a check
    constraint, a condition which must not be false
    for any row of a table
  • create table Part (...,
  • Instock integer,check (weight gt 0 and Instock
    gt0)
  • )
  • Not all DBMS enforce check conditions

72
Advanced Constraints
  • Use a subquery in check to express more complex
    constraints
  • check (
  • Instock lt (
  • select sum(Maxqty)
  • from Availability
  • where
  • Availability.PartNo PartNo)
  • )

73
Advanced Constraints
  • Define a multi-attribute primary key
  • create table Availability (PartNo integer,Supp
    lierId integer,Price integer,Maxqty integer
    ,
  • constraint avail_pk
  • primary key (PartNo, SupplierId)
  • )

74
Advanced Constraints
  • Instead of defining constraints in a table
    declaration, an assertion can be declared on its
    own
  • create assertion no_overload_256
  • check (
  • (select sum (P.Weight A.Maxqty)
  • from Part P, Availability A
  • where A.PartNo P.PartNo
  • and A.SupplierId 256) lt 1000
  • )

75
Foreign Key Constraint
  • An attribute is a foreign key for a relation if
    every value it assumes occurs in an attribute of
    another relation
  • it usually has the same name in both relations
  • it is usually a primary key in the other relation
  • create table Availability (...constraint
    supp_fkforeign key (SupplierId)references
    Supplier(SupplierId)
  • )

76
Triggers
  • Describe an action to take when an event takes
    place in a database
  • events like insert, delete, update
  • more powerful than constraints
  • procedural (triggered procedures)
  • not in SQL92, but present in some systems
  • will be standardised in SQL3

77
(No Transcript)
78
Database Design is Difficult
  • Michael Stonebraker () presents 6 main reasons
    why people have trouble with database design
  • difficulty in tuning the schema
  • unwillingness to iterate the schema
  • unrealistic application design
  • failure to load and test a realistic database
    early
  • difficulty in capturing the rest of the schema
  • excessive focus on formal modelling
  • ( Object-Relational DBMSs - The Next Great Wave
  • - Stonebraker Moore, Morgan Kaufmann 1996)

79
Performance Tuning the Schema
  • Find all the employees on the first floor
  • Consider the normalised model where there is a
    separate employee and department relation, with
    the floor being an attribute of the department
    relation
  • This form would require a join
  • Alternative is to denormalise and include the
    department dependent floor attribute in the
    employee relation
  • Deciding between the options requires expertise
    in the underlying DBMS and the applications

80
Iterate the Schema
  • Database schema design is an iterative process
  • Changes in the schema usually requires
    maintenance of applications
  • It is usually a mistake to freeze the schema
    early in an attempt to save flow-on changes to
    applications
  • Mistakes in the schema usually cost more in the
    long run
  • Expect schema design to consume significant
    resources (10-20 of a projects budget)

81
Realistic Application Design
  • Choose the most appropriate site for your data
    processing
  • For example, input checking may best be done
    within the data entry application, rather than by
    the use of complex constraints or triggers
  • DBMS features usually incur a cost which has to
    be weighed up against any potential advantage

82
Set Realistic Performance Goals
  • A common mistake is to fail to test prototypes of
    the system with realistic data sets
  • Load a large database early
  • Test mock-ups of the transactions you expect to
    be most common and/or most critical
  • Identifying performance problems early gives you
    some chance of rectifying the them

83
Capture the Business Rules
  • An application must enforce a wide variety of
    business rules in most application domains
  • Extracting the rules from the users can be
    difficult
  • Knowing how to react to breaches of the rules can
    be difficult

84
Focus on Formal Modelling
  • A large range of methodologies and accompanying
    tools exist for database design and database
    application analysis
  • Finding the right tool for the job can be a
    difficult task in itself
  • Dont let the search for a modelling tool or
    methodology distract you from design for too long

85
Evaluation of Relational Designs
  • A good design of a database schema
  • avoids anomalies, which are possibilities for
    inconsistent update
  • avoids unnecessary redundancy, where information
    is repeated
  • These concepts are formalised by requiring that a
    relational schema be in normal form
  • Sensible Entity-Relationship design usually leads
    to a good relational design

86
A Bad Example
  • Supplies(PartNo, SuppName, Address)
  • redundancy address is repeated for each part
    supplied
  • update anomaly an address can be changed for
    one part, but not for all from that supplier
  • deletion anomaly if a supplier supplies no
    parts at the moment, the name and address are lost

87
A Bad Example
Supplies
88
A Better Example
  • Associate the attributes which describe
    individual suppliers together into one relation
  • Model the act of supply in a separate relation
  • Supplier(SuppName, Address)
  • Supplies(PartNo, SuppName)

89
A Better Example
Aside What Relational Algebraic
operation gets us from the bad example to the
better example?
Supplies
Supplier
90
Evaluation of Relational Designs
  • A good design
  • avoids anomalies, which are possibilities for
    inconsistent update
  • avoids unnecessary redundancy, where information
    is repeated
  • These concepts are formalised by requiring that a
    relational schema be in normal form
  • Sensible E-R design usually leads to a good
    relational design

91
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com