Title: COMP 30053905 Database Systems Data Modelling Week 7
1COMP 3005/3905Database Systems - Data Modelling
Week 7
2From Conceptual Model to Relational Design
3Conversion of E-R Design
- Simplest solution involves
- creating a relation for each entity type
- each attribute of the entity becomes an attribute
(or column) of the relation - creating a relation for each relationship type
- where identifiers of the related entities become
attributes (columns) of the relation - and the attributes of the relationship type
become attributes (columns) in the relation
4Alternative Conversions
- A many-to-one binary relationship can be
represented by having the identifying attributes
of the one side added as columns in the relation
representing the entity type on the many side - If participation is partial, use null values
5Alternative Conversions
- e.g. Employee works-in Department (N1)
- each employee works-in at most one department
- Employee relation becomesEmp(StaffID, Name,
..., DeptID)
6Converting Relationships
- 11 - Both Mandatory Entities
7Converting Relationships
8Converting Relationships
9Converting Relationships
10Converting Relationships
11Converting Relationships
12Converting Relationships
- 11 Both Mandatory - Recursive
13Converting Relationships
- 11 Both optional - Recursive
14Converting Relationships
- 1M - One optional - Recursive
15Converting Relationships
- 1M - Both mandatory - Recursive
16Converting Relationships
- MM -Both Mandatory - Recursive
17Converting Relationships
- 111 - All mandatory - Ternary
18Converting Relationships
- KMN - All mandatory - Ternary
19Converting Relationships
- 1MN - All mandatory - Ternary
20Converting Relationships
- 11N - All mandatory - Ternary
21Converting Relationships
22Converting Relationships
23Converting Extended E-R Designs
- Composite (record) attributes flatten out so that
each element of the composite record becomes a
column in the relation for the entity type - Multivalued (set) attributes are modelled as a
separate relation with columns for the identifier
of the entity and an element of the attribute set
24Converting Extended E-R Designs
- Specialisation and generalisation alternatives
- have a single table for the superclass
- columns for all attributes from all subclasses
- many entries are null
- have a separate table for each subclass
- columns for attributes of superclass as well as
subclass attributes - have a table for the superclass and also tables
for each subclass - superclass id is one column in each subclass
25(No Transcript)
26Normalisation - Week 8
27Logical Definitions
- To precisely define the normal forms which
represent good design we need careful definition
of characteristics of the data, in particular - data dependencies
- keys, superkeys and candidate keys
28Functional Dependency
- Given R(A, B, C, D, E) then A -gt B if there are
no tuples in R - with the same A value
- and different B values
- We say A determines B or B depends on A
- The functional dependency expresses properties of
the real world. It has to be decided based on
interpretation of the real world constraints
29Functional Dependency
- In the definition above, A and B can be sets of
attributes as well as individual attributes - In this case we describe A -gt B as a full
functional dependency if there is no subset of A,
which also determines B - A relation may have several functional
dependencies constraining it - A functional dependency can involve all the
attributes of a relation e.g. A, C -gt B, E, D
30Example Functional Dependency
- Supplies(PartNo, SuppName, SuppAddress)
- SuppName -gt SuppAddress
- as each supplier has a single address (in this
example) - So the anomalous case of having two different
addresses entered for a supplier in this relation
would contradict that functional dependency
31Another Example
- Duty(ID, Project, Location)
- ID, Project -gt Location
- each combination of ID and Project has a single
location - Location -gt Project
- no more than one project can occur at each
location
32Keys
- Once the functional dependencies are known for a
given relation we can determine the keys - Superkey
- a set of attributes on which all the attributes
of the relation are functionally dependent - i.e. no two tuples can have the same values for
the superkey attributes - Candidate Key
- a set of attributes that is a superkey where no
subset forms a superkey
33Keys
- Database designer should choose one candidate key
and use it as the identifier for each relation -
the primary key - Most relational database systems provide system
support for nomination of keys
34Examples
- Supplies(PartNo, SuppName, SuppAddress)
- SuppName -gt SuppAddress
- candidate key is (PartNo, SuppName)
- Duty(ID, Project, Location)
- ID, Project -gt Location Location -gt Project
- two candidate keys
- (ID, Project)
- (ID, Location)
35Derived Functional Dependencies
- Given a relational schema and a list of
functional dependencies, we want to see if other
dependencies can be derived - If PartNo -gt SuppName andSuppName -gt
SuppAddressthen PartNo -gt SuppAddress
36Derived Functional Dependencies
- Shipment
- (PartNo, Colour, SuppName, SuppAddress)
- if PartNo -gt Colour and SuppName -gt Addressthen
PartNo, SuppName -gt Colour, Address - Such derivations have been formalised
37Armstrongs Axioms
38 Normal Forms
39First Normal Form
- A table is in first normal form (1NF) if it has
no multi-valued (repeating) fields - i.e. the table obeys the definition of a
relation - All standard relational database products enforce
this as a matter of course
40Second Normal Form
- Given a relation R, and a set of functional
dependencies F on Rif every non-prime attribute
is fully functionally dependent on each candidate
key then the relation is in second normal form
(2NF). - i.e. no partial key dependency
41Third Normal Form
- A relation R is said to be in third normal form
(3NF) if for any functional dependency X -gt A,
where A is a single attribute not in X,then
either - X is a superkey for R
- or A is a prime attribute in R
- i.e. No transitive dependency
42Boyce-Codd Normal Form
- Suppose we are given a relationR(X1, X2, X3,
..., Xn) and a list of the functional
dependencies which hold for Rthen R is in
Boyce-Codd normal form (BCNF) if for every
functional dependency X -gt Y - either Y is a subset of X
- or X is a superkey for R
43BCNF Example
- Supplies(PartNo, SuppName, SuppAddress)
- Functional dependencies
- PartNo -gt SuppNameSuppName -gt SuppAddress
- This is not in BCNF as SuppName is on the left
hand side of a functional dependency and is not a
superkey
44BCNF Example
- Duty(EmpNo, Project, Location)
- Functional dependencies EmpNo, Project -gt
Location - This is in BCNF as EmpNo, Project is a
superkey for the relation Duty
45Improving a Design
- If a database conceptual schema includes one or
more relations that are not in BCNF, then seek
another design for the same collection of facts - One common approach for dealing with the problem
of a relation not being in BCNF is decomposition
46Decomposition
- If relation R is not in the desired normal form,
decompose it into a set of equivalent relations
which contain between them the same set of
attributes as R - The contents of each of the new decomposed
relations is a projection of some subset of
attributes of R
47Decomposition
- Not all ways of decomposing a relation are
appropriate - Some queries may be answered in the original
design, but not in the decomposed relations - e.g. consider
- Supplies(PartNo, SuppName, SuppAddress)
- PartNo -gt SuppName
- SuppName -gt SuppAddress
48Decomposition
- Decompose this relation into R1(PartNo,
SuppAddress)R2(SuppName, SuppAddress) - Valid decompositions in terms of projections, but
we can no longer answer the query Which
supplier supplies part 514?
49Lossless Join Condition
- We say a decomposition has the lossless join
property provided that for every legal state of
the initial relation R, the natural join of the
corresponding decomposed relations is equal to
the original state - Note that the natural join always includes
theoriginal state. The lossless join condition
says that no spurious tuples are produced
50Theorem
- This important theorem is presented here without
proofFor every relation R there exists a
decomposition that has lossless join and where
every new relation is in BCNF
51Example
- Note that you have to consider the whole set of
dependencies presente.g. R(A, B, C) with
functional dependencies - A -gt B and B -gt C
- decomposes with lossless join to R1(A,
B) and R2(B, C)
52Example
- But the same relation with only the functional
dependencyA -gt Bdecomposes without lossless
join to R1(A, B) and R2(B, C)
53Preservation of Dependencies
- A decomposition preserves dependencies if the set
of dependencies generated by those holding in the
new relations is equal to the set of dependencies
held in the original relation - We have to consider all dependencies in the
original, not just a generating, or covering, set
54When to stop normalisation?
- The standard 3NF decomposition eliminates most
anomalies and makes it possible to verify
efficiently that desired functional dependencies
remain valid when the database is updated - Over normalisation, decomposing into more tables
than necessary should be avoided because of the
overheads arising from joins on queries
55Further Normal Forms
- Further normal forms exist and deal with
particular unusual anomalies or generalise the
theory - 4NF is based on a new type of dependency called a
multi-valued dependency - Other normal forms are 5NF and Projection/Join
Normal Form
56Why Further Normal Forms?
- Consider a relational database which models a
system about teachers, the textbooks they
recommend, and the courses they teach, with the
following constraints - a teacher can teach many courses
- a course can be taught by many teachers
- a teacher recommends the same set of textbooks
for all courses they teach - not all teachers recommend the same set of
textbooks
57Why Further Normal Forms?
58Dependencies and Keys?
- Are there any non-trivial functional
dependencies in this relation? - No, none of the constraints map to functional
dependencies. - What are the superkeys for the relation?
- You need each of the three attributes to forma
key for the relation - What normal form is the relation in?
- BCNF, as the only functional dependencies are
those like - Teacher, Textbook, Course -gt Teacher
59So what is the problem?
- Consider the following update scenario
- fekete wants to promote his new book as a
recommended textbook in all of the courses he
teaches - What table updates are required?
- there are as many row inserts required as there
are courses which fekete teaches - Wherever there is such duplication we have the
potential for anomalous information to be stored
in error - i.e. data which contravenes the constraints
60Multivalued Dependency
- The problem with this particular relation is
caused by the presence of multivalued
dependencies (MVDs). - Informally
- the set of textbooks a teacher recommends is
independent of which courses the teacher is
teaching - the set of courses a teacher teaches is
independent of the textbooks that teacher
recommends
61Multivalued Dependency
- These facts are expressed as the multivalued
dependencies - Teacher -gt-gt Textbook
- Teacher -gt-gt Course
- Notice the complementary nature of the
multivalued dependencies - The functional dependencies we defined earlier
are a special case of a multivalued dependency
62Fourth Normal Form
- Fourth Normal Form (4NF) parallels BCNF, but uses
the extended idea of multivalued dependencies - Informally, a relation is in Fourth Normal Form
if all the non-trivial multivalued dependencies
are the results of keys for the relation - Recall that a functional dependency is a special
case of a multivalued dependency, so 4NF includes
BCNF
63How far to go with normalisation
- There are further normal forms defined which will
not be considered in this course - How far should one go with normalisation?
- 3NF? BCNF? 4NF?
- Entity relationship methodology and translation
to the relational model will usually provide a
good enough relational implementation - usually in 3NF, BCNF, often in 4NF
- Normalisation can then be used for dealing with
difficult dependencies or to justify the schema
chosen
64E-R Design and Normalisation
- A relational implementation can be approached
either from a conceptual model using the
Entity-Relationship approach, or by the process
of normalisation - A good E-R model produces a good relation
implementation, where goodness is now measured in
terms of the normal form to which the model
complies
65Integrity and Constraints
- Integrityensuring that the database state
satisfies all constraints after each update
66Approaches
- One can place checks of integrity into each
database application - Usually better to have the DBMS make these checks
internally, as it is generally - safer
- easier to manage
- easier to modify
- Again, this is usually seen as a job of the DBA
67Constraints
- SQL as a data definition language includes
commands that declare constraints - Current SQL standard dictates that integrity
constraints are specified in the create table
command - Constraints may be assigned a name, which is used
in error reporting when an update would
contravene a constraint - Constraints are specified in a non-procedural way
- Key constraints maintain referential integrity of
a database
68Key Constraint
- Declare that a column of a relation is not to
have duplicate or null values - create table Part (
- PartNo integer unique not null,Description char
(20),Weight integer,Colour char(10), - Instock integer
- )
69Alternative Syntax
- create table Part (
- PartNo integer primary key,Description char(20)
,Weight integer,Colour char(10), - Instock integer
- )
- Note that some systems do not enforce primary key
unless an index is present on that attribute
70Alternative Syntax
- create table Part (
- PartNo integer
- constraint pk_Part primary key,Description char(
20),Weight integer,Colour char(10), - Instock integer
- )
- In this case the constraint is assigned a name
which becomes available in error reporting
71Simple Constraints
- SQL92 allows flexible expression of a check
constraint, a condition which must not be false
for any row of a table - create table Part (...,
- Instock integer,check (weight gt 0 and Instock
gt0) - )
- Not all DBMS enforce check conditions
72Advanced Constraints
- Use a subquery in check to express more complex
constraints - check (
- Instock lt (
- select sum(Maxqty)
- from Availability
- where
- Availability.PartNo PartNo)
- )
73Advanced Constraints
- Define a multi-attribute primary key
- create table Availability (PartNo integer,Supp
lierId integer,Price integer,Maxqty integer
, - constraint avail_pk
- primary key (PartNo, SupplierId)
- )
74Advanced Constraints
- Instead of defining constraints in a table
declaration, an assertion can be declared on its
own - create assertion no_overload_256
- check (
- (select sum (P.Weight A.Maxqty)
- from Part P, Availability A
- where A.PartNo P.PartNo
- and A.SupplierId 256) lt 1000
- )
75Foreign Key Constraint
- An attribute is a foreign key for a relation if
every value it assumes occurs in an attribute of
another relation - it usually has the same name in both relations
- it is usually a primary key in the other relation
- create table Availability (...constraint
supp_fkforeign key (SupplierId)references
Supplier(SupplierId) - )
76Triggers
- Describe an action to take when an event takes
place in a database - events like insert, delete, update
- more powerful than constraints
- procedural (triggered procedures)
- not in SQL92, but present in some systems
- will be standardised in SQL3
77(No Transcript)
78Database Design is Difficult
- Michael Stonebraker () presents 6 main reasons
why people have trouble with database design - difficulty in tuning the schema
- unwillingness to iterate the schema
- unrealistic application design
- failure to load and test a realistic database
early - difficulty in capturing the rest of the schema
- excessive focus on formal modelling
- ( Object-Relational DBMSs - The Next Great Wave
- - Stonebraker Moore, Morgan Kaufmann 1996)
79Performance Tuning the Schema
- Find all the employees on the first floor
- Consider the normalised model where there is a
separate employee and department relation, with
the floor being an attribute of the department
relation - This form would require a join
- Alternative is to denormalise and include the
department dependent floor attribute in the
employee relation - Deciding between the options requires expertise
in the underlying DBMS and the applications
80Iterate the Schema
- Database schema design is an iterative process
- Changes in the schema usually requires
maintenance of applications - It is usually a mistake to freeze the schema
early in an attempt to save flow-on changes to
applications - Mistakes in the schema usually cost more in the
long run - Expect schema design to consume significant
resources (10-20 of a projects budget)
81Realistic Application Design
- Choose the most appropriate site for your data
processing - For example, input checking may best be done
within the data entry application, rather than by
the use of complex constraints or triggers - DBMS features usually incur a cost which has to
be weighed up against any potential advantage
82Set Realistic Performance Goals
- A common mistake is to fail to test prototypes of
the system with realistic data sets - Load a large database early
- Test mock-ups of the transactions you expect to
be most common and/or most critical - Identifying performance problems early gives you
some chance of rectifying the them
83Capture the Business Rules
- An application must enforce a wide variety of
business rules in most application domains - Extracting the rules from the users can be
difficult - Knowing how to react to breaches of the rules can
be difficult
84Focus on Formal Modelling
- A large range of methodologies and accompanying
tools exist for database design and database
application analysis - Finding the right tool for the job can be a
difficult task in itself - Dont let the search for a modelling tool or
methodology distract you from design for too long
85Evaluation of Relational Designs
- A good design of a database schema
- avoids anomalies, which are possibilities for
inconsistent update - avoids unnecessary redundancy, where information
is repeated - These concepts are formalised by requiring that a
relational schema be in normal form - Sensible Entity-Relationship design usually leads
to a good relational design
86A Bad Example
- Supplies(PartNo, SuppName, Address)
- redundancy address is repeated for each part
supplied - update anomaly an address can be changed for
one part, but not for all from that supplier - deletion anomaly if a supplier supplies no
parts at the moment, the name and address are lost
87A Bad Example
Supplies
88A Better Example
- Associate the attributes which describe
individual suppliers together into one relation - Model the act of supply in a separate relation
- Supplier(SuppName, Address)
- Supplies(PartNo, SuppName)
89A Better Example
Aside What Relational Algebraic
operation gets us from the bad example to the
better example?
Supplies
Supplier
90Evaluation of Relational Designs
- A good design
- avoids anomalies, which are possibilities for
inconsistent update - avoids unnecessary redundancy, where information
is repeated - These concepts are formalised by requiring that a
relational schema be in normal form - Sensible E-R design usually leads to a good
relational design
91(No Transcript)