Title: Relational Data Model
1 Relational Data Model
- Overview
- introduced by E. F. Codd (1970)
- strong theoretical foundations
- simplicity
- numerous commercial systems
- has a single data-modeling concept relation
- a table of values (informally )
- Each column in the table has a column header
called an attribute. - Each row is called a tuple.
- loosely speaking, represents databases as a
collection of relations and constraints
2Formal Relational Concepts (1)
- Domain
- A set of atomic (indivisible) values.
- Attribute
- A name to suggest the meaning that a domain plays
in a particular relation. - Each attribute Ai has a domain dom(Ai)
- Relation schema
- A relation name R and a set of attributes Ai that
define the relation - Denoted by R(A1, A2, ... , An)
- (ex) Movie(title, year, length, filmType)
- Degree of a relation
- the number of attributes
3Formal Relational Concepts (2)
- Tuple t (of R(A1, A2, ,An) )
- A (ordered) set of values t ltv1, v2, ... , vngt
where each value vi is an element of dom(Ai).
Also called an n-tuple. - Relation instances, r(R)
- A set of tuples
- r(R) t1, t2, ... , tm, or alternatively
- r(R) ? dom(A1) ? dom(A2) ? ... ? dom(An)
- Relational database schema
- A set S of relation schemas that belong to the
same database. - S R1, R2, ... , Rn
4Suppliers relation S
CITY
STATUS
NAME
S
Domain
Primary key
S SNAME STATUS CITY
S1 Smith 20
London S2 Jones 10
Paris S3 Blake 30
Paris S4 Clark
20 London S5 Adams
30 Athens
S
Relation
Cardinality
Tuples
Attributes
Degree
5Characteristics of Relations
- Ordering of tuples in a relation r(R)
- The tuples are not considered to be ordered, even
though they appear to be in the tabular form - Ordering of attributes in a relation schema R
(and of values within each tuple) is immaterial - All values are considered atomic (indivisible).
- Relational databases do not allow repeating
groups - repeating group a column or combination of
columns that contains several data values in row - A special null value is used to represent missing
value
6Relational terminology
7From ODL to relational schema
- Basically one relation for each class, and one
attribute for each property - interface Movie attribute string
title attribute integer year attribute
integer length attribute enum Film color,
blackAndWhite filmtype - Movie(title, year, length, filmType)
8Non-atomic attribute in class
- Remember, relational model allows atomic values
to appear in relation - As for record structures, expand it making one
attribute of relation for each field of the
structure - interface Star attribute string
name attribute Struct Addr string street,
string city address - Star(name, street, city)
9Set attribute
- One approach is to make one tuple for each value
of a set-valued attribute - interface Star attribute string
name attribute SetltStruct Addr string street,
string citygt address - Star(name, street, city)
10Other type constructor (1)
- For a bag (multiset)
- add to relation schema another attribute count
representing the number of times that each
element is a member of the bag - interface Star attribute string
name attribute BagltStruct Addr string street,
string citygt address
11Other type constructor (2)
- For a list, add a new attribute position,
indicating the position in the list - A fixed-length array can be represented by
attributes for each position in the array
12Single-valued relationship
- Relational model does not support the notion of a
pointer, so simulate the effect of pointers by
values that represent the related objects - Include key attribute of referenced class
- interface Movie ... attribute enum Film
color, blackAndWhite filmtype relationship
Studio ownedBy inverse Studioowns //
assume studioName is key of the Studio class
13Multivalued relationship
- May represent a set of related objects by
creating one tuple for each value - Note there is redundancy
14What if there is no key
- ODL permits two objects in a class to have
exactly the same values for all properties - Invent a new attribute that can be used as key
15Relationship and its inverse
- In the ODL model, the relationship and its
inverse are both needed ! - We cannot follow a pointer backwards
- However, representing both relationship and its
inverse in relational schema is redundant ! - WHY ???
16Steps in Logical Database Design
- using ER Model Approach
- Major Steps
- identify entity types
- identify relationships between entity types
- determine appropriate attributes for entity and
relationship types - convert ER diagram into the system dependent
model - Hierarchical model
- Network model
- Relational model
- Object-oriented model
- Normalization
17Regular entity type
- create a relation.
- for composite attribute, may include only the
simple component. - choose a primary key.
- Example
- Movies(title, year, length, filmType)
- Stars(name, address) or Stars(name, street, city)
18Binary MN relationship type R
- create a new relation S.
- include all attributes of R in S.
- include primary keys of the participating
relations as foreign key in S - Example
- key of the Stars entity type starName
- key of the Movies entity type title year
- key of the Studios entity type StudioName
- Owns(title, year, StudioName)
- Stars-In(title, year, starName)
19Regular binary 1N or 11 relationship R
- For 11 or 1N relationship types, we may not
create a new relation optionally - Let S, T participating entity types (S N-side)
- include the primary key of T as foreign key in S.
- include all attributes of R in S.
- Example
- Movies(title, year, length, filmType, StudioName)
- StudioName is a foreign key
- Studios(name, address)
- no schema for the Owns relationship
20For multivalued attribute A
- create a new relation R that includes the
multivalued attribute - include the primary key attribute K of the
relation that has A as an attribute - primary key combination of A and K
- Department(number, name)
- DeptLocation(number, location)
21N-ary relationship type (1)
- create a new relation S.
- include primary keys of participating entity
types as foreign keys in S. - include all attributes of n-ary relationship.
- primary key of S usually a combination of all
foreign keys - Contracts(starName, title, year, studioOfStar,
producingStudio)
22N-ary relationship type (2)
Quantity
SName
ProName
SUPPLY
SUPPLIER
PROJECT
PART
PartNo
PROJECT
SUPPLIER
SName
ProName
PART
SUPPLY
PartNo
SName ProName PartNO
Quantity
FK
FK
FK
23Weak entity type
- create a relation R
- include the primary key of the owner as foreign
key in R - primary key in R
- primary key in the owner partial key of the
weak entity - Studios(name, addr)
- Crews(number, StudioName)
- Note, there is no separate relation for
identifying relationship type
24Converting subclasses to relations
- Recall the distinction
- In ODL, an object belongs to exactly one class.
An object inherits properties from all its
superclasses but technically is not a member of
the superclasses - In ER model, an entity belongs to several entity
sets that are related by isa relationships.
Thus, the linked entities together represent the
object and give that object its properties -
attributes and relationships
25Relational representation of ODL subclasses
- Every subclass has its own relation that
represents all the properties of that subclass
including all its inherited properties - interface Cartoon Movie relationship
SetltStargt voices interface MurderMystery
Movie attribute string weapon interface
Cartoon-MurderMystery Cartoon, MurderMystery
- Movie(title, , starName)Cartoon(title, ,
starName, voices)MurderMystery(title, ,
starName, weapon)Cartoon-MurderMystery(title, ,
starName, voices, weapon)
26Representing ISA in the relational model
- For each entity set, create a relation that has
attributes of that entity set alone as well as
key attributes of related entity sets - There is no relation created for an isa
relationship - Comparison
- ODL translation keeps all properties of an object
together in one relation - ER translation repeats the key for an object once
for each of the entity sets or relationships to
which that entity belongs
27- Movies(title, year, length, filmType)
- MurderMysteries(title, year, weapon)
- Cartoons(title, year)
- Voices(title, year, starName)
- Why we need the Cartoon relation whose attributes
are a subset of the Movies relation - How to represent a silent cartoon movie without
Cartoon relation - What about Cartoon-MurderMystery ?
28Using null values
- interface Cartoon Movie relationship
SetltStargt voices interface MurderMystery
Movie attribute string weapon interface
Cartoon-MurderMystery Cartoon, MurderMystery
- Movie(title, year, length, filmType, studioName,
starName, voice, weapon) - Use null value when not applicable !
29Functional Dependency (FD)
- Definition
- For any two tuples t1 and t2 in R such that t1X
t2X, we must have t1Y t2Y where X and
Y are sets of attributes in relation R - The values of the X component of a tuple uniquely
(functionally) determine the value of the Y
component. - Notation X ? Y
- FD is a property of the relation schema
(intension) of R that should hold all relation
instances (extensions) all the times - Example
- every key K always functionally determines any
subset of attributes Y of R i.e. K ? Y
30Example on FD
- title year ? length
- title year ? filmType
- title year ? studioName
- title year ? length filmType studioName
- title year ? starName (false)
31Key, superkey, primary key, candidate key
- Superkey of a relation schema RA1, A2, ... An
is a set of attributes S with the property that
no two tuples t1 and t2 in any relation instance
of R will have t1S t2S - Candidate key a minimal superkey
- Candidate key, Primary key
- If a relation schema has more than one minimal
super key, each is called a candidate key - One of candidate keys are designated to be a
primary key - Example
- (title, year, starName) key
- (title, year, starName, length) superkey
32Discovering keys for relations
- If a relation comes from an entity set then the
key for the relation is the key attributes of
this entity set or class - If a relation R comes from a relationship
- many-many keys of both connected entity sets are
the key attributes for R - many-one from entity set E1 to entity set E2 key
attributes of E1 are key attributes of R, but
those of E2 are not - one-one key attributes for either of the
connected entity sets are key attributes of R. - If a multiway relationship R has an arrow to
entity set E, then there is at least one key for
the corresponding relation that excludes the key
of E - Movies(title, year, length, filmType) Stars(name,
address)Owns(title, year, studioName) //
many-oneStars-in(title, year, starName) //
many-many
33Keys for relations derived from ODL
- If there is no key at all of an ODL class,
introduce an attribute that is a surrogate for
the object identifier of objects - There are certain cases in which the key
attributes for the class are not a key for the
relation - WHY ???
- In general, if the relation for C represents
several multivalued relationships from C, then
the keys for all the classes that these
relationships connected to C must be added to the
key for C - The result is the key for C relation
34Rules for FDs
- splitting rule
- We can replace a FD A1A2An ? B1B2Bm by a set of
FDs A1A2An ? Bi for i 1, m - combining rule
- We can replace a set of FDs A1A2An ? Bi for i
1, m by a single FD A1A2An ? B1B2Bm - A1A2An ? B1B2Bm is
- trivial if the Bs are a subset of the As
- nontrivial if at least one the Bs is not among
the As - completely nontrivial if none of the Bs is also
one of the As - We can always remove from the right side of a DF
those attributes that appear on the left - title year ? year length // the second year
may be dropped
35Armstrong's axioms (inference rules)
- To infer new dependencies from a (given) set F of
dependencies i.e. F X ? Y - Let X,Y,Z be a set of attributes
- Reflexive rule if Y ? X then X ? Y
- Augmentation rule if X ? Y then XZ ? YZ
- Transitive rule if X ? Y and Y ? Z then X ? Z
- Def F (closure of F) is the set of all
dependencies which are logically implied by F - A set of inference rules is complete if given the
set F the rule allows us to determine all
dependencies in F - A set of rule is sound if using them we cannot
deduce any dependency not in F - Armstrong's axioms are sound and complete
36Additional inference rules
- Decomposition rule if X ? YZ then X ?
Y (proof) 1. X ? YZ (given) 2. YZ ? Y
(reflexive) 3. X ? Y (transitive) - Union rule if X ? Y and X ? Z then X ?
YZ (proof) 1. X ? Y (given) 2. X ? Z
(given) 3. X ? XY (augmentation on 1 with
X) 4. XY ? YZ (augmentation on 2 with Y) 5.
X ? YZ (transitive with 3 and 4) - Pseudotransitive rule if X ? Y and WY ? Z then
WX ? Z (proof) 1. X ? Y (given) 2. WY ? Z
(given) 3. WX ? WY (augmentation on 1 with
W) 4. WX ? Z (transitive with 2 and 3)
37Closure of attributes
- X (closure of X with respect to F) is the set of
attributes that are functionally determined by X - Algorithm (compute X) X Xrepeat oldX
X for each functional dependency Y ? Z
in F do if Y ? X then X X ?
Zuntil (oldX X )
38Example of attribute closure (1)
- R(A,B,C,D,E,G)
- FDs 1. AB ? C 2. D ? EG 3. C ? A 4. BE ?
C 5. BC ? D 6. CG ? BD 7. ACD ? B 8. CE ?
AG - Let X BD, then X0 BD X1 BDEG (by 2) X2
BCDEG (by 4) X3 ABCDEG (by 3,5,6,8) X4
X3 Hence, X X4 ABCDEGX is a key
39Example of attribute closure (2)
- R(A,B,C,D,E,F)
- FDs 1. AB ? C 2. BC ? AD 3. D ? E 4. CF ?
B - Let X AB, then X0 AB X1 ABC (by 1) X2
ABCD (by 2) X3 ABCDE (by 3) X4 X3 Hence,
X X4 ABCDEX is not a key
40Minimal cover
- Two sets of functional dependencies E and F are
equivalent if E F - A functional dependency fd in F is redundant if
(F - fd) F - F' is a nonredundant cover (or minimal cover,
minimal base) of F if F' F and F' contains
no redundant functional dependencies - Usually the closure of F (i.e. F) is too big to
handle, so use the previous algorithm to detect
redundant functional dependency - X ? Y follows from set of dependencies S if and
only if Y are in X - There exist several minimal covers for a set of
FDs !
41Minimal cover example (1)
- R(A,B,C,D,E,F)
- S 1. AB ? C 2. BC ? AD 3. D ? E 4. CF ?
B - Does AB ? D follow from S ?
- A,B ABCDE and D ? A,B
- AB ? D follows from S
- AB ? D is redundant
- Does D ? A follow from S ?
- D DE and A ? D
- D ? A does not follow from S
- D ? A is not redundant
42Minimal cover example (2)
- R(A,B,C)
- S 1. A ? B 2. A ? C 3. B ? A 4. B ? C 5. C
? A 6. C ? B 7. AB ? C 8. BC ? A 9. AC ?
B - Minimal cover1 A ? B, B ? A, B ? C, C ? B
- Minimal cover2 A ? B, B ? C, C ? A
- Other minimal cover, too.
- Issue How to get minimal cover ?
43Normalization
- one phase in database design
- first proposed by E.F. Codd (1972)
- a process during which unsatisfactory relation
schemas are decomposed by breaking into smaller
relation schemas that possess desirable
properties - utilizes functional dependency, multivalued
dependency and join dependency
44Bad relational schema
- Anomalies
- Insertion anomalies
- Cannot record filmType without starName
- Deletion anomalies
- If we delete the last star, we also lose the
movie info. - Modification (update) anomalies
45Decomposing relations
- Given a relation R(A1, A2, , An), we may
decompose R into two relations S(B1, B2, , Bm)
and T(C1, C2, , Ck) - A1, A2, , An B1, B2, , Bm ? C1, C2,
, Ck - Tuples of relation S is projections onto B1, B2,
, Bm of relation R - Similarly for relation T
46Decomposition example
47Normal forms
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce/Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
48First normal form (1NF)
- The only attribute values permitted are single
atomic (indivisible) - Not allow a set, list, bag, etc.
- considered to be part of the formal definition of
a relations - nested relation concept (?)
49Various functional dependencies
- Prime and nonprime attribute
- an attribute of a relation schema R is called a
prime attribute of R if it is a member of any
candidate key of R - an attribute is called nonprime if it is not a
prime attribute, i.e. not a member of any
candidate key of R - Full functional dependency
- Y is said to be fully dependent on X if X ? Y and
Z \ ? Y for any X ? Z - Y is fully dependent on X if and only if
- Y is functionally dependent on X, and -
- not functionally dependent on any proper subset
of X - Partial functional dependency
- Y is said to be partially dependent on X if some
attribute can be removed from X and the
dependency still holds - Transitive functional dependency
50Second Normal Form (2NF)
- A relation schema R is in 2NF if it is in 1NF and
every nonprime attribute A in R is fully
functionally dependent on every key of R - R(SSN, pNumber, hours, eName, pName,
pLocation)SSN pNumber ? hoursSSN ?
eNamepNumber ? pName pLocationdecomposed
intoR1(SSN, pNumber, hours)R2(SSN,
eName)R3(pNumber, pName, pLocation)
51Example for 1NF and 2NF
- FIRST(S, P, Status, City, Qty) S P ? Status
City Qty S ? City StatusCity ? Status - Anomalies
- insertion cannot record City for a supplier
until he supplies something - modification City for a supplier appears many
times - deletion deletion of last tuple for S lose its
City - FIRST is not in 2NF, so decompose into
SECOND(S, Status, City) SP(S, P, Qty) S
? City StatusCity ? StatusS P ? Qty
52Third Normal Form (3NF) (1)
- A relation schema is in 3NF if it is in 2NF and
no nonprime attribute of R is transitively
dependent on any key - EMP_DEPT(SSN, Ename, Bdate, Addr, D, Dname,
Dmgrssn) FDs SSN ? Ename Bdate Addr D D ?
Dname Dmgrssn Since Dname and Dmgrssn are
transitively dependent on D (not in
3NF)ED1(SSN, Ename, Bdate, Addr, D)ED2(D,
Dname, Dmgrssn)
53Third Normal Form (2)
- Alternatively
- A relation schema R is in 3NF if whenever a
nontrivial functional dependency X --gtA holds in
R, then either (a) X is a super key of R or (b) A
is a prime attribute of R. - Violation of (a) implies X is not a superset of
any key i.e. - X could be nonprime, or
- in result, typical transitive dependency
- X could be a proper subset of a key
- in result, have a partial dependency, that is not
in 2NF ! - Violation of (b) implies A is a nonprime
attribute
54Example for 2NF and 3NF
- SECOND(S, Status, City) SP(S, P, Qty) S ?
City StatusCity ? Status - Anomalies
- insertion cannot record new Status for a city
without S - modification Status for a City appears in many
tuples - deletion delete only the second tuple for a
particular City - SECOND is not in 3NF, so decompose into
SC(S,City) CS(City, Status)
55Boyce/Codd Normal Form (BCNF)
- A relation schema R is in BCNF if whenever a
nontrivial functional dependency X ? A holds in
R, then X is a super key of R - Difference with 3NF
- Drop the second condition in 3NF that allows A to
be prime if X is not a superkey
563NF and BCNF example (1)
- MovieStudio(title, year, length, filmType,
studioName, studioAddr)title year ? length
filmType studioNamestudioName ? studioAddr - Key title, yearHence, MovieStudio is not 3NF
- Decompose intoMovieStudio1(title, year, length,
filmType, studioName)MovieStudio2(studioName,
studioAddr)Then we get a schema in BCNF.
57BCNF example (2)
58All binary relations are in BCNF
- Let A and B are all attributes
- Consider all possible cases, here there are
totally 4 cases - no nontrivial FD at all
- A, B is a key, so in BCNF
- A ? B holds, B ? A does not hold
- A is a key, so in BCNF
- B ? A holds, A ? B does not hold similarly
- A ? B holds, B ? A holds
- A and B are keys, so in BCNF
- Note that such dependencies are plausible
59Why 3NF, not BCNF ???
- Bookings(title, theater, city) theater ?
city title city ? theater - Two candidate keys title city, title theater
- Bookings is not in BCNF, so decompose
intotheater, city and theater, title - Consider following two instances
- When two relations are joined, title city ?
theater does not hold !
60Recovering info. from decomposition
- We need to make sure that projections of the
original tuples can be joined again to produce
all and only the original tuples - Now recover the original relation with join
operation - There are two spurious tuplesAttribute B is not
a key in either relation
61Lossless (non-additive) join
- To prevent spurious tuples from being generated
when a natural join is applied - Decomposition D R1, , Rm of R has the
lossless join property wrt a set of FDs if for
every legal state r of R ?ltR1gt(r) ? ? ?ltRmgt(r)
r
62Property of lossless join
- D R1, R2 of R has the lossless join property
wrt a set of FDs if (R1 ? R2) ? (R1 R2)
or (R1 ? R2) ? (R2 R1)That is, (R1 ? R2) is
a key in R1 or R2 - ExamplePCZ(phone, company, zip) phone ?
company zip ? company Decomposed
into PC(phone, company) ZC(zip, company)Since
company ( PC ? ZC) is not a key, lossy join
63Projecting FDs
- How we can find (new) FDs relevant to decomposed
relation schema ? - Suppose relation R is decomposed into relation S
and other relation, and F is a set of FDs known
to hold for R - Let X be a set of attributes that is contained in
the set of attributes of S - Compute X
- For each attribute B such that
- B is an attribute of S
- B is in X
- B is not in X
- Then, FD X ? B holds in S
64Example 3.39
- R(A, B, C, D), A ? B, B ? C
- Let S(A,C) be a decomposed relation of R
- Need to compute the closure of each subset of
A,C - Compute A
- A ABC
- C is in S
- so A ? C holds for S
- Compute C
- C C, no new FD
- Compute AC
- AC ABC, no new FD
- Hence, A ? C is the only non-trivial FD for S
65Example 3.40
- R(A, B, C, D, E), A ? D, B ? E, DE ? C
- Let S(A, B, C) be a decomposed relation of R
- Need to compute the closure of each subset of
A, B, C - Compute A AD, no new FD
- Compute B BE, no new FD
- Compute C C, no new FD
- Compute AB ABCDE, so AB ? C holds for S
- Compute BC BCE, no new FD
- Compute AC ACD, no new FD
- Compute ABC ABCDE, no new FD
- Hence, AB ? C is the only nontrivial FD for S
66Dependency preservation
- A decomposition D R1, , Rm of R is
dependency-preserving wrt a set F of FDs if
(?F(R1) ? ? ?F(Rm)) Fwhere ?F(Ri)
denotes a set of FDs X ? Y in F such that all
attributes in X ? Y are contained in Ri - We do not want FDs to be lost in the
decomposition - Always possible to have a dependency-preserving
decomposition D such that each Ri in D is in 3NF - Not always possible to find a decomposition that
preserves dependencies into BCNF
67Multivalued dependency example (1)
- EMP(eName, pName, depName) Smith X,Y John,
Anna - Must repeat every combination due to 1NF
- Two independent one-many relationships are mixed
in the same relation - eName --gtgt pNameeName --gtgt depName
68Multivalued dependency example (2)
- interface Star attribute string
name attribute SetltStruct Addr string street,
string citygt address relationship SetltMoviegt
starredIn inverse Moviestars - Note that there are no nontrivial FDs, hence it
is in BCNF
69Multivalued dependency example (3)
- A1A2 An --gtgt B1B2 Bm holds ifFor each pair
of tuples t and u of relation R that agree on all
the As, we can find in R some tuple v that
agrees - With both t and u on the As
- With t on the Bs and
- With u on all attributes of R that are not among
the As or Bs - In previous example,
- first tuple t
- second tuple u
- then, the third tuple of the previous instance
becomes v
70Multivalued dependency definition
- Let X, Y be sets of attributes in RLet Z be
compliment of X ? Y - Multivalued dependency (MVD) X --gtgt Y X
multidetermines Y holds in R if and only if each
X-value in R is associated with a set of Y-values
in a way that does not depend on Z-values - Formally, a MVD from X to Y, X --gtgt Y exists in R
iff Yxz Yxz for each X, Z, Z such that Yxz
and Yxz are nonemptywhere Yxz y? ltx,y,zgt ?
R
71MVD and FD-MVD rules
- Complementation
- If X --gtgt Y, then X --gtgt T X Y
- Augmentation
- If X --gtgt Y and V ? W, XW --gtgt YV
- Transitivity
- If X --gtgt Y and Y --gtgtZ, then X --gtgt Z Y
- If X ? Y, then X --gtgt Y (i.e. an FD is a special
case of an MVD) - Coalescence rule
- If X--gt Y, Z ? Y and for some W disjoint from Y
we have W ? Z, then X ? Z
72Fourth normal form (4NF)
- An MVD A1A2 An --gtgt B1B2 Bm for a relation R
is trivial if - As ? Bs or
- As ? Bs are all attributes of R
- A relation R is in 4NF if whenever A1A2 An
--gtgt B1B2 Bm is a nontrivial MVD, A1, A2, ,
An is a superkey
734NF example
- Star(name, street, city, title, year)name --gtgt
street city - Star relation is not in 4NF,hence decompose
into Star1(name, street, city) Star2(name,
title, year) - Note that both new relations are in 4NF
74Relationship among normal forms (1)
- In practice, it is best to have relation schemas
in BCNF or in 3NF
75Relationship among normal forms (2)