Title: Enhanced EntityRelationship Modeling
1Chapter 12
- Enhanced Entity-Relationship Modeling
2Chapter 12 - Objectives
- Most useful additional data modeling concepts of
Enhanced ER (EER) model called - specialization/generalization
- aggregation
- composition.
- A diagrammatic technique for displaying
specialization/generalization, aggregation, and
composition in an EER diagram using UML.
3Why An Enhanced Entity-Relationship Model?
- More demanding requirements
- CAD
- GIS
- OIS
- More complex applications.
- Need for additional semantic modeling concepts.
4The Enhanced Entity-Relationship (ERR) Model
- Examples of additional concepts of EER model are
- specialization / generalization
- aggregation
- composition.
5Specialization / Generalization
- Superclass
- An entity type that includes one or more distinct
subgroupings of its occurrences. - Subclass
- A distinct subgrouping of occurrences of an
entity type.
6Specialization / Generalization
- Superclass/subclass relationship is one-to-one
(11). - Superclass may contain overlapping or distinct
subclasses. - Not all members of a superclass need be a member
of a subclass.
7Specialization / Generalization
- Subclass Attribute Inheritance
- An entity inherits all the attributes of its
superclass - An entity also has subclass specific attributes
8Specialization / Generalization
- Specialization
- Top-Down approach
- Begin with a Superclass
- Identify differences between members of an entity
by examining their distinguishing attributes - Group distinguishing attributes in subclasses
- Generalization
- Bottom-Up Approach
- Begin with entities (subclass candidates)
- Identify attributes that are common between
subclass - Group like attributes in a superclass
9AllStaff Relation Holding Details of all Staff
Specialization
10Specialization/Generalization of Staff Entity
into Subclasses Representing Job Roles
11Specialization/Generalization of Staff Entity
into Job Roles and Contracts of Employment
12EER Diagram with Shared Subclass and Subclass
with its own Subclass
13Constraints on Specialization / Generalization
- Participation constraint
- Determines whether every member in superclass
must participate as a member of a subclass. - May be mandatory or optional.
- Disjoint constraint
- Describes relationship between members of the
subclasses - indicates whether member of a superclass can be a
member of one, or more than one, subclass. - May be disjoint (or) or nondisjoint (and)
14DreamHome Worked Example - Staff Superclass with
Supervisor and Manager Subclasses
15DreamHome Worked Example - Owner Superclass with
PrivateOwner and BusinessOwner Subclasses
16DreamHome Worked Example - Person Superclass with
Staff, PrivateOwner, and Client Subclasses
17EER Diagram of Branch View of DreamHome with
Specialization/Generalization
18Aggregation / Composition
- Semantic notation only
- Aggregation
- Represents a has-a or is-part-of relationship
between entity types, where one represents the
whole and the other the part. - Composition
- Specific form of aggregation that represents an
association between entities, where there is a
strong ownership and coincidental lifetime
between the whole and the part
19Examples of Aggregation/Composition
Aggregation
Composition
20Chapter 13
21Normalization
- Determine good design of database model
- Accurate representation of data, its
relationships, and constraints. - To achieve this objective, must identify a
suitable set of relations.
22Normalization
- Four most commonly used normal forms
- first (1NF)
- second (2NF)
- third (3NF) normal forms
- BoyceCodd normal form (BCNF).
- Based on functional dependencies among the
attributes of a relation. - A relation can be normalized to a specific form
to prevent possible occurrence of update
anomalies.
23Data Redundancy
- Major aim of relational database design
- minimize data redundancy
- reduce file storage space
- Problems associated with data redundancy are
- Update anomalies
- Insert anomalies
- Deletion anomalies
24Data Redundancy
Redundant Data
25Lossless-join and Dependency Preservation
Properties
- Decomposing original relation into multiple
relations creates a model that avoids update
anomalies - Two important properties of decomposition
- - Lossless-join property enables us to
- Recreate the original relation joining instances
of the smaller relations. - - Dependency preservation property enables us to
- enforcing a constraint on original relation by
enforcing some constraint on each of the smaller
relations.
26Functional Dependency
- Main concept associated with normalization.
- Functional Dependency
- Describes relationship between attributes in a
relation - If A and B are attributes of relation R, B is
functionally dependent on A (denoted A ? B) - if each value of A in R is associated with
exactly one value of B in R.
27Functional Dependency
- Property of the meaning (or semantics) of the
attributes in a relation. - Diagrammatic representation
- Determinant of a functional dependency refers to
attribute or group of attributes on left-hand
side of the arrow.
28Example - Functional Dependency
29Functional Dependency
- A property of a relational schema (intension)
not - A property of a particular instance of the schema
(extension)
30Functional Dependency
- Main characteristics of functional dependencies
used in normalization - have a 11 relationship between attribute(s) on
left and right-hand side of a dependency - hold for all time
- are nontrivial.
31Functional Dependency
- Complete set of functional dependencies for a
given relation can be very large. - Important to find an approach that can reduce set
to a manageable size. - Need to identify a set of functional dependencies
on (X) that is - smaller than complete set of functional
dependencies (Y) - has property that every functional dependency in
Y is implied by functional dependencies in X.
32Functional Dependency
- Set of all functional dependencies implied by a
given set of functional dependencies X called
closure of X (written X). - Set of inference rules, called Armstrongs
axioms, specifies how new functional dependencies
can be inferred from given ones.
33Functional Dependency
- Let A, B, and C be subsets of the attributes of
relation R. Armstrongs axioms are as
follows 1. Reflexivity - If B is a subset of A, then A B
- 2. Augmentation
- If A B, then A,C B,C
- 3. Transitivity
- If A B and B C, then A C
- Complete - through application of axioms on X all
functional dependencies of Y can be derived - Sound no additional dependencies can be derived
that are not implied by X
34Normalization
- Formal technique for analyzing a relation
- Based on primary key and functional dependencies
- Series of Steps
- With each successive step the relation gets more
restricted
35First Normal Form (1NF)
- A relation in which intersection of each row and
column contains one and only one value. - 1NF a relation that contains no repeating groups
36UNF to 1NF
- Nominate an attribute or group of attributes to
act as the key for the unnormalized table. - Identify repeating group(s) in unnormalized table
which repeats for the key attribute(s). - Remove repeating group by
- entering appropriate data into the empty columns
of rows containing repeating data (flattening
the table). - Or by
- placing repeating data along with copy of the
original key attribute(s) into a separate
relation.
37Second Normal Form (2NF)
- Based on concept of full functional dependency
- A and B are attributes of a relation,
- B is fully dependent on A if
- B is functionally dependent on A but not on any
proper subset of A. - 2NF - A relation that is in 1NF and every
non-primary-key attribute is fully functionally
dependent on the primary key. - Relations with a simple key are automatically in
2NF - Need to concentrate on relations with composite
keys
381NF to 2NF
- Identify primary key for the 1NF relation.
- Identify functional dependencies in the relation.
- If partial dependencies exist on the primary key
remove them by placing them in a new relation
along with copy of their determinant.
39Third Normal Form (3NF)
- Based on concept of transitive dependency
- A, B and C are attributes of a relation such that
if A ? B and B ? C, - then C is transitively dependent on A through B.
(Provided that A is not functionally dependent on
B or C). - 3NF - A relation that is in 1NF and 2NF and in
which no non-primary-key attribute is
transitively dependent on the primary key. - No non-key attribute is a determinant for for
another non-key attribute
402NF to 3NF
- Identify the primary key in the 2NF relation.
- Identify functional dependencies in the relation.
- If transitive dependencies exist on the primary
key remove them by placing them in a new relation
along with copy of their determinant.
41General Definitions of 2NF and 3NF
- Second normal form (2NF)
- A relation that is in 1NF and every
non-primary-key attribute is fully functionally
dependent on any candidate key. - Third normal form (3NF)
- A relation that is in 1NF and 2NF and in which no
non-primary-key attribute is transitively
dependent on any candidate key.
42BoyceCodd Normal Form (BCNF)
- Based on functional dependencies that take into
account all candidate keys in a relation, however
BCNF also has additional constraints compared
with general definition of 3NF. - BCNF - A relation is in BCNF if and only if every
determinant is a candidate key.
43BoyceCodd normal form (BCNF)
- Difference between 3NF and BCNF is that for a
functional dependency A ? B, 3NF allows this
dependency in a relation if B is a primary-key
attribute and A is not a candidate key. - Whereas, BCNF insists that for this dependency to
remain in a relation, A must be a candidate key. - Every relation in BCNF is also in 3NF. However,
relation in 3NF may not be in BCNF.
44BoyceCodd normal form (BCNF)
- Violation of BCNF is quite rare.
- Potential to violate BCNF may occur in a relation
that - contains two (or more) composite candidate keys
- the candidate keys overlap (i.e. have at least
one attribute in common).
45Review of Normalization (UNF to BCNF)
46Review of Normalization (UNF to BCNF)
47Review of Normalization (UNF to BCNF)
48Review of Normalization (UNF to BCNF)
49Fourth Normal Form (4NF)
- Although BCNF removes anomalies due to functional
dependencies, another type of dependency called a
multi-valued dependency (MVD) can also cause data
redundancy. - Possible existence of MVDs in a relation is due
to 1NF and can result in data redundancy.
50Fourth Normal Form (4NF) - MVD
- Dependency between attributes (for example, A, B,
and C) in a relation, such that for each value of
A there is a set of values for B and a set of
values for C. However, set of values for B and C
are independent of each other. - MVD between attributes A, B, and C in a relation
using the following notation - A ¾¾ØØ B
- A ¾¾ØØ C
-
51Fourth Normal Form (4NF)
- MVD can be further defined as being trivial or
nontrivial. - MVD A ¾¾ØØ B in relation R is defined as
being trivial if - (a) B is a subset of A or
- (b) A ? B R.
- MVD is defined as being nontrivial if neither
(a) nor (b) are satisfied. - Trivial MVD does not specify a constraint on a
relation, while a nontrivial MVD does specify a
constraint.
52Fourth Normal Form (4NF)
- Defined as a relation that is in BCNF and
contains no nontrivial MVDs.
534NF - Example
54Fifth Normal Form (5NF)
- A relation decomposed into two relations must
have lossless-join property, which ensures that
no spurious tuples are generated when relations
are reunited through a natural join. - However, there are requirements to decompose a
relation into more than two relations. - Although rare, these cases are managed by join
dependency and fifth normal form (5NF).
55Fifth Normal Form (5NF)
- A relation that has no join dependency.
565NF - Example
57Chapter 14
- Methodology -
- Conceptual Database Design
58Design Methodology
- Structured approach that uses procedures,
techniques, tools, and documentation aids to
support and facilitate the process of design. - Database design methodology has 3 main phases
- Conceptual database design
- Logical database design
- Physical database design.
59Database Design
- Conceptual database design
- Process of constructing a model of information
used in an enterprise, independent of all
physical considerations. - Logical database design
- Process of constructing a model of information
used in an enterprise based on a specific data
model (e.g. relational), but independent of a
particular DBMS and other physical
considerations.
- Physical database design
- Process of producing a description of the
implementation of the database on secondary
storage it describes the base relations, file
organizations, and indexes design used to achieve
efficient access to the data, and any associated
integrity constraints and security measures.
60Critical Success Factors in Database Design
- Work interactively with users as much as
possible. - Follow a structured methodology throughout the
data modeling process. - Employ a data-driven approach.
- Incorporate structural and integrity
considerations into the data models. - Combine conceptualization, normalization, and
transaction validation techniques into the data
modeling methodology.
61Critical Success Factors in Database Design
- Use diagrams to represent as much of the data
models as possible. - Use a Database Design Language (DBDL) to
represent additional data semantics. - Build a data dictionary to supplement the data
model diagrams. - Be willing to repeat steps.
62Methodology Overview - Conceptual Database Design
- Step 1 Build local conceptual data model for
each user view - Step 1.1 Identify entity types
- To identify the main entity types that are
required by the view - Step 1.2 Identify relationship types
- To identify the important relationships that
exist between the entity types that have been
identified - Step 1.3 Identify and associate attributes with
entity or relationship types - To identify and associate attributes with the
appropriate entity or relationship types and
document the details of each attribute.
63Methodology Overview - Conceptual Database Design
(cont)
- Step 1 Build local conceptual data model for
each user view - Step 1.4 Determine attribute domains
- To determine domains for the attributes in the
local conceptual model and document the details
of each domain - Step 1.5 Determine candidate and primary key
attributes - To identify the candidate key(s) for each entity
and if there is more than one candidate key, to
choose one to be the primary key - Step 1.6 Consider use of enhanced modeling
concepts (optional step) - To consider the use of enhanced modeling
concepts, such as specialization /
generalization, aggregation, and composition
64Methodology Overview - Conceptual Database Design
(cont)
- Step 1 Build local conceptual data model for
each user view - Step 1.7 Check model for redundancy
- To check for the presence of any redundancy in
the model. - Step 1.8 Validate local conceptual model
against user transactions - To ensure that the local conceptual model
supports the transactions required by the view - Step 1.9 Review local conceptual data model
with user - To review the local conceptual data model with
the user to ensure that the model is a true
representation of the users view of the
enterprise
65Extract from Data Dictionary for Staff View of
DreamHome Showing Description of Entities
66First-cut ER diagram for Staff View of DreamHome
67Extract from Data Dictionary for Staff View of
DreamHome Showing Description of Relationships
68Extract from Data Dictionary for Staff View of
DreamHome Showing Description of Attributes
69ER Diagram for Staff View of DreamHome with
Primary Keys Added
70Revised ER Diagram for Staff View of DreamHome
with Specialization / Generalization
71Example of a Non-Redundant Relationship FatherOf
72Using Pathways to Check that the Conceptual Model
Supports the User Transactions
73(No Transcript)