Title: Views
1Database Design - 1
2Levels of Abstraction
- User Reports are physical, tangible artefacts,
e.g forms - User Views describe the data that the system must
capture and are derived from current and future
system inputs / outputs - Conceptual Schema defines logical structure
- Internal Schema describes how data is physically
stored (tables, indexes, etc.) - Levels of abstraction give data independence
3Example Purchase Order
4Example Purchase Order
- User View
- Purchase Order Order Number
- Date
- Supplier Name
- Supplier Address
- Delivery Address
- Currency
- Line Item
-
Quantity -
Description of Goods - Unit
Cost - Total
Cost - VAT-exclusive
Total - VAT
- VAT-inclusive
Total - Authorised by
- Account Codes
5Example Purchase Order
6Stages of Database Design
- Conceptual Design
- The process of constructing a model of the
information used in an enterprise, independent of
all physical considerations - Logical Design
- The mapping of the conceptual design to a
specific model of data representation (for
example, the relational model) but independent of
a particular DBMS - With the relational model, this involves
normalisation, decomposition of MM
relationships, selection of primary and foreign
keys, and specification of attributes, domains
and integrity constraints - Physical Design
- The process of implementing the logical design
using a specific DBMS technology for example,
ORACLE
7E-R Modeling
- Entity-Relationship (E-R) modelling is a popular
technique for conceptual and logical database
design - E-R Diagrams are the diagrammatic representation
of E-R Models - There are a variety of E-R diagramming notations.
The most common are the Chen notation, and the
crows foot (Barker) notation, and variants
thereof
8Entities and Attributes
- An entity is something about which the
organisation needs to record data for example,
Student, Course, Lecturer, Degree, etc. - An instance is a single occurrence of an entity
for example Michael Lang is an instance of
Lecturer, Commerce is an instance of Degree - Entities are described by a set of attributes
for example, a Student may be defined by (Student
ID, Name, Address, Date of Birth) - Attributes have domains. A domain is a set of
allowable values for example, Student Mark must
be an integer between 1 and 100, Date of Birth
must be a date no later than 17 years ago,
Address must be a sequence of no more than 80
alphanumeric characters
9Identifiers
- Every instance of an entity must be uniquely
identified - An identifier (primary key) can be a single
attribute, or a combination of two or more
attributes (composite identifier) - An identifier can be invented if there is no
suitable or naturally occurring candidate
attribute(s) - The chosen identifier should be
- minimal - choose the fewest columns necessary
- stable - choose a key that seldom changes
- simple / familiar - something that is easy to
remember - not null - there must always be a value (in the
case of composite identifiers, all component
attributes must have a value)
10Identifiers Examples
- Employee Staff_Number Name
Telephone_Extension Email RSI_Number - What should be the key ?
- Name is not guaranteed to be unique. Besides,
there are other issues, such as misspellings
(Michael/Micheal, Lang/Laing/Lange/Long), case
sensitivity (UPPER/lower), and variations
(Thomas/Tom/Tommy) - Telephone_Extension is probably unique in 95 of
cases. However, note the exceptions of shared
offices, and employees without telephones - Email is unique. However, not everyone has email.
Also, email addresses tend to frequently change - RSI_Number is unique. However, its not easy to
remember - Therefore Staff_Number is the best choice. Its
unique, easy to remember, stable, and everyone
has one
11Relationships
- One-to-One Relationship (11)
- Two entities are participants in a 11
relationship if, for any instance of either
entity there is at most one instance of the
other. Very rare in actuality - One-to-Many Relationship (1M)
- Two entities are participants in a 1M
relationship if, for every instance of the first
entity there are zero or more instances of the
second. Vice versa for M1. 1M relationships
subsume 11 relationships - Many-to-Many Relationship (MM)
- Two entities are participants in a MM
relationship if, for every instance of either
entity there are zero or more instances of the
other
12Relationships
One-to-One
One-to-Many
Many-to-One
Many-to-Many
13Synonymous Terms
- Terminology may be confusing for end-users /
student analysts - Roughly speaking, there are a number of
synonymous terms which equate to each other,
depending on the stage of design - Files Entities / Objects Tables /
Relations - Records Instances Rows / Tuples
- Fields Attributes Columns
- Identifiers Primary Keys
14E-R Modeling
- Chen Notation
- Crows Foot Notation
ENTITY A
ENTITY B
15One-to-One Relationships
DEPARTMENT
EMPLOYEE
workplace of
dept_name
emp_no
work in
dept_floor
emp_fname
dept_phone
emp_salary
managed by
boss of
- Where do you put the foreign keys ?
16One-to-One Relationships
- This second representation is better
17One-to-Many Relationships
- Implemented in relational database model by the
inclusion of a foreign key at the Many side of
the relationship
DEGREE
STUDENT
18Relationship Hierarchies
- Occur very frequently in practice typical
example organisational hierarchy - Multiple 1M relationships
- If misused, can lead to inflexible models
DEPART- MENT
DIVISION
SECTION
FIRM
19Many-to-Many Relationships
- Relational databases cannot implement MM
relationships directly - Must create a third entity, a so-called
intersection or associative entity - This new entity can have separate attributes of
its own - Example
- A student attends many courses a course may be
attended by many students - Student Student_ID Name Address DOB
- Course Course_Code Course_Title ECTS_Units
- Enrollment Student_ID Course_Code
Days_Absent
20Many-to-Many Relationships
BORROWER
BOOK
LOAN
callno
loandate
borrowerid
ISBN
borrowerid
booktitle
callno
duedate
- Assumptions
- Need to keep records of loans, even after book is
returned - A borrower will not borrow a book, return it, and
borrow it again all on the same day. Therefore
every loan can be uniquely identified by
callnoloandateborrowerid
21Recursive Relationships
- A recursive relationship relates an entity to
itself - 1M
- The entity gets an additional column for the
foreign key - Need a name different from the primary key
EMP
empno
empfname
empsalary
boss of
employee
22Recursive Relationships
23E-R Models Examples
- In-class worked example
- Ollscoil na Coiribe is constituted of seven
Faculties. Each Faculty consists of a number of
Departments. There are many Lecturers working
within each of these Departments. A Lecturer may
teach one or more Courses. A Course may be
simultaneously attended by Students pursuing
different Programmes of education. - A Student is registered for a Programme within a
Faculty. Programmes consist of a number of
Courses. A Course will not normally be available
unless at least ten students are registered to
present for examination in the current session.
Students are required to attend a number of
obligatory Courses, and may opt for additional
Courses. All Courses have a unique Course Code
and are allocated a weighting in Units. - It is also necessary to record Venue Reservations
for the purposes of assembling a timetable. This
consists of Course Code, Class Group, Day, Time
and Venue. Two sets of Venue Reservations exist,
one for each semester. - At the end of each semester, there are
Examinations. Each student is allocated an
Examination Number and must present for
examination in all Courses for which he/she has
registered for examination in that session. There
are three examination sessions every year
Semester 1, Semester 2, and Autumn. Examination
marks are recorded for all students, and are
maintained on a historical basis.
24Database Design - 2
25Entity Types
- Strong (Independent)
- Weak (Dependent)
- Subordinate (Sub-type)
- Intersection / Associative
26Strong and Weak Entities
- An instance of a Strong Entity can exist
independently - An instance of a Weak Entity can only exist if
there is a related instance of some other entity - Example A Degree may be taken by zero or more
Students A Student must register for a Degree
27Intersection Entities
- A decomposition of a single Many-to-Many
relationship into multiple One-to-Many /
Many-to-One relationships - At the very least, its attributes will consist of
the combination of the identifiers of the
entities which participated in the Many-to-Many
relationship - It can have additional attributes of its own
- Naming intersection entities
- Concatenate entity names if there is no obvious
name - Example Railway Company - A train stops at many
stations a station is serviced by many trains
28Intersection Entities
- For example, a train leaves Galway at 930AM
every Monday with final destination Dublin - We need to know what stations it stops at
- but we also need to know what time it departs
each station, and from what platform (attributes
of the relationship)
29Intersection Entities
- The solution is to create an intersection entity
- (In this example, dep_stn, arr_stn, and stop_stn
are all foreign keys linked to stn_code)
30Subordinate Entities
- Consider the following example
- There are 800 employees in an organisation, 600
of which receive hourly wages, and 200 of which
receive salaries - Of the 200 salaried employees, 50 are field
service engineers - The RSI Number, Taxation Bracket, Name, Address,
Telephone Extension, Email Address, Department,
and Job Title of all employees is recorded in a
database - Additionally, for waged employees, the Hourly
Wage and Hours Worked is recorded - For salaried employees, the Weekly Salary is
recorded - For the field service engineers, Company Car
Registration Number and Mobile Phone are recorded - How should this be modeled ? ...
31Subordinate Entities
EMPLOYEE RSI_Number Taxation_Bracket Name Addres
s Telephone_Ext Email Dept Job_Title Hourly_Rate H
rs_Worked Weekly_Salary Car_Reg_Num Mobile_Phone_N
um
- Possibility 1
- Model it as a single entity
- Problem this is inefficient, wasteful, and
confusing
32Subordinate Entities
EMPLOYEE RSI_Number Taxation_Bracket Name Address
Telephone_Ext Email Dept Job_Title WAGED_EMP Hou
rly_Rate Hrs_Worked SALARIED_EMP Weekly_Salary F
IELD_SVC_ENG Car_Reg_Num Mobile_Phone_Num
- Possibility 2
- Model it using subordinate entities
33Subordinate Entities
WAGED_EMP RSI_Number Hourly_Rate Hrs_Worked
EMPLOYEE RSI_Number Taxation_Bracket Name Addres
s Telephone_Ext Email Dept Job_Title
SALARIED_EMP RSI_Number Weekly_Salary
- Possibility 3
- use One-to-One relationships
- may be exclusive either-or relationships
- may be mandatory-to-mandatory or
optional-to-mandatory
FIELD_SVC_ENG RSI_Number Car_Reg_Num Mobile_Phon
e_Num
34What is a Good Data Model ?
- Basic construction rules should be obeyed
- There should be no ambiguity
- All entities, attributes, relationships, and
identifiers are defined and labeled - Names are meaningful to the client and to the
developer - Beware of homonyms(same word/different meanings)
and synonyms (different words/same meaning) - The model should be faithful to reality
- Accurately describes the world it is supposed to
represent - Relationships are of the correct degree
(cardinality) - Data model is complete, understandable, precise,
and at the right level of detail ?
35What is a Good Data Model ?
- The model should be as simple as possible
- There should be no redundant or duplicated data
- The model should be flexible and adaptable to
future needs - Have all exceptions been considered ?
36Normalisation
- Normalisation is a technique for simplifying the
design of a relational database so that it
achieves the optimum structure with minimal
redundancy - The normal forms are a linear progression of
rules applied to database design, with each
higher normal form achieving a better, more
efficient design - The normal forms are
- Un-normalised
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
- In this course, we shall concern ourselves only
with progression to 3NF
37Relationships of Attributes
- one-to-one
- a value of an attribute determines the value of
another attribute and vice versa that is, A ? B
and B ? A - for example
- IRL ? Republic of Ireland, Republic of Ireland
? IRL - Only occurs when both attributes are unique
- one-to-many
- a value of one attribute determines the value of
another attribute but not vice versa that is, A
? B but not B ? A - for example
- United States of America ? Dollar
- Dollar not ? United States of America (could be
Canada ? Hong Kong ? Australia ? Bahamas ? New
Zealand ? Singapore ? etc..)
38Relationships of Attributes
- many-to-many
- neither attribute determines the other
- A not ? B
- B not ? A
- for example
- country name not ? language (in Ireland, official
languages are Irish and English are spoken) - language not ? country name (English is an
official language of Ireland, United Kingdom,
United States of America, etc.)
39Functional Dependencies
- A functional dependency is a relationship between
attributes in a relation, whereby an attribute
(or combination of attributes) determine the
value of another attribute - Attribute B is said to be functionally dependent
on attribute A if, for every unique value of A in
the database, there exists no more than one
correlated value of B - Otherwise put, attribute A is said to be the
determinant of attribute B, A ? B - If the determinant is a single-attribute, those
attributes which depend on it are said to be
fully functionally dependent - If the determinant is a combination of attributes
(for example, a composite identifier), an
attribute is not fully functionally dependent
upon it only a part of the combination is
required to determine it. This is also called a
partial dependency
40First Normal Form (1NF)
- For a relation (table) to be in 1NF, it must
satisfy the following conditions - All rows must have the same number of columns
- All values are atomic otherwise put, there are
no multi-valued attributes / repeating groups
(at most one value for each row-column
intersection) - There can be no duplicate rows that is, every
row must have a unique primary key (identifier) - Every non-key attribute must be functionally
dependent upon the primary key - The rows and columns are not necessarily in any
particular order
41Multi-valued Attributes
- What really is a multi-valued attribute ?
- For example, an Items attribute of a Sales
Invoice which contains values such as 50 Exhaust
Pipes _at_ 120/each, 80 Hubcaps _at_ 30/each, 60
Towbars _at_ 55/each, 90 Wing Mirrors _at_ 45/each,
... is multi-valued (a repeating group) - but consider a typical address, such as
Milltown Road, Tuam, County Galway, Ireland - Should this be stored as a single Address
attribute, or as Street Address, Town, County,
Region, Country, Postcode ? - If there is a requirement to access parts of the
address separately, represent it as separate
single-valued attributes - Examples a mail merge, or demographic analysis
such as How many customers do we have in each
country ? - If there is no such current or foreseeable
requirement, a single Address attribute is
acceptable
42Multi-valued Attributes
- How are multi-valued attributes / repeating
groups resolved ? - Consider a Students Exam Result Sheet with
following structure - Student ID
- Student_Name
- Student_Address
- Module_Code Module_Description Mark
- Average
- Grade
- Transform to 1NF by placing repeating groups into
new relation - STUDENT Student ID Student_Name
Student_Address - Average Grade
- EXAM_RESULT Student ID Module_Code
-
Module_Description Mark
431NF Anomalies
- 1NF is inefficient as it can give rise to a
variety of possible anomalies - Insertion Anomaly
- bear in mind the rule of entity integrity it
is not allowable for any part of the primary key
of a record to be null - in the case of composite identifiers, it is
insufficient to have a value in one of the
component columns but no value in the other(s) - insertion anomalies arise whereby data cannot be
inserted because a complete primary key is not
available - Deletion Anomaly
- this is the inverse of the insertion anomaly
data that should be kept must be deleted because
part of the primary key has become null
441NF Anomalies
- Update Anomaly
- this arises because of redundancy (duplicated
data) the same data may be stored in two or more
records - if that data is modified, all occurrences of it
must be modified, otherwise integrity is breached
and data is unsynchronised / contradictory
451NF Anomalies
- Primary Key Student_ID Module_Code
- Insertion Anomaly cannot create new Module
unless a Student has registered to take an exam
in that Module - Deletion Anomaly If there were a Module for
which only 1 student was registered, and if we
were to delete that record, we would
inadvertently delete details of the Module - Update Anomaly what if the Module_Name
corresponding to an Module_Code were to change ?
Many records might be affected all would have to
be altered (duplication / redundancy)
46Second Normal Form (2NF)
- The next step of data analysis is to transform
relations into 2NF. - Relations must already satisfy rules of 1NF
- 2NF looks for an anomaly called a partial
dependency, meaning an attribute(s) whose value
is determined by only part of the primary key - all non-key attributes must be fully functionally
dependent on the primary key - Relations that have a single-column primary key
are already in 2NF - Only those relations that have a composite
(concatenated) primary key need to be checked
47Second Normal Form (2NF)
- To transform a relation from 1NF to 2NF,
- identify all functional dependencies in 1NF
- Make each determinant the primary key of a new
relation - Place all attributes that are functionally
dependent on a given determinant alongside that
determinant as non-key attributes in the
newly-created relation of which that determinant
is the primary key - ????Confused???? see worked examples to follow
...
482NF Example
- The above relation is in 1NF. What are the
functional dependencies ? - Student_ID Module_Code ? Mark
- Module_Code ? Module_Name
- In 2NF, we therefore have two relations
- EXAM_RESULT Student_ID Module_Code Mark
- MODULE Module_Code Module_Name
49Third Normal Form (3NF)
- Entities are assumed to be in 2NF before
beginning 3NF analysis. - 3NF analysis looks for two types of problems,
derived data and transitive dependencies. - In both cases, the fundamental error is that
non-key attributes are dependent on other non-key
attributes - Derived attributes are those whose values can
either be calculated from other attributes, or
derived through logic from the values of other
attributes - A transitive dependency exists when a non-key
attribute is dependent on another non-key
attribute (other than by derivation) - This error usually indicates that an undiscovered
relation is still embedded within the problem
relation
503NF Example
- Consider the following scenario
- The Health Safety Officer of an organisation of
300 employees is scheduling an obligatory 1-day
Seminar in Ergonomics. The only available venue
is capable of holding 50 employees, hence 6
different sessions are necessary. An employee
will only have to attend one of these sessions. - The H S Officer has created a simple database
to help her make arrangements. It contains a
SEMINAR table with attributes Employee_ID
Session_Num Date Time Venue - What is wrong with this structure ?
- it gives rise to insertion, update, and delete
anomalies in much the same way as 1NF
513NF Example
- continued
- SEMINAR Employee_ID Session_Num Date Time
Venue - Insertion Anomaly details of a session cannot be
recorded until at least one employee is assigned
to that session - Deletion Anomaly if there were only a single
employee assigned so far to a session, and if
that record were to be deleted (say, if the
employee left the company), then data pertaining
to the session would be inadvertently lost - Update Anomaly if 20 employees were assigned to
a session, and if that session needed to be
rescheduled, all 20 records would have to be
updated, else integrity would be breached - Problem arises because of transitive dependency
- Employee_ID ? Session_Num ? Date, Time, Venue
523NF Example
- continued
- SEMINAR Employee_ID Session_Num Date Time
Venue - Resolution split SEMINAR into two relations
- SESSION Session_Num Date Time Venue
- EMP_ASSIGNMENT Employee_ID Session_Num
- (Note In SESSION table, combination of Date
Time Venue is unique because otherwise you
could have a double-booking ! Session_Num is
chosen as primary key in preference to this
combination because of the principle of
minimality i.e. least number of columns)
53Third Normal Form (3NF)
- An entity is in third normal form if every
non-key attribute is dependent on the primary
key, the whole primary key, and nothing but the
primary key
54Example Invoice
55Example Invoice
- Assumptions
- The price of an item is separately agreed for
each invoice at the time of the sale - VAT rates fluctuate over time, and different
rates may apply to different products - Take a short-cut by ignoring calculated fields
(Line Total, Invoice Total) - First Normal Form
- INVOICE Invoice_Num Date Cust_Num
Cust_Name Cust_Address - INVOICE_LINE Invoice_Num Item_Code
Item_Description Qty Unit_Price VAT_Rate
56Example Invoice
- Second Normal Form
- No change to INVOICE relation as it has a
single-column primary key - In 1NF, INVOICE_LINE Invoice_Num Item_Code
Item_Description Qty Unit_Price VAT_Rate - Item Code ? Item Description
- Invoice Num Item Code ? Qty, Unit_Price,
VAT_Rate - We therefore end up with three relations in 2NF
- INVOICE Invoice_Num Date Cust_Num
Cust_Name Cust_Address - INVOICE_LINE Invoice Num Item Code Qty
Unit_Price VAT_Rate - INVENTORY Item Code Item Description
57Example Invoice
- Third Normal Form
- There is a transitive dependency in INVOICE
- Invoice Num ? Cust Num ? Cust Name, Address
- This is resolved by splitting INVOICE relation
- INVOICE Invoice_Num Date Cust_Num
- CUSTOMER Cust_Num Cust_Name Cust_Address
58Physical Database Design
- After logical design comes physical design, which
is concerned with implementation aspects such as - selection of data types
- distribution / partition of data
- capacity planning
- database workloads / optimisation of queries
- access methods and indexing
- security checks
- fine tuning of performance