Relational Database M 358

About This Presentation

Title:

Relational Database M 358

Description:

... on B and B is functionally dependent on A, then C is ... [offending FD: in a transitive dependency A B and B C implying A C, B C is the offending FD] ... – PowerPoint PPT presentation

Number of Views:68

Avg rating:3.0/5.0

Slides: 35

Provided by: Ange325

Category:

Tags: bb | database | relational

more less

Transcript and Presenter's Notes

Title: Relational Database M 358

1
Relational Database (M 358)

Handout 7
Tang Kwok Wa, Billy

2
Block 4 Database Development
This block concerns
Establish Requirement
data requirements
Data analysis (ER Modeling, Normalization)
Conceptual data model
Database design
Logical schema
Implementation
Schema and database
3
Establishing Data Requirements

Data requirements is normally a document written
in natural language.
Documenting data requirements should
enable the identification of subjects
enable the recognition of relevant connections
between subjects together with the properties of
the relations
Problems of ambiguity in data requirements

4
Data Analysis Process - Entity-Relation Modeling

Entity-Relation Modeling
Model is for communication between people, but
not for computers.
Not influenced by design issues. The following
issues are not relevant
data format (e.g. length of strings)
data volume (e.g. how many occurrences of
students)
data usage (what processing required)
data access (how data may be retrieved
efficiently)
Good ER model should
represent all users data requirements -
completeness
have no duplication - integrity
include all constraints - integrity
be general (not restricted by the particular
initial requirements) - flexibility
understandable (simple, with no unnecessary
complexity) - usability

5
Data Analysis Process - Entity-Relation Modeling

Entity Types
Represents things having a meaning in a given
context and with a need to record data.
Defines the properties common to a collection of
entities - e.g. Student
How to look for the entity type? In the
University data requirement, there are words like
information about each student is initially
recorded at registration. So one would think
each student would be represented by an entity
with some information to be recorded.
Student is the subject being modeled. To identify
subjects
exclude terms that only refer to data about those
subjects, such as course code or identification
number.
exclude nouns that refer to processes, such as
registration which refers to the process of
initially recording a students details.

6
Data Analysis Process - Entity-Relation Modeling

Relationships
Relationship name
Assign the relationship name in one to many
direction
E.g
staff - counsels - student, not student -
IsCounselled By - staff.
Remember
In E-R model, the relationship counsel is simply
represented by the line connected between Staff
and Student.
The attribute counsellor_no, being a foreign key
in Student referencing to Staff for the Counsel
relationship in Relational model, is not an
attribute in the Student entity type of the E-R
model

Student(student_id, name, registration, region)
Staff (staff_no, name, region)
7
Data Analysis Process - Entity-Relation Modeling

Relationships
mn relationship
mn relationship is decomposed into two 1n
relationships with an entity type introduced in
between them, unless there is absolutely no
additional data to be recorded by the
relationship.
the participation conditions of the introduced
entity type in the decomposition are mandatory.
It is meaningless for any occurrence exists in
this entity type without relationship to both.
identifiers from two related entity types are
combined to give a composite identifier for the
intersection entity type
Inclusive and exclusive relationships
two relationships are inclusive if it is required
that every occurrence of the common entity type
participates either in both relationships or in
neither
two relationships are exclusive if it is required
that no occurrence of common entity type
participates in both relationships.

8
Data Analysis Process - Entity-Relation Modeling

Attributes
According to data requirements, one may find
specification of data directly associated with
the subject. E.g. each student would be recorded
with his/her name, then student name will be one
of the attributes for the student entity type.
Attributes should be assigned to avoid duplicated
data
E.g. Each student should be given his
counsellors telephone number for communication
if counsellor telephone numbers are recorded in
Student entity type, forming
student(student_id, counsellor_telno)
then, counsellor_telno will have repeated values
in this student entity type since a counsellor
would counsel many students. Thus
counsellor_telno should be recorded in staff
entity type.

9
Data Analysis Process - Entity-Relation Modeling

Attributes
Complex values
A staff has more than one qualifications If we
just introduce a Qualification attribute in Staff
entity to cater for this requirement, then a
staff will be multivalued when the staff has more
than one qualification such as BSc, MSc and PhD
Resolved by 1n relationship.
Derived data
having values based on some processing deriving
from other data
should not be included in an E-R model because it
is a kind of redundancy
occasionally exists in a data model, either
because its derivation is difficult to infer, or
because it is important for the data to be made
explicit. (should include a constraint to
prevent the inconsistency that may result from
duplication.)

Obtained
Staff
Qualifications
Qualifications(StudentId, Degree) (5324,
Bsc) (5324, MSc) (5324, PhD)
10
Normalization

A process to assess and improve the quality of
relational data model by identifying duplication
in a table and reorganizing the data into two or
more tables, without duplication or loss of
information.
Used in data analysis stage during the
development of database.

Problem of a table not in normalized form
Amending a value may cause changes in many tuples
( in order to maintain consistency of the data)
If GPId g1 changes the name from Leila to John,
then rows 1, 2, 3, 9, 10 have to be changed

11
Normalization

Single-Valued Fact (SVF)
is a statement of expressing the meaning of
attribute in a relation, it associates one
attribute with exactly one another attribute.
E.g. a general practitioner has a name
It is interpreted as a general practitioner
identified by general
practitioner identifier has (exactly) a name
Having known such SVF, we know that in a
relation GeneralPractitioner,
GPID as the primary key determines exactly one
non-primary key attribute GPname
Every non-primary-key attribute in a relation
(e.g. GPName) becomes a single-valued fact about
the primary key (e.g. GPID).
Properties of single-valued fact
Occurrences of a SVF will be recorded just once.
4 occurrences in the above example
The first occurrence of this SVF is the
general practitioner identified by g1 has a name
Leila
The second occurrence of this SVF is the
general practitioner identified by g2 has a name
Ftizdare..
It is not reversible. A general practitioner has
one name - it is not true that a name is
associated with just one general practitioner.

relation GeneralPractitioner
GPId GPName g1 Leila g2 Fitzdare g3 von B g4 von B
12
Normalization

Functional Dependencies
Functional dependency is - Within a relation R,
attribute B is functionally dependent on
attribute A if and only if each value of A
determines precisely one value of B
same as what single-value fact expresses.
SVF a general practitioner identified by general
practitioner identifier (GPID) has a name can be
expressed in a short form as functional
dependency using the notation below
GPId ? GPName
(it means that GPId determines precisely one
value of GPName)
GPId is said to be a determinant
Given a relation R, B is fully functionally
dependent on A if it is functionally dependent on
A and not functionally dependent on any subset of
A
E.g Appointment(PatientId, ApptDate, ApptTime,
ConsId)
PatientId, ApptDate, ApptTime ? ConsId (not
full functional dependency)
PatientId, ApptDate ? ConsId (full
functional dependency)

13
Normalization Process

Normalization process
first normal form - gt second normal form - gt
third normal form - gt Boyce-Codd Normal Form
Starting point of normalization process
Given a relation in first normal form
Given a collection of Single Value Facts (or
Functional Dependencies)

ExtendedAppointment (PatientId, PatientName,
GPId, GPName, ApptDate, ApptTime, ConsId,
ConsName, HospNo, HospName)
14
Normalization Process

First Normal Form (1NF)
A relation is in 1NF if and only if every
non-primary key attribute is functionally
dependent (not necessarily fully functionally
dependent) on the primary key.
Any relation is in 1NF, since a primary value
will identify a tuple and thus determine exactly
one value of each of the non-primary attributes.
ExtendedAppointment is in 1NF because it is a
relation.

15
Normalization Process

Second Normal Form (2NF)
A relation is in 2NF if and only if it is in 1NF
and every non-primary key attribute is fully
functionally dependent on the primary key.
Need to know all the FDs holds in the relation.
Identify additional determinant from the given
relation. (FD10 -FD17)
Add Transitivity FDs
if PatientId ? GPId and GPId ? GPName then
PatientId ? GPName also holds
Add Augmentation FDs
if A ? B holds, then A,Z ? B, Z also holds.
E.g. PatientId, ApptDate ? ConsId (FD 5)
? PatientId, ApptDate ? ConsId, ApptDate
ConsId, ApptDate ? HospNo (FD 9)
therefore PatientId, ApptDate ? HospNo (FD
15)

16
Normalization Process

Whether a relation is not in 2NF
Examine whether non-primary key attributes are
fully functionally dependent on primary key
In ExtendedAppointment, FD2, FD3 and FD10,
PatientId ? PatientName, PatientId ? GPId,
PatientId ? GPName. PatientId is only subset of
primary key (which is PatientId, ApptDate).
Therefore ExtendedAppointment is not in 2NF.
Putting a relation into 2NF
ExtendedAppointment (PatientId, PatientName,
GPId, GPName, ApptDate, ApptTime, ConsId,
ConsName, HospNo, HospName)
Separate the original relation into two
Put the subset of primary key together with its
determined attributes into a separate relation.
Patient2(PatientId , PatientName, GPId, GPName)
Put the primary key together with those
attributes not determined by above into another
relation
PatientAppointment2 (PatientId, ApptDate,
ApptTime, ConsId, ConsName, HospNo, HospName)

17
Normalization Process
1NF
FD 2, 3, 10 PatientId ? PatientName PatientId
? GPId PatientId ? GPName
2NF
PatientAppointment2 (PatientId, ApptDate,
ApptTime, ConsId, ConsName, HospNo, HospName)
PatientA2 (PatientId, PatientName, GPId, GPName)
Producing 1n relationship through decomposition
of original relation, with the common names
(PatientId) as the foreign key
18
Normalization Process

Third Normal Form (3NF)
A relation is in 3NF if and only if it is in 2NF
and no non-primary key attribute is transitively
dependent on the primary key.
Transitive dependency is
Within a relation, where A, B and C are each an
attribute or combination of attributes with no
attributes in common, if C is functionally
dependent on B and B is functionally dependent on
A, then C is transitively dependent on A via B,
provided that A is not functionally dependent on
B or C.
Putting relations into 3NF
to move the attributes of the offending FDs, into
separate relation (or relations if determinant
differ)
offending FD in a transitive dependency A ? B
and B ? C implying A ?C, B ?C is the offending
FD
In PatientA2 (PatientId, PatientName, GPId,
GPName), GPName is transitive dependent of
PatientId because of PatientId ? GPId and GPId
? GPName
thus GPId, GPName should be moved to a new
relation (see in the following page)
remove the right-hand side attributes from
original relation
create new relation for each different
determinant that occurs in the offending FDs.
primary key of the new relation(s) is the
determinant of the offending FDs used to
construct it
primary key of the relation from which the
attributes have been removed is unchanged

19
Normalization Process
2NF
PatientA2
PatientAppointment2
3NF
PatientA3

FD7,8 ConsId ? ConsName HospNo ?
HospName
ConsultantA3
PatientAppointmentA3

HospitalA3
GeneralPractitionerA3
20
Normalization Process

Putting Third Normal Form to Boyce-Codd Normal
Form (BCNF)
A relation is in BCNF if and only if every
determinant is a candidate key.
So, in BCNF, all the non-primary attributes can
only be determined from the candidate keys
See if any 3NF in which a determinant is not
candidate key. If there is such determinant,
open new relation for it.
Whether a 3NF is BCNF, checking by
only one candidate key (that is, there are no
alternate keys)
or if more than one candidate key, then
the candidate keys are not combinations of
attributes
or if the candidate keys are combination of
attributes, then
the candidate keys do not overlap

21
Normalization Process
ConsId, ApptDate ? HospNo in
PatientAppointmentA3 (FD 9), but ConsId,
ApptDate cannot determine ApptTime and ConsId,
ApptDate cannot determine PatientID, thus
ConsId, ApptDate is not a candidate key in
PatientAppointmentA3
PatientAppointmentA3
ConsultantA3
PatientA3
3NF
HospitalA3
GeneralPractitionerA3

PatientAppointmentBCNF
General PractitionerA3
ConsultantVisitBCNF
HospitalA3
ConsultantA3
ConsultantVisitBCNF
PatientA3
PatientAppointmentBCNF
22
Normalization Process

Limitation of Normalization
it only produce relations of 1n relationships,
results in no redundancy
relations of 11 relationship cannot be produced
by normalization.
no participation conditions are considered
no constraint conditions are considered.

23
Database Design
Use a relational conceptual data model to give a
set of tables for initial database design
Do the tables represent the data in an acceptable
way according to the given criteria for
usability, efficiency, and so on?
Yes
No

Define constraints of each table
columns
primary key
foreign keys
constraints

Revise table
Revise if necessary
Implementation
24
Defining Column in SQL

Numeric data type
Range
Integer 2147483647
Smallint 32767
Decimal numbers
precision number of decimal digits
scale number of digits after the decimal point
E.g cast( input_no as decimal(5,2)) as output_no
if input_no 999.123456 output_no 999.12
Metadata (data about data) - any data that is
required to interpret other data as meaningful
information.
E.g. A numeric column Amount showing x.xx
would mean HK x.xx K. The actual meaning of
Amount would be interpreted as in scaling
factor of HK000 (metadata)

25
Defining Column in SQL

Character data type
VARCHAR (variable length)
CHAR (fixed length, adding spaces so that string
of the specified length is stored)
(Infomaker only supports VARCHAR. CHAR is
implemented as VARCHAR)
Date-time data types
DATE
TIME
TIMESTAMP (combination of date and time)
Default values
Real values as default e.g. default is current
year., or
Non real value as default, e.g. 9999 as default
for integer column. May cause error in queries,
e.g. avg()

26
Defining Column in SQL

Using Codes
e.g. HK representing Hong Kong
Advantage less space, less typing mistakes
Disadvantage Not immediately understandable
(may need to link with a full name table)
SQL domains not in same way of relational
domains
comparison of two SQL columns requires only that
they are comparable data type, but not same
domain.
SQL foreign keys and operations are not
constrained by domains as they are for relations.

27
Defining Table in SQL

Defining primary key
Choose primary key according to uniqueness,
minimality and not being null.
Using surrogate - a field with distinguish values
for primary key. It has no meaning at all, just
generate by a DBMS to distinguish different row.
E.g. a numeric column as DEFAULT AUTOINCREMENT
will automatically generate a new and unique
value for the column when a new row is added.
Defining constraints in SQL
primary key - expressed as PRIMARY KEY
alternate key - expressed as UNIQUE
referential constraints FOREIGN KEY
mandatory participation condition - NOT NULL for
foreign keys, or using CHECK clause
Constraints part of the model - expressed as a
CHECK clause
Considerations in using CHECK
Inefficient processing if CHECK statement is
complicate.
Every update involves processing a CHECK, causing
unacceptable delay.

28
Revising Tables

Avoiding Null
E.g. Student( StudentId, Name, Telephone_no)
Because some students might not have telephone,
telephone_no column will be null in some rows.
To avoid null, can use two tables
Student( StudentId, Name) Telephone(StudentId,
Telephone_no)
However, query might become complex as it will
then involve join.
Denormalization
Deliberately combining normalized tables into
one, in order to achieve efficient data retrieval
and simple query.
Beware duplicated data after denormalization.

29
Additional Data

Derived data - values that are obtained by
processing other data.
To maintain exact consistency of derived data, it
should be updated every time the data used in the
derivation is changed (can be done by trigger)
Can also be done by periodic updating of
student_count by a separate application process
(sacrifice some data consistency but relief
processing burden)
Defining Extra Tables because of
snapshot data (e.g. to make an annual report)
summary data for users
historical record (for auditing purpose)

30
Database Implementation

Indexes - fast data access method to support
efficient retrieval of data.
It involves a mechanism that associates a value
in a column of a table with the stored locations
of all the rows which contain that value.
When a query is executed in which an indexed
column is referenced in the WHERE clause, it can
use the index to provide the stored locations of
the particular rows that satisfy the WHERE
condition, thus enable direct access to the
required data.
Need extra space for index file
Index must be updated every time when there is
change in the values in the indexed column
Need to balance against the possible benefits of
an index in deciding whether one is appropriate
for a given column(s)

31
Database Implementation

Establishing a database
Ways of populating the tables of a new database
by import of old tables
prepares an application program that reads in the
old data, transforms it to appropriate format and
then inserts it into the new table.
Backup (for data recovery when necessary)
continuous copying of all changes to a database
in a log file
periodic unloading of database to make a backup
copy

32
Logical Processing Sequential Processing
(quicker)
Find the best plan for executing SQL statements
FROM get the required table store as the 1st
intermediate table
FROM get a row of the required table
next row
WHERE get the 1st intermediate table retain the
rows that satisfy the where clause store as 2nd
int. table
WHERE retain the rows that satisfy the where
clause
SELECT get the 2nd intermediate table retain the
columns required produce the final table
SELECT retain the columns required
1 disk access
5 disk accesses
33
Find the best plan for executing SQL statements
Using Index

When will index become ineffective?
For small table, single disk access would
retrieve the whole table. Index will not help
much
With many rows spreading in each blocks, most of
the blocks are likely to be accessed, so using an
index to identify the required blocks may be
slower than simply retrieving every block for
sequential plan of processing of the whole table.
Any conditions in the WHERE clause not involving
the indexed column would need to be processed
separately