Title: Relational Data Analysis
1Relational Data Analysis
- Stems from work of Ted Codd in early 1970s
- Led to relational database theory
- Detailed mathematical theory
- Relational data analysis for database design
- Bottom-up technique based on analysing the
inter-relationships between attributes
Logical Data
Relational Data
Structuring
Analysis
Enhanced
Logical Data
Model
2Relations (also known as Tables )
3Properties of Relations
- Rows
- show occurrences of Patient
- each row must be uniquely identifiable
- order of rows must NOT be significant
- Columns
- column name attribute type
- each value attribute occurrence
- order of columns must not be significant
- only one value should be associated with each
column/row intersection in the table - Domain
- pool of possible values from which the actual
values appearing in the columns of the table are
drawn - e.g. domain of Patient Numbers includes all of
the possible Patient Numbers, not just the ones
currently in hospital. - important for comparing values from different
tables
4Normalised Relations Objective
- The organisation of a system's attributes into a
set of - well normalised relations.
- Well normalised relations avoid
- unnecessary duplication of data
- i.e. no redundant data
- problems with modifying, inserting and deleting
data - N.B. sometimes referred to as the update
anomalies
5Stages of Normalisation
- Normalised takes place in stages
- Each stage is known as a normal form
- Each stage is a development from the previous
stage
Un-Normalised Form
First Normal Form
Second Normal Form
Third Normal Form
6Sample Source of Data
DRUG CARD
DRUG CARD
Patient No.
Surname
Forename
109
Foot
Ivor
Patient No.
Surname
Forename
923
Moneybags
Maurice
Ward No.
Ward Name
Ward No.
Ward Name
11
Fleming
10
Barnard
Drugs Prescribed
Drugs Prescribed
Length of
Length of
Date
Drug Code
Drug Name
Dosage
Date
Drug Code
Drug Name
Dosage
Treatment
Treatment
2 pills 3 x day after meals
2 pills 3 x day
15/5/88
AS473A
7 days
Aspirin
20/5/88
CO2355P
Cortisone
14 days
after meals
Injection every 4 hours
20/5/88
VA231M
Valium
2 per day
5
20/5/88
MO3416T
Morphine
5
MO3416T
Morphine
Injection
25/5/88
3
every 8 hours
26/5/88
PE8694N
Penicillin
1 pill 3 x day
7
for additional drugs continue on another card
for additional drugs continue on another card
7Steps in Normalisation - 1Un-normalised Form
- Represent the data in un-normalised form and pick
a key. - Column headings (attribute names) should be
meaningful - Choice of key attribute/attributes
- must be unique for the particular data source
- may require two or more attributes
- e.g. Invoice No. and Supplier, because
different suppliers may send invoices having the
same number - use smallest combination of attributes possible
- avoid textual keys wherever possible
8Data Re-organised from Drug Cards
Prescn
Length
Fore
Wd
Ward
Pat
Surname
Drug Code
Drug Name
Dosage
Date
Trtmnt
-name
No.
Name
No.
2 pills 3 x
20/5/88
CO2355P
Cortisone
14
Barnard
10
Maurice
Moneybags
923
day after
meals
Injection
20/5/88
MO3416T
Morphine
5
every 4 hours
MO3416T
Morphine
Injection
25/5/88
3
every 8 hours
26/5/88
PE8694N
Penicillin
1 pill 3 x day
7
2 pills 3 x
15/5/88
AS473A
7
Aspirin
Fleming
11
Ivor
Foot
109
day after
meals
20/5/88
Valium
2 per day
5
VA231M
- Patient Number is chosen as primary key and
underlined
9Steps in Normalisation - 2First Normal Form
- Remove any repeating groups of ATTRIBUTES to
separate relations - Pick keys for any relations identified
Length
Prescn
Pat No.
Drug Code
Drug Name
Dosage
Date
Trtmnt
20/5/88
CO2355P
Cortisone
14
923
2 pills 3 x day after meals
20/5/88
MO3416T
Morphine
Injection
5
923
every 4 hours
25/5/88
MO3416T
Morphine
Injection
3
923
every 8 hours
26/5/88
PE8694N
Penicillin
1 pill 3 x day
7
923
15/5/88
AS473A
7
Aspirin
109
2 pills 3 x day after meals
20/5/88
Valium
2 per day
5
109
VA231M
10Repeating Groups
- Any attribute or group of attributes that may
occur with multiple values for a single value of
the primary key attribute. - Several values of Drug Code, Drug Name,
Prescription Date, Dosage and Treatment Length
for one value of Patient Number - Remove to new relation
- Key of new relation will be a compound key
involving several attributes - key identified in Step 1 (Patient No.)
- further items required to make each row in the
new relation unique (Drug Code and Prescription
Date )
11Non-repeating Attributes
Pat
Surname
Fore
Wd
Ward
No.
-name
No.
Name
Barnard
10
Maurice
Moneybags
923
Fleming
11
Ivor
Foot
109
- Attributes that do not repeat remain as a relation
12Functional Dependency
- For any two attributes A and B, A is dependent
on B if and only if - for a given value of B there is precisely one
associated value of A at any one time. - e.g. Patient Surname is totally dependent on
Patient Number because each patient is given a
unique Patient Number - Another way of describing this is to say that
- Attribute B determines attribute A
- i.e. Patient Number determines Patient Surname
- NB The opposite is not trueFor a given value
of Patient Surname, say Moneybags, there may be
several Patient Numbers, as there may be
several patients called Moneybags in the hospital
together. - Dependency can also occur with groups of
attributes.
13Steps in Normalisation - 3Second Normal Form
- Remove ATTRIBUTES that only depend upon part of
the key to separate relations - Only applies to compound key relations
- NB. Others are already in Second Normal Form
- Thus
- Patient Number, Prescription Date, and Drug Code
in combination determine attributes Dosage and
Treatment Length - However
- Drug Name is only dependent on part of the key
(Drug Code) and can be removed from the relation - Drug Code and Drug Name form a new relation with
Drug Code as the key.
14Development to Second Normal Form
Length
Prescn
Drug Code
Drug Name
Dosage
Pat No.
Drug Code
Trtmnt
Date
2 pills 3 x
20/5/88
CO2355P
Cortisone
14
923
CO2355P
day after
meals
20/5/88
MO3416T
Morphine
Injection
5
923
MO3416T
every 4 hours
Penicillin
PE8694N
25/5/88
MO3416T
Injection
3
923
every 8 hours
Aspirin
AS473A
26/5/88
PE8694N
1 pill 3 x day
7
923
Valium
VA231M
2 pills 3 x
15/5/88
AS473A
7
109
day after
meals
20/5/88
2 per day
5
109
VA231M
15Steps in Normalisation - 4Third Normal Form
- Remove any ATTRIBUTES not directly and fully
dependent on the key to separate relations. - Create a separate relation for attributes which
can be uniquely determined by any other attribute
in the relation - The key attribute in the separate relation must
uniquely determine the remaining attributes in a
one-to-one relationship
Pat
Surname
Forename
Ward
Ward
No.
No.
Name
Barnard
10
Maurice
Moneybags
923
Fleming
11
Ivor
Foot
109
16Example in Third Normal Form
- Patient No. does not uniquely determine Ward No.
or Ward Name, because the patient might be in
different wards at different times. - Ward No uniquely determines Ward Name, and these
attributes can therefore be removed to a separate
relation. - Ward No becomes a foreign key in the Patient
relation.
17Full Set of Well Normalised Relations
Lgth
Presc
Pat
Wd
Ward
Drug Name
Drug Code
Dosage
Drug Code
Treat
Date
No.
No.
Name
2 pills 3 x
Cortisone
CO2355P
20/5/88
CO2355P
923
14
day after
Barnard
10
meals
Morphine
MO3416T
20/5/88
MO3416T
Injection
923
5
Penicillin
PE8694N
Fleming
11
every 4 hours
Aspirin
AS473A
25/5/88
MO3416T
Injection
923
3
every 8 hours
VA231M
Valium
26/5/88
PE8694N
1 pill 3 x day
923
7
Pat
Surname
Forename
Wd
No
No.
2 pills 3 x
15/5/88
AS473A
109
7
day after
Maurice
Moneybags
923
10
meals
20/5/88
2 per day
109
VA231M
5
Ivor
Foot
109
11
18SSADM Notation
19Relations as a Logical Data Structure
Ward
Wd No
Ward Name
Patient
Barnard
10
Pat No
Surname
Forename
Wd No
Fleming
11
Maurice
Moneybags
923
10
Ivor
Foot
109
11
Prescription
Drug Code
Trt Lgth
Prescr Date
Dosage
Pat No
2 pills 3 x day after meals
20/5/88
CO2355P
923
14
Injection
20/5/88
MO3416T
923
5
every 4 hours
Injection
25/5/88
MO3416T
923
3
every 8 hours
26/5/88
PE8694N
1 pill 3 x day
923
7
2 pills 3 x day
15/5/88
AS473A
109
7
after meals
20/5/88
2 per day
109
VA231M
5