Title: Data Design
1IMS2805 - Systems Design and Implementation
2References
- HOFFER, J.A., GEORGE, J.F. and VALACICH (2002)
3rd ed., Modern Systems Analysis and Design,
Prentice-Hall, New Jersey, Chap 12 - HOFFER, J.A., GEORGE, J.F. and VALACICH (2005)
4th ed., Modern Systems Analysis and Design,
Prentice-Hall, New Jersey, Chap 10 - WHITTEN, J.L., BENTLEY, L.D. and DITTMAN, K.C.
(2001) 5th ed., Systems Analysis and Design
Methods, Irwin/McGraw-Hill, New York, NY.
Chapter 12
3Data Design
- Logical data design
- Logical data modelling
- Physical file and database design
4Overview of logical data modelling
- Logical data modelling involves
- collecting detailed attributes for each entity
and relationship identified - converting ER models to relations
- normalising the relations
- merging relations from each user viewpoint
- converting the normalised and merged relations to
create a data structure diagram
5Data Design
- The data structures developed in the logical
system design phase need to be converted to
physical data structures for the physical design
stage. - A knowledge of the relevant data base management
system (DBMS) is important for this phase. - The aim is to develop a compromise between the
ideal (most flexible) structure developed
during normalisation and the most efficient
(fastest, cheapest) structure preferred for
implementation.
6Physical Data Design
- Information required
- Normalised relations, including volume estimates
- Definitions of each attribute
- Descriptions of where and when data are used
(CRUD) - Expectations/requirements for response time and
data integrity - Descriptions of technologies to be used for
implementing the files and databases
7Physical Data Design
- Key decisions
- The storage format (data type) for each
attribute, e.g. length,coding scheme, decimal
places, range of values - Grouping of attributes into physical records
(data structure) - Arranging records in secondary memory for
storage, retrieval, update (file organisation) - Selection of media and structures for efficient
access - DBMS technologies require selection of options
for retrieval and for implementation of
relationships between entities
8The relational database model
- The relational database model represents data in
the form of tables or relations - Important concepts are
- relation
- primary key
- foreign key
- functional dependency
- normalisation
9Normalisation
- Normalisation is a process for converting complex
data structures into simple, stable data
structures in the form of relations. - Data models consisting of normalised relations
- are robust and stable
- have minimum redundancy
- are flexible
- are technology-independent (logical)
10Normalisation
- Normalisation ensures that each attribute is
attached to the appropriate relation - i.e. each attribute is contained in the relation
which represents the real world system object or
concept that the attribute describes or is a
property of - e.g. the attribute Student-name should be in the
relation STUDENT which represents the real world
object student of interest to a student records
system
11Normalisation
- Normalisation was originally developed as part of
relational database theory by E.F. Codd (1970) - Normalisation is accomplished in stages, each of
which corresponds to a normal form - Originally, Codd defined first, second and third
normal forms. Third normal form is adequate for
most business applications. - Later extensions include Boyce-Codd, 4th, 5th and
domain-key normal forms.
12Relation
- A relation is a named, two-dimensional table of
data - Each relation consists of a set of named columns
and an arbitrary number of rows - Each column corresponds to an attribute of the
relation - Each row corresponds to an instance (or record)
that contains values for that instance
13Example relation
- A relation generally corresponds to some real
world object or concept of interest to the system
(similar to an entity) e.g.
Employee
Emp
Name
Salary
Dept
1247
Adams
24000
Finance
1982
Smith
27000
MIS
9314
Jones
33000
Finance
Employee (Emp, Name, Salary, Dept)
14Properties of relations
- Relational tables are tables in which
- data values are atomic (single-valued)
- data values in columns are from the same domain
- each row in the relation is unique
- the sequence of columns is insignificant
- the sequence of rows is insignificant
15Primary key
- An attribute or group of attributes which
uniquely identifies a row of a relation - Entity integrity (relational data base theory)
requires that each relation has a non-null
primary key - Where several possible keys are identified, they
are known as candidate keys - choose one to be
the primary key
E.g.
Employee (Emp, Name, Salary, Dept) Order-item
(Order, Item, Qty-ordered)
16Foreign key
- A foreign key is an attribute in one relation
that is also a primary key in another relation - The referential integrity constraint (relational
database theory) specifies that if an attribute
value exists in one relation then it must also
exist in a linked relation - A foreign key must satisfy referential integrity
17Foreign key example
- In the example below, if a given Dept exists in
an Employee relation then that Dept must exist
in the Department relation
Employee (Emp, Name, Salary, Dept)
foreign key
Department (Dept, Dname, Budget)
18Functional dependency
- A functional dependency is a particular
relationship between attributes in a relation - For any relation R, attribute B is functionally
dependent on attribute A if each value of A has
only ONE value of B associated with it, - i.e. if for every valid instance of A, that
value of A uniquely determines the value of B
A identifies B A B Emp
Emp-name Emp Salary
19Steps in normalisation
- Normalisation to third normal form is
accomplished in 3 steps each corresponding to a
basic normal form - A normal form is a state of a relation that can
be determined by applying simple rules concerning
dependencies within that relation - Each step of the normalisation process is applied
to a single relation in sequence so that the
relation is converted to third normal form
20Steps in normalisation
Unnormalised table
Remove repeating groups
First Normal Form
Remove partial dependencies
Second Normal Form
Remove transitive dependencies
Third Normal Form
21Well Structured Relations
- A well structured relation contains a minimum
amount of redundancy and allows users to insert,
modify, and delete rows in a table without errors
or inconsistencies (known as anomalies) - Three types of anomaly are possible
- insertion
- deletion
- Modification
- Third normal form relations are considered to be
well structured relations
22First normal form (1 NF)
- A relation is in first normal form if it contains
no repeating data - the value of the data at the intersection of
each row and column must be single-valued - Remove any repeating groups of attributes to
convert a relation to 1 NF (key of the removed
group is a composite key)
Order (Order, Customer, (Item, Desc,
Qty)) Order-Item (Order, Item, Desc, Qty)
Order (Order, Customer)
23Second normal form (2 NF)
- A relation is in 2 NF if it is in 1 NF and every
non-ket attribute is fully functionally dependent
on the primary key - A partial dependency exists if one or more
non-key attributes are dependent on only part of
a composite primary key - Remove any partial dependencies to convert a 1 NF
- relation to 2 NF
- If the primary key consists of only one attribute
or there are no non-key attributes, then a 1 NF
relation is automatically in 2 NF
24Second normal form (2 NF)
- Remove Partial Dependencies
- a non-key attribute cannot be identified by part
of a composite key
- Order (Order, Item, Desc, Qty-ordered)
- Order-Item (Order, Item, Qty-ordered)
- Item (Item, Desc)
25Partial dependency anomalies
Order-Item
Order
Item
Qty
Item-desc
2
27
873
nut
1
28
402
bolt
10
28
873
nut
50
30
495
washer
- UPDATE change item -desc in many places
- DELETE data for last item lost when last order
for that item is deleted - CREATE cannot add new item until it is ordered
26The solution for these anomalies 2 NF
Order
delete last order for item, but item remains
Order
Item
Qty
27
873
2
28
402
1
28
873
10
30
495
50
Item
add new item at any time
Item
Desc
873
nut
change item description in one place only
402
bolt
495
washer
27Dependencies within the primary key
- If a relation has a composite key, it must be
checked for dependencies within the key to
determine whether they should be retained e.g. - DEPT (Dept, Dept-name, (Emp, Emp-name))
- DEPT (Dept, Dept-name), DEPT-EMP (Dept, Emp,
Emp-name) - But DEPT-EMP (Dept, Emp, Emp-name)
- Remove Dept from the key
- EMP (Emp ,Emp-name, Dept)
28Third normal form (3 NF)
- A relation is in 3 NF if it is in 2 NF and no
transitive dependencies exist - A transitive dependency is a functional
dependency between two or more non-key attributes - Remove any transitive dependencies to convert a 2
NF relation to 3 NF
29Third normal form
- Remove Transitive Dependencies
- a non-key attribute cannot be identified
- by another non-key attribute
Employee (Emp, Ename, Dept, Dname) Employee
(Emp, Ename, Dept) Department( Dept, Dname)
(look for foreign keys and their attributes)
30Transitive Dependency Anomalies
Employee
Emp
Emp-name
Dept
Dname
10
Smith
D5
MIS
20
Jones
D7
Finance
25
Smith
D7
Finance
30
Black
D8
Sales
- UPDATE change dept name in many places
- DELETE data for dept lost when last employee
for that dept is deleted - CREATE cannot add new dept until an employee is
allocated to it
31The solution for these anomalies 3 NF
delete last emp in dept, but dept remains
Employee
Emp
Ename
Dept
10
Smith
D5
20
Jones
D7
25
Smith
D7
30
Black
D8
Item
add new dept at any time
Dept
Dname
D5
MIS
D7
Finance
change dept name in one place
D8
Sales
32Normalisation to 3 NF
- A relation is normalised if all attributes are
fully - functionally dependent on the primary key
Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
33Transforming ER Diagrams into Relations
- Transforming an ER diagram into normalised
relations, and then merging all the relations
into one final, consolidated set of relations can
be accomplished in four steps -
- 1. Represent entities as relations
- 2. Represent relationships as relations
- 3. Normalise each relation
- 4. Merge the relations
34Representing Entities as Relations
- Each entity in the ER model is transformed into a
relation e.g. - becomes
CUSTOMER
Customer (Customer-no, Name, Address, City,
State, Postcode, Discount)
35Representing Relationships as Relations
- 1. Binary Relationships (1N, 11)
places
CUSTOMER
ORDER
Customer (Customer-no, Name, Address, City,
State, Postcode, Discount) Order (Order-no,
Order-date, Promised-date, Customer-no)
36Representing Relationships as Relations
- 1. Binary Relationships (1N, 11)
- For 1M, add the primary key of the entity on
the one side of the relationship as a foreign
key in the relation that is on the many side - For 11 relationship involving entities A and
B, choose from - add the primary key of A as a foreign key of B
- add the primary key of B as a foreign key of A
- both of the above
37Representing Relationships as Relations
- 2. Binary and Higher Degree Relationships
(MN)
PRODUCT
ORDER
requests
Where we wish to know the quantity of each
product on each order, i.e. this attribute is an
attribute of the relationship requests/requested
by
Order (Order-no, Order-date, Promised-date) Order
Line (Order-no, Product-no, Quantity-ordered) Prod
uct (Product-no, description, (other attributes))
38Representing Relationships as Relations
- 2. Binary and Higher Degree Relationships
(MN) - For MN, first create a relation for each for
each of the entity types, and then create a
relation for the relationship, with a composite
primary key formed from the primary keys of the
participating entity types.
39Representing Relationships as Relations
manages
Has component
EMPLOYEE
ITEM
Reports to
Is a component of
(1N)
(MN)
Item (Item-no, Name, Cost) Item-Bill (Item-no,
Component-no, Quantity)
Employee (Emp-id, Name, Birthdate,
Manager-id)
40Representing Relationships as Relations
- 4. IS-A Relationship (Class-Subclass or
generalisation)
PROPERTY
BEACH PROPERTY
MOUNTAIN PROPERTY
Property (Street-address, City-state-postcode,
No-rooms, Typical-rent) Beach (Street-address,
City-state-postcode , Distance-to-beach) Mountain
(Street-address, City-state-postcode , Skiing)
41Normalisation of relations merging relations
- During the normalisation process two or more
relations with the same primary key may appear - The set of 3NF relations must not contain any
duplicate data - Relations with the same primary key should be
merged
42Normalisation of relations
- Synonyms
- Two or more attributes may have different names
but the same meaning - Either adopt one of the names as a standard or
choose a third name - STUDENT1 (Student-id, Name, Phone-no)
- STUDENT2 (VCE-no, Name, Address)
- STUDENT (Student-id, Name, Address, Phone-no)
43Normalisation of relations
- Homonyms
- Two or more attributes may have the same name but
different meanings - To resolve the conflict, new attribute names need
to be created - STUDENT1 (Student-id, Name, Address)
- STUDENT2 (Student-id , Name, Phone-no, Address)
- STUDENT (Student-id, Name, Phone-no,
Campus-address, Permanent-address )
44Normalisation of relations
- When two 3NF relations are merged, transitive
dependencies may result e.g. - STUDENT1 (Student-id, Major)
- STUDENT2 (Student-id , Advisor)
- STUDENT (Student-id, Major, Advisor)
- BUT MAJOR ADVISOR
- STUDENT1 (Student-id, Major)
- MAJOR (Major , Advisor)
45Data Structure Diagrams
- A set of 3 NF relations may be converted to a
simple diagrammatic form to begin physical
database design - The conversion is simple
- 1. Draw a named rectangle for each relation
- 2. Draw a relationship line between rectangles
linked by foreign keys with a many
cardinality at the foreign key end of the
relationship
46Data Structure Diagram example
- CUSTOMER (Cust, Cname,Phone number)
- SALES ORDER (Sord, Sord-date, Cust)
- SALES ORDER-ITEM (Sord, Item, Qty)
- ITEM (Item, Item-desc)
CUSTOMER
ITEM
SALES ORDER
SALES ORDER LINE
47Data Structure Diagram example
- Eliminate Redundant Relationships
TOUR
(Tourcode, ...)
DEPARTURE
(Tourcode, depdate, ...)
redundant
(Booking, ... , Tourcode, depdate)
BOOKING
48Relational Database Design
Logical Process Model Designer Added
Procedures Frequency Response
Req. Constraints
Logical Data Model Designer Added Data Volume
Sizing Info Constraints
Logical Database Design Logical Transaction
Model
Target Platform Specific Design
49Database Design Objectives
Structure (Stability / Adaptability) Preserve
correspondence to normalised data model Preserve
data integrity
Performance (Speed and Resource
Utilisation) Transaction response
requirements Minimise secondary storage
requirements Minimise transfers between primary
and secondary storage
50Database Design - Logical Design
- Logical Design
- Designer added entity types and attribute types
- Review primary and foreign keys
- Review derived and aggregate data
- Define table and column structure
- Establish row and column ratios
- Define transaction access maps
- Review transaction access maps considering
performance - Develop composite load map
- Evaluate design in terms of corporate standards,
response requirements and resource utilisation.
51Database Design - Physical Design
- Selection of table access paths including indexes
and file organisation. - Selection of data types and sizes
- Selection of other DBMS dependent variables
- page size,
- free space management option,
- data and index clustering,
- compression,
- concurrency control issues,
- database sizing - fill for tables,
- buffer sizes,
- access control constraints.
52Designer Added Data
- It is often necessary to extend the logical data
model with extra entity types, attribute types
and / or relationship types to reflect design and
implementation decisions. - Additional data may be required to support
- archiving
- back-up
- access control
- restart and recovery
- audit and integrity requirements
- system housekeeping functions and the storing of
historical information
53Example Designer Added Data
- Adding entities for the purpose of archiving
Has Assigned
Project
Employee
Archived Project
Archived Employee
Last Used Empno
54Convert Logical Data Model
- Relations become tables.
- Attributes become columns
- a number of decisions must be made when mapping
attributes to columns. - Domains must be defined and linked to the
appropriate table columns. (if domains are
supported by the DBMS) -
55Primary Keys
- Some guidelines regarding primary keys
- Primary keys identified in the logical data model
may not always be suitable for implementation. - The length of each attribute forming the primary
key should be as short as possible. Avoid long
character fields. - Keep the use of composite keys to a minimum.
- Surrogate keys may be used to reduce these
problems. Sometimes these keys may be hidden
from the users. - Compact unique identifiers are particularly
suitable as primary and therefore foreign keys. - No component of a primary key can accept null
values, and the uniqueness property must be
enforced.
56Transaction Analysis
- A technique to analyse the data access loads
generated by each transaction type. - This allows alternative approaches and their
impact on data access loads to be considered. - The loads generated by each transaction type may
be aggregated to calculate the total access
loads on the database, and this information is
useful in identifying heavily loaded structures
of the database.
57Statistics Must be Representative
- To properly evaluate a design it may be necessary
to calculate several sets of performance
statistics. - Transaction loads may vary widely from peak to
average. - Database size may vary widely during application
execution particularly if cumulative data is
held. - It is much safer to perform design calculations
using peak loads, rather than 'average' loads. - Planned equipment utilisation should never exceed
70.
58Transaction Analysis Technique
- Define the cardinality of each table, and the
ratios between table rows. - Develop transaction access maps for each
transaction type showing the sequence in which
each table is accessed and the basis of each
access. - Use table cardinality and ratio information to
determine the number of rows of each table that
each transaction type will access.
59Transaction Analysis Technique
- Aggregate these details to produce total access
loads for each table in the database. - The individual loads for each transaction type
and the composite load map should be carefully
examined for opportunities to improve the
efficiency of the system. - Great care needs to be taken to balance
flexibility against performance, when considering
amendments to transaction access paths.
60Transaction Analysis Technique
- The following data model fragment will be used
for examples. The numbers relate to - table cardinality (eg Order table 900)
- table ratios (eg Order to Orderline ratio 110)
61Table Cardinality Row Ratios
- Item (Item, Description, Price,
Quantity-on-hand) - 1000 records with 5 growth pa
- Order-line (Order, Item, Qty-ordered,
Qty-shipped, Unit-price) - 9000 records with 7 growth pa,
- Ratios ( 1 Item 9 Order-lines), (1 Order 10
Order-lines) - Order ( Order, Ord-date, etc)
- 900 records with 7 growth pa,
- Next Order (Order)
- 1 record with no growth
- Each Item has on average 9 Order-lines
- Each Order has on average 10 Order-lines
62Transaction Access Path Maps
- Developed from action diagrams for elementary
processes. For each transaction, or at least the
most critical, it is necessary to identify - Each table access,
- The sequence of the access within the
transaction, - The type of access C. R. U. D.(create, read,
update, delete) - The columns and predicate of the search condition
for the CRUD access - Whether the predicate values come from other
tables accessed by the same transaction, - The columns required as a result of this access,
- The number of accesses per transaction,
- The number of accesses per period.
63Example Transaction Access Path Map
3
1 2
4 5
6
64Example Transaction Access Path Map
(1000)
Item
1 2
Assumption 70 of enquiries are by Item, 30
by description (Desc)
65Example Transaction Access Path Map
1.
3.
2.
66Ad Hoc Queries
- Complicate the situation, because they are
difficult to quantify and predict in advance. - Some controls on when ad hoc queries are
permitted may be necessary. - Some DBMS allow the cost of a query to be
calculated and the user may be forewarned or
prevented from executing an expensive query. - Data may be extracted from the production
database during quiet times and loaded onto
another database on a different machine to
minimise the impact of ad hoc query.
67Analysis of the Transaction Access Path Map
- The access path for each transaction should be
reviewed for efficiency. - A number of techniques may be used to reduce the
cost of transactions. - Alteration of aggregation time.
- Alteration of column derivation time.
- Data Replication
- Combining and splitting tables
- Introducing codes simplify identification and
compress data
68Developing the Composite Load Map
- Summarise the information from each transaction
access map to show the total number of accesses
by table, access type, table columns used for
access and the access predicate. - This will highlight possible keys and indexes,
row sequencing requirements and the columns used
to join tables. - Ideally the production of the composite load map
should be automated to allow for automatic
re-calculation of totals following changes to
transaction access maps.
69Example Composite Load Map
70Interpreting the Composite Load Map
- The composite load map is designed to highlight
table access types that have a very high
frequency. - One logical input-output operation does not
necessarily result in one physical input-output
operation. Buffering techniques, indexes, and
logging complicate the actual number of physical
I/O operations. - The transactions contributing to these high
levels of access should be closely scrutinised to
determine whether the number of I/O accesses
could be reduced by altering their data access
strategy. - The composite load map also provides a base for
determining where entry point access methods such
as primary and secondary indexes may provide
substantial benefit.
71Interpreting the Composite Load Map
- Where the composite load map shows high join
frequencies between tables, control of table row
placement, or the use of table access methods
such as hashing techniques or indexes may be used
to optimise join operations. - Where a table has a high frequency of sequential
access, physical ordering of table rows, or the
creation of an index to support the access could
be useful. - Tables with a small number of rows could be held
in main storage, if this option is supported by
the DBMS. - The actual access path taken will depend on the
DBMS query optimiser.
72Compromising the Logical Model
- Given the limitations of some DBMS it may be
necessary to compromise the logical data model
for performance reasons. - In general this should be hidden from application
logic by the use of special access routines that
are responsible for mapping the normalised
application view to the de-normalised storage
view. - Some DBMS specific design techniques may allow a
normalised application view to be maintained.
73Transaction Design
- Some transactions may implement designer added
procedures that are part of the designed business
system rather than part of the process model
developed during analysis. - The starting point for transaction design for
process implementing procedures is the logic
defined via an action diagram for the elementary
process(es) which the transaction (procedure) is
to implement. - If action diagrams have not been developed during
analysis then these should be developed in close
association with the data model as part of
design.
74Physical Issues in Transaction Design
- Operational, audit and access control issues.
- Impact of designer added data on procedure logic.
- Concurrency and locking issues.
- Failure, fallback, restart and recovery issues.
- Standardisation of transaction resource
requirements. - Frequency of use and transaction response
requirements. - The use of derived and redundant data to minimise
access to secondary storage. - The most appropriate time to derive aggregate
data.