Data Design

About This Presentation

Title:

Data Design

Description:

DELETE data for last item lost when last order for that item is deleted ... a relation has a composite key, it must be checked ... Develop composite load map ... – PowerPoint PPT presentation

Number of Views:106

Avg rating:3.0/5.0

Slides: 75

Provided by: andrewb77

more less

Transcript and Presenter's Notes

Title: Data Design

1
IMS2805 - Systems Design and Implementation

Lecture 6
Data Design

2
References

HOFFER, J.A., GEORGE, J.F. and VALACICH (2002)
3rd ed., Modern Systems Analysis and Design,
Prentice-Hall, New Jersey, Chap 12
HOFFER, J.A., GEORGE, J.F. and VALACICH (2005)
4th ed., Modern Systems Analysis and Design,
Prentice-Hall, New Jersey, Chap 10
WHITTEN, J.L., BENTLEY, L.D. and DITTMAN, K.C.
(2001) 5th ed., Systems Analysis and Design
Methods, Irwin/McGraw-Hill, New York, NY.
Chapter 12

3
Data Design

Logical data design
Logical data modelling
Physical file and database design

4
Overview of logical data modelling

Logical data modelling involves
collecting detailed attributes for each entity
and relationship identified
converting ER models to relations
normalising the relations
merging relations from each user viewpoint
converting the normalised and merged relations to
create a data structure diagram

5
Data Design

The data structures developed in the logical
system design phase need to be converted to
physical data structures for the physical design
stage.
A knowledge of the relevant data base management
system (DBMS) is important for this phase.
The aim is to develop a compromise between the
ideal (most flexible) structure developed
during normalisation and the most efficient
(fastest, cheapest) structure preferred for
implementation.

6
Physical Data Design

Information required
Normalised relations, including volume estimates
Definitions of each attribute
Descriptions of where and when data are used
(CRUD)
Expectations/requirements for response time and
data integrity
Descriptions of technologies to be used for
implementing the files and databases

7
Physical Data Design

Key decisions
The storage format (data type) for each
attribute, e.g. length,coding scheme, decimal
places, range of values
Grouping of attributes into physical records
(data structure)
Arranging records in secondary memory for
storage, retrieval, update (file organisation)
Selection of media and structures for efficient
access
DBMS technologies require selection of options
for retrieval and for implementation of
relationships between entities

8
The relational database model

The relational database model represents data in
the form of tables or relations
Important concepts are
relation
primary key
foreign key
functional dependency
normalisation

9
Normalisation

Normalisation is a process for converting complex
data structures into simple, stable data
structures in the form of relations.
Data models consisting of normalised relations
are robust and stable
have minimum redundancy
are flexible
are technology-independent (logical)

10
Normalisation

Normalisation ensures that each attribute is
attached to the appropriate relation
i.e. each attribute is contained in the relation
which represents the real world system object or
concept that the attribute describes or is a
property of
e.g. the attribute Student-name should be in the
relation STUDENT which represents the real world
object student of interest to a student records
system

11
Normalisation

Normalisation was originally developed as part of
relational database theory by E.F. Codd (1970)
Normalisation is accomplished in stages, each of
which corresponds to a normal form
Originally, Codd defined first, second and third
normal forms. Third normal form is adequate for
most business applications.
Later extensions include Boyce-Codd, 4th, 5th and
domain-key normal forms.

12
Relation

A relation is a named, two-dimensional table of
data
Each relation consists of a set of named columns
and an arbitrary number of rows
Each column corresponds to an attribute of the
relation
Each row corresponds to an instance (or record)
that contains values for that instance

13
Example relation

A relation generally corresponds to some real
world object or concept of interest to the system
(similar to an entity) e.g.

Employee
Emp
Name
Salary
Dept
1247
Adams
24000
Finance
1982
Smith
27000
MIS
9314
Jones
33000
Finance
Employee (Emp, Name, Salary, Dept)
14
Properties of relations

Relational tables are tables in which
data values are atomic (single-valued)
data values in columns are from the same domain
each row in the relation is unique
the sequence of columns is insignificant
the sequence of rows is insignificant

15
Primary key

An attribute or group of attributes which
uniquely identifies a row of a relation
Entity integrity (relational data base theory)
requires that each relation has a non-null
primary key
Where several possible keys are identified, they
are known as candidate keys - choose one to be
the primary key

E.g.
Employee (Emp, Name, Salary, Dept) Order-item
(Order, Item, Qty-ordered)
16
Foreign key

A foreign key is an attribute in one relation
that is also a primary key in another relation
The referential integrity constraint (relational
database theory) specifies that if an attribute
value exists in one relation then it must also
exist in a linked relation
A foreign key must satisfy referential integrity

17
Foreign key example

In the example below, if a given Dept exists in
an Employee relation then that Dept must exist
in the Department relation

Employee (Emp, Name, Salary, Dept)
foreign key
Department (Dept, Dname, Budget)
18
Functional dependency

A functional dependency is a particular
relationship between attributes in a relation
For any relation R, attribute B is functionally
dependent on attribute A if each value of A has
only ONE value of B associated with it,
i.e. if for every valid instance of A, that
value of A uniquely determines the value of B

A identifies B A B Emp
Emp-name Emp Salary
19
Steps in normalisation

Normalisation to third normal form is
accomplished in 3 steps each corresponding to a
basic normal form
A normal form is a state of a relation that can
be determined by applying simple rules concerning
dependencies within that relation
Each step of the normalisation process is applied
to a single relation in sequence so that the
relation is converted to third normal form

20
Steps in normalisation
Unnormalised table
Remove repeating groups
First Normal Form
Remove partial dependencies
Second Normal Form
Remove transitive dependencies
Third Normal Form
21
Well Structured Relations

A well structured relation contains a minimum
amount of redundancy and allows users to insert,
modify, and delete rows in a table without errors
or inconsistencies (known as anomalies)
Three types of anomaly are possible
insertion
deletion
Modification
Third normal form relations are considered to be
well structured relations

22
First normal form (1 NF)

A relation is in first normal form if it contains
no repeating data
the value of the data at the intersection of
each row and column must be single-valued
Remove any repeating groups of attributes to
convert a relation to 1 NF (key of the removed
group is a composite key)

Order (Order, Customer, (Item, Desc,
Qty)) Order-Item (Order, Item, Desc, Qty)
Order (Order, Customer)
23
Second normal form (2 NF)

A relation is in 2 NF if it is in 1 NF and every
non-ket attribute is fully functionally dependent
on the primary key
A partial dependency exists if one or more
non-key attributes are dependent on only part of
a composite primary key
Remove any partial dependencies to convert a 1 NF
relation to 2 NF
If the primary key consists of only one attribute
or there are no non-key attributes, then a 1 NF
relation is automatically in 2 NF

24
Second normal form (2 NF)

Remove Partial Dependencies
a non-key attribute cannot be identified by part
of a composite key

Order (Order, Item, Desc, Qty-ordered)
Order-Item (Order, Item, Qty-ordered)
Item (Item, Desc)

25
Partial dependency anomalies
Order-Item
Order
Item
Qty
Item-desc
2
27
873
nut
1
28
402
bolt
10
28
873
nut
50
30
495
washer

UPDATE change item -desc in many places
DELETE data for last item lost when last order
for that item is deleted
CREATE cannot add new item until it is ordered

26
The solution for these anomalies 2 NF
Order
delete last order for item, but item remains
Order
Item
Qty
27
873
2
28
402
1
28
873
10
30
495
50
Item
add new item at any time
Item
Desc
873
nut
change item description in one place only
402
bolt
495
washer
27
Dependencies within the primary key

If a relation has a composite key, it must be
checked for dependencies within the key to
determine whether they should be retained e.g.
DEPT (Dept, Dept-name, (Emp, Emp-name))
DEPT (Dept, Dept-name), DEPT-EMP (Dept, Emp,
Emp-name)
But DEPT-EMP (Dept, Emp, Emp-name)
Remove Dept from the key
EMP (Emp ,Emp-name, Dept)

28
Third normal form (3 NF)

A relation is in 3 NF if it is in 2 NF and no
transitive dependencies exist
A transitive dependency is a functional
dependency between two or more non-key attributes
Remove any transitive dependencies to convert a 2
NF relation to 3 NF

29
Third normal form

Remove Transitive Dependencies
a non-key attribute cannot be identified
by another non-key attribute

Employee (Emp, Ename, Dept, Dname) Employee
(Emp, Ename, Dept) Department( Dept, Dname)
(look for foreign keys and their attributes)
30
Transitive Dependency Anomalies
Employee
Emp
Emp-name
Dept
Dname
10
Smith
D5
MIS
20
Jones
D7
Finance
25
Smith
D7
Finance
30
Black
D8
Sales

UPDATE change dept name in many places
DELETE data for dept lost when last employee
for that dept is deleted
CREATE cannot add new dept until an employee is
allocated to it

31
The solution for these anomalies 3 NF
delete last emp in dept, but dept remains
Employee
Emp
Ename
Dept
10
Smith
D5
20
Jones
D7
25
Smith
D7
30
Black
D8
Item
add new dept at any time
Dept
Dname
D5
MIS
D7
Finance
change dept name in one place
D8
Sales
32
Normalisation to 3 NF

A relation is normalised if all attributes are
fully
functionally dependent on the primary key

Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
33
Transforming ER Diagrams into Relations

Transforming an ER diagram into normalised
relations, and then merging all the relations
into one final, consolidated set of relations can
be accomplished in four steps
1. Represent entities as relations
2. Represent relationships as relations
3. Normalise each relation
4. Merge the relations

34
Representing Entities as Relations

Each entity in the ER model is transformed into a
relation e.g.
becomes

CUSTOMER
Customer (Customer-no, Name, Address, City,
State, Postcode, Discount)
35
Representing Relationships as Relations

1. Binary Relationships (1N, 11)

places
CUSTOMER
ORDER
Customer (Customer-no, Name, Address, City,
State, Postcode, Discount) Order (Order-no,
Order-date, Promised-date, Customer-no)
36
Representing Relationships as Relations

1. Binary Relationships (1N, 11)
For 1M, add the primary key of the entity on
the one side of the relationship as a foreign
key in the relation that is on the many side
For 11 relationship involving entities A and
B, choose from
add the primary key of A as a foreign key of B
add the primary key of B as a foreign key of A
both of the above

37
Representing Relationships as Relations

2. Binary and Higher Degree Relationships
(MN)

PRODUCT
ORDER
requests
Where we wish to know the quantity of each
product on each order, i.e. this attribute is an
attribute of the relationship requests/requested
by
Order (Order-no, Order-date, Promised-date) Order
Line (Order-no, Product-no, Quantity-ordered) Prod
uct (Product-no, description, (other attributes))
38
Representing Relationships as Relations

2. Binary and Higher Degree Relationships
(MN)
For MN, first create a relation for each for
each of the entity types, and then create a
relation for the relationship, with a composite
primary key formed from the primary keys of the
participating entity types.

39
Representing Relationships as Relations

3. Unary Relationships

manages
Has component
EMPLOYEE
ITEM
Reports to
Is a component of
(1N)
(MN)
Item (Item-no, Name, Cost) Item-Bill (Item-no,
Component-no, Quantity)
Employee (Emp-id, Name, Birthdate,
Manager-id)
40
Representing Relationships as Relations

4. IS-A Relationship (Class-Subclass or
generalisation)

PROPERTY
BEACH PROPERTY
MOUNTAIN PROPERTY
Property (Street-address, City-state-postcode,
No-rooms, Typical-rent) Beach (Street-address,
City-state-postcode , Distance-to-beach) Mountain
(Street-address, City-state-postcode , Skiing)
41
Normalisation of relations merging relations

During the normalisation process two or more
relations with the same primary key may appear
The set of 3NF relations must not contain any
duplicate data
Relations with the same primary key should be
merged

42
Normalisation of relations

Synonyms
Two or more attributes may have different names
but the same meaning
Either adopt one of the names as a standard or
choose a third name
STUDENT1 (Student-id, Name, Phone-no)
STUDENT2 (VCE-no, Name, Address)
STUDENT (Student-id, Name, Address, Phone-no)

43
Normalisation of relations

Homonyms
Two or more attributes may have the same name but
different meanings
To resolve the conflict, new attribute names need
to be created
STUDENT1 (Student-id, Name, Address)
STUDENT2 (Student-id , Name, Phone-no, Address)
STUDENT (Student-id, Name, Phone-no,
Campus-address, Permanent-address )

44
Normalisation of relations

Transitive dependencies

When two 3NF relations are merged, transitive
dependencies may result e.g.
STUDENT1 (Student-id, Major)
STUDENT2 (Student-id , Advisor)
STUDENT (Student-id, Major, Advisor)
BUT MAJOR ADVISOR
STUDENT1 (Student-id, Major)
MAJOR (Major , Advisor)

45
Data Structure Diagrams

A set of 3 NF relations may be converted to a
simple diagrammatic form to begin physical
database design
The conversion is simple
1. Draw a named rectangle for each relation
2. Draw a relationship line between rectangles
linked by foreign keys with a many
cardinality at the foreign key end of the
relationship

46
Data Structure Diagram example

CUSTOMER (Cust, Cname,Phone number)
SALES ORDER (Sord, Sord-date, Cust)
SALES ORDER-ITEM (Sord, Item, Qty)
ITEM (Item, Item-desc)

CUSTOMER
ITEM
SALES ORDER
SALES ORDER LINE
47
Data Structure Diagram example

Eliminate Redundant Relationships

TOUR
(Tourcode, ...)
DEPARTURE
(Tourcode, depdate, ...)
redundant
(Booking, ... , Tourcode, depdate)
BOOKING
48
Relational Database Design
Logical Process Model Designer Added
Procedures Frequency Response
Req. Constraints
Logical Data Model Designer Added Data Volume
Sizing Info Constraints
Logical Database Design Logical Transaction
Model
Target Platform Specific Design
49
Database Design Objectives
Structure (Stability / Adaptability) Preserve
correspondence to normalised data model Preserve
data integrity
Performance (Speed and Resource
Utilisation) Transaction response
requirements Minimise secondary storage
requirements Minimise transfers between primary
and secondary storage
50
Database Design - Logical Design

Logical Design
Designer added entity types and attribute types
Review primary and foreign keys
Review derived and aggregate data
Define table and column structure
Establish row and column ratios
Define transaction access maps
Review transaction access maps considering
performance
Develop composite load map
Evaluate design in terms of corporate standards,
response requirements and resource utilisation.

51
Database Design - Physical Design

Selection of table access paths including indexes
and file organisation.
Selection of data types and sizes
Selection of other DBMS dependent variables
page size,
free space management option,
data and index clustering,
compression,
concurrency control issues,
database sizing - fill for tables,
buffer sizes,
access control constraints.

52
Designer Added Data

It is often necessary to extend the logical data
model with extra entity types, attribute types
and / or relationship types to reflect design and
implementation decisions.
Additional data may be required to support
archiving
back-up
access control
restart and recovery
audit and integrity requirements
system housekeeping functions and the storing of
historical information

53
Example Designer Added Data

Adding entities for the purpose of archiving

Has Assigned
Project
Employee
Archived Project
Archived Employee
Last Used Empno
54
Convert Logical Data Model

Relations become tables.
Attributes become columns
a number of decisions must be made when mapping
attributes to columns.
Domains must be defined and linked to the
appropriate table columns. (if domains are
supported by the DBMS)

55
Primary Keys

Some guidelines regarding primary keys
Primary keys identified in the logical data model
may not always be suitable for implementation.
The length of each attribute forming the primary
key should be as short as possible. Avoid long
character fields.
Keep the use of composite keys to a minimum.
Surrogate keys may be used to reduce these
problems. Sometimes these keys may be hidden
from the users.
Compact unique identifiers are particularly
suitable as primary and therefore foreign keys.
No component of a primary key can accept null
values, and the uniqueness property must be
enforced.

56
Transaction Analysis

A technique to analyse the data access loads
generated by each transaction type.
This allows alternative approaches and their
impact on data access loads to be considered.
The loads generated by each transaction type may
be aggregated to calculate the total access
loads on the database, and this information is
useful in identifying heavily loaded structures
of the database.

57
Statistics Must be Representative

To properly evaluate a design it may be necessary
to calculate several sets of performance
statistics.
Transaction loads may vary widely from peak to
average.
Database size may vary widely during application
execution particularly if cumulative data is
held.
It is much safer to perform design calculations
using peak loads, rather than 'average' loads.
Planned equipment utilisation should never exceed
70.

58
Transaction Analysis Technique

Define the cardinality of each table, and the
ratios between table rows.
Develop transaction access maps for each
transaction type showing the sequence in which
each table is accessed and the basis of each
access.
Use table cardinality and ratio information to
determine the number of rows of each table that
each transaction type will access.

59
Transaction Analysis Technique

Aggregate these details to produce total access
loads for each table in the database.
The individual loads for each transaction type
and the composite load map should be carefully
examined for opportunities to improve the
efficiency of the system.
Great care needs to be taken to balance
flexibility against performance, when considering
amendments to transaction access paths.

60
Transaction Analysis Technique

The following data model fragment will be used
for examples. The numbers relate to
table cardinality (eg Order table 900)
table ratios (eg Order to Orderline ratio 110)

61
Table Cardinality Row Ratios

Item (Item, Description, Price,
Quantity-on-hand)
1000 records with 5 growth pa
Order-line (Order, Item, Qty-ordered,
Qty-shipped, Unit-price)
9000 records with 7 growth pa,
Ratios ( 1 Item 9 Order-lines), (1 Order 10
Order-lines)
Order ( Order, Ord-date, etc)
900 records with 7 growth pa,
Next Order (Order)
1 record with no growth
Each Item has on average 9 Order-lines
Each Order has on average 10 Order-lines

62
Transaction Access Path Maps

Developed from action diagrams for elementary
processes. For each transaction, or at least the
most critical, it is necessary to identify
Each table access,
The sequence of the access within the
transaction,
The type of access C. R. U. D.(create, read,
update, delete)
The columns and predicate of the search condition
for the CRUD access
Whether the predicate values come from other
tables accessed by the same transaction,
The columns required as a result of this access,
The number of accesses per transaction,
The number of accesses per period.

63
Example Transaction Access Path Map
3
1 2
4 5
6
64
Example Transaction Access Path Map
(1000)
Item
1 2
Assumption 70 of enquiries are by Item, 30
by description (Desc)
65
Example Transaction Access Path Map
1.
3.
2.
66
Ad Hoc Queries

Complicate the situation, because they are
difficult to quantify and predict in advance.
Some controls on when ad hoc queries are
permitted may be necessary.
Some DBMS allow the cost of a query to be
calculated and the user may be forewarned or
prevented from executing an expensive query.
Data may be extracted from the production
database during quiet times and loaded onto
another database on a different machine to
minimise the impact of ad hoc query.

67
Analysis of the Transaction Access Path Map

The access path for each transaction should be
reviewed for efficiency.
A number of techniques may be used to reduce the
cost of transactions.
Alteration of aggregation time.
Alteration of column derivation time.
Data Replication
Combining and splitting tables
Introducing codes simplify identification and
compress data

68
Developing the Composite Load Map

Summarise the information from each transaction
access map to show the total number of accesses
by table, access type, table columns used for
access and the access predicate.
This will highlight possible keys and indexes,
row sequencing requirements and the columns used
to join tables.
Ideally the production of the composite load map
should be automated to allow for automatic
re-calculation of totals following changes to
transaction access maps.

69
Example Composite Load Map
70
Interpreting the Composite Load Map

The composite load map is designed to highlight
table access types that have a very high
frequency.
One logical input-output operation does not
necessarily result in one physical input-output
operation. Buffering techniques, indexes, and
logging complicate the actual number of physical
I/O operations.
The transactions contributing to these high
levels of access should be closely scrutinised to
determine whether the number of I/O accesses
could be reduced by altering their data access
strategy.
The composite load map also provides a base for
determining where entry point access methods such
as primary and secondary indexes may provide
substantial benefit.

71
Interpreting the Composite Load Map

Where the composite load map shows high join
frequencies between tables, control of table row
placement, or the use of table access methods
such as hashing techniques or indexes may be used
to optimise join operations.
Where a table has a high frequency of sequential
access, physical ordering of table rows, or the
creation of an index to support the access could
be useful.
Tables with a small number of rows could be held
in main storage, if this option is supported by
the DBMS.
The actual access path taken will depend on the
DBMS query optimiser.

72
Compromising the Logical Model

Given the limitations of some DBMS it may be
necessary to compromise the logical data model
for performance reasons.
In general this should be hidden from application
logic by the use of special access routines that
are responsible for mapping the normalised
application view to the de-normalised storage
view.
Some DBMS specific design techniques may allow a
normalised application view to be maintained.

73
Transaction Design

Some transactions may implement designer added
procedures that are part of the designed business
system rather than part of the process model
developed during analysis.
The starting point for transaction design for
process implementing procedures is the logic
defined via an action diagram for the elementary
process(es) which the transaction (procedure) is
to implement.
If action diagrams have not been developed during
analysis then these should be developed in close
association with the data model as part of
design.

74
Physical Issues in Transaction Design