Data Design

1 / 74
About This Presentation
Title:

Data Design

Description:

DELETE data for last item lost when last order for that item is deleted ... a relation has a composite key, it must be checked ... Develop composite load map ... – PowerPoint PPT presentation

Number of Views:106
Avg rating:3.0/5.0
Slides: 75
Provided by: andrewb77

less

Transcript and Presenter's Notes

Title: Data Design


1
IMS2805 - Systems Design and Implementation
  • Lecture 6
  • Data Design

2
References
  • HOFFER, J.A., GEORGE, J.F. and VALACICH (2002)
    3rd ed., Modern Systems Analysis and Design,
    Prentice-Hall, New Jersey, Chap 12
  • HOFFER, J.A., GEORGE, J.F. and VALACICH (2005)
    4th ed., Modern Systems Analysis and Design,
    Prentice-Hall, New Jersey, Chap 10
  • WHITTEN, J.L., BENTLEY, L.D. and DITTMAN, K.C.
    (2001) 5th ed., Systems Analysis and Design
    Methods, Irwin/McGraw-Hill, New York, NY.
    Chapter 12

3
Data Design
  • Logical data design
  • Logical data modelling
  • Physical file and database design

4
Overview of logical data modelling
  • Logical data modelling involves
  • collecting detailed attributes for each entity
    and relationship identified
  • converting ER models to relations
  • normalising the relations
  • merging relations from each user viewpoint
  • converting the normalised and merged relations to
    create a data structure diagram

5
Data Design
  • The data structures developed in the logical
    system design phase need to be converted to
    physical data structures for the physical design
    stage.
  • A knowledge of the relevant data base management
    system (DBMS) is important for this phase.
  • The aim is to develop a compromise between the
    ideal (most flexible) structure developed
    during normalisation and the most efficient
    (fastest, cheapest) structure preferred for
    implementation.

6
Physical Data Design
  • Information required
  • Normalised relations, including volume estimates
  • Definitions of each attribute
  • Descriptions of where and when data are used
    (CRUD)
  • Expectations/requirements for response time and
    data integrity
  • Descriptions of technologies to be used for
    implementing the files and databases

7
Physical Data Design
  • Key decisions
  • The storage format (data type) for each
    attribute, e.g. length,coding scheme, decimal
    places, range of values
  • Grouping of attributes into physical records
    (data structure)
  • Arranging records in secondary memory for
    storage, retrieval, update (file organisation)
  • Selection of media and structures for efficient
    access
  • DBMS technologies require selection of options
    for retrieval and for implementation of
    relationships between entities

8
The relational database model
  • The relational database model represents data in
    the form of tables or relations
  • Important concepts are
  • relation
  • primary key
  • foreign key
  • functional dependency
  • normalisation

9
Normalisation
  • Normalisation is a process for converting complex
    data structures into simple, stable data
    structures in the form of relations.
  • Data models consisting of normalised relations
  • are robust and stable
  • have minimum redundancy
  • are flexible
  • are technology-independent (logical)

10
Normalisation
  • Normalisation ensures that each attribute is
    attached to the appropriate relation
  • i.e. each attribute is contained in the relation
    which represents the real world system object or
    concept that the attribute describes or is a
    property of
  • e.g. the attribute Student-name should be in the
    relation STUDENT which represents the real world
    object student of interest to a student records
    system

11
Normalisation
  • Normalisation was originally developed as part of
    relational database theory by E.F. Codd (1970)
  • Normalisation is accomplished in stages, each of
    which corresponds to a normal form
  • Originally, Codd defined first, second and third
    normal forms. Third normal form is adequate for
    most business applications.
  • Later extensions include Boyce-Codd, 4th, 5th and
    domain-key normal forms.

12
Relation
  • A relation is a named, two-dimensional table of
    data
  • Each relation consists of a set of named columns
    and an arbitrary number of rows
  • Each column corresponds to an attribute of the
    relation
  • Each row corresponds to an instance (or record)
    that contains values for that instance

13
Example relation
  • A relation generally corresponds to some real
    world object or concept of interest to the system
    (similar to an entity) e.g.

Employee
Emp
Name
Salary
Dept
1247
Adams
24000
Finance
1982
Smith
27000
MIS
9314
Jones
33000
Finance
Employee (Emp, Name, Salary, Dept)
14
Properties of relations
  • Relational tables are tables in which
  • data values are atomic (single-valued)
  • data values in columns are from the same domain
  • each row in the relation is unique
  • the sequence of columns is insignificant
  • the sequence of rows is insignificant

15
Primary key
  • An attribute or group of attributes which
    uniquely identifies a row of a relation
  • Entity integrity (relational data base theory)
    requires that each relation has a non-null
    primary key
  • Where several possible keys are identified, they
    are known as candidate keys - choose one to be
    the primary key

E.g.
Employee (Emp, Name, Salary, Dept) Order-item
(Order, Item, Qty-ordered)
16
Foreign key
  • A foreign key is an attribute in one relation
    that is also a primary key in another relation
  • The referential integrity constraint (relational
    database theory) specifies that if an attribute
    value exists in one relation then it must also
    exist in a linked relation
  • A foreign key must satisfy referential integrity

17
Foreign key example
  • In the example below, if a given Dept exists in
    an Employee relation then that Dept must exist
    in the Department relation

Employee (Emp, Name, Salary, Dept)
foreign key
Department (Dept, Dname, Budget)
18
Functional dependency
  • A functional dependency is a particular
    relationship between attributes in a relation
  • For any relation R, attribute B is functionally
    dependent on attribute A if each value of A has
    only ONE value of B associated with it,
  • i.e. if for every valid instance of A, that
    value of A uniquely determines the value of B

A identifies B A B Emp
Emp-name Emp Salary
19
Steps in normalisation
  • Normalisation to third normal form is
    accomplished in 3 steps each corresponding to a
    basic normal form
  • A normal form is a state of a relation that can
    be determined by applying simple rules concerning
    dependencies within that relation
  • Each step of the normalisation process is applied
    to a single relation in sequence so that the
    relation is converted to third normal form

20
Steps in normalisation
Unnormalised table
Remove repeating groups
First Normal Form
Remove partial dependencies
Second Normal Form
Remove transitive dependencies
Third Normal Form
21
Well Structured Relations
  • A well structured relation contains a minimum
    amount of redundancy and allows users to insert,
    modify, and delete rows in a table without errors
    or inconsistencies (known as anomalies)
  • Three types of anomaly are possible
  • insertion
  • deletion
  • Modification
  • Third normal form relations are considered to be
    well structured relations

22
First normal form (1 NF)
  • A relation is in first normal form if it contains
    no repeating data
  • the value of the data at the intersection of
    each row and column must be single-valued
  • Remove any repeating groups of attributes to
    convert a relation to 1 NF (key of the removed
    group is a composite key)

Order (Order, Customer, (Item, Desc,
Qty)) Order-Item (Order, Item, Desc, Qty)
Order (Order, Customer)
23
Second normal form (2 NF)
  • A relation is in 2 NF if it is in 1 NF and every
    non-ket attribute is fully functionally dependent
    on the primary key
  • A partial dependency exists if one or more
    non-key attributes are dependent on only part of
    a composite primary key
  • Remove any partial dependencies to convert a 1 NF
  • relation to 2 NF
  • If the primary key consists of only one attribute
    or there are no non-key attributes, then a 1 NF
    relation is automatically in 2 NF

24
Second normal form (2 NF)
  • Remove Partial Dependencies
  • a non-key attribute cannot be identified by part
    of a composite key
  • Order (Order, Item, Desc, Qty-ordered)
  • Order-Item (Order, Item, Qty-ordered)
  • Item (Item, Desc)

25
Partial dependency anomalies
Order-Item
Order
Item
Qty
Item-desc
2
27
873
nut
1
28
402
bolt
10
28
873
nut
50
30
495
washer
  • UPDATE change item -desc in many places
  • DELETE data for last item lost when last order
    for that item is deleted
  • CREATE cannot add new item until it is ordered

26
The solution for these anomalies 2 NF
Order
delete last order for item, but item remains
Order
Item
Qty
27
873
2
28
402
1
28
873
10
30
495
50
Item
add new item at any time
Item
Desc
873
nut
change item description in one place only
402
bolt
495
washer
27
Dependencies within the primary key
  • If a relation has a composite key, it must be
    checked for dependencies within the key to
    determine whether they should be retained e.g.
  • DEPT (Dept, Dept-name, (Emp, Emp-name))
  • DEPT (Dept, Dept-name), DEPT-EMP (Dept, Emp,
    Emp-name)
  • But DEPT-EMP (Dept, Emp, Emp-name)
  • Remove Dept from the key
  • EMP (Emp ,Emp-name, Dept)

28
Third normal form (3 NF)
  • A relation is in 3 NF if it is in 2 NF and no
    transitive dependencies exist
  • A transitive dependency is a functional
    dependency between two or more non-key attributes
  • Remove any transitive dependencies to convert a 2
    NF relation to 3 NF

29
Third normal form
  • Remove Transitive Dependencies
  • a non-key attribute cannot be identified
  • by another non-key attribute

Employee (Emp, Ename, Dept, Dname) Employee
(Emp, Ename, Dept) Department( Dept, Dname)
(look for foreign keys and their attributes)
30
Transitive Dependency Anomalies
Employee
Emp
Emp-name
Dept
Dname
10
Smith
D5
MIS
20
Jones
D7
Finance
25
Smith
D7
Finance
30
Black
D8
Sales
  • UPDATE change dept name in many places
  • DELETE data for dept lost when last employee
    for that dept is deleted
  • CREATE cannot add new dept until an employee is
    allocated to it

31
The solution for these anomalies 3 NF
delete last emp in dept, but dept remains
Employee
Emp
Ename
Dept
10
Smith
D5
20
Jones
D7
25
Smith
D7
30
Black
D8
Item
add new dept at any time
Dept
Dname
D5
MIS
D7
Finance
change dept name in one place
D8
Sales
32
Normalisation to 3 NF
  • A relation is normalised if all attributes are
    fully
  • functionally dependent on the primary key

Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
33
Transforming ER Diagrams into Relations
  • Transforming an ER diagram into normalised
    relations, and then merging all the relations
    into one final, consolidated set of relations can
    be accomplished in four steps
  • 1. Represent entities as relations
  • 2. Represent relationships as relations
  • 3. Normalise each relation
  • 4. Merge the relations

34
Representing Entities as Relations
  • Each entity in the ER model is transformed into a
    relation e.g.
  • becomes

CUSTOMER
Customer (Customer-no, Name, Address, City,
State, Postcode, Discount)
35
Representing Relationships as Relations
  • 1. Binary Relationships (1N, 11)

places
CUSTOMER
ORDER
Customer (Customer-no, Name, Address, City,
State, Postcode, Discount) Order (Order-no,
Order-date, Promised-date, Customer-no)
36
Representing Relationships as Relations
  • 1. Binary Relationships (1N, 11)
  • For 1M, add the primary key of the entity on
    the one side of the relationship as a foreign
    key in the relation that is on the many side
  • For 11 relationship involving entities A and
    B, choose from
  • add the primary key of A as a foreign key of B
  • add the primary key of B as a foreign key of A
  • both of the above

37
Representing Relationships as Relations
  • 2. Binary and Higher Degree Relationships
    (MN)

PRODUCT
ORDER
requests
Where we wish to know the quantity of each
product on each order, i.e. this attribute is an
attribute of the relationship requests/requested
by
Order (Order-no, Order-date, Promised-date) Order
Line (Order-no, Product-no, Quantity-ordered) Prod
uct (Product-no, description, (other attributes))
38
Representing Relationships as Relations
  • 2. Binary and Higher Degree Relationships
    (MN)
  • For MN, first create a relation for each for
    each of the entity types, and then create a
    relation for the relationship, with a composite
    primary key formed from the primary keys of the
    participating entity types.

39
Representing Relationships as Relations
  • 3. Unary Relationships

manages
Has component
EMPLOYEE
ITEM
Reports to
Is a component of
(1N)
(MN)
Item (Item-no, Name, Cost) Item-Bill (Item-no,
Component-no, Quantity)
Employee (Emp-id, Name, Birthdate,
Manager-id)
40
Representing Relationships as Relations
  • 4. IS-A Relationship (Class-Subclass or
    generalisation)

PROPERTY
BEACH PROPERTY
MOUNTAIN PROPERTY
Property (Street-address, City-state-postcode,
No-rooms, Typical-rent) Beach (Street-address,
City-state-postcode , Distance-to-beach) Mountain
(Street-address, City-state-postcode , Skiing)
41
Normalisation of relations merging relations
  • During the normalisation process two or more
    relations with the same primary key may appear
  • The set of 3NF relations must not contain any
    duplicate data
  • Relations with the same primary key should be
    merged

42
Normalisation of relations
  • Synonyms
  • Two or more attributes may have different names
    but the same meaning
  • Either adopt one of the names as a standard or
    choose a third name
  • STUDENT1 (Student-id, Name, Phone-no)
  • STUDENT2 (VCE-no, Name, Address)
  • STUDENT (Student-id, Name, Address, Phone-no)

43
Normalisation of relations
  • Homonyms
  • Two or more attributes may have the same name but
    different meanings
  • To resolve the conflict, new attribute names need
    to be created
  • STUDENT1 (Student-id, Name, Address)
  • STUDENT2 (Student-id , Name, Phone-no, Address)
  • STUDENT (Student-id, Name, Phone-no,
    Campus-address, Permanent-address )

44
Normalisation of relations
  • Transitive dependencies
  • When two 3NF relations are merged, transitive
    dependencies may result e.g.
  • STUDENT1 (Student-id, Major)
  • STUDENT2 (Student-id , Advisor)
  • STUDENT (Student-id, Major, Advisor)
  • BUT MAJOR ADVISOR
  • STUDENT1 (Student-id, Major)
  • MAJOR (Major , Advisor)

45
Data Structure Diagrams
  • A set of 3 NF relations may be converted to a
    simple diagrammatic form to begin physical
    database design
  • The conversion is simple
  • 1. Draw a named rectangle for each relation
  • 2. Draw a relationship line between rectangles
    linked by foreign keys with a many
    cardinality at the foreign key end of the
    relationship

46
Data Structure Diagram example
  • CUSTOMER (Cust, Cname,Phone number)
  • SALES ORDER (Sord, Sord-date, Cust)
  • SALES ORDER-ITEM (Sord, Item, Qty)
  • ITEM (Item, Item-desc)

CUSTOMER
ITEM
SALES ORDER
SALES ORDER LINE
47
Data Structure Diagram example
  • Eliminate Redundant Relationships

TOUR
(Tourcode, ...)
DEPARTURE
(Tourcode, depdate, ...)
redundant
(Booking, ... , Tourcode, depdate)
BOOKING
48
Relational Database Design
Logical Process Model Designer Added
Procedures Frequency Response
Req. Constraints
Logical Data Model Designer Added Data Volume
Sizing Info Constraints
Logical Database Design Logical Transaction
Model
Target Platform Specific Design
49
Database Design Objectives
Structure (Stability / Adaptability) Preserve
correspondence to normalised data model Preserve
data integrity
Performance (Speed and Resource
Utilisation) Transaction response
requirements Minimise secondary storage
requirements Minimise transfers between primary
and secondary storage
50
Database Design - Logical Design
  • Logical Design
  • Designer added entity types and attribute types
  • Review primary and foreign keys
  • Review derived and aggregate data
  • Define table and column structure
  • Establish row and column ratios
  • Define transaction access maps
  • Review transaction access maps considering
    performance
  • Develop composite load map
  • Evaluate design in terms of corporate standards,
    response requirements and resource utilisation.

51
Database Design - Physical Design
  • Selection of table access paths including indexes
    and file organisation.
  • Selection of data types and sizes
  • Selection of other DBMS dependent variables
  • page size,
  • free space management option,
  • data and index clustering,
  • compression,
  • concurrency control issues,
  • database sizing - fill for tables,
  • buffer sizes,
  • access control constraints.

52
Designer Added Data
  • It is often necessary to extend the logical data
    model with extra entity types, attribute types
    and / or relationship types to reflect design and
    implementation decisions.
  • Additional data may be required to support
  • archiving
  • back-up
  • access control
  • restart and recovery
  • audit and integrity requirements
  • system housekeeping functions and the storing of
    historical information

53
Example Designer Added Data
  • Adding entities for the purpose of archiving

Has Assigned
Project
Employee
Archived Project
Archived Employee
Last Used Empno
54
Convert Logical Data Model
  • Relations become tables.
  • Attributes become columns
  • a number of decisions must be made when mapping
    attributes to columns.
  • Domains must be defined and linked to the
    appropriate table columns. (if domains are
    supported by the DBMS)

55
Primary Keys
  • Some guidelines regarding primary keys
  • Primary keys identified in the logical data model
    may not always be suitable for implementation.
  • The length of each attribute forming the primary
    key should be as short as possible. Avoid long
    character fields.
  • Keep the use of composite keys to a minimum.
  • Surrogate keys may be used to reduce these
    problems. Sometimes these keys may be hidden
    from the users.
  • Compact unique identifiers are particularly
    suitable as primary and therefore foreign keys.
  • No component of a primary key can accept null
    values, and the uniqueness property must be
    enforced.

56
Transaction Analysis
  • A technique to analyse the data access loads
    generated by each transaction type.
  • This allows alternative approaches and their
    impact on data access loads to be considered.
  • The loads generated by each transaction type may
    be aggregated to calculate the total access
    loads on the database, and this information is
    useful in identifying heavily loaded structures
    of the database.

57
Statistics Must be Representative
  • To properly evaluate a design it may be necessary
    to calculate several sets of performance
    statistics.
  • Transaction loads may vary widely from peak to
    average.
  • Database size may vary widely during application
    execution particularly if cumulative data is
    held.
  • It is much safer to perform design calculations
    using peak loads, rather than 'average' loads.
  • Planned equipment utilisation should never exceed
    70.

58
Transaction Analysis Technique
  • Define the cardinality of each table, and the
    ratios between table rows.
  • Develop transaction access maps for each
    transaction type showing the sequence in which
    each table is accessed and the basis of each
    access.
  • Use table cardinality and ratio information to
    determine the number of rows of each table that
    each transaction type will access.

59
Transaction Analysis Technique
  • Aggregate these details to produce total access
    loads for each table in the database.
  • The individual loads for each transaction type
    and the composite load map should be carefully
    examined for opportunities to improve the
    efficiency of the system.
  • Great care needs to be taken to balance
    flexibility against performance, when considering
    amendments to transaction access paths.

60
Transaction Analysis Technique
  • The following data model fragment will be used
    for examples. The numbers relate to
  • table cardinality (eg Order table 900)
  • table ratios (eg Order to Orderline ratio 110)

61
Table Cardinality Row Ratios
  • Item (Item, Description, Price,
    Quantity-on-hand)
  • 1000 records with 5 growth pa
  • Order-line (Order, Item, Qty-ordered,
    Qty-shipped, Unit-price)
  • 9000 records with 7 growth pa,
  • Ratios ( 1 Item 9 Order-lines), (1 Order 10
    Order-lines)
  • Order ( Order, Ord-date, etc)
  • 900 records with 7 growth pa,
  • Next Order (Order)
  • 1 record with no growth
  • Each Item has on average 9 Order-lines
  • Each Order has on average 10 Order-lines

62
Transaction Access Path Maps
  • Developed from action diagrams for elementary
    processes. For each transaction, or at least the
    most critical, it is necessary to identify
  • Each table access,
  • The sequence of the access within the
    transaction,
  • The type of access C. R. U. D.(create, read,
    update, delete)
  • The columns and predicate of the search condition
    for the CRUD access
  • Whether the predicate values come from other
    tables accessed by the same transaction,
  • The columns required as a result of this access,
  • The number of accesses per transaction,
  • The number of accesses per period.

63
Example Transaction Access Path Map
3
1 2
4 5
6
64
Example Transaction Access Path Map
(1000)
Item
1 2
Assumption 70 of enquiries are by Item, 30
by description (Desc)
65
Example Transaction Access Path Map
1.
3.
2.
66
Ad Hoc Queries
  • Complicate the situation, because they are
    difficult to quantify and predict in advance.
  • Some controls on when ad hoc queries are
    permitted may be necessary.
  • Some DBMS allow the cost of a query to be
    calculated and the user may be forewarned or
    prevented from executing an expensive query.
  • Data may be extracted from the production
    database during quiet times and loaded onto
    another database on a different machine to
    minimise the impact of ad hoc query.

67
Analysis of the Transaction Access Path Map
  • The access path for each transaction should be
    reviewed for efficiency.
  • A number of techniques may be used to reduce the
    cost of transactions.
  • Alteration of aggregation time.
  • Alteration of column derivation time.
  • Data Replication
  • Combining and splitting tables
  • Introducing codes simplify identification and
    compress data

68
Developing the Composite Load Map
  • Summarise the information from each transaction
    access map to show the total number of accesses
    by table, access type, table columns used for
    access and the access predicate.
  • This will highlight possible keys and indexes,
    row sequencing requirements and the columns used
    to join tables.
  • Ideally the production of the composite load map
    should be automated to allow for automatic
    re-calculation of totals following changes to
    transaction access maps.

69
Example Composite Load Map
70
Interpreting the Composite Load Map
  • The composite load map is designed to highlight
    table access types that have a very high
    frequency.
  • One logical input-output operation does not
    necessarily result in one physical input-output
    operation. Buffering techniques, indexes, and
    logging complicate the actual number of physical
    I/O operations.
  • The transactions contributing to these high
    levels of access should be closely scrutinised to
    determine whether the number of I/O accesses
    could be reduced by altering their data access
    strategy.
  • The composite load map also provides a base for
    determining where entry point access methods such
    as primary and secondary indexes may provide
    substantial benefit.

71
Interpreting the Composite Load Map
  • Where the composite load map shows high join
    frequencies between tables, control of table row
    placement, or the use of table access methods
    such as hashing techniques or indexes may be used
    to optimise join operations.
  • Where a table has a high frequency of sequential
    access, physical ordering of table rows, or the
    creation of an index to support the access could
    be useful.
  • Tables with a small number of rows could be held
    in main storage, if this option is supported by
    the DBMS.
  • The actual access path taken will depend on the
    DBMS query optimiser.

72
Compromising the Logical Model
  • Given the limitations of some DBMS it may be
    necessary to compromise the logical data model
    for performance reasons.
  • In general this should be hidden from application
    logic by the use of special access routines that
    are responsible for mapping the normalised
    application view to the de-normalised storage
    view.
  • Some DBMS specific design techniques may allow a
    normalised application view to be maintained.

73
Transaction Design
  • Some transactions may implement designer added
    procedures that are part of the designed business
    system rather than part of the process model
    developed during analysis.
  • The starting point for transaction design for
    process implementing procedures is the logic
    defined via an action diagram for the elementary
    process(es) which the transaction (procedure) is
    to implement.
  • If action diagrams have not been developed during
    analysis then these should be developed in close
    association with the data model as part of
    design.

74
Physical Issues in Transaction Design
  • Operational, audit and access control issues.
  • Impact of designer added data on procedure logic.
  • Concurrency and locking issues.
  • Failure, fallback, restart and recovery issues.
  • Standardisation of transaction resource
    requirements.
  • Frequency of use and transaction response
    requirements.
  • The use of derived and redundant data to minimise
    access to secondary storage.
  • The most appropriate time to derive aggregate
    data.
Write a Comment
User Comments (0)