Views - PowerPoint PPT Presentation

1 / 58
About This Presentation
Title:

Views

Description:

User Reports are physical, tangible artefacts, e.g forms ... determines the value of another attribute and vice versa; that is, A B and B A. for example: ... – PowerPoint PPT presentation

Number of Views:93
Avg rating:3.0/5.0
Slides: 59
Provided by: michae570
Category:
Tags: bb | views

less

Transcript and Presenter's Notes

Title: Views


1
Database Design - 1
2
Levels of Abstraction
  • User Reports are physical, tangible artefacts,
    e.g forms
  • User Views describe the data that the system must
    capture and are derived from current and future
    system inputs / outputs
  • Conceptual Schema defines logical structure
  • Internal Schema describes how data is physically
    stored (tables, indexes, etc.)
  • Levels of abstraction give data independence

3
Example Purchase Order
4
Example Purchase Order
  • User View
  • Purchase Order Order Number
  • Date
  • Supplier Name
  • Supplier Address
  • Delivery Address
  • Currency
  • Line Item

  • Quantity

  • Description of Goods
  • Unit
    Cost
  • Total
    Cost
  • VAT-exclusive
    Total
  • VAT
  • VAT-inclusive
    Total
  • Authorised by
  • Account Codes

5
Example Purchase Order
  • Conceptual Schema

6
Stages of Database Design
  • Conceptual Design
  • The process of constructing a model of the
    information used in an enterprise, independent of
    all physical considerations
  • Logical Design
  • The mapping of the conceptual design to a
    specific model of data representation (for
    example, the relational model) but independent of
    a particular DBMS
  • With the relational model, this involves
    normalisation, decomposition of MM
    relationships, selection of primary and foreign
    keys, and specification of attributes, domains
    and integrity constraints
  • Physical Design
  • The process of implementing the logical design
    using a specific DBMS technology for example,
    ORACLE

7
E-R Modeling
  • Entity-Relationship (E-R) modelling is a popular
    technique for conceptual and logical database
    design
  • E-R Diagrams are the diagrammatic representation
    of E-R Models
  • There are a variety of E-R diagramming notations.
    The most common are the Chen notation, and the
    crows foot (Barker) notation, and variants
    thereof

8
Entities and Attributes
  • An entity is something about which the
    organisation needs to record data for example,
    Student, Course, Lecturer, Degree, etc.
  • An instance is a single occurrence of an entity
    for example Michael Lang is an instance of
    Lecturer, Commerce is an instance of Degree
  • Entities are described by a set of attributes
    for example, a Student may be defined by (Student
    ID, Name, Address, Date of Birth)
  • Attributes have domains. A domain is a set of
    allowable values for example, Student Mark must
    be an integer between 1 and 100, Date of Birth
    must be a date no later than 17 years ago,
    Address must be a sequence of no more than 80
    alphanumeric characters

9
Identifiers
  • Every instance of an entity must be uniquely
    identified
  • An identifier (primary key) can be a single
    attribute, or a combination of two or more
    attributes (composite identifier)
  • An identifier can be invented if there is no
    suitable or naturally occurring candidate
    attribute(s)
  • The chosen identifier should be
  • minimal - choose the fewest columns necessary
  • stable - choose a key that seldom changes
  • simple / familiar - something that is easy to
    remember
  • not null - there must always be a value (in the
    case of composite identifiers, all component
    attributes must have a value)

10
Identifiers Examples
  • Employee Staff_Number Name
    Telephone_Extension Email RSI_Number
  • What should be the key ?
  • Name is not guaranteed to be unique. Besides,
    there are other issues, such as misspellings
    (Michael/Micheal, Lang/Laing/Lange/Long), case
    sensitivity (UPPER/lower), and variations
    (Thomas/Tom/Tommy)
  • Telephone_Extension is probably unique in 95 of
    cases. However, note the exceptions of shared
    offices, and employees without telephones
  • Email is unique. However, not everyone has email.
    Also, email addresses tend to frequently change
  • RSI_Number is unique. However, its not easy to
    remember
  • Therefore Staff_Number is the best choice. Its
    unique, easy to remember, stable, and everyone
    has one

11
Relationships
  • One-to-One Relationship (11)
  • Two entities are participants in a 11
    relationship if, for any instance of either
    entity there is at most one instance of the
    other. Very rare in actuality
  • One-to-Many Relationship (1M)
  • Two entities are participants in a 1M
    relationship if, for every instance of the first
    entity there are zero or more instances of the
    second. Vice versa for M1. 1M relationships
    subsume 11 relationships
  • Many-to-Many Relationship (MM)
  • Two entities are participants in a MM
    relationship if, for every instance of either
    entity there are zero or more instances of the
    other

12
Relationships
One-to-One
One-to-Many
Many-to-One
Many-to-Many
13
Synonymous Terms
  • Terminology may be confusing for end-users /
    student analysts
  • Roughly speaking, there are a number of
    synonymous terms which equate to each other,
    depending on the stage of design
  • Files Entities / Objects Tables /
    Relations
  • Records Instances Rows / Tuples
  • Fields Attributes Columns
  • Identifiers Primary Keys

14
E-R Modeling
  • Chen Notation
  • Crows Foot Notation

ENTITY A
ENTITY B
15
One-to-One Relationships
DEPARTMENT
EMPLOYEE
workplace of
dept_name
emp_no
work in
dept_floor
emp_fname
dept_phone
emp_salary
managed by
boss of
  • Where do you put the foreign keys ?

16
One-to-One Relationships
  • This second representation is better

17
One-to-Many Relationships
  • Implemented in relational database model by the
    inclusion of a foreign key at the Many side of
    the relationship

DEGREE
STUDENT
18
Relationship Hierarchies
  • Occur very frequently in practice typical
    example organisational hierarchy
  • Multiple 1M relationships
  • If misused, can lead to inflexible models

DEPART- MENT
DIVISION
SECTION
FIRM
19
Many-to-Many Relationships
  • Relational databases cannot implement MM
    relationships directly
  • Must create a third entity, a so-called
    intersection or associative entity
  • This new entity can have separate attributes of
    its own
  • Example
  • A student attends many courses a course may be
    attended by many students
  • Student Student_ID Name Address DOB
  • Course Course_Code Course_Title ECTS_Units
  • Enrollment Student_ID Course_Code
    Days_Absent

20
Many-to-Many Relationships
BORROWER
BOOK
LOAN
callno
loandate
borrowerid
ISBN
borrowerid

booktitle
callno
duedate

  • Assumptions
  • Need to keep records of loans, even after book is
    returned
  • A borrower will not borrow a book, return it, and
    borrow it again all on the same day. Therefore
    every loan can be uniquely identified by
    callnoloandateborrowerid

21
Recursive Relationships
  • A recursive relationship relates an entity to
    itself
  • 1M
  • The entity gets an additional column for the
    foreign key
  • Need a name different from the primary key

EMP
empno
empfname
empsalary
boss of
employee
22
Recursive Relationships
23
E-R Models Examples
  • In-class worked example
  • Ollscoil na Coiribe is constituted of seven
    Faculties. Each Faculty consists of a number of
    Departments. There are many Lecturers working
    within each of these Departments. A Lecturer may
    teach one or more Courses. A Course may be
    simultaneously attended by Students pursuing
    different Programmes of education.
  • A Student is registered for a Programme within a
    Faculty. Programmes consist of a number of
    Courses. A Course will not normally be available
    unless at least ten students are registered to
    present for examination in the current session.
    Students are required to attend a number of
    obligatory Courses, and may opt for additional
    Courses. All Courses have a unique Course Code
    and are allocated a weighting in Units.
  • It is also necessary to record Venue Reservations
    for the purposes of assembling a timetable. This
    consists of Course Code, Class Group, Day, Time
    and Venue. Two sets of Venue Reservations exist,
    one for each semester.
  • At the end of each semester, there are
    Examinations. Each student is allocated an
    Examination Number and must present for
    examination in all Courses for which he/she has
    registered for examination in that session. There
    are three examination sessions every year
    Semester 1, Semester 2, and Autumn. Examination
    marks are recorded for all students, and are
    maintained on a historical basis.

24
Database Design - 2
25
Entity Types
  • Strong (Independent)
  • Weak (Dependent)
  • Subordinate (Sub-type)
  • Intersection / Associative

26
Strong and Weak Entities
  • An instance of a Strong Entity can exist
    independently
  • An instance of a Weak Entity can only exist if
    there is a related instance of some other entity
  • Example A Degree may be taken by zero or more
    Students A Student must register for a Degree

27
Intersection Entities
  • A decomposition of a single Many-to-Many
    relationship into multiple One-to-Many /
    Many-to-One relationships
  • At the very least, its attributes will consist of
    the combination of the identifiers of the
    entities which participated in the Many-to-Many
    relationship
  • It can have additional attributes of its own
  • Naming intersection entities
  • Concatenate entity names if there is no obvious
    name
  • Example Railway Company - A train stops at many
    stations a station is serviced by many trains

28
Intersection Entities
  • For example, a train leaves Galway at 930AM
    every Monday with final destination Dublin
  • We need to know what stations it stops at
  • but we also need to know what time it departs
    each station, and from what platform (attributes
    of the relationship)

29
Intersection Entities
  • The solution is to create an intersection entity
  • (In this example, dep_stn, arr_stn, and stop_stn
    are all foreign keys linked to stn_code)

30
Subordinate Entities
  • Consider the following example
  • There are 800 employees in an organisation, 600
    of which receive hourly wages, and 200 of which
    receive salaries
  • Of the 200 salaried employees, 50 are field
    service engineers
  • The RSI Number, Taxation Bracket, Name, Address,
    Telephone Extension, Email Address, Department,
    and Job Title of all employees is recorded in a
    database
  • Additionally, for waged employees, the Hourly
    Wage and Hours Worked is recorded
  • For salaried employees, the Weekly Salary is
    recorded
  • For the field service engineers, Company Car
    Registration Number and Mobile Phone are recorded
  • How should this be modeled ? ...

31
Subordinate Entities
EMPLOYEE RSI_Number Taxation_Bracket Name Addres
s Telephone_Ext Email Dept Job_Title Hourly_Rate H
rs_Worked Weekly_Salary Car_Reg_Num Mobile_Phone_N
um
  • Possibility 1
  • Model it as a single entity
  • Problem this is inefficient, wasteful, and
    confusing

32
Subordinate Entities
EMPLOYEE RSI_Number Taxation_Bracket Name Address
Telephone_Ext Email Dept Job_Title WAGED_EMP Hou
rly_Rate Hrs_Worked SALARIED_EMP Weekly_Salary F
IELD_SVC_ENG Car_Reg_Num Mobile_Phone_Num
  • Possibility 2
  • Model it using subordinate entities

33
Subordinate Entities
WAGED_EMP RSI_Number Hourly_Rate Hrs_Worked
EMPLOYEE RSI_Number Taxation_Bracket Name Addres
s Telephone_Ext Email Dept Job_Title
SALARIED_EMP RSI_Number Weekly_Salary
  • Possibility 3
  • use One-to-One relationships
  • may be exclusive either-or relationships
  • may be mandatory-to-mandatory or
    optional-to-mandatory

FIELD_SVC_ENG RSI_Number Car_Reg_Num Mobile_Phon
e_Num
34
What is a Good Data Model ?
  • Basic construction rules should be obeyed
  • There should be no ambiguity
  • All entities, attributes, relationships, and
    identifiers are defined and labeled
  • Names are meaningful to the client and to the
    developer
  • Beware of homonyms(same word/different meanings)
    and synonyms (different words/same meaning)
  • The model should be faithful to reality
  • Accurately describes the world it is supposed to
    represent
  • Relationships are of the correct degree
    (cardinality)
  • Data model is complete, understandable, precise,
    and at the right level of detail ?

35
What is a Good Data Model ?
  • The model should be as simple as possible
  • There should be no redundant or duplicated data
  • The model should be flexible and adaptable to
    future needs
  • Have all exceptions been considered ?

36
Normalisation
  • Normalisation is a technique for simplifying the
    design of a relational database so that it
    achieves the optimum structure with minimal
    redundancy
  • The normal forms are a linear progression of
    rules applied to database design, with each
    higher normal form achieving a better, more
    efficient design
  • The normal forms are
  • Un-normalised
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)
  • In this course, we shall concern ourselves only
    with progression to 3NF

37
Relationships of Attributes
  • one-to-one
  • a value of an attribute determines the value of
    another attribute and vice versa that is, A ? B
    and B ? A
  • for example
  • IRL ? Republic of Ireland, Republic of Ireland
    ? IRL
  • Only occurs when both attributes are unique
  • one-to-many
  • a value of one attribute determines the value of
    another attribute but not vice versa that is, A
    ? B but not B ? A
  • for example
  • United States of America ? Dollar
  • Dollar not ? United States of America (could be
    Canada ? Hong Kong ? Australia ? Bahamas ? New
    Zealand ? Singapore ? etc..)

38
Relationships of Attributes
  • many-to-many
  • neither attribute determines the other
  • A not ? B
  • B not ? A
  • for example
  • country name not ? language (in Ireland, official
    languages are Irish and English are spoken)
  • language not ? country name (English is an
    official language of Ireland, United Kingdom,
    United States of America, etc.)

39
Functional Dependencies
  • A functional dependency is a relationship between
    attributes in a relation, whereby an attribute
    (or combination of attributes) determine the
    value of another attribute
  • Attribute B is said to be functionally dependent
    on attribute A if, for every unique value of A in
    the database, there exists no more than one
    correlated value of B
  • Otherwise put, attribute A is said to be the
    determinant of attribute B, A ? B
  • If the determinant is a single-attribute, those
    attributes which depend on it are said to be
    fully functionally dependent
  • If the determinant is a combination of attributes
    (for example, a composite identifier), an
    attribute is not fully functionally dependent
    upon it only a part of the combination is
    required to determine it. This is also called a
    partial dependency

40
First Normal Form (1NF)
  • For a relation (table) to be in 1NF, it must
    satisfy the following conditions
  • All rows must have the same number of columns
  • All values are atomic otherwise put, there are
    no multi-valued attributes / repeating groups
    (at most one value for each row-column
    intersection)
  • There can be no duplicate rows that is, every
    row must have a unique primary key (identifier)
  • Every non-key attribute must be functionally
    dependent upon the primary key
  • The rows and columns are not necessarily in any
    particular order

41
Multi-valued Attributes
  • What really is a multi-valued attribute ?
  • For example, an Items attribute of a Sales
    Invoice which contains values such as 50 Exhaust
    Pipes _at_ 120/each, 80 Hubcaps _at_ 30/each, 60
    Towbars _at_ 55/each, 90 Wing Mirrors _at_ 45/each,
    ... is multi-valued (a repeating group)
  • but consider a typical address, such as
    Milltown Road, Tuam, County Galway, Ireland
  • Should this be stored as a single Address
    attribute, or as Street Address, Town, County,
    Region, Country, Postcode ?
  • If there is a requirement to access parts of the
    address separately, represent it as separate
    single-valued attributes
  • Examples a mail merge, or demographic analysis
    such as How many customers do we have in each
    country ?
  • If there is no such current or foreseeable
    requirement, a single Address attribute is
    acceptable

42
Multi-valued Attributes
  • How are multi-valued attributes / repeating
    groups resolved ?
  • Consider a Students Exam Result Sheet with
    following structure
  • Student ID
  • Student_Name
  • Student_Address
  • Module_Code Module_Description Mark
  • Average
  • Grade
  • Transform to 1NF by placing repeating groups into
    new relation
  • STUDENT Student ID Student_Name
    Student_Address
  • Average Grade
  • EXAM_RESULT Student ID Module_Code

  • Module_Description Mark

43
1NF Anomalies
  • 1NF is inefficient as it can give rise to a
    variety of possible anomalies
  • Insertion Anomaly
  • bear in mind the rule of entity integrity it
    is not allowable for any part of the primary key
    of a record to be null
  • in the case of composite identifiers, it is
    insufficient to have a value in one of the
    component columns but no value in the other(s)
  • insertion anomalies arise whereby data cannot be
    inserted because a complete primary key is not
    available
  • Deletion Anomaly
  • this is the inverse of the insertion anomaly
    data that should be kept must be deleted because
    part of the primary key has become null

44
1NF Anomalies
  • Update Anomaly
  • this arises because of redundancy (duplicated
    data) the same data may be stored in two or more
    records
  • if that data is modified, all occurrences of it
    must be modified, otherwise integrity is breached
    and data is unsynchronised / contradictory

45
1NF Anomalies
  • Primary Key Student_ID Module_Code
  • Insertion Anomaly cannot create new Module
    unless a Student has registered to take an exam
    in that Module
  • Deletion Anomaly If there were a Module for
    which only 1 student was registered, and if we
    were to delete that record, we would
    inadvertently delete details of the Module
  • Update Anomaly what if the Module_Name
    corresponding to an Module_Code were to change ?
    Many records might be affected all would have to
    be altered (duplication / redundancy)

46
Second Normal Form (2NF)
  • The next step of data analysis is to transform
    relations into 2NF.
  • Relations must already satisfy rules of 1NF
  • 2NF looks for an anomaly called a partial
    dependency, meaning an attribute(s) whose value
    is determined by only part of the primary key
  • all non-key attributes must be fully functionally
    dependent on the primary key
  • Relations that have a single-column primary key
    are already in 2NF
  • Only those relations that have a composite
    (concatenated) primary key need to be checked

47
Second Normal Form (2NF)
  • To transform a relation from 1NF to 2NF,
  • identify all functional dependencies in 1NF
  • Make each determinant the primary key of a new
    relation
  • Place all attributes that are functionally
    dependent on a given determinant alongside that
    determinant as non-key attributes in the
    newly-created relation of which that determinant
    is the primary key
  • ????Confused???? see worked examples to follow
    ...

48
2NF Example
  • The above relation is in 1NF. What are the
    functional dependencies ?
  • Student_ID Module_Code ? Mark
  • Module_Code ? Module_Name
  • In 2NF, we therefore have two relations
  • EXAM_RESULT Student_ID Module_Code Mark
  • MODULE Module_Code Module_Name

49
Third Normal Form (3NF)
  • Entities are assumed to be in 2NF before
    beginning 3NF analysis.
  • 3NF analysis looks for two types of problems,
    derived data and transitive dependencies.
  • In both cases, the fundamental error is that
    non-key attributes are dependent on other non-key
    attributes
  • Derived attributes are those whose values can
    either be calculated from other attributes, or
    derived through logic from the values of other
    attributes
  • A transitive dependency exists when a non-key
    attribute is dependent on another non-key
    attribute (other than by derivation)
  • This error usually indicates that an undiscovered
    relation is still embedded within the problem
    relation

50
3NF Example
  • Consider the following scenario
  • The Health Safety Officer of an organisation of
    300 employees is scheduling an obligatory 1-day
    Seminar in Ergonomics. The only available venue
    is capable of holding 50 employees, hence 6
    different sessions are necessary. An employee
    will only have to attend one of these sessions.
  • The H S Officer has created a simple database
    to help her make arrangements. It contains a
    SEMINAR table with attributes Employee_ID
    Session_Num Date Time Venue
  • What is wrong with this structure ?
  • it gives rise to insertion, update, and delete
    anomalies in much the same way as 1NF

51
3NF Example
  • continued
  • SEMINAR Employee_ID Session_Num Date Time
    Venue
  • Insertion Anomaly details of a session cannot be
    recorded until at least one employee is assigned
    to that session
  • Deletion Anomaly if there were only a single
    employee assigned so far to a session, and if
    that record were to be deleted (say, if the
    employee left the company), then data pertaining
    to the session would be inadvertently lost
  • Update Anomaly if 20 employees were assigned to
    a session, and if that session needed to be
    rescheduled, all 20 records would have to be
    updated, else integrity would be breached
  • Problem arises because of transitive dependency
  • Employee_ID ? Session_Num ? Date, Time, Venue

52
3NF Example
  • continued
  • SEMINAR Employee_ID Session_Num Date Time
    Venue
  • Resolution split SEMINAR into two relations
  • SESSION Session_Num Date Time Venue
  • EMP_ASSIGNMENT Employee_ID Session_Num
  • (Note In SESSION table, combination of Date
    Time Venue is unique because otherwise you
    could have a double-booking ! Session_Num is
    chosen as primary key in preference to this
    combination because of the principle of
    minimality i.e. least number of columns)

53
Third Normal Form (3NF)
  • An entity is in third normal form if every
    non-key attribute is dependent on the primary
    key, the whole primary key, and nothing but the
    primary key

54
Example Invoice
55
Example Invoice
  • Assumptions
  • The price of an item is separately agreed for
    each invoice at the time of the sale
  • VAT rates fluctuate over time, and different
    rates may apply to different products
  • Take a short-cut by ignoring calculated fields
    (Line Total, Invoice Total)
  • First Normal Form
  • INVOICE Invoice_Num Date Cust_Num
    Cust_Name Cust_Address
  • INVOICE_LINE Invoice_Num Item_Code
    Item_Description Qty Unit_Price VAT_Rate

56
Example Invoice
  • Second Normal Form
  • No change to INVOICE relation as it has a
    single-column primary key
  • In 1NF, INVOICE_LINE Invoice_Num Item_Code
    Item_Description Qty Unit_Price VAT_Rate
  • Item Code ? Item Description
  • Invoice Num Item Code ? Qty, Unit_Price,
    VAT_Rate
  • We therefore end up with three relations in 2NF
  • INVOICE Invoice_Num Date Cust_Num
    Cust_Name Cust_Address
  • INVOICE_LINE Invoice Num Item Code Qty
    Unit_Price VAT_Rate
  • INVENTORY Item Code Item Description

57
Example Invoice
  • Third Normal Form
  • There is a transitive dependency in INVOICE
  • Invoice Num ? Cust Num ? Cust Name, Address
  • This is resolved by splitting INVOICE relation
  • INVOICE Invoice_Num Date Cust_Num
  • CUSTOMER Cust_Num Cust_Name Cust_Address

58
Physical Database Design
  • After logical design comes physical design, which
    is concerned with implementation aspects such as
  • selection of data types
  • distribution / partition of data
  • capacity planning
  • database workloads / optimisation of queries
  • access methods and indexing
  • security checks
  • fine tuning of performance
Write a Comment
User Comments (0)
About PowerShow.com