Relational Database M 358 - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Relational Database M 358

Description:

... on B and B is functionally dependent on A, then C is ... [offending FD: in a transitive dependency A B and B C implying A C, B C is the offending FD] ... – PowerPoint PPT presentation

Number of Views:68
Avg rating:3.0/5.0
Slides: 35
Provided by: Ange325
Category:

less

Transcript and Presenter's Notes

Title: Relational Database M 358


1
Relational Database (M 358)
  • Handout 7
  • Tang Kwok Wa, Billy

2
Block 4 Database Development
This block concerns
Establish Requirement
data requirements
Data analysis (ER Modeling, Normalization)
Conceptual data model
Database design
Logical schema
Implementation
Schema and database
3
Establishing Data Requirements
  • Data requirements is normally a document written
    in natural language.
  • Documenting data requirements should
  • enable the identification of subjects
  • enable the recognition of relevant connections
    between subjects together with the properties of
    the relations
  • Problems of ambiguity in data requirements

4
Data Analysis Process - Entity-Relation Modeling
  • Entity-Relation Modeling
  • Model is for communication between people, but
    not for computers.
  • Not influenced by design issues. The following
    issues are not relevant
  • data format (e.g. length of strings)
  • data volume (e.g. how many occurrences of
    students)
  • data usage (what processing required)
  • data access (how data may be retrieved
    efficiently)
  • Good ER model should
  • represent all users data requirements -
    completeness
  • have no duplication - integrity
  • include all constraints - integrity
  • be general (not restricted by the particular
    initial requirements) - flexibility
  • understandable (simple, with no unnecessary
    complexity) - usability

5
Data Analysis Process - Entity-Relation Modeling
  • Entity Types
  • Represents things having a meaning in a given
    context and with a need to record data.
  • Defines the properties common to a collection of
    entities - e.g. Student
  • How to look for the entity type? In the
    University data requirement, there are words like
    information about each student is initially
    recorded at registration. So one would think
    each student would be represented by an entity
    with some information to be recorded.
  • Student is the subject being modeled. To identify
    subjects
  • exclude terms that only refer to data about those
    subjects, such as course code or identification
    number.
  • exclude nouns that refer to processes, such as
    registration which refers to the process of
    initially recording a students details.

6
Data Analysis Process - Entity-Relation Modeling
  • Relationships
  • Relationship name
  • Assign the relationship name in one to many
    direction
  • E.g
  • staff - counsels - student, not student -
    IsCounselled By - staff.
  • Remember
  • In E-R model, the relationship counsel is simply
    represented by the line connected between Staff
    and Student.
  • The attribute counsellor_no, being a foreign key
    in Student referencing to Staff for the Counsel
    relationship in Relational model, is not an
    attribute in the Student entity type of the E-R
    model

Student(student_id, name, registration, region)
Staff (staff_no, name, region)
7
Data Analysis Process - Entity-Relation Modeling
  • Relationships
  • mn relationship
  • mn relationship is decomposed into two 1n
    relationships with an entity type introduced in
    between them, unless there is absolutely no
    additional data to be recorded by the
    relationship.
  • the participation conditions of the introduced
    entity type in the decomposition are mandatory.
    It is meaningless for any occurrence exists in
    this entity type without relationship to both.
  • identifiers from two related entity types are
    combined to give a composite identifier for the
    intersection entity type
  • Inclusive and exclusive relationships
  • two relationships are inclusive if it is required
    that every occurrence of the common entity type
    participates either in both relationships or in
    neither
  • two relationships are exclusive if it is required
    that no occurrence of common entity type
    participates in both relationships.

8
Data Analysis Process - Entity-Relation Modeling
  • Attributes
  • According to data requirements, one may find
    specification of data directly associated with
    the subject. E.g. each student would be recorded
    with his/her name, then student name will be one
    of the attributes for the student entity type.
  • Attributes should be assigned to avoid duplicated
    data
  • E.g. Each student should be given his
    counsellors telephone number for communication
  • if counsellor telephone numbers are recorded in
    Student entity type, forming
  • student(student_id, counsellor_telno)
  • then, counsellor_telno will have repeated values
    in this student entity type since a counsellor
    would counsel many students. Thus
    counsellor_telno should be recorded in staff
    entity type.

9
Data Analysis Process - Entity-Relation Modeling
  • Attributes
  • Complex values
  • A staff has more than one qualifications If we
    just introduce a Qualification attribute in Staff
    entity to cater for this requirement, then a
    staff will be multivalued when the staff has more
    than one qualification such as BSc, MSc and PhD
  • Resolved by 1n relationship.
  • Derived data
  • having values based on some processing deriving
    from other data
  • should not be included in an E-R model because it
    is a kind of redundancy
  • occasionally exists in a data model, either
    because its derivation is difficult to infer, or
    because it is important for the data to be made
    explicit. (should include a constraint to
    prevent the inconsistency that may result from
    duplication.)

Obtained
Staff
Qualifications
Qualifications(StudentId, Degree) (5324,
Bsc) (5324, MSc) (5324, PhD)
10
Normalization
  • A process to assess and improve the quality of
    relational data model by identifying duplication
    in a table and reorganizing the data into two or
    more tables, without duplication or loss of
    information.
  • Used in data analysis stage during the
    development of database.
  • Problem of a table not in normalized form
  • Amending a value may cause changes in many tuples
    ( in order to maintain consistency of the data)
  • If GPId g1 changes the name from Leila to John,
    then rows 1, 2, 3, 9, 10 have to be changed

11
Normalization
  • Single-Valued Fact (SVF)
  • is a statement of expressing the meaning of
    attribute in a relation, it associates one
    attribute with exactly one another attribute.
  • E.g. a general practitioner has a name
  • It is interpreted as a general practitioner
    identified by general
  • practitioner identifier has (exactly) a name
  • Having known such SVF, we know that in a
    relation GeneralPractitioner,
  • GPID as the primary key determines exactly one
    non-primary key attribute GPname
  • Every non-primary-key attribute in a relation
    (e.g. GPName) becomes a single-valued fact about
    the primary key (e.g. GPID).
  • Properties of single-valued fact
  • Occurrences of a SVF will be recorded just once.
    4 occurrences in the above example
  • The first occurrence of this SVF is the
    general practitioner identified by g1 has a name
    Leila
  • The second occurrence of this SVF is the
    general practitioner identified by g2 has a name
    Ftizdare..
  • It is not reversible. A general practitioner has
    one name - it is not true that a name is
    associated with just one general practitioner.

relation GeneralPractitioner
GPId GPName g1 Leila g2 Fitzdare g3 von B g4 von B
12
Normalization
  • Functional Dependencies
  • Functional dependency is - Within a relation R,
    attribute B is functionally dependent on
    attribute A if and only if each value of A
    determines precisely one value of B
  • same as what single-value fact expresses.
  • SVF a general practitioner identified by general
    practitioner identifier (GPID) has a name can be
    expressed in a short form as functional
    dependency using the notation below
  • GPId ? GPName
  • (it means that GPId determines precisely one
    value of GPName)
  • GPId is said to be a determinant
  • Given a relation R, B is fully functionally
    dependent on A if it is functionally dependent on
    A and not functionally dependent on any subset of
    A
  • E.g Appointment(PatientId, ApptDate, ApptTime,
    ConsId)
  • PatientId, ApptDate, ApptTime ? ConsId (not
    full functional dependency)
  • PatientId, ApptDate ? ConsId (full
    functional dependency)

13
Normalization Process
  • Normalization process
  • first normal form - gt second normal form - gt
    third normal form - gt Boyce-Codd Normal Form
  • Starting point of normalization process
  • Given a relation in first normal form
  • Given a collection of Single Value Facts (or
    Functional Dependencies)

ExtendedAppointment (PatientId, PatientName,
GPId, GPName, ApptDate, ApptTime, ConsId,
ConsName, HospNo, HospName)
14
Normalization Process
  • First Normal Form (1NF)
  • A relation is in 1NF if and only if every
    non-primary key attribute is functionally
    dependent (not necessarily fully functionally
    dependent) on the primary key.
  • Any relation is in 1NF, since a primary value
    will identify a tuple and thus determine exactly
    one value of each of the non-primary attributes.
  • ExtendedAppointment is in 1NF because it is a
    relation.

15
Normalization Process
  • Second Normal Form (2NF)
  • A relation is in 2NF if and only if it is in 1NF
    and every non-primary key attribute is fully
    functionally dependent on the primary key.
  • Need to know all the FDs holds in the relation.
  • Identify additional determinant from the given
    relation. (FD10 -FD17)
  • Add Transitivity FDs
  • if PatientId ? GPId and GPId ? GPName then
    PatientId ? GPName also holds
  • Add Augmentation FDs
  • if A ? B holds, then A,Z ? B, Z also holds.
  • E.g. PatientId, ApptDate ? ConsId (FD 5)
  • ? PatientId, ApptDate ? ConsId, ApptDate
  • ConsId, ApptDate ? HospNo (FD 9)
  • therefore PatientId, ApptDate ? HospNo (FD
    15)

16
Normalization Process
  • Whether a relation is not in 2NF
  • Examine whether non-primary key attributes are
    fully functionally dependent on primary key
  • In ExtendedAppointment, FD2, FD3 and FD10,
    PatientId ? PatientName, PatientId ? GPId,
    PatientId ? GPName. PatientId is only subset of
    primary key (which is PatientId, ApptDate).
    Therefore ExtendedAppointment is not in 2NF.
  • Putting a relation into 2NF
  • ExtendedAppointment (PatientId, PatientName,
    GPId, GPName, ApptDate, ApptTime, ConsId,
    ConsName, HospNo, HospName)
  • Separate the original relation into two
  • Put the subset of primary key together with its
    determined attributes into a separate relation.
  • Patient2(PatientId , PatientName, GPId, GPName)
  • Put the primary key together with those
    attributes not determined by above into another
    relation
  • PatientAppointment2 (PatientId, ApptDate,
    ApptTime, ConsId, ConsName, HospNo, HospName)

17
Normalization Process
1NF
FD 2, 3, 10 PatientId ? PatientName PatientId
? GPId PatientId ? GPName
2NF
PatientAppointment2 (PatientId, ApptDate,
ApptTime, ConsId, ConsName, HospNo, HospName)
PatientA2 (PatientId, PatientName, GPId, GPName)
Producing 1n relationship through decomposition
of original relation, with the common names
(PatientId) as the foreign key
18
Normalization Process
  • Third Normal Form (3NF)
  • A relation is in 3NF if and only if it is in 2NF
    and no non-primary key attribute is transitively
    dependent on the primary key.
  • Transitive dependency is
  • Within a relation, where A, B and C are each an
    attribute or combination of attributes with no
    attributes in common, if C is functionally
    dependent on B and B is functionally dependent on
    A, then C is transitively dependent on A via B,
    provided that A is not functionally dependent on
    B or C.
  • Putting relations into 3NF
  • to move the attributes of the offending FDs, into
    separate relation (or relations if determinant
    differ)
  • offending FD in a transitive dependency A ? B
    and B ? C implying A ?C, B ?C is the offending
    FD
  • In PatientA2 (PatientId, PatientName, GPId,
    GPName), GPName is transitive dependent of
    PatientId because of PatientId ? GPId and GPId
    ? GPName
  • thus GPId, GPName should be moved to a new
    relation (see in the following page)
  • remove the right-hand side attributes from
    original relation
  • create new relation for each different
    determinant that occurs in the offending FDs.
  • primary key of the new relation(s) is the
    determinant of the offending FDs used to
    construct it
  • primary key of the relation from which the
    attributes have been removed is unchanged

19
Normalization Process
2NF
PatientA2
PatientAppointment2
3NF
PatientA3

FD7,8 ConsId ? ConsName HospNo ?
HospName
ConsultantA3
PatientAppointmentA3

HospitalA3
GeneralPractitionerA3
20
Normalization Process
  • Putting Third Normal Form to Boyce-Codd Normal
    Form (BCNF)
  • A relation is in BCNF if and only if every
    determinant is a candidate key.
  • So, in BCNF, all the non-primary attributes can
    only be determined from the candidate keys
  • See if any 3NF in which a determinant is not
    candidate key. If there is such determinant,
    open new relation for it.
  • Whether a 3NF is BCNF, checking by
  • only one candidate key (that is, there are no
    alternate keys)
  • or if more than one candidate key, then
  • the candidate keys are not combinations of
    attributes
  • or if the candidate keys are combination of
    attributes, then
  • the candidate keys do not overlap

21
Normalization Process
ConsId, ApptDate ? HospNo in
PatientAppointmentA3 (FD 9), but ConsId,
ApptDate cannot determine ApptTime and ConsId,
ApptDate cannot determine PatientID, thus
ConsId, ApptDate is not a candidate key in
PatientAppointmentA3
PatientAppointmentA3
ConsultantA3
PatientA3
3NF
HospitalA3
GeneralPractitionerA3

PatientAppointmentBCNF
General PractitionerA3
ConsultantVisitBCNF
HospitalA3
ConsultantA3
ConsultantVisitBCNF
PatientA3
PatientAppointmentBCNF
22
Normalization Process
  • Limitation of Normalization
  • it only produce relations of 1n relationships,
    results in no redundancy
  • relations of 11 relationship cannot be produced
    by normalization.
  • no participation conditions are considered
  • no constraint conditions are considered.

23
Database Design
Use a relational conceptual data model to give a
set of tables for initial database design
Do the tables represent the data in an acceptable
way according to the given criteria for
usability, efficiency, and so on?
Yes
No
  • Define constraints of each table
  • columns
  • primary key
  • foreign keys
  • constraints

Revise table
Revise if necessary
Implementation
24
Defining Column in SQL
  • Numeric data type
  • Range
  • Integer 2147483647
  • Smallint 32767
  • Decimal numbers
  • precision number of decimal digits
  • scale number of digits after the decimal point
  • E.g cast( input_no as decimal(5,2)) as output_no
  • if input_no 999.123456 output_no 999.12
  • Metadata (data about data) - any data that is
    required to interpret other data as meaningful
    information.
  • E.g. A numeric column Amount showing x.xx
    would mean HK x.xx K. The actual meaning of
    Amount would be interpreted as in scaling
    factor of HK000 (metadata)

25
Defining Column in SQL
  • Character data type
  • VARCHAR (variable length)
  • CHAR (fixed length, adding spaces so that string
    of the specified length is stored)
  • (Infomaker only supports VARCHAR. CHAR is
    implemented as VARCHAR)
  • Date-time data types
  • DATE
  • TIME
  • TIMESTAMP (combination of date and time)
  • Default values
  • Real values as default e.g. default is current
    year., or
  • Non real value as default, e.g. 9999 as default
    for integer column. May cause error in queries,
    e.g. avg()

26
Defining Column in SQL
  • Using Codes
  • e.g. HK representing Hong Kong
  • Advantage less space, less typing mistakes
  • Disadvantage Not immediately understandable
  • (may need to link with a full name table)
  • SQL domains not in same way of relational
    domains
  • comparison of two SQL columns requires only that
    they are comparable data type, but not same
    domain.
  • SQL foreign keys and operations are not
    constrained by domains as they are for relations.

27
Defining Table in SQL
  • Defining primary key
  • Choose primary key according to uniqueness,
    minimality and not being null.
  • Using surrogate - a field with distinguish values
    for primary key. It has no meaning at all, just
    generate by a DBMS to distinguish different row.
    E.g. a numeric column as DEFAULT AUTOINCREMENT
    will automatically generate a new and unique
    value for the column when a new row is added.
  • Defining constraints in SQL
  • primary key - expressed as PRIMARY KEY
  • alternate key - expressed as UNIQUE
  • referential constraints FOREIGN KEY
  • mandatory participation condition - NOT NULL for
    foreign keys, or using CHECK clause
  • Constraints part of the model - expressed as a
    CHECK clause
  • Considerations in using CHECK
  • Inefficient processing if CHECK statement is
    complicate.
  • Every update involves processing a CHECK, causing
    unacceptable delay.

28
Revising Tables
  • Avoiding Null
  • E.g. Student( StudentId, Name, Telephone_no)
  • Because some students might not have telephone,
    telephone_no column will be null in some rows.
  • To avoid null, can use two tables
  • Student( StudentId, Name) Telephone(StudentId,
    Telephone_no)
  • However, query might become complex as it will
    then involve join.
  • Denormalization
  • Deliberately combining normalized tables into
    one, in order to achieve efficient data retrieval
    and simple query.
  • Beware duplicated data after denormalization.

29
Additional Data
  • Derived data - values that are obtained by
    processing other data.
  • To maintain exact consistency of derived data, it
    should be updated every time the data used in the
    derivation is changed (can be done by trigger)
  • Can also be done by periodic updating of
    student_count by a separate application process
    (sacrifice some data consistency but relief
    processing burden)
  • Defining Extra Tables because of
  • snapshot data (e.g. to make an annual report)
  • summary data for users
  • historical record (for auditing purpose)

30
Database Implementation
  • Indexes - fast data access method to support
    efficient retrieval of data.
  • It involves a mechanism that associates a value
    in a column of a table with the stored locations
    of all the rows which contain that value.
  • When a query is executed in which an indexed
    column is referenced in the WHERE clause, it can
    use the index to provide the stored locations of
    the particular rows that satisfy the WHERE
    condition, thus enable direct access to the
    required data.
  • Need extra space for index file
  • Index must be updated every time when there is
    change in the values in the indexed column
  • Need to balance against the possible benefits of
    an index in deciding whether one is appropriate
    for a given column(s)

31
Database Implementation
  • Establishing a database
  • Ways of populating the tables of a new database
  • by import of old tables
  • prepares an application program that reads in the
    old data, transforms it to appropriate format and
    then inserts it into the new table.
  • Backup (for data recovery when necessary)
  • continuous copying of all changes to a database
    in a log file
  • periodic unloading of database to make a backup
    copy

32
Logical Processing Sequential Processing
(quicker)
Find the best plan for executing SQL statements
FROM get the required table store as the 1st
intermediate table
FROM get a row of the required table
next row
WHERE get the 1st intermediate table retain the
rows that satisfy the where clause store as 2nd
int. table
WHERE retain the rows that satisfy the where
clause
SELECT get the 2nd intermediate table retain the
columns required produce the final table
SELECT retain the columns required
1 disk access
5 disk accesses
33
Find the best plan for executing SQL statements
Using Index
  • When will index become ineffective?
  • For small table, single disk access would
    retrieve the whole table. Index will not help
    much
  • With many rows spreading in each blocks, most of
    the blocks are likely to be accessed, so using an
    index to identify the required blocks may be
    slower than simply retrieving every block for
    sequential plan of processing of the whole table.
  • Any conditions in the WHERE clause not involving
    the indexed column would need to be processed
    separately

With an index on tutor_no
FROM WHERE get a row that satisfies the WHERE
clause
next row
SELECT retain the columns required
34
THE END
Write a Comment
User Comments (0)
About PowerShow.com