Title: Functional Dependencies FDs and
1Lecture 23 (11/08/05)
- Functional Dependencies (FDs) and
- Normalization
2Questionnaire
- 18/20 answered
- Most are pleased with the course so far ?
- Learned so far
- 7 Fair Amount, 10 Much, 1 Great Deal
- Met Expectations
- 1 Not very well, 7 Adequately, 8 Well, 2 Very
Well - Positives shared by most
- Class interactivity
- Fast feedback
- Slides
- Deadlines/Assignments posted early on website
3Questionnaire
- Concerns (comments)
- Exams
- Long (will shorten)
- Matching is too difficulty (none anymore)
- Better idea of what to expect before exam
- More daily tasks
- Smaller but more frequent assignments
- Read Chapter
- MySQL (Not the best DBMS---Limited (all are))
- Simple so that you focus on learning SQL not
overwhelmed with functionality/GUI - Oracle, Postgres
- Open source ? available for you to download
- Easy setup JDBC
- Most free relational datasets online use MySQL
- Large support community good support website
4Questionnaire
- Stop calling on people by name to answer
questions - More interactivity/participation ( attendance ?
5 of score) - NOT INTENDED TO TEST YOUR KNOWLEDGE OR PUT YOU ON
THE SPOT - Select when no one raises hand
- Pass if you dont feel like answering
- (better if you raise your hand and answer wrong
than me calling on you and answer right!) - In-class group activities
- Balance both
- Cant cover as much/more time
- Text book --- dry, long
5Assignment 3
- (7) List the transaction number and the total
value of every transaction amounting to more than
200 - SELECT T. Trans_Number, SUM(P.Number_OrderedP.Quo
ted_Price) AS ValFROM Trans T, Trans_Part P
WHERE T.Trans_Number P.Trans_NumberGROUP BY
T.Trans_NumberHaving Valgt200 - (9) Find the customer number, last name, and
first name for every customer who did not make a
transaction on September 5, 2002
6Assignment 3
- SELECT Customer_number, Last, FirstFROM CUSTOMER
CWHERE NOT EXISTS (SELECT FROM TRAN T
WHERE C.Customer_number T.Customer_number and
T.Trans_Date Like 09/05/200) - Use except?
- Is there any for the QUOTED_PRICE column in table
TRANS_PART or is it a repetition of the
UNIT_PRICE column in table PART?
7The Database Design Process
- Conceptual design
- Use a data model language to come up with an
accurate, high-level description of the system
requirements - Words (unstructured) ? Diagrams
- Logical design
- Map the resulting EERD into a set of relations
- Diagram ? Relations
- Physical design
- Use DDL on some DBMS to create tables
corresponding to your relations
8The Database Design Process
- Limitations of E-R Designs
- The EER model provides a set of guidelines
- Does not result in a unique database schema
- Does not provide a formal way of evaluating
alternatives - Relies largely on the common sense of the
designer - Here we try to answer
- What are the criteria for "good" base relations?Â
- Meaningful grouping of attributes
- When designing a relation schema, how to decide
which attributes to include? - So far, attributes are grouped to form the
relation schema by using the common sense of the
database designer
9The Database Design Process
- First discuss informal guidelines for good
relational design - Then we discuss formal concepts of functional
dependencies and normal forms - - 1NF (First Normal Form)
- - 2NF (Second Normal Form)
- - 3NF (Third Normal Form)
- - BCNF (Boyce-Codd Normal Form) (Hopefully)
- Additional types of dependencies, further normal
forms, relational design algorithms by synthesis
are discussed in Chapter 11
10Relation Schema Informal Measures
- We have some informal measures
- Semantics of the attributes
- Reducing the redundancy values in tuples
- Disallowing the possibility of generating
spurious tuples - Reducing null values
- Not always independent of one another
11Semantics of the Relation Attributes (1)
- Any grouping of attributes to form a relation
schema must portray a certain real-world meaning - Each tuple in a relation should represent one
entity or relationship instance - Guideline 1 Design a relation schema so that it
is easy to explain its meaning - Semantics of attributes should be easy to
interpret - Attributes of different entities should not be
mixed - Only foreign keys should be used to refer to
other entities
12Relations that violate Guideline 1 by intermixing
attributes from different relations
13Relation schemas that abide by Guideline 1
14Redundant Information (2)
- One goal of schema design is to reduce redundancy
- Information is stored redundantly wasting storage
- Problems with update anomalies
- Modification anomalies
- Insertion anomalies
- Deletion anomalies
- Mixing attributes of multiple entities may cause
the above problems
15- Update anomalies
- Modification Anomalies
- Update PNAME from ProductY to
Customer-Accounting - Insert Anomalies
- Insert a new employee not assigned to known
project - Insert a new project with no working employees
- Delete Project
- Delete PNUMBER2
- Delete the sole employee of a project
-
16Modification Anomalies
- Consider the relation
- EMP_PROJ ( Emp, Proj, No_hours, Ename, Pname,
Plocation) - Modification Anomaly
- Changing the name of project number P2 from
Project Y to Customer-Accounting - May cause this update to be made for all
employees working on project P2 otherwise the DB
will become inconsistent
17Modification Anomalies
- Consider the relation
- EMP_PROJ( Emp, Proj, No_hours, Ename, Pname,
Plocation) - Insert Anomaly Cannot insert a project unless
an employee is assigned to - Inversely - Cannot insert an employee unless
he/she is assigned to a project. - Delete Anomaly When a project is deleted ?
delete all the employees who work on the project - Alternately, if an employee is the sole employee
on a project, deleting that employee would result
in deleting the corresponding project
18Guideline to Redundant Information in Tuples and
Update Anomalies
- Guideline 2 Design a schema that does not suffer
from the insertion, deletion and update anomalies
- If there are any present, then note them so that
applications can be made to take them into
account - Might need to break the guidelines to improve
performance for certain queries - Imagine that we always access employee
information only with department information - ? The design EMP_PROJ (Emp, Proj, No_hours,
Ename, Pname, Plocation) might be could for such
cases
19Example
ER Model
SSN Name Address
Hobby 1111 Joe 123 Main biking,
hiking
Relational Model (SSN, Hobby, Name, Address)
20Example
- Redundancy leads to anomalies
- A change in Address must be made in several
places - Suppose a person gives up all hobbies. Do we
- Set Hobby attribute to null? No, since Hobby is
part of key - Delete the entire row? No, since we lose other
information in the row - No hobby information?
- Hobby value must be supplied for any inserted row
since Hobby is part of key
21Decomposition
- Solution use two relations to store Person
information - Person1 (SSN, Name, Address)
- Hobbies (SSN, Hobby)
- People with/without hobbies can now be described
- No update anomalies
- Name and address stored once
- A hobby can be separately supplied or deleted
22Spurious Tuples (3)
- Bad designs for a relational database (or bad
decompositions) may result in erroneous results
for certain JOIN operations - Any decomposition MUST have the "lossless join"
property - No spurious tuples should be generated by doing a
natural-join of any decomposed relations - Person1 (SSN, Name, Address)
- Hobbies (SSN, Name)
- Here, loss relates to loss of information
23Spurious Tuples (3)
- Suppose we replace
- EMP_PROJ ( SSN, PNUMBER, ENAME, PNAME, PLOCATION,
HOURS) by - EMP_PROJ1(SSN, PNUMBER, PNAME, PLOCATION, HOURS)
ANDEMP_LOCS (ENAME, PLOCATION) - Guideline 3 The relations should be designed to
satisfy the lossless join condition - Avoid relations that contain matching attributes
that are not (foreign key, primary key)
combinations
24Null Values in Tuples (4)
- Guideline 4 Relations should be designed such
that their tuples will have as few NULL values as
possible - Make sure only NULLs are exceptional cases
- If many attributes do not apply to all tuples in
the relation, we end up with many nulls - Waste space
- Ambiguity in meaning
- Attribute not applicable or invalid
- Value known to exist, but unavailable
- Attribute value unknown (may or may not exist)
- Difficulty specifying JOIN operations (inner or
outer joins) - Attributes that are NULL frequently could be
placed in separate relations (with the primary
key)
25Functional Dependencies
- Functional dependencies (FDs) are used to specify
formal measures of the "goodness" of a
relational database design - FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes - An FD is a constraint between two sets of
attributes X and Y - X ? Y holds if whenever two tuples have the same
value for X, they must have the same value for Y - For any two tuples t1 and t2 in any relation
instance r(R) If t1Xt2X, then t1Yt2Y
26Functional Dependencies
- X ? Y A set of attributes X functionally
determines a set of attributes Y (or Y is
functionally determined by X) if the value of X
determines a unique value for Y - X ? Y in R specifies a constraint on all
relation instances r(R) - FDs are derived from the real-world constraints
on the attributes - Property of the intention of the database
- An FD is a property of the attributes in the
schema R - The constraint must hold on every relation
instance r(R) - Can NEVER be deduced from an extension
- E.g. if in some case, all people having the same
first name are registered for the same course,
can we deduce that name ? course?
27Examples of FD Constraints
- EMP_PROJ (SSN, PNUMBER, ENAME, PNAME, PLOCATION,
HOURS) - social security number determines employee name
- SSN ? ENAME
- project number determines project name and
location - PNUMBER ? PNAME, PLOCATION
- employee SSN and project number determines the
hours per week that the employee works on the
project - SSN, PNUMBER ? HOURS
28More on FD Constraints
- Definition of a relation (If K is a key of R)
- K functionally determines all attributes in R
- If X?Y is true, does that make Y?X true?
- Some FDs are always true regardless of the
relation in which they occur - State, Driver_License_Number ? SSN
- Zip ? City, State