Database Overview

About This Presentation

Title:

Database Overview

Description:

Relation is a subset of a cartesian product of ... as a single relation such as ... by 'possible r ' we mean a relation r that could exist in the ... – PowerPoint PPT presentation

Number of Views:34

Avg rating:3.0/5.0

Slides: 128

Provided by: yur5

Learn more at: https://www.cs.kent.edu

Category:

more less

Transcript and Presenter's Notes

Title: Database Overview

1
Database Overview

File Management vs Database Management
Advantages of Database systems storage
persistence,
programming interface, transaction management
Three level Data Model
DBMS Architecture
Database System Components
Users classification

2
File Management System Problems

Data redundancy
Data Access New request-new program
Data is not isolated from the access
implementation
Format incompatible data
Concurrent program execution on the same file
Difficulties with security enforcement
Integrity issues

3
Advantages of Databases

Persistent Storage Database not only provides
persistent storage but also efficient access to
large amounts of data
Programming Interface Database allows users to
access and modify data using powerful query
language. It provides flexibility in data
management
Transaction Management Database supports a
concurrent access to the data

4
Three Aspects to Studying DBMS's

1. Modeling and design of databases.
Allows exploration of issues before committing to
an implementation.
2. Programming queries and DB operations like
update.
SQL intergalactic dataspeak.
3. DBMS implementation.
.

5
Definitions

A database is a collection of stored operational
data used by various applications and/or users by
some particular enterprise or by a set of outside
authorized applications and authorized users
A DataBase Management System (DBMS) is a software
system that manages execution of users
applications to access and modify database data
so that the data security, data integrity, and
data reliability is guaranteed for each
application and each application is written with
an assumption that it is the only application
active in the database.

6
Three Level Data View Data Abstractions
. . . . .
View1
View k
Conceptual View Of Data
Phyisal Data Storage
7
DBMS Architecture
8
Logical and Physical Database Components

Data Definition Language (DDL)
Data Manipulation Language (DML)
Host Language Interface
Data Administrator
Users
Query Processor
Compiler
Optimizer
Management
Transaction Manager
File Manager
Buffer Manager
Authorization and Integrity Manager

Logical
Physical
9
Database Languages
Department
Faculty
Dept
Chair
Name
Dept
SQL

SELECT Chair
FROM Faculty, DepartmentWHERE Faculty.name
Ken Noname AND Faculty.Dept
Department.Dept
Data definition language (DDL) like type
definitions in C or C
Data Manipulation Language (DML) Query
(SELECT) UPDATE lt relation name gt SET
ltattributegt lt new-valuegt WHERE ltconditiongt

10
Logical Data Models

A collection of tools for describing
data
data relationships
data semantics
data constraints
Value based models ER Model, OO Model
Record Based Models Relational Model

11
Entity-Relationship Model

The enterprise data can be described as a set of
entities and a set of relationships between them.
Entity a data that pertains to, or describes
some component of the enterprise
Each entity is characterized by a set of
attributes
Relationship describes an interconnection
between different entities
Entity Set a set of entities that are
characterized by the same entity definition
Relationship Set a set of relationships of the
same type

12
Entity-Relationship Model

Example of schema in the entity-relationship model

13
Relational Model

An enterprise is represented as a set of
relations
Domain is a set of atomic values. Each domain
has a NULL value.
Data type Description of a form that domain
values can be represented.
Relation is a subset of a cartesian product of
one or more domains
The elements of relations are called tuples. Each
element in the cartesian product is called
attribute.

Relational model is good for
Large amounts of data gt simple operations
Navigate among small number of relations
Difficult Applications for relational model
VLSI Design (CAD in general)
CASE
Graphical Data

ALU
ADDER
CPU
A
FA
Adder
ALU
ADDER
Bill of Materials or transitive closure
15
Relational Model
Attributes
Street
City
gpa
Name
Student-id

Example of tabular data in the relational model

Johnson Smith Johnson Jones Smith
192-83-7465 019-28-3746 192-83-7465 321-12-3123
019-28-3746
Alma North Alma Main North
3.6 2.7 3.2 4.0 3.45
Palo Alto Rye Palo Alto Harrison Rye
16
Relational Algebra

Lecture 2

17
Relational Model

Basic Notions
Fundamental Relational Algebra Operations
Additional Relational Algebra Operations
Extended Relational Algebra Operations
Null Values
Modification of the Database
Views
Bags and Bag operations

18
Basic Structure

Formally, given sets D1, D2, . Dn a relation r
is a subset of D1 x D2 x x DnThus,
a relation is a set of n-tuples (a1, a2, , an)
where each ai ? Di
Example
customer_name Jones, Smith,
Curry, Lindsay customer_street Main, North,
Park customer_city Harrison, Rye,
PittsfieldThen r (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) is a relation over
customer_name , customer_street, customer_city

19
Relation Schema

A1, A2, , An are attributes
R (A1, A2, , An ) is a relation schema
Example
Customer_schema (customer_name,
customer_street, customer_city)
r(R) is a relation on the relation schema R
Example
customer (Customer_schema)

20
Database

A database consists of multiple relations
Information about an enterprise is broken up into
parts, with each relation storing one part of
the information
account stores information about accounts
depositor stores information about which
customer owns which
account customer stores information
about customers
Storing all information as a single relation such
as bank(account_number, balance,
customer_name, ..)results in repetition of
information (e.g., two customers own an account)
and the need for null values (e.g., represent a
customer without an account)

21
Keys

Let K ? R
K is a superkey of R if values for K are
sufficient to identify a unique tuple of each
possible relation r(R)
by possible r we mean a relation r that could
exist in the enterprise we are modeling.
Example customer_name, customer_street and
customer_name are both superkeys
of Customer, if no two customers can possibly
have the same name.
K is a candidate key if K is minimalExample
customer_name is a candidate key for.
Primary Key

22
Select Operation Example
A
B
C
D

Relation r

? ? ? ?
? ? ? ?
1 5 12 23
7 7 3 10

?AB D gt 5 (r)

A
B
C
D
? ?
? ?
1 23
7 10
23
Project Operation Example
A
B
C
? ? ? ?
10 20 30 40
1 1 1 2

Relation r

A
C
A
C

?A,C (r)

That is, the projection of a relation on a set of
attributes is a set of tuples
? ? ? ?
1 1 1 2
? ? ?
1 1 2

24
Union Operation Example
A
B
A
B
? ? ?
1 2 1
? ?
2 3

Relations r, s

s
r
A
B
? ? ? ?
1 2 1 3
r ? s
25
Set Difference Operation Example
A
B
A
B
? ? ?
1 2 1
? ?
2 3

Relations r, s

s
r
A
B
? ?
1 1
r s
26
Cartesian-Product Operation-Example
A
B
C
D
E
Relations r, s
? ?
1 2
? ? ? ?
10 10 20 10
a a b b
r
s
r x s
A
B
C
D
E
? ? ? ? ? ? ? ?
1 1 1 1 2 2 2 2
? ? ? ? ? ? ? ?
10 10 20 10 10 10 20 10
a a b b a a b b
27
Additional Operations

We define additional operations that do not add
any power
to the relational algebra, but that simplify
common queries.
Set intersection
Natural join
Division
Assignment

28
Set-Intersection Operation - Example
A B

Relation r, s
r ? s

A B
? ? ?
1 2 1
? ?
2 3
r
s
A B
? 2
29
Natural Join Operation Example

Relations r, s

B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
? ? ? ? ?
1 1 1 1 2
? ? ? ? ?
a a a a b
? ? ? ? ?
30
Division Operation Example
A
B
Relations r, s
B
? ? ? ? ? ? ? ? ? ? ?
1 2 3 1 1 1 3 4 6 1 2
1 2
s
r ? s
A
r
? ?
31
Another Division Example
Relations r, s
A
B
C
D
E
D
E
? ? ? ? ? ? ? ?
a a a a a a a a
? ? ? ? ? ? ? ?
a a b a b a b b
1 1 1 1 3 1 1 1
a b
1 1
s
r
A
B
C
r ? s
? ?
a a
? ?
32
Example Queries

Find the largest account balance
1. Rename account relation as d
2. The query is

?balance(account) - ?account.balance
(?account.balance lt d.balance (account x rd
(account)))
33
Example Queries

Find all customers who have an account at all
branches located in Brooklyn city.

34
Extended Relational-Algebra-Operations

Generalized Projection
Outer Join
Aggregate Functions

35
Generalized Projection

Extends the projection operation by allowing
arithmetic functions to be used in the projection
list. ? F1, F2, , Fn(E)
E is any relational-algebra expression
Each of F1, F2, , Fn are are arithmetic
expressions involving constants and attributes in
the schema of E.
Given relation credit-info(customer-name, limit,
credit-balance), find how much more each person
can spend
?customer-name, limit credit-balance
(credit-info)

36
Aggregate Functions and Operations

Aggregation function takes a collection of values
and returns a single value as a result.
avg average value min minimum value max
maximum value sum sum of values count
number of values
Aggregate operation in relational algebra
G1, G2, , Gn g F1( A1), F2( A2),, Fn( An)
(E)
E is any relational-algebra expression
G1, G2 , Gn is a list of attributes on which to
group (can be empty)
Each Fi is an aggregate function
Each Ai is an attribute name

37
Aggregate Operation Example
A
B
C

Relation r

? ? ? ?
? ? ? ?
7 7 3 10
sum-C
g sum(c) (r)
27
38
Aggregate Operation Example

Relation account grouped by branch-name

branch-name
account-number
balance
Perryridge Perryridge Brighton Brighton Redwood
A-102 A-201 A-217 A-215 A-222
400 900 750 750 700
branch-name g sum(balance) (account)
branch-name
balance
Perryridge Brighton Redwood
1300 1500 700
39
Aggregate Functions

Result of aggregation does not have a name
Can use rename operation to give it a name
For convenience, we permit renaming as part of
aggregate operation

branch-name g sum(balance) as sum-balance
(account)
40
Outer Join Example

Relation loan
Employee(ename,str,city)
Works(ename,cname,sal)
Company(cname,city)
Manages(ename,mname)

Relation borrower

41
SQLLecture 3
42
SQL

Data Definition
Basic Query Structure
Set Operations
Aggregate Functions
Null Values
Nested Subqueries
Complex Queries
Views

43
Data Definition Language
Allows the specification of not only a set of
relations but also information about each
relation, including

The schema for each relation.
The domain of values associated with each
attribute.
Integrity constraints
The set of indices to be maintained for each
relations.
Security and authorization information for each
relation.
The physical storage structure of each relation
on disk.

44
Basic Query Structure

A typical SQL query has the form select A1, A2,
..., An from r1, r2, ..., rm where P
order by
group by
having Q
Ais represent attributes
ris represent relations
P is a predicate.
This query is equivalent to the relational
algebra expression.
?A1, A2, ..., An(?P (r1 x r2 x ... x
rm))
The result of an SQL query is a relation.

45
Set Operations

Find all customers who have a loan, an account,
or both

(select customer-name from depositor) union (sel
ect customer-name from borrower)

Find all customers who have both a loan and
an account.

(select customer-name from depositor) intersect
(select customer-name from borrower)

Find all customers who have an account but no
loan.

(select customer-name from depositor) minus
(select customer-name from borrower)

46
Aggregate Functions

These functions operate on the multiset of values
of a column of a relation, and return a value
avg average value min minimum value max
maximum value sum sum of values count
number of values

47
Null Values and Aggregates

Total all loan amounts
select sum (amount) from loan
Above statement ignores null amounts
result is null if there is no non-null amount,
that is the
All aggregate operations except count() ignore
tuples with null values on the aggregated
attributes.

48
Nested Subqueries

SQL provides a mechanism for the nesting of
subqueries.
A subquery is a select-from-where expression that
is nested within another query.
A common use of subqueries is to perform tests
for set membership, set comparisons, and set
cardinality.

49
Set Comparison

Find all branches that have greater assets than
some branch located in Brooklyn.

select distinct T.branch-name from branch as T,
branch as S where T.assets gt S.assets and
S.branch-city Brooklyn

Same query using gt some clause

select branch-name from branch where assets gt
some (select assets from branch
where branch-city Brooklyn)
50
Example Query

Find the names of all branches that have greater
assets than all branches located in Brooklyn.

select branch-name from branch where assets gt
all (select assets from branch where
branch-city Brooklyn)
51
Test for Empty Relations

The exists construct returns the value true if
the argument subquery is nonempty.
exists r ?? r ? Ø
not exists r ?? r Ø

52
Example

Student(name,sport)

Name
Sport
Yuri soccer Yuri baseball Yuri
tennis Joe football Joe soccer Jane
tennis Jim tennis Yuri tennis Jim
football
Find students that play all sports
.
(Student)
.
Student
sport
53
Example

Student(name,sport)
Find students that play all sports
Select distinct id
from students S
where not exists (
(select distinct
sport from student)
minus
(select distinct
sport from student T
where S.id
T.id)
Find students that are playing exactly one sport
Select id
from (select id, count()
from student
group by id
having count() 1)

54
Entity-Relationship Model

Lecture 5

55
ER Model Components

Entity Sets
Attributes
Relationships

56
ER Model
title
year
Name
Address
Star-in
Movies
Stars
length
genre
owns
Studios
Name
Address
57
Relationship

A relationship is a cartesian product of
n ? 2 entities
(e1, e2, en) e1 ? E1, e2 ? E2, , en ?
Enwhere (e1, e2, , en) is a relationship

58
Degree of a Relationship Set

Refers to number of entity sets that participate
in a relationship set.
Relationship sets that involve two entity sets
are binary (or degree two). Generally, most
relationship sets in a database system are
binary.
Relationship sets may involve more than two
entity sets.
Relationships between more than two entity sets
are rare. Most relationships are binary.

E.g. Suppose employees of a bank may have jobs
(responsibilities) at multiple branches, with
different jobs at different branches. Then there
is a ternary relationship set between entity sets
employee, job and branch

59
Courses
Enrolls
Students
Instructors

Students Courses TAs
Ann CS43005 Jan
Sue CS43005 Pat
Bob CS43005 Jan

60
Types of Binary Relationships
Many-one
Many-many
One-one

Representation of Many-One

Many-one E/R arrow pointing to one.
61
Keys for Relationship Sets

The combination of primary keys of the
participating entity sets forms a super key of a
relationship set.
(customer-id, account-number) is the super key of
depositor
NOTE this means a pair of entity sets can have
at most one relationship in a particular
relationship set.
E.g. if we wish to track all access-dates to each
account by each customer, we cannot assume a
relationship for each access. We can use a
multivalued attribute though
Must consider the mapping cardinality of the
relationship set when deciding the what are the
candidate keys
Need to consider semantics of relationship set in
selecting the primary key in case of more than
one candidate key

62
Converting Multiway to 2-Way

Create a new connecting E.S. to represent rows of
a relationship set.
E.g., (Joe's Bar, Bud, 2.50) for the Sells
relationship.
Many-one relationships from the connecting E.S.
to the others.

BBP
The-Bar
The-Beer
The-Price
Bars
Beers
Price
63
Specialization

within an entity set that are distinctive from
other entities in the set.
These subgroupings become lower-level entity sets
that have attributes or participate in
relationships that do not apply to the
higher-level entity set.
Depicted by a triangle component labeled ISA
(E.g. Top-down design process we designate
subgroupings customer is a person).
Attribute inheritance a lower-level entity set
inherits all the attributes and relationship
participation of the higher-level entity set to
which it is linked.

64
Specialization Example
65
Generalization

A bottom-up design process combine a number of
entity sets that share the same features into a
higher-level entity set.
Specialization and generalization are simple
inversions of each other they are represented in
an E-R diagram in the same way.
The terms specialization and generalization are
used interchangeably.

66
Aggregation

Consider the ternary relationship works-on,
which we saw earlier
Suppose we want to record managers for tasks
performed by an employee at a branch

67
E-R Diagram With Aggregation
68
Weak Entity Sets

An entity set that does not have a primary key is
referred to as a weak entity set.
The existence of a weak entity set depends on the
existence of a identifying entity set
it must relate to the identifying entity set via
a total, one-to-many relationship set from the
identifying to the weak entity set
Identifying relationship depicted using a double
diamond
The discriminator (or partial key) of a weak
entity set is the set of attributes that
distinguishes among all the entities of a weak
entity set.
The primary key of a weak entity set is formed by
the primary key of the strong entity set on which
the weak entity set is existence dependent, plus
the weak entity sets discriminator.

69
Weak Entity Sets

We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity
set with a dashed line.
payment-number discriminator of the payment
entity set
Primary key for payment (loan-number,
payment-number)

70
Example
student
instructor
enrolls
teaches
offering
isoffered
course
requires
71
Reduction of an E-R Schema to Tables

Primary keys allow entity sets and relationship
sets to be expressed uniformly as tables which
represent the contents of the database.
A database which conforms to an E-R diagram can
be represented by a collection of tables.
For each entity set and relationship set there is
a unique table which is assigned the name of the
corresponding entity set or relationship set.
Each table has a number of columns (generally
corresponding to attributes), which have unique
names.

72
Representing Entity Sets as Tables

A strong entity set reduces to a table with the
same attributes.

73
Representing Relationship Sets as Tables

A many-to-many relationship set is represented as
a table with columns for the primary keys of the
two participating entity sets, and any
descriptive attributes of the relationship set.
E.g. table for relationship set borrower

74
Additional Rules for Translating Relationship
into Relation
If one entity set participates several times in
the relationship with different roles, its key
attributes must be listed as many times and with
different names for each role. Studies(SSN,
Name) Favorite(SSN, Name) Friends(SSN1, SSN2)
Name
SSN
subject
studies
Student
friends
favorite
75
Redundancy of Tables

Many-to-one relationship sets that are total on
the many-side can be represented by adding an
extra attribute to the many side, containing the
primary key of the one side
Example We eliminate relation Favorite and we
extend relation for Student as follows
Student(SSN, Name, Subject.name)
If, however, the relationship is many-to-many we
cannot do that since it leads to redundancy
For example relation Studies cannot be
eliminated since otherwise we may end up with
111-222-333 John OS
111-222-333 John DBMS

76
Representing Weak Entity Sets

A weak entity set becomes a table that includes a
column for the primary key of the identifying
strong entity set

77
Representing Specialization as Tables

Form a table for the higher level entity
Form a table for each lower level entity set,
include primary key of higher level entity set
and local attributes table table
attributesperson name, street, city
customer name, credit-ratingemployee name,
salary
Drawback getting information about, e.g.,
employee requires accessing two tables

78
Relations Corresponding to Aggregation

To represent aggregation, create a table
containing
primary key of the aggregated relationship,
the primary key of the associated entity set
Any descriptive attributes

79
Example
ssn
name
ISA
person
passenger
age
booked
ISA
date
departure
assigned
pilot
fhrs
gate
instantof
canfly
flight
plane
dtime
atime
F
man
model
80
Example
ssn
Name
name
Plays
Player
Team
age
Uses
ISA
Has
Colors
Captain
Likes-Colors
likes
Fan
name
addr
81
Relational Database Design Theory

Lecture 6

82
First Normal Form

Domain is atomic if its elements are considered
to be indivisible units
Examples of non-atomic domains
Set of names, composite attributes
Identification numbers like CS101 that can be
broken up into parts
A relational schema R is in first normal form if
the domains of all attributes of R are atomic
Non-atomic values complicate storage and
encourage redundant (repeated) storage of data
Example Set of accounts stored with each
customer, and set of owners stored with each
account
We assume all relations are in first normal form

83
First Normal Form

Atomicity is actually a property of how the
elements of the domain are used.
Example Strings would normally be considered
indivisible
Suppose that students are given roll numbers
which are strings of the form CS0012 or EE1127
If the first two characters are extracted to find
the department, the domain of roll numbers is not
atomic.
Doing so is a bad idea leads to encoding of
information in application program rather than in
the database.

84
Functional Dependencies

Constraints on the set of legal relations.
Require that the value for a certain set of
attributes determines uniquely the value for
another set of attributes.
A functional dependency is a generalization of
the notion of a key.

85
Functional Dependencies

Let R(A1, A2, .Ak) be a relational schema X and
Y are subsets of A1, A2, Ak. We say that X-gtY,
if any two tuples that agree on X, then
they agree on Y.
Example
Student(SSN,Name,Addr,subjectTaken,favSubject,Prof
)
SSN-gtName
SSN-gtAddr
subjectTaken-gtProf
Assign(Pilot,Flight,Date,Departs)
Pilot,Date,Departs-gtFlight
Flight,Date-gtPilot

86
Functional Dependencies

A functional dependency X-gtY is trivial if it is
satisfied by any relation that includes
attributes from X and Y
E.g.
customer-name, loan-number ? customer-name
customer-name ? customer-name
In general, ? ? ? is trivial if ? ? ?

87
Closure of a Set of Functional Dependencies

Given a set F set of functional dependencies,
there are certain other functional dependencies
that are logically implied by F.
E.g. If A ? B and B ? C, then we can infer
that A ? C
The set of all functional dependencies logically
implied by F is the closure of F.
We denote the closure of F by F.

88
Closure of a Set of Functional Dependencies

An inference axiom is a rule that states if a
relation satisfies certain FDs, it must also
satisfy certain other FDs
Set of inference rules is sound if the rules lead
only to true conclusions
Set of inference rules is complete, if it can be
used to conclude every valid FD on R
We can find all of F by applying Armstrongs
Axioms
if ? ? ?, then ? ? ?
(reflexivity)
if ? ? ?, then ? ? ? ? ?
(augmentation)
if ? ? ?, and ? ? ?, then ? ? ? (transitivity)
These rules are
sound and complete

89
Example

R (A, B, C, G, H, I)F A ? B A ? C CG
? H CG ? I B ? H
some members of F
A ? H
by transitivity from A ? B and B ? H
AG ? I
by augmenting A ? C with G, to get AG ? CG
and then transitivity with CG ? I

90
Procedure for Computing F

To compute the closure of a set of functional
dependencies F
F Frepeat for each functional
dependency f in F apply reflexivity and
augmentation rules on f add the resulting
functional dependencies to F for each pair of
functional dependencies f1and f2 in F if
f1 and f2 can be combined using transitivity
then add the resulting functional dependency to
Funtil F does not change any further

91
Closure of Attribute Sets

Given a set of attributes a, define the closure
of a under F (denoted by a) as the set of
attributes that are functionally determined by a
under F a ? ? is in F ? ? ? a
Algorithm to compute a, the closure of a under
F result a while (changes to result)
do for each ? ? ? in F do begin if ? ?
result then result result ? ? end

92
Uses of Attribute Closure

There are several uses of the attribute closure
algorithm
Testing for superkey
To test if ? is a superkey, we compute ?, and
check if ? contains all attributes of R.
Testing functional dependencies
To check if a functional dependency ? ? ? holds
(or, in other words, is in F), just check if ? ?
?.
That is, we compute ? by using attribute
closure, and then check if it contains ?.
Is a simple and cheap test, and very useful
Computing closure of F
For each ? ? R, we find the closure ?, and for
each S ? ?, we output a functional dependency ?
? S.

93
Example of Attribute Set Closure

R (A, B, C, G, H, I)
F A ? B, A ? C, CG ? H, CG ? I, B ? H
(AG)
1. result AG
2. result ABCG (A ? C and A ? B)
3. result ABCGH (CG ? H and CG ? AGBC)
4. result ABCGHI (CG ? I and CG ? AGBCH)
Is AG a key?
Is AG a super key?
Does AG ? R? Is (AG) ? R
Is any subset of AG a superkey?
Does A ? R? Is (A) ? R
Does G ? R? Is (G) ? R

94
Extraneous Attributes

Consider a set F of functional dependencies and
the functional dependency ? ? ? in F.
Attribute A is extraneous in ? if A ? ? and
F logically implies (? A) ? ? or
Attribute A ? ? is extraneous in ? if A ? ?
and the set of functional dependencies (F
? ? ?) ? ? ?(? A) logically implies F.
Example Given F A ? C, AB ? C
B is extraneous in AB ? C because A ? C
logically implies AB? C, A ?C.
Example Given F A ? C, AB ? CD
C is extraneous in AB ? CD since AB ? D,A ?C
implies AB ? C

95
Testing if an Attribute is Extraneous

Consider a set F of functional dependencies and
the functional dependency ? ? ? in F.
To test if attribute A ? ? is extraneous in ?
compute (? A) using the dependencies in
F
2. check that (? A) contains A if it does,
A is extraneous
To test if attribute A ? ? is extraneous in ?
compute ? using only the dependencies in
F (F ? ? ?) ? ? ?(? A),
check that ? contains A if it does, A is
extraneous

96
Canonical Cover

Sets of functional dependencies may have
redundant dependencies that can be inferred from
the others
Eg A ? C is redundant in A ? B, B ? C,
A ? C
Parts of a functional dependency may be redundant
E.g. A ? B, B ? C, A ? CD can be
simplified to A ? B,
B ? C, A ? D
E.g. A ? B, B ? C, AC ? D can be
simplified to A ? B,
B ? C, A ? D
A canonical cover of F is a minimal set of
functional dependencies equivalent to F, having
no redundant dependencies or redundant parts of
dependencies

97
Canonical Cover(Formal Definition)

A canonical cover for F is a set of dependencies
Fc such that
F logically implies all dependencies in Fc, and
Fc logically implies all dependencies in F, and
No functional dependency in Fc contains an
extraneous attribute, and
Each left side of functional dependency in Fc is
unique.

98
Canonical CoverComputation

To compute a canonical cover for Frepeat Use
the union rule to replace any dependencies in
F ?1 ? ?1 and ?1 ? ?1 with ?1 ? ?1 ?2 Find a
functional dependency ? ? ? with an extraneous
attribute either in ? or in ? If an extraneous
attribute is found, delete it from ? ? ? until F
does not change

99
Example of Computing a Canonical Cover

R (A, B, C)F A ? BC B ? C A ? B AB ?
C
Combine A ? BC and A ? B into A ? BC
A is extraneous in AB ? C
Set is now A ? BC, B ? C
C is extraneous in A ? BC
Check if A ? C is logically implied by A ? B and
the other dependencies
Yes using transitivity on A ? B and B ? C.
The canonical cover is A ? B B ? C

100
Decomposition

All attributes of an original schema (R) must
appear in the decomposition (R1, R2)
R R1 ? R2
Lossless-join decomposition.For all possible
relations r on schema R
r ?R1 (r) ?R2 (r)
A decomposition of R into R1 and R2 is lossless
join if and only if at least one of the following
dependencies is in F
R1 ? R2 ? R1
R1 ? R2 ? R2

101
Normalization Using Functional Dependencies

When we decompose a relation schema R with a set
of functional dependencies F into R1, R2,.., Rn
we want
Lossless-join decomposition Otherwise
decomposition would result in information loss.
Dependency preservation Let Fi be the set of
dependencies F that include only attributes in
Ri.
(F1 ? F2 ? ? Fn) F
.

102
Example

R (A, B, C)F A ? B, B ? C)
Can be decomposed in two different ways
R1 (A, B), R2 (B, C)
Lossless-join decomposition
R1 ? R2 B and B ? BC
Dependency preserving
R1 (A, B), R2 (A, C)
Lossless-join decomposition
R1 ? R2 A and A ? AB
Not dependency preserving (cannot check B ? C
without computing R1 R2)

103
Testing for Dependency Preservation

To check if a dependency ??? is preserved in a
decomposition of R into R1, R2, , Rn we apply
the following simplified test (with attribute
closure done w.r.t. F)
result ?while (changes to result) do for each
Ri in the decomposition t (result ? Ri) ?
Ri result result ? t
If result contains all attributes in ?, then the
functional dependency ? ? ? is preserved.
We apply the test on all dependencies in F to
check if a decomposition is dependency preserving
This procedure takes polynomial time, instead of
the exponential time required to compute F and
(F1 ? F2 ? ? Fn)

104
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a
set F of functional dependencies if for all
functional dependencies in F of the form ??? ?,
where ? ? R and ? ? R, at least one of the
following holds

?? ? ? is trivial (i.e., ? ? ?)
? is a superkey for R

105
Example

R (A, B, C)F A ? B B ? CKey A
R is not in BCNF
Decomposition R1 (A, B), R2 (B, C)
R1 and R2 in BCNF
Lossless-join decomposition
Dependency preserving

106
Testing for BCNF

To check if a non-trivial dependency ???? causes
a violation of BCNF
1. compute ? (the attribute closure of ?), and
2. verify that it includes all attributes of R
Using only F is incorrect when testing a relation
in a decomposition of R
E.g. Consider R (A, B, C, D), with F A ?B, B
?C
Decompose R into R1(A,B) and R2(A,C,D)
Neither of the dependencies in F contain only
attributes from (A,C,D) so we might be mislead
into thinking R2 satisfies BCNF.
In fact, dependency A ? C in F shows R2 is not
in BCNF.

107
BCNF Decomposition Algorithm

result Rdone falsecompute Fwhile
(not done) do if (there is a schema Ri in result
that is not in BCNF) then begin let ?? ? ?
be a nontrivial functional dependency that holds
on Ri such that ?? ? Ri is not in F, and ? ? ?
?result (result Ri ) ? (Ri ?) ? (?, ?
) end else done true
Each Ri is in BCNF, and decomposition is
lossless-join.

108
BCNF and Dependency Preservation
It is not always possible to get a BCNF
decomposition that is dependency preserving

R (A, B, C)F AB ? C C ? BTwo candidate
keys AB and AC
R is not in BCNF
Any decomposition of R will fail to preserve
AB ? C

109
Third Normal Form

A relation schema R is in third normal form (3NF)
if for all ? ? ? in F at least one of the
following holds
? ? ? is trivial (i.e., ? ? ?)
? is a superkey for R
Each attribute A in ? ? is contained in a
candidate key for R.
If a relation is in BCNF it is in 3NF (since in
BCNF one of the first two conditions above must
hold).
Third condition is a minimal relaxation of BCNF
to ensure dependency preservation.

110
Third Normal Form

Example
R (A,B,C)F AB ? C, C ? B
Two candidate keys AB and AC
R is in 3NF
AB ? C AB is a superkey C ? B B is contained
in a candidate key
BCNF decomposition has (AC) and (BC)
Testing for AB ? C requires a join

111
Testing for 3NF

Use attribute closure to check for each
dependency ? ? ?, if ? is a superkey.
If ? is not a superkey, we have to verify if each
attribute in ? is contained in a candidate key of
R
this test is rather more expensive, since it
involve finding candidate keys
testing for 3NF has been shown to be NP-hard
However, decomposition into third normal form can
be done in polynomial time

112
3NF Decomposition Algorithm

Let Fc be a canonical cover for Fi 0for
each functional dependency ? ? ? in Fc do if
none of the schemas Rj, 1 ? j ? i contains ? ?
then begin i i 1 Ri ? ?
endif none of the schemas Rj, 1 ? j ? i
contains a candidate key for R then begin i
i 1 Ri any candidate key for
R end return (R1, R2, ..., Ri)

113
Storage Hierarchy
114
Storage Hierarchy

primary storage Fastest media but volatile
(cache, main memory).
secondary storage next level in hierarchy,
non-volatile, moderately fast access time
also called on-line storage
E.g. flash memory, magnetic disks
tertiary storage lowest level in hierarchy,
non-volatile, slow access time
also called off-line storage
E.g. magnetic tape, optical storage

115
Magnetic Disks

Disk controller interfaces between the computer
system and the disk drive hardware.
accepts high-level commands to read or write a
sector
initiates actions such as moving the disk arm to
the right track and actually reading or writing
the data
Computes and attaches checksums to each sector to
verify that data is read back correctly
If data is corrupted, with very high probability
stored checksum wont match recomputed checksum
Ensures successful writing by reading back sector
after writing it
Performs remapping of bad sectors

116
Disk Subsystem

Multiple disks connected to a computer system
through a controller
Controllers functionality (checksum, bad sector
remapping) often carried out by individual disks
reduces load on controller
Disk interface standards families
ATA (AT adaptor) range of standards
SCSI (Small Computer System Interconnect) range
of standards
Several variants of each standard (different
speeds and capabilities)

117
Performance Measures of Disks

Access time the time it takes from when a read
or write request is issued to when data transfer
begins. Consists of
Seek time time it takes to reposition the arm
over the correct track.
Average seek time is 1/2 the worst case seek
time.
4 to 10 milliseconds on typical disks
Rotational latency time it takes for the sector
to be accessed to appear under the head.
Average latency is 1/2 of the worst case
latency.
4 to 11 milliseconds on typical disks
Data-transfer rate the rate at which data can
be retrieved from or stored to the disk.
4 to 8 MB per second is typical
Multiple disks may share a controller, so
transfer rate that controller can handle is also
important
E.g. ATA-5 66 MB/second, SCSI-3 40 MB/s
Fiber Channel 256 MB/s

118
Storage Access

A database file is partitioned into fixed-length
storage units called blocks. Blocks are units of
both storage allocation and data transfer.
Database system seeks to minimize the number of
block transfers between the disk and memory. We
can reduce the number of disk accesses by keeping
as many blocks as possible in main memory.
Buffer portion of main memory available to
store copies of disk blocks.
Buffer manager subsystem responsible for
allocating buffer space in main memory.

119
Buffer Manager

Programs call on the buffer manager when they
need a block from disk.
If the block is already in the buffer, the
requesting program is given the address of the
block in main memory
If the block is not in the buffer,
the buffer manager allocates space in the buffer
for the block, replacing (throwing out) some
other block, if required, to make space for the
new block.
The block that is thrown out is written back to
disk only if it was modified since the most
recent time that it was written to/fetched from
the disk.
Once space is allocated in the buffer, the buffer
manager reads the block from the disk to the
buffer, and passes the address of the block in
main memory to requester.

120
Buffer-Replacement Policies

Most operating systems replace the block least
recently used (LRU strategy)
Idea behind LRU use past pattern of block
references as a predictor of future references
Queries have well-defined access patterns (such
as sequential scans), and a database system can
use the information in a users query to predict
future references
LRU can be a bad strategy for certain access
patterns involving repeated scans of data
e.g. when computing the join of 2 relations r
and s by a nested loops for each tuple tr of r
do for each tuple ts of s do if the
tuples tr and ts match
Mixed strategy with hints on replacement strategy
providedby the query optimizer is preferable

121
Buffer-Replacement Policies

Pinned block memory block that is not allowed
to be written back to disk.
Toss-immediate strategy frees the space
occupied by a block as soon as the final tuple of
that block has been processed
Most recently used (MRU) strategy system must
pin the block currently being processed. After
the final tuple of that block has been processed,
the block is unpinned, and it becomes the most
recently used block.
Buffer manager can use statistical information
regarding the probability that a request will
reference a particular relation
E.g., the data dictionary is frequently accessed.
Heuristic keep data-dictionary blocks in main
memory buffer
Buffer managers also support forced output of
blocks for the purpose of recovery

122
File Organization

The database is stored as a collection of files.
Each file is a sequence of records. A record is
a sequence of fields.
One approach
assume record size is fixed
each file has records of one particular type only
different files are used for different relations
This case is easiest to implement will consider
variable length records later.

123
Fixed-Length Records

Simple approach
Store record i starting from byte n ? (i 1),
where n is the size of each record.
Record access is simple but records may cross
blocks
Modification do not allow records to cross block
boundaries
Deletion of record I alternatives
move records i 1, . . ., n to i, . . . , n 1
move record n to i
do not move records, but link all free records
on afree list

124
Variable-Length Records Slotted Page Structure

Slotted page header contains
number of record entries
end of free space in the block
location and size of each record
Records can be moved around within a page to keep
them contiguous with no empty space between them
entry in the header must be updated.
Pointers should not point directly to record
instead they should point to the entry for the
record in header.

125
Organization of Records in Files

Heap a record can be placed anywhere in the
file where there is space
Sequential store records in sequential order,
based on the value of the search key of each
record
Hashing a hash function computed on some
attribute of each record the result specifies in
which block of the file the record should be
placed
Records of each relation may be stored in a
separate file. In a clustering file organization
records of several different relations can be
stored in the same file
Motivation store related records on the same
block to minimize I/O

126
Sequential File Organization

Suitable for applications that require sequential
processing of the entire file
The records in the file are ordered by a
search-key

127
Clustering File Organization

Simple file structure stores each relation in a
separate file
Can instead store several relations in one file
using a clustering file organization
E.g., clustering organization of customer and
depositor
l scan using a secondary index is expensive
each record access may fetch a new block from
disk
an entry was deleted from their parent)
Root node then had only one child, and was
deleted and its child became the new root node