Database Overview - PowerPoint PPT Presentation

About This Presentation
Title:

Database Overview

Description:

Relation is a subset of a cartesian product of ... as a single relation such as ... by 'possible r ' we mean a relation r that could exist in the ... – PowerPoint PPT presentation

Number of Views:34
Avg rating:3.0/5.0
Slides: 128
Provided by: yur5
Learn more at: https://www.cs.kent.edu
Category:

less

Transcript and Presenter's Notes

Title: Database Overview


1
Database Overview
  • File Management vs Database Management
  • Advantages of Database systems storage
    persistence,
  • programming interface, transaction management
  • Three level Data Model
  • DBMS Architecture
  • Database System Components
  • Users classification

2
File Management System Problems
  • Data redundancy
  • Data Access New request-new program
  • Data is not isolated from the access
    implementation
  • Format incompatible data
  • Concurrent program execution on the same file
  • Difficulties with security enforcement
  • Integrity issues

3
Advantages of Databases
  • Persistent Storage Database not only provides
    persistent storage but also efficient access to
    large amounts of data
  • Programming Interface Database allows users to
    access and modify data using powerful query
    language. It provides flexibility in data
    management
  • Transaction Management Database supports a
    concurrent access to the data

4
Three Aspects to Studying DBMS's
  • 1. Modeling and design of databases.
  • Allows exploration of issues before committing to
    an implementation.
  • 2. Programming queries and DB operations like
    update.
  • SQL intergalactic dataspeak.
  • 3. DBMS implementation.
  • .

5
Definitions
  • A database is a collection of stored operational
    data used by various applications and/or users by
    some particular enterprise or by a set of outside
    authorized applications and authorized users
  • A DataBase Management System (DBMS) is a software
    system that manages execution of users
    applications to access and modify database data
    so that the data security, data integrity, and
    data reliability is guaranteed for each
    application and each application is written with
    an assumption that it is the only application
    active in the database.

6
Three Level Data View Data Abstractions
. . . . .
View1
View k
Conceptual View Of Data
Phyisal Data Storage
7
DBMS Architecture
8
Logical and Physical Database Components
  • Data Definition Language (DDL)
  • Data Manipulation Language (DML)
  • Host Language Interface
  • Data Administrator
  • Users
  • Query Processor
  • Compiler
  • Optimizer
  • Management
  • Transaction Manager
  • File Manager
  • Buffer Manager
  • Authorization and Integrity Manager

Logical
Physical
9
Database Languages
Department
Faculty
Dept
Chair
Name
Dept
SQL
  • SELECT Chair
  • FROM Faculty, DepartmentWHERE Faculty.name
    Ken Noname AND Faculty.Dept
    Department.Dept
  • Data definition language (DDL) like type
    definitions in C or C
  • Data Manipulation Language (DML) Query
    (SELECT) UPDATE lt relation name gt SET
    ltattributegt lt new-valuegt WHERE ltconditiongt

10
Logical Data Models
  • A collection of tools for describing
  • data
  • data relationships
  • data semantics
  • data constraints
  • Value based models ER Model, OO Model
  • Record Based Models Relational Model

11
Entity-Relationship Model
  • The enterprise data can be described as a set of
    entities and a set of relationships between them.
  • Entity a data that pertains to, or describes
    some component of the enterprise
  • Each entity is characterized by a set of
    attributes
  • Relationship describes an interconnection
    between different entities
  • Entity Set a set of entities that are
    characterized by the same entity definition
  • Relationship Set a set of relationships of the
    same type

12
Entity-Relationship Model
  • Example of schema in the entity-relationship model

13
Relational Model
  • An enterprise is represented as a set of
    relations
  • Domain is a set of atomic values. Each domain
    has a NULL value.
  • Data type Description of a form that domain
    values can be represented.
  • Relation is a subset of a cartesian product of
    one or more domains
  • The elements of relations are called tuples. Each
    element in the cartesian product is called
    attribute.

14
  • Relational model is good for
  • Large amounts of data gt simple operations
  • Navigate among small number of relations
  • Difficult Applications for relational model
  • VLSI Design (CAD in general)
  • CASE
  • Graphical Data

ALU
ADDER
CPU
A
FA
Adder
ALU
ADDER
Bill of Materials or transitive closure
15
Relational Model
Attributes
Street
City
gpa
Name
Student-id
  • Example of tabular data in the relational model

Johnson Smith Johnson Jones Smith
192-83-7465 019-28-3746 192-83-7465 321-12-3123
019-28-3746
Alma North Alma Main North
3.6 2.7 3.2 4.0 3.45
Palo Alto Rye Palo Alto Harrison Rye
16
Relational Algebra
  • Lecture 2

17
Relational Model
  • Basic Notions
  • Fundamental Relational Algebra Operations
  • Additional Relational Algebra Operations
  • Extended Relational Algebra Operations
  • Null Values
  • Modification of the Database
  • Views
  • Bags and Bag operations

18
Basic Structure
  • Formally, given sets D1, D2, . Dn a relation r
    is a subset of D1 x D2 x x DnThus,
    a relation is a set of n-tuples (a1, a2, , an)
    where each ai ? Di
  • Example
  • customer_name Jones, Smith,
    Curry, Lindsay customer_street Main, North,
    Park customer_city Harrison, Rye,
    PittsfieldThen r (Jones, Main, Harrison),
    (Smith, North, Rye),
    (Curry, North, Rye),
    (Lindsay, Park, Pittsfield) is a relation over
  • customer_name , customer_street, customer_city

19
Relation Schema
  • A1, A2, , An are attributes
  • R (A1, A2, , An ) is a relation schema
  • Example
  • Customer_schema (customer_name,
    customer_street, customer_city)
  • r(R) is a relation on the relation schema R
  • Example
  • customer (Customer_schema)

20
Database
  • A database consists of multiple relations
  • Information about an enterprise is broken up into
    parts, with each relation storing one part of
    the information
  • account stores information about accounts
    depositor stores information about which
    customer owns which
    account customer stores information
    about customers
  • Storing all information as a single relation such
    as bank(account_number, balance,
    customer_name, ..)results in repetition of
    information (e.g., two customers own an account)
    and the need for null values (e.g., represent a
    customer without an account)

21
Keys
  • Let K ? R
  • K is a superkey of R if values for K are
    sufficient to identify a unique tuple of each
    possible relation r(R)
  • by possible r we mean a relation r that could
    exist in the enterprise we are modeling.
  • Example customer_name, customer_street and
    customer_name are both superkeys
    of Customer, if no two customers can possibly
    have the same name.
  • K is a candidate key if K is minimalExample
    customer_name is a candidate key for.
  • Primary Key

22
Select Operation Example
A
B
C
D
  • Relation r

? ? ? ?
? ? ? ?
1 5 12 23
7 7 3 10
  • ?AB D gt 5 (r)

A
B
C
D
? ?
? ?
1 23
7 10
23
Project Operation Example
A
B
C
? ? ? ?
10 20 30 40
1 1 1 2
  • Relation r

A
C
A
C
  • ?A,C (r)

That is, the projection of a relation on a set of
attributes is a set of tuples
? ? ? ?
1 1 1 2
? ? ?
1 1 2

24
Union Operation Example
A
B
A
B
? ? ?
1 2 1
? ?
2 3
  • Relations r, s

s
r
A
B
? ? ? ?
1 2 1 3
r ? s
25
Set Difference Operation Example
A
B
A
B
? ? ?
1 2 1
? ?
2 3
  • Relations r, s

s
r
A
B
? ?
1 1
r s
26
Cartesian-Product Operation-Example
A
B
C
D
E
Relations r, s
? ?
1 2
? ? ? ?
10 10 20 10
a a b b
r
s
r x s
A
B
C
D
E
? ? ? ? ? ? ? ?
1 1 1 1 2 2 2 2
? ? ? ? ? ? ? ?
10 10 20 10 10 10 20 10
a a b b a a b b
27
Additional Operations
  • We define additional operations that do not add
    any power
  • to the relational algebra, but that simplify
    common queries.
  • Set intersection
  • Natural join
  • Division
  • Assignment

28
Set-Intersection Operation - Example
A B
  • Relation r, s
  • r ? s

A B
? ? ?
1 2 1
? ?
2 3
r
s
A B
? 2
29
Natural Join Operation Example
  • Relations r, s

B
D
E
A
B
C
D
1 3 1 2 3
a a a b b
? ? ? ? ?
? ? ? ? ?
1 2 4 1 2
? ? ? ? ?
a a b a b
r
s
A
B
C
D
E
? ? ? ? ?
1 1 1 1 2
? ? ? ? ?
a a a a b
? ? ? ? ?
30
Division Operation Example
A
B
Relations r, s
B
? ? ? ? ? ? ? ? ? ? ?
1 2 3 1 1 1 3 4 6 1 2
1 2
s
r ? s
A
r
? ?
31
Another Division Example
Relations r, s
A
B
C
D
E
D
E
? ? ? ? ? ? ? ?
a a a a a a a a
? ? ? ? ? ? ? ?
a a b a b a b b
1 1 1 1 3 1 1 1
a b
1 1
s
r
A
B
C
r ? s
? ?
a a
? ?
32
Example Queries
  • Find the largest account balance
  • 1. Rename account relation as d
  • 2. The query is

?balance(account) - ?account.balance
(?account.balance lt d.balance (account x rd
(account)))
33
Example Queries
  • Find all customers who have an account at all
    branches located in Brooklyn city.

34
Extended Relational-Algebra-Operations
  • Generalized Projection
  • Outer Join
  • Aggregate Functions

35
Generalized Projection
  • Extends the projection operation by allowing
    arithmetic functions to be used in the projection
    list. ? F1, F2, , Fn(E)
  • E is any relational-algebra expression
  • Each of F1, F2, , Fn are are arithmetic
    expressions involving constants and attributes in
    the schema of E.
  • Given relation credit-info(customer-name, limit,
    credit-balance), find how much more each person
    can spend
  • ?customer-name, limit credit-balance
    (credit-info)

36
Aggregate Functions and Operations
  • Aggregation function takes a collection of values
    and returns a single value as a result.
  • avg average value min minimum value max
    maximum value sum sum of values count
    number of values
  • Aggregate operation in relational algebra
  • G1, G2, , Gn g F1( A1), F2( A2),, Fn( An)
    (E)
  • E is any relational-algebra expression
  • G1, G2 , Gn is a list of attributes on which to
    group (can be empty)
  • Each Fi is an aggregate function
  • Each Ai is an attribute name

37
Aggregate Operation Example
A
B
C
  • Relation r

? ? ? ?
? ? ? ?
7 7 3 10
sum-C
g sum(c) (r)
27
38
Aggregate Operation Example
  • Relation account grouped by branch-name

branch-name
account-number
balance
Perryridge Perryridge Brighton Brighton Redwood
A-102 A-201 A-217 A-215 A-222
400 900 750 750 700
branch-name g sum(balance) (account)
branch-name
balance
Perryridge Brighton Redwood
1300 1500 700
39
Aggregate Functions
  • Result of aggregation does not have a name
  • Can use rename operation to give it a name
  • For convenience, we permit renaming as part of
    aggregate operation

branch-name g sum(balance) as sum-balance
(account)
40
Outer Join Example
  • Relation loan
  • Employee(ename,str,city)
  • Works(ename,cname,sal)
  • Company(cname,city)
  • Manages(ename,mname)
  • Relation borrower

41
SQLLecture 3
42
SQL
  • Data Definition
  • Basic Query Structure
  • Set Operations
  • Aggregate Functions
  • Null Values
  • Nested Subqueries
  • Complex Queries
  • Views

43
Data Definition Language
Allows the specification of not only a set of
relations but also information about each
relation, including
  • The schema for each relation.
  • The domain of values associated with each
    attribute.
  • Integrity constraints
  • The set of indices to be maintained for each
    relations.
  • Security and authorization information for each
    relation.
  • The physical storage structure of each relation
    on disk.

44
Basic Query Structure
  • A typical SQL query has the form select A1, A2,
    ..., An from r1, r2, ..., rm where P
  • order by
  • group by
  • having Q
  • Ais represent attributes
  • ris represent relations
  • P is a predicate.
  • This query is equivalent to the relational
    algebra expression.
  • ?A1, A2, ..., An(?P (r1 x r2 x ... x
    rm))
  • The result of an SQL query is a relation.

45
Set Operations
  • Find all customers who have a loan, an account,
    or both

(select customer-name from depositor) union (sel
ect customer-name from borrower)
  • Find all customers who have both a loan and
    an account.

(select customer-name from depositor) intersect
(select customer-name from borrower)
  • Find all customers who have an account but no
    loan.
  • (select customer-name from depositor) minus
  • (select customer-name from borrower)

46
Aggregate Functions
  • These functions operate on the multiset of values
    of a column of a relation, and return a value
  • avg average value min minimum value max
    maximum value sum sum of values count
    number of values

47
Null Values and Aggregates
  • Total all loan amounts
  • select sum (amount) from loan
  • Above statement ignores null amounts
  • result is null if there is no non-null amount,
    that is the
  • All aggregate operations except count() ignore
    tuples with null values on the aggregated
    attributes.

48
Nested Subqueries
  • SQL provides a mechanism for the nesting of
    subqueries.
  • A subquery is a select-from-where expression that
    is nested within another query.
  • A common use of subqueries is to perform tests
    for set membership, set comparisons, and set
    cardinality.

49
Set Comparison
  • Find all branches that have greater assets than
    some branch located in Brooklyn.

select distinct T.branch-name from branch as T,
branch as S where T.assets gt S.assets and
S.branch-city Brooklyn
  • Same query using gt some clause

select branch-name from branch where assets gt
some (select assets from branch
where branch-city Brooklyn)
50
Example Query
  • Find the names of all branches that have greater
    assets than all branches located in Brooklyn.

select branch-name from branch where assets gt
all (select assets from branch where
branch-city Brooklyn)
51
Test for Empty Relations
  • The exists construct returns the value true if
    the argument subquery is nonempty.
  • exists r ?? r ? Ø
  • not exists r ?? r Ø

52
Example
  • Student(name,sport)

Name
Sport
Yuri soccer Yuri baseball Yuri
tennis Joe football Joe soccer Jane
tennis Jim tennis Yuri tennis Jim
football
Find students that play all sports
.
(Student)
.
Student
sport
53
Example
  • Student(name,sport)
  • Find students that play all sports
  • Select distinct id
  • from students S
  • where not exists (
  • (select distinct
    sport from student)
  • minus
  • (select distinct
    sport from student T
  • where S.id
    T.id)
  • Find students that are playing exactly one sport
  • Select id
  • from (select id, count()
  • from student
  • group by id
  • having count() 1)

54
Entity-Relationship Model
  • Lecture 5

55
ER Model Components
  • Entity Sets
  • Attributes
  • Relationships


56
ER Model
title
year
Name
Address
Star-in
Movies
Stars
length
genre
owns
Studios
Name
Address
57
Relationship
  • A relationship is a cartesian product of
  • n ? 2 entities
  • (e1, e2, en) e1 ? E1, e2 ? E2, , en ?
    Enwhere (e1, e2, , en) is a relationship

58
Degree of a Relationship Set
  • Refers to number of entity sets that participate
    in a relationship set.
  • Relationship sets that involve two entity sets
    are binary (or degree two). Generally, most
    relationship sets in a database system are
    binary.
  • Relationship sets may involve more than two
    entity sets.
  • Relationships between more than two entity sets
    are rare. Most relationships are binary.
  • E.g. Suppose employees of a bank may have jobs
    (responsibilities) at multiple branches, with
    different jobs at different branches. Then there
    is a ternary relationship set between entity sets
    employee, job and branch

59
Courses
Enrolls
Students
Instructors
  • Students Courses TAs
  • Ann CS43005 Jan
  • Sue CS43005 Pat
  • Bob CS43005 Jan

60
Types of Binary Relationships
Many-one
Many-many
One-one
  • Representation of Many-One

Many-one E/R arrow pointing to one.
61
Keys for Relationship Sets
  • The combination of primary keys of the
    participating entity sets forms a super key of a
    relationship set.
  • (customer-id, account-number) is the super key of
    depositor
  • NOTE this means a pair of entity sets can have
    at most one relationship in a particular
    relationship set.
  • E.g. if we wish to track all access-dates to each
    account by each customer, we cannot assume a
    relationship for each access. We can use a
    multivalued attribute though
  • Must consider the mapping cardinality of the
    relationship set when deciding the what are the
    candidate keys
  • Need to consider semantics of relationship set in
    selecting the primary key in case of more than
    one candidate key

62
Converting Multiway to 2-Way
  • Create a new connecting E.S. to represent rows of
    a relationship set.
  • E.g., (Joe's Bar, Bud, 2.50) for the Sells
    relationship.
  • Many-one relationships from the connecting E.S.
    to the others.

BBP
The-Bar
The-Beer
The-Price
Bars
Beers
Price
63
Specialization
  • within an entity set that are distinctive from
    other entities in the set.
  • These subgroupings become lower-level entity sets
    that have attributes or participate in
    relationships that do not apply to the
    higher-level entity set.
  • Depicted by a triangle component labeled ISA
    (E.g. Top-down design process we designate
    subgroupings customer is a person).
  • Attribute inheritance a lower-level entity set
    inherits all the attributes and relationship
    participation of the higher-level entity set to
    which it is linked.

64
Specialization Example
65
Generalization
  • A bottom-up design process combine a number of
    entity sets that share the same features into a
    higher-level entity set.
  • Specialization and generalization are simple
    inversions of each other they are represented in
    an E-R diagram in the same way.
  • The terms specialization and generalization are
    used interchangeably.

66
Aggregation
  • Consider the ternary relationship works-on,
    which we saw earlier
  • Suppose we want to record managers for tasks
    performed by an employee at a branch

67
E-R Diagram With Aggregation
68
Weak Entity Sets
  • An entity set that does not have a primary key is
    referred to as a weak entity set.
  • The existence of a weak entity set depends on the
    existence of a identifying entity set
  • it must relate to the identifying entity set via
    a total, one-to-many relationship set from the
    identifying to the weak entity set
  • Identifying relationship depicted using a double
    diamond
  • The discriminator (or partial key) of a weak
    entity set is the set of attributes that
    distinguishes among all the entities of a weak
    entity set.
  • The primary key of a weak entity set is formed by
    the primary key of the strong entity set on which
    the weak entity set is existence dependent, plus
    the weak entity sets discriminator.

69
Weak Entity Sets
  • We depict a weak entity set by double rectangles.
  • We underline the discriminator of a weak entity
    set with a dashed line.
  • payment-number discriminator of the payment
    entity set
  • Primary key for payment (loan-number,
    payment-number)

70
Example
student
instructor
enrolls
teaches
offering
isoffered
course
requires
71
Reduction of an E-R Schema to Tables
  • Primary keys allow entity sets and relationship
    sets to be expressed uniformly as tables which
    represent the contents of the database.
  • A database which conforms to an E-R diagram can
    be represented by a collection of tables.
  • For each entity set and relationship set there is
    a unique table which is assigned the name of the
    corresponding entity set or relationship set.
  • Each table has a number of columns (generally
    corresponding to attributes), which have unique
    names.

72
Representing Entity Sets as Tables
  • A strong entity set reduces to a table with the
    same attributes.

73
Representing Relationship Sets as Tables
  • A many-to-many relationship set is represented as
    a table with columns for the primary keys of the
    two participating entity sets, and any
    descriptive attributes of the relationship set.
  • E.g. table for relationship set borrower

74
Additional Rules for Translating Relationship
into Relation
If one entity set participates several times in
the relationship with different roles, its key
attributes must be listed as many times and with
different names for each role. Studies(SSN,
Name) Favorite(SSN, Name) Friends(SSN1, SSN2)
Name
SSN
subject
studies
Student
friends
favorite
75
Redundancy of Tables
  • Many-to-one relationship sets that are total on
    the many-side can be represented by adding an
    extra attribute to the many side, containing the
    primary key of the one side
  • Example We eliminate relation Favorite and we
    extend relation for Student as follows
  • Student(SSN, Name, Subject.name)
  • If, however, the relationship is many-to-many we
    cannot do that since it leads to redundancy
  • For example relation Studies cannot be
    eliminated since otherwise we may end up with
  • 111-222-333 John OS
  • 111-222-333 John DBMS

76
Representing Weak Entity Sets
  • A weak entity set becomes a table that includes a
    column for the primary key of the identifying
    strong entity set

77
Representing Specialization as Tables
  • Form a table for the higher level entity
  • Form a table for each lower level entity set,
    include primary key of higher level entity set
    and local attributes table table
    attributesperson name, street, city
    customer name, credit-ratingemployee name,
    salary
  • Drawback getting information about, e.g.,
    employee requires accessing two tables

78
Relations Corresponding to Aggregation
  • To represent aggregation, create a table
    containing
  • primary key of the aggregated relationship,
  • the primary key of the associated entity set
  • Any descriptive attributes

79
Example
ssn
name
ISA
person
passenger
age
booked
ISA
date
departure
assigned
pilot
fhrs
gate
instantof
canfly
flight
plane
dtime
atime
F
man
model
80
Example
ssn
Name
name
Plays
Player
Team
age
Uses
ISA
Has
Colors
Captain
Likes-Colors
likes
Fan
name
addr
81
Relational Database Design Theory
  • Lecture 6

82
First Normal Form
  • Domain is atomic if its elements are considered
    to be indivisible units
  • Examples of non-atomic domains
  • Set of names, composite attributes
  • Identification numbers like CS101 that can be
    broken up into parts
  • A relational schema R is in first normal form if
    the domains of all attributes of R are atomic
  • Non-atomic values complicate storage and
    encourage redundant (repeated) storage of data
  • Example Set of accounts stored with each
    customer, and set of owners stored with each
    account
  • We assume all relations are in first normal form

83
First Normal Form
  • Atomicity is actually a property of how the
    elements of the domain are used.
  • Example Strings would normally be considered
    indivisible
  • Suppose that students are given roll numbers
    which are strings of the form CS0012 or EE1127
  • If the first two characters are extracted to find
    the department, the domain of roll numbers is not
    atomic.
  • Doing so is a bad idea leads to encoding of
    information in application program rather than in
    the database.

84
Functional Dependencies
  • Constraints on the set of legal relations.
  • Require that the value for a certain set of
    attributes determines uniquely the value for
    another set of attributes.
  • A functional dependency is a generalization of
    the notion of a key.

85
Functional Dependencies
  • Let R(A1, A2, .Ak) be a relational schema X and
    Y are subsets of A1, A2, Ak. We say that X-gtY,
  • if any two tuples that agree on X, then
    they agree on Y.
  • Example
  • Student(SSN,Name,Addr,subjectTaken,favSubject,Prof
    )
  • SSN-gtName
  • SSN-gtAddr
  • subjectTaken-gtProf
  • Assign(Pilot,Flight,Date,Departs)
  • Pilot,Date,Departs-gtFlight
  • Flight,Date-gtPilot

86
Functional Dependencies
  • A functional dependency X-gtY is trivial if it is
    satisfied by any relation that includes
    attributes from X and Y
  • E.g.
  • customer-name, loan-number ? customer-name
  • customer-name ? customer-name
  • In general, ? ? ? is trivial if ? ? ?

87
Closure of a Set of Functional Dependencies
  • Given a set F set of functional dependencies,
    there are certain other functional dependencies
    that are logically implied by F.
  • E.g. If A ? B and B ? C, then we can infer
    that A ? C
  • The set of all functional dependencies logically
    implied by F is the closure of F.
  • We denote the closure of F by F.

88
Closure of a Set of Functional Dependencies
  • An inference axiom is a rule that states if a
    relation satisfies certain FDs, it must also
    satisfy certain other FDs
  • Set of inference rules is sound if the rules lead
    only to true conclusions
  • Set of inference rules is complete, if it can be
    used to conclude every valid FD on R
  • We can find all of F by applying Armstrongs
    Axioms
  • if ? ? ?, then ? ? ?
    (reflexivity)
  • if ? ? ?, then ? ? ? ? ?
    (augmentation)
  • if ? ? ?, and ? ? ?, then ? ? ? (transitivity)
  • These rules are
  • sound and complete

89
Example
  • R (A, B, C, G, H, I)F A ? B A ? C CG
    ? H CG ? I B ? H
  • some members of F
  • A ? H
  • by transitivity from A ? B and B ? H
  • AG ? I
  • by augmenting A ? C with G, to get AG ? CG
    and then transitivity with CG ? I

90
Procedure for Computing F
  • To compute the closure of a set of functional
    dependencies F
  • F Frepeat for each functional
    dependency f in F apply reflexivity and
    augmentation rules on f add the resulting
    functional dependencies to F for each pair of
    functional dependencies f1and f2 in F if
    f1 and f2 can be combined using transitivity
    then add the resulting functional dependency to
    Funtil F does not change any further

91
Closure of Attribute Sets
  • Given a set of attributes a, define the closure
    of a under F (denoted by a) as the set of
    attributes that are functionally determined by a
    under F a ? ? is in F ? ? ? a
  • Algorithm to compute a, the closure of a under
    F result a while (changes to result)
    do for each ? ? ? in F do begin if ? ?
    result then result result ? ? end

92
Uses of Attribute Closure
  • There are several uses of the attribute closure
    algorithm
  • Testing for superkey
  • To test if ? is a superkey, we compute ?, and
    check if ? contains all attributes of R.
  • Testing functional dependencies
  • To check if a functional dependency ? ? ? holds
    (or, in other words, is in F), just check if ? ?
    ?.
  • That is, we compute ? by using attribute
    closure, and then check if it contains ?.
  • Is a simple and cheap test, and very useful
  • Computing closure of F
  • For each ? ? R, we find the closure ?, and for
    each S ? ?, we output a functional dependency ?
    ? S.

93
Example of Attribute Set Closure
  • R (A, B, C, G, H, I)
  • F A ? B, A ? C, CG ? H, CG ? I, B ? H
  • (AG)
  • 1. result AG
  • 2. result ABCG (A ? C and A ? B)
  • 3. result ABCGH (CG ? H and CG ? AGBC)
  • 4. result ABCGHI (CG ? I and CG ? AGBCH)
  • Is AG a key?
  • Is AG a super key?
  • Does AG ? R? Is (AG) ? R
  • Is any subset of AG a superkey?
  • Does A ? R? Is (A) ? R
  • Does G ? R? Is (G) ? R

94
Extraneous Attributes
  • Consider a set F of functional dependencies and
    the functional dependency ? ? ? in F.
  • Attribute A is extraneous in ? if A ? ? and
  • F logically implies (? A) ? ? or
  • Attribute A ? ? is extraneous in ? if A ? ?
    and the set of functional dependencies (F
    ? ? ?) ? ? ?(? A) logically implies F.
  • Example Given F A ? C, AB ? C
  • B is extraneous in AB ? C because A ? C
    logically implies AB? C, A ?C.
  • Example Given F A ? C, AB ? CD
  • C is extraneous in AB ? CD since AB ? D,A ?C
    implies AB ? C

95
Testing if an Attribute is Extraneous
  • Consider a set F of functional dependencies and
    the functional dependency ? ? ? in F.
  • To test if attribute A ? ? is extraneous in ?
  • compute (? A) using the dependencies in
  • F
  • 2. check that (? A) contains A if it does,
    A is extraneous
  • To test if attribute A ? ? is extraneous in ?
  • compute ? using only the dependencies in
    F (F ? ? ?) ? ? ?(? A),
  • check that ? contains A if it does, A is
    extraneous

96
Canonical Cover
  • Sets of functional dependencies may have
    redundant dependencies that can be inferred from
    the others
  • Eg A ? C is redundant in A ? B, B ? C,
    A ? C
  • Parts of a functional dependency may be redundant
  • E.g. A ? B, B ? C, A ? CD can be
    simplified to A ? B,
    B ? C, A ? D
  • E.g. A ? B, B ? C, AC ? D can be
    simplified to A ? B,
    B ? C, A ? D
  • A canonical cover of F is a minimal set of
    functional dependencies equivalent to F, having
    no redundant dependencies or redundant parts of
    dependencies

97
Canonical Cover(Formal Definition)
  • A canonical cover for F is a set of dependencies
    Fc such that
  • F logically implies all dependencies in Fc, and
  • Fc logically implies all dependencies in F, and
  • No functional dependency in Fc contains an
    extraneous attribute, and
  • Each left side of functional dependency in Fc is
    unique.

98
Canonical CoverComputation
  • To compute a canonical cover for Frepeat Use
    the union rule to replace any dependencies in
    F ?1 ? ?1 and ?1 ? ?1 with ?1 ? ?1 ?2 Find a
    functional dependency ? ? ? with an extraneous
    attribute either in ? or in ? If an extraneous
    attribute is found, delete it from ? ? ? until F
    does not change

99
Example of Computing a Canonical Cover
  • R (A, B, C)F A ? BC B ? C A ? B AB ?
    C
  • Combine A ? BC and A ? B into A ? BC
  • A is extraneous in AB ? C
  • Set is now A ? BC, B ? C
  • C is extraneous in A ? BC
  • Check if A ? C is logically implied by A ? B and
    the other dependencies
  • Yes using transitivity on A ? B and B ? C.
  • The canonical cover is A ? B B ? C

100
Decomposition
  • All attributes of an original schema (R) must
    appear in the decomposition (R1, R2)
  • R R1 ? R2
  • Lossless-join decomposition.For all possible
    relations r on schema R
  • r ?R1 (r) ?R2 (r)
  • A decomposition of R into R1 and R2 is lossless
    join if and only if at least one of the following
    dependencies is in F
  • R1 ? R2 ? R1
  • R1 ? R2 ? R2

101
Normalization Using Functional Dependencies
  • When we decompose a relation schema R with a set
    of functional dependencies F into R1, R2,.., Rn
    we want
  • Lossless-join decomposition Otherwise
    decomposition would result in information loss.
  • Dependency preservation Let Fi be the set of
    dependencies F that include only attributes in
    Ri.
  • (F1 ? F2 ? ? Fn) F
  • .

102
Example
  • R (A, B, C)F A ? B, B ? C)
  • Can be decomposed in two different ways
  • R1 (A, B), R2 (B, C)
  • Lossless-join decomposition
  • R1 ? R2 B and B ? BC
  • Dependency preserving
  • R1 (A, B), R2 (A, C)
  • Lossless-join decomposition
  • R1 ? R2 A and A ? AB
  • Not dependency preserving (cannot check B ? C
    without computing R1 R2)

103
Testing for Dependency Preservation
  • To check if a dependency ??? is preserved in a
    decomposition of R into R1, R2, , Rn we apply
    the following simplified test (with attribute
    closure done w.r.t. F)
  • result ?while (changes to result) do for each
    Ri in the decomposition t (result ? Ri) ?
    Ri result result ? t
  • If result contains all attributes in ?, then the
    functional dependency ? ? ? is preserved.
  • We apply the test on all dependencies in F to
    check if a decomposition is dependency preserving
  • This procedure takes polynomial time, instead of
    the exponential time required to compute F and
    (F1 ? F2 ? ? Fn)

104
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a
set F of functional dependencies if for all
functional dependencies in F of the form ??? ?,
where ? ? R and ? ? R, at least one of the
following holds
  • ?? ? ? is trivial (i.e., ? ? ?)
  • ? is a superkey for R

105
Example
  • R (A, B, C)F A ? B B ? CKey A
  • R is not in BCNF
  • Decomposition R1 (A, B), R2 (B, C)
  • R1 and R2 in BCNF
  • Lossless-join decomposition
  • Dependency preserving

106
Testing for BCNF
  • To check if a non-trivial dependency ???? causes
    a violation of BCNF
  • 1. compute ? (the attribute closure of ?), and
  • 2. verify that it includes all attributes of R
  • Using only F is incorrect when testing a relation
    in a decomposition of R
  • E.g. Consider R (A, B, C, D), with F A ?B, B
    ?C
  • Decompose R into R1(A,B) and R2(A,C,D)
  • Neither of the dependencies in F contain only
    attributes from (A,C,D) so we might be mislead
    into thinking R2 satisfies BCNF.
  • In fact, dependency A ? C in F shows R2 is not
    in BCNF.

107
BCNF Decomposition Algorithm
  • result Rdone falsecompute Fwhile
    (not done) do if (there is a schema Ri in result
    that is not in BCNF) then begin let ?? ? ?
    be a nontrivial functional dependency that holds
    on Ri such that ?? ? Ri is not in F, and ? ? ?
    ?result (result Ri ) ? (Ri ?) ? (?, ?
    ) end else done true
  • Each Ri is in BCNF, and decomposition is
    lossless-join.

108
BCNF and Dependency Preservation
It is not always possible to get a BCNF
decomposition that is dependency preserving
  • R (A, B, C)F AB ? C C ? BTwo candidate
    keys AB and AC
  • R is not in BCNF
  • Any decomposition of R will fail to preserve
  • AB ? C

109
Third Normal Form
  • A relation schema R is in third normal form (3NF)
    if for all ? ? ? in F at least one of the
    following holds
  • ? ? ? is trivial (i.e., ? ? ?)
  • ? is a superkey for R
  • Each attribute A in ? ? is contained in a
    candidate key for R.
  • If a relation is in BCNF it is in 3NF (since in
    BCNF one of the first two conditions above must
    hold).
  • Third condition is a minimal relaxation of BCNF
    to ensure dependency preservation.

110
Third Normal Form
  • Example
  • R (A,B,C)F AB ? C, C ? B
  • Two candidate keys AB and AC
  • R is in 3NF
  • AB ? C AB is a superkey C ? B B is contained
    in a candidate key
  • BCNF decomposition has (AC) and (BC)
  • Testing for AB ? C requires a join

111
Testing for 3NF
  • Use attribute closure to check for each
    dependency ? ? ?, if ? is a superkey.
  • If ? is not a superkey, we have to verify if each
    attribute in ? is contained in a candidate key of
    R
  • this test is rather more expensive, since it
    involve finding candidate keys
  • testing for 3NF has been shown to be NP-hard
  • However, decomposition into third normal form can
    be done in polynomial time

112
3NF Decomposition Algorithm
  • Let Fc be a canonical cover for Fi 0for
    each functional dependency ? ? ? in Fc do if
    none of the schemas Rj, 1 ? j ? i contains ? ?
    then begin i i 1 Ri ? ?
    endif none of the schemas Rj, 1 ? j ? i
    contains a candidate key for R then begin i
    i 1 Ri any candidate key for
    R end return (R1, R2, ..., Ri)

113
Storage Hierarchy
114
Storage Hierarchy
  • primary storage Fastest media but volatile
    (cache, main memory).
  • secondary storage next level in hierarchy,
    non-volatile, moderately fast access time
  • also called on-line storage
  • E.g. flash memory, magnetic disks
  • tertiary storage lowest level in hierarchy,
    non-volatile, slow access time
  • also called off-line storage
  • E.g. magnetic tape, optical storage

115
Magnetic Disks
  • Disk controller interfaces between the computer
    system and the disk drive hardware.
  • accepts high-level commands to read or write a
    sector
  • initiates actions such as moving the disk arm to
    the right track and actually reading or writing
    the data
  • Computes and attaches checksums to each sector to
    verify that data is read back correctly
  • If data is corrupted, with very high probability
    stored checksum wont match recomputed checksum
  • Ensures successful writing by reading back sector
    after writing it
  • Performs remapping of bad sectors

116
Disk Subsystem
  • Multiple disks connected to a computer system
    through a controller
  • Controllers functionality (checksum, bad sector
    remapping) often carried out by individual disks
    reduces load on controller
  • Disk interface standards families
  • ATA (AT adaptor) range of standards
  • SCSI (Small Computer System Interconnect) range
    of standards
  • Several variants of each standard (different
    speeds and capabilities)

117
Performance Measures of Disks
  • Access time the time it takes from when a read
    or write request is issued to when data transfer
    begins. Consists of
  • Seek time time it takes to reposition the arm
    over the correct track.
  • Average seek time is 1/2 the worst case seek
    time.
  • 4 to 10 milliseconds on typical disks
  • Rotational latency time it takes for the sector
    to be accessed to appear under the head.
  • Average latency is 1/2 of the worst case
    latency.
  • 4 to 11 milliseconds on typical disks
  • Data-transfer rate the rate at which data can
    be retrieved from or stored to the disk.
  • 4 to 8 MB per second is typical
  • Multiple disks may share a controller, so
    transfer rate that controller can handle is also
    important
  • E.g. ATA-5 66 MB/second, SCSI-3 40 MB/s
  • Fiber Channel 256 MB/s

118
Storage Access
  • A database file is partitioned into fixed-length
    storage units called blocks. Blocks are units of
    both storage allocation and data transfer.
  • Database system seeks to minimize the number of
    block transfers between the disk and memory. We
    can reduce the number of disk accesses by keeping
    as many blocks as possible in main memory.
  • Buffer portion of main memory available to
    store copies of disk blocks.
  • Buffer manager subsystem responsible for
    allocating buffer space in main memory.

119
Buffer Manager
  • Programs call on the buffer manager when they
    need a block from disk.
  • If the block is already in the buffer, the
    requesting program is given the address of the
    block in main memory
  • If the block is not in the buffer,
  • the buffer manager allocates space in the buffer
    for the block, replacing (throwing out) some
    other block, if required, to make space for the
    new block.
  • The block that is thrown out is written back to
    disk only if it was modified since the most
    recent time that it was written to/fetched from
    the disk.
  • Once space is allocated in the buffer, the buffer
    manager reads the block from the disk to the
    buffer, and passes the address of the block in
    main memory to requester.

120
Buffer-Replacement Policies
  • Most operating systems replace the block least
    recently used (LRU strategy)
  • Idea behind LRU use past pattern of block
    references as a predictor of future references
  • Queries have well-defined access patterns (such
    as sequential scans), and a database system can
    use the information in a users query to predict
    future references
  • LRU can be a bad strategy for certain access
    patterns involving repeated scans of data
  • e.g. when computing the join of 2 relations r
    and s by a nested loops for each tuple tr of r
    do for each tuple ts of s do if the
    tuples tr and ts match
  • Mixed strategy with hints on replacement strategy
    providedby the query optimizer is preferable

121
Buffer-Replacement Policies
  • Pinned block memory block that is not allowed
    to be written back to disk.
  • Toss-immediate strategy frees the space
    occupied by a block as soon as the final tuple of
    that block has been processed
  • Most recently used (MRU) strategy system must
    pin the block currently being processed. After
    the final tuple of that block has been processed,
    the block is unpinned, and it becomes the most
    recently used block.
  • Buffer manager can use statistical information
    regarding the probability that a request will
    reference a particular relation
  • E.g., the data dictionary is frequently accessed.
    Heuristic keep data-dictionary blocks in main
    memory buffer
  • Buffer managers also support forced output of
    blocks for the purpose of recovery

122
File Organization
  • The database is stored as a collection of files.
    Each file is a sequence of records. A record is
    a sequence of fields.
  • One approach
  • assume record size is fixed
  • each file has records of one particular type only
  • different files are used for different relations
  • This case is easiest to implement will consider
    variable length records later.

123
Fixed-Length Records
  • Simple approach
  • Store record i starting from byte n ? (i 1),
    where n is the size of each record.
  • Record access is simple but records may cross
    blocks
  • Modification do not allow records to cross block
    boundaries
  • Deletion of record I alternatives
  • move records i 1, . . ., n to i, . . . , n 1
  • move record n to i
  • do not move records, but link all free records
    on afree list

124
Variable-Length Records Slotted Page Structure
  • Slotted page header contains
  • number of record entries
  • end of free space in the block
  • location and size of each record
  • Records can be moved around within a page to keep
    them contiguous with no empty space between them
    entry in the header must be updated.
  • Pointers should not point directly to record
    instead they should point to the entry for the
    record in header.

125
Organization of Records in Files
  • Heap a record can be placed anywhere in the
    file where there is space
  • Sequential store records in sequential order,
    based on the value of the search key of each
    record
  • Hashing a hash function computed on some
    attribute of each record the result specifies in
    which block of the file the record should be
    placed
  • Records of each relation may be stored in a
    separate file. In a clustering file organization
    records of several different relations can be
    stored in the same file
  • Motivation store related records on the same
    block to minimize I/O

126
Sequential File Organization
  • Suitable for applications that require sequential
    processing of the entire file
  • The records in the file are ordered by a
    search-key

127
Clustering File Organization
  • Simple file structure stores each relation in a
    separate file
  • Can instead store several relations in one file
    using a clustering file organization
  • E.g., clustering organization of customer and
    depositor
  • l scan using a secondary index is expensive
  • each record access may fetch a new block from
    disk
  • an entry was deleted from their parent)
  • Root node then had only one child, and was
    deleted and its child became the new root node
Write a Comment
User Comments (0)
About PowerShow.com