Title: Midterm Review II
1Midterm Review II
2Anomalies
- Redundancy.
- Information may be repeated unnecessarily in
several tuples. - E.g. length and filmType.
- Update anomalies.
- We may change information in one tuple but leave
it unchanged in other tuples. - E.g. we could change the length of Star Wars to
125, in the first tuple, and forget to do the
same in the second and third tuple. - Deletion anomalies.
- If a set of values becomes empty, we may lose
other information as a side effect. - E.g. if we delete Emilio Estevez we will lose all
the information about Mighty Ducks.
3Decomposing Relations - Example
Movie1 relation
- No true redundancy!
- The update anomaly disappeared. If we change the
length of a movie, it is done only once. - The deletion anomaly disappeared. If we delete
all the stars from Movie2 we still will have the
other info for a movie.
Movie2 relation
4Boyce-Codd Normal Form
- The goal of decomposition is to replace a
relation by several that do not exhibit
anomalies. - There is a simple condition under which the
anomalies can be guaranteed not to exist. - This condition is called Boyce-Codd Normal Form,
or BCNF. - A relation is in BCNF if
- Whenever there is a nontrivial dependency
- A1A2An?B1B2Bm
- for R, it must be the case that the left hand
side - A1 , A2 , , An is a superkey for R.
5Boyce-Codd Normal Form - Example
Violating BCNF
- Relation Movie in the previous figure is not in
BCNF. - Consider the FD title year?length filmType
studioName - Unfortunately, the left side of the above
dependency is not a superkey. - In particular we know that the title and the year
does not functionally determine starName. - On the other hand, Movie1 is in BCNF.
- The only key is title, year and
- title year ? length filmType studioName is the
only (non-trivial) FD that holds in the relation.
6Decomposition into BCNF
- The decomposition strategy is
- Find a non-trivial FD A1A2An?B1B2Bm that
violates BCNF, i.e. A1A2An is not a superkey. - Decompose the relation schema into two
overlapping relation schemas - One is all the attributes involved in the
violating dependency and - the other is the left side and all the other
attributes not involved in the dependency. - By repeatedly, choosing suitable decompositions,
we can break any relation schema into a
collection of smaller schemas in BCNF. - The data in the original relation is represented
faithfully by the data in the relations that are
the result of the decomposition. - i.e. we can reconstruct the original relation
exactly from the decomposed relations.
7Boyce-Codd Normal Form - Example
Consider relation schema Movies(title, year,
studioName, president, presAddr) and functional
dependencies title year ? studioName studioName
? president president ? presAddr Last two FDs
violate BCNF. Why? Compute title, year,
studioName, president and see if you get
all the attributes of the relation. If not, you
got some FD which is violates BCNF, and need to
break relation.
8Boyce-Codd Normal Form Example
- Lets decompose starting with
- studioName ? president
- Lets add to the right-hand side any other
attributes in the closure of studioName (optional
rule of thumb). - XstudioName studioName?president
- XstudioName, president president?presAddr
- XstudioNamestudioName, president, presAddr
9Boyce-Codd Normal Form Example
From the closure we get studioName?president
presAddr We decompose the relation schema into
the following two schemas Movies1(studioName,
president, presAddr) Movies2(title, year,
studioName) Movies2 is in BCNF. Because we cant
find a bad FD holding there. What about
Movies1? The following dependency violates
BCNF. president?presAddr Why its bad to leave
Movies1 table as is? If many studios share the
same president than we would have redundancy when
repeating the presAddr in all those studios.
10Boyce-Codd Normal Form Example
We must decompose Movies1, using the
FD president?presAddr The resulting relation
schemas, both in BCNF, are Movies11(title, year,
studioName) Movies12(studioName, president) In
general, we must keep applying the decomposition
rule as many times as needed, until all our
relations are in BCNF. So, finally we got
Movies11, Movies12, and Movies2.
11Finding FDs for the decomposed relations
- When we decompose a relation, we need to check
that the resulting schemas are in BCNF. - We cant tell a relation is in BCNF, unless we
can determine the FDs that hold for that
relation.
12Finding FDs for the decomposed relations
- Suppose S is one of the resulting relations in a
decomposition of R. - For this
- Consider each subset X of attributes of S.
- Compute X using the FD on R.
- At the end throw out the attributes of R, which
arent in S. - Then, for each attribute B such that
- B is an attribute of S,
- B is in X
- we have that the functional dependency X?B holds
in S.
13Relational Algebra Operations
- Operations of relational algebra fall into four
broad classes - The usual set operations
- union
- intersection
- difference
- Operations that remove parts of a relation
- selection eliminates some rows(tuples)
- projection eliminates some columns
- Operations that combine the tuples of two
relations - Cartesian product pairs the tuples of two
relations in all possible ways - join selectively pairs tuples from two
relations. - An operation called renaming.
14Conditions for Set Operations on Relations
- R and S must have schemas with identical sets of
attributes. - Before applying the operations, the columns of R
and S must be ordered so that the order of
attributes is the same for both relations.
15Projection
Produces from a relation R a new relation that
has only some of Rs columns. ?A1, A2,,An(R) is
a relation that has only the columns for
attributes A1, A2,, An of R. Example Compute
the expression ?title, year, length(Movies) on
the table title year length filmType
studioName producerC Star wars
1977 124 color Fox 12345 Mighty Ducks
1991 104 color Disney 67890 Waynes World
1992 95 color Paramount 99999
16Example (Continued)
Resulting relation title year length Star
wars 1977 124 Mighty Ducks 1991 104 Waynes
World 1992 95 What about ?filmtype(Movies)
17Selection
Selection, applied to a relation R, produces a
new relation with a subset of Rs tuples. The
tuples in the result are those that satisfy some
condition C. Denote it with ?C( R ). The schema
for the resulting relation is the same as Rs
schema. Example The expression
?length?100(Movie) is title
year length filmType studioName producerC Star
wars 1977 124 color Fox 12345 Mighty
Ducks 1991 104 color Disney 67890
18Cartesian Product
Cartesian Product of two relations R and S is the
set of pairs that can be formed by choosing the
first element of the pair to be any element of R
and the second an element of S. This denoted as
R?S. Example R A B S B C D
1 2 2 5 6 3 4
4 7 8 9 10 11
R?S A R.B S.B C D 1 2 2 5 6 1 2 4 7 8 1 2 9 1
0 11 3 4 2 5 6 3 4 4 7 8 3 4 9 10 11
19Natural Join
Denoted as R S. Let A1, A2,,An be the
attributes in both the schema of R and the schema
of S. Then a tuple r from R and a tuple s from S
are successfully paired if and only if r and s
agree on each of the attributes A1, A2, , An.
Example The natural join of the relation R and
S from previous example is A B C D
1 2 5 6 3 4 7 8
20Combing Operations to Form Queries
- What are the title and years of movies made by
Fox that are at least 100 minutes long? - One way to compute the answer to this query is
- Select those Movie tuples that have length ?
100. - Select those Movie tuples that have studioName
Fox. - Compute the intersection of first and second
steps. - Project the relation from the third step onto
attributes title and year.
21Another Example
Consider two relations Movie1 and Movie2, With
schemas Movie1(title, year, length, filmType,
studioName) Movie2(title, year, starName) Suppose
we want to know Find the stars of the movies
that are at least 100 minutes long. First we
join the two relations Movie1, Movie2 Second we
select movies with length at least 100 min. Then
we project the starName.
22Relational Algebra on Bags
- A bag is like a set, but an element may appear
more than once. - Example 1,2,1,3 is a bag. 1,2,3 is also a
bag that happens to be a set. - Bags also resemble lists, but order in a bag is
unimportant. - Example
- 1,2,1 1,1,2 as bags, but
- 1,2,1 ! 1,1,2 as lists.
23Operations on Bags
- Selection applies to each tuple, so its effect on
bags is like its effect on sets. - Projection also applies to each tuple, but as a
bag operator, we do not eliminate duplicates. - Products and joins are done on each pair of
tuples, so duplicates in bags have no effect on
how we operate.
24Bag Union, Intersection, Difference
- An element appears in the union of two bags the
sum of the number of times it appears in each
bag. - Example 1,2,1 ? 1,1,2,3,1
1,1,1,1,1,2,2,3 - An element appears in the intersection of two
bags the minimum of the number of times it
appears in either. - Example 1,2,1 ?? 1,2,3 1,2.
- An element appears in difference A B of bags
as many times as it appears in A, minus the
number of times it appears in B. - But never less than 0 times.
- Example 1,2,1 1,2,3 1.
25The Extended Algebra
- ? eliminate duplicates from bags.
- ? sort tuples.
- Extended projection arithmetic, duplication of
columns. - ? grouping and aggregation.
- OUTERJOIN avoids dangling tuples tuples that
do not join with anything.
26Example Outerjoin
R A B S B C 1 2 2 3 4 5 6 7 (1,2)
joins with (2,3), but the other two tuples are
dangling.
27Aggregation Operators
- They apply to entire columns of a table and
produce a single result. - The most important examples
- SUM
- AVG
- COUNT
- MIN
- MAX
28Example Aggregation
R A B 1 3 3 4 3 2
SUM(A) 7 COUNT(A) 3 MAX(B) 4 MIN(B)
2 AVG(B) 3
29Grouping Operator
- R1 ?L (R2).
- L is a list of elements that are either
- Grouping attributes.
- AGG(A), where AGG is one of the aggregation
operators and A is an attribute. - Semantics
- Group R according to all the grouping attributes
on list L. - That is, form one group for each distinct list of
values for those attributes in R. - Within each group, compute AGG(A) for each
aggregation on list L. - Result has grouping attributes and aggregations
as attributes. - One tuple for each list of values for the
grouping attributes and their groups
aggregations.
30Example Grouping/Aggregation
R A B C 1 2 3 4 5 6 1 2 5 ? A,B,AVG(C) (R)
??
31Example Grouping/Aggregation
- StarsIn(title, year, starName)
- We want, for each star who has appeared in at
least three movies the earliest year in which he
or she appeared. - First we group, using starName as a grouping
attribute. - Then, we have to compute the MIN(year) for each
group. - However, we need also compute COUNT(title)
aggregate for each group, in order to filter out
those stars with less than three movies. - ?ctTitlegt3?starName,MIN(year)?minYear,COUNT(title
)?ctTitle(StarsIn)
32Aggregations in SQL
- SUM, AVG, COUNT, MIN, and MAX can be applied to a
column in a SELECT clause to produce that
aggregation on the column. - Find the average length of movies from Disney.
- SELECT AVG(length)
- FROM Movie
- WHERE studioName 'Disney'
33Eliminating Duplicates in an Aggregation
- DISTINCT inside an aggregation causes duplicates
to be eliminated before the aggregation. - Example Find the number of different producers
for Disney movies - SELECT COUNT(DISTINCT producerc)
- FROM Movie
- WHERE studioname 'Disney'
- This is not the same as
- SELECT DISTINCT COUNT(producerc)
- FROM Movie
- WHERE studioname 'Disney'
34NULLs Ignored in Aggregation
- NULL never contributes to a sum, average, or
count, and can never be the minimum or maximum of
a column. - select SUM(networth)
- from movieexec
35Example Effect of NULLs
- SELECT count()
- FROM Movie
- WHERE studioName 'Disney'
- SELECT count(length)
- FROM Movie
- WHERE studioName 'Disney'
The number of movies from Disney.
36Grouping
- We may follow a SELECT-FROM-WHERE expression by
GROUP BY and a list of attributes. - The relation that results from the
SELECT-FROM-WHERE is grouped according to the
values of all those attributes, and any
aggregation is applied only within each group. - From Movie relation, find the average length for
each studio - SELECT studioName, AVG(length)
- FROM Movie
- GROUP BY studioName
37Example Grouping
- Find the producers total length of film
produced. -
- SELECT name, SUM(length)
- FROM Movie, MovieExec
- WHERE producerc cert
- GROUP BY name
38Restriction on SELECT Lists With Aggregation
- If any aggregation is used, then each element of
the SELECT list must be either - Aggregated, or
- An attribute on the GROUP BY list.
39Illegal Query Example
- We might think we could find the shortest movie
of Disney as - SELECT title, MIN(length)
- FROM Movie
- WHERE studioName 'Disney'
- But this query is illegal in SQL. Why?
- Because title is neither aggregated nor on the
GROUP BY list. - We should do instead
- SELECT title, length
- FROM Movie
- WHERE studioName 'Disney' AND length
- (SELECT MIN(length)
- FROM Movie
- WHERE studioName 'Disney')
40HAVING Clauses
- HAVING ltconditiongt may follow a GROUP BY clause.
- If so, the condition applies to each group, and
groups not satisfying the condition are
eliminated. - These conditions may refer to attributes that
make sense within a group i.e., they are either - Grouping attributes, or
- Aggregated attributes.
41Example HAVING
- Suppose that we didnt wish to include all the
producers in our table of aggregated movie
lengths. - Suppose for instance we want those producers who
have at least one movie before 1972. - SELECT name, SUM(length)
- FROM MovieExec, Movie
- WHERE producerc cert
- GROUP BY name
- HAVING MIN(year) lt 1973