Midterm Review II - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

Midterm Review II

Description:

We may change information in one tuple but leave it unchanged ... Mike Meyers. Dana Carvey. Emilio Estevez. Harrison Ford. Mark Hamill. Carrie Fisher. starName ... – PowerPoint PPT presentation

Number of Views:18
Avg rating:3.0/5.0
Slides: 42
Provided by: scie232
Category:
Tags: meyers | midterm | review

less

Transcript and Presenter's Notes

Title: Midterm Review II


1
Midterm Review II
2
Anomalies
  • Redundancy.
  • Information may be repeated unnecessarily in
    several tuples.
  • E.g. length and filmType.
  • Update anomalies.
  • We may change information in one tuple but leave
    it unchanged in other tuples.
  • E.g. we could change the length of Star Wars to
    125, in the first tuple, and forget to do the
    same in the second and third tuple.
  • Deletion anomalies.
  • If a set of values becomes empty, we may lose
    other information as a side effect.
  • E.g. if we delete Emilio Estevez we will lose all
    the information about Mighty Ducks.

3
Decomposing Relations - Example
Movie1 relation
  • No true redundancy!
  • The update anomaly disappeared. If we change the
    length of a movie, it is done only once.
  • The deletion anomaly disappeared. If we delete
    all the stars from Movie2 we still will have the
    other info for a movie.

Movie2 relation
4
Boyce-Codd Normal Form
  • The goal of decomposition is to replace a
    relation by several that do not exhibit
    anomalies.
  • There is a simple condition under which the
    anomalies can be guaranteed not to exist.
  • This condition is called Boyce-Codd Normal Form,
    or BCNF.
  • A relation is in BCNF if
  • Whenever there is a nontrivial dependency
  • A1A2An?B1B2Bm
  • for R, it must be the case that the left hand
    side
  • A1 , A2 , , An is a superkey for R.

5
Boyce-Codd Normal Form - Example
Violating BCNF
  • Relation Movie in the previous figure is not in
    BCNF.
  • Consider the FD title year?length filmType
    studioName
  • Unfortunately, the left side of the above
    dependency is not a superkey.
  • In particular we know that the title and the year
    does not functionally determine starName.
  • On the other hand, Movie1 is in BCNF.
  • The only key is title, year and
  • title year ? length filmType studioName is the
    only (non-trivial) FD that holds in the relation.

6
Decomposition into BCNF
  • The decomposition strategy is
  • Find a non-trivial FD A1A2An?B1B2Bm that
    violates BCNF, i.e. A1A2An is not a superkey.
  • Decompose the relation schema into two
    overlapping relation schemas
  • One is all the attributes involved in the
    violating dependency and
  • the other is the left side and all the other
    attributes not involved in the dependency.
  • By repeatedly, choosing suitable decompositions,
    we can break any relation schema into a
    collection of smaller schemas in BCNF.
  • The data in the original relation is represented
    faithfully by the data in the relations that are
    the result of the decomposition.
  • i.e. we can reconstruct the original relation
    exactly from the decomposed relations.

7
Boyce-Codd Normal Form - Example
Consider relation schema Movies(title, year,
studioName, president, presAddr) and functional
dependencies title year ? studioName studioName
? president president ? presAddr Last two FDs
violate BCNF. Why? Compute title, year,
studioName, president and see if you get
all the attributes of the relation. If not, you
got some FD which is violates BCNF, and need to
break relation.
8
Boyce-Codd Normal Form Example
  • Lets decompose starting with
  • studioName ? president
  • Lets add to the right-hand side any other
    attributes in the closure of studioName (optional
    rule of thumb).
  • XstudioName studioName?president
  • XstudioName, president president?presAddr
  • XstudioNamestudioName, president, presAddr

9
Boyce-Codd Normal Form Example
From the closure we get studioName?president
presAddr We decompose the relation schema into
the following two schemas Movies1(studioName,
president, presAddr) Movies2(title, year,
studioName) Movies2 is in BCNF. Because we cant
find a bad FD holding there. What about
Movies1? The following dependency violates
BCNF. president?presAddr Why its bad to leave
Movies1 table as is? If many studios share the
same president than we would have redundancy when
repeating the presAddr in all those studios.
10
Boyce-Codd Normal Form Example
We must decompose Movies1, using the
FD president?presAddr The resulting relation
schemas, both in BCNF, are Movies11(title, year,
studioName) Movies12(studioName, president) In
general, we must keep applying the decomposition
rule as many times as needed, until all our
relations are in BCNF. So, finally we got
Movies11, Movies12, and Movies2.
11
Finding FDs for the decomposed relations
  • When we decompose a relation, we need to check
    that the resulting schemas are in BCNF.
  • We cant tell a relation is in BCNF, unless we
    can determine the FDs that hold for that
    relation.

12
Finding FDs for the decomposed relations
  • Suppose S is one of the resulting relations in a
    decomposition of R.
  • For this
  • Consider each subset X of attributes of S.
  • Compute X using the FD on R.
  • At the end throw out the attributes of R, which
    arent in S.
  • Then, for each attribute B such that
  • B is an attribute of S,
  • B is in X
  • we have that the functional dependency X?B holds
    in S.

13
Relational Algebra Operations
  • Operations of relational algebra fall into four
    broad classes
  • The usual set operations
  • union
  • intersection
  • difference
  • Operations that remove parts of a relation
  • selection eliminates some rows(tuples)
  • projection eliminates some columns
  • Operations that combine the tuples of two
    relations
  • Cartesian product pairs the tuples of two
    relations in all possible ways
  • join selectively pairs tuples from two
    relations.
  • An operation called renaming.

14
Conditions for Set Operations on Relations
  • R and S must have schemas with identical sets of
    attributes.
  • Before applying the operations, the columns of R
    and S must be ordered so that the order of
    attributes is the same for both relations.

15
Projection
Produces from a relation R a new relation that
has only some of Rs columns. ?A1, A2,,An(R) is
a relation that has only the columns for
attributes A1, A2,, An of R. Example Compute
the expression ?title, year, length(Movies) on
the table title year length filmType
studioName producerC Star wars
1977 124 color Fox 12345 Mighty Ducks
1991 104 color Disney 67890 Waynes World
1992 95 color Paramount 99999
16
Example (Continued)
Resulting relation title year length Star
wars 1977 124 Mighty Ducks 1991 104 Waynes
World 1992 95 What about ?filmtype(Movies)
17
Selection
Selection, applied to a relation R, produces a
new relation with a subset of Rs tuples. The
tuples in the result are those that satisfy some
condition C. Denote it with ?C( R ). The schema
for the resulting relation is the same as Rs
schema. Example The expression
?length?100(Movie) is title
year length filmType studioName producerC Star
wars 1977 124 color Fox 12345 Mighty
Ducks 1991 104 color Disney 67890
18
Cartesian Product
Cartesian Product of two relations R and S is the
set of pairs that can be formed by choosing the
first element of the pair to be any element of R
and the second an element of S. This denoted as
R?S. Example R A B S B C D
1 2 2 5 6 3 4
4 7 8 9 10 11
R?S A R.B S.B C D 1 2 2 5 6 1 2 4 7 8 1 2 9 1
0 11 3 4 2 5 6 3 4 4 7 8 3 4 9 10 11
19
Natural Join
Denoted as R S. Let A1, A2,,An be the
attributes in both the schema of R and the schema
of S. Then a tuple r from R and a tuple s from S
are successfully paired if and only if r and s
agree on each of the attributes A1, A2, , An.
Example The natural join of the relation R and
S from previous example is A B C D
1 2 5 6 3 4 7 8
20
Combing Operations to Form Queries
  • What are the title and years of movies made by
    Fox that are at least 100 minutes long?
  • One way to compute the answer to this query is
  • Select those Movie tuples that have length ?
    100.
  • Select those Movie tuples that have studioName
    Fox.
  • Compute the intersection of first and second
    steps.
  • Project the relation from the third step onto
    attributes title and year.

21
Another Example
Consider two relations Movie1 and Movie2, With
schemas Movie1(title, year, length, filmType,
studioName) Movie2(title, year, starName) Suppose
we want to know Find the stars of the movies
that are at least 100 minutes long. First we
join the two relations Movie1, Movie2 Second we
select movies with length at least 100 min. Then
we project the starName.
22
Relational Algebra on Bags
  • A bag is like a set, but an element may appear
    more than once.
  • Example 1,2,1,3 is a bag. 1,2,3 is also a
    bag that happens to be a set.
  • Bags also resemble lists, but order in a bag is
    unimportant.
  • Example
  • 1,2,1 1,1,2 as bags, but
  • 1,2,1 ! 1,1,2 as lists.

23
Operations on Bags
  • Selection applies to each tuple, so its effect on
    bags is like its effect on sets.
  • Projection also applies to each tuple, but as a
    bag operator, we do not eliminate duplicates.
  • Products and joins are done on each pair of
    tuples, so duplicates in bags have no effect on
    how we operate.

24
Bag Union, Intersection, Difference
  • An element appears in the union of two bags the
    sum of the number of times it appears in each
    bag.
  • Example 1,2,1 ? 1,1,2,3,1
    1,1,1,1,1,2,2,3
  • An element appears in the intersection of two
    bags the minimum of the number of times it
    appears in either.
  • Example 1,2,1 ?? 1,2,3 1,2.
  • An element appears in difference A B of bags
    as many times as it appears in A, minus the
    number of times it appears in B.
  • But never less than 0 times.
  • Example 1,2,1 1,2,3 1.

25
The Extended Algebra
  1. ? eliminate duplicates from bags.
  2. ? sort tuples.
  3. Extended projection arithmetic, duplication of
    columns.
  4. ? grouping and aggregation.
  5. OUTERJOIN avoids dangling tuples tuples that
    do not join with anything.

26
Example Outerjoin
R A B S B C 1 2 2 3 4 5 6 7 (1,2)
joins with (2,3), but the other two tuples are
dangling.
27
Aggregation Operators
  • They apply to entire columns of a table and
    produce a single result.
  • The most important examples
  • SUM
  • AVG
  • COUNT
  • MIN
  • MAX

28
Example Aggregation
R A B 1 3 3 4 3 2
SUM(A) 7 COUNT(A) 3 MAX(B) 4 MIN(B)
2 AVG(B) 3
29
Grouping Operator
  • R1 ?L (R2).
  • L is a list of elements that are either
  • Grouping attributes.
  • AGG(A), where AGG is one of the aggregation
    operators and A is an attribute.
  • Semantics
  • Group R according to all the grouping attributes
    on list L.
  • That is, form one group for each distinct list of
    values for those attributes in R.
  • Within each group, compute AGG(A) for each
    aggregation on list L.
  • Result has grouping attributes and aggregations
    as attributes.
  • One tuple for each list of values for the
    grouping attributes and their groups
    aggregations.

30
Example Grouping/Aggregation
R A B C 1 2 3 4 5 6 1 2 5 ? A,B,AVG(C) (R)
??
31
Example Grouping/Aggregation
  • StarsIn(title, year, starName)
  • We want, for each star who has appeared in at
    least three movies the earliest year in which he
    or she appeared.
  • First we group, using starName as a grouping
    attribute.
  • Then, we have to compute the MIN(year) for each
    group.
  • However, we need also compute COUNT(title)
    aggregate for each group, in order to filter out
    those stars with less than three movies.
  • ?ctTitlegt3?starName,MIN(year)?minYear,COUNT(title
    )?ctTitle(StarsIn)

32
Aggregations in SQL
  • SUM, AVG, COUNT, MIN, and MAX can be applied to a
    column in a SELECT clause to produce that
    aggregation on the column.
  • Find the average length of movies from Disney.
  • SELECT AVG(length)
  • FROM Movie
  • WHERE studioName 'Disney'

33
Eliminating Duplicates in an Aggregation
  • DISTINCT inside an aggregation causes duplicates
    to be eliminated before the aggregation.
  • Example Find the number of different producers
    for Disney movies
  • SELECT COUNT(DISTINCT producerc)
  • FROM Movie
  • WHERE studioname 'Disney'
  • This is not the same as
  • SELECT DISTINCT COUNT(producerc)
  • FROM Movie
  • WHERE studioname 'Disney'

34
NULLs Ignored in Aggregation
  • NULL never contributes to a sum, average, or
    count, and can never be the minimum or maximum of
    a column.
  • select SUM(networth)
  • from movieexec

35
Example Effect of NULLs
  • SELECT count()
  • FROM Movie
  • WHERE studioName 'Disney'
  • SELECT count(length)
  • FROM Movie
  • WHERE studioName 'Disney'

The number of movies from Disney.
36
Grouping
  • We may follow a SELECT-FROM-WHERE expression by
    GROUP BY and a list of attributes.
  • The relation that results from the
    SELECT-FROM-WHERE is grouped according to the
    values of all those attributes, and any
    aggregation is applied only within each group.
  • From Movie relation, find the average length for
    each studio
  • SELECT studioName, AVG(length)
  • FROM Movie
  • GROUP BY studioName

37
Example Grouping
  • Find the producers total length of film
    produced.
  • SELECT name, SUM(length)
  • FROM Movie, MovieExec
  • WHERE producerc cert
  • GROUP BY name

38
Restriction on SELECT Lists With Aggregation
  • If any aggregation is used, then each element of
    the SELECT list must be either
  • Aggregated, or
  • An attribute on the GROUP BY list.

39
Illegal Query Example
  • We might think we could find the shortest movie
    of Disney as
  • SELECT title, MIN(length)
  • FROM Movie
  • WHERE studioName 'Disney'
  • But this query is illegal in SQL. Why?
  • Because title is neither aggregated nor on the
    GROUP BY list.
  • We should do instead
  • SELECT title, length
  • FROM Movie
  • WHERE studioName 'Disney' AND length
  • (SELECT MIN(length)
  • FROM Movie
  • WHERE studioName 'Disney')

40
HAVING Clauses
  • HAVING ltconditiongt may follow a GROUP BY clause.
  • If so, the condition applies to each group, and
    groups not satisfying the condition are
    eliminated.
  • These conditions may refer to attributes that
    make sense within a group i.e., they are either
  • Grouping attributes, or
  • Aggregated attributes.

41
Example HAVING
  • Suppose that we didnt wish to include all the
    producers in our table of aggregated movie
    lengths.
  • Suppose for instance we want those producers who
    have at least one movie before 1972.
  • SELECT name, SUM(length)
  • FROM MovieExec, Movie
  • WHERE producerc cert
  • GROUP BY name
  • HAVING MIN(year) lt 1973
Write a Comment
User Comments (0)
About PowerShow.com