Title: CS411 Database Systems SQL III
1CS411Database SystemsSQL III
- HW2 out. Due 1130 am Next Friday
- Due today Project Stage 1. Each group submits
a hard copy of the ER diagram of your application
domain, plus English descriptions on the
assumptions you make (e.g., We assume a director
can direct multiple movies).
2Subqueries powerful boolean operators
3EXISTS
- EXISTS( ltrelationgt ) is true if and only if the
ltrelationgt is not empty. - SELECT FROM students s
- WHERE EXISTS(SELECT FROM enroll e WHERE
year2007 AND e.sids.sid)
4NOT EXISTS
- EXISTS( ltrelationgt ) is true if and only if the
ltrelationgt is not empty. - SELECT FROM students s
- WHERE NOT EXISTS(SELECT FROM enroll e WHERE
year2007 AND e.sids.sid)
5IN
- SELECT FROM STUDENT
- WHERE sid IN (1645,1323,1532,123)
- Can be used to efficiently retrieve a large
number of specific records. But watch out for
max. length of SQL statement. e.g only retrieve
200 at a time, not 20000. - Here's an alternative-
- sid1645 OR sid1323 OR sid1532 OR sid123
6IN
- SELECT FROM students
- WHERE sid IN (SELECT sid FROM enroll WHERE
year2007)
7NOT IN
- SELECT FROM students
- WHERE sid NOT IN (SELECT sid FROM enroll WHERE
year2007)
8Would this work instead?
- SELECT s.sid FROM students s, enroll e
- WHERE s.sid ! e.sid
9Would this work instead?
- SELECT s.sid FROM students s, enroll e
- WHERE s.sid ! e.sid
- No. Remember inner loop semantics...
- FOR s in STUDENT
- FOR e in ENROLL
- IF s.sid ! e.sid PRINT s.sid
10Give me DISTINCT tuples!
- for which years do I have enrollment data?
- SELECT year FROM enroll
- ... large of rows!
11Give me DISTINCT tuples!
- for which years do I have enrollment data?
- SELECT year FROM enroll
- ... large of rows!
- SELECT DISTINCT year FROM enroll
- year
- 2004
- 2005
- 2006
12ALL, ANY
- x gt ANY(ltrelationgt) x is larger than at least
one tuple. - x gt ALL(ltrelationgt) x is larger than all tuples.
- ... can also use lt gt lt eg x ltANY(R)
13Students that started earlier than our of
enrollment information?Which SQL is correct?
- SELECT FROM students s
- WHERE s.startyear lt ANY(SELECT DISTINCT year
FROM enroll) - SELECT FROM students s
- WHERE s.startyear lt ALL(SELECT DISTINCT year
FROM enroll)
14Students that started earlier than our of
enrollment information?
- SELECT FROM students s
- WHERE s.startyear lt ANY(SELECT DISTINCT year
FROM enroll) - students that started before latest year
- SELECT FROM students s
- WHERE s.startyear lt ALL(SELECT DISTINCT year
FROM enroll)
15Except Union Intersect
- There's often more than one way to write a query
- SELECT sid FROM students
- WHERE sid NOT IN (SELECT sid FROM enroll WHERE
year2007) - SELECT sid FROM students EXCEPT SELECT sid FROM
enroll WHERE year2007
16Subqueries as a single value
- WHERE price(SELECT price FROM r WHERE ... )
- A run-time error occurs if there is no tuple or
more than one tuple.
17Example
- From Sells(bar, beer, price), find the bars that
serve Miller for the same price Joe charges for
Bud. - Two queries would surely work
- Find the price Joe charges for Bud.
- Find the bars that serve Miller at that price.
18Query Subquery Solution
- SELECT bar
- FROM Sells
- WHERE beer Miller AND
- price (SELECT price
- FROM Sells
- WHERE bar Joes Bar
- AND beer Bud)
The price at which Joe sells Bud
19Example
- From Beers(name, manf) and Likes(drinker, beer),
find the name and manufacturer of each beer that
Fred likes. - SELECT
- FROM Beers
- WHERE name IN
The set of beers Fred likes
20Example
- From Beers(name, manf) and Likes(drinker, beer),
find the name and manufacturer of each beer that
Fred likes. - SELECT
- FROM Beers
- WHERE name IN (SELECT beer
- FROM Likes
- WHERE drinker Fred)
The set of beers Fred likes
21The Exists Operator
- EXISTS( ltrelationgt ) is true if and only if the
ltrelationgt is not empty. - Being a boolean-valued operator, EXISTS can
appear in WHERE clauses. - Example From Beers(name, manf), find those beers
that are the only beer by their manufacturer.
22Example Query with EXISTS
- SELECT name
- FROM Beers b1
- WHERE NOT EXISTS(
- SELECT
- FROM Beers
- WHERE manf b1.manf AND
- name ltgt b1.name)
Notice scope rule manf refers to closest nested
FROM with a relation having that attribute.
Set of beers with the same manf as b1, but not
the same beer
23The Operator ANY
- x ANY( ltrelationgt ) is a boolean condition
meaning that x equals at least one tuple in the
relation. - Similarly, can be replaced by any of the
comparison operators. - Example x gt ANY( ltrelationgt ) means x is not
smaller than all tuples in the relation. - Note tuples must have one component only.
24The Operator ALL
- Similarly, x ltgt ALL( ltrelationgt ) is true if and
only if for every tuple t in the relation, x is
not equal to t. - That is, x is not a member of the relation.
- The ltgt can be replaced by any comparison
operator. - Example x gt ALL( ltrelationgt ) means there is no
tuple larger than x in the relation.
25Example
- From Sells(bar, beer, price), find the beer(s)
sold for the highest price. - SELECT beer
- FROM Sells
- WHERE price gt ALL(
- SELECT price
- FROM Sells)
price from the outer Sells must not be less than
any price.
26Relations as Bags
27Relational Algebra Operations on Bags
- Union a,b,b,c U a,b,b,b,e,f,f
a,a,b,b,b,b,b,c,e,f,f - add the number of occurrences
- Difference a,b,b,b,c,c b,c,c,c,d a,b,b
- subtract the number of occurrences
- Intersection a,b,b,b,c,c b,b,c,c,c,c,d
b,b,c,c - minimum of the two numbers of occurrences
- Selection preserve the number of occurrences
- Projection preserve the number of occurrences
(no duplicate elimination) - Cartesian product, join no duplicate elimination
Read Section 5.3 of the book for more detail
28Intersection Example
- From relations Likes(drinker, beer), Sells(bar,
beer, price) and Frequents(drinker, bar), find
the drinkers and beers such that - The drinker likes the beer, and
- The drinker frequents at least one bar that
sells the beer.
29Intersection Example
- From relations Likes(drinker, beer), Sells(bar,
beer, price) and Frequents(drinker, bar), find
the drinkers and beers such that - The drinker likes the beer, and
- The drinker frequents at least one bar that
sells the beer. - (SELECT FROM Likes)
- INTERSECT
30Solution
- (SELECT FROM Likes)
- INTERSECT
- (SELECT drinker, beer
- FROM Sells, Frequents
- WHERE Frequents.bar Sells.bar
- )
31Bag Semantics for SELECT
- SELECT-FROM-WHERE statement uses bag semantics,
- ... unless you specify DISTINCT
32Bag Semantics for SELECT
- SELECT-FROM-WHERE statement uses bag semantics,
- ... unless you specify DISTINCT
- UNION, INTERSECTION, and DIFFERENCE
- use set semantics!
- That is, duplicates are eliminated as the
operation is applied. ... unless you specify ALL
33Motivation Efficiency
- When doing projection in relational algebra, it
is easier to avoid eliminating duplicates. - Just work one tuple-at-a-time.
- When doing intersection or difference, it is most
efficient to sort the relations first. - At that point you may as well eliminate the
duplicates anyway.
34Controlling Duplicate Elimination
- Force the result to be a set by SELECT
DISTINCT . . . - Force the result to be a bag (i.e., dont
eliminate duplicates) by ALL, as in . . .
UNION ALL . . .
35Example DISTINCT
- From Sells(bar, beer, price), find all the
different prices charged for beers - SELECT DISTINCT price
- FROM Sells
- Notice that without DISTINCT, each price would be
listed as many times as there were bar/beer pairs
at that price.
36Example ALL
- Using relations Frequents(drinker, bar) and
Likes(drinker, beer) - (SELECT drinker FROM Frequents)
- EXCEPT ALL
- (SELECT drinker FROM Likes)
- Lists drinkers who frequent more bars than they
like beers, and does so as many times as the
difference of those counts.
37Join Expressions
38Join Expressions
- SQL provides a number of expression forms that
act like varieties of join in relational algebra. - But using bag semantics, not set semantics.
- These expressions can be stand-alone queries or
used in place of relations in a FROM clause.
39Products and Natural Joins
- Natural join is obtained by
- R NATURAL JOIN S
- Cartesian Product is obtained by
- R CROSS JOIN S
- E.g.
- Likes NATURAL JOIN Serves
- Relations can be parenthesized subexpressions
40Blessed are the NULL makers
41Outer Join creates NULLs for us
- Product(pid,pname) Order(oid,pid,cname)
- 2 Plant 111 2 Amy
- 3 Chocolate
- Product NATURAL JOIN Order
- pid pname oid cname
- 2 Plant 111 Amy
- ... oops we lost Chocolate.
42Outer Join creates NULLs for us
- Product(pid,pname) Order(oid,pid,cname)
- 2 Plant 111 1 Amy
- 3 Chocolate
- Product NATURAL LEFT OUTER JOIN Order
- pid pname oid cname
- 2 Plant 112 Amy
- 3 Chocolate NULL NULL
43Theta Join
- R JOIN S ON ltconditiongt is a theta-join, using
ltconditiongt for selection. - Example using Drinkers(name, addr) and
Frequents(drinker, bar) - Drinkers JOIN Frequents ON
- name drinker
- gives us all (d, a, d, b) quadruples such that
drinker d lives at address a and frequents bar b.
44Challenge
- Average price of beer?
- of beers in our database?
- Average price of beer per bar?
- Sells(bar,beer,price)
45Grouping and Aggregation
46- We need Aggregation
- how to group rows?
- Group By
- Having 'where clause but for groups'
47Aggregations
- SUM, AVG, COUNT, MIN, and MAX can be applied to a
column in a SELECT clause to produce that
aggregation on the column. - Also, COUNT() counts the number of tuples.
48Example Aggregation
- From Sells(bar, beer, price), find the average
price of Bud - SELECT AVG(price)
- FROM Sells
- WHERE beer Bud
49Eliminating Duplicates in an Aggregation
- DISTINCT inside an aggregation causes duplicates
to be eliminated before the aggregation. - Example find the number of different prices
charged for Bud - SELECT COUNT(DISTINCT price)
- FROM Sells
- WHERE beer Bud
50NULLs Ignored in Aggregation
- NULL never contributes to a sum, average, or
count, and can never be the minimum or maximum of
a column. - But if there are no non-NULL values in a column,
then the result of the aggregation is NULL.
51Example Effect of NULLs
- SELECT count()
- FROM Sells
- WHERE beer Bud
- SELECT count(price)
- FROM Sells
- WHERE beer Bud
52Grouping
- We may follow a SELECT-FROM-WHERE expression by
GROUP BY and a list of attributes. - The relation that results from the
SELECT-FROM-WHERE is grouped according to the
values of all those attributes, and any
aggregation is applied only within each group.
53Example Grouping
- From Sells(bar, beer, price), find the average
price for each beer - SELECT beer, AVG(price)
- FROM Sells
- GROUP BY beer
54Example Grouping
- From Sells(bar, beer, price) and
Frequents(drinker, bar), find for each drinker
the average price of Bud at the bars they
frequent - SELECT drinker, AVG(price)
- FROM Frequents, Sells
- WHERE beer Bud AND
- Frequents.bar Sells.bar
- GROUP BY drinker
55Restriction on SELECT Lists With Aggregation
- If any aggregation is used, then each element of
the SELECT list must be either - Aggregated, or
- An attribute on the GROUP BY list.
- Why?
56Illegal Query Example
- You might think you could find the bar that sells
Bud the cheapest by - SELECT bar, MIN(price)
- FROM Sells
- WHERE beer Bud
- But this query is illegal in SQL.
- Why? Note bar is neither aggregated nor on the
GROUP BY list.
57HAVING Clauses
- HAVING ltconditiongt may follow a GROUP BY clause.
- If so, the condition applies to each group, and
groups not satisfying the condition are
eliminated.
58Requirements on HAVING Conditions
- These conditions may refer to any relation or
tuple-variable in the FROM clause. - They may refer to attributes of those relations,
as long as the attribute makes sense within a
group i.e., it is either - A grouping attribute, or
- Aggregated.
59Example HAVING
- From Sells(bar, beer, price) and Beers(name,
manf), find the average price of those beers that
are either served in at least three bars or are
manufactured by Petes.
60Example HAVING
- From Sells(bar, beer, price) and Beers(name,
manf), find the average price of those beers that
are either served in at least three bars or are
manufactured by Petes. - SELECT beer, AVG(price)
- FROM Sells
- GROUP BY beer
- HAVING ...
61Solution
- SELECT beer, AVG(price)
- FROM Sells
- GROUP BY beer
- HAVING COUNT(DISTINCT bar) gt 3 OR beer IN
(SELECT name - FROM Beers
- WHERE manf Petes)
62General form of Grouping and Aggregation
- SELECT S
- FROM R1,,Rn
- WHERE C1
- GROUP BY a1,,ak
- HAVING C2
- S may contain attributes a1,,ak and/or any
aggregates but NO OTHER ATTRIBUTES - C1 is any condition on the attributes in
R1,,Rn - C2 is any condition on aggregate expressions or
grouping attributes
63General form of Grouping and Aggregation
- SELECT S
- FROM R1,,Rn
- WHERE C1
- GROUP BY a1,,ak
- HAVING C2
- Evaluation steps
- Compute the FROM-WHERE part, obtain a table with
all attributes in R1,,Rn - Group by the attributes a1,,ak
- Compute the aggregates in C2 and keep only groups
satisfying C2 - Compute aggregates in S and return the result
64Database Modification
65Database Modifications
- A modification command does not return a result
as a query does, but it changes the database in
some way. - There are three kinds of modifications
- Insert a tuple or tuples.
- Delete a tuple or tuples.
- Update the value(s) of an existing tuple or
tuples.
66Insertion
- To insert a single tuple
- INSERT INTO ltrelationgt
- VALUES ( ltlist of valuesgt )
- Example add to Likes(drinker, beer) the fact
that Sally likes Bud. - INSERT INTO Likes
- VALUES(Sally, Bud)
67Specifying Attributes in INSERT
- We may add to the relation name a list of
attributes. - There are two reasons to do so
- We forget the standard order of attributes for
the relation. - We dont have values for all attributes, and we
want the system to fill in missing components
with NULL or a default value.
68Example Specifying Attributes
- Another way to add the fact that Sally likes Bud
to Likes(drinker, beer) - INSERT INTO Likes(beer, drinker)
- VALUES(Bud, Sally)
69Inserting Many Tuples
- We may insert the entire result of a query into a
relation, using the form - INSERT INTO ltrelationgt
- ( ltsubquerygt )
70Example Insert a Subquery
- Using Frequents(drinker, bar), enter into the new
relation PotBuddies(name) all of Sallys
potential buddies, i.e., those drinkers who
frequent at least one bar that Sally also
frequents.
71Solution
The other drinker
Pairs of Drinker tuples where the first is for
Sally, the second is for someone else, and the
bars are the same.
- INSERT INTO PotBuddies
- (SELECT d2.drinker
- FROM Frequents d1, Frequents d2
- WHERE d1.drinker Sally AND
- d2.drinker ltgt Sally AND
- d1.bar d2.bar
- )