Title: SQL cont'
1SQL (cont.)
2Single-Relation Queries
3Our Running Example
- Most of our SQL queries will be based on the
following database schema. - Underline indicates key attributes.
- Beers(name, manf)
- Bars(name, addr, license)
- Drinkers(name, addr, phone)
- Likes(drinker, beer)
- Sells(bar, beer, price)
- Frequents(drinker, bar)
4Example
- Using Beers(name, manf), what beers are made by
Anheuser-Busch? - SELECT name
- FROM Beers
- WHERE manf Anheuser-Busch
5Null Values
6Testing for Null
- Can test for NULL explicitly
- x IS NULL
- x IS NOT NULL
- SELECT
- FROM Person
- WHERE age lt 25 OR age gt 25 OR age IS NULL
- Now it includes all Persons
7Multi-Relation Queries
8Multirelation Queries
- Interesting queries often combine data from more
than one relation. - We can address several relations in one query by
listing them all in the FROM clause. - Distinguish attributes of the same name by
ltrelationgt.ltattributegt
9Example
- Using relations Likes(drinker, beer) and
Frequents(drinker, bar), find the beers liked by
at least one person who frequents Joes Bar. - SELECT beer
- FROM Likes, Frequents
- WHERE bar Joes Bar AND
- Frequents.drinker Likes.drinker
10Another Example
Product (pname, price, category, maker) Purchase
(buyer, seller, store, product) Company
(cname, stockPrice, country) Person(pname,
phoneNumber, city) Find names of people living
in Champaign that bought gizmo products, and the
names of the stores they bought from
SELECT pname, storeFROM Person,
PurchaseWHERE pnamebuyer AND
cityChampaign AND
productgizmo
11Disambiguating Attributes
Find names of people buying telephony products
Product (name, price, category, maker) Purchase
(buyer, seller, store, product) Person(name,
phoneNumber, city)
SELECT Person.name FROM Person, Purchase,
Product WHERE Person.namePurchase.buyer
AND ProductProduct.name AND
Product.categorytelephony
12Formal Semantics
- Almost the same as for single-relation queries
- Start with the product of all the relations in
the FROM clause. - Apply the selection condition from the WHERE
clause. - Project onto the list of attributes and
expressions in the SELECT clause.
13Operational Semantics
- Imagine one tuple-variable for each relation in
the FROM clause. - These tuple-variables visit each combination of
tuples, one from each relation. - If the tuple-variables are pointing to tuples
that satisfy the WHERE clause, send these tuples
to the SELECT clause.
14Example
drinker bar drinker
beer tv1 tv2 Sally Bud Sally
Joes Likes Frequents
15Explicit Tuple-Variables
- Sometimes, a query needs to use two copies of the
same relation. - Distinguish copies by following the relation name
by the name of a tuple-variable, in the FROM
clause. - Its always an option to rename relations this
way, even when not essential.
16Example
- From Beers(name, manf), find all pairs of beers
by the same manufacturer. - Do not produce pairs like (Bud, Bud).
- Produce pairs in alphabetic order, e.g. (Bud,
Miller), not (Miller, Bud). - SELECT b1.name, b2.name
- FROM Beers b1, Beers b2
- WHERE b1.manf b2.manf AND
- b1.name lt b2.name
17Tuple Variables
Find pairs of companies making products in the
same category
SELECT product1.maker, product2.maker FROM
Product AS product1, Product AS product2 WHERE
product1.categoryproduct2.category AND
product1.maker ltgt product2.maker
Product ( name, price, category, maker)
18Tuple Variables
Tuple variables introduced automatically by the
system Product ( name, price, category,
maker) Becomes Doesnt
work when Product occurs more than once In that
case the user needs to define variables
explicitly.
SELECT name FROM Product WHERE price gt
100
SELECT Product.name FROM Product AS Product
WHERE Product.price gt 100
19Meaning (Semantics) of SQL Queries
- SELECT a1, a2, , ak
- FROM R1 AS x1, R2 AS x2, , Rn AS xn
- WHERE Conditions
- 1. Nested loops
Answer for x1 in R1 do for x2 in R2
do .. for xn in Rn
do if Conditions
then Answer Answer U
(a1,,ak) return Answer
20Meaning (Semantics) of SQL Queries
- SELECT a1, a2, , ak
- FROM R1 AS x1, R2 AS x2, , Rn AS xn
- WHERE Conditions
- 2. Parallel assignment
- Doesnt impose any order !
Answer for all assignments x1 in R1, , xn
in Rn do if Conditions then Answer
Answer U (a1,,ak) return Answer
21Meaning (Semantics) of SQL Queries
- SELECT a1, a2, , ak
- FROM R1 AS x1, R2 AS x2, , Rn AS xn
- WHERE Conditions
- 3. Translation to Relational algebra
- a1,,ak ( s Conditions (R1 x R2 x x Rn))
- Select-From-Where queries are precisely
Select-Project-Join
22Exercises
Product ( pname, price, category,
maker) Purchase (buyer, seller, store,
product) Company (cname, stock price,
country) Person( per-name, phone number,
city) Ex 1 Find people who bought telephony
products. Ex 2 Find names of people who bought
American products Ex 3 Find names of people who
bought American products and did not
buy French products Ex 4 Find names of people
who bought American products and they
live in Champaign. Ex 5 Find people who bought
stuff from Joe or bought products
from a company whose stock prices is more than
50.
23SubqueriesBoolean Operators IN, EXISTS, ANY,
ALL
24Subqueries
- A parenthesized SELECT-FROM-WHERE statement
(subquery) can be used as a value in a number of
places, including FROM and WHERE clauses. - Example in place of a relation in the FROM
clause, we can place another query, and then
query its result. - Better use a tuple-variable to name tuples of the
result.
25Subqueries That Return One Tuple
- If a subquery is guaranteed to produce one tuple,
then the subquery can be used as a value. - Usually, the tuple has one component.
- Also typically, a single tuple is guaranteed by
keyness of attributes. - A run-time error occurs if there is no tuple or
more than one tuple.
26Example
- From Sells(bar, beer, price), find the bars that
serve Miller for the same price Joe charges for
Bud. - Two queries would surely work
- Find the price Joe charges for Bud.
- Find the bars that serve Miller at that price.
27Query Subquery Solution
- SELECT bar
- FROM Sells
- WHERE beer Miller AND
- price (SELECT price
- FROM Sells
- WHERE bar Joes Bar
- AND beer Bud)
The price at which Joe sells Bud
28The IN Operator
- lttuplegt IN ltrelationgt is true if and only if the
tuple is a member of the relation. - lttuplegt NOT IN ltrelationgt means the opposite.
- IN-expressions can appear in WHERE clauses.
- The ltrelationgt is often a subquery.
29Example
- From Beers(name, manf) and Likes(drinker, beer),
find the name and manufacturer of each beer that
Fred likes. - SELECT
- FROM Beers
- WHERE name IN (SELECT beer
- FROM Likes
- WHERE drinker Fred)
The set of beers Fred likes
30The Exists Operator
- EXISTS( ltrelationgt ) is true if and only if the
ltrelationgt is not empty. - Being a boolean-valued operator, EXISTS can
appear in WHERE clauses. - Example From Beers(name, manf), find those beers
that are the unique beer by their manufacturer.
31Example Query with EXISTS
- SELECT name
- FROM Beers b1
- WHERE NOT EXISTS(
- SELECT
- FROM Beers
- WHERE manf b1.manf AND
- name ltgt b1.name)
Notice scope rule manf refers to closest nested
FROM with a relation having that attribute.
Set of beers with the same manf as b1, but not
the same beer
32The Operator ANY
- x ANY( ltrelationgt ) is a boolean condition
meaning that x equals at least one tuple in the
relation. - Similarly, can be replaced by any of the
comparison operators. - Example x gt ANY( ltrelationgt ) means x is not
smaller than all tuples in the relation. - Note tuples must have one component only.
33The Operator ALL
- Similarly, x ltgt ALL( ltrelationgt ) is true if and
only if for every tuple t in the relation, x is
not equal to t. - That is, x is not a member of the relation.
- The ltgt can be replaced by any comparison
operator. - Example x gt ALL( ltrelationgt ) means there is no
tuple larger than x in the relation.
34Example
- From Sells(bar, beer, price), find the beer(s)
sold for the highest price. - SELECT beer
- FROM Sells
- WHERE price gt ALL(
- SELECT price
- FROM Sells)
price from the outer Sells must not be less than
any price.
35More SQL Relations as BagsGrouping and
AggregationDatabase Modification (next lecture)
36Relational Algebra Operations on Bags (and why
we care)
- Union a,b,b,c U a,b,b,b,e,f,f
a,a,b,b,b,b,b,c,e,f,f - add the number of occurrences
- Difference a,b,b,b,c,c b,c,c,c,d
a,b,b,d - subtract the number of occurrences
- Intersection a,b,b,b,c,c b,b,c,c,c,c,d
b,b,c,c - minimum of the two numbers of occurrences
- Selection preserve the number of occurrences
- Projection preserve the number of occurrences
(no duplicate elimination) - Cartesian product, join no duplicate elimination
Read Section 5.3 of the book for more detail
37Bag Semantics for SFW Queries
- The SELECT-FROM-WHERE statement uses bag
semantics - Selection preserve the number of occurrences
- Projection preserve the number of occurrences
(no duplicate elimination) - Cartesian product, join no duplicate elimination
38Union, Intersection, and Difference
- Union, intersection, and difference of relations
are expressed by the following forms, each
involving subqueries - ( subquery ) UNION ( subquery )
- ( subquery ) INTERSECT ( subquery )
- ( subquery ) EXCEPT ( subquery )
39Example
- From relations Likes(drinker, beer), Sells(bar,
beer, price) and Frequents(drinker, bar), find
the drinkers and beers such that - The drinker likes the beer, and
- The drinker frequents at least one bar that
sells the beer.
40Solution
- (SELECT FROM Likes)
- INTERSECT
- (SELECT drinker, beer
- FROM Sells, Frequents
- WHERE Frequents.bar Sells.bar
- )
41Bag Semantics for Set Operations in SQL
- Although the SELECT-FROM-WHERE statement uses bag
semantics, the default for union, intersection,
and difference is set semantics. - That is, duplicates are eliminated as the
operation is applied.
42Motivation Efficiency
- When doing projection in relational algebra, it
is easier to avoid eliminating duplicates. - Just work tuple-at-a-time.
- When doing intersection or difference, it is most
efficient to sort the relations first. - At that point you may as well eliminate the
duplicates anyway.
43Controlling Duplicate Elimination
- Force the result to be a set by SELECT
DISTINCT . . . - Force the result to be a bag (i.e., dont
eliminate duplicates) by ALL, as in . . .
UNION ALL . . .
44Example DISTINCT
- From Sells(bar, beer, price), find all the
different prices charged for beers - SELECT DISTINCT price
- FROM Sells
- Notice that without DISTINCT, each price would be
listed as many times as there were bar/beer pairs
at that price.
45Example ALL
- Using relations Frequents(drinker, bar) and
Likes(drinker, beer) - (SELECT drinker FROM Frequents)
- EXCEPT ALL
- (SELECT drinker FROM Likes)
- Lists drinkers who frequent more bars than they
like beers, and does so as many times as the
difference of those counts.
46Join Expressions
- SQL provides a number of expression forms that
act like varieties of join in relational algebra. - But using bag semantics, not set semantics.
- These expressions can be stand-alone queries or
used in place of relations in a FROM clause.
47Products and Natural Joins
- Natural join is obtained by
- R NATURAL JOIN S
- Product is obtained by
- R CROSS JOIN S
- Example
- Likes NATURAL JOIN Serves
- Relations can be parenthesized subexpressions, as
well.
48Theta Join
- R JOIN S ON ltconditiongt is a theta-join, using
ltconditiongt for selection. - Example using Drinkers(name, addr) and
Frequents(drinker, bar) - Drinkers JOIN Frequents ON
- name drinker
- gives us all (d, a, d, b) quadruples such that
drinker d lives at address a and frequents bar b.
49Motivation for Outerjoins
- Explicit joins in SQL
- Product(name, category)
- Purchase(prodName, store)
- SELECT Product.name, Purchase.store
- FROM Product JOIN Purchase ON
- Product.name
Purchase.prodName - Same as
- SELECT Product.name, Purchase.store
- FROM Product, Purchase
- WHERE Product.name Purchase.prodName
- But Products that never sold will be lost !
50Null Values and Outerjoins
- Left outer joins in SQL
- Product(name, category)
- Purchase(prodName, store)
-
- SELECT Product.name, Purchase.store
- FROM Product LEFT OUTER JOIN Purchase ON
- Product.name
Purchase.prodName
51Product
Purchase
52Outer Joins
- Left outer join
- Include the left tuple even if theres no match
- Right outer join
- Include the right tuple even if theres no match
- Full outer join
- Include the both left and right tuples even if
theres no match
53Outerjoins
- R OUTER JOIN S is the core of an outerjoin
expression. It is modified by - Optional NATURAL in front of OUTER.
- Optional ON ltconditiongt after JOIN.
- Optional LEFT, RIGHT, or FULL before OUTER.
- LEFT pad dangling tuples of R only.
- RIGHT pad dangling tuples of S only.
- FULL pad both this choice is the default.
54Aggregations
- SUM, AVG, COUNT, MIN, and MAX can be applied to a
column in a SELECT clause to produce that
aggregation on the column. - Also, COUNT() counts the number of tuples.
55Example Aggregation
- From Sells(bar, beer, price), find the average
price of Bud - SELECT AVG(price)
- FROM Sells
- WHERE beer Bud
56Eliminating Duplicates in an Aggregation
- DISTINCT inside an aggregation causes duplicates
to be eliminated before the aggregation. - Example find the number of different prices
charged for Bud - SELECT COUNT(DISTINCT price)
- FROM Sells
- WHERE beer Bud
57NULLs Ignored in Aggregation
- NULL never contributes to a sum, average, or
count, and can never be the minimum or maximum of
a column. - But if there are no non-NULL values in a column,
then the result of the aggregation is NULL.
58Example Effect of NULLs
- SELECT count()
- FROM Sells
- WHERE beer Bud
- SELECT count(price)
- FROM Sells
- WHERE beer Bud
59Grouping
- We may follow a SELECT-FROM-WHERE expression by
GROUP BY and a list of attributes. - The relation that results from the
SELECT-FROM-WHERE is grouped according to the
values of all those attributes, and any
aggregation is applied only within each group.
60Example Grouping
- From Sells(bar, beer, price), find the average
price for each beer - SELECT beer, AVG(price)
- FROM Sells
- GROUP BY beer
61Example Grouping
- From Sells(bar, beer, price) and
Frequents(drinker, bar), find for each drinker
the average price of Bud at the bars they
frequent - SELECT drinker, AVG(price)
- FROM Frequents, Sells
- WHERE beer Bud AND
- Frequents.bar Sells.bar
- GROUP BY drinker
62Restriction on SELECT Lists With Aggregation
- If any aggregation is used, then each element of
the SELECT list must be either - Aggregated, or
- An attribute on the GROUP BY list.
63Illegal Query Example
- You might think you could find the bar that sells
Bud the cheapest by - SELECT bar, MIN(price)
- FROM Sells
- WHERE beer Bud
- But this query is illegal in SQL.
- Why? Note bar is neither aggregated nor on the
GROUP BY list.
64HAVING Clauses
- HAVING ltconditiongt may follow a GROUP BY clause.
- If so, the condition applies to each group, and
groups not satisfying the condition are
eliminated.
65Requirements on HAVING Conditions
- These conditions may refer to any relation or
tuple-variable in the FROM clause. - They may refer to attributes of those relations,
as long as the attribute makes sense within a
group i.e., it is either - A grouping attribute, or
- Aggregated.
66Example HAVING
- From Sells(bar, beer, price) and Beers(name,
manf), find the average price of those beers that
are either served in at least three bars or are
manufactured by Petes.
67Solution
- SELECT beer, AVG(price)
- FROM Sells
- GROUP BY beer
- HAVING COUNT(bar) gt 3 OR
- beer IN (SELECT name
- FROM Beers
- WHERE manf Petes)
68General form of Grouping and Aggregation
- SELECT S
- FROM R1,,Rn
- WHERE C1
- GROUP BY a1,,ak
- HAVING C2
- S may contain attributes a1,,ak and/or any
aggregates but NO OTHER ATTRIBUTES - C1 is any condition on the attributes in
R1,,Rn - C2 is any condition on aggregate expressions
69General form of Grouping and Aggregation
- SELECT S
- FROM R1,,Rn
- WHERE C1
- GROUP BY a1,,ak
- HAVING C2
- Evaluation steps
- Compute the FROM-WHERE part, obtain a table with
all attributes in R1,,Rn - Group by the attributes a1,,ak
- Compute the aggregates in C2 and keep only groups
satisfying C2 - Compute aggregates in S and return the result
70Extra Materials onAggregation Grouping
71Aggregation
SELECT Sum(price) FROM Product WHERE
makerToyota
SQL supports several aggregation operations
SUM, MIN, MAX, AVG, COUNT
72Aggregation Count
SELECT Count() FROM Product WHERE year gt
1995
Except COUNT, all aggregations apply to a single
attribute
73Aggregation Count
COUNT applies to duplicates, unless otherwise
stated SELECT Count(name, category)
same as Count() FROM Product WHERE year gt
1995 Better SELECT Count(DISTINCT name,
category) FROM Product WHERE year gt 1995
74Simple Aggregation
Purchase(product, date, price, quantity) Example
1 find total sales for the entire
database SELECT Sum(price quantity) FROM
Purchase Example 1 find total sales of
bagels SELECT Sum(price quantity) FROM
Purchase WHERE product bagel
75Grouping and Aggregation
Usually, we want aggregations on certain parts of
the relation. Purchase(product, date, price,
quantity) Example 2 find total sales after 9/1
per product. SELECT product,
Sum(pricequantity) AS TotalSales FROM
Purchase WHERE date gt 9/1 GROUPBY product
76Grouping and Aggregation
1. Compute the relation (I.e., the FROM and
WHERE). 2. Group by the attributes in the
GROUPBY 3. Select one tuple for every group (and
apply aggregation) SELECT can have (1) grouped
attributes or (2) aggregates.
77HAVING Clause
We consider only products that had at least 100
buyers.
SELECT product, Sum(price quantity) FROM
Purchase WHERE date gt 9/1 GROUP
BY product HAVING Sum(quantity) gt 30
HAVING clause contains conditions on aggregates.
78Aggregation
- Author(login,name)
- Document(url, title)
- Wrote(login,url)
- Mentions(url,word)
79- Find all authors who wrote at least 10 documents
Select author.name From author, wrote Where
author.loginwrote.login Groupby
author.name Having count(wrote.url) gt 10
80- Find all authors who have a vocabulary over 10000
Select author.name From author, wrote,
mentions Where author.loginwrote.login and
wrote.urlmentions.url Groupby author.name Having
count(distinct mentions.word) gt 10000