Title: C20.0046: Database Management Systems Lecture
1C20.0046 Database Management SystemsLecture 13
- M.P. Johnson
- Stern School of Business, NYU
- Spring, 2005
2Agenda
- Finish SQL queries
- Updates and creating tables with SQL
- Indices, views, programs talking to SQL
- 3/10 (next week Thurs, before spring break)
- Midterm
- Hw2 due
- Today returning proj2 at end
- Mean 26
- Stdev 4
3Grouping Aggregation ops
- In SQL
- aggregation operators in SELECT,
- Grouping in GROUP BY clause
- Recall aggregation operators
- sum, avg, min, max, count
- strings, numbers, dates
- Each applies to scalars
- Count also applies to row count()
- Can DISTINCT inside aggregation op
count(DISTINCT x) - Grouping group rows that agree on single value
- Each group becomes one row in result
4Straight aggregation example
- Purchase(product, date, price, quantity)
- Q Find total sales for the entire database
- Q Find total sales of bagels
SELECT SUM(price quantity) FROM Purchase
SELECT SUM(price quantity) FROM
Purchase WHERE product 'bagel'
5Straight grouping
- Group rows together by field values
- Produces one row for each group
- I.e., by each (combin. of) grouped val(s)
- Dont select non-grouped fields
- Reduces to DISTINCT selections
SELECT product FROM Purchase GROUP BY
product
SELECT DISTINCT product FROM Purchase
6Illustrated GA example
- Sometimes want to group and compute aggregations
by group - Aggregation op applied to rows in group,
- not to all rows in table
- Purchase(product, date, price, quantity)
- Find total sales for products that sold for
0.50
SELECT product, SUM(pricequantity) total FROM
Purchase WHERE price .50 GROUP BY product
7Illustrated GA example
Purchase
8Illustrated GA example
- First compute the FROM-WHERE
- then GROUP BY product
9Illustrated GA example
- Finally, aggregate and select
SELECT product, SUM(pricequantity) total FROM
Purchase WHERW price .50 GROUP BY product
10Illustrated GA example
- GROUP BY may be reduced to (maybe more
complicated) subquery
SELECT product, SUM(pricequantity) total FROM
Purchase WHERE price .50 GROUP BY product
SELECT DISTINCT x.product, (SELECT
SUM(y.pricey.quantity)
FROM Purchase y
WHERE x.product y.product
AND y.price .50) total FROM Purchase
x WHERE x.price .50
11Multiple aggregations
For every product, what is the total sales and
max quantity sold?
SELECT product, SUM(price quantity)
SumSales, MAX(quantity)
MaxQuantity FROM Purchase WHERE price
.50 GROUP BY product
12Another grouping/aggregation e.g.
- Movie(title, year, length, studioName)
- Q How many total minutes of film have been
produced by each studio? - Strategy Divide movies into groups per studio,
then add lengths per group
13Another grouping/aggregation e.g.
SELECT studio, sum(length) totalLength FROM
Movies GROUP BY studio
14Another grouping/aggregation e.g.
SELECT studio, sum(length) totalLength FROM
Movies GROUP BY studio
15Another grouping/aggregation e.g.
SELECT studio, sum(length) totalLength FROM
Movies GROUP BY studio
16Grouping/aggregation example
- StarsIn(SName,Title,Year)
- Q Find the year of each stars first movie
- Q Find the span of each stars career
- Look up first and last movies
SELECT sname, min(year) firstyear FROM
StarsIn GROUP BY sname
17- Acc(name,bal,type)
- Q Who has the largest balance of each type?
- Can we do this with grouping/aggregation?
18G A for constructed relations
- Movie(title,year,producerSsn,length)
- MovieExec(name,ssn,netWorth)
- Can do the same thing for larger, non-atomic
relations - Q How many mins. of film did each producer make?
- What happens to non-producer movie-execs?
SELECT name, sum(length) total FROM Movie,
MovieExec WHERE producerSsn ssn GROUP BY name
19HAVING clauses
- Sometimes we want to limit which tuples may be
grouped - Q How many mins. of film did each rich producer
(i.e., netWorth 10000000) make? - Q Is HAVING necessary here?
- A No, could just add rich req. to WHERE
SELECT name, sum(length) total FROM Movie,
MovieExec WHERE producerSsn ssn GROUP BY
name HAVING netWorth 10000000
20HAVING clauses
- Sometimes we want to limit which tuples may be
grouped, based on properties of the group - Q How many mins. of film did each old producer
(i.e., who started before 1930) make? - Q Is HAVING necessary here?
SELECT name, sum(length) total FROM Movie,
MovieExec WHERE producerSsn ssn GROUP BY
name HAVING min(year)
21General form of GA
- NB Any attribute of relations in the FROM
clause may be aggregated in the HAVING clause,
but only those attributes that are in the GROUP
BY list may appear unaggregated in the HAVING
clause (the same rule as for the SELECT clause)
(Ullman, p283).
SELECT S FROM R1,,Rn WHERE C1 GROUP
BY As HAVING C2
Why?
- S may contain attributes As and/or any
aggregates but no other attributes - C1 condition on the attributes in R1,,Rn
- C2 condition on aggregations or attributes from
As
Why?
22Evaluation of GA
SELECT S FROM R1,,Rn WHERE C1 GROUP
BY a1,,ak HAVING C2
- Evaluation steps
- Compute the FROM-WHERE part as usual to obtain a
table with all attributes in R1,,Rn - Group by the attributes a1,,ak
- Compute the aggregates in C2 and keep only groups
satisfying C2 - Compute aggregates in S and return the result
23Web page examples
- Find all authors who wrote at least 10 documents
- Authors(login, name)
- Webpages(url, title, login)
- Attempt 1 with nested queries
Bad!
SELECT DISTINCT name FROM Authors WHERE
COUNT(SELECT url FROM Webpages
WHERE Authors.loginWebpages.login)
10
24Web page examples
- Find all authors who wrote at least 10 documents
- Attempt 2 Simplify with GROUP BY
Good!
SELECT name FROM Authors, Webpages WHERE
Authors.login Webpages.login GROUP BY
name HAVING count(Webpages.url) 10
No need for DISTINCT get for free from GROUP BY
25Web page examples
- Find all authors who have a vocabulary over 10000
words - Authors(login, name)
- Webpages(url, title, login)
- Mentions(url, word)
SELECT name FROM Authors, Webpages,
Mentions WHERE Authors.loginWrote.login AND
Webpages.urlMentions.url GROUP BY
name HAVING count(distinct word) 10000
26Summary SQL queries
- Only SELECT, FROM required
- Cant have HAVING without GROUP BY
- Can have GROUP BY without HAVING
- Any clauses used must appear in this order
SELECT LFROM Rs WHERE s GROUP
BY L2 HAVING s2 ORDER BY L3
27New topic Nulls in SQL
- If we dont have a value, can put a NULL
- Null can mean several things
- Value does not exists
- Value exists but is unknown
- Value not applicable
- The schema specifies whether null is allowed for
each attribute - NOT NULL if not allowed
- By default, null is allowed
28Null Values
- x NULL ? 4(3-x)/7 NULL
- x NULL ? x 3 x NULL
- x NULL ? 3 (x-x) NULL
- x NULL ? x Joe is UNKNOWN
- In general no row using null fields appear in
the selection test will pass the test - With one exception
- Pace Boole, SQL has three boolean values
- FALSE 0
- TRUE 1
- UNKNOWN 0.5
29Null values in boolean expressions
- C1 AND C2 min(C1, C2)
- C1 OR C2 max(C1, C2)
- NOT C1 1 C1
- height 6 UNKNOWN
- ? UNKNOWN OR weight 190 UNKOWN
- ? (age
SELECT FROM Person WHERE (age (height 6 OR weight 190)
E.g.age20heightNULLweight180
30Comparing null and non-nulls
- Unexpected behavior
- Some Persons are not included!
- The trichotomy law does not hold!
SELECT FROM Person WHERE age 25
31Testing for null values
- Can test for NULL explicitly
- x IS NULL
- x IS NOT NULL
- But
- x NULL is always null
- Now it includes all Persons
SELECT FROM Person WHERE age 25 OR age IS NULL
32Null/logic review
- TRUE AND UNKNOWN ?
- TRUE OR UNKNOWN ?
- UNKNOWN OR UNKNOWN ?
- X NULL ?
33Outerjoin
- Like L R except that dangling tuples are
included, padded with nulls - Left outerjoin dangling tuples from left are
included - Nulls appear on the right
- Right outerjoin dangling tuples from right are
included - Nulls appear on the left
34Joins operations
- Variations
- Cross join (Cartesian product)
- Join On
- Natural join
- Outer join
- Apply to relations appearing in selections
35Cross join - example
MovieStar
MovieExec
36Cross join example
SELECT FROM MovieStar CROSS JOIN MovieExec
37Join On example
SELECT FROM MovieStar JOIN MovieExec ON
MovieStar.Name MovieExec.Name
38Natural Joins
- MovieStar(name, address, gender, birthdate)
- MovieExec(name, address, networth)
- Natural Join
- MovieStar Natural Join MovieExec
- Results in list of individuals who are
movie-stars as well as executives - (Name, address, gender, birthdate, networth)
39Example - Natural join
MovieStar
MovieExec
SELECT FROM MovieStar NATURAL JOIN MovieExec
40Outer Join - Example
SELECT FROM MovieStar LEFT OUTER JOIN MovieExec
ON MovieStart.nameMovieExec.name
SELECT FROM MovieStar RIGHT OUTER JOIN
MovieExec ON MovieStart.nameMovieExec.name
41Outer Join - Example
MovieStar
MovieExec
SELECT FROM MovieStar FULL OUTER JOIN MovieExec
ON MovieStart.nameMovieExec.name
42New-style join syntax
- Old-style syntax simply lists tables separated by
commas - New-style makes the join explicit
SELECT FROM A,B WHERE
SELECT FROM A JOIN B ON WHERE
43New-style join syntax
- Functionally equivalent to old-style, but perhaps
more elegant - Introduced in Oracle 8i, MySQL 3.x/4.x
- Older versions / other DBMSs may only support
old-style syntax
44New-style join types
- cross joins (simplest)
- FROM A CROSS JOIN B
- Inner joins (regular joins)
- FROM A INNER JOIN B ON
- Natural join
- FROM A NATURAL JOIN B
- Joins on common fields and merges
- Outer joins
45New-style outer joins
- Outer joins may be left, right, or middle
- FROM A LEFT OUTER JOIN B
- FROM A RIGHT OUTER JOIN B
- FROM A FULL OUTER JOIN B
- OUTER is optional
- If OUTER is included, then FULL is the
default - Q How to remember left v. right?
- A It indicates the side whose rows are always
included
46Old-style outer joins in Oracle
- Outer joins can also be done with the old-style
syntax, but with the () - WHERE A.attB.att()
- corresponds to
- FROM A LEFT JOIN B
- The () is applied to all B attributes referred
to in the WHERE clause - Q How to remember which side gets the ()?
- A The side that gets null rows added