Title: MIS3150 Data and Information Management Query Languages
1MIS3150 Data and Information ManagementQuery
Languages SQL
2Structure of this semester
MIS3150
1. Design
2. Querying
0. Intro
Database Fundamentals
Conceptual Modeling
Query Languages
Relational Model
Advanced SQL
Normalization
Newbie
Users
Professionals
Designers
Developers
3Todays Buzzwords
- Query Languages
- Formal Query Languages
- Procedural and Declarative Languages
- Relational Algebra
- Relational Calculus
- SQL
- Aggregate Functions
- Nested Queries
4Objectives
- At the end of the lecture, you should
- Get a formal as well as practical perspective on
query languages - Have a background on query language basics (how
they came about) - Be able to write simple SQL queries from the
specification - Be able to look at SQL queries and understand
what it is supposed to do - Be able to write complex SQL queries involving
nesting - Execute queries on a database system
5Set Theory Basics
- A set a collection of distinct items with no
particular order - Set description
- b b is a Database Book
- c c is a city with a population of over a
million - x 1 lt x lt 10 and x is a natural number
- Most basic set operation
- Membership x ? S (read as x belongs to S if x
is in the set S)
6Other Set Operations
- Addition, deletion (note that adding an existing
item in the set does not change it) - Set mathematics
- Union R ? S x x ? R or x ? S
- Intersection R ? S x x ?R and x ? S
- Set Difference R S x x ? R and x ? S
- Cross-product R x S ltx,ygt x ? R and y ? S
- You can combine set operations much like
arithmetic operations R (S ? T) - Usually no well-defined precedence
7Relational Query Languages
- Query languages Allow manipulation and
retrieval of data from a database. - Relational model supports simple, powerful QLs
- Strong formal foundation based on logic.
- Allows for much optimization.
- Query Languages ! programming languages!
- QLs not expected to be Turing complete.
- QLs not intended to be used for complex
calculations. - QLs support easy, efficient access to large data
sets.
8Formal Relational Query Languages
- Two mathematical Query Languages form the basis
for real languages (e.g. SQL), and for
implementation - Relational Algebra More operational, very
useful for representing execution plans. - Relational Calculus Lets users describe what
they want, rather than how to compute it.
(Non-operational, declarative.)
- Understanding Algebra Calculus is key to
understanding SQL, query processing!
9Structured Query Language
- Need for SQL
- Operations on Data Types
- Definition Manipulation
- Operations on Sets
- Declarative (calculus) vs. Procedural (algebra)
- Evolution of SQL
- SEQUEL ..SQL_92 .. SQL_93
- SQL Dialects
- Does SQL treat Relations as Sets?
10Preliminaries
- A query is applied to relation instances, and the
result of a query is also a relation instance. - Schemas of input relations for a query are fixed
(but query will run regardless of instance!) - The schema for the result of a given query is
also fixed! Determined by definition of query
language constructs. - Positional vs. named-field notation
- Positional notation easier for formal
definitions, named-field notation more readable.
- Both used in SQL
11Example Instances
- Students, Registers, Courses relations for our
examples.
R1
C1
S2
S1
12Relational Algebra
- Basic operations
- Selection ( ) Selects a subset of rows
from relation. - Projection ( ) Deletes unwanted columns from
relation. - Cross-product ( ) Allows us to combine two
relations. - Set-difference ( ) Tuples in reln. 1, but
not in reln. 2. - Union ( ) Tuples in reln. 1 and in reln. 2.
- Additional operations
- Intersection, join, division, renaming Not
essential, but (very!) useful. - Since each operation returns a relation,
operations can be composed! (Algebra is closed.)
13Projection
- Deletes attributes that are not in projection
list. - Schema of result contains exactly the fields in
the projection list, with the same names that
they had in the (only) input relation. - Projection operator has to eliminate duplicates!
(Why??) - Note real systems typically dont do duplicate
elimination unless the user explicitly asks for
it. (Why not?)
14Vertical Slices
Algebra projection ?ltA1,A2,...Amgt (R)
- Projection
- Specifying Elements
- No Specification
- List all information about Students
- select
- from STUDENT
- (Student)
- Conditional
- List IDs, names, and addresses of all students
- select StudentID, name, address
- from STUDENT
-
- ? StudentID, name, address (Student)
15Does SQL treat Relations as Sets?
- What are the different salaries we pay to our
employees? - select salary
- from EMPLOYEE
-
- OR is the following better?
- select DISTINCT salary
- from EMPLOYEE
16Selection
- Selects rows that satisfy selection condition.
- No duplicates in result! (Why?)
- Schema of result identical to schema of (only)
input relation. - Result relation can be the input for another
relational algebra operation! (Operator
composition.)
17Horizontal Slices
Algebra selection or restriction (R)
- Restriction
- Specifying Conditions
- Unconditional
- List all students
- select
- from STUDENT
- (Student)
- Conditional
- List all students with GPA gt 3.0
- select
- from STUDENT
- where GPA gt 3.0
- ? GPA gt 3.0 (Student)
18Specifying Conditions
List all students in ... select
from STUDENT where city in
(Boston,Atlanta)
List all students in ... select
from STUDENT where zip not between
60115 and 60123
19Pattern Matching
any string with n characters, ngt0 _ any
single character. x exact sequence of string x.
List all CIS courses. select
from COURSE where course like CIS
List all CIS 3200 level courses. select
from COURSE where course
like ?
20Missing or Incomplete Information
- List all students whose address or telephone
number is missing - select
- from STUDENT
- where Address is null or GPA is null
21Horizontal and Vertical
- Query
- List all student ID, names and addresses who
have - GPA gt 3.0 and date of birth before Jan 1, 1980.
- select StudentID, Name, Address
- from STUDENT
- where GPA gt 3.0 and DOB lt 1-Jan-80
- order by Name DESC
- Algebra ? StudentID,name, address (? GPA gt 3.0
and DOB lt 1-Jan-80 (STUDENT)) - Calculus t.StudentID, t.name, t.address t ?
Student ?t.GPA gt 3.0 ? - t.DOB lt 1-Jan-80
- Order by sorts result in descending
(DESC) order. - Note The default order is ascending (ASC)
as in - order by Name
22Union, Intersection, Set-Difference
- All of these operations take two input relations,
which must be union-compatible - Same number of fields.
- Corresponding fields have the same type.
- What is the schema of result?
23Union
- List students who live in Atlanta or GPA gt 3.0
- select StudentID, Name, DOB, Address
- from STUDENT
- where Address Atlanta
- union
- select StudentID, Name, DOB, Address
- from STUDENT
- where GPA gt 3.0
-
- Can we perform a Union on any two Relations ?
24Union Compatibility
- Two relations, A and B, are union-compatible
- if
- A and B contain a same number of attributes, and
- The corresponding attributes of the two have the
same domains - Examples
- CISStudent (ID Did Name Dname Address
Daddr Grade Dgrade) - Senior-Student (SName Dname S Did Home
Daddr Grade Dgrade) - Course (C Dnumber Title Dstr Credits
Dnumber) - Are CIS-Student and Senior-Student union
compatible? - Are CIS-Student and Course union compatible?
- What happens if we have duplicate tuples?
- What will be the column names in the resulting
Relation?
25Union, Intersect, Minus
select CUSTNAME, ZIP from CUSTOMER where
STATE MA UNION select SUPNAME,
ZIP from SUPPLIER where STATE MA
ORDER BY 2
select CUSTNAME, ZIP from CUSTOMER where
STATE MA INTERSECT select SUPNAME,
ZIP from SUPPLIER where STATE MA
ORDER BY 2
select CUSTNAME, ZIP from CUSTOMER where
STATE MA MINUS select SUPNAME,
ZIP from SUPPLIER where STATE MA
ORDER BY 2
B
A
A
B
A
26Cross-Product
- Each row of S1 is paired with each row of R1.
- Result schema has one field per field of S1 and
R1, with field names inherited if possible. - Conflict Both S1 and R1 have a field called sid.
27Joins
- Condition Join
- Result schema same as that of cross-product.
- Fewer tuples than cross-product, might be able to
compute more efficiently - Sometimes called a theta-join.
28Joins
- Equi-Join A special case of condition join
where the condition c contains only equalities. - Result schema similar to cross-product, but only
one copy of fields for which equality is
specified. - Natural Join Equijoin on all common fields.
29Find names of students who have taken course 103
30Connecting/Linking Relations
- List information about all students and the
classes they are taking
Student
Class
What can we use to connect/link Relations? Join
Connecting relations so that relevant tuples can
be retrieved.
31Join
Cartesian Product
Student 30 tuples
Class 4 tuples
Total Number of Tuples in the Cartesian Product.
? (match each tuple of student to every tuple
of class) Select tuples having identical Student
Ids. Expected number of such Tuples
Join Selectivity
32Join Forms
R1
R2
- General Join Forms
- Equijoin
- Operator Dependent
- Natural Join
- Outer Join
- Left
- Right
- Full
select s., c. from STUDENT s, CLASS
c where s.StudentID c. SID
R1
R2
x gt y ltgt ...
select s., c. from STUDENT s, CLASS
c where s.StudentID c.SID ()
33Find names of students who have taken a CIS course
- Information about departments only available in
Courses so need an extra join
- A query optimizer can find this given the first
solution!
34Find students who have taken an MIS or a CS course
- Can identify all MIS or CS courses, then find
students who have taken one of these courses
- Can also define Temp1 using union! (How?)
- What happens if is replaced by in this
query?
35Find students who have taken a CIS and an ECI
Course
- Previous approach wont work! Must identify
students who have taken CIS courses, students who
have taken ECI courses, then find the
intersection (note that sid is a key for
Students)
36Relational Calculus
- Comes in two flavours Tuple relational calculus
(TRC) and Domain relational calculus (DRC). - Calculus has variables, constants, comparison
ops, logical connectives and quantifiers. - TRC Variables range over (i.e., get bound to)
tuples. - DRC Variables range over domain elements (
field values). - Both TRC and DRC are simple subsets of
first-order logic. - Expressions in the calculus are called formulas.
An answer tuple is essentially an assignment of
constants to variables that make the formula
evaluate to true.
37Find students with GPA gt 3.7 who have taken a CIS
Course
DRC
38Find students who have taken all CIS courses
How will you do this with Relational Algebra?
39Monotonic and Non-Monotonic Queries
- Monotonic queries queries for which the size of
the results either increase or stay the same as
the size of the inputs increase. The result size
never decreases - Non-monotonic queries queries for which it is
possible that the size of the result will
DECREASE when the size of the input increases - Examples of each?
- Which of the algebra operations is non-monotonic?
- What does this signify?
40Summaries and Aggregates
Calculate the average GPA select avg.
(GPA) from STUDENT, Find the lowest
GPA select min (GPA) as minGPA from
STUDENT, How many CIS majors? select count
(StudentId) from STUDENT where majorCIS
Discarding duplicates select avg (distinct
GPA) STUDENT where majorCIS (is
this above query correct?)
41Aggregate Functions
- COUNT (attr) - a simple count of values in
attr - SUM (attr) - sum of values in attr
- AVG (attr) - average of values in attr
- MAX (attr) - maximum value in attr
- MIN (attr) - minimum value in attr
- Take effect after all the data is retrieved from
the database - Applied to either the entire resulting relation
or groups - Cant be involved in any query qualifications
(where clause) - Would the following query be permitted?
- select StudentId
- from STUDENT
- where GPA max (GPA)
42Grouping Results Obtained
- Show all students enrolled in each course.
- select cno, StudentID
- from REGISTRATION
- group by cno Is this grouping OK?
- Calculate the average GPA of students by county.
- select county, avg (GPA) as CountyGPA
- from STUDENT
- group by county
- Calculate the enrollment of each class.
- select cno, year , term, count (StudentID) as
enroll - from REGISTRATION
- group by cno, year, term
43Selections on Groups
- Show all CIS courses that are full.
- select cno, count (StudentID)
- from REGISTRATION
- group by cno
- having count (StudentID) gt 29
44Grouping Results after Join
- Calculate the average GPA of each class
select course, avg (GPA) from STUDENT S, CLASS
C where S.StudentID C.SID group by course,
45Nesting Queries
- SELECT attribute(s)
- FROM relation(S)
- WHERE attr not in comparison operator
exists - ( query statement(s) )
- List names of students who are taking BA201
- select Name
- from Student
- where StudentID in
- ( select StudentID
- from REGISTRATION
- where courseBA201)
46Sub Queries
List all students enrolled in CIS
courses select name from STUDENT where
StudentId in (select StudentId from REGISTR
ATION where cno like CIS)
List all courses taken by Student (Id 1011)
select cname from COURSE where cnum
any (select cno from REGISTRATION where S
tudentId 1011)
47Sub Queries
Who received the highest grade in CIS
8140 select StudentId from REGISTRATION where c
num CIS 8140 and grade gtall (select grade
from REGISTRATION where cno CIS 8140)
List all students enrolled in CIS
courses. select name from STUDENT
S where exists (select from REGISTRATION
where StudentId S.StudentId and cno like
CIS)
48Relational Views
- Relations derived from other relations.
- Views have no stored tuples.
- Are useful to provide multiple user views.
- What level in the three layer model do views
belong? - Which kind of independence do they support?
49View Creation
- Create View view-name ( attr , attr ...)
- AS subquery
- with check option
- DROP VIEW view-name
- Create a view containing the student ID, Name,
Age and GPA for those who are qualified to take
300 level courses, i.e., GPA gt2.0.
50View Options
- With Check Option enforces the query condition
for insertion or update - To enforce the GPA gt2.0 condition on all
new student tuples inserted into the view - A view may be derived from multiple base
relations - Create a view that includes student IDs,
student names and their instructors names for
all CIS 300 students.
51View Retrieval
- Queries on views are the same as that on base
relations. - Queries on views are expanded into queries on
their base relations. - select Name, Instructor-Name
- from CIS300-Student
- where Name Instructor-Name
52View Update
- Update on a view actually changes its base
relation(s)! - update Qualified-Student
- set GPA GPA-0.1
- where StudentID s3
- insert into Qualified-Student
- values ( s9, Lisa, 4.0 )
- insert into Qualified-Student
- values ( s10, Peter, 1.7 )
-
- Why are some views not updateable?
- What type of views are updateable?
53Non-monotonic queries again!
- Need to use either MINUS or NOT EXISTS!
- Find courses where no student has gpa over 3.5
- Find students who have taken all courses that Joe
has taken - How would you solve these?
54Summary
- SQL is a low-complexity, declarative query
language - The good thing about being declarative is that
internally the query can be changed automatically
for optimization - Good thing about being low-complexity?
- No SQL query ever goes into an infinite loop
- No SQL query will ever take indefinite amount of
space to get the solution - Can be used for highly complex problems!