Title: The relational algebra
1Introduction
- The relational algebra
- The relational calculus
- Query languages
2Set operations
- Both operands must have the same number of
corresponding domains of tuples (i.e., they are
union-compatible), as does the result - Union
- R1 U R2 is the relation containing all tuples
that appear in R1 and R2. - If R1 and R2 have I and J tuples, respectively,
it will have a maximum of (I J) tuples - Set Difference
- R1 - R2 is the relation containing all tuples of
R1 that do not appear in R2. - If R1 and R2 have I and J tuples, respectively,
it will have a maximum of (I) tuples - Intersection
- R1 ? R2 is the relation containing all tuples
that appear only in both R1 and R2. - It can be derived from the set difference
R1-(R1-R2). - Cartesian Product
- R1 X R2 is the relation containing all possible
pairs from the two relations - If R1 has I tuples and M attributes and R2 has J
tuples and N attributes, the resulting relation
has (I J) tuples and (M N) attributes - The result is always a relation
- Involves more than one relation
3Join Operation ?
- Combines attributes of two relations into one.
- Tuples in R1 are related to tuples in R2
- Different from a Cartesian Product (R1 X R2)
because includes a sigma predicate, (i.e., a
Select on a Cartesian Product) - Essential to relational algebra, but
memory-intensive in the practical world, so
vendors try to optimize for these using query
optimization
4?
- Theta join
- R ? ltjoin conditiongt S
- ltjoin conditiongt can be ltconditiongt and
ltconditiongt and and ltconditiongt - ltconditiongt A ? B predicate
- Result contains the sum of the degrees of the two
relations - Equijoin (a type of theta join)
- Equality comparisons only for theta
- Natural join
- If two relations being joined have one attribute
(domain) name in common, assume that the single
attribute in common is the one being compared to
see if a new tuple will be inserted in the result - Result contains the sum of the degrees of the two
relations minus the duplicate attributes after
the first - Outer join (left, right, full) to retain
additional tuples from one or more of the sets
5?
- Join of relations R (a,b),S(b,c),T(c,d) examples
- Natural
- M(a,R.b,c) lt- R ? S
- Theta
- N(b,S.c,T.c,d) lt- S ?S.cT.c T (Equijoin)
- O(b,S.c,T.c,d) lt- S ?S.cltT.c T (Less than)
- Outer
- N(b,S.c,T.c,d) lt- S S.cT.c T (Left)
- N(b,S.c,T.c,d) lt- S S.cT.c T (Right)
- N(b,S.c,T.c,d) lt- S S.cT.c T (Full)
- Added to manage distributed transactions
- Semi
- P(b,S.c) lt- S S.cltT.c T
6Join Operation ?
- Combines attributes of two relations into one.
- Tuples in R1 are related to tuples in R2
- Different from a Cartesian Product (R1 X R2)
because includes a sigma predicate, (i.e., a
Select on a Cartesian Product) - Essential to relational algebra, but
memory-intensive in the practical world, so
vendors try to optimize for these using query
optimization
7?
- Theta join
- R ? ltjoin conditiongt S
- ltjoin conditiongt can be ltconditiongt and
ltconditiongt and and ltconditiongt - ltconditiongt A ? B predicate
- Result contains the sum of the degrees of the two
relations - Equijoin (a type of theta join)
- Equality comparisons only for theta
- Natural join
- If two relations being joined have one attribute
(domain) name in common, assume that the single
attribute in common is the one being compared to
see if a new tuple will be inserted in the result - Result contains the sum of the degrees of the two
relations minus the duplicate attributes after
the first - Outer join (left, right, full) to retain
additional tuples from one or more of the sets
8?
- Join of relations R (a,b),S(b,c),T(c,d) examples
- Natural
- M(a,R.b,c) lt- R ? S
- Theta
- N(b,S.c,T.c,d) lt- S ?S.cT.c T (Equijoin)
- O(b,S.c,T.c,d) lt- S ?S.cltT.c T (Less than)
- Outer
- N(b,S.c,T.c,d) lt- S S.cT.c T (Left)
- N(b,S.c,T.c,d) lt- S S.cT.c T (Right)
- N(b,S.c,T.c,d) lt- S S.cT.c T (Full)
- Added to manage distributed transactions
- Semi
- P(b,S.c) lt- S S.cltT.c T
9Division Operation
- A relation over the attributes in R1 defined over
the attributes in R3 that match the combination
of every tuple in R2, where R3 is the set of
attributes in R1 but not R2 - Restated For R1R2, compute all x values in R1
that are not disqualified by some y value in R2
(i.e., x tuple is disqualified if, by attaching y
tuple from R2, we obtain an xy tuple that is not
in R1) - Restated If x is in R1 and R2 ,and y is only in
R1, then R1/R2 produces all tuples containing y
from R1 that have a corresponding x for all x
values in R2 - p ltcgt(R1) p ltcgt((R2 X (p ltcgt(R1))) R1)
Show all salespeople calling on companies having
more then 10 billion in revenue R lt-
?Result(pEmpID,Comp(Contacts))(pComp(sRevgt10(Comp
anies))) Company Contacts / Companiesgt10billion
-gt R EmpID Comp. Comp. Result 56 IBM GE 56
76 GE IBM 56 GE 62 GM
10Division - explored
A
B1
A/B1
- Man Frm
- Lilly tab
- Lilly cap
- Lilly syr
- Lilly sup
- SKB tab
- SKB cap
- GSK cap
- PF cap
- PF sup
Frm cap
Man Lilly,SKB,GSK,PF
B2
A/B2
Man Lilly,PF
Frm cap,sup
A/B3
B3
Frm tab,cap,sup
Man Lilly
T1 lt- Attributes of A not in Bx T2 lt- Cartesian
Product of Bx and T1 T3 lt- Set difference of T2
and A T4 lt- Attributes of T3 not in Bx T5 lt- T1
T4
11Cover
- All relational algebra operations can be
expressed from this set of operations - s
- p
- U
- -
- X
- But inconvenient to leave out intersection
- ?
- Also useful to have
- Rename (rho ?)
- Grouping (?)
- Aggregation (F)
12Relational algebra enhancements
- Not derivable from basic operations
- Aggregation (FA)
- COUNT, SUM, AVG, MIN, MAX
- Must consider how they use null values
- Grouping (?A) or (GAF)
- Allows congregation of relation tuples into
groups according to common values in attributes
13Relational algebra queries
- Expression simplification is an important query
optimization technique, which can affect the
running time of queries by an order of magnitude
or more. - early "selection" reduces the number of tuples
- early "projection" reduces the number of domains
14Relational Operations
- Logical equivalences follow the following laws
- Identity, Domination, Idempotent, Double negation
- Commutative
- R1 n R2 ? R2 n R1
- Associative
- (R1 n R2) n R3 ? R1 n (R2 n R3)
- Distributive
- R1 U (R2 n R3) ? (R1 U R2) n (R1 U R3)
- R1 n (R2 U R3) ? (R1 n R2) U (R1 n R3)
- Can be applied to Union, Intersection,
Cross-Product, Joins - Not applicable to Set Difference or Division
- Select commutes with Project only if Select
attributes are in Project relation
15Relational Operations
- Transformations
- Double negation
- (s(S)) ? s(S)
- Nesting cascade
- sc2(sc1(S)) ? sc1(S) ? sc2(S)
- More equivalencies
- R ?c S ? sc(R X S)
- pa(sc (S)) ? sc(pa (S)) if c ? a
- sc(R X S) ? sc(R) X S if c attributes ? R
- sc(R ? S) ? sc(R) ? S if c attributes ? R
- sc1(sc2(sc3(R X S))) ? sc1(sc2(R) X sc3(S)) if
attributes are members of their respective
relation
16Relational Algebra
- Precedence is relatively unset and vendors vary
- In general
- Unary
- Muliplicative
- Additive
- Best practice
- Parenthetical
17The Relational Calculus
- High-level
- Declarative non-procedural
- From a branch of symbolic logic known as
predicate calculus - A predicate is a truth-valued function with
arguments - Replace arguments with values to obtain a
proposition - Determine if proposition is true or false
- Two forms
- Tuple relational calculus (another Codd creation)
- Domain relational calculus
- Declares what is to be retrieved, not how to
retrieve it - Requires a well-formed formula
- Identical expressive power with relational
algebra - The basis for a relationally complete language
(i.e., a language is relationally complete if any
query expressed by the relational calculus can
also be expressed by the language)
18Tuple Relational Calculus
- Tuple Relational Calculus - variables range over
tuples in a relation (tuple variables) - T F(T)
- Interpreted as Find the set of all tuples T such
that the formula F is true -
- F is a well-formed formula defining the
predicate multiple predicates are connected
using AND (/\), OR(\/) and NOT( or ) -
- T on the right is the set of all tuple variables
containing values that make F true -
- The left-hand T is free, the right becomes bound
by conditions -
- Can be unsafe (t (Employees(t))), which
yields all tuples in the universe not in the
Employees set of tuples, so the concept of
domain of a tuple relational calculus
expression was established (an method of
creating a (domain) range variable was proposed
by Date to first constrain to the range of the
expression).
19Tuple Relational Calculus
- Tuple Relational Calculus - variables range over
tuples in a relation (tuple variables) - Tuple Relational Calculus S.x1,S.x2,,S.xn
p(S.x1,S.x2,S.xm) m gt n - Generalization
- Find the set of all tuples S such that F(S) is
true SF(S) - F is a wff (well-formed formula)
- Can be thought of as
- What is retrieved predicate(s) to be
satisfied - With free variables on the left (within the
domain of the expression) and bound variables on
the right
Example Single relation List the names of all
managers who earn more than 25,000. S.Name
Staff(S) /\ S.position manager /\ Salary gt
25000 Multiple relations List names of staff
who manage properties for rent in
Glasgow S.Name Staff(S) /\ ?(P)(PropertyForRen
t(P) /\ (P.staffno S.staffno) /\ P.city
Glasgow) Compare with relational
algebra List all cities where there is a branch
office but no properties for rent TRC B.city
Branch(B) /\ ((?P)(PropertyForRent(P) /\ B.city
P.city)) RA ?city (Branch) - ?city
(PropertyForRent) and SQL SELECT DISTINCT
City FROM Branch WHERE City NOT IN (SELECT City
FROM PropertyForRent)
20Tuple Relational Calculus
- To be a well-formed formula, it must be made up
of one or more combinations of the following
predicate calculus atoms using logical operators
/\, \/ or -
- R(Sl), where R is a relation and Sl is a tuple
variable -
- Sl.aeT Sm.af where S represents tuple variables
ae represents attributes of a relation over which
Sl ranges S represents tuple variables af
represents attributes of a relation over which Sm
ranges and T (theta) represents comparison
operators (lt, lt, gt, gt, , ltgt). -
- Sl.aeT c where S represents tuple variables a
represents attributes of a relation over which S
ranges c represents a constant from the domain
of a T (theta) represents comparison operators
(lt, lt, gt, gt, , ltgt).
21Domain Relational Calculus
- Domain Relational Calculus variables range over
the domains of attributes - d1,d2,dn F(d1,d2,dm)
- d1,d2,dn are domain variables
-
- Predicate requires finding a tuple containing a
value in each domain that satisfies the
proposition
Example Find the names of managers who earn more
than 25000 N (?N,pos,sal)(Staff(N,pos,sal) /\
posmanager /\ sal gt 25000 Example List
all cities where there is either a branch office
or a property for rent DRC city(Branch(bN,st,c
ity,pc)\/(PropertyForRent(pN,st1,city,pc1,rms))
RA ?city (Branch) ? ?city (PropertyForRent)
22Domain Relational Calculus
- To be a well-formed formula, it must be made up
of one or more combinations of the following
predicate calculus atoms using logical operators
/\, \/ or -
- Of the form R(s1,s2,s3,,sn), where R is a
relation name of degree n and sn is a domain
variable -
- sl T sm where s represents domain variables a
represents attributes of a relation over which S
ranges T (theta) represents comparison operators
(lt, lt, gt, gt, , ltgt) for comparing values from
comparable domains -
- sl T c where s represents a domain variable c
represents a constant from the domain of sl T
(theta) represents comparison operators (lt, lt, gt,
gt, , ltgt) for comparing values from domain sl
with c.
23The Relational Calculus
- Find the names and ages of all students with a
gpa above 3 - TRC
- P ?S?Students(S.gpagt3?P.nameS.name ?
P.ageS.age) - DRC
- ltN,AgtltI,N,T,Agt ?Students ? T gt 3
- Binding variables using quantifiers
- ? - there is at least one set of tuple values
(existential) - ? - all tuple values in a set (universal)
- DeMorgans law can be applied, for example
- (?X)(F(X)) (?X)((F(X))
- (?X)(F(X)) (?X)((F(X))
- (?X)(F1(X) /\ F2(X)) (?X)((F1(X)) \/
(F2(X))) - (?X)(F1(X) /\ F2(X)) (?X)((F1(X)) \/
(F2(X))) -
24Relationally complete
BranchID LoanNumber Loan Value 1 1 1000 1
2 3000 2 1 1200 3 1 1500
- Consider the following
- Bank Branches give loans
- Each loan has a loan number
- Each loan has a value
- Show the branches and loan numbers with a loan
value greater than 1200 - DRCltl,b,agt ltl,b,agt ? loan ? a gt 1200
- TRCt t ? loan ? a gt 1200
- RA sagt1200Loans
- Show the loan number for any loans values greater
than 1200 - DRCltlgt ?b,a (ltl,b,agt ? loan ? a gt 1200
- TRCt ?s ? loan(t.l s.l ? s.a gt 1200
- RA pl(sagt1200Loans)
25Query By Example
- The foundation for Query By Example (QBE) is
considered to primarily be domain relational
calculus - A graphical interface where the user enters
values for what is wanted - Originally develop by IBM in the 1970s
- Now in almost all DBMSs
- Example
- Show the name of all customers less than 18
years old who reserved a tape on 10/02/02 - An example using 2 tables
26DDL
- CREATE
- SCHEMA
- DOMAIN
- TABLE
- VIEW
- INDEX
- ALTER
- DOMAIN
- TABLE
- DROP
- SCHEMA
- DOMAIN
- TABLE
- VIEW
- INDEX
27CREATE
- Example 1 CREATE TABLE TableName (
- ColumnName1 DataType NOT NULL,
- ColumnName2 DataType,
- ColumnName3 DataType DEFAULT Yes,
- ColumnName4 DataType CHECK (VALUE IN (SELECT
col003 FROM table2)), - PRIMARY KEY pkName (ColumnName1,ColumnName2),
- FOREIGN KEY fkName (ColumnName2) REFERENCES
Table5 (col006) ON DELETE NO ACTION ON UPDATE
CASCADE - Example 2 CREATE TABLE TableName (
- ColumnName1 DataType CONSTRAINT pkName PRIMARY
KEY, - ColumnName2 DataType CONSTRAINT fkName
REFERENCES Table5 (col006) ON DELETE NO ACTION ON
UPDATE CASCADE nnName NOT NULL, - ColumnName3 DataType CONSTRAINT dfName DEFAULT
Yes, - ColumnName4 DataType CONSTRAINT ckName CHECK
(VALUE IN (SELECT col003 FROM table2))
28ALTER
- ALTER TABLE TableName ADD COLUMN (ColumnName
DataType) - ALTER TABLE TableName ALTER COLUMN (ColumnName
DataType) - ALTER TABLE TableName DROP COLUMN ColumnName
29DROP
30INDEX
- CREATE INDEX IndexName ON TableName (ColumnName1,
ColumnName2) DESC - DROP INDEX IndexName
31Views
- CREATE VIEW ViewName AS
- SELECT col001, col005 FROM TableName WHERE
col007 value - DROP VIEW ViewName
- Views are updatable only if
- DISTINCT, GROUP BY, multiple tables and nested
tables referenced in the FROM clause are not
included - Every attribute is the sole pointer to only a
single column in the base table - The integrity constraints of the base table are
not violated - View materialization may improve query speed, but
delays updates