Title: Operations in the Relational Model
1Operations in the Relational Model
- Focused on database queries in an abstract point
of view - relational algebra
- Datalog
2Relational algebra an overview
- a relational query language
- Not expressible directly by users, but used in
database systems internally - Includes a number of operations to manipulate
relations. - Used to specify retrieval requests (queries)
- essentially DML
- Query result is also in the form of a relation
3Relational algebra operations
- SELECT and PROJECT operations.
- Set operations
- UNION
- INTERSECTION
- SET DIFFERENCE
- CARTESIAN PRODUCT
- JOIN operations
- Other relational operations
- DIVISION
- OUTER JOIN
- AGGREGATE FUNCTIONS
- SUM, MIN, MAX, COUNT, AVERAGE
4Set operation
- Binary operations from mathematical set theory
- union R ? S
- intersection R ? S
- set difference R S
- For set operations, the operand relations
R(A1,A2,,An) and S(B1,B2,,Bn) must have the
same number of attributes, and the domains of
corresponding attributes must be compatible that
is, dom(Ai) dom(Bi) for i 1,2,,n. This
condition is called union compatibility - Duplicate tuples are eliminated!
- If R and S have different attributes names,
rename them
5Examples of set operations
- Relation R and S
- R ? S
- R ? S
- R S
6Project operation (?)
- Keep only certain attributes (columns) from a
relation R specified in an attribute list L - ?L(R)
- Resulting relation has only those attributes of R
specified in L - Example ? title,year,length(Movie)
-
? - In principle, PROJECT eliminates duplicate tuples
in the resulting relation so that it remains a
mathematical set (no duplicate elements)
7Select operation (?)
- Selects (retrieves) the tuples from a relation R
that satisfy a certain selection condition c - ?c(R)
- The condition c is an arbitrary Boolean
expression on the attributes of R - Resulting relation includes each tuple in r(R)
whose attribute values satisfy the condition c - Example
- ?length ? 100(Movie)
- ?length ? 100 AND studioName Fox(Movie)
8Cartesian product (?)
- T(A1,A2,,An,B1,B2,,Bm) ? R(A1,A2,,An) ?
S(B1,B2,,Bm) - A tuple t exists in T for each combination of
tuples t1 from R and t2 from S such that
tA1,A2,,An t1 and tB1,B2,,Bm t2 - If R has n1 tuples and S has n2 tuples, then T
will have n1 n2 tuples - CARTESIAN PRODUCT is a meaningless operation on
its own ! - Example
9Natural join (?)
- In a natural join, the redundant join attributes
appear only once in the result - The equality condition is implied and need not be
specified - Example
10Theta-join
- Cartesian product followed by a SELECT
- T(A1,A2,,An, B1,B2,,Bn) ? R(A1,A2,,An) ?c
S(B1,B2,,Bn) - c the join condition
- Equijoin
- the join condition c includes one or more
equality comparisons involving attributes
11Queries expression
- Several operations can be combined to form a
relational algebra expression (query) - What are the titles and years of movies made by
Fox that are at least 100 minutes long - ?title,year(?length?100 (MOVIE) ?
?studioNameFox (MOVIE)) - ?title,year(?length?100 AND studioNameFox
(MOVIE))
12Queries expression example
- Movie1(title, year, length, filmType,
studioName)Movie2(title, year, starName) - Find the stars of movies that are at least 100
minutes long?starName(?length?100(Movie1 ?
Movie2))
13Renaming
- Change attribute name or relation name (denoted
by ?) - ?s(A1,A2,,An)(R)
- The resulting relation has exactly same tuples as
R, but the name of relation is S - ?s(R), to change the relation name only
- Example
14Complete set of relational algebra
- Some of the operations can be expressed in terms
of other relational-algebra operations - R ? S R ? (R ? S)
- R ?C S ?C(R ? S)
- R ? S ?L(?C(R ? S))
- ? , ?, ?, ?, ?, ? is the complete set of
relational algebra operation - none of which can be written in terms of the
others - Any query language equivalent to these operations
is called relational complete - additional operations that were not part of the
original relational algebra - Aggregate functions and grouping
- division
- Outer join, outer union
15Division operation (?)
- is useful for request that occurs frequently in
database applications - to formulate for all (?) semantic
- join is used to formulate there exists (?)
T ? R ? S
16Division example
- Retrieve the SSNs of employees who work on all
the projects that 'John Smith' works on - SSN_PNOS ? SMITH_PNOS
17Outer join
- In a regular join, tuples in R1 or R2 that do not
have matching tuples in the other relation do not
appear in the result - Some queries require all tuples in R1 (or R2 or
both) to appear in the result - When no matching tuples are found, nulls are
placed for the missing attributes - Left outer join R1 R2 lets every tuple in R1
appear in the result - Right outer join R1 R2 lets every tuple in R2
appear in the result - Full outer join R1 R2 lets every tuple in R1
or R2 appear in the result
18Outer union (union join)
- used when not union compatible
- student(name, ssn, department, advisor)faculty(na
me, ssn, department, rank)student outer-union
faculty - R(name, ssn, department, advisor, rank)
- all tuples from both relations are included
- for student tuple, null for rank attribute
- for faculty tuple, null for advisor attribute
19full outer join vs. outer union
- T1 outer-join c2c3 T2
- T1 outer-union T2
20Deductive databases
- Deductive databases
- Database Theorem proving Logic programming
- Apply AI techniques (logic and resolution) to
databases - Very active research area since mid 80s
- Major contributions to database technology
- Solid formalization of databases in a logical way
- Recursive query processing
- Theoretical provision of query processing
- Inclusion of incomplete knowledge into databases
- and so on
- Major prototypes
- Datalog (J.D. Ullman)
- LDL (S. Naqvi and S. Tsur)
- CORAL (U. of Wisconsin at Madison)
21Basis
- predicate
- P, Q, R,
- variable x,y,z,
- atom a predicate followed by its arguments
- P(x,y,z), Q(x,y,z,w), R(x),
- positive literal
- arithmetic atom x lt y, x1 gt y4?z,
- literal atom or negated atom
- clause a disjunction of literals
- Horn clause a clause with at most one positive
literal
22How to model DB
- R(1,2) is true while R(5,6) is false
- Presents databases as conjunction of logical
formulas - DB EDB (extensional databases) IDB
(intensional databases) UNA (unique name
axiom) DCA (domain closure axiom) EQA
(equality axiom) etc. - EDB corresponds to relations that are stored in
databases - IDB corresponds to virtual relations that are
computed as needed
23Datalog (Database logic)
- Datalog rule
- relational atom (head) ? body consisting of one
or more literals (called subgoal) - Note
- there is only one predicate as head literal
- head literal must be positive
- body literals must be connected by AND
- negated atom is allowed to occur only in the body
- arithmetic atom is allowed to occur only in the
body
24Datalog example
- P(x,y) ? Q(x,z) AND R(z,y) AND NOT Q(x,y)Q(x,z)
lt1,2gt, lt1,3gtR(z,y) lt2,3gt, lt3,1gt - Tuple lt1,1gt is the only one in P
25Datalog example
- Movie(title, year, length, inColor, studioName,
producerC) - Query find movie title and year whose run time
is at least 100 minutes long - LongMovie(t,y) ? Movie(t,y,l,c,s,p) AND l ?
100LongMovie(t,y) ? Movie(t,y,l,_,_,_) AND l ?
100 // anonymous variable - LongMovie ?title, year(?length?100(Movie))
26Not all formulae are valid !!!
- Not safe formulas
- biggerthan(x,y) ? x gt y
- loves(x,y) ? lover(x)
- Definition limited variable
- any variable that appears as an argument in a
relational atom of the body is limited - any variable X that appears in a subgoal Xa or
aX, where a is a constant, is limited - variable X is limited if it appears in a subgoal
XY or YX, where Y is a variable already known
to be limited - Definition A rule is safe if all its variables
are limited - P(x,y) ? Q(x,z) AND R(w,x,z) AND xlty // not
safe - P(x,y) ? Q(x,z) AND wa AND yw // safe
27From relational algebra to Datalog (1)
- R ? S
- I(n,a,g,b) ? R(n,a,g,b) AND S(n,a,g,b)
- R ? S
- U(n,a,g,b) ? R(n,a,g,b)U(n,a,g,b) ? S(n,a,g,b)
- R ? S
- D(n,a,g,b) ? R(n,a,g,b) AND S(n,a,g,b)
- ?
- P(t,y,l) ? Movie(t,y,l,_,_,_)
- ?
- P(a,b,c,d,w,x,y,z) ? R(a,b,c,d) AND S(w,x,y,z)
28From relational algebra to Datalog (2)
- ?
- ?length?100 AND studioName Fox(Movie)S(t,y,l,
c,s,p) ? Movie(t,y,l,c,s,p) AND l ? 100 AND s
Fox - ?length?100 OR studioName Fox(Movie)
S(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND l ? 100
S(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND s Fox - Any logical expression has an equivalent
disjunctive normal form, in which the expression
is the OR of conjuncts - one rule for each conjuncts
- ?NOT (length?100 OR studioName
Fox)(Movie)?lengthlt100 AND studioName ?
Fox)(Movie) S(t,y,l,c,s,p) ?
Movie(t,y,l,c,s,p) AND l lt 100 AND s ? Fox
29From relational algebra to Datalog (3)
- ? (natural join)
- J(a,b,c,d) ? R(a,b) AND S(b,c,d)
- theta-join
- U(A,B,C) V(B,C,D)
- U ? AltD AND U.B ? V.B VJ(a,ub,uc,vb,vc,d) ?
U(a,ub,uc) AND V(vb,vc,d) AND altd AND ub ? vb - U ? AltD OR U.B ? V.B VJ(a,ub,uc,vb,vc,d) ?
U(a,ub,uc) AND V(vb,vc,d) AND altd
J(a,ub,uc,vb,vc,d) ? U(a,ub,uc) AND V(vb,vc,d)
AND ub ? vb
30More example
- W(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND l ?
100X(t,y,l,c,s,p) ? Movie(t,y,l,c,s,p) AND s
FoxY(t,y,l,c,s,p) ? W(t,y,l,c,s,p) AND
X(t,y,l,c,s,p)Z(t,y) ? Y(t,y,l,c,s,p) - equivalentlyZ(t,y) ? Movie(t,y,l,c,s,p) AND l ?
100 AND s Fox
31Recursive query in Datalog
- ancestor(x,y) ? parent(x,y)ancestor(x,y) ?
parent(x,z) and ancestor(z,y) - Query find all ancestors of Park ?
ancestor(x, Park) // recursive query - Such query is beyond the expressiveness of
relational algebra expression - Note
- there is only one recursive predicate in the body
(linear recursion) - there is a linear variable pattern
- the recursive predicate is always positive
- there is always an exit rule
32Least fixed point
- Consider an equation Rf(R) where f is a
relational algebra expression - A least fixed point of the equation is a relation
R such that - R f(R) and
- If R is any relation such that R f(R), then R?
R - Thm A unique least fixed point exists if f is
monotonic, in which in the context of the partial
order ? on relations means thatif R1 ? R2 ,
then f(R1) ? f(R2) - Thm Any relational algebra expression that does
not use the set difference operator is monotonic
33Computing the least fixed point
- FollowOn(x,y) ? sequel(x,y)FollowOn(x,y) ?
sequel(x,z) and FollowOn(z,y) - first round third round
- second round
34Non-linear recursion example 1
- Flight(airline, from, to, departs,
arrives)Reaches(x,y) ? Flight(a,x,y,d,r)Reaches(
x,y) ? Reaches(x,z) and Reaches(z,y) - The first round generates 7 tuples
- The second round generates 3 tuples (SF,CHI),
(DEN,NY), (SF,NY) - The third round generates no new tuple, so stop
35Another example
- Flight(airline, from, to, departs,
arrives)connects(x,y,d,r) ? Flight(a,x,y,d,r)con
nects(x,y,d,r) ? connects(x,z,d,t1) ?
connects(z,y,t2,r) ? t1 ? t2-100 - fourth round gives no new tuples, so stop
36Negated recursion example 1
- UAreaches defines pairs of cities such that UA
flies UAreaches(x,y) ? Flight(UA,x,y,d,r)UAreach
es(x,y) ? UAreaches(x,z) ? UAreaches(z,y) - AAreaches defines pairs of cities such that AA
flies AAreaches(x,y) ? Flight(AA,x,y,d,r)AAreach
es(x,y) ? AAreaches(x,z) ? AAreaches(z,y) - UAonly defines pairs of cities such that UA flies
but AA does notUAonly(x,y) ? UAreaches(x,y) ?
?AAreaches(x,y) - First find instances of UAreaches and AAreaches,
then perform set difference
37Negated recursion example 2
- R(0) P(x) ? R(x) ? ?Q(x) // P R ? Q
Q(x) ? R(x) ? ?P(x) // Q R ? P - Informally, an element x in R is either in P or
in Q, but not both - Since R contains only one tuple (0), we know only
(0) can be in either P or Q, but not both - Two solutions are possible
- P (0), Q ?
- P ?, Q (0)
- Since there are two least fixedpoints, we cannot
answer a simple question such as Is P(0) true !
38Stratification
- Confine us to recursions in which negation is
stratified - SQL3 supports stratified recursion only
- Stratification
- Draw a graph whose node is IDB predicate
- Draw an arc from node A to node B if a rule has A
? B ... - Attach - to the edge if B is negative
- If the graph has a cycle with one or more
negative arcs, the recursion is not stratified - Stratum of a predicate A is the largest number of
negative arcs on a path beginning from A - Evaluate predicate with lower strata first
39Stratification example 1
- UAreaches(x,y) ? Flight(UA,x,y,d,r)UAreaches(x,y)
? UAreaches(x,z) ? UAreaches(z,y)
AAreaches(x,y) ? Flight(AA,x,y,d,r)AAreaches(x,y
) ? AAreaches(x,z) ? AAreaches(z,y)UAonly(x,y) ?
UAreaches(x,y) ? ?AAreaches(x,y) - AAreaches and UAreaches are in stratum 0UAonly
has stratum 1 - Hence, evaluate AAreaches and UAreaches first
40Stratification example 2
- P(x) ? R(x) ? ?Q(x) Q(x) ? R(x) ? ?P(x)
- There is a negative cycle, hence the rules are
not stratified
41Constraints in relation model
- Constraints are conditions that must hold on all
valid relation instances - Three major constraints
- key constraints
- entity integrity constraint
- referential integrity constraint
42Entity Integrity Constraints
- The primary key attributes PK of each relation
schema R cannot have null values in any tuple of
r(R) - This is because primary key values are used to
identify the individual tuples - tPK ? null for any tuple t in r(R)
- Other attributes of R may be similarly
constrained to disallow null values, even though
they are not members of the primary key
43Why referential integrity in relational model
- Values in Dep_id of Student should appear in NO
of Department (Why?) - Dep_id is said to reference to NO of Department
- Dep_id is called foreign key
- Student referential relation, Department
referenced relation - A directed arc from Dep_id of Student to NO of
Department
44Referential Integrity Constraints
- A constraint involving two relations (the
previous constraints involve a single relation). - Tuples in the referencing relation R1 have
attributes FK (foreign key attributes) that
reference the primary key attributes PK of the
referenced relation R2. A tuple t1 in R1 is said
to reference a tuple t2 in R2 if t1FK t2PK
or t1 is null. - displayed as a directed arc from R1.FK to R2.PK
- May exist at one relation !
45Referential constraint example
- Movie(title, year, length, inColor, studioName,
producerC)MovieExec(name, address, cert,
netWorth) - Producer of every movie would have to appear in
the MovieExec relation - ?producerC(Movie) ? ?cert(MovieExec)
or?producerC(Movie) ?cert(MovieExec)
?(Presuming that NULL is not allowed in
producerC of Movie)
46Additional constraint examples
- MovieStar(name, address, gender, birthdate)
- Functional dependency constraint
- name ? address
- That fd can be expressed with relational algebra
- ?MS1.name MS2.name and MS1.address ?
MS2.address(MS1 ? MS2) ?where MS1 denotes
?MS1(name,address,gender,birthdate)(MovieStar) - Domain constraint
- The only legal value for the gender attribute are
F and M - ?gender ? F and gender ? M(MovieStar) ?
47Constraints should be enforced wrt updates
- DB update operations INSERT, DELETE, MODIFY
tuple - Integrity constraints should not be violated by
the update operations - Several update operations may have to be grouped
together (i.e. transaction) - Updates may propagate to cause other updates
(trigger) to be performed automatically to
maintain integrity constraints - In case of integrity violation, several actions
can be taken - cancel the operation that causes the violation.
- perform the operation but inform the user of the
violation. - trigger additional updates so the violation is
corrected. - execute a user-specified error-correction routine.
48Relational Operations on Bags
- Commercial database systems rarely support notion
of set, instead do support multiset (bag) as
relation - Relations to be bags rather than sets can speed
up operations on relations - projection operation
- bag allows us to work with each tuple
independently - With set, we need to compare with the result of
all other projected tuples - Aggregate (say average) operation
49Union, intersection, and difference of bags
- if R and S are bag in which the tuple t appears n
and m times respectively, - tuple t appears mn times in R?S
- tuple t appears min(n,m) times in R?S
- tuple t appears max(0, n-m) times in R?S
50Projection of bags
- Each tuple is processed independently during the
projection - duplicate tuples are not eliminated from the
result of a bag-projection - ?A,B(R)
51Selection on bag
- Apply the selection condition to each tuple
independently - do not eliminate duplicate tuples in the result
- ?C?6(R)
52Product of bags
- Each tuple of one relation is paired with each
tuple of the other, regardless of whether it is a
duplicate or not - if a tuple r appears in relation R m times, and
tuple s appears n times in relation S, then the
tuple rs will appear mn times in the product R ?
S
53Joins of bags
- Compare each tuple of one relation with each
tuple of the other. - When constructing the answer, do not eliminate
duplicate tuples - Example
54Datalog Rules Applied to Bags
- The techniques for computing selections,
projections and joins of bags can be applied to
Datalog rules - do not eliminate duplicates from the head
- H(x,z) ? R(x,y) AND S(y,z)
- H(x,y) ? S(x,y) AND x gt 1H(x,y) ? S(x,y) AND y lt
5
55Algebraic laws for bags
- An algebraic law is an equivalence between two
expressions of relational algebra whose arguments
are variables standing for relations - the commutative law for union R ? S ? S ? R
- There are a number of laws that hold with set,
but do not hold with bag - Distributive law of set difference over union(R
? S) T ? (R T) ? (S T) (Think about WHY
!!!)
56Extensions to the relational model
- Other concepts and operations that are not a part
of the formal relational model but appear in real
query language - Modifications
- Aggregations
- Views
- Null value