Title: CSE544
1CSE544
- Wednesday, March 29, 2006
2Theory
- Relational databases were invented by a theoretician (Codd)
- Fundamental principle: separate the WHAT from the HOW (data independence)
- WHAT = First Order Logic (FO)
- HOW = Relational algebra (RA)
3FO Syntax
- Given
- A vocabulary R1, ..., Rk
- An arity, ar(Ri), for each i = 1, ..., k
- An infinite supply of variables x1, x2, x3, ...
- Constants c1, c2, c3, ...
4FO Syntax
- Terms (t) and FO formulas (φ) are:
- t ::= x | c
- φ ::= R(t1, ..., t_ar(R)) | ti = tj
- | φ ∧ φ' | φ ∨ φ' | ¬φ
- | ∃x.φ | ∀x.φ
5FO Examples
Most interesting case: the vocabulary is one binary relation R (encodes a graph)
[Figure: graph on nodes 1, 2, 3, 4]
R = {(1,2), (2,1), (2,3), (1,4), (3,4)}
6FO Sentences
- Does there exist a loop in the graph? φ ≡ ∃x.R(x,x)
- Are there paths of length > 2? φ ≡ ∃x.∃y.∃z.∃u.(R(x,y) ∧ R(y,z) ∧ R(z,u))
- Is there a sink node? φ ≡ ∃x.∀y.R(x,y)
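The three sentences above can be checked by brute force on the example graph. The following is a minimal Python sketch, assuming the domain is just the four nodes 1 to 4 (the slides do not fix a domain); the variable names are illustrative only.

```python
# Example relation R from the slides, as a set of (source, target) pairs.
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}
# Assumption: the domain is exactly the nodes of the example graph.
D = {1, 2, 3, 4}

# exists x. R(x,x)
has_loop = any((x, x) in R for x in D)

# exists x. exists y. exists z. exists u. (R(x,y) and R(y,z) and R(z,u))
has_path3 = any(
    (x, y) in R and (y, z) in R and (z, u) in R
    for x in D for y in D for z in D for u in D
)

# exists x. forall y. R(x,y), the formula given for the third question
third_sentence = any(all((x, y) in R for y in D) for x in D)

print(has_loop, has_path3, third_sentence)  # prints: False True False
```

Each existential quantifier becomes `any` and each universal quantifier becomes `all`, both ranging over the whole domain.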
7Semantics
- Given a vocabulary R1, ..., Rk
- A model is D = (D, R1^D, ..., Rk^D)
- D = a set, called the domain, or universe
- Ri^D ⊆ D × D × ... × D (ar(Ri) times), for i = 1, ..., k
8Semantics
- Given
- A model D = (D, R1^D, ..., Rk^D)
- A formula φ
- A substitution s: {x1, x2, ...} → D
- We define next the relation D ⊨ φ[s], meaning "D satisfies φ with s"
9Semantics
D ⊨ (R(t1, ..., tn))[s] if (s(t1), ..., s(tn)) ∈ R^D
D ⊨ (t = t')[s] if s(t) = s(t')
10Semantics
D ⊨ (φ ∧ φ')[s] if D ⊨ φ[s] and D ⊨ φ'[s]
D ⊨ (φ ∨ φ')[s] if D ⊨ φ[s] or D ⊨ φ'[s]
D ⊨ (¬φ)[s] if not D ⊨ φ[s]
11First Order Logic Semantics
D ⊨ (∀x.φ)[s] if for all s' s.t. s'(y) = s(y) for all variables y other than x, D ⊨ φ[s']
D ⊨ (∃x.φ)[s] if for some s' s.t. s'(y) = s(y) for all variables y other than x, D ⊨ φ[s']
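The satisfaction relation defined on the Semantics slides can be turned directly into a recursive checker when the domain is finite. The sketch below is one possible encoding, not part of the lecture: formulas are nested tuples, variables are strings, constants are non-string values, and the tag names ("rel", "eq", "and", and so on) are all assumptions made for illustration.

```python
def value(term, s):
    # Assumed convention: variables are strings, constants are non-strings.
    return s[term] if isinstance(term, str) else term


def satisfies(model, phi, s):
    """Return True if the model satisfies formula phi under substitution s."""
    domain, relations = model
    op = phi[0]
    if op == "rel":  # R(t1, ..., tn)
        _, name, terms = phi
        return tuple(value(t, s) for t in terms) in relations[name]
    if op == "eq":  # t = t'
        return value(phi[1], s) == value(phi[2], s)
    if op == "and":
        return satisfies(model, phi[1], s) and satisfies(model, phi[2], s)
    if op == "or":
        return satisfies(model, phi[1], s) or satisfies(model, phi[2], s)
    if op == "not":
        return not satisfies(model, phi[1], s)
    if op in ("forall", "exists"):
        _, x, body = phi
        # Each s' agrees with s on every variable other than x.
        results = (satisfies(model, body, {**s, x: d}) for d in domain)
        return all(results) if op == "forall" else any(results)
    raise ValueError(f"unknown connective: {op!r}")


# The example graph, with the assumed finite domain {1, 2, 3, 4}.
model = ({1, 2, 3, 4}, {"R": {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}})
loop = ("exists", "x", ("rel", "R", ("x", "x")))
no_out = ("exists", "x", ("forall", "y", ("not", ("rel", "R", ("x", "y")))))
print(satisfies(model, loop, {}), satisfies(model, no_out, {}))
```

The checker enumerates the whole domain at every quantifier, so its cost grows exponentially with the quantifier depth and it cannot handle an infinite domain.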
12FO and Databases
- FO: a sentence φ is true in D if D ⊨ φ
- Databases: a formula φ with free variables x1, ..., xn defines the query φ(D) = {(s(x1), ..., s(xn)) | D ⊨ φ[s]}
13FO Queries
- Find all pairs of nodes connected by a path of length 2: φ(x,y) ≡ ∃u.(R(x,u) ∧ R(u,y))
- Find all nodes without outgoing edges: φ(x) ≡ ∃u.(R(u,x) ∧ ∀y.¬R(x,y))
These are open formulas
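An open formula returns the set of bindings of its free variables that make it true. As a rough illustration, the two queries above can be written as Python set comprehensions over the example graph, assuming the domain is the node set {1, 2, 3, 4}.

```python
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}
D = {1, 2, 3, 4}  # assumed domain: the nodes of the example graph

# phi(x,y) = exists u. (R(x,u) and R(u,y))
path2 = {(x, y) for x in D for y in D
         if any((x, u) in R and (u, y) in R for u in D)}

# phi(x) = exists u. (R(u,x) and forall y. not R(x,y))
no_outgoing = {x for x in D
               if any((u, x) in R for u in D)
               and all((x, y) not in R for y in D)}

print(sorted(path2))   # prints: [(1, 1), (1, 3), (2, 2), (2, 4)]
print(no_outgoing)     # prints: {4}
```

The free variables become the loop variables of the comprehension, while the bound variables stay inside `any` and `all`.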
14In Class
- Retrieve all nodes with at least two children
- A node x is more important than y if every child
of y is also a child of x. Retrieve all most
important nodes in the graph
15FO in Databases
FO                                  Databases
Vocabulary R1, ..., Rk              Database schema R1, ..., Rk
Model D = (D, R1^D, ..., Rk^D)      Database instance D = (D, R1^D, ..., Rk^D)
Sentences are true or false         Formulas compute queries
16FO Semantics
- In FO we express WHAT we want
- Sometimes it's even unclear HOW to get it
- See accompanying slides on FO semantics
- They explain HOW to get it, but it's impractical
17Relational Algebra
- An algebra over relations
- Five operators: ∪, -, ×, σ, Π
- Meaning:
- R1 ∪ R2: set union
- R1 - R2: set difference
- R1 × R2: cartesian product
- σc(R): subset of tuples satisfying condition c
- Πa(R): projection on the attributes in a
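To make the five operators concrete, here is a small Python sketch that models a relation as a set of tuples. The function names and the 1-based column positions are choices made for this illustration, not notation from the lecture.

```python
def union(r1, r2):
    return r1 | r2


def difference(r1, r2):
    return r1 - r2


def product(r1, r2):
    # Cartesian product: concatenate every tuple of r1 with every tuple of r2.
    return {a + b for a in r1 for b in r2}


def select(cond, r):
    # Keep the tuples satisfying the condition (a Python predicate).
    return {t for t in r if cond(t)}


def project(cols, r):
    # Keep the listed columns, numbered from 1.
    return {tuple(t[i - 1] for i in cols) for t in r}


R1 = {(1, 2), (2, 3)}
R2 = {(2, 3), (3, 4)}
print(sorted(union(R1, R2)))        # prints: [(1, 2), (2, 3), (3, 4)]
print(sorted(difference(R1, R2)))   # prints: [(1, 2)]
print(sorted(project((2,), R1)))    # prints: [(2,), (3,)]
```

Because relations are sets, projection removes duplicates automatically, which matches the set semantics of the algebra.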
18FO → RA
φ(x) ≡ R(x,x) → Π1(σ1=2(R))
φ(x,y) ≡ ∃z.∃u.(R(x,z) ∧ R(z,u) ∧ R(u,y)) → Π1,6(σ2=3 ∧ 4=5(R × R × R))
= Π1,6((R join2=1 R) join4=1 R)
φ(x) ≡ ∀y.R(x,y) → ?
WHAT → HOW
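The translations on this slide can be sanity-checked on the example graph. The sketch below is only an illustration: it re-implements the needed operators on sets of tuples (names and 1-based column positions are assumptions of this example) and compares the RA results with a direct evaluation of the FO formula over the nodes of the graph.

```python
def product(r1, r2):
    return {a + b for a in r1 for b in r2}


def select(cond, r):
    return {t for t in r if cond(t)}


def project(cols, r):
    return {tuple(t[i - 1] for i in cols) for t in r}


def join(r1, i, r2, j):
    # Join on column i of r1 = column j of r2 (1-based), keeping all columns.
    return {a + b for a in r1 for b in r2 if a[i - 1] == b[j - 1]}


R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}
D = {1, 2, 3, 4}  # assumed domain for the FO side

# Projection on column 1 of the selection 1=2 over R.
loops_ra = project((1,), select(lambda t: t[0] == t[1], R))

# Positions 2=3 and 4=5 are indexes 1, 2 and 3, 4 of the 6-column tuple.
path3_ra = project(
    (1, 6),
    select(lambda t: t[1] == t[2] and t[3] == t[4], product(product(R, R), R)),
)
path3_join = project((1, 6), join(join(R, 2, R, 1), 4, R, 1))

# phi(x,y) = exists z. exists u. (R(x,z) and R(z,u) and R(u,y))
path3_fo = {(x, y) for x in D for y in D
            if any((x, z) in R and (z, u) in R and (u, y) in R
                   for z in D for u in D)}

print(loops_ra, sorted(path3_ra))
```

On this graph the three ways of computing the paths of length 3 agree, which is what the slide's equality between the product form and the join form predicts.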
19FO v.s. RA
- Theorem. Every query in RA can be expressed in FO
- Proof
- This shows how to go from HOW to WHAT (not very interesting)
- What about the converse?
20The Drinkers/Beers Example
- Vocabulary: Likes(drinker,beer), Serves(bar,beer), Frequents(drinker,bar), abbreviated L, S, F
- Find all drinkers that frequent some bar that serves some beer that they like: φ(d) ≡ ∃ba.∃be.(F(d,ba) ∧ S(ba,be) ∧ L(d,be))
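As a small illustration of this query, the sketch below evaluates it on made-up data. The drinkers, bars and beers are hypothetical and exist only for this example; the query requires all three relations to line up on the same bar and the same beer.

```python
# Hypothetical instance, invented for this example.
likes = {("ann", "ipa"), ("bob", "stout")}
serves = {("pub", "ipa"), ("pub", "lager"), ("tap", "stout")}
frequents = {("ann", "pub"), ("bob", "pub"), ("cat", "tap")}

# Drinkers d such that, for some bar ba and beer be,
# Frequents(d,ba), Serves(ba,be) and Likes(d,be) all hold.
answer = {d
          for (d, ba) in frequents
          for (ba2, be) in serves
          if ba == ba2 and (d, be) in likes}

print(answer)  # prints: {'ann'}
```

Here bob is excluded because the only bar he frequents does not serve the beer he likes, and cat is excluded because she likes no beer at all.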
21Lots of Fun Examples (in class)
- Find drinkers that frequent some bar that serves only beer they like
- Find drinkers that frequent only bars that serve some beer they like
- Find drinkers that frequent only bars that serve only beer they like
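One possible reading of these three exercises is that "some" becomes an existential quantifier and "only" becomes a universal quantifier over an implication. The Python sketch below follows that reading on hypothetical data; the instance, the helper names, and the choice to consider only drinkers that appear in Frequents are all assumptions of this example.

```python
# Hypothetical instance, invented for this example.
likes = {("ann", "ipa"), ("ann", "lager"), ("bob", "stout"),
         ("cat", "ipa"), ("dan", "cider")}
serves = {("pub", "ipa"), ("pub", "lager"), ("tap", "stout"),
          ("tap", "ipa"), ("den", "cider")}
frequents = {("ann", "pub"), ("ann", "tap"), ("bob", "tap"),
             ("bob", "pub"), ("cat", "tap"), ("dan", "den")}

# Assumption: only drinkers that frequent at least one bar are considered.
drinkers = {d for (d, _) in frequents}


def bars_of(d):
    return {ba for (d2, ba) in frequents if d2 == d}


def serves_some_liked(d, ba):
    return any((d, be) in likes for (ba2, be) in serves if ba2 == ba)


def serves_only_liked(d, ba):
    # Vacuously true for a bar that serves nothing.
    return all((d, be) in likes for (ba2, be) in serves if ba2 == ba)


some_only = {d for d in drinkers
             if any(serves_only_liked(d, ba) for ba in bars_of(d))}
only_some = {d for d in drinkers
             if all(serves_some_liked(d, ba) for ba in bars_of(d))}
only_only = {d for d in drinkers
             if all(serves_only_liked(d, ba) for ba in bars_of(d))}

print(sorted(some_only), sorted(only_some), sorted(only_only))
```

On this instance the three answers differ: ann and dan for the first query, ann, cat and dan for the second, and only dan for the third.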
22Unsafe FO Queries
- Find all nodes that are not in the graph: what's wrong?
23Unsafe FO Queries
- Find all nodes that are connected to everything: what's wrong?
24Unsafe FO Queries
- Find all pairs of employees or offices: what's wrong?
- We don't want such queries!
25Safe Queries
- A model D = (D, R1^D, ..., Rk^D)
- In FO: both D and R1^D, ..., Rk^D may be infinite
- In databases: D may be infinite (int, string, etc.), but R1^D, ..., Rk^D are always finite
- We call this a finite model
26Safe Queries
- φ is a finite query if for every finite model D, φ(D) is finite
- φ is safe, or domain independent, if for every two models D, D' having the same relations, D = (D, R1^D, ..., Rk^D) and D' = (D', R1^D, ..., Rk^D), we have φ(D) = φ(D')
- If φ is safe then it is also finite (why?)
- Note: the book has a different but equivalent definition
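Domain independence can be illustrated by evaluating the same formula over two domains that carry the same relation. The sketch below assumes that "connected to everything" is read as ∀y.R(x,y) and "not in the graph" as ¬∃y.(R(x,y) ∨ R(y,x)); both readings, and the tiny relation used, are assumptions made for this example.

```python
R = {(1, 1), (1, 2)}


def connected_to_everything(domain):
    # phi(x) = forall y. R(x,y), with quantifiers ranging over `domain`
    return {x for x in domain if all((x, y) in R for y in domain)}


def not_in_graph(domain):
    # phi(x) = not exists y. (R(x,y) or R(y,x))
    return {x for x in domain
            if not any((x, y) in R or (y, x) in R for y in domain)}


small = {1, 2}      # exactly the constants occurring in R
large = {1, 2, 3}   # same relation, one extra domain element

print(connected_to_everything(small), connected_to_everything(large))
print(not_in_graph(small), not_in_graph(large))
```

Both queries return different answers on the two domains even though R is unchanged, so under these readings neither is domain independent.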
27Safe Queries
- Definition. Given D = (D, R1^D, ..., Rk^D), the active domain is Da = the set of all constants in R1^D, ..., Rk^D
- Example. Given a graph D = (D, R): Da = {x | ∃y.R(x,y) ∨ ∃z.R(z,x)}
- Property. If a query is safe, it suffices to range quantifiers only over the active domain (why?)
- Hence we can compute safe queries
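The active domain is straightforward to compute from a finite instance. A minimal Python sketch, assuming relations are stored as sets of tuples (the function name `active_domain` is made up for this example):

```python
def active_domain(*relations):
    # Collect every constant that occurs in any tuple of any relation.
    return {c for r in relations for t in r for c in t}


R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

adom = active_domain(R)

# The graph example: Da = {x | exists y. R(x,y) or exists z. R(z,x)}
adom_by_formula = {x for (x, _) in R} | {x for (_, x) in R}

print(adom == adom_by_formula, sorted(adom))  # prints: True [1, 2, 3, 4]
```

Since the active domain is finite whenever the relations are, letting quantifiers range over it gives a terminating evaluation procedure for safe queries.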
28Safe Queries
- The safe relational calculus consists only of safe queries. However:
- Theorem. It is undecidable whether a given FO query is safe.
- Need to write only safe queries, but how do we know which queries are safe?
- Workaround: write them in an obviously safe way
- Range restricted queries: formally defined in AHU
29FO v.s. RA
- Theorem. Every safe query in FO can be expressed in RA
- Proof
- From WHAT to HOW
- This is really interesting and motivated the relational model
30Limited Expressive Power
- Vocabulary: binary relation R
- The following queries cannot be expressed in FO:
- Transitive closure: ∀x.∀y. there exist x1, ..., xn s.t. R(x,x1) ∧ R(x1,x2) ∧ ... ∧ R(xn-1,xn) ∧ R(xn,y)
- Parity: the number of edges in R is even
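Both queries are easy to compute in a general-purpose language, which highlights that the limitation is about FO and not about computability. The sketch below computes the transitive closure by iterating to a fixpoint and checks parity by counting; the function name is an assumption of this example, and neither piece corresponds to an FO formula.

```python
def transitive_closure(r):
    # Repeatedly extend known paths by one edge until nothing new appears.
    closure = set(r)
    while True:
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in r if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

tc = transitive_closure(R)
even_number_of_edges = len(R) % 2 == 0

print(sorted(tc))
print(even_number_of_edges)  # prints: False
```

The loop runs an unbounded number of times that depends on the data, whereas a fixed FO formula can only look at paths of a bounded length.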