Title: Instance-Independent Concurrency Control for Semistructured Databases
1 Instance-Independent Concurrency Control for Semistructured Databases
- Jan Paredaens, Jan Hidders and Stijn Dekeyser
- ADReM research group, Universiteit Antwerpen
2 Problem Statement (1/4)
Concurrency control for semistructured data?
- Access: additions, deletions, path expressions
- Use the tree shape of the data; the tree shape is the data
- Path locks on instance nodes
Instance-independent locking?
- Instance-dependent locking leads to many locks
- Instances are big, transactions are small
3 Problem Statement (2/4)
- Example: instance-dependent locking
[Figure: document tree (doc. root, document, person and child nodes with age, name, addr and hobby leaves) whose nodes are annotated with the path locks //child//hobby, child//hobby, //hobby and hobby]
4 Problem Statement (3/4)
[Figure: document tree with nodes Group, Person1 and Cycling and edge labels member and hobby]
Transaction T3: Add(Group, member, Person2)
Transaction T4: Add(Person2, hobby, Cycling)
Schedule 4: T3: Add(Group, member, Person2); T4: Add(Person2, hobby, Cycling). Serial, defined.
Schedule 5: T4: Add(Person2, hobby, Cycling); T3: Add(Group, member, Person2). Not defined (for any document).
Schedule 6: T4: Add(Person2, hobby, Cycling). Defined (not defined for documents without Person2).
5 Problem Statement (4/4)
- Some schedules are defined for some input documents, not for others
- Some schedules are serializable for some input documents, not for others
- Characterize the schedules for which there is at least one input document for which they are defined and that are serializable for all input documents for which they are defined.
- Input documents have neither a DTD nor an XML Schema
- Schedules are given completely, not incrementally
6 Path expressions and the paths they represent
Let a, b, c be labels of edges.
- a: $L(a) = \{a\}$
- a/b: $L(a/b) = \{a/b\}$
- a/*/b: $L(a/{*}/b) = \{a/a/b,\ a/b/b,\ a/c/b,\ \ldots\}$
- a//b: $L(a//b) = \{a/b,\ a/c/b,\ a/c/a/b/c/b,\ \ldots\}$
- .: $L(.) = \{\varepsilon\}$
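Membership of a label path in L(pe) can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation. It assumes path expressions are built only from labels, `/`, `//` (between two labels) and the empty expression `.`, and that labels never contain `/`. The names `to_regex` and `in_language` are hypothetical.

```python
import re

def to_regex(pe):
    """Translate a path expression into a regex over '/'-joined label paths.

    Assumption: '//' only occurs between two labels and allows zero or
    more intermediate labels; '.' denotes the empty path only.
    """
    if pe == ".":
        return re.compile("")
    # Each '//' becomes "a slash, then any number of 'label/' steps".
    parts = [re.escape(p) for p in pe.split("//")]
    return re.compile(r"/(?:[^/]+/)*".join(parts))

def in_language(pe, path):
    """True if the label path (e.g. 'a/c/b') belongs to L(pe)."""
    return to_regex(pe).fullmatch(path) is not None
```

For example, `in_language("a//b", "a/c/b")` holds while `in_language("a//b", "a/b/c")` does not.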
7 Queries, Additions, Deletions
- XQuery
- XUpdate
- $\text{Query}(n, pe)(DT) = \{\, m \mid \text{there is a path in the document tree } DT \text{ from } n \text{ to } m \text{ that is labeled with a string of } L(pe) \,\}$
- $\text{Add}(n, l, n')(DT) = DT \cup \{(n, l, n')\}$, only defined if the result is a document tree
- $\text{Del}(n, l, n')(DT) = DT - \{(n, l, n')\}$, only defined if $(n, l, n')$ is in DT and the result is a document tree
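The three operations can be sketched on a small in-memory tree. This is an illustrative Python sketch under stated assumptions, not the authors' code. "The result is a document tree" is read here as: an Add needs an existing parent and a fresh child node, and a Del may only remove an edge to a leaf. The class `DocTree` and the `accepts` predicate (which stands in for membership in L(pe)) are hypothetical.

```python
class DocTree:
    """A document tree as a set of nodes plus a set of (parent, label, child) edges."""

    def __init__(self, nodes, edges=()):
        self.nodes = set(nodes)
        self.edges = set(edges)

    def add(self, n, l, m):
        """Add(n, l, m): return the new tree, or None if the action is undefined."""
        if n not in self.nodes or m in self.nodes:
            return None  # missing parent, or m would get a second parent
        return DocTree(self.nodes | {m}, self.edges | {(n, l, m)})

    def delete(self, n, l, m):
        """Del(n, l, m): return the new tree, or None if the action is undefined."""
        if (n, l, m) not in self.edges:
            return None  # the edge is not in the tree
        if any(p == m for (p, _, _) in self.edges):
            return None  # m still has children, so the result is not a tree
        return DocTree(self.nodes - {m}, self.edges - {(n, l, m)})

    def query(self, n, accepts):
        """Nodes reachable from n by a downward path whose label tuple is accepted."""
        result, stack = set(), [(n, ())]
        while stack:
            node, labels = stack.pop()
            if accepts(labels):
                result.add(node)
            for (p, l, c) in self.edges:
                if p == node:
                    stack.append((c, labels + (l,)))
        return result
```

Returning `None` for an undefined action keeps the check and the update in one place; a schedule is then defined on a tree exactly when no step returns `None`.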
8 Action, Transaction, Schedule
- An action is a pair (o, t), where o is an Add, Del or Query operation and t is a transaction identifier.
- A transaction is a sequence of actions with the same transaction identifier.
- A schedule over a set of transactions is an interleaving of these transactions.
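The interleaving condition can be stated as a projection check: restricting the schedule to one transaction identifier must give back exactly that transaction. The Python sketch below is an illustration only. The encoding of actions as `(operation, tid)` pairs and the name `is_schedule_over` are assumptions.

```python
def is_schedule_over(schedule, transactions):
    """Check that `schedule` is an interleaving of `transactions`.

    schedule:     list of (operation, tid) pairs
    transactions: dict mapping tid -> list of operations, in order
    """
    projected = {tid: [] for tid in transactions}
    for op, tid in schedule:
        if tid not in projected:
            return False  # action of an unknown transaction
        projected[tid].append(op)
    # Projecting the schedule on each tid must reproduce that transaction.
    return projected == {tid: list(ops) for tid, ops in transactions.items()}
```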
9 Example
[Figure: document tree with the single edge (1,a,2)]
10 Example
(Add(1,b,3), t1)
[Figure: document tree with edges (1,a,2), (1,b,3)]
11 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4)]
12 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5)]
13 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1) → ∅
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5)]
14 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,5), t2)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5)]
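15 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,5), t2)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5)]
The last action is not defined here: node 5 already has parent 3, so the result would not be a document tree.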
16 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6)]
17 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6), (6,c,7)]
18 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
(Query(1,a//), t1) → {6, 7}
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6), (6,c,7)]
19 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
(Query(1,a//), t1)
(Del(1,b,4), t3)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6), (6,c,7)]
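20 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
(Query(1,a//), t1)
(Del(1,b,4), t3)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6), (6,c,7)]
The last action is not defined here: (1,b,4) is not an edge of the tree, since node 4 is attached by an a-edge.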
21 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
(Query(1,a//), t1)
(Del(2,b,6), t1)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6), (6,c,7)]
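22 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
(Query(1,a//), t1)
(Del(2,b,6), t1)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6), (6,c,7)]
The last action is not defined here: removing (2,b,6) would disconnect node 6 and its child 7, so the result would not be a document tree.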
23 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
(Query(1,a//), t1)
(Del(6,c,7), t1)
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6)]
24 Example
(Add(1,b,3), t1)
(Add(1,a,4), t1)
(Add(3,a,5), t2)
(Query(1,a//), t1)
(Add(2,b,6), t1)
(Add(6,c,7), t3)
(Query(1,a//), t1)
(Del(6,c,7), t1)
(Query(1,a//), t2) → {6}
[Figure: document tree with edges (1,a,2), (1,b,3), (1,a,4), (3,a,5), (2,b,6)]
25 Defined, correct, equivalence (1/?)
- A schedule S is called defined on a document tree DT iff the sequence of actions (Adds and Dels) of S is defined on DT.
- A schedule S is called correct if there is at least one DT on which S is defined.
- Two correct schedules S1 and S2 over the same set of transactions are called equivalent on DT if they are both defined on DT, $S_1(DT) = S_2(DT)$, and the corresponding queries give the same result.
26 Defined, correct, equivalence (2/?)
- Two correct schedules over the same set of transactions are called equivalent if they are defined on the same set of DTs and they are equivalent on these DTs.
- A schedule is called serializable if it is equivalent to a serial schedule.
- S1 = (Add(1, a, 2), t1) (Del(1, a, 2), t2) (Add(1, a, 2), t1)
- S1 is correct. S1 is not serializable, since t1 is not correct.
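The S1 example can be replayed on concrete trees. The Python sketch below is illustrative and rests on an assumed reading of definedness: an Add needs an existing parent and a fresh child, a Del needs an existing edge to a leaf, and queries never make a schedule undefined. The function `run` and the action encoding are hypothetical. It only tests particular trees, so it can confirm that S1 is correct but cannot by itself prove that t1 is defined on no tree.

```python
def run(schedule, nodes, edges):
    """Apply the Adds and Dels of a schedule to a tree.

    Returns the resulting (nodes, edges), or None if some action is undefined.
    Actions are ((kind, ...), tid) pairs, e.g. (("Add", 1, "a", 2), "t1").
    """
    nodes, edges = set(nodes), set(edges)
    for action, _tid in schedule:
        if action[0] == "Query":
            continue  # queries do not change the tree
        kind, n, l, m = action
        if kind == "Add":
            if n not in nodes or m in nodes:
                return None
            nodes.add(m)
            edges.add((n, l, m))
        else:  # "Del"
            if (n, l, m) not in edges or any(p == m for (p, _, _) in edges):
                return None
            nodes.discard(m)
            edges.discard((n, l, m))
    return nodes, edges

S1 = [(("Add", 1, "a", 2), "t1"),
      (("Del", 1, "a", 2), "t2"),
      (("Add", 1, "a", 2), "t1")]
t1 = [a for a in S1 if a[1] == "t1"]
t2 = [a for a in S1 if a[1] == "t2"]
```

On the tree consisting of node 1 only, `run(S1, {1}, set())` succeeds, while both serial orders `t1 + t2` and `t2 + t1` return `None`.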
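27 Example
[Figure: three document trees. DT1: root 1 with edge (1,a,2); DT2: root 1 with edge (1,b,2); DT3: the single node 1]
S1 = (Add(2, b, 3), t1) (Query(1, a/b), t2)
S2 = (Query(1, a/b), t2) (Add(2, b, 3), t1)
S1 and S2 are not equivalent on DT1; S1 and S2 are equivalent on DT2; S1 and S2 are not defined on DT3.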
28 Example
S1 = (Add(2, b, 3), t1) (Query(1, a/b), t2)
S2 = (Query(1, a/b), t2) (Add(2, b, 3), t1)
S1 and S2 are defined on the same set of DTs and are not (necessarily) equivalent on these DTs.
S3 = (Add(2, b, 3), t1) (Add(2, b, 4), t2)
S4 = (Add(2, b, 4), t2) (Add(2, b, 3), t1)
S3 and S4 are defined on the same set of DTs and are equivalent on these DTs.
S5 = (Add(2, b, 3), t1) (Del(2, b, 3), t2)
S6 = empty
S5 and S6 are not defined on the same set of DTs.
29 Example
S1 = (Add(1, a, 2), t1)
S2 = (Add(1, b, 3), t2)
NOT EQUIVALENT
30 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1)
EQUIVALENT
31 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2) (Del(4, c, 5), t1)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1) (Del(4, c, 5), t1)
EQUIVALENT
32 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2) (Del(4, c, 5), t1) (Del(4, c, 6), t2)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1) (Del(4, c, 5), t1) (Del(4, c, 7), t1)
NOT EQUIVALENT
33 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2) (Del(4, c, 5), t1) (Del(4, c, 6), t2) (Del(4, c, 7), t1)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1) (Del(4, c, 5), t1) (Del(4, c, 7), t1) (Del(4, c, 6), t2)
EQUIVALENT
34 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2) (Del(4, c, 5), t1) (Del(4, c, 6), t2) (Del(4, c, 7), t1) (Add(2, c, 8), t1)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1) (Del(4, c, 5), t1) (Del(4, c, 7), t1) (Del(4, c, 6), t2)
NOT EQUIVALENT
35 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2) (Del(4, c, 5), t1) (Del(4, c, 6), t2) (Del(4, c, 7), t1) (Add(2, c, 8), t1) (Query(1, b), t2)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1) (Del(4, c, 5), t1) (Del(4, c, 7), t1) (Del(4, c, 6), t2) (Query(1, b), t2)
NOT EQUIVALENT
36 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2) (Del(4, c, 5), t1) (Del(4, c, 6), t2) (Del(4, c, 7), t1) (Add(2, c, 8), t1) (Query(1, b), t2)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1) (Del(4, c, 5), t1) (Del(4, c, 7), t1) (Del(4, c, 6), t2) (Query(1, b), t2) (Add(2, c, 8), t1)
EQUIVALENT
37 Example
S1 = (Add(1, a, 2), t1) (Add(1, b, 3), t2) (Del(4, c, 5), t1) (Del(4, c, 6), t2) (Del(4, c, 7), t1) (Add(2, c, 8), t1) (Query(1, b), t2) (Query(1, a), t1)
S2 = (Add(1, b, 3), t2) (Add(1, a, 2), t1) (Del(4, c, 5), t1) (Del(4, c, 7), t1) (Del(4, c, 6), t2) (Query(1, b), t2) (Add(2, c, 8), t1) (Query(1, a), t1)
EQUIVALENT
38 Results (1/2)
- Is it decidable whether a given transaction is correct?
- Is it decidable whether a given schedule is correct?
- Is it decidable whether two given transactions are equivalent?
- Is it decidable whether two given schedules are equivalent?
- Is it decidable whether a given schedule is serializable?
39 Results (2/2)
- Is it decidable whether a given transaction is correct? YES!
- Is it decidable whether a given schedule is correct? YES!
- Is it decidable whether two given transactions are equivalent? YES!
- Is it decidable whether two given schedules are equivalent? YES!
- Is it decidable whether a given schedule is serializable? YES!
40 Correctness of queryless schedules (1/2)
- Correctness has nothing to do with queries
- Consider queryless schedules (QL schedules).
- The following conditions are necessary and sufficient for correct QL schedules:
- Between (Add(n,a,n1),t1) and (Add(n2,b,n),t2) there is (Del(n,a,n1),t3)
- Between (Add(n1,a,n),t1) and (Add(n2,b,n),t2) there is (Del(n1,a,n),t3)
- Between (Add(n,a,n1),t1) and (Del(n2,b,n),t2) there is (Del(n,a,n1),t3)
- Between (Add(n1,a,n),t1) and (Del(n,b,n2),t2) there is (Add(n,b,n2),t3)
- Between (Add(n1,a,n),t1) and (Del(n2,b,n),t2) there is (Del(n1,a,n),t3), where $(n_1, a) \neq (n_2, b)$
- Between (Del(n,a,n1),t1) and (Add(n2,b,n),t2) there is (Del(n3,c,n),t3)
- Between (Del(n1,a,n),t1) and (Add(n,b,n2),t2) there is (Add(n3,c,n),t3)
- Between (Del(n1,a,n),t1) and (Del(n,b,n2),t2) there is (Add(n3,c,n),t3)
- Between (Del(n1,a,n),t1) and (Del(n2,b,n),t2) there is (Add(n2,b,n),t3)
41 Correctness of queryless schedules (2/2)
- It is decidable whether a schedule (a transaction) is correct in $O(n^3)$ time, n being the length of the schedule (transaction), and constant space.
- $S(DT) = (DT \cup ADD(S)) - DEL(S)$ if S is defined on DT
- $ADD(S)$ = edges e whose last occurrence in S is Add(e)
- $DEL(S)$ = edges e whose last occurrence in S is Del(e)
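The last-occurrence rule for ADD(S) and DEL(S) translates directly into one pass over the schedule. The Python sketch below is an illustration under the assumption that actions are encoded as `((kind, n, l, m), tid)` pairs. The names `add_del_sets` and `result_edges` are hypothetical, and definedness of S on DT is assumed rather than checked.

```python
def add_del_sets(schedule):
    """Return (ADD(S), DEL(S)) based on the last occurrence of each edge."""
    last = {}
    for (kind, *edge), _tid in schedule:
        if kind in ("Add", "Del"):
            last[tuple(edge)] = kind  # later occurrences overwrite earlier ones
    add = {e for e, k in last.items() if k == "Add"}
    dele = {e for e, k in last.items() if k == "Del"}
    return add, dele

def result_edges(dt_edges, schedule):
    """Edges of S(DT), assuming S is defined on DT (not checked here)."""
    add, dele = add_del_sets(schedule)
    return (set(dt_edges) | add) - dele
```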
42 Equivalence of correct QL schedules (1/5)
Let S1 be correct and equivalent to the serial S2. We cannot necessarily go from S1 to S2 by swapping actions.
S1 = (Add(1,a,2),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2) (Add(1,b,3),t1) (Del(1,b,3),t1)
S2 = (Add(1,a,2),t1) (Add(1,b,3),t1) (Del(1,b,3),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2)
43 Equivalence of correct QL schedules (2/5)
Let S1 be correct and equivalent to the serial S2. We cannot necessarily go from S1 to S2 by swapping actions.
S1 = (Add(1,a,2),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2) (Add(1,b,3),t1) (Del(1,b,3),t1)
S2 = (Add(1,a,2),t1) (Add(1,b,3),t1) (Del(1,b,3),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2)
44 Equivalence of correct QL schedules (3/5)
Let S1 be correct and equivalent to the serial S2. We cannot necessarily go from S1 to S2 by swapping actions.
S1 = (Add(1,a,2),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2) (Add(1,b,3),t1) (Del(1,b,3),t1)
S2 = (Add(1,a,2),t1) (Add(1,b,3),t1) (Del(1,b,3),t1) (Del(1,a,2),t2) (Add(1,b,3),t2) (Del(1,b,3),t2)
Note that S1 is not equivalent to the other serial schedule S3.
45 Equivalence of correct QL schedules (4/5)
- $N_I(S)$ = the nodes that must belong to DTs on which S is defined
  = { m | the first occurrence of m has the form Add(m,l,n), Del(m,l,n) or Del(n,l,m) }
- $N_I^-(S)$ = the nodes that may not belong to DTs on which S is defined
  = { m | the first occurrence of m has the form Add(n,l,m) }
- $E_I(S)$ = the edges that must belong to DTs on which S is defined
  = { e | the first occurrence of e has the form Del(e) }
- $E_I^-(S)$ = the edges that may not belong to DTs on which S is defined
  = { e | see paper }
- $N_I(S)$, $N_I^-(S)$, $E_I(S)$ and $E_I^-(S)$ are correct
- $N_I(S)$, $N_I^-(S)$, $E_I(S)$ and $E_I^-(S)$ can be calculated in $O(n^2)$ time and $O(n)$ space
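The first-occurrence rules for the three input sets that the slide spells out can be sketched in Python as follows. This is an illustration, not the authors' algorithm: it uses hash sets rather than the procedure behind the stated bounds, it ignores Query actions (the slide concerns QL schedules), and it omits the fourth set, whose definition the slide defers to the paper. The name `input_sets` and the action encoding are assumptions.

```python
def input_sets(schedule):
    """Return (N_I, N_I^-, E_I) of a queryless schedule.

    Actions are ((kind, n, l, m), tid) pairs with kind "Add" or "Del".
    """
    must_nodes, must_not_nodes, must_edges = set(), set(), set()
    seen_nodes, seen_edges = set(), set()
    for (kind, *args), _tid in schedule:
        if kind not in ("Add", "Del"):
            continue  # queries are ignored in this sketch
        n, l, m = args
        if n not in seen_nodes:
            # First occurrence as a parent: Add(n,l,_) or Del(n,l,_).
            must_nodes.add(n)
            seen_nodes.add(n)
        if m not in seen_nodes:
            # First occurrence as a child: fresh for Add, required for Del.
            (must_not_nodes if kind == "Add" else must_nodes).add(m)
            seen_nodes.add(m)
        if (n, l, m) not in seen_edges:
            if kind == "Del":
                must_edges.add((n, l, m))
            seen_edges.add((n, l, m))
    return must_nodes, must_not_nodes, must_edges
```

Equal results for two schedules are only a necessary condition for equivalence here, because the fourth set is left out.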
46 Equivalence of correct QL schedules (5/5)
- QL transactions or schedules S1 and S2 over the same set of transactions are equivalent iff
  - $N_I(S_1) = N_I(S_2)$
  - $N_I^-(S_1) = N_I^-(S_2)$
  - $E_I(S_1) = E_I(S_2)$
  - $E_I^-(S_1) = E_I^-(S_2)$
- The equivalence of two QL transactions or schedules over the same set of transactions can be decided in $O(n^2)$ time and $O(n)$ space.
47 Output Sets vs. Input Sets (1/2)
- $N_O(S)$ = the nodes that must belong to $S(DT)$
  = { m | the last occurrence of m has the form Add(m,l,n), Del(m,l,n) or Add(n,l,m) }
- $N_O^-(S)$ = the nodes that may not belong to $S(DT)$
  = { m | the last occurrence of m has the form Del(n,l,m) }
- $E_O(S)$ = the edges that must belong to $S(DT)$
  = { e | the last occurrence of e has the form Add(e) }
- $E_O^-(S)$ = the edges that may not belong to $S(DT)$
  = { e | see paper }
- $N_O(S)$, $N_O^-(S)$, $E_O(S)$ and $E_O^-(S)$ are correct
- $N_O(S)$, $N_O^-(S)$, $E_O(S)$ and $E_O^-(S)$ can be calculated in $O(n^2)$ time and $O(n)$ space
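48 Output Sets vs. Input Sets (2/2)
- If S1 and S2 are correct transactions or schedules, then S1.S2 is correct iff
  $N_O^-(S_1) \cap N_I(S_2) = \emptyset$, $E_O^-(S_1) \cap E_I(S_2) = \emptyset$,
  $N_O(S_1) \cap N_I^-(S_2) = \emptyset$, $E_O(S_1) \cap E_I^-(S_2) = \emptyset$
- If S1, S2, …, Sk and S1.S2…Sk are k+1 correct schedules, then
  - $N_I(S_1 \ldots S_k) = \bigcup_{i=1}^{k} \bigl( N_I(S_i) - \bigcup_{j<i} N_I^-(S_j) \bigr)$
  - $N_I^-(S_1 \ldots S_k) = \bigcup_{i=1}^{k} \bigl( N_I^-(S_i) - \bigcup_{j<i} N_I(S_j) \bigr)$
  - $E_I(S_1 \ldots S_k) = \bigcup_{i=1}^{k} \bigl( E_I(S_i) - \bigcup_{j<i} E_I^-(S_j) \bigr)$
  - $E_I^-(S_1 \ldots S_k) = \bigcup_{i=1}^{k} \bigl( E_I^-(S_i) - \bigcup_{j<i} E_I(S_j) \bigr)$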
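All four composition formulas share one shape: a union over i of one set of S_i minus the union of the opposite sets of the earlier S_j. A minimal Python sketch of that shape follows. The function name `compose` and its argument layout are hypothetical, and the per-schedule sets are taken as given.

```python
def compose(own_sets, opposite_sets):
    """Union over i of own_sets[i] minus the union of opposite_sets[j] for j < i.

    For the must-be-present node set of S1...Sk, pass the must-be-present
    sets as own_sets and the must-be-absent sets as opposite_sets; swap the
    two arguments for the must-be-absent set. The same call works for edges.
    """
    result, earlier_opposite = set(), set()
    for own, opposite in zip(own_sets, opposite_sets):
        result |= set(own) - earlier_opposite
        earlier_opposite |= set(opposite)
    return result

# S1 = Add(1,a,2) and S2 = Add(2,b,3): node 2 is required by S2 but
# created by S1, so it is not required of the input of S1.S2.
must = [{1}, {2}]       # must-be-present node sets of S1 and S2
must_not = [{2}, {3}]   # must-be-absent node sets of S1 and S2
```

With these inputs, `compose(must, must_not)` gives `{1}` and `compose(must_not, must)` gives `{2, 3}`, matching what the first-occurrence rules yield for the concatenated schedule.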
49 Main Results
- Given a QL schedule S of k transactions and n actions, it is decidable whether S is serializable in time $O(f(k) \cdot n^3)$, where f(k) can be exponential in k, and in space $O(k \cdot n)$.
- Given a correct schedule S of k transactions and n actions, it is decidable whether S is serializable in time $O(f(k) \cdot n^6)$, where f(k) can be exponential in k, and in space $O(n^2)$.
to be continued