Title: A Survey of View Maintenance for SemiStructured Data
1 Incremental Maintenance of Path Expression Views
SIGMOD Conference, June 2005, Baltimore, Maryland
Arsany Sawires (UC Santa Barbara, NEC Intern)
Junichi Tatemura (NEC Labs)
Oliver Po (NEC Labs)
Divyakant Agrawal (NEC Labs)
K. Selcuk Candan (NEC Labs)
2 Incremental View Maintenance
[Diagram: Source Data, Views, Aux. Data]
- Generally, views could be non-self-maintainable, so:
  - A solution may need to issue source queries (time cost)
  - A solution may need to keep auxiliary data (space cost)
  - A solution can restrict the view specification language
  - Our approach: several small queries, linear space, and practical path expressions
3 Sample of Related Work
- Susan Davidson: elegant approach; limited view language; may ask complex source queries
- Rundensteiner: powerful query language; auxiliary data size can grow with the data source size regardless of the actual result size
4 The Role of Path Expressions
- Path expressions are the data selection operators in XPath/XQuery.
- FOR x IN Path_Expression_1
  FOR y IN Path_Expression_2
  WHERE Conditions on x and y
  RETURN XML-Fragment from x and y
- Source updates directly affect the results of the path expression operators.
5 Outline
- The XML Data Model
- The View Model: Path Expressions
- Auxiliary Data
- Source Updates
- The Maintenance Approach
- Experimental Results
- Conclusion and Discussion
6 The XML Data Model
- An XML document is an ordered labeled tree.
- Every node has:
  - Label: element/attribute name or value.
  - ID: unique identifier (primary key).
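The data model above can be sketched in a few lines of Python. The class name `Node` and its fields are hypothetical, chosen only for illustration; the slides do not prescribe a representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of an ordered labeled tree (hypothetical class, for illustration)."""
    node_id: str   # unique identifier, plays the role of a primary key
    label: str     # element/attribute name or value
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)  # list order = document order

    def add_child(self, child: "Node") -> "Node":
        """Append child as the last child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """IDs from the root down to this node."""
        node, ids = self, []
        while node is not None:
            ids.append(node.node_id)
            node = node.parent
        return ids[::-1]


root = Node("R", "root")
a1 = root.add_child(Node("A1", "A"))
b1 = a1.add_child(Node("B1", "B"))
e1 = b1.add_child(Node("E1", "E"))
print(e1.path())  # ['R', 'A1', 'B1', 'E1']
```

The root-to-node ID list returned by `path()` has the same shape as the full path that is reported for an updated leaf.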
7 Path Expressions
- A path expression is a sequence of N steps.
- Each step has an axis test (/ or //), a label test (can be a wildcard, *), and an optional predicate test.
- Example: an expression with N = 3: /A /B1 Count(//E) 2 /C
- Predicates can be arbitrary Boolean conditions referencing the subtree of the tested node.
- If a predicate does not exist at some step, then it is T for all nodes in the tree.
- Evaluation is a logical pipelined execution starting at the expression context Cntxt.
- Example:
8 Example Path Expression
- Cntxt = (X1, X2)
- /A /B1 Count(//E) 2 /C
Res1 = (A1, A2)
Res2 = (B1, B3, B4)
Res3 = Res = (C1, C3)
Intermediate results can be as large as the source data.
Def: Pred_i(n) is the value of pred_i at node n.
Pred2(B1) = T, Pred2(B3) = T, Pred1(?) = T
Pred2(B2) = F, Pred1(E5) = F
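A minimal sketch of the pipelined evaluation, where each step consumes the previous step's result. The tree, the node IDs, and the range predicate on B (between 1 and 2 E descendants) are assumptions made for illustration; they are not the tree from the slide's figure.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical source tree: node id -> list of child ids, in document order.
CHILDREN: Dict[str, List[str]] = {
    "X1": ["A1"], "A1": ["B1", "B2"],
    "B1": ["E1", "C1"], "B2": ["C2"],
    "E1": [], "C1": [], "C2": [],
}
LABEL = {n: n[0] for n in CHILDREN}  # assumption: label = first letter of the id

# A step is (axis, label test, optional predicate).
Step = Tuple[str, str, Optional[Callable[[str], bool]]]


def descendants(n: str) -> List[str]:
    """All descendants of n in document order."""
    out: List[str] = []
    for c in CHILDREN[n]:
        out.append(c)
        out.extend(descendants(c))
    return out


def count_e(n: str) -> int:
    """Number of E-labeled descendants of n."""
    return sum(1 for d in descendants(n) if LABEL[d] == "E")


def evaluate(context: List[str], steps: List[Step]) -> List[List[str]]:
    """Return [Res_0 (the context), Res_1, ..., Res_N]."""
    results = [context]
    for axis, label, pred in steps:
        res: List[str] = []
        for n in results[-1]:
            reached = CHILDREN[n] if axis == "/" else descendants(n)
            for m in reached:
                if label not in ("*", LABEL[m]):
                    continue  # label test failed
                if pred is not None and not pred(m):
                    continue  # predicate test failed
                if m not in res:
                    res.append(m)
        results.append(res)
    return results


steps: List[Step] = [
    ("/", "A", None),                            # a missing predicate is T everywhere
    ("/", "B", lambda n: 1 <= count_e(n) <= 2),  # assumed range predicate
    ("/", "C", None),
]
res = evaluate(["X1"], steps)
print(res[1:])  # [['A1'], ['B1'], ['C1']]
```

In this toy tree B2 has no E descendant, so it fails the assumed predicate and C2 never reaches the final result.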
9 Auxiliary Data
(X1, X2) /A /B1 Count(//E) 2 /C
- Auxiliary data are the Result Paths.
- No auxiliary data is saved for predicate evaluation.
- Auxiliary data size is bounded as O(N·M), where:
  - N: path expression size (number of steps)
  - M: result size (number of nodes in the final result)
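One way to picture the bound is to store, for each node of the final result, the node matched at every step. The dictionary layout, the sample entries, and the one-path-per-result-node simplification are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical auxiliary data for a 3-step expression:
# final result node -> (node matched at step 1, step 2, step 3).
result_paths: Dict[str, Tuple[str, ...]] = {
    "C1": ("A1", "B1", "C1"),
    "C3": ("A2", "B3", "C3"),
}

N = 3                  # number of steps in the path expression
M = len(result_paths)  # number of nodes in the final result


def aux_size(paths: Dict[str, Tuple[str, ...]]) -> int:
    """Total number of node ids kept as auxiliary data."""
    return sum(len(p) for p in paths.values())


print(aux_size(result_paths), "<=", N * M)  # 6 <= 6
```

An intermediate node that leads to no final result node does not appear in any stored path, so the size follows the view result rather than the source data.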
10 Outline
- The XML Data Model
- The View Model: Path Expressions
- Auxiliary Data
- Source Updates
- The Maintenance Approach
- Experimental Results
- Conclusion and Discussion
11 Source Updates
- Two primitives: Add/Delete leaf. Example: Add E5.
- Report the full path of the updated leaf: U.path.
- Direct view effect: Add B2 to Res2.
- Indirect view effect: Add C2 to Res3.
- I.e., direct effects happen by changing Pred_i(n) (T -> F or F -> T).
- Non-monotonic effects:
  - Source Add -> View Add/Del
  - Source Del -> View Add/Del
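The non-monotonic behaviour can be illustrated with a small sketch. It assumes a range predicate on the number of E descendants (between 1 and 2); the function names are hypothetical.

```python
def pred2(e_count: int) -> bool:
    """Assumed range predicate: the node has between 1 and 2 E descendants."""
    return 1 <= e_count <= 2


def direct_effect(before: int, after: int) -> str:
    """Effect on the step result when the E count of a node changes."""
    was, now = pred2(before), pred2(after)
    if not was and now:
        return "add"      # predicate flipped F -> T
    if was and not now:
        return "delete"   # predicate flipped T -> F
    return "none"


print(direct_effect(0, 1))  # source add    -> view add
print(direct_effect(2, 3))  # source add    -> view delete
print(direct_effect(3, 2))  # source delete -> view add
print(direct_effect(1, 0))  # source delete -> view delete
```

Under such a predicate, a source addition and a source deletion can each cause either a view addition or a view deletion.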
12 Intuitions
- The final view result Res can be affected through any step's intermediate result Res_i (i ≤ N).
- Every indirect addition is caused by a direct addition.
- Every indirect deletion is caused by a direct deletion.
- So we solve the problem in two phases:
  - 1- Discover direct effects.
  - 2- Use the direct effects to find the indirect ones.
13 Discovering Direct Effects
R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
- Let n be added / deleted at the source.
- U.path is reported.
- What are the necessary conditions to directly add/delete a node at some step?
- Example: node B1 at step 2:
  - 1- B1 must belong to U.path
  - 2- B1.label must match step2.label
  - 3- B1 must have qualified ancestors
  - 4- Pred2(B1) must be changed by U
- Conditions 1-3 can be checked using U.path with no source queries.
- Condition 4 needs source queries.
14 Maintenance Outline
R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
- We apply Axis and Label tests to U.path to get candidate sequences: Can1 = (A1), Can2 = (B1, B2).
- A1 can't be a qualifying ancestor of B1 or B2 unless it passes pred1.
- So we conduct predicate tests before moving to the next step.
- Algorithm outline:
  For every step i = 1 TO N
    Can_i = apply Axis&Label_i to Can_{i-1}
    Check changes of Pred_i on Can_i
    Discover indirect effects
    Filter Can_i
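The candidate computation in the outline can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: U.path is modelled as a list of (id, label) pairs from the context node down to the updated leaf, and the `keep` callback is a hypothetical stand-in for the predicate check and filtering.

```python
from typing import Callable, List, Tuple

PathNode = Tuple[str, str]  # (node id, label)
Step = Tuple[str, str]      # (axis, label test); axis is "/" or "//"


def candidates(u_path: List[PathNode], steps: List[Step],
               keep: Callable[[int, str], bool] = lambda i, n: True
               ) -> List[List[str]]:
    """Can_1..Can_N restricted to nodes on u_path; u_path[0] is the context node."""
    prev = [0]  # positions in u_path of Can_{i-1}
    out: List[List[str]] = []
    for i, (axis, label) in enumerate(steps, start=1):
        cur: List[int] = []
        for p in prev:
            # "/" reaches only the next node on the path, "//" reaches all below
            reach = [p + 1] if axis == "/" else range(p + 1, len(u_path))
            for q in reach:
                if q < len(u_path) and q not in cur and label in ("*", u_path[q][1]):
                    cur.append(q)
        # predicate tests happen here, before moving on to step i + 1
        cur = sorted(q for q in cur if keep(i, u_path[q][0]))
        out.append([u_path[q][0] for q in cur])
        prev = cur
    return out


u_path = [("R", "R"), ("A1", "A"), ("B1", "B"), ("B2", "B"), ("E1", "E")]
steps = [("//", "A"), ("//", "B"), ("//", "C"), ("//", "D")]
print(candidates(u_path, steps))
# [['A1'], ['B1', 'B2'], [], []]
print(candidates(u_path, steps, keep=lambda i, n: n != "A1"))
# [[], [], [], []]
```

The second call shows why filtering comes before the next step: once A1 is rejected at step 1, neither B1 nor B2 has a qualifying ancestor.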
15 Predicate Tests
R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
- For every node in Can_i we need to compute Pred_i^after and Pred_i^before.
- Pred_i^after requires a source query. Example: Pred2^after(B1).
- However, Pred2^before(B1) can't be computed by a source query. We use the auxiliary data.
- If B1 belongs to the Result Path of some node at position 2 -> Pred2^before(B1) = T. If not, Pred2^before(B1) may be T or F.
- We proved that assuming it is F is always a safe assumption.
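A sketch of the before/after comparison described above. The `pred_after` callback stands in for a source query, and the result-path layout (final result node mapped to one node per step) is an assumption for illustration.

```python
from typing import Callable, Dict, Tuple

ResultPaths = Dict[str, Tuple[str, ...]]


def pred_before(node_id: str, step: int, paths: ResultPaths) -> bool:
    """T if node_id sits at position `step` of some result path; otherwise assume F."""
    return any(p[step - 1] == node_id for p in paths.values())


def direct_effect(node_id: str, step: int, paths: ResultPaths,
                  pred_after: Callable[[str, int], bool]) -> str:
    """Compare the assumed 'before' value with the queried 'after' value."""
    before = pred_before(node_id, step, paths)
    after = pred_after(node_id, step)  # stands in for a source query
    if after and not before:
        return "direct addition"
    if before and not after:
        return "direct deletion"
    return "no change"


paths: ResultPaths = {"D1": ("A1", "B1", "C1", "D1")}
after_values = {("B1", 2): False, ("B2", 2): True}  # hypothetical query answers


def query(node_id: str, step: int) -> bool:
    return after_values[(node_id, step)]


print(direct_effect("B1", 2, paths, query))  # direct deletion
print(direct_effect("B2", 2, paths, query))  # direct addition
```

B2 is treated as F before the update only because it appears in no stored result path, which is the assumption the slide describes as safe.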
16 Spirit of the Proof
- Problems can happen only if Pred^before is T and we assume it is F.
- Two cases:
  - 1- Pred^after = F -> unnoticed direct deletion
  - 2- Pred^after = T -> wrong direct addition
17 Discovering Indirect Effects
R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
- Assume that B1 is identified as a direct addition or deletion at step 2.
- If B1 is directly added at step 2 -> issue a source query: B1 //C[pred3] //D[pred4]
- If B1 is directly deleted at step 2 -> use the Result Paths to find indirect deletions.
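The deletion case needs no source query, which a short sketch can show. It assumes one stored result path per final result node; the function name and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

ResultPaths = Dict[str, Tuple[str, ...]]


def indirect_deletions(paths: ResultPaths, node_id: str, step: int) -> List[str]:
    """Drop every result path that passes through node_id at `step` (1-based)
    and return the final result nodes that disappear from the view."""
    doomed = [r for r, p in paths.items() if p[step - 1] == node_id]
    for r in doomed:
        del paths[r]
    return doomed


paths: ResultPaths = {
    "D1": ("A1", "B1", "C1", "D1"),
    "D2": ("A1", "B1", "C2", "D2"),
    "D3": ("A1", "B3", "C3", "D3"),
}
removed = indirect_deletions(paths, "B1", 2)
print(removed)        # ['D1', 'D2']
print(sorted(paths))  # ['D3']
```

If a result node could be reached through more than one path, it would have to stay in the view while any of its paths survives; this sketch does not handle that case.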
18 Filtering Between Steps
19 Experimental Results: DS1
- Graph for DataSet1
- Comment
20 Conclusion
- The bottom line is that using the Update Path and the Result Paths enables efficient and scalable view maintenance.
- A general schema-free solution was provided.
- A practical standard language of path expressions is supported.
- The auxiliary data used is O(M·N).
- Good time performance is achieved by issuing multiple small source queries.
21 Discussion
- Predicate Evaluation without source queries
- Batch Updates
- ???
22 Thank you