A Survey of View Maintenance for SemiStructured Data

1
Incremental Maintenance of Path Expression Views
SIGMOD Conference, June 2005, Baltimore, Maryland
Arsany Sawires (UC Santa Barbara, NEC Intern)
Junichi Tatemura (NEC Labs)
Oliver Po (NEC Labs)
Divyakant Agrawal (NEC Labs)
K. Selcuk Candan (NEC Labs)
2
Incremental View Maintenance
[Figure: source data, views, and auxiliary data]
  • Generally, views may not be self-maintainable,
    so:
  • - A solution may need to issue source queries
    (costs time)
  • - A solution may need to keep auxiliary data
    (costs space)
  • - A solution can restrict the view specification
    language
  • - Our approach: several small queries, linear
    space, and practical path expressions

3
Sample of Related Work
  • Susan Davidson: elegant approach; limited view
    language, may ask complex source queries
  • Rundensteiner: powerful query language; auxiliary
    data size can grow with the data source size
    regardless of the actual result size

4
The Role of Path Expressions
  • Path Expressions are the data selection operators
    in XPath/XQuery
  • FOR x IN Path_Expression_1
    FOR y IN Path_Expression_2
    WHERE Conditions on x and y
    RETURN XML-Fragment from x and y
  • Source Updates directly affect the results of the
    path expression operators.

5
Outline
  • The XML Data Model
  • View Model: Path Expressions
  • Auxiliary Data
  • Source Updates
  • The Maintenance Approach
  • Experimental Results
  • Conclusion and Discussion

6
The XML Data Model
  • An XML document is an ordered labeled tree.
  • Every node has:
  • Label: element / attribute name or value.
  • ID: unique identifier (primary key).

7
Path Expressions
  • A path expression is a sequence of N steps
  • Each step has an axis test (/ or //), a label
    test (can be a wildcard *), and an optional
    predicate test.
  • Example: an expression with N = 3:
    /A /B[1 ≤ Count(//E) ≤ 2] /C
  • Predicates can be any arbitrary Boolean
    conditions referencing the subtree of the tested
    node
  • If a predicate does not exist at some step, then
    it is T for all nodes in the tree
  • Evaluation is a logical pipelined execution
    starting at the expression context Cntxt
  • Example

8
Example Path Expression
  • Cntxt = (X1, X2)
  • /A /B[1 ≤ Count(//E) ≤ 2] /C

Res1 = (A1, A2)
Res2 = (B1, B3, B4)
Res3 = Res = (C1, C3)
Intermediate results can be as large as the source data.
Def: Predi(n) is the value of predi at node n.
Pred2(B1) = T, Pred2(B3) = T, Pred1(?) = T,
Pred2(B2) = F, Pred1(E5) = F
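
The pipelined evaluation in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the tree shape, the helper names (`Node`, `Step`, `mk`, `count_desc`, `evaluate`), and the reading of the step-2 predicate as an inclusive range on the number of E descendants are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    id: str
    label: str
    children: List["Node"] = field(default_factory=list)

    def descendants(self):
        """Yield every node strictly below this one (pre-order)."""
        for child in self.children:
            yield child
            yield from child.descendants()


@dataclass
class Step:
    axis: str                                      # "/" (child) or "//" (descendant)
    label: str                                     # label test; "*" matches any label
    pred: Optional[Callable[[Node], bool]] = None  # a missing predicate counts as T


def count_desc(node, label):
    """Number of descendants of `node` carrying `label`."""
    return sum(1 for d in node.descendants() if d.label == label)


def evaluate(context, steps):
    """Return [Res1, ..., ResN]; each Resi is the input of step i+1."""
    results, current = [], list(context)
    for step in steps:
        nxt, seen = [], set()
        for n in current:
            pool = n.children if step.axis == "/" else n.descendants()
            for m in pool:
                if m.id in seen or step.label not in ("*", m.label):
                    continue
                if step.pred is None or step.pred(m):
                    seen.add(m.id)
                    nxt.append(m)
        results.append(nxt)
        current = nxt
    return results


def mk(node_id, *children):
    # Hypothetical convention: the label is the first letter of the ID.
    return Node(node_id, node_id[0], list(children))


# Hypothetical source tree, chosen to be consistent with the slide's results.
x1 = mk("X1", mk("A1", mk("B1", mk("E1"), mk("C1")), mk("B2", mk("C2"))))
x2 = mk("X2", mk("A2", mk("B3", mk("E2"), mk("E3"), mk("C3")), mk("B4", mk("E4"))))

steps = [
    Step("/", "A"),
    Step("/", "B", lambda n: 1 <= count_desc(n, "E") <= 2),  # assumed inclusive range
    Step("/", "C"),
]
res_ids = [[n.id for n in res] for res in evaluate([x1, x2], steps)]
print(res_ids)
```

On this hypothetical tree the three lists in `res_ids` match Res1, Res2 and Res3 above. B2 is rejected at step 2 because it has no E descendant.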
9
Auxiliary Data
(X1, X2) /A /B[1 ≤ Count(//E) ≤ 2] /C
  • Auxiliary data are the Result Paths
  • No auxiliary data saved for predicate evaluation
  • Auxiliary data size is bounded as O(N·M), where:
  • N: path expression size (number of steps)
  • M: result size (number of nodes in the final result)
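
One possible in-memory layout for the Result Paths is a map from each final result node to the qualifying node at every step. The sketch below assumes that C1 was reached through A1 and B1 and C3 through A2 and B3; the slides do not spell out the tree, so these paths and the helper names are hypothetical.

```python
# Result Paths: one stored path per node of the final result.
# Key = result node ID; value = the qualifying node at steps 1..N.
result_paths = {
    "C1": ("A1", "B1", "C1"),   # assumed derivation of C1
    "C3": ("A2", "B3", "C3"),   # assumed derivation of C3
}


def aux_size(paths):
    """Total number of node references kept as auxiliary data."""
    return sum(len(p) for p in paths.values())


def nodes_at_step(paths, step):
    """Nodes known to have qualified at `step` (1-based)."""
    return {p[step - 1] for p in paths.values()}


n_steps, m_results = 3, len(result_paths)
print(aux_size(result_paths), "<=", n_steps * m_results)
print(sorted(nodes_at_step(result_paths, 2)))
```

With one path of N entries per result node, the stored size is N·M. Note that the nodes recoverable for step 2 are only a subset of Res2: a node that qualified at a step but led to no final result leaves no trace in the auxiliary data.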

10
Outline
  • The XML Data Model
  • The View Model: Path Expressions
  • Auxiliary Data
  • Source Updates
  • The Maintenance Approach
  • Experimental Results
  • Conclusion and Discussion

11
Source Updates
  • Two primitives: add / delete a leaf. Example: add E5
  • Report the full path of the updated leaf: U.path
  • Direct view effect: add B2 to Res2
  • Indirect view effect: add C2 to Res3
  • i.e., direct effects happen by changing Predi(n)
    (T → F or F → T)
  • Non-monotonic effects: source add → view add/del;
    source del → view add/del
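
The non-monotonic behaviour can be illustrated with a small sketch. It assumes the step-2 predicate is an inclusive range on the number of E descendants (1 to 2); that reading, and the function names, are assumptions made for illustration only.

```python
def pred2(e_count):
    # Assumed reading of the step-2 predicate: 1 <= Count(//E) <= 2.
    return 1 <= e_count <= 2


def effect_of_source_update(e_count, delta):
    """Direct view effect on a B node when its E count changes by `delta`.

    Returns "add", "del", or None when the predicate value is unchanged.
    """
    before, after = pred2(e_count), pred2(e_count + delta)
    if before == after:
        return None
    return "add" if after else "del"


# Source additions (delta = +1) and deletions (delta = -1) of one E leaf.
for count, delta in [(0, +1), (1, +1), (2, +1), (3, -1), (1, -1)]:
    print(count, delta, effect_of_source_update(count, delta))
```

Under this assumed predicate, adding an E leaf can add a B node to the view (count 0 to 1) or delete it (count 2 to 3), and deleting an E leaf can do either as well.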

12
Intuitions
  • The final view result Res can be affected through
    affecting any step's intermediate result Resi
    (i ≤ N)
  • Every indirect addition is caused by a direct
    addition
  • Every indirect deletion is caused by a direct
    deletion
  • So, we solve the problem in two phases:
  • 1- Discover direct effects
  • 2- Use the direct effects to find out the
    indirect ones

13
Discovering Direct Effects
  • Let n be added / deleted at source
  • U.path is reported
  • What are the necessary conditions to directly
    add/del a node at some step?
  • Example: node B1 at step 2
  • 1- B1 must belong to U.path
  • 2- B1.label must match step2.label
  • 3- B1 must have qualified ancestors
  • 4- Pred2(B1) must be changed by U
  • Conditions 1-3 can be checked using U.path with
    no source queries
  • Condition 4 needs source queries
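
Conditions 1-3 can be checked by scanning U.path alone. The sketch below is a minimal illustration, not the authors' algorithm: it assumes a hypothetical U.path of R, A1, B1, B2, E1, takes R as the expression context, and ignores predicates entirely (condition 4).

```python
def candidates(u_path, steps):
    """Apply only the axis and label tests of each step to U.path.

    u_path: list of (node_id, label), context node first, updated leaf last.
    steps:  list of (axis, label) with axis "/" or "//"; "*" matches any label.
    Returns one list of candidate node IDs per step.
    """
    prev = [0]          # positions on u_path that qualified at the previous step
    per_step = []
    for axis, label in steps:
        cur = []
        for pos, (_, node_label) in enumerate(u_path):
            if label not in ("*", node_label):
                continue                      # condition 2: label must match
            if axis == "/":
                ok = (pos - 1) in prev        # parent must be a candidate
            else:
                ok = any(p < pos for p in prev)  # some ancestor must be one
            if ok:                            # condition 3: qualified ancestor
                cur.append(pos)
        per_step.append([u_path[p][0] for p in cur])
        prev = cur
    return per_step


# Hypothetical update path: leaf E1 was added below B2, which is below B1.
u_path = [("R", "R"), ("A1", "A"), ("B1", "B"), ("B2", "B"), ("E1", "E")]
steps = [("//", "A"), ("//", "B"), ("//", "C"), ("//", "D")]
cans = candidates(u_path, steps)
print(cans)
```

Condition 1 holds by construction, since only nodes on `u_path` are ever considered. No source data is touched; deciding whether a predicate value actually changed is left to a separate step.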

R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
14
Maintenance Outlines
  • - We apply the axis and label tests to U.path to
    get candidate sequences: Can1 = (A1), Can2 = (B1,
    B2)
  • A1 can't be a qualifying ancestor of B1 or B2
    unless it passes pred1
  • So, we conduct predicate tests before moving to
    the next step
  • Algorithm outline:
    For every step i = 1 to N:
      Cani = apply AxisLabeli to Can(i-1)
      Check changes of Predi on Cani
      Discover indirect effects
      Filter Cani

R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
15
Predicate tests
  • For every node in Cani we need to compute
    Predi_after and Predi_before
  • Predi_after requires a source query. Example:
    Pred2_after(B1)
  • However, Pred2_before(B1) can't be computed by a
    source query. We use the auxiliary data.
  • If B1 belongs to the Result Path of some node, at
    position 2 → Pred2_before(B1) = T. If not,
    Pred2_before(B1) may be T or F
  • We proved that assuming it is F is always a safe
    assumption
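
The lookup of the old predicate value can be sketched as follows. The Result Paths shown are hypothetical, and the new predicate value is passed in as a plain boolean standing in for the answer of a source query; nothing here is taken from the authors' code.

```python
def pred_before(result_paths, node_id, step):
    """Old predicate value as far as the auxiliary data can tell.

    T if the node appears at position `step` (1-based) of some Result Path;
    otherwise F, the default the slides describe as always safe.
    """
    return any(path[step - 1] == node_id for path in result_paths.values())


def direct_effect_at(result_paths, node_id, step, pred_after):
    """Compare the looked-up old value with the freshly queried new value."""
    before = pred_before(result_paths, node_id, step)
    if before == pred_after:
        return None
    return "add" if pred_after else "del"


# Hypothetical Result Paths for a three-step expression.
result_paths = {
    "C1": ("A1", "B1", "C1"),
    "C3": ("A2", "B3", "C3"),
}

print(pred_before(result_paths, "B1", 2))             # found at position 2
print(pred_before(result_paths, "B2", 2))             # unknown, treated as F
print(direct_effect_at(result_paths, "B2", 2, True))  # pred_after from a source query
```

The design choice worth noting is that a miss in the auxiliary data is never resolved by an extra query; it is simply treated as F.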

R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
16
Spirit of the Proof
  • Problems can happen only if Pred_before is T and
    we assume it is F
  • Two cases
  • 1- Pred_after = F → unnoticed direct deletion
  • 2- Pred_after = T → wrong direct addition

17
Discovering Indirect Effects
R //A[pred1] //B[pred2] //C[pred3] //D[pred4]
  • Assume that B1 is identified as a direct addition
    or deletion at step 2
  • If B1 is directly added at step 2 → issue a
    source query: B1 //C[pred3] //D[pred4]
  • If B1 is directly deleted at step 2 → use the
    Result Paths to find indirect deletions
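
Both cases can be sketched against a Result Paths map. This is a simplified illustration that assumes a single stored path per result node; the paths, the function names, and the shape of the source-query answer (a list of path suffixes for the remaining steps) are all hypothetical.

```python
def apply_direct_deletion(result_paths, node_id, step):
    """Drop every result whose stored path has `node_id` at `step` (1-based).

    Uses only the auxiliary data; returns the indirectly deleted result IDs.
    """
    gone = sorted(r for r, path in result_paths.items()
                  if path[step - 1] == node_id)
    for r in gone:
        del result_paths[r]
    return gone


def apply_direct_addition(result_paths, prefix, suffixes):
    """Record the results found below a directly added node.

    prefix:   qualifying nodes for steps 1..i, ending at the added node.
    suffixes: one tuple per new result covering steps i+1..N, as a
              (hypothetical) source query rooted at the added node would return.
    """
    added = []
    for suffix in suffixes:
        path = tuple(prefix) + tuple(suffix)
        result_paths[path[-1]] = path
        added.append(path[-1])
    return added


# Hypothetical Result Paths for a three-step expression.
result_paths = {
    "C1": ("A1", "B1", "C1"),
    "C3": ("A2", "B3", "C3"),
}

added = apply_direct_addition(result_paths, ("A1", "B2"), [("C2",)])
removed = apply_direct_deletion(result_paths, "B1", 2)
print(added, removed, sorted(result_paths))
```

Additions cost one small source query rooted at the added node, while deletions are resolved entirely from the stored paths.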

[Figure: example tree with nodes R, A1, B1, B2, E1]
18
Filtering Between Steps
  • Spirit of the proof

19
Experimental Results DS1
  • Graph for DataSet1
  • Comment

20
Conclusion
  • The bottom line is that using the Update path and
    the Result Paths enables efficient and scalable
    view maintenance.
  • A general schema-free solution was provided.
  • A practical standard language of path
    expressions is supported.
  • The auxiliary data used is O(M·N).
  • Good time performance is achieved by issuing
    multiple small source queries.

21
Discussion
  • Predicate Evaluation without source queries
  • Batch Updates
  • ???

22
Thank you