Processing XML Streams with Deterministic Automata

Learn more at: https://cse.buffalo.edu

1
Processing XML Streams with Deterministic Automata
  • Denis Mindolin
  • Gaurav Chandalia

2
Introduction
[Figure: an XML data stream enters the XML Stream Router, which evaluates XPath queries 1, 2, and 3 and serves Consumers 1, 2, and 3]

3
Related Work
  • The problem was introduced by Altinel and
    Franklin (2000) for the XFilter system.
  • Chan et al. (2002) describe techniques for solving
    the problem based on a trie (XTrie).
  • Diao et al. (2003) discuss a method based on
    optimized NFAs (YFilter).
  • Green et al. (2003) show how to solve the
    problem using a lazy DFA.

4
DFA approach in general
  • Convert the set of XPath expressions into a set
    of NFAs
  • Convert the set of NFAs into a single NFA
  • Convert the single NFA into a DFA
  • Process the XML data stream with the DFA (using
    the SAX model)
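
The last step above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: it assumes the DFA is already available as a transition table, and the names `DfaHandler`, `run_dfa`, `TRANSITIONS`, and `ACCEPTING` are hypothetical. The handler keeps a stack of DFA states, pushing on each opening tag and popping on each closing tag, so the current state always corresponds to the path of open elements.

```python
import xml.sax


class DfaHandler(xml.sax.ContentHandler):
    """Runs a DFA over element names, keeping one state per open element."""

    def __init__(self, transitions, accepting, start=0):
        super().__init__()
        self.transitions = transitions   # (state, label) -> state
        self.accepting = accepting       # set of accepting states
        self.stack = [start]             # DFA state for each open element
        self.path = []                   # labels of the open elements
        self.matches = []                # paths of matched elements

    def startElement(self, name, attrs):
        # A missing table entry yields None, which acts as the dead state.
        state = self.transitions.get((self.stack[-1], name))
        self.stack.append(state)
        self.path.append(name)
        if state in self.accepting:
            self.matches.append("/" + "/".join(self.path))

    def endElement(self, name):
        # Closing a tag restores the state of the parent element.
        self.stack.pop()
        self.path.pop()


def run_dfa(xml_text, transitions, accepting):
    """Parse xml_text with SAX and return the paths of accepted elements."""
    handler = DfaHandler(transitions, accepting)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matches


# Hand-built DFA for the single query /a/b (an illustrative assumption).
TRANSITIONS = {(0, "a"): 1, (1, "b"): 2}
ACCEPTING = {2}

print(run_dfa("<a><b/><c><b/></c><b/></a>", TRANSITIONS, ACCEPTING))
```

Because the state is a single table lookup per opening tag, the per-event cost does not depend on how many queries were compiled into the table.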

5
DFA approach in general (cont)
  • Linear XPath expression
  • P ::= /N | //N | PP
  • N ::= E | A | * | text() | text() = S
  • where
  • E = element label
  • A = attribute label
  • / = child axis
  • // = descendant axis
  • * = wild card
  • S = constant string

What about predicates? They are decomposed into
linear XPath expressions.
6
DFA approach in general (cont)
  • Consider two XPath expressions
  • /datasets/dataset[//tableHead//*/text()="Galaxy"]/title
  • /datasets/dataset[history]/tableHead[field]
  • Corresponding query tree
  • $D IN $R/datasets/dataset
  • $H IN $D/history
  • $T IN $D/title (sax_f = true)
  • $TH IN $D/tableHead (sax_f = true)
  • $N IN $D//tableHead//*
  • $F IN $TH/field
  • $V IN $N/text()="Galaxy"

7
Conversion of XPath expressions into NFA and DFA
[Figure: query tree, query NFA, and query DFA]
Query tree: $X IN $R/a, $Y IN $X//*/b, $Z IN $X/b/*, $U IN $Z/d
8
Eager DFA vs. Lazy DFA
  • A DFA is eager if it is obtained by the standard
    NFA-to-DFA conversion algorithm (Hopcroft and
    Ullman 1979).
  • A DFA is lazy if it is constructed at run time, on
    demand. Initially it has a single state; whenever
    we attempt a transition into a missing state, we
    compute that state and update the transition.
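
A minimal sketch of the lazy construction, under stated assumptions: the NFA encoding (pairs of query index and step position), the class `LazyDfa`, and the helpers `parse_xpath` and `run_lazy_dfa` are hypothetical names chosen for illustration, and only linear expressions with `/`, `//`, and `*` are handled. Each DFA state is a set of NFA states, and a transition is computed only the first time the input asks for it.

```python
import re
import xml.sax


def parse_xpath(expr):
    """Split a linear XPath such as '/a//*/b' into (axis, label) steps."""
    return re.findall(r"(//|/)([^/]+)", expr)


class LazyDfa(xml.sax.ContentHandler):
    def __init__(self, queries):
        super().__init__()
        self.queries = [parse_xpath(q) for q in queries]
        # NFA state (q, i): query q has matched its first i steps.
        start = frozenset((q, 0) for q in range(len(queries)))
        self.table = {}           # (dfa_state, label) -> dfa_state, on demand
        self.states = {start}     # DFA states materialized so far
        self.stack = [start]
        self.matches = []         # (query index, element name) pairs

    def _step(self, state, label):
        key = (state, label)
        if key not in self.table:          # missing transition: compute it now
            nxt = set()
            for q, i in state:
                steps = self.queries[q]
                if i == len(steps):
                    continue               # fully matched, nothing to extend
                axis, name = steps[i]
                if axis == "//":
                    nxt.add((q, i))        # descendant axis: keep waiting
                if name in ("*", label):
                    nxt.add((q, i + 1))    # this step matches the element
            self.table[key] = frozenset(nxt)
            self.states.add(self.table[key])
        return self.table[key]

    def startElement(self, name, attrs):
        state = self._step(self.stack[-1], name)
        self.stack.append(state)
        for q, i in sorted(state):
            if i == len(self.queries[q]):
                self.matches.append((q, name))

    def endElement(self, name):
        self.stack.pop()


def run_lazy_dfa(xml_text, queries):
    handler = LazyDfa(queries)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


h = run_lazy_dfa("<a><b><b><d/></b></b></a>", ["/a//*/b", "/a/b/*/d"])
print(h.matches, len(h.states))
```

Only the states reachable on the actual input are ever built, which is the property the lazy approach relies on; an eager construction would enumerate every reachable subset up front.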

9
Eager DFA
  • P = p_0 // p_1 // ... // p_k
  • p_i = N_1 / N_2 / ... / N_{n_i}
  • k = number of //'s
  • n_i = length of p_i, i = 0, ..., k
  • m = max number of *'s in each p_i
  • n = length (or depth) of P, i.e. n = n_0 + n_1 + ... + n_k
  • s = alphabet size, |Σ|

Theorem. Given a linear XPath expression P,
define prefix(P) = n_0, and body(P) = [formula not
preserved in the transcript] when k > 0, and
body(P) = 1 when k = 0. Then the eager DFA for P
has at most prefix(P) + body(P) states. In
particular, if m = 0 and k ≤ 1, then the DFA has
at most (n + 1) states.
10
Lazy DFA. Example
Queries: /a//*/b and /a/b/*/d
Sample XML document:
<a>
  <b>
    <b>
      <d/>
    </b>
  </b>
</a>
[Figure: DFA with states 1 to 8 and transitions labeled a, b, and d]
11
Lazy DFA
Graph schema (based on DTD)
d = the maximum number of simple cycles that a simple path can intersect
D = the total number of nonempty, simple paths starting at the root
In the example: d = 2, D = 13
12
Lazy DFA (cont)
  • Theorem. Consider a graph schema with d, D, and
    let Q be a set of XPath expressions of maximum
    depth n. Then on any XML input satisfying the
    schema, the lazy DFA has at most 1 + D(1 + n)^d
    states.
  • Corollary. The number of states of the lazy DFA
    does not depend on the number of XPath
    expressions, only on their depth.
  • Example: n = 10 and the number of XPath
    expressions is 100,000.
  • The eager DFA may have 2^100,000 states
  • The lazy DFA will have at most 1574 states
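
The figure of 1574 states can be checked by evaluating the bound 1 + D(1 + n)^d with the schema parameters from the graph schema slide (d = 2, D = 13) and n = 10. The function name `lazy_dfa_state_bound` is a hypothetical helper used only for this arithmetic check.

```python
def lazy_dfa_state_bound(D, n, d):
    """Upper bound on lazy DFA states: 1 + D * (1 + n) ** d."""
    return 1 + D * (1 + n) ** d


# D = 13 simple paths, query depth n = 10, d = 2 intersecting cycles:
# 1 + 13 * 11**2 = 1 + 13 * 121 = 1574
bound = lazy_dfa_state_bound(D=13, n=10, d=2)
print(bound)
```

The number of XPath expressions does not appear among the arguments, which is the content of the corollary.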

13
Lazy DFA. Implementation
  • To process the XML stream, it uses the SAX model
  • The subset of XPath considered in the
    implementation:
  • No text() or attribute-value tests
  • Only child and descendant axes
  • All predicates of a query must fire before the
    target element

14
Restrictions of the implementation
Sample XML document:
<courses>
  <course>367-203</course>
  <title>MEDIA WORKSHOP</title>
  <level>U</level>
  <section>
    <section>Se 101</section>
    <days>T</days>
    <hours>
      <start>130pm</start>
      <end>520pm</end>
    </hours>
  </section>
  <credits>1-3</credits>
</courses>
XPath queries:
1. All predicates fire before the target element
//courses[level]/section
2. Predicates fire between the starting and
closing tags of the target element
//courses[days]/section
3. Predicates fire after the target element
//courses[credits]/section
15
Processing attributes
  • When processing a stream, all attributes are
    converted into elements

Before conversion:
<section_listing>
  <section name="Se 101" description=""/>
  <hours start="130pm" end="520pm"/>
</section_listing>
After conversion:
<section_listing>
  <section>
    <@name>Se 101</@name>
    <@description/>
  </section>
  <hours>
    <@start>130pm</@start>
    <@end>520pm</@end>
  </hours>
</section_listing>
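
One way to sketch this conversion with SAX is to emit, for every attribute of an opening tag, a nested start/text/end event triple before any child content. The class `AttributesAsElements`, the helper `attribute_events`, the event tuple format, and the "@" prefix on attribute names are assumptions made for this illustration; the slides do not show how the conversion was coded.

```python
import xml.sax


class AttributesAsElements(xml.sax.ContentHandler):
    """Records a flat event list in which attributes appear as child elements."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        # Each attribute becomes a pseudo-element named "@" + attribute name.
        for attr_name in attrs.getNames():
            self.events.append(("start", "@" + attr_name))
            value = attrs.getValue(attr_name)
            if value:                       # empty values yield an empty element
                self.events.append(("text", value))
            self.events.append(("end", "@" + attr_name))

    def characters(self, content):
        if content.strip():                 # ignore whitespace-only text
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def attribute_events(xml_text):
    handler = AttributesAsElements()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


for event in attribute_events('<section name="Se 101"/>'):
    print(event)
```

With this rewriting in place, an automaton that only understands element events can treat an attribute step like any other child step.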
16
Testing
  • Reference implementation: Galax 1.0.3.5
  • Testing XML stream: World geographic database,
    http://www.cs.washington.edu/research/xmldatasets/data/mondial/mondial-3.0.xml
    (1 MB)
  • Maximum XML depth of the stream was 6
  • Number of queries was 14
  • The depth of queries had a range of 1 to 5
  • The number of predicates had a range of 0 to 3
  • The depth of predicates had a range of 1 to 4

Method used    Number of states used
NFA            22
Eager DFA      87
Lazy DFA       22
17
Reference
  • Green, T. J. et al. 2004. Processing XML streams
    with deterministic automata and stream indexes.
    ACM Transactions on Database Systems, 12/2004.
  • Altinel, M. and Franklin, M. 2000. Efficient
    filtering of XML documents for selective
    dissemination. In Proceedings of VLDB. Cairo.
  • Chen, J. et al. 2000. NiagaraCQ: a scalable
    continuous query system for Internet databases.
    In Proceedings of the ACM SIGMOD Conference on
    Management of Data.
  • Diao, Y. and Franklin, M. 2003. Query processing
    for high-volume XML message brokering. In
    Proceedings of VLDB. Berlin, Germany.
  • Hopcroft, J. E. and Ullman, J. D. 1979.
    Introduction to Automata Theory, Languages, and
    Computation.