1
A Transducer-Based XML Query Processor
  • Presented by Yu Deng

2
Overview
  • Introduction
  • XML Stream Machine (XSM) Framework
  • Translation to XSM Networks
  • XSM Composition
  • Optimizations
  • Experimental Results

3
Introduction
  • Web service implementations and mediators
    exchange XML messages via XML import/export
    mechanisms
  • A large number of important applications require
    extremely efficient processing and transformation
    of sequentially accessed streams

4
Introduction (cont.)
  • A qualitatively different architecture is needed,
    in which stream processing is performed on the fly
    and with minimal memory
  • We propose the XSM-based architecture and
    algorithms for the construction of XML Query
    processors that efficiently process XML streams
    on-the-fly

5
Introduction (cont.)
  • XSM, the XML Stream Machine system, is a novel
    XQuery processing paradigm tuned to the
    efficient processing of sequentially accessed XML
    data (streams).

6
Introduction (cont.)
[Figure: diagram labeled "XQuery"]

7
Introduction (cont.)
  • The key challenge lies in the XQuery-to-XSM
    compiler, which must take into account both the
    query and the schema of the input stream.

8
Introduction (cont.)
  • Goals
  • Minimize the computation performed for each
    incoming piece of stream data, i.e., reduce the
    number of tests, reads, and writes.
  • Minimize the number and size of buffers.
  • Pipeline the computation and write tokens to the
    output stream as soon as possible.

9
  • Introduction
  • XML Stream Machine Framework
  • Translation to XSM Networks
  • XSM Composition
  • Optimizations
  • Experimental Results

10
XML Stream Machine Framework
  • XML and XML streams
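  • In our model, we consider a set of element names E
    and a set of character data (strings) D.
  • XQuery expressions contain variables drawn
    from a set of variable names V.

11
XML Stream Machine Framework (cont.)
  • An XML stream is a sequence of tokens, where the
    set of tokens T is defined as
    $T = \{\langle e\rangle \mid e \in E\} \cup \{d(x) \mid x \in D\} \cup \{\langle /e\rangle \mid e \in E\} \cup \{s_v, e_v \mid v \in V\} \cup \{\mathit{eol}\}$
  • Here ⟨e⟩ and ⟨/e⟩ are start and end tags, d(x) is
    character data, s_v and e_v delimit a binding of
    the variable v, and eol marks the end of the
    token sequence.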

  • Example DTD:
    <!ELEMENT root (a)> <!ELEMENT a (b)> <!ELEMENT b (#PCDATA)>
  • Corresponding token stream:
    s_v, <root>(<a>(<b>PCDATA</b>)</a>)</root>, e_v, eol

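The token model can be made concrete with Python's standard `xml.sax` parser. The sketch below is a hypothetical helper, not part of XSM: it turns a document into the sequence s_v, start/end tags, d(x) text, e_v, eol, encoding each token as a tuple. It assumes whitespace-only text between tags is insignificant and strips it, which also trims leading and trailing whitespace from real text.

```python
import xml.sax


class StreamTokenizer(xml.sax.ContentHandler):
    """Collects SAX events as a flat token list (hypothetical tuple encoding)."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._text = []  # SAX may split one text node across several characters() calls

    def _flush_text(self):
        text = "".join(self._text).strip()  # drop whitespace-only text between tags
        if text:
            self.tokens.append(("d", text))  # d(x), x in D
        self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.tokens.append(("open", name))   # <e>

    def endElement(self, name):
        self._flush_text()
        self.tokens.append(("close", name))  # </e>

    def characters(self, content):
        self._text.append(content)


def xml_to_stream(xml_text, var):
    """Wrap the document's tokens as s_var ... e_var eol."""
    handler = StreamTokenizer()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return [("s", var)] + handler.tokens + [("e", var), ("eol", None)]


if __name__ == "__main__":
    for tok in xml_to_stream("<root><a><b>hi</b></a></root>", "R"):
        print(tok)
```

For `<root><a><b>hi</b></a></root>` bound to R, this yields s_R, <root>, <a>, <b>, d(hi), </b>, </a>, </root>, e_R, eol.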
12
XML Stream Machine Framework (cont.)
  • For querying streams, it is reasonable to assume
    acyclic DTDs
  • This implies that all valid XML streams have
    bounded depth and hence no stack is required to
    check well-formedness.
  • In the sequel, we only consider valid streams
    over acyclic DTDs.
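To make the bounded-depth argument concrete, here is a Python sketch that computes the maximum nesting depth an acyclic DTD permits and rejects a cyclic one. The dictionary encoding of the DTD and the name `max_depth` are assumptions for illustration. The point is that a finite depth bound is what lets a finite-state machine track tag nesting without an unbounded stack.

```python
def max_depth(dtd, root):
    """Maximum element nesting depth allowed by an element-content DTD.

    dtd maps each element name to the element names allowed as its children
    (text content such as #PCDATA is simply omitted). Raises ValueError if the
    DTD is recursive, i.e. its element graph has a cycle reachable from root.
    """
    memo = {}
    on_path = set()  # elements on the current depth-first search path

    def depth(elem):
        if elem in memo:
            return memo[elem]
        if elem in on_path:
            raise ValueError(f"cyclic DTD: <{elem}> can contain itself")
        on_path.add(elem)
        d = 1 + max((depth(child) for child in dtd.get(elem, ())), default=0)
        on_path.remove(elem)
        memo[elem] = d
        return d

    return depth(root)


# The example DTD: root -> a -> b -> #PCDATA
print(max_depth({"root": ["a"], "a": ["b"], "b": []}, "root"))  # 3
```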

13
XML Stream Machine Framework (cont.)
  • XQuery fragment (grammar):
      XQE ::= XQE1/Constant          // Path
            | XQE1, XQE2             // Concatenation
            | <Tag> XQE1 </Tag>      // Element creation
            | for Var in XQE1        // Generator
              where Cond             // Optional condition
              return XQE2            // Body
            | Var                    // Variable
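One way to hold this fragment in a program is one Python dataclass per production. The class names below are illustrative assumptions, and the `where` condition is kept as opaque text because the slides do not define Cond. The `render` function prints an expression back in the surface syntax.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Var:
    name: str


@dataclass
class Path:            # XQE1/Constant
    src: "Expr"
    tag: str


@dataclass
class Concat:          # XQE1, XQE2
    left: "Expr"
    right: "Expr"


@dataclass
class Elem:            # <Tag> XQE1 </Tag>
    tag: str
    body: "Expr"


@dataclass
class For:             # for Var in XQE1 [where Cond] return XQE2
    var: str
    gen: "Expr"
    body: "Expr"
    cond: Optional[str] = None  # opaque where-condition (not modeled here)


Expr = Union[Var, Path, Concat, Elem, For]


def render(e: Expr) -> str:
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Path):
        return f"{render(e.src)}/{e.tag}"
    if isinstance(e, Concat):
        return f"{render(e.left)}, {render(e.right)}"
    if isinstance(e, Elem):
        return f"<{e.tag}> {render(e.body)} </{e.tag}>"
    if isinstance(e, For):
        where = f" where {e.cond}" if e.cond else ""
        return f"for {e.var} in {render(e.gen)}{where} return {render(e.body)}"
    raise TypeError(f"not an XQE: {e!r}")


q = For("X", Path(Var("R"), "a"),
        For("Y", Path(Var("X"), "b"), Elem("res", Concat(Var("Y"), Var("X")))))
print(render(q))  # for X in R/a return for Y in X/b return <res> Y, X </res>
```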

14
XML Stream Machine Framework (cont.)
  • Running example

  Q:
    for X in R/a return
      for Y in X/b return
        <res> Y, X </res>
  • Subexpressions: E = R/a, F = X/b, L = the inner for
    expression, G = Y, X, and H = <res> Y, X </res>
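For reference, the result the running example should produce can be computed with an ordinary in-memory (non-streaming) evaluator. This Python sketch assumes R is bound to the root element of a document shaped like the example DTD; it uses `xml.etree.ElementTree` and is not how XSM evaluates the query.

```python
import copy
import xml.etree.ElementTree as ET


def running_example(r):
    """for X in R/a return for Y in X/b return <res> Y, X </res>"""
    results = []
    for x in r.findall("a"):           # for X in R/a
        for y in x.findall("b"):       # for Y in X/b
            res = ET.Element("res")    # <res> Y, X </res>
            res.append(copy.deepcopy(y))
            res.append(copy.deepcopy(x))
            results.append(ET.tostring(res, encoding="unicode"))
    return results


doc = ET.fromstring("<root><a><b>1</b><b>2</b></a><a><b>3</b></a></root>")
for item in running_example(doc):
    print(item)
```

Each <res> repeats its whole enclosing <a> after the <b>, so a streaming evaluator cannot emit X as it arrives; X has to be buffered.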
15
XML Stream Machine Framework (cont.)
  • A variable V is free in an XQuery expression if it
    occurs outside the scope of any enclosing
    "for V in ...".
  • The variables that are free in the outermost
    XQuery Q are called its input variables, since they
    correspond to the input streams of Q.
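The definition of free variables is a short structural recursion. This Python sketch uses a hypothetical dataclass encoding of the fragment; the key point is that V is bound only in the body of `for V in Q1 return Q2`, not in Q1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Path:
    src: "Expr"
    tag: str


@dataclass(frozen=True)
class Concat:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Elem:
    tag: str
    body: "Expr"


@dataclass(frozen=True)
class For:
    var: str
    gen: "Expr"
    body: "Expr"


Expr = Union[Var, Path, Concat, Elem, For]


def free(e: Expr) -> set:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Path):
        return free(e.src)
    if isinstance(e, Concat):
        return free(e.left) | free(e.right)
    if isinstance(e, Elem):
        return free(e.body)
    if isinstance(e, For):
        # V is in scope only in the body, so it is removed from the body's free set
        return free(e.gen) | (free(e.body) - {e.var})
    raise TypeError(f"not an XQE: {e!r}")


q = For("X", Path(Var("R"), "a"),
        For("Y", Path(Var("X"), "b"), Elem("res", Concat(Var("Y"), Var("X")))))
print(free(q))  # {'R'}: R is the single input variable
```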

16
XML Stream Machine Framework (cont.)
  • XML Stream Machines
  • XSM Buffers and Buffer Actions
  • In state transitions, XSMs can access and query
    buffer contents (via read operations such as
    *p), and execute a sequence of actions
    A = A1, ..., Ak. An atomic action Ai can have
    one of the forms
  • p++: advance pointer p
  • w(p, c): at p, write the constant c, then advance p
  • w(p, *r): at p, write the token *r, then advance p
  • p := p′: set p to the position of p′
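A minimal Python sketch of buffers and the four atomic actions, assuming a buffer is a list of tokens and a pointer is an index into one buffer. The class and method names are hypothetical.

```python
class Buffer:
    """A growable list of tokens that pointers index into."""

    def __init__(self, tokens=()):
        self.cells = list(tokens)


class Pointer:
    def __init__(self, buf, pos=0):
        self.buf, self.pos = buf, pos

    def read(self):                 # *p: the token under p
        return self.buf.cells[self.pos]

    def advance(self):              # p++
        self.pos += 1

    def write(self, token):         # w(p, c): write c at p, then advance p
        cells = self.buf.cells
        if self.pos == len(cells):
            cells.append(token)
        else:
            cells[self.pos] = token
        self.pos += 1

    def write_from(self, r):        # w(p, *r): copy the token under r, then advance p
        self.write(r.read())

    def assign(self, other):        # p := p': move p to the position of p'
        assert self.buf is other.buf, "pointers must index the same buffer"
        self.pos = other.pos


src, dst = Buffer(["<a>", "x", "</a>"]), Buffer()
r, w = Pointer(src), Pointer(dst)
r.advance()                         # skip <a>
w.write("s_X")
w.write_from(r)
r.advance()
w.write("e_X")
print(dst.cells)                    # ['s_X', 'x', 'e_X']
```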

17
XML Stream Machine Framework (cont.)
  • XML Stream Machines (cont.)
  • XSM control
  • An XSM has a finite set of states Q, one of
    which is the distinguished initial state q0.
  • An XSM moves from the current state q to the next
    state q′, provided there is a transition $t \in T$,
    $t = q \xrightarrow{f/A} q'$,
    whose condition f is satisfied. Before entering
    q′, the action sequence A is executed.
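The control rule can be sketched as a step function. The encoding is an assumption: conditions and actions are Python callables over a shared environment, and when several transitions are enabled the sketch simply takes the first one.

```python
from typing import Callable, NamedTuple, Optional


class Transition(NamedTuple):
    src: str                          # q
    cond: Callable[[dict], bool]      # f
    actions: tuple                    # A = A1, ..., Ak, each a function env -> None
    dst: str                          # q'


def step(state, transitions, env) -> Optional[str]:
    """Fire the first enabled transition q --f/A--> q'; None if none is enabled."""
    for t in transitions:
        if t.src == state and t.cond(env):
            for act in t.actions:     # A runs before the machine enters q'
                act(env)
            return t.dst
    return None


def copy_token(env):                  # w(out, *r), then r++
    env["out"].append(env["in"][env["r"]])
    env["r"] += 1


T = (
    Transition("q0", lambda env: env["in"][env["r"]] != "eol", (copy_token,), "q0"),
    Transition("q0", lambda env: env["in"][env["r"]] == "eol", (), "done"),
)

env = {"in": ["<a>", "x", "</a>", "eol"], "r": 0, "out": []}
q = "q0"
while q != "done":
    q = step(q, T, env)
    assert q is not None, "machine blocked"
print(env["out"])                     # ['<a>', 'x', '</a>']
```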

18
XML Stream Machine Framework (cont.)
  • XSM control(cont.)
  • The transition condition f is a boolean
    combination of the following atomic
    expressions:
  • p = p′, p ≠ p′, p < p′ (pointer comparison)
  • *r = c, *r ≠ c, *r = *r′, *r ≠ *r′ (token
    comparison)
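The condition language is small enough to evaluate with a short recursive function. The tuple encoding of atomic expressions and the (buffer, position) representation of pointers are hypothetical choices made for this sketch.

```python
def holds(f, ptrs, bufs):
    """Evaluate an XSM transition condition f.

    f is a nested tuple: ("and", g1, g2, ...), ("or", ...), ("not", g), or an
    atomic expression such as ("p<", "p", "p2") or ("*r=c", "r", "<a>").
    ptrs maps pointer names to (buffer name, position); bufs maps buffer
    names to token lists.
    """
    op, *args = f
    if op == "and":
        return all(holds(g, ptrs, bufs) for g in args)
    if op == "or":
        return any(holds(g, ptrs, bufs) for g in args)
    if op == "not":
        return not holds(args[0], ptrs, bufs)

    def pos(p):                      # position of pointer p
        return ptrs[p][1]

    def deref(r):                    # *r: the token under read pointer r
        buf, i = ptrs[r]
        return bufs[buf][i]

    if op == "p=":
        return pos(args[0]) == pos(args[1])
    if op == "p!=":
        return pos(args[0]) != pos(args[1])
    if op == "p<":
        return pos(args[0]) < pos(args[1])
    if op == "*r=c":
        return deref(args[0]) == args[1]
    if op == "*r!=c":
        return deref(args[0]) != args[1]
    if op == "*r=*r":
        return deref(args[0]) == deref(args[1])
    if op == "*r!=*r":
        return deref(args[0]) != deref(args[1])
    raise ValueError(f"unknown atomic expression {op!r}")


bufs = {"R": ["s_R", "<a>", "x", "</a>"]}
ptrs = {"r": ("R", 1), "wp": ("R", 4)}
print(holds(("and", ("*r=c", "r", "<a>"), ("p<", "r", "wp")), ptrs, bufs))  # True
```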

19
XML Stream Machine Framework (cont.)
  • XSM control(cont.)
  • Example: an XSM with input buffers X, Y and output
    buffer w
[Figure: state diagram with states 0, 1, 2, 3]

20
  • Introduction
  • XML Stream Machine Framework
  • Translation to XSM Networks
  • XSM Composition
  • Optimizations
  • Experimental Results

21
Translation to XSM Networks
  • The XSM Compiler translates XQueries into
    optimized XSMs.
  • The process is based on building buffers for
    subexpression results and variables, a basic
    XSM for each kind of XQuery subexpression, and
    appropriately connecting the buffers and XSMs.

22
Translation to XSM Networks (cont.)
  • XSM Networks
  • An XSM network is a directed acyclic graph (DAG)
    whose nodes are XSMs and whose labeled edges are
    of the form $M_1 \xrightarrow{B} M_2$, indicating that
    the output buffer B of M1 is the input buffer of M2.
  • We call M1 the producer and M2 the consumer
    XSM.
  • An input stream I is represented by an edge
    $in \xrightarrow{I} M$.
  • An output stream O is represented by an edge
    $M \xrightarrow{O} out$.
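An XSM network can be represented as a list of labeled edges (producer, buffer, consumer) and scheduled with Python's `graphlib.TopologicalSorter` (Python 3.9+), which also rejects cyclic networks. The edge list for the running example is an assumption read off the example network and the translation rules, with primed names standing for ForVars outputs.

```python
from graphlib import CycleError, TopologicalSorter

# Edges M1 --B--> M2 of the running example (assumed); "in" and "out" are
# pseudo-nodes for the input and output streams.
EDGES = [
    ("in",   "R",  "M(E)"),
    ("M(E)", "X",  "M(F)"),
    ("M(E)", "X",  "M(L)"),
    ("M(F)", "Y",  "M(L)"),
    ("M(L)", "Y'", "M(G)"),
    ("M(L)", "X'", "M(G)"),
    ("M(G)", "Z",  "M(H)"),
    ("M(H)", "O",  "out"),
]


def schedule(edges):
    """Nodes in producer-before-consumer order; raises CycleError if not a DAG."""
    preds = {}
    for producer, _buf, consumer in edges:
        preds.setdefault(producer, set())
        preds.setdefault(consumer, set()).add(producer)
    return list(TopologicalSorter(preds).static_order())


def roles(edges):
    """For each buffer, the XSMs that produce and consume it."""
    result = {}
    for producer, buf, consumer in edges:
        entry = result.setdefault(buf, {"producers": set(), "consumers": set()})
        entry["producers"].add(producer)
        entry["consumers"].add(consumer)
    return result


print(schedule(EDGES))
try:
    schedule([("M1", "B1", "M2"), ("M2", "B2", "M1")])
except CycleError:
    print("not a DAG")
```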

23
Translation to XSM Networks (cont.)
  • XSM Networks (cont.)
  • Example

24
[Figure: XSM network for the running example, with machines M(E) = R/a, M(F) = X/b, M(L) = ForVars Y (Y, X → Y, X), M(G) = Y, X, and M(H) = <res>Z</res>, connected through buffers R, X, Y, Z, and O between the in and out streams]
25
Translation to XSM Networks (cont.)
  • Translation Algorithm
  • Associate each input variable I with a
    corresponding input buffer named I.
  • For every path, concatenation, and element
    creation (sub)expression Q (i.e., every
    subexpression other than a for expression or a
    single variable), we create a buffer named out(Q),
    which stores the output of the subexpression.
  • Which buffers are created for a for expression
    F = for V in Q1 return Q2 depends on the free
    variables in the body Q2 of F.

26
Translation to XSM Networks (cont.)
  • Translation Algorithm (cont.)
  • For translating an XQuery into an XSM network, we
    use the following XSM templates
  • Path(InBuf, ChildTag, OutBuf)
  • Concat(InBuf1, InBuf2, OutBuf)
  • CreateEl(InBuf, ElemTag, OutBuf)
  • ForVars(InVar, InVars, OutVars)
  • The ChildTag and ElemTag parameters have to be
    instantiated with constants; InBuf, OutBuf, and
    InVar with buffer names; and InVars and OutVars
    with lists of buffer names.

27
Translation to XSM Networks (cont.)
  • Translation Algorithm (cont.)

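[Figure: state diagram of M(E) = Path(R, a, X) with states 0, 1, 2 and transitions
    *r = s_r / r++
    *r = <a> / w(x, s_x), w(x, <a>), r++
    *r ≠ </a> / w(x, *r), r++
    *r = </a> / w(x, </a>), w(x, e_x), r++
    *r ≠ <a> / r++
    *r = e_r / w(x, eol), r++]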
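The Path(R, a, X) template can be simulated directly. This Python sketch encodes tokens as strings (for example `s_R`, `<a>`, `e_X`) and assigns the transitions to states 0, 1, 2 as one plausible reading of the diagram; both are assumptions. It relies on the acyclic DTD: an <a> cannot contain another <a>, so the first </a> seen in state 2 closes the current match.

```python
def path_xsm(r_buf, tag="a", src="R", var="X"):
    """Sketch of Path(R, a, X): copy every <a> child of R into buffer x.

    r_buf holds the tokens of input buffer R (s_R ... e_R eol); the returned
    list is buffer x, holding s_X <a> ... </a> e_X per match, then eol.
    """
    open_t, close_t = f"<{tag}>", f"</{tag}>"
    x = []
    r = 0                                   # read pointer into R
    q = 0                                   # current state
    while True:
        tok = r_buf[r]                      # *r
        if q == 0 and tok == f"s_{src}":    # *r = s_r / r++
            r += 1
            q = 1
        elif q == 1 and tok == open_t:      # *r = <a> / w(x, s_x), w(x, <a>), r++
            x += [f"s_{var}", open_t]
            r += 1
            q = 2
        elif q == 1 and tok == f"e_{src}":  # *r = e_r / w(x, eol); the sketch stops here
            x.append("eol")
            return x
        elif q == 1:                        # *r ≠ <a> / r++ (skip other tokens)
            r += 1
        elif q == 2 and tok == close_t:     # *r = </a> / w(x, </a>), w(x, e_x), r++
            x += [close_t, f"e_{var}"]
            r += 1
            q = 1
        elif q == 2:                        # *r ≠ </a> / w(x, *r), r++
            x.append(tok)
            r += 1
        else:
            raise ValueError(f"no transition from state {q} on {tok!r}")


stream = ["s_R", "<root>", "<a>", "<b>", "1", "</b>", "</a>", "</root>", "e_R", "eol"]
print(path_xsm(stream))  # ['s_X', '<a>', '<b>', '1', '</b>', '</a>', 'e_X', 'eol']
```

In state 1 the e_r test is checked before the skip loop, so the skip transition effectively means "*r ≠ <a> and *r ≠ e_r".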
28
Translation to XSM Networks (cont.)
  • Translation Algorithm (cont.)
  • We are now ready to produce XSM networks that
    use the buffers described above.
  • For every subexpression Q of the given XQuery:
      If Q = Var then
        /* skip for variables */
      Else if Q = Q1/c then
        produce Path(out(Q1), c, out(Q))
      Else if Q = Q1, Q2 then
        produce Concat(out(Q1), out(Q2), out(Q))
      Else if Q = <e>Q1</e> then
        produce CreateEl(out(Q1), e, out(Q))
      Else if Q = for V in Q1 return Q2 and
              free(Q2) \ {V} ≠ ∅ then
        InVars := free(Q2)
        OutVars := {V′ | V ∈ InVars}
        produce ForVars(V, InVars, OutVars)
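The translation rules can be sketched as a recursive walk that returns the buffer holding each subexpression's result and emits template instances in producer-before-consumer order. Three details are assumptions chosen to match the example network: a for generator writes into a buffer named after its variable, ForVars outputs get primed names, and other out(Q) buffers are numbered out1, out2, and so on.

```python
import itertools
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Path:
    src: "Expr"
    tag: str

@dataclass(frozen=True)
class Concat:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Elem:
    tag: str
    body: "Expr"

@dataclass(frozen=True)
class For:
    var: str
    gen: "Expr"
    body: "Expr"

Expr = Union[Var, Path, Concat, Elem, For]


def free(e):
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Path):
        return free(e.src)
    if isinstance(e, Concat):
        return free(e.left) | free(e.right)
    if isinstance(e, Elem):
        return free(e.body)
    return free(e.gen) | (free(e.body) - {e.var})     # For


def translate(query):
    templates = []
    fresh = (f"out{i}" for i in itertools.count(1))   # hypothetical out(Q) names

    def visit(e, env, target=None):
        """Emit XSMs for e; return the buffer that holds e's result."""
        if isinstance(e, Var):                        # skip for variables
            return env.get(e.name, e.name)
        if isinstance(e, Path):
            src = visit(e.src, env)
            dst = target or next(fresh)
            templates.append(("Path", src, e.tag, dst))
            return dst
        if isinstance(e, Concat):
            left, right = visit(e.left, env), visit(e.right, env)
            dst = target or next(fresh)
            templates.append(("Concat", left, right, dst))
            return dst
        if isinstance(e, Elem):
            body = visit(e.body, env)
            dst = target or next(fresh)
            templates.append(("CreateEl", body, e.tag, dst))
            return dst
        gen = visit(e.gen, env, target=e.var)         # For: bindings of V go to buffer V
        body_env = {**env, e.var: gen}
        in_vars = sorted(free(e.body))
        if set(in_vars) - {e.var}:                    # free(Q2) \ {V} is non-empty
            outs = [v + "'" for v in in_vars]
            templates.append(("ForVars", gen, [body_env.get(v, v) for v in in_vars], outs))
            body_env.update(zip(in_vars, outs))       # the body reads the ForVars outputs
        return visit(e.body, body_env)

    visit(query, {})
    return templates


Q = For("X", Path(Var("R"), "a"),
        For("Y", Path(Var("X"), "b"), Elem("res", Concat(Var("Y"), Var("X")))))
for t in translate(Q):
    print(t)
```

For the running example this yields Path(R, a, X), Path(X, b, Y), ForVars(Y, [X, Y], [X', Y']), Concat(Y', X', out1), and CreateEl(out1, res, out2), one instance per machine M(E), M(F), M(L), M(G), M(H). The outer for needs no ForVars because its body has no free variable other than X.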

29
  • Introduction
  • XML Stream Machine Framework
  • Translation to XSM Networks
  • XSM Composition
  • Optimizations
  • Experimental Results

30
XSM Composition
  • In XSM networks, consecutive XSMs are linked via
    buffers. For example, consider two XSMs linked
    via a shared buffer Bs: $M_1 \xrightarrow{B_s} M_2$.
  • XSM composition allows us to replace M1 and M2
    with a single XSM $M_3 = M_1 \circ M_2$.
  • Composition creates opportunities for
    eliminating the shared buffer Bs and for
    optimizing the composed XSM.

31
XSM Composition (cont.)
  • For a state q, let readPtr(q) denote the set of
    read pointers on which any outgoing transition
    $t = q \xrightarrow{f/A} q'$ depends.
  • scPtr(q) is the subset of readPtr(q) that point
    into the shared connection buffers Bs:
  • $scPtr(q) = ptr(B_s) \cap readPtr(q)$
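The two pointer sets can be computed directly from the transitions. In this Python sketch each transition is recorded as (source state, pointers its condition reads, target state), a hypothetical encoding that abstracts away the actual condition.

```python
def read_ptrs(state, transitions):
    """readPtr(q): read pointers used by the conditions of q's outgoing transitions."""
    return set().union(*(reads for src, reads, _dst in transitions if src == state))


def sc_ptrs(state, transitions, shared_ptrs):
    """scPtr(q) = ptr(Bs) ∩ readPtr(q)."""
    return shared_ptrs & read_ptrs(state, transitions)


# Consumer with a pointer "x" into the shared buffer and a pointer "y" elsewhere
T2 = [("0", {"x"}, "1"), ("1", {"x", "y"}, "1"), ("2", {"y"}, "2")]
print(sc_ptrs("1", T2, {"x"}))  # {'x'}
print(sc_ptrs("2", T2, {"x"}))  # set()
```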

32
XSM Composition(cont.)
  • Basic XSM composition algorithm
  • Input
  • Producer XSM $M_1 = (Q_1, q_{01}, B_1, T_1)$
  • Consumer XSM $M_2 = (Q_2, q_{02}, B_2, T_2)$
  • Shared connection buffers $B_s = B_1 \cap B_2$
  • Output
  • Composed XSM $M_3 = (Q_3, q_{03}, B_3, T_3)$

33
XSM Composition(cont.)
  • Basic XSM composition algorithm
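  • Begin
      $Q_3 := Q_1 \times Q_2$;  $q_{03} := (q_{01}, q_{02})$
      $B_3 := B_1 \cup B_2$;  $T_3 := \emptyset$
      for $(q_1 \xrightarrow{f_1/A_1} q_1') \in T_1$, $(q_2 \xrightarrow{f_2/A_2} q_2') \in T_2$ do
        if $scPtr(q_2) = \emptyset$ then
          add($T_3$, $(q_1, q_2) \xrightarrow{f_2/A_2} (q_1, q_2')$)
        else
          $\varphi := \bigvee_{r \in scPtr(q_2)} AE(r)$
          add($T_3$, $(q_1, q_2) \xrightarrow{f_1 \wedge \varphi \,/\, A_1} (q_1', q_2)$)
          add($T_3$, $(q_1, q_2) \xrightarrow{f_2 \wedge \neg\varphi \,/\, A_2} (q_1, q_2')$)
  • end

AE(r) (At-End(r)) is a runtime check r = wp that
compares the position of r in Bs with the position
of the write pointer wp = writePtr(buffer(r)) of M1.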
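A runnable Python sketch of the basic composition, under stated assumptions: conditions and actions are callables over one shared environment, AE(r) is evaluated by comparing the position of r with M1's write pointer, and M1 steps only when some shared read pointer is at the end of Bs (this sketch's reading of the two add lines). The never-enabled self-loop on the producer's final state is a workaround so that pairs (p1, q2) still receive transitions.

```python
from typing import Callable, NamedTuple


class T(NamedTuple):
    src: object                     # q
    cond: Callable[[dict], bool]    # f, evaluated on the shared environment
    actions: tuple                  # A, each action is a function env -> None
    dst: object                     # q'


def compose(T1, T2, sc_ptr, wp):
    """Product construction M3 = M1 o M2 with runtime At-End checks.

    sc_ptr(q2) returns scPtr(q2); wp names M1's write pointer on Bs.
    """
    T3 = []
    for t1 in T1:
        for t2 in T2:
            src = (t1.src, t2.src)
            shared = sc_ptr(t2.src)
            if not shared:          # M2 does not read Bs in q2, so it may move alone
                T3.append(T(src, t2.cond, t2.actions, (t1.src, t2.dst)))
                continue

            def phi(env, shared=shared):        # OR of AE(r) over r in scPtr(q2)
                return any(env[r] == env[wp] for r in shared)

            # phi is tested first so that f2 never reads past the write pointer
            T3.append(T(src, lambda env, f=t1.cond, p=phi: p(env) and f(env),
                        t1.actions, (t1.dst, t2.src)))   # Bs exhausted: M1 produces
            T3.append(T(src, lambda env, f=t2.cond, p=phi: not p(env) and f(env),
                        t2.actions, (t1.src, t2.dst)))   # data available: M2 consumes
    return T3


def run(T3, q0, final2, env):
    q = q0
    while q[1] not in final2:
        t = next((t for t in T3 if t.src == q and t.cond(env)), None)
        if t is None:
            raise RuntimeError(f"composed XSM blocked in {q}")
        for act in t.actions:
            act(env)
        q = t.dst
    return q


def produce(env):                   # M1: w(wp, *i), i++
    env["bs"].append(env["in"][env["i"]])
    env["wp"] += 1
    env["i"] += 1


def consume(env):                   # M2: w(out, upper(*r)), r++
    env["out"].append(env["bs"][env["r"]].upper())
    env["r"] += 1


T1 = [
    T("p0", lambda e: e["in"][e["i"]] != "eol", (produce,), "p0"),
    T("p0", lambda e: e["in"][e["i"]] == "eol", (produce,), "p1"),
    T("p1", lambda e: False, (), "p1"),         # halted producer (never enabled)
]
T2 = [
    T("c0", lambda e: e["bs"][e["r"]] != "eol", (consume,), "c0"),
    T("c0", lambda e: e["bs"][e["r"]] == "eol", (), "c1"),
]
T3 = compose(T1, T2, sc_ptr=lambda q2: {"r"} if q2 == "c0" else set(), wp="wp")
env = {"in": ["a", "b", "eol"], "i": 0, "bs": [], "wp": 0, "r": 0, "out": []}
print(run(T3, ("p0", "c0"), {"c1"}, env), env["out"])  # ('p1', 'c1') ['A', 'B']
```

Here M1 and M2 alternate, so the read pointer never trails the write pointer by more than one position: the lockstep situation that the optimization on the next slides exploits.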
34
  • Introduction
  • XML Stream Machine Framework
  • Translation to XSM Networks
  • XSM Composition
  • Optimizations
  • Experimental Results

35
Optimizations
  • The efficiency of the resulting XSM can be
    improved in several ways
  • Lockstep Optimization
  • We exploit the fact that the basic algorithm
    introduces runtime checks AE(r), which can be
    shown to be valid (always true) or unsatisfiable
    using static analysis.
  • The basic idea is to statically determine when the
    producer M1 and the consumer M2 operate in
    lockstep on the shared connection buffer,
    i.e., when a read pointer r trails its
    associated write pointer wp by at most one
    position. In such cases, the optimized composition
    can eliminate the AE checks.

36
Optimizations
  • Lockstep Optimization (cont.)
  • Input
  • Producer XSM $M_1 = (Q_1, q_{01}, B_1, T_1)$
  • Consumer XSM $M_2 = (Q_2, q_{02}, B_2, T_2)$
  • Shared connection buffers $B_s = B_1 \cap B_2$
  • Precondition
  • For all $q_2 \in Q_2$: $|scPtr(q_2)| \le 1$
  • Output
  • Optimized composed XSM $M_3 = (Q_3, q_{03}, B_3, T_3)$
  • Initialization
  • $Q_3 := Q_1 \times Q_2 \times \{go, no, ae\}$
  • $q_{03} := (q_{01}, q_{02}, no)$;  $B_3 := B_1 \cup B_2$;  $T_3 := \emptyset$

37
Optimizations
  • Schema-Based Optimization
  • If the XML schema of the input stream is known,
    further optimizations are possible.

38
Optimizations
  • For example
  • Consider the XSM M(E) = Path(R, a, X). If we know
    that only <a> elements can appear in the input
    stream R, we can simplify the XSM further:

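[Figure: simplified state diagram of M(E) = Path(R, a, X) with states 0, 1, 2 and transitions
    / r++ (no test)
    *r = <a> / w(x, s_x), w(x, <a>), r++
    *r ≠ </a> / w(x, *r), r++
    *r = </a> / w(x, </a>), w(x, e_x), r++
    *r = e_r / w(x, eol), r++]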
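Under the schema assumption, a Python sketch of Path(R, a, X) loses its skip loop. This hypothetical specialization also consumes s_R without testing it, which is one reading of the untested r++ transition, and it raises an error if the stream violates the assumed schema.

```python
def path_xsm_only_a(r_buf, src="R", var="X"):
    """Path(R, a, X) specialized to a stream R that holds only <a> elements."""
    x = []
    r = 1                                   # / r++: s_R is consumed without a test
    while True:
        tok = r_buf[r]
        if tok == "<a>":                    # *r = <a> / w(x, s_x), w(x, <a>), r++
            x += [f"s_{var}", "<a>"]
            r += 1
            while r_buf[r] != "</a>":       # *r ≠ </a> / w(x, *r), r++
                x.append(r_buf[r])
                r += 1
            x += ["</a>", f"e_{var}"]       # *r = </a> / w(x, </a>), w(x, e_x), r++
            r += 1
        elif tok == f"e_{src}":             # *r = e_r / w(x, eol)
            x.append("eol")
            return x
        else:                               # no skip loop: anything else breaks the schema
            raise ValueError(f"token {tok!r} violates the assumed schema")


print(path_xsm_only_a(["s_R", "<a>", "<b>", "1", "</b>", "</a>", "e_R", "eol"]))
```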
39
  • Introduction
  • XML Stream Machine Framework
  • Translation to XSM Networks
  • XSM Composition
  • Optimizations
  • Experimental Results

40
Experimental Results
  • The output of the XSM compiler is a C program
    that uses a SAX parser to read the incoming XML
    stream.
  • We measured the performance of our XSM-based
    query processing engine and compared it to
    several publicly available XSLT engines by
    running queries over the DBLP XML database, a
    popular online XML bibliography used by many
    researchers.

41
Experimental Results
42
A Transducer-Based XML Query Processor
  • Questions?