Title: Stefanie Scherzinger, Universität Passau
1 DBPL 2003: Attribute Grammars for Scalable Query Processing on XML Streams
- Christoph Koch, University of Edinburgh
- Stefanie Scherzinger, Universität Passau
2 Querying XML
- XPath: //book[year=2003]/title
  - Node selecting/boolean queries, no data transformations
  - Buffers necessary
- XML Query:
    <books>
    { for $x in input()//book
      where $x/year=2003
      return
        <book>
          { $x/title }
          <authors>
            { $x/author }
          </authors>
        </book>
    }
    </books>
  - Buffers necessary
3 Requirements for Scalable Query Processing on XML Streams
- Evaluation in linear time in the size of the input
- One linear forward scan of the data
- Bounded memory consumption
  - independent of the length of the stream
  - depending on the depth of the document
4 XML-DPDT
- DPDA with output
- Rejects malformed documents
- Restricted stack discipline
  - push on seeing opening tag <t>
  - pop on seeing closing tag </t>
- ⇒ Size of stack bounded by maximum depth of the incoming document tree
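The stack discipline above can be sketched in a few lines of Python. This is only an illustration, not the XML-DPDT of the talk: the function name `scan` and the simplified tag syntax (no attributes, no comments) are assumptions made for the example.

```python
import re

# Simplified tag syntax (an assumption): <name> or </name>, no attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def scan(stream):
    """One forward pass: push on <t>, pop on </t>, reject malformed input.

    Returns the maximum stack height seen, which is the document depth.
    """
    stack, max_depth = [], 0
    for match in TAG.finditer(stream):
        closing, name = match.groups()
        if not closing:
            stack.append(name)
            max_depth = max(max_depth, len(stack))
        elif not stack or stack.pop() != name:
            raise ValueError(f"malformed document at </{name}>")
    if stack:
        raise ValueError(f"unclosed element <{stack[-1]}>")
    return max_depth

doc = "<bib><book><year>2003</year></book><book><title>T</title></book></bib>"
print(scan(doc))  # 3
```

The stack never holds more entries than the current nesting depth, no matter how many books the stream contains.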
5 Ease of Use?
- DPDA T = (Q, Σ, Γ, δ, q0, Z0)
- δ(q0, <bib>, Z0) = (bib0, (q0,bib))
- δ(bib0, <book>, X) = (book0, (bib1,book))
- δ(book0, <year>, X) = (year0, (book1,year))
- δ(year0, </year>, (book1,year)) = (book1, ε)
- δ(book1, <title>, X) = (title0, (book2,title))
- δ(title0, </title>, (book2,title)) = (book2, ε)
- δ(book2, <author>, X) = (author0, (book3,author))
- δ(book3, <author>, X) = (author0, (book4,author))
- δ(book4, <author>, X) = (author0, (book4,author))
- δ(book4, </book>, (bib1,book)) = (bib1, ε)
- δ(book3, </book>, (bib1,book)) = (bib1, ε)
- δ(author0, </author>, (book3,author)) = (book3, ε)
- δ(author0, </author>, (book4,author)) = (book4, ε)
- ...
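To see how such a transition table is executed, the listed transitions can be loaded into two dictionaries and run over a tag stream. This is a sketch under assumptions: the names `PUSH`, `POP` and `run` are made up, only the transitions shown on the slide are encoded, and the stack-top wildcard X is modeled by not inspecting the stack on a push.

```python
# Push transitions: (state, opening tag) -> (next state, pushed stack symbol).
PUSH = {
    ("q0", "<bib>"): ("bib0", ("q0", "bib")),
    ("bib0", "<book>"): ("book0", ("bib1", "book")),
    ("book0", "<year>"): ("year0", ("book1", "year")),
    ("book1", "<title>"): ("title0", ("book2", "title")),
    ("book2", "<author>"): ("author0", ("book3", "author")),
    ("book3", "<author>"): ("author0", ("book4", "author")),
    ("book4", "<author>"): ("author0", ("book4", "author")),
}
# Pop transitions: (state, closing tag, stack top) -> next state.
POP = {
    ("year0", "</year>", ("book1", "year")): "book1",
    ("title0", "</title>", ("book2", "title")): "book2",
    ("author0", "</author>", ("book3", "author")): "book3",
    ("author0", "</author>", ("book4", "author")): "book4",
    ("book3", "</book>", ("bib1", "book")): "bib1",
    ("book4", "</book>", ("bib1", "book")): "bib1",
}

def run(tags, state="q0"):
    """Run the partial automaton; return the final state and stack."""
    stack = ["Z0"]
    for tag in tags:
        if tag.startswith("</"):
            key = (state, tag, stack[-1])
            if key not in POP:
                raise ValueError(f"no transition for {key}")
            state = POP[key]
            stack.pop()
        else:
            if (state, tag) not in PUSH:
                raise ValueError(f"no transition for {(state, tag)}")
            state, symbol = PUSH[(state, tag)]
            stack.append(symbol)
    return state, stack

tags = ("<bib> <book> <year> </year> <title> </title> "
        "<author> </author> <author> </author> </book>").split()
print(run(tags))  # ('bib1', ['Z0', ('q0', 'bib')])
```

Even this small fragment needs thirteen hand-written transitions, which is the usability problem the slide title points at.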
6 Our Aim
- Query formalism which
  - meets the requirements for scalable stream processing, i.e. has the expressive power of XML-DPDTs.
  - is natural and easy to use.
  - does not allow specification of queries that cannot be evaluated scalably.
- Our solution: XSAGs
7 XML Stream Attribute Grammars (XSAGs)
- Query language for XML streams
- Data transformations
- Scalable evaluation
8 Extended Regular Tree Grammars
Grammar G = (Nt, T, P, bib)
Nonterminals Nt = {bib, book, year, title, author}
Terminals T = {bib, book, year, title, author, PCDATA}
Productions P:
    bib ::= bib( book* )
    book ::= book( year.title.author.author* )
    year ::= year( PCDATA )
    title ::= title( PCDATA )
    author ::= author( PCDATA )
[Figure: example document] ∈ L(G)
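Membership in L(G) can be checked top-down by matching each node's child labels against the content model of its production. The sketch below assumes the content models bib(book*) and book(year.title.author.author*); the tree encoding (tuples of tag and child list, strings for PCDATA) and the names `CONTENT`, `label` and `in_language` are illustrative choices, not part of the talk.

```python
import re

# Content models as Python regexes over child labels. Each label is followed
# by one space so that adjacent names cannot run together.
CONTENT = {
    "bib": r"(book )*",
    "book": r"year title author (author )*",
    "year": r"PCDATA ",
    "title": r"PCDATA ",
    "author": r"PCDATA ",
}

def label(node):
    """Text nodes are plain strings; element nodes are (tag, children)."""
    return "PCDATA" if isinstance(node, str) else node[0]

def _valid(node):
    if isinstance(node, str):
        return True
    tag, children = node
    word = "".join(label(child) + " " for child in children)
    return (tag in CONTENT
            and re.fullmatch(CONTENT[tag], word) is not None
            and all(_valid(child) for child in children))

def in_language(node, start="bib"):
    """True if the tree is derivable from the start symbol."""
    return label(node) == start and _valid(node)

doc = ("bib", [("book", [("year", ["2003"]), ("title", ["T"]),
                         ("author", ["A"]), ("author", ["B"])])])
print(in_language(doc))  # True
```

Nonterminals and tags coincide in this grammar, which is why a single dictionary keyed by tag is enough here.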
9 Basic XSAGs (bXSAGs)
- Basic XSAG: based on a TDLL(1) grammar
- Attribution functions
  - n ::= t(ρ)
  - n ::= {fI} t(ρ)
  - n ::= t(ρ) {fII}
  - n ::= {fI} t(ρ) {fII}
- Regular expression τ(ρ) is one-unambiguous
  - book1 ::= book( ... )
  - book2 ::= book( ... )
  - τ( (book1 | book2)* ) = (book | book)*: not one-unambiguous
  - τ( book1 . book2 ) = book.book: one-unambiguous
- can be parsed with a lookahead of one symbol
- all DTDs are TDLL(1)!
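One-unambiguity can be tested with the usual position-based construction: number every symbol occurrence, compute first and follow sets, and check that no set contains two different positions carrying the same symbol. The sketch below follows that idea; the tuple encoding of expressions and the names `analyse` and `one_unambiguous` are assumptions made for the example.

```python
from itertools import count

def analyse(e, ids, follow, symbol):
    """Return (nullable, first, last) of e; fill follow and symbol per position."""
    op = e[0]
    if op == "sym":
        p = next(ids)
        symbol[p] = e[1]
        follow[p] = set()
        return False, {p}, {p}
    if op == "star":
        _, first, last = analyse(e[1], ids, follow, symbol)
        for p in last:
            follow[p] |= first          # loop back to the start of the body
        return True, first, last
    n1, f1, l1 = analyse(e[1], ids, follow, symbol)
    n2, f2, l2 = analyse(e[2], ids, follow, symbol)
    if op == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                        # op == "cat"
        follow[p] |= f2
    return n1 and n2, f1 | (f2 if n1 else set()), l2 | (l1 if n2 else set())

def one_unambiguous(e):
    follow, symbol = {}, {}
    _, first, _ = analyse(e, count(), follow, symbol)

    def distinct(positions):
        labels = [symbol[p] for p in positions]
        return len(labels) == len(set(labels))

    return distinct(first) and all(distinct(ps) for ps in follow.values())

book = ("sym", "book")
print(one_unambiguous(("star", ("alt", book, book))))  # False
print(one_unambiguous(("cat", book, book)))            # True
```

With one symbol of lookahead, a parser for a one-unambiguous expression always knows which occurrence in the expression the next tag matches.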
10 Example: Rename Root Node
    bib ::= { print <books> } bib( book* ) { print </books> }
    book ::= { ECHO } book( year.title.author.author* )
    year ::= year( PCDATA )
    title ::= title( PCDATA )
    author ::= author( PCDATA )
[Figure: parse tree annotated with print <books>, print </books>, and ECHO]
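The effect of this XSAG can be imitated with a SAX handler, which also sees the document as a stream of opening and closing tags. This is a hand-written sketch, not output of an XSAG compiler: the class name `RenameRoot` and the helper `rename_root` are made up, XML attributes are ignored because the grammar has none, and the input is not validated against the grammar.

```python
import io
import xml.sax
from xml.sax.saxutils import escape

class RenameRoot(xml.sax.ContentHandler):
    """Print <books> ... </books> for the root and echo everything below it."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.depth = 0

    def startElement(self, name, attrs):
        # Root element: print <books>. Deeper elements: ECHO.
        self.out.write("<books>" if self.depth == 0 else f"<{name}>")
        self.depth += 1

    def endElement(self, name):
        self.depth -= 1
        self.out.write("</books>" if self.depth == 0 else f"</{name}>")

    def characters(self, content):
        self.out.write(escape(content))

def rename_root(xml_text):
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), RenameRoot(out))
    return out.getvalue()

print(rename_root("<bib><book><year>2003</year><title>T</title>"
                  "<author>A</author></book></bib>"))
```

The only state kept is a depth counter, so memory use does not grow with the number of books.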
11 Propagation of Attributions
[Figure sequence, slides 11-17: an XSAG attribute is passed along while the stack grows and shrinks]
- Slide 11: XSAG attribute; stack empty
- Slide 12: out1 = fI(in1); stack: bib
- Slide 13: out1 = fI(in1); stack: book, bib
- Slide 15: out2 = fII(in1, in2); stack: book, bib
- Slide 16: out2 = fII(in1, in2); stack: bib
- Slide 17: result → Output; stack empty
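One way to read the figure sequence is the following evaluator sketch. It rests on assumptions that the slides do not spell out: the attribute value seen at an opening tag is saved on the stack as in1, fI produces the value handed to the children, and at the closing tag fII combines the saved in1 with the value in2 coming back from the children. The names `evaluate`, `rules` and `events` are illustrative.

```python
def evaluate(events, rules, attr):
    """events: ("open" | "close", tag) pairs.
    rules: tag -> (fI, fII); either function may be None.
    attr: initial attribute value. Returns the final attribute value.
    """
    stack = []
    for kind, tag in events:
        f_open, f_close = rules.get(tag, (None, None))
        if kind == "open":
            stack.append(attr)              # remember in1 for the closing tag
            if f_open:
                attr = f_open(attr)         # out1 = fI(in1)
        else:
            in1 = stack.pop()
            if f_close:
                attr = f_close(in1, attr)   # out2 = fII(in1, in2)
    return attr

# Toy attribution: count the books below bib.
rules = {
    "book": (lambda n: n + 1, None),
    "bib": (None, lambda in1, in2: in2 - in1),
}
events = [("open", "bib"), ("open", "book"), ("close", "book"),
          ("open", "book"), ("close", "book"), ("close", "bib")]
print(evaluate(events, rules, 0))  # 2
```

When a rule has no fII, the value from the children simply flows on to the next sibling, which is what lets a flag set at one sibling be read at the next.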
18 Grouping Sibling Nodes
- no <author> anymore? print </authors>
- first <author> seen? print <authors>
19 bXSAG: Grouping Sibling Nodes
    bib ::= { print <books> } bib( book* ) { print </books> }
    book ::= { ECHO; out1.flag := off } book( year.title.author.author* ) { print </authors> }
    year ::= year( PCDATA )
    title ::= title( PCDATA )
    author ::= { if ( in1.flag = off ) then begin out1.flag := on; print <authors> end } author( PCDATA )
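The flag-based grouping can be mimicked in a SAX handler. This is a sketch under assumptions: the flag is kept as a single instance variable rather than as an attribute on the stack (enough here because books do not nest), every book is assumed to have at least one author as the grammar demands, and the names `GroupAuthors` and `group_authors` are made up.

```python
import io
import xml.sax
from xml.sax.saxutils import escape

class GroupAuthors(xml.sax.ContentHandler):
    def __init__(self, out):
        super().__init__()
        self.out = out
        self.flag = False                    # True once <authors> is open

    def startElement(self, name, attrs):
        if name == "bib":
            self.out.write("<books>")
            return
        if name == "book":
            self.flag = False                # flag := off at the start of a book
        if name == "author" and not self.flag:
            self.flag = True                 # first author: flag := on
            self.out.write("<authors>")
        self.out.write(f"<{name}>")          # ECHO

    def endElement(self, name):
        if name == "bib":
            self.out.write("</books>")
            return
        if name == "book":
            self.out.write("</authors>")     # no <author> can follow anymore
        self.out.write(f"</{name}>")

    def characters(self, content):
        self.out.write(escape(content))

def group_authors(xml_text):
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), GroupAuthors(out))
    return out.getvalue()

print(group_authors("<bib><book><year>2003</year><title>T</title>"
                    "<author>A</author><author>B</author></book></bib>"))
```

Nothing is buffered: the opening <authors> is written when the first author arrives, and the closing tag is written when the enclosing book ends.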
20 yXSAG: Grouping Sibling Nodes
- attribution functions within the regular expression!
    bib ::= { print <books> } bib( book* ) { print </books> }
    book ::= { ECHO } book( year.title.( { print <authors> } (author.author*) { print </authors> } ) )
    year ::= year( PCDATA )
    title ::= title( PCDATA )
    author ::= author( PCDATA )
21 Parse Tree for yXSAGs
[Figure: parse tree annotated with print <books>, print </books>, ECHO, print <authors>, and print </authors>]
22 easy XSAG Grammars
- Easy XSAG: based on an STDLL(1) grammar
- Attribution functions
  - n ::= t(ρ)
  - n ::= {fI} t(ρ)
  - n ::= t(ρ) {fII}
  - n ::= {fI} t(ρ) {fII}
- attributed regular expression ρ
- τ(ρ) is strongly one-unambiguous
  - editor* | author*: not strongly one-unambiguous
  - editor.editor* | author*: strongly one-unambiguous
  - can be parsed with a lookahead of one symbol
  - only one way to derive the empty word ε
23 Conditional Output (yXSAG): Boolean Function MATCH_CHILDREN
    bib ::= { print <books> } bib( book* ) { print </books> }
    book ::= book( ( { MATCH_CHILDREN(2003, c) } year ) .
                   ( { if ( in1.c = true ) then begin print <book>; ECHO end }
                     (title.author.author*) ) )
             { if ( in2.c = true ) then print </book> }
    year ::= year( PCDATA )
    title ::= title( PCDATA )
    author ::= author( PCDATA )
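The conditional output can be approximated by a SAX handler that decides at the end of the year element whether the rest of the book is echoed. This is a sketch with assumed names (`SelectByYear`, `select_by_year`): the text of year is collected and compared to the constant, which stands in for MATCH_CHILDREN, and the result plays the role of the attribute c.

```python
import io
import xml.sax
from xml.sax.saxutils import escape

class SelectByYear(xml.sax.ContentHandler):
    def __init__(self, out, year="2003"):
        super().__init__()
        self.out, self.year = out, year
        self.match = False        # plays the role of attribute c
        self.in_year = False
        self.text = []

    def startElement(self, name, attrs):
        if name == "bib":
            self.out.write("<books>")
        elif name == "book":
            self.match = False
        elif name == "year":
            self.in_year, self.text = True, []
        elif self.match:
            self.out.write(f"<{name}>")      # ECHO title and authors

    def characters(self, content):
        if self.in_year:
            self.text.append(content)
        elif self.match:
            self.out.write(escape(content))

    def endElement(self, name):
        if name == "bib":
            self.out.write("</books>")
        elif name == "year":
            self.in_year = False
            self.match = "".join(self.text).strip() == self.year
            if self.match:
                self.out.write("<book>")     # if c = true: print <book>
        elif name == "book":
            if self.match:
                self.out.write("</book>")    # if c = true: print </book>
            self.match = False
        elif self.match:
            self.out.write(f"</{name}>")

def select_by_year(xml_text, year="2003"):
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), SelectByYear(out, year))
    return out.getvalue()

print(select_by_year(
    "<bib><book><year>2003</year><title>T1</title><author>A</author></book>"
    "<book><year>2001</year><title>T2</title><author>B</author></book></bib>"))
```

This works without buffering only because year comes first in each book, so the decision is known before any title or author has to be written.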
24 Possible Queries Depend on the Underlying Grammar
- cannot select books on year
    bib ::= bib( book* )
    book ::= book( title.author.author*.year )
    year ::= year( PCDATA )
    title ::= title( PCDATA )
    author ::= author( PCDATA )
25 bXSAG vs. yXSAG
- Underlying grammar: TDLL(1) (which includes all DTDs) for bXSAG; STDLL(1) for yXSAG
- User-friendly queries
- Expressiveness: XML-DPDTs, basic XSAGs, and easy XSAGs share the same expressive power
- Efficiency
  - space: stack of size O( depth(Stream) )
  - time: O( f(XSAG) · |Stream| ), where f(XSAG) is O(2^attributes), or
    O( |XSAG|^2 + |Stream| · |XSAG| ) for bXSAG,
    O( |XSAG|^3 + |Stream| · |XSAG| ) for yXSAG
26 Conclusion and Future Work
- XSAGs meet the requirements for scalable XML query processing
- XSAGs have a well-justified foundation
- XSAGs are user-friendly
  - underlying grammar guides the user
  - macros for typical tasks
  - common queries can be quickly and easily stated
- Current status: prototype implementation
- Future Work
  - Java code in attribution functions
  - Process XML Query with XSAGs.
27 The END.
28 Related Work
- Queries on XML Streams
  - L. Fegaras, D. Levine, S. Bose, and V. Chaluvadi. Query Processing of Streamed XML Data. CIKM, 2002.
  - T. J. Green, G. Miklau, M. Onizuka, and D. Suciu. Processing XML Streams with Deterministic Automata. ICDT, 2003.
  - B. Ludäscher, P. Mukhopadhyay, and Y. Papakonstantinou. A Transducer-Based XML Query Processor. VLDB, 2002.
  - D. Olteanu, T. Kiesling, and F. Bry. An Evaluation of Regular Path Expressions with Qualifiers against XML Streams. ICDE, 2003. Poster Session.
- One-unambiguous Regular Languages
  - A. Brüggemann-Klein and D. Wood. One-Unambiguous Regular Languages. Information and Computation, 1998.
- XML and Attribute Grammars
  - M. Benedikt, C.Y. Chan, W. Fan, J. Freire, and R. Rastogi. Capturing both Types and Constraints in Data Integration. SIGMOD, 2003.
  - M. Benedikt, C.Y. Chan, W. Fan, R. Rastogi, S. Zheng, and A. Zhou. DTD-Directed Publishing with Attribute Translation Grammars. VLDB, 2002.
  - F. Neven and J. Van den Bussche. Expressiveness of Structured Document Query Languages Based on Attribute Grammars. JACM, Jan. 2002.
- TDLL(1) Grammars
  - D. Lee, M. Mani, and M. Murata. Reasoning about XML Schema Languages using Formal Language Theory. Technical Report RJ 10197 Log 95071, IBM Research, Nov. 2000.