1. Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams
FluXQuery: An Optimizing XQuery Processor for Streaming XML Data
Stefanie Scherzinger (joint work with Christoph Koch, Nicole Schweikardt, and Bernhard Stegmaier)
2. XML Streams
- Very large XML documents.
- Schema information provided with the data.
- Main-memory based applications.
3. Queries on XML Streams
- 1. Boolean or node-selecting queries (XPath) → state-of-the-art techniques use little memory
- 2. Transformations (XQuery, XSLT) → excessive memory consumption
4. Classical XQuery Evaluation
Bibliography DTD:
  <!ELEMENT bib (book)*>
  <!ELEMENT book (title|author|price)*>
List title(s) and authors of books:
  <results>
  { for $b in /bib/book return
    <result> { $b/title } { $b/author } </result> }
  </results>
→ buffer titles and authors!
Example
Buffer:
  <author>Kemper</author>
  <title>Datenbanksysteme</title>
  <author>Eickler</author>
Data stream:
  <book>
    <author>Kemper</author>
    <title>Datenbanksysteme</title>
    <author>Eickler</author>
    <price>40</price>
  </book>
Output:
  <result>
    <title>Datenbanksysteme</title>
    <author>Kemper</author>
    <author>Eickler</author>
  </result>
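
The buffering behaviour on this slide can be imitated with a small Python sketch. It is an illustration only, not how any particular XQuery engine is implemented: the `Event` tuples, the function name `classical_eval`, and the idea of counting buffered elements are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# (element name, text) for each child of one <book>, in stream order.
# This flat event model is a simplification assumed for the sketch.
Event = Tuple[str, str]


def classical_eval(book_children: Iterable[Event]) -> Tuple[List[str], int]:
    """Evaluate <result>{$b/title}{$b/author}</result> for one book.

    Without schema knowledge, titles and authors are both collected
    first and written out only after the whole book has been read.
    Returns the output lines and the number of buffered elements.
    """
    titles: List[str] = []
    authors: List[str] = []
    for name, text in book_children:
        if name == "title":
            titles.append(f"<title>{text}</title>")      # buffered
        elif name == "author":
            authors.append(f"<author>{text}</author>")   # buffered
        # other children (e.g. price) are not used by the query
    buffered = len(titles) + len(authors)
    return ["<result>", *titles, *authors, "</result>"], buffered


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40")]
output, buffered = classical_eval(book)
```

For the example book, three elements end up in the buffer, matching the buffer contents listed on the slide.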
5. The FluXQuery Approach (1)
Bibliography DTD:
  <!ELEMENT bib (book)*>
  <!ELEMENT book (title|author|price)*>
FluX query (for book node):
  <result>
  { process-stream $b:
      on title as $t return { $t };
      on-first past(title,author) return
        { for $a in $b/author return { $a } } }
  </result>
List title(s) and authors of books:
  <results>
  { for $b in /bib/book return
    <result> { $b/title } { $b/author } </result> }
  </results>
- Less buffering than in conventional evaluation
Example
Buffer:
  <author>Kemper</author>
  <author>Eickler</author>
Data stream:
  <book>
    <author>Kemper</author>
    <title>Datenbanksysteme</title>
    <author>Eickler</author>
    <price>40</price>
  </book>
Output:
  <result>
    <title>Datenbanksysteme</title>
    <author>Kemper</author>
    <author>Eickler</author>
  </result>
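
As a rough Python analogue of the FluX query on this slide: titles are written out as soon as they arrive, and only authors are held back until no more titles can come. The function name `flux_eval_unordered` and the flat `(name, text)` event model are assumptions for this sketch. It also assumes that, under this DTD, the on-first handler can only run at the end of the book.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (element name, text), an assumed event model


def flux_eval_unordered(book_children: Iterable[Event]) -> Tuple[List[str], int]:
    """Mimic the two handlers of the FluX query for one book.

    Returns the output lines and the number of buffered elements.
    """
    output: List[str] = ["<result>"]
    author_buffer: List[str] = []
    for name, text in book_children:
        if name == "title":
            # "on title as $t return {$t}": emitted immediately
            output.append(f"<title>{text}</title>")
        elif name == "author":
            # authors must wait: a title may still follow
            author_buffer.append(f"<author>{text}</author>")
    # "on-first past(title,author)": assumed to fire at the end of the
    # book, because title and author may occur anywhere among the children
    buffered = len(author_buffer)
    output.extend(author_buffer)
    output.append("</result>")
    return output, buffered


book = [("author", "Kemper"), ("title", "Datenbanksysteme"),
        ("author", "Eickler"), ("price", "40")]
output, buffered = flux_eval_unordered(book)
```

Only the two authors are buffered here; the title never is.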
6. The FluXQuery Approach (2)
Bibliography DTD:
  <!ELEMENT bib (book)*>
  <!ELEMENT book ((title|author)*,price)>
FluX query (for book node):
  <result>
  { process-stream $b:
      on title as $t return { $t };
      on-first past(title,author) return
        { for $a in $b/author return { $a } } }
  </result>
List title(s) and authors of books:
  <results>
  { for $b in /bib/book return
    <result> { $b/title } { $b/author } </result> }
  </results>
Example
Buffer:
  <author>Kemper</author>
  <author>Eickler</author>
Data stream:
  <book>
    <author>Kemper</author>
    <title>Datenbanksysteme</title>
    <author>Eickler</author>
    <price>40</price>
  </book>
Output:
  <result>
    <title>Datenbanksysteme</title>
    <author>Kemper</author>
    <author>Eickler</author>
  </result>
7. The FluXQuery Approach (3)
Bibliography DTD:
  <!ELEMENT bib (book)*>
  <!ELEMENT book (title,author*,price)>
FluX query:
  <result>
  { process-stream $b:
      on title as $t return { $t };
      on author as $a return { $a } }
  </result>
List title(s) and authors of books:
  <results>
  { for $b in /bib/book return
    <result> { $b/title } { $b/author } </result> }
  </results>
→ No buffering!
Example
Buffer: (empty)
Data stream:
  <book>
    <title>Datenbanksysteme</title>
    <author>Kemper</author>
    <author>Eickler</author>
    <price>40</price>
  </book>
Output:
  <result>
    <title>Datenbanksysteme</title>
    <author>Kemper</author>
    <author>Eickler</author>
  </result>
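
When the DTD guarantees that the title comes before the authors, both handlers can write straight through. The following Python generator is a minimal sketch of that idea. The name `flux_eval_ordered` and the `(name, text)` event tuples are assumptions for illustration, and the input is assumed to be valid with respect to the DTD above.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (element name, text), an assumed event model


def flux_eval_ordered(book_children: Iterable[Event]) -> Iterator[str]:
    """Yield output lines one by one; nothing is stored between events."""
    yield "<result>"
    for name, text in book_children:
        if name in ("title", "author"):
            # "on title ..." and "on author ...": pass the element through
            yield f"<{name}>{text}</{name}>"
        # price is simply skipped
    yield "</result>"


book = [("title", "Datenbanksysteme"), ("author", "Kemper"),
        ("author", "Eickler"), ("price", "40")]
output = list(flux_eval_ordered(book))
```

The generator keeps no list of elements at all; `list(...)` is used here only to inspect the result.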
8. What's next?
- The XQuery⁻ Fragment
- FluX Query Language
- Translating XQuery into FluX
- Experiments
9. XQuery⁻, an XQuery Fragment
- Contains...
  - arbitrarily nested for-loops,
  - where-conditions,
  - if-statements,
  - joins
- Does not contain...
  - * and // in paths
  - aggregation
  - let-constructs
10. Simple XQuery⁻ Expressions
- XQuery⁻ expression is simple: can be executed without buffering the stream
Example 1
  <a> { $x } </a> { if $x/b = 5 then <b>5</b> }
  simple
Example 2
  { $x } { $x }
  not simple
11. FluX Query Language
- FluX expressions
  - simple XQuery⁻ expression
  - string { process-stream $y: H } string
- Event handlers H
  - on-first past(S) return α
    - α: XQuery⁻ expression
    - S: set of symbols
  - on a as $x return Q
    - a: symbol name
    - $x: variable
    - Q: FluX expression
α is executed on buffers
Q is executed in event-based fashion
12. Example
FluX query (for book node):
  <result>
  { process-stream $b:
      on title as $t return { $t };
      on-first past(title,author) return
        { for $a in $b/author return { $a } } }
  </result>
13. Safe FluX Queries
- FluX query is safe: no XQuery⁻ expression refers to elements that may still be encountered in the stream
Bibliography DTD:
  <!ELEMENT bib (book)*>
  <!ELEMENT book ((title|author)*, price)>
FluX query:
  <result>
  { process-stream $b:
      on title as $t return { $t };
      on-first past(title,author) return
        { for $p in $b/price return { $p } } }
  </result>
Data stream:
  <book>
    <author>Kemper</author>
    <title>Datenbanksysteme</title>
    <author>Eickler</author>
    <price>40</price>
  </book>
execute: the on-first handler runs once title and author are past, while price may still be encountered
Not safe!
14. Safe FluX Queries
- FluX query is safe: no XQuery⁻ expression refers to elements that may still be encountered in the stream
Bibliography DTD:
  <!ELEMENT bib (book)*>
  <!ELEMENT book ((title|author)*, price)>
FluX query:
  <result>
  { process-stream $b:
      on title as $t return { $t };
      on-first past(title,author,price) return
        { for $p in $b/price return { $p } } }
  </result>
Data stream:
  <book>
    <author>Kemper</author>
    <title>Datenbanksysteme</title>
    <author>Eickler</author>
    <price>40</price>
  </book>
execute: the on-first handler runs only once title, author, and price are all past
Safe!
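
The difference between the two handlers on slides 13 and 14 can be expressed as a simple set check. This is a deliberately simplified sketch: it compares only the symbols named in `past(...)` with the child elements the handler body reads, and it ignores anything the DTD might additionally imply. The function name `handler_is_safe` and the set representation are assumptions for illustration.

```python
from typing import Set


def handler_is_safe(past_symbols: Set[str], referenced: Set[str]) -> bool:
    """Simplified safety test for one on-first handler.

    past_symbols: the symbols S in "on-first past(S)"
    referenced:   child element names that the handler body reads
    The handler is treated as safe if everything it reads is
    guaranteed to be past when it runs.
    """
    return referenced <= past_symbols


# Slide 13: past(title,author), body reads $b/price
unsafe_case = handler_is_safe({"title", "author"}, {"price"})
# Slide 14: past(title,author,price), body reads $b/price
safe_case = handler_is_safe({"title", "author", "price"}, {"price"})
```

Under this simplification the first handler is rejected and the second accepted, in line with the "Not safe!" and "Safe!" verdicts on the slides.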
15. XQuery to FluX
- Normalize XQuery⁻ query Q into Q'
- Rewrite normalized XQuery⁻ query Q' into FluX query F using order constraints from the DTD
- F is safe w.r.t. the DTD
- F is equivalent to Q
- F has low memory consumption
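
To make "order constraints from the DTD" more concrete, here is a small Python sketch that derives them from a very restricted content model. It assumes the model is given as a sequence of groups, where symbols inside one group may interleave freely and every symbol of an earlier group comes before every symbol of a later one. This list-of-sets encoding and the name `order_constraints` are assumptions for the example, not the representation used by FluXQuery.

```python
from itertools import combinations
from typing import List, Set, Tuple


def order_constraints(groups: List[Set[str]]) -> Set[Tuple[str, str]]:
    """Return all pairs (a, b) such that every a precedes every b."""
    pairs: Set[Tuple[str, str]] = set()
    # combinations() keeps list order, so `earlier` always precedes `later`
    for earlier, later in combinations(groups, 2):
        for a in earlier:
            for b in later:
                pairs.add((a, b))
    return pairs


# (title|author|price)*   -> one group, no order constraints
weak = order_constraints([{"title", "author", "price"}])
# ((title|author)*,price) -> title and author both precede price
medium = order_constraints([{"title", "author"}, {"price"}])
# (title,author*,price)   -> title before author before price
strong = order_constraints([{"title"}, {"author"}, {"price"}])
```

The more pairs the DTD yields, the earlier output can be produced: with `strong`, the pair `("title", "author")` is what allows authors to be streamed without buffering.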
16. Experiments
- Based on XMark
  - Queries adapted to XQuery⁻ fragment
- Environment
  - AMD Athlon XP 2000, 512 MB RAM
  - Linux, Sun JDK 1.4.2_03
- Measurements
  - Execution time
  - Memory consumption
17. Experiments with XMark
18. Intermediary Summary
- FluXQuery engine supports
  - powerful fragment of XQuery
    - arbitrarily nested for-loops
    - and joins
  - event-based query processing
  - conscious handling of main-memory buffers
  - algebraic optimization based on schema information
19. for $b in /book return { $b/publisher/name } { $b/publisher/address }
DTD:
  <!ELEMENT bib (book)*>
  <!ELEMENT book (title,author,publisher)>
  <!ELEMENT publisher (name, address)>
→ no buffering necessary!
20. for $b in /book return { $b/publisher/name } { $b/publisher/address }
normalize:
  for $b in /book return
    { for $p in $b/publisher return
        { for $n in $p/name return { $n } } }
    { for $p in $b/publisher return
        { for $a in $p/address return { $a } } }
Loops twice over all publishers, ergo buffer!
21. Algebraic Optimization
- In translation, exploit order constraints
- Can also exploit cardinality constraints
  - e.g. merging for-loops
22. for $b in /book return { $b/publisher/name } { $b/publisher/address }
1. normalize
  for $b in /book return
    { for $p in $b/publisher return
        { for $n in $p/name return { $n } } }
    { for $q in $b/publisher return
        { for $a in $q/address return { $a } } }
2. algebraic optimization
  for $b in /book return
    { for $p in $b/publisher return
        { for $n in $p/name return { $n } }
        { for $a in $p/address return { $a } } }
→ no need to buffer
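
Why the merge depends on a cardinality constraint can be seen in a plain Python analogue of the two queries. The dictionary layout of a book and the function names `normalized` and `merged` are assumptions for this sketch. The DTD on slide 19 allows exactly one publisher per book, and that is the case in which both versions agree.

```python
from typing import Dict, List

Publisher = Dict[str, List[str]]
Book = Dict[str, List[Publisher]]


def normalized(book: Book) -> List[str]:
    """Two separate loops: publishers must be available twice."""
    out: List[str] = []
    for p in book["publisher"]:
        out.extend(p["name"])
    for q in book["publisher"]:
        out.extend(q["address"])
    return out


def merged(book: Book) -> List[str]:
    """One loop: each publisher is visited once, in stream order."""
    out: List[str] = []
    for p in book["publisher"]:
        out.extend(p["name"])
        out.extend(p["address"])
    return out


one_publisher: Book = {"publisher": [{"name": ["N1"], "address": ["A1"]}]}
two_publishers: Book = {"publisher": [{"name": ["N1"], "address": ["A1"]},
                                      {"name": ["N2"], "address": ["A2"]}]}

same_with_one = normalized(one_publisher) == merged(one_publisher)
same_with_two = normalized(two_publishers) == merged(two_publishers)
```

With a single publisher the results coincide, so the single-pass version can replace the two-pass one. With two publishers the output order differs, which is why the rewrite is only valid under the cardinality constraint.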
23. Future Work
- Increase XQuery fragment
  - Add aggregation
  - Add *- and //-paths → allow recursive DTDs
- Extend algebraic optimizations
- Optimize backend of query engine
24. Related Work
- Altinel, Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. VLDB 2000
- Buneman, Grohe, Koch. Path Queries on Compressed XML. VLDB 2003
- Chan, Felber, Garofalakis, Rastogi. Efficient Filtering of XML Documents with XPath Expressions. ICDE 2002
- Deutsch, Tannen. Reformulation of XML Queries and Constraints. ICDT 2003
- Fegaras, Levine, Bose, Chaluvadi. Query Processing on Streamed XML Data. CIKM 2002
- Green, Miklau, Onizuka, Suciu. Processing XML Streams with Deterministic Automata. ICDT 2003
- Gupta, Suciu. Stream Processing of XPath Queries with Predicates. SIGMOD 2003
- Ludäscher, Mukhopadhyay, Papakonstantinou. A Transducer-Based XML Query Processor. VLDB 2002
- Marian, Siméon. Projecting XML Documents. VLDB 2003
- Olteanu, Kiesling, Bry. An Evaluation of Regular Path Expressions with Qualifiers against XML Streams. ICDE 2003
25. </thanks>