Efficient Evaluation of XQuery over Streaming Data

1
Efficient Evaluation of XQuery over Streaming Data
Xiaogang Li, Gagan Agrawal
The Ohio State University
2
Motivation
  • Why Stream
  • - Data needs to be analyzed in real time:
    stock market, climate, network traffic
  • - Rapid improvements in networking technologies:
    101.13 Gbps at the SC2004 Bandwidth Challenge;
    retrieval from local disk may be much slower
    than from a remote site
  • - Lack of disk space

3
Motivation
  • Why XML
  • - Standard data exchange format for the
    Internet
  • - Widely adopted in web-based, distributed
    and grid computing
  • - Virtual XML is becoming popular
  • Why XQuery
  • - Widely accepted language for querying XML
  • - Declarative: easy to use
  • - Powerful: types, user-defined functions,
    binary expressions, ...

4
Current Work: XQuery over Streaming Data
  • XPath over Streaming Data
  • XPath is relatively simple
  • XQuery over Streaming Data
  • Limited features handled
  • Focus on queries that are written for single pass
    evaluation

5
Contributions
  • Can the given query be evaluated correctly on
    streaming data?
  • - Only a single pass is allowed
  • - Decision made by compiler, not a user
  • If not, can it be correctly transformed?
  • How to generate efficient code for XQuery?
  • - Computations involved in streaming
    applications are non-trivial
  • - Recursive functions are frequently used
  • - Efficient memory usage is important

6
Our Approach
  • For an arbitrary query, can it be evaluated
    correctly on streaming data?
  • - Construct data-flow graph for a query
  • - Static analysis based on data-flow graph
  • If not, can it be transformed to do so?
  • - Query transformation techniques based on
    static analysis
  • How to generate efficient code for XQuery?
  • - Techniques based on static analysis to
    minimize memory usage and optimize code
  • - Generating imperative code
  • -- Recursive analysis and aggregation
    rewrite

7
Query Evaluation Model
  • Single input stream
  • Internal computations
  • - Limited memory
  • - Linked operators
  • Pipeline operators and blocking operators

[Figure: linked operators Op1, Op2, Op3, Op4]
8
Pipeline and Blocking Operators
  • Pipeline Operator
  • - Each input tuple produces an output tuple
    independently
  • - Selection, increment, etc.
  • Blocking Operator
  • - Can only compute output after receiving all
    input tuples
  • - Sort, join, etc.
  • Progressive Blocking Operator
  • (1) output << input: we can buffer the
    output
  • (2) Associative and commutative operation:
    we can discard the input
  • - count(), sum()
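
The three operator classes can be illustrated with a minimal Python sketch. The names `select_positive`, `RunningSum` and `blocking_sort` are hypothetical and are not taken from the presentation; the sketch only shows why a pipeline operator needs no state, why a progressive blocking operator needs one accumulator, and why a blocking operator needs the whole input.

```python
from typing import Iterable, Iterator


def select_positive(xs: Iterable[float]) -> Iterator[float]:
    """Pipeline operator: each input is handled independently, nothing is stored."""
    for x in xs:
        if x > 0:
            yield x


class RunningSum:
    """Progressive blocking operator: the final value needs all inputs, but
    addition is associative and commutative, so each input is folded into
    one accumulator and can then be discarded."""

    def __init__(self) -> None:
        self.total = 0.0

    def consume(self, x: float) -> None:
        self.total += x


def blocking_sort(xs: Iterable[float]) -> list:
    """Blocking operator: sorting has to buffer the entire input first."""
    return sorted(xs)


# A one-shot iterator stands in for the stream: it can be read only once.
stream = iter([3.0, -1.0, 2.0, -4.0, 5.0])
acc = RunningSum()
for value in select_positive(stream):
    acc.consume(value)
result = acc.total
```

On an unbounded stream `blocking_sort` would never return, while the selection and the running sum keep working with constant memory.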

9
Single Pass?
  • Pixels with x and y
  • Q1
  • let $i := /pixel
  • sortby (x)
  • Q2
  • let $i := for $p in /pixel
  • where $p/x > ..
  • x count(/pixel)
  • (1) A blocking operator exists
  • (2) A progressive blocking operator is referred to by
    a pipeline operator or another progressive blocking
    operator

Check condition 2 in a query
10
Single-Pass? Challenges
  • Must analyze data dependence at the expression level
  • A query may be complex: need a simplified view of
    the query to make the decision
11
Overall Framework
Data Flow Graph Construction
Single-Pass Analysis
Stream Code Generation
12
Roadmap
  • Stream Data Flow Graph
  • High-Level Transformations
  • - Horizontal Fusion
  • - Vertical Fusion
  • Single Pass Analysis
  • Low Level Optimization
  • Experiment and Conclusion

13
Stream Data Flow Graph (DFG)
  • Node: represents a variable
  • Explicit and implicit
  • Sequence and atomic
  • Edge: dependence relation
  • v1 -> v2 if v2 uses v1
  • Aggregate dependence and flow dependence
  • A DFG is acyclic
  • Cardinality inference is required to construct
    the DFG

S1: stream/pixel[x>0]   S2: stream/pixel   V1: count()
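
A stream data flow graph of this kind could be held in a structure like the one below. This is a sketch under assumptions: the class `StreamDFG` and its methods are hypothetical, and the edge from S1 to V1 assumes that V1 is the count over S1, which the slide does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class StreamDFG:
    # node name -> "sequence" or "atomic" (the outcome of cardinality inference)
    kinds: dict = field(default_factory=dict)
    # (src, dst) -> "flow" or "aggregate"; the edge src -> dst means dst uses src
    edges: dict = field(default_factory=dict)

    def add_node(self, name: str, kind: str) -> None:
        assert kind in ("sequence", "atomic")
        self.kinds[name] = kind

    def add_edge(self, src: str, dst: str, dep: str) -> None:
        assert dep in ("flow", "aggregate")
        self.edges[(src, dst)] = dep

    def users_of(self, name: str) -> list:
        """Nodes that directly depend on `name`."""
        return sorted(dst for (src, dst) in self.edges if src == name)


dfg = StreamDFG()
dfg.add_node("S1", "sequence")   # stream/pixel[x>0]
dfg.add_node("S2", "sequence")   # stream/pixel
dfg.add_node("V1", "atomic")     # count()
dfg.add_edge("S1", "V1", "aggregate")  # assumed: V1 = count(S1)
```

The node kind matters later: an aggregate edge collapses a sequence into an atomic value, while a flow edge passes values through unchanged.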
14
High-level Transformation
  • Goals
  • 1. Enable single-pass evaluation
  • 2. Simplify the SDFG and single-pass
    analysis
  • Horizontal Fusion and Vertical Fusion
  • - Based on SDFG

15
Horizontal Fusion
  • Enable single-pass evaluation
  • - Merge sequence nodes with a common prefix

Before fusion: S1: stream/pixel[x>0]   S2: stream/pixel/y   V1: count()   V2: sum()
After fusion: S0: /stream/pixel   S1: [x>0]   S2: /y   V1: count()   V2: sum()
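
The run-time effect of horizontal fusion can be sketched as follows. The function `fused_count_and_sum` is hypothetical, and the sketch assumes V1 counts the pixels with x > 0 and V2 sums the y values. It illustrates the result of the transformation, not the compiler pass itself: both aggregates are fed from one traversal of the shared prefix.

```python
def fused_count_and_sum(pixels):
    """One traversal of the shared prefix /stream/pixel feeds both aggregates."""
    count_x_pos = 0   # V1: count() over S1 = [x>0]
    sum_y = 0         # V2: sum() over S2 = /y
    for p in pixels:  # S0: the common prefix, read exactly once
        if p["x"] > 0:
            count_x_pos += 1
        sum_y += p["y"]
    return count_x_pos, sum_y


# A one-shot iterator stands in for the stream; a second traversal would see nothing.
stream = iter([
    {"x": 1, "y": 10},
    {"x": -2, "y": 20},
    {"x": 3, "y": 30},
])
v1, v2 = fused_count_and_sum(stream)
```

Without fusion, the two paths would each need their own scan of `stream/pixel`, which a single-pass stream cannot provide.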
16
Horizontal Fusion with nested loops
  • Perform loop unrolling first
  • Merge sequence nodes accordingly

17
Horizontal Fusion Side-effect
  • May produce an incorrect result due to
    inter-dependence

Before fusion:
let $b := count(stream/pixel[x>0])
for $i in stream/pixel return $i/x idiv $b
After fusion:
for $i in stream/pixel return $i/x idiv count()
A partial result of count is used to compute the
output. This will be dealt with in single-pass
analysis
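
A small Python sketch makes the problem concrete. The data and the function names `two_pass` and `naive_fused` are made up for illustration; Python's `//` stands in for `idiv`, which agrees with it on the non-negative values used here.

```python
pixels = [{"x": 4}, {"x": 0}, {"x": 6}, {"x": 9}]


def two_pass(pixels):
    """Original query: the count is complete before any output is produced."""
    b = sum(1 for p in pixels if p["x"] > 0)
    return [p["x"] // b for p in pixels]


def naive_fused(pixels):
    """Fused loop: each output sees only the count accumulated so far."""
    out = []
    b = 0
    for p in pixels:
        if p["x"] > 0:
            b += 1
        out.append(p["x"] // b)  # b is a partial result here
    return out


correct = two_pass(pixels)
wrong = naive_fused(pixels)
```

The two lists differ in every position where the count was still incomplete, which is why the fused form has to be rejected by the later analysis rather than executed.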
18
Vertical Fusion
  • Simplify DFG and single-pass analysis
  • - Merge a cluster of nodes linked by flow
    dependence edges

19
Roadmap
  • Stream Data Flow Graph
  • High-Level Transformations
  • - Horizontal Fusion
  • - Vertical Fusion
  • Single Pass Analysis
  • Low Level Optimization
  • Experiment and Conclusion

20
Single-pass Analysis
  • Can a query be evaluated on the fly?
  • THEOREM 1. If a query with dependence graph
    G(V,E) contains more than one sequence node
    after vertical fusion, it cannot be evaluated
    correctly in a single pass.
  • Reason: a sequence node with infinite length
    cannot be buffered

21
Single-pass Analysis - Continued
  • THEOREM 2. Let S be the set of atomic nodes that
    are aggregate dependent on any sequence node in a
    stream data flow graph. For any two given
    elements s1 and s2 of S, if there is a path between
    s1 and s2, the query may not be evaluated correctly
    in a single pass.
  • Reason: a progressive blocking operator is
    referred to by another progressive blocking operator
  • Example: count(pixel)
  • where /x > 0.005*sum(/pixel/x)

22
Single-pass Analysis - Continued
  • THEOREM 3. If there is a cycle in a stream data
    flow graph G, the corresponding query may not be
    evaluated correctly using a single pass.
  • Reason: a progressive blocking operator is
    referred to by a pipeline operator

23
Single-pass Analysis
  • Check the conditions corresponding to Theorems 1, 2
    and 3
  • - Stop further processing if any condition is
    true
  • Completeness of the analysis
  • - If a query without a blocking operator passes
    the test, it can be evaluated in a single pass
  • THEOREM 4. If the results of a progressive
    blocking operator with an unbounded input are
    referred to by a pipeline operator or a
    progressive blocking operator with unbounded
    input, then for the stream data flow graph, at
    least one of the three conditions holds true

24
Conservative analysis
  • Our analysis is conservative
  • - A valid query may be labeled as one that
    cannot be evaluated in a single pass
  • Example

25
A review of the procedure
Cannot be evaluated in a single pass!
26
Roadmap
  • Stream Data Flow Graph
  • High-Level Transformations
  • - Horizontal Fusion
  • - Vertical Fusion
  • Single Pass Analysis
  • Low Level Optimization
  • Experiment and Conclusion

27
Low-level Transformations
  • Use GNL as intermediate representation
  • - GNL is similar to nested loops in Java
  • - Enable efficient code generation for
    reductions
  • - Enable transformation of recursive function
    into iterative operation
  • From SDFG to GNL
  • - Generate a GNL for each sequence node
    associated with XPath expression
  • - Move aggregation into GNL using aggregation
    rewrite and recursion analysis

28
GNL Example
Facilitate code generation for any desired
platform
29
Low-Level Transformations
  • Recursive Analysis
  • - extract commutative and associative
    operations from recursive functions
  • Aggregation Rewrite
  • - perform function inlining
  • - transform built-in and user-defined
    aggregate into iterative operations
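
The idea behind turning a recursive aggregate into an iterative one can be sketched in Python. Both functions are hypothetical examples, not output of the authors' compiler: the first mimics how a user-defined recursive sum is typically written, the second is the form such a function can take once the associative and commutative `+` has been extracted.

```python
def recursive_total(seq):
    """Recursive style: needs the whole sequence in memory and one stack frame per item."""
    if not seq:
        return 0
    return seq[0] + recursive_total(seq[1:])


def iterative_total(stream):
    """Rewritten form: '+' is associative and commutative, so the recursion
    becomes a running accumulator that consumes the stream once."""
    total = 0
    for item in stream:
        total += item
    return total


data = [1, 2, 3, 4]
from_recursion = recursive_total(data)
from_iteration = iterative_total(iter(data))
```

The iterative form accepts a one-shot iterator, so it works on a stream where the recursive form, which slices the sequence, does not.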

30
Code Generation
  • Using SAX XML stream parser
  • - XML document is parsed as stream of events
  • <x> 5 </x>: startElement <x>, content 5,
    endElement <x>
  • - Event-driven: need to generate code to
    handle each event
  • Using Java JDK
  • - Our compiler generates Java source code

31
Code Generation Example
startElement: insert each referred element into the
buffer
endElement: process each element in the buffer,
dispatch the buffer
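
The event-driven pattern can be sketched with Python's standard `xml.sax` module in place of the Java SAX parser the presentation uses. The handler `PixelHandler`, the query it evaluates (a sum over the x elements) and the sample document are all assumptions made for illustration, and "dispatch the buffer" is read here as releasing it once the element has been processed.

```python
import xml.sax


class PixelHandler(xml.sax.ContentHandler):
    """Hypothetical handler that evaluates sum(/stream/pixel/x) in one pass."""

    def __init__(self):
        super().__init__()
        self.buffer = None  # text of the referred element currently being read
        self.total = 0

    def startElement(self, name, attrs):
        if name == "x":  # a referred element: start buffering its content
            self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "x" and self.buffer is not None:
            self.total += int("".join(self.buffer))  # process the buffered element
            self.buffer = None                       # release the buffer


document = (b"<stream>"
            b"<pixel><x>5</x><y>1</y></pixel>"
            b"<pixel><x>7</x><y>2</y></pixel>"
            b"</stream>")
handler = PixelHandler()
xml.sax.parseString(document, handler)
total = handler.total
```

Only the element named in the query is ever buffered, so memory use stays independent of the document length.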
32
Roadmap
  • Stream Data Flow Graph
  • High-Level Transformations
  • - Horizontal Fusion
  • - Vertical Fusion
  • Single Pass Analysis
  • Low Level Optimization
  • Experimental Results
  • Conclusions

33
Experiment
  • Query Benchmark
  • - Selected Benchmarks from XMARK
  • - Satellite, Virtual Microscope, Frequent
    Item
  • Systems compared with
  • - Galax
  • - Saxon
  • - Qizx/Open

34
Performance XMARK Benchmark
  • >25 faster on small dataset
  • Scales well on very large datasets
35
Performance Real Applications
  • >One order of magnitude faster on small dataset
  • Works well for very large datasets
  • 10-20 performance gain with control-flow optimization
36
Conclusions
  • Provide a formal approach for query evaluation on
    streaming XML
  • - Query transformation to enable correct
    execution on stream
  • - Formal methods for single-pass analysis
  • - Strategies for efficient low-level code
    generation
  • - Experimental results show an advantage over
    other well-known systems