Adaptive Query Processing (Background)

Learn more at: https://davis.wpi.edu


1
Adaptive Query Processing (Background)
  • Advisor: Elke A. Rundensteiner
  • Luping Ding, Brad Pielech

2
Contents
  • Motivation
  • Issues to consider when building an adaptive query system
  • Categories of adaptivity and related issues
  • Related work
  • Our initial ideas thus far (to be continued)

3
Motivation
  • New environments and applications
    - Internet and web-based query systems
  • Sample applications
    - Network monitoring systems
    - Financial applications, e.g., stock trading
  • Characteristics
    - Distributed, heterogeneous, autonomous data sources
    - Unpredictable, variable data volume and transfer rate

4
(No Transcript)
5
Motivation II
  • Requirements
    - Ability to process streaming data using non-blocking operators
    - Dynamic inter- and intra-operator scheduling to adapt to the data transfer rate
    - Sharing and reuse of sub-plans across multiple queries
    - Ability to output partial/approximate results according to user preferences (discussed later)

6
Traditional vs. Adaptive
  • Ready data vs. streaming data
  • One-time queries vs. possibly continuous queries
  • Blocking vs. non-blocking operators
  • Query optimization before execution vs. before and during execution
  • Exact answers vs. partial/approximate answers

7
Challenges and Possible Solutions
  • Data arrive at very high speed
    - Sample the data and compute approximate answers
  • Data transfer rates change unpredictably because sources dry up or the network is congested
    - Interleave query execution and optimization, reworking the query plan to minimize execution downtime
  • Blocking operators appear in query plans because of GroupBy, OrderBy, and Join clauses
    - Implement non-blocking alternatives for blocking operators
  • Unbounded or huge data streams need unbounded or huge intermediate storage
    - Compute approximate answers
    - Switch between memory and disk
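Sampling is one way to get the approximate answers this slide calls for without storing the whole stream. The slide does not name a sampling method; the sketch below assumes reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size k in O(k) memory. The function names are illustrative, not from the original system.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(stream: Iterable[float], k: int,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Uniform random sample of at most k items from a stream of unknown length
    (Algorithm R). Memory stays O(k) no matter how long the stream runs."""
    rng = rng or random.Random()
    sample: List[float] = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)            # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # uniform over 0..i (both ends inclusive)
            if j < k:
                sample[j] = x           # keep item i with probability k / (i + 1)
    return sample


def approximate_mean(stream: Iterable[float], k: int, seed: int = 0) -> float:
    """Estimate the mean of the whole stream from the bounded sample."""
    sample = reservoir_sample(stream, k, random.Random(seed))
    if not sample:
        raise ValueError("empty stream")
    return sum(sample) / len(sample)
```

For example, `approximate_mean(range(100_000), k=1_000)` returns an estimate near the true mean of 49,999.5 while holding only 1,000 values in memory.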

8
Contents
  • Motivation
  • Issues to consider when building an adaptive system
  • Categories of adaptivity and related issues
  • Related work

9
General Issues I
  • Decide the granularity of stream data
    - Each token
    - Each individual element, as determined by the XPath expression specified in the query

10
  for $b in document("bib.xml")/bib/book
  return <result> { $b/title } { $b/author } </result>

  <bib>
    <book year="1994">
      <title>TCP/IP Illustrated</title>
      <author>W. Stevens</author>
      <price>65.95</price>
    </book>
    <book year="2000">
      <title>Data on the Web</title>
      <author>Serge Abiteboul</author>
      <author>Peter Buneman</author>
      <author>Dan Suciu</author>
      <price>39.95</price>
    </book>
  </bib>
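To make element granularity concrete, the sketch below streams the sample document and emits each `book` element (the unit selected by the query's `/bib/book` path) as soon as its end tag is parsed, rather than waiting for the closing `</bib>`. It uses Python's `xml.etree.ElementTree.iterparse` as a stand-in for a token-level XML stream parser; the hard-coded path check and the name `stream_books` are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List, Tuple

BIB_XML = b"""<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author>W. Stevens</author>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
    <price>39.95</price>
  </book>
</bib>"""


def stream_books(source: IO[bytes]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (title, authors) for each /bib/book as soon as </book> is parsed."""
    path: List[str] = []                     # tags of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            continue
        if path == ["bib", "book"]:          # a whole book element is now complete
            title = elem.findtext("title")
            authors = [a.text for a in elem.findall("author")]
            yield title, authors
            elem.clear()                     # release the finished element's children
        path.pop()


for title, authors in stream_books(io.BytesIO(BIB_XML)):
    print(title, authors)
```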

11
General Issues II
  • Produce order-sensitive results
    - Assign a unique ID (sequence number or timestamp) to each data unit
    - Option 1: each algebra node preserves the order of the data
    - Option 2: algebra nodes do not preserve order, and the top node sorts
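The second option (only the top node restores order) can be done without waiting for the end of the stream. A minimal sketch, assuming dense sequence numbers 0, 1, 2 and so on assigned at the source: a min-heap holds out-of-order tuples and releases the longest in-order prefix as soon as the next expected number arrives. Timestamps with gaps would need an extra signal (for example a watermark) that this sketch does not model; `ReorderBuffer` is a hypothetical name.

```python
import heapq
from typing import Any, List, Tuple


class ReorderBuffer:
    """Top-of-plan reordering: release items in sequence-number order
    as soon as every earlier sequence number has been seen."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq                 # next sequence number to release
        self._heap: List[Tuple[int, Any]] = []    # out-of-order items, keyed by seq

    def push(self, seq: int, item: Any) -> List[Any]:
        """Accept one item; return whatever can now be emitted in order."""
        heapq.heappush(self._heap, (seq, item))
        released: List[Any] = []
        while self._heap and self._heap[0][0] == self.next_seq:
            _, ready = heapq.heappop(self._heap)
            released.append(ready)
            self.next_seq += 1
        return released
```

Items arriving as 1, 0, 3, 2 are released as `[]`, `["a", "b"]`, `[]`, `["c", "d"]`: the buffer only holds tuples whose predecessors are still missing.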

12
General Issues III
  • Generate approximate results
    - Answers to aggregate queries may change as new tuples arrive, so the results so far are approximate
  • Generate partial results
    - New tuples do not invalidate results that have already been output
  • Both require non-blocking operator implementations that can report the answer so far
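The difference between approximate and partial results can be shown with two tiny non-blocking operators. This is a sketch with made-up ticker data: a running GROUP BY average whose answer so far may still change, and a selection whose outputs are final as soon as they are emitted. Both function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def running_group_avg(stream: Iterable[Tuple[str, float]]) -> Iterator[Dict[str, float]]:
    """Non-blocking GROUP BY key / AVG(value): after every tuple, yield the
    answer so far. Later tuples can still change it, so it is approximate."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for key, value in stream:
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
        yield {k: sums[k] / counts[k] for k in sums}


def partial_select(stream: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Non-blocking selection: every emitted tuple is final (a partial result),
    because later tuples cannot invalidate it."""
    for t in stream:
        if predicate(t):
            yield t
```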

13
General Issues IV
  • Compute statistics
    - Data arrival rate
    - Operator selectivity
    - Operator execution cost
  • Introduce control messages for synchronization
    - Within an algebra node
    - Along with the data stream
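A minimal sketch of per-operator statistics collection, assuming a pull-based (iterator) pipeline: a selection operator counts inputs and outputs (selectivity), times its predicate (cost per tuple), and records when tuples reach it (arrival rate). `OperatorStats` and `monitored_filter` are hypothetical names; the injectable `clock` exists only so the numbers can be checked deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


@dataclass
class OperatorStats:
    tuples_in: int = 0
    tuples_out: int = 0
    busy_seconds: float = 0.0
    first_arrival: Optional[float] = None
    last_arrival: Optional[float] = None

    @property
    def selectivity(self) -> float:
        """Fraction of input tuples the operator emitted."""
        return self.tuples_out / self.tuples_in if self.tuples_in else 1.0

    @property
    def cost_per_tuple(self) -> float:
        """Average time spent evaluating the operator per input tuple."""
        return self.busy_seconds / self.tuples_in if self.tuples_in else 0.0

    @property
    def arrival_rate(self) -> float:
        """Tuples per second between the first and the last observed arrival."""
        if self.tuples_in < 2 or self.first_arrival is None or self.last_arrival is None:
            return 0.0
        span = self.last_arrival - self.first_arrival
        return (self.tuples_in - 1) / span if span > 0 else float("inf")


def monitored_filter(stream: Iterable[T], predicate: Callable[[T], bool],
                     stats: OperatorStats,
                     clock: Callable[[], float] = time.perf_counter) -> Iterator[T]:
    """A selection operator that updates its statistics as tuples flow through."""
    for t in stream:
        now = clock()                    # time the tuple reached this operator
        if stats.first_arrival is None:
            stats.first_arrival = now
        stats.last_arrival = now
        stats.tuples_in += 1
        start = clock()
        keep = predicate(t)
        stats.busy_seconds += clock() - start
        if keep:
            stats.tuples_out += 1
            yield t
```

An optimizer could read these counters periodically (or when a control message arrives) to decide whether the current plan is still a good one.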

14
General Issues V
  • Design mechanisms for query plan re-optimization
    - When to re-optimize
      - Event-condition-action rules (Tukwila)
      - Signal in the stream (Niagara)
    - How to re-optimize
      - Reorder joins based on statistics
      - Possibly find alternative sources for data that slow sources are providing
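The sketch below combines the two halves of the slide under simplifying assumptions. The "when" is an event-condition-action rule, loosely modeled on the Tukwila description in the related-work slides (the event name `end_of_fragment` and the 0.3 threshold are hypothetical). The "how" reorders a pipeline of independent, commutative filters by the classic rank metric (selectivity - 1) / cost; real join reordering must also account for join inputs and operator state and is more involved.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class FilterStats:
    name: str
    cost: float          # observed cost per input tuple (must be > 0)
    selectivity: float   # observed fraction of input tuples that pass


def rank(f: FilterStats) -> float:
    """Rank of an independent, commutative filter; ascending rank is the cheapest order."""
    return (f.selectivity - 1.0) / f.cost


def reorder(plan: List[FilterStats]) -> List[FilterStats]:
    return sorted(plan, key=rank)


def expected_cost(plan: List[FilterStats]) -> float:
    """Expected per-tuple cost of running the filters in the given order."""
    total, surviving = 0.0, 1.0
    for f in plan:
        total += surviving * f.cost
        surviving *= f.selectivity
    return total


@dataclass
class Rule:
    """Event-condition-action rule: on `event`, if `condition(ctx)` holds, run `action(ctx)`."""
    event: str
    condition: Callable[[Dict[str, Any]], bool]
    action: Callable[[Dict[str, Any]], None]


def fire(rules: List[Rule], event: str, ctx: Dict[str, Any]) -> None:
    for r in rules:
        if r.event == event and r.condition(ctx):
            r.action(ctx)


# Hypothetical rule: when a fragment finishes, re-optimize if any observed
# selectivity differs from the optimizer's estimate by more than 0.3.
def estimates_are_off(ctx: Dict[str, Any]) -> bool:
    return any(abs(f.selectivity - ctx["estimated"][f.name]) > 0.3 for f in ctx["plan"])


def reoptimize(ctx: Dict[str, Any]) -> None:
    ctx["plan"] = reorder(ctx["plan"])


RULES = [Rule("end_of_fragment", estimates_are_off, reoptimize)]
```

With two filters of equal cost and observed selectivities 0.9 and 0.1, moving the more selective one first lowers the expected per-tuple cost from 1.9 to 1.1.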

15
Contents
  • Motivation
  • Issues to consider when building an adaptive system
  • Categories of adaptivity and related issues
  • Related work
  • Our initial ideas thus far (to be continued)

16
Categories of Adaptivity
  • An adaptive system can adapt at many different levels, including:
    - Batch: adapt query plans after every X units of time
    - Per query: adapt after every query
    - Inter-operator: adapt after one or more operators
    - Intra-operator: adapt within an operator
    - Per tuple: adapt after one or more tuples

17
Per-Query Adaptivity Illustration
Adapt after every query has been executed
  • Share the execution of common subexpressions between similar queries
  • Reuse optimized sub-plans
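A minimal sketch of sub-plan reuse across queries, assuming sub-plans are identified by a canonical text signature and their results can be materialized. Sorting the inputs of commutative operators lets JOIN(A, B) and JOIN(B, A) share one entry; predicate text is compared literally, which is a deliberate simplification. All names, including the tables mentioned below, are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[object, ...]
COMMUTATIVE = {"JOIN", "UNION"}


def signature(op: str, inputs: Iterable[str], predicate: str = "") -> str:
    """Canonical text for a sub-plan; inputs of commutative operators are sorted
    so that JOIN(A, B) and JOIN(B, A) map to the same key."""
    names = sorted(inputs) if op in COMMUTATIVE else list(inputs)
    return f"{op}[{predicate}]({','.join(names)})"


class SharedSubplanCache:
    """Keeps results of sub-plans so that later queries can reuse them."""

    def __init__(self) -> None:
        self._results: Dict[str, List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def evaluate(self, key: str, compute: Callable[[], List[Row]]) -> List[Row]:
        """Return the stored result for `key`, computing it only on first use."""
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = compute()
        return self._results[key]
```

For instance, `signature("JOIN", ["orders", "customers"], p)` and `signature("JOIN", ["customers", "orders"], p)` produce the same key, so a second query's `evaluate` call returns the stored result without recomputing the join.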

18
Inter-Operator Adaptivity Illustration
Adapt after one or more operators have been executed
  • Modify query execution plans on the fly when delays are encountered at runtime
  • Schedule operators for CPU and memory allocation
  • Select alternative sources
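Operator scheduling lets the system keep the CPU busy with operators that have input while others wait on a delayed source. The slide does not specify a policy; this sketch assumes a simple longest-queue-first rule for illustration, and `Operator` and `schedule` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class Operator:
    def __init__(self, name: str, fn: Callable[[int], Optional[int]],
                 downstream: Optional[str] = None) -> None:
        self.name = name
        self.fn = fn                  # returns an output value, or None to drop the input
        self.downstream = downstream  # name of the consumer operator, if any
        self.queue: Deque[int] = deque()


def schedule(ops: Dict[str, Operator], budget: int) -> List[str]:
    """Run up to `budget` steps. Each step gives the CPU to the operator with the
    longest input queue; stalled operators (empty queues) are simply skipped."""
    trace: List[str] = []
    for _ in range(budget):
        ready = [op for op in ops.values() if op.queue]
        if not ready:
            break                     # every input is stalled: wait for more data
        op = max(ready, key=lambda o: len(o.queue))   # ties go to the first in dict order
        out = op.fn(op.queue.popleft())
        trace.append(op.name)
        if out is not None and op.downstream is not None:
            ops[op.downstream].queue.append(out)
    return trace
```

Other policies (for example, favoring operators that free the most memory) fit the same loop by changing the `key` passed to `max`.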

19
Intra-Operator Adaptivity Illustration
Adapt during the execution of one operator
  • Switch an operator's execution to another, semantically equivalent implementation
  • Input stream scheduling
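The slide does not say which implementations are swapped. As one illustration, a symmetric hash join is a semantically equivalent, non-blocking replacement for a classic hash join (which must read its whole build input before producing output). The driver `run` shows input stream scheduling: it alternates between inputs but reads from whichever one has buffered rows. Pre-filled deques stand in for network inputs; the class and function names are assumptions.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Hashable, List, Tuple

Row = Tuple[Any, ...]


class SymmetricHashJoin:
    """Non-blocking equi-join: each arriving row is added to its own side's hash
    table and probed against the other side's, so matches are emitted immediately."""

    def __init__(self, left_key: int, right_key: int) -> None:
        self.key_pos = {"L": left_key, "R": right_key}
        self.tables: Dict[str, Dict[Hashable, List[Row]]] = {
            "L": defaultdict(list), "R": defaultdict(list)}

    def insert(self, side: str, row: Row) -> List[Tuple[Row, Row]]:
        other = "R" if side == "L" else "L"
        k = row[self.key_pos[side]]
        self.tables[side][k].append(row)
        matches = self.tables[other].get(k, [])
        # Always emit (left_row, right_row), whichever side the new row came from.
        return [(row, m) if side == "L" else (m, row) for m in matches]


def run(join: SymmetricHashJoin, left: Deque[Row], right: Deque[Row]) -> List[Tuple[Row, Row]]:
    """Input stream scheduling: alternate between inputs, but if the preferred
    input has nothing buffered (e.g. its source stalled), read the other instead."""
    queues = {"L": left, "R": right}
    out: List[Tuple[Row, Row]] = []
    turn = "L"
    while left or right:
        side = turn if queues[turn] else ("R" if turn == "L" else "L")
        out.extend(join.insert(side, queues[side].popleft()))
        turn = "R" if side == "L" else "L"
    return out
```

The join produces the same set of pairs as a blocking join over the same inputs; only the order and timing of output differ.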

20
Per-Tuple Adaptivity Illustration
Adapt some operators' execution on a tuple-by-tuple basis
  • Each tuple can be routed to a different join in the query plan so that every join is kept busy
  • Timestamps track which tuples have been processed by which joins
[Figure: Tuple Router]
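A minimal eddy-style sketch of a tuple router, under stated assumptions: the operators are selection predicates rather than joins (join state would add considerably more code), each tuple carries a set of operators it has already visited instead of the timestamps mentioned on the slide, and the routing policy (lowest observed pass rate first) is an illustrative choice. The example predicates reuse the price and year fields from the bib.xml sample and are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Predicate = Callable[[Dict[str, Any]], bool]


def route_tuples(rows: List[Dict[str, Any]],
                 operators: Dict[str, Predicate]) -> Tuple[List[Dict[str, Any]], Dict[str, int]]:
    """Tuple router: each tuple picks its own operator order.

    A per-tuple `done` set records which operators the tuple has already
    visited; a tuple is output once it has passed every operator."""
    seen = {name: 0 for name in operators}     # tuples routed to each operator
    passed = {name: 0 for name in operators}   # tuples that survived it
    out: List[Dict[str, Any]] = []
    for row in rows:
        done: Set[str] = set()
        alive = True
        while alive and len(done) < len(operators):
            pending = [n for n in operators if n not in done]
            # Route to the operator with the lowest smoothed pass rate so far,
            # so that tuples which will be dropped are dropped as early as possible.
            name = min(pending, key=lambda n: (passed[n] + 1) / (seen[n] + 2))
            done.add(name)
            seen[name] += 1
            if operators[name](row):
                passed[name] += 1
            else:
                alive = False
        if alive:
            out.append(row)
    return out, seen


ops = {"cheap": lambda r: r["price"] < 50, "recent": lambda r: r["year"] >= 2000}
books = [{"price": 65.95, "year": 1994}, {"price": 39.95, "year": 2000}]
print(route_tuples(books, ops))
```

Whatever order each tuple takes, the output is the same as applying all predicates; only the work spent per tuple changes as the router learns.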
21
Contents
  • Motivation
  • Issues to consider when building an adaptive system
  • Categories of adaptivity and related issues
  • Related work

22
Related Work
  • Tukwila project at the University of Washington
    - Pure XML adaptive query processing (AQP) through the integration of query planning and execution
    - Optimizes for time to first tuple first, then for the whole result
    - Dynamic scheduling of operators to adjust to I/O delays and flow rates
    - Breaks the query into execution groups (fragments) and can re-optimize the plan after each group has been executed
    - Uses event-condition-action rules to determine whether re-optimization should take place

23
Related Work II
  • Havasu project at Arizona State University
    - User-preference-driven query optimization
  • Niagara project at the University of Wisconsin
    - Users do not have to specify the sources for a query
    - Lets the user ask to "give me the results so far," even in the presence of aggregation operators
  • MIX system at UC San Diego
    - Information integration system using XML as the intermediate data model
    - Lazy navigation into the result, controlled by the user
    - Does not adapt the query plan during execution

24
Related Work III
  • Aurora project at Brown/MIT/Brandeis
  • Telegraph project at UC Berkeley
  • STREAM project at Stanford University

25
To be continued