Title: Adaptive Query Processing (Background)
1Adaptive Query Processing (Background)
- Advisor: Elke A. Rundensteiner
- Luping Ding, Brad Pielech
2Contents
- Motivation
- Issues to consider when building adaptive query system
- Category of adaptivity and related issues
- Related work
- Our initial ideas thus far (to be continued)
3Motivation
- New environment and applications
- Internet and web-based query system
- Sample applications
- Network monitoring system
- Financial applications: stock trading, ...
- Characteristics
- Distributed, heterogeneous, autonomous data sources
- Unpredictable, variable data volume and transfer rate
5Motivation II
- Requirements
- Ability to process streaming data using non-blocking operators
- Dynamic inter- and intra-operator scheduling to adapt to data transfer rate
- Sharing and re-use of sub-plans across multiple queries
- The ability to output partial/approximate results according to user preferences (discussed later)
6Traditional vs. Adaptive
Traditional
- Ready data
- One-time query
- Blocking operators
- Query optimization before execution
- Exact answer
Adaptive
- Streaming data
- Possibly continuous queries
- Non-blocking operators
- Query optimization before and during execution
- Partial/approximate answer
7Challenges and Possible Solutions
- Challenge: Data arrive at very high speed
- Solution: Sample the data and compute an approximate answer
- Challenge: Unpredictable changes in data transfer rate due to sources drying up or network congestion
- Solution: Interleave query execution and optimization to rework the query plan and minimize execution downtime
- Challenge: Blocking operators appear in the query plan, caused by GroupBy, OrderBy, and Join clauses
- Solution: Implement non-blocking alternatives for blocking operators
- Challenge: Unbounded or huge data streams need unbounded or huge intermediate storage
- Solution: Compute an approximate answer
- Solution: Switch between memory and disk
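One well-known non-blocking alternative to a blocking join is the symmetric hash join, which keeps a hash table per input and emits matches as soon as either side delivers a tuple. The sketch below is only an illustration of that idea; the class name `SymmetricHashJoin` and its `insert` method are hypothetical and are not taken from any of the systems discussed in these slides.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Illustrative non-blocking equi-join (hypothetical names).

    Each arriving tuple is stored in its own side's hash table and
    immediately probed against the other side's table, so results
    are produced without waiting for either input to finish.
    """

    def __init__(self):
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, key, tup):
        """Add a tuple from `side` ("left" or "right"); return new matches."""
        other = "right" if side == "left" else "left"
        self.tables[side][key].append(tup)
        # .get avoids creating empty buckets in the other table
        matches = self.tables[other].get(key, [])
        if side == "left":
            return [(tup, m) for m in matches]
        return [(m, tup) for m in matches]
```

Note that both hash tables grow with the input, which is exactly the unbounded-storage challenge listed above; a real system would need to spill to disk or approximate.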
8Contents
- Motivation
- Issues to consider when building adaptive system
- Category of adaptivity and related issues
- Related work
9General Issues I
- Decide granularity of stream data
- Each token
- Individual Element
- Decided by XPath specified by query
10
for $b in document("bib.xml")/bib/book return
<result> { $b/title } { $b/author } </result>
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author>W. Stevens</author>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
    <price>39.95</price>
  </book>
</bib>
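To make the "individual element" granularity concrete, the following sketch streams the sample document one `book` element at a time, using Python's standard `xml.etree.ElementTree.iterparse`. It is an assumption-laden illustration: the helper names `stream_units` and `book_results` are hypothetical, and the unit is chosen by a plain tag name rather than by a full XPath expression.

```python
import io
import xml.etree.ElementTree as ET

BIB_XML = b"""<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author>W. Stevens</author>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
    <price>39.95</price>
  </book>
</bib>"""


def stream_units(source, unit_tag):
    """Yield each complete `unit_tag` element as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == unit_tag:
            yield elem
            elem.clear()  # release the subtree once the consumer is done with it


def book_results(source):
    """Per-book (title, authors) pairs, mirroring the query's result element."""
    for book in stream_units(source, "book"):
        title = book.findtext("title")
        authors = [a.text for a in book.findall("author")]
        yield title, authors


if __name__ == "__main__":
    for title, authors in book_results(io.BytesIO(BIB_XML)):
        print(title, authors)
```

Each result is available before the rest of the document has been read, which is the point of choosing the book element as the stream unit.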
11General Issues II
- Give order-sensitive result
- Assign unique ID for each data unit (sequence number or timestamp)
- Each algebra node keeps order of the data
- Each algebra node doesn't keep order, but the top node does the sorting
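The last option (inner nodes ignore order, the top node restores it) can be sketched with a small reorder buffer keyed on sequence numbers. This is a minimal illustration assuming consecutive integer sequence numbers starting at `first_seq`; the function name `restore_order` is hypothetical.

```python
import heapq


def restore_order(tagged_items, first_seq=0):
    """Yield payloads in sequence-number order.

    `tagged_items` is an iterable of (seq, payload) pairs that may
    arrive out of order. Items are buffered in a min-heap and released
    as soon as the next expected sequence number is at the top.
    """
    heap = []
    next_seq = first_seq
    for seq, payload in tagged_items:
        heapq.heappush(heap, (seq, payload))
        while heap and heap[0][0] == next_seq:
            yield heapq.heappop(heap)[1]
            next_seq += 1


if __name__ == "__main__":
    # Tag each data unit with a sequence number at the source
    tagged = list(enumerate(["a", "b", "c", "d"]))
    shuffled = [tagged[1], tagged[0], tagged[3], tagged[2]]
    print(list(restore_order(shuffled)))
```

If a sequence number never arrives (for example, because an operator filtered that unit out), later items stay buffered, so a real system would also need some signal that an ID has been dropped.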
12General Issues III
- Generate approximate results
- Answers to aggregate queries may change based on new tuples and thus the results are approximate
- Generate partial results
- New tuples will not change the validity of existing results
- Both require non-blocking operator implementations to provide the answer so far
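The difference between the two kinds of output can be shown with two small generators: a running average, whose value may be revised by every new tuple (approximate), and a selection, whose emitted tuples stay valid no matter what arrives later (partial). The function names are hypothetical and the sketch assumes numeric input values.

```python
def running_average(values):
    """Approximate result: emit the average of all tuples seen so far."""
    count = 0
    total = 0.0
    for v in values:
        count += 1
        total += v
        yield total / count  # may change when the next tuple arrives


def partial_select(values, predicate):
    """Partial result: each emitted tuple remains valid as more data arrives."""
    for v in values:
        if predicate(v):
            yield v


if __name__ == "__main__":
    print(list(running_average([10, 20, 30])))
    print(list(partial_select([1, 5, 2, 8], lambda v: v > 3)))
```

Both are non-blocking: they produce output per input tuple rather than after the input is exhausted.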
13General Issues IV
- Compute statistics
- Data arrival rate
- Selectivity of operator
- Execution cost of operator
- Introduce control message for synchronization
- Within algebra node
- Along with data stream
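As a rough illustration of how per-operator statistics might be gathered, the wrapper below counts input and output tuples and accumulates processing time, from which selectivity and execution cost per tuple follow. The class `MonitoredOperator` and its attributes are hypothetical; arrival rate could be tracked in the same way by timestamping inputs, which is omitted here for brevity.

```python
import time


class MonitoredOperator:
    """Wrap an operator function and collect simple runtime statistics.

    `fn` takes one input tuple and returns an iterable of output tuples.
    `clock` is injectable so the statistics can be tested deterministically.
    """

    def __init__(self, fn, clock=time.perf_counter):
        self.fn = fn
        self.clock = clock
        self.tuples_in = 0
        self.tuples_out = 0
        self.busy_seconds = 0.0

    def process(self, tup):
        start = self.clock()
        out = list(self.fn(tup))
        self.busy_seconds += self.clock() - start
        self.tuples_in += 1
        self.tuples_out += len(out)
        return out

    @property
    def selectivity(self):
        """Output tuples per input tuple, or None before any input."""
        return self.tuples_out / self.tuples_in if self.tuples_in else None

    @property
    def cost_per_tuple(self):
        """Average processing time per input tuple, or None before any input."""
        return self.busy_seconds / self.tuples_in if self.tuples_in else None
```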
14General Issues V
- Design mechanisms for query plan re-optimization
- When to re-optimize
- Event-condition-action rules (Tukwila)
- Signal in the stream (Niagara)
- How to re-optimize
- Reorder joins based on statistics
- Possibly find other sources for the data provided by slow sources
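A very simplified sketch of the "when" and "how" questions: trigger re-optimization when an observed selectivity deviates from its estimate by some factor, then greedily run the most selective joins first. This is a toy heuristic that assumes the joins can be freely reordered; it is not the optimizer used by Tukwila or Niagara, and the function names and the threshold value are illustrative assumptions.

```python
def should_reoptimize(estimated, observed, threshold=2.0):
    """True if estimate and observation differ by at least `threshold` times."""
    low, high = sorted((estimated, observed))
    return high / max(low, 1e-9) >= threshold  # guard against division by zero


def reorder_joins(observed_selectivity):
    """Return join names ordered from most to least selective.

    `observed_selectivity` maps a join name to its measured selectivity
    (lower means fewer output tuples per input tuple).
    """
    return sorted(observed_selectivity, key=observed_selectivity.get)


if __name__ == "__main__":
    stats = {"A_B": 0.5, "B_C": 0.1, "C_D": 0.3}
    if should_reoptimize(estimated=0.1, observed=stats["A_B"]):
        print(reorder_joins(stats))
```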
15Contents
- Motivation
- Issues to consider when building adaptive system
- Category of adaptivity and related issues
- Related work
- Our Initial Ideas Thus Far (to be continued)
16Categories of Adaptivity
- An adaptive system can be adaptive on many different levels, including
- Batch: adapt query plans after X units of time
- Per query: adapt after every query
- Inter-operator: adapt after several operators
- Intra-operator: adapt within an operator
- Per tuple: adapt after one or more tuples
17Per Query Adaptivity Illustration
Adapt after every query has been executed
- Sharing execution of common sub-expressions between similar queries
- Reuse of optimized sub-plans
18Inter-Operator Adaptivity Illustration
Adapt after one or more operators have been executed
- Modify query execution plans on-the-fly when delays are encountered during runtime
- Operator scheduling for CPU and memory allocation
- Alternative source selection
19Intra-Operator Adaptivity Illustration
Adapt during the execution of one operator
- Change execution of one operator to another semantically correct implementation
- Input stream scheduling
20Per Tuple Adaptivity Illustration
Adapt the execution of some operators on a tuple-by-tuple basis
- Each tuple can be routed to a different join in the query plan so that each join is busy at all times
- Uses timestamps to keep track of which tuples have run through which joins
[Figure: tuple router connected to the join operators]
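The routing idea can be sketched in a few lines under strong simplifying assumptions: the operators here are plain filter predicates rather than joins, and a per-tuple "done" set stands in for the timestamps mentioned above. The class `TupleRouter` and its routing policy (send each tuple next to the pending operator with the lowest observed pass rate) are hypothetical illustrations, not the design of any specific system.

```python
class TupleRouter:
    """Route each tuple through all operators in a per-tuple order.

    `operators` maps a name to a predicate(tuple) -> bool. The router
    tracks how often each operator passes tuples and tries the operator
    most likely to drop the tuple first.
    """

    def __init__(self, operators):
        self.operators = operators
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}

    def _pass_rate(self, name):
        # Operators with no history get rate 0.0, so they are tried early
        return self.passed[name] / self.seen[name] if self.seen[name] else 0.0

    def route(self, tup):
        """Return the tuple if it passes every operator, else None."""
        done = set()  # which operators this tuple has already visited
        while len(done) < len(self.operators):
            pending = [n for n in self.operators if n not in done]
            name = min(pending, key=self._pass_rate)
            self.seen[name] += 1
            if not self.operators[name](tup):
                return None
            self.passed[name] += 1
            done.add(name)
        return tup
```

Because the order is chosen per tuple from running statistics, the effective plan can drift as the data characteristics change, without an explicit re-optimization step.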
21Contents
- Motivation
- Issues to consider when building adaptive system
- Category of adaptivity and related issues
- Related work
22Related Work
- Tukwila project at U. of Washington
- Pure XML AQP through the integration of query planning and execution
- Optimizes for time-to-first tuple first, then for the whole result later
- Dynamic scheduling of operators to adjust to I/O delays and flow rates
- Breaks query into execution groups or fragments and can re-optimize plan after each group has been executed
- Uses event-condition-action rules to determine if re-optimization should take place
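The general shape of an event-condition-action rule can be sketched as follows. This is a generic illustration only: the `Rule` and `RuleEngine` classes, the event name `fragment_done`, and the context keys are all hypothetical and do not reproduce Tukwila's actual rule language.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: str                          # event name that triggers evaluation
    condition: Callable[[dict], bool]   # checked against the event context
    action: Callable[[dict], None]      # executed if the condition holds


class RuleEngine:
    def __init__(self, rules):
        self.rules = rules

    def fire(self, event, context):
        """Run every matching rule whose condition holds; return how many ran."""
        fired = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(context):
                rule.action(context)
                fired += 1
        return fired


if __name__ == "__main__":
    log = []
    engine = RuleEngine([
        Rule(
            event="fragment_done",  # hypothetical event name
            condition=lambda c: c["actual_rows"] > 2 * c["estimated_rows"],
            action=lambda c: log.append("reoptimize"),
        )
    ])
    engine.fire("fragment_done", {"actual_rows": 500, "estimated_rows": 100})
    print(log)
```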
23Related Work II
- Havasu project at Arizona State U.
- User preference driven query optimization
- Niagara project at U. of Wisconsin
- User doesn't have to specify the sources for a query
- Allows user to ask "give me results so far" even in the presence of aggregation operators
- MIX system at UC San Diego
- Information integration system using XML as the intermediate data model
- Lazy navigation into the result controlled by the user
- Doesn't adapt query plan during execution
24Related Work III
- Aurora project at Brown/MIT/Brandeis
- Telegraph project at UC Berkeley
- Stream project at Stanford Univ.
25To be continued