Title: Adaptive Ordering of Pipelined Stream Filters
1Adaptive Ordering of Pipelined Stream Filters
- Babu, Motwani, Munagala, Nishizawa, and Widom
- SIGMOD 2004 Jun 13-18, 2004
- presented by
- Joshua Lee
- Mingzhu Wei
- Some slides are adapted from the presentation
slides in sigmod2004
2Presentation Overview
- Motivation
- A-GREEDY Algorithm
- INDEPENDENT Algorithm
- SWEEP Algorithm
- LOCALSWAPS Algorithm
- Adaptive Ordering of MultiwayJoins Over Streams
- Conclusion
3Motivation
- Many queries on stream data involve processing
commutative sets of filters - Each filter is a stateless predicate that
indicates whether a tuple should be further
processed - A tuple is output only if it satisfies all
filters - Applying filters in different orders can affect
the processing time of the query - The changing nature of stream data requires
adaptive algorithms for filter ordering
4Motivation Example
- Consider the following example usage of a stream
processing system - A network traffic monitoring application built on
a stream engine is used to monitor the amount of
common traffic flowing through four routers, R1,
R2, R3, and R4 over the last ten minutes.
5Motivation Filter Ordering
- If the filters are dependent, then the order of
probing affects processing time - For instance, it could be known that most traffic
through R1 comes from R3 or R4, but rarely from
R2 - For incoming data on R1, probing W2 first would
drop a tuple quickest, thus saving the time of
probing W3 and W4
6Motivation Filter Ordering Challenges
- Filter selectivities may be correlated
- The candidate filter orderings to consider for N
filters is N!, which increases quickly - Attributes of stream data and arrival
characteristics can change over time
7Motivation Adaptive Filter Ordering Challenges
- Run-time overhead incurred for continuous
monitoring of statistics - Convergence properties required to prove an
adaptive algorithm generates an approximation to
correct answer - Speed of adaptivity in the face of changing data
characteristics - There is a three-way tradeoff Algorithms that
adapt quickly and have good convergence
properties incur a lot of run-time overhead
8A-GREEDY Algorithm
- The cost of a query plan ordering is
- ti is the cost of processing one tuple in filter
i - d(jj-1) is the conditional probability that
filter j will drop a tuple, given that no filter
from 1 to j-1 dropped it
9A-GREEDY Algorithm Static Version and Greedy
Invariant
- Static greedy algorithm
- Choose the filter with highest unconditional drop
probability d(i0) as the first filter - Choose the filter with the highest conditional
drop probability d(j1) as the next filter - Choose the filter with highest conditional drop
probability d(k2) as the next filter - And so on
- Greedy Invariant occurs when a filter ordering
satisfies the following
10A-GREEDY Algorithm Profiler
- Two logical components of A-Greedy
- profiler continuously collects and maintains
statistics about filter selectivities and
processing costs - Reoptimizer detects and corrects violations of
the GI in the current filter ordering -
- Challenge faced by the profiler n2n-1
conditional selectivities for n filters - Solution profile of recently dropped tuples
11A-GREEDY Algorithm Profiler
- Profile a sliding window of profile tuples
- A profile tuple contains n boolean attributes
b1,,bn corresponding to n filters - Dropped tuples are sampled with some probability
p - If a tuple e is chosen for profiling, it will be
tested by all remaining filters - A new profile tuple e bi 1 if Fi drop e,
otherwise bi0
12A-GREEDY Algorithm Reoptimizer
- The reoptimizer maintains an ordering O such that
O satisfies the GI - How does the reoptimizer use profile to derive
estimates of conditional selectivities? - It incrementally maintains a view over the
profile window
13A-GREEDY Algorithm Reoptimizer
- Matrix View over the profile window nn
- Vi, j number of tuples in the profile window
that were dropped by Ff(j) but not dropped by Ff
(1) Ff (2),, F f (i-1)
Vi, j is proportional to d (ji-1)
14A-GREEDY Algorithm Run-time Overhead and
Adaptivity
- Profile-tuple creation needs additional n-I
evaluations for a tuple dropped by Ff(i) . - Profile-window maintenance insertion and
deletion of profile tuples, averages of filter
processing times - Matrix-view update every update would cause
access to up to n2/4 entries - Detection and correction of GI violations
- Good convergence properties
- Fast adaptivity
15Variants of A Greedy
- Can we sacrifice some of A-Greedys convergence
properties or adaptivity speed to reduce its
run-time overhead? - Three variants of A greedy are proposed
16SWEEP Algorithm
- Proceeds in stages
- During one stage, only checks for GI violations
involving the filter at one specific position j - Do not need to maintain entire matrix view bf
(1) ,,b f (j) - For each profiled tuple, needs to additionally
evaluate F f (j) only - By rotating j over 2,,n, eventually detects and
corrects all GI violations - Same convergence properties as A-Greedy
- Reduced view and need for additional evaluations
- less overhead
- slower adaptivity only 1 filter is profiled in
each stage
17INDEPENDENT Algorithm
- Assumes filters are independent we only need to
maintain estimates of unconditional selectivities - Lower view maintenance overhead
- Fast adaptivity
- Convergence if assumption holds, optimal.
Otherwise, can be O(n) times worse than GI
orderings
18LOCALSWAP Algorithm
- Monitors local violations only, i.e.,
violations involving adjacent filters - LocalSwaps detects situations where a swap
between adjacent filters - Only need to maintain two diagonals of the view
- For each profiled tuple dropped by Ff(i), only
needs to additionally evaluate F f(i1)
x x
x x
x x
x
19Adaptive Ordering of MultiwayJoins Over Streams
- MJoins maintain an ordering of S0, S1,,
Sn-1-Si for each stream Si - New tuples arriving from Si is joined with other
stream windows in that order - Two-phase join algorithm
- Drop-probing phase the new tuple is used to
probe all other windows in the specified order.
If any window drops it, no further processing
will be needed for it - output-generation phase if no window drops the
tuple, proceeds as conventional MJoins - Drop probing resembles pipelined filters
- A-Greedy and its variants can be used to
determine the orderings
20Conclusion
- A-Greedy has good convergence properties and fast
adaptivity, but incurs significant run-time
overhead - Three variants of A-Greedy are proposed, each
lying at a different points along the tradeoff
spectrum among convergence, runtime overhead, and
speed of adaptivity