Adaptive Ordering of Pipelined Stream Filters - PowerPoint PPT Presentation

About This Presentation

Title:

Adaptive Ordering of Pipelined Stream Filters

Description:

Challenge faced by the profiler: n2n-1 conditional selectivities for n filters ... Only need to maintain two diagonals of the view ... Two-phase join algorithm ... – PowerPoint PPT presentation

Number of Views:24

Avg rating:3.0/5.0

Slides: 21

Provided by: josh86

Learn more at: http://web.cs.wpi.edu

Category:

more less

Transcript and Presenter's Notes

Title: Adaptive Ordering of Pipelined Stream Filters

1
Adaptive Ordering of Pipelined Stream Filters

Babu, Motwani, Munagala, Nishizawa, and Widom
SIGMOD 2004 Jun 13-18, 2004
presented by
Joshua Lee
Mingzhu Wei
Some slides are adapted from the presentation
slides in sigmod2004

2
Presentation Overview

Motivation
A-GREEDY Algorithm
INDEPENDENT Algorithm
SWEEP Algorithm
LOCALSWAPS Algorithm
Adaptive Ordering of MultiwayJoins Over Streams
Conclusion

3
Motivation

Many queries on stream data involve processing
commutative sets of filters
Each filter is a stateless predicate that
indicates whether a tuple should be further
processed
A tuple is output only if it satisfies all
filters
Applying filters in different orders can affect
the processing time of the query
The changing nature of stream data requires
adaptive algorithms for filter ordering

4
Motivation Example

Consider the following example usage of a stream
processing system
A network traffic monitoring application built on
a stream engine is used to monitor the amount of
common traffic flowing through four routers, R1,
R2, R3, and R4 over the last ten minutes.

5
Motivation Filter Ordering

If the filters are dependent, then the order of
probing affects processing time
For instance, it could be known that most traffic
through R1 comes from R3 or R4, but rarely from
R2
For incoming data on R1, probing W2 first would
drop a tuple quickest, thus saving the time of
probing W3 and W4

6
Motivation Filter Ordering Challenges

Filter selectivities may be correlated
The candidate filter orderings to consider for N
filters is N!, which increases quickly
Attributes of stream data and arrival
characteristics can change over time

7
Motivation Adaptive Filter Ordering Challenges

Run-time overhead incurred for continuous
monitoring of statistics
Convergence properties required to prove an
adaptive algorithm generates an approximation to
correct answer
Speed of adaptivity in the face of changing data
characteristics
There is a three-way tradeoff Algorithms that
adapt quickly and have good convergence
properties incur a lot of run-time overhead

8
A-GREEDY Algorithm

The cost of a query plan ordering is
ti is the cost of processing one tuple in filter
i
d(jj-1) is the conditional probability that
filter j will drop a tuple, given that no filter
from 1 to j-1 dropped it

9
A-GREEDY Algorithm Static Version and Greedy
Invariant

Static greedy algorithm
Choose the filter with highest unconditional drop
probability d(i0) as the first filter
Choose the filter with the highest conditional
drop probability d(j1) as the next filter
Choose the filter with highest conditional drop
probability d(k2) as the next filter
And so on
Greedy Invariant occurs when a filter ordering
satisfies the following

10
A-GREEDY Algorithm Profiler

Two logical components of A-Greedy
profiler continuously collects and maintains
statistics about filter selectivities and
processing costs
Reoptimizer detects and corrects violations of
the GI in the current filter ordering
Challenge faced by the profiler n2n-1
conditional selectivities for n filters
Solution profile of recently dropped tuples

11
A-GREEDY Algorithm Profiler

Profile a sliding window of profile tuples
A profile tuple contains n boolean attributes
b1,,bn corresponding to n filters
Dropped tuples are sampled with some probability
p
If a tuple e is chosen for profiling, it will be
tested by all remaining filters
A new profile tuple e bi 1 if Fi drop e,
otherwise bi0

12
A-GREEDY Algorithm Reoptimizer

The reoptimizer maintains an ordering O such that
O satisfies the GI
How does the reoptimizer use profile to derive
estimates of conditional selectivities?
It incrementally maintains a view over the
profile window

13
A-GREEDY Algorithm Reoptimizer

Matrix View over the profile window nn
Vi, j number of tuples in the profile window
that were dropped by Ff(j) but not dropped by Ff
(1) Ff (2),, F f (i-1)

Vi, j is proportional to d (ji-1)
14
A-GREEDY Algorithm Run-time Overhead and
Adaptivity

Profile-tuple creation needs additional n-I
evaluations for a tuple dropped by Ff(i) .
Profile-window maintenance insertion and
deletion of profile tuples, averages of filter
processing times
Matrix-view update every update would cause
access to up to n2/4 entries
Detection and correction of GI violations
Good convergence properties
Fast adaptivity

15
Variants of A Greedy

Can we sacrifice some of A-Greedys convergence
properties or adaptivity speed to reduce its
run-time overhead?
Three variants of A greedy are proposed

16
SWEEP Algorithm

Proceeds in stages
During one stage, only checks for GI violations
involving the filter at one specific position j
Do not need to maintain entire matrix view bf
(1) ,,b f (j)
For each profiled tuple, needs to additionally
evaluate F f (j) only
By rotating j over 2,,n, eventually detects and
corrects all GI violations
Same convergence properties as A-Greedy
Reduced view and need for additional evaluations
less overhead
slower adaptivity only 1 filter is profiled in
each stage

17
INDEPENDENT Algorithm

Assumes filters are independent we only need to
maintain estimates of unconditional selectivities
Lower view maintenance overhead
Fast adaptivity
Convergence if assumption holds, optimal.
Otherwise, can be O(n) times worse than GI
orderings

18
LOCALSWAP Algorithm

Monitors local violations only, i.e.,
violations involving adjacent filters
LocalSwaps detects situations where a swap
between adjacent filters
Only need to maintain two diagonals of the view
For each profiled tuple dropped by Ff(i), only
needs to additionally evaluate F f(i1)

x x
x x
x x
x
19
Adaptive Ordering of MultiwayJoins Over Streams

MJoins maintain an ordering of S0, S1,,
Sn-1-Si for each stream Si
New tuples arriving from Si is joined with other
stream windows in that order
Two-phase join algorithm
Drop-probing phase the new tuple is used to
probe all other windows in the specified order.
If any window drops it, no further processing
will be needed for it
output-generation phase if no window drops the
tuple, proceeds as conventional MJoins
Drop probing resembles pipelined filters
A-Greedy and its variants can be used to
determine the orderings

20
Conclusion

A-Greedy has good convergence properties and fast
adaptivity, but incurs significant run-time
overhead
Three variants of A-Greedy are proposed, each
lying at a different points along the tradeoff
spectrum among convergence, runtime overhead, and
speed of adaptivity

Write a Comment

User Comments (0)