Adaptive Ordering of Pipelined Stream Filters (Presentation Transcript)

1
Adaptive Ordering of Pipelined Stream Filters
  • Babu, Motwani, Munagala, Nishizawa, and Widom
  • SIGMOD 2004, June 13-18, 2004
  • presented by
  • Joshua Lee
  • Mingzhu Wei
  • Some slides are adapted from the authors' SIGMOD 2004 presentation

2
Presentation Overview
  • Motivation
  • A-GREEDY Algorithm
  • INDEPENDENT Algorithm
  • SWEEP Algorithm
  • LOCALSWAPS Algorithm
  • Adaptive Ordering of Multiway Joins over Streams
  • Conclusion

3
Motivation
  • Many queries on stream data involve processing
    commutative sets of filters
  • Each filter is a stateless predicate that
    indicates whether a tuple should be further
    processed
  • A tuple is output only if it satisfies all
    filters
  • Applying filters in different orders can affect
    the processing time of the query
  • The changing nature of stream data requires
    adaptive algorithms for filter ordering

4
Motivation: Example
  • Consider the following example usage of a stream
    processing system
  • A network traffic monitoring application built on
    a stream engine is used to monitor the amount of
    common traffic flowing through four routers, R1,
    R2, R3, and R4 over the last ten minutes.

5
Motivation: Filter Ordering
  • If the filters are dependent, then the order of
    probing affects processing time
  • For instance, it could be known that most traffic
    through R1 comes from R3 or R4, but rarely from
    R2
  • For incoming tuples from R1, probing W2 (the sliding window over R2's stream) first drops such tuples quickest, saving the time of probing W3 and W4

6
Motivation: Filter Ordering Challenges
  • Filter selectivities may be correlated
  • The number of candidate filter orderings for N filters is N!, which grows quickly with N
  • Attributes of stream data and arrival
    characteristics can change over time

7
Motivation: Adaptive Filter Ordering Challenges
  • Run-time overhead incurred for continuous
    monitoring of statistics
  • Convergence properties are needed to prove that an adaptive algorithm converges to a near-optimal ordering once data characteristics stabilize
  • Speed of adaptivity in the face of changing data
    characteristics
  • There is a three-way tradeoff: algorithms that adapt quickly and have good convergence properties incur high run-time overhead

8
A-GREEDY Algorithm
  • The expected cost of processing one tuple under a filter ordering F1, F2, ..., Fn is
  • cost = Σ_{i=1..n} ti · Π_{j=1..i-1} (1 - d(j | j-1)), i.e., each filter's cost is paid only by the tuples that survive all earlier filters
  • ti is the cost of processing one tuple with filter i
  • d(j | j-1) is the conditional probability that filter j drops a tuple, given that no filter from 1 to j-1 dropped it
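
A minimal Python sketch of this cost model (the names expected_cost, t, and d_cond are illustrative, not from the paper):

```python
def expected_cost(ordering, t, d_cond):
    """ordering: filter indices in the order they are applied
    t[i]: cost of processing one tuple with filter i
    d_cond(f, prefix): conditional probability that filter f drops a tuple,
                       given that no filter in `prefix` dropped it"""
    cost = 0.0
    reach = 1.0                      # probability the tuple reaches this filter
    for pos, f in enumerate(ordering):
        cost += reach * t[f]         # pay filter f's cost only if the tuple got this far
        reach *= 1.0 - d_cond(f, tuple(ordering[:pos]))
    return cost

# Example: expected_cost([2, 0, 1], t={0: 1.0, 1: 2.0, 2: 0.5},
#                        d_cond=lambda f, prefix: 0.3)
```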

9
A-GREEDY Algorithm: Static Version and Greedy Invariant
  • Static greedy algorithm
  • Choose the filter with the highest unconditional drop probability d(i | 0) as the first filter
  • Choose the filter with the highest conditional drop probability d(j | 1) as the next filter
  • Choose the filter with the highest conditional drop probability d(k | 2) as the next filter
  • And so on
  • The Greedy Invariant (GI) holds when the filter ordering satisfies d(i | i-1) ≥ d(j | i-1) for every pair of positions i < j
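
A minimal sketch of this static greedy ordering, assuming uniform per-tuple filter costs so that only conditional drop probabilities matter (d_cond is the same illustrative estimator as in the previous sketch); the ordering it produces satisfies the GI by construction:

```python
def greedy_order(filters, d_cond):
    """filters: iterable of filter indices
    d_cond(f, prefix): conditional drop probability of filter f given that
                       no filter in `prefix` dropped the tuple"""
    remaining, ordering = list(filters), []
    while remaining:
        # pick the remaining filter with the highest conditional drop
        # probability given the filters already chosen
        best = max(remaining, key=lambda f: d_cond(f, tuple(ordering)))
        ordering.append(best)
        remaining.remove(best)
    return ordering
```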

10
A-GREEDY Algorithm: Profiler
  • Two logical components of A-Greedy
  • Profiler: continuously collects and maintains statistics about filter selectivities and processing costs
  • Reoptimizer: detects and corrects violations of the GI in the current filter ordering
  • Challenge faced by the profiler: there are n · 2^(n-1) conditional selectivities for n filters
  • Solution: maintain a profile of recently dropped tuples

11
A-GREEDY Algorithm: Profiler
  • Profile: a sliding window of profile tuples
  • A profile tuple contains n boolean attributes b1, ..., bn corresponding to the n filters
  • Dropped tuples are sampled with some probability p
  • If a tuple e is chosen for profiling, it is tested by all remaining filters
  • A new profile tuple e' is created with bi = 1 if Fi drops e, and bi = 0 otherwise
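
A minimal sketch of the profiler's sampling step; the sampling probability p, the window size, and the name maybe_profile are illustrative, not from the paper:

```python
import random
from collections import deque

def maybe_profile(e, drop_pos, ordering, filters, window, p=0.1, max_window=1000):
    """e: tuple just dropped by the filter at position drop_pos of `ordering`
    filters[i](e): True if filter Fi passes e, False if it drops e
    window: deque of profile tuples (lists of 0/1 attributes b1..bn)"""
    if random.random() >= p:              # sample dropped tuples with probability p
        return
    bits = [0] * len(filters)
    bits[ordering[drop_pos]] = 1          # the filter that actually dropped e
    for f in ordering[drop_pos + 1:]:     # also evaluate the remaining filters
        if not filters[f](e):
            bits[f] = 1                   # bi = 1 iff Fi drops e
    window.append(bits)
    if len(window) > max_window:
        window.popleft()                  # keep a sliding window of profile tuples
```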

12
A-GREEDY Algorithm: Reoptimizer
  • The reoptimizer maintains an ordering O such that O satisfies the GI
  • How does the reoptimizer use the profile to derive estimates of conditional selectivities?
  • It incrementally maintains a view over the profile window

13
A-GREEDY Algorithm: Reoptimizer
  • Matrix view V over the profile window (n × n)
  • V[i, j] = number of tuples in the profile window that were dropped by F_f(j) but not dropped by F_f(1), F_f(2), ..., F_f(i-1)

V[i, j] is proportional to d(j | i-1)
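
A minimal sketch of the matrix view (0-based positions). A-Greedy maintains V incrementally as profile tuples enter and leave the window; for clarity this sketch simply rebuilds V from the whole window:

```python
def build_matrix_view(window, ordering):
    """V[i][j] = number of profile tuples dropped by the filter at position j
    of `ordering` but not dropped by the filters at positions 0..i-1."""
    n = len(ordering)
    V = [[0] * n for _ in range(n)]
    for bits in window:
        for i in range(n):
            if i > 0 and bits[ordering[i - 1]]:
                break                 # dropped by an earlier position: no later row counts it
            for j in range(i, n):
                if bits[ordering[j]]:
                    V[i][j] += 1
    return V
```

Dividing row i by the number of profile tuples that survive the first i-1 positions turns these counts into estimates of the conditional drop probabilities, which is why V[i, j] is proportional to d(j | i-1).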
14
A-GREEDY Algorithm: Run-time Overhead and Adaptivity
  • Profile-tuple creation: needs n - i additional filter evaluations for a tuple dropped by F_f(i)
  • Profile-window maintenance: insertion and deletion of profile tuples, and running averages of filter processing times
  • Matrix-view update: each update may touch up to n²/4 entries
  • Detection and correction of GI violations
  • Good convergence properties
  • Fast adaptivity

15
Variants of A-Greedy
  • Can we sacrifice some of A-Greedy's convergence properties or adaptivity speed to reduce its run-time overhead?
  • Three variants of A-Greedy are proposed

16
SWEEP Algorithm
  • Proceeds in stages
  • During one stage, only checks for GI violations
    involving the filter at one specific position j
  • Does not need to maintain the entire matrix view, only the attributes b_f(1), ..., b_f(j)
  • For each profiled tuple, only F_f(j) needs to be additionally evaluated
  • By rotating j over 2, ..., n, SWEEP eventually detects and corrects all GI violations
  • Same convergence properties as A-Greedy
  • Reduced view size and fewer additional evaluations, so less overhead
  • Slower adaptivity: only one filter position is profiled in each stage
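
A sketch of the per-stage check, under one plausible reading of the slide: a GI violation "involving position j" means the filter currently at position j has a higher conditional drop probability, for some earlier prefix, than the filter holding that earlier position (V is the reduced view SWEEP maintains):

```python
def sweep_violation(V, j):
    """Return the first position i < j whose filter is beaten by the filter at
    position j (V[i][j] > V[i][i]), or None if position j causes no violation."""
    for i in range(j):
        if V[i][j] > V[i][i]:
            return i                  # F_f(j) should move up to position i
    return None
```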

17
INDEPENDENT Algorithm
  • Assumes filters are independent, so only estimates of unconditional selectivities need to be maintained
  • Lower view maintenance overhead
  • Fast adaptivity
  • Convergence: optimal if the independence assumption holds; otherwise, the ordering can be O(n) times worse than GI orderings
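
A minimal sketch of INDEPENDENT with illustrative per-filter counters (drops, seen):

```python
def independent_order(filters, drops, seen):
    """drops[f]: number of profiled tuples dropped by filter f
    seen[f]:  number of profiled tuples filter f was evaluated on"""
    est = lambda f: drops[f] / seen[f] if seen[f] else 0.0
    # order by estimated unconditional drop probability, ignoring correlations
    return sorted(filters, key=est, reverse=True)
```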

18
LOCALSWAPS Algorithm
  • Monitors local violations only, i.e., violations involving adjacent filters
  • LocalSwaps detects situations where a swap between adjacent filters would correct a local GI violation
  • Only needs to maintain two diagonals of the view
  • For each profiled tuple dropped by F_f(i), only F_f(i+1) needs to be additionally evaluated

[Diagram: only two diagonals of the matrix view, V[i, i] and V[i, i+1], are maintained]
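
A minimal sketch of one LocalSwaps step, using only the two maintained diagonals (V[i][i] and V[i][i+1]); after a swap, fresh statistics would be gathered before swapping again:

```python
def local_swap_step(V, ordering):
    """Swap the first adjacent pair where the later filter has the higher
    conditional drop probability; return True if a swap was made."""
    for i in range(len(ordering) - 1):
        if V[i][i + 1] > V[i][i]:
            ordering[i], ordering[i + 1] = ordering[i + 1], ordering[i]
            return True
    return False
```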
19
Adaptive Ordering of Multiway Joins over Streams
  • MJoins maintain an ordering of the windows of {S0, S1, ..., Sn-1} - {Si} for each stream Si
  • A new tuple arriving on Si is joined with the other streams' windows in that order
  • Two-phase join algorithm (sketched below)
  • Drop-probing phase: the new tuple is used to probe all other windows in the specified order; if any window drops it (no match), no further processing is needed for it
  • Output-generation phase: if no window drops the tuple, processing proceeds as in a conventional MJoin
  • Drop probing resembles pipelined filters
  • A-Greedy and its variants can be used to
    determine the orderings
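
A minimal sketch of the two-phase processing of one arriving tuple; join_match and generate_output are hypothetical helpers standing in for window probing and conventional MJoin output generation:

```python
def process_arrival(tup, src, probe_order, windows, join_match, generate_output):
    """probe_order[src]: adaptively ordered list of the other streams' ids
    join_match(tup, window): True iff tup joins with some tuple in window"""
    # Drop-probing phase: probe the other windows in the chosen order;
    # the first window with no match drops the tuple, ending all work on it.
    for other in probe_order[src]:
        if not join_match(tup, windows[other]):
            return []
    # Output-generation phase: the tuple matched every window, so produce
    # join results as a conventional MJoin would.
    return generate_output(tup, src, windows)
```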

20
Conclusion
  • A-Greedy has good convergence properties and fast
    adaptivity, but incurs significant run-time
    overhead
  • Three variants of A-Greedy are proposed, each lying at a different point along the tradeoff spectrum among convergence properties, run-time overhead, and speed of adaptivity