
Title: Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams


1
Static Optimization of Conjunctive Queries with
Sliding Windows over Infinite Streams
Ahmed M. Ayad and Jeffrey F. Naughton
Database Group, University of Wisconsin
  • Presented by Andy Mason and Sheng Zhong

Material is partially drawn from the SIGMOD 2004 paper
2
Overview
  • Introduction
  • Semantics of Sliding Window Continuous Queries
  • Cost Model
  • Load Shedding
  • Optimization Framework
  • Experiments

3
Introduction
  • The intent of the paper:
  • Find an execution plan that minimizes resource
    usage when resources are sufficient
  • Find an execution plan that sheds tuples when
    resources are insufficient
  • For a continuous query in steady state, each
    execution plan can be modeled as a queueing
    network
  • Arriving tuples are clients
  • Query operators are servers
  • An execution plan is feasible if the system is
    stable
  • If the plan is infeasible, load shedding is
    needed

4
Feasible and Infeasible Query Plan
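[Figure: feasible and infeasible query plans]
  • Feasible plan: 0.5 + 0.25 < 1
  • Infeasible plan: 1 + 0.25 > 1, so load shedding is needed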
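The feasibility test illustrated on this slide can be sketched as a sum of per-operator utilizations. This is a minimal sketch, assuming each operator's utilization is its input rate (tuples/sec) times its per-tuple processing cost (sec/tuple) and that the plan is stable when the total stays below 1. The rates and costs are hypothetical values chosen to reproduce the 0.5 and 0.25 utilizations.

```python
from typing import List, Tuple

# Each operator: (input rate in tuples/sec, processing cost in sec/tuple).
Operator = Tuple[float, float]


def plan_utilization(operators: List[Operator]) -> float:
    """Total utilization: sum of rate * per-tuple cost over all operators."""
    return sum(rate * cost for rate, cost in operators)


def is_feasible(operators: List[Operator]) -> bool:
    """Assumed stability condition: total utilization below 1."""
    return plan_utilization(operators) < 1.0


# Hypothetical numbers: 100 tuples/sec at 5 ms -> 0.5, 50 tuples/sec at 5 ms -> 0.25.
feasible_plan = [(100.0, 0.005), (50.0, 0.005)]
# Doubling the first operator's input rate pushes its utilization to 1.0.
infeasible_plan = [(200.0, 0.005), (50.0, 0.005)]

print(plan_utilization(feasible_plan), is_feasible(feasible_plan))
print(plan_utilization(infeasible_plan), is_feasible(infeasible_plan))
```

The infeasible case is the one where the slides call for load shedding.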
5
Assumptions
  • Timestamps are unique (no ties)
  • Tuples arrive in monotonically increasing
    timestamp order (no out-of-order arrival)
  • No relational tables are involved in the
    query

Discussion: Why make these assumptions?
Static optimization assumes that the rates of the input
streams change slowly and that there is enough memory
to hold the buffering requirements of any query plan
6
Semantics
  • Definitions
  • Data Stream
  • Time-based Window
  • Tuple-based Window
  • Selection
  • A filter takes a stream as input and outputs a
    stream
  • Join
  • A symmetric operator that takes two input streams
    and outputs a stream
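To make the join semantics concrete, here is a minimal sketch of a symmetric join over two tuple-based (ROWS N) windows. It assumes tuples arrive in timestamp order (per the assumptions slide) and that each arrival probes the opposite window, then enters its own window, evicting the oldest tuple once N rows are held. The function name and the `(side, value)` input format are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def symmetric_window_join(
    arrivals: Iterable[Tuple[str, int]],
    rows: int,
    predicate: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Join a left ("L") and right ("R") stream, each with a ROWS `rows` window.

    `arrivals` yields (side, value) pairs in timestamp order.
    Results are emitted as (left_value, right_value) pairs.
    """
    windows = {"L": deque(maxlen=rows), "R": deque(maxlen=rows)}
    results: List[Tuple[int, int]] = []
    for side, value in arrivals:
        other = "R" if side == "L" else "L"
        # Probe the opposite window with the newly arrived tuple.
        for match in windows[other]:
            left, right = (value, match) if side == "L" else (match, value)
            if predicate(left, right):
                results.append((left, right))
        # Insert into this side's window; deque(maxlen=rows) drops the oldest tuple.
        windows[side].append(value)
    return results


arrivals = [("L", 1), ("L", 2), ("R", 1), ("L", 3), ("R", 2)]
print(symmetric_window_join(arrivals, rows=2, predicate=lambda l, r: l == r))
```

With `rows=2` the left window still holds 2 when R(2) arrives, so both keys match; with `rows=1` the earlier left tuples have already been evicted and no pairs are produced.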

The cost model
7
Variables
8
Rate and Window Calculations
  • 1. Output rate of a selection
  • 2. Active window size
  • 3. Output rate of a window join
  • 4. Active size of a window join
  • 5. Output rate of an n-ary join of n streams
  • 6. Active window size of an n-ary join
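The transcript does not reproduce the formulas for these six quantities. As a hedged sketch, the functions below use the standard sliding-window join estimates (an assumption, not copied from the slide): a filter with selectivity f scales the rate and the active window by f; a binary join with selectivity f has output rate f(λ_A W_B + λ_B W_A) and active size f W_A W_B; the n-ary case generalizes both, with f the product of the join selectivities. Here λ is a rate in tuples/sec and W an active window size in tuples (N for a ROWS N window).

```python
from math import prod
from typing import Sequence


def select_output_rate(rate: float, sel: float) -> float:
    """(1) A filter with selectivity `sel` passes sel * rate tuples/sec."""
    return sel * rate


def select_window_size(window: float, sel: float) -> float:
    """(2) Expected number of tuples in the window that satisfy the filter."""
    return sel * window


def time_window_size(rate: float, seconds: float) -> float:
    """Active size of a time-based window of `seconds`: rate * seconds tuples."""
    return rate * seconds


def join_output_rate(
    rate_a: float, win_a: float, rate_b: float, win_b: float, sel: float
) -> float:
    """(3) Each new A tuple probes W_B tuples and vice versa."""
    return sel * (rate_a * win_b + rate_b * win_a)


def join_window_size(win_a: float, win_b: float, sel: float) -> float:
    """(4) Expected number of result tuples between the two current windows."""
    return sel * win_a * win_b


def nary_output_rate(rates: Sequence[float], wins: Sequence[float], sel: float) -> float:
    """(5) Each stream's arrivals probe the product of the other n - 1 windows."""
    return sel * sum(
        rates[i] * prod(w for j, w in enumerate(wins) if j != i)
        for i in range(len(rates))
    )


def nary_window_size(wins: Sequence[float], sel: float) -> float:
    """(6) Expected size of the join of all n active windows."""
    return sel * prod(wins)


# For n = 2 the n-ary formulas reduce to the binary ones.
print(join_output_rate(10, 5, 20, 4, 0.1), nary_output_rate([10, 20], [5, 4], 0.1))
```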

9
Cost Model
  • A concrete example of applying the cost model:
    SELECT A.a, B.b, C.c
    FROM A ROWS 10, B ROWS 10, C ROWS 10
    WHERE A.a = B.a
      AND B.b = C.b
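A worked version of this example, under loudly hypothetical inputs: rates λ_A = 100, λ_B = 50, λ_C = 10 tuples/sec, ROWS 10 windows, selectivity 0.1 for both join predicates, and a constant 1 ms cost per tuple entering a join. The slides give neither these numbers nor the exact per-tuple cost model. Plan cost is taken as the sum over joins of input rate times per-tuple cost, and the join estimates are the standard f(λ_L W_R + λ_R W_L) and f W_L W_R.

```python
from typing import Tuple

JOIN_COST = 0.001  # hypothetical: seconds of CPU per tuple entering a join


def window_join(left: Tuple[float, float], right: Tuple[float, float], sel: float):
    """Return (utilization, (output rate, active window)) for one binary join.

    `left` and `right` are (rate in tuples/sec, active window size in tuples).
    """
    (rate_l, win_l), (rate_r, win_r) = left, right
    utilization = (rate_l + rate_r) * JOIN_COST
    out_rate = sel * (rate_l * win_r + rate_r * win_l)
    out_win = sel * win_l * win_r
    return utilization, (out_rate, out_win)


# Hypothetical inputs: (rate, ROWS 10 window) per stream; both predicates have selectivity 0.1.
A, B, C = (100.0, 10.0), (50.0, 10.0), (10.0, 10.0)
SEL_AB = SEL_BC = 0.1

# Plan 1: (A JOIN B) JOIN C
u1, ab = window_join(A, B, SEL_AB)
u2, out_plan1 = window_join(ab, C, SEL_BC)
cost_plan1 = u1 + u2

# Plan 2: (B JOIN C) JOIN A
u3, bc = window_join(B, C, SEL_BC)
u4, out_plan2 = window_join(bc, A, SEL_AB)
cost_plan2 = u3 + u4

print(f"plan 1: cost={cost_plan1:.2f}, output rate={out_plan1[0]:.1f}")
print(f"plan 2: cost={cost_plan2:.2f}, output rate={out_plan2[0]:.1f}")
```

Under these estimates both plans produce the same output rate, since they compute the same query, but plan 2 uses less CPU because its first join only sees the two slower streams. The third order, (A JOIN C) JOIN B, would be a cross product and is omitted.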

10
Cost Model Plans
11
Outcome after Load Shedding
12
Load Shedding
  • A form of approximation that reduces load by
    dropping tuples from the incoming streams
  • Methods of load shedding:
  • Random dropping of tuples (presented in this
    paper)
  • Achieved by inserting random drop boxes at
    several points in the query plan
  • Semantic dropping of tuples
  • Goal: maximize the output rate of the
    approximated query
  • Problems addressed
  • Optimal placement of drop boxes in an execution
    plan and the optimal setting of their sampling
    rate
  • Choice of plan to shed load from
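A random drop box can be sketched as an independent coin flip per tuple. This minimal sketch assumes the drop box's sampling rate is the probability that a tuple is kept; the function name is hypothetical, and a seeded `random.Random` keeps runs reproducible.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def random_drop_box(stream: Iterable[T], keep_prob: float, rng: random.Random) -> Iterator[T]:
    """Forward each tuple independently with probability `keep_prob`."""
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    for item in stream:
        # rng.random() is uniform on [0, 1), so keep_prob = 1.0 keeps everything.
        if rng.random() < keep_prob:
            yield item


rng = random.Random(42)
kept = list(random_drop_box(range(100_000), keep_prob=0.3, rng=rng))
print(len(kept) / 100_000)  # close to 0.3
```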

13
Selection Only Queries
  • Initial condition
  • A query consisting of n consecutive filters
  • An execution plan for it that orders the filters
    in ascending order of a designated number
  • n + 1 possible drop-box locations
  • Observation: it is sufficient to drop tuples
    directly from the streaming source, before they
    are processed by any of the filters
  • Conclusion: the plan with the lowest cost yields
    the highest output rate
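The reasoning behind the observation and the conclusion can be checked numerically. This sketch assumes each filter has a per-tuple cost and an independent selectivity, and places a single drop box at the source whose keep probability brings total utilization down to 1. The output rate then equals the product of the selectivities divided by the cost per source tuple, so among filter orders (which all share the same product of selectivities) the cheapest order yields the highest output rate. The filter costs, selectivities, and input rate are hypothetical.

```python
from math import prod
from typing import List, Tuple

Filter = Tuple[float, float]  # (cost in sec/tuple, selectivity)


def cost_per_source_tuple(filters: List[Filter]) -> float:
    """Expected CPU seconds spent per source tuple for filters applied in this order."""
    cost, surviving = 0.0, 1.0
    for c, s in filters:
        cost += surviving * c  # only tuples that passed earlier filters reach this one
        surviving *= s
    return cost


def shed_at_source(rate: float, filters: List[Filter]) -> Tuple[float, float]:
    """Return (keep probability, output rate) with one drop box before the first filter."""
    load = rate * cost_per_source_tuple(filters)
    keep = 1.0 if load <= 1.0 else 1.0 / load
    return keep, keep * rate * prod(s for _, s in filters)


f1, f2 = (0.001, 0.5), (0.002, 0.2)  # hypothetical filters
rate = 1000.0  # hypothetical input rate, tuples/sec
for order in ([f1, f2], [f2, f1]):
    keep, out = shed_at_source(rate, order)
    print(f"cost/tuple={cost_per_source_tuple(order):.4f} keep={keep:.3f} output={out:.2f}")
```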

14
Join Queries
  • Only consider tuple-based windows
  • Shedding Load From a Specific Plan
  • Choice of Plan for Load Shedding

15
Shedding Load from a Specific Plan
  • Where do we put the drop boxes?
  • Query plan joining n streams
  • Binary joins
  • Drop box can be put before each of the two inputs
    to the n - 1 join operators
  • Plus a box right after the last join is performed
  • 2n - 1 possible locations

Observation: it is sufficient to drop tuples from the
input sources before they are processed by any join
operator
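Counting the candidate positions: n - 1 binary joins have 2(n - 1) inputs, plus one position after the final join, giving 2n - 1. The sketch below enumerates them for a left-deep plan, which is one hypothetical plan shape among those the slide covers.

```python
from typing import List


def drop_box_locations(streams: List[str]) -> List[str]:
    """List candidate drop-box positions in the left-deep plan ((s1 JOIN s2) JOIN s3) ..."""
    locations: List[str] = []
    left = streams[0]
    for right in streams[1:]:
        join = f"({left} JOIN {right})"
        # One drop box may sit on each of the two inputs of every join.
        locations.append(f"left input of {join}")
        locations.append(f"right input of {join}")
        left = join
    # Plus one after the last join.
    locations.append(f"output of {left}")
    return locations


for loc in drop_box_locations(["A", "B", "C"]):
    print(loc)
```

In this plan shape, the n positions that sit directly on a source stream are both inputs of the first join and the right input of each later join; per the observation, those are the only ones that need drop boxes.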
16
Choice of Load Shedding Plan
  • Intuition for Selection queries
  • Pick plan with lowest resource utilization
  • Join queries
  • Plan with lowest resource utilization?
  • This intuition does not always work
  • Why?

17
Load Shedding Plan Example
  • Plans shed load in the order of their average
    utilization
  • Switch-over occurs at a join cost of 4.5
    milliseconds (plan b becomes the best)

18
Observations from Example
  • The plan with the lowest utilization is not
    always the best choice for shedding load
  • When the join cost is 14 milliseconds, the
    throughput of the best plan is more than twice
    the throughput of the lowest utilization plan
  • Lowest utilization plan could be the worst choice
  • Conclusion: load shedding must be integrated into
    the optimization process

19
Optimization Framework
  • Two metrics:
  • Throughput of the plan
  • Utilization cost of the plan
  • Feasible queries
  • Goal: minimize the cost of the plan
  • Throughput is fixed at its maximum value for
    all plans of a feasible query
  • Infeasible queries
  • Goal: maximize the throughput of the plan
  • Cost is fixed at its maximum value for all
    plans
  • Assumption
  • The search space of alternative plans is always
    equipped with drop boxes
  • All plans in the search space are therefore
    feasible
  • The problem can be treated as unconstrained

20
Optimization Goal
  • Maximize
  • R(p) = plan throughput / plan cost
  • Simplest optimization algorithm:
  • Generate the set of all plans of the query
  • For each plan in the set
  • Compute the cost of the plan
  • If cost > 1, insert drop boxes
  • Compute R(p)
  • Return the plan that maximizes R(p)
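The simplest algorithm can be sketched as follows. Everything here is a hypothetical stand-in: each plan is summarized by its utilization and output rate at full input rates, and `shed_from_table` just looks up made-up post-shedding throughputs instead of running the paper's load shedding algorithm, which tunes each drop box separately.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float        # total utilization at full input rates
    throughput: float  # output rate in tuples/sec


def optimize(plans: Iterable[Plan], shed: Callable[[Plan], Plan]) -> Plan:
    """Exhaustive search: shed load from infeasible plans, then maximize R(p)."""
    best: Optional[Plan] = None
    best_r = float("-inf")
    for plan in plans:
        if plan.cost > 1.0:
            plan = shed(plan)  # insert drop boxes so that cost <= 1
        r = plan.throughput / plan.cost
        if r > best_r:
            best, best_r = plan, r
    if best is None:
        raise ValueError("no plans to choose from")
    return best


# Hypothetical shedding outcomes: throughput each plan keeps once its cost is cut to 1.
SHED_THROUGHPUT = {"p1": 110.0, "p2": 90.0}


def shed_from_table(plan: Plan) -> Plan:
    return replace(plan, cost=1.0, throughput=SHED_THROUGHPUT[plan.name])


plans = [Plan("p1", cost=1.6, throughput=160.0), Plan("p2", cost=1.2, throughput=160.0)]
print(optimize(plans, shed_from_table).name)
```

With these made-up numbers the higher-utilization plan p1 wins because it retains more throughput after shedding, echoing the observation that the lowest-utilization plan is not always the best choice. When no shedding is needed, all plans share the query's full throughput, so R(p) simply favors the cheapest plan.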

21
Heuristic Optimizer
  • Based on the original System R optimizer
  • Builds the plan bottom-up by storing the best
    plans for successively larger subsets of the
    input streams
  • Computing the best plan for any subset:
  • Test whether this subplan is feasible
  • If infeasible, tune the values of the drop boxes
    placed at its input streams using the load
    shedding algorithm
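The bottom-up search can be sketched as a System R-style dynamic program over subsets of input streams, restricted here to left-deep plans for brevity. The `score` callback is a hypothetical hook: in the heuristic described on this slide it would test the subplan's feasibility, tune drop boxes at its inputs when it is infeasible, and rate the result. The toy score in the usage lines, with made-up rates, simply prefers joining low-rate streams first.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

# A plan is a stream name or a (left subplan, right stream) pair.
Plan = Union[str, Tuple["Plan", str]]


def bottom_up_optimize(streams: List[str], score: Callable[[Plan], float]) -> Plan:
    """Keep the best plan for each subset of streams, growing subsets one stream at a time."""
    best: Dict[FrozenSet[str], Plan] = {frozenset([s]): s for s in streams}
    for size in range(2, len(streams) + 1):
        for subset in combinations(streams, size):
            key = frozenset(subset)
            # Extend the stored best plan for each (size - 1)-subset with the missing stream.
            candidates = [(best[key - {right}], right) for right in subset]
            # In the heuristic optimizer, feasibility testing and drop-box tuning
            # would happen inside `score` before the subplan is stored.
            best[key] = max(candidates, key=score)
    return best[frozenset(streams)]


def leaves(plan: Plan) -> List[str]:
    """Streams of a left-deep plan in join order."""
    return [plan] if isinstance(plan, str) else leaves(plan[0]) + [plan[1]]


RATES = {"A": 10.0, "B": 1000.0, "C": 100.0}  # hypothetical tuples/sec


def toy_score(plan: Plan) -> float:
    order = leaves(plan)
    # Earlier streams feed more joins, so weight them more heavily (toy model).
    return -sum(RATES[s] * (len(order) - i) for i, s in enumerate(order))


print(leaves(bottom_up_optimize(["A", "B", "C"], toy_score)))
```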

22
Computing the best subset plan
  • Test whether this subplan is feasible
  • If infeasible, tune the values of the drop boxes
    placed at its input streams using the load
    shedding algorithm
  • Store the subplan
  • At any stage:
  • If a drop box is placed in front of a stream
    that already had one from a previous round, the
    two are combined into a single drop box whose
    selectivity is the product of the original two
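The combination rule follows from independence. If the earlier drop box keeps each tuple with probability p_1 and the new one keeps it with probability p_2, each deciding independently, a tuple survives both with the product probability, so a single box with that selectivity is equivalent:

```latex
\Pr[\text{tuple kept}] = \Pr[\text{kept by box 1}] \cdot \Pr[\text{kept by box 2}] = p_1 \, p_2
```

For example, p_1 = 0.5 and p_2 = 0.4 combine into one drop box with selectivity 0.2.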

23
Experiment Setup
  • 1000 random continuous queries
  • Each query represents a join of five input
    streaming sources A, B, C, D, E
  • Window sizes and join selectivities fixed
  • Rates were randomly picked from 10 to 1000
    tuples/sec

24
Need for Reoptimization
25
Average Gain in Throughput over using the Lowest
Utilization Plan
At very low resource levels, the gain is very
significant (almost 8-fold at the 1 mark)
26
Average and Maximum Gain
27
Heuristic Optimizer
Except at very low resource levels, the heuristic
optimizer performs very well
28
Summary
  • Presented a framework for static optimization of
    sliding window conjunctive queries over infinite
    streams
  • Cost Model
  • Load Shedding
  • Load shedding must be integrated in the
    optimization process!
  • Optimization Framework
  • Experimental Results

29
References
  • 1 http://web.cs.wpi.edu/cs525/f06s-EAR/cs525-homepage_files/LITERATURE/SIGMOD04-opt-shed-wisconsin.pdf
  • 2 http://se.uwaterloo.ca/tozsu/courses/cs856/F05/Presentations/Week8/Stream_Maryam.pdf