Dynamic Plan Migration for Continuous Query over Data Streams

1
Dynamic Plan Migration for Continuous Query over Data Streams
  • Yali Zhu, Elke Rundensteiner and George Heineman
  • Database System Research Group, WPI.
  • Massachusetts, USA
  • SIGMOD 2004

Research partly supported by the RDC grant
2003-04 on On-line Stream Monitoring Systems:
Untethered Healthcare, Intrusion Detection, and
Beyond.
2
Stream Query Optimization
  • Differences from traditional query optimization?

3
Stream Query Optimization
  • New classes of operators (windows) may mean new
    rewrites
  • New execution modes (continuous/pipelining)
  • More dynamic fluctuations in statistics → compile-time
    optimization not possible
  • Global optimization not practical for huge query
    networks → adaptive optimization
  • Other cost models taking memory into account
  • Query optimization and load shedding

4
Motivation of Query Migration
  • Continuous query over streams
  • Statistics unknown before start
  • Statistics changing during execution
  • Stream rates, arrival patterns, distributions, etc.
  • Need for dynamic adaptation
  • Plan re-optimization
  • Change the shape of query plan tree

5
Run-time Plan Re-Optimization
  • Step 1: Decide when to optimize
  • Statistics monitoring
  • Step 2: Generate new query plan
  • Query optimization
  • Step 3: Replace current plan by new plan
  • Plan migration

6
Naïve Plan Migration Strategy
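[Figure: old plan and new plan over input streams A, B, and C; the two plans apply the joins AB and BC in opposite orders.]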
  • Migration Steps
  • Pause execution of old plan
  • Drain out all tuples inside old plan
  • Replace old plan by new plan
  • Resume execution of new plan

Problem: Works for stateless operators only
7
Stateful Operator in CQ
  • Why stateful?
  • Need non-blocking operators in CQ
  • Operator needs to output partial results
  • State data structure keeps received tuples

Example: Symmetric NL join with window constraints
[Figure: join AB over input streams A and B, with State A and State B; example tuples ax and b1 to b5, with outputs ax b2 and ax b3.]
Key Observation: The purge of tuples in states
relies on processing of new tuples.
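
The key observation above can be illustrated with a minimal sketch of a symmetric nested-loop window join. This is not the CAPE operator: the class name `SymmetricWindowJoin`, its `process` method, and the purge-probe-insert ordering are assumptions made for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SymmetricWindowJoin:
    """Illustrative symmetric nested-loop join with a time window (hypothetical API)."""

    def __init__(self, window, predicate):
        self.window = window        # window size W, in timestamp units
        self.predicate = predicate  # join condition on (a_value, b_value)
        self.state = {"A": deque(), "B": deque()}  # State A and State B

    def process(self, side, ts, value):
        """Handle one arriving tuple: purge, probe, then insert."""
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # Purge: expired tuples leave the opposite state only at this point,
        # so purging is driven entirely by the arrival of new tuples.
        while opposite and ts - opposite[0][0] > self.window:
            opposite.popleft()
        # Probe: compare the new tuple with every tuple left in the opposite state.
        results = []
        for _, other_value in opposite:
            a, b = (value, other_value) if side == "A" else (other_value, value)
            if self.predicate(a, b):
                results.append((a, b))
        # Insert: keep the new tuple so later arrivals can join with it.
        self.state[side].append((ts, value))
        return results


join = SymmetricWindowJoin(window=10, predicate=lambda a, b: a == b)
join.process("B", 1, "x")          # stored in State B
print(join.process("A", 5, "x"))   # joins with the stored B tuple
print(join.process("A", 20, "x"))  # the B tuple is purged first, so no result
```

If no new tuple ever arrives on A, the tuple in State B is never purged, which is why pausing the plan prevents its states from draining.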
8
Naïve Migration Strategy Revisited
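[Figure: plan with joins AB and BC over input streams A, B, and C, annotated with steps (2) to (4) listed here.]
Deadlock Waiting Problem
  • Steps
  • (1) Pause execution of old plan
  • (2) Drain out all tuples inside old plan
  • (3) Replace old plan by new plan
  • (4) Resume execution of new plan

Problem: Step (2) cannot complete for stateful operators. States are purged only by processing new tuples, and no new tuples are processed while the plan is paused.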
9
Problem Definition
  • Dynamic Plan Migration
  • Input: two migration boxes
  • One contains old plan
  • One contains new plan
  • Both have the same input and output queues
  • Result
  • Old box is replaced by new box
  • Valid Migration
  • No missing tuples
  • No duplicates
  • Key points
  • Involved plans contain stateful operators
  • Need to migrate yet still retain useful states
    and discard useless states

10
State of the Art
  • Efficient mid-query re-optimization of
    sub-optimal query execution plans
  • Kabra, DeWitt 1998
  • Only migrates unprocessed portion
  • Query plan competing model
  • Ioannidis, Ng, et al. 1992; Graefe, Cole 1994
  • Generate several candidate query plans before
    start
  • Execute all, choose one after a while

11
Outline
  • Problem Motivation and Definition
  • Dynamic Migration Strategies
  • Moving State Strategy
  • Parallel Track Strategy
  • Experimental Results

12
Moving State Strategy
  • Basic idea
  • Share common states between two migration boxes
  • Key steps
  • State Matching
  • Match states based on IDs.
  • State Moving
  • Create new pointers for matched states in new box
  • What's left?
  • Unmatched states in new box

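[Figure: Old Box and New Box, each with input queues QA, QB, QC, QD and output queue QABCD. Old Box: operators AB, BC, CD with states SA, SB, SAB, SC, SABC, SD. New Box: operators BC, CD, AB with states SB, SC, SBC, SD, SA, SBCD.]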
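
State matching and state moving can be sketched as follows. This is only an illustration under stated assumptions: states are modelled as Python lists keyed by their ID strings, and the function name `match_and_move` is hypothetical. The state IDs are the ones that appear in the slide's figure.

```python
def match_and_move(old_states, new_state_ids):
    """Share matched states by reference and report the unmatched state IDs."""
    new_states = {}
    unmatched = []
    for state_id in new_state_ids:
        if state_id in old_states:
            # State moving: the new box points at the same object, nothing is copied.
            new_states[state_id] = old_states[state_id]
        else:
            # No state with this ID exists in the old box.
            new_states[state_id] = []
            unmatched.append(state_id)
    return new_states, unmatched


# Old box states (contents are placeholder strings).
old_states = {
    "SA": ["a1"], "SB": ["b1"], "SC": ["c1"], "SD": ["d1"],
    "SAB": ["a1b1"], "SABC": ["a1b1c1"],
}
new_states, unmatched = match_and_move(
    old_states, ["SB", "SC", "SBC", "SD", "SBCD", "SA"]
)
print(unmatched)                              # ['SBC', 'SBCD']
print(new_states["SA"] is old_states["SA"])   # True
```

Because matched states are shared rather than copied, the cost of this step depends on the number of states, not on the number of tuples they hold.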
13
Unmatched States
  • State Recomputing
  • Recursively recompute unmatched SBC and SBCD from
    bottom up
  • Why always possible?
  • Old and new boxes have same input queues
  • The states associated with input queues always
    match
  • Why necessary?

[Figure: New Box with input queues QA, QB, QC, QD, output queue QABCD, operators BC, CD, AB, and states SB, SC, SBC, SD, SA, SBCD.]
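
A minimal sketch of bottom-up recomputation, under several assumptions that are not from the slides: tuples are dicts mapping a stream name to a value, `plan` is a hypothetical mapping from each unmatched state to the two child states that feed it, `same_key` is a stand-in equi-join predicate, and window constraints are ignored for brevity.

```python
def same_key(left, right):
    """Stand-in join predicate: all sub-tuple values must be equal."""
    return len(set(left.values()) | set(right.values())) == 1


def recompute(state_id, states, plan, predicate):
    """Return the contents of state_id, recomputing it from its children if missing."""
    if state_id in states:  # matched state, or one that was already recomputed
        return states[state_id]
    left_id, right_id = plan[state_id]
    left = recompute(left_id, states, plan, predicate)
    right = recompute(right_id, states, plan, predicate)
    # Join the two child states; a result tuple merges the sub-tuples of both sides.
    states[state_id] = [
        {**l, **r} for l in left for r in right if predicate(l, r)
    ]
    return states[state_id]


# Matched states shared with the old box (input-level states always match).
states = {
    "SA": [{"A": 1}],
    "SB": [{"B": 1}, {"B": 2}],
    "SC": [{"C": 1}],
    "SD": [{"D": 1}],
}
# Unmatched states in the new box and the child states they are built from.
plan = {"SBC": ("SB", "SC"), "SBCD": ("SBC", "SD")}

print(recompute("SBCD", states, plan, same_key))  # [{'B': 1, 'C': 1, 'D': 1}]
print(states["SBC"])                              # [{'B': 1, 'C': 1}]
```

The recursion always terminates at the input-level states, which is why recomputation is always possible when both boxes share the same input queues.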
14
Terms on Tuples
  • New/Old tuples
  • Old tuples: already in old box
    when migration starts
  • New tuples: do not exist in old box
    when migration starts
  • Sub-tuples
  • Tuple ABCD is the result of joining tuples A, B, C, and D
  • Tuples A, B, C, and D are sub-tuples of tuple ABCD
  • Tuple ABCD has 2^4 = 16 possible combinations of
    old/new sub-tuples

[Figure: Old Box with input queues QA, QB, QC, QD, output queue QABCD, operators AB, BC, CD, and states SA, SB, SAB, SC, SABC, SD.]
15
Why Recompute Unmatched States
  • To get the complete results of ABCD, we need all
    16 old/new combinations

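[Figure: New Box (operators BC, CD, AB; states SB, SC, SBC, SD, SA, SBCD; input queues QA, QB, QC, QD) with example combinations of old and new sub-tuples A, B, C, D. Legend: old tuple, new tuple.]
If SBC is not recomputed, we will miss results with
both B and C as old.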
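
The counting argument can be checked with a few lines of Python. The names `STREAMS`, `combinations`, and `missed_without_sbc` are illustrative only.

```python
from itertools import product

STREAMS = ("A", "B", "C", "D")

# Every way of labelling the four sub-tuples of ABCD as old or new.
combinations = [
    dict(zip(STREAMS, flags))
    for flags in product(("old", "new"), repeat=len(STREAMS))
]

# Results that need an SBC entry built from an old B and an old C.
missed_without_sbc = [
    c for c in combinations if c["B"] == "old" and c["C"] == "old"
]

print(len(combinations))        # 16
print(len(missed_without_sbc))  # 4
```

Four of the 16 cases pair an old B with an old C, and those are the cases that depend on SBC being recomputed.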
16
Cost Estimation of MS Migration
  • Cost of MS consists of
  • Cost of state matching
  • ID comparison (negligible)
  • Cost of state moving
  • Create pointers (negligible)
  • Cost of state recomputing
  • Majority of cost
  • Affecting parameters
  • Operator selectivities
  • # of tuples in states
  • Estimated as (input rate x window size)
  • See paper for detailed cost models

One cost model conclusion: Cost of MS has a
polynomial relation to window size
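
The state-size estimate can be written down directly. The second function is not the paper's cost model: it is a simplified assumption (a nested-loop recompute of one state from two input-level states, counting only tuple comparisons) that shows how a polynomial dependence on window size can arise. Both function names are hypothetical.

```python
def state_size(input_rate, window):
    """Estimated # of tuples in an input-level state: input rate x window size."""
    return input_rate * window


def nl_recompute_comparisons(rate_left, rate_right, window):
    """Comparisons to rebuild one state from two input-level states.

    Simplifying assumption: a nested-loop join compares every tuple of the
    left state with every tuple of the right state.
    """
    return state_size(rate_left, window) * state_size(rate_right, window)


print(state_size(10, 5))                     # 50
print(nl_recompute_comparisons(10, 10, 5))   # 2500
print(nl_recompute_comparisons(10, 10, 10))  # 10000
```

Under this assumption, doubling the window size quadruples the comparison count.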
17
MS Migration Pros and Cons
  • Pros
  • Fast when # of tuples in states is small
  • Low input rates, low selectivity or small window
  • Cons
  • Output silence during entire migration stage
  • Can the query produce output even during migration?
  • Motivation for Parallel Track Strategy

18
Parallel Track Strategy
  • Basic idea
  • Execute both plans in parallel and gradually
    push old tuples out of old box by purging
  • Key steps
  • Connect boxes
  • Execute in parallel
  • Until old box expires (no old tuple or
    sub-tuple remains)
  • Disconnect old box
  • Start executing new box only

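[Figure: Old Box (operators AB, BC, CD; states SA, SB, SAB, SC, SABC, SD) and New Box (operators BC, CD, AB; states SB, SC, SBC, SD, SA, SBCD), with input queues QA, QB, QC, QD and output queue QABCD.]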
19
Potential Duplicates
Duplicate Prevention
  • Tuple ABCD
  • 2^4 = 16 possible old/new sub-tuple combinations
  • The same case must not be generated by both boxes
  • Otherwise we may have duplicates
  • In new box
  • All states start empty
  • Only generates ABCD as (new,new,new,new)
  • In old box
  • May generate all 16 cases
  • Would duplicate the case of (new,new,new,new)

At root op in old box: If both to-be-joined
tuples have all-new sub-tuples, don't join.
Other ops in old box: Proceed as normal.
[Figure: Old Box with input queues QA, QB, QC, QD, output queue QABCD, operators AB, BC, CD, and states SA, SB, SAB, SC, SABC, SD.]
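
The root-operator rule, together with the "old box expired" condition from the Parallel Track key steps, can be sketched as below. The representation is an assumption: each tuple is a dict that maps a sub-tuple's stream name to an "old" or "new" flag, and the function names are hypothetical.

```python
def all_new(tup):
    """True when every sub-tuple of tup arrived after migration started."""
    return all(flag == "new" for flag in tup.values())


def root_should_join(left, right):
    """Root op of the old box: skip the join when both inputs are all-new.

    The new box already produces the (new,new,new,new) case, so producing it
    in the old box as well would create a duplicate.
    """
    return not (all_new(left) and all_new(right))


def old_box_expired(states):
    """The old box can be disconnected once no state holds an old tuple or sub-tuple."""
    return all(all_new(tup) for state in states.values() for tup in state)


abc_new = {"A": "new", "B": "new", "C": "new"}
abc_mixed = {"A": "old", "B": "new", "C": "new"}
d_new = {"D": "new"}

print(root_should_join(abc_new, d_new))    # False: left to the new box
print(root_should_join(abc_mixed, d_new))  # True: only the old box can produce it
print(old_box_expired({"SABC": [abc_mixed], "SD": [d_new]}))  # False
```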
20
Estimation of PT Migration
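[Figure: timeline T for the Old Box (operators AB, BC, CD; states SA, SB, SAB, SC, SABC, SD; input queues QA, QB, QC, QD) from TM-start to TM-end. The states hold old tuples at TM-start; new tuples arrive during the 1st W and the 2nd W.]
Estimation formula: T_PT = 2W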
21
PT Migration Duration
  • Given enough system computing resources
  • New tuples processed right away
  • PT migration duration = 2W
  • If not enough system resources
  • New tuples accumulate in queues
  • PT migration duration > 2W

22
Cost Estimation of PT Migration
  • Cost of PT
  • Cost of processing 2W of tuples in old box
  • Cost of processing 2W of tuples in new box
  • Parameters
  • Input rates, window size, selectivity
  • Similar to MS strategy

23
PT Migration Pros and Cons
  • Pros
  • Keeps producing results even during migration
  • No results during MS migration
  • Cons
  • Migration duration is at least 2W
  • MS may be faster depending on # of tuples in states

24
Outline
  • Problem Definition and Motivation
  • Dynamic Migration Strategies
  • Moving State Strategy
  • Parallel Track Strategy
  • Experimental Results

25
Experimental Setup
  • Embedded in the CAPE system
  • CAPE: Continuous Adaptive Processing Engine
  • A streaming query engine developed at DSRG, WPI
  • VLDB 2004 demo
  • Layers of adaptations
  • Punctuation exploring
  • Adaptive scheduling
  • Query migration
  • Dynamic distribution
  • Input streams
  • Generated by the stream generator of CAPE
  • Poisson arrival pattern
  • Experiments on migration duration
  • Vary window size
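
A Poisson arrival pattern can be produced by drawing exponentially distributed gaps between consecutive tuples. The sketch below is not CAPE's stream generator; the function `poisson_arrivals` and its parameters are assumptions for illustration.

```python
import random


def poisson_arrivals(rate, duration, seed=None):
    """Return arrival timestamps of a Poisson process with the given rate.

    Inter-arrival gaps are exponentially distributed with mean 1 / rate.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while True:
        t += rng.expovariate(rate)
        if t > duration:
            return arrivals
        arrivals.append(t)


# About rate * duration tuples are expected; the exact count varies per run.
timestamps = poisson_arrivals(rate=5.0, duration=1000.0, seed=1)
print(len(timestamps))
```

Fixing `seed` makes a run repeatable, which helps when comparing migration strategies on the same input.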

26
Migration Duration vs. Window Size
27
Conclusions
  • Identify problem of migration for stateful
    operators
  • First solutions for continuous query migration
  • Moving state strategy
  • Parallel track strategy
  • Embed both strategies into stream system
  • Cost model and experimental evaluation
  • Cost model confirmed by experiments
  • Identify performance trade-off of the two
    strategies

28
Thank You
  • For more information, check the CAPE website at
  • http://davis.wpi.edu/dsrg/CAPE/