Title: DAX: Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries
1. DAX: Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries
Bin Liu, Yali Zhu, Mariana Jbantova, Brad Momberger, and Elke A. Rundensteiner
Department of Computer Science, Worcester Polytechnic Institute
100 Institute Road, Worcester, MA 01609
Tel: 1-508-831-5857, Fax: 1-508-831-5776
{binliu, yaliz, jbantova, bmombe, rundenst}@cs.wpi.edu
VLDB 2005 Demonstration
http://davis.wpi.edu/dsrg/CAPE/index.html
2. Uncertainties in Stream Query Processing
[Figure: end users register continuous queries with a Distributed Stream Query Engine, which consumes streaming data and delivers streaming results.]
- Queries: high workload; real-time and accurate responses required.
- Streaming data: may have time-varying rates and high volumes.
- Resources: the resources available for executing each operator may vary over time; memory and CPU are limited.
- Distribution and adaptation are therefore required.
3. Adaptation in Distributed Stream Processing
- Adaptation techniques
  - Spilling data to disk
  - Relocating work to other machines
  - Re-optimizing and migrating the query plan
- Granularity of adaptation
  - Operator-level distribution and adaptation
  - Partition-level distribution and adaptation
- Integrated methodologies
  - Consider the trade-offs between spill vs. redistribute
  - Consider the trade-offs between migrate vs. redistribute
4. System Overview [LZ05, TLJ05]
[Architecture diagram. Components: Query Processor (CAPE, the Continuous Query Processing Engine), Distribution Manager, Global Adaptation Controller, Global Plan Migrator, Runtime Monitor, Connection Manager, Data Distributor, Query Plan Manager, Repository, Local Adaptation Controller, Local Plan Migrator, Local Statistics Gatherer, Data Receiver. Streaming data arrives over the network from a Stream Generator; results reach the end user via an Application Server.]
5. Motivating Example
- Scalable real-time data processing systems
  [Figure: a Real-Time Data Integration Server consuming stock prices, volumes, ... and reviews, external reports, news, ...]
- Complex queries such as multi-joins are common!
- To produce as many results as possible at run-time (i.e., 9:00am-4:00pm)
  - Main-memory-based processing
- To require complete query results (i.e., for offline analysis after 4:00pm, or whenever possible)
  - Load shedding is not acceptable; operator state must temporarily spill to disk
6. Initial Distribution Algorithms: Random, Balanced, and Network-Aware
- Network-Aware Distribution. Goal: minimize network connectivity. Algorithm: take each query plan and create sub-plans in which neighbouring operators are grouped together.
- Balanced Distribution. Goal: equalize the workload per machine. Algorithm: iteratively take each query operator and place it on the query processor with the fewest operators (a sketch follows below).
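The following is a minimal sketch of the balanced placement heuristic described above, assuming load is measured simply as the number of operators already assigned to a machine; the function and variable names are illustrative, not the actual DAX/CAPE API.

```python
# Minimal sketch of the Balanced initial-distribution heuristic described above.
# Assumes load = number of operators per machine; names are illustrative only.

def assign_balanced(operators, machines):
    """Place each operator on the machine currently hosting the fewest operators."""
    placement = {m: [] for m in machines}
    for op in operators:
        least_loaded = min(machines, key=lambda m: len(placement[m]))
        placement[least_loaded].append(op)
    return placement

# Example: 5 operators over 2 machines
# -> {'M1': ['op1', 'op3', 'op5'], 'M2': ['op2', 'op4']}
print(assign_balanced(["op1", "op2", "op3", "op4", "op5"], ["M1", "M2"]))
```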
7. Initial Distribution Process
[Figure: Distribution Manager and processing machines M1 and M2.]
- Step 1: Create the distribution table using the initial distribution algorithm.
- Step 2: Send the distribution information to the processing machines (nodes).
8. Operator-level Adaptation: Redistribution
Cost per machine is determined as the percentage of memory filled with tuples.
[Figure: Statistics Table, current and desired Cost Tables (Balance), and Distribution Table; machine capacity 4500 tuples.]
- CAPE's cost model: number of tuples in memory and network output rate.
- Operators are redistributed based on the redistribution policy.
- CAPE's redistribution policies: Balance and Degradation (a sketch of a Balance-style decision follows below).
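Below is a rough illustration of how a Balance-style redistribution decision could be realized: the per-machine cost is the fraction of its tuple capacity in use (4500 tuples in the example above), and one operator is moved from the most-loaded to the least-loaded machine. The policy details and helper names are hypothetical simplifications, not the actual CAPE implementation.

```python
# Illustrative sketch of a Balance-style redistribution step.
# Cost per machine = tuples held / machine capacity (4500 tuples, as in the slide).

CAPACITY = 4500  # tuples per machine, from the example above

def machine_cost(tuples_per_op, ops_on_machine):
    return sum(tuples_per_op[op] for op in ops_on_machine) / CAPACITY

def balance_step(tuples_per_op, placement):
    """Move one operator from the most loaded to the least loaded machine."""
    costs = {m: machine_cost(tuples_per_op, ops) for m, ops in placement.items()}
    src = max(costs, key=costs.get)
    dst = min(costs, key=costs.get)
    if src == dst or not placement[src]:
        return None
    # pick the smallest operator on the overloaded machine to limit move cost
    op = min(placement[src], key=lambda o: tuples_per_op[o])
    placement[src].remove(op)
    placement[dst].append(op)
    return op, src, dst

placement = {"M1": ["op1", "op2"], "M2": ["op3"]}
tuples = {"op1": 3000, "op2": 1200, "op3": 500}
print(balance_step(tuples, placement), placement)  # op2 moves from M1 to M2
```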
9. Redistribution Protocol: Moving Operators Across Machines
10. Query Plan Performance with a Query Plan of 40 Operators
- Observations:
  - The initial distribution is important for query plan performance.
  - Redistribution improves query plan performance at run-time.
11. Operator-level Adaptation: Dynamic Plan Migration
- The last step of plan re-optimization: after the optimizer generates a new query plan, how do we replace the currently running plan with the new plan on the fly?
- A new challenge in streaming systems because of stateful operators.
- A unique feature of the DAX system.
- But can we just take out the old plan and plug in the new plan? Steps:
  - (1) Pause execution of the old plan
  - (2) Drain out all tuples inside the old plan
  - (3) Replace the old plan by the new plan
  - (4) Resume execution of the new plan
- Deadlock waiting problem. [Figure: steps (2)-(4) on a join plan over streams A, B, C with operators AB and BC.]
- Key observation: purging tuples from states relies on the processing of new tuples, so a paused old plan can never be fully drained.
12. Migration Strategy: Moving State
- Migration requirements: no missing results and no duplicates.
- Two migration boxes: one contains the old sub-plan, one contains the new sub-plan. The two sub-plans are semantically equivalent, with the same input and output queues. Migration is abstracted as replacing the old box with the new box.
- Basic idea: share common states between the two migration boxes.
- Key steps (see the sketch after the figure below):
  - Drain tuples in the old box.
  - State matching: each state in the old box has a unique ID; during rewriting, a new ID is given to each newly generated state in the new box; when rewriting is done, states are matched based on IDs.
  - State moving: between matched states.
  - What's left? Unmatched states in the new box and unmatched states in the old box.
[Figure: Old box with join operators AB (states SA, SB), BC (states SAB, SC), and CD (states SABC, SD); New box with join operators BC (states SB, SC), CD (states SBC, SD), and AB (states SA, SBCD); both boxes share the input queues QA, QB, QC, QD and the output queue QABCD.]
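The sketch below illustrates the state matching, moving, and recomputation steps described above in a few lines of Python. The data structures and the `recompute` helper are illustrative stand-ins, not the DAX operator state implementation.

```python
# Sketch of the moving-state steps above: match states between the old and new
# boxes by ID, move matched states directly, and recompute unmatched new states.

def migrate_states(old_states, new_states, recompute):
    """old_states/new_states: dicts mapping state ID -> state contents."""
    # 1. State matching: IDs present in both boxes
    matched = set(old_states) & set(new_states)

    # 2. State moving: copy matched state contents from the old box to the new box
    for sid in matched:
        new_states[sid] = old_states[sid]

    # 3. Unmatched new states (e.g. S_BC, S_BCD) are recomputed; we assume the
    #    iteration order here corresponds to bottom-up order in the new join tree
    for sid in sorted(set(new_states) - matched):
        new_states[sid] = recompute(sid, new_states)

    # 4. Unmatched old states (e.g. S_AB, S_ABC) are simply discarded after
    #    execution synchronization; nothing needs to be moved for them.
    return new_states

# Toy usage: S_A matches and moves; S_BC is recomputed by a stub function.
old = {"S_A": ["a1"], "S_AB": ["a1b1"]}
new = {"S_A": None, "S_BC": None}
print(migrate_states(old, new, lambda sid, states: f"recomputed {sid}"))
```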
13. Moving State: Unmatched States
- Unmatched new states (recomputation): recursively recompute unmatched states from the bottom up.
- Unmatched old states (execution synchronization): first clean the tuples accumulated in the box input queues; it is then safe to discard these unmatched states.
[Figure: the eight old/new combinations of sub-tuples from A, B, and C within window W, and a timeline with tuples a1, a2 (stream A), b1-b3 (stream B), and c1-c3 (stream C).]
14. Distributed Dynamic Migration Protocols (I)
[Figure: Distribution Manager with distribution table OP1 -> M1, OP2 -> M2, OP3 -> M1, OP4 -> M2; machines M1 and M2 each run operators op1-op4 over partitioned state.]
- The Distribution Manager initiates the migration (Migration Start).
- Migration stage: Execution Synchronization (a sketch follows below)
  - (1) The Distribution Manager requests a SyncTime from each machine.
  - (2) Each machine reports its local SyncTime.
  - (3) The Distribution Manager sends the global SyncTime to all machines.
  - (4) Each machine reports Execution Synced.
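A simplified sketch of the Execution Synchronization exchange in steps (1)-(4) follows. The class and method names are illustrative, and taking the maximum of the local sync times as the global sync time is an assumption, not a detail stated on the slide.

```python
# Simplified sketch of the Execution Synchronization stage (steps 1-4 above).

class Machine:
    def __init__(self, name, local_sync_time):
        self.name = name
        self._local_sync_time = local_sync_time

    def report_sync_time(self):            # replies to step (1) with step (2)
        return self._local_sync_time

    def apply_global_sync_time(self, t):   # step (3): receive global SyncTime
        self.sync_until = t
        return "ExecutionSynced"           # step (4): acknowledgement

class DistributionManager:
    def synchronize(self, machines):
        local_times = [m.report_sync_time() for m in machines]            # (1)+(2)
        global_time = max(local_times)                                     # assumed: latest local time wins
        acks = [m.apply_global_sync_time(global_time) for m in machines]   # (3)
        return global_time, all(a == "ExecutionSynced" for a in acks)      # (4)

dm = DistributionManager()
print(dm.synchronize([Machine("M1", 1020), Machine("M2", 1035)]))  # -> (1035, True)
```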
15. Distributed Dynamic Migration Protocols (II)
- Migration stage: Change Plan Shape
  - (5) The Distribution Manager sends the new sub-query plan to each machine.
  - (6) Each machine reports PlanChanged once its local plan shape is updated.
[Figure: on M1 and M2, the sub-plans containing op1 and op2 are restructured to the new plan shape with new state assignments; op3 and op4 are unchanged.]
16. Distributed Dynamic Migration Protocols (III)
- Migration stage: Fill States and Reactivate Operators (a sketch follows below)
  - (7) The Distribution Manager instructs each machine to fill the states it now owns (Fill States 3, 5 on one machine; Fill States 2, 4 on the other).
    - (7.1) Request state 4, (7.2) Move state 4, (7.3) Request state 2, (7.4) Move state 2: the machines request and move the needed states between each other.
  - (8) Each machine reports States Filled.
  - (9) The Distribution Manager asks the machines to reconnect the operators.
  - (10) Each machine reports Operator Reconnected.
  - (11) The Distribution Manager activates the operators (Activate op1, Activate op2).
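The following is a rough sketch of the Fill States step (7.1-7.4 above): a node pulls each operator state it now owns but does not yet hold from whichever peer currently has it. The `Node` class and function name are hypothetical.

```python
# Rough sketch of the Fill States sub-steps (7.1)-(7.4) above.

class Node:
    def __init__(self, name, states):
        self.name, self.states = name, dict(states)

def fill_states(node, needed_state_ids, peers):
    """Pull each missing state from whichever peer currently holds it."""
    for sid in needed_state_ids:
        for peer in peers:
            if sid in peer.states:
                node.states[sid] = peer.states.pop(sid)   # Request + Move state sid
                break
    return "StatesFilled"                                  # step (8) acknowledgement

m1 = Node("M1", {1: "S1", 2: "S2"})
m2 = Node("M2", {3: "S3", 4: "S4"})
print(fill_states(m1, [3], [m2]), m1.states)   # M1 pulls state 3 from M2
```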
17. From Operator-level to Partition-level
- Problem with operator-level adaptation:
  - Operators have large states.
  - Moving them across machines can be expensive.
- Solution: partition-level adaptation
  - Partition state-intensive operators [Gra90, SH03, LR05]
  - Distribute the partitioned plan over multiple machines
18. Partitioned Symmetric M-way Join
- Example query: join on A.A1 = B.B1 = C.C1
- The join is processed on two machines (a partitioning sketch follows below)
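Below is a minimal sketch of how partitioned processing of the 3-way equi-join above could work: tuples are hash-partitioned on the join key and each partition is assigned to one of the two machines, so all tuples sharing a join-key value land on the same machine. The partition count, partition-to-machine map, and function names are illustrative, not the actual DAX operator.

```python
# Minimal sketch of partitioned processing for the 3-way equi-join above.

NUM_PARTITIONS = 4
PARTITION_TO_MACHINE = {0: "M1", 1: "M1", 2: "M2", 3: "M2"}  # assumed mapping

def route(tuple_, key):
    """Decide which partition and machine should process this input tuple."""
    pid = hash(tuple_[key]) % NUM_PARTITIONS
    return pid, PARTITION_TO_MACHINE[pid]

# Tuples from A, B, and C with equal join-key values hash to the same partition,
# so all work for that key value lands on a single machine.
print(route({"A1": 42}, "A1"))   # e.g. (2, 'M2')
print(route({"B1": 42}, "B1"))   # same partition/machine as above
```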
19. Partition-level Adaptations
- (1) State relocation: addresses uneven workload among machines!
  - Relocated states remain active on the other machine
  - Overheads: monitoring and moving states across machines
- (2) State spill: the memory overflow problem still exists!
  - Push operator states temporarily to disk
  - Spilled operator states are temporarily inactive
20. Approaches: Lazy-Disk vs. Active-Disk
- Independent spill and relocation decisions (threshold checks are sketched below)
  - Distribution Manager: trigger state relocation if M_r < θ_r and t > λ_r (the minimum time span between relocations)
  - Query Processor: start state spill if Mem_u / Mem_all > θ_s
- Partitions on different machines may have different productivity
  - i.e., the most productive partitions on machine 1 may be less productive than the least productive ones on other machines
- Proposed technique: perform state spill globally
21. Performance Results of Lazy-Disk and Active-Disk Approaches
- Lazy-Disk vs. No-Relocation in a memory-constrained environment
  - Three machines: M1 (50), M2 (25), M3 (25). Input rate: 30ms; tuple range: 30K, inc.; join ratio: 2. State-spill memory threshold: 100M. State relocation: > 30M, memory threshold 80, Minspan 45s.
- Lazy-Disk vs. Active-Disk
  - Three machines. Input rate: 30ms; tuple range: 15K-45K. State-spill memory threshold: 80M. Average inc. join ratio: M1 (4), M2 (1), M3 (1). Maximal Force-Disk memory: 100M, ratio > 2. State relocation: > 30M, memory threshold 80, Minspan 45s.
22. Plan-Wide State Spill: Local Methods
- Direct extension of the single-operator solution
  - Update operator productivity values individually
  - Spill the partitions with the smallest Poutput/Psize values among all operators (a selection sketch follows below)
- Push states from the bottom operators first
  - Partition selection done randomly or using the local productivity value
  - Fewer intermediate results (states) stored -> reduces the number of state spills
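The following sketch illustrates the local productivity-based selection described above: partitions with the smallest Poutput/Psize ratio are spilled first until the requested fraction of state has been pushed to disk. The data layout and names are hypothetical.

```python
# Illustrative sketch of the local productivity-based spill policy above.

def choose_partitions_to_spill(partitions, fraction=0.30):
    """partitions: list of dicts with 'id', 'size' (Psize) and 'output' (Poutput)."""
    total = sum(p["size"] for p in partitions)
    target = fraction * total
    # least productive first: smallest output per unit of state kept in memory
    ranked = sorted(partitions, key=lambda p: p["output"] / p["size"])
    spilled, freed = [], 0
    for p in ranked:
        if freed >= target:
            break
        spilled.append(p["id"])
        freed += p["size"]
    return spilled

parts = [{"id": "p11", "size": 10, "output": 20},
         {"id": "p12", "size": 10, "output": 5},
         {"id": "p21", "size": 8,  "output": 2}]
print(choose_partitions_to_spill(parts, 0.30))   # -> ['p21', 'p12']
```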
23. Plan-Wide State Spill: Global Outputs
- Poutput: contribution to the final query output
- A lineage tracing algorithm updates the Poutput statistics:
  - Update the Poutput values of the partitions in Join3.
  - Apply Split2 to each tuple to find the corresponding partitions of Join2, and update their Poutput values.
  - Apply the same lineage tracing algorithm to intermediate results.
[Figure: a three-join plan over inputs A-E (SplitA/SplitB/SplitC -> Join1 -> Split1 -> Join2 with D via SplitD -> Split2 -> Join3 with E via SplitE), and the partitions p11, p12, p21, ..., p4j maintained by operators OP1-OP4; e.g., P11: Psize = 10, Poutput = 20; P12: Psize = 10, Poutput = 20.]
- Consider the intermediate result size:
  - Intermediate result factor Pinter
  - Global productivity value: Poutput / (Psize + Pinter) (a sketch follows below)
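Below is a small sketch of the global productivity metric Poutput / (Psize + Pinter) together with a toy lineage-tracing step that credits an output tuple back to the partitions that produced it via split functions. The split functions, dictionary layout, and names are illustrative assumptions.

```python
# Sketch of the global productivity metric and a toy lineage-tracing step.

def productivity(p_output, p_size, p_inter):
    """Global spill priority: output contribution per unit of state kept in memory,
    discounted by the intermediate results (Pinter) the partition feeds downstream."""
    return p_output / (p_size + p_inter)

def trace_output(tuple_, split_fns, poutput):
    """For each upstream operator, apply its split function to find which partition
    produced this result tuple, and increment that partition's Poutput counter."""
    for op, split in split_fns.items():
        pid = split(tuple_)
        poutput[(op, pid)] = poutput.get((op, pid), 0) + 1
    return poutput

# Toy usage: two operators whose split functions partition on different join columns.
split_fns = {"Join1": lambda t: hash(t["A1"]) % 4, "Join2": lambda t: hash(t["B1"]) % 4}
print(trace_output({"A1": 7, "B1": 9}, split_fns, {}))
print(productivity(p_output=20, p_size=10, p_inter=5))   # -> 1.33...
```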
24. Experimental Results for Plan-Wide Spill
- 300 partitions
- Memory threshold: 60MB
- Push 30% of states in each state spill
- Average tuple inter-arrival time: 50ms for each input
- Query with average join rates: Join1 = 1, Join2 = 3, Join3 = 3
- Query with average join rates: Join1 = 3, Join2 = 2, Join3 = 3
25. Backup Slides
26. Conclusions
- Theme: partitioning state-intensive operators
  - Low overhead
  - Resolves memory shortage
- Analyzing state adaptation performance and policies
  - State spill: slows down run-time throughput
  - State relocation: low overhead
  - Given sufficient main memory, state relocation helps run-time throughput
  - With insufficient main memory, Active-Disk improves run-time throughput
- Adapting multi-operator plans
  - Dependency among operators
  - Global throughput-oriented spill solutions improve throughput
27. Plan Shape Restructuring and Distributed Stream Processing
- New slides for Yali's migration and distribution ideas
28. Migration Strategy: Parallel Track
- Basic idea: execute both plans in parallel until the old box has expired, after which the old box is disconnected and the migration is over.
- Potential duplicates: both boxes generate all-new tuples.
- At the root operator in the old box: if both to-be-joined tuples have all-new sub-tuples, do not join (the new box already produces that result).
- At other operators in the old box: proceed as normal.
[Figure: the old and new migration boxes (as in the Moving State example) executing side by side, sharing input queues QA-QD and output queue QABCD.]
- Pros: migrates in a gradual fashion; still produces output even during migration (a sketch of the duplicate-avoidance check follows below).
- Cons: still relies on executing the old box to process tuples during the migration stage.
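The tiny sketch below captures the duplicate-avoidance rule above: at the root operator of the old box, a join is skipped when both inputs consist entirely of new sub-tuples, since the new box already produces that combination. The tuple representation (a list of old/new flags) is an illustrative simplification.

```python
# Tiny sketch of the Parallel Track duplicate-avoidance rule at the old box's root.

def should_join_at_old_root(left, right):
    """left/right: lists of 'old'/'new' flags for the sub-tuples being joined."""
    both_all_new = all(f == "new" for f in left) and all(f == "new" for f in right)
    return not both_all_new   # skip only when both sides are entirely new

print(should_join_at_old_root(["new", "old"], ["new"]))   # True: join as usual
print(should_join_at_old_root(["new", "new"], ["new"]))   # False: skip, avoid duplicate
```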
29. Cost Estimations for Moving State (MS) and Parallel Track (PT)
- Moving State:
  T_MS = T_match + T_move + T_recompute
  T_recompute(S_BC) + T_recompute(S_BCD) = λ_B λ_C W^2 (T_j + T_s σ_BC) + 2 λ_B λ_C λ_D W^3 (T_j σ_BC + T_s σ_BC σ_BCD)
- Parallel Track:
  T_PT = 2W, given enough system resources
[Figure: the old box (states SA, SB, SAB, SC, SABC, SD; operators AB, BC, CD) and the new box, with a timeline from TM-start through the 1st and 2nd windows W to TM-end, marking when old and new tuples expire.]
30. Experimental Results for Plan Migration
- Observations:
  - The results confirm the prior cost analysis.
  - The duration of Moving State is affected by the window size and arrival rates.
  - The duration of Parallel Track is 2W given enough system resources; otherwise it is also affected by system parameters such as window size and arrival rates.
31. Related Work on Distributed Continuous Query Processing
- [1] Medusa: M. Balazinska, H. Balakrishnan, and M. Stonebraker. Contract-based load management in federated distributed systems. In 1st NSDI, March 2004.
- [2] Aurora: M. Cherniack, H. Balakrishnan, M. Balazinska, et al. Scalable distributed stream processing. In CIDR, 2003.
- [3] Borealis: The Borealis Team. The design of the Borealis Stream Processing Engine. Technical Report, Brown University, CS Department, August 2004.
- [4] Flux: M. Shah, J. Hellerstein, S. Chandrasekaran, and M. Franklin. Flux: An adaptive partitioning operator for continuous query systems. In ICDE, pages 25-36, 2003.
- [5] Distributed Eddies: F. Tian and D. DeWitt. Tuple routing strategies for distributed Eddies. In VLDB, Berlin, Germany, 2003.
32. Related Work on Partitioned Processing
- Non-state-intensive queries [BB02, AC03, GT03]
- State-intensive operators (run-time memory shortage)
  - Operator-level adaptation [CB03, SLJ05, XZH05]
  - This work: fine-grained, state-level adaptation (adapts partial states)
- Load shedding [TUZC03]
  - Drops input tuples to handle resource shortage
  - This work requires complete query results (no load shedding)
- XJoin [UF00] and Hash-Merge Join [MLA04]
  - Only spill states of one single operator in centralized environments
  - This work integrates both spill and relocation in distributed environments and investigates the dependency problem for multiple operators
- Flux [SH03]
  - Adapts the states of one single-input operator across machines
  - This work handles multi-input operators and integrates both state spill and state relocation
33. CAPE Publications and Reports
- [RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech. CAPE: A Constraint-Aware Adaptive Stream Processing Engine. Invited book chapter, http://www.cs.uno.edu/nauman/streamBook/, July 2004.
- [ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman. Dynamic Plan Migration for Continuous Queries Over Data Streams. SIGMOD 2004, pages 431-442.
- [DMR04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman. Joining Punctuated Streams. EDBT 2004, pages 587-604.
- [DR04] L. Ding and E. A. Rundensteiner. Evaluating Window Joins over Punctuated Streams. CIKM 2004, to appear.
- [DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman. MJoin: A Metadata-Aware Stream Join Operator. DEBS 2003.
- [RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. Demonstration paper, VLDB 2004.
- [SR04] T. Sutherland and E. A. Rundensteiner. D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture. Tech Report WPI-CS-TR-04-18, 2004.
- [SPR04] T. Sutherland, B. Pielech, Y. Zhu, L. Ding and E. A. Rundensteiner. Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing. IDEAS 2005.
- [SLJR05] T. Sutherland, B. Liu, M. Jbantova and E. A. Rundensteiner. D-CAPE: Distributed and Self-Tuned Continuous Query Processing. CIKM, Bremen, Germany, Nov. 2005.
- [LR05] B. Liu and E. A. Rundensteiner. Revisiting Pipelined Parallelism in Multi-Join Query Processing. VLDB 2005.
- [B05] B. Liu and E. A. Rundensteiner. Partition-based Adaptation Strategies: Integrating Spill and Relocation. Tech Report WPI-CS-TR-05, 2005 (in submission).
- CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html
34. CAPE Engine
- Constraint-aware: exploit semantic constraints such as sliding windows and punctuations to reduce resource usage and improve response time.
- Adaptive: incorporate heterogeneous-grained adaptivity at all query processing levels:
  - Adaptive query operator execution
  - Adaptive query plan re-optimization
  - Adaptive operator scheduling
  - Adaptive query plan distribution
- Continuous Query Processing Engine: process queries in a real-time manner by employing well-coordinated, heterogeneous-grained adaptations.
35. Analyzing Adaptation Performance
- Questions addressed:
  - Partitioned parallel processing (resolves memory shortage)
    - Should we partition non-memory-intensive queries?
    - How effective is partitioning memory-intensive queries?
  - State spill (known problem: slows down run-time throughput)
    - How many states to push?
    - Which states to push?
    - How to combine memory and disk states to produce complete results?
  - State relocation (known asset: low overhead)
    - When (how often) to trigger state relocation?
    - Is state relocation an expensive process?
    - How to coordinate state moving without losing data states?
- Analyzing state adaptation performance and policies:
  - Given sufficient main memory, state relocation helps run-time throughput
  - With insufficient main memory, Active-Disk improves run-time throughput
- Adapting multi-operator plans
36. Percentage Spilled per Adaptation
- Amount of state pushed in each adaptation = percentage of tuples pushed / total number of tuples
[Figures: run-time query throughput and run-time main memory usage.]
- (Input rate: 30ms per input, tuple range: 30K, join ratio: 3, adaptation threshold: 200MB)