Title: DAX: Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries
1. DAX: Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries
Bin Liu, Yali Zhu, Mariana Jbantova, Brad Momberger, and Elke A. Rundensteiner
Department of Computer Science, Worcester Polytechnic Institute
100 Institute Road, Worcester, MA 01609
Tel: 1-508-831-5857, Fax: 1-508-831-5776
{binliu, yaliz, jbantova, bmombe, rundenst}@cs.wpi.edu
VLDB 2005 Demonstration
http://davis.wpi.edu/dsrg/CAPE/index.html
2. Uncertainties in Stream Query Processing
[Figure: end users register continuous queries with a Distributed Stream Query Engine, which consumes streaming data and delivers streaming results.]
- Queries: high workload; real-time and accurate responses required.
- Streaming data: may have time-varying rates and high volumes.
- Resources: the resources available for executing each operator may vary over time; memory and CPU are limited.
- Distribution and adaptation are therefore required.
3. Adaptation in Distributed Stream Processing
- Adaptation techniques
  - Spilling data to disk
  - Relocating work to other machines
  - Re-optimizing and migrating the query plan
- Granularity of adaptation
  - Operator-level distribution and adaptation
  - Partition-level distribution and adaptation
- Integrated methodologies
  - Consider the trade-offs between spill vs. redistribute
  - Consider the trade-offs between migrate vs. redistribute
4. System Overview [LZ05, TLJ05]
[Architecture diagram. Components: Query Processor (CAPE, the Continuous Query Processing Engine), Distribution Manager, Global Adaptation Controller, Global Plan Migrator, Runtime Monitor, Connection Manager, Data Distributor, Query Plan Manager, Repository, Local Adaptation Controller, Local Plan Migrator, Local Statistics Gatherer, Data Receiver. Streaming data arrives over the network from a Stream Generator; results reach the end user via an Application Server.]
5. Motivating Example
- Scalable real-time data processing systems
  [Figure: a Real-Time Data Integration Server consuming stock prices, volumes, ... and reviews, external reports, news, ...]
- Complex queries such as multi-joins are common!
- To produce as many results as possible at run-time (i.e., 9:00am-4:00pm)
  - Main-memory-based processing
- To require complete query results (i.e., for offline analysis after 4:00pm, or whenever possible)
  - Load shedding is not acceptable; operator state must temporarily spill to disk
6. Initial Distribution Algorithms: Random, Balanced, and Network-Aware
- Network-Aware Distribution. Goal: minimize network connectivity. Algorithm: take each query plan and create sub-plans in which neighbouring operators are grouped together.
- Balanced Distribution. Goal: equalize the workload per machine. Algorithm: iteratively take each query operator and place it on the query processor with the fewest operators (a sketch follows below).
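The following is a minimal sketch of the balanced placement heuristic described above, assuming load is measured simply as the number of operators already assigned to a machine; the function and variable names are illustrative, not the actual DAX/CAPE API.

```python
# Minimal sketch of the Balanced initial-distribution heuristic described above.
# Assumes load = number of operators per machine; names are illustrative only.

def assign_balanced(operators, machines):
    """Place each operator on the machine currently hosting the fewest operators."""
    placement = {m: [] for m in machines}
    for op in operators:
        least_loaded = min(machines, key=lambda m: len(placement[m]))
        placement[least_loaded].append(op)
    return placement

# Example: 5 operators over 2 machines
# -> {'M1': ['op1', 'op3', 'op5'], 'M2': ['op2', 'op4']}
print(assign_balanced(["op1", "op2", "op3", "op4", "op5"], ["M1", "M2"]))
```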
7. Initial Distribution Process
[Figure: Distribution Manager and processing machines M1 and M2.]
- Step 1: Create the distribution table using the initial distribution algorithm.
- Step 2: Send the distribution information to the processing machines (nodes).
8. Operator-level Adaptation: Redistribution
Cost per machine is determined as the percentage of memory filled with tuples.
[Figure: Statistics Table, current and desired Cost Tables (Balance), and Distribution Table; machine capacity 4500 tuples.]
- CAPE's cost model: number of tuples in memory and network output rate.
- Operators are redistributed based on the redistribution policy.
- CAPE's redistribution policies: Balance and Degradation (a sketch of a Balance-style decision follows below).
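Below is a rough illustration of how a Balance-style redistribution decision could be realized: the per-machine cost is the fraction of its tuple capacity in use (4500 tuples in the example above), and one operator is moved from the most-loaded to the least-loaded machine. The policy details and helper names are hypothetical simplifications, not the actual CAPE implementation.

```python
# Illustrative sketch of a Balance-style redistribution step.
# Cost per machine = tuples held / machine capacity (4500 tuples, as in the slide).

CAPACITY = 4500  # tuples per machine, from the example above

def machine_cost(tuples_per_op, ops_on_machine):
    return sum(tuples_per_op[op] for op in ops_on_machine) / CAPACITY

def balance_step(tuples_per_op, placement):
    """Move one operator from the most loaded to the least loaded machine."""
    costs = {m: machine_cost(tuples_per_op, ops) for m, ops in placement.items()}
    src = max(costs, key=costs.get)
    dst = min(costs, key=costs.get)
    if src == dst or not placement[src]:
        return None
    # pick the smallest operator on the overloaded machine to limit move cost
    op = min(placement[src], key=lambda o: tuples_per_op[o])
    placement[src].remove(op)
    placement[dst].append(op)
    return op, src, dst

placement = {"M1": ["op1", "op2"], "M2": ["op3"]}
tuples = {"op1": 3000, "op2": 1200, "op3": 500}
print(balance_step(tuples, placement), placement)  # op2 moves from M1 to M2
```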
9. Redistribution Protocol: Moving Operators Across Machines
10. Query Plan Performance with a Query Plan of 40 Operators
- Observations:
  - The initial distribution is important for query plan performance.
  - Redistribution improves query plan performance at run-time.
11. Operator-level Adaptation: Dynamic Plan Migration
- The last step of plan re-optimization: after the optimizer generates a new query plan, how do we replace the currently running plan with the new plan on the fly?
- A new challenge in streaming systems because of stateful operators.
- A unique feature of the DAX system.
- But can we just take out the old plan and plug in the new plan? Steps:
  - (1) Pause execution of the old plan
  - (2) Drain out all tuples inside the old plan
  - (3) Replace the old plan by the new plan
  - (4) Resume execution of the new plan
- Deadlock waiting problem. [Figure: steps (2)-(4) on a join plan over streams A, B, C with operators AB and BC.]
- Key observation: purging tuples from states relies on the processing of new tuples, so a paused old plan can never be fully drained.
12. Migration Strategy: Moving State
- Migration requirements: no missing results and no duplicates.
- Two migration boxes: one contains the old sub-plan, one contains the new sub-plan. The two sub-plans are semantically equivalent, with the same input and output queues. Migration is abstracted as replacing the old box with the new box.
- Basic idea: share common states between the two migration boxes.
- Key steps (see the sketch after the figure below):
  - Drain tuples in the old box.
  - State matching: each state in the old box has a unique ID; during rewriting, a new ID is given to each newly generated state in the new box; when rewriting is done, states are matched based on IDs.
  - State moving: between matched states.
  - What's left? Unmatched states in the new box and unmatched states in the old box.
[Figure: Old box with join operators AB (states SA, SB), BC (states SAB, SC), and CD (states SABC, SD); New box with join operators BC (states SB, SC), CD (states SBC, SD), and AB (states SA, SBCD); both boxes share the input queues QA, QB, QC, QD and the output queue QABCD.]
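The sketch below illustrates the state matching, moving, and recomputation steps described above in a few lines of Python. The data structures and the `recompute` helper are illustrative stand-ins, not the DAX operator state implementation.

```python
# Sketch of the moving-state steps above: match states between the old and new
# boxes by ID, move matched states directly, and recompute unmatched new states.

def migrate_states(old_states, new_states, recompute):
    """old_states/new_states: dicts mapping state ID -> state contents."""
    # 1. State matching: IDs present in both boxes
    matched = set(old_states) & set(new_states)

    # 2. State moving: copy matched state contents from the old box to the new box
    for sid in matched:
        new_states[sid] = old_states[sid]

    # 3. Unmatched new states (e.g. S_BC, S_BCD) are recomputed; we assume the
    #    iteration order here corresponds to bottom-up order in the new join tree
    for sid in sorted(set(new_states) - matched):
        new_states[sid] = recompute(sid, new_states)

    # 4. Unmatched old states (e.g. S_AB, S_ABC) are simply discarded after
    #    execution synchronization; nothing needs to be moved for them.
    return new_states

# Toy usage: S_A matches and moves; S_BC is recomputed by a stub function.
old = {"S_A": ["a1"], "S_AB": ["a1b1"]}
new = {"S_A": None, "S_BC": None}
print(migrate_states(old, new, lambda sid, states: f"recomputed {sid}"))
```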
13. Moving State: Unmatched States
- Unmatched new states (recomputation): recursively recompute unmatched states from the bottom up.
- Unmatched old states (execution synchronization): first clean the tuples accumulated in the box input queues; it is then safe to discard these unmatched states.
[Figure: the eight old/new combinations of sub-tuples from A, B, and C within window W, and a timeline with tuples a1, a2 (stream A), b1-b3 (stream B), and c1-c3 (stream C).]
14. Distributed Dynamic Migration Protocols (I)
[Figure: Distribution Manager with distribution table OP1 -> M1, OP2 -> M2, OP3 -> M1, OP4 -> M2; machines M1 and M2 each run operators op1-op4 over partitioned state.]
- The Distribution Manager initiates the migration (Migration Start).
- Migration stage: Execution Synchronization (a sketch follows below)
  - (1) The Distribution Manager requests a SyncTime from each machine.
  - (2) Each machine reports its local SyncTime.
  - (3) The Distribution Manager sends the global SyncTime to all machines.
  - (4) Each machine reports Execution Synced.
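A simplified sketch of the Execution Synchronization exchange in steps (1)-(4) follows. The class and method names are illustrative, and taking the maximum of the local sync times as the global sync time is an assumption, not a detail stated on the slide.

```python
# Simplified sketch of the Execution Synchronization stage (steps 1-4 above).

class Machine:
    def __init__(self, name, local_sync_time):
        self.name = name
        self._local_sync_time = local_sync_time

    def report_sync_time(self):            # replies to step (1) with step (2)
        return self._local_sync_time

    def apply_global_sync_time(self, t):   # step (3): receive global SyncTime
        self.sync_until = t
        return "ExecutionSynced"           # step (4): acknowledgement

class DistributionManager:
    def synchronize(self, machines):
        local_times = [m.report_sync_time() for m in machines]            # (1)+(2)
        global_time = max(local_times)                                     # assumed: latest local time wins
        acks = [m.apply_global_sync_time(global_time) for m in machines]   # (3)
        return global_time, all(a == "ExecutionSynced" for a in acks)      # (4)

dm = DistributionManager()
print(dm.synchronize([Machine("M1", 1020), Machine("M2", 1035)]))  # -> (1035, True)
```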
15. Distributed Dynamic Migration Protocols (II)
- Migration stage: Change Plan Shape
  - (5) The Distribution Manager sends the new sub-query plan to each machine.
  - (6) Each machine reports PlanChanged once its local plan shape is updated.
[Figure: on M1 and M2, the sub-plans containing op1 and op2 are restructured to the new plan shape with new state assignments; op3 and op4 are unchanged.]
16. Distributed Dynamic Migration Protocols (III)
- Migration stage: Fill States and Reactivate Operators (a sketch follows below)
  - (7) The Distribution Manager instructs each machine to fill the states it now owns (Fill States 3, 5 on one machine; Fill States 2, 4 on the other).
    - (7.1) Request state 4, (7.2) Move state 4, (7.3) Request state 2, (7.4) Move state 2: the machines request and move the needed states between each other.
  - (8) Each machine reports States Filled.
  - (9) The Distribution Manager asks the machines to reconnect the operators.
  - (10) Each machine reports Operator Reconnected.
  - (11) The Distribution Manager activates the operators (Activate op1, Activate op2).
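The following is a rough sketch of the Fill States step (7.1-7.4 above): a node pulls each operator state it now owns but does not yet hold from whichever peer currently has it. The `Node` class and function name are hypothetical.

```python
# Rough sketch of the Fill States sub-steps (7.1)-(7.4) above.

class Node:
    def __init__(self, name, states):
        self.name, self.states = name, dict(states)

def fill_states(node, needed_state_ids, peers):
    """Pull each missing state from whichever peer currently holds it."""
    for sid in needed_state_ids:
        for peer in peers:
            if sid in peer.states:
                node.states[sid] = peer.states.pop(sid)   # Request + Move state sid
                break
    return "StatesFilled"                                  # step (8) acknowledgement

m1 = Node("M1", {1: "S1", 2: "S2"})
m2 = Node("M2", {3: "S3", 4: "S4"})
print(fill_states(m1, [3], [m2]), m1.states)   # M1 pulls state 3 from M2
```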
17. From Operator-level to Partition-level
- Problem with operator-level adaptation:
  - Operators have large states.
  - Moving them across machines can be expensive.
- Solution: partition-level adaptation
  - Partition state-intensive operators [Gra90, SH03, LR05]
  - Distribute the partitioned plan over multiple machines
18. Partitioned Symmetric M-way Join
- Example query: join on A.A1 = B.B1 = C.C1
- The join is processed on two machines (a partitioning sketch follows below)
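Below is a minimal sketch of how partitioned processing of the 3-way equi-join above could work: tuples are hash-partitioned on the join key and each partition is assigned to one of the two machines, so all tuples sharing a join-key value land on the same machine. The partition count, partition-to-machine map, and function names are illustrative, not the actual DAX operator.

```python
# Minimal sketch of partitioned processing for the 3-way equi-join above.

NUM_PARTITIONS = 4
PARTITION_TO_MACHINE = {0: "M1", 1: "M1", 2: "M2", 3: "M2"}  # assumed mapping

def route(tuple_, key):
    """Decide which partition and machine should process this input tuple."""
    pid = hash(tuple_[key]) % NUM_PARTITIONS
    return pid, PARTITION_TO_MACHINE[pid]

# Tuples from A, B, and C with equal join-key values hash to the same partition,
# so all work for that key value lands on a single machine.
print(route({"A1": 42}, "A1"))   # e.g. (2, 'M2')
print(route({"B1": 42}, "B1"))   # same partition/machine as above
```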
19. Partition-level Adaptations
- (1) State relocation: addresses uneven workload among machines!
  - Relocated states remain active on the other machine
  - Overheads: monitoring and moving states across machines
- (2) State spill: the memory overflow problem still exists!
  - Push operator states temporarily to disk
  - Spilled operator states are temporarily inactive
20. Approaches: Lazy-Disk vs. Active-Disk
- Independent spill and relocation decisions (threshold checks are sketched below)
  - Distribution Manager: trigger state relocation if M_r < θ_r and t > λ_r (the minimum time span between relocations)
  - Query Processor: start state spill if Mem_u / Mem_all > θ_s
- Partitions on different machines may have different productivity
  - i.e., the most productive partitions on machine 1 may be less productive than the least productive ones on other machines
- Proposed technique: perform state spill globally
21. Performance Results of Lazy-Disk and Active-Disk Approaches
- Lazy-Disk vs. No-Relocation in a memory-constrained environment
  - Three machines: M1 (50), M2 (25), M3 (25). Input rate: 30ms; tuple range: 30K, inc.; join ratio: 2. State-spill memory threshold: 100M. State relocation: > 30M, memory threshold 80, Minspan 45s.
- Lazy-Disk vs. Active-Disk
  - Three machines. Input rate: 30ms; tuple range: 15K-45K. State-spill memory threshold: 80M. Average inc. join ratio: M1 (4), M2 (1), M3 (1). Maximal Force-Disk memory: 100M, ratio > 2. State relocation: > 30M, memory threshold 80, Minspan 45s.
22. Plan-Wide State Spill: Local Methods
- Direct extension of the single-operator solution
  - Update operator productivity values individually
  - Spill the partitions with the smallest Poutput/Psize values among all operators (a selection sketch follows below)
- Push states from the bottom operators first
  - Partition selection done randomly or using the local productivity value
  - Fewer intermediate results (states) stored -> reduces the number of state spills
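The following sketch illustrates the local productivity-based selection described above: partitions with the smallest Poutput/Psize ratio are spilled first until the requested fraction of state has been pushed to disk. The data layout and names are hypothetical.

```python
# Illustrative sketch of the local productivity-based spill policy above.

def choose_partitions_to_spill(partitions, fraction=0.30):
    """partitions: list of dicts with 'id', 'size' (Psize) and 'output' (Poutput)."""
    total = sum(p["size"] for p in partitions)
    target = fraction * total
    # least productive first: smallest output per unit of state kept in memory
    ranked = sorted(partitions, key=lambda p: p["output"] / p["size"])
    spilled, freed = [], 0
    for p in ranked:
        if freed >= target:
            break
        spilled.append(p["id"])
        freed += p["size"]
    return spilled

parts = [{"id": "p11", "size": 10, "output": 20},
         {"id": "p12", "size": 10, "output": 5},
         {"id": "p21", "size": 8,  "output": 2}]
print(choose_partitions_to_spill(parts, 0.30))   # -> ['p21', 'p12']
```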
23. Plan-Wide State Spill: Global Outputs
- Poutput: contribution to the final query output
- A lineage tracing algorithm updates the Poutput statistics:
  - Update the Poutput values of the partitions in Join3.
  - Apply Split2 to each tuple to find the corresponding partitions of Join2, and update their Poutput values.
  - Apply the same lineage tracing algorithm to intermediate results.
[Figure: a three-join plan over inputs A-E (SplitA/SplitB/SplitC -> Join1 -> Split1 -> Join2 with D via SplitD -> Split2 -> Join3 with E via SplitE), and the partitions p11, p12, p21, ..., p4j maintained by operators OP1-OP4; e.g., P11: Psize = 10, Poutput = 20; P12: Psize = 10, Poutput = 20.]
- Consider the intermediate result size:
  - Intermediate result factor Pinter
  - Global productivity value: Poutput / (Psize + Pinter) (a sketch follows below)
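Below is a small sketch of the global productivity metric Poutput / (Psize + Pinter) together with a toy lineage-tracing step that credits an output tuple back to the partitions that produced it via split functions. The split functions, dictionary layout, and names are illustrative assumptions.

```python
# Sketch of the global productivity metric and a toy lineage-tracing step.

def productivity(p_output, p_size, p_inter):
    """Global spill priority: output contribution per unit of state kept in memory,
    discounted by the intermediate results (Pinter) the partition feeds downstream."""
    return p_output / (p_size + p_inter)

def trace_output(tuple_, split_fns, poutput):
    """For each upstream operator, apply its split function to find which partition
    produced this result tuple, and increment that partition's Poutput counter."""
    for op, split in split_fns.items():
        pid = split(tuple_)
        poutput[(op, pid)] = poutput.get((op, pid), 0) + 1
    return poutput

# Toy usage: two operators whose split functions partition on different join columns.
split_fns = {"Join1": lambda t: hash(t["A1"]) % 4, "Join2": lambda t: hash(t["B1"]) % 4}
print(trace_output({"A1": 7, "B1": 9}, split_fns, {}))
print(productivity(p_output=20, p_size=10, p_inter=5))   # -> 1.33...
```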
24. Experimental Results for Plan-Wide Spill
- 300 partitions
- Memory threshold: 60MB
- Push 30% of states in each state spill
- Average tuple inter-arrival time: 50ms for each input
- Query with average join rates: Join1 = 1, Join2 = 3, Join3 = 3
- Query with average join rates: Join1 = 3, Join2 = 2, Join3 = 3
25. Backup Slides
26. Conclusions
- Theme: partitioning state-intensive operators
  - Low overhead
  - Resolves memory shortage
- Analyzing state adaptation performance and policies
  - State spill: slows down run-time throughput
  - State relocation: low overhead
  - Given sufficient main memory, state relocation helps run-time throughput
  - With insufficient main memory, Active-Disk improves run-time throughput
- Adapting multi-operator plans
  - Dependency among operators
  - Global throughput-oriented spill solutions improve throughput
27. Plan Shape Restructuring and Distributed Stream Processing
- New slides for Yali's migration and distribution ideas
28. Migration Strategy: Parallel Track
- Basic idea: execute both plans in parallel until the old box has expired, after which the old box is disconnected and the migration is over.
- Potential duplicates: both boxes generate all-new tuples.
- At the root operator in the old box: if both to-be-joined tuples have all-new sub-tuples, do not join (the new box already produces that result).
- At other operators in the old box: proceed as normal.
[Figure: the old and new migration boxes (as in the Moving State example) executing side by side, sharing input queues QA-QD and output queue QABCD.]
- Pros: migrates in a gradual fashion; still produces output even during migration (a sketch of the duplicate-avoidance check follows below).
- Cons: still relies on executing the old box to process tuples during the migration stage.
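The tiny sketch below captures the duplicate-avoidance rule above: at the root operator of the old box, a join is skipped when both inputs consist entirely of new sub-tuples, since the new box already produces that combination. The tuple representation (a list of old/new flags) is an illustrative simplification.

```python
# Tiny sketch of the Parallel Track duplicate-avoidance rule at the old box's root.

def should_join_at_old_root(left, right):
    """left/right: lists of 'old'/'new' flags for the sub-tuples being joined."""
    both_all_new = all(f == "new" for f in left) and all(f == "new" for f in right)
    return not both_all_new   # skip only when both sides are entirely new

print(should_join_at_old_root(["new", "old"], ["new"]))   # True: join as usual
print(should_join_at_old_root(["new", "new"], ["new"]))   # False: skip, avoid duplicate
```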
29. Cost Estimations for Moving State (MS) and Parallel Track (PT)
- Moving State:
  T_MS = T_match + T_move + T_recompute
  T_recompute(S_BC) + T_recompute(S_BCD) = λ_B λ_C W^2 (T_j + T_s σ_BC) + 2 λ_B λ_C λ_D W^3 (T_j σ_BC + T_s σ_BC σ_BCD)
- Parallel Track:
  T_PT = 2W, given enough system resources
[Figure: the old box (states SA, SB, SAB, SC, SABC, SD; operators AB, BC, CD) and the new box, with a timeline from TM-start through the 1st and 2nd windows W to TM-end, marking when old and new tuples expire.]
30. Experimental Results for Plan Migration
- Observations:
  - The results confirm the prior cost analysis.
  - The duration of Moving State is affected by the window size and arrival rates.
  - The duration of Parallel Track is 2W given enough system resources; otherwise it is also affected by system parameters such as window size and arrival rates.
31. Related Work on Distributed Continuous Query Processing
- [1] Medusa: M. Balazinska, H. Balakrishnan, and M. Stonebraker. Contract-based load management in federated distributed systems. In 1st NSDI, March 2004.
- [2] Aurora: M. Cherniack, H. Balakrishnan, M. Balazinska, et al. Scalable distributed stream processing. In CIDR, 2003.
- [3] Borealis: The Borealis Team. The design of the Borealis Stream Processing Engine. Technical Report, Brown University, CS Department, August 2004.
- [4] Flux: M. Shah, J. Hellerstein, S. Chandrasekaran, and M. Franklin. Flux: An adaptive partitioning operator for continuous query systems. In ICDE, pages 25-36, 2003.
- [5] Distributed Eddies: F. Tian and D. DeWitt. Tuple routing strategies for distributed Eddies. In VLDB, Berlin, Germany, 2003.
32. Related Work on Partitioned Processing
- Non-state-intensive queries [BB02, AC03, GT03]
- State-intensive operators (run-time memory shortage)
  - Operator-level adaptation [CB03, SLJ05, XZH05]
  - This work: fine-grained, state-level adaptation (adapts partial states)
- Load shedding [TUZC03]
  - Drops input tuples to handle resource shortage
  - This work requires complete query results (no load shedding)
- XJoin [UF00] and Hash-Merge Join [MLA04]
  - Only spill states of one single operator in centralized environments
  - This work integrates both spill and relocation in distributed environments and investigates the dependency problem for multiple operators
- Flux [SH03]
  - Adapts the states of one single-input operator across machines
  - This work handles multi-input operators and integrates both state spill and state relocation
33. CAPE Publications and Reports
- [RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech. CAPE: A Constraint-Aware Adaptive Stream Processing Engine. Invited book chapter, http://www.cs.uno.edu/nauman/streamBook/, July 2004.
- [ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman. Dynamic Plan Migration for Continuous Queries Over Data Streams. SIGMOD 2004, pages 431-442.
- [DMR04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman. Joining Punctuated Streams. EDBT 2004, pages 587-604.
- [DR04] L. Ding and E. A. Rundensteiner. Evaluating Window Joins over Punctuated Streams. CIKM 2004, to appear.
- [DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman. MJoin: A Metadata-Aware Stream Join Operator. DEBS 2003.
- [RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. Demonstration paper, VLDB 2004.
- [SR04] T. Sutherland and E. A. Rundensteiner. D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture. Tech Report WPI-CS-TR-04-18, 2004.
- [SPR04] T. Sutherland, B. Pielech, Y. Zhu, L. Ding and E. A. Rundensteiner. Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing. IDEAS 2005.
- [SLJR05] T. Sutherland, B. Liu, M. Jbantova and E. A. Rundensteiner. D-CAPE: Distributed and Self-Tuned Continuous Query Processing. CIKM, Bremen, Germany, Nov. 2005.
- [LR05] B. Liu and E. A. Rundensteiner. Revisiting Pipelined Parallelism in Multi-Join Query Processing. VLDB 2005.
- [B05] B. Liu and E. A. Rundensteiner. Partition-based Adaptation Strategies: Integrating Spill and Relocation. Tech Report WPI-CS-TR-05, 2005 (in submission).
- CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html
34. CAPE Engine
- Constraint-aware: exploit semantic constraints such as sliding windows and punctuations to reduce resource usage and improve response time.
- Adaptive: incorporate heterogeneous-grained adaptivity at all query processing levels:
  - Adaptive query operator execution
  - Adaptive query plan re-optimization
  - Adaptive operator scheduling
  - Adaptive query plan distribution
- Continuous Query Processing Engine: process queries in a real-time manner by employing well-coordinated, heterogeneous-grained adaptations.
35. Analyzing Adaptation Performance
- Questions addressed:
  - Partitioned parallel processing (resolves memory shortage)
    - Should we partition non-memory-intensive queries?
    - How effective is partitioning memory-intensive queries?
  - State spill (known problem: slows down run-time throughput)
    - How many states to push?
    - Which states to push?
    - How to combine memory and disk states to produce complete results?
  - State relocation (known asset: low overhead)
    - When (how often) to trigger state relocation?
    - Is state relocation an expensive process?
    - How to coordinate state moving without losing data states?
- Analyzing state adaptation performance and policies:
  - Given sufficient main memory, state relocation helps run-time throughput
  - With insufficient main memory, Active-Disk improves run-time throughput
- Adapting multi-operator plans
36. Percentage Spilled per Adaptation
- Amount of state pushed in each adaptation = percentage of tuples pushed / total number of tuples
[Figures: run-time query throughput and run-time main memory usage.]
- (Input rate: 30ms per input, tuple range: 30K, join ratio: 3, adaptation threshold: 200MB)