DAX: Dynamically Adaptive Distributed System for Processing CompleX Continuous Queries


1
DAX Dynamically Adaptive Distributed System for
Processing CompleX Continuous Queries
  • Bin Liu, Yali Zhu, Mariana Jbantova, Brad
    Momberger,
  • and Elke A. Rundensteiner
  • Department of Computer Science, Worcester
    Polytechnic Institute
  • 100 Institute Road, Worcester, MA 01609
  • Tel: 1-508-831-5857, Fax: 1-508-831-5776
  • {binliu, yaliz, jbantova, bmombe,
    rundenst}@cs.wpi.edu
  • VLDB'05 Demonstration
  • http://davis.wpi.edu/dsrg/CAPE/index.html

2
Uncertainties in Stream Query Processing
(Diagram: end users register continuous queries with a distributed
stream query engine and receive answers; streaming data flows in and
streaming results flow out.)
  • High workload of queries; real-time and accurate responses required.
  • Streaming data may have time-varying rates and high volumes.
  • Available resources for executing each operator may vary over time.
  • Memory and CPU resource limitations.
  • Distribution and adaptation are required.
3
Adaptation in Distributed Stream Processing
  • Adaptation Techniques
  • Spilling data to disk
  • Relocating work to other machines
  • Reoptimizing and migrating query plan
  • Granularity of Adaptation
  • Operator-level distribution and adaptation
  • Partition-level distribution and adaptation
  • Integrated Methodologies
  • Consider trade-offs between spill vs redistribute
  • Consider trade-offs between migrate vs
    redistribute

4
System Overview [LZ05, TLJ05]
(Architecture diagram: Query Processor, Distribution Manager, Local Plan
Migrator, Connection Manager, Local Statistics Gatherer, Local Adaptation
Controller, Global Plan Migrator, CAPE - Continuous Query Processing Engine,
Runtime Monitor, Query Plan Manager, Repository, Data Distributor, Data
Receiver, Global Adaptation Controller; streaming data arrives over the
network from a stream generator, and results go to end users and an
application server.)
5
Motivating Example
  • Scalable Real-Time Data Processing Systems

(Diagram: a real-time data integration server receiving stock prices,
volumes, ..., and reviews, external reports, news, ...)
Complex queries such as multi-joins are common!
  • To Produce As Many Results As Possible at
    Run-Time
  • (e.g., 9:00am-4:00pm)
  • Main-memory based processing
  • To Require Complete Query Results
  • (e.g., for offline analysis after 4:00pm or
    whenever possible)
  • Load shedding not acceptable; must temporarily
    spill to disk

6
Random Distribution
Goal: to equalize workload per machine.
Algorithm: iteratively takes each query operator and places it on the
query processor with the least number of operators.
Balanced Network-Aware Distribution
Goal: to minimize network connectivity.
Algorithm: takes each query plan and creates sub-plans where
neighbouring operators are grouped together.
(A sketch of both strategies follows below.)
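The two strategies above can be captured in a minimal sketch. This is an illustrative Python fragment, not the DAX/CAPE implementation: the function names, the list-shaped plan, and the chunking rule are assumptions, and the network-aware variant is shown only for a pipeline-shaped plan.

```python
# Minimal sketch of the two initial distribution strategies (illustrative only).

def balanced_distribution(operators, machines):
    """Iteratively place each operator on the machine that currently
    holds the fewest operators (equalize workload per machine)."""
    table = {m: [] for m in machines}
    for op in operators:
        target = min(machines, key=lambda m: len(table[m]))  # least-loaded machine
        table[target].append(op)
    return table


def network_aware_distribution(plan, machines):
    """Cut a pipeline-shaped plan (a list of neighbouring operators) into
    contiguous sub-plans, one per machine, so that neighbouring operators
    stay together and cross-machine connections are minimized."""
    chunk = max(1, -(-len(plan) // len(machines)))  # ceiling division
    return {m: plan[i * chunk:(i + 1) * chunk] for i, m in enumerate(machines)}


# Example usage with hypothetical operator names:
ops = ["scan_A", "scan_B", "join1", "join2", "split1", "sink"]
print(balanced_distribution(ops, ["M1", "M2"]))
print(network_aware_distribution(ops, ["M1", "M2"]))
```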
7
Initial Distribution Process
  • Step 1: Create the distribution table using the initial
    distribution algorithm.
  • Step 2: Send the distribution information to the
    processing machines (nodes).

8
Operator-level Adaptation - Redistribution
Cost per machine is determined as the percentage of
memory filled with tuples.
(Diagram: statistics table, current and desired cost tables, and
distribution table; machine capacity 4,500 tuples.)
  • CAPE's cost models: number of tuples in memory
    and network output rate.
  • Operators are redistributed based on the redistribution
    policy.
  • CAPE's redistribution policies: Balance and
    Degradation. (A Balance-style sketch follows below.)
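A minimal sketch of a Balance-style redistribution decision, assuming per-machine memory statistics and per-operator memory sizes are available. The function name, the 10% imbalance tolerance, and the smallest-operator-first choice are illustrative assumptions, not CAPE's actual policy.

```python
# Minimal sketch of a Balance-style redistribution decision (illustrative only).

def balance_redistribution(mem_used, capacity, operator_mem, placement):
    """Move operators from the most loaded machine to the least loaded one
    until their costs (fraction of memory filled with tuples) roughly equalize.
    Returns the list of (operator, from_machine, to_machine) moves."""
    def cost(m):
        return mem_used[m] / capacity[m]

    moves = []
    machines = list(capacity)
    for _ in range(len(operator_mem)):            # bound the number of moves
        hot = max(machines, key=cost)
        cold = min(machines, key=cost)
        if cost(hot) - cost(cold) < 0.10:         # assumed 10% imbalance tolerance
            break
        candidates = [op for op, m in placement.items() if m == hot]
        if not candidates:
            break
        op = min(candidates, key=lambda o: operator_mem[o])  # move smallest first
        placement[op] = cold
        mem_used[hot] -= operator_mem[op]
        mem_used[cold] += operator_mem[op]
        moves.append((op, hot, cold))
    return moves
```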

9
Redistribution Protocol: Moving Operators Across
Machines
10
Query Plan Performance with a Query Plan of 40
Operators
  • Observations
  • Initial distribution is important for query plan
    performance.
  • Redistribution improves query plan performance
    at run-time.

11
Operator-level Adaptation: Dynamic Plan Migration
  • The last step of plan re-optimization: after the
    optimizer generates a new query plan, how do we
    replace the currently running plan with the new plan
    on the fly?
  • A new challenge in streaming systems because of
    stateful operators.
  • A unique feature of the DAX system.
  • But can we just take out the old plan and plug in
    the new plan?
  • Steps
  • (1) Pause execution of old plan
  • (2) Drain out all tuples inside old plan
  • (3) Replace old plan by new plan
  • (4) Resume execution of new plan

Deadlock Waiting Problem
(Diagram: join operators AB and BC over inputs A, B, C;
(2) all tuples drained, (3) old plan replaced by new, (4) processing resumed.)
Key Observation: purging of tuples in the states
relies on the processing of new tuples, so waiting for the old plan
to fully drain can block forever.
12
Migration Strategy - Moving State
Migration requirements: no missing results and
no duplicates. Two migration boxes: one contains the
old sub-plan, one contains the new sub-plan. The two
sub-plans are semantically equivalent, with the same
input and output queues. Migration is abstracted
as replacing the old box by the new box.
  • Basic idea - share common states between the two
    migration boxes
  • Key Steps
  • Drain Tuples in Old Box
  • State Matching
  • Each state in the old box has a unique ID. During
    rewriting, a new ID is given to each newly generated state
    in the new box. When rewriting is done, states are matched
    based on IDs.
  • State Moving
  • between matched states
  • What's left?
  • Unmatched states in new box
  • Unmatched states in old box

(Diagram: the old migration box and the new migration box, each a join
sub-plan over inputs A, B, C, D with operator states such as S_A, S_B,
S_C, S_D, S_AB, S_BC, S_ABC, and S_BCD, sharing input queues Q_A-Q_D and
output queue Q_ABCD. A state-matching sketch follows below.)
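A minimal sketch of the state-matching and state-moving steps, under the assumption that each state carries a unique ID preserved by plan rewriting. The State class and function are illustrative, and recomputation/discarding of unmatched states is only indicated by the returned sets.

```python
# Minimal sketch of the Moving State steps (illustrative, not CAPE's API).

class State:
    """Stand-in for a join state: buffered tuples identified by a unique ID."""
    def __init__(self, state_id, tuples=None):
        self.state_id = state_id
        self.tuples = list(tuples or [])


def moving_state_migration(old_states, new_states):
    """Match states of the old and new boxes by ID, move the matched ones,
    and report which states still need recomputation or can be discarded."""
    old_by_id = {s.state_id: s for s in old_states}
    new_by_id = {s.state_id: s for s in new_states}
    matched = old_by_id.keys() & new_by_id.keys()

    # State moving: copy the tuples of every matched state from old to new box.
    for sid in matched:
        new_by_id[sid].tuples = list(old_by_id[sid].tuples)

    # Unmatched new states must be recomputed recursively, bottom-up, from the
    # states below them; unmatched old states can be discarded once the box
    # input queues have been cleaned (execution synchronization).
    return new_by_id.keys() - matched, old_by_id.keys() - matched
```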
13
Moving State: Unmatched States
Unmatched new states (recomputation): recursively
recompute unmatched states from the bottom
up. Unmatched old states (execution
synchronization): first clean accumulated tuples
in the box input queues; it is then safe to discard
these unmatched states.
(Diagram: the possible old/new combinations of the A, B, and C
sub-tuples in a joined result, and a timeline of window W with tuples
a1-a2, b1-b3, and c1-c3 arriving on streams A, B, and C.)
14
Distributed Dynamic Migration Protocols (I)
Distribution Table: OP1 on M1, OP2 on M2, OP3 on M1, OP4 on M2.
(Diagram: the Distribution Manager signals Migration Start to machines
M1 and M2, which hold partitioned instances of op1-op4.)
Migration Stage: Execution Synchronization
  • (1) The Distribution Manager requests the SyncTime from each machine.
  • (2) Each machine replies with its local SyncTime.
  • (3) The Distribution Manager sends the global SyncTime to both machines.
  • (4) Each machine reports Execution Synced.
(A coordination sketch for this stage follows below.)
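A minimal sketch of the execution-synchronization exchange, with machine objects assumed to expose request/set/wait methods; choosing the maximum local sync time as the global one is an assumption for illustration, not necessarily DAX's rule.

```python
# Minimal sketch of the execution-synchronization stage (illustrative only).

def execution_synchronization(machines):
    """Run by the Distribution Manager over all machines in the migration."""
    # (1)/(2) Request each machine's local sync time (how far it has processed).
    local_times = {m: m.request_sync_time() for m in machines}

    # (3) Pick a global sync time every machine can reach; the maximum of the
    #     local times is one plausible choice (an assumption in this sketch).
    global_sync_time = max(local_times.values())
    for m in machines:
        m.set_global_sync_time(global_sync_time)

    # (4) Wait until every machine has processed input up to the global sync
    #     time and reports "Execution Synced".
    for m in machines:
        m.wait_until_synced(global_sync_time)
    return global_sync_time
```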
15
Distributed Dynamic Migration Protocols (II)
Distribution Table: OP1 on M1, OP2 on M2, OP3 on M1, OP4 on M2.
Migration Stage: Change Plan Shape
  • (5) The Distribution Manager sends the new sub-query plan to each machine.
  • (6) Each machine restructures its local sub-plan and replies PlanChanged.
(Diagram: on M1 and M2 the sub-plans containing op1 and op2 are
restructured; op3 and op4 are unchanged.)
16
Distributed Dynamic Migration Protocols (III)
Migration Stage: Fill States and Reactivate Operators
  • (7) The Distribution Manager instructs the machines to fill states
    (Fill States 3, 5 on one machine; Fill States 2, 4 on the other).
  • (7.1-7.4) The machines exchange the needed states directly: request
    state 4, move state 4; request state 2, move state 2.
  • (8) Each machine reports States Filled.
  • (9) The Distribution Manager instructs the machines to reconnect operators.
  • (10) Each machine reports Operator Reconnected.
  • (11) The Distribution Manager activates op1 and op2.
(A state-filling sketch follows below.)
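A minimal sketch of the state-filling step, assuming machine objects with has_state/move_state/install_state/report methods; these names are hypothetical and only mirror the message sequence above.

```python
# Minimal sketch of the state-filling step, messages (7)-(8) (illustrative only).

def fill_states(machine, peer, needed_state_ids):
    """For each state the new local sub-plan needs but does not hold, request
    it from the peer machine and install the moved state locally."""
    for sid in needed_state_ids:
        if not machine.has_state(sid):
            moved = peer.move_state(sid)        # (7.x) request/move the state
            machine.install_state(sid, moved)
    machine.report("States Filled")             # (8)
```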
17
From Operator-level to Partition-level
  • Problem of operator-level adaptation
  • Operators have large states.
  • Moving them across machines can be expensive.
  • Solution: partition-level adaptation
  • Partition state-intensive operators
    [Gra90, SH03, LR05]
  • Distribute the partitioned plan onto multiple
    machines

18
Partitioned Symmetric M-way Join
  • Example Query: A.A1 = B.B1 = C.C1
  • The join is processed on two machines
    (a partitioning sketch follows below)
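A minimal sketch of partitioned symmetric 3-way join processing for the example query, assuming hash partitioning on the join attribute and per-partition, per-input hash states; this is illustrative and not the CAPE operator.

```python
# Minimal sketch of a partitioned symmetric 3-way join on A.A1 = B.B1 = C.C1.
from collections import defaultdict

NUM_PARTITIONS = 4
# Partitions can be placed on different machines, e.g. two machines here.
MACHINE_OF = {0: "M1", 1: "M1", 2: "M2", 3: "M2"}

# partition -> per-input state: join-key -> list of payloads
states = [defaultdict(lambda: defaultdict(list)) for _ in range(NUM_PARTITIONS)]


def partition_of(key):
    """All inputs are hash-partitioned on their join attribute, so matching
    tuples from A, B, and C always land in the same partition."""
    return hash(key) % NUM_PARTITIONS


def process(stream, key, payload):
    """Symmetric join step: insert into this input's state, then probe the
    states of the other two inputs within the same partition."""
    p = partition_of(key)
    states[p][stream][key].append(payload)
    others = [s for s in ("A", "B", "C") if s != stream]
    return [(payload, x, y)
            for x in states[p][others[0]].get(key, [])
            for y in states[p][others[1]].get(key, [])]


process("A", 7, "a1")
process("B", 7, "b1")
print(process("C", 7, "c1"))   # -> [('c1', 'a1', 'b1')]
```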

19
Partition-level Adaptations
  • (1) State Relocation: uneven workload among
    machines!
  • Relocated states remain active on another machine
  • Overhead in monitoring and moving states across
    machines
  • (2) State Spill: the memory overflow problem still
    exists!
  • Push operator states temporarily to disk
  • Spilled operator states are temporarily inactive

20
Approaches Lazy- vs. Active-Disk
  • Lazy-Disk Approach
  • Independent Spill and Relocation Decisions
  • Distribution Manager Trigger state relocation
    if Mr lt ?r and t gt ?r
  • Query Processor Start state spill
    if Memu / Memall gt ?s
  • Active-Disk Approach
  • Partitions on Different Machines May Have
    Different Productivity
  • i.e., Most productive partitions in machine 1 may
    be less productive than least productive ones
    other machines
  • Proposed Technique Perform State Spill Globally
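A minimal sketch of the Lazy-Disk trigger conditions; the concrete numbers (30 MB imbalance, 45 s minimum span, 80% memory ratio) are borrowed from the experiment settings on the next slide, and the variable names are assumptions.

```python
# Minimal sketch of the Lazy-Disk trigger conditions (illustrative names/values).

RELOCATION_IMBALANCE = 30 * 1024 * 1024   # 30 MB difference between machines
RELOCATION_MIN_SPAN = 45                   # seconds between relocations (minspan)
SPILL_MEMORY_RATIO = 0.80                  # spill when 80% of memory is used


def should_relocate(mem_used_per_machine, seconds_since_last_relocation):
    """Distribution Manager side: relocate when the machines are sufficiently
    imbalanced and the previous relocation is old enough."""
    imbalance = max(mem_used_per_machine.values()) - min(mem_used_per_machine.values())
    return (imbalance > RELOCATION_IMBALANCE
            and seconds_since_last_relocation > RELOCATION_MIN_SPAN)


def should_spill(mem_used, mem_total):
    """Query Processor side: spill states when local memory usage crosses
    the spill threshold."""
    return mem_used / mem_total > SPILL_MEMORY_RATIO
```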

21
Performance Results of Lazy-Disk and Active-Disk
Approaches
  • Lazy-Disk vs. No-Relocation in a memory-constrained
    environment
  • Lazy-Disk vs. Active-Disk

Three machines: M1(50), M2(25), M3(25). Input
rate: 30ms. Tuple range: 30K. Inc. join ratio:
2. State spill memory threshold: 100M. State
relocation: > 30M, mem. threshold 80%, minspan 45s.
Three machines. Input rate: 30ms. Tuple
range: 15K, 45K. State spill memory threshold: 80M. Avg.
inc. join ratio: M1(4), M2(1), M3(1). Maximal
force-disk memory: 100M, ratio > 2. State
relocation: > 30M, mem. threshold 80%, minspan 45s.
22
Plan-Wide State Spill: Local Methods
  • Local Output
  • Direct extension of the single-operator solution
  • Update operator productivity values individually
  • Spill partitions with smaller P_output/P_size
    values among all operators (see the sketch after this list)
  • Bottom-Up Pushing
  • Push states from bottom operators first
  • Randomly or using the local productivity value for
    partition selection
  • Fewer intermediate results (states) stored ->
    reduces the number of state spills
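A minimal sketch of the local-output selection rule, spilling the partitions with the smallest P_output/P_size ratios; the dictionary layout and the 30% default (taken from the experiment setup on slide 24) are illustrative assumptions.

```python
# Minimal sketch of the local-output partition selection for state spill.

def choose_partitions_to_spill(partitions, fraction=0.3):
    """Spill the partitions with the smallest P_output / P_size ratios until
    roughly `fraction` of the total state size has been selected."""
    ranked = sorted(partitions.items(),
                    key=lambda kv: kv[1]["p_output"] / kv[1]["p_size"])
    target = fraction * sum(p["p_size"] for p in partitions.values())
    chosen, spilled = [], 0
    for pid, p in ranked:
        if spilled >= target:
            break
        chosen.append(pid)
        spilled += p["p_size"]
    return chosen


# Example: the least productive partition per unit of state is spilled first.
parts = {"p11": {"p_size": 10, "p_output": 20},
         "p12": {"p_size": 10, "p_output": 2}}
print(choose_partitions_to_spill(parts, fraction=0.5))   # -> ['p12']
```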

23
Plan-Wide State Spill: Global Output
  • P_output: contribution to the final query output
  • A lineage tracing algorithm updates the P_output
    statistics
  • Update the P_output values of the partitions in Join3
  • Apply Split2 to each output tuple to find the corresponding
    partition from Join2, and update its P_output
    value
  • And so on down the plan
  • Apply the same lineage tracing algorithm to
    intermediate results
  • Consider intermediate result size
(Diagram: a three-join plan over inputs A-E with split operators
SplitA-SplitE, Split1, and Split2 routing tuples to the partitions of
Join1-Join3; e.g., P11: Psize = 10, Poutput = 20 and
P12: Psize = 10, Poutput = 20.)
  • Intermediate result factor P_inter
  • Rank partitions by P_output / (P_size × P_inter)
(A lineage-tracing sketch follows below.)
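A simplified sketch of the lineage-tracing idea for global output statistics: each final output tuple is traced back through the split functions to credit the contributing partition at every join. The split functions and data layout are hypothetical, and the intermediate-result factor P_inter is not modelled here.

```python
# Simplified sketch of lineage tracing for global P_output statistics.

def update_global_output_stats(output_tuples, split_chain, p_output):
    """For each final output tuple, apply the split function of every join
    level (top join downwards) to find the partition that contributed to the
    tuple at that level, and credit its P_output counter."""
    for t in output_tuples:
        for join_name, split_fn in split_chain:
            pid = split_fn(t)                     # contributing partition at this join
            p_output[(join_name, pid)] = p_output.get((join_name, pid), 0) + 1


# Example with hypothetical split functions hashing on different attributes:
p_output = {}
split_chain = [("Join3", lambda t: hash(t["c1"]) % 4),
               ("Join2", lambda t: hash(t["b1"]) % 4),
               ("Join1", lambda t: hash(t["a1"]) % 4)]
update_global_output_stats([{"a1": 1, "b1": 2, "c1": 3}], split_chain, p_output)
print(p_output)
```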
24
Experiment Results for Plan-Wide Spill
  • 300 partitions
  • Memory threshold: 60MB
  • Push 30% of states in each state spill
  • Average tuple inter-arrival time: 50ms from each
    input

Query with average join rates Join1: 1, Join2: 3,
Join3: 3
Query with average join rates Join1: 3, Join2: 2,
Join3: 3
25
Backup Slides
26
Conclusions
  • Theme: Partitioning State-Intensive Operators
  • Low overhead
  • Resolves memory shortage
  • Analyzing State Adaptation Performance and Policies
  • State spill
  • Slows down run-time throughput
  • State relocation
  • Low overhead
  • Given sufficient main memory,
  • state relocation helps run-time throughput
  • With insufficient main memory,
  • Active-Disk improves run-time throughput
  • Adapting Multi-Operator Plans
  • Dependency among operators
  • Global throughput-oriented spill solutions
    improve throughput

27
Plan Shape Restructuring and Distributed Stream
Processing
  • New slides for Yali's migration and distribution
    ideas

28
Migration Strategies: Parallel Track
Basic idea: execute both plans in parallel until the
old box has expired, after which the old box is
disconnected and the migration is over. Potential
duplicates: both boxes generate all-new tuples.

At the root operator in the old box: if both to-be-joined
tuples have all-new sub-tuples, don't join.
At other operators in the old box: proceed as normal.
(Diagram: the old and new migration boxes with joins AB, BC, CD,
states S_A-S_D, S_AB, S_BC, S_ABC, S_BCD, input queues Q_A-Q_D,
and output queue Q_ABCD.)
  • Pros: migrates in a gradual fashion; still produces output
    even during migration.
  • Cons: still relies on execution of the old box to
    process tuples during the migration stage.
    (A duplicate-filter sketch follows below.)
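A minimal sketch of the duplicate-avoidance check at the root operator of the old box; the representation of sub-tuple new/old flags is an assumption.

```python
# Minimal sketch of the duplicate filter at the root operator of the old box.

def root_join_allowed(left_tuple, right_tuple):
    """In the Parallel Track strategy, a result whose sub-tuples are all 'new'
    is also produced by the new box, so the old box's root operator does not
    join two tuples that are both entirely new."""
    def all_new(t):
        return all(sub["is_new"] for sub in t["subs"])
    return not (all_new(left_tuple) and all_new(right_tuple))


# Example: an all-new AB tuple joined with an all-new C tuple is suppressed.
ab = {"subs": [{"is_new": True}, {"is_new": True}]}
c = {"subs": [{"is_new": True}]}
print(root_join_allowed(ab, c))   # -> False
```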

29
Cost Estimation for Moving State (MS)
T_MS = T_match + T_move + T_recompute
T_recompute = T_recompute(S_BC) + T_recompute(S_BCD)
            = λ_B λ_C W^2 (T_j + T_s s_BC) + 2 λ_B λ_C λ_D W^3 (T_j s_BC + T_s s_BC s_BCD)
Cost Estimation for Parallel Track (PT)
(Diagram: old migration box with joins AB, BC, CD and states S_A-S_D,
S_AB, S_ABC over input queues Q_A-Q_D; timeline from T_M-start through
the 1st and 2nd windows W to T_M-end, with old tuples expiring and new
tuples filling the states.)
T_PT = 2W, given enough system resources.
30
Experimental Results for Plan Migration
  • Observations
  • Results confirm the prior cost analysis.
  • Duration of moving state is affected by
    window size and arrival rates.
  • Duration of parallel track is 2W given
    enough system resources; otherwise it is
    affected by system parameters such
    as window size and arrival rates.

31
Related Work on Distributed Continuous Query
Processing
  • [1] Medusa: M. Balazinska, H. Balakrishnan, and
    M. Stonebraker. Contract-based load management in
    federated distributed systems. In 1st NSDI,
    March 2004.
  • [2] Aurora: M. Cherniack, H. Balakrishnan, M.
    Balazinska, et al. Scalable distributed stream
    processing. In CIDR, 2003.
  • [3] Borealis: T. B. Team. The Design of the
    Borealis Stream Processing Engine. Technical
    Report, Brown University, CS Department, August
    2004.
  • [4] Flux: M. Shah, J. Hellerstein, S.
    Chandrasekaran, and M. Franklin. Flux: An
    adaptive partitioning operator for continuous
    query systems. In ICDE, pages 25-36, 2003.
  • [5] Distributed Eddies: F. Tian and D. DeWitt.
    Tuple routing strategies for distributed Eddies.
    In VLDB, Berlin, Germany, 2003.

32
Related Work on Partitioned Processing
  • Non-state-intensive queries [BB02, AC03, GT03]
  • State-intensive operators (run-time memory
    shortage)
  • Operator-level adaptation [CB03, SLJ05, XZH05]
  • Fine-grained state-level adaptation (adapt
    partial states)
  • Load shedding [TUZC03]
  • Require complete query result (no load shedding)
  • Drop input tuples to handle resource shortage
  • XJoin [UF00] and Hash-Merge Join [MLA04]
  • Integrate both spill and relocation in
    distributed environments
  • Investigate the dependency problem for multiple
    operators
  • Flux [SH03]
  • Multi-input operators
  • Integrate both state spill and state relocation
  • Adapt states of one single-input operator across
    machines
  • Hash-Merge Join [MLA04], XJoin [UF00]
  • Only spill states of one single operator in
    central environments

33
CAPE Publications and Reports
  • [RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T.
    Sutherland and B. Pielech, "CAPE: A
    Constraint-Aware Adaptive Stream Processing
    Engine." Invited book chapter,
    http://www.cs.uno.edu/nauman/streamBook/, July 2004.
  • [ZRH04] Y. Zhu, E. A. Rundensteiner and G. T.
    Heineman, "Dynamic Plan Migration for Continuous
    Queries Over Data Streams." SIGMOD 2004, pages
    431-442.
  • [DMR04] L. Ding, N. Mehta, E. A. Rundensteiner
    and G. T. Heineman, "Joining Punctuated Streams."
    EDBT 2004, pages 587-604.
  • [DR04] L. Ding and E. A. Rundensteiner,
    "Evaluating Window Joins over Punctuated
    Streams." CIKM 2004, to appear.
  • [DRH03] L. Ding, E. A. Rundensteiner and G. T.
    Heineman, "MJoin: A Metadata-Aware Stream Join
    Operator." DEBS 2003.
  • [RDSZBM04] E. A. Rundensteiner, L. Ding, T.
    Sutherland, Y. Zhu, B. Pielech
    and N. Mehta, "CAPE: Continuous Query Engine
    with Heterogeneous-Grained Adaptivity."
    Demonstration paper, VLDB 2004.
  • [SR04] T. Sutherland and E. A. Rundensteiner,
    "D-CAPE: A Self-Tuning Continuous Query Plan
    Distribution Architecture." Tech report,
    WPI-CS-TR-04-18, 2004.
  • [SPR04] T. Sutherland, B. Pielech, Y. Zhu,
    L. Ding, and E. A. Rundensteiner, "Adaptive
    Multi-Objective Scheduling Selection Framework
    for Continuous Query Processing." IDEAS 2005.
  • [SLJR05] T. Sutherland, B. Liu, M. Jbantova, and E.
    A. Rundensteiner, "D-CAPE: Distributed and
    Self-Tuned Continuous Query Processing." CIKM,
    Bremen, Germany, Nov. 2005.
  • [LR05] B. Liu and E. A. Rundensteiner,
    "Revisiting Pipelined Parallelism in Multi-Join
    Query Processing." VLDB 2005.
  • [B05] B. Liu and E. A. Rundensteiner,
    "Partition-based Adaptation Strategies: Integrating
    Spill and Relocation." Tech report, WPI-CS-TR-05,
    2005 (in submission).
  • CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html

34
CAPE Engine
Exploit semantic constraints such as sliding
windows and punctuations to reduce resource
usage and improve response time.
  • Constraint-aware
  • Adaptive
  • Continuous Query
  • Processing
  • Engine
  • Incorporate heterogeneous-grained
    adaptivity at all query processing levels:
  • - Adaptive query operator execution
  • - Adaptive query plan re-optimization
  • - Adaptive operator scheduling
  • - Adaptive query plan distribution

Process queries in a real-time manner by
employing well-coordinated heterogeneous-grained
adaptations.
35
Analyzing Adaptation Performance
  • Questions Addressed
  • Partitioned Parallel Processing
  • Resolves memory shortage
  • Should we partition non-memory intensive queries?
  • How effective is partitioning memory intensive
    queries?
  • State Spill
  • Known problem: slows down run-time throughput
  • How many states to push?
  • Which states to push?
  • How to combine memory/disk states to produce
    complete results?
  • State Relocation
  • Known asset: low overhead
  • When (how often) to trigger state relocation?
  • Is state relocation an expensive process?
  • How to coordinate state moving without losing
    data states?
  • Analyzing State Adaptation Performance Policies
  • Given sufficient main memory, state relocation
    helps run-time throughput
  • With insufficient main memory, Active-Disk
    improves run-time throughput
  • Adapting Multi-Operator Plan

36
Percentage Spilled per Adaptation
  • Amount of state pushed in each adaptation
  • Percentage = tuples pushed / total number of tuples

Run-Time Query Throughput
Run-Time Main Memory Usage
(Input rate: 30ms/input, tuple range: 30K, join
ratio: 3, adaptation threshold: 200MB)