Title: DCAPE: Distributed and Self-Tuned Continuous Query Processing
1. DCAPE: Distributed and Self-Tuned Continuous Query Processing
- Tim Sutherland, Bin Liu, Mariana Jbantova, and Elke A. Rundensteiner
- Department of Computer Science, Worcester Polytechnic Institute
- 100 Institute Road, Worcester, MA 01609
- Tel: 1-508-831-5857, Fax: 1-508-831-5776
- tims, binliu, jbantova, rundenst_at_cs.wpi.edu
- CIKM 2005 Poster
- http://davis.wpi.edu/dsrg/CAPE/index.html
2. Uncertainties in Stream Query Processing
- Users register continuous queries and receive answers.
  - High workload of queries.
  - Real-time and accurate responses are required.
- Streaming data enters the distributed stream query engine, which emits a streaming result.
  - Streams may have time-varying rates and high volumes.
  - The resources available for executing each operator may vary over time.
  - Memory and CPU resources are limited.
- Distribution and adaptation are required.
3. Adaptation in DCAPE Distributed Stream Processing in a Nutshell
- Adaptation Techniques
  - Spilling data to disk
  - Relocating work to other machines
  - Reoptimizing and migrating the query plan
- Granularity of Adaptation
  - Operator-level distribution and adaptation
  - Partition-level distribution and adaptation
- Integrated Methodologies
  - Consider trade-offs between spill and redistribute
  - Consider trade-offs between migrate and redistribute
4. CAPE System Architecture [LZ05, TLJ05]
[Figure: architecture diagram with the following labelled parts.]
- Stream Servers, Streaming Data, Network, End User
- Distribution Manager
- Query Processor
- CAPE-Continuous Query Processing Engine
- Runtime Monitor, Connection Manager, Query Plan Manager, Repository
- Global Plan Migrator, Global Adaptation Controller
- Local Plan Migrator, Local Statistics Gatherer, Local Adaptation Controller
- Data Distributor, Data Receiver
5. Random, Balanced, and Network Aware Distribution
- Network Aware Distribution
  - Goal: To minimize network connectivity.
  - Algorithm: Takes each query plan and creates sub-plans in which neighbouring operators are grouped together.
- Balanced Distribution
  - Goal: To equalize workload per machine.
  - Algorithm: Iteratively takes each query operator and places it on the query processor with the fewest operators.
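The two algorithms above can be sketched roughly as follows. This is a minimal illustration, not the CAPE implementation: the function names, the representation of a plan as a list of operator names in pipeline order, and the choice to cut each plan into equally sized contiguous chunks are all assumptions made for the example.

```python
def balanced_distribution(operators, processors):
    """Place each operator on the processor that currently holds the fewest operators.

    Ties go to the processor listed first (an assumption of this sketch).
    """
    assignment = {p: [] for p in processors}
    for op in operators:
        target = min(processors, key=lambda p: len(assignment[p]))
        assignment[target].append(op)
    return assignment


def network_aware_distribution(plans, processors):
    """Cut each plan into contiguous sub-plans so neighbouring operators share a processor.

    Each plan is assumed to be a list of operators in pipeline order; equal-size
    chunking is a stand-in for whatever grouping rule the real system applies.
    """
    assignment = {p: [] for p in processors}
    n = len(processors)
    for plan in plans:
        size = -(-len(plan) // n)  # ceiling division
        for i, p in enumerate(processors):
            assignment[p].extend(plan[i * size:(i + 1) * size])
    return assignment


if __name__ == "__main__":
    ops = ["op1", "op2", "op3", "op4", "op5"]
    print(balanced_distribution(ops, ["M1", "M2"]))
    print(network_aware_distribution([["a", "b", "c", "d", "e"]], ["M1", "M2"]))
```

In the balanced sketch, operators alternate between machines regardless of how they are connected; in the network-aware sketch, only the cut points between chunks produce cross-machine connections.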
6. Initial Distribution of Query Plan Across Cluster of Machines
[Figure: query plan distributed across machines M1 and M2 in two steps.]
- Step 1: Create distribution table using initial distribution algorithm.
- Step 2: Send distribution information to processing machines (nodes).
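A minimal sketch of the two steps, under stated assumptions: the distribution table is modelled as a mapping from operator name to machine name, a round-robin placement stands in for the initial distribution algorithm, and "sending" is reduced to building one message per node. All names here are hypothetical.

```python
def create_distribution_table(operators, machines):
    """Step 1: map every operator to a machine.

    Round-robin is used only as a placeholder for the initial distribution algorithm.
    """
    return {op: machines[i % len(machines)] for i, op in enumerate(operators)}


def distribution_messages(table):
    """Step 2: group the table by machine, giving the operator list each node should receive."""
    messages = {}
    for op, machine in table.items():
        messages.setdefault(machine, []).append(op)
    return messages


if __name__ == "__main__":
    table = create_distribution_table(["scan", "select", "join"], ["M1", "M2"])
    for machine, ops in distribution_messages(table).items():
        print(machine, "activates", ops)
```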
7. Run-Time Plan Redistribution
- Cost per machine is determined as the percentage of memory filled with tuples.
- Operators are redistributed based on a redistribution policy.
- Redistribution policies in CAPE: Balance and Degradation.
[Figure: Cost Table (current) and Cost Table (desired) under the Balance policy; legend: M1, M2.]
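The cost measure and a Balance-style step can be illustrated as below. The slides do not spell out how the Balance policy chooses which operator to move, so the rule used here (move the single operator that most reduces the cost gap between the most and least loaded machine, and only when the gap exceeds a threshold) is an assumption, as are all function and parameter names.

```python
def machine_costs(placement, op_tuples, capacity):
    """Cost per machine = percentage of its memory filled with tuples.

    placement: operator -> machine, op_tuples: operator -> tuples held in state,
    capacity: machine -> number of tuples that fit in memory.
    """
    load = {m: 0 for m in capacity}
    for op, machine in placement.items():
        load[machine] += op_tuples[op]
    return {m: 100.0 * load[m] / capacity[m] for m in capacity}


def balance_once(placement, op_tuples, capacity, threshold=10.0):
    """Move one operator from the most to the least loaded machine, if that helps.

    Returns (operator, source, destination), or None when no move is made.
    """
    costs = machine_costs(placement, op_tuples, capacity)
    src = max(costs, key=costs.get)
    dst = min(costs, key=costs.get)
    gap = costs[src] - costs[dst]
    if gap <= threshold:
        return None
    best = None
    for op, machine in placement.items():
        if machine != src:
            continue
        trial = dict(placement)
        trial[op] = dst
        trial_costs = machine_costs(trial, op_tuples, capacity)
        new_gap = abs(trial_costs[src] - trial_costs[dst])
        if new_gap < gap and (best is None or new_gap < best[0]):
            best = (new_gap, op)
    if best is None:
        return None
    placement[best[1]] = dst
    return (best[1], src, dst)


if __name__ == "__main__":
    placement = {"join1": "M1", "join2": "M1", "select1": "M2"}
    op_tuples = {"join1": 400, "join2": 300, "select1": 100}
    capacity = {"M1": 1000, "M2": 1000}
    print(machine_costs(placement, op_tuples, capacity))
    print(balance_once(placement, op_tuples, capacity))
    print(machine_costs(placement, op_tuples, capacity))
```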
8. Redistribution Protocol Across Machines
- No tuples lost
- No duplicates produced
- No incorrect results produced
- Seamless
9. Query Plan Performance with Query Plan of 40 Operators
- Observations
  - Initial distribution is important for query plan performance.
  - Redistribution at run-time improves query plan performance.
10. From Operator- to Partition-level Adaptation
- Problem of operator-level adaptation
  - Operators have large states.
  - Moving them across machines can be expensive.
- Solution: partition-level adaptation
  - Partition state-intensive operators [Gra90, SH03, LR05]
  - Distribute the partitioned plan across multiple machines
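The idea of partitioning a state-intensive operator can be sketched as follows. This is an illustrative example only: hash partitioning on a join key, the use of `zlib.crc32` as the hash, and every function name are assumptions, not details taken from the cited work.

```python
import zlib


def partition_id(join_key, num_partitions):
    """Stable hash partitioning: equal keys always map to the same partition."""
    return zlib.crc32(str(join_key).encode("utf-8")) % num_partitions


def partition_state(tuples, key, num_partitions):
    """Split an operator's state (a list of dict tuples) into partitions by join key."""
    partitions = {pid: [] for pid in range(num_partitions)}
    for t in tuples:
        partitions[partition_id(t[key], num_partitions)].append(t)
    return partitions


def relocate_partition(partition_to_machine, pid, new_machine):
    """Reassign one partition; only that partition's tuples need to cross the network."""
    updated = dict(partition_to_machine)
    updated[pid] = new_machine
    return updated


if __name__ == "__main__":
    state = [{"id": k, "value": v} for k, v in [(1, "a"), (2, "b"), (1, "c"), (3, "d")]]
    parts = partition_state(state, "id", num_partitions=4)
    mapping = {pid: "M1" for pid in parts}
    mapping = relocate_partition(mapping, 0, "M2")
    moved = len(parts[0])
    print(f"relocating partition 0 moves {moved} of {len(state)} tuples")
```

Because tuples with the same key always fall into the same partition, a partition can be joined independently of the others, and relocation moves a fraction of the operator's state rather than all of it.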
11. CAPE Publications and Reports
- [RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech, "CAPE: A Constraint-Aware Adaptive Stream Processing Engine". Invited Book Chapter. http://www.cs.uno.edu/nauman/streamBook/. July 2004.
- [ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman, "Dynamic Plan Migration for Continuous Queries Over Data Streams". SIGMOD 2004, pages 431-442.
- [DMR04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman, "Joining Punctuated Streams". EDBT 2004, pages 587-604.
- [DR04] L. Ding and E. A. Rundensteiner, "Evaluating Window Joins over Punctuated Streams". CIKM 2004, to appear.
- [DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman, "MJoin: A Metadata-Aware Stream Join Operator". DEBS 2003.
- [RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta, "CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity". Demonstration Paper. VLDB 2004.
- [SR04] T. Sutherland and E. A. Rundensteiner, "D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture". Tech Report, WPI-CS-TR-04-18, 2004.
- [SPR04] T. Sutherland, B. Pielech, Yali Zhu, Luping Ding, and E. A. Rundensteiner, "Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing". IDEAS 2005.
- [SLJR05] T. Sutherland, B. Liu, M. Jbantova, and E. A. Rundensteiner, "D-CAPE: Distributed and Self-Tuned Continuous Query Processing". CIKM, Bremen, Germany, Nov. 2005.
- [LR05] Bin Liu and E. A. Rundensteiner, "Revisiting Pipelined Parallelism in Multi-Join Query Processing". VLDB 2005.
- [B05] Bin Liu and E. A. Rundensteiner, "Partition-based Adaptation Strategies Integrating Spill and Relocation". Tech Report, WPI-CS-TR-05, 2005. (in submission)
- CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html