Continuously Adaptive Continuous Queries over Streams

1
Continuously Adaptive Continuous Queries over
Streams
SIGMOD 2002
(some slides were taken from Madden's SIGMOD
presentation)
  • Samuel Madden
  • Mehul Shah
  • Joseph M. Hellerstein
  • Vijayshankar Raman

Presented by Ippokratis Pandis
15-823 Hot Topics in DB Systems
2
Introduction CQ 1
  • Description
  • Streams of Data (Sensors/Web pages/Stock
    Analysis/Telephony/...)
  • Users register logical specifications of interest
  • Engine filters, combines data and returns result
  • Some Characteristics
  • Proposed systems are based on static plans
  • But, CQs are long running
  • Initially valid assumptions less so over time
  • Static optimizers at their worst!

CQ systems should be Adaptive
3
Introduction CQ 2
  • Long running, standing queries, similar to
    trigger systems
  • Exclusively read-only operations
  • Once installed, continuously produce results until
    removed
  • Lots of queries, over the same data sources
  • Global query optimization problem: hard!
  • Idea: adaptive heuristics, not quite as hard?
  • Bad decisions are not final

Opportunities for work sharing
4
Introduction - Eddies
  • No need to re-present them
  • Properties
  • Data-flow-oriented components
  • No static ordering of operators
  • Adapt quickly to the fluctuating environment
  • Policy dynamically orders operators on a per
    tuple basis
  • done and ready bits encode where tuple has been,
    where it can go
  • Routing policies use back-pressure and lottery
    picking to favor fast and high-filtering
    operators

5
Idea
  • CQ
  • Eddies
  • SteMs/Grouped Filters
  • CACQ

6
CACQ Implementation
  • Monotonic queries, from point when query is
    registered
  • Streaming answers
  • Non-blocking operators
  • Windowed Symmetric Joins (Windows in tuples or
    time)

7
Single Query, Single Source
  • Use ready bits to track what to do next
  • All 1s in single source
  • Use done bits to track what has been done
  • Tuple can be output when all bits set
  • Routing policy dynamically orders tuples

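[Figure: tuples R1 and R2 are routed by the eddy through the two filters of the query below; ready and done bit snapshots along the way: 1 1 0 0, 1 1 0 1, 1 1 0 0, 1 1 1 0, 1 1 1 1]
SELECT * FROM R WHERE R.a > 10 AND R.b < 15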
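The ready/done bookkeeping on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CACQ implementation: the names `FILTERS`, `ALL_DONE`, and `route` are hypothetical, and a uniform random choice stands in for a real routing policy.

```python
import random

# Hypothetical filters for a query like:
#   SELECT * FROM R WHERE R.a > 10 AND R.b < 15
FILTERS = [
    lambda t: t["a"] > 10,   # operator 0
    lambda t: t["b"] < 15,   # operator 1
]
ALL_DONE = (1 << len(FILTERS)) - 1  # bitmap with every done bit set


def route(tup, rng=random):
    """Route one tuple through the filters in a per-tuple order.

    Returns True if the tuple passes every filter (it can be output),
    False as soon as some filter drops it.
    """
    ready = ALL_DONE   # single source: all ready bits are 1
    done = 0           # no operator has processed the tuple yet
    while done != ALL_DONE:
        # Operators that are ready and not yet done are eligible.
        eligible = [i for i in range(len(FILTERS))
                    if ((ready >> i) & 1) and not ((done >> i) & 1)]
        op = rng.choice(eligible)   # assumption: random stand-in for the policy
        if not FILTERS[op](tup):
            return False            # dropped by this operator
        done |= 1 << op             # record that the operator has been applied
    return True
```

Because the order is chosen per tuple, two tuples may visit the filters in different orders, yet the result for each tuple is the same.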
8
Multiple Queries 1
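[Figure: tuple R1 is routed through two grouped filters, one over R.a (R.a > 10, R.a > 20, R.a = 0) and one over R.b (R.b < 15, R.b = 25, R.b <> 50); bit snapshots along the way: 0 0 0 0 0, 0 0 1 0 0, 0 1 1 0 0, 0 1 1 1 1, 1 1 1 1 1]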
9
Multiple Queries 2
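[Figure: tuple R2 is routed through the same grouped filters over R.a (R.a > 10, R.a > 20, R.a = 0) and R.b (R.b < 15, R.b = 25, R.b <> 50), but in a different order; bit snapshots along the way: 1 1 1 1 1, 0 0 0 0 0, 0 0 0 1 1, 1 0 0 1 1, 1 1 0 1 1]
Reorder Operators!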
10
Outputting Tuples
  • Store a completionMask bitmap for each query
  • One bit per operator
  • Set if the operator is in the query
  • To determine if a tuple t can be output to query
    q
  • Eddy ANDs q's completionMask with t's done bits
  • Output only if q's bit is not set in t's
    queriesCompleted bits
  • Applied every time a tuple returns from an
    operator

[Figure: completionMasks Q1 = 1100 and Q2 = 0111; example tuples with Done = 1100 and Done = 0111, QueriesCompleted = 0 0]
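The output test described above can be sketched as a small bitmap check. This is a minimal Python sketch under assumptions: the function name `queries_to_output` and the choice of bit `i` of `queries_completed` for query `i` are hypothetical, and the masks are taken from the slide's example (Q1 = 1100, Q2 = 0111).

```python
def queries_to_output(done, queries_completed, completion_masks):
    """Decide which queries a tuple can be output to right now.

    done              -- bitmap of operators the tuple has passed
    queries_completed -- bitmap of queries the tuple was already output to
    completion_masks  -- one bitmap per query; bit set if the operator
                         is part of that query

    Returns (list of query indexes to output to, updated queries_completed).
    """
    outputs = []
    for qid, mask in enumerate(completion_masks):
        q_bit = 1 << qid
        all_ops_done = (done & mask) == mask        # AND mask with done bits
        already_output = bool(queries_completed & q_bit)
        if all_ops_done and not already_output:
            outputs.append(qid)
            queries_completed |= q_bit              # never output twice
    return outputs, queries_completed


# Masks from the slide's example: index 0 is Q1, index 1 is Q2.
MASKS = [0b1100, 0b0111]
```

With `MASKS`, a tuple whose done bits are 1100 is delivered to Q1 only; once its done bits reach 1111 it is delivered to Q2, and Q1 is skipped because its queriesCompleted bit is already set.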
11
Grouped Filters 1
  • Use binary trees to efficiently index range
    predicates
  • Two trees (LT and GT) per attribute
  • Insert constant
  • When tuple arrives
  • Scan everything to right (for GT) or left (for
    LT) of the tuple-attribute in the tree
  • Those are the queries that the tuple does not
    pass
  • Hash tables to index equality, inequality
    predicates
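A grouped filter for greater-than predicates might look like the sketch below. It is an illustration under assumptions, not the paper's data structure: a sorted list searched with `bisect` stands in for the binary tree, and the class name `GreaterThanFilter` and its methods are hypothetical.

```python
import bisect


class GreaterThanFilter:
    """Grouped filter for predicates of the form attr > c, one per query."""

    def __init__(self):
        self._consts = []   # predicate constants, kept sorted
        self._qids = []     # query ids, parallel to _consts

    def insert(self, qid, const):
        """Register the predicate `attr > const` for query `qid`."""
        i = bisect.bisect_left(self._consts, const)
        self._consts.insert(i, const)
        self._qids.insert(i, qid)

    def failing(self, value):
        """Queries the tuple does NOT pass: constants at or to the
        right of `value` in sorted order (value > c is false)."""
        i = bisect.bisect_left(self._consts, value)
        return set(self._qids[i:])

    def passing(self, value):
        """Queries whose predicate holds: constants strictly left of `value`."""
        i = bisect.bisect_left(self._consts, value)
        return set(self._qids[:i])
```

One lookup per tuple replaces one predicate evaluation per query. A less-than filter would mirror this by scanning to the left instead.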

12
Grouped Filters 2
Greater-than tree over S.a
S.a > 1, S.a > 7, S.a > 11
13
Work sharing through Lineage
Q1: SELECT * FROM s WHERE A, B, C
Q2: SELECT * FROM s WHERE A, B, D
[Figure: alternative plans over data stream S, built from selection operators over A, B, C and D (labeled sA, sB, sC, sD, and combinations such as sAB and sCD)]
Conventional Queries (Query 1, Query 2)
AB must be applied first!
Intersection of CD goes through AB an extra time!
Lineage (Queries Completed) Enables Any Ordering!
14
Overhead vs. Work Sharing
  • Overhead in additional bits per tuple
  • Experiments studying performance, size in paper
  • Bit / query / tuple is most significant
  • Trading accounting overhead for work sharing
  • 100 bits / tuple allows a tuple to be processed
    once, not 100 times
  • Reduce overhead by not keeping state about
    operators tuple will never pass through

15
Joins
  • Use symmetric hash join to avoid blocking
  • Use State Modules (SteMs) to share storage
    between joins with a common base relation

16
Joins via SteMs
  • Idea: share join indices over base relations
  • State Modules (SteMs) are:
  • Unary indexes (e.g. hash tables, trees)
  • Built on the fly (as data arrives)
  • Scheduled by CACQ as first class operators
  • Based on symmetric hash join
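The idea of a SteM as a hash index that is built on the fly and probed by the other side of a join can be sketched as follows. This is a simplified Python illustration under assumptions: the names `SteM`, `build`, `probe`, and `symmetric_join` are hypothetical, windows are omitted, and a single two-way join is shown rather than CACQ's per-tuple scheduling of SteMs.

```python
from collections import defaultdict


class SteM:
    """Hash index over one base relation, filled as tuples arrive."""

    def __init__(self, key):
        self.key = key                    # join attribute name
        self.index = defaultdict(list)    # join value -> stored tuples

    def build(self, tup):
        """Store an arriving tuple under its join value."""
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        """Return the stored tuples that match a join value."""
        return list(self.index.get(value, []))


def symmetric_join(stream, stem_r, stem_s):
    """Non-blocking join of R and S.

    `stream` yields (source, tuple) pairs with source "R" or "S".
    Each tuple is first built into its own SteM, then probes the
    other one, so results stream out as soon as a match exists.
    """
    for source, tup in stream:
        if source == "R":
            stem_r.build(tup)
            for s in stem_s.probe(tup[stem_r.key]):
                yield (tup, s)
        else:
            stem_s.build(tup)
            for r in stem_r.probe(tup[stem_s.key]):
                yield (r, tup)
```

Because the index over a base relation lives in its own module, a second join over the same relation could probe the same `SteM` instead of building a private copy.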

17
Routing Policies 1
  • Basic Lottery Ticket Policy
  • Give operators tickets for consuming tuples, take
    away tickets for producing them
  • To choose the next operator to route, run a
    lottery
  • More selective operators scheduled earlier
  • Modification for CACQ
  • Give more tickets to operators shared by multiple
    queries (e.g. grouped filters)
  • When a shared operator outputs a tuple, charge it
    multiple tickets
  • Intuition: cardinality-reducing shared operators
    reduce global work more than unshared operators
  • Not optimizing for the throughput of a single
    query!
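The ticket accounting described above can be sketched as a weighted lottery. This is a minimal Python sketch under assumptions: the class `LotteryRouter` and its method names are hypothetical, the `weight` argument models "multiple tickets" for shared operators (for example, the number of queries sharing the operator), and the floor of one ticket is an addition of this sketch so that every operator keeps a nonzero chance of being chosen.

```python
import random


class LotteryRouter:
    """Lottery-ticket routing: selective operators accumulate tickets."""

    def __init__(self, operators, rng=None):
        self.tickets = {op: 1 for op in operators}   # start with one ticket each
        self.rng = rng or random.Random()

    def choose(self, eligible):
        """Run a lottery among eligible operators, weighted by tickets."""
        weights = [self.tickets[op] for op in eligible]
        return self.rng.choices(eligible, weights=weights, k=1)[0]

    def consumed(self, op, weight=1):
        """Operator consumed a tuple: give it tickets."""
        self.tickets[op] += weight

    def produced(self, op, weight=1):
        """Operator produced a tuple: take tickets away.

        Assumption of this sketch: never drop below one ticket.
        """
        self.tickets[op] = max(1, self.tickets[op] - weight)
```

An operator that consumes many tuples but returns few ends up holding more tickets, so it tends to be scheduled earlier. Passing a larger `weight` for an operator shared by several queries gives it more tickets per consumed tuple and charges it more per produced tuple.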

18
Routing Policies 2
Query
All attributes uniformly distributed over [0,100]
19
CACQ vs. NiagaraCQ
  • NiagaraCQ is another proposal for CQ which uses
    static plans
  • CACQ does better since
  • It is adaptive
  • It can exploit more work sharing opportunities

20
Summary
  • Efficient mechanism for processing multiple
    simultaneous monitoring queries over streaming
    data sources
  • Share work by processing all queries within a
    single eddy
  • Continuous adaptivity via the eddy's routing policy
  • Queries come and go, but performance adapts without
    costly multiquery reoptimization
  • Maximize ability to work share by explicitly
    encoding lineage
  • Share selections via grouped filter
  • Share join state via SteMs
  • Experimental results show good performance in
    comparison with other proposed CQ systems

21
Discussion
  • What was the actual intellectual contribution of
    this paper?
  • Performance on real data?
  • What is the overhead of the routing? Other
    routing policies?
  • How often do we collect data? Can we do some
    processing of the data in the nodes that acquire
    it, in order to reduce bandwidth?
  • Hardware support for those service-oriented
    components?

22
Thank you!!
ipandis_at_cs.cmu.edu