Characterizing Memory Requirements for Queries over Continuous Data Streams PowerPoint PPT Presentation


Title: Characterizing Memory Requirements for Queries over Continuous Data Streams


1
Characterizing Memory Requirements for Queries
over Continuous Data Streams
  • Arvind Arasu, Brian Babcock, Shivnath Babu, Jon
    McAlister, Jennifer Widom

Stanford University
Speaker
2
Continuous Data Streams
  • Network traffic data
  • Transaction logs
  • Call records, Web logs, ...
  • Financial data
  • Sensor networks
  • Scientific data
  • Astronomy, Biology, ...

3
A DBMS for Data Streams?
  • Lots of existing work in data streams
  • Mostly special-purpose applications
  • We're building a general-purpose data stream
    management system (DSMS)

http://www-db.stanford.edu/stream/
4
RDBMS
DSMS
5
Query Execution Model
1. Client registers query with the DSMS
2. Tuples arrive on streams (S, T)...
   ...are read and discarded...
   ...and answers returned to client
Limited-size scratch space (memory) available in the DSMS
6
Our Problem
Given a data stream query, determine how much
memory is required to evaluate it.
7
Queries We Consider
  • SPJ queries: π_L(σ_P(S1 × S2 × ... × Sn))
  • Projection is either duplicate-preserving or
    duplicate-eliminating
  • Selection predicates are conjunctions of
  • Si.A Op Sj.B -or- Si.A Op k
  • Op ∈ {>, >=, =, <=, <}
  • All attributes are integers

8
An example with no joins
SELECT cust_id FROM orders WHERE amt > 5
  • Requires no scratch memory
  • Each tuple is independent
  • Tuples in the answer are streamed away

SELECT DISTINCT cust_id FROM orders WHERE amt > 5
  • Requires unbounded memory
  • All cust_ids must be remembered

SELECT DISTINCT cust_id FROM orders WHERE amt > 5
AND cust_id > 1000 AND cust_id < 9999
  • Requires bounded memory
  • Remember cust_ids from 1000-9999
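
The three cases on this slide can be sketched in Python. This is only an illustration, not the presenters' implementation: the function names, the `(cust_id, amt)` tuple layout, and the `lo`/`hi` parameters are assumptions made for the example.

```python
def select_plain(orders):
    """SELECT cust_id FROM orders WHERE amt > 5.

    No state is kept between tuples: each one is tested and either
    streamed out or dropped.
    """
    for cust_id, amt in orders:
        if amt > 5:
            yield cust_id


def select_distinct(orders, lo=None, hi=None):
    """SELECT DISTINCT cust_id FROM orders WHERE amt > 5
    [AND cust_id > lo AND cust_id < hi].

    `seen` is the scratch space. Without lo/hi it can grow without limit;
    with both bounds it never holds more than hi - lo - 1 integers.
    """
    seen = set()
    for cust_id, amt in orders:
        if amt <= 5:
            continue
        if lo is not None and hi is not None and not (lo < cust_id < hi):
            continue
        if cust_id not in seen:
            seen.add(cust_id)
            yield cust_id
```

With `lo=1000, hi=9999` the set is capped by the range of admissible integer cust_ids, which is why the range predicate turns the unbounded DISTINCT query into a bounded one.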

9
An example with an equijoin
SELECT R.prod_id FROM orders O, returns R WHERE
O.order_num = R.order_num AND R.prod_id > 100
AND R.prod_id < 199
AND O.order_num > 1000 AND O.order_num < 1103
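
One way to see why the range predicates matter is to evaluate the full query (including the order_num range) with counters instead of stored tuples. The sketch below is an assumption-laden illustration, not the algorithm from the paper: the event encoding `("O", order_num)` / `("R", order_num, prod_id)` and the function name are invented for the example.

```python
def equijoin_bounded(events):
    """Duplicate-preserving join evaluated with a fixed set of counters.

    events: iterable of ("O", order_num) or ("R", order_num, prod_id),
    in arrival order. Yields R.prod_id once per joining (O, R) pair.
    """
    o_count = {}  # order_num -> number of qualifying O tuples seen
    r_count = {}  # (order_num, prod_id) -> number of qualifying R tuples seen
    for ev in events:
        if ev[0] == "O":
            n = ev[1]
            if not (1000 < n < 1103):
                continue
            # Join the new O tuple with every R tuple seen so far.
            for (rn, p), c in r_count.items():
                if rn == n:
                    for _ in range(c):
                        yield p
            o_count[n] = o_count.get(n, 0) + 1
        else:
            _, n, p = ev
            if not (1000 < n < 1103 and 100 < p < 199):
                continue
            # Join the new R tuple with every O tuple seen so far.
            for _ in range(o_count.get(n, 0)):
                yield p
            r_count[(n, p)] = r_count.get((n, p), 0) + 1
```

Because order_num and prod_id are integers confined to fixed ranges, the number of keys in `o_count` and `r_count` is limited no matter how long the streams run; only the counter values grow.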
10
An example with an inequality
SELECT O.prod_id FROM orders O, inventory I WHERE
O.amt > I.qty AND O.prod_id > 100 AND O.prod_id < 300

SELECT DISTINCT O.prod_id FROM orders O, inventory I WHERE
O.amt > I.qty AND O.prod_id > 100 AND O.prod_id < 300
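
For the DISTINCT variant, a small summary of each stream appears to be enough. The following is a minimal sketch under stated assumptions, not the paper's synopsis construction: the event encoding `("O", prod_id, amt)` / `("I", qty)` and all names are hypothetical.

```python
def distinct_inequality(events):
    """SELECT DISTINCT O.prod_id ... WHERE O.amt > I.qty
    AND O.prod_id > 100 AND O.prod_id < 300, using summaries only.

    events: iterable of ("O", prod_id, amt) or ("I", qty).
    """
    min_qty = None   # smallest I.qty seen so far
    pending = {}     # prod_id -> largest O.amt seen, not yet in the answer
    emitted = set()  # prod_ids already output
    for ev in events:
        if ev[0] == "O":
            _, pid, amt = ev
            if not (100 < pid < 300) or pid in emitted:
                continue
            if min_qty is not None and amt > min_qty:
                pending.pop(pid, None)
                emitted.add(pid)
                yield pid
            else:
                pending[pid] = max(amt, pending.get(pid, amt))
        else:
            qty = ev[1]
            if min_qty is None or qty < min_qty:
                min_qty = qty
            # A lower qty may let previously pending prod_ids join.
            ready = sorted(p for p, a in pending.items() if a > min_qty)
            for p in ready:
                del pending[p]
                emitted.add(p)
                yield p
```

The state is one integer plus at most one entry per admissible prod_id. The duplicate-preserving variant has no such shortcut in this sketch, since every later I tuple would need to be matched against each individual O.amt seen earlier.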
11
Locally Totally Ordered Queries
  • LTO queries: SPJ queries with additional
    predicates applied
  • For each stream, stipulate a total order for all
    attributes in the stream and all constants
  • Only allow tuples whose attribute values follow
    that ordering
  • All SPJ queries can be written as a union of LTO
    queries

12
Example of an LTO query
Stream S (A, B)
Stream T (C, D)
SELECT S.A, T.C FROM S, T WHERE S.B > 12

SELECT S.A, T.C FROM S, T WHERE S.B > 12 AND S.A
= S.B AND T.D < T.C AND T.C < 12
13
MinRef and MaxRef
  • For each stream S in the query
  • MinRef(S) = { S.A | S.A < T.B is a necessary
    inequality in the predicate }

14
Bounded-Memory Conditions
  • 1. All attributes in the projection list must be
    bounded.
  • 2. All attributes participating in equijoins must
    be bounded.
  • 3. In each stream S, |MinRef(S)| + |MaxRef(S)|
  • = 0, for SELECT
  • <= 1, for SELECT DISTINCT
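
The three conditions can be written as a small check. This is a hedged sketch: it assumes the caller has already worked out which attributes are bounded and what MinRef and MaxRef are for each stream; the function name and argument layout are invented for illustration.

```python
def is_bounded_memory(projection, equijoin_attrs, bounded,
                      min_ref, max_ref, distinct):
    """Apply the three bounded-memory conditions from the slide.

    projection, equijoin_attrs, bounded: sets of attribute names.
    min_ref, max_ref: dicts mapping stream name -> set of attributes.
    distinct: True for SELECT DISTINCT, False for plain SELECT.
    """
    # 1. Every attribute in the projection list must be bounded.
    if not all(a in bounded for a in projection):
        return False
    # 2. Every attribute participating in an equijoin must be bounded.
    if not all(a in bounded for a in equijoin_attrs):
        return False
    # 3. Per stream: |MinRef| + |MaxRef| is 0 (SELECT) or at most 1 (DISTINCT).
    limit = 1 if distinct else 0
    streams = set(min_ref) | set(max_ref)
    return all(
        len(min_ref.get(s, ())) + len(max_ref.get(s, ())) <= limit
        for s in streams
    )
```

For instance, a query whose stream T has two attributes in MaxRef(T) fails condition 3 even under SELECT DISTINCT, while one attribute per stream passes for DISTINCT but not for plain SELECT.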

15
An unbounded example
SELECT DISTINCT T.E FROM S, T WHERE T.E = 10
AND S.A < T.C AND S.B < T.D
16
Conclusion
  • We consider SPJ queries over data streams
  • We identify which queries can and cannot be
    evaluated using bounded memory
  • For queries that can, we provide an execution
    strategy based on synopses.
  • For queries that cannot, we provide examples of
    bad input streams.

Full paper at http://www-db.stanford.edu/???
E-mail: babcock_at_cs.stanford.edu