Title: Engine Issues for Data Stream Processing
1. Engine Issues for Data Stream Processing
- Mike Franklin
- UC Berkeley
- 1st Duodecennial SWiM Meeting
- January 9, 2003
2. Panel Goals
- Identify the key areas where existing database engine technology falls short in supporting data streams.
  - Give a succinct justification for each area.
  - (The Oracle talk at CIDR showed how to do a lot of interesting things using tables, standard indexes, and SQL.)
- Identify interesting research areas and open problems.
- Lay out a road map for progress.
- Point out possible solutions, non-solutions, or just potential cool things.
3. Panel Structure
- Approach: panelists were asked to identify their #1 concern in engine design.
- Panelists (in order of descending distance traveled):
  - Alex Buchmann
  - Ugur Cetintemel
  - Ted Johnson
  - Jennifer Widom
4. My #1 Issue(s): Sharing and Adaptivity
- Sharing
  - Opportunity: standing queries
    - We can see and analyze most of the queries as a group.
    - Long-lived queries mean that benefits accrue and costs are amortized.
  - Benefit: scalability
    - Obvious: avoid duplicate work.
    - We need to keep up with the dataflow; we don't want to stall the pipeline (similar to staged DB ideas).
    - Reduce the cost of entry for new queries.
- Adaptivity
  - No statistics; dynamic environment.
  - In particular, the query mix and workload intensity continually fluctuate.
5. Common Sub-expressions
- Traditional MQO approaches suffer from the same problems as traditional QP approaches in streaming environments:
  - namely, they are static.
- Insertion and removal of queries degrade global plan quality over time.
- Two approaches:
  - YFilter: shared XML filtering
  - TelegraphCQ: extreme adaptive QP
6. YFilter: Shared Processing (Yanlei Diao)
- XFilter showed how to use an event-based (SAX) parser to drive state transitions for XML filtering.
- YFilter uses an NFA-based approach to share work among queries.
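To make the XFilter point concrete, here is a deliberately simplified sketch, assuming child-axis-only paths such as /a/b/c; it is not XFilter's actual index structure. Python's standard `xml.sax` parser fires `startElement`/`endElement` callbacks, and each callback advances a separate per-query matcher (`PerQueryMatcher` and `filter_document` are hypothetical names). Because every query keeps its own state, a prefix such as /a that many queries share is re-checked once per query; that duplicated work is what YFilter's shared NFA aims to eliminate.

```python
import xml.sax


class PerQueryMatcher(xml.sax.ContentHandler):
    """Simplified, hypothetical sketch: one matcher per query, each advanced
    by SAX start/end events. Handles child-axis paths only (e.g. /a/b/c)."""

    def __init__(self, queries):
        super().__init__()
        # query id -> location steps, e.g. "/a/b" -> ["a", "b"]
        self.steps = {q: p.strip("/").split("/") for q, p in queries.items()}
        # query id -> stack of "steps matched so far", one entry per open element
        self.state = {q: [0] for q in queries}
        self.depth = 0
        self.matches = set()

    def startElement(self, name, attrs):
        self.depth += 1
        # Every query is examined separately: no work is shared across queries.
        for q, steps in self.steps.items():
            prev = self.state[q][-1]
            # Advance only if all ancestors matched and this tag is the next step.
            advance = (prev == self.depth - 1 and prev < len(steps)
                       and steps[prev] == name)
            cur = prev + 1 if advance else prev
            self.state[q].append(cur)
            if advance and cur == len(steps):
                self.matches.add(q)

    def endElement(self, name):
        # Leaving an element restores each query's state for the parent.
        self.depth -= 1
        for stack in self.state.values():
            stack.pop()


def filter_document(xml_bytes, queries):
    """Return the ids of queries that match somewhere in the document."""
    handler = PerQueryMatcher(queries)
    xml.sax.parseString(xml_bytes, handler)
    return handler.matches
```

For example, `filter_document(b"<a><b><c/></b></a>", {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c"})` returns `{"Q1", "Q3"}`.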
7. Combining NFA Fragments
8. YFilter: NFA Structure and Matching
- Q1: /a/b
- Q2: /a/c
- Q3: /a/b/c
- Q4: /a//b/c
- Q5: /a/*/c
- Q6: /a//c
- Q7: /a/*/*/c
- Q8: /a/b/c
- The key to scalability is sharing machine states and processing.
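The following is a minimal sketch of the path-sharing idea, not YFilter's implementation. It assumes paths built only from /, //, element names, and *, ignores predicates, and takes SAX-style ("start", tag)/("end", tag) events as a plain list; `SharedNFA` and its methods are hypothetical names. A / step adds an ordinary transition, // adds an epsilon edge to a state with a * self-loop, and a query whose prefix already exists reuses those states. During matching, a stack of active-state sets is pushed on each start tag and popped on each end tag, so backtracking to the parent is a single pop. The eight queries are the ones on the slide, reading Q5 and Q7 as /a/*/c and /a/*/*/c.

```python
class SharedNFA:
    """Hypothetical sketch of YFilter-style path sharing: all path queries are
    compiled into one NFA, and queries with a common prefix reuse its states."""

    def __init__(self):
        self.trans = [{}]         # trans[s][label] -> next state; label "" is epsilon
        self.self_loop = [False]  # True for states with a '*' self-loop (from '//')
        self.accept = {}          # state -> set of query ids accepted there

    def _step(self, state, label, loop=False):
        # Reuse an existing transition when an earlier query created it (sharing).
        if label not in self.trans[state]:
            self.trans.append({})
            self.self_loop.append(loop)
            self.trans[state][label] = len(self.trans) - 1
        return self.trans[state][label]

    def add_query(self, qid, path):
        state, descendant = 0, False
        for step in path.split("/")[1:]:     # "/a//c" -> ["a", "", "c"]
            if step == "":
                descendant = True            # an empty step marks the '//' axis
                continue
            if descendant:
                # '//': epsilon edge to a state that loops on any element.
                state = self._step(state, "", loop=True)
                descendant = False
            state = self._step(state, step)  # element name or '*'
        self.accept.setdefault(state, set()).add(qid)

    def _closure(self, states):
        # Add every state reachable through epsilon edges.
        todo, out = list(states), set(states)
        while todo:
            nxt = self.trans[todo.pop()].get("")
            if nxt is not None and nxt not in out:
                out.add(nxt)
                todo.append(nxt)
        return out

    def _advance(self, states, tag):
        nxt = set()
        for s in states:
            for label in (tag, "*"):
                if label in self.trans[s]:
                    nxt.add(self.trans[s][label])
            if self.self_loop[s]:
                nxt.add(s)
        return self._closure(nxt)

    def run(self, events):
        """events: ("start", tag) / ("end", tag) pairs, in SAX order."""
        stack, matched = [self._closure({0})], set()
        for kind, tag in events:
            if kind == "start":
                active = self._advance(stack[-1], tag)
                for s in active:
                    matched |= self.accept.get(s, set())
                stack.append(active)
            else:
                stack.pop()  # end tag: back to the parent's active states
        return matched


nfa = SharedNFA()
queries = {"Q1": "/a/b", "Q2": "/a/c", "Q3": "/a/b/c", "Q4": "/a//b/c",
           "Q5": "/a/*/c", "Q6": "/a//c", "Q7": "/a/*/*/c", "Q8": "/a/b/c"}
for qid, path in queries.items():
    nfa.add_query(qid, path)

# Events for the document <a><b><c/></b></a>
events = [("start", "a"), ("start", "b"), ("start", "c"),
          ("end", "c"), ("end", "b"), ("end", "a")]
print(sorted(nfa.run(events)))  # ['Q1', 'Q3', 'Q4', 'Q5', 'Q6', 'Q8']
```

With sharing, the eight queries compile to 13 NFA states (counting the start state), and the identical queries Q3 and Q8 end at the same accepting state.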
9. The TelegraphCQ Approach
- Aggressive adaptivity
  - Say no to static dataflows
  - Continuous adaptivity
- Aggressive sharing
  - Beyond common sub-expressions
  - Easy addition of new queries
- Sharing and adaptivity: two sides of the same coin!
  - Use a single framework for both.
10. Fun with Eddies and SteMs
- Q1: select * from A, B, C, D where A.a = B.b and B.b = C.c and C.c = D.d
- Q2: select * from B, D where B.b = D.d and B.b > 25
[Figure: input streams A, B, C, D; Eddy; Grouped Selection Filter; SteMs for A, B, C, D; Output]
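As an illustration of the two building blocks in the figure, here is a hedged sketch in the spirit of eddy-based processing rather than TelegraphCQ's actual code. A grouped selection filter keeps every standing `attr > constant` predicate on B.b in one sorted array, so a single binary search tells which queries a tuple satisfies; a SteM is modeled as a hash index over one stream's tuples that supports build and probe. `GroupedFilter`, `SteM`, `run_q2`, and the extra queries Qx (B.b > 10) and Qy (B.b > 40) are hypothetical. `run_q2` evaluates Q2 with one fixed route (filter, build into the tuple's own SteM, probe the other SteM), which amounts to a symmetric hash join; a real eddy would choose the route per tuple.

```python
import bisect
from collections import defaultdict


class GroupedFilter:
    """Hypothetical sketch: all 'attr > const' predicates on one attribute,
    kept sorted so one binary search answers every query at once."""

    def __init__(self):
        self.consts = []  # sorted thresholds
        self.qids = []    # qids[i] owns the predicate 'attr > consts[i]'

    def add(self, qid, const):
        i = bisect.bisect_right(self.consts, const)
        self.consts.insert(i, const)
        self.qids.insert(i, qid)

    def matches(self, value):
        # value > c holds exactly for the thresholds strictly below value.
        return set(self.qids[:bisect.bisect_left(self.consts, value)])


class SteM:
    """Hypothetical sketch of a SteM: a hash index on one stream's tuples."""

    def __init__(self, key):
        self.key = key
        self.index = defaultdict(list)

    def build(self, tup):
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        return list(self.index.get(value, []))


def run_q2(arrivals):
    """Q2: B.b = D.d and B.b > 25, over interleaved ('B' | 'D', tuple) arrivals."""
    flt = GroupedFilter()
    for qid, const in (("Q2", 25), ("Qx", 10), ("Qy", 40)):  # Qx, Qy hypothetical
        flt.add(qid, const)
    stem_b, stem_d = SteM("b"), SteM("d")
    results = []
    for stream, tup in arrivals:
        if stream == "B":
            if "Q2" not in flt.matches(tup["b"]):
                continue  # fails B.b > 25, so Q2 never needs this tuple
            stem_b.build(tup)
            results.extend((tup, d) for d in stem_d.probe(tup["b"]))
        else:
            stem_d.build(tup)
            results.extend((b, tup) for b in stem_b.probe(tup["d"]))
    return results
```

For arrivals `[("D", {"d": 30}), ("B", {"b": 30}), ("B", {"b": 20}), ("D", {"d": 20}), ("D", {"d": 30})]`, the B tuple with b = 20 is dropped by the filter, and the result holds two pairs, each joining b = 30 with one of the two d = 30 tuples. Each matching pair is produced exactly once, when the later of its two tuples arrives.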
11. Dynamic Query Addition