Evaluating Window Joins over Punctuated Streams


1
Evaluating Window Joins over Punctuated Streams
  • Luping Ding and Elke A. Rundensteiner
  • Database Systems Research Group
  • Worcester Polytechnic Institute
  • {lisading, rundenst}@cs.wpi.edu

2
Stream Data Processing
  • Online Transaction Management
  • Sensor Network Monitoring
  • Network Usage Analysis
  • Online Auction

[Figure: users register continuous queries with a stream query engine, which consumes streaming data and produces streaming results.]
3
New Challenges in Stream Context
  • Potentially infinite data streams vs. stateful
    operators, e.g., join, distinct
  • Problem: potentially unbounded state
  • Reason: no hint on which data is no longer useful

4
Example: Symmetric Hash Join [WA93]
  • Memory overflow resolution: state relocation
  • Examples: XJoin [UF00], Hash-Merge Join [MLA04]
  • Problems:
  • Join state still grows with no bound
  • Delivery of some join results may be highly
    deferred

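[Figure: symmetric hash join. Each tuple from stream A or B is inserted into its own in-memory state (SA or SB) and probes the opposite state; the growing states lead to memory overflow.]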
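
The insert-and-probe pattern of a symmetric hash join can be sketched as follows. This is a minimal illustration, not the implementation from [WA93]; the function name, the `(stream, key, payload)` event format, and the sample values are assumptions made for the example. Note that nothing is ever removed from either state, which is exactly the unbounded-growth problem named on this slide.

```python
from collections import defaultdict


def symmetric_hash_join(events):
    """Join two streams on a key.

    `events` is an iterable of (stream, key, payload) with stream in {"A", "B"}.
    Returns the list of join results and the final (never-shrinking) state.
    """
    state = {"A": defaultdict(list), "B": defaultdict(list)}
    results = []
    for stream, key, payload in events:
        other = "B" if stream == "A" else "A"
        # Probe: look up the join key in the opposite stream's state.
        for match in state[other].get(key, ()):
            a, b = (payload, match) if stream == "A" else (match, payload)
            results.append((key, a, b))
        # Insert: remember the tuple for future arrivals on the other stream.
        state[stream][key].append(payload)
    return results, state


events = [("A", 175, 80.0), ("B", 175, 20.0), ("A", 175, 100.0), ("B", 181, 50.0)]
results, state = symmetric_hash_join(events)
print(results)
```

Every input tuple stays in `state` after the run, so memory use grows with the length of the streams.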
5
Avoiding Unbounded State
  • Solution: exploit constraints to detect
    no-longer-useful data
  • Sliding window [MWA03]
  • Identifies a bounded set of input data based on
    time
  • k-Constraint [BW04]
  • Models clustered or ordered data arrival patterns
  • Punctuation [TMS03]
  • Dynamically announces termination of certain
    values

6
Sliding Window [KNV03]
[Figure: timeline of Stream A and Stream B with sliding windows Wa and Wb.]
7
Punctuation
  • Meta-knowledge embedded inside data streams
  • An ordered set of patterns corresponding to
    attributes of tuples
  • Wildcard (*), constant (9), list (1,2,3), range
    (1, 20), empty (∅)
  • Semantics: tuples after a punctuation p will NOT
    match p


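Bid stream (Item_id, Bidder_id, Bid_price, bid time), in arrival order:
  • 180, Marlie, 820.00, Nov-13-03 11:02:00
  • 182, Ultrasale, 1000.00, Nov-13-03 11:05:00
  • 180, Jocelyn, 850.00, Nov-13-03 11:14:00
  • Punctuation <180, *, *, *>: no more tuples will
    contain Item_id 180
  • 181, pcfan, 50.00, Nov-13-03 11:36:00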
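
The pattern kinds listed on this slide can be illustrated with a small matcher. This is a sketch under stated assumptions rather than the representation used in [TMS03]: the Python encoding (the string `"*"` for wildcard, `None` for empty, a list for a list pattern, a 2-tuple for an inclusive range) and the function names are invented for this example.

```python
WILDCARD = "*"  # assumed encoding: matches any value
EMPTY = None    # assumed encoding: matches no value


def pattern_matches(pattern, value):
    """Return True if a single attribute value matches a single pattern."""
    if pattern is EMPTY:
        return False
    if isinstance(pattern, list):        # list pattern, e.g. [1, 2, 3]
        return value in pattern
    if isinstance(pattern, tuple):       # range pattern, assumed inclusive
        low, high = pattern
        return low <= value <= high
    if pattern == WILDCARD:
        return True
    return pattern == value              # constant pattern


def punctuation_matches(punctuation, tup):
    """A tuple matches a punctuation if every attribute matches its pattern."""
    return len(punctuation) == len(tup) and all(
        pattern_matches(p, v) for p, v in zip(punctuation, tup)
    )


punct = (180, WILDCARD, WILDCARD, WILDCARD)
late_bid = (180, "Jocelyn", 850.00, "Nov-13-03 11:14:00")
other_bid = (181, "pcfan", 50.00, "Nov-13-03 11:36:00")
print(punctuation_matches(punct, late_bid), punctuation_matches(punct, other_bid))
```

Under the punctuation semantics above, a tuple such as `late_bid` may only arrive before the punctuation; any tuple arriving afterwards must fail the match, as `other_bid` does.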

8
Punctuation-Aware Join [DMR04]
[Figure: join on item_id over Stream A and Stream B with join states SA and SB. Sample tuples include (175, 80.00), (175, 100.00), (175, 20.00), (180, 135.00), (181, 50.00), and (158, 310.00). A punctuation announces that no more tuples will have A = 175.]




9
Window and Punctuation Occur Simultaneously
SELECT A.item_id, Count(*)
FROM Auction [Range 24 Hours] A, Bid B
WHERE A.item_id = B.item_id
GROUP BY A.item_id
[Figure: query plan. The Auction stream and the Bid stream feed Join(item_id), whose output Out1 (item_id) feeds Group-by(item_id, count(*)), which produces Out2 (item_id, count). The Bid stream contains punctuations on item_id; a 24-hour window applies to the Auction stream.]
10
Optimization Opportunities
  • Maintain smaller state than either pure window
    join or pure punctuation-exploiting join
  • Bid tuples that have been joined don't need to be
    maintained in state
  • Drop tuples without affecting precision of result
  • Bid tuples outside the 24-hour window of the
    corresponding Auction tuple don't need to be
    processed
  • Produce some aggregate results earlier
  • Aggregate results for some Auction tuples can be
    produced in less than 24 hours

11
Our Approach: PWJoin
  • Punctuation-exploiting Window Join
  • Features of PWJoin
  • Include optimizations enabled by punctuations and
    by sliding windows individually
  • Accomplish optimizations enabled by interactions
    of two constraint types
  • Employ a state design that effectively
    facilitates constraint-exploiting optimizations

12
PWJoin Basics and Issue
  • On receiving a new tuple ta from stream A:
    invalidate tuples from B state, probe B state,
    then insert ta into A state
  • On receiving a new punctuation pa from stream A:
    purge tuples from B state, then insert pa into
    A state
  • Issue: how to design PWJoin state to facilitate
    all search-based operations?
  • Invalidate conducts time-based search
  • Probe and Purge need value-based search
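
The two event handlers on this slide can be sketched with deliberately naive list-based state, which makes the issue visible: invalidate scans by time while probe and purge scan by value, and each scan touches the whole opposite state. The class name, the symmetric treatment of stream B, and the validity rule (a stored tuple is valid while the new timestamp minus its timestamp is at most the window) are assumptions of this sketch, not details taken from the slides.

```python
class NaivePWJoin:
    """List-based sketch of the PWJoin event flow (no index)."""

    def __init__(self, window):
        self.window = window
        self.tuples = {"A": [], "B": []}        # (ts, key, payload) per stream
        self.puncts = {"A": set(), "B": set()}  # punctuated join values per stream

    def on_tuple(self, stream, ts, key, payload):
        other = "B" if stream == "A" else "A"
        # Invalidate: time-based search over the opposite state.
        self.tuples[other] = [
            t for t in self.tuples[other] if ts - t[0] <= self.window
        ]
        # Probe: value-based search over the opposite state.
        results = [(key, payload, t[2]) for t in self.tuples[other] if t[1] == key]
        # Insert the new tuple into its own state.
        self.tuples[stream].append((ts, key, payload))
        return results

    def on_punctuation(self, stream, key):
        other = "B" if stream == "A" else "A"
        # Purge: value-based search; these tuples can never join again.
        self.tuples[other] = [t for t in self.tuples[other] if t[1] != key]
        # Insert the punctuation into its own state.
        self.puncts[stream].add(key)


join = NaivePWJoin(window=10)
join.on_tuple("B", 1, 175, 20.0)
print(join.on_tuple("A", 5, 175, 80.0))
join.on_punctuation("A", 175)
print(join.tuples["B"])
```

Each result is reported as (join value, payload of the new tuple, payload of the stored tuple).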

13
PWJoin State with Two-dimensional Index
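[Figure: PWJoin state. An I-Node index (hash table) maps each join value to an I-Node with fields Key, Head, Tail, and PunctFlag (e.g., none or punctuated). Each T-Node holds a tuple plus NextValueListTNode and NextTimeListTNode pointers, so it belongs both to the value list of its I-Node and to the time list, which runs from Window Begin to Window End. A separate punctuation time list holds punctuations.]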
14
Facilitating Search-based Operations
  • Search-based operations
  • Invalidate: probe time list and stop when
    encountering a time-valid tuple
  • Probe: probe I-Node index and join with tuples in
    value list of matching I-Node
  • Purge: probe I-Node index and delete tuples in
    value list of matching I-Node
  • Avoid access to irrelevant tuples
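
One way to picture a state that serves all three operations is a time-ordered list combined with a hash index on the join value. The sketch below is an approximation under stated assumptions: it uses Python deques and a dict instead of the linked T-Node pointers of the slides, omits PunctFlag, and relies on lazy deletion (an `alive` marker) so that purged tuples are skipped when they reach the head of the time list. All class and field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TNode:
    ts: int
    key: object
    payload: object
    alive: bool = True  # lazy-deletion marker (assumption of this sketch)


@dataclass
class INode:
    key: object
    values: deque = field(default_factory=deque)  # value list, arrival order


class IndexedState:
    def __init__(self):
        self.time_list = deque()  # all T-Nodes in arrival order
        self.index = {}           # join value -> I-Node

    def insert(self, ts, key, payload):
        node = TNode(ts, key, payload)
        self.time_list.append(node)
        self.index.setdefault(key, INode(key)).values.append(node)

    def invalidate(self, cutoff):
        """Expire tuples with ts < cutoff; stop at the first time-valid tuple."""
        expired = 0
        while self.time_list:
            head = self.time_list[0]
            if head.alive and head.ts >= cutoff:
                break
            self.time_list.popleft()
            if head.alive:
                head.alive = False
                inode = self.index[head.key]
                inode.values.popleft()  # oldest live tuple heads its value list
                if not inode.values:
                    del self.index[head.key]
                expired += 1
        return expired

    def probe(self, key):
        """Return payloads in the value list of the matching I-Node only."""
        inode = self.index.get(key)
        return [n.payload for n in inode.values] if inode else []

    def purge(self, key):
        """Delete all tuples in the value list of the matching I-Node."""
        inode = self.index.pop(key, None)
        if inode is None:
            return 0
        for node in inode.values:
            node.alive = False  # physically removed later by invalidate
        return len(inode.values)


state = IndexedState()
state.insert(1, 175, 80.0)
state.insert(2, 180, 135.0)
state.insert(3, 175, 100.0)
print(state.probe(175), state.purge(175), state.invalidate(3))
```

Invalidate touches only the expired prefix of the time list, and probe and purge touch only one value list, which is the "avoid access to irrelevant tuples" goal stated above.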

15
Punctuation Propagation
  • An operator may propagate punctuations to benefit
    downstream operators

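[Figure: query plan. The Auction stream and the Bid stream (Item_id, Bidder_id, Bid_price) feed Join(item_id), followed by Group-by(item_id, count(*)). The join operator propagates punctuations on item_id (e.g., 180); the group-by can be unblocked by punctuations propagated by the join operator.]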


16
Optimizations Enabled by Combined Constraints
[Figure: two panels, Early Punctuation Propagation and Tuple Dropping, each showing Stream S1 and Stream S2 carrying tuples with join values a1 to a10; propagation point 1 and propagation point 2 are marked.]
17
Achieving Optimizations by Combined Constraints
  • Early propagation
  • Invalidate punctuations in the punctuation time
    list in the same way as tuples
  • Expired punctuations can be propagated
  • Tuple dropping
  • When early propagation happens, set PunctFlag of
    matching I-Node to propagated
  • Drop new tuples that match an I-Node whose
    PunctFlag is propagated
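
The two steps above can be sketched together. This is an illustration under assumptions, not the PWJoin code: a plain dict stands in for the PunctFlag field of the I-Nodes, a punctuation is taken to expire once the current time minus its arrival time exceeds the window, and the class and method names are hypothetical.

```python
from collections import deque


class PunctuationTracker:
    """Tracks punctuations of one stream for a join with a time window."""

    def __init__(self, window):
        self.window = window
        self.punct_time_list = deque()  # (ts, key) in arrival order
        self.punct_flag = {}            # key -> "punctuated" or "propagated"

    def on_punctuation(self, ts, key):
        self.punct_time_list.append((ts, key))
        self.punct_flag[key] = "punctuated"

    def invalidate(self, now):
        """Pop expired punctuations, exactly as tuples are invalidated.

        Returns the join values whose punctuations can be propagated early.
        """
        propagated = []
        while self.punct_time_list and now - self.punct_time_list[0][0] > self.window:
            _, key = self.punct_time_list.popleft()
            self.punct_flag[key] = "propagated"
            propagated.append(key)
        return propagated

    def should_drop(self, key):
        """A new tuple with a propagated join value can be dropped on arrival."""
        return self.punct_flag.get(key) == "propagated"


tracker = PunctuationTracker(window=10)
tracker.on_punctuation(5, 180)
print(tracker.invalidate(12), tracker.invalidate(16), tracker.should_drop(180))
```

The intuition behind the drop, as read from the slides: once a punctuation has left the window, every tuple it covered has also expired, so a new tuple with that join value on the opposite stream has nothing left to join with.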

18
Memory Cost Analysis
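  • $S_b^T = S_{b,\mathrm{insert}}^T - S_{b,\mathrm{purge}}^T = S_{b,\mathrm{arrive}}^T - S_{b,\mathrm{purge}}^T$
  • $= \lambda_b T_b - \lambda_b T_b \left( \lambda_{p_a} T / N_{K_b,T} \right)$
  • $\lambda_b$: tuple input rate of stream B
  • $\lambda_{p_a}$: punctuation input rate of stream A
  • $N_{K_b,T}$: number of distinct join values that
    have occurred in stream B up to the $T$-th time unit
  • $T_b$: time window on stream B

The first term, $\lambda_b T_b$, is the state of a pure window join; the second term, $\lambda_b T_b (\lambda_{p_a} T / N_{K_b,T})$, is the saving by punctuation.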
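
As a worked illustration of the cost model, the state size can be computed as the pure window-join state (tuple rate of B times the window on B) minus the saving by punctuation (that state times the fraction of B's distinct join values punctuated by stream A so far). The function name, parameter names, and all numbers below are hypothetical, and the sketch assumes the punctuated fraction does not exceed 1.

```python
def pwjoin_state_size(rate_b, window_b, punct_rate_a, t, distinct_values_b):
    """Estimated number of stream-B tuples held in state at time unit t."""
    window_join_state = rate_b * window_b                     # pure window join
    punctuated_fraction = punct_rate_a * t / distinct_values_b
    saving = window_join_state * punctuated_fraction          # saving by punctuation
    return window_join_state - saving


# Hypothetical numbers: 100 tuples per time unit, window of 10 time units,
# 2 punctuations per time unit, 20 time units elapsed, 100 distinct join values.
print(pwjoin_state_size(100, 10, 2, 20, 100))
```

With these inputs the window join alone would hold 1000 tuples, and 40 percent of the join values have been punctuated, so the estimate drops to 600 tuples.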
19
Experimental Setup
  • Experimental system
  • CAPE [RDS04] continuous query processing system
  • Stream benchmark: generates synthetic data streams
  • 733 MHz Intel(R) Celeron CPU, 512 MB RAM, Windows
    2000
  • Experiments
  • Compare memory overhead and tuple output rate of
    PWJoin with a pure window join
  • Compare punctuation output rate of PWJoin with
    PJoin

20
PWJoin vs. WJoin: Memory and Tuple Output Rate
Streams A and B: punct-asc-100-40
21
PWJoin vs. PJoin: Punctuation Output Rate
Stream A: punct-asc-100-40; Stream B: punct-random-30-40; Window: 1 second
22
Related Work
  • Pipelined join solutions
  • Symmetric Hash Join [WA93], XJoin [UF00],
    Hash-Merge Join [MLA04], Ripple Joins [HH99]
  • Constraint-exploiting stream query optimization
  • Window joins [KNV03, GO03, GGO04, HFA03, ZRH04]
  • Punctuation [TMS03], PJoin [DMR04]
  • k-Constraint-exploiting algorithm [BW04]

23
Conclusion
  • Proposed PWJoin algorithm
  • Designed storage structure for PWJoin state
  • Derived cost model for PWJoin
  • Conducted experimental study to explore
    effectiveness of PWJoin

24
Thanks
  • Nishant Mehta (developing stream generator)
  • Prof. Leonidas Fegaras (feedback on paper)
  • CAPE Group Members
  • WPI Database Research Group

CAPE Project: http://davis.wpi.edu/dsrg/CAPE/
25
References
  • [KNV03] J. Kang, J. F. Naughton and S. D. Viglas.
    Evaluating Window Joins over Unbounded Streams.
    ICDE 2003.
  • [UF00] T. Urhan and M. Franklin. XJoin: A
    Reactively Scheduled Pipelined Join Operator.
    IEEE Data Engineering Bulletin, 23(2), 2000.
  • [HH99] P. Haas and J. Hellerstein. Ripple Joins
    for Online Aggregation. SIGMOD 1999.
  • [GO03] L. Golab and M. T. Ozsu. Processing
    Sliding Window Multi-Joins in Continuous Queries
    over Data Streams. VLDB 2003.
  • [GGO04] L. Golab, S. Garg and M. T. Ozsu. On
    Indexing Sliding Windows over On-line Data
    Streams. EDBT 2004.
  • [RDS04] E. A. Rundensteiner, L. Ding, T.
    Sutherland, Y. Zhu, B. Pielech and N. Mehta.
    CAPE: Continuous Query Engine with
    Heterogeneous-Grained Adaptivity. VLDB Demo,
    2004.
  • [BW04] S. Babu and J. Widom. Exploiting
    k-Constraints to Reduce Memory Overhead in
    Continuous Queries over Data Streams.
  • [TMS03] P. A. Tucker, D. Maier, T. Sheard and L.
    Fegaras. Exploiting Punctuation Semantics in
    Continuous Data Streams. TKDE, 15(3), 2003.
  • [DMR04] L. Ding, N. Mehta, E. A. Rundensteiner
    and G. T. Heineman. Joining Punctuated Streams.
    EDBT 2004.
  • [MWA03] R. Motwani, J. Widom, A. Arasu et al.
    Query Processing, Resource Management, and
    Approximation in a Data Stream Management System.
    CIDR 2003.

26
PWJoin vs. WJoin: Irrelevant Punctuations
Stream A: punct-asc-100-40; Stream B: punct-random-30-40; Window: 2 seconds