Evaluating Window Joins over Punctuated Streams


1
Evaluating Window Joins over Punctuated Streams

Many slides taken from talk by
Luping Ding and Elke A. Rundensteiner, CIKM04
Database Systems Research Group
Worcester Polytechnic Institute

2
Stream Data Processing

Online Transaction Management

Sensor Network Monitoring

Network Usage Analysis

Online Auction

Potentially infinite data streams vs. stateful operators, e.g., join and distinct
Problem: potentially unbounded state
Reason: no hint about which data is no longer useful

4
Example: Symmetric Hash Join WA93

Memory overflow resolution: state relocation
Examples: XJoin UF00, Hash-Merge Join MLA04
Problems:
Join state still grows without bound
Delivery of some join results may be highly deferred

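[Figure: symmetric hash join with states SA and SB in memory; each tuple from stream A or B probes the other stream's state and is inserted into its own, until the states overflow memory]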
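The state growth described on this slide shows up clearly in a minimal Python sketch of a symmetric hash join. The sketch assumes tuples arrive as (key, payload) pairs and keeps one hash table per input. The class and method names are hypothetical. Every tuple probes the opposite table and is then inserted into its own, and nothing is ever removed. That is exactly why the join state grows without bound.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Minimal symmetric hash join over streams A and B (illustrative sketch).

    Assumption: tuples are (key, payload) pairs. No tuple is ever evicted.
    """

    def __init__(self):
        # One hash table per input: join key -> payloads seen so far.
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}

    def process(self, side, key, payload):
        other = "B" if side == "A" else "A"
        # Probe the opposite state; results are always (key, a_payload, b_payload).
        results = [
            (key, payload, match) if side == "A" else (key, match, payload)
            for match in self.state[other].get(key, [])
        ]
        # Insert into this side's state, where the tuple stays forever.
        self.state[side][key].append(payload)
        return results

    def state_size(self):
        return sum(len(v) for s in self.state.values() for v in s.values())
```

Because `state_size()` can only increase, any memory bound must come from an external mechanism, such as state relocation, or from the constraints on the next slide.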
5
Avoiding Unbounded State

Solution: exploit constraints to detect no-longer-useful data
Sliding window MWA03
Identifies a bounded set of input data based on time
K-constraint BW03
Models clustered or ordered data arrival patterns
Punctuation TMSF03
Dynamically announces the termination of a certain value

6
Sliding Window KNV03
[Figure: sliding windows Wa and Wb moving along the timelines of Stream A and Stream B]
7
Punctuation

Meta-knowledge embedded inside data streams
An ordered set of patterns corresponding to the attributes of tuples
Wildcard (*), constant (9), list (1,2,3), range (1, 20), empty (Ø)
Semantics: no tuple arriving after a punctuation p will match p

Bid
Item_id  Bidder_id  Bid_price  Timestamp
180      Marlie     820.00     Nov-13-03 11:02:00
182      Ultrasale  1000.00    Nov-13-03 11:05:00
180      Jocelyn    850.00     Nov-13-03 11:14:00
<180, *, *, *>      punctuation: no more tuples will contain Item_id 180
181      pcfan      50.00      Nov-13-03 11:36:00
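The five pattern kinds and the match semantics can be sketched as follows. Several details are assumptions made for illustration and are not necessarily the encoding used in TMSF03: the class names, the inclusive range bounds, and representing a punctuation as a tuple of patterns aligned with the tuple's attributes.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Wildcard:      # (*) matches any value
    def matches(self, v: Any) -> bool:
        return True


@dataclass(frozen=True)
class Const:         # (9) matches exactly one value
    value: Any

    def matches(self, v: Any) -> bool:
        return v == self.value


@dataclass(frozen=True)
class ValueList:     # (1,2,3) matches any listed value
    values: frozenset

    def matches(self, v: Any) -> bool:
        return v in self.values


@dataclass(frozen=True)
class Range:         # (1, 20) matches lo <= v <= hi (inclusive bounds assumed)
    lo: Any
    hi: Any

    def matches(self, v: Any) -> bool:
        return self.lo <= v <= self.hi


@dataclass(frozen=True)
class Empty:         # (Ø) matches nothing
    def matches(self, v: Any) -> bool:
        return False


def match(tup: Tuple, punct: Tuple) -> bool:
    """A tuple matches a punctuation iff every attribute matches its pattern."""
    return len(tup) == len(punct) and all(p.matches(v) for v, p in zip(tup, punct))
```

In the Bid example, the punctuation <180, *, *, *> becomes `(Const(180), Wildcard(), Wildcard(), Wildcard())`. The Jocelyn tuple for item 180 matches it. The later pcfan tuple for item 181 does not, which is what the semantics require of tuples that follow the punctuation.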

8
Punctuation-Aware Join DMR04
[Figure: join on item_id of Stream A and Stream B, whose states SA and SB hold tuples such as (175, 80.00) and (175, 100.00); a punctuation announcing that no more tuples will have A = 175 allows the stored tuples with value 175 to be purged]

9
Features of Punctuation

Purge rule. For any tuple ta from stream A, if a punctuation Pb has already been received from stream B such that match(ta, Pb) holds, ta will not join with any tuples that arrive from stream B in the future. Hence ta does not need to be maintained in the A state after being processed.
Propagation rule. The join operator can also propagate punctuations to the output stream to help downstream operators.
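Here is a minimal sketch of the purge rule from the point of view of the A state. It assumes a single join attribute, tuples given as (join_value, payload), and a B punctuation reduced to the one join value it covers. `PurgingSide` and its methods are hypothetical names. Probing is omitted so the sketch can focus on what is stored.

```python
from collections import defaultdict


class PurgingSide:
    """A-state bookkeeping for the purge rule (illustrative sketch)."""

    def __init__(self):
        self.state = defaultdict(list)   # join value -> stored A payloads
        self.punctuated_by_b = set()     # join values that B has punctuated

    def on_b_punctuation(self, value):
        # Stored A tuples with this value can never join a future B tuple,
        # so they are removed from the A state and returned.
        self.punctuated_by_b.add(value)
        return self.state.pop(value, [])

    def store_a_tuple(self, value, payload):
        # An A tuple arriving after a matching B punctuation is probed as
        # usual (not shown) but is not kept afterwards.
        if value in self.punctuated_by_b:
            return False
        self.state[value].append(payload)
        return True
```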

Based on punctuation semantics, we derive the following theorem as the foundation of our punctuation propagation algorithm.
Theorem 3.1. Let pa and pb be punctuations retrieved from streams A and B at times TSa and TSb, respectively, both specifying the same punctuated value val of join attribute att. Then no output tuple whose attribute att has the value val will be generated after time max(TSa, TSb).
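Theorem 3.1 suggests a simple propagation check. Remember, per join value, when each input punctuated it, and emit an output punctuation once both inputs have. This is a hedged sketch with hypothetical names. It assumes each punctuation covers a single join value and stamps the output punctuation with max(TSa, TSb).

```python
class PunctuationPropagator:
    """Emit an output punctuation once both inputs punctuate a value (sketch)."""

    def __init__(self):
        self.seen = {"A": {}, "B": {}}   # side -> {join value: first punct timestamp}
        self.propagated = set()

    def on_punctuation(self, side, val, ts):
        self.seen[side].setdefault(val, ts)
        other = "B" if side == "A" else "A"
        if val in self.seen[other] and val not in self.propagated:
            self.propagated.add(val)
            # No result with this value can be produced after max(TSa, TSb).
            return (val, max(self.seen["A"][val], self.seen["B"][val]))
        return None
```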

11
Sliding Window Join

Suppose Ta and Tb are the time windows for streams A and B, respectively. We define the rule for invalidating tuples in the join state based on the sliding window:
Let ta be the latest tuple from stream A that has been processed, with timestamp TSa. A tuple in the B state with timestamp TSb such that TSb + Tb < TSa is called a time-expired tuple and can be invalidated. The same invalidation rule applies to tuples in the A state.
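Each state is filled in timestamp order, so the invalidation rule only ever needs to look at the oldest tuples. Here is a short sketch that assumes the B state is a deque of (timestamp, payload) pairs in arrival order. The function name is hypothetical.

```python
from collections import deque


def invalidate(b_state: deque, ts_a: float, window_b: float) -> list:
    """Pop time-expired B tuples (TSb + Tb < TSa) from the head of b_state."""
    expired = []
    # Timestamp order means the first non-expired tuple ends the scan.
    while b_state and b_state[0][0] + window_b < ts_a:
        expired.append(b_state.popleft())
    return expired
```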

12
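Basic Window Join
[Figure: on the timelines of Stream A and Stream B, a new A tuple at TSa joins B tuples within [TSa-Tb, TSa], and a new B tuple at TSb joins A tuples within [TSb-Ta, TSb]]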
13
Optimization Opportunities

Maintain a smaller state than either a pure window join or a pure punctuation-exploiting join
Bid tuples that have been joined don't need to be maintained in state (punctuation)
Drop tuples without affecting the precision of the result
Bid tuples outside the 24-hour window of the corresponding Auction tuple don't need to be processed
Aggregate results for some Auction tuples can be produced in less than 24 hours

14
Features of PWJoin algorithm

Punctuation-exploiting Window Join (PWJoin) is composed of three operations:
Probing the state to find matching tuples and produce join results.
Purging no-longer-joining tuples using punctuations.
Invalidating expired tuples by windows. Among
these operations.

15
Window and Punctuation Occur Simultaneously
SELECT A.item_id, Count(*)
FROM Auction [Range 24 Hours] A, Bid B
WHERE A.item_id = B.item_id
GROUP BY A.item_id
[Figure: query plan. The Auction Stream (to which a 24-hour window is applied) and the Bid Stream (which contains punctuations on item_id) feed Join item_id, whose output Out1 (item_id) feeds Group-by item_id (count(*)), producing Out2 (item_id, count)]
16
PWJoin Basics and Issue
Receive a new tuple ta from stream A:
  Invalidate tuples from the B state
  Probe the B state
  Insert ta into the A state
Receive a new punctuation pa from stream A:
  Purge tuples from the B state
  Insert pa into the A state

Issue: how should the PWJoin state be designed to facilitate all search-based operations?
  Invalidate conducts a time-based search
  Probe and Purge need a value-based search

17
PWJoin State with Two-dimensional Index
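[Figure: PWJoin state. T-Nodes (tuple, NextValueListTNode, NextTimeListTNode) are chained into a time list running from Window Begin to Window End. The I-Node index (a hash table) holds one I-Node per join value with fields Key, Head, Tail and PunctFlag (none or punctuated); Head and Tail delimit that value's list of T-Nodes. A separate punctuation time list stores (punctuation, timestamp) pairs such as (p1, T1) and (p2, T2).]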
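The two-dimensional index can be approximated in Python with linked T-Nodes threaded through both a per-value list and a state-wide time list. This sketch rests on stated assumptions: the field names mirror the figure, the punctuation time list is omitted, tuples must be inserted in non-decreasing timestamp order, and the class names are hypothetical. It shows why one structure serves both kinds of search. `values` walks a single value list, while `invalidate` pops from the head of the time list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass
class TNode:
    ts: float
    key: Any
    tup: Any
    next_value: Optional["TNode"] = None  # NextValueListTNode
    next_time: Optional["TNode"] = None   # NextTimeListTNode


@dataclass
class INode:
    key: Any
    head: Optional[TNode] = None          # first T-Node of this value list
    tail: Optional[TNode] = None          # last T-Node of this value list
    punct_flag: str = "none"              # "none" or "punctuated"


@dataclass
class PWJoinState:
    index: Dict[Any, INode] = field(default_factory=dict)  # I-Node index
    time_head: Optional[TNode] = None     # oldest tuple (window begin)
    time_tail: Optional[TNode] = None     # newest tuple (window end)

    def insert(self, ts: float, key: Any, tup: Any) -> None:
        node = TNode(ts, key, tup)
        inode = self.index.setdefault(key, INode(key))
        # Append to this join value's list (value dimension) ...
        if inode.tail is None:
            inode.head = node
        else:
            inode.tail.next_value = node
        inode.tail = node
        # ... and to the state-wide list in arrival order (time dimension).
        if self.time_tail is None:
            self.time_head = node
        else:
            self.time_tail.next_time = node
        self.time_tail = node

    def values(self, key: Any) -> Iterator[Any]:
        """Value-based search: walk the value list of one I-Node."""
        inode = self.index.get(key)
        node = inode.head if inode is not None else None
        while node is not None:
            yield node.tup
            node = node.next_value

    def invalidate(self, now: float, window: float) -> int:
        """Time-based search: unlink T-Nodes with ts + window < now."""
        dropped = 0
        while self.time_head is not None and self.time_head.ts + window < now:
            node = self.time_head
            self.time_head = node.next_time
            # With timestamp-ordered inserts, the oldest tuple overall is also
            # the head of its own value list, so unlinking it there is O(1).
            inode = self.index[node.key]
            inode.head = node.next_value
            if inode.head is None:
                inode.tail = None  # I-Node kept so its PunctFlag survives
            dropped += 1
        if self.time_head is None:
            self.time_tail = None
        return dropped
```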
18
PWJoin Algorithm

Invalidate: once a new tuple t is retrieved from stream A, its timestamp is used to invalidate expired tuples from the head of the time list of stream B.
Probe: probe the I-Node index and join with the tuples in the value list of the matching I-Node.
After invalidation is done, the join value of t is used to probe the I-Node index of the B state. If a matching I-Node iNode is found, the corresponding value list is located by following the Head pointer of iNode. Tuple t then joins with all tuples in this value list by following the NextValueListTNode pointer of each T-Node. Finally, the PunctFlag of iNode is checked. If it is punctuated, t is discarded. If it is none, or if no matching I-Node was found, t is inserted into the A state.
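The three steps for a new tuple (invalidate, probe, then insert or discard) can be sketched as below. To stay short, the sketch replaces the linked T-Nodes with a Python list per I-Node and a deque of (timestamp, join value) pairs for the time list. `process_tuple`, `State`, and the field names are hypothetical, and both streams are assumed to be timestamp-ordered.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class INode:
    tuples: list = field(default_factory=list)  # value list of (ts, tuple)
    punct_flag: str = "none"                    # "none" or "punctuated"


@dataclass
class State:
    index: dict = field(default_factory=dict)        # join value -> INode
    time_list: deque = field(default_factory=deque)  # (ts, join value), ts order


def process_tuple(t_ts, t_key, t, own: State, other: State, other_window):
    """Handle a new tuple t from stream A (own = A state, other = B state)."""
    # 1. Invalidate: drop B tuples with TSb + Tb < TSa from the time-list head.
    while other.time_list and other.time_list[0][0] + other_window < t_ts:
        _, key = other.time_list.popleft()
        inode = other.index.get(key)
        if inode is not None and inode.tuples:
            inode.tuples.pop(0)  # oldest entry of that value list
    # 2. Probe: join t with every tuple in the matching value list.
    inode = other.index.get(t_key)
    results = [(t, b) for _, b in inode.tuples] if inode is not None else []
    # 3. Discard t if B has punctuated its value; otherwise insert it.
    if inode is not None and inode.punct_flag == "punctuated":
        return results
    own.index.setdefault(t_key, INode()).tuples.append((t_ts, t))
    own.time_list.append((t_ts, t_key))
    return results
```

Here `list.pop(0)` costs O(n). The linked T-Nodes of the real structure make the same unlink O(1).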

19
PWJoin Algorithm

Purge: probe the I-Node index and delete the tuples in the value list of the matching I-Node.
When a new punctuation p is retrieved from stream A, p is used to probe the I-Node index of the B state. If a matching I-Node iNode is found, all tuples in the corresponding value list are deleted, and iNode is removed from the I-Node index as well. If the PunctFlag of iNode is punctuated, p is discarded. If iNode is not found or iNode's PunctFlag is none, p is used to probe the I-Node index of the A state, and the PunctFlag of the matching I-Node iNode_a is set to punctuated.
If iNode_a does not exist, a new I-Node is created with its PunctFlag set to punctuated and inserted into the I-Node index of the A state.
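The punctuation-handling steps can be sketched in the same simplified style, with each state's I-Node index as a plain dict from join value to I-Node. `process_punctuation` is a hypothetical name, and time lists and output propagation are left out.

```python
from dataclasses import dataclass, field


@dataclass
class INode:
    tuples: list = field(default_factory=list)  # value list
    punct_flag: str = "none"                    # "none" or "punctuated"


def process_punctuation(p_key, own_index: dict, other_index: dict) -> bool:
    """Handle a punctuation on join value p_key from stream A.

    own_index / other_index are the I-Node indexes of the A and B states.
    Returns True if p is recorded in the A state, False if it is discarded.
    """
    # Purge: delete the matching B value list and remove its I-Node.
    inode = other_index.pop(p_key, None)
    if inode is not None and inode.punct_flag == "punctuated":
        return False  # B already punctuated this value, so p is discarded
    own = own_index.get(p_key)
    if own is None:
        own_index[p_key] = INode(punct_flag="punctuated")
    else:
        own.punct_flag = "punctuated"
    return True
```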

20
Punctuation Propagation CIKM04

An operator may propagate punctuations to benefit
downstream operators

[Figure: the Auction Stream and the Bid Stream (Item_id, Bidder_id, Bid_price) feed Join item_id, which propagates punctuations on item_id (e.g., 180), so the downstream Group-by item_id (count(*)) can be unblocked by punctuations propagated by the join operator]

21
Optimizations Enabled by Combined Constraints
Early Punctuation Propagation
Tuple Dropping
[Figure: example streams S1 and S2 carrying join values a1 to a10, illustrating early punctuation propagation (propagation points 1 and 2) and tuple dropping]
22
Achieving Optimizations by Combined Constraints

Early propagation
  Invalidate punctuations in the punctuation time list when invalidating tuples
  Expired punctuations can be propagated
Tuple dropping
  When early propagation happens, set the PunctFlag of the matching I-Node to propagated
  Drop new tuples that match an I-Node whose PunctFlag is propagated
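Early propagation and tuple dropping can be illustrated with a small amount of A-side bookkeeping. This hedged sketch assumes timestamp-ordered streams and represents a punctuation from A as a (timestamp, join value) entry in a time-ordered punctuation time list. `flags` stands in for the PunctFlag field of the A-state I-Nodes, and the class name is hypothetical. When a B tuple arrives, punctuations with TSp + Ta < TSb are invalidated like tuples. At that point every A tuple carrying that value has left the window and no new one can arrive. The value can therefore be propagated, and later B tuples carrying it can be dropped without being probed or stored.

```python
from collections import deque


class EarlyPropagationState:
    """A-side bookkeeping for early propagation and tuple dropping (sketch)."""

    def __init__(self, window_a):
        self.window_a = window_a
        self.punct_time_list = deque()   # (timestamp, join value) in ts order
        self.flags = {}                  # join value -> "punctuated" | "propagated"

    def on_a_punctuation(self, ts, val):
        self.punct_time_list.append((ts, val))
        self.flags[val] = "punctuated"

    def on_b_tuple(self, ts_b, val):
        """Return (values to propagate now, whether this B tuple is dropped)."""
        out = []
        # Expire punctuations alongside tuples: TSp + Ta < TSb.
        while self.punct_time_list and self.punct_time_list[0][0] + self.window_a < ts_b:
            _, v = self.punct_time_list.popleft()
            self.flags[v] = "propagated"
            out.append(v)
        # Tuple dropping: no current or future A tuple can join this B tuple.
        dropped = self.flags.get(val) == "propagated"
        return out, dropped
```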

23
Memory Cost Analysis

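$$S_b^T = S_{b,\mathrm{insert}}^T - S_{b,\mathrm{purge}}^T = S_{b,\mathrm{arrive}}^T - S_{b,\mathrm{purge}}^T = \lambda_b T_b - \lambda_b T_b \cdot \frac{\lambda_{p_a} T}{N_{K_b,T}}$$

S_b^T: size of the B state at the T-th time unit
λ_b: tuple input rate of stream B
λ_pa: punctuation input rate of stream A
N_{K_b,T}: number of distinct join values that occurred in stream B up to the T-th time unit
T_b: time window on stream B

The first term, λ_b T_b, is the state of a pure window join; the subtracted term is the saving by punctuation.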
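To make the formula concrete, the sketch below evaluates it for one made-up set of parameters. The numbers are illustrative and are not measurements from the talk. The sketch assumes λ_pa T ≤ N_{K_b,T}, so that the ratio reads as the fraction of B's distinct join values already punctuated by A. The function name is hypothetical.

```python
def pwjoin_b_state_size(lambda_b, t_b, lambda_pa, t, n_kb_t):
    """Evaluate S_b^T = λ_b T_b - λ_b T_b (λ_pa T / N_{K_b,T}).

    Hypothetical helper; assumes lambda_pa * t <= n_kb_t.
    """
    window_join = lambda_b * t_b                      # state of a pure window join
    saving = window_join * (lambda_pa * t / n_kb_t)   # saving by punctuation
    return window_join - saving


print(pwjoin_b_state_size(100, 10, 2, 100, 400))  # 500.0
```

Take λ_b = 100 tuples per time unit, T_b = 10, λ_pa = 2 punctuations per time unit, T = 100 and N_{K_b,T} = 400. A pure window join would then hold 1000 B tuples, and half of them are purged, leaving 500.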
24
PWJoin vs. WJoin: Memory and Tuple Output Rate
Streams A and B: punct-asc-100-40
25
PWJoin vs. PJoin: Punctuation Output Rate
Stream A: punct-asc-100-40; Stream B: punct-random-30-40; window: 1 second
26
Conclusion

PWJoin algorithm
Designed storage structure for PWJoin state
Memory cost analysis of PWJoin

27
Thanks

WPI Database Research Group

Many slides are from davis.wpi.edu/dsrg/CAPE/slides
28
References

CIKM04 L. Ding and E. A. Rundensteiner. Evaluating Window Joins over Punctuated Streams. CIKM04.
KNV03 J. Kang, J. F. Naughton and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. ICDE03.
UF00 T. Urhan and M. Franklin. XJoin: A Reactively Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 23(2), 2000.
HH99 P. Haas and J. Hellerstein. Ripple Joins for Online Aggregation. SIGMOD99.
GO03 L. Golab and M. T. Ozsu. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams. VLDB03.
GGO04 L. Golab, S. Garg and M. T. Ozsu. On Indexing Sliding Windows over On-line Data Streams. EDBT04.
RDS04 E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. VLDB Demo, 2004.
BW04 S. Babu and J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams.
TMSF03 P. A. Tucker, D. Maier, T. Sheard and L. Fegaras. Exploiting Punctuation Semantics in Continuous Data Streams. TKDE, 15(3), 2003.
DMR04 L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman. Joining Punctuated Streams. EDBT04.
MWA03 R. Motwani, J. Widom, A. Arasu et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR03.