Title: Continuous Monitoring of Top-k Queries over Sliding Windows
1. Continuous Monitoring of Top-k Queries over Sliding Windows
- Kyriakos Mouratidis, Spiridon Bakiras, and Dimitris Papadias
- Department of Computer Science
- Hong Kong University of Science and Technology
- Clear Water Bay, Hong Kong
- {kyriakos, sbakiras, dimitris}@cs.ust.hk
2. Challenges of Data Stream Processing
- Unbounded Memory Requirements
- Approximate Query Answering (Sketches, Random Sampling, Histograms)
- Sliding Windows
- Batch Processing, Sampling, and Synopses
- Blocking Operators (Juggle, Punctuations)
- Queries Referencing Past Data (Synopses, Ad Hoc Queries)
3. Preliminaries
- Sliding windows (count-based or time-based)
- Dimensionality n: each tuple is a point $p = (x_1, x_2, \ldots, x_n)$
- Scoring/preference functions $f(p_i.x_1, p_i.x_2)$
- Examples: min, Euclidean, sum, and in general any monotonic function
[Figure: a data stream flows into a sliding window, and the query reports its result tuple(s)]
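The two window types and the example scoring functions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `CountWindow` and `TimeWindow`, the function `topk`, and the convention that higher scores are better are assumptions. The top-k is recomputed by brute force over the whole window, which is exactly the per-update cost that the monitoring algorithms in this deck try to avoid.

```python
from collections import deque
import heapq
import math


# Example preference functions; all are monotonically non-decreasing
# for non-negative attribute values.
def f_sum(p):
    return sum(p)


def f_min(p):
    return min(p)


def f_euclidean(p):
    return math.sqrt(sum(x * x for x in p))


class CountWindow:
    """Count-based window (hypothetical): keeps only the N most recent tuples."""

    def __init__(self, n):
        self.buf = deque(maxlen=n)  # the deque drops the oldest tuple automatically

    def insert(self, p, ts=None):  # ts is unused; kept for a uniform interface
        self.buf.append(p)

    def tuples(self):
        return list(self.buf)


class TimeWindow:
    """Time-based window (hypothetical): keeps tuples that arrived in the last `length` time units."""

    def __init__(self, length):
        self.length = length
        self.buf = deque()  # (arrival_time, point), oldest first (FIFO)

    def insert(self, p, ts):
        self.buf.append((ts, p))
        # Expire tuples that fall outside the interval (ts - length, ts].
        while self.buf and self.buf[0][0] <= ts - self.length:
            self.buf.popleft()

    def tuples(self):
        return [p for _, p in self.buf]


def topk(window, f, k):
    """Brute-force top-k: the k tuples in the window with the highest score f(p)."""
    return heapq.nlargest(k, window.tuples(), key=f)
```

For example, after inserting (0.2, 0.9), (0.8, 0.1), (0.5, 0.5), (0.6, 0.7) into `CountWindow(3)`, the first tuple has been evicted and `topk(w, f_sum, 2)` returns the two highest-sum tuples among the remaining three.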
4. Top-k Processing in Conventional Databases
- Onion [9], Prefer [14]
- Materialized top-k views [30]
- Top-k records among the results of joins over multiple relations: probabilistic optimization, rank-join algorithms [11, 16]
- Distributed data repositories [12]
5. Influence Region, Skylines
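[Figure: unit data space (axes x1 and x2, corner (1,1)) with points p1 to p4, illustrating dominance, the skyline (top-1 query), the 2-skyband, and the influence region of the k-th result $p_k$, bounded by the line $x_1 + 2x_2 = score(p_k)$]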
- Related to the TMA (Top-k Monitoring Algorithm)
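The notions on this slide can be checked with a few lines of brute-force Python. This is an illustrative O(n²) sketch, not the authors' code: it assumes that larger attribute values are preferred, the function names are hypothetical, and the example uses f(p) = x1 + 2·x2, which appears to be the function behind the figure's score line (an assumption). It relies on the known property that, for a strictly increasing scoring function, the top-k result is always contained in the k-skyband; the skyline is the 1-skyband and therefore contains every top-1 answer.

```python
def dominates(a, b):
    """a dominates b if it is at least as large in every dimension and strictly
    larger in at least one (assumption: larger values are preferred)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def k_skyband(points, k):
    """Points dominated by fewer than k other points (brute force)."""
    return [p for p in points if sum(dominates(q, p) for q in points) < k]


def skyline(points):
    """The skyline is the 1-skyband: points not dominated by any other point."""
    return k_skyband(points, 1)


def in_influence_region(p, f, kth_score):
    """A point can change the current top-k result only if it scores at least as high
    as the k-th result, i.e. it lies on or above the line f(x) = score(p_k)."""
    return f(p) >= kth_score
```

With the points (0.9, 0.2), (0.4, 0.8), (0.5, 0.5), (0.45, 0.45), (0.3, 0.3), the point (0.45, 0.45) is dominated only by (0.5, 0.5), so it is in the 2-skyband but not in the skyline, while (0.3, 0.3) has three dominators and is in neither.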
6. SkyBands and Score-Time Space
[Figure: tuples p1 to p4 plotted in score-time space (axes: Expiration Time, Score), with the 2-skyband highlighted]
- Related to the SMA (Skyband Monitoring Algorithm)
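In score-time space each tuple is a point (expiration time, score). The sketch below computes dominance numbers and the k-skyband in that space by brute force. It is a hypothetical illustration (the `Tup` record and the function names are not from the slides) and assumes that a tuple dominates another when it has an equal or higher score and expires later. A tuple dominated by at least k others can never enter the top-k: all of its dominators remain in the window for its entire lifetime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tup:
    name: str
    score: float
    expiry: float  # time at which the tuple leaves the sliding window


def st_dominates(a, b):
    """Score-time dominance (assumed definition): a scores at least as high as b
    and expires strictly later."""
    return a.score >= b.score and a.expiry > b.expiry


def dominance_numbers(tuples):
    """Dominance number (DN) of each tuple: how many tuples dominate it."""
    return {t.name: sum(st_dominates(u, t) for u in tuples) for t in tuples}


def score_time_skyband(tuples, k):
    """Tuples with DN < k; only these can ever appear in a top-k result."""
    dn = dominance_numbers(tuples)
    return [t for t in tuples if dn[t.name] < k]
```

For instance, a tuple with score 0.3 that expires at time 11 is dominated by three later-expiring tuples with scores 0.7, 0.5, and 0.4, so it falls outside the 2-skyband and can be discarded immediately.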
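7. Index and Bookkeeping Structures
[Figure: grid over the data space (axes x1, x2) and the associated structures]
- Grid for tuples (points)
- Point List of cell C: the tuples currently located in C
- Influence List of cell C: the queries whose influence region overlaps C
- Query table
- FIFO of tuples: the valid tuples in arrival order, so the oldest tuple expires first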
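A minimal sketch of these structures, assuming a regular grid with m cells per axis over the unit data space [0,1]^d. The names `GridIndex`, `insert`, and `expire_oldest` are hypothetical, and the query table is omitted for brevity. The point it illustrates is that an arriving or expiring tuple only has to be reported to the queries in its cell's influence list, not to every registered query.

```python
from collections import defaultdict, deque


class GridIndex:
    """Hypothetical grid with per-cell point lists, per-cell influence lists,
    and a FIFO of tuples in arrival order."""

    def __init__(self, cells_per_axis):
        self.m = cells_per_axis
        self.points = defaultdict(set)     # point list: cell -> ids of tuples inside the cell
        self.influence = defaultdict(set)  # influence list: cell -> ids of queries overlapping the cell
        self.fifo = deque()                # tuple ids, oldest first
        self.coords = {}                   # tuple id -> coordinates

    def cell_of(self, p):
        # Map each coordinate to a cell index; clamp so that 1.0 falls in the last cell.
        return tuple(min(int(x * self.m), self.m - 1) for x in p)

    def insert(self, tid, p):
        """Register an arriving tuple and return the queries that must examine it."""
        c = self.cell_of(p)
        self.points[c].add(tid)
        self.coords[tid] = p
        self.fifo.append(tid)
        return set(self.influence[c])

    def expire_oldest(self):
        """Remove the oldest tuple; return its id and the queries that may be affected."""
        tid = self.fifo.popleft()
        c = self.cell_of(self.coords.pop(tid))
        self.points[c].discard(tid)
        return tid, set(self.influence[c])
```

A tuple landing in a cell with an empty influence list triggers no query processing at all; only the point list and the FIFO are updated.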
8. Computation Module
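One way to compute a query's initial result on the grid is a best-first search: visit cells in decreasing order of their maxscore (the best score any point inside the cell could have), and stop once k results are known and no unvisited cell can beat the k-th score. This is a simplified sketch under stated assumptions, not the authors' code: it assumes a regular grid with m cells per axis over the unit data space, an increasingly monotone f, and hypothetical function names, and it seeds the heap with every cell instead of expanding neighbouring cells lazily. In this sketch the visited cells play the role of the query's influence region.

```python
import heapq
from itertools import product


def cell_maxscore(cell, m, f):
    # For an increasingly monotone f, the maximum over a cell is reached at its top corner.
    return f(tuple((i + 1) / m for i in cell))


def compute_topk(points_by_cell, m, f, k):
    """Best-first top-k over a grid.

    points_by_cell: dict mapping a cell (tuple of indices) to a list of points.
    Returns (result, visited_cells), with the result sorted by descending score.
    """
    dims = len(next(iter(points_by_cell))) if points_by_cell else 2
    # Max-heap of cells keyed by maxscore (negated, since heapq is a min-heap).
    heap = [(-cell_maxscore(c, m, f), c) for c in product(range(m), repeat=dims)]
    heapq.heapify(heap)
    best = []     # min-heap of (score, point) holding the k best points seen so far
    visited = []
    while heap:
        neg_ub, c = heapq.heappop(heap)
        # Stop when k results are known and no remaining cell can beat the k-th score.
        if len(best) == k and -neg_ub <= best[0][0]:
            break
        visited.append(c)
        for p in points_by_cell.get(c, []):
            s = f(p)
            if len(best) < k:
                heapq.heappush(best, (s, p))
            elif s > best[0][0]:
                heapq.heapreplace(best, (s, p))
    return [p for _, p in sorted(best, reverse=True)], visited
```

On a 2 × 2 grid with f(p) = x1 + 2·x2, a top-1 query stops after the top-right cell whenever that cell holds a point scoring above 2.5, the maxscore of the next-best cell.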
9. Maintenance Module
[Figure: handling of deletions and insertions]
10. Maintenance Module (contd.)
[Figure: example of a delete and an insert]
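The maintenance logic for a single query can be sketched in plain Python. This is an assumption-laden illustration, not the authors' code: the class `TopKMonitor` and its methods are hypothetical, there is no grid (every update is checked against the query, whereas the grid would restrict this to queries in the cell's influence list), and ties are broken arbitrarily. Insertions are handled incrementally, while the expiration of a result tuple forces a recomputation from scratch, which is the cost the skyband approach is designed to avoid.

```python
import heapq


class TopKMonitor:
    """Hypothetical per-query maintenance: incremental insertions,
    recomputation when a result tuple is deleted."""

    def __init__(self, f, k):
        self.f, self.k = f, k
        self.window = {}        # tid -> point, for every valid tuple
        self.result = {}        # tid -> score, for the current top-k tuples
        self.recomputations = 0

    def top_score(self):
        # Score of the k-th result; a new tuple must beat it to enter the result.
        if len(self.result) < self.k:
            return float("-inf")
        return min(self.result.values())

    def recompute(self):
        self.recomputations += 1
        best = heapq.nlargest(self.k, self.window.items(), key=lambda kv: self.f(kv[1]))
        self.result = {tid: self.f(p) for tid, p in best}

    def insert(self, tid, p):
        self.window[tid] = p
        s = self.f(p)
        if len(self.result) < self.k:
            self.result[tid] = s
        elif s > self.top_score():
            # The newcomer enters the result and evicts the current k-th tuple.
            kth = min(self.result, key=self.result.get)
            del self.result[kth]
            self.result[tid] = s

    def delete(self, tid):
        del self.window[tid]
        if tid in self.result:
            # A result tuple expired: its replacement is unknown, so recompute.
            self.recompute()

    def topk(self):
        return sorted(self.result, key=self.result.get, reverse=True)
```

Deleting a tuple that is not in the result costs nothing beyond removing it from the window; only deletions of result tuples trigger `recompute`.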
11. SkyBand Maintenance
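- Motivation: avoid recomputing the result from scratch when some result tuples expire.
[Figure: tuple p9 arrives at time t3]
- Dominance Number (DN): the number of tuples that dominate a tuple in score-time space; a tuple whose DN reaches k can never enter the top-k result and is discarded.
- Recomputation is needed only if older tuples expire and the SkyBand contains fewer than k points.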
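The skyband bookkeeping on this slide can be sketched as follows. This is a hypothetical, simplified illustration of the dominance-number idea for one query (no grid), not the authors' implementation: the class `SkybandMonitor`, its methods, and the rule that only tuples scoring at least the k-th score of the last recomputation are tracked are assumptions. An arriving tuple expires after every tuple already tracked, so it dominates every entry whose score does not exceed its own; entries whose DN reaches k are dropped. An expiration only removes an entry, and recomputation is requested only when fewer than k entries remain.

```python
class SkybandMonitor:
    """Hypothetical skyband maintenance for one query.

    Each entry is [tid, score, dn]; dn counts the later-arriving (hence
    later-expiring) tracked tuples with an equal or higher score.
    """

    def __init__(self, f, k):
        self.f, self.k = f, k
        self.skyband = []               # entries in arrival order
        self.threshold = float("-inf")  # k-th score at the last recomputation (assumption)

    def rebuild(self, window):
        """Recompute from scratch; window is a list of (tid, point) in arrival order."""
        scores = sorted((self.f(p) for _, p in window), reverse=True)
        self.threshold = scores[self.k - 1] if len(scores) >= self.k else float("-inf")
        self.skyband = []
        for tid, p in window:  # replaying in arrival order rebuilds the dominance numbers
            self.insert(tid, p)

    def insert(self, tid, p):
        s = self.f(p)
        if s < self.threshold:
            return  # below the k-th score of the last computation: not tracked
        # The newcomer dominates every tracked entry with a lower or equal score.
        for entry in self.skyband:
            if entry[1] <= s:
                entry[2] += 1
        # Entries dominated by k tuples can never re-enter the top-k.
        self.skyband = [e for e in self.skyband if e[2] < self.k]
        self.skyband.append([tid, s, 0])

    def expire(self, tid):
        """Drop an expired tuple; return True if recomputation from scratch is needed."""
        self.skyband = [e for e in self.skyband if e[0] != tid]
        return len(self.skyband) < self.k

    def topk(self):
        ranked = sorted(self.skyband, key=lambda e: e[1], reverse=True)
        return [e[0] for e in ranked[: self.k]]
```

In a typical run with k = 2, the expiration of the current best tuple is absorbed by the skyband (the next entries move up into the result), and `rebuild` is needed only after enough expirations leave a single tracked entry.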
12. UCB Database Group
- TelegraphCQ, an adaptive continuous query engine for data streams.
- GridDB, a data-centric workflow system for scientific grid computation.
- HiFi, hierarchical stream processing for RFID and other receptor-based networks.
- P2, a declarative dataflow engine for specifying overlay networks.
- PIER, an internet-scale query processor.
13. TelegraphCQ
Faculty -- Mike Franklin and Joe Hellerstein
- Adaptive dataflow engine, dataflow operators,
eddies, rivers, federated databases, sensor
networks.
14. GridDB
Faculty -- Mike Franklin
- Declarative Interface, Interactive Query Processing, Data Lineage, Memoization Support, Co-Existence with Process Models.
15. HiFi - Distributed Sensing and Information Management
Faculty -- Mike Franklin
[Figure: HiFi device hierarchy (labels: PC, Stargates)]
- High Fan-In, Successive Aggregation, Temporal Focus, Streaming Data
16. P2 - Declarative dataflow engine
Faculty -- Joseph M. Hellerstein
- Any widely distributed system needs to track its participating nodes and be able to send messages among them. This facility is often called an overlay network, since it provides an application with customized networking functionality (naming, topology, routing) that runs as a layer over traditional IP networking.
- P2 is a system that uses a high-level declarative language to express overlay networks in a highly compact and reusable form.
- P2 provides the ability to query, monitor, and control all aspects of the network's distributed state.
17. PIER - Distributed query engine
Faculty -- Joseph M. Hellerstein, Scott Shenker, Ion Stoica
[Figure: hybrid design in which a DHT indexes rare items and a flood-based network covers all items]
- Large-scale P2P network, based on the concept of Distributed Hash Tables (DHTs)
18. Thank You
19. Goal
- Reduce CPU cost at the server side
- Alternative [2]: reduce the communication cost for distributed top-k monitoring