Continuous Monitoring of Top-k Queries over Sliding Windows

1
Continuous Monitoring of Top-k Queries over
Sliding Windows
  • Kyriakos Mouratidis, Spiridon Bakiras, and
    Dimitris Papadias
  • Department of Computer Science
  • Hong Kong University of Science and Technology
  • Clear Water Bay, Hong Kong
  • {kyriakos, sbakiras, dimitris}@cs.ust.hk

2
Challenges of Data Stream Processing
  • Unbounded Memory Requirements
  • Approximate Query Answering
  • (Sketches, Random Sampling, Histograms)
  • Sliding Windows
  • Batch Processing, Sampling, and Synopses
  • Blocking Operators (Juggle, Punctuations)
  • Queries Referencing Past Data
  • (Synopses, Ad Hoc queries)

3
Preliminaries
  • Sliding windows (count-based / time-based)
  • Dimensions: p(x1, x2, ..., xn)
  • Scoring/Preference Functions
  • f(pi.x1, pi.x2)
  • Min, Euclidean, Sum, Monotonic
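
As a rough illustration of these preliminaries, the sketch below keeps a count-based sliding window and answers a top-k query by rescoring every tuple in the window. The names `CountWindow`, `score_sum`, and `top_k`, as well as the weights in the scoring function, are assumptions made for this example and do not come from the slides.

```python
from collections import deque
import heapq

class CountWindow:
    """Count-based sliding window: keeps only the N most recent tuples."""
    def __init__(self, capacity):
        # A bounded deque drops the oldest tuple automatically on overflow
        self.tuples = deque(maxlen=capacity)

    def insert(self, point):
        self.tuples.append(point)

def score_sum(p):
    # Monotone preference function with hypothetical weights: f(p) = x1 + 2*x2
    return p[0] + 2 * p[1]

def top_k(window, k, score):
    """Naive top-k: rescores every tuple in the window (no index, no incremental upkeep)."""
    return heapq.nlargest(k, window.tuples, key=score)

window = CountWindow(capacity=3)
for p in [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5), (0.1, 0.3)]:
    window.insert(p)          # the fourth insert pushes (0.9, 0.1) out of the window
result = top_k(window, 2, score_sum)
```

Rescoring the whole window on every update is the baseline cost that monitoring algorithms try to avoid.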

[Figure: a window over the data stream producing result tuple(s)]
4
Top-k Processing in Conventional Databases
  • Onion [9], Prefer [14]
  • Materialized top-k views [30]
  • Top-k records among the results of joins over
    multiple relations: probabilistic optimization,
    rank-join algorithms [11, 16]
  • Distributed data repositories [12]

5
Influence Region and Skylines
[Figure: points p1 to p4 and pk in the unit square with axes x1 and x2. Labels: dominance, skyline (top-1 query), 2-skyband, and the influence region bounded by the line defined by score(pk) = x1 + 2(x2)]
  • Related to the TMA (Top-k Monitoring
    Algorithm)
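
The dominance, skyline, and skyband notions on this slide can be sketched as follows. This is a minimal quadratic-time illustration, assuming that larger coordinate values are preferred; the function names and sample points are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is at least as good in every dimension and
    strictly better in at least one (assumption: larger is better)."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))

def k_skyband(points, k):
    """Points dominated by fewer than k other points; k = 1 gives the skyline."""
    return [p for p in points
            if sum(dominates(q, p) for q in points) < k]

points = [(0.9, 0.2), (0.6, 0.7), (0.5, 0.5), (0.2, 0.9), (0.1, 0.1)]
skyline = k_skyband(points, 1)   # (0.5, 0.5) is dominated by (0.6, 0.7), so it is excluded
skyband2 = k_skyband(points, 2)  # (0.5, 0.5) has only one dominator, so it is included
```

For any increasingly monotone scoring function, the best-scoring point lies on the skyline, which is why the slide labels the skyline as a top-1 query.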

6
SkyBands and Score-Time Space
[Figure: 2-skyband of points p1 to p4 in score-time space; axes: expiration time and score]
  • Related to the SMA (Skyband Monitoring
    Algorithm)
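
A small sketch of the score-time idea: a tuple is treated as dominated by any tuple that has both a higher score and a later expiration time, and only tuples with fewer than k such dominators can ever appear in a top-k result. The tuple layout `(name, score, expiry)`, the helper names, and the sample data are assumptions for illustration, and ties are ignored for brevity.

```python
def score_time_skyband(tuples, k):
    """Keep tuples with fewer than k dominators in score-time space.
    tuples: list of (name, score, expiry)."""
    keep = []
    for name, s, e in tuples:
        dominators = sum(1 for _, s2, e2 in tuples if s2 > s and e2 > e)
        if dominators < k:
            keep.append(name)
    return keep

def top_k_at(tuples, k, now):
    """Top-k by score among tuples that are still valid at time `now`."""
    alive = [t for t in tuples if t[2] > now]
    return [name for name, _, _ in sorted(alive, key=lambda t: -t[1])[:k]]

stream = [("p1", 0.9, 2), ("p2", 0.7, 5), ("p3", 0.4, 3),
          ("p4", 0.5, 6), ("p5", 0.2, 4)]
band = score_time_skyband(stream, 2)   # p3 and p5 each have two dominators
```

In this example, the top-2 result at every time instant is drawn from `band`, so the discarded tuples never need to be examined again.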

7
Index and Book Keeping Structures
[Figure: grid for tuples (points) over axes x1 and x2; query table; influence list of C; point list of C; FIFO of tuples]
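
The structures named on this slide might be laid out roughly as below. This is only a sketch under assumptions: the data space is taken to be the unit square, the class and method names are hypothetical, and the query table and the logic that fills the influence lists are left out.

```python
from collections import deque, defaultdict

class GridIndex:
    """Regular grid over the unit square with per-cell book-keeping lists."""
    def __init__(self, cells_per_axis):
        self.n = cells_per_axis
        self.point_list = defaultdict(list)     # cell -> tuples currently inside it
        self.influence_list = defaultdict(set)  # cell -> ids of queries the cell can affect
        self.fifo = deque()                     # all valid tuples in arrival order

    def cell_of(self, point):
        # Clamp so that a coordinate of exactly 1.0 falls into the last cell
        return tuple(min(int(x * self.n), self.n - 1) for x in point)

    def insert(self, point):
        self.fifo.append(point)
        self.point_list[self.cell_of(point)].append(point)

    def expire_oldest(self):
        # Tuples leave the window in arrival order, so the FIFO head expires first
        point = self.fifo.popleft()
        self.point_list[self.cell_of(point)].remove(point)
        return point

grid = GridIndex(cells_per_axis=4)
for p in [(0.1, 0.9), (0.55, 0.6), (1.0, 0.3)]:
    grid.insert(p)
```

Keeping the FIFO separate from the grid lets an expiration find the oldest tuple in constant time and then update only the single cell that contained it.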
8
Computation Module
9
Maintenance Module
Deletions
Insertions
10
Maintenance Module Contd.
Delete
Insert
11
SkyBand Maintenance
  • Motivation: avoid recomputation from scratch
    when some results expire.

P9 arrives at t3 units
Dominance Number
Recomputation is needed if older tuples expire
and the SkyBand contains fewer than k points.
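
The maintenance idea can be sketched with a dominance counter per skyband entry. The code below is a simplified illustration, not the algorithm from the slides: it assumes every arriving tuple is added to the skyband, that tuples expire in arrival order, and that scores are distinct. The class and method names are hypothetical.

```python
class SkybandMonitor:
    """Maintains a k-skyband in score-time space with dominance counters."""
    def __init__(self, k):
        self.k = k
        self.band = []  # entries [score, arrival_time, dominance_counter], in arrival order

    def insert(self, score, arrival_time):
        survivors = []
        for entry in self.band:
            if entry[0] < score:
                entry[2] += 1            # the newcomer scores higher and expires later
            if entry[2] < self.k:
                survivors.append(entry)  # fewer than k dominators: still a candidate
        survivors.append([score, arrival_time, 0])  # nothing dominates the newest tuple yet
        self.band = survivors

    def expire(self, arrival_time):
        """Drop the expired tuple; return True if the skyband must be recomputed."""
        self.band = [e for e in self.band if e[1] != arrival_time]
        return len(self.band) < self.k

    def top_k(self):
        return sorted((e[0] for e in self.band), reverse=True)[:self.k]

monitor = SkybandMonitor(k=2)
for t, s in enumerate([0.9, 0.4, 0.7, 0.5, 0.2]):
    monitor.insert(s, t)   # the tuple with score 0.4 is evicted once 0.7 and 0.5 arrive
```

When the tuple with score 0.9 later expires, the next result (0.7 and 0.5) is already in the skyband, so no scan of the window is needed; a rebuild is triggered only when `expire` reports fewer than k remaining entries.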
12
UCB Database Group
  • TelegraphCQ, an adaptive continuous query engine
    for data streams.
  • GridDB, a data-centric workflow system for
    scientific grid computation.
  • HiFi, hierarchical stream processing for RFID and
    other receptor-based networks.
  • P2, a declarative dataflow engine for specifying
    networks.
  • PIER, an internet-scale query processor.

13
TelegraphCQ
Faculty -- Mike Franklin and Joe Hellerstein
  • Adaptive dataflow engine, dataflow operators,
    eddies, rivers, federated databases, sensor
    networks.

14
GridDB
Faculty -- Mike Franklin
  • Declarative Interface, Interactive Query
    Processing, Data Lineage, Memoization Support,
    Co-Existence with process models.

15
HiFi Distributed Sensing and Information
Management
Faculty -- Mike Franklin
[Figure labels: PC, Stargates]
  • High Fan In, Successive aggregation, Temporal
    focus, Streaming data

16
P2 - Declarative dataflow engine
Faculty -- Joseph M. Hellerstein
  • Any widely-distributed system needs to
    track its participating nodes, and be able to
    send messages among those nodes. This facility is
    often called an Overlay Network, since it
    provides an application with customized
    networking functionality (naming, topology,
    routing) that runs as a layer over traditional IP
    networking.
  • P2 is a system that uses a high-level
    declarative language to express overlay networks
    in a highly compact and reusable form.
  • P2 provides the ability to query,
    monitor and control all aspects of the network's
    distributed state.

17
PIER - Distributed query engine
Faculty -- Joseph M. Hellerstein, Scott Shenker,
Ion Stoica
[Figure labels: DHT (index rare items), flood-based network (all items)]
  • Large-scale P2P network, based on the concept of
    Distributed Hash Tables

18
Thank You
19
Goal
  • Reduce CPU cost at the server side
  • Alternative [2]: reduce the communication cost
    for distributed top-k monitoring.