HiFi Systems: Network-Centric Query Processing for the Physical World
1
HiFi Systems: Network-Centric Query Processing
for the Physical World
  • Michael Franklin
  • UC Berkeley
  • 2.13.04

2
Introduction
  • Continuing improvements in sensor devices
  • Wireless motes
  • RFID
  • Cellular-based telemetry
  • Cheap devices can monitor the environment at a
    high rate.
  • Connectivity enables remote monitoring at many
    different scales.
  • Widely different concerns at each of these levels
    and scales.

3
Plan of Attack
  • Motivation/Applications/Examples
  • Characteristics of HiFi Systems
  • Foundational Components
  • TelegraphCQ
  • TinyDB
  • Research Issues
  • Conclusions

4
The Canonical HiFi System
5
RFID - Retail Scenario
  • Smart Shelves continuously monitor item
    addition and removal.
  • Info is sent back through the supply chain.

6
Extranet Information Flow
(diagram: Retailers A and B exchange supply-chain data with Manufacturers C and D through an Aggregation/Distribution Service)
7
M2M - Telemetry/Remote Monitoring
  • Energy Monitoring - Demand Response
  • Traffic
  • Power Generation
  • Remote Equipment

8
Time-Shift Trend Prediction
  • National companies can exploit
    East Coast/ West Coast time differentials to
    optimize West Coast operations.

9
Virtual Sensors
  • Sensors don't have to be physical sensors.
  • Network Monitoring algorithms for detecting
    viruses, spam, DoS attacks, etc.
  • Disease outbreak detection

10
HiFi System Properties
  • High Fan-In, globally-distributed architecture.
  • Large data volumes generated at edges.
  • Filtering and cleaning must be done there.
  • Successive aggregation as you move inwards.
  • Summaries/anomalies continually, details later.
  • Strong temporal focus.
  • Strong spatial/geographic focus.
  • Streaming data and stored data.
  • Integration within and across enterprises.

11
One View of the Design Space
(chart: a spectrum from on-the-fly to disk-based processing, covering Filtering/Cleaning/Alerts; Monitoring, Time-series; Data mining (recent history); Archiving (provenance and schema evolution); Combined Stream/Disk Processing)
12
Another View of the Design Space
(chart: Filtering/Cleaning/Alerts; Monitoring, Time-series; Data mining (recent history); Archiving (provenance and schema evolution))
13
One More View of the Design Space
(chart: Filtering/Cleaning/Alerts; Monitoring, Time-series; Data mining (recent history); Archiving (provenance and schema evolution), annotated with history scales: Dup Elim: hours; Interesting Events: days; Trends/Archive: years)
14
Building Blocks
15
TelegraphCQ: Monitoring Data Streams
  • Streaming Data
  • Network monitors
  • Sensor Networks
  • News feeds
  • Stock tickers
  • B2B and Enterprise apps
  • Supply-Chain, CRM, RFID
  • Trade Reconciliation, Order Processing, etc.
  • (Quasi) real-time flow of events and data
  • Must manage these flows to drive business (and
    other) processes.
  • Can mine flows to create/adjust business rules or
    to perform on-line analysis.

16
TelegraphCQ (Continuous Queries)
  • An adaptive system for large-scale
    shared dataflow processing.
  • Based on an extensible set of operators
  • 1) Ingress (data access) operators
  • Wrappers, File readers, Sensor Proxies
  • 2) Non-Blocking Data processing operators
  • Selections (filters), XJoins,
  • 3) Adaptive Routing Operators
  • Eddies, STeMs, FLuX, etc.
  • Operators connected through Fjords
  • queue-based framework unifying push and pull.
  • Fjords will also allow us to easily mix and match
    streaming and stored data sources.
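The push/pull unification can be sketched in Python; the class and operator names here are hypothetical, not TelegraphCQ's actual API:

```python
from collections import deque

class Fjord:
    """Bounded queue connecting two operators; supports both
    push (producer-driven) and pull (consumer-driven) transfer."""
    def __init__(self, maxlen=1024):
        self.q = deque(maxlen=maxlen)
        self.source = None          # pull side: callable producing tuples on demand

    def push(self, tup):            # push mode: upstream drives the dataflow
        self.q.append(tup)

    def pull(self):                 # pull mode: downstream asks for the next tuple
        if not self.q and self.source is not None:
            tup = self.source()     # demand-drive the upstream operator
            if tup is not None:
                self.q.append(tup)
        return self.q.popleft() if self.q else None

def select_op(in_fjord, out_fjord, predicate):
    """A non-blocking selection operator: drain whatever is available."""
    while (tup := in_fjord.pull()) is not None:
        if predicate(tup):
            out_fjord.push(tup)

# A streaming source pushes readings in; a stored table could instead be
# wrapped as a pull-mode source behind the same queue interface.
stream, filtered = Fjord(), Fjord()
for reading in [{"light": 455}, {"light": 389}, {"light": 422}]:
    stream.push(reading)
select_op(stream, filtered, lambda t: t["light"] > 400)
print([t["light"] for t in [filtered.pull(), filtered.pull()]])  # prints [455, 422]
```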

17
Extreme Adaptivity
  • Traditional query optimization depends on
    statistical knowledge of the data and a stable
    environment.
  • The streaming world has neither.
  • This is the region that we are exploring in the
    Telegraph project.

18
Adaptivity Overview [Avnur & Hellerstein 2000]
(diagram: an eddy adaptively routing tuples among operators A, B, C, D)
  • How to order and reorder operators over time?
  • Traditionally, use performance and economic/admin
    feedback.
  • This won't work for never-ending queries over
    volatile streams.
  • Instead, use adaptive record routing.
  • Reoptimization = change in routing policy.
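The routing idea can be sketched with a toy eddy that rewards filters for dropping tuples, a simplification of the lottery-scheduling scheme in the Eddies paper (the class and weighting details here are illustrative):

```python
import random

class Eddy:
    """Routes each tuple through commutative filters, favoring (by weighted
    random choice) the filter that has been most selective so far."""
    def __init__(self, filters):
        self.filters = filters
        self.tickets = {f: 1 for f in filters}   # routing weights

    def route(self, tup):
        remaining = list(self.filters)           # filters this tuple hasn't visited
        while remaining:
            weights = [self.tickets[f] for f in remaining]
            f = random.choices(remaining, weights)[0]
            if not f(tup):
                self.tickets[f] += 1             # reward filters that drop tuples
                return None                      # tuple eliminated early
            remaining.remove(f)
        return tup                               # survived every filter

eddy = Eddy([lambda t: t % 2 == 0, lambda t: t < 100])
survivors = [t for t in range(200) if eddy.route(t) is not None]
# survivors are the even numbers below 100, whatever routing order was chosen
```

Because the filters commute, the answer is identical under any routing policy; only the work done per tuple changes, which is what makes per-tuple reoptimization safe.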

19
The TelegraphCQ Architecture
A single CQEddy can encode multiple queries.
20
The StreaQuel Query Language
  • SELECT projection_list
  • FROM from_list
  • WHERE selection_and_join_predicates
  • ORDEREDBY
  • TRANSFORMTO
  • WINDOWBY
  • Target language for TelegraphCQ
  • Windows can be applied to individual streams
  • Window movement is expressed using a for loop
    construct in the transform clause
  • We're not completely happy with our syntax at
    this point.
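Whatever the final syntax, the windowing semantics amount to re-evaluating an aggregate each time a time window slides forward over the stream; a minimal Python sketch (the function name and window policy are assumptions, not StreaQuel):

```python
def windows(stream, width, slide):
    """Yield (window_end, values) for each placement of a time window
    of `width` seconds that advances by `slide` seconds per step.
    `stream` is a list of (timestamp, value) pairs sorted by timestamp."""
    end = stream[0][0] + width
    while end <= stream[-1][0] + slide:
        yield end, [v for ts, v in stream if end - width < ts <= end]
        end += slide

readings = [(1, 10), (2, 20), (3, 30), (4, 40)]
for end, vals in windows(readings, width=2, slide=1):
    print(end, sum(vals) / len(vals))   # a sliding AVG over the stream
```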

21
Example Window Query: Landmark
22
Current Status - TelegraphCQ
  • System developed by modifying PostgreSQL.
  • Initial Version released Aug 03
  • Open Source (PostgreSQL license)
  • Shared joins with windows and aggregates
  • Archived/unarchived streams
  • Next major release planned this summer.
  • Initial users include
  • Network monitoring project at LBL (Netlogger)
  • Intrusion detection project at Eurecom (France)
  • Our own project on Sensor Data Processing
  • Class projects at Berkeley, CMU, and ???

Visit http://telegraph.cs.berkeley.edu for more information.
23
TinyDB
  • Query-based interface to sensor networks
  • Developed on TinyOS/Motes
  • Benefits
  • Ease of programming and retasking
  • Extensible aggregation framework
  • Power-sensitive optimization and adaptivity
  • Sam Madden (Ph.D. Thesis) in collaboration with
    Wei Hong (Intel).

http://telegraph.cs.berkeley.edu/tinydb
24
Declarative Queries in Sensor Nets
  • Many sensor network applications can be described
    using query language primitives.
  • Potential for tremendous reductions in
    development and debugging effort.

Report the light intensities of the bright nests.
  • SELECT nestNo, light
  • FROM sensors
  • WHERE light > 400
  • EPOCH DURATION 1s

Epoch  nestNo  Light  Temp  Accel  Sound
  0      1      455     x     x      x
  0      2      389     x     x      x
  1      1      422     x     x      x
  1      2      405     x     x      x
25
Aggregation Query Example
Count the number of occupied nests in each loud
region of the island.
Epoch  region  CNT(occupied)  AVG(sound)
  0    North         3            360
  0    South         3            520
  1    North         3            370
  1    South         3            520
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s
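The grouped query can be mimicked over one epoch's readings in Python (the dictionary-based grouping is an illustrative sketch, not TinyDB's in-network implementation; CNT(occupied) is read here as counting true `occupied` values):

```python
from collections import defaultdict

def epoch_aggregate(readings, min_avg_sound=200):
    """GROUP BY region with CNT(occupied) and AVG(sound), keeping only
    groups whose AVG(sound) exceeds the HAVING threshold."""
    groups = defaultdict(list)
    for r in readings:
        groups[r["region"]].append(r)
    result = {}
    for region, rows in groups.items():
        avg_sound = sum(r["sound"] for r in rows) / len(rows)
        if avg_sound > min_avg_sound:
            result[region] = (sum(1 for r in rows if r["occupied"]), avg_sound)
    return result

readings = [
    {"region": "North", "occupied": True,  "sound": 300},
    {"region": "North", "occupied": False, "sound": 150},
    {"region": "South", "occupied": True,  "sound": 50},
]
print(epoch_aggregate(readings))   # only North passes the HAVING clause
```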
26
Query Language (TinySQL)
  • SELECT <aggregates>, <attributes>
  • FROM sensors <buffer>
  • WHERE <predicates>
  • GROUP BY <exprs>
  • SAMPLE PERIOD <const> ONCE
  • INTO <buffer>
  • TRIGGER ACTION <command>

27
Sensor Queries @ 10,000 Ft
(Almost) All Queries are Continuous and Periodic
  • Written in SQL
  • With Extensions For
  • Sample rate
  • Offline delivery
  • Temporal Aggregation

M. Franklin, UC Berkeley, Feb. 04
28
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
Interval 4
(diagram: five sensors in a routing tree; a sensor-by-interval table of partial counts, initially empty)
29
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
Interval 4
(diagram: sensor 4 computes its local count, 1, and sends it toward the root)
30
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
Interval 3
(diagram: sensor 3 adds a child's partial count to its own and reports 2)
31
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
Interval 2
(diagram: sensor 2 merges incoming partial counts with its own and reports 3)
32
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
Interval 1
(diagram: the root, sensor 1, combines all partial counts into the final answer, 5)
33
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
Interval 4
(diagram: the next epoch begins; sensor 4 again reports its count of 1)
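The interval-by-interval animation boils down to each node forwarding one partial count per epoch instead of raw readings; a recursive sketch (the tree shape below is illustrative, not necessarily the one in the slides):

```python
def in_network_count(tree, node):
    """In-network COUNT: each node reports 1 (itself) plus the partial
    counts received from its children, so only one value crosses each link."""
    return 1 + sum(in_network_count(tree, child) for child in tree.get(node, []))

# Hypothetical routing tree over sensors 1..5, rooted at sensor 1.
tree = {1: [2, 4], 2: [3], 3: [5]}
print(in_network_count(tree, 1))   # prints 5, matching SELECT COUNT(*)
```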
34
In-Network Aggregation: Example Benefits
  • 2500 Nodes
  • 50x50 Grid
  • Depth = 10
  • Neighbors = 20

35
Taxonomy of Aggregates
  • TinyDB insight: classify aggregates according to
    various functional properties
  • Yields a general set of optimizations that can
    automatically be applied

Property               Examples                                     Affects
Partial State          MEDIAN: unbounded; MAX: 1 record             Effectiveness of TAG
Duplicate Sensitivity  MIN: dup. insensitive; AVG: dup. sensitive   Routing Redundancy
Exemplary vs. Summary  MAX: exemplary; COUNT: summary               Applicability of Sampling, Effect of Loss
Monotonicity           COUNT: monotonic; AVG: non-monotonic         Hypothesis Testing, Snooping
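One way to read the Partial State row is through the init/merge/evaluate decomposition used for partial state records in TAG; the class layout below is an illustrative sketch, showing AVG's bounded (sum, count) state versus MEDIAN's unbounded list:

```python
class Avg:
    """Bounded partial state: a (sum, count) pair; duplicate-sensitive."""
    @staticmethod
    def init(v):     return (v, 1)
    @staticmethod
    def merge(a, b): return (a[0] + b[0], a[1] + b[1])
    @staticmethod
    def evaluate(s): return s[0] / s[1]

class Median:
    """Unbounded partial state: every value must travel up the tree."""
    @staticmethod
    def init(v):     return [v]
    @staticmethod
    def merge(a, b): return sorted(a + b)
    @staticmethod
    def evaluate(s): return s[len(s) // 2]

def aggregate(agg, values):
    """Fold values through the aggregate's partial-state interface."""
    state = agg.init(values[0])
    for v in values[1:]:
        state = agg.merge(state, agg.init(v))
    return agg.evaluate(state)
```

Because merge is associative, Avg's state can be combined in any order at any node, which is exactly what makes it cheap in-network; Median's list-valued state defeats that optimization.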
36
Current Status - TinyDB
  • System built on top of TinyOS (10K lines of
    embedded C code). Latest release: 9/2003.
  • Several deployments including redwoods at UC
    Botanical Garden

(diagram: motes deployed at heights in a redwood: 33m: 111; 32m: 110; 30m: 109, 108, 107; 20m: 106, 105, 104; 10m: 103, 102, 101; treetop at 36m)
Visit http://telegraph.cs.berkeley.edu/tinydb for more information.
37
Putting It All Together?
38
Ursa - A HiFi Implementation
  • Current effort toward building an integrated
    infrastructure that spans large scales in
  • Time
  • Geography
  • Resources

39
TelegraphCQ/TinyDB Integration
  • Fjords [Madden & Franklin 02] provide the
    dataflow plumbing necessary to use TinyDB as a
    data stream.
  • Main issues revolve around what to run where.
  • TCQ is a query processor
  • TinyDB is also a query processor
  • Optimization criteria include total cost,
    response time, answer quality, answer likelihood,
    power conservation on motes,
  • Project on-going, should work by summer.
  • Related work: Gigascope work at AT&T

40
TCQ-based Overlay Network
  • TCQ is primarily a single-node system.
  • Flux operators [Shah et al. 03] support
    cluster-based processing.
  • Want to run TCQ at each internal node.
  • Primary issue is support for wide-area temporal
    and geographic aggregation.
  • In an adaptive manner, of course
  • Currently under design.
  • Related work Astrolabe, IRISNet, DBIS,

41
Querying the Past, Present, and Future
  • Need to handle archived data
  • Adaptive compression can reduce processing time.
  • Historical queries
  • Joins of Live and Historical Data
  • Deal with later arriving detail info
  • Archiving Storage Manager - A Split-stream SM for
    stream and disk-based processing.
  • Initial version of new SM running.
  • Related Work: Temporal and Time-travel DBs
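A join of live and historical data can be sketched with an in-memory SQLite table standing in for the archive (the table name, schema, and function are made up for illustration; this is not the Ursa storage manager):

```python
import sqlite3

# Split-stream idea: each arriving tuple is both archived to the store
# and joined against older readings of the same item.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE history (item TEXT, epoch INTEGER, reading REAL)")

def on_arrival(tup):
    """Archive the live tuple, then join it with this item's history."""
    item, epoch, reading = tup
    past = db.execute(
        "SELECT epoch, reading FROM history "
        "WHERE item = ? AND epoch < ? ORDER BY epoch",
        (item, epoch)).fetchall()
    db.execute("INSERT INTO history VALUES (?, ?, ?)", tup)
    return past

on_arrival(("shelf-3", 1, 11.0))
matches = on_arrival(("shelf-3", 2, 12.5))   # joins with the epoch-1 reading
```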

42
XML, Integration, and Other Realities
  • Eventually need to support XML
  • Must integrate with existing enterprise apps.
  • In many areas, standardization well underway
  • Augmenting moving data
  • Related Work: YFilter [Diao & Franklin 03],
    Mutant Queries [Papadimos et al., OGI], 30 years
    of data integration research, 10 years of XML
    research,

High Fan-in → High Fan-out
43
Conclusions
  • Sensors, RFIDs, and other data collection devices
    enable real-time enterprises.
  • These will create high fan-in systems.
  • Can exploit recent advances in streaming and
    sensor data management.
  • Lots to do!