Title: HiFi Systems: Network-Centric Query Processing for the Physical World
1. HiFi Systems: Network-Centric Query Processing for the Physical World
- Michael Franklin
- UC Berkeley
- February 13, 2004
2. Introduction
- Continuing improvements in sensor devices
- Wireless motes
- RFID
- Cellular-based telemetry
- Cheap devices can monitor the environment at a high rate.
- Connectivity enables remote monitoring at many different scales.
- Widely different concerns at each of these levels and scales.
3. Plan of Attack
- Motivation/Applications/Examples
- Characteristics of HiFi Systems
- Foundational Components
- TelegraphCQ
- TinyDB
- Research Issues
- Conclusions
4. The Canonical HiFi System
5. RFID: Retail Scenario
- Smart shelves continuously monitor item addition and removal.
- Info is sent back through the supply chain.
6. Extranet Information Flow
[Diagram: Manufacturers C and D and Retailers A and B exchange data through an Aggregation/Distribution Service]
7. M2M: Telemetry/Remote Monitoring
- Energy Monitoring / Demand Response
- Traffic
- Power Generation
- Remote Equipment
8. Time-Shift Trend Prediction
- National companies can exploit East Coast/West Coast time differentials to optimize West Coast operations.
9. Virtual Sensors
- Sensors don't have to be physical sensors.
- Network monitoring: algorithms for detecting viruses, spam, DoS attacks, etc.
- Disease outbreak detection
10. Properties of a HiFi System
- High Fan-In, globally-distributed architecture.
- Large data volumes generated at edges.
- Filtering and cleaning must be done there.
- Successive aggregation as you move inwards.
- Summaries/anomalies continually, details later.
- Strong temporal focus.
- Strong spatial/geographic focus.
- Streaming data and stored data.
- Integration within and across enterprises.
11. One View of the Design Space
[Chart: application quadrants — Filtering/Cleaning/Alerts; Monitoring/Time-series; Data mining (recent history); Archiving (provenance and schema evolution) — spanning on-the-fly processing, combined stream/disk processing, and disk-based processing]
12. Another View of the Design Space
[Chart: the same four application quadrants — Filtering/Cleaning/Alerts; Monitoring/Time-series; Data mining (recent history); Archiving (provenance and schema evolution)]
13. One More View of the Design Space
[Chart: the same quadrants annotated with history horizons — Dup Elim: history of hours; Interesting Events: history of days; Trends/Archive: history of years]
14. Building Blocks
15. TelegraphCQ: Monitoring Data Streams
- Streaming Data
- Network monitors
- Sensor Networks
- News feeds
- Stock tickers
- B2B and Enterprise apps
- Supply-Chain, CRM, RFID
- Trade Reconciliation, Order Processing etc.
- (Quasi) real-time flow of events and data
- Must manage these flows to drive business (and other) processes.
- Can mine flows to create/adjust business rules or to perform on-line analysis.
16. TelegraphCQ (Continuous Queries)
- An adaptive system for large-scale shared dataflow processing.
- Based on an extensible set of operators:
- 1) Ingress (data access) operators
- Wrappers, file readers, sensor proxies
- 2) Non-blocking data processing operators
- Selections (filters), XJoins, ...
- 3) Adaptive routing operators
- Eddies, STeMs, FLuX, etc.
- Operators connected through Fjords
- A queue-based framework unifying push and pull.
- Fjords will also allow us to easily mix and match streaming and stored data sources.
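The deck describes Fjords only at a high level. As a minimal sketch (all names hypothetical, not TelegraphCQ's actual API), a Fjord-style queue lets a push-based source and a pull-based consumer meet without either blocking the other:

```python
from collections import deque

class Fjord:
    """Sketch of a Fjord-style connector: a push-based producer enqueues
    tuples; a pull-based consumer dequeues whenever it is ready.
    Hypothetical interface, for illustration only."""
    def __init__(self):
        self.queue = deque()

    def push(self, tup):
        # Called by a push source, e.g., a sensor proxy delivering readings.
        self.queue.append(tup)

    def pull(self):
        # Called by a pull operator; None signals "no data available yet".
        return self.queue.popleft() if self.queue else None

# The source pushes at its own rate; the operator drains at its own pace.
fjord = Fjord()
for reading in [('mote1', 410), ('mote2', 388)]:
    fjord.push(reading)

results = []
while (t := fjord.pull()) is not None:
    results.append(t)
```

Because the queue decouples the two sides, the same consumer code can sit downstream of either a streaming source (push) or a file scan (pull), which is the mix-and-match property the slide claims.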
17. Extreme Adaptivity
- Traditional query optimization depends on statistical knowledge of the data and a stable environment.
- The streaming world has neither.
- This is the region that we are exploring in the Telegraph project.
18. Adaptivity Overview [Avnur & Hellerstein 2000]
[Diagram: an eddy routing tuples among operators A, B, C, D]
- How to order and reorder operators over time?
- Traditionally, use performance and economic/administrative feedback.
- Won't work for never-ending queries over volatile streams.
- Instead, use adaptive record routing.
- Reoptimization = change in routing policy.
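To make "reoptimization = change in routing policy" concrete, here is a toy sketch of eddy-style routing (not the actual Telegraph code): each tuple is sent first to the filter with the lowest observed pass rate, so operator order adapts automatically as statistics drift:

```python
class Eddy:
    """Toy eddy: routes each tuple through filters ordered by observed
    selectivity (most selective first), updating statistics as it goes.
    A change in the observed pass rates *is* the reoptimization."""
    def __init__(self, filters):
        self.filters = filters                  # name -> predicate
        self.seen = {n: 1 for n in filters}     # tuples routed to each filter
        self.passed = {n: 1 for n in filters}   # tuples that survived it

    def route(self, tup):
        # Order filters by observed pass fraction, lowest (most selective) first.
        order = sorted(self.filters, key=lambda n: self.passed[n] / self.seen[n])
        for name in order:
            self.seen[name] += 1
            if not self.filters[name](tup):
                return False                    # eliminated early, cheaply
            self.passed[name] += 1
        return True                             # satisfies every predicate

eddy = Eddy({'cheap':     lambda t: t % 2 == 0,    # passes about half
             'selective': lambda t: t % 10 == 0})  # passes one in ten
out = [t for t in range(100) if eddy.route(t)]
```

After a brief warm-up, the `selective` filter's low pass rate pushes it to the front of the routing order, with no optimizer-in-the-loop decision ever taken.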
19. The TelegraphCQ Architecture
A single CQEddy can encode multiple queries.
20. The StreaQuel Query Language
- SELECT projection_list
- FROM from_list
- WHERE selection_and_join_predicates
- ORDERED BY ...
- TRANSFORM ... TO ...
- WINDOW ... BY ...
- Target language for TelegraphCQ
- Windows can be applied to individual streams
- Window movement is expressed using a for-loop construct in the transform clause
- We're not completely happy with our syntax at this point.
21. Example Window Query: Landmark
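The landmark-window example survives only as a slide title, so here is a minimal sketch of the semantics (data and function names hypothetical): a landmark window has a fixed starting point and a growing endpoint, so each answer aggregates everything seen since the landmark:

```python
def landmark_avg(stream, landmark):
    """Sketch of a landmark window: the window opens at a fixed timestamp
    (the landmark) and grows to include each later tuple, emitting one
    re-aggregated answer per arrival."""
    window = []
    results = []
    for ts, value in stream:
        if ts < landmark:
            continue                    # tuple precedes the window's fixed start
        window.append(value)
        results.append(sum(window) / len(window))  # aggregate the growing window
    return results

# Hypothetical (timestamp, value) stream.
stream = [(0, 10), (1, 20), (2, 30), (3, 40)]
avgs = landmark_avg(stream, landmark=1)
```

Contrast with a sliding window, where the starting point would advance too and old tuples would eventually drop out.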
22. Current Status: TelegraphCQ
- System developed by modifying PostgreSQL.
- Initial version released Aug '03
- Open source (PostgreSQL license)
- Shared joins with windows and aggregates
- Archived/unarchived streams
- Next major release planned this summer.
- Initial users include:
- Network monitoring project at LBL (NetLogger)
- Intrusion detection project at Eurecom (France)
- Our own project on sensor data processing
- Class projects at Berkeley, CMU, and ???
Visit http://telegraph.cs.berkeley.edu for more information.
23. TinyDB
- Query-based interface to sensor networks
- Developed on TinyOS/motes
- Benefits:
- Ease of programming and retasking
- Extensible aggregation framework
- Power-sensitive optimization and adaptivity
- Sam Madden (Ph.D. thesis) in collaboration with Wei Hong (Intel).
http://telegraph.cs.berkeley.edu/tinydb
24. Declarative Queries in Sensor Nets
- Many sensor network applications can be described using query language primitives.
- Potential for tremendous reductions in development and debugging effort.
"Report the light intensities of the bright nests."
- SELECT nestNo, light
- FROM sensors
- WHERE light > 400
- EPOCH DURATION 1s

Epoch | nestNo | Light | Temp | Accel | Sound
0     | 1      | 455   | x    | x     | x
0     | 2      | 389   | x    | x     | x
1     | 1      | 422   | x    | x     | x
1     | 2      | 405   | x    | x     | x
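The query above reads as: once per epoch, emit `(nestNo, light)` for every reading brighter than 400. A few lines of Python over the slide's sample rows show the same semantics (and hint at why the declarative form saves so much mote-side programming):

```python
# Each reading mirrors the slide's table: (epoch, nestNo, light).
readings = [(0, 1, 455), (0, 2, 389), (1, 1, 422), (1, 2, 405)]

def bright_nests(rows, threshold=400):
    """SELECT nestNo, light FROM sensors WHERE light > threshold,
    evaluated over one epoch's worth of readings at a time."""
    return [(epoch, nest, light)
            for epoch, nest, light in rows
            if light > threshold]

result = bright_nests(readings)
```

Only the epoch-0 reading of 389 is filtered out; everything else clears the threshold.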
25. Aggregation Query Example
"Count the number of occupied nests in each loud region of the island."

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Epoch | region | CNT(occupied) | AVG(sound)
0     | North  | 3             | 360
0     | South  | 3             | 520
1     | North  | 3             | 370
1     | South  | 3             | 520
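A sketch of the GROUP BY / HAVING semantics in Python (sample data hypothetical, not the slide's actual readings): group one epoch's readings by region, then keep only regions whose average sound level clears the threshold:

```python
from collections import defaultdict

# One epoch of hypothetical readings: (region, occupied, sound).
readings = [('North', 1, 360), ('North', 1, 370),
            ('South', 1, 520), ('South', 1, 510), ('South', 1, 530)]

def loud_region_counts(rows, min_avg_sound=200):
    """Sketch of: SELECT region, CNT(occupied), AVG(sound) FROM sensors
    GROUP BY region HAVING AVG(sound) > min_avg_sound."""
    groups = defaultdict(list)
    for region, occupied, sound in rows:
        groups[region].append((occupied, sound))
    out = {}
    for region, vals in groups.items():
        avg_sound = sum(s for _, s in vals) / len(vals)
        if avg_sound > min_avg_sound:           # the HAVING clause
            out[region] = (sum(o for o, _ in vals), avg_sound)
    return out

result = loud_region_counts(readings)
```

In TinyDB this grouping runs inside the network, with each node maintaining only a partial (count, sum) per group rather than shipping raw readings.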
26. Query Language (TinySQL)
- SELECT <aggregates>, <attributes>
- FROM sensors, <buffer>
- WHERE <predicates>
- GROUP BY <exprs>
- SAMPLE PERIOD <const> | ONCE
- INTO <buffer>
- TRIGGER ACTION <command>
27. Sensor Queries @ 10,000 Ft
(Almost) all queries are continuous and periodic
- Written in SQL
- With extensions for:
- Sample rate
- Offline delivery
- Temporal aggregation
M. Franklin, UC Berkeley, Feb. '04
28-33. In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
[Animation, six slides: a sensor-vs-epoch grid for sensors 1-5. Over successive intervals (4, 3, 2, 1), each sensor transmits its partial COUNT during its assigned interval; parents merge the partials received from their children before forwarding, so the root has the complete count by the end of the epoch.]
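The animation's key idea is that only one partial aggregate flows up each tree link per epoch, instead of every raw reading. A minimal sketch over a hypothetical routing tree loosely modeled on the slides' five sensors:

```python
def tree_count(children, node):
    """Sketch of in-network COUNT over a routing tree: each node adds its
    own contribution (1) to the partial counts received from its children,
    so one merged partial travels up each link per epoch."""
    return 1 + sum(tree_count(children, c) for c in children.get(node, ()))

# Hypothetical tree: node 1 is the root / base station.
#       1
#      / \
#     2   3
#     |   |
#     4   5
children = {1: [2, 3], 2: [4], 3: [5]}
total = tree_count(children, 1)
```

With n nodes, n - 1 messages suffice per epoch; naive collection of raw readings would instead ship each reading over every hop between its sensor and the root.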
34. In-Network Aggregation: Example Benefits
- 2500 nodes
- 50x50 grid
- Depth = 10
- Neighbors = 20
35. Taxonomy of Aggregates
- TinyDB insight: classify aggregates according to various functional properties.
- Yields a general set of optimizations that can automatically be applied.

Property              | Examples                                    | Affects
Partial State         | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup-insensitive, AVG: dup-sensitive    | Routing redundancy
Exemplary vs. Summary | MAX: exemplary, COUNT: summary              | Applicability of sampling, effect of loss
Monotonicity          | COUNT: monotonic, AVG: non-monotonic        | Hypothesis testing, snooping
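The "partial state" row is the one that drives in-network feasibility, and it falls out naturally if aggregates are modeled as (init, merge, finalize) triples, as the TAG work does. A sketch (structure hypothetical): MAX carries one record of state, AVG a bounded (sum, count) pair, while MEDIAN must drag the whole multiset up the tree:

```python
# Aggregates as (init, merge, finalize). The size of the merged state is
# exactly the "partial state" column of the taxonomy.
AGGS = {
    'MAX':    (lambda v: v,          # state: one record
               lambda a, b: max(a, b),
               lambda s: s),
    'AVG':    (lambda v: (v, 1),     # state: bounded (sum, count) pair
               lambda a, b: (a[0] + b[0], a[1] + b[1]),
               lambda s: s[0] / s[1]),
    'MEDIAN': (lambda v: [v],        # state: unbounded -- the full multiset
               lambda a, b: a + b,
               lambda s: sorted(s)[len(s) // 2]),
}

def aggregate(name, values):
    """Fold values through init/merge, exactly as partials would be merged
    at interior nodes of a routing tree, then finalize at the root."""
    init, merge, finalize = AGGS[name]
    state = init(values[0])
    for v in values[1:]:
        state = merge(state, init(v))
    return finalize(state)

vals = [3, 1, 4, 1, 5]
```

Note also that MAX is duplicate-insensitive (merging the same reading twice changes nothing), while AVG is not; that is the taxonomy's second row.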
36. Current Status: TinyDB
- System built on top of TinyOS (10K lines of embedded C code). Latest release 9/2003.
- Several deployments, including redwoods at the UC Botanical Garden.
[Diagram: sensor placement by height — 36m; 33m: node 111; 32m: node 110; 30m: nodes 109, 108, 107; 20m: nodes 106, 105, 104; 10m: nodes 103, 102, 101]
Visit http://telegraph.cs.berkeley.edu/tinydb for more information.
37. Putting It All Together?
38. Ursa: A HiFi Implementation
- Current effort toward building an integrated infrastructure that spans the large scale in:
- Time
- Geography
- Resources
39. TelegraphCQ/TinyDB Integration
- Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream.
- Main issues revolve around what to run where.
- TCQ is a query processor
- TinyDB is also a query processor
- Optimization criteria include total cost, response time, answer quality, answer likelihood, power conservation on motes, ...
- Project on-going; should work by summer.
- Related work: Gigascope work at AT&T
40. TCQ-based Overlay Network
- TCQ is primarily a single-node system
- Flux operators [Shah et al. 03] support cluster-based processing.
- Want to run TCQ at each internal node.
- Primary issue is support for wide-area temporal and geographic aggregation.
- In an adaptive manner, of course
- Currently under design.
- Related work: Astrolabe, IRISNet, DBIS, ...
41. Querying the Past, Present, and Future
- Need to handle archived data
- Adaptive compression can reduce processing time.
- Historical queries
- Joins of live and historical data
- Deal with later-arriving detail info
- Archiving Storage Manager: a split-stream SM for stream- and disk-based processing.
- Initial version of the new SM running.
- Related work: temporal and time-travel DBs
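To illustrate the split-stream idea (interfaces hypothetical, not the actual Ursa storage manager): tuples are archived as they stream through, so a later query can pull history and join it against live arrivals:

```python
class ArchivedStream:
    """Sketch of a split-stream storage manager: every tuple is appended
    to an archive on its way downstream, so the same stream can later be
    queried historically. Hypothetical interface."""
    def __init__(self):
        self.archive = []                 # history, in arrival order

    def ingest(self, tup):
        self.archive.append(tup)          # archive while streaming ...
        return tup                        # ... and pass downstream unchanged

    def history(self, since):
        # A historical query: all archived (ts, value) tuples from `since` on.
        return [t for t in self.archive if t[0] >= since]

s = ArchivedStream()
live = [s.ingest((ts, ts * 10)) for ts in range(5)]   # live path sees everything
recent = s.history(since=3)                           # historical path, on demand
```

The interesting systems problems the slide names, such as later-arriving detail and adaptive compression, live inside `ingest` and `history`; this sketch only fixes the split-path shape.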
42. XML, Integration, and Other Realities
- Eventually need to support XML
- Must integrate with existing enterprise apps.
- In many areas, standardization is well underway.
- Augmenting moving data
- Related work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al., OGI], 30 years of data integration research, 10 years of XML research, ...
High Fan-In → High Fan-Out
43. Conclusions
- Sensors, RFIDs, and other data collection devices enable real-time enterprises.
- These will create high fan-in systems.
- Can exploit recent advances in streaming and sensor data management.
- Lots to do!