Title: Challenges in Sensor Network Query Processing
1Challenges in Sensor Network Query Processing
- Sam Madden
- NEST Retreat
- January 15, 2002
2Outline
- Background
- Server Side Solutions
- Fjords, Sensor Proxies, CACQ
- Sensor Side Solutions
- Catalog Management
- Aggregation
- Future Work
3Background Query Processors
4What is a Query?
- Declarative statement requesting a subset of data
- Possibly transforming or computing statistics about that data
- Data independent
- Query can apply to any data
5What is a Query Processor?
- Converts declarative queries into a flow of data operators, a query plan
- Relational Operators
- Project, Select, Join
- Scans read data from base relations, indices, etc.
- Traditional Flows
- Pull based, iterator model
- Higher level operators call getNext() to extract data from lower level operators
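A minimal sketch of the pull-based iterator model described above, assuming rows are Python dicts and that `get_next()` returns `None` at end of data. The class names `Scan`, `Select`, and `Project` and the sample readings are illustrative, not taken from any particular system.

```python
class Scan:
    """Leaf operator: reads rows from a base relation (here, a list)."""
    def __init__(self, rows):
        self._it = iter(rows)

    def get_next(self):
        return next(self._it, None)  # None signals end of data


class Select:
    """Passes through only the rows that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def get_next(self):
        # Pull from the child until a row qualifies or the child is exhausted.
        while (row := self.child.get_next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project:
    """Keeps only the named columns of each row."""
    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def get_next(self):
        row = self.child.get_next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}


def run(plan):
    """Drive the plan from the top by repeatedly calling get_next()."""
    out = []
    while (row := plan.get_next()) is not None:
        out.append(row)
    return out


readings = [
    {"id": 1, "temp": 18, "light": 40},
    {"id": 2, "temp": 25, "light": 90},
    {"id": 3, "temp": 31, "light": 15},
]
plan = Project(Select(Scan(readings), lambda r: r["temp"] > 20), ["id", "temp"])
result = run(plan)
```

The top operator drives execution. Nothing runs until it calls `get_next()`, which is why this model fits poorly with sensors that push data on their own schedule.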
6Query Optimizer
- Given a declarative query, build the best query plan
- Choose which operators to run
- What order to run them in
- Where to run them
- In distributed databases
7Why Databases and Sensors?
- All applications depend on data processing
- Declarative query language over sensors attractive
- Application specific solutions difficult to build and deploy
- Want to combine and aggregate data streaming from motes
- Sounds like a database
8New Problems In Sensor Databases
- Sensors unreliable
- Come on and offline, variable bandwidth
- Sensors push data
- Sensors stream data
- Sensors have limited memory, power, bandwidth
- Communication very expensive
- Sensors have processors
- Sensors very numerous
9Components of A Sensor Database
- Server Side
- Query Parser
- Catalog
- Query Optimizer
- Query Executor
- Query Processor
- Sensor Side
- Catalog Advertisements
- Query Processor
- Network Management
10Outline
- Background
- Server Side Solutions
- Fjords, Sensor Proxies, CACQ
- Sensor Side Solutions
- Catalog Management
- Aggregation
- Future Work
11Fjords
- Query Plan Abstraction to handle lack of reliability and streaming, push based data
- Combine push and pull in arbitrary combinations
- Use connectors between operators to isolate them from flow direction
- Bracket Model (Graefe 93)
12Fjords (Continued)
- Operators assume non-blocking queue interface between each other.
- Queues implement push vs. pull
- Pull from A to B: Suspend A, schedule B until it produces data. A cannot go forward until B produces data.
- Push from B to A: A polls, scheduler thread invokes B until it produces data. A can process other inputs while waiting for B.
- Supports parallelism between operators via queues, state machines, and OS (e.g. NIC buffers, DMA) in operator transparent way.
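The queue abstraction above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the Fjords implementation: `PushQueue`, `PullQueue`, and `Merge` are hypothetical names, the scheduler is reduced to explicit `step()` calls, and a pull is modeled as a synchronous call into the producer.

```python
from collections import deque


class PushQueue:
    """Producer pushes tuples in; the consumer polls and may get None."""
    def __init__(self):
        self._buf = deque()

    def put(self, item):
        self._buf.append(item)

    def get(self):
        return self._buf.popleft() if self._buf else None


class PullQueue:
    """Consumer's get() runs the producer until it returns a tuple."""
    def __init__(self, producer):
        self._producer = producer  # zero-argument callable; None when exhausted

    def get(self):
        return self._producer()


class Merge:
    """Consumer operator: sees only get(), never the flow direction."""
    def __init__(self, inputs):
        self.inputs = inputs
        self.output = []

    def step(self):
        # Poll each input once; an empty push queue does not stall the others.
        for q in self.inputs:
            item = q.get()
            if item is not None:
                self.output.append(item)


stored = iter(["d1", "d2"])                       # stands in for a disk-based relation
sensor_q = PushQueue()                            # fed by a sensor at its own pace
disk_q = PullQueue(lambda: next(stored, None))
merge = Merge([sensor_q, disk_q])

merge.step()          # sensor has sent nothing yet; only the stored row arrives
sensor_q.put("s1")    # sensor pushes a reading
merge.step()
```

`Merge` is written once against `get()` and works unchanged whether an input is pushed or pulled, which is the isolation from flow direction the connectors are meant to give.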
13Fjords Example
[Figure: example query plan with connections labeled Push, Push, ?, and Pull]
Samuel Madden, Michael J. Franklin. Fjording the Stream: An Architecture for Queries over Streaming Sensor Data. International Conference on Data Engineering, 2002. To appear, February 2002.
22Fjords Applications
- Combine traffic streams with web-based accident
reports
Francis Li, Sam Madden, Megan Thomas. Traffic Visualization. http://www.cs.berkeley.edu/mct/infovis/project/traffic.html
23Operators for Streaming Data
- Need special operators for dealing with streams (see P. Seshadri et al., The Design and Implementation of a Sequence Database System, VLDB 96)
- In particular, streams can't be joined or sorted in the traditional sense
- Solution: Use windows, e.g. Zipper Join
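To illustrate why windows make joins over unbounded streams tractable, here is a minimal sketch of a sliding-window equijoin. It is a generic window join, not necessarily the Zipper Join named above. The function name `window_join`, the tuple layout, and the requirement that events arrive in timestamp order are all assumptions made for the example.

```python
from collections import deque


def window_join(events, window):
    """Join two interleaved streams on key within a time window.

    events: iterable of (timestamp, side, key, value) in timestamp order,
            where side is "L" or "R".
    Yields (left_value, right_value) for equal keys whose timestamps
    differ by at most `window`.
    """
    buffers = {"L": deque(), "R": deque()}
    for ts, side, key, value in events:
        other = "R" if side == "L" else "L"
        # Evict tuples that have fallen out of the window, so state stays bounded.
        for buf in buffers.values():
            while buf and ts - buf[0][0] > window:
                buf.popleft()
        # Probe the other stream's buffer for matches.
        for _, other_key, other_value in buffers[other]:
            if other_key == key:
                yield (value, other_value) if side == "L" else (other_value, value)
        buffers[side].append((ts, key, value))


events = [
    (0, "L", "a", "l0"),
    (1, "R", "a", "r1"),
    (5, "R", "a", "r5"),
    (6, "L", "a", "l6"),
]
pairs = list(window_join(events, window=2))
```

Only tuples inside the window are buffered, so memory is bounded by the arrival rate times the window length rather than by the length of the stream.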
24Sensor Proxy
- Energy-sensitive database operator
- Buffer sensor tuples and route to multiple user queries to hide query load from sensors
- Push aggregation operators into sensors to reduce communications load
- Dynamically adjust sample rate based on user demand
- Push results into Fjords so that other operators don't block waiting on slow or dead sensors
25Some Results
- Pushing predicates into sensors can vastly reduce costs
- Atmel simulator: 100 samples / sec, 5 vehicles / sec, 7x power savings
26CACQ
- Expect hundreds to thousands of queries over same sensor sources
- Continuously Adaptive Continuous Queries
- Continuous Queries: Long running queries which combine selections and joins to improve efficiency (See Chen, NiagaraCQ, SIGMOD 2000)
27CACQ (Cont.)
- Continuous Adaptivity: From Eddies
- Route tuples differently, depending on selectivity and cost estimates of operators
Diagrams Courtesy Joe Hellerstein
28CACQ (cont.)
- Combining CA with CQ is a win
- CQ increases number of simultaneous queries
- Adaptivity well suited to long running queries
- Eddies allow us to avoid ugly query-optimization phase in traditional CQ
- Eddies + Streams: few copies, unlike traditional CQ
29CACQ (cont)
Look for a paper in SIGMOD 2002 (fingers crossed!)
30Outline
- Background
- Server Side Solutions
- Fjords, Sensor Proxies, CACQ
- Sensor Side Solutions
- Catalog Management
- Aggregation
- Future Work
31Sensor Side Solutions
- CACQ and Fjords provide interface and performance on QP, but sensors still need help
- Locate / identify sensors
- Reduce power consumption
- Take advantage of processors?
- Improve responsiveness
32Cataloging Sensors
- To query sensors, need a way to locate, identify properties, extract values
- Goal: Drop a bunch of sensors around the DBMS, allow them to be queried without manual effort
- Idea: Add a layer to each sensor which advertises its capabilities
33Catalog (Continued)
- temperature sensor
- field
- name "temp" optional
- type int
- units celsius
- min -20
- max 100
- bits 8
- sample_cost 10.0 J optional -- for use in costing
- sample_time 10.0 ms optional -- for use in costing
- input adc2 optional read from adc channel 1
- sends ondemand
- accessorEvent GET_TEMPERATURE_DATA
- responseEvent TEMPERATURE_DATA_READY
- Compiled in 27 bytes of memory
- Layer to register with Query Processor
- Can be push or pull
34Aggregating Over Sensors
- Sensor Proxy combines user queries, pushes down aggregates
- Goal: Save energy, increase efficiency
- Idea: Take advantage of the routing hierarchy
35Why bother with aggregation
- Individual sensor readings are of limited use
- Interest in higher level properties, e.g. what vehicles drove through, what is the spread of temperatures in the building
- We have a processor and a network on board, let's use them
- We cannot survive without aggregation
- Delivering a message to all nodes is much easier than delivering a message from each node to a central point
- Delivering a large amount of data from every node is harder still (see the connectivity experiment)
- Forwarding raw information too expensive
- Scarce energy
- Scarce bandwidth
- Multihop performance penalty
36Aggregation challenges
- Inherently unreliable environment, certain information unavailable or expensive to obtain
- how many nodes are present?
- how many nodes are supposed to respond?
- what is the error distribution (in particular, what about malicious nodes?)
- Trying to build an infrastructure to remove all uncertainty from the application may not be feasible: do we want to build distributed transactions?
- Information trickles in one message at a time
- Never have complete and up-to-date information about the neighborhood
- What type of information should we expect from aggregation?
- Streams
- Robust estimates
37What does it mean to aggregate? (The DB Perspective)
- General purpose solution: apply standard aggregation operators like COUNT, MIN, MAX, AVERAGE, and SUM to any set of sensors.
- Existing solutions are application specific
- In sensors, operators may be arbitrary signal processing functions
- By assuming a standard interface, many optimizations are possible
- Example: TopN queries via hypothesis testing
- Provide grouping semantics, e.g. select avg(temp) group by trunc(light/10)
- In sensor networks, groups may be random samples
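As a rough illustration of the grouping example, the sketch below computes `avg(temp)` grouped by `trunc(light/10)`. It keeps a (sum, count) pair per group rather than a running average, so that partial results from different nodes can be combined. The function names and the sample readings are assumptions made for the example.

```python
import math


def partial_avg_by_bucket(readings):
    """Partial state per group: {trunc(light/10): (sum_of_temp, count)}."""
    state = {}
    for r in readings:
        group = math.trunc(r["light"] / 10)
        total, n = state.get(group, (0.0, 0))
        state[group] = (total + r["temp"], n + 1)
    return state


def merge_partials(a, b):
    """Combine the partial states of two nodes (or subtrees)."""
    merged = dict(a)
    for group, (total, n) in b.items():
        old_total, old_n = merged.get(group, (0.0, 0))
        merged[group] = (old_total + total, old_n + n)
    return merged


def finalize(state):
    """Turn (sum, count) pairs into averages at the root."""
    return {group: total / n for group, (total, n) in state.items()}


node_a = partial_avg_by_bucket([{"light": 12, "temp": 20}, {"light": 35, "temp": 30}])
node_b = partial_avg_by_bucket([{"light": 18, "temp": 22}])
averages = finalize(merge_partials(node_a, node_b))
```

Averages themselves cannot be merged, but (sum, count) pairs can. This is one reason a standard aggregate interface helps: the network only has to know how to initialize, merge, and finalize the state.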
38Outline
- Background
- Server Side Solutions
- Fjords, Sensor Proxies, CACQ
- Sensor Side Solutions
- Catalog Management
- Aggregation
- Future Work
39Future Work
- DBMS Side
- Efficient Catalog Management
- Moving Object Databases
- Query Optimization Techniques
- Sensor Side
- Efficient Grouping
- Joins over Network Topology
- Non Standard Aggregate Functions
- Somewhere In Between
- Histograms and other Correlations
- Sampling and Compression for Streams
- Real Query Language / API
- Demonstration Apps (SIGMOD Demo)
40Questions?
41Scenario: Count
- Goal: Count the number of nodes in the network. Number of children is unknown.
- [Figure: animation of sensor messages over time, axes labeled Sensor and Time]
49Counting Lessons
- Take advantage of redundancy to improve accuracy (reply to all parents, not just one)
- Use broadcast to reduce number of messages
- Result is a stream of values: much more robust to failures, movement, or collision than a single value.
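The idea of a count delivered as a stream of estimates can be sketched with a small simulation. Several simplifications are assumed here that the slides do not state: each node has a single parent (no redundant replies), messages are never lost, and all nodes broadcast once per epoch. The function name `count_stream` and the example tree are hypothetical.

```python
def count_stream(parent, epochs):
    """Simulate a COUNT that the root reports as a stream of estimates.

    parent: dict mapping each node to its parent (the root maps to None).
    Each epoch, every node broadcasts 1 plus the latest counts it has
    heard from its children; no node needs to know how many children it has.
    Returns the root's estimate after each epoch.
    """
    nodes = list(parent)
    root = next(n for n in nodes if parent[n] is None)
    heard = {n: {} for n in nodes}  # heard[p][c] = last count broadcast by child c
    estimates = []
    for _ in range(epochs):
        # All nodes compute their message from what they heard in earlier epochs.
        sent = {n: 1 + sum(heard[n].values()) for n in nodes}
        for n, value in sent.items():
            if parent[n] is not None:
                heard[parent[n]][n] = value
        estimates.append(1 + sum(heard[root].values()))
    return estimates


# Root 0 with children 1 and 2; node 3 under 1; node 4 under 3.
tree = {0: None, 1: 0, 2: 0, 3: 1, 4: 3}
stream = count_stream(tree, epochs=4)
```

In this example the root reports 3, 4, 5, 5: each epoch folds in nodes one hop deeper, and the estimate stabilizes once information from the deepest node has reached the root. If a node failed or moved, later estimates would adjust rather than the query returning one stale answer.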
50Aggregation in network programming
- Network programming problem
- Reliable delivery of a large number of messages to all nodes in range, while exploiting the broadcast nature of the medium
- Basic setup
- Broadcast a known number of idempotent program fragments
- Each node keeps a bitmap of fragments received (1 = packet received)
- Two stages of the problem: single hop, and multihop
- Solutions
- Single hop, dense cell
- Broadcasting the program: trivial, the central node broadcasts
- Feedback from nodes: broadcast a request from the central node: "Is anyone missing packets in this packet range?"
- Convergence: no replies to the request
51Aggregation in multihop network programming
- Broadcasting the program: use flooding
- Remember the last 8 packets forwarded, use that cache to decide whether to forward or not
- Feedback from nodes
- Distribute requests for feedback using the flooding
- After some delay, respond if any packets are missing locally
- Responses from children: AND with the local bitmap, store the result locally, forward the request
- Suboptimal because there are no local fixups
- Convergence
- No replies to the request
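The bitmap feedback step can be sketched as a bitwise AND up the routing tree. This is a simplified illustration that assumes a static tree and lossless feedback. The function names, the node names, and the 8-fragment program are invented for the example.

```python
def aggregate_bitmap(node, children, local):
    """AND a node's bitmap with the aggregated bitmaps of its descendants.

    A bit stays 1 only if the node and every node below it have that fragment.
    """
    result = local[node]
    for child in children.get(node, []):
        result &= aggregate_bitmap(child, children, local)
    return result


def missing_fragments(bitmap, total):
    """List the fragment indices whose bit is 0 (bit i = fragment i)."""
    return [i for i in range(total) if not (bitmap >> i) & 1]


TOTAL = 8
local = {
    "root": 0b11111111,
    "a": 0b11110111,   # missing fragment 3
    "b": 0b11111111,
    "c": 0b10111111,   # missing fragment 6
}
children = {"root": ["a", "b"], "a": ["c"]}

combined = aggregate_bitmap("root", children, local)
to_resend = missing_fragments(combined, TOTAL)
converged = combined == (1 << TOTAL) - 1
```

The root learns which fragments to rebroadcast from a single bitmap, however many nodes are missing them. Convergence corresponds to an all-ones bitmap, in which case no node would have replied.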
52Aggregation over streams
- Inherent uncertainty of the system
- Can nodes communicate, do they have enough power, have they moved?
- Computing a complete single answer can be very expensive, and may not be possible
- Partial estimates have their own value
- Aggregation over streams
- Values reflect the current best estimates
- Self stabilizing: in the absence of changes, converges to a desired value within N steps
53Identifying Groups
- Need a way to identify groups
- Idea: set of membership criteria pushed down
- Nodes determine their membership set based on those criteria
- Nodes can be in multiple but not unlimited groups
- E.g. Group 1: 0 < t < 10, Group 2: 10 < t < 20
- Need a way to evaluate aggregation predicates by group
- May want to allow grouping and aggregation predicates to be expressed together to take advantage of broadcast effects
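A minimal sketch of pushed-down membership criteria, using the two temperature ranges from the example above. The representation of a criterion as an open interval `(low, high)` and the function names are assumptions made for illustration.

```python
def memberships(t, groups):
    """Return the ids of all groups whose criterion the local reading satisfies."""
    return [gid for gid, (low, high) in groups.items() if low < t < high]


def aggregate_by_group(readings, groups):
    """Per-group (count, sum) over the readings, one reading per node."""
    result = {}
    for t in readings:
        for gid in memberships(t, groups):
            count, total = result.get(gid, (0, 0.0))
            result[gid] = (count + 1, total + t)
    return result


# Criteria pushed down to every node: Group 1 is 0 < t < 10, Group 2 is 10 < t < 20.
groups = {1: (0, 10), 2: (10, 20)}
per_group = aggregate_by_group([3, 7, 12, 10, 25], groups)
```

Each node evaluates the criteria against its own reading, so no node needs a list of group members. Readings of 10 and 25 match neither strict range and contribute to no group.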
54Local Query Rewrite
- Intermediate nodes may determine that it's faster to evaluate an aggregate by asking children a different question.
- Example 1: MAX(t). Once we have a guess T for MAX, ask children to report iff t > T, rather than asking all children to compute a local maximum.
- Example 2: Network programming. Rather than asking nodes what packets they have, ask them to report iff packets are missing.
- Is this a general technique? Maybe
- Inform child of guess at aggregate, ask it to refute.
- Works for average (within error bound), not count.
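The MAX rewrite in Example 1 can be sketched as follows. One assumption goes beyond the slide: children are taken to overhear each other's replies on the broadcast medium, so each reply raises the guess that later children compare against. The function name and the sample values are invented for the example.

```python
def max_with_refutation(guess, child_values):
    """Evaluate MAX by asking children to refute a guess.

    A child replies only if its value exceeds the current guess.
    Returns (maximum, number_of_replies).
    """
    best = guess
    replies = 0
    for t in child_values:
        if t > best:       # the child refutes the guess
            best = t       # later children compare against the new guess
            replies += 1
    return best, replies


values = [12, 30, 25, 31, 8]
maximum, replies = max_with_refutation(28, values)
```

With a guess of 28, only two of the five children transmit. A guess at or above the true maximum would draw no replies at all. COUNT gains nothing from this, since every node contributes to the count and none can be suppressed.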
55Wins and pitfalls of aggregation
- Aggregation over natural network topology
- Aggregation over an arbitrary subset of the network may be a loss
- Really dense cells
- Aggregation does not help with the starvation problem
- Use the message suppression via query rewrite technique
- Still beneficial in a multihop scenario
56Advanced Aggregation Tricks
- Break the Network Protocol Boundary
- Use analog reading from channel over time to determine aggregates. Simple example:
Reading 11 110100
Reading 21 101010
2 2 4 8 16
Sum
Reading 32