Queries over Streaming Sensor Data - PowerPoint PPT Presentation

Slides: 57
Provided by: samuelro5
Learn more at: https://db.csail.mit.edu
Transcript and Presenter's Notes

1
Queries over Streaming Sensor Data
  • Sam Madden
  • DB Lunch
  • October 12, 2001

2
Outline
  • Background
  • Server Side Solutions
  • Fjords, Sensor Proxies, CACQ
  • Sensor Side Solutions
  • Catalog Management
  • Aggregation
  • Future Work

3
Background: Sensor Networks
4
Sensor Networks
  • Small, low-cost, battery-powered microprocessors
    with 1-4 sensors
  • Light, temperature, vibration, acceleration, AC
    power, humidity.
  • 10 kbit - 1 Mbit wireless networks, 100 ft range.
  • Ad hoc networking: no predefined routes.
  • Cal, MIT, UCLA: OS and networking communities
    committed

5
SmartDust
  • Sensor nets motivated by the SmartDust vision:
    millimeter-scale microprocessors, sensors, and
    wireless communication for pennies.
  • Deployed in thousands, no concern for reliability
    of a single sensor.
  • Requires position detection, fault tolerance,
    aggregation, etc.

6
Rene / Mica Motes
  • SmartDust stand-in
  • 2 cm x 3 cm, off-the-shelf (OTS).

Component   Specification
Processor   Atmel 8535, 4 MHz, 5 mA
Radio       RFM TR1000, 911 MHz, 10 kbit; 25 mJ/msg, 20-30 msg/sec
Memory      512 B RAM, 8 KB Flash, 32 KB EEPROM (Flash read-only, EEPROM slow)
Power       575 mAh battery; peak load 19.5 mA, idle 3.1 mA, sleeping 10 uA
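
The power figures above translate directly into rough battery lifetimes. The sketch below works through that arithmetic, assuming an ideal battery that delivers its full 575 mAh at any load; the function names and the 1% duty cycle are illustrative assumptions, not figures from the talk.

```python
# Figures taken from the mote specification above.
BATTERY_MAH = 575.0
CURRENT_MA = {"peak": 19.5, "idle": 3.1, "sleep": 0.010}  # 10 uA = 0.010 mA


def lifetime_hours(current_ma, capacity_mah=BATTERY_MAH):
    """Hours of operation at a constant current draw (ideal battery assumed)."""
    return capacity_mah / current_ma


def duty_cycled_hours(active_fraction):
    """Lifetime when the mote runs at peak load for `active_fraction` of the
    time and sleeps otherwise (hypothetical duty-cycling policy)."""
    average_ma = (active_fraction * CURRENT_MA["peak"]
                  + (1.0 - active_fraction) * CURRENT_MA["sleep"])
    return lifetime_hours(average_ma)


for mode, current in CURRENT_MA.items():
    print(f"{mode:>5}: {lifetime_hours(current):10.1f} hours")
print(f"1% duty cycle: {duty_cycled_hours(0.01):.0f} hours")
```

At peak load the battery lasts a little over a day, while a mote that sleeps almost all the time lasts for months, which is why later slides focus on reducing sampling and communication.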
7
TinyOS
  • Lightweight OS for sensors
  • Event-based
  • Active-message, multi-hop networking
  • Auto-idling
  • Network reprogramming, time synchronization, etc.
  • J. Hill, R. Szewczyk, A. Woo, S. Hollar,
    D. Culler, and K. Pister. System architecture
    directions for networked sensors. In Proceedings of
    the 9th International Conference on Architectural
    Support for Programming Languages and Operating
    Systems, November 2000.

8
Applications of Sensor Nets
  • Space Monitoring
  • Power, light, temp in buildings
  • Temperature, humidity
  • Traffic
  • Military
  • Structural
  • Personal Networks

9
Database Opportunities
  • All applications depend on data processing
  • Declarative query language over sensors
    attractive
  • Want to combine and aggregate data streaming
    from motes.
  • Sounds like a database

10
Database Challenges
  • Sensors unreliable
  • Come on and offline, variable bandwidth
  • Sensors push data
  • Sensors stream data
  • Sensors have limited memory, power, bandwidth
  • Sensors have processors

11
Outline
  • Background
  • Server Side Solutions
  • Fjords, Sensor Proxies, CACQ
  • Sensor Side Solutions
  • Catalog Management
  • Aggregation
  • Future Work

12
Fjords
  • Query plan abstraction to handle lack of
    reliability and streaming, push-based data
  • Combine push and pull in arbitrary combinations
  • Use connectors between operators to isolate them
    from flow direction
  • Bracket model [Graefe 93]

13
Fjords (Continued)
  • Operators assume a non-blocking queue interface
    between each other.
  • Queues implement push vs. pull
  • Pull from A to B: suspend A, schedule B until
    it produces data. A cannot go forward until B
    produces data.
  • Push from B to A: A polls; the scheduler thread
    invokes B until it produces data. A can process
    other inputs while waiting for B.
  • Supports parallelism between operators via
    queues, state machines, and the OS (e.g. NIC buffers,
    DMA) in an operator-transparent way.
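
The push and pull connectors described above can be illustrated with a minimal sketch. The class names `PushQueue` and `PullQueue` and the `merge_step` operator are hypothetical, chosen for this example only; the point is that the consuming operator sees the same `get()` interface in both cases and the queue decides the flow direction.

```python
from collections import deque


class PushQueue:
    """Connector for pushed data: the producer enqueues, the consumer polls."""

    def __init__(self):
        self._items = deque()

    def put(self, item):
        self._items.append(item)

    def get(self):
        # Return None immediately when empty so the consumer can do other work.
        return self._items.popleft() if self._items else None


class PullQueue:
    """Connector for pulled data: get() runs the producer until it yields."""

    def __init__(self, producer):
        self._producer = producer  # any iterator standing in for operator B

    def get(self):
        # The consumer waits here while the producer computes its next tuple.
        return next(self._producer, None)


def merge_step(inputs):
    """One scheduling step of a consuming operator: ask each input once."""
    out = []
    for queue in inputs:
        item = queue.get()
        if item is not None:
            out.append(item)
    return out


sensor = PushQueue()                                   # pushed sensor stream
table = PullQueue(iter([("table", 1), ("table", 2)]))  # pulled stored data

first = merge_step([sensor, table])   # the sensor has delivered nothing yet
sensor.put(("sensor", 42))
second = merge_step([sensor, table])
```

In this sketch the operator keeps making progress on the pulled input while the pushed sensor input is silent, which is the behavior the slide attributes to push connections.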

14
Fjords Example
[Figure: query plan with connections labeled Push, Push, ?, and Pull]
Samuel Madden, Michael J. Franklin. Fjording the
Stream: An Architecture for Queries over
Streaming Sensor Data. International Conference
on Data Engineering, 2002. To appear, February
2002.
23
Fjords Applications
  • Combine traffic streams with web-based accident
    reports

Francis Li, Sam Madden, Megan Thomas. Traffic
Visualization.
http://www.cs.berkeley.edu/mct/infovis/project/traffic.html
24
Operators for Streaming Data
  • Need special operators for dealing with streams
    (see P. Seshadri et al. The design and
    implementation of a sequence database
    system. VLDB 96)
  • In particular, streams can't be joined or sorted
    in the traditional sense
  • Solution: use windows, e.g. Zipper Join
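
To make the window idea concrete, here is a minimal sketch of a join over two timestamp-ordered streams that pairs tuples whose timestamps differ by at most a fixed window. It is a generic windowed join written for illustration, not the Zipper Join itself; the function name and the `(timestamp, value)` tuple layout are assumptions.

```python
import heapq
from collections import deque


def window_join(left, right, window):
    """Yield (left_value, right_value) pairs whose timestamps are within
    `window` of each other. Both inputs must be ordered by timestamp."""
    tagged_left = ((ts, 0, value) for ts, value in left)
    tagged_right = ((ts, 1, value) for ts, value in right)
    buffers = (deque(), deque())  # recent tuples from each side

    # heapq.merge interleaves the two streams lazily in timestamp order.
    for ts, side, value in heapq.merge(tagged_left, tagged_right):
        other = buffers[1 - side]
        # Drop tuples that have fallen out of the window: bounded state.
        while other and other[0][0] < ts - window:
            other.popleft()
        for _, other_value in other:
            yield (value, other_value) if side == 0 else (other_value, value)
        buffers[side].append((ts, value))


left_stream = [(1, "a"), (5, "b")]
right_stream = [(2, "x"), (9, "y")]
pairs = list(window_join(left_stream, right_stream, window=2))
```

Because old tuples are evicted as time advances, the join never needs to see the end of either stream, which is what makes it usable on unbounded sensor data.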

25
Sensor Proxy
  • Energy-sensitive database operator
  • Buffer sensor tuples and route to multiple user
    queries to hide query load from sensors
  • Push aggregation operators into sensors to reduce
    communications load
  • Dynamically adjust sample rate based on user
    demand
  • Push results into Fjords so that other operators
    don't block waiting on slow or dead sensors
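
The buffering and sample-rate behavior listed above can be sketched as follows. This is an illustration under stated assumptions, not the proxy from the talk: the class name, its methods, and the policy of running the sensor at the fastest requested rate and subsampling for slower queries are all hypothetical, and requested rates are assumed to be positive.

```python
class SensorProxy:
    """Fans one sensor's samples out to many queries (hypothetical API)."""

    def __init__(self):
        self._rates = {}    # query id -> requested samples per second
        self._results = {}  # query id -> tuples delivered so far

    def register(self, query_id, rate_hz):
        self._rates[query_id] = rate_hz
        self._results[query_id] = []

    def unregister(self, query_id):
        del self._rates[query_id]
        del self._results[query_id]

    def sample_rate(self):
        # The sensor only needs to run as fast as the most demanding query.
        return max(self._rates.values(), default=0)

    def on_sample(self, seq, value):
        """Deliver sample number `seq` to every query whose rate calls for it."""
        rate = self.sample_rate()
        for query_id, wanted in self._rates.items():
            stride = max(1, round(rate / wanted))
            if seq % stride == 0:
                self._results[query_id].append(value)

    def results(self, query_id):
        return list(self._results[query_id])


proxy = SensorProxy()
proxy.register("fast", 10)
proxy.register("slow", 5)
for seq, value in enumerate([20, 21, 22, 23]):
    proxy.on_sample(seq, value)
```

The sensor is sampled once per tick regardless of how many queries are registered, and its rate drops as soon as the most demanding query goes away.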

26
Some Results
  • Pushing predicates into sensors can vastly reduce
    costs

Atmel simulator, 100 samples/sec, 5 vehicles/sec:
7x power savings
27
CACQ
  • Expect hundreds to thousands of queries over same
    sensor sources
  • Continuously Adaptive Continuous Queries
  • Continuous Queries: long-running queries which
    combine selections and joins to improve
    efficiency (see Chen, NiagaraCQ, SIGMOD 2000)

28
CACQ (Cont.)
  • Continuous Adaptivity: from Eddies
  • Route tuples differently, depending on selectivity
    and cost estimates of operators
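
A rough sketch of selectivity-driven routing: each tuple visits the filter that has rejected the largest share of tuples so far first, so most tuples are discarded after a single evaluation. This greedy ordering is a simplification used for illustration and stands in for the routing policy of real eddies; the class and its members are hypothetical names.

```python
class Eddy:
    """Routes each tuple through filters, most selective first (greedy sketch)."""

    def __init__(self, predicates):
        self._preds = list(predicates)
        # Counters start at 1 so the pass-rate estimate is defined up front.
        self._seen = [1] * len(self._preds)
        self._passed = [1] * len(self._preds)
        self.evaluations = 0

    def order(self):
        # Lowest observed pass rate first.
        return sorted(range(len(self._preds)),
                      key=lambda i: self._passed[i] / self._seen[i])

    def route(self, tup):
        """Return True if the tuple satisfies every predicate."""
        for i in self.order():
            self.evaluations += 1
            self._seen[i] += 1
            if not self._preds[i](tup):
                return False
            self._passed[i] += 1
        return True


eddy = Eddy([lambda t: True, lambda t: t % 10 == 0])
output = [t for t in range(100) if eddy.route(t)]
```

The order is recomputed for every tuple, so if the data changes and a different filter becomes the selective one, routing adapts without a separate optimization phase.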

29
CACQ (cont.)
  • Combining CA with CQ is a win
  • CQ increases number of simultaneous queries
  • Adaptivity well suited to long running queries
  • Eddies allow us to avoid ugly query-optimization
    phase in traditional CQ
  • Eddies + Streams: few copies, unlike
    traditional CQ

30
CACQ (cont)
Look for a paper in SIGMOD 2002 (fingers crossed!)
31
Outline
  • Background
  • Server Side Solutions
  • Fjords, Sensor Proxies, CACQ
  • Sensor Side Solutions
  • Catalog Management
  • Aggregation
  • Future Work

32
Sensor Side Solutions
  • CACQ + Fjords provide interface and performance
    on the QP side, but sensors still need help
  • Locate / identify sensors
  • Reduce power consumption
  • Take advantage of processors?
  • Improve responsiveness

33
Cataloging Sensors
  • To query sensors, need a way to locate, identify
    properties, extract values
  • Goal: drop a bunch of sensors around the DBMS,
    allow them to be queried without manual effort
  • Idea: add a layer to each sensor which
    advertises its capabilities

34
Catalog (Continued)
  • temperature sensor
  • field
  • name: "temp" (optional)
  • type: int
  • units: celsius
  • min: -20
  • max: 100
  • bits: 8
  • sample_cost: 10.0 J (optional; for use in
    costing)
  • sample_time: 10.0 ms (optional; for use in
    costing)
  • input: adc2 (optional; read from ADC channel 1)
  • sends: ondemand
  • accessorEvent: GET_TEMPERATURE_DATA
  • responseEvent: TEMPERATURE_DATA_READY
  • Compiled in 27 bytes of memory
  • Layer to register with Telegraph
  • Can be push or pull
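
On the DBMS side, an advertised entry like the one above could be held in a small catalog structure. The sketch below is one possible shape, assuming the field names shown on the slide; the `SensorField` and `Catalog` classes and the sensor id `mote-7` are hypothetical and do not describe Telegraph's actual catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensorField:
    """One advertised field, mirroring the entry above."""
    type: str
    units: str
    min: int
    max: int
    bits: int
    sends: str
    accessor_event: str
    response_event: str
    name: Optional[str] = None              # optional on the slide
    sample_cost_j: Optional[float] = None   # optional, for costing
    sample_time_ms: Optional[float] = None  # optional, for costing
    input: Optional[str] = None             # optional


class Catalog:
    """In-memory registry of advertised fields, keyed by sensor id."""

    def __init__(self):
        self._fields = {}

    def register(self, sensor_id, sensor_field):
        self._fields.setdefault(sensor_id, []).append(sensor_field)

    def find(self, units):
        """Return (sensor_id, field) pairs that report in the given units."""
        return [(sensor_id, f)
                for sensor_id, fields in self._fields.items()
                for f in fields if f.units == units]


temp = SensorField(type="int", units="celsius", min=-20, max=100, bits=8,
                   sends="ondemand",
                   accessor_event="GET_TEMPERATURE_DATA",
                   response_event="TEMPERATURE_DATA_READY",
                   name="temp", sample_cost_j=10.0, sample_time_ms=10.0,
                   input="adc2")
catalog = Catalog()
catalog.register("mote-7", temp)
```

With entries registered this way, a query processor can locate sensors by property (here, units) and use the optional cost fields when choosing between them.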

35
Aggregating Over Sensors
  • Sensor Proxy combines user queries, pushes down
    aggregates
  • Goal: save energy, increase efficiency
  • Idea: take advantage of the routing hierarchy
    (example soon!)

36
Why bother with aggregation?
  • Individual sensor readings are of limited use
  • Interest in higher level properties, e.g. what
    vehicles drove through, what is the spread of
    temperatures in the building
  • We have a processor and a network on board; let's
    use them
  • We cannot survive without aggregation
  • Delivering a message to all nodes much easier
    than delivering a message from each node to a
    central point
  • Delivering a large amount of data from every node
    is harder still; vide the connectivity experiment
  • Forwarding raw information too expensive
  • Scarce energy
  • Scarce bandwidth
  • Multihop performance penalty

37
Aggregation challenges
  • Inherently unreliable environment, certain
    information unavailable or expensive to obtain
  • how many nodes are present?
  • how many nodes are supposed to respond?
  • what is the error distribution (in particular,
    what about malicious nodes?)
  • Trying to build an infrastructure to remove all
    uncertainty from the application may not be
    feasible: do we want to build distributed
    transactions?
  • Information trickles in one message at a time
  • Never have complete and up-to-date information
    about the neighborhood
  • What type of information should we expect from
    aggregation?
  • Streams
  • Robust estimates

38
Scenario: Count
Goal: count the number of nodes in the
network. Number of children is unknown.

Time   Sensor 1   Sensor 2   Sensor 3   Sensor 4   Sensor 5
1      1          -          -          -          -
2      1          1          1          -          -
3      1+2        1          1          1          -
4      1+2        1+½        1+½        1          1
5      1+3        1+½        1+½        1+1        1
6      1+3        1+2/2      1+2/2      1+1        1
7      1+4        1+2/2      1+2/2      1+1        1
46
Counting Lessons
  • Take advantage of redundancy to improve accuracy
    (reply to all parents, not just one)
  • Use broadcast to reduce number of messages
  • Result is a stream of values: much more robust
    to failures, movement, or collision than a single
    value.
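
The counting scheme can be simulated in a few lines. The sketch below assumes synchronous rounds and a hypothetical five-node topology in which node 4 has two parents; each node reports its current count split evenly across all of its parents, and the root's estimate is emitted as a stream. Function and variable names are illustrative.

```python
from fractions import Fraction


def count_stream(parents, root, rounds):
    """Return the root's running node-count estimate after each round.

    `parents` maps every non-root node to the list of parents it reports to.
    """
    nodes = set(parents) | {root}
    estimate = {node: Fraction(1) for node in nodes}  # each node counts itself
    stream = []
    for _ in range(rounds):
        received = {node: Fraction(0) for node in nodes}
        for child, child_parents in parents.items():
            # Reply to all parents, giving each an equal share of the count.
            share = estimate[child] / len(child_parents)
            for parent in child_parents:
                received[parent] += share
        estimate = {node: 1 + received[node] for node in nodes}
        stream.append(estimate[root])
    return stream


# Node 1 is the root; node 4 reports to both 2 and 3; node 5 reports to 4.
topology = {2: [1], 3: [1], 4: [2, 3], 5: [4]}
estimates = count_stream(topology, root=1, rounds=4)
```

The root's estimate climbs toward the true count as deeper nodes are heard from and then stays there, so a lost message only delays or slightly skews later values instead of invalidating a single final answer.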

47
Aggregation in network programming
  • Network programming problem
  • Reliable delivery of a large number of messages
    to all nodes in range, while exploiting the
    broadcast nature of the medium
  • Basic setup
  • Broadcast a known number of idempotent program
    fragments
  • Each node keeps a bitmap of fragments received
    (1 = packet received)
  • Two stages of the problem: single hop and
    multihop
  • Solutions
  • Single hop, dense cell
  • Broadcasting the program: trivial, the central
    node broadcasts
  • Feedback from nodes: broadcast a request from
    the central node: "Is anyone missing packets in
    this packet range?"
  • Convergence: no replies to the request
48
Aggregation in multihop network programming
  • Broadcasting the program: use flooding
  • Remember the last 8 packets forwarded, use that
    cache to decide whether to forward or not
  • Feedback from nodes
  • Distribute requests for feedback using
    flooding
  • After some delay, respond if any packets are
    missing locally
  • Responses from children: AND with the local
    bitmap, store the result locally, forward the
    request
  • Suboptimal because there are no local fixups
  • Convergence
  • No replies to the request
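
Two pieces of the multihop scheme are easy to show in code: the cache of recently forwarded packets that stops a flood from looping, and the AND of children's bitmaps with the local one. This is a minimal sketch with hypothetical names; only the cache size of 8 comes from the slide.

```python
from collections import deque


class FloodCache:
    """Remembers the most recently forwarded packet ids."""

    def __init__(self, size=8):
        self._recent = deque(maxlen=size)  # oldest id is evicted automatically

    def should_forward(self, packet_id):
        if packet_id in self._recent:
            return False  # already forwarded recently: suppress the duplicate
        self._recent.append(packet_id)
        return True


def and_bitmaps(local, children):
    """A fragment counts as received for the subtree only if the local node
    and every child have it (1 = packet received)."""
    result = list(local)
    for child in children:
        result = [mine & theirs for mine, theirs in zip(result, child)]
    return result


cache = FloodCache()
first_time = cache.should_forward(1)
repeat = cache.should_forward(1)
subtree = and_bitmaps([1, 1, 0, 1], [[1, 0, 1, 1], [1, 1, 1, 1]])
```

Any 0 in the combined bitmap marks a fragment that some node in the subtree still needs, so a parent forwards one bitmap instead of one reply per child.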

49
Aggregation over streams
  • Inherent uncertainty of the system
  • Can nodes communicate, do they have enough power,
    have they moved?
  • Computing a complete single answer can be very
    expensive, and may not be possible
  • Partial estimates have their own value
  • Aggregation over streams
  • Values reflect the current best estimates
  • Self-stabilizing: in the absence of changes,
    converges to a desired value within N steps

50
What does it mean to aggregate? (The DB
Perspective)
  • General-purpose solution: apply standard
    aggregation operators like COUNT, MIN, MAX,
    AVERAGE, and SUM to any set of sensors.
  • Previous examples are application-specific
  • In sensors, operators may be arbitrary signal
    processing functions
  • Provide grouping semantics, e.g. select
    avg(temp) group by trunc(light/10)
  • In sensor networks, groups may be random samples

51
Identifying Groups
  • Need a way to identify groups
  • Idea: set of membership criteria pushed down
  • Nodes determine their membership set based on
    those criteria
  • Nodes can be in multiple but not unlimited groups
  • E.g. Group 1: 0 < t < 10, Group 2: 10 < t < 20,
    ...
  • Need a way to evaluate aggregation predicates by
    group
  • May want to allow grouping and aggregation
    predicates to be expressed together to take
    advantage of broadcast effects

52
Local Query Rewrite
  • Intermediate nodes may determine that it's faster
    to evaluate an aggregate by asking children a
    different question.
  • Example 1: MAX(t). Once we have a guess T for
    MAX, ask children to report iff t > T, rather
    than asking all children to compute a local
    maximum.
  • Example 2: Network programming. Rather than
    asking nodes what packets they have, ask them to
    report iff packets are missing.
  • Is this a general technique? Maybe
  • Inform child of guess at aggregate, ask it to
    refute.
  • Works for average (within error bound), not count.

53
Wins and pitfalls of aggregation
  • Aggregation over natural network topology
  • Aggregation over an arbitrary subset of the
    network may be a loss
  • Really dense cells
  • Aggregation does not help with the starvation
    problem
  • Use the message suppression via query rewrite
    technique
  • Still beneficial in a multihop scenario

54
Advanced Aggregation Tricks
  • Break the Network Protocol Boundary
  • Use analog reading from channel over time to
    determine aggregates. Simple example

[Figure: analog channel readings over time (bit patterns 110100 and 101010) and their sum]
55
Outline
  • Background
  • Server Side Solutions
  • Fjords, Sensor Proxies, CACQ
  • Sensor Side Solutions
  • Catalog Management
  • Aggregation
  • Future Work

56
Future Work
  • DBMS Side
  • Efficient Catalog Management
  • Moving Object Databases
  • Query Optimization Techniques
  • Sensor Side
  • Efficient Grouping
  • Joins over Network Topology
  • Non Standard Aggregate Functions
  • Somewhere In Between
  • Histograms and other Correlations
  • Sampling and Compression for Streams