Title: Model-Driven Data Acquisition in Sensor Networks - Part II
1. Model-Driven Data Acquisition in Sensor Networks - Part II
- Paper by Amol Deshpande, Carlos Guestrin, Samuel R. Madden, Joseph M. Hellerstein and Wei Hong
- In Proceedings of VLDB 2004
- Presented by Shantha Ramachandran
- Oct. 4, 2004
2. Last Time
- BBQ: a probabilistic model used to create an observation plan for a sensor network
- Probability density function
- Value, range and average queries
3. Outline
- Introduction
- Overview of approach
- Model-based querying
- Choosing an observation plan
- Experimental results
- Extensions and future directions
- Related work
- Conclusions
4. Choosing an Observation Plan
- Pdfs can be conditioned on the values of observed attributes o
- Gives us a more confident answer to a query
- Important: which attributes should we observe?
- Focus: select attributes that are expected to increase the confidence in the answer to the query, at minimum cost
5. Cost of Observations
- Let O ⊆ {1, …, n} be a set of attributes to observe
- Expected cost C(O) of observing attributes O
- C(O) = Ca(O) + Ct(O)
6.
- Ca(O): data acquisition cost
- Sum of the energy required to observe the attributes in O
- Ca(O) = Σi∈O Ca(i)
- where Ca(i) is the cost of observing attribute i
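The acquisition cost is a simple sum; a minimal sketch (not the authors' code; the attribute names and per-reading energy costs in mJ are made up for illustration):

```python
# Minimal sketch of C_a(O) = sum over i in O of C_a(i).
# Attribute names and per-reading costs (mJ) are hypothetical.
def acquisition_cost(observed, cost_per_attr):
    """Sum the per-attribute acquisition costs over the observed set O."""
    return sum(cost_per_attr[i] for i in observed)

costs = {"temperature": 0.5, "humidity": 0.5, "light": 0.35}
print(acquisition_cost({"temperature", "light"}, costs))
```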
7.
- Ct(O): expected data transmission cost
- Depends on the data collection mechanism used to collect observations from the network
- Depends on network topology
- If the topology is unknown or changing, the cost function is essentially random
- Therefore, assume networks with known topologies
8.
- Network graph
- Set of edges E
- Each edge eij has
- 2 link quality estimates: pij, pji
- pij: probability that a packet from i will reach j
- Assume pij, pji are independent
- Expected number of transmission and acknowledgement messages required to guarantee a successful transmission is 1 / (pij · pji)
- Use these values to estimate transmission cost
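Under the independence assumption, a round trip succeeds only when both the data packet (probability pij) and the acknowledgement (probability pji) get through, so the number of attempts is geometric with success probability pij·pji and mean 1/(pij·pji). A quick Monte Carlo sketch with hypothetical link qualities:

```python
import random

def expected_attempts(p_ij, p_ji, trials=100_000, seed=0):
    """Estimate the mean number of send/ack rounds until one succeeds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        # Retry until both the packet and its acknowledgement get through.
        while not (rng.random() < p_ij and rng.random() < p_ji):
            attempts += 1
        total += attempts
    return total / trials

print(expected_attempts(0.8, 0.9))
```

With pij = 0.8 and pji = 0.9, the analytic value is 1/0.72 ≈ 1.39 attempts, and the simulated estimate should land close to it.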
9.
- Choose a simple path through the network that visits all the sensors, observes O, and returns
- Ct(O) is defined to be the expected cost of this path
- C(O) = Ca(O) + Ct(O)
10. Improvement in Confidence
- Observing attributes O should improve the confidence of the posterior density
- We should be able to answer the query with a higher level of confidence
11.
- Suppose we have a range query: Xi ∈ [ai, bi]
- We can compute the benefit Ri(o) of an observation o
- Ri(o) = max{ P(Xi ∈ [ai, bi] | o), 1 − P(Xi ∈ [ai, bi] | o) }
- Ri(o) measures our confidence after observing o
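A one-line sketch of this benefit (assuming the model has already computed the posterior probability that the predicate holds):

```python
def range_benefit(p_in_range):
    """R_i(o) = max{ P(X_i in [a_i, b_i] | o), 1 - P(X_i in [a_i, b_i] | o) }:
    confidence in the more likely truth value of the range predicate."""
    return max(p_in_range, 1.0 - p_in_range)

# The benefit bottoms out at 0.5 when the predicate is a coin flip and
# approaches 1 as either outcome becomes near-certain.
print(range_benefit(0.9), range_benefit(0.5))
```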
12.
- For value and average queries
- Ri(o) = P(Xi ∈ [x̄i − ε, x̄i + ε] | o)
- x̄i is the posterior mean of Xi given o
13.
- The specific value o of O is not known a priori
- Must compute the expected benefit Ri(O)
- Ri(O) = ∫ p(o) Ri(o) do
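The integral can be approximated by sampling o from its marginal p(o). The sketch below uses a toy one-dimensional Gaussian model, not the paper's multivariate one: prior X ~ N(0, 1) and observation O = X + N(0, 0.25), so X | o ~ N(0.8o, 0.2) and O ~ N(0, 1.25).

```python
import math, random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_range_benefit(a, b, samples=50_000, seed=0):
    """Monte Carlo estimate of R_i(O) = integral of p(o) R_i(o) do for the
    toy model: X ~ N(0,1), O = X + N(0,0.25), so X | o ~ N(0.8*o, 0.2)."""
    rng = random.Random(seed)
    sd_post = math.sqrt(0.2)
    total = 0.0
    for _ in range(samples):
        o = rng.gauss(0.0, math.sqrt(1.25))   # draw o from its marginal p(o)
        mu = 0.8 * o                          # posterior mean of X given o
        p_in = phi((b - mu) / sd_post) - phi((a - mu) / sd_post)
        total += max(p_in, 1.0 - p_in)        # R_i(o) for the range predicate
    return total / samples

print(expected_range_benefit(-0.5, 0.5))
```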
14.
- We may have range or value queries over multiple attributes
- Trying to achieve a particular marginal confidence for each attribute
- Must decide how to trade off confidences between different attributes
15.
- For a query over attributes Q ⊆ {1, …, n}
- Define the total benefit R(o) as either
- R(o) = mini∈Q Ri(o)
- R(o) = (1/|Q|) Σi∈Q Ri(o)
- Focus: minimize the number of mistakes made by the query processor
- Use the average benefit to decide when to stop observing new attributes
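Both aggregations are one-liners; the per-attribute confidences in the example are hypothetical:

```python
def total_benefit_min(benefits):
    """R(o) = min over i in Q of R_i(o): the least-confident attribute dominates."""
    return min(benefits)

def total_benefit_avg(benefits):
    """R(o) = (1/|Q|) * sum over i in Q of R_i(o): average marginal confidence."""
    return sum(benefits) / len(benefits)

per_attr = [0.99, 0.80, 0.95]  # hypothetical R_i(o) values after observing o
print(total_benefit_min(per_attr), total_benefit_avg(per_attr))
```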
16. Optimization
- We have so far defined R(O) and C(O) as expected benefit and cost
- Different sets of observed attributes lead to different benefits and costs
- If the user wants confidence level 1 − δ, we want to pick a set of attributes O that meets the confidence at minimum cost
- minimizeO C(O), such that R(O) ≥ 1 − δ
- This is generally NP-hard
17.
- Two algorithms for solving the optimization problem
- Exhaustive search
- Greedy algorithm
18. Exhaustive Search
- Exhaustively search over all possible subsets O of observations
- Finds the optimal subset
- Exponential running time
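A sketch of the exhaustive search, with toy stand-ins for R(O) and C(O) (the real ones come from the probabilistic model and the network costs defined earlier):

```python
from itertools import combinations

def exhaustive_plan(attrs, benefit, cost, delta):
    """Try every subset O of attrs and return the cheapest one whose
    expected benefit meets R(O) >= 1 - delta. Optimal but exponential."""
    best, best_cost = None, float("inf")
    for r in range(len(attrs) + 1):
        for subset in combinations(sorted(attrs), r):
            O = frozenset(subset)
            if benefit(O) >= 1.0 - delta and cost(O) < best_cost:
                best, best_cost = O, cost(O)
    return best, best_cost

# Toy stand-ins: benefit grows with set size, cost is additive over
# hypothetical per-attribute costs.
per_cost = {"a": 1.0, "b": 2.0, "c": 3.0}
benefit = lambda O: 0.5 + 0.1 * len(O)
cost = lambda O: sum(per_cost[i] for i in O)
print(exhaustive_plan({"a", "b", "c"}, benefit, cost, delta=0.35))
```

With these stand-ins, any two attributes reach the 0.65 confidence target, so the cheapest pair wins.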
19. Greedy Algorithm
- Uses a greedy incremental heuristic
- Initialize with an empty set of attributes, O = ∅
- For each attribute Xi not in the set
- Compute the new expected benefit R(O ∪ {i}) and cost C(O ∪ {i})
- If some set G of candidates reaches the desired confidence
- Then pick from G the one with the lowest total cost
- And terminate
- Else if G = ∅, we have not reached our desired confidence
- Then add the attribute with the highest benefit-to-cost ratio
- Repeat until the desired confidence is reached
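The loop above can be sketched as follows, with toy stand-ins for the expected benefit R(O) and cost C(O):

```python
def greedy_plan(attrs, benefit, cost, delta):
    """Greedy heuristic: grow O one attribute at a time. If any candidate
    set reaches confidence 1 - delta, return the cheapest such candidate;
    otherwise add the attribute with the best benefit-to-cost ratio."""
    O = set()
    while attrs - O:
        # Evaluate R and C for each one-attribute extension of O.
        candidates = [(benefit(O | {i}), cost(O | {i}), i) for i in attrs - O]
        # G: extensions that already reach the desired confidence.
        G = [c for c in candidates if c[0] >= 1.0 - delta]
        if G:
            _, _, i = min(G, key=lambda c: c[1])  # cheapest qualifying set
            return O | {i}
        # Otherwise, grow by the best benefit-per-unit-cost attribute.
        _, _, i = max(candidates, key=lambda c: c[0] / c[1])
        O.add(i)
    return None  # no subset reaches the desired confidence

# Toy stand-ins (hypothetical): benefit grows with set size, cost is additive.
per_cost = {"a": 1.0, "b": 2.0, "c": 3.0}
benefit = lambda O: 0.5 + 0.1 * len(O)
cost = lambda O: sum(per_cost[i] for i in O)
print(greedy_plan({"a", "b", "c"}, benefit, cost, 0.35))
```

On this toy instance the greedy plan happens to match the optimal set; being a heuristic, it may pick a costlier set in general.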
20. Experimental Results
- Measure performance of BBQ on real-world data sets
- Goal: demonstrate that BBQ can efficiently execute approximate queries with user-specified confidences
21. Data Sets
- Results based on experiments on two real-world data sets
- Collected using TinyDB
22. Garden
- One-month trace of 83,000 readings
- 11 sensors in a redwood tree at the UC Botanical Garden in Berkeley
- Sensors placed at four different altitudes
- Collected light, humidity, temperature and voltage readings once every five minutes
- Data split into training and test data sets
- Model built on the training set
23. Lab
- 54 sensors in Intel Research, Berkeley lab
- Collected light, humidity, temperature and
voltage readings - Also collected network connectivity information
- 8 days of readings
- 6 days training
- 2 days test
24. Query Workload
- Two sets of query workloads
- Value queries
- Predicate queries
25. Value Queries
- Main type of query anticipated
- Ask to report sensor readings at all sensors
- Within error bound ε
- With specified confidence δ
26. Predicate Queries
- Selection queries over sensor readings
- Ask for all sensors that satisfy a certain predicate
- With specified confidence δ
- Also looked at average queries
- Those results are not presented
27. Comparison Systems
- Compare BBQ against
- TinyDB-style Querying
- Approximate-Caching
28. TinyDB-style Querying
- Query is disseminated into the sensor network using a tree structure
- At each mote, the sensor reading is observed
- Results are reported back along the same tree to the base station
- Results are combined on the way back to minimize communication costs
29. Approximate-Caching
- Base station maintains a view of the readings at all motes
- The view is guaranteed to be within a certain interval of the actual sensor readings
- If a sensor's value falls outside this interval, the mote is required to report a new reading
30. Methodology
- BBQ used to build a model of the training data
- Includes a transition model for each hour of the day
- Generate traces from the test data by taking one reading randomly per hour
- Issue one query per hour
31.
- Model computes the a priori probability for each predicate
- Choose one or more sensor readings to observe if confidence bounds are not met
- Execute the generated observation plan
- Update the model with observed values from the test data
- Compare predicted values for non-observed readings to the test data from that hour
32.
- Measure accuracy
- Compute average mistakes per hour
- How many reported values are further from the actual values than the specified error bound
- Number of predicates whose truth value was incorrectly approximated
33.
- TinyDB
- All queries answered correctly
- Approximate-Caching
- Values reported upon deviation
- No mistakes!
34.
- Compute cost and accuracy for each observation plan
- Both acquisition cost and communication cost
35. Garden Dataset: Value-Based Queries
- Want to analyze the performance of value queries on the garden dataset in detail
- Show effectiveness of BBQ
- Query requires the system to report temperatures at all motes to within a specified ε
- Confidence = 97%, ε varied
36.
- Vary ε from 0 to 1 degrees Celsius
- Cost of BBQ falls rapidly as ε increases
- Percentage of errors stays below the 5% threshold
Figure 4: Relative Costs [1]
37.
- For reasonable values of ε
- BBQ uses significantly less communication
- Approximate-Caching always reports values to within ε
- Does not make mistakes
- Its average observation error is close to BBQ's
38.
- Percentage of sensors that BBQ observes by hour
- Varying ε
Figure 5: Number of sensors [1]
39.
- As ε gets small (< 0.1), BBQ must observe all nodes on every query
- Variance between nodes is high enough that it cannot infer the value of one sensor from another's with any accuracy
- As ε gets large (> 1), few observations are needed
- Changes in one sensor predict the values of others
- Intermediate ε
- More observations are needed, especially during times when readings change drastically
40. Garden Dataset: Cost vs. Confidence
- Compare cost of plan execution
- Confidence from 80% to 99%
- ε varying between 0.1 and 1.0
Figure 6: Energy and errors vs. confidence interval and epsilon [1]
41.
- Loosening the required confidence or the error bound ε reduces the energy per query
- Confidence = 95%
- Error bound ε = 0.5
- Reduces expected energy cost from 5.4 J to 150 mJ per query
- A factor-of-40 reduction
Figure 6A [1]
42.
- Meets or exceeds the confidence interval in almost all cases
Figure 6B [1]
43. Additional Experiments
- Performance of the greedy algorithm vs. the optimal algorithm
- Performance of a dynamic filter vs. a static model
44. Garden Dataset: Range Queries
- Ran a number of experiments with range queries
- Average number of observations required for 95% confidence
Figure 7: BBQ's performance [1]
45.
- 3 different range queries
- Temp. in [17, 18]
- Temp. in [19, 20]
- Temp. in [21, 22]
- In all 3 cases, error rates were at or below 5%
- Different range queries required observations at
different times
46. Lab Dataset
- Similar experiments run on the lab dataset
- Higher number of attributes
- Indoor temperatures are harder to predict than outdoors
- Human intervention → randomness
47.
- Cost incurred answering a value query
- Confidence varied
- As the required confidence drops, BBQ becomes more efficient
Figure 8A: Energy vs. confidence interval and epsilon [1]
48.
- BBQ achieved the specified confidence bounds in almost all cases
Figure 8B: Errors vs. confidence interval and epsilon [1]
49.
- Example traversal executing a value query
- 99% confidence, ε = 0.5 degrees C
- Initial set, 8 am
Figure 9: Traversals of the lab network [1]
50. Extensions and Future Directions
- So far, the authors have focused on the core architecture of BBQ
- Goal: unifying probabilistic models with declarative queries
- Several possible extensions
51. Conditional Plans
- Generate plans that include early stopping conditions
- Generate plans that explore different parts of the network, depending on the values of observed attributes
52. More Complex Models
- In particular, models that detect faulty sensors
- Answer fault detection queries
- Give correct answers to queries in the presence
of faults
53. Outliers
- Outlier detection does not work well in the current implementation
- The only way to detect outliers is to continuously sample the sensors
- An outlier scheme would likely have a high sensing cost
- Probabilistic techniques are expected to help avoid excessive communication
54. Support for Dynamic Networks
- Current approach works best in a static network
- Systematically study how network topologies change over time
- New sensors added, existing sensors move
- Topology-change recovery strategies
- Find alternate routes through the network
55. Continuous Queries
- Currently, the exploration plan re-executes at the root node
- It may be possible to install code that causes devices to periodically push readings during times of high change
56. Related Work
- Substantial work has been done on approximate query processing in the database community
- Using model-like synopses for query answering
- Instead of probabilistic models
- Most do not use correlations
57. AQUA Project
- Proposes sampling-based synopses that can provide approximate answers to a variety of queries using a fraction of the total data in the database
- Includes tight bounds on the correctness of the answers
- Designed to work in an environment where it is possible to generate independent random samples of data
58.
- Does not exploit correlations
- Lacks the predictive power of probabilistic models
- Others propose exploiting correlations through graphical-model techniques for approximate query processing
59. Approximate Caching
- Olston et al.
- Provides a bounded approximation of the values of a number of cached objects at some server
- Server stores cached values along with absolute bounds on deviation
- When objects notice values outside their bounds, they send an update
60.
- This requires cached objects to continuously monitor their values
- High energy overhead
- Can detect outliers
- BBQ cannot
61. IDSQ: Information Driven Sensor Querying
- Probabilistic models
- Estimation of target position in tracking applications
- Sensors are tasked to maximally reduce the positional uncertainty of the target
62. ACQP: Acquisitional Query Processing
- Query processing in an environment like sensor networks
- Must be sensitive to the costs of acquiring data
- Main goal: avoid unnecessary data acquisition
63. CONTROL
- Provides an interface that allows users to see partially complete answers with confidence bounds for long-running aggregate queries
- No correlations
64. Conclusions
- Proposed an architecture for integrating database systems with a correlation-aware probabilistic model
- Do not directly query the network
- Rather, build a model from stored and current readings
- Answer SQL queries by consulting the model
65.
- Advantages to using a model in a sensor network
- Shields users from faulty sensors
- Reduces the number of expensive sensor readings and radio transmissions
66.
- BBQ shows encouraging order-of-magnitude reductions in sampling and communication costs
- BBQ's general architecture is seen as a proper platform for answering queries and interpreting data from real-world environments like sensornets
- Conventional database technology is not equipped to deal with the lossiness, noise and non-uniformity inherent in such environments
67. Thank You. Questions?
[1] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, W. Hong. Model-Driven Data Acquisition in Sensor Networks. In Proc. of the 30th VLDB Conference, 2004.