
1
Model-Driven Data Acquisition in Sensor
Networks - Part II
  • Paper by Amol Deshpande, Carlos Guestrin, Samuel
    R. Madden, Joseph M. Hellerstein and Wei Hong
  • In Proceedings of VLDB 2004
  • Presented by Shantha Ramachandran
  • Oct. 4, 2004

2
Last Time
  • BBQ: probabilistic model used to create an
    observation plan for the sensor network
  • Probability density function
  • Value, range and average queries

3
Outline
  • Introduction
  • Overview of approach
  • Model-based querying
  • Choosing an observation plan
  • Experimental results
  • Extensions and future directions
  • Related work
  • Conclusions

4
Choosing an Observation Plan
  • Pdfs can be conditioned on the value of observed
    attributes o
  • Gives us a more confident answer to a query
  • Important: which attributes should we observe?
  • Focus: select attributes that are expected to
    increase the confidence in the answer to the
    query, at minimum cost

5
Cost of Observations
  • Let O ⊆ {1, ..., n} be a set of attributes to
    observe
  • Expected cost C(O) of observing attributes O
  • C(O) = Ca(O) + Ct(O)

6
  • Ca(O): data acquisition cost
  • Sum of energy required to observe attributes O
  • Ca(O) = Σi∈O Ca(i)
  • where Ca(i) is the cost of observing attribute i

7
  • Ct(O): expected data transmission cost
  • Depends on data collection mechanism used to
    collect observations from network
  • Depends on network topology
  • If the topology is unknown or changing, the cost
    function cannot be estimated reliably
  • Therefore, assume networks with known topologies

8
  • Network graph
  • Set of edges E
  • Each edge e_ij has
  • Two link quality estimates p_ij, p_ji
  • Probability that a packet from i will reach j
  • Assume p_ij, p_ji are independent
  • Expected number of transmission and
    acknowledgement messages required to guarantee a
    successful transmission is 1 / (p_ij · p_ji)
  • Use these values to estimate transmission cost

9
  • Choose a simple path through the network that
    visits all sensors to be observed, observes O,
    and returns
  • Ct(O) is defined to be the expected cost of this
    path
  • C(O) = Ca(O) + Ct(O), as sketched below
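A minimal sketch of this cost model in Python, under assumed inputs: a per-attribute acquisition-cost table, per-edge link-quality pairs (p_ij, p_ji), and a precomputed traversal path (the paper's path-selection details are not reproduced here; all names are illustrative):

```python
# Sketch of the cost model C(O) = Ca(O) + Ct(O).
# acquisition_cost: dict mapping attribute id -> sensing energy (e.g., in mJ)
# link_quality: dict mapping edge (i, j) -> (p_ij, p_ji) link estimates
# path: node sequence of a simple traversal that visits the sensors in O

def acquisition_cost_total(O, acquisition_cost):
    """Ca(O): total energy to observe the attributes in O."""
    return sum(acquisition_cost[i] for i in O)

def expected_transmissions(p_ij, p_ji):
    """Expected number of data + acknowledgement messages for one
    successful hop over edge (i, j): 1 / (p_ij * p_ji), with the two
    directions assumed independent."""
    return 1.0 / (p_ij * p_ji)

def transmission_cost(path, link_quality, per_message_cost):
    """Ct(O): expected energy of walking the traversal path, charging
    each hop its expected number of (re)transmissions."""
    return sum(
        expected_transmissions(*link_quality[(a, b)]) * per_message_cost
        for a, b in zip(path, path[1:])
    )

def total_cost(O, path, acquisition_cost, link_quality, per_message_cost=1.0):
    """C(O) = Ca(O) + Ct(O)."""
    return (acquisition_cost_total(O, acquisition_cost)
            + transmission_cost(path, link_quality, per_message_cost))
```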

10
Improvement in Confidence
  • Observing attributes O should improve the
    confidence of posterior density
  • Should be able to answer query with higher level
    of confidence

11
  • Suppose we have a range query X_i ∈ [a_i, b_i]
  • We can compute the benefit R_i(o) of observing o
  • R_i(o) = max( P(X_i ∈ [a_i, b_i] | o),
    1 − P(X_i ∈ [a_i, b_i] | o) )
  • R_i(o) measures our confidence after observing o

12
  • For value and average queries
  • R_i(o) = P(X_i ∈ [x̄_i − ε, x̄_i + ε] | o)
  • x̄_i is the posterior mean of X_i given o
  • Both benefit computations are sketched below
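BBQ's model is a multivariate Gaussian, so the posterior marginals are Gaussian and both benefits reduce to Gaussian CDF evaluations. A minimal sketch (function names are illustrative, not the paper's):

```python
import math

def gaussian_cdf(x, mu, sigma):
    """P(X <= x) for a Gaussian posterior marginal N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def range_benefit(a, b, mu, sigma):
    """R_i(o) for a range query X_i in [a, b]: confidence in whichever
    truth value (inside or outside the range) is more likely."""
    p_in = gaussian_cdf(b, mu, sigma) - gaussian_cdf(a, mu, sigma)
    return max(p_in, 1.0 - p_in)

def value_benefit(epsilon, mu, sigma):
    """R_i(o) for a value or average query: the probability mass within
    epsilon of the posterior mean mu of X_i given o."""
    return (gaussian_cdf(mu + epsilon, mu, sigma)
            - gaussian_cdf(mu - epsilon, mu, sigma))
```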

13
  • Specific value o of O is not known a priori
  • Must compute expected benefit Ri(O)
  • R_i(O) = ∫ p(o) R_i(o) do
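One generic way to approximate this integral is Monte Carlo averaging, sketched here assuming we can sample candidate observation values o from the model's density p(o). This is a swapped-in numerical technique for illustration, not necessarily how BBQ evaluates the integral:

```python
def expected_benefit(sample_o, benefit_of, n_samples=1000):
    """Monte Carlo estimate of R_i(O) = integral of p(o) R_i(o) do.

    sample_o:   callable returning one draw o ~ p(o) from the model
    benefit_of: callable computing R_i(o) for a concrete observed value o
    """
    total = sum(benefit_of(sample_o()) for _ in range(n_samples))
    return total / n_samples
```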

14
  • We may have range or value queries over multiple
    attributes
  • Trying to achieve a particular marginal
    confidence over each attribute
  • Must decide how to trade off confidences between
    different attributes

15
  • For a query over attributes Q ⊆ {1, ..., n}
  • Define the total benefit R(o) as either
  • R(o) = min_{i∈Q} R_i(o)
  • R(o) = (1/|Q|) Σi∈Q R_i(o)
  • Focus: minimize the number of mistakes made by
    the query processor
  • Use average benefit to decide when to stop
    observing new attributes
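A one-function sketch of the two aggregation choices, assuming the per-attribute benefits R_i(o) have already been computed (hypothetical helper):

```python
def total_benefit(per_attribute_benefits, mode="avg"):
    """Combine per-attribute confidences R_i(o) into a total benefit R(o).

    'min' guards the worst-covered attribute; 'avg' (used to decide when
    to stop observing) bounds the expected number of mistakes across Q.
    """
    if mode == "min":
        return min(per_attribute_benefits)
    return sum(per_attribute_benefits) / len(per_attribute_benefits)
```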

16
Optimization
  • We have so far defined R(O) and C(O) as expected
    benefit and cost
  • Different sets of observed attributes lead to
    different benefits and costs
  • If the user wants confidence level 1 − δ, we want
    to pick a set of attributes O that meets the
    confidence at minimum cost
  • minimize_O C(O), such that R(O) ≥ 1 − δ
  • This is generally NP-hard

17
  • Two algorithms for solving optimization problem
  • Exhaustive search
  • Greedy algorithm

18
Exhaustive Search
  • Exhaustively search over all possible subsets of
    the possible observations O (sketched below)
  • Finds the optimal subset
  • Exponential running time
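A direct sketch of the exhaustive baseline, assuming callables cost(O) and benefit(O) that evaluate C(O) and R(O) for a candidate set:

```python
from itertools import combinations

def exhaustive_plan(attrs, cost, benefit, delta):
    """Try every subset O of the candidate attributes and return the
    cheapest one with R(O) >= 1 - delta, or None if no subset qualifies.
    Runs in time exponential in len(attrs)."""
    best = None
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            O = set(subset)
            if benefit(O) >= 1.0 - delta:
                if best is None or cost(O) < cost(best):
                    best = O
    return best
```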

19
Greedy Algorithm
  • Uses a greedy incremental heuristic (sketched
    below)
  • Initialize with an empty set of attributes, O = ∅
  • For each attribute X_i not in the set
  • Compute the new expected benefit R(O ∪ {i}) and
    cost C(O ∪ {i})
  • If some set G of extensions reaches the desired
    confidence
  • Then pick from G the one with the lowest total
    cost
  • And terminate
  • Else if G = ∅, we have not reached our desired
    confidence
  • Then add the attribute with the highest
    benefit-to-cost ratio
  • Repeat until the desired confidence is reached
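A sketch of the greedy heuristic as described above, again assuming cost(O) and benefit(O) callables (and strictly positive costs):

```python
def greedy_plan(attrs, cost, benefit, delta):
    """Greedy incremental heuristic: grow O one attribute at a time.
    If any one-attribute extension already reaches the confidence target,
    take the cheapest such extension; otherwise add the attribute with
    the highest benefit-to-cost ratio and repeat."""
    O = set()
    remaining = set(attrs)
    target = 1.0 - delta
    while remaining:
        scores = {i: (benefit(O | {i}), cost(O | {i})) for i in remaining}
        # G: extensions that already meet the desired confidence.
        G = [i for i, (r, _) in scores.items() if r >= target]
        if G:
            cheapest = min(G, key=lambda i: scores[i][1])
            return O | {cheapest}
        # No extension qualifies yet: add the best benefit/cost attribute.
        best = max(remaining, key=lambda i: scores[i][0] / scores[i][1])
        O.add(best)
        remaining.remove(best)
    return O  # every attribute observed; confidence may still fall short
```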

20
Experimental Results
  • Measure performance of BBQ on real world data
    sets
  • Goal: demonstrate that BBQ provides the ability
    to efficiently execute approximate queries with
    user-specified confidences

21
Data Sets
  • Results based on running experiments on two real
    world data sets
  • Collected using TinyDB

22
Garden
  • One month trace of 83,000 readings
  • 11 sensors in a redwood tree at UC Botanical
    Garden in Berkeley
  • Sensors placed at four different altitudes
  • Collected light, humidity, temperature and
    voltage readings once every five minutes
  • Data split into training and test data sets
  • Model built on training set

23
Lab
  • 54 sensors in Intel Research, Berkeley lab
  • Collected light, humidity, temperature and
    voltage readings
  • Also collected network connectivity information
  • 8 days of readings
  • 6 days training
  • 2 days test

24
Query Workload
  • Two sets of query workloads
  • Value queries
  • Predicate queries

25
Value Queries
  • Main type of queries anticipated
  • Ask to report sensor readings at all sensors
  • Within error bound ε
  • With specified confidence 1 − δ

26
Predicate Queries
  • Selection queries over sensor readings
  • Ask for all sensors that satisfy a certain
    predicate
  • With specified confidence 1 − δ
  • The authors also looked at average queries
  • Those results are not presented

27
Comparison Systems
  • Compare BBQ against
  • TinyDB-style Querying
  • Approximate-Caching

28
TinyDB-style Querying
  • Query disseminated into sensor network using tree
    structure
  • At each mote, sensor reading is observed
  • Results reported back along same tree to base
    station
  • Combine results on the way back to minimize
    communication costs

29
Approximate-Caching
  • Base station maintains view of readings at all
    motes
  • View is guaranteed to be within a certain
    interval of the actual sensor readings
  • If the value of a sensor falls outside this
    interval, the mote must report its new reading
    (see the sketch below)
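A one-line sketch of the caching rule, with hypothetical names (the actual protocol in the comparison system is more involved):

```python
def needs_report(cached_value, new_reading, interval):
    """Approximate-caching invariant: a mote stays silent while its
    reading remains within the agreed interval of the base station's
    cached view, and reports only when the reading drifts outside it."""
    return abs(new_reading - cached_value) > interval
```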

30
Methodology
  • BBQ used to build model of training data
  • Includes transition model for each hour of the
    day
  • Generate traces from test data by taking one
    reading randomly per hour
  • Issue one query per hour

31
  • Model computes a priori probability for each
    predicate
  • Choose one or more sensor readings to observe if
    confidence bounds not met
  • Execute generated observation plan
  • Update the model with observed values from the
    test data
  • Compare predicted values for non-observed
    readings to the test data from that hour

32
  • Measure accuracy
  • Compute average mistakes per hour
  • How many reported values are further away from
    the actual values than the specified error bound
  • Number of predicates whose truth value was
    incorrectly approximated

33
  • TinyDB
  • All queries answered correctly
  • Approximate-Caching
  • Values reported upon deviation
  • No mistakes!

34
  • Compute cost and accuracy for each observation
    plan
  • Both acquisition cost and communication cost

35
Garden Dataset: Value-Based Queries
  • Want to analyze performance of value queries on
    garden in detail
  • Show effectiveness of BBQ
  • Query requires the system to report temperatures
    at all motes to within a specified ε
  • Confidence 97%, ε varied

36
  • Vary ε from 0 to 1 degrees Celsius
  • Cost of BBQ falls rapidly as ε increases
  • Percentage of errors stays below the 5% threshold

Figure 4: Relative costs [1]
37
  • For reasonable values of ε
  • BBQ uses significantly less communication
  • Approximate-Caching always reports values to
    within ε
  • Does not make mistakes
  • Its average observation error is close to BBQ's

38
  • Percentage of sensors that BBQ observes by hour
  • Varying ε

Figure 5: Number of sensors [1]
39
  • As ε gets small (< 0.1), BBQ must observe all
    nodes on every query
  • Variance between nodes is high enough that it
    cannot infer the value of one sensor from
    another's with any accuracy
  • As ε gets large (> 1), few observations are
    needed
  • Changes in one sensor predict the values of
    others
  • For intermediate ε
  • More observations are needed, especially during
    times when readings change drastically

40
Garden Dataset: Cost vs. Confidence
  • Compare the cost of plan execution
  • Confidence from 80% to 99%
  • ε varying between 0.1 and 1.0

Figure 6: Energy and errors vs. confidence
interval and ε [1]
41
  • Decreasing the required confidence or increasing
    ε reduces the energy per query
  • Confidence 95%, ε = 0.5
  • Reduce expected energy cost from 5.4 J to 150 mJ
    per query
  • A factor of 36 reduction

Figure 6(a) [1]
42
  • Meets or exceeds the confidence interval in
    almost all cases
  • Except at 99% confidence

Figure 6(b) [1]
43
Additional Experiments
  • Performance of greedy algorithm vs. optimal
    algorithm
  • Performance of dynamic filter vs. static model

44
Garden Dataset: Range Queries
  • Ran a number of experiments with range queries
  • Average number of observations required for 95%
    confidence

Figure 7: BBQ's performance [1]
45
  • Three different range queries
  • Temp. in [17, 18]
  • Temp. in [19, 20]
  • Temp. in [21, 22]
  • In all three cases, error rates were at or below
    5%
  • Different range queries required observations at
    different times

46
Lab Dataset
  • Similar experiments run on lab dataset
  • Higher number of attributes
  • Temperatures harder to predict than outdoors
  • Human intervention → randomness

47
  • Cost incurred answering value query
  • Confidence varied
  • As required confidence drops, BBQ is more
    efficient

Figure 8(a): Energy vs. confidence interval and
ε [1]
48
  • BBQ achieved specified confidence bounds in
    almost all cases

Figure 8(b): Errors vs. confidence interval and
ε [1]
49
  • Example traversal executing a value query
  • 99% confidence, ε = 0.5 degrees C
  • Initial set, 8 am

Figure 9: Traversals of the lab network [1]
50
Extensions and Future Directions
  • So far, authors have focused on core architecture
    of BBQ
  • Goal: unifying probabilistic models with
    declarative queries
  • Several possible extensions

51
Conditional Plans
  • Generate plans that include early stopping
    conditions
  • Generate plans that explore different parts of
    the network, depending on the values of observed
    attributes

52
More Complex Models
  • In particular, models that detect faulty sensors
  • Answer fault detection queries
  • Give correct answers to queries in the presence
    of faults

53
Outliers
  • Outlier detection does not work well in the
    current implementation
  • Only way to detect outliers is to continuously
    sample sensors
  • Outlier scheme would likely have high sensing
    cost
  • Probabilistic techniques are expected to help
    avoid excessive communication

54
Support for Dynamic Networks
  • Current approach works best in static network
  • Systematically study how network topologies
    change over time
  • New sensors added, existing sensors move
  • Topology change recovery strategies
  • Find alternate routes through network

55
Continuous Queries
  • Currently, the exploration plan is re-executed at
    the root node
  • May be possible to install code that causes
    devices to periodically push readings during
    times of high change

56
Related Work
  • Substantial work has been done on approximate
    query processing in the database community
  • Using model-like synopses for query answering
  • Instead of probabilistic models
  • Most do not use correlations

57
AQUA Project
  • Proposes sampling-based synopses that can provide
    approximate answers to a variety of queries using
    a fraction of the total data in the database
  • Includes tight bounds on the correctness of the
    answers
  • Designed to work in an environment where it is
    possible to generate independent random samples
    of data

58
  • Does not exploit correlations
  • Lacks predictive power of probabilistic models
  • Others propose exploiting correlations through
    graphical model techniques for approximate query
    processing

59
Approximate Caching
  • Olston et al.
  • Provides bounded approximation of values of a
    number of cached objects at some server
  • Server stores cached values along with absolute
    bounds for deviation
  • When objects notice values outside bounds, they
    send an update

60
  • This requires cached objects to continuously
    monitor values
  • High energy overhead
  • Can detect outliers
  • BBQ cannot

61
IDSQ: Information-Driven Sensor Querying
  • Probabilistic models
  • Estimation of target position in tracking
    applications
  • Sensors tasked to maximally reduce positional
    uncertainty of target

62
ACQP: Acquisitional Query Processing
  • Query processing in an environment like sensor
    networks
  • Must be sensitive to costs of acquiring data
  • Main goal: avoid unnecessary data acquisition

63
CONTROL
  • Provide interface that allows users to see
    partially complete answers with confidence bounds
    for long running aggregate queries
  • Does not exploit correlations

64
Conclusions
  • Proposed an architecture for integrating database
    systems with a correlation-aware probabilistic
    model
  • Do not directly query the network
  • Rather, build model from stored and current
    readings
  • Answer SQL queries by consulting model

65
  • Advantages of using a model in a sensor network
  • Shield users from faulty sensors
  • Reduce number of expensive sensor readings and
    radio transmissions

66
  • BBQ shows encouraging order-of-magnitude
    reductions in sampling and communication costs
  • BBQ's general architecture is seen as a proper
    platform for answering queries and interpreting
    data from real-world environments like sensornets
  • Conventional database technology is not equipped
    to deal with the lossiness, noise and
    non-uniformity inherent in such environments

67
Thank You! Questions?
[1] A. Deshpande, C. Guestrin, S. Madden, J.
Hellerstein, W. Hong. Model-Driven Data
Acquisition in Sensor Networks. In Proc. of the
30th VLDB Conference, 2004.