ICS280 Presentation by Suraj Nagasrinivasa - PowerPoint PPT Presentation

About This Presentation
Title:

ICS280 Presentation by Suraj Nagasrinivasa

Description:

ICS280 Presentation by Suraj Nagasrinivasa (1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003) by R Cheng, D Kalashnikov, S Prabhakar – PowerPoint PPT presentation


Transcript and Presenter's Notes

1
ICS280 Presentation by Suraj Nagasrinivasa
  • (1) Evaluating Probabilistic Queries over
    Imprecise Data (SIGMOD 2003)
  • by R Cheng, D Kalashnikov, S Prabhakar
  • (2) Model-Driven Data Acquisition in Sensor
    Networks (VLDB 2004)
  • by A Deshpande, C Guestrin, J Hellerstein, W
    Hong, S Madden
  • Acknowledgements: Dmitri Kalashnikov and Michal
    Kapalka

2
In typical sensor applications...
  • Sensors monitor external environment continuously
  • Sensor readings are sent back to the application
  • Decisions are often made based on these readings

3
However, we face uncertainty
  • Typically, DB/server collects sensor readings
  • DB cannot store true sensor value at all points
    in time
  • Scarce battery power
  • Limited network bandwidth
  • So, readings recorded at discrete time points
  • Value of phenomenon continuously changing
  • As a result, DB stored reading is mostly obsolete

4
Scenario: Answering Minimum Query with discrete
DB-stored readings
[Figure: recorded temperatures x0, y0 and current temperatures x1, y1 for sensors x and y]
  • Recorded: x0 < y0, so x is reported as the minimum
  • Current: y1 < x1, so y is actually the minimum
  • Wrong query result

5
Scenario: Answering Minimum Query with
error-bound readings I
[Figure: recorded temperatures x0, y0 with a bound for the current temperature of sensors x and y]
  • x certainly gives the minimum temperature reading

6
Scenario: Answering Minimum Query with
error-bound readings II
[Figure: recorded temperatures x0, y0 with overlapping bounds for the current temperature of sensors x and y]
  • Both x and y have a chance of yielding the
    minimum value
  • Which one has a higher probability?

7
Probabilistic Queries
  • Based on variation characteristics of sensor
    value over time
  • Bounds can be estimated for possible values
  • Probability distribution of values defined within
    bounds
  • Evaluate probability for query answers
  • Probabilistic queries give a correct answer,
    instead of a potentially incorrect answer
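
As a rough illustration of "evaluate probability for query answers", the sketch below estimates the probability that each sensor holds the minimum when every value is only known to lie in a bound. It assumes a uniform pdf inside each bound and independent sensors; the function names and the example bounds are hypothetical, not taken from the slides.

```python
def uniform_pdf(x, lo, hi):
    """Density of a uniform distribution on [lo, hi] (assumed pdf shape)."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def uniform_survival(x, lo, hi):
    """P(value > x) for a uniform distribution on [lo, hi]."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def min_probabilities(intervals, steps=2000):
    """Midpoint-rule estimate of P(sensor i holds the minimum) for each i.

    For sensor i this integrates f_i(x) * prod_{j != i} P(X_j > x)
    over sensor i's own bound, assuming independent sensors.
    """
    probs = []
    for i, (lo, hi) in enumerate(intervals):
        dx = (hi - lo) / steps
        total = 0.0
        for k in range(steps):
            x = lo + (k + 0.5) * dx
            others = 1.0
            for j, (lo_j, hi_j) in enumerate(intervals):
                if j != i:
                    others *= uniform_survival(x, lo_j, hi_j)
            total += uniform_pdf(x, lo, hi) * others * dx
        probs.append(total)
    return probs


# Hypothetical bounds: x in [10, 20], y in [15, 25]
probs = min_probabilities([(10.0, 20.0), (15.0, 25.0)])
```

With these bounds x is the minimum with probability 0.875 and y with probability 0.125, so both sensors appear in the answer with their probabilities instead of one possibly wrong sensor.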

8
Rest of the paper
  • Notation & Uncertainty Model
  • Classification of Probabilistic Queries
  • Evaluating Probabilistic Queries
  • Quality of Probabilistic Queries
  • Object Refreshment Policies
  • Experimental Results

9
Notation
  • T: A set of DB objects (e.g. sensors)
  • a: Dynamic attribute (e.g. pressure)
  • Ti: i-th object of T
  • Ti.a(t): Value of a in Ti at time t

10
Uncertainty Model
  • Uncertainty interval Ui(t) = [li(t), ui(t)]:
    li(t) <= Ti.a(t) <= ui(t)
  • Uncertainty pdf fi(x,t): pdf of Ti.a(t), zero
    outside Ui(t)
  • Can be extended in n dimensions
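
One possible concrete instance of this model is sketched below. It assumes the interval widens linearly with the time since the last recorded reading and that the pdf is uniform inside the interval; both choices, and the names `UncertainReading` and `max_rate`, are illustrative assumptions, since the model itself leaves the interval and pdf general.

```python
from dataclasses import dataclass


@dataclass
class UncertainReading:
    recorded: float     # last value stored in the DB
    recorded_at: float  # time of that reading
    max_rate: float     # assumed maximum change per time unit (hypothetical)

    def interval(self, t):
        """Uncertainty interval [l(t), u(t)] at time t > recorded_at."""
        if t <= self.recorded_at:
            raise ValueError("t must be later than the recorded time")
        spread = self.max_rate * (t - self.recorded_at)
        return (self.recorded - spread, self.recorded + spread)

    def pdf(self, x, t):
        """Uniform uncertainty pdf: constant inside the interval, zero outside."""
        lo, hi = self.interval(t)
        return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


reading = UncertainReading(recorded=20.0, recorded_at=0.0, max_rate=0.5)
bounds = reading.interval(4.0)   # interval four time units after the reading
```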

11
Classification of Probabilistic Queries
  • Type of Result
      • Value-based: returns a single value
          • E.g. Minimum query: (l, u, pdf)
      • Entity-based: returns a set of objects
          • E.g. Range query: {(Ti, pi), pi > 0}
  • Aggregation
      • Non-Aggregate: query result for an object is
        independent of other objects
          • E.g. Range query
      • Aggregate: query result computed from a set of
        objects
          • E.g. Nearest Neighbor query

12
Classification of Probabilistic Queries
                 Value-based answer              Entity-based answer
Non-aggregate    VSingleQ                        ERQ
                 "What is the temperature        "Which sensor has temperature
                 of sensor x?"                   between 10F and 30F?"
Aggregate        VAvgQ, VSumQ, VMinQ, VMaxQ      ENNQ, EMinQ, EMaxQ
                 "What is the average            "Which sensor gives the
                 temperature of the sensors?"    highest temperature?"
  • Query evaluation algorithms and quality metrics
    are developed for each class
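
For the entity-based, non-aggregate class (ERQ), a minimal sketch is given below. It assumes a uniform pdf inside each sensor's uncertainty interval, so the probability is simply the fraction of the interval that overlaps the query range; the function name and sample intervals are hypothetical.

```python
def erq(sensors, lo, hi):
    """Entity-based range query: return (name, p) for every sensor with p > 0.

    sensors maps a name to its uncertainty interval (l_i, u_i);
    a uniform pdf inside each interval is assumed.
    """
    result = []
    for name, (l_i, u_i) in sensors.items():
        overlap = max(0.0, min(hi, u_i) - max(lo, l_i))
        p = overlap / (u_i - l_i)
        if p > 0:
            result.append((name, p))
    return result


# "Which sensor has temperature between 10F and 30F?"
sensors = {"a": (5, 15), "b": (12, 20), "c": (40, 50)}
answer = erq(sensors, 10, 30)
```

Each sensor is scored on its own interval only, which is what makes this class non-aggregate.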

13
ENNQ algorithm: Projection, Pruning, Bounding,
Evaluation
14
ENNQ algorithm
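
The first three ENNQ phases can be sketched for one-dimensional values as follows. This is a simplified illustration under assumed names, not the algorithm as given in the paper: each interval is projected to a (near, far) distance pair from the query point, objects that cannot be nearest are pruned, and the smallest far distance f bounds the range that the evaluation phase would need to integrate over.

```python
def project(interval, q):
    """Map an uncertainty interval [l, u] to (near, far) distances from q."""
    l, u = interval
    near = 0.0 if l <= q <= u else min(abs(l - q), abs(u - q))
    far = max(abs(l - q), abs(u - q))
    return near, far


def prune_and_bound(objects, q):
    """Return the objects that may still be the nearest neighbour, and f.

    f is the smallest 'far' distance: some object is certainly within f of q,
    so any object whose 'near' distance exceeds f can be discarded.
    """
    dist = {name: project(iv, q) for name, iv in objects.items()}
    f = min(far for _, far in dist.values())
    survivors = {name: d for name, d in dist.items() if d[0] <= f}
    return survivors, f


objects = {"T1": (8, 12), "T2": (11, 15), "T3": (30, 40)}
survivors, f = prune_and_bound(objects, q=10)
```

Here T3 is pruned because its nearest possible distance (20) exceeds f = 2; only T1 and T2 go on to the evaluation phase.
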
15
Quality of Probabilistic Result
  • Introduce a notion of quality of answer
  • Proposed metrics for different classes of queries
  • Regular range query
      • "yes" or "no" with 100% certainty
  • Probabilistic query (ERQ)
      • yes with pi = 95%: OK
      • yes with pi = 5%: OK (95% sure it is not in [l, u])
      • yes with pi = 50%: NOT OK (not certain!)
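
One scoring rule consistent with these bullets is sketched below: each answer is scored by how far its probability is from 50%, normalised to [0, 1], and the scores are averaged over the result set. The exact formula and the name `erq_quality` are assumptions made for illustration; the slide does not state the metric.

```python
def erq_quality(result):
    """Average of |p - 0.5| / 0.5 over the (name, p) pairs of an ERQ result."""
    if not result:
        raise ValueError("result set is empty")
    return sum(abs(p - 0.5) / 0.5 for _, p in result) / len(result)


confident = erq_quality([("a", 0.95), ("b", 0.05)])  # both answers near-certain
uncertain = erq_quality([("c", 0.50)])               # a coin flip scores 0
```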

16
Quality for Entity-Aggregate Queries
  • "Which sensor, among n, has the minimum reading?"
  • Recall
      • Result set R = {(Ti, pi)}
          • e.g. {(T1, 30%), (T2, 40%), (T3, 30%)}
      • B is an interval bounding all possible values
          • e.g. minimum is somewhere in B = [10, 20]
  • Our metrics for aggregate queries (Min, Max, NN)
      • objects cannot be treated independently as in the ERQ
        metric
      • uniform distribution (in result set) is the worst
        case
      • metrics are based on entropy

17
Quality for Entity-Aggregate Queries
  • H(X): entropy of random variable X (values X1, ..., Xn
    with probabilities p(X1), ..., p(Xn))
  • entropy is smallest (i.e., 0) iff there exists i with
    p(Xi) = 1
  • entropy is largest (i.e., log2(n)) iff all Xi's
    are equally likely
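
The two extremes can be checked with the standard Shannon entropy, H(X) = -sum of p(Xi) * log2 p(Xi). The sketch below applies it to a result set like the (T1, 30%), (T2, 40%), (T3, 30%) example; the function name is an assumption.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


certain = entropy([1.0])              # one answer with probability 1
uniform = entropy([0.25] * 4)         # four equally likely answers
example = entropy([0.3, 0.4, 0.3])    # result set from the previous slide
```

The example result set has an entropy of about 1.57 bits, close to the worst case of log2(3) for three objects, so it would count as a low-quality answer.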

18
Improving Answer Quality
  • It is important to pick the right update policies,
    since they help improve answer quality
  • Global Choice
      • Glb_RR (pick random)
  • Local Choice
      • Loc_RR (pick random)
      • MaxUnc (heuristic: chooses the object with the
        maximum uncertainty interval)
      • MinExpEntropy (heuristic: chooses the object with
        the minimum expected entropy)
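
A minimal sketch of the MaxUnc heuristic follows. It assumes that "local choice" means choosing only among the objects involved in the current query, and the function and variable names are hypothetical.

```python
def max_unc(intervals, query_objects):
    """Pick the query object whose uncertainty interval is widest."""
    return max(
        query_objects,
        key=lambda name: intervals[name][1] - intervals[name][0],
    )


intervals = {"T1": (10, 12), "T2": (8, 20), "T3": (0, 100)}
# T3 is the most uncertain overall but is not part of this query.
to_refresh = max_unc(intervals, ["T1", "T2"])
```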

19
Experiments: Simulation Set-up
  • 1 server, 1000 sensors, limited network
    bandwidth, Min queries tested
  • Query arrivals follow a Poisson process
  • Each query over a random set of 100 sensors

20
Results
21
Conclusions
  • Probabilistic Querying for handling inherent
    uncertainty in sensor DBs
  • Classification, Algorithms and Quality of Answer
    metrics for various query types
  • The very general model of uncertainty means the
    algorithms are not directly implementable in any
    sensor network
  • Besides, to achieve any reasonable energy
    efficiency in sensor networks, the application
    and network requirements that dictate when sensor
    nodes must be awake have to be tightly coordinated,
    especially in the case of multi-hop routing

22
Outline for Model-Driven Data Acquisition in
Sensor Networks
  • Introduction
  • Motivation for Model-Based Queries
  • Framework Concept
  • Model Example: Multivariate Gaussian
  • Algorithm
  • Resolving Model-Based Queries
  • Incorporating Dynamicity
  • Observation Plan / Cost model
  • Experiments
  • BBQ System
  • Results
  • Conclusions

23
Motivation for Model-Based Queries
  • Declarative Queries adopted as key programming
    paradigm for large sensor nets
  • However, interpreting sensor nets as databases
    results in two major problems
  • Misinterpretation of Data
      • Physically observable world is a set of
        continuous phenomena in both time and space
      • Sensor readings are UNLIKELY to be random samples
  • Inefficient approximate queries
      • If sensor readings are not true values, uncertainty
        must be quantified to provide reliable answers

24
Motivation for Model-Based Queries
  • Paper Contribution: To incorporate statistical
    models of real-world processes into sensor net
    query processing architecture
  • Models help in
      • Accounting for biases in spatial sampling
      • Identifying sensors providing faulty data
      • Extrapolating values for missing sensors

25
Framework Concept
  • Goal: Given a query and a model, devise an
    efficient data acquisition plan that provides the
    best possible answer
  • Major dependencies
      • Correlations between sensors captured by the
        statistical model
          • Correlation between attributes for given sensor
          • Correlation between sensors for given attribute
      • Specific connectivity of the wireless network

26
Framework Concept: Observation Plan parameters
  • Correlations in value
  • Cost differential
27
Framework Concept
28
Model Example: Multivariate Gaussian
29
Resolving Model-Based Queries (Range Queries)
30
Resolving Model-Based Queries (Value Queries)
  • To compute the value of Xi with maximum error e and
    confidence 1-delta
  • Compute the mean of Xi given the observations o
  • As in range queries, find the probability that Xi
    lies within e of that mean
31
Range Queries for Gaussian
  • Projection (marginalization) for a Gaussian is
    simple: just drop the unnecessary entries from the
    mean vector and covariance matrix
  • The integral of the resulting marginal pdf over the
    query range has to be computed.
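
For a single queried attribute the projection and the integral reduce to a few lines, sketched below with the Gaussian CDF written via `math.erf`. The function names and sample model are assumptions for illustration.

```python
import math


def marginal(mu, cov, i):
    """Project a multivariate Gaussian onto variable i: keep its mean and variance."""
    return mu[i], cov[i][i]


def normal_cdf(x, mean, var):
    """CDF of a 1-D Gaussian with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def range_probability(mu, cov, i, a, b):
    """P(a <= X_i <= b): integral of the marginal pdf over [a, b]."""
    mean, var = marginal(mu, cov, i)
    return normal_cdf(b, mean, var) - normal_cdf(a, mean, var)


mu = [20.0, 22.0]
cov = [[4.0, 3.0], [3.0, 4.0]]
p = range_probability(mu, cov, 0, 18.0, 22.0)   # one standard deviation each side
```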

32
Incorporating Dynamicity
  • Use historical measurements to improve confidence
    of answers
  • Given pdf at time t
  • Compute pdf at time t+1

33
Incorporating Dynamicity
  • Assumption: Markovian model
  • Dynamicity summarized by transition model
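
As a one-dimensional illustration of a transition model, the sketch below assumes a linear-Gaussian step X(t+1) = a * X(t) + b + noise with noise variance q. This specific form, and the parameter names, are assumptions; the slide only states that the model is Markovian.

```python
def predict(mean, var, a, b, q):
    """Push a Gaussian belief (mean, var) through one transition step.

    Assumed model: X_next = a * X + b + noise, noise ~ N(0, q).
    """
    return a * mean + b, a * a * var + q


# Belief at time t, then at t+1 and t+2 with no new observations.
belief = (20.0, 1.0)
belief_t1 = predict(*belief, a=1.0, b=0.5, q=0.25)
belief_t2 = predict(*belief_t1, a=1.0, b=0.5, q=0.25)
```

Without new observations the variance grows at every step, which is why older measurements contribute less confidence to the current answer.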

34
Observation Plan / Cost Model
  • What is the cost of making o observations?
  • C(o) = acquisition cost + transmission cost
  • Acquisition cost: constant for each attribute
  • Transmission cost depends on
      • Network graph
      • Edge weights (link quality)
      • Paths taken, which could be sub-optimal

35
Observation Plan / Cost Model
  • A set of attributes (theta) to observe is
    determined by computing the expected benefit
  • And finding the cheapest set that still meets the
    required confidence
  • This, being similar to the traveling salesman
    problem, is best handled with heuristic algorithms
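
To make the cost side concrete, the sketch below adds up a per-sensor acquisition cost and the cost of a tour that starts and ends at the base station. The tour uses a nearest-neighbour heuristic, chosen here only as a simple stand-in for "heuristic algorithms"; it is not the planner described in the paper, and all names and costs are hypothetical.

```python
def plan_cost(selected, acquisition_cost, link_cost, base="base"):
    """Acquisition cost plus a nearest-neighbour tour from the base and back."""
    cost = sum(acquisition_cost[s] for s in selected)
    current, remaining = base, set(selected)
    while remaining:
        # Greedy step: go to the cheapest unvisited sensor (name breaks ties).
        nxt = min(remaining, key=lambda s: (link_cost[current][s], s))
        cost += link_cost[current][nxt]
        remaining.remove(nxt)
        current = nxt
    if current != base:
        cost += link_cost[current][base]   # return to the base station
    return cost


acquisition_cost = {"A": 1, "B": 1}
link_cost = {
    "base": {"A": 1, "B": 4},
    "A": {"base": 1, "B": 2},
    "B": {"base": 4, "A": 2},
}
total = plan_cost(["A", "B"], acquisition_cost, link_cost)
```

A planner could evaluate this cost for candidate observation sets and keep the cheapest one whose expected benefit is high enough.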

36
BBQ System
  • BBQ: A Tiny-Model Query System
  • Uses Multivariate Gaussians
  • Has 24 transition models, one for each hour of
    the day

37
Results
  • Experiment: 11 sensors on a tree, 83,000
    measurements, 2/3 used for training and 1/3 for
    tests
  • Methodology
      • BBQ builds a model based on training data
      • One random query per hour; any observations taken
        are used to update the model
      • The answer is compared to the measured value
  • Compare with two other methods
      • TinyDB: Each query is broadcast over the sensor
        network using an overlay tree
      • Approximate-Caching: Base station maintains a
        view of the sensor readings

38
Results
39
Results
40
Conclusion
  • Approximate queries can be well optimized, but
    model of physical phenomenon is needed
  • Defining an appropriate model is a challenge
  • The framework works well for fairly steady
    sensor data values
  • Statistical model is largely static with
    refinements to the model based on incoming
    queries and observations made as a result