Minimizing Data Uncertainty through System Design

About This Presentation

Title:

Minimizing Data Uncertainty through System Design

Description:

Features: Xt. Calculate. score. New reading. Sensor signature: St. Fault signature: F ... Optimal design: Experiments for discriminating between several models. ... – PowerPoint PPT presentation

Number of Views:46

Avg rating:3.0/5.0

Slides: 2

Provided by: alber175

Category:

more less

Transcript and Presenter's Notes

Title: Minimizing Data Uncertainty through System Design

1
Minimizing Data Uncertainty through System Design
Center for Embedded Networked Sensing
Laura Balzano, Nabil Hajj Chehade, Sheela Nair,
Nithya Ramanathan, Abhishek Sharma, Deborah
Estrin, Leana Golubchik, Ramesh Govindan, Mark
Hansen, Eddie Kohler, Greg Pottie, Mani
Srivastava Integrity Group, Center for Embedded
Networked Sensing
Introduction There are Many Sources of
Uncertainty in Interpreting Data
Hardware Uncertainty
Environment Modeling Uncertainty
Sensor Calibration Uncertainty

In a lot of applications, wireless sensing
systems are used for inference and prediction on
environmental phenomena.
Statistical models are widely used to represent
these environmental phenomena
Models characterize how unknown quantities
(phenomena) are related to known quantities
(measurements)
Choosing the models involves a great deal of
uncertainty.
Often a single model M is used. If M does not
characterize a phenomenon correctly, the
inferences and predictions will not be accurate.
It is better to start with multiple plausible
models and select the model by collecting
measurements at informative locations.

Wireless sensing systems utilize low cost and
unreliable hardware
Faults are common Examples of Sensor Faults

Accurate calibration function is required to
translate data from sensors
Calibration parameters for most sensors drifts
non-deterministically over time

Data uncertainty can be reduced through careful
system design!
Reducing Uncertainty in Model Selection
Algorithm T-Designs
A sequential algorithm is
used to iteratively collect measurements that
maximize the discrimination between the two
models 1.
Problem Description Optimal Sensor
placement Where should we collect measurements to
optimally choose a model that represents the
field? Assumptions Two plausible models.
Gaussian noise. Idea Find the locations where
the difference between the two models is the
largest. Technically
Evaluation on Real Data
Likelihoods M1? 0.1754, M2 ? 3.4368 ? M2 fits
better. Generalization In case of multiple
models, apply the same algorithm to the best two
models that fit the data at each iteration (worst
case).
1 A.C. Atkinson and V.V. Fedorov. Optimal
design Experiments for discriminating between
several models. Biometrika 62, 289-303, 1975.
Reducing Uncertainty in Hardware Functionality
(Fault Detection/Diagnosis)

Problem Description Online fault detection and
diagnosis By detecting faults when they occur,
instead of after the fact, users can take actions
in the field to validate questionable data and
fix hardware faults.
Assumptions Faults can be common, an initial
fault-free training period is not always
available, environmental phenomena are hard to
predict so tight bounds on expected behavior are
not possible
Signatures for modeling normal and faulty behavior
Confidence
Data-driven techniques for identifying faulty
sensor readings 1) Rule/Heuristic-based methods
Points far from the origin are faulty Assume a
normal distribution of distances for good points.
Points outside 2 standard deviations of the mean
distance are considered outliers and are
rejected. All other points are used to
continually update distribution parameters.
Fault Detection Algorithm (adapted from Detecting
Fraud In the Real World Cahill, Lambert,
Pinhiero, and Sun 2000)

Summarize sensor and fault behaviors using a
signature multivariate probability density of
features (Cahill, Lambert, Pinhiero, and Sun
2000)
Features chosen to exploit differences between
faulty and normal behavior. Current features
summarize temporal and spatial information
Temporal actual reading, change between
successive readings, voltage
Spatial diff. from neighboring sensors.
Calculate score for new readings using log
likelihood ratio
Higher scores are more suspicious.
Use of sensor signatures allows for
sensor-specific fault detection.

SHORT Rule Compute the rate of change between
two successive samples. If it is above a
threshold, this is an instance of SHORT fault.
NOISE Rule Compute the std. deviation of
samples within time window W. If it is above a
threshold, the samples are corrupted by NOISE
fault.

Take Physical Sample
Standard Deviation
Replace Sensor
Points are clustered using an online K-means
algorithm. Clusters are associated with a
previously successful remediating action
SHORT fault
NOISE fault
NO

2) Linear-Least Squares Estimation based method
Exploits correlation in the data measured at
different sensors
LLSE Equation

YES
Gradient
Readings are mapped into a multi-dimensional
space defined by carefully chosen features
gradient, distance from LDR, distance from NLDR,
standard deviation.
Tested on one week of Cold-Air Drainage data
Outlier Detection Using a continually updated
distribution, in place of statically defined
thresholds, makes Confidence resilient to human
configuration error and adaptable to dynamic
environments
Low voltage
3) Learning data models Hidden Markov Models
Evaluated in real-world
Deployments Confidence detects faults with low
false positive and negative rates. Difficult
to validate what is truly a fault without ground
truth In our San Joaquin deployment we
validated data by analyzing soil samples taken
from each sensor

HMM model
Number of states
Transition probabilities
Conditional probability Pr O S

Sensor 1 Sensor 2
stuck-at fault
unusually noisy readings
Sensor 2 malfunctioning at start of deployment
Noisy readings are learned as normal sensor
behavior

Results
Analyzed data sets from real world deployments to
characterize the prevalence of data faults using
these 3 methods.
NAMOS deployment CONSTANTNOISE faults, up to
30 of samples affected by data faults.
Intel Lab, Berkeley deployment CONSTANTNOISE
faults, up to 20 of samples affected by data
faults.
Great Duck Island deployment SHORTNOISE
faults, 10-15 of samples affected by data
faults.
SensorScope deployment SHORT faults, very few
samples affected by data faults.

4/06 4/08
4/10 4/12

Difficult to initialize sensor signature without
learning period that is guaranteed to be
fault-free.
Can use a stricter threshold during learning
period to decrease chance of incorporating faults
into sensor signature
Method is dependent on accurately representing
fault models, which is difficult without
available labeled training data.

Reducing Uncertainty in Sensor Calibration
Evaluation
Problem Description Blind Calibration
Blindly calibrate sensor response
from routine measurements collected from the
sensor network. Manual calibration is not a
scalable practice!
Consider a network with n sensors. We can call
the vector of a true signal from the n sensors
x And the vector of the measured signal y
In a deployment with sensors spread across a
valley at the James Reserve, using a 4-d signal
subspace constructed from the calibrated data,
the gain calibration was quite accurate. The
offset calibration, as expected, captured some of
the non-zero mean signal additionally it was
sensitive to the model.
Then under certain conditions on P, with no noise
and exact knowledge of the subspace, we can
perfectly recover the gain factors and partially
recover the offset factors.
UCLA UCR Caltech USC UC Merced

Write a Comment

User Comments (0)

About PowerShow.com

Minimizing Data Uncertainty through System Design - PowerPoint PPT Presentation

Minimizing Data Uncertainty through System Design

Features: Xt. Calculate. score. New reading. Sensor signature: St. Fault signature: F ... Optimal design: Experiments for discriminating between several models. ... – PowerPoint PPT presentation