Title: Estimating Missing Data in Sensor Network Databases Using Spatial-Temporal Data Mining to Support Space Data Analysis
1. Estimating Missing Data in Sensor Network
Databases Using Spatial-Temporal Data Mining to
Support Space Data Analysis
- Le Gruenwald
- The University of Oklahoma
- School of Computer Science
- Norman, OK 73019
- ggruenwald_at_ou.edu
2. Project Objective
- To develop a mining framework to automatically
estimate missing sensor readings and answer
deliberate user stream queries
3. Project Progress (2007-2008)
- Designed and developed a new Spatio-Temporal
  Mining Framework for answering mining queries and
  estimating missing node values (MASTER: Mining
  Autonomously Spatio-Temporal Environmental
  Rules).
- Conducted simulation experiments comparing MASTER
  with existing approaches using climate sensor
  datasets obtained from the Sensor Webs Botanical
  Garden Project data server.
4. Computing Environment (1)
- Sensor Networks
  - Triggered by recent advances in Micro Electro
    Mechanical Systems (MEMS) technology, low-power
    analog and digital electronics, and low-power
    radio frequency (RF) design.
- Purpose
  - To monitor, combine, analyze and respond, in a
    timely manner, to the data collected by hundreds
    or thousands of sensors distributed in the
    physical world.
- Examples
  - Space science: sensors collecting data on Mars
    conditions.
  - Transportation: sensors for traffic monitoring.
  - Battlefield: sensors attached to soldiers,
    vehicles or scattered throughout important
    areas.
5. The Computing Environment (2)
[Diagram: sensors (Sensor1, Sensor2, ..., SensorN) in the real world send data streams to the SERVER; the USER sends queries to the SERVER and receives answers.]
6. The Computing Environment (3)
- Data Streams
  - The most natural way to process data in the
    majority of sensor network applications
  - An append-only collection of tuples that is
    ordered by some increasing key value (often
    time) [Zdonik, 02]
- Data Stream Example (Sensor X)
  sens_id, time(n-4), reading
  sens_id, time(n-3), reading
  sens_id, time(n-2), reading
  sens_id, time(n-1), reading
  sens_id, time(n), reading
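As a minimal sketch of the definition above, a per-sensor stream can be modeled in Python as an append-only list of (sens_id, time, reading) tuples that rejects out-of-order arrivals. The names `Reading` and `SensorStream`, the choice of a strictly increasing integer key, and the sample values are assumptions made for illustration; the slides do not specify an implementation.

```python
from typing import List, NamedTuple


class Reading(NamedTuple):
    sens_id: str
    time: int        # increasing key, e.g. a sampling round number (assumption)
    reading: float


class SensorStream:
    """Append-only collection of tuples ordered by an increasing time key."""

    def __init__(self, sens_id: str) -> None:
        self.sens_id = sens_id
        self._tuples: List[Reading] = []

    def append(self, time: int, reading: float) -> None:
        # Enforce the ordering property; this sketch requires strictly increasing keys.
        if self._tuples and time <= self._tuples[-1].time:
            raise ValueError("time key must be strictly increasing")
        self._tuples.append(Reading(self.sens_id, time, reading))

    def latest(self, k: int) -> List[Reading]:
        """Return the k most recent tuples, oldest first."""
        return self._tuples[-k:] if k > 0 else []


# Hypothetical readings for Sensor X at times n-4 .. n (here 0 .. 4).
stream = SensorStream("X")
for t, value in enumerate([21.0, 21.4, 21.9, 22.3, 22.1]):
    stream.append(t, value)

print(stream.latest(2))
```

Existing tuples are never updated or deleted, which matches the append-only property in the definition.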
7. Accomplishment
- Developed a Spatio-Temporal Mining Framework to
  - Capture the intrinsic spatial and temporal trends
    within sensor data streams
  - Automatically seek spatio-temporal trends to
    estimate any missing values
  - Allow for an SQL-like query processing system for
    evolving trend analysis
8. Framework Contributions (1)
- Incrementally and compactly store data streams
  without abstracting away key trends, using a
  single-pass storing procedure
- Framework is resource-aware
  - User-defined space usage bound
  - User-defined storing time bound
  - User-defined estimation time bound
- Quality of Service (QoS)
  - Resource bounds (above)
  - Probabilistic bound on estimation error margin
9. Framework Contributions (2)
- Assumption-free: no statistical distribution
  models are assumed a priori (e.g., Markovian,
  Gaussian, etc.)
- Comprehensive Association Rule Definition
  - Includes temporal qualifiers (time expressions
    over user-defined time attributes)
  - Includes any number (as limited by the enforced
    overhead bounds) of node items in the rule
  - Each multidimensional node item in the rule may
    relate to other nodes in the same rule with
    respect to any data range from the entire vector
    space (hence linear and non-linear correlations)
10. System Architecture
11. Sensor Temporal Association Rule Examples
- On weekdays during rush hours, if traffic
  sensor A reports an average of 20 to 30 passing
  cars per minute, then it can be deduced that a
  far-off traffic sensor will report an average of
  15 to 20 passing cars per minute.
- On spring days between 12 and 2 pm, if the
  temperature of node A is between 30 and 35, its
  humidity is between 50 and 60, and the temperature
  reported by a separate node B is between 20 and 25,
  then the humidity reported by a third node C is
  likely between 40 and 45.
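The second example can be sketched as a small data structure: a temporal qualifier, a list of antecedent range conditions, and a consequent range that serves as the estimate when the rule fires. The names `Condition` and `TemporalRule`, the reading layout, and the interpretation of "spring" as March to May and "12 to 2 pm" as hours 12 and 13 are all assumptions for illustration, not details given in the slides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

Readings = Dict[Tuple[str, str], float]   # (node, attribute) -> value


@dataclass
class Condition:
    node: str
    attribute: str
    low: float
    high: float

    def holds(self, readings: Readings) -> bool:
        value = readings.get((self.node, self.attribute))
        return value is not None and self.low <= value <= self.high


@dataclass
class TemporalRule:
    when: Callable[[datetime], bool]   # temporal qualifier
    antecedent: List[Condition]
    consequent: Condition

    def estimate(self, ts: datetime, readings: Readings) -> Optional[Tuple[float, float]]:
        """Return the consequent range if the rule fires, otherwise None."""
        if self.when(ts) and all(c.holds(readings) for c in self.antecedent):
            return (self.consequent.low, self.consequent.high)
        return None


def spring_midday(ts: datetime) -> bool:
    # Assumption: "spring" is March-May and "12 to 2 pm" is 12:00-13:59.
    return ts.month in (3, 4, 5) and 12 <= ts.hour < 14


rule = TemporalRule(
    when=spring_midday,
    antecedent=[
        Condition("A", "temperature", 30.0, 35.0),
        Condition("A", "humidity", 50.0, 60.0),
        Condition("B", "temperature", 20.0, 25.0),
    ],
    consequent=Condition("C", "humidity", 40.0, 45.0),
)

# Hypothetical readings in which node C's humidity is missing.
readings = {
    ("A", "temperature"): 32.0,
    ("A", "humidity"): 55.0,
    ("B", "temperature"): 22.5,
}
print(rule.estimate(datetime(2008, 4, 15, 13, 0), readings))
print(rule.estimate(datetime(2008, 4, 15, 18, 0), readings))
```

The first call falls inside the temporal qualifier and satisfies every antecedent condition, so the consequent range is returned; the second call is outside the time window, so no estimate is produced.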
12. Temporal Association Rule Formalization (1)
- Traditional Association Rule
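For reference, the standard market-basket definitions of support and confidence for a rule X => Y over Boolean items can be sketched as follows. The function names and the toy transactions are illustrative assumptions and are not taken from the slides.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    transactions: List[Transaction],
) -> float:
    """support(X union Y) / support(X), or 0.0 when X never occurs."""
    base = support(antecedent, transactions)
    if base == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Toy data: each transaction is the set of Boolean items that are true.
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a"}),
    frozenset({"b", "c"}),
]

print(support(frozenset({"a", "b"}), transactions))
print(confidence(frozenset({"a"}), frozenset({"b"}), transactions))
```

Here the rule {a} => {b} holds in two of the three transactions containing `a`, so its confidence is two thirds while its support is one half.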
13. Temporal Association Rule Formalization (2)
- Sensor-context Association Rule
  - If node items of the rule have values over the
    vector space of their transmission range, then
    we map back to Boolean items by evaluating each
    node item over a specific subspace (i.e., an item
    evaluates to true when the data falls in the
    particular subspace that is now part of the rule
    definition, and false otherwise)
  - The goal of the rule-mining estimation method is
    to seek appropriate node items over appropriate
    respective subspaces to imply the consequent
    subspace of the missing node
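The mapping back to Boolean items can be sketched as follows: each node item pairs a node with a subspace, and the item is true in a sampling round when that node's reading vector lies inside the subspace. Representing subspaces as axis-aligned boxes (one interval per attribute) is a simplifying assumption of this sketch, as are the names `item_is_true` and `rule_confidence` and the sample history; the slides allow any data range over the vector space.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]               # attribute -> value for one node
Box = Dict[str, Tuple[float, float]]    # attribute -> (low, high): axis-aligned subspace
Round = Dict[str, Vector]               # node id -> reading vector in one sampling round
NodeItem = Tuple[str, Box]              # (node id, subspace)


def item_is_true(round_: Round, node: str, box: Box) -> bool:
    """A node item is true when the node's reading lies inside its subspace."""
    vector = round_.get(node)
    if vector is None:                  # reading missing in this round
        return False
    return all(a in vector and lo <= vector[a] <= hi for a, (lo, hi) in box.items())


def rule_confidence(history: List[Round], antecedent: List[NodeItem], consequent: NodeItem) -> float:
    """Fraction of rounds satisfying the antecedent that also satisfy the consequent."""
    matched = [r for r in history if all(item_is_true(r, n, b) for n, b in antecedent)]
    if not matched:
        return 0.0
    node, box = consequent
    return sum(item_is_true(r, node, box) for r in matched) / len(matched)


# Hypothetical history of four sampling rounds.
history = [
    {"A": {"temperature": 31.0}, "B": {"temperature": 22.0}, "C": {"humidity": 42.0}},
    {"A": {"temperature": 33.0}, "B": {"temperature": 24.0}, "C": {"humidity": 47.0}},
    {"A": {"temperature": 28.0}, "B": {"temperature": 22.0}, "C": {"humidity": 41.0}},
    {"A": {"temperature": 34.0}, "B": {"temperature": 21.0}, "C": {"humidity": 44.0}},
]
antecedent = [("A", {"temperature": (30.0, 35.0)}), ("B", {"temperature": (20.0, 25.0)})]
consequent = ("C", {"humidity": (40.0, 45.0)})

print(rule_confidence(history, antecedent, consequent))
```

Three of the four rounds satisfy the antecedent and two of those also satisfy the consequent, so a candidate rule with these subspaces would have confidence two thirds on this history.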
14. Temporal Association Rule Formalization (3)
15. Iterative Estimation Method Workflow
16. SQL-like Query Mining
- MINE
- IN
- SELECT <node-list>
- FROM <cluster-id>
- WITH <largest-allowed-time-period>
- WHERE <consequent-node-set>,
  <initial-relevant-subspace-list-expression>
- HAVING <minimum-support-threshold>,
  <MCSS-threshold>,
  <minimum-confidence-threshold>
17. Estimation Results Synopsis (1)
- Input
  - One year of sensor readings, sampled every 5
    minutes, from sensors embedded in the Huntington
    Botanical Garden
  - Reported tuple from each sensor pod: (temperature,
    humidity, flux)
- Output
  - Estimated missing sensor temperature values
- Performance Measures
  - MAE (Mean Absolute Error)
  - Average Execution Time Per Round (in ms)
  - Space Consumption (in kb)
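The MAE measure listed above is the average of the absolute differences between the actual and estimated readings. A minimal sketch follows; the function name and the temperature values are hypothetical and do not come from the Huntington Botanical Garden dataset.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Average of |actual - estimated| over all paired readings."""
    if len(actual) != len(estimated):
        raise ValueError("sequences must have the same length")
    if not actual:
        raise ValueError("at least one reading is required")
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


# Hypothetical temperatures in degrees Celsius.
actual = [20.0, 21.5, 23.0, 22.0]
estimated = [20.5, 21.0, 23.5, 21.0]

mae = mean_absolute_error(actual, estimated)
print(f"MAE = {mae:.3f} degrees Celsius")
```

Because errors are taken in absolute value, overestimates and underestimates do not cancel each other out.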
18. Sensor Network Spatial Map
19. NASA Dataset Server
20. Estimation Results Synopsis (2)
- Performance Measures
  - MAE (Mean Absolute Error)
    - 0.57 degrees Celsius
  - Absolute Error distribution (temperature error in
    degrees Celsius)