Title: DIMENSIONS: Why do we need a new Data Handling architecture for sensor networks?
1. DIMENSIONS: Why do we need a new Data Handling architecture for sensor networks?
- Deepak Ganesan, Deborah Estrin (UCLA), John Heidemann (USC/ISI)
- Presenter: Vijay Sundaram
2. Deployment: Microclimate Monitoring at James Reserve (UC Riverside)
Example queries over a weather sensor network:
- How well does the data fit a model <M> of the variation of temperature with altitude?
- Send a robotic agent to the edge between low- and high-precipitation regions.
- Get detailed data from the node with maximum precipitation from Sept to Dec 2003.
- "Hmm... I wonder why packet loss is so high." Get a connectivity map of the network for all transmit power settings.
3. Goals
- Flexible spatio-temporal querying
  - Provide the ability to mine for interesting patterns and features in data
  - Drill down on details
- Distributed long-term networked data storage
  - Preserve the ability for long-term data mining while catering to node storage constraints
- Performance
  - Reasonable accuracy for a wide range of queries
  - Low communication (energy) overhead
4. How can we achieve these goals?
- Exploit redundancy in data
  - Potentially huge gains from lossy compression exploiting spatio-temporal correlation
- Exploit rarity of interesting features
  - Preserve only interesting features
- Exploit the scale of the sensor network
  - Large distributed storage, although limited local storage
- Exploit the low cost of approximate query processing
  - Allow approximate query processing that obtains sufficiently accurate responses
5. Can existing systems satisfy the design goals?
6. DIMENSIONS Design: Key Ideas
- Construct a hierarchy of lossy compressed summaries of data using wavelet compression
- Queries drill down from the root of the hierarchy to focus the search on small portions of the network
- Progressively age lossy data along the spatio-temporal hierarchy to enable long-term storage
[Figure: storage hierarchy from Level 0 to Level 2 — summaries become progressively lossy and are progressively aged up the hierarchy]
7. Roadmap
- Why wavelets?
- Example: Precipitation Hierarchy
- Spatial and temporal processing internals
- Initial results: Precipitation Dataset
8. Enabling Technique: Wavelets
- A very popular signal-processing approach that provides good time and frequency localization
  - Used in JPEG2000 and geo-spatial data mining
- Preserves spatio-temporal features (edges, discontinuities) while providing a good approximation of long-term trends in the data
- An efficient distributed implementation is possible
9. Sample Architecture: Precipitation Hierarchy
Example query: What is the maximum precipitation between Sept and Dec 2002?
- Local processing: construct a lossy time-series summary (zero communication cost)
- Spatial data processing: hierarchical lossy compression
  - Organize the network into a hierarchy; at each higher level, reduce the number of participating nodes by a factor of 4
  - At each step of the hierarchy, summarize data from the 4 quadrants and propagate the summary
- Drill-down: direct the query to the quadrant that best matches it
[Figure: spatial resolution and temporal resolution decrease up the hierarchy]
10. Spatial Decomposition
- Recursively split the network into non-overlapping square grids
- At each level of the hierarchy:
  - Elect a cluster-head
  - The cluster-head combines and summarizes data from the 4 quadrants
  - The cluster-head propagates the compressed data to the next level of the hierarchy
- Routing protocol: a GPSR variant (DCS, Ratnasamy et al.)
Hierarchy construction
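The recursive quadrant-merging step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the "summary" each cluster-head computes is simply a block average standing in for a wavelet summary, and cluster-head election and routing are elided.

```python
import numpy as np

def build_hierarchy(grid, levels):
    """Hypothetical sketch: build a quadtree of summaries over a node grid.

    Level 0 holds the raw per-node readings. At each higher level, one
    cluster-head per 2x2 block combines its 4 quadrants into a single
    summary value (here, the mean), shrinking the grid by a factor of 4.
    """
    hierarchy = [np.asarray(grid, dtype=float)]
    for _ in range(levels):
        h, w = hierarchy[-1].shape
        # Each 2x2 block of children collapses into one parent summary.
        parent = hierarchy[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        hierarchy.append(parent)
    return hierarchy

grid = np.arange(16, dtype=float).reshape(4, 4)  # 4x4 network of readings
levels = build_hierarchy(grid, 2)
print([lvl.shape for lvl in levels])  # [(4, 4), (2, 2), (1, 1)]
```

The root level is a single summary of the whole network; a query can start there and descend only into the quadrants it needs.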
11. Wavelet Compression Internals
Pipeline: Input Data -> Wavelet Subband Decomposition -> Thresholding, Quantization, Subband Dropping -> Lossless Encoder -> Compressed Output
- Cost metric: communication budget or error bound
- Filters: Haar filter, Daubechies 9/7 filter
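A minimal sketch of the first two pipeline stages, using the Haar filter: a one-level subband decomposition followed by thresholding of small detail coefficients. The real codec additionally quantizes, drops subbands, and runs a lossless encoder; none of that is shown here.

```python
import numpy as np

def haar_1d(signal):
    """One level of the Haar wavelet transform: averages + details."""
    s = np.asarray(signal, dtype=float)
    avg = (s[0::2] + s[1::2]) / 2.0   # low-pass subband: coarse trend
    det = (s[0::2] - s[1::2]) / 2.0   # high-pass subband: local detail
    return avg, det

def compress(signal, threshold):
    """Lossy step: zero out detail coefficients below the threshold."""
    avg, det = haar_1d(signal)
    det[np.abs(det) < threshold] = 0.0
    return avg, det

def reconstruct(avg, det):
    out = np.empty(2 * len(avg))
    out[0::2] = avg + det
    out[1::2] = avg - det
    return out

data = np.array([10.0, 10.0, 10.0, 50.0])  # spike = interesting feature
avg, det = compress(data, threshold=5.0)
print(reconstruct(avg, det))  # the spike survives thresholding
```

Note how the large detail coefficient produced by the spike is kept while near-zero coefficients are discarded — this is why the scheme preserves edges and discontinuities while still compressing smooth long-term trends well.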
12. Initial Results with the Precipitation Dataset
Communication overhead:
- 15x12 grid (50 km edge) of precipitation data from 1949-1994, from the Pacific Northwest; gridded before processing
- Hand-picked choice of threshold, quantization intervals, and subbands to drop; Huffman encoder at the output
- Very large compression ratio up the hierarchy
Reference: M. Widmann and C. Bretherton. "50 km resolution daily precipitation for the Pacific Northwest, 1949-94."
13. Query: Find the maximum annual precipitation for each year
- Exact answer for 89% of queries; within 90% of the true answer for >95% of queries
- Queries require visiting less than 3% of the network
- Good performance on average with very low lookup overhead
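The low lookup overhead comes from drilling down rather than flooding: at each level the query visits only the quadrant whose summary looks most promising. A toy sketch (my own illustration, with a plain max in place of wavelet summaries):

```python
def drilldown_max(node):
    """node = (summary_value, children); children is [] at leaf (raw) nodes.

    Descend from the root, at each level following only the child
    quadrant whose summary reports the largest value.
    """
    value, children = node
    path = [value]
    while children:
        value, children = max(children, key=lambda c: c[0])
        path.append(value)
    return value, path

leaf = lambda v: (v, [])
root = (9.0, [(4.0, [leaf(3.0), leaf(4.0)]),       # quadrant A summary
              (9.0, [leaf(9.0), leaf(7.0)])])       # quadrant B summary
best, path = drilldown_max(root)
print(best)  # 9.0 — found by visiting one branch per level
```

Because lower-level summaries are lossy, the answer is approximate in general, which matches the reported accuracy figures rather than exactness for every query.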
14. Query: Locate the boundary in annual precipitation between low- and high-precipitation areas
- Error metric: number of nodes more than 1 pixel from the drill-down boundary
- Accuracy: within 25% error for 93% of the queries (and within 13% error for 75% of the queries)
- Less than 5% of the network queried
15. Open Issues
- Load balancing and robustness
  - Hierarchical model vs. peer model: a lot of related work in p2p systems
- Irregular node placement
  - Use wavelet extensions for irregular node placement (computationally more expensive)
  - Gridify the dataset with interpolation
- Providing query guarantees
  - Can we bound the error in the response obtained for a drill-down query at a particular level of the hierarchy?
- Implementation on an iPAQ/mote network
16. Summary
DIMENSIONS provides a holistic data-handling architecture for sensor networks that can:
- Support a wide range of sensor-network usage and query models (drill-down querying of wavelet summaries)
- Provide a gracefully degrading lossy storage model (progressively ageing summaries)
- Offer the ability to tune energy expended for query performance (tunable lossy compression)
17. Different Optimization Metrics
18. Other Examples: Packet Loss
- A different example of a dataset that exhibits spatial correlation
  - Throughput from one transmitter to proximate receivers is correlated
  - Throughput from multiple proximate transmitters to one receiver is correlated
- Typically, what we want to query are the deviations from normal and the average throughput
19. Packet-Loss Dataset: Get a Throughput vs. Distance Map
- Involves an expensive transfer of a 12x14 map from each node
- Good approximate results can be obtained by querying the compressed data
20. Long-term Storage Concepts
- Data is progressively aged, both locally and along the hierarchy
- Summaries that cover larger areas and longer time periods are retained for much longer than the raw time series
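One way to picture progressive ageing is as a retention schedule that grows with hierarchy level: coarse summaries cover more space and time, so they are kept longer. The numbers below are purely illustrative, not from the paper.

```python
def retention_days(level, base_days=7, factor=4):
    """Hypothetical ageing policy: raw data (level 0) is kept base_days;
    each level up the hierarchy is retained factor x longer, so storage
    degrades gracefully from detailed to coarse over time."""
    return base_days * factor ** level

schedule = {lvl: retention_days(lvl) for lvl in range(4)}
print(schedule)  # {0: 7, 1: 28, 2: 112, 3: 448}
```

Under such a policy, a node can always reclaim local storage by discarding its oldest fine-grained data, while long-lived coarse summaries preserve the ability to mine long-term trends.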
21. Load Balancing and Robustness Concepts
- Hierarchical model
  - Naturally fits wavelet processing
  - Strict hierarchies are vulnerable to node failures; failures near the root of the hierarchy can be expensive to repair
- Decentralized peer model
  - Summaries are communicated to multiple nodes probabilistically
  - Better robustness, but incurs greater communication overhead