Title: Scalable data handling in sensor networks
1Scalable data handling in sensor networks
- Deepak Ganesan
- Collaborators: Ben Greenstein, Denis Perelyubskiy, Deborah Estrin (UCLA); John Heidemann, Ramesh Govindan (USC/ISI)
2Outline
- Data challenges in high-bandwidth sensor networks
  - Instance: Wireless structural monitoring
- Transitioning from data acquisition systems to distributed storage and search
  - Generation I: Economical wireless data acquisition systems using motes (under preparation)
    - Performance analysis over structural vibration data
  - Generation II: Long-lived, distributed storage and search systems (Sensys03)
    - Performance analysis over geo-spatial data
- Other research directions
  - Optimal node placement and transmission structure under distortion bounds (IPSN04)
3Scaling high-bandwidth wireless sensor network
deployments
- We have made a good start at building scalable, long-term sensor network deployments that deal with low data rate applications.
  - Notable examples
    - Micro-climate monitoring system at James Reserve (CENS-UCLA), bird monitoring at Great Duck Island (Intel-U.C. Berkeley)
  - Characteristics
    - Low data rate (a few samples/minute), medium-scale (100s of nodes) deployments.
  - Scaling techniques
    - Duty-cycling, low-power listen/transmit, simple aggregation schemes (TinyDiffusion/TinyDB).
- We have very little understanding of how to scale high-bandwidth sensor network applications (involving vibration/acoustic/image sensors) where significant data rates can be expected.
  - How do we deal with applications that have predominantly relied on data collection?
4Challenges in Wireless Structural Monitoring
- High data rates
  - 100 Hz, 16-bit samples, 15-minute shaking events.
- Resource-constrained motes
  - 6 MHz processor, 4 KB RAM, 4 MB flash memory (40 minutes of vibration data)
- Diverse user requirements
  - Data collection of interesting event signatures of vibration events.
  - Analysis of data over different time-scales (long-term and short-term patterns)
- State of the art: Expensive wireless data acquisition systems using 802.11
5Transitioning from centralized to distributed
storage and search.
Method: Wireless/wired data acquisition systems
Advantage: Centralized, persistent storage and unconstrained search.
Disadvantage: Expensive, cumbersome, highly power-inefficient.
[Figure labels: Current Data Acquisition Systems; Multi-hop Wireless Data Acquisition using Motes; Distributed In-network storage and search]
6Transitioning from centralized to distributed
storage and search.
Method: Sensor node-based multi-hop data acquisition systems
Advantage: Cheap, easy to use, centralized storage, more scalable.
Disadvantage: Power-inefficient.
7Transitioning from centralized to distributed
storage and search.
Method: Distributed storage and search
Advantage: Power-efficient, flexible use.
Disadvantage: Non-persistent, restricted storage and search.
8Building Multi-hop Wireless Data Acquisition
Systems Using Motes
- Goals
- Near real-time monitoring
- Reliable, synchronized data transfer
- Challenge
- Limited network bandwidth, hence high latency.
- How can we build a low-latency data-acquisition
system?
Collecting 15 minutes of vibration event data (100 KB per node after Huffman coding) from a 20-node multi-hop wireless network at a central point takes 4-8 hours!
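As a rough check on the figures above, the sketch below computes the average end-to-end throughput implied by the quoted collection time. It assumes 1 KB = 1000 bytes and ignores protocol overhead; the function name is illustrative and not part of the original system.

```python
def effective_throughput_bps(num_nodes, bytes_per_node, hours):
    """Average end-to-end throughput implied by a total collection time."""
    total_bits = num_nodes * bytes_per_node * 8
    return total_bits / (hours * 3600)

# Figures quoted on the slide: 20 nodes, 100 KB each, 4-8 hours to collect.
best_case = effective_throughput_bps(20, 100_000, 4)
worst_case = effective_throughput_bps(20, 100_000, 8)
print(f"{worst_case:.0f} to {best_case:.0f} bits/s")
```

Under these assumptions the network delivers on the order of 0.5 to 1 kbit/s to the base station, which is why collecting every node's full data set is slow.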
9Progressive, On-demand Data Collection
- Progressive data acquisition
  - Each node stores its data in its local storage and transmits low-resolution summaries to the base station immediately after an event.
  - The user can analyze the low-resolution data to determine the nodes from which higher-resolution data is required.
  - Lossless data is collected from a subset of nodes on demand within a window of time (before being phased out of the nodes' local storage).
- What did we achieve?
  - Low-latency lossy data acquisition
  - Lossless data acquisition on demand
Low-resolution data for 15 minutes of vibration event data can be collected within 15-30 minutes of event occurrence.
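The progressive scheme above can be sketched as follows. This is a minimal illustration, not the mote implementation: block averaging stands in for the lossy wavelet summary, and the class name, method names, and `capacity` parameter are hypothetical.

```python
def summarize(samples, factor):
    """Low-resolution summary: block averages (stand-in for a wavelet summary)."""
    blocks = [samples[i:i + factor] for i in range(0, len(samples), factor)]
    return [sum(block) / len(block) for block in blocks]

class Node:
    def __init__(self, node_id, capacity):
        self.node_id = node_id
        self.capacity = capacity  # number of events that fit in local storage
        self.store = {}           # event_id -> raw samples, oldest first

    def record_event(self, event_id, samples, factor=4):
        """Store raw data locally and return the summary sent to the base station."""
        if len(self.store) >= self.capacity:
            oldest = next(iter(self.store))  # phase out the oldest event
            del self.store[oldest]
        self.store[event_id] = list(samples)
        return summarize(samples, factor)

    def fetch_full(self, event_id):
        """On-demand lossless retrieval; None if the event was phased out."""
        return self.store.get(event_id)

node = Node(node_id=1, capacity=2)
summary = node.record_event("e1", [1, 2, 3, 4, 5, 6, 7, 8])
full = node.fetch_full("e1")
```

The base station receives only `summary` right after the event, and requests `full` later from the nodes that look interesting.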
10Performance Evaluation
- Choice of compression scheme
  - Appropriateness for structural vibration data
  - Performance metrics: compression ratio, error (RMS, PSNR)
- Efficient implementation on resource-constrained devices (motes)
  - Power, memory and processing time
- Study performed on structural vibration data from shaker table tests
  - CUREE-Kajima Joint Research Program, UCLA (Thomas Kang, John Wallace)
11Why wavelets?
- Most of the energy is concentrated in the lower frequency subbands.
- Signal decomposition suggests that the vibration data is highly appropriate for wavelet compression.
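To illustrate what "energy concentrated in the lower frequency subbands" means, the sketch below decomposes a signal with an orthonormal Haar wavelet and reports the energy per subband. The Haar filter and the synthetic slow sine wave are assumptions chosen for simplicity; the slides do not say which wavelet or data the mote codec used.

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def subband_energies(x, levels):
    """Energies ordered [finest detail, ..., coarsest detail, approximation]."""
    energies = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies

# Synthetic low-frequency signal (assumption, not real vibration data).
signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]
energies = subband_energies(signal, levels=3)
low_band_fraction = energies[-1] / sum(energies)
print(f"fraction of energy in the lowest subband: {low_band_fraction:.3f}")
```

When most of the energy sits in the approximation band, the detail coefficients can be quantized coarsely or dropped with little reconstruction error, which is the property the slide points to.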
12Mote implementation of Wavelet Codec
13Compression Ratio and Error for Mote
Implementation
17-fold reduction in data size with an RMS error
of 3.1 (PSNR 30dB)
- Good compression ratios can be achieved with low
error
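The metrics quoted above can be computed as in the sketch below. The slides do not define the peak value used for PSNR; taking it as the dynamic range of the original signal is an assumption, and the function names are illustrative.

```python
import math

def compression_ratio(raw_bytes, compressed_bytes):
    return raw_bytes / compressed_bytes

def rms_error(original, reconstructed):
    squared = [(a - b) ** 2 for a, b in zip(original, reconstructed)]
    return math.sqrt(sum(squared) / len(squared))

def psnr_db(original, reconstructed):
    # Assumption: peak = dynamic range of the original signal.
    peak = max(original) - min(original)
    return 20 * math.log10(peak / rms_error(original, reconstructed))

original = [0, 10, 0, 10]
reconstructed = [1, 9, 1, 9]
ratio = compression_ratio(1700, 100)
error = rms_error(original, reconstructed)
quality = psnr_db(original, reconstructed)
```

With this definition, a PSNR of 30 dB corresponds to an RMS error of about 3% of the signal's dynamic range.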
14Transitioning towards long-term deployments
- We achieved low-latency wireless data acquisition, but our deployment lifetimes were still short.
  - Data acquisition systems with motes can last for a few weeks.
- How do our system objectives change for a long-term deployment?
  - Need a smooth transition for researchers who have depended on data collection systems
    - The system should retain the ability to collect new event signatures on demand.
  - Need to achieve very low energy usage for long lifetime
    - The system focus has to shift from data collection to in-network data storage and search.
- Goal: Build a networked storage and search system
15Can existing storage and search systems satisfy
design goals?
16Approach: Provide a gracefully degrading storage
- A distributed sensor network is a collection of nodes sensing spatio-temporally correlated data and possessing a comparatively larger distributed storage facility.
- A gracefully degrading storage model provides two benefits
  - Retains the ability to gather data on demand.
  - Offers a tradeoff between resolution and query accuracy: lower-resolution data offers lower query quality but incurs less storage overhead, and vice versa.
- Questions
  - How do we build a gracefully degrading networked store?
  - Can we efficiently query the distributed data store?
17Related Work
- Data storage in sensor networks
  - Event storage: DCS (Ratnasamy, Hotnets 2000)
  - Indexing schemes: DIMS (Li, Sensys 2003), DIFS (Greenstein, SPNA 2002)
- Multi-resolution computation
  - Beyond Average (Hellerstein, IPSN 2003)
  - Edge detection (Nowak, IPSN 2003)
- Wavelet-based compression
  - Structural-health monitoring (Lynch, 2003)
- Sensor network databases
  - Directed Diffusion (Heidemann, Estrin), TinyDB (Madden), Cougar (Bonnet)
18Key Design Ideas
- Construct a distributed, load-balanced quad-tree hierarchy of lossy wavelet-compressed summaries corresponding to different resolutions and spatio-temporal scales.
- Queries drill down from the root of the hierarchy to focus the search on small portions of the network.
- Progressively age summaries for long-term storage and graceful degradation of query quality over time.
[Figure: hierarchy with Level 0, Level 1, and Level 2; labels "progressively lossy" and "progressively age"]
19Constructing the hierarchy
Initially, nodes fill up their own storage with
raw sampled data.
20Constructing the hierarchy
- Tessellate the network space into grids, and hash in each grid cell to determine the location of the clusterhead (ref: DCS).
- Send the wavelet-compressed local time-series to the clusterhead.
21Processing at each level
Get compressed summaries from children.
Store incoming summaries locally for future search.
Decode.
Re-encode at lower resolution and forward to parent.
[Figure: wavelet encoder/decoder operating on data with axes x, y, and time]
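The per-level processing on this slide can be sketched as below. This is only an illustration: summaries are plain lists rather than wavelet-coded data, and pairwise averaging stands in for re-encoding at lower resolution. The `Clusterhead` class and its method are hypothetical names.

```python
class Clusterhead:
    def __init__(self, level):
        self.level = level
        self.local_store = []  # summaries kept for future drill-down queries

    def process(self, child_summaries):
        # Store incoming summaries locally for future search.
        self.local_store.extend(child_summaries)
        # "Decode": summaries are already plain lists here, so just merge them.
        merged = [value for summary in child_summaries for value in summary]
        # Re-encode at lower resolution: halve the sample count by averaging pairs.
        coarser = [(merged[i] + merged[i + 1]) / 2
                   for i in range(0, len(merged) - 1, 2)]
        return coarser  # this is what gets forwarded to the parent

head = Clusterhead(level=1)
forwarded = head.process([[1, 3], [5, 7]])
```

Each level keeps the finer data it received and passes up a smaller, coarser summary, so the amount of data shrinks toward the root.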
22Constructing the hierarchy
Recursively send data to higher levels of the
hierarchy.
23Distributing storage load
Hash to different locations over time to
distribute load among nodes in the network.
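A minimal sketch of hashing to a clusterhead location that changes over time is shown below. Using SHA-256 over the grid cell, level, and an epoch counter is an assumption made for illustration; the slides do not specify the hash function or how time is divided into epochs.

```python
import hashlib

def clusterhead_location(cell_x, cell_y, level, epoch, cell_size):
    """Map a grid cell and time epoch to an (x, y) point inside that cell."""
    key = f"{cell_x}:{cell_y}:{level}:{epoch}".encode()
    digest = hashlib.sha256(key).digest()
    # Two fractions in [0, 1) taken from the first 8 bytes of the digest.
    fx = int.from_bytes(digest[:4], "big") / 2**32
    fy = int.from_bytes(digest[4:8], "big") / 2**32
    return ((cell_x + fx) * cell_size, (cell_y + fy) * cell_size)

loc_epoch0 = clusterhead_location(2, 3, level=1, epoch=0, cell_size=10.0)
loc_epoch1 = clusterhead_location(2, 3, level=1, epoch=1, cell_size=10.0)
```

Every node can compute the same location without coordination, and the node closest to that point acts as clusterhead. Changing the epoch moves the point, so the storage load rotates among the nodes in the cell.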
24Drill-down query processing
The user hashes to the location of the root of the hierarchy. The drill-down query is then routed down the hierarchy until it reaches the base.
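Drill-down can be sketched as a walk from the root that follows the most promising child at each level. The example below answers a maximum query over a small hand-built tree; exact subtree maxima stand in for the lossy wavelet summaries, so the class and function names are illustrative only.

```python
class SummaryNode:
    def __init__(self, name, value=None, children=()):
        self.name = name
        self.children = list(children)
        # Coarse summary of the subtree (here: its maximum value).
        if self.children:
            self.summary = max(child.summary for child in self.children)
        else:
            self.summary = value

def drill_down_max(root):
    """Follow the child with the largest summary until a leaf is reached."""
    node, path = root, [root.name]
    while node.children:
        node = max(node.children, key=lambda child: child.summary)
        path.append(node.name)
    return node.summary, path

left = SummaryNode("m1", children=[SummaryNode("a", 3), SummaryNode("b", 9)])
right = SummaryNode("m2", children=[SummaryNode("c", 4), SummaryNode("d", 1)])
root = SummaryNode("root", children=[left, right])
answer, visited = drill_down_max(root)
```

Only one node per level is visited, which is why a drill-down query touches a small part of the network. With lossy summaries the walk can pick a wrong branch, so the answer is approximate.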
25Designing an aging policy for summaries
- Eventually, all available storage gets filled,
and we have to decide when and how to drop
summaries.
[Figure: local storage allocation, showing a node's local storage capacity divided among Res 1, Res 2, and Res 3]
How do we allocate storage at each node to
summaries at different resolutions to provide
gracefully degrading storage and search
capability?
26Match system performance to user requirements
[Figure: query accuracy (axis ticks at 50 and 95) versus time from past to present; the gap between the two curves is the quality difference]
- Objective: Minimize the worst-case difference between the user-desired query quality (blue curve) and the query quality that the system can provide (red step function).
27How do we determine the step function?
- Height: What is the dip in query accuracy when resolution i becomes unavailable?
  - What types of queries are being posed (T)?
  - For each query q, what is the expected query error when drill-down queries terminate at level i+1, Error(q, i)?
- Width: How long is resolution i stored within the network before being aged?
  - Storage allocated to resolution i at each node (S_i)
  - Total number of nodes in the network (N)
  - What rate is assigned to resolution i (R_i)?
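One way to relate the three quantities listed under "Width" is sketched below. It assumes that R_i is the network-wide rate at which resolution-i summaries are generated and that storage is used evenly across nodes; the slides do not give the exact formula, so treat this as an illustration with hypothetical numbers.

```python
def retention_time(storage_per_node, num_nodes, aggregate_rate):
    """Time resolution-i summaries survive before being aged out.

    storage_per_node: S_i, bytes allocated to resolution i at each node
    num_nodes:        N, nodes in the network
    aggregate_rate:   R_i, bytes of resolution-i summaries generated per day
                      across the whole network (assumed meaning of R_i)
    """
    return storage_per_node * num_nodes / aggregate_rate

# Hypothetical values: the coarser resolution generates less data per day.
fine_days = retention_time(storage_per_node=1000, num_nodes=100, aggregate_rate=500)
coarse_days = retention_time(storage_per_node=1000, num_nodes=100, aggregate_rate=50)
```

With equal storage, the coarser resolution is retained ten times longer here, which gives the step function its widening steps toward the past.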
28Storage Allocation Constraint-Optimization
problem
- Objective: Find s_i, i = 1..log_4 N, that minimize the worst-case difference between the desired and the provided query quality
- Given constraints
  - Storage constraint: Each node cannot store more than its storage limit.
  - Drill-down constraint: It is not useful to store finer-resolution data if coarser resolutions of the same data are not present.
29Determining Rate and Drilldown query error
How do we determine communication rates?
- Assume: Rates are fixed a priori by communication/network lifetime constraints.
How do we determine the drill-down query error
when prior information about deployment and data
is limited?
30Prior information about sampled data
[Figure: strategies ordered from full a priori information to no a priori information]
Omniscient strategy: infeasible. Use all data to decide the optimal allocation.
Solve constraint optimization.
Training strategy: can be used when a small training dataset from the initial deployment is available.
Greedy strategy: when no data is available, use a simple weighted allocation to summaries.
[Figure labels for the greedy strategy: 1, 2, 4; Finer, Finest, Coarse]
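The greedy strategy's "simple weighted allocation" could look like the sketch below, which splits a node's storage in proportion to fixed weights. Which weight goes with which resolution is an assumption here (the figure labels do not make the mapping clear), and the capacity value is hypothetical.

```python
def greedy_allocation(capacity, weights):
    """Split local storage among resolutions in proportion to fixed weights."""
    total = sum(weights.values())
    return {res: capacity * w / total for res, w in weights.items()}

# Assumed mapping of the 1:2:4 weights to resolutions.
allocation = greedy_allocation(7000, {"finest": 1, "finer": 2, "coarse": 4})
```

No training data is needed, but the outcome depends entirely on the chosen weights.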
31Distributed trace-driven implementation
- Linux implementation for iPAQ-class nodes
  - Uses EmStar (cite below), a Linux-based emulator/simulator for sensor networks.
  - 3D wavelet codec based on freeware by Geoff Davis, available at http://www.geoffdavis.net.
  - Query processing in Matlab.
- Geo-spatial precipitation dataset
  - 15x12 grid (50 km edge) of precipitation data from 1949-1994, from the Pacific Northwest. (Caveat: not real sensor data.)
- System parameters
  - Compression ratio: 6:12:24:48.
  - Training set: 6% of total dataset.
M. Widmann and C. Bretherton. 50 km resolution daily precipitation for the Pacific Northwest, 1949-94.
32Queries posed over precipitation data
- Use queries at different spatio-temporal scales to evaluate the performance of schemes
- Choosing a query set
  - GlobalYearlyEdge: look for a spatio-temporal feature (edge between high and low precipitation areas).
  - LocalYearlyMean: fine spatial and coarse temporal granularity
  - GlobalDailyMax: coarse spatial and fine temporal granularity
  - GlobalYearlyMax: coarse spatio-temporal granularity
33How efficient is search?
Search is very efficient (<5% of network queried) and accurate for the different queries studied.
34Comparing Aging Schemes
Training performs within 1% of optimal. Results with the greedy algorithm are sensitive to the weights.
35Summary
- Provide a smooth transition from current data acquisition systems to fully distributed storage and search systems.
  - Use progressive-transmission wireless data-acquisition systems as an intermediate step
- Support long-term storage and querying in resource-constrained sensor network deployments.
  - Summarization and in-network storage of data
  - Training-based optimization to determine system parameters.
36Power-Efficient Sensor Placement and Transmission
Structure for Data Gathering under Distortion
Constraints
- Collaborators: Razvan Cristescu, Baltasar Beferull-Lozano (EPFL, Switzerland)
- To appear at IPSN 2004
37Problem Motivation and Description
- Motivation
  - The vision of thousands of 10 nodes is unrealistic in the near (10 year) term due to economies of scale and cost of sensors.
  - Need to add the constraint of limited nodes to the optimization.
- A user has a bag of N nodes. He/she needs to place the nodes in a region A such that the sensed field can be reconstructed with
  - Maximum distortion for any point in A less than D_max
  - Average distortion over the entire region less than D_avg
- How does the user place the nodes and construct their communication structure for data gathering to a sink such that the total multi-hop communication power is minimized?
38Complexity of the problem
- Interplay of two difficult problems
  - Find feasible placements that satisfy distortion bounds.
  - Find the most energy-efficient transmission structures for each placement (NP-complete)
- Simple example: Given configurations I and II, which would you choose?
  - Node B is closer to the base station, hence transmits its data over less distance
  - Node B is close to A, and therefore better correlated. A can jointly compress their data, which will result in lower energy overhead.
- The optimal solution involves finding the most power-efficient transmission structure among all feasible placements and possible transmission structures.
[Figure: configurations I and II]
39Model and Assumptions
- Sensing model
  - Jointly Gaussian model for spatial data with exponentially decaying covariance function.
- Data aggregation model
  - Each node on the tree jointly compresses data from its entire sub-tree (e.g. Huffman/arithmetic coding)
- Sink data reconstruction model
  - Nearest-neighbor reconstruction is used to reconstruct the field given a set of sampled points.
- Communication model
  - Power-per-bit varies super-linearly with separation between transmitter and receiver
40Model and Assumptions
- Data correlation model
  - Jointly Gaussian model for spatial data
- Sink data reconstruction model
  - Nearest-neighbor reconstruction is used to reconstruct the field given a set of sampled points.
- Data aggregation model
  - Each node on the tree jointly compresses data from its entire sub-tree
- Communication model
  - Path-loss model
- Jointly Gaussian model for spatial data, X, measured at nodes, with an N-dimensional multivariate normal distribution G_N(μ, K), where K is the covariance matrix
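A covariance matrix for the jointly Gaussian model with exponentially decaying covariance can be built as in the sketch below. The specific form `variance * exp(-decay * distance)` and the parameter names are assumptions; the slides only state that the covariance decays exponentially.

```python
import math

def covariance_matrix(positions, variance=1.0, decay=0.1):
    """K[i][j] = variance * exp(-decay * distance between nodes i and j)."""
    n = len(positions)
    return [[variance * math.exp(-decay * math.dist(positions[i], positions[j]))
             for j in range(n)]
            for i in range(n)]

# Three hypothetical node positions on a line, 5 units apart.
nodes = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
K = covariance_matrix(nodes, variance=2.0, decay=0.1)
```

Nearby nodes get a covariance close to the variance, so their readings are highly correlated and compress well jointly. Distant nodes are nearly independent.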
43Model and Assumptions
- Data correlation model
  - Jointly Gaussian model for spatial data
- Sink data reconstruction model
  - Nearest-neighbor reconstruction is used to reconstruct the field given a set of sampled points.
- Data aggregation model
  - Each node on the tree jointly compresses data from its entire sub-tree
- Communication model
  - Path-loss model: power grows as the distance d raised to a path-loss exponent, which is 2 in free space and typically between 2 and 4
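The effect of the path-loss model on multi-hop transmission can be seen in the sketch below. It assumes power per bit is exactly `distance ** alpha` with no fixed per-hop cost; `alpha` is an illustrative name for the path-loss exponent.

```python
def transmit_power(distance, alpha):
    """Relative power per bit to send over one hop of the given distance."""
    return distance ** alpha

def relay_power(distance, hops, alpha):
    """Total relative power when the distance is split into equal hops."""
    return hops * transmit_power(distance / hops, alpha)

direct = transmit_power(10, alpha=2)
two_hops = relay_power(10, hops=2, alpha=2)
two_hops_lossy = relay_power(10, hops=2, alpha=4)
```

Because the cost is super-linear in distance, several short hops cost less than one long hop under this model, and the saving grows with the exponent. This is why node placement and the transmission structure have to be chosen together.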
44Optimization Problem 1-D Case
- Minimize total power
- Subject to
- Maximum distortion constraint
- Average distortion constraint
- Total area coverage constraint
- Solve using Lagrangian relaxation and numerical
constraint-optimization solving
45Extend results to 2-D instance
- Construct a wheel, with nodes on each radial spoke placed optimally using our 1-D placement solution.
- Additional constraints
  - Given N nodes, how do we decide the number of nodes per spoke and the number of spokes?
  - How do we ensure that Voronoi cells satisfy the average and max distortion bounds?
46Performance gains over uniformly random placement
and Shortest-path trees
- One-dimensional placement
  - 1-3 fold reduction in power consumption for 10-20 node linear placements
- Two-dimensional placement of 100-200 nodes
  - Typically one order of magnitude reduction in total power consumption.
  - Two orders of magnitude reduction in bottleneck energy consumption (i.e. for the node near the sink)!
- Other interesting observations
  - The network implodes, i.e. with such placement, the farthest nodes from the base station are the first to die and the nodes nearest to the sink are the last to die.
  - This is the behavior that we need, since nodes near the sink are most important for routing.
47The End