Median and Beyond: New Aggregation Techniques for Sensor Networks presentation


Title: Median and Beyond: New Aggregation Techniques for Sensor Networks

1
Median and Beyond: New Aggregation Techniques
for Sensor Networks
EECS 600 Advanced Network Research, Spring 2005
Hongbo Jiang, April 11, 2005
2
Context

Introduction
Background and Related Work
Q-digest
Queries on q-digest
Experiment
Discussion

3
Introduction: Wireless Sensor Networks

Limitation
Computation capability, communication bandwidth,
battery power......
Unreliability
Inherent unreliability of sensing function. That
is, individual sensor readings are inherently
unreliable.
To address these two aspects, let us look into
some methods for communication in query
processing.

4
Typical Models (I)

Each node collects data and this data needs to be
delivered to the users through the network
interconnection.
How to accomplish this? Let each sensor node
deliver its data periodically to host computer
(base station), where the data can be assembled
for subsequent analysis.
Drawback
Excessive communication.

5
Typical Models (II)

Exploit the multi-hop routing protocols in sensor
networks in such a way that messages from
multiple nodes are combined en-route from the
sensor nodes to base station.
Routing tree with the base station as the root.
Drawback: message sizes grow as information
passes through the routing tree from the leaf
nodes to the base station.

6
TinyDB and Cougar: In-Network Aggregation

Observations
The individual sensor values do not hold much
value.
Extracting all the data out of a sensor network
is very inefficient in terms of bandwidth and
power usage.
In-network aggregation
Compute aggregate values such as AVG, SUM, COUNT,
and MIN/MAX over the routing tree, minimizing both
the number of messages and the size of each
message.
Drawback: not suitable for some applications
(sophisticated analysis that computes the median,
quantiles, and consensus measures).

7
AVG vs. MEDIAN

To compute AVG, every node sends two integers to
its parent:
One representing the sum of all data values in
its subtree
One representing the total number of nodes in its
subtree
That is, AVG can be computed using constant
memory and constant-sized messages.
To compute MEDIAN, we need to keep track of all
distinct values, so the message size and the
memory required grow linearly with the size of
the network.
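
The constant-size AVG state described above can be sketched as follows. This is a minimal illustration, not code from the paper: the `Node` class, the `partial_avg` function, and the readings are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A sensor in the routing tree (hypothetical helper type)."""
    value: int
    children: list = field(default_factory=list)


def partial_avg(node):
    """Return the (sum, count) pair this node would send to its parent."""
    total, count = node.value, 1
    for child in node.children:
        child_sum, child_count = partial_avg(child)
        total += child_sum
        count += child_count
    return total, count


# Hypothetical four-node tree: root <- relay <- two leaves.
leaf_a, leaf_b = Node(10), Node(20)
relay = Node(30, [leaf_a, leaf_b])
root = Node(40, [relay])

total, count = partial_avg(root)
average = total / count
```

Every message carries exactly two integers regardless of subtree size, which is the property that MEDIAN lacks.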

8
This Paper Focuses On

Approximation schemes
100% accuracy is not necessary
The approximation scheme in this paper, based on
q-digest, can be adapted to meet any
user-specified tolerance at the expense of higher
memory and bandwidth consumption.
Q-digest
A novel data structure that provides guarantees on
approximation error and maximum resource
consumption.

9
Context

Introduction
Background and Related Work
Q-digest
Queries on q-digest
Experiment
Discussion

10
Assumption

Each sensor's reading is assumed to be an integer
value in the range [1, s], where s is the maximum
possible value of the signal.
Query from the BS: the sensors organize themselves
in a spanning tree. (Q: How can we get this
spanning tree?)
Link quality is perfect: no packet loss.

11
An aggregation such as MEDIAN is more difficult
than MIN, MAX, or AVERAGE

Under the natural assumption that each sensor
only forwards a fixed amount of data, it is easy
to argue that one cannot calculate the median (or
any other quantile) precisely.
Example: A = B ∪ C, with
B = {1, 2, 3} and C = {1000, 1001, 1002}
Then only approximation is possible.
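
One way to see the difficulty with the example above is to suppose each child forwards only its own median. The sketch below is an illustration under that assumption, not the slide's argument in full: `B_alt` is a hypothetical second data set, and `median_low` (the lower of the two middle values) is one of several possible median conventions.

```python
from statistics import median_low

B = [1, 2, 3]
C = [1000, 1001, 1002]
B_alt = [2]  # hypothetical alternative child with the same median as B

# Both children report the same fixed-size summary (their median) ...
same_summaries = median_low(B) == median_low(B_alt)

# ... yet the median of the union differs between the two cases.
union_median = median_low(B + C)
union_median_alt = median_low(B_alt + C)
```

The parent receives identical summaries in both cases, so it cannot tell which union median is correct.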

12
Context

Introduction
Background and Related Work
Q-digest
Queries on q-digest
Experiment
Discussion

13
Q-digest

Interesting properties
Error-Memory Trade-off
Confidence Factor
Multiple Queries

14
Example of q-digest

A q-digest consists of a set of buckets of
different sizes and their associated counts.
The depth of the tree T is log(s)
Each node v can be considered a bucket and has a
range (v.min, v.max)
The size of the q-digest is determined by a
compression parameter k. (Q: How do we determine
the compression parameter k?)

15
Q-digest property

Property
i) count(v) < n/k (for a non-leaf node)
ii) count(v) + count(vp) + count(vs) > n/k
(for a non-root node), where vp is the parent and
vs is the sibling of v
Comments
i) asserts that, unless it is a leaf node, no node
should have a high count. This property will be
used later to prove error bounds on the q-digest.
ii) says that we should not have a node and its
children with low counts. The intuition behind
this property is that if two adjacent buckets
that are siblings have low counts, then we do
not want to keep two separate counters for them;
we merge them into their parent.
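
The two properties can be checked mechanically. The sketch below makes several assumptions that are not on the slide: nodes are identified by level-order ids (root 1, children of p at 2p and 2p+1, leaves at s..2s-1), the digest is a dict from node id to count, and the threshold is taken as floor(n/k) with a non-strict bound in property i. The function name and the values s = 8, k = 5 are hypothetical.

```python
def satisfies_properties(digest, s, k):
    """Check both q-digest properties for a {node_id: count} dict.

    Assumes level-order ids over a value range [1, s], s a power of two.
    """
    n = sum(digest.values())
    threshold = n // k  # assumption: floor(n/k) as the cut-off
    for v, count in digest.items():
        is_leaf = v >= s
        # Property i: a non-leaf node must not have a high count.
        if not is_leaf and count > threshold:
            return False
        # Property ii: node + parent + sibling must not all be small.
        if v != 1:
            parent, sibling = v // 2, v ^ 1
            family = count + digest.get(parent, 0) + digest.get(sibling, 0)
            if family <= threshold:
                return False
    return True


# Hypothetical digest over [1, 8] with k = 5.
good = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
# Two sibling leaves with tiny counts should have been merged.
bad = {8: 1, 9: 1}
```

With these inputs, `good` passes both checks, while `bad` fails property ii when k = 1.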

16
Building a q-digest

An exact representation of the data will consist
of the frequencies f1, f2, ..., fs
To construct the q-digest we will hierarchically
merge and reduce the number of buckets
(bottom-up)
Comment
Detailed information about frequently occurring
data values (such as the nodes with counts 4 and
6) is preserved in the digest, while less
frequently occurring values (such as node d) are
lumped into larger buckets, resulting in
information loss.
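
The bottom-up merge can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's exact procedure: nodes use level-order ids (root 1, children of p at 2p and 2p+1, leaves at s..2s-1), the merge threshold is floor(n/k), and the input frequencies, s = 8, and k = 5 are hypothetical values chosen for illustration.

```python
def compress(counts, s, k):
    """Merge low-count sibling pairs into their parent, bottom-up.

    counts: {node_id: count}; s must be a power of two.
    """
    digest = {v: c for v, c in counts.items() if c > 0}
    threshold = sum(digest.values()) // k
    # Visit left children from the highest id down, so deeper levels
    # are handled before the levels above them.
    for left in range(2 * s - 2, 1, -2):
        right, parent = left + 1, left // 2
        family = (digest.get(left, 0) + digest.get(right, 0)
                  + digest.get(parent, 0))
        if family and family <= threshold:
            digest[parent] = family
            digest.pop(left, None)
            digest.pop(right, None)
    return digest


# Hypothetical exact frequencies f1..f8 for values 1..8.
frequencies = [1, 0, 4, 6, 1, 1, 1, 1]
s = 8
leaves = {s + i: f for i, f in enumerate(frequencies)}
digest = compress(leaves, s, k=5)
```

With these inputs the frequent values 3 and 4 keep their own leaf buckets (ids 10 and 11), while the rare values collapse into nodes 6, 7, and the root.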

17
Merging q-digests

Take the union of the two q-digests and add the
counts of buckets with the same range [min, max].
Then compress the result to build a new q-digest.
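
A merge along these lines might look like the sketch below. It assumes level-order node ids (root 1, children of p at 2p and 2p+1, leaves at s..2s-1) and a floor(n/k) threshold; the `compress` and `merge` functions and the two small digests are hypothetical illustrations.

```python
def compress(counts, s, k):
    """Bottom-up compression of a {node_id: count} dict."""
    digest = {v: c for v, c in counts.items() if c > 0}
    threshold = sum(digest.values()) // k
    for left in range(2 * s - 2, 1, -2):
        right, parent = left + 1, left // 2
        family = (digest.get(left, 0) + digest.get(right, 0)
                  + digest.get(parent, 0))
        if family and family <= threshold:
            digest[parent] = family
            digest.pop(left, None)
            digest.pop(right, None)
    return digest


def merge(q1, q2, s, k):
    """Union the buckets, add counts of equal ranges, then compress."""
    union = dict(q1)
    for v, c in q2.items():
        union[v] = union.get(v, 0) + c
    return compress(union, s, k)


# Two hypothetical digests over the value range [1, 4] (leaf ids 4..7).
q1 = {4: 3, 5: 1}
q2 = {4: 2, 7: 1}
merged = merge(q1, q2, s=4, k=2)
```

The total count is preserved by the merge; only the resolution of the low-count buckets changes.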

18
Representation of a q-digest

To represent a q-digest tree in a compact fashion,
we number the nodes from 1 to 2s-1 in
level-by-level order.
To transmit the q-digest, we send a set of tuples
of the form <nodeid(v), count(v)>, which
requires a total of (log(2s) + log n) bits for
each tuple. Q: Why?

19
Representation of a q-digest

Example
<1,1>, <6,2>, <7,2>, <10,4>, <11,6>
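
The tuples above can be decoded back into value ranges, which also suggests an answer to the bit-count question: a node id is at most 2s-1 and a count is at most n. The sketch below assumes s = 8 (consistent with node ids up to 11, but not stated in the transcript) and rounds each logarithm up to whole bits; `node_range` and `bits_per_tuple` are hypothetical helper names.

```python
import math


def node_range(node_id, s):
    """Map a level-order node id to its (min, max) value range."""
    depth = node_id.bit_length() - 1   # the root (id 1) is at depth 0
    width = s >> depth                 # values covered at this depth
    index = node_id - (1 << depth)     # position within the level
    low = index * width + 1
    return low, low + width - 1


def bits_per_tuple(s, n):
    """Bits for one <nodeid, count> tuple: log(2s) + log(n), rounded up."""
    return math.ceil(math.log2(2 * s)) + math.ceil(math.log2(n))


tuples = [(1, 1), (6, 2), (7, 2), (10, 4), (11, 6)]
s = 8  # assumed value range [1, 8]
n = sum(count for _, count in tuples)
ranges = {node_id: node_range(node_id, s) for node_id, _ in tuples}
```

Under this assumption, nodes 10 and 11 decode to the single values 3 and 4, and node 1 is the root covering the whole range.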

20
Context

Introduction
Background and Related Work
Q-digest
Queries on q-digest
Experiment
Discussion

21
Queries on q-digest

Quantile query: given a fraction q in (0, 1), find
the value whose rank in the sorted sequence of the
n values is qn.
Sort the nodes of the q-digest by increasing right
endpoint (max value), putting smaller ranges
first on ties.
This gives a post-order traversal of the nodes in
the q-digest; call the resulting list L.
Scan list L from the beginning and add up the
counts of the nodes as they are seen. When, for
some node v, this sum becomes more than qn, report
v.max as the estimate of the quantile.
Example: MEDIAN query on the q-digest Q shown in
Fig. 1.

22
Example

Q: MEDIAN query on the q-digest Q shown in Fig. 1:
<1,1>, <6,2>, <7,2>, <10,4>, <11,6>
A (post-order):
The sorted list is <10,4>, <11,6>, <6,2>, <7,2>,
<1,1>.
Here n = 15, so 0.5n = 7.5 (about 8). The running
count reaches 4 + 6 = 10 at node <11,6>, which is
more than 0.5n, so the answer is that node's max
value, 4.
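
The worked example can be reproduced with a short sketch. It assumes s = 8 and level-order node ids (root 1, children of p at 2p and 2p+1), neither of which is stated in the transcript, and breaks ties on max by putting smaller ranges first; `node_range`, `post_order`, and `quantile` are hypothetical names.

```python
def node_range(node_id, s):
    """Map a level-order node id to its (min, max) value range."""
    depth = node_id.bit_length() - 1
    width = s >> depth
    low = (node_id - (1 << depth)) * width + 1
    return low, low + width - 1


def post_order(digest, s):
    """Node ids sorted by max value, smaller ranges first on ties."""
    def key(v):
        low, high = node_range(v, s)
        return high, high - low
    return sorted(digest, key=key)


def quantile(digest, s, q):
    """Report v.max of the first node whose running count exceeds q*n."""
    n = sum(digest.values())
    running = 0
    for v in post_order(digest, s):
        running += digest[v]
        if running > q * n:
            return node_range(v, s)[1]
    raise ValueError("empty digest or q outside (0, 1)")


digest = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
order = post_order(digest, s=8)
median = quantile(digest, s=8, q=0.5)
```

The resulting order is 10, 11, 6, 7, 1, and the scan stops at node 11, matching the sorted list and the answer on the slide.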

23
Queries on q-digest

Other queries
Inverse quantile
Range Query
Consensus Query
The method is similar
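
One plausible reading of "similar" is that an inverse quantile sums the counts of all buckets lying entirely below a value x, and a range query takes the difference of two such ranks. The sketch below follows that reading as an assumption; it also assumes s = 8 and level-order node ids, and `rank` and `range_count` are hypothetical names.

```python
def node_range(node_id, s):
    """Map a level-order node id to its (min, max) value range."""
    depth = node_id.bit_length() - 1
    width = s >> depth
    low = (node_id - (1 << depth)) * width + 1
    return low, low + width - 1


def rank(digest, s, x):
    """Inverse quantile: total count of buckets with max below x."""
    return sum(c for v, c in digest.items() if node_range(v, s)[1] < x)


def range_count(digest, s, low, high):
    """Range query as the difference of two inverse quantiles."""
    return rank(digest, s, high) - rank(digest, s, low)


digest = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
rank_of_5 = rank(digest, 8, 5)
between_3_and_5 = range_count(digest, 8, 3, 5)
```

Both results are estimates: counts held in wide buckets such as the root cannot be attributed to a specific value.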

24
Context

Introduction
Background and Related Work
Q-digest
Queries on q-digest
Experiment
Discussion

25
Experimental Evaluation

Setup
C
Network topology: routing tree
All pairs of nodes within a fixed radio range are
considered neighbors.
1000 × 1000 area with 1000 sensors (then a
2000 × 2000 area with 2000 sensors)
Sensor data values: random and correlated

26
Range queries and Histogram
27
Accuracy and Message Size
28
Accuracy and Message Size

What is the list?
Q: Why do list-correlated and list-random look
different?

29
Accuracy and Message Size

Given a message size m, we ask: what fraction of
all nodes transmitted messages of size larger
than m?
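
The quantity being plotted can be computed as below. The message sizes are made-up numbers for illustration only, not results from the experiment, and `fraction_larger_than` is a hypothetical name.

```python
def fraction_larger_than(message_sizes, m):
    """Fraction of nodes whose transmitted message is larger than m."""
    larger = sum(1 for size in message_sizes if size > m)
    return larger / len(message_sizes)


# Hypothetical per-node message sizes in bytes.
sizes = [12, 40, 40, 96, 150]
fraction = fraction_larger_than(sizes, m=40)
```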

30
Total Data Transmission

Since the number of distinct values is smaller in
the correlated scenario, the amount of data
transferred is lower for correlated data.
