Title: Median and Beyond: New Aggregation Techniques for Sensor Networks
1 Median and Beyond: New Aggregation Techniques
for Sensor Networks
EECS 600 Advanced Network Research, Spring 2005
Hongbo Jiang April 11, 2005
2 Context
- Introduction
- Background and Related Work
- Q-digest
- Queries on q-digest
- Experiment
- Discussion
3 Introduction: wireless sensor networks
- Limitation
- Computation capability, communication bandwidth,
battery power, ...
- Unreliability
- Inherent unreliability of the sensing function. That
is, individual sensor readings are inherently
unreliable.
- To address these two aspects, let us look at some
methods for communication in query processing.
4 Typical Models (I)
- Each node collects data, and this data needs to be
delivered to the users through the network
interconnection.
- How to accomplish this? Let each sensor node
deliver its data periodically to a host computer
(base station), where the data can be assembled
for subsequent analysis.
- Drawback: excessive communication.
5 Typical Models (II)
- Exploit the multi-hop routing protocols in sensor
networks so that messages from multiple nodes are
combined en route from the sensor nodes to the
base station.
- Routing tree with the base station as the root.
- Drawback: message sizes grow as information passes
through the routing tree from the leaf nodes to the
base station.
6 TinyDB and Cougar: In-network Aggregation
- Observations
- The individual sensor values do not hold much
value.
- Extracting all the data out of a sensor network
is very inefficient in terms of bandwidth and power
usage.
- In-network aggregation
- Compute aggregate values such as AVG, SUM, COUNT,
and MIN/MAX over the routing tree, minimizing both
the number of messages and the size of each
message.
- Drawback: not suitable for applications that need
more sophisticated analysis, such as computing
medians, quantiles, and consensus measures.
7 AVG vs. MEDIAN
- To compute AVG, every node sends two integers to
its parent:
- The sum of all data values in its subtree (its own
reading plus the sums reported by its children)
- The total number of nodes in that subtree
- That is, AVG can be computed using constant memory
and constant-sized messages.
- To compute MEDIAN, we need to keep track of all
distinct values, so the message size and the memory
required to store them grow linearly with the size
of the network.
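The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `children` dictionary, the `reading` dictionary, and the 5-node tree are hypothetical, and the exact-median variant simply forwards every value rather than (value, frequency) pairs.

```python
def avg_message(node, children, reading):
    """Return the (sum, count) pair a node sends to its parent for AVG."""
    total, count = reading[node], 1
    for child in children.get(node, []):
        child_sum, child_count = avg_message(child, children, reading)
        total += child_sum
        count += child_count
    return total, count  # always two integers, whatever the subtree size


def median_message(node, children, reading):
    """Return every value in the subtree, as an exact MEDIAN would require."""
    values = [reading[node]]
    for child in children.get(node, []):
        values.extend(median_message(child, children, reading))
    return values  # grows linearly with the subtree size


# Hypothetical 5-node routing tree rooted at the base station (node 0).
children = {0: [1, 2], 1: [3, 4]}
reading = {0: 10, 1: 20, 2: 30, 3: 40, 4: 50}

total, count = avg_message(0, children, reading)
average = total / count
all_values = sorted(median_message(0, children, reading))
median = all_values[len(all_values) // 2]
```

Node 1 sends two integers for AVG but three values for the exact median, and the gap widens closer to the root.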
8 This paper focuses on
- Approximation schemes
- 100% accuracy is not necessary.
- The approximation scheme in this paper, based on
q-digest, can be adapted to meet any user-specified
error tolerance at the expense of higher memory and
bandwidth consumption.
- Q-digest
- A novel data structure that provides guarantees on
approximation error and maximum resource
consumption.
9 Context
- Introduction
- Background and Related Work
- Q-digest
- Queries on q-digest
- Experiment
- Discussion
10 Assumption
- Each sensor's reading is assumed to be an integer
value in the range [1, s], where s is the maximum
possible value of the signal.
- Query from the BS: the sensors organize themselves in
a spanning tree. (Q: How can we get this spanning
tree?)
- Link quality is perfect: no packet loss.
11 An aggregation such as MEDIAN is more difficult
than MIN, MAX, or AVERAGE
- Under the natural assumption that each sensor
only forwards a fixed amount of data, it is easy
to argue that one cannot calculate the median (or
any other quantile) precisely.
- Example: A = B ∪ C
- B = {1, 2, 3}, C = {1000, 1001, 1002}
- Then only approximation is possible.
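One way to see why a fixed-size summary is not enough is sketched below. It assumes, purely for illustration, that each subtree forwards only its own median; the set `B_alt` is a hypothetical alternative to B and does not appear in the slides.

```python
from statistics import median_low

B = [1, 2, 3]
C = [1000, 1001, 1002]
B_alt = [1, 2, 1000]  # hypothetical alternative to B with the same median

# Both versions of B look identical if only the median is forwarded.
same_summary = median_low(B) == median_low(B_alt)

# Yet the median of the union differs widely between the two cases.
median_union = median_low(B + C)
median_union_alt = median_low(B_alt + C)
```

The parent receives the same summary in both cases but would have to report 3 in one and 1000 in the other, so no rule based on that summary alone can be exact.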
12 Context
- Introduction
- Background and Related Work
- Q-digest
- Queries on q-digest
- Experiment
- Discussion
13 Q-digest
- Interesting properties
- Error-Memory Trade-off
- Confidence Factor
- Multiple Queries
14 Example of q-digest
- A q-digest consists of a set of buckets of
different sizes and their associated counts.
- The buckets are nodes of a binary tree T that
recursively halves the value range [1, s]; the depth
of T is log(s).
- Each node v can be considered a bucket and has a
range [v.min, v.max].
- The size of the q-digest is determined by a
compression parameter k. (Q: How to determine the
compression parameter k?)
15 Q-digest property
- Property
- i) count(v) < n/k (for a non-leaf node)
- ii) count(v) + count(vp) + count(vs) > n/k
(for a non-root node), where vp is the parent of v
and vs is its sibling
- Comments
- i) asserts that unless it is a leaf node, no node
should have a high count. This property will be
used later to prove error bounds on the q-digest.
- ii) says that we should not have a node and its
children with low counts. The intuition behind
this property is that if two adjacent buckets
that are siblings have low counts, then we do
not want to keep two separate counters for them.
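The two properties can be checked mechanically. The sketch below is an illustration under several assumptions: the digest is a dictionary `{node_id: count}`, nodes are numbered level by level so that the root is 1, the children of v are 2v and 2v + 1, and the leaves are the ids s to 2s - 1. The threshold is taken as floor(n/k) with equality allowed in property (i), a commonly used form that differs slightly from the strict inequality written above. The values s = 8 and k = 5 are assumed for the example.

```python
def satisfies_properties(digest, n, k, s):
    """Return True if a {node_id: count} digest meets properties (i) and (ii)."""
    threshold = n // k  # assumed form of the n/k bound
    for v, count in digest.items():
        is_leaf = v >= s
        if not is_leaf and count > threshold:
            return False  # property (i) violated: non-leaf count too high
        if v != 1:
            parent, sibling = v // 2, v ^ 1
            family = count + digest.get(parent, 0) + digest.get(sibling, 0)
            if family <= threshold:
                return False  # property (ii) violated: should have been merged
    return True


# Digest from the example tuples: n = 15 readings, assumed s = 8 and k = 5.
compressed = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
# Hypothetical uncompressed leaves that still hold several low counts.
raw_leaves = {8: 1, 10: 4, 11: 6, 12: 1, 13: 1, 14: 1, 15: 1}

ok_compressed = satisfies_properties(compressed, n=15, k=5, s=8)
ok_raw = satisfies_properties(raw_leaves, n=15, k=5, s=8)
```

The raw leaves fail property (ii) because several siblings with small counts could share one counter in their parent.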
16 Building a q-digest
- An exact representation of the data consists of the
frequencies f1, f2, ..., fs.
- To construct the q-digest, we hierarchically merge
buckets bottom-up to reduce their number.
- Comment
- Detailed information about data values that occur
frequently (such as the nodes with counts 4 and 6)
is preserved in the digest.
- Less frequently occurring values (such as node d)
are lumped into larger buckets, resulting in
information loss.
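The bottom-up construction can be sketched as follows. This is a minimal illustration, not necessarily the exact procedure from the paper: it assumes s is a power of two, level-by-level node ids (root 1, children 2v and 2v + 1, leaves s to 2s - 1), and a merge threshold of floor(n/k). The frequency vector and k = 5 are hypothetical, chosen so that the result reproduces the digest <1,1>, <6,2>, <7,2>, <10,4>, <11,6> used in the examples.

```python
def build_qdigest(freq, k):
    """Build a q-digest {node_id: count} from freq[i] = frequency of value i + 1."""
    s = len(freq)          # assumed to be a power of two
    n = sum(freq)
    threshold = n // k
    # Start from the exact representation: one leaf bucket per observed value.
    digest = {s + i: f for i, f in enumerate(freq) if f > 0}
    # Visit sibling pairs bottom-up; higher ids lie deeper in the tree.
    for right in range(2 * s - 1, 1, -2):
        left, parent = right - 1, right // 2
        if left not in digest and right not in digest:
            continue
        total = (digest.get(left, 0) + digest.get(right, 0)
                 + digest.get(parent, 0))
        if total <= threshold:
            # Low-count siblings are folded into their parent bucket.
            digest[parent] = total
            digest.pop(left, None)
            digest.pop(right, None)
    return digest


# Hypothetical frequencies of the values 1..8 (n = 15).
freq = [1, 0, 4, 6, 1, 1, 1, 1]
digest = build_qdigest(freq, k=5)
```

The frequent values 3 and 4 keep their own leaf buckets (ids 10 and 11), while the single reading of value 1 ends up in the root bucket and can no longer be told apart from any other value in [1, 8].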
17 Merging q-digests
- Take the union of the two q-digests and add the counts
of buckets with the same range [min, max].
- Then compress the result to build a new q-digest.
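A merge along these lines might look like the sketch below. It assumes digests stored as `{node_id: count}` with level-by-level node ids (so equal ids mean equal ranges), a single bottom-up compression pass with threshold floor(n/k), and s a power of two. The second digest and the values s = 8, k = 5 are hypothetical.

```python
def compress(digest, n, k, s):
    """One bottom-up pass that folds low-count sibling pairs into their parent."""
    threshold = n // k
    out = dict(digest)
    for right in range(2 * s - 1, 1, -2):
        left, parent = right - 1, right // 2
        if left not in out and right not in out:
            continue
        total = out.get(left, 0) + out.get(right, 0) + out.get(parent, 0)
        if total <= threshold:
            out[parent] = total
            out.pop(left, None)
            out.pop(right, None)
    return out


def merge(d1, n1, d2, n2, k, s):
    """Union the buckets, add counts of equal ranges, then compress."""
    union = dict(d1)
    for node, count in d2.items():
        union[node] = union.get(node, 0) + count
    n = n1 + n2
    return compress(union, n, k, s), n


first = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}   # n1 = 15
second = {10: 3, 11: 2}                    # hypothetical child digest, n2 = 5
merged, n = merge(first, 15, second, 5, k=5, s=8)
```

Because n grows from 15 to 20, the threshold rises from 3 to 4, and buckets 6 and 7 are now small enough to be folded into their parent, bucket 3.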
18 Representation of a q-digest
- To represent a q-digest tree compactly, we number
the nodes from 1 to 2s-1 in level-by-level order.
- To transmit the q-digest, we send a set of tuples of
the form <nodeid(v), count(v)>, which requires a
total of (log(2s) + log n) bits for each tuple.
(Q: Why?)
19 Representation of a q-digest
- Example
- <1,1>, <6,2>, <7,2>, <10,4>, <11,6>
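The numbering scheme makes each node id sufficient to recover its bucket range, and it also suggests where the per-tuple bit cost comes from: ids go up to 2s - 1 and counts go up to n. The sketch below assumes s = 8 for the example tuples (an inference, since the slides do not state s) and rounds each logarithm up to whole bits.

```python
def node_range(node_id, s):
    """Return (min, max) of the bucket with this level-order id; s is a power of two."""
    depth = node_id.bit_length() - 1      # root has depth 0
    width = s >> depth                    # number of values per bucket at this depth
    index = node_id - (1 << depth)        # position within the level, from 0
    return index * width + 1, (index + 1) * width


def bits_per_tuple(s, n):
    """Bits for one <nodeid, count> tuple: ids up to 2s - 1, counts up to n."""
    id_bits = (2 * s - 1).bit_length()
    count_bits = n.bit_length()
    return id_bits + count_bits


tuples = [(1, 1), (6, 2), (7, 2), (10, 4), (11, 6)]
s = 8                                     # assumed maximum signal value
n = sum(count for _, count in tuples)
ranges = [node_range(node_id, s) for node_id, _ in tuples]
tuple_bits = bits_per_tuple(s, n)
```

Under these assumptions, ids 10 and 11 decode to the single values 3 and 4, ids 6 and 7 to the ranges [5, 6] and [7, 8], and id 1 to the whole range [1, 8].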
20 Context
- Introduction
- Background and Related Work
- Q-digest
- Queries on q-digest
- Experiment
- Discussion
21 Queries on q-digest
- Quantile query: given a fraction q in (0, 1), find the
value whose rank in the sorted sequence of the n
values is qn.
- Sort the nodes of the q-digest in increasing order of
right endpoints (max values).
- The resulting list L is a post-order traversal of the
nodes in the q-digest.
- Scan list L from the beginning and add up the counts
of nodes as they are seen. When, for some node v,
this sum becomes more than qn, report v.max as the
estimate of the quantile.
- Example: MEDIAN query on q-digest Q shown in Fig
1.
22 Example
- Q: MEDIAN query on q-digest Q shown in Fig 1.
- <1,1>, <6,2>, <7,2>, <10,4>, <11,6>
- A: (post-order)
- The sorted list is <10,4>, <11,6>, <6,2>, <7,2>, <1,1>
- The running count at node <11,6> is 4 + 6 = 10, which
is more than 0.5n = 7.5 (n = 15). Node 11 covers the
single value 4, so the answer is 4.
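The scan can be written out as a short Python sketch. It assumes level-by-level node ids with s = 8 (inferred from the example, not stated in the slides) and breaks ties between equal right endpoints by putting the smaller range first, which is what yields a post-order list.

```python
def node_range(node_id, s):
    """Return (min, max) of the bucket with this level-order id."""
    depth = node_id.bit_length() - 1
    width = s >> depth
    index = node_id - (1 << depth)
    return index * width + 1, (index + 1) * width


def sorted_nodes(digest, s):
    """Nodes by increasing max; ties put the smaller range first (assumed)."""
    def key(node_id):
        lo, hi = node_range(node_id, s)
        return hi, hi - lo
    return sorted(digest, key=key)


def quantile(digest, q, s):
    """Report v.max of the first node at which the running count exceeds q * n."""
    n = sum(digest.values())
    running = 0
    for node_id in sorted_nodes(digest, s):
        running += digest[node_id]
        if running > q * n:
            return node_range(node_id, s)[1]
    return s  # reached only if the running count never exceeds q * n


digest = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
order = sorted_nodes(digest, s=8)
median = quantile(digest, 0.5, s=8)
```

Here `order` matches the sorted list in the example and `median` is 4.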
23 Queries on q-digest
- Other queries
- Inverse quantile
- Range Query
- Consensus Query
- The method is similar
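As one plausible reading of "similar", an inverse quantile query can sum the counts of all buckets that lie entirely below a value x, and a range query can take the difference of two such ranks. The sketch below follows that reading; the function names, the half-open range [low, high), and s = 8 are assumptions made for illustration, and the consensus query is not covered.

```python
def node_max(node_id, s):
    """Right endpoint of the bucket with this level-order id."""
    depth = node_id.bit_length() - 1
    width = s >> depth
    return (node_id - (1 << depth) + 1) * width


def inverse_quantile(digest, x, s):
    """Estimated number of readings below x: buckets with max < x."""
    return sum(count for node_id, count in digest.items()
               if node_max(node_id, s) < x)


def range_count(digest, low, high, s):
    """Estimated number of readings in [low, high), via two rank queries."""
    return inverse_quantile(digest, high, s) - inverse_quantile(digest, low, s)


digest = {1: 1, 6: 2, 7: 2, 10: 4, 11: 6}
rank_of_5 = inverse_quantile(digest, 5, s=8)
in_3_to_5 = range_count(digest, 3, 5, s=8)
```

Readings stored in wide buckets such as the root are never counted as lying below x, which is one source of the approximation error.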
24 Context
- Introduction
- Background and Related Work
- Q-digest
- Queries on q-digest
- Experiment
- Discussion
25 Experimental Evaluation
- Setup
- C
- Network topology: routing tree
- All pairs of nodes within a fixed radio range are
considered neighbors.
- 1000 x 1000 area and 1000 sensors (then 2000 x 2000
area and 2000 sensors)
- Sensor data values: random and correlated
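A setup of this kind could be reproduced roughly as sketched below. The slides do not say how sensors are placed, what the radio range is, or how the routing tree is formed, so uniform random placement, a radio range of 100, node 0 as the base station, and a breadth-first tree are all assumptions of this sketch.

```python
import random
from collections import deque


def build_routing_tree(num_nodes, side, radio_range, seed=0):
    """Place sensors at random and build a BFS routing tree rooted at node 0."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side))
           for _ in range(num_nodes)]
    limit = radio_range ** 2
    neighbors = [[] for _ in range(num_nodes)]
    for i in range(num_nodes):
        xi, yi = pos[i]
        for j in range(i + 1, num_nodes):
            xj, yj = pos[j]
            # Nodes within the fixed radio range are neighbors.
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= limit:
                neighbors[i].append(j)
                neighbors[j].append(i)
    parent = {0: None}  # node 0 plays the base station (assumed)
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return pos, parent


pos, parent = build_routing_tree(num_nodes=1000, side=1000, radio_range=100)
reachable = len(parent)
```

Nodes that cannot reach the base station are absent from `parent`, so `reachable` indicates whether the chosen radio range keeps the network connected.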
26 Range queries and Histogram
27 Accuracy and Message Size
28 Accuracy and Message Size
- What is the list?
- Q: Why do list-correlated and list-random look
different?
29 Accuracy and Message Size
- Given a message size m, we ask the question: what
fraction of the total nodes transmitted messages of
size larger than m?
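This question amounts to a complementary cumulative distribution over per-node message sizes. A minimal sketch, with a hypothetical list of message sizes that is not taken from the experiments:

```python
def fraction_larger_than(sizes, m):
    """Fraction of nodes whose transmitted message size exceeds m."""
    if not sizes:
        return 0.0
    return sum(1 for size in sizes if size > m) / len(sizes)


# Hypothetical message sizes, one per node (units are arbitrary).
sizes = [4, 4, 8, 12, 20, 20, 36, 40]
curve = [(m, fraction_larger_than(sizes, m)) for m in (0, 10, 20, 40)]
```

Plotting `curve` for a range of m values gives the kind of plot the slide describes.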
30 Total Data Transmission
- Since the number of distinct values is smaller in the
correlated scenario, the amount of data
transferred is lower for correlated data.
31 Thanks