MultiTerminal Information Theory Problems in Sensor Networks

1
Multi-Terminal Information Theory Problems in
Sensor Networks

Gregory J Pottie
Professor, Electrical Engineering Department
Associate Dean, Research and Physical Resources
Deputy Director, NSF Center for Embedded
Networked Sensing (CENS)
UCLA Henry Samueli School of Engineering and
Applied Science
pottie_at_icsl.ucla.edu

2
Outline

Context and general issues
Basic tools of information theory
Multi-terminal information theory
Research domains
Data fusion
Cooperative communication
Sensor network scalability
Network synchronization
Distributed large-scale systems

3
Sensor Network Operation
Cooperative communication
Data fusion
Routing
Basic goal: detection/identification of point or
distributed sources subject to distortion
constraints, and timely notification of end user
4
Basic Information Theoretic Concepts

Typical Sets (of sufficiently long sequences of
i.i.d. variables)
The typical set has probability nearly 1
The elements are nearly equally probable
The number of elements is nearly $2^{nH}$
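
The three typical-set properties can be checked numerically on a toy source. The sketch below assumes an i.i.d. Bernoulli(p) source and an illustrative tolerance eps; the function name and the parameter values are hypothetical, chosen only for this example.

```python
import math

def typical_set_stats(n, p, eps):
    """Count eps-typical sequences of n i.i.d. Bernoulli(p) bits.

    Returns (entropy H in bits, number of typical sequences,
    total probability of the typical set).
    """
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    count = 0
    prob = 0.0
    for k in range(n + 1):  # k = number of ones in the sequence
        # Every sequence with k ones has the same probability.
        log_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        # Typical if the per-symbol log-probability is within eps of H.
        if abs(-log_p_seq / n - h) <= eps:
            ways = math.comb(n, k)
            count += ways
            prob += ways * 2.0 ** log_p_seq
    return h, count, prob

N, P_ONE, EPS = 200, 0.2, 0.1  # illustrative values only
h, count, prob = typical_set_stats(N, P_ONE, EPS)
print(f"H = {h:.3f} bits per symbol")
print(f"log2(size of typical set) / n = {math.log2(count) / N:.3f}")
print(f"probability of typical set = {prob:.3f}")
```

The set holds far fewer than the 2^n possible sequences, yet it carries most of the probability; both effects sharpen as n grows.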

[Figure: source W -> channel encoder -> $X^n$ -> channel $p(y|x)$ -> $Y^n$ -> decoder]

Aim of communications system:
Minimize errors due to noise in channel
Maximize data rate
Minimize bandwidth and power (the resources)
Shannon Capacity establishes the fundamental
limits

5
Jointly Typical Sequences
[Figure: input set $X^n$ and output set $Y^n$, with two inputs $X_1^n$ and $X_2^n$ and their output images]
Output set is in general larger due to additive
noise. Output images of inputs may overlap due to
noise.
6
Basic Information Theoretic Concepts
[Figure: source W -> channel encoder -> $X^n$ -> channel $p(y|x)$ -> $Y^n$ -> decoder]

Capacity C is the maximum of the mutual information $I(X;Y)$
with respect to $p(x)$; that is, choose the input distribution leading to
the largest mutual information.
Capacity C is the largest rate at which
information can be transmitted with arbitrarily small error probability
Jointly typical set: from among the typical input
and output sequences, choose the pairs for which
$-\frac{1}{n} \log p(x^n, y^n)$ is close to $H(X,Y)$
The jointly typical set contains about $2^{nH(X,Y)}$ of the
$2^{nH(X)} \cdot 2^{nH(Y)}$ typical pairs, so a randomly chosen pair is
jointly typical with probability about $2^{-nI(X;Y)}$;
thus there are about $2^{nI(X;Y)}$
distinguishable signals (codewords) in $X^n$
These codewords necessarily contain
redundancy: the size of the set is smaller than the
alphabet would imply. Sequences provide better
performance than isolated symbols if properly
chosen.

7
Gaussian Channel Capacity

Discrete-time inputs to channel, and channel adds
noise with Gaussian distribution (zero mean,
variance N)
Input sequence (codeword) power constrained to P
Capacity is the maximum of $I(X;Y)$ over $p(x)$ such that
$E[X^2] \le P$ (the power constraint)
$C = \frac{1}{2} \log_2(1 + P/N)$ bits per transmission.
The more usual form is to consider a channel of
bandwidth W and noise power spectral density $N_0$.
Then $C = W \log_2(1 + P/(N_0 W))$ bits per second.
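
Both capacity formulas are easy to evaluate. The sketch below is a minimal illustration; the function names and the numbers (an SNR of 15, a 1 MHz channel) are assumptions for the example, not values from the slides.

```python
import math

def awgn_capacity_per_use(p, n):
    """Capacity in bits per transmission: (1/2) log2(1 + P/N)."""
    return 0.5 * math.log2(1 + p / n)

def awgn_capacity_bandlimited(p, n0, w):
    """Capacity in bits per second: W log2(1 + P / (N0 W))."""
    return w * math.log2(1 + p / (n0 * w))

# Illustrative numbers: signal-to-noise ratio of 15 in both cases.
c_use = awgn_capacity_per_use(p=15.0, n=1.0)
c_bps = awgn_capacity_bandlimited(p=15e-3, n0=1e-9, w=1e6)
print(f"{c_use:.3f} bits per transmission")
print(f"{c_bps / 1e6:.3f} Mbit/s")
```

Because capacity grows only logarithmically with P, doubling the power at high SNR adds roughly half a bit per transmission rather than doubling the rate.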

8
Rate Distortion Lossy Source Coding

Rate distortion function R(D) can be interpreted
as
The minimum rate at which a source can be
represented subject to a distortion $D = d(X,Y)$
The minimum distortion that can be achieved given
a maximum rate constraint R
Interesting dual results to Capacity
Spend coding effort on the distortion-typical set;
the rest are don't cares
Applies to compression of real-valued sequences

[Figure: plot of rate R against distortion D, with the achievable region marked]
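
As a concrete instance of the two readings of R(D), the sketch below uses the standard closed form for a Gaussian source with squared-error distortion, R(D) = (1/2) log2(sigma^2 / D) for D below sigma^2 and zero otherwise. The choice of source and distortion measure is an assumption made for illustration; the slide does not fix either.

```python
import math

def gaussian_rate(variance, distortion):
    """Minimum rate (bits/sample) for mean squared error <= distortion."""
    if distortion >= variance:
        return 0.0  # sending nothing already meets the target
    return 0.5 * math.log2(variance / distortion)

def gaussian_distortion(variance, rate):
    """Minimum mean squared error achievable at the given rate."""
    return variance * 2.0 ** (-2.0 * rate)

sigma2 = 1.0
for rate in (0.0, 1.0, 2.0, 3.0):
    d = gaussian_distortion(sigma2, rate)
    print(f"R = {rate:.0f} bit/sample -> D = {d:.4f}")
```

Each extra bit per sample cuts the distortion by a factor of four (about 6 dB), which is the convex curve bounding the achievable region.
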
9
Universal Source Coding

Divide sequences into distortion-typical
(interesting) and distortion-atypical
(uninteresting) sets
Index for the distortion-typical set is of small
length and consumes our coding effort; the atypical set
is large, but its coding scheme is not critical
Require systematic means of classifying sequences
as typical (promotion mechanism and distance
measure)
Gold washing algorithm: typical set, plus
candidates

[Figure: the distortion-typical set and the atypical set]
10
Source/Channel Coding Separation

For single link, separately performing source and
channel coding achieves optimal rates
Separate optimization greatly reduces theoretical
complexity
Classes of codes have been identified that get
very close to respective Shannon limits
Joint source/channel coding can reduce latency or
overall complexity, but infrequently used since
application-specific

11
Multi-Terminal Information Theory

The preceding discussion assumed a single
transmitter and receiver
Multi-terminal information theory considers
maximization of mutual information for the
following possibilities:
Multiple senders and one receiver (the multiple
access channel)
One sender and multiple receivers (the broadcast
channel)
One sender and one receiver, but intervening
transducers that can assist (the relay channel)
Composite combinations of these basic types
Bayes estimation also aims to maximize mutual
information, except the senders do not cooperate
and usually there is a fidelity constraint:
One sender and multiple receivers (the data
fusion problem)
Multiple senders and receivers (the source
separation problem)
Delay and resource usage may also be included

12
Gaussian Multiple Access Channel

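m transmitters, each with power P, sharing the same
noisy channel
$C(P/N) = \frac{1}{2} \log_2(1 + P/N)$ bits per channel use for an
isolated sender;
then the achievable rate region is
$R_i < C(P/N)$
$R_i + R_j < C(2P/N)$
$R_i + R_j + R_k < C(3P/N)$
...
$\sum_{i=1}^{m} R_i < C(mP/N)$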

The last inequality dominates when rates are the
same
Capacity increases with more users (there is more
power)
Result is dual to Slepian-Wolf encoding of
correlated sources
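
The region can be tested directly. The sketch below follows the Gaussian multiple access result in Cover and Thomas for m senders of equal power P: every subset of k senders must satisfy sum of rates < C(kP/N), with C(x) = (1/2) log2(1 + x). The function names and numbers are illustrative assumptions.

```python
import math
from itertools import combinations

def c(snr):
    """Single-user Gaussian capacity, bits per channel use."""
    return 0.5 * math.log2(1 + snr)

def in_mac_region(rates, p, n):
    """True if every subset of k rates sums to less than C(k P / N)."""
    for k in range(1, len(rates) + 1):
        limit = c(k * p / n)
        for subset in combinations(rates, k):
            if sum(subset) >= limit:
                return False
    return True

def max_equal_rate(m, p, n):
    """Largest common rate: the sum constraint is the binding one."""
    return c(m * p / n) / m

P, N = 1.0, 1.0  # illustrative: P/N = 1
print(in_mac_region([0.30, 0.30, 0.30], P, N))  # all constraints met
print(in_mac_region([0.35, 0.35, 0.35], P, N))  # only the sum constraint fails
print([round(c(m * P / N), 3) for m in (1, 2, 4, 8)])  # sum capacity grows with m
```

With equal rates the single-user and pairwise constraints are slack and the last inequality decides, while the sum capacity keeps growing (logarithmically) as senders are added.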

13
Gaussian Broadcast Channel

One sender of power P and two receivers, one with
noise $N_1$ and one with noise $N_2$, $N_1 < N_2$

The two codebooks are coordinated to exploit
commonality of information transmitted, otherwise
capacity does not exceed simple multiplexing
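
To see what coordination buys over simple multiplexing, the sketch below compares superposition coding with time division. It uses the standard Gaussian broadcast region from Cover and Thomas, R1 < C(aP/N1) and R2 < C((1-a)P/(aP+N2)), where a is the fraction of power aimed at the low-noise receiver. The function names and numbers are assumptions for illustration.

```python
import math

def c(snr):
    """Single-user Gaussian capacity, bits per channel use."""
    return 0.5 * math.log2(1 + snr)

def superposition_rates(alpha, p, n1, n2):
    """Rates (R1, R2) when a fraction alpha of power goes to receiver 1."""
    r1 = c(alpha * p / n1)
    # Receiver 2 treats receiver 1's signal as additional noise.
    r2 = c((1 - alpha) * p / (alpha * p + n2))
    return r1, r2

def time_sharing_r1(r2_target, p, n1, n2):
    """Rate left for receiver 1 if receiver 2 gets r2_target by time division."""
    share2 = r2_target / c(p / n2)  # fraction of time given to receiver 2
    return (1 - share2) * c(p / n1)

P, N1, N2 = 10.0, 1.0, 10.0  # illustrative values with N1 < N2
r1, r2 = superposition_rates(0.2, P, N1, N2)
r1_tdm = time_sharing_r1(r2, P, N1, N2)
print(f"superposition: R1 = {r1:.3f}, R2 = {r2:.3f}")
print(f"time division at the same R2: R1 = {r1_tdm:.3f}")
```

For the same rate to the noisier receiver, superposition leaves more rate for the better receiver than time division does.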

14
Relay Channel

One sender, one relay, and one receiver; the relay
transmits $X_1$ based only on its observations $Y_1$

[Figure: relay channel with sender X, relay (observes $Y_1$, transmits $X_1$), and receiver Y]

Combines a broadcast channel and a multiple
access channel
Networks comprise multiple relay channels
that may further induce delay

15
General Multi-Terminal Networks

m nodes; node j has an associated transmission
variable X(j) and receive variable Y(j)
Node 1 transmits to node m: what is the maximum
achievable rate?

[Figure: network of nodes labeled $(X_1, Y_1)$ through $(X_m, Y_m)$]

Bounds are derived from information flow across
multiple cut sets;
generally not achievable
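
For reference, the cut-set bound in the form given by Cover and Thomas can be written as below. Here $R^{(ij)}$ is the rate from node i to node j, $S$ is any subset of the nodes with complement $S^c$, and $X^{(S)}$ collects the transmission variables X(j) of the nodes in $S$. If the rates are achievable, then some joint input distribution $p(x^{(1)}, \ldots, x^{(m)})$ satisfies the inequality for every cut.

```latex
\sum_{i \in S,\; j \in S^c} R^{(ij)}
  \;\le\; I\bigl(X^{(S)};\, Y^{(S^c)} \,\big|\, X^{(S^c)}\bigr)
  \qquad \text{for every } S \subset \{1, 2, \ldots, m\}
```

For the single flow from node 1 to node m, only cuts with node 1 in $S$ and node m in $S^c$ constrain the rate, and the tightest such cut gives the bound.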

16
Costs of Source-Channel Separation

Source-channel coding separation theorem fails
because capacity of multiple access channels
increases with correlation, while source encoding
eliminates correlation
Greatly complicates search for optimal codes;
raises question of whether joint coding would be
worth it
Gastpar has considered asymptotic cost of
separate rate-distortion and channel coding
Compare:
Network rate-distortion coding, followed by
cooperative transmission
Joint rate-distortion and channel coding
Potentially exponentially better performance for
joint source and channel coding, in the limit as the
number of nodes n observing a Gaussian source
with comparable SNR goes to infinity.
Bound, not a prescription for how to do this!

17
Now let it move

Nodes move within bounded region according to
some random distribution: what is capacity
subject to energy constraint on messages?

[Figure: positions of node 1 and node m at time 1 and time 2]

Answer depends on delay constraint: eventually
they will collide, implying near-zero path loss
and thus unbounded capacity
Other questions:
Probability the nodes have connecting path of
required rate
Probability of message arriving in required delay

18
Some Recent Research for Sensor Networks

Data fusion in sensor networks
N-helper problem
Cooperative communications in sensor networks
Scalability of sensor networks
Sensing for distributed sources
Network synchronization and rate distortion
Systems design

19
General Assumptions

Objective of network is to solve some (multi-)
hypothesis problem, subject to a set of fidelity
criteria, and convey the result to some end-user,
subject to resource constraints
Consequence: fidelity criteria and resource
constraints allow meaningful optimization
questions to be posed
Communications is more costly than signal
processing
Consequence: long distance communication is to be
avoided, if possible
Justification: Shannon capacity and Maxwell's
equations are fundamental; SP power cost follows
Moore's Law
Signals decay with distance of propagation
Consequence: local distributed algorithms become
feasible
Justification: true for all natural propagation
media

20
Rate Distortion and Data Fusion

Can identify resource use (energy/number of bits
transmitted) with rate, decision reliability
(false alarm rate, missed detection prob) with
distortion
Operate at different points on rate distortion
curve depending on values of cost function
Location of fusion center, numerical resolution,
number of sensors, length of records, routing,
distribution of processing all affect R(D)

21
A Simple Algorithm

Nodes activated to send requests for information
from other nodes based on SNR
If above threshold T, decision is reliable, and
suppress activity by neighbors
Otherwise, increase likelihood of requesting help
based on proximity to T
In likelihood, higher SNR nodes form the cluster
Bits of resolution related to SNR (e.g., for use
in maximal ratio combining)

[Figure: nodes labeled 1 to 3. 1: high SNR, initiates. 2: activated, and
requests further information. 3: SNR too low to
respond]
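
The activation rule can be sketched per node as follows. This is only one possible reading: the linear ramp for the request probability and the lower cutoff floor_db (below which a node stays silent) are assumptions introduced for illustration, as are all names and numbers.

```python
def node_decision(snr_db, threshold_db, floor_db):
    """Return (role, request_probability) for one node.

    threshold_db plays the role of T on the slide; floor_db is a
    hypothetical cutoff below which the SNR is too low to respond.
    """
    if snr_db >= threshold_db:
        # Decision is reliable on its own: decide and quiet the neighbors.
        return "decide_and_suppress", 0.0
    if snr_db <= floor_db:
        return "silent", 0.0
    # Assumed ramp: the closer the SNR is to T, the likelier a help request.
    p_request = (snr_db - floor_db) / (threshold_db - floor_db)
    return "may_request_help", p_request

# Illustrative SNRs for three nodes, with T = 10 dB and a 0 dB floor.
for name, snr in (("node A", 12.0), ("node B", 5.0), ("node C", -3.0)):
    print(name, node_decision(snr, threshold_db=10.0, floor_db=0.0))
```

Nodes nearest the threshold are the most likely to ask for help, so the cluster that forms tends to consist of the higher SNR nodes.
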
22
Optimal Fusion and Information Theory

Bayes estimator maximizes the likelihood $F_X(x \mid z)$,
where x is the state of nature and z is the set
of observables.
Define $Z^r = \{z(1), z(2), \ldots, z(r)\}$, the set of observations
up to time r; then the recursive form of the estimator
is
A variety of classical estimators then maximize
the likelihoods based on particular assumptions
regarding the priors
Fusion: typically weighted combinations of
likelihoods to produce a decision; as sensors may
be very different, there is a question of the optimal weighting
scheme
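
A common way to write the recursive form of the Bayes estimator is the update below. It assumes that the newest observation z(r) is conditionally independent of the earlier observations given the state x; that assumption, and the notation $Z^{r-1}$ for the observations up to time r-1, are supplied here for illustration.

```latex
F(x \mid Z^r) =
  \frac{F(z(r) \mid x)\, F(x \mid Z^{r-1})}{F(z(r) \mid Z^{r-1})}
```

The posterior from step r-1 acts as the prior at step r, so only the latest observation needs to be processed at each step.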

23
Likelihood Opinion Pool
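[Figure: sensors 1 through n supply likelihoods $F(Z_1^r \mid x)$, $F(Z_2^r \mid x)$, ..., $F(Z_n^r \mid x)$; together with the prior information $F(x)$ they feed a product node ($\Pi$) that outputs $F(x \mid Z^r)$]
The hard part: determination of the various
likelihoods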
24
Likelihood Opinion Pool

Combine using the recursive rule
Taking logarithms on each side, followed by
expectations, one obtains
which can be interpreted as posterior
information = prior information + observation
information; thus can deal in summations of
mutual information obtained from different sensor
types (e.g., video plus audio).
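
A minimal sketch of a likelihood opinion pool over a discrete set of hypotheses is given below. It assumes the sensors are conditionally independent given the state x, so the posterior is proportional to the prior times the product of the sensor likelihoods, and the log-posterior is a sum. The hypothesis names and likelihood values are invented for the example.

```python
import math

def fuse(prior, likelihoods):
    """Posterior over hypotheses from a prior and per-sensor likelihoods.

    prior: dict hypothesis -> probability
    likelihoods: list of dicts, hypothesis -> F(observation | hypothesis)
    """
    # Log domain: prior information plus a sum of observation terms.
    log_post = {
        x: math.log(p) + sum(math.log(lik[x]) for lik in likelihoods)
        for x, p in prior.items()
    }
    peak = max(log_post.values())  # subtract the peak for numerical stability
    unnorm = {x: math.exp(v - peak) for x, v in log_post.items()}
    total = sum(unnorm.values())
    return {x: u / total for x, u in unnorm.items()}

prior = {"target": 0.5, "clutter": 0.5}
video = {"target": 0.8, "clutter": 0.3}  # hypothetical video likelihoods
audio = {"target": 0.6, "clutter": 0.2}  # hypothetical audio likelihoods

batch = fuse(prior, [video, audio])
recursive = fuse(fuse(prior, [video]), [audio])  # one sensor at a time
print(batch)
print(recursive)
```

Fusing all sensors at once and fusing them one at a time give the same posterior, which is what makes the recursive rule convenient when sensor reports arrive at different times.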

25
Designing for Detection

In digital communications, choose modulation for
ease of estimation of decision variables and
subsequent selection of most likely signal
(hypothesis): we design signals for separability
In sensor networks, have no control over nature,
but we can control:
Density and locations of sensors
Sensor types
These can be manipulated in same way, given a
fusion strategy, to ease signal separability or
achieve Nyquist sampling of source features.
This can also be done adaptively as we learn more
about the sources and the propagation environment
(in general, reduce model uncertainty)
Add sensors, and/or change types (e.g., new
deployment)
Move sensors
Articulate directional elements

26
Networked Info-Mechanical Systems
27
The n-helper Gaussian Scenario
[Figure: main data node X and helper nodes $Y_1$, $Y_2$, $Y_3$, ..., $Y_n$ reporting to the gateway/fusion center]

Multiple sensors observe an event and generate
correlated Gaussian data. One data node (X) is
the main data source (e.g. closest to
phenomenon), and the n additional nodes ($Y_1$ to $Y_n$)
are the helpers.
The problem: what codes and data rates allow the
gateway/data-fusion center to reproduce the data
from the main node using the remaining nodes as
sources of partial side information, subject to
some distortion criterion?

28
Main Result

We do not care about reproduction of the Y
variables; rather they act as helpers to
reproduce X
This problem was previously solved for the 2-node
case
Key to extension: treat $Y_k \mid Y_{k-1}, \ldots, X$ as a single new
helper $P_k$.
Our solution: for an admissible rate
$(R_x, R_1, \ldots, R_n)$, and for some $D_i > 0$, the n-helper
system data rates can be fused to yield an
effective data rate (with respect to source X) satisfying the
following rate distortion bound

where $\sigma^2$ is the variance and $\rho$ is the correlation
(straightforward but tedious to calculate as n
increases).

29
Comments

Other source distributions analytically
difficult, but many are likely to be convex
optimizations
Generalization would consider instances of
relay/broadcast channels in conveying information
to fusion center with minimum energy
Many sensor network detection problems are
inherently local: even though the expression may be
complicated, the number of helpers will usually
be small due to decay of signals as a power of
distance
Numerical results for Gaussian sources indicate a
small number of helpers leads to significant
improvement, with rapidly diminishing returns after
four or so for typical propagation conditions.
Suggests that source/channel coding separation
might in fact be good enough for many practical
situations (especially above the local
interaction)

30
Problem Definition of Cooperative Communication

Many low-power and low-cost wireless sensors
cooperate with each other to achieve more
reliable and higher rate communications
The dominant constraint is the peak power; the
bandwidth is not the main concern
Multiplexing (FDMA, TDMA, CDMA, OFDM) is the
standard approach. Each sensor has a unique
channel
We focus on schemes where multiple sensors occupy
the same channel

31
Example Space-Time Coding

N transmit antennas and N receive antennas
Channel transition matrix displays independent
Rayleigh (complex Gaussian) fading in each
component
With properly designed codes, capacity is N times
that of single Rayleigh channel
Note this implicitly assumes synchronization
among Tx and Rx array elements--requires special
effort in sensor networks
A coordinated transmission, not a multiple access
situation.

32
Context

Cooperative reception problem is very similar to
multi-node fusion problem: same initiation
procedure required to create the cluster; however,
we can choose the channel code.
Cooperative transmission and reception is similar to
multi-target multi-node fusion, but more can be
done: beacons, space-time coding
Use to overcome gaps in network, communicate with
devices outside of sensor network (e.g. UAV)

33
Channel Capacity

Channel state information
known at transmitter side, and at both sides
If channel state information is known at the
transmit side, RF synchronization can be achieved
Channels
AWGN and fading channels with unequal path loss
General formula

34
Channel Capacity (cont'd)

Receive diversity
Transmit diversity
Combined transmit-receive diversity
RF synchronization

35
Comments

Capacity is much higher if phase synchronization
within transmitter and receiver clusters can be
achieved
Have investigated practical methods for
satellite/ground sensors synchronization
Beacons (e.g. GPS) can greatly simplify the
synchronization problem for ground/ground
cooperative communications
Recent network capacity results do not take into
account possibilities for cooperation by nodes as
transmitter/receiver clusters

36
Capacity in Ad Hoc Networks

Received signal power decays with distance, and
transmission power is limited
Frequency re-use is possible; sophisticated
antenna/MIMO systems improve the constant
Nodes generate traffic, and can relay traffic
from other nodes
If nodes did not generate traffic, then higher node
density would imply greater network capability
(improved re-use)
All nodes alike
We will also relax this later

37
Transport Capacity of Wireless Networks

n nodes within some fixed region A, with max
radio range R, bandwidth W, generating data.
Source-destination pairs are random; per node
transport capacity is then

38
Transport Capacity of Wireless Networks II

Note: this is achieved by using a simple relay
strategy, one link at a time, without cooperation
in transmission or reception (Gupta-Kumar), but
bad news continues even with optimal cooperation
(Gastpar-Vetterli)
The inverse square root of n behavior can be
roughly explained by the average number of links
in a path of a given length increasing, each of
which must deal with more traffic to be carried,
with the same bandwidth.
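
The scaling can be made concrete with the Gupta-Kumar order-of-growth result for randomly placed nodes, in which per node throughput behaves like W / sqrt(n log n). The sketch below evaluates that expression; the constant c is a hypothetical placeholder (the result fixes only the order of growth), and the bandwidth value is illustrative.

```python
import math

def per_node_throughput(n, w, c=1.0):
    """Order-of-growth estimate c * W / sqrt(n log n), in bits per second."""
    if n < 2:
        raise ValueError("the scaling law is only meaningful for n >= 2")
    return c * w / math.sqrt(n * math.log(n))

W = 1e6  # illustrative bandwidth, bits per second
for n in (10, 100, 1000, 10000):
    lam = per_node_throughput(n, W)
    print(f"n = {n:6d}  per node ~ {lam:10.1f} bit/s  aggregate ~ {n * lam:12.1f} bit/s")
```

Aggregate throughput still grows with n, but each node's share shrinks, which is why added nodes that all generate traffic do not scale.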

39
Scaling in Ad Hoc Networks

The only solution when everyone generates traffic
is to add more resources as n increases
Traditional approach: communication hierarchy
where we add new resources at each layer
Each level is limited in numbers
Traffic is aggregated and carried on set of
trunks of increasing bandwidth and thus capacity
Higher levels are longer distance, also limiting
latency

40
Scaling in Sensor Networks

Elements not only generate traffic but can
process data
Do not necessarily want or need to send raw
information to distant users with same
probability as near neighbors
Key to scalability is to change the
source-destination pair distribution to local
communication (in limit, most nodes in fact send
nothing)
Key to proof is to separately consider densities
of sources, sensors and communication relays, and
pose problem as extraction of information to
within particular fidelity (rate distortion)

41
Scalability for Point Sources in Sensor Networks

Cooperative rate distortion coding results in
most communication being local; more nodes do not
necessarily result in more traffic under
distortion criterion
More relays reduce frequency re-use distance and
thus interference; capacity can increase without
bound
Thus more nodes increase likelihood of extracting
information at desired fidelity

42
Comments

Number of bits a sensor reports is a complicated
function of density
Low density: report nothing if SNR too low
High density: may need to report only decision
Moderate density: many nodes may need to locally
cooperate with mix of raw data and decision
likelihoods
Far away powerful sources will activate many
nodes with similar SNR, but a small subset of
nodes will be sufficient to make decisions
Design objective will be to minimize resources
required to suppress node activity

43
Scalability for Distributed Sources

To estimate parameters of a field (e.g., to get
isotherm map), information increases until the
desired spatial sampling is achieved
After this, extra nodes contribute no additional
information, but can increase communication
resource
Image processing analogy: specify pixel size
Parameters to describe local field can be compact
compared to raw data, for given level of
distortion

44
Practical Implementation

Dense network: in a neighborhood, have a mix of nodes
with different ranges, operating in separate
bands
Locally route towards the longer range links;
they act as traffic attractors, causing number of
hops at any given layer to be small
Cooperative communication among nodes would serve
mainly to assure reliability of paths towards
next level of hierarchy
Result is a (largely) standard overlay
hierarchical network
Any cross-layer optimization (e.g., joint
source-channel coding) is confined to the local
neighborhood, since this is where most of the
resources are consumed in any scalable solution.

45
Network Synchronization

Synchronism is needed for wide set of purposes in
sensor networks
Coordination of power down/up for energy savings
Time stamping of data
Coherent combining in communication or sensing
(cooperative comm., fusion, position location)
Traditional approaches assume receivers/processors
always on, and provide same precision everywhere
by locking oscillators
Sensor networks are different
Do not need same level of synchronism at all
times and everywhere
Do need to save energy

46
Synchronization and Rate Distortion

Clocks are not explicitly locked; rather, record
differences of time scales to allow explicit
conversion.
References are passed either on a schedule or on
demand for post facto synchronization
Frequency and precision of updates (the rate)
depends on local accuracy requirement $\Delta t_j$ (the
time distortion)
Would like to bound rate subject to accuracy
requirements and acceptable delays in achieving
synchronism
Very similar issues for position localization
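
One way to picture post facto conversion is with a linear clock model: two reference exchanges give the relative rate and offset between a local and a remote time scale, and timestamps are converted afterwards without touching either oscillator. The linear model, the function names, and the numbers below are assumptions for illustration; the last function ties the update interval (the rate) to the accuracy requirement under an assumed worst-case drift.

```python
def fit_conversion(ref_pairs):
    """Fit remote = scale * local + offset from two (local, remote) samples."""
    (l0, r0), (l1, r1) = ref_pairs
    scale = (r1 - r0) / (l1 - l0)
    offset = r0 - scale * l0
    return scale, offset

def to_remote(local_time, scale, offset):
    """Convert a locally recorded timestamp to the remote time scale."""
    return scale * local_time + offset

def max_update_interval(delta_t, drift_ppm):
    """Longest gap between references that keeps the error within delta_t,
    assuming the clocks drift apart by at most drift_ppm parts per million."""
    return delta_t / (drift_ppm * 1e-6)

# Two hypothetical reference exchanges, 100 local seconds apart.
scale, offset = fit_conversion([(100.0, 250.0), (200.0, 350.01)])
stamp = to_remote(150.0, scale, offset)
interval = max_update_interval(delta_t=1e-3, drift_ppm=50.0)
print(f"scale = {scale:.6f}, offset = {offset:.4f} s")
print(f"local 150.0 s -> remote {stamp:.4f} s")
print(f"update at least every {interval:.1f} s for 1 ms accuracy")
```

A tighter accuracy requirement shortens the allowed interval, so nodes that need only coarse time stamps can exchange references far less often.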

47
Implications of Signal Locality

Severe decay of signals with distance (second to
fourth power)
Mutual information to source dominated by small
set of nodes
Cooperative communication clusters for ground to
ground transmission will likely be small
Implications
Local processing is good enough for many
situations; do not need to convey raw data over
long distances very frequently
Consequently, lowest layers of processing/network
formation, etc. are the most important, since
most frequently invoked (typical)
Practical example
Specialized local transmission schemes (e.g., for
forming ad hoc clusters), but long range might
use conventional methods such as TCP/IP

48
Hierarchy in Sensor Networks

For dealing with the network as a whole, the number
of variations of topology is immense
Distributed algorithms exploiting locality of
events
Use of ensembles for deriving bounds
In between, consider layers of hierarchy, each
of which may be amenable to a conventional
optimization technique

49
Information Processing Hierarchy
Note: difficulty of fully separating networking,
database and signal processing problems
[Figure: processing hierarchy running from low power, high false alarm rate, high duty cycle stages to high power, low false alarm rate, low duty cycle stages; labeled elements: beamforming, transmit decision, query for more information, base station high resolution processing, human observer]
50
Some Research Challenges

Minimal energy to obtain reliable decision in a
distributed network
Minimal (average) delay in conveying information
through network
Density and source separability trades
Model uncertainty and methods for reducing its
effects
how do we know that we don't know?
Role of hierarchy: how much leads to what kinds
of changes in information theoretic optimal
behavior?
At small scale can use brute force, at large
scale can use ensembles: what can we do in
between?
Exploitation of signal locality: what is the
spatial domain over which cross-layer
optimization is useful?

51
References

T. Cover and J. Thomas, Elements of Information
Theory. Wiley, 1991.
G. Pottie and W. Kaiser, "Wireless Integrated
Network Sensors," Commun. ACM, May 2000.
M. Ahmed, Y-S. Tu, and G. Pottie, "Cooperative
detection and communication in wireless sensor
networks," 38th Allerton Conf. on Comm., Control
and Computing, Oct. 2000.
Visit www.cens.ucla.edu technical reports section
for a variety of related papers and theses