Title: Multi-Terminal Information Theory Problems in Sensor Networks
1 Multi-Terminal Information Theory Problems in Sensor Networks
- Gregory J. Pottie
- Professor, Electrical Engineering Department
- Associate Dean, Research and Physical Resources
- Deputy Director, NSF Center for Embedded Networked Sensing (CENS)
- UCLA Henry Samueli School of Engineering and Applied Science
- pottie_at_icsl.ucla.edu
2 Outline
- Context and general issues
- Basic tools of information theory
- Multi-terminal information theory
- Research domains
  - Data fusion
  - Cooperative communication
  - Sensor network scalability
  - Network synchronization
  - Distributed large-scale systems
3 Sensor Network Operation
[Figure: sensor network operation, with labels: cooperative communication, data fusion, routing]
Basic goal: detection/identification of point or distributed sources subject to distortion constraints, and timely notification of the end user.
4 Basic Information Theoretic Concepts
- Typical sets (of sufficiently long sequences of i.i.d. variables)
  - Have probability nearly 1
  - The elements are nearly equally probable
  - The number of elements is nearly $2^{nH}$
[Figure: block diagram of a communication system, with labels: source W, channel encoder, $X^n$, channel $p(y|x)$, $Y^n$, decoder]
- Aim of communications system
  - Minimize errors due to noise in channel
  - Maximize data rate
  - Minimize bandwidth and power (the resources)
- Shannon capacity establishes the fundamental limits
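The typical-set properties above can be checked numerically for a small case. The following is a minimal sketch, assuming a binary i.i.d. Bernoulli(p) source; the function names and the values of n, p and eps are illustrative choices, not taken from the slides. It counts the sequences whose per-symbol log-probability lies within eps of the entropy H and reports their total probability and number.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_set_stats(n, p, eps):
    """Return (probability, log2 of size) of the eps-typical set.

    A length-n binary sequence counts as typical when its per-symbol
    log-probability is within eps of the entropy H. Requires 0 < p < 1.
    """
    h = binary_entropy(p)
    prob = 0.0
    size = 0
    for k in range(n + 1):
        # Every sequence with k ones has the same probability.
        log2_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log2_p_seq / n - h) <= eps:
            count = math.comb(n, k)
            size += count
            # Work in the log domain to avoid floating-point underflow.
            prob += 2.0 ** (math.log2(count) + log2_p_seq)
    return prob, math.log2(size)


# Illustrative parameters (assumptions for this sketch).
n, p, eps = 1000, 0.2, 0.1
prob, log2_size = typical_set_stats(n, p, eps)
print(f"P(typical set) = {prob:.4f}")
print(f"log2(size) = {log2_size:.1f}, nH = {n * binary_entropy(p):.1f}")
```

The exponent of the set size agrees with nH only to within n times eps, which is the sense in which the size is "nearly" $2^{nH}$.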
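5 Jointly Typical Sequences
[Figure: input set $X^n$ containing codewords $X_1^n$ and $X_2^n$, and their images in the output set $Y^n$]
- Output set is in general larger due to additive noise
- Output images of inputs may overlap due to noise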
6 Basic Information Theoretic Concepts
[Figure: the same communication system block diagram, with labels: source W, channel encoder, $X^n$, channel $p(y|x)$, $Y^n$, decoder]
- Capacity C is the maximum mutual information $I(X;Y)$ with respect to $p(x)$; that is, choose the input distribution leading to the largest mutual information.
- Capacity C is the largest rate at which information can be transmitted with arbitrarily small probability of error
- Jointly typical set: from among the typical input and output sequences, choose the ones for which $-\frac{1}{n}\log p(x^n, y^n)$ is close to $H(X,Y)$
- Size of the jointly typical set is about $2^{nH(X,Y)}$, and an independently chosen input is jointly typical with a given output with probability about $2^{-nI(X;Y)}$; thus there are about $2^{nI(X;Y)}$ distinguishable signals (codewords) in $X^n$
- These codewords necessarily contain redundancy: the size of the set is smaller than the alphabet would imply. Sequences provide better performance than isolated symbols if properly chosen.
7 Gaussian Channel Capacity
- Discrete-time inputs to channel, and channel adds noise with Gaussian distribution (zero mean, variance N)
- Input sequence (codeword) power set to P
- Capacity is the maximum $I(X;Y)$ over $p(x)$ such that $E[X^2]$ satisfies the power constraint
- $C = \frac{1}{2}\log_2(1 + P/N)$ bits per transmission.
- The more usual form is to consider a channel of bandwidth W and noise power spectral density $N_0$. Then $C = W \log_2(1 + P/(N_0 W))$ bits per second.
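As a quick numerical illustration of the two capacity formulas, here is a minimal sketch. The function names and the example values of P, N, $N_0$ and W are hypothetical; only the formulas themselves come from the slide.

```python
import math


def awgn_capacity_per_use(power, noise_variance):
    """C = 1/2 log2(1 + P/N), in bits per transmission."""
    return 0.5 * math.log2(1 + power / noise_variance)


def awgn_capacity_bps(power, noise_psd, bandwidth_hz):
    """C = W log2(1 + P/(N0 W)), in bits per second."""
    return bandwidth_hz * math.log2(1 + power / (noise_psd * bandwidth_hz))


# Illustrative values (assumptions): an SNR of 3 gives 1 bit per use.
print(awgn_capacity_per_use(3.0, 1.0))
# 1 MHz channel with P/(N0 W) = 3 gives about 2 Mbit/s.
print(awgn_capacity_bps(3e-3, 1e-9, 1e6))
```

Doubling the power at high SNR adds roughly half a bit per transmission, while at low SNR capacity grows almost linearly in P.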
8 Rate Distortion: Lossy Source Coding
- Rate distortion function R(D) can be interpreted as
  - The minimum rate at which a source can be represented subject to a distortion $D = d(X,Y)$
  - The minimum distortion that can be achieved given a maximum rate constraint R
- Interesting dual results to capacity
- Spend coding effort on the distortion-typical set; the rest are don't-cares
- Applies to compression of real-valued sequences
[Figure: R(D) curve with the achievable region marked; axes: rate R and distortion D]
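The two readings of R(D) can be made concrete with the textbook closed form for a memoryless Gaussian source under squared-error distortion, $R(D) = \frac{1}{2}\log_2(\sigma^2/D)$ for $D < \sigma^2$ and zero otherwise. The slides do not state this formula; it is the standard result from Cover and Thomas, and the function names below are illustrative.

```python
import math


def gaussian_rate_distortion(variance, distortion):
    """Minimum rate (bits/sample) to describe a Gaussian source within
    mean squared error `distortion`."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    if distortion >= variance:
        return 0.0  # sending nothing already meets the target
    return 0.5 * math.log2(variance / distortion)


def gaussian_distortion_rate(variance, rate):
    """Minimum mean squared error achievable at `rate` bits/sample."""
    return variance * 2.0 ** (-2.0 * rate)


# Each extra bit per sample cuts the distortion by a factor of four.
for rate in (0.0, 1.0, 2.0):
    print(rate, gaussian_distortion_rate(1.0, rate))
```

The two functions are inverses of each other on the curve, which is the duality between "minimum rate for a given distortion" and "minimum distortion for a given rate".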
9 Universal Source Coding
- Divide sequences into distortion-typical (interesting) and distortion-atypical (uninteresting) sets
- Index for the distortion-typical set is of small length and consumes our coding effort; the atypical set is large, but its coding scheme is not critical
- Require systematic means of classifying sequences as typical (promotion mechanism and distance measure)
- Gold washing algorithm: typical set, plus candidates
[Figure: distortion-typical set and atypical set]
10 Source/Channel Coding Separation
- For a single link, separately performing source and channel coding achieves optimal rates
- Separate optimization greatly reduces theoretical complexity
- Classes of codes have been identified that get very close to respective Shannon limits
- Joint source/channel coding can reduce latency or overall complexity, but is infrequently used since it is application-specific
11 Multi-Terminal Information Theory
- The preceding discussion assumed a single transmitter and receiver
- Multi-terminal information theory considers maximization of mutual information for the following possibilities:
  - Multiple senders and one receiver (the multiple access channel)
  - One sender and multiple receivers (the broadcast channel)
  - One sender and one receiver, but intervening transducers that can assist (the relay channel)
  - Composite combinations of these basic types
- Bayes estimation also aims to maximize mutual information, except the senders do not cooperate and usually there is a fidelity constraint
  - One sender and multiple receivers (the data fusion problem)
  - Multiple senders and receivers (the source separation problem)
  - Delay and resource usage may also be included
12 Gaussian Multiple Access Channel
- m transmitters with power P sharing the same noisy channel
- $C(P/N) = \frac{1}{2}\log_2(1 + P/N)$ bits per channel use for an isolated sender
- Then the achievable rate region is
  $R_i < C(P/N)$, $R_i + R_j < C(2P/N)$, ..., $\sum_{i=1}^{m} R_i < C(mP/N)$
- The last inequality dominates when rates are the same
- Capacity increases with more users (there is more power)
- Result is dual to Slepian-Wolf encoding of correlated sources
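A small numerical sketch of the equal-power Gaussian multiple access region may help. It assumes the standard textbook constraints (any k senders together must stay below $C(kP/N)$, for k = 1 to m); the function names and example numbers are illustrative.

```python
import math


def c(snr):
    """C(x) = 1/2 log2(1 + x), bits per channel use."""
    return 0.5 * math.log2(1 + snr)


def mac_sum_capacity(m, power, noise):
    """Upper limit on the total rate of m equal-power senders."""
    return c(m * power / noise)


def mac_equal_rate(m, power, noise):
    """Per-sender rate limit when all m senders use the same rate."""
    return mac_sum_capacity(m, power, noise) / m


def in_mac_region(rates, power, noise):
    """Check every subset constraint. With equal powers only the subset
    size matters, so it suffices to test the k largest rates for each k."""
    total = 0.0
    for k, rate in enumerate(sorted(rates, reverse=True), start=1):
        total += rate
        if total >= c(k * power / noise):
            return False
    return True


for m in (1, 2, 4, 8):
    print(m, round(mac_sum_capacity(m, 1.0, 1.0), 3),
          round(mac_equal_rate(m, 1.0, 1.0), 3))
```

The printout shows the total rate growing with m (more power enters the channel) while the per-sender share shrinks, since the growth is only logarithmic.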
13 Gaussian Broadcast Channel
- One sender of power P and two receivers, one with noise $N_1$ and one with noise $N_2$, $N_1 < N_2$
- The two codebooks are coordinated to exploit commonality of information transmitted; otherwise capacity does not exceed simple multiplexing
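To see how coordinated codebooks beat simple multiplexing, the sketch below evaluates the standard superposition-coding rate pair for the Gaussian broadcast channel, $R_1 < C(\alpha P/N_1)$ and $R_2 < C((1-\alpha)P/(\alpha P + N_2))$. These expressions are the textbook result from Cover and Thomas rather than something stated on the slide, and the power split $\alpha$ and the numbers used are illustrative assumptions.

```python
import math


def c(snr):
    """C(x) = 1/2 log2(1 + x), bits per channel use."""
    return 0.5 * math.log2(1 + snr)


def broadcast_rates(alpha, power, n1, n2):
    """Rate pair with a fraction alpha of the power carrying the message
    for the better receiver (noise n1 < n2). The weaker receiver treats
    that part of the signal as extra noise."""
    r1 = c(alpha * power / n1)
    r2 = c((1 - alpha) * power / (alpha * power + n2))
    return r1, r2


def time_sharing_load(r1, r2, power, n1, n2):
    """Fraction of channel uses that plain time division would need to
    deliver (r1, r2); a value above 1 means time division cannot do it."""
    return r1 / c(power / n1) + r2 / c(power / n2)


P, N1, N2 = 10.0, 1.0, 4.0  # illustrative values
r1, r2 = broadcast_rates(0.5, P, N1, N2)
print(round(r1, 3), round(r2, 3), round(time_sharing_load(r1, r2, P, N1, N2), 3))
```

For these example values the load exceeds 1, so the superposition pair lies outside what time division alone could achieve.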
14 Relay Channel
- One sender, one relay, and one receiver; the relay transmits $X_1$ based only on its observations $Y_1$
[Figure: relay channel with sender X, relay ($Y_1$ in, $X_1$ out), and receiver Y]
- Combines a broadcast channel and a multiple access channel
- Networks comprise multiple relay channels that may further induce delay
15 General Multi-Terminal Networks
- m nodes, with node j having an associated transmission variable $X^{(j)}$ and receive variable $Y^{(j)}$
- Node 1 transmits to node m: what is the maximum achievable rate?
[Figure: network of nodes, from $(X_1, Y_1)$ to $(X_m, Y_m)$]
- Bounds are derived from information flow across multiple cut sets; they are generally not achievable
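For reference, the cut-set bound mentioned above has the following standard form (Cover and Thomas). The slide does not write it out, so the notation here is an assumption: S is any set of nodes containing node 1 but not node m, and $S^c$ is its complement.

```latex
% Rate from node 1 to node m is limited by the worst cut,
% for the best joint input distribution:
R \;\le\; \max_{p(x^{(1)},\ldots,x^{(m)})} \; \min_{S:\, 1 \in S,\; m \in S^c}
    I\!\left( X^{(S)} ;\, Y^{(S^c)} \,\middle|\, X^{(S^c)} \right)
```

Each term is the information that could cross the cut if all nodes on the sending side cooperated fully and all nodes on the receiving side pooled their observations, which is why the bound is optimistic in general.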
16 Costs of Source-Channel Separation
- Source-channel coding separation theorem fails in the multi-terminal setting because capacity of multiple access channels increases with correlation, while source encoding eliminates correlation
- Greatly complicates search for optimal codes; raises question of whether joint coding would be worth it
- Gastpar has considered asymptotic cost of separate rate-distortion and channel coding
- Compare
  - Network rate-distortion coding, followed by cooperative transmission
  - Joint rate-distortion and channel coding
- Potentially exponentially better performance for joint source and channel coding, in the limit as the number of nodes n observing a Gaussian source with comparable SNR goes to infinity.
- Bound, not a prescription for how to do this!
17 Now let it move
- Nodes move within a bounded region according to some random distribution: what is capacity subject to an energy constraint on messages?
[Figure: positions of Node 1 and Node m at Time 1 and Time 2]
- Answer depends on delay constraint: eventually they will collide, implying near-zero path loss and thus unbounded capacity
- Other questions
  - Probability the nodes have a connecting path of required rate
  - Probability of message arriving within required delay
18 Some Recent Research for Sensor Networks
- Data fusion in sensor networks
- N-helper problem
- Cooperative communications in sensor networks
- Scalability of sensor networks
- Sensing for distributed sources
- Network synchronization and rate distortion
- Systems design
19 General Assumptions
- Objective of network is to solve some (multi-)hypothesis problem, subject to a set of fidelity criteria, and convey the result to some end-user, subject to resource constraints
  - Consequence: fidelity criteria and resource constraints allow meaningful optimization questions to be posed
- Communications is more costly than signal processing
  - Consequence: long distance communication is to be avoided, if possible
  - Justification: Shannon capacity and Maxwell's equations are fundamental; signal processing (SP) power cost follows Moore's Law
- Signals decay with distance of propagation
  - Consequence: local distributed algorithms become feasible
  - Justification: true for all natural propagation media
20 Rate Distortion and Data Fusion
- Can identify resource use (energy/number of bits transmitted) with rate, and decision reliability (false alarm rate, missed detection probability) with distortion
- Operate at different points on the rate distortion curve depending on values of the cost function
- Location of fusion center, numerical resolution, number of sensors, length of records, routing, and distribution of processing all affect R(D)
21 A Simple Algorithm
- Nodes activated to send requests for information from other nodes based on SNR
  - If above threshold T, decision is reliable, and suppress activity by neighbors
  - Otherwise, increase likelihood of requesting help based on proximity to T
- In all likelihood, higher SNR nodes form the cluster
- Bits of resolution related to SNR (e.g., for use in maximal ratio combining)
[Figure: cluster formation among nodes labeled 1, 2, 3. Node 1: high SNR, initiates. Node 2: activated, and requests further information. Node 3: SNR too low to respond.]
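The per-node rule described on this slide can be sketched as a small decision function. This is only an illustration under stated assumptions: the slide gives no formula for how the likelihood of requesting help grows near T, so the linear ramp, the `floor_db` cutoff below which a node stays silent, and all names are hypothetical.

```python
def node_action(snr_db, threshold_db, floor_db):
    """Return (action, probability of requesting help) for one node.

    Assumptions for this sketch: SNR values are in dB, nodes at or below
    `floor_db` stay silent, and the request probability rises linearly
    from 0 at the floor to 1 at the threshold T.
    """
    if snr_db >= threshold_db:
        # Decision is reliable: announce it and suppress the neighbors.
        return "decide_and_suppress", 0.0
    if snr_db <= floor_db:
        # SNR too low to respond.
        return "idle", 0.0
    request_probability = (snr_db - floor_db) / (threshold_db - floor_db)
    return "request_help", request_probability


for snr in (12.0, 5.0, -3.0):
    print(snr, node_action(snr, threshold_db=10.0, floor_db=0.0))
```

In an actual network the node would draw a random number against the returned probability; keeping the function deterministic makes the rule itself easy to inspect.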
22 Optimal Fusion and Information Theory
- Bayes estimator maximizes the likelihood $F_X(x|z)$, where x is the state of nature and z is the set of observables.
- Define $Z^r = \{z(1), z(2), \ldots, z(r)\}$, the set of observations up to time r; the estimator can then be written in recursive form
- A variety of classical estimators then maximize the likelihoods based on particular assumptions regarding the priors
- Fusion: typically weighted combinations of likelihoods to produce a decision; as sensors may be very different, there is the question of the optimal weighting scheme
23 Likelihood Opinion Pool
[Figure: sensors 1, 2, ..., n supply likelihoods $F(Z_1^r|x)$, $F(Z_2^r|x)$, ..., $F(Z_n^r|x)$; these are multiplied together with the prior information $F(x)$ to give $F(x|Z^r)$]
The hard part: determination of the various likelihoods
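For a finite set of hypotheses the pooling step in the diagram reduces to a product followed by normalization. The sketch below assumes the sensor observations are conditionally independent given the state x, which is what justifies multiplying the likelihoods; the function name and example numbers are illustrative.

```python
def likelihood_opinion_pool(prior, sensor_likelihoods):
    """Combine a prior over hypotheses with per-sensor likelihoods.

    prior: list of prior probabilities, one per hypothesis.
    sensor_likelihoods: one list per sensor, giving the likelihood of
        that sensor's observations under each hypothesis.
    Assumes conditional independence of the sensors given the hypothesis.
    """
    posterior = list(prior)
    for likelihood in sensor_likelihoods:
        posterior = [p * l for p, l in zip(posterior, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]


# Two hypotheses, two sensors that both favor the first hypothesis.
print(likelihood_opinion_pool([0.5, 0.5], [[0.8, 0.2], [0.6, 0.4]]))
```

Sensors of very different types can be pooled this way as long as each can express its observations as a likelihood over the same hypotheses, which is the hard part noted on the slide.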
24 Likelihood Opinion Pool
- Combine the sensor likelihoods and the prior using the recursive (Bayes) rule
- Taking logarithms on each side, followed by expectations, one obtains an additive relation
- This can be interpreted as posterior information = prior information + observation information; thus one can deal in summations of mutual information obtained from different sensor types (e.g., video plus audio).
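The equations for this slide are not present in the text. As a hedged reconstruction, assuming the observation z(r) is conditionally independent of earlier observations given x, the standard recursive Bayes update and its logarithmic form would read as follows; the notation follows the slides, but the exact form the author used is an assumption.

```latex
% Recursive Bayes rule
F(x \mid Z^r) = \frac{F(z(r) \mid x)\, F(x \mid Z^{r-1})}{F(z(r) \mid Z^{r-1})}

% Taking logarithms on each side
\log F(x \mid Z^r) = \log F(x \mid Z^{r-1})
    + \log \frac{F(z(r) \mid x)}{F(z(r) \mid Z^{r-1})}

% Taking expectations
E\left[\log F(x \mid Z^r)\right] = E\left[\log F(x \mid Z^{r-1})\right]
    + E\left[\log \frac{F(z(r) \mid x)}{F(z(r) \mid Z^{r-1})}\right]
```

Under the stated independence assumption the last expectation is the conditional mutual information $I(x; z(r) \mid Z^{r-1})$, which is the "observation information" term that adds to the prior information.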
25 Designing for Detection
- In digital communications, choose modulation for ease of estimation of decision variables and subsequent selection of most likely signal (hypothesis); we design signals for separability
- In sensor networks, we have no control over nature, but we can control
  - Density and locations of sensors
  - Sensor types
- These can be manipulated in the same way, given a fusion strategy, to ease signal separability or achieve Nyquist sampling of source features.
- This can also be done adaptively as we learn more about the sources and the propagation environment (in general, reduce model uncertainty)
  - Add sensors, and/or change types (e.g., new deployment)
  - Move sensors
  - Articulate directional elements
26 Networked Info-Mechanical Systems
27 The n-helper Gaussian Scenario
[Figure: main source X and helpers Y1, Y2, Y3, ..., Yn, all reporting to a gateway/fusion center]
- Multiple sensors observe an event and generate correlated Gaussian data. One data node (X) is the main data source (e.g., closest to the phenomenon), and the n additional nodes (Y1 to Yn) are the helpers.
- The problem: what codes and data rates allow the gateway/data-fusion center to reproduce the data from the main node, using the remaining nodes as sources of partial side information, subject to some distortion criterion?
28 Main Result
- We do not care about reproduction of the Y variables; rather they act as helpers to reproduce X
- This problem was previously solved for the 2-node case
- Key to extension: treat $Y_k | Y_{k-1}, \ldots, X$ as a single new helper $P_k$.
- Our solution: for an admissible rate $(R_X, R_1, \ldots, R_n)$, and for some $D_i$'s $> 0$, the n-helper system data rates can be fused to yield an effective data rate (with respect to source X) satisfying a rate distortion bound expressed in terms of $\sigma^2$, the variance, and $\rho$, the correlation (straightforward but tedious to calculate as n increases).
29 Comments
- Other source distributions are analytically difficult, but many are likely to be convex optimizations
- Generalization would consider instances of relay/broadcast channels in conveying information to fusion center with minimum energy
- Many sensor network detection problems are inherently local: even though the expression may be complicated, the number of helpers will usually be small due to decay of signals as a power of distance
- Numerical results for Gaussian sources indicate a small number of helpers leads to significant improvement, with rapidly diminishing returns after four or so for typical propagation conditions.
- Suggests that source/channel coding separation might in fact be good enough for many practical situations (especially above the local interaction)
30 Problem Definition of Cooperative Communication
- Many low-power and low-cost wireless sensors cooperate with each other to achieve more reliable and higher rate communications
- The dominant constraint is the peak power; the bandwidth is not the main concern
- Multiplexing (FDMA, TDMA, CDMA, OFDM) is the standard approach. Each sensor has a unique channel
- We focus on schemes where multiple sensors occupy the same channel
31 Example: Space-Time Coding
- N transmit antennas and N receive antennas
- Channel transition matrix displays independent Rayleigh (complex Gaussian) fading in each component
- With properly designed codes, capacity is N times that of a single Rayleigh channel
- Note this implicitly assumes synchronization among Tx and Rx array elements, which requires special effort in sensor networks
- A coordinated transmission, not a multiple access situation.
32 Context
- Cooperative reception problem is very similar to the multi-node fusion problem: the same initiation procedure is required to create the cluster; however, we can choose the channel code.
- Cooperative transmission and reception is similar to multi-target multi-node fusion, but more can be done: beacons, space-time coding
- Use to overcome gaps in network, communicate with devices outside of sensor network (e.g., UAV)
33 Channel Capacity
- Channel state information
  - known at transmitter side, and at both sides
  - If channel state information is known at the transmit side, RF synchronization can be achieved
- Channels
  - AWGN and fading channels with unequal path loss
- General formula
34 Channel Capacity (cont'd)
- Receive diversity
- Transmit diversity
- Combined transmit-receive diversity
- RF synchronization
35 Comments
- Capacity is much higher if phase synchronization within transmitter and receiver clusters can be achieved
- Have investigated practical methods for satellite/ground sensor synchronization
- Beacons (e.g., GPS) can greatly simplify the synchronization problem for ground/ground cooperative communications
- Recent network capacity results do not take into account possibilities for cooperation by nodes as transmitter/receiver clusters
36 Capacity in Ad Hoc Networks
- Received signal power decays with distance, and transmission power is limited
  - Frequency re-use is possible; sophisticated antenna/MIMO systems improve the constant
- Nodes generate traffic, and can relay traffic from other nodes
  - If they did not generate traffic, then higher node density would imply greater network capability (improved re-use)
- All nodes alike
  - We will also relax this later
37 Transport Capacity of Wireless Networks
- n nodes within some fixed region A, with max radio range R, bandwidth W, generating data.
- Source-destination pairs are random; per-node transport capacity then falls off roughly as the inverse square root of n
38 Transport Capacity of Wireless Networks II
- Note this is achieved by using a simple relay strategy, one link at a time, without cooperation in transmission or reception (Gupta-Kumar), but the bad news continues even with optimal cooperation (Gastpar-Vetterli)
- The inverse square root of n behavior can be roughly explained by the average number of links in a path of a given length increasing, each of which must deal with more traffic to be carried, with the same bandwidth.
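The scaling can be tabulated with a short sketch. It assumes the Gupta-Kumar order result for random networks, a per-node throughput on the order of $W/\sqrt{n \log n}$; the constant factor is unknown, so the default `c=1.0` below is purely a placeholder and only the trend with n is meaningful.

```python
import math


def per_node_throughput(n, bandwidth, c=1.0):
    """Order-of-magnitude per-node throughput, W / sqrt(n log n).
    The constant c is a placeholder assumption, not a known value."""
    return c * bandwidth / math.sqrt(n * math.log(n))


def aggregate_throughput(n, bandwidth, c=1.0):
    """Total over all n nodes; grows only as sqrt(n / log n)."""
    return n * per_node_throughput(n, bandwidth, c)


for n in (10, 100, 1000, 10000):
    print(n, round(per_node_throughput(n, 1e6)),
          round(aggregate_throughput(n, 1e6)))
```

The total carried traffic still grows with n, but each node's share shrinks toward zero, which is the "bad news" when every node generates traffic for random destinations.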
39 Scaling in Ad Hoc Networks
- The only solution when everyone generates traffic is to add more resources as n increases
- Traditional approach: communication hierarchy where we add new resources at each layer
  - Each level is limited in numbers
  - Traffic is aggregated and carried on a set of trunks of increasing bandwidth and thus capacity
  - Higher levels are longer distance, also limiting latency
40 Scaling in Sensor Networks
- Elements not only generate traffic but can process data
- Do not necessarily want or need to send raw information to distant users with the same probability as near neighbors
- Key to scalability is to change the source-destination pair distribution to local communication (in the limit, most nodes in fact send nothing)
- Key to proof is to separately consider densities of sources, sensors and communication relays, and pose the problem as extraction of information to within a particular fidelity (rate distortion)
41 Scalability for Point Sources in Sensor Networks
- Cooperative rate distortion coding results in most communication being local; more nodes do not necessarily result in more traffic under a distortion criterion
- More relays reduce frequency re-use distance and thus interference; capacity can increase without bound
- Thus more nodes increase the likelihood of extracting information at the desired fidelity
42 Comments
- Number of bits a sensor reports is a complicated function of density
  - Low density: report nothing if SNR too low
  - High density: may need to report only decision
  - Moderate density: many nodes may need to locally cooperate with a mix of raw data and decision likelihoods
- Far away powerful sources will activate many nodes with similar SNR, but a small subset of nodes will be sufficient to make decisions
  - Design objective will be to minimize resources required to suppress node activity
43 Scalability for Distributed Sources
- To estimate parameters of a field (e.g., to get an isotherm map), information increases until the desired spatial sampling is achieved
- After this, extra nodes contribute no additional information, but can increase communication resources
- Image processing analogy: specify pixel size
- Parameters to describe the local field can be compact compared to raw data, for a given level of distortion
44 Practical Implementation
- Dense network: in a neighborhood, have a mix of nodes with different ranges, operating in separate bands
- Locally route towards the longer range links: they act as traffic attractors, causing the number of hops at any given layer to be small
- Cooperative communication among nodes would serve mainly to assure reliability of paths towards the next level of hierarchy
- Result is a (largely) standard overlay hierarchical network
- Any cross-layer optimization (e.g., joint source-channel coding) is confined to the local neighborhood, since this is where most of the resources are consumed in any scalable solution.
45 Network Synchronization
- Synchronism is needed for a wide set of purposes in sensor networks
  - Coordination of power down/up for energy savings
  - Time stamping of data
  - Coherent combining in communication or sensing (cooperative comm., fusion, position location)
- Traditional approaches assume receivers/processors always on, and provide the same precision everywhere by locking oscillators
- Sensor networks are different
  - Do not need the same level of synchronism at all times and everywhere
  - Do need to save energy
46 Synchronization and Rate Distortion
- Clocks are not explicitly locked; rather, record differences of time scales to allow explicit conversion.
- References are passed either on a schedule or on demand for post facto synchronization
- Frequency and precision of updates (the rate) depend on the local accuracy requirement $\Delta t_j$ (the time distortion)
- Would like to bound rate subject to accuracy requirements and acceptable delays in achieving synchronism
- Very similar issues for position localization
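Recording differences of time scales rather than locking clocks can be sketched as fitting a conversion between two free-running clocks from a few exchanged reference readings. This is a minimal illustration, assuming a linear relation (constant skew plus offset) between the clocks and a least-squares fit; the slides do not specify either, and the function names and numbers are hypothetical.

```python
def fit_clock_model(pairs):
    """Least-squares fit of t_b = skew * t_a + offset from reference
    pairs (t_a, t_b), each the two clocks' readings of one event."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    skew = cov_ab / var_a
    offset = mean_b - skew * mean_a
    return skew, offset


def convert(t_a, skew, offset):
    """Convert a timestamp from clock A's scale to clock B's scale."""
    return skew * t_a + offset


# Clock B starts 5 s ahead and runs 100 ppm fast (illustrative values).
references = [(0.0, 5.0), (10.0, 15.001), (20.0, 25.002)]
skew, offset = fit_clock_model(references)
print(skew, offset, convert(30.0, skew, offset))
```

More frequent or more precise reference exchanges tighten the fit, which is the rate versus time-distortion trade described above; timestamps recorded earlier can be converted after the fact once the model is known.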
47 Implications of Signal Locality
- Severe decay of signals with distance (second to fourth power)
  - Mutual information to source dominated by a small set of nodes
  - Cooperative communication clusters for ground to ground transmission will likely be small
- Implications
  - Local processing is good enough for many situations; do not need to convey raw data over long distances very frequently
  - Consequently, lowest layers of processing/network formation, etc. are the most important, since most frequently invoked (typical)
- Practical example
  - Specialized local transmission schemes (e.g., for forming ad hoc clusters), but long range might use conventional methods such as TCP/IP
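A rough calculation shows why a few nearby nodes dominate. The sketch below assumes a simple power-law path loss and treats each node's observation as a Gaussian channel, so the information it can contribute is $\frac{1}{2}\log_2(1 + \mathrm{SNR})$; the model, the reference SNR and the function names are illustrative assumptions, not taken from the slides.

```python
import math


def relative_power(distance, exponent, ref_distance=1.0):
    """Received power relative to that at the reference distance."""
    return (ref_distance / distance) ** exponent


def info_bits(distance, exponent, snr_at_ref):
    """Bits per observation for a node at `distance`, assuming a
    Gaussian channel with SNR scaled by the path loss."""
    snr = snr_at_ref * relative_power(distance, exponent)
    return 0.5 * math.log2(1 + snr)


# Fourth-power decay, SNR of 100 (20 dB) at unit distance.
for d in (1.0, 2.0, 5.0, 10.0):
    print(d, round(info_bits(d, 4, 100.0), 4))
```

With fourth-power decay a node ten times farther away contributes well under one percent of the information of the nearest node in this example, consistent with small clusters and a small number of useful helpers.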
48 Hierarchy in Sensor Networks
- For dealing with the network as a whole, the number of variations of topology is immense
  - Distributed algorithms exploiting locality of events
  - Use of ensembles for deriving bounds
- In between, consider layers of hierarchy, each of which may be amenable to a conventional optimization technique
49 Information Processing Hierarchy
Note the difficulty of fully separating networking, database and signal processing problems
[Figure: information processing hierarchy, with labels: transmit decision; human observer; beamforming; base station, high resolution processing; query for more information; high power, low false alarm rate, low duty cycle; low power, high false alarm rate, high duty cycle]
50 Some Research Challenges
- Minimal energy to obtain reliable decision in a distributed network
- Minimal (average) delay in conveying information through network
- Density and source separability trades
- Model uncertainty and methods for reducing its effects
  - How do we know that we don't know?
- Role of hierarchy: how much leads to what kinds of changes in information theoretic optimal behavior?
- At small scale can use brute force, at large scale can use ensembles; what can we do in between?
- Exploitation of signal locality: what is the spatial domain over which cross-layer optimization is useful?
51 References
- T. Cover and J. Thomas, Elements of Information Theory. Wiley, 1991.
- G. Pottie and W. Kaiser, "Wireless Integrated Network Sensors," Commun. ACM, May 2000.
- M. Ahmed, Y-S. Tu, and G. Pottie, "Cooperative detection and communication in wireless sensor networks," 38th Allerton Conf. on Comm., Control and Computing, Oct. 2000.
- Visit www.cens.ucla.edu technical reports section for a variety of related papers and theses