1
Multi-Terminal Information Theory Problems in
Sensor Networks
  • Gregory J Pottie
  • Professor, Electrical Engineering Department
  • Associate Dean, Research and Physical Resources
  • Deputy Director, NSF Center for Embedded
    Networked Sensing (CENS)
  • UCLA Henry Samueli School of Engineering and
    Applied Science
  • pottie@icsl.ucla.edu

2
Outline
  • Context and general issues
  • Basic tools of information theory
  • Multi-terminal information theory
  • Research domains
  • Data fusion
  • Cooperative communication
  • Sensor network scalability
  • Network synchronization
  • Distributed large-scale systems

3
Sensor Network Operation
[Diagram: sensor network nodes performing cooperative communication, data fusion, and routing]
Basic goal: detection/identification of point or
distributed sources subject to distortion
constraints, and timely notification of the end user
4
Basic Information Theoretic Concepts
  • Typical Sets (of sufficiently long sequences of
    i.i.d. variables)
  • The typical set has probability nearly 1
  • Its elements are nearly equally probable (each with
    probability about 2^{-nH})
  • The number of elements is nearly 2^{nH}

[Block diagram: message W → source/channel encoder → X^n → channel p(y|x) → Y^n → decoder]
  • Aim of communications system
  • Minimize errors due to noise in channel
  • Maximize data rate
  • Minimize bandwidth and power (the resources)
  • Shannon Capacity establishes the fundamental
    limits

5
Jointly Typical Sequences
[Diagram: input sequences X_1^n, X_2^n in the typical set of X^n mapping to output regions in Y^n]
The output set is in general larger due to additive
noise; the output images of inputs may overlap due to
noise.
6
Basic Information Theoretic Concepts
[Block diagram: message W → source/channel encoder → X^n → channel p(y|x) → Y^n → decoder]
  • Capacity C is the maximum mutual information I(X;Y)
    with respect to p(x); that is, choose the input
    distribution leading to the largest mutual information.
  • Capacity C is the largest rate at which
    information can be transmitted without error
  • Jointly typical set: from among the typical input
    and output sequences, choose the ones for which
    -(1/n) log p(x^n, y^n) is close to H(X,Y)
  • The size of the jointly typical set is about 2^{nI(X;Y)};
    thus there are about this number of
    distinguishable signals (codewords) in X^n
  • These codewords necessarily contain
    redundancy: the size of the set is smaller than the
    alphabet would imply. Properly chosen sequences
    provide better performance than isolated symbols.

7
Gaussian Channel Capacity
  • Discrete-time inputs to the channel; the channel adds
    noise with a Gaussian distribution (zero mean,
    variance N)
  • Input sequence (codeword) power set to P
  • Capacity is the maximum I(X;Y) over p(x) such that
    E[X^2] satisfies the power constraint
  • C = 1/2 log(1 + P/N) bits per transmission.
  • The more usual form is to consider a channel of
    bandwidth W and noise power spectral density N_0.
    Then C = W log(1 + P/(N_0 W)) bits per second.
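As a quick numerical illustration, a minimal Python sketch of the two
capacity formulas above (the power, noise, and bandwidth values are
arbitrary examples, not values from the slides):

    import math

    def capacity_per_use(P, N):
        """Gaussian channel capacity in bits per transmission: C = 1/2 log2(1 + P/N)."""
        return 0.5 * math.log2(1 + P / N)

    def capacity_bandlimited(P, W, N0):
        """Band-limited AWGN capacity in bits per second: C = W log2(1 + P/(N0*W))."""
        return W * math.log2(1 + P / (N0 * W))

    # Example values (illustrative only): 10 dB SNR per use, and a
    # 1 MHz channel with N0 = 1e-9 W/Hz and P = 10 mW.
    print(capacity_per_use(P=10.0, N=1.0))               # about 1.73 bits/transmission
    print(capacity_bandlimited(P=0.01, W=1e6, N0=1e-9))  # about 3.46 Mbit/s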

8
Rate Distortion Lossy Source Coding
  • The rate distortion function R(D) can be interpreted
    as
  • The minimum rate at which a source can be
    represented subject to a distortion constraint
    E[d(X,Y)] <= D
  • The minimum distortion that can be achieved given
    a maximum rate constraint R
  • Interesting dual results to capacity
  • Spend coding effort on the distortion-typical set;
    the rest are don't cares
  • Applies to compression of real-valued sequences

[Plot: rate R versus distortion D; the achievable region lies above the R(D) curve]
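A concrete closed-form case is the memoryless Gaussian source with
squared-error distortion, R(D) = max(0, 1/2 log2(sigma^2/D)); a minimal
sketch, with an example variance chosen arbitrarily:

    import math

    def gaussian_rate_distortion(variance, D):
        """R(D) for a memoryless Gaussian source under squared-error distortion."""
        if D >= variance:
            return 0.0  # reproducing the mean already meets the distortion target
        return 0.5 * math.log2(variance / D)

    sigma2 = 4.0  # illustrative source variance
    for D in (4.0, 1.0, 0.25, 0.01):
        print(f"D={D}: R(D) = {gaussian_rate_distortion(sigma2, D):.2f} bits/sample")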
9
Universal Source Coding
  • Divide sequence into distortion-typical
    (interesting) and distortion-atypical
    (uninteresting) sets
  • The index for the distortion-typical set is of small
    length and consumes our coding effort; the atypical
    set is large, but its coding scheme is not critical
  • Requires a systematic means of classifying sequences
    as typical (a promotion mechanism and a distance
    measure)
  • Gold-washing algorithm: maintain the typical set, plus
    candidates

[Diagram: sequence space partitioned into the distortion-typical set and the atypical set]
10
Source/Channel Coding Separation
  • For single link, separately performing source and
    channel coding achieves optimal rates
  • Separate optimization greatly reduces theoretical
    complexity
  • Classes of codes have been identified that get
    very close to respective Shannon limits
  • Joint source/channel coding can reduce latency or
    overall complexity, but is infrequently used since
    it is application-specific

11
Multi-Terminal Information Theory
  • The preceding discussion assumed a single
    transmitter and receiver
  • Multi-terminal information theory considers
    maximization of mutual information for the
    following possibilities
  • Multiple senders and one receiver (the multiple
    access channel)
  • One sender and multiple receivers (the broadcast
    channel)
  • One sender and one receiver, but intervening
    transducers that can assist (the relay channel)
  • Composite combinations of these basic types
  • Bayes estimation also aims to maximize mutual
    information, except the senders do not cooperate
    and usually there is a fidelity constraint
  • One sender and multiple receivers (the data
    fusion problem)
  • Multiple senders and receivers (the source
    separation problem)
  • Delay and resource usage may also be included

12
Gaussian Multiple Access Channel
  • m transmitters, each with power P, sharing the same
    noisy channel
  • C(P/N) = 1/2 log(1 + P/N) bits per channel use for an
    isolated sender
  • The achievable rate region is then: for every subset S
    of senders, sum_{i in S} R_i <= C(|S| P / N)
  • The last inequality (the sum-rate bound over all m
    senders) dominates when the rates are the same
  • Capacity increases with more users (there is more
    power)
  • The result is dual to Slepian-Wolf encoding of
    correlated sources
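A small numerical sketch of this region under the slide's equal-power
assumption: the sum capacity C(mP/N) grows with m, while the symmetric
per-user rate C(mP/N)/m shrinks (the power values are arbitrary examples).

    import math

    def C(snr):
        """Gaussian capacity function C(x) = 1/2 log2(1 + x), in bits per channel use."""
        return 0.5 * math.log2(1 + snr)

    def mac_rates(m, P, N):
        """Sum capacity and symmetric per-user rate for an m-user Gaussian MAC."""
        sum_cap = C(m * P / N)
        return sum_cap, sum_cap / m

    P, N = 1.0, 1.0  # illustrative powers
    for m in (1, 2, 4, 8, 16):
        total, per_user = mac_rates(m, P, N)
        print(f"m={m:2d}: sum capacity {total:.2f} b/use, per user {per_user:.3f} b/use")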

13
Gaussian Broadcast Channel
  • One sender of power P and two receivers, one with
    noise N1 and one with noise N2, N1 < N2
  • The two codebooks are coordinated to exploit
    commonality of the information transmitted; otherwise
    capacity does not exceed simple multiplexing
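The standard way to realize such coordinated codebooks on the degraded
Gaussian broadcast channel is superposition coding; a minimal sketch
tracing the rate region by sweeping the power split alpha (the power and
noise values are illustrative, not from the slides):

    import math

    def C(snr):
        return 0.5 * math.log2(1 + snr)

    def bc_rate_pair(alpha, P, N1, N2):
        """Superposition-coding rate pair for a degraded Gaussian broadcast
        channel (N1 < N2): the strong user (noise N1) gets power alpha*P; the
        weak user (noise N2) treats the strong user's signal as interference."""
        R1 = C(alpha * P / N1)
        R2 = C((1 - alpha) * P / (alpha * P + N2))
        return R1, R2

    P, N1, N2 = 10.0, 1.0, 4.0  # illustrative values
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        R1, R2 = bc_rate_pair(alpha, P, N1, N2)
        print(f"alpha={alpha}: R1={R1:.2f}, R2={R2:.2f} b/use")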

14
Relay Channel
  • One sender, one relay, and one receiver; the relay
    transmits X1 based only on its observations Y1

[Diagram: sender X → relay (receives Y1, transmits X1) → receiver Y]
  • Combines a broadcast channel and a multiple
    access channel
  • Networks are composed of multiple relay channels
    that may further induce delay

15
General Multi-Terminal Networks
  • m nodes, where node j has an associated transmission
    variable X(j) and receive variable Y(j)
  • Node 1 transmits to node m; what is the maximum
    achievable rate?

[Diagram: network of m nodes, from node 1 with (X1,Y1) to node m with (Xm,Ym)]
  • Bounds are derived from information flow across
    multiple cut sets
  • The bounds are generally not achievable
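For reference, a standard form of this cut-set bound (as in Cover and
Thomas) for the rate R from node 1 to node m is

    R \le \max_{p(x^{(1)},\ldots,x^{(m)})} \;
          \min_{S:\, 1 \in S,\; m \notin S} \;
          I\!\left(X^{(S)};\, Y^{(S^c)} \,\middle|\, X^{(S^c)}\right)

where S ranges over cuts separating node 1 from node m.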

16
Costs of Source-Channel Separation
  • Source-channel coding separation theorem fails
    because capacity of multiple access channels
    increases with correlation, while source encoding
    eliminates correlation
  • Greatly complicates the search for optimal codes;
    raises the question of whether joint coding would be
    worth it
  • Gastpar has considered asymptotic cost of
    separate rate-distortion and channel coding
  • Compare
  • Network rate-distortion coding, followed by
    cooperative transmission
  • Joint rate-distortion and channel coding
  • Potentially exponentially better performance for
    joint source and channel coding, in the limit as the
    number of nodes n observing a Gaussian source
    with comparable SNR goes to infinity.
  • A bound, not a prescription for how to do this!

17
Now let it move
  • Nodes move within a bounded region according to
    some random distribution; what is the capacity
    subject to an energy constraint on messages?

[Diagram: nodes 1 through m at positions changing between time 1 and time 2]
  • The answer depends on the delay constraint: eventually
    the nodes will collide, implying near-zero path loss
    and thus unbounded capacity
  • Other questions
  • Probability that the nodes have a connecting path of
    the required rate
  • Probability of a message arriving within the required delay

18
Some Recent Research for Sensor Networks
  • Data fusion in sensor networks
  • N-helper problem
  • Cooperative communications in sensor networks
  • Scalability of sensor networks
  • Sensing for distributed sources
  • Network synchronization and rate distortion
  • Systems design

19
General Assumptions
  • The objective of the network is to solve some (multi-)
    hypothesis problem, subject to a set of fidelity
    criteria, and convey the result to some end-user,
    subject to resource constraints
  • Consequence: fidelity criteria and resource
    constraints allow meaningful optimization
    questions to be posed
  • Communication is more costly than signal
    processing
  • Consequence: long-distance communication is to be
    avoided, if possible
  • Justification: Shannon capacity and Maxwell's
    equations are fundamental; signal-processing power
    cost follows Moore's Law
  • Signals decay with distance of propagation
  • Consequence: local distributed algorithms become
    feasible
  • Justification: true for all natural propagation
    media

20
Rate Distortion and Data Fusion
  • Can identify resource use (energy/number of bits
    transmitted) with rate, and decision reliability
    (false alarm rate, missed detection probability) with
    distortion
  • Operate at different points on the rate distortion
    curve depending on the values of the cost function
  • The location of the fusion center, numerical resolution,
    number of sensors, length of records, routing, and
    distribution of processing all affect R(D)

21
A Simple Algorithm
  • Nodes are activated to send requests for information
    from other nodes based on SNR
  • If above a threshold T, the decision is reliable, and
    the node suppresses activity by its neighbors
  • Otherwise, increase the likelihood of requesting help
    based on proximity to T
  • In all likelihood, the higher-SNR nodes form the cluster
  • Bits of resolution are related to SNR (e.g., for use
    in maximal ratio combining); a code sketch follows the
    figure below

[Diagram legend: (1) a high-SNR node initiates; (2) neighbors are activated and request further information; (3) nodes whose SNR is too low do not respond]
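One hypothetical reading of this algorithm as code (the threshold, margin,
and request-probability rule are assumptions made purely for illustration):

    import random

    def node_action(snr_db, threshold_db=10.0, margin_db=5.0):
        """Decide a node's action after a local detection, following the outline above.

        - SNR at or above the threshold T: decide locally and suppress neighbors.
        - SNR within margin_db below T: request help, with probability growing
          as the SNR approaches T.
        - Otherwise: stay quiet."""
        if snr_db >= threshold_db:
            return "decide_and_suppress_neighbors"
        shortfall = threshold_db - snr_db
        if shortfall <= margin_db:
            p_request = 1.0 - shortfall / margin_db  # closer to T => more likely to ask
            if random.random() < p_request:
                return "request_help_from_neighbors"
        return "stay_quiet"

    for snr in (14.0, 9.0, 7.0, 2.0):
        print(snr, node_action(snr))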
22
Optimal Fusion and Information Theory
  • The Bayes estimator maximizes the likelihood F_X(x|z),
    where x is the state of nature and z is the set
    of observables.
  • Define Z^r = {z(1), z(2), ..., z(r)}, the set of observations
    up to time r; the recursive form of the estimator is
    F(x|Z^r) ∝ F(z(r)|x, Z^{r-1}) F(x|Z^{r-1})
  • A variety of classical estimators then maximize
    the likelihoods based on particular assumptions
    regarding the priors
  • Fusion is typically a weighted combination of
    likelihoods to produce a decision; as sensors may
    be very different, there is a question of the optimal
    weighting scheme

23
Likelihood Opinion Pool
[Diagram: sensors 1 through n supply likelihoods F(Z_1^r|x), F(Z_2^r|x), ..., F(Z_n^r|x); these are pooled (product) with the prior F(x) to yield the posterior F(x|Z^r)]
The hard part: determination of the various
likelihoods
24
Likelihood Opinion Pool
  • Combine using the recursive rule
  • Taking logarithms on each side, followed by
    expectations, one obtains
  • This can be interpreted as posterior information =
    prior information + observation information; thus we
    can deal in summations of mutual information
    obtained from different sensor types (e.g., video
    plus audio).

25
Designing for Detection
  • In digital communications, we choose the modulation for
    ease of estimation of decision variables and
    subsequent selection of the most likely signal
    (hypothesis); we design signals for separability
  • In sensor networks, we have no control over nature,
    but we can control
  • Density and locations of sensors
  • Sensor types
  • These can be manipulated in the same way, given a
    fusion strategy, to ease signal separability or
    achieve Nyquist sampling of source features.
  • This can also be done adaptively as we learn more
    about the sources and the propagation environment
    (in general, reduce model uncertainty)
  • Add sensors, and/or change types (e.g., new
    deployment)
  • Move sensors
  • Articulate directional elements

26
Networked Info-Mechanical Systems
27
The n-helper Gaussian Scenario
[Diagram: main source node X and helper nodes Y1, Y2, Y3, ..., Yn, all reporting to a gateway/fusion center]
  • Multiple sensors observe event and generate
    correlated Gaussian data. One data node (X) is
    the main data source (e.g. closest to
    phenomenon), and the n additional nodes (Y1 - Yn)
    are the helpers.
  • The problem: what codes and data rates allow the
    gateway/data-fusion center to reproduce the data
    from the main node, using the remaining nodes as
    sources of partial side information, subject to
    some distortion criterion?

28
Main Result
  • We do not care about reproduction of the Y
    variables; rather, they act as helpers to
    reproduce X
  • This problem was previously solved for the 2-node
    case
  • Key to the extension: treat Y_k | Y_{k-1}, ..., X as a
    single new helper P_k.
  • Our solution: for an admissible rate
    (R_x, R_1, ..., R_n), and for some D_i > 0, the n-helper
    system data rates can be fused to yield an
    effective data rate (with respect to source X) satisfying
    the following rate distortion bound
  • where σ² is the variance and ρ is the correlation
    (straightforward but tedious to calculate as n
    increases).

29
Comments
  • Other source distributions are analytically
    difficult, but many are likely to be convex
    optimizations
  • A generalization would consider instances of
    relay/broadcast channels in conveying information
    to the fusion center with minimum energy
  • Many sensor network detection problems are
    inherently local; even though the expression may be
    complicated, the number of helpers will usually
    be small due to the decay of signals as a power of
    distance
  • Numerical results for Gaussian sources indicate a
    small number of helpers leads to significant
    improvement, with rapidly diminishing returns after
    four or so for typical propagation conditions.
  • This suggests that source/channel coding separation
    might in fact be good enough for many practical
    situations (especially above the local
    interaction)

30
Problem Definition of Cooperative Communication
  • Many low-power and low-cost wireless sensors
    cooperate with each other to achieve more
    reliable and higher rate communications
  • The dominant constraint is the peak power, the
    bandwidth is not the main concern
  • Multiplexing (FDMA, TDMA, CDMA, OFDM) is the
    standard approach. Each sensor has an unique
    channel
  • We focus on schemes where multiple sensors occupy
    the same channel

31
Example Space-Time Coding
  • N transmit antennas and N receive antennas
  • The channel transition matrix displays independent
    Rayleigh (complex Gaussian) fading in each
    component
  • With properly designed codes, capacity is N times
    that of a single Rayleigh channel
  • Note this implicitly assumes synchronization
    among the Tx and Rx array elements, which requires
    special effort in sensor networks
  • A coordinated transmission, not a multiple access
    situation.
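The linear-in-N capacity growth can be checked numerically; a sketch
(NumPy, assuming an i.i.d. complex Gaussian channel known at the receiver
and equal power per transmit antenna) that estimates the ergodic capacity
E[log2 det(I + (SNR/N) H H^H)]:

    import numpy as np

    def ergodic_mimo_capacity(n, snr, trials=2000, seed=0):
        """Monte Carlo estimate of ergodic capacity (bits per channel use) for an
        n x n i.i.d. Rayleigh channel with equal power per transmit antenna."""
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(trials):
            H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
            M = np.eye(n) + (snr / n) * H @ H.conj().T
            total += np.log2(np.linalg.det(M).real)
        return total / trials

    for n in (1, 2, 4, 8):
        print(n, round(ergodic_mimo_capacity(n, snr=10.0), 2))  # grows roughly linearly in n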

32
Context
  • The cooperative reception problem is very similar to the
    multi-node fusion problem: the same initiation
    procedure is required to create the cluster; however,
    we can choose the channel code.
  • Cooperative transmission and reception is similar to
    multi-target, multi-node fusion, but more can be
    done: beacons, space-time coding
  • Use it to overcome gaps in the network, or to communicate
    with devices outside of the sensor network (e.g. a UAV)

33
Channel Capacity
  • Channel state information
  • known at transmitter side, and at both sides
  • If channel state information is known at the
    transmit side, RF synchronization can be achieved
  • Channels
  • AWGN and fading channels with unequal path loss
  • General formula

34
Channel Capacity(contd)
  • Receive diversity
  • Transmit diversity
  • Combined transmit-receive diversity
  • RF synchronization

35
Comments
  • Capacity is much higher if phase synchronization
    within transmitter and receiver clusters can be
    achieved
  • Have investigated practical methods for
    satellite-to-ground sensor synchronization
  • Beacons (e.g. GPS) can greatly simplify the
    synchronization problem for ground/ground
    cooperative communications
  • Recent network capacity results do not take into
    account possibilities for cooperation by nodes as
    transmitter/receiver clusters

36
Capacity in Ad Hoc Networks
  • Received signal power decays with distance, and
    transmission power is limited
  • Frequency re-use is possible; sophisticated
    antenna/MIMO systems improve the constant
  • Nodes generate traffic, and can relay traffic
    from other nodes
  • If nodes did not generate traffic, then higher node
    density would imply greater network capability
    (improved re-use)
  • All nodes are alike
  • We will also relax this later

37
Transport Capacity of Wireless Networks
  • n nodes within some fixed region A, with maximum
    radio range R and bandwidth W, generating data.
  • Source-destination pairs are chosen at random; the
    per-node transport capacity is then Θ(W/√(n log n))
    bits per second (Gupta-Kumar)

38
Transport Capacity of Wireless Networks II
  • Note this is achieved by using a simple relay
    strategy, one link at a time, without cooperation
    in transmission or reception (Gupta-Kumar); the
    bad news continues even with optimal cooperation
    (Gastpar-Vetterli)
  • The inverse-square-root-of-n behavior can be
    roughly explained as follows: the average number of
    links in a path of a given length increases, and
    each link must carry more traffic with the same
    bandwidth.
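To make the trend concrete, a tiny sketch tabulating the Θ(W/√(n log n))
per-node throughput (constants set to 1 for illustration; only the scaling
matters):

    import math

    def per_node_throughput(n, W=1.0):
        """Order-of-magnitude per-node throughput for random source-destination
        pairs in a fixed region (Gupta-Kumar scaling); constants are illustrative."""
        return W / math.sqrt(n * math.log(n))

    for n in (10, 100, 1000, 10000):
        print(f"n={n:6d}: per-node throughput ~ {per_node_throughput(n):.4f} x W")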

39
Scaling in Ad Hoc Networks
  • The only solution when everyone generates traffic
    is to add more resources as n increases
  • Traditional approach: a communication hierarchy
    where we add new resources at each layer
  • Each level is limited in numbers
  • Traffic is aggregated and carried on a set of
    trunks of increasing bandwidth and thus capacity
  • Higher levels are longer distance, also limiting
    latency

40
Scaling in Sensor Networks
  • Elements not only generate traffic but can
    process data
  • Do not necessarily want or need to send raw
    information to distant users with same
    probability as near neighbors
  • The key to scalability is to change the
    source-destination pair distribution to local
    communication (in the limit, most nodes in fact send
    nothing)
  • The key to the proof is to separately consider the
    densities of sources, sensors, and communication relays,
    and to pose the problem as extraction of information to
    within a particular fidelity (rate distortion)

41
Scalability for Point Sources in Sensor Networks
  • Cooperative rate distortion coding results in
    most communication being local; more nodes do not
    necessarily result in more traffic under a
    distortion criterion
  • More relays reduce the frequency re-use distance and
    thus interference; capacity can increase without
    bound
  • Thus more nodes increase the likelihood of extracting
    information at the desired fidelity

42
Comments
  • The number of bits a sensor reports is a complicated
    function of density
  • Low density: report nothing if the SNR is too low
  • High density: may need to report only a decision
  • Moderate density: many nodes may need to cooperate
    locally with a mix of raw data and decision
    likelihoods
  • Faraway powerful sources will activate many
    nodes with similar SNR, but a small subset of
    nodes will be sufficient to make decisions
  • The design objective will be to minimize the resources
    required to suppress node activity

43
Scalability for Distributed Sources
  • To estimate the parameters of a field (e.g., to get an
    isotherm map), information increases until the
    desired spatial sampling is achieved
  • After this, extra nodes contribute no additional
    information, but can increase communication
    resource use
  • Image processing analogy: specify the pixel size
  • The parameters describing the local field can be
    compact compared to the raw data, for a given level
    of distortion

44
Practical Implementation
  • In a dense network neighborhood, have a mix of nodes
    with different ranges, operating in separate
    bands
  • Locally route towards the longer-range links;
    they act as traffic attractors, causing the number of
    hops at any given layer to be small
  • Cooperative communication among nodes would serve
    mainly to assure reliability of paths towards
    next level of hierarchy
  • Result is a (largely) standard overlay
    hierarchical network
  • Any cross-layer optimization (e.g., joint
    source-channel coding) is confined to the local
    neighborhood, since this is where most of the
    resources are consumed in any scalable solution.

45
Network Synchronization
  • Synchronism is needed for a wide set of purposes in
    sensor networks
  • Coordination of power down/up for energy savings
  • Time stamping of data
  • Coherent combining in communication or sensing
    (cooperative comm., fusion, position location)
  • Traditional approaches assume receivers/processors are
    always on, and provide the same precision everywhere
    by locking oscillators
  • Sensor networks are different
  • Do not need same level of synchronism at all
    times and everywhere
  • Do need to save energy

46
Synchronization and Rate Distortion
  • Clocks are not explicitly locked; rather, differences
    between time scales are recorded to allow explicit
    conversion.
  • References are passed either on a schedule or on
    demand for post facto synchronization
  • The frequency and precision of updates (the rate)
    depend on the local accuracy requirement D_tj (the
    time distortion)
  • We would like to bound the rate subject to accuracy
    requirements and acceptable delays in achieving
    synchronism
  • Very similar issues arise for position localization
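As a purely illustrative sketch of recording time-scale differences rather
than locking clocks (the linear skew/offset clock model and least-squares
fit below are assumptions, not the method on the slide): a few reference
exchanges suffice to convert local timestamps post facto.

    def fit_clock_conversion(local_times, reference_times):
        """Least-squares fit of t_ref ~= a * t_local + b from paired timestamps,
        assuming a linear clock model (skew a, offset b)."""
        n = len(local_times)
        mx = sum(local_times) / n
        my = sum(reference_times) / n
        sxx = sum((x - mx) ** 2 for x in local_times)
        sxy = sum((x - mx) * (y - my) for x, y in zip(local_times, reference_times))
        a = sxy / sxx
        b = my - a * mx
        return a, b

    # Made-up example: reference runs 50 ppm fast relative to the local clock,
    # and the local clock starts 2.5 s behind the reference.
    local = [0.0, 10.0, 20.0, 30.0]
    ref = [2.5 + t * 1.00005 for t in local]
    a, b = fit_clock_conversion(local, ref)
    print(f"t_ref ~= {a:.6f} * t_local + {b:.3f}")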

47
Implications of Signal Locality
  • Severe decay of signals with distance (second to
    fourth power)
  • Mutual information with the source is dominated by a
    small set of nodes
  • Cooperative communication clusters for ground-to-ground
    transmission will likely be small
  • Implications
  • Local processing is good enough for many
    situations; we do not need to convey raw data over
    long distances very frequently
  • Consequently, the lowest layers of processing, network
    formation, etc. are the most important, since they are
    the most frequently invoked (typical)
  • Practical example
  • Specialized local transmission schemes (e.g., for
    forming ad hoc clusters), but long range might
    use conventional methods such as TCP/IP

48
Hierarchy in Sensor Networks
  • For dealing with the network as a whole, the number
    of variations of topology is immense
  • Distributed algorithms exploiting the locality of
    events
  • Use of ensembles for deriving bounds
  • In between, consider layers of hierarchy, each
    of which may be amenable to a conventional
    optimization technique

49
Information Processing Hierarchy
Note the difficulty of fully separating networking,
database, and signal processing problems.
[Diagram of the processing hierarchy: low-power, high-false-alarm-rate, high-duty-cycle nodes at the bottom; beamforming and queries for more information in the middle; a base station with high-resolution processing (high power, low false alarm rate, low duty cycle) above; decisions transmitted to a human observer at the top]
50
Some Research Challenges
  • Minimal energy to obtain a reliable decision in a
    distributed network
  • Minimal (average) delay in conveying information
    through the network
  • Density and source-separability trade-offs
  • Model uncertainty and methods for reducing its
    effects
  • how do we know that we don't know?
  • Role of hierarchy: how much leads to what kinds
    of changes in information-theoretic optimal
    behavior?
  • At small scale we can use brute force, at large
    scale we can use ensembles; what can we do in
    between?
  • Exploitation of signal locality: what is the
    spatial domain over which cross-layer
    optimization is useful?

51
References
  • T. Cover and J. Thomas, Elements of Information
    Theory. Wiley, 1991.
  • G. Pottie and W. Kaiser, "Wireless Integrated
    Network Sensors," Commun. ACM, May 2000.
  • M. Ahmed, Y-S. Tu, and G. Pottie, "Cooperative
    detection and communication in wireless sensor
    networks," 38th Allerton Conf. on Comm., Control
    and Computing, Oct. 2000.
  • Visit the www.cens.ucla.edu technical reports section
    for a variety of related papers and theses