Title: Phase Transitions and Statistics Gathering for Sensor Networks


1
Phase Transitions and Statistics Gathering for Sensor Networks
Ashish Goel, Stanford University
Joint work with Sanatan Rai and Bhaskar Krishnamachari; Enachescu, Govindan, and Motwani
http://www.stanford.edu/~ashishg
2
Sensor Networks
  • Sensors often reside on the Euclidean plane
  • Whether two sensors can communicate is determined
    largely by their Euclidean distance
  • Often, laid out regularly
  • Grids, random geometric graphs are reasonable
    models
  • Big advantage over general networks in algorithm
    design and analysis
  • This talk:
  • Phase transitions in geometric random graphs
  • Statistics gathering over grids

3
Geometric Random Graphs
  • G(n, r) in d dimensions
  • n points uniformly distributed in [0,1]^d
  • Two points are connected if their Euclidean
    distance is less than r
  • Sensor networks can often be modeled as G(n, r)
    with d = 2
  • E.g., sensors sprinkled from a helicopter over a
    corn field
  • The wireless radius corresponds to r
  • Question: How should n and r be chosen to ensure
    that G(n, r) has a desirable property P (e.g.,
    connectivity, 2-connectivity, large cliques) with
    high probability?
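As a concrete illustration (our sketch, not part of the talk; all helper names are ours), this is one way to sample G(n, r) with d = 2 and test a monotone property such as connectivity:

```python
import math
import random

def sample_gnr(n, r, d=2):
    # n uniform points in [0,1]^d; edge whenever Euclidean distance < r
    pts = [[random.random() for _ in range(d)] for _ in range(n)]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pts[i], pts[j]) < r]
    return pts, edges

def is_connected(n, edges):
    # depth-first search from node 0 over an adjacency list
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, stack = {0}, [0]
    while stack:
        for k in adj[stack.pop()]:
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return len(seen) == n
```

Sweeping r for fixed n and plotting the fraction of connected samples exhibits exactly the kind of phase transition discussed in the next slides.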

4
[Figure: a point X in the unit square [0,1]^2, with a disc of radius r around it]
Any other point Y is a neighbor of X with probability πr²
Expected degree of X is πr²(n − 1)
5
Thresholds for monotone properties?
  • A graph property P is monotone if, for all graphs
    G1(V, E1) and G2(V, E2) such that E1 ⊆ E2,
  • G1 satisfies P ⇒ G2 satisfies P
  • Informally, addition of edges preserves P
  • Examples: connectivity, Hamiltonicity, bounded
    diameter, expansion, minimum degree ≥ k, existence of
    minors, k-connectivity, ...
  • Folklore conjecture: all monotone properties have
    sharp thresholds for geometric random graphs
  • Also, simulation evidence for several properties
  • [Krishnamachari, Bejar, Wicker, Pearlman '02]
  • [McColm '03]

6
  • To quote Bollobás (via Krishnamachari et al.):
  • "One of the main aims of the theory of random
    graphs is to determine when a given property is
    likely to appear."

7
Example: Connectivity
  • Define c(n) such that π·c(n)² = log n/n
  • Asymptotically, when d = 2:
  • G(n, c(n)) is disconnected with high probability
  • For any ε > 0, G(n, (1+ε)c(n)) is connected whp
  • So, c(n) is a sharp threshold for
    connectivity at d = 2 [Gupta and Kumar '98; Penrose
    '97]
  • Similar thresholds exist for all dimensions
  • c_d(n) ≈ (log n/(nV_d))^(1/d), where V_d is the volume
    of the unit ball in d dimensions
  • Average degree ≈ log n at the threshold
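A quick numerical check (our sketch; `unit_ball_volume` and `c_d` are our helper names) that the average degree n·V_d·c_d(n)^d at the threshold equals log n:

```python
import math

def unit_ball_volume(d):
    # V_d = pi^(d/2) / Gamma(d/2 + 1)
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def c_d(n, d):
    # c_d(n) = (log n / (n * V_d))^(1/d), the threshold from the slide above
    return (math.log(n) / (n * unit_ball_volume(d))) ** (1 / d)

n, d = 100_000, 2
r = c_d(n, d)
# expected degree at the threshold: n * V_d * r^d, which equals log n
print(r, n * unit_ball_volume(d) * r ** d, math.log(n))
```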

8
Width and sharp thresholds
  • For a property P and 0 < ε < 1, if there exist two
    functions L(n) and U(n) such that
  • Pr[G(n, L(n)) satisfies P] ≤ ε, and
  • Pr[G(n, U(n)) satisfies P] ≥ 1 − ε,
  • then the ε-width w_ε(n) of P is defined as the
    smallest such difference U(n) − L(n)
  • If w_ε(n) = o(1) for all ε, then P is said to have
    a sharp threshold
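For reference, the same definition in display form (our transcription of the bullets above):

```latex
w_\varepsilon(n) \;=\; \inf\Bigl\{\, U(n) - L(n) \;:\;
  \Pr[G(n, L(n)) \text{ satisfies } P] \le \varepsilon
  \;\text{ and }\;
  \Pr[G(n, U(n)) \text{ satisfies } P] \ge 1 - \varepsilon \,\Bigr\}
```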

9
Example
[Figure: Pr[G(n, r) satisfies P] plotted against r, rising from ε to 1 − ε;
the ε-width is the range of r between these two levels]
10
Connections (?) to Bernoulli Random Graphs
  • Famous graph family G(n, p)
  • Popular model for general random graphs
  • Edges are i.i.d.: each edge is present with probability
    p
  • Connectivity threshold is p(n) = log n/n
  • Average degree is exactly the same as that of
    geometric random graphs at their connectivity
    threshold!!
  • All monotone properties have ε-width O(1/log n)
    for any fixed ε in the Bernoulli graph model
  • [Friedgut and Kalai '96]
  • Cannot be improved beyond O(1/log² n)
  • Almost matched by [Bourgain and Kalai '97]
  • The proof relies heavily on the independence of edges
  • There is no edge independence in geometric random
    graphs ⇒ we need new techniques

11
Our results
[Goel, Rai, Krishnamachari, Ann. Appl. Prob. 2005]
  • The ε-width of any monotone property is
    O(c_d(n)) for d ≥ 3, and O(c_2(n) log^(1/4) n) for d = 2
  • Sharp thresholds in the geometric random graph
    model
  • A sharper transition (inverse polynomial width)
    than Bernoulli random graphs (inverse logarithmic
    width)
  • There exist monotone properties with width
    Ω(sqrt(log(1/ε))/sqrt(n)) for d = 1 and Ω(1/n^(1/d))
    for d ≥ 2
  • Tight for d = 1, sub-logarithmic gap for d > 1

12
Why c_d(n)?
  • Why express results in terms of c_d(n)?
  • Width gives a sharp additive threshold
  • We are typically interested in properties that
    subsume connectivity
  • For such properties, an additive threshold in
    terms of c_d(n) also corresponds to a
    multiplicative threshold
  • The exact sharpness of the multiplicative
    threshold depends on L(n) and on the exact
    additive bounds

13
Bottleneck Matchings
  • Draw n blue points and n red points uniformly
    and independently from [0,1]^d
  • B and R denote the sets of blue and red points,
    respectively
  • A minimum bottleneck matching between R and B is
    a one-one mapping
  • f: B → R
  • which minimizes
  • max_{u ∈ B} ||f(u) − u||₂
  • The corresponding distance, max_{u ∈ B} ||f(u) − u||₂,
    is called the minimum bottleneck distance
  • Let X_n denote this minimum bottleneck distance
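The upcoming width theorem reduces all monotone properties to this single quantity, so measuring X_n by simulation is natural. A sketch (ours, with hypothetical names) that computes X_n exactly for small n, binary-searching the candidate distances and testing each for a perfect matching with SciPy's bipartite matching routine:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def bottleneck_distance(blue, red):
    # Exact minimum bottleneck distance X_n between two (n, d) point arrays:
    # find the smallest g such that the bipartite graph with edges
    # {(u, v) : ||u - v|| <= g} admits a perfect matching.
    dist = np.linalg.norm(blue[:, None, :] - red[None, :, :], axis=2)
    cand = np.unique(dist)
    lo, hi = 0, len(cand) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        graph = csr_matrix((dist <= cand[mid]).astype(int))
        if (maximum_bipartite_matching(graph, perm_type='column') >= 0).all():
            hi = mid  # perfect matching exists at this distance
        else:
            lo = mid + 1
    return cand[lo]

rng = np.random.default_rng(0)
print(bottleneck_distance(rng.random((50, 2)), rng.random((50, 2))))
```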

14
Example
[Figure: n blue and n red points in the unit square, matched one to one;
the longest matched edge has length γ, the bottleneck distance]
15
Bottleneck Matchings and Width
  • Theorem: If Pr[X_n > γ] ≤ p, then the sqrt(p)-width
    of any monotone property is at most 2γ
  • Implication: we can analyze just one quantity, X_n,
    as opposed to all monotone properties (in
    particular, we can provide simulation-based
    evidence)
  • Proof: Let P be any monotone property
  • Let ε = sqrt(p)
  • Choose L(n) such that Pr[G(n, L(n)) satisfies P]
    = ε
  • Define U(n) = L(n) + 2γ
  • Draw two random graphs G_L and G_U (independently)
    from G(n, L(n)) and G(n, U(n)), resp.
  • Let B, R denote the sets of points in G_L, G_U resp.

16
Bottleneck Matchings and Width (proof contd.)
  • Assume X_n ≤ γ.
  • Let f be the corresponding minimum bottleneck
    matching between R and B.
  • For any u, v ∈ B:
  • ||f(u) − f(v)||₂ ≤ ||f(u) − u||₂ + ||u − v||₂ +
    ||f(v) − v||₂ ≤ 2γ + ||u − v||₂
  • Hence, (u, v) is an edge in G_L ⇒ (f(u), f(v)) is an
    edge in G_U
  • i.e., G_L is a subgraph of G_U (up to the relabeling f)
  • By assumption, Pr[X_n > γ] ≤ p
  • ⇒ Pr[G_L is not a subgraph of G_U] ≤ p   (1)

17
Illustration I: Triangle Inequality
18
Bottleneck Matchings and Width (proof contd.)
  • Pr[G_L is not a subgraph of G_U] ≤ p   (1)
  • Let q = Pr[G_U does not satisfy P]
  • P is monotone, and Pr[G_L satisfies P] ≥ ε
  • ⇒ Pr[G_L is not a subgraph of G_U] ≥ εq   (2)
    (G_L and G_U are drawn independently, and if G_L
    satisfies P while G_U does not, monotonicity means
    G_L cannot be a subgraph of G_U)
  • Combining (1) and (2), we get εq ≤ p, i.e., q ≤ ε
  • Therefore, Pr[G_U satisfies P] ≥ 1 − ε
  • i.e., the ε-width of P is at most U(n) − L(n) = 2γ
  • Done!
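In compressed form, the chain of inequalities just established (our transcription):

```latex
\varepsilon q \;\le\; \Pr[G_L \not\subseteq G_U]
  \;\le\; \Pr[X_n > \gamma] \;\le\; p \;=\; \varepsilon^2
\quad\Longrightarrow\quad q \le \varepsilon
```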

19
Illustration II: Probability Amplification
[Figure: G_L embedded as a subgraph of the independently drawn G_U]
20
Our goal now: analyze the bottleneck matching
distance X_n. Specifically, we are done if X_n =
O(g(n)) with high probability, for some small
g(n)
21
Comparison with Bernoulli Random Graphs?
  • We are attempting to show something quite strong:
  • G(n, r) is a subgraph of G(n, r + γ) whp, for small γ
  • A laminar structure
  • The corresponding result is NOT true for Bernoulli
    random graphs, even for γ = ½
  • If small bottleneck matchings exist whp, we will
    get stronger thresholds than for Bernoulli random
    graphs

22
Existence of Small Bottleneck Matchings
  • The bottleneck matching length is:
  • O(c_d(n)) whp for d ≥ 3
  • [Shor and Yukich 1991]; we present a simpler
    proof
  • O(c_2(n) log^(1/4) n) whp for d = 2
  • [Leighton and Shor 1989]
  • O(sqrt(log(1/ε))/sqrt(n)) with probability 1 − ε
    for d = 1
  • Our paper (folklore?)
  • This gives us the desired widths
  • Will omit the proof, focus on implications
  • There is a small bottleneck matching between a
    random geometric graph and the grid

23
Lower bound examples
  • For d = 1, the property "minimum degree ≥ n/4" has
    width
  • Ω(sqrt(log(1/ε))/sqrt(n))
  • Basic idea: just the two endpoints of the line
    are interesting for the purpose of finding the
    minimum degree
  • For d ≥ 2, the property "G is a clique" has width
    Ω(1/n^(1/d))
  • Open problems:
  • Close the gap between the upper and lower bounds on
    the width for d ≥ 2
  • Also, all our lower bound examples undergo phase
    transitions at r = Θ(1). Is there something
    interesting and different in the regime where r
    is of the order of the connectivity threshold?

24
Implications: Mixing Time
  • The fastest mixing Markov chain defined on G(n, r) has
    mixing time Θ(r⁻² log n) for large enough r
    [Boyd, Ghosh, Prabhakar, Shah '05]
  • Alternate proof:
  • GRID(n, r): n points are laid out on a grid in [0,1]²
    and two points are connected if they are within
    distance r
  • Fastest mixing time of GRID(n, r) is Θ(r⁻² log n)
    [trivial]
  • G(n, r) is a super-graph of GRID(n, r − δ) and a
    sub-graph of GRID(n, r + δ) whp, for small enough δ
    [our result]
  • ⇒ Fastest mixing time of G(n, r) is Θ(r⁻² log n)
    whp

25
Implications: Spectra
  • Our techniques can be extended to show that the
    spectrum of random geometric graphs converges to
    the spectrum of the grid graph [Rai '05]

26
Implications: Coverage
  • Coverage: any point in the unit square must be
    within distance r of one of the n sensors
  • Known: there is a sharp threshold in r
    [Shakkottai, Srikant, Shroff '04]
  • Coverage is NOT a graph property, so it does not
    fall within our framework
  • But the laminar structure in our proof implies a
    sharp threshold for coverage as well (weaker than
    the sharpest known)
  • Fixed density, increasing area: [Muthukrishnan,
    Pandurangan]

27
Conclusions
  • Monotone properties in G(n, r) have sharp
    thresholds
  • Much sharper than for Bernoulli random graphs
  • Much stronger too: random geometric graphs
    exhibit a laminar structure
  • Useful for recovering several known
    results and proving new ones
  • Randomness is often a red herring, since the grid
    often yields tight upper/lower bounds
  • Rule of thumb: grids are nearly as good a model
    for sensor networks as random geometric graphs
  • No useful analogue in general networks
    (deterministic expanders are no easier to analyze
    than Bernoulli random graphs)
  • Open problem: does laminarity imply anything
    about throughput (via separators)?

28
Statistics Gathering
  • Assume sensors are laid out on an n × n grid
  • Each sensor has 4 neighbors
  • Each sensor knows its (x, y)-coordinates
  • A processing agent (sink) sits at (0, 0)
  • Vast literature: model-based, gossip-based, ODI,
    throughput maximization, coding
  • This talk:
  • Answering box queries [Goel, unpublished]
  • Aggregating spatially correlated data [Enachescu,
    Goel, Govindan, Motwani 2004]

29
Answering Box Queries
  • The sink can issue queries for the aggregate
    statistics of any box
  • We will focus on statistics that can be estimated
    from a random sample of size K
  • Examples: mean, median, quantiles, standard
    deviation
  • Some naive strategies:
  • All sensors send their value to the sink every
    time unit
  • No query-time cost, but very high pre-processing
    cost (n per node)
  • The sink samples K sensors from the box at query
    time
  • No preprocessing cost, but cost Kn at query time

30
Notation: K-uniform Samples
  • A K-uniform sample of S is a subset of S obtained
    by choosing each element with probability K/|S|,
    independently of all other elements
  • K-uniform samples are equally good for estimation
    purposes
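In code, a K-uniform sample is essentially one line (our sketch; `k_uniform_sample` is our name):

```python
import random

def k_uniform_sample(S, K):
    # keep each element of S independently with probability K/|S|
    p = K / len(S)
    return [x for x in S if random.random() < p]
```

The sample size is K only in expectation, but it is sharply concentrated around K.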

31
Our Solution: Hyperbolic Gossip
  • Pre-processing: every time unit, sensor (i, j)
    chooses z(i, j) uniformly at random from [0, 1)
  • Sensor (i, j) sends its value to all sensors (x, y)
    satisfying
  • z(i, j) ≤ K/(|x − i|·|y − j|), a hyperbolic
    region
  • Probability that (x, y) hears from (i, j) =
    K/(|x − i|·|y − j|)
  • H(i, j) = set of sensors contacted by (i, j)
  • H(i, j) is connected, and has expected size O(K
    log² n)
  • Pre-processing cost: O(K log² n) per sensor
    (sharply concentrated)
  • Claim: every sensor can now derive K-uniform
    samples for every box that contains this sensor
  • Querying becomes trivial: the sink can query any
    sensor in the box
  • Side benefit: each sensor has a rough map of the
    entire sensor field
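A sketch (ours) of one pre-processing round. The slides leave the same-row/same-column case (|x − i|·|y − j| = 0) unspecified; this sketch substitutes (|x − i| + 1)·(|y − j| + 1), an assumption that avoids dividing by zero while keeping the expected size at O(K log² n):

```python
import random

def hyperbolic_gossip_round(i, j, K, n):
    # One pre-processing time unit for sensor (i, j) on an n x n grid:
    # draw z(i, j) uniformly from [0, 1) and return the set H(i, j) of
    # sensors (x, y) that hear (i, j)'s value.
    z = random.random()
    # NOTE: (|x-i|+1) * (|y-j|+1) is our regularization of the slide's
    # |x-i| * |y-j|, which vanishes on (i, j)'s own row and column.
    return [(x, y)
            for x in range(n) for y in range(n)
            if z <= K / ((abs(x - i) + 1) * (abs(y - j) + 1))]
```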

32
Deriving K-uniform Samples
  • Suppose sensor (x, y) needs to derive a K-uniform
    sample of box B
  • Sub-sample: for every node (i, j) in B which has
    sent its value to (x, y), add the value to the
    sample with probability |x − i|·|y − j|/|B|
  • Probability that (x, y) hears from (i, j) =
    K/(|x − i|·|y − j|)
  • Probability that the value from (i, j) is in the
    sample:
  • (|x − i|·|y − j|/|B|)·(K/(|x − i|·|y − j|))
  • = K/|B|
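A sketch (ours) of the sub-sampling step; `heard` is a hypothetical dict mapping each sender (i, j) whose value reached (x, y) to that value, and `box` is the set of sensors in B. The cancellation above holds for senders whose hearing probability K/(|x − i|·|y − j|) is below 1; the corner case of very close senders is not treated here:

```python
import random

def subsample_box(x, y, heard, box):
    # Derive a K-uniform sample of box B at sensor (x, y): keeping a heard
    # value with probability |x-i|*|y-j| / |B| cancels the hearing
    # probability K / (|x-i|*|y-j|), so each value in B survives with
    # probability K / |B|.
    sample = []
    for (i, j), value in heard.items():
        if (i, j) in box and random.random() < abs(x - i) * abs(y - j) / len(box):
            sample.append(value)
    return sample
```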

[Figure: sensor (x, y) and a box B containing it]
33
Aggregation in Sensor Networks
  • Sensors are highly constrained in power (and hence
    communication)
  • Data gathering: each sensor (source) sends its
    data to a central agent (the sink)
  • In-network aggregation (coding) can result in
    great power savings, due to correlations in the data
    or the nature of the query
  • Example: spatially correlated data
  • Aggregation gains are unpredictable
  • Basic question: what is the best routing
    structure to aggregate data on?
  • Does it depend on the correlation?

34
The Spatial Correlation Model
  • Imagine each sensor is a camera with range k
  • Can take a picture of any point within a 2k × 2k
    square centered at the sensor
  • Let A_k(s) denote the area imaged by sensor s
  • Total amount of information obtained from a set of
    sensors S: |∪_{s ∈ S} A_k(s)| (sketched in code below)
  • What is the right value of k?
  • Would like a single aggregation tree that is good
    for all k
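A numerical sketch (ours; the raster resolution is an arbitrary choice) of this objective, approximating the union area by rasterizing the unit square onto a lattice:

```python
import numpy as np

def total_information(sensors, k, grid=512):
    # |union of A_k(s)| for sensors in the unit square: each A_k(s) is the
    # 2k x 2k square centered at s, clipped to [0,1]^2 and rasterized.
    covered = np.zeros((grid, grid), dtype=bool)
    for sx, sy in sensors:
        x0, x1 = max(0, int((sx - k) * grid)), min(grid, int((sx + k) * grid))
        y0, y1 = max(0, int((sy - k) * grid)), min(grid, int((sy + k) * grid))
        covered[x0:x1, y0:y1] = True
    return covered.sum() / grid ** 2  # fraction of the unit square imaged
```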

35
Rationale
  • If an algorithm does well for all camera ranges
    k, it also does well when the aggregation
    function is a linear superposition of different
    values of k
  • For example, consider events of different
    intensities (fireflies, camp-fires, forest fires,
    volcanic eruptions)
  • Different events can be sensed within different
    ranges
  • For any set of intensities and relative
    frequencies, there is a superposition of
    different values of k which measures exactly the
    information in a set of sensors S
  • The right model for spatial correlation, given
    simultaneous optimization?

36
A Useful Transformation
  • Instead of focusing on the information captured
    by a sensor, focus instead on the units of area in
    the 1 × 1 square
  • We call each such unit a value
  • A value is sensed by all sensors within a 2k × 2k
    square centered at the value

[Figure: a value and the sensors that sense it, for k = 2]
37
Cost Model
  • Find a routing tree (by choosing a parent for
    each node) to send the information from all nodes
    to the processing node.
  • Cost of an edge: the number of values transmitted
    across the edge
  • Cost of the tree: the total cost of all its edges
  • We want to minimize this transmission cost
    simultaneously for all k, assuming perfect
    aggregation.
  • Assume the nodes must send all data values.

38
Collision Time
  • Consider two adjacent paths in a routing tree.
  • Collision time: the number of steps before the
    lower path meets the upper path.

Example: collision time = 3
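A direct transcription of this definition into code (our sketch; it treats "meets" as sharing a grid node, which the slide leaves implicit):

```python
def collision_time(lower, upper):
    # First step at which the lower path reaches a node of the upper path;
    # paths are lists of (x, y) grid nodes leading to the sink.
    upper_nodes = set(upper)
    for t, node in enumerate(lower):
        if node in upper_nodes:
            return t
    return len(lower)  # paths only meet at the end (both reach the sink)
```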
39
Collision Times and Simultaneous Optimization
  • Theorem: an aggregation tree with average
    expected collision time O(N^(1/2)) gives a constant-
    factor approximation to the optimum aggregation
    trees, simultaneously for all k

40
A Randomized Algorithm
  • Each node chooses one of its two downstream (i.e.,
    closer to the origin) neighbors as its parent
  • This results in a shortest path tree rooted at the
    sink
  • All data flows on this shortest path tree
  • Choice of parent: the node at position (x, y) chooses
    the node at position (x − 1, y) with probability
    x/(x + y) and the node at position (x, y − 1) with
    probability y/(x + y)
  • Different nodes choose independently
  • Interesting property: the path from a sensor to
    the sink is a shortest path chosen uniformly at
    random from all such shortest paths (see the
    sketch below)
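The parent rule is easy to state in code (our sketch); following parents from (x, y) produces a uniformly random monotone shortest path, per the property above:

```python
import random

def random_parent(x, y):
    # (x-1, y) with probability x/(x+y), else (x, y-1);
    # nodes on the axes have only one downstream neighbor.
    if x == 0:
        return (x, y - 1)
    if y == 0:
        return (x - 1, y)
    return (x - 1, y) if random.random() < x / (x + y) else (x, y - 1)

def path_to_sink(x, y):
    # follow parents from (x, y) all the way to the sink at (0, 0)
    path = [(x, y)]
    while (x, y) != (0, 0):
        x, y = random_parent(x, y)
        path.append((x, y))
    return path
```

The weights work out because a node at (x, y) has C(x+y, x) monotone shortest paths to the sink, of which a fraction x/(x + y) start by stepping to (x − 1, y).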

41
The Randomized Algorithm contd.
  • Intuition: consider A = (j − 1, j + 1) and B =
    (j + 1, j − 1)
  • A is above the diagonal, so it has a slight
    preference for going down
  • B is below the diagonal, so it has a slight
    preference for going left
  • But if A goes down and B goes left, they meet!
  • This centrist tendency introduces a small but
    sufficient bias, so that the average expected
    collision time is O(N^(1/2)) as opposed to O(N)
  • [Enachescu, Goel, Govindan, Motwani,
    Theoretical Computer Science 2006]
  • Proof omitted
  • Implies that the randomized algorithm is an O(1)-
    approximation for all correlation parameters k
  • Much stronger than what is known for general
    graphs [Goel, Estrin]

42
Conclusions (Aggregating Correlated Data)
  • Simple routing techniques work well for a wide
    range of aggregation functions
  • Rule of thumb: do not couple coding and routing
    in a sensor network
  • Supported by [Pattem et al.], [Beferull-Lozano et
    al.], [Goel, Estrin]
  • Open problems:
  • Extend our results to other correlation models
  • Example: Gaussian sources with a spatial covariance
    matrix [Scaglione, Servetto]
  • Conjecture: our model of spatial correlations
    captures Gaussian sources
  • Making spatio-temporal maps of sensor readings
    with varying and unknown spatio-temporal
    correlation
  • Sampling of correlated data?
  • Gossip-based protocols for computing statistics
    of correlated data? (along the lines of [Kempe et
    al.], [Boyd et al.])