Title: Reliable Transport and Code Distribution in Wireless Sensor Networks
1. Reliable Transport and Code Distribution in Wireless Sensor Networks
- Thanos Stathopoulos
- CS 213, Winter 04
2. Reliability Introduction
- Not an issue on wired networks
- TCP does a good job
- Link bit-error rates are usually around 10^-15
- No energy cost
- However, WSNs have
- Low power radios
- Error rates of up to 30% or more
- Limited range
- Energy constraints
- Retransmissions reduce lifetime of network
- Limited storage
- Buffer size cannot be too large
- Highly application-specific requirements
- No single TCP-like solution
3. Approaches
- Loss-tolerant algorithms
- Leverage spatial and temporal redundancy
- Good enough for some applications
- But what about code updates?
- Add retransmission mechanism
- At the link layer (e.g. SMAC)
- At the routing/transport layer
- At the application layer
- Hop-by-hop or end-to-end?
4. Relevant papers
- PSFQ: A Reliable Transport Protocol for Wireless Sensor Networks
- RMST: Reliable Data Transport in Sensor Networks
- ESRT: Event-to-Sink Reliable Transport in Wireless Sensor Networks
5. PSFQ Overview
- Key ideas
- Slow data distribution (pump slowly)
- Quick error recovery (fetch quickly)
- NACK-based
- Data caching guarantees ordered delivery
- Assumption: no congestion; losses due only to poor link quality
- Goals
- Ensure data delivery with minimum support from transport infrastructure
- Minimize signaling overhead for detection/recovery operations
- Operate correctly in poor link quality environments
- Provide loose delay bounds for data delivery to all intended receivers
- Operations
- Pump
- Fetch
- Report
6. End-to-end considered harmful?
- Probability of reception degrades exponentially over multiple hops
- Not an issue in the Internet
- Serious problem if error rates are considerable
- ACKs/NACKs are also affected
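The exponential degradation is easy to quantify: an end-to-end scheme must get a packet (and its ACKs/NACKs) across every hop at once. A minimal sketch, assuming independent per-hop losses (a simplified model, not from the papers):

```python
def end_to_end_success(p_hop: float, hops: int) -> float:
    """Probability a packet survives all hops when each hop
    independently succeeds with probability p_hop."""
    return p_hop ** hops
```

With a 10% per-hop loss rate, ten hops already push end-to-end success below 35%, while hop-by-hop recovery only ever fights the constant per-hop loss.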
7. Proposed solution: Hop-by-hop error recovery
- Intermediate nodes now responsible for error detection and recovery
- NACK-based: loss detection probability is now constant
- Not affected by network size (scalability)
- Exponential decrease in the end-to-end case
- Cost: keeping state on each node
- Potentially not as bad as it sounds!
- Cluster/group-based communication
- Intermediate nodes are usually receivers as well
8. Pump operation
- Node broadcasts a packet to its neighbors every Tmin
- Data cache used for duplicate suppression
- Receiver checks for gaps in sequence numbers
- If all is fine, it decrements TTL and schedules a transmission
- Tmin < Ttransmit < Tmax
- By delaying transmission, quick fetch operations are possible
- Reduce redundant transmissions (don't transmit if 4 or more nodes have forwarded the packet already)
- Tmax can provide a loose delay bound for the last hop
- D(n) = Tmax * (# of fragments) * (# of hops)
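The loose delay bound can be read as a worst case in which every fragment waits the full Tmax at every forwarding node; a small sketch under that reading:

```python
def pump_delay_bound(t_max: float, fragments: int, hops: int) -> float:
    """Worst-case last-hop delivery delay, D(n) = Tmax * fragments * hops:
    each fragment may wait up to Tmax at each forwarding node."""
    return t_max * fragments * hops
```

With the Tmax = 0.3 s used in the experiments, a 100-fragment file over 5 hops is bounded by 150 s.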
9. Fetch operation
- Sequence number gap is detected
- Node will send a NACK message upstream
- Window specifies the range of missing sequence numbers
- NACK receivers will randomize their transmissions to reduce redundancy
- The node will NOT forward any packets downstream
- NACK scope is 1 hop
- NACKs are generated every Tr if there are still gaps
- Tr < Tmax
- This is the pump/fetch ratio
- NACKs can be cancelled if neighbors have sent similar NACKs
10. Proactive Fetch
- Last segments of a file can get lost
- Loss detection impossible: no next segment exists!
- Solution: timeouts (again)
- Node enters "proactive fetch" mode if the last segment hasn't been received and no packet has been delivered after Tpro
- Timing must be right
- Too early: wasted control messages
- Too late: increased delivery latency for the entire file
- Tpro = a * (Smax - Smin) * Tmax
- A node will wait long enough until all upstream nodes have received all segments
- If the data cache isn't infinite:
- Tpro = a * k * Tmax (Tpro is proportional to cache size)
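Both Tpro expressions are simple proportionalities; a minimal sketch, with a as the proportionality constant from the slide (parameter names assumed):

```python
def t_pro_unbounded(a: float, s_max: int, s_min: int, t_max: float) -> float:
    """Proactive-fetch timeout with an unbounded data cache:
    proportional to the number of segments still expected upstream."""
    return a * (s_max - s_min) * t_max

def t_pro_bounded(a: float, k: int, t_max: float) -> float:
    """With a finite cache of k segments, the wait is capped by cache size."""
    return a * k * t_max
```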
11. Report Operation
- Used as a feedback/monitoring mechanism
- Only the last hop will respond immediately (create a new packet)
- Other nodes will piggyback their state info when they receive the report reply
- If there is no space left in the message, a new one will be created
12. Experimental results
- Tmax = 0.3s, Tr = 0.1s
- 100 30-byte packets sent
- Exponential increase in delay happens at 11% loss rate or higher
13. PSFQ Conclusion
- Slow data dissemination, fast data recovery
- All transmissions are broadcast
- NACK-based, hop-by-hop recovery
- End-to-end behaves poorly in lossy environments
- NACKs are superior to ACKs in terms of energy savings
- No out-of-order delivery allowed
- Uses data caching extensively
- Several timers and duplicate suppression mechanisms
- Implementing any of those on motes is challenging (non-preemptive FIFO scheduler)
14. RMST Overview
- A transport layer protocol
- Uses diffusion for routing
- Selective NACK-based
- Provides
- Guaranteed delivery of all fragments
- In-order delivery not guaranteed
- Fragmentation/reassembly
15. Placement of reliability for data transport
- RMST considers 3 layers
- MAC
- Transport
- Application
- Focus is on MAC and Transport
16. MAC Layer Choices
- No ARQ
- All transmissions are broadcast
- No RTS/CTS or ACK
- Reliability deferred to upper layers
- Benefits: no control overhead, no erroneous path selection
- ARQ always
- All transmissions are unicast
- RTS/CTS and ACKs used
- One-to-many communication done via multiple unicasts
- Benefits: packets traveling on established paths have a high probability of delivery
- Selective ARQ
- Use broadcast for one-to-many and unicast for one-to-one
- Data and control packets traveling on established paths are unicast
- Route discovery uses broadcast
17. Transport Layer Choices
- End-to-End Selective Request NACK
- Loss detection happens only at sinks (endpoints)
- Repair requests travel on the reverse (multihop) path from sinks to sources
- Hop-by-Hop Selective Request NACK
- Each node along the path caches data
- Loss detection happens at each node along the path
- Repair requests sent to immediate neighbors
- If data isn't found in the caches, NACKs are forwarded to the next hop towards the source
18. Application Layer Choices
- End-to-End Positive ACK
- Sink requests a large data entity
- Source fragments the data
- Sink keeps sending interests until all fragments have been received
- Used only as a baseline
19. RMST details
- Implemented as a Diffusion Filter
- Takes advantage of Diffusion mechanisms for
- Routing
- Path recovery and repair
- Adds
- Fragmentation/reassembly management
- Guaranteed delivery
- Receivers responsible for fragment retransmission
- Receivers aren't necessarily end points
- Caching or non-caching mode determines classification of a node
20. RMST Details (cont'd)
- NACKs triggered by
- Sequence number gaps
- Watchdog timer inspects the fragment map periodically for holes that have aged for too long
- Transmission timeouts
- Last fragment problem
- NACKs propagate from sinks to sources
- Unicast transmission
- NACK is forwarded only if the segment is not found in the local cache
- Back-channel required to deliver NACKs to upstream neighbors
21. Evaluation
- NS-2 simulation
- 802.11 MAC
- 21 nodes
- Single sink, single source
- 6 hops
- MAC ARQ set to 4 retries
- Image size: 5K
- 50 100-byte fragments
- Total cost of sending the entire file: 87,818 bytes
- Includes diffusion control message overhead
- All results normalized to this value
22. Results: Baseline (no RMST)
- ARQ and S-ARQ have high overhead when error rates are low
- S-ARQ is better in terms of efficiency
- Also helps with route selection
- No-ARQ results drop considerably as error rates increase
- Exponential decay of end-to-end reliability mechanisms
23. Results: RMST with H-b-H Recovery and Caching
- Slight improvement for ARQ and S-ARQ results over baseline
- No ARQ is better even in the 10% error rate case
- But many more exploratory packets were sent before the route was established
24. Results: RMST with E-2-E Recovery
- No ARQ doesn't work for the 10% error rate case
- Numerous holes that required NACKs couldn't make it from source to sink without link-layer retransmissions
- ARQ and S-ARQ results are statistically indistinguishable from the H-b-H results
- NACKs were very rare when any form of ARQ was used
25. Results: Performance under High Error Rates
- No ARQ doesn't work for the 30% error rate case
- Diffusion control messages could not establish routes most of the time
- In the 20% case, it took several minutes to establish routes
26. RMST Conclusion
- ARQ helps with unicast control and data packets
- In high error-rate environments, routes cannot be established without ARQ
- Route discovery packets shouldn't use ARQ
- Erroneous path selection can occur
- RMST combines a NACK-based transport layer protocol with S-ARQ to achieve the best results
27. Congestion Control
- Sensor networks are usually idle
- Until an event occurs
- High probability of channel overload
- Information must reach users
- Solution congestion control
28. ESRT Overview
- Places interest on events, not individual pieces of data
- Application-driven
- Application defines what its desired event reporting rate should be
- Includes a congestion-control element
- Runs mainly on the sink
- Main goal: adjust the reporting rate of the sources to achieve optimal reliability requirements
29. Problem Definition
- Assumption
- Detection of an event is related to the number of packets received during a specific interval
- Observed event reliability ri
- # of packets received in decision interval i
- Desired event reliability R
- # of packets required for reliable event detection
- Application-specific
- Goal: configure the reporting rate of the nodes
- Achieve required event detection
- Minimize energy consumption
30. Reliability vs. Reporting frequency
- Initially, reliability increases linearly with reporting frequency
- There is an optimal reporting frequency (fmax), after which congestion occurs
- fmax decreases when the # of nodes increases
31. Characteristic Regions
- n: normalized reliability indicator
- (NC, LR): No congestion, Low reliability
- f <= fmax, n < 1-e
- (NC, HR): No congestion, High reliability
- f <= fmax, n > 1+e
- (C, HR): Congestion, High reliability
- f > fmax, n > 1
- (C, LR): Congestion, Low reliability
- f > fmax, n <= 1
- OOR: Optimal Operating Region
- f <= fmax, 1-e <= n <= 1+e
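The five regions partition the (f, n) plane, so they can be expressed as a simple classifier; a sketch with e as the tolerance parameter (exact boundary handling assumed):

```python
def esrt_region(f: float, f_max: float, n: float, e: float) -> str:
    """Classify reporting frequency f and normalized reliability n
    into one of the ESRT characteristic regions."""
    if f <= f_max:                      # no congestion
        if n < 1 - e:
            return "(NC, LR)"
        if n > 1 + e:
            return "(NC, HR)"
        return "OOR"                    # 1-e <= n <= 1+e
    return "(C, HR)" if n > 1 else "(C, LR)"
```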
32. Characteristic Regions (figure)
33. ESRT Requirements
- Sink is powerful enough to reach all source nodes (i.e. single-hop)
- Nodes must listen to the sink broadcast at the end of each decision interval and update their reporting rates
- A congestion-detection mechanism is required
34. Congestion Detection and Reliability Level
- Both done at the sink
- Congestion
- Nodes monitor their buffer queues and inform the sink if overflow occurs
- Reliability Level
- Calculated by the sink at the end of each interval, based on the packets received
35. ESRT Protocol Operation
- (NC, LR)
- (NC, HR)
- (C, HR)
- (C, LR)
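The slide leaves the per-region actions implicit; the sketch below keeps only the direction each region pushes the reporting frequency (the exact ESRT update equations differ in detail, so treat these formulas as illustrative, not the paper's):

```python
def next_rate(f: float, f_max: float, n: float, e: float) -> float:
    """Illustrative per-region reporting-rate update: increase when
    reliability is low without congestion, back off otherwise."""
    if f <= f_max and n < 1 - e:        # (NC, LR): increase aggressively
        return f / n
    if f <= f_max and n > 1 + e:        # (NC, HR): reduce cautiously to save energy
        return f * (1 + 1 / n) / 2
    if f > f_max and n > 1:             # (C, HR): congested but reliable, reduce
        return f / n
    if f > f_max:                       # (C, LR): congested and unreliable, cut hard
        return f / 2
    return f                            # OOR: hold the current rate
```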
36. ESRT Conclusion
- Reliability notion is application-based
- No delivery guarantees for individual packets
- Reliability and congestion control achieved by changing the reporting rate of the nodes
- Pushes all complexity to the sink
- Single-hop operation only
37. Code Distribution Introduction
- Nature of sensor networks
- Expected to operate for long periods of time
- Human intervention impractical or detrimental to the sensing process
- Nevertheless, code needs to be updated
- Add new functionality
- Incomplete knowledge of environment
- Predicting the right set of actions is not always feasible
- Fix bugs
- Maintenance
38. Approaches
- Transfer the entire binary to the motes
- Advantage
- Maximum flexibility
- Disadvantage
- High energy cost due to large volume of data
- Use a VM and transfer capsules
- Advantage
- Low energy cost
- Disadvantages
- Not as flexible as full binary update
- VM required
- Reliability is required regardless of approach
39. Papers
- A Remote Code Update Mechanism for Wireless Sensor Networks
- Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks
40. MOAP Overview
- Code distribution mechanism specifically targeted at Mica2 motes
- Full binary updates
- Multi-hop operation achieved through recursive single-hop broadcasts
- Energy and memory efficient
41. Requirements and Properties of Code Distribution
- The complete image must reach all nodes
- Reliability mechanism required
- If the image doesn't fit in a single packet, it must be placed in stable storage until the transfer is complete
- Network lifetime shouldn't be significantly reduced by the update operation
- Memory and storage requirements should be moderate
42. Resource Prioritization
- Energy: most important resource
- Radio operations are expensive
- TX: 12 mA
- RX: 4 mA
- Stable storage (EEPROM)
- Everything must be stored, and Write()s are expensive
- Memory usage
- Static RAM
- Only 4K available on current generation of motes
- Code update mechanism should leave ample space for the real application
- Program memory
- MOAP must transfer itself
- Large image size means more packets transmitted!
- Latency
- Updates don't respond to real-time phenomena
- Update rate is infrequent
- Can be traded off for reduced energy usage
43. Design Choices
- Dissemination protocol: How is data propagated?
- All at once (flooding)
- Fast
- Low energy efficiency
- Neighborhood-by-neighborhood (ripple)
- Energy efficient
- Slow
- Reliability mechanism
- Repair scope: local vs. global
- ACKs vs. NACKs
- Segment management
- Indexing segments and gap detection: memory hierarchy vs. sliding window
44. Ripple Dissemination
- Transfer data neighborhood-by-neighborhood
- Single-hop
- Recursively extended to multi-hop
- Very few sources in each neighborhood
- Preferably, only one
- Receivers attempt to become sources when they have the entire image
- Publish-subscribe interface prevents nodes from becoming sources if another source is present
- Leverage the broadcast medium
- If data transmission is in progress, a source will always be one hop away!
- Allows local repairs
- Increased latency
45. Reliability Mechanism
- Loss responsibility lies with the receiver
- Only one node to keep track of (the sender)
- NACK-based
- In line with IP multicast and WSN reliability schemes
- Local scope
- No need to route NACKs
- Energy and complexity savings
- All nodes will eventually have the same image
46. Retransmission Policies
- Broadcast RREQ, no suppression
- Simple
- High probability of successful reception
- Highly inefficient
- Zero latency
- Broadcast RREQ, suppression based on randomized timers
- Quite efficient
- Complex
- Latency and successful reception depend on the randomization interval
47. Retransmission Policies (cont'd)
- Broadcast RREQ, fixed reply probability
- Simple
- Good probability of successful reception
- Latency depends on probability of reply
- Average efficiency
- Broadcast RREQ, adaptive reply probability
- More complex than the static case
- Similar latency/reception behavior
- Unicast RREQ, single reply
- Smallest probability of successful reception
- Highest efficiency
- Simple
- Complexity increases if source fails
- Zero latency
- High latency if source fails
48. Segment Management: Discovering if a segment is present
- No indexing
- Nothing kept in RAM
- Need to read from EEPROM to find out if segment i is missing
- Full indexing
- Entire segment (bit)map is kept in RAM
- Look at entry i (in RAM) to find out if the segment is missing
- Partial indexing
- Map kept in RAM
- Each entry represents k consecutive segments
- Combination of RAM and EEPROM lookups needed to find out if segment i is missing
49. Segment Management (cont'd)
- Hierarchical full indexing
- First-level map kept in RAM
- Each entry points to a second-level map stored in EEPROM
- Combination of RAM and EEPROM lookups needed to find out if segment i is missing
- Sliding window
- Bitmap of up to w segments kept in RAM
- Starting point: last segment received in order
- RAM lookup
- Limited out-of-order tolerance!
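A minimal sketch of the sliding-window bookkeeping, including why the out-of-order tolerance is limited (data structure and names assumed, not MOAP's actual code):

```python
class SlidingWindow:
    """Track received segments with a w-entry bitmap anchored at the
    last segment received in order (hypothetical sketch)."""
    def __init__(self, w: int):
        self.w = w
        self.base = 0              # next segment expected in order
        self.bits = [False] * w    # bits[i] marks segment base+i

    def receive(self, seq: int) -> bool:
        """Record a segment; return False if it falls outside the window."""
        if seq < self.base:
            return True            # old duplicate, already have it
        if seq >= self.base + self.w:
            return False           # no tolerance past w out-of-order segments
        self.bits[seq - self.base] = True
        while self.bits[0]:        # slide past the in-order prefix
            self.bits.pop(0)
            self.bits.append(False)
            self.base += 1
        return True

    def missing(self, upto: int) -> list:
        """Sequence numbers not yet received below `upto` (gap detection)."""
        return [s for s in range(self.base, min(upto, self.base + self.w))
                if not self.bits[s - self.base]]
```

A RAM-only lookup suffices for any segment inside the window, which is the trade-off against the full-indexing schemes above.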
50. Retransmission Policies Comparison
51. Segment Management Comparison
52. Results: Energy efficiency
- Significant reduction in traffic when using Ripple
- Up to 90% for dense networks
- Full Indexing performs 5-15% better than Sliding Window
- Reason: better out-of-order tolerance
- Differences diminish as network density grows
53. Results: Latency
- Flooding is 5 times faster than Ripple
- Full Indexing is 20-30% faster than Sliding Window
- Again, the reason is out-of-order tolerance
54. Results: Retransmission Policies
- Order-of-magnitude reduction when using unicasts
55. Current Mote Implementation
- Using Ripple + sliding window with the unicast retransmission policy
- User builds code on the PC
- Packetizer creates segments out of the binary
- Mote attached to the PC becomes the original source and sends a PUBLISH message
- Receivers 1 hop away will subscribe if the version number is greater than their own
- When a receiver gets the full image, it will send a PUBLISH message
- If it doesn't receive any subscriptions for some time, it will COMMIT the new code and invoke the bootloader
- If a subscription is received, the node becomes a source
- Eventually, sources will also commit
56. Current Mote Implementation (cont'd)
- Retransmissions have higher priority than data packets
- Duplicate requests are suppressed
- Nodes keep track of their source's activity with a keepalive timer
- Solves the NACK last-packet problem
- If the source dies, the keepalive expiration will trigger a broadcast repair request
- Late-joiner mechanism allows motes that have just recovered from failure to participate in the code transfer
- Requires all nodes to periodically advertise their version
their version - Footprint
- 700 bytes RAM
- 4.5K bytes ROM
57. MOAP Conclusion
- Full binary updates over multiple hops
- Ripple dissemination reduces energy consumption significantly
- Sliding window method and unicast retransmission policy also reduce energy consumption and complexity
- Successful updates of images up to 30K in size
- Next steps
- Larger experiments
- Better late-joiner mechanism
- Verification phase
- Sending DIFFs instead of the full image
58. Trickle Overview
- State synchronization / code propagation mechanism
- Suitable for VM environments, where the transmitted code is small
- Uses "polite gossip" dissemination
- Periodic broadcasts of state summary
- Nodes overhear transmissions and stay quiet unless they need to update
- Goals
- Propagation: install new code
- Maintenance: detect propagation need
59. Basic Mechanism
- A node will periodically transmit information
- Only if fewer than a threshold number of neighbors have sent the same data
- Cells (neighborhoods) can be in two states
- All nodes up to date
- Update needed
- Node learns about new code
- Node detects neighbor with old code
- Since communication can be transmission or reception, ideally only one node per cell needs to transmit
- Similar to MOAP's ideal single-source scenario
60. Maintenance
- Time is split into periods
- Nodes pick a random slot from [0, T]
- Transmission occurs if a node has heard fewer than k other identical transmissions
- Otherwise, the node stays quiet
- k is small (usually 1 or 2)
- If a node detects a neighbor that is out of date, it transmits the newer code
- If a node detects it is out of date, it transmits its state
- Update is triggered when other nodes receive this transmission
- Nodes transmit at most once per period
- In the presence of losses, the scaling property is O(log n)
61. Maintenance and timesync
- When nodes are synchronized, everything works fine
- If nodes are out of sync, some might transmit before others have had a chance to listen
- "Short-listen" problem
- O(sqrt(n)) scaling
- Solution: enforce a listen-only period
- Pick a slot from [T/2, T] for transmission
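One maintenance period then reduces to picking a slot in the second half and applying the suppression counter; a minimal sketch (function shape assumed):

```python
import random

def maintenance_round(t: float, k: int, heard_identical: int) -> tuple:
    """One Trickle maintenance period: pick a transmission slot in
    [T/2, T] so the first half of the period is listen-only (avoiding
    the short-listen problem), and suppress the broadcast if at least
    k identical summaries were already overheard."""
    slot = random.uniform(t / 2, t)   # listen-only first half
    transmit = heard_identical < k    # polite-gossip suppression
    return transmit, slot
```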
62. Maintenance and timesync (cont'd)
63. Propagation
- Large T
- Low communication overhead (less probable to pick the same slot)
- High latency
- Small T: the reverse
- Solution: dynamic scaling of T
- Use two bounds, TL and TH
- When T expires, it doubles until it reaches TH
- When newer state is overheard, T = TL
- When older state is overheard, immediately send updates
- When new code is installed, T = TL
- Helps spread new code quickly
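The period-scaling rules above can be sketched in a few lines (parameter names assumed):

```python
def next_period(t: float, t_low: float, t_high: float,
                heard_newer_state: bool, installed_new_code: bool) -> float:
    """Dynamic scaling of the Trickle period T: reset to TL when newer
    state is overheard or new code is installed (spread updates fast),
    otherwise double up to TH (save energy when the cell is stable)."""
    if heard_newer_state or installed_new_code:
        return t_low
    return min(2 * t, t_high)
```

This is what makes Trickle both quiet when nothing changes and responsive when new code appears.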
64. Propagation Summary
65. Trickle Conclusion
- Efficient state synchronization protocol
- Fast
- Limits transmissions (localized flood)
- Scales well with network size
- Does not propagate code per se
- Instead, it notifies the system that an update is needed
- In many cases, determining when to propagate can be more expensive than the propagation itself
- Contrast with MOAP's simple late-joiner algorithm
66. The End!