Title: Robust Distributed Services in Embedded Networks
1 Robust Distributed Services in Embedded Networks
2 Take-Away Message
- An analogy
- Users on the Internet are not satisfied with only connectivity
- Higher-level services attract users and applications
- Same theme is arising in mobile handheld applications
- Similarly, we believe that ensuring connectivity is only part of the picture for embedded / ad-hoc networks
- Users and applications will require services, databases, and other pull-style information backplanes
3 What Makes This Difficult?
- If your embedded / ad-hoc network is autonomous, it may have no servers!
- At least not in the typical sense of that word
- A server is typically
- Well provisioned and maintained
- Reliably connected
- Relatively trustworthy
- Embedded / ad hoc networks may lack any such nodes
4 Survivable Distributed Services
- Service, or object, abstraction
(Figure labels: push, pop, sort; invocation; response)
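As a minimal sketch of the service (object) abstraction, the Python below models a single object that accepts invocations and returns responses. Only the operation names push, pop, and sort come from the slide; the class name `StackService`, the `invoke` method, and the `(status, value)` response format are illustrative assumptions.

```python
from typing import Any, List, Optional, Tuple


class StackService:
    """Hypothetical single-copy service object (names are not from the slides)."""

    def __init__(self) -> None:
        self._items: List[Any] = []

    def invoke(self, op: str, arg: Optional[Any] = None) -> Tuple[str, Any]:
        """Apply one invocation and return a (status, value) response."""
        if op == "push":
            self._items.append(arg)
            return ("ok", None)
        if op == "pop":
            if not self._items:
                return ("error", "empty")
            return ("ok", self._items.pop())
        if op == "sort":
            self._items.sort()
            return ("ok", list(self._items))
        return ("error", f"unknown operation: {op}")


service = StackService()
service.invoke("push", 3)
service.invoke("push", 1)
sorted_response = service.invoke("sort")   # ("ok", [1, 3])
pop_response = service.invoke("pop")       # ("ok", 3)
```

A survivable version of such a service has to give clients this same invocation/response behavior even when the object is spread over several unreliable nodes.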
5 Traditional Approach: State Machine Replication
(Figure labels: Servers; inv, inv, inv)
- Offers no load dispersion, and degrades as system scales
6 Quorum Systems
- Quorum systems
- Basic tool for synchronization in distributed systems
- A set of subsets (quorums) of a universe U of logical elements, having intersection property (any pair of quorums intersect)
(Figure: Majority and Grid examples)
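The intersection property can be checked mechanically on small instances. The sketch below builds two quorum systems under assumed, commonly used constructions: Majority as all subsets of size floor(n/2) + 1, and Grid as one full row plus one full column of a k x k array. The slide names these systems but does not spell out the constructions, so treat both as assumptions; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple


def majority_quorums(n: int) -> List[FrozenSet[int]]:
    """All subsets of {0, ..., n-1} of size floor(n/2) + 1."""
    size = n // 2 + 1
    return [frozenset(c) for c in combinations(range(n), size)]


def grid_quorums(k: int) -> List[FrozenSet[Tuple[int, int]]]:
    """One full row plus one full column of a k x k grid (assumed construction)."""
    quorums = []
    for r in range(k):
        for c in range(k):
            row = {(r, j) for j in range(k)}
            col = {(i, c) for i in range(k)}
            quorums.append(frozenset(row | col))
    return quorums


def has_intersection_property(quorums) -> bool:
    """True if every pair of quorums shares at least one element."""
    return all(a & b for a, b in combinations(quorums, 2))


majority = majority_quorums(5)   # 10 quorums of size 3
grid = grid_quorums(3)           # 9 quorums of size 5
print(has_intersection_property(majority), has_intersection_property(grid))
```

Note the difference in quorum size: a majority quorum touches more than half of the universe, while a grid quorum touches only 2k - 1 of the k^2 elements.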
7 Byzantine Quorum Systems
- A quorum system is a data redundancy technique that supports load dispersion among servers
- Only a subset of servers are accessed in each operation
- Good servers in intersection must be enough to outvote bad servers
Ex: Grid with n = 49, b = 3
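The n = 49, b = 3 example can be checked by brute force. The sketch below assumes a masking-style requirement (every two quorums overlap in at least 2b + 1 servers, so correct servers in the overlap outnumber the at most b faulty ones) and assumes each quorum consists of sqrt(b + 1) = 2 full rows plus 2 full columns of a 7 x 7 grid. The slide gives only n and b, so the construction and the threshold are assumptions, not the authors' stated design.

```python
from itertools import combinations
from math import isqrt

n, b = 49, 3
k = isqrt(n)        # side of the grid: 7
t = isqrt(b + 1)    # rows and columns per quorum (assumed): 2


def quorum(rows, cols):
    """Cells covered by the chosen full rows and full columns."""
    return frozenset(
        (r, c) for r in range(k) for c in range(k) if r in rows or c in cols
    )


quorums = [
    quorum(rows, cols)
    for rows in combinations(range(k), t)
    for cols in combinations(range(k), t)
]

# Smallest overlap between any two distinct quorums.
min_overlap = min(len(q1 & q2) for q1, q2 in combinations(quorums, 2))

# If at most b servers in an overlap are faulty, at least min_overlap - b
# correct servers remain; they outvote the faulty ones when this holds.
masks_b_faults = min_overlap >= 2 * b + 1
print(len(quorums), min_overlap, masks_b_faults)
```

Under these assumptions each quorum contains 24 of the 49 servers, so each operation touches roughly half of the system instead of all of it.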
8 Protocols for Survivable Services, w/ Abd-El-Malek, Ganger, Goodson, and Wylie
- New protocols for
- Read/write objects
- Arbitrary services (Q/U)
- combining
- Quorum systems
- Optimistic execution
- Fast cryptographic primitives
- Graphs on right show that quorum protocols can scale better than SMR in real systems
- But these were well-connected settings
9 Dealing with Network Effects
- Network effects are likely to be just as important in embedded / ad hoc networks as load dispersion
- Even worse, minimizing network delays for accessing quorums can be in conflict with load dispersion
- May have to bypass a close but heavily-loaded quorum in favor of a less-loaded but more distant quorum
- Can we balance this tradeoff?
10 Quorum Placement Problems
- Place good quorum systems on network
- to minimize network-specific measures
- preserve goodness
- Goodness: load
- Assume each quorum Q is accessed with probability p(Q)
- $\mathrm{load}_p(u) = \sum_{Q \ni u} p(Q)$
- Network measures
- Average delay observed by clients when accessing quorum system
- Network congestion induced by clients accessing quorum system
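The load of an element is the total probability of the quorums that contain it. A minimal sketch with exact fractions follows; the three-element Majority instance and the function name `element_loads` are hypothetical, and taking the busiest element's load as the load of the whole system is the usual convention, assumed here rather than stated on the slide.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable


def element_loads(p: Dict[FrozenSet[Hashable], Fraction]) -> Dict[Hashable, Fraction]:
    """load_p(u): total probability of the quorums that contain u."""
    loads: Dict[Hashable, Fraction] = {}
    for quorum, prob in p.items():
        for u in quorum:
            loads[u] = loads.get(u, Fraction(0)) + prob
    return loads


# Majority over U = {a, b, c}: three quorums of size 2, accessed uniformly.
p = {
    frozenset("ab"): Fraction(1, 3),
    frozenset("bc"): Fraction(1, 3),
    frozenset("ac"): Fraction(1, 3),
}
loads = element_loads(p)
system_load = max(loads.values())   # load of the busiest element (assumed convention)
```

Here every element sits in two of the three equally likely quorums, so each carries load 2/3.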
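11 Network Measures
- quorum system $\mathcal{Q}$ over $U$
- access strategy $p : \mathcal{Q} \to [0, 1]$
- placement $f : U \to V$
- Given
- network $G = (V, E)$
- delay $d : E \to \mathbb{R}$
- $\mathrm{edge\_cap} : E \to \mathbb{R}$
- Average max-delay
- $d(v, f(Q)) = \max_{u \in Q} d(v, f(u))$
- $\Delta_f(v) = E_p[d(v, f(Q))]$
- $\mathrm{avg\_delay}_f = \mathrm{Avg}_{v \in V}\, \Delta_f(v)$
- Network congestion
- flow $g_{v, f(u)} : E \to \mathbb{R}$
- $\mathrm{traff}_e(v, f(Q)) = \sum_{u \in Q} g_{v, f(u)}(e)$
- $\mathrm{traff}_e = \mathrm{Avg}_{v \in V}\, E_p[\mathrm{traff}_e(v, f(Q))]$
- $\mathrm{cong}_f = \max_{e \in E} \mathrm{traff}_e / \mathrm{edge\_cap}(e)$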
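The average max-delay measure can be evaluated directly on a toy instance. The sketch below assumes that the delay between two nodes is the shortest-path delay over the edge delays, and that every node of V acts as a client with equal weight. The three-node path network, the placement, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def all_pairs_distances(nodes, edge_delay):
    """Floyd-Warshall shortest-path delays on an undirected graph."""
    inf = float("inf")
    dist = {(a, b): (0 if a == b else inf) for a, b in product(nodes, repeat=2)}
    for (a, b), w in edge_delay.items():
        dist[a, b] = dist[b, a] = min(dist[a, b], w)
    for k, i, j in product(nodes, repeat=3):   # k is the outermost loop
        dist[i, j] = min(dist[i, j], dist[i, k] + dist[k, j])
    return dist


def avg_max_delay(nodes, dist, p, f):
    """Average over clients v of E_p[max over u in Q of d(v, f(u))]."""
    total = 0
    for v in nodes:
        total += sum(prob * max(dist[v, f[u]] for u in q) for q, prob in p.items())
    return total / len(nodes)


# Path network x - y - z with unit edge delays (hypothetical example).
nodes = ["x", "y", "z"]
dist = all_pairs_distances(nodes, {("x", "y"): 1, ("y", "z"): 1})
third = Fraction(1, 3)
p = {frozenset("ab"): third, frozenset("bc"): third, frozenset("ac"): third}
f = {"a": "x", "b": "y", "c": "z"}   # placement f: U -> V
delay = avg_max_delay(nodes, dist, p, f)   # Fraction(13, 9)
```

The max inside the expectation reflects that a client has to wait for the farthest element of the quorum it accesses.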
12 Quorum Placement Problem for Delay (QPPD)
- Given
- graph $G = (V, E)$,
- with distances $d : E \to \mathbb{R}$
- and capacity $\mathrm{node\_cap}(v)$ for all $v \in V$
- a quorum system $\mathcal{Q}$
- with a distribution $p$ s.t. each $Q_i$ is accessed with prob. $p(Q_i)$
- find placement $f$
- minimizing average max-delay, $\mathrm{Avg}_{v \in V}\, \Delta_f(v)$
- subject to load constraints $\mathrm{load}_f(v) \le \mathrm{node\_cap}(v)$, for all $v \in V$
(Figure: example placement f; labels 1/3, 1/3, 1/3 and 4, 5, 5.)
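To make the QPPD statement concrete, the sketch below solves a tiny instance by exhaustive search. This is exponential in the number of elements and is only an illustration of the objective and the constraints, not an approximation algorithm. It assumes that the load of a node is the sum of the loads of the elements placed on it; the instance (network, capacities, quorum system) is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical instance: path network x - y - z with unit edge distances.
V = ["x", "y", "z"]
d = {("x", "x"): 0, ("x", "y"): 1, ("x", "z"): 2,
     ("y", "x"): 1, ("y", "y"): 0, ("y", "z"): 1,
     ("z", "x"): 2, ("z", "y"): 1, ("z", "z"): 0}
U = ["a", "b", "c"]
third = Fraction(1, 3)
p = {frozenset("ab"): third, frozenset("bc"): third, frozenset("ac"): third}
node_cap = {"x": Fraction(2, 3), "y": Fraction(4, 3), "z": Fraction(2, 3)}

# Load of each element under p (2/3 each in this instance).
load = {u: sum(pr for q, pr in p.items() if u in q) for u in U}


def respects_capacities(f):
    """Total load of the elements placed on v must not exceed node_cap(v)."""
    return all(
        sum(load[u] for u in U if f[u] == v) <= node_cap[v] for v in V
    )


def avg_max_delay(f):
    """Average over clients v of E_p[max over u in Q of d(v, f(u))]."""
    return sum(
        pr * max(d[v, f[u]] for u in q) for v in V for q, pr in p.items()
    ) / len(V)


best_f, best_delay = None, None
for image in product(V, repeat=len(U)):   # every placement f: U -> V
    f = dict(zip(U, image))
    if not respects_capacities(f):
        continue
    delay = avg_max_delay(f)
    if best_delay is None or delay < best_delay:
        best_f, best_delay = f, delay
```

In this instance the capacity constraint is what makes the problem interesting: placing all three elements on the central node y would give average max-delay 2/3, but y can hold only two of them, and the best capacity-respecting placement has average max-delay 10/9.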
13 Results for QPPD, w/ Gupta, Maggs, Oprea
- QPPD is NP-hard
- For any $\alpha > 1$, there is a $(5\alpha/(\alpha-1),\ \alpha+1)$-approximation
- If we allow capacities to be exceeded by a factor of $\alpha+1$, then we can achieve average max-delay within a factor of $5\alpha/(\alpha-1)$ of optimal for all capacity-respecting solutions
- For Majority and Grid, if node capacities equal the optimal load of the quorum system, there is a (5, 1)-approximation.
14 Quorum Placement for Congestion (QPPC)
- Two routing models
- Fixed paths (given as input)
- Arbitrary paths (chosen probabilistically)
- Given
- graph $G = (V, E)$,
- node capacities $\mathrm{node\_cap}(v)$ for all $v \in V$,
- and edge capacities $\mathrm{edge\_cap}(e)$ for all $e \in E$
- a quorum system $\mathcal{Q}$
- with a distribution $p$ s.t. each $Q_i$ is accessed with prob. $p(Q_i)$
- find placement $f$
- minimizing max relative-congestion, $\max_{e \in E} \mathrm{cong}_f(e)$
- subject to load constraints $\mathrm{load}_f(v) \le \mathrm{node\_cap}(v)$, for all $v \in V$
15 Results for QPPC, w/ Golovin, Gupta, Maggs, Oprea
- QPPC is NP-hard in either model
- Even finding any node-capacity-respecting solution is NP-hard
- Arbitrary paths
- There is an $(O(\log^2 n \log\log n),\ 2)$-approximation.
- If we allow node capacities to be exceeded by a factor of 2, then we can achieve max relative-congestion to within a factor of $O(\log^2 n \log\log n)$ of optimal for all node-capacity-respecting solutions
- If G is a tree, there is a (5, 2)-approximation.
- Fixed paths
- There is an $(O(\eta \log n / \log\log n),\ 2)$-approximation, where $\eta$ is the size of the set $\{\lceil \log_2(\mathrm{load}(u)) \rceil : u \in U\}$
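The parameter in the fixed-paths bound counts how many distinct rounded log-loads occur among the elements. A small sketch of that count follows; the function name and both load tables are hypothetical.

```python
from fractions import Fraction
from math import ceil, log2


def distinct_load_classes(load):
    """Size of the set {ceil(log2(load(u))) : u in U}."""
    return len({ceil(log2(value)) for value in load.values()})


# All elements equally loaded: a single class.
uniform = {u: Fraction(2, 3) for u in "abc"}
# Loads 1/2, 1/4, 3/16 round to -1, -2, -2: two classes.
skewed = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(3, 16)}

uniform_classes = distinct_load_classes(uniform)
skewed_classes = distinct_load_classes(skewed)
```

When every element carries the same load, the set has size 1, so the parameter contributes only a constant factor to the bound.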
16 Theory vs. Practice
- We have some initial theory results
- But many theoretical questions remain unanswered
- But how does the theory correspond to practice?
- Example: Network delay is only one component of client response time, the other being server load
- So, network delay and server load are not easily separable for this measure
- These problems still need to be explored even in fixed-infrastructure networks
17 Embedded / Ad Hoc Networks
- Importance of addressing faults
- Not only due to disabling quorum elements, but also due to impinging on quorum reachability
- If population is dynamic
- Need to consider migrating quorum elements
- If mobility is involved
- Continually need to re-evaluate quorum placements