Title: Local Tolerance to Unbounded Byzantine Faults
1 Local Tolerance to Unbounded Byzantine Faults
2 Faults in Systems of Large Scale
- large system size presents unique challenges and opportunities for ensuring dependability
- problem
  - faults
    - occur often
    - affect multiple components
    - interact unpredictably
  - asynchronous execution model
  - faults are spatially/temporally unbounded, complex, undetectable
- opportunity
  - a fault directly affects a region rather than the whole system
  - if faults are contained, the rest of the system continues to function
3 Difficulties Containing Unbounded Faults
- lack of spatial bound
  - arbitrary number of processes can be faulty
  - cannot rely on limited scope of fault or number of faulty processes
- lack of temporal bound
  - faulty process behaves incorrectly arbitrarily long
  - cannot wait until fault stops
- contain correctness and tolerance instead of faults
- use execution models that simplify such containment
4 Outline
- containing correctness and tolerance: strict fault containment and strict stabilization
- execution models and example programs
  - reactive program: dining philosophers
  - transformational execution models and programs
    - output dependent: k-independent set selection
    - output independent: lightweight spanner construction
5 Containing Correctness
- address specification first
  - what does it mean for a system to be correct when an arbitrary portion of it is faulty?
- spec defines correct sequences for each process P
  - sequence involves states of P and possibly others
- a program is locally containing of faults of class F if ∃ constant l (containment radius) such that
  - every P conforms to its spec if faulty processes are at least l hops away from P
- problem: correctness of P depends on every process in the system conforming to spec or F
6 Strict Fault Containment
- strict fault containing (SFC) program is locally containing of unbounded Byzantine faults
  - a process satisfies spec regardless of actions of processes outside locality
  - SFC-program is containing of bounded and unbounded faults of any class
- for each P the spec can only mention processes inside locality
  - a problem lacking such specs (e.g. routing) does not have SFC-solutions
7 Strict Stabilization
- additional tolerance properties to faults within locality for a strictly fault-containing program
8 Outline
- containing correctness and tolerance: strict fault containment and strict stabilization
- execution models and example programs
  - reactive program: dining philosophers
  - transformational execution models and programs
    - output dependent: k-independent set selection
    - output independent: lightweight spanner construction
9 Dining Philosophers Problem
- definition
  - network of processes, each may request to eat
- properties
  - mutual exclusion: no two neighbors eat together
  - liveness: each requesting process eats eventually
- execution model
  - interleaving
  - communication via shared registers
  - high-atomicity
10 Solution to Dining Philosophers
- priority based
- actions
  - if T: higher priority neighbors thinking → become hungry
  - if H: no neighbors are eating → eat (ensures MX)
  - E: done → think, give priority to neighbors (ensures liveness)
- waiting chain: 3
- optimal containment radius of 2
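
The three actions above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the authors' program: the ring topology, the round-robin scheduler (standing in for the interleaving model), the rule that every thinking process wants to eat, and the names `ring` and `simulate` are all hypothetical. Byzantine behavior is not modeled here; the sketch only shows the fault-free actions and checks mutual exclusion after every step.

```python
def ring(n):
    """Hypothetical topology: n processes on a ring."""
    return {p: [(p - 1) % n, (p + 1) % n] for p in range(n)}


def simulate(adj, rounds):
    """Run the T/H/E actions under a round-robin interleaving.

    Returns (meals, mx_ok): how often each process ate, and whether
    mutual exclusion held after every single step.
    """
    state = {p: "T" for p in adj}
    # higher[{p, q}] names the endpoint that currently has priority.
    # Assumption: initially the smaller id wins, so priorities are acyclic.
    higher = {frozenset((p, q)): min(p, q) for p in adj for q in adj[p]}
    meals = {p: 0 for p in adj}
    mx_ok = True
    for _ in range(rounds):
        for p in adj:  # each step reads neighbors and acts atomically
            if state[p] == "T":
                # T: all higher priority neighbors thinking -> hungry
                if all(state[q] == "T" for q in adj[p]
                       if higher[frozenset((p, q))] == q):
                    state[p] = "H"
            elif state[p] == "H":
                # H: no neighbor is eating -> eat (ensures MX)
                if all(state[q] != "E" for q in adj[p]):
                    state[p] = "E"
                    meals[p] += 1
            else:
                # E: done -> think and yield priority to every neighbor
                state[p] = "T"
                for q in adj[p]:
                    higher[frozenset((p, q))] = q
            if any(state[a] == "E" and state[b] == "E"
                   for a in adj for b in adj[a]):
                mx_ok = False
    return meals, mx_ok
```

Calling `simulate(ring(5), 20)` runs twenty rounds on a five-process ring; because a process that finishes eating yields priority to all neighbors, the priority relation stays acyclic and every process gets a turn.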
11 Fault Containment and Information Propagation
- fault containment leverages limit on information propagation
- idea: abstract from the process of information propagation and highlight the result
12 Execution Models
- transformation program: given input, computes output (e.g. leader election)
- models for transformation programs: each process reads from processes within range (finite distance)
  - output dependent: each process reads all information within range, input and (atomically) output
  - output independent: each process reads only input within range
    - every program in this model is strictly fault containing
13 k-Independent Set Selection (cf. HHJS01)
- problem: select a maximal subset of processes S such that
  - for each process in S, each other process of S is at least k hops away
- solution actions
  - if no member of S less than k hops away → join S
  - if exists member of S less than k hops away → leave S
- observe
  - only a faulty node P can make another process Q leave S
  - if Q leaves S, it can make another process R join S
  - containment radius is 2k
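
The join/leave actions above can be tried out with a short sketch. It assumes an undirected graph given as an adjacency dict, a round-robin scheduler in which each process atomically reads the membership of everything less than k hops away, and no faulty processes; the names `hops` and `select` are hypothetical and not taken from HHJS01.

```python
from collections import deque


def hops(adj, src):
    """Hop distance from src to every reachable node (plain BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select(adj, k, in_s=None, max_rounds=100):
    """Apply the two actions until no process is enabled.

    in_s optionally gives an arbitrary initial membership.
    """
    in_s = dict(in_s) if in_s else {p: False for p in adj}
    dist = {p: hops(adj, p) for p in adj}
    for _ in range(max_rounds):
        changed = False
        for p in adj:
            # Is some other member of S less than k hops away from p?
            conflict = any(
                in_s[q] for q in adj
                if q != p and dist[p].get(q, float("inf")) < k
            )
            if not in_s[p] and not conflict:
                in_s[p] = True   # join S
                changed = True
            elif in_s[p] and conflict:
                in_s[p] = False  # leave S
                changed = True
        if not changed:
            break
    return {p for p in adj if in_s[p]}
```

On a seven-node path with k = 3, `select` settles on nodes 0, 3, and 6 whether it starts from the empty set or from all processes claiming membership.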
14 Outline
- containing correctness and tolerance: strict fault containment and strict stabilization
- execution models and example programs
  - reactive program: dining philosophers
  - transformational execution models and programs
    - output dependent: k-independent set selection
    - output independent: lightweight spanner construction
      - practical problem: fast routing tree construction in sensor networks
      - spanner construction with double range
      - spanner optimization with larger ranges
15 Experimental Platform: Wireless Sensors
- 4 MHz Atmel processor
- 8 KB of program memory
- 512B of data memory
- 916 MHz single-channel, low-power radio
- 10 Kbps of raw bandwidth
- uniform antenna length and orientation
- TinyOS as the runtime system
- fresh AA batteries
16 Experiment: Fast Routing Tree Construction by Flooding (G02)
- 156 nodes are arranged in a 13x12 grid on an open parking lot, with grid spacing of 2 feet
- the base station is placed in the middle of the base of the grid and starts the flooding
- each receiving node rebroadcasts the flood message immediately upon receipt and then squelches further broadcasts
- the sender is selected as parent, thus a routing tree to the base station is formed
- expectation: a routing tree with relatively regular structure
  - number of children, link length, path size, etc.
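
The expected regular tree can be reproduced with an idealized sketch of the flood. It assumes a perfect radio in which each node hears only its four nearest grid neighbors and messages arrive in hop order, which is exactly the assumption a real deployment may violate; the name `flood_tree` and the coordinate convention (column, row) are hypothetical.

```python
from collections import deque


def flood_tree(cols, rows, base):
    """Idealized flood: the first sender heard becomes the parent."""
    parent = {base: None}
    hop_count = {base: 0}
    queue = deque([base])
    while queue:
        x, y = queue.popleft()  # this node rebroadcasts exactly once
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            in_grid = 0 <= nxt[0] < cols and 0 <= nxt[1] < rows
            if in_grid and nxt not in parent:
                parent[nxt] = (x, y)  # sender is selected as parent
                hop_count[nxt] = hop_count[(x, y)] + 1
                queue.append(nxt)
    return parent, hop_count
```

With `flood_tree(13, 12, (6, 0))` every one of the 156 nodes gets a parent one grid step closer to the base station, so link lengths are uniform and no backward links occur in this idealized model.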
17 [Figure: routing tree after 1 hop, 2 hops, 3 hops, and final; annotations: long link, backward link, straggler, clustering]
18 Problems and Solution Approach
- problem: routing tree constructed fast over raw topology is inadequate
  - uneven clustering (some nodes have too many neighbors)
  - long links (possibly unreliable)
  - suboptimal paths (backward links)
- idea: pre-process the topology to mitigate the problem
  - weigh links (by length, error rate, node degree, etc.)
  - locally construct a connected but lightweight spanner
  - link weight may be reflexive (depend on the spanner, e.g. node degree)
19 Lightweight Spanner Construction Using 2k-Range
- spanner: connected subgraph that includes all nodes (e.g. spanning tree)
- k-local spanner: there is a path within distance k to each neighbor
- problem: given a weighted graph (all weights unique) and 2k-range, build a lightweight k-local spanner
- solution: each process P computes the minimum spanning tree for each process Q in distance no more than k and selects the union of incident edges
[Figure: P can compute the MST for each process Q in this region; labels: k, k, P, Q, MST for Q's region]
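
The solution step above can be sketched centrally as follows. This is an illustration under assumptions rather than the presented implementation: edges are `frozenset` pairs mapped to unique weights, the MST of Q's region is computed with Kruskal's algorithm on the subgraph induced by nodes within k hops of Q, and the names `region`, `mst`, and `local_spanner` are hypothetical.

```python
from collections import deque


def region(adj, src, k):
    """Set of nodes within k hops of src (bounded BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def mst(nodes, weights):
    """Kruskal's algorithm on the subgraph induced by nodes."""
    root = {n: n for n in nodes}

    def find(n):
        while root[n] != n:
            root[n] = root[root[n]]  # path halving
            n = root[n]
        return n

    tree = set()
    inside = (e for e in weights if e <= nodes)  # both endpoints in region
    for edge in sorted(inside, key=weights.get):
        u, v = edge
        ru, rv = find(u), find(v)
        if ru != rv:
            root[ru] = rv
            tree.add(edge)
    return tree


def local_spanner(weights, k):
    """Union, over every P, of P's incident edges in nearby MSTs."""
    adj = {}
    for edge in weights:
        u, v = edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    spanner = set()
    for p in adj:
        for q in region(adj, p, k):  # every Q at distance <= k from P
            q_tree = mst(region(adj, q, k), weights)
            spanner |= {e for e in q_tree if p in e}
    return spanner
```

On a four-node cycle a-b-c-d with weights 1, 2, 3, 4, `local_spanner(weights, 1)` keeps all four edges because each 1-hop region is a path, while `local_spanner(weights, 2)` sees the whole cycle in every region and drops the heaviest edge d-a.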
20 Spanner Optimization Using Ranges > 2k
- each P computes the spanner's topology in the neighborhood with radius (range − k)
- P knows the complete spanner in this region
- P iteratively repeats the procedure on the resultant spanner
[Figure: P can compute the MST for each process Q in this region; labels: k, k, k, P, Q]
21 Conclusion
- complexity and scale of large systems force unorthodox approaches to faults
- we explored the spatial dimension of fault tolerance to complex unbounded faults, used lack of global info propagation
  - stated necessary conditions and impossibility results
  - gave first examples of programs
- question: how to solve problems that do have global info propagation? is it possible to contain problems before they spread?