Title: A World of (Im)Possibilities, Nancy Lynch Celebration: Sixty and Beyond
- Hagit Attiya, Technion
- Jennifer Welch, Texas A&M University
Introduction
- One of the main themes of Nancy's work has been proving lower bounds and impossibility results for problems that arise in distributed computing.
- Survey some of Nancy's results:
  - lesser-known results, hidden gems closer to our hearts
- Emphasize their meaning and implications:
  - how they influenced the development of the field and of distributed systems
  - concentrating on their positive impact
Best-Known Example: FLP
- Impossibility of asynchronous fault-tolerant consensus [Fischer, Lynch, Paterson]
- Motivated work on:
  - strengthening models of computation
    - partially synchronous models [Dwork, Lynch, Stockmeyer]
    - unreliable failure detectors [Chandra, Toueg]
  - weakening the problem definition
    - k-set agreement [Chaudhuri]
    - renaming [Attiya et al.]
    - condition-based approaches [Raynal, Rajsbaum et al.]
FLP Impact
- Related practical problems:
  - transaction commit
  - leader election
  - atomic broadcast
  - maintaining consistent replicated data
- The wait-free hierarchy, which classifies concurrent abstract data types [Herlihy]
- Attempts to solve k-set agreement and renaming led to the application of topology in distributed computing [Chaudhuri; Borowsky, Gafni; Saks, Zaharoglou; Herlihy, Shavit]
2nd Example: Brewer's Conjecture
- Brewer, PODC 2000 invited talk
- A web service cannot provide all three guarantees:
  - Consistency
  - Availability
  - Partition-tolerance
What Does This Mean?
- Gilbert, Lynch, SIGACT News 2002
- A web service cannot provide all three guarantees:
  - Consistency: atomicity of (read/write) operations
  - Availability: every request by a nonfaulty client gets a response
  - Partition-tolerance: the guarantees hold even when lost messages create two partitioned components in the network
Proof Idea
- Adapted from Attiya, Bar-Noy, Dolev
(Figure: all messages between p0's component and p1's component are lost (marked X); p1 then reads 0, a contradiction)
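The proof idea can be replayed as a toy simulation. The sketch below is a hypothetical model, not code from the paper: a register is replicated at two nodes (one per partition component), writes are forwarded only while the link is up, and every operation is answered from the local replica so the service stays available. Once the link is cut, a read that starts after a write of 1 has completed still returns the initial value 0, which violates atomicity.

```python
class PartitionedRegister:
    """A read/write register replicated at two nodes, p0 and p1 (toy model).

    Every operation is answered immediately from the caller's local replica,
    so the service is always available. Writes are forwarded to the other
    replica only while the link between the two components is up.
    """

    def __init__(self, initial=0):
        self.replicas = [initial, initial]  # replicas[i] is held by node pi
        self.link_up = True

    def write(self, i, value):
        self.replicas[i] = value
        if self.link_up:
            self.replicas[1 - i] = value  # this message is lost if partitioned

    def read(self, i):
        return self.replicas[i]


reg = PartitionedRegister()
reg.link_up = False  # partition: all messages between p0 and p1 are lost
reg.write(0, 1)      # p0's write of 1 completes
print(reg.read(1))   # p1's later read completes too, but returns 0
```

Making the read wait for p0's value would restore consistency but give up availability for as long as the partition lasts.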
Brewer's Conjecture: Implications
- Traditional database services maintain consistency and fail to provide availability in the face of partitions
- Relax the consistency guarantees of the web service
- Sometimes miss values or return stale data (Internet queries)
  - PIER [Huebsch, Hellerstein, Lanham, Loo, Shenker, Stoica]
- Allow partitions to evolve separately, and build mechanisms to cope when this happens (stream processing)
  - Medusa [Balazinska, Balakrishnan, Stonebraker]
- Sacrifice availability, but not often (stream processing)
  - BOREALIS [Balazinska, Balakrishnan, Madden, Stonebraker]
- Assume a mechanism to guard against partitions
  - CQ [Shah, Hellerstein, Brewer]
3rd Example: Best-Case Cost of Fault-Tolerant Algorithms
- Does making an algorithm fault-tolerant incur a cost even when the system is well-behaved?
- Previous investigation focused on the synchronous case:
  - early-stopping algorithms for consensus: 2 rounds vs. 1 round for a non-fault-tolerant algorithm [Dolev, Reischuk, Strong; Dwork, Moses; Moses, Tuttle]
  - non-blocking commit: twice as many rounds as for blocking commit [Dwork, Skeen]
- What about the asynchronous case?
Are Wait-Free Algorithms Fast?
- Attiya, Lynch, Shavit
- Study the best-case complexity of an algorithm:
  - when there are no failures, although the algorithm can tolerate any number of crashes (is wait-free)
  - when the execution is synchronized, although the algorithm also works in asynchronous executions
- The complexity measure of interest is running time
  - time is measured in synchronized rounds
- The problem of interest is approximate agreement
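For reference, approximate agreement asks each process to decide a real value such that every decision lies within the range of the inputs (validity) and any two decisions are at most ε apart (ε-agreement). The hypothetical checker below tests those two conditions on a single execution; processes that crashed before deciding are simply absent from `outputs`.

```python
def is_eps_agreement(inputs, outputs, eps):
    """Check validity and eps-agreement for one execution of approximate agreement.

    inputs:  the input values of all processes
    outputs: the decisions of the processes that decided
    eps:     the agreement parameter (eps >= 0)
    """
    lo, hi = min(inputs), max(inputs)
    validity = all(lo <= y <= hi for y in outputs)
    agreement = (max(outputs) - min(outputs) <= eps) if outputs else True
    return validity and agreement


print(is_eps_agreement([0, 1], [0.4, 0.5], 0.2))  # True
print(is_eps_agreement([0, 1], [0.4, 0.7], 0.2))  # False: decisions 0.3 apart
```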
Wait-Free Algorithms Are Not Fast
- A non-fault-tolerant algorithm takes O(1) time:
  - one process writes its input and the rest read it
  - achieves perfect agreement (ε = 0)
- Prove an Ω(log n) time lower bound for wait-free approximate agreement
- So there are problems for which being wait-free in the asynchronous model imposes more than constant additional cost, even when failures do not occur.
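A sketch of the non-fault-tolerant algorithm, assuming one synchronized round for the write and one for the reads (the function name and the `leader_crashes` switch are hypothetical). It shows both halves of the slide's point: the running time is 2 rounds for every n and all decisions are equal, but if the writer crashes, every other process waits forever, so the algorithm is not wait-free.

```python
def leader_writes(inputs, leader_crashes=False):
    """Return (decisions, rounds) for the single-writer algorithm.

    Round 1: process p0 writes its input to a shared register.
    Round 2: every other process reads the register and decides that value.
    leader_crashes is a hypothetical switch modelling p0 crashing before its
    write; the readers would then wait forever, reported here as no decisions.
    """
    register = None  # shared register, initially empty
    if not leader_crashes:
        register = inputs[0]  # round 1: p0 writes
    if register is None:
        return [], float("inf")  # no process can ever decide
    # round 2: p0 decides its own input, everyone else decides what they read
    return [register] * len(inputs), 2


print(leader_writes([3.0, 7.0, 5.0]))                     # ([3.0, 3.0, 3.0], 2)
print(leader_writes([3.0, 7.0, 5.0], leader_crashes=True))  # ([], inf)
```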
Proof Idea
(Figure: processes with input 0, one of which decides 0; a marked process "cannot influence the decision")
Proof Idea
(Figure: an execution in which one process decides 1 and another decides 0)
The Best-Case Cost of Fault-Tolerance
- Formalize the idea of "designing for the normal/common case" [Lampson, "Hints for computer system design"] and show its cost
- The idea of accommodating the worst case while measuring the best/normal/common case has become standard:
  - message cost of consensus in failure-free runs [Halpern, Hadzilacos]
  - contention-free step complexity [Alur, Taubenfeld]
  - obstruction-free step complexity [Ellen, Luchangco, Moir, Shavit]
Interleaving Algorithms
- Also an approximate agreement algorithm matching the Ω(log n) time lower bound
- Interleaves two algorithms:
  - one guarantees fault-tolerance
  - the other guarantees best-case time complexity
- Need to coordinate the results
  - using a virtual two-process approximate agreement algorithm
- Similar applications of interleaving, especially in randomized consensus [Saks, Shavit, Woll]
  - e.g., this morning's session [Aspnes, Attiya, Censor]
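The scheduling half of the interleaving idea can be sketched with Python generators: one process alternates single steps of two algorithms and stops when either finishes. This is only an illustration under assumptions (the function names and toy step counts are hypothetical). It deliberately omits the coordination step, where the slide's algorithm uses a virtual two-process approximate agreement to reconcile outcomes; that step is needed because different processes may see different algorithms finish first.

```python
def steps_then(value, steps):
    """Hypothetical stand-in algorithm: takes `steps` steps, then returns value."""
    for _ in range(steps):
        yield  # one step of local work or one shared-memory access
    return value


def interleave(alg_a, alg_b):
    """Alternate single steps of two generator-based algorithms.

    Returns (name, result) for whichever algorithm finishes first; alg_a gets
    the first step in each round, so it wins ties.
    """
    algorithms = {"a": alg_a, "b": alg_b}
    while True:
        for name, alg in algorithms.items():
            try:
                next(alg)
            except StopIteration as done:
                return name, done.value


# A fast best-case path interleaved with a slower fault-tolerant one.
print(interleave(steps_then("fast path", 2), steps_then("wait-free path", 5)))
# ('a', 'fast path')
```

The cost of interleaving is at most a factor of 2 in steps relative to whichever algorithm would have finished first on its own.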
Application: Replicated Storage
- Yu and Vahdat
- Emulates a shared memory
- Replication-based implementation of wide-area data access services:
  - needs automatic regeneration of failed replicas and reconfiguration of groups
- Probabilistic guarantee: reads may return stale values with a small probability
- Optimizes for the best case:
  - failure-free reconfiguration is quick and cheap
  - failure-induced reconfiguration calls a consensus protocol [Saks, Shavit, Woll] for the replicas to agree on the next configuration
4th Example: Clock Synchronization
- In a distributed system with n nodes that experiences variable message delays, how closely can the nodes' clocks be synchronized?
Clock Synchronization Lower Bound
- Lundelius, Lynch
- No algorithm can synchronize n clocks closer than (1 - 1/n)u, for a clique with the same message-delay uncertainty u on all links (u = max delay - min delay)
- Even if there are no failures and no clock drift
- The proof introduced the shifting technique
(Figure: shift p0 backwards by u)
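For n = 2 the shifting argument is short enough to write out. This is a reconstruction of the standard argument under the slide's assumptions (no failures, no drift, hardware clocks running at rate 1), and the symbols are introduced here: d is the maximum delay, δ denotes a message delay, ε is the skew guaranteed once the algorithm has finished adjusting, and L_i(t) is p_i's logical clock at real time t after that point. Take an execution α in which every message from p0 to p1 has delay d - u and every message from p1 to p0 has delay d, and obtain α' by shifting p0 backwards by u: each of p0's events happens u earlier in real time, and neither process can tell the difference.

```latex
\begin{align*}
\delta_{\alpha'}(p_0 \to p_1) &= (d - u) + u = d, \qquad \delta_{\alpha'}(p_1 \to p_0) = d - u,\\
L_0^{\alpha'}(t) &= L_0^{\alpha}(t + u) = L_0^{\alpha}(t) + u, \qquad L_1^{\alpha'}(t) = L_1^{\alpha}(t),\\
\intertext{so both executions are admissible, and the skew bound must hold in each:}
L_1^{\alpha}(t) - L_0^{\alpha}(t) &\le \varepsilon \quad (\text{in } \alpha),\\
L_0^{\alpha}(t) + u - L_1^{\alpha}(t) &\le \varepsilon \quad (\text{in } \alpha').\\
\intertext{Adding the two inequalities gives}
u &\le 2\varepsilon \quad\Longrightarrow\quad \varepsilon \ge \frac{u}{2} = \Bigl(1 - \frac{1}{2}\Bigr)u .
\end{align*}
```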
What About Other Topologies?
- Halpern, Megiddo, Munshi
- Arbitrary topologies and nonuniform uncertainties
- The adversary's optimal strategy is to maximize a certain quantity:
  - involving neighboring nodes' initial clock values and the delays between them
  - subject to constraints on message uncertainty
- The bound is expressed as a linear program, which is solved using optimization techniques
  - the shifting notion is captured in the linear program
  - not in closed form, except for a few special cases
- The bound is tight
What About Closed-Form Bounds?
- Biaz, Welch
- If uncertainties are symmetric (the same in both directions of a link), then the lower bound is diam/2
  - where diam is the diameter of the graph w.r.t. uncertainties
(Figure: example graph on nodes a to f with symmetric edge uncertainties; diam = 9)
Shifting: Equivalent Clique
- An arbitrary topology G with arbitrary uncertainties is equivalent to the clique G' on the same nodes, where the uncertainty between any two nodes is the length of the shortest path between them in G (w.r.t. uncertainties) [Halpern, Megiddo, Munshi]
- Shift a carefully chosen execution on the clique, for 2 nodes that are diam apart, to get the diam/2 lower bound.
(Figure: a graph G on nodes a to f and its equivalent clique G' with shortest-path uncertainties)
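The equivalent clique G' is the all-pairs shortest-path matrix of G with uncertainties as edge weights, so the closed-form lower bound can be computed directly. A minimal sketch, assuming symmetric uncertainties; it uses Floyd-Warshall (any all-pairs shortest-path method would do), and the function names and example graph are hypothetical, not the one in the figure.

```python
import math


def equivalent_clique(nodes, edges):
    """All-pairs shortest-path uncertainties (Floyd-Warshall).

    edges maps an undirected pair (x, y) to its symmetric uncertainty.
    The result dist[x][y] is the uncertainty on edge {x, y} of the clique G'.
    """
    dist = {x: {y: 0 if x == y else math.inf for y in nodes} for x in nodes}
    for (x, y), u in edges.items():
        dist[x][y] = dist[y][x] = min(dist[x][y], u)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def skew_lower_bound(nodes, edges):
    """diam/2, with diam measured with respect to uncertainties."""
    clique = equivalent_clique(nodes, edges)
    diam = max(clique[x][y] for x in nodes for y in nodes)
    return diam / 2


# Hypothetical example graph (not the one in the figure).
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 4, ("c", "d"): 4}
print(skew_lower_bound(nodes, edges))  # 4.0: a and d are 8 apart
```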
What About Upper Bounds?
- For an arbitrary graph and arbitrary uncertainties, the radius is an upper bound [Halpern, Megiddo, Munshi]
  - since radius ≤ diam, this is within a factor of 2 of the diam/2 lower bound
(Figure: example graph with diam = 9 and radius = 5)
- Tight or almost tight closed-form upper bounds for some specific common topologies with uniform uncertainties [Biaz, Welch]
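Given the equivalent clique's distance matrix, the radius upper bound is the smallest eccentricity. In the hypothetical three-node example below (a clique with uncertainty 1 on every link), the factor-of-2 gap is as large as it gets: diam/2 = 0.5 and radius = 1, while the tight clique bound (1 - 1/n)u from the Lundelius-Lynch result is 2/3, between the two. The function name is an assumption made for the sketch.

```python
def radius_and_diameter(dist):
    """Radius and diameter of a full distance matrix (the clique G').

    dist[i][j] is the shortest-path uncertainty between nodes i and j.
    """
    eccentricity = [max(row) for row in dist]
    return min(eccentricity), max(eccentricity)


# Hypothetical 3-node clique with uncertainty u = 1 on every link.
radius, diam = radius_and_diameter([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
n, u = 3, 1
print(diam / 2, (1 - 1 / n) * u, radius)  # lower bound, exact clique bound, upper bound
```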
External Clock Synchronization
- What about external synchronization, when some clocks have outside time sources?
  - the previous results were for internal synchronization
- The tight bound on how close a node's clock can get to the source time is half the shortest-path distance (w.r.t. uncertainties) from the node to a source [Attiya, Hay, Welch]
(Figure: example graph with sources a and d; the bounds are b: 3/2, c: 1/2, e: 3/2, f: 5/2)
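The external-synchronization bound needs only the shortest-path distance from each node to its nearest source, which a multi-source Dijkstra computes in one pass. A sketch assuming symmetric, nonnegative uncertainties; the function name and the example graph are hypothetical (not the figure's).

```python
import heapq


def external_sync_bounds(adj, sources):
    """Half the shortest-path uncertainty from each node to its nearest source.

    adj maps each node to a list of (neighbor, uncertainty) pairs.
    sources are the nodes with access to an outside time source.
    """
    dist = {v: float("inf") for v in adj}
    heap = []
    for s in sources:
        dist[s] = 0
        heap.append((0, s))
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, u in adj[v]:
            if d + u < dist[w]:
                dist[w] = d + u
                heapq.heappush(heap, (d + u, w))
    return {v: dist[v] / 2 for v in adj}


adj = {
    "a": [("b", 3), ("e", 5)],
    "b": [("a", 3), ("c", 2)],
    "c": [("b", 2), ("d", 1)],
    "d": [("c", 1), ("e", 4)],
    "e": [("d", 4), ("a", 5)],
}
print(external_sync_bounds(adj, ["a", "d"]))
```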
Optimal Synchronization per Execution
- Given the information collected in a specific execution by some algorithm (strategy), find the tightest possible synchronization:
  - internal synchronization, offline algorithm [Attiya, Herzberg, Rajsbaum]
  - external synchronization, online algorithm [Patt-Shamir, Rajsbaum]
    - extended to handle clock drift [Ostrovsky, Patt-Shamir]
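In the simplest case, two drift-free nodes p and q exchanging timestamped messages whose delays are known to lie in [min_delay, max_delay], the offsets consistent with an execution form an interval, and correcting to its midpoint gives the smallest worst-case error for that execution. This two-node sketch only illustrates the per-execution idea; it is not the algorithm of any of the cited papers, and all names are hypothetical.

```python
def offset_interval(p_to_q, q_to_p, min_delay, max_delay):
    """Tightest interval for theta = C_q - C_p implied by observed messages.

    p_to_q: (send time on p's clock, receive time on q's clock) pairs
    q_to_p: (send time on q's clock, receive time on p's clock) pairs
    Assumes drift-free clocks and every delay in [min_delay, max_delay];
    lo > hi would mean the delay assumptions were violated.
    """
    lo, hi = float("-inf"), float("inf")
    for s, r in p_to_q:  # r - s = delay + theta
        lo = max(lo, r - s - max_delay)
        hi = min(hi, r - s - min_delay)
    for s, r in q_to_p:  # r - s = delay - theta
        lo = max(lo, s - r + min_delay)
        hi = min(hi, s - r + max_delay)
    return lo, hi


lo, hi = offset_interval(p_to_q=[(0, 12)], q_to_p=[(20, 11)], min_delay=1, max_delay=3)
estimate, error = (lo + hi) / 2, (hi - lo) / 2
print(lo, hi, estimate, error)  # 10 11 10.5 0.5
```

Each additional message can only shrink the interval, which is why the tightest achievable synchronization depends on the particular execution.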
Gradient Clock Synchronization
- The clock skew between any pair of nodes should be a function of the distance between them [Fan, Lynch]
(Figure: network on nodes a to f; the clocks of a and d need not be as tightly synchronized as those of a and b)
Gradient Clock Synchronization
- Motivated by problems in sensor networks, or more generally large-scale networks, where nodes in the same locality need to be more tightly synchronized:
  - data fusion
  - target tracking
(Image: http://www.mikalac.com/mis/missile.html)
Gradient Clock Synch Lower Bound
- The closest that two neighboring nodes' clocks can get (in the worst case) is Ω(log D / log log D)
  - D is the diameter of the network (global influence)
- Algorithms requiring a fixed maximum skew for nearby nodes may not scale well
  - e.g., TDMA
(Image: http://www.dsna-dti.aviation-civile.gouv.fr/actualities/revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html)
Gradient Clock Synch Lower Bound: Assumption 1
- Nonzero clock drift: (hardware) clocks can run fast or slow, within known bounds
Gradient Clock Synch Lower Bound: Assumption 2
- The algorithm must ensure that (logical) clocks always increase at some minimum positive rate
(Figure: logical clock time plotted against real time; the slope never drops below the minimum rate)
Gradient Clock Synch LB: Simple Case
- Consider a simple algorithm in which the clock value of p1 is periodically propagated down the chain
- We can construct an execution in which pn-1's new clock value is larger than pn's old clock value by an amount depending on D:
  - carefully choose message delays
  - manipulate clock drift rates
  - cause nodes to suddenly jump to higher values without synchronizing with their neighbors
- The insight in the paper is generalizing this to any algorithm
Is the Lower Bound Tight?
- Recall that the lower bound is Ω(log D / log log D)
- Several pre-existing algorithms have O(D)
- Then the upper bound was improved to O(√D) [Locher, Wattenhofer]
- Recently the upper bound was improved to O(log D) [Lenzen, Locher, Wattenhofer]
- Still a small gap: can the lower bound be improved?
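Ignoring constant factors, the bounds on this slide can be compared numerically. The sketch below (hypothetical helper, natural logarithms, D assumed large enough that log log D > 0) also makes the size of the remaining gap explicit: the O(log D) upper bound divided by the Ω(log D / log log D) lower bound is exactly log log D, which grows extremely slowly.

```python
import math


def gradient_skew_bounds(D):
    """Growth rates of the known bounds on local skew, constants dropped."""
    if D <= math.e:
        raise ValueError("need D > e so that log log D is positive")
    log_d = math.log(D)
    return {
        "lower: log D / log log D": log_d / math.log(log_d),
        "upper: log D": log_d,
        "upper: sqrt D": math.sqrt(D),
        "upper: D": float(D),
    }


for D in (10**3, 10**6, 10**9):
    b = gradient_skew_bounds(D)
    gap = b["upper: log D"] / b["lower: log D / log log D"]
    print(D, {k: round(v, 1) for k, v in b.items()}, "gap:", round(gap, 2))
```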
How Long Can a Large Difference Last?
- In the simple diffusion algorithm on the chain, a large difference between pn-1 and pn lasts only while the message is in transit
- Perhaps the difficulties could be avoided by keeping track of the generation of each clock value and only comparing apples with apples (clocks of the same generation)?
  - but this could be complicated
And There's a Lot More
- Lower bounds on space for mutual exclusion [Burns, Lynch]
- Lower bound on the number of messages for leader election in synchronous rings [Frederickson, Lynch]
- Impossibility results for the data link layer and connection management [Fekete, Lynch, Mansour, Spinelli; Kleinberg, Attiya, Lynch]
- Lower bound on time for consensus in partially synchronous models [Attiya, Dwork, Lynch, Stockmeyer]
- Lower bound on time for synchronous k-set agreement [Chaudhuri, Herlihy, Lynch, Tuttle]
- Tradeoff between safety and liveness for randomized coordinated attack [Varghese, Lynch]
- Impossibility of boosting fault tolerance [Attie, Guerraoui, Kouznetsov, Lynch, Rajsbaum]
Final Observations
- Strive to make the results relevant
- Natural problems
- Practical architectural assumptions
- Realistic performance measures (for lower bounds)
- Crisp arguments (ingenious but clear)
- Easy to understand and verify
- Simple to extend and lead to follow-ups
Take-Home Message
- Impossibility results help the development of the area
- Understanding inherent limits guides efforts in the appropriate directions
- And setting boundaries is good for everyone
Thanks for your attention