A World of (Im)Possibilities. Nancy Lynch Celebration: Sixty and Beyond

Provided by: jennife397

1
A World of (Im)Possibilities
Nancy Lynch Celebration: Sixty and Beyond

Hagit Attiya, Technion
Jennifer Welch, Texas A&M University

2
Introduction

One of the main themes of Nancy's work has been
proving lower bounds and impossibility results
for problems that arise in distributed computing.
Overview of some of Nancy's results
Less known results, hidden gems closer to our
hearts
Emphasize their meaning and implications
How they influenced the development of the field
and of distributed systems
Concentrating on their positive impact

3
Best-Known Example: FLP

Impossibility of asynchronous fault-tolerant
consensus
[Fischer, Lynch, Paterson]
Motivated work on
strengthening models of computation
partially synchronous models
[Dwork, Lynch, Stockmeyer]
unreliable failure detectors
[Chandra, Toueg]
weakening the problem definition
k-set agreement [Chaudhuri]
renaming [Attiya et al.]
condition-based approaches [Raynal,
Rajsbaum et al.]

4
FLP Impact

Related practical problems
transaction commit
leader election
atomic broadcast
maintaining consistent replicated data
The wait-free hierarchy (classify concurrent
abstract data types) [Herlihy]
Attempts to solve k-set agreement and renaming
led to the application of topology in distributed
computing.
[Chaudhuri] [Borowsky, Gafni] [Saks,
Zaharoglou] [Herlihy, Shavit]

5
2nd Example: Brewer's Conjecture

[Brewer, PODC 2000 invited talk]
A web service cannot provide all three
guarantees
Consistency
Availability
Partition-tolerance

6
What Does This Mean?

[Gilbert, Lynch, SIGACT News 2002]
A web service cannot provide all three
guarantees
Consistency: atomicity of (read / write)
operations
Availability: request by nonfaulty client gets
response
Partition-tolerance: even when lost messages
create two partitioned components in the network
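
A toy Python sketch of the scenario these three guarantees rule out. The `Replica` class, its methods, and the `partitioned` flag are hypothetical names chosen for illustration; they are not taken from the Gilbert and Lynch paper. The sketch assumes each replica must answer from local state without waiting for the other side (availability), so a write that completes on one side of a partition is invisible to a later read on the other side.

```python
class Replica:
    """One side of a two-node replicated register (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.value = 0      # initial value of the register
        self.peer = None    # the other replica

    def write(self, value, partitioned):
        # Availability: the write must complete even if the peer is unreachable.
        self.value = value
        if not partitioned:
            self.peer.value = value  # replication message is delivered
        # If partitioned, the replication message is lost.

    def read(self):
        # Availability: answer from local state without waiting for the peer.
        return self.value


def run(partitioned):
    p0, p1 = Replica("p0"), Replica("p1")
    p0.peer, p1.peer = p1, p0
    p0.write(1, partitioned)   # the write completes at p0
    return p1.read()           # a later read at p1


print(run(partitioned=False))  # 1: the read sees the completed write
print(run(partitioned=True))   # 0: stale read, so atomicity is violated
```

In the partitioned run the only ways to avoid the stale read are to block the write or the read (giving up availability) or to assume the partition never happens.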

7
Proof Idea

adapted from [Attiya, Bar-Noy, Dolev]

[Figure: p0 and p1 separated by lost messages (marked X); p1 reads 0, a contradiction]

8
Brewer's Conjecture: Implications

Traditional database services maintain
consistency and fail to provide availability in
the face of partitions
Relax the consistency guarantees of the web
service
Sometimes miss values or return stale data
(Internet queries)
PIER [Huebsch, Hellerstein, Lanham, Loo,
Shenker, Stoica]
Allow partitions to evolve separately, and build
mechanisms to cope when this happens (stream
processing)
Medusa [Balazinska, Balakrishnan, Stonebraker]
Sacrifice availability, but not often (stream
processing)
BOREALIS [Balazinska, Balakrishnan, Madden,
Stonebraker]
Assume a mechanism to guard against partitions
CQ [Shah, Hellerstein, Brewer]

9
3rd Example: Best-Case Cost of Fault-Tolerant
Algorithms

Does making an algorithm fault-tolerant incur
a cost even when the system is well-behaved?
Previous investigation focused on the synchronous
case
early stopping algorithms for consensus: 2
rounds vs. 1 round for non-fault-tolerant
algorithm
[Dolev, Reischuk, Strong] [Dwork, Moses]
[Moses, Tuttle]
non-blocking commit: twice as many rounds as for
blocking commit
[Dwork, Skeen]
What about the asynchronous case?

10
Are Wait-Free Algorithms Fast?

[Attiya, Lynch, Shavit]
Studies the best-case complexity of an algorithm
When there are no failures, although algorithm
can tolerate any number of crashes (is wait-free)
When the execution is synchronized, although the
algorithm works in asynchronous executions also
Complexity measure of interest is running time
Time is measured by synchronized rounds
Problem of interest is approximate agreement

[Figure: n = 6]

11
Wait-Free Algorithms are not Fast

A non-fault-tolerant algorithm takes O(1) time:
one process writes its input and the rest read it
achieves perfect agreement (ε = 0)
Prove an Ω(log n) time lower bound for wait-free
approximate agreement
So there are problems for which being wait-free
in the asynchronous model imposes more than
constant additional cost even when failures do
not occur.
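
A minimal Python sketch of the non-fault-tolerant algorithm described on this slide, modelled as a sequential, failure-free run. The names `trivial_agreement`, `shared_register`, and `spread` are hypothetical. The sketch is deliberately not wait-free: in a real system the readers would wait forever if the writer crashed before writing.

```python
def trivial_agreement(inputs):
    """Failure-free, non-fault-tolerant agreement (illustrative sketch).

    Round 1: process 0 writes its input to a shared register.
    Round 2: every process reads the register and decides that value.
    """
    shared_register = inputs[0]                # the single write by process 0
    return [shared_register for _ in inputs]   # every process reads and decides


def spread(decisions):
    # Largest distance between any two decisions (the epsilon achieved).
    return max(decisions) - min(decisions)


inputs = [0.0, 0.3, 0.9, 1.0, 0.5, 0.2]
decisions = trivial_agreement(inputs)
print(decisions)
print(spread(decisions))   # 0.0: perfect agreement
```

The number of rounds is constant regardless of the number of processes, which is the baseline against which the wait-free lower bound is compared.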

12
Proof Idea

[Figure: processes labeled 0, with the outcome labeled "decide 0"; caption: this process cannot influence the decision]

13
Proof Idea

[Figure: a process labeled 1; labels "decide 1?" and "decide 0"]

14
The Best-Case Cost of Fault-Tolerance

Formalize the idea of "designing for the normal /
common case" and show its cost
[Lampson, "Hints for computer system design"]
The idea of accommodating the worst case while
measuring the best / normal / common case has
become standard.
message cost of consensus in failure-free runs
[Halpern, Hadzilacos]
contention-free step complexity
[Alur, Taubenfeld]
obstruction-free step complexity
[Ellen, Luchangco, Moir, Shavit]

15
Interleaving Algorithms

Also an approximate agreement algorithm matching
the Ω(log n) time lower bound
Interleaves two algorithms
One guarantees fault-tolerance
Another guarantees best-case time complexity
Need to coordinate results
Using a virtual two-process approximate
agreement algorithm
Similar applications of interleaving, especially
in randomized consensus [Saks, Shavit, Woll]
E.g., this morning's session [Aspnes,
Attiya, Censor]

16
Application: Replicated Storage

[Yu and Vahdat]
Emulates a shared memory
Replication-based implementation of wide-area
data access services
need automatic regeneration of failed replicas
and reconfiguration of groups
Probabilistic guarantee: reads may return stale
values with a small probability
Optimizes for best case
Failure-free reconfiguration is quick and cheap
Failure-induced reconfiguration calls a consensus
protocol [Saks, Shavit, Woll] for replicas to
agree on next configuration

17
4th Example: Clock Synchronization

In a distributed system with n nodes that
experiences variable message delays, how closely
can the nodes' clocks be synchronized?

18
Clock Synchronization Lower Bound

[Lundelius, Lynch]
No algorithm can synchronize n clocks closer than
(1-1/n)u
For a clique with same
message delay uncertainty u on all links (u =
max delay - min delay)
Even if no failures and no clock drift
Proof introduced the shifting technique
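
A short worked example of the (1-1/n)u bound in Python. The function name `clique_lower_bound` is hypothetical; exact fractions are used so the values print without rounding.

```python
from fractions import Fraction


def clique_lower_bound(n, u):
    """Lower bound (1 - 1/n) * u on the achievable skew in an n-node clique."""
    return (1 - Fraction(1, n)) * u


# With uncertainty u = 1, the bound approaches u as n grows.
for n in (2, 3, 10, 100):
    print(n, clique_lower_bound(n, 1))
```

For two nodes the bound is u/2, and for large n it approaches the full uncertainty u.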

[Figure: shift p0 backwards by u]

19
What About Other Topologies?

[Halpern, Megiddo, Munshi]
Arbitrary topologies and nonuniform uncertainties
Adversary's optimal strategy is to maximize a
certain quantity
involving neighboring nodes' initial clock values
and the delays between them
subject to constraints on message uncertainty
Bound is expressed as a system of equations, and
this linear program is solved using optimization
techniques
Shifting notion is captured in the linear program
Not in closed form except for a few special cases
Bound is tight

20
What About Closed Form Bounds?

[Biaz, Welch]
If uncertainties are symmetric (same in both
directions of a link), then lower bound is
diam/2
where diam is diameter of the graph w.r.t.
uncertainties

[Figure: example graph on nodes a to f with edge uncertainties; diam = 9]

21
Shifting: Equivalent Clique

Arbitrary topology G with arbitrary uncertainties
is equivalent to clique G' with same nodes where
uncertainty between any two nodes is length of
shortest path between them in G (w.r.t.
uncertainties)
[Halpern, Megiddo, Munshi]
Shift a carefully chosen execution on the
clique, for 2 nodes diam apart to get the
diam/2 lower bound.
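
The equivalent clique can be computed with any all-pairs shortest-path routine. The sketch below uses Floyd-Warshall in Python on a small hypothetical four-node graph (not the graph from the slide's figure); the function name `equivalent_clique` and the edge weights are assumptions made for illustration, and uncertainties are taken to be symmetric.

```python
def equivalent_clique(nodes, edges):
    """All-pairs shortest paths (Floyd-Warshall) w.r.t. symmetric uncertainties."""
    inf = float("inf")
    dist = {(x, y): (0 if x == y else inf) for x in nodes for y in nodes}
    for (x, y), u in edges.items():
        dist[x, y] = dist[y, x] = min(dist[x, y], u)
    for k in nodes:            # intermediate node must be the outermost loop
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist


# Hypothetical graph: the direct link a-d is worse than the path a-b-c-d.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 3, ("a", "d"): 10}

clique = equivalent_clique(nodes, edges)
diam = max(clique.values())
print(clique["a", "d"])   # 6: uncertainty between a and d in the clique
print(diam / 2)           # 3.0: the diam/2 lower bound for this graph
```

Here a and d are the two nodes that are diam apart, so they are the pair to which the shifting argument would be applied.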

[Figure: graph G on nodes a to f with edge uncertainties, next to its equivalent clique]

22
What About Upper Bounds?

For arbitrary topology and arbitrary uncertainties, the
radius is an upper bound [Halpern, Megiddo,
Munshi]
Since radius ≤ diam, within factor of 2

diam = 9, radius = 5

Tight or almost tight closed form upper bounds for
some specific common topologies with uniform
uncertainties [Biaz, Welch]
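
The factor-of-2 claim can be spelled out by chaining the two bounds. The symbol \varepsilon^{*} for the best achievable skew is notation introduced here, not in the slides, and the left inequality assumes symmetric uncertainties, as in the diam/2 lower bound:

```latex
\frac{\mathrm{diam}}{2} \;\le\; \varepsilon^{*} \;\le\; \mathrm{radius} \;\le\; \mathrm{diam} \;=\; 2 \cdot \frac{\mathrm{diam}}{2}
```

With the example values diam = 9 and radius = 5, this gives 4.5 \le \varepsilon^{*} \le 5.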

23
External Clock Synchronization

What about external synchronization, when some
clocks have outside time sources?
Previous results were for internal synchronization
The tight bound on how close a node's clock can
get to the source time is half the shortest path
distance (w.r.t. uncertainties) from the node to
a source
[Attiya, Hay, Welch]
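
A sketch of how the per-node bound could be computed: run a multi-source shortest-path search from all sources and halve the result. The function name `source_distance_bounds` and the path graph below are hypothetical; the edge weights were chosen for illustration and are not the graph from the slide's figure.

```python
import heapq


def source_distance_bounds(edges, sources):
    """Half the shortest-path distance from each node to its nearest source."""
    adj = {}
    for (x, y), u in edges.items():
        adj.setdefault(x, []).append((y, u))
        adj.setdefault(y, []).append((x, u))

    # Multi-source Dijkstra: every source starts at distance 0.
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, float("inf")):
            continue  # stale heap entry
        for y, u in adj.get(x, []):
            nd = d + u
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return {x: d / 2 for x, d in dist.items()}


# Hypothetical path a - b - c - d - e - f with sources a and d.
edges = {("a", "b"): 3, ("b", "c"): 2, ("c", "d"): 1,
         ("d", "e"): 3, ("e", "f"): 2}
bounds = source_distance_bounds(edges, sources=["a", "d"])
for node in sorted(bounds):
    print(node, bounds[node])
```

On this graph the sources get bound 0, and b, c, e, f get 1.5, 0.5, 1.5, 2.5 respectively.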

[Figure: example graph on nodes a to f with edge uncertainties; a and d are sources; bounds are b: 3/2, c: 1/2, e: 3/2, f: 5/2]

24
Optimal Synchronization Per Execution

Given information collected in a specific
execution, by some algorithm strategy, find the
tightest possible synchronization
internal synchronization, offline algorithm
[Attiya, Herzberg, Rajsbaum]
external synchronization, online algorithm
[Patt-Shamir, Rajsbaum]
extended to handle clock drift
[Ostrovsky, Patt-Shamir]

25
Gradient Clock Synchronization

The clock skew between any pair of nodes should
be a function of the distance between them
[Fan, Lynch]

[Figure: graph on nodes a to f; clocks of a and d need not be as tightly synch'ed as those of a and b]

26
Gradient Clock Synchronization

motivated by problems in sensor networks, or
more generally, large scale networks, where
nodes in the same locality need to be more
tightly synchronized
data fusion
target tracking

[Image source: http://www.mikalac.com/mis/missile.html]

27
Gradient Clock Synch Lower Bound

Closest that two nodes' clocks can get (in worst
case) is Ω(log D / log log D)
D is diameter of network ⇒ global influence
Algorithms requiring a fixed maximum skew for
nearby nodes may not scale well
E.g., TDMA

[Image source: http://www.dsna-dti.aviation-civile.gouv.fr/actualities/revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html]

28
Gradient Clock Synch Lower Bound: Assumption 1

Nonzero clock drift: (hardware) clocks can run
fast or slow, within known bounds

29
Gradient Clock Synch Lower Bound: Assumption 2

Algorithm must ensure that (logical) clocks
always increase at some minimum positive rate

[Figure: logical clock time plotted against real time, with the minimum slope marked]

30
Gradient Clock Synch LB: Simple Case

Consider a simple algorithm in which the clock
value of p1 is periodically propagated down the
chain
Can construct execution in which pn-1's new clock
value is larger than pn's old clock value by an
amount depending on D
carefully choose message delays
manipulate clock drift rates
cause nodes to suddenly jump to higher values
without synchronizing with their neighbors
Insight in the paper is generalizing this to any
algorithm

31
Is the Lower Bound Tight?

Recall lower bound is Ω(log D / log log D)
Several pre-existing algorithms have O(D)
Then upper bound improved to O(√D)
[Locher, Wattenhofer]
Recently upper bound improved to O(log D)
[Lenzen, Locher, Wattenhofer]
Still a small gap: can the lower bound be
improved?
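
To get a feel for the size of the gap, the Python sketch below tabulates the four growth rates for a few diameters. It assumes base-2 logarithms and sets every hidden constant to 1, neither of which comes from the slides, so only the relative growth is meaningful. The function name `skew_bounds` is hypothetical, and D must be at least 4 for log log D to be positive.

```python
import math


def skew_bounds(D):
    """Growth rates quoted above, with all hidden constants set to 1 (an assumption)."""
    log_d = math.log2(D)
    return {
        "lower: log D / log log D": log_d / math.log2(log_d),
        "upper: log D": log_d,
        "upper: sqrt D": math.sqrt(D),
        "upper: D": float(D),
    }


for D in (16, 256, 65536):
    print(D, {name: round(value, 2) for name, value in skew_bounds(D).items()})
```

For D = 65536 the lower-bound expression evaluates to 4 and the O(log D) expression to 16, while the older bounds give 256 and 65536.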

32
How Long Can Large Difference Last?

In the simple diffusion algorithm on the chain,
large difference between pn-1 and pn only lasts
while message is in transit
Perhaps difficulties could be avoided by keeping
track of generation of clock value and only
comparing apples with apples (clocks of the same
generation)?
but this could be complicated

33
And There's a Lot More