A World of ImPossibilities Nancy Lynch Celebration: Sixty and Beyond - PowerPoint PPT Presentation


1
A World of (Im)Possibilities
Nancy Lynch Celebration: Sixty and Beyond
  • Hagit Attiya, Technion
  • Jennifer Welch, Texas A&M University

2
Introduction
  • One of the main themes of Nancy's work has been
    proving lower bounds and impossibility results
    for problems that arise in distributed computing.
  • Overview of some of Nancy's results
  • Lesser-known results, hidden gems closer to our
    hearts
  • Emphasize their meaning and implications
  • How they influenced the development of the field
    and of distributed systems
  • Concentrating on their positive impact

3
Best-Known Example: FLP
  • Impossibility of asynchronous fault-tolerant
    consensus
  • [Fischer, Lynch, Paterson]
  • Motivated work on
      • strengthening models of computation
          • partially synchronous models
            [Dwork, Lynch, Stockmeyer]
          • unreliable failure detectors
            [Chandra, Toueg]
      • weakening the problem definition
          • k-set agreement [Chaudhuri]
          • renaming [Attiya et al.]
          • condition-based approaches [Raynal,
            Rajsbaum et al.]

4
FLP Impact
  • Related practical problems
      • transaction commit
      • leader election
      • atomic broadcast
      • maintaining consistent replicated data
  • The wait-free hierarchy (classify concurrent
    abstract data types) [Herlihy]
  • Attempts to solve k-set agreement and renaming
    led to the application of topology in distributed
    computing.
      • [Chaudhuri] [Borowsky, Gafni] [Saks,
        Zaharoglou] [Herlihy, Shavit]

5
2nd Example: Brewer's Conjecture
  • [Brewer, PODC 2000 invited talk]
  • A web service cannot provide all three
    guarantees
      • Consistency
      • Availability
      • Partition-tolerance

6
What Does This Mean?
  • [Gilbert, Lynch, SIGACT News 2002]
  • A web service cannot provide all three
    guarantees
      • Consistency: atomicity of (read / write)
        operations
      • Availability: a request by a nonfaulty client
        gets a response
      • Partition-tolerance: even when lost messages
        create two partitioned components in the network

7
Proof Idea
  • adapted from [Attiya, Bar-Noy, Dolev]

[Figure: messages between p0 and p1 are lost (marked X); p1 reads 0; contradiction]

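The shape of the argument can be mimicked with a toy model. This is an illustration only, not the construction from the Gilbert and Lynch paper, and the names `Replica` and `run_partitioned` are hypothetical. Two replicas hold a register that is initially 0 and every message between them is lost. After p0 writes 1, an available p1 has to answer from its local state and returns the stale 0, while a p1 that refuses to risk a stale answer never responds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """One side of the partition, holding a local copy of the register."""
    value: int = 0


def run_partitioned(available: bool) -> Optional[int]:
    """p0 writes 1, then p1 reads, while all messages between them are lost.

    Returns the value p1 reads, or None if p1 never responds.
    """
    p0, p1 = Replica(), Replica()

    # The write completes at p0; the update sent to p1 is dropped.
    p0.value = 1

    if available:
        # p1 must respond using only what it knows: the initial value.
        return p1.value
    # A replica that insists on consistency has nothing safe to return.
    return None


stale = run_partitioned(available=True)
blocked = run_partitioned(available=False)
print(stale, blocked)
```

Under the partition, the first run gives up consistency (the read misses the completed write) and the second gives up availability.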
8
Brewer's Conjecture: Implications
  • Traditional database services maintain
    consistency and fail to provide availability in
    the face of partitions
  • Relax the consistency guarantees of the web
    service
  • Sometimes miss values or return stale data
    (Internet queries)
      • PIER [Huebsch, Hellerstein, Lanham, Loo,
        Shenker, Stoica]
  • Allow partitions to evolve separately, and build
    mechanisms to cope when this happens (stream
    processing)
      • Medusa [Balazinska, Balakrishnan, Stonebraker]
  • Sacrifice availability, but not often (stream
    processing)
      • BOREALIS [Balazinska, Balakrishnan, Madden,
        Stonebraker]
  • Assume a mechanism to guard against partitions
      • CQ [Shah, Hellerstein, Brewer]

9
3rd Example: Best-Case Cost of Fault-Tolerant
Algorithms
  • Does making an algorithm fault-tolerant incur
    a cost even when the system is well-behaved?
  • Previous investigation focused on the synchronous
    case
      • early stopping algorithms for consensus: 2
        rounds vs. 1 round for a non-fault-tolerant
        algorithm
        [Dolev, Reischuk, Strong] [Dwork, Moses]
        [Moses, Tuttle]
      • non-blocking commit: twice as many rounds as for
        blocking commit
        [Dwork, Skeen]
  • What about the asynchronous case?

10
Are Wait-Free Algorithms Fast?
  • [Attiya, Lynch, Shavit]
  • Studies the best-case complexity of an algorithm
      • When there are no failures, although the algorithm
        can tolerate any number of crashes (is wait-free)
      • When the execution is synchronized, although the
        algorithm also works in asynchronous executions
  • Complexity measure of interest is running time
      • Time is measured by synchronized rounds
  • Problem of interest is approximate agreement

[Figure label: n = 6]

11
Wait-Free Algorithms are not Fast
  • A non-fault-tolerant algorithm takes O(1) time
      • one process writes its input and the rest read it
      • achieves perfect agreement (ε = 0)
  • Prove an Ω(log n) time lower bound for wait-free
    approximate agreement
  • So there are problems for which being wait-free
    in the asynchronous model imposes more than
    constant additional cost even when failures do
    not occur.
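The constant-time, non-fault-tolerant algorithm mentioned above can be sketched in a few lines. This is a sequential simulation under the assumption of a failure-free, synchronized run; the function names and the sample inputs are hypothetical.

```python
from typing import List


def non_fault_tolerant_agreement(inputs: List[float]) -> List[float]:
    """Two synchronized rounds: process 0 writes, then every process reads.

    Not wait-free: if process 0 crashed before writing, the readers would
    have nothing to read and would wait forever.
    """
    register = inputs[0]                 # round 1: the single write
    return [register for _ in inputs]    # round 2: each process decides what it read


def spread(values: List[float]) -> float:
    """Largest difference between any two values."""
    return max(values) - min(values)


inputs = [0.0, 0.3, 1.0, 0.7, 0.2, 0.9]   # hypothetical inputs for 6 processes
decisions = non_fault_tolerant_agreement(inputs)
print(spread(decisions))
```

All decisions coincide, so the spread is zero after a constant number of rounds. The lower bound says that a wait-free algorithm cannot match this, even in runs where nothing fails.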

12
Proof Idea
[Figure: every process shown has value 0 and the decision is 0; the marked process cannot influence the decision]

13
Proof Idea
[Figure: a value 1 appears; labels "decide 1" and "decide 0"]

14
The Best-Case Cost of Fault-Tolerance
  • Formalize the idea of "designing for the normal /
    common case" and show its cost
      • [Lampson, "Hints for computer system design"]
  • The idea of accommodating the worst case while
    measuring the best / normal / common case has
    become standard.
      • message cost of consensus in failure-free runs
        [Halpern, Hadzilacos]
      • contention-free step complexity
        [Alur, Taubenfeld]
      • obstruction-free step complexity
        [Ellen, Luchangco, Moir, Shavit]

15
Interleaving Algorithms
  • Also an approximate agreement algorithm matching
    the Ω(log n) time lower bound
  • Interleaves two algorithms
      • One guarantees fault-tolerance
      • Another guarantees best-case time complexity
  • Need to coordinate results
      • Using a virtual two-process approximate
        agreement algorithm
  • Similar applications of interleaving, especially
    in randomized consensus [Saks, Shavit, Woll]
      • E.g., this morning's session [Aspnes,
        Attiya, Censor]

16
Application: Replicated Storage
  • [Yu and Vahdat]
  • Emulates a shared memory
  • Replication-based implementation of wide-area
    data access services
      • need automatic regeneration of failed replicas
        and reconfiguration of groups
  • Probabilistic guarantee: reads may return stale
    values with a small probability
  • Optimizes for the best case
      • Failure-free reconfiguration is quick and cheap
      • Failure-induced reconfiguration calls a consensus
        protocol [Saks, Shavit, Woll] for replicas to
        agree on the next configuration

17
4th Example: Clock Synchronization
  • In a distributed system with n nodes that
    experiences variable message delays, how closely
    can the nodes' clocks be synchronized?

18
Clock Synchronization Lower Bound
  • [Lundelius, Lynch]
  • No algorithm can synchronize n clocks closer than
    (1 - 1/n)u
      • For a clique with the same message delay
        uncertainty u on all links
        (u = max delay - min delay)
      • Even if there are no failures and no clock drift
  • Proof introduced the shifting technique

[Figure: shift p0 backwards by u]

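To make the bound and the shift concrete, here is a minimal sketch. The delay bounds, the two-node execution and the function names are hypothetical. It assumes the usual reading of a shift: moving all of one node's events earlier in real time leaves that node's local view unchanged but lengthens the delay of every message it sends and shortens the delay of every message it receives.

```python
from fractions import Fraction
from typing import Dict, Tuple

Delays = Dict[Tuple[int, int], int]   # (sender, receiver) -> message delay


def clique_lower_bound(n: int, min_delay: int, max_delay: int) -> Fraction:
    """(1 - 1/n) * u for a clique with uniform uncertainty u."""
    u = Fraction(max_delay - min_delay)
    return (1 - Fraction(1, n)) * u


def shift_backwards(delays: Delays, node: int, amount: int) -> Delays:
    """Delays after moving every event of `node` `amount` earlier in real time."""
    shifted = {}
    for (src, dst), delay in delays.items():
        if src == node:
            delay += amount    # sent earlier, received at the same time
        elif dst == node:
            delay -= amount    # sent at the same time, received earlier
        shifted[(src, dst)] = delay
    return shifted


MIN_DELAY, MAX_DELAY = 1, 3            # hypothetical bounds, so u = 2
u = MAX_DELAY - MIN_DELAY

# Messages from p0 are as fast as allowed, messages to p0 as slow as allowed.
delays = {(0, 1): MIN_DELAY, (1, 0): MAX_DELAY}
shifted = shift_backwards(delays, node=0, amount=u)

print(shifted)
print(clique_lower_bound(2, MIN_DELAY, MAX_DELAY))
```

Every delay in the shifted execution is still within the allowed range, so it is a legal execution that no node can tell apart from the original, even though p0's clock is now off by u relative to real time.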
19
What About Other Topologies?
  • [Halpern, Megiddo, Munshi]
  • Arbitrary topologies and nonuniform uncertainties
  • Adversary's optimal strategy is to maximize a
    certain quantity
      • involving neighboring nodes' initial clock values
        and the delays between them
      • subject to constraints on message uncertainty
  • Bound is expressed as a system of equations, and
    the resulting linear program is solved using
    optimization techniques
  • Shifting notion is captured in the linear program
  • Not in closed form except for a few special cases
  • Bound is tight

20
What About Closed Form Bounds?
  • [Biaz, Welch]
  • If uncertainties are symmetric (same in both
    directions of a link), then the lower bound is
    diam/2
      • where diam is the diameter of the graph w.r.t.
        uncertainties

[Figure: example graph on nodes a through f with edge uncertainties; diam = 9]

21
Shifting Equivalent Clique
  • Arbitrary topology G with arbitrary uncertainties
    is equivalent to clique G' with the same nodes, where
    the uncertainty between any two nodes is the length of
    the shortest path between them in G (w.r.t.
    uncertainties)
      • [Halpern, Megiddo, Munshi]
  • Shift a carefully chosen execution on the
    clique, for 2 nodes diam apart, to get the
    diam/2 lower bound.

[Figure: example weighted graph on nodes a through f, with shortest-path uncertainties]

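The equivalent clique and the resulting diam/2 bound can be computed with an all-pairs shortest-path pass. The sketch below uses Floyd-Warshall on a small hypothetical graph (not the one in the figure) and assumes symmetric uncertainties; the function name is an assumption as well.

```python
from itertools import product
from typing import Dict, List, Tuple


def equivalent_clique(nodes: List[str],
                      uncertainty: Dict[Tuple[str, str], float]
                      ) -> Dict[Tuple[str, str], float]:
    """Shortest-path uncertainty between every pair of nodes (Floyd-Warshall)."""
    inf = float("inf")
    dist = {(x, y): (0 if x == y else inf) for x, y in product(nodes, repeat=2)}
    for (x, y), u in uncertainty.items():
        # Symmetric uncertainties: the same value in both directions.
        dist[x, y] = dist[y, x] = min(dist[x, y], u)
    # product() varies its first component slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    return dist


# Hypothetical example graph with link uncertainties.
nodes = ["a", "b", "c", "d"]
links = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5, ("c", "d"): 4}

clique = equivalent_clique(nodes, links)
diam = max(clique.values())
print(clique["a", "c"], diam, diam / 2)
```

Here the direct link a-c has uncertainty 5, but the clique uses the shorter path through b, and the lower bound is half the largest entry of the clique.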
22
What About Upper Bounds?
  • For an arbitrary topology with arbitrary
    uncertainties, the radius is an upper bound
    [Halpern, Megiddo, Munshi]
      • Since radius ≤ diam, within a factor of 2
  • Tight or almost tight closed-form upper bounds for
    some specific common topologies with uniform
    uncertainties [Biaz, Welch]

[Figure: example graph with diam = 9, radius = 5]
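The factor of 2 follows by chaining the two bounds. As a worked check, take the values diam = 9 and radius = 5 quoted on this slide and assume symmetric uncertainties so that the diam/2 lower bound applies. The symbol ε* for the best achievable skew is notation introduced here, not taken from the slides.

```latex
\[
\frac{\mathrm{diam}}{2} \;\le\; \varepsilon^{*} \;\le\; \mathrm{radius}
\;\le\; \mathrm{diam} \;=\; 2\cdot\frac{\mathrm{diam}}{2},
\qquad\text{e.g.}\qquad
\frac{9}{2} = 4.5 \;\le\; \varepsilon^{*} \;\le\; 5 \;\le\; 9 .
\]
```

For the example the gap between the two bounds is therefore only 0.5, much better than the worst-case factor of 2.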

23
External Clock Synchronization
  • What about external synchronization, when some
    clocks have outside time sources?
      • Previous results are for internal synchronization
  • The tight bound on how close a node's clock can
    get to the source time is half the shortest path
    distance (w.r.t. uncertainties) from the node to
    a source
      • [Attiya, Hay, Welch]

[Figure: example graph with sources a and d; bounds are b: 3/2, c: 1/2, e: 3/2, f: 5/2]

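The per-node bound is straightforward to compute with a multi-source shortest-path search. The sketch below runs Dijkstra's algorithm from all sources at once on a hypothetical four-node path (not the graph in the figure); the function name and the adjacency-dictionary format are assumptions.

```python
import heapq
from typing import Dict, List


def external_sync_bounds(graph: Dict[str, Dict[str, float]],
                         sources: List[str]) -> Dict[str, float]:
    """Half the shortest-path uncertainty from each non-source node to a source."""
    inf = float("inf")
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, inf):
            continue                       # stale heap entry
        for y, u in graph[x].items():
            if d + u < dist.get(y, inf):
                dist[y] = d + u
                heapq.heappush(heap, (d + u, y))
    return {x: d / 2 for x, d in dist.items() if x not in sources}


# Hypothetical path a -3- b -2- c -1- d, with a and d attached to time sources.
graph = {
    "a": {"b": 3},
    "b": {"a": 3, "c": 2},
    "c": {"b": 2, "d": 1},
    "d": {"c": 1},
}
bounds = external_sync_bounds(graph, sources=["a", "d"])
print(bounds)
```

Node c sits one unit of uncertainty from source d, so its bound is 1/2. Node b is at distance 3 from either source, so its bound is 3/2.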
24
Optimal Synchronization Per Execution
  • Given information collected in a specific
    execution, by some algorithm strategy, find the
    tightest possible synchronization
      • internal synchronization, offline algorithm
        [Attiya, Herzberg, Rajsbaum]
      • external synchronization, online algorithm
        [Patt-Shamir, Rajsbaum]
      • extended to handle clock drift
        [Ostrovsky, Patt-Shamir]

25
Gradient Clock Synchronization
  • The clock skew between any pair of nodes should
    be a function of the distance between them
      • [Fan, Lynch]

[Figure: graph on nodes a through f; the clocks of a and d need not be as tightly synchronized as those of a and b]

26
Gradient Clock Synchronization
  • Motivated by problems in sensor networks, or
    more generally, large-scale networks, where
    nodes in the same locality need to be more
    tightly synchronized
      • data fusion
      • target tracking

[Image source: http://www.mikalac.com/mis/missile.html]

27
Gradient Clock Synch Lower Bound
  • Closest that two nodes' clocks can get (in the worst
    case) is Ω(log D / log log D)
      • D is the diameter of the network ⇒ global influence
  • Algorithms requiring a fixed maximum skew for
    nearby nodes may not scale well
      • E.g., TDMA

[Image source: http://www.dsna-dti.aviation-civile.gouv.fr/actualities/revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html]

28
Gradient Clock Synch Lower Bound: Assumption 1
  • Nonzero clock drift: (hardware) clocks can run
    fast or slow, within known bounds

29
Gradient Clock Synch Lower Bound: Assumption 2
  • Algorithm must ensure that (logical) clocks
    always increase at some minimum positive rate

[Figure: logical clock time plotted against real time, with the minimum slope marked]

30
Gradient Clock Synch LB: Simple Case
  • Consider a simple algorithm in which the clock
    value of p1 is periodically propagated down the
    chain
  • Can construct an execution in which p_{n-1}'s new
    clock value is larger than p_n's old clock value by
    an amount depending on D
      • carefully choose message delays
      • manipulate clock drift rates
      • cause nodes to suddenly jump to higher values
        without synchronizing with their neighbors
  • Insight in the paper is generalizing this to any
    algorithm
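A toy calculation shows how delays alone can produce a neighbor skew that grows with the length of the chain. This is a simplified illustration, not the construction in the paper. It assumes that each node adopts the largest value it has received, that values forwarded during a slow phase take `d` time units per hop, and that one later value crosses all hops except the last with zero delay. All names and parameters are hypothetical.

```python
from typing import Tuple


def neighbor_skew(n: int, d: int, t: int, rho: int = 0) -> Tuple[int, int]:
    """Skew between the last two nodes of a chain of n nodes.

    Returns (skew before, skew after) the fast message arrives at the
    second-to-last node. Assumes n >= 2 and t >= (n - 1) * d.
    """
    rate = 1 + rho                      # p1's clock rate (drift ignored by default)
    # Slow phase: at real time t, the node i hops from p1 holds the value
    # that p1 sent i * d time units earlier.
    clocks = [rate * (t - i * d) for i in range(n)]
    before = clocks[n - 2] - clocks[n - 1]

    # Fast phase: p1's current value reaches every node except the last one
    # instantly; the message on the last hop is still in transit.
    for i in range(n - 1):
        clocks[i] = max(clocks[i], clocks[0])
    after = clocks[n - 2] - clocks[n - 1]
    return before, after


before, after = neighbor_skew(n=5, d=1, t=100)
print(before, after)
```

In this model the skew between the last two nodes jumps from d to (n - 1) * d, an amount proportional to the length of the chain, until the message on the last hop is delivered.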

31
Is the Lower Bound Tight?
  • Recall the lower bound is Ω(log D / log log D)
  • Several pre-existing algorithms have O(D)
  • Then the upper bound was improved to O(√D)
      • [Locher, Wattenhofer]
  • Recently the upper bound was improved to O(log D)
      • [Lenzen, Locher, Wattenhofer]
  • Still a small gap: can the lower bound be
    improved?
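To get a feel for the size of the remaining gap, the bounds can be tabulated for a few diameters. The sketch below ignores all constant factors, which the asymptotic notation hides, and uses base-2 logarithms as an arbitrary choice. The function name is hypothetical.

```python
import math
from typing import Dict


def gradient_bounds(diameter: int) -> Dict[str, float]:
    """Growth of the bounds for a given diameter D > 2 (constants ignored)."""
    log_d = math.log2(diameter)
    return {
        "lower: log D / log log D": log_d / math.log2(log_d),
        "upper: log D": log_d,
        "upper: sqrt D": math.sqrt(diameter),
        "upper: D": float(diameter),
    }


for diameter in (16, 256, 65536):
    print(diameter, gradient_bounds(diameter))
```

For D = 65536 the lower-bound term is 4 and the best upper-bound term is 16, so the gap is the log log D factor, small in absolute terms even for very large networks.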

32
How Long Can Large Difference Last?
  • In the simple diffusion algorithm on the chain,
    the large difference between p_{n-1} and p_n only
    lasts while the message is in transit
  • Perhaps difficulties could be avoided by keeping
    track of the generation of each clock value and only
    comparing apples with apples (clocks of the same
    generation)?
      • but this could be complicated

33
And There's a Lot More
  • Lower bounds on space for mutual exclusion
      • [Burns, Lynch]
  • Lower bound on number of messages for leader
    election in synchronous rings
      • [Frederickson, Lynch]
  • Impossibility results for data link layer and
    connection management
      • [Fekete, Lynch, Mansour, Spinelli] [Kleinberg,
        Attiya, Lynch]
  • Lower bound on time for consensus in partially
    synchronous models
      • [Attiya, Dwork, Lynch, Stockmeyer]
  • Lower bound on time for synchronous k-set
    agreement
      • [Chaudhuri, Herlihy, Lynch, Tuttle]
  • Tradeoff between safety and liveness for
    randomized coordinated attack
      • [Varghese, Lynch]
  • Impossibility of boosting fault tolerance
      • [Attie, Guerraoui, Kouznetsov, Lynch, Rajsbaum]

34
Final Observations
  • Strive to make the results relevant
  • Natural problems
  • Practical architectural assumptions
  • Realistic performance measures (for lower bounds)
  • Crisp arguments (ingenious but clear)
  • Easy to understand and verify
  • Simple to extend and lead to follow-ups

35
Take-Home Message
  • Impossibility results help the development of the
    area
  • Understanding inherent limits guides efforts in
    the appropriate directions
  • And setting boundaries is good for everyone

36
Thanks for your attention
  • Thank you, Nancy!