Availability Study of Dynamic Voting Algorithms

1
Availability Study of Dynamic Voting Algorithms
  • Kyle Ingols and Idit Keidar
  • MIT Lab for Computer Science

2
Primary Component
  • For fault-tolerance with partitions
  • Network can partition to several components
  • One component is primary
  • Only members of primary component can modify
    shared data
  • Usually, primary contains majority (quorum)

3
Dynamic Voting
  • Defines quorums adaptively
  • New primary component contains majority of
    previous one
  • Example
  • {1, 2, 3, 4, 5, 6, 7, 8, 9}
  • {1, 2, 3, 4, 5}
  • {2, 3, 4}
  • {3, 4, 6, 10, 11}
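
The rule on this slide can be checked mechanically. The following is a minimal sketch, not code from the presentation: the helper names `contains_majority_of` and `valid_chain` are made up for illustration, and "majority" is assumed to mean strictly more than half of the previous primary's members.

```python
def contains_majority_of(new, previous):
    """True if `new` holds strictly more than half of the members of `previous`."""
    return 2 * len(new & previous) > len(previous)


def valid_chain(primaries):
    """True if every primary contains a majority of the one before it."""
    return all(
        contains_majority_of(new, prev)
        for prev, new in zip(primaries, primaries[1:])
    )


# The sequence of primary components from the slide.
chain = [
    set(range(1, 10)),      # {1, ..., 9}
    {1, 2, 3, 4, 5},        # 5 of 9
    {2, 3, 4},              # 3 of 5
    {3, 4, 6, 10, 11},      # 2 of 3, plus newly joined processes
]

print(valid_chain(chain))                                  # True
print(contains_majority_of({2, 3, 4}, set(range(1, 10))))  # False
```

The second check shows why the quorum has to adapt: {2, 3, 4} is a majority of the previous primary, but not of the original nine processes, so a static majority rule would not accept it.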

4
Dynamic Voting Benefits
  • Dynamic universe
  • New processes can join at any time
  • Processes that leave are likely not to return
  • Higher availability
  • With a static quorum, repeated failures may reduce
    the chance of a connected majority

5
Previous Availability Studies
  • Stochastic analysis, stochastic Petri nets,
    simulations, empirical measurements
  • Assume static universe
  • Show that dynamic voting leads to a primary
    component being formed most often

6
Previous Studies Overlooked...
  • Transition to new primary cannot be atomic in
    a distributed system

7
Bug Example
  • Initially, all connected: {1, 2, 3, 4, 5}
  • 1 and 2 suspect 4 and 5, move to {1, 2, 3}
  • 3 does not move with them, detaches
  • 1 and 2 suspect 3, move to {1, 2}
  • At the same time, 3, 4, and 5 move to {3, 4, 5}
  • Two primaries!
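
The scenario above can be replayed in a few lines. This is a sketch under stated assumptions, not any published algorithm: each process is assumed to remember only the last primary it installed (`views`), and the hypothetical `can_form` rule lets a component become primary when it holds a majority of every member's remembered primary. The helper names are invented for the example.

```python
def is_majority_of(candidate, previous):
    """True if `candidate` holds strictly more than half of `previous`."""
    return 2 * len(candidate & previous) > len(previous)


def can_form(component, views):
    """Naive rule: each member checks only the last primary it installed itself."""
    return all(is_majority_of(component, views[p]) for p in component)


def install(component, completed, views):
    """Only the processes in `completed` learn about the new primary."""
    for p in completed:
        views[p] = frozenset(component)


everyone = frozenset({1, 2, 3, 4, 5})
views = {p: everyone for p in everyone}

# 1 and 2 move to {1, 2, 3}; 3 detaches before it installs the new primary.
install(frozenset({1, 2, 3}), completed={1, 2}, views=views)

# Both sides now decide concurrently, each from its own stale or fresh view.
left, right = frozenset({1, 2}), frozenset({3, 4, 5})
left_ok = can_form(left, views)    # 2 of {1, 2, 3}
right_ok = can_form(right, views)  # 3 of {1, 2, 3, 4, 5}
two_primaries = left_ok and right_ok and left.isdisjoint(right)

print(two_primaries)  # True
```

Both components pass the majority test because the transition to {1, 2, 3} reached only some of its members, which is the non-atomicity the previous slide points out.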

8
If a Failure Occurs in the Middle...
  • Some suggested algorithms are wrong
  • See example on previous slide
  • Correct algorithms differ in
  • 1. How fast they recover
  • 2. How many processes need to reconnect to allow
    recovery
  • Previous studies overlooked these differences

9
The (Correct) Algorithms
  • 1-Pending
    • Uses 2-phase commit
    • Single point of failure
  • Majority-Resilient 1-Pending (MR1P)
    • Uses 3-phase commit
    • Recovery takes 2 communication steps
    • Then, algorithm for new primary takes 2 more
  • YKD - pipelines recovery and the algorithm for a new primary
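
What the commit-based algorithms have in common is that a process records an attempt to form a primary before anyone installs it. The sketch below illustrates only that general idea; it is not 1-Pending, MR1P, or YKD as specified, and the class and function names are hypothetical. The assumption is that a new component must hold a majority of each member's last installed primary and of every attempt that member has recorded but not seen resolved.

```python
def is_majority_of(candidate, reference):
    """True if `candidate` holds strictly more than half of `reference`."""
    return 2 * len(candidate & reference) > len(reference)


class Process:
    def __init__(self, pid, primary):
        self.pid = pid
        self.last_primary = frozenset(primary)
        self.pending = []  # attempts recorded but not known to be installed

    def constraints(self):
        return [self.last_primary, *self.pending]


def may_attempt(component, procs):
    """Every member's installed primary and pending attempts must be covered."""
    return all(
        is_majority_of(component, ref)
        for pid in component
        for ref in procs[pid].constraints()
    )


def record_attempt(component, reached, procs):
    """First phase: each reached member remembers the attempt."""
    for pid in reached:
        procs[pid].pending.append(frozenset(component))


def install(component, reached, procs):
    """Second phase: reached members install the primary and clear pending."""
    for pid in reached:
        procs[pid].last_primary = frozenset(component)
        procs[pid].pending.clear()


everyone = frozenset({1, 2, 3, 4, 5})
procs = {pid: Process(pid, everyone) for pid in everyone}

# Replay the bug example: all of {1, 2, 3} record the attempt,
# but 3 detaches before the second phase reaches it.
attempt = frozenset({1, 2, 3})
record_attempt(attempt, reached={1, 2, 3}, procs=procs)
install(attempt, reached={1, 2}, procs=procs)

left_ok = may_attempt(frozenset({1, 2}), procs)
right_ok = may_attempt(frozenset({3, 4, 5}), procs)
print(left_ok, right_ok)  # True False
```

Process 3 still carries the unresolved attempt {1, 2, 3}, and {3, 4, 5} holds only one of its three members, so the second primary is refused. How long such a pending attempt blocks a process, and how many peers it must reach to resolve it, is where the listed algorithms differ.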

10
Our Study
  • Simulations
  • No failures, only partitions
  • Same probability of being faulty as of being detached
  • Advantage to 2-phase commit
  • Multiple frequent connectivity changes
  • Inject a fault with fixed probability in each step
  • Then, stable period - see if primary exists

11
Observations
  • Algorithms differ greatly in availability
  • Especially in their degradation as changes become
    more frequent
  • 2-phase commit suffers due to single point of
    failure
  • 3-phase commit suffers because it is more likely
    to be interrupted
  • YKD - no degradation in lengthy executions

12
Conclusion
  • Analysis of any kind may fail to consider
    important cases...
  • General lessons
  • Minimize number of processes needed for recovery
  • Speed up recovery by pipelining / parallelizing
    multiple instances