Recovery Management in Quicksilver

1
Recovery Management in Quicksilver
  • Haskin, Malachi, Sawdon, Chan
  • IBM Almaden
  • ACM TOCS 6(1), February 1988

2
Introduction
  • Distributed, extensible system
  • Partition computation and data
  • lean kernel
  • System services are processes
  • Message-oriented IPC
  • How to deal with more complicated failure modes?
  • Provide atomic transactions as system service

3
Recovery Techniques
  • timeouts
    • how to distinguish slow from dead?
  • connectionless protocols / stateless servers
    • some actions can't be made idempotent
    • retries can cause problems
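A minimal sketch of why retries are a problem when an action cannot be made idempotent. The `Account` class and `send_with_retry` helper are hypothetical names used only for illustration; they are not from the paper. The model assumes the server applied every attempt and only the reply was lost.

```python
class Account:
    """Hypothetical server state, used only to illustrate retry hazards."""

    def __init__(self, balance):
        self.balance = balance

    def set_balance(self, value):
        # Idempotent: applying the same request twice leaves the same state.
        self.balance = value

    def deposit(self, amount):
        # Not idempotent: every application changes the state again.
        self.balance += amount


def send_with_retry(action, attempts):
    """Model a client whose reply was lost, so it resends the request.

    Assumption: the server applied every attempt; the client saw a timeout.
    """
    for _ in range(attempts):
        action()


idempotent = Account(100)
send_with_retry(lambda: idempotent.set_balance(150), attempts=2)

non_idempotent = Account(100)
send_with_retry(lambda: non_idempotent.deposit(50), attempts=2)

print(idempotent.balance)      # 150
print(non_idempotent.balance)  # 200
```

The duplicated deposit is the kind of failure that a stateless, retry-based design cannot rule out on its own.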

4
Recovery Techniques
  • virtual circuits
    • can't handle multiple servers
  • replication
    • too expensive for some uses
  • how to detect failures?

5
Transactions
  • Basic idea: use transactions as a single,
    system-wide recovery paradigm
  • Transactions are heavyweight
  • Not every server needs them
  • Different server classes
    • Volatile (window mgr)
    • Replicated volatile (name server, uses TXN for
      commit atomicity)
    • Recoverable (file server)
  • Long-running transactions need log support

6
Structure of Transactions
  • Everything belongs to a transaction
  • Default transaction ID for processes
  • Globally unique transaction identifiers
  • Each transaction has an owner and multiple
    participants
  • Owner can commit or abort
  • Participants can only abort
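The owner/participant rule can be sketched as below. This is an illustrative model, not the Quicksilver interface: the `Transaction` class, its method names, and the use of `uuid` as a stand-in for globally unique transaction identifiers are all assumptions.

```python
import uuid


class Transaction:
    """Hypothetical model of the owner/participant rule on the slide."""

    def __init__(self, owner):
        self.tid = uuid.uuid4()  # stand-in for a globally unique ID
        self.owner = owner
        self.participants = set()
        self.state = "active"

    def join(self, process):
        self.participants.add(process)

    def abort(self, process):
        # The owner and any participant may abort.
        if process != self.owner and process not in self.participants:
            raise PermissionError(f"{process} is not part of this transaction")
        if self.state == "active":
            self.state = "aborted"

    def commit(self, process):
        # Only the owner may commit.
        if process != self.owner:
            raise PermissionError(f"{process} may only abort")
        if self.state == "active":
            self.state = "committed"


txn = Transaction(owner="client")
txn.join("file_server")
try:
    txn.commit("file_server")  # rejected: participants cannot commit
except PermissionError as err:
    print(err)
txn.commit("client")
print(txn.state)
```

In this sketch commit is a single state change; in the real system it would start the distributed commit protocol.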

7
Recovery Manager
  • One transaction-based recovery manager per host
  • Three components
    • Transaction Manager
    • Log Manager
    • Deadlock Detector

8
Transaction Manager
  • Tracks transactions for processes on host
  • Manages distributed commit protocol
  • Distributed transaction is a tree
    • Only need to know your superior and your
      immediate subordinates
  • Failure vs. Termination
    • Termination causes commit/abort to proceed
      immediately
    • Failure is remembered and transaction aborted
      when it finally terminates
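A minimal sketch of commit over a transaction tree in which each node talks only to its superior and its immediate subordinates. The `TMNode` class and `run_commit` function are hypothetical, and the sketch ignores failures, timeouts, and logging, all of which the real protocol has to handle.

```python
class TMNode:
    """Hypothetical per-host transaction manager in the commit tree."""

    def __init__(self, name, vote=True):
        self.name = name
        self.local_vote = vote   # would this host's servers commit?
        self.superior = None     # the only upstream node we know about
        self.subordinates = []   # the only downstream nodes we know about
        self.outcome = None

    def add_subordinate(self, node):
        node.superior = self
        self.subordinates.append(node)

    def prepare(self):
        # Phase 1: ask every immediate subordinate, then combine the
        # answers with our own vote. Each subordinate recurses in turn.
        votes = [sub.prepare() for sub in self.subordinates]
        return self.local_vote and all(votes)

    def decide(self, commit):
        # Phase 2: record the outcome and pass it down one level.
        self.outcome = "commit" if commit else "abort"
        for sub in self.subordinates:
            sub.decide(commit)


def run_commit(root):
    """Run both phases from the coordinator at the root of the tree."""
    root.decide(root.prepare())
    return root.outcome


root = TMNode("workstation")
mid = TMNode("file_server_host")
leaf = TMNode("disk_host", vote=False)
root.add_subordinate(mid)
mid.add_subordinate(leaf)
print(run_commit(root))  # abort
```

The root never refers to `leaf` directly; the negative vote reaches it only through `mid`.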

9
Transaction Manager
  • Participants can say whether their failure causes
    transaction failure or termination
  • Subordinates can reclaim resources early
  • Several alternative commit protocols available to
    servers
    • 1-phase used by volatile servers
    • 2-phase used by recoverable servers

10
2-phase Commit
  • Different voting options
    • abort: undo my action, announce abort to others
      in 2nd phase
    • commit-read-only: no recoverable resources
      modified, don't include me in 2nd phase
    • commit-volatile: same as read-only, but notify
      me of results of 2nd phase
    • commit-recoverable: recoverable state modified,
      notify me of results of 2nd phase
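The voting options can be sketched as a tally that yields the outcome and the set of participants to contact in the second phase. The `Vote` enum and `tally` function are hypothetical names. One assumption beyond the slide: a participant that voted abort is left out of the second phase, since it has already undone its own action.

```python
from enum import Enum


class Vote(Enum):
    ABORT = "abort"
    COMMIT_READ_ONLY = "commit-read-only"
    COMMIT_VOLATILE = "commit-volatile"
    COMMIT_RECOVERABLE = "commit-recoverable"


def tally(votes):
    """Return (outcome, participants to notify in the second phase).

    `votes` maps a participant name to its Vote.
    """
    # A single abort vote aborts the whole transaction.
    outcome = "abort" if Vote.ABORT in votes.values() else "commit"
    # Read-only voters drop out; volatile and recoverable voters asked
    # to be told the result. Abort voters are skipped (assumption).
    notify = sorted(
        name
        for name, vote in votes.items()
        if vote in (Vote.COMMIT_VOLATILE, Vote.COMMIT_RECOVERABLE)
    )
    return outcome, notify


print(tally({
    "file_server": Vote.COMMIT_RECOVERABLE,
    "window_mgr": Vote.COMMIT_VOLATILE,
    "reader": Vote.COMMIT_READ_ONLY,
}))
# ('commit', ['file_server', 'window_mgr'])
```

Dropping read-only voters from the second phase is what lets servers that modified nothing recoverable finish early.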

11
Commit Processing
  • Special rules to handle special cases
    • Commit before participate (late joining)
    • Cycles in transaction graph
    • New requests after being prepared to commit
  • Rules
    • TM must accept new participants and let them vote
      until commit
    • All requests that could force an abort must
      complete before commit
    • 1-phase-commit servers cannot commit before
      making requests that might force an abort

12
Commit Processing
  • Transaction coordinator at transaction birth-site
  • Usually a user workstation, likely to fail
  • Migrate or replicate coordinator for reliability

13
Log Manager
  • Log manager provides optional services
    • Backpointers for log replay
    • Block I/O access
    • Log replication
    • Log archival
  • Servers tell LM what they need
    • Not penalized for services they don't use
  • LM does not interpret data; servers determine
    recovery strategy
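A minimal sketch of the division of labor between the log manager and a server: the log stores opaque bytes, and the server alone decides what a record means during recovery. `LogManager`, `CounterServer`, and the redo-only recovery strategy are all hypothetical choices made for illustration, not the paper's interface.

```python
class LogManager:
    """Hypothetical log that stores opaque bytes and never interprets them."""

    def __init__(self):
        self._records = []  # list of (server_id, payload bytes)

    def append(self, server_id, payload):
        self._records.append((server_id, bytes(payload)))
        return len(self._records) - 1  # position of the record in the log

    def replay(self, server_id):
        # Hand back one server's records in the order they were written.
        for owner, payload in self._records:
            if owner == server_id:
                yield payload


class CounterServer:
    """Hypothetical recoverable server using redo-only recovery."""

    def __init__(self, log, server_id="counter"):
        self.log = log
        self.server_id = server_id
        self.value = 0

    def add(self, amount):
        # Log the change before applying it to volatile state.
        self.log.append(self.server_id, str(amount).encode())
        self.value += amount

    def recover(self):
        # The server, not the log manager, decides what a record means.
        self.value = sum(
            int(payload.decode()) for payload in self.log.replay(self.server_id)
        )


log = LogManager()
server = CounterServer(log)
server.add(3)
server.add(4)

restarted = CounterServer(log)  # volatile state lost, log survives
restarted.recover()
print(restarted.value)  # 7
```

A different server could log undo records or checkpoints through the same `append` call, which is the point of keeping the log manager unaware of record contents.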

14
Deadlock Detector
  • Distributed deadlock detection is hard!
  • So, they didn't do it.

15
Criticisms
  • ???

16
Criticisms
  • IPC is responsible for a lot
    • Guaranteed delivery
    • Message ordering
    • Security constraints
    • Keeping transaction graphs together
  • For a system that claims not to make you pay for
    services you don't use.

17
Why Do We Care?
  • Transactions as a core OS mechanism
  • Mechanism, not policy
  • Customize effort to need
  • Optional cost for optional services