Failure Recovery - PowerPoint PPT Presentation

Slides: 23
Provided by: JeffU4
Transcript and Presenter's Notes

1
Failure Recovery
  • Checkpointing
  • Undo/Redo Logging

Source slides by Hector Garcia-Molina
2
Recovery is very, very SLOW!
  • Redo log: first record (1 year ago) ... T1 wrote A,B ... T1 committed a year ago ... last record, then crash
  • Committed a year ago --> STILL need to redo after crash!!
3
Solution: Checkpoint (simple version)
  • Periodically
  • (1) Do not accept new transactions
  • (2) Wait until all transactions finish
  • (3) Flush all log records to disk (log)
  • (4) Flush all buffers to disk (DB) (do not
    discard buffers)
  • (5) Write checkpoint record on disk (log)
  • (6) Resume transaction processing
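
The six steps above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name SimpleCheckpointer, its fields, and the ("CKPT",) record are hypothetical, and the "disk" is a plain Python list or dict. A real system would block in step (2) instead of raising.

```python
class SimpleCheckpointer:
    """Toy model of the simple (quiescent) checkpoint; all names are hypothetical."""

    def __init__(self):
        self.accepting = True   # are new transactions admitted?
        self.active = set()     # transactions currently running
        self.log_buffer = []    # log records still in memory
        self.log_disk = []      # log records on disk
        self.buffers = {}       # in-memory DB elements
        self.db_disk = {}       # DB elements on disk

    def checkpoint(self):
        self.accepting = False                 # (1) do not accept new transactions
        if self.active:                        # (2) a real system would wait here;
            # the toy raises and leaves admission closed so the caller can retry
            raise RuntimeError("transactions still active")
        self.log_disk.extend(self.log_buffer)  # (3) flush all log records
        self.log_buffer.clear()
        self.db_disk.update(self.buffers)      # (4) flush buffers, but keep them
        self.log_disk.append(("CKPT",))        # (5) checkpoint record in the log
        self.accepting = True                  # (6) resume transaction processing


cp = SimpleCheckpointer()
cp.buffers["A"] = 16
cp.log_buffer += [("T1", "A", 16), ("T1", "commit")]
cp.checkpoint()
```

After the call, the log on disk ends with the checkpoint record and the buffers are still available in memory.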

4
Example: what to do at recovery?
  • Redo log (disk): ... T1,A,16 ... T1,commit ... Checkpoint ... Crash

Start from last checkpoint and move forward in
the log file redoing updates for
committed transactions.
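
A minimal sketch of that rule, assuming a redo log stored as a list of tuples. The function name redo_recover, the tuple layout, and the T2/T3 records with their values are hypothetical and only serve the illustration.

```python
def redo_recover(log, db):
    """Redo committed updates that appear after the last checkpoint record.

    Assumed record shapes (hypothetical):
      (txn, element, new_value)   update
      (txn, "commit")             commit
      ("CKPT",)                   simple checkpoint
    """
    # Find the position just after the last checkpoint record.
    start = 0
    for i, rec in enumerate(log):
        if rec == ("CKPT",):
            start = i + 1
    tail = log[start:]

    # Transactions whose commit record made it to the log.
    committed = {rec[0] for rec in tail if len(rec) == 2 and rec[1] == "commit"}

    # Move forward through the log, redoing updates of committed transactions.
    for rec in tail:
        if len(rec) == 3 and rec[0] in committed:
            _txn, element, new_value = rec
            db[element] = new_value
    return db


log = [
    ("T1", "A", 16), ("T1", "commit"),
    ("CKPT",),
    ("T2", "B", 17), ("T2", "commit"),
    ("T3", "C", 21),               # T3 never committed
]
db = redo_recover(log, {"A": 16, "B": 0, "C": 0})
```

T1 is skipped because the checkpoint already forced its changes to disk, T2 is redone, and T3 is ignored because it has no commit record.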
5
Key drawbacks
  • Undo logging: data must be written to disk
    immediately after a transaction finishes, which
    can increase the number of disk I/Os
  • Redo logging: all modified blocks must be kept in
    memory until the transaction commits and the log is
    flushed, which can increase the number of buffers
    required

6
Solution: undo/redo logging!
  • Update record in the log has the format
  • <T, X, new X val, old X val>

7
Rules
  • Buffer containing X can be flushed to disk either
    before or after T commits
  • Log record must be flushed to disk before the
    corresponding updated buffer is (write-ahead logging, WAL)
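
The two rules can be sketched with a toy buffer manager. The class WalBuffer, its fields, and the use of a log position as a sequence number are assumptions made for the illustration, not part of the slides.

```python
class WalBuffer:
    """Toy buffer manager that enforces WAL; every name here is hypothetical."""

    def __init__(self):
        self.log_tail = []   # log records still in memory
        self.log_disk = []   # log records on disk
        self.pages = {}      # element -> (value, position of its last log record)
        self.db_disk = {}    # elements on disk

    def update(self, txn, element, new_value):
        if element in self.pages:
            old_value = self.pages[element][0]
        else:
            old_value = self.db_disk.get(element)
        position = len(self.log_disk) + len(self.log_tail)
        # Record format from the slides: <T, X, new X val, old X val>
        self.log_tail.append((txn, element, new_value, old_value))
        self.pages[element] = (new_value, position)

    def flush_log(self, up_to):
        """Force log records up to and including position up_to to disk."""
        count = up_to + 1 - len(self.log_disk)
        if count > 0:
            self.log_disk.extend(self.log_tail[:count])
            del self.log_tail[:count]

    def flush_page(self, element):
        """May run before or after commit, but never ahead of the log."""
        value, position = self.pages[element]
        self.flush_log(position)         # WAL: log record reaches disk first
        self.db_disk[element] = value


buf = WalBuffer()
buf.update("T", "A", 16)
buf.update("T", "B", 5)
buf.flush_page("A")   # forces only the log record for A
```

Flushing the page for A moves its log record to disk first, while the record for B can stay in memory until B's page is written.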

8
Recovery with Undo/Redo Logging
  • Redo all committed transactions in order from
    earliest to latest
  • handles committed transactions with some changes
    not yet on disk
  • Undo all incomplete transactions in order from
    latest to earliest
  • handles uncommitted transactions with some
    changes already on disk
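
A minimal sketch of these two passes over a log without checkpoints. The function name and the tuple layout ("update", T, X, new, old) and ("commit", T) are hypothetical; the sketch also assumes each element is touched by at most one unfinished transaction.

```python
def undo_redo_recover(log, db):
    """Recover db from an undo/redo log (toy model, hypothetical record layout)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo committed transactions, earliest to latest.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, element, new_value, _old_value = rec
            db[element] = new_value

    # Undo incomplete transactions, latest to earliest.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, element, _new_value, old_value = rec
            db[element] = old_value
    return db


log = [
    ("update", "T1", "A", 16, 8),
    ("update", "T2", "B", 20, 10),
    ("commit", "T1"),
]
# On disk at crash time: T1's change is missing, T2's change is present.
db = undo_redo_recover(log, {"A": 8, "B": 20})
```

Here T1 committed but its change never reached disk, so it is redone; T2 did not commit but its change did reach disk, so it is undone.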

9
Non-quiescent Checkpoint
  • Simple checkpointing scheme requires system to
    "quiesce" (reach a point with no active
    transactions), ensured by preventing new
    transactions from starting for a while
  • Avoid this behavior with non-quiescent
    checkpointing
  • write a "start checkpoint" record to the log
  • later write an "end checkpoint" record to the log
  • Details vary depending on whether undo, redo, or
    undo/redo logging

10
Non-quiescent Checkpoint for Undo/Redo
  • write "start checkpoint" listing all active
    transactions to log
  • flush log to disk
  • write to disk all dirty buffers (contain a
    changed DB element), whether or not transaction
    has committed
  • this implies some log records may need to be
    written to disk (WAL)
  • write "end checkpoint" to log
  • flush log to disk
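
The procedure above can be sketched as follows. This is a simplified model: the function name, argument names, and record tuples are hypothetical, and no transaction runs during the checkpoint, which a real non-quiescent checkpoint would allow.

```python
def nonquiescent_checkpoint(log_tail, log_disk, dirty, db_disk, active):
    """Toy non-quiescent checkpoint for undo/redo logging (hypothetical names).

    log_tail: in-memory log records   log_disk: log records on disk
    dirty:    element -> value for changed buffers
    db_disk:  elements on disk        active: ids of running transactions
    """
    # Write "start checkpoint" listing all active transactions.
    log_tail.append(("start_ckpt", tuple(sorted(active))))
    # Flush the log; this also satisfies WAL for the buffers written next.
    log_disk.extend(log_tail)
    log_tail.clear()
    # Write all dirty buffers, committed or not.
    db_disk.update(dirty)
    dirty.clear()
    # Write "end checkpoint" and flush the log again.
    log_tail.append(("end_ckpt",))
    log_disk.extend(log_tail)
    log_tail.clear()


log_tail = [("update", "T1", "A", 16, 8)]
log_disk, dirty, db_disk = [], {"A": 16}, {"A": 8}
nonquiescent_checkpoint(log_tail, log_disk, dirty, db_disk, {"T1"})
```

T1 is still active when the checkpoint finishes, yet its change to A is already on disk, which is why the log record carries the old value for a possible undo.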

11
Non-quiescent checkpoint for undo/redo
[Figure: LOG timeline: ... records kept for undo ... "start ckpt, active T's: T1, T2, ..." ... dirty buffer pool pages flushed ... "end ckpt" ...]
12
Recovery process
  • Backwards pass (end of log --> latest checkpoint
    start)
  • construct set S of committed transactions
  • undo actions of transactions not in S
  • Undo pending transactions
  • follow undo chains for transactions in
    (checkpoint active list) - S
  • Forward pass (latest checkpoint start --> end of
    log)
  • redo actions of S transactions

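[Figure: log timeline; the backward pass runs from the end of the log back to the start checkpoint, the forward pass from the start checkpoint to the end of the log]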
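
The three steps of the recovery process can be sketched like this. The function name and record tuples are hypothetical, and the sketch assumes "latest checkpoint" means the latest start record that has a matching end record.

```python
def recover(log, db):
    """Undo/redo recovery with a non-quiescent checkpoint (toy model).

    Assumed record shapes (hypothetical): ("update", T, X, new, old),
    ("commit", T), ("start_ckpt", (active ids...)), ("end_ckpt",).
    """
    # Locate the start record of the latest completed checkpoint.
    start, active_at_ckpt = 0, set()
    ends = [i for i, rec in enumerate(log) if rec[0] == "end_ckpt"]
    if ends:
        for i in range(ends[-1], -1, -1):
            if log[i][0] == "start_ckpt":
                start, active_at_ckpt = i, set(log[i][1])
                break

    # Backward pass: build S and undo actions of transactions not in S.
    S = set()
    for rec in reversed(log[start:]):
        if rec[0] == "commit":
            S.add(rec[1])
        elif rec[0] == "update" and rec[1] not in S:
            db[rec[2]] = rec[4]          # restore old value

    # Undo pending transactions: follow their records before the checkpoint.
    pending = active_at_ckpt - S
    for rec in reversed(log[:start]):
        if rec[0] == "update" and rec[1] in pending:
            db[rec[2]] = rec[4]

    # Forward pass: redo actions of transactions in S.
    for rec in log[start:]:
        if rec[0] == "update" and rec[1] in S:
            db[rec[2]] = rec[3]          # apply new value
    return db


no_commit = [
    ("update", "T1", "a", 1, 0),
    ("start_ckpt", ("T1",)), ("end_ckpt",),
    ("update", "T1", "b", 2, 0),
]
undone = recover(no_commit, {"a": 1, "b": 2})
```

With no commit record for T1, both of its updates are undone, including the one logged before the checkpoint started.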
13
Examples: what to do at recovery time?
  • no T1 commit
  • LOG: T1,- a ... Ckpt T1 ... Ckpt end ... T1,- b ...
  • --> Undo T1 (undo a,b)
14
Example
  • LOG: ... T1 a ... ckpt-s T1 ... T1 b ... ckpt-end ... T1 c ... T1 cmt ...
  • --> Redo T1 (redo b,c)
15
Real world actions
  • E.g., dispense cash at ATM
  • Ti a1 a2 ... aj ... an


16
Solution
  • (1) execute real-world actions after commit
  • (2) try to make them idempotent
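
Both points can be sketched together. The Atm class, the action ids, and the assumption that the set of completed ids survives a crash are hypothetical; the slides do not prescribe a mechanism.

```python
class Atm:
    """Toy ATM whose dispense action is idempotent (hypothetical design)."""

    def __init__(self):
        self.done = set()      # ids of actions already performed;
                               # assumed to be kept in stable storage
        self.dispensed = 0

    def dispense(self, action_id, amount):
        if action_id in self.done:
            return False       # repeated request after a crash: do nothing
        self.dispensed += amount
        self.done.add(action_id)
        return True


def run_real_world_actions(committed, pending_actions, atm):
    """Perform real-world actions only for committed transactions."""
    for txn, action_id, amount in pending_actions:
        if txn in committed:
            atm.dispense(action_id, amount)


atm = Atm()
actions = [("T1", "T1-a1", 100), ("T2", "T2-a1", 50)]
run_real_world_actions({"T1"}, actions, atm)
run_real_world_actions({"T1"}, actions, atm)   # replay during recovery
```

The replay after a crash does not hand out cash a second time, and T2, which never committed, triggers no action at all.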

17
Media failure (loss of non-volatile storage)

[Figure: a disk storing A: 16]
Solution: Make copies of data!
18
Example 1: Triple modular redundancy
  • Keep 3 copies on separate disks
  • Output(X) --> three outputs
  • Input(X) --> three inputs + vote

[Figure: three disks holding copies X1, X2, X3]
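
A minimal sketch of output and input with voting, using three dicts as stand-ins for separate disks. The function names tmr_output and tmr_input are hypothetical.

```python
from collections import Counter


def tmr_output(disks, key, value):
    """Output(X): write the value to all three copies."""
    for disk in disks:
        disk[key] = value


def tmr_input(disks, key):
    """Input(X): read all three copies and return the majority value."""
    votes = Counter(disk.get(key) for disk in disks)
    value, count = votes.most_common(1)[0]
    if count < 2:
        raise IOError("no two copies agree")
    return value


disks = [{}, {}, {}]
tmr_output(disks, "X", 16)
disks[1]["X"] = 99            # one copy is silently corrupted
value = tmr_input(disks, "X")
```

The vote masks a single bad copy without needing to know which copy is bad.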
19
Example 2: Redundant writes, single reads
  • Keep N copies on separate disks
  • Output(X) --> N outputs
  • Input(X) --> Input one copy - if ok, done
  • - else try another one
  • --> Assumes bad data can be detected
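
A minimal sketch of this scheme. Detecting bad data with a CRC-32 checksum stored next to each copy is an assumption made for the illustration, and the function names are hypothetical.

```python
import zlib


def rw_output(disks, key, data):
    """Output(X): write the data and its checksum to all N copies."""
    for disk in disks:
        disk[key] = (data, zlib.crc32(data))


def rw_input(disks, key):
    """Input(X): read one copy; if it is bad, try another one."""
    for disk in disks:
        entry = disk.get(key)
        if entry is None:
            continue
        data, checksum = entry
        if zlib.crc32(data) == checksum:   # copy is ok, done
            return data
    raise IOError("no good copy found")


disks = [{}, {}, {}, {}]
rw_output(disks, "X", b"16")
disks[0]["X"] = (b"1?", disks[0]["X"][1])  # first copy is damaged
data = rw_input(disks, "X")
```

In the common case only one copy is read, which is the advantage over voting; the cost is the reliance on the checksum to expose bad data.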

20
Example 3: DB Dump + Log
[Figure: backup database, active database, log]
  • If active database is lost,
  • restore active database from backup
  • bring up-to-date using redo entries in log
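
The two recovery steps can be sketched as follows. The function name media_recover and the record tuples are hypothetical, and the log is assumed to contain every record written since the dump was taken.

```python
import copy


def media_recover(backup, log_since_dump):
    """Rebuild the active database from a dump plus the redo entries in the log.

    Assumed record shapes (hypothetical): ("update", T, X, new_value), ("commit", T).
    """
    db = copy.deepcopy(backup)           # restore active database from backup
    committed = {rec[1] for rec in log_since_dump if rec[0] == "commit"}
    for rec in log_since_dump:           # bring it up to date with redo entries
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[3]
    return db


backup = {"A": 8, "B": 10}
log_since_dump = [
    ("update", "T1", "A", 16), ("commit", "T1"),
    ("update", "T2", "B", 20),           # T2 had not committed at the time of loss
]
active = media_recover(backup, log_since_dump)
```

The backup itself is left untouched, so the same dump can be reused if recovery has to be repeated.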

21
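When can log be discarded?
[Figure: log along a time axis, with marks for the db dump, the last needed undo, and the checkpoint]
  • Log before the db dump: not needed for media recovery
  • Log before the last needed undo: not needed for undo after system failure
  • Log before the checkpoint: not needed for redo after system failure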
22
Summary
  • Consistency of data
  • One source of problems: failures
  • - Logging
  • - Redundancy
  • Another source of problems: data
    sharing ... next