Failure Recovery

Slides: 23

Provided by: JeffU4

Learn more at: https://people.engr.tamu.edu

1
Failure Recovery

Checkpointing
Undo/Redo Logging

Source slides by Hector Garcia-Molina
2
Recovery is very, very SLOW!

Redo log:
First record (1 year ago) ... T1 wrote A,B; committed a year ago ... Last record ... Crash
--> STILL need to redo after crash!!
3
Solution: Checkpoint (simple version)

Periodically
(1) Do not accept new transactions
(2) Wait until all transactions finish
(3) Flush all log records to disk (log)
(4) Flush all buffers to disk (DB) (do not
discard buffers)
(5) Write checkpoint record on disk (log)
(6) Resume transaction processing
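The six steps above can be sketched as a toy, single-threaded model. Everything here (the class ToyRedoDB, its fields, the tuple layout of log records) is hypothetical and only illustrates the order of the steps. Because there are no threads, step (2) is modeled by returning False until the active set is empty, rather than by actually blocking.

```python
class ToyRedoDB:
    """Toy single-threaded model of a redo-logged database (hypothetical names)."""

    def __init__(self):
        self.log_mem = []       # log records still in memory
        self.log_disk = []      # log records already on disk
        self.buffers = {}       # in-memory copies of database elements
        self.disk = {}          # the database on disk
        self.active = set()     # transactions that have not finished
        self.accepting = True   # False while a checkpoint is pending

    def begin(self, txn):
        if not self.accepting:
            raise RuntimeError("checkpoint pending: no new transactions")
        self.active.add(txn)

    def write(self, txn, elem, new_value):
        self.log_mem.append((txn, elem, new_value))   # redo record <T, X, new value>
        self.buffers[elem] = new_value

    def commit(self, txn):
        self.log_mem.append((txn, "commit"))
        self.active.discard(txn)

    def checkpoint(self):
        """Return True once the checkpoint is written, False while still waiting."""
        self.accepting = False                    # (1) do not accept new transactions
        if self.active:                           # (2) some transaction is still running
            return False
        self.log_disk.extend(self.log_mem)        # (3) flush all log records
        self.log_mem.clear()
        self.disk.update(self.buffers)            # (4) flush buffers, but keep them cached
        self.log_disk.append(("checkpoint",))     # (5) checkpoint record on disk
        self.accepting = True                     # (6) resume transaction processing
        return True
```

A caller would retry checkpoint() until it returns True; while it returns False, begin() refuses new transactions, which is exactly the stall that non-quiescent checkpointing later removes.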

4
Example: what to do at recovery?

Redo log (disk):
... T1,A,16 ... T1,commit ... Checkpoint ... Crash
Start from last checkpoint and move forward in
the log file redoing updates for
committed transactions.
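A minimal sketch of this recovery rule, assuming a simple (quiescent) checkpoint and a log held as a Python list of tuples. The function name redo_recover and the record layout are hypothetical, and the T2 and T3 records in the usage lines are made-up values for illustration only.

```python
def redo_recover(log, disk):
    """Redo committed updates that appear after the last checkpoint record.

    Record layout (an assumption of this sketch):
      (txn, elem, new_value)  update
      (txn, "commit")         commit
      ("checkpoint",)         quiescent checkpoint
    """
    # Find the position just past the last checkpoint record.
    start = 0
    for i, rec in enumerate(log):
        if rec == ("checkpoint",):
            start = i + 1
    tail = log[start:]

    # Transactions whose commit record reached the log.
    committed = {rec[0] for rec in tail if len(rec) == 2 and rec[1] == "commit"}

    # Move forward, redoing updates of committed transactions only.
    for rec in tail:
        if len(rec) == 3 and rec[0] in committed:
            _, elem, new_value = rec
            disk[elem] = new_value
    return disk


# Hypothetical log: T1 finished before the checkpoint, T2 committed after it,
# T3 never committed.
example_log = [
    ("T1", "A", 16), ("T1", "commit"), ("checkpoint",),
    ("T2", "B", 17), ("T2", "commit"), ("T3", "C", 21),
]
recovered = redo_recover(example_log, {"A": 16, "B": 0, "C": 0})
```

T1 is never revisited because the checkpoint guarantees its changes are already on disk, which is the whole point of checkpointing: the scan no longer starts at the first log record.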
5
Key drawbacks

Undo logging: data must be written to disk
immediately after a transaction finishes, which
can increase the number of disk I/Os
Redo logging: all modified blocks must be kept in
memory until the transaction commits and the log is
flushed, which can increase the number of buffers
required

6
Solution: undo/redo logging!

Update record in the log has the format
<T, X, new X val, old X val>

7
Rules

Buffer containing X can be flushed to disk either
before or after T commits
Log record must be flushed to disk before the
corresponding updated buffer is (write-ahead logging, WAL)
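One way to picture the two rules is a toy buffer manager that remembers, for each buffered element, the position of its latest log record and flushes the log up to that position before writing the element. The names Update, WalBuffer and the "lsn" bookkeeping are assumptions of this sketch, not something the slides prescribe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    """Undo/redo update record <T, X, new X val, old X val>."""
    txn: str
    elem: str
    new: int
    old: Optional[int]   # None if the element did not exist before


class WalBuffer:
    """Toy buffer manager that enforces the WAL rule (hypothetical design)."""

    def __init__(self, disk):
        self.disk = disk
        self.log_mem = []    # log records not yet on disk
        self.log_disk = []   # log records on disk
        self.pages = {}      # elem -> (value, position of its latest log record)

    def update(self, txn, elem, new):
        old = self.pages[elem][0] if elem in self.pages else self.disk.get(elem)
        self.log_mem.append(Update(txn, elem, new, old))
        lsn = len(self.log_disk) + len(self.log_mem)   # 1-based position in the log
        self.pages[elem] = (new, lsn)

    def flush_log(self, upto_lsn):
        """Force log records up to and including position upto_lsn to disk."""
        n = upto_lsn - len(self.log_disk)
        if n > 0:
            self.log_disk.extend(self.log_mem[:n])
            del self.log_mem[:n]

    def flush_page(self, elem):
        """May be called before or after commit; the log always goes first."""
        value, lsn = self.pages[elem]
        self.flush_log(lsn)        # WAL: log record reaches disk before the data
        self.disk[elem] = value
```

Note that flush_page never checks whether the transaction has committed. That freedom is what undo/redo logging buys, and it is safe only because each record carries both the old and the new value.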

8
Recovery with Undo/Redo Logging

Redo all committed transactions in order from
earliest to latest
handles committed transactions with some changes
not yet on disk
Undo all incomplete transactions in order from
latest to earliest
handles uncommitted transactions with some
changes already on disk
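The two passes can be sketched as follows, ignoring checkpoints for the moment and scanning the whole log. The function name and the tuple layout ("update", T, X, new, old) and ("commit", T) are hypothetical; the values in the usage lines are made up.

```python
def undo_redo_recover(log, disk):
    """Recover a database from an undo/redo log (no checkpoints, toy model)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo committed transactions, earliest to latest.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, elem, new, _old = rec
            disk[elem] = new

    # Undo incomplete transactions, latest to earliest.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _txn, elem, _new, old = rec
            disk[elem] = old
    return disk


# T1 committed, but its change to A never reached disk.
# T2 did not commit, but its change to B was already flushed.
example_log = [
    ("update", "T1", "A", 16, 8),
    ("update", "T2", "B", 20, 5),
    ("commit", "T1"),
]
recovered = undo_redo_recover(example_log, {"A": 8, "B": 20})
```

Both cases named on the slide appear here: A is brought forward to 16 by the redo pass, and B is rolled back to 5 by the undo pass.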

9
Non-quiescent Checkpoint

Simple checkpointing scheme requires system to
"quiesce" (reach a point with no active
transactions), ensured by preventing new
transactions from starting for a while
Avoid this behavior with non-quiescent
checkpointing
write a "start checkpoint" record to the log
later write an "end checkpoint" record to the log
Details vary depending on whether undo, redo, or
undo/redo logging

10
Non-quiescent Checkpoint for Undo/Redo

write "start checkpoint" listing all active
transactions to log
flush log to disk
write to disk all dirty buffers (contain a
changed DB element), whether or not transaction
has committed
this implies some log records may need to be
written to disk (WAL)
write "end checkpoint" to log
flush log to disk
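A minimal sketch of this procedure on a toy in-memory engine. The class ToyEngine and its fields are hypothetical. In a real system transactions keep running while dirty buffers are written; the sketch is single-threaded, so it only shows the order of operations, with a log flush before every buffer write to honor WAL.

```python
class ToyEngine:
    """Toy undo/redo engine with a non-quiescent checkpoint (hypothetical names)."""

    def __init__(self):
        self.log_mem, self.log_disk = [], []
        self.buffers, self.dirty, self.disk = {}, set(), {}
        self.active = set()

    def update(self, txn, elem, new):
        old = self.buffers.get(elem, self.disk.get(elem))
        self.active.add(txn)
        self.log_mem.append(("update", txn, elem, new, old))
        self.buffers[elem] = new
        self.dirty.add(elem)

    def commit(self, txn):
        self.log_mem.append(("commit", txn))
        self.flush_log()
        self.active.discard(txn)

    def flush_log(self):
        self.log_disk.extend(self.log_mem)
        self.log_mem.clear()

    def checkpoint(self):
        # Write "start checkpoint" listing all active transactions, then flush the log.
        self.log_mem.append(("start_ckpt", tuple(sorted(self.active))))
        self.flush_log()
        # Write all dirty buffers to disk, committed or not.
        for elem in sorted(self.dirty):
            self.flush_log()                      # WAL: log first, then the buffer
            self.disk[elem] = self.buffers[elem]
        self.dirty.clear()
        # Write "end checkpoint" and flush the log again.
        self.log_mem.append(("end_ckpt",))
        self.flush_log()
```

Unlike the simple scheme, checkpoint() never refuses or waits for transactions: an uncommitted transaction simply shows up in the active list of the start record.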

11
Non-quiescent checkpoint for undo/redo

LOG: ... start ckpt (active T's: T1,T2,...) ... end ckpt ...
Diagram labels: "for undo"; "dirty buffer pool pages flushed"
12
Recovery process

Backward pass (end of log --> latest checkpoint
start)
construct set S of committed transactions
undo actions of transactions not in S
Undo pending transactions
follow undo chains for transactions in
(checkpoint active list) - S
Forward pass (latest checkpoint start --> end of
log)
redo actions of S transactions

[Diagram: the backward pass runs from the end of the log to the start checkpoint; the forward pass runs from the start checkpoint to the end of the log]
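The three phases can be sketched like this. The record layout and the function name recover are hypothetical. Two simplifications are assumptions of the sketch: it uses the latest checkpoint that has both a start and an end record, and it "follows undo chains" by scanning the older part of the log instead of using per-transaction back pointers.

```python
def recover(log, disk):
    """Undo/redo recovery with a non-quiescent checkpoint (toy model).

    Records: ("update", T, X, new, old), ("commit", T),
             ("start_ckpt", (active T's...)), ("end_ckpt",)
    """
    # Locate the start record of the latest completed checkpoint, if any.
    start, active_at_ckpt = 0, ()
    ends = [i for i, rec in enumerate(log) if rec[0] == "end_ckpt"]
    if ends:
        start = max(i for i, rec in enumerate(log[:ends[-1]])
                    if rec[0] == "start_ckpt")
        active_at_ckpt = log[start][1]

    # Backward pass: end of log --> checkpoint start.
    committed = set()                       # the set S
    for rec in reversed(log[start:]):
        if rec[0] == "commit":
            committed.add(rec[1])
        elif rec[0] == "update" and rec[1] not in committed:
            disk[rec[2]] = rec[4]           # undo: restore old value

    # Undo pending transactions: active at the checkpoint, never committed.
    pending = set(active_at_ckpt) - committed
    for rec in reversed(log[:start]):
        if rec[0] == "update" and rec[1] in pending:
            disk[rec[2]] = rec[4]

    # Forward pass: checkpoint start --> end of log.
    for rec in log[start:]:
        if rec[0] == "update" and rec[1] in committed:
            disk[rec[2]] = rec[3]           # redo: install new value
    return disk
```

During the backward scan a commit record is always seen before the updates of the same transaction, so the membership test against S is already reliable when an update record is reached.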
13
Examples: what to do at recovery time?

no T1 commit
LOG: ... T1,- a ... Ckpt T1 ... Ckpt end ... T1,- b ...
--> Undo T1 (undo a,b)
14
Example

LOG: ... T1 a ... ckpt-s T1 ... T1 b ... ckpt-end ... T1 c ... T1 cmt ...
--> Redo T1 (redo b,c)
15
Real world actions

E.g., dispense cash at ATM
Ti: a1 a2 ... aj ... an

16
Solution

(1) execute real-world actions after commit
(2) try to make idempotent
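Both ideas can be combined in a small sketch: real-world actions are logged with an identifier, executed only for committed transactions, and the device remembers which identifiers it has already served so that a replay after a crash does nothing. The Atm class, the "action" record type, and the assumption that the set of served identifiers survives crashes are all hypothetical.

```python
class Atm:
    """Toy cash dispenser that makes 'dispense' idempotent via action ids."""

    def __init__(self):
        self.done = set()      # ids already performed (assumed to survive crashes)
        self.dispensed = 0     # total cash handed out

    def dispense(self, action_id, amount):
        if action_id in self.done:
            return False       # repeated request: no second payout
        self.dispensed += amount
        self.done.add(action_id)
        return True


def run_real_world_actions(log, atm):
    """Execute logged actions only for transactions that committed."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "action" and rec[1] in committed:
            _, _txn, action_id, amount = rec
            atm.dispense(action_id, amount)


example_log = [
    ("action", "T1", "a1", 100),
    ("commit", "T1"),
    ("action", "T2", "a2", 50),   # T2 never commits, so no cash is dispensed
]
atm = Atm()
run_real_world_actions(example_log, atm)
run_real_world_actions(example_log, atm)   # e.g. replayed after a crash
```

The second call stands in for recovery re-running the log; because of the id check it does not dispense again.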

17
Media failure (loss of non-volatile storage)

[Diagram: disk with A = 16]
Solution: Make copies of data!
18
Example 1: Triple modular redundancy

Keep 3 copies on separate disks
Output(X) --> three outputs
Input(X) --> three inputs + vote

[Diagram: three copies X1, X2, X3]
X2
19
Example 2: Redundant writes, single reads

Keep N copies on separate disks
Output(X) --> N outputs
Input(X) --> Input one copy - if ok, done
- else try another one
--> Assumes bad data can be detected
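The slide leaves open how bad data is detected. One common option, used here purely as an assumption, is to store a checksum next to each copy. The sketch below uses CRC-32 from Python's zlib module; the function names write_copies and read_any are hypothetical.

```python
import zlib


def write_copies(disks, elem, value: bytes):
    """Output(X): write the value and its checksum to all N copies."""
    for disk in disks:
        disk[elem] = (value, zlib.crc32(value))


def read_any(disks, elem):
    """Input(X): return the first copy whose checksum matches."""
    for disk in disks:
        entry = disk.get(elem)
        if entry is None:
            continue                      # copy missing on this disk
        value, checksum = entry
        if zlib.crc32(value) == checksum:
            return value                  # copy is ok, done
    raise RuntimeError("no readable copy")


disks = [{}, {}, {}]
write_copies(disks, "X", b"16")
stored_crc = disks[0]["X"][1]
disks[0]["X"] = (b"17", stored_crc)       # simulate corruption of the first copy
value = read_any(disks, "X")
```

Compared with voting, a read normally touches a single disk; the other copies are consulted only when the checksum test fails.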

20
Example 3: DB Dump + Log
[Diagram: backup database, active database, log]

If active database is lost,
restore active database from backup
bring up-to-date using redo entries in log
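The restore procedure can be sketched as: copy the backup, then replay the redo entries of committed transactions from the part of the log written since the dump. The function name restore_from_dump and the record layout ("update", T, X, new, old) and ("commit", T) are assumptions of this sketch, and the log is assumed to have survived the media failure (for example on a separate disk).

```python
def restore_from_dump(backup, log_since_dump):
    """Rebuild the active database from a dump plus the log (toy model)."""
    db = dict(backup)                     # restore active database from backup
    committed = {rec[1] for rec in log_since_dump if rec[0] == "commit"}
    for rec in log_since_dump:            # bring up to date using redo entries
        if rec[0] == "update" and rec[1] in committed:
            _, _txn, elem, new, _old = rec
            db[elem] = new
    return db


backup = {"A": 1}
log_since_dump = [
    ("update", "T1", "A", 16, 1),
    ("commit", "T1"),
    ("update", "T2", "B", 7, None),       # T2 never committed
]
restored = restore_from_dump(backup, log_since_dump)
```

Only the new values are used, which is why the slide speaks of redo entries: the dump already reflects a consistent earlier state, so nothing needs to be undone in this toy setting.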

21
When can log be discarded?
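[Diagram: log timeline with three marked points: db dump, last needed undo, checkpoint]
Log before the db dump: not needed for media recovery
Log before the last needed undo: not needed for undo after system failure
Log before the checkpoint: not needed for redo after system failure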
22
Summary