Title: Chapter 7: Distributed Recovery
Distributed Recovery
- Introduction
- Recovering from Aborted Transactions
- System Failure
- Media Failures
- Practical Advice
- Summary
Why Do We Need Recovery?
- Operating system fails
- Transaction aborts
- Media fails
Recovery Information
- Recovery information is stored to help recover from aborts.
- Before images are values of data before an update.
- After images are modified data values.
- Archive is a complete copy of the database at a point in time.
Transaction Execution Steps
- Input transaction
- Log transaction
- Fetch DB record(s)
- Log before image(s)
- Compute new record value(s)
- Log after image(s)
- Log commitment
- Write new DB record value(s)
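Transaction Execution Steps, cont.
[Figure: flow diagram of the eight steps. An Input Transaction enters the Process box, which obtains a record, writes the updated record, and produces an Output Message; the arrows are labeled with step numbers 1 through 8.]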
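The step order can be sketched in a few lines of Python. This is a minimal illustration, not a real DBMS: the in-memory `db` dictionary, the `log` list, the record layout, and the name `run_transaction` are all assumptions made for the example. The point it shows is that every log write happens before the database write.

```python
# Hypothetical in-memory stand-ins for the database and the log.
db = {8: "Smith"}
log = []

def run_transaction(txn_id, key, compute):
    """Follow the slide's step order: all log writes precede the DB write."""
    log.append((txn_id, "start"))                 # 2. log transaction
    before = db.get(key)                          # 3. fetch DB record
    log.append((txn_id, "before", key, before))   # 4. log before image
    after = compute(before)                       # 5. compute new record value
    log.append((txn_id, "after", key, after))     # 6. log after image
    log.append((txn_id, "commit"))                # 7. log commitment
    db[key] = after                               # 8. write new DB record value
    return after

# 1. input transaction: rename employee 8
run_transaction("A", 8, lambda old: "Jones")
print(db)   # {8: 'Jones'}
print(log)
```

Because the before and after images reach the log first, a crash between steps 7 and 8 still leaves enough information to redo the write.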
Recovering From Aborted Transactions
- Two strategies
  - Incremental log with deferred updates
  - Incremental log with immediate updates
Incremental Log With Deferred Updates
- Defer writes until it is assured that all writes can be completed successfully.
- Log structure
    <A, Start>
    <A, item X1, after value X1>
    <A, item X2, after value X2>
    . . .
    <A, item Xn, after value Xn>
    <A, Commit>
- Recovery operations
  - If commit, then merge log into the database.
  - If abort, then do nothing.
Incremental Log With Deferred Updates Example
Transaction
    Begin transaction A
      Delete Employee where ENumber = 1
      insert into Employee (ENumber, Name) <5, "Lewis">
      update Employee set Name = "Jones" where ENumber = 8
      commit
    end transaction A
Log (after images)
    A   1   null
    A   5   Lewis
    A   8   Jones
- On commit: process deferred updates
- On abort: do nothing
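The deferred-update scheme and the Employee example can be sketched as follows. This is only an illustration under stated assumptions: the database is a Python dictionary keyed by ENumber, a `None` after image stands for the slide's "null" (a delete), and the function name `run_deferred` and the tuple layout of log records are hypothetical.

```python
def run_deferred(db, log, txn_id, updates, commit=True):
    """Log after images only; touch the database only after commit."""
    log.append((txn_id, "start"))
    for key, after in updates:
        log.append((txn_id, key, after))      # after image; db is untouched
    if not commit:
        return                                # abort: do nothing
    log.append((txn_id, "commit"))
    # Commit: merge the logged after images into the database.
    for record in log:
        if record[0] == txn_id and len(record) == 3:
            _, key, after = record
            if after is None:
                db.pop(key, None)             # null after image means delete
            else:
                db[key] = after

db = {1: "Ackman", 8: "Smith"}
log = []
run_deferred(db, log, "A", [(1, None), (5, "Lewis"), (8, "Jones")])
print(db)   # {8: 'Jones', 5: 'Lewis'}
```

An abort needs no database work because nothing was written before the commit record.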
Incremental Log With Immediate Updates
- Write updates to the DB and maintain a log of before-and-after values for all updated items
- Log structure
    <A, Start>
    <A, X1, before value X1, after value X1>
    <A, X2, before value X2, after value X2>
    . . .
    <A, Xn, before value Xn, after value Xn>
    <A, Commit>
Incremental Log With Immediate Updates Example
Transaction
    Begin transaction A
      Delete Employee where ENumber = 1
      insert into Employee (ENumber, Name) <5, "Lewis">
      update Employee set Name = "Jones" where ENumber = 8
      commit
    end transaction A
Log
         before image    after image
    A    1 Ackman        null
    A    null            5 Lewis
    A    8 Smith         8 Jones
- On abort: process the log in reverse order
- On commit: save the log
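A rough sketch of immediate updates with rollback, using the same Employee data. The assumptions are the same as before: a dictionary stands in for the database, `None` stands for "null", and the names `apply_immediate` and `rollback` are hypothetical. Start and commit records are left out to keep the example short.

```python
def apply_immediate(db, log, txn_id, key, after):
    """Log before and after images, then update the database right away."""
    before = db.get(key)
    log.append((txn_id, key, before, after))
    if after is None:
        db.pop(key, None)          # null after image means delete
    else:
        db[key] = after

def rollback(db, log, txn_id):
    """Abort: restore before images, newest log record first."""
    for rec_txn, key, before, _after in reversed(log):
        if rec_txn != txn_id:
            continue
        if before is None:
            db.pop(key, None)      # item did not exist before the update
        else:
            db[key] = before

db = {1: "Ackman", 8: "Smith"}
log = []
apply_immediate(db, log, "A", 1, None)      # delete employee 1
apply_immediate(db, log, "A", 5, "Lewis")   # insert employee 5
apply_immediate(db, log, "A", 8, "Jones")   # rename employee 8
rollback(db, log, "A")
print(db)
```

Reverse order matters when one transaction updates the same item more than once: the oldest before image must be the last one restored.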
Recovery Exercise 1
Database
    1    Ackman    8000
    2    Brown     7000
    6    Carson    6500
    8    Smith     8500
    9    Wong      7500
Transaction
    Begin transaction B
      insert into employee values (15, Taylor, 7500)
      update employee set salary = salary * 1.1 where salary > 7000
      delete from employee where salary < 7000
      commit
    end transaction B
Log
    (left blank for the exercise)
- Using incremental log with deferred updates, show
  - Contents of the log
  - What to do for commit
  - What to do for abort
Recovery Exercise 2
Database
    1    Ackman    8000
    2    Brown     7000
    6    Carson    6500
    8    Smith     8500
    9    Wong      7500
Transaction
    Begin transaction B
      insert into employee values (15, Taylor, 7500)
      update employee set salary = salary * 1.1 where salary > 7000
      delete from employee where salary < 7000
      commit
    end transaction B
Log
    (left blank for the exercise)
- Using incremental log with deferred updates, show
  - Contents of the log
  - What to do for commit
  - What to do for abort
What is a System Failure?
- Contents of main storage and I/O buffers are lost.
- Database is safe.
- Transactions in progress must be aborted.
- Recovery approaches
  - Search entire log
  - Use a quiet point
  - Use a checkpoint
Recovery from System Failure by Searching the Entire Log
- UNDO = empty
- Search the entire log from the beginning.
  - For each BEGIN TRANSACTION, place the transaction I.D. on the UNDO list.
  - For each COMMIT TRANSACTION, remove the transaction I.D. from the UNDO list.
- Roll back transactions on the UNDO list and restart them.
- What is wrong with this approach?
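The scan itself can be sketched as below. The log is assumed to be a list of `(transaction I.D., action)` tuples, and the function name `undo_list_full_scan` is hypothetical; rolling back and restarting the listed transactions is not shown.

```python
def undo_list_full_scan(log):
    """Scan the whole log; return transactions that began but never committed."""
    undo = []                                  # UNDO = empty
    for txn_id, action in log:
        if action == "start":
            undo.append(txn_id)                # BEGIN TRANSACTION
        elif action == "commit":
            undo.remove(txn_id)                # COMMIT TRANSACTION
    return undo

log = [("T1", "start"), ("T2", "start"), ("T1", "commit"), ("T3", "start")]
print(undo_list_full_scan(log))   # ['T2', 'T3']
```

Note that the loop reads every record ever written, so the work grows with the age of the log rather than with the number of transactions active at the crash.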
Definition: Quiet Point
- Quiet point
  - Accept no new transactions until all current transactions have committed.
  - Write the quiet point to the log.
  - Pointer to the quiet point is recorded in the restart file.
Recovery from System Failure Using a Quiet Point
- UNDO = empty
- Search the log beginning with the most recent quiet point.
  - For each BEGIN TRANSACTION, place the transaction I.D. on the UNDO list.
  - For each COMMIT TRANSACTION, remove the transaction I.D. from the UNDO list.
- Roll back transactions on the UNDO list and restart them.
- What is wrong with this approach?
Making a Checkpoint
- Force buffered log info to the log
- Create CHECKPOINT entry on the log
  - CHECKPOINT entry contains I.D.s of all active transactions
- Pointer to the checkpoint is recorded in the restart file
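Taking a checkpoint might look like the sketch below. Everything here is an assumption for illustration: the log and its buffer are Python lists, the restart file is a dictionary, the pointer is a list index, and `make_checkpoint` is a hypothetical name.

```python
def make_checkpoint(log_buffer, log, active_txns, restart_file):
    """Flush buffered log records, append a CHECKPOINT entry, save its position."""
    log.extend(log_buffer)                          # force buffered log info to the log
    log_buffer.clear()
    log.append(("checkpoint", sorted(active_txns))) # I.D.s of all active transactions
    restart_file["checkpoint_pos"] = len(log) - 1   # pointer to the checkpoint entry

log = [("T1", "start"), ("T2", "start")]
log_buffer = [("T3", "start"), ("T1", "commit")]
restart_file = {}
make_checkpoint(log_buffer, log, {"T2", "T3"}, restart_file)
print(log[restart_file["checkpoint_pos"]])   # ('checkpoint', ['T2', 'T3'])
```

Unlike a quiet point, nothing here waits for running transactions to finish; the checkpoint entry simply records which ones are still active.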
Recovery from a System Failure Using a Checkpoint
- UNDO = transaction I.D.s from most recent CHECKPOINT
- Search the log beginning with the most recent checkpoint.
  - For each BEGIN TRANSACTION, place the transaction I.D. on the UNDO list.
  - For each COMMIT TRANSACTION, remove the I.D. from the UNDO list.
- Roll back transactions on the UNDO list and restart them.
- Something is still not right.
  - Due to system delays, updated values of committed transactions might not be written to the database before the system crashes.
Recovery from a System Failure Using a Checkpoint (Revised)
- UNDO = transaction I.D.s in the most recent checkpoint entry
- REDO = empty
- Search the log beginning with the most recent checkpoint record.
  - For each BEGIN TRANSACTION, place the transaction I.D. on the UNDO list.
  - For each COMMIT, move the transaction I.D. from the UNDO list to the REDO list.
- For each transaction on the UNDO list, roll back.
- For each transaction on the REDO list, force log info to the database.
Incremental Logs With Immediate Updates Example
[Figure: timeline of transactions T1 through T5 against time, with the Checkpoint and the System Crash marked.]
- Log
    <T1, Start> <T2, Start> <T3, Start> <T1, Commit>
    <Checkpoint, T2, T3>
    <T2, Commit> <T4, Start> <T5, Start> <T4, Commit>
    System Crash
- Recovery
  - Redo T2 and T4
  - Undo T3 and T5
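The revised checkpoint procedure, applied to this log, can be sketched as follows. The tuple layout of the log records and the name `recover` are assumptions; the function only builds the two lists and leaves the actual rollback and redo work out.

```python
def recover(log, checkpoint_pos):
    """Return (undo, redo) lists per the revised checkpoint procedure."""
    _, active = log[checkpoint_pos]
    undo = list(active)            # UNDO = I.D.s in the checkpoint entry
    redo = []                      # REDO = empty
    for txn_id, action in log[checkpoint_pos + 1:]:
        if action == "start":
            undo.append(txn_id)    # BEGIN TRANSACTION
        elif action == "commit":
            undo.remove(txn_id)    # move from UNDO ...
            redo.append(txn_id)    # ... to REDO
    return undo, redo

log = [
    ("T1", "start"), ("T2", "start"), ("T3", "start"), ("T1", "commit"),
    ("checkpoint", ["T2", "T3"]),
    ("T2", "commit"), ("T4", "start"), ("T5", "start"), ("T4", "commit"),
]
undo, redo = recover(log, checkpoint_pos=4)
print("redo:", redo)   # redo: ['T2', 'T4']
print("undo:", undo)   # undo: ['T3', 'T5']
```

T1 appears in neither list: it committed before the checkpoint, so the scan never sees it.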
Recovery in a Distributed DBMS
- Use a global checkpoint
  - Set of local checkpoints performed at all sites
  - If a subtransaction of Transaction A is contained in a local checkpoint, then all other subtransactions of Transaction A are included in some local checkpoint.
- Recovery
  - Determine the most recent local checkpoint at the failed site
  - Force all sites to recover from the same checkpoint
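The global checkpoint condition can be checked mechanically. The sketch below is one possible reading of it, with hypothetical data structures: `local_checkpoints` maps each site to the transaction I.D.s recorded in its local checkpoint, and `subtxn_sites` maps each transaction to the sites where it runs subtransactions.

```python
def is_global_checkpoint(local_checkpoints, subtxn_sites):
    """True if every transaction seen in one local checkpoint is also
    recorded at every other site where it has a subtransaction."""
    for txns in local_checkpoints.values():
        for txn_id in txns:
            for other_site in subtxn_sites[txn_id]:
                if txn_id not in local_checkpoints.get(other_site, set()):
                    return False
    return True

subtxn_sites = {"A": {"site1", "site2"}, "B": {"site2"}}
consistent = {"site1": {"A"}, "site2": {"A", "B"}}
inconsistent = {"site1": {"A"}, "site2": {"B"}}   # site2 misses A's subtransaction
print(is_global_checkpoint(consistent, subtxn_sites))     # True
print(is_global_checkpoint(inconsistent, subtxn_sites))   # False
```

Recovering all sites from a set of local checkpoints that fails this test could undo one subtransaction of A while keeping another.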
Recovery Exercise
[Figure: timeline of transactions A through H against time, with the Checkpoint and the System Crash marked.]
- Log
    <A, start> <B, start> <E, commit> <A, commit> <F, start>
    <C, start> <G, start> <C, commit> <B, commit> <D, start>
    <H, start> <E, start> <F, commit> <Checkpoint, B, D, E> <H, commit>
- Which transactions should be redone?
- Which transactions should be undone?
Media Failures: Secondary Memory Is Lost
- Restore the database from an archive.
- Using the log, redo transactions run since the archive was recorded.
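A minimal sketch of these two steps, assuming the archive is a dictionary snapshot and the log holds `(txn, "write", key, after value)` and `(txn, "commit")` tuples written since the archive was taken. The name `recover_from_media_failure` and the record layout are hypothetical, and only committed transactions are redone.

```python
import copy

def recover_from_media_failure(archive, log_since_archive):
    """Restore the archive copy, then redo committed work logged since then."""
    db = copy.deepcopy(archive)                       # restore from the archive
    committed = {rec[0] for rec in log_since_archive if rec[1] == "commit"}
    for rec in log_since_archive:
        if rec[1] == "write" and rec[0] in committed:
            _, _, key, after = rec
            db[key] = after                           # redo using the after image
    return db

archive = {1: 8000, 2: 7000}
log_since_archive = [
    ("T1", "start"), ("T1", "write", 1, 8800), ("T1", "commit"),
    ("T2", "start"), ("T2", "write", 2, 9999),        # never committed
]
print(recover_from_media_failure(archive, log_since_archive))   # {1: 8800, 2: 7000}
```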
Media Failures: Secondary Memory and Log Are Lost
- Restore the database from the most recent archive.
- Apply the portion of the log that is undamaged.
- Look for a new job!
Some Transactions Cannot Be Rolled Back and Restarted
- Withdraw funds from a bank
- Print a paycheck
- Fill and ship an order
- Etc.
Recovery from Deviant Transactions
- Obvious approach
  - Undo transactions back to the deviant transaction
  - Undo the deviant transaction
  - Force log info after the deviant transaction into the database
- Will not always work
  - A transaction executed after the deviant transaction may have used data written by the deviant transaction.
- Hard-luck approach
  - Carefully examine database
  - Correct errors caused by the deviant transaction
  - Correct errors propagated by other transactions
  - Correct errors ASAP to avoid further database contamination
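Why the obvious approach can fail is easier to see with a small propagation sketch. It assumes that the read and write sets of each transaction are known, which real logs may not record, and the name `contaminated` and the history format are hypothetical. The sketch is deliberately conservative: once an item is marked as tainted it stays tainted, even if a later transaction overwrites it with clean data.

```python
def contaminated(history, deviant):
    """history: list of (txn_id, read_items, written_items) in execution order.
    Returns transactions that may have used data derived from `deviant`."""
    tainted_items = set()
    tainted_txns = []
    for txn_id, reads, writes in history:
        if txn_id == deviant:
            tainted_items |= set(writes)
        elif tainted_items & set(reads):
            tainted_txns.append(txn_id)
            tainted_items |= set(writes)     # errors propagate onward
    return tainted_txns

history = [
    ("T1", {"x"}, {"x"}),        # deviant transaction
    ("T2", {"x"}, {"y"}),        # read x written by T1
    ("T3", {"z"}, {"z"}),        # untouched
    ("T4", {"y"}, {"w"}),        # read y written by T2
]
print(contaminated(history, "T1"))   # ['T2', 'T4']
```

T4 never read anything T1 wrote, yet it is affected through T2, which is the kind of propagated error the hard-luck approach has to track down.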
Summary
- Most systems use incremental logs with immediate updates for transaction recovery.
- Most systems use checkpoints for system recovery.
- Most systems use archives and transaction logs for media recovery.