1
Why panic()? Improving Reliability through
Restartable File Systems
  • Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale,
    Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift

2
Data Availability
  • Applications require data
  • Use FS to reliably store data
  • Both hardware and software can fail
  • Typical solution:
  • Large clusters for availability
  • Reliability through replication

[Diagram: GFS masters replicating data across slave nodes]
3
User Desktop Environment
  • Replication infeasible for desktop environments
  • Wouldn't RAID work?
  • Can only tolerate H/W failures
  • FS crashes are more severe
  • Services/applications are killed, requiring OS reboot and recovery
  • Need better reliability in the event of file-system failures

[Diagram: desktop stack of App / OS / FS / Disk, alongside a RAID configuration with a RAID controller and multiple disks]
4
Outline
  • Motivation
  • Background
  • Restartable file systems
  • Advantages and limitations
  • Conclusions

5
Failure Handling in File Systems
  • Exception paths not tested thoroughly
  • Exceptions: failed I/O, bad arguments, null pointers
  • On errors, call panic(), BUG(), or BUG_ON()
  • After failure, data becomes inaccessible
  • Reasons for no recovery code:
  • Hard to apply corrective measures
  • Not straightforward to add recovery

6
Real-world Example: Linux 2.6.15
ReiserFS:

    int journal_mark_dirty(...)
    {
        struct reiserfs_journal_cnode *cn = NULL;
        ...
        if (!cn) {
            cn = get_cnode(p_s_sb);
            if (!cn) {
                reiserfs_panic(p_s_sb, "get_cnode failed!\n");
            }
        }
        ...
    }

File systems already detect failures

    void reiserfs_panic(struct super_block *sb, ...)
    {
        BUG();
        /* this is not actually called, but makes
           reiserfs_panic() "noreturn" */
        panic("REISERFS: panic %s\n", error_buf);
    }
Recovery simplified by generic recovery mechanism
7
Possible Solutions
  • Code to recover from all failures
  • Not feasible in reality
  • Restart on failure
  • Previous work has taken this approach
  • FS needs stateful, lightweight recovery

                Heavyweight                            Lightweight
  Stateful      CuriOS, EROS                           (this work's goal)
  Stateless     Nooks/Shadow, Xen, Minix, L4, Nexus    SafeDrive, Singularity
8
Restartable File Systems
  • Goal: build a lightweight, stateful solution to tolerate
    file-system failures
  • Solution: a single, generic recovery mechanism for any
    file-system failure
  • Detect failures through assertions
  • Clean up resources used by the file system
  • Restore file-system state from before the crash
  • Continue to service new file-system requests

FS failures are completely transparent to applications
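The four steps above can be pictured as one recovery driver. A minimal sketch, assuming entirely hypothetical types and helper names (fs_state, park_new_requests, and so on), not the actual implementation:

    /* All names below are illustrative assumptions. */
    struct fs_state;                                          /* opaque per-mount state   */
    extern void park_new_requests(struct fs_state *fs);       /* fail-stop: block callers */
    extern void release_fs_resources(struct fs_state *fs);    /* locks, memory, I/O       */
    extern void restore_last_checkpoint(struct fs_state *fs); /* revert on-disk state     */
    extern void replay_op_log(struct fs_state *fs);           /* re-run logged operations */
    extern void resume_requests(struct fs_state *fs);         /* let applications proceed */

    /* Invoked when an assertion (panic/BUG/BUG_ON) fires inside the FS. */
    void recover_file_system(struct fs_state *fs)
    {
        park_new_requests(fs);        /* 1. no new requests enter the dead FS */
        release_fs_resources(fs);     /* 2. clean up what the FS held         */
        restore_last_checkpoint(fs);  /* 3. roll back to the pre-crash state  */
        replay_op_log(fs);            /* 4. bring state forward again         */
        resume_requests(fs);          /* 5. continue servicing requests       */
    }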
9
Challenges
  • Transparency:
  • Multiple applications using the FS upon crash
  • Intertwined execution
  • Fault-tolerance:
  • Handle a gamut of failures
  • Transform them into fail-stop failures
  • Consistency:
  • OS and FS could be left in an inconsistent state

10
Guaranteeing FS Consistency
  • FS consistency required to prevent data loss
  • Not all FSes support crash consistency
  • FS state constantly modified by applications
  • Periodically checkpoint FS state
  • Mark dirty blocks as copy-on-write
  • Ensure each checkpoint is atomically written
  • On crash, revert to the last checkpoint
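A minimal sketch of the checkpoint step described above, with invented names (struct page, cur_epoch) standing in for the real kernel structures:

    #include <stdbool.h>

    /* Hypothetical page descriptor; the real mechanism operates on
       kernel page-cache pages. */
    struct page { int epoch; bool cow; };

    /* Freeze the current epoch: mark its dirty pages copy-on-write so
       they can be flushed atomically while new writes land on fresh
       copies in the next epoch. */
    void checkpoint(struct page *dirty[], int n, int *cur_epoch)
    {
        for (int i = 0; i < n; i++) {
            dirty[i]->cow = true;          /* later writes copy the page  */
            dirty[i]->epoch = *cur_epoch;  /* page belongs to this epoch  */
        }
        (*cur_epoch)++;                    /* new writes open a new epoch */
        /* ...flush this epoch's pages in the background; once all reach
           disk, the checkpoint becomes the recovery point... */
    }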

11
Overview of Our Approach
[Timeline diagram: application calls (open, write, read, write, write, close) pass through the VFS to the file system; a checkpoint divides the timeline into Epoch 0 (completed) and Epoch 1 (in progress); a crash occurs during Epoch 1]
12
Checkpoint Mechanism
  • File systems are constantly modified
  • Hard to identify a consistent recovery point
  • Naïve solution: prevent any new FS operations and call sync
  • Inefficient, with unacceptable overhead

13
Key Insight
[Diagram: applications above the VFS layer; file systems (ext3, VFAT) below it; file systems write to disk through the page cache]
  • All requests go through the VFS layer
  • Control requests to the FS and dirty pages to disk
  • File systems write to disk through the page cache
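This insight suggests interposing a thin wrapper at the VFS boundary. A hedged sketch, with every name (membrane_vfs_write, wait_if_recovering, log_write) invented for illustration:

    /* Assumed helpers: a barrier that parks callers during recovery,
       an operation logger, and the underlying VFS entry point. */
    extern void wait_if_recovering(void);
    extern void log_write(int fd, long long count);
    extern long real_vfs_write(int fd, const void *buf, long long count);

    /* Every application request funnels through a wrapper like this,
       giving one control point both for logging operations and for
       pausing requests while the file system below is restarted. */
    long membrane_vfs_write(int fd, const void *buf, long long count)
    {
        wait_if_recovering();   /* block while the FS restarts  */
        log_write(fd, count);   /* record metadata for replay   */
        return real_vfs_write(fd, buf, count);
    }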
14
Generic COW-based Checkpoint
[Three-panel diagram (Regular operation, At Checkpoint, After Checkpoint): App / VFS / File System / Page Cache / Disk stacks; the legend distinguishes regular kernel components from Membrane additions]
15
Interaction with Modern FSes
  • Have built-in crash-consistency mechanisms
  • Journaling or snapshotting
  • Seamlessly integrate with these mechanisms
  • Need FSes to indicate the beginning and end of a transaction
    (see the sketch below)
  • Works for data and ordered journaling modes
  • Need to combine writeback mode with COW
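One way to picture that integration; the hook names (membrane_txn_begin/membrane_txn_end) are assumptions for illustration, not an API from the paper:

    /* Hypothetical hooks a journaling FS calls around each transaction,
       so a checkpoint never cuts an epoch mid-transaction. */
    extern void membrane_txn_begin(void *journal);
    extern void membrane_txn_end(void *journal);

    void fs_commit_transaction(void *journal)
    {
        membrane_txn_begin(journal);   /* checkpoint must not start here */
        /* ...write journal blocks, then the commit block... */
        membrane_txn_end(journal);     /* epoch may close at this point  */
    }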

16
Light-weight Logging
  • Log operations at the VFS level
  • No need to modify existing file systems
  • Operations: open, close, read, write, symlink, unlink, seek, etc.
  • Logs are discarded after each checkpoint
  • What about logging writes?
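A hedged sketch of what a VFS-level log record might hold; the struct and its fields are invented for illustration:

    /* Hypothetical per-operation log record, appended at the VFS layer
       and discarded wholesale at each checkpoint. */
    enum op_type { OP_OPEN, OP_CLOSE, OP_READ, OP_WRITE, OP_UNLINK /* ... */ };

    struct op_record {
        enum op_type op;      /* which VFS operation           */
        int          fd;      /* file it applied to            */
        long long    offset;  /* file offset, where applicable */
        long long    count;   /* byte count, where applicable  */
        /* Note: the *data* of a write is not stored here; page
           stealing (next slide) recovers it from the page cache. */
    };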

17
Page Stealing Mechanism
  • Mainly used for replaying writes
  • Goal: reduce the overhead of logging writes
  • Solution: grab data from the page cache during recovery
    (see the sketch below the diagram)

[Three-panel diagram (Before Crash, During Recovery, After Recovery): a write(fd, buf, offset, count) leaves its data in the page cache; during recovery, replayed writes steal that data from the surviving page cache instead of from the log]
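A minimal sketch of replaying one logged write via page stealing; lookup_page_cache and fs_write are assumed helpers, not real kernel calls:

    /* The log stores only (fd, offset, count); the written data itself
       survived the FS crash in the page cache and is "stolen" from
       there during replay. */
    extern void *lookup_page_cache(int fd, long long offset);
    extern void  fs_write(int fd, const void *buf,
                          long long offset, long long count);

    void replay_write(int fd, long long offset, long long count)
    {
        void *data = lookup_page_cache(fd, offset);  /* steal the data     */
        fs_write(fd, data, offset, count);           /* re-issue the write */
    }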
18
Handling Non-Determinism
19
Skip/Trust Unwind Protocol
20
Evaluation
  • Setup

21
OpenSSH Benchmark
22
Postmark Benchmark
23
Recovery Time
  • Restart ext2 during a random-read microbenchmark

24
Recovery Time (Cont.)
    Data (MB)    Recovery Time (ms)
    10           12.9
    20           13.2
    40           16.1

    Open Sessions    Recovery Time (ms)
    200              11.4
    400              14.6
    800              22.0

    Log Records    Recovery Time (ms)
    200            11.4
    400            14.6
    800            22.0
25
Advantages
  • Improves tolerance to file-system failures
  • Builds trust in new file systems (e.g., ext4, btrfs)
  • Quick-fix bug patching
  • Developers can transform corruption bugs into restarts
  • Restart instead of extensive code restructuring
  • Encourages more integrity checks in FS code
  • Assertions can be seamlessly transformed into restarts
    (see the sketch below)
  • File systems become more robust to failures/crashes
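For instance, an existing assertion site could be redirected to the generic recovery path. A sketch with invented names (FS_ASSERT, membrane_restart):

    /* Hypothetical macro: an FS assertion that used to take down the
       whole OS now triggers a transparent file-system restart. */
    extern void membrane_restart(void);   /* assumed entry to recovery */

    #define FS_ASSERT(cond)          \
        do {                         \
            if (!(cond))             \
                membrane_restart();  \
        } while (0)

With such a macro, a check like FS_ASSERT(cn != NULL) in the ReiserFS example earlier would restart the file system instead of calling reiserfs_panic().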

26
Limitations
  • Only tolerates fail-stop failures
  • Not address-space based
  • Faults could corrupt other kernel components
  • FS restart may be visible to applications
  • e.g., inode numbers could change after a restart

[Diagram (Inode Mismatch): before the crash, create(file1) in Epoch 0 assigns inode 15, which stat(file1) observes; after crash recovery, the replayed create(file1) assigns inode 12, so the inode number the application saw no longer matches]
27
Conclusions
  • Failures are inevitable in file systems
  • Learn to cope with them, not hope to avoid them
  • Generic recovery mechanism for FS failures
  • Improves FS reliability and availability of data
  • Users: install new FSes with confidence
  • Developers: ship FSes faster, as not all exception cases are
    show-stoppers

28
Thank You!
  • Questions and Comments

Advanced Systems Lab (ADSL)
University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl