Title: SQCK: A Declarative File System Checker
1. SQCK: A Declarative File System Checker
- Haryadi S. Gunawi, Abhishek Rajimwale,
- Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
- University of Wisconsin-Madison
- OSDI '08, December 9, 2008
2. Corrupt file systems
- File systems
  - Store massive amounts of data
  - Must be reliable
- Corrupted file system images
  - Arise from hardware errors, file system bugs, etc.
  - Must be repaired as soon as possible
3. Who should repair?
- Does journaling (write-ahead logging) help?
  - No; it only protects against crashes
- Does the file system repair itself online?
  - No; it lacks the machinery to do so
- Fsck: the last line of defense
  - It is a must-have utility
  - XFS claimed "no need for fsck, ever," but ended up deploying an fsck anyway
  - It must be fully reliable
4. But fsck is complex
- Fsck has a big task
  - Turn any corrupt image into a consistent image
  - E.g., check whether a data block is shared by two inodes
- How are checkers implemented?
  - Written in C, which is hard to reason about
  - Large and complex
    - Ext2 fsck: 150 checks in 16 KLOC
    - XFS fsck: 340 checks in 22 KLOC
  - Hundreds of cluttered if-check statements
- Bottom line: fsck code is untouchable
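The block-sharing cross-check above can be expressed as a single aggregate query. The following is a minimal sketch that assumes a hypothetical `BlockPtrTable(ino, blk)` containing one row per (inode, data block) pointer. Python's built-in `sqlite3` module serves only as a convenient SQL engine; this is neither e2fsck's code nor SQCK's.

```python
import sqlite3


def shared_blocks(conn):
    """Return (block, owner_count) for every block claimed by more than one inode."""
    return conn.execute(
        """
        SELECT blk, COUNT(DISTINCT ino) AS owners
        FROM BlockPtrTable
        GROUP BY blk
        HAVING COUNT(DISTINCT ino) > 1
        ORDER BY blk
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per pointer from an inode to a data block.
conn.execute("CREATE TABLE BlockPtrTable (ino INTEGER, blk INTEGER)")
conn.executemany(
    "INSERT INTO BlockPtrTable VALUES (?, ?)",
    [(12, 500), (12, 501), (13, 501), (14, 502)],  # block 501 is claimed twice
)
print(shared_blocks(conn))  # [(501, 2)]
```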
5. Two Questions
- Are current checkers really reliable?
- If not, how should we build robust checkers?
6. E2fsck is unreliable
- Analyze e2fsck (the ext2 file system checker)
- Findings
  - Inconsistent repair
    - The file system becomes unreadable
  - Consistent but not correct
    - Fsck deletes valid directory entries
    - Fsck loses a huge number of files
7. SQCK
- Lesson: complexity is the enemy of reliability
  - Big task → bad design → complexity → unreliability
  - Need a higher-level approach for simplicity
- SQCK (SQL-based fsck)
  - Use a declarative query language to write checks
  - Put simply: write fewer lines of code
- Evaluation
  - Simple and reliable: e2fsck in 150 queries (vs. 16 KLOC of C)
  - More: great flexibility and reasonable performance
8. Outline
- Introduction
- Analysis of e2fsck
- SQCK Design
- SQCK Evaluation
- Conclusion
9. Methodology
- E2fsck's task: cross-check all ext2 metadata
  - E.g., an indirect pointer should not point to the superblock
  - E.g., a subdirectory should be accessible from only one directory
- Inject a single corruption
  - Observe how e2fsck repairs that single corruption
  - Only corrupt on-disk pointers
    - Corrupt an indirect pointer to point to the superblock
    - Corrupt a directory entry to point to another directory
  - Usually, a corrupt pointer is simply cleared to zero
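10. Inconsistent (Out-of-order) Repair
[Figure: an inode's indirect pointer (ind) is corrupted to point to the superblock.
 Ideal fsck: (1) check the bad indirect pointer, clearing it to 0; then (2) check the indirect block's content.
 E2fsck: (2) checks the indirect block's content first, reading the superblock as if it were an indirect block; only then does it (1) check the bad pointer.]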
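The point of the figure is that the order of two checks matters, even when each check is correct by itself. A toy model reproduces this. All of it is hypothetical and far simpler than ext2 and e2fsck: the block numbers, the dictionary "image", and both check functions.

```python
NUM_BLOCKS = 2000       # hypothetical size of the toy file system, in blocks
SUPERBLOCK = 1          # hypothetical location of the superblock
FIRST_DATA_BLOCK = 2    # hypothetical first block usable for file data


def fresh_image():
    """Toy image: block 1 holds superblock fields; the inode's indirect pointer is corrupt."""
    blocks = {SUPERBLOCK: [4096, 8192, 1024, 7]}  # arbitrary superblock field values
    inode = {"ind": SUPERBLOCK}  # corrupted: should point to a data block
    return blocks, inode


def check_indirect_pointer(blocks, inode):
    """Step 1: the indirect pointer itself must reference a data block."""
    if not FIRST_DATA_BLOCK <= inode["ind"] < NUM_BLOCKS:
        inode["ind"] = 0  # a corrupt pointer is simply cleared


def check_indirect_content(blocks, inode):
    """Step 2: every nonzero entry in the indirect block must be a valid data block."""
    if inode["ind"] == 0:
        return
    entries = blocks[inode["ind"]]
    for i, ptr in enumerate(entries):
        if ptr != 0 and not FIRST_DATA_BLOCK <= ptr < NUM_BLOCKS:
            entries[i] = 0


# Ideal order: validate the pointer before trusting the block it points to.
blocks, inode = fresh_image()
check_indirect_pointer(blocks, inode)
check_indirect_content(blocks, inode)
print(blocks[SUPERBLOCK])  # [4096, 8192, 1024, 7]  (superblock untouched)

# Out of order: the superblock is "repaired" as if it were an indirect block.
blocks, inode = fresh_image()
check_indirect_content(blocks, inode)
check_indirect_pointer(blocks, inode)
print(blocks[SUPERBLOCK])  # [0, 0, 1024, 7]  (superblock fields zeroed)
```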
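11. Consistent but Incorrect Repair (1)
[Figure: directories a1 and b1 under /, with subdirectory a2 under a1 and b2 under b1. A corrupt entry (X) in a1 points to b2 instead of a2.
 Ideal fsck: keep b2 under b1 and remove the corrupt entry; the orphaned a2 is reconnected to lost+found (LF).
 E2fsck: a1 takes b2 as its child and b1's valid entry for b2 is cleared (X): the "kidnapping problem".]
- E2fsck does not use all available information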
12. Result Summary
- Four problems
  - Inconsistent
  - Information-incomplete
  - Policy-inconsistent
  - Insecure
- E2fsck does not handle all corruptions
  - E.g., "Warning: Programming bug in e2fsck! Or some bonehead (you) is checking a mounted (live) filesystem."
- Not simple implementation bugs
  - Difficult to combine available information
  - Difficult to ensure correct ordering
13. Outline
- Introduction
- Analysis
- SQCK Design
- SQCK Evaluation
- Conclusion
14. Fsck Properties
- Hundreds of checks
- Complex cross-checks
- Taxonomy of checks in e2fsck
- Must be ordered correctly
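One way to make the ordering requirement explicit, so that it does not live implicitly in control flow, is to declare each check's prerequisites and derive a valid order from them. This illustrative sketch uses Python's standard `graphlib` module (Python 3.9+). The check names and dependencies are hypothetical. The sketch does not describe how e2fsck or SQCK actually schedules its checks.

```python
from graphlib import TopologicalSorter

# Hypothetical check names; each maps to the checks that must run before it.
dependencies = {
    "check_superblock": set(),
    "check_group_descriptors": {"check_superblock"},
    "check_indirect_pointers": {"check_group_descriptors"},
    "check_indirect_content": {"check_indirect_pointers"},
    "check_dir_entries": {"check_group_descriptors"},
    "check_false_parents": {"check_dir_entries"},
}

# static_order() yields every check after all of its prerequisites;
# a dependency cycle would raise graphlib.CycleError instead.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

With the dependencies declared this way, "check the pointer before its content" (slide 10) holds by construction instead of depending on where an if-statement happens to sit.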
15. A Declarative Approach
- Lesson: complexity is the enemy of reliability
- SQCK
  - Use a declarative query language (e.g., SQL). Why?
    - It is declarative: high-level intent is clear
    - It fits cross-checking massive amounts of information
- Goals achieved
  - Simple: e2fsck in 150 queries (vs. 16 KLOC of C)
  - Reliable: each check/query is easy to understand
  - Flexible: plug different queries in and out
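"Plug in/out" is straightforward when every check is a self-contained query. The following minimal sketch assumes a hypothetical registry that maps check names to SQL strings, together with a hypothetical `InodeTable` schema. SQLite stands in for the database, and neither check is claimed to be one of SQCK's actual 150 queries.

```python
import sqlite3

# Hypothetical registry: check name -> SQL query returning the offending rows.
CHECKS = {
    "zero_link_count": "SELECT ino FROM InodeTable WHERE in_use = 1 AND links_count = 0",
    "empty_directory_inode": "SELECT ino FROM InodeTable WHERE is_dir = 1 AND size = 0",
}


def run_checks(conn, enabled):
    """Run only the enabled checks; return {check name: offending rows}."""
    return {name: conn.execute(CHECKS[name]).fetchall() for name in enabled}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InodeTable (ino INTEGER PRIMARY KEY, in_use INTEGER,"
             " is_dir INTEGER, links_count INTEGER, size INTEGER)")
conn.executemany("INSERT INTO InodeTable VALUES (?, ?, ?, ?, ?)",
                 [(2, 1, 1, 3, 1024), (12, 1, 0, 0, 10), (13, 1, 1, 2, 0)])

print(run_checks(conn, ["zero_link_count"]))  # {'zero_link_count': [(12,)]}
print(run_checks(conn, CHECKS))  # both checks enabled
```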
16. Using SQCK
- Take a file system image
- Load metadata into database tables
  - Temporary tables
  - E.g., InodeTable, GroupDescTable, DirEntryTable
- Run checks and repairs (in the form of queries)
- Flush any modifications, and delete the tables
[Figure: file system image → scanner/loader → database tables → checks and repairs → flush back to the file system image]
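The load → check/repair → flush cycle can be sketched end to end on a toy "image". This is an illustration only. The image layout is hypothetical, and so are the `dirty` column and the single check ("no entry may name an inode beyond the inode count"). SQLite stands in for the MySQL backend used by the first SQCK prototype.

```python
import sqlite3

# Hypothetical toy image: a superblock field plus a flat list of directory entries.
image = {
    "inodes_count": 32,
    "dir_entries": [  # (dir_ino, entry_num, entry_ino); 1 = ".", 2 = ".."
        (2, 1, 2), (2, 2, 2), (2, 3, 11), (2, 4, 999),
    ],
}


def load(conn, img):
    """Scanner/loader: copy metadata into a temporary table."""
    conn.execute("CREATE TEMP TABLE DirEntryTable (ino INTEGER, entry_num INTEGER,"
                 " entry_ino INTEGER, dirty INTEGER DEFAULT 0,"
                 " PRIMARY KEY (ino, entry_num))")
    conn.executemany("INSERT INTO DirEntryTable (ino, entry_num, entry_ino)"
                     " VALUES (?, ?, ?)", img["dir_entries"])


def check_and_repair(conn, inodes_count):
    """A check plus its repair: clear entries that name a nonexistent inode."""
    conn.execute("UPDATE DirEntryTable SET entry_ino = 0, dirty = 1"
                 " WHERE entry_ino > ?", (inodes_count,))


def flush(conn, img):
    """Write modified rows back to the image, then drop the temporary table."""
    dirty = conn.execute("SELECT ino, entry_num, entry_ino FROM DirEntryTable"
                         " WHERE dirty = 1").fetchall()
    fixed = {(ino, num): new for ino, num, new in dirty}
    img["dir_entries"] = [(d, n, fixed.get((d, n), e))
                          for d, n, e in img["dir_entries"]]
    conn.execute("DROP TABLE DirEntryTable")


conn = sqlite3.connect(":memory:")
load(conn, image)
check_and_repair(conn, image["inodes_count"])
flush(conn, image)
print(image["dir_entries"])  # [(2, 1, 2), (2, 2, 2), (2, 3, 11), (2, 4, 0)]
```

Flushing only the rows marked dirty keeps the write-back step independent of which checks ran.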
17. Declarative check (example 1)
- Cross-checking a single instance of a structure
  - Find a block bitmap that is not located within its block group
E2fsck (C):
    first_block = sb->s_first_data_block;
    last_block = first_block + blocks_per_group;
    for (i = 0, gd = fs->group_desc; i < fs->group_desc_count; i++, gd++) {
        if (i == fs->group_desc_count - 1)
            last_block = sb->s_blocks_count;
        if ((gd->bg_block_bitmap < first_block) ||
            (gd->bg_block_bitmap > last_block)) {
            px.blk = gd->bg_block_bitmap;
            if (fix_problem(BB_NOT_GROUP, ...))
                gd->bg_block_bitmap = 0;
        }
        ...
    }
SQCK (SQL):
    SELECT * FROM GroupDescTable G
    WHERE G.blockBitmap NOT BETWEEN G.start AND G.end
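The query above can be run against a small table. In this sketch, the loader is assumed to have already stored each group's first and last block, which is the bookkeeping the C loop performs incrementally. The columns are named `first_block` and `last_block` rather than the slide's `start` and `end`, and the schema and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical GroupDescTable: each row records its own group's block range.
conn.execute("CREATE TABLE GroupDescTable (grp INTEGER, first_block INTEGER,"
             " last_block INTEGER, block_bitmap INTEGER)")
conn.executemany(
    "INSERT INTO GroupDescTable VALUES (?, ?, ?, ?)",
    [(0, 1, 8192, 3),           # bitmap inside group 0: fine
     (1, 8193, 16384, 20000),   # bitmap outside group 1's range: flagged
     (2, 16385, 20000, 16386)],  # last group ends at the blocks count
)

bad = conn.execute(
    "SELECT grp, block_bitmap FROM GroupDescTable G"
    " WHERE G.block_bitmap NOT BETWEEN G.first_block AND G.last_block"
).fetchall()
print(bad)  # [(1, 20000)]
```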
18. Declarative check (example 2)
- Cross-checking multiple instances of the same structure
  - Find false parents (i.e., directory entries that point to a subdirectory that already belongs to another directory)
  - Must read all directory entries in directory data blocks
  - Wrong implementation in e2fsck (the kidnapping problem)
19. Declarative check (example 2)
E2fsck (C):
    if ((dot_state > 1) &&
        (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode))) {
        // e2fsck_get_dir_info is 20 lines long
        subdir = e2fsck_get_dir_info(dirent->inode);
        ...
        if (subdir->parent) {
            if (fix_problem(LINK_DIR, ...)) {
                dirent->inode = 0;
                goto next;
            }
        } else
            subdir->parent = ino;
    }
20. Declarative check (example 2)
SQCK (SQL):
    SELECT F.*    -- returns the false parent(s)
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE -- P says C is its child
          P.entry_num >= 3 AND P.entry_ino = C.ino AND
          -- and C says P is its parent
          C.entry_num = 2 AND C.entry_ino = P.ino AND
          -- F also says C is its child
          F.entry_num >= 3 AND F.entry_ino = C.ino AND
          F.ino <> P.ino AND ...
[Figure: directories P and F both list C as a child, but C's ".." entry names only P, so F is the false parent.]
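The same kidnapping scenario can be loaded into a table and queried with the false-parent conditions shown above. The slide's query continues past its last `AND`, and this sketch uses only the visible conditions. It assumes a hypothetical `DirEntryTable(ino, entry_num, entry_ino)` in which entry 1 is "." and entry 2 is "..". SQLite stands in for the real backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical DirEntryTable: ino = directory, entry_num = position in the
# directory (1 = ".", 2 = ".."), entry_ino = inode the entry names.
conn.execute("CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER,"
             " entry_ino INTEGER)")
# / = 2, a1 = 11, b1 = 12, a2 = 21, b2 = 22. a1's entry for a2 was
# corrupted to name b2, so a2 is orphaned and b2 has two claimed parents.
conn.executemany("INSERT INTO DirEntryTable VALUES (?, ?, ?)", [
    (2, 1, 2), (2, 2, 2), (2, 3, 11), (2, 4, 12),
    (11, 1, 11), (11, 2, 2), (11, 3, 22),
    (12, 1, 12), (12, 2, 2), (12, 3, 22),
    (21, 1, 21), (21, 2, 11),
    (22, 1, 22), (22, 2, 12),
])

false_parents = conn.execute("""
    SELECT F.*
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE P.entry_num >= 3 AND P.entry_ino = C.ino   -- P says C is its child
      AND C.entry_num = 2 AND C.entry_ino = P.ino    -- C says P is its parent
      AND F.entry_num >= 3 AND F.entry_ino = C.ino   -- F also says C is its child
      AND F.ino <> P.ino
""").fetchall()
print(false_parents)  # [(11, 3, 22)]: a1's entry is the false one, not b1's
```

Unlike the first-come policy, the query consults b2's ".." entry, so b1's valid entry is never a candidate for deletion.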
21. Declarative Repairs
- Running declarative checks is only part of the problem
- Must also perform repairs declaratively
- A repair = an update query
  - Some repairs simply update a few fields
    ... SET T.field = newValue, T.dirty = 1
- A repair = a series of queries
  - E.g., reconnect an orphan directory to the lost+found directory
  - Combine a series of queries with C code
    - All repairs are written in SQL
    - C code is only used to connect them
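A repair built from a series of queries, with host code only connecting them, might look like the following. The sketch reconnects orphan directories to lost+found. Python plays the glue role that C plays in SQCK. The tables, the `dirty` flag, and the inode numbers are all hypothetical, and real bookkeeping such as link counts is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; dirty = 1 marks rows that must later be flushed to disk.
conn.executescript("""
    CREATE TABLE InodeTable (ino INTEGER PRIMARY KEY, is_dir INTEGER);
    CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER,
                                entry_ino INTEGER, dirty INTEGER DEFAULT 0);
    INSERT INTO InodeTable VALUES (2, 1), (11, 1), (21, 1);
    -- / = 2, lost+found = 11, orphan directory = 21 (its ".." is stale)
    INSERT INTO DirEntryTable (ino, entry_num, entry_ino) VALUES
        (2, 1, 2), (2, 2, 2), (2, 3, 11),
        (11, 1, 11), (11, 2, 2),
        (21, 1, 21), (21, 2, 7);
""")
LOST_FOUND = 11  # hypothetical inode number of lost+found

# Query 1: directories (other than the root) that no regular entry names.
orphans = [row[0] for row in conn.execute("""
    SELECT I.ino FROM InodeTable I
    WHERE I.is_dir = 1 AND I.ino <> 2
      AND NOT EXISTS (SELECT 1 FROM DirEntryTable D
                      WHERE D.entry_num >= 3 AND D.entry_ino = I.ino)""")]

# Host-language glue: run the remaining repair queries for each orphan.
for orphan in orphans:
    # Query 2: append an entry for the orphan to lost+found.
    conn.execute("""
        INSERT INTO DirEntryTable (ino, entry_num, entry_ino, dirty)
        SELECT ?, MAX(entry_num) + 1, ?, 1 FROM DirEntryTable WHERE ino = ?""",
        (LOST_FOUND, orphan, LOST_FOUND))
    # Query 3: point the orphan's ".." entry at lost+found.
    conn.execute("UPDATE DirEntryTable SET entry_ino = ?, dirty = 1"
                 " WHERE ino = ? AND entry_num = 2", (LOST_FOUND, orphan))

print(orphans)  # [21]
print(conn.execute("SELECT ino, entry_num, entry_ino FROM DirEntryTable"
                   " WHERE dirty = 1 ORDER BY ino, entry_num").fetchall())
# [(11, 3, 21), (21, 2, 11)]
```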
22. Outline
- Introduction
- Analysis
- SQCK Design
- SQCK Evaluation
- Conclusion
23. SQCK Evaluation
- Complexity
  - 150 queries in 1,100 lines of SQL
  - (compared to 16,000 lines of C in e2fsck)
- Reliability
  - Passes hundreds of corruption scenarios
- Flexibility
  - Add new checks/repairs
  - Enable different versions of e2fsck
- Performance
  - Introduce some optimizations
24. SQCK vs. e2fsck
- Reasonable performance
  - First generation of SQCK (with MySQL)
  - Within 1.5x of e2fsck
- Future optimizations
  - Hierarchical checks
  - Concurrent queries
25. Conclusion
- Complexity is the enemy of reliability
  - Recovery code is complex
- SQCK: build recovery tools with a higher-level approach
26. Thank you! Questions?
- ADvanced Systems Laboratory (ADSL): www.cs.wisc.edu/adsl