Transcript and Presenter's Notes

Title: IRON File Systems


1
IRON File Systems
  • Remzi Arpaci-Dusseau
  • University of Wisconsin, Madison

2
Understanding How Things Fail Is Important
3
How Disks Fail
4
Classic Failure Model: Fail-Stop
  • As defined in [Schneider 90]
  • Stop: Upon failure, halt
  • Make known: But first, switch to a state s.t. other components can detect that you have failed
  • Very simple model of disk failure
  • Used by all early file and storage systems (once controllers could detect failure)
  • But is it realistic?

5
Assertion: Modern Disks Are Not Whole-Disk Fail-Stop
6
Real Failures
  • Latent sector errors [Kari 93, Bairavasundaram 07]
  • A block or set of blocks becomes inaccessible
  • Data corruption [Weinberg 04, Greene 05, Bairavasundaram 08]
  • Controller bugs, not bit rot
  • Transient errors too [Talagala 99]
  • Bus stuttering, etc.
  • Result: Partial failures are a reality

7
So What Should We Do?
8
High-end Systems: Extra Measures
  • Disk scrubbing [Kari 93]
  • Proactively scan drives in search of latent errors
  • When detected, correct from a redundant copy on another disk
  • Extra redundancy [Corbett 04]
  • RAID systems with two parity disks
  • Checksums [Bartlett 04, Weinberg 04]
  • Extra computation over data
  • Guard against corruption (sketched below)
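
A minimal sketch of the checksum idea in C, assuming a Fletcher-style sum (one common choice; real systems differ in algorithm and in where the sum is stored):

#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 4096

/* Fletcher-style checksum over a disk block. */
uint32_t block_checksum(const uint8_t *data, size_t len)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65535;
        b = (b + a) % 65535;
    }
    return (b << 16) | a;
}

/* Store the sum at write time; recompute and compare at read time. */
int block_ok(const uint8_t block[BLOCK_SIZE], uint32_t stored_sum)
{
    return block_checksum(block, BLOCK_SIZE) == stored_sum;  /* 0 => corrupt */
}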

9
But What About Desktop File Systems?
10
Desktop FSs: Lost In The Past?
  • Desktop file systems are important
  • Home use: Photos, movies, tax returns, ...
  • Cluster use too: GoogleFS is built on local FSs
  • Performance policies are well known
  • e.g., the FFS placement policy
  • But what is their fault-handling policy?
  • Do they handle partial disk failures?
  • How can we tell?

11
Two Questions
12
Questions I Will Answer
  • Question 1: How do local file systems react to the more realistic set of disk failures?
  • Question 2: How can we change file systems to better handle these types of faults?

13
How Disks Fail: The Details
14
The Storage Stack
  • Not just the file system on top of the disk
  • Many layers
  • Lots of software
  • Even within the disk!
  • Failures occur at all levels

[Figure: the storage stack, from host software layers down to the disk]
15
Latent Sector Errors
  • Disks experience partial failures
  • A small portion of data on disk becomes temporarily or permanently unavailable [Corbett 04]
  • Root causes: Surface is scratched, inaccurate arm movement, interconnect problems
  • Bottom line: A single read or write can fail

16
Data Corruption
  • Sun's ZFS [Weinberg 04]
  • Misdirected writes: Right data, wrong location
  • Phantom/lost writes: "Yes, I wrote the data!" (but didn't) (defense sketched below)
  • EIDE interface on motherboards [Greene 05]
  • Read reported as done when it was not (race)
  • Similar problem at Google [Ghemawat 03]
  • Network Appliance [Lewis 99]
  • Disk occasionally returns byte-shifted data
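
Misdirected and lost writes can be caught by making blocks self-identifying: embed the intended address and a write generation in each block. A hedged sketch in C; the struct and field names are illustrative, not any real on-disk format:

#include <stdint.h>

struct block_tag {
    uint64_t block_nr;    /* the address this block was written for */
    uint64_t generation;  /* bumped on every write to this block */
};

/* On a read of block 'nr': a wrong block_nr reveals a misdirected write;
 * a stale generation (vs. what the FS recorded) reveals a lost write. */
int tag_ok(const struct block_tag *t, uint64_t nr, uint64_t expected_gen)
{
    return t->block_nr == nr && t->generation == expected_gen;
}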

17
Transient Errors
  • 18-month study of a large disk farm [Talagala 99]
  • Most machines had SCSI timeout errors (loose cables, bad cables?)
  • SCSI parity errors were common too (data corrupted while moving across the bus)
  • Failures can be transient too
  • Might work if just retried

18
Even Worse With ATA (Not SCSI)
  • ATA drives: Less reliable [Anderson 03, Hughes & Murray 05]
  • Few are returned for failure analysis
  • Some are partially flaw-marked during testing
  • Test conditions not as harsh (power, temperature)
  • High-end reliability features missing (filters that remove particles, chemicals, humidity)
  • Cheap disks -> less testing -> less reliability
  • But cost drives many purchasing decisions

19
Trend: More Problems, Not Fewer
  • Denser drives: Capacity sells drives
  • More logic -> more complexity
  • More complexity -> more bugs
  • Cost per byte dominates: Pennies matter
  • Manufacturers will cut corners
  • Reliability features are the first to go
  • Increasing amount of software
  • 400K lines of code in a modern Seagate drive
  • Hard to write, hard to debug

20
The Fail-Partial Failure Model
21
The Fail-Partial Failure Model
  • Disk failure: The entire disk may fail
  • Block failure: Part of the disk may fail
  • Block corruption: Part of the disk may get corrupted
  • All can be either transient or sticky

22
Important Parameters
  • Locality: Are partial faults independent of each other?
  • Frequency: How often do partial faults occur?

23
Frequency of Failures
  • Study of latent sector errors [Bairavasundaram et al. 07]
  • 1.53 million disks, 3 years of data
  • ATA: 8.5%, SCSI: 1.9%
  • Latent sector errors are not independent
  • Spatial locality exists; disk capacity matters
  • Study of block corruption [Bairavasundaram et al. 08]
  • Same data set
  • ATA: 0.6%, SCSI: 0.06%
  • Corruptions within a disk are not independent
  • Spatial locality exists
  • The bad block number problem

24
How Do File Systems React To Partial Failures?
25
How To Detect & Handle Failures?
  • Need: A classification of techniques
  • Detection: Discovering that a failure took place
  • Recovery: Recovering from the failure
  • Detection + Recovery = IRON
  • File systems with Internal RObustNess
  • IRON taxonomy: Classify techniques

26
IRON Detection Taxonomy
  • How to detect block failure or corruption?
  • Possible strategies (composed in the sketch below):
  • Zero: No detection technique used
  • ErrorCode: Check return codes from the disk
  • Sanity: Check data structures for consistency
  • Redundancy: Add checksums or other forms of computed replication to detect problems
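
An illustrative C read path layering these strategies; the magic-number check and the fs_checksum callback are hypothetical stand-ins for real file-system internals:

#include <stdint.h>
#include <stddef.h>
#include <unistd.h>

#define BLOCK_SIZE 4096

enum detect { DETECT_OK, DETECT_IO_ERROR, DETECT_INSANE, DETECT_CORRUPT };

/* Layer the taxonomy on one read: ErrorCode, then Sanity, then Redundancy. */
enum detect checked_read(int fd, uint64_t nr, uint8_t *buf, uint32_t stored_sum,
                         uint32_t (*fs_checksum)(const uint8_t *, size_t))
{
    /* ErrorCode: check what the storage stack returns */
    if (pread(fd, buf, BLOCK_SIZE, (off_t)(nr * BLOCK_SIZE)) != BLOCK_SIZE)
        return DETECT_IO_ERROR;
    /* Sanity: cheap structural checks, e.g., a magic number (hypothetical) */
    if (buf[0] != 0x53 || buf[1] != 0xEF)
        return DETECT_INSANE;
    /* Redundancy: recompute a checksum and compare to the stored value */
    if (fs_checksum(buf, BLOCK_SIZE) != stored_sum)
        return DETECT_CORRUPT;
    return DETECT_OK;
}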

27
IRON Recovery Taxonomy
  • How to recover from a detected failure?
  • Possible strategies (composed in the sketch below):
  • Zero: Don't do anything
  • Propagate: Pass the error on to a higher level
  • Stop: Halt activity (fail-stop)
  • Guess: Manufacture data, return it to the user
  • Retry: Assume the failure is transient
  • Repair: If an inconsistency is detected
  • Remap: Redirect to another block
  • Redundancy: Use another copy of the block
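
An illustrative composition of Retry, Redundancy, and Propagate on the read path; try_read() and mirror_of() are hypothetical helpers, not a real kernel API:

#include <stdint.h>
#include <errno.h>

#define MAX_RETRIES 3

/* Hypothetical low-level helpers: */
int try_read(uint64_t block_nr, uint8_t *buf);  /* returns 0 on success */
uint64_t mirror_of(uint64_t block_nr);          /* address of a redundant copy */

int recovered_read(uint64_t nr, uint8_t *buf)
{
    for (int i = 0; i < MAX_RETRIES; i++)       /* Retry: fault may be transient */
        if (try_read(nr, buf) == 0)
            return 0;
    if (try_read(mirror_of(nr), buf) == 0)      /* Redundancy: use another copy */
        return 0;
    return -EIO;                                /* Propagate: report upward */
}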

28
What IRON Techniques Do Modern File Systems Use?
29
Fault Injection
  • Typical fault injection:
  • Insert failures at random disk locations/times
  • Watch the system to see what happens
  • Not good enough:
  • May miss interesting behavior
  • May find problems, but without explaining them
  • What we do: Space- and time-aware injection
  • A gray-box approach to testing

30
Space Awareness
  • File systems are composed of many on-disk structures
  • e.g., superblocks, inodes, etc.
  • Idea: Make the fault-injection layer aware of file-system structures (see the sketch below)
  • Inject faults across all block types
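
A hedged sketch of such a type-aware filter in C; classify() stands in for real knowledge of the on-disk layout, and all names here are illustrative:

#include <stdint.h>
#include <errno.h>

enum blk_type { BT_SUPER, BT_INODE, BT_BITMAP, BT_JOURNAL, BT_DATA };

/* Hypothetical: map a block number to its type using knowledge of the
 * file system's layout (superblock location, inode-table extents, ...). */
enum blk_type classify(uint64_t block_nr);

static enum blk_type target_type = BT_INODE;   /* the type to fail this run */

/* Interposed read hook: fail reads of the targeted block type only. */
int filtered_read(uint64_t block_nr, uint8_t *buf,
                  int (*real_read)(uint64_t, uint8_t *))
{
    if (classify(block_nr) == target_type)
        return -EIO;                           /* space-aware injected fault */
    return real_read(block_nr, buf);
}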

31
Time Awareness
  • Time is key to testing as well
  • e.g., the update sequence
  • Idea: Build a model of file-system I/O activity

[Figure: data journaling write sequence (simplified); legend: J = journal, C = commit, K = checkpoint, S = superblock]
  • Use the model to induce faults at crucial times
  • Don't miss interesting behaviors

32
Making It Comprehensive
  • Workloads:
  • Exercise as much of the FS as possible
  • Two types of workloads:
  • Singlets: Stress a single system call (open, lstat, rename, symlink, write, etc.)
  • Generics: Stress common functionality (path traversal, recovery, log writes, etc.)

33
Injecting Faults
  • Disk: Hard to do -> it's hardware
  • Software approach:
  • Easy
  • Desirable
  • Fail-partial faults:
  • Read and write errors
  • Read corruption

[Figure: software fault-injection layer interposed between host and disk]
34
The File Systems We Tested
  • Linux ext3
  • Popular, simple, compatible Linux file system
  • Linux ReiserFS
  • Scalable, database-like file system
  • Linux IBM JFS
  • Big Blue's classic journaling file system
  • Windows NTFS
  • Yes, a non-Linux file system

35
Result Matrix
[Figure: result matrix of workloads x on-disk data structures]
36
Read Errors: Recovery
  • Ext3: Stop and propagate (doesn't tolerate transience)
  • ReiserFS: Mostly propagate
  • JFS: Stop, propagate, retry
  • All: Some cases missed

[Figure: read-error recovery matrices for ext3, ReiserFS, and JFS]
37
Write Errors: Recovery
  • Ext3/JFS: Ignore write faults
  • No detection -> no recovery
  • Can corrupt the entire volume
  • ReiserFS: Always calls panic()
  • Exception: indirect blocks

[Figure: write-error recovery matrices for ext3, ReiserFS, and JFS]
38
Corruption: Recovery
  • Ext3/ReiserFS/JFS:
  • Some sanity checking used
  • Stop/propagate common
  • Sanity checking is not enough

[Figure: corruption recovery matrices for ext3, ReiserFS, and JFS]
39
File System Specific Results
  • Ext3: Overall simplicity
  • Checks error codes, modest sanity checking, propagates errors, aborts operation
  • Overreacts on read errors -> halts instead of propagating
  • But some write errors are ignored
  • ReiserFS: First, do no harm
  • At the slightest sign of failure, panic() the file system
  • Preserves integrity, but overreacts to transients
  • IBM JFS: The kitchen sink
  • Uses the broadest range of techniques
  • Windows NTFS: Persistence is a virtue
  • Liberal retry (understands disks can be flaky)

40
General Results (1 of 3)
  • Illogical inconsistency is common
  • Similar faults -> different reactions (e.g., JFS on a failed read of the superblock)
  • Bugs are common
  • Code not stress-tested enough? (e.g., ReiserFS indirect-block code paths)
  • Error codes are sometimes ignored
  • Highly surprising: These are the easiest failures to detect (but sometimes hard to act upon)

41
General Results (2 of 3)
  • Sanity checking is of limited utility
  • Doesn't help if you read the right type but the wrong block
  • Hard to do for some structures (e.g., bitmaps)
  • Stop is useful (if used correctly)
  • ReiserFS halts on write errors
  • Ext3 tries to do this (but aborts too late)
  • Stop should not be overused
  • Faults can be transient
  • Faults can be sticky, too!

42
General Results (3 of 3)
  • Retry is underutilized
  • JFS does it somewhat, NTFS quite a bit
  • But transient faults do occur
  • Automatic repair is rare
  • Almost all Stop actions require administrator intervention/repair (running fsck, reboot, etc.)
  • Redundancy is rarely used
  • Only superblocks are replicated, and only sometimes

43
Towards an IRON File System
44
IRON ext3: ixt3
  • A prototype of an IRON file system
  • First cut: Many other possibilities still exist
  • Start with Linux ext3
  • Add checksums: To detect corruption
  • Add replication: For important structures (e.g., meta-data)
  • Add parity: For user data
  • Result: IRON ext3 (ixt3)

45
Ixt3 Implementation
  • Checksums:
  • Initially written to the ext3 log, then checkpointed to their final location
  • Meta-data replicas:
  • Written to a replica log, checkpointed later to their final on-disk location
  • Parity protection for data (sketched below):
  • One parity block per file, via an extra pointer in the inode
  • Performance issues:
  • Space overhead: Low
  • Time overhead?
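
A minimal sketch of per-file XOR parity, assuming one parity block per file as above; the function names are illustrative, not ixt3's actual code:

#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 4096

/* Fold one data block into the running parity. Called on each write; to
 * update in place, XOR out the old contents first, then XOR in the new. */
void parity_fold(uint8_t parity[BLOCK_SIZE], const uint8_t blk[BLOCK_SIZE])
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        parity[i] ^= blk[i];
}

/* Reconstruct a single lost block: XOR the parity with every survivor. */
void parity_rebuild(uint8_t out[BLOCK_SIZE], const uint8_t parity[BLOCK_SIZE],
                    const uint8_t *survivors[], size_t n)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        uint8_t v = parity[i];
        for (size_t j = 0; j < n; j++)
            v ^= survivors[j][i];
        out[i] = v;
    }
}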

46
Ixt3 Performance Evaluation
  • For home use or read-mostly workloads: No overhead
  • Has a cost for write-intensive workloads

47
Wrapping Up
48
Summary
  • File systems are important
  • Used everywhere, in many different ways
  • Disks fail in interesting ways
  • New model: The fail-partial failure model
  • Local file systems: Not ready for local faults
  • Illogical inconsistencies, bugs, and little recovery
  • We need IRON file systems
  • ixt3: Low-cost protection from partial failures

49
Challenges and Directions
  • Need to rethink how we build file systems
  • Performance policy isn't the only policy
  • Fault-handling policy is critical
  • Testing, and beyond testing
  • Failure handling must be tested (continuously?)
  • Beyond testing: Code analysis too?
  • Guiding principles
  • Lessons from networking
  • Put simply: Don't trust the disk

50
ADvanced Systems Lab (ADSL)
  • www.cs.wisc.edu/adsl

51
ADvanced Systems Lab (ADSL)
  • Who did the real work
  • Nitin Agrawal
  • Lakshmi Bairavasundaram
  • Haryadi Gunawi
  • Vijayan Prabhakaran

52
Backup Slides
53
Read Errors: Detection Techniques
  • Across all three file systems:
  • Error codes are checked for read errors (rarely ignored)

54
Write Errors: Detection Techniques
  • Ext3 and JFS ignore write errors!
  • Error codes are either ignored altogether or not used meaningfully
  • ReiserFS: Much more careful

55
Corruption: Detection Techniques
  • Sanity checking is used across all three file systems
  • Sanity checking is not sufficient
  • e.g., when you read a block of a similar type

56
File Systems: The Manager of Your Data
57
Why File Systems Are Important
  • The file system: The manager of most data
  • Consists of named files: A linear array of bytes
  • Organized in directories: /this/is/my/file
  • Access methods: open(), read(), write(), close() (see the example below)
  • Where we use them: Everywhere
  • Home use: Photos, tax returns, home movies
  • Servers: Network file servers, the Google search engine
  • Why we use them:
  • Simple, convenient
  • Good performance: Subject of much research
  • Reliable? Depends on how disks fail
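
For concreteness, the access methods in a minimal error-checked C program (the path is arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* open() creates or opens the named file */
    int fd = open("/tmp/example.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    /* write() appends bytes to the file's linear array of bytes */
    if (write(fd, "hello\n", 6) != 6)
        perror("write");
    close(fd);    /* close() releases the descriptor */
    return 0;
}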

58
File System Background
  • Meta-data: Structures the file system uses to track what it needs to track (sketched below)
  • Superblock: File-system-wide parameters
  • Inodes: Information about a file
  • Data: Blocks to hold user data
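
A schematic of these structures in C, loosely modeled on ext2/ext3 but heavily simplified; the field names are illustrative:

#include <stdint.h>

struct superblock {            /* file-system-wide parameters */
    uint32_t magic;            /* identifies the FS type */
    uint32_t block_size;
    uint64_t total_blocks;
    uint64_t free_blocks;
};

struct inode {                 /* per-file information */
    uint32_t mode;             /* file type + permissions */
    uint64_t size;             /* length in bytes */
    uint64_t mtime;            /* last modification time */
    uint64_t direct[12];       /* block numbers of user data */
    uint64_t indirect;         /* block holding further block numbers */
};
/* Data blocks are just block_size-byte arrays holding user bytes. */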