Automatic Data Structure Repair for Self-Healing Systems

1
Automatic Data Structure Repair for Self-Healing
Systems
  • Brian Demsky
  • Martin Rinard
  • Laboratory for Computer Science
  • Massachusetts Institute of Technology

2
Motivation
Broken Data Structure
  • Errors
  • Missing elements
  • Inappropriate sharing
  • Dangling references
  • Out of bounds array indices
  • Inconsistent values

[Figure: example of a broken data structure]
3
Goal
[Figure: the Repair Algorithm transforms a Broken Data Structure into a Consistent Data Structure]
4
Goal
[Figure: the Repair Algorithm takes a Broken Data Structure and the Consistency Properties From Developer, and produces a Consistent Data Structure]
5
What Does Repair Algorithm Produce?
  • Data structure that
  • Satisfies consistency properties, and
  • Heuristically close to broken data structure
  • Not necessarily the same data structure as
    (hypothetical) correct program would produce
  • But enough to keep program operating successfully

6
Precursors
  • Data structure repair has historically appeared
    in systems with extreme reliability goals
  • 5ESS switch: hand-coded audit routines
  • IBM MVS operating system: hand-coded failure
    recovery routines
  • Key component of these systems

7
Where Is This Likely To Be Useful?
  • Not for systems with slack - can just reboot
  • Cause of error must go away after reboot
  • Must be OK to lose volatile state
  • Must be OK to wait for reboot
  • Persistent data structures
  • (file systems, application files)
  • Autonomous and/or safety critical systems
  • Monitor/control unstable physical phenomena
  • Largely independent subcomputations
  • Moving time window

8
Architecture
[Figure: Broken Bits are translated by the Model Definition into a Broken Abstract Model; repair against the Internal Consistency Properties yields a Repaired Abstract Model; the External Consistency Properties translate it into Repaired Bits]
9
Architecture Rationale
  • Why go through the abstract model?
  • Simple, uniform structure
  • Sets of objects
  • Relations between objects
  • Simplifies both
  • Expression of consistency properties
  • Repair algorithm
  • Enables system to support full range of
    efficient, heavily encoded data structures

10
File System Example
[Figure: Directory Entries "abst" (firstBlock 0) and "intro" (firstBlock 2); Disk Blocks 0 to 3 with nextBlock values 1, -5, 1, -1]
  • struct Disk { Entry dir[NumEntries]; Block block[NumBlocks]; }
  • Disk D;
  • struct Entry { byte name[Length]; int firstBlock; }
  • struct Block { int nextBlock; byte data[BlockSize]; }

11
Model Definition
  • Sets of objects
  • set blocks of integer : partition used | free
  • Relations between objects: values of object
    fields, referencing relationships between objects
  • relation next : used -> used

[Figure: set blocks partitioned into used and free, with relation next on used]
12
Model Translation
  • Bits translated to sets and relations in abstract
    model using statements of the form
  • Quantifiers, Condition ⇒ Inclusion Constraint
  • for i in 0..NumEntries, 0 ≤ D.dir[i].firstBlock
    and D.dir[i].firstBlock < NumBlocks ⇒
    D.dir[i].firstBlock in used
  • for b in used, 0 ≤ D.block[b].nextBlock and
    D.block[b].nextBlock < NumBlocks ⇒
    ⟨b, D.block[b].nextBlock⟩ in next
  • for ⟨b,n⟩ in next, true ⇒ n in used
  • for b in 0..NumBlocks, not (b in used) ⇒ b in free
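
The translation rules above can be sketched in Python. This is only an illustration under stated assumptions: the disk is modeled as two plain lists (`first_block` standing in for the firstBlock fields of the directory entries, `next_block` for the nextBlock fields of the blocks), the function name `build_model` is hypothetical, and the sample values are made up. Because the rules for used and next feed each other, the sketch applies them until nothing new is added.

```python
def build_model(first_block, next_block):
    """Translate raw fields into the sets used/free and the relation next."""
    num_blocks = len(next_block)
    used, nxt = set(), set()

    # Rule 1: every in-range firstBlock is a used block.
    for fb in first_block:
        if 0 <= fb < num_blocks:
            used.add(fb)

    # Rules 2 and 3 depend on each other, so iterate to a fixed point.
    changed = True
    while changed:
        changed = False
        for b in list(used):
            n = next_block[b]
            if 0 <= n < num_blocks:
                if (b, n) not in nxt:
                    nxt.add((b, n))      # Rule 2: <b, nextBlock> in next
                    changed = True
                if n not in used:
                    used.add(n)          # Rule 3: target of next is used
                    changed = True

    # Rule 4: every block that is not used is free.
    free = set(range(num_blocks)) - used
    return used, free, nxt


# Hypothetical disk: two directory entries, four blocks.
used, free, nxt = build_model(first_block=[0, 2], next_block=[1, -5, 1, -1])
print(sorted(used), sorted(free), sorted(nxt))
```

Out-of-range field values (such as -5 here) simply fail the rule's condition, so they contribute nothing to the model.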

13
Model in Example
[Figure: the Directory Entries and Disk Blocks of the example and the resulting model: blocks partitioned into used = {0, 1, 2} and free = {3}, with next = {⟨0,1⟩, ⟨2,1⟩}]
14
Internal Consistency Properties
  • Quantifiers, Body
  • Body is first-order property of basic
    propositions
  • Inequality constraints on values of numeric
    fields
  • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Presence of required number of objects
  • size(S) = C, size(S) ≥ C, size(S) ≤ C
  • Topology of region surrounding each object
  • size(V.R) = C, size(V.R) ≥ C, size(V.R) ≤ C
  • size(R.V) = C, size(R.V) ≥ C, size(R.V) ≤ C
  • Inclusion constraints: V in S, V1 in V2.R,
    ⟨V1,V2⟩ in R
  • Example: for b in used, size(next.b) ≤ 1

15
Internal Consistency Violations
  • Evaluate consistency properties, find violations
  • for b in used, size(next.b) ≤ 1 is false for b = 1

[Figure: model of the example, in which block 1 is the target of two next pairs]
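
As a rough sketch of how such a property might be evaluated, the Python below checks the example property for every binding of b and returns the bindings for which it is false. It assumes that next.b denotes the set of blocks whose next is b, which matches the example where block 1 is the violating binding. The helper names and the sample model are hypothetical.

```python
def inverse_image(relation, b):
    """Return next.b: all a such that <a, b> is in the relation."""
    return {a for (a, target) in relation if target == b}


def find_violations(used, nxt):
    """Return the bindings of b for which size(next.b) <= 1 is false."""
    return [b for b in sorted(used) if len(inverse_image(nxt, b)) > 1]


# Hypothetical model: blocks 0 and 2 both point at block 1.
used = {0, 1, 2}
nxt = {(0, 1), (2, 1)}
violations = find_violations(used, nxt)
print(violations)
```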
16
Repairing Violations of Internal Consistency
Properties
  • Violation provides binding for quantified
    variables
  • Convert Body to disjunctive normal form
  • (p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)
  • p1 … pn, q1 … qm are basic propositions
  • Choose a conjunction to satisfy
  • Repair violated basic propositions in conjunction
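
The steps above can be sketched in Python. Everything here is an assumption made for illustration: the `Proposition` class, the `repair_body` function, and in particular the rule for choosing a conjunction (fewest violated propositions), which the slides do not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]


@dataclass
class Proposition:
    """A basic proposition with a test and a repair action (both hypothetical)."""
    name: str
    holds: Callable[[State], bool]
    repair: Callable[[State], None]


def repair_body(dnf: List[List[Proposition]], state: State) -> List[str]:
    """Satisfy one conjunction of a DNF body; return names of repaired propositions."""
    if any(all(p.holds(state) for p in conj) for conj in dnf):
        return []                      # body already true, nothing to repair
    # Assumed heuristic: pick the conjunction with the fewest violated propositions.
    target = min(dnf, key=lambda conj: sum(not p.holds(state) for p in conj))
    repaired = []
    for p in target:
        if not p.holds(state):
            p.repair(state)
            repaired.append(p.name)
    return repaired


def set_field(field: str, value: int) -> Callable[[State], None]:
    """Build a repair action that assigns a value to a field."""
    def action(state: State) -> None:
        state[field] = value
    return action


# Body in DNF: (x = 0 and y = 0) or (x = 5)
dnf = [
    [Proposition("x = 0", lambda s: s["x"] == 0, set_field("x", 0)),
     Proposition("y = 0", lambda s: s["y"] == 0, set_field("y", 0))],
    [Proposition("x = 5", lambda s: s["x"] == 5, set_field("x", 5))],
]
state = {"x": 3, "y": 7}
repaired = repair_body(dnf, state)
print(repaired, state)
```

With this heuristic the second conjunction is chosen because it needs one repair instead of two, which keeps the repaired state closer to the broken one.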

17
Repairing Violations of Basic Propositions
  • Inequality constraints on values of numeric
    fields
  • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Compute value of expression, assign field
  • Presence of required number of objects
  • size(S) = C, size(S) ≥ C, size(S) ≤ C
  • Remove or insert objects from/to set
  • Topology of region surrounding each object
  • size(V.R) = C, size(V.R) ≥ C, size(V.R) ≤ C
  • size(R.V) = C, size(R.V) ≥ C, size(R.V) ≤ C
  • Remove or insert pairs from/to relation
  • Inclusion constraints: V in S, V1 in V2.R,
    ⟨V1,V2⟩ in R
  • Remove or add the object or pair from/to set or
    relation

18
Repair in Example
  • for b in used, size(next.b) ≤ 1 is false for b = 1
  • Must repair size(next.1) ≤ 1
  • Can remove either ⟨0,1⟩ or ⟨2,1⟩ from next

[Figure: model of the example, in which both ⟨0,1⟩ and ⟨2,1⟩ are in next]
19
Repair in Example
  • for b in used, size(next.b) ≤ 1 is false for b = 1
  • Must repair size(next.1) ≤ 1
  • Can remove either ⟨0,1⟩ or ⟨2,1⟩ from next

[Figure: model of the example after one of the two pairs has been removed from next]
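
A minimal sketch of this repair step in Python follows. The function name is hypothetical, and the policy for choosing which pair to remove (keep the lowest-numbered predecessor) is an assumption; the slides only say that either pair can be removed.

```python
def repair_at_most_one_predecessor(nxt, b):
    """Remove pairs <a, b> from next until size(next.b) <= 1 holds."""
    preds = sorted(a for (a, target) in nxt if target == b)
    removed = []
    # Assumed policy: keep the lowest-numbered predecessor, drop the rest.
    for a in preds[1:]:
        nxt.discard((a, b))
        removed.append((a, b))
    return removed


nxt = {(0, 1), (2, 1)}
removed = repair_at_most_one_predecessor(nxt, 1)
print(removed, nxt)
```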
20
Acyclic Repair Dependences
  • Questions
  • Isn't it possible for the repair of one
    constraint to invalidate another constraint?
  • What about infinite repair loops?
  • What about unsatisfiable specifications?
  • Answer
  • We require specifications to have no cyclic
    repair dependences between constraints
  • So all repair sequences terminate
  • Repair can fail only because of resource
    limitations
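
One way to picture the acyclicity requirement is as a graph check. The sketch below is a simplification under stated assumptions: it works at the level of whole constraints, with a hypothetical `may_falsify` map recording which constraints a repair could falsify, and it uses a standard topological sort (Kahn's algorithm). It is not the analysis used by the authors' system.

```python
from collections import deque


def repair_schedule(may_falsify):
    """Return a repair order, or None if the dependences contain a cycle.

    may_falsify[a] lists the constraints that repairing a could falsify.
    """
    nodes = set(may_falsify) | {b for bs in may_falsify.values() for b in bs}
    indegree = {n: 0 for n in nodes}
    for bs in may_falsify.values():
        for b in bs:
            indegree[b] += 1

    # Constraints that no repair can falsify are safe to repair first.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for b in may_falsify.get(a, []):
            indegree[b] -= 1
            if indegree[b] == 0:
                ready.append(b)
    return order if len(order) == len(nodes) else None


acyclic = {"P": ["Q"], "Q": ["T"], "T": []}
cyclic = {"P": ["Q"], "Q": ["P"]}
print(repair_schedule(acyclic))
print(repair_schedule(cyclic))
```

Repairing in the returned order means that once a constraint has been repaired, no later repair can falsify it, so the sequence terminates.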

21
External Consistency Constraints
  • Quantifiers, Condition ⇒ Body
  • Body of form V = E, V.F = E, V.F[I] = E
  • Example
  • for b in free, true ⇒ D.block[b].nextBlock = -2
  • for ⟨i,j⟩ in next, true ⇒ D.block[i].nextBlock = j
  • for b in used, size(b.next) = 0 ⇒
    D.block[b].nextBlock = -1
  • Repair simply performs assignments
  • Translates model repairs to bit repairs
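
The three example constraints can be read as plain assignments. The Python sketch below applies them to a list standing in for the nextBlock fields; the function name and the sample broken values and repaired model are hypothetical.

```python
def apply_external_constraints(next_block, used, free, nxt):
    """Assign nextBlock fields so that the bits agree with the repaired model."""
    for b in free:
        next_block[b] = -2            # free blocks are marked with -2
    for (i, j) in nxt:
        next_block[i] = j             # <i, j> in next  =>  nextBlock = j
    successors = {i for (i, _) in nxt}
    for b in used:
        if b not in successors:       # size(b.next) = 0
            next_block[b] = -1        # end of chain
    return next_block


next_block = [1, -5, 1, -1]           # hypothetical broken values
used, free, nxt = {0, 1, 2}, {3}, {(0, 1)}
result = apply_external_constraints(next_block, used, free, nxt)
print(result)
```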

22
Repair in Example
[Figure: the Inconsistent File System and the Repaired File System]
23
When to Test for Consistency and Repair
  • Persistent data structures
  • Repair can be independent activity, or
  • Repair when data written out or read in
  • Volatile data structures in running program
  • Under programmer control
  • Transaction-based approach
  • Identify transaction start and end
  • Repair at start, end, or both
  • Failure-based approach
  • Wait until program fails
  • Repair and restart from latest safe point
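
The transaction-based approach could look roughly like the Python sketch below, which checks and repairs at both the start and the end of a transaction. The context manager, the consistency property (a count field that must match a list length), and the repair action are all hypothetical stand-ins.

```python
from contextlib import contextmanager


@contextmanager
def repaired_transaction(data, is_consistent, repair, log):
    """Check and repair `data` at transaction start and end."""
    def check(point):
        if not is_consistent(data):
            repair(data)
            log.append(point)         # remember where a repair was needed

    check("start")
    try:
        yield data
    finally:
        check("end")


def is_consistent(d):
    return d["count"] == len(d["items"])


def repair(d):
    d["count"] = len(d["items"])


data = {"count": 2, "items": ["a", "b"]}
log = []
with repaired_transaction(data, is_consistent, repair, log) as d:
    d["items"].append("c")            # simulated bug: count is not updated
print(log, data["count"])
```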

24
Experience
  • We acquired four benchmarks (written in C/C++)
  • CTAS (air-traffic control tool)
  • Simplified Linux file system
  • Freeciv interactive game
  • Microsoft Word files
  • We developed specifications for all four
  • Very little development time (days, not weeks)
  • Most of time spent figuring out Freeciv and CTAS
  • Each benchmark has
  • Workload
  • Fault insertion methodology
  • Ran benchmarks with and without repair

25
CTAS
  • Set of air-traffic control tools
  • Traffic management
  • Arrival planning
  • Flow visualization
  • Shortcut planning
  • Deployed in centers around country (Dallas/Ft.
    Worth, Los Angeles, Denver, Miami,
    Minneapolis/St. Paul, Atlanta, Oakland)
  • Approximately 1 million lines of C/C++ code

26
CTAS Screen Shot
27
Results
  • Workload: recorded radar feed from DFW
  • Fault insertion
  • Simulate error in flight plan processing
  • Bad airport index in flight plan data structure
  • Without repair
  • System crashes (segmentation fault)
  • With repair
  • Aircraft has different origin or destination
  • System continues to execute
  • Anomaly eventually flushed from system

28
Aspects of CTAS
  • Lots of independent subcomputations
  • System processes hundreds of aircraft; a problem
    with one should not affect others
  • Multipurpose system
    (visualization, arrival planning, shortcuts, ...);
    a problem in one purpose should not affect others
  • Sliding time window: anomalies eventually flushed
  • Rebooting ineffective: system will crash again
    as soon as it sees the problematic flight plan

29
Simplified Linux File System


[Figure: disk blocks of the simplified Linux file system: super block, group block, inode bitmap block, block bitmap block, inode block holding the inodes, and a directory block]
  • Some Consistency Properties
  • inode bitmap consistent with inode usage
  • block bitmap consistent with block usage
  • directory entries refer to valid inodes
  • files contain valid blocks only
  • files do not share blocks
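
As an illustration of one of these properties, the Python sketch below makes a block bitmap consistent with block usage. The data layout (a list of booleans and a map from file name to block list) and the decision to trust the file contents over the bitmap are assumptions made for the example.

```python
def repair_block_bitmap(bitmap, files):
    """Make bitmap[b] true exactly for blocks referenced by some file."""
    in_use = {b for blocks in files.values() for b in blocks}
    fixed = []
    for b in range(len(bitmap)):
        expected = b in in_use
        if bitmap[b] != expected:
            bitmap[b] = expected      # bit disagreed with actual usage
            fixed.append(b)
    return fixed


bitmap = [True, False, False, True]   # hypothetical corrupted bitmap
files = {"a": [0, 1], "b": [2]}
fixed = repair_block_bitmap(bitmap, files)
print(fixed, bitmap)
```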

30
Results
  • Workload: write and verify several files
  • Fault insertion: crash file system
  • Inode and block bitmap errors
  • Partially initialized directory and inode entries
  • Without repair
  • Incorrect file contents because of inode and disk
    block sharing
  • With repair
  • Bitmaps repaired preventing illegal sharing,
    correct file contents

31
Freeciv
Terrain Grid
  • Consistency Properties
  • Tiles have valid terrain values
  • Cities are not in the ocean
  • Each city has exactly one reference from city
    location grid
  • City locations are consistent in
  • City structures and
  • tile grid

[Figure: Terrain Grid of tiles (O = Ocean, P = Plain, M = Mountain) and City Structures with locations loc 3,0 and loc 2,3]
32
Results
  • Workload: Freeciv software plays against itself
  • Fault insertion: randomly corrupt terrain values
  • Without repair: program fails (seg fault)
  • With repair
  • Game runs just fine
  • But game plays out differently because of the
    different terrain values

33
Microsoft Word Files
  • Files consist of a sequence of streams
  • Streams stored using FAT-based data structure
  • Consistency Properties
  • FAT blocks exist and contain valid entries
  • FAT streams are properly terminated
  • Free blocks properly marked
  • Streams contain valid blocks
  • No sharing of blocks between streams

[Figure: Directory Entries, FAT, and Disk Blocks of an example file]
34
Results
  • Workload: several Microsoft Word files
  • Fault insertion: scramble FAT
  • Without repair
  • If blocks containing the FAT were incorrectly
    marked as free, Word successfully loads file
  • Otherwise, Word reports "The document name or path is
    not valid"
  • With repair
  • Word loads all files

35
Extensions
  • Elimination of external consistency constraints
  • Eliminates problems with translating repairs on
    the abstract model to the actual data structure
  • Repair algorithm analyzes model definition rules
    to generate repair actions for the actual data
    structure

36
Extensions
  • Support for doubly linked data structures
  • Enables the repair algorithm to regenerate back
    links

37
Extensions
  • Compilation and optimization of consistency
    checking
  • Achieved significant speedups (n x) by compiling
    the specification
  • Achieved further speedups () by partially
    optimizing away the construction of the abstract
    model

38
Related Work
  • Hand-coded repair
  • Lucent 5ESS switch
  • IBM MVS operating system
  • Self-stabilizing algorithms
  • Log-based recovery for database systems
  • Recovery-oriented computing
  • Recursive restartability
  • Undo framework

39
Conclusion
  • Data structure repair is an interesting way to
    (potentially) improve reliability
  • Specification-based approach promises to make
    technique more widely applicable
  • Moving towards more robust, probabilistic,
    continuous concept of system behavior

40
Formalizing Repair Dependences: Constraint
Dependence Graph
  • Nodes: constraints, conjunctions from DNF
  • Edges
  • constraint to its conjunctions
  • conjunction to dependent propositions
  • if repairing conjunction could falsify
    proposition, or
  • if repairing conjunction could increase
    quantifier scope

[Figure: constraint dependence graph with constraint nodes P, Q, T and conjunction nodes (a1 ∧ … ∧ an), (b1 ∧ … ∧ bn), (c1 ∧ … ∧ cn), (d1 ∧ … ∧ dn), (e1 ∧ … ∧ en), (f1 ∧ … ∧ fn)]
41
Consistency Properties
  • The FAT blocks exist
  • FAT contains valid values only
  • -1 terminates FAT streams
  • -2 indicates free blocks
  • Valid disk block index gives next block in stream
  • FAT streams properly terminated
  • Free blocks properly marked
  • Streams contain valid blocks only
  • Streams do not share blocks
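
Several of these properties can be checked with a single walk over the FAT. The Python sketch below is a simplified illustration: the FAT is a plain list of integers, `stream_starts` is a hypothetical list of first blocks, and the property that the FAT blocks themselves exist is not modeled.

```python
END, FREE = -1, -2


def check_fat(fat, stream_starts):
    """Return a list of violated properties for a FAT given as a list of ints."""
    problems = []
    n = len(fat)
    for b, v in enumerate(fat):
        if v not in (END, FREE) and not 0 <= v < n:
            problems.append(f"block {b}: invalid FAT entry {v}")

    owner = {}                         # block -> index of the stream that owns it
    for s, start in enumerate(stream_starts):
        b, seen = start, set()
        while True:
            if not 0 <= b < n or fat[b] == FREE or b in seen:
                problems.append(f"stream {s}: not properly terminated")
                break
            if b in owner:
                problems.append(f"stream {s}: shares block {b} with stream {owner[b]}")
                break
            owner[b] = s
            seen.add(b)
            if fat[b] == END:
                break
            b = fat[b]

    for b, v in enumerate(fat):
        if b not in owner and v != FREE:
            problems.append(f"block {b}: unused but not marked free")
    return problems


good = check_fat([1, -1, 3, -1, -2], [0, 2])
bad = check_fat([1, -1, 1, -1, 7], [0, 2])
print(good)
print(bad)
```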

42
Formalizing Repair Dependences: Constraint
Dependence Graph
  • Absence of cycles implies valid repair schedule
  • Conjunction removal for cycle elimination
  • (must leave at least one conjunction per
    constraint)

[Figure: constraint dependence graph with constraint nodes P, Q, T and conjunction nodes (a1 ∧ … ∧ an), (b1 ∧ … ∧ bn), (c1 ∧ … ∧ cn), (d1 ∧ … ∧ dn), (e1 ∧ … ∧ en), (f1 ∧ … ∧ fn)]
43
Formalizing Repair Dependences: Constraint
Dependence Graph
  • Absence of cycles implies valid repair schedule
  • Conjunction removal for cycle elimination
  • (must leave at least one conjunction per
    proposition)

[Figure: the same constraint dependence graph after the conjunction (f1 ∧ … ∧ fn) has been removed]
44
Pointers
  • Sets in model can include
  • Primitive types (int, char, ...)
  • Structs (identified by pointer to struct)
  • Standard linked list example
  • struct node { int value; node *next; }
  • set nodes of node
  • relation next : node -> node
  • for n in nodes, true ⇒ n.next in nodes
  • for n in nodes, true ⇒ ⟨n,n.next⟩ in next
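
The linked list rules above can be sketched in Python, with object references standing in for pointers. Starting the traversal from a `head` node is an assumption (the slide does not show the rule that seeds the set nodes), and the `Node` class and function name are hypothetical.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list_model(head):
    """Collect the set nodes and the relation next reachable from head."""
    nodes, nxt = [], []
    seen = set()
    n = head
    # Stop at a null reference or when a node repeats (a cycle).
    while n is not None and id(n) not in seen:
        seen.add(id(n))
        nodes.append(n)
        if n.next is not None:
            nxt.append((n, n.next))   # <n, n.next> in next
        n = n.next
    return nodes, nxt


c = Node(3)
b = Node(2, c)
a = Node(1, b)
nodes, nxt = build_list_model(a)
print([n.value for n in nodes], [(x.value, y.value) for x, y in nxt])
```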

45
What About Corrupted Pointers?
  • System only allows valid structs in model
  • struct must be completely in valid memory
  • one struct may be nested inside another struct
    (but must agree on memory format)
  • If an invalid or null pointer is encountered, the
    (invalid) struct does not appear in model
  • Implementation must track operations that affect
    valid regions of address space
  • malloc, free
  • mmap, munmap
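
The validity test could be pictured as a registry of address ranges. The Python sketch below is a toy model with integer addresses; the class and method names are hypothetical, and a real implementation would hook the actual malloc, free, mmap, and munmap calls.

```python
class ValidRegions:
    """Track allocated address ranges, as allocation and release calls would update them."""

    def __init__(self):
        self.regions = {}                      # start address -> size

    def on_alloc(self, start, size):
        self.regions[start] = size

    def on_free(self, start):
        self.regions.pop(start, None)

    def struct_is_valid(self, ptr, struct_size):
        """A struct is valid only if it lies completely inside one region."""
        if ptr == 0:                           # null pointer
            return False
        return any(start <= ptr and ptr + struct_size <= start + size
                   for start, size in self.regions.items())


regions = ValidRegions()
regions.on_alloc(1000, 64)
inside = regions.struct_is_valid(1000, 16)
overrun = regions.struct_is_valid(1056, 16)    # would end at 1072, past 1064
null = regions.struct_is_valid(0, 16)
regions.on_free(1000)
after_free = regions.struct_is_valid(1000, 16)
print(inside, overrun, null, after_free)
```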

46
CTAS in Action
FAST at DFW TRACON
TMA at Fort Worth Center
47
Usage Scenarios
  • Reduced development effort
  • Invest less effort in finding and fixing bugs
  • Rely on repair to deliver reliable system
  • Afraid to fix bug
  • Cheap insurance policy
  • No good quantitative justification
  • But repair seems like a good idea

48
Issues
  • Unclear relationship between repaired bits and
    bits from correct execution of program
  • Identifying results involving repaired data
  • Characterizing likely errors
  • Data races in multithreaded programs
  • Failure to update correlated data structures
  • Caching inconsistencies
  • Unanticipated failures/exit points
  • Constraint language expressivity
  • Coverage of desired properties
  • Limitations from acyclicity requirement
  • When to test for consistency and repair

49
What About Corrupted Pointers?
  • Sets may contain pointers to structs
  • System only allows valid structs in model
  • struct must be completely in valid memory
  • one struct may be nested inside another struct
    (but must agree on memory format)



[Figure: Valid Structs lie entirely within Valid Memory; an Invalid Struct extends into Invalid Memory]
50
Interesting Nuggets
  • Small specifications
  • Global invariant advantages