Title: Recovery Oriented Programming
1. Recovery Oriented Programming
- Olga Brukman and Shlomi Dolev
- Ben-Gurion University
- Beer-Sheva
- Israel
2. Towards Correct Software
- Software should respect its specifications
- Safety, Liveness
- Atomic power station
- Safety: the atomic station shouldn't explode
- Liveness: the atomic station should produce some electricity
3. Recovery Oriented Design
- "Software performs substantially in accordance with specifications for a period of 90 days..." (IEEE Computer, October 2006)
- How to cope with such software?!
- Recovery Oriented Computing [PBB'02]!
- Recovery actions
- Reboot, wait, reschedule
- Non-intrusive: avoid rewriting the program (which may introduce new bugs)
4. Recovery Oriented Programming
- Programmer
- Best-effort implementation
- Using same IO variables as specifier
- Still bugs and unexpected states
- Specifications Composer (Project Manager)
- Invariants and predicates
- important properties on program IO
- Recovery actions
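The specification composer's output can be pictured as a small record that pairs a predicate over the program's IO variables with a recovery action. The sketch below uses Python purely for illustration. `RecoveryTuple`, `violated`, and the dictionary-based state are hypothetical names and choices, not part of the framework presented in these slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, int]  # hypothetical: IO variables as a name -> value map


@dataclass(frozen=True)
class RecoveryTuple:
    """A predicate over IO variables paired with a recovery action name."""
    predicate: Callable[[State], bool]  # must hold in every legal state
    recovery_action: str                # e.g. "restart", "wait", "reschedule"

    def violated(self, state: State) -> bool:
        return not self.predicate(state)


# Example safety tuple: x must never equal 7, recover by restarting.
safety = RecoveryTuple(predicate=lambda s: s["x"] != 7, recovery_action="restart")
```

Keeping the tuple separate from the program code mirrors the split between programmer and specification composer: the programmer never needs to see the predicate, only to use the same IO variable names.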
5. Recovery Oriented Programming: Assumptions
- Self-stabilizing processor
- Self-stabilizing OS
- Infrastructure for robust monitoring and recovery
- Processes exist and execute their code
6. Recovery Oriented Programming: Assumptions
- Not immediately Byzantine
- Eventually Byzantine program
7. Our Framework
[Diagram: Subsystems hierarchy, Recovery tuples, Code, Subsystem External Monitor]
System is able to recover from any state
8. Generated Code: One Process
[Diagram: External Monitor, Recovery tuples, Code, event-driven monitoring]
9. Generated Code: Subsystem
[Diagram: External Monitor, Subsystems hierarchy, Recovery tuples, Subsystem External Monitor, event-driven monitoring]
10. Our Framework: Transforming Recovery Tuples into Code
[Diagram: Subsystems hierarchy, Recovery tuples, Pre-compiler, Code, Subsystem External Monitor]
11. Safety Recovery Tuple
1 process
Recovery tuple: PRED: x != 7, RA: this.restart()
Source code: ... x = a ...
Pre-compiler output: temp_x = a; if (temp_x != 7) x = temp_x; else this.restart();
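The transformation on this slide (assign to a temporary, test the predicate, then commit or recover) can be sketched in Python. This is an illustrative sketch only. The `Process` class, the `assign_x` method, and the choice of "reset x to 0" as the effect of a restart are assumptions made for the example, not the pre-compiler's actual output.

```python
class Process:
    """Toy process with one IO variable x and a restart recovery action."""

    def __init__(self) -> None:
        self.x = 0
        self.restarts = 0

    def restart(self) -> None:
        # Assumed recovery action: return to a known initial state.
        self.x = 0
        self.restarts += 1

    def assign_x(self, a: int) -> None:
        # Guarded assignment: compute into a temporary, check the predicate,
        # and commit only if x != 7 would still hold afterwards.
        temp_x = a
        if temp_x != 7:
            self.x = temp_x
        else:
            self.restart()


p = Process()
p.assign_x(5)   # predicate holds, value is committed
p.assign_x(7)   # predicate would be violated, recovery action runs
```

The guard runs before the write, so the forbidden value never becomes visible in `x` through this code path.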
12. Safety Recovery Tuple in the Scope of Stabilization: External Monitoring
1 process
Recovery tuple: PRED: x != 7, RA: this.restart()
Source code: ... x = a ...
Pre-compiler output: temp_x = a; if (temp_x != 7) x = temp_x; else this.restart(); ...
External monitor: if (!(ps.x != 7)) ps.restart()
No more x...
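The inline guard only helps when the process actually reaches it. If a transient fault writes a bad value directly into the state, an external check is needed. The Python sketch below illustrates that scheduled check under simplifying assumptions: `Process`, `external_monitor_check`, and the direct write that simulates a fault are all hypothetical.

```python
class Process:
    """Toy process with one IO variable x and a restart recovery action."""

    def __init__(self) -> None:
        self.x = 0
        self.restarts = 0

    def restart(self) -> None:
        # Assumed recovery action: return to a known initial state.
        self.x = 0
        self.restarts += 1


def external_monitor_check(ps: Process) -> bool:
    """Scheduled check: restart ps if the predicate x != 7 does not hold.

    Returns True when a recovery action was triggered.
    """
    if not (ps.x != 7):
        ps.restart()
        return True
    return False


ps = Process()
ps.x = 7  # simulated transient fault: state corrupted outside guarded code
recovered = external_monitor_check(ps)
```

In this sketch the monitor reads the process state from outside, so it does not depend on the process's own control flow being intact.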
13. Liveness Recovery Tuple
1 process
Recovery tuple: INV: eventually xy15, RA: this.restart(), HTR: history
Source code: x = x + 2 ... y = y + 5 ...
Pre-compiler output: x = x + 2; if (xy15) this.history ...; y = y + 5; if (xy15) this.history
history = history ? this.state(); if (loop in history and CPU(this)) ps.restart()
History: ..., (.., x=1, y=2, ..), (.., x=3, y=7, ..), ...
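The history-based liveness check can be sketched in Python as follows. This is a sketch under stated assumptions, not the generated code: the goal condition is passed in as a parameter because the exact invariant is not legible on the slide, the `had_cpu` flag stands in for the CPU(this) condition, and clearing the log once the goal is reached is a guess at what the generated code does with the history. `LivenessMonitor` and `record` are hypothetical names.

```python
from typing import Callable, List, Tuple

Snapshot = Tuple[int, int]  # hypothetical process state: (x, y)


class LivenessMonitor:
    def __init__(self, goal: Callable[[Snapshot], bool]) -> None:
        self.goal = goal
        self.history: List[Snapshot] = []

    def record(self, state: Snapshot, had_cpu: bool = True) -> bool:
        """Log a snapshot; return True if the recovery action should run."""
        if self.goal(state):
            self.history.clear()  # progress was made, start a fresh window
            return False
        looped = state in self.history  # same state seen before: no progress
        self.history.append(state)
        if looped and had_cpu:
            self.history.clear()
            return True
        return False


# Hypothetical goal, chosen only to make the example concrete.
monitor = LivenessMonitor(goal=lambda s: s[0] + s[1] >= 15)
decisions = [monitor.record(s) for s in [(1, 2), (3, 7), (3, 7)]]
```

A repeated state is treated as evidence of a loop only if the process was actually scheduled. Otherwise the repetition just means the process has not run yet.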
14. Generated Monitoring Code for Subsystem
sub = {p1, p2}
History: ..., distributed snapshot(sub), ...
[Diagram: Recovery Tuples, Pre-compiler, Code for p1, Code for p2, External monitor for sub]
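A subsystem monitor differs from a per-process monitor in that its predicate ranges over the joint state of all members. The Python sketch below illustrates the idea under strong simplifications: `snapshot` reads each member in turn and is only a stand-in for a real distributed snapshot algorithm. The predicate, the class names, and the "restart every member" policy are all assumptions.

```python
from typing import Callable, Dict, List


class Process:
    def __init__(self, name: str, x: int = 0) -> None:
        self.name = name
        self.x = x

    def restart(self) -> None:
        self.x = 0


class SubsystemMonitor:
    def __init__(self, members: List[Process],
                 predicate: Callable[[Dict[str, int]], bool]) -> None:
        self.members = members
        self.predicate = predicate
        self.history: List[Dict[str, int]] = []

    def snapshot(self) -> Dict[str, int]:
        # Stand-in for a distributed snapshot: reads each member in turn.
        return {p.name: p.x for p in self.members}

    def check(self) -> bool:
        """Record a snapshot; restart every member if the predicate fails."""
        snap = self.snapshot()
        self.history.append(snap)
        if not self.predicate(snap):
            for p in self.members:
                p.restart()
            return True
        return False


p1, p2 = Process("p1", 4), Process("p2", 9)
# Hypothetical subsystem predicate: the members' values must sum to at most 10.
sub_monitor = SubsystemMonitor([p1, p2], lambda s: s["p1"] + s["p2"] <= 10)
triggered = sub_monitor.check()
```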
15. Generic Correctness Theorem
- In the program produced by the pre-compiler, every rsf-execution (restart-supporting fair execution) E has a suffix in which the program respects its specification function
- An rsf-execution is an execution in which the system is trusted to behave according to its specifications after a restart.
16. Generic Correctness Proof
- Assumption: processes and external monitors are scheduled fairly due to the presence of a self-stabilizing software platform
- Safety: a process either reaches the monitoring section in its code or its external monitor makes a scheduled check
- The subsystem external monitor makes a scheduled check
17. Generic Correctness Proof Cont.
- Liveness: the process (subsystem) external monitor makes a scheduled check of the history log
- Corrupted history
- If it causes an (unnecessary) recovery, it is trimmed
- New correct records eventually accumulate and reflect the real state of the system
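The argument about corrupted history can be illustrated with a small Python sketch: whatever the log contains initially, the worst case is one unnecessary recovery, after which the log holds only records written by the running system. The function name `check_history` and the "trim to empty" policy are assumptions for illustration. The slides do not say how much of the log is trimmed.

```python
from typing import List, Tuple

State = Tuple[int, int]


def check_history(history: List[State], current: State) -> bool:
    """Return True (recovery needed) if current already appears in history.

    On recovery the log is trimmed to empty, so whatever it contained
    before, only records written after this point remain.
    """
    if current in history:
        history.clear()
        return True
    history.append(current)
    return False


# Arbitrary (possibly corrupted) initial log contents.
log: List[State] = [(3, 7), (99, -1)]
first = check_history(log, (3, 7))    # bogus match: one unnecessary recovery
second = check_history(log, (1, 2))   # log now holds only real records
```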
18. Related Work: Perfect Software
- Formal specification languages
- ASM [GRS'04], I/O Automata [L'96], Nuprl [CKB'84]
- Gradually and manually translated into fully verified program
- Model checking
- Doesn't scale
- Specification embedding programming languages
- SCR (Software Cost Reduction) language [RLHL'06]
- Programmer bugs
19. Related Work: Programming Tools
- Design By Contract
- Eiffel, iContract for Java
- Checking invariants on an object state, pre-/post-conditions on object methods, recovery by predefined recovery action
- Partial monitoring of liveness, based on timeout
- Monitoring of safety outside of stabilization scope
- Exceptions
- Suitable for single process only
- Impractical for changing the program flow
20. Related Work: Online Recovery
- Recovery blocks (N-version programming) [RX'94]
- ROC [PBB'02], Java-MOP [CR'05], Kinesthetics eXtreme [KPGV'03], "On Modeling and Tolerating Incorrect Software" [AT'03]
- Monitoring/correcting layer that alters the failed component's behaviour
21. Related Work: Online Recovery
- Assumption of monitoring/correcting layer stability
- ROC [PBB'02], Java-MOP [CR'05], Kinesthetics eXtreme [KPGV'03]
- Intrusive correcting actions
- Empty program: correcting actions define the program
- "On Modeling and Tolerating Incorrect Software" [AT'03]
22. Conclusions
- Recovery Oriented Programming paradigm for a programming language
- Full monitoring of safety and liveness properties in the scope of stabilization
- Formal correctness proof scheme for the resulting code