Frances Perry - PowerPoint PPT Presentation

1
Reasoning about Control Flow in the Presence of
Transient Faults
  • Frances Perry
  • Princeton University
  • Joint work with David Walker
  • SAS 2008

2
Transient Faults
  • Occur when an energetic particle strikes a
    transistor or wire causing a change in state
  • Do not permanently damage hardware
  • May corrupt computation by altering stored values
    and signals

3
Issues Caused by Transient Faults
  • Crashes and Failures
      • Sun Microsystems acknowledged that transient faults
        caused crashes in server systems at AOL, eBay,
        and dozens of other major clients. [Baumann 02]
      • Cypress Semiconductor acknowledged that a single
        soft error caused a server farm at a telephone
        company to crash. [Ziegler & Puchner 04]
      • The ASC Q supercomputer at Los Alamos crashed
        regularly due to soft errors. [Michalak et al. 05]
  • Exploiting Virtual Machine Vulnerabilities
    [Govindavajhala & Appel 03]
      • Java Virtual Machine relies on type safety to
        separate untrusted programs from the VM
      • An attacker can craft a program that exploits
        transient faults to take over the VM
  • Breaking Cryptographic Protocols, e.g., RSA
    [Boneh, DeMillo & Lipton 97]
      • RSA relies on the inability to factor N into prime
        numbers p and q
      • Attacker obtains two signatures (one correct, one
        faulty) of the same message
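
As a toy illustration of why one correct and one faulty signature can be enough, the sketch below assumes the signer uses the CRT optimization of RSA (the slide does not say so) and uses tiny, hypothetical primes. The function name `sign_crt` and its `fault` flag are invented for this example. A fault in the mod-p half leaves the signature correct mod q, so the gcd of the difference with N exposes q. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
from math import gcd

# Toy parameters (hypothetical, far too small for real use).
p, q = 61, 53
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def sign_crt(m, fault=False):
    """Sign m with RSA-CRT; optionally corrupt the mod-p half."""
    sp = pow(m, d % (p - 1), p)
    sq = pow(m, d % (q - 1), q)
    if fault:
        sp = (sp + 1) % p  # simulated transient fault in one half
    # Garner recombination of the two halves into a value mod N.
    h = (pow(q, -1, p) * (sp - sq)) % p
    return (sq + q * h) % N

m = 42
good = sign_crt(m)
bad = sign_crt(m, fault=True)

# Both signatures agree mod q but differ mod p,
# so their difference shares exactly the factor q with N.
recovered = gcd(good - bad, N)
print(recovered, N // recovered)  # the two prime factors of N
```

The attacker never sees p, q, or d here; the gcd only uses the two signatures and the public modulus.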

4
Transient Fault Trends
  • In 2004, a typical laptop with 1GB RAM had 1 soft
    fail per year. [Ziegler 04]
  • Faster clock rates, increasing transistor
    density, decreasing voltages, and smaller feature
    size all contribute to increasing fault rates of
    approximately 8% per generation. [Borkar 05]

5
Dealing with Transient Faults
  • Many existing solutions
  • Provide protection by adding redundancy
  • Hardware, software, hybrid hardware-software,
    single core, multi-core
  • Tradeoffs: performance, cost, power, and
    reliability

6
An Example Solution: SWIFT
  • SWIFT [Reis et al. 05] is a software-based
    solution
  • Compiler duplicates the original computation and
    inserts comparisons before memory stores and
    control flow instructions to ensure that the two
    versions agree
  • Evaluation: randomly inject faults and look at
    resulting performance and detection rate
  • The detection rate wasn't as good as they
    expected
  • Compiler was adding the redundant computation and
    then performing optimizations that remove
    redundancy!
  • Solution: permanently turn off certain
    optimizations
  • Results
  • Experimental data showing a decrease in fault
    detection
  • English descriptions of faults not handled, but
    no formal reasoning
  • SWIFT will detect all but the most pathological
    single-upset faults.
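
The duplicate-and-compare idea can be sketched in a few lines. This is not SWIFT itself, which works inside a compiler on low-level code; it is a hand-written Python analogue under that assumption, and the names `checked_store`, `scaled_sum`, and `flip_bit` are hypothetical.

```python
class FaultDetected(Exception):
    """Raised when the two redundant copies disagree."""

def checked_store(memory, addr, value, addr_dup, value_dup):
    # Compare both copies before the store becomes visible in memory.
    if addr != addr_dup or value != value_dup:
        raise FaultDetected(f"mismatch before store to {addr!r}")
    memory[addr] = value

def scaled_sum(xs, scale, memory, flip_bit=None):
    total = 0      # original computation
    total_dup = 0  # redundant copy, computed independently
    for x in xs:
        total += x * scale
        total_dup += x * scale
    if flip_bit is not None:
        total ^= 1 << flip_bit  # simulated single-bit upset in one copy
    checked_store(memory, "out", total, "out", total_dup)

memory = {}
scaled_sum([1, 2, 3], 2, memory)  # fault-free run stores 12

try:
    scaled_sum([1, 2, 3], 2, memory, flip_bit=3)
    detected = False
except FaultDetected:
    detected = True  # the corrupted value never reaches memory
```

An optimizer that notices `total` and `total_dup` are computed identically could merge them, which is the kind of redundancy-removing optimization the slide describes.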

7
Transient Fault Solutions
  • Many Existing Solutions
  • [Borin et al. 06], [Chang et al. 06], [Gomaa et al. 03],
    [Guerraoui & Schiper 97], [Horst et al. 90],
    [Kalbarczyk et al. 99], [Oh et al. 02],
    [Ohlsson & Rimen 95], [Rebaudengo 01],
    [Reinhardt & Mukherjee 00], [Reis et al. 05],
    [Reis et al. 06], [Reis et al. 07], [Shirvani et al. 00],
    [Slegel et al. 99], [Tremblay & Tamir 89],
    [Venkatasubramanian 03], [Vijaykumar et al. 02],
    [Yan & Zhang 05], [Yeh 96], [Yeh 98], ...
  • Do these solutions actually work?

8
Typed Assembly Language [Morrisett et al. 98]
  • Instance of Proof-Carrying Code [Necula & Lee 96]
  • Assembly-level type system encapsulates desired
    invariants
  • Type checking the generated code guarantees
    properties
  • The compiler does not have to be trusted.

[Figure: Source Code → Type-Preserving Compiler → Typed Assembly Language → Type Checker]
9
Using Typed Assembly Languages
  • Develop a machine model to reason about machine
    execution
  • Design the type system
  • Prove the type system is sound with respect to
    the machine model
  • Show that the typed assembly language is
    expressive

10
Roadmap
  • Transient Faults: Issues and Existing Solutions
  • Verifying Assembly Code with Typed Assembly
    Languages
  • Detecting Control Flow Faults in Software
  • TALCF: Formalizing a Partial Solution to Control
    Flow Faults
      • Formalizing the machine model
      • Designing the type system
      • Proving soundness
      • Showing expressiveness
  • Conclusions

11
Control Flow Faults
L1:
  . . .
  mov r2 L2
  jmp r2
12
Control Flow Faults
L1:
  . . .
  mov r2 L2
  jmp r2    → ?
13
Detecting Control Flow Faults in Software
  • Existing software solutions can catch many (but
    not all) control flow faults.
  • Method: keep another value that approximates the
    PC within the current block
  • A sequence of existing work handles increasingly
    large classes of control flow faults
      • CFCSS [Oh et al. 2002], SWIFT [Reis et al.
        2005], RCF [Borin et al. 2006]
      • Still can't handle all faults
  • Detecting control flow faults is difficult (and
    handling all may be impossible?)
  • What can we say about these techniques
    mathematically?

14
A Simplified Model of Control Flow Faults
  • Goal: to formally analyze (parts of) the existing
    (imperfect) solutions for detecting control flow
    faults in software
  • Fault model
      • Faults affect general registers r1, r2, ..., ri
      • Single Event Upset model
      • Hardware catches faulty jumps into the middle of
        blocks or to non-code addresses
  • General approach: two redundant, independent
    computations
      • Green: used to determine control flow
      • Blue: backup copy, used to check that control
        flow was correct

15
Stating and Verifying Intentions
L1 (original):
  . . .
  mov r2 L2
  jmp r2

L1 (with the intention stated):
  . . .
  mov ri L2
  mov r2 L2
  jmp r2

L10:
  . . .
  mov r2 L10
  sub r2 r2 ri
  brnz r2 Lrec
  . . .

L2:
  . . .
  mov r2 L2
  sub r2 r2 ri
  brnz r2 Lrec
  . . .

Lrec:
  ... Recovery Code ...
16
Stating and Verifying Intentions
L1:
  . . .
  mov ri L2
  mov r2 L2
  jmp r2

L10:
  mov r2 L10
  sub r2 r2 ri
  brnz r2 Lrec
  . . .

L2:
  mov r2 L2
  sub r2 r2 ri
  brnz r2 Lrec
  . . .

Lrec:
  ... Recovery Code ...
17
Instruction Set
  • Instructions i ::=
      mov rd v
    | sub rd rs rs
    | intend r2          // mov ri r2
    | intendz rz r2      // if rz = 0, mov ri r2
    | recovernz rz       // if rz ≠ 0, jmp Lrec
  • Blocks b ::=
      i b
    | jmp rt
    | brz rz rt
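
To make the instruction set concrete, here is a minimal interpreter sketch. It rests on several assumptions not in the slides: labels are plain integers, `sub rd rs rt` computes `rs - rt`, `brz` falls through when its register is nonzero, a jump to the hypothetical address `HALT` ends the run, and a single fault is modeled as overwriting one register just before a chosen step. The helper `block` emits the check prologue and the intend/jump epilogue shown on the previous slides.

```python
HALT = 0  # hypothetical exit address, not part of the slides

def block(label, next_label):
    """Check code, then state the intention and jump to next_label."""
    return [
        ("mov", "r4", label), ("sub", "r4", "r4", "ri"), ("recovernz", "r4"),
        ("mov", "r3", next_label), ("intend", "r3"),   # backup copy
        ("mov", "r2", next_label), ("jmp", "r2"),      # main copy
    ]

def run(blocks, start, fault=None, max_steps=1000):
    """Execute from block `start`; fault = (step, register, new_value)."""
    regs = {"ri": start}
    cur, idx, history = start, 0, [start]
    for step in range(max_steps):
        if fault is not None and fault[0] == step:
            regs[fault[1]] = fault[2]  # single event upset
        ins = blocks[cur][idx]
        idx += 1
        op, target = ins[0], None
        if op == "mov":
            regs[ins[1]] = ins[2]
        elif op == "sub":
            regs[ins[1]] = regs[ins[2]] - regs[ins[3]]
        elif op == "intend":
            regs["ri"] = regs[ins[1]]
        elif op == "intendz":
            if regs[ins[1]] == 0:
                regs["ri"] = regs[ins[2]]
        elif op == "recovernz":
            if regs[ins[1]] != 0:
                return "recover", history
        elif op == "jmp":
            target = regs[ins[1]]
        elif op == "brz":
            if regs[ins[1]] == 0:
                target = regs[ins[2]]
        if target is not None:
            if target == HALT:
                return "done", history
            if target not in blocks:
                return "hw-error", history  # not the start of a block
            cur, idx = target, 0
            history.append(target)
    raise RuntimeError("step budget exhausted")

L1, L2, L10 = 1, 2, 10
blocks = {L1: block(L1, L2), L2: block(L2, HALT), L10: block(L10, HALT)}

print(run(blocks, L1))                         # ('done', [1, 2])
print(run(blocks, L1, fault=(6, "r2", L10)))   # ('recover', [1, 10])
print(run(blocks, L1, fault=(6, "ri", L10)))   # ('recover', [1, 2])
print(run(blocks, L1, fault=(6, "r2", 7)))     # ('hw-error', [1])
```

Step 6 is the `jmp` at the end of L1, so the three faulty runs corrupt the main jump target, the stated intention, and the jump target with a non-code address, respectively.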

18
Roadmap
  • Transient Faults: Issues and Existing Solutions
  • Verifying Assembly Code with Typed Assembly
    Languages
  • Detecting Control Flow Faults in Software
  • TALCF: Formalizing a Partial Solution to Control
    Flow Faults
      • Formalizing the machine model
      • Designing the type system
      • Proving soundness
      • Showing expressiveness
  • Conclusions

19
TALCF: Reasoning about Control Flow
[Figure: compiler pipeline with the labels Source Code; Syntactic & Semantic Analysis; Type-Preserving Compiler; Compilation to Low-level Code; Addition of Redundancy Detection Protocol; TALCF; Optimizations; Type Checker; Code Generation]
20
Step 1: TALCF Machine Model
  • Machine states Σ = (C, R, b, h) contain a code
    memory, register file, current block being
    executed, and a history (trace of blocks visited)
  • Final states: recover | hw-error
  • Define an operational semantics Σ →₀ Σ'
  • Add operational rules Σ →₁ Σ' that
    nondeterministically introduce faults

21
Step 2: Type System Design
  • Main concepts behind the TALCF type system
  • Stages of the fault tolerance protocol must occur
    in order.
  • Equivalence checking ensures that redundant
    values act as proper backups.
  • Values are classified based on their reliability
    properties.

22
Concept 1: Protocol Stages
...
mov r2 L
...
sub r2 r2 ri
...
recovernz r2
. . .
intend r3
...
jmp r3
  • Checking Code
      • ri has type check
      • Block may be invalid
  • Block Body
      • ri has type ok
      • Block is valid
  • Exit Code
      • ri has type go
      • Block is valid

23
Concept 2: Equivalence Checking
  • Values are typed with a triple ⟨c, b, E⟩
      • c - a color
      • b - a basic type (int, codetp, protocol stage)
      • E - a static expression
  • Static expressions are arithmetic expressions
    describing the value
  • Typing rules use expressions to ensure that the Blue
    computation is a true copy of the Green
    computation
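
One way to picture equivalence checking is to normalize static expressions and compare them. The sketch below is an illustration under strong assumptions, not the paper's definition: static expressions are restricted to linear integer expressions, and the names `Lin`, `lin`, `sub`, `Ty`, and `is_true_copy` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lin:
    """Linear static expression: const + sum(coef * var), in normal form."""
    const: int = 0
    terms: tuple = ()  # sorted (var, coef) pairs with nonzero coefs

def lin(const=0, **coefs):
    """Build a normalized linear expression, dropping zero coefficients."""
    return Lin(const, tuple(sorted((v, c) for v, c in coefs.items() if c != 0)))

def sub(a, b):
    """Static expression for the result of subtracting b from a."""
    coefs = dict(a.terms)
    for v, c in b.terms:
        coefs[v] = coefs.get(v, 0) - c
    return lin(a.const - b.const, **coefs)

@dataclass(frozen=True)
class Ty:
    color: str  # "G", "B", or "O"
    basic: str  # "int", "codetp", or a protocol stage
    expr: Lin

def is_true_copy(green, blue):
    """Blue backs up green if both describe the same value."""
    return (green.color == "G" and blue.color == "B"
            and green.basic == blue.basic and green.expr == blue.expr)

x, y, one = lin(x=1), lin(y=1), lin(1)
green = Ty("G", "int", sub(sub(x, y), one))    # (x - y) - 1
blue = Ty("B", "int", sub(x, lin(1, y=1)))     # x - (y + 1)
wrong = Ty("B", "int", sub(x, y))              # x - y

print(is_true_copy(green, blue))   # True
print(is_true_copy(green, wrong))  # False
```

Because both computations normalize to the same expression, the blue value can stand in for the green one even though the instructions that produced it differ.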

24
Concept 3: Approximating Trust
  • How do we know if a runtime value actually has
    its static type? We can't!
  • Type system approximates trust by assigning each
    value a color: G, B, or O
  • All values with the same color share the same
    reliability properties
  • A value colored c only depends on other values
    colored c
  • After a fault zaps a value colored c
      • all values colored c are untrusted
      • the trust level of other colors doesn't change
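
The color discipline above can be sketched as two small checks. This is an informal model rather than the type system itself: registers are mapped to colors by a plain dictionary, and the names `same_color_dependencies` and `trusted_after_zap` are hypothetical.

```python
def same_color_dependencies(colors, deps):
    """True if every value depends only on values of its own color."""
    return all(colors[v] == colors[u] for v, us in deps.items() for u in us)

def trusted_after_zap(colors, zapped):
    """Values still trusted after a fault hits `zapped`."""
    bad = colors[zapped]
    return {v for v, c in colors.items() if c != bad}

colors = {"r1": "G", "r2": "G", "r3": "B", "ri": "B"}
deps = {"r2": ["r1"], "ri": ["r3"]}  # r2 is computed from r1, ri from r3

print(same_color_dependencies(colors, deps))        # True
print(sorted(trusted_after_zap(colors, "r1")))      # ['r3', 'ri']
```

Since dependencies never cross colors, distrusting a whole color after a fault is a safe over-approximation of which values might be corrupt.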

25
Green and Blue Invariants
  • Main computation values are green
  • Backup computation values are blue
  • If a corrupt value of either color is used during
    a control flow transfer, then a control flow
    fault has occurred
  • Once a CF fault has occurred, we consider both
    green and blue values to be untrusted

26
Orange Invariants
  • Orange values continue to be trusted even after a
    control flow fault has occurred.
  • Invariants still hold because either
      • Value isn't live across control flow transfers
      • Invariant is true across every control flow
        transfer
  • All blocks must
      • Require nothing special about the value in ri
        (⟨O, check, a⟩)
  • No other registers can be colored orange

27
Zap Tags
  • Typing judgments are parameterized by a zap tag Z
    ( Z ? ) which classifies groups of values as
    trusted or untrusted
  • Z ≤ Z' if Z is at least as trusted as Z'

28
Step 3: Soundness
  • Progress: well-typed states can take a step.
  • Preservation: execution preserves typing, but
    the zap tag may elevate to a supertype.
      • From the empty tag to c when a fault first occurs
      • From G/B to CF when a corrupt value is
        used to determine control flow.
  • Fault Tolerance Theorem
      • If a single fault occurs, nothing goes too badly
        wrong before the error is detected.
      • Fully formalized and proved in the paper

29
Step 4: Proving Expressiveness
  • Type-preserving translation from a simple
    imperative language to TALCF
  • s ::= v n | vd va vb
        | if0 vz then s1 else s2 | while vz ≠ 0 do s
        | s1 s2
  • Prove that well-formed statements s can be
    translated into well-typed machine states Σ

30
Conclusion
  • Transient faults are already a significant
    concern, and future processors will become more
    susceptible
  • Contributions of TALCF
      • Introduces new proof techniques for reasoning
        about control flow.
      • Proves the correctness of a software technique
        for detecting control flow faults.
  • For more information, visit the Project Zap
    homepage: http://www.cs.princeton.edu/sip/projects/zap

32
Extra Slides
33
Formalizing Fault Tolerance Solutions
  • λzap [Walker et al. ICFP 06]
      • at the level of the λ calculus
      • duplicates computation and uses atomic voting to
        compare computations and recover from faults
      • avoids specifically dealing with control flow
        faults
  • TALFT [Perry et al. PLDI 07]
      • formalizes an existing hybrid hardware-software
        solution
      • uses an assembly level type system to capture
        invariants about redundancy
      • requires special hardware to address control flow
        faults

34
Option 1: CF Identical, Registers Similar
[Figure: block traces of the fault-free and faulty executions over blocks 1 to 5]
35
Option 2: Hardware Error Detected
[Figure: block traces of the fault-free and faulty executions over blocks 1 to 5, with hw-error]
36
Option 3: Fault to Backup Copy Detected
[Figure: block traces of the fault-free and faulty executions over blocks 1 to 5, with Lrec]
37
Option 4: CF Fault Detected
[Figure: block traces of the fault-free and faulty executions over blocks 1 to 5, with Lrec]
38
Typing Example
. . .
intend r3
jmp r2

r2 : ⟨G, codetp, e2⟩    r3 : ⟨B, codetp, e3⟩
ri : ⟨B, go, e3⟩
  • Requires
      • Concept 1: intend has already occurred
      • Concept 2: backup is correct (e2 = e3)

ri : ⟨O, check, xi⟩
Want to know: did r3 = r2? Can check: does ri = L?

L:
  mov r6 L
  sub r6 r6 ri
  recovernz r6
  . . .

r6 : ⟨O, check, L - xi⟩
Control only proceeds past this point if xi = L
and the block is valid.
39
Formalizing Program Execution
  • Σ represents the state of the machine at some
    point during program execution
  • Define an operational semantics Σ →₀ Σ' to
    express the result of executing a single
    instruction
  • There are no rules for the undefined cases

R(r1) = c n    R(r2) = c m
-------------------------------------------------------------- (add)
(R, C, M, Q, add rd r1 r2) →₀ (R[rd ↦ c (n+m)], C, M, Q, ?)
40
Formalizing the Fault Model
  • Add operational rules Σ →₁ Σ' that
    nondeterministically introduce faults to the
    register file and queue

R(r) = c n
-------------------------------------------------- (zap-reg)
(R, C, M, Q, i) →₁ (R[r ↦ c z], C, M, Q, i)

Q  = (a1,v1), ..., (ai,vi), ..., (an,vn)
Q' = (a1,v1), ..., (z,vi), ..., (an,vn)
-------------------------------------------------- (zap-queue-addr)
(R, C, M, Q, i) →₁ (R, C, M, Q', i)
41
Fault Tolerance Theorem
  • If a computation sustains a single fault, then one
    of the following holds:
      • the faulty computation looks identical to the
        original, modulo the corrupt color
      • the faulty computation visits the same sequence
        of blocks until a hardware error is detected
      • the faulty computation visits the same sequence
        of blocks until a fault is detected
      • the faulty computation veers off course to an
        invalid block but catches the error within that
        block

42
Control Flow Invariants
. . .
mov r2 L
jmp r2

. . .
mov r2 L
jmp r2

L:
  . . .

If the green and blue computations don't agree
that I am the jump target, then there's been a
fault.
43
Control Flow Invariants
. . .
mov r2 L
jmp r2

. . .
mov r2 L
jmp r2

Arrival at this instruction depends on green
computation. Constants can be trusted as any
color.

L:
  mov r2 L
  sub r2 r2 ri
  recovernz r2
  . . .

Use a different color for the checking code.
ri is part of the blue computation, but L has no
preconceived notion of ri's type, so it can trust
it as any color.
44
Machine State Typing
  • Machine State Typing Judgment Z ?
  • Code memory is described by ?
  • Register File typing is described by ?

45
Related Work
  • λzap: fault-tolerant lambda calculus [Walker et
    al.]
      • High-level: lacks a program counter, register
        file, memory, load/store instructions, ...
  • TAL: Typed Assembly Language [Morrisett et al.]