Frances Perry - PowerPoint PPT Presentation

1
Reasoning about Software in the Presence of
Transient Faults
  • Frances Perry
  • Princeton University

2
Transient Faults
  • Occur when an energetic particle strikes a
    transistor or wire causing a change in state
  • Do not permanently damage hardware
  • May corrupt computation by altering stored values
    and signals

3
Issues: Crashes and Failures
  • In 2000, Sun Microsystems acknowledged that
    transient faults interfered with cache memories
    and caused crashes in server systems at major
    customer sites, including AOL, eBay, and dozens
    of others. [Baumann 02]
  • Cypress Semiconductor acknowledged, "the wake-up
    call came in the end of 2001 with a major
    customer reporting havoc at a large telephone
    company. Technically it was found that a single
    soft fail was causing an interleaved system farm
    to crash." [Ziegler & Puchner 04]
  • At Los Alamos in 2003, the ASC Q supercomputer
    crashed regularly due to soft errors. [Michalak
    et al. 05]

4
Issues: Vulnerabilities in Virtual Machines
[Govindavajhala & Appel 03]
  • Java Virtual Machine relies on type safety to
    separate untrusted programs from the VM
  • An attacker can craft a program that exploits
    transient faults to take over the VM
  • Waits until a fault results in a pointer with a
    runtime type that doesn't match its static type
  • Uses the mismatch to execute arbitrary code
  • Successful demo at conference
  • Can speed up transient fault rate using a heat
    source
  • 70% probability of taking complete control
    of VM within 1 minute

5
Issues: Breaking Cryptographic Protocols
  • Certain implementations of RSA are vulnerable to
    a single fault [Boneh, DeMillo & Lipton 97]
  • RSA relies on inability to factor N into prime
    numbers p and q
  • Attacker obtains two signatures (one correct, one
    faulty) of the same message
  • GCD(correct signature - faulty signature, N) = p
  • Many other examples: [Biham & Shamir 97], [Blomer &
    Seifert 03], [Dusart et al. 03], [Piret &
    Quisquater 03], ...
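
The GCD step can be tried on toy numbers. The sketch below is illustrative only: it assumes a CRT-based RSA signer (the kind of implementation this attack targets), tiny textbook primes, and a hypothetical helper `crt_sign` whose `fault` flag flips one bit in the mod-q half of the computation. None of these names or parameters come from the talk.

```python
from math import gcd

def crt_sign(m, p, q, d, fault=False):
    """Sign m with RSA-CRT; optionally corrupt the mod-q half (a simulated fault)."""
    sp = pow(m, d % (p - 1), p)          # half-signature modulo p
    sq = pow(m, d % (q - 1), q)          # half-signature modulo q
    if fault:
        sq = (sq ^ 1) % q                # flip the low bit of the q half
    # Garner recombination: the result equals sp mod p and sq mod q
    h = (pow(q, -1, p) * (sp - sq)) % p  # needs Python 3.8+ for the modular inverse
    return sq + h * q

# Toy parameters (assumption): real keys use primes of 1024 bits or more.
p, q, e, d = 61, 53, 17, 2753
n = p * q
good = crt_sign(65, p, q, d)
bad = crt_sign(65, p, q, d, fault=True)

# Both signatures agree modulo p but differ modulo q, so the difference
# is a multiple of p only, and the GCD with N exposes the factor.
recovered = gcd(good - bad, n)
```

Because the fault touches only the mod-q half, `recovered` equals `p`, and `n // recovered` gives `q`.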

6
Transient Fault Trends
  • In 2004, a typical laptop with 1GB RAM had 1 soft
    fail per year. [Ziegler 04]
  • Faster clock rates, increasing transistor
    density, decreasing voltages, and smaller feature
    size all contribute to increasing fault rates of
    approximately 8% per generation. [Borkar 05]

7
Dealing with Transient Faults
  • Many existing solutions
  • Provide protection by adding redundancy
  • Hardware, software, hybrid hardware-software,
    single core, multi-core
  • Tradeoffs: performance, cost, power, and
    reliability

[Figure: tradeoff diagram with axes Performance, Power, Reliability, Cost]
8
An Example Solution: SWIFT
  • SWIFT [Reis et al. 05] is a software-based
    solution
  • Compiler duplicates the original computation and
    inserts comparisons to ensure that the two
    versions agree
  • Evaluation: Randomly inject faults and look at
    resulting performance and detection rate
  • The detection rate wasn't as good as they
    expected
  • Compiler was adding the redundant computation and
    then performing optimizations that remove
    redundancy!
  • Solution: permanently turn off certain
    optimizations
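
A minimal sketch of the duplicate-and-compare idea, and of how merging the two copies silently removes the protection. This is not SWIFT's actual transformation; the function names and the `flip_at` fault-injection parameter are hypothetical, chosen only for illustration.

```python
class FaultDetected(Exception):
    pass

def protected_sum(values, flip_at=None):
    """Sum `values` twice; `flip_at` injects a bit flip into the first copy only."""
    total_a = 0                      # original computation
    total_b = 0                      # compiler-inserted duplicate
    for i, v in enumerate(values):
        total_a += v
        total_b += v
        if i == flip_at:
            total_a ^= 1 << 4        # simulated transient fault in one copy
    if total_a != total_b:           # comparison inserted before the result escapes
        raise FaultDetected
    return total_a

def merged_sum(values, flip_at=None):
    """What is left if an optimizer treats the duplicate as a common subexpression."""
    total = 0
    for i, v in enumerate(values):
        total += v
        if i == flip_at:
            total ^= 1 << 4          # same fault, but there is no second copy
    return total                     # comparing total with itself can never fail
```

With a fault injected, `protected_sum` raises `FaultDetected`, while `merged_sum` returns a wrong value without complaint.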

9
Transient Fault Solutions
  • Many Existing Solutions
  • [Borin et al. 06], [Chang et al. 06], [Gomaa et
    al. 03], [Guerraoui & Schiper 97], [Horst et al.
    90], [Kalbarczyk et al. 99], [Oh et al. 02],
    [Ohlsson & Rimen 95], [Rebaudengo 01], [Reinhardt &
    Mukherjee 00], [Reis et al. 05], [Reis et al.
    06], [Reis et al. 07], [Shirvani et al. 00],
    [Slegel et al. 99], [Tremblay & Tamir 89],
    [Venkatasubramanian 03], [Vijaykumar et al. 02],
    [Yan & Zhang 05], [Yeh 96], [Yeh 98], ...
  • Do these solutions actually work?

10
Formalizing Fault Tolerance Solutions
  • Goal: Formally reason about the behavior of fault
    tolerance solutions
  • What is the right level of abstraction?
  • Faults affect hardware
  • Need to deal with primitive instructions, memory,
    registers, additional hardware structures
  • Software redundancy is added during compilation
  • Need to understand interactions with
    optimizations, register allocation, etc.
  • Want to reason about the implementation as well
    as the algorithm

11
Proof-Carrying Code [Necula & Lee 96]
  • Code producer supplies a safety proof along with
    the code binary
  • User verifies proof before executing code
  • How do we represent safety proofs?

[Figure: the producer turns Source Code into Native Code plus a Safety Proof (Compilation & Certification); the consumer performs Proof Validation]
12
Typed Assembly Language [Morrisett et al. 98]
  • Assembly-level type system encapsulates desired
    invariants
  • Type checking the generated code guarantees
    properties
  • The compiler does not have to be trusted.

[Figure: Source Code, Type-Preserving Compiler, Typed Assembly Language, Type Checker]
13
Using Typed Assembly Languages
  • Develop a machine model to reason about machine
    execution
  • Design the type system
  • Prove the type system is sound with respect to
    the machine model
  • Show that the typed assembly language is
    expressive
  • My research: Using typed assembly languages in
    the presence of transient faults

14
Roadmap
  • Transient Faults: Issues and Existing Solutions
  • Formalizing Fault-tolerance Solutions with Typed
    Assembly Languages
  • TALFT: Formalizing a hybrid transient fault
    solution
  • Formalizing the machine model
  • Designing the type system
  • Proving soundness
  • Discussion: Is this realistic?
  • Related and Future Work in Fault Tolerance
  • Other Recent Projects
  • Conclusions & Goals

15
TALFT: Fault-tolerant Typed Assembly Language
PLDI 07. Best Paper Award.
[Figure: compilation pipeline. Stages: Source Code, Syntactic & Semantic Analysis, Compilation to Low-level Code, Addition of Software Redundancy, Optimizations, Code Generation. Labels: CRAFT Compiler [Reis et al. 05], Type-Preserving Compiler, TALFT Typed Assembly Language, Type Checker, "with slightly modified hardware".]
16
Machine Instruction Set
  • Colors        c ::= G | B
  • Values        v ::= c n
  • Instructions  i ::= mov rd, v
                      | op rd, rs, rt | op rd, rs, v
                      | ldc rd, rs | stc rd, rs
                      | brzc rz, rd | jmpc rd
    (the suffix c on ld, st, brz, and jmp is a color, as in ldG or stB)

17
Machine State
[Figure: machine state; the code memory C holds]
    . . .
    0x0393 mov r1 4
    0x0394 mov r3 4
    0x0395 stG r2 r1
    0x0396 stB r4 r3
    . . .
18
Example: Store Instruction
  • Goal: store value 5 into address 256
  • mov r1, G 5
  • mov r2, G 256
  • stG r2, r1
  • mov r3, B 5
  • mov r4, B 256
  • stB r4, r3

[Figure: code memory 0x0393-0x0398 holds the six instructions above. Resulting state: memory 256 ↦ 5; registers r1 = 5, r2 = 256, r3 = 5, r4 = 256.]
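
The store example can be replayed with a small interpreter. This is a rough sketch under simplifying assumptions: colors on values are dropped, the store queue is a FIFO, a green store enqueues an (address, value) pair, and a blue store compares its operands with the head of the queue before committing to memory. The names `run` and `HardwareFault` are invented for this illustration.

```python
from collections import deque

class HardwareFault(Exception):
    pass

def run(program):
    """Execute mov/stG/stB instructions; return the final registers and memory."""
    regs, memory, queue = {}, {}, deque()
    for op, a, b in program:
        if op == "mov":                          # mov rd, v
            regs[a] = b
        elif op == "stG":                        # green store: enqueue (address, value)
            queue.append((regs[a], regs[b]))
        elif op == "stB":                        # blue store: compare, then commit
            addr, val = queue.popleft()
            if (addr, val) != (regs[a], regs[b]):
                raise HardwareFault("green and blue stores disagree")
            memory[addr] = val
    return regs, memory

STORE_5_AT_256 = [
    ("mov", "r1", 5), ("mov", "r2", 256), ("stG", "r2", "r1"),
    ("mov", "r3", 5), ("mov", "r4", 256), ("stB", "r4", "r3"),
]
regs, memory = run(STORE_5_AT_256)
```

Memory is written only once both colors have supplied the same address and value.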
19
Example: Data Fault
  • Goal: store value 5 into address 256
  • mov r1, G 5
  • mov r2, G 256
  • stG r2, r1
  • mov r3, B 5
  • mov r4, B 256
  • stB r4, r3

[Figure: code memory 0x0393-0x0398 holds the six instructions above. A fault changes r1 from 5 to 21 (r2 = 256, r3 = 5, r4 = 256); the result is "fault detected".]
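
A sketch of the same scenario with the fault injected. It assumes a simplified machine (uncolored values, FIFO store queue, blue store compares against the queue head); `run_with_fault` and its parameters are hypothetical names, not part of TALFT.

```python
from collections import deque

class HardwareFault(Exception):
    pass

PROGRAM = [
    ("mov", "r1", 5), ("mov", "r2", 256), ("stG", "r2", "r1"),
    ("mov", "r3", 5), ("mov", "r4", 256), ("stB", "r4", "r3"),
]

def run_with_fault(program, after_step, reg, bad_value):
    """Run `program`, overwriting `reg` with `bad_value` once step `after_step` has run."""
    regs, memory, queue = {}, {}, deque()
    for step, (op, a, b) in enumerate(program):
        if op == "mov":
            regs[a] = b
        elif op == "stG":
            queue.append((regs[a], regs[b]))
        elif op == "stB":
            if queue.popleft() != (regs[a], regs[b]):
                raise HardwareFault(f"mismatch at step {step}")
            memory[regs[a]] = regs[b]
        if step == after_step:
            regs[reg] = bad_value                # the transient fault
    return memory
```

Corrupting `r1` to 21 right after the first `mov` makes the green store enqueue (256, 21); the blue store then offers (256, 5), and the mismatch raises `HardwareFault` before memory is touched.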
20
Example: Interleaved Loads and Stores
  • /* r1 = r2 = value 5
  •    r3 = r4 = address 256 */
  • stG r3, r1
  • ldG r5, r3
  • add r5, r5, 1
  • stG r3, r5
  • stB r4, r2
  • ldB r6, r4
  • add r6, r6, 1
  • stB r4, r6

[Figure: memory 256 ↦ 5, then 256 ↦ 6; registers r5 = 5, then r5 = 6; r6 = 5, then r6 = 6]
21
Example: Flexible Instruction Scheduling
  • /* r1 = r2 = value 5
  •    r3 = r4 = address 256 */
  • stG r3, r1
  • stB r4, r2
  • ldG r5, r3
  • add r5, r5, 1
  • ldB r6, r4
  • stG r3, r5
  • add r6, r6, 1
  • stB r4, r6

[Figure: memory 256 ↦ 5, then 256 ↦ 6; registers r5 = 5, then r5 = 6; r6 = 5, then r6 = 6]
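
The two schedules can be compared by running both and recording the stores that reach memory. The interpreter below is a sketch with assumed semantics: a green load returns the newest pending store to that address if the queue has one and otherwise reads memory, while a blue load always reads committed memory. Colors on values are omitted and `run` is a hypothetical name.

```python
from collections import deque

def run(program, initial_regs):
    """Return the sequence of committed stores, i.e. the observable behavior."""
    regs, memory, queue, trace = dict(initial_regs), {}, deque(), []
    for op, *args in program:
        if op == "add":                          # add rd, rs, n
            rd, rs, n = args
            regs[rd] = regs[rs] + n
        elif op == "stG":                        # enqueue pending (address, value)
            ra, rv = args
            queue.append((regs[ra], regs[rv]))
        elif op == "stB":                        # compare with queue head, then commit
            ra, rv = args
            pending = queue.popleft()
            if pending != (regs[ra], regs[rv]):
                raise RuntimeError("fault detected")
            memory[pending[0]] = pending[1]
            trace.append(pending)
        elif op == "ldG":                        # newest pending store wins, else memory
            rd, ra = args
            hits = [v for a, v in queue if a == regs[ra]]
            regs[rd] = hits[-1] if hits else memory[regs[ra]]
        elif op == "ldB":                        # blue loads read committed memory
            rd, ra = args
            regs[rd] = memory[regs[ra]]
    return trace

REGS = {"r1": 5, "r2": 5, "r3": 256, "r4": 256}
INTERLEAVED = [
    ("stG", "r3", "r1"), ("ldG", "r5", "r3"), ("add", "r5", "r5", 1), ("stG", "r3", "r5"),
    ("stB", "r4", "r2"), ("ldB", "r6", "r4"), ("add", "r6", "r6", 1), ("stB", "r4", "r6"),
]
SCHEDULED = [
    ("stG", "r3", "r1"), ("stB", "r4", "r2"), ("ldG", "r5", "r3"), ("add", "r5", "r5", 1),
    ("ldB", "r6", "r4"), ("stG", "r3", "r5"), ("add", "r6", "r6", 1), ("stB", "r4", "r6"),
]
```

Both orderings commit 5 and then 6 to address 256, so the reordering is invisible to an observer of memory.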
22
Example: Control Flow Fault
  • mov r1, G 5
  • mov r2, G 256
  • stG r2, r1
  • mov r3, B 5
  • mov r4, B 256
  • stB r4, r3

[Figure: code memory 0x0393-0x0398 holds the six instructions above. The green program counter pcG steps through 0x0393, 0x0394, 0x0395 and then becomes 0xbeef; the blue program counter pcB steps through 0x0393, 0x0394, 0x0395; the result is "fault detected".]
23
Formalizing Program Execution
  • $\Sigma$ represents the state of the machine at some
    point during program execution
  • Define an operational semantics $\Sigma \to_0 \Sigma'$ to
    express the result of executing a single
    instruction
  • There are no rules for the undefined cases

$$
\frac{R(r_1) = c\ n \qquad R(r_2) = c\ m}
     {(R, C, M, Q, \mathtt{add}\ r_d\ r_1\ r_2) \to_0 (R[r_d \mapsto c\ (n+m)], C, M, Q, \cdot)}
\ (\text{add})
$$
24
Formalizing the Fault Model
  • Add operational rules $\Sigma \to_1 \Sigma'$ that
    nondeterministically introduce faults to the
    register file and queue

$$
\frac{R(r) = c\ n}
     {(R, C, M, Q, i) \to_1 (R[r \mapsto c\ z], C, M, Q, i)}
\ (\text{zap-reg})
$$

$$
\frac{Q = (a_1,v_1), \ldots, (a_i,v_i), \ldots, (a_n,v_n) \qquad Q' = (a_1,v_1), \ldots, (z,v_i), \ldots, (a_n,v_n)}
     {(R, C, M, Q, i) \to_1 (R, C, M, Q', i)}
\ (\text{zap-queue-addr})
$$
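
One way to read the two zap rules is as generators of every state reachable by a single fault. The sketch below assumes registers map to (color, number) pairs, the queue is a list of (address, value) pairs, and faults draw from a small finite `domain` so the successors can be enumerated. It also skips the no-op case where the new value equals the old one, which is a choice made here rather than something the rules require.

```python
def zap_reg(regs, queue, domain):
    """States reachable by (zap-reg): one register's number is replaced, its color kept."""
    for r, (color, n) in regs.items():
        for z in domain:
            if z != n:
                yield {**regs, r: (color, z)}, queue

def zap_queue_addr(regs, queue, domain):
    """States reachable by (zap-queue-addr): one queued address is replaced."""
    for i, (addr, val) in enumerate(queue):
        for z in domain:
            if z != addr:
                yield regs, queue[:i] + [(z, val)] + queue[i + 1:]

regs = {"r1": ("G", 5), "r2": ("G", 256)}
queue = [(256, 5)]
domain = [0, 5, 256]
reg_faults = list(zap_reg(regs, queue, domain))
queue_faults = list(zap_queue_addr(regs, queue, domain))
```

Enumerating successors like this is what makes exhaustive single-fault testing of small programs possible.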
26
Principles Behind the TALFT Type System
  • In the absence of faults, standard type theory
    applies.
  • Green and blue computations are independent.
  • Green and blue computations are redundant.
  • Observable actions depend on both computations.

27
Typing Values
  • The type of a value is a triple <c, b, E>
  • c - a color (either green or blue)
  • b - a basic type (int, b reference, code
    pointer)
  • E - a static expression describing arithmetic
    and memory

G 3 : <G, int, 3>
28
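Instruction Typing Example: Add
Execution Behavior

$$
\frac{R(r_1) = c\ n \qquad R(r_2) = c\ m}
     {(R, C, M, Q, \mathtt{add}\ r_d\ r_1\ r_2) \to_0 (R[r_d \mapsto c\ (n+m)], C, M, Q, \cdot)}
$$

Static Requirements

$$
\frac{\Gamma(r_1) = \langle c, \mathsf{int}, E_1 \rangle \qquad \Gamma(r_2) = \langle c, \mathsf{int}, E_2 \rangle}
     {\vdash \mathtt{add}\ r_d\ r_1\ r_2 : \Gamma \Rightarrow \Gamma'}
\qquad \text{where } \Gamma' = \Gamma[r_d \mapsto \langle c, \mathsf{int}, E_1 + E_2 \rangle]
$$

The result keeps the color c, has basic type int, and carries the expression E1 + E2.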
29
Using Expressions to Enforce Redundancy
r1 : <G, int, x>    r3 : <B, int, x>
r2 : <G, int, y>    r4 : <B, int, y>
After the two adds:
r1 : <G, int, x+y>
r3 : <B, int, x+y>
Q : (E8, x+y)
3. Redundant computations
x+y = x+y
30
Using Expressions to Enforce Redundancy
Error during compilation
r3 : <B, int, x+x>
r1 : <G, int, x+y>
Q : (E8, x+y)
x+x ≠ x+y
Type Checking Fails
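
The redundancy check can be sketched as a tiny checker over straight-line code. Everything here is a simplification for illustration: types are (color, basic type, expression) triples, expressions are compared syntactically (a real system would need a richer notion of equality), and the address registers `r5` and `r6` with expression `a` are hypothetical additions not shown on the slide.

```python
def check(program, reg_types):
    """Return True if every blue store matches the pending green store's expressions."""
    types, pending = dict(reg_types), []
    for op, *args in program:
        if op == "add":                          # add rd, r1, r2
            rd, r1, r2 = args
            (c1, b1, e1), (c2, b2, e2) = types[r1], types[r2]
            if c1 != c2 or (b1, b2) != ("int", "int"):
                return False                     # colors must not mix
            types[rd] = (c1, "int", ("+", e1, e2))
        elif op == "stG":                        # record what the queue will hold
            ra, rv = args
            (ca, _, ea), (cv, _, ev) = types[ra], types[rv]
            if (ca, cv) != ("G", "G"):
                return False
            pending.append((ea, ev))
        elif op == "stB":                        # must agree with the oldest green store
            ra, rv = args
            (ca, _, ea), (cv, _, ev) = types[ra], types[rv]
            if (ca, cv) != ("B", "B") or not pending or pending.pop(0) != (ea, ev):
                return False
    return not pending                           # no green store left unmatched

TYPES = {
    "r1": ("G", "int", "x"), "r2": ("G", "int", "y"),
    "r3": ("B", "int", "x"), "r4": ("B", "int", "y"),
    "r5": ("G", "int ref", "a"), "r6": ("B", "int ref", "a"),
}
GOOD = [("add", "r1", "r1", "r2"), ("add", "r3", "r3", "r4"),
        ("stG", "r5", "r1"), ("stB", "r6", "r3")]
BAD = [("add", "r1", "r1", "r2"), ("add", "r3", "r3", "r3"),
       ("stG", "r5", "r1"), ("stB", "r6", "r3")]
```

`check(GOOD, TYPES)` succeeds because both colors store x+y, while `check(BAD, TYPES)` fails because blue computed x+x.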
32
Type Safety in the Presence of Faults
  • Standard Type Safety: Well-typed programs
    continue to be well-typed during execution
  • After a fault, some values may not be well-typed

r1 : <G, int, E1>    r3 : <G, int ref, E3>
stG r3, r1
33
Abstracting Corruption with Zap Tags
  • Transient faults occur during execution, so we can't
    statically track which values may be corrupted
  • Abstract the possible scenarios using three zap
    tags
  • Z ::= · | G | B
  • When Z is a color, values of that color may have
    any type

[Figure: under zap tag G, the value G 3 may be given any green type, such as <G, int, 3>, <G, int ref, E3>, or <G, code ptr, E2+E7>]
34
Type Safety: Progress
  • When no faults have occurred, well-typed machine
    states can execute the next instruction
  • After a fault has occurred, well-typed states
    either execute the next instruction or detect the
    error

If $\vdash \Sigma$ then $\Sigma \to_0 \Sigma'$.
If $\vdash_c \Sigma$ then either $\Sigma \to_0 \Sigma'$ or $\Sigma \to_0 \mathit{fault}$.
35
Type Safety: Preservation
  • Normal execution preserves typing
  • Faulty execution preserves typing modulo the
    corrupted color

If $\vdash_Z \Sigma$ and $\Sigma \to_0 \Sigma'$ then $\vdash_Z \Sigma'$.
If $\vdash \Sigma$ and $\Sigma \to_1 \Sigma'$ then $\exists c.\ \vdash_c \Sigma'$.
36
Program Execution
$\Sigma_1 \to_0 \Sigma_2 \to_0 \Sigma_3 \to_0 \cdots$
37
Indistinguishable Machine States
38
Fault Detection Theorem
  • If a machine state is well-typed and a single
    fault occurs somewhere during execution, then
    there is no change in observable behavior until
    the fault is detected.



[Figure: a fault-free run through states Σ1, Σ2, Σ3, ..., Σ5, Σ6, shown alongside a faulty run through Σ2f, Σ3f, ..., Σ5f that ends in fault]
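
The statement can be exercised, though of course not proved, on a toy model by trying every single register fault and checking that the committed stores never diverge from the fault-free run before detection. The sketch assumes a simplified machine (uncolored values, FIFO store queue, register faults only, no queue faults); `run` and `single_fault_outcomes` are hypothetical names.

```python
from collections import deque

PROGRAM = [
    ("mov", "r1", 5), ("mov", "r2", 256), ("stG", "r2", "r1"),
    ("mov", "r3", 5), ("mov", "r4", 256), ("stB", "r4", "r3"),
]

def run(program, fault=None):
    """Return (committed stores, detected); `fault` = (step, reg, value) zaps reg after step."""
    regs, queue, trace = {}, deque(), []
    for step, (op, a, b) in enumerate(program):
        if op == "mov":
            regs[a] = b
        elif op == "stG":
            queue.append((regs[a], regs[b]))
        elif op == "stB":
            if queue.popleft() != (regs[a], regs[b]):
                return trace, True               # hardware signals the fault
            trace.append((regs[a], regs[b]))
        if fault and fault[0] == step:
            regs[fault[1]] = fault[2]
    return trace, False

def single_fault_outcomes(program, registers, domain):
    """Yield True for each single fault that leaves observable behavior intact."""
    expected, _ = run(program)
    for step in range(len(program)):
        for reg in registers:
            for value in domain:
                trace, detected = run(program, (step, reg, value))
                is_prefix = trace == expected[:len(trace)]
                yield is_prefix and (detected or trace == expected)

all_safe = all(single_fault_outcomes(PROGRAM, ["r1", "r2", "r3", "r4"], [0, 5, 21, 256]))
```

Every injected fault is either detected before a wrong store commits or has no effect on the stores at all, which is the shape of guarantee the theorem describes.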
39
Formal Results for TALFT
  • Well-typed TALFT programs
  • Don't go wrong (Type Safety)
  • Only detect a fault when a fault has occurred (No
    False Positives Lemma)
  • Never allow a single fault to change the
    observable program behavior (Fault Detection
    Theorem)

40
Work In Progress: Compiling to TALFT
  • It's easy to design a sound type system: just
    make it very restrictive
  • Claim: Can generate code for TALFT
  • Work in progress
  • Naively translate a source-level language into
    TALFT
  • Show how to support register allocation and other
    optimizations

42
Model vs. Reality: Faults
  • TALFT models a single fault to the register file
    or store buffer
  • Faults may occur anywhere: ALU, control signals,
    sequential or combinational logic, instruction
    decoding, ...
  • Multiple faults may occur
  • Benefit: Precise specification of faults under
    consideration
  • Allow arbitrary corruption
  • Many intra-instruction faults can be modeled by
    correct instruction execution followed by a fault
    to the destination register
  • Many others are likely to be caught by an
    eventual mismatch in the two computations

43
Model vs. Reality: Program Inputs
  • All inputs need to be duplicated
  • Loads in concurrent applications
  • Can use a load queue similar to the store queue
  • Requires ldG to simultaneously put correct value
    in destination register and correct pair into
    load queue
  • Ad hoc inputs: random(), user inputs, etc.
  • Would need to be cached for blue computation
  • Creates a window of vulnerability without
    hardware support

44
Model vs. Reality: Performance Cost
  • Simulated execution of TALFT code to get a rough
    estimate of performance cost
  • TALFT code is 34% slower than the
    fault-intolerant baseline

45
TALFT Summary
  • TALFT is an assembly-level type system that
    captures invariants about redundant computations
    in a hybrid transient fault solution
  • Well-typed programs will always detect observable
    faults (relative to the fault model)
  • The results are applicable to real-world
    scenarios

47
Formalizing Fault Tolerance: Related Work
  • Reasoning about Control Flow in the Presence of
    Transient Faults [To appear, SAS 08]
  • Existing software-only solutions can catch many
    faults
  • Takes a first step towards formalizing
    software-based control-flow fault detection
  • Requires reasoning after a control flow fault has
    occurred
  • Generalizes TALFT zap tags
  • Higher-level abstractions
  • Static Typing for a Faulty Lambda Calculus
    [Walker et al. 06]
  • Fault-Tolerant Voting in a Simply-Typed Lambda
    Calculus [Elsman 07]

48
Formalizing Fault Tolerance: Future Work
  • Explore other fault tolerance techniques and
    richer fault models
  • Look beyond fault detection, to reason about
    fault recovery
  • Develop more powerful methods for reasoning in
    the presence of transient faults

50
Other Research Projects
  • Typed assembly languages for stacks
  • L. Jia, F. Perry, D. Walker, and N. Glew.
    Certifying Compilation for a Language with Stack
    Allocation. Logic in Computer Science (LICS),
    June 2005
  • F. Perry, C. Hawblitzel, and J. Chen. Simple
    and Flexible Stack Types. International Workshop
    on Aliasing, Confinement, and Ownership (IWACO),
    July 2007
  • J. Chen, C. Hawblitzel, F. Perry, M. Emmi, J.
    Condit, D. Coetzee and P. Pratikakis.
    Type-Preserving Compilation for Realistic
    Object-Oriented Compilers. Programming Language
    Design and Implementation (PLDI), to appear June
    2008
  • Dynamic verification of aliasing invariants
  • F. Perry, L. Jia, and D. Walker. Expressing
    Heap-shape Contracts in Linear Logic. Generative
    Programming and Component Engineering (GPCE),
    October 2006
  • Static deadlock detection using dataflow analysis
  • Identified over 100 confirmed concurrency bugs in
    Windows Vista

51
Conclusion and Goals
  • Typed Assembly Languages are well-suited for
    reasoning about solutions to transient faults
  • Future Goal: Continuing to improve code
    reliability by
  • Collaborating with researchers in other fields to
    identify domain-specific issues that will benefit
    from formal reasoning
  • Developing new techniques for reasoning about
    program behavior
  • Implementing analyses in real compilers

52
  • Questions?

53
Extra Slides
54
Typed Assembly Languages for Stacks
  • Typing stacks is tricky
  • Stacks require frequent strong updates
  • To support real stack use, need to allow stack
    locations to be aliased
  • Strong updates are unsound in the presence of
    aliasing
  • Insight: use Linear Logic to express stack types
  • Certifying Compilation for a Language with Stack
    Allocation
  • L. Jia, Frances Spalding Perry, D. Walker, and
    N. Glew
  • IEEE Symposium on Logic in Computer Science
    (LICS), June 2005
  • Simple and Flexible Stack Types
  • Frances Perry, C. Hawblitzel, and J. Chen
    International Workshop on Aliasing, Confinement,
    and Ownership (IWACO), July 2007
  • Type-Preserving Compilation for Realistic
    Object-Oriented Compilers
  • J. Chen, C. Hawblitzel, Frances Perry, M. Emmi,
    J. Condit, D. Coetzee and P. Pratikakis
  • Programming Language Design and Implementation
    (PLDI), to appear June 2008

55
ESPC: Static Lock Use Analysis
Program Analysis Group, Microsoft
  • Use static analysis to detect incorrect lock
    usage
  • Deadlock detection - infer global lock ordering
    and look for conflicts
  • Has found over 100 confirmed concurrency bugs in
    Windows Vista
  • Lessons from ESPC
  • There are times when it's OK to sacrifice
    soundness and completeness
  • Analysis finds bugs you wouldn't find any other
    way
  • Programmers appreciate help with difficult
    problems
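
The lock-ordering idea above can be illustrated with a small cycle search. This is a generic sketch, not ESPC's algorithm: it assumes the analysis has already produced (held, acquired) pairs, meaning some code path acquires the second lock while holding the first, and it reports a conflict as a cycle in the resulting graph. The function name and input format are invented for this example.

```python
def find_lock_order_conflict(acquisitions):
    """Given (held, acquired) pairs, return a cycle of lock names, or None."""
    graph = {}
    for held, acquired in acquisitions:
        graph.setdefault(held, set()).add(acquired)
        graph.setdefault(acquired, set())

    state = {}                                   # lock -> "active" or "done"

    def visit(node, path):
        state[node] = "active"
        for nxt in sorted(graph[node]):
            if state.get(nxt) == "active":       # back edge: ordering conflict
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = visit(nxt, path + [nxt])
                if cycle:
                    return cycle
        state[node] = "done"
        return None

    for node in sorted(graph):
        if node not in state:
            cycle = visit(node, [node])
            if cycle:
                return cycle
    return None
```

For example, `find_lock_order_conflict([("A", "B"), ("B", "A")])` reports the cycle A, B, A, while a consistent ordering such as `[("A", "B"), ("B", "C")]` yields `None`.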

56
Related Work: Similar Systems
  • Typed Assembly Language [Morrisett et al. 98]
  • Proof-Carrying Code [Necula & Lee 96]
  • Information Flow [Denning 78], [Volpano et al.
    96]
  • Control-Flow Integrity [Abadi et al. 05]