Methodology to Compute Architectural Vulnerability Factors


1
Methodology to Compute Architectural Vulnerability Factors
  • Chris Weaver (1, 2)
  • Shubhendu S. Mukherjee (1)
  • Joel Emer (1)
  • Steven K. Reinhardt (1, 2)
  • Todd Austin (2)
  • (1) Fault Aware Computing Technology (FACT), VSSAD, Intel
  • (2) University of Michigan

2
Overview
  • Background
  • Previous reliability estimation methodology
  • Proposed methodology for early reliability
    estimates
  • Sample analysis
  • Conclusion

3
Strike Changes State
[Figure: a particle strike changes the value of a stored bit (values 0 and 1 shown)]
4
Failure Rate Definitions
  • Interval-based: MTBF = Mean Time Between Failures
  • Rate-based: FIT = Failure in Time = 1 failure in a billion hours
  • 1 year MTBF = 10^9 / (24 × 365) FIT = 114,155 FIT
  • Additive: the FIT rates of individual structures sum to the total FIT rate

Cache: 0 FIT
IQ: 114K FIT
FU: 114K FIT
Total: 228K FIT
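The conversion between MTBF and FIT, and the additivity of FIT rates, can be sketched as follows. This is a minimal illustration, not part of the original slides; the function names are hypothetical, and the per-structure values simply mirror the example on this slide (two structures with a 1-year MTBF each and a cache contributing 0 FIT).

```python
HOURS_PER_YEAR = 24 * 365      # 8,760 hours, as used on the slide
FIT_BASE_HOURS = 10**9         # 1 FIT = 1 failure per billion hours


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """Convert a mean time between failures (in years) to a FIT rate."""
    return FIT_BASE_HOURS / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit: float) -> float:
    """Convert a FIT rate back to a mean time between failures (in years)."""
    return FIT_BASE_HOURS / (fit * HOURS_PER_YEAR)


# Per-structure FIT rates from the slide's example.
structure_fit = {
    "cache": 0.0,
    "iq": mtbf_years_to_fit(1.0),
    "fu": mtbf_years_to_fit(1.0),
}

# FIT rates are additive, MTBF values are not.
total_fit = sum(structure_fit.values())
total_mtbf_years = fit_to_mtbf_years(total_fit)

print(f"IQ FIT:     {structure_fit['iq']:,.0f}")
print(f"Total FIT:  {total_fit:,.0f}")
print(f"Total MTBF: {total_mtbf_years:.2f} years")
```

Two structures with a 1-year MTBF each combine to a 0.5-year MTBF, which is why rate-based FIT values are more convenient to sum than interval-based MTBF values.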
5
Motivation
6
Results of precise early analysis
  • If we meet the goal, we are done
  • If we don't meet the goal, add error protection schemes

7
Objectives
  • Determine which bits matter
  • Compute FIT rate

8
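Strike on state bit
[Flowchart: possible outcomes of a strike on a state bit]
  • Bit is not read: benign fault, no error
  • Bit is read, has error protection, and the error can be corrected (e.g., ECC): no error
  • Bit is read, has error protection, and the error is only detected (e.g., parity, no recovery): detected, but unrecoverable error (DUE)
  • Bit is read, has no error protection, and the bit does not matter: benign fault, no error
  • Bit is read, has no error protection, and the bit matters: silent data corruption (SDC)
We only focus on SDC FIT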
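The decision flow on this slide can be written as a small classification function. This is a sketch under assumptions: the enum and function names are hypothetical, and the branch order follows one reasonable reading of the flowchart (read check first, then protection, then whether the bit matters).

```python
from enum import Enum


class Protection(Enum):
    NONE = "none"
    DETECT_ONLY = "detect_only"   # e.g., parity with no recovery
    CORRECT = "correct"           # e.g., ECC


class Outcome(Enum):
    BENIGN = "benign fault, no error"
    NO_ERROR = "corrected, no error"
    DUE = "detected, but unrecoverable error"
    SDC = "silent data corruption"


def classify_strike(bit_read: bool, protection: Protection,
                    bit_matters: bool) -> Outcome:
    """Classify the outcome of a particle strike on one state bit."""
    if not bit_read:
        return Outcome.BENIGN            # the flipped value is never consumed
    if protection is Protection.CORRECT:
        return Outcome.NO_ERROR          # error is detected and corrected
    if protection is Protection.DETECT_ONLY:
        return Outcome.DUE               # error is flagged but cannot be undone
    # Unprotected bit: the outcome depends on whether the bit matters.
    return Outcome.SDC if bit_matters else Outcome.BENIGN


print(classify_strike(True, Protection.NONE, True).value)
```

Only the last branch produces SDC, which is the case the AVF analysis in this presentation targets.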
9
Architectural Vulnerability Factor (AVF)
  • AVF_bit = probability that the bit matters
  • AVF_bit = (# of visible errors) / (# of bit flips from particle strikes)

FIT_bit = intrinsic FIT_bit × AVF_bit
10
Previous AVF Methodology
  • Statistical Fault Injection with RTL

[Figure: simulate a strike on a latch feeding logic (values 0 and 1 shown) and observe the output. Does the fault propagate to architectural state?]
11
Characteristics of SFI with RTL
  • Naturally characterizes all logical structures
  • RTL is not available until late in the design cycle
  • Requires numerous experiments to flip all bits
  • Generally done at the chip level
  • Provides limited structural insight

12
Objectives
  • Determine which bits matter: earlier in the design cycle, with fewer experiments, and at the structural level
  • Compute FIT rate: from the intrinsic FIT per bit and the Architectural Vulnerability Factor

13
Our Analysis: Which bits matter?
  • Branch Predictor: doesn't matter at all (AVF = 0%)
  • Program Counter: almost always matters (AVF ≈ 100%)

14
Architecturally Correct Execution (ACE)
Program Input
Program Outputs
  • The ACE path requires only a subset of values to flow
    correctly through the program's data flow graph
    (and the machine)
  • Anything else (the un-ACE path) can be derated away

15
Example of un-ACE Instruction: Dynamically Dead Instruction
Most bits of an un-ACE instruction do not affect
program output
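One way to spot dynamically dead instructions in a register-level trace is to look for results that are overwritten before anything reads them. The sketch below is an illustration only, not the authors' tool: the `Inst` type and `find_dynamically_dead` name are hypothetical, memory is ignored, and only directly dead instructions are found (instructions whose results feed only dead instructions are not tracked).

```python
from typing import List, NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    dest: Optional[str]      # destination register, or None
    srcs: Tuple[str, ...]    # source registers


def find_dynamically_dead(trace: List[Inst]) -> List[int]:
    """Return indices of instructions whose result is overwritten before any read."""
    dead = []
    unread_writer = {}       # register -> index of latest write not yet read
    for i, inst in enumerate(trace):
        # Reads happen before the write of the same instruction.
        for reg in inst.srcs:
            unread_writer.pop(reg, None)      # value consumed, producer is live
        if inst.dest is not None:
            if inst.dest in unread_writer:
                dead.append(unread_writer[inst.dest])   # overwritten unread
            unread_writer[inst.dest] = i
    # Writes still unread at the end of the trace are conservatively kept live.
    return dead


trace = [
    Inst("r1", ("r2", "r3")),   # 0: r1 is overwritten by 1 before being read
    Inst("r1", ("r4", "r5")),   # 1: r1 is read by 2, so this is live
    Inst("r6", ("r1",)),        # 2: r6 is overwritten by 3 before being read
    Inst("r6", ("r2",)),        # 3: still unread at end of trace, kept live
]
print(find_dynamically_dead(trace))
```

Treating unread writes at the end of the trace as live matches the conservative stance of assuming state is ACE unless it can be proven otherwise.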
16
Dynamic Instruction Breakdown
Average across all of Spec2K slices
17
Mapping ACE and un-ACE Instructions to the Instruction Queue
[Figure legend: ACE instructions, architectural un-ACE, micro-architectural un-ACE]
18
Vulnerability of a structure
  • AVF = fraction of cycles a bit contains ACE
    state

19
Little's Law for ACEs
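Little's Law says that the average number of items resident in a system equals the average arrival rate multiplied by the average time each item stays. Applied to ACE state, the average number of ACE bits resident in a structure is the ACE bandwidth into it times the average ACE latency, and dividing by the structure's size gives its AVF. The transcript does not preserve the slide's formula, so the sketch below is an assumption about how it would be applied; the function name and the example numbers are hypothetical.

```python
def avf_from_littles_law(ace_bits_in_per_cycle: float,
                         avg_ace_latency_cycles: float,
                         total_bits: int) -> float:
    """Estimate a structure's AVF using Little's Law.

    Average resident ACE bits = arrival rate x average residency time.
    AVF = average resident ACE bits / total bits in the structure.
    """
    avg_resident_ace_bits = ace_bits_in_per_cycle * avg_ace_latency_cycles
    return avg_resident_ace_bits / total_bits


# Illustrative values only: 4 ACE bits enter per cycle, each stays 5 cycles
# on average, in a 64-bit structure.
print(avf_from_littles_law(4.0, 5.0, 64))
```

The appeal of this form is that bandwidth and latency are quantities a performance model already tracks.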

20
Computing AVF
  • Our approach is conservative: we assume every bit is ACE unless proven
    otherwise
  • Data analysis: try to prove that data held in a structure is
    un-ACE
  • Timing analysis: track the time this data spends in the structure
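Combining the two analyses might look like the following sketch: data analysis marks a residency as proven un-ACE, timing analysis supplies how long it occupied the structure, and everything not proven un-ACE is counted as ACE. The `Residency` record, the function name, and the example numbers are all hypothetical and are not taken from the authors' infrastructure.

```python
from typing import Iterable, NamedTuple


class Residency(NamedTuple):
    enter_cycle: int       # cycle the data entered the structure
    exit_cycle: int        # cycle the data left the structure
    bits: int              # number of bits the data occupies
    proven_un_ace: bool    # result of data analysis


def conservative_avf(residencies: Iterable[Residency],
                     total_bits: int, total_cycles: int) -> float:
    """AVF = ACE bit-cycles / (total bits x total cycles).

    Conservative: only residencies proven un-ACE are excluded.
    """
    ace_bit_cycles = sum(
        r.bits * (r.exit_cycle - r.enter_cycle)
        for r in residencies
        if not r.proven_un_ace
    )
    return ace_bit_cycles / (total_bits * total_cycles)


# Illustrative 64-bit structure observed for 10 cycles.
observed = [
    Residency(0, 5, 32, False),    # counted as ACE: 32 bits x 5 cycles
    Residency(2, 6, 32, True),     # proven un-ACE (e.g., dynamically dead)
    Residency(6, 10, 16, False),   # counted as ACE: 16 bits x 4 cycles
]
print(conservative_avf(observed, total_bits=64, total_cycles=10))
```

Because unproven state counts as ACE, a result from this kind of analysis is an upper bound on the AVF rather than an exact value.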

21
Computing FIT rate of a Chip
  • Total FIT = Σ_i (FIT per bit_i × # of bits_i × AVF_i)

Intrinsic FIT per bit from externally published
data
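The chip-level sum can be sketched as below. The AVF values are the ones this presentation reports for the instruction queue (29%) and functional units (9%); the intrinsic FIT per bit and the bit counts are made-up placeholders, since the transcript does not give them, and the names are hypothetical.

```python
from typing import Dict, NamedTuple


class Structure(NamedTuple):
    intrinsic_fit_per_bit: float   # from externally published data
    num_bits: int
    avf: float                     # architectural vulnerability factor, 0..1


def total_fit(structures: Dict[str, Structure]) -> float:
    """Total FIT = sum over structures of FIT per bit x number of bits x AVF."""
    return sum(s.intrinsic_fit_per_bit * s.num_bits * s.avf
               for s in structures.values())


# Placeholder intrinsic FIT and bit counts; AVFs taken from this presentation.
chip = {
    "instruction_queue": Structure(0.001, 10_000, 0.29),
    "functional_units": Structure(0.001, 5_000, 0.09),
}

for name, s in chip.items():
    print(f"{name}: {s.intrinsic_fit_per_bit * s.num_bits * s.avf:.2f} FIT")
print(f"total: {total_fit(chip):.2f} FIT")
```

Derating by AVF matters here: without it, the same two structures would be charged their full raw FIT.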
22
Results: Experimental Setup
  • Used the ASIM modeling infrastructure
  • Model of an Itanium 2-like processor
  • Ran all Spec2K benchmarks
  • Compiled with the highest level of optimization with
    the Intel electron compiler
  • Simulated under a full OS
  • Simulation points chosen using SimPoint (Sherwood
    et al.)

23
Instruction Queue
ACE percentage = AVF = 29%
24
Functional Units
ACE percentage = AVF = 9%
25
Computing FIT rate of Chip

Intrinsic FIT per bit from externally published
data
26
Summary
  • Determine which bits matter
  • ACE (Architecturally Correct Execution)
  • Compute FIT rate
  • Intrinsic FIT per bit
  • AVF (Architectural Vulnerability Factor)

27
Questions?
28
Statistical Fault Injection (SFI)
  • Algorithm
  • Find a statistically significant set of bits
  • Randomly select a bit
  • Flip the bit
  • Run two simulations: one with the bit flip and one
    without
  • Run for a pre-defined number of cycles
  • Compare the architectural state of the two simulations
    (e.g., register file)
  • If they mismatch, declare an error
  • Repeat the algorithm with a different bit flip
  • AVF = mismatches observed / total experiments
  • Used widely
  • Has provided useful AVF numbers to date
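The algorithm above can be sketched as a short loop. This is an illustration under heavy assumptions: `run_toy_machine` is a hypothetical stand-in for an RTL simulation in which only 4 of 16 state bits ever reach architectural state, and all names and parameters are invented for the example.

```python
import random
from typing import Callable, List, Tuple


def run_toy_machine(state_bits: List[int], cycles: int) -> Tuple[int, ...]:
    """Hypothetical simulator: only the first 4 bits reach architectural state."""
    return tuple(state_bits[:4])


def sfi_avf(simulate: Callable[[List[int], int], Tuple[int, ...]],
            num_bits: int, experiments: int, cycles: int,
            seed: int = 0) -> float:
    """Estimate AVF as mismatches observed / total experiments."""
    rng = random.Random(seed)
    golden_state = [0] * num_bits
    golden_result = simulate(golden_state, cycles)   # run without a bit flip
    mismatches = 0
    for _ in range(experiments):
        bit = rng.randrange(num_bits)                # randomly select a bit
        faulty_state = list(golden_state)
        faulty_state[bit] ^= 1                       # flip the bit
        faulty_result = simulate(faulty_state, cycles)
        if faulty_result != golden_result:           # compare architectural state
            mismatches += 1                          # mismatch: declare an error
    return mismatches / experiments


estimate = sfi_avf(run_toy_machine, num_bits=16, experiments=2000, cycles=100)
print(f"estimated AVF: {estimate:.3f}")
```

For this toy machine the estimate should land near 4/16 = 0.25, and the spread around that value shrinks as the number of experiments grows, which is why a statistically significant sample is the first step.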

29
SFI vs. ACE analysis