Title: Cost-Efficient Soft Error Protection for Embedded Microprocessors
1Cost-Efficient Soft Error Protection for Embedded Microprocessors
- Jason Blome (1), Shuguang Feng (1), Shantanu Gupta (1), Scott Mahlke (1), Daryl Bradley (2)
- (1) University of Michigan
- (2) ARM, Ltd.
2The Soft Error Problem
3Fault Masking
- Logical: the faulted value does not affect the logical operation of the circuit
- Architectural/Software: the incorrect state is overwritten before it is read
- Latching-Window: the fault pulse does not reach a state element within the latching window
- Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit
Figure: example instruction sequence (values shown in the figure: 4, 9, 8)
mov r2, 4
mov r5, 8
add r6, r2, r5
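The first two masking categories can be illustrated with a small sketch. It is not taken from the slides: the AND-gate model, the `run` interpreter, and the `(step, register, bit)` fault format are all hypothetical, chosen only to replay the three-instruction sequence on this slide with and without an injected bit flip.

```python
def and_gate(a: int, b: int) -> int:
    return a & b

def logically_masked(a: int, b: int, faulty_input: str) -> bool:
    """True if flipping one input of a 2-input AND leaves its output unchanged."""
    good = and_gate(a, b)
    bad = and_gate(a ^ 1, b) if faulty_input == "a" else and_gate(a, b ^ 1)
    return good == bad

# The instruction sequence from the slide, as (op, dst, sources...) tuples.
PROGRAM = [
    ("mov", "r2", 4),
    ("mov", "r5", 8),
    ("add", "r6", "r2", "r5"),
]

def run(program, fault=None):
    """Execute the program; fault=(step, reg, bit) flips one register bit
    just before the given step (hypothetical fault model)."""
    regs = {}
    for step, (op, dst, *src) in enumerate(program):
        if fault is not None and fault[0] == step:
            _, reg, bit = fault
            regs[reg] = regs.get(reg, 0) ^ (1 << bit)
        if op == "mov":
            regs[dst] = src[0]
        elif op == "add":
            regs[dst] = regs[src[0]] + regs[src[1]]
    return regs

golden = run(PROGRAM)
# Fault in r2 before "mov r2, 4": overwritten before it is read, so masked.
masked = run(PROGRAM, fault=(0, "r2", 0)) == golden
# Fault in r2 just before the add: the corrupted value is consumed.
visible = run(PROGRAM, fault=(2, "r2", 0)) != golden
```

In this model a flip on one AND input is logically masked whenever the other input is 0, and a register fault is architecturally masked only if a write lands before the next read.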
4Soft Error Rate Trends
Figure: Soft Error Rate Contributions (Mitra 2005; Shivakumar 2002)
- Increasing contribution of faults in combinational logic to the overall soft error rate
5Outline
- Soft error analysis setup
- Summary of fault analysis results
- Fault tolerance techniques
- Register value cache
- Strategic deployment of fault detectors
- Conclusion
6Fault Analysis Framework
7Observed Error Rates
Figure panels: Faults Occurring in Registers; Faults Occurring in Combinational Logic
- At the software interface, error rates within 3%
8Impact of Fault Injection
9Targeting the Faults that Count
- ARM926EJ-S register file consumes 8.7% of total core area
- Responsible for 57.4% of architectural errors
- Register file area dominated by combinational logic
- ECC cost, efficacy?
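A quick calculation makes the imbalance behind these two figures concrete. The sketch below uses only the 8.7% and 57.4% values from the slide; it additionally assumes that the rest of the core accounts for the remaining area and the remaining architectural errors, which the slide does not state explicitly.

```python
RF_AREA_PCT = 8.7     # register file share of core area (from the slide)
RF_ERROR_PCT = 57.4   # register file share of architectural errors (from the slide)

def error_density(error_pct: float, area_pct: float) -> float:
    """Share of errors per unit share of area."""
    return error_pct / area_pct

rf_density = error_density(RF_ERROR_PCT, RF_AREA_PCT)
# Assumption: everything outside the register file accounts for the remainder.
rest_density = error_density(100 - RF_ERROR_PCT, 100 - RF_AREA_PCT)
ratio = rf_density / rest_density

print(f"register file: {rf_density:.2f} error-% per area-%")
print(f"rest of core:  {rest_density:.2f} error-% per area-%")
print(f"ratio:         {ratio:.1f}x")
```

Under that assumption, a unit of register file area contributes roughly 14 times as many architectural errors as a unit of area elsewhere in the core, which is the argument for protecting it first.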
10The Register Value Cache
Figure: block diagram of the register file alongside the register value cache. Labels: Register File (entries 0-5), decoder, Read/Write Addr/Data, Read Result, Register Value Cache (entries 0-5), CMP, Stall/Check CRC.
11The Register Value Cache
Figure: register value cache organization. Structures: Index Array, Valid, Value Array, Previous Read Values, CRC, CMP. Signals: Read/Write Addr, Read Data, Write Data, Error. Legend: Read Operation, Write Operation, Check Operation.
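The labels on these two slides (index array, value array, CRC, CMP, and separate read, write, and check operations) suggest a small cache of duplicate register values that is compared against the register file. The sketch below is one plausible reading, not the authors' design: the class name, the oldest-written eviction policy, the use of `zlib.crc32` as the checksum, and the status strings are all assumptions made for illustration.

```python
import zlib
from collections import OrderedDict

def crc(value: int) -> int:
    """Checksum of a 32-bit value (CRC-32 is an assumed choice)."""
    return zlib.crc32((value & 0xFFFFFFFF).to_bytes(4, "little"))

class RegisterValueCache:
    """Hypothetical model: duplicates of recently written registers."""

    def __init__(self, entries: int = 4):
        self.entries = entries
        self.lines = OrderedDict()  # register index -> (value, crc)

    def write(self, index: int, value: int) -> None:
        """Write operation: store a duplicate of the value plus its CRC."""
        value &= 0xFFFFFFFF
        self.lines[index] = (value, crc(value))
        self.lines.move_to_end(index)
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)  # evict the oldest entry

    def check(self, index: int, rf_value: int):
        """Check operation: compare a register file read with the cached copy.
        Returns (status, value_to_use)."""
        if index not in self.lines:
            return ("miss", rf_value)
        cached, stored_crc = self.lines[index]
        if cached == rf_value:
            return ("ok", rf_value)
        # Mismatch: the CRC arbitrates between the two copies.
        if crc(rf_value) == stored_crc:
            return ("cache_corrupt", rf_value)
        if crc(cached) == stored_crc:
            return ("rf_corrupt", cached)
        return ("unrecoverable", rf_value)

rvc = RegisterValueCache(entries=2)
rvc.write(2, 4)                      # mov r2, 4
rvc.write(5, 8)                      # mov r5, 8
clean = rvc.check(2, 4)              # register file agrees with the cache
faulty = rvc.check(2, 4 ^ 1)         # one bit of r2 flipped in the register file
```

In this model a matching compare costs nothing extra, while a mismatch triggers the CRC check, which loosely corresponds to the Stall/Check CRC path in the figure.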
12Example
Figure (animated example): Register File (entries 0-5, decoder) and Register Cache (entries 0-5); the values 4 and 8 appear in both structures.
mov r2, 4
mov r5, 8
add r3, r2, r5 (one frame shows add r3, r1, r4)
13RVC Fault Coverage
57.4%
14RVC Overhead
15What About the Rest?
- Leverage fault fanout to place detectors at
likely targets
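One way to read this bullet is as a coverage problem: if the set of flip-flops that each combinational fault site can reach is known, detectors can be placed greedily on the flip-flops that catch the most still-uncovered sites. The sketch below illustrates that idea only; the greedy heuristic, the `place_detectors` function, and the fanout data are hypothetical and are not described on the slides.

```python
def place_detectors(fanout, budget):
    """Greedy placement sketch.
    fanout: fault site -> set of flip-flops a fault at that site can reach.
    budget: maximum number of detectors.
    Returns (chosen flip-flops, fraction of fault sites covered)."""
    sites = set(fanout)
    uncovered = set(sites)
    candidates = sorted({ff for targets in fanout.values() for ff in targets})
    chosen = []
    while uncovered and len(chosen) < budget:
        remaining = [ff for ff in candidates if ff not in chosen]
        if not remaining:
            break

        def gain(ff):
            # Number of still-uncovered fault sites this flip-flop would catch.
            return sum(1 for site in uncovered if ff in fanout[site])

        best = max(remaining, key=gain)
        if gain(best) == 0:
            break
        chosen.append(best)
        uncovered = {site for site in uncovered if best not in fanout[site]}
    coverage = 1 - len(uncovered) / len(sites) if sites else 0.0
    return chosen, coverage

# Made-up fanout data, for illustration only.
FANOUT = {
    "g1": {"ff_a", "ff_b"},
    "g2": {"ff_b"},
    "g3": {"ff_b", "ff_c"},
    "g4": {"ff_c"},
    "g5": {"ff_d"},
}
chosen, coverage = place_detectors(FANOUT, budget=2)
```

With this toy data, two detectors cover four of the five fault sites, the same coverage-versus-cost trade-off that the later coverage slides plot.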
16Fault Fanout
17Transient Fault Detector
Figure: transient fault detector circuit. Labels: D, CLK, Q, Main Flip-Flop, Shadow Latch, Delay, Error.
Source: "A Self-Tuning DVS Processor Using Delay-Error Detection and Correction," S. Das, 2006
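The detector's behavior can be sketched in a few lines: the main flip-flop samples D at the clock edge, the shadow latch samples the same signal after a delay, and a disagreement raises Error. This is a simplified behavioral model under stated assumptions; it treats both elements as point samplers, whereas a real shadow latch is transparent for part of the cycle. The waveform format and all timing values are invented for illustration.

```python
def sample(waveform, t):
    """Value at time t of a piecewise-constant signal.
    waveform: list of (time, value) change points, sorted by time."""
    value = waveform[0][1]
    for change_time, new_value in waveform:
        if change_time <= t:
            value = new_value
        else:
            break
    return value

def detect_glitch(waveform, clock_edge, delay):
    """Return (main, shadow, error): the main flip-flop samples D at the clock
    edge, the shadow latch samples it `delay` later, error flags a mismatch."""
    main = sample(waveform, clock_edge)
    shadow = sample(waveform, clock_edge + delay)
    return main, shadow, main != shadow

CLOCK_EDGE = 1000  # picoseconds (illustrative)
DELAY = 50         # shadow latch delay (illustrative)

# D should be 0, but a transient pulse overlaps the clock edge.
glitchy = [(0, 0), (980, 1), (1030, 0)]
# The same pulse arriving well before the edge never reaches the flip-flop.
early = [(0, 0), (900, 1), (950, 0)]

caught = detect_glitch(glitchy, CLOCK_EDGE, DELAY)
ignored = detect_glitch(early, CLOCK_EDGE, DELAY)
```

The second waveform is an instance of latching-window masking: the pulse dies out before the latching window, so neither element captures it and no error is raised. A pulse wider than the delay would be captured by both elements and go undetected in this model, so the delay bounds the pulse widths that can be caught.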
18Glitch Detector Coverage
Figure: two panels (Power, Area); axes: Coverage, Percent Overhead.
19Combined Technique Coverage
Figure: two panels (Power, Area); axes: Coverage, Percent Overhead.
20Conclusion
- Circuit-level soft error analysis offers significant insight
- Faults in combinational logic do not require structural duplication
- Coverage versus cost tradeoffs available
- Significant benefits in compromise
- 85% fault coverage for only 5.5% area
- 2-3x increase in MTTF
21Questions?
22RVC Hit Rates