Title: Soft Errors: Tools and Interactions with Power Optimizations
1. Soft Errors: Tools and Interactions with Power Optimizations
- Vijaykrishnan Narayanan
- Embedded and Mobile Computing Design Center
- The Pennsylvania State University
- Acknowledgment
  - Vijay Degalahal, Rajaraman Ramanarayanan, Profs. Kenan Unlu and Yuan Xie
  - This work was supported in part by grants from the National Science Foundation and the Department of Energy. All opinions expressed are those of the author.
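2. Soft Error in Action
[Figure: cross-section of an NMOS transistor (gate G, source S, drain D, n diffusion, body B) on a p substrate, showing charge generated by a particle strike]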
3. Problems Caused by SEUs
- Single event upsets (SEUs) can cause problems in different ways:
  - Change data values in caches and memory
  - Corrupt instruction execution by flipping data in pipeline registers
  - Change the configuration of an SRAM-based FPGA circuit (firm error)
  - Cause glitches in combinational logic that can propagate to state elements
4. Logic SER: The New Menace
[Figure: a particle strike on a combinational logic block between two register banks (inputs I1 to I7, gates A to E, outputs O1 and O2), illustrating the effect of electrical masking and the effect of logical masking]
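Logical masking can be illustrated by injecting a bit flip at one internal node of a small gate-level circuit and checking whether either primary output changes. The sketch below uses a hypothetical four-input, two-output circuit (not the one in the figure); the node names `a` to `e` and the helpers `evaluate` and `masking_rate` are illustrative only, and electrical and latch-window masking are ignored.

```python
from itertools import product

def evaluate(inputs, flip=None):
    """Evaluate a toy two-output circuit, optionally inverting one internal node.

    inputs: tuple of four bits (i1, i2, i3, i4)
    flip:   name of the node struck by a particle (its output is inverted), or None
    """
    i1, i2, i3, i4 = inputs

    def node(name, value):
        # Invert the gate output if this node is the one struck by the particle.
        return value ^ 1 if name == flip else value

    a = node("a", i1 & i2)   # AND
    b = node("b", i3 | i4)   # OR
    c = node("c", a & b)     # AND: a flip on a is masked here when b == 0
    d = node("d", b ^ 1)     # NOT
    e = node("e", a | d)     # OR: a flip on a is masked here when d == 1
    return c, e              # primary outputs O1, O2

def masking_rate(node_name):
    """Fraction of all input vectors for which a flip at node_name is logically masked."""
    vectors = list(product((0, 1), repeat=4))
    masked = sum(evaluate(v) == evaluate(v, flip=node_name) for v in vectors)
    return masked / len(vectors)

for name in ("a", "b", "c", "d", "e"):
    print(name, masking_rate(name))
```

Nodes that drive a primary output directly (`c`, `e`) are never masked, while a flip on `a` is masked whenever both of its fanout gates are held at a controlling value by other inputs. Exhaustive enumeration only works for tiny circuits; larger blocks would need sampling or probabilistic propagation.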
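5. Latch Window Masking
A glitch is captured only if it reaches a state element during that element's latching window. Faster clocks (shallow pipelines) open these windows more often, while reduced capacitance and voltage let smaller charge deposits produce glitches; together they make logic soft errors a critical problem.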
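The clock-frequency effect can be made concrete with one simple first-order latching model, assumed here rather than taken from the slides: a transient arriving at a uniformly random time is captured only if it fully covers the latching window (setup plus hold time), giving a probability of (pulse width minus window) divided by the clock period. The function name `latch_probability` and all numbers are hypothetical.

```python
def latch_probability(pulse_width_ps, latch_window_ps, clock_period_ps):
    """First-order probability that a combinational glitch is latched.

    Assumes a uniformly random arrival time within the clock period and that the
    pulse must span the whole latching window (setup + hold) to be captured.
    """
    if clock_period_ps <= 0:
        raise ValueError("clock period must be positive")
    p = (pulse_width_ps - latch_window_ps) / clock_period_ps
    return min(max(p, 0.0), 1.0)   # clamp to a valid probability

# Hypothetical 100 ps glitch and 20 ps latching window at increasing clock rates.
for period_ps in (1000.0, 500.0, 250.0):   # 1, 2, and 4 GHz
    freq_ghz = 1000.0 / period_ps
    print(f"{freq_ghz:.0f} GHz: {latch_probability(100.0, 20.0, period_ps):.2f}")
```

Under this model, halving the clock period doubles the chance that a given glitch is latched, and wider glitches (from smaller node capacitance) raise it further.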
6. Why Care about Soft Errors?
- Sun Fire 15K crashes mysteriously:
  - "It's ridiculous. I've got a $300,000 server that doesn't work. The thing should be bullet-proof." (Forbes magazine, 2000)
- All future designs that require the highest availability must counter unavoidable SEUs.
- "Cisco 12000 line cards may reset after single event upset (SEU) failures. This field notice highlights some of those failures, why they occur, and what workarounds are available." (Cisco field notice)
- Soft errors have a higher failure rate than all other reliability mechanisms combined (TI, Baumann, IRPS 2002).
7. Error Impact on System Operation
- Soft errors: not a new problem!
  - J. Wallmark, S. Marcus, "Minimum size and maximum packing density of non-redundant semiconductor devices," Proc. IRE, vol. 50, 1962.
- Existing solutions employed for space/military applications consume more power, reduce manufacturability, and severely degrade performance.
Challenge: how to provision for error handling within given performance, power, and cost constraints?
8. Accelerated Soft Error Testing
9. Accelerated Testing Results
[Figure panels: Toshiba chip (4 Mbit); Cypress chip (2K x 8 SRAM)]
- Changing the reactor power varies the acceleration rate:
  - 10^7 neutrons/cm^2-s at 1 MW
  - 10^6 neutrons/cm^2-s at 100 kW
- Natural radiation at ground level: 360 particles/m^2-s
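The flux figures above imply how much a reactor exposure compresses natural exposure, and how a measured error count can be scaled to a field failure rate. The sketch below takes the slide's numbers at face value and assumes, as a strong simplification, that a reactor neutron and a natural particle are equally likely to cause an upset (real energy spectra differ). The error count of 50 in 60 s is hypothetical, as are the helper names `acceleration_factor` and `device_fit`.

```python
SECONDS_PER_HOUR = 3600.0
CM2_PER_M2 = 1e4

def acceleration_factor(beam_flux_cm2_s, natural_flux_m2_s):
    """Ratio of beam flux to natural flux, both expressed per cm^2 per second."""
    return beam_flux_cm2_s / (natural_flux_m2_s / CM2_PER_M2)

def device_fit(errors, beam_flux_cm2_s, exposure_s, natural_flux_m2_s):
    """Extrapolate an accelerated-test error count to FIT (failures per 1e9 device-hours).

    Simplifying assumption: beam and natural particles are equally effective at
    causing upsets, so the measured cross-section applies unchanged in the field.
    """
    fluence_cm2 = beam_flux_cm2_s * exposure_s        # particles per cm^2 delivered
    cross_section_cm2 = errors / fluence_cm2           # device upset cross-section
    natural_cm2_h = natural_flux_m2_s / CM2_PER_M2 * SECONDS_PER_HOUR
    return cross_section_cm2 * natural_cm2_h * 1e9

af = acceleration_factor(1e7, 360.0)                  # 1 MW reactor vs. ground level
print(f"acceleration factor at 1 MW: {af:.3g}")
print(f"extrapolated FIT: {device_fit(50, 1e7, 60.0, 360.0):.4g}")
```

With these inputs the acceleration factor is about 2.8 x 10^8, so one hour at 1 MW stands in for roughly 3 x 10^4 years at ground level; running at 100 kW lowers the factor tenfold.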
10. SEAT: Soft Error Analysis Toolset
- Device level
- Circuit level
- Logic level -> block error probability
- Architecture level -> application error probability
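One way to picture the hand-off from block error probabilities to an application error probability is shown below. This is a hypothetical composition rule, not SEAT's documented method: each block's error probability is weighted by an assumed vulnerability fraction (the share of block errors that actually corrupt the application), and blocks are assumed to fail independently.

```python
from math import prod

def application_error_probability(blocks):
    """Combine per-block error probabilities into an application-level probability.

    blocks: list of (block_error_prob, vulnerability) pairs. vulnerability is a
    hypothetical fraction in [0, 1] of block errors that reach the application.
    Assumes block errors are independent of one another.
    """
    for p, v in blocks:
        if not (0.0 <= p <= 1.0 and 0.0 <= v <= 1.0):
            raise ValueError("probabilities and vulnerabilities must lie in [0, 1]")
    p_all_ok = prod(1.0 - p * v for p, v in blocks)   # no block corrupts the result
    return 1.0 - p_all_ok

# Hypothetical numbers for two blocks (e.g., an ALU and a register file).
print(application_error_probability([(1e-3, 0.5), (2e-3, 0.1)]))
```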
11. SEAT-DA: Modeling Charge Collection
- n-Si interaction: MCNP codes, using the PTRAC card -> reaction products and their energies
- Charge deposition: TRIM/SRIM codes -> electron-hole generation rate and ion stopping range
- Charge collection: Synopsys Davinci device simulator -> collected charge, current and voltage transients
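The arithmetic linking the deposition and collection stages can be sketched with two textbook approximations, used here in place of the SRIM and Davinci runs the flow actually relies on: silicon needs about 3.6 eV per electron-hole pair, and a collected charge is often injected into circuit simulation as a double-exponential current pulse whose integral equals that charge. The time constants below are hypothetical, and only a fraction of the deposited charge is collected in practice (which is what the device simulator determines).

```python
import math

ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge (exact SI value)
EV_PER_PAIR_SI = 3.6                  # mean energy per electron-hole pair in silicon

def deposited_charge(energy_mev):
    """Charge (C) liberated when energy_mev is deposited in silicon."""
    pairs = energy_mev * 1e6 / EV_PER_PAIR_SI
    return pairs * ELECTRON_CHARGE_C

def double_exp_current(t_s, q_collected_c, tau_fall_s, tau_rise_s):
    """Double-exponential current pulse (A) whose integral over t >= 0 is q_collected_c."""
    if tau_fall_s <= tau_rise_s:
        raise ValueError("tau_fall must exceed tau_rise")
    scale = q_collected_c / (tau_fall_s - tau_rise_s)
    return scale * (math.exp(-t_s / tau_fall_s) - math.exp(-t_s / tau_rise_s))

q = deposited_charge(1.0)   # 1 MeV deposited; treated as an upper bound on collected charge
print(f"{q * 1e15:.1f} fC")
# Hypothetical 200 ps fall and 50 ps rise time constants, sampled at 100 ps.
print(double_exp_current(100e-12, q, 200e-12, 50e-12))
```

A 1 MeV deposit liberates about 44.5 fC, which is the scale to compare against a node's critical charge.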
12. Charge Collection at Different Supply Voltages
13. SEAT-LA
14. SEAT-LA
- 1000x speedup over circuit simulation
- Within a 5% error margin of circuit-simulation SER estimates
15. Lowering Data Retention Voltage for Low Power
16. Increasing Threshold Voltage for Leakage Reduction
- Higher Qcritical is better
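Why a higher Qcritical is better, and why lowering the supply voltage hurts, can be sketched with two common approximations that are assumptions here, not results from these slides: a crude critical charge of Q ~ C x V, and the empirical exponential model commonly attributed to Hazucha and Svensson, SER ~ exp(-Qcritical / Qs). The node capacitance, voltages, and Qs value are hypothetical, and the threshold-voltage knob is not captured by Q ~ C x V.

```python
import math

def qcritical_first_order(node_cap_f, vdd_v):
    """Crude critical-charge estimate Q ~ C * V in coulombs (ignores restoring current)."""
    return node_cap_f * vdd_v

def relative_ser(qcrit_c, qs_c):
    """Relative SER under SER ~ exp(-Qcrit / Qs).

    qs_c is the model's charge-collection slope (hypothetical below). The flux- and
    area-dependent prefactor is dropped, so only ratios between cases are meaningful.
    """
    return math.exp(-qcrit_c / qs_c)

cap = 2e-15                                       # hypothetical 2 fF storage node
qs = 1e-15                                        # hypothetical 1 fC collection slope
nominal = qcritical_first_order(cap, 1.2)         # active-mode supply voltage
retention = qcritical_first_order(cap, 0.6)       # lowered data retention voltage
ratio = relative_ser(retention, qs) / relative_ser(nominal, qs)
print(f"SER increase at retention voltage: {ratio:.2f}x")
```

Because the dependence is exponential, a modest loss of critical charge in a low-power mode can raise the soft error rate by a multiplicative factor rather than a small increment.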
17. Voltage Assignment for Low Leakage
[Figure: Qcritical (C) for a slow path (6 inverters) and a fast path (3 inverters) under the Low Vth and Low Vth FF configurations]
18. Balancing Power, Performance, and Reliability Tradeoffs
- Not all functional units are active in all cycles
- Exploit idleness:
  - Switch off to save power
  - Execute replicated computation to increase reliability
- EDF: Energy-Delay-Fallibility product
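The EDF metric can be used to compare the two ways of exploiting an idle unit. The sketch below assumes EDF is the plain product of energy, delay, and a fallibility term, here taken to be the probability that a task yields an undetected wrong result; that definition, the `Config` class, and all numbers are hypothetical illustrations rather than measured values.

```python
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    energy_nj: float     # energy per task (hypothetical)
    delay_ns: float      # execution time per task (hypothetical)
    fallibility: float   # assumed: probability of an undetected wrong result

    def edf(self) -> float:
        """Energy-Delay-Fallibility product; lower is better."""
        return self.energy_nj * self.delay_ns * self.fallibility

configs = [
    # Idle unit switched off: least energy, but no protection.
    Config("gate idle unit off", energy_nj=10.0, delay_ns=100.0, fallibility=1e-4),
    # Idle unit re-executes the computation: more energy, same delay, fewer silent errors.
    Config("replicate on idle unit", energy_nj=14.0, delay_ns=100.0, fallibility=1e-6),
]

best = min(configs, key=Config.edf)
for c in configs:
    print(f"{c.name}: EDF = {c.edf():.3g}")
print("preferred:", best.name)
```

Because replication runs on a unit that would otherwise sit idle, it adds energy without adding delay, so a large enough drop in fallibility outweighs the energy cost under this metric.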
19. Conclusion
- Tools for quick and accurate estimation of soft error rates are necessary.
- Soft error optimizations interact with:
  - Power
  - Performance
  - Area
- Proper choice of control knobs is critical for multi-criteria optimization.
- Combinational logic and on-chip networks will require soft-error-tolerant provisioning from MPSoC designers.
- Soft errors under other stress conditions, such as thermal hotspots and supply noise fluctuations, require further understanding.