Title: Soft-errors Modeling and evaluation in caches
1Soft-errors Modeling and evaluation in caches
- Reliable low power design
- Lin Li Vijay Degalahal
- Fall 2003, CSE-598C
2Outline
- Introduction
- Soft Errors
- Low Power Cache design
- Methodology
- Hspice
- SimpleScalar
- Results
- Conclusions
3Soft Errors
- Soft errors, or transient errors, are circuit
errors caused by excess charge carriers
induced primarily by external radiation
- These errors cause an upset event, but the circuit
itself is not damaged.
4Power Vs Soft Errors
- $\mathrm{SER} \propto N_{flux} \cdot CS \cdot \exp(-Q_{critical} / Q_s)$ [Hazucha, 2000]
- $N_{flux}$: neutron flux
- $CS$: cross-sectional area
- $Q_{critical}$: critical charge necessary for a bit flip
- $Q_s$: charge collection efficiency
- $Q = CV$, so for a given process the SER has an
exponential dependency on voltage
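The exponential dependency on voltage can be illustrated with a small sketch. It assumes the critical charge is simply Q = C·V for the storage node, so the flux and area terms cancel when two voltages are compared. The capacitance and Q_s values below are hypothetical placeholders, not figures from these slides.

```python
import math

def relative_ser(v_scaled, v_nominal, node_capacitance, q_s):
    """Return SER(v_scaled) / SER(v_nominal) for fixed flux and area.

    Uses SER proportional to exp(-Q_critical / Q_s) with Q_critical = C * V,
    so the neutron flux and cross-section terms cancel in the ratio.
    """
    q_nominal = node_capacitance * v_nominal
    q_scaled = node_capacitance * v_scaled
    return math.exp((q_nominal - q_scaled) / q_s)

# Hypothetical values for illustration only (not taken from the slides).
C_NODE = 2e-15  # node capacitance in farads
Q_S = 1e-15     # charge collection efficiency in coulombs

ratio = relative_ser(0.5, 1.0, C_NODE, Q_S)
print(f"SER at 0.5 V is {ratio:.2f}x the SER at 1.0 V")
```

Under these assumptions, lowering the supply voltage raises the SER by a factor that grows exponentially with the voltage drop.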
5Power
- Caches are a dominant source of power consumption
- They occupy a lot of area, much of which is not actively used
- Considered two schemes
- Cache Decay
- Drowsy Cache
6Cache Decay
- Turn off cache lines when they hold data that is not
likely to be reused.
7Cache Decay
- Use the generational rate to determine the turn-off
interval
- Disadvantage: data is lost if the prediction is wrong
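A minimal sketch of the decay idea, assuming a direct-mapped toy cache with a per-line idle counter and a fixed decay interval. The class and parameter names are hypothetical; the real scheme derives the interval from observed access behavior.

```python
class DecayLine:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.idle = 0  # cycles since the last access


class DecayCache:
    """Toy direct-mapped cache; a line is switched off after
    `decay_interval` idle cycles (hypothetical fixed interval)."""

    def __init__(self, num_lines, decay_interval):
        self.lines = [DecayLine() for _ in range(num_lines)]
        self.decay_interval = decay_interval

    def tick(self):
        """Advance one cycle and switch off lines that have decayed."""
        for line in self.lines:
            if line.valid:
                line.idle += 1
                if line.idle >= self.decay_interval:
                    line.valid = False  # power gated: contents are lost

    def access(self, address):
        """Return True on a hit; on a miss the line is (re)filled."""
        index = address % len(self.lines)
        tag = address // len(self.lines)
        line = self.lines[index]
        hit = line.valid and line.tag == tag
        line.valid, line.tag, line.idle = True, tag, 0
        return hit


cache = DecayCache(num_lines=4, decay_interval=3)
cache.access(5)          # cold miss, line is filled
print(cache.access(5))   # hit while the line is still powered
for _ in range(3):
    cache.tick()
print(cache.access(5))   # miss: the line decayed, so the prediction was wrong
```

The last access shows the disadvantage: data that is reused after the interval has to be fetched again.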
8Drowsy Cache
- Reduce Cache VDD periodically
- Just one cycle access penalty
- No data lost
- Simple: uses just one global counter
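The drowsy policy can be sketched as follows, assuming that every line is put into the low-voltage state each time the single global counter reaches a window length, and that waking a line costs one extra cycle. The class name and window value are hypothetical.

```python
class DrowsyCache:
    """Toy model: one global counter, all lines go drowsy together."""

    def __init__(self, num_lines, window):
        self.data = [None] * num_lines
        self.drowsy = [False] * num_lines
        self.window = window
        self.counter = 0  # the single global counter

    def tick(self):
        """Advance one cycle; at the end of a window lower VDD everywhere."""
        self.counter += 1
        if self.counter == self.window:
            self.counter = 0
            self.drowsy = [True] * len(self.drowsy)

    def _wake(self, index):
        """Restore full VDD on a line and return the access penalty."""
        penalty = 1 if self.drowsy[index] else 0
        self.drowsy[index] = False
        return penalty

    def write(self, index, value):
        penalty = self._wake(index)
        self.data[index] = value
        return penalty

    def read(self, index):
        """Return (data, extra_cycles); data survives the drowsy state."""
        penalty = self._wake(index)
        return self.data[index], penalty


cache = DrowsyCache(num_lines=4, window=2)
cache.write(0, "payload")
cache.tick()
cache.tick()              # window elapsed, every line is now drowsy
print(cache.read(0))      # data retained, one-cycle penalty
print(cache.read(0))      # line is awake again, no penalty
```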
9Methodology
- Hspice
- Usual procedure
- Inject current at a node and check whether the stored value inverts
- Takes a long time, and inspecting waveforms by hand is tedious
10Methodology
- Hspice
- Developed a script
- Good for memory based elements
- Sweep the injected current
- Faster and more accurate
.param spike = 0
Iio1 io1 vdd exp (0 spike 0.435ns 0.001ns 0.436ns 0.001ns)
Vio1 io1
.TRANS 100ps 30ns sweep (spike -10mamps -43.5mamps -1mamps)
+ nvth 0.2v -0.2v -0.01v pvth -0.2v 0.2v 0.05v
.measure avgflipcur avg i(Vio1) from=435ps to=442ps
.measure minv min v(q) from=1.6ns to=1.8ns
.measure minv2 min v(out2) from=10ps to=500ps
.measure tran delay trig V(in) val=0.5 fall=2 targ V(out) val=0.5 fall=3
.measure tran delay2 trig V(out) val=0.5 fall=3 targ V(out2) val=0.5 fall=3
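Once the sweep has run, the measured values can be scanned automatically instead of reading waveforms. The sketch below assumes the sweep results have already been collected as (spike current, minimum node voltage) pairs and that a flip is declared when the node falls below half of VDD. Both the data and the flip criterion are hypothetical; the slides do not state the criterion used.

```python
def critical_spike(results, vdd, flip_fraction=0.5):
    """Return the smallest-magnitude spike current that flips the node.

    results: iterable of (spike_amps, min_node_voltage) pairs.
    A flip is assumed when the node voltage drops below
    flip_fraction * vdd (an assumed criterion). Returns None if no
    swept current causes a flip.
    """
    flipped = [spike for spike, min_v in results if min_v < flip_fraction * vdd]
    return min(flipped, key=abs) if flipped else None


# Hypothetical sweep output, for illustration only.
sweep = [
    (-10e-3, 0.90),
    (-11e-3, 0.80),
    (-12e-3, 0.10),
    (-13e-3, 0.00),
]
print(critical_spike(sweep, vdd=1.0))
```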
11Leakage Vs Soft Error
Drowsy Caches
12Methodology
- Soft error injection in SimpleScalar
- Random variable R1 ∈ [0, 1]
- Every cycle, generate a random number
- Its value indicates which kind of soft error, if any, happens
[Figure: number line from 0 to 1 marked at 1E-7, 1E-8, and 1E-9]
                  Normal  Low Voltage  R/W
Single-bit Error  1E-7    1E-6         5E-7
Double-bit Error  1E-8    1E-7         5E-8
Multi-bit Error   1E-9    1E-8         5E-9
Soft Error Rate (per cycle)
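The per-cycle decision can be sketched as below. The rates are the ones in the table; the way the interval is partitioned (cumulative thresholds, with the rarest error type checked first) is an assumption, as are the function and key names.

```python
import random

# Per-cycle soft error rates from the table above.
RATES = {
    "normal":      {"single": 1e-7, "double": 1e-8, "multi": 1e-9},
    "low_voltage": {"single": 1e-6, "double": 1e-7, "multi": 1e-8},
    "rw":          {"single": 5e-7, "double": 5e-8, "multi": 5e-9},
}


def classify(r1, mode):
    """Map a random value r1 in [0, 1] to an error type or None.

    Assumed partition: [0, multi) is a multi-bit error, the next slice
    is a double-bit error, the next a single-bit error, and the rest
    of the interval means no error this cycle.
    """
    threshold = 0.0
    for kind in ("multi", "double", "single"):
        threshold += RATES[mode][kind]
        if r1 < threshold:
            return kind
    return None


rng = random.Random(1)
events = [classify(rng.random(), "low_voltage") for _ in range(1000)]
print(sum(e is not None for e in events), "errors injected in 1000 cycles")
```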
13Methodology
- Soft error injection in SimpleScalar
- Random variable R2 ∈ [0, set - 1]: the set that is hit
- Random variable R3 ∈ [0, way - 1]: the way that is hit
- Random variable R4 ∈ [0, bit - 1]: the bit that is flipped
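A minimal sketch of how the three random variables could pick the fault location, assuming each cache block is modeled as an integer bit vector in a `blocks[set][way]` structure. This layout and the function names are hypothetical and are not SimpleScalar's actual data structures.

```python
import random


def pick_location(num_sets, num_ways, bits_per_block, rng):
    """Draw R2 (set), R3 (way), and R4 (bit) uniformly at random."""
    r2 = rng.randrange(num_sets)        # 0 .. set - 1
    r3 = rng.randrange(num_ways)        # 0 .. way - 1
    r4 = rng.randrange(bits_per_block)  # 0 .. bit - 1
    return r2, r3, r4


def inject_single_bit_error(blocks, bits_per_block, rng):
    """Flip one randomly chosen bit in a blocks[set][way] integer array."""
    s, w, b = pick_location(len(blocks), len(blocks[0]), bits_per_block, rng)
    blocks[s][w] ^= 1 << b
    return s, w, b


rng = random.Random(0)
blocks = [[0, 0] for _ in range(4)]  # 4 sets, 2 ways, all bits zero
s, w, b = inject_single_bit_error(blocks, bits_per_block=256, rng=rng)
print(f"flipped bit {b} of set {s}, way {w}")
```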
14Injected Soft Errors
           Original Cache                                     Drowsy Cache
Benchmark  1-bit error      2-bit error      Multi-bit error  1-bit error      2-bit error      Multi-bit error
gzip       39               4                0                299              34               3
gcc        65               6                0                540              54               5
mcf        112              13               1                966              100              12
perlbmk    60               5                0                494              50               4
gap        36               4                0                281              34               3
vortex     82               10               1                662              69               11
bzip2      43               4                0                328              38               4
twolf      76               8                1                639              58               8
The number of soft errors in the drowsy cache increases
significantly.
More powerful/effective error detection/correction
schemes are needed.
15Category of Soft Errors
- Injected soft errors
  - On an invalid cache block
  - On a valid cache block
    - Read into the processor (1)
    - Overwritten by new data
    - Replaced by a new cache block, if the block is clean
    - Written back to the L2 cache, if the block is dirty (2)
We call (1) and (2) effective errors: they
propagate to the pipeline or to the L2 cache.
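The categories above amount to a small decision procedure. The sketch below encodes it, with hypothetical event names ("read", "overwritten", "evicted") standing in for whatever the simulator observes first on the corrupted block.

```python
def classify_outcome(valid, event=None, dirty=False):
    """Return (category, is_effective) for one injected soft error.

    valid: whether the hit block was valid at injection time.
    event: first thing that happens to the corrupted data afterwards,
           one of "read", "overwritten", "evicted" (hypothetical names).
    dirty: whether the block is dirty when it is evicted.
    """
    if not valid:
        return "invalid block", False
    if event == "read":
        return "read into processor", True      # case (1)
    if event == "overwritten":
        return "overwritten by new data", False
    if event == "evicted":
        if dirty:
            return "written back to L2", True   # case (2)
        return "replaced (clean)", False
    raise ValueError(f"unknown event: {event!r}")


print(classify_outcome(valid=True, event="evicted", dirty=True))
print(classify_outcome(valid=False))
```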
16Effective Soft Errors
           Original Cache                                     Decay Cache
Benchmark  1-bit error      2-bit error      Multi-bit error  1-bit error      2-bit error      Multi-bit error
gzip       22               3                0                3                0                0
gcc        38               4                0                13               1                0
mcf        53               8                0                23               2                0
perlbmk    32               5                0                19               2                0
gap        32               3                0                7                0                0
vortex     38               6                0                16               4                0
bzip2      5                1                0                2                0                0
twolf      31               2                0                17               0                0
The decay cache turns off cache lines if they are not
accessed for a long period of time. It saves
leakage energy and increases reliability.
17Outcome of Soft Errors
In the normal cache and the drowsy cache, errors that are
replaced or written back to L2 are dominant. In the
decay cache, errors on invalid blocks are dominant.
18Timing of Soft Errors
[Figure: cumulative errors versus cycles for gzip]
Time gap: the interval between the time a cache block is
loaded into the L1 cache and the time the soft error
happens. The longer a block stays in the cache, the
higher the probability that it is hit by a soft error.
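The residency effect can be illustrated with a simple model, assuming each cycle is an independent trial with a fixed per-block error probability. That independence assumption and the numeric rate are hypothetical simplifications, not part of the measured results.

```python
def hit_probability(per_cycle_rate, resident_cycles):
    """Probability that a block is hit at least once while resident,
    assuming independent cycles with a fixed per-block error rate."""
    return 1.0 - (1.0 - per_cycle_rate) ** resident_cycles


# Hypothetical per-block rate, for illustration only.
for cycles in (1_000, 100_000, 10_000_000):
    print(cycles, hit_probability(1e-7, cycles))
```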
19Influence of Bit Interleaving
           w/o Interleaving                                   w/ Interleaving
Benchmark  1-bit error      2-bit error      Multi-bit error  1-bit error      2-bit error      Multi-bit error
gzip       299              34               3                372              0                0
gcc        540              54               5                653              1                0
mcf        966              100              12               1192             2                0
perlbmk    494              50               4                551              6                1
gap        281              34               3                351              4                1
vortex     662              69               11               823              3                0
bzip2      328              38               4                408              0                0
twolf      639              58               8                769              0                0
Drowsy cache
Bit interleaving converts a multi-bit error into several single-bit
errors.
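The conversion can be seen in a small sketch, assuming a physical row in which bits of several logical words are interleaved column by column. The interleave degree and the column mapping are hypothetical; the slides do not give the layout used.

```python
def physical_to_logical(column, interleave):
    """Map a physical column to (word, bit) when `interleave` logical
    words share one physical row, one bit per word in turn."""
    return column % interleave, column // interleave


def errors_per_word(first_column, width, interleave):
    """Count how many flipped bits land in each logical word when
    `width` adjacent physical columns are upset."""
    counts = {}
    for column in range(first_column, first_column + width):
        word, _bit = physical_to_logical(column, interleave)
        counts[word] = counts.get(word, 0) + 1
    return counts


# Three adjacent cells are upset, starting at physical column 8.
print(errors_per_word(8, 3, interleave=1))  # no interleaving
print(errors_per_word(8, 3, interleave=4))  # 4-way interleaving
```

Without interleaving all three flips fall into one word; with 4-way interleaving each affected word sees a single flipped bit, which a single-error-correcting code can handle.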
20Influence of EDC/ECC
           Parity          SEC-DED         DEC
Benchmark  Read    WB      Read    WB      Read    WB
gzip       1       152     0       17      0       1
gcc        15      266     0       32      0       2
mcf        0       392     0       39      0       4
perlbmk    10      258     0       34      0       2
gap        3       285     0       36      0       3
vortex     18      274     0       28      0       3
bzip2      1       60      0       4       0       1
twolf      1       259     0       23      0       4
Failure errors in Drowsy cache
- Clean block: failure errors = undetectable errors
- Dirty block: failure errors = undetectable errors + uncorrectable errors
- Error protection is very important for dirty blocks in a write-back
cache.
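The difference between clean and dirty blocks can be sketched as a decision function. It assumes that a detected but uncorrectable error on a clean block is recovered by refetching the block from L2, that parity detects only odd numbers of flipped bits, and that SEC-DED corrects one flipped bit and detects two. Errors of three or more bits under SEC-DED are treated as undetected, which is a simplification. DEC is left out because the slides do not define its detection capability.

```python
def outcome(scheme, flipped_bits, dirty):
    """Return "ok", "corrected", "refetched", or "failure".

    scheme: "parity" or "sec-ded".
    flipped_bits: number of corrupted bits in the protected word.
    dirty: whether the block holds the only up-to-date copy.
    """
    if flipped_bits == 0:
        return "ok"
    if scheme == "parity":
        corrected = False
        detected = flipped_bits % 2 == 1
    elif scheme == "sec-ded":
        corrected = flipped_bits == 1
        detected = flipped_bits <= 2  # assumed: 3+ bits go unnoticed
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")

    if corrected:
        return "corrected"
    if detected and not dirty:
        return "refetched"  # assumed recovery: a clean copy exists in L2
    return "failure"


for dirty in (False, True):
    print("dirty" if dirty else "clean",
          outcome("parity", 1, dirty), outcome("sec-ded", 2, dirty))
```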
21Conclusions
- The drowsy cache incurs a significant increase in
soft errors.
- The decay cache can reduce the number of effective
errors.
- Bit interleaving can alleviate the impact of
multi-bit soft errors.
- Dirty cache blocks in a write-back cache are more
vulnerable.