Title: FPGA Defect Tolerance: Impact of Granularity
1FPGA Defect Tolerance: Impact of Granularity
- Anthony Yu, Guy Lemieux
- December 14, 2005
2Outline
- Introduction and motivation
- Previous works
- New architectures
- Coarse-grain redundancy (CGR)
- Fine-grain redundancy (FGR)
- Experimentation Results
- Conclusions
3Introduction and Motivation
- Scaling introduces new types of defects
- Smaller feature sizes are susceptible to smaller defects
- Expected results
- Defects per chip increase
- Chip yield declines
- FPGAs are mostly interconnect
- FPGAs must tolerate multiple interconnect defects
to improve yield (and )
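The expected trend (more defects per chip, lower yield) can be illustrated with the classic Poisson yield model. The slides do not state a yield model, so the model choice, the function names, and the example numbers below are assumptions for illustration only.

```python
import math

def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Probability that a chip has zero defects (Poisson model, assumed)."""
    return math.exp(-area_cm2 * defect_density_per_cm2)

def yield_with_tolerance(area_cm2: float,
                         defect_density_per_cm2: float,
                         max_defects: int) -> float:
    """Probability of at most `max_defects` defects.

    Assumes every defect up to `max_defects` can be repaired, which is
    optimistic for any real redundancy scheme.
    """
    lam = area_cm2 * defect_density_per_cm2  # expected defects per chip
    return sum(math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(max_defects + 1))

# Hypothetical numbers: a 1 cm^2 die at increasing defect densities.
for density in (0.5, 1.0, 2.0):
    print(density,
          round(poisson_yield(1.0, density), 3),
          round(yield_with_tolerance(1.0, density, 3), 3))
```

In this sketch, tolerating even a few defects recovers much of the yield lost as defect density grows, which is the motivation for tolerating multiple defects.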
4General Defect Tolerant Techniques
- Defect-tolerant techniques minimize impact (cost) of manufacturing defects
- FPGA defect-tolerance can be loosely categorized into three classes
- Software Redundancy: use CAD tools to map around the defects
- Hardware Redundancy: incorporate spare resources to assist in defect correction (e.g. spare row/column)
- Run-time Redundancy: protection against transient faults such as SEUs (e.g. TMR)
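As a small illustration of run-time redundancy, TMR (triple modular redundancy) runs three copies of a module and takes a majority vote on their outputs. The sketch below models only the voter; the function name and the example values are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant module outputs.

    Each output bit takes the value that at least two of the three
    copies agree on, so a single upset copy is outvoted.
    """
    return (a & b) | (a & c) | (b & c)

# One copy (the third) has two bits flipped by a transient fault.
good = 0b1010
upset = 0b0110
assert tmr_vote(good, good, upset) == good
```

The voter masks a fault in any one copy; it does not help when two copies fail in the same bit position.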
5Previous work 1: Xilinx
- Xilinx's Defect-Tolerant Approach
- Customer (knowingly) purchases less-than-perfect parts
- Customer gives Xilinx configuration bitstream
- Xilinx tests FPGA devices against bitstream
- Sells FPGA parts that appear perfect
- Defects avoid the bitstream
- Limitation
- Chips work only with the given bitstream, no changes!
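The screening idea above can be sketched as a set test: a part is sold only if none of its defective resources is used by the customer's bitstream. The resource identifiers and the function name are hypothetical; the slides do not describe how the actual test is performed.

```python
def part_is_usable(used_resources: set, defective_resources: set) -> bool:
    """True if the design's resources avoid every known defect."""
    return used_resources.isdisjoint(defective_resources)

# Hypothetical resource names: wire segments and switches used by one design.
design_v1 = {"wire_3_7", "wire_3_8", "switch_12"}
chip_defects = {"wire_9_1"}

print(part_is_usable(design_v1, chip_defects))   # defect is unused by v1

# A revised design that happens to route through the defective wire fails,
# which is why the part works only with the bitstream it was tested against.
design_v2 = design_v1 | {"wire_9_1"}
print(part_is_usable(design_v2, chip_defects))
```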
6Previous work 2: Altera
- Altera's Defect-Tolerant Approach
- Customer purchases seemingly perfect parts
- Make defective resources inaccessible to user
- Coarse-grain architecture
- Spare row and column in array (like memories)
- Defective row/column must be bypassed
- Use the spare row/column instead
- Limitation
- Does not scale well (multiple defects)
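The spare-row bypass can be sketched as a mapping from logical rows (what the user sees) to physical rows (what is on the die). This is a simplified model with one spare row and hypothetical names; it is not a description of Altera's actual circuitry.

```python
def build_row_map(num_logical_rows: int, defective_row=None) -> list:
    """Map each logical row to a physical row, skipping one defective row.

    The die is assumed to have num_logical_rows + 1 physical rows, the
    last one being the spare. Rows at or below the defect shift down by one.
    """
    if defective_row is None:
        return list(range(num_logical_rows))
    if not 0 <= defective_row <= num_logical_rows:
        raise ValueError("defective_row is outside the physical array")
    return [r if r < defective_row else r + 1
            for r in range(num_logical_rows)]

# Four logical rows, physical row 2 is defective: rows 2 and 3 shift down.
print(build_row_map(4, defective_row=2))
```

With a single spare, a second defect in a different row has nowhere to go, which is the scaling limitation noted above.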
7Objective
- Problem
- FPGA yield is on the decline because of aggressive technology scaling
- Proposed Solutions
- Defect-tolerance through redundancy
- Important Objectives
- Interconnect defects important (dominates area)
- Tolerate multiple defects (future trend)
- Preserve timing (no timing re-verification)
- Fast correction time (production use)
- Understand the factors that influence yield
8Background
9Island-style FPGA
10Directional Switch Block
11Directional Switch Block
12Coarse-grain Redundancy (CGR)
13Coarse-grain Redundancy (CGR)
14So what's wrong with it?
15Improving yield for CGR: Adding Multiple Global Spares
- Add multiple global spares to traditional CGR
- Global spares can be used to repair any defective row/column in the array
- Wire extensions are now longer
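The effect of adding global spares can be explored with a small Monte Carlo sketch. It is a deliberately simplified model, not the yield model used in the study: defects fall uniformly at random on rows only, spare rows are never defective, and a chip is repairable when the number of distinct defective rows does not exceed the number of global spares. All names and parameters are assumptions.

```python
import random

def global_spare_yield(rows: int, spares: int, defects: int,
                       trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated chips repairable with `spares` global spare rows."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    repaired = 0
    for _ in range(trials):
        # Set of distinct rows hit by at least one defect.
        hit_rows = {rng.randrange(rows) for _ in range(defects)}
        if len(hit_rows) <= spares:
            repaired += 1
    return repaired / trials

# 32 rows, 3 random defects, increasing number of global spares.
for spares in range(4):
    print(spares, global_spare_yield(32, spares, defects=3))
```

Each extra global spare raises the repairable fraction in this model, but the sketch ignores the mux and wire-extension cost that comes with every added spare.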
16Yield Impact of Multiple Global Spares
17Increasing Area/Delay Overhead
- More spares lead to more mux overhead in every switch element
- Figure panels: no spares, 1 global spare, 2 global spares
- 4 global spares may be impractical!
18Improving yield for CGR: Adding Multiple Local Spares
- Divide FPGA into subdivisions
- Each subdivision has local spare(s)
- Distributes spares across chip
- Reduces mux area overhead (of Global scheme)
- Limitation
- Spare(s) can only repair defect within the
subdivision
19Yield Impact of Multiple Local Spares (not as good as Global with the same number of spares)
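Why local sparing trails global sparing at the same spare count can be seen in a simplified Monte Carlo sketch. The model is an assumption, not the study's: defects land uniformly on rows, spares are defect-free, the array is split into equal subdivisions, and a local spare can only repair a row in its own subdivision. Names and parameters are hypothetical.

```python
import random
from collections import Counter

def compare_sparing(rows: int, groups: int, spares_per_group: int,
                    defects: int, trials: int = 10_000, seed: int = 0):
    """Return (global_yield, local_yield) for the same total spare count."""
    if rows % groups:
        raise ValueError("rows must divide evenly into groups")
    rng = random.Random(seed)
    rows_per_group = rows // groups
    total_spares = groups * spares_per_group
    global_ok = local_ok = 0
    for _ in range(trials):
        hit_rows = {rng.randrange(rows) for _ in range(defects)}
        # Global: any spare can replace any defective row.
        if len(hit_rows) <= total_spares:
            global_ok += 1
        # Local: each subdivision must cover its own defective rows.
        per_group = Counter(r // rows_per_group for r in hit_rows)
        if all(n <= spares_per_group for n in per_group.values()):
            local_ok += 1
    return global_ok / trials, local_ok / trials

# 32 rows, 4 subdivisions with 1 local spare each, versus 4 global spares.
print(compare_sparing(32, 4, 1, defects=3))
```

Every chip that local sparing can repair is also repairable with the pooled global spares, so the local figure can never exceed the global one in this model; local sparing pays for its lower mux overhead with that loss of flexibility.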
20Fine-grain Redundancy (FGR)
21Fine-grain Redundancy (FGR): Defect Avoidance by Shifting
22Defect-tolerant Switch Block
23Switch Implementation Options
- Several detailed implementations are possible
- Trade off area / delay / yield (repairability)
24Minimum Fault-free Radius (MFFR)
25Experimentation Results
- Switch implementation
- Array size
- Wire length
- Area
- Summary
26Switch Implementation
Assumes all bridging defects
27Fixed Array Size (32x32) Global Sparing
28Fixed Array Size (32x32) Local Sparing
29Increasing Array Size
30Yield for Varying Wire Length
31Estimated Area overhead at equal yield (80%)
CGR-G1 can only tolerate 1-2 defects
32Limitations of Study Architectures
- Logic and power/ground shorts were not considered
- Assumed that all defects are randomly distributed
- Assumed that all defects can be corrected with a single row/column
- Switch area was not accounted for in our yield model
- Area results for CGR are approximated
33Conclusions
- CGR is effective for 1 or 2 defects
- FGR meets desired objectives
- Tolerates multiple randomly distributed defects
- Defect correction does not perturb timing
- Tolerates an increasing number of defects as array size increases
- Correction can be applied quickly
34Thank you!
35Summary
- As the density of FPGAs increases, they are becoming increasingly susceptible to manufacturing defects
- Fault-redundant techniques alleviate this growing problem
- Depending on the desired level of protection, we can apply different techniques
- At low defect rates, the spare row and column approach has lower overhead than the fine-grain approach
- At large array sizes, the spare row and column approach requires more area overhead to tolerate the same number of defects as the fine-grain approach