Title: Diagnosing Faults in CLB Array
1. Diagnosing Faults in CLB Array
- The diagnosis targeted here consists of locating the faulty CLB. Some researchers believe that further effort to pinpoint the exact faulty element inside the CLB (MUX, LUT, connection, etc.) is not required at this stage of the research.
- It does not matter which part of the CLB is faulty if the entire faulty CLB will not be used. Until now, almost all proposed fault tolerance methods discard the entire faulty CLB.
2. Diagnosing the Faulty CLB Using Programmability
- Almost all strategies proposed for detecting faults in CLB resources were later extended to diagnose faults.
- BIST approach extended for fault diagnosis
- The main concept behind this extension is exploiting the regularity of the FPGA chip.
- As shown in the diagram, the FPGA is diagnosed in four sessions (NS, SN, WE, EW).
- In each session, one CLB row is programmed as the test pattern generator, denoted TPG in the figure.
3. Diagnosing the Faulty CLB Using Programmability
- Some CLB rows are under test and are denoted BUTs (blocks under test) in the figure. Other CLB rows are programmed as output response analyzers and are denoted ORA in the figure.
- After the two sessions NS and SN, all CLB rows have been covered by the test. These two sessions therefore identify the faulty row but not the exact position of the faulty CLB within it.
- Some researchers suggest rotating the chip by 90 degrees and applying the same strategy to the columns (sessions EW and WE). The faulty CLB column is then identified once these two sessions are complete.
- Performing all four sessions identifies both the faulty CLB row and the faulty CLB column, which gives the position of the faulty CLB.
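The four-session logic can be sketched as a toy model. The model assumes a single faulty CLB. It also assumes that each session only reports which rows (NS, SN) or columns (WE, EW) contained a failing block under test. The function name `locate_faulty_clb` and the result format are hypothetical and are not part of any published BIST implementation.

```python
from typing import Dict, Optional, Set, Tuple


def locate_faulty_clb(session_failures: Dict[str, Set[int]]) -> Optional[Tuple[int, int]]:
    """Combine the four diagnosis sessions into a (row, column) position.

    session_failures maps a session name ("NS", "SN", "WE", "EW") to the set of
    row indices (NS/SN) or column indices (WE/EW) in which a block under test
    failed. Assumes at most one faulty CLB (single-fault model).
    """
    # NS and SN together cover every CLB row, so their union identifies the faulty row.
    rows = session_failures.get("NS", set()) | session_failures.get("SN", set())
    # WE and EW repeat the strategy on the columns (the chip "rotated" by 90 degrees).
    cols = session_failures.get("WE", set()) | session_failures.get("EW", set())
    if not rows and not cols:
        return None  # every session passed: no fault detected
    if len(rows) != 1 or len(cols) != 1:
        raise ValueError("results are inconsistent with a single faulty CLB")
    # The faulty CLB sits at the intersection of the faulty row and the faulty column.
    return next(iter(rows)), next(iter(cols))


# Row 2 failed in session SN only; column 5 failed in session EW only.
print(locate_faulty_clb({"NS": set(), "SN": {2}, "WE": set(), "EW": {5}}))  # (2, 5)
```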
4.
- Universal Testing Approach
- This approach has low complexity.
- Disadvantage: the test time is long in some cases.
- Array-Based Approach
- Can locate the fault by applying the test strategy twice: once with the chip in its normal position and once with the chip rotated by 90 degrees (the device is symmetric).
- Disadvantage: the diagnosis time is twice the testing time.
5.
- I Approach
- Can directly detect and diagnose faults, since each CLB under test is observed directly from off the chip.
- No additional time is required: the same method proposed for detecting faults also diagnoses them.
- However, this method is inherently slow compared to the other methods.
6. Diagnosing the Faulty CLB Using Design for Testability
- Primary concern: how to improve the actual FPGA design so that fault diagnosis becomes easy.
- Little headway has been made in this research area, because improving the FPGA design requires knowledge of the actual structure of the FPGA.
- Two main proposals in this area:
- A modified scan procedure that sequentially tests every module in the FPGA. It can be used for diagnosing faults if the rows and columns of CLBs are used.
- Diagnosing faults in CLBs by shifting the configuration data
- The idea is to develop an algorithm that shifts the configuration data for the purpose of diagnosis. The algorithm shifts the data row by row and column by column. Row-by-row shifting diagnoses the faulty row, and column-by-column shifting diagnoses the faulty column. Together, they locate the faulty CLB.
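Row-by-row and column-by-column shifting can be sketched under simplifying assumptions. The configuration data is modeled as a 0/1 grid, where 1 marks a CLB programmed with a test configuration. Shifting is cyclic. A hypothetical `simulated_chip` stands in for loading the data and running the test on real hardware. None of these names come from the original proposal.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]  # 1 = CLB programmed with the test configuration, 0 = unused


def shift_down(config: Grid, k: int) -> Grid:
    """Cyclically shift the configuration data down by k rows."""
    k %= len(config)
    return [row[:] for row in config[-k:] + config[:-k]]


def shift_right(config: Grid, k: int) -> Grid:
    """Cyclically shift the configuration data right by k columns."""
    k %= len(config[0])
    return [row[-k:] + row[:-k] for row in config]


def diagnose_by_shifting(
    rows: int, cols: int, test_passes: Callable[[Grid], bool]
) -> Optional[Tuple[int, int]]:
    """Shift a one-row test row by row, then a one-column test column by column."""
    row_test: Grid = [[1] * cols] + [[0] * cols for _ in range(rows - 1)]
    col_test: Grid = [[1] + [0] * (cols - 1) for _ in range(rows)]
    # Row-by-row shifting: the first failing shift k means row k is faulty.
    faulty_row = next((k for k in range(rows) if not test_passes(shift_down(row_test, k))), None)
    # Column-by-column shifting: the first failing shift k means column k is faulty.
    faulty_col = next((k for k in range(cols) if not test_passes(shift_right(col_test, k))), None)
    if faulty_row is None or faulty_col is None:
        return None
    return faulty_row, faulty_col


def simulated_chip(faulty: Tuple[int, int]) -> Callable[[Grid], bool]:
    """Hypothetical chip model: a test fails only if it programs the faulty CLB."""
    def test_passes(config: Grid) -> bool:
        r, c = faulty
        return config[r][c] == 0
    return test_passes


print(diagnose_by_shifting(4, 6, simulated_chip((2, 5))))  # (2, 5)
```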
7. Diagnosing Faults in Interconnect Resources
- Fault diagnosis in interconnects may take a long time, since interconnect resources are very complex. Diagnosing faults is always more difficult than detecting them.
- Fault Diagnosis in Interconnect Resources Using Programmability
- Two ways to diagnose faults:
- BIST
- Non-BIST
- Both methods were originally proposed for detecting faults in interconnect resources and are extended here for diagnosing faults.
- The main difference between detecting and diagnosing faults is the number of configurations: diagnosing faults requires a large number of configurations.
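A deliberately simplified model shows why diagnosis needs more configurations than detection. Assume a single faulty segment among M candidates. Also assume that any subset of segments can be chained into one test path whose pass/fail result is observed, although real routing constraints rarely allow this. Detection then takes one configuration that covers every segment. Each further configuration yields only one pass/fail bit, so locating the segment needs at least ⌈log2 M⌉ configurations. The sketch below reaches that bound by binary splitting. Binary splitting is an adaptive group-testing technique chosen here for illustration; it is not one of the specific BIST or non-BIST methods referred to above.

```python
from typing import Callable, List, Sequence, Tuple


def locate_faulty_segment(
    segments: Sequence[int], path_passes: Callable[[List[int]], bool]
) -> Tuple[int, int]:
    """Adaptive binary splitting over candidate wire segments.

    path_passes(subset) stands for loading one test configuration that routes a
    signal through every segment in `subset` and reporting pass/fail.
    Assumes exactly one faulty segment. Returns (segment, configurations_used).
    """
    candidates = list(segments)
    used = 0
    while len(candidates) > 1:
        mid = len(candidates) // 2
        used += 1  # each split costs one new configuration
        if path_passes(candidates[:mid]):
            candidates = candidates[mid:]  # first half is fault-free
        else:
            candidates = candidates[:mid]  # fault lies in the first half
    return candidates[0], used


faulty = 11
print(locate_faulty_segment(range(16), lambda subset: faulty not in subset))  # (11, 4)
```

With 16 segments, one configuration detects the fault in this model, but four are needed to locate it.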
8. Diagnosing Faults in Interconnect Resources
- Many researchers have proposed methods that minimize the number of required configurations at the expense of fault coverage.
- We must maintain a balance between the number of configurations and the fault coverage (or the generality of the FPGA model).
9. Fault Diagnosis in Interconnect Resources Using Design for Testability
- Some of the research proposed in this area requires a regular distribution of the interconnect resources. However, actual FPGA chips on the market are not symmetric:
- In Altera FPGAs, interconnect resources are concentrated in the center, since more functionality is in the middle of the chip.
- In Xilinx FPGAs, more interconnect resources are at the border, since more functionality exists at the border.
- This area holds potential future research topics. The concept of embedded design is emerging, and designers will introduce highly complex FPGAs, which will make it necessary to integrate testing on the chip.
10. DEFECT AND FAULT TOLERANT FPGA
- Defect-Tolerant FPGA
- Defect tolerance is the problem of tolerating defects that occur during fabrication of the chip. It presents problems from the manufacturer's side.
- Fault-Tolerant FPGA
- Fault tolerance is the problem of tolerating faults that occur while the chip is in use. It presents problems from the user's side.
- Defect-Tolerant FPGA
- Manufacturers are still searching for more reliable chips with low cost (area, hardware, complexity, delay, etc.) and high yield improvement.
- General goal: after a defect in the chip is detected and located, only the defective CLB or wire should be isolated and avoided, without compromising the original performance, instead of discarding the entire chip.
- Fault detection and diagnosis
- CLB defects
- Interconnect defects
11. Tolerating Defective CLB
- Since the FPGA is constructed as a 2-D array of identical CLBs, a defective CLB can be avoided by remapping the user's application data around it using spare or other unused resources. This solution may fit well for FPGAs with flexible interconnect resources.
- Disadvantage: remapping the user application data is likely to add significant delay, especially in FPGAs with limited interconnect resources.
- A variant of this method exploits the regular array of the FPGA by using a spare column of CLBs, or one spare column and one spare row of CLBs.
- Although this method improves delay, a defect within a single CLB leaves a whole row or column of CLBs unused, which limits the yield enhancement. To remedy this shortcoming, researchers proposed fast reconfiguration as a key mechanism for obtaining a significant increase in yield.
- Laser techniques are another approach to defect tolerance. The idea is to add a grid of defect-avoidance buses to the original FPGA interconnect resources. When a defect is detected and located, the defective cell is avoided by using the additional buses. Disadvantage: additional hardware overhead, and delay caused by the additional switches.
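The remapping idea in the first bullet can be sketched as moving the block placed on the defective CLB to the nearest unused CLB. This is a toy built on several assumptions. Placement is a dictionary from hypothetical block names to (row, column) positions. The Manhattan distance the block moves is used only as a rough proxy for the extra delay, which grows when interconnect is limited and the nearest free CLB is far away.

```python
from typing import Dict, Tuple

Pos = Tuple[int, int]


def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def remap_around_defect(
    placement: Dict[str, Pos], defect: Pos, rows: int, cols: int
) -> Tuple[Dict[str, Pos], int]:
    """Move the logic block sitting on the defective CLB to the nearest unused CLB.

    Returns the new placement and the Manhattan distance the block moved, used
    here only as a rough, hypothetical proxy for the extra routing delay.
    """
    victim = next((name for name, pos in placement.items() if pos == defect), None)
    if victim is None:
        return dict(placement), 0  # the defect is in an unused CLB: nothing to remap
    used = set(placement.values())  # includes the defective CLB itself
    free = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in used]
    if not free:
        raise RuntimeError("no spare or unused CLB left to remap onto")
    target = min(free, key=lambda p: manhattan(p, defect))  # ties: first in row-major order
    remapped = dict(placement)
    remapped[victim] = target
    return remapped, manhattan(target, defect)


placement = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}
print(remap_around_defect(placement, defect=(1, 1), rows=3, cols=3))
# ({'a': (0, 0), 'b': (0, 1), 'c': (1, 0)}, 1)
```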
12.
- Node-Covering Technique: achieves defect tolerance by enabling each node (CLB) to cover its neighbor in the row. The defective CLB is avoided by reconfiguring around it using laser-burned fuses.
- The defect is transparent to the user: the user configuration data loaded into the FPGA remains the same whether or not the chip is defect-free.
- The SRAM corresponding to the CLBs and the SRAM of the interconnect resources are assumed to be separate. The figure shows an example of the SRAM structure corresponding to the CLBs in two rows of a simplified FPGA model.
- An additional multiplexer is added to the SRAM corresponding to each CLB. When a fault occurs, the data is shifted by one CLB to the right, and the multiplexer corresponding to the defective CLB is activated so that this CLB is avoided.
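The row shift used by node covering can be modeled in a few lines. The sketch assumes a row of N CLBs plus one spare at the right end. It abstracts the laser-burned fuses and the per-CLB multiplexers into a single defective index. The function name and list representation are hypothetical. The sketch shows why the defect is transparent: the same user data is passed in whether or not the row contains a defect.

```python
from typing import List, Optional, Sequence


def node_cover_row(config: Sequence[str], defective: Optional[int] = None) -> List[Optional[str]]:
    """Map the user's configuration words for one row onto N + 1 physical CLBs.

    The last physical CLB is the spare. Logical CLB i stays on physical CLB i
    when it lies left of the defect and moves one position to the right
    otherwise, so every CLB from the defect onward is covered by its right
    neighbour. None marks a bypassed (defective) or unused (spare) CLB.
    """
    n = len(config)
    if defective is not None and not 0 <= defective <= n:
        raise ValueError("defective index must address one of the N + 1 CLBs")
    physical: List[Optional[str]] = [None] * (n + 1)
    for i, word in enumerate(config):
        shifted = defective is not None and i >= defective
        # Models the multiplexers that steer each word to its own or the next CLB.
        physical[i + 1 if shifted else i] = word
    return physical


row = ["w0", "w1", "w2", "w3"]  # the user's data is identical in both cases
print(node_cover_row(row))               # ['w0', 'w1', 'w2', 'w3', None]
print(node_cover_row(row, defective=1))  # ['w0', None, 'w1', 'w2', 'w3']
```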
14.
- Defects in the interconnect resources are avoided by applying the same principle to their configuration data.
- Each segment covers its neighbor, and the last segment in the channel is covered by one reserved supplementary channel segment.
- Figure (a) shows an example of defect-tolerant routing of nets using a cover segment, and Figure (b) shows the reconfiguration around the defect.
- Advantage: high yield improvement at moderate cost
- Disadvantages:
- Susceptible to failure in some FPGAs that contain various types of links in their interconnect network.
- Hardware overhead is high.
16. Shifting Approach
- This approach achieves defect tolerance by shifting the configuration data on the chip.
- The following two figures show an example of the design and its ability to shift data on a chip. In this example, the user data is shifted by one row or column of CLBs (up, down, and right).
- Two distributions of spare CLBs are proposed, named after chess pieces:
- King shifting
- Horse allocation
- When a defect occurs, the data is shifted in the corresponding direction so that the defect is avoided. The king distribution requires eight shifting directions, and the horse distribution requires four.
- The main problem with this method is that it may fail if a defect occurs in one memory cell. It can be improved by tolerating defects and faults of the SRAM part separately.
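The slides do not detail how the king distribution places its spare CLBs. The sketch below therefore illustrates only the eight shifting directions, under a simplifying assumption: the whole user design is shifted by one CLB into a ring of spare CLBs surrounding it. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

Pos = Tuple[int, int]

# The eight "king move" shift directions as (row, column) offsets.
KING_DIRECTIONS: List[Pos] = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1), (0, 1),
    (1, -1), (1, 0), (1, 1),
]


def covers(origin: Pos, size: Pos, cell: Pos) -> bool:
    """True if a design with top-left CLB `origin` and extent `size` uses `cell`."""
    return (origin[0] <= cell[0] < origin[0] + size[0]
            and origin[1] <= cell[1] < origin[1] + size[1])


def choose_king_shift(origin: Pos, size: Pos, defect: Pos, array: Pos) -> Optional[Pos]:
    """Return a one-CLB shift of the whole design that avoids `defect`.

    (0, 0) means no shift is needed; None means no single-step shift that stays
    inside the `array` (rows, cols) of physical CLBs avoids the defect.
    """
    if not covers(origin, size, defect):
        return (0, 0)
    for dr, dc in KING_DIRECTIONS:
        new_origin = (origin[0] + dr, origin[1] + dc)
        on_chip = (new_origin[0] >= 0 and new_origin[1] >= 0
                   and new_origin[0] + size[0] <= array[0]
                   and new_origin[1] + size[1] <= array[1])
        if on_chip and not covers(new_origin, size, defect):
            return (dr, dc)
    return None


# A 3x3 design at (1, 1) inside a 5x5 array (a ring of spare CLBs around it).
print(choose_king_shift((1, 1), (3, 3), defect=(1, 1), array=(5, 5)))  # (-1, 1)
print(choose_king_shift((1, 1), (3, 3), defect=(2, 2), array=(5, 5)))  # None
```

In this toy model, no one-step shift can avoid a defect under the middle CLB of the 3x3 design, because every shifted copy still covers it.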
17. Shifting Approach
19. Shifting the Data in Eight Directions with the King Shifting Distribution
20. Tolerating Interconnect Defects
- When an interconnect defect occurs, it can easily be avoided using computer-aided design (CAD) tools. The added delay is smaller than when the defect is in a CLB.
- Some researchers suggest applying the node-covering method to interconnect resources, but it is not a very attractive solution for them.
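Avoiding a defective interconnect segment with a CAD tool amounts to routing the net again on a routing graph from which that segment has been removed. The sketch below uses a textbook Lee-style breadth-first maze router on a hypothetical grid of switch points. Production FPGA routers are far more elaborate.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Node = Tuple[int, int]  # a switch point in a hypothetical rows x cols routing grid
Segment = FrozenSet[Node]  # a wire segment joining two neighbouring switch points


def reroute(rows: int, cols: int, src: Node, dst: Node,
            defective: Set[Segment]) -> Optional[List[Node]]:
    """Lee-style breadth-first maze routing that never uses a defective segment."""
    prev: Dict[Node, Optional[Node]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[Node] = []
            step: Optional[Node] = node
            while step is not None:  # walk back to the source
                path.append(step)
                step = prev[step]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in prev
                    and frozenset((node, nxt)) not in defective):
                prev[nxt] = node
                queue.append(nxt)
    return None  # the defects disconnect src from dst


clean = reroute(3, 3, (0, 0), (0, 2), set())
detour = reroute(3, 3, (0, 0), (0, 2), {frozenset({(0, 1), (0, 2)})})
print(len(clean) - 1, len(detour) - 1)  # 2 4
```

The detour uses two extra segments. In this model, that is the added delay of avoiding the defective wire.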
21. Fault Tolerant FPGA
- Today's FPGA devices are not fault tolerant, because:
- Manufacturers gain no particular cost benefit from it.
- Fault tolerance can be partially achieved at the board or system level.
- Solutions are based on:
- Chip level
- Node-covering method, CAD-tool method, laser techniques
- Unfortunately, these methods present several problems.
- When a fault occurs, the user must contact the manufacturer, which leaves the customer perpetually dependent on the manufacturer.
- Board or system level
- Recommended for fault tolerance, since chip-level methods are complex and expensive.
22. Board or System Level Fault Tolerance
- Some researchers proposed solving the problem at the board level using a low-overhead approach. This approach achieves fault tolerance by partitioning the FPGA into several tiles, and some CLBs within each tile are used as spares. When a fault is detected in one tile, only that tile is reconfigured, using partial reconfiguration, so that the fault is avoided.
- For instance, consider a Boolean function Y (AB) (CD) implemented in a tile containing 4 CLBs, where the configuration has one spare CLB. Upon detecting a fault, an alternate tile configuration is activated. This concept is shown in the following figure.
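The tile scheme can be sketched as follows. One alternate configuration is precomputed for each possible spare position. Upon a fault, the alternate that leaves the faulty CLB unused is loaded. The slide does not say how Y is split across the CLBs, so the sketch assumes three logic blocks (hypothetical names `g1` to `g3`) on a 4-CLB tile with one spare. The function names are also hypothetical.

```python
from typing import Dict, List, Sequence


def build_alternates(blocks: Sequence[str], tile_size: int) -> Dict[int, List[str]]:
    """Precompute one tile configuration per possible spare position.

    Alternate k leaves CLB k of the tile unused and places the logic blocks,
    in order, on the remaining CLBs ("" marks the unused CLB).
    """
    if len(blocks) != tile_size - 1:
        raise ValueError("this sketch assumes exactly one spare CLB per tile")
    alternates: Dict[int, List[str]] = {}
    for spare in range(tile_size):
        remaining = iter(blocks)
        alternates[spare] = ["" if clb == spare else next(remaining) for clb in range(tile_size)]
    return alternates


def reconfigure_tile(alternates: Dict[int, List[str]], faulty_clb: int) -> List[str]:
    """Pick the alternate that leaves the faulty CLB unused (loaded by partial reconfiguration)."""
    return alternates[faulty_clb]


tile = build_alternates(["g1", "g2", "g3"], tile_size=4)  # g1..g3: hypothetical blocks of Y
print(tile[3])                    # ['g1', 'g2', 'g3', '']  (e.g. spare in the last CLB)
print(reconfigure_tile(tile, 1))  # ['g1', '', 'g2', 'g3']
```

Only the affected tile's configuration changes, so the rest of the FPGA keeps running during the repair.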
24. Fault Detection, Diagnosis, and Defect/Fault Tolerance in NEW FPGA Generations
- Two types of FPGA:
- Static FPGA
- Dynamically reconfigurable FPGA (DRFPGA), an emerging market
- The structure we studied earlier is a static FPGA.
- Today's market is geared towards DRFPGAs. Can we therefore apply the fault detection, diagnosis, and defect and fault tolerance approaches studied earlier for static FPGAs to DRFPGAs?
- To answer this question, we must first study the DRFPGA structure.
25. Structure of the New FPGA Generations
- Two types of DRFPGA:
- Partially reconfigurable FPGA
- This type permits reconfiguration of some logic blocks and wire segments while other programmable hardware is busy in functional mode.
- Switch-context FPGA
- This concept as a whole is still at the research stage. This type of FPGA can change from one context to another in only one clock period. The user can store several designs in the multiple configuration memories, one design per memory, and then switch from one design to another in one clock period.
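The two DRFPGA types can be contrasted with a toy model. The model keeps several configuration memories (contexts) and allows partial rewrites of the active one. Combining both features in one class is a simplification for illustration. `MultiContextFPGA` and its methods are hypothetical and do not correspond to any vendor API.

```python
from typing import Dict, List


class MultiContextFPGA:
    """Toy model of a dynamically reconfigurable FPGA (hypothetical, not a vendor API).

    Each context is one configuration memory holding a complete design, stored
    as one configuration word per CLB.
    """

    def __init__(self, num_contexts: int, num_clbs: int) -> None:
        self.num_clbs = num_clbs
        self.contexts: List[List[str]] = [[""] * num_clbs for _ in range(num_contexts)]
        self.active = 0  # index of the configuration memory driving the logic

    def load(self, context: int, design: List[str]) -> None:
        """Store a complete design in one configuration memory."""
        if len(design) != self.num_clbs:
            raise ValueError("design must provide one word per CLB")
        self.contexts[context] = list(design)

    def switch_context(self, context: int) -> None:
        """Switch-context FPGA: select another stored design (one clock period in hardware)."""
        self.active = context

    def partial_reconfigure(self, updates: Dict[int, str]) -> None:
        """Partially reconfigurable FPGA: rewrite only some CLBs of the active design."""
        for clb, word in updates.items():
            self.contexts[self.active][clb] = word

    def current(self) -> List[str]:
        return list(self.contexts[self.active])


fpga = MultiContextFPGA(num_contexts=2, num_clbs=3)
fpga.load(0, ["a0", "a1", "a2"])
fpga.load(1, ["b0", "b1", "b2"])
fpga.switch_context(1)
fpga.partial_reconfigure({0: "c0"})
print(fpga.current())  # ['c0', 'b1', 'b2']
```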
26. Structure of Dynamically Reconfigurable FPGA
27. Detecting and Diagnosing Faults in the New FPGA
- The new FPGA generations do not introduce any new components that would require a new fault model. The fault model is therefore the same as the one adopted for static FPGAs.
- As explained earlier, some methods are based on the programmability of the FPGA, and others are based on modifying the FPGA structure for the purpose of testing or diagnosis.
- The methods based on FPGA programmability can be used for DRFPGAs, but not directly: most of them would require changes. The following table reflects the difficulty of these changes.
28. Tolerating Faults and Tolerating Defects in the New FPGA
- Approaches designed for static FPGAs are generally difficult to adopt for the new FPGAs, since these approaches rely on exact knowledge of the FPGA's structural details.
- Researchers believe that each approach requires some changes. The following table reflects the difficulty of adapting these approaches.
29. Conclusion and Future Research Directions
- SRAM-based FPGAs present several advantages:
- Reconfigurability, which provides the flexibility to implement several designs in a single FPGA without changing the hardware
- Approaches for testing and fault tolerance were introduced based on this reconfigurability feature.
- Fault detection: most methods are based on the reconfigurability feature.
- Diagnosis: most methods are improved versions of the fault detection methods.
- Defect and fault tolerance: has attracted less interest than testing. More studies are available on defect tolerance than on fault tolerance.
- Other FPGA structures: in addition to the 2-D array of CLBs, other SRAM-based FPGA structures, such as hierarchical and dynamically reconfigurable ones, are available.
30.
- Fault model: academic research must be performed in conjunction with the manufacturers, and researchers must collect more information about the structures of FPGAs.
- Fault detection and diagnosis: so far, no research has targeted fault detection or diagnosis of the entire FPGA chip area. Research studies until now have treated fault detection and diagnosis separately.
- On-line testing: this is a very difficult issue within the framework of fault tolerance. Knowing how to achieve on-line testing for a standard 2-D FPGA would be useful.