Title: Strategies for Post-Silicon Debug of Complex Integrated Circuits and Systems-on-Chip
- Brad Quinton,
- Dept. of Electrical and Computer Engineering,
- University of British Columbia
- Vancouver, BC
Bugs, bugs, everywhere...!
- Intel Core 2 Duo: 50-page errata, 75 known bugs
- IBM PowerPC 750GX: 27-page errata, 13 known bugs
- AMD Opteron: 95-page errata, 71 known bugs
- ...including the now-infamous quad-core TLB bug.
Bugs, bugs, everywhere...!
- Data TLB Eviction Condition in the Middle of a Cacheline Split Load Operation May Cause the Processor to Hang.
- An stfd of an uninitialized FPR can hang the processor.
- Multiprocessor Coherency Problem with Hardware Prefetch Mechanism.
- Short Nested Loops That Span Multiple 16-Byte Boundaries May Cause a Machine Check Exception or a System Hang.
- ...and it goes on.
The Culprit: Design Complexity
- The complexity of IC design continues to increase dramatically
- Moore's Law scaling enables ever-increasing integration of functionality on a single chip
- The demand for low-power, low-cost devices (iPods, cell phones, automotive, notebooks) continues to drive this integration
- Multi-core processors, integrated memory controllers, GPUs, ... everyone is making SoCs
The Effect: Costs and Risks?
- The time and cost to go from device specification to production release continue to increase
- At the same time, the risks involved are also being magnified
- It is possible to be $20 million into a project and still not have a meaningful answer to the question "When can we release this device...?"
Outline
- IC Development Overview
- The Need for Post-Silicon Debug
- Existing Debug Solutions
- Our New Design-for-Debug (DFD) Infrastructure
- Design for Debug Going Forward
IC Development
[Figure: IC development flow. Pre-Silicon: Verification (complexity). Post-Silicon: Validation (complexity + visibility). Also shown: a Re-Spin path and a $1.1 million cost annotation.]
Validation: A High-Stakes Game
- The cost of validation escapes is enormous (the Pentium FDIV bug cost Intel $475 million)
- However, time-to-market pressure is also at its peak during the validation phase
The Need for Post-Silicon Debug
The device won't boot. Now what?!
- The validation process inevitably follows this pattern:
  - The first packaged device arrives in the lab after manufacturing test is complete.
  - It is installed in a socket on a custom-designed printed circuit board (PCB): the validation board.
  - The validation engineer will power the device and attempt to start running basic tests.
  - At some point, the device will not behave as expected.
  - The debug begins.
Visibility is Key
- The validation process has the advantage of real-time operation and real-world stimulus
- Unfortunately, it is severely hindered by the lack of internal visibility and control
- IC integration has only made the problem worse by moving buses and component interconnects inside the device
Existing Solutions
Existing Solutions
- Software-Based: software monitor routines and processor-specific hardware allow some visibility
- Test Feature-Based: the design-for-test (DFT) structures are re-purposed for functional debug
- In-Circuit Emulation: a special bond-out version of the device is created that mirrors key internal signals on external device pins
- On-chip Emulation: dedicated debug logic runs in parallel with the normal device logic (our solution)
On-chip Emulation
- On-chip Emulation solves many of the problems of other methodologies:
  - Dedicated circuits have little or no impact on the normal behaviour of the device
  - Internal observability can be extended beyond the state of the software
  - The debug logic can run at high speed without requiring high-speed I/O
Emerging Implementations
- IBM Cell BE: Trace Logic Analyzer (TLA) for storing and viewing internal signals (IEEE TVLSI 2007)
- AMD Opteron: HyperTransport Trace Buffer (HTTB) for observing inter-core and inter-device transactions (IEEE D&T 2007)
- DAFCA ClearBlue: proprietary DFD infrastructure targeting SoCs (DAC 2006)
Our Proposal
Our Proposal
- The existing solutions are ad hoc and design-specific
- We are interested in a more universal solution
- To achieve this, we use programmable logic
- This provides the flexibility needed to extend debug throughout the SoC
Reconfigurable DFD
[Figure: Design for Debug built from programmable logic and a reconfigurable network]
Framework
Programmable Logic Cores (PLCs) are blocks of reconfigurable logic embedded in the SoC, i.e., embedded FPGAs.
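To make "reconfigurable logic" concrete, the sketch below models a lookup table (LUT), the basic element of typical FPGA fabrics: the function it computes is just a table of configuration bits, so rewriting the bits changes the logic without changing the hardware. This is a generic illustration assuming a simple k-input LUT; the class `LUT` is hypothetical and is not a model of any specific PLC architecture.

```python
from typing import List, Sequence


class LUT:
    """A k-input lookup table, the basic reconfigurable element of an FPGA fabric."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.table: List[int] = [0] * (1 << k)  # one configuration bit per input pattern

    def configure(self, truth_table: Sequence[int]) -> None:
        """Load a new logic function by rewriting the configuration bits."""
        if len(truth_table) != (1 << self.k):
            raise ValueError("truth table must have 2**k entries")
        self.table = [bit & 1 for bit in truth_table]

    def evaluate(self, inputs: Sequence[int]) -> int:
        """Use the input bits as an address into the table (inputs[0] is the LSB)."""
        if len(inputs) != self.k:
            raise ValueError("expected exactly k inputs")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.table[address]


lut = LUT(2)
lut.configure([0, 0, 0, 1])  # configured as a 2-input AND
print(lut.evaluate([1, 1]))  # 1
lut.configure([0, 1, 1, 0])  # same "hardware", reconfigured as XOR
print(lut.evaluate([1, 1]))  # 0
```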
High-level Architecture
- Observability
  - Select signals using the network
  - Process these signals with the PLC (triggers, compression, ...)
  - Return the test results
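The observability flow above can be modelled in software as a trigger-driven trace buffer. This is a minimal sketch, assuming cycle-by-cycle samples represented as Python dictionaries; `capture_trace`, `depth`, and `post_trigger` are hypothetical names, and real trigger logic in the PLC would be synthesized hardware rather than a Python callback.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Optional

Sample = Dict[str, int]  # one clock cycle: signal name -> sampled value


def capture_trace(
    cycles: Iterable[Sample],
    selected: List[str],
    trigger: Callable[[Sample], bool],
    depth: int,
    post_trigger: int,
) -> List[Sample]:
    """Model a trigger-driven circular trace buffer.

    Only the `selected` signals are kept (the network's job); the trigger sees
    the same selected signals (the PLC's job). The buffer always holds the most
    recent `depth` samples. Once the trigger fires, capture continues for
    `post_trigger` more cycles and then stops, freezing the buffer.
    """
    buffer: Deque[Sample] = deque(maxlen=depth)  # oldest samples drop off
    remaining: Optional[int] = None  # None means the trigger has not fired
    for sample in cycles:
        observed = {name: sample[name] for name in selected}
        buffer.append(observed)
        if remaining is None:
            if trigger(observed):
                remaining = post_trigger
        else:
            remaining -= 1
        if remaining == 0:
            break
    return list(buffer)


# Hypothetical example: stop shortly after an error flag is seen.
cycles = [{"valid": 1, "addr": i, "err": int(i == 5)} for i in range(10)]
trace = capture_trace(cycles, ["addr", "err"], lambda s: s["err"] == 1,
                      depth=4, post_trigger=1)
print([s["addr"] for s in trace])  # [3, 4, 5, 6]
```

Because the buffer is circular, the captured window includes the cycles leading up to the trigger as well as the `post_trigger` cycles after it.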
High-level Architecture
- Signal Control
  - Create circuits in the PLC that interact with the device
  - Selectively override signals using the network
  - Observe results
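Signal control amounts to placing a multiplexer in front of each controllable signal. The sketch below models that selection in Python, assuming the override enables and override values are supplied by circuits in the PLC; `apply_overrides` and the signal names are hypothetical.

```python
from typing import Dict, Set


def apply_overrides(
    functional: Dict[str, int],
    override_enable: Set[str],
    override_value: Dict[str, int],
) -> Dict[str, int]:
    """Model a per-signal 2:1 override multiplexer.

    For each signal, the mux chooses between the value driven by the normal
    device logic and a value supplied by debug logic (here, the PLC).
    Signals whose enable is not set pass through unchanged.
    """
    return {
        name: override_value[name] if name in override_enable else value
        for name, value in functional.items()
    }


normal = {"req": 1, "grant": 0, "stall": 0}
forced = apply_overrides(normal, {"grant"}, {"grant": 1})
print(forced)  # {'req': 1, 'grant': 1, 'stall': 0}
```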
High-level Architecture
- Error Detect/Correct
  - Interrupt block output signals
  - Manipulate these signals using the PLC logic
  - Create new device behaviour
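Error detection and correction follows the same pattern, except that the PLC logic decides, cycle by cycle, whether the intercepted output needs repair. The sketch below is a hypothetical example, assuming a block that should produce a one-hot grant vector but occasionally asserts more than one grant; `intercept`, the detection rule, and the repair (keep the lowest-numbered grant) are illustrative choices, not a fix described in the source.

```python
from typing import Callable, List, Tuple


def intercept(
    block_output: int,
    detect: Callable[[int], bool],
    correct: Callable[[int], int],
) -> Tuple[int, bool]:
    """Pass a block's output through reconfigurable check/repair logic.

    Returns the (possibly corrected) value and a flag that could be logged
    or used to raise an interrupt when an error was detected.
    """
    if detect(block_output):
        return correct(block_output), True
    return block_output, False


# Hypothetical bug: an arbiter that should emit a one-hot (or all-zero)
# grant vector occasionally asserts more than one grant.
def not_one_hot(grants: int) -> bool:
    return grants != 0 and (grants & (grants - 1)) != 0


def keep_lowest(grants: int) -> int:
    return grants & -grants  # isolate the lowest set bit


outputs: List[int] = [0b0001, 0b0110, 0b0100]
print([intercept(g, not_one_hot, keep_lowest) for g in outputs])
# [(1, False), (2, True), (4, False)]
```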
Our Proposal: Key Advantages
- Enables the debug of arbitrary digital logic in the SoC.
- Allows reconfigurable, scenario-specific triggering, event filtering, and trace compression.
- Facilitates the detection and potential correction of design errors during normal operation.
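Scenario-specific event filtering and trace compression can both be built in the PLC. As one illustration, assuming a single multi-bit signal sampled every cycle, the sketch below drops uninteresting samples and then records only value changes. Change-only encoding is a generic compression choice used here for illustration, not necessarily the scheme used in this work, and `filter_and_compress` is a hypothetical name.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def filter_and_compress(
    samples: Iterable[int],
    keep: Callable[[int], bool],
) -> List[Tuple[int, int]]:
    """Event filtering followed by change-only trace compression.

    Samples rejected by `keep` are dropped entirely. Of the remaining
    samples, only those whose value differs from the last recorded value
    are stored, each tagged with its cycle number so the timing of every
    recorded change is preserved.
    """
    trace: List[Tuple[int, int]] = []
    last: Optional[int] = None
    for cycle, value in enumerate(samples):
        if not keep(value):
            continue  # event filter: ignore this cycle
        if value != last:
            trace.append((cycle, value))
            last = value
    return trace


bus = [0, 0, 3, 3, 3, 7, 0, 7, 7]  # hypothetical 9-cycle bus trace
print(filter_and_compress(bus, keep=lambda v: v != 0))  # [(2, 3), (5, 7)]
```

Nine samples reduce to two trace entries here; because both the filter and the compression rule live in programmable logic, they can be changed for each debug scenario.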
Key Challenges
- Network Topology
- Network Implementation
- Programmable Logic Interface
- Overall Area
Results
Network Topology
- We have developed a new network topology that leverages the programmability of the PLC to reduce network costs:
  - complete debug node selection flexibility
  - 50% of the area and 50% of the depth of previous topologies
  - hierarchical construction appropriate for ICs
- Details:
  - B.R. Quinton, S.J.E. Wilton, "Concentrator Access Networks for Programmable Logic Cores on SoCs," IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005.
  - B.R. Quinton, S.J.E. Wilton, "Post-Silicon Debug Using Programmable Logic Cores," Proceedings of the IEEE International Conference on Field-Programmable Technology, Singapore, pp. 241-247, December 2005.
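One way to see why the PLC's programmability can reduce network cost (an illustrative counting argument, not the derivation in the papers above): fixed-function debug logic needs each selected node on a specific input port, whereas a PLC can be reprogrammed to match whatever port order the network produces, so the network only has to deliver the right set of nodes. The sketch counts the connection patterns in each case; `routing_patterns` and the sizes (64 candidate nodes, 8 PLC inputs) are hypothetical.

```python
import math
from typing import Tuple


def routing_patterns(n_nodes: int, m_ports: int) -> Tuple[int, int]:
    """Count the connection patterns a debug network must be able to realize.

    ordered:   every PLC input port must receive one specific node, as
               fixed-function debug logic would require
    unordered: only the set of m selected nodes matters, because the PLC can
               be reprogrammed to match whatever port order the network delivers
    """
    ordered = math.perm(n_nodes, m_ports)    # n! / (n - m)!
    unordered = math.comb(n_nodes, m_ports)  # n! / (m! * (n - m)!)
    return ordered, unordered


# Hypothetical sizes: 64 candidate debug nodes feeding 8 PLC inputs.
ordered, unordered = routing_patterns(64, 8)
print(ordered // unordered)  # 40320, which is 8!
```

The ratio is always m!, so a network that only needs to concentrate the selected signals, rather than also order them, has far fewer configurations to support.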
Network Implementation
- We have developed and evaluated two network implementations, synchronous and asynchronous; each achieves high throughput:
  - synchronous networks up to 830 MHz in 90nm technology
  - asynchronous networks up to 910 MHz in 90nm technology
  - reasonable area costs
- Details:
  - B.R. Quinton, M.R. Greenstreet, S.J.E. Wilton, "Practical Asynchronous Interconnect Network Design," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 5, pp. 579-588, May 2008.
Programmable Logic Interface
- We have developed new programmable structures, integrated into a regular programmable logic fabric, that significantly increase the throughput of interface circuits:
  - system bus interfaces up to 254 MHz in 180nm
  - regular synchronous interfaces up to 694 MHz in 180nm
  - very small area increase in the PLC (less than 0.4%)
  - limited impact on the PLC interconnect
- Details:
  - B.R. Quinton, S.J.E. Wilton, "Embedded Programmable Logic Core Enhancements for System Bus Interfaces," Proceedings of the International Conference on Field-Programmable Logic and Applications, Amsterdam, pp. 202-209, August 2007.
  - B.R. Quinton, S.J.E. Wilton, "Programmable Logic Core Enhancements for High Speed On-Chip Interfaces," accepted for publication in IEEE Transactions on VLSI, 2009.
Area Overhead
- To understand the area overhead of our scheme for a range of ICs, we created a set of parameterized models
- We used a 90nm standard cell process
- We targeted the 90nm IBM/Xilinx PLC, which has a capacity of approximately 10,000 ASIC gates
- The network was implemented using standard cells
- All area numbers are post-synthesis but pre-layout
Area Overhead
- 20M-gate device: 7,200 signals for 5% area overhead
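As a back-of-the-envelope reading of this data point, assuming the 5% overhead is measured relative to the 20M-gate device and covers both the network and the PLC, the debug area budget per observable signal works out as follows. This averages the fixed PLC cost over all signals, so it is not a per-signal network cost.

```latex
A_{\mathrm{DFD}} \approx 0.05 \times \left(20 \times 10^{6}\ \text{gates}\right) = 1 \times 10^{6}\ \text{gate-equivalents},
\qquad
\frac{A_{\mathrm{DFD}}}{N_{\mathrm{sig}}} \approx \frac{1 \times 10^{6}}{7200} \approx 139\ \text{gate-equivalents per signal}
```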
Design for Debug Going Forward
Ongoing Research
- How do we select the correct debug nodes?
- How do we judge the quality of our debug infrastructure before it is used? What is the coverage metric?
- How do we integrate hardware DFD with software debug?
- How do we integrate DFD with DFT? What is the overlap? Can this reduce costs?
- Can we use software algorithms to infer the value of nodes that have not been directly observed?
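A simple illustration of inferring unobserved nodes is forward logic implication: wherever the observed values fully determine a gate's output, that output becomes known without being traced. The sketch below is a minimal model under stated assumptions (a purely combinational netlist of AND, OR, and NOT gates, with each value 0, 1, or unknown); the netlist format and `infer` are hypothetical, and a fuller approach could also reason backward from outputs and across clock cycles.

```python
from typing import Dict, List, Optional, Tuple

# A gate: (output name, operator, input names). Unknown values are absent or None.
Gate = Tuple[str, str, List[str]]


def infer(netlist: List[Gate], observed: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Forward-propagate observed values through combinational gates.

    AND is 0 if any input is 0 and 1 if all are 1; OR is 1 if any input is 1
    and 0 if all are 0; NOT inverts a known input. Passes repeat until no new
    value is learned, so gate order in the netlist does not matter.
    """
    values: Dict[str, Optional[int]] = dict(observed)
    changed = True
    while changed:
        changed = False
        for out, op, ins in netlist:
            if values.get(out) is not None:
                continue  # already known
            vals = [values.get(name) for name in ins]
            result: Optional[int] = None
            if op == "AND":
                if 0 in vals:
                    result = 0
                elif all(v == 1 for v in vals):
                    result = 1
            elif op == "OR":
                if 1 in vals:
                    result = 1
                elif all(v == 0 for v in vals):
                    result = 0
            elif op == "NOT" and vals[0] is not None:
                result = 1 - vals[0]
            if result is not None:
                values[out] = result
                changed = True
    return values


netlist = [("n1", "AND", ["a", "b"]), ("n2", "NOT", ["n1"]), ("y", "OR", ["n2", "c"])]
print(infer(netlist, {"a": 0, "c": 0}))  # b stays unknown; n1, n2 and y are inferred
```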
End.