Title: Sadik Ezer and Scott Johnson
1. Smart Diagnostics for Configurable Processor Verification
- Sadik Ezer and Scott Johnson
2. Processor Verification Challenges
complexity
time to market
design
verification
3. Simulation Based Verification
Test Vectors
Assertions
Transactors
Protocol Checkers
Assembly
Co-simulation
C/C++
DUT
SystemC
Monitors
HDL
Functional Coverage
Vera
e
INSTIGATORS
CHECKERS
DESIGN
4. Stimuli for Processor Verification
DUT
Inbound Request
Diagnostics
Bus
Memory
Responses
- Memory Busy
- Memory Errors
Asynchronous Events
5. Ways to Generate External Events
- Magic stores in the assembly code to generate:
  - Interrupts and stalls
  - Bus errors and bus delays
  - Memory busy and memory errors
  - Inbound bus requests
  - Drawback: changes the test flow, because store instructions are needed
- A text file that specifies at what cycles to generate the different types of external events
  - Has synchronization problems
  - Results in non-maintainable tests
6. Embedded Testbench Control
- Decoupled assembly diagnostics and test-bench infrastructure cannot address:
  - Synchronization of instruction execution and external events
  - Controlling test-bench functions inside the assembly test
  - Checking based on the internal processor states or I/O
  - Using coverage feedback during simulation
- Embedded Testbench Control (ETC) is a methodology that addresses all of the above
  - Smart Diagnostics are assembly tests that use this approach
7. Smart Diagnostics
- The task that calls test-bench functions is embedded inside the assembly code, and synchronization with instruction execution is achieved through special syntax
- Special syntax: @ attaches test-bench code that is synchronized with instruction execution; special syntax is also used for the definition of the test-bench functions
8. Assembly and Test-bench Synchronization
- Finding the commit (W) stage PC where the test-bench task will be executed
Smart Diagnostic:
  INSTR_A @ my_func0
  INSTR_B @ my_func1
Assembly (after the ETC Pre-processor):
  ETC_LABEL0: INSTR_A
  ETC_LABEL1: INSTR_B
Disassembly (after Compile and Link):
  40000854 <ETC_LABEL0>:
  40000854: INSTR_A
  40000a56 <ETC_LABEL1>:
  40000a56: INSTR_B
Labels.txt (after Parse Disassembly):
  00000002 // count of labels
  40000854 // Label0
  40000a56 // Label1
Testbench Code:
  Read_labels(labels.txt)
  case(PC_W)
    label0: my_func0
    label1: my_func1
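The flow on this slide can be sketched in Python. This is only an illustration of the idea, not the ETC tool itself: the function names (`preprocess`, `parse_disassembly`, `labels_file`, `build_dispatch`) are hypothetical, the marker is assumed to be written `@`, and the disassembly is assumed to be an objdump-style listing in which each label appears as `<address> <LABEL>:`.

```python
import re

def preprocess(smart_diag_lines):
    """Split 'INSTR @ func' lines into labeled assembly plus a label -> task map."""
    asm, tasks = [], {}
    for line in smart_diag_lines:
        instr, sep, func = line.partition("@")
        if sep:  # line carries a test-bench call
            label = f"ETC_LABEL{len(tasks)}"
            tasks[label] = func.strip()
            asm.append(f"{label}: {instr.strip()}")
        else:
            asm.append(line.strip())
    return asm, tasks

# Assumed listing format: 8 hex digits, a space, then <ETC_LABELn>:
LABEL_RE = re.compile(r"^([0-9a-fA-F]{8}) <(ETC_LABEL\d+)>:")

def parse_disassembly(disasm_text):
    """Return {label: pc} for every ETC label found in the listing."""
    pcs = {}
    for line in disasm_text.splitlines():
        m = LABEL_RE.match(line.strip())
        if m:
            pcs[m.group(2)] = int(m.group(1), 16)
    return pcs

def labels_file(pcs):
    """Render labels.txt: label count first, then one PC per label in label order."""
    ordered = sorted(pcs, key=lambda name: int(name[len("ETC_LABEL"):]))
    lines = [f"{len(ordered):08x} // count of labels"]
    lines += [f"{pcs[name]:08x} // {name}" for name in ordered]
    return "\n".join(lines)

def build_dispatch(pcs, tasks):
    """Map commit-stage PC -> test-bench task name (the case(PC_W) table)."""
    return {pcs[label]: func for label, func in tasks.items()}

# Values taken from the slide.
asm, tasks = preprocess(["INSTR_A @ my_func0", "INSTR_B @ my_func1"])
disasm = """40000854 <ETC_LABEL0>:
40000854: INSTR_A
40000a56 <ETC_LABEL1>:
40000a56: INSTR_B"""
pcs = parse_disassembly(disasm)
dispatch = build_dispatch(pcs, tasks)
```

In a test bench, the `dispatch` table plays the role of the `case(PC_W)` statement: when the commit-stage PC matches a key, the associated task is called.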
9. Xtensa LX Configurable Processor
Instruction Fetch / Decode
Processor Controls
Trace/JTAG/OCD
Designer-defined FLIX parallel execution
pipelines - N wide
Base ISA Execution Pipeline
Interrupts, Breakpoints, Timers
Designer Defined Execution Units, Register Files
and Interfaces
Designer Defined Execution Units, Register Files
and Interfaces
Register File
Local Instruction Memories
Base ALU
Optional Execution Units
Processor Interface (PIF) to System Bus
External Bus Interface
Designer Defined Queues / Ports up to 1M Pins
Designer Defined Execution Unit
Local Data Memories
Base ISA Feature
Vectra LX DSP Engine
Configurable Functions
Xtensa Local Memory Interface
Optional Function
Data Load/Store Unit
Load/Store Unit 2
Optional Configurable
Designer Defined Features (TIE)
10. Xtensa LX Verification Infrastructure
Simulation Suites
Assembly Test
ISS
ETC
Co-simulation
System Memory
DUT: Xtensa LX Processor
Data Cache
Inst. Cache
Data RAM
Inst. RAM
Data ROM
Inst. ROM
TIE Port
XLMI (Data Port)
JTAG
External Event Generator
Checkers
Monitors
Coverage
11. FLIX: Flexible Length Instruction Extensions
- FLIX instructions are designer-defined 32b or 64b instructions bundling multiple operations
- Xtensa LX processor is capable of issuing one FLIX instruction per cycle
- Xtensa LX processor can freely intermix 16b/24b core instructions with wide FLIX instructions
- Higher data throughput with optional dual load/store units
- Automated or manual FLIX instruction encoding
(Figure: FLIX instruction word, bits 63 to 0)
12. Example 1: Memory Order with FLIX
- Load acquire and store release instructions act as memory fence instructions creating boundaries
- Order of loads and stores to external memory must comply with these boundaries
Basket N: STORE A0, NOP
  Address count: A0 1
LOAD ACQ
Basket N+1: LOAD A1, LOAD A2, LOAD A2, STORE A3
  Address count: A1 1, A2 2, A3 1
STORE REL
Basket N+2: STORE A1, LOAD A1, LOAD A3, STORE A2, LOAD A2, LOAD A1
  Address count: A1 3, A2 2, A3 1
STORE REL
Figure callouts: Task Definition, Synchronization Point
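One way to read the address-count table is as a checker: accesses seen on the external bus may be reordered inside a basket, but every access of a basket must complete before any access of the next one. The sketch below is a hypothetical Python model of that check, not the test-bench task from the slide; the helper names and the choice to count by address only are assumptions.

```python
from collections import Counter

FENCES = {"LOAD ACQ", "STORE REL"}  # fence operations named on the slide

def split_baskets(ops):
    """Group operations into baskets; each fence closes the current basket."""
    baskets, current = [], []
    for op in ops:
        if op in FENCES:
            baskets.append(current)
            current = []
        elif op != "NOP":  # NOPs generate no memory access
            current.append(op)
    if current:
        baskets.append(current)
    return baskets

def address_counts(basket):
    """Count accesses per address within one basket."""
    return Counter(op.split()[1] for op in basket)

def check_order(expected_counts, observed):
    """True if observed accesses drain each basket fully before the next starts."""
    pending = [Counter(c) for c in expected_counts]
    idx = 0
    for op in observed:
        addr = op.split()[1]
        while idx < len(pending) and not +pending[idx]:  # skip drained baskets
            idx += 1
        if idx == len(pending) or pending[idx][addr] == 0:
            return False  # access belongs to a later basket, or is unexpected
        pending[idx][addr] -= 1
    return all(not +c for c in pending)

# Program order from the slide.
PROGRAM = [
    "STORE A0", "NOP", "LOAD ACQ",
    "LOAD A1", "LOAD A2", "LOAD A2", "STORE A3", "STORE REL",
    "STORE A1", "LOAD A1", "LOAD A3", "STORE A2", "LOAD A2", "LOAD A1", "STORE REL",
]
baskets = split_baskets(PROGRAM)
counts = [address_counts(b) for b in baskets]
```

With these definitions, `counts` reproduces the three address-count rows of the slide, and `check_order(counts, observed)` rejects any observed sequence in which an access crosses a fence boundary.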
13. Xtensa LX TIE Ports and Queues
Xtensa LX
Export States
Import Wires
Execution Unit
OutBus <63:0>
InBus <31:0>
OutWire
InWire
Output Queue
Input Queue
OutData <255:0>
InData <255:0>
OutData_PushReq
InData_Empty
OutData_Full
InData_PopReq
ExternalQueue H/W
ExternalQueue H/W
14. Example 2: TIE Queue Deadlock
- An output queue write instruction followed by an
input queue read instruction should not be
blocked even if the queue read instruction stalls
due to the input queue being empty.
15. Coverage Oriented Diagnostics with ETC
- Linking assembly tests and coverage module functions
  - ETC gives a handle to the assembly to query the coverage database any time during simulation
- Dynamic coverage feedback to determine diagnostic status
  - Tests pass or fail based on the coverage result
  - Helps identify broken or obsolete diagnostics
- Enhancing random diagnostics based on the coverage feedback
  - Control testbench to improve coverage
- Timely termination of structured random diagnostics when coverage goal is reached
  - Instead of running a predefined number of cycles, simulation stops when coverage goal is met
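The last point, stopping on a coverage goal instead of a fixed cycle count, can be illustrated with a small Python model. Everything here is a stand-in: `run_until_covered` is a hypothetical name, the bin names are invented for illustration, and a seeded random choice replaces the real stimulus and coverage database.

```python
import random

def run_until_covered(bins, goal, max_cycles, seed=0):
    """Run until the fraction of hit bins reaches `goal` or the cycle budget ends.

    Returns (cycles_run, goal_met).
    """
    rng = random.Random(seed)
    hit = set()
    for cycle in range(1, max_cycles + 1):
        hit.add(rng.choice(bins))          # stand-in for one cycle of random stimulus
        if len(hit) / len(bins) >= goal:   # coverage query issued during simulation
            return cycle, True
    return max_cycles, False

# Hypothetical coverage bins.
BINS = ["ibreak", "icount", "illegal", "ibreak_icount_ill"]
cycles, met = run_until_covered(BINS, goal=1.0, max_cycles=10_000)
```

The same return value also supports the pass/fail use described above: a run that exhausts its budget with `goal_met` false flags a diagnostic that no longer reaches its intended scenario.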
16. Example 3: Multiple Exception Coverage
RUN CASE
  // Write 0x1 to icount level register
  movi a5, 0x00000001
  wsr a5, 237
  isync
  beq a4, a2, PASS_1073741891_2
  j FAIL
PASS_1073741891_2:
  nop @ check_coverage(ibreak_icount_ill)
  nop

task check_coverage(string cov_name) {
  integer bin_count;
  string bin_name;
  bin_count = exception_cov.triple_exc.query(FIRST);
  while (bin_count != -1) {
    bin_name = exception_cov.triple_exc.query_str(NAME);
    if (bin_name == cov_name)
      if (bin_count < 1)
        fail(bin_name);
    bin_count = exception_cov.triple_exc.query(NEXT);
  }
}
COVERAGE DATA COLLECTED
CHECK
FAIL
FOREACH BIN
GET COVERAGE
COVERAGE DATA QUERIED
CHECK
FAIL
PASS
17. Results
- Comparative study of smart diagnostics and regular diagnostics with random external events
- ETC improves coverage and simulation performance
  - Multiple exceptions
  - TIE Queue Contention
  - Load/Store/Ifetch arbitration
  - Arbitration of pipeline loads/stores and inbound processor requests to processor memories
- ETC improves generation of corner cases with high coincidence number
18. ETC Limitations
- Smart Diagnostics cannot be a replacement for random testing
  - Can be used to enhance its coverage
- ETC has limitations on dynamically scheduled superscalar processors
  - Harder to synchronize with multiple PCs
- ETC only enables one-way communication between the assembly and the test-bench code
  - This can be improved by allowing the testbench to modify data used by the diagnostic program
19. Conclusions
- A unified methodology enhances processor verification
  - Linking assembly and test-bench functions gives more flexibility and power in creating interesting scenarios
- ETC improves
  - QUALITY and PRODUCTIVITY
    - In test generation and checking
  - EFFICIENCY
    - By improving simulation performance
  - MAINTAINABILITY
    - By identifying broken tests and helping reuse