Title: Modeling and Validation of Programmable Embedded Systems
1Modeling and Validation of Programmable Embedded
Systems
- Prabhat Mishra
- Dept. of Computer and Information Science and Engineering
- University of Florida
2Outline
- Ongoing research
- Modeling and Validation of Programmable Embedded Systems
- Programmable embedded systems
- Motivation
- Traditional validation techniques
- Language-driven validation methodology
- Conclusion
3Design Automation of Embedded Systems
[Figure: embedded-system design flow. Labels: Concept; Specification; HW/SW Partitioning; Estimation - Exploration; Hardware, Design (Synthesis, Layout, ...), Hardware Components; Software, Design (Compilation, ...), Software Components; Validation and Evaluation (area, power, performance, ...)]
9Ongoing Research
- Validation of Specification
- ACM TECS 2004, Kluwer DAES 2003, DATE 2002, ASPDAC 2002
- Design Space Exploration
- ACM TECS 2004, VLSI 2004, RSP 2003, ISSS 2001, VLSI 2001
- Instruction-Set Simulation
- CODES+ISSS 2005, DAC 2003, CODES+ISSS 2003
- Functional Test Generation
- DATE 2005, DATE 2004, HLDVT 2002
- Equivalence Checking
- IEEE Design & Test 2004, IJES 2005
12Programmable Embedded Systems
- Computing is an integral part of daily life
- Two types of computing systems
- Desktop-based systems
- PCs, laptops, workstations, servers, ...
- Embedded systems
- handheld and household items, military and medical equipment
- Difference
- Application specific versus general purpose
- Commonality
- Use processor, co-processor, and memories to execute application programs
- Programmable Embedded Systems
- Programmable Architectures
13Programmable Embedded Systems
[Figure: block diagram of a programmable embedded system inside an embedded system. Blocks: Processor Core, Coprocessor, Coprocessor, Memory Subsystem, DMA Controller, ASIC / FPGA, A2D Converter, D2A Converter, Sensors / Actuators]
15Technology and Demand
Technology: the number of transistors doubles every 2 years
Demand: communication, multimedia, entertainment, networking
Exponential growth of design complexity -> verification complexity
16North America Re-spin Statistics
[Chart: 1st silicon success rate for 1999, 2002, and 2004; values and axis labels recovered from the chart: 100, 48, 44, 39]
Source: 2002 Collett International Research and Synopsys
71% of SOC re-spins are due to logic bugs
17Functional Verification of SOC Designs
[Chart: Logic Gates, Simulation Vectors, and Engineer Years for 1995, 2001, and 2007; tick labels recovered from the chart: 1M, 10M, 100M, 100M, 10B, 1000B, 20, 200, 2000]
Source: Synopsys
Source: G. Spirakis, keynote address at DATE 2004
18Functional Validation of Microprocessors
- Functional validation is a major bottleneck
- Deeply pipelined complex micro-architectures
- Logic bugs increase 3-4 times per generation
- The (exponential) increase in bugs is linear in the growth of design complexity.
19Outline
- Motivation
- Traditional validation techniques
- Language-driven validation methodology
- Complements existing techniques
- Conclusion
- Future research directions
20Traditional Validation Approach
Manual Process
22Bottlenecks of Functional Verification
- Bottom-up methodology
- Lack of a golden reference model
- Difficult to find micro-architectural bugs
- Uses reverse-engineering (abstraction) methods
- Specification has all the details
- Lack of a suitable functional coverage metric
- Code coverage, toggle coverage not sufficient
- Cannot determine if all pipeline interactions (with hazards/exceptions) are considered.
- Approach
- A top-down validation methodology
- Complements existing bottom-up techniques
23Proposed Top-down Validation Methodology
http://www.ics.uci.edu/~express
ADL: Architecture Description Language
24Test Generation
25Functional Validation of Pipelined Processors
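[Figure: the test generator (TestGen) produces a test program, the pipelined processor executes it, and the result is checked]
Test Program:
MOV R1, 011
MOV R2, 010
ADD R3, R1, R2
Expected result: R3 = 101
Check Result: R3 = 101?
Verifies the functionality of the processor using assembly programs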
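The check on this slide can be mimicked with a very small reference interpreter. The sketch below is only an illustration: the tuple encoding of instructions and the `run` and `check` helpers are assumptions made here, not part of the original framework.

```python
def run(program):
    """Execute (opcode, operands...) tuples and return the register file."""
    regs = {}
    for op, *args in program:
        if op == "MOV":                 # MOV Rd, imm
            rd, imm = args
            regs[rd] = imm
        elif op == "ADD":               # ADD Rd, Rs1, Rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs


def check(program, expected):
    """Return True if every expected register holds the expected value."""
    regs = run(program)
    return all(regs.get(reg) == value for reg, value in expected.items())


# The test program from the slide, with binary immediates.
test_program = [
    ("MOV", "R1", 0b011),
    ("MOV", "R2", 0b010),
    ("ADD", "R3", "R1", "R2"),
]
expected = {"R3": 0b101}
passed = check(test_program, expected)
```

In a real flow the expected value would come from the reference model and the observed value from the implementation under test; here both come from the same toy interpreter.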
26Functional Validation of Pipelined Processors
Test generation is considered in this work
27Related Work: Test Generation
- Directed test program generation
- Aharon et al., DAC 1995
- Shen et al., DAC 1999
- Pipeline behavior is not considered
- Test generation for pipelined processors
- Ur and Yadin, DAC 1999
- Iwashita et al., ICCAD 1994
- Campenhout et al., DAC 1999
- No coverage metric for pipeline interactions
- Functional test program generation
- Chen et al., DAC 2003, Lai et al., DAC 2001
- Applied in the context of manufacturing testing
28Functional Test Program Generation
- Processor model
- Graph model for pipelined processors
- Functional fault model
- Pipeline interactions based on graph coverage
- Coverage-directed test generation technique
- Model checker to generate test programs
- write the negation of the property to be verified
- model checker generates example to disprove
(counter-example)
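The counterexample idea can be sketched with a tiny explicit-state search standing in for the model checker. Everything below is an assumption for illustration: the two-register toy design, the `find_counterexample` helper, and breadth-first search in place of SMV. The asserted (negated) property is "the registers never hold (2, 3)", so any counterexample is an input sequence that produces exactly that state.

```python
from collections import deque


def find_counterexample(initial, inputs, step, violates, max_depth=8):
    """Breadth-first search for the shortest input sequence that reaches
    a state violating the asserted (negated) property."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if violates(state):
            return trace
        if len(trace) >= max_depth:
            continue
        for inp in inputs:
            nxt = step(state, inp)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [inp]))
    return None


# Toy design: two registers, one "MOV reg, imm" instruction per cycle.
INPUTS = [("MOV", reg, imm) for reg in (0, 1) for imm in range(4)]


def step(state, inp):
    """Apply one MOV to the register tuple and return the new state."""
    _, reg, imm = inp
    regs = list(state)
    regs[reg] = imm
    return tuple(regs)


def violates(state):
    """The negated property claims this state is never reached."""
    return state == (2, 3)


test_program = find_counterexample((0, 0), INPUTS, step, violates)
```

The returned trace is the test program: replaying it from the initial state drives the design into the desired state.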
29Pipelined Processor Model
Graph Model
Graph = (Nodes, Edges)
Nodes = units ∪ storages
Edges = data-transfer edges ∪ pipeline edges
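One possible in-memory form of this graph model is sketched below. The `PipelineGraph` class and the small example instance are assumptions for illustration; only the split into units, storages, pipeline edges, and data-transfer edges comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineGraph:
    units: set = field(default_factory=set)
    storages: set = field(default_factory=set)
    pipeline_edges: set = field(default_factory=set)        # (unit, unit)
    data_transfer_edges: set = field(default_factory=set)   # (unit, storage) or (storage, unit)

    @property
    def nodes(self):
        # Nodes = units U storages
        return self.units | self.storages

    @property
    def edges(self):
        # Edges = data-transfer edges U pipeline edges
        return self.data_transfer_edges | self.pipeline_edges


# Hypothetical four-stage example, not the DLX model from the talk.
g = PipelineGraph(
    units={"Fetch", "Decode", "ALU", "WB"},
    storages={"RF"},
    pipeline_edges={("Fetch", "Decode"), ("Decode", "ALU"), ("ALU", "WB")},
    data_transfer_edges={("RF", "Decode"), ("WB", "RF")},
)
```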
30Functional Fault Model
[Figure: pipeline graph with nodes Fetch, Decode, ALU, AddrCalc, LdSt, WB, RF, and MEM]
1. Node Fault: A node does not execute correctly
- active
- stalled
- exception
- flushed
2. Edge Fault: An edge does not transfer inst./data correctly
- active
- stalled
- flushed
31Coverage-directed Test Generation
- Algorithm
- Inputs
- 1. Graph Model of the processor, G
- 2. List of possible faults, faultList
- Output: Test programs for detecting all the faults in the fault model
- begin
- TestProgramList = {}
- for each fault in the faultList
- testprog_reg = createTestProgram(fault, G)
- TestProgramList = TestProgramList ∪ testprog_reg
- endfor
- return TestProgramList
- end
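The loop above can be sketched as follows. This is a hedged illustration: `enumerate_faults`, `generate_tests`, and the stub test-program generator are hypothetical names, and the fault list is assumed to be one entry per node or edge and per condition (active, stalled, exception, flushed for nodes; active, stalled, flushed for edges).

```python
NODE_CONDITIONS = ("active", "stalled", "exception", "flushed")
EDGE_CONDITIONS = ("active", "stalled", "flushed")


def enumerate_faults(nodes, edges):
    """Build the fault list: one entry per node/edge and condition."""
    faults = [("node", n, c) for n in nodes for c in NODE_CONDITIONS]
    faults += [("edge", e, c) for e in edges for c in EDGE_CONDITIONS]
    return faults


def generate_tests(nodes, edges, create_test_program):
    """Coverage-directed loop: one test program per fault, duplicates dropped."""
    test_programs = []
    for fault in enumerate_faults(nodes, edges):
        prog = create_test_program(fault)
        if prog is not None and prog not in test_programs:
            test_programs.append(prog)
    return test_programs


def stub_generator(fault):
    """Placeholder for the model-checker-based generator (an assumption)."""
    kind, target, condition = fault
    return f"test: make {kind} {target} {condition}"


nodes = ["Fetch", "Decode", "ALU"]
edges = [("Fetch", "Decode"), ("Decode", "ALU")]
tests = generate_tests(nodes, edges, stub_generator)
```

Dropping duplicate programs mirrors the set union in the pseudocode; a real generator could also return the same program for several faults, which is where test-set reduction would come from.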
32Coverage-directed Test Generation
[Figure: pipeline graph with nodes Fetch, Decode, ALU, AddrCalc, LdSt, WB, RF, and Memory (latency 1)]
- Example: generate a test to make edge LdSt-ALU active
- Two properties need to be generated
- Make the node LdSt active at clock cycle t
- Make the node ALU active at clock cycle (t+1)
- Test Program
- LOAD R1, R5, 0x1
- NOP
- MOV R3, R1
33Test Generation Methodology
[Figure: test generation flow with manual and automatic steps. Labels: Architecture Specification, ADL Specification, Simulator Generation, SMV, Counterexamples, Test Programs, Simulator, Coverage Report, Feedback (Not Enough Properties)]
ADL: Architecture Description Language
34Test Generation Example
- Initialize registers Ain and Bin with values 2 and 3 at cycle 9
One property (negated): assert G((cycle = 8) -> X((DIV.Ain != 2) | (DIV.Bin != 3)))
Applied at the processor level: needs 375.98 sec. and 1928568 BDD nodes using a 333 MHz Sun with 128M RAM
Problem: Test generation is limited by the capacity restrictions of the tool.
Solution: Apply properties at the module level
35Modified Test Generation Methodology
Properties are applied at the module level
[Figure: module-level test generation loop. Labels: SMV Description (for node N), Property (for node N), SMV, Counterexamples, input assignments, primary i/p? (yes), output req. for parent node, N = parent of N, Simulator, coverage report, test programs]
36Test Generation Example
- Example: Initialize Ain and Bin with values 2 and 3 at cycle 9
- Apply to DIV unit
- assert G((cycle = 8) -> X((Ain != 2) | (Bin != 3)))
- input assignments: divInst.src1 = 2, divInst.src2 = 3
- Apply to Decode unit
- assert G((cycle = 7) -> X((divInst.src1 != 2) | (divInst.src2 != 3)))
- input assignments: oper = DIV R3 R1 R2, RF[1] = 2, RF[2] = 3
- Apply to Fetch unit
- assert G((cycle = 6) -> X((oper.opcode != DIV) | (oper.src1 != 1) | (oper.src2 != 2)))
- input assignments: PC = 5, Memory[5] = DIV R3 R1 R2
37Final Test Program Example
Fetch Cycle  Opcode  Dest  Src1  Src2
1            NOP
2            ADDI    R1    R0    2
3            ADDI    R2    R0    3
4            NOP
5            NOP
6            NOP
7            DIV     R3    R1    R2
- Using our modified methodology
- requires 1 sec. and 5600 BDD nodes
- 333 MHz Sun UltraSparc II with 128M RAM
- When applied at the processor level
- requires 375.98 sec. and 1928568 BDD nodes
- More than an order of magnitude improvement in time and space
38Functional Coverage
- When to end the verification effort?
- Code coverage, toggle coverage, fault coverage
- No direct relation with the device functionality
- Proposed a functional fault model for pipelined processors
- Register read/write, operation execution, execution path, and pipeline execution
- Used to define functional coverage
- Developed coverage-driven test generation algorithms
- Generates test programs to detect all the faults in the fault model
39Functional Fault Models
- Register Read/Write
- All registers are written and read.
- Operation Execution
- All operations are executable.
- Execution Path
- Each execution path (taken by an operation) works correctly
- Consists of one pipeline path and multiple data-transfer paths
- Pipeline Execution
- All pipeline interactions are activated.
40Register Read/Write Faults
- The fault can be due to an error in
- reading
- register decoding
- register storage
- prior writing
- Whatever may be the reason, the outcome is an
unexpected value.
41Operation Execution Faults
- The fault can be due to an error in
- Operation decoding
- erroneous decoding returns incorrect opcode
- Control generation
- incorrect execution unit gets selected
- Final implementation
- execution unit can be faulty
- The outcome is an unexpected result.
42Faults in Execution Path
- Execution path
- During execution of an operation, one pipeline path and one or more data-transfer paths get selected
- These activated paths are defined as the execution path
- The fault can be due to an error in any of the paths
- A path is faulty if any of its nodes or edges are faulty
- A node is faulty if it does not execute correctly
- An edge is faulty if it does not transfer data/inst. correctly
- The outcome is an unexpected result.
43Pipeline Execution Faults
- The fault can be due to an incorrect implementation of the pipeline controller
- Erroneous hazard detection
- Incorrect stalling
- Erroneous flushing
- Wrong exception handling
- The outcome is an unexpected result.
44Test Generation for Register Read/Write
- Algorithm 1
- Input: Graph model of the architecture G.
- Output: Test programs for detecting faults in reg. read/write.
- begin
- TestProgramList = {}
- for each register reg in architecture G
- value_reg = GenerateUniqueValue(reg)
- writeInst = an instruction that writes value_reg in reg.
- testprog_reg = createTestProgram(writeInst)
- TestProgramList = TestProgramList ∪ testprog_reg
- endfor
- return TestProgramList
- end
CreateTestProgram
1. Assigns values to unspecified locations
2. Creates initialization instructions for sources
3. Creates instructions for reading destinations
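Algorithm 1 can be sketched in a few lines. The mnemonics `MOVI` and `STORE`, the value scheme in `generate_unique_value`, and the memory slot names are all assumptions made for this illustration; the original CreateTestProgram also fills unspecified locations and initializes sources, which this toy version does not need.

```python
def generate_unique_value(index):
    """Hypothetical scheme: a distinct non-zero value per register."""
    return 0x1000 + index


def create_test_program(write_inst, index):
    """Append an instruction that reads the destination so the written
    value becomes observable (here: a store to a memory slot)."""
    _, reg, _ = write_inst
    return [write_inst, ("STORE", reg, f"mem[{index}]")]


def register_rw_tests(registers):
    """One test program per register: write a unique value, then read it."""
    test_programs = []
    for index, reg in enumerate(registers):
        value = generate_unique_value(index)
        write_inst = ("MOVI", reg, value)
        test_programs.append(create_test_program(write_inst, index))
    return test_programs


registers = ["R0", "R1", "R2", "R3"]
programs = register_rw_tests(registers)
```

Unique values matter because a decoding error that writes or reads the wrong register would otherwise go unnoticed when two registers hold the same value.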
45A Case Study
- Applied on two pipelined architectures
- VLIW implementation of DLX
- RISC implementation of Sparc V8 (LEON)
- Architecture Specification
- Using Architecture Description Language (ADL)
- EXPRESSION ADL
- Test generation and coverage estimation
- Random/Constrained-random test generation
- Using Specman Elite framework
- Coverage-driven test generation
- Using our test generation algorithms
46Test Generation and Coverage Estimation
[Figure: test generation and coverage estimation flow with manual and automatic steps. Labels: Architecture Specification (ADL Description); ISA Specification (e Description); Pipelined Implementation (e Description); Coverage Specification; Specman Elite (Test Generation: Random, Directed; Simulator; Coverage Estimation); External Test Programs (generated by our algorithms)]
47Coverage Estimation
- Instruction definition is used
- opcode, dest, src1, src2
- Register read/write
- coverage of src1 and src2 indicates reads.
- coverage of dest indicates writes.
- Operation execution
- coverage of opcode field
- Pipeline execution
- use variable for each stall/exception
- cross-coverage is used to estimate coverage of
multiple exception scenarios.
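The register and operation parts of this coverage estimate can be sketched from an instruction trace. The sketch assumes each instruction is a tuple (opcode, dest, src1, src2) as in the instruction definition above; `estimate_coverage` and the example trace are hypothetical, and pipeline-execution coverage (stall and exception variables, cross-coverage) is not modeled.

```python
def estimate_coverage(trace, registers, opcodes):
    """Fraction of registers read, registers written, and opcodes executed."""
    read, written, executed = set(), set(), set()
    for opcode, dest, src1, src2 in trace:
        executed.add(opcode)
        if dest in registers:
            written.add(dest)           # coverage of dest indicates writes
        for src in (src1, src2):
            if src in registers:
                read.add(src)           # coverage of src1/src2 indicates reads
    return {
        "register_read": len(read) / len(registers),
        "register_write": len(written) / len(registers),
        "operation": len(executed & set(opcodes)) / len(opcodes),
    }


registers = ["R0", "R1", "R2", "R3"]
opcodes = ["ADD", "SUB"]
trace = [
    ("ADD", "R1", "R2", "R3"),
    ("ADD", "R2", "R1", "R1"),
]
coverage = estimate_coverage(trace, registers, opcodes)
```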
48Validation Flow
49Test Generation for VLIW DLX
[Table: number of test programs for each fault model and technique, with the fault coverage achieved using those test programs]
Random or constrained-random techniques could not activate any multiple exception scenarios, which results in low coverage in pipeline execution.
50Test Generation for LEON2 Processor
[Table: number of test programs for each fault model and technique, with the fault coverage achieved using those test programs]
- The trend is similar in both architectures.
- Due to its bigger pipeline structure (more interactions), the VLIW DLX has lower fault coverage than the LEON2 model.
51The Framework is Available
- https://www.verificationvault.com
- It includes
- VLIW DLX models
- e specification for reference (ISA model)
- Pipelined implementation in e.
- Components for random/directed test generation and incorporation of external tests
- Components for data/temporal checking and coverage estimation.
52Conclusion
- Functional validation is a major bottleneck
- Existing methods employ bottom-up approach
- Developed a top-down validation methodology
- Uses an architecture specification
- Validate the ADL specification
- Serves as a golden reference model
- Specification-driven design automation
- Design space exploration
- Implementation validation using equivalence checking
- Functional test program generation
- Complements existing verification techniques
53More on this topic
- Publications are available online
- http://www.cise.ufl.edu/~prabhat
- Contact me
- prabhat_at_cise.ufl.edu
- Recent Book
- Functional Verification of Programmable Embedded Architectures: A Top-Down Approach
- P. Mishra and N. Dutt, Springer, June 2005
54Future Research Directions
- Architecture Specification
- Completeness criteria for specification validation
- Design Space Exploration
- Generate high quality models from the specification
- Design Verification
- Model generation without implementation knowledge
- Test Generation
- Confluence of validation and manufacturing testing
- Extend current methodology for validation of
- Architectures with multiple-processor cores
- Embedded Systems
56Specification of the DLX Processor
[Figure: DLX pipeline. Labels: PC, Memory, Fetch, Decode, Register File, IALU, MUL1, MUL2, MUL7, FADD1, FADD2, FADD3, FADD4, DIV, MEM, WriteBack]
Structure
( ARCHITECTURE_SECTION
  ..........
  (FetchUnit Fetch
    (CAPACITY 4) (TIMING (all 1)) (OPCODES all)
    (LATCHES (OTHER PCLatch) (OUT DLatch))
  )
( PIPELINE_SECTION
  (PIPELINE Fetch Decode Execute MEM WB)
  (Execute (ALTERNATE ALU MUL FADD DIV))
  (FADD (PIPELINE FADD1 .. FADD3 FADD4))
  (DTPATHS
    (TYPE UNI (RF Decode P7 C4 P8) (WB RF P5 C3 P6) )
    (TYPE BI (MEM MEMORY P4 C2 P3) )
  )
62Specification of the DLX Processor
Structure
[Figure: DLX pipeline. Labels: PC, Memory, Fetch, Decode, Register File, IALU, MUL1, MUL2, MUL7, FADD1, FADD2, FADD3, FADD4, DIV, MEM, WriteBack]
Behavior
(OPCODE ADD
  (OPERANDS (SRC1 rf) (SRC2 imm) (DEST rf))
  (BEHAVIOR DEST = SRC1 + SRC2)
  (FORMAT ... )
)
64Validation of Static Behavior
- Graph based modeling of architectures
- Verify properties
- Connectedness
- False pipeline and data-transfer paths
- Completeness
- Finiteness
- Validated ADL specifications
- DLX, MIPS R10K, TI C6x, and PowerPC
- Validation time is in the order of seconds
65Validation of Dynamic Behavior
- FSM based modeling of pipelined processors
- Verify properties
- Determinism
- In-order execution
- Validated DLX processor specification
- Developed two frameworks
- Equation solver based using Espresso
- Model checker based using SMV
66False Pipeline Path
[Figure: pipeline graph with units Fetch, Read1, ALU, MUL, Read2, Shift, ACC, WB and storage Reg File; edges are labeled with the operations alus and mac]
Supports two operations: alus (ALU-shift) and mac (multiply-accumulate)
71False Pipeline Path
[Figure: pipeline graph with units Fetch, Read1, ALU, MUL, Read2, Shift, ACC, WB and storage Reg File]
Four pipeline paths:
Fetch, Read1, ALU, Read2, Shift, WB (alus)
Fetch, Read1, MUL, Read2, ACC, WB (mac)
Fetch, Read1, ALU, Read2, ACC, WB (X)
Fetch, Read1, MUL, Read2, Shift, WB (X)
The paths marked X are false pipeline paths
72False Pipeline Path
Algorithm
Inputs: 1. Graph model of the architecture
        2. Each unit has a list of supported opcodes
Output: True, if the property is satisfied, else false.
1. Traverse each node of the graph starting from root
   if node is root
     OutL_root = SopL_root    /* Supported opcodes */
   else
     InL = OutL_parent        /* recently visited parent */
     OutL_node = SopL_node ∩ InL
   endif
   If OutL_node is NULL, report false pipeline paths.
2. Return true if there are no false pipeline paths.
InL: Input list; SopL: Supported opcode list; OutL: Output list
[Figure: pipeline graph with units Fetch, Read1, ALU, MUL, Read2, Shift, ACC, WB]
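A minimal sketch of this traversal, applied to the alus/mac example, is shown below. The function name `find_false_paths`, the dictionary encoding of the graph, and the per-unit opcode lists (read off the path listing for the example) are assumptions for illustration. Each reported path stops at the first unit whose output list becomes empty.

```python
def find_false_paths(children, supported, root):
    """Depth-first traversal that intersects supported opcodes along each
    pipeline path and reports every path whose opcode set becomes empty."""
    false_paths = []

    def visit(node, in_list, path):
        # Root: OutL = SopL. Otherwise: OutL = SopL intersected with InL.
        out_list = supported[node] if in_list is None else supported[node] & in_list
        path = path + [node]
        if not out_list:
            false_paths.append(path)
            return
        for child in children.get(node, []):
            visit(child, out_list, path)

    visit(root, None, [])
    return false_paths


children = {
    "Fetch": ["Read1"],
    "Read1": ["ALU", "MUL"],
    "ALU": ["Read2"],
    "MUL": ["Read2"],
    "Read2": ["Shift", "ACC"],
    "Shift": ["WB"],
    "ACC": ["WB"],
}
both = {"alus", "mac"}
supported = {
    "Fetch": both, "Read1": both, "Read2": both, "WB": both,
    "ALU": {"alus"}, "Shift": {"alus"},
    "MUL": {"mac"}, "ACC": {"mac"},
}
false_paths = find_false_paths(children, supported, "Fetch")
```

Because Read2 is shared by both operations, the intersection has to be carried per path rather than stored once per node; that is why this sketch passes the input list down the recursion.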
73A Fragment of the Processor Pipeline
[Figure: stages $i-1$, $i$, and $i+1$ separated by pipeline latches]
Pipeline Latch = Instruction Register (IR); latch $j$ of stage $i$ is $IR_{i,j}$
$IR_{i,j}$ receives instructions from $p$ parent units and sends them to $q$ children units
74Processor Pipeline Flow Conditions
[Figure: flow conditions for pipeline latches, shown for stages i and i+1 at times t and t+1: Normal Flow, Nop Insertion, Stall]
[Figure: flow conditions for Program Counter (PC), shown at times t and t+1: Sequential Execution, Branch Taken (new pc), Stall]
75FSM Model of Processor Pipelines
- Define state of an n-stage pipeline as values of
- Program Counter
- Pipeline Latches / Instruction Registers
- $S(t) = \langle PC(t), IR_{1,1}(t), IR_{1,2}(t), \ldots, IR_{n-1,n_{n-1}}(t) \rangle$
- where stage $i$ has $n_i$ pipeline latches.
- Modelling flow conditions in FSM
- A latch $IR_{i,j}$ ($j$-th latch in stage $i$) is stalled
- Due to stall of children
- Due to hazards, exceptions etc. on that latch
- $cond^{ST}_{IR_{i,j}} = ST_{IR_{i,j}} = ST^{child}_{IR_{i,j}} + ST^{self}_{IR_{i,j}}$
76Modeling state transition functions
- $S(t)$: state of the pipeline; $I(t)$: set of external signals
- $PC(t+1) = f^{NS}_{PC}(S(t), I(t))$
- $= PC(t) + L$ if $cond^{SE}_{PC}(S(t), I(t)) = 1$
- $= target$ if $cond^{BT}_{PC}(S(t), I(t)) = 1$
- $= PC(t)$ if $cond^{ST}_{PC}(S(t), I(t)) = 1$
- $IR_{i,j}(t+1) = f^{NS}_{IR_{i,j}}(S(t), I(t))$
- $= IR_{i-1,j}(t)$ if $cond^{NF}_{IR_{i,j}}(S(t), I(t)) = 1$
- $= IR_{i,j}(t)$ if $cond^{ST}_{IR_{i,j}}(S(t), I(t)) = 1$
- $= nop$ if $cond^{NI}_{IR_{i,j}}(S(t), I(t)) = 1$
77Verification of Determinism
- All state registers must be deterministic
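- Three state functions must cover all possible combinations
- $cond^{SE}_{PC} + cond^{ST}_{PC} + cond^{BT}_{PC} = 1$
- $cond^{NF}_{IR_{i,j}} + cond^{ST}_{IR_{i,j}} + cond^{NI}_{IR_{i,j}} = 1$
- Any two conditions are disjoint for each next-state function
- $cond^{x}_{PC} \cdot cond^{y}_{PC} = 0$ for $x \neq y$
- $cond^{u}_{IR_{i,j}} \cdot cond^{v}_{IR_{i,j}} = 0$ for $u \neq v$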
78Verification of In-Order Execution
- State transitions of adjacent instruction registers must depend on each other.
- An instruction register cannot be in normal flow if all the parent instruction registers (adjacent ones) are stalled.
- $cond^{ST}_{IR_{i-1,j}} \cdot cond^{NF}_{IR_{i,k}} = 0$ (for all $i, j, k$)
- If such a combination is allowed, the instruction is duplicated and stored into both $IR_{i-1,j}$ and $IR_{i,k}$ in the next cycle.
[Figure: $IR_{i-1,j}$ in Stall while $IR_{i,j}$ is in Normal Flow]
79Automatic Verification Framework
[Figure: automatic verification flow. Labels: Processor Core, EXPRESSION ADL, FSM Model, Equations, Eqntott, Espresso, Success, Failure, Analyze]
80A Case Study
- Applied this methodology on DLX processor
- ADL Specification
(DecodeUnit Decode
  ...
  (CONDITIONS
    (NF ANY ANY)
    (ST ALL)
    (NI ALL ANY)
    (SELF )
  )
)
- Flow Conditions
- $cond^{ST}_{DEC} = ST_{EX} \cdot ST_{M1} \cdot ST_{A1} \cdot ST_{DIV}$
[Figure: a fragment of the DLX pipeline]
81A Case Study
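- A small trace of the property checking in our validation framework
- $cond^{NF}_{DEC} + cond^{ST}_{DEC} + cond^{NI}_{DEC}$
- $= \overline{ST_{PC}} \cdot (\overline{ST_{EX}} + \overline{ST_{M1}} + \overline{ST_{A1}} + \overline{ST_{DIV}})$
- $+ (ST_{EX} \cdot ST_{M1} \cdot ST_{A1} \cdot ST_{DIV})$
- $+ ST_{PC} \cdot (\overline{ST_{EX}} + \overline{ST_{M1}} + \overline{ST_{A1}} + \overline{ST_{DIV}})$
- $= (\overline{ST_{EX}} + \overline{ST_{M1}} + \overline{ST_{A1}} + \overline{ST_{DIV}}) \cdot (\overline{ST_{PC}} + ST_{PC})$
- $+ (ST_{EX} \cdot ST_{M1} \cdot ST_{A1} \cdot ST_{DIV})$
- $= 1$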
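The same property can be replayed by brute force over all stall combinations. The sketch below rests on an assumed reading of the Decode conditions: normal flow when the parent (PC) is not stalled and some child is free, stall when all four children are stalled, and nop insertion when the parent is stalled and some child is free. It uses plain enumeration in place of the Eqntott/Espresso flow; the function names are hypothetical.

```python
from itertools import product


def flow_conditions(st_pc, st_ex, st_m1, st_a1, st_div):
    """Return (normal flow, stall, nop insertion) for the Decode latch."""
    all_children_stalled = st_ex and st_m1 and st_a1 and st_div
    any_child_free = not all_children_stalled
    nf = (not st_pc) and any_child_free
    st = all_children_stalled
    ni = st_pc and any_child_free
    return nf, st, ni


def is_deterministic():
    """Exactly one condition holds for every input combination, which
    gives both completeness (sum = 1) and pairwise disjointness."""
    return all(
        sum(flow_conditions(*bits)) == 1
        for bits in product([False, True], repeat=5)
    )


def is_in_order():
    """Decode is never in normal flow while its parent (PC) is stalled."""
    return all(
        not (bits[0] and flow_conditions(*bits)[0])
        for bits in product([False, True], repeat=5)
    )


deterministic = is_deterministic()
in_order = is_in_order()
```

Exhaustive enumeration is only practical for a handful of signals; the slides use an equation solver or a model checker for the full pipeline.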
82Model Generation from Specification
- Model generation is a major challenge
- Wide varieties of architectures
- RISC, DSP, VLIW, and superscalar
- Simulator, hardware, and validation models
- Developed a functional abstraction scheme
- Compose abstraction primitives to generate new architecture using functional abstraction
- Retargetable simulator generation (ISSS 2001)
- Synthesizable RTL generation (RSP 2003, VLSI 2004)
83Functional Abstraction
- Similarities
- Computation units connected using ports, buses, and latches
- Structures and behaviors
- Differences
- Same unit with different parameters
- Same functionality in different unit
- New architectural features
- Define generic functions and sub-functions
- Compose functions to create new architecture
84Functional Abstraction of Architectures
- Structure of a generic processor
- functions for units (fetch, decode, issue, ...)
- sub-functions for computations (read, write, ...)
Example: A Fetch Unit
FetchUnit ( read per cycle n, res_Station size, ........ )
  address = ReadPC()
  Instructions = ReadInstMemory(address, n)
  WriteToReservationStation(Instructions, n)
  outInst = ReadFromReservationStation(m)
  WriteLatch(decode_latch, outInst)
  pred = QueryPredictor(address)
  If pred
    nextPC = QueryBTB(address)
    SetPC(nextPC)
  else
    IncrementPC(x)
86Functional Abstraction of Architectures
- Define generic functions and sub-functions
- Structure of a generic processor
- functions for units (fetch, decode, issue, res-station, ...)
- sub-functions for computations (read, write, ...)
- Behavior of a generic processor
- functions for each operation (add, sub, mul, div, ...)
- Generic memory subsystem
- functions for each component (cache, SRAM, SB, ...)
- Generic controller
- Interrupts and exceptions
- DMA, Co-processors etc.
- Compose functions to create new architecture
87Step 1: Read ADL Specification
Structure
( ARCHITECTURE_SECTION
  ..........
  (FetchUnit Fetch
    (CAPACITY 4) (TIMING (all 1)) (OPCODES all)
    (LATCHES (OTHER PCLatch) (OUT DLatch))
  )
( PIPELINE_SECTION
  (PIPELINE Fetch Decode Execute MEM WB)
  (Execute (ALTERNATE ALU MUL FADD DIV))
  (FADD (PIPELINE FADD1 .. FADD3 FADD4))
  ..........
)
Behavior and Mapping
(OPCODE ADD
  (OPERANDS (SRC1 rf) (SRC2 imm) (DEST rf))
  (BEHAVIOR DEST = SRC1 + SRC2)
  (FORMAT 0101 dest(27-23), src1(22-18), ...)
)
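Because the ADL fragments shown here are parenthesized lists, reading them can be sketched with a small S-expression reader. This is an illustration under that assumption only: the real EXPRESSION front end is not shown in the slides, and `tokenize` and `parse` are hypothetical helpers that expect balanced parentheses.

```python
import re


def tokenize(text):
    """Split the text into parentheses and bare words."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens):
    """Parse one expression from the front of the token list into
    nested Python lists; raises on unbalanced input."""
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == ")":
        raise ValueError("unexpected ')'")
    if tok == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise ValueError("missing ')'")
        tokens.pop(0)  # consume the closing parenthesis
        return items
    return tok


adl_text = """
(PIPELINE_SECTION
  (PIPELINE Fetch Decode Execute MEM WB)
  (Execute (ALTERNATE ALU MUL FADD DIV)))
"""
tree = parse(tokenize(adl_text))
pipeline_stages = tree[1][1:]
```

With the specification in list form, later steps can look up entries such as the stage order or the alternate units of Execute by walking the nested lists.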
88Step 2: Compose Structure
DLX ( .. )
  FetchUnit ( 4, 0, ... )
  ..
  DecodeUnit ( ... )
  ..
  ..
  ..
  Controller ( ... )
[Figure callouts on the FetchUnit arguments: fetches, Reservation Station size, Input/output ports]
89Step 3: Compose Behavior
DLX ( .. )
  FetchUnit ( 4, 0, ... )
    -- No reservation station (instruction buffer) processing
  DecodeUnit ( ... )
    -- Use binary description and operation mapping to
    -- decide where to send the current operation.
  ..
  ..
  Controller ( ... )
    -- Use control table to stall/unstall/flush the pipeline.
90TI C6x Memory Exploration using GSR
91Hardware Generation Exploration
Config 1: IF -> ID -> EX1 -> MEM -> WB
Config 2: IF -> ID -> (EX1 | EX2) -> MEM -> WB
Config 3: IF -> ID -> (EX1 | EX2 | EX3) -> MEM -> WB
Config 4: IF -> ID -> (EX1 | EX2 | EX3 | EX4) -> MEM -> WB
Here (A | B) denotes parallel pipeline paths.
- Schedule length improves due to addition of pipeline paths (area and power increase)
- Fourth configuration is interesting since both area and performance improve
92Exploration Experiments
Exploration varying MIPS R10K processor features
93Exploration Experiments
Co-processor based Exploration using TI C6x
94Energy Performance Tradeoff for Compress
95Addition of Pipeline Stages
- Clock frequency improves due to addition of pipeline stages
- 4th configuration generated a 30% speed improvement at the cost of a 13% area increase
Cfg 1: 1-stage multiplier
Cfg 2: 2-stage multiplier
Cfg 3: 3-stage multiplier
Cfg 4: 4-stage multiplier
96Addition of Operations
- Schedule length improves due to addition of operations in units
- Third configuration generated the best possible schedule length
97RTL Design Validation
[Figure: RTL design validation flow. Labels: Architecture Specification; Reference Model (Properties); Reference Model (Complete Design); RTL Design (Implementation); Symbolic Simulation; Equivalence Checker; outcomes Success, Failure, Equivalent, Different, Successful]
98Property Checking using Symbolic Simulation
- Design Carry Lookahead Adder
- Three inputs in0, in1, in2
- One output out
- One simple property
- assign out in0 in1 in2
- Verification failed
- Incomplete specification of in2
- With clear and set logic
- assign temp ( in2 clear ) set
- assign out in0 in1 temp
[Figure: property checking flow. Labels: Architecture Specification (English Document), Properties (Verilog), RTL Design (Verilog), State Machine, Boolean Model, Symbolic Simulation]
99Property Checking Experiments
- TLB miss detection
- assign input = ( 1'b1, vsid[0:23], ea[4:9], ea[10:13] )
- assign out0 = ( valid0, data0[0:23], data0[24:29], data0[54:57] )
- assign out1 = ( valid1, data1[0:23], data1[24:29], data1[54:57] )
- assign hit0 = ( input == out0 )
- assign hit1 = ( input == out1 )
- assign miss = ~( hit0 | hit1 )
- Applicable to BAT array miss detection
[Figure: TLB]
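The intent of the miss-detection property can be sketched as a reference function. This is an interpretation, not the original Verilog: it assumes a hit means the lookup fields match a valid entry, that a miss means no way hits, and it uses hypothetical field names (`vsid`, `ea_4_9`, `ea_10_13`) in place of the bit ranges on the slide.

```python
def tlb_miss(vsid, ea_4_9, ea_10_13, ways):
    """Return True when no way of the TLB set matches the lookup.

    The leading 1 in the lookup tuple plays the role of the constant
    valid bit, so an invalid entry can never produce a hit."""
    lookup = (1, vsid, ea_4_9, ea_10_13)
    hits = [
        (way["valid"], way["vsid"], way["ea_4_9"], way["ea_10_13"]) == lookup
        for way in ways
    ]
    return not any(hits)


ways = [
    {"valid": 1, "vsid": 0x00ABCD, "ea_4_9": 0x15, "ea_10_13": 0x3},
    {"valid": 0, "vsid": 0x001234, "ea_4_9": 0x01, "ea_10_13": 0x2},
]
hit_case = tlb_miss(0x00ABCD, 0x15, 0x3, ways)
invalid_case = tlb_miss(0x001234, 0x01, 0x2, ways)
```

A function like this could serve as the expected value when checking the RTL output for a given lookup.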
100A Case Study
- The Architecture
- DLX Processor
- 20 nodes
- 24 edges
- 91 instructions
[Figure: graph model of the DLX processor. Legend: unit, storage, pipeline edge, data-transfer edge]
101Experiments
- DLX processor: 20 nodes, 24 edges, 91 instructions
- 223 test programs needed to cover all single faults
- Reduction possible: 43 test programs
- Random/constrained-random techniques require an order of magnitude more test programs to cover these faults.
102Functional Fault Models
- Fault model for register read/write
- Registers should be written and read correctly
- Fault model for operation execution
- Operations must execute correctly
- Fault model for execution path
- An operation must execute correctly in all supported paths (pipeline and data-transfer).
- Fault model for pipeline execution
- The pipeline should produce correct results in the presence of multiple interactions
103Test Generation for Register Read/Write
- Algorithm
- Input: Graph Model of the processor, G
- Output: Test programs for detecting faults in register read/write function
- begin
- TestProgramList = {}
- for each register reg in processor G
- value_reg = generateUniqueValue(reg)
- writeInst = an instruction that writes value_reg in register reg
- testprog_reg = createTestProgram(writeInst)
- TestProgramList = TestProgramList ∪ testprog_reg
- endfor
- return TestProgramList
- end
104Publications
[Figure: publications from 2001 to 2004 arranged by research area (Architecture Specification, Specification Validation, Model Generation, Design Validation). Topics shown: Memory Specification, Coprocessor Specification, Static Behavior, Dynamic Behavior, Exploration, Simulator Generation, Hardware Generation (RSP, VLSI Design), Test Generation, Equivalence Checking (MTV), Symbolic Simulation (MTV). Venues shown: ACM TECS, IEEE Design & Test, Kluwer DAES, CODES+ISSS, DAC, DATE, ASPDAC, VLSI Design, HLDVT, ISSS, SASIMI]
http://www.ics.uci.edu/~pmishra
105Publications
- JOURNALS
- P. Mishra et al., Processor-memory co-exploration using an architecture description language. ACM Transactions on Embedded Computing Systems (TECS), 3(1), 2004.
- P. Mishra et al., Modeling and validation of pipeline specifications. ACM TECS, 3(1), 2004.
- P. Mishra et al., A top-down methodology for validation of microprocessors. IEEE Design & Test of Computers, 2004.
- P. Mishra et al., Towards automatic validation of dynamic behavior in pipeline specifications. Kluwer Design Automation for Embedded Systems (DAES), 8(2), 2003.
- P. Mishra et al., Functional abstraction driven design space exploration of programmable embedded systems, Under revision in ACM TODAES.
- P. Mishra et al., A Methodology for Validation of Microprocessors using Symbolic Simulation, Inderscience International Journal of Embedded Systems (IJES), 2004. Invited Paper
- BOOK CHAPTER
- P. Mishra et al., Modeling and verification of pipelined embedded processors in the presence of hazards and exceptions, in Design and Analysis of Distributed Embedded Systems, Bernd Kleinjohann et al., Editors, Kluwer Academic Publishers, 2002.
- CONFERENCES
- P. Mishra et al., Graph-based functional test program generation for pipelined processors, DATE, 2004.
- P. Mishra et al., Synthesis-driven exploration of pipelined embedded processors, VLSI Design, 2004.
- M. Reshadi, P. Mishra, and N. Dutt, Instruction set compiled simulation: a technique for fast and flexible instruction set simulation, DAC, 2003.
106Publications
- M. Reshadi, N. Bansal, P. Mishra, and N. Dutt, An efficient retargetable framework for instruction-set simulation, CODES+ISSS, 2003.
- P. Mishra et al., Automatic verification of in-order execution in microprocessors with fragmented pipelines and multi-cycle functional units. DATE, 2002.
- P. Mishra et al., Automatic Modeling and Validation of Pipeline Specifications driven by an Architecture Description Language. ASP-DAC / VLSI Design, 2002.
- P. Mishra et al., Processor-Memory Co-Exploration driven by a Memory-Aware Architecture Description Language, VLSI Design, 2001.
- P. Mishra et al., Functional Abstraction driven Design Space Exploration of Heterogeneous Programmable Architectures, ISSS, 2001.
- WORKSHOPS
- P. Mishra et al., Rapid Exploration of Pipelined Processors through Automatic Generation of Synthesizable RTL Models, IEEE Workshop on Rapid System Prototyping (RSP), 2003.
- P. Mishra et al., A Methodology for Validation of Microprocessors using Equivalence Checking, IEEE Workshop on Microprocessor Test and Verification (MTV), 2003.
- P. Mishra et al., A Property Checking Approach to Microprocessor Verification using Symbolic Simulation, Microprocessor Test and Verification (MTV), 2002.
- P. Mishra et al., Automatic Functional Test Program Generation for Pipelined Processors using Model Checking, IEEE High Level Design Validation and Test (HLDVT), 2002.
- P. Mishra et al., Automatic Validation of Pipeline Specifications. HLDVT, 2001.
- P. Mishra et al., ADL driven Design Space Exploration in the Presence of Coprocessors, Synthesis and System Integration of Mixed Technologies (SASIMI), 2001.
107Architecture Description Languages
- Behavior-Centric ADLs
- ISPS, nML, ISDL, SCP/ValenC, ...
- primarily capture Instruction Set (IS)
- good for regular architectures, provides programmer's view
- tedious for irregular architectures, hard to specify pipelining
- Structure-Centric ADLs
- MIMOLA, ...
- primarily capture architectural structure
- specify pipelining; drive code generation, arch. synthesis
- hard to extract IS view
- Mixed-Level ADLs
- LISA, RADL, FLEXWARE, MDes, EXPRESSION, ...
- combine benefits of both
- generate simulator and/or compiler
108An Example Embedded System
[Figure: digital camera block diagram, with Memory, Processor, and Coprocessors marked]
109Design Complexity
Design complexity is increasing at an exponential
rate.
111Use of Silicon Power