Title: Embedded System-on-Chip Design and Validation
- Rajesh Gupta, UC Irvine
- Sujit Dey, UC San Diego
- Peter Marwedel, U. of Dortmund
2 Outline
- Validation Challenges and Issues for System-on-Chip
- Validation Methodologies
- Simulation
- Prototype Validation
- System Verification Environments
- Improving Simulation Performance Using Models
- Hardware-Software Co-Simulation
- Analysis/Estimation
- Performance
- Power
3 System-on-Chip Using IP Cores
4 Challenges for System-on-Chip Industry
- "... the industry is just beginning to fathom the scope of the challenges confronting those who integrate blocks of reusable IP on large chips. Most of the participants summed up the toughest challenge in one word: verification."
- Source: EE Times (Jan. 20, 1997)
- Report on Design Reuse and IP Core Workshop
- Organized by DARPA, EDA Industry Council, NIST
5 System-on-Chip Verification Challenges
- Verification goals
- functionality, timing, performance, power, physical
- Design complexity
- MPUs, MCUs, DSPs, interface, telecom, multimedia
- Diversity of blocks (IPs/cores)
- different vendors
- soft, firm, hard
- digital, analog, synchronous, asynchronous
- different modeling and description languages: C, Verilog, VHDL
- software, firmware, hardware
- Different phases in the system design flow
- specification validation, algorithmic, architectural, hw/sw
- full timing, prototype
6 Verification Gap
[Figure: Verification Gap. Curves for test complexity and simulator performance plotted against design complexity (FFs).]
Source: Cadence
7 Increasing Simulation Loads
Source: Synopsys
8 System-on-Chip Design and Validation Flow
9 Embedded Software Implementation and Validation
[Figure: embedded software implementation and validation flow. Blocks: software tasks; mapping tasks to CPUs; multitask scheduling (priority selection); compiler, assembler, linker; RTOS; multiprocessor integration (protocols, shared memory); performance and power estimators; instruction set simulator; co-simulator with the hardware (H/W); debugger/emulator; software implementation.]
10 Verification of Cores in High Level Design Flow
[Figure: high-level design flow for cores. Blocks: behavioral VHDL specs; compiler producing CFG/DFG; scheduler (cycle-by-cycle behavior) guided by user constraints (resource, performance, etc.); scheduled VHDL (controller and datapath) as functional RTL; hardware sharing and RT-level optimization for delay/power/testability, giving structural RTL; mapping and physical synthesis; estimators; test bench generation; verification using the test bench and formal methods.]
11 Integration of Cores: Verification of Interfaces
[Figure: example AMBA (Advanced Microprocessor Bus Architecture) system. The high-speed ASB connects the CPU, DMA, IP cores, ROM, RAM, and the external bus interface (with external test access); a bridge connects the ASB to the low-power APB, which serves the peripherals.]
Source: ARM
12 Outline
- Validation Challenges for System-on-Chips
- Validation Methodologies
- Simulation
- Prototype Validation
- System Verification Environments
- Improving Simulation Performance Using Models
- Hardware-Software Co-Simulation
- Analysis/Estimation
- Performance
- Power
13 Hardware Simulation
- Event-driven
- compiled code
- native compiled code (directly producing optimized object code)
- very slow
- handles asynchronous circuits, timing verification, initialization to a known state
- Cycle-based
- faster (3-10x faster than native compiled code)
- synchronous designs only, no timing verification, cannot handle X and Z states
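A minimal sketch of the difference between the two styles, on a hypothetical circuit b = NOT a, c = a AND b, with one time unit of delay per gate. The event-driven version keeps a time-ordered event queue and re-evaluates only gates whose inputs changed, so it exposes the one-unit glitch on c when a rises; the cycle-based version evaluates the logic once per cycle in topological order with zero delay and never sees it. This illustrates why cycle-based simulation gives up timing verification; it is not modeled on any particular commercial simulator.

```python
import heapq

# Hypothetical circuit: b = NOT a, c = a AND b, each gate with a delay of 1.

def event_driven(a_changes, until=100):
    """Return the (time, value) changes seen on c."""
    val = {"a": 0, "b": 1, "c": 0}           # consistent initial state
    queue = [(t, "a", v) for t, v in a_changes]
    heapq.heapify(queue)                     # events ordered by time
    c_trace = []
    while queue:
        t, sig, v = heapq.heappop(queue)
        if t > until or val[sig] == v:
            continue                         # beyond horizon, or not a real change
        val[sig] = v
        if sig == "c":
            c_trace.append((t, v))
        # Only gates whose inputs just changed are re-evaluated.
        if sig == "a":
            heapq.heappush(queue, (t + 1, "b", 1 - v))
        if sig in ("a", "b"):
            heapq.heappush(queue, (t + 1, "c", val["a"] & val["b"]))
    return c_trace


def cycle_based(a_per_cycle):
    """Evaluate the same logic once per cycle, zero delay, in topological order."""
    c_trace = []
    for a in a_per_cycle:
        b = 1 - a
        c_trace.append(a & b)
    return c_trace


print(event_driven([(1, 1)]))   # glitch on c: [(2, 1), (3, 0)]
print(cycle_based([0, 1, 1]))   # [0, 0, 0]
```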
14 Validating System-on-Chip by Simulation
- Need both cycle-based and event-driven simulation
- asynchronous interfaces
- verification of initialization
- verification of buses, timing
- Need mixed VHDL/Verilog simulators
- IP from various vendors
- models in different languages
- SoC verification is not possible with current simulation tools
- Growing gap between the amount of verification desired and the amount that can be done
- 1 million times more simulation load than a chip in 1990 (Synopsys)
15 Prototype Validation
16 Emulation
- Emulation: imitation of all or parts of the target system by another system; the target system's performance is achieved primarily by hardware implementation
- In-Circuit Emulator (ICE): a box of hardware that can emulate the processor in the target system. The ICE can execute code in the target system's memory or code that is downloaded to the emulator.
- The ICE can also be fabricated as silicon within the processor core
- provides an interface between a source-level debugger and a processor embedded within an ASIC
- provides real-time emulation
- supports functions such as breakpoint setting, single-step execution, trace display, and performance analysis
- provides a C source-level debugger
- Examples: EmbeddedICE macrocell in ARM7TDMI, NEC 850 family of processors, LSI Logic
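17 EmbeddedICE Macrocell
[Figure: the EmbeddedICE macrocell beside the ARM7TDMI core, connected to the core's control, data, and address signals. Scan chains 0 (traditional boundary scan), 1 (data bus scan chain), and 2 (EmbeddedICE macrocell) are reached through the TAP over a 5-pin JTAG interface.]
Source: ARM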
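The TAP in the figure is the standard IEEE 1149.1 test access port, and the debug host steers it by clocking values on TMS. As a sketch, the table below encodes the 16-state TAP controller graph from that standard; selecting a particular scan chain and reading or writing EmbeddedICE registers is not modeled, and `clock_tms` is a hypothetical helper name.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock_tms(state, tms_bits):
    """Apply one TCK rising edge per TMS bit and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# From reset, TMS = 0,1,0,0 reaches Shift-DR (ready to shift a data register).
print(clock_tms("Test-Logic-Reset", [0, 1, 0, 0]))   # Shift-DR
```

Holding TMS high for five clocks returns the controller to Test-Logic-Reset from any state, which is how a debugger resynchronizes with an unknown target.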
18 EmbeddedICE in ARM7TDMI Core
[Figure: a debug host connects through the EmbeddedICE interface to the EmbeddedICE macrocell of an ARM core embedded in an ASIC.]
Source: ARM
19 Debugging Environment for CPU Core
Source: NECEL
20 Problems of Prototype Validation
- Simulation too slow (10-100 cycles/sec)
- Emulation is fast (1M cycles/sec), but ...
- too expensive (Intel: $5M-$10M per processor)
- errors found are expensive to diagnose and fix
- loss of main focus: time to market
Source: Embedded Systems Programming, Jan. 1996
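To put the two speeds in perspective, the arithmetic below runs a hypothetical workload of one billion target cycles at the best-case simulation rate (100 cycles/sec) and at the emulation rate (1M cycles/sec) quoted above. The workload size is an assumption chosen only for illustration.

```python
SECONDS_PER_DAY = 86_400


def wall_clock_seconds(cycles, cycles_per_sec):
    """Time needed to run `cycles` target clock cycles at a given speed."""
    return cycles / cycles_per_sec


WORKLOAD = 10**9   # hypothetical test: one billion target cycles

sim_days = wall_clock_seconds(WORKLOAD, 100) / SECONDS_PER_DAY   # best-case simulation
emu_minutes = wall_clock_seconds(WORKLOAD, 1_000_000) / 60       # emulation

print(f"simulation: {sim_days:.0f} days, emulation: {emu_minutes:.1f} minutes")
# prints "simulation: 116 days, emulation: 16.7 minutes"
```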
21 Need to Move Validation Early in the Design Cycle
22 System Verification Environment
- Provides a system environment in which a core should function
- Useful to verify whether a core, or a core-based system, works in the test environment
- Compliance Test Environment
- especially suited for cores that comply with industry standards, e.g. PCI, USB, MPEG, Ethernet, MAC, ...
- compliance tests, as well as application-specific tests
- Examples: PCI Test from Virtual Chips, MPEG Test from CompCore
- System Verification Environment
- provides a generic model of the system that is commonly built using the specific core
- Examples: SVE from LSI Logic, NEC Simulation Environment, MicroPack from ARM
23 An Example: PCI Bus
- 3 Main Buses
- Processor Local Bus
- PCI Bus Hierarchy
- Standard I/O Bus
- PCI Bridge Functions
- Host-to-PCI
- PCI-to-PCI
- PCI-to-Standard
- PCI-to-I/O Controller
Source: Virtual Chips
24 PCI Bus Features
- Designed for high throughput (132 Mbytes/sec)
- Widely adopted standard, designed for PCs and now used in workstations, minicomputers, etc.
- Optimized for bursting: transferring multiple data words after one address/arbitration phase
- Rich set of command types to maximize performance in different system applications, for example:
- Read commands: I/O Read, Mem Read
- Write commands: I/O Write, Mem Write
- Special commands: Interrupt Acknowledge
- 32 synthesizable soft cores supporting the options:
- VHDL or Verilog
- 32-bit (33 MHz) or 64-bit (66 MHz)
- PCI Host or PCI Satellite
- FIFO or register data storage
- Synchronous or asynchronous
Source: Virtual Chips
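The 132 Mbytes/sec figure is the 32-bit bus moving one word per 33 MHz clock. The sketch below reproduces that arithmetic and shows why bursting matters: if each burst pays one address phase, an N-word burst keeps data on the bus for N out of N+1 clocks. This is an idealized model (no wait states, turnaround cycles, or arbitration latency), not a PCI timing analysis; the function names are hypothetical.

```python
def peak_mbytes_per_sec(bus_bits, clock_mhz):
    """One data word per clock: bytes per transfer times transfers per microsecond."""
    return (bus_bits // 8) * clock_mhz


def burst_efficiency(data_phases, overhead_clocks=1):
    """Fraction of bus clocks that carry data when each burst pays one
    address phase (plus optional extra overhead clocks, an assumption)."""
    return data_phases / (data_phases + overhead_clocks)


print(peak_mbytes_per_sec(32, 33))   # 132
print(peak_mbytes_per_sec(64, 66))   # 528
for n in (1, 4, 16):
    print(n, round(burst_efficiency(n), 2))
```

Single-word transfers waste half the clocks on addressing, while a 16-word burst comes close to the peak rate.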
25 PCI Core in a System Environment
[Figure: the Virtual Chips PCI core inside the customer's chip, connected to customer logic through the application interface and to the PCI bus through I/O pads.]
Source: Virtual Chips
26 PCI Core Architecture Example: Host Bridge with FIFOs
[Figure: host bridge PCI core between the application interface and the I/O pads. Master side: master address, master write data through the master write FIFO, master read data through the master read FIFO. Target side: target address, target write data through the target write FIFO, target read data through the target read FIFO. The FIFOs report almost-full and empty status; master and target state machines control the transfers; configuration registers take config write data, address, and control.]
Source: Virtual Chips
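The almost-full and empty flags in the host-bridge figure are what each side uses for flow control: stop writing before the FIFO overflows, stop reading when it runs dry. A behavioral sketch of such a FIFO follows; the class name, depth, and almost-full margin are hypothetical, and raising an exception on overflow or underflow stands in for the assertion a testbench would fire.

```python
from collections import deque


class FlagFifo:
    """Bounded FIFO exposing empty and almost-full status flags."""

    def __init__(self, depth=8, almost_full_margin=2):
        self.depth = depth
        self.margin = almost_full_margin
        self.items = deque()

    @property
    def empty(self):
        return not self.items

    @property
    def almost_full(self):
        # Asserted when at most `margin` free entries remain.
        return self.depth - len(self.items) <= self.margin

    def push(self, word):
        if len(self.items) == self.depth:
            raise OverflowError("write to full FIFO")
        self.items.append(word)

    def pop(self):
        if self.empty:
            raise IndexError("read from empty FIFO")
        return self.items.popleft()


fifo = FlagFifo(depth=4, almost_full_margin=1)
for word in (0xA, 0xB, 0xC):
    if not fifo.almost_full:      # writer honors the flag
        fifo.push(word)
print(fifo.almost_full, fifo.pop())
```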
27 Case 1: PCI Compliance Validation
- Need protocol validation capability
- Check for wrong responses, timeouts, retry conditions, etc.
- Need timing validation capability
- Clock to output, input setup/hold, tri-state enable/disable
- Need checks that the PCI design is a good agent on the bus
- Uses the bus efficiently
- Uses the right command for transfers
- Need ability to force exceptions
- Need ability to run random and collision tests
- Need to go beyond the standard compliance suite defined by the PCI SIG, which is only a starting point
Source: Virtual Chips
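One of the protocol checks listed above is catching timeouts. A minimal monitor sketch, assuming the bus has already been sampled once per clock into active-high booleans (real PCI uses active-low FRAME#, TRDY#, and STOP#): it flags any transaction whose target asserts neither TRDY nor STOP within `limit` clocks of the address phase. The default of 16 clocks is assumed here to match the target initial-latency limit of PCI 2.1 and later; confirm it against the specification revision you target. `BusSample` and `check_initial_latency` are hypothetical names, not part of the Virtual Chips environment.

```python
from dataclasses import dataclass


@dataclass
class BusSample:
    """One clock of an abstracted PCI bus; True means the signal is asserted."""
    frame: bool
    trdy: bool
    stop: bool


def check_initial_latency(samples, limit=16):
    """Return the start clocks of transactions that got no TRDY or STOP
    response more than `limit` clocks after their address phase."""
    errors = []
    start = None                 # clock of the current address phase
    prev_frame = False
    for clk, s in enumerate(samples):
        if s.frame and not prev_frame:
            start = clk                          # new transaction begins
        elif start is not None and (s.trdy or s.stop):
            start = None                         # target responded in time
        if start is not None and clk - start > limit:
            errors.append(start)                 # report each violation once
            start = None
        prev_frame = s.frame
    return errors


ok = [BusSample(True, False, False)] * 3 + [BusSample(True, True, False)]
slow = [BusSample(True, False, False)] * 20
print(check_initial_latency(ok), check_initial_latency(slow))   # [] [0]
```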
28 Types of Tests Used
- Rigorous test methodology includes
- Compliance tests
- Master
- Target
- Enhanced feature tests (not in compliance suite)
- e.g., FIFO flow control
- e.g., large burst lengths
- Random tests
- To simulate real-world traffic
- Essential for catching many types of bugs
- Collision tests
Source: Virtual Chips
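Random tests are only useful if a failing run can be replayed. A minimal sketch of a seeded generator for random memory traffic: it uses a private `random.Random` instance so the same seed always reproduces the same transaction list, and it keeps every burst inside an address window. The transaction fields, the window at 0x200000, and the burst limit are hypothetical choices, and multi-master collisions are not modeled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    kind: str      # "MemRead" or "MemWrite" (a subset of the command set)
    addr: int      # byte address of the first 32-bit word
    burst: int     # number of data phases


def random_txns(seed, count, base=0x200000, span_words=256, max_burst=16):
    """Reproducible random memory transactions within [base, base + 4*span_words)."""
    rng = random.Random(seed)                 # private generator => replayable
    txns = []
    for _ in range(count):
        burst = rng.randint(1, max_burst)
        word = rng.randrange(0, span_words - burst + 1)   # whole burst fits
        kind = rng.choice(["MemRead", "MemWrite"])
        txns.append(Txn(kind, base + 4 * word, burst))
    return txns


for t in random_txns(seed=7, count=3):
    print(t)
```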
29 PCI Validation Environment
[Figure: the device under test on the PCI bus together with a master model and a target model (each an FSM with a memory pool), a clock generator model, an arbiter model, a protocol monitor, and a timing checker. Tests drive the models through the procedural interface (PI), optionally from an input file via a file reader interface.]
Source: Virtual Chips
30 PCI Validation Environment Features
- Includes models for master and target devices
- Includes an arbitration model
- Any PCI core can be instantiated as the device under test
- Includes a logical protocol checker
- Includes a timing checker for full-timing simulation
- Procedural interface (PI) provides tasks to initiate PCI transactions and check results
- Includes script files with compliance and other tests written using the PI tasks
Source: Virtual Chips
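The procedural interface pairs transaction tasks with result checks. A minimal sketch of that pattern, assuming a simple word-addressed memory pool in the target model: a write with per-byte-lane enables, followed by a read-back compare under a bit mask. The class and function names, and the meaning given to the byte enables and mask, are hypothetical; the actual PI is a set of VHDL tasks whose exact arguments are not documented here.

```python
class TargetMemoryModel:
    """Hypothetical memory pool behind a target model (32-bit words)."""

    def __init__(self):
        self.words = {}                       # word address -> 32-bit value

    def mem_write(self, addr, data, byte_enables=0b1111):
        """Write `data`; bit i of byte_enables selects byte lane i.
        (Active-high here; PCI's C/BE# lines are active low.)"""
        word = self.words.get(addr, 0)
        for lane in range(4):
            if (byte_enables >> lane) & 1:
                mask = 0xFF << (8 * lane)
                word = (word & ~mask) | (data & mask)
        self.words[addr] = word & 0xFFFFFFFF

    def mem_read(self, addr):
        return self.words.get(addr, 0)


def mem_read_verify(model, addr, expected, compare_mask=0xFFFFFFFF):
    """Read back `addr` and compare only the bits selected by compare_mask."""
    actual = model.mem_read(addr)
    ok = (actual & compare_mask) == (expected & compare_mask)
    return ok, actual


target = TargetMemoryModel()
target.mem_write(0x200000, 0x11111111)                 # write ...
print(mem_read_verify(target, 0x200000, 0x11111111))   # ... then verify: (True, 286331153)
```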
31 Test Examples (Same Compliance Test)
Input File Test Script Example
```
-- Test Scenario 2.9: TARGET RECEIVES MEMORY CYCLES
-- CASE 1: Linear Incrementing, AD[1:0] = '00' (Single Transfer cycles) -->
-- Memory Write cycle by the Primary Master to the Slave under test.
MEWE 0x200000 0x11111111 0x0000 0
-- Read back the written location using a Memory Read PCI cycle.
MEVF 0x200000 0x11111111 0x0000 0xffffffff 0
-- Read back the written location using a Memory Read Line PCI cycle.
MERDL 0x200000 0x0000 0
-- Read back the written location using a Memory Read Multiple PCI cycle.
MERDM 0x200000 0x0000 0
```
Programming Interface Example
```vhdl
--
-- Test Scenario 2.9: TARGET RECEIVES MEMORY CYCLES
--
write(l, string'("TEST SCENARIO 2.9 --> "));
write(l, now);
writeline(OUTPUT, l);
write(l, string'("CASE 1 Linear Incrementing AD[1:0] = '00' (Single Transfer cycles). --> "));
write(l, now);
writeline(OUTPUT, l);
addr := CONV_STD_LOGIC_VECTOR(16#200000#, 32);
data := CONV_STD_LOGIC_VECTOR(16#11111111#, 32);
-- Memory Write cycle by the Primary Master to the Slave under test.
cpPciCycle(cpsig1, PCI_MW, addr, data);
wait4cpdone1;
if (cpsig1.error = '1') then
  ASSERT false REPORT " Memory Write failed. Remaining TS-2.9 tests invalid."
    SEVERITY ERROR;
  checkAnyError(cpsig1);
end if;
-- Read back the written location using a Memory Read PCI cycle.
cpSetGlobal(cpsig1, SET_READ_CMP, C_COMPARE_ONCE);
cpPciCycle(cpsig1, PCI_MRV, addr, data);
wait4cpdone1;
write(l, string'(" Memory Read cycle "));
if (cpsig1.error = '1') then
  write(l, string'(" --> FAILED."));
  writeline(OUTPUT, l);
  checkAnyError(cpsig1);
else
  -- ...
```
Source: Virtual Chips
32 System Verification Environment (LSI Logic)
33 System Simulation Environment (NEC)
Source: NECEL
34 MicroPack: An Example AMBA System in HDL (ARM)
Source: ARM
35 Enhancing Simulation Speed Using Simulation Models
- Hardware model
- Behavioral (C) model
- Bus-functional model
- Instruction-Set simulation (ISS) model
- instruction accurate
- cycle accurate
- Full-timing gate-level model
- encrypted to protect IP
36 Hardware Model
- Use the actual physical device to model its own behavior during simulation
- Advantages: accuracy, full device functionality, including any undocumented behavior
- Disadvantages: delivers 1 to 10 instructions/sec; cost
- Example
- Logic Modeling (Synopsys) Hardware Models
37 Behavioral Model
- Behavior of the core modeled in C
- Example: memory models from Denali
- 30-70% of system chip area is memory, so memory drives the power, latency, and area of the chip
- In typical simulation, conventional models consume as much as 90% of workstation memory
- C models of DRAM, SRAM, Flash, PROM, SDRAM, EEPROM, FIFO, RAMBUS, configurable cache
- parameterizable models, common interface to all simulators
- allows adaptive dynamic allocation and memory-specific debugging
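The workstation-memory problem comes from models that allocate the full memory array up front. A hypothetical sketch of dynamic allocation: storage is created one page at a time, only when a page is first written, and unwritten locations read back as a fill value. This is a generic technique, not Denali's model or API; the class name, page size, and fill value are assumptions.

```python
class SparseMemory:
    """Byte-addressed behavioral memory that allocates pages on first write."""

    def __init__(self, size_bytes, page_bytes=4096, fill=0):
        self.size = size_bytes
        self.page = page_bytes
        self.fill = fill
        self.pages = {}                  # page index -> bytearray

    def _check(self, addr):
        if not 0 <= addr < self.size:
            raise IndexError(f"address {addr:#x} outside modeled memory")

    def write(self, addr, value):
        self._check(addr)
        idx, off = divmod(addr, self.page)
        if idx not in self.pages:
            self.pages[idx] = bytearray([self.fill]) * self.page
        self.pages[idx][off] = value     # value must be a byte (0..255)

    def read(self, addr):
        self._check(addr)
        idx, off = divmod(addr, self.page)
        page = self.pages.get(idx)
        return self.fill if page is None else page[off]

    def allocated_bytes(self):
        return len(self.pages) * self.page


mem = SparseMemory(1 << 30)              # models 1 GiB ...
mem.write(0x1234, 0xAB)
print(mem.read(0x1234), mem.allocated_bytes())   # ... but holds only one 4 KiB page
```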
38 Bus Functional Model
- The idea is to remove the application code and the target processor from the hardware simulation environment
- Performance gains come from using the host processor's capabilities rather than simulating the same operation happening on the target processor
- Varying degrees of use of the host processor lead to different models
- Bus functional model
- only models the interface circuitry (bus), no internal functionality
- usually driven by commands such as read, write, interrupt, ...
- bus-transaction commands are converted into a timed sequence of signal transitions, fed as events to a traditional hardware simulator
- The bus functional model emulates
- Read/write cycles (single/burst transfers)
- Interrupts
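A minimal sketch of the command-to-events conversion described above, for a hypothetical synchronous bus where each command takes three clock periods (address phase, data phase, one idle cycle). The signal names, timing, and the `bfm_events` function are assumptions for illustration; sampling read data and interrupts are not modeled.

```python
PERIOD = 10   # ns per clock, assumed


def bfm_events(commands):
    """Convert ("write", addr, data) / ("read", addr) commands into a
    time-ordered list of (time, signal, value) events for an HDL simulator."""
    events = []
    t = 0
    for cmd in commands:
        op, addr = cmd[0], cmd[1]
        if op not in ("read", "write"):
            raise ValueError(f"unsupported command {op!r}")
        # Address phase: drive address, direction, and strobe.
        events += [(t, "addr", addr), (t, "write", int(op == "write")), (t, "valid", 1)]
        # Data phase: a write drives the data bus one clock later.
        if op == "write":
            events.append((t + PERIOD, "data", cmd[2]))
        # End of transfer: release the strobe, then one idle clock.
        events.append((t + 2 * PERIOD, "valid", 0))
        t += 3 * PERIOD
    return events


for event in bfm_events([("write", 0x10, 0xAB), ("read", 0x20)]):
    print(event)
```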
39 Compiled Code Simulation
- Host code is not equal to target code
- Low-level debugging not possible
- e.g., observing processor internal registers
- Measurements may be inaccurate
- e.g., cycle counts
40 Instruction Set Simulation
- Full functional accuracy of the processor as viewed from the pins
- Operations of the CPU modeled at the register/instruction level
- registers as program variables
- instructions as program functions that operate on register values
- Instructions define relationships between registers, internal memory, and external memory
- The data path that connects the registers is abstracted out
- Allows both high-level and assembly code to be debugged
- Instruction accurate
- accurate at instruction boundaries only
- correct bus operations and total number of cycles, but no guarantee of the CPU state at each clock cycle (inaccuracy due to bus contention)
- Cycle accurate
- guarantees the state of the CPU at every clock cycle
- guarantees exact bus behavior
- slower than instruction-accurate, but faster than a full behavioral model
Source: LSI Logic, Mentor Graphics
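A minimal instruction-accurate ISS sketch for a hypothetical four-instruction accumulator machine (not any real processor). As described above, registers (`pc`, `acc`) are plain program variables and each instruction is a program function operating on them; cycles are only accumulated per instruction, so the state is exact at instruction boundaries and nothing is said about the CPU inside an instruction. The opcodes and their cycle costs are invented for illustration.

```python
# Hypothetical ISA: each instruction is a function of the register state.
def op_li(cpu, imm):            # acc <- immediate
    cpu.acc = imm & 0xFFFFFFFF

def op_add(cpu, imm):           # acc <- acc + immediate (32-bit wraparound)
    cpu.acc = (cpu.acc + imm) & 0xFFFFFFFF

def op_ld(cpu, addr):           # acc <- mem[addr]
    cpu.acc = cpu.mem[addr]

def op_bnz(cpu, target):        # if acc != 0: pc <- target
    if cpu.acc != 0:
        cpu.pc = target

# opcode -> (cycles, semantics); cycle counts are assumptions.
ISA = {"li": (1, op_li), "add": (1, op_add), "ld": (2, op_ld), "bnz": (2, op_bnz)}


class ISS:
    """Instruction-accurate simulator: state is exact only between instructions."""

    def __init__(self, program, memory=None):
        self.program = program          # list of (opcode, operand)
        self.mem = memory or {}
        self.pc = 0                     # registers are ordinary variables
        self.acc = 0
        self.cycles = 0                 # running total, not per-clock state

    def step(self):
        op, arg = self.program[self.pc]
        self.pc += 1                    # default: fall through
        cycles, semantics = ISA[op]     # KeyError signals an illegal opcode
        semantics(self, arg)
        self.cycles += cycles

    def run(self, max_steps=10_000):
        for _ in range(max_steps):
            if self.pc >= len(self.program):
                break
            self.step()
        return self.acc


cpu = ISS([("li", 3), ("add", -1), ("bnz", 1)])   # count down from 3
print(cpu.run(), cpu.cycles)                      # 0 10
```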
41 Instruction-Set Simulation Example
- Example system: Microtec XRAY Sim
- Fast: 100,000 instructions/sec
- Software debug: source-code debugging, register and memory views
42 Example of Simulation Models Used
Example: NEC provides the following simulation models:
1. C model (behavioral)
2. RTL model with timing wrapper
3. Verilog gate-level model
In the early stage of ASIC design and software development, customers use the C model because it is the fastest. The RTL model with timing wrapper provides accurate timing and function verification. For final design verification, the gate-level model is used (very slow execution time).
43 Hardware-Software Co-Simulation
- Most of the bus cycles are Instruction or Data
fetches
- High Activity
- 700-1000 instructions for each I/O bus cycle
- Low Activity
- Only during processor I/O cycles
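A back-of-envelope estimate of why keeping fetch and memory cycles out of the logic simulator pays off. The rates come from elsewhere in this deck (an ISS at about 100,000 instructions/sec, logic simulation at about 100 cycles/sec, roughly one I/O bus cycle per 1000 instructions); one bus cycle per instruction is an added assumption, and co-simulation overheads such as synchronization are ignored.

```python
def cosim_seconds(instructions, iss_rate, logic_rate,
                  bus_cycles_per_instr=1, instr_per_io_cycle=1000,
                  suppress_memory_cycles=True):
    """Estimated wall-clock time of a hardware-software co-simulation run."""
    iss_time = instructions / iss_rate
    if suppress_memory_cycles:
        # Only the rare I/O bus cycles reach the logic simulator.
        logic_cycles = instructions / instr_per_io_cycle
    else:
        # Every instruction and data fetch is simulated as bus activity.
        logic_cycles = instructions * bus_cycles_per_instr
    return iss_time + logic_cycles / logic_rate


full = cosim_seconds(10**6, 10**5, 100, suppress_memory_cycles=False)
fast = cosim_seconds(10**6, 10**5, 100)
print(full, fast, full / fast)   # 10010.0 20.0 500.5
```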
44 Hardware-Software Co-Simulation Implementation
45 Seamless CVE: Comprehensive System-Wide Analysis and Debug
Source: Mentor Graphics
46 Optimization Foundation: Memory Access Time
Source: Mentor Graphics
47 Seamless Optimization Example
Source: Mentor Graphics
48 Non-Optimized Logic Simulation
Source: Mentor Graphics
49 Instruction Fetch Optimization by Masking
Source: Mentor Graphics
50 Fetch-Suppressed Logic Simulation
Source: Mentor Graphics
51 Memory Access Suppressed
Source: Mentor Graphics
52 Logic Simulate Active Cycles Only
Source: Mentor Graphics
53 Performance Optimization Results
Source: Mentor Graphics
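The optimization steps above come down to deciding, per bus access, whether the logic simulator needs to see it. A generic sketch of that routing idea: instruction fetches and ordinary memory accesses are served from a host-side memory image (suppressed), and only accesses that fall in an I/O window reach the logic simulator. This is not Mentor's implementation or API; the class, the memory map, and the `logic_sim` callback are hypothetical. Hardware that snoops memory traffic would not see suppressed accesses, which is the price of this optimization.

```python
IO_BASE, IO_SIZE = 0xF000_0000, 0x1000      # hypothetical I/O window


class CoSimRouter:
    """Route each CPU bus access either to a host-side memory image
    (suppressed: costs no logic-simulator time) or to the logic simulator."""

    def __init__(self, logic_sim):
        self.image = {}                      # memory contents kept on the host
        self.logic_sim = logic_sim           # callable(op, addr, data) -> int
        self.suppressed = 0
        self.simulated = 0

    def access(self, op, addr, data=0):
        if IO_BASE <= addr < IO_BASE + IO_SIZE:
            self.simulated += 1              # real hardware activity
            return self.logic_sim(op, addr, data)
        self.suppressed += 1                 # fetch or memory data access
        if op == "write":
            self.image[addr] = data
            return data
        return self.image.get(addr, 0)


router = CoSimRouter(lambda op, addr, data: 0x55)   # stand-in logic simulator
router.access("write", 0x100, 7)
print(router.access("read", 0x100), router.access("read", IO_BASE))
print(router.suppressed, router.simulated)
```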
54 Comparison of Validation Methods
55 Interface-Based System Verification
- Verification of cores (IP provider)
- making sure the core works in all intended environments
- verification models (functional, interface)
- testbench
- Verification of system-on-chip
- pre-verified cores
- validation of interfaces, integration, buses, protocols, etc.
- Modeling core interface
- Interface standards to facilitate integration and validation of cores (from multiple sources) on the same chip
- Open Modeling Interface from Open Modeling Forum (Cadence, VSIA)
56 Virtual Socket Interface Alliance
An alliance of semiconductor vendors, systems companies, independent core providers, and EDA vendors.
Aim: develop open core design interface and productization standards, defining a set of interfaces for the creation and integration of cores that enable the efficient and accurate integration, verification, and testing of multiple cores (possibly from multiple sources) on a single piece of silicon.
57 VSIA Verification and Interface Standards
58 Outline
- Validation Challenges for System-on-Chips
- Validation Methodologies
- Simulation
- Prototype Validation
- System Verification Environments
- Improving Simulation Performance Using Models
- Hardware-Software Co-Simulation
- Analysis/Estimation
- Performance
- Power
59 Software Performance Analysis
- Goal: determine a tight upper bound on a program's worst-case execution time (estimated WCET)
- Instruction cache memory analysis
- Applications
- HW-SW partitioning
- Real-time systems
- Program Path Analysis
- Determine the worst-case execution paths
- Avoid exhaustive search of program paths
- Make use of path information provided by the user
- Microarchitecture Modeling
- Model hardware and determine the execution time of a known sequence of instructions
- Caches, CPU pipelines, etc. complicate analysis
```c
for (i = 0; i < 100; i++)
    if (rand() > 0.5) j++;
    else k++;
```
2^100 possible worst-case paths!
60 Program Path Analysis
- Integer Linear Programming formulation
- Assume constant instruction execution time
- Basic idea
- Maximize $\sum_i c_i x_i$, where $x_i$ is the execution count of basic block $B_i$ (variable) and $c_i$ is the single execution time of basic block $B_i$ (constant)
- subject to a set of linear constraints
- structural constraints derived from program structure
- functionality constraints provided by the user
- No explicit path enumeration
- Li, Malik, Wolfe, ICCAD 95
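61 Structural and Functionality Constraints
- Structural constraints: at each node, exec. count of $B_i$ = $\sum$ inputs = $\sum$ outputs
- Example: while loop

Source code:
```c
/* p >= 0 */
q = p;            /* B1 */
while (q < 10)    /* B2 */
    q++;          /* B3 */
r = q;            /* B4 */
```
Structural constraints from the control flow graph ($x_i$: execution count of block $B_i$; $d_j$: execution count of edge $d_j$):
$$x_1 = d_1 = d_2$$
$$x_2 = d_2 + d_4 = d_3 + d_5$$
$$x_3 = d_3 = d_4$$
$$x_4 = d_5 = d_6$$
- Functionality constraints provide loop bounds and other path information:
$$0 \cdot x_1 \le x_3 \le 10\, x_1$$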
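To make the formulation concrete, the sketch below computes the WCET of the while-loop example under the structural constraints and the loop bound of this slide, with hypothetical block costs $c_i$. It is a brute-force stand-in for an ILP solver that only works because this example has a single free variable once the code is entered once ($d_1 = 1$); the method cited on the previous slide hands the same objective and constraints to a general ILP solver.

```python
# Hypothetical cycle cost c_i of each basic block (not from the slides).
COST = {1: 2, 2: 1, 3: 3, 4: 2}


def satisfies(x, d):
    """Structural and functionality constraints from the while-loop example."""
    return (x[1] == d[1] == d[2]
            and x[2] == d[2] + d[4] == d[3] + d[5]
            and x[3] == d[3] == d[4]
            and x[4] == d[5] == d[6]
            and 0 * x[1] <= x[3] <= 10 * x[1])


def wcet(search_limit=50):
    """Maximize sum(c_i * x_i). With d1 = 1, flow conservation leaves the
    loop-body count d3 as the only free variable, so enumerate it."""
    best = None
    for n in range(search_limit + 1):
        d = {1: 1, 2: 1, 3: n, 4: n, 5: 1, 6: 1}
        x = {1: d[1], 2: d[2] + d[4], 3: d[3], 4: d[5]}
        if satisfies(x, d):
            t = sum(COST[i] * x[i] for i in COST)
            if best is None or t > best[0]:
                best = (t, x)
    return best


print(wcet())   # (45, {1: 1, 2: 11, 3: 10, 4: 1})
```

The loop bound is what makes the answer finite: without $x_3 \le 10\,x_1$ the objective would grow with the search limit.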
62 Software Performance Analysis: Experimental Results
63 Software Power Analysis
- Instruction-level energy analysis
- Assign energy costs to instructions and inter-instruction effects
- Base energy costs of instructions
- Energy costs of inter-instruction effects (e.g., circuit state overhead, cache misses, pipeline stalls)
- Applied to Intel 486DX2, Fujitsu SPARClite, Fujitsu DSP
- Tiwari, Malik, Wolfe, TVLSI, Dec. 94
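A minimal sketch of an instruction-level energy model in the spirit described above: total energy is the sum of base costs over the executed trace, plus a circuit-state overhead for each pair of different adjacent instructions, plus costs for other effects such as cache misses. All cost values are hypothetical (think nJ), and treating the pair overhead as symmetric is a simplifying assumption.

```python
def program_energy(trace, base, overhead, events, event_cost):
    """Base costs + inter-instruction overheads + other effects."""
    energy = sum(base[op] for op in trace)
    for prev, cur in zip(trace, trace[1:]):
        if prev != cur:
            # Circuit-state overhead between different adjacent instructions;
            # the table is assumed symmetric, so look up the pair sorted.
            energy += overhead.get(tuple(sorted((prev, cur))), 0.0)
    energy += sum(count * event_cost[kind] for kind, count in events.items())
    return energy


BASE = {"add": 1.0, "ld": 2.0, "mul": 3.0}                 # per instruction
OVERHEAD = {("add", "ld"): 0.25, ("ld", "mul"): 0.5}       # per adjacent pair
EVENT_COST = {"cache_miss": 4.0}

trace = ["add", "ld", "mul", "add"]
print(program_energy(trace, BASE, OVERHEAD, {"cache_miss": 1}, EVENT_COST))   # 11.75
```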
64 Hardware Power Estimation
- Activity-sensitive power models for macro blocks
- Word-level models for arithmetic components
- Bit-level models
- RTL simulation does not reveal glitching activity
- Glitching can account for as much as 50% of power
- Solution: glitching models for macro blocks; activity estimation techniques for control logic using functional and partial delay information
- Landman, Rabaey, TVLSI, June 1995
- Raghunathan, Dey, Jha, ICCAD 1996
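A minimal sketch of activity-sensitive estimation from a cycle-level RTL trace, using the standard CMOS switching-power expression $P = \tfrac{1}{2}\alpha C V_{dd}^2 f$ rather than the macro models cited above. Because the trace holds only the settled value of each signal per clock cycle, the toggle count cannot see glitches, which is exactly the limitation noted on this slide. The capacitance, supply voltage, and frequency are hypothetical.

```python
def toggle_count(trace, width):
    """Total bit toggles between consecutive per-cycle values of a signal."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(trace, trace[1:]))


def dynamic_power(trace, width, cap_per_bit, vdd, freq):
    """P = 0.5 * alpha * C * Vdd^2 * f, with alpha = average toggles per bit
    per cycle and C = total switched capacitance of the `width` bits."""
    transitions = len(trace) - 1
    alpha = toggle_count(trace, width) / (transitions * width)
    return 0.5 * alpha * (cap_per_bit * width) * vdd ** 2 * freq


trace = [0x0, 0xF, 0xF, 0x0]   # settled value per clock cycle, as RTL simulation sees it
print(toggle_count(trace, 4))  # 8
print(dynamic_power(trace, 4, cap_per_bit=1e-12, vdd=1.0, freq=1e8))
```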
65 Sources
- NEC Electronics, NEC CCRL
- LSI Logic
- Advanced RISC Machines
- Virtual Chips (Phoenix Technologies)
- Mentor Graphics
- Alta Group of Cadence Design Systems
- Synopsys
- Eagle Design Automation
- Princeton University
66 Acknowledgements
- C. Smith, M. El-Khatib - NEC Electronics
- D. McKenney, S. Hussain - LSI Logic
- A. Greenhill - Advanced RISC Machines
- T. Anderson, C. Snyder - Virtual Chips
- S. A. Leef - Mentor Graphics
- R. Grossman, B. Williams - Eagle Design Automation
- S. Malik, Princeton University
- A. Raghunathan, NEC