Title: Logic Emulation
1Part III
2What is a Logic Emulation System?
- 1. Programmable hardware built with
programmable logic (FPGA) and programmable
interconnect devices (PID).
- 2. Software that automatically programs the
hardware according to the circuit under design.
- 3. Control HW/SW to support operation of the
emulated design as a hardware component operating
in real time.
3Typical Logic Emulation Environment
Compiler, runtime software
Stimulus generator, logic analyzer
4Why Do We Need Logic Emulation?
- Design verification issues.
- Real-time operation.
- System-level testing.
- Rapid prototyping.
5Design Verification Issues
- Simulation-based verification methods run
out of steam as chip complexity grows.
- Emulation is a verification technology that grows
along with design size.
6Real-Time Operation
- Simulation requires test vector development, which
is costly and difficult.
- Verification depends on test vector correctness.
- Certain applications must be verified in real
time because of human perception: audio and video.
- Emulation connected to actual hardware can run
- real diagnostic code,
- operating systems, and
- applications.
7System-Level Testing
- Often the chip meets its specifications but
fails in the system.
- We have to verify the system-level interactions
between the chip and other components. They are
hard to formalize.
- Internal probing is impossible once the chip is
fabbed and placed in a system.
- But it is possible using emulation.
8Rapid Prototyping
- Once the emulated design is debugged, it is available
for immediate use by software developers for
software debugging.
- The emulated design is available for demos and
experiments with the architecture on real
applications and data.
9Programmable Hardware includes programmable
interconnect
10Considerations for programmable interconnect
- The capacity of logic and interconnection depends
on package constraints.
- This forces a hierarchical system.
- Chips -> boards -> boxes -> system
- The interconnect structure must
- 1. Provide successful connectivity,
- 2. Maximize FPGA utilization, and
- 3. Minimize delay and skew.
- Rent's rule can be used to predict the interconnect
needs.
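As a rough illustration of Rent's rule, the sketch below uses the common form T = t * g^p, where T is the number of external pins, g the number of logic blocks in a partition, t the average number of pins per block, and p the Rent exponent. The function names and the default values of t and p are assumptions chosen for illustration, not values from these slides.

```python
def rent_pins(blocks: int, t: float = 2.5, p: float = 0.6) -> float:
    """Estimate external pins T = t * g**p for a partition of `blocks` logic blocks.

    t and p are illustrative defaults (assumptions), not measured values.
    """
    if blocks <= 0:
        raise ValueError("blocks must be positive")
    return t * blocks ** p


def max_blocks_for_pins(pins: int, t: float = 2.5, p: float = 0.6) -> int:
    """Invert Rent's rule: the largest g such that t * g**p <= pins."""
    if pins < t:
        return 0
    g = int((pins / t) ** (1.0 / p))
    # Correct for floating-point rounding in either direction.
    while t * (g + 1) ** p <= pins:
        g += 1
    while g > 0 and t * g ** p > pins:
        g -= 1
    return g


if __name__ == "__main__":
    for blocks in (100, 1_000, 10_000):
        print(blocks, "blocks ->", round(rent_pins(blocks)), "pins (estimate)")
    print("blocks that fit behind 200 pins:", max_blocks_for_pins(200))
```

Because p is below 1, pin demand grows more slowly than logic size, but a fixed package pin count still caps how much logic one FPGA can usefully hold.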
11Structures of Multi-FPGA Systems
- Topologies
- Mesh - nearest neighbor.
- Crossbar - full and partial.
- Interconnect scheme
- Circuit switched.
- Time multiplexed.
12Nearest Neighbor Interconnection
13Advantages and Disadvantages of Nearest Neighbor
Interconnection
- Advantages
- Uniform: all chips are the same.
- Easy to lay out on PCB.
- Disadvantages
- Routing is easily blocked.
- The through pins limit the logic utilization of
FPGAs.
- Long and unpredictable delays.
- No natural hierarchical extension.
14Nearest Neighbor Extensions
Connect to non-neighbors
Add more neighbors
15Advantages and Disadvantages of nearest-neighbor
extended architectures
- Advantages
- More choices for the router by adding diagonal lines
and skip lines.
- Disadvantages
- More complex PCB.
- More complex routing software.
16Partial Crossbar Interconnect
[Figure labels: logic blocks; crossbars; A, B, C, and D pins; second-level crossbars.]
17Partial Crossbar Interconnect
- A partial crossbar consists of a set of small full
crossbars,
- connected to logic blocks
- but not to each other.
- I/O pins of each FPGA are divided into subsets.
- Each subset is connected by a full crossbar
circuit switch.
- A partial crossbar is a potentially blocking
network.
18Characteristics of Partial Crossbar Architecture
- Partial crossbar size is proportional to the
number of FPGA pins.
- All interconnections go through one crossbar chip
in a one-level partial crossbar interconnect, or
three in a two-level one.
- Delays are uniform and bounded.
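The "potentially blocking" property can be seen in a toy model. This sketch assumes every FPGA has the same number of pins on each crossbar and uses a simple first-fit router; the class and method names are hypothetical, and production routers are considerably more sophisticated.

```python
class PartialCrossbar:
    """Toy one-level partial crossbar: crossbar k sees pin subset k of every FPGA."""

    def __init__(self, num_fpgas: int, num_crossbars: int, pins_per_subset: int):
        # free[f][k] = unused pins of FPGA f in the subset wired to crossbar k
        self.free = [[pins_per_subset] * num_crossbars for _ in range(num_fpgas)]

    def free_pins(self, fpga: int) -> int:
        """Total unused I/O pins on one FPGA, across all subsets."""
        return sum(self.free[fpga])

    def route(self, net):
        """Route a net touching the given FPGA indices.

        Returns the index of the crossbar used, or None if the net is blocked.
        A net goes through exactly one crossbar, so its delay is uniform.
        """
        fpgas = set(net)
        for k in range(len(self.free[0])):
            if all(self.free[f][k] > 0 for f in fpgas):
                for f in fpgas:
                    self.free[f][k] -= 1
                return k
        return None


if __name__ == "__main__":
    xb = PartialCrossbar(num_fpgas=3, num_crossbars=2, pins_per_subset=1)
    print(xb.route([0, 1]))  # uses crossbar 0
    print(xb.route([0, 2]))  # uses crossbar 1
    # FPGAs 1 and 2 each still have one free pin, but on different crossbars,
    # and the crossbars are not connected to each other: the net is blocked.
    print(xb.route([1, 2]))
```

In practice the compiler's pin assignment chooses which subset each net uses so that this kind of blocking is avoided where possible.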
19Mixed Full and Partial Crossbar
External connections
Partial crossbar
Full crossbar
20Circuit Switched versus Time Multiplexed
Interconnect Schemes
- Trade-offs between the operating speed and the
hardware cost.
- Time-multiplexing method
- can greatly expand available interconnect.
- allows lower-cost IC packages and PCBs.
- makes partitioning easier.
- BUT
- System power increases due to frequent signal
switching (higher hardware cost).
- Complex scheduling software.
- Slow operating speed.
21Virtual Wires
[Figure: logical outputs of one FPGA pass through a Mux onto shared physical wires and through a DeMux into the logical inputs of another FPGA.]
Virtual wires trade space for time.
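A minimal sketch of the mux/demux idea behind virtual wires: several logical signals share a few physical wires by being sent in successive time slots. The round-robin schedule and all function names are assumptions for illustration; a real virtual-wires compiler schedules signals according to their dependencies.

```python
from math import ceil


def schedule(num_logical: int, num_physical: int):
    """Assign logical signal i to a (time slot, physical wire) pair, round-robin."""
    if num_physical <= 0:
        raise ValueError("need at least one physical wire")
    return [(i // num_physical, i % num_physical) for i in range(num_logical)]


def send(values, num_physical: int):
    """Mux side: pack logical values into frames, one frame per time slot."""
    slots = ceil(len(values) / num_physical)
    frames = [[None] * num_physical for _ in range(slots)]
    for value, (slot, wire) in zip(values, schedule(len(values), num_physical)):
        frames[slot][wire] = value
    return frames


def receive(frames, num_logical: int):
    """DeMux side: recover the logical values using the same schedule."""
    if not frames:
        return []
    num_physical = len(frames[0])
    return [frames[slot][wire] for slot, wire in schedule(num_logical, num_physical)]


if __name__ == "__main__":
    logical_outputs = [1, 0, 1, 1, 0]
    frames = send(logical_outputs, num_physical=2)
    print(len(frames), "time slots on 2 physical wires")
    print(receive(frames, len(logical_outputs)) == logical_outputs)
```

Five logical signals over two physical wires need three time slots, which is the speed cost paid for the saved pins.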
22Logic Emulation Systems and their interconnection
schemes
- System with mesh topology - Quickturn's RPM and
Virtual Machine Works (IKOS).
- System with partial crossbar - Quickturn's
Enterprise, Mars, and System Realizer.
- System with mixed full and partial crossbar -
Aptix Prototyping System.
- System using time-multiplexed interconnect -
Virtual Machine Works (IKOS), CoBALT and Arkos
(Quickturn).
23Memory Solutions in Emulators and future
devices/systems
- Goal: programmable memories with different
width/depth/port combinations.
- FPGA-based memories
- inefficient use of logic resources.
- timing correctness is difficult to ensure.
- large or highly multi-ported memories must be
partitioned across several FPGAs.
- SRAMs with dedicated or programmable controllers.
24Logic Emulation Design Flow
25Logic Emulation Design Compiler and its components
- The logic emulation design compiler is a large and
complex EDA tool which includes
- Front-end design importer.
- HDL-based synthesizer.
- Clock and timing analyzer.
- Partitioner.
- System-level placer and router.
- FPGA-based placer and router.
26Objectives of logic emulation compiler
- Fast compilation time.
- Fast emulation clock.
- Timing correctness.
- Easy ECO (Engineering Change Order).
- Minimize circuit size.
27Design Considerations for Logic Emulators
- HDL synthesis
- Trade-off between run time and quality.
- CLB-based vs. gate-based designs.
- Clock and timing analysis
- Timing correctness, hold-time violation free.
- Clock skew minimization.
- Partitioning
- Run time.
- Timing and area.
28Design Considerations for Logic Emulators
- System placement and routing
- Timing.
- Completeness of routing.
- FPGA-based placement and routing
- Fast run time.
- Parallel compilation.
Remember: the logic you emulate is not the same as
your design
29Hold-Time Violation
Clock distribution problem (skew)!!!
A hold-time violation occurs when routing delay >
LUT delay!!!
30Timing Correctness
Delay insertion
[Figure labels: delay element; CLB; routing delay.]
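A minimal sketch of the check behind delay insertion, assuming the standard flip-flop hold constraint: clock-to-Q delay plus data-path delay must cover the clock skew plus the hold time. All parameter names, and the numbers in the usage below, are illustrative assumptions (any consistent time unit works).

```python
def hold_slack(clk_to_q: float, data_delay: float,
               clock_skew: float, hold_time: float) -> float:
    """Hold slack at the receiving flip-flop; negative means a violation.

    clock_skew is how much later the clock reaches the receiving flip-flop
    than the sending one (large when the clock is routed through slow nets).
    """
    return (clk_to_q + data_delay) - (clock_skew + hold_time)


def delay_to_insert(clk_to_q: float, data_delay: float,
                    clock_skew: float, hold_time: float,
                    margin: float = 0.0) -> float:
    """Extra data-path delay needed to reach at least `margin` of hold slack."""
    slack = hold_slack(clk_to_q, data_delay, clock_skew, hold_time)
    return max(0.0, margin - slack)


if __name__ == "__main__":
    # Illustrative numbers: the clock routing delay exceeds the LUT delay.
    params = dict(clk_to_q=1, clock_skew=5, hold_time=1)
    print("slack before:", hold_slack(data_delay=2, **params))
    extra = delay_to_insert(data_delay=2, **params)
    print("delay to insert:", extra)
    print("slack after:", hold_slack(data_delay=2 + extra, **params))
```

Inserted delay lowers the achievable emulation clock, which is why the compiler also tries to minimize skew rather than rely on delay elements alone.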
31Timing Correctness
Use clock enables for gated clocks
[Figure labels: CLB with LUT and flip-flops (D, Q, CK, CE); clock path; primary clock; low-skew net.]
32Methodology and components of Logic Emulator
System
- Pre-configuration preparation - prepare netlists
and control files for configuration.
- Testbed preparation - prepare the emulation-based
operation environment.
- Full-chip configuration - download the design to the
emulator.
- In-circuit emulation - test the design.
33Pre-Configuration in Emulator System
- Translate the leaf-cell libraries into emulation
primitives.
- Translated libraries must be verified for
functional equivalence to the originals.
- Modify and redesign some components, such as
precharge logic circuits, to attain
compatibility with emulation techniques.
- Assemble all the gate-level netlists for the
entire design.
34Testbed in Logic Emulator
- Design and implement the target ICE board
combining the emulated design with real hardware.
- Slow down the testbed to emulation speed.
- Assemble the testbed and emulation equipment.
35Full-Chip Configuration and In-Circuit Emulation
- Full-chip configuration
- Prepare control files.
- Partition the design to fit into the emulation
system.
- Download design into the system.
- Verify that the emulation model faithfully
implements the design as specified by RTL.
- In-circuit emulation
36Part IV
- Reconfigurable Computing and Systems
37General-Purpose Computing vs. Custom Computing
- General-purpose computing - running applications
on a general-purpose computer.
- Custom computing - running applications on
custom-made, application-specific hardware.
- Field-programmable devices make this a
reality.
38Goals of Reconfigurable Computing
- Tailor the architecture to the application.
- Minimize or eliminate instruction interpretation.
- Exploit fine-grained parallelism.
- Map software to hardware.
39Applications of reconfigurable computing
- Database search and analysis.
- Image processing and machine vision.
- Data compression.
- Signal processing.
- Neural networks.
- Biology computing.
- Medical computing.
- Design Automation (PSU)
- Many more.
40Multi-Mode Systems map various applications to a
reconfigurable system
ROM
Reconfigurable system
Application 1
Application 2
- Different configurations for read and write
operations of a tape driver (Honeywell).
- Different configurations for different
printer controllers (Tektronix).
41Run-Time Reconfiguration in military image
recognition system
[Figure labels: image data; I/O; candidate targets "Jeep?" and "Tank?".]
- Break a single computation into multiple pieces.
- Page in components as needed (virtual
hardware),
- e.g., automatic target recognition.
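The "page in components as needed" idea resembles demand paging. The sketch below models it with a least-recently-used replacement policy; the LRU choice, the class name, and the component names are assumptions for illustration, not something the slides specify.

```python
from collections import OrderedDict


class VirtualHardware:
    """Toy model: a device with a few configuration slots, paged on demand."""

    def __init__(self, slots: int):
        if slots <= 0:
            raise ValueError("need at least one slot")
        self.slots = slots
        self.loaded = OrderedDict()   # component name -> True, in LRU order
        self.reconfigurations = 0

    def run(self, component: str) -> bool:
        """Use a component; return True if it had to be paged in first."""
        if component in self.loaded:
            self.loaded.move_to_end(component)   # mark as most recently used
            return False
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)      # evict least recently used
        self.loaded[component] = True
        self.reconfigurations += 1
        return True


if __name__ == "__main__":
    hw = VirtualHardware(slots=2)
    # Hypothetical stages of a recognition pipeline.
    for stage in ["edge", "match", "edge", "classify", "match"]:
        paged_in = hw.run(stage)
        print(stage, "(reconfigured)" if paged_in else "(already loaded)")
    print("total reconfigurations:", hw.reconfigurations)
```

Each reconfiguration costs time, so the split of a computation into pieces only pays off when pieces are reused often enough between page-ins.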
42Custom Computing
- Application-specific systems.
- Numerous applications for similar reconfigurable
systems.
- Offers hardware performance and the flexibility to
handle numerous algorithms.
- Multi-FPGA systems can be viewed as hardware
supercomputers.
Tell about DEC Perle
43Reconfigurable Co-processors
Program 2
Inst2
- Provide custom instructions on a
per-application basis.
44Types of Reprogrammable Systems
Three ways to attach custom computing units
Attached processing unit
PU = processing unit
45Types of Reprogrammable Systems
- Attached and standalone processing units are
reprogrammable systems on computer add-on cards
and in separate reprogrammable cabinets.
- Consideration: large communication overhead may
overshadow the speed gain.
- Application-specific coprocessors can achieve
significant improvement over a wide range of
applications.
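How communication overhead can cancel the speed gain is easy to see with a small timing model. This is a sketch under simple assumptions: the host part and the offloaded kernel do not overlap, and data transfer adds a fixed cost per run. The function name and the numbers are illustrative only.

```python
def offload_speedup(host_time: float, kernel_time: float,
                    kernel_speedup: float, comm_time: float) -> float:
    """Overall speedup from moving a kernel to an attached processing unit.

    host_time      - time of the part that stays in software
    kernel_time    - software time of the part that is offloaded
    kernel_speedup - how much faster the hardware runs that part
    comm_time      - time spent moving data to and from the hardware
    """
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    software_only = host_time + kernel_time
    accelerated = host_time + kernel_time / kernel_speedup + comm_time
    return software_only / accelerated


if __name__ == "__main__":
    for comm in (0, 20, 72, 172):
        s = offload_speedup(host_time=20, kernel_time=80,
                            kernel_speedup=10, comm_time=comm)
        print(f"comm_time={comm:>3}: overall speedup {s:.2f}x")
```

With these numbers a 10x faster kernel gives no overall gain once communication takes 72 time units, and a slowdown beyond that, which is the motivation for moving the reprogrammable logic closer to the processor.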
46Types of Reprogrammable Systems
- Integrate the reprogrammable logic into the
processor itself.
- A reprogrammable functional unit can be
configured on a per-algorithm basis.
- It provides special-purpose instructions
tailored to the needs of a given application.
47Architectures of Multi-FPGA (Reconfigurable)
Systems
- The most commonly used topologies
- Mesh: 1D (linear array), 2D, and 3D.
- Crossbar: full, partial, mixed, and
hierarchical.
- Hybrid between mesh and crossbar.
- Application-specific architecture.
48Hybrid Topology of a reconfigurable system
Splash 2 augments a linear array of FPGAs with
a crossbar switch. Goal: supporting
systolic circuits.
49Hybrid Topology
Host interface
Anyboard: a linear array of FPGAs augmented
by global buses.
50Hybrid Topology
[Figure labels: 4 x 4 mesh of FPGAs; four RAMs; host interface.]
DECPeRLe-1: a 4 x 4 mesh of FPGAs augmented
with shared global buses.
51Application-Specific Topology of MARC-1, one
subsystem
[Figure: the Marc-1 subsystem 1, with connections to other FPGAs.]
52Application-Specific Topology of Marc-1, cont.
- Application in circuit simulation, where the
program to be executed can be optimized on a
per-run basis.
- This is done for values that are constant within
a run but may vary from dataset to dataset.
[Figure: the Marc-1.]
53Application-Specific Topology
[Figure: the RM-nc neural network system, built from FPGAs and RAMs.]
54Architecture for Computer Prototyping
[Figure: the Mushroom processor prototyping system. Labels: VME bus; FPGAs; cache memory; register file; ALU; FPU.]
55Expandable Topologies
- Hierarchical crossbar topology can be expanded
by adding an extra level.
- Quickturn systems.
- Expandable mesh topology can be expanded by
connecting individual boards to form a large
mesh.
- The Virtual Wires Emulation System (IKOS).
56Topology for Adapting Other Components
- Many multi-FPGA systems include non-FPGA
resources to provide more general-purpose
solutions.
- The MORRPH system - sockets next to the FPGAs
allow arbitrary devices to be added to the array.
- The G800 board - contains two FPGAs and four
sockets.
57Topology for Adapting Other Components
- The COBRA system
- Contains
- based modules (expanding to 2D mesh),
- RAM modules,
- I/O modules,
- and bus modules.
- The Springbok system
- a pre-made daughter board which can
hold an arbitrary device (on the top) and an
FPGA (on the bottom).
- Daughter boards are mounted on a baseplate.
58Topology for Adapting Other Components
- The Quickturn systems - external component
adapters.
- The Aptix FPCB - a reprogrammable PCB.
59Design Methodology for general-purpose
configurable systems
Mapping
60Typical Software Methodology for general-purpose
configurable systems
61Typical Software Methodology for general-purpose
configurable systems
62Considerations for such complex software systems
- Architectural-specific design tasks.
- Design automation process.
- The mapping time dominates the setup time for
operating the system.
- Run-time reconfigurability.
63Design Specification and Languages for
reconfigurable software systems
- Standard software programming languages,
- e.g., C, C++, FORTRAN, and assembly language,
vs. HDLs.
- Standard software programming languages - a
sequential execution model.
- HDLs - a parallel execution model.
- Who will use it, and which one is more suitable
for system description?
64Compilation Issues
- Translate code from software languages into
hardware without losing the inherent concurrency
of hardware.
- Compiler techniques for parallelizing code.
- Straight-line code, control flow, and loops.
- Transmogrifier C compiler.
65System-level and High-level Synthesis
- System-level design evaluation and analysis.
- Design estimation.
- Hardware-software partitioning.
- Interface synthesis.
- RTL synthesis.
- Logic synthesis and technology mapping.
66Partitioning and Placement
- Topology-aware partitioning methods.
- Partitioning onto a multi-FPGA system is
equivalent to a placement problem.
- Logic utilization and timing.
67Pin Assignment and Routing
- Pin assignment - the process of determining which
I/O pins to use for each inter-FPGA signal.
- Pin assignment for a pre-fabricated multi-FPGA
system is equivalent to the global routing
problem.
- Pin assignment greatly affects FPGA logic
utilization and routability.
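To make the link between partitioning and pin usage concrete, the sketch below counts how many I/O pins each FPGA needs for a given partition, assuming every net that spans several FPGAs costs one pin on each FPGA it touches. The netlist format, the function names, and the example circuit are hypothetical.

```python
from collections import Counter


def pin_demand(nets, assignment):
    """Count I/O pins needed per FPGA.

    nets       - iterable of nets, each an iterable of cell names
    assignment - dict mapping cell name -> FPGA index (the partition)
    """
    demand = Counter()
    for net in nets:
        fpgas = {assignment[cell] for cell in net}
        if len(fpgas) > 1:            # the net is cut by the partition
            for fpga in fpgas:
                demand[fpga] += 1     # one pin on every FPGA it touches
    return demand


def fits(nets, assignment, pins_per_fpga: int) -> bool:
    """True if no FPGA needs more I/O pins than it has."""
    return all(n <= pins_per_fpga for n in pin_demand(nets, assignment).values())


if __name__ == "__main__":
    nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
    partition = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(dict(pin_demand(nets, partition)))
    print("fits with 2 pins per FPGA:", fits(nets, partition, 2))
    print("fits with 1 pin per FPGA:", fits(nets, partition, 1))
```

A partitioner can use such a count as a feasibility check; which physical pin each cut net then uses is the pin-assignment step described above.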
68Run-Time Reconfigurability
This is a new issue in system design: how much of
the processor is virtual, and when to reconfigure?
- Virtual hardware <-> virtual memory. What are
their relations? Artificial intelligence,
robotics, vision.
- Hardware on demand.
- What is the initial unconfigured structure? What
are the reconfiguring methods?
- Software supporting time-varying mapping.
- Many open problems need to be solved in the
forthcoming years.
69Applications Splash 2
- Stream-oriented systolic and SIMD applications.
- Scalable linear array of 16 to 256 processing
elements (1 XC4010 with 1/2 Mbyte).
- VHDL based.
- Sequence comparison - 2300M / 0.75M cell
updates/sec (Splash 2 / Sparc 10).
- Edge detection - 10M / 242K pixels/sec (Splash
2 / Sparc 10).
70Applications PAM (DEC)
- Programmable Active Memory (PAM).
- C based and mesh arrays of XC3090 (DECPeRLe-1).
- Applications
- Multiple precision arithmetic.
- RSA encryption.
- Video compression (JPEG, MPEG, DCT).
- High energy physics.
- Telecommunications.
71Sources of some slides
- Peter Alfke
- Xilinx, Inc
- peter.alfke_at_xilinx.com