Title: The Berkeley Emulation Engine
1The Berkeley Emulation Engine
- Chen Chang
- Kimmo Kuusilinna
- Robert W. Brodersen
- Berkeley Wireless Research Center
- April 23rd, 2002
2What's BEE?
- A real-time hardware emulator built from 20
high-density Field Programmable Gate Arrays
(FPGAs).
- Emulation capacity of 10 million ASIC
gate-equivalents per module, corresponding to
600 billion operations (16-bit adds) per second.
- Realistic emulation speed: 1 to 100 MHz
- 2400 external I/O for add-ons, like radios.
- Automated design flow from Simulink to FPGA
emulation, integrated with the Chip-in-a-Day ASIC
design flow.
3The Hive
[Figure: the Hive. Labels: Analog Front-end, Network, LVDS, Dedicated Ethernet, Integrated Design Flow, FPGA Bit Stream Conf File, Simulink MDL, ASIC Layout]
4Applications
- Real-time hardware emulation
- Novel communication systems with analog front-end
hardware (MCMA, UWB, 60 GHz)
- Digital signal processing systems
- Real-time control systems
- Neuron-like network processing
- Hardware acceleration
- Large communication/signal processing system
simulation
- Hardware-in-the-loop cosimulation with software
system
- Complex parallel computing algorithms
5System Architecture
- Processing Board
- Total of 20 Xilinx Virtex-E 2000 chips: 16 on a
first-level processing mesh, 4 on a second-level mesh.
- 16 ZBT SRAM chips, 1 MB each.
- Control module
- Intel StrongARM 1110, on-board 10Base-T
Ethernet, Linux OS
- Radio Rx/Tx Front-End
- 2.4 GHz transceiver, ultra-wideband transceiver
- Design Flow
- Integrated, automatic design flow from Simulink
to implementation (ASIC/FPGA).
6Processing Board Architecture
[Figure: processing board architecture. Label: 48-bit buses]
7Processing Board PCB
- Board Dimension: 53 x 58 cm
- Layout Area: 427 sq. in.
- No. of Layers: 26
- Technology: 4 mil trace
- Manhattan Distance: 45,950 in.
- Etch Length: 51,804 in.
- No. of Vias: 32,334
- Pin Count: 28,611
- No. of Nets: 8,493
- No. of Connections: 19,877
- Total Components: 3,400
- Bypass Capacitors: 2,40
8Processing Board
9Processing Board
10Off Module Riser I/O Cards
- 68-pin HD SCSI connectors (48 signals per connector)
- Source (Xilinx only)
- LVDS termination resistor arrays
- Destination
- 400-pin VHDM-HSD right-angle PCB connector
11Riser I/O Card PCB: The Bare Board
- 12 Layer PCB
- 5 Mil Trace
- 300 (150 pairs) active signals
- 36 termination resistor array packs
- Analog/Digital Power Connector (5A each)
12Riser I/O Card Connected to BPU
13Controller Module
- 206MHz StrongARM 1110 Processor
- 32MB SDRAM
- 16MB Flash ROM
- 10Base-T Ethernet with RJ45 jack
- Compact Flash slot for expandability
- Linux Kernel 2.4 as OS
- Remote FPGA configuration and read-back through
GPIO
14Power Distribution System
- Max Input Power: 800 W
- Max Processing Power: 600 W
- Max Current: 600 A
15Power Board
16Power Board PCB: The Bare Board
- 6 Layer PCB
- 25 Mil Trace
- 10 DC-DC converters (max 60A output current each)
- 25 Terminal Blocks (max 60A current per circuit)
17Power Board PCB connected to BPU
18BEE Hardware Performance
- Board-level main clock rate: 160 MHz
- On-board connection speed
- FPGA to FPGA: 100 MHz
- XBAR to XBAR: 70 MHz
- Off-board connection speed (3 ft SCSI cable,
loopback through riser card)
- LVTTL: 40 MHz
- LVDS: 160 to 220 MHz
19BEE Hardware Capacity
- Reference Design
- 10240-tap FIR filter
- 512 taps per FPGA
- Slice utilization: 99% of 19200 slices
- Max clock rate: 28.5 MHz
- ASIC gates: 401K per FPGA, 8M total
- MOPS: 583,680 total (16-bit add and 12-bit cmult)
- Power: 2.5 W per FPGA, 50 W total
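The totals on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the one assumption is `OPS_PER_TAP = 2`, i.e. that each tap contributes one 16-bit add and one 12-bit constant multiply per clock cycle, which is the reading under which the quoted MOPS figure is reproduced. The MOPS-per-watt value is derived here and does not appear on the slide.

```python
# Figures quoted on the "BEE Hardware Capacity" slide.
NUM_FPGAS = 20
TAPS_PER_FPGA = 512
MAX_CLOCK_MHZ = 28.5
GATES_PER_FPGA = 401_000
POWER_PER_FPGA_W = 2.5

# Assumption (not stated explicitly on the slide): every tap performs
# one 16-bit add and one 12-bit constant multiply per clock cycle.
OPS_PER_TAP = 2

total_taps = NUM_FPGAS * TAPS_PER_FPGA                   # taps in the whole filter
total_mops = total_taps * OPS_PER_TAP * MAX_CLOCK_MHZ    # million operations per second
total_gates = NUM_FPGAS * GATES_PER_FPGA                 # ASIC gate-equivalents
total_power_w = NUM_FPGAS * POWER_PER_FPGA_W             # watts
mops_per_watt = total_mops / total_power_w               # derived, not on the slide

print(total_taps, total_mops, total_gates, total_power_w, mops_per_watt)
```

Under that assumption the computed tap count, MOPS, and power match the slide, and the gate total comes to about 8.02 million, which the slide rounds to 8M.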
20 10240-Tap FIR Design
21 10240-Tap FIR Design (cont.)
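The design diagrams for these slides did not survive extraction, so the following is only a sketch of one common way a long FIR filter can be split across several chips; it is not taken from the slides. It assumes each chip holds a contiguous segment of the taps, sees the input delayed by the number of taps held by the chips before it, and contributes a partial sum that is added to the others. The function names and the scaled-down sizes (20 chips with 2 taps each instead of 512) are illustrative.

```python
def fir(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with x[n < 0] = 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        out.append(acc)
    return out


def partitioned_fir(samples, taps, num_chips):
    """Same filter, computed as the sum of num_chips shorter filters."""
    seg = len(taps) // num_chips
    assert seg * num_chips == len(taps), "taps must divide evenly across chips"
    total = [0] * len(samples)
    for chip in range(num_chips):
        chip_taps = taps[chip * seg:(chip + 1) * seg]
        # Each chip sees the input delayed by the taps handled upstream.
        delay = chip * seg
        delayed = ([0] * delay + samples)[:len(samples)]
        partial = fir(delayed, chip_taps)
        total = [t + p for t, p in zip(total, partial)]
    return total


# Scaled-down stand-in for the 20 x 512 tap design: 20 chips, 2 taps each.
samples = list(range(1, 101))
taps = [(k % 7) - 3 for k in range(40)]
matches = partitioned_fir(samples, taps, 20) == fir(samples, taps)
print(matches)
```

Integer data is used so that the partitioned and monolithic results can be compared for exact equality.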
22BEE Design flow Goals
- Automatic generation of FPGA bit streams and
inter-chip place-and-route configuration from
Simulink system-level design
- Full integration with Chip-in-a-Day flow
- Cycle-accurate, bit-true functional-level
equivalence between ASIC and BEE implementations
23Integrated design flow overview
- Design Flow Structure
- Design flow view
- Simulation/verification view
- Library development view
- Component Library
- Control Logic Design
24Design Flow View
25BEE Post Design Processes
26Simulation/Verification View
27Library Development View
28Virtual Component Library
- Parameterized system-level blocks
- Bit-width
- Pipeline stages (latency)
- Output bits truncation
- Customizable block set library
- Different Architecture
- Different Technology Target
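To make the three block parameters concrete, here is a minimal behavioral sketch of a parameterized adder block. It is not code from the BEE library: the class name `AddBlock`, the unsigned arithmetic, and the choice to drop least-significant bits when truncating are all assumptions made for illustration.

```python
from collections import deque


class AddBlock:
    """Hypothetical cycle-based model of a parameterized adder block."""

    def __init__(self, bit_width, pipeline_stages=1, truncate_bits=0):
        self.bit_width = bit_width          # width of each (unsigned) input
        self.truncate_bits = truncate_bits  # LSBs dropped from the result
        # Pre-filled with zeros so a result appears pipeline_stages cycles later.
        self.pipe = deque([0] * pipeline_stages)

    def step(self, a, b):
        """Advance one clock cycle and return the value leaving the pipeline."""
        mask = (1 << self.bit_width) - 1
        full = (a & mask) + (b & mask)       # up to bit_width + 1 bits
        self.pipe.append(full >> self.truncate_bits)
        return self.pipe.popleft()


# 8-bit inputs, two pipeline stages, one output bit truncated.
block = AddBlock(bit_width=8, pipeline_stages=2, truncate_bits=1)
outputs = [block.step(a, b) for a, b in [(10, 20), (200, 100), (0, 0), (0, 0)]]
print(outputs)
```

The first two outputs are the zeros that initially fill the pipeline; the sums of the first two input pairs, each shifted right by one bit, follow two cycles later.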
29Basic Block Sets
30DSP/COM Block Set
31Interface Control Blocks
32Custom-built Library
- Hand-written VHDL for ASIC/FPGA
- Synopsys Module Compiler implementation for
ASIC/FPGA
- Matlab-script-generated regular Simulink
structures, e.g. FFT, FIR, CORDIC
[Figure: custom-built library block. Labels: VHDL, Subsystem, Black Box, Native Simulink Blocks, I/O Ports]
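The idea of generating a regular structure from a script can be sketched as follows. The slide refers to Matlab scripts that build Simulink models; this Python sketch makes no Simulink calls and only produces a list of blocks and wires for a direct-form FIR filter. The block types (`Delay`, `Gain`, `Add`) and the instance names are hypothetical.

```python
def generate_fir_structure(num_taps):
    """Return (blocks, wires) describing a direct-form FIR with num_taps >= 1 taps.

    blocks: list of (block_type, instance_name)
    wires:  list of (source_name, destination_name)
    """
    blocks, wires = [], []
    prev_sample = "in"   # output of the most recent delay element
    prev_sum = None      # output of the most recent adder
    for k in range(num_taps):
        if k > 0:
            delay = f"delay{k}"
            blocks.append(("Delay", delay))
            wires.append((prev_sample, delay))
            prev_sample = delay
        gain = f"gain{k}"                     # coefficient multiplier for tap k
        blocks.append(("Gain", gain))
        wires.append((prev_sample, gain))
        if prev_sum is None:
            prev_sum = gain
        else:
            add = f"add{k}"
            blocks.append(("Add", add))
            wires.append((prev_sum, add))
            wires.append((gain, add))
            prev_sum = add
    wires.append((prev_sum, "out"))
    return blocks, wires


blocks, wires = generate_fir_structure(4)
for block_type, name in blocks:
    print(block_type, name)
```

Because the structure is produced by a loop, changing the tap count is a one-argument change rather than a manual redraw, which is the appeal of script-generated blocks for regular designs such as FIR filters.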
33Control Logic Design
- Simulink-level StateFlow diagram
- The SF2VHD program automatically generates VHDL
code from the StateFlow diagram
[Figure: control logic flow. Labels: StateFlow, SF2VHD, VHDL, Black Box, Control Signals, Controller]
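As a rough illustration of turning a state diagram into VHDL, the sketch below emits a simple clocked state machine from a transition table. It is not SF2VHD and does not reproduce its input format or output; the function `fsm_to_vhdl`, the table layout, the port names, and the example states are all made up for this sketch, and each transition is assumed to be triggered by a single active-high input signal.

```python
def fsm_to_vhdl(name, states, transitions, reset_state):
    """Emit VHDL for a state machine.

    transitions: list of (source_state, condition_signal, destination_state);
    a transition is taken when its condition signal is '1'.
    """
    inputs = sorted({cond for _, cond, _ in transitions})
    lines = [
        "library ieee;",
        "use ieee.std_logic_1164.all;",
        "",
        f"entity {name} is",
        "  port (",
        "    clk, rst : in std_logic;",
    ]
    lines += [f"    {sig} : in std_logic;" for sig in inputs]
    lines += [
        f"    state_id : out integer range 0 to {len(states) - 1}",
        "  );",
        f"end entity {name};",
        "",
        f"architecture rtl of {name} is",
        f"  type state_t is ({', '.join(states)});",
        f"  signal state : state_t := {reset_state};",
        "begin",
        "  process (clk)",
        "  begin",
        "    if rising_edge(clk) then",
        "      if rst = '1' then",
        f"        state <= {reset_state};",
        "      else",
        "        case state is",
    ]
    for s in states:
        lines.append(f"          when {s} =>")
        outgoing = [(c, d) for src, c, d in transitions if src == s]
        if not outgoing:
            lines.append("            null;")
            continue
        for i, (cond, dst) in enumerate(outgoing):
            keyword = "if" if i == 0 else "elsif"
            lines.append(f"            {keyword} {cond} = '1' then")
            lines.append(f"              state <= {dst};")
        lines.append("            end if;")
    lines += [
        "        end case;",
        "      end if;",
        "    end if;",
        "  end process;",
        "  state_id <= state_t'pos(state);",
        "end architecture rtl;",
    ]
    return "\n".join(lines)


vhdl = fsm_to_vhdl(
    "controller",
    ["IDLE", "LOAD", "RUN"],
    [("IDLE", "start", "LOAD"), ("LOAD", "loaded", "RUN"), ("RUN", "done", "IDLE")],
    reset_state="IDLE",
)
print(vhdl)
```

A real translator would also need to handle state actions, guards on data values, and hierarchical states, none of which this sketch attempts.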
34Design Example Data Flow
35Control in StateFlow
36Generating Logic in SystemGenerator
[Figure: generated logic. Labels: the AND gate; inputs a0, b0, a1, b1, a10, b10; two MUX blocks; OUTPUT0, OUTPUT1; a_valid, b_valid, VCC, valid_out]
37Design Example Area Estimate
- >> gcs
- ans = my_addreg
- >> [slices,luts,ffs,brams] = bee_sys_area(gcs)
- Loading user preferences...
- Creating Coregen.log file...
- Initializing default project...
- Loading plug-ins...
- Loading settings from "C:\kimmo\SysGen\my_addreg\corework\coregen.ini"
- Finished processing "C:\kimmo\SysGen\my_addreg\corework\coregen.ini"
- Set current Project to C:\kimmo\SysGen\my_addreg\corework
- Loading settings from "./\coregen.ini"
- Finished processing "./\coregen.ini"
- Preparing to elaborate core...
- Elaborating the module...
- Generating the core .EDN implementation netlist...
- Generating the .VEO/.V simulation support files...
- Successfully generated my_addreg_xladdsub_core1 (Adder Subtracter 5.0)
- slices = 13
- luts = 14
- ffs = 12
- brams = 0
38Design Example FPGA Flow GUI
39Design Example FPGA Design
40Design Example ASIC Flow GUI
41 Design Example ASIC Gates
Flattened design, but the Valid signals are still
in.
42 Design Example ASIC Gates
Flattened design, and the design is optimized for
the case where the valid signals are always
valid.
43Design Example ASIC Layout (route)
44Current Status
- Software
- Database and scripting based PCB design /
verification method
- Simulink-to-implementation design flow
integration for BEE and ASICs
- High-level performance estimators
- Hardware
- One completely assembled BPU system
- Function/Speed/Capacity initial test complete
- Documentation
- Online Interactive HTML doc of the entire system
45Immediate to do list
- User Interface
- Complete remote control through Ethernet
- Hardware-in-the-loop simulation acceleration with
Simulink and BEE
- Software
- ASIC implementation of the virtual component
library using Module Compiler
- ASIC backend flow and high-level control
integration