Title: Combining Simulators and FPGAs
1. Combining Simulators and FPGAs: An Out-of-Body Experience
- Eric S. Chung, Brian Gold, James C. Hoe, Babak Falsafi
- echung, bgold, jhoe, babak_at_ece.cmu.edu
SIMFLEX/PROTOFLEX
2. The RAMP full-system challenge
- RAMP vision for studying systems w/ FPGAs
  - functional & cycle-accurate simulation
  - scalability, speed, flexibility on FPGAs
  - full-system (run unmodified binaries & OS)
[Figure: full-system target with CPUs, memory, IRQ controller, I/O MMU controller, DMA controller, terminal, PCI bus, Ethernet controller, SCSI controller, graphics card, and disks; several blocks marked "?"]
Full-system RAMP will incur a large effort, yet not all behaviors are frequently used (e.g., I/O)
3. Combining simulators & FPGAs
- Simulators already provide full-system support
  - → why not simulate infrequent behaviors (e.g., I/O devices)?
[Figure: target split across two hosts, simulator and FPGA; labels: CPUs, memory, Ethernet, SCSI, disks]
- Advantages
  - avoid implementing infrequent behaviors → lowers full-system FPGA development effort
  - low impact on scalability & perf. on FPGA
4. Outline
- Motivation
- Migration
- Implementation status
- Conclusion
5. Migration
[Figure: target design mapped onto FPGA and simulator hosts; target objects, e.g., a functional or timing CPU model]
- 3 ways to map a target object to a host
  - FPGA-only, simulation-only, migratable
- Migratable objects
  - switch modes between FPGA & simulator hosts
  - target behavior need not be 100% implemented in FPGA mode
  - e.g., implement 80% of target behavior in FPGA, 100% in simulator
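The three mappings above can be sketched as a small data structure. This is only an illustration, not the PROTOFLEX implementation: the class names, the behavior names (including `rare_op`), and the policy of choosing the host per behavior are all assumptions.

```python
from enum import Enum


class Host(Enum):
    FPGA = "fpga"
    SIMULATOR = "simulator"


class Mapping(Enum):
    FPGA_ONLY = "fpga-only"
    SIMULATION_ONLY = "simulation-only"
    MIGRATABLE = "migratable"


class TargetObject:
    """A target component (e.g., a CPU) and the host it currently runs on."""

    def __init__(self, name, mapping, fpga_behaviors=()):
        self.name = name
        self.mapping = mapping
        # Behaviors the (hypothetical) FPGA implementation covers.
        self.fpga_behaviors = frozenset(fpga_behaviors)
        # Simulation-only objects start in the simulator; others on the FPGA.
        self.host = (Host.SIMULATOR if mapping is Mapping.SIMULATION_ONLY
                     else Host.FPGA)

    def host_for(self, behavior):
        """Return the host that executes `behavior`, migrating if needed."""
        if self.mapping is Mapping.SIMULATION_ONLY:
            return Host.SIMULATOR
        if self.mapping is Mapping.FPGA_ONLY:
            if behavior not in self.fpga_behaviors:
                raise NotImplementedError(f"{self.name}: {behavior} not on FPGA")
            return Host.FPGA
        # Migratable: stay on the FPGA when it implements the behavior,
        # otherwise fall back to the simulator's complete model.
        self.host = (Host.FPGA if behavior in self.fpga_behaviors
                     else Host.SIMULATOR)
        return self.host


# Hypothetical CPU whose FPGA mode covers only part of the target behavior.
cpu = TargetObject("cpu0", Mapping.MIGRATABLE, fpga_behaviors={"load", "add", "sub"})
common = cpu.host_for("add")      # implemented on the FPGA
rare = cpu.host_for("rare_op")    # not implemented, so handled by the simulator
```

Only the migratable mapping ever changes `host`; the other two mappings are fixed, which is why the FPGA-only case raises instead of falling back.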
6. Migration example
- Target-to-host mappings
  - CPU: migratable
  - Memory: FPGA-only
  - Devices: SW-only
[Figure: example CPU instruction stream over time (load, add, multiply, I/O SCSI cmd, add, sub, ..) with a CPU state transfer between FPGA and simulator; labels: CPU, memory, SCSI, disk]
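A minimal sketch of the example above, walking the slide's instruction stream and counting CPU state transfers. The set of FPGA-supported operations, the shape of the CPU state, and the policy of migrating back as soon as the next instruction is FPGA-supported are assumptions; the slides do not state when the CPU returns to the FPGA.

```python
import copy

# Assumed FPGA coverage: everything in the example stream except the I/O command.
FPGA_SUPPORTED = {"load", "add", "multiply", "sub"}


def run(stream):
    """Execute `stream`; return the (host, op) trace, migration count, and state."""
    host = "fpga"
    state = {"pc": 0, "regs": [0] * 8}   # stand-in for architectural CPU state
    trace, migrations = [], 0
    for op in stream:
        wanted = "fpga" if op in FPGA_SUPPORTED else "simulator"
        if wanted != host:
            state = copy.deepcopy(state)  # models the CPU state transfer
            host = wanted
            migrations += 1
        state["pc"] += 1                  # each op advances the toy program counter
        trace.append((host, op))
    return trace, migrations, state


stream = ["load", "add", "multiply", "io_scsi_cmd", "add", "sub"]
trace, migrations, state = run(stream)
```

Under these assumptions the I/O command is the only instruction executed in the simulator, and the stream costs two state transfers: one out to the simulator and one back.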
7. Advantages
- Lowers development effort
  - avoid bring-up of infrequent behaviors
  - migrate & validate ref. models from simulator
  - tailor impl. to workload (avoid rarely used instrs, good for CISC x86)
- Fast & scalable
  - perf-critical objects on FPGA (e.g., CPU, memory)
  - scalable for MPs → add migratable CPUs
[Figure: multiple migratable CPUs across FPGA and simulator hosts; labels: CPUs, memory, SCSI, disk]
8. Subtleties
- Objects separated in simulator/FPGA interact
  - examples: interrupts, DMA
  - handle by forwarding messages between FPGA/simulator
  - FPGA-only & SW-only mapped objects are easy to locate
  - migrated objects require tracking
[Figure: forwarded DMA between the SCSI device in the simulator and memory on the FPGA; labels: CPU, DMA, memory, SCSI, disk]
9. Subtleties
Option 1: Forwarded interrupt
Option 2: Forced migration
[Figure: interrupt from the SCSI device in the simulator to a CPU; labels: simulator, FPGA, CPU, interrupt, memory, SCSI, disk]
Cross-host interactions are rare → low impact on FPGA perf.
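The forwarding and tracking described above can be sketched with a small location table. This is a hypothetical illustration: the `Router` class, its method names, and the reading of "forced migration" as moving the CPU to the interrupting device's host are assumptions, not details from the talk.

```python
class Router:
    """Tracks which host each target object lives on and routes messages."""

    def __init__(self, locations):
        self.locations = dict(locations)   # object name -> "fpga" | "simulator"
        self.forwarded = 0                 # count of cross-host messages

    def migrate(self, obj, host):
        # Migrated objects must be re-tracked so later messages find them.
        self.locations[obj] = host

    def send(self, src, dst, msg):
        """Deliver msg; mark it as forwarded when it crosses hosts."""
        crosses = self.locations[src] != self.locations[dst]
        if crosses:
            self.forwarded += 1
        return {"to": dst, "host": self.locations[dst], "msg": msg,
                "forwarded": crosses}

    def interrupt(self, device, cpu, force_migration=False):
        """Option 1 forwards the interrupt; option 2 first moves the CPU."""
        if force_migration:
            self.migrate(cpu, self.locations[device])
        return self.send(device, cpu, "irq")


router = Router({"cpu0": "fpga", "memory": "fpga", "scsi": "simulator"})
dma = router.send("scsi", "memory", "dma_write")               # forwarded DMA
opt1 = router.interrupt("scsi", "cpu0")                        # forwarded interrupt
opt2 = router.interrupt("scsi", "cpu0", force_migration=True)  # forced migration
```

Fixed-mapping objects never change their table entry, so only `migrate` needs to update `locations`; that is the tracking cost the slide attributes to migrated objects.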
10. Subtleties (cont.)
- Migration cost
  - migrating an object requires a state copy
  - e.g., a migratable CPU has registers & TLBs
  - FPGA-to-simulator latency + sim. time limits migrations/instr
- FPGA & simulator asynchrony
  - simulated time ticks at different rates in FPGA & simulator
  - must synchronize for deterministic replay & accurate device timing
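A rough cost model shows why migration latency bounds the tolerable migrations per instruction. It is a back-of-the-envelope sketch, not a measurement: the 12 MIPS input is the worst-case BlueSPARC figure quoted elsewhere in this talk, while the 1 ms round-trip cost (state copy plus simulator time) and the 2x slowdown budget are purely hypothetical.

```python
def effective_mips(fpga_mips, migrations_per_instr, round_trip_s):
    """Average throughput when some fraction of instructions trigger a migration."""
    fpga_time = 1.0 / (fpga_mips * 1e6)              # seconds per FPGA instruction
    avg_time = fpga_time + migrations_per_instr * round_trip_s
    return 1.0 / avg_time / 1e6


def max_migration_rate(fpga_mips, round_trip_s, slowdown):
    """Largest migrations/instr that keeps the slowdown within `slowdown` x."""
    fpga_time = 1.0 / (fpga_mips * 1e6)
    return (slowdown - 1.0) * fpga_time / round_trip_s


ROUND_TRIP_S = 1e-3   # hypothetical cost of one migration out and back
rate = max_migration_rate(12.0, ROUND_TRIP_S, slowdown=2.0)
halved = effective_mips(12.0, rate, ROUND_TRIP_S)
```

With these assumed numbers, a 2x slowdown budget allows roughly one migration per 12,000 instructions, which is why migrations have to be reserved for rare behaviors.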
11. Outline
- Motivation
- Migration
- Implementation in progress
- Conclusion
12. Implementation status
- Target system
  - Sun Fire™ 3800 Server (up to 24-way)
  - UltraSPARC III ISA
  - Solaris 8
- Proof-of-concept software-to-software migration
  - run 2 instances of Virtutech Simics
  - migration designed & tested in 2 weeks
  - can migrate on arbitrary behavior (e.g., ADD instruction)
13. BlueSPARC core (in progress)
- In-order SPARC V9 core
  - supports 144 out of 170 integer instr. behaviors
  - supports partial MMU w/ I- & D-TLBs
  - goal: 99.999% of instr. behaviors in target workloads
  - SPEC (mostly user-level), OLTP/DB2 (high TLB misses, 40% of time in priv. mode)
  - CPI ranges from 5 to 7 cycles
  - synthesizes to 15k LUTs on Virtex-II Pro 30, 85 MHz, 12 MIPS (worst-case)
  - developed in Bluespec HDL, 6000 lines in 6 weeks
- Core validation
  - run RTL in lockstep w/ Simics's UltraSPARC simulation model
  - workload validation w/ SPEC, OLTP/DB2, OpenSPARC verif. suite
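Lockstep validation, as listed above, amounts to stepping the design under test and a reference model together and stopping at the first divergence. The sketch below uses toy step functions over a `(pc, accumulator)` state; the function names and the state layout are hypothetical and do not reflect the Simics or BlueSPARC interfaces.

```python
def lockstep(dut_step, ref_step, dut_state, ref_state, max_steps):
    """Step both models together; return the first mismatching step, else None."""
    for i in range(max_steps):
        dut_state = dut_step(dut_state)
        ref_state = ref_step(ref_state)
        if dut_state != ref_state:
            return i, dut_state, ref_state
    return None


# Toy models: state is (pc, accumulator). Both are hypothetical stand-ins.
def ref_step(state):
    pc, acc = state
    return pc + 4, acc + pc


def buggy_step(state):
    pc, acc = state
    # Deliberate bug: skips the accumulate for the instruction at pc == 8.
    return pc + 4, acc if pc == 8 else acc + pc


clean = lockstep(ref_step, ref_step, (0, 0), (0, 0), max_steps=10)
mismatch = lockstep(buggy_step, ref_step, (0, 0), (0, 0), max_steps=10)
```

Comparing after every step pins a bug to the exact instruction that introduced it, rather than to a much later visible failure.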
14. Migration on FPGA (in progress)
[Figure: Virtutech Simics host (Simics UltraSPARC, simulated target devices) and Xilinx XUP Virtex-II Pro 30 board (BlueSPARC, PowerPC, DDR memory, Ethernet), linked by a migration message interface]
- PowerPC functions
  - core & memory initialization from Simics checkpoints
  - facilitates migration for BlueSPARC
  - connects simulated devices to memory (e.g., SCSI DMA)
15. Conclusion
- Contributions
  - virtualizes infrequent behaviors using simulation
  - simplifies full-system FPGA emulator, still fast/scalable
  - incremental validation from reference system
- Future work
  - support migration in RDL?
  - adding cores & scaling across multiple FPGAs
  - We are ready for BEE2
- Thanks! Questions? echung_at_ece.cmu.edu
- PROTOFLEX/SIMFLEX (http://www.ece.cmu.edu/simflex)