Title: Hardwired networks on chip for FPGAs and their applications
Slide 1: Hardwired networks on chip for FPGAs and their applications
Kees Goossens (TU Delft, NXP), Muhammad Aqeel Wahlah (TU Delft)
Slide 2: overview
- applications
- network on chip
- FPGA
- key ideas
- hardwired NOC
- unified interconnect
- data coercion / type casting
- application dynamic partial reconfiguration
- multiple concurrent applications
- multiplex sub-applications (hardware tasks)
- example
- conclusions
Slide 3: applications
- task / function mapped on IP
- includes local storage / buffering
- application: set of communicating IPs / tasks / ...
- data, control, code
- communication via connections
- use case: set of concurrent applications
Slide 4: network on chip (NOC)
- connects ports on hardware blocks (IP)
- data, control
- connections (virtual wires)
- real-time / quality of service
- programmable at run-time
- set up / remove connections by programming control registers in the NOC
- styles of communication
- address-based / memory-mapped
- streaming
[Figure: IPs connected to the NOC through network interfaces (NI) and routers (R); labels T1-T3, A1, A2, BA, BAC]
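To make "set up / remove connections by programming control registers" concrete, here is a minimal Python sketch of a network interface whose connections exist only as register contents. The register names, offsets and the `NetworkInterface` class are hypothetical; the slides do not describe the actual NI register map.

```python
from dataclasses import dataclass, field

# Hypothetical per-connection register map (not the real NI layout).
REG_REMOTE_NI = 0x0    # which NI the connection leads to
REG_REMOTE_PORT = 0x4  # which port on that NI
REG_ENABLE = 0x8       # 1 = connection open, 0 = closed
CONN_STRIDE = 0x10     # bytes of register space per connection


@dataclass
class NetworkInterface:
    """Toy NI whose connections are set up and removed purely by MMIO writes."""
    name: str
    num_connections: int
    regs: dict = field(default_factory=dict)  # address -> value

    def mmio_write(self, addr: int, value: int) -> None:
        self.regs[addr] = value

    def mmio_read(self, addr: int) -> int:
        return self.regs.get(addr, 0)

    def _base(self, conn: int) -> int:
        if not 0 <= conn < self.num_connections:
            raise ValueError(f"connection {conn} out of range")
        return conn * CONN_STRIDE

    def open_connection(self, conn: int, remote_ni: int, remote_port: int) -> None:
        base = self._base(conn)
        # Program the destination before setting the enable bit, so that a
        # connection is never live while only half-programmed.
        self.mmio_write(base + REG_REMOTE_NI, remote_ni)
        self.mmio_write(base + REG_REMOTE_PORT, remote_port)
        self.mmio_write(base + REG_ENABLE, 1)

    def close_connection(self, conn: int) -> None:
        self.mmio_write(self._base(conn) + REG_ENABLE, 0)

    def is_open(self, conn: int) -> bool:
        return self.mmio_read(self._base(conn) + REG_ENABLE) == 1


ni = NetworkInterface("NI0", num_connections=4)
ni.open_connection(2, remote_ni=3, remote_port=1)
print(ni.is_open(2))  # True
ni.close_connection(2)
print(ni.is_open(2))  # False
```

Because a connection is nothing more than a few register values, removing it and reusing the slot for another application is a handful of writes rather than a new bitstream.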
Slide 5: FPGA fabric
- soft IP are configured in
- configurable elements (LUTs)
- and switch boxes (not shown)
- with a given configuration granularity (frame)
- using the configuration interconnect (ICAP)
- hard IP
- CPU
- on-chip memories (BRAM, ...)
- off-chip memory interfaces
- decryption IP
- etc.
[Figure: FPGA fabric with LUTs, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP]
configuration: loading bitstreams; programming / control: setting MMIO registers; Xilinx terminology (frames, ICAP, etc.)
Slide 6: application on FPGA
- design an application as for an ASIC
- IPs, interconnect, storage, software
- but map it onto soft and hard IP resources
- traditionally, separate soft data and control interconnects
- could also use a soft NOC for both
[Figure: applications A1 and A2 on LUT frames, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP, connected by a soft data interconnect and a soft control interconnect; labels BA, BAC]
Slide 7: multiple applications on FPGA
- interconnects and IPs of different applications share reconfiguration regions (frames)
- dynamic reconfiguration is global, not partial
[Figure: tasks T1-T3 of applications A1 and A2 on LUTs, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP, with soft data and control interconnects; labels BA, BAC]
Slide 8: overview
- application
- network on chip
- FPGA
- key ideas
- hardwired NOC → improved performance/cost
- unified interconnect → flexibility
- data coercion / type casting → cool (and useful) applications
- application dynamic partial reconfiguration
- multiple concurrent applications
- multiplex sub-applications (hardware tasks)
- example
- conclusions
Slide 9: 1. hardwired interconnect
- replace soft interconnect(s) by hard interconnect(s)
- connect reconfigurable regions of LUTs (CFR)
- bit-level reconfigurability (CFR)
- switch boxes
- transaction-level reconfigurability (NOC)
- routers, NIs
- memory-mapped / streaming
- Hecht FPL05
[Figure: hard interconnect(s) linking CFRs, CPU, IO processor, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP; labels A1, A2, T1-T3, BA, BAC]
Slide 10: 1. hardwired interconnect
- 35× smaller area
- 3.5× higher speed
- 150× better performance/cost ratio (bits/sec/area)
- 200× smaller configuration footprint (program MMIO registers, no bitstream)
- 200× faster soft IP load / boot
- dynamic partial reconfiguration
- no constraints on soft IP placement due to communication
- loss of flexibility
- fewer LUTs
- CFR frame ? 7 hard NOC
- based on Virtex-4 Aethereal NOC, Goossens NOCS08
[Figure: hard interconnect(s) linking CFRs, CPU, IO processor, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP; labels C1-C3, T1-T3, BAC]
Slide 11: performance/cost
- essentially, it all depends on
- area, soft:hard = 35:1
- speed, hard:soft = 3.5:1
- configuration footprint of soft NOC (bitstream) : programming footprint of hard NOC (MMIO registers) = 214:1
- resulting in
- boot time, soft:hard = 200:1
- functional performance/cost (bit/sec/area), hard:soft = 147:1
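The slide defines functional performance/cost as bit/sec/area, so the hard-over-soft advantage is a bandwidth ratio multiplied by an area ratio. The sketch below assumes the 118 MHz and 500 MHz NOC clocks from slide 12 as a proxy for bit/s (same word width for both) together with the 35:1 area ratio; the slides do not state which inputs produced their figure, so treat this as one illustrative combination.

```python
def perf_per_cost(bits_per_sec: float, area: float) -> float:
    """Functional performance/cost as defined on the slide: bit/sec/area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return bits_per_sec / area


def hard_over_soft(perf_hard: float, perf_soft: float,
                   area_hard: float, area_soft: float) -> float:
    """How many times better the hard NOC's bit/sec/area is than the soft NOC's."""
    return perf_per_cost(perf_hard, area_hard) / perf_per_cost(perf_soft, area_soft)


# Assumed inputs: relative areas 35:1; clock frequencies used as a bit/s proxy.
AREA_SOFT, AREA_HARD = 35.0, 1.0
F_SOFT_HZ, F_HARD_HZ = 118e6, 500e6

ratio = hard_over_soft(F_HARD_HZ, F_SOFT_HZ, AREA_HARD, AREA_SOFT)
print(f"{ratio:.1f}")  # 148.3
```

With these assumptions the result lands in the range of the roughly 150× quoted on slide 10.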
Slide 12: performance/cost
- configuration speed
- 1.9 Gb/s for dedicated configuration interconnect (ICAP)
- 8 Gb/s for hard NOC
- programming speed
- 118 MHz soft NOC
- 500 MHz hard NOC
- configuration footprint for soft NOC
- 1.8 Mb (8300 LUTs per router + NI)
- programming footprint for hard NOC
- 2100 bits per connection
- thus, to configure / program an NI
- 1 msec for soft NOC
- 10.6 µsec for hard NOC
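If configuration time is taken as footprint divided by configuration bandwidth (an assumption; the slides do not say how the times were obtained), the soft-NOC numbers above reproduce the quoted ~1 msec. The hard-NOC case is not computed here, because only the per-connection programming footprint is given, not the footprint per NI.

```python
def transfer_time_s(footprint_bits: float, bandwidth_bps: float) -> float:
    """Lower-bound time to push a configuration or programming image over a link."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return footprint_bits / bandwidth_bps


SOFT_NI_BITSTREAM_BITS = 1.8e6  # 1.8 Mb per router + NI (slide 12)
ICAP_BPS = 1.9e9                # 1.9 Gb/s configuration interconnect (slide 12)

t_soft = transfer_time_s(SOFT_NI_BITSTREAM_BITS, ICAP_BPS)
print(f"soft NOC NI: {t_soft * 1e3:.2f} ms")  # soft NOC NI: 0.95 ms
```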
Slide 13: 2. unified interconnect
- one interconnect (e.g. NOC) for
- data for functional mode
- control for programming
- bitstreams for configuration
- dynamic partitioning of different interconnects
[Figure: single hard interconnect linking CFRs, CPU, IO processor, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP; labels A1, A2, T1-T3, BA, BAC]
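The slides leave open how one interconnect is dynamically partitioned between data, control and bitstream traffic. Aethereal, on which the hard NOC is based, shares links through TDM slot tables, so one plausible sketch, under that assumption, reserves slots per traffic class and re-partitions the table between phases. `partition_slots` and the class names are hypothetical.

```python
from collections import Counter

TRAFFIC_CLASSES = ("data", "control", "bitstream")


def partition_slots(num_slots: int, demand: dict) -> list:
    """Build a TDM slot table: entry i names the traffic class owning slot i.

    demand maps traffic class -> number of slots; None marks an unreserved slot.
    Classes take turns (round-robin) while several still need slots, rather
    than each class getting one contiguous block.
    """
    unknown = set(demand) - set(TRAFFIC_CLASSES)
    if unknown:
        raise ValueError(f"unknown traffic classes: {sorted(unknown)}")
    if any(n < 0 for n in demand.values()):
        raise ValueError("slot counts must be non-negative")
    if sum(demand.values()) > num_slots:
        raise ValueError("demand exceeds slot table size")
    table = [None] * num_slots
    remaining = dict(demand)
    slot = 0
    while any(remaining.values()):
        for cls in TRAFFIC_CLASSES:
            if remaining.get(cls, 0) > 0:
                table[slot] = cls
                remaining[cls] -= 1
                slot += 1
    return table


# Boot phase: mostly bitstreams. Functional phase: mostly data.
boot = partition_slots(8, {"data": 1, "control": 1, "bitstream": 6})
functional = partition_slots(8, {"data": 6, "control": 2, "bitstream": 0})
print(Counter(boot))
print(Counter(functional))
```

Re-partitioning is then just writing a new table, which fits the slides' point that the hard NOC is programmed through registers rather than reconfigured.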
Slide 14: 3. data coercion
- data / control / bitstream / test
- connect a data port to a configuration port
- decrypt bitstreams
[Figure: single hard interconnect carrying data and bitstream traffic between CPU, IO processor, de/encrypt accelerator, off-chip memory, on-chip memories and CFRs]
Slide 15: 3. data coercion
- data / control / bitstream / test
- connect a data port to a configuration port
- decrypt bitstreams
- relocate bitstreams
- run-time compute / optimise bitstreams
- JIT, peephole
[Figure: single hard interconnect carrying a bitstream between CPU, IO processor, de/encrypt accelerator, off-chip memory, on-chip memories and CFRs; labels PH, IP]
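Data coercion means a bitstream is just data on a streaming connection, so processing stages can sit between memory and the configuration port. The sketch chains two such stages as Python generators. The frame format `(address, payload)`, the XOR "decryption" (a toy stand-in, not a real cipher) and relocation by a frame-address offset are all assumptions for illustration; real bitstreams carry more structure and need not be position-independent.

```python
from typing import Iterable, Iterator, Tuple

Frame = Tuple[int, bytes]  # (frame address, frame payload): hypothetical format


def toy_decrypt(frames: Iterable[Frame], key: int) -> Iterator[Frame]:
    """Stand-in for the de/encrypt accelerator: XOR each payload byte with key.
    NOT a real cipher; it only shows the bitstream flowing through as plain data."""
    for addr, payload in frames:
        yield addr, bytes(b ^ key for b in payload)


def relocate(frames: Iterable[Frame], frame_offset: int) -> Iterator[Frame]:
    """Move a partial bitstream to another region by rewriting frame addresses.
    Assumes position-independent payloads, which real bitstreams may not be."""
    for addr, payload in frames:
        yield addr + frame_offset, payload


def config_port(frames: Iterable[Frame]) -> dict:
    """Sink standing in for the configuration port: records the frames written."""
    return {addr: payload for addr, payload in frames}


encrypted = [(10, bytes([0x5A ^ 0x3C, 0x00 ^ 0x3C])), (11, bytes([0xFF ^ 0x3C]))]
written = config_port(relocate(toy_decrypt(encrypted, key=0x3C), frame_offset=100))
print(sorted(written))  # [110, 111]
```

Each stage only sees a stream of frames, so it could equally be a hard IP (the decryption block) or soft IP in a CFR.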
Slide 16: 3. data coercion
- data / control / bitstream / test
- connect a data port to a configuration port
- decrypt bitstreams
- relocate bitstreams
- run-time compute / optimise bitstreams
- JIT, peephole
- data port to test port (NOC as test access mechanism, TAM)
- on-line (structural) testing
- on-chip test-vector generation
[Figure: single hard interconnect carrying a bitstream between CPU, IO processor, de/encrypt accelerator, off-chip memory, on-chip memories and CFRs; labels PH, IP]
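The slides do not say how test vectors are generated on chip. A common built-in self-test technique is a maximal-length linear-feedback shift register (LFSR), shown here as a swapped-in example; its output could be streamed over the NOC to a test port like any other data.

```python
from itertools import islice
from typing import Iterator


def lfsr16(seed: int = 0xACE1) -> Iterator[int]:
    """16-bit Fibonacci LFSR, polynomial x^16 + x^14 + x^13 + x^11 + 1 (maximal length)."""
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        yield state
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)


def period(seed: int = 0xACE1) -> int:
    """Number of distinct vectors before the sequence repeats."""
    gen = lfsr16(seed)
    first = next(gen)
    for n, value in enumerate(gen, start=1):
        if value == first:
            return n


vectors = list(islice(lfsr16(), 4))  # first test vectors to stream to a test port
print([hex(v) for v in vectors])
print(period())  # 65535
```

A maximal-length register cycles through all 65535 non-zero 16-bit values, which is why so little hardware can produce long pseudo-random test sequences.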
Slide 17: overview
- applications
- network on chip
- FPGA
- key ideas
- hardwired NOC
- unified interconnect
- data coercion / type casting
- application dynamic partial reconfiguration
- multiple concurrent applications
- multiplex sub-applications (hardware tasks)
- example
- conclusions
Slide 18: dynamic partial reconfiguration idea
- hardware operating system implements run-time scheduling of
- multiple concurrent applications
- independent applications, each on its own virtual platform
- no communication, no interference
- performance virtualisation
- activation given by user, environment, etc.
[Figure: applications T, D, A and C activated over time]
Slide 19: dynamic partial reconfiguration idea
- hardware operating system implements run-time scheduling of
- multiple concurrent applications
- parts of single applications (soft IP, hardware tasks)
- multiplex parts of a single application on the same resources
[Figure: applications T and D, or sub-applications A (parts A1, A2, BA) and C (parts C1-C3), multiplexed over time]
Slide 20: dynamic partial reconfiguration idea
- hardware operating system implements run-time scheduling of
- multiple concurrent applications
- parts of single applications (soft IP, hardware tasks)
- multiplex parts of a single application on the same resources
- internal state
[Figure: applications T and D and sub-applications A and C over time; label: state]
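Multiplexing sub-applications on the same resources only works if internal state survives reconfiguration. A minimal sketch, assuming a single region and a toy integer state (`SubApp` and `Region` are hypothetical names): the outgoing sub-application's state is saved before the region is reloaded and restored when that sub-application returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubApp:
    name: str
    state: int = 0  # toy internal state: a running counter

    def run(self, steps: int) -> None:
        self.state += steps


@dataclass
class Region:
    """One reconfigurable region (CFR) time-multiplexed between sub-applications."""
    loaded: Optional[SubApp] = None
    saved: Dict[str, int] = field(default_factory=dict)  # state kept outside the region
    log: List[str] = field(default_factory=list)

    def switch_to(self, sub: SubApp) -> None:
        if self.loaded is sub:
            return
        if self.loaded is not None:
            # Save the outgoing sub-application's state; reconfiguring the
            # region would otherwise overwrite it.
            self.saved[self.loaded.name] = self.loaded.state
            self.loaded.state = 0
            self.log.append(f"save {self.loaded.name}")
        # Stand-in for bitstream load, clock and reset, then state restore.
        sub.state = self.saved.get(sub.name, 0)
        self.log.append(f"load {sub.name}")
        self.loaded = sub


region = Region()
a, c = SubApp("A"), SubApp("C")
for sub, steps in [(a, 3), (c, 5), (a, 2)]:
    region.switch_to(sub)
    sub.run(steps)
print(a.state, region.log)
# 5 ['load A', 'save A', 'load C', 'save C', 'load A']
```

Sub-application A resumes from 3 rather than 0, which is exactly what the save/restore adds over a plain reload.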
Slide 21: dynamic partial reconfiguration implementation
- system manager
- resource management (CFR, NOC, memory, ...)
- inter-application virtual platforms
[Figure: system manager above application managers for application T and for sub-applications A and C (BAC), over time]
Slide 22: dynamic partial reconfiguration implementation
- system manager
- resource management (CFR, NOC, memory, ...)
- inter-application virtual platforms
- intra-application phases
- NOC programming
- soft IP / (sub-)application configuration (incl. clock, reset)
- bottleneck?
[Figure: system manager and application manager for sub-applications A and C (BAC), over time]
Slide 23: dynamic partial reconfiguration implementation
- system manager
- application manager
- application programming
[Figure: system manager above application managers for application T and for sub-applications A and C (BAC), over time]
Slide 24: dynamic partial reconfiguration implementation
- system manager
- application manager
- application programming
- intra-application persistent data management
[Figure: system manager and application manager for sub-applications A and C (BAC), over time; label: state]
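The split between system manager and application managers can be sketched as a resource allocator: the system manager gives each admitted application a private, non-overlapping share of CFRs and NOC capacity (its virtual platform), and the application manager then works only within that share. The class, the slot-count model of NOC capacity and the reject-if-full policy are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SystemManager:
    """Hypothetical system manager: owns all CFRs and NOC slots and hands each
    admitted application a private share (its virtual platform)."""
    free_cfrs: Set[str]
    free_slots: int
    platforms: Dict[str, dict] = field(default_factory=dict)

    def start_app(self, app: str, cfrs: int, slots: int) -> bool:
        if app in self.platforms:
            raise ValueError(f"{app} already running")
        if cfrs > len(self.free_cfrs) or slots > self.free_slots:
            return False  # reject rather than share: no interference between apps
        granted = {self.free_cfrs.pop() for _ in range(cfrs)}
        self.free_slots -= slots
        self.platforms[app] = {"cfrs": granted, "slots": slots}
        return True

    def stop_app(self, app: str) -> None:
        platform = self.platforms.pop(app)
        self.free_cfrs |= platform["cfrs"]
        self.free_slots += platform["slots"]


sm = SystemManager(free_cfrs={"CFR0", "CFR1", "CFR2"}, free_slots=8)
print(sm.start_app("T", cfrs=1, slots=2))  # True
print(sm.start_app("D", cfrs=3, slots=2))  # False: only 2 CFRs left
sm.stop_app("T")
print(sm.start_app("D", cfrs=3, slots=2))  # True
```

Persistent state and intra-application phases would live in the application manager, which this sketch leaves out.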
Slide 25: overview
- applications
- FPGA
- network on chip
- key ideas
- hardwired NOC
- unified interconnect
- data coercion / type casting
- application dynamic partial reconfiguration
- multiple concurrent applications
- multiplex sub-applications (hardware tasks)
- example
- conclusions
Slide 26: modelling
- SystemC
- bit- and cycle-accurate NOC model
- behavioural CFR models
- accurate bitstream structure
- behavioural hard IP models
- model
- starting / stopping of applications
- dynamic, based on user input
- starting / stopping of sub-applications
- dynamic, based on flow of data
- configuration (loading of bitstreams) for soft IP, clock & reset
- programming of NOC, system & sub-application managers
- management of persistent state
Slide 27: example
- system manager
- program NOC for configuration
[Figure: single hard interconnect linking CPU (system manager), CFRs (one with the application manager), IO processor, de/encrypt accelerator, off-chip memory and on-chip memories; labels A1, A2, BA, BAC]
Slide 28: example
- system manager
- program NOC for configuration
- configure (load bitstreams)
- including bitstream syntax, etc.
[Figure: bitstream, programming and data traffic on the single hard interconnect linking CPU (system manager), CFRs (one with the application manager), IO processor, de/encrypt accelerator, off-chip memory and on-chip memories; labels A1, A2, BA, BAC]
Slide 29: example
- system manager
- program NOC for configuration
- configure (load bitstreams)
- program NOC for (sub-)application A
[Figure: bitstream, programming and data traffic on the single hard interconnect linking CPU (system manager), CFRs (one with the application manager), IO processor, de/encrypt accelerator, off-chip memory and on-chip memories; labels A1, A2, BA, BAC]
Slide 30: example
- system manager
- program NOC for configuration
- configure (load bitstreams)
- program NOC for (sub-)application A
- program & start application manager
- including clocking & reset
[Figure: bitstream, programming and data traffic on the single hard interconnect linking CPU (system manager), CFRs (one with the application manager), IO processor, de/encrypt accelerator, off-chip memory and on-chip memories; labels A1, A2, BA, BAC]
Slide 31: example
- system manager
- program NOC for configuration
- configure (load bitstreams)
- program NOC for (sub-)application A
- program & start application manager
- application manager
- programs & starts sub-app A
- soft IP function is modelled by a CFR
[Figure: bitstream, programming and data traffic on the single hard interconnect linking CPU (system manager), CFRs (one with the application manager), IO processor, de/encrypt accelerator, off-chip memory and on-chip memories; labels A1, A2, BA, BAC]
Slide 32: example
- system manager
- program NOC for configuration
- configure (load bitstreams)
- program NOC for (sub-)application A
- program & start application manager
- application manager
- programs & starts sub-app A
- sub-application A runs
[Figure: bitstream, programming and data traffic on the single hard interconnect linking CPU (system manager), CFRs (one with the application manager), IO processor, de/encrypt accelerator, off-chip memory and on-chip memories; labels A1, A2, BA, BAC]
Slide 33: example
- system manager
- program NOC for configuration
- configure (load bitstreams)
- program NOC for (sub-)application A
- program & start application manager
- application manager
- programs & starts sub-app A
- sub-application A runs
- Goossens NOCS08, Wahlah RAW09
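The example's sequence can be checked mechanically: each step is performed by an actor that must already be running, and only the system manager runs at boot. This minimal sketch encodes the steps of the example as data and verifies that ordering; the actor and action strings follow the slides, while `run` and `BOOT_SEQUENCE` are hypothetical names.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (actor, action)

BOOT_SEQUENCE: List[Step] = [
    ("system manager", "program NOC for configuration"),
    ("system manager", "configure (load bitstreams)"),
    ("system manager", "program NOC for sub-application A"),
    ("system manager", "program & start application manager"),
    ("application manager", "program & start sub-application A"),
    ("sub-application A", "run"),
]

# Actions that bring a new actor to life (after clocking and reset).
STARTS = {
    "program & start application manager": "application manager",
    "program & start sub-application A": "sub-application A",
}


def run(sequence: List[Step]) -> List[str]:
    """Replay the steps, refusing any step whose actor has not been started."""
    active = {"system manager"}  # only the system manager runs at boot
    log = []
    for actor, action in sequence:
        if actor not in active:
            raise RuntimeError(f"{actor} cannot '{action}' before it is started")
        log.append(f"{actor}: {action}")
        if action in STARTS:
            active.add(STARTS[action])
    return log


for line in run(BOOT_SEQUENCE):
    print(line)
```

Swapping the two "program & start" steps makes `run` raise, mirroring the fact that the application manager cannot start sub-application A before the system manager has started it.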
Slide 34: conclusions
- ideas
- hardwired NOC → performance/cost
- unified interconnects → hardware multi-tasking
- data coercion / type casting → cool & useful
- very detailed model
- many simplifications / restrictions
- many open issues
- design flow: soft IP placement, binding, relocation, etc. (Madsen?)
- application model
- extend use-case model with intra-application dynamism
- more general notions of persistent state
- implementation: separation of system & application managers