Hardwired networks on chip for FPGAs and their applications

Provided by: mpsoc

1
Hardwired networks on chip for FPGAs and their applications
  • Kees Goossens (TU Delft, NXP)
  • Muhammad Aqeel Wahlah (TU Delft)

2
overview
  • applications
  • network on chip
  • FPGA
  • key ideas
  • hardwired NOC
  • unified interconnect
  • data coercion / type casting
  • application dynamic partial reconfiguration
  • multiple concurrent applications
  • multiplex sub-applications (hardware tasks)
  • example
  • conclusions

3
applications
  • task / function mapped on IP
  • includes local storage / buffering
  • application: set of communicating IPs / tasks /
    ...
  • data, control, code
  • communication via connections
  • use case: set of concurrent applications

4
network on chip (NOC)
  • connects ports on hardware blocks (IP)
  • data, control
  • connections: virtual wires
  • real-time / quality of service
  • programmable at run-time
  • set up and remove connections by programming
    control registers in the NOC
  • styles of communication
  • address-based / memory-mapped
  • streaming

[Figure: IP blocks attached through network interfaces (NI) to the routers (R) of the NOC; applications A1 and A2, tasks T1, T2, T3]

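Run-time programmability can be pictured as a handful of register writes per connection. The following is a minimal sketch, assuming a hypothetical register layout (`REG_ENABLE`, `REG_DEST`, `REG_SLOTS`) and a single outgoing connection per NI; none of these names, nor the `slots` parameter standing in for a quality-of-service budget, come from the slides.

```python
from dataclasses import dataclass, field

# Hypothetical register offsets inside one network interface (NI).
REG_ENABLE, REG_DEST, REG_SLOTS = 0x0, 0x4, 0x8


@dataclass
class NetworkInterface:
    """Toy NI: a small bank of memory-mapped control registers."""
    name: str
    regs: dict = field(default_factory=dict)  # register offset -> value

    def write(self, offset: int, value: int) -> None:
        self.regs[offset] = value


class Noc:
    """Toy NOC in which a connection exists only as register contents."""

    def __init__(self, ni_names):
        self.ids = {name: i for i, name in enumerate(ni_names)}
        self.nis = {name: NetworkInterface(name) for name in ni_names}

    def open_connection(self, src: str, dst: str, slots: int) -> None:
        ni = self.nis[src]
        ni.write(REG_DEST, self.ids[dst])  # where the virtual wire ends
        ni.write(REG_SLOTS, slots)         # assumed bandwidth budget
        ni.write(REG_ENABLE, 1)            # connection is now usable

    def close_connection(self, src: str) -> None:
        self.nis[src].write(REG_ENABLE, 0)

    def is_open(self, src: str) -> bool:
        return self.nis[src].regs.get(REG_ENABLE, 0) == 1


noc = Noc(["ni_cpu", "ni_mem"])
noc.open_connection("ni_cpu", "ni_mem", slots=2)
print(noc.is_open("ni_cpu"))
noc.close_connection("ni_cpu")
print(noc.is_open("ni_cpu"))
```

The point of the sketch is only that setting up or removing a connection changes register state, not the hardware structure.
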
5
FPGA fabric
  • soft IP are configured in
  • configurable elements (LUT)
  • and switch boxes (not shown)
  • with a given configuration granularity (frame)
    using the configuration interconnect (ICAP)
  • hard IP
  • CPU
  • on-chip memories (BRAM, ...)
  • off-chip memory interfaces
  • decryption IP
  • etc.

[Figure: FPGA fabric with LUT regions, IO processor, CPU, de/encrypt accelerator, off-chip memory interface, on-chip memories, and ICAP]
  • configuration: bitstream loading
  • programming / control: setting MMIO registers
  • Xilinx terminology is used (frames, ICAP, etc.)

6
application on FPGA
  • design an application as for ASIC
  • IPs, interconnect, storage, software
  • but map onto soft and hard IP resources
  • traditionally, separate soft data and control
    interconnects
  • could also use a soft NOC for both

[Figure: applications A1 and A2 on the FPGA; labels: soft data interconnect, soft control interconnect, frame, LUT, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memory, ICAP]

7
multiple applications on FPGA
  • interconnects and IPs of different applications
    share reconfiguration regions (frames)
  • dynamic reconfiguration is global, not partial

[Figure: applications A1 and A2 and tasks T1, T2, T3 on the FPGA; labels: soft data interconnect, soft control interconnect, LUT, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memory, ICAP]

8
overview
  • application
  • network on chip
  • FPGA
  • key ideas
  • hardwired NOC → improved performance/cost
  • unified interconnect → flexibility
  • data coercion / type casting → cool (and useful)
    applications
  • application dynamic partial reconfiguration
  • multiple concurrent applications
  • multiplex sub-applications (hardware tasks)
  • example
  • conclusions

9
1. hardwired interconnect
  • replace soft interconnect(s) by hard
    interconnect(s)
  • connect reconfigurable regions of LUTs (CFR)
  • bit-level reconfigurability (CFR)
  • switch boxes
  • transaction-level reconfigurability (NOC)
  • routers, NIs
  • memory mapped / streaming
  • Hecht FPL05

[Figure: hard interconnect(s) linking CFRs, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories, and ICAP; applications A1 and A2, tasks T1, T2, T3]

10
1. hardwired interconnect
  • 35× smaller area
  • 3.5× higher speed
  • 150× better performance/cost ratio (bits/sec/area)
  • 200× smaller configuration footprint (program
    MMIO, no bitstream)
  • 200× faster soft IP load and boot
  • dynamic partial reconfiguration
  • no constraints on soft IP placement due to
    communication
  • loss of flexibility
  • fewer LUTs
  • CFR frame ? 7 hard NOC
  • based on Virtex4 Aethereal NOC, Goossens
    NOCS08

[Figure: hard interconnect(s) linking CFRs, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories, and ICAP; labels C1, C2, C3 and tasks T1, T2, T3]

11
performance/cost
  • essentially, it all depends on
  • area soft:hard 35:1
  • speed soft:hard 3.5:1
  • configuration footprint of soft NOC (bitstream) :
    programming footprint of hard NOC (MMIO
    registers) 214:1
  • resulting in
  • boot time soft:hard 1:200
  • functional performance/cost (bit/sec/area)
    soft:hard 1:147

12
performance/cost
  • configuration speed
  • 1.9 Gb/s for dedicated configuration interconnect
    (ICAP)
  • 8 Gb/s for hard NOC
  • programming speed
  • 118 MHz soft NOC
  • 500 MHz hard NOC
  • configuration footprint for soft NOC
  • 1.8 Mb (8300 LUTs per router+NI)
  • programming footprint for hard NOC
  • 2100 bit per connection
  • thus, to configure / program an NI
  • 1 msec for soft NOC
  • 10.6 µsec for hard NOC
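
Some of these figures can be sanity-checked with a little arithmetic. The following is a minimal sketch that uses only numbers quoted on the slides; the two formulas (footprint divided by bandwidth, and clock gain times area gain) are assumptions, since the slides do not say how the figures were derived. It does not try to reproduce the 10.6 µsec figure, which presumably covers more than the raw transfer of one connection's 2100 bits.

```python
# Figures quoted on the slides.
ICAP_BITS_PER_S = 1.9e9            # dedicated configuration interconnect (ICAP)
HARD_NOC_BITS_PER_S = 8e9          # hard NOC
SOFT_NOC_FOOTPRINT_BITS = 1.8e6    # bitstream for one soft router+NI
HARD_NOC_BITS_PER_CONNECTION = 2100
SOFT_NOC_MHZ = 118
HARD_NOC_MHZ = 500
AREA_GAIN = 35                     # hard NOC is 35x smaller


def transfer_time_s(bits: float, bits_per_s: float) -> float:
    """Assumed model: time = footprint / bandwidth, ignoring all overheads."""
    return bits / bits_per_s


soft_ni_config_s = transfer_time_s(SOFT_NOC_FOOTPRINT_BITS, ICAP_BITS_PER_S)
hard_connection_s = transfer_time_s(HARD_NOC_BITS_PER_CONNECTION,
                                    HARD_NOC_BITS_PER_S)

# Assumed model: performance/cost gain = clock-speed gain * area gain.
speed_gain = HARD_NOC_MHZ / SOFT_NOC_MHZ
perf_cost_gain = speed_gain * AREA_GAIN

print(f"soft NOC, one router+NI via ICAP: {soft_ni_config_s * 1e3:.2f} ms")
print(f"hard NOC, one connection:         {hard_connection_s * 1e6:.4f} us")
print(f"speed gain {speed_gain:.2f}x, performance/cost gain {perf_cost_gain:.0f}x")
```

Under these assumptions the soft-NOC time comes out at about 0.95 ms, in line with the 1 msec on the slide, and the performance/cost gain at about 148×, close to the quoted 1:147.
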

13
2. unified interconnect
  • one interconnect (e.g. NOC) for
  • data for functional mode
  • control for programming
  • bitstreams for configuration
  • dynamic partitioning of different interconnects

[Figure: single hard interconnect linking CFRs, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories, and ICAP; applications A1 and A2, tasks T1, T2, T3]

14
3. data coercion
  • data / control / bitstream / test
  • connect a data port to a configuration port
  • decrypt bitstreams

[Figure: single hard interconnect linking CFRs, IO processor, CPU, de/encrypt accelerator, off-chip memory, and on-chip memories; traffic labelled bitstream and data]

15
3. data coercion
  • data / control / bitstream / test
  • connect a data port to a configuration port
  • decrypt bitstreams
  • relocate bitstreams
  • run-time compute / optimise bitstreams
  • JIT, peephole

[Figure: single hard interconnect linking CFRs, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories, and an IP; labels: bitstream, PH]

16
3. data coercion
  • data / control / bitstream / test
  • connect a data port to a configuration port
  • decrypt bitstreams
  • relocate bitstreams
  • run-time compute / optimise bitstreams
  • JIT, peephole
  • data port to test port (NOC as TAM)
  • on-line (structural) testing
  • on-chip test-vector generation

[Figure: single hard interconnect linking CFRs, IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories, and an IP; labels: bitstream, PH]

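Connecting a data port to a configuration port amounts to treating a bitstream as ordinary data that can pass through any IP on the way. The following is a minimal sketch of that idea as a chain of Python generators; the names `read_memory`, `decrypt`, and `ConfigPort` are hypothetical, and the XOR step is only a placeholder for whatever the de/encrypt accelerator actually does.

```python
def read_memory(words):
    """Data source: yields (encrypted) bitstream words from a memory."""
    yield from words


def decrypt(stream, key):
    """Stand-in for the de/encrypt accelerator; XOR is a placeholder cipher."""
    for word in stream:
        yield word ^ key


class ConfigPort:
    """Stand-in for the configuration port of a CFR: it just collects words."""

    def __init__(self):
        self.frames = []

    def consume(self, stream):
        for word in stream:
            self.frames.append(word)


key = 0x5A
plain_bitstream = [0x01, 0x02, 0x03]
encrypted = [word ^ key for word in plain_bitstream]

# Data port of the memory -> accelerator -> configuration port of the CFR.
port = ConfigPort()
port.consume(decrypt(read_memory(encrypted), key))
print(port.frames == plain_bitstream)
```

Relocation or run-time optimisation of bitstreams would, in this picture, be further stages inserted into the same chain.
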
17
overview
  • applications
  • network on chip
  • FPGA
  • key ideas
  • hardwired NOC
  • unified interconnect
  • data coercion / type casting
  • application dynamic partial reconfiguration
  • multiple concurrent applications
  • multiplex sub-applications (hardware tasks)
  • example
  • conclusions

18
dynamic partial reconfiguration: idea
  • hardware operating system implements run-time
    scheduling of
  • multiple concurrent applications
  • independent applications on own virtual platform
  • no communication, no interference
  • performance virtualisation
  • activation given by user, environment, etc.

[Figure: timeline of applications (app T, app D) and their parts (A, C) over time]

19
dynamic partial reconfiguration: idea
  • hardware operating system implements run-time
    scheduling of
  • multiple concurrent applications
  • parts of single applications (soft IP, hardware
    tasks)
  • multiplex parts of a single application on same
    resources

[Figure: timeline of applications (app T, app D) and sub-applications (sub-app A, sub-app C) with parts A1, A2 and C1, C2, C3 over time]

20
dynamic partial reconfiguration: idea
  • hardware operating system implements run-time
    scheduling of
  • multiple concurrent applications
  • parts of single applications (soft IP, hardware
    tasks)
  • multiplex parts of a single application on same
    resources
  • internal state

[Figure: timeline of applications (app T, app D) and parts (A, C) over time, with state carried between activations]

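Multiplexing hardware tasks on the same resources only works if each task's internal state survives being swapped out. The following is a minimal sketch of that bookkeeping, assuming a hypothetical `Region` that holds one task at a time and an `AppManager` that saves and restores state around each switch; the integer "state" is a stand-in for whatever a real soft IP would keep.

```python
class Region:
    """Hypothetical reconfigurable region holding at most one hardware task."""

    def __init__(self):
        self.loaded = None  # name of the task currently configured
        self.state = None   # internal state of that task


class AppManager:
    """Time-multiplexes tasks on one region, preserving their state."""

    def __init__(self, region):
        self.region = region
        self.saved = {}  # task name -> persistent state
        self.log = []

    def switch_to(self, task):
        r = self.region
        if r.loaded is not None:
            self.saved[r.loaded] = r.state  # save before overwriting the region
            self.log.append(f"save {r.loaded}")
        r.loaded = task
        r.state = self.saved.get(task, 0)   # restore, or start from scratch
        self.log.append(f"load {task}")

    def run(self, steps):
        self.region.state += steps          # the task makes some progress


mgr = AppManager(Region())
mgr.switch_to("A1"); mgr.run(3)
mgr.switch_to("A2"); mgr.run(5)
mgr.switch_to("A1"); mgr.run(1)  # A1 resumes from its saved state
print(mgr.region.state, mgr.log)
```
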
21
dynamic partial reconfiguration: implementation
  • system manager
  • resource management (CFR, NOC, memory, ...)
  • inter-application virtual platforms

[Figure: timeline showing the system manager and the application managers of the applications (T; A, C) over time]

22
dynamic partial reconfiguration: implementation
  • system manager
  • resource management (CFR, NOC, memory, ...)
  • inter-application virtual platforms
  • intra-application phases
  • NOC programming
  • soft IP / (sub)-application configuration (incl.
    clock, reset)
  • bottleneck?

[Figure: timeline showing the system manager and an application manager with parts A and C over time]

23
dynamic partial reconfiguration: implementation
  • system manager
  • application manager
  • application programming

[Figure: timeline showing the system manager and the application managers of the applications (T; A, C) over time]

24
dynamic partial reconfiguration: implementation
  • system manager
  • application manager
  • application programming
  • intra-application persistent data management

[Figure: timeline showing the system manager and an application manager with parts A and C over time, with state carried between activations]

25
overview
  • applications
  • FPGA
  • network on chip
  • key ideas
  • hardwired NOC
  • unified interconnect
  • data coercion / type casting
  • application dynamic partial reconfiguration
  • multiple concurrent applications
  • multiplex sub-applications (hardware tasks)
  • example
  • conclusions

26
modelling
  • SystemC
  • bit- and cycle-accurate NOC model
  • behavioural CFR models
  • accurate bitstream structure
  • behavioural hard IP models
  • model
  • starting / stopping of applications
  • dynamic, based on user input
  • starting / stopping of sub-applications
  • dynamic, based on flow of data
  • configuration: loading of bitstreams for soft IP,
    clock and reset
  • programming of NOC, system and sub-application
    managers
  • management of persistent state

27
example
  • system manager
  • program NOC for configuration

[Figure: single hard interconnect linking the system manager, application manager, CFRs (A1, A2), IO processor, CPU, de/encrypt accelerator, off-chip memory, and on-chip memories]

28
example
  • system manager
  • program NOC for configuration
  • configure (load bitstreams)
  • including bitstream syntax, etc.

[Figure: single hard interconnect linking the system manager, application manager, CFRs (A1, A2), IO processor, CPU, de/encrypt accelerator, off-chip memory, and on-chip memories; legend: bitstream, programming, data]

29
example
  • system manager
  • program NOC for configuration
  • configure (load bitstreams)
  • program NOC for (sub)-application A

[Figure: single hard interconnect linking the system manager, application manager, CFRs (A1, A2), IO processor, CPU, de/encrypt accelerator, off-chip memory, and on-chip memories; legend: bitstream, programming, data]

30
example
  • system manager
  • program NOC for configuration
  • configure (load bitstreams)
  • program NOC for (sub)-application A
  • program and start application manager
  • including clocking and reset

[Figure: single hard interconnect linking the system manager, application manager, CFRs (A1, A2), IO processor, CPU, de/encrypt accelerator, off-chip memory, and on-chip memories; legend: bitstream, programming, data]

31
example
  • system manager
  • program NOC for configuration
  • configure (load bitstreams)
  • program NOC for (sub)-application A
  • program and start application manager
  • application manager
  • programs and starts sub-app A
  • soft IP function is modelled by CFR

[Figure: single hard interconnect linking the system manager, application manager, CFRs (A1, A2), IO processor, CPU, de/encrypt accelerator, off-chip memory, and on-chip memories; legend: bitstream, programming, data]

32
example
  • system manager
  • program NOC for configuration
  • configure (load bitstreams)
  • program NOC for (sub)-application A
  • program and start application manager
  • application manager
  • programs and starts sub-app A
  • sub-application A runs

[Figure: single hard interconnect linking the system manager, application manager, CFRs (A1, A2), IO processor, CPU, de/encrypt accelerator, off-chip memory, and on-chip memories; legend: bitstream, programming, data]

33
example
  • system manager
  • program NOC for configuration
  • configure (load bitstreams)
  • program NOC for (sub)-application A
  • program and start application manager
  • application manager
  • programs and starts sub-app A
  • sub-application A runs
  • Goossens NOCS08, Wahlah RAW09
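
The order of the steps matters: the NOC has to be programmed before bitstreams can travel over it, and the application manager has to be running before it can start a sub-application. The following is a minimal sketch that encodes the sequence from this slide as a checked list; the `BootSequence` class and the exact step strings are illustrative assumptions, not part of the authors' SystemC model.

```python
# The steps of the example, as (actor, action) pairs in slide order.
STEPS = [
    ("system manager", "program NOC for configuration"),
    ("system manager", "load bitstreams"),
    ("system manager", "program NOC for sub-application"),
    ("system manager", "program and start application manager"),
    ("application manager", "program and start sub-application"),
    ("sub-application", "run"),
]


class BootSequence:
    """Accepts the steps only in the order given by STEPS."""

    def __init__(self):
        self.done = 0  # number of steps completed so far

    def perform(self, actor, action):
        if self.done == len(STEPS):
            raise RuntimeError("sequence already complete")
        expected = STEPS[self.done]
        if (actor, action) != expected:
            raise RuntimeError(f"out of order: expected {expected}")
        self.done += 1

    @property
    def running(self):
        return self.done == len(STEPS)


seq = BootSequence()
for actor, action in STEPS:
    seq.perform(actor, action)
print(seq.running)
```
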

34
conclusions
  • ideas
  • hardwired NOC → performance/cost
  • unified interconnects → hardware multi-tasking
  • data coercion / type casting → cool and useful
  • very detailed model
  • many simplifications and restrictions
  • many open issues
  • design flow: soft IP placement, binding,
    relocation, etc. Madsen?
  • application model
  • extend use-case model with intra-application
    dynamism
  • more general notions of persistent state
  • implementation: separation of system and
    application managers
