Title: The von Neumann Syndrome
1. The von Neumann Syndrome
TU Delft, Sept 28, 2007
- Reiner Hartenstein
- TU Kaiserslautern
(v.2)
http://hartenstein.de
2. von Neumann Syndrome
- this term was coined by RAM (C. V. Ramamoorthy, emeritus, UC Berkeley)
3. The First Reconfigurable Computer
- prototyped in 1884 by Herman Hollerith
- a century before the introduction of the FPGA
4. Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions
5. The Spirit of the Mainframe Age
- For decades, we've trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps
- finally tending toward code sizes of astronomic dimensions
- Even in hardware courses (the unloved child of CS departments) we often teach von Neumann machine design, deepening this tunnel view
- 1951: hardware design goes von Neumann (microprogramming)
6. von Neumann: An Array of Massive Overhead Phenomena
- piling up to code sizes of astronomic dimensions
7. von Neumann: An Array of Massive Overhead Phenomena
- piling up to code sizes of astronomic dimensions
- temptations of von Neumann-style software engineering:
- Dijkstra, 1968: the "go to" considered harmful
- R.H., 1975: the universal bus considered harmful
- Backus, 1978: "Can Programming Be Liberated from the von Neumann Style?"
- Arvind et al., 1983: "A Critique of Multiprocessing the von Neumann Style"
8. von Neumann Overhead: Just One Example
- 1989 image processing example: 94% of the computation load goes only to moving the window
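The flavor of that 94% figure can be sketched in software: in a naive von Neumann-style sliding-window loop, most operations are address arithmetic and loop bookkeeping rather than useful pixel computation. This is a rough illustration, not the original 1989 benchmark; the operation-cost model (3 ops per address, 1 per loop step) is an assumption.

```python
# Rough illustration (not the 1989 benchmark): count operations a naive
# instruction-stream sliding-window loop spends on address bookkeeping
# versus actual pixel arithmetic. Cost model is assumed, not measured.

def window_sum_ops(height, width, k=3):
    """Return (address_ops, compute_ops) for a k x k window sum."""
    address_ops = 0   # index adds/multiplies, loop-counter updates
    compute_ops = 0   # the useful accumulations on pixel values
    for y in range(height - k + 1):
        for x in range(width - k + 1):
            for dy in range(k):
                for dx in range(k):
                    # linearized address: (y + dy) * width + (x + dx)
                    address_ops += 3   # two adds + one multiply
                    address_ops += 1   # loop-counter increment / test
                    compute_ops += 1   # one accumulate per pixel
    return address_ops, compute_ops

addr, comp = window_sum_ops(64, 64)
print(f"address ops: {addr}, compute ops: {comp}, "
      f"overhead share: {addr / (addr + comp):.0%}")
```

Even with this crude model, the bookkeeping share dominates; the point of the slide is that a hardwired address generator makes all of it disappear from the run-time instruction stream.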
9. The Memory Wall
- an instruction-stream code size of astronomic dimensions needs off-chip RAM, which fully hits the memory wall
- better compare off-chip vs. fast on-chip memory
- the processor/memory performance gap grows by about 50% per year
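The compounding nature of that gap is easy to model. The 50%/year processor figure is from the slide; the ~7%/year DRAM latency improvement is an assumed textbook-style number used only for illustration.

```python
# Back-of-envelope sketch of the memory wall. The 50%/year CPU growth
# rate is from the talk; the 7%/year DRAM improvement is an assumed
# illustrative figure, not a claim from the slides.

def gap_after(years, cpu_growth=0.50, dram_growth=0.07):
    """Relative processor/memory performance gap after `years` years."""
    return ((1 + cpu_growth) / (1 + dram_growth)) ** years

print(f"gap after 10 years: {gap_after(10):.1f}x")
```

With these rates the gap roughly multiplies by ~1.4 every year, so a decade widens it by well over an order of magnitude.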
10. Benchmarked Computational Density
- Alpha: down by a factor of 100 in 6 years
- IBM: down by a factor of 20 in 6 years
11. Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions
12. The Manycore Future
- "we are embarking on a new computing age -- the age of massive parallelism" (Burton Smith)
- "everyone will have multiple parallel computers" (B.S.)
- "Even mobile devices will exploit multicore processors, also to extend battery life" (B.S.)
- multiple von Neumann CPUs on the same microprocessor chip lead to exploding (vN) instruction-stream overhead (R.H.)
13. Several Overhead Phenomena
- the watering pot model (Hartenstein): each CPU has several von Neumann overhead phenomena
14. Explosion of Overhead by von Neumann Parallelism
- disproportionate to the number of processors
- R.H., 2006: MPI considered harmful
15. Rewriting Applications
- more processors means rewriting applications
- we need to map an application onto manycore configurations of different sizes
- most applications are not readily mappable onto a regular array
- mapping is much less problematic with Reconfigurable Computing
16. Disruptive Development
- "Computer industry is probably going to be disrupted by some very fundamental changes." (Iann Barron)
- "We must reinvent computing." (Burton J. Smith)
- "A parallel vN programming model for manycore machines will not emerge for five to 10 years" (experts from Microsoft Corp.)
- I don't agree: we have a model.
- Reconfigurable Computing: technology is ready, users are not
- It's mainly an education problem
17. Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions
18. The Reconfigurable Computing Paradox
- "bad" FPGA technology: reconfigurability overhead, wiring overhead, routing congestion, slow clock speed
- yet up to 4 orders of magnitude speedup, tremendously slashing the electricity bill, by migration to FPGA
- the reason for this paradox?
- there is something fundamentally wrong in using the von Neumann paradigm
- the spirit of the Mainframe Age is collapsing under the von Neumann syndrome
19. Beyond von Neumann Parallelism
- the watering pot model (Hartenstein): each CPU has several von Neumann overhead phenomena
- we need an approach like this: data-stream-based RC *)
- *) RC = Reconfigurable Computing
20. von Neumann Overhead vs. Reconfigurable Computing
- von Neumann machine: using a program counter
- data-stream machine: using data counters
- reconfigurable data-stream machine: using reconfigurable data counters *)
- *) configured before run time
21. von Neumann Overhead vs. Reconfigurable Computing (coarse-grained reconfigurable)
- von Neumann machine: using a program counter
- data-stream machine: using data counters
- reconfigurable data-stream machine: using reconfigurable data counters *)
- rDPA = reconfigurable datapath array
- 1989: ×17 speedup just by the reconfigurable address generator (GAG) (image processing example)
- 1989: ×15,000 total speedup from this migration project
- *) configured before run time
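The idea behind a data counter / address generator can be sketched in software. In hardware the GAG is a counter circuit configured once before run time; here a Python generator plays that role, so the consuming loop does no address arithmetic at all. The parameter names (`base`, `stride_x`, `stride_y`) are illustrative, not terminology from the talk.

```python
# Hedged sketch of a generic address generator (GAG) modeled as a
# Python generator: configured once (the arguments), then it streams
# addresses with no per-address instruction fetch in the consumer.

def gag_2d(base, width, height, stride_x=1, stride_y=None):
    """Yield linearized addresses for a 2D block scan."""
    if stride_y is None:
        stride_y = width
    for y in range(height):
        row = base + y * stride_y
        for x in range(width):
            yield row + x * stride_x

# The datapath just consumes the stream -- no address math here:
memory = list(range(100))   # toy 10x10 "off-chip RAM"
stream = [memory[a] for a in gag_2d(base=22, width=3, height=3, stride_y=10)]
print(stream)   # the 3x3 window whose top-left pixel sits at address 22
```

Moving the window then means reconfiguring `base` once, instead of re-executing address instructions for every pixel access.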
22. Reconfigurable Computing Means...
- for HPC, run time is more precious than compile time
- http://www.tnt-factory.de/videos_hamster_im_laufrad.htm
- Reconfigurable Computing means moving overhead *) from run time to compile time **)
- Reconfigurable Computing replaces looping at run time by configuration before run time
- *) e.g. complex address computation
- **) or loading time
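The run-time/compile-time trade can be shown in miniature: do the complex address computation once, up front ("configuration"), so the run-time loop is a plain table walk. The function names and the diagonal access pattern are illustrative assumptions, not from the talk.

```python
# Minimal sketch of "move overhead from run time to compile time":
# the address computation for a diagonal access is done once before
# run time; the run-time loop does no index arithmetic.

def configure_addresses(n, width):
    """'Compile/configuration time': precompute a[i][i] linearized."""
    return [i * width + i for i in range(n)]

def run(memory, addresses):
    """'Run time': stream through precomputed addresses only."""
    return [memory[a] for a in addresses]

width = 8
memory = list(range(width * width))
addrs = configure_addresses(4, width)   # configured before run time
print(run(memory, addrs))               # [0, 9, 18, 27]
```

In Reconfigurable Computing the "precomputed table" is not even a table: it is the wiring of the address generator itself.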
23. Data Meeting the Processing Unit (PU)
- ... explaining the RC advantage
- we have 2 choices:
- routing the data to the PU by memory-cycle-hungry instruction streams through shared memory, or
- data-stream-based placement of the execution locality: a pipe network generated by configware compilation *)
- *) before run time
24. What Pipe Network?
- a pipe network, organized at compile time: a generalization of the systolic array *) (R. Kress, 1995)
- rDPA = rDPU array, i.e. coarse-grained
- rDPU = reconfigurable datapath unit (no program counter)
- *) supporting non-linear pipes on free-form heterogeneous arrays
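A pipe network can be modeled as a chain of datapath units, here with Python generators: each "rDPU" stage transforms a data stream and passes it on, with no program counter and no instruction fetch in the steady state; the structure is the program. The stage functions are illustrative, not from the talk.

```python
# Hedged sketch: a linear pipe network of datapath units modeled with
# generators. Configured ("wired") once, then data streams through.

def stage(op, upstream):
    """One datapath unit: apply op to every element of the stream."""
    for x in upstream:
        yield op(x)

def pipe(source, *ops):
    """Compose stages into a linear pipe network."""
    s = iter(source)
    for op in ops:
        s = stage(op, s)
    return s

# Configure the pipe before "run time", then stream data through it:
result = list(pipe(range(5),
                   lambda x: x + 1,    # stage 1
                   lambda x: x * x,    # stage 2
                   lambda x: x - 3))   # stage 3
print(result)   # [-2, 1, 6, 13, 22]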
25. Migration Benefit by On-Chip RAM
- some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM,
- so that the drastic code size reduction by software-to-configware migration can beat the memory wall
- multiple on-chip RAM blocks are the enabling technology for ultra-fast anti-machine solutions
- GAGs inside ASMs (auto-sequencing memories) generate the data streams
- GAG = generic address generator
- rDPA = rDPU array, i.e. coarse-grained
- rDPU = reconfigurable datapath unit (no program counter)
26. Coarse-Grained Reconfigurable Array Example
- image processing: SNN filter (mainly a pipe network)
- coming close to the programmer's mindset (much closer than FPGA)
- note: a kind of software perspective, but without instruction streams -- datastreams and pipelining instead
27. Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions
28. Software / Configware Co-Compilation
- apropos compilation: the CoDe-X co-compiler
- but we need a dual-paradigm approach to run legacy software together with configware
- Reconfigurable Computing: technology is ready. Users are not?
29. Curricula from the Mainframe Age
- structurally disabled: no common model for non-von-Neumann accelerators
- (this is not a lecture on brain regions)
- the common model is ready, but users are not
- not really taught
30. We Need a Twin Paradigm Education
- brain usage: both hemispheres
31. RCeducation 2008
- teaching RC?
- The 3rd International Workshop on Reconfigurable Computing Education, April 10, 2008, Montpellier, France
- http://fpl.org/RCeducation/
32. We Need New Courses
- we need undergraduate lab courses with HW / CW / SW partitioning
- we need new courses with an extended scope on parallelism and algorithmic cleverness for HW / CW / SW co-design
- we urgently need a Mead-and-Conway-like textbook
- R.H., Dagstuhl Seminar 03301, Germany, 2003
33. Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions
34. Conclusions
- "We need to increase the population of HPC-competent people" (B.S.)
- we need to increase the population of RC-competent people (R.H.)
- data streaming, not vN, is the key model of parallel computation
- von-Neumann-type instruction streams considered harmful (R.H.)
- but we still need them for some small code sizes, old legacy software, etc.
- the twin paradigm approach is inevitable, also in education (R.H.)
35. An Open Question
- coarse-grained arrays: technology ready, users not ready *)
- *) offered by startups (PACT Corp. and others)
- much closer to the programmer's mindset, really much closer than FPGAs
- which effect is delaying the breakthrough?
36. Thank you
37. END
39. Disruptive Development
- "The way the industry has grown up writing software - the languages we chose, the model of synchronization and orchestration - do not lead toward uncovering parallelism for allowing large-scale composition of big systems." (Iann Barron)
40. Dual Paradigm Mindset: An Old Hat
- (mapping from the procedural to the structural domain)
- software mindset: instruction-stream-based; flow chart -> control instructions
- mapped into a hardware mindset: action box -> flipflop; decision box -> (de)multiplexer