Title: Using a CSP based Programming Model for Reconfigurable Processor Arrays
1. Using a CSP based Programming Model for Reconfigurable Processor Arrays
- By Zain-ul-Abdin
- Zain-ul-Abdin_at_hh.se
2. Motivation
- Emergence of new heterogeneous parallel architectures
  - Increased performance
  - Power efficiency
- Traditional methods
  - Automatic parallelization by compilers
  - Use of the thread model of computation
    - Highly non-deterministic
- Use of a concurrent programming model
  - Expresses computations in a productive manner by matching them to the target hardware
  - Supported by a compiler to allow portability
3. Array of Processors
- Consists of heterogeneous processors with specialized interconnection networks
- Improved performance by exploiting parallelism rather than scaling clock frequency
- Flexible due to a dynamically reconfigurable interconnection network
- Energy efficient
  - Individual brics can be switched off when not in use
  - The clock frequency of brics can be optimized
4. Ambric Programming Model
- A design consists of
  - Objects, which define the functionality in either a Java subset or assembly
  - Structured composition, described in aStruct
5. Ambric: Simple Example
- Design top level
    design SimpleDesigntop {
        Root_IF root_Inst;
    }
    interface Root_IF {}
    binding CompRoot implements Root_IF {
        simpledesign process1;
        Vio inOut = {NumSources = 1, NumSinks = 1};
        channel c0 = {inOut.out0, process1.in};
        channel c1 = {process1.out, inOut.in0};
    }
- Object structure
    interface simpledesign {
        inbound in;
        outbound out;
    }
    binding Javasimpledesign implements simpledesign {
        implementation "simpledesign.java";
    }
- Object implementation
    import ajava.io.InputStream;
    import ajava.io.OutputStream;

    public class simpledesign {
        public void run(InputStream<Integer> in, OutputStream<Integer> out) {
            // Forward every value read from the inbound channel to the outbound channel
            while (true) {
                out.writeInt(in.readInt());
            }
        }
    }
6. Why use Occam-pi?
- Language-level support for concurrency
- Provides higher-order combinators that facilitate composition of re-targetable data-parallel descriptions
- Semantically transparent PAR/SEQ style
- Explicit control of the granularity of parallelism and of data locality
7. Occam-pi Language
- Based on ideas of CSP with pi-calculus
- Abstractions for underlying hardware
  - Processes
  - Channels (unbuffered message passing)
- Rendezvous behavior of channels
  - Receiver blocks until the sender has written the value
  - Sender continues after the receiver has read the value
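The rendezvous behavior described above can be imitated in plain Java with `java.util.concurrent.SynchronousQueue`, whose `put` blocks until another thread performs a matching `take`. This is only a rough analogy on ordinary Java threads, not aJava or Occam-pi code; the class and method names are invented for illustration.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical demo class: models one unbuffered, CSP-style channel in plain Java.
class RendezvousDemo {

    // Sends one value over an unbuffered channel and returns what the receiver got.
    static int transfer(int value) {
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();
        int[] received = new int[1];

        Thread receiver = new Thread(() -> {
            try {
                received[0] = channel.take();   // blocks until a sender arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        try {
            channel.put(value);                 // blocks until the receiver takes the value
            receiver.join();                    // wait so the received value is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(transfer(117));      // prints 117
    }
}
```

Because the queue has no capacity, neither side can run ahead of the other, which is the property the channel bullets above rely on.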
8. Occam-pi Language
    PROC SimpleEx()
      INT x, y:
      CHAN OF INT c, d:
      PAR
        SEQ
          c ! 117
          d ? x
        SEQ
          c ? y
          d ! 118
    :
- Primitive actions
  - Variable assignment
  - Channel output !
  - Channel input ?
- PAR
- SEQ
- A variable can be written by only one of the processes running in parallel
- Likewise, only a single process can read from a channel, and another single process can write to the channel
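To trace what SimpleEx does, the two SEQ branches can be mimicked with two Java threads and two zero-capacity queues. This is a hand-written analogy under the assumption that `SynchronousQueue` stands in for an occam channel; it is not output of any compiler, and the class name is hypothetical.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical Java rendering of the SimpleEx process, for illustration only.
class SimpleExSketch {

    // Runs the two SEQ branches in parallel and returns {x, y}.
    static int[] run() {
        SynchronousQueue<Integer> c = new SynchronousQueue<>();
        SynchronousQueue<Integer> d = new SynchronousQueue<>();
        int[] xy = new int[2];   // xy[0] plays the role of x, xy[1] of y

        // First SEQ branch: c ! 117, then d ? x
        Thread first = new Thread(() -> {
            try {
                c.put(117);
                xy[0] = d.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Second SEQ branch: c ? y, then d ! 118
        Thread second = new Thread(() -> {
            try {
                xy[1] = c.take();
                d.put(118);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        first.start();
        second.start();
        try {
            first.join();    // the PAR completes only when both branches have completed
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return xy;
    }

    public static void main(String[] args) {
        int[] xy = run();
        System.out.println("x = " + xy[0] + ", y = " + xy[1]);   // prints "x = 118, y = 117"
    }
}
```

Each variable is written by exactly one thread and each channel has exactly one writer and one reader, mirroring the usage rules listed above.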
9. Compilation Methodology
- Implemented a backend for Ambric in Tock (Translator from Occam to C, from Kent)
- Staged compilation
- Native SOPL code generation for Ambric
- Use of concurrency of Occam-pi
- Reduced memory footprint
10. Occam-Ambric Compilation
11. Ambric-related Transformations
- Introduction of channel-end specifiers
- Enables use of flat data parallelism
- Replicator transformations
  - SEQ replicators to for loops
  - PAR replicators unrolled to multiple PROCs
- Emission of aStruct structural interface and binding code for each PROC
- Emission of aJava class code corresponding to each PROC
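The two replicator rewrites can be pictured with a small plain-Java sketch. It assumes a replication count of 4 that is known at compile time and uses an invented `square` worker; it illustrates the idea only and is not the code the backend actually emits.

```java
// Hypothetical illustration of the two replicator rewrites; all names are invented.
class ReplicatorSketch {

    static int square(int i) {
        return i * i;
    }

    // A SEQ replicator (SEQ i = 0 FOR n) becomes an ordinary counted for loop.
    static int seqReplicator(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    // A PAR replicator (PAR i = 0 FOR 4) is unrolled into four separately
    // written process instances, each with its index fixed as a constant.
    static int[] parReplicatorUnrolled() {
        int[] out = new int[4];
        Thread p0 = new Thread(() -> out[0] = square(0));
        Thread p1 = new Thread(() -> out[1] = square(1));
        Thread p2 = new Thread(() -> out[2] = square(2));
        Thread p3 = new Thread(() -> out[3] = square(3));
        Thread[] all = {p0, p1, p2, p3};
        for (Thread t : all) {
            t.start();
        }
        try {
            for (Thread t : all) {
                t.join();   // wait for every unrolled instance to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(seqReplicator(4));   // prints 14
        System.out.println(java.util.Arrays.toString(parReplicatorUnrolled()));   // prints [0, 1, 4, 9]
    }
}
```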
12. 1D-Discrete Cosine Transform
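The slides do not state which DCT variant or algorithm was implemented. As a point of reference only, a direct (non-fast) 8-point DCT-II with orthonormal scaling, which is an assumption here, could be sketched in plain Java as follows; the class and method names are hypothetical.

```java
// Reference sketch only: direct O(N^2) DCT-II with orthonormal scaling (assumed variant).
class Dct8Sketch {
    static final int N = 8;

    // Computes X[k] = s(k) * sum_n x[n] * cos(pi * (2n + 1) * k / (2N)),
    // with s(0) = sqrt(1/N) and s(k) = sqrt(2/N) for k > 0.
    static double[] dct(double[] x) {
        if (x.length != N) {
            throw new IllegalArgumentException("expected " + N + " samples");
        }
        double[] out = new double[N];
        for (int k = 0; k < N; k++) {
            double sum = 0.0;
            for (int n = 0; n < N; n++) {
                sum += x[n] * Math.cos(Math.PI * (2 * n + 1) * k / (2.0 * N));
            }
            double scale = (k == 0) ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N);
            out[k] = scale * sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] y = dct(new double[] {1, 1, 1, 1, 1, 1, 1, 1});
        // A constant input puts all energy into the DC coefficient y[0].
        for (double v : y) {
            System.out.println(v);
        }
    }
}
```

Each output coefficient depends only on the eight inputs, so the eight sums are independent of one another and could in principle be computed by separate parallel processes.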
13. Performance Results
- 8-point DCT Implementations
14. Conclusions
- Proposed the use of Occam-pi for programming a coarse-grained processor architecture
- Raises the abstraction level without compromising efficiency
- Future work: extend the compiler to support the mobility features of Occam-pi for reconfigurable logic