Title: Chapter 7: Systolic Array
Slide 1: Chapter 7. Systolic Array
Slide 2: Systolic Array
- A systolic array is an array processor architecture that consists of
  - An array of identical processing units (PUs) or nodes
  - Interconnected with localized data links
- and that performs
  - Pipelined computation between PUs, with
  - Identical computation at each node
- Motivations
  - Low communication overhead
  - Easy to design
  - Suitable for VLSI implementation
- Applications
  - Implementation of algorithms that can be formulated as nested loops
  - Numerical linear algebra
  - Signal and image processing
Slide 3: Systolic Design Methodology
- Algorithm mapping
  - Individual PUs are assigned indices in the processor index space
- Assignment
  - Each node of the iteration DG in the index space is mapped onto a PU index
- Scheduling
  - Each node of the iteration DG is assigned an integer schedule indicating the time step at which it is to be executed
- Linear Mapping Methodology
  - In general, assignment and scheduling are nonlinear operations
  - However, if the iteration DG corresponds to an RIA (regular iterative algorithm), assignment and scheduling can be accomplished by a linear projection of each node of the DG onto the index space of the PUs, together with a linear schedule
Slide 4: Formulate the Algorithm in RIA Format
- Single Assignment Transformation
  - Removes unnecessary (false) data dependencies between iterations
  - Accomplished by introducing a new variable, or an array of variables, to hold intermediate values during the computation
  - May impose unnecessary dependence constraints during mapping
- Pipelined Data Duplication
  - Replaces data broadcasting, which would require a global data bus, without affecting the algorithm performance
  - Accomplished by introducing an intermediate variable that is propagated among index nodes in the DG
  - May impose unnecessary dependence constraints during mapping
Slide 5: Example: FIR Filter
- FIR filter formulation
- Single-assignment format, with broadcast data x(n-k) (see the sketch after this list):
  - Do n = 1, 2, ...
    - y1(n,-1) = 0
    - Do k = 0, K
      - y1(n,k) = y1(n,k-1) + h(k) x(n-k)
    - enddo
    - y(n) = y1(n,K)
  - Enddo
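To make the broadcast formulation concrete, here is a minimal Python sketch of it; the function name fir_broadcast and the zero-padding of x(n-k) for n < k are illustrative choices, not part of the slides.

```python
def fir_broadcast(x, h):
    """Single-assignment FIR: y1(n,k) = y1(n,k-1) + h(k)*x(n-k).

    x(n-k) is 'broadcast' to every k of iteration n, which is the
    dependency that the pipelined-duplication step later removes.
    """
    K = len(h) - 1
    y = []
    for n in range(len(x)):
        y1 = 0.0                                   # y1(n,-1) = 0
        for k in range(K + 1):
            xnk = x[n - k] if n - k >= 0 else 0.0  # broadcast input sample
            y1 = y1 + h[k] * xnk                   # y1(n,k) = y1(n,k-1) + h(k)*x(n-k)
        y.append(y1)                               # y(n) = y1(n,K)
    return y

# Example: 3-tap filter applied to a short input sequence.
print(fir_broadcast([1, 2, 3, 4], [0.5, 0.25, 0.25]))
```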
[Figure: dependence graph of the FIR filter in the (n, k) index plane, with inputs x(0)...x(6), weights h(0)...h(4), and outputs y(0)...y(6).]
Slide 6: Example: FIR Filter
- Regular recurrence equations (see the sketch after this list):
  - y1(n,-1) = 0, n = 0, 1, 2, ...
  - h1(0,k) = h(k), k = 0, ..., K
  - For n = 0, 1, 2, ... and k = 0, ..., K:
    - y1(n,k) = y1(n,k-1) + h1(n,k) x1(n,k)
    - h1(n,k) = h1(n-1,k)
    - x1(n,k) = x1(n-1,k-1)
  - y(n) = y1(n,K), n = 0, 1, 2, ...
- Leads to a shift-invariant DG (SIDG)
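A minimal Python sketch of the localized recurrence follows; the boundary condition x1(n,0) = x(n) and the function name fir_localized are assumptions added here to make the sketch self-contained.

```python
def fir_localized(x, h):
    """Localized (RIA) FIR recurrence from the slide:
        y1(n,k) = y1(n,k-1) + h1(n,k)*x1(n,k)
        h1(n,k) = h1(n-1,k),      h1(0,k) = h(k)
        x1(n,k) = x1(n-1,k-1)
    The boundary x1(n,0) = x(n) (and 0 outside the input range) is an
    assumption added to make the recurrence self-contained.
    """
    K, N = len(h) - 1, len(x)
    h1 = {}   # propagated weights
    x1 = {}   # pipelined input samples
    y = []
    for n in range(N):
        y1 = 0.0                                        # y1(n,-1) = 0
        for k in range(K + 1):
            h1[n, k] = h[k] if n == 0 else h1[n - 1, k]
            if k == 0:
                x1[n, k] = x[n]                         # assumed boundary condition
            else:
                x1[n, k] = x1.get((n - 1, k - 1), 0.0)
            y1 = y1 + h1[n, k] * x1[n, k]
        y.append(y1)                                    # y(n) = y1(n,K)
    return y

# Should match the broadcast version above.
print(fir_localized([1, 2, 3, 4], [0.5, 0.25, 0.25]))
```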
Slide 7: Linear Schedule and Assignment
- A schedule t(i) is a mapping from an index i in the DG to a positive integer t (the time index)
  - A time index is the quantum of time it takes to execute the operations of one iteration
  - A linear schedule maps all indices i on the same hyperplane to the same time index
  - It is characterized by the normal vector s of the equi-temporal hyperplanes
- An assignment is a mapping from an index i in the DG to an index n in the systolic array processor index space
  - The processor index space has lower dimension than that of the DG
  - A linear assignment assigns all indices lying along the same vector d in the DG to the same processor (index)
Slide 8: Algebraic Formulation of Linear Assignment and Schedule
- Entries of the assignment (projection) vector d and the scheduling vector s must be integers. Their dimension equals that of the DG indices.
- Processor space: the orthogonal subspace of d. Its entries are also integers.
- PE assignment by index (node) mapping:
  - n = Pᵀ i
- Scheduling by arc (dependence vector) mapping:
  - delta(e) = sᵀ v = number of delays on the corresponding edge of the DFG
  - e: edge of the systolic array
  - v: dependence vector
Slide 9: Affine Transformation
- The processor space matrix P spans the subspace in which the processor array index space lies.
- Any DG index i is assigned to the PE whose index is
  - p(i) = Pᵀ i - p_o
  - where p_o = min_{i in DG} Pᵀ i is an offset.
- If iterations i and j are both assigned to the same PE, then i - j = k d, where k is an integer.
- Iteration i is scheduled to be executed at time step
  - t(i) = sᵀ i - t_o
  - where t_o = min_{i in DG} sᵀ i is an offset.
- If iterations i and j are both assigned to the same PE, their schedule separation is
  - t(i) - t(j) = sᵀ(i - j) = k (sᵀ d).
- Thus, when k = 1, sᵀd is the iteration interval between the execution of two successive iterations on the same PE (a small numeric sketch follows).
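The sketch below (Python/numpy, with illustrative names) evaluates the affine assignment and schedule over an enumerated DG, using the FIR values that appear in the later example.

```python
import numpy as np

def affine_map(indices, P, s):
    """Map DG indices to (PE index, time step) via
    p(i) = P^T i - p_o and t(i) = s^T i - t_o,
    with offsets chosen so the smallest assignment/schedule is zero."""
    idx = np.array(indices)            # each row is one DG index i
    pe = idx @ P                       # P^T i for every node (row-wise)
    t = idx @ s                        # s^T i for every node
    pe -= pe.min(axis=0)               # subtract offset p_o
    t -= t.min()                       # subtract offset t_o
    return pe, t

# FIR example DG: i = (n, k) with n = 0..3, k = 0..2,
# projection d = [1, 0]^T, processor space P = [0, 1]^T, schedule s = [1, 1]^T.
nodes = [(n, k) for n in range(4) for k in range(3)]
P = np.array([[0], [1]])
s = np.array([1, 1])
pe, t = affine_map(nodes, P, s)
for i, node in enumerate(nodes):
    print(f"i = {node}: PE {int(pe[i, 0])}, time {int(t[i])}")
```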
Slide 10: Finding the Processor Space Matrix
- Problem
  - Given the projection vector d, find a processor space matrix P such that alpha P Pᵀ + beta d dᵀ = I, where alpha and beta are scaling constants chosen so that P and d have integer entries.
- Solution (a numpy sketch follows)
  - Find an n x (n-1) matrix V such that M = V Vᵀ.
  - Convert the entries of V into integers to yield the P matrix.
- Finding the matrix V
  - Compute M = I - d dᵀ / (dᵀ d).
  - Factorize the M matrix; this can be accomplished using LU factorization, or the eigenvalue or singular value decomposition of M.
  - Scale the entries of the V matrix so that all entries are integers.
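A numpy sketch of this procedure, assuming M = I - d dᵀ/(dᵀd) and an SVD-based factorization; the integer-scaling step here is a simplification intended only for small examples like the ones in this chapter.

```python
import numpy as np

def processor_space(d):
    """Find an integer basis P (n x (n-1)) for the subspace orthogonal to
    the projection vector d, following the M = I - d d^T/(d^T d) recipe.

    The SVD of M exposes its (n-1)-dimensional range; the scaling to
    integer entries (divide by the smallest magnitude, then round) is a
    simplification that works for small textbook examples only.
    """
    d = np.asarray(d, dtype=float).reshape(-1, 1)
    n = d.shape[0]
    M = np.eye(n) - (d @ d.T) / float(d.T @ d)   # projector onto the subspace orthogonal to d
    U, _, _ = np.linalg.svd(M)
    V = U[:, :n - 1]                             # columns spanning that subspace (M has rank n-1)
    P = np.empty_like(V)
    for j in range(V.shape[1]):                  # scale each column to integer entries
        col = V[:, j]
        col = col / np.abs(col[np.abs(col) > 1e-9]).min()
        P[:, j] = np.round(col)
    return P.astype(int)

# FIR example: d = [1, 0]^T gives P = [0, 1]^T (up to sign).
print(processor_space([1, 0]))
```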
Slide 11: FIR Filter Linear Mapping Example
- Dependence matrix: columns v_y = [0 1]ᵀ, v_h = [1 0]ᵀ, v_x = [1 1]ᵀ, read off from the recurrences for y1, h1, and x1 on Slide 6.
- Choose s = [1 1]ᵀ.
- Choose d = [1 0]ᵀ. Then P = [0 1]ᵀ.
- PE assignment: node (n, k) maps to PE Pᵀ i = k.
- Linear schedule (arc mapping): t(n, k) = n + k; edge delays sᵀv_y = 1, sᵀv_h = 1, sᵀv_x = 2.
- Input/output mapping: x(n) enters at PE 0 at time n; y(n) emerges from PE K at time n + K. (A sketch computing these mappings follows.)
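A quick numeric check of these numbers (Python/numpy; the variable names are illustrative):

```python
import numpy as np

# FIR mapping from this slide: schedule s and processor space P.
s = np.array([1, 1])
P = np.array([0, 1])

# Dependence vectors taken from the RIA recurrence on Slide 6.
deps = {"y1": np.array([0, 1]),
        "h1": np.array([1, 0]),
        "x1": np.array([1, 1])}

for name, v in deps.items():
    delay = s @ v        # number of delays on the mapped edge (s^T v)
    hop = P @ v          # PE-index displacement of the edge (P^T v)
    print(f"{name}: {delay} delay(s), moves {hop} PE(s)")

# Expected: y1 -> 1 delay, 1 PE; h1 -> 1 delay, 0 PEs (weight stays);
#           x1 -> 2 delays, 1 PE.
```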
Slide 12: FIR Linear Mapping
[Figure: the FIR dependence graph with the projection direction d and the schedule normal s overlaid, and the resulting one-dimensional systolic array. The weights h(k) stay in fixed PEs; the y edges carry one delay (D) and the x edges two delays (2D); inputs x(0), x(1), x(2), ... enter serially and outputs y(0), y(1), y(2), ... leave serially.]
Slide 13: FIR Linear Mapping
Weight stays, input pipelined, long critical path for output y.
[Figure: the corresponding array structure; inputs x(0), x(1), x(2), ... and outputs y(0), y(1), y(2), ... with unit delays (D) on the edges.]
Slide 14: FIR Linear Mapping
Weight stays, input pipelined, long critical path for output y.
[Figure: another realization of the mapping, with D, 2D, and 3D delays on the edges.]
Slide 15: Requirements for Valid Linear Assignment and Schedule Vectors
- Causality constraint: sᵀv > 0 (a checker sketch follows this list)
  - s: scheduling vector
  - v: any dependence vector
  - If iteration i has a data dependence on iteration j, then t(i) > t(j).
  - sᵀv = 0 is permitted if v is a dependence vector due to the localization of a broadcast variable.
- Resource conflict avoidance: sᵀd != 0
  - s: scheduling vector
  - d: assignment (projection) vector
  - Note that sᵀd is the iteration interval between two successive iterations executed on the same PE. If sᵀd = 0, a resource conflict will occur.
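A compact checker for the two conditions above; this is only a sketch, and the treatment of localized-broadcast vectors (tolerating sᵀv = 0) reflects the exception noted above.

```python
import numpy as np

def check_mapping(s, d, dep_vectors, localized=()):
    """Check causality (s^T v > 0 for every dependence vector v, with
    s^T v = 0 tolerated for localized-broadcast vectors) and resource
    conflict avoidance (s^T d != 0)."""
    s, d = np.asarray(s), np.asarray(d)
    allowed_zero = set(map(tuple, localized))
    ok = True
    for v in map(np.asarray, dep_vectors):
        delay = int(s @ v)
        if delay < 0 or (delay == 0 and tuple(v) not in allowed_zero):
            print(f"causality violated on v = {v.tolist()} (s^T v = {delay})")
            ok = False
    if int(s @ d) == 0:
        print("resource conflict: s^T d = 0")
        ok = False
    return ok

# FIR example from Slide 11: every edge gets a positive delay, s^T d = 1.
print(check_mapping([1, 1], [1, 0], [[0, 1], [1, 0], [1, 1]]))
```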
Slide 16: Example: Sorting
- Input x(i), output m(i)
  - m(i) is a permutation of the inputs x(i), with m(i) >= m(j) for i < j.
- RIA formulation (with m1(i,i) = -infinity), implemented in the sketch after this list:
  - for i = 1:N,
    - x1(i,1) = x(i)
    - for j = 1:i,
      - m1(i+1,j) = max(x1(i,j), m1(i,j))
      - x1(i,j+1) = min(x1(i,j), m1(i,j))
      - if i == N,
        - m(j) = m(N+1,j)
      - end
    - end
  - end
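A direct Python transcription of this recurrence (indices shifted to 0-based; the function name ria_sort is illustrative). It realizes the insertion-sort reading of the DG: each new input displaces smaller values down the m array.

```python
def ria_sort(x):
    """Sort by the RIA recurrence: each new x(i) is compared against the
    running maxima m1(.,j); the larger value stays, the smaller one
    propagates to the next column.  Returns m in descending order."""
    N = len(x)
    m = [float("-inf")] * N        # m1(i,i) = -infinity initialisation
    for i in range(N):
        xi = x[i]                  # x1(i,1) = x(i)
        for j in range(i + 1):
            # m1(i+1,j) = max(x1(i,j), m1(i,j)); x1(i,j+1) = min(x1(i,j), m1(i,j))
            m[j], xi = max(xi, m[j]), min(xi, m[j])
        # after the last i, m holds the sorted output m(j) = m1(N+1,j)
    return m

print(ria_sort([3, 1, 4, 1, 5]))   # [5, 4, 3, 1, 1]
```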
[Figure: triangular dependence graph of the sorting recurrence over indices (i, j), 1 <= j <= i <= N, with nodes x(i,j) and m(i,j) and the sorted outputs m(j) leaving along the last row.]
sort734.m
Slide 17: Sorting - Three Different Mappings
[Figure: three linear mappings of the sorting DG, yielding arrays that correspond to insertion sort, bubble sort, and selection sort.]
Slide 18: Optimal Design Method
- Total computation time: Tcomp = max over p, q of sᵀ(p - q) + 1
  - p, q: node indices in the DG
- Sampling period Ts (time between successive input samples)
- Constrained optimization formulation (see the search sketch below):
  - Find d and s to minimize Tcomp or Ts
  - subject to sᵀv > 0 and sᵀd > 0
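An exhaustive-search sketch of this formulation for a small 2-D DG; the search range, the example DG, and the tie-breaking rule are illustrative assumptions rather than part of the slides.

```python
import itertools
import numpy as np

def optimal_mapping(nodes, deps, search_range=range(-2, 3)):
    """Brute-force the constrained design problem: choose integer s and d
    minimizing Tcomp = max_{p,q in DG} s^T (p - q) + 1, subject to
    s^T v > 0 for every dependence vector v and s^T d > 0.
    Ties are broken toward small s^T d (iteration interval) and small d."""
    nodes = np.array(nodes)
    best = None
    vectors = list(itertools.product(search_range, repeat=2))
    for s_tup, d_tup in itertools.product(vectors, repeat=2):
        s, d = np.array(s_tup), np.array(d_tup)
        if any(int(s @ v) <= 0 for v in deps) or int(s @ d) <= 0:
            continue                                    # violates a constraint
        times = nodes @ s
        tcomp = int(times.max() - times.min()) + 1      # total computation time
        score = (tcomp, int(s @ d), int(d @ d))
        if best is None or score < best[0]:
            best = (score, s, d)
    (tcomp, _, _), s, d = best
    return tcomp, s, d

# FIR DG with n = 0..3, k = 0..2 and its three dependence vectors.
nodes = [(n, k) for n in range(4) for k in range(3)]
deps = [np.array(v) for v in ([0, 1], [1, 0], [1, 1])]
tcomp, s, d = optimal_mapping(nodes, deps)
print(f"Tcomp = {tcomp}, s = {s.tolist()}, d = {d.tolist()}")
# Several d are equally good here; [0, 1] and [1, 0] give the same objective.
```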
Slide 19: Multi-Projection
- Since dim(d) = 1, the dimension of the processor space (the dimension of the systolic array) is n-1, where n is the dimension of the DG.
- If the desired dimension of the systolic array is smaller than n-1, multi-projection can be applied.
- Each projection introduces a new time scale:
  - 1st projection: each delay D = 1 iteration
  - 2nd projection: each delay of the new time unit = M iterations
[Figure: the array obtained from the first projection, and the result of a second projection along d with schedule s, with edges carrying delays in both time scales.]
Slide 20: Comparison of Schedules
- After the first projection
  - Execution time = 4D, with D = 4 t.u., where the execution time of one iteration (one node) is 1 t.u.
- After the second projection
  - Execution time = 3D + 4t, with D = 4 t.u. and t = 1 t.u., i.e. D = 4t
  - Same as using s = [1 5]ᵀ with D = 1 t.u.
[Figure: node execution times before the second projection (0, D, 2D, 3D) and after it (0, t, 2t, 3t, ..., 3D + 3t).]
Slide 21: Multi-Projection
- New representation
  - The logical processor space after an earlier linear mapping can be regarded as an iteration DFG (IDFG), where each node represents the execution of one iteration and a delay D represents dependence on the previous iteration.
  - The IDFG contains delays and cycles.
  - An instance graph (IG) can be created from the IDFG by removing all edges with delays.
- Node mapping: same as before
- Arc mapping: the delays on each dependence edge are
  - the D delays carried over from the previous mapping, plus
  - (sᵀv) additional delays in the new time scale after the second-level multi-projection
Slide 22: Illustration of Multi-Projection
- Remove the arcs with delays to create the instance graph (IG).
- Project the instance graph along the new projection direction and add the appropriate delays (in the new time unit) to the edges of the IG.
- Put back the edges carrying D delays, with corrections due to the new delay unit.
[Figure: a DFG reduced to its instance graph, projected, and then restored with delays in both the old (D) and new time units.]