Chapter 3 Parallel and Pipelined Processing

Provided by: YuHe8
1
Chapter 3 Parallel and Pipelined Processing
2
Basic Ideas
  • Parallel processing
  • Pipelined processing

[Figure: two timing diagrams, one for parallel processing and one for pipelined processing, showing which items processors P1 to P4 handle over time. Colors mark the different types of operations performed; a, b, c, d mark the different data streams processed.]
Parallel processing: less inter-processor communication, but more complicated processor hardware.
Pipelined processing: more inter-processor communication, but simpler processor hardware.
3
Data Dependence
  • Parallel processing requires NO data dependence
    between processors
  • Pipelined processing will involve inter-processor
    communication
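
The contrast between the two schemes can be sketched as a pair of schedules. This is a minimal illustration, not taken from the slides: it assumes four processors, four data streams (a to d) and four operations per stream, and it labels each entry with the stream letter followed by the operation number. The function names are hypothetical.

```python
# Illustrative schedules for 4 processors, 4 data streams, 4 operations each.
# Entry "b3" means: operation 3 applied to data stream b (assumed notation).
STREAMS = ["a", "b", "c", "d"]
NUM_OPS = 4


def parallel_schedule():
    """Processor p owns one stream and runs every operation on it in turn."""
    rows = []
    for t in range(NUM_OPS):
        # At time t every processor applies operation t+1 to its own stream,
        # so no data moves between processors.
        rows.append([f"{s}{t + 1}" for s in STREAMS])
    return rows


def pipelined_schedule():
    """Processor p performs only operation p+1; streams flow from P1 to P4."""
    num_proc = NUM_OPS
    total_steps = len(STREAMS) + num_proc - 1
    rows = []
    for t in range(total_steps):
        row = []
        for p in range(num_proc):
            s = t - p  # index of the stream reaching processor p at time t
            row.append(f"{STREAMS[s]}{p + 1}" if 0 <= s < len(STREAMS) else "--")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for name, sched in (("parallel", parallel_schedule()),
                        ("pipelined", pipelined_schedule())):
        print(name)
        for t, row in enumerate(sched):
            print(f"  t={t}: " + " ".join(row))
```

In the pipelined schedule every stream is handed from one processor to the next, which is where the inter-processor communication comes from; in the parallel schedule each column is self-contained.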

[Figure: two timing diagrams for processors P1 to P4 over time, illustrating data dependence in the parallel and pipelined cases.]
4
Usage of Pipelined Processing
  • By inserting latches or registers between
    combinational logic circuits, the critical path
    can be shortened.
  • Consequence: the clock cycle time is reduced and
    the clock frequency can be increased.
  • Suitable for DSP applications that have an
    (infinitely) long data stream.
  • Method to incorporate pipelining: cut-set
    retiming.
  • Cut set: a set of edges of a graph such that, if
    these edges are removed from the original graph,
    the remaining graph becomes two separate graphs.
  • Retiming: the timing of an algorithm is
    re-adjusted while keeping the partial ordering of
    execution unchanged, so that the results remain
    correct.
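
The cut-set definition can be checked mechanically. The sketch below is only an illustration under stated assumptions: the graph is given as a list of node names and a list of directed edges, connectivity is tested on the undirected version of the graph, and the function name `is_cut_set` and the example graph are hypothetical.

```python
from collections import defaultdict, deque


def is_cut_set(nodes, edges, cut):
    """Return True if removing `cut` leaves exactly two separate graphs.

    Connectivity is checked on the undirected version of the graph.
    """
    remaining = [e for e in edges if e not in cut]
    adj = defaultdict(set)
    for u, v in remaining:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1  # found a node not reached so far: new component
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search over the remaining edges
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return components == 2


# Hypothetical example graph: A feeds B and C, which both feed D.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

if __name__ == "__main__":
    print(is_cut_set(NODES, EDGES, {("A", "B"), ("A", "C")}))  # both edges out of A
    print(is_cut_set(NODES, EDGES, {("A", "B")}))              # a single edge
```

Placing a register on every edge of a valid cut set delays all signals crossing the cut by the same amount, which is why the ordering of execution is preserved.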

5
Graphic Transpose Theorem
  • The transfer function of a signal flow graph
    remains unchanged if
  • the direction of each arc is reversed, and
  • the input and output labels are switched.

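[Figure: signal flow graph of a 3-tap FIR filter with coefficients h0, h1, h2 and two z^-1 delay elements; signal labels x(n), u(n) and y(n).]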
6
Data broadcast structure
  • An algorithm transform may lead to a pipelined
    structure without adding additional delays.
  • Given an FIR filter SFG
  • Critical path: $T_M + 2T_A$ (one multiplication
    time plus two addition times)
  • Use the graph transposition theorem
  • Reverse all arcs
  • Reverse input/output
  • We obtain the data broadcast structure
  • Critical path: $T_M + T_A$
  • No additional delay added!
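
That the transposed structure computes the same filter can be checked numerically. The following is a minimal sketch, assuming a 3-tap FIR filter y(n) = h0 x(n) + h1 x(n-1) + h2 x(n-2) with zero initial state; the function names and the test values are hypothetical.

```python
def fir_direct(x, h):
    """Direct form: delay line on the input, then a chain of adders."""
    taps = [0] * len(h)  # taps[i] holds x(n - i)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]  # shift the delay line by one sample
        y.append(sum(hi * xi for hi, xi in zip(h, taps)))
    return y


def fir_transposed(x, h):
    """Transposed (data broadcast) form: x(n) feeds every multiplier at once."""
    state = [0] * (len(h) - 1)  # contents of the delay elements
    y = []
    for sample in x:
        products = [hi * sample for hi in h]  # input broadcast to all taps
        y.append(products[0] + (state[0] if state else 0))
        # Each delay stores its tap product plus the next delay's old value.
        state = [
            products[i + 1] + (state[i + 1] if i + 1 < len(state) else 0)
            for i in range(len(state))
        ]
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    h = [1, 2, 3]
    print(fir_direct(x, h))
    print(fir_transposed(x, h))
```

In the transposed form each output needs only one multiplication and one addition after the most recent register, while the direct form adds all three products in series within one clock period.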

7
Fine-grain pipelining
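To further reduce $T_M$, the multiplier itself is pipelined into two stages with delays $T_{M1}$ and $T_{M2}$.
Critical path: $\max(T_{M1}, T_{M2}, T_A)$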
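
The effect of each step on the clock period can be put side by side with a few lines of arithmetic. This is a sketch under assumptions: the delay values are made-up numbers in arbitrary time units, the direct-form expression assumes the adders are chained in series, and the function name is hypothetical.

```python
def critical_paths(t_m, t_a, t_m1, t_m2, taps=3):
    """Critical path lengths for three FIR structures (arbitrary time units)."""
    return {
        # Direct form: one multiplier followed by (taps - 1) adders in series.
        "direct": t_m + (taps - 1) * t_a,
        # Data broadcast form: one multiplier followed by one adder.
        "broadcast": t_m + t_a,
        # Fine-grain pipelining: the multiplier is split into two latched
        # stages, so the slowest single stage sets the clock period.
        "fine_grain": max(t_m1, t_m2, t_a),
    }


if __name__ == "__main__":
    # Hypothetical delays: multiplier 10, adder 2, multiplier stages 6 and 4.
    for name, delay in critical_paths(10, 2, 6, 4).items():
        print(f"{name}: {delay}")
```

With these made-up numbers the minimum clock period drops from 14 to 12 to 6 time units, and the remaining limit is the slower of the two multiplier stages.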
8
Block Processing
  • One form of vectorized parallel processing of DSP
    algorithms (not parallel processing in the most
    general sense).
  • Block vector: $[x(3k),\ x(3k+1),\ x(3k+2)]$
  • Clock cycle can be 3 times longer
  • Original (FIR filter)
  • Rewrite 3 equations at a time
  • Define block vector
  • Block formulation
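
The block idea can be sketched in software. The equations from this slide did not survive in the transcript, so the example assumes a generic FIR filter y(n) = sum over i of h[i] x(n-i) with zero initial conditions and a block size of 3; the function names are hypothetical. In hardware the three outputs of a block would be computed by separate units at the same time, whereas this sketch computes them one after another.

```python
def fir_sample(x, h):
    """Reference: one output per iteration, y(n) = sum_i h[i] * x(n - i)."""
    return [
        sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
        for n in range(len(x))
    ]


def fir_block(x, h, block=3):
    """Block form: each iteration consumes and produces `block` samples."""
    assert len(x) % block == 0, "pad the input to a multiple of the block size"
    y = []
    for k in range(len(x) // block):
        # One clock period k: compute y(3k), y(3k+1), y(3k+2) together.
        out = []
        for j in range(block):
            n = block * k + j
            out.append(sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0))
        y.extend(out)
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    h = [1, 2, 3]
    print(fir_sample(x, h))
    print(fir_block(x, h))
```

Because one pass of the outer loop handles three samples, the loop runs at one third of the sample rate, which is the sense in which the clock cycle can be three times longer.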

9
Block Processing
10
General approach for block processing
11
Block Processing for IIR Digital Filter
  • Original formulation
  • Rewrite
  • Define block vectors
  • Then
  • Time indices
  • n: sampling period
  • k: clock period (processor)
  • k = 2n (the clock period is twice the sampling
    period)
  • Note
  • Pipelining: clock period = sampling period.
  • Block (parallel): clock period is not equal to
    the sampling period.
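
A block IIR filter with block size 2 can be sketched as follows. The original equations are missing from the transcript, so the recursion y(n) = a y(n-2) + x(n) used here is an assumption, chosen because it matches the signal names y(2(k-1)), y(2(k-1)+1), x(2k) and x(2k+1) that appear in the block diagram. The function names and test values are hypothetical.

```python
def iir_sample(x, a):
    """Sample-rate recursion (assumed): y(n) = a * y(n - 2) + x(n)."""
    y = []
    for n, sample in enumerate(x):
        prev = y[n - 2] if n >= 2 else 0  # zero initial conditions
        y.append(a * prev + sample)
    return y


def iir_block2(x, a):
    """Block size 2: one clock period k yields y(2k) and y(2k + 1)."""
    assert len(x) % 2 == 0, "pad the input to an even length"
    y_even_prev = 0  # y(2(k-1)), held in a delay element D
    y_odd_prev = 0   # y(2(k-1) + 1), held in a delay element D
    y = []
    for k in range(len(x) // 2):
        # The two lines below are independent and could run in parallel.
        y_even = a * y_even_prev + x[2 * k]
        y_odd = a * y_odd_prev + x[2 * k + 1]
        y.extend([y_even, y_odd])
        y_even_prev, y_odd_prev = y_even, y_odd
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    print(iir_sample(x, 2))
    print(iir_block2(x, 2))
```

Each pass of the block loop covers two sampling periods, so the processor clock runs at half the sample rate while the output sequence stays the same.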

12
Block IIR Filter
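[Figure: block IIR filter with block size 2. A serial-to-parallel (S/P) converter splits x(n) into x(2k) and x(2k+1). The outputs y(2k) and y(2k+1) are fed back through delay elements D as y(2(k-1)) and y(2(k-1)+1). A parallel-to-serial (P/S) converter merges y(2k) and y(2k+1) into y(n).]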
13
Timing Comparison
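[Figure: timing diagram of a single MAC unit taking inputs x(1) to x(4) in time slots 1 to 4 and producing y(1) to y(4).]
  • Pipelining: separate Mul and Add units, each busy
    in time slots 1 to 8, with inputs x(1) onward,
    outputs y(1) onward, and products such as a y(1)
    passed from Mul to Add.
  • Block processing: two lanes, one taking the even
    samples x(2), x(4), x(6), x(8) and one taking the
    odd samples x(1), x(3), x(5), x(7), with each
    sample occupying two time slots.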