Title: Distributed Arithmetic

1
Distributed Arithmetic

Dr Sumam David S.
Dept. of EC, NITK Surathkal
Slides courtesy of Xilinx Professors Workshop resources

2
Objective

Distributed arithmetic
What?
Where?
How?

3
What is DA?

Multiplication using look-up tables (LUTs)
Used to implement multipliers in LUT-rich FPGAs

4
Two's Complement Multiplication

One bit at a time
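As a minimal Python sketch of one-bit-at-a-time two's complement multiplication (the function names are hypothetical, and B-bit operands with bits taken LSB first are assumed): each bit of the multiplier x selects either the multiplicand a or zero, the selection is shifted by the bit position, and the most significant bit carries weight -2^(B-1), so its partial product is subtracted instead of added.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Two's complement bits of value, least significant bit first."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    unsigned = value & ((1 << width) - 1)
    return [(unsigned >> b) & 1 for b in range(width)]


def bit_serial_multiply(a: int, x: int, width: int) -> int:
    """Multiply a by the width-bit two's complement number x, one bit of x at a time."""
    acc = 0
    for b, bit in enumerate(to_bits(x, width)):
        partial = (a if bit else 0) << b   # partial product of weight 2^b
        if b == width - 1:
            acc -= partial                 # sign bit has weight -2^(width-1)
        else:
            acc += partial
    return acc


print(bit_serial_multiply(-7, 7, 4))   # -49
print(bit_serial_multiply(6, -3, 4))   # -18
```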

5
SDA 1-Tap FIR Filter
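[Figure: N-bit-wide sample data X0 passes through a parallel-to-serial converter; each serial bit addresses a partial product ROM for coefficient A0, whose output is added or subtracted (+/-) in a scaling accumulator.]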
6
Distributed Arithmetic for a 2-Tap Filter

Partial products of equal weight are added together before being summed with the partial products of the next higher weight
Create a look-up table of the summed partial products
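The look-up table of summed partial products can be sketched in Python as follows, assuming K coefficients and that address bit k selects coefficient k (the bit ordering and the name build_da_lut are illustrative choices, not taken from the slides). The usage line takes C0 = -7 and C1 = 6 from the example that follows.

```python
def build_da_lut(coeffs: list[int]) -> list[int]:
    """Return the 2^K-entry table of summed partial products for K coefficients.

    Address bit k selects coeffs[k]; each entry is the sum of the selected
    coefficients (entry 0 is the empty sum, 0).
    """
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


lut = build_da_lut([-7, 6])   # C0 = -7, C1 = 6
print(lut)                    # [0, -7, 6, -1]
```

The table doubles in size with every extra coefficient, which is why larger filters are split across several 4-input LUTs.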

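Example (4-bit two's complement, bit weights -2^3, 2^2, 2^1, 2^0):
C0 = 1001 (-7)    X0 = 0111 (7)
C1 = 0110 ( 6)    X1 = 0101 (5)

Direct multiplication: C0 × X0 = -49, C1 × X1 = 30, sum = -19

(Serial-Data / Tap-Parallel Multiply): one bit of X0 and X1 per step, weighted partial sums with sign extension to 7 bits
bit 0, weight  2^0: x0=1, x1=1 → C0 + C1 = -1 → 1111111 (-1)
bit 1, weight  2^1: x0=1, x1=0 → C0 = -7      → 1110010 (-14)
bit 2, weight  2^2: x0=1, x1=1 → C0 + C1 = -1 → 1111100 (-4)
bit 3, weight -2^3: x0=0, x1=0 → 0            → 0000000 (0)
Sum:                                            1101101 (-19)

With x_{k,b} denoting bit b of X_k:
$$C_0X_0 + C_1X_1 = -2^3\,(C_0x_{0,3} + C_1x_{1,3}) + \sum_{b=0}^{2} 2^b\,(C_0x_{0,b} + C_1x_{1,b}) = -19$$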
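The worked example can be checked with a short Python sketch, assuming 4-bit two's complement data and LSB-first bit ordering (the function names are hypothetical). At each step, bit b of X0 and X1 forms a 2-bit LUT address, and the looked-up sum is weighted by 2^b (or -2^3 for the sign bit) and accumulated.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Two's complement bits of value, least significant bit first."""
    unsigned = value & ((1 << width) - 1)
    return [(unsigned >> b) & 1 for b in range(width)]


def da_inner_product(coeffs: list[int], xs: list[int], width: int) -> tuple[int, list[int]]:
    """Serial-data / tap-parallel DA: one LUT lookup per bit position of the samples."""
    # LUT of summed partial products: address bit k selects coeffs[k]
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
           for addr in range(1 << len(coeffs))]
    bits = [to_bits(x, width) for x in xs]
    terms = []
    for b in range(width):
        addr = sum(bits[k][b] << k for k in range(len(xs)))   # bit b of every sample
        weight = -(1 << b) if b == width - 1 else 1 << b        # sign bit is negative
        terms.append(weight * lut[addr])
    return sum(terms), terms


total, terms = da_inner_product([-7, 6], [7, 5], 4)
print(terms)   # [-1, -14, -4, 0]
print(total)   # -19
```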
7
SDA 2-Tap FIR Filter
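[Figure: N-bit-wide sample data X0 and X1 are serialized; their bits address a partial product ROM for coefficients A0 and A1, whose output is added or subtracted (+/-) in a scaling accumulator.]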
8
SDA 4-Tap FIR Filter
[Figure: N-bit-wide sample data X0 to X3 are serialized into a partial product ROM; for each tap k, a bit of Xk selects either coefficient Ak or the zero word 0000...0 (selection labels C0 to C3).]
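A behavioural sketch of the 4-tap filter in Python, under these assumptions: 8-bit two's complement samples, integer coefficients chosen only for illustration, and a software delay line standing in for the sample shift registers. The bits of X0 to X3 at each bit position form a 4-bit address into a 16-entry LUT, and the result is compared with an ordinary multiply-based FIR (all names are hypothetical).

```python
def to_bits(value: int, width: int) -> list[int]:
    """Two's complement bits of value, least significant bit first."""
    unsigned = value & ((1 << width) - 1)
    return [(unsigned >> b) & 1 for b in range(width)]


def da_fir(coeffs: list[int], samples: list[int], width: int) -> list[int]:
    """y[n] = sum_k coeffs[k] * x[n-k], computed with one 2^K-entry LUT."""
    taps = len(coeffs)
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
           for addr in range(1 << taps)]
    delay = [0] * taps                      # delay[k] holds x[n-k]
    out = []
    for x in samples:
        delay = [x] + delay[:-1]            # shift the new sample in
        bits = [to_bits(v, width) for v in delay]
        acc = 0                             # scaling accumulator
        for b in range(width):              # one "clock cycle" per bit
            addr = sum(bits[k][b] << k for k in range(taps))
            partial = lut[addr] << b
            acc += -partial if b == width - 1 else partial
        out.append(acc)
    return out


def direct_fir(coeffs: list[int], samples: list[int]) -> list[int]:
    """Reference FIR using ordinary multiplications."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]
        out.append(sum(c * v for c, v in zip(coeffs, delay)))
    return out


coeffs = [3, -5, 7, 2]                       # A0..A3, illustrative values
samples = [12, -128, 127, -1, 0, 45, -77, 100]
print(da_fir(coeffs, samples, 8) == direct_fir(coeffs, samples))   # True
```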
9
SDA 8-Tap FIR Filter
[Figure: N-bit-wide sample data X0 to X7 drive two partial product ROMs (coefficients A0 to A3 shown for each), one addressed by X0 to X3 and one by X4 to X7, with a pre-adder and a +/- scaling accumulator.]
Each 4-input LUT contains all possible sums of the partial products
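The sketch below shows one way to split an 8-input inner product over two 4-input LUTs whose outputs are added before the scaling accumulator, so two 16-entry tables replace one 256-entry table. This data path is an assumption for illustration and does not model the pre-adder in the figure; separate coefficient lists are accepted, and the usage passes A0 to A3 to both ROMs as the figure labels suggest (names and values are hypothetical).

```python
def to_bits(value: int, width: int) -> list[int]:
    """Two's complement bits of value, least significant bit first."""
    unsigned = value & ((1 << width) - 1)
    return [(unsigned >> b) & 1 for b in range(width)]


def build_lut(coeffs: list[int]) -> list[int]:
    """2^K-entry table of summed partial products; address bit k selects coeffs[k]."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_8tap(coeffs_lo: list[int], coeffs_hi: list[int], xs: list[int], width: int) -> int:
    """8-input inner product using two 16-entry LUTs instead of one 256-entry LUT."""
    lut_lo, lut_hi = build_lut(coeffs_lo), build_lut(coeffs_hi)
    bits = [to_bits(x, width) for x in xs]
    acc = 0
    for b in range(width):
        addr_lo = sum(bits[k][b] << k for k in range(4))       # bits of X0..X3
        addr_hi = sum(bits[4 + k][b] << k for k in range(4))   # bits of X4..X7
        partial = (lut_lo[addr_lo] + lut_hi[addr_hi]) << b     # add the two LUT outputs
        acc += -partial if b == width - 1 else partial
    return acc


a = [3, -5, 7, 2]                              # A0..A3, illustrative values
xs = [10, -20, 30, -40, 50, -60, 70, -80]      # X0..X7
print(da_8tap(a, a, xs, 8))                    # 1040
print(sum(c * x for c, x in zip(a + a, xs)))   # 1040
```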
10
Xilinx DA FIR Performance
[Figure: performance (MMACs/s) versus filter length (taps) for a dual-MAC processor, a serial FPGA FIR, and DA FIR filters with B = 8, 12 and 16 (DA FIR B8, B12, B16).]
fclk = 200 MHz for both processor and FPGA; B = data sample precision for FPGA
11
Trade Clock Cycles for Logic Area
Multiple bits per clock cycle
[Figure: an 8-bit sample (b7 to b0) processed by designs ranging from Serial-DA (20 Ms/s) to Parallel-DA (160 Ms/s).]
Hardware over-sampling 8 (Serial-DA): the sample is serialized and processed 1 bit per clock cycle, so 8 clock cycles are required to process the whole sample.
Hardware over-sampling 4: the sample is serialized and processed 2 bits per clock cycle, so 4 clock cycles are required to process the whole sample.
Hardware over-sampling 2: the sample is serialized and processed 4 bits per clock cycle, so 2 clock cycles are required to process the whole sample.
Hardware over-sampling 1 (Parallel-DA): the sample is processed in parallel, 8 bits per clock cycle.
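A behavioural Python sketch of this trade-off, assuming an 8-bit sample and that handling n bits per clock uses n copies of the same LUT whose weighted outputs are added in the same cycle (the function name and coefficients are illustrative). The result is unchanged while the cycle count drops from 8 to 1.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Two's complement bits of value, least significant bit first."""
    unsigned = value & ((1 << width) - 1)
    return [(unsigned >> b) & 1 for b in range(width)]


def da_multibit(coeffs: list[int], xs: list[int], width: int,
                bits_per_cycle: int) -> tuple[int, int]:
    """Return (result, clock_cycles) when bits_per_cycle sample bits are handled per cycle.

    Behavioural model: each cycle uses bits_per_cycle copies of the same LUT,
    one per bit position, and adds their weighted outputs to the accumulator.
    """
    if width % bits_per_cycle:
        raise ValueError("width must be a multiple of bits_per_cycle")
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
           for addr in range(1 << len(coeffs))]
    bits = [to_bits(x, width) for x in xs]
    acc = 0
    cycles = 0
    for start in range(0, width, bits_per_cycle):
        cycles += 1
        for b in range(start, start + bits_per_cycle):   # LUT copies working in parallel
            addr = sum(bits[k][b] << k for k in range(len(xs)))
            partial = lut[addr] << b
            acc += -partial if b == width - 1 else partial
    return acc, cycles


for n in (1, 2, 4, 8):
    print(n, da_multibit([-7, 6, 3, 1], [100, -50, 25, -128], 8, n))
# 1 (-1053, 8)
# 2 (-1053, 4)
# 4 (-1053, 2)
# 8 (-1053, 1)
```

The cycle count equals the hardware over-sampling factor on the slide, while the number of LUT copies (logic area) grows with the bits processed per cycle.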
12
Conclusion

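Efficiency of computation: the multiply-accumulate uses LUTs, adders and shifts instead of hardware multipliers
Slow, as it is bit-serial (one clock cycle per bit of sample precision)
Memory requirements: a LUT addressed by K serial bits needs 2^K entries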

13
References

The Role of Distributed Arithmetic in FPGA-based Signal Processing, www.xilinx.com
