Is F Better than D - PowerPoint PPT Presentation

David Hansen and James Michelussi – PowerPoint PPT presentation


Transcript and Presenter's Notes



1
Is F Better than D
  • David Hansen and James Michelussi

2
Introduction
  • Discrete Fourier Transform (DFT)
  • Fast Fourier Transform (FFT)
  • FFT Algorithm: Applying the Mathematics
  • Implementations of DFT and FFT
  • Hardware Benchmarks
  • Conclusion

3
DFT
  • Introduced in 1807 by Jean Baptiste Joseph
    Fourier
  • Allows a sampled (discrete) signal that is
    periodic to be transformed from the time domain
    to the frequency domain
  • Correlation between the time-domain signal and N
    cosine and N sine waves

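$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{kn}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j2\pi/N}$$
X(k): DFT (frequency-domain signal); N: number of sample points; x(n): time-domain signal; W_N: twiddle factor
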
4
DFT (Walking Speed)
  • Why is this important? Where is this used?
  • Allows machines to calculate the frequency-domain
    representation of a signal
  • Allows convolution of signals by simply multiplying
    their transforms (circular convolution in time
    becomes point-by-point multiplication in frequency)
  • Used in digital spectral analysis for speech,
    imaging and pattern recognition, as well as signal
    manipulation using filters
  • But the DFT requires N² multiplications!

5
FFT (Jet Speed)
  • J. W. Cooley and J. W. Tukey are given credit for
    bringing the FFT to the world in the 1960s
  • Simply an algorithm for more efficiently
    calculating the DFT
  • Takes advantage of the symmetry and periodicity of
    the twiddle factors, and uses a divide-and-conquer
    method
  • Symmetry: $W_N^{r+N/2} = -W_N^{r}$
  • Periodicity: $W_N^{r+N} = W_N^{r}$
  • Requires only (N/2)log2(N) multiplications!
  • Faster computation times
  • More precise results due to less round-off error

6
FFT Algorithm
  • Several different types of FFT algorithms
    (radix-2, radix-4, decimation in time (DIT),
    decimation in frequency (DIF))
  • Focus on radix-2 using the decimation-in-time (DIT)
    method
  • Breaks down the DFT calculation into a number of
    2-point DFTs
  • Each 2-point DFT uses an operation called the
    butterfly
  • The results are then recombined in pairs (2-point
    into 4-point, 4-point into 8-point, and so on) over
    log2(N) stages
  • With the DIT method, the input time-domain points
    must first be reordered using bit reversal

7
Butterfly Operation
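The radix-2 DIT butterfly combines two values with one twiddle factor W: the top output is top + W·bottom and the bottom output is top - W·bottom, so each butterfly needs only one complex multiplication. A minimal C99 sketch, assuming complex doubles and a hypothetical function name:

```c
#include <complex.h>

/* Hypothetical helper: in-place radix-2 decimation-in-time butterfly.
     top    <- top + W * bottom
     bottom <- top - W * bottom
   The product W * bottom is computed once and reused. */
void butterfly(double complex *top, double complex *bottom, double complex W)
{
    double complex t = W * (*bottom);
    *bottom = *top - t;
    *top = *top + t;
}
```

With top = 1, bottom = 2 and W = 1 the outputs are 3 and -1.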
8
Bit Reversal
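Bit reversal maps sample index i to the index whose binary digits are reversed; for N = 8 (3 bits), 1 = 001b goes to 100b = 4. The sketch below is a hypothetical implementation playing the same role as the ReverseBits(i, NumBits) helper named in the C code later in this deck, plus an in-place reordering that swaps each pair once.

```c
/* Hypothetical helper: reverse the lowest num_bits bits of index. */
unsigned reverse_bits(unsigned index, unsigned num_bits)
{
    unsigned rev = 0;
    for (unsigned b = 0; b < num_bits; b++) {
        rev = (rev << 1) | (index & 1u);   /* move the lowest bit of index into rev */
        index >>= 1;
    }
    return rev;
}

/* Reorder x (length 2^num_bits) into bit-reversed order, in place. */
void bit_reverse_permute(double *x, unsigned num_bits)
{
    unsigned N = 1u << num_bits;
    for (unsigned i = 0; i < N; i++) {
        unsigned j = reverse_bits(i, num_bits);
        if (j > i) {                       /* swap each pair only once */
            double t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}
```

For N = 8 the input order 0, 1, 2, 3, 4, 5, 6, 7 becomes 0, 4, 2, 6, 1, 5, 3, 7.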
9
8-Point Radix-2 FFT Example
10
8-Point Radix-2 FFT Example
11
Implementations of DFT and FFT
  • David Hansen

12
DFT Implementation
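    for (r = 0; r < samples / 2; r++) {
        float re = 0.0f, im = 0.0f;
        float part = (float)r * -2.0f * PI / (float)samples;
        for (k = 0; k < samples; k++) {
            float theta = part * (float)k;
            re += data_in[k] * cos(theta);
            im += data_in[k] * sin(theta);
        }
        /* re, im: real and imaginary parts of X(r) */
    }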
  • Nested For Loops, (N/2)·N Iterations, O(N²)
  • 63027.41 Cycles / Sample (123 cycles per inner
    loop iteration)
  • Obvious Inefficiency: the math.h cos and sin
    functions are called inside the inner loop
  • Efficient assembly coding could reduce the inner
    loop to 3 cycles per iteration (1,536 cycles /
    sample)

13
C FFT Implementation
    void fft_float(unsigned NumSamples, float *RealIn, float *ImagIn,
                   float *RealOut, float *ImagOut)
    {
        for (i = 0; i < NumSamples; i++) {
            // Iterate over the samples and perform the bit-reversal
            j = ReverseBits(i, NumBits);
        }
        BlockEnd = 1;
        // Following loop iterates Log2(NumSamples) times
        for (BlockSize = 2; BlockSize <= NumSamples; BlockSize <<= 1) {
            // Perform angle calculations (using math.h sin/cos)
            // Following 2 loops iterate over NumSamples/2
            for (i = 0; i < NumSamples; i += BlockSize) {
                for (j = i, n = 0; n < BlockEnd; j++, n++) {
                    // Perform butterfly calculations
                }
            }
            BlockEnd = BlockSize;
        }
    }
14
C FFT Implementation
  • Bit-Reverse For Loop: N iterations
  • Nested For Loops
  • First Outer Loop: Log2(N) iterations
  • Made use of sin/cos math.h functions
  • Second Outer Loop: N / BlockSize iterations
  • Inner Loop: BlockSize/2 iterations
  • O(N + Log2(N) · N/BlockSize · BlockSize/2)
  • O(N + (N/2)·Log2(N)) = O(N·Log2(N))
  • 193.84 Cycles / Sample
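The complexity argument can be confirmed by counting inner-loop iterations directly: each of the log2(N) stages runs N/BlockSize blocks of BlockSize/2 butterflies, so every stage contributes N/2. A small C sketch with a hypothetical function name, assuming n is a power of two:

```c
/* Hypothetical helper: count the butterflies executed by the loop nest above. */
unsigned long count_butterflies(unsigned long n)
{
    unsigned long count = 0;
    for (unsigned long block_size = 2; block_size <= n; block_size <<= 1)  /* log2(N) stages */
        for (unsigned long i = 0; i < n; i += block_size)                  /* N / BlockSize blocks */
            for (unsigned long m = 0; m < block_size / 2; m++)            /* BlockSize/2 butterflies */
                count++;
    return count;
}
```

For n = 1024 the count is 5,120, which equals (N/2)·log2(N) = 512 · 10.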

15
Assembly FFT Implementation
  • Bit-Reverse Address Generation
  • Hide the bit-reverse operation inside the first and
    second FFT stages
  • Sin and cos values stored in a look-up table (LUT)
  • 256 Kbyte LUT added to Data1
  • Data1 memory space had to be enlarged in the LDF
    (linker description file)
  • Interleaved real and imaginary arrays
  • Quad-word reads load 2 complex points per cycle
  • Supports the real FFT for input signals with no
    imaginary component
  • 40% algorithm-based savings

16
Assembly FFT Implementation
  • Special Butterfly Instruction
  • Can perform addition/subtraction in parallel in
    one compute block
  • Speeds up the inner-most loop
  • VLIW and SIMD Operations
  • Performs simultaneous operations in both compute
    blocks
  • Loop unrolling and instruction scheduling keep
    the entire processor busy with instructions.
  • 11.35 Cycles per Sample

17
Assembly FFT Implementation
_BflyLoop qj24r2726 k5k5k9 fr6r30r12 fr16r6-r7 yr30qj04 k3k5 and k4 fr15r23r4 fr24r8r18, fr26r8-r18 xr30qj04 r54lk7k3 fr7r31r13 fr25r9r19, fr27r9-r19 qj14r2524 fr14r30r13 fr17r14r15 qj24r2726 k5k5k9 fr6r2r4 fr18r6-r7 yr118qj04 k3k5 and k4 fr15r31r12 fr24r20r16, fr26r20-r16 xr118qj04 r1312lk7k3 fr7r3r5 fr25r21r17, fr27r21-r17 qj14r2524 fr14r2r5 fr19r14r15 qj24r2726 k5k5k9 fr6r10r12 fr16r6-r7 yr2320qj04 k3k5 and k4 fr15r3r4 fr24r28r18, fr26r28-r18 xr2320qj04 r54lk7k3 fr7r11r13 fr25r29r19, fr27r29-r19 qj14r2524 fr14r10r13 fr17r14r15 qj24r2726 k5k5k9 fr6r22r4 fr18r6-r7 yr3128qj04 k3k5 and k4 fr15r11r12 fr24r0r16, fr26r0-r16 xr3128qj04 r1312lk7k3 fr7r23r5 fr25r1r17, fr27r1-r17 .align_code 4 if NLC0E, jump _BflyLoop
18
DC FFT Test
  • FFT Source Array
  • FFT Output Magnitude

19
Audio FFT Test
  • FFT Source Array
  • FFT Output Magnitude

20
1024-Point DFT / FFT Comparison
Implementation                 Cycles per sample
DFT implemented in C           63,027.41
DFT implemented in assembly     1,536
FFT implemented in C              193.85
FFT implemented in assembly        11.35
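The figures in this table can be related with simple arithmetic. For a direct DFT that computes N/2 bins with N inner-loop iterations each, the cost per sample is (N/2)·N·c / N = (N/2)·c, where c is the cycles per inner-loop iteration. The helpers below are hypothetical and only restate that arithmetic.

```c
/* Cycles per sample of an n-point direct DFT computing n/2 bins,
   each with n inner-loop iterations of cycles_per_iteration cycles. */
double dft_cycles_per_sample(unsigned n, double cycles_per_iteration)
{
    double total_cycles = (n / 2.0) * n * cycles_per_iteration;
    return total_cycles / n;
}

/* How many times faster the second implementation is than the first. */
double speedup(double slow_cycles_per_sample, double fast_cycles_per_sample)
{
    return slow_cycles_per_sample / fast_cycles_per_sample;
}
```

With c = 3 the model reproduces the 1,536 cycles/sample projected for assembly, and with c = 123 it gives 62,976 cycles/sample, close to the measured 63,027.41. The assembly FFT is about 17 times faster than the C FFT and about 5,553 times faster than the C DFT.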
21
1024-Point Radix-2 FFT Hardware Comparison
Processor architecture      Cycles per sample   Clock frequency   Execution time
ADSP-21369 (SHARC)           8.98               400 MHz           22.99 µs
TigerSHARC (website)         9.16               600 MHz           15.63 µs
TigerSHARC (our results)    11.35               600 MHz           19.37 µs
TMS320C6000                 14.125              350 MHz           41.33 µs
TMS320DM644x                 7.59               594 MHz           13.08 µs
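Each execution time in this table follows from the other two columns: time = (cycles per sample × N) / clock frequency. With the clock in MHz the result comes out directly in microseconds. A one-line sketch with a hypothetical function name:

```c
/* Execution time in microseconds of an n-point transform, given the
   cycles per sample and the clock frequency in MHz. */
double exec_time_us(double cycles_per_sample, unsigned n, double clock_mhz)
{
    return cycles_per_sample * n / clock_mhz;
}
```

For the ADSP-21369 row, 8.98 × 1024 / 400 ≈ 22.99 µs, matching the table; the other rows check out the same way.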
22
Conclusion
  • The FFT algorithm is very useful when computing
    the frequency domain on a DSP.
  • The FFT is much faster than a direct DFT algorithm.
  • The FFT is also more precise: it performs far fewer
    operations, so less round-off error accumulates.
  • The timed coding examples further support this
    claim and demonstrate how to code the algorithm.
  • The radix-2 FFT isn't the fastest, but it uses
    simpler addressing and twiddle-factor
    routines.
  • In this case (unlike in school) F is better than
    D.