FFT Accelerator Project

1
FFT Accelerator Project
  • Rohit Prakash (2003CS10186)
  • Anand Silodia (2003CS50210)

11th May, 2007
Supervisors: Dr. Kolin Paul, Prof. M. Balakrishnan
2
Overview
  • Objective
    • Find the best software implementation of the FFT
    • Identify the part of the code that takes the
      most time to execute

3
  • Examined 3 FFT algorithms
    • Radix-4
    • Radix-16
    • Radix-8
  • Compared them with FFTW
  • Analysed them on the following parameters
    • Execution time
    • Number of complex calculations
    • Memory references
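
For readers unfamiliar with the algorithms being compared, the following is a minimal sketch of a recursive radix-4 decimation-in-time FFT. It is not the project's code: the name fft_radix4, the alias cd, and the use of std::complex are assumptions made for illustration, and the sketch favours clarity over speed (it allocates at every level of the recursion).

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

using cd = std::complex<double>;  // illustrative alias, not from the slides

// Recursive radix-4 decimation-in-time FFT (forward transform).
// The input length must be a power of 4 (1, 4, 16, 64, ...).
std::vector<cd> fft_radix4(const std::vector<cd>& x) {
    const std::size_t n = x.size();
    if (n <= 1) return x;
    if (n % 4 != 0) throw std::invalid_argument("length must be a power of 4");

    const std::size_t q = n / 4;
    std::vector<cd> sub[4];
    for (std::size_t j = 0; j < 4; ++j) {
        // Sub-sequence j holds x[j], x[j + 4], x[j + 8], ...
        std::vector<cd> part(q);
        for (std::size_t m = 0; m < q; ++m) part[m] = x[4 * m + j];
        sub[j] = fft_radix4(part);
    }

    const double pi = std::acos(-1.0);
    const cd minus_i(0.0, -1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < q; ++k) {
        const double theta = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        // Apply the twiddle factors W^k, W^2k and W^3k.
        const cd a = sub[0][k];
        const cd b = std::polar(1.0, theta) * sub[1][k];
        const cd c = std::polar(1.0, 2.0 * theta) * sub[2][k];
        const cd d = std::polar(1.0, 3.0 * theta) * sub[3][k];
        // Radix-4 butterfly: a 4-point DFT of (a, b, c, d).
        const cd s0 = a + c;
        const cd s1 = a - c;
        const cd s2 = b + d;
        const cd s3 = minus_i * (b - d);
        out[k] = s0 + s2;
        out[k + q] = s1 + s3;
        out[k + 2 * q] = s0 - s2;
        out[k + 3 * q] = s1 - s3;
    }
    return out;
}
```

Radix-8 and radix-16 variants follow the same pattern with 8 or 16 sub-sequences and a larger butterfly per output group.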

4
System Info
  • CPU: Intel Pentium 4 (HT), 3.0 GHz
  • L2 cache size: 1024 KB
  • RAM: 1 GB
  • OS: Fedora Core 3
  • Compiler: icpc, version 9.1
  • Flags used: -fast (-xP, -O3, -ipo, -no-prec-div,
    -static)

5
Execution Time (radix-4 vs. FFTW)
6
Execution Time (radix-16 vs. FFTW)
7
Execution Time (radix-8 vs. FFTW)
8
Execution Time (radix-4 vs. radix-16)
9
Execution Time (radix-8 vs. radix-16)
10
Execution Time (radix-4 vs. radix-8)
11
Inference
  • For small input sizes, up to 4096 points, the
    execution time is lowest for radix-4 (and is
    comparable to FFTW)
  • For larger input sizes (> 4096 points), radix-8
    fares better than radix-4 and radix-16
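
The slides do not say how execution time was measured. As one possible approach, the sketch below times a callable with std::chrono and keeps the best of several runs, which reduces noise from other processes. The name best_time_ms and the default repeat count are assumptions for illustration.

```cpp
#include <chrono>

// Runs fn() `repeats` times and returns the shortest wall-clock time
// observed, in milliseconds. Returns -1.0 if repeats <= 0.
template <typename Fn>
double best_time_ms(Fn&& fn, int repeats = 5) {
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

A comparison like the one on these slides could call best_time_ms once per algorithm and per input size, passing a lambda that runs the transform on a pre-filled input buffer so that setup cost is excluded from the measurement.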

12
Cache
  • I1 cache: 16 KB, 8-way associative
  • D1 cache: 16 KB, 8-way associative
  • L2 cache: 1 MB, 8-way associative

13
Memory References
14
Memory References
15
L2 Cache Misses
16
L2 Cache Misses
17
Comparative Analysis
4096 points: 13.68, 13.25, 13.66
16777216 points: 96.19, 74.98, 71.87
18
Inference
  • For smaller input sizes, cache misses are
    greatest for radix-16 (misses increase linearly
    from radix-4 to radix-16)
  • But for large input sizes (greater than 4096
    points), radix-8 has the fewest cache misses
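
As general background, not a result from the slides: a radix-r FFT on N points makes log_r(N) passes over the data, so a higher radix means fewer passes, but each butterfly then touches more operands at once. The hypothetical helper below (fft_stages is an illustrative name) computes the pass count. Both sizes quoted in the comparison are exact powers of all three radices: 4096 = 4^6 = 8^4 = 16^3 and 16777216 = 4^12 = 8^8 = 16^6.

```cpp
#include <cstdint>

// Number of butterfly stages (passes over the data) in a radix-`radix`
// FFT of `n` points. Returns -1 if n is not an exact power of the radix.
int fft_stages(std::uint64_t n, std::uint64_t radix) {
    if (n == 0 || radix < 2) return -1;
    int stages = 0;
    while (n > 1) {
        if (n % radix != 0) return -1;
        n /= radix;
        ++stages;
    }
    return stages;
}
```

The pass count alone does not predict cache behaviour; which radix misses least also depends on the access pattern and on how much of the working set fits in each cache level.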

19
Code Hotspot
Radix-8 FFT
20
Inference
  • Because the code is object-oriented, creating
    Complex objects takes the largest number of
    clock ticks
  • After that, the most time is taken by complex
    multiplications, followed by complex additions
    and complex subtractions
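
One common way to reduce object-creation overhead of this kind is to update complex values in place instead of returning a new object from every operation. The sketch below assumes a plain struct, Cplx, standing in for the project's Complex class, whose actual definition is not shown in the slides. It also makes the relative costs visible: a complex multiplication needs 4 real multiplications and 2 real additions, while a complex addition or subtraction needs only 2 real additions.

```cpp
// Hypothetical stand-in for the project's Complex class.
struct Cplx {
    double re;
    double im;
};

// a *= b: 4 real multiplications, 2 real additions, no temporary Cplx.
inline void mul_in_place(Cplx& a, const Cplx& b) {
    const double re = a.re * b.re - a.im * b.im;
    const double im = a.re * b.im + a.im * b.re;
    a.re = re;
    a.im = im;
}

// a += b: 2 real additions.
inline void add_in_place(Cplx& a, const Cplx& b) {
    a.re += b.re;
    a.im += b.im;
}

// a -= b: 2 real subtractions.
inline void sub_in_place(Cplx& a, const Cplx& b) {
    a.re -= b.re;
    a.im -= b.im;
}
```

Whether this helps in practice depends on the compiler; an optimiser may already remove short-lived temporaries, so the change is worth profiling before and after.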

21
Comparative Analysis (Complex multiplications)
22
Comparative Analysis (Complex additions/subtractions)
23
References
  • Thomas H. Cormen, Charles E. Leiserson, Ronald L.
    Rivest, Clifford Stein, Introduction to
    Algorithms
  • Matteo Frigo and Steven G. Johnson, The Design
    and Implementation of FFTW3
  • Matteo Frigo and Steven G. Johnson, The Fastest
    Fourier Transform in the West
  • Alan H. Karp, Bit Reversal on Uniprocessors
  • Jones, D., Radix-4 FFT Algorithms, Connexions Web
    Site, www.cnx.org

24
Thank You