Title: The FFT on a GPU
1The FFT on a GPU
- Graphics Hardware 2003
- July 27, 2003
- Kenneth Moreland, Sandia National Labs
- Edward Angel, U. of New Mexico
Sandia is a multiprogram laboratory operated by
Sandia Corporation, a Lockheed Martin
Company, for the United States Department of
Energy's National Nuclear Security
Administration under contract DE-AC04-94AL85000.
2Overview
- Introduction
- Motivation, FFT review.
- FFT Techniques
- Exploitable FFT properties.
- Implementation
- Results
- Performance, applications, conclusions.
3Motivation
- The Fourier transform is a principal tool for
digital image processing.
  - Filtering.
  - Correction.
  - Compression.
  - Classification.
  - Generation.
- As such, should not our graphics hardware support
such a tool?
4The Discrete Fourier Transform
- Converts data in the spatial or temporal domain
into frequencies the data comprise.
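The formula from this slide did not survive extraction. As a minimal sketch, assuming the usual forward convention (negative exponent, no normalization), the transform can be written directly from its definition. The name `dft` is illustrative only.

```python
import cmath
import math

def dft(samples):
    """Direct O(N^2) discrete Fourier transform (forward, unnormalized)."""
    n = len(samples)
    return [
        sum(samples[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]

# One cosine cycle over 8 samples: the energy lands in bins 1 and 7.
signal = [math.cos(2 * math.pi * x / 8) for x in range(8)]
spectrum = dft(signal)
```

Every output bin sums over every input sample, which is the quadratic cost the FFT avoids.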
5The Discrete Fourier Transform
- 2D transform can be computed by applying the
transform in one direction, then the other.
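A small sketch of the row-then-column idea, assuming the same unnormalized forward convention. A direct `dft` stands in for whatever 1D transform is actually used, and the function names are hypothetical.

```python
import cmath

def dft(samples):
    """Direct 1D transform, used here only as a stand-in for an FFT."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n)) for u in range(n)]

def dft_2d(image):
    """Transform every row, then every column of the row results."""
    rows_done = [dft(row) for row in image]
    columns = [dft(list(col)) for col in zip(*rows_done)]
    # Transpose back so the result is indexed [v][u], like the input.
    return [list(row) for row in zip(*columns)]

image = [[1, 2], [3, 4]]
spectrum = dft_2d(image)
```

Transforming columns first and rows second would give the same result, since the 2D transform is separable.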
6The Fast Fourier Transform
- Divide and Conquer Algorithm
- Input sequence is divided into subsequences
consisting of values from even and odd indices,
respectively.
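The even/odd split can be sketched as a textbook recursive radix-2 FFT. This is a generic illustration, not code from the talk, and it assumes the input length is a power of two.

```python
import cmath

def fft_recursive(samples):
    """Radix-2 decimation-in-time FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft_recursive(samples[0::2])   # values from even indices
    odd = fft_recursive(samples[1::2])    # values from odd indices
    result = [0j] * n
    for u in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * u / n) * odd[u]
        result[u] = even[u] + twiddled
        result[u + n // 2] = even[u] - twiddled
    return result

spectrum = fft_recursive([1, 2, 3, 4, 0, 0, 0, 0])
```

Each level does O(N) work across log2(N) levels, giving the familiar O(N log N) cost.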
7Index Magic
- Do not use recursion.
- Use dynamic programming: iterate over the entire
array, computing all values for each recursive
depth together, like mergesort.
- Indexing is non-obvious.
- Unlike mergesort, the recursive step does not divide
the array into contiguous chunks.
- At any iteration, what partition does a given
index belong to, and where can one find the
applicable values of the sub-partitions?
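One way to answer that question without recursion is a Stockham-style autosort FFT, sketched below. This is a swapped-in, well-known variant and not the index formula from the paper. Each pass reads from one array and writes another, and every output index works out for itself which two inputs it needs. The names `fft_pass` and `fft_iterative` are hypothetical, and the length is assumed to be a power of two.

```python
import cmath

def fft_pass(src, length, stride):
    """One pass: every output index gathers its two inputs from src."""
    n = len(src)
    half = length // 2
    dst = [0j] * n
    for out in range(n):                      # on a GPU: one fragment per index
        q = out % stride                      # which interleaved sub-sequence
        p, is_odd = divmod(out // stride, 2)  # position and output parity
        a = src[q + stride * p]
        b = src[q + stride * (p + half)]
        if is_odd:
            dst[out] = (a - b) * cmath.exp(-2j * cmath.pi * p / length)
        else:
            dst[out] = a + b
    return dst

def fft_iterative(samples):
    """Run log2(N) passes; the result comes out in natural order."""
    data = [complex(v) for v in samples]
    length, stride = len(data), 1
    while length > 1:
        data = fft_pass(data, length, stride)
        length //= 2
        stride *= 2
    return data

spectrum = fft_iterative([1, 2, 3, 4])
```

Each pass only reads `src` and writes every `dst` entry exactly once, which is the access pattern a fragment program needs.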
8Index Magic
- Common solution: rearrange data by reversing the
bits of indices.
  - FFT can occur with contiguous partitions.
  - Requires an extra data copy.
- Our solution: determine indexing in place.
Note that the paper has a typo.
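For comparison, the common bit-reversal rearrangement can be sketched as follows. This illustrates the "common solution" only, not the in-place indexing used in the talk. The helper names are hypothetical and the length is assumed to be a power of two.

```python
def bit_reverse_indices(n):
    """For each index below n, return the index with its bits reversed."""
    bits = n.bit_length() - 1            # n is assumed to be a power of two
    reversed_indices = []
    for i in range(n):
        r = 0
        for b in range(bits):
            if i & (1 << b):
                r |= 1 << (bits - 1 - b)
        reversed_indices.append(r)
    return reversed_indices

def bit_reverse_copy(samples):
    """The extra data copy: reorder samples into bit-reversed order."""
    return [samples[r] for r in bit_reverse_indices(len(samples))]

order = bit_reverse_indices(8)   # [0, 4, 2, 6, 1, 5, 3, 7]
```

After this copy the even-indexed samples occupy the first half of the array and the odd-indexed samples the second half, recursively, so every partition is contiguous.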
9Fourier Symmetry of Real Sequences
- In general, the frequency spectra of real
functions still contain imaginary values.
  - Captures magnitude and phase shift of sinusoids.
- Brute force FFT doubles computation and storage
costs.
- But Fourier transforms of real functions have
symmetry.
  - F(N - u) = F*(u), where * denotes the complex conjugate.
  - Values at F(0) and F(N/2) are real
    (because they are conjugates of themselves).
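A quick numerical check of the symmetry, assuming the unnormalized forward convention. The sample values are arbitrary, and a direct `dft` stands in for an FFT.

```python
import cmath

def dft(samples):
    """Direct 1D transform, used here only as a stand-in for an FFT."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n)) for u in range(n)]

real_signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
F = dft(real_signal)
N = len(real_signal)

# F(N - u) should equal the conjugate of F(u) for every u.
symmetric = all(abs(F[(N - u) % N] - F[u].conjugate()) < 1e-9 for u in range(N))

# The two self-conjugate bins should have (numerically) zero imaginary part.
self_conjugate_imag = (F[0].imag, F[N // 2].imag)
```

Only bins 0 through N/2 carry independent information, so roughly half of a brute-force complex result is redundant.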
10Fourier Transform of Real Functions
- Pick two functions, let them be f(x) and g(x).
- Let h(x) = f(x) + j g(x).
- Note that there is no loss of information.
- Can perform the FFT of h in half the time of
performing the brute force FFT of f and g
individually.
  - Simply point to one row of the image as real
    components and another as imaginary components.
[Figure: two image rows labeled f and g.]
11Untangling Fourier Transform Pairs
- Fourier transform is linear.
  - H(u) = F(u) + j G(u)
- We can untangle using symmetry of F and G.
  - Add and subtract H(u) and H(N - u) to cancel out
    conjugate terms of F and G.
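The tangle-and-untangle steps can be sketched as below. The sketch uses the conjugate of H(N - u), which is the same add-and-subtract idea written in complex arithmetic. A direct `dft` stands in for the FFT, and `transform_pair` is a hypothetical name.

```python
import cmath

def dft(samples):
    """Direct 1D transform, used here only as a stand-in for an FFT."""
    n = len(samples)
    return [sum(samples[x] * cmath.exp(-2j * cmath.pi * u * x / n)
                for x in range(n)) for u in range(n)]

def transform_pair(f, g):
    """Transform two real sequences with a single complex transform."""
    n = len(f)
    h = [complex(a, b) for a, b in zip(f, g)]     # h(x) = f(x) + j g(x)
    H = dft(h)
    F, G = [], []
    for u in range(n):
        # Because f and g are real, conj(H(N - u)) = F(u) - j G(u).
        mirrored = H[(n - u) % n].conjugate()
        F.append((H[u] + mirrored) / 2)           # the G terms cancel
        G.append((H[u] - mirrored) / 2j)          # the F terms cancel
    return F, G

F, G = transform_pair([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, -1.0])
```

The untangling costs only O(N) additions on top of the single transform.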
12Untangling Fourier Transform Pairs
13Packing Transforms of Real Functions
- We can store the Fourier transform in an array the
same size as the input.
  - Throw away conjugate duplicates.
  - Throw away imaginary values known to be zero.
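One possible packing is sketched below. The slides do not spell out the storage order, so the layout here is an assumption chosen for illustration: real parts in the first half, imaginary parts mirrored into the second half. The function names are hypothetical and N is assumed to be even.

```python
def pack_real_spectrum(F):
    """Store the spectrum of N real samples in N real numbers (N even)."""
    n = len(F)
    packed = [0.0] * n
    packed[0] = F[0].real             # imaginary part known to be zero
    packed[n // 2] = F[n // 2].real   # imaginary part known to be zero
    for u in range(1, n // 2):
        packed[u] = F[u].real         # F(N - u) is a conjugate duplicate
        packed[n - u] = F[u].imag
    return packed

def unpack_real_spectrum(packed):
    """Rebuild the full complex spectrum from the packed form."""
    n = len(packed)
    F = [0j] * n
    F[0] = complex(packed[0], 0.0)
    F[n // 2] = complex(packed[n // 2], 0.0)
    for u in range(1, n // 2):
        F[u] = complex(packed[u], packed[n - u])
        F[n - u] = F[u].conjugate()
    return F

# Spectrum of the real sequence [1, 2, 3, 4].
example = [10 + 0j, -2 + 2j, -2 + 0j, -2 - 2j]
packed = pack_real_spectrum(example)   # [10.0, -2.0, -2.0, 2.0]
```

The round trip through `unpack_real_spectrum` recovers the full spectrum, so nothing is lost by discarding the duplicates.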
14Column-wise FFT
- We have two columns with real values.
  - Use same tangled approach.
- All other columns are complex numbers.
  - Use regular FFT.
[Figure: columns labeled Real, Real, and Paired for Complex.]
15Packing 2D Transforms of Real Functions
- Rows transformed from complex values are already
packed appropriately.
- The two rows transformed from real values are
untangled and packed to follow suit.
[Figure: packed layout with regions labeled Real Values and Imaginary Values.]
16Available Resources
- nVidia GeForce FX 5800 Ultra.
  - Full 32-bit floating point pipeline and frame
    buffers.
  - Fully programmable vertex and fragment units.
- Cg
  - High level language for vertex and fragment
    programs.
- Traditional CPU: 1.7 GHz Intel Xeon
  - Freely available high performance FFT
    implementations.
17Implementation
- Using a SIMD model for parallel computation.
  - Draw a quadrilateral parallel to the screen.
  - Rasterizer invokes the same fragment program in
    parallel over all pixels covered by the
    quadrilateral.
  - Inputs and outputs depend on the location of the
    pixel on which the fragment program is running.
- We require many rendering passes.
  - Use render to texture extension.
  - Use two frame buffers: one for retrieving values
    of the last pass and one for storing results of
    the current computation.
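The two-buffer scheme can be imitated on the CPU as a rough sketch. This is a plain Python simulation, not Cg and not the render-to-texture API. `run_passes` and `shift_right` are hypothetical names, and the toy program only stands in for a real FFT pass.

```python
def run_passes(initial, fragment_programs):
    """Run each fragment program over every pixel, swapping two buffers."""
    height, width = len(initial), len(initial[0])
    source = [row[:] for row in initial]              # results of the last pass
    target = [[0.0] * width for _ in range(height)]   # results of this pass
    for program in fragment_programs:
        for y in range(height):
            for x in range(width):
                # Same program everywhere; only the pixel location differs.
                target[y][x] = program(source, x, y)
        source, target = target, source               # swap the two buffers
    return source

def shift_right(source, x, y):
    """Toy fragment program: read the left neighbor, wrapping around."""
    return source[y][x - 1]

result = run_passes([[1.0, 2.0, 3.0]], [shift_right, shift_right])
```

A program never reads the buffer it is writing, so the order in which pixels are processed does not affect the result.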
18Implementation
[Diagram: processing stages labeled Images, FFT, Untangle, and Frequency Spectra.]
19Fragment Programs
- Written in Cg, compiled for GeForce FX.
20Applications
21Applications
- Texture generation.
- Volume rendering.
22Performance
- Computation speed: 2.5 GigaFLOPS
- Texture read rate: 3.4 GB/sec
23Conclusions
- The Fourier transform on the GPU has many
potential applications.
- A well established FFT on the CPU (FFTW) still
has an edge over the GPU implementation.
- Both the software and hardware of the GPU are first
generation.
  - Room for improvement.
24Get the Cg Code
- http://www.cgshaders.org ?
- http://www.cs.unm.edu/kmorel/documents/fftgpu
- kmorel_at_sandia.gov
25Questions?