Title: Numerical Error Minimizing Floating-Point to Fixed-Point ANSI C Compilation
1Numerical Error Minimizing Floating-Point to
Fixed-Point ANSI C Compilation
- Tor Aamodt and Paul Chow
- University of Toronto
2Presentation Outline
- Background / Motivation
- Floating-to-Fixed-Point Conversion
- Architectural Support
- Experimental Results
- Summary / Future Directions
3Background: University of Toronto DSP Project
- Motivation: DSP Compiler/Architecture Co-design
- First Generation Silicon (Sean Peng's M.A.Sc. Thesis): taped out Sept. 30, 1999; 108-pin PGA / 0.35 µm CMOS / 63 MHz
- 16-bit Fixed-Point VLIW with Two-Level Instruction Fetching
- Harvard Memory Architecture
- 5-stage pipeline: IF1 → IF2 → ID → EX → WB
- 7 function units:
- 2 integer units: 16.0 multiply, 1.15 multiply operations
- 2 address units: modulo addressing
- 2 memory units: each tied to one data memory bank
- 1 control unit
4Background: Fixed-Point versus Floating-Point
32-bit Floating-Point (IEEE)
Fixed-Point
5Background: Fixed-Point versus Floating-Point
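As a rough illustration of the two representations, the sketch below converts between a C float and a 16-bit fixed-point word. It assumes the 1.15 format (one sign bit, 15 fraction bits) named for the integer units earlier in the talk; the helper names float_to_q15 and q15_to_float are hypothetical and are not part of the UTDSP tools.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed format: 1.15 (1 sign bit, 15 fraction bits), range [-1, 1). */
#define Q15_SCALE 32768.0f

/* Convert to 1.15 with round-to-nearest; out-of-range inputs saturate. */
static int16_t float_to_q15(float x)
{
    assert(x == x);                       /* reject NaN */
    float scaled = x * Q15_SCALE + (x >= 0.0f ? 0.5f : -0.5f);
    if (scaled >= 32767.0f)  return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)scaled;               /* truncation completes the rounding */
}

/* Convert a 1.15 word back to floating point (exact for every 16-bit input). */
static float q15_to_float(int16_t q)
{
    return (float)q / Q15_SCALE;
}
```

For in-range inputs the round trip changes the value by at most half of one least significant bit, that is 2^-16. The floating-point value keeps its own exponent, while the fixed-point word relies on an implied binary point that the compiler has to track.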
6Motivation
- Why convert floating-point code to fixed-point code? Saves area and power.
- Why automate the process? Manual conversion is time-consuming and error-prone.
- What qualities are we looking for in an automated conversion system? Good signal quality. Fast code.
7Background: Fixed-Point Numerical Representations in Signal Processing
- Consider a program P with associated inputs x(k) ∈ S_P. Example: P is an IIR filter, S_P the set of all human speech samples x(k).
- Signal Scaling: Integer Word Length (IWL)
- definition
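The slide's own definition of IWL is not reproduced in this transcript. A minimal sketch follows, assuming one common convention: the IWL of a signal is the smallest n for which max|x| < 2^n, which equals floor(log2(max|x|)) + 1. Both that convention and the helper names are assumptions made for illustration.

```c
#include <assert.h>

/* Assumed convention: IWL = smallest n with max_abs < 2^n. */
static int iwl_from_max_abs(double max_abs)
{
    assert(max_abs > 0.0);
    int iwl = 0;
    double bound = 1.0;                   /* always equals 2^iwl */
    while (bound <= max_abs) {            /* grow until max_abs < 2^iwl */
        bound *= 2.0;
        ++iwl;
    }
    while (bound * 0.5 > max_abs) {       /* shrink while 2^(iwl-1) still covers it */
        bound *= 0.5;
        --iwl;
    }
    return iwl;
}

/* Fraction bits left in a signed word once the sign bit and IWL are reserved. */
static int fraction_bits(int word_length, int iwl)
{
    return word_length - 1 - iwl;
}
```

Under this convention a signal confined to (-1, 1) has IWL 0 and, in a 16-bit word, 15 fraction bits, which matches the 1.15 format. A larger IWL trades fraction bits for range.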
8Background: Fixed-Point Arithmetic Operations
>> n (binary point alignment)
Multiplication
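The two operations named on this slide can be sketched as follows. The fixed16_t type, which tags a 16-bit word with its integer word length, and the function names are hypothetical. Overflow handling is left out, and the code relies on >> of a negative value being an arithmetic shift, which is implementation-defined in C but holds on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the represented value is raw * 2^(iwl - 15). */
typedef struct {
    int16_t raw;
    int     iwl;
} fixed16_t;

/* Addition: the operand with the smaller IWL is shifted right by n bits so
   both binary points line up; the result keeps the larger IWL. */
static fixed16_t fx_add(fixed16_t a, fixed16_t b)
{
    fixed16_t r;
    int n = a.iwl - b.iwl;
    assert(n > -16 && n < 16);
    if (n >= 0) {
        r.iwl = a.iwl;
        r.raw = (int16_t)(a.raw + (b.raw >> n));
    } else {
        r.iwl = b.iwl;
        r.raw = (int16_t)((a.raw >> -n) + b.raw);
    }
    return r;
}

/* Multiplication: the 32-bit product has 30 fraction-side bits; keeping
   bits 30..15 yields a 16-bit result whose IWL is the sum of the IWLs. */
static fixed16_t fx_mul(fixed16_t a, fixed16_t b)
{
    int32_t product = (int32_t)a.raw * (int32_t)b.raw;
    fixed16_t r = { (int16_t)(product >> 15), a.iwl + b.iwl };
    return r;
}
```

For example, adding 0.5 (raw 16384, IWL 0) to 1.0 (raw 16384, IWL 1) first shifts the 0.5 operand right by one bit and gives raw 24576 at IWL 1, which is 1.5.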
9Presentation Outline
- Background Material / Motivation
- Floating-to-Fixed-Point Conversion
- Architecture Support
- Experimental Results
- Summary / Future Directions
10Conversion Process: Previous Work
- Worst-Case Evaluation: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997.
- A Statistical Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
11Conversion Process: Overview
sin(x) → utdsp_sin(x)

float *p, x, y, A[N], B[N];
for( int i=0; i < N; i++ ) {
    p = (condition) ? A : B;
    y = x*p[i];
}

float fubar( float *p ) {
    float sum = 0.0;
    for( int i=0; i < N; i++ )
        sum += p[i];
    return sum;
}
12Conversion Process: Collecting Dynamic Range Information
Code Instrumentation
profile(tmp_1,1); profile(tmp_2,2); profile(y,0);
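What a profile(value, id) call might record can be sketched as below. The slide shows only the calls, so the body, the table size MAX_PROFILE_IDS, and the wrapper instrumented_step are assumptions; the sketch simply tracks the largest magnitude seen for each instrumented value, which is the quantity an IWL can later be derived from.

```c
#include <assert.h>

#define MAX_PROFILE_IDS 64               /* hypothetical table size */

/* Largest magnitude observed so far for each profiled id (starts at zero). */
static double profile_max_abs[MAX_PROFILE_IDS];

/* Record one floating-point intermediate result under the given id. */
static void profile(double value, int id)
{
    assert(id >= 0 && id < MAX_PROFILE_IDS);
    double magnitude = value < 0.0 ? -value : value;
    if (magnitude > profile_max_abs[id])
        profile_max_abs[id] = magnitude;
}

/* y = a*x[i] + b*x[i+1] with each intermediate result instrumented,
   using the same ids as the calls on the slide. */
static double instrumented_step(double a, double b, const double *x, int i)
{
    double tmp_1 = a * x[i];
    profile(tmp_1, 1);
    double tmp_2 = b * x[i + 1];
    profile(tmp_2, 2);
    double y = tmp_1 + tmp_2;
    profile(y, 0);
    return y;
}
```

Running the instrumented floating-point program over representative inputs fills the table; each entry then bounds the dynamic range of one variable or expression-tree node.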
13Conversion Process: Desired Result
Continuation of Previous Example
float a, b, x[N];
y = a*x[i] + b*x[i+1];
14Conversion Process: Type Conversion / Scaling Operation Generation
- Type conversion: float, double → int
- Scaling operations are added to expression trees using a post-order traversal...
- Two previous algorithms from the literature for generating scaling operations...
- Neither uses Intermediate Result Profile data; instead, they combine range information from leaf nodes in a bottom-up fashion.
- Is Useful Information Lost?
15Conversion Process: IRP, Using Intermediate Result Profile Data
- Worst-Case Evaluation: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997.
- A Statistical Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
- UTDSP Algorithms: IRP, IRP-SA
- Each node has a measured IWL and a current IWL
- Measured IWL: as determined by profiling
- Current IWL: due to scaling operations within the node
16Scaling Operation Generation
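17IRP: Additive Operations
$A \pm B \;\rightarrow\; (A \ll n_A) \pm (B \gg [n - n_B])$
where
$n_A = \mathrm{IWL}_A^{\text{current}} - \mathrm{IWL}_A^{\text{measured}}$
$n_B = \mathrm{IWL}_B^{\text{current}} - \mathrm{IWL}_B^{\text{measured}}$
$n = \mathrm{IWL}_A^{\text{measured}} - \mathrm{IWL}_B^{\text{measured}}$
$\mathrm{IWL}_{A \pm B}^{\text{current}} = \mathrm{IWL}_A^{\text{measured}}$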
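The shift amounts for an additive node can be computed as in the sketch below. It assumes that A is the operand with the larger measured IWL and takes n_B as B's current IWL minus B's measured IWL; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-node bookkeeping: the profiled IWL and the IWL that
   results from the scaling operations already applied. */
typedef struct {
    int measured;
    int current;
} iwl_t;

/* Scaling for A +/- B: left shift for A, right shift for B, and the
   current IWL of the result. A negative shr_b means a left shift. */
typedef struct {
    int shl_a;
    int shr_b;
    int result_current;
} add_scaling_t;

static add_scaling_t irp_additive_scaling(iwl_t a, iwl_t b)
{
    assert(a.measured >= b.measured);     /* assumption of this sketch */
    int n_a = a.current - a.measured;     /* redundant integer bits in A */
    int n_b = b.current - b.measured;     /* redundant integer bits in B */
    int n   = a.measured - b.measured;    /* gap between the measured IWLs */
    add_scaling_t s = { n_a, n - n_b, a.measured };
    return s;
}
```

With A at current IWL 2 and measured IWL 1, and B at current IWL 0 and measured IWL -1, the sketch yields A << 1 and B >> 1, so both operands reach IWL 1 before the addition.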
18IRP: Multiplication
$\mathrm{IWL}_{A \cdot B}^{\text{current}} = \mathrm{IWL}_A^{\text{measured}} + \mathrm{IWL}_B^{\text{measured}}$
19IRP Division
IWLA?B ndividend IWLA measured - IWLB
measured 1
20IRP-SA: Using Shift Absorption
Problem:
y = (a*x[i] + b*x[i+1]>>1) << 1
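The precision problem can be demonstrated with plain integers. The sketch below reads the expression as shifting only the second product right before the sum is shifted left, and it assumes that shift absorption amounts to folding the outer left shift into the operands so that it cancels the inner right shift. Both readings are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Without absorption: b_term loses its lowest bit in the alignment shift,
   and the later left shift cannot bring that bit back. */
static int32_t scale_without_absorption(int32_t a_term, int32_t b_term)
{
    assert(a_term >= 0 && b_term >= 0);   /* keeps << well defined here */
    return (a_term + (b_term >> 1)) << 1;
}

/* With absorption: the left shift moves onto a_term and cancels the right
   shift on b_term, so no bit of b_term is discarded. */
static int32_t scale_with_absorption(int32_t a_term, int32_t b_term)
{
    assert(a_term >= 0 && b_term >= 0);
    return (a_term << 1) + b_term;
}
```

For a_term = 3 and b_term = 5 the first form gives 10 and the second gives 11; the two agree whenever the lowest bit of b_term is zero.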
21Presentation Outline
- Background Material / Motivation
- Floating-to-Fixed-Point Conversion
- Architecture Support
- Experimental Results
- Summary / Future Directions
22Architectural Support
Common occurrence (using IRP-SA)
A*B << n
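Why folding the left shift into the multiply helps can be shown by emulating both variants on 16-bit words. The slides do not give the exact semantics of the UTDSP operation, so the sketch assumes that it selects a 16-bit window n bits lower in the 32-bit product; the function names are hypothetical, and the caller is assumed to know from the IWL that the shifted result still fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 1.15 fractional multiply: keeps bits 30..15 of the 32-bit product. */
static int16_t frac_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 15);
}

/* Multiply, then shift left in a separate step: the n low bits of the
   result are zero-filled because the product bits were already discarded. */
static int16_t frac_mul_then_shift(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    return (int16_t)(frac_mul(a, b) * (1 << n));
}

/* Assumed behaviour of a multiply with integrated left shift: take the
   16-bit window n bits lower, so real product bits fill the low end. */
static int16_t frac_mul_left_shift(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> (15 - n));
}
```

With a = 291, b = 1110 and n = 3, the two-step version returns 72 while the integrated version returns 78, which is closer to the exact scaled product of about 78.86.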
23Presentation Outline
- Background Material / Motivation
- Floating-to-Fixed-Point Conversion
- Architecture Support
- Experimental Results
- Summary / Future Directions
24Experimental Results
- Four test-cases presented in paper
- (1) 4th Order IIR Filter
- (2) 1024 Point Radix 2 Decimation in Time FFT
- (3) Nonlinear Feedback Control System
- (4) 16th Order Lattice Filter
- Look at (1) in detail, summarize results for others.
- Explore some interesting properties exhibited in (4) that are indicative of possible future improvements.
25Experimental Results: 4th Order IIR Filter
- 4th Order Chebyshev Type II Low-Pass Filter
- Designed using MATLAB's cheby2 command
- Transfer Function
26Experimental Results: 4th Order IIR Filter (cont'd)
- Filter Realization
- MATLAB's tf2sos command (pole-zero pairing)
- 2 Cascaded Direct-Form IIR filters
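A cascade of second-order sections can be sketched in floating point as below. This is a generic direct-form II section, not the authors' code: the struct layout, the names, and the coefficient sign convention are assumptions, and the coefficient values would come from the filter design step.

```c
#include <assert.h>

/* One second-order section in direct form II (hypothetical layout).
   d0 and d1 hold the two most recent internal values w[n-1] and w[n-2]. */
typedef struct {
    double b0, b1, b2;                    /* feed-forward coefficients */
    double a1, a2;                        /* feedback coefficients (a0 = 1) */
    double d0, d1;                        /* delay elements */
} biquad_t;

/* Process one sample through one section. */
static double biquad_step(biquad_t *s, double x)
{
    double w = x - s->a1 * s->d0 - s->a2 * s->d1;
    double y = s->b0 * w + s->b1 * s->d0 + s->b2 * s->d1;
    s->d1 = s->d0;
    s->d0 = w;
    return y;
}

/* Process one sample through a chain of sections: the output of each
   section feeds the next one. */
static double cascade_step(biquad_t *sections, int count, double x)
{
    assert(sections != 0 && count > 0);
    for (int i = 0; i < count; i++)
        x = biquad_step(&sections[i], x);
    return x;
}
```

A 4th-order filter uses two such sections. In a fixed-point version, every product and sum inside biquad_step is an intermediate result that would be profiled and scaled.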
27Experimental Results: 4th Order IIR Filter (cont'd)
IRP:
(A20*t2 - A21*D20 << 1) + (A22*D21 << 1) << 2
IRP-SA:
(A20*t2 << 3) - (A21*D20 << 3) + (A22*D21 << 3)
28Experimental Results: 1024-Point Radix-2 FFT
29Experimental Results: Rotational Inverted Pendulum
U of T System Control Group Non-linear Testbench
30Experimental Results: Rotational Inverted Pendulum
31Experimental Results: Rotational Inverted Pendulum, 12-bit Controller Comparison
- WC: 32.8 dB
- IRP-SA: 41.1 dB
- IRP-SA w/ fmls: 48.0 dB
32Experimental Results: 16th Order Lattice Filter
[Figure: 16th Order Elliptic Bandpass Filter Transfer Function. Two panels, Magnitude (dB) and Phase (degrees), each plotted against Normalized Frequency (×π rad/sample).]
33Experimental Results: Lattice Filter
34Experimental Results: Lattice Filter
#define N 16
double state[N+1], K[N], V[N+1];

double lattice( double x )
{
    double y = 0.0;
    for( int i=0; i < N; i++ ) {
        x = x - K[N-i-1] * state[N-i-1];
        state[N-i] = state[N-i-1] + K[N-i-1]*x;
        y = y + V[N-i]*state[N-i];
    }
    state[0] = x;
    return y + V[0]*state[0];
}
35Experimental Results: Lattice Filter
- Observation: the wide dynamic ranges of state, V, x, and y are due to name dependencies of array elements and accumulators when assigning integer word lengths.
- Can use Loop Unrolling + Renaming to break dependencies and achieve far better results (iteration-dependent analysis is mentioned in the FRIDGE paper; however, no experimental results are reported).
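The effect of unrolling and renaming can be sketched on a lattice filter of order 2. The reduced order and the coefficient values are chosen only to keep the example short, and all names are hypothetical. In the rolled form every iteration reuses the names x and y, so a single IWL has to cover all of their values; in the unrolled form each intermediate result has its own name and could be given its own IWL.

```c
#include <assert.h>

#define N 2                               /* reduced order, for brevity only */

static const double K[N]     = {0.5, -0.25};        /* hypothetical values */
static const double V[N + 1] = {1.0, 0.5, 0.25};    /* hypothetical values */
static double state_rolled[N + 1];
static double state_unrolled[N + 1];

/* Rolled form: x and y are overwritten on every iteration. */
static double lattice_rolled(double x)
{
    double y = 0.0;
    for (int i = 0; i < N; i++) {
        x = x - K[N - i - 1] * state_rolled[N - i - 1];
        state_rolled[N - i] = state_rolled[N - i - 1] + K[N - i - 1] * x;
        y = y + V[N - i] * state_rolled[N - i];
    }
    state_rolled[0] = x;
    return y + V[0] * state_rolled[0];
}

/* Unrolled and renamed form: x1, x0, s2, s1, y2, y1 are separate values. */
static double lattice_unrolled(double x)
{
    double x1 = x - K[1] * state_unrolled[1];
    double s2 = state_unrolled[1] + K[1] * x1;
    double y2 = V[2] * s2;
    double x0 = x1 - K[0] * state_unrolled[0];
    double s1 = state_unrolled[0] + K[0] * x0;
    double y1 = y2 + V[1] * s1;
    state_unrolled[2] = s2;
    state_unrolled[1] = s1;
    state_unrolled[0] = x0;
    return y1 + V[0] * x0;
}
```

Both functions compute the same output sequence. The difference matters only to the conversion step, which can profile x1 and x0, or y2 and y1, separately in the unrolled version.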
36Presentation Outline
- Background Material / Motivation
- Floating-to-Fixed-Point Conversion
- Architecture Support
- Experimental Results
- Summary / Future Directions
37Summary
- Intermediate result profile data can be used to reduce numerical error of fixed-point code.
- A fractional multiply with integrated left shift operation can improve the results, especially when combined with the IRP-SA algorithm.
- Improvements between 3.0 dB and 12.8 dB have been observed so far.
38Future Directions
- Structural Transformations
- Extended Precision Arithmetic
- Overflows due to accumulated rounding error: use two profiling phases to estimate the effect of second-order interactions.