Title: Floating Point Computing in DSP Systems
1Floating Point Computing in DSP Systems
- By Mehrnaz Monajati
- Instructor Dr. S.M. Fakhrai
- This is a class presentation. All data are the
copyright of their respective authors, as listed in
the references, and have been used here for
educational purposes only.
2Fixed vs. Floating Point DSPs
- Cost
- Ease of use
- Accuracy
- Dynamic range
3Fixed vs. Floating Point DSPs
- Cost
- Today, fixed-point DSPs continue to benefit more
from cost reductions of scale in manufacturing,
since they are more often used for high-volume
applications
- The same reductions will apply to floating-point
DSPs when high-volume demand for the devices
appears.
- Today, cost has increasingly become an issue of
SOC integration and volume, rather than a result
of the size of the DSP core itself.
4Fixed vs. Floating Point DSPs
- In the past
- TI floating-point DSPs supported the C language
- FXP DSPs were programmed at the assembly code
level
- Coding of real arithmetic into hardware
- Directly in FLP
- Indirectly in FXP, through software routines that
added development time and extra instructions to
the algorithm
- Programming
- Easier in FLP
- Today
- TI fixed-point DSPs have long been supported by
outstandingly efficient C compilers
- The advantage of implementing real arithmetic
directly in floating-point hardware still remains
- Reduction in FXP complexity
- FXP DSPs still have an edge in cost and FLP DSPs
in ease of use, but the edge has narrowed
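To illustrate what "indirectly in FXP" means in practice, here is a minimal sketch of a real-valued multiply done with 16-bit integers. It assumes the common Q15 convention (1 sign bit, 15 fraction bits), which the slides do not name; the helper names `to_q15`, `q15_mul` and `from_q15` are hypothetical and not taken from any TI library.

```python
Q15_ONE = 1 << 15  # assumed Q15 scale: the real value 1.0 maps to 32768


def to_q15(x: float) -> int:
    """Quantize a real value to a 16-bit Q15 integer, saturating at the ends."""
    n = int(round(x * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, n))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: wide product, round, rescale, saturate."""
    product = a * b          # Q30 intermediate (fits in 32 bits on hardware)
    product += 1 << 14       # add half an LSB so the shift rounds to nearest
    result = product >> 15   # rescale from Q30 back to Q15
    return max(-Q15_ONE, min(Q15_ONE - 1, result))


def from_q15(n: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return n / Q15_ONE


a, b = 0.5, -0.25
fixed_result = from_q15(q15_mul(to_q15(a), to_q15(b)))
float_result = a * b  # the floating-point path is a single operation
print(fixed_result, float_result)
```

The rounding, rescaling and saturation steps are the "extra instructions" that the fixed-point path needs and the floating-point path does not.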
5Fixed vs. Floating Point DSPs
- Accuracy
- Dynamic range
- Accuracy of FLP is greater than FXP
- FLP has greater precision in integer as well as
real values
- Exponentiation vastly increases the dynamic range
- Internal data representations in FLP DSPs are
more exact than in FXP, ensuring greater accuracy
in the end result
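The effect of the exponent on dynamic range can be put in numbers. The sketch below compares a 16-bit signed integer with a 32-bit floating-point value, using the usual 20*log10(largest/smallest) definition. It assumes the IEEE 754 single-precision format and counts only normalized values; neither assumption is stated on the slide.

```python
import math


def dynamic_range_db(largest: float, smallest: float) -> float:
    """Ratio of largest to smallest non-zero magnitude, in decibels."""
    return 20.0 * math.log10(largest / smallest)


# 16-bit signed fixed point: magnitudes from 1 LSB up to 2^15 - 1.
fixed16_db = dynamic_range_db(2**15 - 1, 1)

# Assumed IEEE 754 single precision, normalized values only.
float32_max = (2 - 2.0**-23) * 2.0**127
float32_min_normal = 2.0**-126
float32_db = dynamic_range_db(float32_max, float32_min_normal)

print(f"16-bit fixed point: {fixed16_db:.1f} dB")
print(f"32-bit floating point: {float32_db:.1f} dB")
```

Under these assumptions the fixed-point figure is about 90 dB and the floating-point figure is above 1500 dB.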
6Fixed vs. Floating Point DSPs
- FXP DSPs
- TI's TMS320C62x FXP DSPs
- Two data paths operating in parallel
- Each with a 16-bit word width
- Provides signed integer values within a range
from -2^15 to 2^15 - 1 (-32768 to 32767)
- TMS320C64x DSPs
- Double the overall throughput with four 16-bit
multipliers
- TMS320C5x and TMS320C2x DSPs
- Designed for handheld and control applications,
respectively
- Based on single 16-bit data paths
7Fixed vs. Floating Point DSPs
- FLP DSPs
- TMS320C67x FLP DSPs
- Divide a 32-bit data path into two parts: a
24-bit mantissa and an 8-bit exponent.
- The 24-bit mantissa gives a 16M range of precision
- The 8-bit exponent supports a vastly greater
dynamic range than is available with the FXP format.
- The C67x DSP can also perform calculations using
industry-standard double-width precision
- 64 bits, including a 53-bit mantissa and an
11-bit exponent
- Achieves much greater precision and dynamic range
at the expense of speed, since it requires
multiple cycles for each operation
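The mantissa/exponent split can be inspected directly. This is a minimal sketch that assumes the IEEE 754 layouts: 1 sign bit, 8 exponent bits and 23 stored fraction bits for 32-bit values (24 bits of precision once the implicit leading 1 is counted), and 1 + 11 + 52 bits for 64-bit values. How the slide's 24/8 split maps onto this layout is an assumption, and the function names are hypothetical.

```python
import struct


def float32_fields(x: float):
    """Return (sign, biased exponent, stored fraction) of x as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def float64_fields(x: float):
    """Return (sign, biased exponent, stored fraction) of x as a 64-bit float."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def round_to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(float32_fields(1.0))    # exponent field holds the bias, 127
print(float32_fields(-0.75))  # -1.5 * 2^-1
print(float64_fields(1.0))    # exponent field holds the bias, 1023

# 24 bits of precision: 2^24 = 16777216 ("16M") is the last point at which
# every integer is still exactly representable in 32-bit floating point.
print(2**24, round_to_float32(2.0**24 + 1))
```

The last line shows 2^24 + 1 rounding back to 2^24 in 32-bit floating point, while the same value is held exactly in the 64-bit format.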
8Standards for FLP Number Formats
9FLP Number Formats
10Sample Floating Point DSPs
- AMD - Athlon Processor
- Xilinx Virtex-5 APU Floating Point Unit
- Digital Core Design DFPAU ver 2.05
11AMD - Athlon Processor 2000
- Includes the most powerful floating point engine
for x86 platforms
- Delivers twice the peak x87 floating point
execution rate of the Intel Pentium III
processor
- Rivals the FP performance of many RISC processors
of that time
- Superscalar and superpipelined
- Higher clock frequencies
- Higher overall throughput
Ref. 3
12AMD - Athlon Processor 2000
Ref. 3
13Xilinx Virtex-5 APU FLP Unit 2009
- Designed for the PowerPC 440 embedded
microprocessor of the Virtex-5 FXT FPGA family
- Supports the IEEE-754 standard in single or double
precision
- Optimized for 2:1 and 3:1 APU:CPU clock ratios
- Allows the PowerPC processor to operate at maximum
frequency
- Applications
- Digital signal processing of high-quality audio
or video signals where a very large dynamic range
is needed to retain fidelity.
- Matrix inversion in wireless communications and
radar
- Digital signal processing tasks, spectral methods
such as FFT
- Statistical processing
- In these tasks, floating point is often the simplest
way to avoid integer overflow and rounding errors
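The integer-overflow point can be shown with a small accumulation. The sketch below simulates a 16-bit two's complement accumulator with no guard bits and no saturation, which is a deliberately simple assumption: many fixed-point DSPs provide wider accumulators or saturating arithmetic to reduce this problem. The helper `wrap_int16` and the sample values are hypothetical.

```python
def wrap_int16(n: int) -> int:
    """Wrap an integer into the 16-bit two's complement range."""
    return ((n + 0x8000) & 0xFFFF) - 0x8000


samples = [20000, 20000, -5000]

# Integer path: the running sum leaves the 16-bit range and wraps around.
acc16 = 0
for s in samples:
    acc16 = wrap_int16(acc16 + s)

# Floating-point path: the same sum stays well inside the available range.
acc_float = 0.0
for s in samples:
    acc_float += float(s)

print(acc16, acc_float)
```

The true sum is 35000, which the floating-point accumulator holds exactly, while the wrapped 16-bit accumulator ends at a negative value.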
14Xilinx Virtex-5 APU FLP Unit 2009
- Increased Processing Capacity
- Hardware floating-point operations complete
faster than the equivalent software emulation
routines
- The floating-point operators within the FPU are
pipelined
- Multiple floating-point calculations can proceed
in parallel
- The FPU is autonomous
- The PowerPC processor internal pipeline can
continue to execute integer instructions while
floating-point operations are handled by the FPU
in parallel
- IEEE 754-1985 / Book-E Standard Compatibility
- The standard represents very small numbers by
allowing significands of the form "0.x" in
addition to the usual "1.x" used by normalized
FLP numbers
- In Book-E, the multiply part of a multiply-add
operation should not round its result before
supplying it to the addition part
- The FPU treats all not-a-number (NaN) values as
quiet NaNs, which do not cause exceptions. When a
floating-point operation results in a NaN because
one of the inputs was a NaN, the input NaN is not
propagated to the output; the default quiet NaN
value is provided instead. This value is
0x7ff8000000000000 in double precision, and
0x7fc00000 in single precision
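The three compatibility points can be checked with ordinary IEEE 754 arithmetic. This is a sketch run on the host, not a model of the Xilinx FPU: it decodes bit patterns with `struct`, and it emulates an unrounded multiply-add with exact rational arithmetic because a hardware fused operation is not assumed to be available. In IEEE 754 single precision, the pattern 0x7f800000 decodes as +infinity, while 0x7fc00000 is the quiet NaN with a zero payload.

```python
import math
import struct
from fractions import Fraction


def float_from_bits32(bits: int) -> float:
    """Decode a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def float_from_bits64(bits: int) -> float:
    """Decode a 64-bit pattern as an IEEE 754 double-precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


# 1) Very small numbers: "1.x" (normalized) versus "0.x" (denormalized).
smallest_normal = float_from_bits64(0x0010000000000000)     # 1.0 * 2^-1022
smallest_subnormal = float_from_bits64(0x0000000000000001)  # 2^-52 * 2^-1022

# 2) Multiply-add: rounding the product first can lose the whole result.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
unfused = a * b + c  # product is rounded to 1.0 before the addition
fused = float(Fraction(a) * Fraction(b) + Fraction(c))  # single final rounding

# 3) Quiet NaN encodings versus infinity.
qnan64 = float_from_bits64(0x7FF8000000000000)
qnan32 = float_from_bits32(0x7FC00000)
inf32 = float_from_bits32(0x7F800000)

print(smallest_normal, smallest_subnormal)
print(unfused, fused)
print(qnan64, qnan32, inf32)
```

In the multiply-add case the exact product is 1 - 2^-60, so the unrounded path returns -2^-60 while the path that rounds the product first returns 0.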
15Xilinx Virtex-5 APU FLP Unit
Ref. 4
16Digital Core Design DFPAU ver. 2.05, 2010
- It is an FLP arithmetic co-processor
- Directly replaces C software functions with
equivalent, very fast hardware operations
- Significantly accelerates system performance
- It doesn't require any programming
- Everything is done automatically during software
compilation by the DFPAU C driver.
- Supports addition, subtraction, multiplication,
division, square root, comparison, absolute value
- The input number format follows IEEE-754
- Each floating point function can be turned on/off
at configuration level
- Provides flexible scalability of the DFPAU
module
- Technology-independent design
17Digital Core Design DFPAU ver. 2.05, 2010
Ref. 5
Ref. 5
18Architectural Modification to Improve FLP Unit in
FPGAs 2008 (Ref. 1)
- Variable length shifters account for over 30% of
an adder and 25% of a multiplier
- Coarse-grained approach
- Embedded shifter
- Fine-grained approach
- 4:1 multiplexer

                        Embedded shifter   4:1 multiplexer
Consumed chip area      1.5                0.48
Saved area              14.6               7.3
Increased clock rate    3.3                11.6
19Low power FLP Unit 2009 (Ref. 2)
- Designed for embedded systems applications that need
low power consumption and fast processing
- Performs basic operations such as addition,
subtraction, multiplication and division
- Idea
- The functional units (adder, shifter, registers)
are shared between different operations
- Advantage: saving silicon area
- Disadvantage: the increase in the number of
cycles required to perform the operation
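The area-versus-cycles trade-off can be illustrated with a sequential shift-and-add multiplier that reuses a single adder and a single shifter on every step. This is a generic textbook scheme chosen for illustration, not the architecture from Ref. 2; the function name and the 24-bit default width are assumptions.

```python
def shift_add_multiply(a: int, b: int, width: int = 24):
    """Multiply two unsigned `width`-bit mantissas with one adder and one shifter.

    Returns the product and the number of cycles spent, one per multiplier bit.
    """
    acc = 0
    cycles = 0
    for _ in range(width):
        if b & 1:
            acc += a   # the one shared adder
        a <<= 1        # the one shared shifter
        b >>= 1
        cycles += 1
    return acc, cycles


# Two 24-bit mantissas with the leading bit set (1.5 and 1.25 as 1.x values).
product, cycles = shift_add_multiply(0xC00000, 0xA00000)
print(hex(product), cycles)
```

A fully parallel multiplier would produce the same product in far fewer cycles at the cost of much more hardware, which is the trade-off the slide describes.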
20Low power FLP Unit - 2009
Ref. 2
21Low power FLP Unit - 2009
Ref. 2
22Reconfigurable FLP Unit 2009 (Ref. 7)
- Non-numerical applications usually have very few
FLP operations
- The FLP unit is idle most of the time
- In idle mode, the floating-point unit still
consumes power and its die area is wasted
- Idea
- A reconfigurable floating-point unit that provides
both integer and floating-point operations
23Reconfigurable FLP Unit
rAMM Array
Ref. 7
24Reconfigurable FLP Unit
Ref. 7
25Reconfigurable FLP Unit
Ref. 7
Ref. 7
26References
- M. Beauchamp, et al., "Architectural
modifications to enhance the floating-point
performance of FPGAs," IEEE Transactions on Very
Large Scale Integration Systems, vol. 16, p. 177,
2008.
- R. Neves, et al., "A Floating Point Unit
Architecture for Low Power Embedded Systems
Applications," XXIV SIM - South Symposium on
Microelectronics, 2009.
- AMD Athlon Floating Point Engine, "AMD Athlon
Processor Floating Point Capability, The Most
Powerful, Architecturally Advanced Floating Point
Engine Ever Delivered in an x86 Microprocessor,"
white paper, 2000.
- Xilinx, Virtex-5 APU Floating-Point Unit
v1.01a, Data Sheet, DS693, 2009.
- DFPAU floating-point pipelined divider, 2010,
<http://www.altera.com>.
- G. Frantz and R. Simar, "Comparing Fixed and
Floating Point DSPs," SPRY061, Texas Instruments,
2004.
- Y. Lee and J. Jou, "Design of A Reconfigurable
Floating-Point Unit," 2009.
27Thanks for your attention
28Embedded shifter block diagram
Ref. 1
29 4:1 Multiplexer
Ref. 1