Decimal Floating-point Multiplication via Carry-Save Addition

1
Decimal Floating-point Multiplication via
Carry-Save Addition
  • Mark Erle
  • Systems & Technology Group
  • International Business Machines

Brian Hickmann & Mike Schulte
Electrical & Computer Engineering
University of Wisconsin at Madison
2
Outline
  • Introduction and motivation
  • Extensions to fixed-point design
  • Implementation highlights
  • Verification and synthesis results
  • Summary

3
Introduction
  • Preponderance of business data in decimal form
  • Inexact mapping between decimal and binary
  • Decimal arithmetic used/required in banking,
    finance, insurance, accounting
  • Increasing support in arithmetic community (IEEE
    P754 in ballot review process)
  • Multiplication a key function
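
The "inexact mapping between decimal and binary" point can be seen in a few lines of software. This is only an illustrative sketch using Python's standard decimal module, not part of the presented hardware design.

```python
from decimal import Decimal

# Binary floating-point cannot represent 0.1, 0.2, or 0.3 exactly,
# so the sum of the first two does not compare equal to the third.
binary_sum = 0.1 + 0.2
binary_equal = (binary_sum == 0.3)

# Decimal arithmetic keeps these short decimal fractions exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_equal = (decimal_sum == Decimal("0.3"))

# Converting the binary double for 0.1 to Decimal exposes the value
# that is really stored, which differs from the decimal literal.
stored_tenth = Decimal(0.1)

print(binary_equal, decimal_equal)
print(stored_tenth)
```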

4
Motivation
  • What's involved in extending fixed-point
    multiplication to support floating-point?
  • What are the similarities and differences
    with BFP multiplication?

6
Intermediate Exponent Calculation
  • Preferred exponent
  • PE = EA + EB - bias
  • Based on location of the decimal point (effective
    shift right)
  • IEIP = PE + p
  • After left shifting the intermediate product
  • IESIP = IEIP - SLA

7
Intermediate Product Shifting
  • Based on leading zero counts of operands
  • SLA may be off by one; need guard digit
  • SLA = min(LZA + LZB, p)
  • Shift right when IEIP < Emin
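
The exponent and shift relations on slides 6 and 7 can be sketched in software. The sketch assumes the relations read PE = EA + EB - bias, IEIP = PE + p, SLA = min(LZA + LZB, p), and IESIP = IEIP - SLA, with biased operand exponents. The function name and the decimal64 example values (p = 16, bias = 398) are illustrative choices, not taken from the slides.

```python
def intermediate_exponent(ea, eb, lza, lzb, p, bias):
    """Return (PE, IEIP, SLA, IESIP) for biased operand exponents ea and eb.

    Hypothetical helper: the names mirror the slide abbreviations.
    """
    pe = ea + eb - bias      # preferred exponent of the product
    ieip = pe + p            # exponent of the 2p-digit intermediate product
    sla = min(lza + lzb, p)  # shift-left amount from the operands' leading zeros
    iesip = ieip - sla       # exponent after left shifting the product
    return pe, ieip, sla, iesip


# Example with decimal64 parameters: two operands with unbiased exponent 0
# and only two significant digits each (14 leading zeros in 16 digits).
print(intermediate_exponent(398, 398, 14, 14, p=16, bias=398))
```

In this example the full shift of p digits is available, so the shifted exponent returns to the preferred exponent.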

8
Sticky Bit Generation
  • Logically, all bits beyond the round digit must
    be ORed after left shifting
  • SC = SIP - p - 2, where 2 is for g and r
  • Generate sticky bit on-the-fly, ORing one digit
    at a time while decrementing SC
  • SC = max(0, p - (LZA + LZB))
  • SIP - p = ((p - LZA) + (p - LZB)) - p
  • Calculate two cycles prior to when needed
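
A minimal software model of the on-the-fly idea follows. It assumes product digits become available least significant first, one per iteration, and that the sticky counter holds how many of those digits lie below the round digit. The function name and argument layout are hypothetical.

```python
def sticky_on_the_fly(digits_lsd_first, sc):
    """OR product digits one at a time while decrementing the sticky counter.

    digits_lsd_first: decimal digits of the product, least significant first.
    sc: assumed count of digits that fall below the round digit.
    """
    sticky = 0
    for digit in digits_lsd_first:
        if sc == 0:
            break                    # remaining digits belong to the result
        sticky |= int(digit != 0)    # any nonzero digit sets the sticky bit
        sc -= 1
    return sticky


# Two low-order zeros are absorbed without setting sticky; a third digit (3) sets it.
print(sticky_on_the_fly([0, 0, 3, 1], 2), sticky_on_the_fly([0, 0, 3, 1], 3))
```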

9
Rounding - Scheme
  • No rounding overflow... simplifies scheme
  • Unique compound adder needed
  • SIP may be in redundant form
  • Require CSIP + 0 and CSIP + 1, named C0 and C1
  • Possible corrective left shift (cls) of one digit
  • SIP = SA + SB or SA + SB - 1
  • Adder p digits wide
  • Concatenate g or g + 1

10
Rounding Scheme Continued
  • Three cases based on MSDs of C0 and C1
  • No leading zeros, no corrective left shift
  • Leading zeros, possible corrective left shift
  • Zero followed by all nines
  • Logically, select one among the following
  • C0, C1
  • C0 shifted left one digit, concatenated with g or with g + 1
  • C1 shifted left one digit, concatenated with g or with g + 1
  • Zero, largest finite number, infinity
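
The selection between the two compound-adder outputs can be illustrated with a simplified model. It covers only the choice between C0 and C1 = C0 + 1 under round-half-even and leaves out the corrective left shift and the special results. The function name and the use of a single round digit plus sticky bit are assumptions made for illustration.

```python
def round_half_even_select(c0, round_digit, sticky):
    """Pick C0 (truncated) or C1 = C0 + 1 from the round digit and sticky bit."""
    c1 = c0 + 1  # hardware would produce both candidates in parallel
    if round_digit > 5 or (round_digit == 5 and sticky):
        return c1                      # discarded part is above one half
    if round_digit == 5 and not sticky:
        return c1 if c0 % 2 else c0    # exact tie: choose the even candidate
    return c0                          # discarded part is below one half


print(round_half_even_select(1234, 5, 0), round_half_even_select(1235, 5, 0))
```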

11
Exception Detection & Handling
  • Invalid operation
  • sNaN (pass significand of sNaN)
  • 0 × ∞ (produce qNaN with significand 0)
  • Overflow (and Inexact)
  • IEIP - SLA > Emax
  • Increase SLA until all LZs removed
  • Underflow (and possibly Inexact)
  • IEIP - SLA < Emin
  • Decrease SLA until 0, then shift right
  • Inexact
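
The overflow and underflow bullets can be read as an adjustment of the shift amount. The sketch below is one possible interpretation, assuming the conditions are IEIP - SLA > Emax and IEIP - SLA < Emin with all exponents on the same scale. The function name, the lz_total parameter (leading zeros available in the intermediate product), and the returned tuple are hypothetical.

```python
def adjust_shift(ieip, sla, lz_total, emax, emin):
    """Return (sla, shift_right, overflow) after exponent range checks."""
    # Overflow side: a larger left shift lowers the exponent, but only
    # while leading zeros remain to be shifted out.
    while ieip - sla > emax and sla < lz_total:
        sla += 1
    # Underflow side: give back left shift first, then shift right.
    while ieip - sla < emin and sla > 0:
        sla -= 1
    shift_right = max(0, emin - (ieip - sla))
    overflow = ieip - sla > emax
    return sla, shift_right, overflow


print(adjust_shift(10, 2, 5, emax=6, emin=0))   # extra left shift avoids overflow
print(adjust_shift(-2, 1, 1, emax=6, emin=0))   # right shift needed (gradual underflow)
```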

13
Implementation Highlights
  • Leverage operands' LZCs
  • SC, SLA, and IESIP
  • Handle NaNs with minimal overhead
  • No dataflow modification
  • Coerce multiplicand or multiplier to 1
  • Support gradual underflow
  • No dataflow modification
  • Simply extend number of iterations
  • Simple, control-based rounding scheme

14
RTL Model and Verification
  • Verilog model for both fixed-point and
    floating-point multiplier designs
  • All rounding modes, NaNs, exceptions
  • Over 500,000 random directed testcases
  • IBM decNumber based
  • IBM Haifa's FPgen (IEEE754R compliance)
  • IBM dectest
  • Validated pre- and post-synthesis

15
Synthesis Results
  • 64-bit (16 digit) operands, DPD encoded
  • LSI Logic's gflxp 0.11um CMOS, 55ps FO4
  • Synopsys Design Compiler
  • Results
  • Fixed-point: 119,653 um^2, 14.72 FO4s
  • Floating-point: 237,607 um^2, 15.45 FO4s
  • Critical path
  • Fixed-point: 4:2 compressor (accumulator)
  • Floating-point: 128-bit barrel shifter
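
For comparison, the reported figures can be turned into ratios and absolute cycle times. This is simple arithmetic on the numbers above, reading "55ps FO4" as an FO4 delay of 55 ps. The variable names are illustrative.

```python
FO4_PS = 55.0  # assumed: one FO4 delay equals 55 ps in this technology

# (area in um^2, cycle time in FO4s) as reported on the slide
designs = {
    "fixed-point": (119_653, 14.72),
    "floating-point": (237_607, 15.45),
}

# Cycle time in picoseconds for each design
cycle_ps = {name: fo4 * FO4_PS for name, (_, fo4) in designs.items()}

# Cost of floating-point support relative to the fixed-point design
area_ratio = designs["floating-point"][0] / designs["fixed-point"][0]
delay_ratio = designs["floating-point"][1] / designs["fixed-point"][1]

print(cycle_ps)
print(round(area_ratio, 2), round(delay_ratio, 2))
```

Under that reading, floating-point support roughly doubles the area while lengthening the cycle by about 5 percent.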

16
Applicability to Parallel Designs
  • IE and IP shift generation
  • Rounding scheme
  • NaN handling
  • Exception detection and handling
  • On-the-fly sticky bit generation... NO

17
Sequential vs. Parallel
  • Sequential
  • Less area
  • Potentially better cycle time
  • Parallel
  • Less latency
  • Higher throughput

18
Summary
  • Extended fixed-point, serial multiplier to
    support floating-point
  • Leveraged operands' LZCs
  • Developed an efficient rounding scheme
  • Verified RTL and gate-level models
  • Presented area and delay numbers for fixed- and
    floating-point designs
  • Discussed applicability to parallel designs

19
And there you have it! Long live the decimal system!
20
  • Backup Slides

21
No Rounding Overflow
  • If SIP has 2p - 1 digits
  • MSD = 0
  • Increment will not cause rounding overflow
  • If SIP has 2p digits
  • Then rounding overflow would require a leading string of p 9s
  • A leading string of p 9s is greater than the maximum product
  • No rounding overflow possible
  • Simplifies rounding scheme
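
The 2p-digit case can be checked numerically. The sketch below assumes p-digit coefficients, so the largest product is (10^p - 1)^2, and confirms that its leading p digits are never all nines. The function name is hypothetical.

```python
def top_digits_all_nines(p):
    """Check whether the largest 2p-digit product starts with p nines."""
    max_product = (10**p - 1) ** 2   # largest product of two p-digit coefficients
    digits = str(max_product)
    assert len(digits) == 2 * p      # the maximum product fills all 2p digits
    return digits[:p] == "9" * p


# The leading p digits are 99...98, so incrementing them cannot carry out.
results = [top_digits_all_nines(p) for p in range(1, 35)]
print(any(results))
```

Since (10^p - 1)^2 = 10^(2p) - 2·10^p + 1, the leading p digits equal 10^p - 2, which is one short of a string of p nines.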

22
Decimal Storage Format