1
Bridged, Three-Path Fused Multiply-Adders
  • A proposal for the improvement of on-chip FMAs

Eric Quinnell, M.S.E.E. under the supervision
of Professor Earl E. Swartzlander, Jr.
2
Qualification Committee
  • Dr. Earl E. Swartzlander, Jr.
  • Dr. Adnan Aziz
  • Dr. Jacob Abraham
  • Dr. Tony Ambler
  • Dr. Jason Arbaugh
  • Mr. Carl Lemonds

3
Outline
  • Introduction
  • Problem Statement
  • Previous Work
  • Proposed Work
  • Expected Results
  • Implementation
  • Conclusion
  • Questions

4
Introduction
  • Proposed project in IEEE-754 (1985)
    double-precision format
  • 52-bit significand/mantissa (fraction)
  • 11-bit exponent
  • 1-bit sign
  • Follows format of previous papers on subject
  • Focus on arithmetic, operand execution
  • Exceptions, specials, denorms, NaN, infinity not
    considered
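
As a small illustration of the field widths listed above, the following Python sketch splits a double into its sign, biased exponent, and fraction. The helper name `unpack_double` is made up for this example; it relies only on Python floats being IEEE-754 binary64, which holds on common platforms.

```python
import struct

def unpack_double(x):
    """Split a Python float (IEEE-754 binary64) into (sign, biased exponent, fraction)."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1-bit sign
    exponent = (bits >> 52) & 0x7FF      # 11-bit biased exponent (bias is 1023)
    fraction = bits & ((1 << 52) - 1)    # 52-bit fraction; the leading 1 is implicit
    return sign, exponent, fraction

print(unpack_double(1.0))    # sign 0, biased exponent 1023, fraction 0
print(unpack_double(-2.5))   # sign 1, biased exponent 1024, fraction 1 << 50
```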

5
Fused Multiply-Add
  • Principal paper and patent under Montoye et al.
    from IBM in 1990
  • Equation found in any polynomial
  • Used in DSP, FFT, graphics, division,
    transcendentals, dot-products, advanced
    mathematics
  • Faster than separate FMUL and FADD instructions
  • Only one rounding stage

D = (A x B) + C
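
The "only one rounding stage" point can be checked numerically. The sketch below models the fused operation with exact rational arithmetic followed by a single rounding, and compares it with a multiply followed by an add. The function names are made up for this example, and it is a software model only, not the hardware datapath. The operands are chosen so that the exact product is 1 - 2^-54, which rounds to 1.0 when the multiply is rounded on its own.

```python
from fractions import Fraction

def fma_single_rounding(a, b, c):
    """Model a fused multiply-add: exact a*b + c, then one rounding to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # Fraction(float) is exact
    return float(exact)  # the only rounding step (round to nearest, ties to even)

def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded before the addition."""
    return a * b + c

a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(fma_single_rounding(a, b, c))  # keeps the -2**-54 term
print(mul_then_add(a, b, c))         # the term is lost when the product rounds to 1.0
```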
6
Fused Multiply-Add
RISC System/6000
Montoye et al., IBM 1990
7
Industrial Use
  • IBM RS/6000
  • IBM PowerPC 603, 604 series
  • HP PA 8000 series
  • MIPS R10000
  • ARM VFP10
  • Intel Itanium

8
Problem Statement
  • All industrial FMAs use RS/6000 architecture as
    base
  • Many FMAs entirely replace FADD, FMUL. This taxes
    the stand-alone instructions. (Bad backwards
    compatibility)
  • FADD = (A x 1.0) + B
  • FMUL = (A x B) + 0.0
  • FMA has weak success in industry
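
The two substitutions above can be sketched in software as follows. This is a minimal model, assuming a helper `fma` (a name chosen for this example) that computes the exact result and rounds once. It ignores signed zeros and other special values, which the proposal also leaves out of scope.

```python
from fractions import Fraction

def fma(a, b, c):
    """Software model of a fused multiply-add: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def fadd_via_fma(a, b):
    """Stand-alone add routed through the FMA: (A x 1.0) + B."""
    return fma(a, 1.0, b)

def fmul_via_fma(a, b):
    """Stand-alone multiply routed through the FMA: (A x B) + 0.0."""
    return fma(a, b, 0.0)

# Both routes give the same value as the native operations, so the cost
# of the substitution is latency, not accuracy.
print(fadd_via_fma(0.1, 0.2) == 0.1 + 0.2)
print(fmul_via_fma(0.1, 0.3) == 0.1 * 0.3)
```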

9
Problem Statement
In order for the FMA unit to have a future in
processing and to continue the benefits of its
use, a new architecture that both reduces latency
and remains compatible with old applications must
be designed.
10
Previous Work: PowerPC 603e
11
Previous Work: HAL SPARC64
  • Pseudo-FMA forwards finished multiplies
    directly to the FPA
  • FMA data is rounded twice, hence "pseudo"

12
Previous Work: Lang/Bruguera
  • Combine addition/rounding stage
  • Critical path is through LZA. Data waits at
    161-bit normalizer for shifting instructions
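
To show what the normalizer needs from the LZA, the sketch below computes a normalization shift with a plain leading-zero count. This is not an LZA: an LZA predicts the count in parallel with the addition, while this code counts after the sum is known. The 161-bit width comes from the slide; the function names are made up for this example.

```python
WIDTH = 161  # normalizer width quoted on the slide

def leading_zeros(value, width=WIDTH):
    """Count leading zeros of a non-negative integer held in `width` bits."""
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the given width")
    return width - value.bit_length()

def normalize(value, width=WIDTH):
    """Shift left until the leading one is in the top bit; return (shifted, shift)."""
    if value == 0:
        return 0, 0  # simplification: zero has no leading one to align
    shift = leading_zeros(value, width)
    return value << shift, shift

# A sum with massive cancellation leaves only a low-order bit set,
# so the normalizer has to shift by almost the full width.
shifted, shift = normalize(1)
print(shift)
```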

13
Previous Work: Seidel Multi-Path
  • 5 cases and paths
  • Speculatively compute in parallel.
  • Select path at the end based on correct exponent
    difference
  • Stemmed from the dual-path FPA

14
Previous Work: Seidel Multi-Path
15
Previous Work: Xiao
  • 3-input LZA equations to speed up critical path
    of Lang/Bruguera

16
Proposed Work: Three-Path FMA
  • Variation on Seidel multiple-path suggestion by
    reducing 5 cases to 3 paths
  • Uses a Lang/Bruguera improvement to combine
    addition/rounding with a Xiao 3-input LZA for
    near path
  • Architecture designed for reduced latency
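
Path selection by exponent difference can be sketched as below. The slides do not give the selection thresholds or the names of the two far paths, so `NEAR_LIMIT` and the returned labels are hypothetical placeholders, and the product exponent is only estimated as the sum of the operand exponents. In hardware the paths would compute speculatively in parallel; this function models only the final selection.

```python
import math

NEAR_LIMIT = 1  # hypothetical threshold, not taken from the slides

def exponent(x):
    """Unbiased exponent e with 2**e <= |x| < 2**(e + 1), for finite nonzero x."""
    return math.frexp(x)[1] - 1

def select_path(a, b, c):
    """Pick one of three paths for (a x b) + c from the exponent difference."""
    # Estimated product exponent; the true one may be larger by one.
    d = exponent(c) - (exponent(a) + exponent(b))
    if abs(d) <= NEAR_LIMIT:
        return "near"                  # operands close: cancellation is possible
    if d > 0:
        return "far_addend_larger"     # hypothetical label
    return "far_product_larger"        # hypothetical label

print(select_path(1.0, 1.0, -1.0))
print(select_path(1.0, 1.0, 2.0**40))
print(select_path(2.0**40, 1.0, 1.0))
```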

18
Proposed Work: Bridged FMA
  • Variation of SPARC pseudo-FMA.
  • Keep full multiplier and adder in FPU.
  • Bridge the two by re-using resources when FMA
    instruction is called
  • Architecture designed for backwards compatibility
    with legacy code

20
Proposed Work: Bridged, 3-Path FMA
  • Combination of three-path FMA and bridged FMA
  • Three-path for reduced latency
  • Bridge for re-use of components, backwards
    compatibility
  • Only needs to share multiplier

22
Expected Results
  • 3-path FMA architecture expected to be the
    fastest, lowest-latency FMA to date
  • Bridged FMA expected to perform execution without
    adding latency to multiplier/adder
  • Bridged, 3-path FMA expected to provide both
    reduced latency and hardware reuse, providing a
    full execution hardware set option for future FPUs.

23
Implementation
  • All proposed hardware, as well as an RS/6000,
    will be implemented for latency, area, and power
    comparison
  • Implementation will be done using AMD 45nm and
    65nm HSPICE models, timing tools, floorplanning,
    power estimation, routing, and parasitic extraction
  • Tool licensing agreement in writing. Tool use
    will be considered a PR donation from AMD for
    support of UT dissertation research.

24
Implementation Schedule
25
Conclusion
  • FMAs are a rising power in industrial-level
    computing, with several chips already putting
    them to use
  • No reduced-latency improvements to the RS/6000
    architecture have been adopted in the 16 years
    since its introduction
  • Single add or multiply instructions are
    currently latency-taxed in FMAs
  • Proposed three-path and bridged architectures
    decrease FMA latency significantly and remain
    single add/multiply compatible
  • Proposed architectures to be implemented on AMD
    45nm/65nm technology to prove theorized gains

26
Break and Questions
27
Courses: Graduate

28
Courses: Upper Division (BSEE)