IA-64 Microarchitecture --- Itanium Processor

Slides: 22
Provided by: junf

Transcript and Presenter's Notes

1
IA-64 Microarchitecture --- Itanium Processor
  • Jun Feng
  • Jun Xie
  • Huafeng Lü

2
Outline
  • Introduction
  • Pipeline Issue
  • Performance Comparison
  • Summary

3
Itanium Processor
  • First implementation of IA-64
  • Compiler-based exploitation of ILP
  • Also has many features of a superscalar processor

4
(No Transcript)
5
(No Transcript)
6
10-stage Pipeline
  • Front-end
  • Instruction delivery
  • Operand delivery
  • Execution

7
(No Transcript)
8
Front-end
  • IPG, Fetch, Rotate
  • Prefetches up to 32 bytes per cycle (2 bundles)
    into a prefetch buffer (holds up to 8 bundles)
  • Branch prediction is done using a multilevel
    adaptive predictor
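
To illustrate what a multilevel (two-level) adaptive predictor does in general, here is a minimal sketch. It is a generic textbook scheme, not the Itanium's actual predictor: the class name, the single global history register, the 4-bit history length, and the 2-bit counters are all assumptions chosen for illustration.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive branch predictor (illustrative only).

    Level 1: a global history register holding the last N branch outcomes.
    Level 2: a table of 2-bit saturating counters indexed by that history.
    """

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        self.history = 0  # last N outcomes packed into an integer
        # Counters start at 1 (weakly not-taken); values 2 and 3 mean "taken".
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        """Return True if the next branch is predicted taken."""
        return self.counters[self.history] >= 2

    def update(self, taken):
        """Train the counter for the current history, then shift in the outcome."""
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(outcomes, history_bits=4):
    """Fraction of outcomes predicted correctly when trained online."""
    predictor = TwoLevelPredictor(history_bits)
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)
```

On a repeating taken, taken, not-taken pattern, each history value comes to select its own counter, so after a short warm-up the sketch predicts the pattern almost perfectly. A single 2-bit counter without history cannot do this.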

9
(No Transcript)
10
Instruction delivery
  • EXP and REN
  • Distributes up to 6 instructions to the 9
    functional units
  • Implements register renaming for both register
    rotation and register stacking
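
The renaming used for register rotation can be sketched as a modular offset added to the logical register number. The sketch below is a simplified model under stated assumptions: it covers only the general registers, assumes the rotating region starts at r32, ignores register stacking entirely, and uses hypothetical function names (`rename`, `rotate`).

```python
ROT_BASE = 32  # assumed start of the rotating general-register region (r32)


def rename(logical, rrb, rot_size):
    """Map a logical register number to a physical one under rotation.

    rrb is the rotating register base; rot_size is the size of the
    rotating region. Registers outside the region are left unchanged.
    """
    if not (ROT_BASE <= logical < ROT_BASE + rot_size):
        return logical
    return ROT_BASE + (logical - ROT_BASE + rrb) % rot_size


def rotate(rrb, rot_size):
    """Advance one loop iteration: decrement the base modulo the region size."""
    return (rrb - 1) % rot_size


# A value written to r32 in one iteration is visible as r33 in the next.
rrb = 0
written_to = rename(32, rrb, rot_size=8)
rrb = rotate(rrb, rot_size=8)
read_from = rename(33, rrb, rot_size=8)
```

Because only the base changes, no data moves between registers. This is what lets software-pipelined loops pass values between iterations without explicit copy instructions.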

11
(No Transcript)
12
Operand delivery
  • WLD and REG
  • Accesses the register file
  • Performs register bypassing
  • Accesses and updates a register scoreboard
  • Checks predicate dependences

13
(No Transcript)
14
Execution
  • EXE, DET and WRB
  • Executes instructions through ALUs and load/store
    units
  • Detects exceptions and posts NaTs
  • Retires instructions and performs write-back

15
(No Transcript)
16
(No Transcript)
17
Integer Performance: SPECint benchmark, considerably slower
  • Itanium is considerably slower than Alpha 21264
    and Pentium 4.
  • Only 60% of P4, 68% of Alpha
  • Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3 cache
  • Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache
  • Pentium 4: Compaq Precision 330, 2 GHz, 256 KB
    on-chip L2 cache

18
Floating-Point Performance: SPECfp benchmarks, a different story
  • Itanium is quicker than Alpha 21264 and Pentium
    4.
  • 108% of P4, 120% of Alpha
  • Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3
    cache
  • Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache
  • Pentium 4: Compaq Precision 330, 2 GHz, on-chip L2
    cache
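
Since the three machines run at very different clock rates, it can help to normalize the relative scores by frequency. The sketch below reads the slide figures as percentages (60% and 68% for SPECint, 108% and 120% for SPECfp) and uses the listed clock speeds. The helper name `per_clock_ratio` is hypothetical, and score per MHz is only a crude measure that ignores memory systems and compilers.

```python
def per_clock_ratio(relative_score, own_mhz, other_mhz):
    """Convert a relative score (own / other) into a per-MHz ratio."""
    return relative_score * other_mhz / own_mhz


# Clock rates as listed on the slides.
ITANIUM_MHZ, P4_MHZ, ALPHA_MHZ = 800, 2000, 1000

ratios = {
    "SPECint vs Pentium 4": per_clock_ratio(0.60, ITANIUM_MHZ, P4_MHZ),
    "SPECint vs Alpha 21264": per_clock_ratio(0.68, ITANIUM_MHZ, ALPHA_MHZ),
    "SPECfp vs Pentium 4": per_clock_ratio(1.08, ITANIUM_MHZ, P4_MHZ),
    "SPECfp vs Alpha 21264": per_clock_ratio(1.20, ITANIUM_MHZ, ALPHA_MHZ),
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x per MHz")
```

Under these assumptions, Itanium does roughly 1.5 times the integer work per clock of the Pentium 4 but only about 0.85 times that of the Alpha. For floating point, it does about 2.7 times the work per clock of the Pentium 4 and 1.5 times that of the Alpha.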

19
Discussion on SPECfp
  • Floating-point applications are competitive, thanks to
    higher degrees of ILP and an aggressive memory system
  • Art benchmark: 4 times the performance of Pentium 4
  • Alpha outperforms when tuned
  • In terms of power: worse than P4
  • Only 56% of P4's floating-point performance per watt

20
Summary By Us
  • Good floating point performance
  • Poor integer performance
  • Overall: not as good as Intel has advertised

21
Conclusion
  • Large code size
  • Only static instruction-level parallelism
  • Cannot manage cache misses/hits flexibly
  • Lack of applications