Title: Comparing IA-64 and HPL-PD
1 Comparing IA-64 and HPL-PD
2 Overview
- IA-64 has a number of novel features for supporting ILP
  - Predication
  - Data Speculation
  - Control Speculation
  - Software Pipelining
  - Compiler-directed Caching
- These features all exist in HPL-PD!
  - There is also great similarity in the ISA (arithmetic, logic operations, etc.).
- IA-64 adds a few extensions
  - Multimedia Instructions
  - Semaphore Instructions
3 Predication Support
- IA-64 and HPL-PD (Trimaran) both support conditional execution of instructions through predicate registers and instructions to manipulate them.
- Both support parallel compare operations
  - i.e., assigning to two predicate registers simultaneously
    - through a modifier in HPL-PD
    - through a completer in IA-64
  - wired-and, wired-or
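The two-target compare and predicated execution described above can be sketched in Python. This models the semantics only, not either ISA's encoding; the helper names `cmp_lt_two_targets`, `predicated`, and `abs_diff` are hypothetical.

```python
def cmp_lt_two_targets(a, b):
    """Model a compare that writes a predicate and its complement at once."""
    p_true = a < b
    return p_true, not p_true


def predicated(pred, op, old_value):
    """Execute op only when pred is set; otherwise keep the old value."""
    return op() if pred else old_value


def abs_diff(a, b):
    """Branch-free |a - b| via if-conversion: both arms issue, one commits."""
    p1, p2 = cmp_lt_two_targets(a, b)
    r = 0
    r = predicated(p1, lambda: b - a, r)   # (p1) r = b - a
    r = predicated(p2, lambda: a - b, r)   # (p2) r = a - b
    return r
```

Because the two predicates are complementary, exactly one of the guarded operations takes effect, so no branch is needed.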
4 Control Speculation
- Control speculation is supported in both IA-64 and HPL-PD with the same semantics
- IA-64
  - Each GPR includes a 1-bit speculation tag (the NaT bit)
  - FPRs use a special encoding called NaTVal
    - No extra bit needed
  - Only the load instruction has a control-speculative version (ld.s)
  - A verification instruction (chk.s) is needed for exception handling
- HPL-PD
  - Both GPRs and FPRs have a speculation tag
    - An extra bit like the NaT bit in IA-64
  - All integer and floating-point instructions have control-speculative versions
  - Exceptions are tracked automatically by the hardware
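A minimal Python sketch of the speculation-tag idea: a speculative load that would fault sets a tag instead of trapping, consumers propagate the tag, and a later check decides whether recovery code must run. The `Reg` type and the function names are hypothetical and do not correspond to real instructions in either architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False  # speculation tag: True means a deferred exception


def load_speculative(memory, addr):
    """Speculative load: a faulting access sets the tag instead of trapping."""
    if addr not in memory:
        return Reg(nat=True)
    return Reg(memory[addr])


def add(a, b):
    """Consumers propagate the tag rather than raising."""
    if a.nat or b.nat:
        return Reg(nat=True)
    return Reg(a.value + b.value)


def check(reg, recovery):
    """Verification point: run the recovery code only if the tag is set."""
    return recovery() if reg.nat else reg


memory = {0x10: 41}
one = Reg(1)

# Load hoisted above the branch that guards it; address 0x10 is valid.
ok = check(add(load_speculative(memory, 0x10), one), lambda: Reg(-1))

# Same sequence with a bad address: the fault is deferred until the check.
bad = add(load_speculative(memory, 0x99), one)
```

In this sketch the exception only becomes visible at `check`, which is where the original, non-speculative position of the load would have been.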
5 Control Speculation: IA-64 Example
6 Control Speculation: HPL-PD Example
7 Data Speculation
- Data speculation is supported in both IA-64 and HPL-PD in a similar manner.
  - i.e., moving a load above a store that may write to the same address.
- IA-64
  - Supports load checking (ld.c) as well as checking with recovery
  - The compiler can move up not only the definition but also one or more of its uses (chk.a)
- HPL-PD
  - Also supports recovery in load checking (BRDV)
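The load-above-store idea can be sketched with a small table that remembers which address each early load read, loosely in the spirit of a hardware address table such as IA-64's ALAT. The class `SpeculativeLoadTable` and the helper `run` are hypothetical names chosen for illustration.

```python
class SpeculativeLoadTable:
    """Tracks addresses of early loads, keyed by destination register."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, memory, reg, addr):
        """Load early and remember which address the register came from."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """A store drops every entry that aliases the written address."""
        memory[addr] = value
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check_load(self, memory, reg, addr, current):
        """Keep the early value if the entry survived; otherwise reload."""
        if self.entries.get(reg) == addr:
            return current
        return memory[addr]


def run(store_addr):
    memory = {100: 7, 200: 0}
    table = SpeculativeLoadTable()
    r1 = table.advanced_load(memory, "r1", 100)  # load moved above the store
    table.store(memory, store_addr, 99)          # may or may not alias
    return table.check_load(memory, "r1", 100, r1)
```

Calling `run(200)` keeps the early value because the store does not alias, while `run(100)` reloads. A check with recovery would branch to compiler-generated fix-up code instead of simply reloading, so that uses moved above the store can be re-executed too.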
8 Data Speculation: Examples
IA-64
HPL-PD
9 Data Speculation: Recovery Examples (IA-64)
10 Data Speculation: Recovery Examples (HPL-PD)
11 Compiler Directed Cache
- The memory hierarchy is visible to the compiler in both HPL-PD and IA-64
- IA-64
  - The compiler can supply hints in store, load, and prefetch instructions on where in the cache hierarchy the data will be found or left.
  - For prefetching, the lfetch instruction requests that cache lines be moved between different levels of the memory hierarchy.
  - lfetch maintains cache coherence
- HPL-PD
  - The compiler can also supply hints in store and load instructions
  - Prefetching is simply a load to R0
12 Compiler Directed Cache: IA-64
13 Compiler Directed Cache: HPL-PD
14 Support for Software Pipelining
- Both IA-64 and Trimaran implement rotating registers, loop counters, and epilogue counters in combination with predication.
  - Used to implement modulo scheduling of loops.
15 Software Pipelining Example: HPL-PD
Example of software pipelining in Trimaran
A slice executed as a single VLIW instruction.
Taken from the Trimaran Tutorial
16 Software Pipelining: IA-64
Software pipelining on the IA-64
loop:
  (p16) ld1 r32 = [r12], 1
  (p17) add r34 = 1, r33
  (p18) st1 [r13] = r35, 1
        br.ctop loop
C source: for (i = 0; i < n; i++) y[i] = x[i] + 1;
Taken from the Intel web tutorial
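To see how rotating registers, stage predicates, the loop counter, and the epilogue counter cooperate, the loop can be simulated in Python. This is a simplified sketch under stated assumptions: only four rotating registers are modeled, the stage predicates are taken to be p16 through p18 (rotating predicates on IA-64 start at p16), and the branch handling is an approximation of br.ctop. The function name `pipelined_copy_add` is hypothetical.

```python
from collections import deque


def pipelined_copy_add(x):
    """Simulate y[i] = x[i] + 1 as a 3-stage modulo-scheduled loop."""
    n = len(x)
    y = [None] * n
    if n == 0:
        return y
    regs = deque([0] * 4)                # r32..r35 (rotating)
    preds = deque([True, False, False])  # p16..p18 (rotating stage predicates)
    lc, ec = n - 1, 3                    # loop count, epilogue count (stages)
    src = dst = 0                        # stand-ins for r12 and r13

    while True:
        # All three operations belong to one kernel iteration; reads use the
        # register values from before this iteration's writes.
        r33, r35 = regs[1], regs[3]
        if preds[0]:                     # (p16) ld1 r32 = [r12], 1
            regs[0] = x[src]
            src += 1
        if preds[1]:                     # (p17) add r34 = 1, r33
            regs[2] = r33 + 1
        if preds[2]:                     # (p18) st1 [r13] = r35, 1
            y[dst] = r35
            dst += 1

        # br.ctop: pick the next stage-1 predicate, rotate, decide to loop.
        if lc > 0:
            lc, next_p16, taken = lc - 1, True, True
        else:
            ec, next_p16, taken = ec - 1, False, ec > 1
        regs.rotate(1)                   # value in r32 is now visible as r33
        preds.rotate(1)                  # p16 -> p17 -> p18
        preds[0] = next_p16
        if not taken:
            return y
```

The prologue and epilogue fall out of the predicates alone: early iterations run only the load stage, and once the loop counter reaches zero the epilogue counter drains the remaining add and store stages.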
17 Differences
- Multimedia Instruction
- Semaphore Instruction
- Register Stack Engine
18 Register Stack Engine
- IA-64 implements a mechanism called the register stack engine (RSE) that manages the dynamic allocation of stack frames using registers gpr32-gpr127.
- The operations of the RSE are transparent to the software.
- It spills registers to memory and restores them as needed, so the contents of registers are always available.
19 Multimedia Instruction
- IA-64 has multimedia instructions that treat a GPR as a concatenation of eight 8-bit, four 16-bit, or two 32-bit elements and operate on each element independently and in parallel.
  - Inspired by MMX
- The instructions include
  - parallel addition and subtraction
  - parallel average
  - parallel shift left and add
  - parallel compare
  - parallel multiply right
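As an illustration of treating one 64-bit register as eight independent 8-bit elements, here is a small Python sketch of a wraparound parallel add. The helpers `pack`, `unpack`, and `parallel_add8` are hypothetical; saturating variants are left out.

```python
LANE_BITS = 8
LANES = 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes):
    """Concatenate eight 8-bit values into one 64-bit word (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def parallel_add8(a, b):
    """Add lane by lane with wraparound; no carry crosses a lane boundary."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])
```

The key property is that an overflow in one lane (255 + 1) wraps to 0 inside that lane and never carries into its neighbor, unlike an ordinary 64-bit add.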
20 Semaphore Instruction
- IA-64 has semaphore instructions that
  - atomically load a general register from memory,
  - perform an operation, and
  - then store a result to the same memory location.
- The instructions include
  - exchange
  - compare and exchange
  - fetch and add
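The load, operate, store pattern of these three operations can be sketched in Python. A `threading.Lock` stands in for the atomicity that the hardware provides; the `Word` class and its method names are hypothetical and do not reflect IA-64 mnemonics or operand rules.

```python
import threading


class Word:
    """One memory word; the lock stands in for hardware atomicity."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def exchange(self, new):
        """Store new and return the previous contents."""
        with self._lock:
            old, self.value = self.value, new
            return old

    def compare_and_exchange(self, expected, new):
        """Store new only if the word equals expected; return the old value."""
        with self._lock:
            old = self.value
            if old == expected:
                self.value = new
            return old

    def fetch_and_add(self, increment):
        """Add increment to the word and return the value it held before."""
        with self._lock:
            old = self.value
            self.value = old + increment
            return old


def bump(word, times):
    """Worker body: increment the shared word a fixed number of times."""
    for _ in range(times):
        word.fetch_and_add(1)


counter = Word(0)
workers = [threading.Thread(target=bump, args=(counter, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Each operation returns the value that was in memory before the update, which is what lets callers build locks and counters on top of it.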