Title: IA64 Overview
1 IA-64 Overview
2 Architectural History
- Cydrome/Multiflow: late 1980s
- HP Labs, 1991 to 1994. Instruction set spec 11/93
  - 3-instruction bundles, templates
  - 128 gr and fr registers, rotating and stacking
  - 8 branch registers, 64 predicate registers
- HP-Intel joint effort starts June 1994
  - Many architectural changes
  - Completed early 1996
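3 Machine Resources
[Figure: register files and their sizes]
- General registers gr0..gr127: 64 bits plus a NaT bit each; gr0 reads as 0
- Floating-point registers fr0..fr127: 82 bits; fr0 = 0.0, fr1 = 1.0
- Predicate registers p0..p63: 1 bit each; p0 is always true
- Branch registers b0..b7: 64 bits
- Application registers ar0..ar127: 64 bits; they include KR0..KR7, RSC, BSP, BSPSTORE, RNAT, IA-32 registers, CCV, UNAT, FPSR, ITC, PFS, LC, EC
- Instruction pointer IP: 64 bits
- User mask: 6 bits
- Performance monitors PMD0, PMD1: 64 bits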
4 Application Registers
- KR0-7: kernel registers, writable only at privilege level 0
- RSC/BSP/BSPSTORE/RNAT: register stack engine (RSE) state
- IA-32 control registers (7)
- CCV: compare value for compare and exchange
- UNAT: NaT bit collection for ld8.fill / st8.spill
- FPSR: floating-point status register
- ITC: interval time counter
- PFS: previous frame marker, epilog count, privilege level
- LC, EC: loop count and epilog count
5 Instruction Bundling
- 128-bit aligned instruction bundles contain:
  - Three 41-bit instructions
  - A 4-bit dispersal template
  - 1 bit for cross-bundle parallelism
- Branches are to bundle boundaries
- Implementations are allowed to have any number of functional units, so there is no guarantee of parallel execution
- The template controls dispersal to functional units: Memory, Integer, Floating point, Branch, long immediate
- Little-endian bit/byte/slot numbering
[Figure: bundle layout with fields s, slot 2, slot 1, slot 0, tmplt]
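The field widths above (3 x 41 + 4 + 1 = 128 bits) can be sketched as a small encoder/decoder. This is a minimal sketch, not taken from the slides: it assumes the template and stop bit occupy the low 5 bits, with the stop bit lowest, and that slots 0, 1, 2 follow in ascending bit order. The names `encode_bundle` and `decode_bundle` are hypothetical.

```python
SLOT_BITS = 41
FIELD_BITS = 5  # 4-bit dispersal template plus 1 stop bit (assumed low bits)
SLOT_MASK = (1 << SLOT_BITS) - 1


def encode_bundle(template, stop, slots):
    """Pack a template, stop bit, and three 41-bit slots into a 128-bit int."""
    if not 0 <= template < 16:
        raise ValueError("template must fit in 4 bits")
    if stop not in (0, 1):
        raise ValueError("stop must be 0 or 1")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    word = (template << 1) | stop
    for i, insn in enumerate(slots):
        if not 0 <= insn <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        word |= insn << (FIELD_BITS + i * SLOT_BITS)
    return word


def decode_bundle(word):
    """Split a 128-bit bundle back into its fields."""
    if not 0 <= word < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    field = word & ((1 << FIELD_BITS) - 1)
    slots = [(word >> (FIELD_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return {"template": field >> 1, "stop": field & 1, "slots": slots}
```

A round trip such as `decode_bundle(encode_bundle(4, 1, [1, 2, 3]))` returns the same three fields, which is a quick check that the widths add up to exactly 128 bits.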
6 Templates and Dispersal
Templates (slots 0 1 2; left to right execution; "/" = optional stop):
  M L X
  M I I
  M I / I
  M M I
  M / M I
  M F I
  M M F
  M F B
  M I B
  M B B
  B B B
[Figure: two bundles, each with slots 0, 1, 2, dispersing to functional units M, M, I, I, I, I, F, F]
Dispersal of instructions to functional units is
very simple. This example shows a two-bundle
CPU with two M units, four I units, and two
F units. Each functional unit takes instructions
from one or two slots only.
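The dispersal idea can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the rule set of any real implementation: each slot goes, in order, to the next free unit of the type named by its template letter, and dispersal stops at the first slot with no free unit. The function name `disperse` and the unit labels (`M0`, `I0`, ...) are hypothetical; the default unit counts match the example CPU above.

```python
from collections import Counter


def disperse(templates, units=None):
    """Assign slots of consecutive bundles to functional units, in order.

    templates: list of 3-letter strings such as "MII" or "MFI".
    Returns a list of (bundle_index, slot_index, unit_label) tuples for the
    slots that could be issued before a unit type ran out.
    """
    if units is None:
        units = {"M": 2, "I": 4, "F": 2}  # the example CPU from the slide
    free = dict(units)
    used = Counter()
    issued = []
    for b, template in enumerate(templates):
        for s, kind in enumerate(template):
            if free.get(kind, 0) == 0:
                return issued  # in-order issue: stop at the first blocked slot
            free[kind] -= 1
            issued.append((b, s, f"{kind}{used[kind]}"))
            used[kind] += 1
    return issued
```

With two bundles `["MII", "MFI"]` all six slots find a unit, while `["MMI", "MMI"]` stops after the first bundle because both M units are already taken.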
7 Assembly Language Format
(pred) opcode.completer target=source,source
- pred specifies the controlling predicate, p0..p63
- (pred) is optional and defaults to p0
- completer is zero or more modifiers to the basic operation
- target is the target register: r0..r127, fr0..fr127, b0..b7, etc.
- source is one or more source registers or immediate values
- ;; forces a stop
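The format can be made concrete with a small line splitter. This is an illustrative sketch only, assuming the `target=source,source` form with an optional trailing `;;` stop; it is not an assembler and does not validate register names or opcodes. The name `parse_line` and the returned dictionary keys are hypothetical.

```python
import re

LINE_RE = re.compile(
    r"^\s*(?:\((?P<pred>p\d+)\)\s*)?"   # optional (pred)
    r"(?P<op>[A-Za-z][\w.]*)"           # opcode with dotted completers
    r"(?:\s+(?P<operands>[^;]*?))?"     # optional operand text
    r"\s*(?P<stop>;;)?\s*$"             # optional stop
)


def _split(text):
    """Split a comma-separated operand list, dropping empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_line(line):
    """Split one assembly line into predicate, opcode, completers, operands."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"cannot parse: {line!r}")
    opcode, *completers = m.group("op").split(".")
    operands = (m.group("operands") or "").strip()
    if "=" in operands:
        target_text, source_text = operands.split("=", 1)
    else:
        target_text, source_text = "", operands  # no target, sources only
    return {
        "pred": m.group("pred") or "p0",  # (pred) defaults to p0
        "opcode": opcode,
        "completers": completers,
        "targets": _split(target_text),
        "sources": _split(source_text),
        "stop": m.group("stop") is not None,
    }
```

For example, `parse_line("(p1) add r1=r2,r3 ;;")` yields predicate `p1`, opcode `add`, target `r1`, sources `r2` and `r3`, and `stop` set to true.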
8 Memory Unit Only Instructions
- Integer (1/2/4/8 byte) and FP (s/d/e/pair) load
- Integer (1/2/4/8 byte) and FP (s/d/e) store
- Base update, reg or immediate, load or store
- Advanced(.a)/Speculative(.s)/both(.sa)
- .fill and .spill (NaT bit or 82-bit FP)
- Ordered load (.acq), store (.rel)
- Check load (.c.nc, .c.clr, .c.clr.acq)
- Speculation check (chk.a, chk.s for FR)
- Cache hints (.nt1, .nta, .bias)
9 Memory Unit Only Instructions
- Compare exchange, Fetch and add, Exchange
- Fence, Sync, Serialize instruction stream
- Flush cache, flush register stack, purge TLB
- Line Prefetch (.fault, .nt1, .nta, .excl)
- getf/setf: move FR to GR / GR to FR
- Move to/from AR, PSR, CPUID
- Alloc
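The read-modify-write operations in this list (compare exchange, fetch and add, exchange) can be sketched as follows. This is a minimal model under stated assumptions: memory is a plain dictionary of 8-byte values, atomicity and ordering completers are not modeled, and the function names are hypothetical. The `ccv` argument stands in for the CCV application register.

```python
MASK64 = (1 << 64) - 1


def cmpxchg8(mem, addr, new_value, ccv):
    """Store new_value only if memory equals ccv; always return the old value."""
    old = mem[addr]
    if old == ccv:
        mem[addr] = new_value & MASK64
    return old


def fetchadd8(mem, addr, inc):
    """Add inc to memory (wrapping at 64 bits) and return the old value."""
    old = mem[addr]
    mem[addr] = (old + inc) & MASK64
    return old


def xchg8(mem, addr, new_value):
    """Swap new_value into memory and return the old value."""
    old = mem[addr]
    mem[addr] = new_value & MASK64
    return old
```

In each case the caller learns the previous memory contents, so a failed compare exchange can be detected by comparing the returned value with the expected one.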
10 Memory or I unit Instructions
- Move GR to GR
- Add, Sub (immediate or 1)
- Add long (22-bit imm), shladd (shift by 1-4)
- And, Or, Xor, Andcm (reg or immediate)
- Add pointer, shift and add pointer
- Speculation check (chk.s for GR)
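Two of the less familiar operations in this list, shladd and andcm, can be written out as plain arithmetic. This is a sketch of the value computation only, assuming 64-bit wraparound; NaT propagation and predication are not modeled, and the Python function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1


def shladd(a, count, b):
    """Shift a left by count (1 to 4) and add b, wrapping at 64 bits."""
    if not 1 <= count <= 4:
        raise ValueError("shift count must be 1..4")
    return ((a << count) + b) & MASK64


def andcm(a, b):
    """AND a with the complement of b."""
    return a & ~b & MASK64
```

A typical use of shladd is array indexing: `shladd(index, 3, base)` computes `base + index * 8` in one operation.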
11 Memory or I unit Instructions
- Compare (64 or 32 bit) to two predicates
- 3 conditions (.eq, .lt, .ltu) plus 7 pseudo-ops
- 5 types (none, .unc, .or, .and, .or.andcm)
- Parallel integer arithmetic (1, 2, 4 byte)
- add, sub, signed/unsigned, optional saturating
- average, compare (.eq, .gt to target)
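The five compare types differ only in how the two target predicates are updated. The following sketch models that update rule as I understand it; it ignores NaT handling, and the function name and argument order are hypothetical. Here `qp` is the qualifying predicate, `result` is the outcome of the comparison, and `p1`, `p2` are the current values of the two target predicates.

```python
def cmp_update(ctype, qp, result, p1, p2):
    """Return the new (p1, p2) after a compare of the given type."""
    if ctype == "none":
        # Normal compare: write result and its complement when qp is true.
        return (result, not result) if qp else (p1, p2)
    if ctype == "unc":
        # Unconditional: both targets are cleared when qp is false.
        return (qp and result, qp and not result)
    if ctype == "or":
        # Only ever sets the targets, so several compares can OR together.
        return (True, True) if qp and result else (p1, p2)
    if ctype == "and":
        # Only ever clears the targets, so several compares can AND together.
        return (False, False) if qp and not result else (p1, p2)
    if ctype == "or.andcm":
        # Sets the first target and clears the second on a true result.
        return (True, False) if qp and result else (p1, p2)
    raise ValueError(f"unknown compare type: {ctype}")
```

Because the .or and .and types write the same value to a predicate or leave it alone, several such compares targeting the same predicate can sit in one instruction group without conflicting.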
12 I unit Only Instructions
- Shift, extract, deposit, test bit, sign extend
- Find first zero, pop count
- Move long immediate (64 bits)
- Parallel shifts and multiplies
- mix, mux, pack, unpack
- mpy (2x2 -> 4), mpyshr (2x2 -> 2 bytes)
- shift (2/4 bytes), shift and add (2 bytes)
- min, max (1/2 bytes)
- sum of absolute diffs (1 byte)
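The bit-field operations at the top of this list (extract, deposit, pop count) can be sketched as simple mask arithmetic. This is an illustrative model of the value computation only, assuming unsigned extract and 64-bit registers; the function names are hypothetical stand-ins for the instructions.

```python
MASK64 = (1 << 64) - 1


def extract_unsigned(value, pos, length):
    """Return the length-bit field of value starting at bit pos, zero-extended."""
    return (value >> pos) & ((1 << length) - 1)


def deposit(field, base, pos, length):
    """Return base with its length-bit field at pos replaced by field's low bits."""
    mask = ((1 << length) - 1) << pos
    return ((base & ~mask) | ((field << pos) & mask)) & MASK64


def popcount(value):
    """Count the 1 bits in a 64-bit value."""
    return bin(value & MASK64).count("1")
```

Extract and deposit are inverses on the selected field: depositing an extracted field back at the same position leaves the base value unchanged.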
13 FP Unit Instructions
- FPMuladd, FPMulsub, FPNegMul, FPNegMulAdd
- FPMax, FPMin
- Integer Muladd
- Reciprocal, sq root approximation
- Convert to/from integer
- Test FP class
- And, Or, Xor, Andcomplement
- Merge
- Clear flags, check flags
14 Parallel FP Arithmetic
- Mul, Muladd, Mulsub, NegMul, NegMulAdd
- Reciprocal, Sq root approximation
- Compare
- Min, Max, abs min, abs max
- Mix, Swap, Pack, Unpack, Select
- Convert to integer
15 Branches
- 9 types
  - Conditional (by predicate), .call, .return
  - to IA-32 instruction set (.ia)
  - Loops (.cloop, .ctop, .cexit, .wtop, .wexit)
- Hints
  - Static/dynamic, taken/not taken
  - Sequential prefetch few/many lines
  - Deallocate branch cache info
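The simplest loop branch, .cloop, works with the LC application register. The sketch below models the behaviour as I understand it: the branch at the bottom of the loop is taken, and LC decremented, while LC is nonzero, so LC is preloaded with the trip count minus one. The function name `counted_loop` is hypothetical, and the modulo-scheduled forms (.ctop, .cexit, .wtop, .wexit), which also involve EC and register rotation, are not modeled.

```python
def counted_loop(trip_count, body):
    """Run body(i) trip_count times using a .cloop-style LC countdown."""
    if trip_count < 1:
        raise ValueError("the loop body executes at least once")
    lc = trip_count - 1  # LC is loaded before entering the loop
    iterations = 0
    while True:
        body(iterations)
        iterations += 1
        if lc != 0:
            lc -= 1      # branch taken: back to the top of the loop
            continue
        break            # LC is zero: fall through and leave the loop
    return iterations
```

Calling `counted_loop(4, print)` runs the body four times, with the loop counter taking the values 3, 2, 1, 0 at the four branch decisions.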