IA64 Overview - PowerPoint PPT Presentation

About This Presentation
Title:

IA64 Overview

Description:

mix, mux, pack, unpack. mpy (2x2 -> 4), mpyshr (2x2 -> 2 bytes) ... Mix, Swap, Pack, Unpack, Select. Convert to integer. 10/26/09. IA-64 Overview, Vail. 15 ...

Slides: 16
Provided by: FOTL


Transcript and Presenter's Notes



1
IA-64 Overview
2
Architectural History
  • Cydrome/Multiflow - late 1980s
  • HP labs 1991 to 1994. Instruction set spec 11/93
  • 3-instruction bundles, templates
  • 128 general (gr) and 128 floating-point (fr) registers, with rotation and stacking
  • 8 branch registers, 64 predicate registers
  • HP-Intel joint effort starts June 1994
  • Many architectural changes
  • Completed early 1996.

3
Machine Resources
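[Figure: register files]
  • General: gr0..gr127, 64 bits plus a NaT bit each (gr0 = 0; gr32..gr127 stacked)
  • Floating point: fr0..fr127, 82 bits (fr0 = 0.0, fr1 = 1.0; fr32..fr127 rotating)
  • Predicates: p0..p63, 1 bit each (p0 = true; p16..p63 rotating)
  • Branch: b0..b7, 64 bits
  • Instruction pointer: IP, 64 bits
  • Application registers: ar0..ar127, 64 bits (KR0..KR7, RSC, BSP, BSPSTORE, RNAT, IA-32, CCV, UNAT, FPSR, ITC, PFS, LC, EC)
  • User Mask: 6 bits
  • Performance monitors: PMD0, PMD1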
4
Application Registers
  • KR0-7 Kernel registers, writable only at privilege level 0
  • RSC/BSP/BSPSTORE/RNAT Register stack engine (RSE) control and state
  • IA-32 control registers (7)
  • CCV Compare and exchange value
  • UNAT NaT collection for st8.spill/ld8.fill
  • FPSR Floating-point status register
  • ITC Interval time counter
  • PFS Previous function state: previous frame marker, epilog count, privilege
    level
  • LC, EC Loop count and epilog count
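One backing-store detail explains why RNAT exists: every 64th 8-byte slot, the one whose address bits 8:3 are all ones, holds a collection of NaT bits instead of a register. The Python sketch below computes where a register lands a given number of slots away while skipping those collection slots. The helper names are hypothetical, and addresses are plain integer byte addresses.

```python
# Sketch of RSE backing-store address arithmetic (hypothetical helper names).
# Addresses are byte addresses of 8-byte backing-store slots.

RNAT_SLOT = 0x3F  # slot number (address bits 8:3) that holds an RNAT collection


def slot_num(addr: int) -> int:
    """Address bits 8:3: the slot's position within its 64-slot group."""
    return (addr >> 3) & 0x3F


def is_rnat_slot(addr: int) -> bool:
    return slot_num(addr) == RNAT_SLOT


def _trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero (C semantics), unlike Python's //."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def skip_regs(addr: int, num_regs: int) -> int:
    """Address of the register num_regs register slots away from addr.

    RNAT slots crossed on the way are skipped. addr is assumed to point at a
    register slot, not at an RNAT slot.
    """
    delta = slot_num(addr) + num_regs
    if num_regs < 0:
        delta -= RNAT_SLOT - 1  # bias so truncation counts RNAT slots crossed backward
    return addr + 8 * (num_regs + _trunc_div(delta, RNAT_SLOT))


print(hex(skip_regs(0x0, 62)))    # 0x1f0: slot 62, no RNAT slot crossed
print(hex(skip_regs(0x0, 63)))    # 0x200: slot 63 is RNAT, so skip to slot 64
print(hex(skip_regs(0x200, -1)))  # 0x1f0: stepping back over the RNAT slot
```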

5
Instruction Bundling
  • 128-bit aligned instruction bundles contain three 41-bit instructions, a 4-bit dispersal template, and 1 bit for cross-bundle parallelism
  • Branches are to bundle boundaries
  • Implementations are allowed to have any number of functional units, so parallel execution is not guaranteed
  • The template controls dispersal to functional units: Memory, Integer, Floating point, Branch, long immediate
  • Little-endian bit/byte/slot numbering
Bundle layout: s | slot 2 | slot 1 | slot 0 | tmplt
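To make the bundle format concrete, here is a Python sketch that packs and unpacks the fields listed above. It assumes the field order of the slide's layout with the template in the low bits and the stop bit at bit 127; shipped IA-64 hardware instead uses a 5-bit template (which also encodes stops) in bits 0-4, so treat the exact bit positions as illustrative.

```python
# Sketch: split a 128-bit bundle into the fields described on the slide.
# ASSUMPTION: positions follow the slide's layout read from bit 127 down
# (s | slot 2 | slot 1 | slot 0 | tmplt); real IA-64 encodings differ.
from typing import NamedTuple

TMPLT_BITS, SLOT_BITS, STOP_BIT = 4, 41, 127
assert TMPLT_BITS + 3 * SLOT_BITS + 1 == 128  # the fields fill the bundle exactly


class Bundle(NamedTuple):
    template: int
    slots: tuple  # (slot 0, slot 1, slot 2), each a 41-bit instruction
    stop: int     # 1 = no parallelism with the next bundle


def _field(value: int, lo: int, width: int) -> int:
    """Extract `width` bits starting at bit `lo` (bit 0 = least significant)."""
    return (value >> lo) & ((1 << width) - 1)


def decode(bundle: int) -> Bundle:
    template = _field(bundle, 0, TMPLT_BITS)
    slots = tuple(_field(bundle, TMPLT_BITS + i * SLOT_BITS, SLOT_BITS) for i in range(3))
    return Bundle(template, slots, _field(bundle, STOP_BIT, 1))


def encode(template: int, slots, stop: int) -> int:
    value = template & ((1 << TMPLT_BITS) - 1)
    for i, insn in enumerate(slots):
        value |= (insn & ((1 << SLOT_BITS) - 1)) << (TMPLT_BITS + i * SLOT_BITS)
    return value | ((stop & 1) << STOP_BIT)


def to_bytes(bundle: int) -> bytes:
    """Little-endian byte order: the template bits land in byte 0."""
    return bundle.to_bytes(16, "little")


print(decode(encode(5, (1, 2, 3), 1)))  # Bundle(template=5, slots=(1, 2, 3), stop=1)
```

Keeping the widths as named constants makes it easy to switch to the shipped 5-bit template layout.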
6
Templates and Dispersal
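Templates (slots 0 1 2):
  • M L X, M I I, M I / I, M M I, M / M I, M F I
  • M M F, M F B, M I B, M B B, B B B
  • Left-to-right execution; "/" marks an optional stop
[Figure: two bundles (slots 0, 1, 2 each) dispersed to M, M, I, I, I, I, F, F functional units]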
Dispersal of instructions to functional units is
very simple. This example shows a CPU that issues
two bundles at a time, with two M units, four I units,
and two F units. Each functional unit takes
instructions from only one or two slots.
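The dispersal idea can be modeled in a few lines. This sketch uses the example machine (two M, four I, and two F units) and assumes a greedy rule in which each slot takes the first free unit of its type; the slide's machine instead wires each unit to only one or two slots, which this simplification ignores. The function and unit names are hypothetical.

```python
# Sketch of template-driven dispersal for the example two-bundle machine.
# ASSUMPTION: greedy first-free-unit assignment, not the real slot wiring.
from collections import Counter

EXAMPLE_UNITS = {"M": 2, "I": 4, "F": 2}


def disperse(templates, units=EXAMPLE_UNITS):
    """Map each (bundle, slot) to a unit name such as 'M0' or 'I3'.

    templates: one template string per bundle, e.g. ["MII", "MFI"]; a '/' that
    marks a stop is ignored. Returns None if a slot finds no free unit of its type.
    """
    used = Counter()
    plan = {}
    for b, template in enumerate(templates):
        for slot, kind in enumerate(template.replace("/", "")):
            if used[kind] >= units.get(kind, 0):
                return None  # this unit type is exhausted for the cycle
            plan[(b, slot)] = f"{kind}{used[kind]}"
            used[kind] += 1
    return plan


print(disperse(["MII", "MFI"]))  # M0, I0, I1 for bundle 0; M1, F0, I2 for bundle 1
print(disperse(["MMF", "MMI"]))  # None: four M slots but only two M units
```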
7
Assembly Language Format
(pred) opcode.completer target = source, source
  • pred specifies controlling predicate p0..p63
  • (pred) is optional, defaults to p0
  • completer is zero or more modifiers to the basic
    operation
  • the target register r0..r127, fr0..fr127,
    b0..b7, etc
  • one or more source registers or immediate values
  • ;; forces a stop.
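A small parser makes the statement shape explicit. This hypothetical helper accepts one or more comma-separated targets (compares write two predicates), defaults the predicate to p0, and records a trailing `;;` as a stop. It checks surface syntax only and knows nothing about valid opcodes or operands.

```python
# Minimal parser for: [(pred)] opcode[.completer...] target[, target] = source[, ...] [;;]
# Hypothetical helper; statements without '=' (e.g. branches) are not handled.
import re
from typing import NamedTuple

_STMT = re.compile(
    r"^\s*(?:\((?P<pred>p\d+)\)\s*)?"           # optional qualifying predicate
    r"(?P<op>[a-z]\w*)(?P<comp>(?:\.\w+)*)\s+"  # opcode, then .completer...
    r"(?P<targets>[^=]+?)\s*=\s*"               # target(s) before '='
    r"(?P<srcs>[^;]+?)\s*(?P<stop>;;)?\s*$"     # sources, optional ';;' stop
)


class Stmt(NamedTuple):
    pred: str
    opcode: str
    completers: tuple
    targets: tuple
    sources: tuple
    stop: bool


def parse(line: str) -> Stmt:
    m = _STMT.match(line)
    if m is None:
        raise ValueError(f"not in '(pred) opcode.completer target = sources' form: {line!r}")
    return Stmt(
        pred=m["pred"] or "p0",  # omitted predicate defaults to p0 (always true)
        opcode=m["op"],
        completers=tuple(c for c in m["comp"].split(".") if c),
        targets=tuple(t.strip() for t in m["targets"].split(",")),
        sources=tuple(s.strip() for s in m["srcs"].split(",")),
        stop=m["stop"] is not None,
    )


print(parse("cmp.eq p1, p2 = r3, r4").targets)  # ('p1', 'p2')
```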

8
Memory Unit Only Instructions
  • Integer (1/2/4/8 byte) and FP (s/d/e/pair) load
  • Integer (1/2/4/8 byte) and FP (s/d/e) store
  • Base update, reg or immediate, load or store
  • Advanced(.a)/Speculative(.s)/both(.sa)
  • .fill and .spill (NaT bit or 82-bit FP)
  • Ordered load (.acq), store (.rel)
  • Check load (.c.nc, .c.clr, .c.clr.acq)
  • Speculation check (chk.a, chk.s for FR)
  • Cache hints (.nt1, .nta, .bias)
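Advanced loads (.a) and check loads (.c) cooperate through the ALAT, a hardware table of addresses read by advanced loads that later stores invalidate. The toy model below assumes an unbounded ALAT keyed by register, 8-byte aligned accesses, and exact-address matching; real ALATs are small, may drop entries, and match on overlapping bytes. Class and method names are hypothetical.

```python
# Toy model of data speculation with an ALAT (Advanced Load Address Table).
# ASSUMPTIONS: unbounded ALAT keyed by register, aligned 8-byte accesses,
# memory is a dict of address -> value.
class SpeculationModel:
    def __init__(self, memory):
        self.mem = memory
        self.regs = {}
        self.alat = {}  # register name -> address of its advanced load

    def ld_a(self, reg, addr):
        """ld8.a: load early and record (reg, addr) in the ALAT."""
        self.regs[reg] = self.mem.get(addr, 0)
        self.alat[reg] = addr

    def st(self, addr, value):
        """st8: a store to a tracked address invalidates the matching entries."""
        self.mem[addr] = value
        for reg in [r for r, a in self.alat.items() if a == addr]:
            del self.alat[reg]

    def ld_c_clr(self, reg, addr):
        """ld8.c.clr: reload only if the entry is gone, then clear it.

        Returns True when the load had to be re-executed.
        """
        reloaded = self.alat.get(reg) != addr
        if reloaded:
            self.regs[reg] = self.mem.get(addr, 0)
        self.alat.pop(reg, None)
        return reloaded


m = SpeculationModel({0x100: 1, 0x200: 2})
m.ld_a("r4", 0x200)  # load hoisted above a store that might alias
m.st(0x200, 9)       # it does alias, so the ALAT entry is invalidated
print(m.ld_c_clr("r4", 0x200), m.regs["r4"])  # True 9: the check load re-executed
```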

9
Memory Unit Only Instructions
  • Compare exchange, Fetch and add, Exchange
  • Fence, Sync, Serialize instruction stream
  • Flush cache, flush register stack, purge TLB
  • Line Prefetch (.fault, .nt1, .nta, .excl)
  • Getf/Setf move GR to/from FR
  • Move to/from AR, PSR, CPUID
  • Alloc
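The semaphore instructions are easiest to read as read-modify-write functions on memory. This single-threaded sketch ignores atomicity and the .acq/.rel ordering completers; cmpxchg compares against the CCV application register (passed here as `ccv`), and fetchadd accepts only the small set of increments listed in the code. Function names are hypothetical.

```python
# Register-level behavior of cmpxchg, fetchadd and xchg (8-byte forms).
# Single-threaded model: the real instructions are atomic.
MASK64 = (1 << 64) - 1
FETCHADD_INCREMENTS = {-16, -8, -4, -1, 1, 4, 8, 16}  # immediates fetchadd accepts


def cmpxchg8(mem, addr, new, ccv):
    """Store `new` only if memory equals ar.ccv; always return the old value."""
    old = mem[addr]
    if old == ccv:
        mem[addr] = new & MASK64
    return old


def fetchadd8(mem, addr, inc):
    """Add a small immediate to memory and return the old value."""
    if inc not in FETCHADD_INCREMENTS:
        raise ValueError(f"fetchadd increment must be one of {sorted(FETCHADD_INCREMENTS)}")
    old = mem[addr]
    mem[addr] = (old + inc) & MASK64
    return old


def xchg8(mem, addr, new):
    """Swap a register value with memory, returning the old memory value."""
    old = mem[addr]
    mem[addr] = new & MASK64
    return old


mem = {0x40: 0}
print(cmpxchg8(mem, 0x40, 1, 0))  # 0: lock was free and now holds 1
print(cmpxchg8(mem, 0x40, 1, 0))  # 1: already held, memory unchanged
```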

10
Memory or I unit Instructions
  • Move GR to GR
  • Add, Sub (immediate or 1)
  • Add long (22-bit imm), Shladd (shift count 1-4), Add pointer
  • And, Or, Xor, Andcm (reg or immediate)
  • Add pointer, shift and add pointer
  • Speculation check (chk.s for GR)
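Shladd folds the scaling step of array indexing into the add, and add long takes a sign-extended 22-bit immediate. The sketch below computes both with 64-bit wraparound; the function names mirror the mnemonics but are hypothetical, and operand-register restrictions are not modeled.

```python
# shladd: target = (source1 << count) + source2, with count limited to 1..4,
# enough to scale an index for 2-, 4-, 8- or 16-byte elements.
MASK64 = (1 << 64) - 1


def shladd(r2, count, r3):
    if not 1 <= count <= 4:
        raise ValueError("shladd count must be 1..4")
    return ((r2 << count) + r3) & MASK64


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def addl(imm22, r3):
    """Add long: sign-extended 22-bit immediate plus a register, mod 2**64."""
    return (sign_extend(imm22, 22) + r3) & MASK64


# Address of element i of an array of 8-byte values: base + (i << 3)
base, i = 0x6000_0000_0000_1000, 5
print(hex(shladd(i, 3, base)))  # 0x6000000000001028
```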

11
Memory or I unit Instructions
  • Compare (64 or 32 bit) to two predicates
  • 3 conditions (.eq, .lt, .ltu) plus 7 pseudo-ops
  • 5 types (none, .unc, .or, .and, .or.andcm)
  • Parallel integer arithmetic (1, 2, 4 byte)
  • add, sub, signed/unsigned, optional saturating
  • average, compare (.eq, .gt to target)
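The five compare types differ only in how they write the two target predicates. The function below encodes those rules for a qualifying predicate `qp` and a condition `cond`, ignoring NaT sources (an assumption; NaT inputs have their own rules). The usage shows what .or is for: several compares can OR their conditions into the same predicates, and in hardware such parallel compares may target the same predicates within one instruction group.

```python
# Predicate-write rules for the compare types, ignoring NaT sources.
# p1, p2 are the targets' current values; returns their new values.
def compare_write(ctype, qp, cond, p1, p2):
    if ctype == "none":      # normal: write cond and its complement
        return (cond, not cond) if qp else (p1, p2)
    if ctype == "unc":       # unconditional: clear both when qp is false
        return (cond, not cond) if qp else (False, False)
    if ctype == "and":       # clear both only when cond is false
        return (False, False) if qp and not cond else (p1, p2)
    if ctype == "or":        # set both only when cond is true
        return (True, True) if qp and cond else (p1, p2)
    if ctype == "or.andcm":  # set p1 and clear p2 only when cond is true
        return (True, False) if qp and cond else (p1, p2)
    raise ValueError(f"unknown compare type {ctype!r}")


a, b = 3, 0
p1, p2 = False, False          # targets cleared beforehand
for cond in (a == 0, b == 0):  # two .or compares accumulate "a == 0 or b == 0"
    p1, p2 = compare_write("or", True, cond, p1, p2)
print(p1, p2)  # True True
```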

12
I unit Only Instructions
  • Shift, extract, deposit, test bit, sign extend
  • Find first zero, pop count
  • Move long immediate (64 bits)
  • Parallel shifts and multiplies
  • mix, mux, pack, unpack
  • mpy (2x2 -> 4 bytes), mpyshr (2x2 -> 2 bytes)
  • shift (2/4 bytes), shift and add (2 bytes)
  • min, max (1/2 bytes)
  • sum of absolute diffs (1 byte)
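The I-unit field and byte operations reduce to plain bit arithmetic on 64-bit values. These helpers are named loosely after the mnemonics, with hypothetical signatures, and show unsigned extract, deposit, pop count, and the 1-byte sum of absolute differences.

```python
# Bit-field and parallel-byte helpers on 64-bit values (not an encoder).
MASK64 = (1 << 64) - 1


def extr_u(r3, pos, length):
    """Unsigned extract: `length` bits of r3 starting at bit `pos`, zero-extended."""
    return (r3 >> pos) & ((1 << length) - 1)


def dep(r2, r3, pos, length):
    """Deposit: replace `length` bits of r3 at `pos` with the low bits of r2."""
    field = ((1 << length) - 1) << pos
    return ((r3 & ~field) | ((r2 << pos) & field)) & MASK64


def popcnt(r3):
    return bin(r3 & MASK64).count("1")


def psad1(r2, r3):
    """Sum of absolute differences of the eight unsigned bytes in r2 and r3."""
    return sum(abs(((r2 >> s) & 0xFF) - ((r3 >> s) & 0xFF)) for s in range(0, 64, 8))


print(psad1(0x0102030405060708, 0x0807060504030201))  # 32
```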

13
FP Unit Instructions
  • FPMuladd, FPMulsub, FPNegMul, FPNegMulAdd
  • FPMax, FPMin
  • Integer Muladd
  • Reciprocal, sq root approximation
  • Convert to/from integer
  • Test FP class
  • And, Or, Xor, Andcomplement
  • Merge
  • Clear flags, check flags
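The multiply-add family rounds once, and the approximation instructions are meant to be refined in software using those fused operations. This sketch uses IEEE doubles (not the 82-bit register format), forms exact results with `fractions.Fraction`, and replaces the hardware reciprocal approximation with a hypothetical 8-bit seed followed by Newton-Raphson steps.

```python
# Fused multiply-add with a single rounding, and Newton-Raphson refinement
# of a coarse reciprocal. Python floats are IEEE doubles, not 82-bit values.
from fractions import Fraction
import math


def fma(a, b, c):
    """Round a*b + c once: exact rational arithmetic, then one float() rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def reciprocal(b, steps=3):
    """Refine an approximation of 1/b with fused Newton-Raphson steps.

    ASSUMPTION: the seed (1/b rounded to 8 significant bits) stands in for the
    hardware approximation instruction, whose actual accuracy differs.
    """
    m, e = math.frexp(1.0 / b)
    y = math.ldexp(round(m * 256) / 256, e)  # seed, relative error at most 2**-8
    for _ in range(steps):
        err = fma(-b, y, 1.0)  # err = 1 - b*y with one rounding
        y = fma(y, err, y)     # y + y*err roughly doubles the correct bits
    return y


a, b = 1 + 2**-30, 1 - 2**-30
print(a * b - 1.0)      # 0.0: the product rounded to 1.0 before the subtraction
print(fma(a, b, -1.0))  # about -8.67e-19, exactly -2**-60
```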

14
Parallel FP Arithmetic
  • Mul, Muladd, Mulsub, NegMul, NegMulAdd
  • Reciprocal, Sq root approximation
  • Compare
  • Min, Max, abs min, abs max
  • Mix, Swap, Pack, Unpack, Select
  • Convert to integer
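Parallel FP instructions operate on two single-precision values held in one register. The sketch packs a pair into a 64-bit integer with `struct`, assuming element 0 is the low half, and applies multiply and max lane by lane; it illustrates the data layout, not IA-64 exception or NaN behavior. Function names are hypothetical.

```python
# Pairs of IEEE singles packed into 64 bits; element 0 in the low half (ASSUMPTION).
import struct


def pack_pair(lo, hi):
    """Round two floats to single precision and pack them into one 64-bit value."""
    return struct.unpack("<Q", struct.pack("<ff", lo, hi))[0]


def unpack_pair(bits):
    return struct.unpack("<ff", struct.pack("<Q", bits))


def fpmpy(x, y):
    """Lane-wise single-precision multiply.

    A product of two singles is exact in a double, so packing rounds it once.
    Products beyond the single range make struct.pack raise OverflowError here
    instead of producing infinity.
    """
    (x0, x1), (y0, y1) = unpack_pair(x), unpack_pair(y)
    return pack_pair(x0 * y0, x1 * y1)


def fpmax(x, y):
    """Lane-wise maximum (NaN handling not modeled)."""
    (x0, x1), (y0, y1) = unpack_pair(x), unpack_pair(y)
    return pack_pair(max(x0, y0), max(x1, y1))


print(unpack_pair(fpmpy(pack_pair(1.5, -2.0), pack_pair(4.0, 0.25))))  # (6.0, -0.5)
```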

15
Branches
  • 9 types
  • Conditional (by predicate), .call, .return
  • to IA-32 instruction set (.ia)
  • Loops (.cloop, .ctop, .cexit, .wtop, .wexit)
  • Hints
  • Static/dynamic, taken/not taken
  • Sequential prefetch few/many lines
  • Deallocate branch cache info
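The .ctop loop branch ties LC and EC to register rotation: while LC is nonzero it decrements LC and rotates in a 1 stage predicate, and afterwards it drains the pipeline by decrementing EC and rotating in 0. The model below tracks only LC, EC, and the stage predicates (assumption: general and FP register rotation, and the final-step register updates, are omitted). It shows that N iterations in S stages take N + S - 1 kernel passes.

```python
# Model of a software-pipelined loop closed by br.ctop (LC, EC, stage preds only).
def run_ctop_loop(trip_count, stages):
    """Return (kernel passes, executions per stage).

    Setup follows the usual convention: LC = trip_count - 1, EC = stages, and
    stage predicates (p16, p17, ...) start as 1, 0, 0, ...
    """
    lc, ec = trip_count - 1, stages
    preds = [True] + [False] * (stages - 1)  # preds[k] guards stage k
    executed = [0] * stages
    passes = 0
    while True:
        passes += 1
        for k, active in enumerate(preds):
            executed[k] += active
        if lc != 0:        # prologue/kernel: keep starting new iterations
            lc -= 1
            new_pred = True
        elif ec > 1:       # epilogue: drain stages already in flight
            ec -= 1
            new_pred = False
        else:
            break          # EC reached 1: fall through out of the loop
        preds = [new_pred] + preds[:-1]  # rotation: p16 -> p17 -> ...
    return passes, executed


print(run_ctop_loop(5, 3))  # (7, [5, 5, 5]): N + S - 1 passes, each stage N times
```

A plain counted loop closed by .cloop is the one-stage case: only LC is consulted.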