Register Renaming - PowerPoint PPT Presentation


Slides: 32

Provided by: francoga4


Transcript and Presenter's Notes


1
Register Renaming
Value Prediction
2
Overview

Need for Post-RISC
Register Renaming vs. Allocation Strategies
How to compile for Post-RISC machines
Dynamic Register Renaming through Virtual-Physical Registers

3
Software Outlives Hardware

How to make old software run faster?
Faster CPU clock and memory hierarchy
Adapt CPUs to actual software (profiling/tuning)
More instructions per cycle
Today's software will run on tomorrow's CPUs
Need to keep software interface stable
More functional units and registers

4
Compile-time vs. Run-time

Little is known about software at compile-time
Space/time trade-offs
Memory speeds cannot keep up with CPU speeds
When to apply optimizations that increase code size

5
Solutions

New scalable architecture (IA-64)
Decouple physical/virtual registers using register windows
More explicit parallelism allows for more functional units
Explicit speculative instructions
Post-RISC architecture
Remove limits in super-scalar implementation of existing architectures
Extract even more parallelism out of existing software

6
True Data Dependencies

Also called read-after-write (RAW) hazards
An instruction may use a result produced by the previous instruction
The two instructions cannot execute simultaneously in multiple pipelines
The second instruction must typically be stalled
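
The read-after-write case above can be contrasted with the two name dependencies that renaming targets, write-after-read (anti) and write-after-write (output). The following is a minimal sketch, not taken from the slides; the `Instr` type and the `hazards` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    dest: str
    srcs: tuple


def hazards(first: Instr, second: Instr) -> set:
    """Return the hazards that `second` has with respect to the earlier `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # true dependency: second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")  # anti-dependency: second overwrites a source of first
    if second.dest == first.dest:
        found.add("WAW")  # output dependency: both write the same register
    return found


add = Instr(dest="r1", srcs=("r2", "r3"))
sub = Instr(dest="r4", srcs=("r1", "r5"))
print(hazards(add, sub))  # {'RAW'}
```

Only the RAW case forces the second instruction to wait for a value; WAR and WAW arise from reusing a register name.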

7
Structural Dependencies

Stalls result in less than optimal performance
We may have single-issue cycles, which process only a single instruction
Worse, we may have zero-issue cycles, which initiate no new instructions
Data dependencies can also limit performance for a scalar machine
Two-cycle memory load/write
Intra-instruction dependencies

8
Scheduling

Scheduling can remove stalls
Intra-instruction dependencies cannot be removed by scheduling (CISC)

9
Need for Post-RISC

Super-scalar has diminishing returns in IPC (instructions per clock, the inverse of CPI)
2-way: 1.6 - 1.8 (85%)
4-way: 2.6 (65%)
8-way: ???
More parallelism needed
Look beyond set of 4 instructions
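
The parenthesized figures on this slide appear to be the fraction of peak issue width that is actually used. The sketch below checks that reading; it assumes the rates are instructions per clock and takes 1.7 as the midpoint of the 2-way range. The function name is hypothetical.

```python
def issue_utilization(ipc: float, width: int) -> float:
    """Fraction of the peak issue width that is used on average."""
    return ipc / width


# (issue width, sustained instructions per clock) as read from the slide;
# 1.7 is an assumed midpoint of the 1.6 - 1.8 range.
for width, ipc in [(2, 1.7), (4, 2.6)]:
    print(f"{width}-way: {issue_utilization(ipc, width):.0%}")
# 2-way: 85%
# 4-way: 65%
```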

10
Post-RISC characteristics

Out-of-order execution
(Existed 20 years ago on IBM and CDC)
Innovative for single-chip
Branch history bits
Precise interrupts
Fetch/Flow Prediction
More caching
Instruction cache becomes CPU scratch space
Register renaming
First in IBM 360/91 FPU

11
SPECint92 Trends

SPECint92 numbers are increasing
DEC has historically been the champ
SPECint92/clock rate
DEC low (21164 @ 300 -> 1.14, 10/95)
IBM strong early (580H @ 55 -> 1.76, 9/93)
HP (PA-8000 @ 133 -> 2.7, 10/95)

12
The Post-RISC Architecture
13
Post-RISC CPUs

Traditional RISC
DEC Alpha 21164
Sun UltraSPARC-1

(partially) Post-RISC
PowerPC 604
MIPS R10000
HP PA-8000
Intel Pentium Pro
DEC Alpha 21264
HAL SPARC64

14
Automatic Register Renaming

Every R-write allocates new R
The register name A is an alias for the last R
allocated by a write to A
An instruction reading and writing a register allocates a new R too
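
The rule above can be sketched in a few lines. This is an illustrative model only: the `rename` function, the `R0, R1, ...` naming, and the tuple encoding of instructions are assumptions, not part of the slides.

```python
import itertools


def rename(program):
    """Rename a list of (dest, srcs) pairs that use architectural names.

    Every write allocates a fresh register; every read uses the register
    most recently allocated for that name.
    """
    fresh = (f"R{i}" for i in itertools.count())
    alias = {}  # architectural name -> last register allocated for it
    renamed = []
    for dest, srcs in program:
        # Sources are looked up before the destination is re-aliased, so an
        # instruction that reads and writes A reads the old R, writes a new R.
        new_srcs = tuple(alias.get(s, s) for s in srcs)
        new_dest = next(fresh)
        alias[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


program = [
    ("A", ("Mem1",)),   # A <- Mem1
    ("A", ("A", "A")),  # A <- A op A (reads and writes A)
    ("A", ("Mem3",)),   # A <- Mem3 (no longer conflicts with earlier uses of A)
]
print(rename(program))
# [('R0', ('Mem1',)), ('R1', ('R0', 'R0')), ('R2', ('Mem3',))]
```

After renaming, the third instruction writes R2 and can proceed independently of the first two.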

15
Advantages over More ISA Registers

Smaller instructions
Allow same software to run on range of
implementations
Compare the same program running on Pentium or
AMD Ath
Less state to save
Faster function calls
Faster context switches
Lifetimes can be optimized

16
Renaming Implementation

Rename Storage Locations
Reorder Buffer
Physical Register File
Similarities
Allocate at decode
Release at commit

17
Renaming using Reorder buffer

Results are kept in reorder buffer
Source operands are read either from the register file or from a reorder buffer entry
Not-yet-ready results are forwarded to the instruction queue
Used by Intel Pentium III, PowerPC 604, SPARC64
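
A minimal sketch of the operand lookup described above, assuming a simplified reorder buffer in which each entry holds one destination register. The class and method names are hypothetical and do not model any particular CPU.

```python
class ReorderBuffer:
    """Toy reorder buffer: results live here until they commit in order."""

    def __init__(self):
        self.entries = {}  # tag -> entry, kept in program order
        self.next_tag = 0

    def allocate(self, dest):
        """At decode: reserve an entry for an instruction writing `dest`."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries[tag] = {"dest": dest, "value": None, "done": False}
        return tag

    def complete(self, tag, value):
        """At completion: the result is stored in the reorder buffer entry."""
        self.entries[tag].update(value=value, done=True)

    def read_operand(self, reg, regfile):
        """Read `reg` from the newest in-flight writer, else the register file."""
        for tag in reversed(list(self.entries)):
            entry = self.entries[tag]
            if entry["dest"] == reg:
                if entry["done"]:
                    return ("value", entry["value"])
                return ("wait", tag)  # consumer waits for this tag
        return ("value", regfile[reg])

    def commit(self, regfile):
        """Retire the oldest entry if it has completed."""
        if not self.entries:
            return False
        tag = next(iter(self.entries))
        entry = self.entries[tag]
        if not entry["done"]:
            return False
        regfile[entry["dest"]] = entry["value"]
        del self.entries[tag]
        return True


regfile = {"r1": 10, "r2": 20}
rob = ReorderBuffer()
tag = rob.allocate("r1")
print(rob.read_operand("r1", regfile))  # ('wait', 0)
print(rob.read_operand("r2", regfile))  # ('value', 20)
rob.complete(tag, 99)
print(rob.read_operand("r1", regfile))  # ('value', 99)
```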

18
Renaming on Pentium III

General-purpose, floating-point, and status (flags) registers can be renamed
Renaming uses a reorder buffer with 40 entries
FPU control/status cannot be renamed
Max 2 renamings per instruction

19
Register Allocation Example

Minimal number of named registers
Scheduling is limited
Strictly serial execution

rA <- Mem1
rA <- rA op rA
Mem2 <- rA
rA <- Mem3
rA <- rA op 1
Mem4 <- rA

Computes: Mem2 <- Mem1 op Mem1, Mem4 <- Mem3 op 1
(op: operator symbol missing in the source)
20
Renaming using Physical Register File

Register file contains more registers than defined in ISA (logical registers)
Map logical registers to physical registers during decode
Operands are always read from the physical register file
Used by MIPS R10000 and DEC 21264
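
A sketch of renaming with a physical register file, combining the map-at-decode step above with the allocate-at-decode and release-at-commit rule from the earlier slide. The `PhysRegRenamer` class, its free list, and the initial identity mapping are illustrative assumptions, not a description of the R10000 or 21264 hardware.

```python
from collections import deque


class PhysRegRenamer:
    """Toy map table plus free list for a physical register file."""

    def __init__(self, logical_regs, num_physical):
        # Assumed start state: logical register i maps to physical register i.
        self.map = {name: p for p, name in enumerate(logical_regs)}
        self.free = deque(range(len(logical_regs), num_physical))

    def rename(self, dest, srcs):
        """Decode: map sources, allocate a new physical destination.

        Returns (pdest, psrcs, prev), or None if decode must stall because
        no physical register is free.
        """
        if not self.free:
            return None
        psrcs = tuple(self.map[s] for s in srcs)
        prev = self.map[dest]  # released when this instruction commits
        pdest = self.free.popleft()
        self.map[dest] = pdest
        return pdest, psrcs, prev

    def commit(self, prev):
        """Commit: the previous mapping of the destination is released."""
        self.free.append(prev)


r = PhysRegRenamer(["r1", "r2"], num_physical=4)
print(r.rename("r1", ("r1", "r2")))  # (2, (0, 1), 0)
print(r.rename("r1", ("r1",)))       # (3, (2,), 2)
print(r.rename("r2", ()))            # None: free list is empty, decode stalls
r.commit(0)                          # first instruction commits, frees physical 0
print(r.rename("r2", ()))            # (0, (), 1)
```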

21
Virtual-Physical Registers

Motivation: better utilization of physical registers
Important in presence of long-latency instructions
Conventional scheme wastes a register for each
Decoded instruction that has not finished execution
Committed instruction whose result is dead
Can be eliminated by maintaining a reference counter

Example:
load f2, 0(r6)
fdiv f2, f2, f10
fmul f2, f2, f12
fadd f2, f2, 1
22
Virtual-Physical Register Renaming

General Map Table (GMT)
Indexed by logical register L
VP register: last virtual-physical register that L has been mapped to
P register: last physical register that L and VP have been mapped to
V-bit: indicates whether P is valid
Physical Map Table (PMT)
Has an entry for each VP
Contains the last physical register that VP has been mapped to

23
Functional Description

For each logical source register S, do a GMT lookup
If the V-bit is set, rename S to P
Otherwise, rename S to VP
Rename the logical destination register to a new VP
Update GMT: set VP to the new mapping and reset V
Save the previous VP in the reorder buffer to be able to roll back
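
The decode-time steps above can be sketched as follows. The `GMTEntry` and `Decoder` names, the VP numbering that starts at 100, and the initial table contents are assumptions made for illustration; only the order of the steps follows the slide.

```python
from dataclasses import dataclass
import itertools


@dataclass
class GMTEntry:
    vp: int          # last virtual-physical register mapped to this logical register
    p: int = -1      # last physical register (meaningful only when v is set)
    v: bool = False  # V-bit: is p valid?


class Decoder:
    def __init__(self, gmt):
        self.gmt = gmt
        self.next_vp = itertools.count(100)  # hypothetical VP tag generator
        self.rob = []                        # reorder buffer, oldest first

    def rename(self, dest, srcs):
        # 1. GMT lookup for every logical source register.
        renamed_srcs = []
        for s in srcs:
            entry = self.gmt[s]
            renamed_srcs.append(("P", entry.p) if entry.v else ("VP", entry.vp))
        # 2. The destination gets a new VP; no physical register yet.
        prev_vp = self.gmt[dest].vp
        new_vp = next(self.next_vp)
        # 3. Update GMT: set VP to the new mapping and reset V.
        self.gmt[dest].vp = new_vp
        self.gmt[dest].v = False
        # 4. Save the previous VP in the reorder buffer for rollback.
        self.rob.append({"dest": dest, "prev_vp": prev_vp})
        return ("VP", new_vp), renamed_srcs


gmt = {
    "r6": GMTEntry(vp=1, p=11, v=True),
    "f2": GMTEntry(vp=2, p=12, v=True),
    "f10": GMTEntry(vp=3, p=13, v=True),
}
d = Decoder(gmt)
print(d.rename("f2", ["r6"]))         # (('VP', 100), [('P', 11)])
print(d.rename("f2", ["f2", "f10"]))  # (('VP', 101), [('VP', 100), ('P', 13)])
```

The second instruction sees the first one's destination only as a VP tag, because no physical register has been assigned to it yet.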

24
Functional Description

Instruction Queue Fields
Operation code
Destination VP
Source operands
Ready bits for source operands; when ready, the source operand contains a physical register number
Reorder Buffer Entry
Destination logical register
Completion bit
VP mapping of the last instruction with the same logical destination

25
Functional Description

When source operands are ready, the instruction is issued
When the instruction completes
A new physical register R is allocated for the result
PMT is updated to reflect the new mapping
The VP number of the destination is broadcast, with the physical register identifier, to all entries in the instruction queue
GMT is updated: the entry corresponding to the logical destination is checked for a match with the VP and, if so, the physical register number is copied to the P register field and the V flag is set
As a result, a new instruction using the same logical register will find the corresponding physical register in GMT
Lastly, the C flag of the entry in the reorder buffer is set
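
The completion steps above can be sketched as one function. This is a simplified model under stated assumptions: a free physical register is always available at completion, and the `GMTEntry`, `IQEntry`, and `complete` names as well as all register numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GMTEntry:
    vp: int
    p: int = -1
    v: bool = False


@dataclass
class IQEntry:
    dest_vp: int
    srcs: list  # each source is ("VP", n) while pending or ("P", n) when ready

    def ready(self):
        return all(kind == "P" for kind, _ in self.srcs)


def complete(dest_vp, dest_logical, free_list, pmt, iq, gmt, rob_entry):
    """Model of what happens when the instruction writing dest_vp completes."""
    p = free_list.pop()  # allocate a physical register only now
    pmt[dest_vp] = p     # PMT records the VP -> P mapping
    for entry in iq:     # broadcast the VP together with its physical register
        entry.srcs = [("P", p) if s == ("VP", dest_vp) else s for s in entry.srcs]
    g = gmt[dest_logical]
    if g.vp == dest_vp:  # only if no later instruction renamed the register
        g.p, g.v = p, True
    rob_entry["completed"] = True  # C flag
    return p


# State after decoding "load f2" (VP 100) and "fdiv f2,f2,f10" (VP 101).
gmt = {"f2": GMTEntry(vp=101)}
pmt = {}
free_list = [21, 20]
fdiv = IQEntry(dest_vp=101, srcs=[("VP", 100), ("P", 13)])
rob_load = {"dest": "f2", "completed": False}

complete(100, "f2", free_list, pmt, [fdiv], gmt, rob_load)
print(fdiv.ready(), fdiv.srcs)  # True [('P', 20), ('P', 13)]
print(gmt["f2"].v)              # False: GMT already points at VP 101
```

The GMT check matters because a younger instruction may have renamed the same logical register in the meantime, in which case the GMT entry must not be overwritten.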