Register Renaming - PowerPoint PPT Presentation

About This Presentation
Title:

Register Renaming

Description:

How to compile for Post-RISC machines. Dynamic Register Renaming through ... Save previous VP in reorder buffer to be able to roll back. Functional Description ...

Slides: 32
Provided by: francoga4

Transcript and Presenter's Notes

Title: Register Renaming


1
Register Renaming, Value Prediction
2
Overview
  • Need for Post-RISC
  • Register Renaming vs. Allocation Strategies
  • How to compile for Post-RISC machines
  • Dynamic Register Renaming through Virtual-Physical
    Registers

3
Software Outlives Hardware
  • How to make old software run faster?
  • Faster CPU clock and memory hierarchy
  • Adapt CPUs to actual software (profiling/tuning)
  • More instructions per cycle
  • Today's software will run on tomorrow's CPUs
  • Need to keep software interface stable
  • More functional units and registers

4
Compile-time vs. Run-time
  • Little is known about software at compile-time
  • Space/time trade-offs
  • Memory speeds cannot keep up with CPU speeds
  • When to apply optimizations that increase
    code-size

5
Solutions
  • New scalable architecture (IA-64)
  • Decouple physical/virtual registers using
    register windows
  • More explicit parallelism allows for more
    function units
  • Explicit speculative instructions
  • Post-RISC architecture
  • Remove limits in super-scalar implementation of
    existing architectures
  • Extract even more parallelism out of existing
    software

6
True Data Dependencies
  • Also called read-after-write (RAW) hazards
  • An instruction may use a result produced by the
    previous instruction
  • The two instructions then cannot execute
    simultaneously in multiple pipelines.
  • The second instruction must typically be stalled.
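The read-after-write case described above can be checked mechanically, and the same check covers the anti- (write-after-read) and output (write-after-write) dependencies that renaming targets. The following is a minimal sketch; the `Instr` tuple and the `hazards` function are hypothetical names introduced for illustration, and each instruction is assumed to write exactly one register.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str          # register written (assumed: exactly one)
    srcs: frozenset    # registers read


def hazards(first: Instr, second: Instr) -> set:
    """Classify how `second` depends on the earlier instruction `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # true data dependency: second reads first's result
    if second.dest in first.srcs:
        found.add("WAR")   # anti-dependency: second overwrites a source of first
    if second.dest == first.dest:
        found.add("WAW")   # output dependency: both write the same register
    return found


load = Instr("r1", frozenset({"r6"}))
add = Instr("r2", frozenset({"r1", "r3"}))
print(hazards(load, add))
```

Only the RAW case forces the second instruction to wait for a value; WAR and WAW arise from reusing a register name.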

7
Structural Dependencies
  • Stalls result in less-than-optimal performance
  • We may have single-issue cycles, which process
    only a single instruction.
  • Worse, we may have zero-issue cycles, which
    initiate no new instructions.
  • Data dependencies can also limit performance for
    a scalar machine
  • Two-cycle memory load/write
  • Intra-instruction dependencies

8
Scheduling
  • Scheduling can remove stalls
  • Intra-instruction dependencies cannot be removed
    by scheduling (CISC)

9
Need for Post-RISC
  • Super-scalar has diminishing returns in IPC
    (Instructions Per Clock)
  • 2-Way → 1.6 - 1.8 (85%)
  • 4-Way → 2.6 (65%)
  • 8-Way → ???
  • More parallelism needed
  • Look beyond set of 4 instructions
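The percentages above can be reproduced with a one-line calculation, assuming they denote the fraction of issue slots actually used and that the 2-way figure takes the midpoint 1.7 of the quoted range. Both assumptions are mine, not stated on the slide, and `slot_utilization` is a hypothetical helper name.

```python
def slot_utilization(ipc: float, width: int) -> float:
    """Fraction of issue slots used: achieved instructions per clock / issue width."""
    return ipc / width


# (issue width, achieved instructions per clock) as quoted above;
# 1.7 is the assumed midpoint of the 1.6 - 1.8 range.
for width, ipc in [(2, 1.7), (4, 2.6)]:
    print(f"{width}-way: {slot_utilization(ipc, width):.0%} of issue slots used")
```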

10
Post-RISC characteristics
  • Out-of-order execution
  • (Existed 20 years ago on IBM and CDC)
  • Innovative for single-chip
  • Branch history bits
  • Precise interrupts
  • Fetch/Flow Prediction
  • More caching
  • Instruction cache becomes CPU scratch space
  • Register renaming
  • First in IBM 360/91 FPU

11
Specint92 Trends
  • Specint92 numbers are increasing
  • DEC has historically been the champ
  • Specint92/Clock rates
  • DEC low (21164 @ 300 → 1.14, 10/95)
  • IBM strong early (580H @ 55 → 1.76, 9/93)
  • HP (PA-8000 @ 133 → 2.7, 10/95)

12
The Post-RISC Architecture
13
Post-RISC CPUs
  • Traditional RISC
  • DEC Alpha 21164
  • Sun UltraSPARC-1
  • (partially) Post-RISC
  • PowerPC 604
  • MIPS R10000
  • HP PA-8000
  • Intel Pentium Pro
  • DEC Alpha 21264
  • HAL SPARC64

14
Automatic Register Renaming
  • Every R-write allocates new R
  • The register name A is an alias for the last R
    allocated by a write to A
  • An instruction reading and writing a register
    allocates a new R too
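The three rules above can be sketched in a few lines. This is an illustration under simplifying assumptions: an unbounded supply of registers R1, R2, ..., no freeing, and operands that are not register names (memory locations, immediates) passed through unchanged. The function name `rename` and the tuple layout are hypothetical.

```python
from itertools import count


def rename(program):
    """program: list of (dest, [srcs]) using architectural names.

    Returns the same program with every written register replaced by a
    freshly allocated R, and every read replaced by the current alias.
    """
    fresh = count(1)
    alias = {}   # architectural name -> last R allocated by a write to it
    renamed = []
    for dest, srcs in program:
        # Sources are looked up BEFORE the destination is re-aliased, so an
        # instruction that reads and writes A reads the old R and gets a new one.
        new_srcs = [alias.get(s, s) for s in srcs]
        alias[dest] = f"R{next(fresh)}"
        renamed.append((alias[dest], new_srcs))
    return renamed


program = [
    ("A", ["Mem1"]),
    ("A", ["A", "A"]),
    ("A", ["Mem3"]),
    ("A", ["A", "1"]),
]
for line in rename(program):
    print(line)
```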

15
Advantages over More ISA Registers
  • Smaller instructions
  • Allow same software to run on range of
    implementations
  • Compare the same program running on Pentium or
    AMD Ath
  • Less state to save
  • Faster function calls
  • Faster context switches
  • Life-times can be optimized

16
Renaming Implementation
  • Rename Storage Locations
  • Reorder Buffer
  • Physical Register File
  • Similarities
  • Allocate at decode
  • Release at commit

17
Renaming using Reorder buffer
  • Results are kept in reorder buffer
  • Source operands are read either from
  • the register file, or
  • a reorder buffer entry
  • Not-yet-ready results are forwarded to
    instruction queue
  • Used by Intel Pentium III, PowerPC 604, SPARC64
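A minimal sketch of the operand lookup described above, assuming a simplified model in which each in-flight instruction has one destination and reorder buffer entries are identified by an increasing tag. The class and method names are hypothetical and do not describe any of the listed processors.

```python
class ReorderBuffer:
    def __init__(self, regfile):
        self.regfile = dict(regfile)   # committed architectural values
        self.entries = {}              # tag -> entry, kept in program order
        self.latest = {}               # register -> tag of newest in-flight writer
        self.next_tag = 0

    def allocate(self, dest):
        """At decode: reserve an entry for the result of a new instruction."""
        tag = self.next_tag
        self.next_tag += 1
        self.entries[tag] = {"dest": dest, "value": None, "done": False}
        self.latest[dest] = tag
        return tag

    def read(self, reg):
        """Return ('value', v), or ('tag', t) if the result is not yet ready."""
        tag = self.latest.get(reg)
        if tag is None:
            return ("value", self.regfile[reg])    # from the register file
        entry = self.entries[tag]
        if entry["done"]:
            return ("value", entry["value"])       # from a reorder buffer entry
        return ("tag", tag)                        # forwarded later via the tag

    def complete(self, tag, value):
        self.entries[tag].update(value=value, done=True)

    def commit(self):
        """Retire the oldest entry if it is done; release it at commit."""
        if not self.entries:
            return False
        tag = next(iter(self.entries))
        entry = self.entries[tag]
        if not entry["done"]:
            return False
        self.regfile[entry["dest"]] = entry["value"]
        if self.latest.get(entry["dest"]) == tag:
            del self.latest[entry["dest"]]
        del self.entries[tag]
        return True


rob = ReorderBuffer({"r1": 10, "r2": 20})
t = rob.allocate("r1")
print(rob.read("r1"), rob.read("r2"))
```

An instruction that receives a tag instead of a value waits in the instruction queue until that tag's result is forwarded.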

18
Renaming on Pentium III
  • All registers can be renamed (general-purpose,
    floating-point, status)
  • Renaming uses a reorder buffer with 40 entries
  • FPU control/status cannot be renamed
  • Max 2 renamings per instruction

19
Register Allocation Example
  • Minimal number of named registers
  • Scheduling is limited
  • Strictly serial execution

Source:
  Mem2 = Mem1 + Mem1
  Mem4 = Mem3 + 1

Schedule:
  rA ← Mem1
  rA ← rA + rA
  Mem2 ← rA
  rA ← Mem3
  rA ← rA + 1
  Mem4 ← rA
20
Renaming using Physical Register File
  • Register file contains more registers than
    defined in ISA (logical registers)
  • Map logical register to physical registers during
    decode
  • Operands are always read from the physical
    register file
  • Used by MIPS R10000 and DEC 21264
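The mapping step can be sketched with a map table and a free list. This is a simplified model, not the R10000 or 21264 design: it assumes one destination per instruction and frees the previous mapping of a logical register when the overwriting instruction commits. `Renamer`, `decode` and `commit` are hypothetical names.

```python
from collections import deque


class Renamer:
    def __init__(self, logical, num_physical):
        # Each logical register starts out mapped to its own physical register.
        self.map = {name: i for i, name in enumerate(logical)}
        self.free = deque(range(len(logical), num_physical))
        self.inflight = deque()   # (logical, new_phys, old_phys), program order

    def decode(self, dest, srcs):
        """Map sources through the table, then allocate a new destination."""
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("stall: no free physical register")
        new = self.free.popleft()
        old = self.map[dest]
        self.map[dest] = new
        self.inflight.append((dest, new, old))
        return new, phys_srcs

    def commit(self):
        """Oldest instruction commits; the mapping it replaced becomes free."""
        _dest, _new, old = self.inflight.popleft()
        self.free.append(old)


r = Renamer(["f2", "f10"], num_physical=4)
print(r.decode("f2", ["f2", "f10"]))
print(r.decode("f2", ["f2"]))
```

With only four physical registers, a third decode would stall until a commit returns a register to the free list.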

21
Virtual-Physical Registers
  • Motivation: better utilization of physical
    registers
  • Important in presence of long latency
    instructions
  • Conventional scheme wastes a register for each:
  • Decoded instruction that has not finished
    execution
  • Committed instruction whose result is dead
  • The latter can be eliminated by maintaining a
    reference counter

Example:
  load f2,0(r6)
  fdiv f2,f2,f10
  fmul f2,f2,f12
  fadd f2,f2,1
22
Virtual-Physical Register Renaming
  • General Map Table
  • Indexed by logical register L
  • VP register: last virtual-physical register that
    L has been mapped to
  • P register: last physical register that L and VP
    have been mapped to
  • V-bit: indicates whether P is valid
  • Physical Map Table
  • Has entry for each VP
  • Contains last physical register that VP has been
    mapped to

23
Functional Description
  • For each logical source register S do a GMT
    lookup
  • If V-bit is set, rename S to P
  • Otherwise, rename S to VP
  • Rename the logical destination register to a new
    VP
  • Update GMT: set VP to new mapping and reset V
  • Save previous VP in reorder buffer to be able to
    roll back
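The rename steps above can be sketched as follows. This is an illustration only: the `GMTEntry` fields follow the VP, P and V fields of the general map table, while the function name, the dictionary-based reorder buffer entry and the initial table contents are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class GMTEntry:
    vp: int            # last virtual-physical register for this logical register
    p: Optional[int]   # last physical register, meaningful only if v is set
    v: bool            # V-bit: is p valid?


def rename(gmt, rob, fresh_vp, dest, srcs):
    operands = []
    for s in srcs:
        entry = gmt[s]                       # GMT lookup per source register
        operands.append(("P", entry.p) if entry.v else ("VP", entry.vp))
    new_vp = next(fresh_vp)                  # destination gets a new VP
    # Save the previous VP so the mapping can be rolled back.
    rob.append({"dest": dest, "prev_vp": gmt[dest].vp, "complete": False})
    gmt[dest] = GMTEntry(vp=new_vp, p=None, v=False)   # set VP, reset V
    return new_vp, operands


# Assumed initial state: every logical register already has a valid P.
gmt = {
    "f2": GMTEntry(0, 0, True),
    "f10": GMTEntry(1, 1, True),
    "r6": GMTEntry(2, 2, True),
}
rob = []
fresh_vp = count(3)
load = rename(gmt, rob, fresh_vp, "f2", ["r6"])          # load f2,0(r6)
fdiv = rename(gmt, rob, fresh_vp, "f2", ["f2", "f10"])   # fdiv f2,f2,f10
print(load, fdiv)
```

Note that no physical register is consumed here; the `fdiv` simply names the `load` result by its VP tag.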

24
Functional Description
  • Instruction Queue Fields
  • Operation code
  • Destination VP
  • Source operands
  • Ready bits for source operands; when ready, a
    source operand contains a physical register
    number
  • Reorder Buffer Entry
  • Destination logical register
  • Completion bit
  • VP mapping of last instruction with same logical
    destination

25
Functional Description
  • When source operands are ready, instruction is
    issued
  • When instruction completes
  • new physical register R is allocated for result
  • PMT is updated to reflect new mapping
  • VP number of destination is broadcast to all
    entries in instruction queue with physical
    register identifier
  • GMT is updated: the entry corresponding to the
    logical destination is checked for a match with
    the VP and, if so, the physical register number is
    copied to the P register field and the V flag is set
  • As a result a new instruction using same logical
    register will find corresponding physical
    register in GMT
  • Lastly, C flag of entry in reorder buffer is set
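The completion steps above can be sketched as one function over a small state dictionary. The data layout (plain dictionaries and lists, a simple free list of physical registers) and the example state are assumptions made for illustration; a source operand counts as ready once it is tagged `"P"`.

```python
def complete(state, rob_index, vp):
    """Handle completion of the instruction whose destination is `vp`."""
    p = state["free_phys"].pop(0)        # physical register allocated only now
    state["pmt"][vp] = p                 # PMT records the VP -> P mapping
    for entry in state["queue"]:         # broadcast VP together with P
        entry["srcs"] = [("P", p) if s == ("VP", vp) else s
                         for s in entry["srcs"]]
    rob_entry = state["rob"][rob_index]
    gmt_entry = state["gmt"][rob_entry["dest"]]
    if gmt_entry["vp"] == vp:            # still the newest mapping of the register?
        gmt_entry["p"], gmt_entry["v"] = p, True
    rob_entry["complete"] = True         # C flag
    return p


# Assumed state after renaming a load (VP 3) and an fdiv (VP 4) that both write f2.
state = {
    "free_phys": [5, 6],
    "pmt": {},
    "gmt": {"f2": {"vp": 4, "p": None, "v": False}},
    "queue": [{"op": "fdiv", "dest_vp": 4, "srcs": [("VP", 3), ("P", 1)]}],
    "rob": [
        {"dest": "f2", "prev_vp": 0, "complete": False},
        {"dest": "f2", "prev_vp": 3, "complete": False},
    ],
}
complete(state, 0, 3)   # the load completes
print(state["queue"][0]["srcs"], state["gmt"]["f2"])
```

Because the GMT entry for `f2` already points to VP 4, the load's completion wakes up the waiting `fdiv` but leaves the GMT untouched.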

26
Register Allocation Example
  • Uses more named registers
  • Scheduling more effective
  • 2-way super-scalar execution

Source:
  Mem2 = Mem1 + Mem1
  Mem4 = Mem3 + 1

Schedule (one line per cycle, two pipelines):
  rA ← Mem1       rB ← Mem3
  rA ← rA + rA    rB ← rB + 1
  Mem2 ← rA       Mem4 ← rB
27
Effect of Register Renaming
  • Schedule uses 4 hardware registers
  • 2-way super-scalar execution

Schedule (one line per cycle, two pipelines):
  rA1 ← Mem1        rB1 ← Mem3
  rA2 ← rA1 + rA1   rB2 ← rB1 + 1
  Mem2 ← rA2        Mem4 ← rB2
28
Effect of Register Renaming
  • Schedule uses 4 hardware registers
  • Can hide memory-write latency
  • Still no full use of multiple pipelines

Schedule:
  rA1 ← Mem1
  rA2 ← rA1 + rA1
  Mem2 ← rA2
  rA3 ← Mem3
  rA4 ← rA3 + 1
  Mem4 ← rA4
29
Renaming and O-O-O execution
  • Instructions wait for
  • Availability of execution unit
  • Input dependencies
  • Older instructions have priority
  • Load instructions have priority
  • Instructions do NOT wait for
  • Program order
  • Branch resolution
  • Output dependencies
  • (use rename register)

30
Renaming and O-O-O execution
  • Schedule uses 4 hardware registers
  • Can hide memory-write latency
  • Bad schedule uses both pipelines
  • Only one register name used

Schedule:
  rA1 ← Mem1
  rA2 ← rA1 + rA1
  Mem2 ← rA2
  rA3 ← Mem3
  rA4 ← rA3 + 1
  Mem4 ← rA4
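How the renamed single-name schedule ends up using both pipelines can be sketched with a toy out-of-order issue loop. It assumes single-cycle latency for every instruction, no memory dependencies between the distinct `Mem` locations, and a fully renamed program in which each register is written once. The `+` operator is not modeled; only operand names matter. `schedule` is a hypothetical name.

```python
def schedule(program, width=2):
    """program: list of (dest, srcs), each register written at most once.

    Returns one list of instruction indices per cycle.
    """
    producer = {dest: i for i, (dest, _srcs) in enumerate(program)}
    issued_in = {}                       # instruction index -> issue cycle
    pending = list(range(len(program)))
    cycles = []
    while pending:
        cycle = len(cycles)
        issued = []
        for i in pending:                # older instructions have priority
            if len(issued) == width:     # no execution unit left this cycle
                break
            deps = [producer[s] for s in program[i][1] if s in producer]
            # Wait only for input dependencies, not for program order.
            if all(d in issued_in and issued_in[d] < cycle for d in deps):
                issued.append(i)
        for i in issued:
            issued_in[i] = cycle
            pending.remove(i)
        cycles.append(issued)
    return cycles


renamed = [
    ("rA1", ["Mem1"]),
    ("rA2", ["rA1", "rA1"]),
    ("Mem2", ["rA2"]),
    ("rA3", ["Mem3"]),
    ("rA4", ["rA3", "1"]),
    ("Mem4", ["rA4"]),
]
print(schedule(renamed))
```

In this model the two independent chains are interleaved across the two pipelines, while a single pipeline (`width=1`) needs one cycle per instruction.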
31
Renaming aware scheduling?
  • Use Register Renaming in allocator
  • minimal number of named registers
  • maximal number of register instances
  • Do not do scheduling that CPU can do
  • over-scheduling can be worse than no scheduling
    at all