1
Compiler-Controlled Dual-Path Branch Execution
  • Eberhard Zehendner
  • Andreas Unger
  • (Friedrich-Schiller-Universität Jena)
  • Theo Ungerer
  • (Universität Karlsruhe)
  • please mail your suggestions to zehendner_at_acm.org

2
Branch Instructions ...
  • inhibit scheduling (control dependence)
  • misprediction penalty (single-path speculation)
  • resource conflicts (branch unit saturation)

3
Processor Architecture
  • Base architecture
      • wide-issue superscalar (many ALUs, few branch units, few load/store units)
      • condition codes stored in general registers (as, for instance, in the Alpha architecture)
  • Thread-handling extensions
      • instruction set extensions FORK, SYNCcc
      • thread-handling unit
      • instruction fetch from several threads
      • register set shared between threads

4
Thread-handling Instructions
  • FORK LL, Rx
      • during instruction fetch
          • creates a new thread, beginning at label LL
          • immediately starts fetching from the new thread (in parallel with fetching from the current thread)
      • during instruction execution
          • passes back a thread descriptor in register Rx
          • one-cycle operation
  • SYNCcc Rx, Ry, Rz
      • during instruction execution
          • tests for condition cc on register Rx (cc may be one of eq, lt, le)
          • immediately terminates the thread whose descriptor is in register Rz if the condition is fulfilled
          • immediately terminates the thread whose descriptor is in register Ry otherwise
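A minimal Python sketch of the FORK/SYNCcc semantics, modelled at the level of thread descriptors only (no fetch or timing). The class ThreadUnit, its methods, and the register names are hypothetical. The slides do not say what cc is compared against; the sketch assumes, as with Alpha conditional branches, that Rx is compared with zero.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumption: as with Alpha conditional branches, cc compares Rx against zero.
CONDITIONS = {
    "eq": lambda v: v == 0,
    "lt": lambda v: v < 0,
    "le": lambda v: v <= 0,
}


@dataclass
class ThreadUnit:
    """Hypothetical model of the thread-handling unit (descriptors only)."""
    regs: Dict[str, int]                                  # register file shared by all threads
    live: Dict[int, str] = field(default_factory=dict)   # descriptor -> start label
    next_id: int = 0

    def spawn(self, label: str) -> int:
        """Allocate a fresh thread descriptor for a thread starting at label."""
        tid = self.next_id
        self.next_id += 1
        self.live[tid] = label
        return tid

    def fork(self, label: str, rx: str) -> None:
        """FORK LL, Rx: create a thread at LL and write its descriptor to Rx."""
        self.regs[rx] = self.spawn(label)

    def sync(self, cc: str, rx: str, ry: str, rz: str) -> int:
        """SYNCcc Rx, Ry, Rz: terminate thread Rz if cc holds on Rx, else thread Ry.

        Returns the descriptor of the terminated thread.
        """
        victim = self.regs[rz] if CONDITIONS[cc](self.regs[rx]) else self.regs[ry]
        self.live.pop(victim, None)  # terminating an already terminated thread is a no-op here
        return victim
```

For example, with Rz holding the running thread's descriptor (obtained from spawn), fork("LL", "Ry") followed by sync("eq", "Rx", "Ry", "Rz") with Rx = 0 terminates the running (fall-through) thread and leaves only the thread started at LL.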

5
Simultaneous Speculation Scheduling (S3)
  • Step 1: replace the branch with FORK/SYNC (Rz contains the descriptor of the running thread)
      original code            transformed code
          Bcc Rx, LL               FORK LL, Ry
          instruction_1            SYNCcc Rx, Ry, Rz
          .....                    instruction_1
                                   .....
      LL: instruction_2        LL: SYNCcc Rx, Ry, Rz
          .....                    instruction_2
                                   .....
  • Step 2: schedule code in both threads
      • scheduling constraints
      • static register renaming
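Step 1 can be sketched as a rewrite over a toy instruction list. The tuple format (label, opcode, operands), the mnemonics Beq/Blt/Ble and SYNCeq/SYNClt/SYNCle, and the function name s3_step1 are assumptions made for illustration; Step 2 (scheduling and static register renaming) is not modelled.

```python
from typing import List, Optional, Tuple

# Toy IR (hypothetical): (label or None, opcode, operand tuple).
Instr = Tuple[Optional[str], str, Tuple[str, ...]]

CONDITIONS = ("eq", "lt", "le")  # the cc values allowed for SYNCcc


def s3_step1(code: List[Instr], idx: int, ry: str, rz: str) -> List[Instr]:
    """Replace the branch Bcc Rx, LL at code[idx] by FORK plus two SYNCcc.

    Assumes ry is a dead register, rz already holds the running thread's
    descriptor, and the branch target label LL appears on exactly one instruction.
    """
    label, op, (rx, target) = code[idx]
    cc = op[1:]
    if not (op.startswith("B") and cc in CONDITIONS):
        raise ValueError(f"not a conditional branch: {op}")
    sync = ("SYNC" + cc, (rx, ry, rz))

    out: List[Instr] = []
    for i, (lab, opc, args) in enumerate(code):
        if i == idx:
            out.append((label, "FORK", (target, ry)))  # fork a thread at LL
            out.append((None,) + sync)                 # SYNC of the fall-through thread
        elif lab == target:
            out.append((lab,) + sync)                  # SYNC at LL, in the new thread
            out.append((None, opc, args))              # instruction_2 now follows it
        else:
            out.append((lab, opc, args))
    return out
```

Applied to [(None, "Beq", ("Rx", "LL")), (None, "instruction_1", ()), ("LL", "instruction_2", ())] with idx 0, ry "Ry" and rz "Rz", the sketch yields FORK LL, Ry; SYNCeq Rx, Ry, Rz; instruction_1; LL: SYNCeq Rx, Ry, Rz; instruction_2, i.e. the transformed column of the listing.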

6
S3 - Creating a New Thread
  before (Ry is dead; Rz holds the descriptor of the running thread):
          A
          Bcc Rx, LL
          B
      LL: C
  after:
          fall-through thread      new thread
          A
          FORK LL2, Ry
          SYNCcc Rx, Ry, Rz        LL2: SYNCcc Rx, Ry, Rz
                                        mv Ry → Rz
          B
      LL: C
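Moving Ry into Rz (mv) in the new thread appears to restore the Step 1 convention that Rz holds the descriptor of the running thread. The following hypothetical trace runs both threads over a shared register file and checks that Rz names the surviving thread in every outcome; with_mv=False shows what goes wrong without the move. The thread names T0/T1, the choice of SYNCeq, and the one-thread-at-a-time ordering are assumptions of the sketch.

```python
from typing import Tuple


def run(rx_value: int, first: str, with_mv: bool = True) -> Tuple[str, str]:
    """Trace the code above; return (surviving thread, thread named by Rz)."""
    regs = {"Rx": rx_value, "Rz": "T0"}  # T0: the running (fall-through) thread
    live = {"T0"}

    # FORK LL2, Ry: new thread T1 starts at LL2; its descriptor goes to Ry.
    regs["Ry"] = "T1"
    live.add("T1")

    def sync() -> None:
        # SYNCeq Rx, Ry, Rz (assumed: eq tests Rx == 0)
        live.discard(regs["Rz"] if regs["Rx"] == 0 else regs["Ry"])

    def mv_ry_rz() -> None:
        regs["Rz"] = regs["Ry"]

    programs = {
        "T0": [sync],                                   # SYNC, then block B
        "T1": [sync, mv_ry_rz] if with_mv else [sync],  # LL2: SYNC; mv Ry -> Rz
    }
    # Simplification: each thread runs its instructions back to back.
    for tid in (first, "T1" if first == "T0" else "T0"):
        for instr in programs[tid]:
            if tid not in live:  # a terminated thread executes nothing further
                break
            instr()
    (survivor,) = live
    return survivor, regs["Rz"]


for rx in (0, 1):
    for first in ("T0", "T1"):
        survivor, rz = run(rx, first)
        print(f"Rx={rx}, {first} syncs first: survivor {survivor}, Rz names {rz}")
```

Without the move, a taken branch leaves the new thread running while Rz still names the terminated fall-through thread, so a later FORK/SYNC pair relying on Rz would refer to the wrong thread.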
7
S3 - Some Guidelines
  • unpredictable branches should be the primary
    targets of the method
  • replacing a branch instruction by a FORK and two SYNC instructions may already remove the misprediction penalty
  • critical paths can be further optimised by moving
    instructions to the speculative sections (between
    FORK and SYNC)
  • large speculative sections tend to waste fetch,
    issue, and execute resources
  • late-resolving predicates used in small
    speculative sections may block the retirement
    process

8
Example
  • code section from the innermost loop of function compress() in the SPECint95 benchmark suite

9
Example (branch taken)
10
Example (branch not taken)
11
S3 - Speculative Code Motion
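[Figure: speculative code motion; blocks A, B, C, D, and E, with FORK LL2, Ry and SYNCcc Rx, Ry, Rz in both threads and labels LL2 (block C) and LL3 (block E); instruction p is moved into a speculative section between FORK and SYNC]
Conditions for moving p: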
  • Rx, Ry, Rz are not targets of p
  • all targets of p are dead at LL3
  • no target of p is source or target in C
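The three conditions on p can be read as a legality check for speculative code motion. A minimal sketch, assuming the compiler already has per-instruction def/use register sets and liveness at LL3; the Instr type and the function name are hypothetical. The likely reason behind the last two conditions is that the register set is shared between threads, so a write by p persists even if its thread is terminated and must not disturb block C, which runs concurrently in the other thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Instr:
    """Hypothetical summary of an instruction: registers written and read."""
    defs: FrozenSet[str]
    uses: FrozenSet[str]


def may_move_into_speculative_section(
    p: Instr,
    sync_regs: FrozenSet[str],    # Rx, Ry, Rz used by FORK/SYNCcc
    live_at_ll3: FrozenSet[str],  # registers live on entry to LL3 (from liveness analysis)
    c_block: Iterable[Instr],     # block C, executed concurrently by the other thread
) -> bool:
    """Check the three slide conditions before moving p between FORK and SYNC."""
    if p.defs & sync_regs:        # Rx, Ry, Rz are not targets of p
        return False
    if p.defs & live_at_ll3:      # all targets of p are dead at LL3
        return False
    c_regs = frozenset().union(*(i.defs | i.uses for i in c_block))
    return not (p.defs & c_regs)  # no target of p is source or target in C
```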

12
S3 - Code Duplication ?
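[Figure: instruction p placed after block A, in front of FORK LL2, Ry; the new thread starts at LL2 with block C, followed by SYNCcc Rx, Ry, Rz; label LL3 marks block E]
Conditions for moving p: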
  • Rx, Ry, Rz are not targets of p
  • all targets of p are dead at LL3
  • no target of p is source or target in C

13
Comparison
  • Branch prediction
      • fails for unpredictable branches
      • small instruction window for scheduling
  • Hardware eager execution
      • small instruction window for scheduling
  • Pure software single-path speculation
      • fails for statically unpredictable branches
  • Pure software dual-path speculation
      • speculative instructions are placed in front of the branch and thus all have to be executed
      • misprediction penalty remains the same
  • Conditional move instructions
      • speculative instructions are placed in front of the conditional move and thus all have to be executed
  • Predication
      • speculative instructions are merged to form a single instruction stream and thus all have to be fetched (and possibly issued or executed)
      • diverging paths may re-introduce misprediction
14
Advantages of S3
  • fewer restrictions for scheduling
      • fits into known techniques for global scheduling
  • no misprediction penalties
      • both branch paths are predicted simultaneously
  • saving fetch, issue, and execute resources
      • fastest speculation path terminates speculation
  • branch unit is relieved
      • one fewer branch instruction per replacement
  • moderate hardware overhead
      • small extension to the usual instruction set
      • thread-control unit works similarly to the branch unit
      • can make use of modern fetching techniques