Continuous Optimization

About This Presentation


Description:

Brian Fahs, Todd Rafacz, Sanjay J. Patel, Steven S. Lumetta. Advanced Computer Systems Group ... S. Onder and R. Gupta. G. S. Tyson and T. M. Austin. 19. Advanced ... (PowerPoint presentation)


Slides: 24

Provided by: brian345

Learn more at: https://pages.cs.wisc.edu


Transcript and Presenter's Notes


1
Continuous Optimization

Brian Fahs, Todd Rafacz, Sanjay J. Patel, Steven S. Lumetta
Advanced Computer Systems Group
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign

2
Continuous Optimization

Concept
Optimize instructions in processor pipeline
Technique
Streaming table-based optimization hardware
Motivation
Reduce dataflow height
Pre-execute instructions
Catch branch mispredictions early

3
Outline

Continuous optimizer design
Performance characterization
Current work

4
Continuous Optimization
5
Symbolic Values

Expression format
Simple enough to implement
Optimize a large fraction of instructions

value = (physical register << scale) ± offset
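To make the symbolic-value format concrete, here is a minimal Python sketch. It assumes that a signed offset covers the ± case and that a value with no base register stands for a known constant. The class and field names (SymbolicValue, base, scale, offset) are hypothetical and are not taken from the hardware design.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SymbolicValue:
    """Hypothetical model of: value = (physical register << scale) +/- offset."""
    base: Optional[int]  # physical register number, or None for a known constant
    scale: int = 0       # left-shift amount applied to the base register
    offset: int = 0      # signed, so one field covers both "+" and "-" offsets

    def is_constant(self) -> bool:
        return self.base is None

    def evaluate(self, regfile: Dict[int, int]) -> int:
        """Compute the concrete value once the base register's value is known."""
        if self.base is None:
            return self.offset
        return (regfile[self.base] << self.scale) + self.offset

    def add_immediate(self, imm: int) -> "SymbolicValue":
        """Adding an immediate only changes the offset; the base stays the same."""
        return SymbolicValue(self.base, self.scale, self.offset + imm)


# (pr32 << 2) - 4, evaluated with pr32 = 10
v = SymbolicValue(base=32, scale=2, offset=-4)
print(v.evaluate({32: 10}))  # 36
print(v.add_immediate(5))    # SymbolicValue(base=32, scale=2, offset=1)
```

A signed offset lets subtracting an immediate reuse the same path as adding one.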
6
Optimizer Organization
Computation Simplification
Memory Simplification
7
Computation Simplification
RAT (register alias table) and CP/RA (constant propagation / reassociation) table
CP/RA optimizer logic
add r3, 1 -> r6
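One way to picture the CP/RA table is as a second map beside the RAT. The RAT gives each architectural register its physical register, and the CP/RA table gives that register's symbolic value. The sketch below is a hypothetical software model that processes add r3, 1 -> r6 this way. The class CpRaTable, its methods define and add_imm, and the starting physical register number 32 are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Symbolic value (base physical register or None, scale, offset):
# means (base << scale) + offset, or just offset when base is None.
Sym = Tuple[Optional[int], int, int]


class CpRaTable:
    """Hypothetical model of the RAT plus the CP/RA table."""

    def __init__(self, first_free_preg: int = 32) -> None:
        self.rat: Dict[str, int] = {}  # architectural register -> physical register
        self.sym: Dict[str, Sym] = {}  # architectural register -> symbolic value
        self.next_preg = first_free_preg

    def _allocate(self, arch: str) -> int:
        preg = self.next_preg
        self.next_preg += 1
        self.rat[arch] = preg
        return preg

    def define(self, arch: str) -> int:
        """Result the optimizer cannot describe: its value is just its new register."""
        preg = self._allocate(arch)
        self.sym[arch] = (preg, 0, 0)
        return preg

    def add_imm(self, src: str, imm: int, dst: str) -> Sym:
        """Rename `add src, imm -> dst` and record dst's value in terms of src's base."""
        base, scale, offset = self.sym[src]
        self._allocate(dst)
        self.sym[dst] = (base, scale, offset + imm)
        return self.sym[dst]


t = CpRaTable()
t.define("r3")            # r3 -> pr32, value pr32
t.add_imm("r3", 1, "r6")  # r6 -> pr33, value pr32 + 1
t.add_imm("r6", 4, "r7")  # r7 -> pr34, value pr32 + 5 (no longer waits on r6)
```

In this model, the last instruction no longer depends on r6. Its value is expressed directly in terms of pr32, which is the reassociation the table makes possible.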
8
Computation Simplification: Three Cases
Optimization not possible
Early execution
Dataflow height reduction
add r3, 1 -> r6
add pr32, 1 -> pr38
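The three cases can be sketched for an add-immediate as follows. This is a hypothetical model, not the hardware's actual decision logic. It assumes that a source which is a bare physical register offers nothing to simplify, while a source already described as (register << scale) + offset can absorb the new immediate. The function name optimize_add_imm and the returned case labels are illustrative.

```python
from typing import Optional, Tuple

# Hypothetical symbolic value (base physical register or None, scale, offset):
# (base << scale) + offset, or just offset when base is None.
Sym = Tuple[Optional[int], int, int]


def optimize_add_imm(src: Sym, imm: int) -> Tuple[str, Sym]:
    """Sort `add src, imm -> dst` into one of the three cases; return dst's value."""
    base, scale, offset = src
    if base is None:
        # Source is a known constant: compute the result in the optimizer.
        return "early execution", (None, 0, offset + imm)
    if scale == 0 and offset == 0:
        # Source is a bare register with nothing to fold: the renamed
        # instruction goes to the execution core as is.
        return "optimization not possible", (base, 0, imm)
    # Source is (base << scale) + offset: fold imm into the offset so the result
    # depends on base directly, skipping the instruction that produced src.
    return "dataflow height reduction", (base, scale, offset + imm)


print(optimize_add_imm((None, 0, 41), 1))  # ('early execution', (None, 0, 42))
print(optimize_add_imm((32, 0, 0), 1))     # ('optimization not possible', (32, 0, 1))
print(optimize_add_imm((32, 0, 3), 1))     # ('dataflow height reduction', (32, 0, 4))
```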
9
Memory Simplification
Data address (0x12345 in the example) produced during Computation Simplification
RLE/SF/SSR (redundant load elimination / store forwarding / silent store removal) table
An unknown store address flushes the table
RLE/SF/SSR optimizer logic
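A minimal sketch of the table's bookkeeping, assuming that each entry maps a known data address to the physical register last seen holding that location's value. The class MemTable and its methods are hypothetical names. The rule that matters is taken from the slide: a store with an unknown address may alias any entry, so the whole table is flushed.

```python
from typing import Dict, Optional


class MemTable:
    """Hypothetical RLE/SF/SSR table: data address -> register holding that location's value."""

    def __init__(self) -> None:
        self.entries: Dict[int, str] = {}

    def lookup(self, addr: int) -> Optional[str]:
        return self.entries.get(addr)

    def remember(self, addr: int, preg: str) -> None:
        self.entries[addr] = preg

    def store_seen(self, addr: Optional[int], preg: str) -> None:
        """Update the table for a store whose address may not be known yet."""
        if addr is None:
            # The store could write any address, so every entry may be stale.
            self.entries.clear()
        else:
            self.entries[addr] = preg


mem = MemTable()
mem.store_seen(0x12345, "pr38")  # st r6 -> 0x12345, assuming r6 is mapped to pr38
print(mem.lookup(0x12345))       # pr38
mem.store_seen(None, "pr40")     # store with unknown address: flush
print(mem.lookup(0x12345))       # None
```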
10
Optimizing Loads
Optimizing Stores
st r6 -> 0x12345
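This slide gives only titles. The following is therefore a hypothetical sketch of how loads and stores might consult an address table. A load whose address already has a tracked register is forwarded from that register. A store that would write back the same register already recorded at its address is treated as silent. The table is a plain dict from address to physical-register name. The function names and return labels are illustrative.

```python
from typing import Dict, Optional, Tuple


def optimize_load(table: Dict[int, str], addr: Optional[int],
                  dst: str) -> Tuple[str, Optional[str]]:
    """Forward a load from the table when its address is known and tracked."""
    if addr is None:
        return "issued", None            # address not known yet: load executes normally
    if addr in table:
        return "forwarded", table[addr]  # load becomes a copy of the tracked register
    table[addr] = dst                    # later loads of addr can reuse dst
    return "issued", None


def optimize_store(table: Dict[int, str], addr: Optional[int], src: str) -> str:
    """Drop a store that writes the register already known to be at that address."""
    if addr is None:
        table.clear()                    # unknown store address flushes the table
        return "issued"
    if table.get(addr) == src:
        # Same physical register means same value (assuming it has not been
        # reallocated since), so the store is silent. Equal values held in
        # different registers are not detected by this check.
        return "silent"
    table[addr] = src
    return "issued"


table: Dict[int, str] = {}
print(optimize_store(table, 0x12345, "pr38"))  # issued
print(optimize_load(table, 0x12345, "pr40"))   # ('forwarded', 'pr38')
print(optimize_store(table, 0x12345, "pr38"))  # silent
```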
11
Value Feedback
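The slide gives no detail on value feedback. Read together with "Feedback of execution results" in the list of performance factors, one hypothetical interpretation is as follows. When a physical register's value arrives from the execution core, table entries expressed in terms of that register become constants, which lets later instructions execute early. A sketch under that assumption, where apply_feedback and the table layout are illustrative:

```python
from typing import Dict, Optional, Tuple

# Hypothetical symbolic value (base physical register or None, scale, offset).
Sym = Tuple[Optional[int], int, int]


def apply_feedback(table: Dict[str, Sym], preg: int, value: int) -> None:
    """Execution reports that physical register `preg` holds `value`."""
    for arch, (base, scale, offset) in list(table.items()):
        if base == preg:
            # (preg << scale) + offset is now fully known: store it as a constant.
            table[arch] = (None, 0, (value << scale) + offset)


table = {"r3": (32, 0, 0), "r6": (32, 2, 1), "r7": (33, 0, 5)}
apply_feedback(table, 32, 10)
print(table)  # {'r3': (None, 0, 10), 'r6': (None, 0, 41), 'r7': (33, 0, 5)}
```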
12
Implementation Issues

Processing dependent instructions (default: no)

Optimizer latency (default: 2 stages)

Execution feedback delay (default: 1 cycle)
Figure: pipeline diagram of two dependent add r1, 1, r1 instructions passing through the fetch, optimizer, and execute stages, with the transmit (Xmit) delay marked
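Here is one hypothetical reading of the "processing dependent instructions" parameter, using the two back-to-back add r1, 1, r1 instructions shown on this slide. The optimizer is assumed to handle a group of instructions in the same cycle. When the option is off, an instruction cannot see a result produced by an earlier instruction in its own group. The sketch tracks only known constants. optimize_group and its arguments are illustrative names.

```python
from typing import Dict, List, Optional, Tuple

Add = Tuple[str, int, str]  # add src, imm -> dst, using architectural registers


def optimize_group(consts: Dict[str, int], group: List[Add],
                   dependent_processing: bool) -> List[Tuple[str, Optional[int]]]:
    """Constant-fold a group of add-immediates handled in the same optimizer cycle.

    consts maps architectural registers to known constant values.
    """
    written = set()
    results: List[Tuple[str, Optional[int]]] = []
    for src, imm, dst in group:
        if src in written and not dependent_processing:
            value = None               # producer is in this same group: not visible yet
        elif src in consts:
            value = consts[src] + imm  # early execution
        else:
            value = None
        if value is None:
            consts.pop(dst, None)      # dst's old constant no longer applies
        else:
            consts[dst] = value
        results.append((dst, value))
        written.add(dst)
    return results


pair = [("r1", 1, "r1"), ("r1", 1, "r1")]        # add r1, 1, r1 twice
print(optimize_group({"r1": 5}, pair, True))   # [('r1', 6), ('r1', 7)]
print(optimize_group({"r1": 5}, pair, False))  # [('r1', 6), ('r1', None)]
```

With the option off, the second add falls back to normal execution. This is the cost that the default setting ("no") accepts in exchange for simpler optimizer hardware, under this reading.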
13
Performance Evaluation

Experimental Setup
Alpha ISA
SPECint, SPECfp, and mediabench
Pentium 4 style pipeline
Minimum of 20 stages to branch resolution in the baseline
Minimum of 22 stages with the continuous optimizer (its 2 stages added)

14
Performance
Average speedup
15
Performance Factors

Dataflow height reduction
Early instruction execution
Early branch resolution
Removal of forwarded loads
Silent store removal
Early load address resolution
Feedback of execution results

16
Optimizer Performance
Benchmark | Executed early | Recovered mispredicted branches | Load/store addresses generated | Loads forwarded | Silent stores removed
SPECint | 32 | 10 | 73 | 13 | 3
SPECfp | 27 | 29 | 78 | 17 | 3
mediabench | 42 | 20 | 97 | 32 | 3
Average | 34 | 20 | 83 | 21 | 3
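The average row is consistent with the unweighted mean of the three suite rows, rounded to the nearest integer. For example, (32 + 27 + 42) / 3 is about 33.7, which appears as 34. A quick check:

```python
# Per-suite values from the table above, in column order.
suites = {
    "SPECint":    [32, 10, 73, 13, 3],
    "SPECfp":     [27, 29, 78, 17, 3],
    "mediabench": [42, 20, 97, 32, 3],
}

# Unweighted mean of each column, rounded to the nearest integer.
averages = [round(sum(column) / len(column)) for column in zip(*suites.values())]
print(averages)  # [34, 20, 83, 21, 3]
```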
17
Performance Factors
No early load address resolution
No early branch resolution
No early execution
No feedback
18
Related Work