Application of Instruction Analysis/Synthesis Tools to x86's Functional Unit Allocation

1
Application of Instruction Analysis/Synthesis
Tools to x86's Functional Unit Allocation
  • Ing-Jer Huang and Ping-Huei Xie
  • Institute of Computer & Information Engineering
  • National Sun Yat-sen University
  • Kaohsiung, Taiwan 80441
  • R. O. C.
  • ijhuang_at_cie.nsysu.edu.tw

2
Superscalar Model under Investigation
  • Decoupled superscalar architecture
    • register renaming
    • branch prediction
  • Assumptions
    • no cache misses
    • fast instruction fetcher and decoder
    • 100% correct branch prediction
    • load/store unit: 2 cycles; others: 1 cycle
    • large reservation stations (RS) and reorder
      buffer (ROB)
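
Under these assumptions, the start time of a micro-operation is limited only by its true data dependencies and the unit latencies. The following is a minimal sketch of that idea, not the authors' tool: the `(kind, deps)` tuple format and the name `asap_schedule` are hypothetical, and only the two latency values come from the slide.

```python
# Latencies from the slide: load/store ops take 2 cycles, all others 1 cycle.
LATENCY = {"mem": 2}
DEFAULT_LATENCY = 1


def asap_schedule(ops):
    """Return the earliest start cycle of each op, assuming unlimited units.

    `ops` is a list of (kind, deps) pairs in program order; `deps` holds the
    indices of earlier ops whose results this op reads. This format is a
    hypothetical illustration, not the data model of the original tool.
    """
    start = []
    for kind, deps in ops:
        # An op may start once every producer it reads from has finished.
        ready = max(
            (start[d] + LATENCY.get(ops[d][0], DEFAULT_LATENCY) for d in deps),
            default=0,
        )
        start.append(ready)
    return start


ops = [
    ("mem", []),      # 0: load
    ("alu", [0]),     # 1: uses the loaded value
    ("alu", []),      # 2: independent of the load
    ("mem", [1, 2]),  # 3: store of a value computed from ops 1 and 2
]
print(asap_schedule(ops))  # [0, 2, 0, 3]
```

The load delays its consumer by two cycles, while the independent operation can start immediately.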

3
The Problem
  • Q: How many functional units are needed in an
    x86-compatible superscalar core?
  • A: It depends on the distribution of functional
    unit usage in typical x86 programs

4
How to Obtain FU Distribution?
  • Simulation-based approaches
    • Shinatani, 1995; Davidson, 1995; Hara et al.,
      1996; etc.
    • Run on different CPU platforms
    • Slow, but can explore many configurations
  • Monitoring-based approaches
    • Adams et al., 1989; Bhandarkar et al., 1997;
      Huang, 1997; etc.
    • Run directly on the same CPU platform
    • Fast, but work only for the configuration of
      the underlying CPU platform

5
A Fast Performance/Cost Approximation Environment

6
ASIA: Automatic Synthesis of Instruction Set
Architecture
  • GOAL: analyze and synthesize application-specific
    instruction sets for pipelined uniprocessors.
  • APPROACH: a micro-operation scheduling engine
    based on a simulated annealing algorithm
  • ⇒ The superscalar core is an application-specific
    RISC core for x86 emulation

7
ASIA-II Extensions for Superscalar Architecture
  • Register renaming
    • Temporary registers are used on the fly to
      resolve anti and output dependencies.
  • Execution window
    • Instructions are dispatched sequentially.
  • Branch prediction
    • Effective sizes of basic blocks are enlarged.

8
Register Renaming
  • In ASIA-II: output and anti dependencies are
    ignored during scheduling
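
To illustrate what ignoring these dependencies means for the scheduler, here is a minimal sketch that classifies register dependencies and keeps only the true (read-after-write) ones when renaming is assumed. The `(dest, sources)` operation format and the function names are hypothetical and do not reflect ASIA-II's internal representation.

```python
def register_dependencies(ops):
    """Classify dependencies between ops given as (dest, sources) pairs.

    Returns a sorted list of (earlier, later, kind) with kind in
    {"RAW", "WAR", "WAW"}. The op format is a hypothetical illustration.
    """
    last_writer = {}  # register -> index of the op that last wrote it
    readers = {}      # register -> ops that read it since its last write
    deps = set()
    for j, (dst, srcs) in enumerate(ops):
        for reg in srcs:
            if reg in last_writer:            # true (data) dependency
                deps.add((last_writer[reg], j, "RAW"))
        for i in readers.get(dst, []):        # anti dependency
            deps.add((i, j, "WAR"))
        if dst in last_writer:                # output dependency
            deps.add((last_writer[dst], j, "WAW"))
        for reg in srcs:
            readers.setdefault(reg, []).append(j)
        last_writer[dst] = j
        readers[dst] = []                     # a new value has no readers yet
    return sorted(deps)


def scheduling_dependencies(ops, renaming=True):
    """With renaming, only RAW dependencies constrain the schedule."""
    deps = register_dependencies(ops)
    if renaming:
        deps = [d for d in deps if d[2] == "RAW"]
    return deps


ops = [
    ("r1", ["r2"]),  # 0
    ("r3", ["r1"]),  # 1: reads r1 written by op 0
    ("r2", ["r4"]),  # 2: overwrites r2, which op 0 read
    ("r1", ["r5"]),  # 3: overwrites r1, written by op 0 and read by op 1
]
print(scheduling_dependencies(ops, renaming=False))
print(scheduling_dependencies(ops, renaming=True))  # [(0, 1, 'RAW')]
```

In this example, renaming leaves a single ordering constraint out of four, which is what gives the scheduler room to reorder micro-operations.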

9
Realistic Patterns in the Execution Window
  • Balanced distribution: the objective function
    includes both time steps and H/W counts
  • Window effect: MOPs are displaced over a limited
    distance; a long-distance move is possible
    through many iterations of displacement, as long
    as performance is improved.

10
Basic Block Expansion (Eblocks) Due to Branch Prediction

11
A Small Example from Word97

12
Extended Basic Blocks

13
Scheduled Eblocks

14
A Small Example - FU Usage

15
Description of Benchmark

16
Micro-operation Level Parallelism (MLP)

17
Functional Unit Usage
  • Notation
    • A - Integer unit
    • M - Memory unit
    • B - Branch unit
    • F - Floating-point unit
  • Others: the sum of all entries whose frequency
    is less than 1.0%
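
As a rough illustration of how such a usage table could be derived from a schedule, the sketch below counts the functional units used in each time step and folds rare patterns into "Others". The signature strings such as `2A1M`, the function names, and the reading of the 1.0 threshold as a percentage are assumptions made for this example.

```python
from collections import Counter

FU_ORDER = "AMBF"  # integer, memory, branch, floating-point (slide notation)


def step_signature(kinds_in_step):
    """Encode the units used in one time step, e.g. ['A','A','M'] -> '2A1M'.

    The string format is a hypothetical choice for this sketch.
    """
    counts = Counter(kinds_in_step)
    return "".join(f"{counts[k]}{k}" for k in FU_ORDER if counts[k])


def usage_distribution(steps, kinds, threshold=1.0):
    """Percentage of time steps per signature; rare ones go to 'Others'."""
    per_step = {}
    for s, k in zip(steps, kinds):
        per_step.setdefault(s, []).append(k)
    signatures = Counter(step_signature(v) for v in per_step.values())
    total = sum(signatures.values())
    dist = {}
    for sig, n in signatures.items():
        pct = 100.0 * n / total
        if pct < threshold:
            dist["Others"] = dist.get("Others", 0.0) + pct
        else:
            dist[sig] = pct
    return dist


# Made-up schedule: op i runs in time step steps[i] on a unit of kinds[i].
steps = [0, 0, 0, 1, 1, 2, 3, 3, 3]
kinds = ["A", "A", "M", "A", "M", "B", "A", "M", "A"]
print(usage_distribution(steps, kinds))
# {'2A1M': 50.0, '1A1M': 25.0, '1B': 25.0}
```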

18
Accumulated Coverage of Functional Unit Allocation
[Figure: accumulated coverage, with labeled points for NSC 98, IA-64, AMD K6, Pentium Pro, and the Base Machine]
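
One plausible reading of "accumulated coverage" is the share of time steps whose functional unit demand fits within a given allocation. The sketch below computes that quantity under this assumption. The distribution and the allocations are invented for illustration and do not correspond to the machines named on the slide.

```python
def coverage(distribution, allocation):
    """Sum the percentages of all demand patterns the allocation satisfies.

    `distribution` is a list of (demand, percent) pairs, where `demand` maps
    an FU kind (A, M, B, F) to the number of units used in one time step.
    """
    covered = 0.0
    for demand, pct in distribution:
        if all(n <= allocation.get(kind, 0) for kind, n in demand.items()):
            covered += pct
    return covered


# Hypothetical data, not measurements from the presentation.
distribution = [
    ({"A": 1}, 40.0),
    ({"A": 1, "M": 1}, 30.0),
    ({"A": 2, "M": 1}, 20.0),
    ({"A": 2, "M": 2, "B": 1}, 10.0),
]

print(coverage(distribution, {"A": 1, "M": 1}))          # 70.0
print(coverage(distribution, {"A": 2, "M": 1, "B": 1}))  # 90.0
```

Sweeping the allocation from small to large yields a curve of this kind, on which a concrete design can be placed by its unit counts.
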
19
Conclusions
  • Synthesis/analysis tools have been used to
    observe the functional unit usage and MLP in a
    superscalar core.
  • Speedup over simulation is more than 600 times.
  • FUTURE WORK: investigate various
    microarchitecture features
    • register renaming vs. branch prediction
    • functional unit optimization