KILOINSTRUCTION PROCESSORS - PowerPoint PPT Presentation

1
KILO-INSTRUCTION PROCESSORS
  • Arzucan Özgür
  • Department of Computer Engineering
  • Bogaziçi University

15.12.2005 Cmpe 511
2
Introduction
3
Memory Wall
[Figure: processor vs. memory performance over Time (1980-2000), with Performance on a log scale from 1 to 1000. CPU (Moore's Law): 60%/yr. RAM: 7%/yr. Processor-Memory Performance Gap grows 50%/year.]
  • Performance improvements of high-frequency
    microprocessors are seriously limited by main
    memory access latencies

4
Reducing Memory Latency
5
Cache memory hierarchies
  • Cache memory hierarchies
  • First level (L1) cache built into the processor
    core
  • Takes 1-3 processor clock cycles to access
  • If there is a miss in the L1 cache → the on-chip
    L2 cache is accessed, on the order of 10
    processor cycles
  • Accessing main memory takes on the order of 100
    processor cycles or more
  • Prefetching data from memory to the cache
  • Prefetch addresses hard to predict
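
The latencies listed above can be folded into an average memory access time. The following is a minimal sketch: the cycle counts are the order-of-magnitude figures from the slide, while the two miss rates and the function name are hypothetical values chosen only for illustration.

```python
def average_access_cycles(l1_hit, l2_hit, mem, l1_miss_rate, l2_miss_rate):
    """Average memory access time (cycles) for a two-level cache hierarchy."""
    # Every access pays the L1 latency; L1 misses also pay the L2 latency;
    # L2 misses additionally pay the main-memory latency.
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem)

# Latencies follow the slide (orders of magnitude); miss rates are assumed.
amat = average_access_cycles(l1_hit=2, l2_hit=10, mem=100,
                             l1_miss_rate=0.05, l2_miss_rate=0.20)
print(f"average access time: {amat:.2f} cycles")
```

Even with low miss rates, the main-memory term accounts for a large share of the average, which is why the remaining slides focus on tolerating that latency rather than only reducing it.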

6
Out-of-order superscalar processors
7
Sequence of instructions containing data cache
misses
8
Kilo-Instruction Processors
9
Definition
  • An out-of-order superscalar processor that
    supports thousands of in-flight instructions
  • Intelligent use of resources

10
(No Transcript)
11
(No Transcript)
12
Scalability
  • Thousands of In-flight Instructions and In-Order
    Commit make designs impractical
  • ROB: needs to maintain a copy of every in-flight
    instruction
  • IQs: instructions depending on long-latency
    instructions remain in these queues for a long
    time
  • LSQs: instructions remain in the queue until
    commit
  • Registers: a new physical register for each
    instruction producing a new value
  • We would like to get the IPC of thousands of
    instructions in-flight without drastically
    increasing resource requirements
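
To see why in-order commit pushes designs toward thousands of entries, consider a toy model of a load miss blocking the ROB head. This is only a sketch under stated assumptions: the 4-wide front end, the 500-cycle miss latency, and the function name are hypothetical and do not come from the slides.

```python
def stall_cycles(rob_entries, fetch_width, miss_latency):
    """Cycles the front end sits idle while a load miss blocks the ROB head.

    Toy model: nothing commits behind the miss, and the front end fills
    the ROB at fetch_width instructions per cycle until it is full.
    """
    cycles_to_fill = -(-rob_entries // fetch_width)  # ceiling division
    return max(0, miss_latency - cycles_to_fill)

# Assumed parameters: 4-wide fetch, 500-cycle main-memory miss.
for entries in (128, 512, 2048):
    print(entries, stall_cycles(entries, fetch_width=4, miss_latency=500))
```

Under these assumptions only the 2048-entry window hides the whole miss, which is the motivation for getting the effect of such a window without actually building a ROB, queues, and register file of that size.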

13
Efficient Kilo-Instruction Processor Design
  • Multi-Checkpointing the ROB
  • Out-of-Order Commit
  • Early Release of Resources
  • Ephemeral Registers
  • Load Queues

14
Checkpointing
15
Checkpointing
  • The ROB allows restoration of the correct state
    at any instruction (this is not necessary)
  • Checkpoint: a snapshot of the processor state
    taken at a specific instruction of the program
    being executed (checkpoint processor state for a
    subset of instructions)
  • With this snapshot the processor can restore
    state to that point in case of an exception or
    misprediction
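
The snapshot-and-restore idea above can be sketched as follows. This is a simplified illustration, not the mechanism from the cited papers: state is modeled as a plain dictionary of register values (real hardware would snapshot structures such as the rename map), and the class and method names are hypothetical.

```python
class CheckpointedCore:
    """Toy model: processor state is a dict of register values."""

    def __init__(self):
        self.regs = {}
        self.checkpoints = []  # stack of (instruction index, register snapshot)

    def take_checkpoint(self, index):
        # Copy the state so later writes do not alter the snapshot.
        self.checkpoints.append((index, dict(self.regs)))

    def execute(self, dest, value):
        self.regs[dest] = value

    def rollback(self):
        # Restore the most recent snapshot; return where to resume execution.
        index, snapshot = self.checkpoints.pop()
        self.regs = snapshot
        return index

core = CheckpointedCore()
core.execute("r1", 10)
core.take_checkpoint(index=1)
core.execute("r1", 99)   # speculative work after the checkpoint
core.execute("r2", 7)
resume_at = core.rollback()
print(resume_at, core.regs)
```

Everything executed after the checkpoint is discarded on rollback, so instructions between the checkpoint and the faulting instruction must be re-executed. That re-execution is the recovery penalty traded against the number of checkpoints.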

16
Design Decisions
  • How many in-flight checkpoints should be
    maintained by the processor?
  • A large number of checkpoints reduces the
    penalty of the recovery process
  • A large number of checkpoints increases the
    implementation cost
  • What kind of instructions should be checkpointed?
  • A checkpoint can be taken at any instruction
  • Some instructions are better candidates (e.g.,
    some current processors take checkpoints at
    branch instructions in order to minimize the
    branch misprediction penalty)
  • How much information should be kept by each
    checkpoint?

17
Multicheckpointing
18
Selective Checkpointing
  • Replace ROB → pseudo-ROB
  • Processor removes instructions that reach the
    pseudo-ROB's head at a fixed rate
  • Processor state is recoverable for any
    instruction in the pseudo-ROB
  • A checkpoint is taken when an incomplete
    instruction leaves the pseudo-ROB

19
Instruction Queue Management
20
Bi-level Issue Queue
  • Processor detects instructions that will hold an
    issue queue entry for a long time
  • Removes these instructions from the primary
    issue queue
  • Offloads them to a slow-lane instruction queue →
    larger, slower, less complex
  • The same principle is applied to the load-store
    queue
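
A minimal sketch of the two-queue steering described above. The slides do not say how long-waiting instructions are detected; this sketch assumes, purely for illustration, that an instruction is slow if it depends (directly or transitively) on a load that missed in the cache. All names are hypothetical.

```python
class BiLevelIssueQueue:
    def __init__(self):
        self.fast = []             # small, fast primary issue queue
        self.slow = []             # larger, slower, simpler slow-lane queue
        self.long_latency = set()  # registers waiting on a long-latency producer

    def mark_miss(self, dest):
        # A load writing `dest` missed in the cache.
        self.long_latency.add(dest)

    def dispatch(self, name, dest, sources):
        if any(src in self.long_latency for src in sources):
            # Dependents of a slow instruction are slow as well.
            self.long_latency.add(dest)
            self.slow.append(name)
        else:
            self.fast.append(name)

q = BiLevelIssueQueue()
q.mark_miss("r1")
q.dispatch("add", dest="r2", sources=["r1", "r3"])
q.dispatch("mul", dest="r4", sources=["r2"])
q.dispatch("sub", dest="r5", sources=["r3"])
print(q.fast, q.slow)
```

The independent `sub` stays in the primary queue and can issue immediately, while the chain hanging off the missed load waits in the slow lane without occupying fast entries.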

21
Physical Register File
22
Ephemeral Registers
  • A conventional superscalar processor assigns
    physical registers to architected registers
    when an instruction enters the issue queue
  • An instruction reserves a physical register for
    its entire flight time
  • A physical register is not written until much
    later → until then its primary function is
    tracking data dependencies
  • Use virtual registers → late register allocation
  • Release a register once no other instruction
    reads its data → early release
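
Early release can be illustrated with a reader count per physical register. This is a sketch under a strong assumption: the number of readers is known when the register is allocated, which real hardware cannot generally know in advance. The class and method names are hypothetical.

```python
class PhysicalRegisterFile:
    def __init__(self, size):
        self.free = list(range(size))
        self.readers_left = {}   # physical register -> readers still to execute

    def allocate(self, reader_count):
        # Late allocation: call this only when the value is about to be written.
        reg = self.free.pop()
        self.readers_left[reg] = reader_count
        return reg

    def read(self, reg):
        self.readers_left[reg] -= 1
        if self.readers_left[reg] == 0:
            # Early release: the last reader is done, so free the register now
            # rather than waiting for a later instruction to commit.
            del self.readers_left[reg]
            self.free.append(reg)

prf = PhysicalRegisterFile(size=2)
reg = prf.allocate(reader_count=2)
prf.read(reg)
free_after_first_read = len(prf.free)
prf.read(reg)
free_after_last_read = len(prf.free)
print(free_after_first_read, free_after_last_read)
```

Between late allocation and early release the register is held only while it actually carries a live value, which is what lets a modest register file serve a much larger instruction window.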

23
Performance Evaluation
24
(No Transcript)
25
(No Transcript)
26
Kilo-Instruction Multiprocessors
27
Ideal Network
28
References
  • Adrian Cristal, Oliverio J. Santana, Francisco
    Cazorla, Marco Galluzzi, Tanausu Ramirez, Miquel
    Pericas, Mateo Valero. "Kilo-Instruction
    Processors: Overcoming the Memory Wall," IEEE
    Micro, vol. 25, no. 3, pp. 48-57, May/June 2005.
  • A. Cristal, O. Santana, M. Valero, and J.F.
    Martínez. "Toward Kilo-Instruction Processors,"
    ACM Trans. on Architecture and Code Optimization,
    vol. 1, no. 4, Dec. 2004.
  • Marco Galluzzi, Valentin Puente, Adrián Cristal,
    Ramón Beivide, José-Ángel Gregorio, Mateo Valero.
    "A First Glance at Kilo-Instruction Based
    Multiprocessors," Conf. Computing Frontiers,
    2004, pp. 212-221.

29
Thank you!