Improving IJVM Performance presentation

1
Improving IJVM Performance

Rev up the clock speed
Redesign circuits to reduce delay, e.g.
Replace the ripple-carry adder (e.g., with carry lookahead) to decrease worst-case ALU propagation delay
Reorder some microinstructions, e.g.
Do something useful during memory delays
Grab work from next machine instruction
Add an incrementer to PC register
Eliminate decoder for B bus
Note tradeoff among speed, cost, and space
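
Why the adder matters for worst-case delay can be seen in a small simulation. The following is a minimal sketch, not taken from the slides: `ripple_add` is a hypothetical helper that adds bit by bit the way a ripple-carry adder does and reports the longest chain of positions a single carry had to travel through, used here as a rough proxy for propagation delay.

```python
def ripple_add(a: int, b: int, width: int = 32):
    """Bit-serial model of a ripple-carry adder (illustrative only).

    Returns (sum modulo 2**width, longest carry chain), where the chain
    length counts consecutive bit positions that one carry passed through.
    """
    carry = 0
    result = 0
    longest = run = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y           # this position creates a carry by itself
        propagate = x ^ y          # this position passes an incoming carry on
        result |= (propagate ^ carry) << i
        carry = generate | (propagate & carry)
        if generate:
            run = 1                # a fresh carry starts a new chain
        elif carry:
            run += 1               # the incoming carry rippled one bit further
        else:
            run = 0
        longest = max(longest, run)
    return result, longest


if __name__ == "__main__":
    # Worst case: the carry born in bit 0 ripples through every position.
    print(ripple_add(0xFFFFFFFF, 1))
    # Typical small operands: no long chain at all.
    print(ripple_add(3, 1, width=4))
```

The clock period has to cover the worst case, so a design that shortens the longest chain (carry lookahead, carry select) allows a faster clock at the price of extra gates, which is the speed/cost/space tradeoff noted above.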

2
More Improvements

Add a bus for A side of ALU
Do fetch separately: pre-fetch in batches
This is really a simple two-stage pipeline
Find parallelism (the wall is serial tasks)
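
The "wall" of serial tasks is commonly formalized as Amdahl's law. The slides do not name it, so treat the sketch below as one standard way to quantify the point; `amdahl_speedup` and its parameter names are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work
    can be spread across `n_units` parallel units (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


if __name__ == "__main__":
    # With 10% serial work, even unlimited hardware cannot reach 10x.
    for n in (2, 10, 100, 10**6):
        print(n, round(amdahl_speedup(0.9, n), 3))
```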

4
Still More Improvements

Pipeline around ALU (3 stages)
Note pipeline enables faster clock too
Add lots of registers (a RISC-y thing to do)
Add level(s) of cache
P4 has a trace cache for micro-ops
Add smarter decoding, e.g.
Go superscalar
Dispatch/route as you decode
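
A rough feel for what a pipeline buys can be had by counting cycles. This is a sketch under idealized assumptions that are not in the slides: every stage takes one cycle and there are no stalls, hazards, or branch flushes. The function names are illustrative.

```python
def cycles_unpipelined(n_instructions: int, stages: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: fill it once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


if __name__ == "__main__":
    n, k = 100, 3  # 3 stages, as in the pipeline around the ALU
    print(cycles_unpipelined(n, k), cycles_pipelined(n, k))
```

On top of the cycle count, each stage does less work per cycle than the whole datapath did, which is why the pipeline also permits a faster clock.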

7
Pipelines and Control Signals
9
Pentium II Dispatch/Execute Unit
10
UltraSPARC II Microarchitecture
11
UltraSPARC II Pipeline
12
Even More Improvements

Add branch prediction
Track history (works like cache)
Allow two misses before switching prediction
Out-of-order execution
Register renaming enables multiple versions of a register to coexist
Scoreboards track which registers each instruction uses
What about interrupts?
Pentium 4 reorders execution only, not issue/retire
Speculative execution (like EPIC)
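
The branch prediction bullets (a history table that works like a cache, and two misses before the prediction flips) describe what is usually built as a table of 2-bit saturating counters. The sketch below assumes that design; the class name, table size, indexing by low-order PC bits, and the initial counter value are all illustrative choices, not details from the slides.

```python
class TwoBitPredictor:
    """History table of 2-bit saturating counters.

    Counter values 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    From a saturated state, two mispredictions in a row are needed
    before the prediction changes.
    """

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.table = [1] * entries  # assumed start: weakly not taken

    def _index(self, pc: int) -> int:
        # Direct-mapped lookup, like a cache without tags (assumption).
        return pc % self.entries

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Run a sequence of actual outcomes for one branch; count the misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop branch: taken 9 times, then falls through, repeated 3 times.
    loop = ([True] * 9 + [False]) * 3
    print(count_mispredictions(TwoBitPredictor(), 0x40, loop))
```

The single fall-through at the end of each loop pass costs one miss but does not flip the prediction, so the next pass starts out predicted correctly.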

13
Out of Order Scoreboard
14
Pentium 4 Path
15
Possibly Even More Improvements

IJVM is really kinda CISC-y
RISC might skip most of this micro-sequencing
Note how P4 does it both ways (ROM is optional)
Symmetric MultiProcessor (SMP)
Shared RAM, but separate caches (coherency issues)
Alternatives: clusters/blades, NUMA/CC-NUMA
Hyperthreading (or Simultaneous Multi-Threading)
Something between superscalar and multicore
Dispatch instructions from collection of threads
Vector processing (SIMD: single instruction, multiple data)
Multicore: share outer cache, separate inner cache(s)
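
Dispatching from a collection of threads can be illustrated with a toy scheduler. This is a simplified sketch, not how any particular processor does it: `smt_dispatch` is a hypothetical function that fills a fixed number of issue slots per cycle by taking instructions round-robin from per-thread queues, and it ignores dependencies, functional-unit limits, and cache misses.

```python
from collections import deque


def smt_dispatch(threads, width=2):
    """Fill up to `width` issue slots per cycle, round-robin across threads.

    `threads` is a list of instruction lists, one per hardware thread.
    Returns a list of cycles; each cycle is a list of (thread_id, instr).
    """
    queues = [deque(t) for t in threads]
    n = len(queues)
    schedule = []
    start = 0  # rotate the starting thread each cycle for fairness
    while any(queues):
        slot = []
        tid = start
        while len(slot) < width and any(queues):
            if queues[tid]:
                slot.append((tid, queues[tid].popleft()))
            tid = (tid + 1) % n
        start = (start + 1) % n
        schedule.append(slot)
    return schedule


if __name__ == "__main__":
    for cycle in smt_dispatch([["a1", "a2", "a3"], ["b1"]], width=2):
        print(cycle)
```

When one thread runs out of work, the other thread's instructions fill the otherwise idle slots, which is the sense in which this sits between superscalar and multicore.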

16
SMP
17
Cluster
18
CC-NUMA
19
Hyperthreading and Multicore
20
Multicore Chip Organization
