Improving IJVM Performance

Provided by: acade124

1
Improving IJVM Performance
  • Rev up the clock speed
  • Redesign circuits to reduce delay, e.g.
  • Replace the ripple-carry adder with a faster design (such as carry
    lookahead) to decrease worst-case ALU propagation delay
  • Reorder some microinstructions, e.g.
  • Do something useful during memory delays
  • Grab work from next machine instruction
  • Add an incrementer to PC register
  • Eliminate decoder for B bus
  • Note tradeoff among speed, cost, and space
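
The worst-case ALU propagation mentioned above comes from the carry chain: in a ripple-carry adder each bit must wait for the carry from the bit below it. The following is a minimal sketch, not a gate-level timing model. It adds two unsigned integers one bit at a time and reports the longest run of consecutive carries, used here as a rough proxy for delay. The function name `ripple_add` and the 32-bit default width are assumptions made for illustration.

```python
def ripple_add(a, b, width=32):
    """Add two unsigned ints bit by bit, the way a ripple-carry adder does.

    Returns (sum modulo 2**width, longest run of consecutive carries).
    The run length is only a proxy for propagation delay.
    """
    carry = 0
    result = 0
    chain = 0      # length of the current run of carries
    longest = 0    # longest run seen so far
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                      # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))    # carry-out of a full adder
        result |= s << i
        chain = chain + 1 if carry else 0
        longest = max(longest, chain)
    return result, longest


if __name__ == "__main__":
    # No carries at all: the fast case.
    print(ripple_add(5, 2))
    # The carry ripples through every bit: the worst case.
    print(ripple_add(0xFFFFFFFF, 1))
```

Because the clock period must cover the worst case, a design that shortens the carry path lets the whole datapath run faster, at the cost of extra gates.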

2
More Improvements
  • Add a bus for A side of ALU
  • Do fetch separately: pre-fetch in batches
  • This is really a simple two-stage pipeline
  • Find parallelism (the wall is serial tasks)
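
The remark that the wall is serial tasks is commonly formalized as Amdahl's law, which is not named on the slide and is brought in here as an illustration. The sketch below assumes a workload with a fixed serial fraction and perfectly divisible parallel work. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(serial_fraction, n_units):
    """Ideal speedup from n_units parallel units under Amdahl's law.

    serial_fraction: share of the work that cannot be parallelized (0..1).
    n_units: number of parallel execution units (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


if __name__ == "__main__":
    for units in (1, 2, 4, 16, 1024):
        print(units, round(amdahl_speedup(0.1, units), 2))
```

With 10% serial work the speedup can never exceed 10, however many units are added.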

3
(No Transcript)
4
Still More Improvements
  • Pipeline around ALU (3 stages)
  • Note pipeline enables faster clock too
  • Add lots of registers (a RISC-y thing to do)
  • Add level(s) of cache
  • P4 has a trace cache for micro-ops
  • Add smarter decoding, e.g.
  • Go superscalar
  • Dispatch/route as you decode
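
To see why a pipeline helps, it can be useful to count cycles. The sketch below assumes an idealized pipeline in which every stage takes one clock cycle and one instruction enters per cycle, with stalls supplied as a single extra-cycle parameter. The function names are hypothetical, and real pipelines lose cycles to hazards and mispredicted branches.

```python
def cycles_unpipelined(n_instr, n_stages):
    """Each instruction passes through every stage before the next starts."""
    return n_instr * n_stages


def cycles_pipelined(n_instr, n_stages, stall_cycles=0):
    """Ideal pipeline: fill once, then complete one instruction per cycle.

    stall_cycles is an assumed total of bubbles inserted by hazards.
    """
    if n_instr == 0:
        return 0
    return n_stages + (n_instr - 1) + stall_cycles


if __name__ == "__main__":
    n, k = 100, 3   # 100 instructions through the 3-stage pipeline around the ALU
    print(cycles_unpipelined(n, k), cycles_pipelined(n, k))
```

For long instruction streams the ratio approaches the number of stages, and since each stage does less work per cycle the clock can also run faster.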

5
(No Transcript)
6
(No Transcript)
7
Pipelines and Control Signals
8
(No Transcript)
9
Pentium II Dispatch/Execute Unit
10
UltraSPARC II Microarchitecture
11
UltraSPARC II Pipeline
12
Even More Improvements
  • Add branch prediction
  • Track history (works like cache)
  • Allow two misses before switching prediction
  • Out-of-order execution
  • Register renaming enables multiple versions of a register
  • Scoreboards track which registers instructions
    use
  • What about interrupts?
  • Pentium 4 reorders execution only, not
    issue/retire
  • Speculative execution (like EPIC)
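
The branch-prediction bullets above (track history in a cache-like table, allow two misses before switching) describe what is usually built as a table of 2-bit saturating counters. The following is a minimal sketch under assumptions: the table is indexed by the low bits of the branch address with no tags, counters start in the weakly-not-taken state, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """History table of 2-bit saturating counters.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    From a strong state (0 or 3) it takes two misses to flip the prediction.
    """

    def __init__(self, table_bits=4):
        self.mask = (1 << table_bits) - 1
        self.counters = [1] * (1 << table_bits)   # start weakly not taken

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_hits(predictor, pc, outcomes):
    """Predict, then train, for each actual outcome; return correct guesses."""
    hits = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            hits += 1
        predictor.update(pc, taken)
    return hits


if __name__ == "__main__":
    # A loop-closing branch: taken 9 times, then falls through; run twice.
    loop = [True] * 9 + [False]
    print(count_hits(TwoBitPredictor(), 0x40, loop * 2), "of", len(loop) * 2)
```

The single miss at each loop exit does not flip the prediction, so the second pass through the loop starts out predicting correctly.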

13
Out of Order Scoreboard
14
Pentium 4 Path
15
Possibly Even More Improvements
  • IJVM is really kinda CISC-y
  • RISC might skip most of this micro-sequencing
  • Note how P4 does it both ways (ROM is optional)
  • Symmetric MultiProcessor (SMP)
  • Shared RAM, but separate caches (coherency
    issues)
  • Alternatives: clusters/blades, NUMA/CC-NUMA
  • Hyperthreading (or Simultaneous Multi-Threading)
  • Something between superscalar and multicore
  • Dispatch instructions from collection of threads
  • Vector processing (SIMD: single instruction, multiple
    data)
  • Multicore: share outer cache, separate inner
    cache(s)
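
Dispatching instructions from a collection of threads can be pictured with a toy scheduler. The sketch below is an assumption-laden illustration, not how any particular processor works: it ignores data dependencies and functional-unit types, issues at most one instruction per thread per cycle, and picks threads round-robin. The function name `smt_dispatch` and the `issue_width` parameter are hypothetical.

```python
from collections import deque


def smt_dispatch(threads, issue_width=2):
    """Toy simultaneous multi-threading issue logic.

    threads: dict mapping thread name -> list of instruction labels.
    Returns a list with one entry per cycle; each entry lists the
    (thread, instruction) pairs issued in that cycle.
    """
    queues = {name: deque(instrs) for name, instrs in threads.items()}
    order = deque(queues)            # rotating priority for fairness
    schedule = []
    while any(queues.values()):      # an empty deque is falsy
        issued = []
        for _ in range(len(order)):
            if len(issued) == issue_width:
                break                # issue slots are full this cycle
            name = order[0]
            order.rotate(-1)         # the next thread gets priority next
            if queues[name]:
                issued.append((name, queues[name].popleft()))
        schedule.append(issued)
    return schedule


if __name__ == "__main__":
    work = {"A": ["a1", "a2", "a3"], "B": ["b1"]}
    for cycle, slot in enumerate(smt_dispatch(work), start=1):
        print(cycle, slot)
```

When one thread runs out of ready work, the issue slots it would have left idle go to the other threads, which is the gap hyperthreading fills between a superscalar core and a full second core.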

16
SMP
17
Cluster
18
CC-NUMA
19
Hyperthreading and Multicore
20
Multicore Chip Organization