The Intel Pentium Processor - PowerPoint PPT Presentation

1
The Intel Pentium Processor
  • Bogdan Ilisie
  • Rika Kanai

Pentium
Pentium Pro
Pentium II
Pentium III
2
The first Intel Pentium
  • Introduced to market on March 22, 1993, with a CPU
    clock speed of 66 MHz
  • It brought many innovations, the most notable
    being:

Superscalar architecture
Dynamic Branch Prediction
Pipelined Integer Unit
Pipelined Floating-Point Unit
These features made the newly introduced chip a
very popular choice for desktops, although it was
later found that the processor had some notorious
implementation errors, such as the FDIV
floating-point division bug.
3
The Pentium CPU (MMX)
4
Pipelined Integer Unit
As can be seen from the previous diagram, the
Integer Unit has two pipelines (U and V), while the
Floating-Point Unit (FPU) has one pipeline.
The Pentium pipelined Integer Unit has 5 stages:
1) Pre-fetch
2) Decode
3) Address generate
4) EX: Execute (ALU and cache access)
5) WB: Write back
Although later processors such as the Pentium MMX
modified these 5 steps (by adding intermediate
FIFO buffers to hold batches of instructions),
the steps remain the core foundation of the
pipelining.
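The overlap between the five stages can be sketched in a few lines of Python. This is a minimal illustration, assuming a single pipeline with no stalls and no pairing; the stage abbreviations and the function name `pipeline_timeline` are shorthand invented for this sketch, not Intel terminology.

```python
# Shorthand labels for the five stages listed above (names chosen for this sketch).
STAGES = ["PF", "D1", "D2", "EX", "WB"]

def pipeline_timeline(num_instructions):
    """Return one dict per clock cycle mapping stage -> instruction index (None = empty).

    Assumes an ideal single pipeline: one instruction enters per cycle, no stalls.
    """
    total_cycles = len(STAGES) + num_instructions - 1
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth  # instruction i reaches stage `depth` at cycle i + depth
            row[stage] = instr if 0 <= instr < num_instructions else None
        timeline.append(row)
    return timeline

if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_timeline(4)):
        cells = " ".join(f"{s}:{'-' if i is None else i}" for s, i in row.items())
        print(f"cycle {cycle}: {cells}")
```

Under these assumptions, n instructions finish in 5 + n - 1 cycles rather than 5n, which is the benefit pipelining is meant to provide.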
5
Pipelined Integer Unit
1) In the Pre-fetch cycle, two pre-fetch buffers
read the instructions to be executed. Instructions
can be sent to the U or V pipeline. The U
pipeline handles the more complex instructions.
2) In the Decode cycle, two decoders decode the
instructions and try to pair them so that
they can run in parallel, since the Pentium
features a superscalar architecture.
Even though the Pentium processor features a
superscalar architecture, two instructions can
run concurrently, as in the diagram below, only
if they satisfy some rules. Essentially, the
instructions have to be independent; otherwise
they cannot be paired.
3) In the second Decode stage, or address
generate stage, the addresses of memory operands
are calculated. After these calculations, the EX
stage of the pipeline is ready to execute.
A floating-point instruction cannot be paired
with an integer instruction.
6
Pipelined Integer Unit (Conclusion)
4) In the Execute cycle, the instruction reaches
the ALU.
5) In the Write Back cycle, results are
written back to the registers.
For two instructions to be paired together in the
Decode stage, they have to lack dependencies.
The two paired instructions also have to be
basic, in the sense that they contain no
displacements or immediate addressing. As can be
deduced, the pipelines will sometimes execute
only one instruction at a time, despite the
superscalar ability.
If two instructions are executing concurrently in
the pipelines (given that they satisfy the proper
conditions and are independent) and one of them
stalls as a result of hazard control, the other
one will also stall.
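The pairing rules described above can be sketched as a small check. This is a simplified model, not the Pentium's full pairing rule set: the `Instr` record, its `simple` flag, and the functions `can_pair` and `issue_cycles` are hypothetical names introduced for this illustration, and only register dependencies are considered.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Instr:
    text: str                     # assembly text, for display only
    dest: Optional[str]           # register written, if any
    sources: FrozenSet[str]       # registers read
    simple: bool = True           # "basic" instruction, per the slide's rule
    floating_point: bool = False

def can_pair(first, second):
    """Decide whether `first` (U pipe) and `second` (V pipe) may issue together."""
    if not (first.simple and second.simple):
        return False
    # Assumption of this sketch: any floating-point instruction issues alone,
    # since the FPU has one pipeline and FP/integer pairs are ruled out.
    if first.floating_point or second.floating_point:
        return False
    if first.dest is not None:
        if first.dest in second.sources:   # read-after-write dependency
            return False
        if first.dest == second.dest:      # write-after-write dependency
            return False
    return True

def issue_cycles(program):
    """Count issue slots: a pair or a lone instruction each take one slot."""
    cycles, i = 0, 0
    while i < len(program):
        paired = i + 1 < len(program) and can_pair(program[i], program[i + 1])
        i += 2 if paired else 1
        cycles += 1
    return cycles
```

For example, `mov eax, 1` followed by `mov ebx, 2` can pair under this model, while `mov eax, 1` followed by `add ecx, eax` cannot, because the second instruction reads the register the first one writes.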
7
Branch Prediction
Besides the superscalar ability of the Pentium
processor, the branch prediction mechanism is a
much-debated improvement.
Predicting the behavior of branches can have a
very strong impact on the performance of a
machine, since a wrong prediction results in
a flush of the pipelines and wasted cycles.
Branch prediction is done through a branch
target buffer, which holds information about the
branches it tracks.
The prediction of whether a jump will occur or
not is based on the branch's previous behavior.
There are four possible states that describe a
branch's disposition to jump:
State 0: Very unlikely a jump will occur
State 1: Unlikely a jump will occur
State 2: Likely a jump will occur
State 3: Very likely a jump will occur
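The four states can be modeled as a 2-bit saturating counter. The sketch below assumes the textbook scheme (move up one state on a taken branch, down one on a not-taken branch, predict a jump in States 2 and 3); the function names are invented for this illustration.

```python
def predict(state):
    """States 2 and 3 predict a jump; States 0 and 1 predict no jump."""
    return state >= 2

def update(state, taken):
    """Move one state up on a taken branch, one down otherwise, saturating at 0 and 3."""
    return min(state + 1, 3) if taken else max(state - 1, 0)

def run(outcomes, state=0):
    """Replay a sequence of branch outcomes; return (correct predictions, final state)."""
    correct = 0
    for taken in outcomes:
        if predict(state) == taken:
            correct += 1
        state = update(state, taken)
    return correct, state

if __name__ == "__main__":
    # A loop branch taken nine times, then falling through once.
    loop = [True] * 9 + [False]
    print(run(loop))  # (7, 2): two warm-up misses plus the loop exit
```

Because the counter needs two consecutive surprises to flip its prediction, a single loop exit does not disturb the prediction for the next run of the same loop.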
8
Branch Prediction
When a branch has its address in the branch
target buffer, its behavior is tracked.
This diagram portrays the four states associated
with branch prediction. If a branch doesn't jump
two times in a row, it will go down to State 0.
Once in State 0, the algorithm won't predict
another jump unless the branch jumps two
consecutive times (so it will go from State 0 to
State 2). Once in State 3, the algorithm won't
predict another no-jump unless the branch is not
taken two consecutive times.
9
Branch Prediction
It is actually believed that the Pentium's
algorithm for branch prediction is incorrect. As
can be seen in the diagram to the right, State 0
jumps directly to State 3 instead of following
the usual path, which would include State 1 and
State 2.
This abnormality might be attributed to the way
in which the branch target buffer operates:
  • If a branch is not found in the branch target
    buffer, then it is predicted that it won't
    jump.
  • A branch won't get an actual entry in the
    branch target buffer until the first time it
    jumps, and when it does, it goes straight into
    State 3.
  • Because the branch won't get an entry in the
    branch target buffer until the first time it
    jumps, the actual state diagram is altered,
    as can be clearly seen.

More information about this problem can be found
at http://x86.ddj.com/articles/branch/branchprediction.htm
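The allocation behavior described on this slide can be sketched as follows. This is a rough model under stated assumptions: the class name `BranchTargetBuffer` and its methods are hypothetical, the buffer is an unbounded dictionary (capacity and replacement are not modeled), and tracked entries use a plain saturating 2-bit counter.

```python
class BranchTargetBuffer:
    """Toy model: untracked branches predict no jump; new entries start in State 3."""

    def __init__(self):
        self.entries = {}  # branch address -> state (0..3)

    def predict(self, address):
        # A branch that is not in the buffer is predicted not to jump.
        return self.entries.get(address, 0) >= 2

    def update(self, address, taken):
        if address not in self.entries:
            # No entry is created until the branch actually jumps,
            # and the new entry goes straight into State 3.
            if taken:
                self.entries[address] = 3
            return
        state = self.entries[address]
        self.entries[address] = min(state + 1, 3) if taken else max(state - 1, 0)

if __name__ == "__main__":
    btb = BranchTargetBuffer()
    btb.update(0x100, False)      # not taken: still no entry
    btb.update(0x100, True)       # first jump: allocated in State 3
    print(btb.entries[0x100])     # 3
```

In this model an untracked branch behaves like State 0, so its first jump looks like a direct transition from State 0 to State 3, which matches the irregular edge the slide describes.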
10
Branch Prediction (in later Pentium Models)
The Intel Pentium branch prediction algorithm is
indeed better than a 50% guess, but it has
limitations.
To increase the accuracy of branch predictions,
the processors following the Pentium adopted a
different branch prediction algorithm.
Some loops have repetitive patterns that need
to be recognized. A single 2-bit counter cannot
capture such patterns.
Later-generation processors, such as the Pentium
MMX, Pentium Pro, and Pentium II, use another
mechanism for branch prediction.
A 4-bit register is used to record the previous
behavior of the branch. If the 4-bit register
holds 0001, it means that, of the last 4 times,
the branch jumped only the most recent time.
A 4-bit register would not be of much use without
any additional logic. In addition to the 4-bit
register, there are sixteen 2-bit counters like
the ones shown previously.
11
Branch Prediction (in later Pentium Models)
With a 4-bit register that records the behavior
of the branch, along with sixteen 2-bit counters,
the mechanism is able to give more accurate
branch predictions.
Since the register has 4 bits, it has 16 possible
values, so the current value of the 4-bit
register can always be associated with one of the
sixteen 2-bit counters, as shown in the diagram
to the right. Each value in the 4-bit register
represents a trend of that branch. For each
trend, we must be able to predict the next value.
Since each register value points to a
different 2-bit counter, the state of that 2-bit
counter will most likely return the correct
prediction for that particular register pattern.
Therefore, by combining a 4-bit register that
records past trends with 16 individually updated
2-bit counters, we end up with a much stronger
mechanism for prediction, which is currently used
in the Pentium MMX, Pentium II, and others.
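The combination of a 4-bit history register and sixteen 2-bit counters can be sketched as below. This is an illustrative model for a single branch, not Intel's implementation: the class name `TwoLevelPredictor`, the `accuracy` helper, and the choice to start every counter in State 0 are assumptions made for this sketch.

```python
class TwoLevelPredictor:
    """4-bit history register selecting one of sixteen 2-bit saturating counters."""

    def __init__(self):
        self.history = 0            # last four outcomes, newest in the lowest bit
        self.counters = [0] * 16    # one 2-bit counter per history pattern

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current history pattern...
        c = self.counters[self.history]
        self.counters[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        # ...then shift the new outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & 0b1111

def accuracy(predictor, outcomes):
    """Fraction of outcomes predicted correctly while training on them."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)

if __name__ == "__main__":
    pattern = [True, True, False] * 100   # taken, taken, not taken, repeated
    print(accuracy(TwoLevelPredictor(), pattern))
```

On a repeating taken, taken, not-taken pattern, each of the three history values that occur is always followed by the same outcome, so after a short warm-up every prediction in this model is correct. A single 2-bit counter keeps mispredicting the not-taken iteration of the same pattern.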
12
Newer Generation Chips
The next step up from the Pentium was the Pentium
MMX. The Pentium MMX includes new instructions,
registers, and data types aimed at maximizing
the speed of multimedia computations.
Since multimedia work requires massive data
manipulation, SIMD (single instruction, multiple
data) instructions were added to the MMX set.
SIMD instructions work on multiple data values
at once, in order to maximize the amount of work
done by each instruction.
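The idea of one instruction working on several values at once can be imitated in plain Python. The sketch below treats a 64-bit integer as eight packed unsigned bytes and adds them lane by lane with wraparound, in the spirit of a packed byte add; the function name `packed_add_bytes` is invented for this example, and real hardware performs all eight additions in a single instruction rather than in a loop.

```python
def packed_add_bytes(a, b):
    """Add eight unsigned bytes packed into 64-bit integers, lane by lane.

    Each lane wraps around modulo 256, and no carry crosses into the next lane.
    """
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x + y) & 0xFF) << shift
    return result

if __name__ == "__main__":
    print(hex(packed_add_bytes(0x0102030405060708, 0x1010101010101010)))
    # 0x1112131415161718
```

The difference from an ordinary 64-bit addition shows up when a lane overflows: adding 0x01 to a lane holding 0xFF yields 0x00 in that lane and leaves the neighboring lane untouched.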
The improved multimedia support of the MMX, along
with lower power consumption, larger caches, and
new branch prediction mechanisms, brought about
the new generations of Pentiums (II and III).