Title: CS 6461: Computer Architecture Instruction Level Parallelism
1 CS 6461 Computer Architecture: Instruction Level Parallelism
- Instructor M. Lancaster
- Corresponding to Hennessy and Patterson
- Fifth Edition
- Section 3.1
2 Instruction Level Parallelism
- Almost all processors since 1985 use pipelining to overlap the execution of instructions and improve performance. This potential overlap among instructions is called instruction level parallelism (ILP)
  - First introduced in the IBM Stretch (Model 7030) in about 1959
  - Later the CDC 6600 incorporated pipelining and the use of multiple functional units
  - The Intel i486 was the first pipelined implementation of the IA32 architecture
3 Instruction Level Parallelism
- Instruction level parallel processing is the concurrent processing of multiple instructions
- Difficult to achieve within a basic code block
  - Typical MIPS programs have a dynamic branch frequency of between 15% and 25%
  - That is, between three and six instructions execute between a pair of branches, and data hazards usually exist within these instructions as they are likely to be dependent
- Given the small size of a basic code block in number of instructions, ILP must be exploited across multiple blocks
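The three-to-six figure can be checked with simple arithmetic: if a fraction f of executed instructions are branches, then on average one instruction in every 1/f is a branch. The sketch below assumes branches are spread evenly through the instruction stream; the function name is illustrative and does not come from the slides.

```python
def instructions_between_branches(branch_frequency):
    """Average number of non-branch instructions between two branches.

    Assumes one in every 1/f dynamic instructions is a branch, so the
    remaining 1/f - 1 instructions lie between consecutive branches.
    """
    if not 0.0 < branch_frequency <= 1.0:
        raise ValueError("branch_frequency must be in (0, 1]")
    return 1.0 / branch_frequency - 1.0


for f in (0.15, 0.25):
    gap = instructions_between_branches(f)
    print(f"{f:.0%} branches: about {gap:.1f} instructions between branches")
```

At 25% the gap is three instructions and at 15% it is a little under six, which is why a single basic block offers so little room for overlap.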
4 Instruction Level Parallelism
- The current trend is toward very deep pipelines, increasing from a depth of < 10 to > 20.
- With more stages, each stage can be smaller and simpler and have less gate delay, so very high clock rates are possible.
5 Loop Level Parallelism: Exploitation among Iterations of a Loop
- Loop adding two 1000 element arrays
- Code:
    for (i = 1; i <= 1000; i = i + 1) x[i] = x[i] + y[i];
- If we look at the generated code, within a loop iteration there may be little opportunity for overlap of instructions, but each iteration of the loop can overlap with any other iteration
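The independence of the iterations can be illustrated in software: because iteration i reads and writes only element i, the iterations can be visited in any order (and so, in principle, overlapped) without changing the result. This is a minimal sketch with made-up data and a hypothetical helper name, not a model of any pipeline.

```python
def add_arrays_in_order(x, y, order):
    """Compute x[i] + y[i] for each index, visiting indices in `order`."""
    result = list(x)
    for i in order:
        result[i] = x[i] + y[i]   # iteration i touches only element i
    return result


n = 1000
x = list(range(n))                # illustrative data
y = [2 * v for v in range(n)]

forward = add_arrays_in_order(x, y, range(n))
backward = add_arrays_in_order(x, y, reversed(range(n)))
evens_then_odds = add_arrays_in_order(
    x, y, list(range(0, n, 2)) + list(range(1, n, 2))
)

# All three visiting orders give the same array.
print(forward == backward == evens_then_odds)
```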
6 Concepts and Challenges: Approaches to Exploiting ILP
- Two major approaches
  - Dynamic: these approaches depend upon the hardware to locate the parallelism
  - Static: fixed solutions generated by the compiler, and thus bound at compile time
- These approaches are not totally disjoint; some techniques require both
- Limitations are imposed by data and control hazards
7 Features Limiting Exploitation of Parallelism
- Program features
- Instruction sequences
- Processor features
- Pipeline stages and their functions
- Interrelationships
- How do program properties limit performance?
Under what circumstances?
8 Approaches to Exploiting ILP: Dynamic Approach
- Hardware intensive approach
- Dominates desktop and server markets
- Pentium III, 4, Athlon
- MIPS R10000/12000
- Sun UltraSPARC III
- PowerPC 603, G3, G4
- Alpha 21264
9 Approaches to Exploiting ILP: Static Approach
- Compiler intensive approach
- Embedded market and IA-64
10 Terminology and Ideas
- Cycles Per Instruction (CPI)
- Pipeline CPI = Ideal Pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
- Ideal Pipeline CPI is a measure of the maximum performance attainable by a given architecture. Stalls and/or their impacts must be minimized.
- During the 1980s, CPI = 1 was a target objective for single chip microprocessors
- 1990s objective: reduce CPI below 1
- Scalar processors are pipelined processors that are designed to fetch and issue at most one instruction every machine cycle
- Superscalar processors are those that are designed to fetch and issue multiple instructions every machine cycle
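The CPI relation on this slide is a plain sum, which the sketch below works through. The stall values are hypothetical numbers chosen for illustration (stall cycles per instruction), not measurements from any processor.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI as the sum of the ideal CPI and per-instruction stall cycles."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


# Hypothetical scalar pipeline: ideal CPI of 1 plus average stall cycles
# per instruction from each hazard class.
cpi = pipeline_cpi(
    ideal_cpi=1.0,
    structural_stalls=0.05,
    data_hazard_stalls=0.30,
    control_stalls=0.15,
)
slowdown = cpi / 1.0   # relative to the ideal CPI of 1

print(f"pipeline CPI = {cpi:.2f}, slowdown vs ideal = {slowdown:.2f}x")
```

Reducing any one stall term lowers the total directly, which is why the techniques that follow each target a specific term of the sum.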
11 Approaches to Exploiting ILP That We Will Explore
Technique | Reduces
Forwarding and bypassing | Potential data hazard stalls
Delayed branches and simple branch scheduling | Control hazard stalls
Basic dynamic scheduling (scoreboarding) | Data hazard stalls from true dependences
Dynamic scheduling with renaming | Data hazard stalls and stalls from antidependences and output dependences
Branch prediction | Control stalls
Issuing multiple instructions per cycle | Ideal CPI
Hardware speculation | Data hazard and control hazard stalls
Dynamic memory disambiguation | Data hazard stalls with memory
Loop unrolling | Control hazard stalls
Basic compiler pipeline scheduling | Data hazard stalls
Compiler dependence analysis, software pipelining, trace scheduling | Ideal CPI, data hazard stalls
Hardware support for compiler speculation | Ideal CPI, data hazard stalls, control hazard stalls
12 Approaches to Exploiting ILP: Review of Terminology
- Instruction issue
  - The process of letting an instruction move from the instruction decode (ID) phase into the instruction execution (EX) phase
- Interlock (pipeline interlock, instruction interlock) is the resolution of pipeline hazards via hardware. Pipeline interlock hardware must detect all pipeline hazards and ensure that all dependencies are satisfied
13 Data Dependencies and Hazards
- How much parallelism exists in a program and how it can be exploited
- If two instructions are parallel, they can execute simultaneously in a pipeline without causing any stalls (assuming no structural hazards exist)
  - There are no dependencies between parallel instructions
- If two instructions are not parallel and must be executed in order, they may often be partially overlapped.
14 Pipeline Hazards
- Hazards make it necessary to stall the pipeline.
  - Some instructions in the pipeline are allowed to proceed while others are delayed
  - For this example pipeline approach, when an instruction is stalled, all instructions further back in the pipeline (issued later) are also stalled
  - No new instructions are fetched during the stall
  - Instructions issued earlier in the pipeline must continue
15 Data Dependencies and Hazards
- Data Dependences: an instruction j is data dependent on instruction i if either of the following holds
  - Instruction i produces a result that may be used by instruction j
  - Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i
- That is, one instruction is dependent on another if there exists a chain of dependencies of the first type between the two instructions.
16 Data Dependencies and Hazards
- Data Dependences
- Code Example:
    LOOP: L.D    F0,0(R1)     ; F0 = array element
          ADD.D  F4,F0,F2     ; add scalar in F2
          S.D    F4,0(R1)     ; store result
          DADDUI R1,R1,-8     ; decrement pointer by 8 bytes
          BNE    R1,R2,LOOP   ; branch if R1 != R2
- The above dependencies are in floating point data for the first two (L.D to ADD.D through F0, ADD.D to S.D through F4), and integer data in the last two instructions (DADDUI to BNE through R1)
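The dependences in this loop can be found mechanically by tracking, for each register, the most recent instruction that wrote it. The sketch below is an illustrative Python model under simplifying assumptions: it considers only register operands, only one pass through the loop body, and the tuple encoding of instructions is invented for the example.

```python
# Each entry: (text, destination register or None, list of source registers).
LOOP_BODY = [
    ("L.D    F0,0(R1)",   "F0", ["R1"]),
    ("ADD.D  F4,F0,F2",   "F4", ["F0", "F2"]),
    ("S.D    F4,0(R1)",   None, ["F4", "R1"]),
    ("DADDUI R1,R1,-8",   "R1", ["R1"]),
    ("BNE    R1,R2,LOOP", None, ["R1", "R2"]),
]


def true_dependences(instructions):
    """Return (producer_index, consumer_index, register) triples.

    A consumer is data dependent on the most recent earlier instruction
    that wrote a register it reads.
    """
    last_writer = {}
    deps = []
    for j, (_, dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], j, reg))
        if dest is not None:
            last_writer[dest] = j
    return deps


for producer, consumer, reg in true_dependences(LOOP_BODY):
    print(f"{LOOP_BODY[consumer][0]!r} depends on {LOOP_BODY[producer][0]!r} via {reg}")
```

The three dependences reported are the two floating point ones (through F0 and F4) and the integer one between DADDUI and BNE (through R1).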
17 Data Dependencies and Hazards
- Data Dependences
  - Arrows show where order of instructions must be preserved
  - If two instructions are dependent, they cannot be simultaneously executed or be completely overlapped
18 Data Dependencies and Hazards
- Dependencies are properties of programs
- Whether a given dependence results in an actual hazard being detected and whether that hazard actually causes a stall are properties of the pipeline organization
19 Data Dependencies and Hazards
- Hazard created
- Code Example:
    DADDUI R1,R1,-8     ; decrement pointer by 8 bytes
    BNE    R1,R2,LOOP
- The data dependence between DADDUI and BNE causes a stall when the branch test is moved from the EX to the ID stage
- If the test had stayed in EX, the dependence would not cause a stall (the branch delay would still be two cycles, however)
20 Data Dependencies and Hazards
- Branch test in EX: branch destination and test known at end of third cycle of execution
- Branch test in ID: branch destination and test known at end of second cycle of execution
21 Data Dependencies and Hazards
- Presence of a dependence indicates a potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline.
- Data dependence
  - Indicates the possibility of a stall
  - Determines the order in which results are calculated
  - Sets an upper bound on how much parallelism can possibly be exploited.
- We will focus on overcoming these limitations
22 Overcoming Dependences
- Two Ways
  - Maintain the dependence but avoid the hazard
    - Schedule the code dynamically
  - Transform the code
23 Difficulty in Detecting Dependences
- A data value may flow between instructions either through registers or through memory locations
- Therefore, detection is not always straightforward
  - For instructions referring to memory, the register dependences are easy to detect
  - Suppose however we have R4 = 20 and R6 = 100 and we use 100(R4) and 20(R6): the two references look different but name the same address, 120
  - Suppose we have incremented R4 in an instruction between two references (say 20(R4)) that look identical: they now name different addresses
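Both cases come down to computing the effective address, offset plus base register contents, at the moment each reference executes. The sketch below works through the slide's numbers; the helper name and the increment of 8 are assumptions made for illustration.

```python
def effective_address(offset, base_register, registers):
    """Effective address of a MIPS-style reference offset(base_register)."""
    return offset + registers[base_register]


regs = {"R4": 20, "R6": 100}

# Case 1: 100(R4) and 20(R6) look different but alias the same location.
a = effective_address(100, "R4", regs)
b = effective_address(20, "R6", regs)
print(a, b, a == b)

# Case 2: two textually identical references 20(R4), with R4 changed in between.
first = effective_address(20, "R4", regs)
regs["R4"] += 8          # hypothetical intervening increment of R4
second = effective_address(20, "R4", regs)
print(first, second, first == second)
```

Comparing the instruction text alone would give the wrong answer in both cases, which is why memory dependences generally need the computed addresses.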
24 Name Dependences: Two Categories
- Two instructions use the same register or memory location, called a name, but there is actually no flow of data between the instructions associated with that name. In the cases below, i precedes j in program order.
  - 1. An antidependence between instructions i and j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved
  - 2. An output dependence occurs when instruction i and instruction j write the same register or memory location. The order again must be preserved
25 Name Dependences: Two Categories
- 1. An antidependence
  - i: DADD R1,R2,-8
  - j: DADD R2,R5,0
- 2. An output dependence
  - i: DADD R1,R2,-8
  - j: DADD R1,R4,10
26 Name Dependences
- Not true data dependencies, and therefore we could execute them simultaneously or reorder them if the name (register or memory location) used in the instructions is changed so that the instructions do not conflict
- Register renaming is easier than renaming memory locations
  - Before renaming (j writes R2, which i reads):
    - i: DADD R1,R2,-8
    - j: DADD R2,R4,10
  - After renaming (j writes R5 instead):
    - i: DADD R1,R2,-8
    - j: DADD R5,R4,10
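Renaming can be sketched as a small table that maps each architectural register to its most recent new name. Note that this sketch uses a different variant from the slide: it gives every destination a fresh register, rather than renaming only the conflicting one. The pool names P0 and P1 and the tuple encoding are hypothetical, and immediates are omitted.

```python
def rename_registers(instructions, free_registers):
    """Give every destination a fresh register and update later readers.

    instructions: list of (opcode, dest, sources) with register names.
    free_registers: pool of unused names (raises IndexError if exhausted).
    """
    free = list(free_registers)
    current_name = {}      # architectural register -> latest renamed register
    renamed = []
    for opcode, dest, sources in instructions:
        # Sources read the latest renamed copy, or the original if never written.
        new_sources = [current_name.get(s, s) for s in sources]
        new_dest = free.pop(0)
        current_name[dest] = new_dest
        renamed.append((opcode, new_dest, new_sources))
    return renamed


program = [
    ("DADD", "R1", ["R2"]),   # i: reads R2
    ("DADD", "R2", ["R4"]),   # j: writes R2, an antidependence on i
]
renamed = rename_registers(program, ["P0", "P1"])
for opcode, dest, sources in renamed:
    print(opcode, dest, sources)
```

After renaming, j no longer writes the register that i reads, so the two instructions can be reordered or executed together.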
27 Data Hazards
- A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining or other reordering of instructions would change the order of access to the operand involved in the dependence.
- We must preserve program order: the order the instructions would execute in if executed one at a time in a non-pipelined system
- However, program order only need be maintained where it affects the outcome of the program
28 Data Hazards: Three Types
- Consider two instructions i and j, with i occurring before j in program order. The possible hazards are
- RAW (read after write): j tries to read a source before i writes it, so j incorrectly gets the old value
  - The most common type
  - Program order must be preserved
  - In a simple common static pipeline a load instruction followed by an integer ALU instruction that directly uses the load result will lead to a RAW hazard
29 Data Hazards: Three Types
- Second type
- WAW (write after write): j tries to write an operand before it is written by i, with the writes ending up in the wrong order, leaving the value written by i rather than the value written by j
  - Corresponds to an output dependence
  - Present in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled
  - In the classic five-stage pipeline, only the WB stage writes registers, so this class of hazards is avoided.
  - If reordering of instructions is allowed this is a possible hazard
  - Suppose an integer instruction writes to a register after a floating point instruction does
30 Data Hazards: Three Types
- Third type
- WAR (write after read): j tries to write an operand before it is read by i, so i incorrectly gets the new value.
  - Corresponds to an antidependence
  - Cannot occur in most static pipelines: note that reads are early (in ID) and writes are late (in WB)
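The three hazard types can be told apart by intersecting the read and write sets of the two instructions. The sketch below is a minimal illustration under assumptions: the dictionary encoding is invented for the example, and it reports only the potential hazard implied by a dependence. Whether a stall actually occurs depends on the pipeline.

```python
def classify_dependences(i, j):
    """Potential hazards between i and j, where i precedes j in program order.

    i and j are dicts with 'reads' and 'writes' sets of register names.
    """
    kinds = []
    if i["writes"] & j["reads"]:
        kinds.append("RAW")    # true data dependence
    if i["writes"] & j["writes"]:
        kinds.append("WAW")    # output dependence
    if i["reads"] & j["writes"]:
        kinds.append("WAR")    # antidependence
    return kinds


# Load followed by an ALU instruction that uses the loaded value.
load = {"reads": {"R2"}, "writes": {"R1"}}
alu = {"reads": {"R1", "R4"}, "writes": {"R3"}}

# The name dependence examples used earlier in these slides.
i = {"reads": {"R2"}, "writes": {"R1"}}        # DADD R1,R2,-8
j_anti = {"reads": {"R5"}, "writes": {"R2"}}   # DADD R2,R5,0
j_out = {"reads": {"R4"}, "writes": {"R1"}}    # DADD R1,R4,10

print(classify_dependences(load, alu))
print(classify_dependences(i, j_anti))
print(classify_dependences(i, j_out))
```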
31 Control Dependencies
- Determines the ordering of an instruction i with respect to a branch instruction so that instruction i is executed in the correct program order and only when it should be.
- Example
    if p1 { S1; }
    if p2 { S2; }
32 Control Dependencies
- Example
    if p1 { S1; }
    if p2 { S2; }
- S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1
33 Control Dependencies
- Two constraints imposed
  - An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch. For example, we cannot take a statement from the then portion of an if statement and move it before the if statement.
  - An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch. For example, we cannot take a statement before the if and move it into the then portion
    if p1 { S1; }
    if p2 { S2; }
34 Control Dependencies
- Two properties of our simple pipeline preserve control dependencies
  - Instructions execute in program order
  - Detection of control or branch hazards ensures that an instruction that is control dependent on a branch is not executed until the branch direction is known
- We can execute instructions that should not have been executed (violating control dependences) if we can do so without affecting the correctness of the program
35 Control Dependencies are Really
- Not the issue. Really the issue is the preservation of
  - Exception behavior
  - Data flow
36 Preserving Exception Behavior
- Preserving exception behavior means that any changes in the ordering of instruction execution must not change how exceptions are raised in the program
- We may relax this rule and say that reordering of instruction execution must not cause any new exceptions
        DADDU R2,R3,R4
        BEQZ  R2,L1
        LW    R1,0(R2)     ; could cause illegal memory access
    L1:
- In the above, if we do not maintain the data dependence involving R2, we may change the result of the program. If we ignore the control dependency and move the load instruction before the branch, the load instruction may cause a memory protection exception
- There is no visible data dependence that prevents this interchange, only control dependence
37 Preserving Exception Behavior
- To allow reordering of these instructions (which, as we said, preserves data dependence) we would like to just ignore the exception when the branch is taken.
38 Preserving Data Flow
- This means preserving the actual flow of data values between instructions that produce results and those that consume them.
- Branches make data flow dynamic, since they allow the source of data for a given instruction to come from many points
39 Preserving Data Flow
- Example
        DADDU R1,R2,R3
        BEQZ  R4,L
        DSUBU R1,R5,R6
    L:  OR    R7,R1,R8     ; value of R1 depends on whether the branch is taken
- Cannot move DSUBU above the branch
- By preserving the control dependence of the OR on the branch we prevent an illegal change to the data flow
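The effect of hoisting DSUBU above the branch can be checked with a tiny interpreter. This is a sketch under stated assumptions: it models only the four opcodes in the example, uses plain Python integers instead of 64-bit wraparound arithmetic, and the register values are made up.

```python
def run(program, registers):
    """Interpret a list of (label, opcode, args) and return the final registers."""
    regs = dict(registers)
    labels = {label: idx for idx, (label, _, _) in enumerate(program) if label}
    pc = 0
    while pc < len(program):
        _, op, args = program[pc]
        if op == "DADDU":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "DSUBU":
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "OR":
            regs[args[0]] = regs[args[1]] | regs[args[2]]
        elif op == "BEQZ":
            if regs[args[0]] == 0:      # branch taken: jump to the label
                pc = labels[args[1]]
                continue
        else:
            raise ValueError(f"unsupported opcode {op}")
        pc += 1
    return regs


ORIGINAL = [
    (None, "DADDU", ("R1", "R2", "R3")),
    (None, "BEQZ",  ("R4", "L")),
    (None, "DSUBU", ("R1", "R5", "R6")),
    ("L",  "OR",    ("R7", "R1", "R8")),
]
# Illegal transformation: DSUBU moved above the branch.
HOISTED = [ORIGINAL[0], ORIGINAL[2], ORIGINAL[1], ORIGINAL[3]]

base = {"R1": 0, "R2": 1, "R3": 2, "R5": 12, "R6": 4, "R7": 0, "R8": 0}
taken = dict(base, R4=0)        # BEQZ taken, DSUBU should be skipped
not_taken = dict(base, R4=1)    # BEQZ falls through

print(run(ORIGINAL, taken)["R7"], run(HOISTED, taken)["R7"])
print(run(ORIGINAL, not_taken)["R7"], run(HOISTED, not_taken)["R7"])
```

With these values the two versions agree when the branch falls through but disagree when it is taken, because the hoisted DSUBU overwrites the R1 value that OR should have received from DADDU.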
40 Preserving Data Flow
- Sometimes violating the control dependence cannot affect either the exception behavior or the data flow
          DADDU R1,R2,R3
          BEQZ  R1,skip
          DSUBU R4,R5,R6
          DADDU R5,R4,R9
    skip: OR    R7,R1,R8     ; suppose R4 not used after here
- If R4 is unused after this point, changing the value of R4 just before the branch would not affect data flow
- If R4 were dead and DSUBU could not generate an exception, we could move the DSUBU instruction before the branch
- This is called speculation since the compiler is betting on the branch outcome
41 Control Dependence Again
- Control dependence in the simple pipeline is preserved by implementing control hazard detection that can cause control stalls
  - Control stalls can be eliminated or reduced by a variety of hardware techniques
  - Delayed branches can reduce stalls arising from control hazards, but require that the compiler preserve data flow