CS 6461: Computer Architecture Instruction Level Parallelism - PowerPoint PPT Presentation

About This Presentation
Title:

CS 6461: Computer Architecture Instruction Level Parallelism

Description:

CS 6461: Computer Architecture. Instruction Level Parallelism and Its Dynamic Exploitation. Instructor: M. Lancaster. Corresponding to Hennessy and Patterson.

Slides: 42
Provided by: BA746

1
CS 6461: Computer Architecture
Instruction Level Parallelism
  • Instructor M. Lancaster
  • Corresponding to Hennessy and Patterson
  • Fifth Edition
  • Section 3.1

2
Instruction Level Parallelism
  • Almost all processors since 1985 use pipelining
    to overlap the execution of instructions and
    improve performance. This potential overlap
    among instructions is called instruction level
    parallelism
  • Pipelining was first introduced in the IBM Stretch
    (Model 7030) in about 1959
  • Later the CDC 6600 incorporated pipelining and
    the use of multiple functional units
  • The Intel i486 was the first pipelined
    implementation of the IA32 architecture

3
Instruction Level Parallelism
  • Instruction level parallel processing is the
    concurrent processing of multiple instructions
  • Difficult to achieve within a basic code block
  • Typical MIPS programs have a dynamic branch
    frequency of between 15% and 25%
  • That is, between three and six instructions
    execute between a pair of branches, and data
    hazards usually exist within these instructions
    as they are likely to be dependent
  • Given basic code block size in number of
    instructions, ILP must be exploited across
    multiple blocks
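
As a rough check on these figures, the average run length between branches follows directly from the branch frequency. The sketch below is only an illustration: it assumes every branch ends a basic block, and the function name is made up for this example.

```python
def instructions_per_branch(branch_frequency):
    """Average number of dynamic instructions per branch (branch included)."""
    if not 0.0 < branch_frequency <= 1.0:
        raise ValueError("branch_frequency must be in (0, 1]")
    return 1.0 / branch_frequency

for freq in (0.15, 0.25):
    per_branch = instructions_per_branch(freq)
    # Subtract the branch itself to get the instructions between two branches.
    print(f"{freq:.0%} branches: {per_branch - 1:.1f} instructions between branches")
```

With so few instructions per block, there is little room to find independent work without looking past the next branch.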

4
Instruction Level Parallelism
  • The current trend is toward very deep pipelines,
    increasing from a depth of < 10 to > 20.
  • With more stages, each stage can be smaller and
    simpler, with less gate delay, so very high
    clock rates are possible.

5
Loop Level Parallelism: Exploitation among
Iterations of a Loop
  • Loop adding two 1000 element arrays
  • Code: for (i = 1; i <= 1000; i = i + 1)
        x[i] = x[i] + y[i];
  • If we look at the generated code, within a loop
    there may be little opportunity for overlap of
    instructions, but each iteration of the loop can
    overlap with any other iteration
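
One way to see that the iterations are independent is to run them in a different order and compare results. This is a minimal sketch in Python rather than the compiled MIPS code; the array contents and the helper name are assumptions chosen for illustration.

```python
import random

def add_arrays_in_order(x, y, order):
    """Run x[i] = x[i] + y[i] for each i, visiting iterations in the given order."""
    result = list(x)
    for i in order:
        result[i] = result[i] + y[i]
    return result

n = 1000
x = list(range(n))
y = [2 * i for i in range(n)]

sequential = add_arrays_in_order(x, y, range(n))
shuffled_order = list(range(n))
random.Random(0).shuffle(shuffled_order)
shuffled = add_arrays_in_order(x, y, shuffled_order)

# No iteration reads a value written by another, so the order does not matter.
print(sequential == shuffled)
```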

6
Concepts and Challenges: Approaches to Exploiting
ILP
  • Two major approaches
  • Dynamic: these approaches depend upon the
    hardware to locate the parallelism
  • Static: fixed solutions generated by the
    compiler, and thus bound at compile time
  • These approaches are not totally disjoint; some
    techniques require both
  • Limitations are imposed by data and control
    hazards

7
Features Limiting Exploitation of Parallelism
  • Program features
  • Instruction sequences
  • Processor features
  • Pipeline stages and their functions
  • Interrelationships
  • How do program properties limit performance?
    Under what circumstances?

8
Approaches to Exploiting ILP: Dynamic Approach
  • Hardware intensive approach
  • Dominates desktop and server markets
  • Pentium III, 4, Athlon
  • MIPS R10000/12000
  • Sun UltraSPARC III
  • PowerPC 603, G3, G4
  • Alpha 21264

9
Approaches to Exploiting ILP: Static Approach
  • Compiler intensive approach
  • Embedded market and IA-64

10
Terminology and Ideas
  • Cycles Per Instruction (CPI)
  • Pipeline CPI = Ideal Pipeline CPI + Structural
    Stalls + Data Hazard Stalls + Control Stalls
  • Ideal Pipeline CPI is a measure of the maximum
    performance attainable in a given architecture.
    Stalls and/or their impacts must be minimized.
  • During the 1980s, CPI = 1 was a target objective
    for single chip microprocessors
  • 1990s objective: reduce CPI below 1
  • Scalar processors are pipelined processors that
    are designed to fetch and issue at most one
    instruction every machine cycle
  • Superscalar processors are those that are
    designed to fetch and issue multiple instructions
    every machine cycle
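
The CPI breakdown on this slide (ideal CPI plus structural, data hazard, and control stalls) can be tried with numbers. The stall values below are hypothetical, picked only to show the arithmetic; they do not come from the course material.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI as the sum of the ideal CPI and per-instruction stall cycles."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls

# Hypothetical stall contributions, in stall cycles per instruction.
ideal = 1.0
cpi = pipeline_cpi(ideal_cpi=ideal, structural_stalls=0.05,
                   data_hazard_stalls=0.20, control_stalls=0.15)
print(f"Pipeline CPI = {cpi:.2f}")
print(f"Speedup if all stalls were removed = {cpi / ideal:.2f}x")
```

A superscalar design attacks the first term (ideal CPI below 1), while the techniques listed later mostly attack the stall terms.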

11
Approaches to Exploiting ILP That We Will
Explore
Technique and what it reduces:
  • Forwarding and bypassing: potential data hazards
    and stalls
  • Delayed branches and simple branch scheduling:
    control hazard stalls
  • Basic dynamic scheduling (scoreboarding): data
    hazard stalls from true dependences
  • Dynamic scheduling with renaming: data hazard
    stalls and stalls from antidependences and output
    dependences
  • Branch prediction: control stalls
  • Issuing multiple instructions per cycle: ideal CPI
  • Hardware speculation: data hazard and control
    hazard stalls
  • Dynamic memory disambiguation: data hazard stalls
    with memory
  • Loop unrolling: control hazard stalls
  • Basic compiler pipeline scheduling: data hazard
    stalls
  • Compiler dependence analysis, software pipelining,
    trace scheduling: ideal CPI, data hazard stalls
  • Hardware support for compiler speculation: ideal
    CPI, data hazard stalls, control stalls

12
Approaches to Exploiting ILP: Review of
Terminology
  • Instruction issue
  • The process of letting an instruction move from
    the instruction decode phase (ID) into the
    instruction execution (EX) phase
  • Interlock (pipeline interlock, instruction
    interlock) is the resolution of pipeline hazards
    via hardware. Pipeline interlock hardware must
    detect all pipeline hazards and ensure that all
    dependencies are satisfied

13
Data Dependencies and Hazards
  • How much parallelism exists in a program and how
    it can be exploited
  • If two instructions are parallel, they can
    execute simultaneously in a pipeline without
    causing any stalls (assuming no structural
    hazards exist)
  • There are no dependencies in parallel
    instructions
  • If two instructions are not parallel and must be
    executed in order, they may often be partially
    overlapped.

14
Pipeline Hazards
  • Hazards make it necessary to stall the pipeline.
  • Some instructions in the pipeline are allowed to
    proceed while others are delayed
  • For this example pipeline approach, when an
    instruction is stalled, all instructions further
    back in the pipeline are also stalled
  • No new instructions are fetched during the stall
  • Instructions issued earlier in the pipeline must
    continue

15
Data Dependencies and Hazards
  • Data dependences: an instruction j is data
    dependent on instruction i if either of the
    following holds:
  • Instruction i produces a result that may be used
    by instruction j
  • Instruction j is data dependent on instruction k,
    and instruction k is data dependent on
    instruction i. That is, one instruction is
    dependent on another if there exists a chain of
    dependences of the first type between the two
    instructions.

16
Data Dependencies and Hazards
  • Data Dependences
  • Code example:
    LOOP: L.D    F0,0(R1)     ; F0 = array element
          ADD.D  F4,F0,F2     ; add scalar in F2
          S.D    F4,0(R1)     ; store result
          DADDUI R1,R1,-8     ; decrement pointer by 8
          BNE    R1,R2,LOOP
  • The above dependencies are in floating point data
    for the first two arrows, and integer data in the
    last two instructions
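
The dependences in this loop body can be found mechanically from each instruction's source and destination registers. The sketch below is a simplified illustration: it tracks registers only (not memory), looks at a single iteration, and the tuple encoding of the instructions is an assumption made for this example.

```python
# Each instruction: (text, destination registers, source registers).
loop = [
    ("L.D    F0,0(R1)",   {"F0"}, {"R1"}),
    ("ADD.D  F4,F0,F2",   {"F4"}, {"F0", "F2"}),
    ("S.D    F4,0(R1)",   set(),  {"F4", "R1"}),
    ("DADDUI R1,R1,-8",   {"R1"}, {"R1"}),
    ("BNE    R1,R2,LOOP", set(),  {"R1", "R2"}),
]

def direct_dependences(instrs):
    """Return pairs (i, j) where j reads a register last written by i."""
    last_writer = {}
    deps = set()
    for j, (_, dests, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:
                deps.add((last_writer[reg], j))
        for reg in dests:
            last_writer[reg] = j
    return deps

def transitive_closure(deps):
    """Add (i, j) whenever a chain of direct dependences links i to j."""
    closure = set(deps)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

direct = direct_dependences(loop)
for i, j in sorted(transitive_closure(direct)):
    kind = "direct" if (i, j) in direct else "chain"
    print(f"{loop[j][0].split()[0]} depends on {loop[i][0].split()[0]} ({kind})")
```

The direct pairs match the two floating point dependences and the integer dependence described above; the closure adds the chain from L.D to S.D.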

17
Data Dependencies and Hazards
  • Data Dependences
  • Arrows show where order of instructions must be
    preserved
  • If two instructions are dependent, they cannot be
    simultaneously executed or be completely
    overlapped

18
Data Dependencies and Hazards
  • Dependencies are properties of programs
  • Whether a given dependence results in an actual
    hazard being detected and whether that hazard
    actually causes a stall are properties of the
    pipeline organization

19
Data Dependencies and Hazards
  • Hazard created
  • Code example:
          DADDUI R1,R1,-8     ; decrement pointer by 8
          BNE    R1,R2,LOOP
  • The hazard arises when the branch test is moved
    from the EX stage to the ID stage
  • If the test stayed in EX, the dependence would not
    cause a stall (the branch delay would still be
    two cycles, however)

20
Data Dependencies and Hazards
  • Figure: branch destination and test known at end
    of third cycle of execution
  • Figure: branch destination and test known at end
    of second cycle of execution

21
Data Dependencies and Hazards
  • Presence of dependence indicates a potential for
    a hazard, but the actual hazard and the length of
    any stall is a property of the pipeline.
  • Data dependence
  • Indicates possibility of stall
  • Determines the order in which results are
    calculated
  • Sets an upper bound on how much parallelism can
    be possibly exploited.
  • We will focus on overcoming these limitations

22
Overcoming Dependences
  • Two ways
  • Maintain the dependence but avoid the hazard, for
    example by scheduling the code dynamically
  • Eliminate the dependence by transforming the code

23
Difficulty in Detecting Dependences
  • A data value may flow between instructions either
    through registers or through memory locations
  • Therefore, detection is not always
    straightforward
  • For instructions referring to memory, the
    register dependences are easy to detect
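  • Suppose, however, that R4 = 20 and R6 = 100 and
    we use 100(R4) and 20(R6): the two references
    look different but address the same location
  • Conversely, if R4 is incremented by an instruction
    between two references that look identical
    (say 20(R4)), they address different locations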
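
Both cases come down to computing effective addresses. The following is a minimal sketch; the helper name and the increment of 8 are assumptions for illustration, while the register values 20 and 100 are the ones used on the slide.

```python
def effective_address(offset, base_register, registers):
    """Effective address of a MIPS-style offset(base) memory reference."""
    return offset + registers[base_register]

regs = {"R4": 20, "R6": 100}

# Different-looking references that alias: 100(R4) and 20(R6).
a = effective_address(100, "R4", regs)
b = effective_address(20, "R6", regs)
print(a, b, a == b)

# Identical-looking references that differ once R4 changes in between.
first = effective_address(20, "R4", regs)
regs["R4"] += 8  # hypothetical increment between the two references
second = effective_address(20, "R4", regs)
print(first, second, first == second)
```

Because the addresses are only known once the base registers hold their run-time values, memory dependences are harder to detect than register dependences.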

24
Name Dependences: Two Categories
  • Two instructions use the same register or memory
    location, called a name, but there is actually no
    flow of data between the instructions associated
    with that name. There are two categories, where
    instruction i precedes instruction j.
  • 1. An antidependence between instructions i and j
    occurs when instruction j writes a register or
    memory location that instruction i reads. The
    original ordering must be preserved.
  • 2. An output dependence occurs when instruction i
    and instruction j write the same register or
    memory location. The order again must be preserved.

25
Name Dependences: Two Categories
  • 1. An antidependence (j writes R2, which i reads)
  • i: DADD R1,R2,-8
  • j: DADD R2,R5,0
  • 2. An output dependence (i and j both write R1)
  • i: DADD R1,R2,-8
  • j: DADD R1,R4,10

26
Name Dependences
  • Not true data dependencies, and therefore we
    could execute them simultaneously or reorder them
    if the name (register or memory location) used in
    the instructions is changed so that the
    instructions do not conflict
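  • Renaming is easier for registers than for memory
    locations (register renaming)
  • Before renaming (j writes R2, which i reads):
  • i: DADD R1,R2,-8
  • j: DADD R2,R4,10
  • After renaming the destination of j to R5:
  • i: DADD R1,R2,-8
  • j: DADD R5,R4,10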
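
The renaming idea can be sketched as a small table that gives every write a fresh register. This is a simplified illustration, not how any particular processor implements it; the physical register names P0, P1, ... and the tuple encoding are assumptions.

```python
import itertools

def rename(instrs, arch_regs):
    """Map each write to a fresh physical register; reads use the latest mapping."""
    fresh = (f"P{n}" for n in itertools.count())
    mapping = {r: next(fresh) for r in arch_regs}     # initial architectural state
    renamed = []
    for op, dest, srcs in instrs:
        new_srcs = [mapping.get(s, s) for s in srcs]  # immediates pass through
        mapping[dest] = next(fresh)
        renamed.append((op, mapping[dest], new_srcs))
    return renamed

program = [
    ("DADD", "R1", ["R2", "-8"]),   # i reads R2
    ("DADD", "R2", ["R4", "10"]),   # j writes R2: antidependence on i
]
renamed = rename(program, ["R1", "R2", "R4"])
for op, dest, srcs in renamed:
    print(op, dest, ", ".join(srcs))
```

After renaming, j no longer writes the register that i reads, so the two instructions can execute in either order.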

27
Data Hazards
  • A hazard is created whenever there is a
    dependence between instructions, and they are
    close enough that the overlap caused by
    pipelining or other reordering of instructions
    would change the order of access to the operand
    involved in the dependence.
  • We must preserve program order: the order in which
    the instructions would execute if executed in a
    non-pipelined system
  • However, program order only need be maintained
    where it affects the outcome of the program

28
Data Hazards: Three Types
  • Two instructions i and j, with i occurring before
    j in program order; possible hazards are
  • RAW (read after write): j tries to read a source
    before i writes it, so j incorrectly gets the old
    value
  • The most common type
  • Program order must be preserved
  • In a simple common static pipeline a load
    instruction followed by an integer ALU
    instruction that directly uses the load result
    will lead to a RAW hazard

29
Data Hazards: Three Types
  • Second type
  • WAW (write after write): j tries to write an
    operand before it is written by i, so the
    writes end up in the wrong order, leaving the
    value written by i rather than the one written by j
  • Corresponds to an output dependence
  • Present in pipelines that write in more than one
    pipe stage or allow an instruction to proceed even
    when a previous instruction is stalled
  • In the classic pipeline example, writes occur only
    in the WB stage, so this class of hazards is avoided.
  • If reordering of instructions is allowed this is
    a possible hazard
  • Suppose an integer instruction writes to a
    register after a floating point instruction does

30
Data Hazards: Three Types
  • Third type
  • WAR (write after read): j tries to write an
    operand before it is read by i, so i incorrectly
    gets the new value.
  • Corresponds to an antidependence
  • Cannot occur in most static pipelines, because
    reads occur early (in ID) and writes late (in WB)
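
The three hazard types can be told apart from the register sets of the two instructions. The sketch below is an assumption-laden illustration: each instruction is reduced to the sets of registers it writes and reads, and i is taken to precede j in program order.

```python
def classify_dependences(i, j):
    """Classify how later instruction j depends on earlier instruction i.

    Each instruction is a (writes, reads) pair of register-name sets.
    """
    i_writes, i_reads = i
    j_writes, j_reads = j
    kinds = []
    if i_writes & j_reads:
        kinds.append("RAW")   # true data dependence
    if i_reads & j_writes:
        kinds.append("WAR")   # antidependence
    if i_writes & j_writes:
        kinds.append("WAW")   # output dependence
    return kinds

# i: DADD R1,R2,-8 followed by three different choices of j.
i = ({"R1"}, {"R2"})
print(classify_dependences(i, ({"R3"}, {"R1"})))   # j reads R1
print(classify_dependences(i, ({"R2"}, {"R5"})))   # j writes R2
print(classify_dependences(i, ({"R1"}, {"R4"})))   # j writes R1
```

Whether any of these dependences turns into an actual stall still depends on the pipeline organization.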

31
Control Dependencies
  • Determine the ordering of an instruction i with
    respect to a branch instruction, so that
    instruction i is executed in the correct program
    order and only when it should be.
  • Example
  • if p1 { S1; }
    if p2 { S2; }

32
Control Dependencies
  • Example
  • if p1 { S1; }
    if p2 { S2; }
  • S1 is control dependent on p1, and S2 is control
    dependent on p2 but not on p1
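
The same example can be written as a small runnable sketch that records which statements execute for each combination of conditions. The function name and the use of strings to stand for S1 and S2 are assumptions for illustration.

```python
def run(p1, p2):
    """Return the statements executed for the given branch conditions."""
    executed = []
    if p1:
        executed.append("S1")   # S1 is control dependent on p1
    if p2:
        executed.append("S2")   # S2 is control dependent on p2 only
    return executed

for p1 in (False, True):
    for p2 in (False, True):
        print(f"p1={p1!s:5} p2={p2!s:5} -> {run(p1, p2)}")
```

Whether S2 runs never changes with p1, which is what it means for S2 not to be control dependent on p1.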

33
Control Dependencies
  • Two constraints imposed
  • An instruction that is control dependent on a
    branch cannot be moved before the branch so that
    its execution is no longer controlled by the
    branch. For example we cannot take a statement
    from the then portion of an if statement and move
    it before the if statement.
  • An instruction that is not control dependent on a
    branch cannot be moved after the branch so that
    the execution is controlled by the branch. For
    example, we cannot take a statement before the if
    and move it into the then portion

34
Control Dependencies
  • Two properties of our simple pipeline preserve
    control dependencies
  • Instructions execute in program order
  • Detection of control or branch hazards ensures
    that an instruction that is control dependent on
    a branch is not executed until the branch
    direction is known
  • We can introduce instructions that should not
    have been executed (violating control
    dependences) if we can do so without affecting
    the correctness of the program

35
Control Dependencies Are Really
  • Not the issue. Really, the issue is the
    preservation of
  • Exception behavior
  • Data flow

36
Preserving Exception Behavior
  • Preserving exception behavior means that any
    changes in the ordering of instruction execution
    must not change how exceptions are raised in the
    program
  • We may relax this rule and say that reordering of
    instruction execution must not cause any new
    exceptions
  • Code example:
          DADDU R2,R3,R4
          BEQZ  R2,L1
          LW    R1,0(R2)     ; could cause an illegal memory access
    L1:
  • In the above, if we do not maintain the data
    dependence of R2, we may change the program. If
    we ignore the control dependency and move the
    load instruction before the branch, the load
    instruction may cause a memory protection
    exception
  • There is no visible data dependence that prevents
    this interchange, only control dependence
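
The effect of moving the load above the branch can be imitated in a few lines. This is a sketch under stated assumptions: memory is modeled as a dictionary in which only address 8 is mapped, and the MemoryFault class stands in for a memory protection exception.

```python
class MemoryFault(Exception):
    """Stands in for a memory protection exception."""

def load(memory, address):
    if address not in memory:
        raise MemoryFault(f"illegal access to address {address}")
    return memory[address]

def original_order(r3, r4, memory):
    r2 = r3 + r4                  # DADDU R2,R3,R4
    if r2 == 0:                   # BEQZ  R2,L1
        return None               # branch taken: the load is skipped
    return load(memory, r2)       # LW    R1,0(R2)

def load_hoisted(r3, r4, memory):
    r2 = r3 + r4                  # DADDU R2,R3,R4
    r1 = load(memory, r2)         # LW moved above the branch
    if r2 == 0:                   # BEQZ  R2,L1
        return None
    return r1

memory = {8: 42}                  # hypothetical memory: only address 8 is mapped
print(original_order(0, 0, memory))
try:
    load_hoisted(0, 0, memory)
except MemoryFault as fault:
    print("new exception:", fault)
```

When the branch is not taken, both versions return the same value; the hoisted version differs only by raising an exception the original program never would.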

37
Preserving Exception Behavior
  • To allow reordering of these instructions (which,
    as we said, preserves the data dependence) we would
    like to just ignore the exception when the branch
    is taken.

38
Preserving Data Flow
  • This means preserving the actual flow of data
    values between instructions that produce results
    and those that consume them.
  • Branches make data flow dynamic, since they allow
    the source of data for a given instruction to
    come from many points

39
Preserving Data Flow
  • Example
  • Code example:
          DADDU R1,R2,R3
          BEQZ  R4,L
          DSUBU R1,R5,R6
    L:    OR    R7,R1,R8     ; R1 depends on whether the branch is taken
  • Cannot move DSUBU above the branch
  • By preserving the control dependence of the OR on
    the branch we prevent an illegal change to the
    data flow
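
To see the illegal change concretely, the sequence can be modeled with plain integers. This is only a sketch: the register values are hypothetical, and Python integers ignore 64-bit wraparound, which does not matter for these small values.

```python
def original(r2, r3, r4, r5, r6, r8):
    r1 = r2 + r3          # DADDU R1,R2,R3
    if r4 != 0:           # BEQZ  R4,L falls through when R4 != 0
        r1 = r5 - r6      # DSUBU R1,R5,R6
    return r1 | r8        # L: OR R7,R1,R8

def dsubu_hoisted(r2, r3, r4, r5, r6, r8):
    r1 = r2 + r3          # DADDU R1,R2,R3
    r1 = r5 - r6          # DSUBU moved above the branch: always overwrites R1
    if r4 != 0:           # BEQZ  R4,L now guards nothing
        pass
    return r1 | r8        # L: OR R7,R1,R8

args = dict(r2=1, r3=2, r5=16, r6=4, r8=0)   # hypothetical register values
for r4 in (0, 1):
    print(r4, original(r4=r4, **args), dsubu_hoisted(r4=r4, **args))
```

The two versions agree when the branch falls through and disagree when it is taken, because the OR then reads the DSUBU result instead of the DADDU result.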

40
Preserving Data Flow
  • Sometimes violating the control dependence cannot
    affect either the exception behavior or the data
    flow
  • Code example:
          DADDU R1,R2,R3
          BEQZ  R1,skip
          DSUBU R4,R5,R6
          DADDU R5,R4,R9
    skip: OR    R7,R1,R8     ; suppose R4 is not used after here
  • If R4 unused after this point, changing the value
    of R4 just before the branch would not affect
    data flow
  • If R4 were dead and DSUBU could not generate an
    exception we could move the DSUBU instruction
    before the branch
  • This is called speculation, since the compiler is
    betting on the branch outcome
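
The claim that hoisting DSUBU is harmless when R4 is dead can be checked by comparing the live registers on both branch outcomes. This is a sketch with hypothetical register values; it assumes R4 is dead after the skip label and that DSUBU cannot raise an exception.

```python
def original(r2, r3, r4, r5, r6, r8, r9):
    r1 = r2 + r3              # DADDU R1,R2,R3
    if r1 != 0:               # BEQZ  R1,skip falls through when R1 != 0
        r4 = r5 - r6          # DSUBU R4,R5,R6
        r5 = r4 + r9          # DADDU R5,R4,R9
    r7 = r1 | r8              # skip: OR R7,R1,R8
    return {"R1": r1, "R4": r4, "R5": r5, "R7": r7}

def speculated(r2, r3, r4, r5, r6, r8, r9):
    r1 = r2 + r3              # DADDU R1,R2,R3
    r4 = r5 - r6              # DSUBU hoisted above the branch (speculative)
    if r1 != 0:               # BEQZ  R1,skip
        r5 = r4 + r9          # DADDU R5,R4,R9
    r7 = r1 | r8              # skip: OR R7,R1,R8
    return {"R1": r1, "R4": r4, "R5": r5, "R7": r7}

def live_out(state):
    """Drop R4, which is assumed dead after the skip label."""
    return {reg: val for reg, val in state.items() if reg != "R4"}

taken = dict(r2=0, r3=0, r4=99, r5=16, r6=4, r8=1, r9=2)      # R1 == 0
not_taken = dict(r2=1, r3=2, r4=99, r5=16, r6=4, r8=1, r9=2)  # R1 != 0
for regs in (taken, not_taken):
    a, b = original(**regs), speculated(**regs)
    print(a["R4"] == b["R4"], live_out(a) == live_out(b))
```

R4 differs when the branch is taken, but every register that is still live matches, so the data flow seen by the rest of the program is unchanged.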

41
Control Dependence Again
  • Control dependence in the simple pipeline is
    preserved by implementing control and hazard
    detection that can cause control stalls
  • Control stalls can be eliminated or reduced by a
    variety of hardware techniques
  • Delayed branches can reduce the stalls arising from
    control hazards, but require that the compiler
    preserve data flow