Mescal Architecture Project: Forwarding

1
Mescal Architecture Project: Forwarding and Precise Exceptions
  • EE244 Fall 2000
  • Sam Williams

2
Outline
  • Motivation
  • Problem statement
  • Prior work
  • Investigative approach
  • Results
  • Summary
  • Conclusions
  • Future Work

3
Motivation: Forwarding
  • Pipelining can improve throughput by up to a factor of
    the pipeline depth.
  • However, in a statically scheduled processor without
    forwarding, result latency is the pipeline depth, so
    dependent instructions must be scheduled at compile
    time and, in the worst case, nops must be inserted.
  • Forwarding addresses this by allowing
    instructions to proceed as soon as their operand
    values are ready.
  • So performance is improved, at the cost of
    additional area, power and cycle time.

4
Motivation: Precise Exceptions
  • For a precise exception, we want all previous
    instructions to have committed, and no later
    instructions.
  • This allows program flow to restart after an
    exception, e.g. after a TLB refill exception or
    an external interrupt.
  • Furthermore, it helps debugging by allowing
    precise stops.

5
Motivation: Design Methodology
  • Chip complexity is growing exponentially
  • Shorter design cycles in order to reach a target
    market
  • Deep-submicron (DSM) effects
  • On-chip heterogeneity
  • Can the first two be dealt with under the MESCAL
    environment?

6
Problem statement
  • Can forwarding be easily implemented on a
    component basis in the MESCAL framework?
  • Can this information be exported to the compiler
    to allow better scheduling?
  • What about dynamic hazard detection and variable
    latency functional units?
  • Can precise exceptions be implemented?

7
Prior Work
  • The forwarding part is a
    generalization of forwarding and reservation
    stations (aka Tomasulo's algorithm) with virtual
    register renaming.
  • Precise exceptions are handled (implicitly)
    through a deep pipeline, or through a reorder
    buffer.

8
Architectural Components
  • For min/max cycle time, it's easy to distinguish
    combinational logic from registers.
  • Then we just remove the registers and examine
    each block of combinational logic.
  • For pipeline latencies, we must determine when
    values are produced, and when they are used.
  • Removing all the functional units, and replacing
    them with their known latencies, allows for
    construction of a weighted DCG.
  • In this case, to capture the difference between
    registers used within a functional unit and
    those used as pipeline registers or reservation
    stations, it is useful to move to a higher
    abstraction: architectural components (e.g.
    pipeline components, functional units,
    memories, etc.).
  • Additionally, it is important to determine when
    instructions are issued, and when they commit.
  • Furthermore, buses used for forwarding (backward
    arcs) must be distinguished from carrying
    (forward) arcs, which are used in a deep pipeline.

9
Investigative Approach
  • Implement the reservation station component, and
    a tag array (physical to virtual mapping) for a
    deep pipeline
  • Combine these components to create a framework
    for precise interrupts
  • Define the algorithm that can be used with these
    to extract forwarding information, which can be
    passed to the compiler.

10
Forwarding and Hazards: Reservation Stations
  • Configurable reservation station with
    - N operand value/tag pairs
    - M output value/tag/valid trios
    - K forwarding paths in the form of common data
      buses (CDBs), sent to the N renamed (operand)
      registers
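
A minimal Python sketch of the operand side of such a station may help make the value/tag pairing concrete. The class name `ReservationStation` and the methods `load`, `snoop` and `ready` are illustrative assumptions, not taken from the slides, and the M output trios are left out for brevity.

```python
from typing import List, Optional, Tuple


class ReservationStation:
    """Hypothetical model: N operand slots, each a (value, tag) pair."""

    def __init__(self, n_operands: int) -> None:
        # A tag of None means the value is valid; otherwise it names the producer.
        self.values: List[Optional[int]] = [None] * n_operands
        self.tags: List[Optional[int]] = [None] * n_operands

    def load(self, slot: int, value: Optional[int] = None,
             tag: Optional[int] = None) -> None:
        """Fill a slot with a ready value, or with the tag it must wait for."""
        self.values[slot] = value
        self.tags[slot] = tag

    def snoop(self, cdbs: List[Tuple[int, int]]) -> None:
        """Compare each of the K CDB broadcasts (tag, value) against every slot."""
        for cdb_tag, cdb_value in cdbs:
            for slot, tag in enumerate(self.tags):
                if tag == cdb_tag:
                    self.values[slot] = cdb_value
                    self.tags[slot] = None  # operand is now valid

    def ready(self) -> bool:
        """The instruction may proceed once no slot is waiting on a tag."""
        return all(tag is None for tag in self.tags)


rs = ReservationStation(n_operands=2)
rs.load(0, value=5)            # operand already read from the register file
rs.load(1, tag=7)              # operand still being produced under tag 7
rs.snoop([(3, 99), (7, 42)])   # two CDBs broadcast in the same cycle
```

In this sketch the station with the pending tag 7 captures the value from the second bus and ignores the first, which is the tag-matching behaviour the forwarding paths rely on.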

11
Forwarding and Hazards: Renamed Register Table
  • Configurable table with
    - N source operand addresses, e.g. the address
      to the RF
    - M destination operand address/tag pairs,
      e.g. the address to the RF, and the tag
      (renamed register)
    - K forwarding paths in the form of CDBs, which
      are translated to RF addresses, data, and
      enables.
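
The table's role can be sketched in Python as follows. This is only an illustration under assumptions: the class `RenameTable`, its method names, and the policy of writing the register file only when the broadcast tag is still the latest one for that address are not specified in the slides.

```python
from typing import Dict, List, Optional, Tuple


class RenameTable:
    """Hypothetical renamed register table in front of a register file (RF)."""

    def __init__(self, n_regs: int) -> None:
        self.rf: List[int] = [0] * n_regs
        self.pending: Dict[int, int] = {}  # RF address -> tag of latest producer
        self.next_tag = 0

    def read_source(self, addr: int) -> Tuple[Optional[int], Optional[int]]:
        """Return (value, None) if the RF copy is current, else (None, tag)."""
        tag = self.pending.get(addr)
        if tag is None:
            return (self.rf[addr], None)
        return (None, tag)

    def rename_dest(self, addr: int) -> int:
        """Allocate a fresh tag (renamed register) for a destination address."""
        tag = self.next_tag
        self.next_tag += 1
        self.pending[addr] = tag
        return tag

    def writeback(self, cdb_tag: int,
                  cdb_value: int) -> Tuple[Optional[int], int, bool]:
        """Translate a CDB broadcast into (RF address, data, write enable)."""
        addr = next((a for a, t in self.pending.items() if t == cdb_tag), None)
        if addr is None:
            return (None, cdb_value, False)  # a newer producer owns the register
        del self.pending[addr]
        self.rf[addr] = cdb_value
        return (addr, cdb_value, True)


table = RenameTable(n_regs=4)
old_tag = table.rename_dest(2)   # first in-flight write to r2
new_tag = table.rename_dest(2)   # a later write to r2 supersedes it
```

With two in-flight writes to the same register, only the broadcast carrying `new_tag` would assert the enable in this sketch; the older result still reaches waiting reservation stations over the CDB but does not update the RF.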

12
Director/Conductor Interaction with Forwarding
Components
  • Since some realistic elements will have
    variable latency, scheduling cannot always be
    determined at compile time.
  • Additionally, these components can be used to
    construct a dynamically scheduled processor.
  • Reservation stations must interpret stall /
    occupied signals, and signal the director/conductor.
  • Interrupts, which can't be predicted statically,
    and exceptions can interrupt control flow.

13
Precise Exceptions
  • In an in-order pipelined machine this can be
    easily realized: since results are only
    committed at the end of the pipeline, determine
    there whether the instruction caused an exception,
    update the PC, and flush the pipeline.
  • For out-of-order (OO) machines, this is more
    complicated. The standard technique is to
    implement a reorder buffer and take the exception
    when the instruction commits.
  • For interrupts, just take them when an instruction
    commits.

14
Example of MIPS with full forwarding components
  • For an arithmetic operation the ALU needs both
    operands; for LD/SD it needs only one.
  • The Memory stage needs both operands.
  • This allows an SD to proceed past the E stage
    even if the store data isn't ready yet.
  • RS_D and RS_M have no CDB inputs.
  • RS_R and RS_E each have two.
  • For precise exceptions, we should send a tag
    containing both the virtual and physical
    register.
  • Immediate operands are placed in RS_D.

15
Example of in-order VLIW with forwarding components
  • 3 sub-instructions per packed instruction
  • Each sub-instruction is synchronized with the
    other sub-instructions in the packed instruction.
  • One stage is removed, forcing the effective
    address (EA) to be computed by a previous
    instruction.
  • Any of the 3 outputs can be forwarded to any of
    the 6 inputs in RS_R.

16
Example of out-of-order VLIW with forwarding
components
  • 3 sub-instructions per packed instruction
  • Any of the 3 outputs can be forwarded to either
    of the 2 operands of RS_R.
  • Out-of-order execution and commit.
  • Needs multi-line / multi-issue reservation
    stations (the version I implemented was single
    line, single issue; this is not too difficult to
    do, and could also be implemented with existing
    components).
  • Needs a reorder buffer for in-order commit and
    thereby precise exceptions.

17
Example of out-of-order VLIW with forwarding
components
  • The reorder buffer can be constructed out of
    3-operand, no-output reservation stations.
  • On issue, the first entry is written with
    op/pc/qs.
  • Only the final entry requires all operands. This
    allows instructions to proceed to the last stage
    without outputs being available. They then stall
    in the final entry for outputs. This helps to
    reduce latency on exceptions and promote out of
    order execution.
  • All outputs must be forwarded to all other
    reservation stations since one of their operands
    could have finished execution, but would be
    waiting to commit in the reorder buffer.
  • All execution units must forward data to every
    entry in the reorder buffer.
  • This could be redesigned into a circular queue
    reorder buffer by changing which entry
    instructions are issued to and committed from.
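
The circular-queue variant mentioned in the last bullet can be sketched in Python with a `deque`. This is a simplified illustration, not the design from the slides: the class `ReorderBuffer`, its fields, and the choice to flush every younger entry when the head has faulted are assumptions, and operand forwarding into the entries is not modelled.

```python
from collections import deque
from typing import List, Optional, Tuple


class ReorderBuffer:
    """Hypothetical queue-based reorder buffer: issue at the tail, commit at the head."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries: deque = deque()
        self.next_tag = 0

    def issue(self, pc: int) -> Optional[int]:
        """Allocate an entry in program order; None means the buffer is full (stall)."""
        if len(self.entries) == self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "pc": pc, "done": False, "exc": False})
        return tag

    def complete(self, tag: int, exc: bool = False) -> None:
        """Mark an entry finished, possibly out of order, recording any exception."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["done"], entry["exc"] = True, exc

    def commit(self) -> Tuple[List[int], Optional[int]]:
        """Retire finished entries from the head; stop and flush on an exception."""
        committed: List[int] = []
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exc"]:
                self.entries.clear()       # no later instruction may commit
                return committed, head["pc"]
            committed.append(head["pc"])
        return committed, None


rob = ReorderBuffer(size=4)
t0, t1, t2 = rob.issue(0x100), rob.issue(0x104), rob.issue(0x108)
rob.complete(t1)                 # finishes early but must wait behind t0
first = rob.commit()             # nothing can retire yet
rob.complete(t2, exc=True)
rob.complete(t0)
second = rob.commit()            # retires 0x100 and 0x104, then reports 0x108
```

Because entries leave only from the head, the exception at `0x108` is reported after every earlier instruction has committed and before any later one does, which is the precise-exception property described earlier in the deck.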

18
Deep Pipeline: Motivating Example
  • In some deep pipelines, we will either need to
    stall or have statically scheduled code.
  • Regardless of the forwarding we put into place,
    there must be 3 stalls or fills if the output of
    a MUL is used by the ADD.
  • So we need to be able to extract latencies from
    the structure and export them to either the
    compiler (for code generation) or the Conductor
    (so that it knows when it should stall), e.g.
    generating control logic from the data path.
  • The solution is to build a graph in which each
    edge is marked as either forwarding (green) or
    carrying (red).
  • Forwarding paths travel backwards in the
    pipeline, thus our shortest path algorithm must
    pass through at least one forwarding edge to be
    valid. E.g. for ADD used by MEM, there is a
    straight path of length 3, but it's not valid,
    since it's the same instruction at that time.

19
Algorithm: Shortest Path in DCG
  • stalls = depth(s) + dist(s,d) - depth(d) - 1
  • Here dist(s,d) is the length of the shortest valid
    path in the graph from the source s to the
    destination d, and depth(n) is the depth in the
    pipeline at which reservation station n sits.
  • A path is valid if it goes through one forwarding
    path.
  • We can define a forwarding path from a write-back
    stage to the register file access stage.
  • It is first called as dist(s, d, 0, false)
  • dist(node n, node d, int cur, int valid)
      if ((n == d) && (valid == 1))
        return (cur)
      min = pipeline_depth
      if (valid > 1) return (min)
      if (cur > min) return (min)
      foreach e (@n->edges)
        temp = dist(e->node, d, cur + 1, valid + (e->type == FORWARDING))
        if (temp < min) min = temp
      return (min)
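
The recursion and the stall formula from this slide can be tried out with a short Python sketch. The graph below is a hypothetical four-station pipeline (R, E, M, W) with invented forwarding edges, not the one from the presentation's figures, and the function signatures are assumptions made for illustration.

```python
FORWARDING, CARRYING = "forwarding", "carrying"


def dist(graph, n, d, pipeline_depth, cur=0, valid=0):
    """Shortest path n -> d that crosses exactly one forwarding edge."""
    if n == d and valid == 1:
        return cur
    best = pipeline_depth            # also the result when no valid path is found
    if valid > 1 or cur > best:      # prune: too many forwarding edges, or too long
        return best
    for node, kind in graph[n]:
        temp = dist(graph, node, d, pipeline_depth,
                    cur + 1, valid + (kind == FORWARDING))
        best = min(best, temp)
    return best


def stalls(graph, depth, s, d):
    """stalls = depth(s) + dist(s,d) - depth(d) - 1, as on the slide."""
    return depth[s] + dist(graph, s, d, len(depth)) - depth[d] - 1


# Hypothetical pipeline: carrying edges R -> E -> M -> W, with forwarding
# from E and M back into E, and from W back to the register read stage R.
graph = {
    "R": [("E", CARRYING)],
    "E": [("M", CARRYING), ("E", FORWARDING)],
    "M": [("W", CARRYING), ("E", FORWARDING)],
    "W": [("R", FORWARDING)],
}
depth = {"R": 0, "E": 1, "M": 2, "W": 3}

alu_to_alu = stalls(graph, depth, "E", "E")   # result of E consumed in E
mem_to_alu = stalls(graph, depth, "M", "E")   # result of M consumed in E
alu_to_mem = stalls(graph, depth, "E", "M")   # result of E consumed in M
```

For this assumed graph the direct carrying edge from E to M is rejected because it contains no forwarding edge, so the shortest valid E-to-M path goes through the E-to-E forwarding edge first. Since a pipeline is shallow, the exhaustive recursion is cheap enough to run once per station pair and store in a table.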

20
Example
  • Dist(s,d) and Depth(s) - Depth(d) - 1 (tables not
    reproduced)
  • s = horizontal, d = vertical
  • stalls(s,d) = (Dist(s,d)) + (Depth(s) - Depth(d) - 1)

21
Results
  • If designs are based on architectural components,
    it is easy to determine pipeline structure.
  • The algorithm in its naïve form is slow, but
    pipeline depth will typically be less than 10,
    and it only has to be run once (table lookup).
  • The design is based on a combination of
    Tomasulo's algorithm and virtual register
    renaming.
  • I implemented a configurable (number of operands,
    outputs, and forwarding paths only) reservation
    station.
  • I also implemented a tag array which keeps track
    of the physical to virtual register mapping.
  • I did not simulate any designs.

22
Summary
  • If designs are based on architectural components,
    it is easy to determine pipeline structure, and
    thus export it to the compiler
  • The main architectural component I implemented
    was a generic reservation station which can also
    be viewed as a pipeline register with forwarding.
  • By combining these reservation stations, deep
    pipelines and reorder buffers can be realized.
  • I also implemented a tag array, which maps
    virtual to physical registers

23
Conclusion and Future Work
  • Design with architectural components in the
    MESCAL framework allows the architect to quickly
    explore, simulate, and analyze various
    architectures, and their effect on performance and
    code density.
  • This could also be used by a genetic algorithm,
    just as branch components are used to generate
    accurate branch predictors.
  • The reservation station needs to be adapted to be
    multi-line, feeding multiple functional units, to
    allow for true out-of-order execution (OOE)
    superscalar processor design.
  • There are currently no tools to extract or
    synthesize these components to obtain timing or
    area numbers.
  • A future design methodology might involve writing
    code generators which could take the
    configuration of each component and generate a
    Verilog model, which could be synthesized, or
    even go directly to a netlist.

24
Next Step?
  • Current standard-cell libraries include layout,
    schematic, SPICE parameters, Boolean equivalent,
    etc.
  • A higher-level library could include a simulation
    model and a Verilog code generator.
  • Thus the ASIC design cycle would start with an
    architectural model, which could be used to
    generate Verilog code, and proceed from there
    through a standard ASIC flow.
  • Can this be used to functionally verify the
    correctness of a micro-architecture against an ISA?

25
References
  • Tomasulo, R. M. 1967. An efficient algorithm
    for exploiting multiple arithmetic units. IBM
    Journal of Research and Development 11(1)
    (January).
  • Hennessy, J. L. and D. A. Patterson 1996.
    Computer Architecture: A Quantitative Approach.
    Morgan Kaufmann, San Francisco.