Lecture 6 Scoreboarding II - PowerPoint PPT Presentation

Provided by: mantonm5
1
Lecture 6: Scoreboarding II
CSCE 513 Computer Architecture
  • Topics
  • Dynamic Scheduling
  • Scoreboarding
  • Tomasulo Overview
  • Readings: Appendix Sections A.6-A.8

September 8, 2009
2
Overview
  • Last Time
  • Test 2
  • Apologies for the length
  • Dynamic Scheduling
  • MIPS R4000 Pipeline / Scoreboard
  • Problem 5.4 a-c
  • New
  • Stalls in Diagrams revisited
  • Test 2 Take Home questions
  • Dynamic Scheduling Review
  • Scoreboard scheduling of pipelines
  • References
  • Appendix A.8, Chapter 3

3
Scoreboard pipeline revisited
  • Issue: decode and check for structural hazards
  • Read operands: wait until there is no data hazard, then
    read operands
  • All data hazards are handled by the scoreboard
    mechanism

4
Scoreboard Stages
  • Issue: an instruction is issued when there is
  • No structural hazard for a functional unit
  • No WAW with an instruction in execution
  • Read: an instruction reads its operands when they become
    available (RAW)
  • An operand is available if no earlier instruction is going to
    write it, or if it is currently being written
    (forwarded through the register file)
  • EX: normal execution, except that functional units
    notify the scoreboard on completion
  • Write: an instruction writes its result once all previous
    instructions have read the earlier value (WAR)
  • The scoreboard is updated whenever an instruction
    proceeds to a new stage.
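
The stage conditions above can be sketched as three small predicates. This is a minimal illustration, not the textbook's control logic: the function names, the dictionary `result_status` (register name to the functional unit that will write it), and the set `pending_reads` are all assumptions made for the example.

```python
# Illustrative sketch only: names and data layout are assumptions, not from the slides.

def can_issue(fu_busy, dest, result_status):
    """Issue: the unit is free (no structural hazard) and no active
    instruction is already going to write dest (no WAW)."""
    return (not fu_busy) and result_status.get(dest) is None

def can_read(sources, result_status, own_fu):
    """Read operands: no other unit still has to write a source (RAW).
    An entry naming own_fu is this instruction's own destination, so it is ignored."""
    return all(result_status.get(reg) in (None, own_fu) for reg in sources)

def can_write(dest, pending_reads):
    """Write: no earlier instruction still has to read the old value of dest (WAR)."""
    return dest not in pending_reads

# LD F2,0(R1) is active on Int1; ADDD F6,F2,F4 wants to use Add1.
result_status = {"F2": "Int1"}
print(can_issue(False, "F6", result_status))               # True
result_status["F6"] = "Add1"                               # ADDD issued
print(can_read(["F2", "F4"], result_status, "Add1"))       # False: F2 not yet written
del result_status["F2"]                                    # the load writes F2
print(can_read(["F2", "F4"], result_status, "Add1"))       # True
print(can_write("F6", pending_reads={"F6"}))               # False: an earlier reader of F6 remains
```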

5
Data structures in the scoreboard
  • 1. Instruction status: keeps track of which
    stage an instruction is in.
  • 2. Functional unit status: indicates the state of
    the functional unit (FU). 9 fields for each FU:
  • Busy: indicates whether the unit is busy or not
  • Op: operation to perform in the unit (e.g. add or
    sub)
  • Fi: destination register name
  • Fj, Fk: source register names
  • Qj, Qk: names of the functional units producing regs
    Fj, Fk
  • Rj, Rk: flags indicating when Fj and Fk are ready
  • 3. Register result status: indicates which
    functional unit will write to each register, if
    any.
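
As a rough sketch of how the nine functional-unit fields and the register result status fit together, the Python below fills them in at issue time. The class name `FUStatus`, the helper `issue`, and the unit names `Int1` and `Add1` are assumptions for illustration; instruction status is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FUStatus:
    """The nine per-unit fields listed above (hypothetical Python layout)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units that will produce Fj, Fk (None = no pending writer)
    qk: Optional[str] = None
    rj: bool = False           # True when Fj / Fk are ready to be read
    rk: bool = False

def issue(fu_name, op, fi, fj, fk, fus, result_status):
    """Record an issued instruction in the FU status and register result status."""
    fu = fus[fu_name]
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, fi, fj, fk
    fu.qj = result_status.get(fj)      # look up producers before claiming fi
    fu.qk = result_status.get(fk)
    fu.rj = fu.qj is None
    fu.rk = fu.qk is None
    result_status[fi] = fu_name        # this unit will write fi

fus = {"Int1": FUStatus(), "Add1": FUStatus()}
result_status: Dict[str, str] = {}
issue("Int1", "LD", "F2", "R1", None, fus, result_status)
issue("Add1", "ADDD", "F6", "F2", "F4", fus, result_status)
print(fus["Add1"].qj, fus["Add1"].rj, fus["Add1"].rk)  # Int1 False True
print(result_status)                                   # {'F2': 'Int1', 'F6': 'Add1'}
```

Here the add is waiting on `Int1` for F2 (Rj is false) while F4 is already readable (Rk is true).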
6
Simulator Applet
  • http://www.ecs.umass.edu/ece/koren/architecture/scoreboard/
7
Scoreboard example: pages A-72 to A-76
8
Detailed Scoreboard Pipeline Control Figure A-76
9
Tong (Univ Maryland) Example
  • Alternative example to the one in the text.
  • Assumptions:
  • 2 integer units
  • 2 FP add units
  • 1 FP multiply unit, and
  • 1 FP divide unit
  • Recall the Scoreboard data structures:
  • the instruction status
  • the functional unit status
  • the register status

10
Tong (Univ Maryland) Example
Loop: LD    F2,0(R1)
      ADDD  F6,F2,F4
      MULTD F8,F6,F0
      SUBI  R1,R1,8
      BNEZ  R1,Loop
  • Now we can save some testing and branching for
    large loops by unrolling.
  • Unrolling also helps in scheduling.

Unrolled code by compiler:
Loop: LD    F2,0(R1)
      ADDD  F6,F2,F4
      MULTD F8,F6,F0
      LD    F10,-8(R1)
      ADDD  F12,F10,F4
      MULTD F14,F12,F0
      SUBI  R1,R1,16
      BNEZ  R1,Loop
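
One way to see why unrolling helps scheduling is to check that the two copies of the loop body in the unrolled code touch disjoint destination registers. The sketch below encodes each instruction as a hypothetical `(op, dest, sources)` tuple; the helper names are illustrative assumptions, and memory addresses are ignored.

```python
# Each unrolled copy of the loop body as (op, destination, sources).
iter1 = [("LD", "F2", ["R1"]),
         ("ADDD", "F6", ["F2", "F4"]),
         ("MULTD", "F8", ["F6", "F0"])]
iter2 = [("LD", "F10", ["R1"]),
         ("ADDD", "F12", ["F10", "F4"]),
         ("MULTD", "F14", ["F12", "F0"])]

def written(block):
    return {dest for _, dest, _ in block}

def read(block):
    return {src for _, _, srcs in block for src in srcs}

def independent(a, b):
    """True if neither block writes a register that the other reads or writes."""
    wa, wb = written(a), written(b)
    return not (wa & (wb | read(b))) and not (wb & read(a))

print(independent(iter1, iter2))            # True
print(sorted(read(iter1) & read(iter2)))    # ['F0', 'F4', 'R1'] are only read, never written
```

Because the only shared registers are read-only within the body, the two copies can overlap in the pipeline as far as the functional units allow.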
11
Tong's Example: Big Picture
Modifications needed? Of course! Issue, Read,
2nd FP adder, 2nd Integer unit
12
Tong's Example, continued
13
Functional unit status
  • Busy: indicates whether the unit is busy or not
  • Op: operation to perform in the unit (e.g. add or sub)
  • Fi: destination register name
  • Fj, Fk: source register names
  • Qj, Qk: names of the functional units producing regs Fj, Fk
  • Rj, Rk: flags indicating when Fj and Fk are ready
14
Register result status
  • Register result status: indicates which
    functional unit will write to each register, if
    any

15
Tong's Example at completion of first add
16
Tong's Example at completion of first add
17
Tong's Example: Notes on status
  • What was the state just prior to completion?
  • Rj No,
  • During next cycle Add2 will finish up
  • Int units are not occupied, but have to wait
    since the next un-issued instruction is the 2nd
    multiply
  • Counting Cycles (just focusing on Execute stage)
  • Load1 - ? Assume 4
  • Add (4)
  • Loop unrolling:
  • No loop-carried dependency
  • Therefore both iterations could proceed in
    parallel if there were a second multiply unit.

18
Review of Data Hazards
  • Name dependences: two instructions use the
    same register or memory location (the name), but
    no data flows between them
  • Antidependence between instruction i and a later
    instruction j occurs when instruction j writes a
    register or memory location that instruction i
    reads
  • Output dependence: instructions i and j write
    the same register or memory location
  • Summary of Hazards
  • RAW (read after write): true data dependence
  • WAW (write after write): output dependence
  • WAR (write after read): antidependence
19
Scoreboard complications
  • Out-of-order completion => WAR, WAW hazards
  • WAR: the instruction is stalled in the WB stage
    until previous instructions have read the operand
  • WAW: the instruction is stalled in the Issue stage
    until the previous instruction has written its
    result

The scoreboard keeps track of dependencies and the state
of operations
20
Review Summary
  • Key idea of Scoreboard: allow instructions behind a
    stall to proceed (Decode => Issue instr and read
    operands)
  • Enables out-of-order execution => out-of-order
    completion
  • ID stage checks for both structural and data
    dependencies

21
http://www.ecs.umass.edu/ece/koren/architecture/Tomasulo/AppletTomasulo.html
22
Web References
  • http://www.ecs.umass.edu/ece/koren/architecture/Tomasulo/AppletTomasulo.html
  • http://www.icsa.informatics.ed.ac.uk/cgi-bin/hase/tomasulo.pl?tom-t.html,contents.html,menu.html
  • http://www.cs.wisc.edu/markhill/conference-talk.html

23
Another Dynamic Algorithm: Tomasulo Algorithm
  • For IBM 360/91, about 3 years after CDC 6600
    (1966)
  • Goal: high performance without special compilers
  • Differences between IBM 360 and CDC 6600 ISA:
  • IBM has only 2 register specifiers/instr vs. 3 in
    CDC 6600
  • IBM has 4 FP registers vs. 8 in CDC 6600
  • Why study? Led to Alpha 21264, HP 8000, MIPS
    10000, Pentium II, PowerPC 604, ...

24
Tomasulo Algorithm vs. Scoreboard
  • Control and buffers distributed with Function Units
    (FU) vs. centralized in scoreboard
  • FU buffers, called reservation stations, hold
    pending operands
  • Registers in instructions are replaced by values or
    pointers to reservation stations (RS); this is called
    register renaming
  • Renaming avoids WAR, WAW hazards
  • More reservation stations than registers, so can
    do optimizations compilers can't
  • Results go to FUs from RSs, not through registers,
    over a Common Data Bus that broadcasts results to
    all FUs
  • Loads and Stores are treated as FUs with RSs as well
  • Integer instructions can go past branches,
    allowing FP ops beyond the basic block in the FP queue

25
Tomasulo Organization
[Figure: block diagram with FP Registers, FP Op Queue, Load Buffer, Store Buffer, FP Add Res. Station, FP Mul Res. Station, and the Common Data Bus]
26
Reservation Station Components
  • Op: operation to perform in the unit (e.g., + or
    -)
  • Vj, Vk: values of the source operands
  • Store buffers have a V field, the result to be stored
  • Qj, Qk: reservation stations producing the source
    registers (value to be written)
  • Note: no ready flags as in Scoreboard; Qj,Qk=0 =>
    ready
  • Store buffers only have Qi for the RS producing the
    result
  • Busy: indicates the reservation station or FU is
    busy
  • Register result status: indicates which
    functional unit will write each register, if one
    exists. Blank when there are no pending instructions that
    will write that register.
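
A reservation station entry with these fields might look like the sketch below. The class name, the `ready` method, and the use of `None` in place of the slide's "Qj,Qk=0" are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    """One reservation station entry (hypothetical Python layout)."""
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # source operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # producing station; None plays the role of Q = 0
    qk: Optional[str] = None

    def ready(self) -> bool:
        """No separate ready flags: the entry is ready when both Q fields are empty."""
        return self.busy and self.qj is None and self.qk is None

add1 = ReservationStation(busy=True, op="ADDD", vk=2.5, qj="Load1")
print(add1.ready())          # False: still waiting for Load1
add1.vj, add1.qj = 1.0, None # Load1's value arrives
print(add1.ready())          # True
```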

27
Three Stages of Tomasulo Algorithm
  • 1. Issue: get instruction from FP Op Queue
  • If a reservation station is free (no structural
    hazard), control issues the instr and sends operands
    (renames registers).
  • 2. Execution: operate on operands (EX)
  • When both operands are ready, execute; if not
    ready, watch the Common Data Bus for the result
  • 3. Write result: finish execution (WB)
  • Write on the Common Data Bus to all awaiting units;
    mark the reservation station available
  • Normal data bus: data + destination ("go
    to" bus)
  • Common data bus: data + source ("come from" bus)
  • 64 bits of data + 4 bits of Functional Unit
    source address
  • Write if it matches the expected Functional Unit
    (the one that produces the result)
  • Does the broadcast
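
The write-result step, where a value and its source tag are broadcast and every waiting unit compares the tag, can be sketched as follows. The dictionary layout, the function name `broadcast`, and the station names are assumptions for illustration; timing and bus arbitration are not modeled.

```python
def broadcast(tag, value, stations, register_status, registers):
    """Put (source tag, value) on the Common Data Bus: every waiting station
    and register that expects `tag` captures the value."""
    for rs in stations.values():
        if rs["Qj"] == tag:
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:
            rs["Vk"], rs["Qk"] = value, None
    for reg, producer in list(register_status.items()):
        if producer == tag:
            registers[reg] = value
            del register_status[reg]      # blank: no pending writer
    if tag in stations:
        stations[tag]["Busy"] = False     # mark the producing station available

stations = {
    "Load1": {"Busy": True, "Vj": None, "Vk": None, "Qj": None, "Qk": None},
    "Add1":  {"Busy": True, "Vj": None, "Vk": 2.0, "Qj": "Load1", "Qk": None},
}
register_status = {"F2": "Load1", "F6": "Add1"}
registers = {"F4": 2.0}

broadcast("Load1", 3.5, stations, register_status, registers)
print(stations["Add1"]["Vj"], stations["Add1"]["Qj"])  # 3.5 None
print(registers["F2"], register_status)                # 3.5 {'F6': 'Add1'}
```

After the broadcast, `Add1` holds both operand values and can begin execution without reading the register file.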

28
Tomasulo Example: Cycle 0