Title: Lecture 6 Scoreboarding II
1Lecture 6 Scoreboarding II
CSCE 513 Computer Architecture
- Topics
- Dynamic Scheduling
- Scoreboarding
- Tomasulo Overview
- Readings: Appendix Sections A.6-A.8
September 8, 2009
2Overview
- Last Time
- Test 2
- Apologies for the length
- Dynamic Scheduling
- MIPS R4000 Pipeline / Scoreboard
- Problem 5.4 a-c
- New
- Stalls in Diagrams revisited
- Test 2 Take Home questions
- Dynamic Scheduling Review
- Scoreboard scheduling of pipelines
- References
- Appendix A.8, Chapter 3
3Scoreboard pipeline revisited
- Issue: decode and check for structural hazards
- Read operands: wait until no data hazard, then read operands
- All data hazards are handled by the scoreboard mechanism
4Scoreboard Stages
- Issue: instruction is issued when
- No structural hazard for a functional unit
- No WAW with an instruction in execution
- Read: instruction reads operands when they become available (RAW)
- Operand is available if no instruction is going to write it, or if it is currently being written (forward through register file)
- EX: normal execution, except functional units notify the scoreboard on completion
- Write: instruction writes when all previous instructions have read the earlier value
- The scoreboard is updated when an instruction proceeds to a new stage.
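The stage conditions above can be sketched as three small checks. This is only an illustrative sketch, not the text's control table: the `FU` class, the function names, and the example register assignments are assumptions made for the example. It assumes Rj/Rk mean "operand ready but not yet read", and that `result` maps a register name to the unit that will write it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FU:
    """One functional-unit entry (hypothetical, reduced to the fields used here)."""
    name: str
    busy: bool = False
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    rj: bool = False           # Fj is ready and has not been read yet
    rk: bool = False           # Fk is ready and has not been read yet


def can_issue(fu: FU, dest: str, result: Dict[str, str]) -> bool:
    # Issue: unit is free (no structural hazard) and nobody else
    # is going to write the destination register (no WAW).
    return not fu.busy and dest not in result


def can_read(fu: FU) -> bool:
    # Read operands: both sources are available (RAW resolved).
    return fu.rj and fu.rk


def can_write(fu: FU, units: List[FU]) -> bool:
    # Write: no other busy unit still has to read the old value
    # of this unit's destination register (WAR).
    for other in units:
        if other is fu or not other.busy:
            continue
        if (other.fj == fu.fi and other.rj) or (other.fk == fu.fi and other.rk):
            return False
    return True


# Assumed situation: Add still has to read F2, Div wants to overwrite F2.
add = FU("Add", busy=True, fi="F6", fj="F2", fk="F4", rj=True, rk=True)
div = FU("Div", busy=True, fi="F2", fj="F0", fk="F8")
result = {"F6": "Add", "F2": "Div"}

war_blocked = not can_write(div, [add, div])     # Div must wait for Add's read
waw_blocked = not can_issue(FU("Mult"), "F6", result)
add.rj = add.rk = False                          # Add has now read its operands
div_may_write = can_write(div, [add, div])
```

In this sketch the WAR stall disappears as soon as the reader clears its Rj/Rk flags, which mirrors the "writes when all previous instructions have read the earlier value" rule.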
5Data structures in the scoreboard
- 1. Instruction status: keeps track of which stage an instruction is in.
- 2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each FU
- Busy: indicates whether the unit is busy or not
- Op: operation to perform in the unit (e.g. add or sub)
- Fi: destination register name
- Fj, Fk: source register names
- Qj, Qk: names of the functional units producing regs Fj, Fk
- Rj, Rk: flags indicating when Fj and Fk are ready
- 3. Register result status: indicates which functional unit will write to each register, if any.
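As a rough illustration of how the three tables fit together, the sketch below fills them in when an instruction issues. The names `FUStatus`, `issue`, `fus`, `result`, and `stage` are hypothetical, and `None` stands in for a blank table entry.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """Functional unit status: the 9 fields listed above."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None   # unit producing Fj, None if no pending writer
    qk: Optional[str] = None   # unit producing Fk, None if no pending writer
    rj: bool = False
    rk: bool = False


def issue(name: str, op: str, dest: str, src1: Optional[str],
          src2: Optional[str], fus: Dict[str, FUStatus],
          result: Dict[str, str]) -> None:
    """Bookkeeping when an instruction is issued to unit `name`."""
    fu = fus[name]
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, dest, src1, src2
    # Look up pending producers in the register result status table.
    fu.qj, fu.qk = result.get(src1), result.get(src2)
    # A source is ready exactly when no unit is still going to write it.
    fu.rj, fu.rk = fu.qj is None, fu.qk is None
    result[dest] = name


fus = {"Integer": FUStatus(), "Add": FUStatus()}
result: Dict[str, str] = {}        # register result status
stage: Dict[str, str] = {}         # instruction status

issue("Integer", "LD", "F2", "R1", None, fus, result)
stage["LD F2,0(R1)"] = "issue"
issue("Add", "ADDD", "F6", "F2", "F4", fus, result)
stage["ADDD F6,F2,F4"] = "issue"
```

After these two issues, the Add entry records that F2 is still being produced by the Integer unit (Qj set, Rj false), while F4 is ready.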
6Simulator Applet
http://www.ecs.umass.edu/ece/koren/architecture/scoreboard/
7Scoreboard example pages A-72 to A-76
8Detailed Scoreboard Pipeline Control Figure A-76
9Tong (Univ Maryland) Example
- Alternative example to one in text.
- Assumptions
- 2 integer units
- 2 FP add units
- 1 FP multiply unit, and
- 1 FP divide unit
- Recall the Scoreboard data structures
- instruction status
- the functional unit status
- the register status
10Tong (Univ Maryland) Example
- Loop
- LD F2,0(R1)
- ADDD F6,F2,F4
- MULTD F8,F6,F0
- SUBI R1,R1,8
- BNEZ R1,Loop
- Now we can save some testing and branching for large loops by unrolling.
- Unrolling also helps in scheduling.
Unrolled code by compiler:
Loop: LD F2,0(R1)
ADDD F6,F2,F4
MULTD F8,F6,F0
LD F10,-8(R1)
ADDD F12,F10,F4
MULTD F14,F12,F0
SUBI R1,R1,16
BNEZ R1,Loop
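The saving in "testing and branching" can be made concrete by counting dynamic instructions. The sketch below is a back-of-the-envelope calculation, not a measurement: the helper `dynamic_instructions` and the figure of 1000 array elements are assumptions, and the element count is assumed to be a multiple of the unroll factor.

```python
def dynamic_instructions(elements: int, body: int, overhead: int, unroll: int) -> int:
    """Instructions executed for `elements` items with the loop unrolled `unroll` times.

    body     -- useful instructions per element (LD, ADDD, MULTD -> 3)
    overhead -- loop maintenance per trip (SUBI, BNEZ -> 2)
    """
    trips = elements // unroll          # assumes elements % unroll == 0
    return trips * (body * unroll + overhead)


rolled = dynamic_instructions(1000, body=3, overhead=2, unroll=1)
unrolled = dynamic_instructions(1000, body=3, overhead=2, unroll=2)
saved = rolled - unrolled
```

Under these assumptions the original loop executes 5 instructions per element and the unrolled loop 8 per two elements, so the overhead instructions are halved.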
11Tong's Example Big Picture
Modifications needed? Of course! Issue, Read, 2nd FP adder, 2nd Integer unit
12Tong's Example continued
13Functional unit status
- Busy: indicates whether the unit is busy or not
- Op: operation to perform in the unit (e.g. add or sub)
- Fi: destination register name
- Fj, Fk: source register names
- Qj, Qk: names of the functional units producing regs Fj, Fk
- Rj, Rk: flags indicating when Fj and Fk are ready
14Register result status
- Register result status: indicates which functional unit will write to each register, if any
15Tong's Example at completion of first add
16Tong's Example at completion of first add
17Tong's Example Notes on status
- What was the state just prior to completion?
- Rj No,
- During next cycle Add2 will finish up
- Int units are not occupied, but have to wait since the next un-issued instruction is the 2nd multiply
- Counting cycles (just focusing on Execute stage)
- Load1: ? Assume 4
- Add (4)
- Loop unrolling
- No loop-carried dependency
- Therefore both iterations could proceed in parallel if there were a second multiply unit.
18Review of Data Hazards
- Name dependence: two instructions use the same register or memory location (the name), but no data flows between them
- Antidependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads (can cause a WAR hazard)
- Output dependence: instructions i and j write the same register or memory location (can cause a WAW hazard)
- Summary of Hazards
- RAW (read after write)
- WAW (write after write)
- WAR (write after read)
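The three hazard types can be told apart just by comparing destination and source registers of two instructions in program order. The following is a minimal sketch under that assumption; the `Instr` tuple and the `hazards` function are hypothetical names, and memory locations and instructions without a destination are ignored.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    text: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(i: Instr, j: Instr) -> Set[str]:
    """Hazards that later instruction j has with earlier instruction i."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")   # j reads what i writes (true data dependence)
    if j.dest in i.srcs:
        found.add("WAR")   # j writes what i reads (antidependence)
    if i.dest == j.dest:
        found.add("WAW")   # both write the same register (output dependence)
    return found


ld = Instr("LD F2,0(R1)", "F2", ("R1",))
addd = Instr("ADDD F6,F2,F4", "F6", ("F2", "F4"))

raw_case = hazards(ld, addd)   # ADDD needs the loaded F2
war_case = hazards(addd, ld)   # next iteration's LD overwrites F2 that ADDD reads
waw_case = hazards(ld, ld)     # two iterations' LDs both write F2
```

The WAR and WAW cases here come only from reusing the name F2 across iterations, which is why the unrolled code with F10 avoids them.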
19Scoreboard complications
- Out-of-order completion => WAR, WAW hazards
- WAR: instruction is stalled in the WB stage until a previous instruction has read the operand
- WAW: instruction is stalled in the Issue stage until a previous instruction has written its result
Scoreboard keeps track of dependencies and state of operations
20Review Summary
- Key idea of Scoreboard: allow instructions behind a stall to proceed (Decode => Issue instr and read operands)
- Enables out-of-order execution => out-of-order completion
- ID stage checked both for structural and data dependencies
21http://www.ecs.umass.edu/ece/koren/architecture/Tomasulo/AppletTomasulo.html
22Web References
- http://www.ecs.umass.edu/ece/koren/architecture/Tomasulo/AppletTomasulo.html
- http://www.icsa.informatics.ed.ac.uk/cgi-bin/hase/tomasulo.pl?tom-t.html,contents.html,menu.html
- http://www.cs.wisc.edu/markhill/conference-talk.html
23Another Dynamic Algorithm Tomasulo Algorithm
- For IBM 360/91, about 3 years after CDC 6600 (1966)
- Goal: high performance without special compilers
- Differences between IBM 360 and CDC 6600 ISA
- IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
- IBM has 4 FP registers vs. 8 in CDC 6600
- Why study? Led to Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, ...
24Tomasulo Algorithm vs. Scoreboard
- Control and buffers distributed with Function Units (FU) vs. centralized in scoreboard
- FU buffers, called "reservation stations", hold pending operands
- Registers in instructions replaced by values or pointers to reservation stations (RS), called register renaming
- Avoids WAR, WAW hazards
- More reservation stations than registers, so can do optimizations compilers can't
- Results go to FU from RS, not through registers, over Common Data Bus that broadcasts results to all FUs
- Loads and Stores treated as FUs with RSs as well
- Integer instructions can go past branches, allowing FP ops beyond basic block in FP queue
25Tomasulo Organization
[Figure: Tomasulo organization, showing FP Registers, FP Op Queue, Load Buffer, Store Buffer, Common Data Bus, FP Add Res. Station, FP Mul Res. Station]
26Reservation Station Components
- Op: operation to perform in the unit (e.g., + or -)
- Vj, Vk: values of source operands
- Store buffers have a V field, the result to be stored
- Qj, Qk: reservation stations producing source registers (value to be written)
- Note: no ready flags as in Scoreboard; Qj, Qk = 0 => ready
- Store buffers only have Qi for RS producing result
- Busy: indicates reservation station or FU is busy
- Register result status: indicates which functional unit will write each register, if one exists. Blank when there are no pending instructions that will write that register.
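To make the V/Q fields concrete, here is a small sketch of how a reservation station might be filled at issue time, which is where renaming happens. The class `RS`, the function names, the station names (`Load1`, `Add1`, `Mult1`), and the register values are all assumptions for illustration; `None` in a Q field plays the role of Qj, Qk = 0.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RS:
    """Hypothetical reservation station entry."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of first operand, if known
    vk: Optional[float] = None   # value of second operand, if known
    qj: Optional[str] = None     # station producing first operand
    qk: Optional[str] = None     # station producing second operand


def issue(rs: RS, op: str, dest: str, src1: str, src2: str,
          regs: Dict[str, float], reg_status: Dict[str, str]) -> None:
    """Copy ready operand values, or record which station will produce them."""
    rs.busy, rs.op = True, op
    if src1 in reg_status:
        rs.qj = reg_status[src1]     # pointer to producer (renamed source)
    else:
        rs.vj = regs[src1]           # value copied, register no longer needed
    if src2 in reg_status:
        rs.qk = reg_status[src2]
    else:
        rs.vk = regs[src2]
    reg_status[dest] = rs.name       # destination is renamed to this station


def ready(rs: RS) -> bool:
    return rs.busy and rs.qj is None and rs.qk is None


regs = {"F0": 2.0, "F2": 0.0, "F4": 1.5, "F6": 0.0, "F8": 0.0}
reg_status = {"F2": "Load1"}         # assumed: a load to F2 is still pending
add1, mult1 = RS("Add1"), RS("Mult1")
issue(add1, "ADDD", "F6", "F2", "F4", regs, reg_status)
issue(mult1, "MULTD", "F8", "F6", "F0", regs, reg_status)
```

Because operands are captured as values or station names at issue, a later write to F4 or F0 could not disturb these entries, which is how the scheme sidesteps WAR hazards.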
27Three Stages of Tomasulo Algorithm
- 1. Issue: get instruction from FP Op Queue
- If reservation station free (no structural hazard), control issues instr and sends operands (renames registers).
- 2. Execution: operate on operands (EX)
- When both operands ready, then execute; if not ready, watch Common Data Bus for result
- 3. Write result: finish execution (WB)
- Write on Common Data Bus to all awaiting units; mark reservation station available
- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
- 64 bits of data + 4 bits of Functional Unit source address
- Write if matches expected Functional Unit (produces result)
- Does the broadcast
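The "come from" nature of the Common Data Bus can be sketched as a broadcast of a (source tag, value) pair that every waiting unit compares against its own Q fields. This is an illustrative sketch only: the `RS` class, the `broadcast` function, and the station names and values are assumptions, and a string tag stands in for the 4-bit source address.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RS:
    """Hypothetical reservation station entry (reduced to the fields used here)."""
    name: str
    busy: bool = False
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None
    qk: Optional[str] = None


def broadcast(tag: str, value: float, stations: List[RS],
              regs: Dict[str, float], reg_status: Dict[str, str]) -> None:
    """Write result: put (tag, value) on the bus; every matching waiter grabs it."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if rs.name == tag:
            rs.busy = False              # producing station becomes available
    # Registers write only if they still expect this producer.
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            regs[reg] = value
            del reg_status[reg]


load1 = RS("Load1", busy=True)
add1 = RS("Add1", busy=True, qj="Load1", vk=1.5)
regs = {"F2": 0.0, "F6": 0.0}
reg_status = {"F2": "Load1", "F6": "Add1"}

broadcast("Load1", 3.0, [load1, add1], regs, reg_status)
add1_ready = add1.qj is None and add1.qk is None
```

Note that the waiting Add1 station receives the value directly from the bus in the same step as the register file, rather than reading it back through the register.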
28Tomasulo Example Cycle 0