Title: CS 2200 Lecture 05a Datapaths Part 1
1 CS 2200 Lecture 05a: Datapaths Part 1
- (Lectures based on the work of Jay Brockman,
Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
Ken MacKenzie, Richard Murphy, and Michael
Niemier)
2 Five classic components (of an architecture)
(Remember this???)
3 Let's take a closer look...
[Figure: the Processor, containing Control and Datapath]
4 Review: Digital Logic
- Combinational logic: gates, ROMs
- (how it's implemented)
- tri-state buffer: 0/1 or Z (unconnected)
- Sequential logic: edge-triggered flip-flops
- (a.k.a. memory), stores state
- all state stored in edge-triggered flip-flops
- single clock: exactly one clock goes to every flip-flop
- Finite State Machines (FSMs)
- Moore and Mealy forms
- state-transition diagram
- state-transition table
An FSM is a combination of combinational logic and sequential logic (used to build and control real and useful things)
5 Digital Logic reading?
- Patterson & Hennessy, Appendix B
- nice quick read
- my old CS 3760 class notes (see my web page)
- your ECE 2030 book/notes
6 Today
- recipes for computation
- combinational
- sequential: single-bus datapath + control
- single-bus datapath for the LC-2200
- slow but straightforward
- used in Project 1
7 Computation
- We've designed computation elements
- add/subtract, and/or/xor/not
- could do multiply, divide?
- How do you build bigger computations?
[Figure: computation elements with 32-bit inputs and outputs]
8 An adder in Boolean gates
- This is just 1 bit, but obviously we can scale it up
9 Example
- y = a + bx + cx^2
- all numbers (x, y) and constants (a, b, c) are 32-bit integers
[Figure: black box f(x) with input x and output y]
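10 Example: combinational implementation
[Figure: combinational circuit with inputs x, a, b, c; multipliers form bx and cx^2, adders combine them with a to produce y]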
11 Combinational Example: Timing
- Suppose ADD requires 10 ns and MUL 100 ns
- Tpd of the whole circuit?
[Figure: the combinational circuit, inputs x, a, b, c and output y]
12 Combinational Circuit
- Delay is minimum possible
- Tpd = 210 ns
- imposed by dataflow of the desired computation!
- Circuit cost is maximum
- two adders
- three multipliers
- No flexibility
- equation is hardwired in the circuit topology
- maybe the constants (a, b, c) could be set by switches
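The 210 ns figure can be checked by walking the dataflow graph. The sketch below is only an illustration: it assumes the adders are grouped as (a + bx) + cx^2, which is the grouping consistent with the slide's 210 ns, and the function name is made up for this example.

```python
# Delays quoted on the slide: ADD = 10 ns, MUL = 100 ns.
ADD_NS = 10
MUL_NS = 100

def combinational_tpd(add_ns: int = ADD_NS, mul_ns: int = MUL_NS) -> int:
    """Worst-case delay of y = a + b*x + c*x*x as a pure dataflow circuit.

    Assumed wiring (not stated on the slide): (a + b*x) + (c * (x*x)).
    """
    t_bx = mul_ns                      # b*x: one multiplier
    t_xx = mul_ns                      # x*x: one multiplier, in parallel with b*x
    t_cxx = t_xx + mul_ns              # c*(x*x): second multiplier in series
    t_a_bx = t_bx + add_ns             # a + b*x: adder waits for b*x
    return max(t_a_bx, t_cxx) + add_ns # final adder waits for the slower input

print(combinational_tpd())  # 210
```

The critical path runs through the two series multipliers and the final adder: 100 + 100 + 10 = 210 ns.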
13 Sequential Circuit
- A sequential circuit would let us re-use functional units and save hardware cost
- But how to wire them up??
[Figure: one functional unit of each type, plus some storage]
14 A recipe: the single-bus datapath
(y = a + bx + cx^2)
- One common bus (32 bits wide)
15 A recipe: the single-bus datapath
(y = a + bx + cx^2)
- One common bus (32 bits wide)
- One of each type of functional unit (MUL, ADD); each unit's output is connected to the bus via tri-state buffers (enables DrMUL, DrADD)
16 A recipe: the single-bus datapath
(y = a + bx + cx^2)
- One common bus (32 bits wide)
- One of each type of functional unit (MUL, ADD); each unit's output is connected to the bus via tri-state buffers
- Inputs connected to the bus via registers (i.e. some sequential logic): A, B, C, D with load enables LdA, LdB, LdC, LdD
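17 A recipe: the single-bus datapath
(y = a + bx + cx^2)
- One common bus (32 bits wide)
- One of each type of functional unit (MUL, ADD); each unit's output is connected to the bus via tri-state buffers (e.g. DrADD)
- Inputs connected to the bus via registers: A, B, C, D and the output register Y (load enables LdA, LdB, LdC, LdD, LdY)
- Other, e.g. constants and I/O: input x (driven onto the bus by DrX), output y, and a ROM addressed by a 2-bit romaddr
ROM contents: 0: a, 1: b, 2: c, 3: unused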
18 A recipe: the single-bus datapath
y = a + bx + cx^2
Ex. A = 2, B = 4, C = 6, x = 2
Part 1: C ← x, D ← x
Part 2: D ← x, C ← 6
Part 3: A ← Cx^2
Part 4: C ← 4 (D already x)
Part 5: B ← Bx
Part 6: B ← Bx + Cx^2 (or RegA + RegB)
Part 7: A ← A
Part 8: Y ← RegA + RegB
19 Big Picture
- Fetch the instruction from memory
- Decode the instruction and decide what to do
- Execute the instruction
- Repeat.
- What hardware do we need to
- Add 2 registers together and store the result in a 3rd?
- Let's look at the LC-2200
- (or alternatively the MIPS)
- (we'll do them both, but it's your choice as to which is first)
20 LC-2200 datapath
Let's look at a generic instruction. We start with the PC, which stores the address of the next instruction to be executed.
[Figure: PC]
21 LC-2200 datapath
PC indexes memory; data-out = instruction encoding
[Figure: PC driving the memory address]
22 LC-2200 datapath
We store the output of memory in IR. (Side note: why do we need to do this? Ideally we could use bits of address to set ALU functions, etc.)
[Figure: PC, memory, and IR]
23 LC-2200 Instruction Types
All are encoded in single, 32-bit words.
R-type (Register-Register):
bits 31..28: OP
bits 27..24: RA
bits 23..20: RB
bits 19..4: unused
bits 3..0: RD
- How many possible opcodes? How many registers?
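To make the R-type layout concrete, here is a minimal sketch that packs and unpacks the four fields using the bit positions given for this format (OP in 31..28, RA in 27..24, RB in 23..20, RD in 3..0). The function names are invented for illustration; they are not part of any LC-2200 tool. With 4-bit fields there are 2^4 = 16 possible opcodes and 16 addressable registers.

```python
def encode_rtype(op: int, ra: int, rb: int, rd: int) -> int:
    """Pack four 4-bit fields into a 32-bit R-type word (bits 19..4 unused)."""
    for field in (op, ra, rb, rd):
        if not 0 <= field <= 0xF:
            raise ValueError("each field must fit in 4 bits")
    return (op << 28) | (ra << 24) | (rb << 20) | rd

def decode_rtype(word: int) -> dict:
    """Split a 32-bit R-type word back into its fields."""
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 28) & 0xF,  # bits 31..28
        "ra": (word >> 24) & 0xF,  # bits 27..24
        "rb": (word >> 20) & 0xF,  # bits 23..20
        "rd": word & 0xF,          # bits 3..0
    }

word = encode_rtype(op=0b0000, ra=1, rb=2, rd=3)  # opcode 0000, sources 1 and 2, dest 3
print(hex(word))           # 0x1200003
print(decode_rtype(word))
```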
24 LC-2200 datapath
[Figure: PC, IR, and the register file (16 x 32 bits; ports Din, Dout, WrREG; 4-bit regno taken from the IR)]
Strip off the bits that would be used to index the register file (RA, RB source registers are always encoded in the same place).
What if the instruction is another type? (It's OK: random un-needed registers will still be read, but not used because of control signals set by the opcode.)
25 LC-2200 datapath
[Figure: PC, IR, and the register file (16 x 32 bits), with register A (load enable LdA) on Dout]
1st value is read and stored in a temporary register, A (if these are seen as unconditional inputs to the ALU)
26 LC-2200 datapath
[Figure: PC, IR, and the register file (16 x 32 bits), with registers A and B (load enable LdB) on Dout]
2nd value is read and stored in a temporary register, B (if these are seen as unconditional inputs to the ALU)
27 LC-2200 datapath
[Figure: PC, IR, register file (16 x 32 bits), and registers A and B (LdA, LdB) feeding an ALU with a 2-bit func input]
opcodes: add 0000, nand 0001, addi 0010, lw 0011, sw 0100, beq 0101, jalr 0110, halt 0111
ALU func: 00 ADD, 01 NAND, 10 A - B, 11 A + 1
(func could be taken from the opcode)
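The four ALU functions can be modeled in a few lines. This is a sketch under two assumptions: that the operator lost from the `11` entry is a plus (A + 1, which matches the later use of the ALU to increment the PC), and that results wrap to 32 bits. The name `alu` is just for illustration.

```python
MASK32 = 0xFFFFFFFF

def alu(func: int, a: int, b: int) -> int:
    """Model of the 2-bit-func ALU: 00 ADD, 01 NAND, 10 A - B, 11 A + 1."""
    if func == 0b00:
        result = a + b
    elif func == 0b01:
        result = ~(a & b)
    elif func == 0b10:
        result = a - b
    elif func == 0b11:
        result = a + 1          # b is ignored for this function
    else:
        raise ValueError("func must be a 2-bit value")
    return result & MASK32      # wrap to 32 bits (assumed behavior)

print(alu(0b00, 2, 3))          # 5
print(hex(alu(0b10, 3, 5)))     # 0xfffffffe, i.e. -2 in two's complement
```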
28 Questions
Did you understand what we just did?
Can someone show me how an add works? (Draw what happens, using the bits in the PC and registers as a starting point.)
29 Consider the LW instruction: lw s0, 4(s1)
lw = 0011
RB ← MEM[RA + Offset]
I-type format:
bits 31..28: OP
bits 27..24: RA
bits 23..20: RB
bits 19..0: immediate (20-bit signed)
[Figure: PC, IR, register file (16 x 32 bits), registers A and B, and the ALU (00 ADD, 01 NAND, 10 A - B, 11 A + 1); annotation: this is where s1 goes]
30 Consider the LW instruction: lw s0, 4(s1)
lw = 0011
RB ← MEM[RA + Offset]
The offset (4) is encoded as an immediate value: bits 19..0, 20-bit signed.
Explain sign extending (we'll see with MIPS): can't just send 100 (base 2); would get garbage otherwise.
[Figure: PC, IR, register file (16 x 32 bits), registers A and B, the ALU, and the OP / RA / RB / immediate instruction format]
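Sign extension of the 20-bit immediate can be sketched as follows. The helper name `sign_extend_20` is hypothetical; the only facts taken from the slides are that the immediate is 20 bits, signed, and that the datapath is 32 bits wide.

```python
def sign_extend_20(imm20: int) -> int:
    """Widen a 20-bit two's-complement value to 32 bits.

    If bit 19 (the sign bit) is set, the upper 12 bits are filled with 1s;
    otherwise they are filled with 0s.
    """
    imm20 &= 0xFFFFF                 # keep only IR19..0
    if imm20 & 0x80000:              # sign bit set: negative value
        return imm20 | 0xFFF00000
    return imm20

print(hex(sign_extend_20(0b100)))    # 0x4: positive offsets are unchanged
print(hex(sign_extend_20(0xFFFFC)))  # 0xfffffffc: 20-bit -4 becomes 32-bit -4
```

Without this step, a negative offset such as -4 would reach the ALU as the large positive number 0x000FFFFC and the computed address would be garbage.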
31 Consider the LW instruction: lw s0, 4(s1)
[Figure: PC, MAR, IR, register file (16 x 32 bits), registers A and B, and the ALU (00 ADD, 01 NAND, 10 A - B, 11 A + 1)]
Used control code 10
32 Consider the LW instruction: lw s0, 4(s1)
- MAR: memory address register (indexes memory)
- need destination register
- common (i.e. with MIPS) to use the ALU to increase the PC while decoding, etc.
[Figure: PC, MAR, IR, register file (16 x 32 bits), registers A and B, and the ALU (00 ADD, 01 NAND, 10 A - B, 11 A + 1)]
33 Making it more real...
- This is getting very complicated!!!
- For reasons involving implementation details, the simplest (lowest cost, lowest performance) technique is to use a single bus to connect all the various functional units
(Big jump from this to the next example: PC + 1 or PC + 4)
34 Bus
Use a bus instead of HW for everything
Pro: simpler HW. Con: timing protocols/contention
35 Bus: only one functional unit at a time can drive the bus
36 Bus: any (and all) functional units can access the bus
[Figure: four functional units attached to a shared bus]
37 Questions?
38 LC-2200 Datapath (in terms of a bus structure)
[Figure: a single 32-bit bus connecting the units below]
- A, B: ALU input registers (load enables LdA, LdB)
- ALU (2-bit func): 00 ADD, 01 NAND, 10 A - B, 11 A + 1
- memory: 1024 x 32 bits (10-bit Addr, Din, Dout, WrMEM)
- registers: 16 x 32 bits (4-bit regno, Din, Dout, WrREG)
- IR31..0; IR19..0 (20 bits) passes through sign extend (here's the sign extend example)
- Z: 1-bit boolean to control logic (zero test, "0?")
Fields sent to control logic:
- OP: 4-bit opcode, IR31..28
- RA: 4-bit register number, IR27..24
- RB: 4-bit register number, IR23..20
- RD: 4-bit register number, IR3..0
39 Recall our basic add instruction
40 LC-2200 Datapath
[Figure: PC, IR, register file (16 x 32 bits), registers A and B (LdA, LdB), and the ALU (00 ADD, 01 NAND, 10 A - B, 11 A + 1)]
Did we leave anything out?
41 Need to increment PC!
42 LC-2200 Datapath (PC used to index memory)
[Figure: the bus-structured datapath of slide 38]
1) Let PC control bus
2) Only want to load MAR
43 LC-2200 Datapath (in terms of a bus structure)
[Figure: the bus-structured datapath of slide 38]
1) Let mem control bus
2) IR loaded
44 LC-2200 Datapath (in terms of a bus structure)
[Figure: the bus-structured datapath of slide 38]
1) Let PC control bus
2) Load it into a register (A)
45 LC-2200 Datapath (in terms of a bus structure)
[Figure: the bus-structured datapath of slide 38]
1) increment PC
2) ALU controls the bus
3) PC is loaded with next inst. to fetch
46 LC-2200 Datapath (in terms of a bus structure)
[Figure: the bus-structured datapath of slide 38]
1) index register file
2) register file controls the bus
3) write 1st register (A)
47 LC-2200 Datapath (in terms of a bus structure)
[Figure: the bus-structured datapath of slide 38]
1) index register file
2) register file controls the bus
3) write 2nd register (B)
48 LC-2200 Datapath (in terms of a bus structure)
[Figure: the bus-structured datapath of slide 38]
1) get opcode, do ADD
2) ALU drives the bus
3) index register file
4) write registers
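The bus steps for one add instruction can be replayed in software, one bus transfer per line. This is a rough sketch, not the course's control design: the drive/load names in the comments (DrPC, LdMAR, DrMEM, LdIR, DrREG, DrALU, LdPC) are hypothetical labels in the style of LdA/LdB, and the PC is assumed to advance by 1 (word addressing) rather than 4.

```python
MASK32 = 0xFFFFFFFF

def run_add(pc: int, mem: dict, regs: list):
    """Walk one ADD instruction through the single bus; returns (new_pc, new_regs)."""
    regs = list(regs)
    # Fetch: PC drives the bus, MAR latches it; then memory drives, IR latches.
    bus = pc;                 mar = bus     # DrPC, LdMAR
    bus = mem[mar];           ir = bus      # DrMEM, LdIR
    # Increment PC: PC goes to A, ALU computes A + 1 (func 11), PC latches it.
    bus = pc;                 a = bus       # DrPC, LdA
    bus = (a + 1) & MASK32;   pc = bus      # DrALU, LdPC
    # Register numbers come from fixed IR fields.
    ra, rb, rd = (ir >> 24) & 0xF, (ir >> 20) & 0xF, ir & 0xF
    # Execute: read both sources into A and B, then write the sum to RD.
    bus = regs[ra];           a = bus       # DrREG (regno = RA), LdA
    bus = regs[rb];           b = bus       # DrREG (regno = RB), LdB
    bus = (a + b) & MASK32;   regs[rd] = bus  # ALU func 00, DrALU, WrREG (regno = RD)
    return pc, regs

regs = [0] * 16
regs[1], regs[2] = 5, 7
pc, regs = run_add(0, {0: 0x01200003}, regs)  # add: RA = 1, RB = 2, RD = 3
print(pc, regs[3])  # 1 12
```

Every line assigns `bus` exactly once, mirroring the rule that only one unit drives the bus per step; that is why a single add takes seven transfers here.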
49 What about all of those red blocks??? (note: some foreshadowing on the way)
50 Let's revisit our 1st example: the single-bus datapath, HW function
Minimal HW, cost in time
y = a + bx + cx^2
- One common bus (32 bits wide)
- One of each type of functional unit (MUL, ADD); each unit's output is connected to the bus via tri-state buffers (e.g. DrADD)
- Inputs connected to the bus via registers: A, B, C, D, Y (load enables LdA, LdB, LdC, LdD, LdY)
- Other, e.g. constants and I/O: input x (DrX), output y, and a ROM addressed by a 2-bit romaddr
ROM contents: 0: a, 1: b, 2: c, 3: unused
51 We'd use a FSM for control (more later, like a lecture or 2)
- Datapath control inputs are FSM outputs
- (i.e. the control signals that control the HW are generated by the FSM)
- Datapath status outputs (none in this case) would be FSM inputs
- FSM contains as many states as required
[Figure annotations: load/don't load registers; drive/don't drive buses; what input needed?]
52 Try Designing States! y = a + bx + cx^2
0: DrX, LdC, LdD (read X input into both C/D registers)
1: DrMUL, LdC (write X*X into C register)
Might be slightly different than the ordering in the earlier example
53 Recall: the single-bus datapath, HW function
Minimal HW, cost in time
y = a + bx + cx^2
[Figure: the single-bus datapath of slide 50, repeated]
54 FSM states
Compute c*x*x:
0: DrX, LdC, LdD
1: DrMUL, LdC
2: DrROM(C), LdD
3: DrMUL, LdB
Compute b*x:
4: DrX, LdC
5: DrROM(B), LdD
6: DrMUL, LdA
7: DrADD, LdB
Compute a:
8: DrROM(A), LdA
9: DrADD, LdY
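The ten states can be checked by simulating them. The sketch below assumes, as the state list implies, that registers C and D feed MUL and registers A and B feed ADD; the table format and the name `run_fsm` are invented for this example.

```python
# One entry per FSM state: (unit that drives the bus, registers that load from it).
STATES = [
    ("X",     "CD"),  # 0: DrX, LdC, LdD
    ("MUL",   "C"),   # 1: DrMUL, LdC       C = x*x
    ("ROM_C", "D"),   # 2: DrROM(C), LdD
    ("MUL",   "B"),   # 3: DrMUL, LdB       B = c*x*x
    ("X",     "C"),   # 4: DrX, LdC
    ("ROM_B", "D"),   # 5: DrROM(B), LdD
    ("MUL",   "A"),   # 6: DrMUL, LdA       A = b*x
    ("ADD",   "B"),   # 7: DrADD, LdB       B = b*x + c*x*x
    ("ROM_A", "A"),   # 8: DrROM(A), LdA
    ("ADD",   "Y"),   # 9: DrADD, LdY       Y = a + b*x + c*x*x
]

def run_fsm(a: int, b: int, c: int, x: int) -> int:
    """Step through the ten states and return the final contents of Y."""
    reg = dict.fromkeys("ABCDY", 0)
    for driver, loads in STATES:
        sources = {
            "X": x, "ROM_A": a, "ROM_B": b, "ROM_C": c,
            "MUL": reg["C"] * reg["D"],   # assumed: MUL inputs are C and D
            "ADD": reg["A"] + reg["B"],   # assumed: ADD inputs are A and B
        }
        bus = sources[driver] & 0xFFFFFFFF  # exactly one driver per state
        for name in loads:                  # every enabled register latches the bus
            reg[name] = bus
    return reg["Y"]

print(run_fsm(a=2, b=4, c=6, x=2))  # 34 = 2 + 4*2 + 6*2*2
```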
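55 Timing?
Worst-case state: 5 + 100 + 5 = 110 ns (the 100 ns MUL plus 5 ns each for registers and tri-states)
- registers: 5 ns
- MUL: 100 ns
- ADD: 10 ns
- ROM: 20 ns
- tri-states: 5 ns
[Figure: the single-bus datapath annotated with these delays]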
56 Timing Details
[Figure: timing diagram with traces for Add, C, MUL, and bus; annotation: arbitrary time to generate control]
57 Timing (cont'd)
0: DrX, LdC, LdD
1: DrMUL, LdC
2: DrROM(C), LdD
3: DrMUL, LdB
4: DrX, LdC
5: DrROM(B), LdD
6: DrMUL, LdA
7: DrADD, LdB
8: DrROM(A), LdA
9: DrADD, LdY
110 ns x 10 states = 1100 ns
58 Timing
- Datapath circuit: 1100 ns
- Hardwired circuit: 210 ns
- But where is the extra time going??
- 1. loss of parallelism in computation
- 2. worst-case timing assumptions
- 3. overhead of flexibility (registers/tristates)
- 4. loss of parallelism in communication (single-bus bottleneck)
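The four sources of slowdown can be tallied step by step from the component delays quoted in the lecture (ADD 10 ns, MUL 100 ns, registers 5 ns, tri-states 5 ns). This is only an arithmetic sketch; the function and key names are made up for the example, and the 210 ns hardwired figure is taken from the slides rather than recomputed.

```python
def timing_breakdown(add_ns=10, mul_ns=100, reg_ns=5, tri_ns=5, n_states=10):
    """Accumulate the slowdown from hardwired (210 ns) to single-bus (1100 ns)."""
    n_mul, n_add = 3, 2                       # operations in y = a + b*x + c*x*x
    n_ops = n_mul + n_add
    cycle = tri_ns + mul_ns + reg_ns          # clock sized for the slowest state
    return {
        "hardwired": 210,                                   # from the slides
        "1_sequential_ops": n_mul * mul_ns + n_add * add_ns,  # no parallel compute
        "2_worst_case_cycle": n_ops * mul_ns,               # every op takes a MUL-sized slot
        "3_reg_tristate_overhead": n_ops * cycle,           # add 10 ns per cycle
        "4_single_bus": n_states * cycle,                   # extra states only move data
    }

for step, ns in timing_breakdown().items():
    print(f"{step}: {ns} ns")
```

The sequence 210, 320, 500, 550, 1100 ns shows that the largest single jump comes from the last step: half of the ten states exist only to move operands over the one bus.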
59 1. Parallelism in Computation
- Tpd as drawn: 210 ns
- Suppose it had to be sequential?
- 3 MUL + 2 ADD = 3 x 100 + 2 x 10 = 320 ns
[Figure: the combinational circuit, inputs x, a, b, c and output y]
60 2. Worst-Case Timing Assumptions
Clock cycle sized to fit MUL: 500 ns for five ops
But we have to do more
[Figure: datapath delays: registers 5 ns, MUL 100 ns, ADD 10 ns, ROM 20 ns, tri-states 5 ns]
61 3. Cost of Flexibility
10 ns (10%) for reg/tristate: 550 ns for five ops
[Figure: datapath delays: registers 5 ns, MUL 100 ns, ADD 10 ns, ROM 20 ns, tri-states 5 ns]
62 4. Parallelism in Communication
0: DrX, LdC, LdD
1: DrMUL, LdC
2: DrROM(C), LdD
3: DrMUL, LdB
4: DrX, LdC
5: DrROM(B), LdD
6: DrMUL, LdA
7: DrADD, LdB
8: DrROM(A), LdA
9: DrADD, LdY
How many of these states actually compute something?
What are the rest of the states doing?
63 (4. Parallelism in Communication)
[Figure: the combinational circuit, inputs x, a, b, c and output y, annotated 320 ns]
64 Single bus recipe summary
- One common bus (32 bits wide)
- One of each type of functional unit
- Inputs from bus via registers
- Outputs connected to the bus via tri-state buffers
- Any other pseudo-functional units
- I/O
- constants
- temporary storage
65 General-Purpose Computation
- Story so far
- 1. combinational
- 2. sequential, using single-bus recipe
- However, the single-bus recipe still requires new functional units and a new FSM for every problem. How can we make it universal?
66 Universal Machine
- 1. enough functional units
- (pretty easy...)
- 2. enough memory for constants, temporaries, etc.
- 3. (the crux) FSM is an interpreter
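One way to picture "FSM is an interpreter": the control loop stays fixed, and the computation is chosen by the instruction words stored in memory. The sketch below is an illustration only. It uses the opcode values and field positions from the LC-2200 slides, but the `addi` semantics (RB ← RA + immediate, mirroring the RB ← MEM[RA + Offset] form shown for `lw`) and word-addressed PC + 1 are assumptions, and only four opcodes are handled.

```python
MASK32 = 0xFFFFFFFF
ADD, NAND, ADDI, HALT = 0b0000, 0b0001, 0b0010, 0b0111  # opcodes from the slides

def sign_extend_20(value: int) -> int:
    """Widen IR19..0 to 32 bits, copying bit 19 into the upper bits."""
    value &= 0xFFFFF
    return value | 0xFFF00000 if value & 0x80000 else value

def interpret(mem: list, max_steps: int = 1000) -> list:
    """Fixed fetch/decode/execute loop; the program is just data in mem."""
    regs = [0] * 16
    pc = 0
    for _ in range(max_steps):
        ir = mem[pc]                      # fetch
        pc = (pc + 1) & MASK32            # assumed word addressing
        op = (ir >> 28) & 0xF             # decode
        ra, rb, rd = (ir >> 24) & 0xF, (ir >> 20) & 0xF, ir & 0xF
        if op == ADD:                     # execute
            regs[rd] = (regs[ra] + regs[rb]) & MASK32
        elif op == NAND:
            regs[rd] = ~(regs[ra] & regs[rb]) & MASK32
        elif op == ADDI:                  # assumed: RB <- RA + sign-extended immediate
            regs[rb] = (regs[ra] + sign_extend_20(ir)) & MASK32
        elif op == HALT:
            break
        else:
            raise NotImplementedError(f"opcode {op:04b} not modeled in this sketch")
    return regs

program = [
    0x20100005,  # addi: r1 = r0 + 5
    0x20200007,  # addi: r2 = r0 + 7
    0x01200003,  # add:  r3 = r1 + r2
    0x70000000,  # halt
]
print(interpret(program)[3])  # 12
```

Changing the computation means changing `program`, not `interpret`: no new functional units and no new FSM per problem.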