CS 2200 Lecture 05a Datapaths Part 1 - PowerPoint PPT Presentation

Provided by: michaelt8
1
CS 2200 Lecture 05a: Datapaths Part 1
  • (Lectures based on the work of Jay Brockman,
    Sharon Hu, Randy Katz, Peter Kogge, Bill Leahy,
    Ken MacKenzie, Richard Murphy, and Michael
    Niemier)

2
Five classic components (of an architecture)
(Remember this???)
3
Let's take a closer look...
[Figure: the processor, made up of control and datapath]
4
Review: Digital Logic
  • Combinational logic: gates, ROMs
  • (how it's implemented)
  • tri-state buffer: 0/1 or Z (unconnected)
  • Sequential logic: edge-triggered flip-flops
  • (a.k.a. memory), stores state
  • all state stored in edge-triggered flip-flops
  • single clock: exactly one clock goes to every
    flip-flop
  • Finite State Machines (FSMs)
  • Moore & Mealy forms
  • state-transition diagram
  • state-transition table

A combination of combinational logic
and sequential logic (used to build and
control real and useful things)
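The FSM vocabulary above (Moore form, state-transition table) can be made concrete with a small sketch. The machine here is a hypothetical example, not one from the lecture: a Moore FSM that outputs 1 once it has seen two consecutive 1 bits. The dictionary `NEXT_STATE` plays the role of the state-transition table, and because this is the Moore form, the output depends only on the current state.

```python
# Hypothetical Moore FSM (not from the lecture): detects two or more consecutive 1s.
# State-transition table: (current_state, input_bit) -> next_state
NEXT_STATE = {
    ("S0", 0): "S0", ("S0", 1): "S1",
    ("S1", 0): "S0", ("S1", 1): "S2",
    ("S2", 0): "S0", ("S2", 1): "S2",
}

# Moore form: the output is a function of the state alone.
OUTPUT = {"S0": 0, "S1": 0, "S2": 1}


def run_moore(inputs, start="S0"):
    """Clock the FSM once per input bit; return the output after each clock edge."""
    state = start
    outputs = []
    for bit in inputs:
        state = NEXT_STATE[(state, bit)]  # the state register updates on the edge
        outputs.append(OUTPUT[state])
    return outputs


print(run_moore([1, 1, 0, 1, 1, 1]))  # [0, 1, 0, 0, 1, 1]
```

In the Mealy form, `OUTPUT` would instead be indexed by `(state, input_bit)`.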
5
Digital Logic reading?
  • Patterson & Hennessy, Appendix B
  • nice quick read
  • my old CS3760 class notes (see my web page)
  • your ECE 2030 book/notes

6
Today
  • recipes for computation
  • combinational
  • sequential: single-bus datapath + control
  • single-bus datapath for the LC-2200
  • slow but straightforward
  • used in Project 1

7
Computation
  • We've designed computation elements
  • add/subtract, and/or/xor/not
  • could do multiply & divide?
  • How do you build bigger computations?

[Figure: computation elements with 32-bit inputs and outputs]
8
An adder in Boolean gates
  • This is just 1 bit, but obviously we can scale it
    up
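One way to scale the 1-bit adder up is a ripple-carry adder: chain 32 copies of the 1-bit full adder, feeding each stage's carry-out into the next stage's carry-in. The sketch below models the gates with Python's bitwise operators; the function names are illustrative, and ripple-carry is simply the most direct way to scale up, not necessarily the adder design used in the course.

```python
def full_adder(a, b, cin):
    """1-bit full adder built only from Boolean gate operations."""
    s = a ^ b ^ cin                         # sum = a XOR b XOR cin
    cout = (a & b) | (a & cin) | (b & cin)  # carry = majority(a, b, cin)
    return s, cout


def ripple_add(x, y, width=32):
    """Chain `width` full adders; return (sum mod 2**width, final carry-out)."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry ripples into the next bit
        result |= s << i
    return result, carry


print(ripple_add(5, 7))           # (12, 0)
print(ripple_add(0xFFFFFFFF, 1))  # (0, 1): the sum overflows 32 bits
```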

9
Example
  • y = a + bx + cx²
  • all numbers (x, y) and constants (a, b, c) are
    32-bit integers

[Figure: a black box f(x) with input x and output y]
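Before building hardware it helps to have a reference model of the function to check a datapath against. The sketch below assumes unsigned 32-bit arithmetic that wraps around on overflow (the slide says only "32-bit integers"); `MASK32` and `f` are illustrative names.

```python
MASK32 = 0xFFFFFFFF  # keep every result to 32 bits (assumed unsigned wraparound)


def f(x, a, b, c):
    """Reference model of y = a + b*x + c*x**2, truncated to 32 bits."""
    return (a + b * x + c * x * x) & MASK32


print(f(2, a=2, b=4, c=6))        # 34
print(f(0x10000, a=0, b=0, c=1))  # 0, because x*x = 2**32 overflows 32 bits
```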
10
Example: combinational implementation
  • y = a + bx + cx²

[Figure: combinational circuit with inputs a, b, c, x, intermediate values bx and cx², and output y]
11
Combinational Example Timing
  • Suppose ADD requires 10 ns and MUL 100 ns
  • Tpd of the whole circuit?

[Figure: the same circuit, with inputs a, b, c, x and output y]
12
Combinational Circuit
  • Delay is minimum possible
  • Tpd = 210 ns
  • imposed by dataflow of the desired computation!
  • Circuit cost is maximum
  • two adders
  • three multipliers
  • No flexibility
  • equation is hardwired in the circuit topology
  • maybe the constants (A, B, C) could be set by
    switches
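The 210 ns figure is the length of the critical (longest) path through the dataflow graph. The sketch below computes it, assuming the grouping y = (a + b·x) + (c·x)·x, which uses the slide's three multipliers and two adders and reproduces 210 ns; the node names are made up for illustration. A different grouping such as a + (b·x + c·(x·x)) would put both adders after the second multiply and give 220 ns instead.

```python
DELAY_NS = {"MUL": 100, "ADD": 10}  # delays from the timing slide

# Dataflow graph: node -> (operation, operand nodes).
# Assumed grouping: y = (a + b*x) + (c*x)*x
GRAPH = {
    "bx":  ("MUL", ["b", "x"]),
    "cx":  ("MUL", ["c", "x"]),
    "cx2": ("MUL", ["cx", "x"]),
    "s":   ("ADD", ["a", "bx"]),
    "y":   ("ADD", ["s", "cx2"]),
}


def arrival_time(node, memo=None):
    """Time (ns) at which `node` is valid; primary inputs are ready at t = 0."""
    if memo is None:
        memo = {}
    if node not in GRAPH:  # a, b, c, x are circuit inputs
        return 0
    if node not in memo:
        op, operands = GRAPH[node]
        # A gate's output settles after its slowest input plus its own delay.
        memo[node] = DELAY_NS[op] + max(arrival_time(n, memo) for n in operands)
    return memo[node]


print(arrival_time("y"))  # 210
```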

13
Sequential Circuit
  • A sequential circuit would let us re-use
    functional units and save hardware cost
  • But how to wire them up??

[Figure: one functional unit of each type, plus some storage]
14
A recipe: the single-bus datapath
(y = a + bx + cx²)
  • One common bus (32 bits wide)

15
A recipe: the single-bus datapath
(y = a + bx + cx²)
One common bus (32 bits wide)
1 of each type of functional unit; outputs connected to the bus via tri-state buffers
[Figure: MUL and ADD units driving the bus through tri-state buffers enabled by DrMUL and DrADD]
16
A recipe: the single-bus datapath
(y = a + bx + cx²)
One common bus (32 bits wide)
Inputs connected to the bus via registers (i.e., some sequential logic)
1 of each type of functional unit; outputs connected to the bus via tri-state buffers
[Figure: registers A, B, C, D (load signals LdA, LdB, LdC, LdD) taking their inputs from the bus; MUL and ADD units]
17
A recipe: the single-bus datapath
(y = a + bx + cx²)
One common bus (32 bits wide)
Inputs connected to the bus via registers
1 of each type of functional unit; outputs connected to the bus via tri-state buffers
Other, e.g., constants and I/O
[Figure: registers A, B, C, D, and Y (LdA, LdB, LdC, LdD, LdY); MUL and ADD units (DrADD); a ROM holding 0: a, 1: b, 2: c, 3: unused, selected by a 2-bit romaddr; input x driven onto the bus by DrX; output y]
18
A recipe: the single-bus datapath
y = a + bx + cx²
Ex. A = 2, B = 4, C = 6, x = 2
Part 1: C ← x, D ← x
Part 2: D ← x, C ← 6
Part 3: A ← Cx²
Part 4: C ← 4 (D already x)
Part 5: B ← Bx
Part 6: B ← Bx + Cx² (or Reg[A] + Reg[B])
Part 7: A ← A
Part 8: Y ← Reg[A] + Reg[B]
19
Big Picture
  • Fetch the instruction from memory
  • Decode the instruction and decide what to do
  • Execute the instruction
  • Repeat.
  • What hardware do we need to
  • Add 2 registers together and store the result in
    a 3rd?
  • Let's look at the LC-2200
  • (or alternatively the MIPS)
  • (we'll do them both, but it's your choice as to
    which is first)

20
LC-2200 datapath
[Figure: the PC]
Let's look at a generic instruction (we start with the PC, which stores the address of the next instruction to be executed)
21
LC-2200 datapath
[Figure: the PC addressing memory]
The PC indexes memory; memory's data-out is the instruction encoding
22
LC-2200 datapath
[Figure: PC, memory, and IR]
We store the output of memory in the IR (side note: why do we need to do this? Ideally we could use bits of the address to set ALU functions, etc.)
23
LC-2200 Instruction Types
All are encoded in single 32-bit words
R-type (Register-Register):
OP [31:28] | RA [27:24] | RB [23:20] | unused [19:4] | RD [3:0]
  • How big an immediate?
  • How many possible opcodes? How many registers?
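The field widths answer the questions directly: a 4-bit OP gives 2⁴ = 16 possible opcodes, 4-bit register fields address 16 registers, and I-type instructions use bits 19..0 as a 20-bit immediate. The sketch below pulls the fields out of a 32-bit word with shifts and masks; `decode` and `encode_r` are hypothetical helper names, not part of any course tool.

```python
def decode(word):
    """Split a 32-bit LC-2200 instruction word into its fields."""
    return {
        "op": (word >> 28) & 0xF,  # bits 31..28
        "ra": (word >> 24) & 0xF,  # bits 27..24
        "rb": (word >> 20) & 0xF,  # bits 23..20
        "rd": word & 0xF,          # bits 3..0 (meaningful for R-type only)
        "imm20": word & 0xFFFFF,   # bits 19..0 (I-type only; not yet sign-extended)
    }


def encode_r(op, ra, rb, rd):
    """Build an R-type word; bits 19..4 are left as 0 (unused)."""
    return (op << 28) | (ra << 24) | (rb << 20) | rd


# add (opcode 0000) with RA = 1, RB = 2, RD = 3
word = encode_r(0b0000, 1, 2, 3)
print(hex(word))  # 0x1200003
print(decode(word))
```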

24
LC-2200 datapath
[Figure: PC, IR, and a register file (16 × 32 bits) with a 4-bit regno input, Din, Dout, and WrREG]
Strip off the bits that index the register file (the RA and RB source registers are always encoded in the same place)
What if the instruction is another type? (It's OK: random, unneeded registers will still be read, but not used, because of the control signals set by the opcode)
25
LC-2200 datapath
[Figure: register file output (Dout) loaded into temporary register A (LdA)]
The 1st value is read and stored in a temporary register
(if these are seen as unconditional inputs to the ALU)
26
LC-2200 datapath
[Figure: register file output (Dout) loaded into temporary registers A and B (LdB)]
The next value is read and stored in a second temporary register (B)
(if these are seen as unconditional inputs to the ALU)
27
LC-2200 datapath
[Figure: register file (16 × 32 bits) feeding temporary registers A (LdA) and B (LdB), which feed the ALU; the 2-bit ALU func could be taken from the opcode]
Opcodes: add 0000, nand 0001, addi 0010, lw 0011, sw 0100, beq 0101, jalr 0110, halt 0111
ALU func: 00 ADD, 01 NAND, 10 A - B, 11 A + 1
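The four ALU functions can be modeled in a few lines. This is a sketch assuming 32-bit results (the bus width); `alu` is an illustrative name. Note that for add (0000) and nand (0001) the low two opcode bits equal the func code, which is presumably what "could take from opcode" means.

```python
MASK32 = 0xFFFFFFFF


def alu(func, a, b):
    """2-bit-func ALU from the slide: 00 ADD, 01 NAND, 10 A - B, 11 A + 1."""
    if func == 0b00:
        result = a + b
    elif func == 0b01:
        result = ~(a & b)
    elif func == 0b10:
        result = a - b
    elif func == 0b11:
        result = a + 1  # B is ignored; handy for PC + 1
    else:
        raise ValueError("func must be a 2-bit value")
    return result & MASK32  # truncate to a 32-bit bus value


print(alu(0b00, 3, 4))                   # 7
print(hex(alu(0b01, 0xFFFFFFFF, 0x0F)))  # 0xfffffff0
print(hex(alu(0b10, 0, 1)))              # 0xffffffff (-1 in two's complement)
print(alu(0b11, 41, 0))                  # 42
```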
28
Questions: Did you understand what we just did? Can someone show me how an add works? (Draw what happens, using the bits in the PC and the registers as a starting point)
29
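[Figure: LC-2200 datapath: PC, IR, register file (16 × 32 bits), temporary registers A and B, and the ALU (func: 00 ADD, 01 NAND, 10 A - B, 11 A + 1)]
Consider the LW instruction: lw s0, 4(s1)
lw (opcode 0011): RB ← MEM[RA + Offset]
This is where s1 goes (the register-file index from RA)
I-type layout: OP [31:28] | RA [27:24] | RB [23:20] | immediate, 20-bit signed [19:0]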
30
[Figure: the same datapath; the offset 4 is encoded in the IR as an immediate value]
Consider the LW instruction: lw s0, 4(s1)
lw (opcode 0011): RB ← MEM[RA + Offset]
Explain sign extending (we'll see it with MIPS): we can't just send 100₂ (we would get garbage otherwise)
I-type layout: OP [31:28] | RA [27:24] | RB [23:20] | immediate, 20-bit signed [19:0]
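The 20-bit offset must be widened to 32 bits before it can go on the bus; otherwise the upper 12 bits are undefined (the "garbage" above), and filling them with zeros would turn negative offsets into large positive ones. Sign extension copies bit 19 into bits 31..20. The sketch below also forms the address Reg[RA] + offset described on the slide; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits=20):
    """Copy bit (bits-1) into all higher bits, producing a 32-bit pattern."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):              # negative: fill the top bits with 1s
        value |= MASK32 & ~((1 << bits) - 1)
    return value


def effective_address(reg_ra, imm20):
    """Memory address for lw as on the slide: Reg[RA] + sign-extended offset."""
    return (reg_ra + sign_extend(imm20)) & MASK32


print(hex(sign_extend(0x00004)))         # 0x4
print(hex(sign_extend(0xFFFFC)))         # 0xfffffffc, i.e. -4
print(effective_address(100, 0x00004))   # 104
print(effective_address(100, 0xFFFFC))   # 96
```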
31
[Figure: the datapath with a MAR added]
Consider the LW instruction: lw s0, 4(s1)
Used control code 10
32
[Figure: the datapath with the MAR indexing memory]
Consider the LW instruction: lw s0, 4(s1)
MAR: memory address register (indexes memory)
Need a destination register
It's common (e.g., with MIPS) to use the ALU to increment the PC while decoding, etc.
33
Making it more real...
  • This is getting very complicated!!!
  • For reasons involving implementation details, the
    simplest (lowest cost, lowest performance)
    technique is to use a single bus to connect all
    the various functional units

(Big jump from this to the next example: PC + 1 or PC + 4)
34
Bus
Use a bus instead of separate HW connections for everything
Pro: simpler HW. Con: timing protocols/contention
35
Bus: only one functional unit at a time can drive the bus
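The one-driver rule can be modeled by treating each tri-state buffer as either passing its value or floating (Z). The sketch below represents Z as `None`; the driver-enable names in the comments (DrALU, DrMEM, DrPC) are hypothetical, in the style of the DrMUL/DrADD signals above. A real floating bus is not an "error" electrically, but reading it gives an undefined value, so the model rejects it.

```python
Z = None  # high impedance: the buffer is not driving the bus


def tri_state(value, enable):
    """Pass `value` through when enabled; otherwise float (Z)."""
    return value if enable else Z


def resolve_bus(*drivers):
    """Return the bus value, or raise if zero or several units drive it."""
    active = [d for d in drivers if d is not Z]
    if len(active) > 1:
        raise RuntimeError("bus contention: more than one driver enabled")
    if not active:
        raise RuntimeError("bus is floating: no driver enabled")
    return active[0]


value = resolve_bus(
    tri_state(7, enable=True),    # ALU output, DrALU = 1 (hypothetical name)
    tri_state(99, enable=False),  # memory Dout, DrMEM = 0
    tri_state(5, enable=False),   # PC, DrPC = 0
)
print(value)  # 7
```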
36
Bus: any (and all) functional units can access the bus
[Figure: four functional units attached to a shared bus]
37
Questions?
38
LC-2200 Datapath (in terms of a bus structure)
[Figure: LC-2200 datapath on a single 32-bit bus: temporary registers A (LdA) and B (LdB) feeding the ALU (2-bit func: 00 ADD, 01 NAND, 10 A - B, 11 A + 1); memory (1024 × 32 bits; 10-bit Addr, Din, Dout, WrMEM); register file (16 × 32 bits; 4-bit regno, Din, Dout, WrREG); IR31..0; and a sign-extend unit widening IR19..0 (20 bits) to 32 bits]
Here's a sign-extend example
Signals sent to the control logic:
  • OP: 4-bit opcode, IR31..28
  • RA: 4-bit register number, IR27..24
  • RB: 4-bit register number, IR23..20
  • RD: 4-bit register number, IR3..0
  • Z: 1-bit boolean from the zero test (0?)
39
Recall our basic add instruction
40
LC-2200 Datapath
[Figure: PC, IR, register file (16 × 32 bits), temporary registers A and B, and the ALU (func: 00 ADD, 01 NAND, 10 A - B, 11 A + 1)]
Did we leave anything out?
41
Need to increment PC!
42
LC-2200 Datapath (PC used to index memory)
[Figure: LC-2200 bus datapath]
1) Let the PC control the bus
2) Only want to load the MAR
43
LC-2200 Datapath (in terms of a bus structure)
[Figure: LC-2200 bus datapath]
1) Let memory control the bus
2) The IR is loaded
44
LC-2200 Datapath (in terms of a bus structure)
[Figure: LC-2200 bus datapath]
1) Let the PC control the bus
2) Load it into a register (A)
45
LC-2200 Datapath (in terms of a bus structure)
[Figure: LC-2200 bus datapath]
1) Increment the PC (the ALU computes A + 1)
2) The ALU controls the bus
3) The PC is loaded with the address of the next instruction to fetch
46
LC-2200 Datapath (in terms of a bus structure)
[Figure: LC-2200 bus datapath]
1) Index the register file
2) The register file controls the bus
3) Write the 1st register (A)
47
LC-2200 Datapath (in terms of a bus structure)
[Figure: LC-2200 bus datapath]
1) Index the register file
2) The register file controls the bus
3) Write the 2nd register (B)
48
LC-2200 Datapath (in terms of a bus structure)
[Figure: LC-2200 bus datapath]
1) Get the opcode; do ADD
2) The ALU drives the bus
3) Index the register file
4) Write the registers
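The bus cycles walked through on the last few slides can be written as a sequence of register transfers, one bus driver per cycle. This is a sketch, assuming word-addressed memory (1024 words, so the PC advances by 1 via the ALU's A + 1 function) and reading the RA/RB/RD fields straight from the IR; the function and variable names are illustrative, and control-signal timing is not modeled.

```python
MASK32 = 0xFFFFFFFF


def execute_add(mem, regs, pc):
    """Run one LC-2200 add as single-bus clock cycles; return the updated PC.

    In every cycle exactly one unit drives `bus` and only the named
    register latches it.
    """
    # Fetch
    bus = pc                 # cycle 1: PC drives the bus
    mar = bus                #          load MAR
    bus = mem[mar]           # cycle 2: memory drives the bus
    ir = bus                 #          load IR
    bus = pc                 # cycle 3: PC drives the bus
    a = bus                  #          load A
    bus = (a + 1) & MASK32   # cycle 4: ALU (func 11, A + 1) drives the bus
    pc = bus                 #          load PC

    # Decode: the IR fields go to the control logic and regno
    op = (ir >> 28) & 0xF
    ra, rb, rd = (ir >> 24) & 0xF, (ir >> 20) & 0xF, ir & 0xF
    if op != 0b0000:
        raise NotImplementedError("this sketch only models add")

    # Execute
    bus = regs[ra]           # cycle 5: regno = RA, register file drives the bus
    a = bus                  #          load A
    bus = regs[rb]           # cycle 6: regno = RB, register file drives the bus
    b = bus                  #          load B
    bus = (a + b) & MASK32   # cycle 7: ALU (func 00, ADD) drives the bus
    regs[rd] = bus           #          regno = RD, assert WrREG
    return pc


mem = [0] * 1024
mem[0] = (0b0000 << 28) | (1 << 24) | (2 << 20) | 3  # add: RA = 1, RB = 2, RD = 3
regs = [0] * 16
regs[1], regs[2] = 20, 22
pc = execute_add(mem, regs, pc=0)
print(pc, regs[3])  # 1 42
```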
49
What about all of those red blocks??? (Note: some foreshadowing on the way)
50
Let's revisit our 1st example: the single-bus datapath, HW & function
Minimal HW; the cost is in time
y = a + bx + cx²
One common bus (32 bits wide)
Inputs connected to the bus via registers
1 of each type of functional unit; outputs connected to the bus via tri-state buffers
Other, e.g., constants and I/O
[Figure: registers A, B, C, D, and Y (LdA, LdB, LdC, LdD, LdY); MUL and ADD units (DrADD); a ROM holding 0: a, 1: b, 2: c, 3: unused, selected by a 2-bit romaddr; input x driven onto the bus by DrX; output y]
51
We'd use an FSM for control (more later, like in a lecture or 2)
  • Datapath control inputs are FSM outputs
  • (i.e., the control signals that control the HW are generated by the FSM)
  • Datapath status outputs (none in this case) would be FSM inputs
  • FSM contains as many states as required

load/don't load registers
drive/don't drive buses
what input is needed?
52
Try Designing States! y = a + bx + cx²
0: DrX, LdC, LdD
Read the x input into both the C and D registers
1: DrMUL, LdC
Write x·x into the C register
(Might be slightly different from the ordering in the earlier example)
53
Recall: the single-bus datapath, HW & function
Minimal HW; the cost is in time
y = a + bx + cx²
[Figure: the single-bus datapath again: registers A, B, C, D, Y; MUL; ADD; ROM (0: a, 1: b, 2: c, 3: unused); input x (DrX); output y]
54
FSM states
0: DrX, LdC, LdD
1: DrMUL, LdC
(Compute c·x·x)
2: DrROM(C), LdD
3: DrMUL, LdB
4: DrX, LdC
(Compute bx)
5: DrROM(B), LdD
6: DrMUL, LdA
7: DrADD, LdB
(Compute a)
8: DrROM(A), LdA
9: DrADD, LdY
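Clocking these ten states by hand is a good check that the FSM computes y. The simulation below assumes the wiring implied by the state list (MUL multiplies registers C and D, ADD adds registers A and B) and a ROM holding a, b, c at addresses 0, 1, 2; the Python names are illustrative. With the earlier example values (a = 2, b = 4, c = 6, x = 2) it produces 2 + 8 + 24 = 34.

```python
MASK32 = 0xFFFFFFFF

# The 10 states from the slide: (unit driving the bus, registers that load it).
STATES = [
    ("X",      ["C", "D"]),  # 0: C = D = x
    ("MUL",    ["C"]),       # 1: C = x*x
    ("ROM(C)", ["D"]),       # 2: D = c
    ("MUL",    ["B"]),       # 3: B = c*x*x
    ("X",      ["C"]),       # 4: C = x
    ("ROM(B)", ["D"]),       # 5: D = b
    ("MUL",    ["A"]),       # 6: A = b*x
    ("ADD",    ["B"]),       # 7: B = bx + cx^2
    ("ROM(A)", ["A"]),       # 8: A = a
    ("ADD",    ["Y"]),       # 9: Y = a + bx + cx^2
]


def run_datapath(x, rom):
    """Clock the FSM once per state; rom = [a, b, c] (ROM addresses 0, 1, 2)."""
    reg = dict.fromkeys("ABCDY", 0)
    for driver, loads in STATES:
        # Value each tri-state driver would put on the bus this cycle.
        # Assumed wiring: MUL computes C * D, ADD computes A + B.
        outputs = {
            "X": x & MASK32,
            "MUL": (reg["C"] * reg["D"]) & MASK32,
            "ADD": (reg["A"] + reg["B"]) & MASK32,
            "ROM(A)": rom[0], "ROM(B)": rom[1], "ROM(C)": rom[2],
        }
        bus = outputs[driver]  # exactly one driver is enabled per state
        for r in loads:        # every enabled register latches on the clock edge
            reg[r] = bus
    return reg["Y"]


print(run_datapath(x=2, rom=[2, 4, 6]))  # 34
```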
55
Timing?
5 + 100 + 5 = 110 ns
[Figure: single-bus datapath annotated with delays: registers 5 ns, MUL 100 ns, ADD 10 ns, ROM 20 ns, tri-states 5 ns]
56
Timing Details
[Figure: timing diagram for ADD, register C, MUL, and the bus]
Arbitrary time to generate control
57
Timing (cont'd)
0: DrX, LdC, LdD
1: DrMUL, LdC
2: DrROM(C), LdD
3: DrMUL, LdB
4: DrX, LdC
5: DrROM(B), LdD
6: DrMUL, LdA
7: DrADD, LdB
8: DrROM(A), LdA
9: DrADD, LdY
110 ns × 10 states = 1100 ns
58
Timing
  • Datapath circuit: 1100 ns
  • Hardwired circuit: 210 ns
  • But -- where is the extra time going??
  • 1. loss of parallelism in computation
  • 2. worst-case timing assumptions
  • 3. overhead of flexibility (registers/tristates)
  • 4. loss of parallelism in communication
    (single-bus bottleneck)
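The four sources of lost time can be tallied step by step from the delays on the earlier slides, reproducing the numbers on the following slides (210, 320, 500, 550, 1100 ns). The variable names are illustrative; each line adds one of the four effects listed above.

```python
ADD, MUL = 10, 100  # ns, from the slides
REG, TRI = 5, 5     # register and tri-state delay per transfer (ns)

hardwired = MUL + MUL + ADD            # critical path with ops in parallel: 210
sequential = 3 * MUL + 2 * ADD         # 1. same five ops, one after another: 320
fixed_clock = 5 * MUL                  # 2. every op gets a MUL-sized cycle: 500
with_overhead = 5 * (REG + MUL + TRI)  # 3. add register/tri-state delay: 550
single_bus = 10 * (REG + MUL + TRI)    # 4. 10 FSM states, only 5 compute: 1100

for name, t in [("hardwired", hardwired), ("sequential", sequential),
                ("fixed clock", fixed_clock), ("+ reg/tri-state", with_overhead),
                ("single bus", single_bus)]:
    print(f"{name:>16}: {t:5d} ns")
```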

59
1. Parallelism in Computation
  • Tpd as drawn: 210 ns
  • Suppose it had to be sequential?

[Figure: the circuit with inputs a, b, c, x and output y, evaluated sequentially: 320 ns]
60
2. Worst-Case Timing Assumptions
Clock cycle sized to fit MUL: 500 ns for five ops
But we have to do more
[Figure: single-bus datapath delays: registers 5 ns, MUL 100 ns, ADD 10 ns, ROM 20 ns, tri-states 5 ns]
61
3. Cost of Flexibility
+10 ns per op for register/tri-state overhead: 550 ns for five ops
[Figure: single-bus datapath delays: registers 5 ns, MUL 100 ns, ADD 10 ns, ROM 20 ns, tri-states 5 ns]
62
4. Parallelism in Communication
0: DrX, LdC, LdD
1: DrMUL, LdC
2: DrROM(C), LdD
3: DrMUL, LdB
4: DrX, LdC
5: DrROM(B), LdD
6: DrMUL, LdA
7: DrADD, LdB
8: DrROM(A), LdA
9: DrADD, LdY
How many of these states actually compute something?
What are the rest of the states doing?
63
(4. Parallelism in Communication)
[Figure: the circuit with inputs a, b, c, x and output y; 320 ns]
64
Single bus recipe summary
  • One common bus (32 bits wide)
  • One of each type of functional unit
  • Inputs from bus via registers
  • Outputs connected to the bus via tri-state
    buffers
  • Any other pseudo-functional units:
  • I/O
  • constants
  • temporary storage

65
General-Purpose Computation
  • Story so far
  • 1. combinational
  • 2. sequential, using single-bus recipe
  • However, single-bus recipe still requires new
    functional units and a new FSM for every problem.
    How can we make it universal?

66
Universal Machine
  • 1. enough functional units
  • (pretty easy...)
  • 2. enough memory for constants, temporaries, etc.
  • 3. (the crux) FSM is an interpreter
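In software terms, "the FSM is an interpreter" means a fetch-decode-execute loop: instead of a new FSM per problem, one fixed loop reads the program from memory. The sketch below is a hypothetical analogue covering only three LC-2200 opcodes from the earlier slide (add 0000, nand 0001, halt 0111), assuming word-addressed memory so the PC advances by 1; the other instructions are deliberately left out.

```python
MASK32 = 0xFFFFFFFF
ADD, NAND, HALT = 0b0000, 0b0001, 0b0111  # opcodes from the slides


def interpret(mem, regs, pc=0, max_steps=1000):
    """Fetch-decode-execute loop for a tiny LC-2200 subset; return the final PC."""
    for _ in range(max_steps):
        ir = mem[pc]                        # fetch
        pc = (pc + 1) & MASK32              # increment the PC
        op = (ir >> 28) & 0xF               # decode
        ra, rb, rd = (ir >> 24) & 0xF, (ir >> 20) & 0xF, ir & 0xF
        if op == ADD:                       # execute
            regs[rd] = (regs[ra] + regs[rb]) & MASK32
        elif op == NAND:
            regs[rd] = ~(regs[ra] & regs[rb]) & MASK32
        elif op == HALT:
            return pc
        else:
            raise NotImplementedError(f"opcode {op:04b} not modeled")
    raise RuntimeError("step limit reached")


def r_type(op, ra, rb, rd):
    """Encode an R-type word (bits 19..4 unused)."""
    return (op << 28) | (ra << 24) | (rb << 20) | rd


# r3 = r1 + r2; r4 = NAND(r3, r3), i.e. NOT r3; halt
program = [r_type(ADD, 1, 2, 3), r_type(NAND, 3, 3, 4), r_type(HALT, 0, 0, 0)]
mem = program + [0] * (1024 - len(program))
regs = [0] * 16
regs[1], regs[2] = 5, 6
interpret(mem, regs)
print(regs[3], hex(regs[4]))  # 11 0xfffffff4
```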