Title: Computer = ALU + Memory
Computer = ALU + Memory
Let's try to compute 3 + 2 = 5.
[Figure: the operands 3 and 2 and the result 5]
GPR Architecture (General Purpose Register)
Let's compute 3 + 2 = 5 again!
1. Put 3 on bus X.
2. Put 2 on bus Y.
3. Put X and Y into the ALU.
4. The ALU adds X and Y.
5. The ALU sends the result to bus W.
6. Put bus W into memory.
[Figure: buses X and Y carry 3 and 2 into the ALU; bus W carries the result 5]
Our programmer needs to do this!
GPR Machine Details
1. Load a register from memory.
2. Load another register from memory.
3. Add register to register, putting the result into a register.
4. Store the register in memory.
Our programmer needs to do this!
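The four GPR steps can be sketched as a tiny Python model. The memory layout (3 at address 0, 2 at address 1, result at address 2) and the register names r0 to r2 are hypothetical choices for illustration, not part of the slides.

```python
# Minimal GPR machine: a few general purpose registers plus a data memory.
# Assumed layout (hypothetical): 3 at address 0, 2 at address 1, result at address 2.
memory = [3, 2, 0]
regs = {"r0": 0, "r1": 0, "r2": 0}


def load(reg, addr):
    """Load reg from mem."""
    regs[reg] = memory[addr]


def add(dest, src1, src2):
    """Add reg to reg, into reg."""
    regs[dest] = regs[src1] + regs[src2]


def store(reg, addr):
    """Store reg in mem."""
    memory[addr] = regs[reg]


# The four steps the programmer has to write:
load("r0", 0)
load("r1", 1)
add("r2", "r0", "r1")
store("r2", 2)
print(memory)  # [3, 2, 5]
```

Every operand is named explicitly, which is what makes this a general purpose register design.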
Accumulator Architecture
Get 3 from memory and ADD!
1. Assume 8 is already in the accumulator. The programmer writes:
Add 3
2. The ALU does 3 + 8 = 11 and writes the result back into the accumulator.
[Figure: memory (values 4, 2, 3), the accumulator holding 8, and the ALU]
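For contrast, here is a minimal accumulator sketch in Python. Placing the value 3 at address 2 is an assumption made for illustration; the point is that the instruction names only one operand, because the accumulator is both the other source and the destination.

```python
# Minimal accumulator machine: the accumulator is an implicit operand and destination.
# Hypothetical layout: the value 3 is stored at address 2.
memory = {0: 4, 1: 2, 2: 3}


class AccumulatorCPU:
    def __init__(self, acc=0):
        self.acc = acc

    def add(self, addr):
        """ADD addr: acc <- acc + mem[addr]. Only one operand is named."""
        self.acc = self.acc + memory[addr]


cpu = AccumulatorCPU(acc=8)  # step 1: 8 is already in the accumulator
cpu.add(2)                   # step 2: the ALU does 3 + 8 = 11 into the accumulator
print(cpu.acc)  # 11
```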
Let's build a Computer
- Let's take a RISC. What do we need?
- Memory
- Registers
- ALU
- Control Circuits
- A programming language
- A good name: Simple Although Meaningful
What's needed to build Sam-4?
- Arithmetic Logic Unit to do the maths business
- Data memory to hold the sources and results of our work
- Registers to hold the results of computations
- Code memory to store the program
Program Memory
Memory stores program instructions at a sequence of byte addresses. Each instruction is 32 bits (4 bytes), so the addresses increment by 4.
Here the Program Counter inputs address 4 to the memory, which reads out the data word (32 bits) at address 4. This is the instruction add.
[Figure: program memory with "Address in" from the PC and "Data out"]
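A short Python sketch of byte addressing for 32-bit instructions: the PC steps by 4 and each byte address maps to one instruction slot. The three-instruction program is hypothetical, apart from add sitting at address 4 as on the slide.

```python
# Sketch of byte-addressed program memory holding 32-bit (4-byte) instructions.
# The program is hypothetical apart from "add" sitting at address 4.
INSTRUCTION_BYTES = 4
program = ["ld r0, 1", "add r2, r0, r1", "st r2, 7"]


def fetch(pc):
    """Return the instruction stored at byte address pc."""
    if pc % INSTRUCTION_BYTES != 0:
        raise ValueError("instruction addresses must be multiples of 4")
    return program[pc // INSTRUCTION_BYTES]


trace = []
pc = 0
while pc < len(program) * INSTRUCTION_BYTES:
    trace.append((pc, fetch(pc)))
    pc += INSTRUCTION_BYTES  # the next instruction starts 4 bytes further on

print(trace)  # [(0, 'ld r0, 1'), (4, 'add r2, r0, r1'), (8, 'st r2, 7')]
```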
Registers, Registers
1. Registers store data at addresses. Yep, that's memory!
2. There are TWO read ports (X and Y) where data can be read out of the register file simultaneously.
3. Multiport registers have an input port (W) where data is sent to be written into the register file.
4. The addresses for the read ports (X and Y) and the write port (W) come in on the register file's address inputs.
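The multiport register file can be modelled in a few lines of Python. The size of 32 registers is an assumption, chosen because Sam's register fields are 5 bits wide; the class and method names are illustrative.

```python
class RegisterFile:
    """Sketch of a multiport register file: two read ports (X, Y), one write port (W).

    32 registers is an assumption that matches 5-bit register fields.
    """

    def __init__(self, size=32):
        self.regs = [0] * size

    def read(self, x_addr, y_addr):
        """Read two registers at once, one on port X and one on port Y."""
        return self.regs[x_addr], self.regs[y_addr]

    def write(self, w_addr, data):
        """Write data into one register through port W."""
        self.regs[w_addr] = data


rf = RegisterFile()
rf.write(1, 3)
rf.write(2, 2)
x, y = rf.read(1, 2)
print(x, y)  # 3 2
```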
Data Memory
Here's the memory.
The Memory Data Register (MDR) is a parking place for data coming to and going from the memory.
The Memory Address Register (MAR) holds the address of the data location selected for a read or write, e.g. 7.
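The MAR/MDR arrangement can be sketched in Python: the memory is only ever reached through the two registers. The 16-word size and the value 5 are assumptions for illustration.

```python
class DataMemory:
    """Sketch: memory reached only through a Memory Address Register (MAR)
    and a Memory Data Register (MDR). The 16-word size is an assumption."""

    def __init__(self, size=16):
        self.cells = [0] * size
        self.mar = 0  # address of the selected location
        self.mdr = 0  # parking place for data coming to or going from memory

    def read(self):
        """Copy the selected cell into the MDR."""
        self.mdr = self.cells[self.mar]
        return self.mdr

    def write(self):
        """Copy the MDR into the selected cell."""
        self.cells[self.mar] = self.mdr


mem = DataMemory()
mem.mar = 7        # select location 7, as in the figure
mem.mdr = 5        # data to be written (value chosen for illustration)
mem.write()
mem.mdr = 0        # clear the MDR...
print(mem.read())  # ...then read location 7 back: 5
```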
Here's Sam
Fetch-Execute Cycle
(Much more clever and useful)
Example: add r3,r1,r2
1. Fetch the instruction from memory.
2. Decode the opcode and read any registers: ALU <- r1, ALU <- r2.
3. Do any ALU operations: ALU add.
4. Do any memory access: none needed for add. (A load would get the contents of an address here, e.g. address 1.)
5. Write back results to registers: r3 <- ALU.
First Example
ld r0, 1        Load r0 with the data at address 1
ld r1, 2        Load r1 with the data at address 2
add r2,r1,r0    Add r0 and r1; put the result in r2
st r2, 7        Store r2 in memory address 7
Note that each of these instructions runs through the 5 steps of its own F-E cycle.
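Here is a minimal Python interpreter that pushes each instruction of the first example through the five F-E steps. It handles only ld, add and st with the slide's syntax. The data values at addresses 1 and 2 (3 and 2) are hypothetical, and passing the address constant through the ALU for ld/st follows the step-by-step slides.

```python
def run(program, data):
    """Run Sam-style assembly (ld, add, st only); each instruction goes through 5 steps."""
    regs = [0] * 32
    pc = 0
    while pc // 4 < len(program):
        # 1. Fetch: read the instruction at byte address pc, then advance by 4.
        instr = program[pc // 4]
        pc += 4
        # 2. Decode the opcode and read any registers.
        op, rest = instr.split(maxsplit=1)
        ops = [f.strip() for f in rest.split(",")]
        dest = int(ops[0][1:])  # for st this is the register holding the data
        if op == "add":
            x, y = regs[int(ops[1][1:])], regs[int(ops[2][1:])]
        elif op in ("ld", "st"):
            x, y = int(ops[1]), 0  # the address constant goes to the ALU
        else:
            raise ValueError(f"unknown opcode: {op}")
        # 3. ALU operation (for ld/st the address just passes through).
        alu = x + y
        # 4. Memory access: only ld and st touch data memory.
        if op == "ld":
            result = data[alu]
        elif op == "st":
            data[alu] = regs[dest]
        else:
            result = alu
        # 5. Write back results to registers (st writes nothing back).
        if op != "st":
            regs[dest] = result
    return regs, data


program = ["ld r0, 1", "ld r1, 2", "add r2,r1,r0", "st r2, 7"]
data = [0, 3, 2, 0, 0, 0, 0, 0]  # hypothetical: 3 at address 1, 2 at address 2
regs, data = run(program, data)
print(data[7])  # 5
```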
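1. Instruction Fetch
[Figure: Sam's datapath (code memory, registers r0 to r2 with ports X, Y and W, ALU, data memory with mar and mdr). The PC holds 0 and the instruction ld r0,1 (fields: Ld 0 1) is fetched from code memory.]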
2. Decode, Reg Ops
[Figure: ld r0,1 is decoded into its fields (Ld, 0, 1); the PC has advanced to 4.]
3. ALU Operation
[Figure: the address 1 passes through the ALU to mar; PC = 4.]
4. Memory Access
[Figure: data memory is read at the address in mar (1) and the data goes to mdr; PC = 4.]
5. Register Write
[Figure: the data in mdr is written into r0 through port W; PC = 4.]
1. Instruction Fetch
[Figure: the PC (4) fetches add r2,r0,r1 (fields: add 2 0 1) from code memory.]
2. Decode, Reg Ops
[Figure: add r2,r0,r1 is decoded and r0 and r1 are read out on ports X and Y; PC = 8.]
3. ALU Operation
[Figure: the ALU adds the values from r0 and r1; PC = 8.]
4. Memory Access
[Figure: add needs no memory access; PC = 8.]
5. Register Write
[Figure: the ALU result is written into r2 through port W; PC = 8.]
Instruction Encoding Example
All Sam's instructions take up 32 bits.
Sam's instructions start with the opcode, then the destination register, then the source registers:
add rd, rs, rt    means    rd <- rs + rt
e.g. add r3, r1, r2 means r3 <- r1 + r2
The first 6 bits are for the opcode.
Field:        opcode   rd (destination)   rs (source)   rt (source)   unused
Nr of bits:   6        5                  5             5             11
Example (add r3, r2, r1):
              010110   00011 (3)          00010 (2)     00001 (1)     unused
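The 6/5/5/5/11 field layout can be checked with a small Python encoder. The opcode 0b010110 is the one shown on this slide; the function names are illustrative.

```python
# Sketch: pack Sam's R-format fields (6-bit opcode, 5-bit rd, rs, rt, 11 unused bits)
# into one 32-bit word. The opcode 0b010110 is the one shown on this slide.
ADD_OPCODE = 0b010110


def encode_r(opcode, rd, rs, rt):
    for value, bits in ((opcode, 6), (rd, 5), (rs, 5), (rt, 5)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    # opcode in bits 31-26, rd in 25-21, rs in 20-16, rt in 15-11, bits 10-0 unused (0)
    return (opcode << 26) | (rd << 21) | (rs << 16) | (rt << 11)


def show(word):
    """Format a 32-bit word as opcode / rd / rs / rt / unused bit groups."""
    bits = f"{word:032b}"
    return " ".join([bits[0:6], bits[6:11], bits[11:16], bits[16:21], bits[21:]])


word = encode_r(ADD_OPCODE, rd=3, rs=2, rt=1)
print(show(word))  # 010110 00011 00010 00001 00000000000
```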
The Instruction Register
Loaded with the instruction, the IR decodes it into bits that drive the CPU's digital logic circuits.
add r2,r1,r3    (add 2 1 3)
010110 00010 00001 00011 unused
[Figure: the IR's fields (opcode, 2, 1, 3) fan out as electronic wires to the CPU circuits]
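Decoding is the reverse of encoding: shift and mask each field out of the word. This Python sketch assumes the R-format layout (6/5/5/5/11 bits) from the encoding slide; the `Decoded` class is an illustrative name.

```python
# Sketch of an instruction register splitting a 32-bit word into the fields
# that drive the datapath (R-format assumed: 6/5/5/5/11 bits).
from dataclasses import dataclass


@dataclass
class Decoded:
    opcode: int
    rd: int
    rs: int
    rt: int


def decode(word):
    return Decoded(
        opcode=(word >> 26) & 0b111111,
        rd=(word >> 21) & 0b11111,
        rs=(word >> 16) & 0b11111,
        rt=(word >> 11) & 0b11111,
    )


ir = int("010110" "00010" "00001" "00011" + "0" * 11, 2)  # add r2,r1,r3
fields = decode(ir)
print(fields)  # Decoded(opcode=22, rd=2, rs=1, rt=3)
```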
Control Path
The add instruction is decoded and produces digital signals which select the function in the ALU.
add r2, r1, r3    001010 00010 00001 00011 unused    ALU inputs r1, r3: Add!
The sub function, decoded, produces different digital signals.
sub r2, r1, r3    000101 00010 00001 00011 unused    ALU inputs r1, r3: Subtract!
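The control path can be pictured as a lookup from opcode to ALU function. This Python sketch uses the opcode values shown on this slide (add 001010, sub 000101); note that the encoding slide uses 010110 for add, and the sketch simply follows this slide. The register contents are hypothetical.

```python
# Sketch: the control path maps a decoded opcode to the ALU function it selects.
# Opcode values are the ones shown on this slide (add 001010, sub 000101).
import operator

ALU_FUNCTION = {
    0b001010: operator.add,  # "Add!"
    0b000101: operator.sub,  # "Subtract!"
}


def execute(word, regs):
    opcode = (word >> 26) & 0b111111
    rd, rs, rt = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    alu = ALU_FUNCTION[opcode]  # the control signals select the ALU function
    regs[rd] = alu(regs[rs], regs[rt])


regs = [0] * 32
regs[1], regs[3] = 10, 4  # hypothetical register contents
execute(int("001010" "00010" "00001" "00011" + "0" * 11, 2), regs)  # add r2, r1, r3
print(regs[2])  # 14
execute(int("000101" "00010" "00001" "00011" + "0" * 11, 2), regs)  # sub r2, r1, r3
print(regs[2])  # 6
```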
Sam and MIPS are 32-bit
Every instruction format is 32 bits wide.
add rd, rs, rt      opcode   rd      rs      rt      unused
                    001010   00110   01001   00011   unused
ldr rd, rs + c      opcode   rd      rs      16-bit address
                    001010   00010   00001   0101001111111011
ldr rd, c           opcode   26-bit address
                    001010   101001111110010101011011111
Other Arithmetic Instructions
sub rd, rs, rt    means    rd <- rs - rt
(opcode, then the destination register, then the source registers, then unused bits)
The same coding applies to other arithmetic instructions:
sub r3,r2,r1    and r2,r1,r0    or r5,r1,r2
A Simple Load Instruction
Load into rd the contents of memory at the address held in register rs. Simple!
Fields: opcode (ld), rd (destination), rs (single source register), unused
1. Let's say we have already loaded r1 with 3.
2. ldr r9, r1 gets the data from memory at address r1 (3).
3. Load the data into r9.
[Figure: memory locations 0 to 7 (values shown include 145, 115, 231 and 69) and register r9]
A More Complex Load
Load register rd with the contents of memory which you find at address rs + c.
Fields: opcode (ldr), rd (destination), rs (source), constant c
ldr r9, r1 + 2
The memory address is formed as a sum: 3 + 2 = 5.
[Figure: memory address 5 holds 231]
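Both register-based loads can be sketched in Python. The memory contents (145 at address 3, 231 at address 5) are assumptions read off the figures, and the function names are illustrative.

```python
# Sketch of Sam's register-based loads. Memory contents are assumptions read off
# the figures (145 at address 3, 231 at address 5); other cells are omitted.
memory = {3: 145, 5: 231}
regs = {"r1": 3, "r9": 0}  # r1 has already been loaded with 3


def ld_register(rd, rs):
    """ldr rd, rs: rd <- mem[rs]."""
    regs[rd] = memory[regs[rs]]


def ld_offset(rd, rs, c):
    """ldr rd, rs + c: rd <- mem[rs + c]; the address is formed as a sum."""
    regs[rd] = memory[regs[rs] + c]


ld_register("r9", "r1")
print(regs["r9"])  # 145
ld_offset("r9", "r1", 2)
print(regs["r9"])  # 231
```

The offset form costs one extra add to form the address, but it lets a single base register reach a whole block of nearby locations.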
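And a Store Instruction
Fields: opcode (str), rd ("destination"), rs (source), constant c
Note that here the data moves from the register in the "destination" field into memory. Confusing? Mm.
str r9, r1 + 2
1. Get the base address from r1 (3) and add 2: 3 + 2 = 5.
2. Write the data in r9 (196) to memory address 5.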
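A Python sketch of the store, with the values from the slide (r1 = 3, r9 = 196). It makes the naming quirk explicit: the register in the rd field is the data source, not a destination.

```python
# Sketch of Sam's store: the register in the "destination" field is the data SOURCE.
# Values follow the slide (r1 = 3, r9 = 196); memory starts empty.
memory = {}
regs = {"r1": 3, "r9": 196}


def st_offset(rd, rs, c):
    """str rd, rs + c: mem[rs + c] <- rd."""
    address = regs[rs] + c      # 1. form the address: 3 + 2 = 5
    memory[address] = regs[rd]  # 2. write the data to memory


st_offset("r9", "r1", 2)
print(memory)  # {5: 196}
```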
Load Immediate
Fields: opcode (ldi), rd (destination), constant C
In load immediate, the constant C is carried in the instruction itself, right after the opcode, and goes straight into the register. All reference to memory has gone!
ldi r9, 5    Load 5 straight into r9
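A Python sketch showing that ldi needs no memory access because the constant is part of the instruction word. The field layout (6-bit opcode, 5-bit rd, 21-bit constant) and the opcode value are hypothetical; the slides only say that the constant follows the opcode.

```python
# Sketch of load immediate: the constant travels inside the instruction word.
# The field layout (6-bit opcode, 5-bit rd, 21-bit constant) and the opcode value
# are hypothetical.
LDI_OPCODE = 0b000001  # hypothetical


def encode_ldi(rd, c):
    if not 0 <= c < (1 << 21):
        raise ValueError("constant does not fit in 21 bits")
    return (LDI_OPCODE << 26) | (rd << 21) | c


def execute_ldi(word, regs):
    rd = (word >> 21) & 0b11111
    c = word & ((1 << 21) - 1)
    regs[rd] = c  # straight into the register: no memory access at all


regs = [0] * 32
execute_ldi(encode_ldi(9, 5), regs)  # ldi r9, 5
print(regs[9])  # 5
```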
A Summary So Far
Now it's time to move on and look in detail at the hierarchy of computer languages, to see its influence on the ISA.
Assembling a Spreadsheet
The Great Idea here is that the ISA we need at the bottom must serve the grand master at the top, the application.
Application:           Excel
HLL implementation:    main() { int f, g, h; f = g + h; }
ISA (assembler):       ld r0, g / ld r1, h / add r2,r0,r1 / st r2, f
Electronics
The ISA must support the HLL implementation.
Arrays (Tables)
How do we sum the array of numbers in column B?
1. We would use the instruction ld r1, r0 + B, where B = 3 is the start address of the array.
2. Then we load r0 with 0, then 1, then 2, and so on, to scan down the array.
ldi r0, 0
ldi r3, 0
ld r1, r0 + 3
With r0 = 0, the address is 0 + 3 = 3.
Arrays (Tables)
How do we sum the array of numbers in column B?
- Increment r0 to get the next data value: inc r0 (0 + 1 = 1)
- ld r1, r0 + B, where B = 3 is the start address of the array, but now r0 contains 1
inc r0
ld r1, r0 + 3
add r3, r3, r1
Get the next cell, load its value and add it to the sum in r3.
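Putting the two array slides together, here is a Python sketch of the whole loop, with each line mirroring one Sam instruction. B = 3 comes from the slides. The cell values, the count of 4 cells and the loop test are hypothetical, since the slides do not show how the loop ends.

```python
# Sketch of the array-sum loop: r0 is the index, r1 the loaded value, r3 the sum.
# B = 3 is the start address from the slides; the cell values and N are hypothetical.
B = 3
memory = [0, 0, 0, 10, 20, 30, 40]  # column B occupies addresses 3..6 here
N = 4                               # number of cells to sum (assumed)

r0 = 0  # ldi r0, 0      index into the array
r3 = 0  # ldi r3, 0      running sum
while r0 < N:
    r1 = memory[r0 + B]  # ld r1, r0 + 3     load the current cell
    r3 = r3 + r1         # add r3, r3, r1    add it to the sum
    r0 = r0 + 1          # inc r0            move on to the next cell
print(r3)  # 100
```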
Making Decisions
Let's say we want to add 2 to a number B if another number C is equal to 10.
You mean: if C = 10, then add 2 to B?
Yep. Here's how we would do it in C:
if (c == 10) b = b + 2;
What about Sam? First load the test number 10, then branch around the add:
Branch if not equal (r1 != r2) to address 36
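A small Python interpreter makes "branch around the add" concrete. The mnemonic `bne`, the register choice (c in r1, b in r3) and the instruction addresses are all hypothetical; the slide's target address 36 belongs to a longer program that is not shown, so this sketch branches to 16 instead.

```python
# Hypothetical Sam-style program for: if (c == 10) b = b + 2
# r1 holds c, r3 holds b; instruction addresses go up in steps of 4 bytes.
PROGRAM = {
    0: ("ldi", 2, 10),       # r2 <- 10, the test number
    4: ("bne", 1, 2, 16),    # branch if r1 != r2 to address 16 (around the add)
    8: ("ldi", 4, 2),        # r4 <- 2
    12: ("add", 3, 3, 4),    # r3 <- r3 + r4, i.e. b = b + 2
}                            # address 16: end of the program


def run_if(c, b):
    regs = [0] * 8
    regs[1], regs[3] = c, b
    pc = 0
    while pc in PROGRAM:
        op, *args = PROGRAM[pc]
        pc += 4
        if op == "ldi":
            rd, const = args
            regs[rd] = const
        elif op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "bne":
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                pc = target  # taken branch: skip straight past the add
    return regs[3]


print(run_if(10, 5), run_if(7, 5))  # 7 5
```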
Loops
Let's say we want to make the sequence 0, 3, 6, 9, 12 and stop.
We take 4 steps, and at each step we add 3: x = x + 3.
So we need a register to keep track of the number of steps (r0)
and a register to hold the sum at each step (r2).
Step (r0):   0   1   2   3   4
Sum (r2):    0   3   6   9   12
Branch unless r0 = r1 (= 4)
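In Python the loop looks like this. Register roles follow the slide (r0 counts steps, r1 holds the limit 4, r2 holds the running value). The exact instruction order inside the loop body is an assumption.

```python
# Sketch of the loop: r0 counts steps, r1 holds the limit 4, r2 the running value.
r0, r1, r2 = 0, 4, 0
sequence = [r2]
while True:
    r2 = r2 + 3  # each step: x = x + 3
    r0 = r0 + 1  # one more step taken
    sequence.append(r2)
    if r0 == r1:  # "branch unless r0 = r1": fall out of the loop after 4 steps
        break
print(sequence)  # [0, 3, 6, 9, 12]
```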
Some x86 Instructions
mov ax, bx + c
mov ax, bx
add ax, bx
add bx, ax
These look rather like Sam's RISC ops.
Let's compare the RR and RM ISAs, computing b = a + b:
Sam:         ld r1, a / ld r2, b / add r3,r1,r2 / st r3, b
Intel x86:   mov ax, a / add b, ax
But add b, ax is not like Sam's ops: here the contents of ax are being added straight into memory! The x86 is a register-memory ISA and Sam is a register-register ISA.
Clearly RR needs more instructions, and so more memory, while RM uses stronger operations.
Intel Instruction Format
IA-32 Format
Variable Length Instructions
All Sam's instructions have the same length, 32 bits. This is also true for other RISC ISAs such as SPARC and MIPS. Compare this with the x86, whose instructions vary from 1 to 17 bytes. Here are some stats.
[Figure: frequency of use vs instruction length (bytes) for Espresso, Gcc, Spice and Nasa]
Clearly, long complex instructions are used infrequently.
But the use does depend on the app.
Instruction Timing
All Sam's instructions take 5 clock cycles.
- A 1 gigahertz SPARC runs 1 giga clock cycles in 1 second: that's 10^9 cycles
- That's 1,000,000,000 cycles
- That's 200,000,000 add ops (at 5 cycles each)!
[Figure: time axis marked off in clock cycles; one clock cycle highlighted]
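The arithmetic is simply clock rate divided by cycles per instruction, assuming instructions run one after another with no pipelining. A one-function Python sketch:

```python
def ops_per_second(clock_hz, cycles_per_instruction):
    """Instructions completed per second when each takes a fixed number of cycles
    and instructions run strictly one after another (no pipelining)."""
    return clock_hz // cycles_per_instruction


print(ops_per_second(1_000_000_000, 5))  # 200000000
```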
Variable Time Instructions
Here's a timing diagram for an Intel add:
add ax, bx + c
We need two adds. The first sums up the address (bx + c), and the second actually adds memory to register ax (ax <- ax + mem).
Potent x86 Instructions
1. Application: Greenspan
2. High-Level Language (C): strcmp(str, "Greenspan")
3. Intel ISA code:
mov x, 2       Immediate to memory        6
xlat x         Translate al via table     1
imul x         Multiply memory with ax    4
inc x          Increment memory by 1      4
repne scasb    Scan string for match!     various
Top 10 Intel x86 Instructions
Rank   Instruction           Usage (%)
1      load                  22
2      conditional branch    20
3      arithmetic/logic      19
4      compare               16
5      store                 12
6      move reg-reg          4
7      call-return           2
We see that most instructions are simple: load, store, calculate, branch. None of Intel's potent stuff figures here. So why did Intel design instructions no one uses?
ISA R&D into the 80s
- Let's downshift and make things simpler
- Use simple instructions: load, store, add
- Several of these can do the work of one potent x86 op
- Needs more memory, but memory is cheap
- More CPU cycles, but can still be faster
1980: Berkeley, Patterson, RISC (SPARC)
1981: Stanford, Hennessy, MIPS
Intel Architecture
Looks great from the outside, but is a golden mishmash with a history of add-ons.
RISC Architecture
Minimalist, Functional
Summary So Far
RISC
- Minimalist, something like Zen
- All instructions the same length in memory
- Small number of instructions
- Small number of addressing modes
- Simple instructions, 5 clock cycles
- SPARC, MIPS
CISC
- Different lengths in memory
- Large number of instructions
- Huge number of addressing modes
- Complex instructions
- Variable number of clock cycles
- Intel
Today the consequences of
Intel (CISC)
MIPS (RISC)
Laundry Model
[Figure: washer, drier, store basket and wardrobe]
Process Steps
A. Wash then Dry
- Load the washer at 9.00
- Done at 10, load the drier
- Drier done at 11
[Figure: timeline from 9.00 to 11.00]
Sequential Process
- Load washer at 9.00
- Done at 10, load drier
- Drier done at 11
- Reload washer at 11
- Done at 12, load drier
- Drier done at 13
- Reload washer at 13
- Done at 14, load drier
- Done at 15
3 loads take 6 hours
[Figure: timeline from 9.00 to 15.00]
Overlapping Process
- Load washer at 9.00
- Done at 10: load drier, reload washer
- Both done at 11: reload drier, reload washer
- Both done at 12: reload drier
- Drier done at 13
From 10.00 till 11.00, both washer and drier are running concurrently.
3 loads take 4 hours
[Figure: timeline from 9.00 to 15.00]
Washing Pipeline Filling
5 cycles!!!
- Get washing
- Wash
- Dry
- Store
- Put away
[Figure: timeline from 9.00 to 18.00]
5 loads in 9 hours
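The laundry timings follow two simple formulas, assuming every stage takes exactly one hour as on the slides. Done sequentially, each load passes through every stage before the next one starts. Pipelined, the first load needs one hour per stage, and after that one load finishes every hour.

```python
def sequential_hours(loads, stages, hours_per_stage=1):
    # Each load goes through every stage before the next load starts.
    return loads * stages * hours_per_stage


def pipelined_hours(loads, stages, hours_per_stage=1):
    # The first load takes `stages` steps to get through; after that one load
    # finishes every step, so the remaining loads add loads - 1 steps.
    return (stages + loads - 1) * hours_per_stage


print(sequential_hours(3, 2), pipelined_hours(3, 2))  # 6 4
print(sequential_hours(5, 5), pipelined_hours(5, 5))  # 25 9
```

The same formulas apply to Sam: with 5 stages and many instructions, a pipeline approaches one finished instruction per clock cycle instead of one every 5 cycles.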
Can We Pipeline Sam?
1. Fetch
2. Dec/Reg
3. ALU
4. Mem
5. RW
Pipelined Sam-4
1. Fetch
2. Dec/Reg
3. ALU
4. Mem
5. RW
[Figure: the five stages separated by buffers]
5 Stages in the Pipeline
1. Fetch
2. Dec/Reg
3. ALU
4. Mem
5. RW
Let's take the instruction add r3,r1,r2 and show which stage is needed for each part of the instruction.
[Figure: add r3,r1,r2 in the pipeline: Dec/Reg reads r1 and r2, the ALU does the add, RW writes r3]
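A Python sketch that prints which instruction sits in which stage on each clock cycle. It assumes one new instruction enters the pipeline per cycle with no stalls; the example instructions are the pair from the next slide.

```python
STAGES = ["Fetch", "Dec/Reg", "ALU", "Mem", "RW"]


def pipeline_chart(instructions):
    """For each clock cycle, map stage name -> instruction in that stage,
    assuming one instruction enters per cycle and nothing stalls."""
    cycles = len(instructions) + len(STAGES) - 1
    chart = []
    for cycle in range(cycles):
        row = {}
        for i, instr in enumerate(instructions):
            stage = cycle - i  # instruction i entered the pipeline at cycle i
            if 0 <= stage < len(STAGES):
                row[STAGES[stage]] = instr
        chart.append(row)
    return chart


chart = pipeline_chart(["ld r3, r0 + 2", "add r4, r1, r2"])
for cycle, row in enumerate(chart, start=1):
    print(cycle, row)
```

Two instructions finish in 6 cycles rather than 10, the same "stages + loads - 1" pattern as the washing pipeline.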
Two Instructions
Two instructions in the pipeline:
ld r3, r0 + 2
add r4, r1, r2
Structural Hazard
[Figure: st r0,5 writes (store) to memory in its Mem stage while add r4,r1,r2 is being read (fetched) from memory]
Here we are being asked to read from memory and write to it simultaneously. Impossible!
Solution: use separate code and data memories.
Hazardous Washing
[Figure: laundry timeline from 9.00 to 18.00]
The washing basket contains both clean and dirty washing!
Code and Data Memories
Data Hazard
add r3,r1,r2    r3 is set here...
add r4,r1,r3    ...but we need r3 here, EARLIER!
Data Hazard
Need value of r3 for second instruction before
the first is complete.
add r3,r1,r2
add r4,r1,r3
Pipeline Stalls
[Figure: add r3,r1,r2, then stall cycles, then add r4,r1,r3]
Resolve the hazard: insert a delay (stall cycles) into the second instruction's stream.
But this needs extra electronics on the chip. Complex and costly.
Forwarding
Need value of r3 for second instruction before
the first is complete.
add r3,r1,r2
add r4,r1,r3
So build in extra circuits to get the data as soon as it is available from the ALU.
Compiler Resolves Hazard
The compiler can detect a possible hazard and insert 2 nops (no-ops):
add r3,r1,r2
nop
nop
add r4,r1,r3
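A Python sketch of the compiler pass: whenever an instruction reads a register written by one of the previous two instructions, pad with nops. The `gap=3` setting is chosen to reproduce the slide's two nops for back-to-back use; the operand parsing is a simplification that assumes the first operand is the destination for everything except `st`.

```python
import re

WRITERS = ("add", "sub", "and", "or", "ld", "ldi")  # ops whose first operand is written


def reg_written(instr):
    """Destination register of an ALU op or load (first operand), else None."""
    op, _, rest = instr.partition(" ")
    return rest.split(",")[0].strip() if op in WRITERS else None


def regs_read(instr):
    """Registers an instruction reads (st reads its data register as well)."""
    op, _, rest = instr.partition(" ")
    operands = [o.strip() for o in rest.split(",")]
    sources = operands if op == "st" else operands[1:]
    return {r for s in sources for r in re.findall(r"r\d+", s)}


def insert_nops(program, gap=3):
    """Insert nops so a reader comes at least `gap` slots after its writer."""
    out = []
    for instr in program:
        reads = regs_read(instr)
        needed = 0
        for distance in range(1, gap):
            if distance > len(out):
                break
            if reg_written(out[-distance]) in reads:
                needed = max(needed, gap - distance)
        out.extend(["nop"] * needed)
        out.append(instr)
    return out


print(insert_nops(["add r3,r1,r2", "add r4,r1,r3"]))
# ['add r3,r1,r2', 'nop', 'nop', 'add r4,r1,r3']
```

Applied to the example on the next slide (ld r1,7; ld r2,8; add r3,r1,r2), it also puts two nops before the add, because r2 is loaded immediately before it is used.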
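Example
ld r1,7
ld r2,8
add r3,r1,r2
[Figure: the three instructions in the pipeline: ld loads address 7 into r1, ld loads address 8 into r2, add reads r1 and r2 and writes r3]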