Title: Structure of Computer Systems
1. Structure of Computer Systems
- Course 4
- The Central Processing Unit (CPU)
2. CPU - Central Processing Unit
- Classic (idyllic) view
  - Incorporates 2 of the 5 components of von Neumann's classical model
    - ALU - Arithmetic and Logic Unit
    - CU - Control Unit
  - It is the brain (the intelligent part) of a computer
  - Fetches (reads) an instruction, decodes/interprets it, reads the data, executes the instruction and stores the result
  - Does its job in a synchronized and sequential way: one thing at a time
3. CPU - Central Processing Unit
- Today's view
  - Contains all kinds of computer components
    - Multiple CPUs
      - symmetric, asymmetric
      - multiple cores
      - multiple ALUs, specialized ALUs (e.g. floating point, multimedia: MMX, SSE2)
    - Memory: multiple levels of cache memory (L0, L1, L2, trace cache)
    - Interfaces and peripheral devices (in the case of microcontrollers and DSPs)
      - Serial channels
      - Parallel interfaces
      - Timers, counters
      - Converters (ADC, DAC)
      - Network interfaces
    - Interrupt system
    - Bus controller(s) and arbiter(s)
    - Memory management units
  - Executes instructions in parallel and in a speculative order
  - Intelligence may be distributed in memories and interfaces as well
- Where is that nice idyllic image?
4. Starting from the beginning
- A simple computer
  - Attributes: sequential, one register (the accumulator), one memory for both instructions and data
- Legend: CG - clock generator; PhG - phase generator; PC - program counter; IR - instruction register; Acc - accumulator
5. A simple computer
- How does it work?
- 4 phases
  - IF - instruction fetch: read the instruction into IR
  - Dec - decode the instruction: generate the control signals
  - PreEx - prepare execution: e.g. read the data from memory
  - Exe - execute: e.g. addition, subtraction
6. A simple computer
- Example 1: ADD Acc, M[100h]
  - IF: Sel=0 => Address=PC; IR_ld=impulse => IR="ADD 100"
  - Dec: Sel=1 => Address=IR_adr=100; Inc=1 => increment PC
  - PreEx: Op_sel=code_add => the ALU performs an addition
  - Exe: Acc_ld => Acc=Acc+M[100]
7. A simple computer
- Example 2: JMP 200h
  - IF: Sel=0 => Address=PC; IR_ld=impulse => IR="JMP 200"
  - Dec: Inc=1 => increment PC
  - PreEx: PC_ld=1 => PC=IR_addr=200
  - Exe: -
- Example 3: SHR Acc
  - IF and Dec: the same
  - PreEx: -
  - Exe: Acc_shr=1 => shift the accumulator one position to the right
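
The four phases and the three examples above can be traced with a small simulator. This is only a minimal sketch in Python, under assumptions that are not in the slides: a 16-bit accumulator, a word-addressed memory held in a dict, and instructions stored as (mnemonic, address) tuples. The names `SimpleComputer` and `step` are hypothetical.

```python
class SimpleComputer:
    """Toy model of the one-accumulator machine (illustrative assumptions only)."""

    def __init__(self, memory):
        self.mem = dict(memory)  # one memory for instructions and data
        self.pc = 0              # program counter
        self.ir = None           # instruction register
        self.acc = 0             # accumulator (assumed 16 bits wide)

    def step(self):
        # IF: Sel=0 => Address=PC, IR_ld loads the instruction into IR
        self.ir = self.mem[self.pc]
        op, addr = self.ir
        # Dec: Inc=1 increments the PC; for ADD, Sel=1 => Address=IR_adr
        self.pc += 1
        operand = self.mem[addr] if op == "ADD" else None
        # PreEx: JMP loads the PC (PC_ld=1). For ADD the ALU operation is
        # selected in this phase; the sketch folds that into Exe below.
        if op == "JMP":
            self.pc = addr
        # Exe: update the accumulator
        if op == "ADD":
            self.acc = (self.acc + operand) & 0xFFFF   # Acc_ld
        elif op == "SHR":
            self.acc >>= 1                             # Acc_shr


program = {
    0x000: ("ADD", 0x100),   # Acc = Acc + M[100h]
    0x001: ("JMP", 0x200),   # continue at 200h
    0x200: ("SHR", None),    # Acc = Acc >> 1
    0x100: 6,                # data word
}
cpu = SimpleComputer(program)
for _ in range(3):
    cpu.step()
print(cpu.acc, hex(cpu.pc))  # 3 0x201
```

Every call to `step` runs all four phases, even when a phase has nothing to do (PreEx for SHR, Exe for JMP).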
8. A simple computer
- Homework: try to implement
  - MOV M[addr], Acc
  - MOV Acc, M[addr]
  - Conditional jump (e.g. if Acc=0, >0, <0)
  - MOV Acc, 0
9. A simple computer
- Issues
  - Every instruction is executed in a fixed number of steps (4)
    - Too many for simple instructions
    - Too few for complex instructions (e.g. multiply)
  - Only one internal register: hard to operate with data
  - No input and output devices
  - Limited number of possible operations: small instruction set
- Possible improvements
  - Variable number of phases -> the phase generator should depend on the instruction code
  - Multiple internal registers -> 2 buses: input data, output data
  - Front panel with 7-segment LEDs and switches
  - Increase the number of instructions -> more complex Decoder and Command and Control Unit
10. A more sophisticated computer, but still simple: the MIPS architecture
- Attributes
  - Sequential
  - 32 internal registers of 16 bits
  - Instructions: fixed length, variable content
  - Harvard memory architecture: separate instruction and data memory
- An instruction is executed in 5 phases
  - IF - instruction fetch
  - ID - decode the instruction and prepare (read) the data
  - Ex - execute the instruction
  - M - operation with the memory
  - Wb - write back: store the result
- Instruction types
  - R - Register, e.g. ADD RD, RS, RT
  - I - Immediate, e.g. ADDI RT, RS, constant; LW RT, offset(RS)
  - J - Jump, e.g. J target
11. MIPS architecture
- Instruction formats
  - Fixed length (4 bytes) but variable content
- R - register type instructions
  - <instr> rd, rs, rt
    - rd - destination register
    - rs - source register
    - rt - target register
  - Ex.: add $s1, $s2, $s3 ($s1 = $s2 + $s3)
12. MIPS architecture - Instruction formats
- I - immediate type instructions: with an immediate value (constant)
  - <instr> rt, rs, IMM
    - rs - source register
    - rt - target register
  - Ex.: addi $s1, $s2, 55 ($s1 = $s2 + 55)
- J - jump type instructions
  - <instr> LABEL
  - Ex.: j et1 (jump to the label et1)
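
To make "fixed length but variable content" concrete, the three formats can be packed into 32-bit words. The slides give only the 4-byte length and the operand order; the field widths, the opcode and funct values and the register numbers below are those of the standard MIPS32 encoding, so they are an assumption with respect to this course. The helper names are hypothetical.

```python
def encode_r(funct, rd, rs, rt, shamt=0, opcode=0):
    """R type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rt, rs, imm):
    """I type: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target):
    """J type: opcode(6) target(26), with the target given as a word index."""
    return (opcode << 26) | (target & 0x3FFFFFF)


S1, S2, S3 = 17, 18, 19   # standard MIPS register numbers for $s1, $s2, $s3

add_word = encode_r(funct=0x20, rd=S1, rs=S2, rt=S3)     # add $s1, $s2, $s3
addi_word = encode_i(opcode=0x08, rt=S1, rs=S2, imm=55)  # addi $s1, $s2, 55
print(f"{add_word:08x} {addi_word:08x}")  # 02538820 22510037
```

All three helpers return a value that fits in 4 bytes; only the way the bits after the opcode are split differs between the formats.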
13. MIPS architecture
- Address generation and instruction fetch
[Figure: address generation and instruction fetch. Labels: PC, Program Memory, Address, IR, Instr. code, Op_code, op_address, Add, two MUXes, constants 4 and 0, const., Jump address. Control signals: PC_MUX_Sel1, PC_MUX_Sel2, PC_ld, IR_ld.]
- PC = PC + 4: increment the PC
- PC = Jump_Address: absolute jump
- PC = PC + Jump_Address: relative jump
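
The three ways of updating the PC listed above can be written as a single selection function. This is a minimal sketch; the function name `next_pc` and the mode strings are hypothetical stand-ins for the PC_MUX_Sel1 / PC_MUX_Sel2 control signals, whose exact encoding the slide does not give.

```python
def next_pc(pc, mode="sequential", jump_address=0):
    """Return the next PC value for the three cases on the slide."""
    if mode == "sequential":
        return pc + 4                  # PC = PC + 4
    if mode == "absolute":
        return jump_address            # PC = Jump_Address
    if mode == "relative":
        return pc + jump_address       # PC = PC + Jump_Address
    raise ValueError(f"unknown mode: {mode}")


print(hex(next_pc(0x100)))                                   # 0x104
print(hex(next_pc(0x100, "absolute", jump_address=0x400)))   # 0x400
print(hex(next_pc(0x100, "relative", jump_address=0x20)))    # 0x120
```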
14. MIPS architecture
- Decode and data preparation
[Figure: decode and data preparation. Labels: Instruction register (IR), op_code, DEC, Exec cmds., Mem. cmds., WB cmds., Register Block (reg. 0, reg. 1, reg. 2, ..., reg. 31), address, op1_ad, op2_ad, two MUXes, A (data), B (data), I (Immediate value).]
15. MIPS architecture
[Figure. Label: Data out.]
16. MIPS architecture
[Figure.]
17. MIPS architecture
[Figure. Labels: Clk, Clock gen., Phase gen., Instr. dec, PC, IR, Instr. mem, Regs, ALU, Data Mem, constants 4 and 0.]
18. Pipeline execution
- What does it mean?
  - Works like an assembly line
    - idea: the automobile assembly lines of the early 1900s (e.g. Ford's moving assembly line, 1913)
- How to do it?
  - Specialized components (units) for every phase of instruction execution
  - Store the partial results in temporary buffers
- What can we achieve?
  - Higher execution speed at the same clock frequency
  - CPI = 1
19. Sequential vs. pipeline execution
- Sequential execution: CPI = 5
- Pipeline execution: CPI = 1 (in the ideal case)
      T1  T2  T3  T4  T5  T6  T7  T8  T9  T10
i1    IF  ID  Ex  M   Wb
i2        IF  ID  Ex  M   Wb
i3            IF  ID  Ex  M   Wb
i4                IF  ID  Ex  M   Wb
i5                    IF  ID  Ex  M   Wb
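
The difference between the two execution modes can be checked by counting clock cycles. The sketch below assumes the ideal case from the slide (no hazards, one phase per clock); the function names are hypothetical.

```python
STAGES = ["IF", "ID", "Ex", "M", "Wb"]


def sequential_cycles(n):
    """Each instruction runs all 5 phases before the next one starts."""
    return n * len(STAGES)


def pipelined_cycles(n):
    """The first instruction needs 5 clocks, every further one adds 1 clock."""
    return len(STAGES) + (n - 1) if n > 0 else 0


def pipeline_diagram(n):
    """Build a text timing diagram like the one on the slide."""
    total = pipelined_cycles(n)
    rows = []
    for i in range(n):
        cells = ["  "] * total
        for s, name in enumerate(STAGES):
            cells[i + s] = name.ljust(2)   # instruction i enters IF at clock i
        rows.append(f"i{i + 1}  " + " ".join(cells).rstrip())
    return "\n".join(rows)


print(pipeline_diagram(5))
print(sequential_cycles(5), pipelined_cycles(5))  # 25 9
```

For 5 instructions the pipelined CPI is 9/5 = 1.8. It approaches the ideal value of 1 only as the number of instructions grows, because the 4 clocks needed to fill the pipeline are paid once.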
20. Superscalar and superpipeline architectures
- Superscalar
  - Multiple pipelines
  - 2 instructions are fetched every clock
  - CPI = 1/2
- Superpipeline
  - phases require only half a clock period
  - CPI = 1/2
Superscalar timing:
             T1  T2  T3  T4  T5  T6
instr. i     IF  ID  Ex  M   Wb
instr. i+1   IF  ID  Ex  M   Wb
instr. i+2       IF  ID  Ex  M   Wb
instr. i+3       IF  ID  Ex  M   Wb
[Figure: superpipeline timing diagram, instr. i to i+3, phases IF ID Ex M Wb over T1 to T6.]
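
The CPI = 1/2 figures can be reproduced with the same kind of cycle counting. This is a sketch for the ideal case only, following the slide's descriptions (2 instructions fetched per clock; phases lasting half a clock period). The function and parameter names are hypothetical.

```python
import math

PHASES = 5   # IF, ID, Ex, M, Wb


def superscalar_cycles(n, width=2):
    """`width` instructions enter the pipelines together every clock."""
    return PHASES + math.ceil(n / width) - 1 if n > 0 else 0


def superpipeline_cycles(n, phases_per_clock=2):
    """Each phase lasts 1/phases_per_clock of a clock period."""
    return (PHASES + n - 1) / phases_per_clock if n > 0 else 0


n = 1000
print(superscalar_cycles(n) / n)    # 0.504
print(superpipeline_cycles(n) / n)  # 0.502
```

Both values tend to 1/2 for long instruction sequences; for 4 instructions the superscalar case needs 6 clocks.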
21. Pipelined MIPS architecture
22. Pipeline architecture
- There is no free lunch!
- Hazard cases
  - Data hazard
    - Data dependency between consecutive instructions
  - Control hazard
    - Jump/branch instructions change the normal (sequential) order of instruction execution
  - Structural hazard
    - Instructions in different phases use the same structural component (e.g. ALU, registers, memory, bus, etc.)
- Result: reduced speed and efficiency of the pipeline architecture
23. Hazard cases in pipeline architectures
- Data hazard
  - Data hazard types
    - RAW - read after write
      - Occurs very often; avoided through forwarding (see Common data bus)
    - WAR - write after read
      - Rare in a classic pipeline; more frequent in superscalar pipelines
    - WAW - write after write
    - RAR - read after read: not a hazard
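
The hazard types above follow directly from which registers two instructions read and write. The sketch below classifies the dependences between two instructions in program order; the `Instr` tuple and the function name are hypothetical, and the register names are only examples.

```python
from collections import namedtuple

# An instruction is reduced to its destination and its source registers.
Instr = namedtuple("Instr", ["dest", "sources"])


def classify_dependences(first, second):
    """Return the dependence types between two instructions in program order."""
    found = set()
    if first.dest is not None and first.dest in second.sources:
        found.add("RAW")   # second reads what first writes
    if second.dest is not None and second.dest in first.sources:
        found.add("WAR")   # second overwrites what first still has to read
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")   # both write the same register
    return found


i1 = Instr(dest="s1", sources=("s2", "s3"))   # add s1, s2, s3
i2 = Instr(dest="s4", sources=("s1", "s5"))   # add s4, s1, s5
i3 = Instr(dest="s2", sources=("s6", "s7"))   # add s2, s6, s7
print(classify_dependences(i1, i2))  # {'RAW'}
print(classify_dependences(i1, i3))  # {'WAR'}
```

Two instructions that only share source registers (RAR) produce an empty set, which matches the note that RAR is not a hazard.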
24. Hazard cases in pipeline architectures
- Data hazard (cont.)
  - Solutions
    - Detection and stall phases
      - the instruction with an unresolved data dependency waits in the instruction fetch stage until the data is available
      - the following instructions are also stalled
    - Register renaming
      - multiple copies of a register (see the alias registers of the Pentium Pro)
      - instructions with no logical dependency between them can get different copies of the same register
      - avoids artificial data dependencies caused by the limited number of internal registers
    - Forwarding (see Common data bus)
      - transfers a result in advance, before it is written to its final place (register or memory location)
    - Out-of-order execution
      - speculative execution (see the Pentium Pro architecture)
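
Register renaming can be illustrated with a few lines of code. This is a simplified sketch, not the mechanism of any particular processor: it assumes an unlimited supply of physical registers named p0, p1, ... and instructions given as (dest, src1, src2) tuples. All names are hypothetical.

```python
from itertools import count


def rename(instructions):
    """Give every written register a fresh physical copy.

    `instructions` is a list of (dest, src1, src2) architectural names.
    """
    fresh = count()
    current = {}    # architectural name -> latest physical copy
    renamed = []
    for dest, *sources in instructions:
        # Sources read the most recent copy (or the original name if the
        # register has not been written yet in this sequence).
        new_sources = [current.get(s, s) for s in sources]
        current[dest] = f"p{next(fresh)}"
        renamed.append((current[dest], *new_sources))
    return renamed


program = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r1", "r5"),   # RAW on r1: a true dependence, it must stay
    ("r1", "r6", "r7"),   # WAW with the first, WAR with the second
]
print(rename(program))
# [('p0', 'r2', 'r3'), ('p1', 'p0', 'r5'), ('p2', 'r6', 'r7')]
```

After renaming, the third instruction writes its own copy (p2), so the WAW and WAR conflicts on r1 disappear, while the true RAW dependence of the second instruction on p0 is preserved.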
25. Hazard cases in pipeline architectures
- Structural hazard
  - Solutions
    - Detection and stall phases
    - Redundant functional units (see Pentium processors)
    - Harvard memory organization: separate code and data memory (see microcontrollers)
    - Multiple buses (see DSPs)
    - Out-of-order execution
26. Hazard cases in pipeline architectures
- Control hazard
  - Solutions
    - Stall phases
    - Branch prediction
    - Out-of-order execution
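
The slide names branch prediction without choosing a scheme. As one common example, the sketch below models a 2-bit saturating counter predictor; this technique is brought in here for illustration and is not taken from the slides. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def accuracy(outcomes):
    """Fraction of branches whose outcome was predicted correctly."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict() == taken
        predictor.update(taken)
    return hits / len(outcomes)


# A loop branch: taken 9 times, then not taken once, repeated 3 times.
history = ([True] * 9 + [False]) * 3
print(accuracy(history))
```

On this history the predictor is right 26 times out of 30: it misses the very first branch and each loop exit, but a single loop exit does not flip the prediction for the next run of the loop.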
27. Pipeline architecture - hazard cases
- Solving hazard cases
  - Detect hazard cases and introduce stall phases
  - Rearrange instructions
    - rearrange instructions in order to reduce the dependences between consecutive instructions
    - Methods
      - Static scheduling: made before program execution; optimization made by the compiler or the user
      - Dynamic scheduling: made during program execution; optimization made by the processor (out-of-order execution)
  - Branch prediction techniques
28. Static vs. dynamic scheduling
- Static scheduling
  - The optimal order of instructions is established by the compiler, based on information about the structure of the pipeline
  - Advantage: it is done once and benefits every execution of the code
  - Drawback: the compiler must know the structure of the hardware (e.g. pipeline stages, phases of every instruction); the compiler must be changed when the processor version changes
- Dynamic scheduling
  - The hardware has the capacity to reorder instructions to avoid or reduce the effect of hazard cases
  - Advantages: the processor knows its own structure best; the optimization can be better matched to the hardware; some dependences are revealed only at run time
  - Drawbacks: reordering decisions are made every time the code is executed; more complex hardware is needed
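
The effect of static scheduling can be shown by counting stalls before and after reordering. The model below is deliberately crude and is an assumption, not the course's pipeline: one stall cycle is charged whenever an instruction reads the register written by the instruction immediately before it. Instructions are (dest, src1, src2) tuples and all names are hypothetical.

```python
def raw_stalls(program, penalty=1):
    """Count stall cycles under the assumed model: `penalty` cycles whenever
    an instruction reads the register written by its direct predecessor."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_dest = prev[0]
        cur_sources = cur[1:]
        if prev_dest in cur_sources:
            stalls += penalty
    return stalls


original = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r1", "r5"),   # needs r1 right away
    ("r6", "r7", "r8"),   # independent of the two above
    ("r9", "r6", "r2"),   # needs r6 right away
]
# Reordered as a compiler might do it: the independent instruction is moved
# between the first dependent pair, which also separates the second pair.
scheduled = [original[0], original[2], original[1], original[3]]
print(raw_stalls(original), raw_stalls(scheduled))  # 2 0
```

The reordering is legal because the moved instruction neither reads nor writes any register used by the instruction it overtakes. A dynamic scheduler would make the same kind of decision in hardware, at run time.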