Title: CoE EE 00142 Computer Organization Set 8 Pipelining and CISC/RISC Machines
1CoE - EE 00142 Computer Organization Set 8 -
Pipelining and CISC/RISC Machines
2Instruction Pipelines
- Normally only one control unit
- Instructions are executed serially, one at a time
- Consider the four steps IF - DOF - EX - WB
- where IF - Instruction Fetch
- DOF - Decode and Operand Fetch
- EX - Execute
- WB - Write Back Results
3Pipelined Computer
- Consider multiple control units
- Same instruction memory
- Same data memory
- Concept is
- as you execute one instruction
- you fetch an additional instruction
- and you write back the results of a third
4Normal Computer Operation Analogy
Graphics from Logic and Computer Design
Fundamentals, Mano and Kime, Prentice Hall
5Normal Computer
Stages: IF - Instruction fetch; DOF - Decode and operand fetch; EX - Execution; WB - Write back
6Laundry Analogy
- Place one load of clothes in the washer
- When finished move clothes to dryer
- When dryer finished fold clothes
- When finished folding put away
7Laundry Analogy
Let each step take 30 min
Fig 6.1
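The arithmetic behind the analogy can be sketched as follows. The four steps and the 30 minutes per step come from the slides; the number of loads (4) and the function names are illustrative assumptions. Done one load at a time, each load waits until the previous one is put away; pipelined, a new load enters the washer as soon as the washer is free.

```python
STEP_MINUTES = 30                      # from the slide: each step takes 30 min
STEPS = ["wash", "dry", "fold", "put away"]

def sequential_minutes(loads, steps=len(STEPS), step_minutes=STEP_MINUTES):
    """Each load runs through every step before the next load starts."""
    return loads * steps * step_minutes

def pipelined_minutes(loads, steps=len(STEPS), step_minutes=STEP_MINUTES):
    """The first load takes `steps` slots; each later load finishes one slot after."""
    return (steps + loads - 1) * step_minutes

loads = 4                              # assumed number of loads (not on the slide)
print(sequential_minutes(loads))       # 480
print(pipelined_minutes(loads))        # 210
```

Under these assumptions the pipelined laundry finishes in 210 minutes instead of 480, even though no single load gets done any faster.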
8Pipeline Execution
Stages: IF - Instruction fetch; DOF - Decode and operand fetch; EX - Execution; WB - Write back
9Single-Cycle vs. Pipeline
Consider a machine with the instructions lw, sw,
the R-format instructions (add, sub, and, or, slt), and beq
Then the single-cycle clock period must be long
enough to accommodate the slowest instruction
8 ns
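A minimal sketch of where a figure like 8 ns can come from. The unit delays below (2 ns for memory and ALU operations, 1 ns for register file access) are assumptions chosen to be consistent with the 8 ns on the slide; they are not stated in the slides. The single-cycle clock is set by the slowest instruction, while a pipelined clock is set by the slowest stage.

```python
# Assumed unit delays in ns (not given on the slide).
DELAY = {"IF": 2, "REG_READ": 1, "ALU": 2, "MEM": 2, "REG_WRITE": 1}

# Functional units each instruction class passes through.
USES = {
    "lw":       ["IF", "REG_READ", "ALU", "MEM", "REG_WRITE"],
    "sw":       ["IF", "REG_READ", "ALU", "MEM"],
    "R-format": ["IF", "REG_READ", "ALU", "REG_WRITE"],
    "beq":      ["IF", "REG_READ", "ALU"],
}

def instruction_time(name):
    """Total delay of one instruction when executed in a single cycle."""
    return sum(DELAY[unit] for unit in USES[name])

single_cycle_clock = max(instruction_time(name) for name in USES)
pipelined_clock = max(DELAY.values())

for name in USES:
    print(name, instruction_time(name), "ns")
print("single-cycle clock:", single_cycle_clock, "ns")   # 8 ns (set by lw)
print("pipelined clock:", pipelined_clock, "ns")         # 2 ns (slowest stage)
```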
10Comparison
Fig 6.3
11Example
Consider the simple program below, where LDI R1, 1
means load register 1 with the number 1, etc.
- Without pipelining it takes 7 x 4 = 28 cycles to
complete (7 instructions, 4 steps each)
- LDI R1, 1
- LDI R2, 2
- LDI R3, 3
- LDI R4, 4
- LDI R5, 5
- LDI R6, 6
- LDI R7, 7
Note - These instructions are independent of
each other!!!
12Pipeline Execution
Theoretically 2.8 times faster: the 7 instructions finish in 4 + 6 = 10 cycles instead of 28
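The 2.8 figure follows from counting cycles, sketched below. The function names are illustrative; the model assumes an ideal 4-stage pipeline with no stalls, which holds here because the seven LDI instructions are independent.

```python
def serial_cycles(n_instructions, n_stages):
    """Every instruction occupies the machine for all of its stages."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """Fill the pipeline once, then complete one instruction per cycle."""
    return n_stages + (n_instructions - 1)

n, k = 7, 4                                  # 7 LDI instructions, IF-DOF-EX-WB
speedup = serial_cycles(n, k) / pipelined_cycles(n, k)
print(serial_cycles(n, k), pipelined_cycles(n, k), speedup)   # 28 10 2.8
```

For longer programs the fill time matters less, so the speedup approaches the number of stages (4) but never reaches it.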
13Complex Example
Consider the more complex program below, where ADD R1,
R0, R2 means add register 2 to register 0 and put
the result in register 1
- 1 ADD R1, R0, R1
- 2 ADD R3, R2, R3
- 3 ADD R5, R4, R5
- 4 ADD R7, R6, R7
- 5 ADD R3, R1, R3
- 6 NOP
- 7 ADD R7, R5, R7
- 8 NOP
- 9 NOP
- 10 ADD R7, R3, R7
- Note the NOP instructions, which mean no
operation; they delay an instruction until the results it needs have been written back
Note - These instructions do depend on each
other!!!
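The NOP placement above can be reproduced with a short sketch. It assumes a rule inferred from the listing rather than stated on the slide: an instruction reads its operands in DOF, so it must sit at least 3 slots after the instruction that writes them (the write back must finish in an earlier cycle than the read). The function name and the tuple format (destination, source 1, source 2) are illustrative.

```python
MIN_DISTANCE = 3   # assumed: a reader must be >= 3 slots after the writer

def insert_nops(program, min_distance=MIN_DISTANCE):
    """program: list of (dest, src1, src2) register names for ADD instructions."""
    scheduled = []
    last_write = {}                      # register -> slot (1-based) of last writer
    for dest, src1, src2 in program:
        earliest = 1
        for src in (src1, src2):
            if src in last_write:
                earliest = max(earliest, last_write[src] + min_distance)
        while len(scheduled) + 1 < earliest:
            scheduled.append("NOP")      # pad until the operands are written back
        scheduled.append(f"ADD {dest}, {src1}, {src2}")
        last_write[dest] = len(scheduled)
    return scheduled

adds = [("R1", "R0", "R1"), ("R3", "R2", "R3"), ("R5", "R4", "R5"),
        ("R7", "R6", "R7"), ("R3", "R1", "R3"), ("R7", "R5", "R7"),
        ("R7", "R3", "R7")]

for slot, text in enumerate(insert_nops(adds), start=1):
    print(slot, text)
```

Run on the seven ADD instructions, this yields the same 10-slot program as the slide, with NOPs in slots 6, 8, and 9.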
14Complex Example Program
- 1 ADD R1, R0, R1
- 2 ADD R3, R2, R3
- 3 ADD R5, R4, R5
- 4 ADD R7, R6, R7
- 5 ADD R3, R1, R3
- 6 NOP
- 7 ADD R7, R5, R7
- 8 NOP
- 9 NOP
- 10 ADD R7, R3, R7
Stages: IF - Instruction fetch; DOF - Decode and operand fetch; EX - Execution; WB - Write back
15Problems with Pipelines
- All units must take exactly the same amount of
time
- If instruction 1 is a JUMP
- Then instruction 2, already fetched, is the wrong one
- Instructions that need data sometimes wait
16Pipeline Issues - Hazards
- Sometimes the pipeline must stop (STALL)
- Sometimes the pipeline must be emptied (FLUSHED)
- Generally it works, and it is FAST
17Structural Hazards
- Hardware cannot support the combination of
instructions that we want to execute in the same
clock cycle
- Examples
- Washer-dryer combination
- Fetching data from the same memory as
instructions at the same time
18Control Hazards
- Execution of one instruction requires a decision
based on a previous instruction
- Examples
- We need to see the first wash-dry results before
we do any further wash
- We need the results of an arithmetic operation
before we can make the decision
19Solutions for Hazards
- Stall: Wait until everything is known and you
can move on
- Predict: Guess the outcome and move on; however,
if incorrect you must do over
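The trade-off between the two solutions can be sketched with a simple average-cost model. Every number here is an illustrative assumption, not a measurement from the slides: 20% of instructions are branches, a stall or a wrong guess costs 1 cycle, and the guess is wrong 40% of the time.

```python
def average_cpi(branch_fraction, penalty_cycles, fraction_penalized):
    """Base cost of 1 cycle per instruction plus the average branch penalty."""
    return 1.0 + branch_fraction * fraction_penalized * penalty_cycles

BRANCH_FRACTION = 0.2    # assumed share of instructions that are branches
PENALTY = 1              # assumed cycles lost per stall or wrong guess
WRONG_GUESS_RATE = 0.4   # assumed share of predictions that are wrong

stall_cpi = average_cpi(BRANCH_FRACTION, PENALTY, 1.0)               # every branch pays
predict_cpi = average_cpi(BRANCH_FRACTION, PENALTY, WRONG_GUESS_RATE)

print(round(stall_cpi, 2), round(predict_cpi, 2))
```

Stalling pays the penalty on every branch; predicting pays it only when the guess is wrong, so prediction wins whenever the guess is right at least some of the time.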
20Data Hazard
- The result of one instruction is used as the
input to a second instruction
21Forwarding or Bypassing
Use a result as soon as it is available, as opposed to waiting
until the entire instruction is completed
22Single-cycle Datapath
Fig 6.10
23Pipeline Diagrams
24Example Code
sub $2, $1, $3      # Reg $2 written
and $12, $2, $5     # Depends on correct $2
or  $13, $6, $2     # Depends on correct $2
add $14, $2, $2     # Depends on correct $2
sw  $15, 100($2)    # Base address $2
25Pipelined Dependencies
Fig 6.36
26Use of nops
sub $2, $1, $3      # Reg $2 written
nop
nop
and $12, $2, $5     # Depends on correct $2
or  $13, $6, $2     # Depends on correct $2
add $14, $2, $2     # Depends on correct $2
sw  $15, 100($2)    # Base address $2
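Why two nops are enough can be checked with a small dependence scanner. It assumes MIPS-style `$n` register names and a pipeline in which a register written by one instruction can be read correctly by an instruction 3 or more slots later; that distance is an assumption chosen to match the two nops used above. The helper names are illustrative.

```python
import re

MIN_DISTANCE = 3   # assumed: a reader 3 or more slots after the writer is safe

def parse(line):
    """Return (destination, sources) for one instruction; sw only reads registers."""
    parts = line.split(None, 1)
    if len(parts) == 1:                  # nop has no operands
        return None, []
    regs = re.findall(r"\$\d+", parts[1])
    if parts[0] == "sw":
        return None, regs
    return regs[0], regs[1:]

def hazards(program, min_distance=MIN_DISTANCE):
    """List (writer, reader, register, distance) for reads that come too early."""
    last_writer = {}
    found = []
    for i, line in enumerate(program):
        dest, sources = parse(line)
        for reg in dict.fromkeys(sources):        # each source register once
            if reg in last_writer and i - last_writer[reg] < min_distance:
                found.append((program[last_writer[reg]], line, reg, i - last_writer[reg]))
        if dest is not None:
            last_writer[dest] = i
    return found

code = ["sub $2, $1, $3", "and $12, $2, $5", "or $13, $6, $2",
        "add $14, $2, $2", "sw $15, 100($2)"]
with_nops = code[:1] + ["nop", "nop"] + code[1:]

print(hazards(code))        # and (distance 1) and or (distance 2) read $2 too early
print(hazards(with_nops))   # []
```

Under this assumption only the and and or instructions are affected; forwarding removes the need for the nops by supplying the sub result directly to them.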
27Control or Branch Hazards
- Assume branch NOT taken
- Continue sequential execution
- Discard instructions if branch is taken
- Dynamic branch prediction
- More general form of above
- Much more complex hardware
- Branch prediction buffer
- Branch history table
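A branch history table can be sketched as a small array of counters indexed by the branch address. The slides do not say which scheme is used; the 2-bit saturating counter below is one common choice and is an assumption here, as are the table size, the indexing, the class and function names, and the example address.

```python
class BranchHistoryTable:
    """2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries          # start at weakly not taken

    def _index(self, branch_address):
        return (branch_address >> 2) % self.entries   # low bits of word address

    def predict(self, branch_address):
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

def count_mispredictions(table, branch_address, outcomes):
    misses = 0
    for taken in outcomes:
        if table.predict(branch_address) != taken:
            misses += 1                        # wrong guess: pipeline is flushed
        table.update(branch_address, taken)
    return misses

# A loop branch taken 9 times and then not taken, with the loop run twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(BranchHistoryTable(), 0x400, outcomes))   # 3
```

Always assuming the branch is not taken would be wrong 18 times on the same sequence, which is the motivation for spending hardware on prediction.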
28Pipelined Branch Instructions
Fig 6.50
29Modern Machines
- Both Pentium Pro and PowerPC have dynamically
scheduled pipelines
- Both use branch prediction
- Contain rename buffers or registers to hold
temporary results
- Significant real estate is used for this purpose
30Pentium Pro Silicon Area
Contains about 5.5 million transistors
Fig 1.18
31RISC vs. CISC Machines
- Two philosophies for computer designs
- RISC - Reduced Instruction Set Computer
- CISC - Complex Instruction Set Computer
32CISC Machines
- Large number of different instructions
- Typically complex instructions
- Microprogrammed control units
- Complex addressing schemes
- Many instructions seldom used
- Complex decode
33CISC Philosophy
Hardware is always faster than software;
therefore build a powerful instruction set with
many addressing modes, so that short assembly
language programs can do a lot.
34CISC Example
- Consider the VAX instruction
- ACBF - add, compare, and branch on
floating point numbers
- This is one instruction (opcode 4F)
- The VAX also has 24 addressing modes
35Instruction Formats
Simple Format
Complex Format
36CISC Machines
- CISCs also typically have very complex
microprogrammed control
- Typically each instruction is slow
- even simple cases like LDA 0
- But programs use fewer assembly language instructions
37RISC Machines
- Small number of different instructions
- Typically simple instructions
- Usually register to register type instructions
- Instruction format normally fixed length - one
word
- Simple decode
38RISC Philosophy
Almost no one uses complex assembly language
instructions, and people mostly use compilers
which never use complex instructions
39RISC Machines
- Therefore
- Design with commonly used instructions
- Use few addressing modes
- Simple instructions
- Same length
40RISC Machines
- Typically hardwired control - faster
- Simple decode - faster
- Typically 1 instruction per cycle
- More registers
- Easier to write compilers
- Takes more assembly instructions to accomplish
task
41RISC Machines
- Almost all new instruction sets since 1982 are RISC
- Examples
- MIPS, Sun SPARC, HP (PA-RISC), PowerPC, DEC Alpha
42End of Set