RISC Pipelining presentation

1
RISC Pipelining, CS 147, Spring 2011, Kui Cheung
2
RISC Pipelining

Classic five-stage instruction pipeline
Fetch - fetch instruction from memory
Decode - determine what action is required
Execute - execute instruction
Memory - data cache access
Writeback - write result to register

3
ARM9
Nintendo DS: 5-stage pipeline
Using a basketball team as an analogy, we can assign the following positions to the different stages:
1) The coach gives a play to the point guard.
2) The point guard passes the ball to the right player to execute the play.
3) The small forward (SF) or power forward (PF) continues setting up the play with some fancy moves, then passes the ball to the center.
4) The center continues the setup and passes the ball to the shooting guard (SG) for a clean shot.
5) The SG takes the shot.
[Figure labels: Coach, Point Guard, Small Forward, Power Forward, Center, Shooting Guard]
4
ARM9
Nintendo DS: 5-stage pipeline
1) Fetch the instruction from memory into the instruction register (IR)
2) Determine what action to take
3) Execute the instruction
4) Access the cache if needed
5) Write the result to a register
Example: MOV Reg1, Mem1
1) Fetch the instruction (MOV Reg1, Mem1)
2) Decide that it is a move from memory to a register
3) Obtain the address of the memory location to be moved
4) Fetch the data from memory
5) Write the data to Reg1
5
RISC Pipelining
Instruction  1   2   3   4   5   6   7   8   9
1            FI  DI  EX  MEM WB
2                FI  DI  EX  MEM WB
3                    FI  DI  EX  MEM WB
4                        FI  DI  EX  MEM WB
5                            FI  DI  EX  MEM WB

FI - fetch instruction
DI - decode instruction
EX - execute instruction
MEM - data cache access
WB - write back
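
The staggered diagram on this slide can be reproduced with a few lines of code. The following is a minimal Python sketch, not part of the original slides. The names pipeline_rows and total_cycles are made up for illustration, and the sketch assumes an ideal pipeline: each instruction enters one cycle after the previous one and nothing stalls.

```python
STAGES = ["FI", "DI", "EX", "MEM", "WB"]


def pipeline_rows(n_instructions, stages=STAGES):
    """Return one text row per instruction.

    Instruction i (counting from 0) is shifted right by i cycle columns,
    because it enters the pipeline i cycles after the first instruction.
    """
    width = max(len(stage) for stage in stages) + 1
    rows = []
    for i in range(n_instructions):
        cells = [""] * i + list(stages)
        rows.append("".join(cell.ljust(width) for cell in cells).rstrip())
    return rows


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to finish every instruction when nothing stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one adds a cycle.
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    for number, row in enumerate(pipeline_rows(5), start=1):
        print(number, row)
    print("total cycles:", total_cycles(5))
```

For the five instructions on this slide, total_cycles(5) gives 9, which matches the nine cycle columns in the table. Without pipelining, the same five instructions would need 5 x 5 = 25 cycles.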

6
Pipeline Delay
(a) No data load delay in the pipeline
MOV Reg1, Mem1   FI  DI  EX  MEM WB
MOV Reg1, Reg2       FI  DI  EX  MEM WB
MOV Mem2, Reg1           FI  DI  EX  MEM WB

1) move data from Mem1 to Reg1
2) move data from Reg2 to Reg1
3) move data from Reg1 to Mem2

7
Pipeline Delay
(b) Data dependency delay
MOV Reg1,Mem1     FI  DI  EX  MEM WB        <- WB writes data from Mem1 into Reg1
MOV Reg2,(Reg1)       FI  DI  EX  MEM WB    <- must wait for data to be loaded into Reg1
With a stall (bubble), shown as --:
MOV Reg1,Mem1     FI  DI  EX  MEM WB
MOV Reg2,(Reg1)       --  FI  DI  EX  MEM WB
1) move data from Mem1 to Reg1
2) move data from Reg1 to Reg2
8
Pipeline Delay
Add a NOP (no operation) to fill the gap
MOV Reg1,Mem1     FI  DI  EX  MEM WB
NOP                   FI  DI  EX  MEM WB
MOV Reg2,(Reg1)           FI  DI  EX  MEM WB
1) move data from Mem1 to Reg1
2) no operation
3) move data from Reg1 to Reg2
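
The NOP insertion described on the Pipeline Delay slides can be sketched as a small pass over an instruction list. This is an illustrative Python sketch, not part of the original slides. The Instr record and the fill_load_delays function are hypothetical names, and the rule used (insert one NOP when an instruction reads the register that the instruction immediately before it loads from memory) is an assumption inferred from cases (a) and (b).

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    text: str
    writes: Optional[str]      # register written, if any
    reads: Tuple[str, ...]     # registers read
    is_load: bool = False      # True when the value comes from memory


NOP = Instr("NOP", None, ())


def fill_load_delays(program: List[Instr]) -> List[Instr]:
    """Insert a NOP after a load when the next instruction reads its result."""
    result: List[Instr] = []
    for instr in program:
        previous = result[-1] if result else None
        if previous is not None and previous.is_load and previous.writes in instr.reads:
            result.append(NOP)  # fill the one-cycle gap
        result.append(instr)
    return result


# Case (b): the second instruction needs Reg1 right after it is loaded.
DEPENDENT = [
    Instr("MOV Reg1,Mem1", "Reg1", (), is_load=True),
    Instr("MOV Reg2,(Reg1)", "Reg2", ("Reg1",)),
]

# Case (a): no instruction reads a register loaded by the one just before it.
INDEPENDENT = [
    Instr("MOV Reg1, Mem1", "Reg1", (), is_load=True),
    Instr("MOV Reg1, Reg2", "Reg1", ("Reg2",)),
    Instr("MOV Mem2, Reg1", None, ("Reg1",)),
]

if __name__ == "__main__":
    for instr in fill_load_delays(DEPENDENT):
        print(instr.text)
```

Run on case (b), the sketch yields the three-instruction sequence with the NOP in the middle; run on case (a), it leaves the program unchanged.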
9
(c) Control dependency delay
101 ADD Reg3, Reg2, Reg1   FI  DI  EX  MEM WB
102 NOP                        FI  DI  EX  MEM WB    (data dependency delay)
103 BEQ Reg3, Reg4, 106            FI  DI  EX  MEM WB
104 MOV Mem1, Reg3
105 ADD Reg4, Reg1, Reg2
106 MOV Mem1, Reg4         (fetched only after 103 decides)  FI  DI  EX
By the time 103 executes, Reg3 = Reg2 + Reg1, so line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
The pipeline waits for 103 to decide between going on to 104 and jumping to 106.
Here Reg3 = Reg4, so execution jumps to 106.
101 add Reg2 to Reg1 and put the result in Reg3
102 no operation
103 if Reg3 = Reg4, jump to 106, else continue to 104
104 move Reg3 to Mem1
105 add Reg2 to Reg1 and put the result in Reg4
106 move Reg4 to Mem1
10
(c) Control dependency delay
101 ADD Reg3, Reg2, Reg1   FI  DI  EX  MEM WB
102 NOP                        FI  DI  EX  MEM WB    (data dependency delay)
103 BEQ Reg3, Reg4, 106            FI  DI  EX  MEM WB
104 MOV Mem1, Reg3
105 ADD Reg4, Reg1, Reg2
106 MOV Mem1, Reg4                     FI  DI  EX  MEM WB
By the time 103 executes, Reg3 = Reg2 + Reg1, so line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
Guess that the branch will happen: 106 is fetched right after 103.
Here Reg3 = Reg4, so execution jumps to 106 and no time is wasted.
101 add Reg2 to Reg1 and put the result in Reg3
102 no operation
103 if Reg3 = Reg4, jump to 106, else continue to 104
104 move Reg3 to Mem1
105 add Reg2 to Reg1 and put the result in Reg4
106 move Reg4 to Mem1
11
(c) Control dependency delay
101 ADD Reg3, Reg2, Reg1   FI  DI  EX  MEM WB
102 NOP                        FI  DI  EX  MEM WB    (data dependency delay)
103 BEQ Reg3, Reg4, 106            FI  DI  EX  MEM WB
104 MOV Mem1, Reg3
105 ADD Reg4, Reg1, Reg2
106 MOV Mem1, Reg4
107 MOV Reg2, Mem2
[Diagram: the instructions fetched after 103 on the guess that the branch is taken (106, 107, ...) get only as far as FI, DI, or EX before they are cleared; 104 is fetched next.]
By the time 103 executes, Reg3 = Reg2 + Reg1, so line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
Here Reg3 is not equal to Reg4, so the pipeline is cleared and 104 is fetched next.
A wrong guess can lead to wasted time.
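
The three control dependency cases (wait for the branch, guess right, guess wrong) can be compared with a simple cycle count. This is a rough Python sketch, not part of the original slides. The function name wasted_cycles and the strategy labels are hypothetical, and the penalty (cycles lost each time the pipeline waits or is cleared) is an assumed parameter because the slides do not state a figure.

```python
def wasted_cycles(outcomes, penalty, strategy):
    """Count cycles lost to branches.

    outcomes: one bool per executed branch, True when the branch is taken.
    penalty:  assumed number of cycles lost per wait or per wrong guess.
    strategy: "wait" stalls on every branch until it is decided;
              "guess_taken" always fetches from the branch target and
              loses time only when the branch is not taken.
    """
    wasted = 0
    for taken in outcomes:
        if strategy == "wait":
            wasted += penalty
        elif strategy == "guess_taken":
            if not taken:
                wasted += penalty  # wrong guess: clear the pipeline
        else:
            raise ValueError("unknown strategy: " + strategy)
    return wasted


if __name__ == "__main__":
    branches = [True, True, False]  # made-up sequence of branch outcomes
    for name in ("wait", "guess_taken"):
        print(name, wasted_cycles(branches, penalty=3, strategy=name))
```

Under these assumptions, guessing never costs more than waiting, and it costs nothing at all when every guess is right.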
12
Pure RISC Pipeline

Simple, primitive instructions and addressing modes
Instructions execute in one clock cycle
Uniform-length instructions and a fixed instruction format
Instructions interface with memory via fixed mechanisms (load/store)
Pipelining
Instruction set is orthogonal (little overlap in instruction functionality)
Hardwired control
Complexity pushed to the compiler

13
Pure RISC Pipeline

Register-to-register cycle
1) F - instruction fetch
2) E - execute: performs an ALU operation with register input and output
Load and store cycle
1) F - instruction fetch
2) E - execute: calculates the memory address
3) W - memory: register-to-memory or memory-to-register operation

14
Pure RISC Pipeline
a) Traditional pipeline
Instruction           1  2  3  4  5  6  7
100 MOVE Reg1, Mem1   F  E  W
101 ADD 1, Reg1          F  E
102 JUMP 105                F  E
103 ADD Reg1, Reg2             F
105 MOVE Mem2, Reg1               F  E  W
100 move Mem1 to Reg1
101 add 1 to Reg1
102 jump to 105
103 add Reg1 to Reg2
105 move Reg1 to Mem2
When the jump executes, 103 is cleared from the pipeline and 105 is fetched.
F - fetch, E - execute, W - write back
15
Pure RISC Pipeline
b) RISC pipeline with inserted NOP
Instruction           1  2  3  4  5  6  7
100 MOVE Reg1, Mem1   F  E  W
101 ADD 1, Reg1          F  E
102 JUMP 105                F  E
103 NOP                        F  E
105 MOVE Mem2, Reg1               F  E  W
100 move Mem1 to Reg1
101 add 1 to Reg1
102 jump to 105
103 no operation
105 move Reg1 to Mem2
A NOP is added after the jump so that no special circuitry is needed to clear the pipeline.
F - fetch, E - execute, W - write back
16
Pure RISC Pipeline
c) Reversed instructions
Instruction           1  2  3  4  5  6  7
100 MOVE Reg1, Mem1   F  E  W
101 JUMP 105             F  E
102 ADD 1, Reg1             F  E
105 MOVE Mem2, Reg1            F  E  W
Delayed branch: when a branch occurs, its effect is delayed until the next instruction has been fetched. For example, 102 is fetched before JUMP 105 executes, so 102 can execute at the same time 105 is fetched.
100 move Mem1 to Reg1
101 jump to 105
102 add 1 to Reg1
105 move Reg1 to Mem2
F - fetch, E - execute, W - write back
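
The three Pure RISC Pipeline tables can be checked by computing when the last stage finishes. This is a small Python sketch, not part of the original slides. The function name finish_cycle is hypothetical, and the sketch assumes what the tables show: each instruction starts one cycle after the previous one and occupies one cycle per listed stage.

```python
def finish_cycle(stage_counts):
    """Return the cycle in which the last stage of any instruction completes.

    stage_counts[i] is the number of stages instruction i occupies
    (3 for F E W, 2 for F E, 1 for an instruction cleared after F).
    Instruction i starts in cycle i + 1.
    """
    return max(start + count - 1
               for start, count in enumerate(stage_counts, start=1))


# Stage counts read off the three tables.
TRADITIONAL = [3, 2, 2, 1, 3]   # 100, 101, 102 JUMP, 103 (cleared), 105
INSERTED_NOP = [3, 2, 2, 2, 3]  # 100, 101, 102 JUMP, 103 NOP, 105
REVERSED = [3, 2, 2, 3]         # 100, 101 JUMP, 102 ADD, 105

if __name__ == "__main__":
    for name, counts in (("traditional", TRADITIONAL),
                         ("inserted NOP", INSERTED_NOP),
                         ("reversed", REVERSED)):
        print(name, finish_cycle(counts))
```

With these counts, the traditional and inserted-NOP versions both finish in cycle 7, while the reversed version finishes in cycle 6, because the slot after the jump does useful work instead of being cleared or filled with a NOP.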
17
Superpipeline
A B C D E F G H I J K L
  A B C D E F G H I J K
    A B C D E F G H I J
      A B C D E F G H I
        A B C D E F G H
          A B C D E F G
            A B C D E F
              A B C D E
                A B C D
                  A B C
                    A B
                      A
                        A B C D E F G H
The branch in the first row executes in its last stage, and the pipeline is cleared; the last row starts over from the branch target.
In theory, more and shorter stages allow more instructions to be processed at the same time. But a branch can lead to many wasted cycles.
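
The trade-off on the Superpipeline slide can be put into numbers with a rough model. This is a Python sketch, not part of the original slides. The function names and the parameters branch_fraction (share of instructions that are branches) and flush_fraction (share of branches that clear the pipeline) are assumptions made for illustration, and the example values are invented.

```python
def cleared_instructions(resolve_stage):
    """Instructions fetched behind a branch before it resolves.

    If the branch resolves in stage number resolve_stage (1-based) and one
    instruction is fetched per cycle, resolve_stage - 1 younger instructions
    are already in the pipeline and must be cleared.
    """
    return resolve_stage - 1


def average_cpi(resolve_stage, branch_fraction, flush_fraction):
    """Average cycles per instruction under this simple model.

    The ideal pipeline completes one instruction per cycle; each branch that
    clears the pipeline adds one wasted cycle per cleared instruction.
    """
    penalty = cleared_instructions(resolve_stage)
    return 1.0 + branch_fraction * flush_fraction * penalty


if __name__ == "__main__":
    # Made-up workload: 20% branches, 5% of them clear the pipeline.
    for depth in (5, 12):
        print(depth, average_cpi(depth, branch_fraction=0.2, flush_fraction=0.05))
```

For the 12-stage diagram on this slide, cleared_instructions(12) is 11, which matches the eleven partially completed rows. The deeper the pipeline, the more each cleared branch costs, which is why deep pipelines rely on accurate branch prediction.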
18
ARM11 Pipeline
ARM11 (iPhone 3G): 8-stage pipeline
[Figure: pipeline stages grouped as Fetch Instruction, Decode, Execute, Memory, Writeback]
19
RISC Pipelining
ARM Cortex-A8 (iPhone 3GS, Samsung Galaxy S): 13-stage pipeline
Fetch Instruction (2 stages)
Decode (5 stages)
Execute, Memory, Writeback (6 stages)
Dynamic branch prediction: 95% accuracy
20
Core i7 (Nehalem) Superpipeline
14 stages
[Figure: pipeline stages grouped as Fetch, Decode, Execute, Memory, Writeback]
21
Reference

http://www.jp.arm.com/event/pdf/forum2008/t1-1.pdf
http://www-cs-faculty.stanford.edu/eroberts/courses/soco/projects/2000-01/risc/pipelining/index.html
http://www.bit-tech.net/hardware/cpus/2008/11/03/intel-core-i7-nehalem-architecture-dive/5
http://qu.academia.edu/AwsYousif/Papers/120709/A_New_Trend_for_CISC_and_RISC_Architectures
Course textbook: Computer Organization and Architecture, 7th edition, William Stallings
