CS 161 Computer Architecture: Introduction to Advanced Architectures, Lecture 13

Slides: 26
Provided by: davep173
Learn more at: http://www.cs.ucr.edu

1
CS 161 Computer Architecture: Introduction to
Advanced Architectures, Lecture 13
  • Instructor: L.N. Bhuyan
  • www.cs.ucr.edu/bhuyan
  • Adapted from notes by Dave Patterson
    (http.cs.berkeley.edu/patterson)

2
Stages of Execution in Pipelined MIPS
  • 5-stage instruction pipeline
  • 1) I-fetch: Fetch instruction, increment PC
  • 2) Decode: Decode instruction, read registers
  • 3) Execute: Mem-reference: calculate address;
    R-format: perform ALU operation
  • 4) Memory: Load: read data from data memory;
    Store: write data to data memory
  • 5) Write Back: Write data to register

3
Pipelined Execution Representation
[Figure: four instructions in program-flow order, each passing through IFtch, Dcd, Exec, Mem, WB; time on the horizontal axis]
  • To simplify pipeline, every instruction takes
    same number of steps, called stages
  • One clock cycle per stage
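
The overlap described above can be sketched in a few lines of Python. This is only an illustration under the assumption of an ideal pipeline with no stalls; the function names are hypothetical and not from the lecture.

```python
# Stage names as they appear on the slide.
STAGES = ["IFtch", "Dcd", "Exec", "Mem", "WB"]


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish when a new instruction enters the pipe every cycle."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one adds one.
    return num_stages + num_instructions - 1


def pipeline_chart(num_instructions):
    """One row per instruction; column c holds the stage occupied in cycle c."""
    total = pipelined_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        row = [""] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in pipeline_chart(4):
        print(" ".join(f"{cell:<5}" for cell in row))
```

With four instructions and five stages the chart spans 5 + 4 - 1 = 8 cycles, versus 20 cycles if each instruction had to finish before the next one started.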

4
Review Single-cycle Datapath for MIPS
[Figure: single-cycle datapath divided into five stages, with Instruction Memory (Imem) and Data Memory (Dmem)]
  • Use datapath figure to represent pipeline

5
Graphical Pipeline Representation
[Figure: instructions in order (Load, Add, Store, Sub, Or) against time in clock cycles; each instruction uses IM, Reg, DM, Reg in successive cycles. For Reg, right half highlighted means read, left half means write.]
6
Required Changes to Datapath
  • Introduce registers to separate the 5 stages by
    putting IF/ID, ID/EX, EX/MEM, and MEM/WB
    registers in the datapath.
  • The next PC value is computed in the 3rd stage, but
    we need to fetch the next instruction in the next
    cycle => move the PCSrc mux to the 1st stage.
  • The branch address is computed in the 3rd stage.
    With pipelining, the PC value has changed by then!
    Must carry the PC value along with the instruction.
    Width of IF/ID register = (IR) + (PC) = 64 bits.
  • For a lw instruction, we need the write register
    address at stage 5. But the IR is now occupied by
    another instruction! So we must carry the IR
    destination field along as we move through the
    stages. See the connection in the figure. Width of
    ID/EX register = (Reg1) + (Reg2) + (offset) + (PC)
    + destination register = 133 bits.
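
The width arithmetic above can be checked with a short Python sketch. The IF/ID and ID/EX fields come from this slide; the field breakdowns for EX/MEM and MEM/WB are assumptions (they follow the usual textbook datapath) chosen so that the totals match the 102 and 69 bits labeled in the datapath figure. The dictionary and function names are hypothetical.

```python
WORD = 32      # width of IR, PC, register values, and the sign-extended offset
REG_ADDR = 5   # width of a MIPS register number

PIPELINE_REGISTER_FIELDS = {
    # From the slide: instruction register plus PC.
    "IF/ID": {"IR": WORD, "PC": WORD},
    # From the slide: two register values, offset, PC, destination register.
    "ID/EX": {"Reg1": WORD, "Reg2": WORD, "offset": WORD, "PC": WORD,
              "dest": REG_ADDR},
    # Assumed breakdown (not spelled out in the slides).
    "EX/MEM": {"branch_target": WORD, "zero": 1, "alu_result": WORD,
               "Reg2": WORD, "dest": REG_ADDR},
    # Assumed breakdown (not spelled out in the slides).
    "MEM/WB": {"mem_data": WORD, "alu_result": WORD, "dest": REG_ADDR},
}


def width(register_name):
    """Total data width in bits of one pipeline register (control bits excluded)."""
    return sum(PIPELINE_REGISTER_FIELDS[register_name].values())


if __name__ == "__main__":
    for name in PIPELINE_REGISTER_FIELDS:
        print(name, width(name), "bits")
```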

7
Pipelined Datapath (with Pipeline Regs)(6.2)
[Figure: pipelined datapath divided into Fetch, Decode, Execute, Memory, and Write Back stages, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Components shown: PC, adders, Imem, Regs, sign extend (16 to 32), shift left 2, ALU, Dmem, and muxes. Register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits.]
8
Pipelined Control (6.3)
  • Start with single-cycle controller
  • Group control lines by pipeline stage needed
  • Extend pipeline registers with control bits

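[Figure: the control unit decodes the instruction and produces EX, Mem, and WB control groups, which are carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers. Mem signals: Branch, MemRead, MemWrite. WB signals: MemToReg, RegWrite.]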
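
As a rough sketch of grouping control lines by stage, the Python below stores the control bits per instruction class and shows which groups each pipeline register still has to carry. The Mem and WB signal names are from the slide; the EX signal names (RegDst, ALUOp, ALUSrc) and all bit values are assumptions taken from the standard single-cycle MIPS controller, and the helper names are hypothetical.

```python
# None marks a don't-care value.
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemToReg": 0}},
    "lw":       {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemToReg": 1}},
    "sw":       {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB": {"RegWrite": 0, "MemToReg": None}},
    "beq":      {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
                 "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 0, "MemToReg": None}},
}

# Control groups each pipeline register must still carry forward.
CARRIED = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}

# Bits per group; ALUOp is two bits wide.
GROUP_BITS = {"EX": 4, "MEM": 3, "WB": 2}


def control_in(register, op):
    """Control groups stored in `register` for instruction class `op`."""
    return {group: CONTROL[op][group] for group in CARRIED[register]}


def control_bits(register):
    """Number of control bits by which `register` is extended."""
    return sum(GROUP_BITS[group] for group in CARRIED[register])
```

Under these assumptions ID/EX grows by 9 control bits, EX/MEM by 5, and MEM/WB by 2, since each stage consumes its own group and passes the rest along.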
9
Problems for Pipelining
  • Hazards prevent the next instruction from executing
    during its designated clock cycle, limiting
    speedup
  • Structural hazards: HW cannot support this
    combination of instructions (single person to
    fold and put clothes away)
  • Control hazards: conditional branches and other
    instructions may stall the pipeline, delaying
    later instructions (must check detergent level
    before washing next load)
  • Data hazards: an instruction depends on the result
    of a prior instruction still in the pipeline
    (matching socks in later load)
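
A data hazard of the kind described above can be found mechanically. The following Python sketch assumes a 5-stage pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second half, so only consumers one or two instructions behind the producer are affected. The example program and the function name are illustrative, not from the lecture.

```python
# Each instruction: (opcode, destination register or None, source registers).
PROGRAM = [
    ("sub", "$2", ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or", "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw", None, ("$15", "$2")),
]


def raw_hazards(program, window=2):
    """Return (producer, consumer, register) triples for read-after-write hazards.

    `window` is how many following instructions can still see a stale value;
    2 is an assumption tied to the pipeline described in the lead-in.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue  # instruction writes no register
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, sources = program[j]
            if dest in sources:
                hazards.append((i, j, dest))
            if later_dest == dest:
                break  # register overwritten; later readers depend on j instead
    return hazards


if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(PROGRAM):
        print(f"instruction {consumer} reads {reg} too soon after instruction {producer}")
```

Here only the and and or instructions conflict with sub; add and sw are far enough behind for the write to have completed.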

10
MIPS R4000 pipeline
11
Advanced Architectural Concepts
  • Can we achieve CPI < 1? (i.e., can we have IPC >
    1?) State-of-the-Art Microprocessor
  • Superscalar execution or Instruction Level
    Parallelism (ILP)
  • Deeper pipeline => dynamic branch prediction =>
    speculation => recovery
  • Out-of-order execution => instruction window
    and prefetch => reorder buffers
  • VLIW, e.g., Intel/HP Itanium

12
Instruction Level Parallelism (ILP): IPC > 1
[Figure: superscalar pipeline diagram with ILP = 2; instructions pass through IFtch, Dcd, Exec, Mem, WB, overlapping in time along the program flow]
Examples: Pentium, SPARC, MIPS R10000, IBM PowerPC
13
HW Schemes: Instruction Parallelism
  • Key idea: allow instructions behind a stall to
    proceed
  • DIVD F0,F2,F4
  • ADDD F10,F0,F8
  • SUBD F12,F8,F14
  • Enables out-of-order execution => out-of-order
    completion
  • ID stage checks for hazards. If there are no
    hazards, issue the instruction for execution.
    Scoreboard dates to CDC 6600 in 1963
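
The three-instruction example above can be checked with a small Python sketch: ADDD must wait for DIVD because it reads F0, while SUBD shares no registers with either and may proceed. This is a simplified register-conflict check under stated assumptions, not the CDC 6600 scoreboard itself; `can_proceed` is a hypothetical helper.

```python
# Each instruction: (opcode, destination register, source registers).
PROGRAM = [
    ("DIVD", "F0", ("F2", "F4")),
    ("ADDD", "F10", ("F0", "F8")),
    ("SUBD", "F12", ("F8", "F14")),
]


def can_proceed(index, program, unfinished):
    """True if instruction `index` conflicts with no earlier unfinished instruction.

    `unfinished` is a set of indices of instructions that have not completed.
    """
    _, dest, sources = program[index]
    for earlier in unfinished:
        if earlier >= index:
            continue  # only earlier instructions can block this one
        _, e_dest, e_sources = program[earlier]
        if e_dest in sources:   # read-after-write: needs a value not yet produced
            return False
        if e_dest == dest:      # write-after-write: would reorder two writes
            return False
        if dest in e_sources:   # write-after-read: would clobber a pending operand
            return False
    return True


if __name__ == "__main__":
    # DIVD is still executing; which of the later instructions can go ahead?
    print("ADDD can proceed:", can_proceed(1, PROGRAM, {0}))
    print("SUBD can proceed:", can_proceed(2, PROGRAM, {0, 1}))
```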

14
How ILP Works
  • Issuing multiple instructions per cycle would
    require fetching multiple instructions from
    memory per cycle => the number issued per cycle
    is called the superscalar degree or issue width
  • To find independent instructions, we must have a
    big pool of instructions to choose from, called
    the instruction buffer (IB). As the IB length
    increases, the complexity of the decoder (control)
    increases, which increases the datapath cycle time
  • Prefetch instructions sequentially by an IFU that
    operates independently from datapath control.
    Fetch instruction (PC) + L, where L is the IB size,
    or as directed by the branch predictor.
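
To make issue width and instruction buffer size concrete, here is a minimal Python sketch of a greedy issue loop. It assumes every instruction takes one cycle and writes a register, ignores branches and memory, and uses a made-up four-instruction program; none of the names come from the lecture.

```python
# Each instruction: (opcode, destination register, source registers).
# Assumption: every instruction writes a register.
PROGRAM = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r5")),   # depends on add
    ("or", "r6", ("r7", "r8")),    # independent
    ("and", "r9", ("r6", "r1")),   # depends on or and add
]


def conflicts(earlier, later):
    """True if `later` must wait for `earlier` (RAW, WAW, or WAR on a register)."""
    _, e_dest, e_sources = earlier
    _, l_dest, l_sources = later
    return e_dest in l_sources or e_dest == l_dest or l_dest in e_sources


def schedule(program, issue_width=2, buffer_size=4):
    """Return one list per cycle holding the indices of the instructions issued."""
    done = set()
    pending = list(range(len(program)))
    cycles = []
    while pending:
        issued = []
        for idx in pending[:buffer_size]:      # only the buffer is visible
            if len(issued) == issue_width:
                break
            # Earlier instructions not finished before this cycle, including
            # those issued in this same cycle.
            blockers = [e for e in range(idx) if e not in done]
            if not any(conflicts(program[e], program[idx]) for e in blockers):
                issued.append(idx)
        done.update(issued)
        pending = [i for i in pending if i not in issued]
        cycles.append(issued)
    return cycles


if __name__ == "__main__":
    plan = schedule(PROGRAM)
    print(plan, "IPC =", len(PROGRAM) / len(plan))
```

With an issue width of 2 the independent or instruction fills the slot that sub cannot use, so the four instructions finish in two cycles; with an issue width of 1 the same program takes four.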

15
Microarchitecture of an ILP-based CPU (PowerPC)
17
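Very Long Instruction Word (VLIW): IPC > 1
[Figure: VLIW pipeline diagram; a single IFtch and Dcd feed several Exec operations in parallel, followed by Mem and WB; time on the horizontal axis, program flow on the vertical axis]
Example: Itanium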
18
TriMedia TM32 Architecture
[Figure: TriMedia TM32 block diagram: 32-bit peripheral bus; 64-bit memory bus; multi-port register file, 128 words x 32 bits; bypass network; five functional units (FU); PC; 16 KB data cache; 32 KB instruction cache holding compressed code]
19
What is Multiprocessing
  • Parallelism at the instruction level is limited
    because of data dependencies => speedup is
    limited!
  • Program-level parallelism is abundant, like a Do
    loop with 1000 iterations (loop-level
    parallelism). How about employing multiple
    processors to execute the loops? => Parallel
    processing or multiprocessing
  • With a billion transistors on a chip, we can put a
    few CPUs on one chip => Chip multiprocessor
20
Hardware Multithreading
  • We need a hardware multithreading technique
    because switching between threads in software
    (e.g., multitasking) is very time-consuming (why?),
    so it is suitable for hiding I/O access but not
    main memory access.
  • Provide multiple PCs and register sets on the CPU
    so that thread switching can occur without having
    to store the register contents in main memory
    (on the stack, as is done for context switching).
  • Several threads reside in the CPU simultaneously,
    and execution switches between the threads on
    main memory access.
  • How about both multiprocessors and multithreading
    on a chip? => Network processor

21
Hardware Multithreading
  • How can we guarantee no dependencies between
    instructions in a pipeline?
  • One way is to interleave execution of
    instructions from different program threads on
    same pipeline
  • Interleave 4 threads, T1-T4, on a non-bypassed
    5-stage pipe
  • T1: LW r1, 0(r2)
  • T2: ADD r7, r1, r4
  • T3: XORI r5, r4, 12
  • T4: SW 0(r7), r5
  • T1: LW r5, 12(r1)
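
The interleaving above can be modeled in Python as a fixed round-robin fetch order. The sketch assumes each thread has its own register set, one instruction is fetched per cycle, results are written in the fifth stage, and a register written in a cycle can be read in that same cycle; the constant and function names are hypothetical.

```python
from itertools import zip_longest

DECODE_STAGE = 1      # registers are read one cycle after fetch
WRITE_BACK_STAGE = 4  # registers are written four cycles after fetch

# Instruction streams per thread, taken from the slide.
THREADS = {
    "T1": ["LW r1, 0(r2)", "LW r5, 12(r1)"],
    "T2": ["ADD r7, r1, r4"],
    "T3": ["XORI r5, r4, 12"],
    "T4": ["SW 0(r7), r5"],
}


def interleave(threads):
    """Return (fetch_cycle, thread, instruction) in fixed round-robin order."""
    schedule = []
    cycle = 0
    for group in zip_longest(*threads.values()):
        for name, instruction in zip(threads, group):
            if instruction is not None:
                schedule.append((cycle, name, instruction))
            cycle += 1  # the slot passes even if the thread has nothing to fetch
    return schedule


def same_thread_is_safe(num_threads):
    """True if a thread's next instruction reads registers no earlier than the
    cycle in which its previous instruction writes them back."""
    return num_threads + DECODE_STAGE >= WRITE_BACK_STAGE


if __name__ == "__main__":
    for cycle, name, instruction in interleave(THREADS):
        print(cycle, name, instruction)
```

T1's second load is fetched in cycle 4 and reads r1 in cycle 5, after the first load wrote r1 back in cycle 4, so no interlock or bypass is needed within a thread.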

22
Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading; legend: Thread 1 through Thread 5, idle slot]
23
Intel IXP2400 Network Processor
  • XScale core replaces StrongARM
  • 1.4 GHz target in 0.13-micron
  • Nearest neighbor routes added between
    microengines
  • Hardware to accelerate CRC operations and Random
    number generation
  • 16 entry CAM

24
IBM Cell Processor
SPU: Synergistic Processor Unit
25
Chip Multiprocessors