Title: CSE 8383 - Advanced Computer Architecture
1CSE 8383 - Advanced Computer Architecture
- Week-3
- Week of Jan 26, 2004
- engr.smu.edu/rewini/8383
2Contents
- Linear Pipelines
- Nonlinear pipelines
- Instruction Pipelines
- Arithmetic Operations
- Design of Multifunction Pipeline
3Linear Pipeline
- Processing Stages are linearly connected
- Perform fixed function
- Synchronous Pipeline
- Clocked latches between Stage i and Stage i+1
- Equal delays in all stages
- Asynchronous Pipeline (Handshaking)
4Latches
S1 → L1 → S2 → L2 → S3
Slowest stage determines delay
Equal delays → clock period
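The two notes above can be sketched in a few lines of Python. This is only an illustration: the stage delays are made-up numbers, and the optional `latch_delay` term is an assumption that does not appear on the slide.

```python
def clock_period(stage_delays, latch_delay=0.0):
    """Clock period of a synchronous pipeline.

    The slowest stage sets the period; latch_delay is an assumed
    extra overhead per latch (0 by default, not from the slides).
    """
    return max(stage_delays) + latch_delay


# Hypothetical delays for S1, S2, S3 in nanoseconds.
delays_ns = [8.0, 10.0, 7.0]
print("clock period:", clock_period(delays_ns))
```

With equal stage delays the maximum equals every stage's delay, so no stage sits idle waiting for the clock.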
5Reservation Table
Time  1  2  3  4
S1    X
S2       X
S3          X
S4             X
65 tasks on 4 stages
Time  1  2  3  4  5  6  7  8
S1    X  X  X  X  X
S2       X  X  X  X  X
S3          X  X  X  X  X
S4             X  X  X  X  X
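A small sketch of how such a table can be generated, assuming each stage takes exactly one cycle and a new task enters every cycle. The function names are illustrative only.

```python
def linear_schedule(num_tasks, num_stages):
    """Return {stage: [busy cycles]} for a linear pipeline.

    Task t (1-based) occupies stage s (1-based) in cycle t + s - 1.
    """
    return {
        f"S{s}": [t + s - 1 for t in range(1, num_tasks + 1)]
        for s in range(1, num_stages + 1)
    }


def total_cycles(num_tasks, num_stages):
    """Cycles to finish all tasks: fill the pipe, then one task per cycle."""
    return num_stages + num_tasks - 1


schedule = linear_schedule(5, 4)
width = total_cycles(5, 4)
for stage, cycles in schedule.items():
    row = " ".join("X" if c in cycles else "." for c in range(1, width + 1))
    print(stage, row)
print("total cycles:", width)
```

For 5 tasks on 4 stages this gives 4 + 5 - 1 = 8 cycles, compared with 5 * 4 = 20 cycles if each task had to finish before the next one started.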
7Non Linear Pipelines
- Variable functions
- Feed-Forward
- Feedback
83 stages 2 functions
[Figure: pipeline with three stages S1, S2, S3 and two functions, X and Y]
9Reservation Tables for X Y
Function X
Time  1  2  3  4  5  6  7  8
S1    X              X     X
S2       X     X
S3          X     X     X

Function Y
Time  1  2  3  4  5  6
S1    Y           Y
S2          Y
S3       Y     Y     Y
10Linear Instruction Pipelines
- Assume the following instruction execution phases
- Fetch (F)
- Decode (D)
- Operand Fetch (O)
- Execute (E)
- Write results (W)
11Pipeline Instruction Execution
Time  1   2   3   4   5   6   7
F     I1  I2  I3
D         I1  I2  I3
O             I1  I2  I3
E                 I1  I2  I3
W                     I1  I2  I3
12Dependencies
- Data Dependency
- (Operand is not ready yet)
- Instruction Dependency
- (Branching)
- Will that Cause a Problem?
13Data Dependency
- I1 -- Add R1, R2, R3
- I2 -- Sub R4, R1, R5
Time  1   2   3   4   5   6
F     I1  I2
D         I1  I2
O             I1  I2
E                 I1  I2
W                     I1  I2
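The hazard in this diagram can be checked mechanically. The sketch below assumes that the first register of each instruction is its destination and the other two are sources, that operands are read in phase O, and that results are written in phase W. The tuple layout and function names are illustrative only.

```python
PHASES = ["F", "D", "O", "E", "W"]


def phase_cycle(fetch_cycle, phase):
    """Cycle in which an instruction fetched in fetch_cycle runs a phase."""
    return fetch_cycle + PHASES.index(phase)


# (operation, destination, sources); destination-first is an assumption.
i1 = ("Add", "R1", ("R2", "R3"))
i2 = ("Sub", "R4", ("R1", "R5"))


def raw_hazard(producer, consumer, producer_fetch, consumer_fetch):
    """True if the consumer reads a register no later than the cycle
    in which the producer writes it (read-after-write hazard)."""
    _, dest, _ = producer
    _, _, sources = consumer
    if dest not in sources:
        return False
    write_cycle = phase_cycle(producer_fetch, "W")
    read_cycle = phase_cycle(consumer_fetch, "O")
    return read_cycle <= write_cycle


# I1 is fetched in cycle 1 and I2 in cycle 2, as in the diagram.
print("I2 reads R1 in cycle", phase_cycle(2, "O"))
print("I1 writes R1 in cycle", phase_cycle(1, "W"))
print("hazard:", raw_hazard(i1, i2, 1, 2))
```

Here I2 reaches phase O in cycle 4 while I1 writes R1 only in cycle 5, so I2 would read a stale value.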
14Solutions
- STALL
- Forwarding
- Write and Read in one cycle
- ...
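A rough sketch of how many stall cycles the first and third options cost in the F, D, O, E, W pipeline. It assumes operands are read in O and results written in W. Forwarding is not modelled, since its cost depends on datapath details the slides do not give. The function and parameter names are illustrative.

```python
PHASES = ["F", "D", "O", "E", "W"]


def stall_cycles(distance=1, same_cycle_write_read=False):
    """Stall cycles needed so the consumer's O phase sees the producer's result.

    distance: how many instructions after the producer the consumer is fetched.
    same_cycle_write_read: True if a register written in a cycle can be
    read in that same cycle.
    """
    write_offset = PHASES.index("W")            # producer's W, relative to its fetch
    read_offset = distance + PHASES.index("O")  # consumer's O, same reference point
    earliest_read = write_offset if same_cycle_write_read else write_offset + 1
    return max(0, earliest_read - read_offset)


print("plain stall:", stall_cycles(1))
print("write and read in one cycle:", stall_cycles(1, same_cycle_write_read=True))
```

For back-to-back dependent instructions this gives 2 stall cycles, or 1 if the register file can be written and read in the same cycle.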
15Instruction Dependency
Time  1   2   3   4   5   6
F     I1  I2
D         I1  I2
O             I1  I2
E                 I1  I2
W                     I1  I2
16Solutions
- STALL
- Predict Branch taken
- Predict Branch not taken
- ...
17Floating Point Multiplication
- Inputs: (Mantissa1, Exponent1), (Mantissa2, Exponent2)
- Add the two exponents → Exponent-out
- Multiply the two mantissas
- Normalize mantissa and adjust exponent
- Round the product mantissa to a single-length mantissa. You may adjust the exponent
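The steps above can be sketched as follows. This is a simplified model, not a real floating-point format: mantissas are assumed to be fractions with 0.5 <= |m| < 1, exponents are powers of two, and `bits` is a hypothetical parameter giving the single-length mantissa width.

```python
def fp_multiply(m1, e1, m2, e2, bits=8):
    """Multiply (m1 * 2**e1) by (m2 * 2**e2), returning (mantissa, exponent)."""
    # Step 1: add the two exponents.
    exponent = e1 + e2
    # Step 2: multiply the two mantissas (a double-length product).
    mantissa = m1 * m2
    # Step 3: normalize so that 0.5 <= |mantissa| < 1, adjusting the exponent.
    while mantissa != 0 and abs(mantissa) < 0.5:
        mantissa *= 2
        exponent -= 1
    # Step 4: round to a single-length mantissa of `bits` fraction bits.
    scale = 1 << bits
    mantissa = round(mantissa * scale) / scale
    # Rounding may carry out to 1.0, so renormalize and adjust the exponent.
    if abs(mantissa) >= 1.0:
        mantissa /= 2
        exponent += 1
    return mantissa, exponent


m, e = fp_multiply(0.75, 2, 0.5, 3)   # 3.0 * 4.0
print(m, e, m * 2 ** e)
```

The last `if` is the reason for the remark that the exponent may need adjusting after rounding.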
18Linear Pipeline for floating-point multiplication
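Add Exponents → Multiply Mantissa → Normalize → Round
Add Exponents → Partial Products → Accumulator → Normalize → Round → Renormalize
(second version: the multiply stage is split into Partial Products and Accumulator, and a Renormalize stage follows Round)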
19Linear Pipeline for floating-point Addition
Subtract Exponents → Partial Shift → Add Mantissa → Find Leading 1 → Partial Shift → Round → Renormalize
20Combined Adder and Multiplier
[Figure: combined adder/multiplier pipeline with stages labeled A to H. Blocks shown: Exponents Subtract / Add, Partial Products, Partial Shift, Add Mantissa, Find Leading 1, Partial Shift, Round, Renormalize]
21Reservation Table for Multiply
   1  2  3  4  5  6  7
A  X
B     X  X
C        X  X
D              X     X
E                 X
F
G
H
22Reservation Table for Addition
1 2 3 4 5 6 7 8 9
A Y
B
C Y
D Y
E Y
F Y Y
G Y
H Y Y
23Nonlinear Pipeline Design
- Latency
- The number of clock cycles between two initiations of a pipeline
- Collision
- Resource Conflict
- Forbidden Latencies
- Latencies that cause collisions
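These definitions can be tried out by overlaying two initiations of the same reservation table. The layout used below for the multiply function is an assumption (the column positions are not stated explicitly in the slides), chosen to be consistent with the forbidden latencies 1 and 2 given for multiply after multiply.

```python
# Assumed layout: stage -> cycles in which one multiply uses that stage.
MULTIPLY = {"A": [1], "B": [2, 3], "C": [3, 4], "D": [5, 7], "E": [6]}


def collisions(table, latency):
    """Return (stage, cycle) pairs where a second task, started `latency`
    cycles after the first, needs a stage the first task is using."""
    hits = []
    for stage, cycles in table.items():
        first_task = set(cycles)
        for c in cycles:
            if c + latency in first_task:   # second task's use, shifted in time
                hits.append((stage, c + latency))
    return hits


for latency in (1, 2, 3):
    hits = collisions(MULTIPLY, latency)
    print(latency, "forbidden" if hits else "permissible", hits)
```

A latency is forbidden exactly when this list is non-empty.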
24Nonlinear Pipeline Design cont
- Latency Sequence
- A sequence of permissible latencies between successive task initiations
- Latency Cycle
- A sequence that repeats the same subsequence
- Collision vector
- C = (Cm, Cm-1, ..., C2, C1), m <= n-1
- n = number of columns in the reservation table
- Ci = 1 if latency i causes a collision, 0 otherwise
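A minimal sketch of deriving the collision vector from a reservation table. The table for function X is written as an assumed layout (stage to list of cycles) that agrees with the forbidden latencies 2, 4, 5, 7 stated in these slides. Here m is taken to be the largest forbidden latency.

```python
# Assumed layout of the reservation table for function X.
X_TABLE = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}


def forbidden_latencies(table):
    """Distances between any two marks in the same row of the table."""
    forbidden = set()
    for cycles in table.values():
        for i, a in enumerate(cycles):
            for b in cycles[i + 1:]:
                forbidden.add(abs(b - a))
    return sorted(forbidden)


def collision_vector(forbidden):
    """Bits Cm ... C1 as a string; Ci is 1 if latency i is forbidden."""
    if not forbidden:
        return ""            # no forbidden latencies, e.g. a linear pipeline
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


f = forbidden_latencies(X_TABLE)
print("forbidden latencies:", f)
print("collision vector:", collision_vector(f))
```

Note that the leftmost bit is Cm and the rightmost is C1, matching the order in the definition above.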
25Mul Mul Collision (launch after 1 cycle)
X = first multiply, Z = second multiply, XZ = collision
   1  2  3  4  5  6  7
A  X  Z
B     X  XZ Z
C        X  XZ Z
D              X  Z  X
E                 X  Z
F
G
H
26Mul Mul Collision (launch after 2 cycles)
   1  2  3  4  5  6  7
A  X     Z
B     X  X  Z  Z
C        X  X  Z  Z
D              X     XZ
E                 X
F
G
H
27Mul Mul Collision (launch after 3 cycles)
   1  2  3  4  5  6  7
A  X        Z
B     X  X     Z  Z
C        X  X     Z  Z
D              X     X
E                 X
F
G
H
28Collision Vector for Multiply after Multiply
- Forbidden Latencies: 1, 2
- Collision vector
- 0 0 0 0 1 1 → 11
- Maximum forbidden latency = 2 → m = 2
29Example
[Figure: pipeline with three stages S1, S2, S3 and two functions, X and Y]
30Reservation Tables for X Y
Function X
Time  1  2  3  4  5  6  7  8
S1    X              X     X
S2       X     X
S3          X     X     X

Function Y
Time  1  2  3  4  5  6
S1    Y           Y
S2          Y
S3       Y     Y     Y
32Forbidden Latencies
- X after X
- X after Y
- Y after X
- Y after Y
33X after X
Latency 2
Time  1     2     3     4     5     6     7     8
S1    X1          X2                X1          X1,X2
S2          X1          X1,X2       X2
S3                X1          X1,X2       X1,X2

Latency 5
Time  1     2     3     4     5     6     7     8
S1    X1                            X1,X2       X1
S2          X1          X1                X2
S3                X1          X1          X1    X2
34X after X
Latency 4
Time  1     2     3     4     5     6     7     8
S1    X1                      X2    X1          X1
S2          X1          X1          X2          X2
S3                X1          X1          X1,X2

Latency 7
Time  1     2     3     4     5     6     7     8
S1    X1                            X1          X1,X2
S2          X1          X1
S3                X1          X1          X1
35Collision Vector
- Forbidden Latencies 2, 4, 5, 7
- Collision Vector
- 1 0 1 1 0 1 0
36Y after Y
Latency 2
Time  1     2     3     4     5     6
S1    Y1          Y2          Y1
S2                Y1          Y2
S3          Y1          Y1,Y2       Y1,Y2

Latency 4
Time  1     2     3     4     5     6
S1    Y1                      Y1,Y2
S2                Y1
S3          Y1          Y1          Y1,Y2
37Collision Vector
- Forbidden Latencies 2, 4
- Collision Vector
- 1 0 1 0
38Exercise: Find the collision vector
1 2 3 4 5 6 7
A X X X
B X X
C X X
D X
39State Diagram for X
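States (collision vectors)
- 1 0 1 1 0 1 0 (initial state)
- 1 0 1 1 0 1 1
- 1 1 1 1 1 1 1
Transitions (latency → next state); 8 stands for any latency of 8 or more
- From 1 0 1 1 0 1 0: 1 → 1 1 1 1 1 1 1; 3 → 1 0 1 1 0 1 1; 6 → 1 0 1 1 0 1 1; 8 → 1 0 1 1 0 1 0
- From 1 0 1 1 0 1 1: 3 → 1 0 1 1 0 1 1; 6 → 1 0 1 1 0 1 1; 8 → 1 0 1 1 0 1 0
- From 1 1 1 1 1 1 1: 8 → 1 0 1 1 0 1 0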
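One way to build such a state diagram from the initial collision vector is sketched below, using the usual shift-and-OR rule: for a permissible latency p, shift the current state right by p bits and OR the result with the initial collision vector. The function name and the dictionary layout are illustrative choices.

```python
def state_diagram(initial):
    """Map each reachable state to {latency: next_state}.

    initial: collision vector as a bit string "Cm...C1".
    The key m + 1 stands for any latency greater than m.
    """
    m = len(initial)
    init = int(initial, 2)

    def fmt(value):
        return format(value, f"0{m}b")

    diagram = {}
    todo = [init]
    while todo:
        state = todo.pop()
        if fmt(state) in diagram:
            continue
        edges = {}
        for latency in range(1, m + 1):
            # Bit (latency - 1) from the right is C_latency; 0 means permissible.
            if not (state >> (latency - 1)) & 1:
                nxt = (state >> latency) | init
                edges[latency] = fmt(nxt)
                todo.append(nxt)
        edges[m + 1] = fmt(init)   # a latency above m always returns to the start
        diagram[fmt(state)] = edges
    return diagram


for state, edges in state_diagram("1011010").items():
    print(state, edges)
```

For the collision vector of function X this yields three states, with latencies 1, 3, 6 and 8 leaving the initial state.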
40Cycles
- Simple cycles → each state appears only once
- (3), (6), (8), (1, 8), (3, 8), and (6, 8)
- Greedy cycles → simple cycles whose edges are all made with minimum latencies from their respective starting states
- (1, 8) and (3) → one of them achieves the minimal average latency (MAL)
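A sketch of finding the greedy cycles for function X and comparing their average latencies. It starts from each of the three states of the state diagram, always takes the smallest permissible latency, and stops when a state repeats. The helper names are illustrative, and the three states are taken from the slides rather than recomputed.

```python
def min_latency(state, m):
    """Smallest permissible latency from a state (m + 1 if none is <= m)."""
    for latency in range(1, m + 1):
        if not (state >> (latency - 1)) & 1:
            return latency
    return m + 1


def next_state(state, latency, init, m):
    """Shift right by the latency and OR with the initial collision vector."""
    return init if latency > m else (state >> latency) | init


def greedy_cycle(start, init, m):
    """Follow minimum-latency edges until a state repeats; return the
    latencies on the cycle that was closed."""
    seen = {}          # state -> position in the path where it was first visited
    latencies = []
    state = start
    while state not in seen:
        seen[state] = len(latencies)
        lat = min_latency(state, m)
        latencies.append(lat)
        state = next_state(state, lat, init, m)
    return tuple(latencies[seen[state]:])


def canonical(cycle):
    """Pick one representative among the rotations of a cycle."""
    return min(cycle[i:] + cycle[:i] for i in range(len(cycle)))


vector = "1011010"
m, init = len(vector), int(vector, 2)
states = [0b1011010, 0b1011011, 0b1111111]
cycles = {canonical(greedy_cycle(s, init, m)) for s in states}
for c in sorted(cycles):
    print(c, "average latency:", sum(c) / len(c))
best = min(cycles, key=lambda c: sum(c) / len(c))
print("smallest average latency:", best)
```

The greedy cycles found are (1, 8) with average latency 4.5 and (3) with average latency 3, so (3) is the better of the two.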