Title: Intro to the c6x VLIW processor
1. Intro to the c6x VLIW processor
- Texas Instruments TMS320C6000 series
- TMS320C6700 subseries includes floating point
- VLIW: Very Long Instruction Word
2. Operations in Parallel
[Figure: registers and function units]
3. Operations in Parallel
[Figure: registers, bypassing, and function units]
4. Non-orthogonal
[Figure: two sets of registers, bypass, and function units]
5. Non-orthogonal
[Figure: register files A and B with bypass; function units L1, S1, M1, D1 and L2, S2, M2, D2]
See TI's picture
6. Specialized Function Units
- L units: arithmetic, compare, and logical ops
- S units: arithmetic, logical, branches, constant generation
- M units: multiplies
- D units: address generation / memory accesses
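As a quick way to read the list above, the following sketch encodes it as a lookup table and asks which unit types could execute a given class of operation. The class names are shorthand chosen for this illustration, not TI mnemonics.

```python
# Operation classes per unit type, transcribed from the slide above.
# The string labels are illustrative names, not official TI terminology.
UNIT_OPS = {
    "L": {"arithmetic", "compare", "logical"},
    "S": {"arithmetic", "logical", "branch", "constant generation"},
    "M": {"multiply"},
    "D": {"address generation", "memory access"},
}

def units_for(op_class):
    """Return the unit types (sorted) that can execute the given operation class."""
    return sorted(unit for unit, ops in UNIT_OPS.items() if op_class in ops)

print(units_for("arithmetic"))  # both L and S units qualify
print(units_for("multiply"))    # only the M units
```

Overlap between L and S is what gives the scheduler some freedom: a plain arithmetic operation can be placed on whichever of those units is free.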
7. Complicated hardware
[Figure: two sets of registers]
8. Explicit parallelism
[Figure: two sets of registers]
9. Simple VLIW encoding
- Slots that cannot be utilized are filled with no-ops
- Bad for code density, cache utilization, energy, ...
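To put a number on the cost of no-op padding, the sketch below counts how many slots of a fixed-width encoding carry real work. The width of 8 matches the eight function units listed earlier; the per-cycle schedule is invented purely for illustration.

```python
def slot_utilization(ops_per_cycle, width=8):
    """Fraction of encoded slots that hold real operations when every cycle
    is encoded as a full `width`-slot instruction word."""
    if not ops_per_cycle:
        raise ValueError("schedule must contain at least one cycle")
    if any(n < 0 or n > width for n in ops_per_cycle):
        raise ValueError("each cycle must issue between 0 and width operations")
    return sum(ops_per_cycle) / (width * len(ops_per_cycle))

# Hypothetical schedule: number of useful operations issued in each cycle.
schedule = [2, 1, 6, 1, 2]
nop_slots = 8 * len(schedule) - sum(schedule)
print(f"{slot_utilization(schedule):.0%} useful, {nop_slots} no-op slots")
```

With this made-up schedule, most of the encoded program is padding, which is the motivation for a denser encoding.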
10. C6X Packets
- One bit of each instruction indicates whether the next instruction can be executed in parallel (0 = EOP)
- Any slot can go to any function unit
[Figure: parallel bits of eight instruction slots: 0 1 0 1 1 1 1 1]
11. C6X Packets
- One bit of each instruction indicates whether the next instruction can be executed in parallel
- Any slot can go to any function unit
[Figure: parallel bits of eight instruction slots: 0 1 0 1 1 1 1 1]
12. C6X Packets
- One bit of each instruction indicates whether the next instruction can be executed in parallel
- Any slot can go to any function unit
[Figure: parallel bits of sixteen instruction slots: 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0]
- Packet cannot cross an 8-word boundary
- Resources constrain which instructions can be combined in the same packet
- You can branch into the middle of a packet!
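The parallel-bit encoding can be sketched as a small decoder that groups instructions into packets and rejects a packet that would cross the 8-word boundary. This is a minimal model under two assumptions: the bits are listed in program order, and a bit of 1 chains an instruction to the one that follows it. The example bit pattern is hypothetical and is not the one from the slide figure.

```python
def split_execute_packets(p_bits, fetch_width=8):
    """Group instruction indices into packets using their parallel bits.

    p_bits[i] == 1: instruction i+1 executes in parallel with instruction i.
    p_bits[i] == 0: instruction i is the last one in its packet.
    """
    if len(p_bits) % fetch_width != 0:
        raise ValueError("expected a whole number of fetch-width groups")
    packets, current = [], []
    for index, bit in enumerate(p_bits):
        current.append(index)
        last_in_group = (index + 1) % fetch_width == 0
        if bit == 0:
            packets.append(current)
            current = []
        elif last_in_group:
            # A set bit here would chain into the next 8-word group.
            raise ValueError(f"packet crosses the boundary after slot {index}")
    return packets

# Hypothetical parallel bits for eight instructions, in program order.
example = [1, 1, 0, 1, 0, 0, 1, 0]
print(split_execute_packets(example))
```

Because no-ops are never needed to pad a packet, eight instruction words can hold anything from one 8-wide packet to eight sequential instructions.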
13. Explicit scheduling
Delay slots must be respected: no HW interlocks or scoreboarding
- Multiply: 1 delay slot
- Load: 4 delay slots
- Branch: 5 delay slots
[Figure: two instruction sequences using registers B5, B3, B2 and B7, B5, B1, one labeled Right and one labeled Wrong]
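What "no HW interlocks" means in practice can be sketched with a toy one-instruction-per-cycle model: operands are read at issue, and a result only becomes visible after the instruction's delay slots have passed. The two programs and register values below are hypothetical (they are not a reconstruction of the slide's Right/Wrong figure), and the zero delay slots for `add` is an assumption of this model.

```python
# Delay slots per operation: mpy is from the slide, add is assumed single-cycle.
DELAY_SLOTS = {"mpy": 1, "add": 0}

def run(program, initial_regs):
    """Issue one instruction per cycle; entries are (op, src1, src2, dst) or None (nop)."""
    regs = dict(initial_regs)
    pending = []  # (cycle at which the result becomes visible, dst, value)
    for cycle, instr in enumerate(program):
        # Commit results whose delay slots have elapsed.
        for ready, dst, value in pending:
            if ready <= cycle:
                regs[dst] = value
        pending = [p for p in pending if p[0] > cycle]
        if instr is None:
            continue
        op, src1, src2, dst = instr
        a, b = regs[src1], regs[src2]  # read at issue: nothing stalls for us
        value = a * b if op == "mpy" else a + b
        pending.append((cycle + 1 + DELAY_SLOTS[op], dst, value))
    for _, dst, value in sorted(pending, key=lambda p: p[0]):
        regs[dst] = value  # drain results still in flight
    return regs

start = {"B1": 1, "B2": 0, "B3": 3, "B5": 5, "B7": 0}
right = [("mpy", "B5", "B3", "B2"), None, ("add", "B2", "B1", "B7")]
wrong = [("mpy", "B5", "B3", "B2"), ("add", "B2", "B1", "B7")]
print(run(right, start)["B7"])  # add sees the product
print(run(wrong, start)["B7"])  # add sees the stale B2
```

In the second program the add issues in the multiply's delay slot, so it silently uses the old value of B2; the hardware neither stalls nor reports an error.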
14. Predicated execution
Why? To get rid of branches (5 delay slots, 8 wide ....)
Basic idea: a comparison result is stored to a condition register; this register is then used as an operand of other instructions, and its value causes those operations to be selectively enabled or squashed.
Condition registers: A1, A2, B0, B1, B2
Example: if (B3<B4) B3++ else B4++
15. Predicated execution
With branches:
      cmp B3, B4
      bge L2
      <nop>
      B3 = B3+1
      b DONE
      <nop>
L2:   B4 = B4+1
DONE:
With predicates:
      cmplt B3, B4 -> B0
      [B0]  B3 = B3+1
      [!B0] B4 = B4+1
...and the last two can be issued in parallel!
Control dependency has been converted to data dependency...
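The conversion from control dependency to data dependency can be mimicked in a few lines. This is only a behavioural sketch: `predicated` is a hypothetical helper that models a squashed write, and Python's conditional expression stands in for hardware that issues the operation either way.

```python
def with_branches(b3, b4):
    """Reference version: control flow picks which update runs."""
    if b3 < b4:
        b3 = b3 + 1
    else:
        b4 = b4 + 1
    return b3, b4

def predicated(predicate, new_value, old_value):
    """Model of a predicated write: commit new_value only if the predicate is set."""
    return new_value if predicate else old_value

def with_predicates(b3, b4):
    """If-converted version: both updates are issued, the predicate squashes one."""
    b0 = int(b3 < b4)                        # compare result goes to condition register B0
    new_b3 = predicated(b0, b3 + 1, b3)      # [B0]  B3 = B3+1
    new_b4 = predicated(not b0, b4 + 1, b4)  # [!B0] B4 = B4+1
    return new_b3, new_b4

print(with_branches(3, 7), with_predicates(3, 7))
print(with_branches(7, 3), with_predicates(7, 3))
```

The two predicated updates depend only on B0, not on each other, which is why they can share a packet.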
16. Assembly details
        .text
        .align 32
        .global proc
proc:
        mvk 4, b3
        mvk 5, b4
        cmpgt b3, b4, b0
        [b0] mvk.S2 9, b5
        [!b0] mvk.S1 8, a5
        stw a5, *-a15[4]
        .....
17. Fetch/execute pipeline
- PG: generate program address
- PS: program address send
- PW: program memory access
- PR: fetch reaches CPU boundary
- DP: instruction dispatch
- DC: instruction decode
- E1: execute 1
- E2: execute 2
- E3: execute 3
- E4: execute 4
- E5: execute 5
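The stage list can be turned into a small timeline to see how instructions overlap. The sketch below also derives the branch delay-slot count under one stated assumption: the branch target enters PG in the same cycle the branch itself is in E1. It ignores stalls and the fact that many instructions finish before E5.

```python
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1", "E2", "E3", "E4", "E5"]

def stage_at(fetch_cycle, cycle):
    """Stage occupied at `cycle` by an instruction that entered PG at `fetch_cycle`."""
    offset = cycle - fetch_cycle
    return STAGES[offset] if 0 <= offset < len(STAGES) else None

def branch_delay_slots():
    """Cycles between a branch's E1 and its target's E1, excluding the target itself.

    Assumption: the target enters PG in the cycle the branch is in E1.
    """
    cycles_until_target_executes = STAGES.index("E1")
    return cycles_until_target_executes - 1

for instr in range(3):
    row = [stage_at(instr, cycle) or "--" for cycle in range(len(STAGES) + 2)]
    print(f"instr {instr}: " + " ".join(row))
print("branch delay slots:", branch_delay_slots())
```

Under that assumption the six front-end stages account for the five branch delay slots quoted on the scheduling slide.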
18. Addressing Modes
Mode            C equivalent
*R              *(R)
*+R[ucst5]      *(R+ucst5)
*-R[ucst5]      *(R-ucst5)
*+R[offsetR]    *(R+offsetR)
*-R[offsetR]    *(R-offsetR)
Special case, 15-bit offsets: *+B15[ucst15], *+B14[ucst15]
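The offset modes behave like C pointer arithmetic, which the sketch below imitates with a list standing in for memory and a list index standing in for the base register. The names `load` and `check_ucst5` and the memory contents are hypothetical; `ucst5` is taken to mean an unsigned 5-bit constant.

```python
def check_ucst5(value):
    """Validate an unsigned 5-bit constant offset (0..31)."""
    if not 0 <= value < 32:
        raise ValueError("ucst5 must fit in 5 unsigned bits")
    return value

def load(memory, base, offset=0, subtract=False):
    """Model of base-plus-offset and base-minus-offset loads: *(R+offset) or *(R-offset).

    `base` is an element index, like a typed C pointer into `memory`.
    """
    index = base - offset if subtract else base + offset
    if not 0 <= index < len(memory):
        raise IndexError("address outside the modelled memory")
    return memory[index]

memory = [10, 20, 30, 40, 50]  # hypothetical word-sized data
a4 = 2                         # base register pointing at memory[2]
print(load(memory, a4))                                 # *(R)
print(load(memory, a4, check_ucst5(2)))                 # *(R+2)
print(load(memory, a4, check_ucst5(1), subtract=True))  # *(R-1)
```

A register offset works the same way except that its value is not limited to 5 bits, so it would skip the `check_ucst5` step.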
19. Addressing Modes
Pre/post increment/decrement:
*++R, *R++
*++R[ucst5], *R++[ucst5]
*--R[ucst5], *R--[ucst5]
*++R[offsetR], *R++[offsetR]
*--R[offsetR], *R--[offsetR]
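The difference between the pre and post forms is only whether the base register is updated before or after the access. A minimal sketch, with a hypothetical `access` helper, mode labels invented for this example, and a default step of 1 assumed when no offset is written:

```python
def access(memory, regs, name, mode, step=1):
    """Load through register `name`, updating it before or after the access.

    mode is one of "pre++", "pre--", "post++", "post--" (labels made up here).
    """
    if mode not in ("pre++", "pre--", "post++", "post--"):
        raise ValueError(f"unknown mode: {mode}")
    delta = step if mode.endswith("++") else -step
    old = regs[name]
    new = old + delta
    index = new if mode.startswith("pre") else old  # pre: use the updated base
    if not 0 <= index < len(memory):
        raise IndexError("address outside the modelled memory")
    regs[name] = new  # the base register is modified in both forms
    return memory[index]

memory = [10, 20, 30, 40, 50]  # hypothetical word-sized data
regs = {"A4": 1}
print(access(memory, regs, "A4", "pre++", step=2), regs["A4"])
print(access(memory, regs, "A4", "post--"), regs["A4"])
```

These forms let a loop walk an array without spending a separate instruction on the pointer update.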
20. Resources
http://www.cs.cmu.edu/tcal/15745/