Intro to the c6x VLIW processor

Slides: 21
Provided by: www2C
Learn more at: http://www.cs.cmu.edu
1
Intro to the c6x VLIW processor
  • Texas Instruments TMS320C6000 series
  • TMS320C6700 subseries includes floating point
  • VLIW = Very Long Instruction Word

2
Operations in Parallel
[Figure: registers connected to function units]
3
Operations in Parallel
[Figure: registers connected to function units, with bypassing]
4
Non-orthogonal
[Figure: two register files, bypass, function units]
5
Non-orthogonal
[Figure: register files A and B, bypass, and function units L1, S1, M1, D1 and L2, S2, M2, D2]
See TI's picture
6
Specialized Function Units
  • L units: arithmetic, compare, and logical ops
  • S units: arithmetic, logical, branches,
    constant generation
  • M units: multiplies
  • D units: address generation / memory accesses

7
Complicated hardware
[Figure: two register files]
8
Explicit parallelism
[Figure: two register files]
9
Simple VLIW encoding
  • Slots that cannot be utilized are filled with
    no-ops
  • Bad for code density, cache utilization,
    energy, ...
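
The code-density cost of this encoding can be illustrated with a small Python sketch. The 8-slot width matches the c6x issue width; the per-cycle operation counts are made-up illustration values, not measurements from real code.

```python
SLOTS = 8  # issue width of the c6x: one slot per function unit

def fixed_width_words(ops_per_cycle):
    """Instruction words needed when every cycle fills all SLOTS slots."""
    return SLOTS * len(ops_per_cycle)

def noop_words(ops_per_cycle):
    """Words wasted on no-ops under the fixed-width encoding."""
    return fixed_width_words(ops_per_cycle) - sum(ops_per_cycle)

# Hypothetical schedule: number of useful operations issued in each cycle.
schedule = [3, 1, 8, 2]

print(fixed_width_words(schedule), "words,", noop_words(schedule), "of them no-ops")
```

In this made-up schedule more than half of the fetched words carry no work, which is the waste the packet encoding on the following slides avoids.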

10
C6X Packets
  • One bit of each instruction indicates whether
    next instruction can be executed in parallel
    (0 = EOP, end of packet)
  • Any slot can go to any function unit

[Figure: parallel bits of eight instruction slots: 0 1 0 1 1 1 1 1]
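
A minimal Python sketch of how the parallel bit groups instructions into packets. The instruction names and the bit pattern are hypothetical and are not taken from the figure.

```python
def split_packets(instrs, p_bits):
    """Group instructions into packets that issue together.

    p_bits[i] == 1 means instruction i+1 runs in parallel with instruction i;
    p_bits[i] == 0 marks the end of a packet (EOP).
    """
    packets, current = [], []
    for instr, p in zip(instrs, p_bits):
        current.append(instr)
        if p == 0:
            packets.append(current)
            current = []
    if current:  # trailing instructions whose chain was never closed
        packets.append(current)
    return packets

# Hypothetical eight-instruction sequence and its parallel bits.
instrs = ["i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7"]
p_bits = [1, 1, 0, 1, 0, 0, 1, 0]

for packet in split_packets(instrs, p_bits):
    print(" || ".join(packet))
```

No slot is spent on a no-op: a packet is only as long as the run of instructions that actually issue together.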
12
C6X Packets
  • One bit of each instruction indicates whether
    next instruction can be executed in parallel
  • Any slot can go to any function unit

[Figure: parallel bits of sixteen instruction slots: 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0]
  • Packet cannot cross an 8-word boundary
  • Resources constrain which instructions can be
    combined in the same packet
  • You can branch into the middle of a packet!
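
The resource constraint can be sketched as a check that no function unit is claimed twice in one packet. This is a deliberate simplification: only unit conflicts are modeled, and the mnemonics and unit assignments in the examples are hypothetical.

```python
UNITS = {"L1", "S1", "M1", "D1", "L2", "S2", "M2", "D2"}

def packet_ok(packet):
    """packet: list of (mnemonic, unit) pairs meant to issue in the same cycle.

    Returns True when every unit exists and no unit is used more than once.
    Other hardware constraints are not modeled here.
    """
    used = [unit for _, unit in packet]
    return all(u in UNITS for u in used) and len(used) == len(set(used))

# Hypothetical packets.
print(packet_ok([("add", "L1"), ("mpy", "M1"), ("ldw", "D2")]))
print(packet_ok([("add", "L1"), ("sub", "L1")]))
```

The second packet is rejected because both instructions ask for L1 in the same cycle.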

13
Explicit scheduling
Delay slots must be respected: no HW interlocks or scoreboarding
  • Multiply: 1 delay slot
  • Load: 4 delay slots
  • Branch: 5 delay slots
[Figure: two instruction sequences using operands B5, B3, B2 and then B7, B5, B1, labeled Right and Wrong]
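
Because the hardware does not interlock, the scheduler has to verify delay slots itself. The Python sketch below does that check under simplifying assumptions: one instruction per cycle, and every source read is meant to see the most recent earlier write. The register names echo the figure; the `mpy` and `add` opcodes and the cycle numbers are assumptions made for illustration.

```python
DELAY_SLOTS = {"mpy": 1, "ldw": 4, "b": 5}  # values from the slide; others 0

def earliest_use(issue_cycle, op):
    """First cycle in which a consumer can read the result of `op`."""
    return issue_cycle + DELAY_SLOTS.get(op, 0) + 1

def check(schedule):
    """schedule: list of (cycle, op, dest, sources) tuples.

    Returns (cycle, op, register) for every read that happens too early.
    """
    ready = {}  # register -> first cycle its newest value is readable
    violations = []
    for cycle, op, dest, srcs in sorted(schedule, key=lambda s: s[0]):
        for reg in srcs:
            if cycle < ready.get(reg, 0):
                violations.append((cycle, op, reg))
        if dest is not None:
            ready[dest] = earliest_use(cycle, op)
    return violations

# Hypothetical schedules: the multiply result is consumed by an add.
right = [(0, "mpy", "B5", ["B3", "B2"]),
         (1, "nop", None, []),
         (2, "add", "B7", ["B5", "B1"])]
wrong = [(0, "mpy", "B5", ["B3", "B2"]),
         (1, "add", "B7", ["B5", "B1"])]

print(check(right))
print(check(wrong))
```

In the second schedule the add issues in the multiply's delay slot, so on real hardware it would silently read the old value of B5 rather than stall.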
14
Predicated execution
Why? To get rid of branches (5 delay slots, 8 wide ...)
Basic idea: a comparison result is stored to a condition register; this register is then used as an operand of other instructions, and its value causes those operations to be selectively enabled or squashed.
Condition registers: A1, A2, B0, B1, B2
Example: if (B3 < B4) B3++ else B4++
15
Predicated execution
With branches:
      cmp B3, B4
      bge L2
      <nop>
      B3 = B3+1
      b DONE
      <nop>
  L2: B4 = B4+1
  DONE:

With predicates:
      cmplt B3, B4 -> B0
      [B0]  B3 = B3+1
      [!B0] B4 = B4+1

...and the last two can be issued in parallel!
Control dependency has been converted to data dependency...
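
The equivalence of the two forms can be checked with a short Python model. It is only a sketch of the semantics: the function names are made up, and the predicated version models "squashed" as leaving the register unchanged.

```python
def with_branches(b3, b4):
    """Control-flow version: only one of the two increments is executed."""
    if b3 < b4:
        b3 = b3 + 1
    else:
        b4 = b4 + 1
    return b3, b4

def with_predicates(b3, b4):
    """Predicated version: both increments are issued, one is squashed."""
    b0 = int(b3 < b4)               # comparison result goes to condition register B0
    b3 = b3 + 1 if b0 else b3       # enabled when B0 is nonzero
    b4 = b4 + 1 if not b0 else b4   # enabled when B0 is zero
    return b3, b4

for pair in [(1, 2), (2, 1), (3, 3)]:
    print(pair, with_branches(*pair), with_predicates(*pair))
```

The two predicated updates depend only on B0 and not on each other, which is why they can share a packet.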
16
Assembly details
        .text
        .align 32
        .global proc
proc:
        mvk 4, b3
        mvk 5, b4
        cmpgt b3, b4, b0
  [b0]  mvk.S2 9, b5
  [!b0] mvk.S1 8, a5
        stw a5, *-a15[4]
        .....
17
Fetch/execute pipeline
  • PG: generate program address
  • PS: program address send
  • PW: program memory access
  • PR: fetch reaches CPU boundary
  • DP: instruction dispatch
  • DC: instruction decode
  • E1: execute 1
  • E2: execute 2
  • E3: execute 3
  • E4: execute 4
  • E5: execute 5
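
A small Python sketch of how successive packets overlap in this pipeline. It assumes an idealized flow that the slide does not state: packet k enters PG in cycle k, nothing stalls, and every packet passes through all five execute stages.

```python
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1", "E2", "E3", "E4", "E5"]

def stage_at(packet, cycle):
    """Stage occupied by packet number `packet` (0-based) in `cycle`, or None."""
    idx = cycle - packet
    return STAGES[idx] if 0 <= idx < len(STAGES) else None

# Print which stage the first three packets occupy in the first eight cycles.
for cycle in range(8):
    row = [stage_at(p, cycle) or "--" for p in range(3)]
    print(cycle, " ".join(row))
```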
18
Addressing Modes
Mode             C equivalent
*R               *(R)
*+R[ucst5]       *(R+ucst5)
*-R[ucst5]       *(R-ucst5)
*+R[offsetR]     *(R+offsetR)
*-R[offsetR]     *(R-offsetR)
Special case, 15b offsets: *+B15[ucst15], *+B14[ucst15]
19
Addressing Modes
Pre/post increment/decrement
*++R, *R++
*++R[ucst5], *R++[ucst5]
*--R[ucst5], *R--[ucst5]
*++R[offsetR], *R++[offsetR]
*--R[offsetR], *R--[offsetR]
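
The difference between plain offsets, pre-update, and post-update can be sketched in Python. The model is an assumption made for illustration: memory is a list indexed by word, a register holds a word index, and the mode strings are hypothetical labels rather than assembler syntax.

```python
def access(mem, regs, base, offset, mode):
    """Load one word and update regs[base] as the addressing mode requires.

    mode is one of: '+', '-', '++pre', '--pre', 'post++', 'post--'.
    """
    r = regs[base]
    if mode == "+":          # offset added, base register unchanged
        addr = r + offset
    elif mode == "-":        # offset subtracted, base register unchanged
        addr = r - offset
    elif mode == "++pre":    # base updated first, new value is the address
        addr = regs[base] = r + offset
    elif mode == "--pre":
        addr = regs[base] = r - offset
    elif mode == "post++":   # old value is the address, base updated after
        addr = r
        regs[base] = r + offset
    elif mode == "post--":
        addr = r
        regs[base] = r - offset
    else:
        raise ValueError(mode)
    return mem[addr]

mem = list(range(100, 116))   # hypothetical memory contents
regs = {"A4": 8}              # hypothetical base register
print(access(mem, regs, "A4", 2, "+"), regs["A4"])
print(access(mem, regs, "A4", 2, "++pre"), regs["A4"])
print(access(mem, regs, "A4", 2, "post--"), regs["A4"])
```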
20
Resources
http://www.cs.cmu.edu/tcal/15745/