Title: Understanding the TigerSHARC ALU pipeline
1 Understanding the TigerSHARC ALU pipeline
- Determining the speed of one stage of an IIR filter
Part 1: Getting the code to work
2 Understanding the TigerSHARC ALU pipeline
- TigerSHARC has many pipelines
- If these pipelines stall, then the processor speed goes down
- Need to understand how the ALU pipeline works
- Learn to use the pipeline viewer
- May get a different answer for floating-point and integer operations
3 Register File and COMPUTE Units
4 Simple Example: IIR Biquad
S0, S1, S2
- For (Stages 0 to 3) Do
- S0 = Xin * H5 + S2 * H3 + S1 * H4
- Yout = S0 * H0 + S1 * H1 + S2 * H2
- S2 = S1
- S1 = S0
Not a great bit of IIR code, as it can't be used as is in a loop on an array of values, which is really necessary
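The slide's pseudocode can be turned into plain C to give a reference answer that the assembly can later be tested against. Below is a minimal sketch. It assumes a hypothetical `BiquadStage` struct that holds H0..H5 as `H[0]..H[5]` and keeps separate S1/S2 state for each stage. It also assumes that "For (Stages 0 to 3)" means four stages in cascade, with each stage's Yout feeding the next stage's Xin.

```c
#include <assert.h>

/* Hypothetical layout: the slide's coefficients H0..H5 stored as H[0]..H[5],
   plus the two delayed state values S1 and S2 kept separately for each stage. */
typedef struct {
    float H[6];
    float S1, S2;
} BiquadStage;

/* One biquad stage, following the four pseudocode steps on the slide. */
static float biquad_stage(BiquadStage *st, float Xin) {
    float S0 = Xin * st->H[5] + st->S2 * st->H[3] + st->S1 * st->H[4];
    float Yout = S0 * st->H[0] + st->S1 * st->H[1] + st->S2 * st->H[2];
    st->S2 = st->S1;  /* shift the delay line: S2 <- S1 */
    st->S1 = S0;      /*                       S1 <- S0 */
    return Yout;
}

/* Assumption: "Stages 0 to 3" means four stages in cascade. */
#define NUM_STAGES 4

/* Run one input sample through the cascade; each stage's Yout is the next Xin. */
static float iir_cascade(BiquadStage stages[], int num_stages, float Xin) {
    assert(num_stages > 0);
    float x = Xin;
    for (int k = 0; k < num_stages; ++k)
        x = biquad_stage(&stages[k], x);
    return x;
}
```

Because the state lives inside each stage, this version can be called once per sample in a loop over an array. The slide notes that its own version cannot be used that way.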
5 Set up the tests. We want to make sure we still get the correct answer as the code changes
#include <EmbeddedUnit/EmbeddedUnit.h>
#include <EmbeddedUnit/CommonTests.h>
#include <EmbeddedUnit/EmbeddedTests.h>
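As the assembly changes, the order of the floating-point operations may change, and the last bits of a float result can then change too. Tests that compare floats with `==` can therefore break even when the code is correct. Below is a sketch of a tolerance-based comparison. The `floats_close` helper is hypothetical and is not part of EmbeddedUnit, whose own macros are not reproduced here.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper (not an EmbeddedUnit macro): compare two floats using a
   relative tolerance, with an absolute floor so values near 0.0 still compare. */
static bool floats_close(float expected, float actual, float rel_tol, float abs_tol) {
    assert(rel_tol >= 0.0f && abs_tol >= 0.0f);
    float diff = fabsf(expected - actual);
    float a = fabsf(expected), b = fabsf(actual);
    float scale = (a > b) ? a : b;   /* larger magnitude sets the relative scale */
    return diff <= abs_tol || diff <= rel_tol * scale;
}
```

A test would then check `floats_close(cResult, asmResult, 1e-6f, 1e-8f)` rather than exact equality. The tolerances shown here are illustrative and should be chosen for the actual filter.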
6 Step 1: Stub plus return value
- Build an assembly-language stub for float iirASM(void)
- Make it return the floating-point value 40.5, to show that we can return a float value
J8 is an INTEGER register, so how can we return 40.5? ANSWER: WE DON'T. We return the 32-bit pattern for 40.5, which J8 holds just like any other INTEGER bit pattern
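"Returning the bit pattern" means reinterpreting the float's 32 bits as an integer, not converting its value. Below is a small C sketch, assuming 32-bit IEEE-754 single-precision floats. It shows that 40.5 is the pattern 0x42220000.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE-754 single. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Reinterpret a float's 32 bits as an unsigned integer (no value conversion).
   memcpy is the portable C way to do this without breaking aliasing rules. */
static uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The reverse: treat a 32-bit integer pattern as a float. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

As an integer, 0x42220000 is 1109524480. A value conversion such as `(float)0x42220000u` would give 1109524480.0, not 40.5. Only the bit-level reinterpretation gets 40.5 back.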
7 Code does not work when passing back floats in the J8 register
We are passing back 40.5 in the normal return register, but that is obviously NOT what the C compiler was expecting. Wrong calling convention
8 Code does work when using the XR8 register. NOTE: XR8, NOT XFR8
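Why J8 fails and XR8 works can be pictured with a toy model. The C caller looks for a float result in one agreed register, and the bits must be left there. The sketch below is for illustration only: `ToyRegs` and the stub functions are hypothetical. The model encodes only the rule stated on these slides, that the float result is read from XR8. It is not a description of the full TigerSHARC calling convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model: two 32-bit registers named after the slides. */
typedef struct {
    uint32_t J8;
    uint32_t XR8;
} ToyRegs;

static const uint32_t BITS_40_5 = 0x42220000u; /* IEEE-754 single bit pattern of 40.5 */

/* Slide 7: the stub leaves the bit pattern in J8 (not where the caller looks). */
static void stub_returns_in_J8(ToyRegs *r) { r->J8 = BITS_40_5; }

/* Slide 8: the stub leaves the bit pattern in XR8 (where the caller looks). */
static void stub_returns_in_XR8(ToyRegs *r) { r->XR8 = BITS_40_5; }

/* The caller reinterprets whatever bits are in XR8 as the float result. */
static float caller_reads_float(const ToyRegs *r) {
    float f;
    memcpy(&f, &r->XR8, sizeof f);
    return f;
}
```

In the J8 case the caller reads whatever was already in XR8. The integer register and the bit pattern are both fine; they are just in the wrong place.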
9 Step 2: Using C code as comments -- set up the coefficients
XFR0 = 0.0 DOES NOT EXIST as a float instruction. XR0 = 0.0 DOES EXIST. Bit patterns require integer X registers. Leave what you wanted to do behind as comments
10 ARCHITECTURAL ISSUES: DON'T NEED SPECIAL FLOAT CONSTANT INSTRUCTIONS. Initialize X registers to float values via integer operations (XR =), then use XFR float operations. What I want to do is left behind as comments for the stranger reading my code next week (ME)
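The same idea can be sketched in C: coefficients are held as raw integer bit patterns, like an integer `XR = constant` load, and are only treated as floats when used. The intended float value of each pattern is left behind as a comment. The table below is hypothetical, not the slides' actual coefficients. It assumes 32-bit IEEE-754 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical coefficient table held as integer bit patterns.
   The float each pattern stands for is left behind as a comment. */
static const uint32_t coeff_bits[] = {
    0x00000000u, /* 0.0f  (all-zero bits are +0.0, so XR0 = 0 works) */
    0x3F800000u, /* 1.0f  */
    0x3F000000u, /* 0.5f  */
    0xBF800000u, /* -1.0f */
    0x42220000u, /* 40.5f */
};

#define NUM_COEFFS (sizeof coeff_bits / sizeof coeff_bits[0])

/* "Then use XFR float operations": reinterpret each pattern as a float. */
static void load_coeffs(float out[NUM_COEFFS]) {
    for (size_t i = 0; i < NUM_COEFFS; ++i)
        memcpy(&out[i], &coeff_bits[i], sizeof out[i]);
}
```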
11 Modify the C code so that it can be translated into assembly code
Can only have 1 instruction per line. Code must execute sequentially, so remember the
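Below, the slide-4 biquad stage is rewritten so that each C statement does exactly one operation. This maps more directly onto one assembly instruction per line. In this sequential form, Yout must use the old S1 and S2, so the two state updates have to come last. The scratch value `T` is hypothetical, standing in for a spare register. H0..H5 are written as `H[0]..H[5]`.

```c
#include <assert.h>

/* Slide-4 biquad stage, one operation per statement.
   *S1 and *S2 are the stage's delayed states; T is a scratch value. */
static float biquad_one_op_per_line(float Xin, const float H[6], float *S1, float *S2) {
    float S0, Yout, T;

    S0 = Xin * H[5];     /* S0 = Xin * H5 ...            */
    T = *S2 * H[3];      /*      ... + S2 * H3           */
    S0 = S0 + T;
    T = *S1 * H[4];      /*      ... + S1 * H4           */
    S0 = S0 + T;

    Yout = S0 * H[0];    /* Yout = S0 * H0 ...           */
    T = *S1 * H[1];      /*      ... + S1 * H1 (old S1)  */
    Yout = Yout + T;
    T = *S2 * H[2];      /*      ... + S2 * H2 (old S2)  */
    Yout = Yout + T;

    *S2 = *S1;           /* only after every use of the old S2 */
    *S1 = S0;            /* only after every use of the old S1 */
    return Yout;
}
```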
12 Start with the S0 = Xin instruction
Can't use XFR8 = XFR6 to copy a register
13 Since XFR8 = XFR6 is not allowed, try XR8 = R6
SIMD: Single Instruction Multiple Data. SISD: Single Instruction Single Data. R6 means move XR6 and YR6 (a multiple-data move described in 1 instruction). Try XR8 = XR6 (integer bit-pattern move)
New TigerSHARC architecture issues: SIMD versus SISD
14 Some operations are FLOAT operations and must have XFR on the left side of the equation, BUT only R on the right. Some operations are SISD operations and must have XR on both sides of the equation (or just R on both sides, making them SIMD, with X and Y both executing and garbage happening on Y). Personally, I think all these problems are assembler issues, and the syntax could be made consistent
15 What we have learnt
- TigerSHARC has both SISD (single data) and SIMD (multiple data) ability
- XFR4 = R4 * R5
  The answer (left) is single data, so the SISD choice is taken on the right: read XR4 and XR5 (bit patterns), treat them as floats when doing the multiplication (F on left), and store (the bit pattern of the answer) in XR4
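The steps for XFR4 = R4 * R5 can be modelled in C. The register file stores only 32-bit patterns, and the "F" tells the compute block to interpret them as floats for this one operation. This is a sketch that assumes IEEE-754 single precision. `xr4` and `xr5` are hypothetical stand-ins for the X halves of R4 and R5.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret helpers: bits <-> float with no value conversion. */
static float as_float(uint32_t bits) { float f; memcpy(&f, &bits, sizeof f); return f; }
static uint32_t as_bits(float f)     { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Model of XFR4 = R4 * R5 (SISD, X compute block only):
   read both X bit patterns, multiply them as floats, store the result's bits in XR4. */
static void xfr4_eq_r4_times_r5(uint32_t *xr4, uint32_t xr5) {
    float product = as_float(*xr4) * as_float(xr5);
    *xr4 = as_bits(product);
}
```

For example, XR4 holding 0x40000000 (2.0) and XR5 holding 0x40A00000 (5.0) leave 0x41200000 (10.0) in XR4.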
16 What we have learnt
- TigerSHARC has both SISD (single data) and SIMD (multiple data) ability
- SISD: XR4 = XR5
  Move X part of R5 register into X part of R4 register
- XR4 = YR5
  Move Y part of R5 register into X part of R4 register
- SIMD: XYR4 = R5
  Move X part of R5 register into X part of R4 register and Y part of R5 register into Y part of R4 register
- R4 = R5
  Shorthand version of XYR4 = R5, to confuse you
- Does YXR4 = R5 also exist? Move X part of R5 register into Y part of R4 register and Y part of R5 register into X part of R4 register
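The difference between the SISD and SIMD moves can be shown with a toy model in which each Rn register has an X-block half and a Y-block half. The `Reg` struct and function names below are hypothetical. They model only the three moves the slide confirms. YXR4 = R5 is left out, because the slide leaves open whether it exists.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each Rn register has an X-block half and a Y-block half. */
typedef struct {
    uint32_t x, y;
} Reg;

/* SISD  XR4 = XR5 : X half of R5 into X half of R4 (Y half of R4 untouched). */
static void mov_xr_xr(Reg *r4, Reg r5) { r4->x = r5.x; }

/* SISD  XR4 = YR5 : Y half of R5 into X half of R4 (Y half of R4 untouched). */
static void mov_xr_yr(Reg *r4, Reg r5) { r4->x = r5.y; }

/* SIMD  XYR4 = R5 (shorthand R4 = R5) : both halves copied by one instruction. */
static void mov_xyr(Reg *r4, Reg r5) { r4->x = r5.x; r4->y = r5.y; }
```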
17 Disconnect from the target and go to the simulator
18 Activate the simulator
19 Rebuild the project and set breakpoints at the start and end of the ASM code
20 Activate the pipeline viewer
21 Adjust the pipeline window so that you can see all the instruction pipeline stages
Have just located an arrow icon which causes the pipeline window to fill the screen all the way across
22 PIPELINE STAGES (see page 8-34 of the Processor manual)
- 10 pipeline stages, but they may be completely desynchronized (happen semi-independently)
- Instruction fetch -- F1, F2, F3 and F4
- Integer ALU -- PreDecode, Decode, Integer, Access
- Compute Block -- EX1 and EX2
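An idealized model shows why an instruction spends many cycles in a 10-stage pipeline even when throughput is one instruction per cycle. Under the assumption that the stages advance in lock-step with no stalls, the first instruction takes 10 cycles to reach the end, and each later one finishes one cycle after the one before. The slide warns that the real stages can be desynchronized, so this is only for intuition. The stage names come from the slide; the function is hypothetical.

```c
#include <assert.h>

/* The ten stages listed on the slide, in order. */
typedef enum {
    STAGE_F1, STAGE_F2, STAGE_F3, STAGE_F4,   /* instruction fetch  */
    STAGE_PD, STAGE_D, STAGE_I, STAGE_A,      /* integer ALU        */
    STAGE_EX1, STAGE_EX2,                     /* compute block      */
    NUM_PIPE_STAGES
} PipeStage;

/* Idealized lock-step pipeline (assumption: no stalls, one instruction enters per cycle).
   The first instruction needs num_stages cycles; each later one finishes 1 cycle after. */
static int ideal_cycles(int num_instructions, int num_stages) {
    assert(num_instructions > 0 && num_stages > 0);
    return num_stages + num_instructions - 1;
}
```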
23 PIPELINE STAGES (see page 8-34 of the Processor manual)
- Instruction fetch -- F1, F2, F3 and F4
- Fetch Unit Pipe
- Memory driven, not instruction driven
- 128 bits fetched may make up 1, 2, 3 or 4 instruction lines (or parts of a couple of instruction lines)
- Instructions are fetched into the IAB (instruction alignment buffer)
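The point that one 128-bit fetch need not match instruction-line boundaries can be illustrated by counting fetches for a sequence of lines. This sketch makes several assumptions: each instruction occupies one 32-bit slot, so one fetch brings in 4 slots; a line holds 1 to 4 instructions; and the code starts on a 128-bit boundary. Instructions that need extra words are ignored. `count_fetches` is hypothetical.

```c
#include <assert.h>

/* Assumption: one instruction = one 32-bit slot, so one 128-bit fetch = 4 slots. */
#define SLOTS_PER_FETCH 4

/* line_slots[i] = number of instructions in line i (1..4).
   Returns how many 128-bit fetches bring in all the lines, starting from an aligned
   address; *straddles counts lines split across two fetches. */
static int count_fetches(const int *line_slots, int num_lines, int *straddles) {
    assert(num_lines > 0);
    int offset = 0;   /* slot index from the start of the code */
    int cross = 0;
    for (int i = 0; i < num_lines; ++i) {
        assert(line_slots[i] >= 1 && line_slots[i] <= SLOTS_PER_FETCH);
        int first = offset / SLOTS_PER_FETCH;
        int last = (offset + line_slots[i] - 1) / SLOTS_PER_FETCH;
        if (first != last)
            ++cross;   /* this line is "parts of a couple of" fetches */
        offset += line_slots[i];
    }
    *straddles = cross;
    return (offset + SLOTS_PER_FETCH - 1) / SLOTS_PER_FETCH;
}
```

For example, two 3-instruction lines need two fetches, and the second line is split across them. This is the situation the IAB exists to handle.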
24 PIPELINE STAGES (see page 8-34 of the Processor manual)
- Integer ALU pipe -- PD, D, I and A
- PreDecode: the next COMPLETE instruction line (1, 2, 3 or 4 instructions) is fetched from the IAB
- Decode: different instructions are dispatched to different execution units (J-IALU, K-IALU, Compute Blocks)
- Data memory access starts in the Integer stage
- A stands for Access stage
- Results are not available until the EX2 stage, but (by register forwarding) can sometimes be accessed earlier
25 PIPELINE STAGES (see page 8-34 of the Processor manual)
- Compute Block
- EX1 and EX2
- Result is always written to the target register on the rising edge of CCLK after stage EX2
- Multiple uses of a register (read and store) in one instruction line are guaranteed to pipeline correctly, as in:
- R2 = R0 + R1; R6 = R2 * R3;;
  R2 is updated at the end of the instruction line; R6 uses the R2 value from the beginning of the instruction line
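The "read at the start, write at the end" rule for one instruction line can be modelled in C. All reads use a snapshot taken at the beginning of the line, and all results are committed together at the end. In this sketch the register file is a hypothetical array of 8 integers, and the operators are only an illustration of the example line. The point is the ordering, not the arithmetic.

```c
#include <assert.h>
#include <string.h>

#define NUM_REGS 8

/* Model of one instruction line: R2 = R0 + R1; R6 = R2 * R3;;
   Every read sees the values from the START of the line; both writes are
   committed together at the END, so R6 uses the old R2. */
static void run_line(int regs[NUM_REGS]) {
    int old[NUM_REGS];
    memcpy(old, regs, sizeof old);   /* snapshot: values at the beginning of the line */
    int new_r2 = old[0] + old[1];
    int new_r6 = old[2] * old[3];    /* reads the OLD R2 */
    regs[2] = new_r2;                /* commit both results at the end of the line */
    regs[6] = new_r6;
}
```

With R0 = 1, R1 = 2, R2 = 10 and R3 = 3, the line leaves R2 = 3 and R6 = 30. Written as two ordinary sequential C statements, R6 would instead be 3 * 3 = 9.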
26 Only interested in the later stages of the pipeline. Adjust the properties
27 Run the code until the first ASM breakpoint. Note down the cycle number: 39830
Then run again until reaching the second ASM breakpoint. Calculate the execution time
An instruction is in the pipeline for a long time before the simulator stops
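The execution-time calculation is the difference between the two breakpoint cycle counts, converted to time if a clock rate is known. Below is a small sketch. The slide gives only the first count (39830). The end count and the 250 MHz clock in the example are hypothetical values chosen for illustration.

```c
#include <assert.h>

/* Elapsed cycles between two breakpoints, from the simulator's cycle counter. */
static long cycles_between(long start_cycle, long end_cycle) {
    assert(end_cycle >= start_cycle);
    return end_cycle - start_cycle;
}

/* Convert cycles to nanoseconds for a core clock given in MHz.
   The clock rate must be supplied by the user; the slides do not give one. */
static double cycles_to_ns(long cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return (double)cycles * 1000.0 / clock_mhz;
}
```

For example, a hypothetical second count of 39856 gives 26 cycles, which at a hypothetical 250 MHz clock is 104 ns.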
28 Pipeline during code execution
29 Pipeline viewer says 26 cycles, but what do we expect to get from our code?
(Screenshot: instructions in this part of the code numbered 1 to 8)
8 cycles in this part of the code, as we expect 1 instruction per clock cycle
30 Pipeline viewer says 26 cycles, but what do we expect? 21
About 20% error in timing. Too much! Where are the extra cycles coming from? How easy is it to code in such a way that the extra cycles can be removed? ANSWER: Fairly straightforward to fix in principle, can be difficult in practice
(Screenshot: instructions in this part of the code numbered 1 to 13)
Again, 1 instruction / cycle expected: 13 cycles expected here, plus 8 from before = 21
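One common source of extra cycles is a dependency stall: an instruction has to wait for an earlier instruction's result. The toy model below makes this concrete. It assumes an in-order machine that would otherwise issue one instruction per cycle, with a single hypothetical result `latency`. This is not TigerSHARC's documented timing. The model does not establish where the measured 26 cycles actually come from; the pipeline viewer is the tool for that.

```c
#include <assert.h>

#define MAX_INSTRS 64

/* Each instruction names up to two earlier instructions whose results it reads
   (index of the producer, or -1 for none). */
typedef struct {
    int src1, src2;
} Instr;

/* Toy in-order model (assumption): one issue per cycle, but an instruction cannot
   issue until each producer's result is ready `latency` cycles after that producer
   issued. Returns the cycle count from the first issue to just after the last. */
static int issue_cycles(const Instr *code, int n, int latency) {
    assert(n > 0 && n <= MAX_INSTRS && latency >= 1);
    int issue[MAX_INSTRS];
    for (int i = 0; i < n; ++i) {
        int t = (i == 0) ? 0 : issue[i - 1] + 1;   /* next free issue slot */
        const int srcs[2] = { code[i].src1, code[i].src2 };
        for (int k = 0; k < 2; ++k) {
            assert(srcs[k] < i);                     /* producers must come earlier */
            if (srcs[k] >= 0 && issue[srcs[k]] + latency > t)
                t = issue[srcs[k]] + latency;        /* stall until the result is ready */
        }
        issue[i] = t;
    }
    return issue[n - 1] + 1;
}
```

With `latency` set to 1, every instruction issues back to back, which is the "1 instruction / cycle" assumption behind the expected 8 + 13 = 21. With a longer latency, each back-to-back dependent pair adds stall cycles. Reordering independent instructions into those gaps is the usual fix.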