Understanding the TigerSHARC ALU pipeline

Slides: 32
Provided by: Micha1179

Transcript and Presenter's Notes

1
Understanding the TigerSHARC ALU pipeline
  • Determining the speed of one stage of IIR filter
    Part 1: Getting code to work

2
Understanding the TigerSHARC ALU pipeline
  • TigerSHARC has many pipelines
  • If these pipelines stall then the processor
    speed goes down
  • Need to understand how the ALU pipeline works
  • Learn to use the pipeline viewer
  • The answer may be different for floating point and
    integer operations

3
Register File and COMPUTE Units
4
Simple Example: IIR -- Biquad
State variables: S0, S1, S2
  • For (Stages 0 to 3) Do
  •   S0 = Xin * H5 + S2 * H3 + S1 * H4
  •   Yout = S0 * H0 + S1 * H1 + S2 * H2
  •   S2 = S1
  •   S1 = S0

Not a great bit of IIR code, as it can't be used
in a loop on an array of values, which is what is
really necessary
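
The biquad update above can be sketched in C so that it does run in a loop over an array of values. This is a minimal sketch, not the presenter's code: it assumes the operators lost from the slide are the usual products-summed form, it handles a single stage rather than the slide's four, and the names `BiquadStage`, `biquad_step` and `biquad_filter` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One biquad stage: two saved states and six coefficients H0..H5. */
typedef struct {
    float s1;
    float s2;
    float h[6];
} BiquadStage;

/* Process one input sample and update the stage's state in place. */
float biquad_step(BiquadStage *stage, float xin) {
    assert(stage != NULL);
    /* Assumed operators: each operand pair multiplied, products summed. */
    float s0 = xin * stage->h[5] + stage->s2 * stage->h[3] + stage->s1 * stage->h[4];
    float yout = s0 * stage->h[0] + stage->s1 * stage->h[1] + stage->s2 * stage->h[2];
    stage->s2 = stage->s1;  /* S2 = S1 */
    stage->s1 = s0;         /* S1 = S0 */
    return yout;
}

/* Filter an array of samples by calling the single-sample update repeatedly. */
void biquad_filter(BiquadStage *stage, const float *in, float *out, size_t count) {
    for (size_t i = 0; i < count; i++) {
        out[i] = biquad_step(stage, in[i]);
    }
}
```

Keeping S1 and S2 in a struct, rather than as locals, is what lets the state survive from one sample to the next.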
5
Set up the tests. We want to make sure we still get
the correct answer as the code changes
#include <EmbeddedUnit/EmbeddedUnit.h>
#include <EmbeddedUnit/CommonTests.h>
#include <EmbeddedUnit/EmbeddedTests.h>
6
Step 1: Stub plus return value
  • Build an assembly language stub for float
    iirASM(void)
  • Make it return a floating point value of 40.5 to
    show that we can return a floating point value

J8 is an INTEGER register, so how can we return
40.5? ANSWER: WE DON'T. We return the bit
pattern for 40.5, which is just a 32-bit pattern
like any INTEGER bit pattern
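
The idea that 40.5 is handed back as a bit pattern can be checked on any host machine. The sketch below is plain C, not TigerSHARC code; it assumes `float` is a 32-bit IEEE-754 single, and the helper names `float_bits` and `bits_to_float` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the 32-bit pattern of a float, as an integer register would hold it. */
uint32_t float_bits(float value) {
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));  /* assumption: 32-bit float */
    memcpy(&bits, &value, sizeof bits);         /* copies bytes, no numeric conversion */
    return bits;
}

/* Reinterpret a 32-bit pattern as a float. */
float bits_to_float(uint32_t bits) {
    float value;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under that assumption, `float_bits(40.5f)` gives `0x42220000`: sign 0, biased exponent 132, and fraction bits `010001` followed by zeros.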
7
Code does not work when passing back floats with
the J8 register
We are passing back 40.5 in the normal return
register, but that is obviously NOT what the C
compiler was expecting. Wrong coding convention
8
Code does work when using the XR8 register. NOTE:
NOT XFR8
9
Step 2: Using C code as comments -- set up the
coefficients
XFR0 = 0.0 DOES NOT EXIST as a float
instruction. XR0 = 0.0 DOES EXIST. Bit patterns
require integer X registers. Leave what you
wanted to do behind as comments
10
ARCHITECTURAL ISSUES: DON'T NEED SPECIAL
FLOAT CONSTANT INSTRUCTIONS. Initialize X
registers to float values via integer operations
(XR). Then use XFR float operations. What I
want to do is left behind as comments for
the stranger reading my code next week (ME)
11
Modify C code so that it can be translated into
assembly code
Can only have 1 instruction per line. Code must
execute sequentially so remember the
12
Start with the S0 = Xin instruction
Can't use XFR8 = XFR6 to copy a register
13
Since XFR8 = XFR6 is not allowed, try XR8 = R6
SIMD: Single Instruction, Multiple Data. SISD:
Single Instruction, Single Data. R6 means move XR6
and YR6 (multiple data move described in 1
instruction). Try XR8 = XR6 (integer bit-pattern
move)
New TigerSHARC architecture issues: SIMD versus
SISD
14
Some operations are FLOAT operations and must
have XFR on the left side of the equation BUT only R on
the right. Some operations are SISD
operations and must have XR on both sides of
the equation (or just R on both sides of the
equation, making them SIMD: X and Y, with garbage
happening on Y). Personally, I think all these
problems are assembler issues and could be
made consistent
15
What we have learnt
  • TigerSHARC has both SISD (single data) and SIMD
    (multiple data) ability
  • XFR4 = R4 * R5: The answer (left) is single
    data, so the SISD choice is taken on the right:
    read XR4 and XR5 (bit patterns), treat them as floats
    when doing the multiplication (F on left), and store
    the bit pattern of the answer in XR4
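
The point that a register only holds a bit pattern, and that the operation decides how to interpret it, can be modelled on a host machine. This is a rough C model, not TigerSHARC code: `Reg32`, `as_float`, `as_bits`, `float_multiply` and `integer_multiply` are hypothetical names, and `float` is assumed to be a 32-bit IEEE-754 single.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a 32-bit register holds only a bit pattern. */
typedef uint32_t Reg32;

/* Interpret a register's bits as a float (assumes 32-bit IEEE-754 float). */
float as_float(Reg32 r) {
    float f;
    assert(sizeof(float) == sizeof(Reg32));
    memcpy(&f, &r, sizeof f);
    return f;
}

/* Store a float's bits back into a register. */
Reg32 as_bits(float f) {
    Reg32 r;
    assert(sizeof(float) == sizeof(Reg32));
    memcpy(&r, &f, sizeof r);
    return r;
}

/* Models a float multiply: operands read as bits, treated as floats. */
Reg32 float_multiply(Reg32 a, Reg32 b) {
    return as_bits(as_float(a) * as_float(b));
}

/* Models an integer multiply on the very same bit patterns, for contrast. */
Reg32 integer_multiply(Reg32 a, Reg32 b) {
    return a * b;
}
```

With the patterns for 2.0 (`0x40000000`) and 3.0 (`0x40400000`), `float_multiply` returns the pattern for 6.0 (`0x40C00000`), while `integer_multiply` on the same inputs produces an unrelated integer product.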

16
What we have learnt
  • TigerSHARC has both SISD (single data) and SIMD
    (multiple data) ability
  • SISD: XR4 = XR5. Move X part of R5 register
    into X part of R4 register
  • XR4 = YR5. Move Y part of R5 register
    into X part of R4 register
  • SIMD: XYR4 = R5. Move X part of R5 register
    into X part of R4 register and Y part of R5
    register into Y part of R4 register
  • R4 = R5. Shorthand version of XYR4 = R5,
    to confuse you
  • Does YXR4 = R5 also exist? Move X part of R5
    register into Y part of R4 register and Y part
    of R5 register into X part of R4 register
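
The SISD and SIMD moves listed above can be pictured with a small C model in which each register number has an X half and a Y half. This is only an illustration of the data movement; `RegPair` and the function names are hypothetical and say nothing about how the hardware or assembler implements the moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one register number: an X half and a Y half. */
typedef struct {
    uint32_t x;
    uint32_t y;
} RegPair;

/* SISD, models XR4 = XR5: only the X half of the destination changes. */
void move_x_from_x(RegPair *dst, const RegPair *src) {
    assert(dst != NULL && src != NULL);
    dst->x = src->x;
}

/* SISD, models XR4 = YR5: Y half of the source into X half of the destination. */
void move_x_from_y(RegPair *dst, const RegPair *src) {
    assert(dst != NULL && src != NULL);
    dst->x = src->y;
}

/* SIMD, models XYR4 = R5 (or R4 = R5): both halves move in one operation. */
void move_xy(RegPair *dst, const RegPair *src) {
    assert(dst != NULL && src != NULL);
    dst->x = src->x;
    dst->y = src->y;
}
```

In this model the SISD moves leave the destination's Y half untouched, which is the practical difference from the SIMD form.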

17
Disconnect from target and go to simulator
18
Activate Simulator
19
Rebuild the project and set breakpoints at start
and end of ASM code
20
Activate the pipeline viewer
21
Adjust the pipeline window so you can see all the
instruction pipeline stages
Have just located an arrow icon which causes the
pipeline window to fill the screen all the way
across
22
PIPELINE STAGES: See page 8-34 of Processor manual
  • 10 pipeline stages, but may be completely
    desynchronized (happen semi-independently)
  • Instruction fetch -- F1, F2, F3 and F4
  • Integer ALU -- PreDecode, Decode, Integer, Access
  • Compute Block -- EX1 and EX2

23
PIPELINE STAGES: See page 8-34 of Processor manual
  • Instruction fetch -- F1, F2, F3 and F4
  • Fetch Unit Pipe
  • Memory driven, not instruction driven
  • 128 bits fetched may make up 1, 2, 3, or 4
    instruction lines (or parts of a couple of
    instruction lines)
  • Instruction fetched into IAB, instruction
    alignment buffer

24
PIPELINE STAGES: See page 8-34 of Processor manual
  • Integer ALU pipe -- PD, D, I and A
  • PreDecode: the next COMPLETE instruction line
    (1, 2, 3 or 4 instructions) fetched from IAB
  • Decode: different instructions dispatched to
    different execution units (J-IALU, K-IALU,
    Compute Blocks)
  • Data memory access starts in the Integer stage
  • A stands for Access stage
  • Results are not available until the EX2 stage, but (by
    register forwarding) can sometimes be accessed
    earlier

25
PIPELINE STAGES: See page 8-34 of Processor manual
  • Compute Block
  • EX1 and EX2
  • Result is always written to the target register
    on the rising edge of CCLK after stage EX2
  • The following multiple use of a register (read and
    store) in one line is guaranteed to pipeline
    correctly
  • R2 = R0 op R1;  R6 = R2 op R3

R2 is updated at the end of the instruction line; the R2
value from the beginning of the instruction line is used
for R6
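
The read-old-value, write-at-end behaviour described on this slide can be modelled in C by reading every source operand before writing any result. This is a sketch under stated assumptions: the operators did not survive in the transcript, so `+` and `*` are chosen purely for illustration, and `RegFile` and `execute_line` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a small register file indexed by register number. */
typedef struct {
    int32_t r[8];
} RegFile;

/* Models one instruction line with two operations that share R2.
   The operators (+ and *) are illustrative assumptions. */
void execute_line(RegFile *rf) {
    assert(rf != NULL);
    /* Read every source first: these are the values at the start of the line. */
    int32_t r0 = rf->r[0];
    int32_t r1 = rf->r[1];
    int32_t r2_old = rf->r[2];
    int32_t r3 = rf->r[3];
    /* Write all results afterwards: the state at the end of the line. */
    rf->r[2] = r0 + r1;
    rf->r[6] = r2_old * r3;
}
```

Starting from R0 = 1, R1 = 2, R2 = 10 and R3 = 3, the model leaves R2 = 3 and R6 = 30, since R6 is computed from the old R2 (10) rather than the new one.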
26
Only interested in later stages of the pipeline.
Adjust properties
27
Run the code till the first ASM breakpoint. Note
down the cycle number: 39830
Then run again till you reach the second ASM
breakpoint. Calculate the execution time
Instruction is in the pipeline for a long time before
the simulator stops
28
Pipeline during code execution
29
Pipeline viewer says 26 cycles, but what do we
expect to get from our code?
Instructions 1 to 8:
8 cycles in this part of the code, as we expect 1
instruction per clock cycle
30
Pipeline viewer says 26 cycles, but what do we
expect -- 21
20% error in timing. Too much. Where are the extra
cycles coming from? How easy is it to code in
such a way that the extra cycles can
be removed? ANSWER: Fairly straightforward to
fix in principle, can be difficult in practice
Instructions 1 to 13:
Again, 1 instruction per cycle expected: 13 cycles
expected + 8 from before = 21
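
The cycle bookkeeping on this slide is simple arithmetic and can be written down as a few C helpers. This is a sketch with hypothetical function names; it assumes the slide's rough 20 figure is the extra cycles taken as a fraction of the measured 26, which is one possible reading.

```c
#include <assert.h>

/* Expected cycles if every instruction line issues in exactly one cycle. */
int expected_cycles(int lines_first_part, int lines_second_part) {
    return lines_first_part + lines_second_part;
}

/* Extra cycles: what the pipeline viewer measured minus what was expected. */
int stall_cycles(int measured, int expected) {
    return measured - expected;
}

/* Extra cycles as a percentage of the measured time (assumed definition). */
double overhead_percent(int measured, int expected) {
    assert(measured > 0);
    return 100.0 * (double)(measured - expected) / (double)measured;
}
```

With 8 and 13 instruction lines the expected count is 21, so the measured 26 cycles contain 5 extra cycles, a little over 19% of the measured time. Taken relative to the expected 21 cycles instead, the same 5 cycles would be closer to 24%.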
31
Understanding the TigerSHARC ALU pipeline
  • TigerSHARC has many pipelines
  • If these pipelines stall then the processor
    speed goes down
  • Need to understand how the ALU pipeline works
  • Learn to use the pipeline viewer
  • The answer may be different for floating point and
    integer operations