Title: Generating a software loop with memory accesses
1 Generating a software loop with memory accesses
- TigerSHARC assembly syntax
2 Concepts
- Learning just enough TigerSHARC assembly code to make a software loop work
- Comparing the timings for rectification of integer and floating point arrays, using
  - Debug C code
  - Release C code
  - Our FIRST_ASM code
- Looking in MIXED mode at the code generated by the compiler
3 Passing integer rectify
4 Add the ASM tests
- Want the link to fail to find the mangled name
- Name-mangled function name
5 More detailed look at the code
- As with 68K, needs a .section, but the name and format are different
- As with 68K, needs a .align statement. Is the 4 in bytes (8 bits) or words (32 bits)?
- As with 68K, needs a .global to tell other code that this function exists
- Single semi-colons
- Double semi-colons
- Start function label
- End function label
- Label format is similar to 68K, but needs a leading underscore and a final colon
6 Using J8 for returned int value
- Now passing this test by accident
- Should be conditionally passing back NULL
7 Parameter passing
- Spaces for the first four parameters are present on the stack (as with 68K)
- But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS)
- The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) when assembly code functions call assembly code functions
- J4, J5, J6 and J7 are volatile registers
8 Coding convention
- // int HalfWaveRectifyRelease(int initial_array,
- //                            int final_array, int N)
- #define initial_pt_inpar1 J4
- #define final_pt_inpar2 J5
- #define M_J6_inpar3 J6
- #define return_pt_J8 J8
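
For reference while reading the register names above, here is a minimal C sketch of the behaviour the assembly routine is meant to reproduce. It is not the course's source: the name HalfWaveRectifySketch, the pointer return type, and the choice to return NULL when N <= 0 are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical C reference model (assumed, not the original course code).
 * Copies initial_array into final_array, replacing negative values with 0.
 * Returns final_array, or NULL when N <= 0 (assumed error convention). */
int *HalfWaveRectifySketch(const int initial_array[], int final_array[], int N)
{
    if (N <= 0) {
        return NULL;
    }
    for (int i = 0; i < N; i++) {
        /* Half-wave rectification: keep non-negative values, zero the rest. */
        final_array[i] = (initial_array[i] < 0) ? 0 : initial_array[i];
    }
    return final_array;
}
```

In the assembly version, the three parameters would arrive in J4, J5 and J6, and the returned value would be placed in J8.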
9 ELSE is a KEYWORD
- Missing ;; means all these instructions are joined into 1 line of more than 4 instructions
- Note END_IF is not defined and not yet recognized as an error
10 Personally, because of name mangling issues, I cut-and-paste the function name into labels
Two issues:
- Jumps can be predicted to happen (default)
- Quad stuff issue
11 The code was not exactly what we designed (C equivalent): refactor, and retest after the refactoring
NEXT STEP
12 For loop structure: use 68K style of looping jumps
13 For loop structure: use 68K style of looping tests and jumps
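
The test-and-jump structure can be sketched in C with explicit labels and goto, which maps almost line for line onto assembly labels and conditional jumps. This is only an illustration: the function name and the label names are hypothetical and do not come from the original code.

```c
/* Hypothetical sketch: a counted loop written as explicit tests and jumps,
 * mirroring how the loop would be laid out with assembly labels. */
void RectifyWithTestAndJump(const int in[], int out[], int N)
{
    int count = 0;
loop_test:
    if (count >= N) goto loop_end;       /* test at the top: exit when done */
    if (in[count] < 0) goto store_zero;  /* conditional jump on the sign */
    out[count] = in[count];              /* non-negative: copy unchanged */
    goto next;
store_zero:
    out[count] = 0;                      /* negative: rectify to zero */
next:
    count++;
    goto loop_test;                      /* jump back to the test */
loop_end:
    return;
}
```

Testing before the body means the loop does nothing when N is zero or negative.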
14 Accessing memory
- Basic mode
- Special register J31 acts as zero when used in additions
- Pt_J5 is a pointer register into an array
- Read_J1 is being used as a data register
- J registers are like MIPS registers (used as pointer and data), NOT like 68K registers (either data or address, but not both)
- Read_J1 = [Pt_J5] reads the value from the memory location pointed to by J5 -- Compare to 68K MOVE.L (A5), D1
- Read_J1 = [Pt_J5 + 8] reads the value from the memory location pointed to by the value (J5 + 8) -- Compare to 68K MOVE.L 8(A5), D1. PRE-MODIFY: address used is J5 + 8, no change in J5
- Read_J1 = [Pt_J5 + J31] reads the value from the memory location pointed to by J5, but I read somewhere that this CAN be faster than just Read_J1 = [Pt_J5] -- NEED TO CONFIRM
15 Accessing memory step 2
- Basic mode
- Pt_J5 is a pointer register into an array
- Offset_J4 is used as an offset
- Read_J1 is being used as a data register
- Read_J1 = [Pt_J5 + Offset_J4] reads the value from the memory location pointed to by (J5 + J4)
  - PRE-MODIFY: address used is J5 + J4, no change in J5
  - Compare to 68K MOVE.L (A5, D4), D1
- Read_J1 = [Pt_J5 += Offset_J4] reads the value from the memory location pointed to by J5, and then performs the add
  - POST-MODIFY: address used is J5, then perform J5 = J5 + J4
  - Compare to 68K MOVE.L (A5), D1 followed by ADD.L A4, A5, but as a single instruction
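
The difference between PRE-MODIFY and POST-MODIFY accesses can be sketched with C pointers. This is a hedged analogy rather than a description of the hardware: the function names are hypothetical, and C pointer arithmetic counts in array elements, which may not match the units the assembler uses for offsets.

```c
/* Hypothetical sketch of PRE-MODIFY: the address used is pt + offset,
 * and the pointer itself is left unchanged. */
int ReadPreModify(const int *pt, int offset)
{
    return *(pt + offset);
}

/* Hypothetical sketch of POST-MODIFY: the address used is the current
 * pointer value, and the pointer is advanced by offset afterwards. */
int ReadPostModify(const int **pt_ref, int offset)
{
    int value = **pt_ref;   /* read from the unmodified address first */
    *pt_ref += offset;      /* then update the pointer */
    return value;
}
```

Calling ReadPostModify repeatedly with an offset of 1 walks through an array one element at a time, which is the pattern a software loop over an array needs.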
16 Many other addressing modes
- Normal memory accesses
- Merged memory accesses
- Broadcast memory accesses
- Single register accesses
- Dual register accesses
- Quad register accesses
- Cross-over accesses
- Access of COMPLEX numbers
17 For loop structure: use 68K style of looping
QUAD ERROR ISSUE AGAIN
18 Write the float-asm
- Integer 0 has bit pattern 0x0000 0000
- Float 0.0 has bit pattern 0x0000 0000
- Integer has format b S??? ???? ???? ???? ???? ???? ???? ????
- Float has format b S??? ???? ???? ???? ???? ???? ???? ???? (the EXPONENT field follows the sign bit S)
- Float algorithm: if S = 1 (negative), set to zero
  - Otherwise leave unchanged: same as the integer algorithm
- Just re-use the integer algorithm with a change of name
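
The argument that the float version can re-use the integer algorithm can be checked with a small C sketch that looks only at the raw bit pattern. It assumes a 32-bit IEEE 754 float; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: rectify a float by testing only its sign bit,
 * exactly as the integer algorithm would on the same 32-bit pattern.
 * Assumes float is a 32-bit IEEE 754 value. */
float RectifyFloatViaBits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);    /* reinterpret the bit pattern */
    if (bits & 0x80000000u) {              /* S = 1: negative */
        bits = 0x00000000u;                /* same pattern as float 0.0 */
    }
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because only the sign bit is examined, the exponent and mantissa never need to be decoded.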
19 Float ASM test
20 Do the timing tests
21 Weird results
- Variation of about 6 cycles in testing
- Our first ASM is faster than debug and slower than release: that was expected
- Our integer code was slower than our float code: that was unexpected, since it is the same code
- Can we optimize and improve the timing?

          DEBUG      RELEASE    FIRST_ASM
INTEGER   426, 416   124, 118   316, 320
FLOAT     462, 458   210, 216   224, 222
22 Integer release code: identify new instructions
23 Float release: identify new instructions
24 Exercise 1 needed for Lab. 1
- FIR filter operation -- data and filter-coefficients are both integer arrays in C
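
As a starting point for the exercise, one output sample of an integer FIR filter can be sketched as a multiply-and-accumulate loop. The function name, the parameter names, and the convention that data[0] holds the newest sample are assumptions for illustration; the lab may define these differently.

```c
/* Hypothetical sketch: one output sample of an integer FIR filter.
 * data[0] is assumed to be the newest sample, data[1] the one before it,
 * and so on; coeffs[i] is the coefficient applied to data[i]. */
int FIRFilterSketch(const int data[], const int coeffs[], int num_taps)
{
    int sum = 0;
    for (int i = 0; i < num_taps; i++) {
        sum += data[i] * coeffs[i];   /* multiply and accumulate */
    }
    return sum;
}
```

The loop body is one memory read from each array plus a multiply and an add, the same software loop with memory accesses pattern used in the rectification code.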
25 Exercise 1 needed for Lab. 1
- FIR filter operation -- data and filter-coefficients are both integer arrays in ASM
26 Insert C code
27 Insert assembler code version
28 Concepts
- Learning just enough TigerSHARC assembly code to make a software loop work
- Comparing the timings for rectification of integer and floating point arrays, using
  - Debug C code
  - Release C code
  - Our FIRST_ASM code
- Looking in MIXED mode at the code generated by the compiler