Title: Generating a software loop with memory accesses
1 Generating a software loop with memory accesses
- TigerSHARC assembly syntax
2 Concepts
- Learning just enough TigerSHARC assembly code to make a software loop work
- Comparing the timings for rectification of integer and floating point arrays, using
  - Debug C code
  - Release C code
  - Our FIRST_ASM code
- Looking in MIXED mode at the code generated by the compiler
3 Test Driven Development
- Work with the customer to check that the tests properly express what the customer wants done.
- Iterative process with the customer heavily involved: Agile methodology.
(Diagram labels: CUSTOMER, DEVELOPER)
4 Note: Special marker
Compiler optimization:
- FLOATS: 927 -> 304 -- THREE FOLD
- INTS: 960 -> 150 -- SIX FOLD
Why the difference, and can we do better, and do we want to?
Note the failures: what are they?
5 Write tests about passing values back from an assembly code routine
6 More detailed look at the code
As with 68K and Blackfin, needs a .section, but the name and format are different
As with 68K, need a .align statement. Is the 4 in bytes (8 bits) or words (32 bits)?
As with 68K, need .global to tell other code that this function exists
Single semi-colons. Double semi-colons
Start function label. End function label. Used for profiling code
Label format similar to 68K. Needs leading underscore and final colon
7 Return registers
- There are many, depending on what you need to return
- Here we need to use J8 as the return register to pass back an integer pointer
- Many registers available: need ability to control usage
  - J0 to J31 registers (integers and pointers) (SISD mode)
  - XR0 to XR31 registers (integers) (SISD mode)
  - XFR0 to XFR31 registers (floats) (SISD mode)
- Did I also mention
  - K0 to K31 registers (integers and pointers) (SISD mode)
  - YR0 to YR31, YFR0 to YFR31 (SIMD mode)
  - XYR, YXR and R registers (SIMD mode)
  - And also the MIMD modes
  - And the double registers and the quad registers.
- #define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register
8 Parameter passing
- SPACES for first four parameters ARE ALWAYS present on the stack (as with 68K)
- But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS and Blackfin)
- The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) as the first step when assembly code functions call assembly code functions
- J4, J5, J6 and J7 are volatile, non-preserved registers
9 Can we pass back the start of the final array?
Still passing tests by accident, and this needs to be a conditional return value
10 What we need to know based on experiences from other processors
- Can we return from an assembly language routine without crashing the processor?
- Return a parameter from assembly language routine
  - (Is it same for ints and floats?)
- Pass parameters into assembly language
  - (Is it same for ints and floats?)
- Do IF THEN ELSE statements
- Read and write values to memory
- Read and write values in a loop
- Do some mathematics on the values fetched from memory
- All this stuff is demonstrated by coding HalfWaveRectifyASM( )
11 Why is ELSE a keyword
- FOUR PART ELSE INSTRUCTION IS LEGAL
- IF JLT; ELSE, J1 = J2 + J3;     // Conditional execution -- if true
          ELSE, XR1 = XR2 + XR3;  // Conditional -- if true
          YFR1 = YFR2 + YFR3;;    // Unconditional -- always
- IF JLT; DO, J1 = J2 + J3;       // Conditional execution -- if true
          DO, XR1 = XR2 + XR3;    // Conditional -- if true
          YFR1 = YFR2 + YFR3;;    // Unconditional -- always
- Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements
12 Label name is not the problem
NOTE: This is C-like syntax, but it is not C
Statement must end in ;; not ;
ONE semicolon: end of instruction
TWO semicolons: end of parallel instruction line
13 Add dual-semicolons everywhere. Worry about multiple issues later
This dual semi-colon is so important that you MUST code review for it all the time, or else you waste so much time in the Lab. Key in exams / quizzes
At last, an error I know how to fix!
14 Well I thought I understood it !!!
- Speed issue: JUMP instructions can't be too close together when stored in memory
- Not normally a problem when the IF code is larger
15 Add a single instruction of 4 NOPs: nop; nop; nop; nop;; (TEMPORARY)
- Fix the last error as part of Assignment 1
Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1
Worry about code efficiency later (refactor) when all code is working
16 What we need to know based on experiences from other processors
- Can we return from an assembly language routine without crashing the processor?
- Return a parameter from assembly language routine
  - (Is it same for ints and floats?)
- Pass parameters into assembly language
  - (Is it same for ints and floats?)
- Do IF THEN ELSE statements
- Read and write values to memory
- Read and write values in a loop
- Do some mathematics on the values fetched from memory
- All this stuff is demonstrated by coding HalfWaveRectifyASM( )
17 Target: Changing this C code into assembly (to get more speed)
- Code we generated yesterday was similar to parts of this, but not equivalent.
- Re-factor the code to make the assembly code and C functionality equivalent
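A minimal C sketch of the kind of routine being targeted, useful as a reference when checking the assembly version against the tests. The function name HalfWaveRectifyC, the parameter names, and the exact contract (negative values become 0, the start of the final array is returned, NULL is returned when there are no points to process) are assumptions for illustration, not the course's actual code.

```c
#include <stddef.h>

/* Hypothetical C reference for half-wave rectification.
 * Assumed contract: copy each input value to the output array, replacing
 * negative values with 0. Return the start of the output array, or NULL
 * when numPoints is zero or negative (the conditional return value). */
int *HalfWaveRectifyC(const int initialArray[], int finalArray[], int numPoints)
{
    if (numPoints <= 0) {
        return NULL;                          /* nothing to process */
    }
    for (int count = 0; count < numPoints; count++) {
        int value = initialArray[count];      /* read value from memory */
        if (value < 0) {
            value = 0;                        /* rectify: negative becomes 0 */
        }
        finalArray[count] = value;            /* write value to memory */
    }
    return finalArray;                        /* start of the final array */
}
```

Each line of the loop body corresponds to one item on the "what we need to know" list: a memory read, an IF, and a memory write inside a loop.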
18 The code was not exactly what we designed (C equivalent): re-factor and retest after the re-factoring
NEXT STEP
19 Refactored C code
I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE TO OPTIMIZE THIS PARTICULAR CODE BIT
USE: IF TRUE, EXECUTE THIS STATEMENT (SINGLE LINE)
Avoiding JUMPS in the main flow of the code will speed the flow of the code
Almost right. SYNTAX ERROR. Look in the manual to find the correct syntax:
IF NJLE; DO, J8 = 0;;
20 No syntax errors (no CODE ERRORS). Code does not work (CODE DEFECTS)
We don't have enough code to pass all the tests, but we are failing tests we did not expect to fail
21 Run forensic tests to find out where the DEFECT is being introduced
Identify mistake by removing code sections
Without the IF
22 Add another line to the code. Can now spot the error
New format of IF-THEN-ELSE is doing exactly the opposite of what we want: IF NOT TRUE, return NULL (0)
Need JLE, not NJLE
23 Assignment 1: code the following as a software loop, following the MIPS / Blackfin approach
- DONE DURING TUTORIAL
    int CalculateSum(void) {
        int sum = 0;
        for (int count = 0; count < 6; count++) {
            sum = sum + count;
        }
        return sum;
    }
24 Reminder: software for-loop becomes while loop with initial test
    int CalculateSum(void) {
        int sum = 0;
        int count = 0;
        while (count < 6) {
            sum = sum + count;
            count++;
        }
        return sum;
    }
- Do line by line translation into assembly code
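One way to prepare for the line-by-line translation is to first rewrite the while loop in C using only tests and jumps, so each C line corresponds to roughly one assembly step. The sketch below does that. The function name CalculateSumGoto, the label names, and the register-style variable names (sum_J8, count_J0) are illustrative assumptions, not a required register allocation.

```c
/* CalculateSum restructured as test-and-jump code.
 * Variable names only hint at a possible register choice (assumption). */
int CalculateSumGoto(void)
{
    int sum_J8 = 0;                      /* sum = 0 */
    int count_J0 = 0;                    /* count = 0 (must not be forgotten) */

LOOP_TEST:
    if (count_J0 >= 6) goto LOOP_END;    /* initial test: exit when count >= 6 */
    sum_J8 = sum_J8 + count_J0;          /* loop body */
    count_J0 = count_J0 + 1;             /* count++ */
    goto LOOP_TEST;                      /* jump back to the test */

LOOP_END:
    return sum_J8;
}
```

Note that the exit test is the inverse of the while condition: the loop continues while count < 6, so the jump out is taken when count >= 6.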
25 USE SOFTWARE LOOP HERE. Do loop control first
- Have some jumps too close together
NOTE: JGE is ILLEGAL. USE NJLT
Customize? #define JGE NJLT
26 Run the tests with 4 nop padding to check that we get out of the loop as expected
Adding 4 nops -- lose 1 cycle, gain an hour not trying to solve the problem
If you need the 1 cycle, refactor the code later
27 Accessing memory
- Basic mode
- Special register J31 acts as zero when used in additions
- Pt_J5 is a pointer register into an array
- Value_J1 is being used as a data register
- J registers are like MIPS registers (used as pointer and data). NOT like 68K or Blackfin registers: those can be used as either data or address registers, but not both
- NOTE: Later we will find that using TigerSHARC registers for data operations is a BAD idea
- Value_J1 = [Pt_J5] -- read value from memory location pointed to by J5 -- Compare to Blackfin Value_R0 = [Pt_P0]
- Value_J1 = [Pt_J5 + J31] -- read value from memory location pointed to by J5, but read somewhere that this CAN be faster than just Value_J1 = [Pt_J5] -- NEED TO CONFIRM
28 Accessing memory step 2
- Basic mode
- Pt_J5 is a pointer register into an array
- Offset_J4 is used as an offset
- Value_J1 is being used as a data register to receive the memory value: load / store architecture
- Read_J1 = [Pt_J5 + Offset_J4] -- read value from memory location pointed to by (J5 + J4)
  - PRE-MODIFY: address used is J5 + J4, no change in J5
- Read_J1 = [Pt_J5 += Offset_J4] -- read value from memory location pointed to by J5, and then perform add operation on the J5 register (points to NEXT location)
  - POST-MODIFY: address used is J5, then perform J5 = J5 + J4
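The difference between the two addressing modes can be modelled in C with pointers. This is only a sketch of the behaviour described above: the function names read_premodify and read_postmodify are hypothetical, and C pointer arithmetic counts array elements, which may not match how the processor counts addresses.

```c
/* Models the PRE-MODIFY form [Pt + Offset]:
 * the address used is pt + offset, and pt itself is left unchanged. */
int read_premodify(const int *pt, int offset)
{
    return *(pt + offset);
}

/* Models the POST-MODIFY form [Pt += Offset]:
 * the address used is the current pt, then pt is advanced by offset. */
int read_postmodify(const int **pt, int offset)
{
    int value = **pt;       /* read from the current location */
    *pt = *pt + offset;     /* then update the pointer (points to NEXT location) */
    return value;
}
```

Post-modify suits walking through an array in a loop, since each read leaves the pointer ready for the next element. Pre-modify suits indexed access from a fixed base pointer.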
29 Add in the memory accesses. FORGOT: TigerSHARC is a RISC PROCESSOR
LOAD/STORE ONLY, like MIPS and Blackfin
Must place value into register, and then copy register to memory
NO: [J5 + J0] = 0;;
NO: J3 = 0; [J5 + J0] = J3;; -- uses wrong J3. Remember TigerSHARC can handle parallel instructions
YES: J3 = 0;; [J5 + J0] = J3;;
30 Understand the error message "Too many J resource usage": ;; missing
Unintentionally doing the parallel instruction line
[J5 + J0] = J2; J0 = J0 + 1;;
31 Note: Missing label is not an assembler error, it's a linker error
Fix warnings. DEFECT: may be days before you try to link, then hard to find
32 NOW the assembler knows where CONTINUE is, so it can tell you that you have two JUMP instructions too close together
- Fix with magic 4 nops and lose one cycle / loop
33 Not getting expected test results. Something is logically wrong (DEFECT)
34 Obvious question: are we even getting into the loop? Add BREAKPOINT to TEST code flow. (We don't add BREAKPOINTS to follow the code in detail)
CODE NEVER GOT TO BREAKPOINT means code never entered loop
Forgot to do count = 0. So not even getting into the loop, as there is a garbage value already in Count_J0 from code we executed earlier -- DEFECT
35 Not bad for a first effort. Faster than compiler in debug mode
36 Where did the float ASM code suddenly appear from?
- Integer 0 has bit pattern 0x0000 0000
- Float 0.0 has bit pattern 0x0000 0000
- Integer 6 has format b 0??? ???? ???? ???? ???? ???? ???? ????
- Float 6.0 has format b 0??? ???? ???? ???? ???? ???? ???? ????
- Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????
- Float -6.0 has format b 1??? ???? ???? ???? ???? ???? ???? ????
- Formats are very different, but the sign bit is in the same place
- Float algorithm
  - If S == 1 (negative), set to zero
  - Otherwise leave unchanged: same as integer algorithm
- Just re-use integer algorithm with a change of name
(Diagram label: EXPONENT)
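The argument above can be checked on a desktop C compiler. The sketch below applies the integer rectify test directly to the bit pattern of a float. It assumes a 32-bit IEEE-754 float and a 32-bit two's complement integer. The function names rectify_int_bits and rectify_float_via_int are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Integer algorithm: if the sign bit is set (value is negative), return 0.
 * Otherwise leave the bit pattern unchanged. */
int32_t rectify_int_bits(int32_t bits)
{
    return (bits < 0) ? 0 : bits;
}

/* Float rectify that re-uses the integer algorithm on the raw bits.
 * Assumes float is 32-bit IEEE-754, so its sign bit sits in the same
 * position as the sign bit of an int32_t. */
float rectify_float_via_int(float value)
{
    int32_t bits;
    memcpy(&bits, &value, sizeof bits);    /* reinterpret bits, no conversion */
    bits = rectify_int_bits(bits);         /* same test as for integers */
    memcpy(&value, &bits, sizeof value);   /* all-zero bits read back as 0.0 */
    return value;
}
```

This works because the all-zero bit pattern means 0 as an integer and 0.0 as a float, so "set to zero" produces the right result under either interpretation.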
37 Final code: Float rectify code just has a different name
38 What we NOW KNOW
- Can we return from an assembly language routine without crashing the processor?
- Return a parameter from assembly language routine
  - (Is it same for ints and floats?)
- Pass parameters into assembly language
  - (Is it same for ints and floats?)
- Do IF THEN ELSE statements
- Read and write values to memory
- Read and write values in a loop
- Do some mathematics on the values fetched from memory
- All this stuff is demonstrated by coding HalfWaveRectifyASM( )