OPTIMIZING C CODE FOR THE ARM PROCESSOR

Provided by: accy150

1
OPTIMIZING C CODE FOR THE ARM PROCESSOR

Optimizing code takes time and reduces source code readability.
It is usually done only for functions that are critical for performance or power consumption and that are executed frequently.
It is usually done in combination with profiling.

2
LOCAL VARIABLES

ARM registers are 32 bits wide, so 32-bit data types are the most efficient for local variables.
Use signed and unsigned int types and avoid char and short.
The only exception is when you want wraparound (modulo arithmetic) to occur.
Unsigned int is more efficient than signed int for division.
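
The points above can be sketched in C. The function names below are hypothetical and not from the slides; the comments describe what a compiler targeting ARM typically has to do, which may differ between compilers and optimization levels.

```c
#include <stdint.h>

/* Narrow counter: after each increment the compiler must keep i within
   8 bits, which on ARM typically costs an extra masking instruction. */
int sum_narrow(const int *data)
{
    unsigned char i;              /* narrow local: discouraged */
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* Register-sized counter: no truncation is needed. */
int sum_wide(const int *data)
{
    unsigned int i;               /* matches the 32-bit register width */
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* The exception: wraparound is wanted, so a narrow type is appropriate. */
uint8_t next_sequence_number(uint8_t seq)
{
    return (uint8_t)(seq + 1);    /* 255 wraps to 0 */
}
```

Both sum functions return the same value; only the generated code is expected to differ.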

3
LOOP STRUCTURES (incrementing for loop)

int checksum_v5(int *data)
{
  unsigned int i;
  int sum = 0;
  for (i = 0; i < 64; i++)
  {
    sum += *(data++);
  }
  return sum;
}

checksum_v5
        MOV   r2,r0              ; r2 = data
        MOV   r0,#0              ; sum = 0
        MOV   r1,#0              ; i = 0
checksum_v5_loop
        LDR   r3,[r2],#4         ; r3 = *(data++)
        ADD   r1,r1,#1           ; i++
        CMP   r1,#0x40           ; compare i, 64
        ADD   r0,r3,r0           ; sum += r3
        BCC   checksum_v5_loop   ; if (i<64) goto loop
        MOV   pc,r14             ; return sum

4
LOOP STRUCTURES (decrementing for loop)

int checksum_v6(int *data)
{
  unsigned int i;
  int sum = 0;
  for (i = 64; i != 0; i--)
  {
    sum += *(data++);
  }
  return sum;
}

checksum_v6
        MOV   r2,r0              ; r2 = data
        MOV   r0,#0              ; sum = 0
        MOV   r1,#0x40           ; i = 64
checksum_v6_loop
        LDR   r3,[r2],#4         ; r3 = *(data++)
        SUBS  r1,r1,#1           ; i-- and set flags
        ADD   r0,r3,r0           ; sum += r3
        BNE   checksum_v6_loop   ; if (i!=0) goto loop
        MOV   pc,r14             ; return sum

5
LOOP UNROLLING

int checksum_v7(int *data, unsigned int N)
{
  int sum = 0;
  do
  {
    sum += *(data++);
    sum += *(data++);
    sum += *(data++);
    sum += *(data++);
    N -= 4;
  } while (N != 0);
  return sum;
}

N must be a nonzero multiple of 4.

checksum_v7
        MOV   r2,#0              ; sum = 0
checksum_v7_loop
        LDR   r3,[r0],#4         ; r3 = *(data++)
        SUBS  r1,r1,#4           ; N -= 4 and set flags
        ADD   r2,r3,r2           ; sum += r3
        LDR   r3,[r0],#4         ; r3 = *(data++)
        ADD   r2,r3,r2           ; sum += r3
        LDR   r3,[r0],#4         ; r3 = *(data++)
        ADD   r2,r3,r2           ; sum += r3
        LDR   r3,[r0],#4         ; r3 = *(data++)
        ADD   r2,r3,r2           ; sum += r3
        BNE   checksum_v7_loop   ; if (N!=0) goto loop
        MOV   r0,r2              ; r0 = sum
        MOV   pc,r14             ; return r0
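
The unrolled do-while loop only terminates correctly when N is a nonzero multiple of 4. A minimal sketch of one way to lift that restriction is shown here; the function name checksum_any_n is hypothetical and this variant is not part of the original slides.

```c
/* Hypothetical variant of the unrolled checksum that accepts any N,
   including 0. The main loop handles groups of four elements and a
   short cleanup loop handles the 0..3 leftover elements. */
int checksum_any_n(const int *data, unsigned int N)
{
    int sum = 0;
    unsigned int blocks = N >> 2;   /* number of complete groups of four */
    unsigned int rest = N & 3;      /* leftover elements                 */

    while (blocks != 0) {           /* counts down to zero */
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        blocks--;
    }
    while (rest != 0) {             /* cleanup loop, at most 3 iterations */
        sum += *(data++);
        rest--;
    }
    return sum;
}
```

The cleanup loop costs a little code size, so it is probably only worth adding when N is not known to be a multiple of the unroll factor.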

6
Loop Unrolling example

Unroll the following loop by a factor of 2, 4, and 8:

for (i = 0; i < 64; i++)
  a[i] = b[i] + c[i+1];

7
Factor of 2

for (i = 0; i < 32; i++) {
  a[2*i]   = b[2*i]   + c[2*i+1];
  a[2*i+1] = b[2*i+1] + c[2*i+1+1];
}

8
Factor of 4

for (i = 0; i < 16; i++) {
  a[4*i]   = b[4*i]   + c[4*i+1];
  a[4*i+1] = b[4*i+1] + c[4*i+1+1];
  a[4*i+2] = b[4*i+2] + c[4*i+2+1];
  a[4*i+3] = b[4*i+3] + c[4*i+3+1];
}

9
Factor of 8

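for (i = 0; i < 8; i++) {
  a[8*i]   = b[8*i]   + c[8*i+1];
  a[8*i+1] = b[8*i+1] + c[8*i+1+1];
  a[8*i+2] = b[8*i+2] + c[8*i+2+1];
  a[8*i+3] = b[8*i+3] + c[8*i+3+1];
  a[8*i+4] = b[8*i+4] + c[8*i+4+1];
  a[8*i+5] = b[8*i+5] + c[8*i+5+1];
  a[8*i+6] = b[8*i+6] + c[8*i+6+1];
  a[8*i+7] = b[8*i+7] + c[8*i+7+1];
}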
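
An unrolled loop is easy to get wrong by one index, so it helps to check it against the original. The sketch below assumes the loop body is a[i] = b[i] + c[i+1] with int arrays; the function names and the N_ELEMS constant are hypothetical.

```c
#define N_ELEMS 64

/* Original loop. c must hold N_ELEMS + 1 elements because of c[i+1]. */
void add_rolled(int *a, const int *b, const int *c)
{
    unsigned int i;
    for (i = 0; i < N_ELEMS; i++)
        a[i] = b[i] + c[i + 1];
}

/* Unrolled by a factor of 4: a quarter of the iterations, four
   statements per iteration, so the loop overhead is paid 16 times
   instead of 64. */
void add_unrolled4(int *a, const int *b, const int *c)
{
    unsigned int i;
    for (i = 0; i < N_ELEMS / 4; i++) {
        a[4*i]     = b[4*i]     + c[4*i + 1];
        a[4*i + 1] = b[4*i + 1] + c[4*i + 2];
        a[4*i + 2] = b[4*i + 2] + c[4*i + 3];
        a[4*i + 3] = b[4*i + 3] + c[4*i + 4];
    }
}
```

Running both versions on the same input and comparing the output arrays element by element is a cheap regression check before measuring any speedup.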

10
REGISTER ALLOCATION

Limit the number of local variables in the inner loop of a function to 12.
Use the most important variables in the innermost loop; this tells the compiler which variables to keep in registers.

11
CALLING FUNCTIONS

Try to restrict functions to four arguments. Use structures to group related arguments and pass structure pointers instead.
Define small functions in the same source file as, and before, the functions that call them.
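
Both recommendations can be sketched in C as follows. The ScaleJob structure and the function names are hypothetical; the sketch assumes the ARM procedure call standard, under which the first four word-sized arguments travel in r0-r3 and any further arguments are passed on the stack.

```c
/* Related parameters grouped in one structure (hypothetical example).
   Passing a pointer to it uses a single register argument instead of
   six arguments, two of which would otherwise go on the stack. */
typedef struct {
    const int   *data;
    unsigned int count;
    int          scale;
    int          offset;
    int          lo;
    int          hi;
} ScaleJob;

/* Small helper defined in the same file and before its caller, so the
   compiler can see its body and may inline it. */
static int clamp(int x, int lo, int hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Sums data[i] * scale + offset, with each term clamped to [lo, hi]. */
int scale_sum(const ScaleJob *job)
{
    const int *p = job->data;
    unsigned int n;
    int sum = 0;
    for (n = job->count; n != 0; n--)      /* counts down to zero */
        sum += clamp(*(p++) * job->scale + job->offset, job->lo, job->hi);
    return sum;
}
```

A caller fills in a ScaleJob once and passes its address, which also keeps the call site readable when the parameter list grows.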

12
REGISTER ALLOCATION

Limit the number of inner-loop variables to 12 so that they can all be held in registers.

13
SUMMARY

Use signed int and unsigned int types for local variables, function arguments, and return values.
The most efficient form of loop is a do-while loop that counts down to zero.
Unroll important loops.
Try to limit functions to four arguments.
Avoid divisions; multiply by the reciprocal instead.
Use the inline assembler for instructions the C compiler cannot generate.
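
As an illustration of replacing a division with a multiplication by the reciprocal, the sketch below divides an unsigned 32-bit value by the constant 10. The function name div10 is hypothetical. The constant 0xCCCCCCCD is 2^35 / 10 rounded up, and the 64-bit product is assumed to map onto a long multiply instruction such as UMULL on ARM cores that have one.

```c
#include <stdint.h>

/* Computes n / 10 without a divide: multiply by ceil(2^35 / 10) and
   shift the 64-bit product right by 35. The result is exact for every
   32-bit unsigned n. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}
```

Modern compilers usually perform this transformation themselves when the divisor is a compile-time constant, so the manual form mainly matters for older toolchains or for divisors that are fixed at run time and can have their reciprocal precomputed.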

14
ARM INLINE ASSEMBLY

#include <stdio.h>
int main(void)
{
  int n1, n2, m;
  n1 = 5;
  n2 = 3;
  __asm                 // inline assembly code
  {
    MUL m, n1, n2
  }
  printf("The result is %d\n", m);
  return 0;
}

15
USING INLINE ASSEMBLY

Used for ARM instructions that the C compiler does not generate (for example, coprocessor instructions and instruction set extensions).
Creates portability issues.
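
One common way to contain the portability problem is to hide the assembly inside a small inline function with a plain C fallback. The sketch below is an assumption-laden illustration: it uses GCC-style extended asm, not the armcc __asm block syntax, and the function name mul_asm is hypothetical.

```c
/* Multiplies two ints. On a GCC-compatible compiler targeting 32-bit
   ARM state, a single MUL instruction is emitted through extended asm;
   on every other target the function falls back to ordinary C. */
static inline int mul_asm(int a, int b)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    int result;
    /* "=&r" keeps the destination register distinct from the inputs,
       which older ARM architectures require for MUL. */
    __asm__("mul %0, %1, %2" : "=&r"(result) : "r"(a), "r"(b));
    return result;
#else
    return a * b;       /* portable fallback */
#endif
}
```

Callers only ever see mul_asm(5, 3), so porting to another processor or compiler means changing one function rather than every call site.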

16
ALTERNATIVE CALLING ASSEMBLY FUNCTION FROM C

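#include <stdio.h>
extern void multip(int n1, int n2, int m);
int n1, n2, m;        // globals: the assembly function imports these symbols
int main(void)
{
  n1 = 5;             // assigning numbers
  n2 = 3;
  multip(n1, n2, m);  // calling function; it writes the result to the global m
  printf("The result is %d\n", m);
  return 0;
}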

17
Assembly function

        AREA   example, CODE, READONLY
        EXPORT multip            ; external function name
        IMPORT n1                ; input
        IMPORT n2
        IMPORT m                 ; return variable
multip                           ; multip function begins
        LDR    r3,=n1            ; load data from memory to registers
        LDR    r1,[r3]
        LDR    r3,=n2
        LDR    r2,[r3]
        LDR    r3,=m             ; r3 = address of m
        MUL    r0,r1,r2
        STR    r0,[r3]           ; store result to m memory location
        MOV    pc,lr             ; return from call
        END

18
PORTABILITY ISSUES

Char type: unsigned on ARM, signed on many other processors.
Alignment: ARM load and store instructions (LDR, STR) assume the address is a multiple of the size of the type you are loading or storing.
Endianness: little-endian by default; can be configured as big-endian.
Inline assembly: separate inline assembly into small inlined functions.
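
The first three issues can be probed or sidestepped in portable C. The helper names below are hypothetical and the sketch is one possible approach, not the method used in the slides.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Char type: returns 1 where plain char is unsigned, 0 where it is
   signed. Code that needs a specific signedness should say
   "signed char" or "unsigned char" explicitly. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Endianness: inspects the first byte of a known 32-bit value. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Alignment: copying through memcpy avoids dereferencing a possibly
   misaligned uint32_t pointer. The value is in host byte order. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Alignment and endianness together: assembles a little-endian 32-bit
   value byte by byte, so the result is the same on any host. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

read_le32 is the safer choice for data with a defined byte order, such as file formats or network packets, because it depends on neither the alignment of the buffer nor the endianness of the processor.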

19
EXAMPLE