Title: Embedded Systems Programming
1. Embedded Systems Programming
- Writing Optimised C code for ARM
2. Why write optimised C code?
- For embedded systems, size and/or speed are of key importance
- The compiler optimisation phase can only do so much
- In order to write optimal C code you need to know details of the underlying hardware and the compiler
3. What compilers can't do

    void memclr(char *data, int N)
    {
        for (; N > 0; N--)
        {
            *data = 0;
            data++;
        }
    }

- Is N == 0 on the first loop?
  - 0 - 1 is dangerous!
- Is the data array 4-byte aligned?
  - If so, it can store using int
- Is N a multiple of 4?
  - If so, it could do 4 word blocks at a time
- Compilers have to be conservative!
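When the programmer does know the answers to these questions, they can be written into the code. The following is a minimal sketch, assuming the caller guarantees a 4-byte aligned buffer whose size is a multiple of 4 bytes; the name memclr_words and its word-count parameter are hypothetical and not part of the original program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of memclr. Assumptions (made by us, not the
 * compiler): data is 4-byte aligned, which the uint32_t pointer type
 * expresses, and the region is a whole number of 32-bit words. */
void memclr_words(uint32_t *data, size_t n_words)
{
    while (n_words > 0) {
        *data++ = 0;    /* one 32-bit store clears four bytes */
        n_words--;
    }
}
```

A 64-byte buffer would be cleared with memclr_words(buf, 16), which needs a quarter of the stores of the byte-wise version.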
4. An example program
- The program might seem fine, even resource friendly
  - Using a char saves space
  - for loops make good assembler
- Let's look at the assembler code

    /* program showing inefficient
     * variable and loop
     * usage craig Nov 04
     */
    int checksum_1(int *data)
    {
        char i;
        int sum = 0;
        for (i = 0; i < 64; i++)
            sum += data[i];
        return sum;
    }
5.
            .text
            .align  2
            .global checksum_1
            .type   checksum_1,function
    checksum_1:
            @ args = 0, pretend = 0, frame = 0
            @ frame_needed = 1, current_function_anonymous_args = 0
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            mov     r1, r0
            mov     r0, #0                  @ sum = 0
            mov     r2, r0                  @ i = 0
    .L6:
            ldr     r3, [r1, r2, asl #2]    @ data[i]
            add     r0, r0, r3              @ sum += data[i]
            add     r3, r2, #1              @ i++
            and     r2, r3, #255
            cmp     r2, #63                 @ i < 64
            bls     .L6
            ldmea   fp, {fp, sp, pc}
    .Lfe1:
            .size   checksum_1,.Lfe1-checksum_1
6. What is wrong?
- The use of char means that the compiler has to mask the counter back to 8 bits after every increment, using
  - and r2, r3, #255
- The loop variable requires a register and initialisation
- If the loop is called often then the test and branch are quite an overhead
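The extra and instruction can be mirrored in C. This is a small sketch, assuming an 8-bit counter that lives in a 32-bit register; both function names are hypothetical and only illustrate what the compiler has to emulate.

```c
/* Hypothetical illustration: the increment the compiler must emit for
 * an 8-bit counter kept in a 32-bit register. The mask corresponds to
 * the "and r2, r3, #255" in the listing. */
unsigned int next_counter_masked(unsigned int i)
{
    return (i + 1u) & 255u;
}

/* The same operation written with an 8-bit type: the conversion back
 * to unsigned char is what forces the mask. */
unsigned char next_counter_char(unsigned char i)
{
    return (unsigned char)(i + 1);
}
```

Both functions wrap from 255 back to 0, which is behaviour the 64-iteration loop never needs but the compiler must still preserve.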
7. Variable sizes
- In general the compiler will use 32-bit registers for local variables but will have to cast them when they are used as 8 or 16 bit values
- If you can, use unsigned ints; if you can't, cast explicitly
- Using signed shorts can be quite a problem for compilers
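Applied to the checksum example, the advice amounts to widening the loop counter to the register size. A minimal sketch follows, assuming the same 64-word input as checksum_1; the name checksum_uint is hypothetical.

```c
/* Hypothetical variant of checksum_1: the counter is an unsigned int,
 * so it already matches the 32-bit register and needs no narrowing
 * after the increment. Assumes data points to at least 64 ints. */
int checksum_uint(int *data)
{
    unsigned int i;
    int sum = 0;

    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}
```

The result is the same as with the char counter; only the per-iteration bookkeeping should shrink, and how much depends on the compiler and its options.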
8. Watch your shorts!

    short add(short a, short b)
    {
        return a + (b >> 1);
    }

- The above C code turns into the rather nasty assembler
- The gnu C compiler is very cautious when confronted with short variables

Becomes:

            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            mov     r1, r1, asl #16
            mov     r0, r0, asl #16
            mov     r0, r0, asr #16
            add     r0, r0, r1, asr #17
            mov     r0, r0, asl #16
            mov     r0, r0, asr #16
            ldmea   fp, {fp, sp, pc}
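One way to avoid the shift pairs is to keep the arithmetic in int and narrow only where a 16-bit value is really needed. This is a sketch under that assumption; add_int is a hypothetical name, and whether every shift disappears depends on the compiler and calling convention in use.

```c
/* Hypothetical variant of add: arguments and result are plain int, so
 * the function itself has nothing to sign-extend. */
int add_int(int a, int b)
{
    return a + (b >> 1);
}
```

A caller that must store the result in a short can narrow once at the point of storage, for example short r = (short)add_int(a, b);, instead of paying for the narrowing inside every call.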
9. Loops 1
- As well as using a char for a loop counter, the loop counter could be redundant
- Terminate loops by counting down to 0; this reduces register usage and means no initialisation
- Use do..while instead of for loops
10. Efficient loop C

    /* Program to show efficient use of
     * variables and loops */
    int checksum_2(int *data)
    {
        int sum = 0, i = 64;
        do {
            sum += *(data++);
        } while (--i != 0);
        return sum;
    }
11. Efficient loop assembler

    checksum_2:
            @ args = 0, pretend = 0, frame = 0
            @ frame_needed = 1, current_function_anonymous_args = 0
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            mov     r1, r0
            mov     r0, #0          @ sum = 0
            mov     r2, #64         @ i = 64
    .L6:
            ldr     r3, [r1], #4    @ *(data++)
            add     r0, r0, r3      @ sum += *(data++)
            subs    r2, r2, #1      @ --i
            bne     .L6
            ldmea   fp, {fp, sp, pc}
12. Loop unrolling
- If a loop is going to be repeated often then the test and branch can be quite an overhead
- If the iteration count is a multiple of 4 and the loop runs often then the loop can be unrolled
- This increases code size but is more speed efficient
- Sizes that are not multiples of 4 can be handled but are less efficient.
13. An unrolled loop

    /* Program to show efficient use of variables
     * and loops, loop unrolling */
    int checksum_2(int *data)
    {
        int sum = 0, i = 64;
        do {
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            i -= 4;
        } while (i != 0);
        return sum;
    }
14.
    checksum_2:
            @ args = 0, pretend = 0, frame = 0
            @ frame_needed = 1, current_function_anonymous_args = 0
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            mov     r2, r0
            mov     r0, #0
            mov     r1, #64
    .L6:
            ldr     r3, [r2], #4
            add     r0, r0, r3
            ldr     r3, [r2], #4
            add     r0, r0, r3
            ldr     r3, [r2], #4
            add     r0, r0, r3
            ldr     r3, [r2], #4
            add     r0, r0, r3
            subs    r1, r1, #4
            bne     .L6
            ldmea   fp, {fp, sp, pc}
15. Loop unrolling != 4

    /* Program to show use of loop unrolling */
    int checksum_2(int *data, unsigned int N)
    {
        int sum = 0;
        unsigned int i;
        for (i = N/4; i != 0; i--) {
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
        }
        for (i = N&3; i != 0; i--)
            sum += *(data++);
        return sum;
    }
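The two loop bounds divide the N elements between the unrolled loop and a short clean-up loop. The arithmetic can be sketched on its own as below; the struct and function names are hypothetical and exist only to make the split visible.

```c
/* Hypothetical helper, not part of the original program: shows how N
 * elements are divided between the two loops. */
struct unroll_split {
    unsigned int blocks;    /* passes through the 4-way unrolled loop */
    unsigned int leftover;  /* passes through the clean-up loop, 0..3 */
};

struct unroll_split split_for_unroll(unsigned int N)
{
    struct unroll_split s;

    s.blocks = N / 4;       /* for unsigned N this equals N >> 2 */
    s.leftover = N & 3;     /* for unsigned N this equals N % 4 */
    return s;
}
```

For N = 10 this gives 2 unrolled passes (8 elements) and 2 leftover elements; for any multiple of 4 the clean-up loop does not run at all.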