OPTIMIZING C CODE FOR THE ARM PROCESSOR - PowerPoint PPT Presentation

Provided by: accy150
Transcript and Presenter's Notes

1
OPTIMIZING C CODE FOR THE ARM PROCESSOR
  • Optimizing code takes time and reduces source
    code readability
  • Usually done for functions that are critical for
    performance or power consumption and are executed
    frequently
  • Usually in combination with profiling

2
LOCAL VARIABLES
  • ARM registers are 32 bits wide, so 32-bit data
    types are the most efficient for local variables
  • Use int and unsigned int, and avoid char and
    short
  • The only exception is when you want 8-bit or
    16-bit wraparound (modulo arithmetic) to occur
  • unsigned int is more efficient than int for
    division
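
A minimal sketch of the points above, assuming a hypothetical 64-element checksum; the function names are illustrative and do not come from the slides. With an unsigned char counter the compiler has to keep the value within 8 bits, which typically costs an extra instruction per iteration on ARM, while a 32-bit counter needs no such correction. The last helper shows the wraparound case, where a narrow type is the intended behavior.

```c
#include <stdint.h>

/* Hypothetical example: narrow loop counter. The compiler must keep i
   within 0..255, which usually costs an extra instruction on ARM. */
int checksum_narrow(const int *data)
{
    unsigned char i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* Same loop with a 32-bit counter that matches the register width. */
int checksum_wide(const int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* Wraparound case: an 8-bit sequence number that must roll over
   from 255 to 0, so the narrow type is deliberate here. */
uint8_t next_sequence(uint8_t seq)
{
    return (uint8_t)(seq + 1u);
}
```

Both checksum functions return the same value; the difference lies only in the code the compiler is likely to generate for the loop counter.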

3
LOOP STRUCTURES (incrementing for loop)
  • int checksum_v5(int *data)
  • {
  •   unsigned int i;
  •   int sum = 0;
  •   for (i = 0; i < 64; i++)
  •     sum += *(data++);
  •   return sum;
  • }

checksum_v5
    MOV r2,r0              ; r2 = data
    MOV r0,#0              ; sum = 0
    MOV r1,#0              ; i = 0
checksum_v5_loop
    LDR r3,[r2],#4         ; r3 = *(data++)
    ADD r1,r1,#1           ; i++
    CMP r1,#0x40           ; compare i, 64
    ADD r0,r3,r0           ; sum += r3
    BCC checksum_v5_loop   ; if (i<64) goto loop
    MOV pc,r14             ; return sum
4
LOOP STRUCTURES (decrementing for loop)
  • int checksum_v6(int *data)
  • {
  •   unsigned int i;
  •   int sum = 0;
  •   for (i = 64; i != 0; i--)
  •     sum += *(data++);
  •   return sum;
  • }

checksum_v6
    MOV r2,r0              ; r2 = data
    MOV r0,#0              ; sum = 0
    MOV r1,#0x40           ; i = 64
checksum_v6_loop
    LDR r3,[r2],#4         ; r3 = *(data++)
    SUBS r1,r1,#1          ; i-- and set flags
    ADD r0,r3,r0           ; sum += r3
    BNE checksum_v6_loop   ; if (i!=0) goto loop
    MOV pc,r14             ; return sum
5
LOOP UNROLLING
checksum_v7                ; r0 = data, r1 = N
    MOV r2,#0              ; sum = 0
checksum_v7_loop
    LDR r3,[r0],#4         ; r3 = *(data++)
    SUBS r1,r1,#4          ; N -= 4 and set flags
    ADD r2,r3,r2           ; sum += r3
    LDR r3,[r0],#4         ; r3 = *(data++)
    ADD r2,r3,r2           ; sum += r3
    LDR r3,[r0],#4         ; r3 = *(data++)
    ADD r2,r3,r2           ; sum += r3
    LDR r3,[r0],#4         ; r3 = *(data++)
    ADD r2,r3,r2           ; sum += r3
    BNE checksum_v7_loop   ; if (N!=0) goto loop
    MOV r0,r2              ; r0 = sum
    MOV pc,r14             ; return r0

  • int checksum_v7(int *data, unsigned int N)
  • {
  •   int sum = 0;
  •   do
  •   {
  •     sum += *(data++);
  •     sum += *(data++);
  •     sum += *(data++);
  •     sum += *(data++);
  •     N -= 4;
  •   } while (N != 0);
  •   return sum;
  • }

6
Loop Unrolling example
  • Unroll the following loop by factors of 2, 4,
    and 8
  • for (i = 0; i < 64; i++)
  •   a[i] = b[i] + c[i+1];

7
Factor of 2
  • for (i = 0; i < 32; i++)
  • {
  •   a[2*i] = b[2*i] + c[2*i+1];
  •   a[2*i+1] = b[2*i+1] + c[2*i+1+1];
  • }

8
Factor of 4
  • for (i = 0; i < 16; i++)
  • {
  •   a[4*i] = b[4*i] + c[4*i+1];
  •   a[4*i+1] = b[4*i+1] + c[4*i+1+1];
  •   a[4*i+2] = b[4*i+2] + c[4*i+2+1];
  •   a[4*i+3] = b[4*i+3] + c[4*i+3+1];
  • }

9
Factor of 8
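  • for (i = 0; i < 8; i++)
  • {
  •   a[8*i] = b[8*i] + c[8*i+1];
  •   a[8*i+1] = b[8*i+1] + c[8*i+1+1];
  •   a[8*i+2] = b[8*i+2] + c[8*i+2+1];
  •   a[8*i+3] = b[8*i+3] + c[8*i+3+1];
  •   a[8*i+4] = b[8*i+4] + c[8*i+4+1];
  •   a[8*i+5] = b[8*i+5] + c[8*i+5+1];
  •   a[8*i+6] = b[8*i+6] + c[8*i+6+1];
  •   a[8*i+7] = b[8*i+7] + c[8*i+7+1];
  • }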
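
One way to check an unrolled loop is to run it next to the original loop and compare the outputs. The sketch below does this for the factor-of-four case. It assumes the loop body is an addition of b[i] and c[i+1] (the operator is not legible in the transcript), and the function names are hypothetical. Note that c must hold 65 elements because the loop reads c[64].

```c
#define N 64

/* Reference version: one element per iteration.
   The + operator is an assumption about the original slide. */
void add_rolled(int *a, const int *b, const int *c)
{
    unsigned int i;
    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i + 1];
}

/* Unrolled by four: 16 iterations, four elements per iteration.
   Works only because N is a multiple of four. */
void add_unrolled4(int *a, const int *b, const int *c)
{
    unsigned int i;
    for (i = 0; i < N / 4; i++) {
        a[4 * i]     = b[4 * i]     + c[4 * i + 1];
        a[4 * i + 1] = b[4 * i + 1] + c[4 * i + 2];
        a[4 * i + 2] = b[4 * i + 2] + c[4 * i + 3];
        a[4 * i + 3] = b[4 * i + 3] + c[4 * i + 4];
    }
}
```

Filling b and c with known values, calling both functions on separate output arrays, and comparing the arrays element by element confirms that the unrolling did not change the result.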

10
REGISTER ALLOCATION
  • Limit the number of local variables in the
    inner loop of a function to 12
  • Use the most important variables inside the
    innermost loop; this tells the compiler which
    variables to keep in registers

11
CALLING FUNCTIONS
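  • Try to restrict functions to four arguments; the
    first four are passed in registers r0-r3 and any
    further arguments are passed on the stack. Use
    structures to group related arguments and pass
    structure pointers instead
  • Define small functions in the same source file
    and before the functions that call them, so that
    the compiler can inline them.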

12
REGISTER ALLOCATION
  • Limit the number of inner loop variables to 12
    so that they can all be held in registers

13
SUMMARY
  • Use signed int and unsigned int types for local
    variables, function arguments and return values
  • The most efficient form of loop is the do-while
    loop that counts down to zero
  • Unroll important loops
  • Try to limit functions to four arguments.
  • Avoid divisions; where possible, multiply by a
    reciprocal instead
  • Use the inline assembler for instructions the C
    compiler does not generate
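
The advice on avoiding division can be illustrated with a fixed divisor. The sketch below divides an unsigned 32-bit value by 10 using a multiplication and a shift; the function name div10 is hypothetical. The constant 0xCCCCCCCD is 2^35 / 10 rounded up, and the 64-bit product shifted right by 35 matches x / 10 for every 32-bit unsigned input. On ARM cores with a long multiply instruction the compiler can typically implement this without calling a division routine.

```c
#include <stdint.h>

/* Divide by 10 using multiplication by a scaled reciprocal.
   0xCCCCCCCD == ceil(2^35 / 10); the result equals x / 10
   for all 32-bit unsigned values of x. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The same idea applies to other constant divisors, but each divisor needs its own constant and shift, and the choice has to be verified over the full input range.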

14
ARM INLINE ASSEMBLY
  • #include <stdio.h>
  • int main()
  • {
  •   int n1, n2, m;
  •   n1 = 5;
  •   n2 = 3;
  •   __asm   // inline assembly code
  •   {
  •     MUL m, n1, n2
  •   }
  •   printf("The result is %d\n", m);
  •   return (0);
  • }

15
USING INLINE ASSEMBLY
  • Used for ARM instructions that the C compiler
    does not generate (for example, coprocessor
    instructions and instruction set extensions)
  • Creates portability issues

16
ALTERNATIVE CALLING ASSEMBLY FUNCTION FROM C
  • #include <stdio.h>
  • extern void multip(int n1, int n2, int m);
  • int n1, n2, m;   // globals, IMPORTed by the assembly code
  • int main()
  • {
  •   n1 = 5;   // assigning numbers
  •   n2 = 3;
  •   multip(n1, n2, m);   // calling function
  •   printf("The result is %d\n", m);
  •   return 0;
  • }

17
Assembly function
  • AREA example, CODE, READONLY
  • EXPORT multip   ; external function name
  • IMPORT n1   ; input
  • IMPORT n2
  • IMPORT m   ; return variable
  • multip   ; function begins
  • LDR r3,=n1   ; load data from memory to registers
  • LDR r1,[r3]
  • LDR r4,=n2
  • LDR r2,[r4]
  • LDR r5,=m
  • LDR r0,[r5]
  • MUL r0,r1,r2
  • STR r0,[r5]   ; store result to m memory location
  • MOV pc,lr   ; return from call
  • END

18
PORTABILITY ISSUES
  • Char type: unsigned by default on ARM, signed on
    many other processors
  • Alignment: ARM load and store instructions (LDR,
    STR) assume the address is a multiple of the size
    of the type being loaded or stored
  • Endianness: little-endian by default, can be
    configured as big-endian
  • Inline assembly: separate inline assembly into
    small inlined functions
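
A minimal sketch of the last point, assuming a GCC-compatible compiler; the function name mul32 is hypothetical. Note that it uses GCC-style extended inline assembly, not the `__asm { }` block syntax shown on the earlier slide. The assembly is confined to one small inline function, and a plain C fallback is compiled for every other compiler or target, so the rest of the program stays portable.

```c
/* Hypothetical wrapper that isolates the inline assembly in one place. */
static inline int mul32(int a, int b)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    int result;
    /* GCC-style inline assembly for 32-bit ARM state. "=&r" keeps the
       result register distinct from the input registers, which older
       ARM cores require for MUL. */
    __asm__("mul %0, %1, %2" : "=&r"(result) : "r"(a), "r"(b));
    return result;
#else
    /* Portable fallback for every other compiler or target. */
    return a * b;
#endif
}
```

Callers use mul32(n1, n2) like any other function, and porting to a new processor means revisiting only this one wrapper.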

19
EXAMPLE
  • Write a program that reads an 8-element row
    vector and an 8-element column vector from memory
    and
  • Multiplies both by a scalar also found in memory
  • Calculates the scalar product of the two vectors
  • Assume no partial product exceeds 32 bits
  • Use v1 = [1 2 3 4 5 6 7 8],
    v2 = [0 1 2 3 4 5 6 7]^T, s = 5 as test inputs
  • Unroll the loop by two and by four
  • Repeat using inline assembly for the
    multiplications
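
One possible starting point for this exercise is sketched below, covering only the plain C version with the scalar product unrolled by four. The names LEN, scale_vector and dot_unrolled4 are hypothetical, and both loops follow the count-down do-while form recommended in the summary. The inline assembly variant is left as stated in the exercise.

```c
#define LEN 8

/* Multiply every element of v by the scalar s, in place.
   Counts down to zero, as recommended for ARM loops. */
void scale_vector(int *v, int s)
{
    unsigned int n = LEN;
    do {
        *v *= s;
        v++;
    } while (--n != 0);
}

/* Scalar (dot) product unrolled by four.
   LEN must be a non-zero multiple of four. */
int dot_unrolled4(const int *a, const int *b)
{
    unsigned int n = LEN;
    int sum = 0;
    do {
        sum += a[0] * b[0];
        sum += a[1] * b[1];
        sum += a[2] * b[2];
        sum += a[3] * b[3];
        a += 4;
        b += 4;
        n -= 4;
    } while (n != 0);
    return sum;
}
```

With the test inputs from the exercise, scaling both vectors by 5 and then calling dot_unrolled4 gives 4200.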