Title: 5-7 Code optimization
5-7 Code optimization
Two functions f and g
    #define MAX 10
    int a[MAX], b[MAX], c[MAX], x[MAX], y[MAX];
    int i, j, r, s;
    ...
    int f(int a, int b) {
        int z;
        z = 2 * a + b;
        return z;
    }
    int g(int a, int b, int c) {
        int z;
        z = a * c + c * b;
        return z;
    }
What code optimizations can the compiler do (-O, -O0, -O1, -O2, -O3, -Os)?
Without an -O option, or with -O0, you have to do all optimizations yourself.
Two for loops
    ...
    for (i = 0; i <= MAX - 1; i++)
        x[i] = f(a[i], b[i]);
    s = 2 * r;
    for (j = 0; j <= MAX - 1; j++)
        y[j] = s + g(a[j], b[j], c[j]);
What can be done?
We want a shorter execution time without increasing the code size!
Loop integration
The two loops have the same range (0 to MAX - 1) and no data dependency (x is used only in loop 1, y only in loop 2). The loops can be integrated, which saves loop overhead (only i is needed)!
    s = 2 * r;
    for (i = 0; i <= MAX - 1; i++) {
        x[i] = f(a[i], b[i]);
        y[i] = s + g(a[i], b[i], c[i]);
    }
Precalculation at compile time
The defined constant MAX is used as MAX - 1 in the loop. MAX - 1 could be precalculated as 10 - 1 = 9 at compile time!
    s = 2 * r;
    for (i = 0; i <= 9; i++) {
        x[i] = f(a[i], b[i]);
        y[i] = s + g(a[i], b[i], c[i]);
    }
Algebraic simplification
Rewriting function g can save one multiplication operation.
    int g(int a, int b, int c) {
        int z;
        z = c * (a + b);
        return z;
    }
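The rewrite can be checked with a small sketch. It assumes that the original g computes a * c + c * b, as on the first slide; the names g_original, g_simplified and forms_agree are hypothetical and only used for this illustration.

```c
/* Original form (assumed from the first slide): two multiplications, one addition. */
int g_original(int a, int b, int c)
{
    return a * c + c * b;
}

/* Factored form: one multiplication, one addition. */
int g_simplified(int a, int b, int c)
{
    return c * (a + b);
}

/* Hypothetical helper: returns 1 if both forms agree for all
 * a, b, c in [-limit, limit], otherwise 0. */
int forms_agree(int limit)
{
    for (int a = -limit; a <= limit; a++)
        for (int b = -limit; b <= limit; b++)
            for (int c = -limit; c <= limit; c++)
                if (g_original(a, b, c) != g_simplified(a, b, c))
                    return 0;
    return 1;
}
```

Calling forms_agree(10) compares the two forms over a small input range; it is a spot check, not a proof, although the identity follows directly from the distributive law.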
Inlining of functions
Both functions f and g are short, and their code could be inserted directly in the loop.
    int a[10], b[10], c[10], x[10], y[10];
    int i, r, s;
    s = 2 * r;
    for (i = 0; i <= 9; i++) {
        x[i] = 2 * a[i] + b[i];
        y[i] = s + ((a[i] + b[i]) * c[i]);
    }
Loop unrolling would give a shorter execution time, but it would increase the code size, so it can't be used in this case.
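As a rough sanity check, the starting code and the fully optimized loop can be run side by side. This is a sketch under assumptions: the operators in f, g and the y assignment (2 * a + b, a * c + c * b, s + ...) and the loop bound i <= MAX - 1 are readings of the slide code, and run_baseline, run_optimized and versions_agree are hypothetical names.

```c
#define MAX 10

/* f and g as assumed from the first slide. */
static int f(int a, int b) { return 2 * a + b; }
static int g(int a, int b, int c) { return a * c + c * b; }

/* Baseline: two separate loops with function calls. */
void run_baseline(const int a[MAX], const int b[MAX], const int c[MAX],
                  int r, int x[MAX], int y[MAX])
{
    int i, j, s;
    for (i = 0; i <= MAX - 1; i++)
        x[i] = f(a[i], b[i]);
    s = 2 * r;
    for (j = 0; j <= MAX - 1; j++)
        y[j] = s + g(a[j], b[j], c[j]);
}

/* Optimized: one integrated loop, constant bound 9,
 * g factored as c * (a + b), f and g inlined. */
void run_optimized(const int a[MAX], const int b[MAX], const int c[MAX],
                   int r, int x[MAX], int y[MAX])
{
    int i;
    int s = 2 * r;
    for (i = 0; i <= 9; i++) {
        x[i] = 2 * a[i] + b[i];
        y[i] = s + (a[i] + b[i]) * c[i];
    }
}

/* Returns 1 if both versions produce identical x and y
 * for one arbitrary sample input, otherwise 0. */
int versions_agree(void)
{
    int a[MAX], b[MAX], c[MAX];
    int x1[MAX], y1[MAX], x2[MAX], y2[MAX];
    int i;
    for (i = 0; i < MAX; i++) {
        a[i] = i;
        b[i] = 3 - i;
        c[i] = 2 * i + 1;
    }
    run_baseline(a, b, c, 7, x1, y1);
    run_optimized(a, b, c, 7, x2, y2);
    for (i = 0; i < MAX; i++)
        if (x1[i] != x2[i] || y1[i] != y2[i])
            return 0;
    return 1;
}
```

An optimizing compiler may well perform some of these transformations itself; the hand-optimized version only matters when optimization is switched off.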
5-2 Register lifetime
A processor has this instruction type: op R1, R2, R3, where all three registers must be different. Code to run:
    u = c + d   (1)
    v = a + b   (2)
    w = a + u   (3)
    x = v + e   (4)
How many registers are needed?
Register Life Time Graph
    u = c + d   (1)
    v = a + b   (2)
    w = a + u   (3)
    x = v + e   (4)
Four registers are needed!
Data Flow Graph
A Data Flow Graph can detect data dependencies.
    u = c + d   (1)
    v = a + b   (2)
    w = a + u   (3)
    x = v + e   (4)
- (1) must be before (3), since (3) uses u
- (2) must be before (4), since (4) uses v
(2) and (3) can change execution order!
New Register Life Time Graph
New instruction order:
    u = c + d   (1)
    w = a + u   (2)
    v = a + b   (3)
    x = v + e   (4)
Now only 3 registers are needed. Saving 25 %.
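The register counts can be reproduced with a small sketch. It assumes a simple lifetime model that the slides do not spell out: every variable (operands included) occupies a register from the statement where it first appears to the statement where it last appears, and the number of registers needed is the largest number of variables alive in any one statement. The struct stmt type and the function and array names are hypothetical.

```c
/* One three-address statement: dst = src1 op src2.
 * Variables are named by single letters. */
struct stmt { char dst, src1, src2; };

/* Largest number of variables alive in any single statement,
 * under the assumed first-appearance to last-appearance model. */
int registers_needed(const struct stmt *code, int n)
{
    int first[256], last[256];
    int max_live = 0;

    for (int v = 0; v < 256; v++) {
        first[v] = -1;
        last[v] = -1;
    }
    /* Record the first and last statement in which each variable appears. */
    for (int i = 0; i < n; i++) {
        unsigned char names[3] = {
            (unsigned char)code[i].dst,
            (unsigned char)code[i].src1,
            (unsigned char)code[i].src2
        };
        for (int k = 0; k < 3; k++) {
            if (first[names[k]] < 0)
                first[names[k]] = i;
            last[names[k]] = i;
        }
    }
    /* Count the live variables at each statement and keep the maximum. */
    for (int i = 0; i < n; i++) {
        int live = 0;
        for (int v = 0; v < 256; v++)
            if (first[v] >= 0 && first[v] <= i && i <= last[v])
                live++;
        if (live > max_live)
            max_live = live;
    }
    return max_live;
}

/* Original order: (1) u, (2) v, (3) w, (4) x. */
const struct stmt original_order[4] = {
    {'u', 'c', 'd'}, {'v', 'a', 'b'}, {'w', 'a', 'u'}, {'x', 'v', 'e'}
};

/* New order: v and w swapped. */
const struct stmt new_order[4] = {
    {'u', 'c', 'd'}, {'w', 'a', 'u'}, {'v', 'a', 'b'}, {'x', 'v', 'e'}
};
```

Under this model registers_needed(original_order, 4) gives 4 and registers_needed(new_order, 4) gives 3, matching the two lifetime graphs.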
5-8 CDFG
- Control and Data Flow Graph (CDFG)
- Multiplication takes 3 cycles, all other instructions take 1 cycle. Best/worst execution time?
    y = 0;
    if (mode == 1)
        for (i = 0; i < 5; i++)
            y += a[i] * b[i];
mode = 0: T_best = 1 + 1 = 2
mode = 1: T_worst = 1 + 1 + 1 + (5 + 1) + 5·4 + 5 = 34
Multiply Accumulate operation
c) MAC unit! R1 = R1 + R2 · R3 in one cycle!
    y += a[i] * b[i];   /* one cycle */
T_worst = 1 + 1 + 1 + (5 + 1) + 5·1 + 5 = 19
19/34 = 0.56. With a MAC unit, execution takes 56 % of the ordinary processor's execution time.
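The cycle counts can be recomputed term by term. The sketch below assumes one plausible reading of the terms in the sums (y = 0, the mode test, i = 0, 5 + 1 loop tests, 5 loop bodies, 5 increments); the slides give only the numbers, and the function name cycles and its parameters are hypothetical.

```c
/* Cycle costs from the exercise: multiplication 3 cycles,
 * every other instruction 1 cycle, MAC 1 cycle. */
enum { CYCLES_MUL = 3, CYCLES_OTHER = 1, CYCLES_MAC = 1 };

/* Execution time in cycles of:
 *     y = 0;
 *     if (mode == 1)
 *         for (i = 0; i < iterations; i++)
 *             y += a[i] * b[i];
 * has_mac selects a one-cycle multiply-accumulate for the loop body. */
int cycles(int mode, int iterations, int has_mac)
{
    int body = has_mac ? CYCLES_MAC : CYCLES_MUL + CYCLES_OTHER;
    int t = CYCLES_OTHER        /* y = 0 */
          + CYCLES_OTHER;       /* test mode == 1 */

    if (mode != 1)
        return t;               /* best case: loop is skipped */

    t += CYCLES_OTHER;                      /* i = 0 */
    t += (iterations + 1) * CYCLES_OTHER;   /* loop test, incl. the failing one */
    t += iterations * body;                 /* multiply and add, or one MAC */
    t += iterations * CYCLES_OTHER;         /* i++ */
    return t;
}
```

With these assumptions cycles(0, 5, 0) is 2, cycles(1, 5, 0) is 34 and cycles(1, 5, 1) is 19, the same values as on the slides.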
Processes on a CPU
Scheduling states of a process
Priority Driven Scheduling
- Each process has a fixed priority
- The ready process with the highest priority executes
- A process executes until completion or preemption by a higher-priority process
6-2 Processor utilisation and feasible scheduling
P(execution time, period, deadline): P1(3, 9, 9), P2(1, 2, 2), P3(1, 6, 6)
Timeline: least common multiple of the process periods. 9 = 3·3, 2 = 2, 6 = 2·3, so LCM = 3·3·2 = 18
CPU utilisation: U = 3/9 + 1/2 + 1/6 = 1, that is 100 %!
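The timeline length and the utilisation can be computed with integer arithmetic only. In this sketch the struct task type and the names gcd, hyperperiod, busy_time and tasks_6_2 are hypothetical; the utilisation is the busy time within one timeline divided by the timeline length.

```c
/* A periodic process: P(execution time, period, deadline). */
struct task { int exec, period, deadline; };

/* Greatest common divisor (Euclid's algorithm). */
static int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of all periods: the length of the timeline. */
int hyperperiod(const struct task *t, int n)
{
    int l = 1;
    for (int i = 0; i < n; i++)
        l = l / gcd(l, t[i].period) * t[i].period;
    return l;
}

/* Total CPU time requested during one timeline.
 * Utilisation U = busy_time / hyperperiod. */
int busy_time(const struct task *t, int n)
{
    int h = hyperperiod(t, n);
    int busy = 0;
    for (int i = 0; i < n; i++)
        busy += t[i].exec * (h / t[i].period);
    return busy;
}

/* Process set from exercise 6-2: P1(3, 9, 9), P2(1, 2, 2), P3(1, 6, 6). */
const struct task tasks_6_2[3] = { {3, 9, 9}, {1, 2, 2}, {1, 6, 6} };
```

For this process set hyperperiod gives 18 and busy_time gives 3·2 + 1·9 + 1·3 = 18, so U = 18/18 = 1.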
Rate Monotonic Scheduling
RMS: the shortest period is assigned the highest priority, and so on.
RMS guarantee: a feasible schedule exists if U ≤ n(2^(1/n) - 1).
n = 3: U ≤ 0.78
In this case U = 1, so there is no guarantee!
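The bound n(2^(1/n) - 1) can be evaluated numerically. The sketch below finds the n-th root of 2 by bisection, a choice made here only to avoid depending on the math library; pow from math.h would be the usual way. The names root_of_two and rms_bound are hypothetical.

```c
/* n-th root of 2, found by bisection on the interval [1, 2].
 * Assumes n >= 1. */
static double root_of_two(int n)
{
    double lo = 1.0, hi = 2.0;
    for (int iter = 0; iter < 60; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (int k = 0; k < n; k++)
            p *= mid;               /* p = mid^n */
        if (p < 2.0)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Utilisation bound below which RMS guarantees a feasible
 * schedule for n processes: n * (2^(1/n) - 1). */
double rms_bound(int n)
{
    return n * (root_of_two(n) - 1.0);
}
```

For n = 3 the bound is about 0.78, so a process set with U = 1 lies outside the guarantee. The test is sufficient but not necessary: a set above the bound may or may not be schedulable with RMS.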
RMS figure
Priorities: P2 > P3 > P1 (2 < 6 < 9)
P1 misses the deadline! No feasible schedule with RMS!
Earliest Deadline First Scheduling
EDF guarantee: a feasible schedule exists if U ≤ 1.
In this case U = 1, so EDF will produce a feasible schedule.
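Both results for the process set P1(3, 9, 9), P2(1, 2, 2), P3(1, 6, 6) can be checked with a small unit-time simulation. This is a sketch under assumptions: all processes are released at time 0, preemption happens only at whole time units, scheduling itself costs no time, and ties are broken in favour of the process listed first. The names feasible, enum policy and tasks_6_2 are hypothetical.

```c
/* A periodic process: P(execution time, period, deadline). */
struct task { int exec, period, deadline; };

enum policy { RMS, EDF };

#define MAX_TASKS 8

/* Simulates preemptive scheduling in unit time steps over [0, horizon).
 * Returns 1 if every job meets its deadline, otherwise 0. */
int feasible(const struct task *t, int n, int horizon, enum policy p)
{
    int remaining[MAX_TASKS] = {0};  /* work left in the current job */
    int due[MAX_TASKS] = {0};        /* absolute deadline of the current job */

    if (n > MAX_TASKS)
        return 0;

    for (int now = 0; now < horizon; now++) {
        /* A job with work left at its deadline has missed it. */
        for (int i = 0; i < n; i++)
            if (remaining[i] > 0 && due[i] <= now)
                return 0;

        /* Release a new job at the start of each period. */
        for (int i = 0; i < n; i++) {
            if (now % t[i].period == 0) {
                remaining[i] = t[i].exec;
                due[i] = now + t[i].deadline;
            }
        }

        /* Pick the ready process with the smallest key:
         * period for RMS, absolute deadline for EDF. */
        int pick = -1, pick_key = 0;
        for (int i = 0; i < n; i++) {
            if (remaining[i] > 0) {
                int key = (p == RMS) ? t[i].period : due[i];
                if (pick < 0 || key < pick_key) {
                    pick = i;
                    pick_key = key;
                }
            }
        }
        if (pick >= 0)
            remaining[pick]--;       /* run it for one time unit */
    }

    /* Jobs due by the end of the timeline must be finished. */
    for (int i = 0; i < n; i++)
        if (remaining[i] > 0 && due[i] <= horizon)
            return 0;
    return 1;
}

/* Process set from exercise 6-2. */
const struct task tasks_6_2[3] = { {3, 9, 9}, {1, 2, 2}, {1, 6, 6} };
```

Over the timeline of 18 time units, feasible(tasks_6_2, 3, 18, RMS) returns 0 because P1 has received only 2 of its 3 time units at time 9, while feasible(tasks_6_2, 3, 18, EDF) returns 1.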
6.3 Scheduling and semaphores
P(execution time, period, deadline): P1(1, 3, 3), P2(1, 4, 4), P3(2, 6, 6)
Least common multiple of the process periods: 3 = 3, 4 = 2·2, 6 = 2·3, so LCM = 3·2·2 = 12
RMS: P1 > P2 > P3 (3 < 4 < 6)
Sem1 is a binary semaphore. accessSem1() and releaseSem1() take 0 time.
RMS with no critical sections
RMS with critical sections