General Optimization Issues - PowerPoint PPT Presentation

Slides: 26
Provided by: michael298
Transcript and Presenter's Notes

1
General Optimization Issues
  • Solving the exercise issues

2
To be tackled today
  • Exercise 1: Solving the loop problem, SIZE = 128
  • Exercise 2: Solving the loop problem, SIZE = 127
  • Exercise 3: Moving from SISD to SIMD mode, SIZE = 128
  • Exercise 4: Removing any expected stalls

3
Most optimized SIMD floating-point (32-bit) TigerSHARC instruction
  • xR3:0 = CB Q[j0 += 4]; yR3:0 = CB Q[k0 += 4];
    xyFR4 = R5 * R6; xyFR7 = R8 + R9, FR10 = R8 - R9;;
  • xR3:0 = CB Q[j0 += 4]; /* Fetches 4 values on the J
    bus into x compute registers XR3, XR2,
    XR1, XR0. Increments the J register and
    adjusts for circular buffer
    operation */
  • yR3:0 = CB Q[k0 += 4]; /* Fetches 4 values on the K
    bus into y compute registers YR3, YR2,
    YR1, YR0. Increments the K register and
    adjusts for circular buffer
    operation */
  • xyFR4 = R5 * R6; /* Two multiplications: XFR5 *
    XFR6 and YFR5 * YFR6 */
  • xyFR7 = R8 + R9, FR10 = R8 - R9; /* Two
    additions, XFR8 + XFR9 and YFR8 + YFR9, AND two
    subtractions, XFR8 - XFR9 and YFR8 - YFR9 */
  • /* The same registers must be used on either side
    of the + and - operators */

4
Steps to optimize
  • Get the algorithm to work in C
  • Determine how much time is available
  • If the timing is already okay, quit
  • Determine the maximum number of each type of
    operation (add, subtract, multiply, memory
    fetch)
  • Divide each calculated maximum by the number of
    available resources for that type of operation
  • The largest division result is the theoretical
    minimum number of cycles needed for the algorithm
  • If that minimum time is more than 100% of the
    time available, find a new algorithm
  • If that minimum time is less than 40% of the time
    available, perhaps you can optimize the code to
    meet the speed requirements
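The resource-count estimate in the steps above can be sketched in C. This is only an illustration: the `OpBudget` type, the function names, and any operation counts passed in are hypothetical, and every `units_per_cycle` is assumed to be non-zero.

```c
#include <stddef.h>

/* One entry per operation type (add, multiply, memory fetch, ...).
   Hypothetical type: the counts come from inspecting the algorithm,
   the units from the processor's data sheet. */
typedef struct {
    unsigned ops_needed;       /* total operations of this type */
    unsigned units_per_cycle;  /* how many can be issued per cycle (non-zero) */
} OpBudget;

/* Theoretical minimum cycles: the largest value of
   ceil(ops_needed / units_per_cycle) over all operation types. */
unsigned min_cycles(const OpBudget *budget, size_t count)
{
    unsigned worst = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned units = budget[i].units_per_cycle;
        unsigned cycles = (budget[i].ops_needed + units - 1) / units;
        if (cycles > worst) {
            worst = cycles;
        }
    }
    return worst;
}

/* Share of the available time (in percent, rounded down) that the
   theoretical minimum would consume. */
unsigned percent_of_budget(unsigned cycles, unsigned cycles_available)
{
    return (100u * cycles) / cycles_available;
}
```

With made-up counts of 300 adds on 2 units, 100 multiplies on 2 units, and 400 fetches at 4 per cycle, the bound is set by the adds. If 500 cycles are available, that bound uses 30% of the budget, which falls in the "perhaps you can optimize" range.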

5
Code optimization: 32-bit integers or 32-bit
floats
  • 2 * SIZE additions
  • 2 * SIZE memory fetches
  • Left fetched on J-bus and done in X-compute
  • Right fetched on K-bus and done in Y-compute
  • SIZE / 2 cycles in theory

6
Stage 1 -- Get the C code to work
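The code shown on this slide is not part of the transcript. As a stand-in, here is a minimal sketch of a loop with the same shape as the operation counts given earlier (one fetch and one addition per sample for each of the left and right channels). The function `sum_channels` and its parameters are hypothetical.

```c
/* Hypothetical working C version: accumulate the left and right
   channel buffers separately. One fetch and one add per sample,
   per channel. */
void sum_channels(const float *left, const float *right, int size,
                  float *left_sum, float *right_sum)
{
    float ls = 0.0f;
    float rs = 0.0f;
    for (int i = 0; i < size; i++) {
        ls += left[i];
        rs += right[i];
    }
    *left_sum = ls;
    *right_sum = rs;
}
```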

7
Stage 2 -- Rewrite in simplest format
  • Note naming convention
  • Single operation per line
  • Note other changes
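A sketch of what "simplest format" might look like for a hypothetical channel-summing loop. The naming convention used here (a `J_` prefix for pointers that would live in J registers, `XF_` for floats that would live in X-compute registers) is an assumption for illustration, not necessarily the convention on the slide.

```c
/* Hypothetical "simplest format" rewrite: each line should map onto a
   single assembly instruction, and variable names hint at the register
   each value would occupy. Only the X compute block and the J bus are
   used at this stage. */
void sum_channels_simple(const float *left, const float *right, int size,
                         float *left_sum, float *right_sum)
{
    const float *J_left = left;
    const float *J_right = right;
    float XF_left_sum = 0.0f;
    float XF_right_sum = 0.0f;
    float XF_left;
    float XF_right;
    int count;

    for (count = 0; count < size; count++) {
        XF_left = *J_left++;                     /* load with post-increment */
        XF_left_sum = XF_left_sum + XF_left;     /* one add */
        XF_right = *J_right++;                   /* load with post-increment */
        XF_right_sum = XF_right_sum + XF_right;  /* one add */
    }
    *left_sum = XF_left_sum;
    *right_sum = XF_right_sum;
}
```

Each loop pass has four operation lines, so a loop of this shape would take roughly four cycles per sample if nothing ran in parallel.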

8
Step 3 -- Unwrap the loop
  • Again, note the naming convention
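Unwrapping (unrolling) by two can be sketched as follows, again using a hypothetical channel-summing loop rather than the slide's own code. Note that the loop count becomes `size / 2`, so this version assumes an even `size`.

```c
/* Hypothetical loop unrolled by two: each pass handles two samples
   per channel. With integer division, an odd size would leave the
   last sample unprocessed, so size is assumed to be even here. */
void sum_channels_unrolled(const float *left, const float *right, int size,
                           float *left_sum, float *right_sum)
{
    float ls = 0.0f;
    float rs = 0.0f;
    for (int i = 0; i < size / 2; i++) {
        /* first part: sample 2i */
        ls += left[2 * i];
        rs += right[2 * i];
        /* second part: sample 2i + 1 */
        ls += left[2 * i + 1];
        rs += right[2 * i + 1];
    }
    *left_sum = ls;
    *right_sum = rs;
}
```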

9
Step 4 -- Overlap the first and second parts of
the loop
  • Note: the C code goes no faster, but the
    parallel assembly code translated from this
    format will
  • Step 1: 4 * N
  • Step 3: 8 * (N / 2) + 2
  • Step 4: 6 * (N / 2) + 2

10
Step 5A -- Rearrange start-up and ending code
  • Software pipeline: move the first read outside
    the loop
  • Need to add an extra read at the end of the
    loop
  • Timing 2 (N/2 1) 6
  • Need to adjust the loop start (Is it done
    correctly? Are we out by one?)
  • CAUTION: NEED TO FIX
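A minimal software-pipelining sketch, assuming a hypothetical channel-summing loop. The first reads are moved ahead of the loop, each pass adds the values read earlier and then reads the next ones, and the final adds happen after the loop. Starting the loop index at 1 is one way to keep every read inside the buffer. It is not necessarily the fix intended on the slide.

```c
/* Hypothetical software-pipelined version: reads run one step ahead
   of the adds that consume them. */
void sum_channels_pipelined(const float *left, const float *right, int size,
                            float *left_sum, float *right_sum)
{
    float ls = 0.0f;
    float rs = 0.0f;
    if (size > 0) {
        /* Start-up code: first reads moved outside the loop */
        float l = left[0];
        float r = right[0];
        /* Loop starts at 1 because sample 0 has already been read */
        for (int i = 1; i < size; i++) {
            ls += l;       /* add the value read on the previous pass */
            rs += r;
            l = left[i];   /* read at the end of the loop body */
            r = right[i];
        }
        /* Ending code: the last values read still need adding */
        ls += l;
        rs += r;
    }
    *left_sum = ls;
    *right_sum = rs;
}
```

In assembly, the add and the read inside the loop body are independent of each other, which is what would allow them to be issued in the same instruction line.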

11
Step 5B -- Rearrange start-up and ending code
  • Can now parallelize additional adds and memory
    fetches
  • Note: the loop is still in error

12
Exercise 1 -- Get the loop control correct
  • BUFFER_SIZE = 1, BUFFER_SIZE = 2, BUFFER_SIZE =
    4, BUFFER_SIZE = 5, BUFFER_SIZE = 8, BUFFER_SIZE =
    128
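One way to check loop control across the listed buffer sizes is to compare the rearranged loop against a plain reference loop, with a large guard value placed just past the end of the data so that an over-read shows up in the sum. The sketch below assumes a single-channel sum for brevity. All names (`reference_sum`, `pipelined_sum`, `loop_control_ok`, `GUARD`) are hypothetical.

```c
#include <stdbool.h>

#define MAX_SIZE 128
#define GUARD 1.0e6f  /* placed just past the buffer to expose over-reads */

/* Plain loop used as the known-good answer. */
static float reference_sum(const float *data, int size)
{
    float sum = 0.0f;
    for (int i = 0; i < size; i++) {
        sum += data[i];
    }
    return sum;
}

/* Loop under test: first read hoisted out of the loop (size >= 1). */
static float pipelined_sum(const float *data, int size)
{
    float sum = 0.0f;
    float value = data[0];
    for (int i = 1; i < size; i++) {
        sum += value;
        value = data[i];
    }
    sum += value;
    return sum;
}

/* Returns true when the loop under test agrees with the reference
   for every buffer size listed in the exercise. */
bool loop_control_ok(void)
{
    static const int sizes[] = {1, 2, 4, 5, 8, 128};
    float data[MAX_SIZE + 1];  /* one extra slot for the guard value */

    for (int s = 0; s < (int)(sizeof sizes / sizeof sizes[0]); s++) {
        int size = sizes[s];
        for (int i = 0; i < size; i++) {
            data[i] = (float)(i + 1);
        }
        data[size] = GUARD;  /* adding this by mistake changes the sum */
        if (pipelined_sum(data, size) != reference_sum(data, size)) {
            return false;
        }
    }
    return true;
}
```

The small and odd sizes (1 and 5) are the ones most likely to expose an off-by-one loop count.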

16
"Unrecognized second key" error: What is it? How do
you fix it?

18
Exercise 2 -- Rewrite the code when it is known
that BUFFER_SIZE = 129
  • SIZE = 129, but the loop only handles 128, since
    129 / 2 = 128 / 2 in integer division
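A minimal sketch of one possible fix, assuming a hypothetical single-channel sum unrolled by two: run `size / 2` passes as before, then handle the leftover sample separately when the size is odd. The function name is an assumption.

```c
/* Hypothetical unrolled-by-two sum that also handles odd sizes. */
float sum_unrolled_any_size(const float *data, int size)
{
    float sum = 0.0f;
    int pairs = size / 2;  /* 129 / 2 == 64, the same as 128 / 2 */

    for (int i = 0; i < pairs; i++) {
        sum += data[2 * i];
        sum += data[2 * i + 1];
    }
    if (size % 2 != 0) {
        sum += data[size - 1];  /* leftover sample of an odd-sized buffer */
    }
    return sum;
}
```

When the size is known at compile time to be 129, the `if` test can be dropped and the extra add placed unconditionally after the loop.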

21
Code to this point is SISD parallel optimization
  • SISD: single instruction, single data
  • Using X_compute block and J memory bus
  • Next stage is SIMD: single instruction, multiple
    data
  • Using X_compute block and J memory bus for left
  • Using Y_compute block and K memory bus for right
  • Will need similar but different code when you are
    doing FIR in Lab. 3

22
Exercise 3 -- BUFFER_SIZE = 128: Rewrite so that
X and Y ops are done together
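The idea of doing X and Y operations together can be modelled in plain C by pairing the two values and applying one operation to both halves at once. This is only a conceptual model under assumed names (`XYPair`, `xy_add`, `sum_channels_simd_model`). It is not TigerSHARC code and does not run any faster in C.

```c
/* One value for each compute block. */
typedef struct {
    float x;  /* X compute block: left channel, fetched over the J bus */
    float y;  /* Y compute block: right channel, fetched over the K bus */
} XYPair;

/* Models a single SIMD add: the same operation applied in both blocks. */
static XYPair xy_add(XYPair a, XYPair b)
{
    XYPair result = { a.x + b.x, a.y + b.y };
    return result;
}

/* Hypothetical SIMD-style channel sum: one paired fetch and one paired
   add per sample, instead of two of each. */
void sum_channels_simd_model(const float *left, const float *right, int size,
                             float *left_sum, float *right_sum)
{
    XYPair sum = { 0.0f, 0.0f };
    for (int i = 0; i < size; i++) {
        XYPair value = { left[i], right[i] };  /* both fetches together */
        sum = xy_add(sum, value);              /* both adds together */
    }
    *left_sum = sum.x;
    *right_sum = sum.y;
}
```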

24
Exercise 4 -- BUFFER_SIZE = 128: Rewrite so that
no data dependency stalls are expected
  • BUFFER_SIZE = 1, N = 2, N = 4, N = 5, N = 8, N = 128
  • Leave this one for a while until we have handled
    multiple memory accesses, as the answer may change

25
Tackled today
  • Exercise 1: Solving the loop problem, SIZE = 128
  • Exercise 2: Solving the loop problem, SIZE = 127
  • Exercise 3: Moving from SISD to SIMD mode, SIZE = 128
    (incomplete)
  • Exercise 4: Removing any expected stalls (left for later)