Title: Software and Hardware Circular Buffer Operations
- M. R. Smith, ECE, University of Calgary, Canada
2. Tackled today
- Have moved DCRemoval( ) over to the X compute block
- Circular buffer issues
  - DCRemoval( )
  - FIR( )
- Coding a software circular buffer in C and TigerSHARC assembly code
- Coding a hardware circular buffer
- Where to next?
3. DCRemoval( )
[Figure: DCRemoval( ) code annotated "Memory intensive", "Addition intensive", "Loops for main code" and "FIFO implemented as circular buffer"]
- Not as complex as FIR, but has many of the same requirements
- Easier to handle
- You will use the same ideas when optimizing FIR over Labs 2 and 3
- Two issues: speed and accuracy. Develop suitable tests for the CPP code and check that the various assembly-language versions satisfy the same tests
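A minimal C sketch of a data-move DCRemoval( ) that could serve as the CPP reference the assembly versions are tested against. The names `dc_removal_moves`, `FIFO_LEN` and `LOG2_FIFO_LEN`, the single-channel interface and N = 8 are assumptions for illustration (the lab version uses N = 128 and two channels); it assumes the average is formed by a right shift of log2(N) bits.

```c
#include <assert.h>

/* Hypothetical sizes: the lab uses N = 128; a small N keeps hand checks easy. */
#define FIFO_LEN 8
#define LOG2_FIFO_LEN 3

/* FIFO with the newest sample stored LATEST in the array (DCRemoval ordering).
   Between calls, entries [0 .. FIFO_LEN-2] hold the previous FIFO_LEN-1 samples. */
static int fifo[FIFO_LEN];

/* Remove the running DC level from one sample, updating the FIFO by data moves. */
int dc_removal_moves(int sample)
{
    long sum = 0;

    assert((1 << LOG2_FIFO_LEN) == FIFO_LEN);   /* shift-average needs N = 2^k */

    fifo[FIFO_LEN - 1] = sample;                /* insert newest value at the end */

    for (int i = 0; i < FIFO_LEN; i++)          /* SUM LOOP */
        sum += fifo[i];

    int average = (int)(sum >> LOG2_FIFO_LEN);  /* one shift replaces a divide by N */

    for (int i = 0; i < FIFO_LEN - 1; i++)      /* update FIFO: move every value down */
        fifo[i] = fifo[i + 1];

    return sample - average;
}
```

Feeding a constant input of 8 into a cleared FIFO gives 7, 6, 5, ... while the FIFO fills, settling at 0 once all N entries hold the DC level.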
4. Next stage in improving code speed: software circular buffers
Instruction count for each code section:
- Set up pointers to buffers: 2
- Insert values into buffers: 4
- SUM LOOP: 5N + 4
- SHIFT LOOP: 1 (was 1 + 2 log2(N))
- Update outgoing parameters: 6
- Update FIFO: 3 + 6N
- Function return: 2
- Total: 22 + 11N (was 22 + 11N + 2 log2(N))
- N = 128: 1430 instructions
- 1430 + 300 delay cycles = 1730 cycles
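As a worked check of the totals, assuming the SUM LOOP entry reads 5N + 4 and the Update FIFO entry reads 3 + 6N (the reading that reproduces the 1430-instruction figure):

```latex
\begin{aligned}
T(N)   &= 2 + 4 + (5N + 4) + 1 + 6 + (3 + 6N) + 2 = 22 + 11N \\
T(128) &= 22 + 11 \times 128 = 1430 \ \text{instructions} \\
1430 + 300 &= 1730 \ \text{cycles, including delay cycles}
\end{aligned}
```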
5. DCRemoval( )
[Figure: FIFO implemented as circular buffer]
- If there are N points in the circular buffer, then this approach of moving the data from memory location to memory location requires
  - N memory reads / N memory writes (possible data-bus conflicts)
  - 2N memory address calculations
6. Alternative approach
- Move pointers rather than memory values
- In principle: 1 memory read, 1 memory write, a pointer addition and a conditional equate (wrapping the pointer back to the start of the buffer)
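A minimal C sketch of the pointer-moving alternative, assuming a FIFO of 8 ints and a hypothetical static `next_empty` pointer: inserting a value costs one write, one pointer addition and one conditional equate, however large N is.

```c
#include <assert.h>

#define FIFO_LEN 8   /* hypothetical size; the lab uses 128 */

static int fifo[FIFO_LEN];
static int *next_empty = fifo;   /* hypothetical: where the next sample goes */

/* Store one sample and move the pointer instead of moving the data.
   (In assembly, fetching the static pointer is the one memory read.) */
void cb_insert(int sample)
{
    assert(next_empty >= fifo && next_empty < fifo + FIFO_LEN);
    *next_empty = sample;                 /* 1 memory write */
    next_empty = next_empty + 1;          /* pointer addition */
    if (next_empty == fifo + FIFO_LEN)    /* conditional equate: wrap to start */
        next_empty = fifo;
}
```

After N inserts the pointer is back at the start, so the next insert overwrites the oldest value.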
7. Note: a software circular buffer is NOT necessarily more efficient than data moves
- Watch out: my version of FIR uses a different sort of circular buffer
  - FIR FIFO: newest element earliest in the array (matching the FIR equation)
  - DCRemoval FIFO: newest element latest in the array, because that is the way I thought of it
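The two orderings differ only in which way the pointer moves. A sketch with hypothetical helpers `advance_up` (DCRemoval: newest latest, pointer moves up) and `advance_down` (FIR: newest earliest, pointer moves down), for an assumed buffer length of 4:

```c
#include <assert.h>

#define LEN 4   /* hypothetical buffer length */

/* DCRemoval ordering: newest element LATEST in the array, so the
   next-empty pointer moves UP and wraps from the end back to the start. */
int *advance_up(int *p, int *base)
{
    assert(p >= base && p < base + LEN);
    p = p + 1;
    if (p == base + LEN)
        p = base;
    return p;
}

/* FIR ordering: newest element EARLIEST in the array, so the pointer moves
   DOWN; reading forward from the newest slot then gives x[n], x[n-1], ...
   in the same order as h[0], h[1], ... in the FIR equation. */
int *advance_down(int *p, int *base)
{
    assert(p >= base && p < base + LEN);
    if (p == base)
        p = base + LEN;   /* one past the end, so the decrement lands on the last slot */
    return p - 1;
}
```

Mixing the two conventions between FIR( ) and DCRemoval( ) is an easy way to introduce an off-by-one or reversed-order bug, which is why the tests matter.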
8. Note: a software circular buffer is NOT necessarily more efficient than data moves
- Are we now spending more time moving / checking the software circular buffer pointers than moving the data?
[Figure: comparison labelled SLOWER and FASTER]
9. On TigerSHARC
- Since we can have multiple instructions on one line, then, if we can avoid pipeline delays, the software circular buffer may be faster than memory moves
- Pipeline delay: XR4 R4 R5 followed by XR4 R4 R6. The second instruction needs the result of the first
- No pipeline delay: XR4 R4 R5 followed by XR3 R4 R6. The second instruction DOES NOT need the result of the first
10. Generate the tests for the software circular buffer routine
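One way to generate such tests is to run the software circular buffer version side by side with the data-move reference and count mismatches. A sketch, assuming single-channel functions with N = 8; `dc_removal_moves`, `dc_removal_cb`, `move_fifo`, `cb_fifo` and `compare_versions` are hypothetical names, while `saved_next_pointer` follows the static variable name used in this lecture. Both versions sum the current sample plus the previous N-1 samples, so their outputs should match exactly.

```c
#include <assert.h>

#define FIFO_LEN 8        /* hypothetical; the lab uses 128 */
#define LOG2_FIFO_LEN 3

/* Reference version: FIFO updated by moving data. */
static int move_fifo[FIFO_LEN];

int dc_removal_moves(int sample)
{
    long sum = 0;
    move_fifo[FIFO_LEN - 1] = sample;                 /* newest value at the end */
    for (int i = 0; i < FIFO_LEN; i++)
        sum += move_fifo[i];
    for (int i = 0; i < FIFO_LEN - 1; i++)            /* N-1 reads and writes */
        move_fifo[i] = move_fifo[i + 1];
    return sample - (int)(sum >> LOG2_FIFO_LEN);      /* arithmetic shift assumed */
}

/* Software circular buffer version: FIFO updated by moving a pointer. */
static int cb_fifo[FIFO_LEN];
static int *saved_next_pointer = cb_fifo;

int dc_removal_cb(int sample)
{
    long sum = 0;
    int *next = saved_next_pointer;
    *next = sample;                                   /* overwrite the oldest value */
    for (int i = 0; i < FIFO_LEN; i++)                /* order of addition irrelevant */
        sum += cb_fifo[i];
    next = next + 1;
    if (next == cb_fifo + FIFO_LEN)                   /* wrap to the start */
        next = cb_fifo;
    saved_next_pointer = next;
    return sample - (int)(sum >> LOG2_FIFO_LEN);
}

/* Feed both versions the same samples; return the number of mismatches. */
int compare_versions(const int *samples, int count)
{
    int mismatches = 0;
    assert((1 << LOG2_FIFO_LEN) == FIFO_LEN);
    for (int i = 0; i < count; i++)
        if (dc_removal_moves(samples[i]) != dc_removal_cb(samples[i]))
            mismatches++;
    return mismatches;
}
```

The same comparison idea carries over to the assembly versions: each one should reproduce the reference output sample for sample.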
11. New static pointers needed in the software circular buffer code
12. New sets of register defines: now using many of the TigerSHARC registers
13. Code for storing a new value into the FIFO requires knowledge of the next-empty location
- First, you must get the address where the static variable saved_next_pointer is stored
- Second, you must access that address to get the actual pointer
- Third, you must use the pointer value
- Static variables stored in memory will be a problem in labs and exams
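In C the compiler hides these three steps, but spelling them out with a pointer-to-pointer mirrors what the assembly code must do. A sketch with an assumed 8-entry `fifo` and a hypothetical `store_new_value`; it stores the value but deliberately does not yet advance the pointer.

```c
#include <assert.h>

#define FIFO_LEN 8   /* hypothetical size */

static int fifo[FIFO_LEN];
static int *saved_next_pointer = fifo;   /* the static variable lives in memory */

void store_new_value(int value)
{
    /* Step 1: get the ADDRESS where the static variable is stored. */
    int **where_saved = &saved_next_pointer;

    /* Step 2: read that address to get the actual pointer (a memory read). */
    int *next_empty = *where_saved;
    assert(next_empty >= fifo && next_empty < fifo + FIFO_LEN);

    /* Step 3: use the pointer value to store into the next-empty location. */
    *next_empty = value;
}
```

Forgetting step 2, and storing through the address of the static variable itself, is the classic mistake: it overwrites the saved pointer instead of the FIFO.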
14. Adjustment of the software circular buffer pointer must be done carefully
[Figure: code annotated "Get and update pointer", "Check the pointer" and "Save corrected pointer"]
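A sketch of the three annotated steps in C, assuming the same hypothetical 8-entry `fifo` and `saved_next_pointer`, with an illustrative `advance_saved_pointer`: the pointer is updated, checked against the end of the array and only then saved.

```c
#include <assert.h>

#define FIFO_LEN 8   /* hypothetical size */

static int fifo[FIFO_LEN];
static int *saved_next_pointer = fifo;

void advance_saved_pointer(void)
{
    /* Get and update the pointer. */
    int *p = saved_next_pointer + 1;

    /* Check the pointer: compare against one-past-the-end before it is ever
       dereferenced, and wrap back to the start of the FIFO array. */
    if (p >= fifo + FIFO_LEN)
        p = fifo;

    /* Save the corrected pointer, never the unchecked one. */
    saved_next_pointer = p;
    assert(saved_next_pointer >= fifo && saved_next_pointer < fifo + FIFO_LEN);
}
```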
15. Next stage in improving code speed: software circular buffers
Instruction count for each code section:
- Set up pointers to buffers: 2
- Insert values into buffers: 8 (was 4)
- SUM LOOP: 5N + 4
- SHIFT LOOP: 1 (was 1 + 2 log2(N))
- Update outgoing parameters: 6
- Update FIFO: 14 (was 3 + 6N)
- Function return: 2
- Total: 37 + 5N (was 22 + 11N)
- N = 128: 677 instructions
- 677 + 360 delay cycles = 1011 cycles
- Was 1430 + 300 delay cycles = 1730 cycles
16. Next step: hardware circular buffer
- Do exactly the same pointer calculations as with software circular buffers, but now the calculations are done behind the scenes, at high speed, using specialized pointer features
- Only available with the J0, J1, J2 and J3 registers (on the older ADSP-21061, all pointer registers)
  - Jx: the pointer register
  - JBx: the BASE register, set to the start of the FIFO array
  - JLx: the LENGTH register, set to the length of the FIFO array
- VERY BIG WARNING: reset to zero. On the older ADSP-21061 it was very important that the length register be reset to zero; otherwise all the other functions using that register would suddenly start using circular buffering by mistake
- Resetting is still advisable on TigerSHARC, but there you need special syntax to cause circular buffer operations to occur
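The address arithmetic that the J ALU performs behind the scenes can be modelled in C. This is a hypothetical software model, not TigerSHARC code: `cb_regs` and `cb_post_modify` are illustrative names, addresses are plain integers, and it assumes the modify value is smaller in magnitude than the buffer length.

```c
#include <assert.h>

/* Hypothetical model of one hardware circular-buffer register set. */
typedef struct {
    int j;    /* Jx:  the pointer register          */
    int jb;   /* JBx: BASE, start of the FIFO array */
    int jl;   /* JLx: LENGTH of the FIFO array      */
} cb_regs;

/* Circular post-modify: return the current address for the access, then
   advance the pointer by 'modify', folding it back into [jb, jb + jl). */
int cb_post_modify(cb_regs *r, int modify)
{
    assert(r->jl > 0 && r->j >= r->jb && r->j < r->jb + r->jl);
    assert(modify > -r->jl && modify < r->jl);   /* assumed |modify| < length */

    int addr = r->j;                 /* the access uses the unmodified pointer */
    int next = r->j + modify;
    if (next >= r->jb + r->jl)       /* ran off the end: wrap to the start */
        next -= r->jl;
    else if (next < r->jb)           /* ran off the start: wrap to the end */
        next += r->jl;
    r->j = next;
    return addr;
}
```

Because the wrap is applied only to the updated pointer after the access, this model also shows why circular addressing fits naturally with post-modify operations.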
17. Setting up the circular buffer functions: remember all the tests to start with
18. Store values into the hardware FIFO
- The CB instruction ONLY works on POST-MODIFY operations
19. Now perform the math operation using a circular buffer operation
- MUST NOT DO XR2 CB J0 i_J8
- Saves N cycles, as we no longer need to increment an index
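For DCRemoval( ) the order of the additions does not matter, so the SUM LOOP can start wherever the circular pointer currently points and simply step through N values. A C model of that idea, with a hypothetical `sum_from_pointer` in which the index `j` stands in for J0; in hardware the wrap happens inside the post-modify access, whereas C has to spell it out.

```c
#include <assert.h>

/* Sum 'len' values of a circular buffer, starting at index 'start'.
   Each iteration does one load and one circular pointer update; with a
   hardware circular buffer the update is part of the load instruction,
   so no separate index increment is needed. */
long sum_from_pointer(const int *base, int len, int start)
{
    assert(len > 0 && start >= 0 && start < len);
    long sum = 0;
    int j = start;                           /* plays the role of J0 */
    for (int n = 0; n < len; n++) {
        sum += base[j];                      /* circular post-modify load */
        j = (j + 1 == len) ? 0 : j + 1;      /* wrap, done by hardware on TigerSHARC */
    }
    return sum;
}
```

Any starting point gives the same total, which is what lets the loop begin at the current FIFO pointer.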
20. Update the static variables: further special CB instructions
- A few cycles are saved here
21. Next stage in improving code speed: hardware circular buffers
Instruction count for each code section:
- Set up pointers to buffers: 2
- Insert values into buffers: 8 (was 4)
- SUM LOOP: 4N + 4 (was 5N + 4)
- SHIFT LOOP: 1 (was 1 + 2 log2(N))
- Update outgoing parameters: 6
- Update FIFO: 14 (was 3 + 6N)
- Function return: 2
- Total: 37 + 4N (was 37 + 5N)
- N = 128: 549 instructions
- 549 + 300 delay cycles = 879 cycles. Delays are now > 50% of useful time
- Was 677 + 360 delay cycles = 1011 cycles
22. Tackle the summation part of FIR: exercise in using CB (Assignment 2)
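As a starting point for the exercise, a C sketch of the FIR summation over a circular buffer that stores the newest sample earliest, so stepping forward from the newest sample pairs h[k] with x[n-k]. `fir_cb`, `x_buf`, `newest` and the 4-tap length are hypothetical; this is a reference model for testing, not the assembly solution.

```c
#include <assert.h>

#define NTAPS 4   /* hypothetical filter length */

static int x_buf[NTAPS];   /* FIR FIFO: newest sample EARLIEST, index moves down */
static int newest = 0;     /* index of the newest sample */

/* y[n] = sum over k of h[k] * x[n-k], reading x[n-k] from the circular buffer
   starting at the newest sample and stepping FORWARD with wrap-around. */
long fir_cb(const int h[NTAPS], int sample)
{
    newest = (newest == 0) ? NTAPS - 1 : newest - 1;   /* move down, overwrite oldest */
    x_buf[newest] = sample;

    long y = 0;
    int j = newest;
    for (int k = 0; k < NTAPS; k++) {
        y += (long)h[k] * x_buf[j];                     /* h[k] pairs with x[n-k] */
        j = (j + 1 == NTAPS) ? 0 : j + 1;               /* circular post-increment */
    }
    assert(j == newest);   /* one full lap brings the index back */
    return y;
}
```

An impulse input (1, 0, 0, 0, 0) with h = {1, 2, 3, 4} should return the coefficients in order followed by 0, which makes a convenient first test for the assembly version.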
23. Place assembly code here
24. The code is too slow because we are not taking advantage of the available resources
- Bring in up to 128 bits (4 instructions) per cycle
- Ability to bring in four 32-bit values along the J data bus (data1) and four along the K bus (data2)
- Perform address calculations in the J and K ALUs: single-cycle hardware circular buffers
- Perform math operations on both the X and Y compute blocks
- Background DMA activity
- Off-load some of the processing to the second processor