Software and Hardware Circular Buffer Operations

M. R. Smith, ECE, University of Calgary, Canada



1
Software and Hardware Circular Buffer Operations
  • M. R. Smith, ECE, University of Calgary, Canada

2
Tackled today
  • Have moved DCRemoval( ) over to the X Compute
    block
  • Circular Buffer Issues
  • DCRemoval( )
  • FIR( )
  • Coding a software circular buffer in C and
    TigerSHARC assembly code
  • Coding a hardware circular buffer
  • Where to next?

3
DCRemoval( )
Memory intensive, addition intensive. Loops
for main code. FIFO implemented as
circular buffer
  • Not as complex as FIR, but many of the same
    requirements
  • Easier to handle
  • You will use the same ideas in optimizing FIR over Labs 2
    and 3
  • Two issues: speed and accuracy. Develop suitable
    tests for the C++ code and check that the various
    assembly language versions satisfy the same tests

4
Next stage in improving code speed: software
circular buffers
  • Set up pointers to buffers: 2
  • Insert values into buffers: 4
  • SUM LOOP: 4 + N * 5
  • SHIFT LOOP: 1 (was 1 + 2 * log2N)
  • Update outgoing parameters: 6
  • Update FIFO: 3 + 6 * N
  • Function return: 2
  • ---------------------------
  • Total: 22 + 11 N (was 22 + 11 N + 2 log2N)
  • N = 128: 1430 instructions
  • 1430 instructions, 300 delay cycles, 1730 cycles

5
DCRemoval( )
FIFO implemented as circular buffer
  • If there are N points in the circular buffer,
    then this approach of moving the data from memory
    location to memory location requires
  • N memory reads / N memory writes (possible data bus
    conflicts)
  • 2N memory address calculations
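
The data-move approach described above can be sketched in C. This is only an illustration: the function name `fifo_shift_insert` is hypothetical, and it assumes the DCRemoval( ) convention from this deck (newest element latest in the array).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: shift every sample one slot toward index 0,
   then store the newest sample in the last slot.
   Cost grows with N: about N reads, N writes and 2N address calculations. */
void fifo_shift_insert(int *fifo, size_t n, int newest)
{
    assert(n > 0);
    for (size_t i = 0; i + 1 < n; i++) {
        fifo[i] = fifo[i + 1];   /* one memory read and one memory write */
    }
    fifo[n - 1] = newest;        /* newest element is latest in the array */
}
```

Calling `fifo_shift_insert` on `{1, 2, 3, 4}` with a new sample of 5 leaves `{2, 3, 4, 5}` in the array.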

6
Alternative approach
  • Move pointers rather than memory values
  • In principle: 1 memory read, 1 memory write,
    pointer addition, conditional equate
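
A minimal C sketch of moving the pointer instead of the data. The struct `SoftCircBuf` and the function `softcb_insert` are hypothetical names used only for illustration; an index stands in for the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software circular buffer state */
typedef struct {
    int   *base;    /* start of the FIFO array */
    size_t length;  /* number of elements in the FIFO */
    size_t next;    /* slot to overwrite next (holds the oldest sample) */
} SoftCircBuf;

void softcb_insert(SoftCircBuf *cb, int newest)
{
    assert(cb->length > 0 && cb->next < cb->length);
    cb->base[cb->next] = newest;    /* 1 memory write */
    cb->next += 1;                  /* pointer addition */
    if (cb->next == cb->length) {   /* conditional equate: wrap to start */
        cb->next = 0;
    }
}
```

The cost per new sample no longer depends on N, but every access now carries the wrap check.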

7
Note: software circular buffer is NOT necessarily
more efficient than data moves
  • Watch out: my version of FIR uses a different
    sort of circular buffer
  • FIR FIFO: newest element earliest in array
    (matching FIR equation)
  • DCRemoval FIFO: newest element latest in array
    because that is the way I thought of it

8
Note: software circular buffer is NOT necessarily
more efficient than data moves
  • Are we now spending more time on moving / checking the
    software circular buffer pointers than on moving the
    data?

[Figure labels: SLOWER / FASTER]
9
On TigerSHARC
  • Since we can have multiple instructions on one
    line, then perhaps if we can avoid pipeline
    delays the software circular buffer is faster
    than memory moves

Pipeline delay (second instruction needs result of first): XR4 R4 R5, then XR4 R4 R6
No pipeline delay (second instruction DOES NOT need result of first): XR4 R4 R5, then XR3 R4 R6
10
Generate the tests for the software circular
buffer routine
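
One way to generate such tests, sketched in C: feed the same samples to a data-move reference and to the circular version, then compare the FIFO sums. Everything here is an assumption for illustration (the length `TEST_N`, the function names, the synthetic input); the actual DCRemoval( ) tests are not reproduced in this transcript.

```c
#include <assert.h>
#include <stddef.h>

#define TEST_N 8   /* hypothetical FIFO length */

static int ref_fifo[TEST_N];    /* reference: data-move FIFO */
static int circ_fifo[TEST_N];   /* version under test: circular FIFO */
static size_t circ_next = 0;

/* Reference version: shift the data, insert, then sum the FIFO */
int reference_sum(int x)
{
    int sum = 0;
    for (size_t i = 0; i + 1 < TEST_N; i++) ref_fifo[i] = ref_fifo[i + 1];
    ref_fifo[TEST_N - 1] = x;
    for (size_t i = 0; i < TEST_N; i++) sum += ref_fifo[i];
    return sum;
}

/* Circular version: overwrite the oldest slot, wrap the index, then sum */
int circular_sum(int x)
{
    int sum = 0;
    circ_fifo[circ_next] = x;
    circ_next = (circ_next + 1 == TEST_N) ? 0 : circ_next + 1;
    for (size_t i = 0; i < TEST_N; i++) sum += circ_fifo[i];
    return sum;
}

/* Returns how many of `count` synthetic samples gave different sums */
int count_mismatches(int count)
{
    int mismatches = 0;
    assert(count >= 0);
    for (int k = 0; k < count; k++) {
        int x = (k * 37) % 101 - 50;   /* arbitrary repeatable test signal */
        if (reference_sum(x) != circular_sum(x)) mismatches++;
    }
    return mismatches;
}
```

Running more samples than the FIFO length matters, since only then does the wrap-around path get exercised.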
11
New static pointers needed in Software circular
buffer code
12
New sets of register defines. Now using many of
the TigerSHARC registers
13
Code for storing new value into FIFO requires
knowledge of next-empty location
  • First, you must get the address where the
    static variable saved_next_pointer is stored
  • Second, you must access that address to get the
    actual pointer
  • Third, you must use the pointer value
  • This will be a problem in labs and exams with static
    variables stored in memory
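
The three steps can be spelled out in C, where the compiler normally hides them. This is a sketch only: `FIFO_LEN`, `fifo` and `store_new_value` are hypothetical names, and only `saved_next_pointer` comes from the slide.

```c
#include <assert.h>

#define FIFO_LEN 4   /* hypothetical FIFO length */

static int  fifo[FIFO_LEN];
static int *saved_next_pointer = fifo;   /* static: lives in memory between calls */

void store_new_value(int value)
{
    int **address_of_static = &saved_next_pointer;  /* first: address of the static variable */
    int  *next_empty = *address_of_static;          /* second: read that address to get the pointer */

    assert(next_empty >= fifo && next_empty < fifo + FIFO_LEN);
    *next_empty = value;                            /* third: use the pointer value */
}
```

In assembly each of these steps is a separate instruction, which is where the extra memory access comes from.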

14
Adjustment of software circular buffer pointer
must be done carefully
  • Get and update pointer
  • Check the pointer
  • Save corrected pointer
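
A C sketch of the get / check / save sequence, under the same assumptions as the slide: the next-empty pointer is kept in a static variable. `DC_FIFO_LEN`, `dc_fifo`, `dc_next_pointer` and `insert_and_advance` are hypothetical names.

```c
#include <assert.h>

#define DC_FIFO_LEN 4   /* hypothetical FIFO length */

static int  dc_fifo[DC_FIFO_LEN];
static int *dc_next_pointer = dc_fifo;   /* saved between calls */

void insert_and_advance(int value)
{
    int *p = dc_next_pointer;            /* get the pointer */
    assert(p >= dc_fifo && p < dc_fifo + DC_FIFO_LEN);

    *p = value;
    p = p + 1;                           /* update the pointer */

    if (p == dc_fifo + DC_FIFO_LEN) {    /* check: stepped past the end? */
        p = dc_fifo;                     /* correct it back to the start */
    }

    dc_next_pointer = p;                 /* save the corrected pointer */
}
```

Saving before checking would leave an out-of-range pointer in memory for the next call, which is why the order matters.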
15
Next stage in improving code speed: software
circular buffers
  • Set up pointers to buffers: 2
  • Insert values into buffers: 8 (was 4)
  • SUM LOOP: 4 + N * 5
  • SHIFT LOOP: 1 (was 1 + 2 * log2N)
  • Update outgoing parameters: 6
  • Update FIFO: 14 (was 3 + 6 * N)
  • Function return: 2
  • ---------------------------
  • Total: 37 + 5 N (was 22 + 11 N)
  • N = 128: 677 instructions
  • 677 instructions, 360 delay cycles, 1011 cycles
  • Was: 1430 instructions, 300 delay cycles, 1730 cycles
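
The instruction totals can be checked by adding up the per-stage counts on this slide. The sketch below does that in C; the function names are hypothetical, and delay cycles are left out because they are measured, not derived from the counts.

```c
#include <assert.h>

/* Data-move version: the FIFO update costs 3 + 6N */
unsigned data_move_instructions(unsigned n)
{
    assert(n > 0);
    /* setup + insert + sum loop + shift + outgoing + FIFO update + return */
    return 2 + 4 + (4 + 5 * n) + 1 + 6 + (3 + 6 * n) + 2;
}

/* Software circular buffer version: the FIFO update is a fixed 14 */
unsigned software_cb_instructions(unsigned n)
{
    assert(n > 0);
    return 2 + 8 + (4 + 5 * n) + 1 + 6 + 14 + 2;
}
```

For N = 128 these give 1430 and 677 instructions, matching the figures quoted on the slide.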

16
Next step: hardware circular buffer
  • Do exactly the same pointer calculations as with
    software circular buffers, but now the
    calculations are done behind the scenes at high
    speed using specialized pointer features
  • Only available with the J0, J1, J2 and J3 registers
    (on the older ADSP-21061, all pointer registers)
  • Jx -- the pointer register
  • JBx -- the BASE register, set to the start of the
    FIFO array
  • JLx -- the LENGTH register, set to the length of the
    FIFO array
  • VERY BIG WARNING? Reset the length register to zero. On the older
    ADSP-21061 it was very important that the length
    register be reset to zero; otherwise all the
    other functions using this register would
    suddenly start using circular buffers by mistake.
  • Still advisable, but special syntax is now needed to
    cause circular buffer operations to occur
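
To make the roles of the three registers concrete, here is a small C model of a circular post-modify access. It is an assumption-laden sketch, not the processor's documented behaviour: the struct `CircRegs`, the function `cb_post_modify` and the exact wrap rule (offset taken modulo the length) are illustrative choices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one circular-buffer register set */
typedef struct {
    size_t j;    /* models Jx:  current pointer (word address) */
    size_t jb;   /* models JBx: base address of the FIFO array */
    size_t jl;   /* models JLx: length of the FIFO array */
} CircRegs;

/* Return the address to use now, then post-modify the pointer with wrap */
size_t cb_post_modify(CircRegs *r, ptrdiff_t modifier)
{
    assert(r->jl > 0 && r->j >= r->jb && r->j < r->jb + r->jl);

    size_t    address = r->j;
    ptrdiff_t length  = (ptrdiff_t)r->jl;
    ptrdiff_t offset  = ((ptrdiff_t)(r->j - r->jb) + modifier) % length;

    if (offset < 0) {
        offset += length;            /* wrap when stepping backwards */
    }
    r->j = r->jb + (size_t)offset;   /* pointer stays inside [jb, jb + jl) */
    return address;
}
```

In the software version this wrap costs explicit instructions on every access; the point of the hardware version is that the same arithmetic happens as part of the memory access itself.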

17
Setting up the circular buffer functions. Remember
all the tests to start with
18
Store values into hardware FIFO
  • CB instruction ONLY works on POST-MODIFY
    operations

19
Now perform Math operation using circular buffer
operation
  • MUST NOT DO: XR2 CB J0 i_J8
  • Save N cycles as no longer need to increment index

20
Update the static variables. Further special CB
instructions
A few cycles saved here
21
Next stage in improving code speed: hardware
circular buffers
  • Set up pointers to buffers: 2
  • Insert values into buffers: 8 (was 4)
  • SUM LOOP: 3 + N * 4 (was 4 + N * 5)
  • SHIFT LOOP: 1 (was 1 + 2 * log2N)
  • Update outgoing parameters: 6
  • Update FIFO: 14 (was 3 + 6 * N)
  • Function return: 2
  • ---------------------------
  • Total: 37 + 4 N (was 37 + 5 N)
  • N = 128: 549 instructions
  • 549 instructions, 300 delay cycles, 879 cycles. Delays are now
    >50% of useful time
  • Was: 677 instructions, 360 delay cycles, 1011 cycles

22
Tackle the summation part of FIR: exercise in
using CB (Assignment 2)
23
Place assembly code here
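
As a reference point before writing the assembly, the FIR summation over a circular buffer might look like this in C. It is a sketch under assumptions: it uses the FIR FIFO convention from earlier in this deck (newest element earliest, older samples at increasing positions), and the name `fir_circular` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of coeffs[k] * x[n - k], where delay[newest] holds x[n] and
   older samples follow at increasing positions, wrapping at the end. */
int fir_circular(const int *coeffs, const int *delay, size_t taps, size_t newest)
{
    assert(taps > 0 && newest < taps);

    int    sum = 0;
    size_t pos = newest;

    for (size_t k = 0; k < taps; k++) {
        sum += coeffs[k] * delay[pos];
        pos = (pos + 1 == taps) ? 0 : pos + 1;   /* the step a CB post-modify would do */
    }
    return sum;
}
```

With coefficients `{1, 2, 3}`, delay line `{10, 20, 30}` and the newest sample at index 1, the sum is 1*20 + 2*30 + 3*10 = 110.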
24
The code is too slow because we are not taking
advantage of the available resources
  • Bring in up to 128 bits (4 instructions) per
    cycle
  • Ability to bring in 4 32-bit values along J data
    bus (data1) and 4 along K bus (data2)
  • Perform address calculations in the J and K ALUs:
    single-cycle hardware circular buffers
  • Perform math operations on both X and Y compute
    blocks
  • Background DMA activity
  • Off-load some of the processing to the second
    processor

25
Tackled today
  • Have moved DCRemoval( ) over to the X Compute
    block
  • Circular Buffer Issues
  • DCRemoval( )
  • FIR( )
  • Coding a software circular buffer in C and
    TigerSHARC assembly code
  • Coding a hardware circular buffer
  • Where to next?