LPC Speech Coder on the TI C6x DSP

Slides: 27
Provided by: jeffrey181
Learn more at: http://www.seas.ucla.edu

1
LPC Speech Coder on the TI C6x DSP
  • Mark Anderson, Jeff Burke
  • EE213A / EE298-2
  • Prof. Ingrid Verbauwhede

2
Summary
  • Implementation platform
  • Texas Instruments TMS320C6000
  • Low-quantity cost: US$35 (C6211)
  • Architecture clock frequency
  • 150 MHz (C6211)
  • Throughput
  • 75-80 channels @ 8000 samples/sec

3
Summary
  • Total energy per sample
  • 1.8 uJ/sample
  • Area
  • 1.2% of cycle budget per channel per frame
  • 8.5% of unified memory per channel
  • 25% of unified memory for algorithm

4
Summary
  • Flexibility of implementation
  • Highly programmable processor with C compiler, GUI
    debugger, and simulator
  • SegSNR_A
  • ?
  • SegSNR_Q
  • 26 dB (voiced segments)

5
Architecture overview
  • 256-bit VLIW
  • Two clustered data paths
  • Four functional units in each data path
  • 16x16 multiply
  • Two ALUs
  • Data addressing unit
  • 32-bit instruction for each functional unit
  • (256-bit instruction for 8 functional units)

6
Data path diagram
7
Architecture overview
  • Split register file
  • Only two cross-paths exist
  • Cluster is limited to one source read from
    opposite register file per cycle.
  • Data types
  • 8, 16, 32-bit with 40-bit accumulate
  • 40-bit register pair

8
Memory architecture
  • C6211 (US$35) has a cache!
  • 4kB L1 Instruction cache (L1P)
  • 4kB L1 Data cache (L1D)
  • 64kB L2 Unified memory and/or cache
  • Extra DMA channels

9
Memory architecture
10
Design Tools
  • Command-line
  • Compiler, debugger, simulator
  • Code Composer Studio
  • Same tools
  • Windows NT GUI
  • 30-day evaluation license
  • Draconian copy protection, pulls out the rug from
    under you

11
Design Flow
  • Consolidate Matlab reference into a single
    function
  • Matlab rewritten C-style
  • Verified C-style Matlab
  • C prototype created
  • Imported into Code Composer, optimized and simulated

12
Fixed-point quantization
  • Input samples
  • 16-bit, normalized to [-1, 1)
  • <1.15> format used
  • Coefficient quantization
  • Hamming window, pre-emphasis, FIR
  • <1.15> format used
  • No noticeable change in characteristics
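
A minimal sketch of <1.15> quantization as described on this slide, written in Python for readability rather than in the project's C. The helper names (`to_q15`, `from_q15`, `hamming_q15`, `mul_q15`), the round-to-nearest choice, and the Hamming coefficients 0.54/0.46 are assumptions for illustration; the slides do not show the authors' code.

```python
import math

Q15_ONE = 1 << 15        # scale for <1.15>: 1 sign bit, 15 fractional bits
Q15_MAX = Q15_ONE - 1    # 32767, just under +1.0
Q15_MIN = -Q15_ONE       # -32768, exactly -1.0


def to_q15(x):
    """Quantize a real value to a saturated 16-bit <1.15> code (assumed rounding)."""
    code = int(round(x * Q15_ONE))
    return max(Q15_MIN, min(Q15_MAX, code))


def from_q15(code):
    """Convert a <1.15> code back to a real value."""
    return code / Q15_ONE


def hamming_q15(n):
    """Hamming window of length n with coefficients stored in <1.15>."""
    return [to_q15(0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i in range(n)]


def mul_q15(a, b):
    """16x16 multiply: Q15 * Q15 is Q30, rounded and shifted back to Q15."""
    prod = (a * b + (1 << 14)) >> 15
    return max(Q15_MIN, min(Q15_MAX, prod))
```

For a 160-sample window (20 ms at 8000 samples/sec, assuming the window spans one frame), the quantization error per coefficient stays within half an LSB, which is consistent with the slide's remark that the characteristics do not change noticeably.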

13
Fixed-point quantization
  • Most values 16 bit
  • Take advantage of 16x16 fast multipliers
  • Remain close to other class implementations
  • Add metric for overpowered LPC engine
  • Use of channels as performance metric

14
Fixed-point quantization
  • Energy stored in <5.27>
  • Prevent overflow, provide precision for low
    energy segments
  • Temporary values stored in <10.30>
  • Take advantage of extended precision
  • Modified autocorrelation used <16.0>
  • All whole numbers
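
One plausible reading of these formats, sketched in Python: a <1.15> x <1.15> product has 30 fractional bits, and summing such products in a 40-bit accumulator leaves 10 bits above the binary point, which matches <10.30>. Dropping 3 fractional bits and saturating to 32 bits then gives <5.27>. Whether the authors derived the formats this way is an assumption, and the function names are hypothetical.

```python
ACC_BITS = 40   # accumulator width assumed from the "40-bit accumulate" slide


def sat(value, bits):
    """Saturate a Python integer to a signed two's-complement range."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def frame_energy_q30(x_q15):
    """Sum of squares of <1.15> samples, held as a 40-bit <10.30> value."""
    acc = 0
    for s in x_q15:
        acc = sat(acc + s * s, ACC_BITS)   # each product is Q30
    return acc


def q30_to_q27(acc):
    """Narrow <10.30> to 32-bit <5.27>: drop 3 fractional bits, saturate."""
    return sat(acc >> 3, 32)
```

Four samples of 0.5 give an energy of exactly 1.0 (code `1 << 27` in <5.27>), while a full-scale 160-sample frame saturates at the 32-bit maximum instead of wrapping around.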

15
Fixed-Point SNR
  • Matlab simulation of magnitude truncation
  • Tools again.
  • SegSNR_A: ?
  • SegSNR_Q: 26 dB
  • Voiced segments only
  • Sent_female test data

16
Performance results
  • Initial version: 80,000 CPU cycles/frame
  • Optimization
  • Take advantage of VLIW, pipelining
  • Observe assembly, modify C loops
  • Use TI's DSP Library
  • Assembly advantage without assembly
  • Optimized version: 30,182 cycles/frame
  • Had to stop early, still at least 5K cycles wasted

17
Performance
  • Then, the tool license expired.
  • The tool would not install on other machines.
  • TI responded, but wasn't too helpful.
  • Moral 1: Avoid the evaluation version.
  • Moral 2: Give tools away to sell hardware

18
Cycle count details
19
Additional optimizations
  • Use more DSPLIB routines
  • Autocorrelation
  • Assembly-level optimization
  • Code size reduction?
  • Reduce number of buffers to reduce L1D usage per
    frame

20
Energy per sample
  • C6211 consumes 1.24W
  • 75% high activity / 25% low activity
  • 1.24 W / 80 channels = 15.5 mW/channel
  • 15.5 mJ/sec/channel x 1/8000 = 1.8 uJ / sample
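
The energy arithmetic on this slide can be checked with a few lines of Python. The function name is hypothetical. With 80 channels the division comes out near 1.94 uJ/sample; the 1.8 uJ figure is reproduced when the 86-channel estimate from the Memory slide is used instead, so which channel count the authors intended is an open assumption.

```python
POWER_W = 1.24          # C6211 power from the slide
SAMPLE_RATE_HZ = 8000   # samples per second per channel


def energy_per_sample_uj(channels):
    """Energy per sample per channel in microjoules."""
    watts_per_channel = POWER_W / channels
    return watts_per_channel / SAMPLE_RATE_HZ * 1e6


if __name__ == "__main__":
    for ch in (80, 86):
        print(ch, "channels:", round(energy_per_sample_uj(ch), 2), "uJ/sample")
```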

21
Number of channels
  • 150 x 10^6 cycles/sec x 0.02 sec/frame
  • = 3.0 x 10^6 cycles/frame
  • 3.0 x 10^6 cycles/frame / 30,182 cycles
  • = 99 channels

22
Memory
  • C6211 Cache complicates estimates
  • Performance is 85-99% of optimal for typical
    applications
  • 30,182 cycles becomes 35,508 cycles/frame at
    85% efficiency => now support only 86 channels
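
The channel estimates from the cycle budget can be reproduced with the sketch below; the function name and the way cache efficiency is applied (dividing the per-frame cycle count by the efficiency) are assumptions. At 100% efficiency it gives 99 channels as on the previous slide. At 85% it gives 35,508 cycles/frame and about 84 channels, slightly below the 86 quoted here, so the slide may have used a different rounding or efficiency.

```python
import math

CLOCK_HZ = 150_000_000     # C6211 clock
FRAME_MS = 20              # 0.02 sec/frame
CYCLES_PER_FRAME = 30182   # optimized coder, per channel


def channel_estimate(efficiency=1.0):
    """Return (effective cycles per frame, whole channels that fit)."""
    budget = CLOCK_HZ * FRAME_MS // 1000          # cycles available per frame
    effective = CYCLES_PER_FRAME / efficiency     # derate for cache misses
    return round(effective), math.floor(budget / effective)


if __name__ == "__main__":
    print(channel_estimate(1.0))
    print(channel_estimate(0.85))
```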

23
Memory
  • Try to account for off-chip memory transfers
  • 220,000 cycles for 150 ns fetches for 80
    channels => support 75-80 channels
  • Unable to verify/simulate because of unexpected
    tool expiration

24
Memory
  • L2 usage
  • 16 kB code size, thanks to VLIW
  • 512 32-byte instruction clusters
  • More suited for C6201 and larger processors
  • Remaining used by data for channels
  • 480 bytes each (8.5% of remaining memory)
  • L1 usage
  • L1P: Can't tell because of cache
  • L1D: 2.2 kB (56%)
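
A rough L2 budget from the numbers on this slide, assuming 1 kB = 1024 bytes and that all L2 left over after code is available for channel data. The function name and dictionary keys are hypothetical. Under those assumptions the code takes a quarter of L2, and memory alone would allow roughly 100 channels, so the cycle budget rather than L2 capacity appears to be the tighter limit.

```python
KB = 1024                      # assumed size of 1 kB
L2_BYTES = 64 * KB             # unified L2 on the C6211
CODE_BYTES = 512 * 32          # 512 instruction clusters of 32 bytes
PER_CHANNEL_BYTES = 480        # per-channel data from the slide


def l2_budget():
    """Summarize L2 use: code share, bytes left for data, channels that fit."""
    data_bytes = L2_BYTES - CODE_BYTES
    return {
        "code_fraction": CODE_BYTES / L2_BYTES,
        "data_bytes": data_bytes,
        "channels_by_memory": data_bytes // PER_CHANNEL_BYTES,
    }


if __name__ == "__main__":
    print(l2_budget())
```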

25
Tool comments
  • Powerful, easy to use IDE
  • When it worked.
  • Licensing problems for eval version
  • Debugging support a bit odd
  • puts/printf

26
C6x Conclusions
  • Easily support 75-80 channels of coding
  • 26 dB fixed-point SNR, 16-bit types
  • VLIW: large code size
  • Cache on a low-end DSP!
  • Good tools, but draconian copy protection