Title: C64x DSP in Embedded Systems
1 C64x DSP in Embedded Systems
- Matthias Kassner
- OMAP FAE
- Texas Instruments
2 History of TI DSPs
- 1982 First TI DSP
- TMS320C10
- 20 MHz / 5 MIPS (58,000 transistors)
- 1990 First TI TMS320C50 DSP
- 60 MHz / 30 MIPS (1 million transistors)
- More than two billion C5x family DSPs have been sold to date
- 1997 First C62 and C67 DSPs
- 200 MHz / 1600 MIPS
- 2001 First C64x DSP
- 600 MHz / 4800 MIPS
- 2004 First 1 GHz C64x DSP
- Used for high performance DSP applications
- 2006 First C64x DSP in OMAP
- 330 MHz
- Used as low power multimedia accelerator
3 TMS320C10 Architecture
4 The Basic DSP Algorithm
Coefficients
Results
$y_n = \sum_k (a_k \cdot x_k)$
$y_0 = a_0 x_0 + a_1 x_1 + a_2 x_2 + a_3 x_3$
- 4
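The sum of products on this slide can be sketched in C as a simple multiply-accumulate loop. This is only an illustration: the function name `dot_product`, the 16-bit sample and coefficient types, and the 32-bit accumulator are assumptions made for the example, not details taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products: y = a[0]*x[0] + a[1]*x[1] + ... + a[n-1]*x[n-1].
 * The 16-bit inputs and the 32-bit accumulator are assumptions
 * chosen for illustration. */
int32_t dot_product(const int16_t *a, const int16_t *x, size_t n)
{
    assert(a != NULL && x != NULL);

    int32_t y = 0;
    for (size_t k = 0; k < n; k++) {
        /* One multiply-accumulate (MAC) per coefficient. */
        y += (int32_t)a[k] * (int32_t)x[k];
    }
    return y;
}
```

With four coefficients, as in the expanded form on the slide, the loop body runs four times. The multiply-accumulate in the loop body is the operation the algorithm repeats for every coefficient.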
5 C64x DSP Hardware
6 C64x DSP Core
7 C64x Functional Units
- C64x DSP core uses four different types of functional units (L, S, M and D)
- Each unit provides specific capabilities
- Many instructions can be executed on more than one unit
8 Pipeline - Basics
- Processor instructions typically require several consecutive activities
- Fetch program word from memory
- Decode (determine from the program word what to do)
- Execute the command
- Write (store the result in a register or in memory)
- Problem: Each of these activities takes at least one cycle
- Execution of every instruction would take at least four cycles
- Idea: Start to execute the next instructions before the current instruction has completed
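The effect of the idea above can be estimated with a little arithmetic, sketched here in C. The model is deliberately idealized and rests on assumptions not stated on the slide: every stage takes exactly one cycle and the pipeline never stalls or empties. The function names are hypothetical.

```c
#include <assert.h>

/* Cycles for n instructions when each instruction passes through
 * all stages before the next one starts (no pipelining). */
unsigned long cycles_sequential(unsigned long n, unsigned stages)
{
    return n * stages;
}

/* Cycles for n instructions in an ideal pipeline without stalls:
 * the first instruction needs `stages` cycles, and every further
 * instruction completes one cycle after its predecessor. */
unsigned long cycles_pipelined(unsigned long n, unsigned stages)
{
    assert(stages > 0);
    return (n == 0) ? 0 : stages + (n - 1);
}
```

For 100 instructions and the four stages listed above, this gives 400 cycles without pipelining and 103 cycles with it, under the stated assumptions.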
9 Pipeline - Basics
10 Pipeline Advantages
- Big improvement in performance for linear program sequences
- Improvement increases with the number of pipeline stages
- Better hardware partitioning
- Many smaller hardware blocks instead of one big block
- Better performance with slow operations
- Pipeline stages can be added for slow operations such as memory accesses
11 Pipeline Disadvantages
- Pipeline must be full to provide the advantages
- Pipeline latency (time to execute the first instruction) equals the pipeline length
- Operations such as branches can result in an empty pipeline
- Pipeline can introduce so-called pipeline hazards
- Result of an instruction is needed by the next instruction before it is available
- Protected pipelines automatically wait for the result to be available
- Pipeline protection complexity increases with pipeline length
- Most high-performance processors (such as C64x) use unprotected pipelines
- Pipeline can result in resource conflicts
- Different stages might try to access the same resources (e.g. memories)
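The difference between a protected and an unprotected pipeline can be illustrated with a toy model in C. This is a hypothetical sketch, not a model of the real C64x pipeline: the `ToyCpu` type, its functions, the single in-flight result and the latency chosen by the caller are all assumptions made for the example.

```c
#include <assert.h>

#define NUM_REGS 4

/* Toy machine with at most one result in flight. The result is
 * written to its register a given number of cycles after issue. */
typedef struct {
    int regs[NUM_REGS];
    int pending_reg;     /* destination of the in-flight result, -1 if none */
    int pending_value;   /* value that will be written */
    int pending_cycles;  /* cycles left until the write happens */
} ToyCpu;

void toy_init(ToyCpu *cpu)
{
    for (int i = 0; i < NUM_REGS; i++) {
        cpu->regs[i] = 0;
    }
    cpu->pending_reg = -1;
    cpu->pending_value = 0;
    cpu->pending_cycles = 0;
}

/* Advance the machine by one cycle. */
void toy_tick(ToyCpu *cpu)
{
    if (cpu->pending_reg >= 0 && --cpu->pending_cycles == 0) {
        cpu->regs[cpu->pending_reg] = cpu->pending_value;
        cpu->pending_reg = -1;
    }
}

/* Issue an instruction whose result lands `latency` cycles later. */
void toy_issue(ToyCpu *cpu, int reg, int value, int latency)
{
    assert(reg >= 0 && reg < NUM_REGS);
    assert(latency > 0 && cpu->pending_reg < 0);
    cpu->pending_reg = reg;
    cpu->pending_value = value;
    cpu->pending_cycles = latency;
}

/* Unprotected read: returns the register as it is right now,
 * even if a newer result is still in flight (the hazard). */
int toy_read_unprotected(const ToyCpu *cpu, int reg)
{
    return cpu->regs[reg];
}

/* Protected read: stalls until a pending write to `reg` has
 * landed and reports how many cycles were lost. */
int toy_read_protected(ToyCpu *cpu, int reg, int *stall_cycles)
{
    *stall_cycles = 0;
    while (cpu->pending_reg == reg) {
        toy_tick(cpu);
        (*stall_cycles)++;
    }
    return cpu->regs[reg];
}
```

If a result with a latency of three cycles is issued and read one cycle later, the unprotected read returns the stale value, while the protected read stalls for two cycles and returns the new one. On an unprotected pipeline the waiting has to be arranged in software, by scheduling the dependent instruction late enough.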
12 Scalar, Super-scalar and VLIW Architectures
13 OMAP
14 Application Processors
Communications Processor
Applications Processor
User Interface
Air Interface
- Real-time media processing
- RTOS
- Non-real-time application control
- Advanced O/S
- User Interface
15 OMAP TI Application Processor Family
OMAP1710
OMAP2420
OMAP3430
OMAP2430
Multimedia Processor (high-end)
Smartphone Processor (with Modem)
OMAP850
OMAPV1030
OMAPV1230
OMAP-DM290
OMAP-DM299
OMAP-DM510
Multimedia Accelerators
16 OMAP3430 High-level Architecture
17 C64x Integration into OMAP3430
[Block diagram of the IVA2.2 megacell. Labelled blocks: C64x DSP core (A registers and B registers, each with M, L, D and S units), L1 program memory / cache, L1 data memory / cache, L2 unified memory / cache, extended memory controller, local resource controller, IVA interrupt controller (48 interrupts, NMI), IVA DMA (20 DMA requests), IVA MMU, host interface, config port. Outside the megacell: PRCM (1 wake-up event), ARM interrupt controller, OMAP3430 modules, OMAP3430 system interconnect. Bus widths shown: 32b, 64b, 4 x 64b and 256b.]
18 DSP Operating System
19 What is DSP/BIOS?
- DSP/BIOS is a modular DSP operating system
- Deterministic scheduler
- Low footprint (only uses needed modules)
- Low latency
- Used in products of most handset manufacturers
- Supports all TI DSPs
- Easy to use with graphical configuration interface
- Free of charge, no royalties
20 DSP/BIOS Modules
21 Example - Interrupt Vector Setup
22 Real-time Analysis
Message Logs
CPU Load
Thread Statistical Information
Execution Graph (Software Logic Analyzer)
23 Debug Tools
24 Debug Basics
25 How does debugging work?
- Debugging requires data and command exchange between the Host (PC) and the Target processor (DSP).
- Data Exchange enables
- Program Download
- Data manipulation (Register Setting, Memory read/write...)
- Command Exchange enables
- Execution control (Breakpoints, Watchpoints, Single Step...)
- Processor State control (Reset, Restart, Run, Halt...)
- Two basic options exist for this data exchange
- Use a combination of
- Target Hardware for physical interfacing and
- Target Software for execution control
- → Bootloader
- Use dedicated Target Hardware only → JTAG interface
26 The JTAG Interface
- Developed to test devices that are already soldered into boards
- Old Problem: How to access the device pins?
- New Solution: Boundary Scan Buffer
- Boundary Scan Buffers
- Critical signals are not routed directly from the core to the pins
- Signals go through special Boundary Scan Buffers
- Buffer state mirrors the signal level on the signal line
- Buffer state can be set and read
- Pins can be disconnected from core
- No interaction with external circuitry
- Internal signals can be set / read independently of external world
27 JTAG Scan Chain
- Inputs / outputs of the buffers are not brought out to pins individually
- They are chained together like a shift register
- Required Signals
- Input
- Output
- Clock
- Control
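The shift-register behaviour of the scan chain can be sketched in C as follows. The chain length, the `ScanChain` type and the function names are hypothetical, and the sketch models only the data path: one call stands for one clock cycle, and the control signal that selects between normal operation and shifting is left out. In JTAG terms the four signals listed above correspond to TDI, TDO, TCK and TMS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_LEN 8  /* hypothetical number of boundary scan buffers */

typedef struct {
    uint8_t cell[CHAIN_LEN];  /* one bit per buffer; cell[0] is next to the input */
} ScanChain;

/* One clock cycle: every buffer passes its bit to its neighbour,
 * the input bit enters cell[0], and the bit that leaves the last
 * buffer is returned as the output. */
uint8_t scan_shift_bit(ScanChain *chain, uint8_t data_in)
{
    assert(chain != NULL);

    uint8_t data_out = chain->cell[CHAIN_LEN - 1];
    for (size_t i = CHAIN_LEN - 1; i > 0; i--) {
        chain->cell[i] = chain->cell[i - 1];
    }
    chain->cell[0] = data_in & 1u;
    return data_out;
}

/* Shift a complete pattern in while capturing the previous
 * contents: reading and setting all buffers in a single pass. */
void scan_exchange(ScanChain *chain, const uint8_t *in, uint8_t *out)
{
    for (size_t i = 0; i < CHAIN_LEN; i++) {
        out[i] = scan_shift_bit(chain, in[i]);
    }
}
```

After `CHAIN_LEN` clock cycles the host has read the old state of every buffer and written a new one, using a single input pin and a single output pin regardless of how many buffers are in the chain.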
28 Debug Solutions
29 TI Code Composer Studio
- CCS is an integrated development environment for DSP and ARM processors
- It integrates
- Editor
- Code Generation Tools (Compiler, Assembler, Linker)
- DSP/BIOS Operating System Tools
- Debugger with Breakpoint, Probepoint Capability
- Real Time Data Exchange between Host and Target
- It is flexible
- Can be extended with user-written Plug-ins
- Standardized API
- Will soon move to Eclipse environment
30 CCS IDE
Menu Bar
Icon Bars
Source Code Editor
Project View
Output Windows
Message View
Status Bar
31 Lauterbach Debugger
- Lauterbach is the industry standard ARM debugger
- Fast, efficient and stable
- Very light-weight, fast and fully-customizable GUI
- Rock-solid
- It is nearly impossible to get a Lauterbach to crash
- Fast JTAG access
- Download speed in excess of 1000 Kbytes per second to ARM
- Successful and reliable
- Best selling debug tools set in the world
- Approximately 50,000 systems in use world wide
- LB has been evaluated by all major telecoms outside of China
- LB is the tool of choice in almost all of them (so I heard)
32 TRACE32 GUI
33 Demo
- OMAP2430 Video Decoding by C64x