Survey of Digital Signal Processors - PowerPoint PPT Presentation

About This Presentation

Title:

Survey of Digital Signal Processors

Description:

Gene's Law will have its challenges to hold the line! Digital Audio. MP3. Real Audio ... Buy. Now? Yes No. What's Driving Gene's Law? DSP Design Constraints ...

Slides: 27

Provided by: ECS147

Transcript and Presenter's Notes

1
Survey of Digital Signal Processors

Michael Warner
ECD VLSI Communication Systems

2
Agenda

Industry Trends
DSP Architecture
DSP Micro-Architecture
DSP Systems

3
Agenda

Industry Trends
DSP Architecture
DSP Micro-Architecture
DSP Systems

4
Moore's Law Drives Processor Development
Doubling the number of transistors every 18-24 months at the
same price point drives significant product opportunities,
especially if you have little regard for power
But what if energy-delay had to be reduced by an order of
magnitude every generation?
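
As a rough worked example of the doubling claim on this slide, the sketch below computes the growth factor implied by a fixed doubling period. The function name `growth_factor` and the ten-year horizon are illustrative assumptions, not figures from the presentation.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` for a quantity that
    doubles every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    # Compare the two ends of the 18-24 month range over ten years.
    for months in (18, 24):
        factor = growth_factor(10, months)
        print(f"doubling every {months} months: x{factor:.1f} in 10 years")
```

Under these assumptions the ten-year growth falls between 32x (24-month doubling) and roughly 100x (18-month doubling), which gives a sense of the scale an order-of-magnitude energy-delay target per generation would have to match.
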
5
Gene's Law Drives DSP Development
Gene's Law will have its challenges to hold the
line!
6
What's Driving Gene's Law?
7
DSP Design Constraints
DEVICE CAPABILITIES
8
Agenda

Industry Trends
DSP Architecture
DSP Micro-Architecture
DSP Systems

9
What Makes a DSP a DSP?

Single-Cycle MAC
Multiple Execution Units
High Bandwidth (Flat) Memory Sub-Systems
Efficient Zero-Overhead Looping
Short Pipeline
High Bandwidth I/O
Specialized Instruction Sets
Sophisticated DMA
Little to No Speculation

10
Single Cycle MAC

MACs Typically Determine DSP Performance and
Pipeline Length (EX)
Most DSPs Have 2-8 MAC Units
MACs Typically Operate in Both a Scalar and
Vector Mode
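
To make the role of the MAC concrete, here is a minimal sketch of a direct-form FIR filter written as a chain of multiply-accumulate steps. The names `mac` and `fir` are hypothetical, and plain Python integers stand in for the fixed-width accumulators a real DSP would use (no saturation or rounding is modelled).

```python
def mac(acc: int, x: int, h: int) -> int:
    """One multiply-accumulate step: acc + x * h."""
    return acc + x * h


def fir(samples: list[int], coeffs: list[int]) -> list[int]:
    """Direct-form FIR filter: each output costs len(coeffs) MAC steps.

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc = mac(acc, samples[n - k], h)
        out.append(acc)
    return out
```

On a machine with a single-cycle MAC, each inner-loop iteration would ideally cost one cycle, so the number of MAC units and the ability to feed them with operands largely set the throughput of kernels like this one.
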

11
Multiple Instruction Units

VLIW Architectures Driving ILP
Typical Instruction Units
M-Unit - MAC
S-Unit - Shift
L-Unit - ALU
D-Unit - Load/Store
Industry Has Converged on an ILP of 8
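
The idea of issuing up to eight instructions per cycle across typed units can be sketched as a simple in-order packer. This is only an illustration under stated assumptions: two units of each class (M, S, L, D), hypothetical mnemonics, and no modelling of data dependencies, which a real compiler or assembler programmer would also have to respect.

```python
# Assumed machine: two units per class, eight units in total.
UNITS_PER_CLASS = {"M": 2, "S": 2, "L": 2, "D": 2}


def pack(instrs: list[tuple[str, str]]) -> list[list[str]]:
    """Greedily group instructions, in program order, into execute packets.

    Each instruction is (mnemonic, unit_class). A packet is closed as soon
    as the next instruction needs a unit class that is already fully used.
    Data dependencies are NOT modelled here.
    """
    packets: list[list[str]] = []
    current: list[str] = []
    used = {cls: 0 for cls in UNITS_PER_CLASS}
    for name, cls in instrs:
        if used[cls] == UNITS_PER_CLASS[cls]:
            # No free unit of this class: start a new packet.
            packets.append(current)
            current = []
            used = {c: 0 for c in UNITS_PER_CLASS}
        current.append(name)
        used[cls] += 1
    if current:
        packets.append(current)
    return packets
```

For example, three multiplies in a row would split into a packet of two and a packet of one, because only two M-units are assumed to exist.
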

[Figure: data path diagram. Register files A0 - A15 and B0 - B15; functional units L1, S1, M1, D1 and D2, M2, S2, L2, each with source (S1, S2, SL) and destination (D, DL) ports; cross paths 1X and 2X; load data buses DDATA_I1 and DDATA_I2]
12
High Bandwidth Memory Sub-Systems

Multiple Load-Store Units Required to Feed Data
Path
Tightly Coupled Memory is Typically Dual Ported
Harvard Architecture is Heavily Banked
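
Why banking matters can be shown with a small sketch that checks whether accesses issued in the same cycle collide on a bank. The low-order interleaving scheme, the bank count, and the names `bank_of` and `conflicts` are assumptions for illustration; the slides do not specify a mapping.

```python
def bank_of(addr: int, n_banks: int) -> int:
    """Low-order interleaving: consecutive word addresses map to
    consecutive banks."""
    return addr % n_banks


def conflicts(addrs: list[int], n_banks: int) -> bool:
    """True if two or more same-cycle accesses target the same bank.

    Each bank is assumed to be single ported in this sketch.
    """
    banks = [bank_of(a, n_banks) for a in addrs]
    return len(set(banks)) < len(banks)
```

With four banks, addresses 0 and 1 can be served together, while addresses 0 and 4 land on the same bank and would have to be serialized unless that bank is dual ported.
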

[Figure: block diagram. Labels: PC, CNTL, ARs, P, D, C, E, MUXES, INTERNAL MEMORY, EXTERNAL MEMORY, Central Arithmetic Logic Unit (MAC, ALU, SHIFTER), A, B]
13
Specialized Instruction Sets

Base RISC ISA Plus CISC ISA Driven by End
Application
MAC
SAD
LMS
FIRS
Viterbi
Support For Both Scalar and Vector Instructions
Support For 8, 16 and 32-Bit Instructions
Instructions are Highly Orthogonal
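
Two of the application-driven operations listed above, SAD and LMS, can be sketched in a few lines to show what such specialized instructions fold together. The function names, the floating-point arithmetic, and the step size `mu` are illustrative assumptions; a real DSP would typically perform these in fixed point, with hardware support for part of each inner loop.

```python
def sad(block_a: list[int], block_b: list[int]) -> int:
    """Sum of absolute differences between two equal-length blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def lms_step(weights: list[float], window: list[float],
             desired: float, mu: float) -> tuple[list[float], float]:
    """One LMS adaptive-filter update.

    Computes the filter output (a MAC chain), the error against the
    desired sample, and the updated weights w + mu * err * x.
    """
    y = sum(w * x for w, x in zip(weights, window))
    err = desired - y
    new_weights = [w + mu * err * x for w, x in zip(weights, window)]
    return new_weights, err
```

SAD is widely used for block matching in video motion estimation, and each LMS step pairs a filter MAC with a coefficient update, which is why a combined instruction can pay off in these workloads.
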

14
Scalar (55x) vs VLIW (64x)

Scalar DSPs Tend to be More CISC Like
Hurts Compiler Performance
Improves Energy-Delay
Improves Code Density
Limits Top End Performance
VLIW DSPs Tend to be More RISC Like
RISC GP Regs Orthogonality Makes For a Good C
Compiler
Assembler Code Is Challenging
RISC ISA Allows for Higher Frequencies
Load-Store Hurts Energy-Delay

15
TMS320C54x
16
TMS320C54x Protected Pipeline
Prefetch - Calculate address of instruction
Fetch - Collect instruction
Decode - Interpret instruction
Access - Collect address of operand
Read - Collect operand
Execute - Perform operation
[Figure: fully loaded pipeline over successive cycles; surviving labels: P1, X6]
Note: Protected Pipeline Limits Micro-Architectural Flexibility and Performance
17
TMS320C6xx
[Figure: C6xx CPU core block diagram]
Program Fetch, Instruction Dispatch, Instruction Decode
Data Path 1: A Register File; units L1, S1, M1, D1
Data Path 2: B Register File; units L2, S2, M2, D2
Control Registers, Control Logic, Test, Emulation, Interrupts
Also labelled: Arithmetic Logic Unit, Auxiliary Logic Unit, Multiplier Unit
18
TMS320C6xx Exposed Pipeline
Pipeline phases: Fetch (PG, PS, PW, PR), Decode (DP, DC), Execute (E1, E2, E3, E4, E5)

Fetch
PG Program Address Generate
PS Program Address Send
PW Program Access Ready Wait
PR Program Fetch Packet Receive

Decode
DP Instruction Dispatch
DC Instruction Decode
Execute
E1 - E5 Execute 1 through Execute 5

[Figure: Execute Packets 1 through 7, each shown passing through stages PG, PS, PW, PR, DP, DC, E1, E2, E3, E4, E5]
Note: Exposed Pipeline Adds Risk to Programming Model
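
The staggered flow of execute packets through the eleven stages named on this slide can be sketched as a small timing model. It assumes that packet n enters PG at cycle n and that nothing stalls; the function names are hypothetical and the model says nothing about real per-instruction latencies.

```python
from typing import Optional

STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1", "E2", "E3", "E4", "E5"]


def stage_at(packet: int, cycle: int) -> Optional[str]:
    """Stage occupied by execute packet `packet` (1-based) at `cycle`
    (1-based), assuming packet n enters PG at cycle n with no stalls."""
    idx = cycle - packet
    if 0 <= idx < len(STAGES):
        return STAGES[idx]
    return None


def timing_table(n_packets: int) -> list[str]:
    """One text row per packet; '--' marks cycles where it is not in flight."""
    n_cycles = n_packets + len(STAGES) - 1
    rows = []
    for p in range(1, n_packets + 1):
        cells = [stage_at(p, c) or "--" for c in range(1, n_cycles + 1)]
        rows.append(f"EP{p}: " + " ".join(cells))
    return rows


if __name__ == "__main__":
    print("\n".join(timing_table(7)))
```

In an exposed pipeline it is the programmer or compiler, rather than hardware interlocks, that must account for when each result becomes available, which is the risk the note on this slide refers to.
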
19
Agenda

Industry Trends
DSP Architecture
DSP Micro-Architecture
DSP Systems

20
Micro-Architectural Challenges

Accessing (Flat) On Chip Memory At Speed Within
2-3 cycles
Feeding Multiple Functional Units From a Single
Register File
Running at 600 MHz with a 7-9 Stage Pipeline
Linking Multiple Functional Units with Result
Forwarding
Implementing CISC Data-path to Meet Area and
Performance Goals
Achieving ARM Like Code Density
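
To put the clock-rate and memory-access challenges above into numbers, the sketch below converts a clock frequency into a cycle time and a multi-cycle access budget. The helper names are hypothetical; the 600 MHz and 2-3 cycle figures come from the slide.

```python
def cycle_time_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def access_budget_ns(freq_mhz: float, cycles: int) -> float:
    """Total time available for an access that must finish in `cycles`."""
    return cycles * cycle_time_ns(freq_mhz)


if __name__ == "__main__":
    for cycles in (2, 3):
        print(f"{cycles} cycles at 600 MHz: {access_budget_ns(600, cycles):.2f} ns")
```

At 600 MHz one cycle is about 1.67 ns, so a flat on-chip memory access has roughly 3.3 to 5 ns end to end, including address generation and data return.
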

21
What Does and Doesn't Work?

Do
Banked Memory
Dual Access Memory
Full Custom Register Files
Split/Multiple Register Files
Custom/Semi-Custom Data-paths
Variable Length Instructions
CISC ISA
Co-Processors
Multi-Core
Don't
Multi-Level Caches
Super-Scalar
VLIW Packet Descriptors
Speculative Branching
Full Synthesis
Dynamic Logic
Consider
Multi-Threading

22
Agenda

Industry Trends
DSP Architecture
DSP Micro-Architecture
DSP Systems

23
DSP Systems
24
VoIP Platform

TNETV3010 Features
6 C55x DSPs @ 300 MHz
Shared Instruction Memory
Broadcast DMA
24 Mbits of On-Chip SRAM

25
DaVinci Platform
26
OMAP Platform

OMAP2420 Features
ARM1136 @ 330 MHz, VFP (Vector Floating Point),
32K/32K I/D cache
DSP @ 220 MHz
2D/3D graphics accelerator
IVA supports still images to >4 Mpixels, 30 fps
VGA video decode
Output to TV for gaming and video playback
Encryption hardware for DRM and security

[Figure: OMAP2420 block diagram. Blocks: Imaging Video Accelerator (IVA), 2D/3D Graphics Accelerator, ARM11 VFP, TMS320C55x DSP, L3 Interconnect, LCD I/F and Video Out, Camera I/F, Memory Controller, Internal SRAM, Peripherals, L4 Interconnect, Security]
