Title: Section 1B: Digital Signal Processor Hardware Architecture
1Section 1B Digital Signal Processor Hardware
Architecture
- DSP Architecture Types
- von Neumann
- Harvard, modified Harvard
- VLIW, superscalar
2Overview
- Fixed-Point Arithmetic
- Architecture Types
- Selection Criteria
3Outline
- Why a DSP processor?
- Architecture Types
- von Neumann
- Harvard / Modified Harvard
- Very Long Instruction Word (VLIW)
- Superscalar
4Why specialized DSP processors?
- The main difference between digital signal processing applications and typical applications is the number of multiplications required
- Example
- Typical application (word processing, spreadsheet)
- <10% multiply-and-accumulate (MAC) operations
- DSP application
- Digital filtering, modulation, sound, graphics
- 50-95% MAC operations
5Why specialized DSP processors?
- Processors designed for DSP applications speed up the multiply-and-accumulate (MAC) operation
- A one-clock-cycle hardware multiplier is used
- General-purpose processors may use microcode for multiplication (several additions and shifts)
- May also have multiple data buses for fast data access
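As an illustration of why the MAC matters, here is a minimal C sketch of one output sample of an FIR filter. The function name `fir_sample` and the use of `float` are assumptions made for this example, not something taken from the slides. Each loop iteration is exactly one multiply-and-accumulate.

```c
#include <stddef.h>

/* Compute one FIR output sample: y = sum over k of h[k] * x[k].
 * x[0] is the newest input sample, x[n-1] the oldest.
 * Each loop iteration is one multiply-and-accumulate (MAC). */
float fir_sample(const float *h, const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t k = 0; k < n; k++) {
        acc += h[k] * x[k];   /* one MAC per filter tap */
    }
    return acc;
}
```

An n-tap filter therefore costs n MACs per sample, which is why a single-cycle hardware multiplier pays off so quickly.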
6Why specialized DSP processors?
- Fast multiplier (hardware)
- Hardware for scaling data (barrel shifter)
- on-chip memory (RAM, ROM, FLASH)
- on-chip peripherals
- A/D and D/A converters
- Serial ports
- H/W for compressing and encoding data (e.g. for the Viterbi algorithm)
- Low cost, low power (or high performance)
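To show what "hardware for scaling data" is used for, the sketch below multiplies two fixed-point numbers in C. It assumes the Q15 format (value = raw / 32768), which is a common DSP convention but is not stated in the slides; the name `q15_mul` is hypothetical. The right shift is the step a barrel shifter performs in a single cycle.

```c
#include <stdint.h>

/* Multiply two Q15 fixed-point numbers (value = raw / 32768).
 * The 32-bit product is in Q30 format; shifting right by 15
 * rescales it to Q15. No rounding or saturation is applied,
 * and the shift of a negative product relies on the compiler
 * using an arithmetic shift (true for common toolchains). */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 result */
    return (int16_t)(product >> 15);            /* back to Q15 */
}
```

For example, 0.5 is 16384 in Q15, and `q15_mul(16384, 16384)` returns 8192, which is 0.25.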
8Architecture Types
- There are four main architectures
- von Neumann (e.g. HC12)
- Harvard / Modified Harvard (e.g. TI C5416)
- Multi-issue architectures
- Very Long Instruction Word (VLIW) (e.g. C62x)
- Superscalar (e.g. Pentium 4, ... C64x ...)
- In the past DSPs used primarily
10von Neumann
One bus only
- data and instructions in memory
- one bus for data and instructions
- executes in a linear manner: PC <- PC + 1
- fetches 1 word per cycle
- easy to interface
11von Neumann DSP Processor
- Basic processor with functional blocks shown
- Note: one bus for instructions and data
12von Neumann Basic Operation
- Program and data reside in memory for maximum flexibility
- Basic operation of the computer
- fetch next instruction
- execute instruction
- repeat
13von Neumann Control Registers
- PC: program counter
- IR: instruction register
- MAR: memory address register
- MDR: memory data register
- IO AR: I/O address register
- IO DR: I/O data register
- IMR: interrupt mask register
- IFR: interrupt flag register
- ACC: accumulator (general purpose register)
14von Neumann Diagram
- Program and data are stored in same memory space
- MAR, MDR and IR are not visible to user
- This is a simple interface to allow flexible
coding
15von Neumann Steps to execute
1. Instruction address calculation: MAR <- PC, PC <- PC + 1
2. Fetch instruction: MDR <- (MAR), IR <- MDR
3. Decode instruction
4. Operand address calculation: MAR <- address
5. Fetch operand: MDR <- (MAR)
- repeat steps 4 and 5 for multiple operands
6. Execute operation
7. Calculate output operand address: MAR <- destination address
8. Store result: (MAR) <- MDR
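The execution steps above can be sketched as a toy simulator in C. Everything specific here is an assumption made for illustration: the 16-bit instruction encoding (4-bit opcode, 12-bit address), the four opcodes, the 64-word memory and the accumulator-based instructions are hypothetical and do not correspond to any real processor. The point is that instruction fetches and data accesses all go through the same memory and the same bus.

```c
#include <stdint.h>

/* Hypothetical opcodes for the toy machine. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

typedef struct {
    uint16_t pc, ir, mar, mdr, acc;
    uint16_t mem[64];        /* one memory for program and data */
    unsigned bus_accesses;   /* every access uses the single bus */
} Machine;

/* Read the word addressed by MAR over the shared bus. */
static uint16_t mem_read(Machine *m)
{
    m->bus_accesses++;
    return m->mem[m->mar % 64];
}

/* Write MDR to the word addressed by MAR over the shared bus. */
static void mem_write(Machine *m)
{
    m->bus_accesses++;
    m->mem[m->mar % 64] = m->mdr;
}

/* Run until HALT (or an unknown opcode).
 * Returns the number of instructions fetched, HALT included. */
unsigned run(Machine *m)
{
    unsigned executed = 0;
    for (;;) {
        m->mar = m->pc;                 /* MAR <- PC */
        m->pc++;                        /* PC <- PC + 1 */
        m->mdr = mem_read(m);           /* MDR <- (MAR) */
        m->ir = m->mdr;                 /* IR <- MDR */
        uint16_t opcode = m->ir >> 12;  /* decode */
        m->mar = m->ir & 0x0FFF;        /* operand address */
        executed++;
        switch (opcode) {
        case OP_LOAD:                   /* ACC <- (MAR) */
            m->mdr = mem_read(m);
            m->acc = m->mdr;
            break;
        case OP_ADD:                    /* ACC <- ACC + (MAR) */
            m->mdr = mem_read(m);
            m->acc = (uint16_t)(m->acc + m->mdr);
            break;
        case OP_STORE:                  /* (MAR) <- ACC */
            m->mdr = m->acc;
            mem_write(m);
            break;
        default:
            return executed;
        }
    }
}
```

Loading a three-instruction program (LOAD, ADD, STORE) followed by HALT costs seven bus accesses on this toy machine: four instruction fetches plus three data accesses, none of which can overlap.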
16Example Execution of an addition
- ADD B,A: memory-to-memory operation
- (A) <- (B) + (A)
- Fetch instruction (ADD)
- Fetch memory contents (A)
- Fetch memory contents (B)
- Add (A) and (B)
- Store result in A
17Example Addition Requirements
- 4 memory accesses
- (PC) instruction fetch, (A) read, (B) read, (A) write
- addresses for two operands
- A, B addresses must be included in instruction
- may require many words
- instructions take varying execution time
- depending on memory accesses required.
18Quick Quiz
- What is the main reason you would choose a DSP processor for a system you are designing?
- What is the main characteristic of the von Neumann architecture?
20Harvard Architecture
- Separate program and data buses
- Program address and program data (instructions) on one bus
- Data address and data on another bus
21Harvard Architecture
- Main feature
- separate data and program bus
- can fetch instruction/data in parallel
- Allows for pipelining of instructions
- Fast repetitive operations
- Pipeline: address calculation for one instruction, while fetching the previous instruction, while executing the one before that
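The benefit of the three-stage pipeline described above can be estimated with a small calculation. The C sketch below assumes an idealized pipeline in which every stage takes one cycle and there are no stalls; the function names are hypothetical.

```c
/* Cycles to run n instructions when each instruction passes through
 * all stages before the next one starts (no overlap). */
unsigned long cycles_sequential(unsigned long n, unsigned stages)
{
    return n * stages;
}

/* Cycles to run n instructions when the stages overlap: the first
 * instruction needs `stages` cycles to fill the pipeline, and each
 * later instruction completes one cycle after the previous one. */
unsigned long cycles_pipelined(unsigned long n, unsigned stages)
{
    return n == 0 ? 0 : stages + (n - 1);
}
```

With 3 stages and 100 instructions this gives 300 cycles without overlap and 102 cycles with it, so long repetitive loops approach one instruction per cycle.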
22Modified Harvard Architecture
- Many constants are stored in the code
- These constants are read from the program bus (e.g. 2*M_PI)
- In a pure Harvard architecture, there is no way to get these constants to the ALU to use them as data
- The modified Harvard architecture allows data (constants) to be read from the program bus (PB)
23Modified Harvard Architecture C54x
[Figure: C54x bus diagram; labels: Prog., Data]
24Modified Harvard Architecture C54x
- Program bus
- PAB: program address bus
- PB: program data bus
- Three data buses
- CAB, CB, DAB, DB: data read / I/O read
- EAB, EB: data write / I/O write
25Modified Harvard Architecture C54x
- Modifications of DSP C54x
- can read data from program bus
- macd, firs, Smem prog(pmad)
- multiple buses for on-chip memory
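The following C sketch illustrates the kind of loop that benefits from reading constants over the program bus: an FIR step where the coefficients are constants and the samples sit in a data-memory delay line. It is loosely modelled on the idea of combining a multiply-accumulate with a data move; it is not the C54x instruction semantics. The names `coeff`, `delay`, `fir_step` and the four coefficient values are hypothetical, and where a C compiler actually places a `const` table is toolchain-specific.

```c
#include <stddef.h>
#include <stdint.h>

#define NTAPS 4

/* Constants: on a modified Harvard DSP a table like this can live in
 * program memory and be read over the program bus (an assumption about
 * placement, not something the C language guarantees). */
static const int16_t coeff[NTAPS] = { 1, 2, 3, 4 };

/* Delay line of past inputs in data memory; delay[0] is the newest. */
static int16_t delay[NTAPS];

/* Push one new sample and return the filter output.
 * Walking from the oldest tap to the newest lets every step combine
 * a data move (shifting the delay line) with a MAC. */
int32_t fir_step(int16_t sample)
{
    int32_t acc = 0;
    for (size_t k = NTAPS - 1; k > 0; k--) {
        delay[k] = delay[k - 1];              /* data move */
        acc += (int32_t)coeff[k] * delay[k];  /* MAC */
    }
    delay[0] = sample;
    acc += (int32_t)coeff[0] * delay[0];
    return acc;
}
```

Feeding the impulse 1, 0, 0, 0 returns the coefficients 1, 2, 3, 4 in turn. On hardware with separate program and data buses, the coefficient read and the sample read in each step can happen in the same cycle.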
27Recap Architecture types
- There are four main architectures
- Single-issue architectures
- von Neumann
- Harvard / Modified Harvard
- Multi-issue architectures
- Very Long Instruction Word (VLIW)
- Superscalar
28Multi-issue Architectures
- single issue architecture
- one command issued per cycle
- pipelining
- partially execute several commands per cycle
- multi-issue architecture
- group commands in parallel
29VLIW Architecture
- Very Long Instruction Word (VLIW)
- Multi-issue architecture
- Programmer/compiler must group instructions
- Grouped instructions result in a very long instruction word
- Requires parallel algorithms
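"Requires parallel algorithms" can be made concrete with a dot product. The C sketch below (the name `dot2` is hypothetical) splits the sum into two accumulators that do not depend on each other, which gives a compiler two independent operation chains it could place in the same instruction word. Whether a given compiler actually does so is not guaranteed.

```c
#include <stddef.h>

/* Dot product using two independent accumulators.
 * The even and odd partial sums never read each other's result,
 * so their multiply-accumulates can be grouped in parallel. */
long dot2(const int *a, const int *b, size_t n)
{
    long even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += (long)a[i] * b[i];
        odd  += (long)a[i + 1] * b[i + 1];
    }
    if (i < n) {
        even += (long)a[i] * b[i];   /* leftover element when n is odd */
    }
    return even + odd;
}
```

A single-accumulator loop computes the same result, but every MAC then depends on the previous one, leaving a second functional unit idle.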
30VLIW Architecture C62x/67x
- Duplicated hardware
- 2 possible paths
31VLIW Architecture C62x
32VLIW Architecture
(Parallel calculation)
- Advantages
- Typically use 32-bit instructions
- General-purpose registers (flexibility for programmer)
- Simple instructions (simple HW, hence lower cost)
- e.g. macd requires mult, add, dmov, ARx
- C54x: 1 word (16 bits)
- C67x: 4 words (128 bits)
33VLIW Architecture
Ex MACD ? 32bit 32 32 32 32
- Disadvantages
- compiler groups instructions
- difficult for designer
- requires recompilation for new generations
- Large code size
- large word size
- 1 instruction per functional unit
- simple instructions
- often have cache memory
- C67x: 8 units, 256-bit instructions per cycle
(Moves away from assembly)
(Costly in terms of power and memory)
(Sequential operation could be slow)
34VLIW Architecture
- TI has made compromises on the C64x
- 16-bit and 32-bit instructions
- 32-bit instructions for critical code (fast)
- 16-bit instructions to reduce code size (limited)
- complex instructions added (e.g. macd)
- more difficult for compiler
- increases execution speed
- assembly-optimized libraries included
(? speed)
(? cost in power and memory)
(? )
(? speed)
(mitigates effects of Complex commands )
36Super Scalar Architecture
- multi-issue
- groups of instructions issued in parallel
- dedicated HW on-chip
- determines instructions to execute in parallel
- based on
- data dependencies
- resources available
- e.g. Pentium II
- code compatible with the 486
- issues two 486 instructions in parallel where possible
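The decision the issue hardware makes can be sketched in C. This is a deliberately simplified model, assuming in-order issue of at most two instructions per cycle and one functional unit of each kind; the `Instr` layout and the function names are hypothetical, and real issue logic is considerably more involved.

```c
#include <stddef.h>

typedef struct {
    int dst, src1, src2;  /* register numbers */
    int unit;             /* functional unit the instruction needs */
} Instr;

/* Returns 1 if `second` may issue in the same cycle as `first`. */
int can_pair(Instr first, Instr second)
{
    /* Data dependencies: second reads or overwrites first's result. */
    int raw = second.src1 == first.dst || second.src2 == first.dst;
    int waw = second.dst == first.dst;
    /* Resources: both instructions need the same functional unit. */
    int unit_busy = second.unit == first.unit;
    return !(raw || waw || unit_busy);
}

/* Cycles needed to issue n instructions in order, two per cycle
 * whenever the next pair passes the checks above. */
size_t issue_cycles(const Instr *code, size_t n)
{
    size_t cycles = 0, i = 0;
    while (i < n) {
        if (i + 1 < n && can_pair(code[i], code[i + 1])) {
            i += 2;   /* dual issue */
        } else {
            i += 1;   /* single issue */
        }
        cycles++;
    }
    return cycles;
}
```

Because the pairing depends on the actual instruction stream, the cycle count of a code sequence is not obvious from its length alone.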
37Super Scalar Architecture Example
Distribution module (Advanced instruction
packing) added
38Super Scalar Architecture
- Advantages
- code compatible with older processors
- easier to program
- Disadvantages
- More complicated HW than VLIW
- higher power consumption
- unpredictable execution time
39Super Scalar Architecture Problem
- DSP applications are hard real-time
- require predictable execution time
- DSP applications use almost all cycles
- cell phones: 98%
- HDTV: 90%
- Can use worst-case execution time
- wastes power/clock cycles
- Some DSPs use superscalar architectures because of their high speed
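The real-time constraint can be expressed as a cycle budget per sample. The C sketch below uses hypothetical function names and assumes one sample is processed per sampling period with a non-zero sampling rate.

```c
/* Clock cycles available to process one sample.
 * Assumes sample_rate_hz is non-zero. */
unsigned long cycle_budget(unsigned long clock_hz,
                           unsigned long sample_rate_hz)
{
    return clock_hz / sample_rate_hz;
}

/* Fraction of the budget used by the worst-case path
 * (1.0 means every available cycle is used). */
double utilization(unsigned long wcet_cycles, unsigned long budget_cycles)
{
    return (double)wcet_cycles / (double)budget_cycles;
}

/* Returns 1 if the worst-case execution time fits in the budget. */
int meets_deadline(unsigned long wcet_cycles, unsigned long budget_cycles)
{
    return wcet_cycles <= budget_cycles;
}
```

As a purely hypothetical example, a 100 MHz clock and an 8 kHz sampling rate give 12500 cycles per sample; a worst-case path of 12250 cycles meets the deadline at 98% utilization. If execution time is unpredictable, the worst case must be budgeted for, and the cycles not used in the typical case are wasted.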
40Quick Quiz
- You are building a cell phone and need to choose a DSP. You need to digitally process the voice, modulate and demodulate the signal, compress and expand the voice, and maximize battery life.
- What DSP architecture would you choose?
- (von Neumann, Harvard, VLIW, superscalar)
- Justify your answer.
41Answer
- Von Neumann
- Will not meet computational requirements
- Harvard (or Modified)
- adequate processing power
- Low power consumption
- cheap
- VLIW
- more program memory and power consumption than Harvard
- extra processing power not required
- Superscalar
- extra performance not needed
- takes much more power (less battery life)
42Quick Quiz 2
- You are building a new game console (e.g. a Wii). You need to digitally process images and sound. You need to play DVDs. You also need to process user movements in real time.
- What DSP architecture would you choose?
- (von Neumann, Harvard, VLIW, superscalar)
- Justify your answer.
43Answer