DSP: an introduction

Slides: 16

Provided by: sergioc6

1
DSP: an introduction

2
Why a DSP?

3
What is the difference between a DSP and a general-purpose processor? (1/4)

Memory architecture and buses
The first processors (in the 1940s) had a Harvard architecture: separate memories for program and data
But it is complex -> soon replaced by the Von Neumann architecture: no real difference between program and data (an instruction has two fields: operation and data)
Problem: the processor cannot access instructions and data simultaneously
To improve performance: Harvard architecture again!
In particular:
- separate memories and buses for program and data
- possibly, another separate bus for the DMA
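
As a rough illustration of the bottleneck described above, the following toy model counts memory cycles when each instruction needs one instruction fetch plus some data accesses. The function names and the assumption that each bus carries exactly one access per cycle are illustrative simplifications, not taken from the slides.

```python
def von_neumann_cycles(n_instructions, data_accesses_per_instruction):
    """Single shared bus: instruction fetches and data accesses are serialized."""
    return n_instructions * (1 + data_accesses_per_instruction)


def harvard_cycles(n_instructions, data_accesses_per_instruction):
    """Separate program and data buses (assumed one access per bus per cycle):
    the instruction fetch overlaps with the data accesses, so each
    instruction costs the longer of the two."""
    return n_instructions * max(1, data_accesses_per_instruction)


if __name__ == "__main__":
    n = 100  # hypothetical number of instructions
    for d in (0, 1, 2):
        print("data accesses per instruction:", d,
              "| shared bus:", von_neumann_cycles(n, d),
              "| separate buses:", harvard_cycles(n, d))
```

In this model the separate buses only help when an instruction actually touches data; with two data accesses per instruction the single data bus becomes the limit again.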

4
What is the difference between a DSP and a general-purpose processor? (2/4)

5
What is the difference between a DSP and a general-purpose processor? (3/4)

A common operation in a FIR or IIR filter is A = B*C + D
A hardware multiplier (introduced in DSPs in the 1970s) is needed
Multiply and accumulate in only one clock cycle: the MAC instruction
Actually, the MAC is in a loop:
- H/W for address generation (the access to memory is not random)
- zero-overhead loop
- autoincrement and circular addressing
H/W saturation
Instructions to perform a division quickly
Bit reversal for FFT
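
The features listed above can be sketched in software to show what the DSP hardware does for free. This is a minimal sketch, assuming integer (fixed-point) samples and a 16-bit result; the names `saturate`, `CircularFIR` and `bit_reverse` are hypothetical and chosen only for this example.

```python
def saturate(value, bits=16):
    """Clamp to the signed range of `bits` bits instead of wrapping around."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


class CircularFIR:
    """FIR filter whose delay line is a circular buffer."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0] * len(self.coeffs)
        self.head = 0  # index of the most recent sample

    def step(self, sample):
        n = len(self.coeffs)
        # Circular addressing: the write index wraps modulo the buffer length,
        # so older samples never have to be shifted in memory.
        self.head = (self.head - 1) % n
        self.delay[self.head] = sample
        acc = 0
        idx = self.head
        for c in self.coeffs:            # on a DSP: a zero-overhead loop
            acc += c * self.delay[idx]   # MAC: multiply and accumulate
            idx = (idx + 1) % n          # autoincrement with wrap-around
        return saturate(acc)


def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits bits of index (FFT data reordering)."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


if __name__ == "__main__":
    fir = CircularFIR([1, 2, 3])
    print([fir.step(x) for x in (1, 0, 0, 0)])    # impulse response
    print([bit_reverse(i, 3) for i in range(8)])  # 8-point FFT ordering
```

Each line of the inner loop (multiply, accumulate, pointer update, wrap-around, loop test) costs instructions on a general-purpose processor; the point of the slide is that a DSP performs all of them in hardware within a single cycle.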

6
What is the difference between a DSP and a general-purpose processor? (4/4)

Note: several of these characteristics, which originated on DSPs, have been ported to general-purpose processors

E.g. the cache in the Pentium processor is Harvard-like: it is split into separate instruction and data caches

10
Another example: several units working in parallel, and splittable ALUs (see the MMX extensions) in the Pentium 4 processor

11
Pipeline

Example of a 4-stage pipeline (TI C30): each instruction takes 4 clock cycles to execute, but (normally) it can be issued just 1 cycle after the previous one (its data are needed only 3 cycles later)
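
The gain from this overlap can be made concrete with a small calculation. The sketch below assumes an ideal pipeline with no stalls; the stage letters F, D, R, E (fetch, decode, read, execute) are an assumption used only to draw the diagram.

```python
def sequential_cycles(n_instructions, n_stages=4):
    """Without a pipeline each instruction occupies the processor for all stages."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=4):
    """Ideal pipeline: the first instruction takes n_stages cycles,
    every following one completes 1 cycle after its predecessor."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_diagram(n_instructions, stages="FDRE"):
    """One text row per instruction, one column per clock cycle."""
    return ["." * i + stages + "." * (n_instructions - 1 - i)
            for i in range(n_instructions)]


if __name__ == "__main__":
    for row in pipeline_diagram(5):
        print(row)
    print(sequential_cycles(5), "cycles without pipeline,",
          pipelined_cycles(5), "cycles with it")
```

For long instruction sequences the pipelined cost approaches one cycle per instruction, which is the throughput the slide refers to.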

12
Pipeline: branch (e.g. on the C30)

Standard branch: the pipeline is flushed to correctly handle the PC -> 4 cycles
Delayed branch: the pipeline is not flushed, and the 3 following instructions are loaded before modifying the PC
-> only 1 cycle needed!

        BRD   label    ; delayed branch
        MPYF           ; executed
        ADDF           ; executed
        SUBF           ; executed
        AND            ; not executed
label   MPYF           ; fetched after SUBF

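
The difference between the two branch types can be reproduced with a toy interpreter that only records which instructions execute. This is a sketch under stated assumptions: the tuple layout, the `run` and `example_program` helpers and the `BR` mnemonic for the standard branch are hypothetical, and a branch placed inside a delay slot is not modelled.

```python
DELAY_SLOTS = 3  # instructions already in the pipeline behind the branch


def example_program(branch_mnemonic):
    """The instruction sequence from the slide as (label, mnemonic, target)."""
    return [
        (None, branch_mnemonic, "label"),
        (None, "MPYF", None),
        (None, "ADDF", None),
        (None, "SUBF", None),
        (None, "AND", None),
        ("label", "MPYF", None),
    ]


def run(program, delayed):
    """Return the mnemonics of the executed instructions, in order."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    trace = []
    pc = 0
    pending = None  # (remaining delay slots, destination index)
    while pc < len(program):
        _, mnemonic, target = program[pc]
        trace.append(mnemonic)
        pc += 1
        if target is not None:
            if delayed:
                # Keep executing the next DELAY_SLOTS instructions first.
                pending = (DELAY_SLOTS, labels[target])
            else:
                # Standard branch: following instructions are discarded.
                pc = labels[target]
        elif pending is not None:
            remaining, dest = pending
            remaining -= 1
            if remaining == 0:
                pc, pending = dest, None
            else:
                pending = (remaining, dest)
    return trace


if __name__ == "__main__":
    print("standard:", run(example_program("BR"), delayed=False))
    print("delayed: ", run(example_program("BRD"), delayed=True))
```

With the delayed branch the three instructions after `BRD` do useful work instead of being flushed, which is why the programmer (or compiler) must choose them so that they are valid whether or not the branch is reached.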
13
Architectures

In order to exploit instruction-level parallelism (ILP), two architectures are possible:
Superscalar: the parallelism is dynamically managed by the hardware
Very Long Instruction Word (VLIW): the parallelism is statically managed by the compiler
What is the problem?
Dependences in data or control can generate conflicts
- on data (an instruction needs the result of a previous instruction, but the result is not ready yet), or
- on control (conditional jump, but the condition is not ready yet)
-> pipeline stall
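
A data conflict of the kind described above can be detected by comparing the registers each instruction reads and writes. The sketch below is illustrative only: the `Instr` record, the register names and the result latency are assumptions, and the model issues at most one instruction per cycle, in order.

```python
from collections import namedtuple

# name: mnemonic, dest: register written (or None), srcs: registers read
Instr = namedtuple("Instr", "name dest srcs")


def raw_dependence(earlier, later):
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dest is not None and earlier.dest in later.srcs


def issue_cycles(program, latency):
    """Cycle in which each instruction can issue when a result becomes
    available `latency` cycles after its producer issued."""
    ready = {}       # register -> first cycle in which its value is usable
    cycles = []
    next_free = 0    # in-order issue, at most one instruction per cycle
    for ins in program:
        cycle = next_free
        for src in ins.srcs:
            cycle = max(cycle, ready.get(src, 0))  # wait for operands: stall
        cycles.append(cycle)
        if ins.dest is not None:
            ready[ins.dest] = cycle + latency
        next_free = cycle + 1
    return cycles


def stall_cycles(program, latency):
    """Cycles lost compared with issuing one instruction every cycle."""
    if not program:
        return 0
    return issue_cycles(program, latency)[-1] - (len(program) - 1)


if __name__ == "__main__":
    program = [
        Instr("MPYF", "R0", ("R1", "R2")),
        Instr("ADDF", "R3", ("R0", "R4")),  # needs R0 from MPYF
        Instr("SUBF", "R5", ("R6", "R7")),  # independent
    ]
    print(issue_cycles(program, latency=3), stall_cycles(program, latency=3))
```

Moving the independent `SUBF` between the other two instructions would hide part of the stall, which is the kind of reordering that either the hardware or the compiler has to find.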

14
Superscalar

The analysis of the independent instructions is done dynamically by the hardware (which is complex!)
The sequence of instructions can be executed out of order; then, the completion of the instructions (commit) is done in order, to correctly update the state of the CPU
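
The relation between out-of-order execution and in-order commit can be illustrated with a few lines of code. This is a simplified sketch, assuming that any number of instructions may commit in the same cycle; the function names and the finish times in the example are made up for illustration.

```python
def execution_order(finish_cycles):
    """Indices of the instructions sorted by the cycle in which they finish
    executing (ties keep program order)."""
    return sorted(range(len(finish_cycles)), key=lambda i: finish_cycles[i])


def commit_cycles(finish_cycles):
    """Cycle in which each instruction, taken in program order, commits.
    An instruction commits only once it has finished executing and every
    earlier instruction has committed."""
    commits = []
    last = 0
    for finish in finish_cycles:
        last = max(last, finish)
        commits.append(last)
    return commits


if __name__ == "__main__":
    finish = [5, 2, 3, 7]  # hypothetical: instruction 0 is a slow operation
    print("finish order:", execution_order(finish))
    print("commit cycle:", commit_cycles(finish))
```

Instructions 1 and 2 finish early but their results become architecturally visible only after instruction 0 commits, so the CPU state always looks as if the program had run sequentially.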

15
VLIW

Very Long Instruction Word (VLIW): the parallelism is statically managed by the compiler
The analysis of independent instructions is done statically during the compilation phase
- the instructions that can be executed in parallel are assembled into long instructions and sent to the various functional units in order
Convenient solution for DSP programs (fixed-length loops, few conditional operations); less convenient for general-purpose applications
Simpler hardware! But a specific compilation for each platform is needed
Deterministic behaviour -> exact computation of execution times
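
The compile-time packing described above can be sketched as a greedy scheduler that places each instruction in the earliest long instruction (bundle) allowed by its dependences. This is not how any particular VLIW compiler works; the `Instr` record, the register names, the bundle width and the conservative rule that any dependence forces a later bundle are all assumptions made for this example.

```python
from collections import namedtuple

# name: mnemonic, dest: register written (or None), srcs: registers read
Instr = namedtuple("Instr", "name dest srcs")


def depends_on(later, earlier):
    """Conservative test: any register conflict keeps the two apart."""
    return (
        earlier.dest in later.srcs                                   # read after write
        or (later.dest is not None and later.dest == earlier.dest)   # write after write
        or later.dest in earlier.srcs                                # write after read
    )


def pack_bundles(program, width):
    """Greedily pack instructions into bundles of at most `width` slots."""
    bundles = []   # each bundle is a list of instructions issued together
    placed = []    # (instruction, bundle index) for everything scheduled so far
    for ins in program:
        earliest = 0
        for other, idx in placed:
            if depends_on(ins, other):
                earliest = max(earliest, idx + 1)
        slot = earliest
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1                      # bundle full, try the next one
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(ins)
        placed.append((ins, slot))
    return bundles


if __name__ == "__main__":
    program = [
        Instr("mpy0", "R0", ("R1", "R2")),
        Instr("mpy1", "R3", ("R4", "R5")),
        Instr("add", "R6", ("R0", "R3")),   # needs both products
        Instr("sub", "R7", ("R1", "R4")),   # independent
    ]
    for cycle, bundle in enumerate(pack_bundles(program, width=2)):
        print(cycle, [ins.name for ins in bundle])
```

Because the schedule is fixed before the program runs, the number of bundles directly gives the issue cycles of the block, which reflects the deterministic timing noted on the slide.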