DSP: an introduction - PowerPoint PPT Presentation

Slides: 16
Provided by: sergioc6

1
DSP: an introduction
  • Why a DSP?
  • Characteristics of a DSP
  • Some commercial DSPs

2
Why a DSP?
  • It's easy: we want an architecture optimized for
    Digital Signal Processing
  • Some versions are further optimized for
    specific applications
    - e.g. very low power consumption for mobile
      phones

3
What is the difference between a DSP and a
general-purpose processor? (1/4)
  • Memory architecture and bus
  • The first processors (in the 1940s) had a Harvard
    architecture: separate memories for program and
    data
  • But it's complex -> soon replaced by the Von Neumann
    architecture: no real difference between program
    and data (an instruction has two fields:
    operation and data)
  • Problem: the processor cannot access instructions
    and data simultaneously
  • To improve performance: Harvard architecture
    again!
  • In particular:
    - separate memories and buses for program
      and data
    - possibly, another separate bus for the DMA
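
The bottleneck described above can be illustrated with a toy cycle count. This is only a sketch under simplifying assumptions that are not in the slides: every instruction fetch and every data access takes exactly one bus cycle, and the function name `bus_cycles` is hypothetical.

```python
def bus_cycles(data_accesses, harvard):
    """Count bus cycles for a sequence of instructions.

    data_accesses[i] is the number of data words instruction i reads or
    writes. Assumption: one bus cycle per instruction fetch and per data
    access; nothing else is modelled.
    """
    total = 0
    for d in data_accesses:
        if harvard:
            # Separate buses: the fetch overlaps with the data accesses.
            total += max(1, d)
        else:
            # Shared bus: the fetch and the data accesses are serialized.
            total += 1 + d
    return total


program = [1, 1, 0, 1]  # three instructions touch one data word, one touches none
print(bus_cycles(program, harvard=False))
print(bus_cycles(program, harvard=True))
```

In this toy model the shared bus needs 7 cycles and the separate buses need 4, which is the gain the slide attributes to going back to a Harvard layout.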

4
What is the difference between a DSP and a
general-purpose processor? (2/4)
  • A DSP is often used to realize a linear filter
  • The convolution integral is actually a sum:
    $y_n = \sum_i x_{n-i} h_i$
    - if the number of terms is finite: FIR filter
      (finite impulse response),
    - otherwise: IIR (infinite impulse response),
      which can be realized using two finite sums:
      $y_n = \sum_i x_{n-i} b_i + \sum_i y_{n-i} a_i$
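
The two filter forms above can be sketched directly as nested loops. This is a minimal illustration, assuming that samples before the start of the signal are zero and that the feedback sum starts at i = 1; the function names `fir` and `iir` are chosen here and do not come from the slides.

```python
def fir(x, h):
    """y[n] = sum_i h[i] * x[n-i], with x[k] = 0 for k < 0 (assumption)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, hi in enumerate(h):
            if n - i >= 0:
                acc += hi * x[n - i]
        y.append(acc)
    return y


def iir(x, b, a):
    """y[n] = sum_{i>=0} b[i]*x[n-i] + sum_{i>=1} a[i-1]*y[n-i]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, bi in enumerate(b):           # feed-forward sum on the input
            if n - i >= 0:
                acc += bi * x[n - i]
        for i, ai in enumerate(a, start=1):  # feedback sum on past outputs
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir(impulse, [0.5, 0.25]))   # response stops after len(h) samples
print(iir(impulse, [1.0], [0.5]))  # response keeps decaying, never ends
```

Feeding an impulse shows the difference in the names: the FIR output becomes zero after two samples, while the IIR output halves at every step without ever reaching zero.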

5
What is the difference between a DSP and a
general-purpose processor? (3/4)
  • A common operation in a FIR or IIR filter is
    A = B*C + D
  • a hardware multiplier (introduced in DSPs in the
    1970s) is needed
  • multiply and accumulate in only one clock cycle:
    MAC instruction.
  • Actually, the MAC is in a loop
  • H/W for address generation (the access to memory
    is not random), zero-overhead loop
    - autoincrement, circular addressing
  • H/W saturation
  • Instructions to perform a division quickly
  • Bit reversal for FFT
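
Several of the hardware features listed above can be mimicked in software to see what they do. The sketch below is an illustration only, with these assumptions: samples and coefficients are integers, saturation is applied once to a 16-bit result (real DSPs usually saturate a wider accumulator), and all names (`saturate16`, `CircularFir`, `bit_reverse`) are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(v):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, v))


class CircularFir:
    """FIR filter whose delay line is a circular buffer."""

    def __init__(self, coeffs):
        self.h = list(coeffs)
        self.buf = [0] * len(self.h)
        self.pos = 0  # index of the newest sample

    def step(self, sample):
        n = len(self.buf)
        self.pos = (self.pos + 1) % n   # autoincrement with wrap-around
        self.buf[self.pos] = sample     # overwrite the oldest sample
        acc = 0
        idx = self.pos
        for c in self.h:                # one MAC per filter tap
            acc += c * self.buf[idx]    # multiply and accumulate
            idx = (idx - 1) % n         # circular addressing
        return saturate16(acc)


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (FFT data reordering)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


f = CircularFir([1, 2, 3])
print([f.step(s) for s in [1, 0, 0, 0]])
print(CircularFir([2]).step(30000))
print([bit_reverse(i, 3) for i in range(8)])
```

On a DSP the modulo operations, the loop counter and the clamp cost no extra instructions, which is why the inner loop can run at one MAC per cycle.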

6
What is the difference between a DSP and a
general-purpose processor? (4/4)
  • Often, data are 16- or 8-bit wide (e.g., audio or
    images)
  • a 32-bit ALU can be split into two 16-bit ALUs
    or four 8-bit ALUs,
    -> 2 or 4 operations in parallel
  • several ALUs which work in parallel
  • fixed-point ALUs, or 16-bit ALUs, to reduce power
    consumption and costs
  • optimized versions
    - costs for consumer applications
    - power for mobile applications
    - for specific applications, e.g. electric motor
      control
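
What "splitting" a 32-bit ALU means can be shown with a small software model: the carry between bit 15 and bit 16 is cut, so the two halves of the word behave as independent 16-bit adders. This is a sketch assuming unsigned wrap-around arithmetic; the helper names are hypothetical.

```python
def pack16(hi, lo):
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack16(word):
    """Return the (high, low) 16-bit lanes of a 32-bit word."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def add_2x16(a, b):
    """Add two 32-bit words as two independent 16-bit lanes."""
    lo = (a + b) & 0xFFFF                  # carry out of bit 15 is dropped
    hi = ((a >> 16) + (b >> 16)) & 0xFFFF  # high lane never sees that carry
    return (hi << 16) | lo


a = pack16(1, 0xFFFF)
b = pack16(2, 1)
print(unpack16(add_2x16(a, b)))            # split ALU: lanes stay independent
print(unpack16((a + b) & 0xFFFFFFFF))      # plain 32-bit add: carry leaks
```

The split add gives (3, 0) while the plain 32-bit add gives (4, 0), because the overflow of the low lane spills into the high lane.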

7
  • Example: C30 (Texas Instruments, 1982)

8
  • Example: FIR filter using a C30

9
  • Note: several of these characteristics, which
    were born on DSPs, have been ported to
    general-purpose processors

E.g. the cache in the Pentium processor
is Harvard-like
10
  • Another example: several units working in
    parallel, and splittable ALUs (see the MMX
    extensions) in the Pentium 4 processor

11
Pipeline
  • Example of a 4-stage pipeline (TI C30)
  • each instruction is executed in 4 clock cycles,
    but (normally) can start just 1 cycle after the
    previous one (data are needed only 3 cycles
    later)
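
The timing of such a pipeline can be sketched in a few lines. The stage letters F, D, R, E (fetch, decode, read, execute) are an assumption made for the illustration, as are the function names; stalls are not modelled.

```python
STAGES = "FDRE"  # assumed stage names: fetch, decode, read, execute


def pipelined_cycles(n_instructions, stages=len(STAGES)):
    """Cycles to run n instructions when one starts every cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions, stages=len(STAGES)):
    """Cycles if each instruction waited for the previous one to finish."""
    return stages * n_instructions


def timeline(n_instructions):
    """One row per instruction, one column per clock cycle."""
    return ["." * k + STAGES for k in range(n_instructions)]


for row in timeline(3):
    print(row)
print(pipelined_cycles(10), unpipelined_cycles(10))
```

Each row is shifted by one cycle with respect to the previous one, so 10 instructions finish in 13 cycles instead of 40.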

12
Pipeline: branch (e.g. on the C30)
  • Standard branch: the pipeline is flushed to
    correctly handle the PC -> 4 cycles
  • Delayed branch: the pipeline is not flushed, and
    the 3 following instructions are loaded before
    modifying the PC
    -> only 1 cycle needed!

        BRD   label    delayed branch
        MPYF           executed
        ADDF           executed
        SUBF           executed
        AND            not executed
label   MPYF           fetched after SUBF
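
The delayed-branch example above can be replayed with a tiny interpreter that records which instructions execute. It is only a sketch: the program encoding as `(label, opcode, target)` tuples and the function `run` are hypothetical, operands are ignored, at least one delay slot is assumed, and a branch inside a delay slot is not handled.

```python
def run(program, delay_slots=3):
    """Return the opcodes in the order they execute.

    program: list of (label, opcode, target) tuples; label and target
    may be None. Only the delayed branch BRD changes the control flow.
    """
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    trace, pc = [], 0
    pending = None  # [slots_left, target_index] while a BRD is in flight
    while pc < len(program):
        _, op, target = program[pc]
        trace.append(op)
        pc += 1
        if op == "BRD":
            pending = [delay_slots, labels[target]]
        elif pending is not None:
            pending[0] -= 1             # one more delay slot consumed
            if pending[0] == 0:
                pc = pending[1]         # the PC is modified only now
                pending = None
    return trace


program = [
    (None, "BRD", "label"),
    (None, "MPYF", None),
    (None, "ADDF", None),
    (None, "SUBF", None),
    (None, "AND", None),
    ("label", "MPYF", None),
]
print(run(program))
```

The trace contains the three instructions after BRD and then the MPYF at the label; AND never appears, as in the listing above.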
13
Architectures
  • In order to exploit the instruction-level
    parallelism (ILP): two possible architectures
  • Superscalar: the parallelism is dynamically
    managed by the hardware
  • Very Long Instruction Word (VLIW): the
    parallelism is statically managed by the compiler
  • What is the problem?
  • Dependences in data or control can generate
    conflicts
    - on data (an instruction needs the result of
      a previous instruction, but the result is not
      ready yet), or
    - on control (conditional jump, but the
      condition is not ready yet)
    -> pipeline stall
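
A data conflict of the kind described above can be detected by comparing the registers an instruction reads with the registers recently written. The sketch below assumes a hypothetical three-field instruction record and a fixed window of 3 instructions during which a result is considered "not ready yet"; neither comes from the slides.

```python
from collections import namedtuple

# Hypothetical instruction record: name, destination register, source registers.
Instr = namedtuple("Instr", "name dest srcs")


def raw_hazards(program, window=3):
    """List (producer, consumer) pairs where the consumer reads a register
    written by a producer at most `window` instructions earlier."""
    hazards = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((producer.name, consumer.name))
    return hazards


program = [
    Instr("MPYF", "R2", ("R0", "R1")),
    Instr("ADDF", "R4", ("R2", "R3")),  # needs R2 from MPYF
    Instr("SUBF", "R7", ("R5", "R6")),  # independent
]
print(raw_hazards(program))
```

Only the MPYF/ADDF pair is reported; in hardware this is the situation that forces a stall unless the result can be forwarded in time.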

14
Superscalar
  • The analysis of the independent instructions is
    dynamically done by hardware (which is complex!)
  • The sequence of instructions can be executed
    out of order; then, the completion of the
    instructions (commit) is done in order, to
    correctly update the state of the CPU
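
The in-order commit rule can be expressed as a running maximum: an instruction commits once it has finished and every earlier instruction has committed. A minimal sketch, assuming any number of instructions may commit in the same cycle (the function name is hypothetical):

```python
def commit_cycles(finish_cycles):
    """finish_cycles[i] is the cycle in which instruction i (in program
    order) finishes executing; returns the cycle in which it commits."""
    commits, last = [], 0
    for finish in finish_cycles:
        last = max(last, finish)  # cannot commit before earlier instructions
        commits.append(last)
    return commits


# Instructions 1 and 2 finish early (out of order) but wait for instruction 0.
print(commit_cycles([5, 2, 3, 7]))
```

Here the second and third instructions finish at cycles 2 and 3 but commit at cycle 5, after the first one, so the architectural state is always updated in program order.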

15
VLIW
  • Very Long Instruction Word (VLIW): the
    parallelism is statically managed by the compiler
  • The analysis of independent instructions is
    statically performed during the compilation phase
    - the instructions which can be executed in
      parallel are assembled into long instructions and
      sent to the various functional units in order
  • Convenient solution for DSP programs (fixed-length
    loops, few conditional operations); less
    convenient for general-purpose applications
  • Simpler hardware! But a specific compilation for
    each platform is needed
  • Deterministic behaviour -> exact computation of
    execution times
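
The compile-time packing described above can be sketched as a greedy, in-order bundler. It is a simplification and not how any particular compiler works: the instruction record is hypothetical, only read-after-write and write-after-write conflicts inside a bundle are checked, and every functional unit is assumed able to run every instruction.

```python
from collections import namedtuple

# Hypothetical instruction record: name, destination register, source registers.
Instr = namedtuple("Instr", "name dest srcs")


def pack_vliw(program, width=4):
    """Group instructions, in program order, into long instruction words
    of at most `width` mutually independent instructions."""
    bundles, current = [], []
    for ins in program:
        written = {p.dest for p in current}
        depends = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (depends or len(current) == width):
            bundles.append(current)   # close the bundle, start a new one
            current = []
        current.append(ins)
    if current:
        bundles.append(current)
    return [[i.name for i in b] for b in bundles]


program = [
    Instr("MPYF", "R2", ("R0", "R1")),
    Instr("ADDF", "R5", ("R3", "R4")),
    Instr("SUBF", "R6", ("R2", "R5")),  # needs both previous results
    Instr("AND", "R7", ("R0", "R1")),
]
print(pack_vliw(program))
```

The four instructions fit in two long words, and because the grouping is fixed at compile time the number of words gives the execution time directly, assuming one word is issued per cycle.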