DSP Lecture Series - PowerPoint PPT Presentation

1
DSP Lecture Series
  • DSP Memory Architecture
  • Dr. E.W. Hu
  • Nov. 28, 2000

2
Computer Architecture and VLSI Technology
Pioneer: Lynn Conway
In the 1960s, while working at IBM, Lynn Conway
conceived the idea of multi-issue processors,
a forerunner of today's VLIW processors.
3
Fixed-point DSP datapath
4
What is memory architecture?
  • The organization of memory and its
    interconnection with the processor's datapath
    are together called the memory architecture.
  • Memory architecture determines the memory
    bandwidth, which is a critical factor in the
    performance of a DSP.

5
Memory bandwidth
  • In general, the bandwidth w is defined as the
    rate at which words can be written to (stored in)
    or read from memory.
  • For a DSP, it is convenient to think in terms of
    how many instruction cycles are needed to complete
    a read or write operation. All else being equal,
    the fewer instruction cycles an access takes, the
    higher the bandwidth.
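
A minimal sketch of this relationship, assuming accesses are issued back to back with no stalls and a hypothetical instruction-cycle rate; the function name and the figures are illustrative, not taken from any particular DSP:

```python
def memory_bandwidth(instr_cycles_per_sec: float, cycles_per_access: float,
                     words_per_access: int = 1) -> float:
    """Return the bandwidth w in words per second.

    Assumes accesses are issued back to back with no stalls, so each
    access occupies `cycles_per_access` instruction cycles.
    """
    if cycles_per_access <= 0:
        raise ValueError("cycles_per_access must be positive")
    return instr_cycles_per_sec * words_per_access / cycles_per_access


# Hypothetical processor executing 50 million instruction cycles per second.
slow = memory_bandwidth(50e6, cycles_per_access=2)  # 25 million words/s
fast = memory_bandwidth(50e6, cycles_per_access=1)  # 50 million words/s
print(f"2 cycles/access: {slow:.0f} words/s; 1 cycle/access: {fast:.0f} words/s")
```

Halving the cycles per access doubles w, which is the sense in which a smaller cycle count means a higher bandwidth.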

6
Why DSP applications need large memory bandwidth
  • A high-performance datapath is only part of a
    high-performance processor.
  • DSP applications are typically
    computation-intensive and require large amounts
    of data to be moved quickly to and from memory
    (between the datapath(s) and the memory
    module(s)), as described in the next slide.

7
Typical DSP application: the FIR (finite
impulse response) filter
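
For reference, the standard direct-form FIR filter with N taps computes each output sample as a weighted sum of the N most recent input samples; the symbols x (input), y (output), and h (coefficients) are introduced here for illustration:

```latex
y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]
```

Each term of the sum is one tap, i.e., one multiply-accumulate (MAC) operation, and the previous samples x[n-1] through x[n-N+1] are held in the delay line.
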
8
At each tap, an FIR filter needs four memory
accesses
  • Fetch the MAC instruction from memory
  • Read the data value from memory (a sample from
    the signal)
  • Read the appropriate coefficient from memory (a
    known constant for a particular filter)
  • Write the data value to memory (the next location
    in the delay line)
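
The four accesses per tap can be made concrete with a small software model of one FIR output computation. This is a sketch under stated assumptions, not how any particular DSP implements the filter: the class and function names are hypothetical, the delay line is shifted in software, and each pass through the loop stands in for one MAC instruction.

```python
from dataclasses import dataclass


@dataclass
class AccessCounter:
    """Tally of memory accesses, by kind, made while filtering."""
    instruction_fetches: int = 0
    data_reads: int = 0
    coefficient_reads: int = 0
    data_writes: int = 0

    def total(self) -> int:
        return (self.instruction_fetches + self.data_reads
                + self.coefficient_reads + self.data_writes)


def fir_step(x_new, delay, coeffs, counter):
    """Compute one output of a direct-form FIR filter, counting accesses.

    delay[k] holds x[n-k]. It has len(coeffs) + 1 slots: the spare last slot
    absorbs the sample shifted out of the filter, so every tap does the same
    write (an assumption made to keep the count uniform).
    """
    delay[0] = x_new  # store the new input (once per output, not per tap)
    acc = 0
    # Walk from the oldest tap to the newest so each sample can be moved
    # one slot down the delay line right after it has been used.
    for k in range(len(coeffs) - 1, -1, -1):
        counter.instruction_fetches += 1  # 1. fetch the MAC instruction
        sample = delay[k]
        counter.data_reads += 1           # 2. read the data sample
        coeff = coeffs[k]
        counter.coefficient_reads += 1    # 3. read the coefficient
        acc += coeff * sample             # multiply-accumulate
        delay[k + 1] = sample
        counter.data_writes += 1          # 4. write sample to the next slot
    return acc


coeffs = [1, 2, 3]
delay = [0] * (len(coeffs) + 1)
counter = AccessCounter()
print([fir_step(x, delay, coeffs, counter) for x in [1, 0, 0, 0]])  # [1, 2, 3, 0]
print(counter.total())  # 4 outputs x 3 taps x 4 accesses = 48
```

Feeding in an impulse returns the coefficients, and the access count grows as four per tap per output sample.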

9
The Von Neumann architecture for general-purpose
processors
10
The Harvard architecture: the design basis for most
DSPs, allowing two memory accesses per cycle
11
Variations of the Harvard architecture allow
still more memory accesses per instruction cycle
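
To see why the number of accesses per cycle matters, here is a rough lower-bound model (an assumption-laden sketch, not a cycle-accurate one): with the four accesses per FIR tap listed earlier and a memory system that can complete M accesses per instruction cycle, each tap needs at least ceil(4/M) cycles. A single-memory von Neumann design is assumed to give M = 1; the other access rates are illustrative, not figures for specific chips.

```python
import math

# From the FIR example: instruction fetch, data read, coefficient read, data write.
ACCESSES_PER_TAP = 4


def min_cycles_per_tap(accesses_per_cycle: int,
                       accesses_per_tap: int = ACCESSES_PER_TAP) -> int:
    """Lower bound on instruction cycles per FIR tap.

    Assumes the memory system completes at most `accesses_per_cycle`
    accesses per cycle and ignores pipelining, caches, and bank conflicts.
    """
    if accesses_per_cycle < 1:
        raise ValueError("accesses_per_cycle must be at least 1")
    return math.ceil(accesses_per_tap / accesses_per_cycle)


# Illustrative access rates (assumptions, not figures for specific chips).
for label, per_cycle in [("single memory, 1 access/cycle", 1),
                         ("two banks, 2 accesses/cycle", 2),
                         ("hypothetical design, 4 accesses/cycle", 4)]:
    print(f"{label}: at least {min_cycles_per_tap(per_cycle)} cycle(s) per tap")
```

Only when all four accesses can proceed in the same cycle does the bound reach one tap per instruction cycle.
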
12
Typical DSPs with two or three independent memory
banks
  • Analog Devices ADSP-21xx
  • AT&T DSP16xx
  • Zilog Z893xx
  • Motorola DSP5600x, DSP563xx, DSP96002

13
Other approaches to achieving multiple memory
accesses per cycle
  • Examples of other approaches:
    • Multiple sequential accesses per instruction
      cycle over a single set of buses (each access
      takes less than one instruction cycle), e.g.,
      Zoran ZR3800.
    • Multi-ported memories that allow multiple
      concurrent memory accesses over two or more
      independent sets of buses (Fig 5.4), e.g., AT&T
      DSP32xx.
    • Concurrent read and write operations under
      restricted circumstances, e.g., AT&T DSP16xx.

14
Using cache memory to reduce memory accesses
  • An on-chip program cache reduces accesses to
    program memory.
  • There are many different implementations of
    program caches, for example:
    • Single-instruction repeat buffer
    • Multiple-instruction cache (e.g., one that stores
      a block of 16 instructions)
    • Single-sector instruction cache that stores some
      number of the most recently used instructions
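
The effect of a program cache on a tight loop can be sketched with a small simulation. It assumes, for illustration, a fully associative cache with least-recently-used (LRU) replacement and treats a single-instruction repeat buffer as a one-entry cache; the function name is hypothetical, and real DSP program caches are organized differently.

```python
from collections import OrderedDict


def program_fetches(loop_body: range, iterations: int, cache_capacity: int) -> int:
    """Count instruction fetches that must go to program memory.

    Models the program cache as fully associative with LRU replacement,
    holding `cache_capacity` instructions (a simplifying assumption).
    """
    cache = OrderedDict()  # instruction address -> None, oldest entry first
    misses = 0
    for _ in range(iterations):
        for addr in loop_body:
            if addr in cache:
                cache.move_to_end(addr)        # hit: now most recently used
            else:
                misses += 1                    # miss: fetch from program memory
                cache[addr] = None
                if len(cache) > cache_capacity:
                    cache.popitem(last=False)  # evict least recently used


    return misses


# One-instruction MAC loop repeated 64 times.
print(program_fetches(range(1), 64, cache_capacity=0))    # no cache: 64
print(program_fetches(range(1), 64, cache_capacity=1))    # repeat buffer: 1
# 16- and 17-instruction loop bodies run 10 times with a 16-entry cache.
print(program_fetches(range(16), 10, cache_capacity=16))  # 16
print(program_fetches(range(17), 10, cache_capacity=16))  # 170
```

The last case shows why loop size matters under this policy: a loop body even one instruction larger than the cache misses on every fetch.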

15
Using modulo addressing to reduce memory
accesses
  • To be discussed in the next seminar: memory
    addressing modes
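
Modulo addressing itself is left to the next seminar, but as a brief preview (our own sketch, with hypothetical class and attribute names): if the FIR delay line is kept in a circular buffer whose pointer wraps modulo N, each new sample simply overwrites the oldest one, so the per-tap write that shifts the delay line is no longer needed.

```python
class CircularFIR:
    """Direct-form FIR filter with its delay line in a circular buffer.

    The buffer index wraps modulo N in software, imitating a hardware modulo
    address generator. Only one data write is made per output sample.
    """

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.n = len(self.coeffs)
        self.buf = [0] * self.n
        self.head = 0         # slot holding the newest sample x[n]
        self.data_writes = 0  # number of writes into the delay line

    def step(self, x_new):
        # Move the head back one slot (mod N); that slot holds the oldest sample.
        self.head = (self.head - 1) % self.n
        self.buf[self.head] = x_new  # overwrite the oldest sample
        self.data_writes += 1
        acc = 0
        for k in range(self.n):
            # x[n-k] sits k slots after the head, wrapping around the buffer.
            acc += self.coeffs[k] * self.buf[(self.head + k) % self.n]
        return acc


fir = CircularFIR([1, 2, 3])
print([fir.step(x) for x in [1, 0, 0, 0]])  # impulse response: [1, 2, 3, 0]
print(fir.data_writes)                      # 4: one write per output sample
```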

16
Using algorithmic approaches to reduce memory
accesses
  • Algorithms can be structured to exploit data
    locality and so reduce memory accesses.
  • DSP algorithms that operate on blocks of input
    data often fetch the same data from memory
    multiple times during execution, as in FIR
    filter computation.
  • In the example that follows, the filter operates
    on a block of two input samples. Instead of
    computing output samples one at a time, the
    filter computes two output samples at a time,
    allowing it to reuse previously fetched data.
    This effectively reduces the memory bandwidth
    required from one instruction fetch and two data
    fetches per instruction cycle to one instruction
    fetch and one data fetch per instruction cycle.
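
A minimal sketch of the two-outputs-at-a-time idea, assuming a plain direct-form FIR and counting only data fetches (the instruction fetch is not modelled); the function name is hypothetical. Each pass through the loop fetches one coefficient and one sample but performs two MACs, because the sample fetched for y[n] at tap k is exactly the one y[n+1] needs at tap k+1.

```python
def fir_pair(x, h, n):
    """Compute (y[n], y[n+1]) together for y[m] = sum_k h[k] * x[m-k].

    Samples before the start of x are treated as zero; n + 1 must be a valid
    index into x. Also returns the number of data fetches (coefficients plus
    samples) so the saving from data reuse can be counted.
    """
    def fetch(i):
        return x[i] if i >= 0 else 0

    y0 = y1 = 0
    x_prev = fetch(n + 1)  # x[n+1], needed by tap 0 of y[n+1]
    fetches = 1
    for k in range(len(h)):
        coeff = h[k]          # one coefficient fetch serves two MACs
        x_cur = fetch(n - k)  # one new sample fetch per tap
        fetches += 2
        y0 += coeff * x_cur   # tap k of y[n]
        y1 += coeff * x_prev  # tap k of y[n+1] uses x[n+1-k]
        x_prev = x_cur        # reuse: x[n-k] is x[(n+1)-(k+1)]
    return y0, y1, fetches


y2, y3, fetches = fir_pair([1, 2, 3, 4, 5, 6], [1, 1, 1], 2)
print(y2, y3, fetches)  # 6 9 7
```

Computing the same two outputs one at a time would take two data fetches per MAC, 12 for this three-tap filter, against 7 here.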

17
Illustration of algorithmic approach
18
Memory wait states
  • Wait states are extra cycles during which the
    processor cannot execute its program because it
    is waiting for access to memory. Causes include,
    for example:
    • Slow memory
    • Bus sharing
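
A simple way to quantify the cost, assuming every access to the slow memory incurs the same fixed number of wait states and nothing else overlaps with it (the function names are illustrative):

```python
def access_cycles(n_accesses: int, wait_states: int) -> int:
    """Cycles for n sequential accesses when each access takes one cycle
    plus `wait_states` extra cycles (no overlap with other work assumed)."""
    return n_accesses * (1 + wait_states)


def effective_bandwidth_fraction(wait_states: int) -> float:
    """Fraction of the zero-wait-state bandwidth that remains."""
    return 1 / (1 + wait_states)


print(access_cycles(1000, 0))           # 1000 cycles with zero wait states
print(access_cycles(1000, 2))           # 3000 cycles with two wait states
print(effective_bandwidth_fraction(2))  # one third of the peak rate
```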

19
On-chip ROM for low-cost embedded applications
  • On-chip ROM (usually small, 256 to 36K words) is
    used to store small application programs and
    constant data for low-cost embedded applications.

20
External memory interfaces
21
External memory interfaces: manual caching
  • If a section of frequently used program code is
    stored in slow, off-chip memory, it is the
    programmer's responsibility to move the code to
    faster on-chip RAM, either at system start-up or
    when that section of the program is needed.
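
Whether the copy is worthwhile can be estimated with a back-of-the-envelope cycle count. This sketch assumes one-word, one-cycle instructions, a fixed number of wait states on every off-chip access, no wait states on-chip, and no overlap between copying and execution; the function names and figures are hypothetical.

```python
def cycles_from_external(n_words: int, iterations: int, wait_states: int) -> int:
    """Execute a code section directly from slow off-chip memory.
    Each instruction fetch costs one cycle plus `wait_states`."""
    return n_words * iterations * (1 + wait_states)


def cycles_copy_then_run(n_words: int, iterations: int, wait_states: int) -> int:
    """Copy the section into on-chip RAM first, then execute it there.
    Copying a word costs one off-chip read (1 + wait_states cycles) plus one
    on-chip write (1 cycle); on-chip fetches are assumed to need no wait states."""
    copy_cost = n_words * ((1 + wait_states) + 1)
    return copy_cost + n_words * iterations


# Hypothetical 100-instruction routine called 50 times, 3 wait states off-chip.
print(cycles_from_external(100, 50, 3))  # 20000
print(cycles_copy_then_run(100, 50, 3))  # 5500
```

Under these assumptions, copying pays off once the section runs more than (2 + wait_states) / wait_states times; a routine executed only once is cheaper to run in place.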

22
Dynamic memory
  • Most DSPs use static RAM (SRAM), which is faster
    and easier to interface to, but more expensive.
  • For low-cost, high-volume products, the designer
    may need to consider dynamic RAM (DRAM),
    especially static-column DRAM.

23
Direct memory access
  • DMA allows data to be transferred to or from the
    processor's memory without involving the
    processor itself.
  • It is typically used to improve I/O performance.

24
Customization
  • Some vendors will customize their chip designs
    for individual customers (customizable DSPs).