Title: DSP Lecture Series
1 DSP Lecture Series
- DSP Memory Architecture
- Dr. E.W. Hu
- Nov. 28, 2000
2 Computer Architecture and VLSI Technology Pioneer Lynn Conway
In the 1960s, while working at IBM, Lynn Conway conceived the idea of multi-issue processors, the forerunner of today's VLIW processors.
3 Fixed-point DSP datapath
4 What is memory architecture?
- The characteristics of the organization of memory and its interconnection with the processor's datapath are called the memory architecture.
- The memory architecture determines the memory bandwidth, which is a critical factor affecting the performance of a DSP.
5 Memory bandwidth
- In general, bandwidth, w, is defined as the rate at which words can be written to (stored in) or read from the memory.
- For a DSP, it is convenient to think of how many instruction cycles are needed to complete a read or write operation (see the sketch below). If everything else is the same, the smaller the number of instruction cycles, the higher the bandwidth.
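A minimal back-of-the-envelope sketch in C (the 100 MHz instruction rate is an assumed figure, not from the lecture): effective bandwidth is simply the instruction rate divided by the number of cycles each access takes.

    /* Illustrative only: effective memory bandwidth in words per second. */
    #include <stdio.h>

    int main(void)
    {
        double instr_rate_hz   = 100e6;  /* assumed instruction rate      */
        double cycles_per_word = 1.0;    /* instruction cycles per access */

        /* One-cycle accesses give 100 million words/s; two-cycle accesses halve it. */
        printf("%.0f words/s\n", instr_rate_hz / cycles_per_word);
        printf("%.0f words/s\n", instr_rate_hz / (cycles_per_word * 2.0));
        return 0;
    }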
6 Why DSP applications need large memory bandwidth
- A high-performance datapath is only part of a high-performance processor.
- DSP applications are typically computation-intensive, requiring large amounts of data to be moved quickly between the datapath(s) and the memory module(s), as described in the next slide.
7 A typical DSP application: the FIR (finite impulse response) filter
8 At each tap, four memory accesses are needed for the FIR filter (see the sketch after this list)
- Fetch the MAC instruction from memory
- Read the data value from memory (a sample from the signal)
- Read the appropriate coefficient from memory (a known constant for a particular filter)
- Write the data value to memory (the next location in the delay line)
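A minimal C sketch of an N-tap FIR inner loop, written only to show where the four accesses fall in each tap; the names (coeff, delay_line, and so on) are illustrative and not from the lecture.

    #include <stddef.h>

    /* One output sample of an N-tap FIR filter.
     * delay_line[0] holds the newest input sample, delay_line[N-1] the oldest.
     * Each iteration (one tap) performs the four memory accesses listed above;
     * the fourth, the fetch of the MAC instruction itself, is implicit. */
    float fir(const float *coeff, float *delay_line, size_t N)
    {
        float acc = 0.0f;
        for (size_t k = N; k-- > 0; ) {      /* process the oldest tap first */
            float x = delay_line[k];         /* 1: read data sample          */
            float c = coeff[k];              /* 2: read filter coefficient   */
            if (k + 1 < N)
                delay_line[k + 1] = x;       /* 3: write sample to next slot */
            acc += c * x;                    /* 4: (implicit) MAC instruction fetch */
        }
        return acc;
    }

After the call, the caller stores the next input sample into delay_line[0] before computing the following output.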
9 The Von Neumann architecture for general-purpose processors
10 The Harvard architecture: the design basis for most DSPs, with two or more memory accesses per cycle
11 Variations of the Harvard architecture allow still more memory accesses per instruction cycle
12 Typical DSPs with two or three independent memory banks
- Analog Devices ADSP-21xx
- AT&T DSP16xx
- Zilog Z893xx
- Motorola DSP5600x, DSP563xx, DSP96002
13 Other approaches to achieve multiple memory accesses per cycle
- Examples of other approaches:
- Multiple, sequential accesses per instruction cycle over a single set of buses (each access takes less than one instruction cycle), e.g., Zoran ZR3800.
- Multi-ported memories that allow multiple concurrent memory accesses over two or more independent sets of buses (Fig. 5.4), e.g., AT&T DSP32xx.
- Allowing a read and a write to proceed at the same time under restricted circumstances, e.g., AT&T DSP16xx.
14 Using cache memory to reduce memory accesses
- An on-chip program cache reduces memory accesses.
- There are many different implementations of program caches:
- Single-instruction repeat buffer
- Multiple-instruction cache (e.g., one that stores a block of 16 instructions)
- Single-sector instruction cache that stores some number of the most recently used instructions
15 Using the modulo addressing technique to reduce memory accesses
- To be discussed in the next seminar: memory addressing modes (a brief preview sketch follows below).
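A minimal sketch of the idea, assuming a circular delay line of length N (the names and the length are illustrative, not from the lecture): with modulo (circular) addressing the delay line is never physically shifted, so the per-tap write from slide 8 goes away and only the index wraps around.

    /* Circular delay line: one write per new input sample, no block shift. */
    #define N 8                          /* illustrative buffer length */

    static float delay_line[N];
    static unsigned newest = 0;          /* index of the most recent sample */

    void push_sample(float x)
    {
        newest = (newest + N - 1) % N;   /* modulo index update, no data movement */
        delay_line[newest] = x;
    }

    float tap(unsigned k)                /* k-th most recent sample, x[n-k] */
    {
        return delay_line[(newest + k) % N];
    }

On a DSP the modulo update is performed in hardware by the address generation unit; the C version above only illustrates the addressing pattern.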
16 Using algorithmic approaches to reduce memory accesses
- Algorithms can be structured to exploit data locality and thereby reduce memory accesses.
- DSP algorithms that operate on blocks of input data often fetch the same data from memory multiple times during execution, as in the case of FIR filter computation.
- In the example that follows, the filter operates on a block of two input samples. Instead of computing output samples one at a time, the filter computes two output samples at a time, allowing it to reuse previously fetched data. This effectively reduces the required memory bandwidth from one instruction fetch and two data fetches to one instruction fetch and one data fetch per instruction cycle (see the C sketch after this list).
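A minimal C sketch of the block-of-two idea (the names and buffer layout are assumptions, not taken from the lecture): two output samples are accumulated in a single pass over the coefficients, so each coefficient and each input sample is fetched from memory only once per pair of multiply-accumulates.

    #include <stddef.h>

    /* Computes y[n] and y[n+1] of an N-tap FIR in one pass.
     * x[] must hold samples x[n-(N-1)] .. x[n+1]; c[] holds the N coefficients.
     * Each iteration fetches ONE coefficient and ONE new data word but performs
     * TWO MACs, halving the data fetches per multiply-accumulate. */
    void fir_block2(const float *c, const float *x, size_t n, size_t N,
                    float *y0, float *y1)
    {
        float acc0 = 0.0f, acc1 = 0.0f;
        float x_next = x[n + 1];             /* sample shared across iterations */
        for (size_t k = 0; k < N; k++) {
            float ck     = c[k];             /* one coefficient fetch           */
            float x_curr = x[n - k];         /* one new data fetch              */
            acc1 += ck * x_next;             /* MAC for y[n+1], reuses x_next   */
            acc0 += ck * x_curr;             /* MAC for y[n]                    */
            x_next = x_curr;                 /* keep x[n-k] in a register for
                                                the next y[n+1] term            */
        }
        *y0 = acc0;   /* y[n]   */
        *y1 = acc1;   /* y[n+1] */
    }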
17 Illustration of the algorithmic approach
18 Memory wait states
- Wait states are states in which the processor cannot execute its program because it is waiting for access to memory due to, for example:
- Slow memory
- Bus sharing
19 On-chip ROM for low-cost embedded applications
- On-chip ROM (usually small, 256 to 36K words) is
used to store small application programs and
constant data for low-cost embedded applications.
20 External memory interfaces
21 External memory interfaces: manual caching
- If a section of often-used program code is stored in slow, off-chip memory, it is the programmer's responsibility to move the code to faster on-chip RAM, either at system start-up or when that section of the program is needed (a sketch follows below).
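A minimal sketch of the start-up copy in C, assuming hypothetical linker-script symbols (__hot_code_load_start and friends are made-up names; a real system may also need a cache or pipeline synchronization step):

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical linker-defined symbols marking where the hot code is
     * stored (slow external memory) and where it should run (on-chip RAM). */
    extern uint8_t __hot_code_load_start[];   /* load address, external memory */
    extern uint8_t __hot_code_run_start[];    /* run address, on-chip RAM      */
    extern uint8_t __hot_code_size[];         /* section size, as a symbol     */

    /* Called once at system start-up, before the copied routines are used. */
    void copy_hot_code_to_onchip_ram(void)
    {
        memcpy(__hot_code_run_start, __hot_code_load_start,
               (size_t)(uintptr_t)__hot_code_size);
    }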
22 Dynamic memory
- Most DSPs use static RAM, which is faster and easier to interface with, but more expensive.
- For a low-cost, high-volume product, the designer might need to consider dynamic RAM, especially static-column DRAM.
23 Direct memory access
- DMA allows data transfers to take place (to/from the processor's memory) without the involvement of the processor itself.
- It is typically used to improve performance for I/O devices (see the sketch below).
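A minimal C sketch of programming a DMA transfer, using hypothetical memory-mapped controller registers (the addresses, bit layout, and names are illustrative only, not those of any particular DSP):

    #include <stdint.h>

    /* Hypothetical DMA controller registers. */
    #define DMA_SRC   (*(volatile uint32_t *)0x40001000u)  /* source address      */
    #define DMA_DST   (*(volatile uint32_t *)0x40001004u)  /* destination address */
    #define DMA_COUNT (*(volatile uint32_t *)0x40001008u)  /* words to transfer   */
    #define DMA_CTRL  (*(volatile uint32_t *)0x4000100Cu)  /* control/status      */
    #define DMA_START 0x1u
    #define DMA_BUSY  0x2u

    /* Move a block from an I/O buffer into processor memory; the CPU can keep
     * executing its program while the controller performs the transfer. */
    void dma_copy(uint32_t src, uint32_t dst, uint32_t nwords)
    {
        DMA_SRC   = src;
        DMA_DST   = dst;
        DMA_COUNT = nwords;
        DMA_CTRL  = DMA_START;            /* kick off the transfer              */
        /* ... other useful work could run here ...                             */
        while (DMA_CTRL & DMA_BUSY) { }   /* or wait for a completion interrupt */
    }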
24 Customization
- Some vendors are flexible enough to customize their chip designs for their customers (customizable DSPs).