Lecture 1: Course Intro presentation

1
Lecture 1: Course Intro

Prof. Mike Schulte
Application-Specific Processor Design
ECE 450-11

2
Course/Instructor Info

Name: Application-Specific Processor Design
Number: ECE 450-11
Homepage: http://www.eecs.lehigh.edu/mschulte/ece450-00
Location: 416 Packard Lab
Time: T, Th 5:00-6:15 PM
Instructor: Michael Schulte
Office: 326 Packard Lab
Phone: 758-5036
Email: mschulte_at_eecs.lehigh.edu
Office hours: T, Th 6:15-7:00 or by appointment

3
Course Objectives

To provide students with the background needed to design and analyze application-specific processors.
Typical application-specific processing systems include
  Digital Signal Processing (DSP) Systems
    Cellular telephones, wireless base stations, modems
  Multimedia Systems
    High-definition TV, video conferencing, computer graphics
  Scientific Computing Systems
    Partial differential equation solvers, vector processors
  Control Systems
    Manufacturing plants, navigation systems, chemical processing

4
Topics Covered

Textbook: Peter Pirsch, Architectures for Digital Signal Processing, John Wiley & Sons, 1998.
  Signal Processing Algorithms and Implementations (Chapter 1)
  Computer Arithmetic (Chapter 3)
  Pipelining and Parallel Processing (Chapter 4)
  Array Architectures (Chapter 5)
  FIR, IIR, DFT, and FFT Implementations (Chapters 6 and 7)
  Digital Signal Processors and Multimedia Processors (Chapter 8)
  Multiprocessor Systems (Chapter 9)
  Implementation Strategies (Chapter 10)
Other useful textbooks
  Keshab Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999.
  Vijay K. Madisetti, Digital Signal Processors: An Introduction to Rapid Prototyping and Design Synthesis, IEEE CS Press, 1995.
Course schedule: http://www.eecs.lehigh.edu/mschulte/ece450-00/schedule.html

5
Prerequisites and Grading

Prerequisites
  A previous course in computer architecture (e.g., ECE201)
  Experience with a hardware description language (e.g., VHDL or Verilog)
  The course does not assume knowledge of DSP or transistor-level design.
Grading
  Homework: 20%
  First exam: 20%
  Second exam: 20%
  Class project: 40%

6
Course Project

The course project is to
  Perform in-depth research on a topic in the field of application-specific processor design
  Research and design an application-specific processor
The project will consist of
  Project proposal (9/28/00)
  Status report (10/26/00)
  Final report (12/03/00)
  Project presentation (12/01/00 and 12/03/00)
Projects are to be done by one or two people.

7
Sample Projects

Sample projects include
  Research and design (in Verilog or VHDL)
    DCT or FFT accelerators
    Viterbi encoders or decoders
    Low-power arithmetic units (e.g., multipliers, adders, multiply-accumulate units, or function approximators)
    Parallel saturating arithmetic units for GSM coders
    Reed-Solomon or Turbo coders
    Novel FIR or IIR implementations
  Propose and evaluate compiler, architecture, or circuit techniques for reducing power dissipation.
  Investigate designs for encryption and decryption.

8
Useful Web Resources

Application-specific processor design links
  http://www.eecs.lehigh.edu/mschulte/ece450-00/asp links
Computer arithmetic links
  http://www.eecs.lehigh.edu/mschulte/ece450-00/comp-arch links
Digital design links
  http://www.eecs.lehigh.edu/mschulte/ece450-00/design links
Literature search links
  Lehigh University Database Systems
    http://www.lehigh.edu/inlib/
  IEEE Xplore
    http://ieeexplore.ieee.org/lpdocs/epic03/

9
DSP Algorithms
10
Typical DSP Algorithms: FIR Filters

Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies.
Finite Impulse Response (FIR) filters compute
  y(n) = \sum_{k=0}^{N-1} h(k) x(n-k)
where
  x is the input sequence
  y is the output sequence
  h is the impulse response (filter coefficients)
  N is the number of taps (coefficients) in the filter
The output sequence depends only on the input sequence and the impulse response.
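As a rough illustration of the FIR computation, the Python sketch below evaluates each output as the sum over k of h(k) x(n-k), using one multiply-accumulate per tap. The function name `fir_filter` and the choice to treat samples before the start of the input as zero are assumptions made for this example; the lecture does not specify them.

```python
def fir_filter(x, h):
    """Direct-form FIR: y(n) = sum over k of h(k) * x(n - k).

    Assumption of this sketch: samples before the start of x are zero.
    """
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0  # accumulator for the multiply-accumulate (MAC) loop
        for k in range(n_taps):
            if n - k >= 0:
                acc += h[k] * x[n - k]  # one MAC per tap
        y.append(acc)
    return y


# Illustrative 2-tap filter applied to a short ramp.
print(fir_filter([1.0, 2.0, 3.0], [1.0, 2.0]))
```

Each output sample costs N multiplies and N additions, which is where the figure of 2N operations per sample for a 1-D FIR comes from.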

11
Typical DSP Algorithms: IIR Filters

Infinite Impulse Response (IIR) filters compute
  (equation not preserved in this transcript)
The output sequence depends on the input sequence, previous outputs, and the impulse response.
Both FIR and IIR filters
  Require dot product (multiply-accumulate) operations
  Use fixed coefficients
Adaptive filters update their coefficients to minimize the distance between the filter output and the desired signal.
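The feedback described for IIR filters can be sketched in Python as follows. This is a minimal example under an assumed convention: `a` holds feed-forward coefficients applied to the input, `b` holds feedback coefficients applied to previous outputs (with `b[0]` unused), and all history before the first sample is zero. These names and the sign convention are illustrative and may differ from the form used in the lecture or the textbook.

```python
def iir_filter(x, a, b):
    """Assumed form: y(n) = sum_k a[k]*x(n-k) + sum_{k>=1} b[k]*y(n-k)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(a)):       # feed-forward taps on the input
            if n - k >= 0:
                acc += a[k] * x[n - k]
        for k in range(1, len(b)):    # feedback taps on previous outputs
            if n - k >= 0:
                acc += b[k] * y[n - k]
        y.append(acc)
    return y


# Impulse response of a one-pole filter: each output is half the previous one.
print(iir_filter([1.0, 0.0, 0.0, 0.0], a=[1.0], b=[0.0, 0.5]))
```

Because each output feeds back into later outputs, a single input impulse keeps producing nonzero outputs, which is what "infinite impulse response" refers to.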

12
Typical DSP Algorithms: Discrete Fourier Transform

The Discrete Fourier Transform (DFT) allows for spectral analysis in the frequency domain.
It is computed as
  y(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi n k / N}
for k = 0, 1, ..., N-1, where
  x is the input sequence in the time domain
  y is the output sequence in the frequency domain
The Inverse Discrete Fourier Transform is computed as
  x(n) = \frac{1}{N} \sum_{k=0}^{N-1} y(k) e^{j 2\pi n k / N}
The Fast Fourier Transform (FFT) provides an efficient method for computing the DFT.
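To make the cost of the DFT concrete, here is a small Python sketch of the direct (non-FFT) computation and its inverse. It assumes the common convention in which the forward transform has no scale factor and the inverse carries the 1/N factor; the function names `dft` and `idft` are chosen for this example.

```python
import cmath


def dft(x):
    """Direct DFT: y(k) = sum_n x(n) * exp(-j*2*pi*n*k/N). Costs N*N complex MACs."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(y):
    """Inverse DFT with the 1/N factor placed on the inverse (assumed convention)."""
    n_pts = len(y)
    return [
        sum(y[k] * cmath.exp(2j * cmath.pi * n * k / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


spectrum = dft([1.0, 2.0, 3.0, 4.0])
recovered = idft(spectrum)  # matches the input up to rounding error
```

The double loop is the reason the direct form needs on the order of N^2 operations, whereas an FFT reduces this to the order of N log N.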

13
Typical DSP Algorithms: Discrete Cosine Transform

The Discrete Cosine Transform (DCT) is frequently used in video compression (e.g., MPEG-2).
The DCT and Inverse DCT (IDCT) are computed as
  (equations not preserved in this transcript)
where e(k) = 1/sqrt(2) if k = 0; otherwise e(k) = 1.
An N-point 1-D DCT requires N^2 MAC operations.
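The N^2 MAC count can be checked with a short Python sketch of a direct 1-D DCT. The slide's exact equation is not available here, so this example assumes the widely used DCT-II form with a sqrt(2/N) scale factor and the e(k) weighting given above; the function name `dct_1d` and the MAC counter are illustrative additions.

```python
import math


def dct_1d(x):
    """Direct N-point DCT-II (assumed scaling sqrt(2/N) * e(k)).

    Returns the transform and the number of multiply-accumulates used.
    """
    n_pts = len(x)
    y, macs = [], 0
    for k in range(n_pts):
        e_k = 1.0 / math.sqrt(2.0) if k == 0 else 1.0
        acc = 0.0
        for n in range(n_pts):
            acc += x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
            macs += 1  # one MAC per (k, n) pair
        y.append(math.sqrt(2.0 / n_pts) * e_k * acc)
    return y, macs


coeffs, mac_count = dct_1d([1.0, 1.0, 1.0, 1.0])
print(mac_count)  # 16 for N = 4
```

For a constant input, only the k = 0 coefficient is nonzero, which is the energy compaction property that makes the DCT useful for compression.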

14
Typical DSP Algorithms: Distance Calculations

Distance calculations are typically used in pattern recognition, motion estimation, and coding.
Problem: Choose the vector r_k whose distance from the input vector x is minimum.
The distance is typically defined as
  The mean absolute difference (MAD or L1 norm)
  The mean square error (MSE or L2 norm)
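The minimum-distance search can be sketched in Python as below. The sketch assumes both measures are averaged over the vector length; the function names `mad`, `mse`, and `nearest`, and the sample vectors, are hypothetical and only serve the example.

```python
def mad(x, r):
    """Mean absolute difference between vectors x and r."""
    return sum(abs(xi - ri) for xi, ri in zip(x, r)) / len(x)


def mse(x, r):
    """Mean square error between vectors x and r."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r)) / len(x)


def nearest(x, candidates, distance=mad):
    """Return the index k of the candidate r_k with minimum distance to x."""
    return min(range(len(candidates)), key=lambda k: distance(x, candidates[k]))


codebook = [[0.0, 0.0, 0.0], [1.0, 2.0, 4.0], [5.0, 5.0, 5.0]]
best = nearest([1.0, 2.0, 3.0], codebook, distance=mse)
print(best)  # index of the closest codebook vector
```

MAD avoids multiplications, which is one reason it is often preferred in hardware for motion estimation, while MSE penalizes large errors more heavily.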

15
Typical DSP Algorithms: Matrix Computations

Matrix computations are typically used to estimate parameters in DSP systems.
The Gauss-Jordan method for matrix triangularization uses the equations
  (equations not preserved in this transcript)
where A is the matrix and a_nj is the pivot element.
Givens method rotates a matrix by θ, using the equations
  (equations not preserved in this transcript)
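Since the rotation equations did not survive in this transcript, the following Python sketch shows one standard way a Givens rotation is applied: two rows are rotated by an angle chosen so that one selected element becomes zero. The function name `givens_zero`, its argument order, and the in-place update are assumptions of this example rather than the lecture's formulation.

```python
import math


def givens_zero(a, i, j, col):
    """Rotate rows i and j of matrix a (list of lists) so a[j][col] becomes 0.

    The rotation angle theta satisfies cos(theta) = a[i][col] / r and
    sin(theta) = a[j][col] / r, with r = sqrt(a[i][col]^2 + a[j][col]^2).
    The matrix is modified in place and also returned.
    """
    r = math.hypot(a[i][col], a[j][col])
    if r == 0.0:
        return a  # nothing to rotate
    c, s = a[i][col] / r, a[j][col] / r
    for m in range(len(a[0])):
        top, bot = a[i][m], a[j][m]
        a[i][m] = c * top + s * bot
        a[j][m] = -s * top + c * bot
    return a


matrix = [[3.0, 1.0], [4.0, 2.0]]
givens_zero(matrix, 0, 1, 0)  # zeroes the element below the diagonal
```

Computing the cosine and sine requires a square root and a division, so this is an example of a matrix computation that needs division and function approximation in hardware.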

16
Summary of DSP Applications

DSP applications typically require
  Dot product computations (filters, transforms, matrices)
  Distance calculations (pattern recognition, coding)
  Division or reciprocals (matrix computations, normalization)
  Function approximations (Givens rotations, DFTs, DCTs)

17
Computation Rates

To estimate the hardware resources required, we can use the equation
  R_c = R_s \cdot n_{op}
where
  R_c is the computation rate
  R_s is the sampling rate
  n_{op} is the average number of operations per sample
For example, a 1-D FIR has n_{op} = 2N and a 2-D FIR has n_{op} = 2N^2.
What does the above equation assume?
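A quick numeric check of the rate estimate, taking the computation rate to be the sampling rate multiplied by the average operations per sample. The Python sketch below uses hypothetical values (a 64-tap filter at a 48 kHz sampling rate) purely for illustration; the function names are chosen for this example.

```python
def computation_rate(sampling_rate_hz, ops_per_sample):
    """Computation rate = sampling rate * average operations per sample."""
    return sampling_rate_hz * ops_per_sample


def fir_ops_1d(n_taps):
    """1-D FIR: one multiply and one add per tap."""
    return 2 * n_taps


def fir_ops_2d(n_taps):
    """2-D FIR with an N x N kernel: one multiply and one add per kernel entry."""
    return 2 * n_taps ** 2


# Hypothetical example: 64-tap 1-D FIR at 48 kHz.
print(computation_rate(48_000, fir_ops_1d(64)))  # 6144000 operations per second
```

One thing the estimate leaves out is any overhead beyond the arithmetic itself, such as memory accesses, address calculation, and loop control.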

18
Computational Rates for FIR Filtering
19
Computational Requirements
Chart of required operation rates by application. Axis ticks: 100 MOPs, 200 MOPs, 300 MOPs, 400 MOPs, 1 GOP, 10 GOPs, 100 GOPs. Values in parentheses are as labeled on the chart.
  P x 64 CIF, 15 f/s, 100 kb/s (1.2)
  DFSE EQ, 2 Mb/s (650)
  Full-rate DAB Viterbi Decoder, MPEG II MP@ML, 30 fps Decode (600)
  16 x GSM_EFR (380)
  ADSL XCVR, 6.1 Mb/s (360)
  ADSL XCVR, 1.5 Mb/s (100)
  GSM Terminal (Baseband, HR) (52)
  GSM_HR, AC-3 decode, V.34 (20)
  GSM_EFR (16)
  GSM_FR (2.5)
20
Implementation Hierarchy
Processing Method: General description of the task
Algorithm: Actual computation steps and functional relationships
Architecture: Implementation of connected modules, each with a subtask
Circuitry: Basic module implementation as logic gates or transistors
21
Processor Classification
Processor
  DSP
    Fixed Point: 16 bit, 20 bit, 24 bit
    Floating Point: 32 bit IEEE, Other
  General Purpose
    Integer: 32 bit subsets, 64 bit subsets
    Floating Point: 32/64 bit IEEE, Other (80 bit)
22
Simplified DSP Chip
Instruction Memory
DSP Core
Data Memory
Serial Ports
A/D Converter
D/A Converter
23
DSP Basic Features

Fast Multiply-Accumulate (MAC)
  DSP filters and transforms are multiply-intensive
Multiple-Access Memory
  1 instruction, 2 data per cycle
Specialized Addressing
  FIFOs, arrays, permutations
Specialized Program Control
  Efficient loops
Fast Interrupt Handling
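To illustrate why specialized addressing pairs well with a fast MAC unit, the Python sketch below models a FIFO delay line as a circular buffer, so that accepting a new sample moves a pointer instead of shifting data. DSPs typically do this index wrap-around in hardware; here it is emulated with the modulo operator. The class name `CircularDelayLine` and its methods are hypothetical and exist only for this example.

```python
class CircularDelayLine:
    """FIFO of the most recent input samples, stored as a circular buffer."""

    def __init__(self, n_taps):
        self.buf = [0.0] * n_taps
        self.head = 0  # index of the newest sample

    def push(self, sample):
        # Step the pointer back with wrap-around and overwrite the oldest sample.
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = sample

    def mac(self, h):
        # Dot product of coefficients with samples, newest first: sum h[k] * x(n-k).
        acc = 0.0
        for k, coeff in enumerate(h):
            acc += coeff * self.buf[(self.head + k) % len(self.buf)]
        return acc


line = CircularDelayLine(2)
outputs = []
for sample in [1.0, 2.0, 3.0]:
    line.push(sample)
    outputs.append(line.mac([1.0, 2.0]))
print(outputs)
```

With hardware modulo addressing and a zero-overhead loop, the inner loop of `mac` can approach one tap per cycle: one coefficient fetch, one sample fetch, and one MAC.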

24
Code Characteristics: General Purpose vs. DSP

General Purpose
Limited Parallelism
Control Dominated
Inherently Serial
Branch Intensive (20%)

DSP
Parallel Inner Loops
Loop Setup, then Compute
Overlapped Parallel Processing
Multiple Independent Streams

Amdahl's Law
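The slide names Amdahl's Law without stating it. In its usual form, if a fraction p of the work can be sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The Python sketch below evaluates this for two hypothetical workloads; the parallel fractions used are illustrative and are not taken from the lecture.

```python
def amdahl_speedup(parallel_fraction, factor):
    """Overall speedup when parallel_fraction of the work runs factor times faster."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / factor)


# Hypothetical workloads, both given 8 parallel units.
control_dominated = amdahl_speedup(0.30, 8)  # mostly serial code
loop_dominated = amdahl_speedup(0.95, 8)     # mostly parallel inner loops
print(control_dominated, loop_dominated)
```

The serial fraction bounds the achievable speedup, which is why code dominated by parallel inner loops benefits far more from added parallel hardware than control-dominated code does.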
25
Workload Comparisons
Figure labels: Amdahl's Law; Video; DSP; General Purpose
26
DSP vs. General Purpose

DSP
  Execution Predictability
    Required to guarantee real-time constraints
  Zero-overhead Loop Buffer
  Complex Instructions
    Multiple operations issued
  Harvard Memory Architecture
  Specialized Addressing Modes
  Operate on Stream Data

General Purpose
  Fast but Non-predictable
    Dynamic instruction issue
    Non-deterministic caches
    Branch prediction
  RISC Superscalar Instructions
    Multiple instructions issued
  Von Neumann Architecture
    Split cache has similar benefit
  Typically Linear Addressing
  Caches Assume Locality

27
Compilable Architectures
Implementations
Optimize
Cost / Power Performance
GSM xDSL DBS LMDS HFC-QAM QPSK DMT CAP DFE MLSE DSS LMDS HFC OSDM DFSE CELP DAB DVD DVB MPEG2 MPEG4 AC-3 MUSICAM COFDM FFT FIR
28
System Integration Trends
Current Products: GSM Base Transceiver Station (8 FR/EFR, 16 HR channels)
  Blocks: A/D (x2), D/A, DSP (x6), SRAM (x6)
Current Designs: GSM Base Transceiver Station (8 FR/EFR, 16 HR channels)
  Blocks: A/D (x2), D/A, DSP1620 (x3), Pre-equalizer, Equalizer, Channel Coding
Next Generation Designs: GSM Base Transceiver Station (8 FR/EFR, 16 HR channels)
  Blocks: A/D (x2), D/A, DSP Future, Pre-equalizer, Equalizer, Channel Coding
