Title: Lecture 1: Course Intro
1Lecture 1: Course Intro
- Prof. Mike Schulte
- Application-Specific Processor Design
- ECE 450-11
2Course/Instructor Info
- Name: Application-Specific Processor Design
- Number: ECE 450-11
- Homepage: http://www.eecs.lehigh.edu/mschulte/ece450-00
- Location: 416 Packard Lab
- Time: T, Th 5:00-6:15 PM
- Instructor: Michael Schulte
- Office: 326 Packard Lab
- Phone: 758-5036
- Email: mschulte_at_eecs.lehigh.edu
- Office hours: T, Th 6:15-7:00 or by appointment
3Course Objectives
- To provide students with the background needed to design and analyze application-specific processors.
- Typical application-specific processing systems include:
  - Digital Signal Processing (DSP) Systems
    - Cellular telephones, wireless base stations, modems
  - Multimedia Systems
    - High-definition TV, video conferencing, computer graphics
  - Scientific Computing Systems
    - Partial differential equation solvers, vector processors
  - Control Systems
    - Manufacturing plants, navigation systems, chemical processing
4Topics Covered
- Textbook: Peter Pirsch, Architectures for Digital Signal Processing, John Wiley & Sons, 1998.
  - Signal Processing Algorithms and Implementations (Chapter 1)
  - Computer Arithmetic (Chapter 3)
  - Pipelining and Parallel Processing (Chapter 4)
  - Array Architectures (Chapter 5)
  - FIR, IIR, DFT, and FFT Implementations (Chapters 6 and 7)
  - Digital Signal Processors and Multimedia Processors (Chapter 8)
  - Multiprocessor Systems (Chapter 9)
  - Implementation Strategies (Chapter 10)
- Other useful textbooks
  - Keshab Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999.
  - Vijay K. Madisetti, Digital Signal Processors: An Introduction to Rapid Prototyping and Design Synthesis, IEEE CS Press, 1995.
- Course schedule
  - http://www.eecs.lehigh.edu/mschulte/ece450-00/schedule.html
5Prerequisites and Grading
- Prerequisites
  - A previous course in computer architecture (e.g., ECE201)
  - Experience with hardware description languages (e.g., VHDL or Verilog)
  - The course does not assume knowledge of DSP or transistor-level design.
- Grading
  - Homework: 20%
  - First exam: 20%
  - Second exam: 20%
  - Class project: 40%
6Course Project
- The course project is to
  - Perform in-depth research on a topic in the field of application-specific processor design
  - Research and design an application-specific processor
- The project will consist of
  - Project proposal (9/28/00)
  - Status report (10/26/00)
  - Final report (12/03/00)
  - Project presentation (12/01/00 and 12/03/00)
- Projects are to be done by one or two people
7Sample Projects
- Sample projects include
  - Research and design (in Verilog or VHDL)
    - DCT or FFT accelerators
    - Viterbi encoders or decoders
    - Low-power arithmetic units (e.g., multipliers, adders, multiply-accumulate units, or function approximators)
    - Parallel saturating arithmetic units for GSM coders
    - Reed-Solomon or Turbo coders
    - Novel FIR or IIR implementations
  - Propose and evaluate compiler, architecture, or circuit techniques for reducing power dissipation.
  - Investigate designs for encryption and decryption.
8Useful Web Resources
- Application-specific processor design links
  - http://www.eecs.lehigh.edu/mschulte/ece450-00/asp links
- Computer arithmetic links
  - http://www.eecs.lehigh.edu/mschulte/ece450-00/comp-arch links
- Digital design links
  - http://www.eecs.lehigh.edu/mschulte/ece450-00/design links
- Literature search links
  - Lehigh University Database Systems
    - http://www.lehigh.edu/inlib/
  - IEEE Xplore
    - http://ieeexplore.ieee.org/lpdocs/epic03/
9DSP Algorithms
10Typical DSP Algorithms: FIR Filters
- Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies.
- Finite Impulse Response (FIR) filters compute
  $$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$$
  where
  - x is the input sequence
  - y is the output sequence
  - h is the impulse response (filter coefficients)
  - N is the number of taps (coefficients) in the filter
- The output sequence depends only on the input sequence and the impulse response.
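A minimal sketch of a direct-form FIR filter, assuming the usual convolution form y(n) = sum over k of h(k) x(n-k), with samples before n = 0 taken as zero. The function name `fir_filter` and the sample values are illustrative and do not come from the course material.

```python
def fir_filter(x, h):
    """Direct-form FIR: y(n) = sum_{k=0}^{N-1} h(k) * x(n-k), with x(n) = 0 for n < 0."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            if n - k >= 0:
                acc += h[k] * x[n - k]  # one multiply-accumulate (MAC) per tap
        y.append(acc)
    return y


# Illustrative 3-tap filter applied to a short ramp.
print(fir_filter([4.0, 8.0, 12.0, 16.0], [0.5, 0.25, 0.25]))
```

Each output sample costs N multiplies and N additions, which is why a fast MAC unit matters so much for this workload.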
11Typical DSP Algorithms: IIR Filters
- Infinite Impulse Response (IIR) filters compute an output sequence that depends on the input sequence, previous outputs, and the impulse response.
- Both FIR and IIR filters
  - Require dot product (multiply-accumulate) operations
  - Use fixed coefficients
- Adaptive filters update their coefficients to minimize the distance between the filter output and the desired signal.
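One common way to write an IIR filter is y(n) = sum over k of b(k) x(n-k) minus sum over k >= 1 of a(k) y(n-k). The coefficient names `b` and `a`, the sign convention, and the convention that `a[0]` is 1 are assumptions chosen for this sketch; they are not taken from the slides.

```python
def iir_filter(x, b, a):
    """Direct-form I IIR filter.

    y(n) = sum_k b[k]*x(n-k) - sum_{k>=1} a[k]*y(n-k), with a[0] assumed to be 1
    and all samples before n = 0 taken as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):        # feed-forward taps (current and past inputs)
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):     # feedback taps (previous outputs)
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# One-pole smoother y(n) = 0.5*x(n) + 0.5*y(n-1) driven by a unit impulse.
print(iir_filter([1.0, 0.0, 0.0, 0.0], b=[0.5], a=[1.0, -0.5]))
```

The impulse response of this example halves at every step and never reaches exactly zero, which is the sense in which the response is "infinite".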
12Typical DSP Algorithms: Discrete Fourier Transform
- The Discrete Fourier Transform (DFT) allows for spectral analysis in the frequency domain.
- It is computed as
  $$y(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N}$$
  for k = 0, 1, ..., N-1, where
  - x is the input sequence in the time domain
  - y is the output sequence in the frequency domain
- The Inverse Discrete Fourier Transform is computed as
  $$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} y(k)\, e^{j 2\pi nk/N}$$
- The Fast Fourier Transform (FFT) provides an efficient method for computing the DFT.
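A minimal sketch of the DFT and its inverse computed directly from their definitions, assuming the common convention in which the forward transform uses a negative exponent and the 1/N factor sits on the inverse. The function names are illustrative.

```python
import cmath


def dft(x):
    """Naive DFT: y(k) = sum_n x(n) * exp(-j*2*pi*n*k/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(y):
    """Naive inverse DFT: x(n) = (1/N) * sum_k y(k) * exp(+j*2*pi*n*k/N)."""
    n_pts = len(y)
    return [
        sum(y[k] * cmath.exp(2j * cmath.pi * n * k / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


spectrum = dft([1.0, 2.0, 3.0, 4.0])
recovered = idft(spectrum)
print([round(abs(v), 6) for v in spectrum])
print([round(v.real, 6) for v in recovered])
```

The direct form needs on the order of N^2 complex multiply-accumulates, while an FFT computes the same result in on the order of N log N operations.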
13Typical DSP Algorithms: Discrete Cosine Transform
- The Discrete Cosine Transform (DCT) is frequently used in video compression (e.g., MPEG-2).
- The DCT and Inverse DCT (IDCT) are computed as cosine-weighted sums scaled by e(k), where e(k) = 1/sqrt(2) if k = 0; otherwise e(k) = 1.
- An N-point 1-D DCT requires N^2 MAC operations.
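A minimal sketch of a 1-D DCT and IDCT that uses the scale factor e(k) described on this slide. The overall sqrt(2/N) normalization (the orthonormal DCT-II form) is an assumption; other texts place the constant differently. The function names are illustrative.

```python
import math


def e(k):
    """Scale factor: 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def dct(x):
    """N-point DCT: one cosine-weighted dot product per output, N*N MACs in total."""
    n_pts = len(x)
    scale = math.sqrt(2.0 / n_pts)  # assumed orthonormal scaling
    return [
        scale * e(k) * sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
            for n in range(n_pts)
        )
        for k in range(n_pts)
    ]


def idct(y):
    """Inverse of dct() under the same assumed scaling."""
    n_pts = len(y)
    scale = math.sqrt(2.0 / n_pts)
    return [
        scale * sum(
            e(k) * y[k] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
            for k in range(n_pts)
        )
        for n in range(n_pts)
    ]


coeffs = dct([1.0, 1.0, 1.0, 1.0])
print([round(c, 6) for c in coeffs])
print([round(v, 6) for v in idct(coeffs)])
```

A constant input puts all of its energy into the k = 0 coefficient, which is the property that makes the DCT useful for compressing smooth image blocks.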
14Typical DSP Algorithms: Distance Calculations
- Distance calculations are typically used in pattern recognition, motion estimation, and coding.
- Problem: Choose the vector r_k whose distance from the input vector x is minimum.
- The distance is typically defined as
  - The mean absolute difference (MAD or L1 norm)
  - The mean square error (MSE or L2 norm)
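A minimal sketch of the nearest-vector search described on this slide, assuming MAD and MSE are averaged over the vector length (the 1/N factor is an assumption; it does not change which vector wins). The names `mad`, `mse`, `nearest`, and the sample codebook are illustrative.

```python
def mad(x, r):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, r)) / len(x)


def mse(x, r):
    """Mean square error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)


def nearest(x, codebook, distance):
    """Return the index k of the codebook vector r_k closest to x."""
    return min(range(len(codebook)), key=lambda k: distance(x, codebook[k]))


codebook = [[0.0, 0.0, 0.0], [1.0, 2.0, 4.0], [5.0, 5.0, 5.0]]
x = [1.0, 2.0, 3.0]
print(nearest(x, codebook, mad), nearest(x, codebook, mse))
```

MAD avoids multiplications entirely, which is one reason it is popular in hardware for motion estimation.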
15Typical DSP Algorithms: Matrix Computations
- Matrix computations are typically used to estimate parameters in DSP systems.
- The Gauss-Jordan method for matrix triangularization uses elimination equations in which A is the matrix and a_nj is the pivot element.
- The Givens method rotates a matrix by an angle θ.
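A minimal sketch of one Givens rotation, assuming the usual formulation in which two rows are rotated by an angle whose cosine and sine are chosen to zero a selected element. The function name `givens_zero` and its arguments are illustrative and are not taken from the slides.

```python
import math


def givens_zero(a, i, j, col):
    """Rotate rows i and j of matrix a (a list of lists) in place so that
    a[j][col] becomes 0. Returns the same matrix object for convenience."""
    x, y = a[i][col], a[j][col]
    r = math.hypot(x, y)
    if r == 0.0:
        return a  # both entries are already zero, nothing to rotate
    c, s = x / r, y / r  # cos(theta) and sin(theta) of the rotation angle
    for m in range(len(a[i])):
        u, v = a[i][m], a[j][m]
        a[i][m] = c * u + s * v
        a[j][m] = -s * u + c * v
    return a


matrix = [[3.0, 1.0], [4.0, 2.0]]
givens_zero(matrix, 0, 1, 0)
print([[round(v, 6) for v in row] for row in matrix])
```

Computing c and s requires a square root and a division (or a function approximation), while applying the rotation is a set of multiply-accumulate operations.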
16Summary of DSP Applications
- DSP applications typically require
  - Dot product computations (filters, transforms, matrices)
  - Distance calculations (pattern recognition, coding)
  - Division or reciprocals (matrix computations, normalization)
  - Function approximations (Givens rotations, DFTs, DCTs)
17Computation Rates
- To estimate the hardware resources required, we can use the equation
  $$R_c = R_s \cdot n_{op}$$
  where
  - R_c is the computation rate
  - R_s is the sampling rate
  - n_op is the average number of operations per sample
- For example, a 1-D FIR has n_op = 2N and a 2-D FIR has n_op = 2N^2.
- What does the above equation assume?
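A small worked example of the computation-rate estimate, taking the rate as sampling rate times operations per sample and using the FIR operation counts quoted on this slide. The sampling rates and filter sizes below are hypothetical values chosen for illustration.

```python
def fir_ops_per_sample(n_taps, dims=1):
    """Operations per sample for an FIR filter: 2N in 1-D, 2N^2 in 2-D."""
    return 2 * n_taps ** dims


def computation_rate(sampling_rate_hz, ops_per_sample):
    """R_c = R_s * n_op, in operations per second."""
    return sampling_rate_hz * ops_per_sample


# Hypothetical audio case: 48 kHz sampling, 64-tap 1-D FIR.
audio = computation_rate(48_000, fir_ops_per_sample(64))
# Hypothetical video case: 10 M samples/s, 8x8 2-D FIR.
video = computation_rate(10_000_000, fir_ops_per_sample(8, dims=2))
print(audio / 1e6, "MOPs")
print(video / 1e9, "GOPs")
```

The factor of 2 counts one multiply and one add per tap, so a machine with a single-cycle MAC unit needs roughly half as many cycles as operations.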
18Computational Rates for FIR Filtering
19Computational Requirements
Figure: computational requirements of DSP applications, on an axis with ticks from 100 MOPs to 100 GOPs. Labeled points, with the value printed on the slide in parentheses:
- P×64 CIF, 15 f/s, 100 kb/s (1.2)
- DFSE EQ, 2 Mb/s (650)
- Full-rate DAB Viterbi Decoder, MPEG II MP@ML, 30 fps Decode (600)
- 16 × GSM_EFR (380)
- ADSL XCVR, 6.1 Mb/s (360)
- ADSL XCVR, 1.5 Mb/s (100)
- GSM Terminal (Baseband, HR) (52)
- GSM_HR, AC-3 decode, V.34 (20)
- GSM_EFR (16)
- GSM_FR (2.5)
20Implementation Hierarchy
- Processing Method: general description of task
- Algorithm: actual computation steps and functional relationships
- Architecture: implementation of connected modules, each with a subtask
- Circuitry: basic module implementation, as logic gates or transistors
21Processor Classification
- Processor
  - DSP
    - Fixed Point: 16 bit, 20 bit, 24 bit
    - Floating Point: 32 bit IEEE, Other
  - General Purpose
    - Integer: 32 bit subsets, 64 bit subsets
    - Floating Point: 32/64 bit IEEE, Other (80 bit)
22Simplified DSP Chip
Block diagram components:
- Instruction Memory
- DSP Core
- Data Memory
- Serial Ports
- A/D Converter
- D/A Converter
23DSP Basic Features
- Fast Multiply-Accumulate (MAC)
  - DSP filters and transforms are multiply intensive
- Multiple Access Memory
  - 1 instruction, 2 data per cycle
- Specialized Addressing
  - FIFO, arrays, permutations
- Specialized Program Control
  - Efficient loops
  - Fast interrupt handling
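To illustrate why MAC speed and specialized addressing appear together in this list, here is a minimal software sketch of a streaming FIR filter whose delay line is a circular (FIFO) buffer addressed modulo its length. On a DSP the modulo index update and the loop control would typically be handled by hardware; the class name `CircularFir` and the coefficients are hypothetical.

```python
class CircularFir:
    """Streaming FIR filter with a circular delay line."""

    def __init__(self, h):
        self.h = list(h)
        self.delay = [0.0] * len(self.h)  # past inputs, initially zero
        self.head = 0                     # index of the newest sample

    def step(self, sample):
        """Accept one input sample and return one output sample."""
        n_taps = len(self.h)
        self.head = (self.head - 1) % n_taps  # modulo address update
        self.delay[self.head] = sample        # overwrite the oldest sample
        acc = 0.0
        for k in range(n_taps):               # inner loop: one MAC per tap
            acc += self.h[k] * self.delay[(self.head + k) % n_taps]
        return acc


fir = CircularFir([0.5, 0.25, 0.25])
print([fir.step(s) for s in [4.0, 8.0, 12.0, 16.0]])
```

Each MAC in the inner loop reads one coefficient and one data sample, which matches the "1 instruction, 2 data per cycle" memory requirement noted in the list.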
24Code Characteristics: General Purpose vs. DSP
- General Purpose
  - Limited Parallelism
  - Control Dominated
  - Inherently Serial
  - Branch Intensive (20%)
- DSP
  - Parallel Inner Loops
  - Loop Setup, then Compute
  - Overlapped Parallel Processing
  - Multiple Independent Streams
Amdahl's Law
25Workload Comparisons
Figure labels: Amdahl's Law; Video; DSP; General Purpose.
26DSP vs. General Purpose
- DSP
  - Execution Predictability
    - Required to guarantee real-time constraints
    - Zero-overhead Loop Buffer
  - Complex Instructions
    - Multiple Operations Issued
  - Harvard Memory Architecture
  - Specialized Addressing Modes
    - Operate on Stream Data
- General Purpose
  - Fast But Non-predictable
    - Dynamic Instruction Issue
    - Non-deterministic Caches
    - Branch Prediction
  - RISC Superscalar Instructions
    - Multiple Instructions Issued
  - Von Neumann Architecture
    - Split Cache has similar benefit
  - Typically Linear Addressing
    - Caches Assume Locality
27Compilable Architectures
Figure labels: Implementations; Optimize; Cost / Power; Performance.
Acronyms listed in the figure: GSM, xDSL, DBS, LMDS, HFC-QAM, QPSK, DMT, CAP, DFE, MLSE, DSS, LMDS, HFC, OSDM, DFSE, CELP, DAB, DVD, DVB, MPEG2, MPEG4, AC-3, MUSICAM, COFDM, FFT, FIR.
28System Integration Trends
Block diagrams of a GSM Base Transceiver Station (8 FR/EFR, 16 HR channels) in three generations. Blocks shown in each:
- Current Products: six DSP blocks, six SRAM blocks, A/D, D/A
- Current Designs: three DSP1620 blocks, Pre-equalizer, Equalizer, Channel Coding, A/D, D/A
- Next Generation Designs: one "DSP Future" block, Pre-equalizer, Equalizer, Channel Coding, A/D, D/A