Title: Khurram Kazi
1Khurram Kazi
ECE 645 Computer Arithmetic: Implementations in Hardware and Software
Email: kkazi_at_gmu.edu
Office: Adjunct faculty lounge, STII, 2nd floor
Office hours: Mondays 3:30-4:30 pm
Course web page: http://teal.gmu.edu/kkazi/spring2003/ece645.html
2Grading Policy
- Homeworks: 30% (will contain mini projects along with questions from the text)
- 2 Midterm Tests: 30%
- Project: 40%
3Course in a nutshell: Back to basics
Efficient
- addition and subtraction
- multiplication
- division and modular reduction
- exponentiation
of
- Integers: unsigned and signed
- Real numbers: fixed point; single and double precision floating point
- Elements of the Galois field GF(2^n): polynomial base, normal base
4Lecture Topics
- INTRODUCTION
- 1. Applications of computer arithmetic algorithms
- 2. Number representation
- Unsigned Numbers
- Signed Numbers
5Lecture Topics (Continued)
- ADDITION AND SUBTRACTION
- 3. Basic addition, subtraction, and counting
- 4. Carry-lookahead adders
- 5. Adders based on Parallel Prefix Networks
- 6. Carry-skip adders
- 7. Carry-select adders
- 8. Hybrid adders
6Lecture Topics (Continued)
- MULTIOPERAND ADDITION (time permitting)
- 9. Carry-save adders
- 10. Wallace and Dadda Trees
- 11. Adding multiple signed numbers
7Lecture Topics (Continued)
- MULTIPLICATION
- In hardware
- 12. Basic hardware multipliers
- 13. High-radix multipliers
- 14. Tree multipliers
- 15. Array multipliers
- 16. Multiplication of signed numbers and squaring
- In software
- 17. Survey of software multiplication algorithms
8Lecture Topics (Continued)
- DIVISION
- In hardware
- 18. Basic hardware dividers
- 19. High-radix dividers
- 20. Array dividers
- In software
- 21. Survey of algorithms for division, modular reduction, and modular exponentiation
9Lecture Topics (Continued)
- FLOATING POINT ARITHMETIC
- 22. Floating-point number representations
- 23. Floating-point operations
- GALOIS FIELD ARITHMETIC
- 24. Representations of elements of the Galois Field
- 25. Galois Field operations
10Similar Courses at different Universities
- University of California, Santa Barbara, Behrooz Parhami, ECE252B Computer Arithmetic.
- Lehigh University, Michael Schulte, ECE496 High-Speed Computer Arithmetic.
- Oregon State University, Cetin Koc, ECE577 Computer Arithmetic.
- Stanford University, Michael Flynn, EE486 Advanced Computer Arithmetic.
- University of California, Davis, Vojin Oklobdzija, ECE278 Computer Arithmetic for Digital Implementation.
- University of Massachusetts, Amherst, Israel Koren, ECE 666 Digital Computer Arithmetic.
- Tel-Aviv University, Guy Even, Computer Arithmetic.
11Homeworks
- reading assignments (main textbooks, articles)
- analysis of hardware and software algorithms and implementations
- design of small hardware units
- Optional assignments
- Studying trade-offs between
- software vs. hardware
- theory vs. practice
- analysis vs. design
12Homeworks 1st Mini projects
- Write synthesizable VHDL code to design a fast adder (start with an 8-bit adder).
- 1) Use structural VHDL to implement 2 different adders (one optimized for area and the other for speed). Synthesize the structural netlist.
- 2) Write RTL code for an adder and use constraints put on the synthesis tools to optimize the adder implementation so that it results in
- (i) minimum area
- (ii) minimum delay
- Compare the two methods.
- DUE MARCH 3
13Homeworks 2nd Mini projects
- Write synthesizable VHDL code to design a 64-bit fast adder. Optimize it for
- (i) Area
- (ii) Speed
- Repeat the design process for 128-bit and 256-bit wide adders. Analyze the effects of the larger bit width on the implementation of the adders in terms of area and speed. How are the constraints used in achieving the desired results?
- DUE MARCH 24
14Design Project
- Find mathematical equations that are used in some specific application. Analyze them and find the best way of solving them in hardware. Possible topics can be from networking (scheduling, traffic management, etc.), DSP (digital filters), cryptography, or video processing.
- Be creative in selecting a topic while ensuring that it can be implemented in the prescribed time. The initial scope (abstract) of the final project is due on April 7.
- A maximum of two people can collaborate on the same topic, assuming the complexity of the project warrants it. RTL, testbench, synthesis results, and a comprehensive report are due by the last day of class. An oral presentation of the project is required from each person too. Most likely, oral presentations will be scheduled for the last day of classes. The report should give a perspective on how your work fits into a larger system and its applications.
15Prerequisites
- It is assumed that you are well versed in VHDL or Verilog and know C.
- Projects will require synthesizable HDL code along with C for a reference model and testbench (VHDL or Verilog can be used for the testbench instead of C).
16Design Environment
HDL Design (VHDL or Verilog)
Testbench (Analyzer, in C or HDL)
Testbench (Generator, in C or HDL)
Reference Model (in C)
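As a rough illustration of how a C reference model and a testbench analyzer might fit together for the adder mini projects, the sketch below models an n-bit adder one full-adder cell at a time and checks the 8-bit case exhaustively. The names `adder_result`, `adder_ref`, and `adder_ref_mismatches_8bit` are hypothetical, and the comparison here is against C's own `+` operator; in a real flow the analyzer would compare against values captured from the HDL simulation instead.

```c
#include <assert.h>
#include <stdint.h>

/* Result of an n-bit addition: sum truncated to n bits, plus carry out. */
typedef struct {
    uint32_t sum;
    unsigned carry_out;
} adder_result;

/* Bit-level reference model of an n-bit ripple-carry adder (1 <= width <= 32).
 * Each loop iteration models one full-adder cell. */
adder_result adder_ref(uint32_t a, uint32_t b, unsigned carry_in, unsigned width) {
    assert(width >= 1 && width <= 32);
    adder_result r = {0, carry_in & 1u};
    for (unsigned i = 0; i < width; i++) {
        unsigned ai = (a >> i) & 1u;
        unsigned bi = (b >> i) & 1u;
        unsigned s = ai ^ bi ^ r.carry_out;                 /* sum bit */
        r.carry_out = (ai & bi) | (ai & r.carry_out) | (bi & r.carry_out);
        r.sum |= (uint32_t)s << i;
    }
    return r;
}

/* Analyzer sketch: exhaustively compare the 8-bit model against integer
 * addition and return the number of mismatches (0 is expected). */
unsigned adder_ref_mismatches_8bit(void) {
    unsigned mismatches = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            for (unsigned cin = 0; cin < 2; cin++) {
                adder_result r = adder_ref(a, b, cin, 8);
                unsigned expected = a + b + cin;
                if (r.sum != (expected & 0xFFu) || r.carry_out != (expected >> 8))
                    mismatches++;
            }
        }
    }
    return mismatches;
}
```

Exhaustive checking is only practical for the 8-bit starting point; for the 64-bit and wider adders a generator would have to fall back on random and corner-case vectors.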
17VLSI Design Tools to be used
- MTI VHDL or Verilog Simulator
- Synopsys Design Compiler (aka DC)
- LSI Logic's ASIC Library
- C
- (A demo session on how to use Synopsys DC will take place on February 24. Bring the RTL code that you will be synthesizing to the session.)
18Degrees of freedom and possible trade-offs
[Figure: degrees of freedom speed, area, power, and testability, annotated with the course labels ECE 645, ECE 682, and ECE 586, 681]
19Degrees of freedom and possible trade-offs (Controlled by Synthesis constraints)
[Figure: trade-offs among speed, latency, throughput, and area]
20Timing parameters
parameter       | definition                         | units   | pipelining
delay           | time point → point                 | ns      |
latency         | time input → output                | ns      | bad
throughput      | output bits / time unit            | Mbits/s | good
clock period    | rising edge → rising edge of clock | ns      | good
clock frequency | 1 / clock period                   | MHz     | good
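To make the relationships between these timing parameters concrete, here is a minimal sketch in C. The function names are hypothetical, and it assumes the simplest possible model: a fixed number of output bits per clock cycle and a latency equal to the number of pipeline stages times the clock period.

```c
#include <assert.h>

/* Clock frequency in MHz from the clock period in ns: f = 1 / T. */
double clock_frequency_mhz(double clock_period_ns) {
    assert(clock_period_ns > 0.0);
    return 1000.0 / clock_period_ns;
}

/* Latency in ns of a circuit with `stages` pipeline stages
 * (assumed model: one clock period per stage). */
double latency_ns(double clock_period_ns, unsigned stages) {
    return clock_period_ns * (double)stages;
}

/* Throughput in Mbits/s when `bits_per_cycle` output bits appear every clock. */
double throughput_mbits_per_s(double clock_period_ns, unsigned bits_per_cycle) {
    return (double)bits_per_cycle * clock_frequency_mhz(clock_period_ns);
}
```

With made-up numbers, an unpipelined 32-bit unit clocked at 20 ns gives 50 MHz, 1600 Mbits/s, and 20 ns latency. Splitting it into 4 stages clocked at 6 ns (the extra time being register overhead) raises throughput to about 5333 Mbits/s but stretches latency to 24 ns, which is one way to read the "good" and "bad" entries in the pipelining column.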
21Overview of Some of the steps in an ASIC flow
22RTL Block Synthesis
Estimated Area
Estimated Timing
Simplified design flow
23Overview of Synthesizable VHDL
- Library and Library Declarations
- Entity Declaration
- Architecture
- Configuration
24Overview of Synthesizable VHDL
- Package contains commonly used declarations
- Constants may be defined here
- Enumerated data types (Add, up_count, Sub)
- Combinatorial functions (performing a decode function; return a single value)
- Procedures (can return multiple values)
- Component declarations
25Overview of Synthesizable VHDL
- Entity
- Defines the component name, its inputs and outputs (I/Os), and related declarations.
- Can use the same Entity for different architectures to study various design trade-offs.
- Use std_logic and std_logic_vector(n downto 0); they are synthesis friendly.
- Avoid enumerated types for I/Os.
- Avoid using port type buffer or bidirectional (inout) unless you have to.
26Overview of Synthesizable VHDL
- Architecture
- Defines the functionality of the design
- Normally consists of processes and concurrent signal assignments
- Synchronous and/or combinatorial logic can be inferred from the way functionality is defined in the processes.
- Avoid deeply nested loops
- Avoid generate statements with large indices
- Always think hardware when developing code!
27Some useful design practices
- Organize Your Design Workspace
- Define a naming convention (especially if multiple designers are on the project)
- Completely specify sensitivity lists
- Try to separate combinatorial logic from sequential logic
28Separation of Combinatorial and Sequential Logic
29Synthesis of if then elsif statement
30Case statement Synthesis
31What is synthesized from this Code?
process (A, B)
begin
  if (A = '1') then
    Q <= B;
  end if;        -- Missing else: add one, otherwise a latch is inferred
end process;

-- there are 2 outputs, Q and Z
process (C)
begin
  case C is
    when '0' =>
      Q <= '1';
      Z <= '0';
    when others =>
      Q <= '0';  -- Missing Z output: assign it, otherwise a latch is inferred
  end case;
end process;
32for loop synthesis
process (a, b)
begin
  for i in 0 to 5 loop
    example(i) <= a(i) and b(5-i);
  end loop;
end process;

for loops are unrolled and then synthesized:
example(0) <= a(0) and b(5);
example(1) <= a(1) and b(4);
example(2) <= a(2) and b(3);
example(3) <= a(3) and b(2);
example(4) <= a(4) and b(1);
example(5) <= a(5) and b(0);
33Learn to deal with approximations
- In digital arithmetic one has to come to grips with approximation and questions like
- When is approximation good enough?
- What margin of error is acceptable?
- Be aware of the applications you are designing the arithmetic circuit or program for.
- Analyze the implications of your approximation.
34Consequences of approximations
Example: Failure of Patriot Missile (1991 Feb. 25)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile
The Scud struck an American Army barracks, killing 28
Cause, per GAO/IMTEC-92-26 report: software problem (inaccurate calculation of the time since boot)
Specifics of the problem: time in tenths of a second, as measured by the system's internal clock, was multiplied by 1/10 to get the time in seconds
Internal registers were 24 bits wide
1/10 = 0.0001 1001 1001 1001 1001 100 in binary (chopped to 24 b)
Error ≈ 0.1100 1100 × 2^-23 ≈ 9.5 × 10^-8
Error in 100-hr operation period ≈ 9.5 × 10^-8 × 100 × 60 × 60 × 10 = 0.34 s
Distance traveled by Scud = (0.34 s) × (1676 m/s) ≈ 570 m
This put the Scud outside the Patriot's range gate
Ironically, the fact that the bad time calculation had been improved in some (but not all) code parts contributed to the problem, since it meant that inaccuracies did not cancel out
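The error estimate on this slide can be reproduced with a few lines of C. This is only a sketch of the arithmetic, not of the Patriot software: the function names are hypothetical, and it assumes that "chopped" means truncating 1/10 to a given number of fractional bits (23 matches the binary expansion shown on the slide).

```c
#include <assert.h>
#include <stdint.h>

/* 1/10 chopped (truncated) to `frac_bits` fractional bits, returned as a double. */
double chopped_tenth(unsigned frac_bits) {
    assert(frac_bits < 63);
    uint64_t scale = (uint64_t)1 << frac_bits;
    uint64_t mantissa = scale / 10;          /* integer division truncates */
    return (double)mantissa / (double)scale;
}

/* Accumulated clock error in seconds after `hours` of uptime, when each
 * tenth-of-a-second tick is converted with the chopped constant. */
double clock_drift_seconds(double hours, unsigned frac_bits) {
    double ticks = hours * 3600.0 * 10.0;    /* ticks of 0.1 s */
    double error_per_tick = 0.1 - chopped_tenth(frac_bits);
    return ticks * error_per_tick;
}

/* Distance in meters a target moving at `speed_m_per_s` covers during the drift. */
double tracking_error_meters(double drift_s, double speed_m_per_s) {
    return drift_s * speed_m_per_s;
}
```

Calling `clock_drift_seconds(100.0, 23)` gives roughly 0.34 s, and `tracking_error_meters` with 1676 m/s turns that into a position error of several hundred meters, in line with the slide's figures.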
35Consequences of approximations
Example: Explosion of Ariane Rocket (1996 June 4)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
Unmanned Ariane 5 rocket launched by the European Space Agency veered off its flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m)
The $500 million rocket (with cargo) was on its 1st voyage after a decade of development costing $7 billion
Cause: software error in the inertial reference system
Specifics of the problem: a 64-bit floating point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer
An SRI software exception arose during conversion because the 64-bit floating point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767)
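The conversion described above can be guarded explicitly. The following C sketch shows two common options, a checked conversion that reports failure and a saturating one that clamps; both function names are hypothetical, and nothing here reflects the actual Ariane flight software (which was not written in C).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Checked conversion: stores the truncated value and returns true only if x
 * lies within [INT16_MIN, INT16_MAX]. The test is conservative: it also
 * rejects NaN and values such as 32767.5. */
bool double_to_int16_checked(double x, int16_t *out) {
    assert(out != NULL);
    if (!(x >= (double)INT16_MIN && x <= (double)INT16_MAX))
        return false;
    *out = (int16_t)x;
    return true;
}

/* Saturating conversion: clamps out-of-range values to the nearest
 * representable 16-bit value; NaN maps to 0 (an arbitrary choice). */
int16_t double_to_int16_saturating(double x) {
    if (x != x) return 0;                       /* NaN */
    if (x >= (double)INT16_MAX) return INT16_MAX;
    if (x <= (double)INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}
```

Which behavior is appropriate depends on the application: clamping hides the overflow, while the checked form forces the caller to decide what to do about it.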
36Calculators
u = sqrt(sqrt(...sqrt(2)...)) (10 times) = 1.000 677 131
v = 2^(1/1024) = 1.000 677 131
x = (((u^2)^2)...)^2 (10 times) = 1.999 999 963
x' = u^1024 = 1.999 999 973
y = (((v^2)^2)...)^2 (10 times) = 1.999 999 983
y' = v^1024 = 1.999 999 994
Hidden digits in the internal representation of numbers
Different algorithms give slightly different results
Very good accuracy
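A similar experiment can be tried in C with double precision instead of a calculator. The sketch below is an assumption-laden stand-in for the slide's experiment: it computes the 1024th root of 2 by taking a square root 10 times (using a hand-written Newton iteration, so no math library is needed) and then raises the result back to the 1024th power in two different ways. All function names are hypothetical.

```c
#include <assert.h>

/* Square root by Newton's iteration for a > 0 (fixed iteration count). */
double newton_sqrt(double a) {
    assert(a > 0.0);
    double x = a > 1.0 ? a : 1.0;            /* starting guess >= sqrt(a) */
    for (int i = 0; i < 60; i++)
        x = 0.5 * (x + a / x);
    return x;
}

/* u = 2^(1/1024), obtained by taking the square root of 2 ten times. */
double root_1024_of_two(void) {
    double u = 2.0;
    for (int i = 0; i < 10; i++)
        u = newton_sqrt(u);
    return u;
}

/* u^1024 by squaring 10 times. */
double pow1024_by_squaring(double u) {
    for (int i = 0; i < 10; i++)
        u *= u;
    return u;
}

/* u^1024 by 1023 successive multiplications. */
double pow1024_by_multiplying(double u) {
    double p = u;
    for (int i = 1; i < 1024; i++)
        p *= u;
    return p;
}
```

Both power routines should return a value very close to 2, but not necessarily the same value in the last few bits, because the rounding errors accumulate differently in each algorithm.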
37Primary applications (1)
Execution units of general purpose microprocessors
Integer units
Floating point units
Integers (8, 16, 32, 64 bits)
Real numbers (32, 64 bits)
38Primary applications (2)
Digital signal and digital image processing
e.g., digital filters, Discrete Fourier Transform, Discrete Hilbert Transform, spectrum analysis
General purpose DSP processors
Specialized circuits
Real numbers
39Primary applications (3)
Coding
Error detection codes
Error correcting codes
Elements of the Galois field GF(2^n) (4-64 bits)
40Primary applications (4)
Cryptography
Secret key cryptography
IDEA, RC6, Mars, Twofish, Rijndael
Elements of the Galois field GF(2^n) (4, 8 bits)
Integers (16, 32 bits)
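As an example of arithmetic on elements of GF(2^n) in polynomial base, the sketch below multiplies two elements of GF(2^8) using shift-and-XOR, reducing modulo x^8 + x^4 + x^3 + x + 1, the field polynomial used by Rijndael. The function names are hypothetical; the self-check value is the worked example from the AES specification (FIPS-197).

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8) in polynomial base, reducing modulo
 * x^8 + x^4 + x^3 + x + 1 (0x11B). Addition in the field is XOR. */
uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t product = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1u)
            product ^= a;                    /* add a * x^i if bit i of b is set */
        uint8_t overflow = a & 0x80u;        /* will x^8 appear after the shift? */
        a = (uint8_t)(a << 1);               /* multiply a by x */
        if (overflow)
            a ^= 0x1Bu;                      /* reduce: x^8 = x^4 + x^3 + x + 1 */
        b >>= 1;
    }
    return product;
}

/* Sanity check against the FIPS-197 example {57} * {83} = {c1}. */
void gf256_self_check(void) {
    assert(gf256_mul(0x57, 0x83) == 0xC1);
}
```

There are no carries between bit positions, which is why GF(2^n) multipliers tend to be attractive in hardware compared with integer multipliers of the same width.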
41Primary applications (5)
- Traffic management in IP networks using Random Exponential Marking (REM)
- Marking probability = 1 - φ^(-price)
- Where
- price = price_prev + γ(Q_length - (1-α)·Q_length_prev - α·target_Q_length)
- α = 0.1
- γ = 0.001
- φ = 1.001
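The price recurrence can be sketched in C as follows. This covers only the price update, not the marking probability, and it rests on an assumption about the slide's formula: price = price_prev + γ(Q_length - (1-α)·Q_length_prev - α·target_Q_length), with γ = 0.001 and α = 0.1 as the step-size and weighting constants. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* REM constants (assumed reading of the slide: gamma = 0.001, alpha = 0.1). */
typedef struct {
    double gamma;            /* step size of the price update */
    double alpha;            /* weight of the queue-length target */
    double target_q_length;  /* desired queue length */
} rem_params;

/* One price update step:
 * price = price_prev + gamma * (q - (1 - alpha) * q_prev - alpha * target). */
double rem_update_price(double price_prev, double q_length,
                        double q_length_prev, const rem_params *p) {
    assert(p != NULL);
    return price_prev + p->gamma * (q_length
                                    - (1.0 - p->alpha) * q_length_prev
                                    - p->alpha * p->target_q_length);
}
```

For instance, with a target of 50, a previous queue length of 100, and a current length of 120, the bracketed term is 120 - 90 - 5 = 25, so the price rises by 0.025. When the queue sits exactly at its target, the term vanishes and the price stays put.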
42Topic 1
C = A · B mod 2^32, C = A^2 mod 2^32
Function: 32-bit unsigned multiplication and squaring modulo 2^32
Application: modern secret-key ciphers, candidates to the new Advanced Encryption Standard (AES): MARS, developed by IBM; RC6, developed at MIT
Environment: hardware, software for 8-bit processors
Optimization:
- maximum throughput
- minimum latency
- minimum area
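For the software side of this topic, here is a minimal C sketch of multiplication modulo 2^32 built only from 8 × 8 → 16-bit partial products, roughly the operation an 8-bit processor offers. The function names are hypothetical, and the code is written for clarity, not for the throughput or latency targets listed above.

```c
#include <assert.h>
#include <stdint.h>

/* C = A * B mod 2^32 from byte-sized partial products. Partial products whose
 * weight is 2^32 or more (i + j >= 4) cannot affect the result and are skipped. */
uint32_t mul_mod_2_32_bytewise(uint32_t a, uint32_t b) {
    uint32_t c = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint8_t ai = (uint8_t)(a >> (8 * i));
        for (unsigned j = 0; i + j < 4; j++) {
            uint8_t bj = (uint8_t)(b >> (8 * j));
            uint16_t partial = (uint16_t)((uint16_t)ai * bj);   /* 8x8 -> 16 bits */
            c += (uint32_t)partial << (8 * (i + j));            /* wraps mod 2^32 */
        }
    }
    return c;
}

/* Squaring modulo 2^32, expressed through the multiplier. */
uint32_t square_mod_2_32_bytewise(uint32_t a) {
    return mul_mod_2_32_bytewise(a, a);
}

/* Cross-check against C's native wrapping unsigned multiplication. */
void mul_mod_2_32_self_check(void) {
    uint32_t a = 0x12345678u, b = 0x9ABCDEF0u;
    assert(mul_mod_2_32_bytewise(a, b) == (uint32_t)(a * b));
}
```

Only 10 of the 16 possible byte products are needed, which is one reason reduction modulo a power of two is cheap. A dedicated squarer could go further by exploiting the symmetry of the partial products.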
43Topic 2
C = Σ (i = 1 to 256) A_i · B_i
Function: 64-bit signed multiplier-accumulator (MAC) accumulating at least 256 partial products
Application: digital filters
Environment: hardware, software for a general purpose DSP or an 8-bit processor
Optimization:
Hardware: maximum throughput, limited area
Software: minimum execution time, limited memory
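A reference model for a multiplier-accumulator can be very short in C. The sketch below makes an assumption the slide does not state: the operands are 16-bit signed values and the accumulator is 64 bits wide, so that 256 products (each at most 2^30 in magnitude) fit with plenty of headroom. The function name `mac` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products a[i] * b[i] for i = 0 .. n-1.
 * Assumed widths: 16-bit signed operands, 64-bit signed accumulator. */
int64_t mac(const int16_t *a, const int16_t *b, size_t n) {
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];   /* product fits in 32 bits */
    return acc;
}
```

In a hardware design the interesting question is how many guard bits the accumulator needs: accumulating 256 products requires 8 bits beyond the product width to rule out overflow.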
44Topic 3
C = A · B, C = A / B
Function: multiplication of two 64-bit signed numbers; division of a 128-bit number by a 64-bit number
Application: general purpose microprocessor
Environment: hardware, software for a 64-bit processor without multiplication and division built in
Optimization:
Hardware: minimum latency, maximum throughput, limited area
Software: minimum execution time, limited memory
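For the software environment of this topic (a 64-bit processor without a multiply instruction), the classic starting point is shift-and-add. The C sketch below handles only the unsigned 64 × 64 → 128-bit case, which is a simplification of the signed multiplication the topic asks for; the type `u128` and the function names are hypothetical. Division is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit unsigned value as two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128;

/* Unsigned 64 x 64 -> 128-bit multiplication using only shifts and adds. */
u128 mul64_shift_add(uint64_t a, uint64_t b) {
    u128 acc = {0, 0};
    uint64_t m_hi = 0, m_lo = a;             /* multiplicand, shifted left each step */
    for (int i = 0; i < 64; i++) {
        if (b & 1u) {
            uint64_t lo = acc.lo + m_lo;
            uint64_t carry = lo < acc.lo;    /* carry out of the low half */
            acc.lo = lo;
            acc.hi += m_hi + carry;
        }
        m_hi = (m_hi << 1) | (m_lo >> 63);   /* 128-bit left shift by one */
        m_lo <<= 1;
        b >>= 1;
    }
    return acc;
}

/* Sanity check: (2^64 - 1)^2 = 2^128 - 2^65 + 1. */
void mul64_self_check(void) {
    u128 r = mul64_shift_add(UINT64_MAX, UINT64_MAX);
    assert(r.hi == 0xFFFFFFFFFFFFFFFEULL && r.lo == 1);
}
```

This takes 64 iterations regardless of the operands. The high-radix and recoding techniques from the multiplication lectures are the usual ways to cut that count down.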
45Topic 4
C = A^E mod N
Function: modular exponentiation C = M^E mod N; M, N: arbitrary 768-bit numbers, E = 2^16 + 1
Application: modern public-key ciphers: RSA, Diffie-Hellman, Elliptic Curve Cryptosystems
Environment: hardware, software for 32-bit or 8-bit processors
Optimization:
Hardware: minimum latency, limited area
Software: minimum execution time, limited memory
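The core loop of modular exponentiation is the same at any operand size. The C sketch below shows right-to-left square-and-multiply on 32-bit toy operands; real 768-bit numbers would need a multi-precision library or hand-written multi-word arithmetic, which is exactly the part this topic is about. The function name `modexp` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* C = M^E mod N by right-to-left binary (square-and-multiply) exponentiation.
 * Toy operand size: 32 bits, so every intermediate product fits in 64 bits. */
uint32_t modexp(uint32_t m, uint32_t e, uint32_t n) {
    assert(n != 0);
    uint64_t result = 1 % n;                 /* handles n == 1 */
    uint64_t base = m % n;
    while (e != 0) {
        if (e & 1u)
            result = (result * base) % n;    /* multiply step for a set bit */
        base = (base * base) % n;            /* square step for every bit */
        e >>= 1;
    }
    return (uint32_t)result;
}
```

With E = 2^16 + 1 = 65537, only two exponent bits are set, so the loop performs 17 squarings and just 2 multiplications. That sparsity is why this exponent is popular for RSA public keys.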
46Topic 5
Z = X + Y, Z = X · Y
Function: floating point addition and multiplication according to ANSI/IEEE 754
Application: general purpose microprocessor or digital signal processor
Environment: hardware, software for a 32-bit processor without floating point operations
Optimization:
Hardware: minimum latency, maximum throughput, limited area
Software: minimum execution time, limited memory