Khurram Kazi - PowerPoint PPT Presentation


About This Presentation

Title:

Khurram Kazi

Slides: 47

Provided by: khurra

Transcript and Presenter's Notes

Title: Khurram Kazi

1
Khurram Kazi
ECE 645 Computer Arithmetic: Implementations in Hardware and Software
Email: kkazi@gmu.edu
Office: Adjunct faculty lounge, ST II, 2nd floor
Office hours: Mondays, 3:30-4:30 pm
Course web page: http://teal.gmu.edu/kkazi/spring2003/ece645.html
2
Grading Policy

Homeworks: 30% (will contain mini projects along with questions from the text)
2 midterm tests: 30%
Project: 40%

3
Course in a nutshell: Back to basics
Efficient
addition and subtraction
multiplication
division and modular reduction
exponentiation

Integers: unsigned and signed
Real numbers: fixed point; single- and double-precision floating point
Elements of the Galois field GF(2^n): polynomial basis, normal basis

4
Lecture Topics

INTRODUCTION
1. Applications of computer arithmetic
algorithms
2. Number representation
Unsigned Numbers
Signed Numbers

5
Lecture Topics (Continued)

ADDITION AND SUBTRACTION
3. Basic addition, subtraction, and counting
4. Carry-lookahead adders
5. Adders based on Parallel Prefix Networks
6. Carry-skip adders
7. Carry-select adders
8. Hybrid adders

6
Lecture Topics (Continued)

MULTIOPERAND ADDITION (time permitting)
9. Carry-save adders
10. Wallace and Dadda Trees
11. Adding multiple signed numbers

7
Lecture Topics (Continued)

MULTIPLICATION
In hardware
12. Basic hardware multipliers
13. High-radix multipliers
14. Tree multipliers
15. Array multipliers
16. Multiplication of signed numbers and
squaring
In software
17. Survey of software multiplication algorithms

8
Lecture Topics (Continued)

DIVISION
In hardware
18. Basic hardware dividers
19. High-radix dividers
20. Array dividers
In software
21. Survey of algorithms for division, modular reduction, and modular exponentiation

9
Lecture Topics (Continued)

FLOATING POINT ARITHMETIC
22. Floating-point number representations
23. Floating-point operations
GALOIS FIELD ARITHMETIC
24. Representations of elements of the Galois
Field
25. Galois Field operations

10
Similar Courses at different Universities

University of California, Santa Barbara: Behrooz Parhami, ECE252B Computer Arithmetic
Lehigh University: Michael Schulte, ECE496 High-Speed Computer Arithmetic
Oregon State University: Cetin Koc, ECE577 Computer Arithmetic
Stanford University: Michael Flynn, EE486 Advanced Computer Arithmetic
University of California, Davis: Vojin Oklobdzija, ECE278 Computer Arithmetic for Digital Implementation
University of Massachusetts, Amherst: Israel Koren, ECE 666 Digital Computer Arithmetic
Tel-Aviv University: Guy Even, Computer Arithmetic

11
Homeworks

reading assignments (main textbooks and articles)
analysis of hardware and software algorithms and implementations
design of small hardware units
Optional assignments
Studying trade-offs between
software vs. hardware
theory vs. practice
analysis vs. design

12
Homeworks 1st Mini projects

1) Write synthesizable VHDL code to design a fast adder (start with an 8-bit adder).
Use structural VHDL to implement 2 different adders (one optimized for area and the other for speed).
Synthesize the structural netlist.
2) Write RTL code for an adder and use constraints on the synthesis tool to optimize the adder implementation for
(i) minimum area
(ii) minimum delay
Compare the two methods.
DUE MARCH 3

13
Homeworks 2nd Mini projects

Write synthesizable VHDL code to design a 64-bit fast adder. Optimize it for
(i) area
(ii) speed
Repeat the design process for 128-bit and 256-bit wide adders. Analyze the effects of the larger bit width on the implementation of the adders in terms of area and speed. How are the constraints used in achieving the desired results?
DUE MARCH 24

14
Design Project

Find mathematical equations that are used in some specific application. Analyze them and find the best way of solving them in hardware. Possible topics can come from networking (scheduling, traffic management, etc.), DSP (digital filters), cryptography, or video processing. Be creative in selecting a topic while ensuring that it can be implemented in the prescribed time. The initial scope (abstract) of the final project is due on April 7.
A maximum of two people can collaborate on the same topic, assuming the complexity of the project warrants it. RTL, testbench, synthesis results, and a comprehensive report are due by the last day of class. An oral presentation of the project is also required from each person. Oral presentations will most likely be scheduled for the last day of classes. The report should give a perspective on how your work fits into a larger system and its applications.

15
Prerequisites

It is assumed that you are well versed in VHDL or
Verilog and know C.
Projects will require usage of Synthesizable HDL
code along with C for a reference model and
testbench (VHDL or Verilog can be used for
testbench instead of C)

16
Design Environment
HDL design (VHDL or Verilog)
Testbench generator (in C or HDL)
Testbench analyzer (in C or HDL)
Reference model (in C)
17
VLSI Design Tools to be used

MTI VHDL or Verilog simulator
Synopsys Design Compiler (aka DC)
LSI Logic's ASIC library
C
(A demo session on how to use Synopsys DC will take place on February 24. Bring the RTL code that you will be synthesizing to the session.)

18
Degrees of freedom and possible trade-offs
[Diagram: trade-offs among speed, area, power, and testability, labeled with the courses ECE 645, ECE 682, and ECE 586, 681]
19
Degrees of freedom and possible trade-offs
(Controlled by Synthesis constraints)
[Diagram: speed (latency, throughput) versus area]
20
Timing parameters
parameter | definition | units | pipelining
delay | time point → point | ns | -
latency | time input → output | ns | bad
throughput | output bits / time unit | Mbits/s | good
clock period | rising edge → rising edge of clock | ns | good
clock frequency | 1 / clock period | MHz | good
21
Overview of Some of the steps in an ASIC flow
22
RTL Block Synthesis
Estimated Area
Estimated Timing
Simplified design flow
23
Overview of Synthesizable VHDL

Library and Library Declarations
Entity Declaration
Architecture
Configuration

24
Overview of Synthesizable VHDL

Package: contains commonly used declarations
Constants may be defined here
Enumerated data types (e.g., Add, up_count, Sub)
Combinatorial functions (e.g., a decode function; a function returns a single value)
Procedures (can return multiple values)
Component declarations

25
Overview of Synthesizable VHDL

Entity
Defines the component name, its inputs and outputs (I/Os), and related declarations.
The same entity can be used with different architectures to study various design trade-offs.
Use std_logic and std_logic_vector(n downto 0); they are synthesis friendly.
Avoid enumerated types for I/Os.
Avoid port mode buffer and bidirectional (inout) ports unless you have to use them.

26
Overview of Synthesizable VHDL

Architecture
Defines the functionality of the design
Normally consists of processes and concurrent
signal assignments
Synchronous and/or combinatorial logic can be
inferred from the way functionality is defined in
the Processes.
Avoid deep nested loops
Avoid generate statements with large indices
Always think hardware when developing code!

27
Some useful design practices

Organize Your Design Workspace
Define a naming convention (especially if multiple designers are on the project)
Completely Specify Sensitivity Lists
Try to separate combinatorial logic from
sequential logic

28
Separation of Combinatorial and Sequential Logic
29
Synthesis of if then elsif statement
30
Case statement Synthesis
31
What is synthesized from this Code?
process (A, B)
begin
  if (A = '1') then
    Q <= B;
  end if;
end process;
Missing else branch: Q is not assigned when A is not '1', so a latch is inferred

-- there are 2 outputs, Q and Z
process (C)
begin
  case C is
    when '0' => Q <= '1'; Z <= '0';
    when others => Q <= '0';
  end case;
end process;
Missing Z assignment in the others branch: a latch is inferred for Z
32
for loop synthesis
example(0) <= a(0) and b(5);
example(1) <= a(1) and b(4);
example(2) <= a(2) and b(3);
example(3) <= a(3) and b(2);
example(4) <= a(4) and b(1);
example(5) <= a(5) and b(0);

process (a, b)
begin
  for i in 0 to 5 loop
    example(i) <= a(i) and b(5-i);
  end loop;
end process;
for loops are unrolled and then synthesized.
33
Learn to deal with approximations

In digital arithmetic one has to come to grips with approximation and questions like:
When is an approximation good enough?
What margin of error is acceptable?
Be aware of the applications you are designing the arithmetic circuit or program for.
Analyze the implications of your approximation.

34
Consequences of approximations
Example: Failure of Patriot Missile (1991 Feb. 25)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept incoming Iraqi Scud missile
The Scud struck an American Army barracks, killing 28
Cause, per GAO/IMTEC-92-26 report: software problem (inaccurate calculation of the time since boot)
Specifics of the problem: time in tenths of second as measured by the system's internal clock was multiplied by 1/10 to get the time in seconds
Internal registers were 24 bits wide
1/10 = 0.0001 1001 1001 1001 1001 100 (chopped to 24 b)
Error ≈ 0.1100 1100 × 2^-23 ≈ 9.5 × 10^-8
Error in 100-hr operation period ≈ 9.5 × 10^-8 × 100 × 60 × 60 × 10 = 0.34 s
Distance traveled by Scud = (0.34 s) × (1676 m/s) ≈ 570 m
This put the Scud outside the Patriot's range gate
Ironically, the fact that the bad time calculation had been improved in some (but not all) code parts contributed to the problem, since it meant that inaccuracies did not cancel out
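
The error figures on this slide can be reproduced with exact rational arithmetic. The following is a minimal sketch, assuming that "chopped" means truncation to 23 fractional bits (the 2^-23 factor in the error term suggests this); the function and constant names are illustrative and do not come from the Patriot software.

```python
from fractions import Fraction

FRACTION_BITS = 23          # assumed number of fractional bits kept after chopping
TICKS_PER_SECOND = 10       # the clock counts tenths of a second
SCUD_SPEED_M_PER_S = 1676   # speed quoted on the slide


def chopped_tenth(fraction_bits: int) -> Fraction:
    """Return 1/10 truncated (not rounded) to the given number of fractional bits."""
    scale = 2 ** fraction_bits
    return Fraction(scale // 10, scale)


def clock_drift_seconds(hours: int, fraction_bits: int = FRACTION_BITS) -> Fraction:
    """Accumulated timing error after the given number of hours of uptime."""
    error_per_tick = Fraction(1, 10) - chopped_tenth(fraction_bits)
    ticks = hours * 60 * 60 * TICKS_PER_SECOND
    return error_per_tick * ticks


error_per_tick = Fraction(1, 10) - chopped_tenth(FRACTION_BITS)
drift = clock_drift_seconds(100)
miss_distance = drift * SCUD_SPEED_M_PER_S

print(f"error per tick : {float(error_per_tick):.3e} s")
print(f"drift in 100 h : {float(drift):.3f} s")
print(f"miss distance  : {float(miss_distance):.0f} m")
```

Without intermediate rounding the distance comes out slightly above the slide's 570 m, because the slide rounds the drift to 0.34 s before multiplying by the speed.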
35
Consequences of approximations
Example: Explosion of Ariane Rocket (1996 June 4)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
Unmanned Ariane 5 rocket launched by the European Space Agency veered off its flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m)
The $500 million rocket (with cargo) was on its 1st voyage after a decade of development costing $7 billion
Cause: software error in the inertial reference system
Specifics of the problem: a 64-bit floating point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer
An SRI software exception arose during conversion because the 64-bit floating point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767)
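
The failure mode, a wide floating point value that does not fit the narrow integer target, can be illustrated in a few lines. This is only a sketch of the range check. It is not the flight software, and the function names and the sample velocity value are hypothetical.

```python
INT16_MIN = -(2 ** 15)      # -32768
INT16_MAX = 2 ** 15 - 1     #  32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    truncated = int(value)  # int() truncates toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert to a 16-bit signed integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


horizontal_velocity = 40000.0   # hypothetical value, chosen only to exceed 32767

try:
    to_int16_checked(horizontal_velocity)
    overflowed = False
except OverflowError:
    overflowed = True

saturated = to_int16_saturating(horizontal_velocity)
print(overflowed, saturated)
```

Whether to raise, saturate, or widen the target is a design decision that depends on the application, which is the point of the slide.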
36
Calculators
u = sqrt(sqrt(...sqrt(2)...)) (10 times) = 1.000 677 131
v = 2^(1/1024) = 1.000 677 131
x = (((u^2)^2)...)^2 (10 times) = 1.999 999 963
x' = u^1024 = 1.999 999 973
y = (((v^2)^2)...)^2 (10 times) = 1.999 999 983
y' = v^1024 = 1.999 999 994
Hidden digits in the internal representation of numbers
Different algorithms give slightly different results
Very good accuracy
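
A similar experiment can be run in software by limiting the working precision. The sketch below uses Python's decimal module with 10 significant digits as a stand-in for a calculator. That precision is an assumption, and since a real calculator carries hidden digits, the printed values need not match the slide.

```python
from decimal import Decimal, localcontext


def root_then_square(digits: int, steps: int = 10):
    """Take `steps` square roots of 2, then undo them two ways at `digits` precision."""
    with localcontext() as ctx:
        ctx.prec = digits
        u = Decimal(2)
        for _ in range(steps):
            u = u.sqrt()            # each root is rounded to `digits` digits
        x = u
        for _ in range(steps):
            x = x * x               # repeated squaring, rounded after every step
        x_pow = u ** (2 ** steps)   # one power operation instead of ten squarings
    return u, x, x_pow


u, x, x_pow = root_then_square(digits=10)
print("u       =", u)
print("squared =", x)
print("u^1024  =", x_pow)
```

Both results land close to 2 but generally not exactly on it, and the two ways of undoing the roots are free to disagree in the last digits.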
37
Primary applications (1)
Execution units of general purpose microprocessors
Integer units
Floating point units
Integers (8, 16, 32, 64 bits)
Real numbers (32, 64 bits)
38
Primary applications (2)
Digital signal and digital image processing
e.g., digital filters, Discrete Fourier Transform, Discrete Hilbert Transform, spectrum analysis
General purpose DSP processors
Specialized circuits
Real numbers
39
Primary applications (3)
Coding
Error detection codes, error correcting codes
Elements of the Galois field GF(2^n) (4-64 bits)
40
Primary applications (4)
Cryptography
Secret key cryptography
IDEA, RC6, MARS, Twofish, Rijndael
Elements of the Galois field GF(2^n) (4, 8 bits)
Integers (16, 32 bits)
41
Primary applications (5)

Traffic management in IP networks using Random
Exponential Marking (REM)
Marking Probability = 1 - Φ^(-price)
where
price = price_prev + γ × (Q_length - (1 - α) × Q_length_prev - α × target_Q_length)
α = 0.1
γ = 0.001
Φ = 1.001
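
A software reference model for this slide might look like the sketch below. It reads the update as price = price_prev + γ × (Q_length - (1 - α) × Q_length_prev - α × target_Q_length) and the marking probability as 1 - Φ^(-price), with the three constants taken from the slide. Clamping the price at zero follows the published REM algorithm and is an assumption here, since the slide does not show it. The function names are illustrative.

```python
ALPHA = 0.1     # weight on the queue-length mismatch (slide value)
GAMMA = 0.001   # step size of the price update (slide value)
PHI = 1.001     # base of the exponential marking function (slide value)


def update_price(price_prev, q_length, q_length_prev, target_q_length,
                 alpha=ALPHA, gamma=GAMMA):
    """One REM price update from the current and previous queue lengths."""
    price = price_prev + gamma * (
        q_length - (1 - alpha) * q_length_prev - alpha * target_q_length
    )
    return max(0.0, price)  # assumption: price is kept non-negative


def marking_probability(price, phi=PHI):
    """Probability of marking a packet at the given price."""
    return 1.0 - phi ** (-price)


price = update_price(price_prev=0.0, q_length=100, q_length_prev=50,
                     target_q_length=20)
probability = marking_probability(price)
print(price, probability)
```

In hardware the exponential would more likely be a lookup table or a fixed-point approximation, which is where the trade-offs of the course come in.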

42
Topic 1
C = A × B mod 2^32, C = A^2 mod 2^32
Function: 32-bit unsigned multiplication and squaring modulo 2^32
Application: modern secret-key ciphers, candidates for the new Advanced Encryption Standard (AES): MARS (developed by IBM), RC6 (developed at MIT)
Environment: hardware, software for 8-bit processors
Optimization:
maximum throughput
minimum latency
minimum area
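
For the 8-bit software environment, one possible reference model splits each operand into four bytes and accumulates only the partial products that land below bit 32. This is a sketch of the schoolbook method under that assumption. It is not taken from MARS or RC6, and the function name is illustrative.

```python
MASK32 = 0xFFFFFFFF


def mul_mod_2_32_bytes(a: int, b: int) -> int:
    """Multiply two 32-bit unsigned values modulo 2^32 using 8-bit limbs."""
    a_bytes = [(a >> (8 * i)) & 0xFF for i in range(4)]
    b_bytes = [(b >> (8 * j)) & 0xFF for j in range(4)]
    result = 0
    for i in range(4):
        # Products with i + j >= 4 are multiples of 2^32 and can be skipped.
        for j in range(4 - i):
            result += (a_bytes[i] * b_bytes[j]) << (8 * (i + j))
    return result & MASK32


a, b = 0xDEADBEEF, 0x12345678
print(hex(mul_mod_2_32_bytes(a, b)))
print(hex(mul_mod_2_32_bytes(a, a)))   # squaring is the special case B = A
```

Only 10 of the 16 byte-by-byte products are needed, which is the kind of saving the modulo 2^32 reduction offers on a small processor.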

43
Topic 2
C = Σ (i = 1 to 256) A_i × B_i
Function: 64-bit signed multiplier-accumulator (MAC) accumulating at least 256 partial products
Application: digital filters
Environment: hardware, software for a general purpose DSP or an 8-bit processor
Optimization:
Hardware: maximum throughput, limited area
Software: minimum execution time, limited memory
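
One sizing question for a MAC is how wide the accumulator must be. The sketch below assumes that "64-bit" refers to the width of the signed operands A_i and B_i, which the slide does not state explicitly. It computes the accumulator width needed for a given number of products and checks each running sum against that range. All names are illustrative.

```python
def accumulator_bits(operand_bits: int, num_products: int) -> int:
    """Signed accumulator width that can hold num_products full-width products."""
    product_bits = 2 * operand_bits
    guard_bits = (num_products - 1).bit_length()   # ceil(log2(num_products)) for n >= 1
    return product_bits + guard_bits


def mac(a_values, b_values, operand_bits: int = 64) -> int:
    """Multiply-accumulate with a range check on every partial sum."""
    acc_bits = accumulator_bits(operand_bits, len(a_values))
    low, high = -(1 << (acc_bits - 1)), (1 << (acc_bits - 1)) - 1
    acc = 0
    for a, b in zip(a_values, b_values):
        acc += a * b
        if not low <= acc <= high:
            raise OverflowError("accumulator range exceeded")
    return acc


most_negative = -(1 << 63)
worst_case = mac([most_negative] * 256, [most_negative] * 256)
print(accumulator_bits(64, 256), worst_case == 1 << 134)
```

Under this reading, 8 guard bits on top of the 128-bit product are enough for 256 terms.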
44
Topic 3
C = A × B, C = A / B
Function: multiplication of two 64-bit signed numbers; division of a 128-bit number by a 64-bit number
Application: general purpose microprocessor
Environment: hardware, software for a 64-bit processor without multiplication and division built in
Optimization:
Hardware: minimum latency, maximum throughput, limited area
Software: minimum execution time, limited memory
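
On a processor with no divide instruction, the 128-bit by 64-bit division can be modeled as one shift and one trial subtraction per quotient bit, in the style of restoring division. The sketch below handles unsigned operands only (signs would be treated separately) and assumes the quotient fits in 64 bits. The function name is illustrative.

```python
def shift_subtract_divide(dividend: int, divisor: int, n_bits: int = 64):
    """Divide a 2n-bit unsigned dividend by an n-bit unsigned divisor.

    Returns (quotient, remainder); the quotient must fit in n bits.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if (dividend >> n_bits) >= divisor:
        raise OverflowError("quotient does not fit in n bits")

    remainder = dividend >> n_bits              # upper half of the dividend
    low_half = dividend & ((1 << n_bits) - 1)   # bits shifted in one at a time
    quotient = 0
    for i in range(n_bits - 1, -1, -1):
        remainder = (remainder << 1) | ((low_half >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:                # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


dividend = (1 << 100) + 12345
divisor = (1 << 63) + 7
print(shift_subtract_divide(dividend, divisor))
```

Each loop iteration corresponds to one cycle of a basic sequential hardware divider, so the same model can double as a testbench reference.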
45
Topic 4
C = A^E mod N
Function: modular exponentiation C = M^E mod N; M, N arbitrary 768-bit numbers, E = 2^16 + 1
Application: modern public-key ciphers: RSA, Diffie-Hellman, Elliptic Curve Cryptosystems
Environment: hardware, software for 32-bit or 8-bit processors
Optimization:
Hardware: minimum latency, limited area
Software: minimum execution time, limited memory
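
A common way to compute M^E mod N is left-to-right square-and-multiply. It is shown here as one possible reference model and is not necessarily the method intended for the project. With E = 2^16 + 1 the exponent has only two 1 bits, so only two of the steps include a multiplication by M. The modulus and message below are hypothetical test values.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right binary (square-and-multiply) modular exponentiation."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus
    base %= modulus
    for bit in bin(exponent)[2:]:        # scan exponent bits, most significant first
        result = (result * result) % modulus
        if bit == "1":
            result = (result * base) % modulus
    return result


E = 2 ** 16 + 1
N = (1 << 767) + 1234567                 # hypothetical 768-bit modulus
M = (1 << 700) + 987654321               # hypothetical message
C = mod_exp(M, E, N)
print(C == pow(M, E, N))
```

Python's built-in three-argument pow serves as an independent check on the model.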
46
Topic 5
Z = X + Y, Z = X × Y
Function: floating point addition and multiplication according to ANSI/IEEE 754
Application: general purpose microprocessor or digital signal processor
Environment: hardware, software for a 32-bit processor without floating point operations
Optimization:
Hardware: minimum latency, maximum throughput, limited area
Software: minimum execution time, limited memory
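
A software implementation on a processor without floating point support starts by splitting each operand into its IEEE 754 fields. The sketch below shows only that first step for the single-precision format. The choice of single precision is an assumption, and the function name is illustrative.

```python
import struct


def decode_float32(x: float):
    """Split a value, stored as IEEE 754 single precision, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits, hidden leading 1 not stored
    return sign, exponent, fraction


for value in (1.0, -2.5, 0.1):
    sign, exponent, fraction = decode_float32(value)
    print(f"{value:>5}: sign={sign} exponent={exponent} fraction=0x{fraction:06X}")
```

Addition then aligns the significands by the exponent difference, and multiplication adds the exponents and multiplies the significands, with normalization and rounding in both cases.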
