Khurram Kazi - PowerPoint PPT Presentation

1
Khurram Kazi
ECE 645: Computer Arithmetic: Implementations in Hardware and Software
Email: kkazi@gmu.edu
Adjunct faculty lounge: STII, 2nd floor
Office hours: Mondays, 3:30 - 4:30 pm
Course Web page: http://teal.gmu.edu/kkazi/spring2003/ece645.html
2
Grading Policy
  • Homeworks: 30% (will contain mini
    projects along with questions from the text)
  • 2 Midterm Tests: 30%
  • Project: 40%

3
Course in a nutshell: Back to basics
Efficient
  • addition and subtraction
  • multiplication
  • division and modular reduction
  • exponentiation
applied to
  • Integers: unsigned and signed
  • Real numbers: fixed point; single and double
    precision floating point
  • Elements of the Galois field GF(2^n): polynomial
    base, normal base

4
Lecture Topics
  • INTRODUCTION
  • 1. Applications of computer arithmetic
    algorithms
  • 2. Number representation
  • Unsigned Numbers
  • Signed Numbers

5
Lecture Topics (Continued)
  • ADDITION AND SUBTRACTION
  • 3. Basic addition, subtraction, and counting
  • 4. Carry-lookahead adders
  • 5. Adders based on Parallel Prefix Networks
  • 6. Carry-skip adders
  • 7. Carry-select adders
  • 8. Hybrid adders

6
Lecture Topics (Continued)
  • MULTIOPERAND ADDITION (time permitting)
  • 9. Carry-save adders
  • 10. Wallace and Dadda Trees
  • 11. Adding multiple signed numbers

7
Lecture Topics (Continued)
  • MULTIPLICATION
  • In hardware
  • 12. Basic hardware multipliers
  • 13. High-radix multipliers
  • 14. Tree multipliers
  • 15. Array multipliers
  • 16. Multiplication of signed numbers and
    squaring
  • In software
  • 17. Survey of software multiplication algorithms

8
Lecture Topics (Continued)
  • DIVISION
  • In hardware
  • 18. Basic hardware dividers
  • 19. High-radix dividers
  • 20. Array dividers
  • In software
  • 21. Survey of algorithms for division,
    modular reduction, and modular exponentiation

9
Lecture Topics (Continued)
  • FLOATING POINT ARITHMETIC
  • 22. Floating-point number representations
  • 23. Floating-point operations
  • GALOIS FIELD ARITHMETIC
  • 24. Representations of elements of the Galois
    Field
  • 25. Galois Field operations

10
Similar Courses at different Universities
  • University of California, Santa Barbara,
    Behrooz Parhami,
  • ECE252B Computer Arithmetic.
  • Lehigh University, Michael Schulte,
  • ECE496 High-Speed Computer Arithmetic.
  • Oregon State University, Cetin Koc,
  • ECE577 Computer Arithmetic
  • Stanford University, Michael Flynn,
  • EE486 Advanced Computer Arithmetic.
  • University of California, Davis, Vojin
    Oklobdzija,
  • ECE278 Computer Arithmetic for Digital
    Implementation.
  • University of Massachusetts, Amherst, Israel
    Koren,
  • ECE 666 Digital Computer Arithmetic.
  • Tel-Aviv University, Guy Even, Computer
    Arithmetic.

11
Homeworks
  • reading assignments (main textbooks and articles)
  • analysis of hardware and software algorithms
    and implementations
  • design of small hardware units
  • optional assignments
  • Studying trade-offs between
  • software vs. hardware
  • theory vs. practice
  • analysis vs. design

12
Homeworks: 1st Mini Projects
  • Write synthesizable VHDL code to design a fast
    adder (start with an 8-bit adder).
  • 1) Use structural VHDL to implement 2 different
    adders (one optimized for area and the other for
    speed). Synthesize the structural netlist.
  • 2) Write RTL code for an adder and use
    constraints on the synthesis tool to optimize
    the adder implementation so that it results in
  • (i) minimum area
  • (ii) minimum delay
  • Compare the two methods.
  • DUE MARCH 3

13
Homeworks 2nd Mini projects
  • Write synthesizable VHDL code to design a 64-bit
    fast adder. Optimize it for
  • (i) Area
  • (ii) Speed
  • Repeat the design process for 128-bit and
    256-bit wide adders. Analyze the effects of the
    larger bit width on the implementation of the
    adders in terms of area and speed. How are the
    constraints used in achieving the desired
    results?
  • DUE MARCH 24

14
Design Project
  • Find a mathematical equation that is used in
    some specific application. Analyze it and find
    the best way of solving that equation in
    hardware. Possible topics can come from networking
    (scheduling, traffic management, etc.), DSP
    (digital filters), cryptography, or video processing.
  • Be creative in selecting a topic while ensuring
    that it can be implemented in the prescribed
    time. The initial scope (abstract) of the final
    project is due on April 7.
  • A maximum of two people can collaborate on the same
    topic, assuming the complexity of the project
    warrants it. RTL, testbench, synthesis results, and a
    comprehensive report are due by the last day of
    class. An oral presentation of the project is
    required from each person too. Most likely, oral
    presentations will be scheduled for the last day
    of classes. The report should give a perspective
    on how your work fits into a larger system and
    its applications.

15
Prerequisites
  • It is assumed that you are well versed in VHDL or
    Verilog and know C.
  • Projects will require usage of Synthesizable HDL
    code along with C for a reference model and
    testbench (VHDL or Verilog can be used for
    testbench instead of C)

16
Design Environment
  • HDL Design (VHDL or Verilog)
  • Testbench (Generator, in C or HDL)
  • Testbench (Analyzer, in C or HDL)
  • Reference Model (in C)
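
As a rough illustration of how the pieces of this environment fit together, the sketch below models an 8-bit adder check entirely in C. The names `ref_add8`, `dut_add8`, and `count_mismatches` are hypothetical. `dut_add8` is only a bit-level stand-in for the HDL design; in a real flow its outputs would come from the simulator.

```c
#include <stdint.h>

typedef struct { uint8_t sum; uint8_t carry; } add8_result;

/* Reference model: behavioral 8-bit addition with carry in and carry out. */
add8_result ref_add8(uint8_t a, uint8_t b, uint8_t cin) {
    unsigned wide = (unsigned)a + (unsigned)b + (unsigned)(cin & 1u);
    add8_result r = { (uint8_t)(wide & 0xFFu), (uint8_t)((wide >> 8) & 1u) };
    return r;
}

/* Stand-in for the HDL design (assumption): a ripple-carry adder
 * evaluated one full-adder cell at a time. */
add8_result dut_add8(uint8_t a, uint8_t b, uint8_t cin) {
    uint8_t sum = 0;
    uint8_t carry = (uint8_t)(cin & 1u);
    for (int i = 0; i < 8; i++) {
        uint8_t ai = (uint8_t)((a >> i) & 1u);
        uint8_t bi = (uint8_t)((b >> i) & 1u);
        sum = (uint8_t)(sum | ((ai ^ bi ^ carry) << i));
        carry = (uint8_t)((ai & bi) | (carry & (ai ^ bi)));
    }
    add8_result r = { sum, carry };
    return r;
}

/* Generator and analyzer in one loop: apply every stimulus to both
 * models and count the cases where the outputs disagree. */
long count_mismatches(void) {
    long mismatches = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            for (unsigned cin = 0; cin < 2; cin++) {
                add8_result want = ref_add8((uint8_t)a, (uint8_t)b, (uint8_t)cin);
                add8_result got  = dut_add8((uint8_t)a, (uint8_t)b, (uint8_t)cin);
                if (want.sum != got.sum || want.carry != got.carry)
                    mismatches++;
            }
    return mismatches;
}
```

Exhaustive stimulus is practical only for narrow operands; for the 64-bit and wider adders of the mini projects, the generator would more plausibly use random and corner-case vectors.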
17
VLSI Design Tools to be used
  • MTI VHDL or Verilog Simulator
  • Synopsys Design Compiler (aka DC)
  • LSI Logic's ASIC Library
  • C
  • (A demo session on how to use Synopsys DC will
    take place on February 24. Bring the RTL code
    that you will be synthesizing to the session.)

18
Degrees of freedom and possible trade-offs
[Figure: design trade-offs among speed, area, power, and testability, annotated with the related courses ECE 645, ECE 682, and ECE 586, 681]
19
Degrees of freedom and possible trade-offs
(Controlled by synthesis constraints)
  • Speed: latency, throughput
  • Area
20
Timing parameters
parameter        | definition                          | units   | pipelining
delay            | time point → point                  | ns      |
latency          | time input → output                 | ns      | bad
throughput       | output bits / time unit             | Mbits/s | good
clock period     | rising edge → rising edge of clock  | ns      | good
clock frequency  | 1 / clock period                    | MHz     | good
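
The "bad" and "good" marks for pipelining can be checked with a small calculation. The sketch below is a simplified model, not a statement about any particular tool: it assumes perfectly balanced stages, a fixed per-stage register overhead, and one result per clock. The function name and all parameters are hypothetical.

```c
typedef struct {
    double clock_period_ns;     /* rising edge to rising edge */
    double clock_freq_mhz;      /* 1 / clock period */
    double latency_ns;          /* input to output */
    double throughput_mbits_s;  /* output bits per time unit */
} timing;

/* Split a combinational path of logic_delay_ns into `stages` equal
 * pipeline stages, each paying reg_overhead_ns for its register. */
timing pipeline_timing(double logic_delay_ns, int stages,
                       double reg_overhead_ns, int bits_per_result) {
    timing t;
    t.clock_period_ns = logic_delay_ns / stages + reg_overhead_ns;
    t.clock_freq_mhz = 1000.0 / t.clock_period_ns;
    t.latency_ns = stages * t.clock_period_ns;
    t.throughput_mbits_s = bits_per_result * t.clock_freq_mhz;
    return t;
}
```

With an assumed 40 ns of logic, 2 ns of register overhead, and 64-bit results, one stage gives a 42 ns period and 42 ns latency, while four stages give a 12 ns period and 48 ns latency. Latency gets slightly worse, while clock frequency and throughput improve by a factor of 3.5.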
21
Overview of Some of the steps in an ASIC flow
22
Simplified design flow
[Figure: RTL block synthesis, yielding estimated area and estimated timing]
23
Overview of Synthesizable VHDL
  • Library and Library Declarations
  • Entity Declaration
  • Architecture
  • Configuration

24
Overview of Synthesizable VHDL
  • Package: contains commonly used declarations
  • Constants may be defined here
  • Enumerated data types (Add, up_count, Sub)
  • Combinatorial functions (e.g., a decode
    function; a function returns a single value)
  • Procedures (can return multiple values)
  • Component declarations

25
Overview of Synthesizable VHDL
  • Entity
  • Defines the component name, its inputs and
    outputs (I/Os) and related declarations.
  • The same Entity can be used with different
    architectures to study various design trade-offs.
  • Use std_logic and std_logic_vector(n downto 0);
    they are synthesis friendly.
  • Avoid enumerated types for I/Os.
  • Avoid using port type buffer or bidir (unless
    you have to)

26
Overview of Synthesizable VHDL
  • Architecture
  • Defines the functionality of the design
  • Normally consists of processes and concurrent
    signal assignments
  • Synchronous and/or combinatorial logic can be
    inferred from the way functionality is defined in
    the Processes.
  • Avoid deeply nested loops
  • Avoid generate statements with large indices
  • Always think hardware when developing code!

27
Some useful design practices
  • Organize Your Design Workspace
  • Define a naming convention (especially if multiple
    designers are on the project)
  • Completely Specify Sensitivity Lists
  • Try to separate combinatorial logic from
    sequential logic

28
Separation of Combinatorial and Sequential Logic
29
Synthesis of if then elsif statement
30
Case statement Synthesis
31
What is synthesized from this code?

    process (A, B)
    begin
      if (A = '1') then
        Q <= B;
      end if;
    end process;

Missing else: add it, otherwise a latch is inferred for Q.

    -- there are 2 outputs, Q and Z
    process (C)
    begin
      case C is
        when '0' =>
          Q <= '1';
          Z <= '0';
        when others =>
          Q <= '0';
      end case;
    end process;

Missing Z output in the "others" branch: assign it, otherwise a latch is inferred for Z.
32
for loop synthesis
    process (a, b)
    begin
      for i in 0 to 5 loop
        example(i) <= a(i) and b(5-i);
      end loop;
    end process;

for loops are unrolled and then synthesized. The loop above is equivalent to:

    example(0) <= a(0) and b(5);
    example(1) <= a(1) and b(4);
    example(2) <= a(2) and b(3);
    example(3) <= a(3) and b(2);
    example(4) <= a(4) and b(1);
    example(5) <= a(5) and b(0);
33
Learn to deal with approximations
  • In digital arithmetic one has to come to grips
    with approximation and questions like:
  • When is an approximation good enough?
  • What margin of error is acceptable?
  • Be aware of the applications you are designing
    the arithmetic circuit or program for.
  • Analyze the implications of your approximation.

34
Consequences of approximations
Example: Failure of Patriot Missile (1991 Feb. 25)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
  • American Patriot Missile battery in Dhahran, Saudi
    Arabia, failed to intercept an incoming Iraqi Scud
    missile
  • The Scud struck an American Army barracks, killing 28
  • Cause, per GAO/IMTEC-92-26 report: software problem
    (inaccurate calculation of the time since boot)
  • Specifics of the problem: time in tenths of a second,
    as measured by the system's internal clock, was
    multiplied by 1/10 to get the time in seconds
  • Internal registers were 24 bits wide
  • 1/10 = 0.0001 1001 1001 1001 1001 100 (chopped to 24 b)
  • Error ≈ 0.1100 1100 × 2^-23 ≈ 9.5 × 10^-8
  • Error in 100-hr operation period
    ≈ 9.5 × 10^-8 × 100 × 60 × 60 × 10 = 0.34 s
  • Distance traveled by Scud = (0.34 s) × (1676 m/s) ≈ 570 m
  • This put the Scud outside the Patriot's range gate
  • Ironically, the fact that the bad time calculation had
    been improved in some (but not all) code parts
    contributed to the problem, since it meant that the
    inaccuracies did not cancel out
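
The arithmetic on this slide can be reproduced with a short C sketch. It assumes, as the slide's bit pattern suggests, that 1/10 is chopped to 23 fractional bits. The function names are hypothetical and this is not the Patriot software's actual code.

```c
#include <stdint.h>

#define FRAC_BITS 23  /* fractional bits kept after chopping (from the slide) */

/* 1/10 chopped to FRAC_BITS fractional bits: floor(2^23 / 10) = 0xCCCCC. */
uint32_t tenth_chopped(void) {
    return (uint32_t)((1u << FRAC_BITS) / 10u);
}

/* Error made on every clock tick by using the chopped constant. */
double chop_error(void) {
    return 0.1 - (double)tenth_chopped() / (double)(1u << FRAC_BITS);
}

/* Accumulated clock error after `hours` of uptime, in seconds.
 * The clock ticks ten times per second. */
double clock_drift_seconds(double hours) {
    double ticks = hours * 3600.0 * 10.0;
    return ticks * chop_error();
}

/* Position error of a target moving at speed_m_per_s, in meters. */
double tracking_error_m(double hours, double speed_m_per_s) {
    return clock_drift_seconds(hours) * speed_m_per_s;
}
```

For 100 hours this gives a drift of about 0.343 s and, at 1676 m/s, roughly 575 m. The slide's 570 m comes from rounding the drift to 0.34 s first.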
35
Consequences of approximations
Example: Explosion of Ariane Rocket (1996 June 4)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
  • Unmanned Ariane 5 rocket launched by the European
    Space Agency veered off its flight path, broke up,
    and exploded only 30 seconds after lift-off
    (altitude of 3700 m)
  • The $500 million rocket (with cargo) was on its 1st
    voyage after a decade of development costing
    $7 billion
  • Cause: software error in the inertial reference
    system
  • Specifics of the problem: a 64-bit floating point
    number relating to the horizontal velocity of the
    rocket was being converted to a 16-bit signed integer
  • An SRI software exception arose during conversion
    because the 64-bit floating point number had a value
    greater than what could be represented by a 16-bit
    signed integer (max 32 767)
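
The conversion hazard described here can be sketched in C. The code below is illustrative only and is not the Ariane flight software; the function names are hypothetical. It shows two common defenses: a range check before narrowing, and saturation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Convert a 64-bit float to a 16-bit signed integer only if the
 * truncated value is representable. Returns false (and leaves *out
 * untouched) on overflow or NaN instead of raising an exception. */
bool to_int16_checked(double x, int16_t *out) {
    if (!(x > -32769.0 && x < 32768.0))
        return false;              /* out of range, or NaN */
    *out = (int16_t)x;             /* truncates toward zero */
    return true;
}

/* Alternative policy: clamp to the nearest representable value. */
int16_t to_int16_saturating(double x) {
    if (x != x) return 0;          /* NaN: pick 0 (an arbitrary choice) */
    if (x >= 32767.0) return INT16_MAX;
    if (x <= -32768.0) return INT16_MIN;
    return (int16_t)x;
}
```

Which policy is right depends on the application: a checked conversion forces the caller to handle the failure, while saturation keeps running with a bounded but wrong value.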
36
Calculators
Results from two calculators:
u = sqrt(sqrt(...sqrt(2)...)) (10 times) = 1.000 677 131
v = 2^(1/1024) = 1.000 677 131
x = (((u^2)^2)...)^2 (10 times) = 1.999 999 963
y = (((v^2)^2)...)^2 (10 times) = 1.999 999 983
x' = u^1024 = 1.999 999 973
y' = v^1024 = 1.999 999 994
  • Hidden digits in the internal representation of
    numbers
  • Different algorithms give slightly different results
  • Very good accuracy
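
A small C experiment makes the same two points. It is only an analogy: C `float` and `double` stand in for the calculators' internal formats, which the slide does not specify. The function names are hypothetical.

```c
/* (((u^2)^2)...)^2: ten successive squarings give u^1024. */
double pow1024_by_squaring(double u) {
    for (int i = 0; i < 10; i++)
        u *= u;
    return u;
}

/* u * u * ... * u: 1023 successive multiplications also give u^1024,
 * but with a different sequence of rounding errors. */
double pow1024_by_multiplying(double u) {
    double p = u;
    for (int i = 1; i < 1024; i++)
        p *= u;
    return p;
}

/* Single-precision versions of the same two algorithms. */
float pow1024_by_squaring_f(float u) {
    for (int i = 0; i < 10; i++)
        u *= u;
    return u;
}

float pow1024_by_multiplying_f(float u) {
    float p = u;
    for (int i = 1; i < 1024; i++)
        p *= u;
    return p;
}
```

Feeding the ten displayed digits 1.000677131 into the double-precision versions gives roughly 2.0000006, which is further from 2 than any of the calculator results. That is consistent with the calculators carrying hidden digits beyond those displayed. In single precision the two algorithms may also differ from each other in the last digits.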
37
Primary applications (1)
Execution units of general purpose microprocessors
  • Integer units: integers (8, 16, 32, 64 bits)
  • Floating point units: real numbers (32, 64 bits)
38
Primary applications (2)
Digital signal and digital image processing
  • e.g., digital filters, Discrete Fourier Transform,
    Discrete Hilbert Transform, spectrum analysis
  • General purpose DSP processors
  • Specialized circuits
  • Real numbers
39
Primary applications (3)
Coding
Error detection codes, error correcting codes
Elements of the Galois field GF(2^n)
(4-64 bits)
40
Primary applications (4)
Cryptography
Secret key cryptography
IDEA, RC6, Mars
Twofish, Rijndael
Elements of the Galois field GF(2^n)
(4, 8 bits)
Integers (16, 32 bits)
41
Primary applications (5)
  • Traffic management in IP networks using Random
    Exponential Marking (REM)
  • Marking Probability = 1 - Φ^(-price)
  • Where
  • price = price_prev + γ (Q_length
    - (1 - α) Q_length_prev - α target_Q_length)
  • α = 0.1
  • γ = 0.001
  • Φ = 1.001
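
A C reference model for this computation might look like the sketch below. It assumes the price recurrence is price = price_prev + γ (Q_length - (1 - α) Q_length_prev - α target_Q_length) and the marking probability is 1 - Φ^(-price). Clamping the price at zero is an added assumption, and the function names are hypothetical. The power is split into an integer part (repeated squaring) and a small fractional part (a short series) to hint at a form that maps to hardware without a general-purpose `pow`.

```c
#define REM_ALPHA 0.1
#define REM_GAMMA 0.001
#define REM_PHI   1.001
#define REM_LN_PHI 0.000999500333083533  /* ln(1.001), precomputed */

/* One price update. Assumption: the price is clamped at zero. */
double rem_price(double price_prev, double q_len,
                 double q_len_prev, double target_q_len) {
    double p = price_prev + REM_GAMMA *
               (q_len - (1.0 - REM_ALPHA) * q_len_prev
                      - REM_ALPHA * target_q_len);
    return p > 0.0 ? p : 0.0;
}

/* Marking probability 1 - PHI^(-price), for 0 <= price < about 1e9. */
double rem_mark_probability(double price) {
    unsigned long n = (unsigned long)price;   /* integer part */
    double f = price - (double)n;             /* fractional part, < 1 */

    /* (1/PHI)^n by square-and-multiply. */
    double base = 1.0 / REM_PHI, int_part = 1.0;
    while (n) {
        if (n & 1ul) int_part *= base;
        base *= base;
        n >>= 1;
    }

    /* (1/PHI)^f = exp(-x) with x = f * ln(PHI) < 0.001, so a
     * three-term series is accurate to about 1e-13. */
    double x = f * REM_LN_PHI;
    double frac_part = 1.0 - x + x * x / 2.0 - x * x * x / 6.0;

    return 1.0 - int_part * frac_part;
}
```

With Φ this close to 1, the probability rises slowly: a price of 1000 gives a marking probability of about 0.63.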

42
Topic 1
C = A × B mod 2^32, C = A^2 mod 2^32
Function: 32-bit unsigned multiplication and squaring
modulo 2^32
Application: modern secret-key ciphers, candidates for
the new Advanced Encryption Standard (AES):
MARS, developed by IBM
RC6, developed at MIT
Environment: hardware, software for 8-bit processors
Optimization
  • maximum throughput
  • minimum latency
  • minimum area
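
For the 8-bit software environment, one possible starting point is sketched below in C. It builds the product from 8 × 8 → 16-bit partial products only, as an 8-bit processor would. The function name is hypothetical, and this is not taken from the MARS or RC6 reference code.

```c
#include <stdint.h>

/* C = A * B mod 2^32 using only byte-by-byte multiplications.
 * Partial products whose weight is 2^32 or more cannot affect the
 * result, so only 10 of the 16 byte products are computed. */
uint32_t mul_mod32_bytes(uint32_t a, uint32_t b) {
    uint32_t c = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t ai = (uint8_t)(a >> (8 * i));
        for (int j = 0; i + j < 4; j++) {
            uint8_t bj = (uint8_t)(b >> (8 * j));
            uint16_t partial = (uint16_t)(ai * bj);      /* 8 x 8 -> 16 bits */
            c += (uint32_t)partial << (8 * (i + j));     /* wraps mod 2^32 */
        }
    }
    return c;
}
```

Squaring is `mul_mod32_bytes(a, a)`; a dedicated squarer could go further by computing each symmetric pair of partial products once and doubling it.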

43
Topic 2
C = Σ (i = 1 to 256) A_i × B_i
Function: 64-bit signed multiplier-accumulator (MAC)
accumulating at least 256 partial products
Application: digital filters
Environment: hardware, software for a general purpose
DSP or an 8-bit processor
Optimization
Hardware: maximum throughput, limited area
Software: minimum execution time, limited memory
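
The key sizing question for a MAC is the accumulator width. The C sketch below is deliberately scaled down to 16-bit operands so that it fits native types; the 64-bit version in the topic would need multi-word arithmetic. Function names are hypothetical. Each product needs 2 × 16 = 32 bits, and summing 256 of them needs log2(256) = 8 extra guard bits, for 40 bits in total.

```c
#include <stdint.h>
#include <stddef.h>

/* Sum of a[i] * b[i] for 16-bit signed operands. The 64-bit
 * accumulator has more than the 40 bits needed for n <= 256. */
int64_t mac16(const int16_t *a, const int16_t *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* Does v fit in a `bits`-bit two's complement register (2 <= bits <= 63)? */
int fits_signed(int64_t v, unsigned bits) {
    int64_t limit = (int64_t)1 << (bits - 1);
    return v >= -limit && v < limit;
}
```

The worst case is 256 products of (-32768) × (-32768), which sums to 2^38: too large for a 32-bit accumulator but within a 40-bit one. By the same reasoning, 64-bit operands call for 128 + 8 = 136 accumulator bits.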
44
Topic 3
C = A × B, C = A / B
Function: multiplication of two 64-bit signed numbers;
division of a 128-bit number by a 64-bit number
Application: general purpose microprocessor
Environment: hardware, software for a 64-bit processor
without multiplication and division built in
Optimization
Hardware: minimum latency, maximum throughput,
limited area
Software: minimum execution time, limited memory
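
For the software environment, the simplest baseline is bit-serial: shift-and-add multiplication and restoring division, using only shifts, adds, subtracts, and compares. The C sketch below works on unsigned magnitudes. Signed operands would additionally need sign handling, which is left out here. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128;  /* 128-bit value as two words */

/* 64 x 64 -> 128-bit product by shift-and-add, scanning the
 * multiplier b from its most significant bit. */
u128 mul64_shift_add(uint64_t a, uint64_t b) {
    u128 p = { 0, 0 };
    for (int i = 63; i >= 0; i--) {
        p.hi = (p.hi << 1) | (p.lo >> 63);    /* p = 2 * p */
        p.lo <<= 1;
        if ((b >> i) & 1u) {                  /* p = p + a */
            uint64_t old = p.lo;
            p.lo += a;
            if (p.lo < old) p.hi++;           /* carry into the high word */
        }
    }
    return p;
}

/* Restoring division of a 128-bit dividend by a 64-bit divisor.
 * Requires d != 0 and n.hi < d so that the quotient fits in 64 bits. */
uint64_t div128_restoring(u128 n, uint64_t d, uint64_t *rem) {
    uint64_t r = n.hi, q = 0;
    for (int i = 63; i >= 0; i--) {
        uint64_t top = r >> 63;               /* bit shifted out of r */
        r = (r << 1) | ((n.lo >> i) & 1u);
        q <<= 1;
        if (top || r >= d) {                  /* trial subtraction succeeds */
            r -= d;
            q |= 1u;
        }
    }
    *rem = r;
    return q;
}
```

Both routines take 64 iterations. The high-radix and tree techniques listed in the lecture topics trade area or table space for fewer steps.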
45
Topic 4
C = A^E mod N
Function: modular exponentiation C = M^E mod N;
M, N arbitrary 768-bit numbers, E = 2^16 + 1
Application: modern public-key ciphers:
RSA, Diffie-Hellman, Elliptic Curve Cryptosystems
Environment: hardware, software for 32-bit or
8-bit processors
Optimization
Hardware: minimum latency, limited area
Software: minimum execution time, limited memory
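
The exponent E = 2^16 + 1 has only two nonzero bits, so M^E mod N takes 16 modular squarings and one modular multiplication. The C sketch below shows this with operands scaled down to 32 bits so that native types suffice; 768-bit operands would need a multi-precision representation. The function names are hypothetical, and N must be nonzero.

```c
#include <stdint.h>

/* (a * b) mod n, using a 64-bit intermediate product. Requires n != 0. */
uint32_t mulmod32(uint32_t a, uint32_t b, uint32_t n) {
    return (uint32_t)(((uint64_t)a * b) % n);
}

/* General left-to-right-independent square-and-multiply: m^e mod n. */
uint32_t modexp(uint32_t m, uint32_t e, uint32_t n) {
    uint32_t result = 1u % n;
    uint32_t base = m % n;
    while (e) {
        if (e & 1u) result = mulmod32(result, base, n);
        base = mulmod32(base, base, n);
        e >>= 1;
    }
    return result;
}

/* Specialized for E = 2^16 + 1: m^(2^16) by 16 squarings, then
 * one multiplication by m. */
uint32_t modexp_f4(uint32_t m, uint32_t n) {
    uint32_t base = m % n;
    uint32_t c = base;
    for (int i = 0; i < 16; i++)
        c = mulmod32(c, c, n);
    return mulmod32(c, base, n);
}
```

Most of the cost sits in the modular multiplication itself, which is where the choice between hardware and software implementations matters most.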
46
Topic 5
Z = X + Y, Z = X × Y
Function: floating point addition and multiplication
according to ANSI/IEEE 754
Application: general purpose microprocessor
or digital signal processor
Environment: hardware, software for a 32-bit processor
without floating point operations
Optimization
Hardware: minimum latency, maximum throughput,
limited area
Software: minimum execution time, limited memory
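
A software floating-point routine on a processor without floating point hardware typically begins by splitting each operand into its fields. The C sketch below shows only that first step for IEEE 754 single precision (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits). The type and function names are hypothetical; alignment, rounding, and special cases are not covered.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits, without the hidden leading 1 */
} fp32_fields;

/* Split a single-precision value into its three fields. */
fp32_fields fp32_unpack(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bit pattern */
    fp32_fields f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}

/* Reassemble the fields into a single-precision value. */
float fp32_pack(fp32_fields f) {
    uint32_t bits = (f.sign << 31) | ((f.exponent & 0xFFu) << 23)
                  | (f.fraction & 0x7FFFFFu);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, -2.5 = -1.25 × 2^1 unpacks to sign 1, exponent 128, and fraction 0x200000.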