Chapter 8 Problems - PowerPoint PPT Presentation

Provided by: Lee144
1
Chapter 8 Problems
  • Prof. Sin-Min Lee
  • Department of Mathematics and Computer Science

4
Addition And Subtraction
  • For both the non-negative and two's complement
    notations, addition and subtraction are fairly
    straightforward.
  • However, when the result cannot be represented as
    an 8-bit value, a problem arises. For the
    non-negative notation, consider the addition
    255 + 1, or 1111 1111 + 0000 0001. Straight binary
    addition yields the result 1 0000 0000, a 9-bit
    value, which cannot be stored in an 8-bit
    register.

5
Overflow
  • Arithmetic overflow: the extra bit generates a
    carry out from the parallel adder. In
    non-negative notation, this carry bit can set an
    overflow flag, signaling the rest of the system
    that an overflow has occurred and that the
    result generated is not entirely correct. The
    rest of the system can either fix the result or
    handle the error appropriately.

7
Overflow
  • In two's complement notation an overflow can
    occur at either end of the numeric range. At the
    positive end, adding 127 + 1, or 0111 1111 + 0000
    0001, yields a result of 1000 0000. However,
    in two's complement notation, this is -128, not
    the desired value of +128. In this notation, the
    key to recognizing overflow is to check not only
    the carry out, but also the carry in to the most
    significant bit of the result. If the two carries
    are equal, there is no overflow; if they differ,
    an overflow has occurred.

8
Overflow
  • Overflow only occurs when two numbers with the
    same sign are added. Adding two numbers with
    different signs always produces a valid result.

9
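Overflow generation in two's complement addition
  • 126 + 1: 01111110 + 00000001 = 01111111;
    carry out = 0, carry in to MSB = 0; no overflow
  • 127 + 1: 01111111 + 00000001 = 10000000;
    carry out = 0, carry in to MSB = 1; overflow
  • (-127) + (-1): 10000001 + 11111111 = 10000000;
    carry out = 1, carry in to MSB = 1; no overflow
  • (-128) + (-1): 10000000 + 11111111 = 01111111;
    carry out = 1, carry in to MSB = 0; overflow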

10
  • Whether there is an overflow or not depends on
    the interpretation (signed or unsigned number).
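
The carry rules above can be checked with a short Python sketch. The function names (`add8`, `unsigned_overflow`, `signed_overflow`) are illustrative and not part of the slides; the sketch assumes 8-bit operands given as bit patterns in the range 0 to 255.

```python
def add8(a, b):
    """Add two 8-bit patterns; return (result, carry_out, carry_in_to_msb)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF              # the 8 bits that fit in the register
    carry_out = (total >> 8) & 1       # ninth bit produced by the adder
    # The carry into bit 7 is the carry out of adding the low 7 bits.
    carry_in = ((a & 0x7F) + (b & 0x7F)) >> 7
    return result, carry_out, carry_in


def unsigned_overflow(a, b):
    """Non-negative notation: overflow is signaled by the carry out."""
    _, carry_out, _ = add8(a, b)
    return carry_out == 1


def signed_overflow(a, b):
    """Two's complement: overflow when carry out and carry in to the MSB differ."""
    _, carry_out, carry_in = add8(a, b)
    return carry_out != carry_in
```

The same bit patterns can overflow under one interpretation and not the other: 0111 1111 + 0000 0001 overflows only as a signed sum, while 1111 1111 + 0000 0001 overflows only as an unsigned sum.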

11
Each partial product is the product of one bit of
the multiplier times the multiplicand. There are
as many partial products as there are bits in the
multiplier, and there are as many bits in each
partial product as there are bits in the
multiplicand.
13
In looking for an algorithmic statement of this
approach to binary multiplication, it was found
that a group of Russian peasants used precisely
this method to multiply decimal numbers, and as a
result, the binary multiplication algorithm given
here is commonly known as the Russian Peasant
Algorithm.
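
A minimal Python sketch of this shift-and-add scheme follows. The function name `shift_add_multiply` is an illustrative choice, and the sketch assumes non-negative integer operands. Each loop pass examines one bit of the multiplier and adds the correspondingly shifted multiplicand (one partial product) when that bit is 1.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Multiply two non-negative integers by summing shifted partial products."""
    product = 0
    while multiplier > 0:
        if multiplier & 1:            # current multiplier bit is 1
            product += multiplicand   # add this partial product
        multiplicand <<= 1            # next partial product is shifted left by one
        multiplier >>= 1              # move on to the next multiplier bit
    return product
```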
15
The best of these fast multiplication algorithms
was developed by HP for the HP PA RISC
architecture. In developing the first generation
of this architecture, the HP designers concluded
that a hardware multiply instruction was not
justified, because most multiplies are multiplies
by constants and can be replaced by add-and-shift
instructions (as on the Hawk) and because the
rare multiply of one variable by another could be
done quickly enough in software.
16
  • The HP PA RISC multiply algorithm is based on the
    following notions:
  • First, do the multiply in base 16, so each step
    involves multiplying the multiplicand by the
    least significant 4 bits of the multiplier and
    then adding the partial product to the result.
  • To multiply the multiplicand by one hex digit of
    the multiplier, use an efficient case/select
    control structure to select one of 16 different
    blocks of code for multiplying by a constant.
    Each of these blocks will be only a few
    instructions long because of the use of
    add-and-shift instructions.
  • To avoid the cost of loop control instructions,
    note that the fixed word size implies that each
    multiply can be done in a fixed number of
    iterations (4 hex digits in a 16 bit multiplier,
    or 8 hex digits in a 32 bit multiplier). So, just
    write out the body of the loop 4 or 8 times in
    series and eliminate any need for loop counters.
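
The three notions can be sketched in Python. This is an illustration of the idea only, not HP's actual code: the table `TIMES_DIGIT`, the function `multiply16`, and the particular shift-and-add decompositions chosen for each hex digit are assumptions made for this example. A table of 16 small routines stands in for the case/select structure, and the loop body is written out four times for a 16-bit multiplier.

```python
# One short shift-and-add routine per hex digit (decompositions chosen for illustration).
TIMES_DIGIT = [
    lambda m: 0,                          # x0
    lambda m: m,                          # x1
    lambda m: m << 1,                     # x2
    lambda m: (m << 1) + m,               # x3
    lambda m: m << 2,                     # x4
    lambda m: (m << 2) + m,               # x5
    lambda m: ((m << 1) + m) << 1,        # x6 = (x3) * 2
    lambda m: (m << 3) - m,               # x7 = x8 - x1
    lambda m: m << 3,                     # x8
    lambda m: (m << 3) + m,               # x9
    lambda m: ((m << 2) + m) << 1,        # x10 = (x5) * 2
    lambda m: (m << 3) + (m << 1) + m,    # x11
    lambda m: ((m << 1) + m) << 2,        # x12 = (x3) * 4
    lambda m: (m << 3) + (m << 2) + m,    # x13
    lambda m: ((m << 3) - m) << 1,        # x14 = (x7) * 2
    lambda m: (m << 4) - m,               # x15 = x16 - x1
]


def multiply16(multiplicand, multiplier):
    """Multiply two unsigned 16-bit values, one hex digit of the multiplier per step."""
    multiplicand &= 0xFFFF
    multiplier &= 0xFFFF
    product = 0
    # Loop body written out four times: no loop counter is needed.
    product += TIMES_DIGIT[multiplier & 0xF](multiplicand)
    product += TIMES_DIGIT[(multiplier >> 4) & 0xF](multiplicand) << 4
    product += TIMES_DIGIT[(multiplier >> 8) & 0xF](multiplicand) << 8
    product += TIMES_DIGIT[(multiplier >> 12) & 0xF](multiplicand) << 12
    return product
```

For a 32-bit multiplier the same pattern would be written out eight times instead of four.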

29
What are BCD Numbers?
  • Binary Coded Decimal numbers are actually binary
    numbers that are "coded" to represent decimal
    numbers
  • Coded numbers are "not real numbering systems".
    In fact, they are just what they say they are,
    "codes that represent actual numbers". Although
    the actual numbers can be mathematically
    manipulated, codes follow no such rules.

33
Example 1: (365)_10 in BCD is 0011 0110 0101.
Example 2.
34
Addition
  • Addition is analogous to decimal addition, with
    normal binary addition taking place from right to
    left.

38
Where the result of any addition exceeds 9 (1001),
six (0110) must be added to the sum to
account for the six invalid BCD codes that are
available with a 4-bit number.
39
  • When one adds two BCD digits:
  • if the binary sum is less than 1010, the
    corresponding BCD sum digit is correct.
  • if the binary sum is greater than or equal to
    1010, add (0110)_2 to the corresponding BCD sum
    digit and produce a carry.
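
The correction rule can be sketched in Python as follows. The helper names `to_bcd` and `bcd_add` are illustrative, and the sketch assumes each BCD number is held as a list of decimal digits (one 4-bit code per list element, most significant digit first).

```python
def to_bcd(n):
    """Encode a non-negative integer as a list of BCD digits, most significant first."""
    return [int(ch) for ch in str(n)]


def bcd_add(a_digits, b_digits):
    """Add two BCD numbers digit by digit, applying the +0110 correction."""
    width = max(len(a_digits), len(b_digits))
    a = [0] * (width - len(a_digits)) + a_digits   # pad the shorter number with zeros
    b = [0] * (width - len(b_digits)) + b_digits
    result = []
    carry = 0
    for da, db in zip(reversed(a), reversed(b)):   # right to left
        s = da + db + carry        # plain binary sum of the two digits
        if s >= 0b1010:            # not a valid BCD digit: correct it
            s += 0b0110
            carry = 1
        else:
            carry = 0
        result.append(s & 0xF)     # low 4 bits are the BCD sum digit
    if carry:
        result.append(1)
    return result[::-1]
```

For instance, adding the digits 5 and 8 gives the binary sum 1101, which is at least 1010, so 0110 is added to give 1 0011: a sum digit of 3 and a carry into the next digit.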

44
8.4.1 Pipelining
  • 1. Data enters a stage of the pipeline and passes
    through the successive stages of the arithmetic
    operation until the final computation is completed.
    Note: each stage performs only its specific
    function.
  • 2. Pipelining improves performance by
    overlapping computation; that is, each stage
    operates on different data simultaneously.

45
Compare process time between Pipeline and
Non-Pipeline
  • Consider this code:
  • FOR i = 1 TO 100 DO A[i] ← (B[i] · C[i]) + D[i]
  • (Assume multiplication and addition each require
    10 ns.)
  • Non-pipelined: 10 ns × 2 × 100 = 2000 ns

46
An Example on Arithmetic Pipeline
  • Consider the following code snippet:
  • FOR I = 1 TO 100 DO A[I] ← (B[I] · C[I]) + D[I]
  • Assume that each operation, multiplication and
    addition, requires 10 ns to complete. A
    non-pipelined uniprocessor takes 20 ns to
    calculate each A[I], and 2000 ns to execute the code.
  • A pipelined unit could break this computation into
    two stages.

47
A two-stage pipeline to implement the program loop
[Figure: Bi and Ci feed stage 1 (multiplication); its result and Di feed stage 2 (addition). Latches clocked by CLK hold the values passed between stages.]
48
Example: A two-stage pipeline to implement the
program loop
During the first 10 ns, the first stage
calculates B1 · C1. In the next 10 ns, stage 2
adds this value to D1 and stores the result in
A1. At the same time, stage 1 multiplies B2
and C2. During the following 10 ns, stage 1
forms B3 · C3 and stage 2 calculates the final
value of A2. Instead of the 2000 ns that a
non-pipelined uniprocessor would take, this
pipeline executes the code in 1010 ns.
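
The timing described above can be reproduced with a small cycle-by-cycle simulation in Python. This is a sketch under the slide's assumptions (two stages, 10 ns per stage); the function name `run_two_stage` and the single `latch` variable are illustrative simplifications of the hardware.

```python
def run_two_stage(B, C, D, stage_ns=10):
    """Simulate the two-stage pipeline cycle by cycle; return (A, elapsed_ns)."""
    n = len(B)
    A = [None] * n
    latch = None      # holds (index, product) between stage 1 and stage 2
    next_in = 0       # index of the next operand pair to enter stage 1
    cycles = 0
    while next_in < n or latch is not None:
        # Stage 2: add D[i] to the product latched during the previous cycle.
        if latch is not None:
            i, product = latch
            A[i] = product + D[i]
        # Stage 1: multiply the next pair during the same cycle.
        if next_in < n:
            latch = (next_in, B[next_in] * C[next_in])
            next_in += 1
        else:
            latch = None
        cycles += 1
    return A, cycles * stage_ns
```

With 100 operand pairs the simulation takes 101 cycles: one cycle to fill the pipeline, then one result per cycle.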
49
Speedup for Pipeline
  • A pipeline's speedup S_n is the time needed to
    process n pieces of data without pipelining
    (n T_1), divided by the time needed to process the same
    data using a k-stage pipeline ((k + n - 1) T_k).
  • The speedup of a pipeline can be expressed as

$$S_n = \frac{n T_1}{(k + n - 1)\, T_k}$$
50
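Continued
  • Apply the speedup expression to the previous
    example, where n = 100, T_1 = 20 ns is the time to
    process one piece of data without pipelining
    (multiplication and addition), k = 2 stages
    (multiplication and addition), and T_k = 10 ns:

$$S_{100} = \frac{100 \times 20\,\text{ns}}{(2 + 100 - 1) \times 10\,\text{ns}} \approx 1.98$$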
51
Speed up
  • The most popular metrics used to measure the
    performance of a pipeline are throughput (the
    number of results generated per time unit) and
    speedup.
  • A pipeline's speedup, S_n, is the amount of time
    needed to process n pieces of data using a
    non-pipelined arithmetic unit, divided by the
    time needed to process the same data using a
    k-stage pipeline.
  • $S_n = \frac{n T_1}{(k + n - 1)\, T_k}$

52
Calculating Speedup
  • $S_n = \frac{n T_1}{(k + n - 1)\, T_k}$
  • T_1 is the time required to process one piece of
    data without pipelining.
  • T_k is the clock period of the k-stage
    pipeline.
  • The pipelined unit requires k time units, each
    of duration T_k, to move the first piece of data
    through the pipeline, and one more time unit for
    each of the remaining n - 1 pieces. Using the last
    example, the speedup can be calculated as follows:
  • $S_{100} = \frac{100 \times 20\,\text{ns}}{(2 + 100 - 1) \times 10\,\text{ns}} \approx 1.98$
  • As n approaches infinity, n / (k + n - 1)
    approaches 1, so S_n approaches T_1 / T_k.
  • The maximum speedup occurs when each stage has
    the same delay.
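
The speedup expression is easy to evaluate numerically. The short Python sketch below uses an illustrative function name (`speedup`) and takes n, k, T_1, and T_k as defined on this slide.

```python
def speedup(n, k, t1, tk):
    """Pipeline speedup S_n = n*T1 / ((k + n - 1) * Tk)."""
    return (n * t1) / ((k + n - 1) * tk)


# The example from the slides: 100 items, 2 stages, T1 = 20 ns, Tk = 10 ns.
s100 = speedup(100, 2, 20, 10)

# For very large n the speedup approaches T1 / Tk = 2.
s_large = speedup(10**9, 2, 20, 10)
```

Here `s100` evaluates to 2000 / 1010, about 1.98, and `s_large` is within a few billionths of 2.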

53
8.4.2 Lookup Tables
  • 1. Theoretically, any combinatorial circuit can
    be implemented by a ROM configured as a
    lookup table.
  • 2. The ROM is programmed with data such
    that the correct values are output for any
    possible input values.
  • 3. A 4 x 1 ROM can act as a two-input AND gate.
    The inputs of the AND gate serve as the
    address inputs of the ROM, and the output of
    the ROM corresponds to the AND gate output.

54
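Lookup ROM equivalent to AND gate
[Figure: inputs x and y drive address lines A1 and A0 of a 4 x 1 ROM; data output D0 equals x AND y.]

Address (A1 A0)   Data (D0)
0 0               0
0 1               0
1 0               0
1 1               1

The AND gate output for every possible input combination is stored in the ROM.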
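
The same lookup can be sketched in Python, with a four-entry list standing in for the 4 x 1 ROM. The names `AND_ROM` and `rom_and` are illustrative; the sketch assumes x drives address bit A1 and y drives A0 (the result is the same either way, since AND is symmetric).

```python
# Contents of the 4 x 1 ROM at addresses 00, 01, 10, 11.
AND_ROM = [0, 0, 0, 1]


def rom_and(x, y):
    """Look up x AND y in the ROM; x and y must each be 0 or 1."""
    address = (x << 1) | y     # x is address bit A1, y is address bit A0
    return AND_ROM[address]    # data output D0
```

Any other two-input function could be implemented by changing only the four stored values.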