Title: Chapter 8 Problems
1Chapter 8 Problems
- Prof. Sin-Min Lee
- Department of Mathematics and Computer Science
4Addition And Subtraction
- For both the non-negative and two's complement notations, addition and subtraction are fairly straightforward.
- However, when the result cannot be represented as an 8-bit value, a problem arises. For the non-negative notation, consider the addition 255 + 1, or 1111 1111 + 0000 0001. Straight binary addition yields the result 1 0000 0000, a 9-bit value, which cannot be stored in an 8-bit register.
5Overflow
- Arithmetic overflow: The extra bit generates a carry out from the parallel adder. In non-negative notation, this carry bit can set an overflow flag, signaling the rest of the system that an overflow has occurred and that the result generated is not entirely correct. The rest of the system can then either fix the result or handle the error appropriately.
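The carry-out behavior described above can be sketched in a few lines of Python. This is only an illustration under the assumption of an 8-bit register; the function name `add_unsigned_8bit` is hypothetical and not part of the original slides.

```python
def add_unsigned_8bit(a, b):
    """Add two 8-bit non-negative values; return (8-bit result, overflow flag)."""
    total = a + b                   # the true sum may need 9 bits
    carry_out = total > 0xFF        # the ninth bit is the carry out of the adder
    return total & 0xFF, carry_out  # only the low 8 bits fit in the register


print(add_unsigned_8bit(255, 1))    # (0, True): 1 0000 0000 does not fit
print(add_unsigned_8bit(100, 27))   # (127, False)
```

The flag is returned next to the truncated result so that the caller can decide whether to fix the result or report an error.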
7Overflow
- In two's complement notation, an overflow can occur at either end of the numeric range. At the positive end, adding 127 + 1, or 0111 1111 + 0000 0001, yields a result of 1000 0000. However, in two's complement notation this is -128, not the desired value of +128. In this notation, the key to recognizing overflow is to check not only the carry out, but also the carry in to the most significant bit of the result. If the two carries are equal, there is no overflow; if they differ, an overflow has occurred.
8Overflow
- Overflow only occurs when two numbers with the
same sign are added. Adding two numbers with
different signs always produces a valid result.
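9Overflow generation in two's complement addition
(Each row lists the operands, the 8-bit sum, and then the carry out of and the carry into the most significant bit.)
- 126 + 1: 01111110 + 00000001 = 01111111, carries 00 (no overflow)
- 127 + 1: 01111111 + 00000001 = 10000000, carries 01 (overflow)
- -127 + (-1): 10000001 + 11111111 = 10000000, carries 11 (no overflow)
- -128 + (-1): 10000000 + 11111111 = 01111111, carries 10 (overflow)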
10- Whether there is an overflow or not depends on
the interpretation (signed or unsigned number).
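As a rough sketch of the carry-comparison rule for two's complement addition, the following Python code adds two 8-bit patterns and compares the carry into the most significant bit with the carry out of it. The function names are hypothetical, and the 8-bit width is an assumption taken from the examples on these slides.

```python
def add_twos_complement_8bit(a, b):
    """Add two 8-bit values; return (result, carry_out, carry_in_msb, overflow)."""
    a &= 0xFF                            # reduce to 8-bit patterns (handles negatives)
    b &= 0xFF
    carry_in = ((a & 0x7F) + (b & 0x7F)) >> 7   # carry into the most significant bit
    total = a + b
    carry_out = total >> 8                      # carry out of the most significant bit
    return total & 0xFF, carry_out, carry_in, carry_out != carry_in


def to_signed(pattern):
    """Interpret an 8-bit pattern as a two's complement number."""
    return pattern - 256 if pattern & 0x80 else pattern


for x, y in [(126, 1), (127, 1), (-127, -1), (-128, -1)]:
    result, c_out, c_in, overflow = add_twos_complement_8bit(x, y)
    print(f"{x:5d} + {y:3d} = {result:08b} (signed {to_signed(result):4d}, "
          f"unsigned {result:3d})  carries {c_out}{c_in}  overflow={overflow}")
```

Printing both the signed and the unsigned reading of each result shows why the verdict depends on the interpretation: 10000000 is a correct 128 for unsigned operands but an overflowed -128 for signed ones.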
11Each partial product is the product of one bit of the multiplier times the multiplicand. There are as many partial products as there are bits in the multiplier, and there are as many bits in each partial product as there are bits in the multiplicand.
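To make the idea of partial products concrete, here is a minimal Python sketch. It assumes unsigned operands, and the helper name `partial_products` is hypothetical. Each partial product is shifted left to the weight of its multiplier bit so that the list can simply be summed.

```python
def partial_products(multiplicand, multiplier, bits=8):
    """Return one partial product per multiplier bit, least significant bit first."""
    products = []
    for i in range(bits):
        bit = (multiplier >> i) & 1
        # One multiplier bit times the multiplicand is either 0 or the
        # multiplicand itself; the shift aligns it with that bit's weight.
        products.append((multiplicand * bit) << i)
    return products


pp = partial_products(0b1011, 0b0110, bits=4)   # 11 * 6
print([f"{p:08b}" for p in pp])
print(sum(pp))                                   # 66
```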
13In looking for an algorithmic statement of this
approach to binary multiplication, it was found
that a group of Russian peasants used precisely
this method to multiply decimal numbers, and as a
result, the binary multiplication algorithm given
here is commonly known as the Russian Peasant Algorithm.
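The slide that stated the algorithm itself did not survive in this transcript, so the following is only a sketch of the commonly described halve-and-double form of the Russian Peasant Algorithm. It assumes non-negative integers, and the function name is hypothetical.

```python
def russian_peasant_multiply(multiplier, multiplicand):
    """Multiply two non-negative integers by repeated halving and doubling."""
    product = 0
    while multiplier > 0:
        if multiplier & 1:          # odd: the low bit is 1, so add this partial product
            product += multiplicand
        multiplier >>= 1            # halve the multiplier, dropping the remainder
        multiplicand <<= 1          # double the multiplicand
    return product


print(russian_peasant_multiply(13, 11))   # 143
```

Each pass through the loop handles one bit of the multiplier, which is why this is the same shift-and-add procedure as binary multiplication.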
15The best of these fast multiplication algorithms
was developed by HP for the HP PA RISC
architecture. In developing the first generation
of this architecture, the HP designers concluded
that a hardware multiply instruction was not
justified, because most multiplies are multiplies
by constants and can be replaced by add-and-shift
instructions (as on the Hawk) and because the
rare multiply of one variable by another could be
done quickly enough in software.
16- The HP PA RISC multiply algorithm is based on the following notions:
- First, do the multiply in base 16, so each step involves multiplying the multiplicand by the least significant 4 bits of the multiplier and then adding the partial product to the result.
- To multiply the multiplicand by one hex digit of the multiplier, use an efficient case/select control structure to select one of 16 different blocks of code for multiplying by a constant. Each of these blocks will be only a few instructions long because of the use of add-and-shift instructions.
- To avoid the cost of loop control instructions, note that the fixed word size implies that each multiply can be done in a fixed number of iterations (4 hex digits in a 16 bit multiplier, or 8 hex digits in a 32 bit multiplier). So, just write out the body of the loop 4 or 8 times in series and eliminate any need for loop counters.
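The three notions above can be illustrated with a small Python model. This is a sketch of the idea only, not the actual HP PA RISC code: the table `TIMES_DIGIT` and the function `multiply_16bit` are hypothetical names, a list of functions stands in for the case/select structure, and Python integers do not wrap at a fixed word size as real registers would.

```python
# One short shift-and-add routine per hex digit, standing in for the
# 16 blocks of code selected by the case/select structure.
TIMES_DIGIT = [
    lambda m: 0,
    lambda m: m,
    lambda m: m << 1,
    lambda m: (m << 1) + m,
    lambda m: m << 2,
    lambda m: (m << 2) + m,
    lambda m: (m << 2) + (m << 1),
    lambda m: (m << 2) + (m << 1) + m,
    lambda m: m << 3,
    lambda m: (m << 3) + m,
    lambda m: (m << 3) + (m << 1),
    lambda m: (m << 3) + (m << 1) + m,
    lambda m: (m << 3) + (m << 2),
    lambda m: (m << 3) + (m << 2) + m,
    lambda m: (m << 3) + (m << 2) + (m << 1),
    lambda m: (m << 3) + (m << 2) + (m << 1) + m,
]


def multiply_16bit(multiplicand, multiplier):
    """Multiply by a 16-bit multiplier in base 16, with the loop written out 4 times."""
    product = TIMES_DIGIT[multiplier & 0xF](multiplicand)               # hex digit 0
    product += TIMES_DIGIT[(multiplier >> 4) & 0xF](multiplicand) << 4    # hex digit 1
    product += TIMES_DIGIT[(multiplier >> 8) & 0xF](multiplicand) << 8    # hex digit 2
    product += TIMES_DIGIT[(multiplier >> 12) & 0xF](multiplicand) << 12  # hex digit 3
    return product


print(multiply_16bit(1234, 0xBEEF) == 1234 * 0xBEEF)   # True
```

Writing the four steps out in series removes the loop counter entirely, which is the point of the third notion.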
29What are BCD Numbers?
- Binary Coded Decimal numbers are binary numbers that are "coded" to represent decimal numbers.
- Coded numbers are not real numbering systems. They are just what they say they are: codes that represent actual numbers. Although the actual numbers can be mathematically manipulated, codes follow no such rules.
33Example 1: (365)_10 -> 0011 0110 0101 in BCD
Example 2.
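The encoding used in Example 1 can be reproduced with a short Python sketch: each decimal digit is written separately as a 4-bit group. The helper name `to_bcd` is hypothetical, and the input is assumed to be a non-negative integer.

```python
def to_bcd(number):
    """Encode a non-negative decimal integer as space-separated 4-bit BCD groups."""
    return " ".join(f"{int(digit):04b}" for digit in str(number))


print(to_bcd(365))   # 0011 0110 0101
```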
34Addition
- Addition is analogous to decimal addition with
normal binary addition taking place from right to
left. For example,
38Where the result of any addition exceeds 9 (1001), six (0110) must be added to the sum to account for the six invalid BCD codes that are available with a 4-bit number. This is illustrated in the example below.
39- When one adds two BCD digits,
- if the binary sum is less than 1010, the corresponding BCD sum digit is correct.
- if the binary sum is greater than or equal to 1010, add (0110)_2 to the corresponding BCD sum digit and produce a carry.
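The digit-by-digit rule above can be sketched in Python as follows. The function names are hypothetical, and two things are assumptions that the slides do not spell out: the carry from one digit is fed into the next digit's addition, and both operands are given as equal-length lists of decimal digits.

```python
def add_bcd_digits(a, b, carry_in=0):
    """Add two BCD digits (0-9); return (sum digit, carry out)."""
    total = a + b + carry_in          # plain binary addition
    if total >= 0b1010:               # 10 or more is not a valid BCD digit
        total += 0b0110               # add six to skip the six unused codes
        return total & 0b1111, 1      # keep the low four bits and produce a carry
    return total, 0


def add_bcd(x_digits, y_digits):
    """Add two equal-length lists of BCD digits, most significant digit first."""
    result, carry = [], 0
    for a, b in zip(reversed(x_digits), reversed(y_digits)):
        digit, carry = add_bcd_digits(a, b, carry)
        result.append(digit)
    if carry:
        result.append(carry)          # a final carry becomes a new leading digit
    return result[::-1]


print(add_bcd_digits(7, 5))              # (2, 1): 7 + 5 = 12
print(add_bcd([3, 6, 5], [4, 7, 8]))     # [8, 4, 3]: 365 + 478 = 843
```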
44 8.4.1 Pipelining
- 1. Data enters a stage of the pipeline and passes through successive stages of the arithmetic operation until the final computation is complete. Note: each stage performs only its specific function.
- 2. A pipeline improves performance by overlapping computation; that is, each stage operates on different data simultaneously.
45Comparing process time between Pipeline and Non-Pipeline
- Consider this code:
- for i = 1 to 100 do A[i] = (B[i] * C[i]) + D[i]
- (Assume multiplication and addition each require 10 ns.)
- Non-pipeline: 10 ns x 2 x 100 = 2000 ns
46An Example of an Arithmetic Pipeline
- Consider the following code snippet:
- FOR I = 1 to 100 DO A[I] = (B[I] * C[I]) + D[I]
- Assume that each operation, multiplication and addition, requires 10 ns to complete. A non-pipelined uniprocessor takes 20 ns to calculate each A[I], and 2000 ns to execute the code.
- A pipelined unit could break this computation into two stages, as shown in the next slide.
47A two-stage pipeline to implement the program loop
[Figure: two-stage pipeline. Inputs B[i] and C[i] feed Stage 1 (multiply); its result and D[i] feed Stage 2 (add). Three latches are clocked by CLK.]
48Example: A two-stage pipeline to implement the program loop
During the first 10 ns, stage 1 calculates B[1] * C[1]. In the next 10 ns, stage 2 adds this value to D[1] and stores the result in A[1]. At the same time, stage 1 multiplies B[2] and C[2]. During the following 10 ns, stage 1 forms B[3] * C[3] and stage 2 calculates the final value of A[2]. Instead of the 2000 ns that a non-pipelined uniprocessor would take, this pipeline executes the code in 1010 ns.
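The timing described above can be checked with a small cycle-by-cycle simulation. This is a simplified software model under the slide's assumption of 10 ns per stage, not a description of real hardware; the function name and the single `latch` variable are hypothetical.

```python
def run_two_stage_pipeline(B, C, D, stage_time_ns=10):
    """Simulate A[i] = B[i] * C[i] + D[i] on a two-stage pipeline."""
    n = len(B)
    A = [None] * n
    latch = None                  # value held between the stages: (index, product)
    elapsed_ns = 0
    for cycle in range(n + 1):    # n cycles, plus one more to drain the pipeline
        # Stage 2 (add) consumes what stage 1 produced in the previous cycle.
        if latch is not None:
            i, product = latch
            A[i] = product + D[i]
        # Stage 1 (multiply) works on the next pair in the same cycle, if any remain.
        latch = (cycle, B[cycle] * C[cycle]) if cycle < n else None
        elapsed_ns += stage_time_ns
    return A, elapsed_ns


B = list(range(1, 101))
C = [2] * 100
D = [3] * 100
A, elapsed = run_two_stage_pipeline(B, C, D)
print(elapsed)   # 1010
```

With 100 data items the loop runs 101 cycles of 10 ns each, which matches the 1010 ns stated above.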
49Speedup for Pipeline
- A pipeline's speedup $S_n$ is the time needed to process n pieces of data using a non-pipelined unit ($n T_1$), divided by the time needed to process the same data using a k-stage pipeline ($(k + n - 1) T_k$).
- The speedup of a pipeline can be expressed as
$$S_n = \frac{n T_1}{(k + n - 1) T_k}$$
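50Continue
- Apply the speedup expression to the previous example. Here n = 100, $T_1 = 20$ ns is the time to process one piece of data without pipelining (one multiplication and one addition), and $T_k = 10$ ns is the clock period of the pipeline with k = 2 stages (multiply and add):
$$S_{100} = \frac{100 \times 20\,\text{ns}}{(2 + 100 - 1) \times 10\,\text{ns}} = \frac{2000}{1010} \approx 1.98$$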
51Speed up
- The most popular metrics used to measure the performance of a pipeline are throughput (the number of results generated per time unit) and speedup.
- A pipeline's speedup, $S_n$, is the amount of time needed to process n pieces of data using a non-pipelined arithmetic unit, divided by the time needed to process the same data using a k-stage pipeline.
- $S_n = \frac{n T_1}{(k + n - 1) T_k}$
52Calculating Speedup
- $S_n = \frac{n T_1}{(k + n - 1) T_k}$
- $T_1$ is the time required to process one piece of data using the non-pipelined unit.
- $T_k$ is the clock period of the k-stage pipeline.
- The pipelined unit requires k time units, each of duration $T_k$, to move the first piece of data through the pipeline. Each of the remaining n - 1 pieces then completes one clock period later, for a total of (k + n - 1) periods. Using the last example, the speedup can be calculated as follows:
- $S_{100} = \frac{100 \times 20\,\text{ns}}{(2 + 100 - 1) \times 10\,\text{ns}} \approx 1.98$
- As n approaches infinity, $n / (k + n - 1)$ approaches 1, so $S_\infty = T_1 / T_k$.
- The maximum speedup occurs when each stage has the same delay.
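As a quick numerical check of the speedup expression, the formula can be written as a small Python function. The function and parameter names are hypothetical; the values plugged in are the ones from the two-stage example.

```python
def pipeline_speedup(n, t1_ns, k, tk_ns):
    """Speedup S_n = n*T1 / ((k + n - 1) * Tk) of a k-stage pipeline over n data items."""
    return (n * t1_ns) / ((k + n - 1) * tk_ns)


print(round(pipeline_speedup(100, 20, 2, 10), 2))      # 1.98
print(round(pipeline_speedup(10**9, 20, 2, 10), 2))    # 2.0, close to the limit T1 / Tk
```

The second call illustrates the limiting case: for very large n the speedup approaches T1 / Tk, which is 2 for this example.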
53 8.4.2 Lookup Tables
- 1. Theoretically, any combinatorial circuit can be implemented by a ROM configured as a lookup table.
- 2. The ROM is programmed with data such that the correct values are output for any possible input values.
- 3. For example, a 4x1 ROM can act as a two-input AND gate. The inputs of the AND gate serve as the address inputs of the ROM, and the output of the ROM corresponds to the AND gate output.
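54Lookup ROM equivalent to AND gate
[Figure: a 4 x 1 ROM whose address inputs A1 and A0 are driven by x and y and whose data output D0 gives x AND y, shown beside the equivalent two-input AND gate.]
Address (A1 A0)   Data (D0)
0 0               0
0 1               0
1 0               0
1 1               1
The AND gate output for every possible input combination is stored in the ROM.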
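The ROM-as-lookup-table idea can be mimicked in software with a four-entry list indexed by the two input bits. This is only an analogy to the hardware: the names `AND_ROM` and `rom_and` are hypothetical, and assigning x to A1 and y to A0 is an assumption (the AND function is symmetric, so the choice does not affect the result).

```python
# Contents of the 4 x 1 ROM: one data bit for each address 00, 01, 10, 11.
AND_ROM = [0, 0, 0, 1]


def rom_and(x, y):
    """Look up x AND y by using the two input bits as the ROM address."""
    address = (x << 1) | y        # x drives A1, y drives A0 (assumed assignment)
    return AND_ROM[address]


for x in (0, 1):
    for y in (0, 1):
        print(x, y, rom_and(x, y))
```

Changing only the stored data, for example to `[0, 1, 1, 1]`, would turn the same structure into an OR gate, which is why a ROM can stand in for any combinational function of its address inputs.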