Chapter 8 Problems presentation

1
Chapter 8 Problems

Prof. Sin-Min Lee
Department of Mathematics and Computer Science

2
(No Transcript)
3
(No Transcript)
4
Addition And Subtraction

For both the non-negative and two's complement
notations, addition and subtraction are fairly
straightforward.
However, when the result cannot be represented as
an 8-bit value, a problem arises. For the
non-negative notation, consider the addition
255 + 1, or 1111 1111 + 0000 0001. Straight binary
addition yields the result 1 0000 0000, a 9-bit
value, which cannot be stored in an 8-bit
register.

5
Overflow

Arithmetic overflow: the extra bit generates a
carry out from the parallel adder. In
non-negative notation, this carry bit can set an
overflow flag, signaling the rest of the system
that an overflow has occurred and that the
result generated is not entirely correct. The
rest of the system can either fix the result or
handle the error appropriately.

6
(No Transcript)
7
Overflow

In two's complement notation an overflow can
occur at either end of the numeric range. At the
positive end, adding 127 + 1, or 0111 1111 + 0000
0001, yields a result of 1000 0000. However,
in two's complement notation, this is -128, not
the desired value of +128. In this notation, the
key to recognizing overflow is to check not only
the carry out, but also the carry in to the most
significant bit of the result. If the two carries
are equal, then there is no overflow; if they
differ, an overflow has occurred.

8
Overflow

Overflow only occurs when two numbers with the
same sign are added. Adding two numbers with
different signs always produces a valid result.

9
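Overflow generation in unsigned and two's complement
addition

Each example shows the two operands, the 8-bit sum,
and then the carry out of and the carry into the
most significant bit.

   126    01111110
 +   1    00000001
          01111111    carries 0 0: no overflow

   127    01111111
 +   1    00000001
          10000000    carries 0 1: overflow

  -127    10000001
 + (-1)   11111111
          10000000    carries 1 1: no overflow

  -128    10000000
 + (-1)   11111111
          01111111    carries 1 0: overflow

Whether there is an overflow or not depends on
the interpretation (signed or unsigned number).
The labels above apply to the two's complement
reading. Read as unsigned numbers, the last two
additions produce a carry out and therefore
overflow, while the first two do not.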
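
The carry comparison can be tried out with a short Python sketch. The function names add8, signed_overflow and unsigned_overflow are made up for this illustration and are not part of the original slides; the sketch assumes 8-bit operands given as bit patterns.

```python
def add8(a, b):
    """Add two 8-bit patterns.

    Returns (8-bit result, carry out of bit 7, carry into bit 7).
    """
    a &= 0xFF
    b &= 0xFF
    carry_in = ((a & 0x7F) + (b & 0x7F)) >> 7   # carry into the most significant bit
    total = a + b
    carry_out = total >> 8                      # carry out of the most significant bit
    return total & 0xFF, carry_out, carry_in


def signed_overflow(a, b):
    """Two's complement overflow: the two carries differ."""
    _, carry_out, carry_in = add8(a, b)
    return carry_out != carry_in


def unsigned_overflow(a, b):
    """Non-negative (unsigned) overflow: there is a carry out."""
    return add8(a, b)[1] == 1


# The four additions from the slide, as bit patterns.
for a, b in [(0x7E, 0x01), (0x7F, 0x01), (0x81, 0xFF), (0x80, 0xFF)]:
    result, c_out, c_in = add8(a, b)
    print(format(result, "08b"), c_out, c_in, signed_overflow(a, b))
```

The same bit pattern can overflow under one interpretation and not the other, which is why the two checks are kept as separate functions.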

11
Each partial product is the product of one bit of
the multiplier times the multiplicand. There are
as many partial products as there are bits in the
multiplier, and there are as many bits in each
partial product as there are bits in the
multiplicand.
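
As a rough illustration of partial products, the following Python sketch lists one partial product per multiplier bit, each already shifted to its bit position, and sums them. The function name partial_products and the 4-bit default width are assumptions made for this example.

```python
def partial_products(multiplicand, multiplier, bits=4):
    """Return one partial product per multiplier bit, shifted into place."""
    products = []
    for i in range(bits):
        bit = (multiplier >> i) & 1                  # i-th bit of the multiplier
        products.append((multiplicand * bit) << i)   # 0 or the shifted multiplicand
    return products


pp = partial_products(0b1011, 0b1101)        # 11 times 13
for p in pp:
    print(format(p, "08b"))
print(sum(pp))
```

Before shifting, each partial product is either zero or a copy of the multiplicand, so it needs no more bits than the multiplicand itself.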
12
(No Transcript)
13
In looking for an algorithmic statement of this
approach to binary multiplication, it was found
that a group of Russian peasants used precisely
this method to multiply decimal numbers, and as a
result, the binary multiplication algorithm given
here is commonly known as the Russian Peasant
Algorithm.
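
A minimal Python sketch of the halving-and-doubling method usually meant by this name follows. The function name russian_peasant is chosen for the example, and non-negative integer operands are assumed.

```python
def russian_peasant(a, b):
    """Multiply two non-negative integers by halving a and doubling b."""
    product = 0
    while a > 0:
        if a & 1:         # a is odd, so this row contributes b to the product
            product += b
        a >>= 1           # halve a, discarding the remainder
        b <<= 1           # double b
    return product


print(russian_peasant(13, 11))
```

Testing whether a is odd is the same as testing its least significant bit, which is why the method maps directly onto binary shift-and-add multiplication.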
14
(No Transcript)
15
The best of these fast multiplication algorithms
was developed by HP for the HP PA RISC
architecture. In developing the first generation
of this architecture, the HP designers concluded
that a hardware multiply instruction was not
justified, because most multiplies are multiplies
by constants and can be replaced by add-and-shift
instructions (as on the Hawk) and because the
rare multiply of one variable by another could be
done quickly enough in software.
16

The HP PA RISC multiply algorithm is based on the
following notions:
1. First, do the multiply in base 16, so each step
   involves multiplying the multiplicand by the
   least significant 4 bits of the multiplier and
   then adding the partial product to the result.
2. To multiply the multiplicand by one hex digit of
   the multiplier, use an efficient case/select
   control structure to select one of 16 different
   blocks of code for multiplying by a constant.
   Each of these blocks will be only a few
   instructions long because of the use of
   add-and-shift instructions.
3. To avoid the cost of loop control instructions,
   note that the fixed word size implies that each
   multiply can be done in a fixed number of
   iterations (4 hex digits in a 16 bit multiplier,
   or 8 hex digits in a 32 bit multiplier). So, just
   write out the body of the loop 4 or 8 times in
   series and eliminate any need for loop counters.
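
The three notions can be sketched in Python. This is only an illustration of the idea, not HP's actual code: the table TIMES and the function multiply16 are hypothetical names, a list of small functions stands in for the case/select structure, and a 16-bit unsigned multiplier is assumed.

```python
# One "block of code" per hex digit, each built only from shifts and adds.
TIMES = [
    lambda m: 0,
    lambda m: m,
    lambda m: (m << 1),
    lambda m: (m << 1) + m,
    lambda m: (m << 2),
    lambda m: (m << 2) + m,
    lambda m: (m << 2) + (m << 1),
    lambda m: (m << 2) + (m << 1) + m,
    lambda m: (m << 3),
    lambda m: (m << 3) + m,
    lambda m: (m << 3) + (m << 1),
    lambda m: (m << 3) + (m << 1) + m,
    lambda m: (m << 3) + (m << 2),
    lambda m: (m << 3) + (m << 2) + m,
    lambda m: (m << 3) + (m << 2) + (m << 1),
    lambda m: (m << 3) + (m << 2) + (m << 1) + m,
]


def multiply16(multiplicand, multiplier):
    """Multiply by a 16-bit unsigned multiplier, one hex digit per step.

    The four steps are written out in series, so no loop counter is needed.
    """
    product = TIMES[multiplier & 0xF](multiplicand)
    product += TIMES[(multiplier >> 4) & 0xF](multiplicand) << 4
    product += TIMES[(multiplier >> 8) & 0xF](multiplicand) << 8
    product += TIMES[(multiplier >> 12) & 0xF](multiplicand) << 12
    return product


print(multiply16(1234, 5678))
```

A 32-bit multiplier would follow the same pattern with eight unrolled steps instead of four.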

17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
(No Transcript)
29
What are BCD Numbers?

Binary Coded Decimal (BCD) numbers are binary
numbers that are "coded" to represent decimal
numbers: each decimal digit is stored as its own
4-bit binary group.
Coded numbers are "not real numbering systems".
In fact, they are just what they say they are,
"codes that represent actual numbers". Although
the actual numbers can be mathematically
manipulated, codes follow no such rules.
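
A small Python sketch of the encoding, assuming a non-negative integer input; the function name to_bcd is made up for this example.

```python
def to_bcd(n):
    """Return the BCD code of a non-negative integer as 4-bit groups."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


print(to_bcd(365))  # 0011 0110 0101
```

Each decimal digit is encoded independently, so the BCD code of 365 is not the same bit pattern as the plain binary value of 365.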

30
(No Transcript)
31
(No Transcript)
32
(No Transcript)
33
Example 1: BCD (365)_10 -> 0011 0110 0101
Example 2.
34
Addition

Addition is analogous to decimal addition with
normal binary addition taking place from right to
left. For example,

35
(No Transcript)
36
(No Transcript)
37
(No Transcript)
38
Where the result of any addition exceeds 9 (1001),
six (0110) must be added to the sum to
account for the six invalid BCD codes that are
available with a 4-bit number. This is
illustrated in the example below.
39

When one adds two BCD digits:
1. If the binary sum is less than 1010, the
   corresponding BCD sum digit is correct.
2. If the binary sum is greater than or equal to
   1010, add (0110)_2 to the corresponding BCD sum
   digit and produce a carry.
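
The correction rule can be sketched in Python as a digit-by-digit adder. The function name bcd_add and the representation of a BCD number as a list of digits, most significant first and of equal length, are assumptions for this illustration.

```python
def bcd_add(a_digits, b_digits):
    """Add two equal-length lists of BCD digits (most significant digit first)."""
    result = []
    carry = 0
    for a, b in zip(reversed(a_digits), reversed(b_digits)):
        s = a + b + carry          # plain binary addition of one digit pair
        if s >= 0b1010:            # not a valid BCD digit
            s += 0b0110            # skip the six invalid codes
            carry = 1
        else:
            carry = 0
        result.append(s & 0xF)     # keep the low 4 bits as the BCD sum digit
    if carry:
        result.append(1)
    return result[::-1]


print(bcd_add([3, 6, 5], [2, 7, 8]))  # 365 + 278
```

Adding six pushes the sum past 1111, so the carry into the next digit appears in bit 4 and the low four bits hold the corrected digit.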

40
(No Transcript)
41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
8.4.1 Pipelining

1. Data enters a stage of the pipeline and then
   goes through the different stages of the
   arithmetic operation until the final
   computation is completed.
   Note: each stage performs only its specific
   function.
2. A pipeline improves performance by
   overlapping computation; that is, each stage
   operates on different data simultaneously.

45
Compare process time between Pipeline and
Non-Pipeline

Consider this code:
FOR i = 1 TO 100 DO Ai ← (Bi × Ci) + Di
(assume multiplication and addition each require
10 ns)
Non-pipelined: 10 ns × 2 × 100 = 2000 ns

46
An Example on Arithmetic Pipeline

Consider the following code snippet:
FOR i = 1 TO 100 DO Ai ← (Bi × Ci) + Di
Assume that each operation, multiplication and
addition, requires 10 ns to complete. A
non-pipelined uniprocessor takes 20 ns to
calculate Ai, and 2000 ns to execute the code.
A pipelined unit could break this computation into
two stages as shown in the next slide.

47
A two-stage pipeline to implement the program loop

[Figure: two-stage pipeline. Inputs Bi and Ci feed
Stage 1 (×); its result and Di feed Stage 2 (+).
Latches clocked by CLK separate the stages.]
48
Example: A two-stage pipeline to implement the
program loop
During the first 10 ns, the first stage
calculates B1 × C1. In the next 10 ns, stage 2
adds this value to D1 and stores the result in
A1. At the same time, stage 1 multiplies B2
and C2. During the following 10 ns, stage 1
forms B3 × C3 and stage 2 calculates the final
value A2. Instead of the 2000 ns which a
non-pipelined uniprocessor would take, this
pipeline executes the code in 1010 ns.
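
The timing can be checked with a small cycle-by-cycle simulation in Python. This is a sketch under the assumptions of the example (two stages, one 10 ns clock period per stage); the function name run_pipeline and the use of a single latch variable between the stages are choices made for this illustration.

```python
def run_pipeline(B, C, D, clock_ns=10):
    """Simulate the two-stage pipeline; return (A, total time in ns)."""
    n = len(B)
    A = [None] * n
    latch = None      # holds (index, Bi * Ci) between stage 1 and stage 2
    cycles = 0
    i = 0
    while i < n or latch is not None:
        # Stage 2 adds Di to the product latched in the previous cycle.
        if latch is not None:
            j, product = latch
            A[j] = product + D[j]
        # Stage 1 multiplies the next pair in the same cycle, if any remain.
        if i < n:
            latch = (i, B[i] * C[i])
            i += 1
        else:
            latch = None
        cycles += 1
    return A, cycles * clock_ns


B = list(range(1, 101))
C = list(range(101, 201))
D = list(range(201, 301))
A, elapsed = run_pipeline(B, C, D)
print(elapsed)  # 1010
```

The loop runs 101 times for 100 data items: one cycle to fill the pipeline, then one result per cycle.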
49
Speedup for Pipeline

A pipeline's speedup S_n is the time needed to
process n pieces of data using a non-pipelined
unit (n T_1), divided by the time needed to process
the same data using a k-stage pipeline
((k + n - 1) T_k).
The speedup of a pipeline can be expressed as

S_n = n T_1 / ((k + n - 1) T_k)
50
Continue

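Applying the speedup expression to the previous
example:

T_1 = 20 ns, the time to process one piece of data
without a pipeline (one multiplication and one
addition).
T_k = 10 ns, with k = 2 stages (multiplication and
addition).
n = 100

S_100 = (100 × 20 ns) / ((2 + 100 - 1) × 10 ns)
      = 1.98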
51
Speed up

The most popular metrics used to measure the
performance of a pipeline are throughput (the
number of results generated per time unit) and
speedup.
A pipeline's speedup, S_n, is the amount of time
needed to process n pieces of data using a
non-pipelined arithmetic unit, divided by the
time needed to process the same data using a
k-stage pipeline.
S_n = n T_1 / ((k + n - 1) T_k)

52
Calculating Speedup

S_n = n T_1 / ((k + n - 1) T_k)
T_1 is the time required to process one piece of
data using the non-pipelined unit.
T_k is the clock period of the k-stage
pipeline.
The pipelined unit requires k time units, each
of duration T_k, to move the first piece of data
through the pipeline; each of the remaining n - 1
pieces then completes one time unit later. Using
the last example, the speedup can be calculated
as follows:
S_100 = (100 × 20 ns) / ((2 + 100 - 1) × 10 ns)
      = 1.98
As n approaches infinity, n / (k + n - 1)
approaches 1, so S_∞ = T_1 / T_k.
The maximum speedup occurs when each stage has
the same delay.
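
The speedup expression is easy to evaluate directly. The Python function below is a sketch; the name speedup and its parameter names are chosen for this example.

```python
def speedup(n, t1, k, tk):
    """Speedup of a k-stage pipeline with clock period tk over a
    non-pipelined unit that needs t1 per piece of data, for n pieces."""
    return (n * t1) / ((k + n - 1) * tk)


print(round(speedup(100, 20, 2, 10), 2))      # 1.98
print(round(speedup(10**9, 20, 2, 10), 6))    # close to T_1 / T_k
```

Raising n shows the value approaching T_1 / T_k, which is 2 for the figures used in the example.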

53
8.4.2 Lookup Tables

1. Theoretically, any combinatorial circuit can
   be implemented by a ROM configured as a
   lookup table.
2. The ROM is programmed with data such
   that the correct values are output for any
   possible input values.
3. A 4x1 ROM can act like a two-input AND gate.
   The inputs of the AND gate serve as the
   address inputs of the ROM, and the output of
   the ROM corresponds to the AND gate output.

54
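Lookup ROM equivalent to AND gate

[Figure: a 4 x 1 ROM whose address inputs A1 and A0
are driven by x and y and whose data output D0 is
x AND y, shown beside an AND gate with inputs x, y.]

Address (A1 A0)    Data (D0)
0 0                0
0 1                0
1 0                0
1 1                1

The AND gate output for every possible input
combination is stored in the ROM.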
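
The same table can be written as a tiny Python sketch, with a list standing in for the ROM contents. The names AND_ROM and rom_and are made up for this example, and treating x as address bit A1 and y as A0 is an assumption.

```python
# ROM contents indexed by the 2-bit address A1 A0.
AND_ROM = [0, 0, 0, 1]


def rom_and(x, y):
    """Look up x AND y by using the two inputs as the ROM address."""
    address = (x << 1) | y      # x drives A1, y drives A0
    return AND_ROM[address]


for x in (0, 1):
    for y in (0, 1):
        print(x, y, rom_and(x, y))
```

Changing only the four stored values turns the same ROM into any other two-input gate, which is the point of the lookup-table approach.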
