Bit operations and number representations

Provided by: berrinya

1
Bit operations and number representations
2
Representation of Data

All data in the computer's memory is
represented as a sequence of bits.
Bit: the unit of storage; it represents the level of an
electrical charge and can be either 0 or 1.
A bit sequence can represent many different
things.
We will see that the bit string 10000010 can
mean several different things depending on the
representation that is agreed upon.
So, how should we represent integers, characters,
real numbers, strings, and structures in terms of
bits?
Representations must be efficient and convenient.

3
Numbers

Fundamental problem:
Fixed-size representations (e.g., 4 bytes for
integers) can't encode all numbers.
This limits the number range and precision.
Usually sufficient in most applications,
but a potential source of bugs.
Other problems:
How to represent real numbers?
How to represent negative numbers, floating
points...?
- Historically, many different representations.
How to do addition, subtraction, etc.?

4
Base 2 unsigned numbers

0 0 0 0 0 0 0 0 → 0   // 8-bit binary representation of non-negative integers
0 0 0 0 0 0 0 1 → 1
0 0 0 0 0 0 1 0 → 2
0 0 0 0 0 0 1 1 → 3
...
1 1 1 1 1 1 1 1 → 255
An n-bit number $A = a_{n-1} \ldots a_1 a_0$ in base b has the
decimal value $A = \sum_{i=0}^{n-1} a_i \, b^i$
Example for base 2 (binary): $1011_2 = 1 \times 2^0 + 1 \times 2^1 + 0 \times 2^2 + 1 \times 2^3 = 11$

5
Sign/Magnitude representation (also called
signed representation)

Use one of the bits (the first bit, the Most
Significant Bit) as a sign bit.
Use the rest for the magnitude.
e.g. (3 bits):
000 → 0
001 → 1
010 → 2    positive numbers
011 → 3
100 → -0
101 → -1
110 → -2   negative numbers
111 → -3
Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$, where n is
the total number of bits.
For n = 4: $-(2^3-1)$ to $2^3-1$, i.e., -7 to 7.

6
Alternative representations

Most computers don't use a sign-and-magnitude
representation.
Drawbacks of the sign-magnitude representation:
two 0s, one positive and one negative;
addition and subtraction involving negative
numbers are complicated.
Alternatives?
1s complement representation
2s complement representation
These alternative representations simplify the
hardware.

7
Signed 1s complement

Use the first bit to indicate the sign (0 for
positive numbers and 1 for negative numbers).
Positive numbers: the first bit is 0, and the rest is
the binary equivalent of the number.
Negative numbers: represented by the 1s
complement of the corresponding positive number.
1s complement: invert all the bits.
e.g., since +8 = 0000 1000 (0 for the sign, and 000
1000 for 8),
-8 = 1111 0111
How about 0?

8
Number   1s complement
 7       0111
 6       0110
 5       0101
 4       0100
 3       0011
 2       0010
 1       0001
 0       0000
-0       1111
-1       1110
-2       1101
-3       1100
-4       1011
-5       1010
-6       1001
-7       1000

As with the signed representation,
there is a +0 and a -0.

Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$. For n = 4: $-(2^3-1)$ to $2^3-1$, i.e., -7 to 7.

9
Signed 2s complement

Signed 2s complement is the common
representation for signed numbers.
First bit is the sign bit (0 for positive and 1
for negative).
For positive numbers, the rest of the bits are
the binary equivalent of the number.
Negative numbers are represented by the 2s
complement of the corresponding positive number.
2s complement: invert all bits and add 1
(or copy all the bits from right to left up to
and including the first 1, then invert the rest).
Ex: +8 = 0000 1000
-8 = 1111 1000
Single 0.
Addition and subtraction are simplified.
Note the range (one more negative value): $-2^{n-1}$ ...
$2^{n-1}-1$.
Today's standard for representing integers.

Range: $-2^{n-1}$ to $2^{n-1}-1$. For n = 4: $-2^3$ to $2^3-1$, i.e., -8 to 7.
10
Possible Representations summary

Bits   Sign Magnitude   One's Complement   Two's Complement
000    0                0                  0
001    1                1                  1
010    2                2                  2
011    3                3                  3
100    -0               -3                 -4
101    -1               -2                 -3
110    -2               -1                 -2
111    -3               -0                 -1
Notice: positive numbers are represented the same
way (same bit strings) in all representations!
Issues in choosing a representation scheme:
number of zeros, ease of arithmetic operations.

11
The Complement Theory

Radix complement is the inverse with respect to
addition.
The following equation holds when subtracting one
number from another in a FIXED decimal width (here
with 4-digit decimals):
$Y = B - A = B - A + (9999 + 1 - 10000) = B + (9999 - A + 1) - 10000$
9999 - A is the 9's complement (radix-minus-one
complement) of A   // 1s complement in base 2
9999 - A + 1 is the 10's complement (radix
complement) of A   // 2s complement in base 2
In k-bit binary 2s complement, we can benefit from
the same feature (i.e., we can add the 2s
complement and drop the overflow bit instead of
subtracting):
$-A = 2^k - A$ (considering both as
unsigned numbers; look at slide 10)
Hence, $Y = B - A = B + (2^k - A) \pmod{2^k}$,
i.e., we can use addition of the complement instead of
subtraction.
Similarly, $(A + (-A)) \bmod 2^k = (A + 2^k - A) \bmod 2^k = 0$.

12
Why the Complement? Example

No borrowing is necessary when subtracting:
$6142 - 4816 = 6142 + 5184 - 10000 = 11326 - 10000 = 1326$
Note that $5184 = 9999 - 4816 + 1$ (the 10's
complement of 4816).
Due to the fixed width of the registers, the leading
1 is lost automatically due to carry overflow.

13
Two's Complement Negation

Negating a two's complement number:
Start at the least significant bit. Copy through the
first 1; after that, invert each bit.
Example: 0010101100 → 1101010100
Alternatively, invert all bits and add one (to the
least significant bit).
If you negate twice, you arrive at the same
number (as you should):
0011 (3) → 1101 (-3) → 0011 (3)

14
Properties of Complements

Let a k-bit number X have a 1s complement $C_1(X)$ and
a 2s complement $C_2(X)$.
Then the following hold (considering them as
unsigned values):
$X + C_1(X) = 2^k - 1$, since $2^k - 1$ is the
max. number you can represent with k bits
(e.g., 0011 = 3, 1100 = 12, 3 + 12 = 15).
$X + C_2(X) \equiv 0 \pmod{2^k}$, since $C_2(X) = C_1(X) + 1$ by
definition.

15
Decimal conversion

If you are given a bit string, you can find its
decimal equivalent depending on the number
representation used. E.g., find the decimal
equivalent of 1001 0010:
if the signed (sign/magnitude) representation is used:
10010010 is equivalent to -18 (= -(1x16 + 1x2))
if the 1s complement representation is used:
invert all bits → 01101101
find the positive number corresponding to the
inverted string:
1x64 + 1x32 + 1x8 + 1x4 + 1 = 109
10010010 is equivalent to -109
note that this is the reverse of the operation
we would do if we wanted to find the bit
representation of -109 (find the bit rep. of 109,
take its 1s complement).

16
Decimal conversion ctd.

If you are given a bit string, you can find its
decimal equivalent depending on the number
representation used. E.g., find the decimal
equivalent of 1001 0010:
if the 2s complement representation is used:
invert all bits → 0110 1101
add 1 → 0110 1110
find the positive number corresponding to the
negated string (01101110):
1x64 + 1x32 + 1x8 + 1x4 + 1x2 = 110
10010010 is equivalent to -110
note that this is the reverse of the operation
we would do if we wanted to find the bit
representation of -110 (find the bit rep. of 110,
take its 2s complement):
the 2s complement of 01101110 (110 in decimal) is 10010010.

17
Alternative decimal conversion 2s comp.

You can also directly/quickly find the decimal
equivalent of a 2s complement number:
use the usual binary-to-decimal conversion,
but use the negative of the weight at the most
significant bit.

Bit weights (8 bits): $-2^7, 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0$

Hence $10010010_2 = -1 \times 2^7 + 1 \times 2^4 + 1 \times 2^1 = -110_{10}$

18
Conversion to decimal with 32-bit numbers (2s
comp.)

Same idea as 8-bit 2s complement integers, but
the most significant bit has weight $-2^{31}$.

Bit weights (32 bits): $-2^{31}$ (= -2,147,483,648), $2^{30}$, ..., $2^7$, $2^6$ (= 64), $2^5$ (= 32), $2^4$ (= 16), $2^3$ (= 8), $2^2$ (= 4), $2^1$ (= 2), $2^0$ (= 1)
19
Miscellaneous

Converting n-bit numbers into numbers with more
than n bits (sign extension):
copy the most significant bit (the sign bit) into
the new upper bits: 0010 → 0000 0010, 1010 →
1111 1010

(4-bit 2s complement numbers → 8-bit 2s complement
numbers)
20
Important Note!

2s complement (or two's complement) does not
mean a negative number!
2s complement is a representation used to
represent all integers, not just negative
integers!
So 2s complement is a format specification, but
we also talk about the (2s) complement of a
number as its negation,
e.g., when we want to negate a number (-4 → 4 or
4 → -4), not necessarily a negative or positive
number, we may say "take its 2s complement".

21
Arithmetic Overflow
22
Addition & Subtraction

Just like in grade school (carry 1).
Two's complement operations are easy:
subtraction using addition of negative numbers,
e.g., subtracting 6 from 7 is adding -6 to 7:
  0111 (+7)
+ 1010 (-6)
= 1 0001 (+1)   (the carry out of the leftmost bit is dropped)
Overflow is the only problem (the result is too large
to fit in the allocated space):
adding two n-bit numbers does not always yield an n-bit
number.
  0111 (+7)
+ 0001 (+1)
= 1000 (-8)
Note that the term overflow is somewhat misleading: it does
not mean that a carry overflowed, but that the
result does not fit in 4 bits (+8 cannot be
represented in 4-bit 2s complement). The above
subtraction example (7 - 6) is NOT an overflow!

23
Overflow - definition

How can we tell when too many bits in the result
means overflow and when it's OK?
Overflow means the right answer won't fit!
Overflow:
if the signs of the two numbers are the same -AND-
the sign of the result is different from the
sign of the numbers,
then we have overflow!

24
Overflow and 8-bit addition
  01111000
+ 01111000
= 11110000
Overflow! It fits in 8 bits, but it's still overflow!
Reminder: the 2s complement range with 8 bits is -128
to 127.
01111000 = 1x64 + 1x32 + 1x16 + 1x8 = 120 (decimal)
11110000 = -1x128 + 1x64 + 1x32 + 1x16 = -16 (decimal)
25
Detecting Overflow

There can't be an overflow when adding a positive
and a negative number.
There can't be an overflow when the signs are the
same for subtraction.
Why?
Overflow occurs when the value affects the sign:
1) overflow when adding two positives yields a
negative,
2) or adding two negatives gives a positive,
3) or subtracting a negative from a positive gives a
negative (similar to 1),
4) or subtracting a positive from a negative gives a
positive (similar to 2).
Overflow is detected at the hardware level (simple
comparison of the sign bits).
You, as a programmer, are expected to
handle the overflow once it is detected (warn the
user, not let the program crash, etc.).

26
Built-in Types

Numbers: int, float, ...
Characters: char

27
Positive Numbers

If you will only deal with non-negative integers, you
should define your data type as unsigned:
unsigned char
unsigned int
unsigned long int
...
and use the full range and interpret the results
as the positive binary equivalent,
e.g., unsigned char c = 255; // 11111111

28
Numbers

Three most common today:
Unsigned, for non-negative integers
Two's complement, for integers (negative or
positive)
IEEE 754 floating point, for reals
Unless otherwise noted (as unsigned, etc.), always
assume that the numbers we consider are in 2s
complement representation.

29
Integer Ranges

Unsigned: $UMin_n = 0$, $UMax_n = 2^n - 1$
32 bits: 0 ... 4,294,967,295 (unsigned int)
64 bits: 0 ... 18,446,744,073,709,551,615 (unsigned long int)
2s complement: $TMin_n = -2^{n-1}$, $TMax_n = 2^{n-1} - 1$
32 bits: -2,147,483,648 ... 2,147,483,647 (int, long int)
64 bits: -9,223,372,036,854,775,808 ... 9,223,372,036,854,775,807
Note: C/C++ numeric ranges are platform dependent!

30
Limits

You can include limits.h, which defines these
ranges (depending on your platform/computer):
#include <limits.h>
Tip: type #include <limits.h> (or any other
filename) in your program, then go to that line,
right-click on the file name and choose Open
Document. That will bring up this header file.
You can do this in general, and it will save you
the effort of locating the file.

31
Bit Operations
32
Why we need to work with bits

Sometimes one bit is enough to store your data,
say the gender of the student (e.g., 0 for men, 1
for women). We don't have a 1-bit type, so for
gender you will have to use a char type
variable.
But if you need, say, 8 such 1-bit variables, e.g.,
to record whether the student was present at each of the
times when attendance was taken, then you can
actually combine all of them into one char variable:
class student {
private:
    unsigned char attendance; // now I can fit 8 bits into this; we will see how
};

33
Packing bits

Packing 8 1-bit variables into 1 char variable
is easy.
Say you know that the student was present in the
first 3 classes when attendance was taken and not
in the last 5. The variable attendance can be set
as:
unsigned char attendance = 0x07; // 0000 0111

where the last 3 bits represent the first 3
attendances (just a choice; it could be the other
way around as well). But in addition to being
able to set a variable's value, we need to be
able to handle each bit separately, for which we
need bit operators.
34
Bit Operators

&    bitwise and
|    bitwise or
^    bitwise exclusive or
~    complement
<<   shift left
>>   shift right
Do not confuse & with &&, or | with ||.

35
Bitwise Operations AND

Take the AND of the two numbers, bit by bit.

char x, y, z;
x = 0xb5; y = 0x6c; z = x & y;
x = 1011 0101
y = 0110 1100
z = 0010 0100 (0x24)
36
Bitwise AND

unsigned char c1, c2, c3;
c1 = 0x45;
c2 = 0x71;
c3 = c1 & c2;
c1: 0100 0101
c2: 0111 0001
c3: 0100 0001 (0x41 = 4x16 + 1 = 65 in decimal)

37
Bitwise OR

Take the OR of the two numbers, bit by bit.
unsigned char c1, c2, c3;
c1 = 0x45;
c2 = 0x71;
c3 = c1 | c2;
c1: 0100 0101
c2: 0111 0001
c3: 0111 0101 (0x75 = 7x16 + 5 = 117 in decimal)

38
Bitwise Complement

The complement operation (~) converts bits with value
0 to 1 and bits with value 1 to 0.
unsigned char b1 = 0x01; // 0000 0001
unsigned char b4 = 0x08; // 0000 1000
b4 = ~b1;                // 1111 1110

39
Self-Quiz: what do these two statements do?

char x = 0xA5;
if (x & 0x01)
// what does this mean?
x = x | 0x02;
// what happened to x?
These are two of the most important bit
operations! We will see more later, but basically
you can access a particular bit and you can set a
particular bit with these two operations.

40
Logic Operators versus Bitwise Logic Ops.

The operators &&, || and ! are not bitwise logic
operators!
The result of && and || is an integral value
(bool in C++, int in C) of 0 (every bit is zero) or 1
(least significant bit is 1, all the others are
0).
The if statement (e.g., if (a && b)) treats any
non-zero value as TRUE, and only the value 0 as
FALSE.

41
Shift Operators

Shift operators move bits left or right, filling
the vacated positions with 0s (the right shift of signed
numbers is an exception; see below).
<< means shift left
>> means shift right
y = x << 1;   // x is what is shifted; 1 is how many times it is shifted
42
Bit operations left shift

Suppose we want to shift the bits of a number N
k bits to the left,
denoted N << k:
drop the leftmost k bits,
append k 0s to the right.
Ex:
unsigned char c2 = 0x1C; // 00011100
c = c2 << 1;             // 00111000
Note that shifting a number left by one position
is equal to multiplying it by 2 (provided that
the result is in the range).
What is the effect of shifting a number left by
3?

43
Bit Shifting as Multiplication

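Shift left (x << 1) multiplies by 2:
- works as multiplication for both unsigned and 2s
complement numbers
- can overflow.
e.g., 1101 (-3) << 1 gives 1010 (-6).
Why is 1101 = -3? (remember 2s complement
numbers)
$1101 = 1 \times (-2^3) + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = -8 + 4 + 1 = -3$
$1010 = 1 \times (-2^3) + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -8 + 2 = -6$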

44
Bit operations right shift

As opposed to the left shift, the right shift
works differently for signed and unsigned numbers.
Suppose we want to shift N by k to the right
(denoted N >> k).
For unsigned numbers:
drop the rightmost k bits,
append k 0s to the left.
For signed numbers:
drop the rightmost k bits,
append the sign bit k times to the left.

45
Right shift operator: examples

Signed (int, short and long are signed unless you
specify unsigned; whether plain char is signed is
implementation-defined, but it is signed on typical x86 compilers):
positives:
char c = 8;   // 0000 1000
c = c >> 2;   // 0000 0010 = 2
negatives:
char c = -8;  // 1111 1000 in 2s comp. (0xF8)
c = c >> 2;   // 1111 1110 = -2
This is called an arithmetic shift.
Unsigned:
unsigned char d = 0xF8; // 1111 1000 (248)
d = d >> 2;             // 0011 1110 (62)
This is called a logical shift.

46
Right shift operator: details

negatives:
char c = -8;  // 1111 1000 in 2s comp.
c = c >> 2;   // 1111 1110 = -2
Reminder: why is -8 = 1111 1000?
1111 1000 = -(0000 0111 + 1) = -(8)
(1s complement + 1)
What if we had filled with 0s instead of the sign
bit?
We would get 0011 1110 = 62, which would not satisfy the
shift-as-division concept.

47
Bit Shifting as Division summary
48
Bit Shifting as Multiplication/Division

Why useful?
Simpler, thus faster, than general multiplication/
division.
This is a standard compiler optimization.
Can shift multiple positions at once:
multiplies or divides by the corresponding power of
2.
a << 5 (multiply by $2^5$)
a >> 5 (divide by $2^5$; for negative a, the arithmetic shift rounds toward minus infinity, while / rounds toward zero)

49
Signed vs Unsigned: Important

What happens when we say
unsigned char c = 0x80; // 1000 0000
char d = 0x80;          // 1000 0000
1) These two statements fill the corresponding byte
with the same bit string, which is clearly
indicated by the hex number 0x80.
2) But their interpreted decimal values are
different, because we told the computer that in
one case we will not use the sign bit for
magnitude (c), and in the other, we said that the
most significant bit (MSB) should be reserved for
the sign bit.
c is 128
d is -128 (-128 + 0 = -128, since our machine uses the
2s complement representation)
3) The shift operation will depend on the type:
c = c >> 2; // will fill with 0s: c will be 0010 0000 = 0x20
d = d >> 2; // will fill with the sign bit: d will be 1110 0000 = 0xE0
4) When you print these, the characters that
correspond to the values in c and d will be
printed (the same character in this case, but it does
not have to be):
cout << c; // will print what corresponds to 128 (which is Ç in code page 437)
cout << d; // will print what corresponds to -128
Really fine print, which does not relate to
this topic: there isn't a character mapped to
-128, but what is printed is what corresponds
to 128 (256 - 128 = 128), since the character mapping
wraps around: the same byte, 0x80, is written to the
output in both cases.

50
Signed vs Unsigned: Important

What happens when we say
unsigned char c = 0x80; // 1000 0000
char d = 0x80;          // 1000 0000
...
5)
myarray[c] will access the element at index 128
myarray[d] will access the element at index -128
→ problem!
Hence, even though the internal bit
representations are the same, the interpretations
of signed and unsigned numbers are different,
which may sometimes cause problems.

51
Self-Quiz

What are the resulting values of x, y and z?
char x, y, z;
x = 0x33;
x = (x << 3) 0x0F;
y = (x >> 1) 0x0F;
z = x y;

What are all these for?
knowing how numbers are represented and the
ranges of various data types (preventing
unintended behaviour)
to set bit flags
to pack 8 bits into a byte, etc.
used to set a flag byte where one bit corresponds
to one flag/error, etc. (see next slide)
to pack the bits of binary images
...

53
Example 1: setting, testing, and clearing the
bits of bytes

const int IO_ERROR = 0x01;       // LSB (1st right-most bit)
const int CHANNEL_DOWN = 0x10;
char flags = 0;
// if ever a CHANNEL_DOWN event happens, set its corresponding bit in the flags variable
flags = flags | CHANNEL_DOWN;    // set the 5th bit
...
// to check what errors may have happened...
if ((flags & IO_ERROR) != 0)              // check the 1st bit
    cout << "I/O error flag is set";
else if ((flags & CHANNEL_DOWN) != 0)     // check the 5th bit
    cout << "Channel down error flag is set";
flags = flags & ~IO_ERROR;       // clear the ERROR flag
// This is also called masking
54
Example 2: packed bitmaps

Similar to the previous code (flags), in packing
a binary (0/1) image, we need to set the bits of
a byte independently.
Say your image's first byte in row y needs to
be 00001001 (the first 8 pixels in that row).
You need to set the ON bits (the 5th and 8th bits
from the left, here).
We do this by having a bitmask (0x80) that we
shift to obtain a byte with only one column set
(e.g., 00001000 or 00000001) and then
bitwise-OR it with the already accumulated data.
...

55
Example packed bitmaps

Say your image's first byte in row y needs to
be 00001001 (the first 8 pixels in that row).
You need to set the ON bits (the 5th and 8th bits
from the left, here).
Bitmap[y][0] = 0;
BitMask = 0x80;                          // 1000 0000
// Shift BitMask to have the single 1 in the appropriate column
BitMask = BitMask >> 4;                  // 0000 1000
Bitmap[y][0] = Bitmap[y][0] | BitMask;   // 0000 1000
BitMask = 0x80;
// Shift BitMask to have the single 1 in the appropriate column
BitMask = BitMask >> 7;                  // 0000 0001
Bitmap[y][0] = Bitmap[y][0] | BitMask;   // 0000 1001

56
General case for packed bitmaps (SKIP)

Assume you are reading a special file format
where only the ON pixels are marked, by their
column number:
while (!input.eof()) {
    BitMask = 0x80; // 1000 0000
    ...
    /* read the next ON/black column (where to put the 1) */
    if (input >> col) {
        // col is counted from the left, starting from 0
        BitMask = BitMask >> (col % 8);
        // we take mod so that when col = 9, it still works,
        // but for the next byte
        BitMap[row][col/8] = BitMap[row][col/8] | BitMask;
    }
    ...
}

57
Floating Point Representation
58
Floating Point (a brief look)

We need a way to represent:
numbers with fractions, e.g., 3.1416
very small numbers, e.g., .000000001
very large numbers, e.g., $3.1 \times 10^{20}$
Solution: a floating (decimal) point
representation.
The IEEE 754 floating point representation is the
standard:
$\pm\,\text{mantissa} \times 2^{\text{exponent}}$ (stored fields: sign, mantissa, exponent)
single precision: 1-bit sign, 23-bit significand
(mantissa), 8-bit exponent
more bits for the significand give more accuracy
more bits for the exponent increase the range
Range: approximately $\pm 10^{-44}$ to $\pm 10^{38}$

59
IEEE Floating Point Std. - Details

The Mantissa
The mantissa, also known as the significand,
represents the precision bits of the number.
To find out the value of the implicit leading
bit, consider that any number can be expressed in
scientific notation in many different ways. For
example, the number five can be represented as
any of these
$5.00 \times 10^0$
$0.05 \times 10^2$
$5000 \times 10^{-3}$
In order to maximize the quantity of
representable numbers, floating-point numbers are
typically stored in normalized form. This
basically puts the radix point after the first
non-zero digit. In normalized form, five is
represented as $5.0 \times 10^0$.

60
Floating Point: what floats?

For simplicity, let's use a decimal
representation and assume we have 1 digit for the
sign,
8 digits for the mantissa and 3 digits for the
exponent:
± | _ _ _ _ _ _ _ _ | _ _ _
We will illustrate the format for the number
0.000000000023:
$0.000000000023 = .23000000 \times 10^{-10}$
So it will be stored as: + | .23000000 | -10
(sign | mantissa | exponent)
The actual IEEE floating point representation
follows this principle, but differs from it in
details:
- normalization (the radix point comes
after the first nonzero digit)
- binary instead of decimal
- exponent (not sign/magnitude but
biased)

61
IEEE Floating Point Std. - Normalization

Since in binary the only possible non-zero digit is 1, in
the IEEE floating point standard we can just
assume a leading digit of 1 and don't need to
represent it explicitly. As a result, the
mantissa effectively has 24 bits of resolution,
by way of 23 fraction bits.

62
IEEE Floating Point Std. - Binary

We convert decimal to binary simply as follows:
decimal: $-0.75 = -(0.5 + 0.25)$
binary: $-.11$ (since we have bits for
$2^2\ 2^1\ 2^0\ .\ 2^{-1}\ 2^{-2}$ etc.)
canonical form: $-1.1 \times 2^{-1}$ (note: shifting the
radix point by k is the same as multiplying/dividing by
$\text{radix}^k$)
Stored sign: -
Stored mantissa: .100000000... (the leading
bit is always 1, so it is not stored)
Stored exponent: -1 (basically; I won't go
into details, but a bias is actually used)
decimal: $8.625 = 8 + 0.5 + 0.125$
binary: 1000.101
canonical form: $1.000101 \times 2^3$
Stored sign: +
Stored mantissa: .000101000... (the leading bit
is always 1, so it is not stored)
Stored exponent: 3 (basically; I won't go
into details, but a bias is actually used)

63
Bias: why?

Since we want to represent both positive and
negative exponents, e.g., $10^{11}$ and $10^{-11}$, we can
do one of two things:
reserve a separate sign bit for the exponent, or
use only non-negative stored exponents, together with a bias.
The bias (e.g., 127) is subtracted from whatever
is stored in the exponent field to find the real
exponent:
stored exponent = 0 → real exponent = 0 - 127 = -127
stored exponent = 227 → real exponent = 227 - 127 = 100

64
Bias of the Exponent

The Exponent
The exponent field needs to represent both
positive and negative exponents. To do this, a
bias is added to the actual exponent in order to
get the stored exponent.
For IEEE single-precision floats, this value is
127.
Thus,
if the real exponent is zero, 127 is stored in
the exponent field.
if 200 is stored in the exponent field, it
actually indicates a real exponent of
(200-127), or 73.
Exponents of -127 (all 0s) and 128 (all 1s) are
reserved for special numbers: all 0s for zero and denormalized numbers, all 1s for infinity and NaN.

65
IEEE 754 floating-point standard summary

Leading 1 bit of the significand is implicit.
Exponent is biased to make sorting easier:
all 0s is the smallest exponent, all 1s is the largest;
bias of 127 for single precision (note: the bias is added
while storing and subtracted while converting to decimal).
Decimal equivalent: $(-1)^{\text{sign}} \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$
Example:
decimal: $-0.75 = -(0.5 + 0.25)$
binary: $-.11$
canonical form: $-1.1 \times 2^{-1}$ (note: shifting the
radix point by k is the same as multiplying/dividing by
$\text{radix}^k$)
stored exponent: $-1 + 127 = 126$ = 01111110
Resulting IEEE single precision representation:
1 01111110 10000000000000000000000
(sign, exponent, mantissa)
66
A more complex example

Let us encode the decimal number -118.625 using
the IEEE 754 system.
First we need to get the sign, the exponent and
the fraction. Because it is a negative number,
the sign is "1".
Now, we write the number (without the sign, i.e.,
unsigned, no two's complement) using binary
notation. The result is 1110110.101 (notice how
we represent .625).
Next, let's move the radix point left, leaving
only a 1 at its left:
$1110110.101 = 1.110110101 \times 2^6$. This is the
normalized floating point number. The mantissa is
the part to the right of the radix point, filled
with 0s on the right until we get all 23 bits.
That is 11011010100000000000000.
The exponent is 6, but we need to bias it and
convert it to binary (so the most negative
exponent is stored as 0, and all exponents are
non-negative binary numbers). For the 32-bit IEEE
754 format, the bias is 127, and so the stored
exponent is 6 + 127 = 133. In binary, this is
written as 10000101.
Putting them all together:
1 10000101 11011010100000000000000
This example is from Wikipedia.

67
IEEE Floating Point Ranges
Explanation for the minimum positive (just a sign
change for negative):
0 00000001 00000000000000000000000
$= (0 + 1) \times 2^{1-127} = 1.0 \times 2^{-126}$
(1 sign bit, 8 bits exponent, 23 bits mantissa)
Note 1: Exponent 00000000 is reserved for
special numbers, so the min is 00000001.
Note 2: Approx. conversion between powers of 2 and powers of 10:
e.g., $2^{-149} \approx 10^{-44.85}$, since
$2^{3.3} \approx 10$ and $149/3.3 \approx 45$.
68
IEEE Floating Point Ranges
Explanation for the maximum positive (just change the
sign for negative):
0 11111110 11111111111111111111111
$= (1 - 2^{-23} + 1) \times 2^{254-127} = (2 - 2^{-23}) \times 2^{127}$
(1 sign bit, 8 bits exponent, 23 bits mantissa)
Note 1: Since it represents the part after the
radix point, $.11111111111111111111111 = 1 - 2^{-23}$, just as
$.11 = 1 - 2^{-2}$.
Note 2: 11111111 as the exponent is
reserved for special numbers, so the max is 11111110.
69
Summary

Computer arithmetic is constrained by limited
precision.
Bit patterns have no inherent meaning, but
standards do exist:
two's complement
IEEE 754 floating point
Computer instructions determine the meaning of the
bit patterns.
http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html

70
Floating Point Complexities

In addition to overflow, we can have underflow:
a number whose magnitude is smaller than what is
representable (e.g., < $2^{-126}$).
Accuracy can be a big problem:
IEEE 754 keeps two extra bits, guard and round;
four rounding modes.
A positive number divided by zero yields infinity.
Zero divided by zero yields not a number (NaN).
Other complexities...
