Title: Bit operations and number representations
1Bit operations and number representations
2Representation of Data
- All data in the computer's memory is represented as a sequence of bits.
- Bit: the unit of storage; it represents the level of an electrical charge and can be either 0 or 1.
- A bit sequence can represent many different things.
- We will see that the bit string 10000010 can mean several different things depending on the representation that is agreed upon.
- So, how should we represent integers, characters, real numbers, strings, and structures in terms of bits?
- Representations must be efficient and convenient.
3Numbers
- Fundamental problem
- A fixed-size representation (e.g. 4 bytes for integers) can't encode all numbers.
- It limits the range and precision of numbers.
- Usually sufficient in most applications,
- but a potential source of bugs.
- Other problems
- How to represent real numbers?
- How to represent negative numbers, floating points...?
- Historically, there have been many different representations.
- How to do addition, subtraction, etc.?
4Base 2 unsigned numbers
- 8-bit binary representation of non-negative integers:
- 0 0 0 0 0 0 0 0 = 0
- 0 0 0 0 0 0 0 1 = 1
- 0 0 0 0 0 0 1 0 = 2
- 0 0 0 0 0 0 1 1 = 3
- ...
- 1 1 1 1 1 1 1 1 = 255
- Representation: an n-bit number A in base b, with digits $a_{n-1} \dots a_1 a_0$, has decimal value $A = \sum_{i=0}^{n-1} a_i \, b^i$.
- Example for base 2 (binary): $1011_2 = 1 \times 2^0 + 1 \times 2^1 + 0 \times 2^2 + 1 \times 2^3 = 11_{10}$
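As a rough sketch of the positional formula above, the value of a digit string can be accumulated digit by digit. The helper name `positional_value` is made up for illustration (it is not part of the slides), and the sketch assumes a base between 2 and 10 and only valid digits in the input.

```cpp
#include <string>

// Hypothetical helper (not from the slides): evaluates a digit string in the
// given base (2..10). Multiplying the running value by the base before adding
// each digit is equivalent to summing digit * base^position.
unsigned long positional_value(const std::string& digits, unsigned base) {
    unsigned long value = 0;
    for (char ch : digits) {
        if (ch == ' ') continue;  // allow grouping spaces such as "1001 0010"
        value = value * base + static_cast<unsigned long>(ch - '0');
    }
    return value;
}
```

For instance, `positional_value("1011", 2)` evaluates the binary example from the slide.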
5Sign/Magnitude representation (also called signed representation)
- Use one of the bits (the first bit, i.e. the Most Significant Bit) as a sign bit.
- Use the rest for the magnitude.
- e.g.
- 000 = +0
- 001 = +1
- 010 = +2 (positive numbers)
- 011 = +3
- 100 = -0
- 101 = -1
- 110 = -2 (negative numbers)
- 111 = -3
- Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$, where n is the total number of bits.
- For n = 4: $[-(2^3-1),\ 2^3-1] = [-7, 7]$
6Alternative representations
- Most computers don't use a sign and magnitude representation.
- Drawbacks of the sign/magnitude representation:
- two 0s, one positive and one negative
- addition and subtraction involving negative numbers are complicated
- Alternatives?
- 1's complement representation
- 2's complement representation
- These different representations simplify the hardware.
7Signed 1's complement
- Use the first bit to indicate the sign (0 for positive numbers and 1 for negative numbers).
- Positive numbers: the first bit is 0, and the rest is the binary equivalent of the number.
- Negative numbers: represented by the 1's complement of the corresponding positive number.
- 1's complement: invert all the bits.
- e.g. since +8 = 0000 1000 (0 for the sign, and 000 1000 for 8),
- -8 = 1111 0111
- How about 0?
8Signed 1's complement ctd.
Number   1's complement
  7      0111
  6      0110
  5      0101
  4      0100
  3      0011
  2      0010
  1      0001
 +0      0000
 -0      1111
 -1      1110
 -2      1101
 -3      1100
 -4      1011
 -5      1010
 -6      1001
 -7      1000
- As with the signed representation, there is a +0 and a -0.
- Range: $[-(2^{n-1}-1),\ 2^{n-1}-1]$. For n = 4: $[-(2^3-1),\ 2^3-1] = [-7, 7]$
9Signed 2's complement
- Signed 2's complement is the common representation for signed numbers.
- The first bit is the sign bit (0 for positive and 1 for negative).
- For positive numbers, the rest of the bits are the binary equivalent of the number.
- Negative numbers are represented by the 2's complement of the corresponding positive number.
- 2's complement: invert all bits and add 1
- (or copy all the bits from right to left up to and including the first 1, then invert the rest).
- Ex: +8 = 0000 1000
- -8 = 1111 1000
- There is a single 0.
- Addition and subtraction are simplified.
- Note the range (one more negative): $-2^{n-1}$ ... $2^{n-1}-1$
- Today's standard for representing integers.
- Range: $[-2^{n-1},\ 2^{n-1}-1]$. For n = 4: $[-2^3,\ 2^3-1] = [-8, 7]$
10Possible Representations summary
Bits   Sign Magnitude   One's Complement   Two's Complement
000         +0                +0                  0
001         +1                +1                 +1
010         +2                +2                 +2
011         +3                +3                 +3
100         -0                -3                 -4
101         -1                -2                 -3
110         -2                -1                 -2
111         -3                -0                 -1
- Notice: positive numbers are represented the same way (same bit strings) in all representations!
- Issues in choosing a representation scheme: number of zeros, ease of arithmetic operations.
11The Complement Theory
- The radix complement is the inverse with respect to addition.
- The following equation holds when subtracting one number from another in FIXED decimal width (here with 4-digit decimals):
$$Y = B - A = B - A + (9999 + 1 - 10000) = B + (9999 - A + 1) - 10000$$
- $9999 - A$ is the 9's complement (radix-minus-one complement) of A // 1's complement in base 2
- $9999 - A + 1$ is the 10's complement (radix complement) of A // 2's complement in base 2
- In k-bit binary 2's complement, we can benefit from the same feature (i.e. we can add the 2's complement and drop the overflow bit, instead of subtracting):
- $-A \equiv 2^k - A$ (considering them as unsigned numbers; look at slide 10)
- Hence, $Y = B - A \equiv B + (2^k - A) \pmod{2^k}$,
- i.e. we can use addition of the complement instead of subtraction.
- Similarly, $(A + (-A)) \bmod 2^k = (2^k - A + A) \bmod 2^k = 0$
12Why the Complement? Example
- No borrowing is necessary when subtracting:
- Direct subtraction: 6142 - 4816 = 1326
- With the complement: 6142 + 5184 = 11326, and 11326 - 10000 = 1326
- Note that 5184 = 9999 - 4816 + 1 (the 10's complement of 4816).
- Due to the fixed width of the registers, the leading 1 is lost automatically through carry overflow.
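The decimal example can be replayed in code. This is only a sketch: the names `kModulus`, `tens_complement` and `subtract_by_complement` are invented for illustration, and the "register" is simulated by reducing modulo 10000. Both inputs are assumed to be in the range 0 to 9999.

```cpp
// Hypothetical helpers, not from the slides: a simulated 4-digit decimal register.
constexpr unsigned kModulus = 10000;  // a 4-digit register wraps around at 10^4

// 10's complement of a (0 <= a < 10000): 9999 - a + 1, kept to 4 digits.
constexpr unsigned tens_complement(unsigned a) {
    return (9999 - a + 1) % kModulus;
}

// Computes b - a by adding the complement of a; the modulo drops the carry
// out of the 4th digit, which is what a fixed-width register does for free.
constexpr unsigned subtract_by_complement(unsigned b, unsigned a) {
    return (b + tens_complement(a)) % kModulus;
}

static_assert(tens_complement(4816) == 5184, "10's complement from the slide");
static_assert(subtract_by_complement(6142, 4816) == 1326, "6142 - 4816");
```

When b is smaller than a, the result is the 10's complement encoding of the negative difference rather than a plain magnitude, which mirrors how 2's complement results are read in binary.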
13Two's Complement Negation
- Negating a two's complement number:
- Start at the least significant bit. Copy through the first 1; after that, invert each bit.
- Example: 0010101100
- becomes 1101010100
- Alternatively, invert all bits and add one at the least significant bit.
- If you negate twice, you arrive back at the same number (as you should):
- 0011 = 3
- 1101 = -3
- 0011 = 3
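The "invert all bits and add one" rule can be sketched for an arbitrary width k. The function name `negate_k_bits` is hypothetical, and the sketch assumes that `std::uint32_t` is at least as wide as `unsigned int` so the arithmetic wraps modulo 2^32 before masking.

```cpp
#include <cstdint>

// Hypothetical helper, not from the slides: 2's complement negation of a
// k-bit pattern (1 <= k <= 32). Invert all bits, add 1, keep the low k bits.
constexpr std::uint32_t negate_k_bits(std::uint32_t x, unsigned k) {
    return (~x + 1u) & ((k >= 32) ? 0xFFFFFFFFu : ((1u << k) - 1u));
}

// 0010101100 (0x0AC) negates to 1101010100 (0x354) in 10 bits, as on the slide.
static_assert(negate_k_bits(0x0AC, 10) == 0x354, "10-bit example");
// 0011 (3) negates to 1101, and negating again gives 0011 back.
static_assert(negate_k_bits(0x3, 4) == 0xD, "4-bit example");
static_assert(negate_k_bits(negate_k_bits(0x3, 4), 4) == 0x3, "double negation");
```

Masking to k bits plays the role of the fixed register width: any carry out of the top bit is simply dropped.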
14Properties of Complements
- Let a k-bit number X have a 1's complement 1X and a 2's complement 2X.
- Then, the following hold (considering them as unsigned values):
- $X + 1X = 2^k - 1$, since $2^k - 1$ is the max. number you can represent with k bits
- (e.g., 0011 = 3 and 1100 = 12, and 3 + 12 = 15)
- $X + 2X \equiv 0 \pmod{2^k}$, since 2X = 1X + 1 by definition
15Decimal conversion
- If you are given a bit string, you can find the decimal equivalent depending on the number representation used. E.g. find the decimal equivalent of 1001 0010:
- If the signed (sign/magnitude) representation is used:
- 10010010 is equivalent to -18, i.e. -(1x16 + 1x2)
- If the 1's complement representation is used:
- invert all bits => 01101101
- find the positive number corresponding to the negated string: 1x64 + 1x32 + 1x8 + 1x4 + 1x1 = 109
- so 10010010 is equivalent to -109
- Note that this is the reverse of what we would do if we wanted to find the bit representation of -109 (find the bit rep. of 109, take the 1's complement).
16Decimal conversion ctd.
- If you are given a bit string, you can find the decimal equivalent depending on the number representation used. E.g. find the decimal equivalent of 1001 0010:
- If the 2's complement representation is used:
- invert all bits => 0110 1101
- add 1 => 0110 1110
- find the positive number corresponding to the negated string (01101110): 1x64 + 1x32 + 1x8 + 1x4 + 1x2 = 110
- so 10010010 is equivalent to -110
- Note that this is the reverse of what we would do if we wanted to find the bit representation of -110 (find the bit rep. of 110, take the 2's complement).
- The 2's complement of 01101110 (110 in decimal) is 10010010.
17Alternative decimal conversion 2's comp.
- You can also directly/quickly find the decimal equivalent of a 2's complement number:
- use the usual binary to decimal conversion, but give the most significant bit a negative weight.
- Bit weights for 8 bits: $-2^7,\ 2^6,\ 2^5,\ 2^4,\ 2^3,\ 2^2,\ 2^1,\ 2^0$
- Hence $10010010_2 = -1 \times 2^7 + 1 \times 2^4 + 1 \times 2^1 = -110_{10}$
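The three readings of the same 8-bit string can be put side by side in a short sketch. The function names below are illustrative only; since C++ `int` has no negative zero, the patterns for -0 in sign/magnitude and 1's complement simply come back as 0.

```cpp
#include <cstdint>

// Hypothetical helpers, not from the slides: decode one 8-bit pattern under
// each of the three representations discussed above.

// Sign/magnitude: the MSB is the sign, the low 7 bits are the magnitude.
constexpr int as_sign_magnitude(std::uint8_t bits) {
    return (bits & 0x80) ? -static_cast<int>(bits & 0x7F)
                         : static_cast<int>(bits & 0x7F);
}

// 1's complement: a negative value is the bitwise inverse of its magnitude.
constexpr int as_ones_complement(std::uint8_t bits) {
    return (bits & 0x80) ? -static_cast<int>(static_cast<std::uint8_t>(~bits))
                         : static_cast<int>(bits);
}

// 2's complement: the MSB has weight -128, the other bits keep their weights.
constexpr int as_twos_complement(std::uint8_t bits) {
    return static_cast<int>(bits & 0x7F) - ((bits & 0x80) ? 128 : 0);
}

// 1001 0010 is 0x92.
static_assert(as_sign_magnitude(0x92) == -18, "sign/magnitude reading");
static_assert(as_ones_complement(0x92) == -109, "1's complement reading");
static_assert(as_twos_complement(0x92) == -110, "2's complement reading");
```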
18Conversion to decimal with 32-bit numbers, 2's comp.
- Same idea as 8-bit 2's complement integers, but the most significant bit has weight $-2^{31}$.
- Bit weights: $-2^{31},\ 2^{30},\ \dots,\ 2^6,\ \dots,\ 2^0$, i.e. -2,147,483,648, ..., 64, 32, 16, 8, 4, 2, 1
19Miscellaneous
- Converting n-bit numbers into numbers with more than n bits:
- copy the most significant bit (the sign bit) into the other bits
- 4-bit 2's complement numbers -> 8-bit 2's complement numbers:
- 0010 -> 0000 0010
- 1010 -> 1111 1010
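Sign extension from 4 to 8 bits can be sketched as follows. The helper name `sign_extend_4_to_8` is an assumption made for this example; in practice the compiler performs the same step automatically when a narrower signed type is converted to a wider one.

```cpp
#include <cstdint>

// Hypothetical helper, not from the slides: widen a 4-bit 2's complement
// value (held in the low nibble) to 8 bits by copying its sign bit (bit 3)
// into bits 4..7.
constexpr std::uint8_t sign_extend_4_to_8(std::uint8_t nibble) {
    return (nibble & 0x08) ? static_cast<std::uint8_t>((nibble & 0x0F) | 0xF0)
                           : static_cast<std::uint8_t>(nibble & 0x0F);
}

static_assert(sign_extend_4_to_8(0x2) == 0x02, "0010 -> 0000 0010");
static_assert(sign_extend_4_to_8(0xA) == 0xFA, "1010 -> 1111 1010");
```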
20Important Note!
- 2's complement (or two's complement) does not mean a negative number!
- 2's complement is a representation used to represent all integers, not just negative integers!
- So 2's complement is a format specification, but we also talk about the (2's) complement of a number as its negation,
- e.g. when we want to negate a number (-4 -> 4 or 4 -> -4), not necessarily a negative/positive number, we may say "take its 2's complement".
21Arithmetic Overflow
22Addition & Subtraction
- Just like in grade school (carry 1).
- Two's complement operations are easy:
- subtraction using addition of negative numbers
- subtracting 6 from 7 is adding -6 to 7:
      0111   (+7)
    + 1010   (-6)
    ------
    1 0001   (+1)
- Overflow is the only problem (result too large to fit in the allocated space):
- adding two n-bit numbers does not always yield an n-bit number
      0111   (+7)
    + 0001   (+1)
    ------
      1000   (-8)
- Note that the term "overflow" is somewhat misleading: it does not mean that a carry overflowed, but that the result does not fit in 4 bits (8 cannot be represented in 4-bit 2's complement). The above subtraction example (7 - 6) is NOT an overflow!
23Overflow - definition
- How can we tell when too many bits in the result means overflow and when it's OK?
- Overflow means the right answer won't fit!
- Overflow (for addition):
- if the signs of the two numbers are the same -AND-
- the sign of the result is different from the sign of the numbers,
- then we have overflow!
24Overflow and 8 bit addition
       1111        (carries)
       01111000    (+120)
     + 01111000    (+120)
     ----------
       11110000    (-16)   Overflow!
- It fits, but it's still overflow!
- Reminder: the max. 2's comp. range with 8 bits is -128 to 127.
- 01111000 = 1x64 + 1x32 + 1x16 + 1x8 = 120
- 11110000 = -1x128 + 1x64 + 1x32 + 1x16 = -16
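The sign rule for addition can be tried out on the 8-bit example. This is a sketch, not how any particular CPU exposes its overflow flag: `AddResult` and `add8` are names invented here, and the addition is done on unsigned bytes so that the wrap-around itself is well defined in C++.

```cpp
#include <cstdint>

// Hypothetical types and helper, not from the slides.
struct AddResult {
    std::int8_t sum;  // the 8-bit result as a 2's complement value
    bool overflow;    // true if the mathematically correct sum does not fit
};

// Adds two 8-bit 2's complement numbers and applies the sign rule:
// overflow <=> both operands have the same sign AND the result's sign differs.
inline AddResult add8(std::int8_t a, std::int8_t b) {
    const std::uint8_t ua = static_cast<std::uint8_t>(a);
    const std::uint8_t ub = static_cast<std::uint8_t>(b);
    const std::uint8_t usum = static_cast<std::uint8_t>(ua + ub);  // wraps mod 256

    const bool same_sign = ((ua ^ ub) & 0x80) == 0;
    const bool sign_changed = ((ua ^ usum) & 0x80) != 0;

    // Reinterpret the byte as 2's complement without relying on narrowing rules.
    const int value = (usum < 128) ? usum : usum - 256;
    return {static_cast<std::int8_t>(value), same_sign && sign_changed};
}
```

Calling `add8(120, 120)` reproduces the slide: the stored sum is -16 and the overflow flag is set, while `add8(7, -6)` gives 1 with no overflow even though a carry leaves the top bit.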
25Detecting Overflow
- There can't be an overflow when adding a positive and a negative number.
- There can't be an overflow when the signs are the same for subtraction.
- Why?
- Overflow occurs when the value affects the sign:
- 1) overflow when adding two positives yields a negative,
- 2) or, adding two negatives gives a positive,
- 3) or, subtracting a negative from a positive gives a negative (similar to 1),
- 4) or, subtracting a positive from a negative gives a positive (similar to 2).
- Overflow is detected at the hardware level (a simple comparison of the sign bits).
- You as a programmer are expected to
- handle the overflow once it is detected (warn the user, not let the program crash, etc.).
26Built-in Types
27Positive Numbers
- If we will only deal with non-negative integers, we should define the data type as unsigned:
- unsigned char
- unsigned int
- unsigned long int
- ...
- and use the full range, interpreting the results as the positive binary equivalent
- e.g. unsigned char c = 255; //11111111
28Numbers
- Three most common today
- Unsigned for non-negative integers
- Two's complement for integers (negative or positive)
- IEEE 754 floating-point for reals
- Unless otherwise noted (as unsigned etc.), always assume that the numbers we consider are in 2's complement representation.
29Integer Ranges
- Unsigned: $[UMin_n,\ UMax_n] = [0,\ 2^n - 1]$
- 32 bits: 0 ... 4,294,967,295 (typically unsigned int)
- 64 bits: 0 ... 18,446,744,073,709,551,615 (unsigned long long int; unsigned long int on some platforms)
- 2's complement: $[TMin_n,\ TMax_n] = [-2^{n-1},\ 2^{n-1} - 1]$
- 32 bits: -2,147,483,648 ... 2,147,483,647 (typically int; long int on some platforms)
- 64 bits: -9,223,372,036,854,775,808 ... 9,223,372,036,854,775,807
- Note: C/C++ numeric ranges are platform dependent!
30Limits
- You can include limits.h, which defines these ranges (depending on your platform/computer):
- #include <limits.h>
- Tip: Type #include <limits.h> (or any other filename) in your program, then go to that line, right-click on the file name, and choose Open Document. That will bring up this header file.
- You can do this in general, and it will save you the effort to locate the file.
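A quick way to see the ranges on your own platform is to print the constants from the header. The sketch below uses `<climits>` (the C++ spelling of `limits.h`) together with `std::numeric_limits`; the function name `print_integer_ranges` is made up for this example, and the printed values will differ between platforms.

```cpp
#include <climits>
#include <iostream>
#include <limits>

// Hypothetical helper, not from the slides: prints the ranges of a few
// built-in integer types as defined for the current platform.
void print_integer_ranges(std::ostream& out) {
    out << "bits per char:  " << CHAR_BIT << '\n';
    out << "unsigned char:  0 ... " << static_cast<int>(UCHAR_MAX) << '\n';
    out << "int:            " << INT_MIN << " ... " << INT_MAX << '\n';
    out << "unsigned int:   0 ... " << UINT_MAX << '\n';
    out << "long:           " << LONG_MIN << " ... " << LONG_MAX << '\n';
    out << "long long:      " << std::numeric_limits<long long>::min() << " ... "
        << std::numeric_limits<long long>::max() << '\n';
}
```

Calling `print_integer_ranges(std::cout);` from `main` lists the ranges; the macros and `std::numeric_limits` report the same values, so either can be used.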
31Bit Operations
32Why we need to work with bits
- Sometimes one bit is enough to store your data: say the gender of the student (e.g. 0 for men, 1 for women). We don't have a 1-bit type, so for gender, you will have to use a char type variable.
- But if you need, say, 8 such 1-bit variables, e.g. to record whether the student was present each time attendance was taken, then you can actually combine them all into one char variable.
    class student
    {
    private:
        unsigned char attendance;
        //now I can fit 8 bits into this, we will see how
    };
33Packing bits
- Packing 8 1-bit variables into 1 char variable is easy.
- Say you know that the student was present in the first 3 classes when attendance was taken and not in the last 5. The variable attendance can be set as:
- unsigned char attendance = 0x07;
- where the last 3 bits represent the first 3 attendances (just a choice, it could be the other way around as well).
- But in addition to being able to set a variable's value, we need to be able to handle each bit separately, for which we need bit operators.
34Bit Operators
- & Bitwise and
- | Bitwise or
- ^ Bitwise exclusive or
- ~ Complement
- << shift left
- >> shift right
- Do not confuse & with &&
- or | with ||
35Bitwise Operations AND
- Take the AND of the two numbers, bit by bit.
- char x, y, z; x = 0xb5; y = 0x6c; z = x & y;
- x: 1011 0101
- y: 0110 1100
- z: 0010 0100 (0x24)
36Bitwise AND
- unsigned char c1, c2, c3;
- c1 = 0x45;
- c2 = 0x71;
- c3 = c1 & c2;
- c1: 0100 0101
- c2: 0111 0001
- c3: 0100 0001 (0x41 = 4x16 + 1 = 65 in decimal)
37Bitwise OR
- Take the OR of the two numbers, bit by bit.
- unsigned char c1, c2, c3;
- c1 = 0x45;
- c2 = 0x71;
- c3 = c1 | c2;
- c1: 0100 0101
- c2: 0111 0001
- c3: 0111 0101 (0x75 = 7x16 + 5 = 117 in decimal)
38Bitwise Complement
- The complement operation (~) converts bits with value 0 to 1 and bits with value 1 to 0.
- unsigned char b1 = 0x01; //0000 0001
- unsigned char b4 = 0x08; //0000 1000
- b4 = ~b1; //1111 1110
39Self-Quiz: what do these two statements do?
- char x = 0xA5;
- if ( x & 0x01 )
- //what does this mean?
- x = x | 0x02;
- //what happened to x?
- These are two of the most important bit operations! We will see more later, but basically you can access a particular bit and you can set a particular bit with these two operations.
40Logic Operators versus Bitwise Logic Ops.
- The operators &&, || and ! are not bitwise logic operators!
- The result of && and || is an integral data type with the value 0 (every bit is a zero) or 1 (the least significant bit is 1, all the others are 0).
- The if statement (e.g. if (a & b)) treats any non-zero value as TRUE, and only the value 0 as FALSE.
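The difference between the two families of operators shows up as soon as the operands are non-zero but share no common 1 bit. The helper names below are invented for this illustration.

```cpp
// Hypothetical helpers, not from the slides: the same two operands combined
// with the bitwise AND and with the logical AND.

// True only if a and b have at least one 1 bit in the same position.
constexpr bool bitwise_and_is_true(unsigned a, unsigned b) {
    return (a & b) != 0;
}

// True whenever both a and b are non-zero, regardless of which bits are set.
constexpr bool logical_and_is_true(unsigned a, unsigned b) {
    return a && b;
}

// 0x02 is 0000 0010 and 0x01 is 0000 0001: both non-zero, no shared bit.
static_assert(!bitwise_and_is_true(0x02, 0x01), "0x02 & 0x01 is 0");
static_assert(logical_and_is_true(0x02, 0x01), "0x02 && 0x01 is true");
```

So `if (a & b)` and `if (a && b)` can take different branches for the same inputs, which is why the two operators must not be mixed up.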
41Shift Operators
- Shift operators move bits left or right, filling the other side with 0s.
- << means shift left
- >> means shift right
- y = x << 1;
- (x is what is shifted; 1 is how many times it is shifted)
42Bit operations: left shift
- Suppose we want to shift the bits of a number N, k bits to the left,
- denoted N << k:
- drop the leftmost k bits
- append k 0s to the right
- Ex:
- unsigned char c2 = 0x1C; //00011100
- c = c2 << 1; //00111000
- Note that shifting a number left by one position is equal to multiplying it by 2 (provided that the result is in range).
- What is the effect of shifting a number left by 3?
43Bit Shifting as Multiplication
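- Shift left (x << 1) multiplies by 2:
- works as multiplication for both unsigned & 2's complement numbers
- can overflow
- Why is 1101 = -3? (remember 2's complement numbers)
- $1101 = 1 \times (-2^3) + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = -8 + 4 + 1 = -3$
- $1010 = 1 \times (-2^3) + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -8 + 2 = -6$
- (so shifting 1101 left by one gives 1010, i.e. -3 becomes -6)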
44Bit operations: right shift
- As opposed to the left shift, the right shift works differently for signed and unsigned numbers.
- Suppose we want to shift N by k to the right (denoted N >> k).
- For unsigned numbers:
- drop the rightmost k bits
- append k 0s to the left
- For signed numbers:
- drop the rightmost k bits
- append the sign bit k times to the left
45Right shift operator: examples
- Signed (all your variables are signed unless you specify unsigned specifically)
- positives:
- char c = 8; //0000 1000
- c = c >> 2; //0000 0010 = 2 in decimal
- negatives:
- char c = -8; //1111 1000 in 2's comp. = 0xF8
- c = c >> 2; //1111 1110 = -2 in decimal
- Called arithmetic shift
- Unsigned:
- unsigned char d = 0xF8; //1111 1000 (248 in decimal)
- d = d >> 2; //0011 1110 (62 in decimal)
- Called logical shift
46Right shift operator: details
- negatives:
- char c = -8; //1111 1000 in 2's comp.
- c = c >> 2; //1111 1110 = -2 in decimal
- Reminder: why is -8 = 1111 1000?
- 1111 1000 = -(0000 0111 + 1) = -(8)
- (1's complement + 1)
- What if we had filled with 0s instead of the sign bit?
- It would not satisfy the shift as multiplication/division concept (here: -8 divided by 4 should give -2).
47Bit Shifting as Division summary
48Bit Shifting as Multiplication & Division
- Why useful?
- Simpler, thus faster, than general multiplication & division
- This is a standard compiler optimization
- Can shift multiple positions at once:
- multiplies or divides by the corresponding power of 2
- a << 5 (multiply by $2^5$)
- a >> 5 (divide by $2^5$)
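The shift examples from the last few slides can be collected into one sketch. The function names are hypothetical. One assumption needs stating loudly: before C++20, right-shifting a negative signed value is implementation-defined; mainstream compilers fill with the sign bit (arithmetic shift), and C++20 guarantees it.

```cpp
#include <cstdint>

// Hypothetical helpers, not from the slides.

// Logical shift: unsigned operand, so 0s are shifted in from the left.
constexpr std::uint8_t logical_shift_right(std::uint8_t value, unsigned k) {
    return static_cast<std::uint8_t>(value >> k);
}

// Arithmetic shift: signed operand. Assumes the compiler copies the sign bit
// (guaranteed from C++20 on, implementation-defined before that).
inline std::int8_t arithmetic_shift_right(std::int8_t value, unsigned k) {
    return static_cast<std::int8_t>(value >> k);
}

// Shifting by 5 positions multiplies or divides an unsigned value by 2^5 = 32
// (the multiplication is exact only while the result stays in range).
constexpr unsigned times_32(unsigned a) { return a << 5; }
constexpr unsigned divided_by_32(unsigned a) { return a >> 5; }

static_assert(logical_shift_right(0xF8, 2) == 0x3E, "248 >> 2 == 62");
static_assert(times_32(3) == 96, "3 << 5");
static_assert(divided_by_32(100) == 3, "100 >> 5 truncates 3.125 to 3");
```

With an arithmetic shift, `arithmetic_shift_right(-8, 2)` gives -2 as on the slide. For negative odd values the shift rounds toward minus infinity (for example -7 >> 1 is -4), whereas integer division -7 / 2 gives -3, so the two are not interchangeable for negative numbers.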
49Signed vs Unsigned: Important
- What happens when we say
- unsigned char c = 0x80; //1000 0000
- char d = 0x80; //1000 0000
- 1) These two statements fill the corresponding byte with the same bit string, which is clearly indicated by the hex number 0x80.
- 2) But their interpreted decimal values are different, because we told the computer that in one case the most significant bit (MSB) is used as part of the magnitude (c), and in the other, we said that the MSB should be reserved for the sign bit (d).
- c is 128
- d is -128 (-1x128 + 0 = -128, since our machine uses the 2's complement representation)
- 3) Shifting operation will depend on type:
- c = c >> 2; //will fill with 0s: c will be 0010 0000 = 0x20
- d = d >> 2; //will fill with the sign bit: d will be 1110 0000 = 0xe0
- 4) When you print these, the characters that correspond to the values in c and d will be printed (the same character in this case, but it does not have to be):
- cout << c; //will print what corresponds to 128 (which is Ç)
- cout << d; //will print what corresponds to -128
- Really fine print, which does not relate to this topic: there isn't a character mapped to -128, but what is printed is what corresponds to (256 - 128 = 128), since character mapping wraps around, or equivalently only the magnitude part of the char variable is used for the mapping.
50Signed vs Unsigned: Important
- What happens when we say
- unsigned char c = 0x80; //1000 0000
- char d = 0x80; //1000 0000
- ...
- 4)
- myarray[c] will access the 128th element
- myarray[d] will access the -128th element: problem!
- Hence, even though the internal bit representations are the same, the interpretation of signed and unsigned numbers will be different, which may sometimes cause problems.
51Self-Quiz
- What are the resulting values of x, y and z?
- char x, y, z;
- x = 0x33;
- x = (x << 3) | 0x0F;
- y = (x >> 1) & 0x0F;
- z = x && y;
52What are all these for?
- Knowing how numbers are represented and the ranges of various data types helps prevent unintended behaviour.
- To set bit flags.
- To pack 8 bits into a byte, etc.:
- used to set a flag byte where one bit corresponds to one flag/error etc. (see next slide)
- to pack bits of binary images
- ...
53Example 1: setting, testing, and clearing the bits of bytes
    const int IO_ERROR = 0x01;      // LSB (1st, right-most bit)
    const int CHANNEL_DOWN = 0x10;  // 5th bit
    char flags = 0;

    // if ever the CHANNEL_DOWN event happens, set its corresponding bit in the flags variable
    flags = flags | CHANNEL_DOWN;   // set the 5th bit
    ...
    // to check what errors may have happened...
    if ((flags & IO_ERROR) != 0)            // check the 1st bit
        cout << "I/O error flag is set";
    else if ((flags & CHANNEL_DOWN) != 0)   // check the 5th bit
        cout << "Channel down error flag is set";

    flags = flags & ~IO_ERROR;      // clear the IO_ERROR flag
    // This is also called masking
54Example 2: packed bitmaps
- Similar to the previous code (flags), in packing a binary (0/1) image, we need to set the bits of a byte independently.
- Say your image's first byte in row y needs to have 00001001, for the first 8 bits in that row.
- You need to set the ON bits (the 5th and 8th bits from the left, here).
- We do this by having a bitmask (0x80) that we shift to obtain a byte with only one column set (e.g. 00001000 or 00000001) and then bitwise-OR it in with the already accumulated data.
- ...
55Example: packed bitmaps
- Say your image's first byte in row y needs to have 00001001, for the first 8 bits in that row.
- You need to set the ON bits (the 5th and 8th bits from the left, here).
- Bitmap[y][0] = 0;
- BitMask = 0x80; //1000 0000
- //Shift BitMask to have the single 1 in the appropriate column
- BitMask = BitMask >> 4; //0000 1000
- Bitmap[y][0] = Bitmap[y][0] | BitMask; //0000 1000
- BitMask = 0x80;
- //Shift BitMask to have the single 1 in the appropriate column
- BitMask = BitMask >> 7; //0000 0001
- Bitmap[y][0] = Bitmap[y][0] | BitMask; //0000 1001
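The two steps above (shift the mask, OR it in) generalize to any column of a packed row. The sketch below stores one row as a `std::vector` of bytes; `set_pixel` and `get_pixel` are illustrative names, columns are counted from the left starting at 0 as on the slides, and the caller is assumed to pass a column that lies inside the row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers, not from the slides: one image row packed 8 pixels
// per byte, with column 0 stored in the most significant bit of byte 0.

// Turns the pixel in column col ON.
inline void set_pixel(std::vector<std::uint8_t>& row, std::size_t col) {
    // Start from 1000 0000 and move the single 1 to the column's position.
    const std::uint8_t mask = static_cast<std::uint8_t>(0x80u >> (col % 8));
    row[col / 8] = static_cast<std::uint8_t>(row[col / 8] | mask);
}

// Reports whether the pixel in column col is ON.
inline bool get_pixel(const std::vector<std::uint8_t>& row, std::size_t col) {
    return (row[col / 8] & (0x80u >> (col % 8))) != 0;
}
```

Setting columns 4 and 7 (the 5th and 8th bits from the left) of an all-zero row leaves 0000 1001 in the first byte, matching the slide; column 9 lands in the second byte.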
56General case for packed bitmaps (SKIP)
- Assume you are reading a special file format where only the ON pixels are marked with their col number.
    while (!input.eof())
    {
        BitMask = 0x80; //1000 0000
        ...
        /* read the next ON/black column (where to put the 1) */
        if (input >> col)
        {
            //col is from the left, starting from 0
            BitMask = BitMask >> (col % 8);
            //we take mod so that when col = 9, it still works,
            //but for the next byte
            BitMap[row][col/8] = BitMap[row][col/8] | BitMask;
        }
        ...
    }
57Floating Point Representation
58Floating Point (a brief look)
- We need a way to represent
- numbers with fractions, e.g., 3.1416
- very small numbers, e.g., .000000001
- very large numbers, e.g., $3.1 \times 10^{20}$
- Solution: a floating (decimal) point representation
- IEEE 754 floating point representation is the standard:
- a number is stored as three fields (sign, mantissa, exponent) and has the form +/- .mantissa x 2^exponent
- single precision: 1 bit sign, 23 bit significand (mantissa), 8 bit exponent
- more bits for the significand give more accuracy
- more bits for the exponent increase the range
- Range: approximately $\pm 10^{-44}$ to $10^{38}$
59IEEE Floating Point Std. - Details
- The Mantissa
- The mantissa, also known as the significand, represents the precision bits of the number.
- To find out the value of the implicit leading bit, consider that any number can be expressed in scientific notation in many different ways. For example, the number five can be represented as any of these:
- $5.00 \times 10^0$
- $0.05 \times 10^2$
- $5000 \times 10^{-3}$
- In order to maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This basically puts the radix point after the first non-zero digit. In normalized form, five is represented as $5.0 \times 10^0$.
60Floating Point: what floats?
- For simplicity, let's use a decimal representation and assume we have 1 digit for the sign,
- 8 digits for the mantissa and 3 digits for the exponent:
- +/- | _ _ _ _ _ _ _ _ | _ _ _
- We will illustrate the format for the number 0.000000000023:
- $0.000000000023 = .23000000 \times 10^{-10}$
- So it will be stored as: + | . 2 3 0 0 0 0 0 0 | - 1 0 (sign | mantissa | exponent)
- The actual IEEE floating point representation follows this principle, but differs from this in details:
- normalization (the floating point comes after the first nonzero digit)
- binary instead of decimal
- exponent (not sign/magnitude but biased)
61IEEE Floating Point Std. - Normalization
- Since the only possible non-zero digit in binary is 1, in the IEEE floating point standard we can just assume a leading digit of 1, and don't need to represent it explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.
62IEEE Floating Point Std. - Binary
- We convert decimal to binary simply as follows:
- decimal: -.75 = -(0.5 + 0.25)
- binary: -.11 (since we have bits for $2^2, 2^1, 2^0 \,.\, 2^{-1}, 2^{-2}$ etc.)
- canonical form: $-1.1 \times 2^{-1}$ (note: shifting the radix point by k is the same as multiplying/dividing by $radix^k$)
- Stored sign: -
- Stored mantissa: .100000000, since the leading bit is always 1
- Stored exponent: -1 (basically; I won't go into details, but a bias is actually used)
- decimal: 8.625 = 8 + 0.5 + 0.125
- binary: 1000.101
- canonical form: $1.000101 \times 2^3$
- Stored sign: +
- Stored mantissa: .00010100, since the leading bit is always 1
- Stored exponent: 3 (basically; I won't go into details, but a bias is actually used)
63Bias: why?
- Since we want to represent both positive and negative exponents, e.g. $10^{11}$ and $10^{-11}$, we can do one of two things:
- Reserve a separate sign bit for the exponent
- Use only positive exponents, together with a bias
- The bias (e.g. 127) is subtracted from whatever is stored in the exponent, to find the real exponent:
- Stored exponent = 0 => real exponent = 0 - 127 = -127
- Stored exponent = 227 => real exponent = 227 - 127 = 100
64Bias of the Exponent
- The Exponent
- The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent.
- For IEEE single-precision floats, this value is 127.
- Thus,
- if the real exponent is zero, 127 is stored in the exponent field;
- if 200 is stored in the exponent field, it actually indicates a real exponent of (200 - 127), or 73.
- Exponents of -127 (all 0s) and +128 (all 1s) are reserved for special numbers (all 0s: zero and denormalized numbers; all 1s: infinity and NaN).
65IEEE 754 floating-point standard summary
- The leading 1 bit of the significand is implicit.
- The exponent is biased to make sorting easier:
- all 0s is the smallest exponent, all 1s is the largest
- bias of 127 for single precision (note: the bias is added while storing and subtracted while converting to decimal)
- Decimal equivalent: $(-1)^{sign} \times (1 + significand) \times 2^{exponent - bias}$
- Example:
- decimal: -.75 = -(0.5 + 0.25)
- binary: -.11
- canonical form: $-1.1 \times 2^{-1}$ (note: shifting the radix point by k is the same as multiplying/dividing by $radix^k$)
- stored exponent: -1 + 127 = 126 = 01111110
- Resulting IEEE single precision representation (fields in the order drawn on the slide; in memory the exponent field comes before the mantissa):
- 1 | 10000000000000000000000 | 01111110 (sign | mantissa | exponent)
66A more complex example
- Let us encode the decimal number -118.625 using the IEEE 754 system.
- First we need to get the sign, the exponent and the fraction. Because it is a negative number, the sign is "1".
- Now, we write the number (without the sign, i.e. unsigned, no two's complement) using binary notation. The result is 1110110.101 (notice how we represent .625).
- Next, let's move the radix point left, leaving only a 1 at its left:
- $1110110.101 = 1.110110101 \times 2^6$. This is the normalized floating point number. The mantissa is the part at the right of the radix point, filled with 0s on the right until we get all 23 bits. That is 11011010100000000000000.
- The exponent is 6, but we need to bias it and convert it to binary (so the most negative exponent is stored as 0, and all exponents are non-negative binary numbers). For the 32-bit IEEE 754 format, the bias is 127 and so the stored exponent is 6 + 127 = 133. In binary, this is written as 10000101.
- Putting them all together: sign 1, exponent 10000101, mantissa 11011010100000000000000.
- This example is from Wikipedia.
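The two worked encodings (-0.75 and -118.625) can be checked by pulling the three fields out of an actual `float`. This is a sketch under the assumption that `float` is a 32-bit IEEE 754 single, which holds on mainstream platforms; `FloatFields` and `decompose` are names invented for the example. In the stored bit pattern the fields appear in the order sign (1 bit), exponent (8 bits), fraction (23 bits).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical type and helper, not from the slides.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with the bias of 127
    std::uint32_t fraction;  // 23 bits, without the implicit leading 1
};

// Splits a float into its fields. Assumes a 32-bit IEEE 754 single.
inline FloatFields decompose(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    return {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For -0.75f this yields sign 1, stored exponent 126 and fraction 0x400000 (a 1 followed by 22 zeros); for -118.625f it yields sign 1, stored exponent 133 and fraction 0x6D4000, i.e. 11011010100000000000000.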
67IEEE Floating Point Ranges
- Explanation for the minimum positive value (just a sign change for negative):
- sign (1 bit): 0 | mantissa (23 bits): .00000000...0 | exponent (8 bits): 00000001
- value: $(0 + 1) \times 2^{1-127} = 1.0 \times 2^{-126}$
- Note 1: Exponent 00000000 is reserved for special numbers, so the minimum is 00000001.
- Note 2: Approximate conversion between powers of 2 and powers of 10, e.g. $2^{-149} \approx 10^{-44.85}$, since $2^{3.3} \approx 10$ and $149/3.3 \approx 45$.
68IEEE Floating Point Ranges
- Explanation for the maximum positive value (just change the sign for negative):
- sign (1 bit): 0 | mantissa (23 bits): .111...1 $= 1 - 2^{-23}$ | exponent (8 bits): 11111110 = 254
- value: $(1 - 2^{-23} + 1) \times 2^{254-127} = (2 - 2^{-23}) \times 2^{127}$
- Note 1: Since it represents the part after the radix point, .111...1 (23 ones) $= 1 - 2^{-23}$, just as $.11 = 1 - 2^{-2}$.
- Note 2: 11111111 as exponent is reserved for special numbers, so the maximum is 11111110.
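The two range formulas can be compared against what the standard library reports for `float`. The sketch assumes an IEEE 754 platform (checked through `std::numeric_limits<float>::is_iec559`); the function name `float_range_matches_slides` is made up for this example.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper, not from the slides: rebuilds the smallest normalized
// and the largest finite single-precision values from the formulas above and
// compares them with the library's constants.
inline bool float_range_matches_slides() {
    using lim = std::numeric_limits<float>;
    if (!lim::is_iec559) return false;  // the formulas assume IEEE 754 floats

    // Minimum normalized positive value: 1.0 x 2^(1 - 127) = 2^-126.
    const float min_normal = std::ldexp(1.0f, -126);

    // Maximum finite value: (1 + (1 - 2^-23)) x 2^(254 - 127).
    const float max_finite = std::ldexp(2.0f - std::ldexp(1.0f, -23), 127);

    return min_normal == lim::min() && max_finite == lim::max();
}
```

Note that `std::numeric_limits<float>::min()` is the smallest normalized value; the even smaller figure of about $10^{-44.85}$ mentioned in Note 2 corresponds to `denorm_min()`, which is $2^{-149}$.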
69Summary
- Computer arithmetic is constrained by limited precision.
- Bit patterns have no inherent meaning, but standards do exist:
- two's complement
- IEEE 754 floating point
- Computer instructions determine the meaning of the bit patterns.
- http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html
70Floating Point Complexities
- In addition to overflow we can have underflow:
- a number that is smaller than what is representable (e.g. < $2^{-126}$)
- Accuracy can be a big problem:
- IEEE 754 keeps two extra bits, guard and round
- four rounding modes
- positive divided by zero yields infinity
- zero divided by zero yields "not a number"
- other complexities