Bit operations and number representations - PowerPoint PPT Presentation

Provided by: berrinya
1
Bit operations and number representations
2
Representation of Data
  • All data in a computer's memory is
    represented as a sequence of bits
  • Bit: the unit of storage; it represents the level of an
    electrical charge and can be either 0 or 1.
  • A bit sequence can represent many different
    things
  • We will see that the bit string 10000010 can
    mean several different things depending on the
    representation that is agreed upon.
  • So, how should we represent integers, characters,
    real numbers, strings, structures, in terms of
    bits?
  • Representations must be efficient and convenient

3
Numbers
  • Fundamental problem
  • Fixed-size representation (e.g. 4 bytes for
    integers) can't encode all numbers
  • Limits number range and precision.
  • Usually sufficient in most applications,
  • but a potential source of bugs
  • Other problems
  • How to represent real numbers?
  • How to represent negative numbers, floating
    points...?
  • Historically, many different representations.
  • How to do addition, subtraction etc ?

4
Base 2 unsigned numbers
  • 0000 0000 -> 0   //8-bit binary
    representation of non-negative integers
  • 0000 0001 -> 1
  • 0000 0010 -> 2
  • 0000 0011 -> 3
  • ...
  • 1111 1111 -> 255
  • An n-bit number $A = a_{n-1} \ldots a_1 a_0$ in base b has the
    decimal value $A = \sum_{i=0}^{n-1} a_i \, b^i$
  • Example for base 2 (binary): $1011_2 = 1 \times 2^0 + 1 \times 2^1 + 0 \times 2^2 + 1 \times 2^3 = 11$

5
Sign/Magnitude representation (also called
signed representation)
  • use one of the bits (the first bit, the Most
    Significant Bit) as a sign bit
  • use the rest for the magnitude
  • e.g. (3 bits)
  • 000 = 0
  • 001 = 1
  • 010 = 2   positive numbers
  • 011 = 3
  • 100 = -0
  • 101 = -1
  • 110 = -2   negative numbers
  • 111 = -3
  • range: $-(2^{n-1}-1)$ to $2^{n-1}-1$, where n is
    the total number of bits
  • For n = 4: $-(2^3-1)$ to $2^3-1$,
  • i.e. -7 to 7

6
Alternative representations
  • Most computers don't use a sign and magnitude
    representation
  • Drawbacks of the sign/magnitude representation:
  • two 0s, one positive and one negative
  • addition and subtraction involving negative
    numbers are complicated
  • Alternatives?
  • 1s complement representation
  • 2s complement representation
  • These different representations simplify the
    hardware

7
Signed 1s complement
  • Use the first bit to indicate the sign (0 for
    positive numbers and 1 for negative numbers)
  • Positive numbers: the first bit is 0, and the rest is
    the binary equivalent of the number.
  • Negative numbers are represented by the 1s
    complement of the corresponding positive number
  • 1s complement: invert all the bits
  • e.g. since 8 = 0000 1000 (0 for the sign, and 000
    1000 for 8),
  • -8 = 1111 0111
  • How about 0?

8
Number   1s complement
   7        0111
   6        0110
   5        0101
   4        0100
   3        0011
   2        0010
   1        0001
   0        0000
  -0        1111
  -1        1110
  -2        1101
  -3        1100
  -4        1011
  -5        1010
  -6        1001
  -7        1000
  • As with the signed representation,
  • there is a +0 and a -0

Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$. For n = 4:
$-(2^3-1)$ to $2^3-1$, i.e. -7 to 7

9
Signed 2s complement
  • Signed 2s complement is the common
    representation for signed numbers
  • First bit is the sign bit (0 for positive and 1
    for negative)
  • For positive numbers, the rest of the bits are
    the binary equivalent of the number.
  • Negative numbers are represented by the 2s
    complement of the corresponding positive number.
  • 2s complement: invert all bits and add 1
  • (or copy all the bits from right to left up to
    and including the first 1, then invert the rest)
  • Ex: 8 = 0000 1000
  • -8 = 1111 1000
  • single 0
  • addition and subtraction are simplified
  • note the range (one more negative value): $-2^{n-1}$ ...
    $2^{n-1}-1$
  • today's standard for representing integers

Range: $-2^{n-1}$ to $2^{n-1}-1$. For n = 4: $-2^3$ to
$2^3-1$, i.e. -8 to 7
10
Possible Representations summary
    Bits   Sign Magnitude   One's Complement   Two's Complement
    000          0                  0                  0
    001          1                  1                  1
    010          2                  2                  2
    011          3                  3                  3
    100         -0                 -3                 -4
    101         -1                 -2                 -3
    110         -2                 -1                 -2
    111         -3                 -0                 -1
  • Notice: positive numbers are represented the same
    way (same bit strings) in all representations!
  • Issues in choosing a representation scheme:
    number of zeros, ease of arithmetic operations

11
The Complement Theory
  • Radix complement is the inverse with respect to
    addition
  • The following equation holds when subtracting one
    number from another in FIXED decimal width (here
    with 4-digit decimals):

  • $Y = B - A$
  • $= B - A + (9999 + 1 - 10000)$
  • $= B + (9999 - A + 1) - 10000$
  • $= B + ((9999 - A) + 1) - 10000$
  • $9999 - A$ is the 9s complement (radix-minus-one
    complement) of A //1s complement in base 2
  • $9999 - A + 1$ is the 10s complement (radix
    complement) of A //2s complement in
    base 2
  • In k-bit binary 2s complement, we can benefit from
    the same feature (i.e. we can add the 2s
    complement and drop the overflow bit, instead of
    subtracting)
  • $-A = 2^k - A$ (considering them as
    unsigned numbers; look at slide 10)
  • Hence, $Y = B - A \equiv B + (2^k - A) \pmod{2^k}$,
    i.e. we can use addition of the complement instead of
    subtraction.
  • Similarly, $(A + (-A)) \bmod 2^k = (A + 2^k - A) \bmod 2^k = 0$

12
Why the Complement? Example
  • No borrowing is necessary when subtracting:
  • $6142 - 4816 = 6142 + 5184 - 10000 = 11326 - 10000$
  • $= 1326$
  • Note that $5184 = 9999 - 4816 + 1$ (the 10s
    complement of 4816)
  • Due to the fixed width of the registers, the leading
    1 is lost automatically as a carry overflow.

13
Two's Complement Negation
  • Negating a two's complement number:
  • Start at the least significant bit. Copy through the
    first 1; after that, invert each bit.
  • Example: 0010101100 ->
  • 1101010100
  • Alternatively, invert all bits and add one (at the
    least significant bit)
  • If you negate twice, you arrive at the same
    number (as you should)
  • 0011 = 3
  • 1101 = -3
  • 0011 = 3

14
Properties of Complements
  • Let a k-bit number X have a 1s complement $X_1$ and
    a 2s complement $X_2$.
  • Then the following hold (considering them as
    unsigned values):
  • $X + X_1 = 2^k - 1$, since $2^k - 1$ is the
    max. number you can represent with k bits
  • (e.g., 0011 (3) + 1100 (12) = 1111 (15))
  • $X + X_2 \equiv 0 \pmod{2^k}$, since $X_2 = X_1 + 1$ by
    definition

15
Decimal conversion
  • If you are given a bit string, you can find its
    decimal equivalent depending on the number
    representation used. E.g. find the decimal
    equivalent of 1001 0010:
  • if signed (sign/magnitude) representation is used
  • 10010010 is equivalent to -18 (= -(1×16 + 1×2))
  • if 1s complement representation is used
  • invert all bits -> 0110 1101
  • find the positive number corresponding to the
    negated string
  • 1×64 + 1×32 + 1×8 + 1×4 + 1 = 109
  • 10010010 is equivalent to -109
  • note that this is the reverse of what
    we would do if we wanted to find the bit
    representation of -109 (find the bit rep. of 109,
    take its 1s complement)

16
Decimal conversion ctd.
  • If you are given a bit string, you can find its
    decimal equivalent depending on the number
    representation used. E.g. find the decimal
    equivalent of 1001 0010:
  • if 2s complement representation is used
  • invert all bits -> 0110 1101
  • add 1 -> 0110 1110
  • find the positive number corresponding to the
    negated string (01101110)
  • 1×64 + 1×32 + 1×8 + 1×4 + 1×2 = 110
  • 10010010 is equivalent to -110
  • note that this is the reverse of what
    we would do if we wanted to find the bit
    representation of -110 (find the bit rep. of 110,
    take its 2s complement)
  • the 2s complement of 01101110 ($110_{10}$) is 10010010

17
Alternative decimal conversion 2s comp.
  • You can also directly/quickly find the decimal
    equivalent of a 2s complement number
  • use the usual binary-to-decimal conversion, but
    use the negative of the weight for the most
    significant bit

Bit weights (8 bits): $-2^7$, $2^6$, $2^5$, $2^4$, $2^3$, $2^2$, $2^1$, $2^0$
  • Hence $10010010_2 = -1 \times 2^7 + 1 \times 2^4 + 1 \times 2^1 = -110_{10}$

18
conversion to decimal with 32 bit numbers 2s
comp.
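  • Same idea as 8-bit 2s complement integers, but
    the weight of the most significant bit is $-2^{31}$.

Bit weights (32 bits): $-2^{31}$ (= -2,147,483,648), $2^{30}$, ..., $2^7$, $2^6$ (= 64), $2^5$ (= 32), $2^4$ (= 16), $2^3$ (= 8), $2^2$ (= 4), $2^1$ (= 2), $2^0$ (= 1)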
19
Miscellaneous
  • Converting n-bit numbers into numbers with more
    than n bits (sign extension)
  • copy the most significant bit (the sign bit) into
    the new upper bits: 0010 -> 0000 0010, 1010 ->
    1111 1010

(4-bit 2s complement numbers -> 8-bit 2s complement numbers)
20
Important Note!
  • 2s complement (or two's complement) does not
    mean a negative number!
  • 2s complement is a representation used to
    represent all integers, not just negative
    integers!
  • So 2s complement is a format specification, but
    we also talk about taking the (2s) complement of a
    number to mean negating it
  • e.g. when we want to negate a number (-4 -> 4 or
    4 -> -4), whether it is negative or positive, we
    may say "take its 2s complement"

21
Arithmetic Overflow
22
Addition and Subtraction
  • Just like in grade school (carry 1)
  • Two's complement operations are easy
  • subtraction using addition of negative numbers
  • subtracting 6 from 7 is adding -6 to 7
  • 0111 (7) + 1010 (-6)
  • = 1 0001 (1; the carry out of the leftmost bit is dropped)
  • Overflow is the only problem (result too large
    to fit in the allocated space)
  • adding two n-bit numbers does not always yield a
    result that fits in n bits
  • 0111 (7) + 0001 (1) = 1000 (-8). Note that the term
    overflow is somewhat misleading: it does not mean
    that a carry overflowed, but that the result does
    not fit in 4 bits (8 cannot be represented in 4-bit
    2s complement). The above subtraction example
    (7 - 6) is NOT an overflow!

23
Overflow - definition
  • How can we tell when too many bits in the result
    means overflow and when it's OK?
  • overflow means the right answer won't fit!
  • Overflow:
  • if the signs of the numbers are the same, AND
  • the sign of the result is different from the
    sign of the numbers,
  • then we have overflow!

24
Overflow and 8-bit addition
    1111          (carries)
    01111000
  + 01111000
  = 11110000      Overflow!
It fits, but it's still overflow!
Reminder: the 2s complement range with 8 bits is -128
to 127. 01111000 = 1×64 + 1×32 + 1×16 + 1×8 =
$120_{10}$; 11110000 = -1×128 + 1×64 + 1×32 + 1×16 =
$-16_{10}$
25
Detecting Overflow
  • There can't be an overflow when adding a positive
    and a negative number
  • There can't be an overflow when the signs are the
    same for subtraction
  • Why?
  • Overflow occurs when the value affects the sign:
  • overflow when adding two positives yields a
    negative
  • or, adding two negatives gives a positive
  • or, subtracting a negative from a positive gives a
    negative (similar to the first case)
  • or, subtracting a positive from a negative gives a
    positive (similar to the second case)
  • Overflow is detected at the hardware level (a simple
    comparison of the sign bits).
  • You as a programmer are expected to
  • handle the overflow once it is detected (warn the
    user, not let the program crash, etc.).

26
Built-in Types
  • Numbers: int, float, ...
  • Characters: char

27
Positive Numbers
  • If we will only deal with non-negative integers,
    you should define the data type as unsigned:
  • unsigned char
  • unsigned int
  • unsigned long int
  • ...
  • and use the full range, interpreting the results
    as the positive binary equivalent
  • e.g. unsigned char c = 255; //11111111

28
Numbers
  • Three most common today:
  • Unsigned, for non-negative integers
  • Two's complement, for integers (negative or
    positive)
  • IEEE 754 floating point, for reals
  • Unless otherwise noted (as unsigned etc.), always
    assume that the numbers we consider are in 2s
    complement representation.

29
Integer Ranges
  • Unsigned: $UMin_n \ldots UMax_n = 0 \ldots 2^n - 1$
  • 32 bits: 0 ... 4,294,967,295 (unsigned int)
  • 64 bits: 0 ... 18,446,744,073,709,551,615 (unsigned
    long int or unsigned long long int, depending on the platform)
  • 2s complement: $TMin_n \ldots TMax_n = -2^{n-1} \ldots 2^{n-1}-1$
  • 32 bits: -2,147,483,648 ... 2,147,483,647 (int; also
    long int on some platforms)
  • 64 bits: -9,223,372,036,854,775,808 to
    9,223,372,036,854,775,807
  • Note: C/C++ numeric ranges are platform dependent!

30
Limits
  • You can include limits.h, which defines these
    ranges (depending on your platform/computer)
  • #include <limits.h>
  • Tip: Type #include <limits.h> (or any other
    filename) in your program, then go to that line,
    right-click on the file name and choose Open
    Document. That will open this header file for you.
  • You can do this in general, and it will save you
    the effort of locating the file.

31
Bit Operations
32
Why we need to work with bits
  • Sometimes one bit is enough to store your data,
    say the gender of the student (e.g. 0 for men, 1
    for women). We don't have a 1-bit type, so for
    gender you will have to use a char type
    variable.
  • But if you need, say, 8 such 1-bit variables, e.g.
    to record whether the student was present each
    time attendance was taken, then you can
    actually combine them all into one char variable.
    class student {
    private:
        unsigned char attendance; //now I can fit 8 bits into this, we will see how
    };

33
Packing bits
  • Packing 8 1-bit variables into 1 char variable
    is easy.
  • Say you know that the student was present the
    first 3 times attendance was taken and not the
    last 5 times. The variable attendance can be set
    as
  • unsigned char attendance = 0x07; //0000 0111

where the last 3 bits represent the first 3
attendances (just a choice; it could be the other
way around as well). But in addition to being
able to set a variable's value, we need to be
able to handle each bit separately, for which we
need bit operators.
34
Bit Operators
  • &   Bitwise AND
  • |   Bitwise OR
  • ^   Bitwise exclusive OR
  • ~   Complement
  • <<  Shift left
  • >>  Shift right
  • Do not confuse & with && and | with ||

35
Bitwise Operations AND
  • Take the AND of the two numbers, bit by bit.

char x, y, z; x = 0xb5; y = 0x6c; z = x & y;
x   1011 0101
y   0110 1100
z   0010 0100   (0x24)
36
Bitwise AND
  • unsigned char c1, c2, c3;
  • c1 = 0x45;
  • c2 = 0x71;
  • c3 = c1 & c2;
  • c1 0100 0101
  • c2 0111 0001
  • c3 0100 0001 (0x41 = 4×16 + 1 = 65 in decimal)


37
Bitwise OR
  • Take the OR of the two numbers, bit by bit.
  • unsigned char c1, c2, c3;
  • c1 = 0x45;
  • c2 = 0x71;
  • c3 = c1 | c2;
  • c1 0100 0101
  • c2 0111 0001
  • c3 0111 0101 (0x75 = 7×16 + 5 = 117 in decimal)

38
Bitwise Complement
  • The complement operation (~) converts bits with value
    0 to 1 and bits with value 1 to 0.
  • unsigned char b1 = 0x01; //0000 0001
  • unsigned char b4 = 0x08; //0000 1000
  • b4 = ~b1; //1111 1110

39
Self-Quiz: what do these two statements do?
  • char x = 0xA5;
  • if (x & 0x01)
  •     //what does this mean?
  • x = x | 0x02;
  •     //what happened to x?
  • These are two of the most important bit
    operations! We will see more later, but basically
    you can access a particular bit and you can set a
    particular bit with these two operations.

40
Logic Operators versus Bitwise Logic Ops.
  • The operators &&, || and ! are not bitwise logic
    operators!
  • The result of && and || is an integral value,
    either 0 (every bit is zero) or 1 (the least
    significant bit is 1, all the others are 0).
  • The if statement (e.g. if (a & b) or if (a && b))
    treats any non-zero value as TRUE, and only the
    value 0 as FALSE.

41
Shift Operators
  • Shift operators move bits left or right, filling
    the vacated positions with 0s (except when
    right-shifting signed numbers, see the right
    shift slides).
  • << means shift left
  • >> means shift right
  • y = x << 1;

(x is what is shifted; 1 is how many times it is shifted)
42
Bit operations left shift
  • Suppose we want to shift the bits of a number N
    k bits to the left
  • denoted N << k
  • drop the leftmost k bits
  • append k 0s to the right
  • Ex:
  • unsigned char c2 = 0x1C; //00011100
  • c = c2 << 1; //00111000
  • Note that shifting a number left by one position
    is equal to multiplying it by 2 (provided that
    the result is in range)
  • What is the effect of shifting a number left by
    3?

43
Bit Shifting as Multiplication
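  • Shift left (x << 1) multiplies by 2
  • Works as multiplication for both unsigned and 2s
    complement numbers
  • Can overflow.
  • e.g. with 4 bits, 1101 << 1 = 1010, i.e. -3 × 2 = -6.
    Why is 1101 = -3? (remember 2s complement numbers)
  • $1101 = 1 \times (-2^3) + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = -8 + 4 + 1 = -3$
  • $1010 = 1 \times (-2^3) + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -8 + 2 = -6$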

44
Bit operations right shift
  • As opposed to the left shift, the right shift
    works differently for signed or unsigned numbers.
  • Suppose we want to shift N by k to the right
    (denoted N >> k)
  • For unsigned numbers
  • drop rightmost k bits
  • append k 0s to the left
  • For signed numbers
  • drop the rightmost k bits
  • append the sign bit k times to the left

45
Right shift operator examples
  • Signed (int, short and long are signed unless you
    specify unsigned; plain char is signed on most
    common platforms, but that is implementation-defined)
  • positives:
  • char c = 8; //0000 1000
  • c = c >> 2; //0000 0010 = 2 (decimal)
  • negatives:
  • char c = -8; //1111 1000 in 2s comp. = 0xF8
  • c = c >> 2; //1111 1110 = -2 (decimal)
  • Called arithmetic shift
  • Unsigned:
  • unsigned char d = 0xF8; //1111 1000 (248 decimal)
  • d = d >> 2; //0011 1110 (62 decimal)
  • Called logical shift
  • Called logical shift

46
Right shift operator details
  • negatives:
  • char c = -8; //1111 1000 in 2s comp.
  • c = c >> 2; //1111 1110 = -2 (decimal)
  • Reminder: why is -8 = 1111 1000?
  • 1111 1000 = -(0000 0111 + 1) = -(8)
  • (1s complement + 1)
  • What if we had filled with 0s instead of the sign
    bit?
  • it would not satisfy the shift-as-division concept
    (0011 1110 is 62, not -8 / 4 = -2).

47
Bit Shifting as Division summary
48
Bit Shifting as Multiplication / Division
  • Why useful?
  • Simpler, thus faster, than general multiplication
    and division
  • This is a standard compiler optimization
  • Can shift multiple positions at once
  • Multiplies or divides by the corresponding power of
    2.
  • a << 5 (multiply by $2^5$)
  • a >> 5 (divide by $2^5$)

49
Signed vs Unsigned: Important
  • What happens when we say
  • unsigned char c = 0x80; //1000 0000
  • char d = 0x80; //1000 0000
  • These two statements fill the corresponding byte
    with the same bit string, which is clearly
    indicated by the hex number 0x80.
  • But their interpreted decimal values are
    different, because we told the computer that in
    one case we will not use a sign bit (c), and in
    the other, that the most significant bit (MSB)
    should be reserved as the sign bit (d).
  • c is 128
  • d is -128 (-1×128 + 0 = -128, since our machine uses the
    2s complement representation)
  • 3) The shift operation will depend on the type
  • c = c >> 2; //will fill with 0s: c will
    be 0010 0000 = 0x20
  • d = d >> 2; //will fill with the sign bit: d will
    be 1110 0000 = 0xe0
  • 4) When you print these, the characters that
    correspond to the values in c and d will be
    printed (the same character in this case, but it
    does not have to be)
  • cout << c; //will print what corresponds to 128
    (which is Ç in the console's code page)
  • cout << d; //will print what corresponds to -128
  • Really fine print, which does not relate to
    this topic: there isn't a character mapped to
    -128, but what is printed is what corresponds
    to 128 (-128 + 256 = 128), since the character
    mapping wraps around; equivalently, the same byte
    0x80 is sent to the output in both cases.

50
Signed vs Unsigned: Important
  • What happens when we say
  • unsigned char c = 0x80; //1000 0000
  • char d = 0x80; //1000 0000
  • ...
  • 5)
  • myarray[c] will access the element at index 128
  • myarray[d] will access the element at index -128:
    problem!
  • Hence, even though the internal bit
    representations are the same, the interpretations
    of signed and unsigned numbers will be different,
    which may sometimes cause problems.

51
Self-Quiz
  • What are the resulting values of x,y and z
  • char x,y,z
  • x 0x33
  • x (x ltlt 3) 0x0F
  • y (x gtgt 1) 0x0F
  • z x y

52
  • What are all these for?
  • knowing how numbers are represented and the
    ranges of various data types, to prevent
    unintended behaviour
  • to set bit flags
  • to pack 8 bits into a byte, etc.
  • used to set a flag byte where each bit corresponds
    to one flag/error etc. (see next slide)
  • to pack the bits of binary images
  • ...

53
Example 1: setting, testing, and clearing the bits of bytes
    const int IO_ERROR = 0x01;       //LSB (1st right-most bit)
    const int CHANNEL_DOWN = 0x10;   //5th bit from the right
    char flags = 0;
    //if a CHANNEL_DOWN event ever happens, set its corresponding bit in the flags variable
    flags = flags | CHANNEL_DOWN;    // set the 5th bit
    ...
    //to check what errors may have happened...
    if ((flags & IO_ERROR) != 0)            // check the 1st bit
        cout << "I/O error flag is set";
    else if ((flags & CHANNEL_DOWN) != 0)   // check the 5th bit
        cout << "Channel down error flag is set";
    flags = flags & ~IO_ERROR;       // clear the IO_ERROR flag
    //This is also called masking
54
Example 2 packed bitmaps
  • Similar to the previous code (flags), in packing
    a binary (0/1) image, we need to set the bits of
    a byte independently.
  • Say your image's first byte in row y needs to
    be 00001001, for the first 8 bits in that row
  • You need to set the ON bits (the 5th and 8th bits
    from the left, here)
  • We do this by having a bitmask (0x80) that we
    shift to obtain a byte with only one column set
    (e.g. 00001000 or 00000001), and then
    bitwise-OR it in with the already accumulated data
  • ...

55
Example packed bitmaps
  • Say your image's first byte in row y needs to
    be 00001001, for the first 8 bits in that row
  • You need to set the ON bits (the 5th and 8th bits
    from the left, here)
  • Bitmap[y][0] = 0;
  • BitMask = 0x80; //1000 0000
  • //Shift BitMask to have the single 1 in the
    appropriate column
  • BitMask = BitMask >> 4; //0000 1000
  • Bitmap[y][0] = Bitmap[y][0] | BitMask; //0000 1000
  • BitMask = 0x80;
  • //Shift BitMask to have the single 1 in the
    appropriate column
  • BitMask = BitMask >> 7; //0000 0001
  • Bitmap[y][0] = Bitmap[y][0] | BitMask; //0000 1001

56
General case for packed bitmaps SKIP
  • Assume you are reading a special file format
    where only the ON pixels are marked with their
    column number
    while (!input.eof()) {
        BitMask = 0x80; //1000 0000
        ...
        /* read the next ON/black column (where to put the 1) */
        if (input >> col) {
            //col is counted from the left, starting from 0
            BitMask = BitMask >> (col % 8);
            //we take mod so that when col = 9 it still works,
            //but for the next byte
            BitMap[row][col/8] = BitMap[row][col/8] | BitMask;
        }
        ...
    }

57
Floating Point Representation
58
Floating Point (a brief look)
  • We need a way to represent
  • numbers with fractions, e.g., 3.1416
  • very small numbers, e.g., .000000001
  • very large numbers, e.g., $3.1 \times 10^{20}$
  • Solution: a floating (decimal) point
    representation
  • The IEEE 754 floating point representation is the
    standard
  • $\pm .\text{mantissa} \times 2^{\text{exponent}}$   (fields: sign, mantissa, exponent)
  • single precision: 1-bit sign, 23-bit significand
    (mantissa), 8-bit exponent
  • more bits for the significand give more accuracy
  • more bits for the exponent increase the range
  • Range: approximately $10^{-44}$ to $10^{38}$ in magnitude

59
IEEE Floating Point Std. - Details
  • The Mantissa
  • The mantissa, also known as the significand,
    represents the precision bits of the number.
  • To find out the value of the implicit leading
    bit, consider that any number can be expressed in
    scientific notation in many different ways. For
    example, the number five can be represented as
    any of these
  • $5.00 \times 10^0$
  • $0.05 \times 10^2$
  • $5000 \times 10^{-3}$
  • In order to maximize the quantity of
    representable numbers, floating-point numbers are
    typically stored in normalized form. This
    basically puts the radix point after the first
    non-zero digit. In normalized form, five is
    represented as $5.0 \times 10^0$.

60
Floating Point what floats?
  • For simplicity, let's use a decimal
    representation and assume we have 1 digit for the
    sign,
  • 8 digits for the mantissa and 3 digits for the
    exponent
  • ± | _ _ _ _ _ _ _ _ | _ _ _   (sign | mantissa | exponent)
  • We will illustrate the format for the number
    0.000000000023

  • $0.000000000023 = .23000000 \times 10^{-10}$
  • So it will be stored as: + | 23000000 | -10
  • (sign | mantissa | exponent)
  • The actual IEEE floating point representation
    follows this principle, but differs from this in
    details:
  • - normalization (the radix point comes
    after the first nonzero digit)
  • - binary instead of decimal
  • - exponent (not sign/magnitude but
    biased)

61
IEEE Floating Point Std. - Normalization
  • Since in binary the only possible non-zero digit is 1,
    the IEEE floating point standard can just
    assume a leading digit of 1, and doesn't need to
    represent it explicitly. As a result, the
    mantissa effectively has 24 bits of resolution,
    by way of 23 fraction bits.

62
IEEE Floating Point Std. - Binary
  • We convert decimal to binary simply as follows:
  • decimal: -.75 = -(0.5 + 0.25)
  • binary: -.11 (since we have bits for
    $2^2\ 2^1\ 2^0 .\ 2^{-1}\ 2^{-2}$ etc.)
  • canonical form: $-1.1 \times 2^{-1}$ (note: shifting the
    radix point by k is the same as multiplying/dividing by
    $radix^k$)
  • Stored sign: -
  • Stored mantissa: .100000000 (the leading
    bit is always 1, so it is not stored)
  • Stored exponent: -1 (basically; I won't go
    into details, but a bias is actually used)
  • decimal: 8.625 = 8 + 0.5 + 0.125
  • binary: 1000.101
  • canonical form: $1.000101 \times 2^3$
  • Stored sign: +
  • Stored mantissa: .00010100 (the leading bit
    is always 1, so it is not stored)
  • Stored exponent: 3 (basically; again, a bias
    is actually used)

63
Bias why?
  • Since we want to represent both positive and
    negative exponents, e.g. $10^{11}$ and $10^{-11}$, we can
    do one of two things:
  • Reserve a separate sign bit for the exponent
  • Store only non-negative exponents, together with a bias
  • The bias (e.g. 127) is subtracted from whatever
    is stored in the exponent to find the real
    exponent
  • Stored exponent 0: real exponent = 0 - 127 =
    -127
  • Stored exponent 227: real exponent = 227 - 127 =
    100

64
Bias of the Exponent
  • The Exponent
  • The exponent field needs to represent both
    positive and negative exponents. To do this, a
    bias is added to the actual exponent in order to
    get the stored exponent.
  • For IEEE single-precision floats, this value is
    127.
  • Thus,
  • if the real exponent is zero, 127 is stored in
    the exponent field.
  • if 200 is stored in the exponent field, it
    actually indicates a real exponent of
  • (200 - 127), or 73.
  • Exponents of -127 (stored as all 0s) and 128 (stored as all 1s) are
    reserved for special numbers (zero and denormalized numbers; infinity and NaN)

65
IEEE 754 floating-point standard summary
  • Leading 1 bit of the significand is implicit
  • Exponent is biased to make sorting easier
  • all 0s is the smallest exponent, all 1s is the largest
  • bias of 127 for single precision (note: the bias
    is added when storing and subtracted when
    converting back to decimal)
  • Decimal equivalent: $(-1)^{sign} \times (1 + significand) \times 2^{exponent - bias}$
  • Example
  • decimal: -.75 = -(0.5 + 0.25)
  • binary: -.11
  • canonical form: $-1.1 \times 2^{-1}$ (note: shifting the
    radix point by k is the same as multiplying/dividing by
    $radix^k$)
  • stored exponent: -1 + 127 = 126 = 01111110
  • Resulting IEEE single precision representation:
  • 1 01111110 10000000000000000000000

sign | exponent | mantissa
66
A more complex example
  • Let us encode the decimal number -118.625 using
    the IEEE 754 system.
  • First we need to get the sign, the exponent and
    the fraction. Because it is a negative number,
    the sign is "1".
  • Now, we write the number (without the sign i.e.
    unsigned, no two's complement) using binary
    notation. The result is 1110110.101 (notice how
    we represent .625)
  • Next, let's move the radix point left, leaving
    only a 1 at its left
  • $1110110.101 = 1.110110101 \times 2^6$. This is the
    normalized floating point number. The mantissa is
    the part to the right of the radix point, filled
    with 0s on the right until we get all 23 bits.
    That is 11011010100000000000000.
  • The exponent is 6, but we need to bias it and
    convert it to binary (so the most negative
    exponent is stored as 0, and all exponents are
    non-negative binary numbers). For the 32-bit IEEE
    754 format, the bias is 127, and so the stored
    exponent is 6 + 127 = 133. In binary, this is
    written as 10000101.
  • Putting them all together: 1 10000101 11011010100000000000000
    (sign | exponent | mantissa)
  • This example is from Wikipedia.

67
IEEE Floating Point Ranges
Explanation for the minimum positive (normalized) value (just a sign
change for negative):
    sign 0 | exponent 00000001 (8 bits) | mantissa 000...0 (23 bits)
    $(1 + 0) \times 2^{1-127} = 1.0 \times 2^{-126}$
Note 1: Exponent 00000000 is reserved for special numbers, so the min is 00000001.
Note 2: Approx. conversion between powers of 2 and powers of 10:
e.g. $2^{-149} \approx 10^{-44.85}$, since $2^{3.3} \approx 10$ and $149/3.3 \approx 45$.
68
IEEE Floating Point Ranges
Explanation for the maximum positive value (just change
the sign for negative):
    sign 0 | exponent 11111110 = 254 (8 bits) | mantissa 111...1 (23 bits)
    $(1 + (1 - 2^{-23})) \times 2^{254-127} = (2 - 2^{-23}) \times 2^{127}$
Note 1: Since the mantissa represents the part after the
radix point, $.111\ldots1$ (23 ones) $= 1 - 2^{-23}$, just as
$.11 = 1 - 2^{-2}$. Note 2: 11111111 as exponent is
reserved for special numbers, so the max is 11111110.
69
Summary
  • Computer arithmetic is constrained by limited
    precision
  • Bit patterns have no inherent meaning but
    standards do exist
  • two's complement
  • IEEE 754 floating point
  • Computer instructions determine the meaning of the
    bit patterns
  • http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html

70
Floating Point Complexities
  • In addition to overflow, we can have underflow
  • A number that is smaller in magnitude than what is
    representable (e.g. $< 2^{-126}$)
  • Accuracy can be a big problem
  • IEEE 754 keeps two extra bits, guard and round
  • four rounding modes
  • positive divided by zero yields infinity
  • zero divided by zero yields not a number (NaN)
  • other complexities