Floating point numbers

Provided by: paulcoc

1
Floating point numbers
2
Computable reals
  • "The 'computable' numbers may be described briefly as
    the real numbers whose expressions as a decimal
    are calculable by finite means." (A. M. Turing, On
    Computable Numbers, with an Application to the
    Entscheidungsproblem, Proc. London Mathematical
    Soc., Ser. 2, Vol. 42, pages 230-265, 1936-7.)

3
Look first at decimal reals
  • A real number may be approximated by a decimal
    expansion with a determinate decimal point.
  • As more digits are added to the decimal expansion
    the precision rises.
  • Any effective calculation is always finite: if
    it were not, the calculation would go on for
    ever.
  • There is thus a limit to the precision with which
    the reals can be represented.

4
Transcendental numbers
  • In principle, irrational numbers such as Pi (which
    is transcendental) or root 2 (which is irrational
    but algebraic, not transcendental) have no finite
    decimal representation
  • We are always dealing with approximations to
    them.
  • We can still treat Pi as a real rather than a
    rational because there is always an algorithmic
    step by which we can add another digit to its
    expansion.
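
The "one more digit" step can be illustrated for root 2. This is a minimal sketch using only integer arithmetic; the function name sqrt2_digits is made up for this example and is not from the slides.

```python
from math import isqrt

def sqrt2_digits(n: int) -> str:
    """Return root 2 truncated to n decimal places, using integers only."""
    scaled = isqrt(2 * 10 ** (2 * n))   # floor(sqrt(2) * 10**n)
    text = str(scaled)
    return text[0] + "." + text[1:] if n > 0 else text

# Each pass through the loop extends the expansion by one digit.
for n in range(1, 6):
    print(n, sqrt2_digits(n))
```

However large n is made, the result is still a finite approximation, never root 2 itself.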

5
First solution
  • Store the numbers in memory just as they are
    printed, as a string of characters.
  • 249.75
  • Would be stored as 6 bytes as shown below
  • Note that the decimal digits are in the range 30H to
    39H as ASCII codes

[Figure: byte layout of the stored string; labels: "Full stop char", "Char for 3"]

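
The byte values for 249.75 can be checked directly. A small sketch, assuming plain ASCII encoding as the slide describes; the variable names are illustrative only.

```python
text = "249.75"
stored = text.encode("ascii")        # one byte per printed character

print(len(stored))                   # number of bytes used
print(" ".join(f"{b:02X}H" for b in stored))
```

The digits come out in the range 30H to 39H, and the full stop is stored as 2EH.
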
6
Implications
  • The number strings can be of variable length.
  • This allows arbitrary precision.
  • Variable-length number representations are used in
    systems like Mathematica, which require very high
    accuracy.

7
Example with Mathematica
  • In[1]:= 5!
  • Out[1]= 120
  • In[2]:= 10!
  • Out[2]= 3628800
  • In[3]:= 50!
  • Out[3]= 30414093201713378043612608166064768844377641568960512000000000000
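
The same factorials can be reproduced in Python, whose built-in integers are also of variable length. This is only an analogy to the Mathematica session, not a statement about how Mathematica stores numbers internally.

```python
import math

# Python integers grow as needed, so every digit of 50! is kept.
for n in (5, 10, 50):
    print(f"{n}! = {math.factorial(n)}")
```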

8
Decimal byte arithmetic
  • 9 + 8 = 17 decimal
  • 39H + 38H = 71H hexadecimal ASCII
  • 57 + 56 = 113 decimal ASCII
  • Adjust by taking 30H = 48 away -> 41H = 65
  • If greater than '9' = 39H = 57 take away 10 = 0AH and
    carry 1
  • Thus 41H - 0AH = 65 - 10 = 55 = 37H so the answer would
    be 31H, 37H = 17
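
The adjust-and-carry rule can be sketched as a loop over the digit bytes. This is a minimal illustration that assumes non-negative whole numbers with no decimal point; the function name add_ascii_decimal is hypothetical.

```python
def add_ascii_decimal(a: bytes, b: bytes) -> bytes:
    """Add two ASCII digit strings byte by byte, as in the worked example."""
    width = max(len(a), len(b))
    a, b = a.rjust(width, b"0"), b.rjust(width, b"0")   # pad with '0' = 30H
    out = bytearray()
    carry = 0
    for x, y in zip(reversed(a), reversed(b)):          # least significant first
        s = x + y + carry - 0x30      # add the codes, take 30H away
        if s > 0x39:                  # greater than '9'
            s -= 10                   # take away 0AH
            carry = 1                 # and carry 1
        else:
            carry = 0
        out.append(s)
    if carry:
        out.append(0x31)              # leading '1' from the final carry
    out.reverse()
    return bytes(out)

print(add_ascii_decimal(b"9", b"8"))
```

For the inputs '9' and '8' the loop passes through 71H, 41H and 37H exactly as listed above, and the final carry supplies the leading 31H.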

9
Representing variables
  • Variables are represented as pointers to
    character strings in this system
  • A = 249.75

[Figure: the variable A drawn as a pointer to its character string]
10
Advantages
  • Arbitrarily precise
  • Needs no special hardware
Disadvantages
  • Slow
  • Needs complex memory management

11
Binary Coded Decimal (BCD) or Calculator style
floating point
  • Note that 249.75 can be represented as 2.4975 x
    10^2
  • Store this 2 digits to a byte, to fixed precision,
    as follows

    mantissa    exponent
    24 97 50    02

Each digit uses 4 bits
32 bits overall
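
The packing of two digits per byte can be sketched as follows. The function name pack_bcd and the restriction to exponents 0..99 are assumptions made for this example; the slide does not say how signs or negative exponents are stored.

```python
def pack_bcd(mantissa_digits: str, exponent: int) -> bytes:
    """Pack six mantissa digits and a two-digit exponent, two digits per byte."""
    if not 0 <= exponent <= 99:
        raise ValueError("this sketch only handles exponents 0..99")
    # Pad or cut the mantissa to six digits, then append the exponent digits.
    digits = mantissa_digits.ljust(6, "0")[:6] + f"{exponent:02d}"
    # Each digit takes 4 bits: high nibble first, low nibble second.
    return bytes((int(digits[i]) << 4) | int(digits[i + 1]) for i in range(0, 8, 2))

packed = pack_bcd("24975", 2)   # 2.4975 x 10^2
print(packed.hex(" "))
```

The four bytes printed in hexadecimal read exactly like the decimal digits, which is the point of the BCD layout.
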
12
Normalise
  • Convert N (assumed positive) to a format with one
    non-zero digit in front of the decimal point as follows
  • If N>=10 then whilst N>=10 divide by 10 and add 1
    to the exponent
  • Else whilst N<1 multiply by 10 and decrement the
    exponent
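
The normalisation loop can be written out as a short sketch. It uses Python's decimal module so that dividing by 10 is exact; the function name normalise, the use of abs() to admit negative values, and the handling of zero are assumptions added for this example.

```python
from decimal import Decimal

def normalise(n: Decimal, exponent: int = 0):
    """Scale n so that exactly one non-zero digit precedes the decimal point."""
    if n == 0:
        return n, exponent            # zero cannot be normalised
    while abs(n) >= 10:               # too large: divide and count up
        n /= 10
        exponent += 1
    while abs(n) < 1:                 # too small: multiply and count down
        n *= 10
        exponent -= 1
    return n, exponent

print(normalise(Decimal("249.75")))
print(normalise(Decimal("0.0052")))
```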

13
Add floating point
  • Denormalise the smaller number so that the exponents
    are equal
  • Perform addition
  • Renormalise
  • Eg 949.75 + 52.0 = 1001.75
  • 9.49750 E02 -> 9.49750 E02
  • 5.20000 E01 -> 0.52000 E02
  • 10.01750 E02 -> 1.00175 E03
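
The three steps can be sketched with integer mantissas. This is an illustration only: the name fp_add, the six-digit mantissa, the restriction to positive numbers, and truncation (rather than rounding) of shifted-out digits are all assumptions of the example.

```python
DIGITS = 6  # mantissa digits kept, read as d.ddddd

def fp_add(a, b):
    """Add two positive decimal floats given as (mantissa, exponent).

    The mantissa is a DIGITS-digit integer, so (949750, 2) stands
    for 9.49750 E02.
    """
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                      # make a the number with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb //= 10 ** (ea - eb)           # denormalise: shift the smaller mantissa right
    total, exponent = ma + mb, ea    # perform the addition
    if total >= 10 ** DIGITS:        # renormalise after a carry out of the top digit
        total //= 10                 # the dropped digit is truncated, not rounded
        exponent += 1
    return total, exponent

print(fp_add((949750, 2), (520000, 1)))
```

The digits shifted out during denormalisation and renormalisation are where accuracy is lost.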

14
Note loss of accuracy
  • Compare Octave, which uses floating point numbers,
    with Mathematica, which uses full precision
    arithmetic
  • Octave's floating point output shows only 5 figures
    here; its double precision values carry about 16,
    still far fewer than the 65 digits of 50!

Mathematica:
    In[1]:= 5!     Out[1]= 120
    In[2]:= 10!    Out[2]= 3628800
    In[3]:= 50!    Out[3]= 30414093201713378043612608166064768844377641568960512000000000000
Octave:
    fact(5)     ans = 120
    fact(10)    ans = 3628800
    fact(50)    ans = 3.0414e64
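
The same contrast can be seen inside one language. A small sketch in Python, assuming its float type is an IEEE double (true on mainstream platforms): the integer result is exact, while the floating point copy is not.

```python
import math

exact = math.factorial(50)     # arbitrary precision integer
approx = float(exact)          # nearest double, about 16 significant digits

print(exact)
print(f"{approx:.4e}")         # shown to five significant figures
print(int(approx) == exact)    # whether the double holds 50! exactly
```
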
15
Loss of precision continued
  • When there is a big difference in magnitude between the
    numbers, the smaller one is lost in the floating point
    addition

Octave:
    325000000 + 108    ans = 3.2500D08
Mathematica:
    In[1]:= 325000000 + 108    Out[1]= 325000108
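
The lost addition can be reproduced with a decimal format limited to five significant digits. This sketch uses Python's decimal module; the five-digit limit is an assumption chosen to match the output shown, not a property of any particular system.

```python
from decimal import Context, Decimal

five_digits = Context(prec=5)        # results rounded to 5 significant digits
a, b = Decimal(325000000), Decimal(108)

rounded = five_digits.add(a, b)      # the 108 falls below the fifth digit
exact = a + b                        # default context keeps 28 digits

print(rounded)
print(exact)
```
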
16
IEEE floating point numbers
Institute of Electrical and Electronics Engineers
17
Single Precision
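[Figure: layout of the 32-bit word, with fields S (sign, 1 bit), E (exponent, 8 bits) and F (fraction, 23 bits)]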
18
Definition
  • N = (-1)^s x 1.F x 2^(E-127)
  • Example 1
  • 3.25
  • In fixed point binary 11.01
  • = 1.101 x 2^1
  • In IEEE format this is
  • s = 0, E = 1 + 127 = 128, F = 10100... thus in IEEE it is
  • S E F
  • 0 10000000 1010 0000 0000 0000 0000 000

[Figure label: "Delete this bit" (the leading 1 of 1.101 is implied and is not stored)]
19
Example 2
  • -0.375 = -3/8
  • In fixed point binary -0.011
  • = (-1)^1 x 1.1 x 2^-2
  • In IEEE format this is
  • s = 1, E = -2 + 127 = 125, F = 1000... thus in IEEE it is
  • S E F
  • 1 01111101 1000 0000 0000 0000 0000 000
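
The field values for both examples can be checked by unpacking the bits of a single precision number. A minimal sketch using Python's struct module; IEEE 754 single precision stores the exponent with a bias of 127, and the function names ieee_fields and ieee_value are made up for this illustration.

```python
import struct

def ieee_fields(x: float):
    """Split x, rounded to IEEE single precision, into (s, E, F)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31              # 1 sign bit
    e = (bits >> 23) & 0xFF     # 8 exponent bits, stored with a bias of 127
    f = bits & 0x7FFFFF         # 23 fraction bits, leading 1 not stored
    return s, e, f

def ieee_value(s: int, e: int, f: int) -> float:
    """Rebuild a normalised value: (-1)^s x 1.F x 2^(E-127)."""
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)

for x in (3.25, -0.375):
    s, e, f = ieee_fields(x)
    print(x, s, f"{e:08b}", f"{f:023b}", ieee_value(s, e, f))
```

The sketch covers normalised numbers only; zero, denormals, infinities and NaNs use reserved exponent values and are not handled by ieee_value.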

20
Range
  • IEEE32 1.17 x 10^-38 to 3.40 x 10^38
  • IEEE64 2.23 x 10^-308 to 1.79 x 10^308
  • 80bit 3.36 x 10^-4932 to 1.18 x 10^4932
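
The 32-bit and 64-bit limits can be derived or looked up in a few lines. This sketch assumes that Python's float is an IEEE double and takes the range to run from the smallest normalised value to the largest finite value; the 80-bit format is left out because Python has no portable type for it.

```python
import sys

# IEEE single precision: 8-bit exponent (bias 127), 23-bit fraction
single_min = 2.0 ** -126                       # smallest normalised value
single_max = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest finite value

# IEEE double precision limits as reported by the Python runtime
double_min = sys.float_info.min
double_max = sys.float_info.max

print(f"IEEE32 {single_min:.3e} to {single_max:.3e}")
print(f"IEEE64 {double_min:.3e} to {double_max:.3e}")
```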
