
Tuesday, March 30, 2021

Floating point number madness

We'd like to represent floating-point numbers in binary format.


The IEEE 754 format uses 32 bits to encode a single-precision floating-point number as follows:


Bit 31: Sign

Bits 30 to 23: Exponent (8 bits)

Bits 22 to 0: Fraction (23 bits)
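To make the layout concrete, here is a minimal Python sketch (assuming Python 3.9+) that reinterprets a value's 32-bit single-precision encoding as an unsigned integer with the standard `struct` module and masks out the three fields. The helper name `float32_fields` is hypothetical and not part of the original post.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into (sign, exponent, fraction)."""
    # '>f' packs x as a big-endian IEEE 754 single; '>I' reads the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # bit 31
    exponent = (bits >> 23) & 0xFF  # bits 30..23
    fraction = bits & 0x7FFFFF      # bits 22..0
    return sign, exponent, fraction

print(float32_fields(-6.5))  # (1, 129, 5242880)
```

Here -6.5 = -1.101 (binary) x 2^2, so the sign is 1, the stored exponent is 2 + 127 = 129, and the fraction bits are 101 followed by zeros (0x500000).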



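Normal: Exponent between 1 and 254. There is an implicit one to the left of the fraction.

$(-1)^S \times 1.f \times 2^{E-127}$, where S is the Sign bit, f the Fraction bits and E the Exponent field.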
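The normal-number formula can be checked directly against the bit pattern. This is a minimal sketch under the assumption that the pattern is given as a Python int; `decode_normal` is a hypothetical helper, and the result is compared against the platform's own decoding via `struct`.

```python
import struct

def decode_normal(bits: int) -> float:
    """Value of a normal single-precision pattern: (-1)^S x 1.f x 2^(E-127)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if not 1 <= exponent <= 254:
        raise ValueError("not a normal number")
    significand = 1 + fraction / 2**23  # implicit leading one, then 23 fraction bits
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

bits = 0xC0D00000  # sign 1, exponent 129, fraction 0x500000
print(decode_normal(bits))                              # -6.5
print(struct.unpack(">f", bits.to_bytes(4, "big"))[0])  # -6.5
```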

Signed zero: Exponent=0, Fraction=0

$(-1)^S \times 0$
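The two zeros differ only in bit 31. A small Python sketch (using only the standard `struct` and `math` modules) shows that they compare equal while the sign bit is still observable, for example through `math.copysign`.

```python
import math
import struct

# Same Exponent=0 and Fraction=0; only the Sign bit differs.
pos_zero = struct.unpack(">f", bytes.fromhex("00000000"))[0]
neg_zero = struct.unpack(">f", bytes.fromhex("80000000"))[0]

print(pos_zero, neg_zero)            # 0.0 -0.0
print(pos_zero == neg_zero)          # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))  # -1.0: yet the sign is still there
```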

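Subnormal/Denormal: Exponent=0, Fraction≠0. There is no implicit one (the bit to the left of the fraction is 0), and the exponent is taken as 1-127 = -126 rather than 0-127, so the subnormal values evenly fill the gap between zero and the smallest normal number.

$(-1)^S \times 0.f \times 2^{-126}$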
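A minimal sketch of the subnormal rule, again assuming the pattern arrives as a Python int; `decode_subnormal` is a hypothetical helper. The smallest positive subnormal is 2^-23 x 2^-126 = 2^-149, and the largest lies just below the smallest normal number 2^-126.

```python
import struct

def decode_subnormal(bits: int) -> float:
    """Value of a subnormal single-precision pattern: (-1)^S x 0.f x 2^-126."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0 or fraction == 0:
        raise ValueError("not a subnormal pattern")
    # No implicit one (0.f), and the exponent is fixed at 1 - 127 = -126.
    return (-1) ** sign * (fraction / 2**23) * 2.0 ** -126

print(decode_subnormal(0x00000001) == 2.0 ** -149)  # True: smallest subnormal
print(decode_subnormal(0x007FFFFF) < 2.0 ** -126)   # True: just below the smallest normal
```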

Infinity (positive/negative): Exponent=255, Fraction=0

$(-1)^S \times \infty$
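The two infinities are the patterns 0x7F800000 and 0xFF800000 (all exponent bits set, fraction zero). This short Python sketch decodes them with `struct` and packs `math.inf` back to confirm the encoding.

```python
import math
import struct

pos_inf = struct.unpack(">f", bytes.fromhex("7f800000"))[0]  # S=0, Exponent=255, Fraction=0
neg_inf = struct.unpack(">f", bytes.fromhex("ff800000"))[0]  # S=1, Exponent=255, Fraction=0

print(pos_inf, neg_inf)                     # inf -inf
print(struct.pack(">f", math.inf).hex())    # 7f800000
```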

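NaN: Exponent=255, Fraction≠0. The sign bit is still stored but carries no numeric meaning.

b22, bit 22 of the word, is the most significant bit of the Fraction:

if b22=1: quiet NaN (qNaN)

if b22=0: signaling NaN (sNaN); at least one of the remaining fraction bits must then be 1, otherwise the pattern would encode infinity

This is the encoding recommended by IEEE 754-2008 and used by x86 and ARM; some older MIPS processors used the opposite convention.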
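Putting all the cases together, here is a hedged Python sketch of a classifier that works purely on the integer bit pattern, so it does not depend on how a given platform converts a signaling NaN into a Python float. The function name `classify_float32` is hypothetical, and the qNaN/sNaN test assumes the IEEE 754-2008 convention (b22=1 means quiet).

```python
def classify_float32(bits: int) -> str:
    """Classify a 32-bit IEEE 754 single-precision pattern given as an int."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    if exponent == 255:
        if fraction == 0:
            return "infinity"
        # b22, the top fraction bit: 1 = quiet, 0 = signaling (IEEE 754-2008 convention).
        return "qNaN" if fraction & (1 << 22) else "sNaN"
    return "normal"

for pattern in (0x00000000, 0x80000000, 0x00000001, 0x3F800000,
                0x7F800000, 0xFF800000, 0x7FC00000, 0x7F800001):
    print(f"{pattern:08X} {classify_float32(pattern)}")
```

Note that the sign bit is ignored here, so +0/-0, +inf/-inf and NaNs of either sign land in the same class.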