C'est la vie!: floatingpoint

Tuesday, March 30, 2021

We'd like to represent floating point numbers in binary format.

The IEEE 754 format uses the 32-bit to encode a single precision floating point number as follow.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Sign

Normal: There is a one to the left of the fraction.

(-1)^Sx 1.f x2^e-127

Signed Zero: Exponent=0 Fraction=0

(-1)^Sx 0

Subnormal/Denormal: Exponent=0 shifted by 1 But considering all the fraction values between 0 & 1

(-1)^S x 0.f x 2^-126

Infinity pos/neg Exponent=255 Fraction=0

(-1)^Soo

NaN Exponent = 255 Fraction!=0

(-1)^Sif b22=0 qNaN

if b22=1 sNan

b22 is bit 22nd of the Fraction