Mistybeach Floating Point Encoding Visualizer


Introduction

This is intended to illustrate (as well as explain a little bit) how floating point numbers are encoded. The primary focus is on IEEE-754 floating point numbers. A good understanding of these provides an excellent start to understanding other floating point formats.

IEEE-754 floating point numbers can represent the following values: positive and negative zero, normal numbers, subnormal (denormal) numbers, positive and negative infinity, and NaN (not-a-number) values.
IEEE-754 provides these values by encoding each floating point number as a sign bit, an exponent and a mantissa.

For IEEE-754 16 bit floating point values, we get a single bit for the sign (with the bit being set meaning the value is negative), five bits of exponent and 10 bits of mantissa. 32-bit IEEE-754 floating point values have one sign bit, 8 bits of exponent and 23 bits of mantissa. A table later in this document shows some other floating point formats.
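The fp16 layout above can be sketched with a few shifts and masks. This is an illustrative helper, not part of the visualizer itself; the field widths come straight from the text (1 sign bit, 5 exponent bits, 10 mantissa bits):

```python
# Sketch: split a 16-bit IEEE-754 (binary16) bit pattern into its three fields.
def fp16_fields(bits: int):
    sign = (bits >> 15) & 0x1        # 1 bit: set means the value is negative
    exponent = (bits >> 10) & 0x1F   # 5 bits of exponent
    mantissa = bits & 0x3FF          # 10 bits of mantissa
    return sign, exponent, mantissa

# 0x3C00 is the bit pattern for 1.0: sign 0, exponent 0b01111, mantissa 0.
print(fp16_fields(0x3C00))  # (0, 15, 0)
# 0xC000 is -2.0: sign 1, exponent 0b10000, mantissa 0.
print(fp16_fields(0xC000))  # (1, 16, 0)
```

The same three-way split works for fp32; only the shift amounts and masks change (23-bit mantissa, 8-bit exponent).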

A non-obvious key insight into representing floating point values is that the binary representation of every value except zero begins with a 1 bit. The implication is that this leading set bit can be implied rather than explicitly stored! For all normal non-zero values the mantissa thus contains a hidden leading one bit, giving one extra bit of precision for free.
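The hidden bit shows up directly in the decoding formula. Here is a minimal sketch (the function name is mine) that decodes a normal fp16 value, using the standard fp16 exponent bias of 15; it deliberately ignores the special cases (zero, subnormals, infinity, NaN):

```python
# Sketch: decode a normal (non-zero, non-special) 16-bit float.
def fp16_normal_value(bits: int) -> float:
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    # The mantissa field stores only the fraction; the leading 1 is implied.
    significand = 1.0 + mantissa / 1024.0
    return (-1.0) ** sign * significand * 2.0 ** (exponent - 15)

print(fp16_normal_value(0x3C00))  # 1.0
print(fp16_normal_value(0x4248))  # 3.140625, the fp16 value closest to pi
```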

A good way to get a feel for how floating point number encodings work is to play with the floating-point playground here. The playground shows how 16-bit IEEE-754 floating point numbers map to explicit binary bit patterns (complete with a binary point). These 16-bit floating point numbers work the same way as 32-bit and 64-bit floating point numbers, but are smaller and thus easier to visualize.
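If you want to check the playground's bit patterns against a reference implementation, Python's struct module handles binary16 natively via the 'e' format code (since Python 3.6). The helper names here are my own:

```python
import struct

# Round-trip between 16-bit patterns and Python floats using struct's
# native IEEE-754 binary16 support ('e' format code).
def fp16_to_bits(x: float) -> int:
    return int.from_bytes(struct.pack('<e', x), 'little')

def bits_to_fp16(bits: int) -> float:
    return struct.unpack('<e', bits.to_bytes(2, 'little'))[0]

# 1.5 encodes as sign 0, exponent 0b01111 (15), mantissa 0b1000000000.
print(f"{fp16_to_bits(1.5):016b}")  # 0011111000000000
print(bits_to_fp16(0x3C00))         # 1.0
```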

Notice a few key aspects of floating point numbers:

fp16 format

Play with this fp16 playground. Set the sign bit, move the slider to change the exponent bits, set some of the mantissa bits. See what happens!

fp8 (5e2m) format

And to see the tradeoffs as one loses bits and has to decide how to allocate the remaining bits between exponent and mantissa, we have an 8-bit floating point format:
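Assuming "5e2m" means 1 sign bit, 5 exponent bits and 2 mantissa bits, the decoder for normal values is the fp16 decoder with narrower fields: the bias is still 2^(5-1)-1 = 15, but the fraction has only four steps between consecutive powers of two. This is a sketch under that assumption, again skipping the special cases:

```python
# Hedged sketch: decode a normal value in an 8-bit float format with
# 1 sign bit, 5 exponent bits, 2 mantissa bits (bias 15, hidden leading one).
def fp8_normal_value(bits: int) -> float:
    sign = (bits >> 7) & 0x1
    exponent = (bits >> 2) & 0x1F
    mantissa = bits & 0x3
    # Only 2 fraction bits: representable significands are 1.0, 1.25, 1.5, 1.75.
    return (-1.0) ** sign * (1.0 + mantissa / 4.0) * 2.0 ** (exponent - 15)

print(fp8_normal_value(0b0_01111_00))  # 1.0
print(fp8_normal_value(0b0_10000_10))  # 3.0
```

With only four significand values per binade, the gap between adjacent representable numbers is large; that coarseness is exactly the tradeoff the 8-bit playground makes visible.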