Using 16-bit floating point values

by Richard Russell, August 2007

BBC BASIC for Windows stores floating-point numbers in 40-bit and 64-bit formats and has no built-in support for 16-bit (half-precision) floating-point numbers. Half-precision floats are sometimes used in graphics applications, including OpenGL and Direct3D, because they represent luminance and chrominance levels compactly while retaining a high dynamic range.

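The following three functions may be used to perform the necessary conversions: FN_ConvertFromHalf converts a half-precision value to a BBC BASIC number, while FN_ConvertToHalf and FN_ConvertToHalfRounded convert a number to half precision.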

The difference between FN_ConvertToHalf and FN_ConvertToHalfRounded is that the former truncates towards zero and is slightly faster, whereas the latter returns the half-precision number nearest to the supplied value and is slightly slower. Use FN_ConvertToHalf if you know that the value can be converted exactly into half precision (for example, it was returned from FN_ConvertFromHalf) or if accuracy is not critical. Use FN_ConvertToHalfRounded otherwise.

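        DEF FN_ConvertFromHalf(A%)
        LOCAL A#
        REM Exponent field zero: zero or subnormal, value is mantissa * 2^-24
        IF (A% AND &7C00) = 0 THEN = SGN(A% << 16) * (A% AND &3FF) * 2^-24
        REM Otherwise build the high 32 bits of the double, rebiasing the exponent
        !(^A#+4) = ((A% AND &8000) << 16) + ((A% AND &7FFF) << 10) + &3F000000
        = A#

        DEF FN_ConvertToHalf(A#)
        LOCAL A%
        REM Too small for a normal half: return a subnormal, truncated towards zero
        IF ABSA# < 2^-14 THEN = (A# < 0 AND &8000) OR (ABSA# / 2^-24)
        REM Scale by 2^-16 so the low five exponent bits of the double match the half
        A# /= 65536.0# : A% = !(^A#+4)
        = ((A% >> 16) AND &8000) + ((A% >> 10) AND &7FFF)

        DEF FN_ConvertToHalfRounded(A#)
        LOCAL A%
        REM Too small for a normal half: return a subnormal, rounded to nearest
        IF ABSA# < 2^-14 THEN = (A# < 0 AND &8000) OR (ABSA# / 2^-24 + 0.5)
        A# /= 65536.0# : A% = !(^A#+4)
        REM The final term adds the first discarded mantissa bit to round to nearest
        = ((A% >> 16) AND &8000) + ((A% >> 10) AND &7FFF) + ((A% >> 9) AND 1)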
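
As an illustration, the following short sketch exercises the three functions. It assumes their definitions have been placed after the END statement of the program; the variable names are arbitrary and the test values were chosen only for this example:

        REM Illustrative only: assumes the FN_ definitions above follow the END
        half% = FN_ConvertToHalfRounded(1.5)
        PRINT "1.5 as half: &"; STR$~half%
        PRINT "Converted back: "; FN_ConvertFromHalf(half%)
        REM 0.3 has no exact half-precision form, so the two conversions differ
        trunc% = FN_ConvertToHalf(0.3)
        round% = FN_ConvertToHalfRounded(0.3)
        PRINT "0.3 truncated: &"; STR$~trunc%
        PRINT "0.3 rounded: &"; STR$~round%
        END

Working through the bit layout by hand, 1.5 should encode as &3E00 and convert back exactly, while 0.3 should give &34CC when truncated and &34CD when rounded.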

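Note that FN_ConvertToHalf and FN_ConvertToHalfRounded perform no range checking to ensure that the value you pass can be represented as a valid half-precision number. If this is important, you can add a check as follows. The limits differ because the largest finite half-precision value is 65504 (&FFE0): the truncating version can accept anything below 2^16, since it truncates down to 65504, whereas the rounding version must reject values of &FFF0 (65520) or more, since they would round up out of range.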

        DEF FN_ConvertToHalf(A#)
        LOCAL A%
        IF ABSA# >= 2^16 THEN ERROR 20, "Number too big"
        IF ABSA# < 2^-14 THEN = (A# < 0 AND &8000) OR (ABSA# / 2^-24)
        A# /= 65536.0# : A% = !(^A#+4)
        = ((A% >> 16) AND &8000) + ((A% >> 10) AND &7FFF)
 
        DEF FN_ConvertToHalfRounded(A#)
        LOCAL A%
        IF ABSA# >= &FFF0 THEN ERROR 20, "Number too big"
        IF ABSA# < 2^-14 THEN = (A# < 0 AND &8000) OR (ABSA# / 2^-24 + 0.5)
        A# /= 65536.0# : A% = !(^A#+4)
        = ((A% >> 16) AND &8000) + ((A% >> 10) AND &7FFF) + ((A% >> 9) AND 1)
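
If raising an error is not convenient, one possible alternative is to clamp the value to the largest finite half-precision magnitude (65504) before converting. The function name FN_ConvertToHalfClamped below is hypothetical and is not one of the original routines; it is only a sketch of the idea:

        REM Hypothetical helper, not one of the original routines
        DEF FN_ConvertToHalfClamped(A#)
        REM 65504 is the largest finite half-precision value (&7BFF)
        IF A# > 65504 THEN A# = 65504
        IF A# < -65504 THEN A# = -65504
        = FN_ConvertToHalfRounded(A#)

This works with either version of FN_ConvertToHalfRounded, because the clamped value is always below the &FFF0 limit.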

In all cases the 16-bit half-precision value is passed in the least-significant 16 bits of a 32-bit integer.
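
Because the value occupies only the low 16 bits, it can be written to memory as two bytes, for example when filling a buffer to pass to a graphics API. The following is a minimal sketch, assuming little-endian byte order; the buffer name buf% and the sample values are hypothetical:

        REM Illustrative sketch: buf% and the sample values are hypothetical
        DIM buf% 7 : REM room for four 16-bit values
        FOR I% = 0 TO 3
          half% = FN_ConvertToHalfRounded(I% / 4)
          buf%?(2*I%) = half% AND &FF : REM low byte first
          buf%?(2*I%+1) = (half% >> 8) AND &FF : REM then high byte
        NEXT
        REM Read the second value back and convert it to a number
        half% = buf%?2 OR (buf%?3 << 8)
        PRINT FN_ConvertFromHalf(half%)
        END

Writing the two bytes separately avoids storing a full 32-bit word, which would overwrite the following entry in the buffer.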