Converting 40-bit floats to 64-bit floats

by Richard Russell, August 2014

The assembly language routine below converts a 40-bit floating-point value, held in registers cl (exponent) and edx (sign and mantissa), into a 64-bit IEEE 754 floating-point ('double') value, held in registers ecx (high 32 bits) and edx (low 32 bits):

      movzx   ecx,cl                  ;zero-extend exponent
      add     ecx,895                 ;rebias exponent (1023 - 128)
      rol     edx,1                   ;move sign to LSB
      shld    ecx,edx,21              ;align exponent, shift in top mantissa bits
      shl     edx,20                  ;align low mantissa bits, sign now in bit 20
      btr     edx,20                  ;copy sign to carry and clear it
      rcr     ecx,1                   ;insert sign into bit 31
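The constant 895 is 1023 - 128: a double's exponent bias is 1023, so adding 895 means the routine treats the 40-bit value as ±1.f × 2^(cl - 128), where f is the 31 bits of edx below the sign bit. Since a double has 52 fraction bits, those 31 bits fit with room to spare and the conversion is exact; the low 21 bits of the double's fraction always end up zero. The sketch below is not part of the original article. It mirrors the routine instruction by instruction in C so each step can be checked, with the carry flag modelled as a variable. It assumes IEEE 754 binary64 doubles whose byte order matches uint64_t (true on x86), and the function name f40_to_double is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the assembly routine.
   exp8 plays the role of CL; mant plays the role of EDX
   (bit 31 = sign, bits 30..0 = fraction below an implied leading 1). */
double f40_to_double(uint8_t exp8, uint32_t mant)
{
    uint32_t ecx = exp8;                  /* movzx ecx,cl                   */
    uint32_t edx = mant;
    ecx += 895u;                          /* add ecx,895  (1023 - 128)      */
    edx = (edx << 1) | (edx >> 31);       /* rol edx,1    sign -> bit 0     */
    ecx = (ecx << 21) | (edx >> 11);      /* shld ecx,edx,21                */
    edx <<= 20;                           /* shl edx,20   sign -> bit 20    */
    uint32_t carry = (edx >> 20) & 1u;    /* btr edx,20   CF = sign ...     */
    edx &= ~(1u << 20);                   /*              ... bit cleared   */
    ecx = (carry << 31) | (ecx >> 1);     /* rcr ecx,1    sign -> bit 31    */

    /* ECX is the high dword of the double, EDX the low dword. */
    uint64_t bits = ((uint64_t)ecx << 32) | edx;
    double d;
    memcpy(&d, &bits, sizeof d);          /* assumes IEEE 754 binary64      */
    return d;
}
```

For example, f40_to_double(128, 0) gives 1.0, f40_to_double(128, 0x80000000) gives -1.0 and f40_to_double(130, 0x40000000) gives 6.0.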

Note that this routine does not handle variants (that is, 40-bit values containing an integer rather than a float). To avoid writing extra code for this case, you can force a variant to hold a float by multiplying it by 1.0 in BASIC:

      var *= 1.0
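If you would rather handle variants in the conversion itself, the extra code amounts to a test and a branch. The C sketch below shows its shape; it is not from the original article. It assumes (the article does not say this) that an exponent byte of zero marks an integer variant whose 32-bit two's-complement value occupies the mantissa dword. For the float case it builds the same bit pattern as the assembly routine, but with 64-bit shifts instead of register shuffling. The function name variant40_to_double is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the original article.
   ASSUMPTION: exp8 == 0 marks an integer variant whose 32-bit
   two's-complement value is held in mant; otherwise mant holds
   the sign (bit 31) and 31 fraction bits, as in the assembly routine. */
double variant40_to_double(uint8_t exp8, uint32_t mant)
{
    if (exp8 == 0) {
        /* Integer variant: sign-extend the 32-bit value portably. */
        int64_t i = (mant & 0x80000000u) ? (int64_t)mant - 4294967296LL
                                         : (int64_t)mant;
        return (double)i;
    }
    /* Float: sign -> bit 63, exponent + 895 -> bits 62..52,
       31 fraction bits -> top of the 52-bit fraction field. */
    uint64_t bits = ((uint64_t)(mant >> 31) << 63)
                  | ((uint64_t)(exp8 + 895u) << 52)
                  | ((uint64_t)(mant & 0x7FFFFFFFu) << 21);
    double d;
    memcpy(&d, &bits, sizeof d);   /* assumes IEEE 754 binary64 doubles */
    return d;
}
```

Under these assumptions, variant40_to_double(0, 5) returns 5.0 and variant40_to_double(128, 0) returns 1.0.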
converting_2040-bit_20floats_20to_2064-bit_20floats.txt · Last modified: 2024/01/05 00:22 by