Brain Failier

Discussions related to using the integrated assembler
Ric
Posts: 261
Joined: Tue 17 Apr 2018, 21:03

Post by Ric »

Could someone please explain why these two bits of code give different answers?
*FLOAT 64 has been used.

Code:

        ; REM BASIC VERSION ans# = (TZ%)-((TX%)*slope#)-Z1#+(X1#*slope#)
        ; REM ASM
        fild  DWORD [^TZ%]
        fild  DWORD [^TX%]
        fld   QWORD [^slope#]
        fmulp ST1, ST0
        fsubp ST1, ST0
        fld   QWORD [^Z1#]
        fsubp ST1, ST0
        fld   QWORD [^X1#]
        fld   QWORD [^slope#]
        fmulp ST1, ST0
        fsubp ST1, ST0
        fstp  QWORD [^ans#]
Thanks in advance
Kind Regards Ric.

6502 back in the day, BB4W 2017 onwards, BBCSDL from 2023
Richard Russell
Posts: 540
Joined: Tue 18 Jun 2024, 09:32

Re: Brain Failier

Post by Richard Russell »

Ric wrote: Fri 02 Jan 2026, 23:10 Could someone please explain why these two bits of code give different answers?
Do they give very different answers or only slightly different answers? They're bound to give slightly different answers because in the case of the BASIC code there are a lot of conversions taking place between the 8-byte (64-bit) double format and the 10-byte (80-bit) long double format, and each conversion - especially in the long-double to double direction - may introduce a small error. The assembler code does not perform those conversions.

If they give very different answers, list what those answers are. Incidentally, I wouldn't expect *FLOAT 64 to have any effect here; you should be able to omit it without it making any difference.
Ric

Re: Brain Failier

Post by Ric »

Thank you for the very swift reply, Richard.
The code simply tests which side of a line a point is on (or partially, anyway). When the BASIC code is used the program runs perfectly; when the assembly is used, it does not work. I will send a DM because the answer may involve sending more code which I don't wish to publish yet. Thanks again, I'll send it in the morning.
Kind Regards Ric.
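
The "which side of a line" test in the first post can be regrouped as (TZ% - Z1#) - slope# * (TX% - X1#), which makes the geometry easier to see: it is the signed offset of the point from the line through (X1#, Z1#). The following is only a Python sketch of that arithmetic, not the BBC BASIC or assembler code from the thread; the function names and the sample points are hypothetical.

```python
def line_side(tx, tz, x1, z1, slope):
    """Signed offset of point (tx, tz) from the line through (x1, z1).

    Positive: the point lies on the larger-z side of the line at that x.
    Negative: it lies on the smaller-z side. Zero: it is on the line.
    """
    return (tz - z1) - slope * (tx - x1)


def line_side_expanded(tx, tz, x1, z1, slope):
    """The same quantity written in the order used in the thread."""
    return tz - (tx * slope) - z1 + (x1 * slope)


# Hypothetical sample: a line through the origin with slope 1.
above = line_side(1, 2, 0.0, 0.0, 1.0)
below = line_side(1, 0, 0.0, 0.0, 1.0)
on_line = line_side(1, 1, 0.0, 0.0, 1.0)
print(above, below, on_line)
```

Only the sign of the result matters for choosing between the two render states, so small rounding differences between implementations should rarely change the outcome, whereas a wrong sign on one term will.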

Ric

Re: Brain Failier

Post by Ric »

I have slimmed the code down so I can place it here. The code is from my test bed program, so I apologise for the scrappiness.
When run, it produces very different values for ans# and ASDF#. Given that the range of results in ans# should be -800 to +800, when the result is 800ish ASDF# returns a result of 40ish. If you would like to RUN the code I will need to send over a few (well, quite a lot of) extra things. The code will not run without a world being generated first (about 4 Gbytes).

Code:

REM >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
REM 333333333333333333333333333                                           render scene                                                 333333333333333333333333333
REM >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
hp% = (playingWidth%DIV2)+1
X1# = playingWidth%*COS(indexY+line1Correction)+hp%
Z1# = playingWidth%*SIN(indexY+line1Correction)+hp%
X2# = -playingWidth%*COS(indexY-line1Correction)+hp%
Z2# = -playingWidth%*SIN(indexY-line1Correction)+hp%

IF ABS(X2#-X1#) > 0.01 THEN slope# = (Z2#-Z1#)/(X2#-X1#) ELSE slope# = &FFFFFF

red = 1
green = 2
IF X2#=<X1# THEN
  red = 2
  green = 1
ENDIF
IF X2#<=X1# THEN
  SWAP X1#,X2#
  SWAP Z1#,Z2#
ENDIF

FOR TX% = 0 TO playingWidth%-1
  FOR TZ% = 0 TO playingWidth%-1
    
    REM Assembler result and its inputs, saved for comparison with the BASIC result
    CALL answer
    ASDF# = ans#
    TZTWO% = TZ%
    TXTWO% = TX%
    slope2# = slope#
    Z1TWO# = Z1#
    X1TWO# = X1#
    
    REM BASIC result
    ans# = (TZ%)-((TX%)*slope#)-Z1#+(X1#*slope#)
    
    IF indexY <> PI/2 THEN
      IF ans# <= 0 THEN render% = green ELSE render% = red
    ELSE
      IF ans# <= 0 THEN render% = red ELSE render% = green
    ENDIF
    
    
    
    IF render% = 2 THEN
      T% = vertArea%((TX%+offsetVX%)MODplayingWidth%,(TZ%+offsetVZ%)MODplayingWidth%)
      REM                                            AAAAAAAAAAAAAAAAAAA  BBBBBBBBBBB  CCCCCCCCCCC
      SYS DevCon.IASetVertexBuffers%, DevCon%, 0, 1, ^vertexBuffers%(T%), ^stride%(1), ^offset%(1)   : REM set the vertexBuffer, stride and offset for shader "vertexShaders%(0)"
      REM A :- address of vertex buffer to be used; buffers contain the vertices for each section of the landscape
      REM B :- address of the stride to be used;    stride is the length of each vertex's data
      REM C :- address of the offset;               offset is where in the list of vertices to start
      tot% = totalVertices%(TX%+offsetVX%,TZ%+offsetVZ%)
      REM CALL renderTest
      SYS DevCon.Draw%, DevCon%, totalVertices%(TX%+offsetVX%,TZ%+offsetVZ%), 0                                          : REM render the scene(vertexBuffer vertices) with offset 0 and number of vertices - totalVertices%(x,z)
    ENDIF
  NEXT
NEXT
REM <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
REM 333333333333333333333333333                                           render scene                                                 333333333333333333333333333
REM <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
REM <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
REM 222222222222222222222222222                                   set Shaders and render scene                                         222222222222222222222222222
REM <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

SYS SwapChain.Present%, textureTEXT%(100), 0, 0 : REM swap the current window buffer for the next and be ready to start again
VDU4
fps% += 1
PRINTTAB(0,0) 100*fps%/TIME
PRINT ASDF# ," ", ans#
PRINT TZTWO% ," ", TZ%
PRINT TXTWO% ," ", TX%
PRINT slope2# ," ", slope#
PRINT Z1TWO# ," ", Z1#
PRINT X1TWO# ," ", X1#
UNTIL FALSE

CALL answer calls the routine posted yesterday.
Kind Regards Ric.

Ric

Re: Brain Failier

Post by Ric »

I have now rewritten the assembly as:

Code:

        .answer
        ; REM ans# = (TZ%)-((TX%)*slope#)-Z1#+(X1#*slope#)
        fld   QWORD [^X1#]
        fld   QWORD [^slope#]
        fmulp ST1, ST0

        fld   QWORD [^Z1#]
        fsubp ST1, ST0
        fild  DWORD [^TX%]
        fld   QWORD [^slope#]
        fmulp ST1, ST0
        fsubp ST1, ST0
        fild  DWORD [^TZ%]
        faddp ST1, ST0
        fstp  QWORD [^ans#]

and all is good, although I can't see a reason. Thanks for looking.
Kind Regards Ric.
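
One way to see why the two assembler listings differ is to trace the x87 register stack by hand. `fsubp ST1, ST0` computes ST1 - ST0 and pops, so the last `fsubp` in the first listing subtracts X1#*slope# from the running total, whereas the BASIC expression adds it. The rewritten listing ends up with every sign matching the BASIC. The following is a minimal Python model of just the instructions used in the thread, offered as a sketch rather than an emulator; the `run` helper and the sample values are hypothetical and chosen to be exactly representable.

```python
def run(program, memory):
    """Execute a tiny subset of x87 on a list; the last element models ST0."""
    stack = []
    for op, arg in program:
        if op in ("fld", "fild"):          # push a value loaded from memory
            stack.append(float(memory[arg]))
        elif op == "fstp":                 # store ST0 to memory and pop
            memory[arg] = stack.pop()
        else:                              # two-operand "ST1, ST0" forms
            st0 = stack.pop()
            st1 = stack.pop()
            if op == "fmulp":
                stack.append(st1 * st0)
            elif op == "fsubp":
                stack.append(st1 - st0)    # ST1 = ST1 - ST0, then pop
            elif op == "faddp":
                stack.append(st1 + st0)
            else:
                raise ValueError("unsupported instruction: " + op)
    return memory["ans#"]


first_listing = [
    ("fild", "TZ%"), ("fild", "TX%"), ("fld", "slope#"), ("fmulp", None),
    ("fsubp", None),                       # TZ - TX*slope
    ("fld", "Z1#"), ("fsubp", None),       # ... - Z1
    ("fld", "X1#"), ("fld", "slope#"), ("fmulp", None),
    ("fsubp", None),                       # ... - X1*slope (BASIC has +)
    ("fstp", "ans#"),
]

second_listing = [
    ("fld", "X1#"), ("fld", "slope#"), ("fmulp", None),
    ("fld", "Z1#"), ("fsubp", None),       # X1*slope - Z1
    ("fild", "TX%"), ("fld", "slope#"), ("fmulp", None),
    ("fsubp", None),                       # ... - TX*slope
    ("fild", "TZ%"), ("faddp", None),      # ... + TZ
    ("fstp", "ans#"),
]

# Hypothetical sample values, not taken from the thread.
values = {"TZ%": 10, "TX%": 3, "slope#": 0.5, "Z1#": 2.0, "X1#": 4.0}

basic_result = (values["TZ%"] - values["TX%"] * values["slope#"]
                - values["Z1#"] + values["X1#"] * values["slope#"])
first_result = run(first_listing, dict(values))
second_result = run(second_listing, dict(values))
print(basic_result, first_result, second_result)  # prints "8.5 4.5 8.5"
```

In this model the first listing is off by exactly 2 * X1# * slope#. If the same holds in the real routine, changing the final `fsubp` of the first listing to `faddp` should make it agree with the BASIC, up to the small rounding differences mentioned earlier in the thread.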

Richard Russell

Re: Brain Failier

Post by Richard Russell »

Ric wrote: Sat 03 Jan 2026, 10:00 and all is good, although I can't see a reason. Thanks for looking.
Purely to satisfy my own curiosity, why are you doing all the calculations in 64-bit floats ('double') rather than 80-bit floats ('long double') which is the native floating-point data type of BBC BASIC (when running on an i386 or x86-64 CPU)? Using the latter would be more accurate and eliminate some conversions between the two types, and therefore run slightly more quickly.

If you must use 64-bit floats, for example because you are passing values to or from an OS API which uses that type or because there are large arrays and you want to reduce memory usage, then fair enough. But if not it just seems an unnecessary overhead.
Ric

Re: Brain Failier

Post by Ric »

It is to do with memory saving. I am using massive arrays/structures with 4000 vertices * 32 * 32 blocks, and for obvious reasons 8 bytes is better than 10. Four would be even better, since then no conversion would be needed to comply with D3D. I could use an FN_f4 function, but given the numbers it's just too slow. Just for the record, I am up to 150 frames per second with a vertex count of 8*8*6*45*45*2048, minus a few hidden sides, in a Minecraft-type environment where the landscape extends 170 blocks into the distance.
Kind Regards Ric.
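
To put rough numbers on the memory argument, the following Python sketch compares storage for 4000 * 32 * 32 values at 10, 8 and 4 bytes each, and shows what narrowing a double to a 4-byte float does to a value. It assumes one stored float per vertex purely for illustration (a real vertex will carry several components), and it uses Python's `struct` module as a stand-in for the idea behind FN_f4, not as a description of how that BBC BASIC function works.

```python
import struct

# Figure quoted in the post; one float per vertex is a simplifying assumption.
value_count = 4000 * 32 * 32

# Bytes needed at each float width: 80-bit, 64-bit and 32-bit.
bytes_needed = {size: value_count * size for size in (10, 8, 4)}
for size, total in bytes_needed.items():
    print(f"{size:2d}-byte floats: {total / 1_000_000:.3f} million bytes")

# Narrowing a double to the 4-byte IEEE 754 single format.
packed = struct.pack("<f", 0.1)
round_trip = struct.unpack("<f", packed)[0]
print(len(packed), round_trip)
```

The round-tripped value differs from 0.1 in roughly the ninth significant digit, which is the price of the smaller format.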

Richard Russell

Re: Brain Failier

Post by Richard Russell »

Ric wrote: Sat 03 Jan 2026, 23:17 It is to do with memory saving. I am using massive arrays/structures with 4000 vertices * 32 * 32 blocks, and for obvious reasons 8 bytes is better than 10. Four would be even better, since then no conversion would be needed to comply with D3D
OK, certainly with very large arrays there will be a memory advantage in using 8-byte floats rather than 10-byte ones. One thing you could consider for an even greater benefit would be migrating to 64-bit BBC BASIC. This would provide two immediate advantages for your application:
  1. In 64-bit BBC BASIC up to 4 Gbytes of memory is available for your program, heap and stack, at least 8 times greater than with 32-bit BASIC.
  2. In 64-bit BBC BASIC you have the *FLOAT 32 command which makes conversions to and from 32-bit floats faster than using FN_f4 and FN_4f.
You can still use assembler code in 64-bit BASIC (running on an x86) although there are likely to be a few changes necessary to suit the different ABI, for example you generally need to use PC-relative memory-addressing rather than absolute addressing.