String heap management


Post by hellomike »

Hi,

Recently I wrote a small assembly routine, for use in a function (FN), to quickly remove trailing spaces from strings.
Effectively, if there are trailing spaces, the value of !(^string$+4) (the string's length) is lowered.

Apparently, butchering the length of a local string variable that way breaks the mechanism that returns heap space on leaving the function, because eventually a "No room" error is reported, in both BB4W and BBCSDL.

I can illustrate this with the following:

Code: Select all

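      N%=50000
      J%=0 : K%=0 : REM static variables keep their values between runs
      DIM E% -1
      PRINT 0, ~E%

      REM Loop 1: shorten with LEFT$, so BASIC manages the heap itself.
      FOR I%=1 TO N%
        J%+=LENFNTrim1("ABC01234567890123456789")
      NEXT
      DIM E% -1
      PRINT J%, ~E%

      REM Loop 2: shorten by poking the length directly.
      FOR I%=1 TO N%
        K%+=LENFNTrim2("ABC01234567890123456789")
      NEXT
      DIM E% -1
      PRINT K%, ~E%

      END

      DEF FNTrim1(s$)=LEFT$(s$, 3)

      DEF FNTrim2(s$)
      REM Overwrite the length of the local string without telling BASIC.
      !(^s$ + 4) = 3
      =s$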
The top-of-heap address (in E%) is printed three times, and it is clear that after the second loop E% has increased a lot. When I make N%=100000, the result is a "No room" error.

Can anything be done to return heap space after lowering the length of a local string variable like I pictured?

Thanks,

Mike

Re: String heap management

Post by KenDown »

Hmmm. I notice in the help file, "Memory reserved in this way causes irreversible loss of heap space". The help claims that DIM p% -1 is the equivalent of PRINT HIMEM-END, but I don't think that can be true, because the DIM statement actually reserves some memory.

Note the statement, "The amount of memory reserved is one byte greater than the value specified. DIM p%% −1 is a special case; it reserves zero bytes of memory." In other words, it reserves -1 bytes +1 byte, which is indeed zero, but I suspect that some zeros are worth more than others, as in your case. Perhaps you should try substituting PRINT HIMEM-END for your fancy DIM E% -1 and see if that solves the problem. (Actually, it doesn't.)

However I am a trifle curious. You claim that your functions are to remove trailing spaces, yet you appear to be testing them with strings which do not have any trailing spaces!

Surely if all you want to do is remove trailing spaces, something like this would solve your problem?

Code: Select all

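DEF FNtrim(a$) : LOCAL l%
REM Start one past the end so a string with no trailing space is left intact,
REM and stop at zero so an empty or all-space string cannot loop forever.
l% = LEN(a$) + 1
REPEAT : l% -= 1 : UNTIL l% = 0 OR MID$(a$, l%, 1) > " "
= LEFT$(a$, l%)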
In fact, this works correctly - and notice that I have introduced a trailing space in your strings.

Code: Select all

      N%=100000
      DIM E% -1
      PRINT 0, ~E%

      FOR I%=1 TO N%
        J%+=LENFNtrim("ABC01234567890123456789 ")
      NEXT
      PRINT J%, ~HIMEM-END

      FOR I%=1 TO N%
        K%+=LENFNtrim("ABC01234567890123456789 ")
      NEXT

      PRINT K%, ~HIMEM-END

      END

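      DEF FNtrim(a$) : LOCAL l%
      REM Start one past the end so a string with no trailing space is left intact,
      REM and stop at zero so an empty or all-space string cannot loop forever.
      l% = LEN(a$) + 1
      REPEAT : l% -= 1 : UNTIL l% = 0 OR MID$(a$, l%, 1) > " "
      = LEFT$(a$, l%)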

Re: String heap management

Post by Edja »

Hi,
This is not really the answer to your exact question about your assembly routine generating unwanted effects. But unless you have a compelling reason (speed perhaps?) to use your assembly routine, doesn't this do the trick efficiently?

Code: Select all

WHILE RIGHT$(A$)=" " A$=LEFT$(A$) : ENDWHILE
(copied from Richard's STRINGLIB library)
regards,
Edja

Re: String heap management

Post by hellomike »

Thanks Edja,

Fair reply; however, if I'm able to code in assembly, don't you think I would know of, or be able to come up with, this BASIC code myself?
I do of course.

As far as I understand, the RIGHT$() and LEFT$() functions will (internally) each make a temporary string for each step in the WHILE loop. So if a string has 5 trailing spaces, removing each of them will require creating and releasing 2 strings internally, i.e. 10 overhead operations.

Instead, checking backwards from the end for char 32 and lowering the string's LEN is all that is needed, without any overhead, and is thus far more efficient than the BASIC code.

Yes indeed, I'm trying this for optimal speed. Also, my assembly routine is not the issue; lowering the LEN through !(^string$ + 4) is, as the pure BASIC example illustrates.
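
For comparison, a minimal pure-BASIC sketch of the same idea (scan backwards over the bytes, then change the length once) might look like the following. FNrtrim is a hypothetical name, the code assumes PTR() and 64-bit integer variables are available, and it has not been timed against either the STRINGLIB loop or the assembly routine.

Code: Select all

      PRINT "|" FNrtrim("DEF   ") "|"
      PRINT "|" FNrtrim("      ") "|"
      PRINT "|" FNrtrim("") "|"
      END

      REM FNrtrim is a hypothetical name used for illustration only.
      DEF FNrtrim(a$)
      LOCAL n%, p%%
      n% = LEN(a$)
      p%% = PTR(a$)
      IF n% THEN
        REM Scan backwards over the bytes; no temporary strings are made here.
        REPEAT
          n% -= 1
        UNTIL p%%?n% <> 32 OR n% = 0
        REM If the scan stopped on a non-space, keep that character.
        IF p%%?n% <> 32 THEN n% += 1
      ENDIF
      REM A single LEFT$ leaves the length change to BASIC's string management.
      = LEFT$(a$, n%)
Because the length is changed by one LEFT$ rather than by poking !(^s$ + 4), the heap bookkeeping should stay in BASIC's hands, at the cost of one temporary string per call.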

Regards,

Mike

Re: String heap management

Post by Hated Moron »

hellomike wrote: Wed 19 Oct 2022, 10:34 Can anything be done to return heap space after lowering the length of a local string variable like I pictured?
There are no 'hooks' into the internal string memory management routines that you could call from your assembly language code, so the simple answer is no. The only thing I can suggest would be a hybrid approach in which length changes that don't alter the allocation are performed using your assembler code hack, but length changes that do alter the allocation are performed in BASIC. That wouldn't be as fast as doing it all in assembler code, but it would be faster than doing it all in BASIC.

You would need to experiment to determine whether the complexity of this approach is justified by the performance it offers.

Re: String heap management

Post by hellomike »

Thanks for confirming.
I'm not sure what you mean by altering allocation. Actually the assembly code is mostly an experiment/exercise for me, as it seemed so straightforward. I.e.:

Code: Select all

      DIM T% 127
      FOR I%=0 TO 1
        P%=T%
        [OPT I%*2
  
        cmp dword [eax+4], 0      ; Empty string?
        jz t5%
        mov ebx, [eax]            ; Address in ebx
        mov ecx, [eax+4]          ; Length in ecx
  
        ; Remove trailing spaces.
        .t1%
        cmp byte [ebx+ecx-1], 32
        jne t2%
        loop t1%
        .t2%
        mov [eax+4], ecx          ; Put (decreased) length back
        jz t5%                    ; String was spaces only.
  
        ; Remove leading spaces.
        .t3%
        cmp byte [ebx], 32
        jne t4%
        inc ebx
        loop t3%
        .t4%
        mov [eax], ebx            ; Put (increased) address back
        mov [eax+4], ecx          ; Put (decreased) length back
        .t5%
        ret
        ]
      NEXT
      PRINT "|" FNTrim("") "|"
      PRINT "|" FNTrim("ABC") "|"
      PRINT "|" FNTrim("DEF   ") "|"
      PRINT "|" FNTrim("   GHI") "|"
      PRINT "|" FNTrim("   JKL   ") "|"
      PRINT "|" FNTrim("      ") "|"
      END

      DEF FNTrim(s$):LOCAL A%:A%=^s$:CALLT%:=s$
I guess, a solution to avoid a memory leak is to copy the string to a static location first and remove the spaces there. The assembly code would instead return the address of that area and the function itself would look like:

Code: Select all

DEF FNTrim(s$):LOCAL A%:A%=^s$:=$USRT%
Is that what you meant by hybrid?

Regards

Mike

Re: String heap management

Post by Hated Moron »

hellomike wrote: Thu 20 Oct 2022, 09:18 I'm not sure what you mean by altering allocation.
I mean altering the address in memory at which the string is stored, i.e. the block of memory allocated to the string. Whilst it's no doubt obvious that making a string longer can cause it to be moved to a different place in memory, so can making a string shorter:

Code: Select all

      s$ = "One "
      a%% = PTR(s$)
      PRINT a%%
      s$ = "Two"
      a%% = PTR(s$)
      PRINT a%%
hellomike wrote: Is that what you meant by hybrid?
What I meant was to remove the trailing space(s) as you are doing now when the address of the string would have remained the same (in which case you're doing exactly what BASIC would be doing internally) but to remove the trailing space(s) in BASIC when that would involve moving the string to another address. Because that will only happen relatively rarely (depending on the length of the string) there would still be a worthwhile saving in time, but there's the added complication of determining ahead-of-time whether the string will move or not.
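
A rough way to explore when the string moves is to shorten it one character at a time and compare PTR() before and after each assignment. This is only an experiment, assuming that assigning LEFT$(s$, n%) back to s$ goes through the normal string management; whatever lengths it reports are not documented behaviour and may differ between BB4W and BBCSDL versions.

Code: Select all

      REM Experiment: report each length at which shortening moves the string.
      s$ = STRING$(64, "*")
      FOR n% = 63 TO 1 STEP -1
        a%% = PTR(s$)
        s$ = LEFT$(s$, n%)
        IF PTR(s$) <> a%% THEN PRINT "Moved when shortened to "; n%
      NEXT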

Copying the string to a static location and back is easier, if the overhead of the copying is acceptable (and if the limitation of fixed strings, i.e. that they cannot contain the terminator character, &0D or &00 depending on the kind, isn't an issue).
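
A minimal sketch of the static-buffer variant in BASIC, assuming a hypothetical 256-byte buffer Buf%% and a hypothetical function name FNtrimfixed. It trims trailing spaces only and inherits the fixed-string limitation (the string must not contain &0D).

Code: Select all

      DIM Buf%% 255 : REM room for 255 characters plus the CR terminator

      PRINT "|" FNtrimfixed("DEF   ") "|"
      END

      REM FNtrimfixed and Buf%% are hypothetical names used for illustration.
      DEF FNtrimfixed(s$)
      LOCAL n%
      n% = LEN(s$)
      IF n% > 255 THEN ERROR 100, "String too long for buffer"
      $Buf%% = s$ : REM fixed-string copy, terminated by &0D
      IF n% THEN
        REPEAT
          n% -= 1
        UNTIL Buf%%?n% <> 32 OR n% = 0
        IF Buf%%?n% <> 32 THEN n% += 1
      ENDIF
      Buf%%?n% = &0D : REM move the terminator back over the trailing spaces
      = $Buf%%
The local s$ is never modified here, so its length and allocation stay consistent when the function returns; only the copy in the buffer is shortened.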

Re: String heap management

Post by hellomike »

Obviously I didn't realize that even when shortening a string, the internal memory management might move it to a new address.
Copying the string at location !^string$ would work for me and I don't even need to copy it back. So something like:

Code: Select all

        ....
        mov eax, s%
        ret
        .s%
        ]
      END

      DEF FNTrim(s$):LOCAL A%:A%=^s$:=$USRT%
Thanks for the help, as always.

Mike

Re: String heap management

Post by Hated Moron »

hellomike wrote: Thu 20 Oct 2022, 13:30 Obviously I didn't realize that even when shortening a string, the internal memory management might move it to a new address.
You say "obviously" but in general changing the length of the string is what matters, not whether it's increasing or decreasing. I'm pretty sure that's true of the realloc() function in the C library too, and you can't get much more 'standard' than that! The same also applies to Acorn's ARM BASIC of course; it was only their 6502 BASIC (and my Z80 BASIC) that had no string memory management to speak of, so you often had to initialise a string to its maximum length at the start of a program.
hellomike wrote: Copying the string at location !^string$ would work for me and I don't even need to copy it back.
You should use PTR(string$) to find the memory address of a string; I don't think !^string$ is mentioned in the current documentation, and if it is, it shouldn't be.

Re: String heap management

Post by hellomike »

True, using PTR(string$) is the correct notation.