Dynamic memory allocation.

hellomike · Post by **hellomike** » Fri 17 Nov 2023, 15:47

How to deal best with memory allocation when you do not know beforehand how much data will be needed?
This is a general programming issue but I would like to discuss what techniques/tricks to use when using BBC BASIC as language.

For example a program is needed to find all the unique 8-letter words from a text (txt) file as input and as output, list them alphabetically.
The text file size could be 10kb but also 10Gb. It could even be that a very big text file doesn't have any 8-letter words in it. Or the text file has 1000000 8-letter words in it.... which are all the same!

The start loop would be something like:

Code: Select all

      DIM Wrd$(3000)
      F%=OPENIN("Input.txt")
      WHILE NOT EOF#F%
        Word$=GET$#F%
        IF LENWord$ == 8 THEN
          Wrd$(N%)=Word$
          N%+=1
        ENDIF
      ENDWHILE
      CLOSE#F%

So when the input file is small or only has 20 matches, the array size is way too big and the other way around.
One technique I thought of for the start loop is writing a match to a new output text file first to avoid a wrong DIMmed array.
Another trick is to first scan de text file to count the total number of (non-unique) 8-letter words.

What would be a good approach and/or are there BBC BASIC example programs around doing something similar?

Thanks

Mike

Hated Moron · Post by **Hated Moron** » Fri 17 Nov 2023, 17:03

hellomike wrote: ↑Fri 17 Nov 2023, 15:47 What would be a good approach and/or are there BBC BASIC example programs around doing something similar?

All modern Operating Systems support 'Virtual Memory', in other words when you 'allocate' memory what you are actually doing is simply reserving a range of addresses, not using up any physical memory (hence, for example, making it unavailable to another program).

So by far the simplest approach is to initially 'allocate' as much memory (reserve enough address space) as you will ever need. For example in the case of your program you have stated that the largest required array will be around 1,000,000 8-character words.

Because of the way string memory is allocated this would actually need about 8 Mbytes for the string descriptors and 15 Mbytes for the strings themselves. So reserve, say, 24 Mbytes at the start of the program:

Code: Select all

      HIMEM = PAGE + 24 * 1024 * 1024
      DIM Wrd$(1000000)
      F%=OPENIN("Input.txt")
      WHILE NOT EOF#F%
        Word$=GET$#F%
        IF LENWord$ == 8 THEN
          Wrd$(N%)=Word$
          N%+=1
        ENDIF
      ENDWHILE
      CLOSE#F%

The 8 Mbytes of string descriptor is genuinely 'used' memory (even an empty string needs a descriptor) but the 15 Mbytes of string data won't be used unless it is actually needed.

If the potentially 'wasted' 8 Mbytes of memory for the string descriptors worries you (it's a relatively small amount of memory by modern standards) then you could, as you suggest, read the file twice. On the first pass count the number of 8-character strings, then allocate the memory, and on the second pass store the actual strings. But I honestly wouldn't bother.

hellomike · Post by **hellomike** » Sat 18 Nov 2023, 10:38

Thanks for the reply.
I did not say or at least not meant that 1000000 elements is the maximum but that is irrelevant. The thread I started was mainly to find out about ways to dynamically reserve memory.

For Python, I found syntax like this:

Code: Select all

with open('text.txt') as f:
    lines = [w.strip() for w in f.readlines()]
    words = [w for w in lines if len(w) = 8]

Although I'm not at all familiar with the language it seems that the memory allocation/reservation must be very dynamic here.

For BBC BASIC string arrays, I know that 8 bytes per element is reserved. And I'm sure that you are right and that I shouldn't be so touchy about a few megabytes, which is peanuts for modern computers.

Regards,

Mike

Hated Moron · Post by **Hated Moron** » Sat 18 Nov 2023, 12:20

hellomike wrote: ↑Sat 18 Nov 2023, 10:38 I did not say or at least not meant that 1000000 elements is the maximum

When using an array the largest possible number is about 10,000,000 8-character strings, because at around 230 Mbytes that approaches the maximum amount of heap that BBC BASIC for SDL 2.0 currently supports (256 Mbytes), and strings can only be stored on the heap.

The thread I started was mainly to find out about ways to dynamically reserve memory.

It depends on whether you mean dynamic allocation of memory from BASIC's heap (limited to a total of 256 Mbytes) or dynamic allocation from the Operating System (which could be much bigger, especially on a 64-bit system).

In the case of allocation from the heap the usual way in BBC BASIC is to use a variant of the DIM statement thus:

Code: Select all

      DIM memory%% amount%

whereas in the case of allocation from the OS you would call an API function thus:

Code: Select all

      SYS "SDL_malloc", amount% TO memory%%

In both cases memory%% is set to the address of (i.e. a pointer to) a contiguous block of memory of at least amount% bytes in size (strictly amount% + 1 bytes in the case of the first example).

You can use memory thus allocated for any standard data structure. If insufficient memory is available to fulfil the allocation the first case will throw a BASIC error (DIM space) and the second case will set memory%% to zero.

For Python, I found syntax like this:

BBC BASIC doesn't have Dynamic Arrays, as such, because arrays consist of contiguous elements and therefore must be allocated before use. So you would need to choose an alternative data structure which does not require initial allocation. A common example is a linked list, which can elegantly be created in BBC BASIC using structures.

But you could also create a tree or a map or any other standard data structure.

hellomike · Post by **hellomike** » Sat 18 Nov 2023, 13:39

Thanks for the additional information. Again, this was mainly a theoretical question as I have no issues or problems storing and handling (large amounts of) data with BB4W or BBCSDL.

Indeed linked lists seems the closest to dynamically reserving memory.

Mike

Hated Moron · Post by **Hated Moron** » Sat 18 Nov 2023, 14:21

hellomike wrote: ↑Sat 18 Nov 2023, 13:39 this was mainly a theoretical question as I have no issues or problems storing and handling (large amounts of) data with BB4W or BBCSDL.

It's much more difficult to answer a "theoretical question" than to suggest a solution to a practical programming problem! You did qualify your original question with a specific example (reading strings from a file and storing those which are 8 characters long) so it's that example that I concentrated on in my answers.

If you are interested in 'dynamic memory allocation' in more general terms it's too broad a subject to cover comprehensively in a forum message. For example in some applications it may be necessary to free previously-allocated memory, and in that case you might want to use the technique described in the Wiki here.

Or you might be wanting to re-dimension an existing array, in which case there are some routines which support that (in both BB4W and BBCSDL) here. This would be one way to simulate a dynamic array (declare a small array initially and re-dimension it when it gets full, and then each time it fills thereafter).

There isn't a one-size-fits-all solution, it will depend on the specific requirements and constraints of the particular application.

Hated Moron · Post by **Hated Moron** » Sat 18 Nov 2023, 23:34

Hated Moron wrote: ↑Sat 18 Nov 2023, 14:21 This would be one way to simulate a dynamic array (declare a small array initially and re-dimension it when it gets full, and then each time it fills thereafter).

This is probably the closest BBC BASIC approximation to the Python example. Here the 'dynamic array' is doubled in size each time it fills:

Code: Select all

      DIM words$(1) : n% = 0
      f = OPENIN("text.txt")
      WHILE NOT EOF#f
        l$ = GET$#f
        IF LEN(l$) = 8 words$(n%) = l$ : n% += 1 : IF n% > DIM(words$(),1) PROCredim(words$(), 8, 2*n%-1)
      ENDWHILE
      CLOSE #f
      END

      REM!Eject

      DEF PROCredim(RETURN p%%,S%,D%)
      LOCAL a%%,N%,O%
      IF ?p%%<>1 ERROR 14, "Bad use of array"
      N% = 5+S%*(D%+1)
      O% = 5+S%*p%%!1
      SYS "SDL_malloc", N% TO a%%
      IF @platform% AND &40 ELSE a%%=!^a%%
      IF a%%=0 ERROR 11, "DIM space"
      SYS "SDL_memset", a%%, 0, N%
      IF N%>O% SWAP N%,O%
      SYS "SDL_memcpy", a%%, p%%, N%
      a%%!1=D%+1
      IF p%%<LOMEM OR p%%>HIMEM SYS "SDL_free", p%%
      p%% = a%%
      ENDPROC

I wonder if it would be desirable to add the various PROCredim() routines to the arraylib library?

hellomike · Post by **hellomike** » Mon 20 Nov 2023, 09:52

That is some clever piece of source. I tried it and it worked 1st time!

Impressive.

Mike

BBC BASIC forum

Dynamic memory allocation.

Dynamic memory allocation.

Re: Dynamic memory allocation.

Re: Dynamic memory allocation.

Re: Dynamic memory allocation.

Re: Dynamic memory allocation.

Re: Dynamic memory allocation.

Re: Dynamic memory allocation.

Re: Dynamic memory allocation.