Differences

This shows you the differences between two versions of the page.

--- tokeniser [2018/03/31 13:19] – external edit 127.0.0.1
+++ tokeniser [2024/01/05 00:21] (current) – external edit 127.0.0.1
@@ Line 2: / Line 2: @@
 //by JGH, June 2006//\\ \\  BBC BASIC programs are tokenised, that is, BASIC keywords are stored as one-byte values. This results in programs which execute faster and are more compact.\\ \\  A tokenised line can easily be detokenised, or expanded, as there is a one-to-one mapping between token values and the expanded string. For example, code similar to the following would expand a tokenised line:\\
+<code bb4w>
         quote%=FALSE
         REPEAT
@@ Line 8: / Line 9: @@
           addr%=addr%+1
         UNTIL ?addr%=13
+</code>
 Tokenising, however, is more fiddly. Keywords can be abbreviated on entry, and characters are tokenised only in certain parts of the line. For instance, in the following line:\\
+<code bb4w>
         ON NOON GOTO 1,2
+</code>
 the first 'ON' is the token ON, but the second 'ON' is part of the variable 'NOON'. The second 'ON' must be left untokenised.\\ \\  The **EVAL** function tokenises the supplied string and evaluates it as an expression. Usefully, the tokenised string can be retrieved from where BASIC has stored it.\\ \\
 ==== In Windows BASIC: ====
+<code bb4w>
         B%=EVAL("0:"+A$)
         token$=$(!332+2)
+</code>
 This code may fail if an event interrupt (e.g. ON TIME) occurs between the two statements. To avoid this, use the following alternative, which (in //BBC BASIC for Windows// version 6 only) does not allow an intervening interrupt:\\
+<code bb4w>
         IF EVAL("1:"+A$) token$=$(!332+2)
+</code>
 The input and output share the same memory buffer. This is safe so long as tokenising shortens the code (which is almost always the case), but it can cause a crash if tokenising lengthens the code. That happens only in exceptional circumstances, such as the following line:\\
+<code bb4w>
         ON A% GOTO 10,20,30,40,50
+</code>
 The tokenising process encodes each line number in a special internal format, which increases the overall length from 25 to 31 bytes. To reduce the chance of this causing a crash, the routine can be given a longer prefix containing a keyword; the prefix shrinks when tokenised (hence the offset of 3 rather than 2), leaving some spare room in the buffer:\\
+<code bb4w>
         IF EVAL("1RECTANGLE:"+A$) token$=$(!332+3)
+</code>
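 \\  The internal line-number format referred to above is not spelled out on this page. As a hedged sketch only, assuming the commonly documented BBC BASIC encoding (a &8D marker byte followed by three bytes that each lie in the range &40 to &7F), a line number could be encoded as shown here. The function name **FNlineref** and its local variables are hypothetical and are not part of the 'Tokenise' library:\\
 <code bb4w>
         REM Sketch only: encode line number N% (0-65535) as a 4-byte reference.
         REM Assumes the usual &8D-prefixed format; check your BASIC's documentation.
         DEF FNlineref(N%):LOCAL lo%,hi%,top%
         lo%=N% AND &FF:hi%=(N% DIV 256) AND &FF
         REM The top two bits of each byte are folded into the first data byte
         top%=(((lo% AND &C0) DIV 4) OR ((hi% AND &C0) DIV 16)) EOR &54
         =CHR$(&8D)+CHR$(top%)+CHR$((lo% AND &3F) OR &40)+CHR$((hi% AND &3F) OR &40)
 </code>
 Under that assumption, each of the five two-digit line numbers in the example grows from 2 characters to 4 bytes (10 bytes more), while ON and GOTO shrink to one byte each (4 bytes fewer), which accounts for the increase from 25 to 31 bytes.\\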
 \\
 ==== In ARM BASIC: ====
+<code bb4w>
         SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
         B%=EVAL("0:"+A$)
         token$=$(A%-14)
+</code>
 \\
 ==== In 6502 BASIC: ====
+<code bb4w>
         A%=EVAL("0:"+A$)
         token$=$((!4 AND &FFFF)-LENA$-1)
+</code>
 \\  By preceding the code you want to tokenise with "0:" you can safely pass it to **EVAL** without provoking a Syntax error. You can then extract the tokenised code from memory, so long as you do it immediately after calling **EVAL**.\\ \\  These techniques can be wrapped in functions as follows:\\
+<code bb4w>
         DEF FNTokenise_Win(A$):LOCAL A%,B%
         WHILELEFT$(A$,1)=" ":A$=MID$(A$,2):ENDWHILE
@@ Line 40: / Line 57: @@
         DEF FNTokenise_65(A$):LOCAL A%
         A%=EVAL("0:"+A$):=$((!4 AND &FFFF)-LENA$-1)
+</code>
 \\  These functions are used in full in the 'Tokenise' BASIC library at [[http://mdfs.net/System/Library/BLib|mdfs.net]].\\ \\  A text file can then be tokenised using the following code:\\
+<code bb4w>
       in%=OPENIN(text$)
       out%=OPENOUT(basic$)
@@ Line 54: / Line 73: @@
       CLOSE#out%:out%=0
       CLOSE#in%:in%=0
+</code>
 \\
 ==== Notes ====
- Acorn BBC BASIC programs are stored slightly differently. See [[/Format|Format]] and relevant pages on [[http://beebwiki.jonripley.com/|Acorn-specific sites]] for details.\\ \\  This technique may fail if the tokenised code is //longer// than the original text version, which can happen if it contains an **ON GOTO** or **ON GOSUB** statement. This problem may be mitigated to some extent as follows (for Windows BASIC):\\
+ Acorn BBC BASIC programs are stored slightly differently. See [[/Format|Format]] and relevant pages on [[http://beebwiki.mdfs.net/|Acorn-specific sites]] for details.\\ \\  This technique may fail if the tokenised code is //longer// than the original text version, which can happen if it contains an **ON GOTO** or **ON GOSUB** statement. This problem may be mitigated to some extent as follows (for Windows BASIC):\\
+<code bb4w>
         B%=EVAL("0OTHERWISE:"+A$)
         token$=$(!332+3)
+</code>
 \\
 ==== See also ====