====== Tokeniser ======

  
//by JGH, June 2006//\\ \\ BBC BASIC programs are tokenised: BASIC keywords are stored as one-byte values (tokens). This makes programs both faster to execute and more compact.\\ \\ A tokenised line can easily be detokenised (expanded), because there is a one-to-one mapping between token values and the keyword strings they stand for. For example, code similar to the following would expand a tokenised line:\\
<code bb4w>
        quote%=FALSE
        REPEAT
        ...
          addr%=addr%+1
        UNTIL ?addr%=13
</code>
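As a self-contained illustration of the one-to-one mapping, the following minimal sketch expands a tokenised //string// rather than a line in memory. The names FNdetok and FNkeyword are hypothetical, and the token table deliberately holds only six entries, using the Acorn token values for TRUE, GOTO, ON, PRINT, REPEAT and UNTIL (assumed here to be shared by BB4W). A complete detokeniser would need the full table and would also have to decode encoded line numbers, which this sketch does not attempt.

<code bb4w>
      REM Minimal sketch: expand a tokenised string using a tiny,
      REM partial token table (hypothetical FNdetok/FNkeyword)
      t$ = CHR$&F5 + ":" + CHR$&F1 + " ""Hi"":" + CHR$&FD + " " + CHR$&B9
      PRINT FNdetok(t$)
      END

      DEF FNdetok(t$)
      LOCAL i%, c%, q%, o$
      i% = 1
      WHILE i% <= LEN(t$)
        c% = ASC(MID$(t$, i%, 1))
        REM A double quote toggles "inside a string literal"
        IF c% = 34 THEN q% = NOT q%
        REM ASCII bytes, and anything inside quotes, are copied as-is;
        REM any other byte of &80 or above is expanded as a keyword
        IF c% < &80 OR q% THEN o$ += CHR$(c%) ELSE o$ += FNkeyword(c%)
        i% += 1
      ENDWHILE
      = o$

      DEF FNkeyword(c%)
      LOCAL k$
      CASE c% OF
        WHEN &B9: k$ = "TRUE"
        WHEN &E5: k$ = "GOTO"
        WHEN &EE: k$ = "ON"
        WHEN &F1: k$ = "PRINT"
        WHEN &F5: k$ = "REPEAT"
        WHEN &FD: k$ = "UNTIL"
        REM Tokens missing from this partial table are shown in hex
        OTHERWISE: k$ = "[&" + STR$~c% + "]"
      ENDCASE
      = k$
</code>

The demonstration line should print ''REPEAT:PRINT "Hi":UNTIL TRUE''. A WHILE loop is used rather than FOR so that an empty string is handled correctly, since a BBC BASIC FOR loop always executes at least once.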
Tokenising, however, is more fiddly. Keywords can be abbreviated on entry, and text is tokenised only in certain parts of a line. For instance, in the following line:\\
<code bb4w>
        ON NOON GOTO 1,2
</code>
the first 'ON' is the keyword ON, but the second 'ON' is part of the variable name 'NOON' and must be left untokenised.\\ \\ The **EVAL** function tokenises the supplied string and then evaluates it as an expression. Usefully, the tokenised string can afterwards be retrieved from the memory where BASIC stored it.\\ \\
==== In Windows BASIC: ====
<code bb4w>
        B%=EVAL("0:"+A$)
        token$=$(!332+2)
</code>
This code may fail if an event interrupt (e.g. ON TIME) occurs between the two statements. To avoid this, use the following alternative, which (in //BBC BASIC for Windows// version 6 only) does not allow an interrupt to occur in between:\\
<code bb4w>
        IF EVAL("1:"+A$) token$=$(!332+2)
</code>
The input and output share the same memory buffer. That is fine so long as tokenising shortens the code (which is almost always the case), but it can cause a crash if tokenising lengthens it. This happens only in exceptional circumstances, such as with the following code:\\
<code bb4w>
        ON A% GOTO 10,20,30,40,50
</code>
The tokeniser encodes the line numbers in a special internal format, which increases the overall length from 25 to 31 bytes. To reduce the chance of this causing a crash, the tokenising routine can be adapted as follows:\\
<code bb4w>
        IF EVAL("1RECTANGLE:"+A$) token$=$(!332+3)
</code>
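The growth from 25 to 31 bytes can be checked by hand. Each line number after GOTO is stored as the byte &8D followed by three bytes, so every two-digit line number grows by two bytes. The sketch below assumes the classic Acorn encoding, which the Windows tokeniser is taken to share; FNencline is a hypothetical helper name. The same arithmetic suggests why the adapted routine helps: RECTANGLE appears to tokenise to a single byte, so the nine-character dummy keyword frees eight bytes of slack, and because the prefix ("1", the RECTANGLE token and ":") now occupies three bytes, the result is read from !332+3 rather than !332+2.

<code bb4w>
      REM Sketch (assumption): Acorn-style encoding of a line number
      REM after GOTO/GOSUB as &8D followed by three bytes
      e$ = FNencline(10)
      h$ = ""
      FOR i% = 1 TO LEN(e$)
        h$ += STR$~ASC(MID$(e$, i%, 1)) + " "
      NEXT
      PRINT h$
      REM "ON A% GOTO " (11 characters) tokenises to 7 bytes, each of
      REM the five 2-digit line numbers becomes 4 bytes, and the four
      REM commas are unchanged
      PRINT LEN("ON A% GOTO 10,20,30,40,50"), 7 + 5 * LEN(e$) + 4
      END

      DEF FNencline(n%)
      LOCAL lo%, hi%
      lo% = n% AND &FF
      hi% = (n% DIV 256) AND &FF
      REM Byte 1 packs the top two bits of each half, EORed with &54;
      REM bytes 2 and 3 hold the low six bits of each half, ORed with &40
      = CHR$&8D + CHR$(((lo% AND &C0) DIV 4 OR (hi% AND &C0) DIV 16) EOR &54) + CHR$((lo% AND &3F) OR &40) + CHR$((hi% AND &3F) OR &40)
</code>

For line 10 this should print ''8D 54 4A 40'' (that is, &8D followed by "TJ@"), then 25 and 31, matching the figures above.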
\\
==== In ARM BASIC: ====
<code bb4w>
        SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
        B%=EVAL("0:"+A$)
        token$=$(A%-14)
</code>
\\
==== In 6502 BASIC: ====
<code bb4w>
        A%=EVAL("0:"+A$)
        token$=$((!4 AND &FFFF)-LENA$-1)
</code>
\\ By prefixing the code you want to tokenise with "0:", you can safely pass it to **EVAL** without provoking a Syntax error. You can then extract the tokenised code from memory, provided you do so immediately after calling **EVAL**.\\ \\ This can be wrapped up in functions as follows:\\
<code bb4w>
        DEF FNTokenise_Win(A$):LOCAL A%,B%
        WHILELEFT$(A$,1)=" ":A$=MID$(A$,2):ENDWHILE
        ...
        DEF FNTokenise_65(A$):LOCAL A%
        A%=EVAL("0:"+A$):=$((!4 AND &FFFF)-LENA$-1)
</code>
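When testing FNTokenise_Win or FNTokenise_65 it helps to see the bytes they return. The following sketch lists each byte of a string as two hex digits; FNhex is a hypothetical helper, not part of the Tokenise library, and it is demonstrated on a literal string so that it runs on its own.

<code bb4w>
      REM Sketch: list each byte of a string as two hex digits
      REM (hypothetical helper FNhex)
      PRINT FNhex(CHR$&F1 + " 1" + CHR$13)
      END

      DEF FNhex(s$)
      LOCAL i%, h$
      i% = 1
      WHILE i% <= LEN(s$)
        REM Pad to two digits so that, e.g., CR shows as 0D
        h$ += " " + RIGHT$("0" + STR$~ASC(MID$(s$, i%, 1)), 2)
        i% += 1
      ENDWHILE
      REM Drop the leading space
      = MID$(h$, 2)
</code>

The demonstration line prints ''F1 20 31 0D''. In practice you would write something like ''PRINT FNhex(FNTokenise_Win("PRINT 1"))'' to inspect the tokenised form of a statement.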
\\ These functions are included in full in the 'Tokenise' BASIC library at [[http://mdfs.net/System/Library/BLib|mdfs.net]].\\ \\ A text file can then be tokenised using the following code:\\
<code bb4w>
      in%=OPENIN(text$)
      out%=OPENOUT(basic$)
      ...
      CLOSE#out%:out%=0
      CLOSE#in%:in%=0
</code>
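Once a line has been tokenised, it still has to be wrapped in a program-line record before it is written out. The sketch below assumes the Windows layout: a length byte counting the whole record, the line number (low byte, then high byte), the tokens, and a terminating CR. Check the Format page before relying on it, since Acorn BASIC uses a different layout. FNline and PROCputbytes are hypothetical names.

<code bb4w>
      REM Sketch, assuming the Windows program-line layout:
      REM [length] [line number low] [line number high] [tokens] [CR]
      REM where the length byte counts the whole record
      r$ = FNline(10, CHR$&F1 + " 1")
      PRINT LEN(r$), ASC(r$)
      END

      DEF FNline(num%, tok$)
      REM The length byte cannot exceed 255, so at most 251 token bytes
      IF LEN(tok$) > 251 ERROR 100, "Line too long"
      = CHR$(LEN(tok$) + 4) + CHR$(num% AND &FF) + CHR$(num% DIV 256) + tok$ + CHR$13

      REM Write a record to an open file one byte at a time
      DEF PROCputbytes(ch%, s$)
      LOCAL i%
      i% = 1
      WHILE i% <= LEN(s$)
        BPUT #ch%, ASC(MID$(s$, i%, 1))
        i% += 1
      ENDWHILE
      ENDPROC
</code>

Within the file loop, a record could then be written with something like ''PROCputbytes(out%, FNline(num%, token$))'', where num% (hypothetical) holds the line number being written.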
\\
==== Notes ====
Acorn BBC BASIC programs are stored slightly differently; see [[/Format|Format]] and the relevant pages on [[http://beebwiki.mdfs.net/|Acorn-specific sites]] for details.\\ \\ This technique may fail if the tokenised code is //longer// than the original text, which can happen if it contains an **ON GOTO** or **ON GOSUB** statement. The problem can be mitigated to some extent as follows (for Windows BASIC):\\
<code bb4w>
        B%=EVAL("0OTHERWISE:"+A$)
        token$=$(!332+3)
</code>
\\
==== See also ====
Last modified: 2024/01/05 00:16 (external edit)