=====Tokeniser=====
//by JGH, June 2006//\\ \\ BBC BASIC programs are tokenised, that is, BASIC keywords are stored as one-byte values. This results in programs which execute faster and are more compact.\\ \\ A tokenised line can easily be detokenised, or expanded, as there is a one-to-one mapping between token values and the expanded string. For example, code similar to the following would expand a tokenised line:\\
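REM token$() is assumed to hold the keyword string for each token value
quote%=FALSE :REM TRUE while inside a quoted string
REPEAT
IF (?addr%<128 AND ?addr%>31) OR quote% THEN VDU ?addr% ELSE PRINT token$(?addr%);
IF ?addr%=34 quote%=NOT quote% :REM Tokens are not expanded inside quotes
addr%=addr%+1
UNTIL ?addr%=13 :REM Stop at the carriage return that ends the line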
Tokenising, however, is more fiddly. Tokens can be abbreviated on entry and characters are only tokenised at certain parts of the line. For instance, in the following line:\\
ON NOON GOTO 1,2
the first 'ON' is the token ON, but the second 'ON' is part of the variable 'NOON'. The second 'ON' must be left untokenised.\\ \\ The **EVAL** function tokenises the supplied string and evaluates it as an expression. Usefully, the tokenised string can be retrieved from where BASIC has stored it.\\ \\
==== In Windows BASIC: ====
B%=EVAL("0:"+A$)
token$=$(!332+2)
This code may fail if an event interrupt (e.g. ON TIME) occurs between the two statements. To avoid this, use the following alternative, which (in //BBC BASIC for Windows// version 6 only) does not allow an intervening interrupt. The expression is "1" rather than "0" so that the IF condition is true and the assignment is executed:\\
IF EVAL("1:"+A$) token$=$(!332+2)
The input and output share the same memory buffer, which is OK so long as the tokenising process shortens the code (which is almost always the case) but can cause a crash if it lengthens the code. That can happen only in exceptional circumstances such as the following code:\\
ON A% GOTO 10,20,30,40,50
The tokenising process encodes the line numbers in a special internal format which results in the overall length increasing from 25 to 31 bytes. To reduce the chance of this causing a crash the tokenising routine can be adapted as follows:\\
IF EVAL("1RECTANGLE:"+A$) token$=$(!332+3)
\\
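The figures quoted above can be checked by hand. The following sketch assumes that each keyword is stored as one byte and each line number as four bytes (a marker byte followed by three encoded bytes); the byte counts in the comments are worked out from that assumption rather than read back from the interpreter:

```bbcbasic
REM Source text: 25 characters
PRINT LEN("ON A% GOTO 10,20,30,40,50")
REM Tokenised: ON (1) + space (1) + A% (2) + space (1) + GOTO (1) + space (1)
REM plus five line numbers at 4 bytes each (20) and four commas (4)
PRINT 1+1+2+1+1+1+5*4+4
```

The two values printed are 25 and 31. The longer prefix in the adapted routine works the same way in reverse: assuming RECTANGLE is stored as a single token, the eleven characters of "1RECTANGLE:" shrink to three bytes, which leaves eight bytes of slack and is why the result is read from !332+3 rather than !332+2.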
==== In ARM BASIC: ====
SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
B%=EVAL("0:"+A$)
token$=$(A%-14)
\\
==== In 6502 BASIC: ====
A%=EVAL("0:"+A$)
token$=$((!4 AND &FFFF)-LENA$-1)
\\ By preceding the code you want to tokenise with "0:" you can safely pass it to **EVAL** without provoking a Syntax error. You can then extract the tokenised code from memory, so long as you do it immediately after calling **EVAL**.\\ \\ This can be written as functions as follows:\\
DEF FNTokenise_Win(A$):LOCAL A%,B%
WHILE LEFT$(A$,1)=" ":A$=MID$(A$,2):ENDWHILE
B%=EVAL("0:"+A$):=$(!332+2)
:
DEF FNTokenise_ARM(A$):LOCAL A%,B%
SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
B%=EVAL("0:"+A$):=$(A%-13)
:
DEF FNTokenise_65(A$):LOCAL A%
A%=EVAL("0:"+A$):=$((!4 AND &FFFF)-LENA$-1)
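As a quick check, the tokenised bytes can be listed in hexadecimal. The following is a minimal sketch, assuming //BBC BASIC for Windows// and the FNTokenise_Win function above; the test string and the loop variable i% are chosen only for illustration:

```bbcbasic
REM Tokenise a line whose variable name NOON contains the keyword ON
token$=FNTokenise_Win("FOR NOON=1 TO 12")
REM Print each byte of the result in hexadecimal, separated by spaces
FOR i%=1 TO LEN(token$)
PRINT ;~ASC(MID$(token$,i%,1));" ";
NEXT
PRINT
```

The keywords FOR and TO should each appear as a single byte of &80 or above, while the letters of NOON should remain as plain ASCII characters.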
\\ These functions are used in full in the 'Tokenise' BASIC library at [[http://mdfs.net/System/Library/BLib|mdfs.net]].\\ \\ A text file can then be tokenised using the following code:\\
in%=OPENIN(text$)
out%=OPENOUT(basic$)
line%=1 :REM Start from an arbitrary line number
REPEAT
line$=FNTokenise_Win(GET$#in%) :REM Read line and tokenise it
BPUT#out%,LENline$+4 :REM Output line length
BPUT#out%,line%:BPUT#out%,line%DIV256 :REM Output line number
BPUT#out%,line$;:BPUT#out%,13 :REM Output line and CR terminator
line%+=1 :REM Increment line number
UNTIL EOF#in%
BPUT#out%,0:BPUT#out%,&FF:BPUT#out%,&FF :REM Output program terminator
CLOSE#out%:out%=0
CLOSE#in%:in%=0
\\
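The record layout written by the code above (length byte, two-byte line number, tokenised text, carriage return) can be checked by reading the file back. This is a sketch only, assuming //BBC BASIC for Windows// and the same basic$ filename; the variable len% is introduced here for illustration:

```bbcbasic
REM Walk the line records in the tokenised file and summarise each one
in%=OPENIN(basic$)
REPEAT
  len%=BGET#in%
  REM A length byte of zero marks the program terminator
  IF len%<>0 THEN
    line%=BGET#in%                 :REM Line number, low byte
    line%+=256*BGET#in%            :REM Line number, high byte
    PTR#in%=PTR#in%+len%-3         :REM Skip the text and the CR
    PRINT "Line ";line%;": ";len%-4;" bytes of tokenised text"
  ENDIF
UNTIL len%=0
CLOSE#in%:in%=0
```

Three bytes of each record have been read by the time the pointer is moved, so skipping a further len%-3 bytes lands on the next length byte.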
==== Notes ====
Acorn BBC BASIC programs are stored slightly differently. See [[/Format|Format]] and relevant pages on [[http://beebwiki.mdfs.net/|Acorn-specific sites]] for details.\\ \\ This technique may fail if the tokenised code is //longer// than the original text version, which can happen if it contains an **ON GOTO** or **ON GOSUB** statement. This problem may be mitigated to some extent as follows (for Windows BASIC):\\
B%=EVAL("0OTHERWISE:"+A$)
token$=$(!332+3)
\\
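Following the pattern of the functions given earlier, this mitigation could be wrapped up as a function. This is only a sketch: the name FNTokenise_WinLong is hypothetical and does not come from the 'Tokenise' library, and the offset of 3 assumes that "0OTHERWISE:" tokenises to three bytes, as the code above implies:

```bbcbasic
DEF FNTokenise_WinLong(A$):LOCAL B%
REM Strip leading spaces, as FNTokenise_Win does
WHILE LEFT$(A$,1)=" ":A$=MID$(A$,2):ENDWHILE
REM The longer prefix leaves spare room in the shared buffer
B%=EVAL("0OTHERWISE:"+A$):=$(!332+3)
```

As with the other Windows functions, the result must be read immediately after **EVAL** has run.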
==== See also ====
[[http://beebwiki.jonripley.com/Tokeniser|Using the tokeniser]] on BeebWiki for details of using the tokeniser on 6502, Z80, 32000, ARM, DOS and Windows BASIC.\\ \\
==== References ====
Richard Russell, "Using the tokeniser", [[http://tech.groups.yahoo.com/group/bb4w/|BBC BASIC for Windows Yahoo! group]] message [[http://tech.groups.yahoo.com/group/bb4w/message/86|86]].