=====Tokeniser=====
//by JGH, June 2006//\\
\\
BBC BASIC programs are tokenised: BASIC keywords are stored as one-byte values. This makes programs more compact and lets them execute faster.\\
\\
A tokenised line is easy to detokenise, or expand, because there is a one-to-one mapping between token values and the expanded strings. For example, code similar to the following would expand a tokenised line, where addr% points to the first byte of the line and token$() holds the keyword for each token value:\\
<code>
quote%=FALSE
REPEAT
  IF (?addr%<128 AND ?addr%>31) OR quote% THEN VDU ?addr% ELSE PRINT token$(?addr%);
  IF ?addr%=34 quote%=NOT quote%
  addr%=addr%+1
UNTIL ?addr%=13
</code>
This simple version does not expand the encoded line numbers that follow **GOTO** and **GOSUB**.\\
\\
Tokenising, however, is more fiddly. Keywords can be abbreviated on entry, and they are only tokenised in certain positions in the line. For instance, in the following line:\\
<code>
ON NOON GOTO 1,2
</code>
the first 'ON' is the token ON, but the second 'ON' is part of the variable name 'NOON' and must be left untokenised.\\
\\
The **EVAL** function tokenises the supplied string and evaluates it as an expression. Usefully, the tokenised string can then be retrieved from where BASIC stored it.\\
\\
==== In Windows BASIC: ====
<code>
B%=EVAL("0:"+A$)
token$=$(!332+2)
</code>
This code may fail if an event interrupt (e.g. ON TIME) occurs between the two statements. To avoid this, use the following alternative, which (in //BBC BASIC for Windows// version 6 only) does not allow an intervening interrupt:\\
<code>
IF EVAL("1:"+A$) token$=$(!332+2)
</code>
The input and output share the same memory buffer. This is safe as long as tokenising shortens the code, which is almost always the case, but it can cause a crash if tokenising lengthens the code. That happens only in exceptional circumstances, such as the following line:\\
<code>
ON A% GOTO 10,20,30,40,50
</code>
The tokenising process encodes the line numbers in a special internal format, which increases the overall length of this line from 25 to 31 bytes.
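The growth from 25 to 31 bytes can be checked by hand. The following Python sketch (not BBC BASIC) models it off-host under stated assumptions: it uses the Acorn encoding for line numbers, a marker byte &8D followed by three bytes, and the Acorn token values &EE for ON and &E5 for GOTO. This page does not confirm that //BBC BASIC for Windows// uses identical values, so treat them as assumptions. The helper names (encode_line_number, decode_line_number, tokenise_on_goto, detokenise) are hypothetical. detokenise follows the expansion loop shown earlier, with one addition: it decodes the &8D line-number references that the simple loop would print as a token followed by stray characters.

```python
# Off-host sketch (Python, not BBC BASIC) of how a line such as
#   ON A% GOTO 10,20,30,40,50
# grows when tokenised. ASSUMPTIONS: Acorn-style line-number encoding
# (&8D marker + 3 bytes) and Acorn token values for ON and GOTO.

LINE_NUMBER_MARKER = 0x8D
TOKEN_ON = 0xEE    # assumed token value for ON
TOKEN_GOTO = 0xE5  # assumed token value for GOTO


def encode_line_number(n: int) -> bytes:
    """Encode a line number (0-65535) as &8D followed by three bytes."""
    lo, hi = n & 0xFF, (n >> 8) & 0xFF
    # top two bits of lo and hi, packed together and XORed with &54
    b1 = (((lo & 0xC0) >> 2) | ((hi & 0xC0) >> 4)) ^ 0x54
    b2 = (lo & 0x3F) | 0x40  # low six bits of the low byte
    b3 = (hi & 0x3F) | 0x40  # low six bits of the high byte
    return bytes([LINE_NUMBER_MARKER, b1, b2, b3])


def decode_line_number(encoded: bytes) -> int:
    """Inverse of encode_line_number; expects the four bytes starting at &8D."""
    if len(encoded) != 4 or encoded[0] != LINE_NUMBER_MARKER:
        raise ValueError("not an encoded line number")
    t = encoded[1] ^ 0x54
    lo = ((t & 0x30) << 2) | (encoded[2] & 0x3F)
    hi = ((t & 0x0C) << 4) | (encoded[3] & 0x3F)
    return (hi << 8) | lo


def tokenise_on_goto(var: str, targets: list[int]) -> bytes:
    """Hand-tokenise 'ON <var> GOTO n1,n2,...', keeping the spaces as typed."""
    out = bytearray([TOKEN_ON])
    out += f" {var} ".encode("ascii")
    out.append(TOKEN_GOTO)
    out += b" "
    out += b",".join(encode_line_number(n) for n in targets)
    return bytes(out)


def detokenise(line: bytes, tokens: dict[int, str]) -> str:
    """Expand a tokenised line, also decoding &8D line-number references."""
    out = []
    quote = False
    i = 0
    while i < len(line) and line[i] != 13:  # stop at CR or end of data
        b = line[i]
        if quote or 32 <= b < 128:
            out.append(chr(b))  # plain character, or anything inside quotes
            if b == 34:
                quote = not quote
            i += 1
        elif b == LINE_NUMBER_MARKER:
            out.append(str(decode_line_number(line[i:i + 4])))
            i += 4
        else:
            out.append(tokens.get(b, f"<&{b:02X}>"))  # unknown tokens shown as hex
            i += 1
    return "".join(out)


source = "ON A% GOTO 10,20,30,40,50"
tokenised = tokenise_on_goto("A%", [10, 20, 30, 40, 50])
print(len(source), "->", len(tokenised))  # 25 -> 31
print(detokenise(tokenised, {TOKEN_ON: "ON", TOKEN_GOTO: "GOTO"}))  # ON A% GOTO 10,20,30,40,50
```

Each line number costs four bytes instead of two, which accounts for the extra six bytes: the two keywords save five bytes, while the five line numbers add ten, plus the single space kept after GOTO is unchanged.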
To reduce the chance of such a crash, the tokenising routine can be adapted as follows:\\
<code>
IF EVAL("1RECTANGLE:"+A$) token$=$(!332+3)
</code>
==== In ARM BASIC: ====
<code>
SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
B%=EVAL("0:"+A$)
token$=$(A%-14)
</code>
==== In 6502 BASIC: ====
<code>
A%=EVAL("0:"+A$)
token$=$((!4 AND &FFFF)-LENA$-1)
</code>
Prefixing the code you want to tokenise with "0:" lets you pass it safely to **EVAL** without provoking a Syntax error. You can then extract the tokenised code from memory, as long as you do so immediately after calling **EVAL**.\\
\\
These steps can be wrapped up as functions:\\
<code>
DEF FNTokenise_Win(A$):LOCAL A%,B%
WHILE LEFT$(A$,1)=" ":A$=MID$(A$,2):ENDWHILE
B%=EVAL("0:"+A$):=$(!332+2)
:
DEF FNTokenise_ARM(A$):LOCAL A%,B%
SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
B%=EVAL("0:"+A$):=$(A%-13)
:
DEF FNTokenise_65(A$):LOCAL A%
A%=EVAL("0:"+A$):=$((!4 AND &FFFF)-LENA$-1)
</code>
Complete versions of these functions are in the 'Tokenise' BASIC library at [[http://mdfs.net/System/Library/BLib|mdfs.net]].\\
\\
A text file can then be converted into a tokenised //BBC BASIC for Windows// program file using the following code:\\
<code>
in%=OPENIN(text$)
out%=OPENOUT(basic$)
line%=1                                  :REM Start from an arbitrary line number
REPEAT
  line$=FNTokenise_Win(GET$#in%)         :REM Read a line and tokenise it
  BPUT#out%,LENline$+4                   :REM Output line length
  BPUT#out%,line%:BPUT#out%,line%DIV256  :REM Output line number, low byte first
  BPUT#out%,line$;:BPUT#out%,13          :REM Output tokenised line and terminating CR
  line%+=1                               :REM Increment line number
UNTIL EOF#in%
BPUT#out%,0:BPUT#out%,&FF:BPUT#out%,&FF  :REM Output program terminator
CLOSE#out%:out%=0
CLOSE#in%:in%=0
</code>
==== Notes ====
Acorn BBC BASIC programs are stored slightly differently. See [[/Format|Format]] and relevant pages on [[http://beebwiki.mdfs.net/|Acorn-specific sites]] for details.\\
\\
This technique may fail if the tokenised code is //longer// than the original text, which can happen if it contains an **ON GOTO** or **ON GOSUB** statement.
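The file written by the text-file loop above is a plain sequence of line records: a length byte (the tokenised text length plus 4), the line number low byte then high byte, the tokenised text, and a CR, with 0, &FF, &FF marking the end. Because the length must fit in one byte, a line in this layout can hold at most 251 bytes of tokenised text. The Python sketch below packs and re-reads that layout off-host. It simply mirrors the BBC BASIC loop on this page rather than an independent format specification, and write_program and read_program are hypothetical names.

```python
# Off-host sketch (Python) of the program-file layout written by the
# BBC BASIC loop above. Each line record is:
#   length (text length + 4), line number low byte, line number high byte,
#   tokenised text, CR (13)
# and the program ends with the three bytes 0, &FF, &FF.


def write_program(lines: list[bytes], first_line: int = 1) -> bytes:
    """Pack already-tokenised lines into records, numbering from first_line."""
    out = bytearray()
    for number, text in enumerate(lines, start=first_line):
        length = len(text) + 4
        if length > 255:
            raise ValueError(f"line {number}: too long for a one-byte length")
        out += bytes([length, number & 0xFF, (number >> 8) & 0xFF])
        out += text
        out.append(13)
    out += bytes([0, 0xFF, 0xFF])  # program terminator
    return bytes(out)


def read_program(data: bytes) -> list[tuple[int, bytes]]:
    """Walk the records back out as (line number, tokenised text) pairs."""
    result = []
    pos = 0
    while data[pos] != 0:  # a zero length byte starts the terminator
        length = data[pos]
        number = data[pos + 1] | (data[pos + 2] << 8)
        if data[pos + length - 1] != 13:
            raise ValueError(f"line {number}: record does not end with CR")
        result.append((number, data[pos + 3:pos + length - 1]))
        pos += length
    return result


image = write_program([b"A%=1", b"B%=2"])
print(read_program(image))  # [(1, b'A%=1'), (2, b'B%=2')]
```

Reading the records back is a quick way to confirm that a converted file has the expected line numbers and that every record ends with CR.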
The risk from **ON GOTO** and **ON GOSUB** lines can be reduced to some extent as follows (for Windows BASIC):\\
<code>
B%=EVAL("0OTHERWISE:"+A$)
token$=$(!332+3)
</code>
==== See also ====
[[http://beebwiki.jonripley.com/Tokeniser|Using the tokeniser]] on BeebWiki, for details of using the tokeniser on 6502, Z80, 32000, ARM, DOS and Windows BASIC.\\
\\
==== References ====
Richard Russell, "Using the tokeniser", [[http://tech.groups.yahoo.com/group/bb4w/|BBC BASIC for Windows Yahoo! group]] message [[http://tech.groups.yahoo.com/group/bb4w/message/86|86]].