by Richard Russell, March 2010
BBC BASIC for Windows provides native support for the Unicode Basic Multilingual Plane, allowing you to work with, output and print a wide range of foreign-language and other character sets with very little extra effort. The main Help documentation describes how to enable Unicode support.
The Unicode encoding used by BBC BASIC for Windows is UTF-8. This is used in preference to other encodings (for example UTF-16) for the following reasons:
UTF-8 has only one significant disadvantage compared with UTF-16: it is a variable-length encoding. That means you cannot determine the number of characters in a string using the LEN function (it returns the length in bytes, not in characters). Similarly, the COUNT function and features that depend on it (i.e. the WIDTH statement and the TAB(x) function) won't necessarily work as expected. Note that in any case COUNT, WIDTH and TAB(x) aren't generally useful when a proportionally spaced font is in use.
To overcome this disadvantage you can use the function FNulen listed below. It takes a Unicode (UTF-8) string as its parameter and returns the length of the string in characters:
    DEF FNulen(U$)
    LOCAL L%
    CP_UTF8 = 65001
    SYS "MultiByteToWideChar", CP_UTF8, 0, U$, LEN(U$), 0, 0 TO L%
    = L%
If passed a string containing only 7-bit ASCII text, the function will return the same value as LEN(U$).
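As a minimal sketch of the difference between the two lengths, the following program builds a short string containing one accented character and prints both values. The test string is purely illustrative, and the program assumes that the FNulen listing from this article has been appended after the END statement.

```bbcbasic
REM Illustrative only: byte length versus character length
REM Assumes FNulen (as listed in this article) is appended after END

REM e-acute is U+00E9, encoded in UTF-8 as the two bytes &C3 &A9
U$ = "caf" + CHR$(&C3) + CHR$(&A9)

PRINT "Bytes:      "; LEN(U$)    : REM 3 ASCII bytes plus 2 bytes for e-acute
PRINT "Characters: "; FNulen(U$) : REM each character counted once
END
```

Here LEN should report five bytes, while FNulen should report four characters, because the accented letter occupies two bytes in UTF-8.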
If you need to know the extent (that is, the physical width and height) of a Unicode (UTF-8) string, for example to centre it on the screen or on a printout, you can use the following procedure. It takes the handle of the device context, the string, and a structure in which the width and height are returned:
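    DEF PROCuextent(hdc%, U$, size{})
    LOCAL L%, U%
    REM FNulen also sets the global CP_UTF8 used below
    L% = FNulen(U$)
    REM DIM reserves 2*L%+1 bytes, leaving room to align to an even address
    DIM U% LOCAL 2*L%
    U% = (U% + 1) AND -2
    REM Convert to UTF-16, then measure with the wide-character API
    SYS "MultiByteToWideChar", CP_UTF8, 0, U$, LEN(U$), U%, L%
    SYS "GetTextExtentPoint32W", hdc%, U%, L%, size{}
    ENDPROC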
If passed a string containing only 7-bit ASCII text, the procedure will return the same size as GetTextExtentPoint32.
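The following usage sketch shows one way the procedure might be called to work out a horizontal offset for centring. It rests on several assumptions that are not part of the listings above: the member names cx% and cy% are arbitrary (any structure with two 32-bit integer members should do), @memhdc% is taken to be the screen device context with the current font selected, and width% is a hypothetical value standing in for the width of the output area in pixels. FNulen and PROCuextent, as listed in this article, are assumed to be appended after the END statement.

```bbcbasic
REM Illustrative only: measuring a UTF-8 string in order to centre it
REM Assumes FNulen and PROCuextent (as listed in this article) follow END

DIM size{cx%, cy%}   : REM two 32-bit integers, as in the Windows SIZE structure
width% = 640         : REM hypothetical width of the output area, in pixels

U$ = "caf" + CHR$(&C3) + CHR$(&A9)
PROCuextent(@memhdc%, U$, size{})

PRINT "Width:  "; size.cx%
PRINT "Height: "; size.cy%
PRINT "Left offset to centre: "; (width% - size.cx%) DIV 2
END
```

For a printout, the printer device context would presumably be passed in place of @memhdc%, with width% set to the width of the printable area.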