hellomike wrote: Wed 16 Aug 2023, 11:27
I copy/paste the code from the BB4W editor window into the BBCSDL editor. The source displays fine in the editor, but when run the string isn't displayed as expected and the reported LENgth is 32 bytes.
I suspect that you are forgetting that it is necessary to enable UTF-8 support at run-time using VDU 23,22... Here's an example which should work correctly if copied-and-pasted into BBCSDL or BB4W, so long as Unicode is enabled in the Options menu:
Code:
10 VDU 23,22,640;512;8,16,2,8 : REM Equivalent to MODE 0 but with UTF-8 enabled
15 INSTALL @lib$ + "utf8lib"
20 Test$ = "Téxt has áccentèd charácters"
30 PRINT
40 PRINT Test$
50 PRINT FN_ulen(Test$)
The key differences from your original are that Unicode (UTF-8) mode has been enabled in line 10 and that the length of the string is measured using the library function FN_ulen(), which returns the length in characters, whereas LEN() returns the length in bytes. Each of the four accented characters in the string occupies two bytes in UTF-8, which is why LEN() reports 32 for a string of 28 characters. But these differences have nothing to do with porting from BB4W to BBCSDL; the modified code runs correctly in both.
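To illustrate the difference between bytes and characters, here is a minimal sketch that counts UTF-8 characters without the library. FNutf8chars is a hypothetical helper of my own, not part of utf8lib and not necessarily how FN_ulen() is implemented. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so every other byte starts a new character. It assumes the string is valid UTF-8.

```bbcbasic
REM Sketch only: FNutf8chars is a hypothetical helper, not part of utf8lib
Test$ = "Téxt has áccentèd charácters"
PRINT LEN(Test$)          : REM length in bytes (32 if the source is stored as UTF-8)
PRINT FNutf8chars(Test$)  : REM length in characters (28)
END

REM Count the bytes that are NOT UTF-8 continuation bytes (10xxxxxx)
DEF FNutf8chars(s$)
LOCAL i%, n%
FOR i% = 1 TO LEN(s$)
  IF (ASC(MID$(s$, i%)) AND &C0) <> &80 THEN n% += 1
NEXT
= n%
```

In real code I would still use FN_ulen(); the sketch is only meant to show why the two lengths differ.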
it is hard to edit the string in the editor. I.e. in line 20, try clicking between the last 's' and closing quote. The caret won't go there.
What font are you using? I'm using the default DejaVuSansMono (at 11pt size) and I have no difficulty in positioning the caret between the final 's' and the closing quote. If you tell me what font you are using I will try to reproduce the effect you describe.
For sure all this is due to ANSI vs Unicode, or single-byte characters in a string vs multibyte characters.
There's nothing particularly wrong with ANSI so long as you restrict yourself to the ASCII character set, that is, characters with codes in the range &20 to &7E. What you should avoid, in both BB4W and BBCSDL, is using the old Code Page based approach, whereby codes &80 to &FF are mapped to a subset of 'foreign' characters depending on the currently-selected Code Page.
That was a terrible kludge introduced in the last century to provide limited support for foreign alphabets whilst retaining a one-byte-per-character encoding. Unicode made that obsolete decades ago, and whilst there is still limited support for it in both BB4W and BBCSDL it should ideally not be used.
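If you want to find out which literal strings in an old program stray outside that safe ASCII range, something along these lines might help. FNisascii is a hypothetical helper written for this post, not a library function. It returns TRUE only if every byte of the string lies in the range &20 to &7E.

```bbcbasic
REM Sketch only: FNisascii is a hypothetical helper, not a library function
PRINT FNisascii("Plain text")   : REM prints -1 (TRUE)
PRINT FNisascii("Téxt")         : REM prints 0 (FALSE): the accented character is not ASCII
END

REM TRUE if every byte of s$ is in the printable ASCII range &20 to &7E
DEF FNisascii(s$)
LOCAL i%, c%, ok%
ok% = TRUE
FOR i% = 1 TO LEN(s$)
  c% = ASC(MID$(s$, i%))
  IF c% < &20 OR c% > &7E THEN ok% = FALSE
NEXT
= ok%
```

Any string for which this returns FALSE depends on the encoding in use, so it is a candidate for checking after the source has been converted to UTF-8.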
When porting sources from BB4W to BBCSDL, what is the least painful way to deal with literal strings containing accented characters?
The 'least painful' way is to use Unicode (UTF-8) everywhere.