hellomike wrote: ↑Sun 27 Aug 2023, 13:39
What is going on here?
You've contrived to create an illegal file which contains both ANSI and UTF-8 encoded characters. Once you create something illegal you can't meaningfully ask how it will misbehave; it might explode your computer (not really, but my point is that you can't ask the question).
It's not permitted to mix ANSI and UTF-8 encodings in the same file; they are mutually exclusive. That has to be the case because some sequences of bytes are valid in both encodings, and there is no way of telling which is the correct interpretation.
For example, the ANSI string "NESCAFÉ©" (where the last two characters are E-acute and the Copyright symbol) and the UTF-8 string "NESCAFɩ" (where the last character is the Latin small letter iota, U+0269) have identical encodings: 4E 45 53 43 41 46 C9 A9.
Try it for yourself. Copy-and-paste this into the BB4W or BBCSDL editor in ANSI mode and run it:
Code:
a$ = "NESCAFÉ©"
FOR I% = 1 TO LEN(a$)
PRINT " " RIGHT$("0"+STR$~ASCMID$(a$,I%),2);
NEXT
PRINT
Now copy-and-paste this into the BB4W or BBCSDL editor in Unicode mode and run it:
Code:
a$ = "NESCAFɩ"
FOR I% = 1 TO LEN(a$)
PRINT " " RIGHT$("0"+STR$~ASCMID$(a$,I%),2);
NEXT
PRINT
You should find that they print the same sequence of bytes.
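If you want to cross-check the same point outside BBC BASIC, here is a minimal Python sketch. It assumes that "ANSI" means Windows code page 1252 (the usual case on a Western-European Windows system, but an assumption nonetheless), and the helper names are purely illustrative. It takes the eight bytes quoted above and decodes them both ways:

```python
# The byte sequence quoted in the post.
raw = bytes.fromhex("4E 45 53 43 41 46 C9 A9")

# Assumption: "ANSI" is Windows code page 1252.
as_ansi = raw.decode("cp1252")   # one character per byte
as_utf8 = raw.decode("utf-8")    # C9 A9 is a single two-byte sequence


def code_points(text: str) -> str:
    """Return the string's characters as U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Same bytes, two different strings of different lengths.
print(len(as_ansi), code_points(as_ansi))
print(len(as_utf8), code_points(as_utf8))

# Encoding each string back in its own encoding gives identical bytes.
print(hex_dump(as_ansi.encode("cp1252")))
print(hex_dump(as_utf8.encode("utf-8")))
```

Both decodings succeed, so nothing in the bytes themselves says which reading was intended; that information has to come from knowing the file's encoding in advance.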
