=====Counting the characters in a Unicode string=====
//by Richard Russell, March 2010//\\ \\ //BBC BASIC for Windows// provides native support for the [[http://
  * UTF-8 is represented as a //byte stream//, which is compatible with BBC BASIC'
  * UTF-8 has only one version, whereas UTF-16 is byte-order dependent (it has little-endian and big-endian versions).
  * UTF-8 is the preferred Unicode encoding for emails and web pages.

UTF-8 has only one significant disadvantage compared with UTF-16: it is a variable-length encoding. That means you cannot determine the number of characters in a string using the **LEN** function, which returns the length in bytes rather than in characters. Similarly, the **COUNT** function and the features that depend on it (the **WIDTH** statement and the **TAB(x)** function) won't necessarily work as expected. In any case, COUNT, WIDTH and TAB(x) aren't generally useful when a **proportionally spaced** font is in use.\\ \\ To overcome this disadvantage, the function **FNulen** listed below takes a Unicode (UTF-8) string as its parameter and returns the length of that string in characters:

<code bb4w>
DEF FNulen(U$)
LOCAL L%
SYS "
= L%
</code>

If passed a string containing only 7-bit ASCII text, the function returns the same value as **LEN(U$)**.\\ \\ If you need to know the **extent** (that is, the physical width and height) of a Unicode (UTF-8) string, for example to centre it on the screen or on a printout, you can use the following procedure:

<code bb4w>
DEF PROCuextent(hdc%,
LOCAL L%, U%
SYS "
ENDPROC
</code>

If passed a string containing only 7-bit ASCII text, the procedure returns the same size as **GetTextExtentPoint32**.
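
For comparison, the character count can also be obtained in plain BBC BASIC, without a **SYS** call. This is a minimal sketch under stated assumptions, not the approach taken by **FNulen**. It assumes the string holds valid UTF-8. In valid UTF-8, every character starts with exactly one lead byte and every continuation byte has the bit pattern 10xxxxxx. Counting the bytes for which (byte AND &C0) <> &80 therefore gives the number of characters. The name **FNulen2** is hypothetical.

<code bb4w>
DEF FNulen2(U$)
REM Hypothetical plain-BASIC sketch; assumes U$ holds valid UTF-8
LOCAL I%, N%
REM LOCAL numeric variables start at zero
REM A FOR loop always executes at least once, so treat "" separately
IF U$ = "" THEN = 0
FOR I% = 1 TO LEN(U$)
  REM Count each byte that is not a UTF-8 continuation byte (10xxxxxx)
  IF (ASC(MID$(U$, I%, 1)) AND &C0) <> &80 THEN N% += 1
NEXT
= N%
</code>

For example, "caf" + CHR$(&C3) + CHR$(&A9) (the word café encoded as UTF-8) is five bytes long, so **LEN** returns 5, whereas **FNulen2** returns 4. For 7-bit ASCII text it returns the same value as **LEN(U$)**, because no ASCII byte matches the continuation pattern. It counts code points, so a character outside the Basic Multilingual Plane (four bytes in UTF-8) counts as one. On very long strings, a byte-by-byte loop in interpreted BASIC may be slower than a single SYS call.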