Character Sets, definitions

Many of the character sets listed on the Advanced search page are in fact approximations; the actual set may be more complicated than the one shown.

For example, '\t\x0A ' is not the complete definition of White; it is only an approximate, incomplete equivalent. The full set that White corresponds to is: '\x09-\x0D\x20\xA0'
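To make the difference concrete, the following Python sketch lists the character codes that the approximate set leaves out. It is only an illustration: the two sets are written out by hand from the strings quoted above rather than parsed by StrongED, and the names `approx_white` and `full_white` are hypothetical.

```python
# Codes in the approximate set '\t\x0A ' (tab, line feed, space)
approx_white = {0x09, 0x0A, 0x20}

# Codes in the fuller set '\x09-\x0D\x20\xA0':
# the range 9..13, the space, and 0xA0 (the hard space in Latin-1 style alphabets)
full_white = set(range(0x09, 0x0D + 1)) | {0x20, 0xA0}

# Codes that White matches but the approximation does not
missing = sorted(full_white - approx_white)
print(missing)  # [11, 12, 13, 160]
```

So the approximation omits vertical tab, form feed, carriage return and the hard space.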

Note that other equivalence sets in the manual are not complete either. This is deliberate, as some sets are large: Alpha, for example, matches not only 'A-Za-z' but also all accented characters.

StrongED uses the TerritoryManager's character property tables for its predefined sets. To see which characters are included in a particular set you can use this BASIC program:

    REM Property code    | StrongED
    REM ------------------------------
    REM 0 = Control code | Ctrl
    REM 1 = Uppercase    | Upper
    REM 2 = Lowercase    | Lower
    REM 3 = Alphabetic   | Alpha
    REM 4 = Punctuation  | Punct
    REM 5 = White space  | White
    REM 6 = Digit        | Digit
    REM 7 = Hex digit    | Hex

    property% = 1 :REM set to 0-7 according to above table

    REM -1 selects the current territory; table% points to a 256-bit table
    SYS "Territory_CharacterPropertyTable",-1,property% TO table%

    FOR c% = 0 TO 255
      byte% = c% DIV 8 : bit% = 1<<(c% MOD 8)
      REM Brackets are needed: comparisons bind more tightly than AND
      IF ((table%?byte%) AND bit%) <> 0 THEN PRINT c%
    NEXT c%

This program displays, in decimal notation (0-255), the character codes that belong to the selected set.
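For readers without a RISC OS machine to hand, the table lookup used by the BASIC program can be sketched in Python. This is a rough illustration under stated assumptions: the table is taken to be 256 bits (32 bytes) with character c stored at bit (c MOD 8) of byte (c DIV 8), exactly as the BASIC loop indexes it, and the table here is built by hand from the White set quoted on this page rather than read from the Territory Manager. The function names `build_table` and `members` are hypothetical.

```python
def build_table(codes):
    """Pack character codes into a 32-byte bitmap, one bit per character."""
    table = bytearray(32)
    for c in codes:
        table[c // 8] |= 1 << (c % 8)
    return bytes(table)


def members(table):
    """Return the codes whose bit is set, testing each bit as the BASIC loop does."""
    return [c for c in range(256) if (table[c // 8] & (1 << (c % 8))) != 0]


# Stand-in for the White table: codes 9..13, space and 0xA0
white = list(range(0x09, 0x0D + 1)) + [0x20, 0xA0]
table = build_table(white)
print(members(table))  # [9, 10, 11, 12, 13, 32, 160]
```

Note that the test masks a single bit before comparing with zero; testing the whole byte would report every character in any byte that has at least one bit set.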

You can download the program, but you will need to change property% to the value for the set you wish to examine.


Page Information

Document URI: http://stronged.torrens.org/man/search/charsets.html
Page first published 5th January 2018
Last modified: Wed, 30 May 2018 09:33:39 BST
© 2018 - 2018 Richard Torrens.