For Advanced Search (once the Advanced option has been selected), parts of a search string are interspersed with symbols representing various types of wildcard. All the parts of the search string must be in double inverted commas ("). Anything else, not in double inverted commas, is interpreted as an element to be matched. Of course there is a way (escaping) of including a double inverted comma (or any other special elemental character) inside the parts of the search string.
Search syntax is different from Replace syntax, but both share much of the same syntax - so we will use coloured backgrounds to indicate which the section applies to, thus:
Search Both Replace
The concepts in Advanced Search and Replace are not difficult - but it can get very confusing working out how to express what you are searching for!
Probably the easiest way of doing this, if the expressions are at all complicated, is to work out the Search and Replace expressions separately in a StrongED page.
Once you have a pair of expressions worked out you can combine the clipboard with the Copy key (f7) to transfer both to the dialogue box. You need !IcnClipBrd (which is available from !Store) to do this.
When working out complicated Search expressions it can also be helpful to use the Advanced List of Found F2 to check that your expression catches exactly what you want it to catch.
Slightly more complicated, but may be easier to remember, and doesn't need !IcnClipBrd, is a method using F7 only.
LF | RISC OS, Linux |
CR | BBC, Spectrum |
LF+CR | |
CR+LF | Windows |
The newline type to match by these characters is decided during the search so it's perfectly fine to search multiple texts which have different newline types.
For example searching for "foo" $ "bar" will match wherever a line ends with 'foo' and where the next starts with 'bar', regardless of the newline type of the text(s) being searched.
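As a rough illustration outside StrongED, the behaviour of `"foo" $ "bar"` can be imitated with a Python regular expression that accepts any of the four newline types in the table above. This is only a sketch: StrongED decides the newline type per text, whereas this pattern simply accepts all four, and the names `NEWLINE`, `pattern` and `samples` are made up for the example.

```python
import re

# All four newline conventions from the table; two-byte forms come first
# so that CR+LF or LF+CR is not treated as two one-byte newlines.
NEWLINE = r"(?:\r\n|\n\r|\n|\r)"

# Rough regex counterpart of the StrongED expression  "foo" $ "bar"
pattern = re.compile("foo" + NEWLINE + "bar")

samples = {
    "LF": "foo\nbar",
    "CR": "foo\rbar",
    "LF+CR": "foo\n\rbar",
    "CR+LF": "foo\r\nbar",
}

# Try the same pattern against each newline convention.
results = {name: bool(pattern.search(text)) for name, text in samples.items()}
print(results)
```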
If you want to search for a string which contains " then that must be magiced thus \". Similarly, as \ is part of a "magic" character, if you want to search for a string that contains \ you must use \\.
Thus, for example, to replace \\"" by ""\\ and vice versa, you could use
Search | "\\\\\"\"" |
Replace | "\"\"\\\\" |
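If the text you want to match contains many quotes or backslashes, the escaping rule can be applied mechanically. The sketch below is a hypothetical helper written in Python (it is not part of StrongED) and it handles only the two cases described above, \" and \\.

```python
def to_search_string(literal: str) -> str:
    """Quote literal text as an advanced search string (hypothetical helper)."""
    # Escape backslashes first, so the backslashes added for the quotes
    # are not themselves doubled.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# The two pieces of text from the example above.
print(to_search_string(r'\\""'))
print(to_search_string(r'""\\'))
```

The two printed lines should agree with the Search and Replace strings shown in the table.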
A few more examples:
"01234567" | Matches "0" followed by "1" followed by "2" etc |
"string\t" | Matches "string" followed by a Tab character |
"\"" | Matches the " (doublequote) character |
"\\" | Matches the {\} (backslash) character |
There are a number of other "magic" characters which are listed in the section on shorthands
For an example, see the section on Markers.
There are a number of other escaped characters that can be used in Search strings (but not in Replace):
The escape character has many more uses. See Character Sets
Special symbols (@1 through @8) may be inserted in the search string so that parts of the search string can be inserted into the replace string. Mark @0 is already set to the start of the search string and @9 is set to the end - though both @0 and @9 can be reset if necessary. A similar use of @ markers is in Shortcuts (macro expansions) in Modefiles.
To show how these work, let's take a piece of text...
"To be,"@1" or not to be,"@2" that is"@3" the question:
...with this advanced replace string:
"@01="@01$"@02="@02$"@23="@23$"@09="@09$
@01=To be,
@02=To be, or not to be,
@23= that is
@09=To be, or not to be, that is the question:
@09 tends to be the most used, so there is a shorthand for this - @@.
Notice that the replace strings above include everything, including initial and end spaces, between the @ markers. I also used the $ sign in the Replace expression outside of inverted commas: this causes (a) NewLine character(s) to be entered.
You may also have noticed that the Search and, less so, Replace strings can get impossibly long. See the hint on Working out Search and Replace expressions.
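One way to picture the markers is as positions in the matched text, with a two digit range @nm being the text between position @n and position @m. The Python sketch below models the example on that assumption; the names `pieces`, `marks` and `rng` are invented for the illustration and say nothing about how StrongED stores its marks internally.

```python
text = "To be, or not to be, that is the question:"

# The string parts of the search expression, in order.
# A mark (@1, @2, @3) is set after each of the first three parts.
pieces = ["To be,", " or not to be,", " that is", " the question:"]
assert "".join(pieces) == text

marks = {0: 0}                      # @0: start of the match
pos = 0
for n, piece in enumerate(pieces[:-1], start=1):
    pos += len(piece)
    marks[n] = pos                  # @1, @2, @3
marks[9] = len(text)                # @9: end of the match


def rng(n: int, m: int) -> str:
    """Text between marks @n and @m, i.e. the range written @nm."""
    return text[marks[n]:marks[m]]


for n, m in [(0, 1), (0, 2), (2, 3), (0, 9)]:
    print(f"@{n}{m}={rng(n, m)}")
```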
It may seem that the 2 digit @nm elements are only for use in the replace expression. However they can also be used in the searches - after the single digit @n @m markers have been set: this is explained in Back References.
A single asterisk * searches in the line until the next element of the search expression is found.
A double asterisk ** searches in the text until the next element of the search expression is found.
If you have read the page How a search is performed you can understand that what the * wildcard does is to advance the pointer Em to the next element (in this case e5) then to advance the pointer Tm until either the end of line or the element e5 matches. If that happens, the search is continued. But if an end of line occurs first, the match fails.
** does the same thing but does not abort at the end of line.
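For readers who know regular expressions, the difference between * and ** is loosely comparable to a non-greedy `.*?` without and with the ability to cross line ends. The Python sketch below is only an analogy under that assumption, not a description of StrongED's own engine, and the sample text is invented.

```python
import re

text = "start of line one\nend is on line two"

# Analogue of  "start" * "end" : '.' does not cross a newline by default,
# so the attempt is abandoned at the end of the line.
within_line = re.search(r"start.*?end", text)

# Analogue of  "start" ** "end" : DOTALL lets '.' run past the newline.
across_lines = re.search(r"start.*?end", text, re.DOTALL)

print(within_line)                  # no match within the first line
print(repr(across_lines.group()))   # match spans the line end
```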
A real life example is available.
Anything within single quotes is a character set. This will match if the character in the text is any one of the characters in the set. The character set can contain single characters and ranges. Thus you can define your own sets - there is a simple example of a character set.
However StrongED has a number of pre-defined Character sets which probably cover everything most users will ever need.
Some of these sets can be called by a single Character. Most have Shorthands. Some are Named. You can therefore use any of the first four columns below in the Search expression. These should not be inside ' ' single quotes.
Some of these, coloured thus can also be used in Replace expressions.
Entries coloured thus are character shortcut elements which can be used inside strings in both Search and Replace expressions. See note.
Char | Short | Name | Char Set or Hex value | Description |
---|---|---|---|---|
? | \a | Alpha | 'a-zA-Z' | Upper and lower case letters |
\a also includes all accented characters. See Character Sets | ||||
Upper | 'A-Z' | Upper case letters | ||
Lower | 'a-z' | Lower case letters | ||
Ctrl | '\x00-\x1F\x7F' | |||
\b | \x08 | Backspace | ||
\c | '\x00-\x1F\x7F' | All Control characters | ||
# | \d | Digit | '0-9' | Decimal digits |
D | \d | Digit | '0-9' | Decimal digits |
\e | \x1B | Escape character | ||
\f | \x0c | Form feed | ||
\h | Hex | '0-9a-fA-F' | Hexadecimal digits | |
See Hex Character Set | ||||
\i | '0-9a-zA-Z' | Identifier characters | ||
The set which \i matches depends on the definitions in the ModeFile (ID_FirstChar, ID_Middle and ID_LastChar) | ||||
\l | \x0A | Line feed | ||
\p | punct | '!'(),./:;\?' | Punctuation characters | |
\r | \x0D | Carriage Return | ||
\s | white | '\t\x0A ' | White space | |
See Handling White Space | ||||
\t | \x09 | Tab character | ||
\v | \x0B | Vertical tab | ||
AD | \w | AlphaNum | '0-9a-zA-Z' | AlphaNumeric characters |
\x | \xHH | Character code HH | ||
\x is a special case: the two characters following it are interpreted as a hex number and the corresponding letter/code is used. If the two characters following the \x are not valid hex digits, StrongED simply reports String not found | ||||
\" | Quote character in strings | |||
\\ | Backslash character in strings | |||
\+ | Turns case-sensitive matching on. | |||
\- | Turns case-sensitive matching off. | |||
That's \ minus | ||||
\= | Restores original case-sensitivity as defined in dbox or function. | |||
That's \ equals | ||||
CW | Matches word at caret | |||
CW will find the caret word even when it is part of another word. See Cursor Word example | ||||
$ | \n | NL | Newline | |
StrongED can be set to use LF, CR, LF CR or CR LF as New Line characters. $ or NL will match whatever is currently in use. | ||||
. | A full-stop matches any character other than Newline | |||
@ | Marks in search string, see section on Markers | |||
@@ | Short for @09 |
The upper case equivalent of a Set shorthand matches all characters that are NOT within that set. For example \D matches all characters that are not a decimal digit, i.e. that are not 0-9.
Names and character shorthands are case insensitive.
Use in a string of \ followed by a character not in the above list will simply match the character that comes after the '\'. Note that this character will obey the case-sensitivity set in the dbox or function, unlike other character shortcuts.
The . (full stop) metacharacter matches any character except for newline character(s). It can be used to absorb any characters that are irrelevant to the overall match. However, care must be taken when using . with a quantifier, e.g. { . }: because '.' matches everything but newline, any subsequent elements may fail to match. See example using ..
To remove any white space at the beginning of all lines in a text.
Search | <white |
Replace |
If you wish to define your own sets in a Search expression, some examples (note that \x precedes a hex number) are:
'01234567' | Matches any of the characters 0 to 7 |
'32104765' | Same as above. Ordering doesn't matter |
'0-7' | Same again, given as a range |
'a-zA-Z' | Matches any letter, upper or lowercase. |
'\-\' | Matches the "-", and the "\" |
'\x00-\xff' | Matches any single character (codes 0x00 to 0xFF) |
'\x41-\x5A' | Matches any single upper case letter |
'\x61-\x7A' | Matches any single lower case letter |
'\x20-\xff' | Matches any single character from space upwards: excludes the control characters 0x00-0x1F but includes top bit set characters |
'\t\x0A ' | Matches tab (\t) , linefeed (\x0A) and space |
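The point that ordering doesn't matter, and that a range is just a shorter way of writing the same set, can be checked with an analogy in Python. The sketch uses regular expression classes as stand-ins for StrongED character sets; the helper `members` is invented for the illustration.

```python
import re

# Regex classes standing in for the sets '01234567', '32104765' and '0-7'.
variants = ["[01234567]", "[32104765]", "[0-7]"]


def members(char_class: str) -> set:
    """All characters with codes 0x00-0xFF that the class matches."""
    rx = re.compile(char_class)
    return {chr(c) for c in range(256) if rx.fullmatch(chr(c))}


sets = [members(v) for v in variants]
print(sorted(sets[0]))
print(sets[0] == sets[1] == sets[2])    # same set, however it is written

# Stand-in for '\x41-\x5A': the upper case letters.
upper = members(r"[\x41-\x5A]")
print("".join(sorted(upper)))
```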
Character sets can be specially useful when combined with braces (curly brackets). A real life example of this repeat matching is available.
There are more shorthands with special meanings
A search for a group of n items will find the first match in a string of more than n items. See the example Exclusive repeat match
For example:
Search | "B" {"A"} |
Whereas
Search | "B" {"A"}+ |
To count all words you could use Advanced List of Found CouNt facility thus:
Search | {?}+ |
A real life example of set matching is available.
{ } - Repeat Element Matching is very similar to [ ] - Optional element. See Optional or Repeat Matching?
If you have an expression named foobar then you can search for that simply by entering foobar in the search field and making sure advanced search is active. Similarly if you had another expression named foobar you can use that in the replace field. Or any other expression you have named as a Replace expression.
There is more on Named Expressions in Search and Replace Patterns and in the ModeFile Syntax page
Named Expressions can also be used in sections of the ModeFile - Functions, KeyList and ClickList
However if the current element is false, then the next element after the | is evaluated. If this is true, the test proceeds. If false, the test is aborted.
So the | tells the search not to advance to the next character in the text but to test the present character once more against the next element in the search string.
Thus you can have many successive elements in a search expression separated by |. Some examples:
"a" "b" | "c" "d" matches abd and acd - i.e. an 'a' then 'b' or 'c' followed by 'd'
("a" "b" ) | ("c" "d") matches 'ab' or 'cd'
The second example uses grouping
Similarly the End of line - > is set before the New Line characters
This flag is very useful when writing HTML, for example, where many HTML tags are likely to be at the start of a new line, depending on your writing style.
Another example might be labels in an assembler text.
If you wanted to capture or delete all whole lines that contain 'foobar' you could use:
< * "foobar" * >
the elements being Start of line - search forward in line for string "foobar" - search forward in line for End of line marker.
Another example is in Numbering a List
< and > are most likely to be useful at the start or end of a search expression. If used in the middle of an expression, in most cases a match would not be made as you're already into the search expression and hence the current text line. But they can be useful, for example, in conjunction with **. For example:
Search | "a" ** (< "b") |
As an example of how NOT to use them, I thought it might be useful to add something to the ends of every line in a block:
Search | > |
Replace | "something" |
An example is in Numbering a List
These are similar to the other start and end flags but are set to True at start or end of a whole text.
These can be useful if you want to add something at the starts or ends of several files.
StrongED will apply alternation or negative lookahead only to the next element in the search expression. To make it look further ahead requires the elements to be placed inside parentheses.
"a" | "b" "c" | matches "a" or "b", followed by "c", so matches "ac" or "bc" |
"a" | ("b" "c") | matches "a" or "bc" |
There is a more involved example of grouping in our examples section.
There is an example of grouping used with a back reference to find html start and end tag pairs.
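If you are used to regular expressions, note that the ungrouped case behaves differently there: in a regex, `a|bc` already means "a" or "bc". The Python sketch below shows regex patterns that, as far as the two examples above go, correspond to the StrongED expressions. It is an analogy only, and the variable names are made up.

```python
import re

# StrongED  "a" | "b" "c"   : alternation covers only the next element,
# so the regex needs an explicit group around a|b.
ungrouped = re.compile(r"(?:a|b)c")

# StrongED  "a" | ("b" "c") : the group makes "bc" the alternative,
# which is what a plain regex a|bc already means.
grouped = re.compile(r"a|bc")

for s in ["ac", "bc", "a"]:
    print(s, bool(ungrouped.fullmatch(s)), bool(grouped.fullmatch(s)))
```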
Example
Search | "a" "b" ["xx"] "c" "d" |
string | ["a"]"b" | {"a"}"b" |
---|---|---|
b | b | b |
ab | ab | ab |
aab | ab | aab |
aaab | ab | aaab |
As an example {"a"}0:1 will match whether a is present or not.
["a"] will optionally match a - so will match whether a is present or not.
["a"] is then equivalent to {"a"}0:1
See also ~ - Ignore
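The table comparing ["a"]"b" with {"a"}"b" can be reproduced with the nearest regular expression equivalents, `a?b` and `a*b`. This Python sketch assumes that correspondence holds for these simple cases; it does not use StrongED itself.

```python
import re

optional = re.compile(r"a?b")   # stands in for ["a"]"b"
repeat = re.compile(r"a*b")     # stands in for {"a"}"b"

# Print the same three columns as the table: string, optional, repeat.
for s in ["b", "ab", "aab", "aaab"]:
    print(s, optional.search(s).group(), repeat.search(s).group())
```

In the same spirit, {"a"}0:1 corresponds to the regex `a{0,1}`, which is another way of writing `a?`.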
When placed before an element, ~ qualifies that element. This causes text that is matched by the qualified element to be ignored and the Text Match (Tm) pointer stays at the start of the ignored text. The Ta pointer searches ahead for a byte that doesn't match.
When a byte is found that does not match the qualified element, then the element itself is ignored and the Em pointer is moved to the next element, similar to grouped elements.
Since a qualified element is ignored when there is no match, beware of using a qualified element without a following element as, without a following element, the qualified element is re-tested ad infinitum - or until Escape is pressed!
See How a Search is Made to understand the pointers.
There is an example of Negative Lookahead
or ignore vowels
See also [ ] - Optional Element and ( ) - Grouping
It is best not to rely on @0 being automatically set, but to set it explicitly before use. @9 of course won't be set automatically in any event.
There is an example of Back References demonstrating how to find repeat words in a text.
There is an example of back reference used with grouping to find html start and end tag pairs.
Variants are &xxxx and &xxxxxxxx which match 16-bit and 32-bit entities respectively so are mainly of use in program writing.
As a point of interest, the search is little-endian, and thus reversed compared with other searches. If you were to search for the word "byte" you would have to search for the letters in the order "etyb" - so search for &65747962.
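If working out the reversed byte order by hand is error-prone, the value can be computed. The following Python sketch packs the characters as a little-endian integer and prints it in the &xxxxxxxx form; the variable names are arbitrary.

```python
# 32-bit entity: the four bytes of "byte", least significant byte first.
word = int.from_bytes(b"byte", "little")
print(f"&{word:08X}")

# 16-bit entity built the same way from the first two bytes.
half = int.from_bytes(b"by", "little")
print(f"&{half:04X}")
```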
This can be useful in languages such as PHP, Perl etc., that enclose function bodies in brackets. You could use these shorthands in Advanced LoF, for instance, to list all functions. Or you could search for a function name and then use a block shorthand to skip the function body. But you can also use it to search for any text within brackets.
Brace. \{ Checks that the character at the search position is a { and then searches forward for the matching }. Nesting is taken into account, braces in comments and strings are ignored. The search continues after the matching }.
Bracket. \( Checks that the character at the search position is a ( and then searches forward for the matching ). Any nesting is taken into account, parentheses in comments and strings are ignored. The search continues after the matching ).
Note that round brackets are used in search expressions to group things together and { } is used for Repeat Element matching, hence the need to use the escape character \ to obtain this effect.
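The nesting rule can be sketched in a few lines of Python. This is a simplified illustration only: the function `skip_block` is hypothetical, and unlike StrongED it makes no attempt to ignore brackets inside comments and strings.

```python
def skip_block(text: str, pos: int, open_ch: str = "{", close_ch: str = "}") -> int:
    """Return the index just after the bracket matching text[pos], or -1."""
    if pos >= len(text) or text[pos] != open_ch:
        return -1                   # not positioned on an opening bracket
    depth = 0
    for i in range(pos, len(text)):
        if text[i] == open_ch:
            depth += 1              # one level deeper
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1        # the search continues after the match
    return -1                       # unbalanced: no matching bracket


src = "function f() { if (x) { y(); } } rest"
end = skip_block(src, src.index("{"))
print(repr(src[end:]))
```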
Specifying &xx=yy (where xx and yy are hex values) in the search does a bit-wise AND of the current character with 'xx'. This is then compared to 'yy'. If there is a match, the search proceeds. If no match, then the search fails.
As an example
Search | &E0=00 |
A more useful example is
Search | &80=80 |
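To see which characters a given mask and value select, the test can be applied to all 256 byte values. The Python sketch below does this for the two examples above; `mask_match` is an invented name for the (character AND xx) = yy comparison.

```python
def mask_match(byte: int, mask: int, value: int) -> bool:
    """True when (byte AND mask) equals value, as in &xx=yy."""
    return (byte & mask) == value


# &E0=00 : the top three bits must all be clear.
control = [b for b in range(256) if mask_match(b, 0xE0, 0x00)]

# &80=80 : the top bit must be set.
top_bit = [b for b in range(256) if mask_match(b, 0x80, 0x80)]

print(hex(control[0]), hex(control[-1]), len(control))
print(hex(top_bit[0]), hex(top_bit[-1]), len(top_bit))
```

By this arithmetic, &E0=00 selects the codes 0x00 to 0x1F and &80=80 selects the top bit set codes 0x80 to 0xFF.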
A replace string consists of a number of elements. Each time a match is found and a replacement is made these elements are evaluated to obtain the actual replacement text. This version of StrongED allows a replace expression of a maximum 256 bytes. However the replacement text can be any length (memory permitting).
The elements in the replace expression can be any of:
NL $ | Newline character(s). The actual character(s) inserted are dependent on the line-ending of the text the replacement is applied to. |
Cnt | Inserts current value of the replace counter. After replace, the counter's value is increased. The counter is set up from the Search&Replace dialogue box. |
Name | Name of a replace expression as defined in the ModeFile. When this is used it must be the only thing in the replace string. |
Range | Text between two marks. This allows parts of the matched text to be preserved. |
"String" | Literal text to insert. None of the characters in a string have special meaning with the exception of the character shorthands mentioned below. The double quotes are *not* inserted. |
\b | 0x08 | Backspace |
\e | 0x1B | Escape character |
\f | 0x0C | Form feed |
\l | 0x0A | Line feed |
\r | 0x0D | Carriage return |
\t | 0x09 | Tab character |
\v | 0x0B | Vertical tab |
\x | 0xnn | Character code nn |
\" | 0x22 | Quote character |
\\ | 0x5C | Backslash |
Search | "&" {Hex}+ |