Expressions which may not work as you might expect

It may help to read this page in conjunction with the page How an Advanced Search is Made.

Some of the Advanced Search elements are very powerful, and combining two powerful elements can lead to surprises. On other occasions an element may not behave as you first expect. Here are a few examples.

~'abc' is not the same as '~abc'
A negated element can never, of itself, make a match, nor does it move the Em pointer. Instead it moves the Tm pointer past any text that matches the negated element and then, at the first character that does not match, tests that character against the succeeding positive element.

For example, consider

Search ~'abc'
matched against the text abcdef. a, b and c are skipped (as they match the negated set) and d is found. The negated element does not match d, so d is then tested against whatever follows ~'abc', which in this case is nothing. But the Tm pointer still points to the a of abcdef, so this is tested again, ad infinitum, or until Escape is pressed.

But if there is another element after the negated set, then the Em pointer is moved to the next element after the negated set.

Search ~'abc' "d"

This will match the text "abcdef".
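To make the pointer movements above concrete, here is a toy model in Python. It is an assumption-laden sketch, not StrongED's code: the negated set is modelled as a plain string of characters to skip, the following positive element as a literal string, and the hypothetical function match_negated_then returns the index just past the match, or None when no match is made.

```python
def match_negated_then(text, start, negated_set, following=None):
    """Toy model (assumption, not StrongED's code) of a negated set element.

    Characters in negated_set are skipped; the first character not in the
    set must then match the following positive element. Returns the index
    just past the match, or None when no match is made.
    """
    pos = start
    # The negated element only moves the text pointer past characters in its set.
    while pos < len(text) and text[pos] in negated_set:
        pos += 1
    # On its own the negated element can never make a match.
    if following is None:
        return None
    # The next text character is tested against the succeeding positive element.
    if text.startswith(following, pos):
        return pos + len(following)
    return None


print(match_negated_then("abcdef", 0, "abc", "d"))  # 4: a, b, c skipped, then d matched
print(match_negated_then("abcdef", 0, "abc"))       # None: nothing follows ~'abc'
```

In this model the second call makes no progress at all, which is why a search that keeps restarting from the a of abcdef never finishes.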
x * y is not the same as x {.} y (except when y = \n)
x * > is not the same as x * \n
The element > (end of line) does not include the newline character(s), so x * > matches everything up to, but not including, the newline. $ or \n is the character (or group of characters) that defines a newline in the current document, so x * \n includes them. A similar distinction applies to < (start of line).
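The same distinction exists in other regular-expression dialects. As an analogy only (Python's re module, not StrongED syntax), Python's $ with re.MULTILINE is a zero-width end-of-line assertion, like StrongED's >, while a literal \n consumes the newline, like StrongED's \n or $. Note that $ therefore means different things in the two dialects. Python's ^ with re.MULTILINE corresponds loosely to <.

```python
import re

text = "first line\nsecond line\n"

# Analogue of  x * >  : Python's '$' (with re.MULTILINE) is a zero-width
# end-of-line assertion, so the newline is NOT part of the match.
to_eol = re.search(r"first.*$", text, re.MULTILINE)

# Analogue of  x * \n : the newline character itself is consumed.
with_nl = re.search(r"first.*\n", text)

print(repr(to_eol.group()))   # 'first line'
print(repr(with_nl.group()))  # 'first line\n'
```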

In particular, don't use > to insert something at the end of a line:

Search >
with fred as the replacement never finishes. The fred will indeed be inserted at the end of the first line, but the search then advances, finds an end of line, inserts fred again, and so on until you press Escape!
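Why a zero-width match can loop like this: the sketch below is a hypothetical naive replace-all loop in Python (not StrongED's implementation) that resumes searching just after the inserted text. Because the end-of-line position has zero width, it is found again immediately at the same place. The max_steps parameter is an invented stand-in for pressing Escape.

```python
import re


def naive_insert_at_eol(text, insertion, max_steps=5):
    """Hypothetical naive replace-all loop (not StrongED's code).

    After each replacement the search resumes just after the inserted text.
    Because '$' matches a zero-width position, it matches again at the same
    end of line. max_steps stands in for pressing Escape.
    """
    pattern = re.compile(r"$", re.MULTILINE)
    pos = 0
    for _ in range(max_steps):
        m = pattern.search(text, pos)
        if m is None:
            break
        text = text[:m.start()] + insertion + text[m.end():]
        pos = m.start() + len(insertion)  # resume after the inserted text
    return text


print(repr(naive_insert_at_eol("one\ntwo\n", "fred")))
# 'onefredfredfredfredfred\ntwo\n'
```

Replace-all routines usually avoid this by stepping at least one character past an empty match before searching again.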
Do not use {~x} or {D}
These two elements never terminate, so an endless loop results.
Do not use x {.} y
This expression always fails.
Beware of * or ** at the start of an expression
Consider the following line of text:

MESSAGE StrongED_OpenDoc SCSI::SSD.$.Torrens.www.stronged/torrens/ 1226 1398 1292 1200 0 0 -autowrap -mode HTML

This was a line from the Obey file saved from StrongED's List of Windows. I wanted to remove all the window handling detail, that is, everything after the /inc. I tried the search expression

Search *" "####*$
which unexpectedly removed everything from the SCSI onwards (and the preceding space). The reasons for this are complicated, which makes the effect difficult to predict. It is inadvisable to start a search expression with any of the lookahead elements.

Search " "####*$
without the initial * works as you would expect. Fred Graute's explanation follows.

"I'll start with the working " "####*$ pattern. With this we try to match the " " against the 'M' of 'MESSAGE...', which fails, so we move one character forward to 'E', which also fails, and so on until we come to the space after 'MESSAGE'.

"This succeeds, so we try to match '#' against the 'S' of 'StrongED_OpenDoc'. Alas that fails, so again we move one character forward. We keep doing this until we reach the space after the filename, where finally the whole pattern matches.

"In the above case the match position is moved forward immediately when a match fails. With the failing *" "####*$ this is not the case.

"The *" "####*$ pattern is handled much the same except that the match position isn't moved forward when " " fails to match. Only when " " matches and '#' fails is the match position moved forward.

"When we come to 'SCSI...' the match position is at the initial 'S'. We try to match " ", which fails, but because of the '*' the match position is not advanced. Instead we simply look at the next character, and this goes on until we find the " " after the filename.

"We then try the rest of the pattern, '####*$', which also matches, so we have a match from 'SCSI' to the end of the line."
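Fred's description can be mimicked with a small Python toy model. It is a sketch of his explanation, not StrongED's code. It assumes that '#' matches one digit (consistent with '#' failing against 'S' above) and that the trailing * simply runs to the end of the line; the function names are hypothetical.

```python
LINE = ("MESSAGE StrongED_OpenDoc SCSI::SSD.$.Torrens.www.stronged/torrens/ "
        "1226 1398 1292 1200 0 0 -autowrap -mode HTML")


def space_then_four_digits(text, i):
    """True if text[i] is a space followed by four digits (the ' '#### part)."""
    digits = text[i + 1:i + 5]
    return text[i] == " " and len(digits) == 4 and digits.isdigit()


def match_without_leading_star(text):
    """' '####*$ : the match position moves forward after every failure."""
    for start in range(len(text)):
        if space_then_four_digits(text, start):
            return text[start:]  # the trailing * runs to the end of the line
    return None


def match_with_leading_star(text):
    """*' '####*$ as Fred describes it.

    When ' ' fails, the match position stays put and only the scan moves on.
    When ' ' matches but '#' fails, the match position moves past the space.
    """
    start = 0
    for scan in range(len(text)):
        if text[scan] != " ":
            continue                      # match position not moved
        if space_then_four_digits(text, scan):
            return text[start:]           # match runs from start to end of line
        start = scan + 1                  # ' ' matched, '#' failed: move forward
    return None


print(repr(match_without_leading_star(LINE)))  # starts at the space before 1226
print(repr(match_with_leading_star(LINE)))     # starts at 'SCSI::SSD...'
```

In this model the match position is left at the 'S' of 'SCSI' after the space following 'StrongED_OpenDoc' fails the '#' test, so the leading-* version reports the match Fred describes, from 'SCSI' to the end of the line.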


Page Information
Page first published 30th May 2018
Last modified: Tue, 11 May 2021 08:52:28 BST
© 2018 - 2024 Richard Torrens.