Finding Duplicate Words
As an example of how useful backreferences can be, this page shows how to search for duplicated words in a text. You should be familiar with Markers before trying to understand this.
The sequence @xy will match the text between the marks 'x' and 'y', where x and y are 0-9. Obviously these marks must have already been set in the search string before you can use them there. You are advised not to use marks 0 or 9 in backreferences.
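If you are more used to conventional regular expressions, a pair of marks followed by @xy behaves much like a capture group followed by a numbered backreference. The sketch below shows that analogy in Python. It is only an illustration: Python's re module, the pattern and the name same_on_both_sides are assumptions introduced here and are not part of this search syntax.

```python
import re

# (\d+) plays the role of a pair of marks: it remembers the digits it matched.
# \1 plays the role of the backreference: it must match that same text again.
PATTERN = re.compile(r"(\d+)=\1")

def same_on_both_sides(text):
    """Return True if text is '<digits>=<the same digits>'."""
    return PATTERN.fullmatch(text) is not None
```

With this sketch, same_on_both_sides("42=42") is true and same_on_both_sides("42=43") is false, because the backreference only accepts the exact text captured earlier.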
An example of where backreferences can be useful is in finding out whether a text contains doubled-up words such as "the the".
Here's an expression, using set shorthands, to do this:
Search | \s|\p @1 {\a}+ @2 \s|\p @12 \s|\p |
- \s|\p looks for a space OR punctuation
- @1 sets mark 1 just before any alpha character.
- {\a}+ looks for any alpha character, repeated 1 or more times.
- @2 sets mark 2. As the repeat must have finished by this point, something other than an alpha character must be coming up.
- \s|\p matches the space OR punctuation
- @12 matches the text marked between @1 and @2 earlier, so this is a repeated word.
- \s|\p requires the repeated word to be followed by a space OR punctuation
So if we have arrived here, the search has ended and we have found a duplicated word.
Note that instead of the set shorthands we could have used the predefined sets: White for \s, Punct for \p and Alpha for \a.
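To try the same idea outside this editor, here is a rough equivalent in Python. It does not use this search syntax: the re module, the name find_doubled_words, and the choice of ASCII letters for \a and of whitespace plus string.punctuation for \s|\p are all assumptions made for illustration. A capture group stands in for the @1 ... @2 pair and \1 stands in for @12.

```python
import re
import string

# Assumed stand-in for \s|\p: one whitespace or ASCII punctuation character.
SEP = "[" + r"\s" + re.escape(string.punctuation) + "]"

# Assumed stand-in for @1 {\a}+ @2: a captured run of ASCII letters.
WORD = r"([A-Za-z]+)"

# separator, word, separator, the same word again (\1), separator
DOUBLED = re.compile(SEP + WORD + SEP + r"\1" + SEP)

def find_doubled_words(text):
    """Return each word that is immediately repeated in text."""
    return [match.group(1) for match in DOUBLED.finditer(text)]

if __name__ == "__main__":
    print(find_doubled_words("Paris in the the spring."))
```

Like the expression above, this sketch needs a space or punctuation character on both sides of the pair, so a doubled word at the very start or end of the text is not reported. The comparison in the sketch is also case sensitive, so "The the" is not reported.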
If you wish to experiment you can use the sample text