In this small tutorial I will explain how find rules work in OmniMark.

Find rules can be used in uptranslate and crosstranslate conversions. See my post "The OmniMark conversion scheme" for an overview of the different conversion schemes.

A crosstranslate example

In a crosstranslate we transform structured input data without using a parser.

Find rules match patterns in the input data. Let’s look at an example.

OmniMark source code
  1. process
  2.   submit 'Can you show [me] some find rule examples ?'
  3.  
  4. find ' ?'
  5.   output '?'
  6.  
  7. find ['aeiou'] => myPattern
  8.   output 'ug' % myPattern
  9.  
  10. find '[' or ']'
  11.  
  12. find uc => upperToLower
  13.   output 'lg' % upperToLower
The resulting output
  cAn yOU shOw mE sOmE fInd rUlE ExAmplEs?

On line 2 we submit one line of text. In this simple example the text is included in the source code. In a real-life application the data would of course come from files or a database. As the data streams from input to output, OmniMark tries the find rules at each position in the order they appear in the source. The first rule that matches fires, and data that no rule matches passes through to the output unchanged.

Line 4 matches a space followed by a question mark and outputs only the question mark, so the space is removed.

On line 7 a character class containing the lowercase vowels is defined. Every single ‘a’, ‘e’, ‘i’, ‘o’ or ‘u’ matches and is captured in the pattern variable myPattern. On line 8 the matched text is written to the output in uppercase (the u modifier in the format string 'ug').

On line 10 we search for a left or right square bracket. When there is a match, the bracket is consumed and nothing is written to the output stream. In other words, a find rule that has only a header and no body consumes the matched data without replacing it.

On lines 12 and 13 we match every single uppercase letter and transform it to lowercase.
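Rule order matters when more than one find rule could match at the same position. The following is a minimal sketch, assuming that OmniMark tries find rules in the order they appear in the source and fires the first one that matches. The input text is made up for illustration.

```omnimark
process
  submit 'Why ? Why?'

; Tried first: a space directly followed by a question mark.
find ' ?'
  output '?'

; Tried second: any space the rule above did not match.
find ' '
  output '_'
```

Under that assumption the output should be 'Why?_Why?'. If the two rules were swapped, the single-space rule would fire first at every space, and the ' ?' rule would never get a chance to match.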

Nesting patterns

Let’s look at another small example.

OmniMark source code
  1. process
  2.   submit 'myfile.txt'
  3.          
  4. find (any** => file '.' any+ => extension) => fileName
  5.   output 'file is: ' || file || '%n'
  6.   output 'extension is: ' || extension || '%n'
  7.   output 'filename is: ' || fileName
The resulting output
  file is: myfile
  extension is: txt
  filename is: myfile.txt

On line 2 we submit the text ‘myfile.txt’ (not the file myfile.txt).

Line 4: any** means any character, zero or more times, up to the following pattern, which in this case is a dot. If we had used any* instead, it would have consumed the dot along with everything else, leaving nothing for '.' to match. The find rule would never be triggered and the output would be ‘myfile.txt’ (the same as the input data).
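For comparison, the any* variant described above would look roughly like this. It is only a sketch to illustrate the difference, not something you would want in practice.

```omnimark
process
  submit 'myfile.txt'

; any* is greedy: it consumes all of 'myfile.txt', so the '.' that
; follows it in the pattern can never match and the rule does not fire.
find (any* => file '.' any+ => extension) => fileName
  output 'file is: ' || file || '%n'
```

Because no rule fires, the input passes through to the output unchanged.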

any* and any+ are “greedy” patterns because they “eat” everything. We use any+ at the end of the pattern because we want everything after the dot until the end of the stream.

This example demonstrates nested patterns: we assign parts of the pattern to the pattern variables file and extension, and at the same time we assign the whole pattern to fileName.
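One consequence is worth sketching: because any** stops at the first dot and any+ then takes everything that is left, an input containing several dots is split at the first one. The input 'archive.tar.gz' below is a made-up example. The rule itself is the same as above.

```omnimark
process
  submit 'archive.tar.gz'

; any** stops at the first '.', any+ consumes the rest of the stream.
find (any** => file '.' any+ => extension) => fileName
  output 'file is: ' || file || '%n'
  output 'extension is: ' || extension || '%n'
  output 'filename is: ' || fileName
```

Given the behaviour described above, file should hold 'archive', extension 'tar.gz', and fileName the complete 'archive.tar.gz'.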

Conclusion

This short demo is only a glimpse of what you can do with find rules and pattern matching.

OmniMark's pattern-matching syntax is intuitive and very readable.

Feel free to ask questions!
