Solving the streaming paradigm with OmniMark referents
Last modified: 02/12/2018
Oct 09
OmniMark referents, XML
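Unlike typical XSLT processors, which build the whole document tree in memory, OmniMark uses a streaming model to process XML/SGML.
A streaming model has advantages (for example, the whole document never has to fit in memory), but it also has disadvantages. The XML flows from top to bottom, and you cannot randomly access the XML tree in memory.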
So how can you process an XML node at the beginning of a document if it depends on a node that has not streamed in yet, or that may never appear?
Let's say you have to add an attribute at the beginning of the document if, and only if, there is an annex at the end of the document.
Or you simply want to add a table of contents at the beginning of the document.
One solution is to process the XML document twice, but that is not very efficient.
OmniMark solves this problem with referents.
What is a referent?
A referent is a placeholder that you put in the output stream and fill in at a later time.
Of course, the output has to be buffered until the referent is filled. You decide when the referents are resolved and the buffered output is written: at the end of the program, at the end of every document that is processed, or within a certain scope inside the document, e.g. every table.
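To make the buffering idea concrete, here is a minimal Python model of it, applied to the annex scenario described earlier. It is only a sketch under stated assumptions: it is not OmniMark and not how OmniMark implements referents. The class `ReferentBuffer`, its methods, the function `add_annex_flag` and the attribute `has-annex` are all hypothetical names chosen for illustration. Output is kept as a list of text chunks and placeholders, and nothing is written until `resolve()` runs, which plays the role of the end of the referent scope.

```python
class ReferentBuffer:
    """Hypothetical Python model of OmniMark-style referents (not OmniMark's API)."""

    def __init__(self):
        self.parts = []   # buffered output: ("text", str) or ("ref", name)
        self.values = {}  # referent name -> current value

    def output(self, text):
        self.parts.append(("text", text))

    def output_referent(self, name):
        # Only a placeholder is stored; its value is looked up in resolve().
        self.parts.append(("ref", name))

    def set_referent(self, name, value):
        # A later assignment overwrites an earlier one.
        self.values[name] = value

    def resolve(self):
        """End of the referent scope: substitute values and return the text."""
        out = []
        for kind, value in self.parts:
            if kind == "text":
                out.append(value)
            elif value in self.values:
                out.append(self.values[value])
            else:
                # Design choice of this model: an unset referent is an error.
                raise KeyError(f"referent {value!r} was never set")
        self.parts.clear()
        return "".join(out)


def add_annex_flag(element_names):
    """One pass over a stream of child element names (illustrative input)."""
    buf = ReferentBuffer()
    buf.set_referent("annex-flag", "")  # default: no attribute
    buf.output("<doc")
    buf.output_referent("annex-flag")   # placeholder at the start of the output
    buf.output(">")
    for name in element_names:
        if name == "annex":
            buf.set_referent("annex-flag", ' has-annex="yes"')
        buf.output(f"<{name}/>")
    buf.output("</doc>")
    return buf.resolve()


print(add_annex_flag(["body", "annex"]))  # <doc has-annex="yes"><body/><annex/></doc>
print(add_annex_flag(["body"]))           # <doc><body/></doc>
```

The input is read only once: the attribute position is reserved up front, and the decision about its content is made when the annex does or does not show up.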
A simple example
OmniMark source code, simple referent example
```
process
   ; a comment line starts with a ";"
   ; '%n' is the escape character for newline
   output referent 'c' || '%n'
   output referent 'a' || '%n'
   output referent 'b' || '%n'
   output referent 'c'

   set referent 'c' to 'C-value'
   set referent 'a' to 'A-value'
   set referent 'b' to 'B-value'
   set referent 'c' to 'the value is C'
```
the output result
```
the value is C
A-value
B-value
the value is C
```
A referent has a unique name, e.g. referent 'the unique name', and can be placed at several places in the output stream, e.g. referent 'c'. All occurrences of a referent with the same name get the same value.
In this example the referents are output first, and only at a "later" time do we set their values. The referent 'c' first gets the value 'C-value' and later the final value 'the value is C'. So the value of a referent can change until it is resolved, and the value it has at that moment appears in every place where the referent was output.
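The "last value wins" behaviour of the simple example can be mimicked in a few lines of Python. This is a hypothetical model, not OmniMark: a placeholder is represented as a `("ref", name)` tuple, plain text as a string, and resolution is assumed to happen once, after all the `set referent` actions.

```python
# Hypothetical Python model of the simple OmniMark example (not OmniMark itself).
# Buffered output: plain strings, or ("ref", name) tuples as placeholders.
buffered = [("ref", "c"), "\n", ("ref", "a"), "\n", ("ref", "b"), "\n", ("ref", "c")]
values = {}

values["c"] = "C-value"
values["a"] = "A-value"
values["b"] = "B-value"
values["c"] = "the value is C"  # overwrites the earlier value of 'c'

# Resolution happens once, at the end, so every 'c' placeholder gets the final value.
result = "".join(values[p[1]] if isinstance(p, tuple) else p for p in buffered)
print(result)
```

Both occurrences of 'c' print "the value is C", matching the OmniMark output, because the placeholders are only looked up after the last assignment.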
Making a toc using referents
OmniMark source code, adding a toc before the doc
```
global stream toToc

process
   open toToc as referent 'myToc'
   do xml-parse scan '<doc><title>Title of doc</title>%n'
                  || '<div><title>first division title</title>'
                  || '<p>A paragraph in div1</p>'
                  || '</div>%n'
                  || '<div><title>second division title</title>'
                  || '<p>A paragraph in div2</p>'
                  || '</div>%n'
                  || '</doc>'
      output '%sc'
   done
   close toToc

element #implied
   output '<%q>%c</%q>'

element 'doc'
   output '<toc>%n'
   output referent 'myToc'
   output '</toc>%n'
   output '<%q>%n'
   output '%c'
   output '</%q>'

element title
   put toToc '%t'
   put #current-output & toToc '<%q>%c</%q>%sn'
```
the output result
```
<toc>
<title>Title of doc</title>
<title>first division title</title>
<title>second division title</title>
</toc>
<doc>
<title>Title of doc</title>
<div><title>first division title</title>
<p>A paragraph in div1</p></div>
<div><title>second division title</title>
<p>A paragraph in div2</p></div></doc>
```
We open the stream toToc and attach it to the referent 'myToc'. So whatever we output to the stream toToc flows into the referent 'myToc'.
In the last line of the element title rule, we send each title to two streams at the same time: #current-output and toToc. #current-output is a predefined OmniMark stream; as its name says, it is the current output.
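The same TOC technique can be modelled in Python to show the data flow: the TOC placeholder is emitted first, every title is written both to the main output and to a separate TOC stream, and the TOC stream's text becomes the referent's value at the end. This is a hypothetical sketch, not OmniMark's API; `with_toc`, the `(title, paragraph)` input format and the use of `io.StringIO` as a stand-in for the stream toToc are assumptions made for illustration.

```python
import io


def with_toc(sections):
    """Hypothetical one-pass model of the OmniMark TOC example (not OmniMark's API).

    sections: illustrative list of (title, paragraph) pairs.
    """
    output = []                 # buffered output: str, or ("ref", name) placeholder
    referents = {}
    to_toc = io.StringIO()      # plays the role of the stream toToc

    output.append("<toc>\n")
    output.append(("ref", "myToc"))  # like: output referent 'myToc'
    output.append("</toc>\n<doc>\n")
    for title, para in sections:
        tagged = f"<title>{title}</title>\n"
        output.append("<div>")
        output.append(tagged)        # like: put #current-output & toToc ...
        to_toc.write(tagged)         # ... the same text also goes to toToc
        output.append(f"<p>{para}</p></div>\n")
    output.append("</doc>")

    # Assumed to correspond to close toToc: the stream's text becomes the value.
    referents["myToc"] = to_toc.getvalue()
    return "".join(referents[p[1]] if isinstance(p, tuple) else p for p in output)


print(with_toc([("first division title", "A paragraph in div1"),
                ("second division title", "A paragraph in div2")]))
```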
Determine the scope of the referents
Suppose we have an XHTML document with tables, and we have to add <col> elements: as many <col> elements as there are <td> cells in a row. Because the <col> elements come before the <td> elements, we don't yet know how many to write when we reach that position, so we can use referents to solve this problem.
OmniMark source code, scope of a referent
```
global integer numberOfColumns initial {0}

process
   do xml-parse scan '<html>'
                  || '<table>'
                  || '<tr><td>r1k1</td><td>r1k2</td><td>r1k3</td></tr>'
                  || '<tr><td>r2k1</td><td>r2k2</td><td>r2k3</td></tr>'
                  || '</table>'
                  || '<table>'
                  || '<tr><td>r1k1</td><td>r1k2</td></tr>'
                  || '<tr><td>r2k1</td><td>r2k2</td></tr>'
                  || '</table>'
                  || '</html>'
      output '%sc'
   done

element #implied
   output '<%q>%c</%q>'

element table
   using nested-referents
   do
      output '<%q>%n'
      output referent 'number of columns'
      output '%n%c</%q>%sn'
   done

element tr
   set numberOfColumns to 0
   output '<%q>%c</%q>%n'
   set referent 'number of columns' to '<col/>' ||* numberOfColumns

element td
   increment numberOfColumns
   output '<%q>%c</%q>'
```
the output result: the first table gets 3 <col> elements, the second 2
```
<html><table>
<col/><col/><col/>
<tr><td>r1k1</td><td>r1k2</td><td>r1k3</td></tr>
<tr><td>r2k1</td><td>r2k2</td><td>r2k3</td></tr>
</table>
<table>
<col/><col/>
<tr><td>r1k1</td><td>r1k2</td></tr>
<tr><td>r2k1</td><td>r2k2</td></tr>
</table></html>
```
The scope of the referent 'number of columns' is the <table> element, so the referent is resolved when the end of each table is reached. We need only one referent name, even though there is more than one table.
The scope is set with the construct using nested-referents. Without it, every table would share the same referent and therefore get the same number of <col> elements: in this case 2, because the last table has 2 columns and sets the final value.
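The effect of scoping can be sketched in Python by giving each table its own placeholder list and value table, resolved as soon as the table ends, versus one shared scope resolved only at the end of the document. This is a hypothetical model of what using nested-referents achieves, not OmniMark's implementation; `render_tables`, `resolve` and the nested-list input format are illustrative assumptions.

```python
def resolve(parts, values):
    """Replace ("ref", name) placeholders with the referent's current value."""
    return "".join(values[p[1]] if isinstance(p, tuple) else p for p in parts)


def render_tables(tables, nested=True):
    """Hypothetical model of `using nested-referents` (not OmniMark's API).

    tables: list of tables; a table is a list of rows; a row is a list of cell texts.
    nested=True: each table resolves its own 'cols' referent when the table ends.
    nested=False: one referent shared by all tables, resolved only at the very end.
    """
    doc_parts, doc_values = ["<html>"], {}
    for rows in tables:
        if nested:
            parts, values = [], {}  # fresh referent scope for this table
        else:
            parts, values = doc_parts, doc_values  # everything in the outer scope
        parts += ["<table>\n", ("ref", "cols"), "\n"]
        for cells in rows:
            parts.append("<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>\n")
            values["cols"] = "<col/>" * len(cells)  # the last row's count wins
        parts.append("</table>\n")
        if nested:
            doc_parts.append(resolve(parts, values))  # end of scope: resolve now
    doc_parts.append("</html>")
    return resolve(doc_parts, doc_values)


tables = [[["r1k1", "r1k2", "r1k3"], ["r2k1", "r2k2", "r2k3"]],
          [["r1k1", "r1k2"], ["r2k1", "r2k2"]]]
print(render_tables(tables))                # 3 <col/> in the first table, 2 in the second
print(render_tables(tables, nested=False))  # both tables get 2 <col/> elements
```

With `nested=False` the single referent keeps only the value set by the last row of the last table, which reproduces the problem described above.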
Grant Bailey
Apr 14, 2011 @ 15:05:26
Hi Michiel
Thank you for your posts about OmniMark.
Which version of OmniMark are you using?
Regards,
Grant Bailey
admin
Apr 14, 2011 @ 16:27:43
Hi Grant,
I’m using version 9.0.1
Kind regards,
Michiel