Discarding content with the OmniMark suppress operator
Last modified: 14/10/2010
Sep 07
Until now, I have only shown examples in which XML content is processed and passed through to the output. None of them discarded any content.
A simple suppress example
    process
       do xml-parse
             scan '<doc><title>This is a title</title>' ||
                  '<p>This is a paragraph.</p></doc>'
          output '%c'
       done

    element #implied
       output '<%q>%c</%q>'

    element title
       suppress

Output:

    <doc><p>This is a paragraph.</p></doc>
What we did here is send the content of the title element to the built-in #suppress stream. Writing to this stream is like redirecting to /dev/null: the data is simply discarded.
So the title rule (lines 11 and 12) could also have been written as:

    element title
       put #suppress '%c'
Line 12 (suppress) says: “Continue parsing, sending the current content to the #suppress stream”. So the title element isn’t just thrown away unread. If there were sub-elements in the title element, they would have been parsed too, and their rules would have fired.
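The point that suppressed content is still parsed can be illustrated with a small model. This is a sketch in Python, not OmniMark, and only an analogy: the names `process`, `content` and `fired` are made up for the illustration, and the title is given an <i> child so there is a sub-element to observe.

```python
import io
import xml.etree.ElementTree as ET

fired = []  # tags of the elements whose rule ran, in document order


def content(elem, out):
    """Rough analogue of '%c': emit text and process children in order."""
    out.write(elem.text or "")
    for child in elem:
        process(child, out)
        out.write(child.tail or "")


def process(elem, out):
    """Fire the rule for elem, writing whatever it produces to `out`."""
    fired.append(elem.tag)
    if elem.tag == "title":
        # Analogue of `suppress`: the content is still walked, so child
        # rules fire, but it is written to a buffer that is never read.
        content(elem, io.StringIO())
    else:
        # Analogue of: element #implied / output '<%q>%c</%q>'
        out.write(f"<{elem.tag}>")
        content(elem, out)
        out.write(f"</{elem.tag}>")


doc = ET.fromstring(
    "<doc><title>This is a <i>title</i></title>"
    "<p>This is a paragraph.</p></doc>"
)
result = io.StringIO()
process(doc, result)
print(result.getvalue())  # <doc><p>This is a paragraph.</p></doc>
print(fired)              # ['doc', 'title', 'i', 'p']
```

The <i> element shows up in `fired` even though nothing from the title reaches the output: the rule ran, and only its output was discarded.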
Let’s make this clearer with another example:
    global stream titleStream

    process
       open titleStream as file 'listoutput.txt'
       do xml-parse
             scan '<doc><title>This is a <i>title</i></title>' ||
                  '<p>This is a paragraph.</p></doc>'
          suppress
       done
       close titleStream

    element #implied
       output '%c'

    element title
       using output as titleStream
          output '%c'

Contents of listoutput.txt:

    This is a title
In line 1 we declare the stream titleStream, and in line 4 we attach it to our output file ‘listoutput.txt’.
At the top level, on line 8, we send the parsed content to the #suppress stream. In the title element rule we temporarily redirect output to titleStream, which is attached to our file ‘listoutput.txt’.
Because every element must have a rule, the element #implied rule handles the <doc>, <i> and <p> elements.
Why is the content of the <p> element not in the output file, when the #implied rule that handles <p> contains output '%c'? Because the redirection to titleStream is only temporary: it applies to the <title> element and its descendants. Once the <title> element has been parsed, output falls back to the previous stream, which is the #suppress stream set up on line 8.
What would have happened if we had changed:

    element #implied
       output '%c'

to

    element #implied
       suppress

The output file would then contain:

    This is a
The child element <i> in <title> is now suppressed, because the <i> element fires the #implied rule, whose body (line 13) now tells OmniMark to suppress it.
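Both variants can be modelled with a stack of output scopes. This is a sketch in Python, not OmniMark, and only an analogy for how the redirection behaves: `Sink`, `scopes`, `scoped` and `run` are names invented for the illustration, with `Sink` standing in for #suppress.

```python
import io
import xml.etree.ElementTree as ET

XML = ("<doc><title>This is a <i>title</i></title>"
       "<p>This is a paragraph.</p></doc>")


class Sink:
    """Stand-in for #suppress: accepts writes and discards them."""

    def write(self, text):
        pass


def run(implied_suppresses):
    """Return what ends up in the title stream for one variant of the rules."""
    title_stream = io.StringIO()
    scopes = [Sink()]  # models the top-level `suppress` in the process rule

    def content(elem):
        # Rough analogue of '%c': text goes to the current (top) scope.
        scopes[-1].write(elem.text or "")
        for child in elem:
            rule(child)
            scopes[-1].write(child.tail or "")

    def scoped(stream, elem):
        # Redirect output only while this element is processed, then
        # fall back to the previous scope.
        scopes.append(stream)
        try:
            content(elem)
        finally:
            scopes.pop()

    def rule(elem):
        if elem.tag == "title":
            scoped(title_stream, elem)   # using output as titleStream
        elif implied_suppresses:
            scoped(Sink(), elem)         # element #implied / suppress
        else:
            content(elem)                # element #implied / output '%c'

    rule(ET.fromstring(XML))
    return title_stream.getvalue()


print(repr(run(implied_suppresses=False)))  # 'This is a title'
print(repr(run(implied_suppresses=True)))   # 'This is a '
```

In the first variant the <i> rule writes into whatever scope is current, which is the title stream. In the second it opens its own discarding scope, so only the text directly inside <title> survives.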
Let’s shift into a higher gear
In this example we extract all the division titles from an XML document, number them and indent them. I labelled the titles with letters to make it clearer how they are nested inside each other.
Input (input.xml):

    <doc>
      <div><title>[a]</title>
        <div><title>[a.a]</title>
          <p>XML is fun</p>
        </div>
        <div><title>[a.b]</title>
          <p>This is a paragraph.</p>
          <div><title>[a.b.a]</title></div>
          <div><title>[a.b.b]</title></div>
        </div>
        <div><title>[a.c]</title></div>
      </div>
      <div><title>[b]</title>
        <p>More text here.</p>
      </div>
    </doc>

Output (output.txt), indented with one tab per nesting level:

    1 [a]
        1.1 [a.a]
        1.2 [a.b]
            1.2.1 [a.b.a]
            1.2.2 [a.b.b]
        1.3 [a.c]
    2 [b]
Program:

    global stream listWithTitles
    global integer numberStack variable

    define string function giveNumber()
    as
       local stream returningResult
       local integer countDiv initial {0}

       ; Count the open div elements: this is the nesting depth of the title.
       repeat over current elements as theElement
          increment countDiv when name of
                current element theElement = 'div'
       again

       do when countDiv > number of numberStack
          ; One level deeper: start a new counter at 1.
          set new numberStack to 1
       else when countDiv = number of numberStack
          ; Same level: move on to the next sibling number.
          increment numberStack
       else when countDiv < number of numberStack
          ; Back up: drop the counters of every closed level, then count on.
          repeat
             exit when number of numberStack <= countDiv
             remove numberStack
          again
          increment numberStack
       done

       ; Join the counters with dots, e.g. 1.2.1
       open returningResult as buffer
       using output as returningResult
          repeat over numberStack
             output 'd' % numberStack
             output '.' when not #last
          again
       close returningResult

       ; One tab of indentation per nesting level below the first.
       return '%t' ||* (number of numberStack - 1) ||
              returningResult


    process
       open listWithTitles as file 'output.txt'
       using output as #suppress
          do xml-parse
                scan file 'input.xml'
             output '%c'
          done
       close listWithTitles

    element #implied
       output '%c'

    element title when parent is div
       using output as listWithTitles
       do
          output giveNumber()
          output ' '
          output '%c%n'
       done
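The numbering idea itself, a stack with one counter per open division level, does not depend on OmniMark. Here it is as a sketch in Python, for illustration only: `div_titles` and `number_titles` are hypothetical names, and the sample document is a trimmed version of the input above. The stack is popped in a loop, so a title that follows several closed levels at once is still numbered correctly.

```python
import xml.etree.ElementTree as ET

XML = """<doc>
<div><title>[a]</title>
  <div><title>[a.a]</title><p>XML is fun</p></div>
  <div><title>[a.b]</title>
    <div><title>[a.b.a]</title></div>
    <div><title>[a.b.b]</title></div>
  </div>
  <div><title>[a.c]</title></div>
</div>
<div><title>[b]</title><p>More text here.</p></div>
</doc>"""


def div_titles(elem, depth=0):
    """Yield (depth, text) for each <title> whose parent is a <div>."""
    if elem.tag == "div":
        depth += 1
    for child in elem:
        if child.tag == "title" and elem.tag == "div":
            yield depth, child.text or ""
        else:
            yield from div_titles(child, depth)


def number_titles(titles):
    """Turn (depth, text) pairs into indented, numbered lines."""
    stack = []  # one counter per currently open division level
    lines = []
    for depth, text in titles:
        if depth > len(stack):
            stack.append(1)          # first title one level deeper
        else:
            while len(stack) > depth:
                stack.pop()          # drop counters of closed levels
            stack[-1] += 1           # next sibling at this level
        number = ".".join(str(n) for n in stack)
        lines.append("\t" * (len(stack) - 1) + f"{number} {text}")
    return lines


lines = number_titles(div_titles(ET.fromstring(XML)))
print("\n".join(lines))
```

For the sample document this prints the same seven lines as the output listing above, from `1 [a]` down to `2 [b]`.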