This is part 2/14 of my Implementing IXmlWriter post series.
In the previous post of this series, we ended up with a simple class that could write XML elements and element content to a std::string. However, this code has a common, serious problem that I described in my post Don’t Form XML Using String Concatenation: it doesn’t escape XML special characters such as & and <. If you call WriteString() with a string containing one of these characters, the generated XML is not well-formed, and a conforming XML parser will reject it.
The rules for XML element value escaping are given by Section 2.4 of the W3C XML 1.0 Recommendation—specifically, by the following passage:
The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings “&amp;” and “&lt;” respectively. The right angle bracket (>) MAY be represented using the string “&gt;”, and MUST, for compatibility, be escaped using either “&gt;” or a character reference when it appears in the string “]]>” in content, when that string is not marking the end of a CDATA section.
For simplicity, I will choose to always escape > with &gt;. As we are using test-driven development, we must first write a test case:
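A minimal sketch of such a test might look like the following. Only StringXmlWriter, WriteString(), and std::string come from this series as quoted here; the names WriteStartElement, WriteEndElement, GetXmlString, UnescapedXmlWriter, kExpectedXml, WriteSpecialCharacters, and TestWriteStringEscapes are assumptions made for illustration. UnescapedXmlWriter is a hypothetical stand-in for a writer that does no escaping, included only so the sketch compiles on its own.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a writer without escaping (an assumption, not the
// original listing). It appends element content verbatim. Error handling,
// such as calling WriteEndElement with no open element, is omitted.
class UnescapedXmlWriter {
public:
    void WriteStartElement(const std::string& name) {
        m_xml += "<" + name + ">";
        m_openElements.push_back(name);
    }
    void WriteEndElement() {
        m_xml += "</" + m_openElements.back() + ">";
        m_openElements.pop_back();
    }
    void WriteString(const std::string& value) { m_xml += value; }
    const std::string& GetXmlString() const { return m_xml; }

private:
    std::string m_xml;
    std::vector<std::string> m_openElements;
};

// The XML the test expects once escaping is implemented.
const std::string kExpectedXml =
    "<root><element>&amp;&lt;&gt;</element></root>";

// Writes all three special characters as element content and returns the
// resulting document. Writer is any type with the methods assumed above.
template <typename Writer>
std::string WriteSpecialCharacters() {
    Writer writer;
    writer.WriteStartElement("root");
    writer.WriteStartElement("element");
    writer.WriteString("&<>");
    writer.WriteEndElement();
    writer.WriteEndElement();
    return writer.GetXmlString();
}

// The test passes only if the special characters come out escaped.
template <typename Writer>
bool TestWriteStringEscapes() {
    return WriteSpecialCharacters<Writer>() == kExpectedXml;
}
```

With the stand-in writer, TestWriteStringEscapes<UnescapedXmlWriter>() returns false, because the special characters are copied into the output unchanged.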
Note how the previous version of StringXmlWriter fails this test case because it generates the invalid XML string <root><element>&<></element></root>. The changes to StringXmlWriter are fairly straightforward (note how I am following the advice from my post Prefer Iteration To Indexing):
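As a rough sketch of what such a change could look like, the version below walks the input with an iterator and appends either the character itself or its escaped form. The method names WriteStartElement, WriteEndElement, and GetXmlString, and the member names m_xml and m_openElements, are assumptions for illustration; only StringXmlWriter and WriteString() are named in this post.

```cpp
#include <string>
#include <vector>

// Sketch of a string-backed XML writer whose WriteString escapes the three
// special characters. Error handling (for example, WriteEndElement with no
// open element) is deliberately left out.
class StringXmlWriter {
public:
    void WriteStartElement(const std::string& name) {
        m_xml += "<" + name + ">";
        m_openElements.push_back(name);
    }

    void WriteEndElement() {
        m_xml += "</" + m_openElements.back() + ">";
        m_openElements.pop_back();
    }

    // Appends element content, replacing &, <, and > with entity references.
    // The loop iterates over the string rather than indexing into it.
    void WriteString(const std::string& value) {
        for (std::string::const_iterator it = value.begin();
             it != value.end(); ++it) {
            switch (*it) {
                case '&': m_xml += "&amp;"; break;
                case '<': m_xml += "&lt;";  break;
                case '>': m_xml += "&gt;";  break;  // always escaped, for simplicity
                default:  m_xml += *it;     break;
            }
        }
    }

    const std::string& GetXmlString() const { return m_xml; }

private:
    std::string m_xml;                        // the document built so far
    std::vector<std::string> m_openElements;  // names of elements not yet closed
};
```

Writing the string &<> as the content of an element nested inside root now produces <root><element>&amp;&lt;&gt;</element></root>, and ordinary text passes through unchanged.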