Implementing IXmlWriter Part 2: Escaping Element Content
Implementing IXmlWriter c++ ixmlwriter xml
Published: 2005-10-06

This is part 2/14 of my Implementing IXmlWriter post series.

In the previous post of this series, we ended up with a simple class that could write XML elements and element content to a std::string. However, this code has a common, serious problem that I mentioned in my post Don’t Form XML Using String Concatenation: it doesn’t escape XML special characters such as & and <. If you call WriteString() with text containing one of these characters, the generated XML will not be well-formed and an XML parser will reject it.

The rules for escaping element content are given in Section 2.4 of the W3C XML 1.0 Recommendation, specifically in the following passage:

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings “&amp;” and “&lt;” respectively. The right angle bracket (>) MAY be represented using the string “&gt;”, and MUST, for compatibility, be escaped using either “&gt;” or a character reference when it appears in the string “]]>” in content, when that string is not marking the end of a CDATA section.
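
To make the passage concrete, here is a minimal sketch of what escaping only the strictly required characters could look like. EscapeMinimal is a hypothetical helper name used for illustration; it is not part of StringXmlWriter. It escapes & and < everywhere, and > only when it would complete the sequence ]]>.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of StringXmlWriter): escapes only what
// Section 2.4 strictly requires in element content.
std::string EscapeMinimal(const std::string& value)
{
    std::string out;
    std::size_t closingBrackets = 0;  // consecutive ']' seen just before *iter

    typedef std::string::const_iterator iter_t;
    for (iter_t iter = value.begin(); iter != value.end(); ++iter) {
        if (*iter == '&') {
            out += "&amp;";
        } else if (*iter == '<') {
            out += "&lt;";
        } else if (*iter == '>' && closingBrackets >= 2) {
            // '>' needs escaping only when it would complete "]]>".
            out += "&gt;";
        } else {
            out += *iter;
        }
        closingBrackets = (*iter == ']') ? closingBrackets + 1 : 0;
    }
    return out;
}
```

A check like this only sees one string at a time, so it would miss a ]]> that is split across two separate calls. Escaping every > avoids having to track that state.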

For simplicity, I will always escape > as &gt;. Since we are using test-driven development, we first write a test case:

StringXmlWriter xmlWriter;

xmlWriter.WriteStartElement("root");
  xmlWriter.WriteStartElement("element");
    xmlWriter.WriteString("&<>");
  xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();
// strXML should be <root><element>&amp;&lt;&gt;</element></root>

Note how the previous version of StringXmlWriter fails this test case: it generates the malformed string <root><element>&<></element></root>. The changes to StringXmlWriter are fairly straightforward (note that I am following the advice from my post Prefer Iteration To Indexing):

#include <stack>
#include <string>

class StringXmlWriter
{
private:
    std::stack<std::string> m_openedElements;
    std::string m_xmlStr;

public:
    void WriteStartElement(const std::string& localName)
    {
        m_openedElements.push(localName);
        m_xmlStr += '<';
        m_xmlStr += localName;
        m_xmlStr += '>';
    }

    void WriteEndElement()
    {
        std::string lastOpenedElement = m_openedElements.top();
        m_xmlStr += "</";
        m_xmlStr += lastOpenedElement;
        m_xmlStr += '>';
        m_openedElements.pop();
    }

    void WriteString(const std::string& value)
    {
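        // Copy the value one character at a time, replacing the three
        // characters that are special in element content with entities.
        typedef std::string::const_iterator iter_t;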
        for (iter_t iter = value.begin(); iter != value.end(); ++iter) {
            if (*iter == '&') {
                m_xmlStr += "&amp;";
            } else if (*iter == '<') {
                m_xmlStr += "&lt;";
            } else if (*iter == '>') {
                m_xmlStr += "&gt;";
            } else {
                m_xmlStr += *iter;
            }
        }
    }

    std::string GetXmlString() const
    {
        return m_xmlStr;
    }
};
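
The escaping loop does not depend on any of the writer's state, so one possible refactoring is to pull it into a free function that can be tested on its own. The sketch below assumes a hypothetical function name, EscapeElementContent, and applies the same three rules as WriteString() above.

```cpp
#include <string>

// Hypothetical free function: applies the same escaping rules as
// StringXmlWriter::WriteString(), but returns the result instead of
// appending it to the writer's buffer.
std::string EscapeElementContent(const std::string& value)
{
    std::string escaped;
    escaped.reserve(value.size());  // lower bound; each escape adds characters

    typedef std::string::const_iterator iter_t;
    for (iter_t iter = value.begin(); iter != value.end(); ++iter) {
        switch (*iter) {
        case '&': escaped += "&amp;"; break;
        case '<': escaped += "&lt;";  break;
        case '>': escaped += "&gt;";  break;
        default:  escaped += *iter;   break;
        }
    }
    return escaped;
}
```

With this in place, WriteString() could reduce to m_xmlStr += EscapeElementContent(value). For example, EscapeElementContent("a < b && c > d") returns a &lt; b &amp;&amp; c &gt; d. Input that already looks escaped is treated as literal text, so "&amp;" becomes &amp;amp;. That is the intended behavior for a function that accepts unescaped content.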