Implementing IXmlWriter Part 6: Escaping Attribute Content
Published: 2005-10-12

This is part 6/14 of my Implementing IXmlWriter post series.

Last time’s IXmlWriter has a serious bug: it doesn’t escape attribute values, so it can produce malformed XML.

Consider the following test case:

StringXmlWriter xmlWriter;
xmlWriter.WriteStartElement("root");
  xmlWriter.WriteStartElement("element");
    xmlWriter.WriteAttributeString("att", "\"");
  xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();

The previous version of IXmlWriter will generate the XML string <root><element att="""/></root>, which is not well-formed and will be rejected by an XML parser. The rules for XML attribute escaping are given by Section 2.3 of the XML 1.0 spec, specifically the AttValue literal:

AttValue ::= '"' ([^<&"] | Reference)* '"'
          |  "'" ([^<&'] | Reference)* "'"

This Backus-Naur-form-like production says that an attribute value can be enclosed in either single or double quotes, and that the characters <, &, and the enclosing quotation character cannot appear literally between those quotes. We can, however, insert escaped versions of these characters as references (&amp; and &quot;, for example). A literal < is never allowed (see Well-formedness constraint: No < in Attribute Values; thanks dbt), and this version of the writer does not attempt to escape it; it handles only & and the quotation character. As we always enclose attribute values in double quotes, we only need to escape the " character and not the ' character. Let’s construct a test case:

StringXmlWriter xmlWriter;
xmlWriter.WriteStartElement("root");
  xmlWriter.WriteStartElement("element");
    xmlWriter.WriteAttributeString("att", "\"&");
  xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();

std::string strXML = xmlWriter.GetXmlString();
// strXML should be <root><element att="&quot;&amp;"/></root>
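
The escaping rule for double-quoted attribute values can be sketched as a standalone function. This is only an illustration: the names EscapeForDoubleQuotedAttribute and CheckAttributeEscaping are hypothetical and not part of IXmlWriter. The sketch also assumes that < should be mapped to the predefined reference &lt; rather than rejected, which is one possible policy and not something the writer in this post does. Leaving ' untouched is safe only because the value is always wrapped in double quotes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of IXmlWriter): escapes a raw value so that it
// can be placed between double quotes in an attribute.
std::string EscapeForDoubleQuotedAttribute(const std::string& value)
{
    std::string escaped;
    escaped.reserve(value.size());
    for (std::string::const_iterator it = value.begin(); it != value.end(); ++it) {
        switch (*it) {
        case '&': escaped += "&amp;";  break;
        case '"': escaped += "&quot;"; break;
        case '<': escaped += "&lt;";   break; // assumption: escape instead of reject
        default:  escaped += *it;      break; // includes ', which needs no escaping here
        }
    }
    return escaped;
}

// Mirrors the test case above: the raw value "& must come out as &quot;&amp;.
void CheckAttributeEscaping()
{
    assert(EscapeForDoubleQuotedAttribute("\"&") == "&quot;&amp;");
    assert(EscapeForDoubleQuotedAttribute("it's") == "it's");
}
```

A switch keeps the sketch short; a lookup table becomes more attractive once several contexts need different escaping rules.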

Note that we are now required to perform escaping (albeit with different characters) in two separate functions: WriteString() and WriteAttributeString(). This is a prime candidate for refactoring—we can separate the escaping code into its own function, and we can make such large changes with confidence because we have a test suite to verify that changed code is correct. Here’s the new code:

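#include <map>
#include <stack>
#include <string>

// Maps a character to the text that replaces it in the output.
typedef std::map<char, std::string> translations_t;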

std::string TranslateString
    (
    const std::string& value,
    const translations_t& translations
    )
{
    std::string str;
    for (std::string::const_iterator stringIter = value.begin();
         stringIter != value.end();
         ++stringIter) {
        translations_t::const_iterator mapIter = translations.find(*stringIter);
        if (mapIter != translations.end()) {
            str += mapIter->second;
        } else {
            str += *stringIter;
        }
    }

    return str;
}

class StringXmlWriter
{
private:
    std::stack<std::string> m_openedElements;
    std::string m_xmlStr;
    bool m_unclosedStartElement;
    // Translations used in character data
    translations_t m_charDataTranslations;
    // Translations used in attribute values
    translations_t m_attributeTranslations;

public:
    StringXmlWriter() : m_unclosedStartElement(false)
    {
        m_charDataTranslations['&'] = "&amp;";
        m_charDataTranslations['<'] = "&lt;";
        m_charDataTranslations['>'] = "&gt;";
        m_attributeTranslations['&'] = "&amp;";
        m_attributeTranslations['"'] = "&quot;";
    }

    void WriteStartElement(const std::string& localName)
    {
        if (m_unclosedStartElement) {
            m_xmlStr += '>';
            m_unclosedStartElement = false;
        }

        m_openedElements.push(localName);
        m_xmlStr += '<';
        m_xmlStr += localName;
        m_unclosedStartElement = true;
    }

    void WriteEndElement()
    {
        if (m_unclosedStartElement) {
            m_xmlStr += "/>";
            m_unclosedStartElement = false;
        } else {
            std::string lastOpenedElement = m_openedElements.top();
            m_xmlStr += "</";
            m_xmlStr += lastOpenedElement;
            m_xmlStr += '>';
        }
        m_openedElements.pop();
    }

    void WriteString(const std::string& value)
    {
        if (m_unclosedStartElement) {
            m_xmlStr += '>';
            m_unclosedStartElement = false;
        }

        m_xmlStr += TranslateString(value, m_charDataTranslations);
    }

    void WriteElementString(const std::string& localName,
                            const std::string& value)
    {
        WriteStartElement(localName);
        WriteString(value);
        WriteEndElement();
    }

    void WriteAttributeString(const std::string& localName,
                              const std::string& value)
    {
        m_xmlStr += ' ';
        m_xmlStr += localName;
        m_xmlStr += "=\"";
        m_xmlStr += TranslateString(value, m_attributeTranslations);
        m_xmlStr += '"';
    }

    std::string GetXmlString() const
    {
        return m_xmlStr;
    }
};
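
To see what the refactoring buys us, here is a small self-contained sketch that runs the same input through both translation tables. It repeats TranslateString() and the table contents from the listing above so that it compiles on its own. The helper names MakeCharDataTranslations, MakeAttributeTranslations, and CompareTranslations are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<char, std::string> translations_t;

// Same logic as TranslateString() in the listing above.
std::string TranslateString(const std::string& value,
                            const translations_t& translations)
{
    std::string str;
    for (std::string::const_iterator stringIter = value.begin();
         stringIter != value.end();
         ++stringIter) {
        translations_t::const_iterator mapIter = translations.find(*stringIter);
        if (mapIter != translations.end()) {
            str += mapIter->second;
        } else {
            str += *stringIter;
        }
    }
    return str;
}

// Hypothetical helper: builds the table used for character data.
translations_t MakeCharDataTranslations()
{
    translations_t t;
    t['&'] = "&amp;";
    t['<'] = "&lt;";
    t['>'] = "&gt;";
    return t;
}

// Hypothetical helper: builds the table used for attribute values.
translations_t MakeAttributeTranslations()
{
    translations_t t;
    t['&'] = "&amp;";
    t['"'] = "&quot;";
    return t;
}

// One input, two contexts, two different results.
void CompareTranslations()
{
    const std::string raw = "a < \"b\" & c";
    assert(TranslateString(raw, MakeCharDataTranslations())
           == "a &lt; \"b\" &amp; c");
    assert(TranslateString(raw, MakeAttributeTranslations())
           == "a < &quot;b&quot; &amp; c");
}
```

The second assertion shows that the attribute table leaves < untouched.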

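A literal < can never appear in an attribute value, yet WriteAttributeString() currently passes it through unchanged. Strictly speaking, the constraint cited above rules out the literal character and not the predefined reference &lt;, so the function could either escape < or reject it outright. For now, we should explicitly forbid this value in WriteAttributeString(); I will address this when I get to error handling in a future post. In the meantime, be aware of this constraint when you design your XML schemas!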
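
As a rough preview of what such a check might look like, here is a minimal sketch that rejects a literal < before an attribute value is written. The names ValidateAttributeValue and CheckValidation, and the choice of std::invalid_argument, are assumptions made for illustration. The error handling that eventually lands in this series may look quite different.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical guard: throws if the raw attribute value contains a literal '<'.
void ValidateAttributeValue(const std::string& value)
{
    if (value.find('<') != std::string::npos) {
        throw std::invalid_argument("attribute value must not contain '<'");
    }
}

// A value without '<' passes silently; a value with '<' is rejected.
void CheckValidation()
{
    ValidateAttributeValue("\"&"); // no exception expected

    bool rejected = false;
    try {
        ValidateAttributeValue("a<b");
    } catch (const std::invalid_argument&) {
        rejected = true;
    }
    assert(rejected);
}
```

A call to such a guard would sit at the top of WriteAttributeString(), before anything is appended to the output string, so that a rejected value leaves the document untouched.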