This is part 6/14 of my Implementing IXmlWriter post series.
Last time’s IXmlWriter has a serious bug: it doesn’t escape attribute values, so it can produce malformed XML.
Consider the following test case:
StringXmlWriter xmlWriter;
xmlWriter.WriteStartElement("root");
xmlWriter.WriteStartElement("element");
xmlWriter.WriteAttributeString("att", "\"");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();
std::string strXML = xmlWriter.GetXmlString();
The previous version of IXmlWriter will generate the XML string <root><element att="""/></root>, which is not well-formed and will be rejected by an XML parser. The rules for XML attribute escaping are given by Section 2.3 of the XML 1.0 spec, specifically the AttValue literal:
AttValue ::= '"' ([^<&"] | Reference)* '"'
| "'" ([^<&'] | Reference)* "'"
This Backus-Naur form-like construct says that attribute values can be enclosed in either single or double quotes, and that the characters <, &, and the respective quotation character cannot appear literally between these quotes. However, we can insert escaped versions of & and the quotation character. The < character needs separate treatment (see Well-formedness constraint: No < in Attribute Values; thanks dbt), and this version of the writer does not handle it yet. As we always encase attribute values in double quotes, we only need to worry about escaping the " character and not the ' character. Let’s construct a test case:
StringXmlWriter xmlWriter;
xmlWriter.WriteStartElement("root");
xmlWriter.WriteStartElement("element");
xmlWriter.WriteAttributeString("att", "\"&");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndElement();
std::string strXML = xmlWriter.GetXmlString();
// strXML should be <root><element att="&quot;&amp;"/></root>
Note that we are now required to perform escaping (albeit with different characters) in two separate functions: WriteString() and WriteAttributeString(). This is a prime candidate for refactoring—we can separate the escaping code into its own function, and we can make such large changes with confidence because we have a test suite to verify that changed code is correct. Here’s the new code:
#include <map>
#include <stack>
#include <string>

typedef std::map<char, std::string> translations_t;
std::string TranslateString
(
const std::string& value,
const translations_t& translations
)
{
std::string str;
for (std::string::const_iterator stringIter = value.begin();
stringIter != value.end();
++stringIter) {
translations_t::const_iterator mapIter = translations.find(*stringIter);
if (mapIter != translations.end()) {
str += mapIter->second;
} else {
str += *stringIter;
}
}
return str;
}
class StringXmlWriter
{
private:
std::stack<std::string> m_openedElements;
std::string m_xmlStr;
bool m_unclosedStartElement;
// Translations used in character data
translations_t m_charDataTranslations;
// Translations used in attribute values
translations_t m_attributeTranslations;
public:
StringXmlWriter() : m_unclosedStartElement(false)
{
m_charDataTranslations['&'] = "&amp;";
m_charDataTranslations['<'] = "&lt;";
m_charDataTranslations['>'] = "&gt;";
m_attributeTranslations['&'] = "&amp;";
m_attributeTranslations['"'] = "&quot;";
}
void WriteStartElement(const std::string& localName)
{
if (m_unclosedStartElement) {
m_xmlStr += '>';
m_unclosedStartElement = false;
}
m_openedElements.push(localName);
m_xmlStr += '<';
m_xmlStr += localName;
m_unclosedStartElement = true;
}
void WriteEndElement()
{
if (m_unclosedStartElement) {
m_xmlStr += "/>";
m_unclosedStartElement = false;
} else {
std::string lastOpenedElement = m_openedElements.top();
m_xmlStr += "</";
m_xmlStr += lastOpenedElement;
m_xmlStr += '>';
}
m_openedElements.pop();
}
void WriteString(const std::string& value)
{
if (m_unclosedStartElement) {
m_xmlStr += '>';
m_unclosedStartElement = false;
}
m_xmlStr += TranslateString(value, m_charDataTranslations);
}
void WriteElementString(const std::string& localName,
const std::string& value)
{
WriteStartElement(localName);
WriteString(value);
WriteEndElement();
}
void WriteAttributeString(const std::string& localName,
const std::string& value)
{
m_xmlStr += ' ';
m_xmlStr += localName;
m_xmlStr += "=\"";
m_xmlStr += TranslateString(value, m_attributeTranslations);
m_xmlStr += '"';
}
std::string GetXmlString() const
{
return m_xmlStr;
}
};
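To check the refactored helper on its own, the sketch below copies TranslateString() from the listing above and feeds it the attribute table. EscapeAttributeValue() is a hypothetical wrapper introduced here for illustration; it is not part of StringXmlWriter.

```cpp
#include <map>
#include <string>

typedef std::map<char, std::string> translations_t;

// Same algorithm as TranslateString() in the listing above: copy the input,
// replacing every character that has an entry in the table.
std::string TranslateString(const std::string& value,
                            const translations_t& translations)
{
    std::string str;
    for (std::string::const_iterator it = value.begin();
         it != value.end();
         ++it) {
        translations_t::const_iterator mapIter = translations.find(*it);
        if (mapIter != translations.end()) {
            str += mapIter->second;
        } else {
            str += *it;
        }
    }
    return str;
}

// Hypothetical helper (not in the original class): escapes a value the way
// WriteAttributeString() does for a double-quoted attribute.
std::string EscapeAttributeValue(const std::string& value)
{
    translations_t attributeTranslations;
    attributeTranslations['&'] = "&amp;";
    attributeTranslations['"'] = "&quot;";
    return TranslateString(value, attributeTranslations);
}
```

Each input character is translated at most once and the output is never rescanned, so the & that begins an inserted entity is not escaped a second time. A single quote has no entry in the table and passes through unchanged.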
A literal < can never appear in an attribute value, and this version of WriteAttributeString() does not escape it, so for now we should explicitly forbid it in that function. I will address this when I get to error handling in a future post. In the meantime, keep this constraint in mind when you design your XML schemas!
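Until the error-handling post arrives, a check along these lines could guard WriteAttributeString(). This is only a sketch: the function name CheckAttributeValue() and the choice of std::invalid_argument are assumptions made here, not the design this series will eventually settle on.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical guard: rejects attribute values containing a literal '<',
// which the writer in this post has no translation for.
void CheckAttributeValue(const std::string& value)
{
    if (value.find('<') != std::string::npos) {
        throw std::invalid_argument("attribute value must not contain '<'");
    }
}
```

WriteAttributeString() could call CheckAttributeValue(value) before appending anything to m_xmlStr, so that a rejected value leaves the output string untouched.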