Quick notes on how to use RapidXML

— richardwb on Friday, February 27, 2009 @ 21:11

There's a C++ XML library called RapidXML which is perfect for most non-enterprise uses of XML. I wouldn't call this a tutorial, but I hope this ends up helping someone. The documentation isn't very explicit on how to output an XML declaration, for example.

How to create your XML from scratch and then output this XML into a string, with an XML declaration:

<?xml version="1.0" encoding="utf-8"?>
<rootnode version="1.0" type="example">
  <childnode/>
</rootnode>
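
#include <iterator>  // std::back_inserter
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"  // print() is declared here, not in rapidxml.hpp

using namespace rapidxml;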

xml_document<> doc;

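// xml declaration
// (nodes and attributes only store pointers to their names and values, they
// do not copy them; string literals are fine, but short-lived strings should
// be copied into the document's pool with doc.allocate_string() first)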
xml_node<>* decl = doc.allocate_node(node_declaration);
decl->append_attribute(doc.allocate_attribute("version", "1.0"));
decl->append_attribute(doc.allocate_attribute("encoding", "utf-8"));
doc.append_node(decl);

// root node
xml_node<>* root = doc.allocate_node(node_element, "rootnode");
root->append_attribute(doc.allocate_attribute("version", "1.0"));
root->append_attribute(doc.allocate_attribute("type", "example"));
doc.append_node(root);

// child node
xml_node<>* child = doc.allocate_node(node_element, "childnode");
root->append_node(child);

std::string xml_as_string;
// watch for name collisions here, print() is a very common function name!
print(std::back_inserter(xml_as_string), doc);
// xml_as_string now contains the XML in string form, indented
// (in all its angle bracket glory)

std::string xml_no_indent;
// print_no_indenting is the only flag that print() knows about
print(std::back_inserter(xml_no_indent), doc, print_no_indenting);
// xml_no_indent now contains non-indented XML
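
The first argument to print() is an output iterator, which is why std::back_inserter can be used to write into a std::string. The sketch below shows that mechanism in isolation, using only the standard library. Note that write_text and build_example are hypothetical names made up for this illustration, not RapidXML functions.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical stand-in for an iterator-based writer such as print():
// copies a C string through any output iterator and returns the iterator.
template <class OutIt>
OutIt write_text(OutIt out, const char* text)
{
    for (; *text != '\0'; ++text)
        *out++ = *text;
    return out;
}

// Builds a string by appending through std::back_inserter, which calls
// push_back on the string for every character written.
std::string build_example()
{
    std::string result;
    write_text(std::back_inserter(result), "<rootnode>");
    write_text(std::back_inserter(result), "</rootnode>");
    assert(!result.empty());  // sanity check: something was appended
    return result;
}
```

Because the writer only sees an output iterator, the same call shape should work with any container that supports push_back, not just std::string.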



Parsing and traversing an XML document like this one:

<?xml version="1.0" encoding="utf-8"?>
<rootnode version="1.0" type="example">
  <childnode entry="1">
    <evendeepernode attr1="cat" attr2="dog"/>
    <evendeepernode attr1="lion" attr2="wolf"/>
  </childnode>
  <childnode entry="2">
  </childnode>
</rootnode>

#include <string>
#include <vector>
#include "rapidxml.hpp"

using namespace rapidxml;

void traverse_xml(const std::string& input_xml)
{
    // (input_xml contains the above XML)

    // make a modifiable, null-terminated copy of input_xml:
    // RapidXML parses in place and writes into the buffer it is given,
    // and input_xml is const
    std::vector<char> xml_copy(input_xml.begin(), input_xml.end());
    xml_copy.push_back('\0');

    // only use xml_copy from here on!
    xml_document<> doc;
    // we are choosing to parse the XML declaration
    // parse_no_data_nodes prevents RapidXML from using the somewhat surprising
    // behavior of having both values and data nodes, and having data nodes take
    // precedence over values when printing
    // >>> note that this will skip parsing of CDATA nodes <<<
    doc.parse<parse_declaration_node | parse_no_data_nodes>(&xml_copy[0]);

    // alternatively, use one of the two commented lines below to parse CDATA nodes,
    // but note the caveat above about surprising interactions between
    // values and data nodes (also read http://www.setnode.com/blog/a-rapidxml-gotcha/)
    // if you use one of these two calls, try to use data nodes exclusively and
    // avoid using value()
    //doc.parse<parse_declaration_node>(&xml_copy[0]); // just get the XML declaration
    //doc.parse<parse_full>(&xml_copy[0]); // parses everything (slowest)

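    // since we have parsed the XML declaration, it is the first node
    // (otherwise the first node would be our root node)
    // note: first_node() and first_attribute() return 0 when nothing matches,
    // so check the pointers before chaining like this on untrusted input
    std::string encoding = doc.first_node()->first_attribute("encoding")->value();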
    // encoding == "utf-8"

    // we didn't keep track of our previous traversal, so let's start again
    // we can match nodes by name, skipping the xml declaration entirely
    xml_node<>* cur_node = doc.first_node("rootnode");
    std::string rootnode_type = cur_node->first_attribute("type")->value();
    // rootnode_type == "example"

    // go straight to the first evendeepernode
    cur_node = cur_node->first_node("childnode")->first_node("evendeepernode");
    std::string attr2 = cur_node->first_attribute("attr2")->value();
    // attr2 == "dog"

    // and then to the second evendeepernode
    cur_node = cur_node->next_sibling("evendeepernode");
    attr2 = cur_node->first_attribute("attr2")->value();
    // now attr2 == "wolf"
}
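
The copy step at the top of traverse_xml can be pulled out into a small helper. This is a minimal sketch using only the standard library. Note that make_parse_buffer is a hypothetical name chosen for this example, not part of RapidXML.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of RapidXML): returns a writable,
// null-terminated copy of the XML text, suitable for an in-place parser.
std::vector<char> make_parse_buffer(const std::string& xml)
{
    std::vector<char> buffer(xml.begin(), xml.end());
    buffer.push_back('\0');  // the parser expects a zero-terminated string
    assert(buffer.size() == xml.size() + 1);
    return buffer;
}
```

A caller would keep the returned vector in a local variable and pass &buffer[0] to doc.parse. The vector has to stay alive for as long as the document is used, since the parsed nodes point into that buffer rather than owning copies of the text.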

[Edit: Thanks to Michele Tavella for catching a silly bug on my part!]
[Edit: Thanks to Wei for noting that parse_no_data_nodes will skip over CDATA nodes]
[Edit: Thanks to remo for catching a typo]

Thanks for the comments, everyone! I'm glad to help out where I can, but please note that I now check this blog rather infrequently and will often not respond for days, if not weeks. If you need an answer within a reasonable time frame, I'd recommend using Stack Overflow. If you have corrections or suggestions, please continue to leave them here and I'll address them as they come up. Thanks again!
