Quick notes on how to use RapidXML

—richardwb on Friday, February 27, 2009 @ 21:11

There’s a C++ XML library called RapidXML which is perfect for most non-enterprise uses of XML. I wouldn’t call this a tutorial, but I hope this ends up helping someone. The documentation isn’t very explicit on how to output an XML declaration, for example.

How to create your XML from scratch and then output this XML into a string, with an XML declaration:

<?xml version="1.0" encoding="utf-8"?>
<rootnode version="1.0" type="example">
  <childnode/>
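</rootnode>

#include <iterator>  // std::back_inserter
#include <string>

#include "rapidxml.hpp"
#include "rapidxml_print.hpp"  // print()

using namespace rapidxml;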

xml_document<> doc;

// xml declaration
xml_node<>* decl = doc.allocate_node(node_declaration);
decl->append_attribute(doc.allocate_attribute("version", "1.0"));
decl->append_attribute(doc.allocate_attribute("encoding", "utf-8"));
doc.append_node(decl);

// root node
xml_node<>* root = doc.allocate_node(node_element, "rootnode");
root->append_attribute(doc.allocate_attribute("version", "1.0"));
root->append_attribute(doc.allocate_attribute("type", "example"));
doc.append_node(root);

// child node
xml_node<>* child = doc.allocate_node(node_element, "childnode");
root->append_node(child);

std::string xml_as_string;
// watch for name collisions here, print() is a very common function name!
print(std::back_inserter(xml_as_string), doc);
// xml_as_string now contains the XML in string form, indented
// (in all its angle bracket glory)

std::string xml_no_indent;
// print_no_indenting is the only flag that print() knows about
print(std::back_inserter(xml_no_indent), doc, print_no_indenting);
// xml_no_indent now contains non-indented XML

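One thing to watch when building documents this way: allocate_node() and allocate_attribute() store the pointers you pass in rather than copying the text. String literals, as used above, live for the whole program, so they are fine. For names or values held in a std::string that may go out of scope first, the text can be copied into the document's memory pool with allocate_string(). A minimal sketch, where append_owned_attribute is a hypothetical helper (not part of RapidXML) and the template parameters stand in for xml_document<> and xml_node<>:

```cpp
#include <string>

// Hypothetical helper, not part of RapidXML.
// Appends name="value" to `node`, first copying both strings into the
// document's memory pool so they outlive the std::string arguments.
// Document and Node are template parameters only to keep this sketch
// self-contained; with RapidXML they would be rapidxml::xml_document<>
// and rapidxml::xml_node<>.
template <class Document, class Node>
void append_owned_attribute(Document& doc, Node* node,
                            const std::string& name,
                            const std::string& value)
{
    // allocate_string() copies the text (including the terminator)
    // into memory owned by the document
    char* pooled_name = doc.allocate_string(name.c_str());
    char* pooled_value = doc.allocate_string(value.c_str());
    node->append_attribute(doc.allocate_attribute(pooled_name, pooled_value));
}
```

With the document above, append_owned_attribute(doc, root, key, value) would replace the append_attribute(doc.allocate_attribute(...)) pair whenever key and value are std::string variables.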

Parsing and traversing an XML document like this one:

<?xml version="1.0" encoding="utf-8"?>
<rootnode version="1.0" type="example">
  <childnode entry="1">
    <evendeepernode attr1="cat" attr2="dog"/>
    <evendeepernode attr1="lion" attr2="wolf"/>
  </childnode>
  <childnode entry="2">
  </childnode>
</rootnode>

#include <string>
#include <vector>

#include "rapidxml.hpp"

using namespace rapidxml;

void traverse_xml(const std::string& input_xml)
{
    // (input_xml contains the above XML)

    // RapidXML parses in place and modifies the buffer it is given, so make a
    // mutable, null-terminated copy of input_xml
    // (you should never modify the contents of an std::string directly)
    std::vector<char> xml_copy(input_xml.begin(), input_xml.end());
    xml_copy.push_back('\0');

    // only use xml_copy from here on!
    xml_document<> doc;
    // we are choosing to parse the XML declaration
    // parse_no_data_nodes prevents RapidXML from using the somewhat surprising
    // behavior of having both values and data nodes, and having data nodes take
    // precedence over values when printing
    // >>> note that this will skip parsing of CDATA nodes <<<
    doc.parse<parse_declaration_node | parse_no_data_nodes>(&xml_copy[0]);

    // alternatively, use one of the two commented lines below to parse CDATA nodes, 
    // but please note the above caveat about surprising interactions between 
    // values and data nodes (also read http://www.ffuts.org/blog/a-rapidxml-gotcha/)
    // if you use one of these two declarations try to use data nodes exclusively and
    // avoid using value()
    //doc.parse<parse_declaration_node>(&xml_copy[0]); // just get the XML declaration
    //doc.parse<parse_full>(&xml_copy[0]); // parses everything (slowest)

    // since we have parsed the XML declaration, it is the first node
    // (otherwise the first node would be our root node)
    std::string encoding = doc.first_node()->first_attribute("encoding")->value();
    // encoding == "utf-8"

    // we didn't keep track of our previous traversal, so let's start again
    // we can match nodes by name, skipping the xml declaration entirely
    xml_node<>* cur_node = doc.first_node("rootnode");
    std::string rootnode_type = cur_node->first_attribute("type")->value();
    // rootnode_type == "example"

    // go straight to the first evendeepernode
    cur_node = cur_node->first_node("childnode")->first_node("evendeepernode");
    std::string attr2 = cur_node->first_attribute("attr2")->value();
    // attr2 == "dog"

    // and then to the second evendeepernode
    cur_node = cur_node->next_sibling("evendeepernode");
    attr2 = cur_node->first_attribute("attr2")->value();
    // now attr2 == "wolf"
}
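The traversal above assumes every node and attribute exists. first_node(), first_attribute() and next_sibling() return a null pointer when nothing matches, so chained calls will crash on unexpected input. A small sketch of a null-safe lookup, where attribute_or is a hypothetical helper name (not part of RapidXML) and Node is a template parameter that would be xml_node<> in practice:

```cpp
#include <string>

// Hypothetical helper, not part of RapidXML.
// Returns the value of attribute `name` on `node`, or `fallback` when
// either the node pointer or the attribute is missing.
// Node is a template parameter only to keep this sketch self-contained;
// it needs a const first_attribute(const char*) member whose result has
// a value() member, which rapidxml::xml_node<> provides.
template <class Node>
std::string attribute_or(const Node* node, const char* name,
                         const std::string& fallback)
{
    if (!node)
        return fallback;
    const auto* attr = node->first_attribute(name);
    return attr ? std::string(attr->value()) : fallback;
}
```

For example, attribute_or(doc.first_node("rootnode"), "type", "") yields an empty string instead of dereferencing a null pointer when rootnode or its type attribute is absent.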

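If the XML lives in a file rather than in a std::string, the same kind of mutable, null-terminated buffer can be filled straight from disk. A minimal sketch using only the standard library; read_xml_buffer is a hypothetical helper name, and the path is whatever your program uses:

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of RapidXML.
// Reads the whole file at `path` into a mutable buffer and appends the
// terminating '\0' that an in-place parser such as RapidXML expects.
// Throws std::runtime_error if the file cannot be opened.
std::vector<char> read_xml_buffer(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    // copy every byte of the stream into the vector
    std::vector<char> buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');
    return buffer;
}
```

The returned vector can be handed to doc.parse<...>(&buffer[0]) and has to stay alive for as long as doc is in use, since the parsed nodes point into it. RapidXML also ships a file<> class in rapidxml_utils.hpp that does much the same job.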
[Edit: Thanks to Michele Tavella for catching a silly bug on my part!]
[Edit: Thanks to Wei for noting that parse_no_data_nodes will skip over CDATA nodes]

Comments

  1. anon_anon
    Sunday, August 30, 2009 @ 19:49

    You may also want to look at vtd-xml, which is more flexible and powerful than rapidXML

    http://vtd-xml.sf.net

  2. Michele Tavella
    Friday, January 8, 2010 @ 14:13

    In the first example:

    doc.append_node(child);

    should be:

    root->append_node(child);

  3. richardwb
    Wednesday, January 13, 2010 @ 02:00

    Ugh. Yes, of course it should. Thanks for catching that!

  4. tArKi
    Thursday, February 11, 2010 @ 07:31

    Very useful for newbies! Thanks!

  5. Vibhu
    Monday, March 8, 2010 @ 14:25

    Thanks for the post. Without this, I would've wasted a few hours trying to understand the usage.

  6. Milan
    Sunday, April 4, 2010 @ 11:13

    Hi,

    I am a novice with RapidXML, but my first impression was not positive. I made a simple Visual Studio 6 C++ Hello World application, added the RapidXML .hpp files to the project, and put this in main.cpp:

    #include "stdafx.h"
    
    #include <iostream>
    #include <string>
    #include "rapidxml.hpp"
    
    using namespace std;
    using namespace rapidxml;
    
    int main ( )
    {
        char x[] = "<Something>Text</Something>\0" ; // <<<< works, but not with '*'
        xml_document<> doc ;
        doc.parse<0>(x) ;
        cout << "Name of my first node is: " << doc.first_node()->name() << endl ;
        xml_node<>* node = doc.first_node("Something") ;
        cout << "Node 'Something' has value: " << node->value() << endl ;
    } 

    It does not compile. Any help? Is it possible to use RapidXML with Visual Studio 6?

    BR and Thanks, Milan

  7. richardwb
    Monday, April 5, 2010 @ 12:53

    I don't believe RapidXML has any support for Visual C++ 6, which is over a decade old by this point. I'm sure you've heard this before, but you should really consider moving to a modern C++ compiler. Microsoft does provide a free copy of Visual Studio 2008 Express, which is quite capable.

  8. Wei
    Wednesday, April 14, 2010 @ 14:26

    Hi

    For the parsing part, I followed your example and lost all the data in CDATA nodes.

    doc.parse<parse_declaration_node | parse_no_data_nodes>(&xml_copy[0]);

    I think it is best to replace parse_no_data_nodes with parse_full (parse_declaration_node is contained in parse_full) so that all data is preserved.

    And this is really helpful. Thanks a lot.

  9. richardwb
    Saturday, April 17, 2010 @ 15:50

    Thanks for the suggestion, I've clarified the effects of parse_no_data_nodes and noted some alternatives.

  10. Rajakumar
    Wednesday, June 2, 2010 @ 04:10

    I am trying RapidXML, but I cannot load an existing XML file to work on it. Please help: how can I read and write an XML file with RapidXML? Thanks.

  11. esdee
    Wednesday, June 2, 2010 @ 06:25

    clean, simple and very useful! thanks!

  12. Magallo
    Thursday, July 29, 2010 @ 02:45

    Thanks a lot! Very well written and clear.

  13. martin
    Saturday, August 7, 2010 @ 12:33

    Many thanks! Good, simple explanation! It would have taken me some hours to figure out how to use this library without your explanation. Helped a lot!
