Today's awesome discovery came from an article titled "An XML library for PHP you may not hate". In an unexpected twist, I really didn't hate it; in fact, it helped me solve a problem I had. The library is called sabre/xml, and it is part of the sabre/dav project.

XML Namespaces and libxml

XML namespaces look like a really simple concept:

<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns="" xmlns:ns1="" xmlns:ns2="">
    <ns1:elem>Value 1</ns1:elem>
    <ns2:elem>Value 2</ns2:elem>
</doc>

Here, the values of the xmlns attributes are full XML namespace names, a.k.a. namespace URIs. They should be universally unique across XML documents. In contrast, ns1, ns2 and the empty (default) prefix only have meaning in the context of this particular document. This concept is nevertheless a source of much confusion in the XML world, because people tend to treat these prefixes as universal.
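To make the point concrete, here is a minimal sketch with PHP's DOM extension. The namespace URI urn:example:one is a hypothetical placeholder: two documents bind it to different prefixes, yet DOM reports the same namespace URI and local name for both root elements, and that pair is what actually identifies an element.

```php
<?php
// Two documents that bind the same (hypothetical) namespace URI to different prefixes.
$a = new DOMDocument();
$a->loadXML('<ns1:elem xmlns:ns1="urn:example:one">Value</ns1:elem>');

$b = new DOMDocument();
$b->loadXML('<foo:elem xmlns:foo="urn:example:one">Value</foo:elem>');

// The prefixes differ...
echo $a->documentElement->prefix, ' vs ', $b->documentElement->prefix, PHP_EOL; // ns1 vs foo

// ...but namespace URI + local name, which identify the element, are identical.
$sameName = $a->documentElement->namespaceURI === $b->documentElement->namespaceURI
    && $a->documentElement->localName === $b->documentElement->localName;
var_dump($sameName); // bool(true)
```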

I have never worked with libxml directly, but both of its PHP wrappers (SimpleXML and DOM) have the same problem. Let's look at a SimpleXML example:


// open the file
$xml = new SimpleXMLElement(file_get_contents('example.xml'));

// XPath can't address the default (empty-prefix) namespace, so give it a prefix
$xml->registerXPathNamespace('r', '');
// register a prefix that the document already assigns to a different namespace
$xml->registerXPathNamespace('ns1', '');

echo strval($xml->xpath('/r:doc/ns1:elem')[0]); // Value 2? Nope, it's Value 1

Some libraries assign random prefixes, so the conflict may not be that obvious. Of course, you can check all prefixes with $xml->getDocNamespaces(), but what do you do if a conflict is detected? Throw an error? But it's a perfectly valid situation. Assign random prefixes? That's not convenient.
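For comparison, the "assign a non-conflicting prefix" route can be sketched with SimpleXML alone. The helper freePrefix() and the urn:example:* URIs are hypothetical; the idea is to ask getDocNamespaces(true) which prefixes the document declares and derive one that is not among them before calling registerXPathNamespace().

```php
<?php
// Hypothetical namespace URIs standing in for the ones in example.xml.
$xmlString = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns="urn:example:root" xmlns:ns1="urn:example:one" xmlns:ns2="urn:example:two">
    <ns1:elem>Value 1</ns1:elem>
    <ns2:elem>Value 2</ns2:elem>
</doc>
XML;
$xml = new SimpleXMLElement($xmlString);

// Hypothetical helper: derive a prefix from $wanted that the document does not declare.
function freePrefix(SimpleXMLElement $xml, string $wanted): string
{
    $taken = $xml->getDocNamespaces(true); // prefix => URI ('' is the default namespace)
    $prefix = $wanted;
    for ($i = 1; array_key_exists($prefix, $taken); $i++) {
        $prefix = $wanted . '_' . $i;
    }
    return $prefix;
}

$r = freePrefix($xml, 'r');     // 'r' is not declared, so it is used as-is
$two = freePrefix($xml, 'ns1'); // 'ns1' is taken by the document, so this becomes 'ns1_1'
$xml->registerXPathNamespace($r, 'urn:example:root');
$xml->registerXPathNamespace($two, 'urn:example:two');

echo strval($xml->xpath("/$r:doc/$two:elem")[0]), PHP_EOL; // Value 2
```

It works, but every query now has to be assembled from generated prefixes, which is exactly the inconvenience described above.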

SimpleXML offers a way around this with explicit namespace calls, although we have to give up the convenience of XPath:
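A minimal sketch of such explicit namespace calls, again with hypothetical urn:example:* URIs standing in for the real ones: children() takes the namespace URI itself, so whatever prefixes the document happens to use play no role.

```php
<?php
// Same document shape as example.xml, with hypothetical namespace URIs.
$xmlString = <<<XML
<doc xmlns="urn:example:root" xmlns:ns1="urn:example:one" xmlns:ns2="urn:example:two">
    <ns1:elem>Value 1</ns1:elem>
    <ns2:elem>Value 2</ns2:elem>
</doc>
XML;
$xml = new SimpleXMLElement($xmlString);

// children() filters by namespace URI, not by prefix
echo strval($xml->children('urn:example:two')->elem), PHP_EOL; // Value 2
echo strval($xml->children('urn:example:one')->elem), PHP_EOL; // Value 1
```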



But it has a bug in another scenario: if you save a document subtree, the namespace declarations made on its ancestors are lost:


echo $xml->children('')->elem->asXML();
// <ns2:elem>Value 2</ns2:elem>
// not
// <ns2:elem xmlns:ns2="">Value 2</ns2:elem>
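Within libxml itself, one possible workaround (a sketch, not something SimpleXML does for you) is to copy the subtree into a fresh DOMDocument with importNode(). When libxml copies an element whose namespace is declared outside the copied subtree, it adds that declaration to the copy. The urn:example:* URIs are hypothetical.

```php
<?php
$xmlString = <<<XML
<doc xmlns="urn:example:root" xmlns:ns1="urn:example:one" xmlns:ns2="urn:example:two">
    <ns1:elem>Value 1</ns1:elem>
    <ns2:elem>Value 2</ns2:elem>
</doc>
XML;
$source = new DOMDocument();
$source->loadXML($xmlString);
$elem = $source->getElementsByTagNameNS('urn:example:two', 'elem')->item(0);

// Copy the element into a new document. ns2 is declared on <doc>, outside the
// copied subtree, so libxml re-declares it on the copied element.
$target = new DOMDocument();
$target->appendChild($target->importNode($elem, true));

$fragment = $target->saveXML($target->documentElement);
echo $fragment, PHP_EOL; // <ns2:elem xmlns:ns2="urn:example:two">Value 2</ns2:elem>
```

This rescues a single subtree, but it means juggling a second DOM document for every fragment you want to save.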

Clark Notation and sabre/xml

So here comes our new hero. sabre/xml drops prefixes entirely and uses the so-called Clark notation, in which an element name is written as {namespace-uri}local-name.
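Clark notation itself is simple enough to sketch by hand. The helpers toClark() and fromClark() below are hypothetical, not sabre/xml's API; they show that the {uri}local form carries the full namespace URI and no prefix at all, and that a name without a namespace becomes {}local.

```php
<?php
// Hypothetical helpers: (namespace URI, local name) <-> "{uri}local".
function toClark(?string $namespaceUri, string $localName): string
{
    return '{' . ($namespaceUri ?? '') . '}' . $localName;
}

function fromClark(string $clark): array
{
    // "{uri}local" -> [uri, local]; the URI part may be empty
    if (!preg_match('/^\{([^}]*)\}(.+)$/', $clark, $m)) {
        throw new InvalidArgumentException("'$clark' is not in Clark notation");
    }
    return [$m[1], $m[2]];
}

$doc = new DOMDocument();
$doc->loadXML('<ns2:elem xmlns:ns2="urn:example:two">Value 2</ns2:elem>');
$el = $doc->documentElement;

$name = toClark($el->namespaceURI, $el->localName);
echo $name, PHP_EOL; // {urn:example:two}elem
[$uri, $local] = fromClark($name); // the prefix "ns2" is gone, the URI is kept
```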


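require 'vendor/autoload.php'; // Composer autoloader

$reader = new \Sabre\Xml\Reader();
$reader->xml(file_get_contents('example.xml'));
$data = $reader->parse();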

$data as JSON:

{
    "name": "{}doc",
    "value": [
        {
            "name": "{}elem",
            "value": "Value 1",
            "attributes": []
        },
        {
            "name": "{}elem",
            "value": "Value 2",
            "attributes": []
        }
    ],
    "attributes": []
}
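sabre/xml's Reader is built on PHP's XMLReader. Purely to illustrate the data structure, and not how the library builds it, the sketch below produces a tree of the same name/value/attributes shape from a DOM tree. elementToArray() and the urn:example:* URIs are hypothetical; the sketch keys attributes by Clark notation (which may differ from the library) and ignores comments and processing instructions.

```php
<?php
// Hypothetical helper: DOM element -> ['name' => ..., 'value' => ..., 'attributes' => ...]
function elementToArray(DOMElement $el): array
{
    $children = [];
    $text = '';
    foreach ($el->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[] = elementToArray($child);
        } elseif ($child instanceof DOMText) { // also matches CDATA sections
            $text .= $child->data;
        }
    }

    $attributes = [];
    foreach ($el->attributes as $attr) { // xmlns declarations are not listed here
        $attributes['{' . $attr->namespaceURI . '}' . $attr->localName] = $attr->value;
    }

    return [
        'name' => '{' . $el->namespaceURI . '}' . $el->localName,
        // child elements win; otherwise the text, or null if there is none
        'value' => $children ?: (trim($text) === '' ? null : $text),
        'attributes' => $attributes,
    ];
}

$doc = new DOMDocument();
$doc->loadXML('<doc xmlns="urn:example:root" xmlns:ns1="urn:example:one" xmlns:ns2="urn:example:two">'
    . '<ns1:elem>Value 1</ns1:elem><ns2:elem>Value 2</ns2:elem></doc>');
$tree = elementToArray($doc->documentElement);
echo json_encode($tree, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```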

As you can see, the element names contain full namespace URIs, so saving a subtree should not lose any information:


$writer = new \Sabre\Xml\Writer();
// you can set default namespace prefixes or the library will generate random ones
$writer->namespaceMap = [
    '' => 'ns2',
];

// PHP base XMLWriter's boilerplate code
// it's conveniently wrapped in \Sabre\Xml\Service
// but I need direct access to the Writer for more control here
$writer->openMemory();
$writer->startDocument();

// that's not how you retrieve subtrees in actual code :D
$writer->write([$data['value'][1]]);
echo $writer->outputMemory();

// <?xml version="1.0"?>
// <ns2:elem xmlns:ns2="">Value 2</ns2:elem>
// Finally!
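The writing side can be imitated with the bare XMLWriter too: split the Clark name, look up a prefix for the URI, and let startElementNs() emit the xmlns declaration on the element. writeClarkElement(), its generated fallback prefix and the urn:example:two URI are hypothetical, and this is not sabre/xml's actual implementation.

```php
<?php
// Hypothetical helper: write a text-only element named in Clark notation,
// using a URI => prefix map in the spirit of sabre/xml's namespaceMap.
function writeClarkElement(XMLWriter $w, string $clark, string $text, array $namespaceMap): void
{
    if (!preg_match('/^\{([^}]*)\}(.+)$/', $clark, $m)) {
        throw new InvalidArgumentException("'$clark' is not in Clark notation");
    }
    [, $uri, $local] = $m;

    if ($uri === '') {
        $w->startElement($local); // no namespace at all
    } else {
        // fall back to a generated prefix when the map has none for this URI
        $prefix = $namespaceMap[$uri] ?? 'x' . substr(md5($uri), 0, 6);
        // startElementNs() also writes the xmlns:prefix declaration on this element
        $w->startElementNs($prefix, $local, $uri);
    }
    $w->text($text);
    $w->endElement();
}

$w = new XMLWriter();
$w->openMemory();
$w->startDocument();
writeClarkElement($w, '{urn:example:two}elem', 'Value 2', ['urn:example:two' => 'ns2']);
$w->endDocument();
$out = $w->outputMemory();
echo $out;
```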

Of course, I left out many more nice features of sabre/xml, such as object mapping, the XmlSerializable and XmlDeserializable interfaces, convenience helpers for key-value and collection-like data structures, and so on. My goal was to show how it helps me work with XML namespaces in a strict way.

