sabre/xml

Today's awesome discovery came from an article titled "An XML library for PHP you may not hate". In an unexpected twist, I really didn't hate it; in fact, it helped me solve a problem I had. The library is called sabre/xml and it is part of the sabre/dav project.

XML Namespaces and libxml

XML namespaces look like a really simple concept:

<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns="http://example.com" xmlns:ns1="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">
    <ns1:elem>Value 1</ns1:elem>
    <ns2:elem>Value 2</ns2:elem>
</doc>

Here http://example.com, http://example.com/ns1 and http://example.com/ns2 are the full XML namespace names, a.k.a. namespace URIs. They are meant to be universally unique across XML documents. In contrast, the prefixes ns1 and ns2, as well as the empty prefix for http://example.com, have meaning only in the context of this particular document. This is nevertheless a source of much confusion in the XML world, because people tend to treat the prefixes as if they were universal.
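To make the point about prefixes concrete, here is a minimal sketch using PHP's built-in DOM extension. The two sample documents and the helper name describeElem are made up for this illustration; the only thing it shows is that two documents can pick different prefixes and still contain the same element.

```php
<?php

// Two hypothetical documents that differ only in the prefixes they chose.
$a = '<doc xmlns="http://example.com" xmlns:ns1="http://example.com/ns1">'
   . '<ns1:elem>Value 1</ns1:elem></doc>';
$b = '<x:doc xmlns:x="http://example.com" xmlns:other="http://example.com/ns1">'
   . '<other:elem>Value 1</other:elem></x:doc>';

// Hypothetical helper: look the element up by namespace URI and local name
// and report what the document calls it.
function describeElem(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $elem = $dom->getElementsByTagNameNS('http://example.com/ns1', 'elem')->item(0);

    return [
        'prefix' => $elem->prefix,
        'uri'    => $elem->namespaceURI,
        'local'  => $elem->localName,
    ];
}

$first  = describeElem($a);
$second = describeElem($b);

// The prefixes differ, the (URI, local name) pair does not.
var_dump($first['prefix'] === $second['prefix']);
var_dump($first['uri'] === $second['uri'] && $first['local'] === $second['local']);
```

Only the pair of namespace URI and local name identifies the element; the prefix is a per-document abbreviation.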

I have never worked with libxml directly, but both PHP wrappers have the same problem. Let's look at it using SimpleXML as the example:

<?php

// open the file
$xml = new SimpleXMLElement(file_get_contents('example.xml'));

// XPath cannot address the default (empty-prefix) namespace, so register a prefix for it
$xml->registerXPathNamespace('r', 'http://example.com');
// register a prefix that the document already assigns to another namespace
$xml->registerXPathNamespace('ns1', 'http://example.com/ns2');

echo strval($xml->xpath('/r:doc/ns1:elem')[0]); // Value 2? nope, it's Value 1

Some libraries assign random prefixes, so the conflict may not be that obvious. Of course you can list all prefixes with $xml->getDocNamespaces(), but what should you do if a conflict is detected? Throw an error? It is a perfectly valid situation. Assign random prefixes of your own? That is not convenient.
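For illustration, a conflict check could look roughly like the sketch below. The helper name findPrefixConflicts is hypothetical, and the sketch assumes every prefix is bound to a single URI throughout the document; it ignores the case where a prefix is redeclared deeper in the tree.

```php
<?php

$source = '<doc xmlns="http://example.com" xmlns:ns1="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">'
        . '<ns1:elem>Value 1</ns1:elem><ns2:elem>Value 2</ns2:elem></doc>';
$xml = new SimpleXMLElement($source);

// Hypothetical helper: return the prefixes we would like to register that the
// document already binds to a different namespace URI.
function findPrefixConflicts(SimpleXMLElement $xml, array $wanted): array
{
    // prefix => URI for all namespaces declared anywhere in the document
    $declared = $xml->getDocNamespaces(true);

    $conflicts = [];
    foreach ($wanted as $prefix => $uri) {
        if (isset($declared[$prefix]) && $declared[$prefix] !== $uri) {
            $conflicts[$prefix] = $declared[$prefix];
        }
    }

    return $conflicts;
}

$conflicts = findPrefixConflicts($xml, [
    'r'   => 'http://example.com',
    'ns1' => 'http://example.com/ns2',
]);

print_r($conflicts);
```

This only detects the clash; it does not answer the question of what to do about it.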

SimpleXML has a solution for this with explicit namespace calls. Of course we have to drop the convenience of XPath for this:

<?php

echo strval($xml->children('http://example.com/ns2')->elem); // Value 2

However, it has a bug in another scenario. If you save a document subtree, the namespace declarations it depends on are lost:

<?php

echo $xml->children('http://example.com/ns2')->elem->asXML();
// <ns2:elem>Value 2</ns2:elem>
// not
// <ns2:elem xmlns:ns2="http://example.com/ns2">Value 2</ns2:elem>
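To see why the missing declaration matters, here is a small sketch that feeds the saved fragment back into the parser and collects the complaints libxml raises. It uses only built-in functions; the shortened sample document is my own.

```php
<?php

$source = '<doc xmlns="http://example.com" xmlns:ns2="http://example.com/ns2">'
        . '<ns2:elem>Value 2</ns2:elem></doc>';
$xml = new SimpleXMLElement($source);

// Serialize only the subtree, as in the snippet above.
$fragment = $xml->children('http://example.com/ns2')->elem->asXML();

// Collect libxml errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);
libxml_clear_errors();
simplexml_load_string($fragment);
$errors = libxml_get_errors();
libxml_clear_errors();

// Print whatever the parser had to say about the fragment.
foreach ($errors as $error) {
    echo trim($error->message), PHP_EOL;
}
```

The fragment uses the ns2 prefix without declaring it, so it no longer carries the information about which namespace the element belongs to.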

Clark Notation and sabre/xml

So here comes our new hero. sabre/xml drops prefixes entirely and uses the so-called Clark notation, in which an element name is written as {namespace URI}localName.

<?php

$reader = new \Sabre\Xml\Reader();
$reader->XML(file_get_contents('example.xml'));
$data = $reader->parse();

$data as JSON:

{
    "name": "{http://example.com}doc",
    "value": [
        {
            "name": "{http://example.com/ns1}elem",
            "value": "Value 1",
            "attributes": []
        },
        {
            "name": "{http://example.com/ns2}elem",
            "value": "Value 2",
            "attributes": []
        }
    ],
    "attributes": []
}

As you can see, the element names contain the full namespace URIs. Saving a subtree no longer loses any data:

<?php

$writer = new \Sabre\Xml\Writer();
// you can set preferred namespace prefixes, or the library will generate its own
$writer->namespaceMap = [
    'http://example.com/ns2' => 'ns2',
];

// boilerplate inherited from PHP's base XMLWriter class
// \Sabre\Xml\Service wraps it conveniently,
// but I need direct access to the Writer for more control here
$writer->openMemory();
$writer->startDocument();

// this is not how you would retrieve subtrees in real code :D
$writer->write($data['value'][1]);
echo $writer->outputMemory();

/*
Output:
<?xml version="1.0"?>
<ns2:elem xmlns:ns2="http://example.com/ns2">Value 2</ns2:elem>
Finally!
*/
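Since Clark notation names are plain strings, they are also easy to build and take apart yourself. The following is a minimal sketch on top of PHP's built-in DOM extension, not sabre/xml's own code; the helper names clarkName and parseClark are made up for this illustration.

```php
<?php

// Hypothetical helper: build the Clark notation name of a DOM element.
function clarkName(DOMElement $element): string
{
    if ($element->namespaceURI === null) {
        return $element->localName;
    }

    return '{' . $element->namespaceURI . '}' . $element->localName;
}

// Hypothetical helper: split a Clark notation name into [namespace URI, local name].
function parseClark(string $name): array
{
    if (preg_match('/^\{([^}]*)\}(.+)$/', $name, $m) === 1) {
        return [$m[1], $m[2]];
    }

    // no namespace part
    return [null, $name];
}

$dom = new DOMDocument();
$dom->loadXML(
    '<doc xmlns="http://example.com" xmlns:ns1="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">'
    . '<ns1:elem>Value 1</ns1:elem><ns2:elem>Value 2</ns2:elem></doc>'
);

// Collect the names of the root element and its child elements.
$names = [clarkName($dom->documentElement)];
foreach ($dom->documentElement->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $names[] = clarkName($child);
    }
}

print_r($names);
print_r(parseClark($names[2]));
```

Two names built this way can be compared with a plain string comparison, regardless of which prefixes the source documents used.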

Of course, I left out many more nice features of sabre/xml, such as object mapping, the XmlSerializable and XmlDeserializable interfaces, convenience helpers for key-value and collection-like data structures, and so on. My goal was to show how it helps me work with XML namespaces in a strict way.
