xpatch - a proposed XML diff format

UPDATE: Also see my follow-up post on an alternative method for replacing nodes.

Joe Gregorio asks about available XML diff formats, and based on some Googling, there really isn't anything robust and/or readable out there.

In particular, most formats seem to hard-code differences by node number, which doesn't hold up well if you want to merge diffs from multiple sources against a single document, a la traditional diff merging in code bases.

Given the enormous amounts of XML being processed, it seems incredible that no standards have emerged so far. So here's a possible option. It's based heavily on the Mozilla XUL Overlays pattern, but generalised for any XML.

Here's our original XML:

<feed xmlns="http://www.w3.org/2005/Atom">
  <title>GuruJ Blog</title>
  <link href="http://guruj.net/"/>
  <updated>2008-01-07T06:25:00Z</updated>
  <author>
    <name>GuruJ</name>
    <email>theguru@guruj.net</email>
  </author>
  <entry>
    <title type="html">Patching XML</title>
    <category>shiny</category>
    <category>scalable</category>
    <link href="/node/138"/>
    <summary>Writing a new XML format</summary>
    <content type="html">Embed &lt;b&gt;arbitrary&lt;/b&gt; HTML text &lt;br&gt; here</content>
  </entry>
</feed>

and this is the XML we want to end up with:

<feed xmlns="http://www.w3.org/2005/Atom">
  <title>GuruJ Blog</title>
  <link href="http://guruj.net/"/>
  <updated>2008-02-21T09:12:00Z</updated>
  <author>
    <name>GuruJ</name>
    <uri>http://guruj.net</uri>
  </author>
  <entry>
    <title type="html">Patching XML</title>
    <subtitle>now in XHTML!</subtitle>
    <category>shiny</category>
    <category>sparkly</category>
    <category>still scalable</category>
    <link href="/node/138"/>
    <summary>Writing a new XML format (updated)</summary>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        Converted <b>to XHTML</b> text
      </div>
    </content>
  </entry>
</feed>

Then, the xpatch code would look like this:

<xpatch:xpatch xmlns:xpatch="http://guruj.net/2008/02/21/xpatch#" xmlns="http://www.w3.org/2005/Atom">
  <feed>
    <updated xpatch:replace="true">2008-02-21T09:12:00Z</updated>
    <author>
      <email xpatch:remove="true"/>
      <uri xpatch:add="true">http://guruj.net</uri>
    </author>
    <entry>
      <subtitle xpatch:insertAfter="true" xpatch:node="title">now in XHTML!</subtitle>
      <category xpatch:insertAfter="true" xpatch:node="category[1]">sparkly</category>
      <category xpatch:replace="true" xpatch:node="category[2]">still scalable</category>
      <summary xpatch:replace="true">Writing a new XML format (updated)</summary>
      <content type="xhtml" xpatch:replace="true">
        <div xmlns="http://www.w3.org/1999/xhtml">
          Converted <b>to XHTML</b> text
        </div>
      </content>
    </entry>
  </feed>
</xpatch:xpatch>
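
To make the intended semantics a bit more concrete, here is a rough Python sketch of how a processor might apply a patch like the one above. It is only one possible reading of the format, not a reference implementation. The function names (resolve, clean, apply, patch_document) are made up for illustration, and it assumes that an element without xpatch:node targets the first same-named child, that unprefixed names in xpatch:node live in the same namespace as the patch element, and that references are resolved against the tree as it stands when each operation runs.

```python
import copy
import re
import xml.etree.ElementTree as ET

XP = "{http://guruj.net/2008/02/21/xpatch#}"   # xpatch namespace from the post
OPS = ("replace", "remove", "add", "insertAfter")

def resolve(parent, ref, default_tag):
    """Find the child of `parent` named by an xpatch:node value such as 'category[2]'.
    Assumption: with no xpatch:node, the first child with the patch element's tag is meant."""
    if ref is None:
        name, index = default_tag, 1
    else:
        m = re.fullmatch(r"([\w.-]+)(?:\[(\d+)\])?", ref)
        # Assumption: unprefixed names share the patch element's namespace.
        ns = default_tag[:default_tag.index("}") + 1] if default_tag.startswith("{") else ""
        name, index = ns + m.group(1), int(m.group(2) or 1)
    return [c for c in parent if c.tag == name][index - 1]

def clean(patch_el):
    """Deep-copy a patch element and strip its xpatch:* attributes."""
    new = copy.deepcopy(patch_el)
    for key in [k for k in new.attrib if k.startswith(XP)]:
        del new.attrib[key]
    return new

def apply(patch_parent, target_parent):
    """Walk the patch tree alongside the target tree, applying each operation."""
    for p in patch_parent:
        op = next((o for o in OPS if p.get(XP + o) == "true"), None)
        if op is None:
            # Plain element: descend into the matching target child.
            apply(p, resolve(target_parent, None, p.tag))
        elif op == "add":
            target_parent.append(clean(p))
        else:
            ref = resolve(target_parent, p.get(XP + "node"), p.tag)
            pos = list(target_parent).index(ref)
            if op == "remove":
                target_parent.remove(ref)
            elif op == "replace":
                target_parent[pos] = clean(p)
            else:  # insertAfter
                target_parent.insert(pos + 1, clean(p))

def patch_document(doc_xml, patch_xml):
    doc = ET.fromstring(doc_xml)
    patch = ET.fromstring(patch_xml)
    # Assumption: the single child of <xpatch:xpatch> mirrors the document root.
    apply(patch[0], doc)
    return doc

DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <updated>2008-01-07T06:25:00Z</updated>
  <author><name>GuruJ</name><email>theguru@guruj.net</email></author>
  <entry><title>Patching XML</title><category>shiny</category></entry>
</feed>"""

PATCH = """<xpatch:xpatch xmlns:xpatch="http://guruj.net/2008/02/21/xpatch#" xmlns="http://www.w3.org/2005/Atom">
  <feed>
    <updated xpatch:replace="true">2008-02-21T09:12:00Z</updated>
    <author>
      <email xpatch:remove="true"/>
      <uri xpatch:add="true">http://guruj.net</uri>
    </author>
    <entry>
      <subtitle xpatch:insertAfter="true" xpatch:node="title">now in XHTML!</subtitle>
    </entry>
  </feed>
</xpatch:xpatch>"""
```

Calling patch_document(DOC, PATCH) returns the patched feed element. The sample deliberately uses a cut-down patch with only one positional reference, so the ordering question raised below doesn't arise.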

Note that xpatch:node refers to a sibling node relative to the current point in the tree.

Unresolved issues:

  1. Are XSLT node references dynamic, or are node positions fixed throughout the transform process?
    i.e. should category[2] actually refer to category[3] in the above patch, given the preceding insert statement?
  2. Is this efficient for deeply nested XML structures? One way around this is to add an <xpatch:base> element, which sets the root node for any future xpatch operations.
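
The first issue is easy to see with a small experiment. The Python sketch below applies the two category operations from the patch to a stripped-down entry in two ways: "static" resolves category[1] and category[2] against the original tree before changing anything, while "dynamic" resolves each reference against the tree as it stands at that step. The helper names are hypothetical, namespaces are left out, and replace is simplified to swapping the text, so treat it as an illustration of the ambiguity rather than a proposed answer.

```python
import xml.etree.ElementTree as ET

ENTRY = ("<entry><title>Patching XML</title>"
         "<category>shiny</category><category>scalable</category></entry>")

def nth(parent, tag, n):
    """Return the n-th (1-based) child of `parent` with the given tag."""
    return [c for c in parent if c.tag == tag][n - 1]

def insert_after(parent, ref, tag, text):
    """Insert a new <tag>text</tag> element directly after the child `ref`."""
    new = ET.Element(tag)
    new.text = text
    parent.insert(list(parent).index(ref) + 1, new)

def categories(parent):
    return [c.text for c in parent if c.tag == "category"]

def apply_static():
    """Resolve every reference against the original tree, then mutate."""
    entry = ET.fromstring(ENTRY)
    first = nth(entry, "category", 1)
    second = nth(entry, "category", 2)
    insert_after(entry, first, "category", "sparkly")
    second.text = "still scalable"      # simplified "replace"
    return categories(entry)

def apply_dynamic():
    """Resolve each reference against the tree as it stands at that step."""
    entry = ET.fromstring(ENTRY)
    insert_after(entry, nth(entry, "category", 1), "category", "sparkly")
    nth(entry, "category", 2).text = "still scalable"   # now hits the new node
    return categories(entry)

print(apply_static())    # ['shiny', 'sparkly', 'still scalable']
print(apply_dynamic())   # ['shiny', 'still scalable', 'scalable']
```

Only the static reading reproduces the target document from the patch as written. Under the dynamic reading the patch would need category[3] in the replace.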

Anyway, any thoughts are welcome.