Wednesday 29 August 2007

RELAX NG good, XML Schema bad?

Here's a nice post on why RELAX NG is better than XML Schema. Since data modelling and language expressiveness are major interests of mine, I'm finding it fascinating to watch the evolutionary battles in the world of "meta-XML".

One key point made is that RELAX NG (RNG) is much easier to grasp than XML Schema (XSD). Witness this short tutorial, which covers most of what you need to know.

Something I haven't fully understood yet is why RNG fans consider PSVI to be a bad idea. As far as I can see, any general-purpose parser for typed XML is going to have to define a metamodel describing the allowable range of documents. Doesn't it make sense to have a standard model for this?

Another thing I don't see is whether and how RNG supports the constrained extensibility of schemas (i.e. subclassing). This is responsible for a fair bit of the complexity in XML Schema, but it is essential for some uses, such as GML (Geography Markup Language), which is really a schema framework for defining other concrete schemas.

Probably the most appealing aspect of RNG is its compact syntax, which is much more readable than the XML syntax. (XML is a terrible syntax for expressing a hierarchical language!) Of course, this means the end of the "one parser to scan them all" dream, but perhaps that's a good thing. Come back, YACC, we really didn't mean it and we still love you!

This post gives a rebuttal to the RNG-over-XSD argument. Its author seems to think that RNG lacks subclassing as well, so maybe that answers one of my questions above.

2 comments:

sgillies said...

This quote from Don Box is a lot of fun: http://simonwillison.net/2007/Mar/31/xsd/. I wonder whether we'll someday look back on GML's schema framework with a similar perspective.

James Clark had a blog post in April that is also relevant to your post. His opinion is that XSD's OO abstraction is "misguided".

John Cowan said...

Something I haven't fully understood yet is why RNG fans consider PSVI to be a bad idea.

Separation of concerns, basically. The job of a schema is to describe a class of documents; a validator says whether your document belongs to the class or not. Transforming the document (or its underlying infoset) into some other infoset is another job altogether. RELAX NG allows you to seamlessly add extensions to your schema that can guide transformation tools of this sort, but doesn't say anything about how they would work.
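To make the separation concrete, here is a toy Python sketch using only the standard library's `xml.etree.ElementTree`. The schema (a `<point>` element with decimal `<x>` and `<y>` children) and the functions `validate` and `annotate` are hypothetical illustrations, not how any RELAX NG or XML Schema processor actually works: `validate` only answers whether the document belongs to the class, while `annotate` is a separate transformation that produces a typed view, which is the kind of job a PSVI bundles into validation.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical schema: <point> with exactly an <x> then a <y> child,
# each containing a decimal number.
POINT_CHILDREN = ["x", "y"]


def is_decimal(text):
    """True if text parses as a finite decimal number."""
    try:
        return Decimal(text.strip()).is_finite()
    except (InvalidOperation, AttributeError):  # bad syntax, or no text at all
        return False


def validate(elem):
    """Validation only: is this element in the class the schema describes?"""
    if elem.tag != "point":
        return False
    if [child.tag for child in elem] != POINT_CHILDREN:
        return False
    return all(is_decimal(child.text) for child in elem)


def annotate(elem):
    """A separate transformation: build a typed view of a valid element.
    A PSVI-style design folds this kind of output into validation itself."""
    if not validate(elem):
        raise ValueError("not a valid <point>")
    return {child.tag: Decimal(child.text.strip()) for child in elem}


doc = ET.fromstring("<point><x>1.5</x><y>-2</y></point>")
print(validate(doc))   # True
print(annotate(doc))   # {'x': Decimal('1.5'), 'y': Decimal('-2')}
```

The validator never changes or decorates the document; any typed output is the business of a different tool layered on top.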

Another thing I don't see is whether and how RNG supports the constrained extensibility of schemas (i.e. subclassing).

It's possible to create a pattern that extends another pattern, but there is nothing like the schema-schema constraints of XML Schema (where, for example, the schema processor checks that a complex type declared to restrict another complex type actually does so). RNG also does not have instance-instance constraints like keys.
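A toy Python sketch of the point about pattern extension, under heavily simplified assumptions: here a pattern is just a required sequence of child element names (real RELAX NG patterns also have choice, interleave, attributes, datatypes and more), and the names `seq`, `extend` and `matches`, as well as the element names, are hypothetical. Building a new pattern from an old one is ordinary composition, and nothing checks how the new pattern relates to the old one, which is exactly the kind of schema-schema check XML Schema applies to derivation by restriction.

```python
import xml.etree.ElementTree as ET


def seq(*names):
    """Hypothetical pattern: exactly these child elements, in this order."""
    return tuple(names)


def extend(base, *extra):
    """Build a new pattern: the base pattern followed by extra children."""
    return base + tuple(extra)


def matches(pattern, elem):
    """True if the element's child tags are exactly the pattern's sequence."""
    return tuple(child.tag for child in elem) == pattern


# Hypothetical GML-flavoured names, for illustration only.
feature = seq("name", "geometry")
road = extend(feature, "lanes")

# Meant as a "restriction" of feature, but it simply drops a required
# child. Nothing in this toy model compares it against feature; it is
# just another pattern.
bad_restriction = seq("name")

doc = ET.fromstring("<road><name>A1</name><geometry/><lanes>4</lanes></road>")
print(matches(road, doc))      # True
print(matches(feature, doc))   # False
print(matches(bad_restriction, ET.fromstring("<f><name>x</name></f>")))  # True
```

The last line is the interesting one: the would-be restriction is accepted without complaint, because composing patterns carries no declared relationship that could be checked.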