RoBlog – Refactoring Reality

Discrete thought packets on .Net, software development and the universe; transmitted by Rob Levine

Making the .Net XmlTextReader accept colons in element names.

by Rob Levine on 6-Mar-2008

This started as an addendum to What is a valid XML element name?, but then I discovered something that made it worth breaking out into a separate post!

Ayende added a comment to his blog (under my comment) to say that he tried the ‘bad’ xml in question on three parsers and none of them could handle it. Naturally I thought I’d have a quick go too.

First I tried the .Net System.Xml.XmlDocument and System.Xml.XmlTextReader classes, and neither would handle the “double-colon” element names. Next I tried two commercial XML editors, XmlSpy and StylusStudio, both of which were happy to let it pass their well-formedness check without complaint (and both do start complaining if you add other disallowed characters). I don’t know what parsers these products are built on, but on the surface they seemed to be more compliant than .Net.

Or so it appeared. One thing I noticed was that both the System.Xml.XmlDocument and System.Xml.XmlTextReader classes barf with the same exception, raised from within System.Xml.XmlTextReaderImpl.ParseElement().
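For anyone who wants to see the default behaviour for themselves, here is a minimal sketch. The sample document and the class name are my own inventions for illustration, not the XML from the original discussion, and I catch XmlException on the assumption that this is the exception type raised for the bad name.

```csharp
using System;
using System.Xml;

class DefaultBehaviourDemo
{
    static void Main()
    {
        // Hypothetical sample with a "double-colon" element name.
        const string xml = "<a:b:c>hello</a:b:c>";

        try
        {
            // XmlDocument parses with namespace support switched on.
            var doc = new XmlDocument();
            doc.LoadXml(xml);
            Console.WriteLine("Parsed without complaint.");
        }
        catch (XmlException ex)
        {
            // Expected path: the parser objects to the extra colon.
            Console.WriteLine("Rejected: " + ex.Message);
        }
    }
}
```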

A quick look at this method using Lutz Roeder’s excellent Reflector revealed something new and interesting. This class (System.Xml.XmlTextReaderImpl) has an internal boolean property, Namespaces, which changes the behaviour of the element parsing method to allow or disallow multiple colons. This makes sense when you think about it: if you don’t support namespaces then there is no issue with multiple colons. If you do support namespaces then the colon is reserved to separate the namespace prefix from the element’s local name. It is this very point that the XML specification makes regarding colons, and which I quoted in the previous article.

A closer look still revealed that a Namespaces property is also exposed publicly on the System.Xml.XmlTextReader class. And guess what? Setting this property to false allows the reader to start accepting “multi-colon” element names! Well – that is certainly a new one on me.
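A rough sketch of what that looks like in practice follows. Again, the sample XML and the class name are made up for illustration. The one thing to watch is that Namespaces has to be set before the first call to Read(); changing it after reading has started throws an InvalidOperationException.

```csharp
using System;
using System.IO;
using System.Xml;

class NamespacesOffDemo
{
    static void Main()
    {
        // Hypothetical sample with a "multi-colon" element name.
        const string xml = "<a:b:c>hello</a:b:c>";

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Turn off namespace support (the default is true).
            // This must happen before any read operation.
            reader.Namespaces = false;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // With namespaces off, the colons are simply part of the name.
                    Console.WriteLine("Element: " + reader.Name);
                }
            }
        }
    }
}
```

With namespace support off, the reader makes no attempt to split the name into a prefix and a local name, so don’t expect Prefix or NamespaceURI to tell you anything useful in this mode.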

However, I couldn’t find an equivalent way of changing System.Xml.XmlDocument’s behaviour to accept this type of XML. To be honest, I’m not too bothered, because I can’t imagine using this particular style of XML any time soon!
