
# XML

XML has been the great hype of the 2000s. Everyone talks about XML, every program out there supports XML (or at least its makers prominently remember to hype the program for its XML support), everything talks in XML... or almost everything, at least. You always hear XML this, XML that...

Some people are so hyped about XML that they will try to squeeze it in everywhere. For example, this has been seen several times in the POV-Ray news forum: some people suggest that the input language of POV-Ray could be changed to be XML-conforming. (Not surprisingly, they always fail to give any good reason why that would be a good idea.)

Now, what are all those advantages of XML which cause the world to hype it so much? Perhaps a bit surprisingly, it's quite difficult to find any concrete answers to this question. You will find all kinds of "it makes data transfers and parsing easier" arguments, but these are almost always very vague, and they seldom come with concrete comparisons between the different options or an explanation of why precisely XML is the best way to go.

Now, personally I can think of one great advantage in XML (although this advantage is not due to the XML format itself, because any standardized format would do): If a program saves its data in XML format (which is reasonable only if the amount of data is not excessively large), it's easier to read that same data with third-party programs. For example, if Microsoft Word saves its documents in XML (which the newest versions actually do), third-party programs can read these documents without the need to reverse-engineer the .doc format.
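As a rough illustration of that one advantage, here is a minimal Python sketch of a "third-party" reader that uses only the standard library's `xml.etree.ElementTree`. The document layout (`document`, `title`, `para`) is invented for this example and is not the actual Word format; the point is only that no reverse-engineering is needed once the container format is a standard one.

```python
import xml.etree.ElementTree as ET

# Hypothetical saved document; the element names are made up for this example.
SAVED = """<document>
  <title>Shopping list</title>
  <para>Milk</para>
  <para>Bread</para>
</document>"""

def read_paragraphs(xml_text):
    """Return the title and the paragraph texts of the hypothetical format."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    paras = [p.text for p in root.findall("para")]
    return title, paras

title, paras = read_paragraphs(SAVED)
print(title, paras)
```

Note that the same would be true of any other standardized, documented format with a ready-made parser.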

However, that's about the only advantage I can think of, and in fact I think XML is not even the best possible format for that (because XML is way too verbose). Still, given how widely XML has been popularized as a standard format, this does count as one of its advantages.

On the other hand, the disadvantages of XML are plenty.

For one, XML is touted as a great format for data transfer. This is complete BS. When you want to transfer data, unless you are transferring just some kilobytes of it, you want to transfer it as efficiently as possible. If you want to transfer, for example, 1 gigabyte of database data, 10 gigabytes of video data or anything else that voluminous, XML would be one of the worst possible choices for this.

In fact, even if you were transferring smaller amounts of data, like just a few megabytes, XML would be bad for this if the data needs to be transferred numerous times (e.g. some file for people to download from the net).

The reason for this is simple: XML is hyperverbose, and in data transfers it does not offer any advantages. If you properly format some data in XML form, the size of your data will usually at least double or triple. Why in the world would it be advantageous to double or triple the size of the data in order to transfer it over a network (or any other way)?
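To make the size argument concrete, here is a small Python sketch that serializes the same made-up records once as XML and once as plain comma-separated lines, and compares the byte counts. The records, the tag names (`rows`, `row`, `id`, `name`, `qty`) and the choice of CSV as the compact alternative are all assumptions for illustration; the actual ratio depends heavily on how long the tag names are compared to the values they wrap.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: (id, name, quantity) records.
records = [(i, f"item{i}", i * 3) for i in range(1000)]

def as_xml(rows):
    """Serialize the records as XML, one element per field."""
    root = ET.Element("rows")
    for ident, name, qty in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "id").text = str(ident)
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "qty").text = str(qty)
    return ET.tostring(root, encoding="unicode")

def as_csv(rows):
    """Serialize the same records as comma-separated lines."""
    return "".join(f"{ident},{name},{qty}\n" for ident, name, qty in rows)

xml_size = len(as_xml(records).encode("utf-8"))
csv_size = len(as_csv(records).encode("utf-8"))
print(f"XML: {xml_size} bytes, CSV: {csv_size} bytes, "
      f"ratio: {xml_size / csv_size:.1f}")
```

With short values like these, the markup outweighs the data itself, which is exactly the overhead that then has to travel over the network.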

One argument in favor of XML is that it's easier for programs to parse. Basically, when data has been written in XML format, it has already been split into semantic tokens, and thus the program reading the data doesn't have to do that itself. This argument, however, has two flaws:

The first flaw is related to the data transfer mentioned above: If XML is used (for a reason I cannot fathom) for transferring data over a network (or other media), what happens is that the sender tokenizes the input, then sends this marked-up version over the network, and then the recipient program reads the tokenized data. The question that arises is: why does it have to be done like this? Why is it the sender that has to tokenize the data and not the recipient? The disadvantage of this is, as already mentioned, that the amount of data to be transferred grows significantly.

The second flaw is the idea inherent in this argument that it's somehow good for data to be stored already tokenized. While this makes it easier to write programs which read the data, it just transfers the job of tokenizing the data from the reader to the creator of the data. This is a kind of weird role reversal: Instead of the programs making the lives of the users easier, it's the other way around now! And as a side effect the space needed to store the data, as already said twice, grows significantly.

The greatest problem I see with XML is that it's made "human-readable" for no good reason. And, rather ironically, most XML files, especially ones which pack a large number of different tokens into a small space, are very human-unreadable.

For instance, consider this simple MathML example (which is XML-compliant):

<mrow>
  <mrow>
    <msup>
      <mi>x</mi>
      <mn>2</mn>
    </msup>
    <mo>+</mo>
    <mrow>
      <mn>4</mn>
      <mo>&InvisibleTimes;</mo>
      <mi>x</mi>
    </mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>


What this produces, when viewed with a program which supports MathML, is the equation x² + 4x + 4 = 0.

The ridiculous hyperverbosity of this MathML example becomes clearer when we compare it to how the same thing is written in LaTeX:

$$x^2+4x+4=0$$
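To put a rough number on the difference, the following Python sketch simply counts the characters in both notations. The MathML is written here without any indentation or line breaks, which is the most favorable case for it; the variable names are of course just for this example.

```python
# The MathML example from above, with all whitespace between tags removed.
mathml = "".join([
    "<mrow>", "<mrow>", "<msup>", "<mi>x</mi>", "<mn>2</mn>", "</msup>",
    "<mo>+</mo>", "<mrow>", "<mn>4</mn>", "<mo>&InvisibleTimes;</mo>",
    "<mi>x</mi>", "</mrow>", "<mo>+</mo>", "<mn>4</mn>", "</mrow>",
    "<mo>=</mo>", "<mn>0</mn>", "</mrow>",
])

# The same formula in LaTeX, including the $$ delimiters.
latex = "$$x^2+4x+4=0$$"

ratio = len(mathml) / len(latex)
print(f"MathML: {len(mathml)} chars, LaTeX: {len(latex)} chars, "
      f"ratio: {ratio:.1f}")
```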


Since XML is "human-readable", one would think that it has been designed to be written by humans. Well, personally I somehow prefer the approach of LaTeX to the approach of MathML when it comes to mathematical formulas.

There are many problems with MathML (and XML in general) when trying to write it by hand: It's hard to write, hard to read, and overly complicated. The duty of parsing and tokenizing the input has been transferred to the user even though it's clearly a task for the program. LaTeX does not transfer the interpretation of the mathematical formula to the user, but instead offers the user an easy-to-read syntax and then goes to great lengths to parse it (and it parses it correctly), making it very user-friendly. MathML, on the other hand, is very user-unfriendly and clearly not intended to be written by hand.
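To show that tokenizing on the reading side is not much of a burden, here is a toy Python tokenizer for formulas like the one above. This is emphatically not how LaTeX itself parses its input; it is a minimal sketch that handles only single-letter identifiers, integers and the operators `+`, `-`, `=` and `^`, and the token names (`num`, `ident`, `op`) are my own.

```python
import re

# One token: an integer, a single-letter identifier, or an operator.
TOKEN = re.compile(r"\s*(?:(?P<num>\d+)|(?P<ident>[A-Za-z])|(?P<op>[-+=^]))")

def tokenize(formula):
    """Split a compact formula into (kind, text) pairs."""
    formula = formula.rstrip()
    tokens = []
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        # Exactly one named group matches, so lastgroup names the token kind.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

tokens = tokenize("x^2+4x+4=0")
print(tokens)
```

The user types ten characters, and the program does the splitting, rather than the user spelling out every `<mi>`, `<mn>` and `<mo>` by hand.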

So the question arises: If it's not intended to be written by hand, then why does it have to be in a verbose ASCII "human-readable" format?

I have one question about MathML: Why couldn't it have been implemented in an easy-to-use way, as in LaTeX? What is the reason? "It makes it easier for programs to read and interpret mathematical formulas" is just a lousy excuse. All it does is transfer the duty of tokenizing the mathematical expression either to the user or to all the programs which create MathML in the first place. The user has to be able to create mathematical expressions one way or another. Instead of giving users an easy way of doing it, like LaTeX does, MathML forces either users to write overly complicated files by hand, or developers to create overly complicated software to help the user create the math expressions. Why should there be a need to develop such software when everything would be much easier if MathML supported expressions like those of LaTeX? What MathML does is help lazy programmers at the cost of burdening the users.

As a final thought, I can describe XML with one single word which says everything:

Bloat.