Dummy Warhead: xml

Wednesday, 24 October 2012

Pubsubhubbub and the Big One

I have been keeping an eye on pubsubhubbub since its first day. It is a very simple yet interesting technology that makes RSS/Atom feeds really fast without polling. I wrote my own component called 'SubHub'. It has been running for a couple of weeks now, receiving tons of Atom and RSS messages through pubsubhubbub from popular news sources like NASA, CNN, BBC and a few thousand others. Thanks to this hard-to-spell technology, I do not poll at all :-)

The developers envisioned a distributed network of hubs and streams, and sure enough it is distributed by design. What I wanted to know is how distributed it is in practice, since my first guess was that Google would have some dominance in it. For example, blogger.com RSS feeds (including Dummy Warhead's) point to Google's pubsubhubbub server. The WordPress guys were also very active and implemented their own WordPress plugin, which is turned on for all wordpress.com blogs, but they only use it for their own content, while Google's service is used by anyone... and almost everyone.
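A feed "points to" a hub by carrying an Atom link element with rel="hub" (RSS feeds carry the same element under the Atom namespace). Here is a minimal sketch of how a subscriber could look that up with the JDK's DOM parser. The class name, method name and the example.org URLs are made up for illustration; this is not the SubHub code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class HubDiscovery {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns the href of every Atom link element with rel="hub" found in the feed. */
    static List<String> findHubs(String feedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match atom:link inside RSS as well
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8)));
            NodeList links = doc.getElementsByTagNameNS(ATOM_NS, "link");
            List<String> hubs = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                if ("hub".equals(link.getAttribute("rel"))) {
                    hubs.add(link.getAttribute("href"));
                }
            }
            return hubs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical feed; the URLs are placeholders, not real hubs.
        String feed = "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<link rel=\"hub\" href=\"http://hub.example.org/\"/>"
                + "<link rel=\"self\" href=\"http://blog.example.org/feed\"/>"
                + "</feed>";
        System.out.println(findHubs(feed));
    }
}
```

Running this over a list of feeds and counting the distinct hub URLs would be one way to see how concentrated the network is from the publisher side.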

Enough talking. Let's see who is doing most of the work in pubsubhubbub:

The graph was generated from the web server log using the usual Linux commands.
So the two up-and-coming players are SuperFeedr and Pheedo; they look like specialized services for specific customers. There may be other minor players in the technology, but I haven't found them yet.
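I did the counting with shell commands, but the same tally can be sketched in Java. This is only an illustration under assumptions: it assumes the access log is in the combined format, where the User-Agent is the last quoted field, and it groups by the raw User-Agent string. The log lines and agent names below are invented, not taken from my log.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HubTally {
    // Assumption: combined log format, so each line ends with "referer" "user-agent".
    private static final Pattern LAST_QUOTED_FIELD = Pattern.compile("\"([^\"]*)\"\\s*$");

    /** Counts requests per User-Agent; lines without a trailing quoted field are skipped. */
    static Map<String, Integer> countByUserAgent(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = LAST_QUOTED_FIELD.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample lines; "ExampleHub" and "OtherHub" are placeholder agents.
        List<String> lines = Arrays.asList(
                "192.0.2.1 - - [24/Oct/2012:10:00:00 +0000] \"POST /subhub HTTP/1.1\" 200 0 \"-\" \"ExampleHub/1.0\"",
                "192.0.2.2 - - [24/Oct/2012:10:00:01 +0000] \"POST /subhub HTTP/1.1\" 200 0 \"-\" \"OtherHub/2.0\"",
                "192.0.2.1 - - [24/Oct/2012:10:00:02 +0000] \"POST /subhub HTTP/1.1\" 200 0 \"-\" \"ExampleHub/1.0\"");
        countByUserAgent(lines).forEach((agent, n) -> System.out.println(n + "\t" + agent));
    }
}
```

Grouping by User-Agent is just one choice. Grouping by source IP range would work as well if the hubs do not identify themselves reliably.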

Sunday, 6 November 2011

JSON vs XML

A few days ago I asked a few guys at work whether, given a choice, they would choose JSON or XML as the format for a REST API. Everyone responded in favor of JSON. What is surprising is that they were all Java developers, with not a single JavaScript guy among them. Maybe it is too early to say, but it seems that after its initial success in the previous decade and stagnation in the last few years, XML as a format is slowly declining, while more compact and simple schemaless formats are on the rise: JSON and YAML.
JSON and YAML implementations are just as simple as the formats themselves, and there are plenty to choose from.

Question: How do JSON parsers compare to XML parsers from a performance perspective?

I wrote three sample inputs containing basically the same data. The first is 'traditional' XML with no attributes at all. The second is XML that uses some attributes. The third is the JSON version. I used only a few XML APIs, the ones packaged with the JDK. This is a bit unfair, because alternative XML APIs may be a little faster than the defaults. A test I made a few years ago showed them to be not much different, though, so I was more interested in JSON. For the JSON format, I used 5 different parsers: json.org, jackson, jsonlib, jsonj and gson.
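To give an idea of the XML side of such a test, here is a rough sketch that times the JDK's SAX and DOM parsers on an element-only document and an attribute-based one. It is not the linked test code: the class, the tiny sample documents and the iteration count are all invented for illustration, and a loop around System.nanoTime() is only a crude measurement. The JSON parsers are left out because they are third-party libraries.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XmlParseTiming {
    // Invented sample inputs: the same data as elements only, then as attributes.
    static final String ELEMENTS_ONLY = "<person><name>Ada</name><age>36</age></person>";
    static final String WITH_ATTRIBUTES = "<person name=\"Ada\" age=\"36\"/>";

    /** Parses with the JDK SAX parser and returns the number of elements seen. */
    static int countElementsWithSax(String xml) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            count[0]++;
                        }
                    });
        } catch (Exception e) {
            throw new IllegalArgumentException("SAX parse failed", e);
        }
        return count[0];
    }

    /** Parses with the JDK DOM parser and returns the number of elements in the tree. */
    static int countElementsWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("DOM parse failed", e);
        }
    }

    /** Runs the task as a warm-up, then returns the average nanoseconds per run. */
    static long timeNanos(Runnable task, int iterations) {
        for (int i = 0; i < iterations; i++) {
            task.run(); // warm-up so the JIT has a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        int iterations = 10_000; // arbitrary choice for the sketch
        System.out.println("SAX, elements only: "
                + timeNanos(() -> countElementsWithSax(ELEMENTS_ONLY), iterations) + " ns");
        System.out.println("SAX, attributes:    "
                + timeNanos(() -> countElementsWithSax(WITH_ATTRIBUTES), iterations) + " ns");
        System.out.println("DOM, elements only: "
                + timeNanos(() -> countElementsWithDom(ELEMENTS_ONLY), iterations) + " ns");
        System.out.println("DOM, attributes:    "
                + timeNanos(() -> countElementsWithDom(WITH_ATTRIBUTES), iterations) + " ns");
    }
}
```

Creating a new parser factory on every run is part of what gets measured here. Reusing the factory or the parser would be a fairer setup for a serious comparison.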

Even the slowest JSON parser is faster than the JDK's XML parsers, and the format is also more compact, so it looks like a win for JSON. Jackson is much faster than the rest. Its website is right: it is very fast.

Test source code: here.