Monday, 3 December 2012

compression - size vs speed

This is a small comparison of compression algorithms for the JVM:
  • gzip, of course; I tested both Java's implementation and the one included in Linux (called through Runtime.exec)
  • bzip2, the more heavy-weight player
  • deflate, the algorithm used by zip
  • pack200, a seemingly Java-specific algorithm
  • xz, which uses LZMA2, the algorithm behind 7-Zip
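For the two formats the JDK ships itself, a size comparison needs nothing beyond java.util.zip. This is a minimal sketch, not the code behind this comparison: the class name CompressionSizes, the 64 KiB input size and the repeated sentence standing in for "text" are my assumptions. bzip2 and xz need third-party libraries, pack200 was only available as java.util.jar.Pack200 until JDK 14, and calling the system gzip is left out as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compressed sizes using only the JDK's built-in codecs.
final class CompressionSizes {

    // Size after gzip: gzip header + deflate stream + CRC/length trailer.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(input);
        } catch (IOException e) {
            // ByteArrayOutputStream never throws; this only satisfies the checked signature.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Size after raw deflate (nowrap = true, so no zlib header) at level 0-9.
    static int deflateSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level, true);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Incompressible input: pseudo-random bytes from a fixed seed.
    static byte[] randomInput(int size) {
        byte[] data = new byte[size];
        new Random(42).nextBytes(data);
        return data;
    }

    // Very redundant stand-in for text: one sentence repeated up to size bytes.
    static byte[] textInput(int size) {
        byte[] sentence = "The quick brown fox jumps over the lazy dog. "
                .getBytes(StandardCharsets.US_ASCII);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = sentence[i % sentence.length];
        }
        return data;
    }

    public static void main(String[] args) {
        int size = 64 * 1024;
        byte[][] inputs = {randomInput(size), textInput(size)};
        String[] names = {"random", "text"};
        for (int i = 0; i < inputs.length; i++) {
            System.out.printf("%-6s original=%d gzip=%d deflate(9)=%d%n",
                    names[i], inputs[i].length, gzipSize(inputs[i]),
                    deflateSize(inputs[i], Deflater.BEST_COMPRESSION));
        }
    }
}
```

Random bytes come out slightly larger than they went in, which is the expected shape of the "Random input" case for any of these algorithms; real text compresses far less dramatically than a repeated sentence.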

Random input

Text input

Friday, 30 November 2012

Kotlin arrays

Let's kick the CPU today with Kotlin. If you haven't heard of Kotlin, it is a relatively young JVM language developed by JetBrains, the guys behind the popular IntelliJ IDEA IDE, TeamCity (still my favourite continuous integration tool) and some other tools.

The obvious advantage of Kotlin over other alternative JVM languages is that it has good IDE integration. I never really liked the Groovy IDE plugins (it must be harder to write a good IDE for a dynamic language) and never liked Scala's Eclipse plugin at all :-( It is always outdated, yet never really stable. So Kotlin, in its early days, already has an advantage over the competition.

I think the Kotlin language itself borrowed a lot from Scala: for example, everything is an object (a long-standing debate in the Java language), it supports closures, and it leans even more on immutable objects.

One potential problem I have run into is the use of arrays. In Kotlin, you either have to pass an initializer closure to the array constructor (e.g. Array<Int>(1024, {i -> i+1})) or choose one of the Java-interoperability arrays: IntArray, FloatArray, DoubleArray, ByteArray, etc. While the Java-compatible arrays are just as quick as in Java (they compile to plain Java primitive arrays such as int[]), the Array objects are rather slow and heavy to create. When I ran into this problem, I wanted to know how much slower they are.
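On the JVM, Kotlin represents an Array<Int> as a java.lang.Integer[] and an IntArray as a primitive int[]. The Java sketch below mimics both constructors in those terms, Array<Int>(1024, {i -> i+1}) and an IntArray filled the same way; it is only an approximation of what the Kotlin compiler emits, and the class and method names are hypothetical.

```java
import java.util.function.IntFunction;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration of the two Kotlin array flavours in Java terms.
final class KotlinArraysInJava {

    // Roughly what Array<Int>(size, {i -> ...}) amounts to:
    // an array of references, each element a separate java.lang.Integer object.
    static Integer[] boxedArray(int size, IntFunction<Integer> init) {
        Integer[] array = new Integer[size];
        for (int i = 0; i < size; i++) {
            array[i] = init.apply(i);  // the lambda's int result is boxed into an Integer
        }
        return array;
    }

    // Roughly what IntArray(size) { ... } amounts to:
    // one contiguous block of primitive ints, no per-element objects.
    static int[] primitiveArray(int size, IntUnaryOperator init) {
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            array[i] = init.applyAsInt(i);
        }
        return array;
    }

    public static void main(String[] args) {
        Integer[] boxed = boxedArray(1024, i -> i + 1);
        int[] primitive = primitiveArray(1024, i -> i + 1);
        System.out.println(boxed[1023] + " " + primitive[1023]);  // prints "1024 1024"
    }
}
```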

So what you see here is that an IntArray is a lot faster than an Array<Int>; however, Array<String> is not much slower than Array<Int> (probably because a String does not have to be boxed/unboxed). Beware of autoboxing :-)
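A crude way to reproduce this kind of comparison, again in Java terms (int[] for IntArray, Integer[] for Array<Int>, String[] for Array<String>). It is a sketch, not the harness behind the chart: the class name, array size and repetition count are assumptions, and System.nanoTime timings without warm-up control are noisy, so a proper harness such as JMH would be the safer tool.

```java
// Hypothetical micro-benchmark: create and read arrays of ints, Integers and Strings.
final class ArrayCreationTiming {

    // Written from the timed lambdas so the JIT cannot drop the work as dead code.
    static volatile long sink;

    static long sumPrimitive(int size) {
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            array[i] = i + 1;
        }
        long sum = 0;
        for (int value : array) {
            sum += value;
        }
        return sum;
    }

    static long sumBoxed(int size) {
        Integer[] array = new Integer[size];
        for (int i = 0; i < size; i++) {
            array[i] = i + 1;  // autoboxing via Integer.valueOf
        }
        long sum = 0;
        for (Integer value : array) {
            sum += value;      // auto-unboxing via intValue()
        }
        return sum;
    }

    static long sumStringLengths(int size) {
        String[] array = new String[size];
        for (int i = 0; i < size; i++) {
            array[i] = "x";    // stores a reference to the same literal; nothing is boxed
        }
        long sum = 0;
        for (String value : array) {
            sum += value.length();
        }
        return sum;
    }

    // Runs the task several times and returns the best wall-clock time in milliseconds.
    static double bestMillis(Runnable task, int repetitions) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < repetitions; r++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000.0;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        int repetitions = 20;
        System.out.printf("int[]     %8.2f ms%n", bestMillis(() -> sink = sumPrimitive(size), repetitions));
        System.out.printf("Integer[] %8.2f ms%n", bestMillis(() -> sink = sumBoxed(size), repetitions));
        System.out.printf("String[]  %8.2f ms%n", bestMillis(() -> sink = sumStringLengths(size), repetitions));
    }
}
```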

One day, when we have something like FindBugs for alternative languages like Kotlin, Array<Int> will probably get a serious warning in the performance section.

Wednesday, 24 October 2012

Pubsubhubbub and the Big One

I have been keeping an eye on PubSubHubbub since its first day. It is a very simple, yet interesting technology that makes RSS/Atom feeds really fast without polling. I wrote my own component called 'SubHub'; it has been running for a couple of weeks now, receiving tons of Atom and RSS messages through PubSubHubbub from popular news sources like NASA, CNN, BBC and a few thousand others. Thanks to this hard-to-spell technology, I do not poll at all :-)
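How the push side avoids polling, in short: a subscriber asks a hub for a topic (a feed URL) and gives it a callback URL; the hub verifies the request with a GET to the callback carrying hub.mode, hub.topic and hub.challenge, and the subscriber confirms by echoing hub.challenge in a 2xx response. From then on the hub POSTs new content to the callback. The sketch below covers only that verification step, as plain logic without an HTTP server; it is not SubHub's code, and the class name and the pendingTopics set are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical subscriber-side check for a hub's intent-verification GET request.
final class HubVerification {

    // Parses an "a=1&b=2" query string into URL-decoded key/value pairs.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // pendingTopics: topics we actually asked to (un)subscribe to.
    // Returns the challenge to echo back with HTTP 200, or empty to answer HTTP 404.
    static Optional<String> verify(String query, Set<String> pendingTopics) {
        Map<String, String> params = parseQuery(query);
        String mode = params.get("hub.mode");
        String topic = params.get("hub.topic");
        String challenge = params.get("hub.challenge");
        boolean knownMode = "subscribe".equals(mode) || "unsubscribe".equals(mode);
        if (knownMode && topic != null && challenge != null && pendingTopics.contains(topic)) {
            return Optional.of(challenge);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> pending = Set.of("http://example.com/feed.atom");
        String query = "hub.mode=subscribe&hub.topic=http%3A%2F%2Fexample.com%2Ffeed.atom"
                + "&hub.challenge=abc123&hub.lease_seconds=86400";
        System.out.println(verify(query, pending).orElse("<reject with 404>"));  // prints "abc123"
    }
}
```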

The developers envisioned a distributed network of hubs and streams, and it certainly is distributed by design. What I wanted to know is how distributed it is in practice, since my first guess was that Google would have some dominance in it. For example, RSS feeds (including Dummy Warhead's) point to Google's PubSubHubbub server. The WordPress guys were also very active and implemented their WordPress plugin, which is turned on in all their blogs, but they only use it for their own content, while Google's service is used by anyone... and almost everyone.
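A feed tells subscribers which hub to use with a link element whose rel is "hub"; that is how a feed ends up "pointing to" Google's server or anyone else's. A minimal sketch of reading those links with the JDK's DOM parser; the class name and sample feed are made up. Because the lookup is namespace-aware and matches the Atom namespace, it also finds the atom:link elements that RSS 2.0 feeds usually carry for the same purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: lists the hub URLs a feed advertises via <link rel="hub">.
final class HubDiscovery {

    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    static List<String> hubUrls(String feedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8)));
            // Matches Atom <link> elements and atom:link elements embedded in RSS 2.0.
            NodeList links = doc.getElementsByTagNameNS(ATOM_NS, "link");
            List<String> hubs = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                if ("hub".equals(link.getAttribute("rel"))) {
                    hubs.add(link.getAttribute("href"));
                }
            }
            return hubs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a parseable feed", e);
        }
    }

    public static void main(String[] args) {
        String feed = "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<link rel=\"hub\" href=\"http://hub.example.com/\"/>"
                + "<link rel=\"self\" href=\"http://example.com/feed.atom\"/>"
                + "</feed>";
        System.out.println(hubUrls(feed));  // prints "[http://hub.example.com/]"
    }
}
```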

Enough talking. Let's see who is doing most of the work in PubSubHubbub:

The graph was generated from the web server log using the usual Linux commands.
So the two up-and-coming players are Superfeedr and Pheedo; it looks like they are specialised services for particular customers. There might be some minor players in the technology, but I haven't found them yet.
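The exact commands behind the graph aren't shown; for anyone who would rather stay on the JVM, a rough Java equivalent of a sort | uniq -c | sort -rn style count might look like this. It assumes the access log uses the common "combined" format and that the hubs can be told apart by their User-Agent header; both are assumptions, as is the class name HubLogCounter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical log summary: requests per User-Agent in "combined" format access logs.
final class HubLogCounter {

    // In the combined format the User-Agent is the last quoted field on the line.
    private static final Pattern USER_AGENT = Pattern.compile("\"([^\"]*)\"\\s*$");

    // Counts lines per User-Agent; lines without a trailing quoted field are skipped.
    static Map<String, Long> countByUserAgent(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher matcher = USER_AGENT.matcher(line);
            if (matcher.find()) {
                counts.merge(matcher.group(1), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Highest count first, like piping through sort | uniq -c | sort -rn.
    static List<Map.Entry<String, Long>> ranked(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java HubLogCounter.java /path/to/access.log
        // ISO-8859-1 decodes any byte sequence, so stray bytes in the log cannot fail the read.
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, Long> entry : ranked(countByUserAgent(lines))) {
            System.out.printf("%7d %s%n", entry.getValue(), entry.getKey());
        }
    }
}
```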