Sunday 31 March 2013

compression again - system vs java implementation

Last time I mentioned that the operating system's compression utility (gzip on Linux) performs better than its Java counterpart, even when you call it from Java (the good old Runtime.exec()). Of course it is not quite that simple :-) So in this test I compare the time needed to compress a random input with both the system and the Java implementations. The input size grows over the course of the test, and so does the compression time, but there is something interesting.

So as you can see, the system gzip is faster, but it carries the overhead of process creation and communication. The Java implementation runs without this overhead and therefore performs better with small inputs. The lines meet at about 512 KB. If the input is bigger than that, piping through the system command performs better.

This test was performed on Fedora 18 (the disastrous OS) on the x64 architecture; other architectures and operating systems may give different results.
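
For anyone who wants to try a comparison like this, here is a rough sketch of how it could be set up. It is not the code behind the numbers above. The class name GzipComparison, the helper names, the fixed seed and the size range are all made up for illustration, and it assumes Java 9 or later and a gzip binary on the PATH. Each size is timed only once, with no JIT warm-up, so treat the printed numbers as a rough indication at best.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipComparison {

    // Random (and therefore poorly compressible) input, as in the test described above.
    static byte[] randomBytes(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    // In-process compression with java.util.zip.
    static byte[] javaGzip(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        }
        return out.toByteArray();
    }

    // Compression by piping the input through the system gzip command.
    static byte[] systemGzip(byte[] input) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("gzip", "-c")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        // Feed stdin from a second thread while this thread drains stdout,
        // otherwise large inputs can fill the pipe buffers and deadlock.
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input);
            } catch (IOException e) {
                // gzip went away early; the exit code check below reports it.
            }
        });
        writer.start();
        byte[] result;
        try (InputStream stdout = process.getInputStream()) {
            result = stdout.readAllBytes();
        }
        writer.join();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("gzip exited with code " + exit);
        }
        return result;
    }

    // Helper for checking that the compressed output is valid gzip data.
    static byte[] gunzip(byte[] packed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws Exception {
        // Double the input size from 1 KB up to 4 MB and time both variants once.
        for (int size = 1024; size <= 4 * 1024 * 1024; size *= 2) {
            byte[] input = randomBytes(size, 42L);
            long t0 = System.nanoTime();
            javaGzip(input);
            long t1 = System.nanoTime();
            systemGzip(input);
            long t2 = System.nanoTime();
            System.out.printf("%8d bytes  java: %8d us  system: %8d us%n",
                    size, (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
    }
}
```

The system variant pays for starting a process and copying the data through two pipes on every call, which is the fixed overhead mentioned above. For a more trustworthy measurement you would repeat each size many times and discard the first runs.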

4 comments:

  1. Any sources for the benchmark? Are you reusing the Deflater objects in java?

  2. Deflate: no, this test is only for gzip. However, it would be nice to test whether deflate and bzip2 behave similarly.

    source: sure, I will share the source code of the test and send an update

  3. There's a Deflater object underlying a GZIPOutputStream as well. Unfortunately this gets recreated for every new stream - for compressing small blocks, this overhead is noticeable. See my patch for Jetty here, where Deflaters are reused for gzip: https://bugs.eclipse.org/bugs/show_bug.cgi?id=402885

  4. The Google Docs in your posts are now private, if I see it right. Probably some Google policy change.
