Showing posts with label compression. Show all posts

Wednesday, 8 February 2017

gzip -x

The test subject is a Linux LVM volume containing a typical minimal Linux root filesystem, which I want to move to another host. LVM does not provide a dedicated tool for moving volumes between hosts, so one has to use the "standard" Unix commands. This one may be a good candidate:

dd if=/dev/vg/lv-1 | ssh other-server "dd of=/dev/vg/lv-1"

This is simple, of course, but unfortunately not that good: if the volume is 2 GB, we generate 2 GB of network traffic. We can try compression:

dd if=/dev/vg/lv-1 | gzip -N | ssh other-server "gzip -d | dd of=/dev/vg/lv-1"


And now the question is: what should that N be? I have always used 9, but is it that much better than 8?
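One way to answer this is simply to measure it. The following is a minimal sketch, not the exact script behind the numbers in this post: the function name gzip_levels is made up for illustration, and it assumes the source is readable as a file or block device. It prints the level, the compressed size in bytes, and the elapsed whole seconds for each level.

```shell
# Print "level compressed_bytes seconds" for gzip levels 1..9.
# gzip_levels is a hypothetical helper name, not part of gzip or LVM.
# Timing uses date +%s, so the resolution is whole seconds only.
gzip_levels() {
    src=$1
    for level in 1 2 3 4 5 6 7 8 9; do
        start=$(date +%s)
        # Compress to stdout and count the bytes instead of storing the result.
        size=$(gzip "-$level" -c "$src" | wc -c)
        end=$(date +%s)
        # $((size)) strips the padding some wc implementations add.
        echo "$level $((size)) $((end - start))"
    done
}
```

It could be called as gzip_levels /dev/vg/lv-1 (as root, ideally on a snapshot or an unmounted volume so the content does not change between runs).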

CentOS 7 root filesystem

















A blank ext4 filesystem

This is the case where the compression is quick, yet I am not satisfied. The problem is that dd has no way to skip empty blocks, so those are transferred too. Can gzip help?






That nice drop between 3 and 4 is interesting. But is it worth waiting 3 seconds to save 7 MB of transfer? If it is, then your network sucks. Most of the time gzip -1 will do just fine.
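To get a feeling for how cheaply gzip handles empty blocks, here is a rough sketch that compresses a stream of zero bytes and reports the compressed size. It rests on assumptions: the helper name zero_compressed_size is invented for illustration, and an all-zero stream only approximates a blank ext4 volume, which also contains filesystem metadata.

```shell
# Print the compressed size in bytes of N MiB of zero bytes at a gzip level.
# zero_compressed_size is a hypothetical helper name used only for this sketch.
zero_compressed_size() {
    mib=$1
    level=$2
    # bs is given in bytes so it works with both GNU and BSD dd.
    size=$(dd if=/dev/zero bs=1048576 count="$mib" 2>/dev/null | gzip "-$level" | wc -c)
    echo "$((size))"
}
```

Comparing zero_compressed_size 100 1 with zero_compressed_size 100 9 gives a quick idea of how little the higher levels have left to gain on empty space.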



Conclusion

Levels above 5 do not improve the compression significantly, but come at a great cost in time.

gzip -1 may be the best option when we know that the filesystem is mostly empty.

But in any case, an optimization should look for something better than gzip :)

Sunday, 31 March 2013

compression again - system vs java implementation

Last time I mentioned that using the operating system's compression utility (gzip on Linux) performs better than the Java counterpart, even if you call it from Java (the good old Runtime.exec()). It is not quite that simple, of course :-) So in this test I compare the time needed to compress a random input with both the system and the Java implementations. The size of the input grows over the test, and so does the time needed to compress it, but there is something interesting.

So as you can see, the system gzip is faster, but it has a communication and process-creation overhead. The Java implementation runs without this overhead and therefore performs better with small inputs. The lines meet at about 512 KB. If the input is bigger, piping through a system command performs better.

This test was performed on Fedora 18 (the disastrous OS) on the x64 architecture; other architectures and operating systems may give different results.

Monday, 3 December 2012

compression - size vs speed

This is a small comparison of compression algorithms for the JVM:
  • gzip of course, I tested both Java's implementation and the one included in Linux (through Runtime.exec)
  • bzip2, the more heavyweight player
  • deflate, the algorithm used by zip
  • pack200, a seemingly Java-specific algorithm
  • xz, which uses the LZMA algorithm known from 7-Zip

Random input




Text input