Wednesday 15 June 2011

Serialization speed

Serialization is probably the simplest way to send data to an output stream. The three most typical use cases:
  • Session replication in servlet containers
  • A simple way to save information to disk
  • RMI uses it heavily
Of course, session replication is rather important, and it would be nice if it were quick. Therefore, I was very interested in the cost of serialization in general, with small and large data structures alike.

Serialization is smart: it does not serialize the same object twice. Instead, it sends a reference to the already serialized object, and this is the mechanism that allows deserialization to reconstruct the same object graph on the other side of the stream. The small drawback of this smart behavior is that if you send lots of data through an ObjectOutputStream, you should call reset() from time to time; otherwise the ObjectOutputStream holds a reference to every object you have serialized, and sooner or later you get an OutOfMemoryError. The reset method clears the ObjectOutputStream's object/ID tracking, so subsequent calls to writeObject actually serialize the data again. In my tests this method is called after each serialization. Well, there are a couple of ways you can shoot yourself in the foot with serialization :-)
For the IO stream, I implemented something similar to the commons-io NullOutputStream, since it is the serialization speed I want to know, not the IO speed. Here comes the Google eye-candy:
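To make the back-reference behavior concrete, here is a minimal sketch (not the code used for the measurements in this post). The class name ResetDemo and the 1000-string payload are made up for illustration. It writes the same list twice to one ObjectOutputStream and compares the byte counts with and without reset():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original benchmark.
public class ResetDemo {

    // Writes the same list twice and returns the total number of bytes produced.
    static int bytesForTwoWrites(boolean resetBetweenWrites) throws IOException {
        List<String> payload = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            payload.add("item-" + i);
        }

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);

        out.writeObject(payload);
        if (resetBetweenWrites) {
            // Forget every object written so far, so the stream
            // no longer holds references to them.
            out.reset();
        }
        // Without reset() this second call only emits a short back-reference;
        // with reset() the whole list is serialized again.
        out.writeObject(payload);
        out.close();

        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without reset: " + bytesForTwoWrites(false) + " bytes");
        System.out.println("with reset:    " + bytesForTwoWrites(true) + " bytes");
    }
}
```

The run with reset() should produce roughly twice as many bytes, which is the price you pay for not keeping every serialized object reachable from the stream.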
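For reference, the kind of setup described above could look roughly like the following sketch. This is a reconstruction under assumptions, not the original test code: the names SerializationTimer and CountingNullOutputStream, the payload, and the number of rounds are all invented for illustration, and a serious benchmark would also need JVM warm-up.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness; names and sizes are assumptions.
public class SerializationTimer {

    // Discards all data, similar in spirit to commons-io NullOutputStream,
    // but also counts the bytes so throughput can be reported.
    static final class CountingNullOutputStream extends OutputStream {
        private long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }

        long getCount() {
            return count;
        }
    }

    // Serializes the same object 'rounds' times and returns the bytes produced.
    static long serializeRepeatedly(Object data, int rounds) throws IOException {
        CountingNullOutputStream sink = new CountingNullOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(sink);
        for (int i = 0; i < rounds; i++) {
            out.writeObject(data);
            out.reset(); // called after each serialization, as in the tests
        }
        out.close();
        return sink.getCount();
    }

    public static void main(String[] args) throws IOException {
        List<String> data = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            data.add("value-" + i);
        }

        long start = System.nanoTime();
        long bytes = serializeRepeatedly(data, 1000);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.println(bytes / (1024.0 * 1024.0) + " MB in " + seconds + " s");
    }
}
```

Because nothing is ever written to disk or network, the elapsed time reflects only the work done by ObjectOutputStream itself.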





So... what these results tell me is that when you have problems with session replication, your problem is probably not the serialization; it is more likely the IO speed. While serialization produced 600 MB in just 4 seconds (about 150 MB/sec), a gigabit network with a TCP/IP connection can transport roughly 60-70 MB/sec, so it will need about 9 to 10 seconds to transport your data.
If a component is slow, use it sparingly :-)
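The comparison above is simple arithmetic, and it can be spelled out in a few lines. The figures are the ones quoted in this post; the class and method names are made up for illustration.

```java
// Hypothetical helper for the back-of-the-envelope comparison.
public class TransferEstimate {

    // Time needed to move 'megabytes' at a sustained rate of 'megabytesPerSecond'.
    static double secondsToTransfer(double megabytes, double megabytesPerSecond) {
        return megabytes / megabytesPerSecond;
    }

    // Average rate when 'megabytes' were produced in 'seconds'.
    static double throughput(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        double producedMb = 600.0;

        System.out.println("serialization: " + throughput(producedMb, 4.0) + " MB/sec");
        System.out.println("network at 60 MB/sec: "
                + secondsToTransfer(producedMb, 60.0) + " s");
        System.out.println("network at 70 MB/sec: "
                + secondsToTransfer(producedMb, 70.0) + " s");
    }
}
```

Even at the optimistic end of that range, the network needs more than twice as long to move the data as the serializer needed to produce it.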

Java version: 64-bit Sun/Oracle JDK 1.6
OS: Linux (Fedora)