Monday, 4 February 2013

Caching UUID's?

This is a short test inspired by my day work. The question is: is it worth caching objects in memory that could be just parsed? Examples for such objects are Integers, UUID's and some other objects. Well, as you know the first some Integers are actually cached by the runtime, so if you call Integer.parseInt, you may already get cached instances. That makes sense with Integer. With UUID the situation is a bit different, since there are no "first X" UUID's. So let's say that if you use some guid's frequently. The question is: can you get any advantage out of caching UUID objects rather than parsing from string?

Test method

All data series are measured with a 10 different datasets. The datasets differ from each other in the number of repeating UUID's: The first one does not have any, the last one has 90 percent repeating UUID's.

So most importantly, let's measure just plain parsing (no cache) just to compare to something. Then, let's measure caching with a HashMap. I have to add that a HashMap is not a cache and whenever I see a HashMap used as cache, I have terrible nightmares about OOM exceptions coming like zombies from everywhere.
Third, let's measure the performace of a real cache, I chose ehcache. You can choose your own pet cache technology (to be honest I use infinispan, but now for simplicity I just wanted a good old ehcache)

Results

Ok, let's see what we got.
  • As expected, no cache performs more or less the same everytime.
  • HashMap "caching" adds a little speed over 50 percent repeating input. It is a little compensatin for the OOM's you will get, or for the code you will write to avoid it :)
  • Ehcache implementation has some difficulties keeping up, it only beats the "no cache" solution when the percentage of repeating uuid's is over 90%, even then, the gain is little.

So my conclusion is: I would probably not want to cache the objects that are this easy to create. I would definetly try to optimize once the database interactions are optimal, the app scales well to multiple processors and even multiple nodes, and so on... but this looks like a small and painful victory.