Dummy Warhead: stm

Volatile is a rarely used keyword in Java. If you have never used it, don't worry: you are almost certainly right not to! It guarantees that every read of the field sees the most recent write made by any thread (informally, the value is read from main memory rather than from a stale cached copy), so you can be sure you got a fresh value, at least at the moment you read it. It performs somewhat better than a synchronized block because it does not take a lock. However, you run into trouble as soon as you also want to write the value back, because even a ++ operation is not atomic: it is a read, a calculation and a write, so another thread can slip in between them, and the probability of a wrong result is high.
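To make the read-calculate-write problem concrete, here is a minimal sketch of what `count++` amounts to on a volatile field, with the three steps written out by hand. The class name, field name and get() method are hypothetical, not taken from the post's code.

```java
public class VolatileIncrement {
    // Hypothetical field; any volatile int behaves the same way.
    private volatile int count = 0;

    // Roughly what `count++` does: each step is visible to other threads,
    // but the three steps together are not atomic.
    public void hit() {
        int tmp = count; // 1. read (volatile read)
        tmp = tmp + 1;   // 2. calculation
        count = tmp;     // 3. write (volatile write)
        // If another thread writes `count` between steps 1 and 3,
        // its update is overwritten and lost.
    }

    public int get() {
        return count;
    }

    public static void main(String[] args) {
        VolatileIncrement c = new VolatileIncrement();
        for (int i = 0; i < 1000; i++) {
            c.hit();
        }
        // Single-threaded, nothing can interleave, so the result is exact.
        System.out.println(c.get()); // prints 1000
    }
}
```

With only one thread the result is always exact; the lost updates only appear once several threads call hit() at the same time.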

I was interested in the following questions:

How much slower is the access to volatile fields?
What is the cost of synchronization?
What is the probability of a bad result coming from wrong multi-threading code?
How does synchronization compare to modern alternatives like STM?

So I wrote a very simple test to answer all four questions, built around a Counter interface with 4 implementations:

Simple counter
Synchronized counter
Counter with volatile field
STM counter with Multiverse
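A minimal sketch of what the Counter interface and the first three implementations might look like. Only hit() comes from the post; getCount(), the class names and the long field type are assumptions. The Multiverse variant is left out rather than guessing at its API.

```java
public class Counters {
    // The interface from the post; getCount() is a hypothetical read-back method.
    interface Counter {
        void hit();
        long getCount();
    }

    // 1. Simple counter: no synchronization at all (intentionally wrong under contention).
    static class SimpleCounter implements Counter {
        private long count;
        public void hit() { count++; }
        public long getCount() { return count; }
    }

    // 2. Synchronized counter: the read, add and write of count++ run while
    //    holding this object's monitor, so no other hit() can interleave.
    static class SynchronizedCounter implements Counter {
        private long count;
        public synchronized void hit() { count++; }
        public synchronized long getCount() { return count; }
    }

    // 3. Volatile counter: every read and write is visible to all threads,
    //    but count++ is still three separate steps (intentionally wrong under contention).
    static class VolatileCounter implements Counter {
        private volatile long count;
        public void hit() { count++; }
        public long getCount() { return count; }
    }

    public static void main(String[] args) {
        Counter[] counters = { new SimpleCounter(), new SynchronizedCounter(), new VolatileCounter() };
        for (Counter c : counters) {
            for (int i = 0; i < 10; i++) {
                c.hit();
            }
            // Single-threaded use: every implementation reaches 10.
            System.out.println(c.getClass().getSimpleName() + ": " + c.getCount());
        }
    }
}
```

The only difference between implementations 1 and 3 is the volatile modifier, which is exactly why both can lose updates: neither makes the increment atomic.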

The test starts a number of threads that all share a single counter. Each thread calls hit() on the counter exactly 1 million times and then terminates. The test waits for all threads to finish and then checks the counter, which should equal exactly one million times the number of threads. Of course, two of the implementations here are BAD (numbers 1 and 3); for those, I only wanted to know how wrong they are.
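The test procedure can be sketched roughly as follows. This is not the post's actual test code (that lives in the repository): the class names, getCount() and the run() helper are hypothetical, and a synchronized counter stands in for the implementation under test. Instead of only checking the total, run() returns the number of lost updates, which is the interesting figure for the BAD implementations.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterTest {
    // Minimal copy of the interface so this sketch compiles on its own.
    interface Counter {
        void hit();
        long getCount(); // hypothetical read-back method
    }

    static class SynchronizedCounter implements Counter {
        private long count;
        public synchronized void hit() { count++; }
        public synchronized long getCount() { return count; }
    }

    static final int HITS_PER_THREAD = 1_000_000;

    // Starts `threads` threads sharing one counter; each calls hit() exactly
    // HITS_PER_THREAD times and terminates. Returns expected minus actual.
    static long run(final Counter counter, int threads) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < HITS_PER_THREAD; i++) {
                    counter.hit();
                }
            });
            workers.add(worker);
            worker.start();
        }
        // Wait for all threads to finish before checking the counter.
        for (Thread worker : workers) {
            worker.join();
        }
        long expected = (long) threads * HITS_PER_THREAD;
        return expected - counter.getCount();
    }

    public static void main(String[] args) throws InterruptedException {
        long lost = run(new SynchronizedCounter(), 2);
        System.out.println("lost updates: " + lost); // prints "lost updates: 0"
    }
}
```

With an unsynchronized counter plugged in, the result is typically nonzero on a multi-core machine and changes from run to run. Wrapping the call to run() in System.nanoTime() measurements gives a rough timing comparison between the implementations.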

Test environment: the usual dual-core AMD nuclear submarine with 2 GB of 1066 MHz memory, Linux and Java 1.6.

Code: https://dummywarhead.googlecode.com/hg/volatilepenalty/

Conclusions

Access to volatile fields is, of course, much slower than access to normal fields. The results say it is about 4 times as slow, but I believe this also depends on your RAM speed. It is actually not much better than a synchronized block.
The cost of synchronization looks high indeed if you only look at those results, but not high at all once you consider that the "simple" and "volatile" solutions produce wrong results.
The probability of a bad result from wrong concurrency code is huge. If you frequently update something from multiple threads, you need to think about synchronization. Granted, this test is an edge case, but never mind.
Since the first moment I heard of software transactional memory, I have loved the idea. It sounds just great. But in this test it does not perform well, at least not on 2 cores, which is something the wiki page mentions as well. It would be nice to run it on 4-core or 8-core computers just to see how the figures change, but my guess is that it would not improve, because it has to do far too many rollbacks. Optimistic locking should perform better on 4+ cores when the probability of collision is relatively small. This is not really a fair test for STM; it needs a smarter one.
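To illustrate why constant collisions hurt an optimistic scheme, here is a hand-rolled sketch of the read, compute, commit-or-retry cycle. It is not how Multiverse works internally; it swaps in AtomicLong.compareAndSet from java.util.concurrent.atomic, and the class name and the retry counter are hypothetical additions that make the "rollbacks" visible.

```java
import java.util.concurrent.atomic.AtomicLong;

public class OptimisticCounter {
    private final AtomicLong count = new AtomicLong();
    // Counts how often a commit failed because another thread got there first.
    private final AtomicLong retries = new AtomicLong();

    // Optimistic update: read, compute, then commit only if nobody else
    // changed the value in the meantime; otherwise discard the work and retry.
    public void hit() {
        while (true) {
            long current = count.get();               // read
            long next = current + 1;                  // compute
            if (count.compareAndSet(current, next)) { // commit if unchanged
                return;
            }
            retries.incrementAndGet();                // collision: retry
        }
    }

    public long getCount() { return count.get(); }
    public long getRetries() { return retries.get(); }

    public static void main(String[] args) throws InterruptedException {
        final OptimisticCounter counter = new OptimisticCounter();
        Thread[] threads = new Thread[2];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1_000_000; i++) {
                    counter.hit();
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("count: " + counter.getCount());     // prints "count: 2000000"
        System.out.println("retries: " + counter.getRetries()); // varies from run to run
    }
}
```

The count is always exact, but when every thread hammers the same value, many commits fail and the work is repeated. Optimistic schemes pay off when such collisions are rare, which is not the case in this test.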

About the error rates: my first impression was that the volatile solution produces an even higher error rate than the simple one, but now I am not so sure. Either way, both are simply wrong.
Think before Thread.start()!

Saturday, 14 May 2011

Volatile