
Sunday, 11 May 2014

Hello Babel!

Lucas van Valckenborch's painting (source: commons.wikipedia.org)
Even though the new Java 8 is out with long-expected features like lambda expressions (closures, for simplicity), millions of software developers are looking for a better Java.

There are tons of different reasons why you may consider switching to another language; performance is just one of them. I was interested in a very basic aspect of performance: how quickly the compiled code can create a string.

The tested languages are:
  • Java 1.7
  • Scala 2.10.0
  • Kotlin 0.7.271
  • Groovy 2.3.0
  • Clojure 1.3.10
I really miss Ceylon from the list, but it does not have a Maven plugin and is not even published to Maven Central or any public repository, so I skipped it.

And the results are...


Well, single-threaded performance and simple String handling may not be a good reason to switch at the moment. As you see, Java far outperformed the other languages. If you look into the bytecode generated by Scala, Groovy and Clojure, the reason is obvious: the generated code does so many things that it just can't be quick.
Kotlin performed only about half as fast as the code compiled from Java, but the problem is a bit harder to spot. So let me highlight it...

       0: new           #19                 // class java/lang/StringBuilder
       3: dup          
       4: invokespecial #23                 // Method java/lang/StringBuilder."<init>":()V
       7: ldc           #25                 // String Hello
       9: invokevirtual #29                 // Method java/lang/StringBuilder.append:(Ljava/lang/Object;)Ljava/lang/StringBuilder;
      12: aload_1      
      13: invokevirtual #29                 // Method java/lang/StringBuilder.append:(Ljava/lang/Object;)Ljava/lang/StringBuilder;
      16: ldc           #31                 // String !
      18: invokevirtual #29                 // Method java/lang/StringBuilder.append:(Ljava/lang/Object;)Ljava/lang/StringBuilder;
      21: invokevirtual #35                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
The problem is that the Kotlin compiler generated calls to the append overload that accepts an Object argument, even though the argument is known to be a String (append(Object) has to go through String.valueOf first). It should be an easy fix; I registered a bug in Kotlin's bug tracker.
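For comparison, here is the kind of source this bytecode comes from, as a Java sketch (the class and method names are mine):

public class Hello {

    static String greet(String name) {
        // javac compiles this concatenation using
        // StringBuilder.append(String); the Kotlin compiler of that
        // version emitted append(Object), which routes through
        // String.valueOf on every call.
        return "Hello" + name + "!";
    }
}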
Update: I played a little with it and found where the code is generated in the compiler. With my patch, the compiler generated line by line the same code as the Java compiler, and therefore it performed the same.

The truth is, even javac generates suboptimal code here; it could be beaten. Next time I will give it a try.


Test environment:
  • AMD E2-1800
  • Oracle java 1.7.0_45
  • Fedora Linux
Test code is on GitHub.

Thursday, 4 July 2013

String.split versus Pattern.split

I believe in Java most people use String.split() to split a String into pieces. That method has been there for ages (since Java 1.4), everyone knows it and it just works.
The alternative is to use a Pattern instance, which is immutable: you need only a single instance, created once, and it can serve your application forever. The developers who use it are in my opinion smarter, because they know that String.split actually needs to create a Pattern object, which is a relatively heavy operation, and they can save that cost.

However, this is not the end of the story. It would be too short and would not make a blog post :-)

String.split() is smart, though: it has a special case for when the pattern is a single character (or a backslash-escaped one) with no special regexp meaning. In that case it does not create a Pattern object, but simply processes the whole thing in place. That special piece of code is not there by accident.

Let's see the speed comparison.
As you see, String.split performs better when the separator meets the above requirements. When you need to split on several different characters (I believe this is the less frequent case), you are much better off using a Pattern constant.
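For reference, a minimal sketch of the two approaches (the class, field and method names are mine):

import java.util.regex.Pattern;

public class SplitExample {

    // Compiled once; Pattern is immutable and thread-safe, so a
    // single constant can serve the whole application.
    private static final Pattern COMMA = Pattern.compile(",");

    static String[] viaPattern(String line) {
        return COMMA.split(line);
    }

    static String[] viaString(String line) {
        // A single non-metacharacter separator hits String.split's
        // fast path: no Pattern object is created at all.
        return line.split(",");
    }
}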

Monday, 18 March 2013

To kill a GC

This is actually an answer to a recent patch written at work. So what happens if you have an object that overrides the finalize method and needs to wait for a while in it, e.g. for a thread to join (a database transaction to finish, an IO operation, and so on)?

This is an example code; it does not wait for a thread for 20 seconds, it only sleeps for 1 second, but even that gives the GC a little work.

public class Main {

    final int nr;
    // A megabyte per instance, just to give the GC some real work.
    final byte[] justsomememory;

    public Main(int nr) {
        this.nr = nr;
        justsomememory = new byte[1024 * 1024];
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 20000; i++) {
            Main main = new Main(i); // becomes garbage almost immediately
            Thread.sleep(10);
        }
    }

    // A blocking finalize() stalls the finalizer thread, and with it
    // the reclamation of every finalizable object queued behind it.
    @Override
    protected void finalize() throws Throwable {
        Thread.sleep(1000);
    }

}


Let's use jconsole to look under the hood of the VM.

This is healthy memory usage.
This is what you see if you have any blocking operation in finalize().
I think this is visual enough. So in general, I agree with those who try to avoid having a finalize method in their code. And really, why should you have one?

Monday, 4 February 2013

Caching UUIDs?

This is a short test inspired by my day work. The question is: is it worth caching objects in memory that could just be parsed again? Examples of such objects are Integers, UUIDs and some others. As you may know, the first few Integer values are actually cached by the runtime, so if you call Integer.valueOf, you may already get cached instances. That makes sense for Integer. With UUID the situation is a bit different, since there is no "first X" of UUIDs. So let's say you use some GUIDs frequently. The question is: can you gain any advantage from caching UUID objects rather than parsing them from strings?
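A quick illustration of that Integer cache (it covers -128..127 by default):

public class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b); // true: small values share cached instances

        Integer c = Integer.valueOf(1000);
        Integer d = Integer.valueOf(1000);
        System.out.println(c == d); // false with default VM settings: new instances
    }
}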

Test method

All data series are measured with 10 different datasets. The datasets differ from each other in the number of repeating UUIDs: the first one does not have any, the last one has 90 percent repeating UUIDs.

So most importantly, let's measure just plain parsing (no cache), just to have something to compare to. Then, let's measure caching with a HashMap. I have to add that a HashMap is not a cache, and whenever I see a HashMap used as a cache, I have terrible nightmares about OOM exceptions coming like zombies from everywhere.
Third, let's measure the performance of a real cache; I chose Ehcache. You can choose your own pet cache technology (to be honest I use Infinispan, but for simplicity I just wanted a good old Ehcache).
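A minimal sketch of the first two contenders (the class and method names are mine; the Ehcache variant is omitted for brevity):

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class UuidParsing {

    // No cache: just parse every time.
    static UUID parse(String s) {
        return UUID.fromString(s);
    }

    // HashMap "cache": fast, but unbounded - nothing is ever evicted,
    // which is exactly where the OOM zombies come from.
    private static final Map<String, UUID> CACHE =
            new HashMap<String, UUID>();

    static UUID parseCached(String s) {
        UUID cached = CACHE.get(s);
        if (cached == null) {
            cached = UUID.fromString(s);
            CACHE.put(s, cached);
        }
        return cached;
    }
}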

Results

Ok, let's see what we got.
  • As expected, no cache performs more or less the same every time.
  • HashMap "caching" adds a little speed above 50 percent repeating input. It is small compensation for the OOMs you will get, or for the code you will write to avoid them :)
  • The Ehcache implementation has some difficulties keeping up: it only beats the "no cache" solution when the percentage of repeating UUIDs is over 90%, and even then the gain is small.

So my conclusion is: I would probably not cache objects that are this easy to create. I would only consider it once the database interactions are optimal, the app scales well to multiple processors and even multiple nodes, and so on... this looks like a small and painful victory.

Monday, 3 December 2012

compression - size vs speed

This is a small comparison of compression algorithms for the JVM:
  • gzip, of course; I tested both Java's implementation and the one included in Linux, run through Runtime.exec (see the sketch after this list)
  • bzip2, the more heavy-weight player
  • deflate, the algorithm used by zip
  • pack200, a seemingly Java-specific algorithm
  • xz, the algorithm behind 7zip
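To give an idea of the Java side, here is a minimal sketch of measuring the gzip output size with java.util.zip (the class and method names are mine):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipSize {

    // Compresses the input with Java's built-in gzip and reports how
    // many bytes come out the other end.
    static int gzippedSize(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(bos);
        gzip.write(input);
        gzip.close(); // flushes the last block and writes the trailer
        return bos.size();
    }
}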

Random input




Text input




Saturday, 29 October 2011

turning simple into difficult: date formats

SimpleDateFormat is a nice little class that helps programmers turn a date into a string and vice versa. It is very simple to use and has been part of standard Java since 1.1... that was very long ago. And of course we use it a lot.
On the other hand, SimpleDateFormat is kind of a pain in the back, because it is not thread-safe and never will be. I always wondered why; it seems a terribly wrong design decision, and it makes things a bit difficult. Most of the time you just want to create a final static SimpleDateFormat field with your preferred format. If you do this, your code is probably already broken: under high load it will produce stupid results and you will not know why (unless you use PMD with the proper rules). If your class needs to be thread-safe - and most of them do - the best thing is to create a new instance of SimpleDateFormat whenever you need it, as a local variable in a method.
What is painful with this? Its performance sucks.

This is what everyone knows. Let's see how to deal with the problem:
  1. Workaround :-)
    You can have a final static SimpleDateFormat object, and anytime you want to use it, you clone it and use the clone to do the job. Funny, right?
    This is a theoretical solution. Do not use this! The Java culture chose a completely different approach, and year by year thousands of Java developers get a heart attack after finding a final static SimpleDateFormat field in the code.
  2. Commons Lang has been around for a good while as well. Life would be hell without the Apache Commons projects. Commons Lang provides some support (FastDateFormat) for date formatting. Unfortunately it does not support parsing, only formatting :-( but even that is more than nothing.

Now of course the choice depends on your taste, but I thought I would help you with a short performance benchmark on the topic:

new SimpleDateFormat 100000 times : 170 ms
SimpleDateFormat clone  100000 times : 68 ms
DateFormatUtils 100000 times : 62 ms
FastDateFormat 100000 times : 55 ms
Update: the two lines below were added based on Zsombor's ThreadLocal idea. Wow, this must be some mistake!

SimpleDateFormat from pre-created ThreadLocal 100000 times : 29 ms
SimpleDateFormat in ThreadLocal check each time 100000 times : 28 ms
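For completeness, the ThreadLocal variant looks roughly like this (a sketch; the holder class and the date pattern are mine):

import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatHolder {

    // One SimpleDateFormat per thread: no shared mutable state to
    // corrupt, and no per-call construction cost either.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                }
            };

    static String format(Date date) {
        return FORMAT.get().format(date);
    }
}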

It surprises me again and again how badly SimpleDateFormat performs when you create it over and over. More than half of the processing is just creating the SimpleDateFormat object, an enormous waste of resources. And once it is created, it is very quick.

Another funny thing is to ask jarvana how many packages have a FastDateFormat. I can't decide if it is tragic or funny...

Test code: here.
Java version: Sun JDK 1.6.0_22
OS: Linux (Fedora 15)
Hardware: Lenovo T520 with Intel core i7 vpro (thank you Red Hat :) )

Wednesday, 15 June 2011

Serialization speed

Serialization is probably the simplest way to send data to an output stream. The three most typical use cases:
  • Session replication in servlet containers
  • A simple way to save information to disk
  • RMI uses it heavily
Of course session replication is kind of important, and it would be nice if it were quick. Therefore I was very interested in the cost of serialization in general, with small data structures and bigger ones alike.

Serialization is smart: it does not serialize the same object twice, it just sends a reference to an already serialized object. This is the mechanism that allows deserialization to reconstruct the same object graph on the other side of the stream. The small drawback of this smart behavior is that if you send lots of data through the ObjectOutputStream, you should call reset() from time to time, otherwise the ObjectOutputStream holds a reference to every object you have serialized and sooner or later you get an OOM. The reset method clears the ObjectOutputStream's object/ID tracking, and subsequent calls to writeObject will actually serialize the data again. So in my tests this method is called after each serialization. Well, there are a couple of ways you can shoot yourself in the foot with serialization :-)
And for the IO stream, I implemented something similar to commons-io's NullOutputStream, since it is the serialization speed I want to know, not the IO speed.
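Roughly like this (a sketch; the class names are mine and the real test code differs):

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

public class SerializationDrain {

    // Swallows everything, so only the serialization cost is measured.
    static class NullOutputStream extends OutputStream {
        @Override
        public void write(int b) {}
        @Override
        public void write(byte[] b, int off, int len) {}
    }

    static void drain(Iterable<?> objects) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(new NullOutputStream());
        for (Object o : objects) {
            oos.writeObject(o);
            // Without this, the stream keeps a reference to every object
            // written so far, and the heap eventually fills up.
            oos.reset();
        }
        oos.close();
    }
}

And here comes the Google eye-candy: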





So... what this tells me is that when you have problems with session replication, your problem is not the serialization; it is more likely the IO speed. While serialization produced 600 MB in just 4 seconds, a gigabit network with a TCP/IP connection can transport roughly 60-70 MB/sec, so it will need at least 10 seconds to transport that much data.
If a component is slow, use it sparingly :-)

Java version: 64 bit Sun/Oracle JDK 1.6
OS: Linux (Fedora)

Thursday, 19 May 2011

The distribution of java.lang.String hash values

I believe most programs use String keys for HashMaps, because Strings are:
  • immutable - mutable objects can do tricks in HashMaps, and that is not funny when you spend a night debugging them
  • final - you cannot override it with some smart code
  • and of course Strings represent most of the data stored in business apps (names, filesystem paths, etc.)
So, my question is... How do plain English Wikipedia article titles play with the String hash algorithm? Is the result random like white noise, or can we see some interesting hills and valleys?
Also, what output do other languages produce? E.g. Hungarian has some special extra characters compared to English, and Chinese writing has some 47,000 extra characters.

Let's see: a Java int is a signed 32-bit value. The minimum value is -2147483648, the maximum is 2147483647, and there are 8472583 articles in the latest English Wikipedia dump. Therefore there are roughly 506 times as many possible hash values as Wikipedia articles, at least for English.

I am not going to create 2^32 counters, one for each possible hash value; I will just take hash/1000000 and count the buckets in a HashMap<Integer, Integer>. (Why not: Integer is also fine as a hash key, and I love the simple implementation of Integer.hashCode.) That gives about 4295 buckets, so on average each counter should hold roughly two thousand titles, if the distribution is even.
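A minimal sketch of the counting (the class and method names are mine; iterating over the titles is up to the dump reader):

import java.util.HashMap;
import java.util.Map;

public class TitleHashHistogram {

    // One counter per million-wide slice of the int range.
    static Map<Integer, Integer> histogram(Iterable<String> titles) {
        Map<Integer, Integer> counters = new HashMap<Integer, Integer>();
        for (String title : titles) {
            Integer bucket = title.hashCode() / 1000000;
            Integer count = counters.get(bucket);
            counters.put(bucket, count == null ? 1 : count + 1);
        }
        return counters;
    }
}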
Sorry for the static chart instead of the interactive one; the interactive one would seriously kill your browser... So here it comes.
The distribution of English Wikipedia title hash values

The distribution of Hungarian Wikipedia title hash values
The distribution of Chinese Wikipedia title hash values

Now there are some similarities. It seems more like a long-tail model than an even distribution: there is a huge number of hash values between zero and one million, but most of the hash values are elsewhere.
There are some peaks that both the Hungarian and the English Wikipedia share, and they both have some unique peaks. The Chinese one does not have these peaks.

So my conclusion is that the distribution of hash keys somewhat depends on what you use it for, but it is not bad anyway.

I always wanted to know whether it is really worth it for String to cache the hash value in the 'hash' field, or whether it would make more sense to at least make it transient.

Update to my stupid question above: serialization handles String objects specially, they are not serialized like other beans, therefore the hash field is not included anyway. The hash value caching is still an interesting question; I will look into it.

Saturday, 14 May 2011

Volatile

Volatile is a rarely used keyword in Java. If you have never used it, don't worry, you are almost certainly right! All it does is force the program to read the value from main memory rather than from a thread's cached copy, so you can be sure you get a fresh value, at least at the moment you read it. It performs somewhat better than a synchronized block, since it does not lock. However, you run into trouble if you also want to write the data back, because even a ++ operation is not atomic: it is a read, a calculation and a write, therefore the probability of a wrong result is high.

I was interested in the following questions:

  1. How much slower is the access to volatile fields?
  2. What is the cost of synchronization?
  3. What is the probability of the bad result coming from wrong multi-threading code?
  4. How does the synchronization compare to modern alternatives like STM?
So I wrote a very simple test code to answer all four questions: a Counter interface with 4 implementations:
  1. Simple counter
  2. Synchronized counter
  3. Counter with volatile field
  4. STM counter with Multiverse
The test code starts a number of threads and shares a single counter among them. Each thread calls hit() on the counter exactly 1 million times and then terminates. The test waits for all the threads to finish and then checks the Counter: it should be exactly a million times the number of threads. And of course, we have two BAD implementations here (numbers 1 and 3), where I only wanted to know how wrong they are.
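A sketch of what the first three implementations can look like (the class names are mine; the Multiverse-based STM counter is omitted):

public interface Counter {
    void hit();
    long get();
}

// BAD: count++ is a read, a calculation and a write, so updates
// from other threads can be lost in between.
class SimpleCounter implements Counter {
    private long count;
    public void hit() { count++; }
    public long get() { return count; }
}

// Correct: the lock makes the whole read-modify-write atomic.
class SynchronizedCounter implements Counter {
    private long count;
    public synchronized void hit() { count++; }
    public synchronized long get() { return count; }
}

// Still BAD: volatile guarantees visibility, not atomicity.
class VolatileCounter implements Counter {
    private volatile long count;
    public void hit() { count++; }
    public long get() { return count; }
}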

Test environment: the usual dual-core AMD nuclear submarine with 2 GB 1066 MHz memory, Linux and Java 1.6

Code: https://dummywarhead.googlecode.com/hg/volatilepenalty/




Conclusions

  1. Access to volatile fields is of course much slower than access to normal fields. The results say it is about 4 times slower, but I believe this also depends on your RAM speed. It is actually not much better than a synchronized block.
  2. The cost of synchronization is high indeed if you just look at those lines, but not high at all once you know that the "simple" and "volatile" solutions produce wrong results.
  3. The probability of a bad result coming from wrong concurrency code is huge. If you frequently update something from multiple threads, you need to think about synchronization. Well, this test really is an edge case, but never mind.
  4. From the first moment I heard of software transactional memory, I have loved the idea. It sounds just great. But in this test it does not perform great, at least not on 2 cores, which is something the wiki page mentions as well. It would be nice to run it on a 4-core or 8-core machine just to see how the figures change, but my guess is that it would not improve much, because it needs to do way too many rollbacks. Optimistic locking should perform better on 4+ cores, where the probability of collision is relatively small. This is not actually a fair test for STM; it really needs a smarter one.

About the error rates: my first impression was that the volatile solution produces an even higher error rate than the simple one, but now I am not quite sure. Anyway, they are both just wrong.
Think before Thread.start()!

Thursday, 10 March 2011

The good old StringBuffer vs StringBuilder

Everyone (or at least almost everyone) knows that StringBuffer is synchronized and therefore slower than StringBuilder. How do they compare? It depends on a lot of things; e.g. if you append character by character, you will find that StringBuilder is much faster because it avoids synchronization. If you call append just a couple of times with relatively big strings, the difference will be very little.

However, there is another factor here. You can specify an initial capacity for both StringBuilder and StringBuffer, and if you don't (and do not pass an initial string either), the capacity is set to 16. Not much, but at least it is not wasting memory :) When they run out of space while appending, they both roughly double the capacity by allocating a new array and copying the old content over. This is something where you can gain a little speed again: if you have a guess for the length of the produced string, you can avoid at least the first few allocations and copies.
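A quick illustration (a sketch; the 1024 matches the string length constructed in the test below):

public class PreSized {
    public static void main(String[] args) {
        StringBuilder slow = new StringBuilder();     // starts at capacity 16
        StringBuilder fast = new StringBuilder(1024); // pre-sized, never grows
        for (int i = 0; i < 1024; i++) {
            slow.append('x'); // forces about six grow-and-copy rounds in total
            fast.append('x'); // no re-allocation, no array copy
        }
        System.out.println(slow.length() == fast.length());
    }
}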

Let's see how much it matters...

As you can see, there is a huge difference between StringBuffer and StringBuilder, since this test calls append very many times with very short strings. The other difference is between pre-allocated memory and a capacity slowly growing from 16. This test constructed a 1024-character string, therefore with a good initial capacity the pre-allocated version saved 6 re-allocations. And there is the difference: with a good guess, you can still save lots of processing time.

Now let's rewrite the code: instead of creating the long string by appending single characters, we can append bigger strings. The bigger the strings, the smaller the difference caused by synchronization, and at some point pre-allocating the space brings more benefit than avoiding synchronization does, which could be important. The chart below was generated using 64-character strings to construct the same size at the end.



JVM: 64-bit server JVM 1.6.0_22-b04