Dummy Warhead: March 2011

Thursday, 31 March 2011

Playing with RMI

RMI is a popular RPC solution that has shipped with the JDK since 1.1, but its popularity faded with the dawn of the web service stacks. RMI is kind of old school: it never played nice with firewalls and proxies, and it has never been a good choice for system integration. It is, however, a simple and well-known solution for pure-Java RPC, and from that point of view it is still kicking.

I was interested in the following aspects of using RMI:

Response time - how many remote procedure calls can my code do in a second?
Scalability - how does the response time change when I use multiple threads?
How does the response time change with the argument size?

Test Hardware

The hardware that I used is not some high-end server, just my desktop computer and my previous desktop computers. The RMI server runs on a dual-core AMD box; the RMI client is a less powerful, old 1800 MHz AMD machine. The network between the client and the server is gigabit Ethernet with a switch. Switches are known to increase network latency.
Both computers are running relatively up-to-date Linux versions (Fedora). The test computers generate an unpleasant noise level, so I had to wait for the opportunity to finish this test.

There were issues during the test. The old AMD box suffered a sudden kernel crash under high load. I do not know yet whether this is a hardware failure or a kernel problem, so I will repeat the test with another test client.

Test method

The client opens a single connection to the RMI server and starts N threads, and each thread sends over a byte array that the server sends back. A very simple test.
I measured the time needed to do 10000 calls. If there are 3 threads, then each thread does 10000/3 = 3333 calls. This is not quite accurate, but who cares about that one last call.
Code to be shared...
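Until the real test code is shared, here is a minimal sketch of how such an echo test could look. It is not the code used for the measurements: the names RmiEchoBenchmark, Echo, EchoImpl and timeCalls are made up for illustration, and it assumes that client and server run in the same JVM over loopback, so it only shows the method, not the network effects.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class RmiEchoBenchmark {

    /** Hypothetical remote interface: the server returns the bytes it receives. */
    public interface Echo extends Remote {
        byte[] echo(byte[] data) throws RemoteException;
    }

    /** Server-side implementation; the counter only exists to check the call count. */
    public static class EchoImpl implements Echo {
        final AtomicInteger calls = new AtomicInteger();

        @Override
        public byte[] echo(byte[] data) {
            calls.incrementAndGet();
            return data;
        }
    }

    /** Exports the object on an anonymous port and returns a stub that calls it over TCP. */
    static Echo export(EchoImpl impl) throws RemoteException {
        return (Echo) UnicastRemoteObject.exportObject(impl, 0);
    }

    /** Runs totalCalls / threads echo calls on each thread; returns elapsed milliseconds. */
    static long timeCalls(Echo stub, int threads, int totalCalls, int payloadSize) throws Exception {
        int callsPerThread = totalCalls / threads; // the remainder is dropped, as in the post
        byte[] payload = new byte[payloadSize];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // A Callable (not a Runnable) so that echo() may throw RemoteException.
            Callable<Void> task = () -> {
                for (int i = 0; i < callsPerThread; i++) {
                    stub.echo(payload);
                }
                return null;
            };
            List<Future<Void>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(task));
            }
            for (Future<Void> f : futures) {
                f.get(); // wait for the thread and surface any remote exception
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: one JVM, loopback only. A real test would put the server on another box.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        EchoImpl impl = new EchoImpl();
        Echo stub = export(impl);
        try {
            for (int threads : new int[] {1, 2, 3, 4, 8}) {
                long ms = timeCalls(stub, threads, 10000, 1024);
                System.out.println(threads + " thread(s): " + ms + " ms");
            }
        } finally {
            UnicastRemoteObject.unexportObject(impl, true); // lets the JVM exit
        }
    }
}
```

All threads share one stub here. To test over a real network, the server side would bind the exported object in an RMI registry and the client would look it up from the other machine.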

Results

To be honest, RMI was a positive surprise for me: very nice results from such an old-timer.

The network latency

Today's server farms are usually built with gigabit networks or better. 10-gigabit networks are getting more and more popular as well, but they are not yet available in the hardware stores of mere mortals. Therefore I cannot benchmark a 10-gigabit network, but I repeated the same test after scaling down my gigabit network to 100 Mb/s and adding some extra network latency by plugging the test server into the 10/100 router. Now the IP packets pass through both the gigabit switch and the 10/100 router. And the results are lovely.

Conclusions

The network latency is significant if you are in a hurry, but if you use several threads, you can work around it. The higher the network latency, the more threads you need...
Anyway, if you need a good response time, do not waste your processing time on any kind of RPC :)
Sending twice as much data does not take twice as much time, at least as long as we are dealing with small records of up to 8192 bytes. This is likely because of the TCP and Ethernet overhead, which is very significant when small amounts of data are transferred. Actually, the difference is very small, so it makes sense to be smart and send more data with one call rather than doing several small data transfers. This is where the Facade design pattern can help.
As you can see, once the size of the transferred data grows beyond a certain size, the response time starts to grow accordingly.
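To make the Facade remark a bit more concrete, here is a small sketch of the idea. Everything in it is hypothetical (FacadeSketch, CustomerSummary and the remoteGet... methods are invented names), and a plain counter stands in for the network, so it only illustrates how a coarse-grained call replaces several fine-grained ones.

```java
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class FacadeSketch {

    /** Stands in for the network: every simulated remote call bumps this counter. */
    static final AtomicInteger roundTrips = new AtomicInteger();

    /** Value object that carries everything the client needs in one response. */
    static final class CustomerSummary implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String email;
        final String city;

        CustomerSummary(String name, String email, String city) {
            this.name = name;
            this.email = email;
            this.city = city;
        }
    }

    // Fine-grained style: one simulated round trip per field.
    static String remoteGetName(int id) {
        roundTrips.incrementAndGet();
        return "name-" + id;
    }

    static String remoteGetEmail(int id) {
        roundTrips.incrementAndGet();
        return "user" + id + "@example.com";
    }

    static String remoteGetCity(int id) {
        roundTrips.incrementAndGet();
        return "city-" + id;
    }

    // Facade style: the server gathers the fields and returns them in a single round trip.
    static CustomerSummary remoteGetSummary(int id) {
        roundTrips.incrementAndGet();
        return new CustomerSummary("name-" + id, "user" + id + "@example.com", "city-" + id);
    }

    public static void main(String[] args) {
        remoteGetName(7);
        remoteGetEmail(7);
        remoteGetCity(7);
        System.out.println("fine-grained round trips: " + roundTrips.getAndSet(0)); // prints 3

        remoteGetSummary(7);
        System.out.println("facade round trips: " + roundTrips.getAndSet(0)); // prints 1
    }
}
```

With RMI, the summary object would simply be a Serializable return value of one remote method, so the per-call overhead is paid once instead of three times.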

Thursday, 10 March 2011

The good old StringBuffer vs StringBuilder

Everyone (or at least almost everyone) knows that StringBuffer is synchronized and therefore slower than StringBuilder. How do they compare? It depends on lots of things. For example, if you append character by character, you will find that StringBuilder is much faster because it avoids synchronization. If you call append just a couple of times with relatively big strings, the difference will be very small.

However, there is another factor here. You can specify the initial capacity for both StringBuilder and StringBuffer, and if you don't (and you do not pass in an initial string either), the capacity will be set to 16. Not much, but at least it does not waste memory :) When they run out of space while appending, they both roughly double the capacity by allocating a new array and copying the old content into it. This seems to be something where you can gain a little speed again. If you have a guess for the length of the produced string, you can avoid at least the first few memory allocations and array copies.
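The growth described above can be watched with the capacity() method, which both classes have. The following is only a small sketch (CapacityGrowth and countReallocations are names I made up); the exact growth steps are an implementation detail of the JDK, so the printed capacities may differ between versions.

```java
public class CapacityGrowth {

    /** Appends one character at a time and counts how often the capacity changes. */
    static int countReallocations(StringBuilder sb, int targetLength) {
        int reallocations = 0;
        int capacity = sb.capacity();
        while (sb.length() < targetLength) {
            sb.append('x');
            if (sb.capacity() != capacity) {
                // A changed capacity means a bigger array was allocated and the content copied.
                reallocations++;
                capacity = sb.capacity();
                System.out.println("capacity grew to " + capacity);
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        System.out.println("default capacity: " + new StringBuilder().capacity());

        int grown = countReallocations(new StringBuilder(), 1024);
        System.out.println("re-allocations starting from the default: " + grown);

        int preallocated = countReallocations(new StringBuilder(1024), 1024);
        System.out.println("re-allocations with initial capacity 1024: " + preallocated);
    }
}
```

The same check works for StringBuffer, since it offers the same capacity() and append methods.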

Let's see how much it matters...

As you can see, there is a huge difference between StringBuffer and StringBuilder, since this test calls append very many times with very short strings. The other difference is between pre-allocated memory and memory that slowly grows from 16. This test constructed a string of 1024 characters, so with a good initial capacity the pre-allocated version saved 6 re-allocations. And there is the difference: with a good guess, you can still save lots of processing time.

Now let's rewrite the code so that, instead of creating the long string by appending single characters, it appends bigger strings. The bigger the strings, the smaller the difference caused by synchronization, and at some point pre-allocating the space brings more benefit than choosing the unsynchronized class, and this could be important. The chart below was generated using 64-char strings to construct a string of the same size at the end.
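A comparison like the two described above could be sketched as follows. This is not the code behind the charts: AppendBenchmark and its methods are illustrative names, the iteration count is an arbitrary assumption, and a naive timing loop like this is easily distorted by the JIT (a newer JVM may even optimize away most of the StringBuffer locking), so treat the numbers as a rough indication only.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class AppendBenchmark {
    static final int TARGET_LENGTH = 1024;

    /** Builds a string of n copies of c, used as the chunk that gets appended. */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    static String withBuilder(String chunk, int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        while (sb.length() < TARGET_LENGTH) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    static String withBuffer(String chunk, int initialCapacity) {
        StringBuffer sb = new StringBuffer(initialCapacity);
        while (sb.length() < TARGET_LENGTH) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    /** Runs the task repeatedly and returns the elapsed time in milliseconds. */
    static long time(Supplier<String> task, int iterations) {
        long sink = 0; // consume the results so the work cannot be dropped
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += task.get().length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < (long) iterations * TARGET_LENGTH) {
            throw new IllegalStateException("built strings are too short");
        }
        return elapsed / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 50_000; // arbitrary, pick what suits your machine
        for (int chunkSize : new int[] {1, 64}) {
            String chunk = repeat('x', chunkSize);
            for (int capacity : new int[] {16, TARGET_LENGTH}) {
                // Warm-up runs so that the measured runs use compiled code.
                time(() -> withBuffer(chunk, capacity), iterations);
                time(() -> withBuilder(chunk, capacity), iterations);
                long bufferMs = time(() -> withBuffer(chunk, capacity), iterations);
                long builderMs = time(() -> withBuilder(chunk, capacity), iterations);
                System.out.println("chunk=" + chunkSize + " capacity=" + capacity
                        + " StringBuffer=" + bufferMs + " ms StringBuilder=" + builderMs + " ms");
            }
        }
    }
}
```

For serious measurements a benchmark harness that handles warm-up and dead-code elimination properly would be a better choice than a hand-written loop.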

JVM: 64-bit server JVM 1.6.0_22-b04

Wednesday, 9 March 2011

Starting up...

Hi,

I am Laszlo, .* developer at Duct-tape Solutions Inc. I have been working in the information technology industry for about 11 years, and I have used Java on a daily basis for probably 10 of them. This is my new blog about Java tools, focusing on performance. Topics that I would like to cover:

Performance considerations of Java core classes
Comparison of standard implementations, e.g. Tomcat versus Jetty, ActiveMQ versus HornetQ
Architecture - architecture is where most projects go wrong
Scaling out: Clustering and computation models - e.g. Hadoop versus Terracotta
Scaling up: Concurrency, memory, garbage collector and other JVM parameter tuning

Guidelines for posts:

I will share the test code for each post. I will have a Mercurial repository at Google Code.
I will use Maven as the build tool whenever possible, just to keep things simple.
Graphs
Description of the hardware and software environment: OS, Java version, Java runtime parameters, hardware components and so on...

Anyway, this is my first blog in English, so sorry about the grammar mistakes. I hope you will enjoy it!