Thursday, 31 March 2011

Playing with RMI

RMI is a popular RPC solution that has shipped with the JDK since version 1.1, although its popularity faded with the dawn of the web service stacks. RMI is kind of old school: it never played nice with firewalls and proxies, and it has never been a good choice for system integration. But it is a simple and well-known solution for pure-Java RPC, and from that point of view it is still kicking.

I was interested in the following aspects of using RMI:
  • Response time - how many remote procedure calls can my code do in a second?
  • Scalability - how does the response time change when I use multiple threads?
  • How does the response time change with the argument size?
Test Hardware

The hardware I used is not some high-end server; it is my current desktop computer and my previous one. The RMI server runs on a dual-core AMD box, while the RMI client is a less powerful old 1800 MHz AMD processor. The network between the client and the server is gigabit Ethernet with a switch. Switches are known to add network latency.
Both computers run relatively up-to-date Linux versions (Fedora). The test computers generate an unpleasant noise level, so I had to wait for an opportunity to finish this test.

There were issues during the test: the old AMD box suddenly kernel-crashed under high load. I do not know yet whether this is a hardware failure or a kernel problem; I will repeat the test with another test client.

Test method

The client opens a single connection to the RMI server, starts N threads, and sends over a byte array that the server sends back. A very simple test.
I measured the time needed to do 10000 calls. If there are 3 threads, each thread does 10000/3 = 3333 calls. This is not perfectly accurate, but who cares about that last call.
Code to be shared...
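
Until then, here is a minimal sketch of what such a test could look like. The EchoService/EchoServer/EchoClient names and the "server-host" address are placeholders of mine, not the actual benchmark code:

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// The remote interface: the server simply returns the byte array it receives.
interface EchoService extends Remote {
    byte[] echo(byte[] data) throws RemoteException;
}

// Server side: export the object and register it in the RMI registry.
class EchoServer implements EchoService {
    public byte[] echo(byte[] data) {
        return data; // just send it back
    }

    public static void main(String[] args) throws Exception {
        EchoService stub =
                (EchoService) UnicastRemoteObject.exportObject(new EchoServer(), 0);
        Registry registry = LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
        registry.rebind("echo", stub);
        System.out.println("echo server ready");
    }
}

// Client side: N threads share one stub and split the 10000 calls between them.
class EchoClient {
    public static void main(String[] args) throws Exception {
        final int threads = Integer.parseInt(args[0]);
        final byte[] payload = new byte[Integer.parseInt(args[1])];
        final int calls = 10000 / threads;

        Registry registry = LocateRegistry.getRegistry("server-host");
        final EchoService echo = (EchoService) registry.lookup("echo");

        long start = System.currentTimeMillis();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int c = 0; c < calls; c++) {
                            echo.echo(payload);
                        }
                    } catch (RemoteException e) {
                        e.printStackTrace();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(threads * calls + " calls in " + elapsed + " ms");
    }
}

This is deliberately close to what the post describes: one stub, N threads, 10000 calls split between them, and the elapsed wall-clock time printed at the end.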

Results

To be honest, RMI was a positive surprise for me: very nice results from such an old-timer.

[Chart: response times on the gigabit network]

The network latency

Today's server farms are usually built with gigabit networks or better. 10-gigabit networks are getting more and more popular as well, but they are not yet available in the hardware store of mere mortals. So I cannot benchmark a 10-gigabit network, but I repeated the same test after scaling my gigabit network down to 100 Mb/s and adding some extra network latency by plugging the test server into a 10/100 router. The IP packets now pass through both the gigabit switch and the 10/100 router. And the results are lovely.

[Chart: response times on the 100 Mb/s network]

Conclusions

  1. The network latency is significant if you are in a hurry, but if you use several threads, you can work around it. A single synchronous client thread completes roughly one call per network round trip, so the more network latency you have, the more threads you need to keep the pipe busy.
    Anyway, if you need a really good response time, do not waste your processing time on any kind of RPC :)
  2. Sending twice as much data does not take twice as much time, at least as long as we are dealing with small records up to 8192 bytes. This is likely because of the TCP and Ethernet overhead, which is very significant with small amounts of transferred data. The difference is actually very small, so it makes sense to be smart and send more data in one call rather than doing several small transfers. This is where the Facade design pattern can help; see the sketch after this list.
    As you can see, once the size of the transferred data grows past a certain point, the response time starts to grow accordingly.
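
To illustrate that last point, here is a small sketch of a remote facade. The order-related names below are made up for the example; the idea is just that the chatty interface costs three network round trips where the facade pays for one:

import java.io.Serializable;
import java.math.BigDecimal;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Chatty interface: fetching one order's data costs three network round trips.
interface ChattyOrderService extends Remote {
    String getCustomerName(long orderId) throws RemoteException;
    String[] getItemNames(long orderId) throws RemoteException;
    BigDecimal getTotal(long orderId) throws RemoteException;
}

// Remote facade: the same data travels in a single round trip.
interface OrderFacade extends Remote {
    OrderDetails getOrderDetails(long orderId) throws RemoteException;
}

// Serializable value object holding everything the client needs at once.
class OrderDetails implements Serializable {
    String customerName;
    String[] itemNames;
    BigDecimal total;
}

Since every RMI call pays a full network round trip, collapsing three calls into one cuts the latency cost to roughly a third, as long as the combined payload stays within the small-record range measured above.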

3 comments:

  1. Next time, please provide values and units for the Y coordinates too :)

  2. If you use Spring - and I bet you use it - you can easily test and compare the other remoting solutions: Hessian, Burlap, probably Thrift and Protobuf :)

  3. Yeah, sorry about the Y axis, I am still learning tricks with Google Docs :-D

    I did not use Spring for this test, but thanks for the idea, I will give those solutions a try!
