I was interested in the following aspects of using RMI:
- Response time - how many remote procedure calls can my code make in a second?
- Scalability - how does the response time change when I use multiple threads?
- How does the response time change with the argument size?
Test Hardware
The hardware I used is not some high-end server; it is my current and my previous desktop computers. The RMI server runs on a dual-core AMD box, while the RMI client is a less powerful old 1800 MHz AMD machine. The network between the client and the server is gigabit Ethernet with a switch. Switches are known to add network latency.
Both computers run relatively up-to-date Linux versions (Fedora). The test computers generate an unpleasant noise level, so I had to wait for the right opportunity to finish this test.
There were issues during the test: the old AMD machine suddenly kernel-crashed under high load. I do not know yet whether this was a hardware failure or a kernel problem; I will repeat the test with another test client.
Test method
The client opens a single connection to the RMI server and starts N threads; each thread repeatedly sends over a byte array that the server sends straight back. A very simple test.
I measured the time needed to make 10000 calls. If there are 3 threads, then each thread makes 10000/3 = 3333 calls. This is not quite accurate, but nobody will miss that last call.
Code to be shared...
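Until then, here is a minimal sketch of the kind of test described above. The class names, the registry setup, and the command-line arguments are my assumptions for illustration, not the actual code that produced these numbers.

// Echo.java - the remote interface: the server simply echoes the byte array back.
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface Echo extends Remote {
    byte[] echo(byte[] data) throws RemoteException;
}

// EchoServer.java - exports the echo service and registers it in the RMI registry.
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class EchoServer implements Echo {
    // Strong static reference so the exported object is not garbage-collected.
    private static EchoServer server;

    public byte[] echo(byte[] data) {
        return data;
    }

    public static void main(String[] args) throws Exception {
        server = new EchoServer();
        Echo stub = (Echo) UnicastRemoteObject.exportObject(server, 0);
        Registry registry = LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
        registry.rebind("echo", stub);
        System.out.println("Echo server ready");
    }
}

// EchoClient.java - N threads together make 10000 calls and the total time is measured.
// Hypothetical usage: java EchoClient <serverhost> <threads> <payload-bytes>
import java.rmi.registry.LocateRegistry;

public class EchoClient {
    public static void main(String[] args) throws Exception {
        final Echo echo = (Echo) LocateRegistry.getRegistry(args[0]).lookup("echo");
        final int threads = Integer.parseInt(args[1]);
        final byte[] payload = new byte[Integer.parseInt(args[2])];
        final int totalCalls = 10000;

        Thread[] workers = new Thread[threads];
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        // e.g. 10000/3 = 3333 calls per thread when threads = 3
                        for (int c = 0; c < totalCalls / threads; c++) {
                            echo.echo(payload);
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(totalCalls + " calls in " + elapsed + " ms");
    }
}

A run with 3 threads and a 1024-byte payload would then look like: java EchoClient serverhost 3 1024.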
Results
To be honest, RMI was a positive surprise for me: very nice results from such an old-timer.
The network latency
Today's server farms are usually built with gigabit networking or better. 10-gigabit networks are becoming more and more popular as well, but they are not yet available in the hardware store of mere mortals. I therefore cannot benchmark a 10-gigabit network, but I repeated the same test after scaling my gigabit network down to 100 Mb/s and adding some extra network latency by plugging the test server into a 10/100 router. The IP packets now pass through both the gigabit switch and the 10/100 router. And the results are lovely.
Conclusions
- The network latency is significant if you are in a hurry, but you can work around it by using several threads: the more network latency, the more threads you need... Anyway, if you need really good response time, do not waste your processing time on any kind of RPC :)
- Sending twice as much data does not take twice as much time, at least as long as we are dealing with small records of up to 8192 bytes. This is likely due to the TCP and Ethernet overhead, which is very significant when small amounts of data are transferred. Since the difference is so small, it makes sense to be smart and send more data in one call rather than doing several small transfers. This is where the Facade design pattern can help; see the sketch below.
As you can see, once the size of the transferred data grows beyond a certain point, the response time starts to grow accordingly.
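To illustrate the Facade point, here is a sketch of what a remote facade could look like; the CustomerService and CustomerFacade interfaces are hypothetical examples of mine, not part of the benchmark. The idea is to replace several fine-grained remote calls with one call that moves all the data in a single round trip.

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Chatty interface: filling one customer record costs three network round trips.
interface CustomerService extends Remote {
    String getName(int customerId) throws RemoteException;
    String getAddress(int customerId) throws RemoteException;
    String getPhone(int customerId) throws RemoteException;
}

// Value object carried back in a single call; must be Serializable for RMI.
class CustomerRecord implements Serializable {
    String name;
    String address;
    String phone;
}

// Remote facade: one round trip returns the whole record at once.
interface CustomerFacade extends Remote {
    CustomerRecord getCustomer(int customerId) throws RemoteException;
}

With per-call overhead dominating small transfers, the facade version pays the TCP and Ethernet tax once instead of three times.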