I have been keeping an eye on the PubSubHubbub technology since its first day. It is a very simple yet interesting technology that makes RSS/Atom feeds really fast without polling. I wrote my own component called 'SubHub'; it has been running for a couple of weeks now, receiving tons of Atom and RSS messages through PubSubHubbub from popular news sources like NASA, CNN, the BBC and a few thousand others. Thanks to this hard-to-spell technology, I do not poll at all :-)
The developers envisioned a distributed network of hubs and streams, and it certainly is distributed by design. What I wanted to know is how distributed it is in practice, since my first guess was that Google would have a little dominance in it. For example, blogger.com RSS feeds (including Dummy Warhead's) point to Google's PubSubHubbub server. The WordPress guys were also very active and implemented their own WordPress plugin, which is turned on on all wordpress.com blogs, but they only use it for their own content, while Google's service is used by anyone... and almost everyone.
Enough talking. Let's see who is doing most of the work in PubSubHubbub:
The graph was generated from the web server log using the usual Linux commands.
So the two up-and-coming players are SuperFeedr and Pheedo; they look like special services for special customers. There might be some minor players in the technology, but I haven't found them yet.
Wednesday, 24 October 2012
Friday, 22 April 2011
JMS and ActiveMQ's configurations
JMS is an important and mature piece of the Java server-side architecture. It is used in quite a lot of situations:
- You want to make sure the message will be received but you do not want to wait until it is received :)
- You want to send a message to multiple targets
- You do not need an answer immediately and you do not want to overload the target system
Also, in some of these cases it is critical to have the message persisted, while other cases really do not need it. For example, typical business messages require persistence: even if the server fails, after recovering the disks you still have the message, and sooner or later it will be processed.
A distributed Ehcache synchronization, on the other hand, really does not need any persistence. If a message is lost because of a power outage, no harm done; speed and throughput are much more important than persistence there. To be honest, I would use RMI rather than JMS for Ehcache synchronization, but RMI is not a good choice when you have firewalls between your servers. Most software developers do not have a choice anyway: JMS is the standard for all kinds of messaging, it is part of JEE/J2EE and most other platforms, it is everywhere.
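The persistent-versus-throwaway distinction above maps directly to the JMS delivery mode on the producer. Here is a minimal sketch using the ActiveMQ client library, assuming it is on the classpath; the `vm://` URL starts an embedded broker in-process, and the queue name is illustrative:

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsSketch {
    public static void main(String[] args) throws Exception {
        // vm:// spins up an embedded, non-persistent broker in this JVM
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("vm://localhost?broker.persistent=false");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("TEST.QUEUE");

        MessageProducer producer = session.createProducer(queue);
        // Business messages: must survive a broker restart
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);
        // Cache-invalidation style messages: fast, may be lost on a crash
        // producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT);
        producer.send(session.createTextMessage("hello"));

        MessageConsumer consumer = session.createConsumer(queue);
        TextMessage received = (TextMessage) consumer.receive(1000);
        System.out.println(received.getText());

        session.close();
        connection.close();
    }
}
```

The delivery mode is per-producer (it can even be overridden per message in the three-argument `send`), which is what lets one application mix durable business traffic with throwaway cache traffic on the client side.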
The market for JMS products has narrowed down to a few big players and some agile small ones. The big players are Apache ActiveMQ and IBM's WebSphere MQ. If you look at the Google Trends graphs of the two products, WebSphere MQ is in slow decline while ActiveMQ has a stable hit rate in Google searches. ActiveMQ has generated more activity since 2009, while WebSphere MQ keeps generating money for IBM.
So, if you want a mainstream JMS server, ActiveMQ is the only choice. I would never recommend IBM software to anyone :-)
The small (but agile) players...
ActiveMQ is blue, RabbitMQ is red (it shows nice activity, especially since it is backed by SpringSource/VMware), and HornetQ is the yellow one.
So I was very interested in ActiveMQ and how hard we can kick it before it becomes a bottleneck in the system. I used the default configuration and a single-threaded sender and receiver to send and receive text messages, and it constantly and reliably let through no more than 25 messages per second. That was a bad experience of course; one would expect much more than that. So I looked into it: the default configuration runs with KahaDB and forces each message to be synced to disk. I disabled the disk sync and that increased the speed to 300 messages/sec. That is closer to what I would like to see :-)
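For reference, the disk sync I disabled is controlled on the KahaDB persistence adapter in `activemq.xml`; a sketch (the data directory is the usual default):

```xml
<broker xmlns="http://activemq.apache.org/schema/core">
  <persistenceAdapter>
    <!-- enableJournalDiskSyncs="false" trades durability for throughput:
         messages are journaled but not fsync'd on every send -->
    <kahaDB directory="${activemq.data}/kahadb" enableJournalDiskSyncs="false"/>
  </persistenceAdapter>
</broker>
```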
Another persistence option is JDBC. I used the default Derby just for comparison and it produced 100 messages/sec, still not that bad; it would make sense to give it a try with MySQL and PostgreSQL. With no persistence at all: 700 messages/second. Not bad...
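Swapping Derby for MySQL is a configuration change on the broker; a sketch of the `activemq.xml` wiring, with hypothetical connection details:

```xml
<broker xmlns="http://activemq.apache.org/schema/core">
  <persistenceAdapter>
    <!-- points at the Spring bean below instead of the default Derby store -->
    <jdbcPersistenceAdapter dataSource="#mysql-ds"/>
  </persistenceAdapter>
</broker>

<!-- datasource bean; driver, URL and credentials are illustrative -->
<bean id="mysql-ds" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
  <property name="url" value="jdbc:mysql://localhost/activemq"/>
  <property name="username" value="activemq"/>
  <property name="password" value="activemq"/>
</bean>
```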
The default configuration is good for anyone: it is safe, but its performance is not that impressive. Some applications do not need this safety, and for them you can use other storage options that perform much better. Now, if you have the two different needs in one application, it seems that you cannot configure each queue to behave differently: either all of them are forced to sync to disk or none of them are. So in such cases it seems better to run two differently configured ActiveMQ instances for the two different requirements.
Thursday, 31 March 2011
Playing with RMI
RMI is a popular RPC solution that has shipped with the JDK since 1.1; however, its popularity faded with the dawn of the web service stacks. RMI is kind of old school: it never played nice with firewalls and proxies and has never been a good choice for system integration, but it is a simple and well-known solution for pure-Java RPC, and from that point of view it is still kicking.
I was interested in the following aspects of using RMI:
- Response time - how many remote procedure calls can my code do in a second?
- Scalability - how does the response time change when I use multiple threads?
- How does the response time change with the argument size?
Test Hardware
The hardware I used is not some high-end server: my desktop computer and my previous desktop computer. The RMI server runs on a dual-core AMD box; the RMI client is a less powerful old AMD 1800 MHz machine. The network between the client and the server is gigabit Ethernet with a switch. Switches are known to increase network latency.
Both computers run relatively up-to-date Linux versions (Fedora). The test computers generate an unpleasant noise level, so I had to wait for the opportunity to finish this test.
There were issues during the test: the old AMD box suddenly kernel-crashed under high load. I do not know yet whether this is a hardware failure or a kernel problem; I will repeat the test with another test client.
Test method
The client opens a single connection to the RMI server, starts N threads, and sends over a byte array that the server sends back. A very simple test.
I measured the time needed to do 10000 calls. If there are 3 threads, then each thread does 10000/3 = 3333 calls. This is not quite accurate, but who cares about that last call.
Code to be shared...
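In the meantime, here is a single-JVM sketch of the kind of echo test described above (the original code was not published; class and binding names are illustrative, and server plus client live in one process here rather than on two machines):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiEchoTest {

    // The remote interface: the server simply sends the byte array back
    public interface Echo extends Remote {
        byte[] echo(byte[] data) throws RemoteException;
    }

    static class EchoImpl implements Echo {
        public byte[] echo(byte[] data) { return data; }
    }

    public static void main(String[] args) throws Exception {
        // Export the server side and register it under a name
        Registry registry = LocateRegistry.createRegistry(1099);
        Echo stub = (Echo) UnicastRemoteObject.exportObject(new EchoImpl(), 0);
        registry.rebind("echo", stub);

        // Client side: look up the stub and hammer it in a loop
        Echo client = (Echo) LocateRegistry.getRegistry("localhost", 1099).lookup("echo");
        byte[] payload = new byte[1024];

        int calls = 10000;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            client.echo(payload);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(calls + " calls in " + elapsedMs + " ms");
        System.exit(0); // the exported object would otherwise keep the JVM alive
    }
}
```

For the multi-threaded variant, the loop body moves into N worker threads that share the same stub, each doing calls/N iterations.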
Results
To be honest, RMI was a positive surprise for me; very nice results from such an old-timer.
The network latency
Today's server farms are usually built with gigabit networking or better. 10-gigabit networks are getting more and more popular as well, but they are not yet available in the hardware store of mere mortals, so I cannot benchmark a 10-gigabit network. Instead, I repeated the same test after scaling my gigabit network down to 100 Mb/s and adding some extra network latency by plugging the test server into a 10/100 router. So now the IP packets pass through both the gigabit switch and the 10/100 router. And the results are lovely.
Conclusions
- The network latency is significant if you are in a hurry, but if you use several threads you can work around it. The more network latency, the more threads... Anyway, if you need a good response time, do not waste your processing time on any kind of RPC :)
- Sending twice as much data does not take twice as much time, at least as long as we are dealing with small records of up to 8192 bytes. This is likely because the TCP and Ethernet overhead is very significant with small amounts of transferred data. Actually, the difference is very small, so it makes sense to be smart and send more data in one call rather than doing several small data transfers. This is where the Facade design pattern can help.
As you can see, once the size of the transferred data grows beyond a certain point, the response time starts to grow accordingly.