Sunday, 28 October 2018

Sequence performance

I am frequently getting warnings like this from IntelliJ IDEA in Kotlin projects: whenever two or more collection operations are chained, the IDE suggests that the call chain should be converted into a Sequence. And if you accept the hint, IntelliJ rewrites the chain to go through asSequence().

So what IDEA recommends is: if you are performing two or more computations on a collection, then you should turn the collection into a sequence and do the computations on that.
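Since the original screenshots are gone, here is a sketch of the kind of chain the inspection flags; the data class and names are my own, not from the original code:

```kotlin
// Hypothetical example data; any chained filter/map on a collection triggers the hint.
data class Person(val name: String, val age: Int)

fun main() {
    val people = listOf(Person("Anna", 31), Person("Bela", 17), Person("Cecil", 42))

    // Plain List version: filter and map each allocate an intermediate list
    val adults = people.filter { it.age >= 18 }.map { it.name }

    // What the IDE rewrites it into: a lazy Sequence, no intermediate lists,
    // materialized only by the final toList()
    val adultsViaSequence = people.asSequence()
        .filter { it.age >= 18 }
        .map { it.name }
        .toList()

    println(adults)             // [Anna, Cecil]
    println(adultsViaSequence)  // [Anna, Cecil]
}
```

Both versions produce the same list; the difference is only in how many temporary collections get allocated along the way.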

But...

Is this always true?


Apparently not. When you transform one list into another list, sequences seem to consistently underperform the usual List operations.

[chart: List vs. Sequence benchmark, list-to-list transformation]
When is it right?


When at the end you have to produce another list, a sequence does not seem to be a great choice. But when you do not want to create another list, you just want a sum or an average, that is when it seems to consistently perform better.
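For illustration, a sketch of such an aggregation (my own example, not the original benchmark code). With a sequence, the elements stream through filter and map one by one, and only the final sum is materialized:

```kotlin
fun main() {
    val numbers = (1..10_000).toList()

    // List version: filter and map each build a full intermediate list
    val listSum = numbers.filter { it % 2 == 0 }.map { it * 2 }.sum()

    // Sequence version: no intermediate lists, elements flow through lazily
    val sequenceSum = numbers.asSequence().filter { it % 2 == 0 }.map { it * 2 }.sum()

    println(listSum)      // 50010000
    println(sequenceSum)  // 50010000
}
```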



And yes, in this case the transformed code does perform better.

[chart: List vs. Sequence benchmark, sum/average aggregation]
So, having found no confirmation of the performance improvement for most of my code, for now I keep ignoring these hints in most of my code-base.

Saturday, 27 October 2018

Hackathon lessons

I was at a hackathon last night; I hadn't done this for over a decade. There were a few things I learned:
  1. I am a bad UI engineer, but not terribly bad. Just simply bad. I can live with this :) But I also think I could quickly improve my skills in this direction if I had some time (meaning: at the expense of my other projects).
  2. Family duty is difficult to fit together with all-day, all-night work. OK, I kind of knew this, but now I know it a lot better.

Things that I now remember a lot better (meaning: I already knew them):
  1. Go prepared. Whatever can be ready before the start must be ready before the start.
  2. Preparation is team-work. The whole team must be prepared. Everyone must have the tools installed on their laptops, tested and working fine, and all permissions must be acquired in advance (like GitHub team membership).
    It is also better to come to an agreement on which tools should be used or preferred.
  3. A hackathon is not about building an application. For that it would be amazingly bad. It is about building an idea; the purpose of everything you do at the hackathon is to demonstrate that idea.
  4. Perfection is a total enemy in this respect. Whatever works is fine, maybe even if it looks crappy. It will be scrapped anyway.
    If we are able to demonstrate the idea, someone will have enough time to do it right.
    If we don't, not only will the code go to the garbage can, but a potentially valuable idea too.
  5. Be OK with what you have achieved. If you are not, you are discrediting the idea, turning the whole team-effort into a waste.

Wednesday, 8 February 2017

gzip -x

The test subject is a Linux LVM volume containing a typical minimal Linux root filesystem that I want to move to another host. LVM does not provide a dedicated tool for moving volumes to another machine, so one has to use the "standard" Unix commands; this one may be a good candidate:

dd if=/dev/vg/lv-1 | ssh other-server "dd of=/dev/vg/lv-1"

Now of course this is simple, but unfortunately not that good, because if this volume is 2 GB, then we generate 2 GB of network traffic. We can try compression:

dd if=/dev/vg/lv-1 | gzip -N | ssh other-server "gzip -d | dd of=/dev/vg/lv-1"


And now the question is: what should that N be? I have always used 9, but is it that much better than 8?
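If you want to measure this yourself, a quick sketch like the following will do. It uses a generated compressible sample file instead of the real volume (substitute your own dd input for a realistic result):

```shell
# Stand-in for the real volume: a compressible sample file
SRC=$(mktemp)
seq 1 200000 > "$SRC"

# Compress with a few levels, report compressed size and wall time
for level in 1 3 5 7 9; do
    start=$(date +%s)
    size=$(gzip -c -"$level" "$SRC" | wc -c | tr -d ' ')
    end=$(date +%s)
    echo "level=$level size=$size time=$((end - start))s"
done

rm -f "$SRC"
```

The absolute numbers depend entirely on the data, but the shape (diminishing size returns, growing time cost at higher levels) should match the charts below.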

CentOS 7 root filesystem

[chart: compression time and compressed size by gzip level, CentOS 7 root filesystem]
A blank ext4 filesystem

Now this is a case where the compression is quick, yet I am not satisfied. The problem is that dd has no way to skip empty blocks, so those get transferred too. Can gzip help?

[chart: compression time and compressed size by gzip level, blank ext4 filesystem]
That nice drop between levels 3 and 4 is interesting. But is it worth waiting 3 more seconds to save 7 MB of transfer? If it is, then your network sucks. Most of the time gzip -1 will do just enough.



Conclusion

Going above level 5 does not improve the compression significantly, but comes at a great cost.

gzip -1 may be the best option when we know that the filesystem is mostly empty.

But in any case, an optimization should look for something better than gzip :)

Sunday, 26 April 2015

Just a little VM tuning: Memory and CPU saving with KVM + KSM

This topic will be somewhat unusual from a Java junkie like me, but hopefully interesting for those who are into cloud computing and virtualization. To make it easier to understand for everyone, I will start from far, far away; please just skip ahead if you feel like this is nothing new for you, there may be some interesting pieces of information later on.

The basics

This may not be something new for you, feel free to skip ahead to the hypothesis if you know Linux and virtual memory handling.

Virtual Memory


Modern computers break up the memory into pages. When your program reads or writes a memory address, that address translates to a page and, through the page table, to a physical address.

This is the so-called virtual memory, and it allows swapping: the OS can swap out some pages from memory to a larger and cheaper storage (typically a disk). When a page is referenced that is not in memory, the hardware generates an interrupt and the OS takes over, loads the page, and gives control back to the program. But that is not all that is possible...

Linux has a small module built in called Kernel Samepage Merging, or KSM for short. This module was actually written by the same guy who wrote KVM, very likely with KVM in mind, but any other system can benefit.
I'd recommend reading the doc in the kernel documentation, but in a nutshell this is what it does:
  1. Periodically checks memory pages
  2. If two identical pages are found, then they are merged and marked COW (copy-on-write) - this is because KSM has no idea what the pages are used for, it just merges whatever it finds
So if you have two VMs, both running the same OS, then most pages of the kernel and programs can be shared between the two VMs and they will never know. This can save quite some memory and allows big memory overcommit in virtualized environments, if you accept the price:
  1. KSM takes CPU time. If you have a lot of memory, then it will take a lot of CPU time.
  2. Basically it just does not know when to stop, it just keeps running, so additional software is used to manage it. Like ksmtuned.
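For reference, KSM is driven through a handful of sysfs knobs (the paths come from the kernel's KSM documentation; writing them needs root, and the numbers will obviously differ per machine):

```shell
# Start the ksmd scanner (0 = stop, 1 = run, 2 = stop and unmerge all pages)
echo 1 > /sys/kernel/mm/ksm/run

# Scanning aggressiveness: pages checked per wake-up, sleep between wake-ups
cat /sys/kernel/mm/ksm/pages_to_scan
cat /sys/kernel/mm/ksm/sleep_millisecs

# What it has achieved so far: merged pages, and how many pages point at them
cat /sys/kernel/mm/ksm/pages_shared
cat /sys/kernel/mm/ksm/pages_sharing
```

Tools like ksmtuned do little more than adjust pages_to_scan and sleep_millisecs based on free memory, which is exactly the "knowing when to stop" that KSM itself lacks.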

Cache


While CPUs became faster and faster until the second half of the 2000s, memory speed did not really keep up, so CPUs started to use ever-growing caches. The cache is in the CPU and it is very quick, but its size is limited: even Xeon CPUs have around 10 MB of cache, and typical desktop CPUs have 1-2 MB.

The hypothesis

Since the cache is small, switching to another VM in a virtualized environment should cause a small performance loss: the cached pages of the kernel in VM1 need to be replaced with the actually identical pages of VM2.
The second part of the idea is that KSM could help here by eliminating that performance loss. When pages are shared between the operating systems of the VMs, a cache miss is less likely after another VM gets the CPU.
Therefore, once pages are merged with KSM and ksmd is turned off, switching between different VMs should be less expensive and response times should improve.

KSM could not only be a memory-saver, but also a CPU-saver.

Test

To test the idea, I prepared 12 web server VMs and one load-balancer, all running the Fedora 20 operating system. The web servers run Apache httpd, the load-balancer runs HAProxy, with more or less default settings. Each VM has 256 MB RAM and a single CPU.

The test host is an Intel NUC D34010WYK with Core i3 CPU (important factors for the test: hyperthreading is enabled, cache size is 3MB) and 2x 8 GB DDR3-1600 RAM.

Nice little box, they could have called it Intel NUKE :)

To generate load, I use the simple Apache benchmark (ab) command-line utility from my laptop. The laptop itself is not really relevant; it is a wreck, perfect motivation to speed-optimize software.
Load command:
ab -n 100000 -c 8 http://192.168.1.104/icons/poweredby.png


(This is the small "Powered by Fedora" banner)

Results

[chart: requests per second vs. number of VMs, with KSM and without (no-ksm, blue)]
Comments, conclusions

The results with VM counts > 4 seem to support the theory, but I was surprised to see the performance loss when the number of VMs was less than 4. I do not have an explanation for that yet.
I suspect that the fall of the no-ksm (blue) curve shows the increase of cache misses; it flattens out after 10 VMs, since by then cache misses are so frequent that they cannot get much more frequent.

Of course, different CPUs and memory will give different values, and different OSes and programs will too; the intersection may be somewhere else, but the shape of the curves should be similar.

TODO

I think it would be interesting to repeat the test with hyperthreading turned off and see how the curve changes.