There are many reports of the core reload/restart taking ages. This mostly happens when you have IDOUtils and a database backend enabled for Icinga Web and/or Reporting. You may ask “How about dropping the database and using something else?”. Well, that’s not really the point. It won’t solve the problem for everyone out there, and Icinga 2 is not yet production-ready as a drop-in replacement.

So, what’s the problem? The core doesn’t know about config diffs, i.e. newly added or deleted objects. When idomod detects a core reload or (re)start, it dumps all the config information to the ido socket. ido2db reads from there and issues the database inserts/updates for the configuration objects. This amount of data may get huge in large setups and takes a while to process.

The configuration dump needs to be finished before any other updates (status, check history) for data integrity reasons (check #1934 for some deeper thoughts). Rewriting the core for config diffs was an idea, but would cost too many resources right now (the configuration format and parsing are among the major reasons for developing Icinga 2 from scratch).

During Icinga 2 development, we discussed an idomod connector (Compat IDO) and reusing ido2db from Icinga 1.x. That prototyping exposed these bottlenecks even more, as Icinga 2 is designed for large-scale systems and may generate 100k service checks in a 5-minute interval. ido2db did not have much fun with that.

We’ve decided to drop that idea (Icinga 2 will add its own ido compatible layer), but the prototyping added 2 nice enhancements for Icinga IDOUtils 1.9:

  • a socket queue (which does not use a kernel message queue, but a thread to proxy the socket data) #3533
  • transactions around large objects (e.g. a service with groups, contacts, dependencies, etc. wrapped as a single transaction) #3527
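
To illustrate the socket queue idea, here is a minimal Python sketch of a thread that proxies socket data into an in-memory queue, so the sender is not blocked by slow processing. This is not the ido2db implementation (which is written in C); the function names, the line-based framing and the `socketpair` stand-in for the ido socket are all assumptions made for the demo.

```python
import queue
import socket
import threading


def proxy_socket_to_queue(sock, buf):
    """Reader thread: move data off the socket as fast as possible."""
    pending = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        pending += chunk
        # Hand over complete lines only; keep a partial tail for the next read.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            buf.put(line.decode())
    if pending:
        buf.put(pending.decode())
    buf.put(None)  # sentinel: no more data will arrive


def drain_queue(buf, handle):
    """Worker: process queued items at the (slower) consumer pace."""
    while True:
        item = buf.get()
        if item is None:
            break
        handle(item)


def run_demo(n_items=1000):
    # Hypothetical stand-in for the connection between idomod and ido2db.
    producer, consumer = socket.socketpair()
    buf = queue.Queue()  # unbounded: trades RAM for a non-blocking sender
    processed = []
    reader = threading.Thread(target=proxy_socket_to_queue, args=(consumer, buf))
    reader.start()
    for i in range(n_items):  # stand-in for the config dump
        producer.sendall(f"object {i}\n".encode())
    producer.close()
    drain_queue(buf, processed.append)  # stand-in for the database writes
    reader.join()
    consumer.close()
    return processed
```

The unbounded queue is the design choice that matters here: the sender finishes quickly, while the cost moves into the memory needed to hold whatever has not been processed yet.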

Check module/idoutils/config/updates/ido2db.cfg_added_1.8_to_1.9.cfg in Icinga 1.9 for details. These options were initially disabled by default (and tagged experimental) so as not to harm existing installations while still allowing everyone else to test and use them; as of the update at the end of this post, the features are enabled by default :-)

Known caveats:

  • ido2db requires more CPU and RAM in order to cache and process data (socket queue only)
  • your database must allow transactions for the database user (transactions only)
  • the insert/update performance still depends on your database, so database tuning is still required
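
As a rough illustration of what “a large object wrapped as a single transaction” means, here is a small Python sketch. It uses SQLite as a stand-in for your real database, and the table and column names are made up for the example; they are not the IDOUtils schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical demo tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE service_groups (service_id INTEGER, group_name TEXT NOT NULL);
        CREATE TABLE service_contacts (service_id INTEGER, contact_name TEXT NOT NULL);
        """
    )
    return conn


def store_service(conn, name, groups, contacts):
    """Write a service and all of its related rows as one transaction."""
    # "with conn" commits if the block succeeds and rolls back on any error,
    # so a half-written service never becomes visible.
    with conn:
        cur = conn.execute("INSERT INTO services (name) VALUES (?)", (name,))
        service_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO service_groups (service_id, group_name) VALUES (?, ?)",
            [(service_id, g) for g in groups],
        )
        conn.executemany(
            "INSERT INTO service_contacts (service_id, contact_name) VALUES (?, ?)",
            [(service_id, c) for c in contacts],
        )
    return service_id
```

Besides the integrity benefit, grouping the statements means one commit per object instead of one per row, which is typically where the speed-up comes from. The database user needs permission to use transactions for this to work.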

Below is a small comparison using a test config with 4k services on a Debian 6.0.7 VM with 4 cores, 2GB RAM and MySQL 5.1.66 without tuning. Icinga writes “Event loop started...” to the log; there’s also a dedicated service check in the sample configuration.

Core Startup with pre 1.9, no options enabled (short log):

Apr 15 18:01:32 sol icinga: Icinga 1.9.0 starting... (PID=4699)
Apr 15 18:01:32 sol icinga: Event broker module '/usr/lib/idomod.so' initialized successfully.
Apr 15 18:01:32 sol ido2db: Client connected, data available.
Apr 15 18:04:22 sol icinga: Event loop started...

Core Startup with pre 1.9 and both options enabled (short log):

Apr 15 18:07:35 sol icinga: Icinga 1.9.0 starting... (PID=5336)
Apr 15 18:07:35 sol icinga: Event broker module 'IDOMOD' version '1.9.0' from '/usr/lib/idomod.so' initialized successfully.
Apr 15 18:07:35 sol ido2db: Client connected, data available.
Apr 15 18:07:38 sol icinga: Event loop started...

Apr 15 18:07:52 sol ido2db: IDO2DB buffer sizes: left=5946260, right=0

Apr 15 18:10:04 sol ido2db: IDO2DB buffer sizes: left=10586, right=0

Tip: The buffer size output is logged every ~15 seconds if there’s data waiting. It reads from left (queued socket input) to right (output towards the db). If there are no more log entries, the queue is idle and data is passing straight through.
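
If you want to pull these numbers out of your own syslog, a small Python sketch like the following could help. The regular expressions and function names are our own assumptions based on the log lines shown in this post, and the year has to be supplied because syslog timestamps don’t carry one.

```python
import re
from datetime import datetime

# Matches e.g. "Apr 15 18:07:52 sol ido2db: IDO2DB buffer sizes: ..."
LOG_LINE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ (\w+): (.*)$")
BUFFER = re.compile(r"IDO2DB buffer sizes: left=(\d+), right=(\d+)")

BEFORE = [
    "Apr 15 18:01:32 sol icinga: Icinga 1.9.0 starting... (PID=4699)",
    "Apr 15 18:04:22 sol icinga: Event loop started...",
]
AFTER = [
    "Apr 15 18:07:35 sol icinga: Icinga 1.9.0 starting... (PID=5336)",
    "Apr 15 18:07:38 sol icinga: Event loop started...",
    "Apr 15 18:07:52 sol ido2db: IDO2DB buffer sizes: left=5946260, right=0",
    "Apr 15 18:10:04 sol ido2db: IDO2DB buffer sizes: left=10586, right=0",
]


def parse(lines, year=2013):
    """Yield (timestamp, program, message) for each recognised log line."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            yield stamp, m.group(2), m.group(3)


def startup_seconds(lines):
    """Seconds between 'starting...' and 'Event loop started...'."""
    start = loop = None
    for stamp, prog, msg in parse(lines):
        if prog == "icinga" and "starting..." in msg:
            start = stamp
        elif prog == "icinga" and msg.startswith("Event loop started"):
            loop = stamp
    return (loop - start).total_seconds()


def buffer_samples(lines):
    """Return (timestamp, left, right) for every buffer size log entry."""
    samples = []
    for stamp, _prog, msg in parse(lines):
        m = BUFFER.search(msg)
        if m:
            samples.append((stamp, int(m.group(1)), int(m.group(2))))
    return samples


if __name__ == "__main__":
    print("startup before:", startup_seconds(BEFORE), "s")
    print("startup after: ", startup_seconds(AFTER), "s")
    for stamp, left, right in buffer_samples(AFTER):
        print(stamp, "left =", left, "right =", right)
```

For the logs in this post, that works out to 170 seconds until the event loop starts without the options and 3 seconds with them, while the queue on the left side shrinks from 5946260 to 10586 over the following 132 seconds.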

Memory and CPU consumption is pretty moderate in exchange for having the core check hosts/services directly after the event loop has started :-)

[Image: icinga_1.9_ido2db_socket_queue]

Please test those options in your setup (git next snapshot, or wait until 1.9 on 25.4.2013), and provide feedback via our community support channels! Thanks in advance for helping to make Icinga better :-)

Update 4.5.2013: The core release team decided to mark another milestone with 1.9 and make those enhancements the default without any configuration. They’ve been running for months now on our test platforms and we do not want to miss them. The latest Git release branch reflects those changes.