Three years ago we released v2.0.0. Today we are happy to bring you Icinga 2 v2.7.0. This release focuses on stability and performance fixes and enhancements, with some new additions such as additional metrics and an NSClient++ API check plugin.

 

Performance and Metrics

Lately we’ve been analysing performance bottlenecks in large-scale enterprise environments: think 100k service checks with HA and three-level clusters. One of these environments was somehow leaking memory over time, and we couldn’t really nail it down. It turned out that a slow InfluxDB HTTP API would block the core from processing check results and cluster messages. That honestly hurts, and we found a similar pattern with Graphite and Graylog. We’ve fixed these issues and added asynchronous queues to these features. Our friend Matthias kindly tested the fix in production.

We have also wrapped our heads around easier analysis, which involves exposing more metrics from Icinga 2 internals. This includes:

  • Work queue logging with calculated trends (as already known from DB IDO)
  • Feature stats including work queue size, rates and more via the REST API at /v1/status
  • Metrics pulled by the “icinga” check into your graphing backend to analyse and correlate performance even better
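
To illustrate the last point, here is a minimal sketch of applying the “icinga” check to the local node so that its metrics end up in your graphing backend. The “generic-service” template and the NodeName match are assumptions based on a typical sample configuration; adjust both to your own setup.

```
apply Service "icinga" {
  // Assumed template providing check intervals; replace with your own.
  import "generic-service"

  // Built-in check which reports Icinga 2 internals as performance data.
  check_command = "icinga"

  // Assumption: only run this on the host object representing the local node.
  assign where host.name == NodeName
}
```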

More details can be found in this extended blog post.

You’ll also notice a new field in the GelfWriter feature for better filtering: “check_command”. The InfluxDB feature also introduces a new configuration attribute called “socket_timeout” which allows you to terminate hanging TCP connections. These can occur if InfluxDB is not able to immediately process the flushed metrics buffer.
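
A minimal sketch of how the new attribute might look in the InfluxDB feature configuration. The host, port, database name and the timeout value are placeholder assumptions, not recommendations.

```
object InfluxdbWriter "influxdb" {
  host = "127.0.0.1"      // assumed: InfluxDB running locally
  port = 8086
  database = "icinga2"    // assumed database name

  // Give up on a connection if InfluxDB does not respond in time.
  socket_timeout = 5s     // example value only
}
```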

 

Notifications: Scripts and bugfixes

The example notification scripts for simple mail notifications have been overhauled. New package installations will now ship rewritten notification scripts with CLI parameter support. We’ve seen that the Director did not have support for “env” in our previous scripts. So we approached Marianne and asked her to push her scripts upstream. She did a marvellous job: We now have beautiful notification scripts with additional features such as an Icinga Web 2 URL or setting a “from” address. Make sure to check out the documentation for the updated example configuration.

Update 2017-08-08: There are configuration updates required for using the new notification scripts. Please check out the additional documentation notes for 2.7.0.

We’ve learned that large-scale environments ran into a race condition with downtimes or disabled notifications on config reload (#4969). This led to unwanted notifications from already executed checks before the downtime was actually activated, or before the modified attribute had been applied. We’ve been testing a fix at a customer site over the past months, and it is now officially included in v2.7.0.

There was also a long-standing issue with persistent comments for acknowledgements, which has been resolved by Rune (#4956).

 

NSClient++ API and check_nscp_api plugin

Last year we did a deep dive into NSClient++’s HTTP API. Yet there was no plugin which could query the API locally or remotely. Jean was looking for a project for his final trainee exam, and so we came up with creating a plugin called “check_nscp_api”. It allows you to query the API endpoints and retrieve runtime metrics from NSClient++, e.g. CPU utilization or the Windows event log.

I’ve taken the long road and contributed a documentation patch for all details on the new NSClient++ API. This will help you with finding the right queries, and of course there’s a blog post coming up which covers all the possibilities. Meanwhile check the documentation for check_nscp_api and try it yourself.
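
As a rough sketch, a service using the plugin could look like the following. It assumes the ITL ships a CheckCommand named “nscp_api” with the custom attributes shown here; the host name, address, password and query are hypothetical placeholders, so please verify the attribute names against the documentation before using them.

```
object Service "nscp-api-cpu" {
  host_name = "windows-host"           // hypothetical host object
  check_command = "nscp_api"           // assumed ITL CheckCommand name

  // Assumed custom attributes; all values are placeholders.
  vars.nscp_api_host = "192.0.2.10"
  vars.nscp_api_password = "secret"
  vars.nscp_api_query = "check_cpu"
}
```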

 

Configuration enhancements

Icinga 2’s configuration language borrows a lot from common programming languages. If you’ve implemented functions in Icinga 2 already, continue reading. This time we’ve come across the problem that some built-in functions may throw exceptions. An exception immediately stops the function’s execution and triggers a runtime error, e.g. during check execution. You can already use the “throw” keyword to do it yourself in user functions. So why not provide “try-catch” to actually deal with them? #5348 solves this.

function get_pdv_by_name(checkable, pdname) {
	var cr = checkable.last_check_result

	if (!cr) {
		return null
	}

	for (pdv in cr.performance_data) {
		if (typeof(pdv) != PerfdataValue) {
			try {
				pdv = parse_performance_data(pdv)
			} except {
				continue
			}
		}

		if (pdv.label == pdname) {
			return pdv
		}
	}
	return null
}

We’ve also enhanced the match() functionality in order to match all or any elements in an array (#5263). Dictionary#values is also available as a method with 2.7. Please check the documentation for further examples.
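
A short sketch of both additions, which you could try in the “icinga2 console”. The variable names and values are made up for illustration.

```
var groups = [ "linux-servers", "web-servers" ]

// MatchAny: true as soon as one array element matches the wildcard pattern.
match("linux-*", groups, MatchAny)   // true, "linux-servers" matches

// MatchAll: every element has to match.
match("linux-*", groups, MatchAll)   // false, "web-servers" does not match

// Dictionary#values returns the dictionary's values as an array.
var thresholds = { warn = 80, crit = 90 }
thresholds.values()
```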

 

Documentation, Logging and more

We’ve changed the Getting Started guide to use the REST API as Icinga Web 2 command transport instead of the legacy command pipe. You’ll also notice that there is a new chapter “Monitoring Icinga 2” with additional hints to keep your monitoring core safe. The troubleshooting docs have been enhanced too.

We’ve also tackled several log messages to make it easier to troubleshoot why cluster config sync isn’t working, or to see which features have been started. Several log messages also got clearer wording.

There are also a couple of patches for ITL CheckCommands and improved documentation. Thanks a lot for sending in so many patches!

We’ve added a GitHub issue template which provides help and hints for all the required details. This helps us developers understand your problems much more easily. If you are looking into contributing a patch, be it code, documentation or ITL updates, we’ve made sure to guide you inside CONTRIBUTING.md. This also includes a mini Git tutorial to get things going more quickly.

 

Packages and Download

We’ve recently updated our build system and begun to work on packages for Debian Stretch. Icinga 2 v2.7.0 is now available on packages.icinga.com. Please note that RHEL and CentOS 5 are EOL and therefore we’ve dropped package support for them in 2.7. Make sure to upgrade your clients to a current stable release. Note: Old 2.6 RPMs still exist on the mirror, but remain unsupported.

If you are experiencing issues with the recent RHEL 7 kernel security update, please make sure to raise the stack size limit for the Icinga 2 process. Red Hat has released new kernel packages which solve the issue.

You can grab Icinga 2 v2.7.0 from packages.icinga.com or your favourite distribution’s repository. Before doing so, please note the changes in this release :)

 

Thanks

Thanks to our many community contributors:

Marianne, Michael, Petr, Winfried, Jean-Louis, gitmopp, Simon, Marcus, Roland, Carsten, Florian, Andy, Yannick, Hannes, Lee, Niflou, moix, Pawel, Andreas, boltronics, saikrishnagaddipati, Thomas, Zachary, Sebastian, Marius, Stephan, Christian, Edgar, Tim, Kálmán, Roman, Mathieu, Georg, Christian, Benedikt, Georg, Patrick :-)

You are doing an awesome job, keep going!

 

More Changes

We are cleaning things up a little with this release, so keep the following changes in mind before upgrading your production environment :)

The Changelog is located here.

We’ve removed deprecated features such as the IDO categories notation. Please also note that the “icinga2-classicui-config” package and config files have been deprecated. We’ll remove them in the future 2.8 release. This package just provides the required configuration files; you can still manually configure Classic UI with Icinga 2. “node update-config” is still deprecated and will be removed soon. The next major release will remove the legacy mode setting for Graphite introduced in v2.4.