When it comes to monitoring, we like to send out a notification as soon as a problem occurs. What’s better than a system that reports a problem the very moment it appears, right? However, alerting alone is not always enough. To identify a problem’s root cause, you often need more information about a service than its availability state. To meet this need, most of the monitoring plugins used with Icinga 2 can return performance data: in addition to the human-readable status, the plugin output contains a string with one or more metrics, for example the time a host or service check took to execute, the number of bytes transferred, or the free disk space. With the new InfluxDBWriter feature of Icinga 2.5, you can store this data in an InfluxDB database automatically and without a detour. To do so, the corresponding Icinga 2 object talks directly to the native InfluxDB HTTP API.

There are several different performance data metrics that you can collect with a check plugin. For example, let’s have a look at the check_http plugin:

root@localhost:~# ./check_http -H exchange.icinga.com -w 10 -c 20 -f follow
HTTP OK: HTTP/1.1 200 OK - 85997 bytes in 0.293 second response time |time=0.293414s;10.000000;20.000000;0.000000 size=85997B;;;0

First of all, the output confirms that the website is available, therefore the HTTP service is OK. The response time for the HTTP request was 0.293414 seconds, and during the check about 86 kilobytes were transferred. In the performance data (separated by the pipe from the normal, human-readable output) you can also see the defined warning and critical thresholds (-w 10 and -c 20). The performance data format supports not only keys with values and thresholds, but also minimum and maximum values. Icinga Web 2, for example, displays metrics with maximums as pie charts.
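To make the format more tangible, here is a minimal Python sketch that splits plugin output like the one above into its individual metrics. The `Metric` class and the `parse_perfdata` function are hypothetical names made up for this illustration; they are not part of Icinga 2 or the monitoring plugins, and the parser only covers the common `label=value[unit];warn;crit;min;max` form.

```python
import re
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Metric:
    """One performance data entry (hypothetical helper type)."""
    label: str
    value: float
    unit: str
    warn: Optional[str]  # thresholds stay strings, they may be ranges like "10:20"
    crit: Optional[str]
    min: Optional[str]
    max: Optional[str]


# A number followed by an optional unit such as "s", "B" or "%".
_VALUE_RE = re.compile(r"^(-?[0-9]*\.?[0-9]+)([A-Za-z%]*)$")


def parse_perfdata(plugin_output: str) -> List[Metric]:
    """Return the metrics found after the pipe in a plugin's output."""
    _, _, perfdata = plugin_output.partition("|")
    metrics = []
    # shlex.split also copes with quoted labels such as 'disk /var'=42MB
    for token in shlex.split(perfdata):
        label, _, rest = token.rpartition("=")
        fields = rest.split(";")
        fields += [""] * (5 - len(fields))  # pad missing warn/crit/min/max
        match = _VALUE_RE.match(fields[0])
        if not label or match is None:
            continue  # skip anything that does not look like a metric
        warn, crit, minimum, maximum = (f or None for f in fields[1:5])
        metrics.append(Metric(label, float(match.group(1)), match.group(2),
                              warn, crit, minimum, maximum))
    return metrics


output = ("HTTP OK: HTTP/1.1 200 OK - 85997 bytes in 0.293 second response time "
          "|time=0.293414s;10.000000;20.000000;0.000000 size=85997B;;;0")
for metric in parse_perfdata(output):
    print(metric)
```

For the check_http output above, this yields two metrics, time and size; the empty threshold fields of size come back as None.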

The main goal of collecting metrics is to store them for long term usage and to create graphs to debug problems or identify trends. This is where InfluxDB comes into play. InfluxDB is a database written in the Go programming language. Its purpose is to store time series data. It has no pre-defined schema design and supports thousands of I/O write operations per second. You can query the stored data with an SQL-like query language, either on the command line, in a web interface, or directly via an HTTP API.
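As a rough illustration of the HTTP API route, the following Python sketch only builds the URL for a request against InfluxDB’s /query endpoint and does not send anything. The database name icinga2, the measurement http, and the field name value are assumptions chosen for this example, and `build_query_url` is a hypothetical helper, not part of any client library.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, database: str, query: str) -> str:
    """Build a GET URL for InfluxDB's /query endpoint (hypothetical helper)."""
    params = urlencode({"db": database, "q": query})
    return f"{base_url.rstrip('/')}/query?{params}"


# Assumed names: database "icinga2", measurement "http", field "value".
url = build_query_url(
    "http://localhost:8086",
    "icinga2",
    'SELECT mean("value") FROM "http" WHERE time > now() - 1h',
)
print(url)
```

You could paste the resulting URL into curl or a browser. If authentication is enabled on your InfluxDB instance, the request needs credentials as well.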

We’re happy and proud to announce the brand-new InfluxDBWriter feature of Icinga 2.5. It sends the performance data directly to an InfluxDB database. Here is how to enable and use it:

Getting started

Let’s assume you have already set up InfluxDB and created a database, a user, and a password for Icinga 2. (If not, have a look at the documentation.) For your convenience, here’s how I created my database with the influx tool on the command line:

root@localhost:~# influx
Connected to http://localhost:8086 version 0.13.0
InfluxDB shell version: 0.13.0
> create database icinga2;
> create user icinga2 with password 'supersecret';
> grant all on icinga2 to icinga2;

To enable the feature in Icinga 2 itself, type the following command:

icinga2 feature enable influxdb

After enabling the feature, you may want to adjust some settings to get it up and running correctly. The InfluxDBWriter object is configured in the file /etc/icinga2/features-enabled/influxdb.conf:

object InfluxdbWriter "influxdb" {
  host = "127.0.0.1"
  port = 8086
  database = "icinga2"
  // Must match the user created in the influx shell
  username = "icinga2"
  password = "supersecret"

  // How host check metrics are grouped and tagged
  host_template = {
    measurement = "$host.check_command$"
    tags = {
      hostname = "$host.name$"
    }
  }

  // How service check metrics are grouped and tagged
  service_template = {
    measurement = "$service.check_command$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
    }
  }
}

This is the basic configuration provided by a default Icinga 2 installation. It forwards all your metrics to the configured database. The included host and service templates define how the metrics are stored: the measurement is the key by which metrics are grouped, and the tags identify the host or service a measurement belongs to. You can also enable SSL-encrypted connections, change flush intervals, or enable sending of thresholds and metadata. Have a look at the documentation for a full list of all InfluxDBWriter settings. Don’t forget to restart Icinga 2 after saving your changes.
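To picture how measurement and tags fit together, here is a small Python sketch that formats a single metric as a line in InfluxDB’s line protocol. It illustrates the data model and is not the code Icinga 2 runs: the `to_line` helper, the extra metric tag, the value field name, and the timestamp are all assumptions made for this example.

```python
from typing import Dict


def escape(text: str) -> str:
    """Escape commas, equals signs and spaces, as line protocol expects in tags."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def to_line(measurement: str, tags: Dict[str, str], value: float, timestamp: int) -> str:
    """Format one point as 'measurement,tag=value,... value=<number> <timestamp>'."""
    tag_str = ",".join(f"{escape(k)}={escape(v)}" for k, v in sorted(tags.items()))
    # Measurement names only need commas and spaces escaped.
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    return f"{name},{tag_str} value={value} {timestamp}"


# Assumed example values: the service template above with a check command
# named "http", plus a hypothetical "metric" tag for the perfdata label.
line = to_line(
    "http",
    {"hostname": "exchange.icinga.com", "service": "http", "metric": "time"},
    0.293414,
    1472000000,  # made-up timestamp in seconds
)
print(line)
# http,hostname=exchange.icinga.com,metric=time,service=http value=0.293414 1472000000
```

With a layout like this, all response times of all HTTP checks end up in one measurement, and the tags let you filter for a specific host or service. Note that InfluxDB reads timestamps as nanoseconds unless the write request specifies a different precision.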

Thresholds

When check plugins provide thresholds, you can store them in your time series database as well. Visualising thresholds helps you understand when a certain service or host failed and recovered. It also makes it possible to detect certain patterns. To store the thresholds in your InfluxDB database, set the following option in the InfluxDBWriter configuration:

enable_send_thresholds = true

Metadata

Icinga 2 can collect metadata for hosts and services, for example, downtimes, states, acknowledged problems, latency, etc. You can use this information to identify slow checks or bottlenecks in your monitoring setup. If you combine the metadata with the threshold settings, a single graph can show you when a service has reached a warning or critical state, if there was a downtime, and if someone has acknowledged the problem. To enable the metadata collection, add this to your configuration:

enable_send_metadata = true

Visualisation

When your data finally arrives, it’s time for visualisation. We find that Grafana is one of the best tools to generate graphs and dashboards. We have already had great experiences with it in combination with our GraphiteWriter feature. Since Grafana also supports InfluxDB as a backend, it fits perfectly into our ecosystem. Queries can be created easily with a few clicks, and the appearance of graphs can be changed in many ways. To make it easier to get started, we created a sample dashboard that you can use. Download the “Icinga2 with InfluxDB” dashboard on Grafana.net. In addition to the up/down state of a host and line graphs, it also includes thresholds and metadata like downtimes, and it can easily be extended to fit your needs. As your Icinga 2 configuration grows and becomes more complex, so will your dashboard.