Central log management has always been a topic for almost every sysadmin. In the past we used a central syslog server to collect all logs from clients and store them in plain text files. Instead of searching the logs on 10 web servers, the sysadmin had to run just a single grep command on one machine. When someone managed to hack into your server, he probably wasn’t fast or clever enough to disable the remote logging. He did what he did, but your logs were safe for later analysis. In an ideal world even network hardware would send its logs to this central syslog instance. Well, in an ideal world the “syslog” format would be the same on each device. In an ideal world developers would use reasonable and standardised timestamp formats. Instead, we have to write regexes for the umpteenth time to parse this stuff. Sigh.

Logstash came up a couple of years ago as a project from the developer Jordan Sissel. In 2013 he was hired by Elastic and since then Logstash has been actively maintained by the company but remains open source. Logstash is a data processing pipeline. It is capable of opening ports to receive logs in various formats or collecting them actively in different ways. For example, logs could come from a remote syslog daemon to a TCP port opened by Logstash. But Logstash can also read files or even events from a Twitter stream. The different methods of gathering logs are called input plugins, and they form the first step of the Logstash pipeline.
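To give a rough idea of what an input configuration looks like, here is a minimal sketch that listens for remote syslog messages and additionally reads a local file. The port number and the file path are just examples, pick whatever fits your setup:

input {
  syslog {
    port => 5514
  }
  file {
    path => "/var/log/httpd/access_log"
  }
}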

Every received event can then be passed to a filter. Filters are basically designed to parse the logs and, if necessary, enrich them with additional information. After the filtering process, a log event is split into separate fields, where each field holds a different part of the log event.

This Apache log line:

192.168.1.10 - guest [04/Dec/2013:08:54:23 +0100] "POST /icinga-web/web/api/json HTTP/1.1" 200 788 "https://demo.icinga.com/icinga-web/modules/web/portal" "Mozilla/5.0 (X11; Linux x86_64; rv:22.0)"

after filtering, could result in this:

"http_clientip" : "192.168.1.10",
"http_ident" : "-",
"http_auth" : "guest",
"timestamp" : "04/Dec/2013:08:54:23 +0100",
"http_verb" : "POST",
"http_request" : "/icinga-web/web/api/json",
"http_httpversion" : "1.1",
"http_response" : "200",
"http_bytes" : "788",
"http_referrer" : "https://demo.icinga.com/icinga-web/...",
"http_agent" : "Mozilla/5.0 (X11; Linux x86_64; rv:22.0)"

There are plenty of filter plugins. You can use them to parse logs, parse timestamps, resolve IPs or domains and many other things. Parsing logs makes it easier for us to search them afterwards. For long-term storage of the events, Elasticsearch is usually used. Elasticsearch is basically a database with a very powerful HTTP API. When everything is split into separate fields, searching the logs becomes very easy. The best way to search them is Kibana, a web interface that runs queries against Elasticsearch. Even with tons of logs, it is fast as hell. Parsed logs, stored in indexes and searchable through a web interface. This is the benefit of having the Elastic Stack instead of the classic syslog-to-syslog setup.
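By the way, you can also talk to Elasticsearch directly if you want to peek at the stored events. Assuming Elasticsearch runs locally on its default port 9200 and Logstash writes into the default logstash-* indices, a simple query for all events with HTTP status 404 could look like this:

curl -s 'http://localhost:9200/logstash-*/_search?q=http_response:404&pretty'

Kibana does essentially the same thing, just with a much nicer interface on top.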

Back to filtering. If a log line reaches Logstash and it needs to be parsed, there are several filter plugins available to do that. One of the most used is the grok filter. Instead of writing your own regexes to parse the logs, grok provides predefined patterns that you can use. Behind the scenes grok uses regexes anyway, but we don’t need to deal with them. There are predefined patterns for almost everything. And if there is something missing, you can use existing patterns and combine them for your own log format.

grok {
  match => { "message" => "%{COMBINEDAPACHELOG}" }
}
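If there is no ready-made pattern for your log format, you can define your own in a separate pattern file by combining existing patterns (how to tell grok about such files is shown further below). The pattern name and the log format here are made up, just to illustrate the idea of an ISO timestamp followed by a severity and a message:

MYAPP_LOG %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity}: %{GREEDYDATA:message}

You would then reference %{MYAPP_LOG} in the match option, just like %{COMBINEDAPACHELOG} above.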

Many users use Logstash to collect and process logs produced by Icinga 2. This is helpful when debugging an issue or just to keep an eye on what the monitoring is doing. Icinga 2 has a pretty simple but powerful log format. It logs messages together with a severity and a facility, which makes it pretty easy to find certain events. Some log events span multiple lines, which improves the readability of configuration errors or other misbehaviours. The startup log captures everything before the daemon is completely up. If you have misconfigured your Icinga 2, you will find very detailed information about what is wrong there. The debug log records all running check commands, MySQL queries and many other things. The amount of events logged is so big that having them searchable is a big benefit. All in all, the logging mechanisms of Icinga 2 are pretty good and verbose enough to get an insight into what the daemon is doing. But, instead of letting everyone build their own grok patterns, we decided to provide official ones.
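To give you an idea of the format, a typical line from icinga2.log looks roughly like this (the message itself is just a sample):

[2017-07-03 15:48:10 +0200] information/ConfigItem: Activated all objects.

Here, information is the severity and ConfigItem is the facility.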

You can find the patterns in our logstash-grok-pattern repository and it is pretty easy to install them:

mkdir /etc/logstash/patterns
cd /etc/logstash/patterns
git clone https://github.com/Icinga/logstash-grok-pattern.git icinga

The grok patterns included in Logstash can be extended with custom patterns. You just have to tell the grok filter where to find them:

grok {
  patterns_dir => ["/etc/logstash/patterns/icinga"]
  ...
}

For example, you can use the file input of Logstash to read the main log. It doesn’t matter what input you use; this is just an example. Here, you have to take care of multiline logs. The multiline codec accepts several parameters to define how multiline events are identified.

input {
  file {
    path => "/var/log/icinga2/icinga2.log"
    type => "icinga.main"
    codec => multiline {
      # every line that does not start with "[" belongs to the previous event
      pattern => "^\["
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

After collection, the logs have to be parsed. This is where our grok patterns come into play. By default the patterns will split logs into three fields: icinga.main.severity, icinga.main.facility and icinga.main.message. For debug and startup logs, the same fields with slightly different names are created.

filter {
  if [type] == "icinga.main" {
    grok {
      patterns_dir   => ["/etc/logstash/patterns/icinga"]
      match          => ["message", "%{ICINGA_MAIN}"]
      remove_field   => ["message"]
      add_tag        => ["filter.grok.icinga.main"]
      tag_on_failure => ["_grokparsefailure", "filter.icinga.main.grok.failure"]
    }

    date {
      match          => ["icinga.main.timestamp", "yyyy-MM-dd HH:mm:ss Z"]
      target         => "@timestamp"
      remove_field   => ["icinga.main.timestamp"]
      tag_on_failure => ["_dateparsefailur", "filter.icinga.debug.date.failure"]
    }
  }
}
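After the grok and date filters have run, an event carries the separate fields instead of the raw message. It could look roughly like this, with made-up values matching the sample log line from above:

"icinga.main.severity" : "information",
"icinga.main.facility" : "ConfigItem",
"icinga.main.message" : "Activated all objects.",
"@timestamp" : "2017-07-03T13:48:10.000Z"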

In addition to grok, I use the date filter to move the timestamp parsed from the log line into the field @timestamp. I do this because Kibana sorts logs by this field by default. Now you only need to send the parsed logs to Elasticsearch:

output {
  elasticsearch { }
}
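Without any further options the elasticsearch output sends the events to an Elasticsearch instance on localhost. If your Elasticsearch runs somewhere else or you want a dedicated index, you can set the corresponding options. The host and the index name here are just examples:

output {
  elasticsearch {
    hosts => ["http://elasticsearch.example.com:9200"]
    index => "icinga-%{+YYYY.MM.dd}"
  }
}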

Once the logs are shipped to Elasticsearch, you can start searching them with Kibana. Here are some sample queries you could use:

type:icinga.startup
type:icinga.main AND icinga.main.severity:warning
type:icinga.debug AND icinga.debug.facility:IdoMysqlConnection
type:icinga.debug AND icinga.debug.facility:IdoMysqlConnection AND NOT icinga.debug.severity:information