Monitoring filesystem growth with Zabbix

Introduction

Like many people I use Zabbix for monitoring. I love the web GUI to configure stuff and the API to automate its configuration when I need to.

And although Zabbix comes packed with a lot of usable templates, they are more a starting point for your own infrastructure than a 100% ready solution.

Recently I needed to start monitoring filesystem usage growth so that I would be warned in time when a system was nearing its limits. It turns out that Zabbix has had a timeleft function for exactly this purpose for a really long time. But how, and where, do you use it?

Well, usually there is already a template available that collects filesystem usage numbers. It keeps track of used space and used inodes. Within the template is an LLD (Low-Level Discovery) rule. The result of that rule is a list of discovered entities, in this case filesystems. Together with ‘prototype’ items and triggers, it can automatically add items and triggers to your host.
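To make the idea of prototypes a bit more concrete, here is a small Python sketch of what the expansion conceptually does. It is only an illustration and not how Zabbix implements it; the function name expand_prototype and the example filesystem list are my own assumptions.

```python
from typing import Dict, List


def expand_prototype(prototype: str, discovered: List[Dict[str, str]]) -> List[str]:
    """Fill in LLD macros such as {#FSNAME} once per discovered entity."""
    expanded = []
    for entity in discovered:
        text = prototype
        for macro, value in entity.items():
            text = text.replace(macro, value)
        expanded.append(text)
    return expanded


# Hypothetical discovery result: one dictionary of macros per filesystem.
filesystems = [{"{#FSNAME}": "/"}, {"{#FSNAME}": "/var"}]

for key in expand_prototype("vfs.fs.size[{#FSNAME},pused]", filesystems):
    print(key)
```

Each discovered filesystem yields its own item key, which is why a prototype without the macro would produce identical items.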

My setup

If you search the available templates for ‘Linux filesystems’ you will easily find it. One is called ‘Linux filesystems by Zabbix agent’ and the other one is ‘Linux filesystems by Zabbix agent active’ (for when you use active instead of passive checks). In the ‘Discovery’ column you can see that it has (in my case) one LLD rule.

If you click on ‘Discovery’ you will see the list of LLD rules (in my case a single rule) with the following info:

List of discovery rules

Usually you will see four item prototypes and four trigger prototypes, but my list shows five of each. Let’s start with the list of item prototypes:

List of item prototypes

As you can probably guess, the first one is the subject of this blog post. Let’s have a close look at it:

My item definition

Item definition to get timeleft information

If we analyse this item we see the following settings:

  • Name: Since this item prototype is expanded once per discovered filesystem (it is part of an LLD rule!), it is important to include the {#FSNAME} macro in the name. This lets you tell later which filesystem the item refers to. Also, without it you would try to create multiple items with the same name and Zabbix would raise an error.
  • Type: We are going to perform a calculation, so the item type is ‘Calculated’.
  • Key: A calculated item still needs its own unique key. I use vfs.fs.size.timeleft[{#FSNAME},pused]. The name is my own choice, modelled on vfs.fs.size[{#FSNAME},pused], which is one of the items already gathered in this LLD and which serves as the input of the formula below.
  • Type of information: Since we are performing a calculation the result will be a number. That is why we select ‘Numeric (float)’ here.
  • Formula: This is what it’s all about: ((((timeleft(//vfs.fs.size[{#FSNAME},pused],7d,95)/60)/60)/24)/30). This means: calculate how much time is left before this filesystem becomes 95% full, based on the last seven days of data. Since the result is in seconds, we divide by 60, 60, 24 and 30 to get a number of (30-day) months.
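To get a feeling for what the formula computes, here is a rough Python sketch under the assumption of a simple straight-line (least-squares) trend through the samples. It is not Zabbix's own code; the function names and the sample data are made up for illustration, and returning None for a flat or falling trend is my own choice.

```python
from typing import Optional, Sequence

# Same divisions as in the formula: /60 /60 /24 /30
SECONDS_PER_MONTH = 60 * 60 * 24 * 30


def seconds_until_threshold(times: Sequence[float], values: Sequence[float],
                            threshold: float) -> Optional[float]:
    """Fit a line through (time, value) and extrapolate to the threshold.

    Returns the number of seconds after the last sample at which the line
    reaches the threshold, or None if usage is not growing.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two samples of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all samples have the same timestamp")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / var_t
    if slope <= 0:
        return None  # flat or shrinking usage never reaches the threshold
    intercept = mean_v - slope * mean_t
    hit_time = (threshold - intercept) / slope
    return max(0.0, hit_time - times[-1])


def seconds_to_months(seconds: float) -> float:
    """Convert seconds to 30-day months."""
    return seconds / SECONDS_PER_MONTH


# Hypothetical data: one sample per day, usage grows from 50% by 0.5% per day.
day = 24 * 60 * 60
times = [d * day for d in range(8)]
values = [50 + 0.5 * d for d in range(8)]

left = seconds_until_threshold(times, values, 95)
print(round(seconds_to_months(left), 2))  # 83 days left, so about 2.77 months
```

With these numbers the last sample is 53.5%, so 41.5 percentage points remain at 0.5 per day, which gives 83 days or roughly 2.77 months.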

My trigger definition

Nice, but does this give us an alert when it becomes time to have a look at it? No, it doesn’t. For that we have to define a trigger. So we create a trigger prototype in the trigger section of the LLD. Mine looks like this:

My trigger prototype definition

Again, let’s take a closer look at the individual settings:

  • Name: As with the item name, we need to include the {#FSNAME} macro in our descriptive text.
  • Severity: For me, I set the severity to high because an alert like this definitely deserves attention!
  • Expression: This is the expression for when to trigger an alert: last(/Linux filesystems by Zabbix agent/vfs.fs.size.timeleft[{#FSNAME},pused])<3. This means: as soon as the last measured value of the item with key vfs.fs.size.timeleft[{#FSNAME},pused] on this host drops below three (months), I get an alert.
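Since I mentioned the API in the introduction: the same two prototypes could probably also be created through the Zabbix JSON-RPC API. The sketch below only builds the request bodies and does not send anything. The IDs, the prototype names and the one-hour update interval are hypothetical placeholders, authentication is left out, and the field names reflect my understanding of itemprototype.create and triggerprototype.create, so check them against the API documentation for your Zabbix version.

```python
import json

TEMPLATE = "Linux filesystems by Zabbix agent"
ITEM_KEY = "vfs.fs.size.timeleft[{#FSNAME},pused]"
FORMULA = "((((timeleft(//vfs.fs.size[{#FSNAME},pused],7d,95)/60)/60)/24)/30)"


def item_prototype_request(hostid: str, ruleid: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC body for a calculated item prototype."""
    return {
        "jsonrpc": "2.0",
        "method": "itemprototype.create",
        "params": {
            "name": "{#FSNAME}: Months until 95% full",  # hypothetical name
            "key_": ITEM_KEY,
            "hostid": hostid,   # ID of the template (placeholder)
            "ruleid": ruleid,   # ID of the LLD rule (placeholder)
            "type": 15,         # calculated item
            "value_type": 0,    # numeric (float)
            "params": FORMULA,  # formula of the calculated item
            "delay": "1h",      # assumed update interval
        },
        "id": request_id,
    }


def trigger_prototype_request(request_id: int = 2) -> dict:
    """Build a JSON-RPC body for the matching trigger prototype."""
    return {
        "jsonrpc": "2.0",
        "method": "triggerprototype.create",
        "params": {
            "description": "{#FSNAME}: less than 3 months until 95% full",  # hypothetical
            "expression": f"last(/{TEMPLATE}/{ITEM_KEY})<3",
            "priority": 4,  # high
        },
        "id": request_id,
    }


print(json.dumps(item_prototype_request("10001", "20001"), indent=2))
print(json.dumps(trigger_prototype_request(), indent=2))
```

Keeping the key and the formula in constants means the trigger expression always refers to exactly the item key that the item prototype creates.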

After configuring all of this correctly, I head over to the ‘Latest data’ section of Zabbix to see how I’m doing:

Listing with latest timeleft data

Phew! As you can see I am in the clear for now, but one system will need attention for its disk usage growth in about a year and a half (21 - 3 = 18 months).

If you found this helpful, please reward my work of researching and writing this. Please go to GitHub or Patreon and show your appreciation.