Do Nagios NCPA memory stats match the output of the Linux utility free?

Many admins like sanity checks when investigating new tools. We sometimes hear the objection that NCPA memory stats don’t match the output of the Linux utility free, a statement that happens (wonderfully) to be both true and not true at the same time. The discrepancies mostly come down to reporting units, conversions, and how every value other than total memory is calculated.

 

Methodology

On a monitored Linux host with the NCPA agent installed, we’ll run free. Then, from our XI box, we’ll run the NCPA memory check both as a regular check and as a manual check from the command line. Finally, we’ll compare the results.

 

Total Memory

Let’s start with the one place where NCPA memory metrics do match the output of free: total memory. Running free on a test box, we get:

[root@localhost memory]# free
             total       used       free     shared    buffers     cached
Mem:       8193024    6984832    1208192          0     202888     974112
-/+ buffers/cache:    5807832    2385192
Swap:       262136          0     262136

Note that free reports memory in kibibytes (1 KiB = 1024 bytes) by default. On some distributions man free still says the figures are in kilobytes, but that wording comes from an older version of the man page.

NCPA by default returns memory stats in gibibytes, so in the XI interface after running the NCPA Wizard against the host, we are going to see a result like this:

 

So, how do we compare total memory between free and NCPA output in the XI interface?

The simplest thing to do is search for a unit converter in your browser, BUT be sure to specify kibibytes and gibibytes as the units, not kilobytes and gigabytes. I only point this out because I used incorrect units at least twice.
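If you would rather not trust a web converter, the arithmetic is short enough to script. The sketch below, in Python, converts the total that free reported in our example from KiB to GiB. The function name kib_to_gib is made up for illustration. It also shows the kind of wrong answer you get when the same figure is treated as decimal kilobytes and converted to gigabytes.

```python
KIB_PER_GIB = 1024 ** 2  # 1 GiB = 1024 MiB = 1,048,576 KiB


def kib_to_gib(kib):
    """Convert kibibytes to gibibytes (binary units)."""
    return kib / KIB_PER_GIB


total_kib = 8193024  # "total" from the free output above

print(f"{kib_to_gib(total_kib):.2f} GiB")

# Common mistake: treating the figure as decimal kilobytes (1 kB = 1000 bytes)
# and converting to decimal gigabytes (1 GB = 1000**3 bytes).
wrong_gb = total_kib * 1000 / 1000 ** 3
print(f"{wrong_gb:.2f} GB (wrong units)")
```

With the sample value this prints 7.81 GiB for the correct conversion and 8.19 GB for the mistaken one, a gap large enough to look like a real discrepancy between the two tools.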

 

Alternatively, you can run check_ncpa.py from the command line and specify output in kibibytes like this:

[root@centos7x64 ~]# /usr/local/nagios/libexec/check_ncpa.py -H 192.168.3.33 -t 'a' -P 5693 -M memory/virtual -u Ki -w 80 -c 90

where the -u flag specifies the units, here Ki for kibibytes.

We get:

OK: Used memory was 67.40 % (Available: 2670920.00 KiB, Total:
8193024.00 KiB, Free: 1202984.00 KiB, Used: 5148448.00 KiB) |
'available'=2670920.00KiB;80;90; 'total'=8193024.00KiB;80;90;
'free'=1202984.00KiB;80;90; 'used'=5148448.00KiB;80;90;
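To compare these numbers with free in a script rather than by eye, the performance data after the | can be split into a dictionary. This is a minimal sketch that assumes the perfdata looks exactly like the sample above ('label'=value, unit, then thresholds). parse_perfdata is a hypothetical helper written for this example, not part of check_ncpa.py.

```python
import re

# Perfdata copied from the sample check output above (the part after "|").
PERFDATA = (
    "'available'=2670920.00KiB;80;90; 'total'=8193024.00KiB;80;90; "
    "'free'=1202984.00KiB;80;90; 'used'=5148448.00KiB;80;90;"
)

# Matches 'label'=<number><unit>; the warn/crit thresholds that follow are ignored.
PATTERN = re.compile(r"'(?P<label>[^']+)'=(?P<value>[\d.]+)(?P<unit>[A-Za-z%]*)")


def parse_perfdata(text):
    """Return {label: (value, unit)} for each metric in a perfdata string."""
    return {
        m.group("label"): (float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(text)
    }


metrics = parse_perfdata(PERFDATA)
free_total_kib = 8193024  # "total" from the free output

# Total memory is the one figure expected to match free exactly.
print(metrics["total"][0] == free_total_kib)
```

For the sample data this prints True: the total reported by NCPA and by free is the same number once both are in KiB.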

But what about the free/used/available metrics?

I will concede that on these metrics free and NCPA do not entirely agree, but the reasons are simple. The “free” values differ only slightly (1208192 KiB from free versus 1202984 KiB from NCPA), and the difference is at least partially attributable to the small memory load of NCPA running the check itself.

The other values differ because free and NCPA calculate memory metrics differently. Why? That’s an interesting rabbit hole to go down, but suffice it to say that NCPA uses the Python library psutil, which defines available memory as “the memory that can be given instantly to processes without the system going into swap.”

That’s a handy metric. NCPA calculates the percentage of memory used as ((total - available) / total) * 100, which gives admins a solid idea of how much memory pressure the host is under.
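As a quick check, the figures from the command-line output above reproduce the reported percentage when the difference between total and available is divided by total and multiplied by 100. This is plain arithmetic on the sample values, not NCPA’s own code.

```python
total_kib = 8193024.0      # Total from the check output above
available_kib = 2670920.0  # Available from the check output above

# Share of total memory that is not available to new processes.
percent_used = (total_kib - available_kib) / total_kib * 100
print(f"Used memory was {percent_used:.2f} %")
```

The result, 67.40 %, matches the first line of the check output.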

The very clever will notice that, for NCPA, none of (used + free), (used + available), or (used + free + available) adds up to total memory in our example. Again, the psutil documentation is helpful here. Basically, these values are not meant to sum to the total.
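The point is easy to confirm with the sample values. The sketch below only adds up the numbers from the check output shown earlier; it does not query a live host.

```python
# Values in KiB, copied from the sample check output.
total = 8193024.0
used = 5148448.0
free = 1202984.0
available = 2670920.0

combinations = {
    "used + free": used + free,
    "used + available": used + available,
    "used + free + available": used + free + available,
}

# Report each sum and whether it happens to equal total memory.
for label, value in combinations.items():
    print(f"{label}: {value:.0f} KiB, equals total: {value == total}")
```

None of the three sums equals 8193024 KiB, which is consistent with these being separate measurements rather than parts of one total.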

 

Conclusion

Admins applying sanity checks to their NCPA results may indeed initially question the sanity of NCPA output. With a solid understanding of the units of measure in question, as well as what is actually being measured, admins can see that NCPA memory stats check out.