Nagios solves alert fatigue while providing in-depth disk, CPU, RAM, and bandwidth information for the data center
MCS has been a reseller of Nagios software since 2011. During that time, they have helped their customers license the full suite of products offered by Nagios. In addition, they offer complete Nagios integration services. The team at MCS has worked with enterprise customers across the US to plan, implement, migrate, and customize Nagios projects of all sizes, and to train staff on them.
In 2006, MCS started a small web and email service to help support their existing IT contract customers. This turned the company into a hosting service provider. As time went on, they added servers and service offerings for their customers. As MCS product offerings became more diverse, keeping tabs on availability became more difficult. They needed a way to automatically monitor their growing data center for outages.
By 2010, MCS service offerings had exploded, and their data center offered a wide range of services on many different platforms. They needed to go beyond basic up/down server monitoring and wanted detailed information on whether their services were performing as intended. Since their clients needed access to their data 24/7, they needed a way to alert their technicians whenever there was a major outage. This meant on-call techs needed to be alerted via their cell phones. They used a SAN for all production servers, and the monitoring software would be running from the SAN as well. In the event of an issue on the SAN, their technicians still needed to be notified somehow.
MCS decided to install a Nagios Core server. This solution offered enterprise functionality, and they liked the open source, community-driven nature of the platform. For the first time, they had automated visibility into the overall service health of their data center. While Nagios Core delivered the end result they wanted, they found that managing its configuration was cumbersome, so MCS switched from their Nagios Core installation to Nagios XI. The migration process was straightforward and easy. Once all of their service checks were migrated, they also created custom checks. MCS also took advantage of the multi-tenancy features of Nagios to create user accounts for key customers, so those customers could log in to the MCS Nagios XI server and see only their own servers and services.
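To illustrate what a custom check can look like, here is a minimal sketch of a disk usage plugin in Python. It is not one of the checks MCS wrote; the function names and the 80%/90% thresholds are assumptions chosen for the example. It follows the standard Nagios plugin convention of reporting state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and printing one status line, with optional performance data after the `|` character.

```python
import shutil
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_pct, warn=80.0, crit=90.0):
    """Map a disk usage percentage to an (exit_code, status_line) pair.

    The warn/crit defaults are illustrative assumptions, not MCS values.
    """
    if used_pct >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after "|" is performance data: label=value[unit];warn;crit
    line = (
        f"DISK {label} - {used_pct:.1f}% used "
        f"| used_pct={used_pct:.1f}%;{warn:.0f};{crit:.0f}"
    )
    return code, line


def check_disk(path="/"):
    """Measure usage of the filesystem containing `path` and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    return evaluate_disk(100.0 * usage.used / usage.total)


def main():
    """Print the status line and exit with the matching plugin exit code."""
    code, line = check_disk()
    print(line)
    sys.exit(code)
```

Calling `main()` at the bottom of the script would make it usable as a plugin. Keeping the threshold logic in `evaluate_disk` separate from the measurement makes it easy to test without touching a real filesystem.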
The Bottom Line
By giving customers the ability to see their statistics, MCS found that customers were happier with their service, were better able to keep tabs on their disk, CPU, RAM, and bandwidth usage, and were more likely to request an upgrade before service was interrupted. Notifications are set up to be tiered based on severity to reduce alert fatigue. Emails are sent for alerts that do not impact service, while production issues are sent via text message.
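The tiering rule described above can be sketched in a few lines of Python. This is only an illustration of the routing logic, under the assumption that each alert carries a flag saying whether it impacts production service. The `Alert` class and the function names are hypothetical. In Nagios itself, this kind of routing is configured through contacts and notification settings rather than custom code.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record used only for this illustration."""
    host: str
    service: str
    service_impacting: bool


def pick_channel(alert):
    """Return 'sms' for production-impacting alerts, 'email' otherwise."""
    return "sms" if alert.service_impacting else "email"


def route(alerts):
    """Group alerts by the channel they should be delivered on."""
    routed = {"sms": [], "email": []}
    for alert in alerts:
        routed[pick_channel(alert)].append(alert)
    return routed
```

For example, a disk usage warning on a server that is still serving traffic would land in the `email` list, while an outage of a production web service would land in the `sms` list and reach the on-call technician's phone.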