We use Nagios XI for server monitoring.
Lead Solution Consultant at a tech services company with 51-200 employees
Scalable product with an easy setup process
Pros and Cons
- "It is an open-source platform with valuable features for performance and stability."
- "The product's stability could be even better."
What is our primary use case?
What is most valuable?
It is an open-source platform with valuable features for performance and stability.
What needs improvement?
The product's stability could be even better.
For how long have I used the solution?
We have been using Nagios XI for five years.
Buyer's Guide
Nagios XI
November 2024
Learn what your peers think about Nagios XI. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,636 professionals have used our research since 2012.
What do I think about the stability of the solution?
I rate the product's stability a nine out of ten. There is room for improvement.
What do I think about the scalability of the solution?
I rate the product's scalability a nine out of ten.
How was the initial setup?
The initial setup is easy. It requires a team of ten engineers to execute the process.
What about the implementation team?
Our employees implement the product with the help of third-party vendors.
What other advice do I have?
I rate Nagios XI a nine out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
System Administrator at a hospitality company
Unlimited Insight Into Multiple Infrastructures And You Can Customize It With Basic Scripting Skills
Pros and Cons
- "You want to monitor a specific metric that nobody else has? You can do it even with the most basic of scripting skills, and you can always share it with the vast community of Nagios Exchange."
- "The PNP4Nagios plugin not working easily with XI is an issue for me, because some open source monitoring plugins do not work out of the box. But in the end, you learn to live with it."
What is most valuable?
The main characteristic I adore is the open source character of it. You want to monitor a specific metric that nobody else has? You can do it even with the most basic of scripting skills, and you can always share it with the vast community of Nagios Exchange.
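As a rough illustration of how little scripting a custom check needs, here is a minimal sketch of a plugin that follows the standard Nagios plugin conventions (exit codes 0/1/2/3 and performance data after the pipe). The metric name `queue_depth`, its value, and the thresholds are hypothetical placeholders, not something from this review.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a numeric reading to a Nagios state (higher readings are worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(label, value, warn, crit):
    """Return (exit_code, status_line); perfdata follows the pipe character."""
    state = evaluate(value, warn, crit)
    line = f"{STATE_NAMES[state]} - {label} is {value} | {label}={value};{warn};{crit}"
    return state, line


def main():
    # Hypothetical metric: a real plugin would measure this value itself.
    queue_depth = 42
    code, text = format_output("queue_depth", queue_depth, warn=50, crit=100)
    print(text)
    return code

# A real plugin would finish with: sys.exit(main())
```

Nagios reads the exit code to decide the service state and shows the printed line as the status text, so the script can be written in whatever language is convenient.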
How has it helped my organization?
Like any monitoring tool, it gave me insight into multiple infrastructures I've been a part of, without any limitation (due to the open-source character that I referred to above).
What needs improvement?
It's more what I personally don't like, rather than what areas need improvement. For example, the PNP4Nagios plugin not working easily with XI is an issue for me, because some open source monitoring plugins do not work out of the box. But in the end, you learn to live with it.
For how long have I used the solution?
I've been using Nagios since about 2005. I've seen the development path through the open-source version (and some other forks of it like Icinga and OMD) but for the last four years I've been entirely using the XI branch.
What do I think about the stability of the solution?
On older versions I had some minor issues. Currently, to be honest, it is as stable as I could hope for a monitoring tool.
What do I think about the scalability of the solution?
Up to now, the infrastructures I've been a part of were not so large, up to 200 hosts and 1300 services. Even for XI which uses MySQL on the back end, a host with 8GB RAM and four vCPUs is adequate.
How are customer service and technical support?
With the open source forks, the community is vast and so is the knowledge around the product. Because of this, even though I have a valid commercial support bundle, I have never had the need to use it.
Which solution did I use previously and why did I switch?
No, I started with Nagios. I've used other apps also like Microsoft SCOM (which is not very good), Zabbix (which is very decent), Tivoli (which is also not very good), HP OpenView (which is vast and requires almost a duplicate infrastructure to run to its full extent), Icinga (a very good clone), Centreon (haven't used it much but it seems solid enough), but I've always ended up using Nagios.
How was the initial setup?
For the latest versions, for me, it is pretty straightforward.
What's my experience with pricing, setup cost, and licensing?
For the cost of the commercial product and support, and taking into account the open source characteristics of it, I believe it is difficult to find a better value. Yes, it needs some time to configure and address its issues, but seriously, which monitoring solution does not?
Which other solutions did I evaluate?
Before going to Nagios XI (commercial, meaning with support), because of the relationship my company had with Microsoft, I evaluated also SCOM. As with Nagios, I went through the whole installation and configuration process. Because of my previous knowledge, I directly compared it with Nagios, and the latter won, hands down.
What other advice do I have?
Be prepared to put some time into it and research it appropriately. If there is an option for consulting services through the support channel, don't be afraid to use it.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Systems Engineer at a tech services company with 501-1,000 employees
It has somewhat helped improve the workflow and technical processes with regard to response times. A better multi-tenant environment would be ideal.
What is most valuable?
It has a wide variety of plugins in existence, but when there's no plugin available it is quite simple to integrate one's own programs and scripts.
How has it helped my organization?
It has somewhat helped improve the workflow and technical processes with regard to response times, and helped identify the frequent points of failure so that we can architect alternative solutions. Everything to do with SLAs is hard for me to quantify, but there's a team of accountant types who are dedicated to pulling numbers out of their hats.
What needs improvement?
It would be nice to have a better, or alternative, dashboard, à la Thruk, to see business process groupings rather than just hosts and services. This might give a better visual representation of how the company is performing, the availability of mission-critical services, and so on.
A better multi-tenant environment would be ideal where certain users have limited visibility instead of just limited functionality.
For how long have I used the solution?
I've been using it for about six or seven years. We currently have v3.4.1 in production and are currently developing and testing v4.1.1 prior to implementation. Currently, we use various plug-ins including NRPE, NSCA, NagiosQL 3.2.0, PNP4Nagios 0.6.24, NSclient++ 0.3.9, and nagios-plugins 2.1.1.
What was my experience with deployment of the solution?
We have had no issues with the deployment.
What do I think about the stability of the solution?
We're still experiencing stability issues with regards to collecting performance graphs.
What do I think about the scalability of the solution?
It's been able to scale for our needs.
Which solution did I use previously and why did I switch?
We migrated away from a Windows-centric product. Although quite powerful, it was Windows centric, and as our fleet of Linux servers grew we needed an alternative that supports multiple operating systems.
How was the initial setup?
The installation and configuration were straightforward but rather time consuming, as there was no easy method to centrally deploy plugins or agents to hundreds of remote clients. The only complexity that we encountered was in the network configuration for our firewalls and routing.
What about the implementation team?
All work is done in-house, and over the years it seems easier to Google for a solution to a given problem, but often enough a problem will crop up where no one has yet found a solution. You need to learn to take notes and document everything that you do. This may seem time consuming, but it will certainly save a lot more time and work in the future.
What other advice do I have?
It gets the job done, but there's a lot of room for improvement. Make sure that you clearly identify which are the mission critical services and which aren't so as to avoid cluttering the dashboard and overwhelming IT staff with too much information.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
IP/MPLS Engineer at a comms service provider with 1,001-5,000 employees
It's helpful for seeing where we're having problems with the network, but the display could be more intuitive
Pros and Cons
- "The installation is no problem. I've installed Nagios several times."
- "The way Nagios displays information isn't easy for a new user to understand. It's not intuitive enough. You need to read some tutorials or be trained to understand what it's displaying. Also, I think it needs more features to improve network visibility because there are some things you can't detect."
What is our primary use case?
We use Nagios XI for monitoring and seeing where we're having problems with the network.
What needs improvement?
The way Nagios displays information isn't easy for a new user to understand. It's not intuitive enough. You need to read some tutorials or be trained to understand what it's displaying. Also, I think it needs more features to improve network visibility because there are some things you can't detect.
For how long have I used the solution?
I've been using Nagios for more than five years.
What do I think about the stability of the solution?
Nagios is stable.
What do I think about the scalability of the solution?
Nagios has good scalability.
How was the initial setup?
The installation is no problem. I've installed Nagios several times.
What other advice do I have?
I rate Nagios XI six out of 10. It's one of several network monitoring systems we have, and it has some features we can't get from other platforms.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Engineer at a tech vendor with 51-200 employees
Only Monitoring Tool You will Need
Why is OMD a Better Choice than Zabbix or Zenoss
I spent about 3 weeks vetting 20+ open source monitoring solutions, and at the end of the process the choices had boiled down to a few major ones: OMD (the best combination of open source plug-ins put together for Nagios), Zabbix, and Zenoss.
The main components of OMD are Check_MK, PNP4Nagios, NagVis, and of course Nagios. Among these projects, Check_MK is the core of OMD: it makes Nagios easy to configure and easy to scale, and it brings the other popular Nagios plug-ins together into one unified user interface. The following comparisons therefore use Check_MK as the keyword, but I will also cover how the other plug-ins make the OMD project stand out from the competition.
Trend
Check_MK vs Zabbix vs Zenoss Core Trend
A quick Google Trend search will tell you that Check_MK is up and rising. Together with the Nagios’s community size, you can certainly find custom monitoring plug-ins created by community members and save yourself time from reinventing the wheel.
Project Health
Before you pick any open source tool for enterprise projects, you want to make sure that its code is not stale and that the community will stay vibrant for years to come. An active community and frequent code updates mean your questions get answered and bugs get fixed quickly. The free service from Ohloh gives you an overview of those aspects of open source projects. The following comparison charts were created from Ohloh.
Number of Code Commits Made by Each Project
Check_MK is a clear winner in this chart. It tells you that Check_MK is constantly making more improvement than the other 2 projects.
Number of Contributors to Each Project
In this chart, the number of Check_MK contributors is increasing and will soon surpass Zabbix. And don’t forget it is standing on the shoulders of a giant, the largest monitoring community: Nagios.
User Reviews
Don’t just listen to me. Here is a blog post that explains why its author moved away from Zabbix to Check_MK.
Moving From Zabbix to Check_MK
Architecture Design Advantage
OMD
- What is OMD:
- OMD is a combination of best practices on how Nagios should be set up and integrated. It incorporates the most popular third-party Nagios plug-ins in a single package that is easy to install, maintain, and upgrade. Once your Linux server is running, installing OMD and getting the monitoring suite running takes about 10 minutes with one command.
Administrators save time by not having to compile Nagios or other plug-ins, or to wrestle with the configuration between the plug-ins and Nagios. It really is a no-brainer to set up and start with.
- Why use OMD instead of other flavors of Nagios combos, e.g. ?
- Founded July, 2010 by a group of well known Nagios community members and Nagios addon developers
- e.g. NagVis, Check_MK, PNP4Nagios, and others
Check_MK
What is Check_MK
Check_MK is an extension to the Nagios monitoring system that allows creating rule-based configuration using Python and offloading work from the Nagios core to make it scale better, allowing more systems to be monitored from a single Nagios server.
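To make the idea of rule-based configuration concrete, here is a small sketch of the general technique: rules are matched against host tags, and the first matching rule supplies the check parameters. This is illustrative only and is not Check_MK's actual configuration syntax; the host names, tags, and thresholds are all made up.

```python
# Illustrative only: not Check_MK's real configuration format.
# Each rule pairs a set of required host tags with the parameters to apply.
RULES = [
    ({"linux", "prod"}, {"cpu_warn": 80, "cpu_crit": 90}),
    ({"linux"}, {"cpu_warn": 90, "cpu_crit": 95}),
]
DEFAULTS = {"cpu_warn": 95, "cpu_crit": 99}

# Hypothetical hosts and their tags.
HOSTS = {
    "web01": {"linux", "prod"},
    "dev07": {"linux", "test"},
    "win02": {"windows", "prod"},
}


def parameters_for(host):
    """Return the parameters of the first rule whose tags all apply to the host."""
    tags = HOSTS[host]
    for required_tags, params in RULES:
        if required_tags <= tags:  # subset test: every required tag is present
            return params
    return DEFAULTS


for name in HOSTS:
    print(name, parameters_for(name))
```

The appeal of this style is that adding a hundred more Linux production hosts requires no new per-host configuration, only the right tags.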
There are 2 significant modules that Check_MK uses to improve Nagios performance. One is called Livestatus and the other is called Livecheck.
Livestatus
Before Livestatus ☹
- Monitoring results are stored in a single file, status.dat. It becomes a bottleneck on CPU and IO for larger installations.
- The status file is not real-time; by default it is updated every 10 seconds.
- NDOUtils uses a database for monitoring results (MySQL or PostgreSQL), but still has some severe shortcomings.
- NDOUtils has complex setup.
- NDOUtils needs a databases to be administered, a rapidly growing one.
- NDOUtils eats up significant portion of your CPU resources just to keep the database up to date.
- Some similar projects that still uses NDOUtils:
- Regular housekeeping of the database can hang your Nagios for minutes or even an hour once a day.
After Livestatus ☺
- Livestatus also uses Nagios Event Broker API like NDO, but it does not actively write out data. Instead, it opens a socket by which data can be retrieved on demand.
- Livestatus imposes no measurable burden on CPU at all.
- Livestatus produces zero disk IO when querying status data.
- No configuration is needed. No database is needed. No administration is necessary.
- Livestatus scales well to large installation even beyond 50,000 services.
- Livestatus gives you access to Nagios-specific data that is not available through any other method.
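The on-demand socket access described above can be sketched as follows. The query text uses the Livestatus query language (a `GET` line followed by `Columns:`, `Filter:`, and `OutputFormat:` headers). The socket path is site-specific and has to be supplied by the caller, and the sample host and service in the usage lines are made up.

```python
import json
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query; a blank line terminates the request."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {condition}" for condition in filters]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def parse_rows(raw, columns):
    """Turn the JSON list-of-lists reply into one dict per row."""
    return [dict(zip(columns, row)) for row in json.loads(raw)]


def query(socket_path, table, columns, filters=()):
    """Send one query over the Unix socket and return the parsed rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(build_query(table, columns, filters).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_rows(b"".join(chunks).decode("utf-8"), columns)


# Usage without a live socket: inspect the request and parse a sample reply.
COLUMNS = ["host_name", "description", "state"]
print(build_query("services", COLUMNS, ["state = 2"]))
print(parse_rows('[["web01", "HTTP", 2]]', COLUMNS))
```

Nothing is written to disk or to a database here: the data is fetched only when someone asks for it, which is the point the list above makes.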
Livecheck
Before Nagios 4.0, even a perfectly tuned system rarely manages to execute more than a few thousand checks per minute.
What makes things worse: as your system gets larger, the maximum check rate gets worse. The more hosts and services your system manages, the fewer checks per second it will be able to perform. Why?
Existing Problems of Nagios (before Nagios 4.0) ☹
- Each new check creates a new fork
- The new process prepares everything needed to execute the check plug-in, then forks a second time when ready
- Forking is costly even for highly optimized Linux kernel
- The forking of Nagios core (before v.4.0) does not scale on multiple CPUs (single thread process).
- you can well run into a situation where your powerful 16-CPU server is limited to 100 Checks per second while most of its CPU cores are idle most of the time.
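To see why a ceiling of 100 checks per second matters, a back-of-the-envelope calculation helps. The 50,000-service installation size and the one-minute check interval used below are assumptions chosen for illustration, not measurements.

```python
def required_checks_per_second(services, interval_seconds):
    """Check rate needed to run every service once per interval."""
    return services / interval_seconds


def achievable_interval_seconds(services, max_checks_per_second):
    """Shortest interval at which every service can be checked at a given rate."""
    return services / max_checks_per_second


# Assumed installation: 50,000 services, each checked once a minute.
needed = required_checks_per_second(50_000, 60)
print(f"needed: {needed:.1f} checks/s")

# With the core limited to 100 checks/s, each service is checked this often:
interval = achievable_interval_seconds(50_000, 100)
print(f"achievable interval: {interval:.0f} s")
```

Under these assumptions the installation needs roughly eight times the rate the core can deliver, so each service would be checked only about once every eight minutes.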
How does Livecheck solve those bottlenecks ☺
- It uses a number of helper processes. The core communicates with each helper through a Unix socket (that does not appear in the file system).
- Only a small helper program is forked instead of the complete Nagios monitoring core.
- The helper forks distribute over all available CPUs instead of single CPU.
- The total process VM size of Livecheck is about 100KB only!
- Inline implementation of check_icmp (PING tests). To give you an idea of how much improvement this has done, here is a benchmark example using dual core 2800 MHz CPU:
- Before inline check_icmp: 300 ICMP checks per second.
- After inline check_icmp: 2600 ICMP checks per second. The checks generated an ICMP traffic of 45Mb/s.
Nagios Monitoring Core working with the best plug-ins (Check_MK, NagVis, PNP4Nagios and etc)
Multisite - An Advanced Web Interface for Nagios
Multisite is part of the Check_MK project as a better web UI alternative for Nagios.
A new and innovative GUI for viewing Nagios status information and controlling your monitoring system. It is based on MK Livestatus and aims at replacing the Nagios web GUI (also known as “the CGIs”). Multisite supports distributed monitoring in a very efficient way.
Zero Configuration Files with WATO
This is one of the most brilliant solutions from the Check_MK project to tackle the notorious Nagios configuration disaster. Although Nagios is a flexible and powerful monitoring system, having to mess with its multi-level and confusing configuration files scares many people away. There are many web interface plug-ins that take a stab at the issue, but WATO is by far the best at simplifying Nagios configuration while staying very flexible, thanks to sitting on top of Check_MK.
WATO is a web-based administration tool for Check_MK. It allows you to manage the hosts and services to be monitored and fully supports Check_MK’s inventory mechanism, which autodetects the services to be checked on a host. WATO allows a substantial part of the daily workload to move from the monitoring administrator to his colleagues.
Monitoring Agent for both Linux and MS Windows
Responsive UI for Mobile Client
Powerful Search Function
Visual Meters with Perf-O-Meter
NOC with Dashboards (Thanks to PNP4Nagios & Nagvis)
PNP4Nagios
Nagvis
NagVis is a visualization addon for the well-known network management system Nagios. NagVis can be used to visualize Nagios data, e.g. to display IT processes like a mail system or a network infrastructure.
Automation and Web Services for Automated Provisioning
Automation is built into Multisite. You can make web service requests against Multisite to automate adding new hosts, enable new service checks, or embed any of the host/service check web pages into other websites.
This feature makes it very easy to integrate with Puppet or Chef for automatically adding new servers(hosts) and services to the monitoring system.
24/7 NOC with Flexible Notification
With Check_MK abstracting the original Nagios notification scheme, it has become possible to send notifications for any host or service to any number of people at any time.
You can even write a custom script to deliver notifications in creative ways, such as having a VoIP server ☎ call your cell phone and read you the alert message, or sending the alert to your ✐ instant messenger.
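A custom notification script can be quite small. The sketch below only builds the alert text and leaves delivery to whatever channel you prefer. It assumes the notification context arrives in environment variables with a `NOTIFY_` prefix (for example `NOTIFY_HOSTNAME` and `NOTIFY_SERVICEDESC`); treat those names as assumptions and check them against your own version's documentation.

```python
import os


def build_message(env):
    """Compose a one-line alert from notification variables (names are assumed)."""
    host = env.get("NOTIFY_HOSTNAME", "unknown host")
    service = env.get("NOTIFY_SERVICEDESC")
    if service:
        state = env.get("NOTIFY_SERVICESTATE", "UNKNOWN")
        message = f"{host}/{service} is {state}"
        output = env.get("NOTIFY_SERVICEOUTPUT", "")
        if output:
            message += f": {output}"
        return message
    # No service description means this is a host notification.
    state = env.get("NOTIFY_HOSTSTATE", "UNKNOWN")
    return f"{host} is {state}"


def main():
    # Delivery (VoIP call, instant message, ...) is deliberately left out;
    # this sketch just prints the text that would be sent.
    print(build_message(os.environ))


# Example with a hand-made environment:
print(build_message({
    "NOTIFY_HOSTNAME": "web01",
    "NOTIFY_SERVICEDESC": "HTTP",
    "NOTIFY_SERVICESTATE": "CRITICAL",
    "NOTIFY_SERVICEOUTPUT": "Connection refused",
}))
```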
Custom Icons
http://mathias-kettner.de/checkmk_devel_multisite_icons.html
Management and Maintenance
Distributed Monitoring
Distributed WATO allows you to manage several monitoring sites through a logically centralized WATO.
- 1200 Check_MK installations
- Centralized status of all 1200 stores per minute
- Using NagVis to show all 1200 stores’ status on the map using the Geomap function
- All stores’ overall status is aggregated through the use of the Business Intelligence function
Backup of Changes
- Automatic Check_MK configuration backup on every change you make
- Easy restoration with the Thunder icon
Upgrade OMD
- OMD upgrade DEMO Version
- OMD copy a site
Business Intelligence
Available from version 1.2.3
Predictive Monitoring
- Smart threshold that detects anomaly from daily operation
- Set warning level based on prediction
Available from version 1.2.3
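The idea of a threshold derived from past behaviour can be sketched in a few lines. This is not Check_MK's algorithm, just one simple way to do it: take the mean of historic readings for the same time slot and place the warning and critical levels a number of standard deviations above it. The sample readings and the sigma multipliers are made up.

```python
from statistics import mean, stdev


def predicted_levels(history, warn_sigmas=2.0, crit_sigmas=3.0):
    """Derive (warn, crit) levels from past readings for the same time slot."""
    average = mean(history)
    spread = stdev(history)  # sample standard deviation, needs >= 2 readings
    return average + warn_sigmas * spread, average + crit_sigmas * spread


def classify(value, history):
    """Compare a new reading against the levels predicted from history."""
    warn, crit = predicted_levels(history)
    if value >= crit:
        return "CRIT"
    if value >= warn:
        return "WARN"
    return "OK"


# Hypothetical CPU load readings taken at the same hour on previous days.
past = [10, 12, 11, 13, 9]
for reading in (11, 15, 20):
    print(reading, classify(reading, past))
```

A fixed threshold would have to be loose enough for the busiest hour of the day; levels derived from history can stay tight during quiet hours.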
Monitor Cronjobs
Before
<code>5 0 * * * root /usr/local/bin/backup >/dev/null</code>
After
<code>5 0 * * * root mk-job nightly-backup /usr/local/bin/backup >/dev/null</code>
Available from version 1.2.3
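The "After" line wraps the backup command so that the monitoring system can see whether and how the job ran. The sketch below shows the general wrapper technique, recording the exit code and run time of a command. It is not mk-job's implementation, and the record layout is an assumption.

```python
import subprocess
import sys
import time


def run_and_record(name, command):
    """Run a command and return a record of its exit code and duration."""
    started_at = time.time()
    begin = time.monotonic()
    result = subprocess.run(command)
    duration = time.monotonic() - begin
    return {
        "job": name,
        "exit_code": result.returncode,
        "started_at": started_at,
        "duration_seconds": duration,
    }


# Example: wrap a tiny command that exits with status 3.
record = run_and_record("demo-job", [sys.executable, "-c", "import sys; sys.exit(3)"])
print(record["job"], record["exit_code"])
```

A monitoring check can then alert when the recorded exit code is non-zero or when the last run is older than expected.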
Dive in to the OMD World
I will be sharing how I installed OMD, optimized the web interface (Multisite), used passive checks, implemented a 24/7 on-call plan, and integrated with automated business processes. I will add links here once they become available.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Wow very nice write up and detail. We are debating to switch off Nagios over to Solarwinds at the moment but I might have Management check this review.
Systems Administrator at a cloud solution provider with 501-1,000 employees
I've used both Nagios and SolarWinds - different environments have different preferences
I've used both Nagios and SolarWinds, although I haven't worked with Nagios as much as SolarWinds. SolarWinds had some nice features for monitoring, and I learned a lot about it in the short time that I actually worked with it. Nagios just seems to work, and when a server is down, I investigate. SolarWinds seemed to have more issues, but that could have been because it was running on a Server 2003 box and possibly old hardware, whereas the company I'm working at now runs Nagios on a Linux box with some decent hardware. Again, I haven't delved deep into Nagios, and it's possible that what I'm looking at is just a web front end to Nagios that not everybody uses, but it's still pretty nice regardless.
I'm not sure if different environments have different preferences, but the company I worked at that used SolarWinds was an ISP. The company that uses Nagios is a web hosting company. I've also seen a Linux admin at a previous job use Nagios, so it may be that Nagios is more popular among Linux users, if not for servers altogether.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Can we use SolarWinds to monitor Linux as well as Solaris operating systems?
Senior Manager of Engineering at a tech services company with 501-1,000 employees
The Reactor helped improve our script automation and self-response.
Valuable Features:
It has a lot of flexibility for customization and a wide range of metrics. Also, it is opensource and has a big community.
Improvements to My Organization:
The Reactor helped improve our script automation and self-response. XI has provided us with top flexibility for heterogeneous systems monitoring.
Room for Improvement:
Nagios needs to improve their incident manager. Currently, it isn't good for ITIL in my opinion. The Reactor is good to go, but XI needs to improve its reporting functionality.
Use of Solution:
We use both Nagios XI and Nagios Reactor.
Deployment Issues:
We have had no issues with the deployment.
Stability Issues:
There have been no performance issues.
Scalability Issues:
It's been able to scale for our needs.
Other Advice:
I would advise that you create a lab with Nagios Core and test what you really need. Although it's exciting to use all the products, only a few are really important in your IT structure. When you are confident with scripting and MIBS integration, you can consider expanding it to your Enterprise systems with Nagios XI and some other modules. I would discourage you from using the ticketing system to start with and choose something more dedicated.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Systems / DevOps Engineer with 1,001-5,000 employees
It's easy to customize through scripts. The user interface needs to be improved.
Valuable Features:
In my experience with Nagios, I've found that the most valuable features are the scalability and the extensibility through scripts. It's easy to customize, and Nagios makes it easy to use languages you're already familiar with, such as Bash or Python.
Improvements to My Organization:
It's helped us improve as we now have the ability to customize the solution. By doing this through scripting, we are now able to monitor every layer of our stack from infrastructure to applications.
Room for Improvement:
I feel that the maturity and the user interface need to be improved. I think this is handled through its integration with OpsView.
Deployment Issues:
We have had no issues with the deployment.
Stability Issues:
There have been no performance issues.
Scalability Issues:
It's been able to scale for our needs.
Initial Setup:
If you plan ahead of time and thoroughly test the final solution you want to implement, it should be straightforward.
Other Advice:
Plan ahead and take your time through staging the installation and take your time testing your customized scripts before doing the production installation.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
I agree!