It is used for monitoring services on a bunch of virtual machines.
In terms of the version, we are not on the very latest release, but we're fairly current.
It provides visibility of the platforms.
It is fairly easy to set up, and we can monitor pretty much everything we want to.
We're using the free version, which limits what we can do. For example, we can't isolate instances that are being built or updated from the production ones. While they're being built, they show in red in Nagios. It'd be nice to be able to partition those off until they're all green, and then bring them into the environment. The paid version would probably allow us to do exactly that or remove the restrictions we have, and a lot of our issues would probably go away, but being able to isolate instances in the free version would make life much easier.
In terms of new features, we're just using it for what it is. We are using what we've got now. We don't have any additional requirements as far as I'm aware.
I have been using this solution for four or five years.
It is fine. There are no concerns there. Our biggest challenge is that we get a lot of timeouts, but that seems to be because of our network setup. A whole bunch of spurious events get reported, but they are mostly timeouts in reaching the Nagios agents.
It seems to be all right at the moment. We don't seem to be having any problems with that. We have upwards of 20 users, and it is being used on a daily basis.
I have not contacted them for a long time.
Nagios is the first one.
From what I heard, it didn't seem difficult to set up. It was quite straightforward.
We're still rolling out and deploying new instances of VMs that we want to monitor. It's an ongoing process.
We deployed it ourselves. Its maintenance is done by one or two people.
We are using the free version.
I would recommend it to others. It does what it is supposed to. It is pretty good.
I would rate it an eight out of 10.
We used Nagios Core to monitor our servers in other countries. Our main server is in Cairo, and we monitor servers in Germany that host Jenkins and other web services, to make sure that the infrastructure is stable. If anything goes wrong, Nagios reports it automatically.
Before using this solution, Jenkins sometimes went down and we didn't know the reason. We eventually discovered that the issue was disk space usage exceeding a certain percentage. Now that we have Nagios monitoring the servers, we know when something is about to go wrong and can solve the problem before it happens.
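A disk-space check of the kind described here can be sketched as a small Nagios-style plugin. This is only an illustration, not the reviewer's actual setup: the 80% and 90% thresholds, the function names and the checked path are assumptions. The only parts taken from Nagios conventions are the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the performance data after the "|" character.

```python
import shutil
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn=80.0, crit=90.0):
    """Map a used-space percentage to a Nagios exit code (thresholds are assumed)."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem that holds `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_pct = 100.0 * usage.used / usage.total
    code = classify(used_pct, warn, crit)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[code]
    # Text after '|' is performance data in the form label=value;warn;crit
    perf = f"used_pct={used_pct:.1f}%;{warn};{crit}"
    return code, f"DISK {label} - {path} {used_pct:.1f}% used | {perf}"


def main(argv):
    """Print the status line and return the exit code for the given path."""
    code, line = check_disk(argv[1] if len(argv) > 1 else "/")
    print(line)
    return code

# To run this as a plugin, end the file with: sys.exit(main(sys.argv))
```

Nagios reads the first line of output as the status text and the exit code as the state, so a check like this would go red before the disk actually fills up.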
The most valuable features to us are the ability to monitor memory usage, disk space usage, and the CPU load of each node.
It's not that easy to install the product itself. Also, the UI is a bit hard for regular users to navigate. In addition, I would appreciate a built-in SMTP server for sending emails; right now Nagios Core depends on existing mail servers. If it came with its own SMTP server, it would be much better. Also, it would be awesome if it could be installed on other operating systems, but right now it only runs on Linux.
Alias creation and configuration from the web UI rather than on the server itself would be great improvements, as would general UI enhancements and a better user experience.
It's stable because it's Linux-based and very basic. It doesn't have many big features, so there is little to go wrong. You can add a node in less than half an hour, I think.
We're only currently using Nagios Core on one to ten servers. In the future, we may add more nodes.
I haven't tried to contact support. I was searching on the support forums, but that was not for me. I tried many solutions from the support forums. One of them is working, but only after a long time.
The initial setup was complex, mainly because it runs on Linux and needs many packages that we're not used to. I had to install them one by one and then work through the complication of authenticating Nagios against the central app. It comes with regular users defined in files, and in order to authenticate you have to make a lot of configuration changes in Apache as well as in Nagios. This was all very hard, and it took me a week to configure it.
I think deployment took about two weeks at the most. We did the deployment by ourselves. We have two people for deployment and maintenance.
Nagios Core is free to use.
I would rate Nagios Core as seven out of ten because it was hard to configure and the implementation process itself took about two weeks. Also, the UI is not friendly. Other products have features that aren't included in Nagios Core. I think that one was the easiest to restore. Also, Nagios supports only Linux, not A/UX. It can't be installed on the servers. If they supported all of these things, it would be much better.
Everyone ends up using Nagios or a
derivative just because... well, everyone else does. The size of your org
matters a lot here: Zabbix might fit you just right, or not at all.
Lately I've been setting up Nagios with a Graphite back end for people,
then taking advantage of custom Nagios plugins to send
data to both systems. You can throw a lot of data at Graphite and make
some super pretty graphs if that is what you are after. For example,
imagine having all the contents of a vmstat/iostat every X seconds...
for ALL your servers, queryable with less than a minute of
latency. You can do that with nagios+graphite+yourownfixins. ... and
then you show Dev how easy it is to log data into Carbon/Graphite and
become a superhero.
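To give an idea of how easy logging data into Carbon is, here is a minimal sketch that uses Carbon's plaintext protocol: one "path value timestamp" line per metric, sent over TCP to port 2003 by default. The host name, the metric prefix and the function names are made up for illustration.

```python
import socket
import time


def carbon_lines(prefix, metrics, timestamp=None):
    """Format a dict of metrics as Carbon plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return [f"{prefix}.{name} {value} {ts}" for name, value in sorted(metrics.items())]


def send_to_carbon(lines, host="graphite.example.com", port=2003, timeout=5.0):
    """Send preformatted lines to a Carbon plaintext listener (host is a placeholder)."""
    payload = ("\n".join(lines) + "\n").encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A plugin or cron job would parse whatever it cares about out of vmstat/iostat into a dict, call carbon_lines("servers.web01.vmstat", values), and hand the result to send_to_carbon.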
When you start hoarding this much data you can start asking some really
detailed questions about disk performance, network latencies, system
resources, etc... that before were just guesstimates. Now you have the
data and the graphs to back them up.
I'm also a big fan of Pandora FMS but I've never implemented it anywhere professionally and the scope it takes is pretty large.
(I should note, nagios is pretty terrible, it's no better than things we had a decade ago.)
The real truth here is that all the current monitoring systems are pretty terrible given that they are no better than what we had a decade ago. Every good sysadmin group makes them work well enough, but there is a lot of making them work. Great sysadmins go on to combine a couple of them with their own bits to make the system a bit more proactive than reactive, which is what most people expect out of monitoring.
Reactive monitoring is fine for certain companies and certain situations
and it is easily obtainable with Nagios, Zabbix, home-brew,
stupidspendmoney solution, etc... However, reactive monitoring is just
the base point for most. It certainly doesn't handle big problems well,
and it doesn't have the capacity to predict events slightly before they
happen. This level of monitoring also doesn't give you much data
after an event to figure out what went wrong.
Great admins go on to add proactive systems monitoring and in some cases
basic logic monitoring. This is what a lot of us do all the time, to
avoid getting paged in the middle of the night, or to know what to pick
up at Fry's on the way into the office. Proactive monitoring covers a lot
more things than basic monitoring, and it is essentially the level where
everyone works now, with Nagios, etc... That's certainly fine for today and
tomorrow. But it doesn't tell you anything about next quarter, and when
you ask queries about events in the past they are often very basic in
scope.
The other amazingly huge drawback with current monitoring is that if you
want to monitor business or application logic, it is going to be
something you custom fit into whatever monitoring system you have. This
will lead to it being unwieldy, and while it is effective for answering
basic questions like, "What's the impact on sales if we lose the east coast
data center and everything routes through the west?", that is about as far
as it goes. That's a fine question, but it isn't a question that will get
you to the next level, better than your competitors.
So what's next? I'll tell you where I think we should be going and how I am sort of implementing it at some places.
Predictive monitoring on systems AND business logic, with lots of data,
and very complex questions being answered. This can be done right now
with Nagios, Graphite and Carbon. Nagios fills the monitoring and
alerting needs. Carbon stores lots of numerical data, very fast, from a
lot of sources. Finally, with Graphite you can start asking really
serious questions like "How did the code push affect overall page
performance time, while one colo site was down? What's the business cost
loss? Where were the bottlenecks in our environment? Server? Disk?
Memory? Network? Code? Traffic?" Once you've constructed one of these
lists of questions in Graphite you can save it for the future, and not
only monitor it, but, because of the historical data kept on so many key
points, use it for future predictions.
That said, how do you do all that now? Well, you throw Nagios, Graphite and
Carbon out there and then you CREATE a whole lot of stuff that is
specific to your org. This is a lot of work and a lot of effort, and it takes
time and real understanding of the full application and what your end
SLA goals are.
So how do we do all this?
You as an admin do this by creating custom Nagios plugins and data
handlers on your systems and throwing their output into Carbon. As an admin you
measure everything, and I mean everything. Think all of the output from
a vmstat and an iostat, logged in aggregate one-minute chunks on every
single server you have and kept for years.
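The one-minute chunking can be sketched like this. It is just one possible approach (averaging every sample that falls in the same minute), and the function name and the (timestamp, value) sample format are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_by_minute(samples):
    """Average (timestamp, value) samples into one-minute buckets.

    Returns a sorted list of (bucket_start_timestamp, mean_value).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Round the timestamp down to the start of its minute.
        buckets[int(ts) - int(ts) % 60].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]
```

For example, samples taken every 10 seconds from vmstat collapse into one averaged datapoint per minute, which is what you would then ship to Carbon.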
From the dev side, you get the Lead Dev to agree on some key points where
the AppStack should put out some data to Carbon. This can be things
like time to login, some balance value, whatever metric you want to
measure. The key here is to have business logic metrics AND system
metrics in the same datastore within Carbon. Now you get to ask questions
across both data sets, and you get to ask them frequently and fast. You
are able to easily predict in what manner more load will impact the
hardware, i.e. do we need more spindles, more memory,
etc...
This is what I have been doing with some companies in SV right now. It's
not pretty or fully fleshed out yet, because it is a big huge problem
and our current monitoring sucks. :D
But it IS doable with current stuff, and it is quite amazing to know answers to questions that were previously only dreamed about.
What's after that? The pie-in-the-sky next level would be having an app
box in every app group running in debug mode, receiving less traffic
through the load balancers of course, and loading all that debug data into
Carbon. Then you get to ask questions about specific bits of a code
release and performance on your real production environment.
... so those are my initial thoughts. Any comments? :)
Further, once you have all this, you can write Nagios plugins to poll
Carbon for values on questions you have created and then alert not only
on system logic and basic app metrics, but on real queries that are
complex. Stuff like "How come no one has bought anything off page X in
the last two hours, is it related to these other conditions? Oh. It is.
Create me an alert in Nagios so we can be warned when it looks like this
is about to happen again." With much more data across more areas you
can ask and alert on pretty much anything you can imagine. This is how
you make it to the next level.
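A plugin along those lines could look roughly like the sketch below. It assumes the values are read back through Graphite's render API with format=json, which returns a list of series, each with "datapoints" as [value, timestamp] pairs. The base URL, the metric name and the thresholds are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def fetch_series(base_url, target, window="-2h", timeout=10):
    """Fetch one target from Graphite's render API as parsed JSON."""
    query = urllib.parse.urlencode({"target": target, "from": window, "format": "json"})
    with urllib.request.urlopen(f"{base_url}/render?{query}", timeout=timeout) as resp:
        return json.load(resp)


def evaluate(series, warn_below, crit_below):
    """Sum the non-null datapoints and map the total to a Nagios state."""
    values = [v for s in series for v, _ts in s.get("datapoints", []) if v is not None]
    if not values:
        return UNKNOWN, "UNKNOWN - no datapoints returned"
    total = sum(values)
    if total < crit_below:
        return CRITICAL, f"CRITICAL - total {total:g} in window"
    if total < warn_below:
        return WARNING, f"WARNING - total {total:g} in window"
    return OK, f"OK - total {total:g} in window"
```

The "no one has bought anything off page X in two hours" question then becomes something like evaluate(fetch_series("http://graphite.example.com", "shop.pageX.purchases"), warn_below=5, crit_below=1), with the plugin printing the message and exiting with the returned code.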
I'm primarily using Nagios Core to monitor infrastructure like servers and virtual machines, and telephony equipment like IP-DECT antennas. I don't use all of Nagios Core's data functionality. I only use the monitoring features.
The dashboard and monitoring features could be improved.
We've been using Nagios Core for about five years.
Nagios Core is stable.
The Nagios Core setup is complex, but I can handle it all myself.
I rate Nagios Core seven out of 10. Nagios Core is not easy to use, so I don't recommend it for everyone.
Getting the alerts is the most valuable feature. This way I know when servers are acting up or are just plain hosed. It also helps me to know which things need to be recovered and when, so I do not have to bother with checking into it immediately.
Before we implemented Nagios, we did not know which servers were up or down until a customer told us. Now, I can see trends over time and it gives me perspective of what needs to be improved and we are able to work proactively as opposed to reactively.
Generally, it does what I need it to do, but better error reporting would be great. It's so flexible that I do not use half the capability that it has. Also, Nagios 4 does not work with NConf or Adagios so we haven't upgraded yet.
I have worked with it as a monitoring and alerting solution for 10 years across two jobs.
We have had no issues with the deployment.
There have been no performance issues.
We are monitoring under 200 devices and less than 1200 services so I do not need this availability yet.
I've never needed to contact the vendor as I have always found my answers via the documentation and Google searches.
I have used Zabbix and Big Brother, but neither was as workable as Nagios.
Setup is not for the GUI lover as it requires you to perform a lot of CLI work.
You do not need a vendor. I have always deployed it myself.
It's free.
I have looked at other solutions but none are as simple, and I would hate to have to learn another system.
It's well worth it to ensure your up time and to catch the bigger issues.
We use Nagios Core to detect any issues in our infrastructure, software, system services, and network. It is a centralized monitoring service.
The most valuable feature of Nagios Core is it allows us to develop and add as many plugins as we want.
Nagios Core could improve by adding a user interface. If you want the user interface you have to use Nagios XI.
I have been using Nagios Core for approximately eight years.
Nagios Core is stable.
The scalability of Nagios Core is very good. We can add as many hosts as we like, and we can work with a master-and-client concept. It's very scalable and we have added the SentryOne as another layer. It's become very easy to use.
This solution is used by two engineers and three technicians. It is not used by end users.
We use the open-source version of this solution and there is a large community that can provide support for any of our issues.
I am using SCOM, a monitoring solution by Microsoft, in parallel with Nagios Core. However, I prefer Nagios Core.
Nagios Core is deployed on a Linux operating system, and that is simple to do. For a medium-sized infrastructure, the deployment can take a day.
The enterprise version has technical support. The version we are using is free.
The free version of the solution does not have an interface, but the paid version does.
I would recommend this solution to others.
I rate Nagios Core an eight out of ten.
We primarily use the solution for monitoring operations on our computers and our servers. We're considering adding monitoring of other devices as well, including at points of sale.
The solution is quite efficient.
The system's alerts are quite good.
The solution is very complete and mostly easy to manage.
The latest version is a bit more difficult. There have been some changes that have not really improved the solution.
We have a new manager coming in, and they will watch and see over the course of the year if the solution needs any specific improvements. We're still in the process of testing the solution.
The implementation and deployment might need to be slightly improved.
It would be nice if the company offered a sales or contract manager that was dedicated to our company so that we would have some sort of link to Nagios, and if we had issues or questions, we'd be able to contact them directly.
It would be good if the solution had some sort of alarm system to alert managers to any issues. We get good alerts, they just need to get to the right person more efficiently.
We've used the solution over the last 10 months or so. It's been almost a year. We initiated the product in 2020.
The stability of the solution is quite good. We haven't had any issues per se. It's been reliable.
We haven't had any issues with scalability. If a company needs to expand it, it should be able to.
We have about 100 hosts and about 10 servers at this point and maybe 19 at the point of sale.
We don't really have technical support from the solution. We rely instead on learning the solution and focusing on documentation if we need assistance. There's also a community online that's quite helpful.
Their documentation is very complete and they have pretty good policies in place.
We did previously use a different solution. We still use it. It continues to monitor our network. We have a new CTO that is looking to make changes. We're evaluating more economical options.
The installation is initially a little bit complex.
The process took several months. Originally, we were using Linux systems.
We didn't have installers or another company assist us. We handled the implementation ourselves.
We're just customers and end-users. We don't have a business relationship with Nagios.
We're using the latest version of the solution.
We're still in the early days in terms of usage. We're still feeling the solution out and testing it for its acceptability within the greater framework of our organization's requirements. We're looking to test it at the point of sale to see how successfully it operates.
Overall, I would recommend the solution to other organizations.
I would rate the solution eight out of ten.
I have set up a Nagios server from scratch as well as worked with
Solarwinds pretty extensively. From my perspective they are on two
completely different playing fields. Nagios definitely has its place,
it's free... and it works well in a smaller environment. Solarwinds is
expensive but it is a lot more robust than Nagios. Solarwinds does
require you to install "Modules" in order to have in-depth application
monitoring, etc... Then again, so does Nagios... but you have to pay an
arm and a leg for Solarwinds.
So depending on how big your environment is, you'll have to evaluate if
the cost is worth it. With Nagios, you'll spend the money you save on the
time to set it up. It takes a lot of time and determination to understand its
inner workings.
Solarwinds is a lot more than just a network monitoring tool. A quick
example: You can develop "ghost runs" of an application and have it
monitor the latency between steps. Meaning, you could configure it to
load a web page, log in to the web page and follow a link to gather data, all
the while timing how long it takes to get from step to step. That gives
you an idea of how much more Solarwinds has to it.
Nagios does have many open-source modules you can use (hell, I even used
one to telnet into an old AS400 and monitor running processes).
So like I said, it depends on the environment and what you want out of
the system. To answer the question about NetFlow: I don't think Nagios
itself can do NetFlow, but it can pair up with another module that can
(and you still get to see it from a single pane of glass). Any specific
questions, let me know!
There's a ton of open source software out there, some that uses Nagios and some that doesn't.
Ninja (front end GUI for Nagios), Zenoss, What's Up Gold (YUCK!),
etc... You could also get things like Alienvault (Nagios is built in)
that has more than just monitoring in it (it's an Open Source IDS).
Cacti can be paired with Nagios to provide you with graphs for bandwidth
utilization... OK, now I'm starting to blab, I'll end it here.
We've piloted both Nagios and Zenoss here. Since we're starting with nothing, Nagios has met our needs well and proved to be a valuable resource almost immediately: we set up simple SSH checks for our Linux hosts and SNMP checks (i.e. no agent) for our Windows hosts. Zenoss just proved to be overly complicated for getting metrics like up/down, disk usage, memory usage etc. Perhaps with more time it would have proved to be more functional than Nagios, but the simplicity of Nagios is really appealing.
How do you find installing and configuring Solarwinds vs. Zenoss? Is Solarwinds closer to Nagios or Zenoss?
The one big thing I struggle with with Nagios is that our Windows admins don't want to SSH into a Linux host and configure monitoring by editing text files. Does Ninja include a UI for setting up monitoring of new hosts?
Chris, do you still find this to be true? Is Nagios still a default tool when people are searching for IT Infrastructure Monitoring solutions?