Program Manager - Enterprise Command Center at a financial services firm with 10,001+ employees
Real User
2016-05-30T20:09:32Z
May 30, 2016
** Altug, Your note is very helpful; thanks very much! The outline of capabilities and requirements is insightful and echoes personal experience. I can see, even without product names, that you've almost certainly worked with these tools and hit your share of tooling challenges. The products in this space need to meet the bar you describe.
** Omar/Manish/Phillippe, CA SOI/TESM & CA UIM are capable in that they will deliver Service Modeling and Event Mgmt, but they are both expensive and labor intensive to implement and support for their core functionality. Moreover, a tool that merely presents or produces events should NOT be considered an Event Mgmt solution or an Event Analysis engine.
** Dan, I haven't taken time to read up on BigPanda. Agreed on the importance of Altug's point. Care & feeding can get out of hand quickly....
** Philippe, You hit the point that started my question. Netcool Omnibus was an acquired product, originally from MicroMuse, whose founders have since created Moogsoft. How do you compare NOI and Moog when they are so similar? Real-world implementation experience? Better yet, a bake-off with side-by-side implementations?
Having tested Netuitive, Prelert, CA ABA, Tivoli Predictive Insights (PI), and BMC BPPM for predictive capabilities, I have found that no vendor product passes muster. Both Moog & NOI have predictive'ish functions. Moog's is built in as an 'extension' of Incident Analysis, but I fear it may only be predictive'ish. NOI is a collection of Tivoli tools that requires a rather large Tivoli Framework to build on for full visibility. PI is one of those add-ons, but as part of NOI it will only analyze Event data. Unless additional PI metric feeds are licensed, NOI is not advertised as competing as a predictive tool.
What I want to achieve... Ideally?... Efficiency and focus for my staff, who each manually handle over 1,000 events in a single shift (trending at the source, correlating across time and CI relationship, and isolating business data flows to the probable break point). The Holy Grail would be a tool that accurately isolates the earliest possible Event(s) and a specific Incident, as far upstream as possible, that is the likely break point for a given issue or impact type.
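To make the "as far upstream as possible" goal concrete, here is a minimal sketch in Python. It is not how any of the products named in this thread work. The service model `DEPENDS_ON`, the CI names, and the timestamps are all made up for illustration. The idea is only this: among the CIs that are currently alerting, keep those with no alerting dependency further upstream, then order them by first event time.

```python
# Hypothetical service model: each CI lists the CIs it depends on (its upstream).
DEPENDS_ON = {
    "checkout-ui": ["payments-api"],
    "payments-api": ["payments-db", "auth-svc"],
    "payments-db": ["san-array-7"],
    "auth-svc": [],
    "san-array-7": [],
}


def upstream_of(ci, graph):
    """Return every CI reachable by following dependencies from ci."""
    seen = set()
    stack = list(graph.get(ci, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def probable_break_points(first_event_time, graph):
    """Alerting CIs that have no alerting upstream dependency, earliest first."""
    alerting = set(first_event_time)
    roots = [ci for ci in alerting if not (upstream_of(ci, graph) & alerting)]
    return sorted(roots, key=lambda ci: first_event_time[ci])


# Seconds since the start of the shift at which each CI first alerted (made-up data).
first_seen = {"checkout-ui": 120, "payments-api": 95, "payments-db": 60, "san-array-7": 58}
candidates = probable_break_points(first_seen, DEPENDS_ON)
print(candidates)
```

With this sample data the only candidate is the storage array, since every other alerting CI depends on it directly or indirectly. The hard part in practice is keeping the service model accurate, which is exactly the care & feeding concern raised above.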
Try Operations Manager i (OMi) from Hewlett Packard Enterprise. It is a differentiated product that scales from SMB to large Enterprise/xSP networks. It comes in a solution bundle with options to include industry-leading ITOA (big data analytics capability), and it has documented reference customers with more than 70% event consolidation/suppression.
www8.hp.com
Program Manager - Enterprise Command Center at a financial services firm with 10,001+ employees
Real User
2016-07-22T22:36:24Z
Jul 22, 2016
Hi Kevin, My team is set to begin a pilot of Moogsoft's solution within the next couple of weeks, and NOI will be stood up in parallel. With any learning algorithm, it seems time & data are key ingredients. We should have some idea of how these compare in the coming months. Thanks for checking in! --> R
Enterprise IT Management Consultant with 51-200 employees
Vendor
2016-05-30T18:54:37Z
May 30, 2016
Hi, I have used CA-Unicenter, CA-SOI and now TESM (OpsDirector). People are misguided in thinking that SOI is an event management product. Similarly, it would be wrong to think of Splunk as one. Unicenter is obsolete and its rules were very onerous to maintain. TESM only works with ServiceNow.
I have exposure to CA-UIM, but it is not open enough to be seen as an event management platform. I have an understanding of how Moogsoft (a spin-off of Netcool) goes about its business, but I have never used it. Netuitive is also worth looking into. What exactly are you looking to achieve?
Hi Randall, also have a look at BigPanda (my company). We automate event correlation and have pre-integrations with all leading monitoring tools. BigPanda automatically generates high-level incidents from monitoring events and automatically shares them with external ticketing solutions like ServiceNow and JIRA or collaboration tools like Slack or HipChat. Correlation occurs in the cloud and event collection is typically agentless via secure APIs or webhooks.
Service Health Analytics dashboards provide visibility into key metrics like MTTR, top alerting hosts, and top alerting checks. Most enterprise customers using BigPanda benefit from 99% noise suppression. Configuration takes hours and is code-free. We offer a free trial if you're interested. As Altug mentioned, stay away from solutions that require you to manually maintain rules. Feel free to reply with any questions about BigPanda capabilities or configuration. Hope it's a good fit...
The question should be: monitoring or logging?
Here are the basics:
Log != event
Logs can contain many non-event data points which are useful now, or may become useful in the future.
Engineering your own log collection and analysis system covers the top 0.5% of users who need that technology. Most clients I speak with cannot engineer their own systems, hence they rely on log analysis products which are purchased rather than developed. You are also assuming that users have developers writing the apps which are logging, and that’s very often not the case.
The reason why monitoring and logging are separate in most cases is that the monitoring tools don’t do the type of log analysis people want today; they do the log/event analysis people wanted in 1995.
Sorry, I don’t have any experience with Moogsoft, but take a look at CA Service Operations Insight (SOI). It will give you the same capability plus many more features.
Program Manager - Enterprise Command Center at a financial services firm with 10,001+ employees
Real User
2016-05-26T16:31:31Z
May 26, 2016
Thanks for sharing, Mike! I've seen BMC's approach as well as CA's, IBM Tivoli's, and Moogsoft's most recently.
Event de-dup is indeed a common feature when it comes to the same alert firing repeatedly on a single host. What these other vendors 'promise' is de-dup of the same or similar alert events across multiple hosts within an app's infra, and even across multiple apps with the same or similar tiers. The idea is to group Events if they correlate in time and/or CI relationship.
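As a rough illustration of grouping by time and/or CI relationship, here is a small Python sketch. It makes no claim about how any vendor does this. The host-to-application map `HOST_TO_APP`, the 300-second window, and the sample events are assumptions chosen only to show the idea: events on hosts that serve the same application are placed in one group as long as each arrives within the window of the previous event in that group.

```python
# Hypothetical CI relationship: which application each host serves.
HOST_TO_APP = {
    "web01": "payments",
    "web02": "payments",
    "db01": "payments",
    "mail01": "email",
}


def group_events(events, window_seconds=300):
    """Group events that share an application and arrive close together in time.

    events is a list of (timestamp_seconds, host, message) tuples.
    """
    groups = []
    open_group = {}  # application -> most recent group for that application
    for ts, host, message in sorted(events):
        app = HOST_TO_APP.get(host, host)  # unknown hosts form their own group
        group = open_group.get(app)
        if group is not None and ts - group[-1][0] <= window_seconds:
            group.append((ts, host, message))
        else:
            group = [(ts, host, message)]
            groups.append(group)
            open_group[app] = group
    return groups


events = [
    (0, "db01", "slow queries"),
    (40, "web01", "HTTP 500 rate high"),
    (55, "web02", "HTTP 500 rate high"),
    (70, "mail01", "queue backlog"),
    (4000, "web01", "HTTP 500 rate high"),
]
groups = group_events(events)
for group in groups:
    print([host for _, host, _ in group])
```

Here the three payments events in the first minute land in one group, the mail event stays separate, and the much later web01 event starts a new group because it falls outside the window.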
The Incident Analysis functions promised are much as you describe but with a twist, and I couldn't agree more with the challenges you describe. This approach is taking only Event messages (from any/all tool sources) & actual Incident Record details (Ex: ServiceNow) and comparing to Business rules, Service Models, and Knowledge on past occurrences to find a current ticket as far upstream as possible. I've seen many vendors with Triage/Isolation functions which are valuable, but they usually drill down into Host/App/Code/etc. This approach seems promising and worth testing.
** MemberSH/SaleMan, Nothing personal, but I am discounting your Vendor comments for a couple of reasons: 1) I am looking for comparative details from experience working with multiple vendors; 2) I have to think twice about vendors with anonymous profile names.
Principal Solutions Architect at a pharma/biotech company with 1,001-5,000 employees
Real User
2016-05-26T15:17:05Z
May 26, 2016
Hello,
I would think just about any Enterprise Monitoring Solution allows for de-duplication of events out of the box… and just updates the Event Count. At least all of the solutions I’ve employed provide this feature.
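For anyone unfamiliar with the feature, out-of-the-box de-duplication boils down to something like the Python sketch below. This is a generic illustration, not any product's actual logic. The `Alert` class, the choice of `(host, check)` as the de-dup key, and the sample events are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    check: str
    message: str
    count: int = 1


def deduplicate(raw_events):
    """Collapse repeats of the same (host, check) into one alert with an event count."""
    alerts = {}
    for host, check, message in raw_events:
        key = (host, check)
        if key in alerts:
            alerts[key].count += 1
            alerts[key].message = message  # keep the most recent message text
        else:
            alerts[key] = Alert(host, check, message)
    return list(alerts.values())


raw = [
    ("web01", "cpu", "CPU 95%"),
    ("web01", "cpu", "CPU 97%"),
    ("db01", "disk", "Disk 91% full"),
    ("web01", "cpu", "CPU 99%"),
]
deduped = deduplicate(raw)
for alert in deduped:
    print(alert.host, alert.check, alert.count, alert.message)
```

Four raw events become two alerts, with the repeated CPU alert carrying a count of 3.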
If I can surmise what Incident Analysis refers to: Probable (Root) Cause Analysis? Most solutions employ something like this as well. However, with event correlation there is always the challenge of understanding what is impacted, and whether any underlying alerts actually contributed to the problem. This always depends on customer requirements, as not all platforms and applications are architected in the same fashion.
I recently attended a good BMC webinar which covers Service Impact Modeling, which may apply here in some way – or at least provide the many things to consider when employing a similar strategy: (You may need to create an account in BMC Communities to view…)
Senior Software Engineer at a financial services firm with 10,001+ employees
Real User
2016-05-26T13:53:54Z
May 26, 2016
Hi,
I don’t have experience with the tools you mentioned, but I have expertise in infrastructure monitoring with other tools, and I know that most of the tools work along the same lines. I have one question: is IT Central Station the right place to ask questions? I also have some questions about AppDynamics, the APM tool.
Randall - just wondering how your analysis is going?
I have never looked at Moogsoft. We probably want to wait until UIM 84.1 is released, since it is supposed to add many incident management features.
Hello,
I would think just about any Enterprise Monitoring Solution allows for de-duplication of events out of the box… and just updates the Event Count. At least all of the solutions I’ve employed provide this feature.
If I can surmise what Incident Analysis refers to: Probable (Root) Cause Analysis? Most solutions employ something like this as well. However, with event correlation there is always the challenge of understanding what is impacted, and whether any underlying alerts actually contributed to the problem. This always depends on customer requirements, as not all platforms and applications are architected in the same fashion.
I recently attended a good BMC webinar which covers Service Impact Modeling, which may apply here in some way – or at least provide the many things to consider when employing a similar strategy: (You may need to create an account in BMC Communities to view…)
communities.bmc.com
Online Documentation: docs.bmc.com
Hope this provides a good start in navigating down this rabbit-hole…
Each vendor has a different take on this aspect, based on their historical
development and the capabilities of the tools they offer.
Some only perform monitoring on a particular infrastructure layer (network,
systems, storage, etc.) and forward events to event analysis engines; others do
a very good job of isolating the root cause of each issue and forward only the
pertinent details to upper level processing solutions.
Let me say one thing: if the solutions you consider have a detailed rules-based
engine that requires you to enter and update individual rules for
monitoring, please STAY AWAY! It is a very high maintenance solution and
will either suck your resources dry or become obsolete too fast too soon.
Make sure that the solution you are considering can resolve relationships
between infrastructure components and update them automatically (either as
soon as they happen, periodically or through manual triggering).
Make sure that root cause determination takes place at each infrastructure
layer monitoring solution (automated resolution of issues is a plus
wherever applicable) and only this information is sent to higher level
incident monitoring/tracking solutions.
A good solution set at a minimum should consist of solutions that are
capable of:
* network monitoring/management
* systems monitoring/management
* storage infrastructure monitoring/management
* business application performance management/monitoring (if possible)
* higher level incident analysis engine that is fed from each of the above
solutions and has a point and click interface to configure rather than
endless keyboard typing
* service desk solution that is fed from all of the above solutions to be
able to implement ITIL guidelines
But the main hurdle is to engage the business side of the company/institution
to be able to gather information to understand what is important for them
and what is not. Remember, IT is there to support business. If you're
monitoring each and everything left and right without understanding the
business, you're just burning resources for a war that's already lost. This
may sound hard for the average IT department but it is an evolutionary step
that is required in today's corporate environment to become a part of
business that adds value, rather than being perceived as a bottomless pit
into which the organization throws money for no apparent benefit.
Please do not hesitate to contact me for further details.
--
Altug Gur
Hi,
I don’t have experience with the tools you mentioned, but I have expertise in infrastructure monitoring with other tools, and I know that most of the tools work along the same lines. I have one question: is IT Central Station the right place to ask questions? I also have some questions about AppDynamics, the APM tool.
Thanks
Rohit
Hi!
I have experience with some monitoring tools like:
- Microsoft System Center Operations Manager
- Riverbed Application Performance Management
- Riverbed Network Performance Management
I have experience with incident management (and additional ITIL work items) tools like:
- Microsoft System Center Service Manager
- ProactivaNet
Studying event management best practices can help you select the right tool.
Regards,
José A. Molina