Works at a tech services company with 51-200 employees
Real User
2014-03-26T11:53:32Z
Mar 26, 2014
Hello Rusty and Bill....
Enteros Upbeat and load2test are very simple to use and apply agile-methodology techniques to load testing. They provide comprehensive diagnostic reports, real-time performance sampling and reporting, low-level tracing, and a containment command and DBA function library. With Grid2Go they provide proactive problem identification and remediation.
Very interesting points Randy. I would be interested to hear from others if they feel the same way or if they've successfully used a matrix to make their decision.
Systems Engineer at a healthcare company with 1,001-5,000 employees
Vendor
2014-03-25T18:02:59Z
Mar 25, 2014
Interesting approach Bill, I'll have to think about it.
Rusty, I don't have an evaluation matrix either - frankly I've never needed to build a matrix to make a decision, but they are useful to explain or defend a choice. I'm more used to testing, reading and finding strengths and weaknesses. Make sure features are more than a checkbox - even a single numerical value can be problematic as any given feature may have relative strengths that may not condense to a single number.
Keeping this in mind, I start with needs: what do I, or you, need a monitoring solution to do? I look for a product that is stable and extensible, either by the vendor or by users. Ease of upgrade and recovery, of course. (Early versions of Solar Winds suffered greatly here; the current product is easy to manage.) Auto discovery, naturally. Performance is a must, as are methods of extending monitoring. Platform choice may be a consideration; forcing a unique platform into an environment may not be the best idea. While on the subject, I look for endpoint support too. Does it recognize and integrate with, say, your virtual environment, and can it peer deep into the databases?
Then I consider the console. Almost all are web consoles, but how are the views built? It has to be easy or you'll never do the update maintenance. Are they configurable for different users or groups, and can you view multiple pages at the same time? (This is great for a NOC; a wall of displays from a single server can provide an overview of just about everything.) Most people are graphical, so live maps are helpful. Recursive, or linked, maps are more so. Can a user drill down from a high-level alert to the problem area and understand what the issue is? Bonus points for the ability to actually make changes in the impacted system, based of course on access rights. Finally, is the console open (standard web browser) or specific to a platform? Tablets and smartphones are here now, and mobility is incredibly useful when tracking down a point problem that requires physical access and a console view.
Next consider alerting. Dull, boring stuff, but it has to be done. What alert options are available? How about out-of-band alerts? Can you send to groups, and cascade to others when no response is given? How about consolidating alerts through dependencies? One big alert when a trunk line goes down is MUCH better than a thousand alerts where the root cause is buried, although it does take effort to map out the dependencies. Then, what can you alert on? Is programmatic alerting available to add sophisticated responses?
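To make the dependency point concrete, here is a minimal sketch of alert consolidation in Python. It is not how any particular product does it; the node names, the `parent` map and the `root_causes` function are all made up for illustration, and it assumes each device has at most one upstream dependency.

```python
from typing import Dict, Optional, Set


def root_causes(down: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return only the down nodes that have no down node upstream of them."""
    roots: Set[str] = set()
    for node in down:
        upstream = parent.get(node)
        seen: Set[str] = set()  # guards against accidental cycles in the map
        suppressed = False
        while upstream is not None and upstream not in seen:
            if upstream in down:
                # Something upstream is already down, so this alert is a symptom.
                suppressed = True
                break
            seen.add(upstream)
            upstream = parent.get(upstream)
        if not suppressed:
            roots.add(node)
    return roots


# Hypothetical dependency map: each node points at what it depends on.
parent = {
    "trunk-1": None,
    "switch-a": "trunk-1",
    "switch-b": "trunk-1",
    "web-01": "switch-a",
    "db-01": "switch-b",
    "web-02": None,
}

# The trunk fails and takes everything behind it down with it.
down = {"trunk-1", "switch-a", "switch-b", "web-01", "db-01"}
alerts = root_causes(down, parent)
print(sorted(alerts))
```

Five devices are down but only the trunk is reported. The effort is in keeping the `parent` map accurate, which is the dependency-mapping work mentioned above.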
Finally, reports. I don’t see much use for static reports printed and distributed, but that depends on the environment. Much more useful to have a mix of standard reports, configurable reports and free form reports that can be configured by users. Solar Winds suffered here in the past; the report writer was only on the Orion server itself. It remains there at the moment, but there are some reports now configurable from the web console. This is a huge annoyance finally being addressed.
So where does this leave the matrix? Pretty complex, as befits a complex tool. Each feature needs a relative value for importance and execution. If I were to create one I would split it into 4 parts - platform, console, alerts and reports, then consider changing the matrix as systems evolve. I would also suggest some further discussion; I doubt I have captured everything just yet.
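As a rough sketch of what such a matrix could look like in Python: each feature carries an importance weight and an execution score, grouped under the four parts. The features, weights and scores below are invented for illustration, and the weighted average is just one possible way to combine them. Scores are reported per part rather than as one grand total, since a single number hides relative strengths.

```python
from typing import Dict, Tuple

# feature -> (importance weight, execution score); all values are hypothetical
Section = Dict[str, Tuple[float, float]]


def section_scores(matrix: Dict[str, Section]) -> Dict[str, float]:
    """Importance-weighted average execution score for each part of the matrix."""
    scores: Dict[str, float] = {}
    for section, features in matrix.items():
        total_weight = sum(weight for weight, _ in features.values())
        if total_weight <= 0:
            raise ValueError(f"section {section!r} has no weighted features")
        weighted = sum(weight * score for weight, score in features.values())
        scores[section] = weighted / total_weight
    return scores


# Made-up evaluation of one product, execution scored on a 1-5 scale.
matrix = {
    "platform": {"auto discovery": (3, 4), "ease of upgrade": (5, 2)},
    "console": {"drill-down": (4, 5), "mobile access": (2, 3)},
    "alerts": {"dependency consolidation": (5, 4)},
    "reports": {"user-configurable reports": (3, 2)},
}

scores = section_scores(matrix)
for section, score in scores.items():
    print(f"{section}: {score:.2f}")
```

Changing the matrix as systems evolve then means editing the weights or adding features, not rebuilding the whole thing.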
Developer at a comms service provider with 501-1,000 employees
Vendor
2014-03-25T13:04:50Z
Mar 25, 2014
I cannot share a matrix, but I do have a general philosophy that helped me get started on determining value and identifying best-of-breed solutions. There are three axes of functionality that you define for yourself:
1. Internally facing features (supports your infrastructure) vs Externally facing features (supports customers)
2. Availability of core features vs Availability for extension & integration
3. Standard Features vs Disruptive Features
Some solutions will be strong in one but not the other. Look for those that align with what you value.
I would be interested to learn more about this. Randy has raised some good points.
All valid points above.
In our case we find ourselves sometimes in the position to evaluate this on behalf of our customers.
A point which I do not think has been mentioned in the above comments is the maturity of the organisation. Some organisations are very immature / old school with regard to APM, and should therefore start with the minimum and get value from that. More mature organisations might get more value from more complete / complex solutions.
Gartner has the APM Magic Quadrant and the newly released NPMD Quadrant. They have some good qualifications...
I define the APM side as agent instrumentation within Java and .NET, while the new NPMD is the network and application combo...
Disclaimer: I work for Fluke Networks
I would suggest these primary criteria:
1. How straightforward is the product to set up and instrument your applications? Remember that you'll probably be adding applications constantly, so your team should be able to understand and execute the installation and configuration process for new applications as they come online. Invest in a rich test environment that looks and acts like production, where you can try instrumenting a new application before deploying it.
2. Does it give you visibility into the applications you care about? Some applications don't let you instrument critical sections of their startup sequence, but once up and running, you shouldn't have major blind spots -- it should see each application tier and the network segments that connect them.
3. How much overhead does it consume? The product shouldn't use "much", but how much depends on how much headroom you have on your systems. It's worth a little extra memory or CPU to get visibility, but plan for it.
4. Does the product help you hunt down the root cause of the problem, or does it just tell you "you have a problem" without getting you any closer to fixing it? Does it make the nature of the problem clear enough for your team to take action?
5. What's the signal-to-noise ratio? Does the product highlight real problems that need immediate attention, and separate these from the things it's nice to know but that aren't immediately actionable?
Disclaimer: I work for Riverbed
I would be interested to get one too. Please share, thanks!