After implementing it in production, we initially saw quite a few incidents, alerts, and problems, because the monitoring unearthed some technical debt.
However, after implementing “smoke detectors” and resolving the technical debt, the applications have become much more stable, with fewer production incidents, fewer defects, and higher reliability and quality of service.
Cross-platform business transaction tracing, custom triggers, thresholds and alerting, and deep dive root cause analysis are the most valuable features.
Cross-platform business transaction tracing supports the ability to monitor end-to-end performance across the stack, providing granular insight into customer experience KPIs, which are a critical success factor for organizations. Business transaction tracing also allows identification and drill down into problems within the stack, reducing the mean time to detect and resolve problems.
Developing custom triggers and thresholds for problem management allows the introduction of “smoke alarms”, which can prevent severity 1 system failures, improve productivity (fewer war rooms), and increase application stability, reliability, and performance.
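To illustrate the “smoke alarm” idea in general terms, here is a minimal sketch of a threshold trigger that fires only when response times breach a limit repeatedly within a short window, rather than on a single spike. This is not the product's API or configuration format; the class name `SmokeAlarm` and the parameters `threshold_ms`, `window`, and `min_breaches` are hypothetical, chosen only for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SmokeAlarm:
    """Hypothetical trigger: fire when enough recent samples breach a threshold."""
    threshold_ms: float   # response time above which a sample counts as a breach
    window: int           # how many recent samples to keep
    min_breaches: int     # breaches within the window needed to fire
    _recent: deque = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # deque with maxlen drops the oldest sample automatically
        self._recent = deque(maxlen=self.window)

    def observe(self, response_ms: float) -> bool:
        """Record one sample and return True if the alarm should fire."""
        self._recent.append(response_ms > self.threshold_ms)
        return sum(self._recent) >= self.min_breaches


alarm = SmokeAlarm(threshold_ms=500, window=5, min_breaches=3)
readings = [120, 640, 710, 300, 820]
fired = [alarm.observe(r) for r in readings]
print(fired)  # [False, False, False, False, True]
```

Requiring several breaches before firing is one way to catch a developing problem early while keeping alert noise low; the actual thresholds would need to be tuned per application.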
I would like them to provide more guidance on tuning specific monitoring options to avoid unacceptable overhead. This has proven to be a bit of a challenge, particularly with cross-enterprise CICS monitoring overhead, which we are currently working through with CA. They have been very helpful in advising us on this.
We have had stability issues. Not all applications are equal, and what works well for one app may not be applicable to another.
“Chatty” apps, excessive monitoring, or sampling can cause memory issues and server crashes, so it takes some trial and error to find monitoring levels that have minimal impact on the performance and stability of environments.
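As a generic illustration of how a monitoring level can be dialed up or down, the sketch below traces only a configurable percentage of transactions, chosen deterministically from the transaction ID. It does not reflect how the product itself samples; the function name `should_trace` and the parameter `sample_percent` are assumptions made for this example.

```python
import zlib


def should_trace(transaction_id: str, sample_percent: int) -> bool:
    """Hypothetical sampler: trace roughly sample_percent of transactions.

    The CRC32 of the ID maps each transaction to a stable bucket from 0 to 99,
    so the same transaction always gets the same decision.
    """
    bucket = zlib.crc32(transaction_id.encode("utf-8")) % 100
    return bucket < sample_percent


# Start low for a "chatty" app and raise the percentage while watching overhead.
ids = [f"txn-{i}" for i in range(1000)]
traced = sum(should_trace(t, 10) for t in ids)
print(f"traced {traced} of {len(ids)} transactions")
```

Because the decision depends only on the ID, any transaction traced at a low percentage is still traced after the percentage is raised, which keeps results comparable while tuning.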
We have not had any scalability issues.
Technical support is excellent.
Other parts of the organization are using Dynatrace, but both products are very capable.
The initial setup was relatively straightforward, with good documentation and technical support, both via email and on-site.
There is some research needed to determine licensing costs, based on the number of DataPower instances for the Nastel agent, for example.
We didn’t evaluate other options as this was pre-determined via our group administration.
- Make sure you sign up for the appropriate training
- Ensure you have dedicated on-site resources to implement, build, and maintain the tooling
- Make sure you extend training to the production support, developers, and business stakeholders
- Develop a monitoring and alerting strategy, with a supporting roadmap and pipeline for deployment across business-critical systems.
While out-of-the-box monitoring is relatively straightforward, the value comes from custom implementation of alerts and dashboards that focus on specific application functionality.
Very valuable inputs regarding APM, Carlos. Thank you for sharing. Ravi Suvvari