What is most valuable?
The features that are valuable to us are the business transaction transparency from one tier to the next and the ability to drill down into the call stack. There is also the ability to identify stalled and error transactions in real time, investigate them, and pick up the trends. That's one of the most useful things, because we use it as part of our root cause analysis, as a proactive as well as a reactive way to look at an incident and see what we can do to fix it.
For example, without getting into the specifics of the issue, we've had some issues with our application around the tracing functionality we use to write logs and things like that. One of these had been enabled, and it was writing to a file instead of writing to an HW, which was costing a lot of I/O. Unfortunately, at the time, the file share server that was taking all these logs was having an issue with I/O. But that wasn't apparent, because the customer's experience was simply that the transaction was taking longer to complete. We were trying to understand where the bottleneck was, because everything looked healthy, but the requests kept stacking up.
But then, when we looked into AppDynamics, it made it very easy for us to identify that it was trying to write to a log. Out of the entire chain, that operation was the one step where it was trying to write to a location, and that's where it was reporting a huge latency. In a matter of, I'd say, about 15-20 minutes, we were able to trace it, identify what the issue was, and fix it. In fact, it drove a chain of reactions in retrospect, because obviously it meant we need to look into these things much more carefully to avoid these kinds of incidents in the future.
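To illustrate the idea of spotting the one step in a chain that reports a huge latency, here is a minimal sketch in Python. It is not AppDynamics code or its API; the function name `slowest_step`, the step names, and the timings are all made up for illustration.

```python
def slowest_step(steps):
    """Return (name, share_of_total) for the step with the highest latency.

    `steps` is a list of (step_name, latency_ms) pairs for one transaction.
    """
    total = sum(ms for _, ms in steps)
    name, ms = max(steps, key=lambda step: step[1])
    return name, ms / total


# Hypothetical timings for one slow transaction (illustrative values only).
transaction = [
    ("web tier", 20),
    ("service call", 30),
    ("write trace log to file share", 950),
]

name, share = slowest_step(transaction)
print(f"{name} accounts for {share:.0%} of the response time")
```

In a case like ours, a breakdown of this kind points straight at the log write rather than at the tiers that look healthy.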
How has it helped my organization?
AppDynamics lets you find things that you wouldn't otherwise be able to see.
Without APM, you'd be spending a lot more time trying to investigate all the individual event logs. Our services are massive. It's not a simple application with a front end and a back end. We have a lot of other micro-services that talk to each other. I think one of the trainers at a recent conference mentioned that one single touch starts a chain reaction. When you have such a topology, it's very difficult to go through every single layer manually and figure out where the bottleneck is. APM, by contrast, gives you an end-to-end workflow and shows you exactly which layer AppDynamics thinks is having a problem. Then it lets you drill down further and further. The zooming capability is brilliant.
I'm not aware that we use any other AppDynamics products along with APM. I've used the reporting and things like that. I'm part of an incident response team, so we are the command center for AppDynamics products. We are more focused on the operations side of things.
What needs improvement?
One of the things that I've noticed is that when you run at a massive scale, turning on too much data logging is not possible. So what sometimes happens is that we use the snapshot capabilities to a minimum. But as a result, we miss certain transactions for which we need the snapshot.
I was working on a case and I knew what the problem was. I knew what the root cause was. I was trying to reproduce that case so I could collect the data in APM, which is a lot more user-friendly. I knew what the issue was, but if I needed to explain it to someone, I didn't want to write an email. I wanted a diagram view of the issue, so I was trying to reproduce it.
It took me a long time to get that snapshot into APM, because I think it wasn't taking very frequent snapshots. That's probably down to the way we configured APM, because of the volume of data it generates. They probably deemed it necessary not to take every snapshot because, obviously, it's a very expensive operation and it costs a lot of I/O and performance as well. So that is something I would say would be useful: I'd like to be able to take snapshots much more frequently, if that's possible in any way.
The monitoring capability could be improved. It's dateless right now. At a recent conference, I think one of the CTOs or COs mentioned that they're working with another monitoring solution to integrate it. At the moment, it does have a monitoring capability, but it's very, very basic. Just to give you an example: let's say you get an alert. You don't want another alert in the next five minutes to say that it's still down. You need to be able to increase the counter on that alert to say, look, it's still down, but I don't want to trigger another alert. Every alert in our space means a ticket in our space, so you don't want to flag a hundred alerts for the same type of issue if you already know what the issue is. So it's those capabilities: the integration with the existing monitoring capability, and that smooth transition. In fact, I was just looking at my email today. I have something like 15 emails from APM. It's just way too much traffic for me.
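To make the "increase the counter instead of sending another alert" idea concrete, here is a minimal sketch in Python. It is not how AppDynamics implements alerting; the class name `AlertDeduplicator`, the alert key, and the five-minute cooldown are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class OpenAlert:
    first_seen: float  # time of the alert that raised the ticket
    last_seen: float   # time of the most recent repeat
    count: int = 1     # how many times this issue has fired


class AlertDeduplicator:
    """Collapse repeats of the same issue into one ticket plus a counter."""

    def __init__(self, cooldown_seconds=300.0):
        self.cooldown_seconds = cooldown_seconds
        self.open_alerts = {}  # alert key -> OpenAlert

    def handle(self, key, now):
        """Return True if a new ticket should be raised, False if only the
        counter on the existing alert was increased."""
        alert = self.open_alerts.get(key)
        if alert is not None and now - alert.last_seen <= self.cooldown_seconds:
            alert.count += 1
            alert.last_seen = now
            return False
        self.open_alerts[key] = OpenAlert(first_seen=now, last_seen=now)
        return True


dedup = AlertDeduplicator(cooldown_seconds=300)
for t in (0, 60, 120, 180):  # the same issue firing once a minute
    if dedup.handle("file-share-io", now=t):
        print(f"t={t}s: raise ticket")
    else:
        print(f"t={t}s: still down, count={dedup.open_alerts['file-share-io'].count}")
```

With this kind of behavior, a hundred repeats of a known issue would produce one ticket and a counter of one hundred, rather than a hundred tickets.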
What do I think about the scalability of the solution?
Scalability is part of our day-to-day jobs. At a recent conference, one of the trainers mentioned very clearly that there are no databases that are not growing. They are growing every day. The users are growing and the expectations are growing. They need faster and faster response times with complex systems. So scalability is a number one priority for us. When a customer gets on-boarded, they are relatively small, but as time passes, they grow. If you provision the capacity based on their initial requirements, eventually you'll hit a problem with scalability.
So it's very important to keep those factors in mind. The best way to look at it is through the usage analytics and the response rate. And the best part, and this is something that I took away from recent training, is the base-lining. You don't want to be too late in identifying that you're hitting scalability issues. By then, customers would start experiencing issues. If you see a deviation in performance from your baseline data, I think that's when you need to start thinking, okay, it looks like the usage is going up. How do we scale better? How do we get more capacity, or fine-tune if that's in any way possible, or distribute it? That's what I do every single day.
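As a rough illustration of checking for a deviation from baseline data, here is a minimal sketch in Python. It is not the way AppDynamics computes its baselines; the function name `deviates_from_baseline`, the three-standard-deviation threshold, and the sample response times are assumptions for illustration only.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline_ms, current_ms, threshold=3.0):
    """Return True if current_ms is more than `threshold` standard
    deviations above the mean of the baseline response times."""
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    if sigma == 0:
        # Perfectly flat baseline: any increase counts as a deviation.
        return current_ms > mu
    return (current_ms - mu) / sigma > threshold


# Hypothetical baseline response times in milliseconds.
baseline = [100, 102, 98, 101, 99]

print(deviates_from_baseline(baseline, 101))  # within normal variation
print(deviates_from_baseline(baseline, 150))  # well above the baseline
```

The point is to flag the drift early, while it is still a capacity question and before customers start experiencing it.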
How is customer service and technical support?
We have not really used technical support. I'm not on the side of configuring APM. I'm a user of APM. I just look at the data that it's already providing to me. When there are questions, we usually pass them on to our guys who work with AppDynamics to get them sorted out. I'm more of a subscriber to that.
What other advice do I have?
I want a vendor to be honest. I've never been involved in those kinds of conversations, but I'd expect them to tell me exactly what it does and exactly what it doesn't do. Nobody expects a product to be perfect. Nobody expects a product to have every single bell and whistle. But if you sell it that way, the customer is going to be disappointed. I'd rather know that upfront. And probably set up a roadmap and say, look, we have these features in the pipeline, which is a much more realistic conversation.
My advice is that before you turn on APM, think about what's important to you. Don't go ballistic on putting everything under the sun under AppDynamics. The danger of doing so, the side effect, is that you're looking at way too much information and it gets foggy. Start with a subset that is critical to your business. Understand it from a customer perspective. Don't look at it from an operational perspective. Where do the customers feel the pain the most? Start with that and then start instrumenting those. Try to get as specific as possible, because that way whatever you're looking for in APM is important to you. If I'm an operations person and I'm dealing with hundreds of incidents every day, I'd like to see the incident that I'm actually working on. So try to reduce the noise ratio as much as possible, and look at the important ones that you should be looking into and acting on straight away. I think that's probably the key advice that I would give anybody who wants to implement not just AppDynamics, but any APM, in their products.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Yes, it avoids alert storms. There are cooldowns, etc. Yes, AppDynamics is doing their best to go full HTML5 in their newest version.