We use it just to show the response time of the app and how much time each request spends in each tier. For example, the database takes a tenth of a second, web takes two-tenths, and Ruby takes one-tenth.
I can tell you that there are two use cases. One is to look at how we are performing on average over time. So, assuming the systems are healthy and we look at the last month, the average response time is very good at telling us that. Let's say I add capacity; I can see it trend downwards. The other use case is to pinpoint key problems happening right now. Say Uber calls and says that their Freshdesk isn’t working. We look in and can see clearly that only the database is taking 27 times longer than it should, and that helps us solve acute breakages.
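As a rough sketch of that second use case, this is the kind of comparison being made when one tier stands out. It is not how New Relic works internally. The baseline numbers are the per-tier times quoted earlier, converted to milliseconds, while the function name `find_slow_tiers`, the `threshold` parameter, and the "current" readings are made up for illustration.

```python
# Baseline per-tier times in milliseconds (database 0.1 s, web 0.2 s, Ruby 0.1 s).
BASELINE_MS = {"database": 100, "web": 200, "ruby": 100}


def find_slow_tiers(current_ms, baseline_ms=BASELINE_MS, threshold=2.0):
    """Return (tier, slowdown) pairs for tiers at least `threshold` times
    slower than their baseline, worst offender first.

    `threshold` is an assumed cutoff, not a value taken from any product.
    """
    slow = []
    for tier, base in baseline_ms.items():
        # A tier with no current reading is treated as running at baseline.
        ratio = current_ms.get(tier, base) / base
        if ratio >= threshold:
            slow.append((tier, ratio))
    return sorted(slow, key=lambda pair: pair[1], reverse=True)


# Hypothetical readings during an incident: only the database is off.
current = {"database": 2700, "web": 210, "ruby": 95}
print(find_slow_tiers(current))
```

With these made-up readings only the database is flagged, at 27 times its baseline, which is the situation described above.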
It helps with things that are broken right now, and it helps us understand what is happening in our system overall.
It just gives us an idea of how fast we're running. That's what it comes down to: how fast we're running and, if there is slowness, where it's coming from. It's really fast at telling us.
I would say that the user interface is maybe a little overly complicated, but I don’t have any specific way to improve it. Features that are rarely used should be less conspicuous and tucked away. Your iPhone has one or two buttons; if you need settings you can dig down to that stuff, but you don’t want it in your face when you take a call. Similarly, in New Relic there are all these things that I need to sift through mentally when I just want to see how fast the apps are responding. You can’t change that.
I want to see support for Docker statistics or container statistics. If we start running containers, will it be able to track data for a given container? If one container serves one purpose, we need to track how many requests it gets per day, when the usage peaks, and how much it varies week to week. And then that needs to be organized per container.
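To make that wish concrete, here is a minimal sketch of the per-container rollup I have in mind, assuming we could get a plain list of (container name, timestamp) pairs, one per request. That input format, the function name `summarize_by_container`, and the container names are all hypothetical; nothing here is an existing New Relic or Docker API.

```python
from collections import Counter, defaultdict
from datetime import datetime


def summarize_by_container(requests):
    """Group request timestamps by container.

    `requests` is an iterable of (container_name, datetime) pairs.
    Returns, per container: requests per day, the busiest hour of the day,
    and requests per ISO week (for week-to-week comparison).
    """
    per_day = defaultdict(Counter)
    per_hour = defaultdict(Counter)
    per_week = defaultdict(Counter)
    for name, ts in requests:
        per_day[name][ts.date()] += 1
        per_hour[name][ts.hour] += 1
        iso = ts.isocalendar()  # (ISO year, ISO week number, weekday)
        per_week[name][(iso[0], iso[1])] += 1

    return {
        name: {
            "requests_per_day": dict(per_day[name]),
            "peak_hour": per_hour[name].most_common(1)[0][0],
            "requests_per_week": dict(per_week[name]),
        }
        for name in per_day
    }


# Hypothetical request log for two single-purpose containers.
sample = [
    ("checkout", datetime(2024, 1, 1, 9, 0)),
    ("checkout", datetime(2024, 1, 1, 9, 30)),
    ("checkout", datetime(2024, 1, 2, 14, 0)),
    ("checkout", datetime(2024, 1, 8, 9, 15)),
    ("search", datetime(2024, 1, 1, 20, 0)),
]
summary = summarize_by_container(sample)
print(summary["checkout"]["peak_hour"])
print(summary["checkout"]["requests_per_week"])
```

Keeping everything keyed by container name first is the point: each container serves one purpose, so its daily volume, peak hour, and weekly trend should be readable on their own.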
Finally, there's a lot of noise on the front page that I don’t want to see. If it is customizable, then that isn't obvious. Another issue is that it always shows the wrong throughput for the most recent five minutes. The data is wrong for that window and only corrects itself after the five minutes are up.
We've had it for more than two years.
10/10 - it's very stable. It went down, maybe, once.
Super scalable. For us it scales for what we need. To me, scalability means: does it work as well with 500 nodes as it does with three? It does. It doesn't break when you scale it up.
I don’t know. I've never had a problem or an issue with it. It's really simple, like your microwave. It does its job well, so I've never needed support.
This has always been our solution since we started.
We looked for a product that gives the data in a way that is easy to visualize, is reliable, and is simple.
That proactive baseline piece, whether you call it AI or ML or whatever, is always the gold that seems hard to find, even in products that say they have it. We got something put together in Splunk, but it was work. Our NetScout stuff had it, but we did not know it was built in until recently. We are testing it right now.