What is our primary use case?
We are trying to get a handle on observability. Currently, our picture of the overall health of the stack is largely anecdotal: users report issues, and Kubernetes pods go down. We need to be more scientific so that we can catch problems early and fix them faster.
We are a new company, so our user base is relatively small yet growing very fast. We need to predict usage growth better and identify problematic implementations that could become bottlenecks. Our relatively small size has allowed us to be somewhat complacent about performance monitoring; however, we need to have that visibility.
How has it helped my organization?
We are still taking baby steps with Datadog, so it's hard to come up with quantifiable information. The most immediate benefit is aggregating performance metrics together with log information. A better understanding of observability will help my team focus on the business problems they are trying to solve and write code that is conducive to being monitored, instead of reinventing the wheel and relying on their own logic to produce metrics that lack context.
What is most valuable?
The most useful feature is the APM. Being able to quickly see which requests are time-consuming and which calls have failed is invaluable. Being able to click on a UI and be pointed to the exact source of the problem is like magic.
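For readers who haven't tried this, here is a minimal sketch (not from the review) of how a slow code path can be surfaced as its own span in the APM flame graph using Datadog's Python tracer. It assumes the `ddtrace` package is installed and a Datadog Agent is running; the service, resource, and tag names are hypothetical.

```python
# Minimal sketch: wrap a slow code path in a custom span so it appears
# as its own entry in the APM flame graph.
from ddtrace import tracer

def fetch_report(report_id):
    # Everything inside this block is timed and reported as one span.
    # "billing-api" and "report.fetch" are hypothetical names.
    with tracer.trace("report.fetch", service="billing-api",
                      resource="fetch_report") as span:
        span.set_tag("report.id", report_id)
        # ... slow database call or downstream request would go here ...
        return {"id": report_id}
```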
I'm also very intrigued by log management, although I haven't yet had a chance to use it very effectively. In particular, the trace and span IDs don't quite seem to work for me; however, I'm very keen on getting this to work. It will also help my developers be more diligent and considerate when creating log data.
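For context, trace/log correlation in Datadog's Python tracer works by injecting `dd.trace_id` and `dd.span_id` attributes into log records; the IDs only show up if the log format string references them, which is a common reason the correlation appears not to work. Below is a hedged sketch based on the documented `ddtrace` log-injection mechanism; the logger and span names are made up.

```python
# Hedged sketch of trace/log correlation with ddtrace. patch(logging=True)
# adds dd.trace_id / dd.span_id attributes to every log record, but they
# only appear in output if the log format string references them.
import logging

import ddtrace

ddtrace.patch(logging=True)  # inject dd.* attributes into log records

FORMAT = ("%(asctime)s %(levelname)s [%(name)s] "
          "[dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] "
          "- %(message)s")
logging.basicConfig(format=FORMAT, level=logging.INFO)
log = logging.getLogger("checkout")  # hypothetical logger name

with ddtrace.tracer.trace("checkout.process"):  # hypothetical span name
    # Inside an active span the IDs are non-zero, so this log line can be
    # matched to its APM trace in the Datadog UI.
    log.info("processing order")
```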
What needs improvement?
As a new customer, I find the Datadog user interface a bit daunting. It gets easier once one has had a chance to get acquainted with it, yet at first, it is somewhat overwhelming. Maybe having a "lite" interface with basic features would make it easier to climb the learning curve.
Maybe the feature already exists; however, I'm not sure how to keep dashboard designs and synthetic tests in source control. For example, we may replace a UI feature and rebuild a test accordingly in a pre-production environment, yet once the code is promoted to production, the updated test would also need to be promoted.
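For what it's worth, dashboards and synthetic tests can be managed as code: Datadog's Terraform provider offers `datadog_dashboard` and `datadog_synthetics_test` resources, and the public API lets you export definitions as JSON. Below is a rough sketch of the export approach using the documented v1 dashboard endpoint; the dashboard ID, output path, and environment-variable names are placeholders, and this is one possible workflow rather than a confirmed one.

```python
# Rough sketch: export a dashboard's JSON definition via the Datadog v1 API
# so it can be committed to git. DASHBOARD_ID and the output path are
# placeholders; DD_API_KEY / DD_APP_KEY env vars are assumed to be set.
import json
import os

import requests

API = "https://api.datadoghq.com/api/v1/dashboard"
DASHBOARD_ID = "abc-123-xyz"  # placeholder dashboard ID

resp = requests.get(
    f"{API}/{DASHBOARD_ID}",
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
    timeout=10,
)
resp.raise_for_status()

# Commit the file; POSTing the same JSON back to /api/v1/dashboard in
# another environment recreates the dashboard there.
with open("service-overview.json", "w") as f:
    json.dump(resp.json(), f, indent=2, sort_keys=True)
```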
For how long have I used the solution?
We have just started with the solution and have only been using it for about two months.
What do I think about the stability of the solution?
We're new at this. That said, so far, there haven't been any issues to report.
What do I think about the scalability of the solution?
I have not had the opportunity to evaluate the scalability.
How are customer service and support?
Customer support is full of great folks! We're beginning our Datadog journey, so I haven't had that much experience. The little I have had has been great.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
Datadog is all new to us. We previously worked with New Relic, which has an amazing APM solution. However, it became cost-prohibitive.
How was the initial setup?
Since our environment is relatively greenfield, it was painless to set up the product.
What about the implementation team?
Our in-house DevOps team did the implementation.
What was our ROI?
I don't know what the ROI is at this stage.
What's my experience with pricing, setup cost, and licensing?
I'm not sure what the exact pricing is.
What other advice do I have?
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.