What is our primary use case?
We primarily use Datadog for alerts. If we're running out of database connections or CPU credits, we want to find out in Slack. Datadog's monitors and Slack notifications handle that well.
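As a rough illustration of the kind of alert described above, here is a minimal sketch that builds a metric monitor definition as a plain Python dictionary. It only constructs the payload and does not call Datadog. The helper name `build_metric_monitor`, the `env:prod` scope, the threshold of 80, and the `ops-alerts` channel are all hypothetical. The metric name and the `@slack-` mention format are assumptions to verify against your own account and integration setup.

```python
import json


def build_metric_monitor(name, metric, scope, comparator, critical,
                         slack_channel, window="last_5m"):
    """Build a monitor definition as a dict (hypothetical helper, no API call)."""
    # Query shape: <aggregation>(<window>):<aggregation>:<metric>{<scope>} <op> <threshold>
    query = f"avg({window}):avg:{metric}{{{scope}}} {comparator} {critical}"
    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        # Assumed Slack handle format; the channel name is made up for this example.
        "message": f"{name}: threshold {critical} crossed. @slack-{slack_channel}",
        # The critical threshold here should match the value used in the query.
        "options": {"thresholds": {"critical": critical}},
    }


monitor = build_metric_monitor(
    name="RDS connections high",
    metric="aws.rds.database_connections",  # assumed metric name from the AWS integration
    scope="env:prod",                       # hypothetical tag scope
    comparator=">",
    critical=80,                            # hypothetical threshold
    slack_channel="ops-alerts",             # hypothetical channel
)

print(json.dumps(monitor, indent=2))
```

The same helper could be reused for a CPU credit alert by swapping in the relevant metric and a `<` comparator, since that alert fires when the balance drops rather than when it rises.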
Secondarily, we use Datadog for analyzing historical trends and forecasting potential issues.
I'm also learning how to add Continuous Profiler to our primary backend servers and set up Synthetic Tests to monitor our front end.
Almost everything we run is on AWS, and the Datadog integrations help a ton.
How has it helped my organization?
Datadog has helped us a ton by letting us set up a multitude of easily configurable alarms across our tech stack and infrastructure. Whether a workload runs in AWS Lambda or in a Docker container on AWS EC2, Datadog's intuitive interface makes alarms incredibly easy to configure, which has reduced our resolution time for incidents.
A lot of the value comes from how frictionless the integrations are. Adding a Datadog agent or flipping a switch in the Datadog UI to start streaming Lambda data makes the product incredibly appealing for my company.
What is most valuable?
The monitoring feature has been the most valuable.
I really like the dashboard. Monitoring has a straightforward tie-in to business value at my company (e.g., declaring incidents). Things like having a dashboard and APM make my job easier. That said, DevX is a somewhat harder sell to executives at my company.
The dashboard feature makes it easy to inspect multiple metrics at once across services. It's truly been a lifesaver when I'm trying to understand why performance is degrading.
What needs improvement?
I found that the documentation can sometimes be confusing. I tried configuring APM for some of our Python containers, and I had to cross-reference multiple blog posts and the official documentation to figure out which Datadog Agent to use, whether I needed ddtrace for tracing, which environment variables I should set, and so on.
Furthermore, when I went to generate my own traces, I wasn't aware that ddtrace applies its own "monkey patching," which led to headaches when configuring the service for RabbitMQ.
A more unified and up-to-date documentation suite would be greatly appreciated.
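For readers unfamiliar with the term, the monkey patching mentioned above means replacing a library's functions at runtime with wrapped versions. The sketch below shows the general technique using only the standard library. It is not ddtrace's actual implementation, and `Publisher`, `patch_method`, and `recorded_spans` are hypothetical names invented for this example.

```python
import functools
import time


class Publisher:
    """Hypothetical stand-in for a message-queue client class."""

    def publish(self, routing_key, body):
        return f"sent {len(body)} bytes to {routing_key}"


# Timing records collected by the patched method (illustration only).
recorded_spans = []


def patch_method(cls, method_name, span_name):
    """Replace cls.method_name with a wrapper that records how long each call took."""
    original = getattr(cls, method_name)
    if getattr(original, "_is_patched", False):
        return  # already wrapped; avoid recording every call twice

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            recorded_spans.append(
                {"name": span_name, "duration_s": time.perf_counter() - start}
            )

    wrapper._is_patched = True
    setattr(cls, method_name, wrapper)


# After this call, every Publisher instance uses the wrapped method,
# even though the calling code never asked for it explicitly.
patch_method(Publisher, "publish", "queue.publish")

result = Publisher().publish("orders", b"hello")
print(result, recorded_spans[0]["name"])
```

Because the replacement happens on the class itself, code that merely imports and uses the client is affected, which is why this kind of patching can be surprising when it is not called out clearly in the documentation.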
For how long have I used the solution?
I've used the solution for about two years.
What do I think about the stability of the solution?
I don't recall seeing an incident from Datadog in the past couple of years, and that's been wonderful.
What do I think about the scalability of the solution?
The solution is incredibly scalable! To be fair, our data throughput to Datadog isn't huge, but we have never seen issues as it scaled to handle more of our data.
Which solution did I use previously and why did I switch?
We used to use AWS CloudWatch for a lot of our monitoring needs, but the interface felt clunky, confusing, and limited.
What was our ROI?
We don't have hard numbers on ROI. That said, overall, it has been a wonderful addition to our tooling suite.
Which other solutions did I evaluate?
We also looked at Honeycomb and are currently using both in production.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.