What is our primary use case?
We primarily use Datadog for performance and log monitoring of cloud environments, which include VMs and Azure services such as compute, storage, network, firewall, and app services via Event Hubs.
Alerting based on monitors via Microsoft Teams and PagerDuty.
Log collection for Azure services like Azure database, Azure Application Gateway, AKS, and other Azure services.
Custom metrics using a Python script to collect metrics for components not natively supported by Datadog.
Synthetic testing to ensure uptime and browser tests via CI/CD pipeline.
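As an illustration of the custom metrics item above, here is a minimal sketch of how a Python script might push a gauge to a local Datadog Agent over DogStatsD. It is not our actual script: the metric name, the tags, and the helper names format_gauge and send_gauge are hypothetical, and it assumes an Agent listening on the default DogStatsD UDP port 8125.

```python
import socket


def format_gauge(name, value, tags=None):
    """Build a DogStatsD gauge datagram: <name>:<value>|g|#<tag1>,<tag2>."""
    payload = f"{name}:{value}|g"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_gauge(name, value, tags=None, host="127.0.0.1", port=8125):
    """Send one gauge to the Agent over UDP (fire-and-forget, no reply)."""
    datagram = format_gauge(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
    return datagram


if __name__ == "__main__":
    # Hypothetical metric for a component without a native integration.
    send_gauge("custom.queue.depth", 42, tags=["env:prod", "cloud:azure"])
```

Keeping the formatting separate from the sending makes the script easy to check without a running Agent.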
How has it helped my organization?
Datadog has improved our visibility into infrastructure topology and performance. It provides a simplified view and the ability to drill down into system performance, process usage, and logs.
We were able to set up monitors for infrastructure and applications, as the metrics were readily available in the platform. Fine-tuning monitors is very easy and the ability to configure monitor alerts with details on how to resolve the alert is a key value add.
Integration with PagerDuty and Microsoft Teams ensures timely alerting. The PagerDuty integration brings tags from Datadog into PagerDuty, which is very useful for routing incidents to the right service.
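To illustrate the points above about resolution details and notification routing, here is a rough sketch of a monitor definition in the shape used by Datadog's monitor API. It is an assumption-laden example, not our real configuration: the query, threshold, resolution text, tags, notification handle names, and the helper build_cpu_monitor are all placeholders.

```python
import json


def build_cpu_monitor(service, pagerduty_service, teams_channel, threshold=90):
    """Return a monitor definition dict; every name here is illustrative."""
    message = (
        "{{#is_alert}}\n"
        f"CPU is above {threshold}% on " "{{host.name}}.\n"
        "How to resolve: check Live Processes for the top consumer, "
        "then restart or scale the service.\n"
        # Notification handles route the alert to PagerDuty and Teams.
        f"@pagerduty-{pagerduty_service} @teams-{teams_channel}\n"
        "{{/is_alert}}\n"
        "{{#is_recovery}}CPU has recovered on {{host.name}}.{{/is_recovery}}"
    )
    return {
        "name": f"[{service}] High CPU on " + "{{host.name}}",
        "type": "metric alert",
        "query": (
            f"avg(last_5m):avg:system.cpu.user{{service:{service}}} "
            f"by {{host}} > {threshold}"
        ),
        "message": message,
        "tags": [f"service:{service}", "team:platform"],
        "options": {"thresholds": {"critical": threshold}},
    }


if __name__ == "__main__":
    # Placeholder service, PagerDuty service, and Teams channel names.
    print(json.dumps(build_cpu_monitor("checkout", "Checkout", "ops-alerts"), indent=2))
```

The message carries the resolution steps alongside the alert, and the tags on the monitor are the kind of metadata that can then be used for routing on the PagerDuty side.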
What is most valuable?
The Host Map and Live Processes views provide performance metrics for our application. The support team likes using Datadog for identifying affected resources and obtaining the logs.
Monitors are easy and quick to set up. Metrics are easily accessible and quick to use. The ability to send notifications based on metadata from the monitor is helpful. Monitors are set up once and work for all workloads, whether on Azure or any other cloud.
Log rehydration lets us archive logs and rehydrate them as needed. We don't need logs to be indexed at all times. Logs are required only for escalations, so rehydrating does the job and provides cost savings.
What needs improvement?
We need the ability to create a service dependency map like Splunk ITSI. We have to build this in PagerDuty, and it's not the best user experience. The ability to create custom inventory objects based on ingested logs would be a value add. It would be better if Datadog made this a simple click-to-enable feature.
It would be helpful to have the ability to upgrade agents via the Datadog portal. Once agents are connected to the Datadog portal, we should be able to upgrade them quickly.
Security monitoring for Azure and for operating systems (Windows and Linux) needs to be addressed.
Dashboards for Azure Active Directory metrics and events should be improved.
For how long have I used the solution?
We have been using Datadog for more than six months.
What do I think about the stability of the solution?
Stability-wise, it has been good.
What do I think about the scalability of the solution?
The scalability is good so far.
How are customer service and technical support?
The support team has been very responsive. Our only complaint is that, on issues they don't understand, they should have a quick call and unblock the customer.
Which solution did I use previously and why did I switch?
We didn't have a solution in place. The only thing we had was logs.
How was the initial setup?
Setup is hassle-free and pretty straightforward.
What about the implementation team?
What was our ROI?
No returns yet. We are in growth mode. If this becomes expensive, we may have to look at alternative options.
What's my experience with pricing, setup cost, and licensing?
The cost is high, though it can be justified if the environment is large.
Datadog needs to provide better pricing for large customers.
Which other solutions did I evaluate?
Prior to implementing Datadog, we evaluated Splunk.
What other advice do I have?
Overall, the Datadog product is really good.
It doesn't need a sales team, and yet the sales team has screwed up on some occasions. It's a great product, and the customer success team needs to put in extra effort to help customers with best practices rather than passing them off to support.
Customer success doesn't evangelize product features, and customers don't know what new features are coming unless they ask about them.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.