AWS Cloud Architect Consultant at a transportation company with 10,001+ employees
Gives us integrated monitoring insights across multiple cloud providers
Pros and Cons
- "They have a very good foundation in capturing metrics, logs, and traces. It's a very nice tool for that and it allows you to apply these monitoring tools in almost any technology."
- "I'm not sure what kind of features are in the roadmap right now, but I encourage the development of features for defining your organization, and allowing the visibility of what kind of metrics you can get. Those features would be really useful for us."
What is our primary use case?
We are evaluating Datadog for our company's observability and monitoring requirements. In our use case, our intention is to provide some kind of framework for multiple app teams to use the tool for our cyber ability and engineering practices.
What is most valuable?
They have a very good foundation in capturing metrics, logs, and traces. It's a very nice tool for that and it allows you to apply these monitoring tools in almost any technology.
Even if you have several layers, containers, EC2 instances, build machines or whatever you need in your infrastructure, Datadog can integrate with all of them across multiple cloud providers. It's a great product.
What needs improvement?
One of the improvement opportunities that we have identified in my project concerns how hard it is to manage an organizational structure when you have multiple things in one organization, and you want to provide some kind of isolation between them. At the same time, from the management perspective, you want to see an overall overview of what is happening in your business unit, or as a whole division. This is the kind of limitation we're facing.
I'm not sure what kind of features are in the roadmap right now, but I encourage the development of features for defining your organization, and allowing the visibility of what kind of metrics you can get. Those features would be really useful for us.
For how long have I used the solution?
I have been using Datadog for about six months.
Buyer's Guide
Datadog
January 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
825,609 professionals have used our research since 2012.
What do I think about the scalability of the solution?
It's a very scalable product. Right now we are using the SaaS version, so we don't need to worry about the infrastructure or anything else the platform runs on. All captured data is sent to the SaaS product, which can scale as needed.
How are customer service and support?
So far their support is pretty nice. They have established many meetings and training sessions, and they are supporting our requirements very well. I don't have any complaints with Datadog support.
What's my experience with pricing, setup cost, and licensing?
While it is an expensive product, I would rate the pricing level at four out of five.
What other advice do I have?
Normally, the primary reason why people use this kind of tool is observability, but right from the beginning you have to understand what observability is, what it means for your company, and how the tool is going to help you capture the proper metrics for making your applications observable.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Software Engineer at a computer software company with 51-200 employees
Excellent autocomplete for everything in the UI
Pros and Cons
- "Excellent autocomplete for everything in the UI."
- "It has empowered all our platform engineers with a very powerful and easy to use monitoring system."
- "Going from viewing a metric to creating a monitor alerting on a metric is very easy."
- "The web app has a real-time support chat window in which a support engineer is chatting with you within a minute."
- "It would be nice to be able to graph metrics by excluding certain tags (like you can do in monitors)."
- "It would also be nice if we had more insight into our own usage of Datadog (agents and custom metrics). They provide a usage page which does help, but it is not in real-time."
- "It would be great if usage metrics were automatically created and we could create custom metrics, instead we ended up building some of our own stuff to track and alert on our own usage."
What is our primary use case?
We run the agent in AWS.
How has it helped my organization?
It has empowered all our platform engineers with a very powerful and easy to use monitoring system. Most of our platform organization is now involved in monitoring. Previously, only a handful of platform engineers were involved, because Graphite and Sensu were so cumbersome to use.
What is most valuable?
It is incredibly easy to do common monitoring actions:
- Excellent autocomplete for everything in the UI.
- Using tags is very intuitive (in contrast to the cumbersome regex-like based querying in Graphite).
- Going from viewing a metric to creating a monitor alerting on a metric is very easy. This is very important as the easier it is to create monitors, the more monitors will be created by people. With Graphite and Sensu, the effort required to create and test a monitor was so great that we had only a handful of monitors. We now have over 300 monitors.
What needs improvement?
- It would be nice to be able to graph metrics by excluding certain tags (like you can do in monitors).
- It would also be nice if we had more insight into our own usage of Datadog (agents and custom metrics). They provide a usage page which does help, but it is not in real-time.
- It would be great if usage metrics were automatically created and we could create custom metrics, instead we ended up building some of our own stuff to track and alert on our own usage.
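The review does not describe the in-house usage tracking that was built. As a minimal sketch only, assuming that custom-metric usage is counted per unique combination of metric name and tag values, and assuming the team can see the samples it submits, a usage check might look like this (the function names, sample data, and limit are hypothetical):

```python
def count_custom_metrics(samples):
    """Count unique (metric name, tag set) combinations in the submitted samples."""
    return len({(name, frozenset(tags)) for name, tags in samples})


def usage_alert(samples, limit):
    """Return the current count and whether it exceeds the agreed limit."""
    count = count_custom_metrics(samples)
    return count, count > limit


if __name__ == "__main__":
    # Hypothetical samples: (metric name, list of tags) pairs.
    submitted = [
        ("checkout.requests", ["env:prod", "region:eu"]),
        ("checkout.requests", ["region:eu", "env:prod"]),  # same tag set, different order
        ("checkout.requests", ["env:prod", "region:us"]),
        ("checkout.errors", ["env:prod"]),
    ]
    print(usage_alert(submitted, limit=2))
```

Tag order is ignored on purpose, so the same tag set is not counted twice.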
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
We have very rarely had stability issues. Maybe only once or twice that we noticed. It is very reliable.
What do I think about the scalability of the solution?
No, we have not had any scalability issues.
How are customer service and technical support?
It is excellent. The web app has a real-time support chat window in which a support engineer is chatting with you within a minute. That is the "right" way to do support.
Which solution did I use previously and why did I switch?
We previously ran Graphite and Sensu ourselves. By moving to Datadog, we did not need to manage our own monitoring infrastructure anymore. Graphite was somewhat complex to run.
How was the initial setup?
Initial setup is easy. Install the agent and send it metrics. There are StatsD/Datadog libraries available for most languages.
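To illustrate what "send it metrics" can look like without any client library, here is a minimal sketch that builds a StatsD datagram with Datadog's tag extension and sends it over UDP. It assumes an agent is listening locally on port 8125 (the usual StatsD port); the metric name and tags are hypothetical.

```python
import socket


def format_metric(name, value, metric_type="c", tags=None):
    """Build a StatsD datagram; the '|#tag:value' suffix is the Datadog tag extension."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; assumes an agent is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name and tags, for illustration only.
    send_metric(format_metric("checkout.requests", 1, "c", ["env:dev", "service:checkout"]))
```

In practice one of the StatsD/Datadog client libraries mentioned above would normally handle the formatting and the socket.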
What's my experience with pricing, setup cost, and licensing?
Pricing seems reasonable. It depends on the size of your organization, the size of your infrastructure, and what portion of your overall business costs go toward infrastructure. It is hard to say without looking at all of this.
Which other solutions did I evaluate?
We looked at several competitors at the time (Summer 2016). There did not seem to be any compelling alternatives. Once we did the PoC with Datadog, we loved it and decided to move forward.
What other advice do I have?
Try it out and see if you like it.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Site Reliability Engineer at a tech vendor with 10,001+ employees
Good alerts and monitoring with a relatively simple setup
Pros and Cons
- "The management of SLOs and their related burn-rate monitors have allowed us to onboard teams to on-call fast."
- "Managing dashboards as IaC is a bit hard to work out at times."
What is our primary use case?
Datadog provides us with a solution for data ingesting for all of our application metrics, resource metrics, APM/tracing data etc.
We use it for dashboards, monitoring/alerting, SLO targets, incident response, etc.
We have a lot of applications across multiple languages/frameworks etc., and have deployed in Kubernetes across multiple regions in AWS, along with underlying managed resources such as SQS, Aurora, etc.
Datadog makes understanding the state of these seamless. We are a company with millions of daily active users, and this level of detail is excellent.
How has it helped my organization?
Datadog has allowed us to rapidly spin up alerting and monitoring that helps our incident responders get alerted quickly when our SLOs are in danger and helps to quickly resolve issues.
It is the single most important tool we have from an SRE perspective.
It also provides us with an easy way to get information at a glance for all of our services through APM and create unified dashboards that track our underlying resources, such as databases, queues, etc., alongside application data.
It has been invaluable to our organization.
What is most valuable?
The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast.
Management of resources using infrastructure-as-code has been a recent game-changer for us. Combining the two has allowed us to provide product teams with a total solution for getting their applications attached to user-focused alerting and monitoring within a matter of days rather than months - and has clearly impacted our ability to discover and respond to significant production incidents.
What needs improvement?
Managing dashboards as IaC is a bit hard to work out at times. I use custom tools to convert JSON dashboards to Terraform resources. Ideally, I'd like some sort of tool for this to be built into the app. For example, a templating system that can easily be exported to IaC would be transformative for us.
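The reviewer's custom tools are not shown. As a rough sketch of the general idea, assuming the Datadog Terraform provider's `datadog_dashboard_json` resource (which takes the dashboard definition as a JSON string) and Terraform's JSON configuration syntax, a converter could be as small as this (the dashboard content and resource name are hypothetical):

```python
import json


def dashboard_to_terraform(resource_name, dashboard):
    """Wrap an exported dashboard dict in Terraform's JSON configuration syntax."""
    return {
        "resource": {
            "datadog_dashboard_json": {
                resource_name: {
                    # Assumption: the resource accepts the whole dashboard as one JSON string.
                    "dashboard": json.dumps(dashboard, sort_keys=True),
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical exported dashboard, for illustration only.
    exported = {"title": "Checkout overview", "layout_type": "ordered", "widgets": []}
    print(json.dumps(dashboard_to_terraform("checkout_overview", exported), indent=2))
```

Writing the printed output to a `.tf.json` file is one way to keep exported dashboards under version control without hand-translating each widget.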
There are also some aspects of the API that can be a bit verbose - especially in the area of new features like SLOs - and take some time to understand. That said, overall, they're well-documented enough to be a minor concern for us.
For how long have I used the solution?
I've been using the solution for over five years.
What do I think about the stability of the solution?
I have never seen a major outage that prevented us from using Datadog, although I can't speak for other teams/time zones.
What do I think about the scalability of the solution?
This product is massively scalable - I haven't seen any issues as we continue to onboard new technologies and teams.
How are customer service and support?
Datadog provides us with a number of direct lines to support, although I haven't personally required their assistance.
Which solution did I use previously and why did I switch?
We previously used LightStep for APM and switched to Datadog to unify all of our application data.
How was the initial setup?
Most elements are quite simple to set up. However, some types of data collection require organization-wide engineering buy-in.
What about the implementation team?
We handled the initial setup in-house.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Engineering Manager, Mobile Wireless Engineering at a comms service provider with 10,001+ employees
Efficient and helps with integration and creating queries
Pros and Cons
- "Datadog is providing efficiency in the products we develop for the wireless device engineering department."
- "We need more integration functionality, including certain metrics integration."
What is our primary use case?
The product is primarily used for the DevOps team.
How has it helped my organization?
It has helped us build pipelines for ops review and other functions.
What is most valuable?
Datadog is providing efficiency in the products we develop for the wireless device engineering department. We had to provide more developer integration tools, and we also needed to make it easy to create queries, so that management has efficient toolsets for making decisions based on these metrics.
What needs improvement?
We need more integration functionality, including certain metrics integration. We should be able to monitor devs, and we need the product to provide more monitoring tools and leadership metrics.
For how long have I used the solution?
I've used the solution for almost six months.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Cloud Specialist at a financial services firm with 501-1,000 employees
Centralized with good observability and many modules
Pros and Cons
- "The most valuable aspect is for us to have everything in one place."
- "We need a lot of modules since we collect all data logs from all operating systems."
What is our primary use case?
We collect all data logs from all operating systems, such as Windows, Linux, VMware, and bare metal data centers. We also automate the installation of the agent on servers.
Now we are starting a POC to analyze the APM module. In the future, the next step is to do a POC of the security modules.
The final idea is to have a single portal for observability. This will make troubleshooting easy at levels 1 and 2.
How has it helped my organization?
We are looking into a lot of modules. We collect all data logs from all operating systems, including Windows, Linux, VMware, and bare metal data centers. We also automate the installation of the agent on servers.
We're developing POCs for APM and security modules. We'll also have a single portal for observability. This will make it easy to troubleshoot.
The most valuable aspect is for us to have everything in one place.
What is most valuable?
We're investigating many modules. We collect all data logs from all operating systems (Windows, Linux, VMware, and bare metal data centers). We also automate the installation of the agent on servers.
We're doing POCs in APM and security.
Soon, we'll have a single portal for observability. This will make troubleshooting easy at levels 1 and 2.
The most valuable aspect for us is to have everything in the same place.
What needs improvement?
We need a lot of modules since we collect all data logs from all operating systems.
The most important module for us is log management. The second is the security module. The third one is the APM.
For how long have I used the solution?
We've used the solution for one year.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Test Engineer at a tech services company with 1,001-5,000 employees
Great for monitoring, helps with internal communication and offers improved visibility
Pros and Cons
- "We've been able to glean from the monitors what servers are down, and can alert the team in Slack."
- "The more tools that they can build that allow you to run AWX playbooks, or other similar fixes, would benefit clients greatly."
What is our primary use case?
We're moving towards the cloud yet still have several active data center contracts. As we move to the cloud, we are interested in knowing more about our services, and Datadog APM/logs should give us this perspective.
We currently use the infrastructure monitoring part of Datadog. Still, I've really seen the advantage of moving more data into the cloud for comparison and being able to have one place where we can view all related pieces of information regarding a possible incident or potential issue.
How has it helped my organization?
We've been able to glean from the monitors what servers are down, and can alert the team in Slack. What we would like to move toward is knowing what we need to do next, so seeing the power of Notebooks is key. We also have several other services that we are underutilizing (logging, error tracking, etc.) that would be better housed in Datadog since it gives us more visibility into linking all of the things together into one cohesive picture.
What is most valuable?
Monitoring has been invaluable, and as we start to look to other products, bringing in logs and APM traces will create a full picture of what we need to do to resolve incidents. We're also interested in our users' perspectives and when things are slowing down for them.
Additionally, we are interested in the workflows announced today as an actionable decision tree that will allow our teams to see what is wrong and know what to do based on the connected runbook/notebook. I feel that this is the final piece that will make Datadog so much more valuable than its competition.
What needs improvement?
I have talked to vendors who mention that Datadog draws attention to an issue, but you still need to take action on your own. The more tools they can build that allow you to run AWX playbooks, or other similar fixes, the more clients would benefit.
For how long have I used the solution?
We've used the solution for two years.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
Technology Competency and Solution Head at LearningMate
Good infrastructure and traffic visualizations help with capacity planning
Pros and Cons
- "The error traceability is an area that can be improved."
What is our primary use case?
We use Datadog for application monitoring, to help identify errors. It is also used to monitor application performance.
It also helps organizations understand the user experience through user behaviour patterns.
How has it helped my organization?
It has helped us reduce production issues within a defined timeframe.
It has also helped us refine our UX.
What is most valuable?
Datadog has a very good visualization for my complete infrastructure and network traffic, which enabled me to create a capacity plan.
This product is great because it shows you the SQL and your application request in a single view.
What needs improvement?
The error traceability is an area that can be improved. This is something that helps us to pinpoint the area where a problem is occurring. It is a function stack, and it should be showing us how each function is defined.
For how long have I used the solution?
We have been using Datadog for the past couple of years.
What do I think about the stability of the solution?
I have not worked on it long enough to properly comment on stability, yet, because it has to be tested across my other platforms.
What do I think about the scalability of the solution?
We have not done a full evaluation yet, but given that it is cloud-based, Datadog has to be scalable.
How are customer service and support?
I have not needed to contact technical support.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We were using New Relic prior to implementing Datadog. In terms of application monitoring, Datadog is not up to the level that New Relic is. New Relic is the better product, but its price is too high, which is why we switched.
How was the initial setup?
The initial setup is not complex. Datadog offers a certification that associates can obtain before they are assigned to administration and configuration.
What about the implementation team?
In-house. We got our admin team certified.
What was our ROI?
Our ROI shows in the time to resolution for production issues.
What's my experience with pricing, setup cost, and licensing?
The price is better than some competing products.
Which other solutions did I evaluate?
New Relic
What other advice do I have?
This is a good product and I can recommend it to others, although New Relic is still my first choice. Datadog is my second choice.
Overall, it is a good product and my main complaint is that it needs better error traceability.
I would rate this solution a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Infrastructure Engineer at DATACAMP, INC
Easy to set up, supported with good documentation, and the single pane of glass improves efficiency
Pros and Cons
- "The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need."
- "The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts."
What is our primary use case?
We use Datadog as a monitoring platform to achieve visibility into our container environments.
Almost all of our workloads are containerized, and with Datadog we are able to get metrics, logs, alerts, and events about all the containers that we are running. Our developers also extensively use APM to find and diagnose performance issues that might appear.
We use Terraform to automatically create all of the necessary monitors and dashboards that our developers need to make sure that our level of service is sufficient.
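The review does not include the Terraform code itself. As an illustrative sketch only, assuming the Datadog Terraform provider's `datadog_monitor` resource with its `name`, `type`, `query`, `message`, and `tags` arguments, one monitor per service could be generated in Terraform's JSON syntax like this (the metric name, services, and thresholds are hypothetical):

```python
import json


def monitor_resource(service, threshold_seconds):
    """Describe one latency monitor for a service; the metric name is hypothetical."""
    return {
        "name": f"{service} p95 latency is high",
        "type": "metric alert",
        "query": (
            f"avg(last_5m):avg:app.request.latency.p95{{service:{service}}}"
            f" > {threshold_seconds}"
        ),
        "message": f"Latency for {service} is above {threshold_seconds}s.",
        "tags": [f"service:{service}", "managed-by:terraform"],
    }


def monitors_config(thresholds):
    """Build a Terraform JSON document with one datadog_monitor per service."""
    return {
        "resource": {
            "datadog_monitor": {
                f"{service}_latency": monitor_resource(service, limit)
                for service, limit in sorted(thresholds.items())
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical services and latency thresholds in seconds.
    print(json.dumps(monitors_config({"checkout": 0.5, "search": 0.3}), indent=2))
```

Generating the definitions from one table of services is a way to keep every team's monitors consistent while still letting thresholds differ.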
How has it helped my organization?
We implemented Datadog around the same time as the company was growing from 30 to 150 people. Before that, we didn't have a standard stack for monitoring. Each team used their own logging solutions, metrics were missing or non-existent, and it was impossible to correlate metrics collected by different teams. Datadog provided us with an out-of-the-box solution that allowed us to focus on putting in place practices and processes around monitoring, rather than focus on implementation details.
Every squad is now confident in their ability to quickly identify and diagnose issues when they arise.
What is most valuable?
The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need.
Thanks to the unified tagging system, it's really easy to jump around the different Datadog products without losing the context. That makes debugging really easy for developers because they can go from APM to logs to metrics in a few clicks.
Watchdog is also a great feature that helped us identify overlooked issues more than once.
What needs improvement?
The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts.
SLOs are also a great way to visualize how you are doing with regard to the level of service that you are providing, but they are missing crucial components like:
- The ability to visualize the remaining error budget and how it evolved during the month. An error budget burndown graph would be helpful.
- The ability to display a different level of alert on an SLO based on how fast it is consuming the error budget. This is the slow burn versus fast burn.
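To make the two requests above concrete, here is a small worked sketch of an error budget burndown and a burn-rate classification. It is not how Datadog computes SLOs. The 14.4 and 6.0 thresholds are commonly cited example values for a 30-day SLO and are an assumption here, as are the request counts.

```python
def burndown(slo_target, expected_requests, daily_failures):
    """Fraction of the error budget left after each day of the period."""
    budget = (1.0 - slo_target) * expected_requests  # failures the SLO tolerates
    remaining, spent = [], 0
    for failed in daily_failures:
        spent += failed
        remaining.append(1.0 - spent / budget)
    return remaining


def burn_rate(slo_target, observed_error_rate):
    """How many times faster than 'exactly on budget' the service is failing."""
    return observed_error_rate / (1.0 - slo_target)


def alert_level(rate, fast=14.4, slow=6.0):
    """Classify a burn rate; the thresholds are assumed example values."""
    if rate >= fast:
        return "fast burn"
    if rate >= slow:
        return "slow burn"
    return "ok"


if __name__ == "__main__":
    # Hypothetical 99.9% SLO over a period expected to serve 1,000,000 requests.
    print(burndown(0.999, 1_000_000, [100, 250, 50]))
    print(alert_level(burn_rate(0.999, 0.02)))
```

A burn rate of 1 means the budget would run out exactly at the end of the period, so a fast burn typically warrants a page and a slow burn a ticket.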
For how long have I used the solution?
We've been using Datadog for a bit more than two years.
How are customer service and technical support?
There is extensive documentation and the support is very responsive.
Which solution did I use previously and why did I switch?
Prior to using Datadog, each team was using their own solutions. This included a mix of custom tooling, third-party tools, and AWS tools.
How was the initial setup?
The initial setup is very easy.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.