reviewer2000463 - PeerSpot reviewer
Technical Lead at a wholesaler/distributor with 1,001-5,000 employees
Real User
Great dashboards, easy to tweak, and showcases helpful metrics
Pros and Cons
  • "The ease of correcting these dashboards and widgets when needed is amazing."
  • "The parallel editing of the dashboards should not cause users to lose the work of another person."

What is our primary use case?

We use Datadog for observability and monitoring primarily. Various cross-functional teams have built various dashboards, including Developers, QA, DevOps, and SRE. 

There are also some dashboards created for senior leadership to keep tabs on day-to-day activities like cost, scale, issues, etc. 

Also, we've set up monitors and alarms that kick off when any metrics go beyond the threshold. With Slack and PagerDuty integration, correct team members get alerted and react to solve the issue based on various runbooks.

How has it helped my organization?

Using Datadog metrics has helped the organization in many ways. With one centralized monitoring place, it's a lot less effort to keep track of the health of our systems and applications. 

Using this also helps teams be proactive in dealing with any issues before they get escalated by customers. 

Lastly, having so many integrations makes the DevOps and SRE's lives a lot easier when automating the detection and resolution of any issues hidden in the system or applications. Overall, it has helped a lot.

What is most valuable?

My favorite feature is creating dashboards as that empowers me to sleep calmly at night and not to keep watch on critical system metrics. Be it DB metrics or computer-related metrics, it's always easy to view them. 

The ease of correcting these dashboards and widgets when needed is amazing. 

The only issue I face is that when more than one person is editing a dashboard simultaneously, one of them sometimes loses their work. That said, I expect they will resolve that soon. With the variety of widgets, it's easy to plot the data in a timely manner, and that makes monitoring a lot easier.

What needs improvement?

The solution can be improved in a few areas. 

The parallel editing of the dashboards should not cause users to lose the work of another person. 

Secondly, we would like to see more demos of tools that are in beta when they go live. I am sure they would help us a lot.

Buyer's Guide
Datadog
March 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: March 2025.
842,767 professionals have used our research since 2012.

For how long have I used the solution?

I've been using the solution for slightly over two years.

What do I think about the stability of the solution?

I find the solution to be very stable.

What do I think about the scalability of the solution?

I totally love it. It is scalable. 

Which solution did I use previously and why did I switch?

We previously used Sumo Logic.

How was the initial setup?

The initial setup is not so difficult.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

The ROI is very fair so far.

What's my experience with pricing, setup cost, and licensing?

I can't recommend the licensing.

Which other solutions did I evaluate?

I was not involved in any pre-evaluation process.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1994838 - PeerSpot reviewer
Software Engineer at Enable Medicine
User
Centralizes logs and provides high-level views but is quite expensive
Pros and Cons
  • "Datadog has made it much easier to have a central place for people to look for logs and made it much easier to notify them of any elevated error rates or failures."
  • "The product is quite complex, and there are so many features that I either didn't know about or wasn't sure how to use."

What is our primary use case?

We mostly use it to handle log aggregation, monitor our web application, and alert us on data pipeline failures. 

Our system is fully on AWS, so we pipe all of our CloudWatch logs into Datadog to have a central place to index and search logs. 

Our web app is built on an Elastic Beanstalk backend, and we use the Datadog agent to keep track of all of the requests that hit our backend and all of their components. 

We also use the prebuilt AWS pipeline dashboards to monitor our batch jobs and lambdas.

How has it helped my organization?

Datadog has made it much easier to have a central place for people to look for logs and made it much easier to notify them of any elevated error rates or failures. 

It is also easier to get high-level views of platform health, whereas looking directly at AWS tends to provide very specific insight into particular surface areas or products. 

By having the whole team onboard onto Datadog, we also have a single source of truth that everyone can use when triaging and resolving incidents that occur across any surface area.

What is most valuable?

The ease of setting up metrics and alerting and integrating with Slack has significantly reduced the friction of keeping the team up to date on the platform's health. Before, creating custom CloudWatch metrics was never very intuitive, and it was non-trivial to set up integrations with other services we use, especially Slack.

It also provides a good way to gain the context needed when trying to fix issues, as it's a central place to look through logs, requests, AWS metrics, and more - overall contributing to the health of our platform.

What needs improvement?

The product is quite complex, and there are so many features that I either didn't know about or wasn't sure how to use.

One thing that could be improved is somehow surfacing interesting or relevant products that might be applicable given our infrastructure. 

Additionally, the billing can sometimes be confusing and opaque, especially around not making it obvious what the implications can be if you add different AWS integrations. This has caused some unexpected costs in the past due to engineers not understanding how Datadog pricing works.

For how long have I used the solution?

We've used the solution for around two years.

Which solution did I use previously and why did I switch?

This was the first solution we tried.

What's my experience with pricing, setup cost, and licensing?

It is quite expensive, especially if you don't know how the pricing works.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1493811 - PeerSpot reviewer
Sr. Architect - SaaS Ops at CommVault
Real User
Improves infrastructure visibility, integrates well, and fine-tuning the monitors is easy to do
Pros and Cons
  • "The ability to send notifications based on metadata from the monitor is helpful."
  • "Once agents are connected to the Datadog portal, we should be able to upgrade them quickly."

What is our primary use case?

We primarily use Datadog for performance and log monitoring of cloud environments, which include VMs and Azure services like Azure compute, storage, network, firewall, and app services via event hubs.

Alerting based on monitors via Teams and PagerDuty.

Logs collection for Azure services like Azure database, Azure Application Gateway, Azure AKS, and other Azure services.

Custom metrics using a Python script to collect metrics for components not natively supported by Datadog.

Synthetic testing to ensure uptime and browser tests via CI/CD pipeline.
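
The review does not show its Python script, so the following is only a minimal sketch of how a custom metric might be sent, assuming a Datadog Agent is listening for DogStatsD datagrams on its default UDP port 8125. The metric name, tags, and helper names are hypothetical.

```python
import socket


def format_gauge(name, value, tags=None):
    """Build a DogStatsD gauge datagram: <name>:<value>|g|#<tag1>,<tag2>."""
    payload = f"{name}:{value}|g"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_gauge(name, value, tags=None, host="127.0.0.1", port=8125):
    """Send one gauge to a local Agent over UDP (fire-and-forget).

    The host and port are assumptions: 8125 is the Agent's default
    DogStatsD port, but a given deployment may use something else.
    """
    datagram = format_gauge(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
    return datagram
```

A collector for an unsupported component could call something like `send_gauge("custom.backup.queue_depth", 42, tags=["env:prod", "cloud:azure"])` on a schedule. UDP gives no delivery confirmation, so this suits periodic gauges rather than values that must never be dropped.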

How has it helped my organization?

Datadog has improved our visibility into infrastructure topology and performance. It provided a simplified view and ability to drill down to system performance, process usage, and logs.

We were able to set up monitors for infrastructure and applications, as the metrics were readily available in the platform. Fine-tuning monitors is very easy and the ability to configure monitor alerts with details on how to resolve the alert is a key value add. 

Integration with PagerDuty and Teams ensures timely alerting. The PagerDuty integration brings tags from Datadog into PagerDuty, which is very useful for routing incidents to the right service.

What is most valuable?

The Host Map and Live Process views provide performance metrics for our application. The support team likes using Datadog for identifying affected resources and obtaining the logs. 

Monitors are easy and quick to set up. Metrics are easily accessible and quick to use. The ability to send notifications based on metadata from the monitor is helpful. Monitors are set up once and work for all workloads, whether on Azure or any other cloud.

Logs rehydration helps us archive and rehydrate logs as we need. We don't need logs to be indexed at all times. Logs are required only for escalations and rehydrating does the job and provides cost savings.

What needs improvement?

We need the ability to create a service dependency map like Splunk ITSI. We have to build this in PagerDuty and it's not the best user experience. The ability to create custom inventory objects based on logs ingested would be a value add. It would be better if Datadog makes this a simple click and enable.

It would be helpful to have the ability to upgrade agents via the Datadog portal. Once agents are connected to the Datadog portal, we should be able to upgrade them quickly.

Security monitoring for Azure and Operating System (Windows and Linux) are features that need to be addressed.

Dashboards for Azure Active Directory metrics and events should be improved.

For how long have I used the solution?

We have been using Datadog for more than six months.

What do I think about the stability of the solution?

Stability-wise, it has been good.

What do I think about the scalability of the solution?

The scalability is good so far. 

How are customer service and technical support?

The support team has been very responsive. My only complaint is that, for issues they don't understand, they should have a quick call and unblock the customer.

Which solution did I use previously and why did I switch?

We didn't have a solution in place. The only thing we had were logs.

How was the initial setup?

Setup is hassle-free and pretty straightforward. 

What about the implementation team?

I deployed it myself.

What was our ROI?

No returns yet. We are in growth mode. If this becomes expensive we may have to look at alternative options.

What's my experience with pricing, setup cost, and licensing?

The cost is high and this can be justified if the scale of the environment is big.

Datadog needs to provide better pricing for large customers.

Which other solutions did I evaluate?

Prior to implementing Datadog, we evaluated Splunk.

What other advice do I have?

Overall, the Datadog product is really good.

It doesn't need a sales team and yet, the sales team has screwed up on some occasions. It's a great product, and the customer success team needs to put in extra effort to help customers with best practices rather than passing them off to support.

Customer success doesn't evangelize product features, and the customer doesn't know what is new or coming unless they ask about it.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1486134 - PeerSpot reviewer
Infrastructure Engineer at DATACAMP, INC
Vendor
Easy to set up, supported with good documentation, and the single pane of glass improves efficiency
Pros and Cons
  • "The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need."
  • "The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts."

What is our primary use case?

We use Datadog as a monitoring platform to achieve visibility into our container environments.

Almost all of our workloads are containerized, and with Datadog we are able to get metrics, logs, alerts, and events about all the containers that we are running. Our developers also extensively use APM to find and diagnose performance issues that might appear.

We use Terraform to automatically create all of the necessary monitors and dashboards that our developers need to make sure that our level of service is sufficient.

How has it helped my organization?

We implemented Datadog around the same time as the company was growing from 30 to 150 people. Before that, we didn't have a standard stack for monitoring. Each team used their own logging solutions, metrics were missing or non-existent, and it was impossible to correlate metrics collected by different teams. Datadog provided us with an out-of-the-box solution that allowed us to focus on putting in place practices and processes around monitoring, rather than on implementation details.

Every squad is now confident in their ability to quickly identify and diagnose issues when they arise.

What is most valuable?

The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need.

Thanks to the unified tagging system, it's really easy to jump around the different Datadog products without losing the context. That makes debugging really easy for developers because they can go from APM to logs to metrics in a few clicks.

Watchdog is also a great feature that helped us identify overlooked issues more than once.

What needs improvement?

The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts.

SLOs are also a great way to visualize how you are doing with regard to the level of service that you are providing, but they are missing crucial components like:

  • The ability to visualize the remaining error budget and how it evolved during the month. An error budget burndown graph would be helpful.
  • The ability to display a different level of alert on an SLO based on how fast it is consuming the error budget. This is the slow burn versus fast burn.
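
To make the two requests above concrete, here is a small sketch of the arithmetic behind an error budget burndown and a burn-rate classification. It is not Datadog's implementation. The function names are hypothetical, and the 14.4 and 6.0 thresholds are illustrative defaults borrowed from common SRE practice, not values from the review.

```python
def burndown(slo_target, expected_total, daily_bad):
    """Return the error budget remaining after each day.

    The budget is the number of bad events the SLO tolerates over the
    period: (1 - slo_target) * expected_total.
    """
    budget = (1 - slo_target) * expected_total
    remaining = []
    for bad in daily_bad:
        budget -= bad
        remaining.append(budget)
    return remaining


def burn_rate(slo_target, window_total, window_bad):
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget would last exactly the full period.
    """
    if window_total == 0:
        return 0.0
    return (window_bad / window_total) / (1 - slo_target)


def classify(rate, fast=14.4, slow=6.0):
    """Map a burn rate to an alert level (thresholds are assumptions)."""
    if rate >= fast:
        return "fast"
    if rate >= slow:
        return "slow"
    return "ok"
```

For a 99.9% SLO, 150 failures out of 10,000 requests in a window gives a burn rate of about 15, which this sketch would label a fast burn. Plotting the output of `burndown` over the month would give the burndown graph the reviewer asks for.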

For how long have I used the solution?

We've been using Datadog for a bit more than two years.

How are customer service and technical support?

There is extensive documentation and the support is very responsive.

Which solution did I use previously and why did I switch?

Prior to using Datadog, each team was using their own solutions. This included a mix of custom tooling, third-party tools, and AWS tools.

How was the initial setup?

The initial setup is very easy. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1477686 - PeerSpot reviewer
Senior DevOps Engineer at DigitalOnUs
Real User
Affordably-priced and improves visibility of infrastructure, apps, and services
Pros and Cons
  • "Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers."
  • "The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs the containers instances."

What is our primary use case?

Our primary use of Datadog includes: 

  • Keeping a close eye on our AWS resources. Monitoring our multiple RDS and ElastiCache instances plays a big role in our indicators.
  • Kubernetes. We aren't using all of the available Kubernetes integrations, but the few that work out of the box add great value to our metrics.
  • Monitoring and alerting. We wired our most relevant monitoring and alerts to services like PagerDuty, and for the rest of them, we keep our engineers up to date with constant Slack updates. 

How has it helped my organization?

Observability is something that a lot of companies are trying to achieve. Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers.

For a logging solution, we used to have Papertrail. It did the trick, but having a single point that manages and indexes all the logs is a BIG improvement. Also, having the option to generate metrics from logs is a game-changer that we're trying to include in our monitoring strategy.

I would like to say the same about APM but the support for PHP seems to be somewhat lacking. It works but I think this service could provide us more information.

What is most valuable?

With respect to logs, we used to integrate various kinds of tools to achieve very basic tasks and it always felt like a very fragile solution. I think logs are by far the most useful feature and at the same time, the one that we could improve.

APM - This is either a hit or a miss. Allow me to explain: we use various programming languages, mainly PHP and Ruby, and the traces generated don't always provide all of the information we want. For example, we get a great level of detail for the SQL queries that the app generates but not so much for the PHP side. It's hard to track exactly where all of the bottlenecks are, so some analysis tools for APM would make a good addition.

What needs improvement?

Please add PHP profiling; you already have it for other popular programming languages such as Python and Java, which is great because we have a little bit of those, but our main app is powered by PHP and we don't have profiling for this yet. I guess it's only a matter of time for this to be added, so in the meanwhile, you can consider this review as a vote for the PHP profiling support.

The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs the containers instances.

For how long have I used the solution?

We have been using Datadog for one year.

What do I think about the stability of the solution?

It's pretty stable for the main integrations. There was only one time where Datadog was down and that was scary since all of our monitoring is handled by Datadog. There was a lot of uncertainty while the outage was in place.

What do I think about the scalability of the solution?

For everyday use, it's adequate, but for very specific tasks, not so much. There was a time where I had to do a big export and as expected, the API is somewhat limited. Since it was a one-time task, it was not a big deal but if this was a regular task, I wouldn't be happy about it.

How are customer service and technical support?

For small tasks, I think it's great. For specialized support, it feels like you're under-staffed, having to wait days/weeks for a solution is a big NO-NO.

Which solution did I use previously and why did I switch?

I've used a few other products such as New Relic and AppDynamics. The switch is usually driven by two factors: pricing and convenience.

How was the initial setup?

Getting APM metrics out of Kubernetes is always a painful task. We got support to take a look at this and we had to go through various iterations to get it right, and then AGAIN the next year. This was a bad experience.

What about the implementation team?

It was all implemented in-house. The documentation is fairly up to date, for the most part.

What's my experience with pricing, setup cost, and licensing?

Pricing is somewhat affordable compared to other solutions, but to really lower the costs of other products you need to plan your resource usage very carefully; otherwise, it can get expensive very quickly.

Which other solutions did I evaluate?

Unfortunately, it wasn't my call to bring in Datadog for this company, but I'm certainly glad that the Lead Architect made this decision. It brought many improvements in a short span of time.

What other advice do I have?

Please add PHP profiling soon!

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Architect at a tech services company
Real User
Good graphs, dashboards, and user-interface
Pros and Cons
  • "This is definitely a good product and I would consider them one of the leaders within the application monitoring and cloud monitoring space."
  • "Additional metrics should be included."

What is our primary use case?

We are a solution provider and Datadog is one of the products that I was working on with one of my clients. They are currently evaluating it for use in cloud monitoring.

Specifically, Datadog is used for monitoring cloud applications in terms of performance. The logs come into this solution from AWS and it provides dashboards for various environments.

What is most valuable?

The most valuable features are the graphs, dashboards, metrics, and the interface.

What needs improvement?

Additional metrics should be included.

Better integration with other solutions is needed.

For how long have I used the solution?

I used Datadog in a project that lasted between one and two years. 

What do I think about the stability of the solution?

In terms of stability, I have not seen any issues and don't have any complaints.

What do I think about the scalability of the solution?

Datadog is easy to scale.

How are customer service and technical support?

We have not contacted technical support.

How was the initial setup?

The initial setup was okay. I was not part of the implementation team but from my understanding, it was not complex.

What about the implementation team?

Our in-house team handled the deployment.

Which other solutions did I evaluate?

My client is currently evaluating several monitoring tools including Datadog, Dynatrace, and AppDynamics. Compared to Dynatrace, Datadog has some room for improvement.

What other advice do I have?

This is definitely a good product and I would consider them one of the leaders within the application monitoring and cloud monitoring space. My advice to anybody who is researching this solution is to consider it within the top three. That said, there are some features and metrics that are available in other products, such as Dynatrace, that are not available in Datadog.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Real User
Great dashboards, good monitoring, and easy SLAs
Pros and Cons
  • "Profiling has been made easier."
  • "Lately, chat support has a longer waiting time."

What is our primary use case?

Our primary use case would be using the dashboards and getting proper insights based on the dashboards.

The monitoring, SLO, and SLA have been better and easier since we started using the Terraform infrastructure. APM has been easier as we had to enable it through the CronJob directly.

Profiling has been made easier. We are able to get many insights into the code. Profiling provides really good insights right now. 

Logs are the most valuable and the best solution so far. Datadog can help solve any slow queries or database-related errors. 

How has it helped my organization?

Monitoring has been better and easier since we started using the Terraform infrastructure.

APM has been easier as we had to enable it through the CronJob directly.

Profiling has made it easier in terms of getting many insights into the code.

The logs are the most valuable and the best solution. Datadog can help us to solve any slow queries or database-related errors.

What is most valuable?

Profiling provides really good insights, and APM has really good tracing visibility. 

The SLA and SLO definitions and the monitoring are also really important and very valuable parts of the product and make great Datadog features. 

Datadog support is also really valuable as they provide support for the product through the chat as well. 

The Datadog premium support has helped us to provide faster outcomes for a problem. 

Also, rather than having an email thread, it would be better to get the support on call and sort out the issue, which is the support we get from Datadog CSM.

What needs improvement?

Integration should be easier. It is very tough to go to all the services and enable Datadog integration for each AWS service. 

The AWS services could be listed on one page, showing only the services that are enabled. A similar approach should be used for any other integration.

Lately, chat support has a longer waiting time. We would love to get faster chat support. We also need additional support for sending the flare files.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2044977 - PeerSpot reviewer
Senior Site Reliability Engineer at a tech vendor with 10,001+ employees
Real User
Good alerts and monitoring with a relatively simple setup
Pros and Cons
  • "The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast."
  • "Managing dashboards as IaC is a bit hard to work out at times."

What is our primary use case?

Datadog provides us with a solution for data ingesting for all of our application metrics, resource metrics, APM/tracing data etc. 

We use it for dashboards, monitoring/alerting, SLO targets, incident response etc. 

We have a lot of applications across multiple languages/frameworks etc., and have deployed in Kubernetes across multiple regions in AWS, along with underlying managed resources such as SQS, Aurora, etc. 

Datadog makes understanding the state of these seamless. We are a company with millions of daily active users, and this level of detail is excellent.

How has it helped my organization?

Datadog has allowed us to rapidly spin up alerting and monitoring that helps our incident responders get alerted quickly when our SLOs are in danger and helps to quickly resolve issues. 

It is the single most important tool we have from an SRE perspective. 

It also provides us with an easy way to get information at a glance for all of our services through APM and create unified dashboards that track our underlying resources, such as databases, queues, etc., alongside application data. 

It has been invaluable to our organization.

What is most valuable?

The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast. 

Management of resources using infrastructure-as-code has been a recent game-changer for us. Combining the two has allowed us to provide product teams with a total solution for getting their applications attached to user-focused alerting and monitoring within a matter of days rather than months - and has clearly impacted our ability to discover and respond to significant production incidents.

What needs improvement?

Managing dashboards as IaC is a bit hard to work out at times. I use custom tools to convert JSON dashboards to Terraform resources. Ideally, I'd like for some sort of building tool for this to be built into the app. For example, a templating system that can easily be exported to IaC would be transformative for us. 
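
The reviewer's custom tools are not shown, so the following is only a rough sketch of what such a JSON-to-Terraform conversion might look like. It assumes the Datadog Terraform provider's `datadog_dashboard_json` resource, which accepts the dashboard definition as a JSON string, and it emits Terraform's JSON syntax (a `.tf.json` file) rather than HCL. The function name and the list of server-assigned fields to strip are assumptions.

```python
import json
import re

# Fields assumed to be assigned by the server on export; this list is a
# guess for illustration and may not match a real export exactly.
SERVER_FIELDS = ("id", "url", "created_at", "modified_at", "author_handle")


def dashboard_to_tf_json(dashboard, resource_name=None):
    """Wrap an exported dashboard dict in Terraform JSON syntax."""
    title = dashboard.get("title", "dashboard")
    # Derive a Terraform-safe resource name from the dashboard title.
    name = resource_name or re.sub(r"[^a-z0-9_]+", "_", title.lower()).strip("_")
    if not name or name[0].isdigit():
        name = "dashboard_" + name
    body = {k: v for k, v in dashboard.items() if k not in SERVER_FIELDS}
    return {
        "resource": {
            "datadog_dashboard_json": {
                name: {"dashboard": json.dumps(body, sort_keys=True)}
            }
        }
    }
```

Writing the result with `json.dump(..., indent=2)` to a file such as `dashboards.tf.json` would let Terraform pick it up alongside ordinary `.tf` files. Sorting the keys keeps the generated string stable between runs, which should reduce spurious diffs in plans.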

There are also some aspects of the API that can be a bit verbose - especially in the area of new features like SLOs - and take some time to understand. That said, overall, they're well-documented enough to be a minor concern for us.

For how long have I used the solution?

I've been using the solution for over five years.

What do I think about the stability of the solution?

I have never seen a major outage that prevented us from using Datadog, although I can't speak for other teams/time zones.

What do I think about the scalability of the solution?

This product is massively scalable - I haven't seen any issues as we continue to onboard new technologies and teams.

How are customer service and support?

Datadog provides us with a number of direct lines to support, although I haven't personally required their assistance.

Which solution did I use previously and why did I switch?

We previously used LightStep for APM and switched to Datadog to unify all of our application data.

How was the initial setup?

Most elements are quite simple to set up. However, some types of data collection require organization-wide engineering buy-in.

What about the implementation team?

We handled the initial setup in-house.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: March 2025