reviewer1486134 - PeerSpot reviewer
Infrastructure Engineer at DATACAMP, INC
Vendor
Easy to set up, supported with good documentation, and the single pane of glass improves efficiency
Pros and Cons
  • "The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need."
  • "The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts."

What is our primary use case?

We use Datadog as a monitoring platform to achieve visibility into our container environments.

Almost all of our workloads are containerized, and with Datadog we are able to get metrics, logs, alerts, and events about all the containers that we are running. Our developers also use APM extensively to find and diagnose performance issues that might appear.

We use Terraform to automatically create all of the necessary monitors and dashboards that our developers need to make sure that our level of service is sufficient.

How has it helped my organization?

We implemented Datadog around the same time as the company was growing from 30 to 150 people. Before that, we didn't have a standard stack for monitoring. Each team used its own logging solution, metrics were missing or non-existent, and it was impossible to correlate metrics collected by different teams. Datadog provided us with an out-of-the-box solution that allowed us to focus on putting practices and processes in place around monitoring, rather than on implementation details.

Every squad is now confident in their ability to quickly identify and diagnose issues when they arise.

What is most valuable?

The fact that everything is under a single pane of glass is really valuable, as developers don't have to spend their time copying correlation IDs across tools to find what they need.

Thanks to the unified tagging system, it's really easy to jump around the different Datadog products without losing the context. That makes debugging really easy for developers because they can go from APM to logs to metrics in a few clicks.

Watchdog is also a great feature that helped us identify overlooked issues more than once.

What needs improvement?

The incident management beta looks promising, but it is still missing the ability to automatically create incidents based on certain alerts.

SLOs are also a great way to visualize how you are doing with regard to the level of service that you are providing, but they are missing crucial components, such as:

  • The ability to visualize the remaining error budget and how it evolved during the month. An error budget burndown graph would be helpful.
  • The ability to display a different level of alert on an SLO based on how fast it is consuming the error budget. This is the slow burn versus fast burn.
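
To make the two requests above concrete, here is a minimal sketch of the arithmetic behind an error budget and a burn rate for a request-based SLO. It is not Datadog's implementation; the names `SloWindow`, `error_budget_remaining`, `burn_rate`, and `alert_level` are hypothetical, and the 14.4 and 6.0 thresholds are assumed example values for "fast burn" and "slow burn" that a team would tune for itself.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Counts for one SLO evaluation window (hypothetical structure)."""
    target: float       # e.g. 0.999 for a 99.9% SLO
    total_events: int   # requests seen in the window
    bad_events: int     # requests that violated the SLO

    def __post_init__(self):
        # A target of 0 or 1 leaves no meaningful error budget to track.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def error_budget_remaining(w: SloWindow) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_bad = (1.0 - w.target) * w.total_events
    if allowed_bad == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1.0 - w.bad_events / allowed_bad


def burn_rate(w: SloWindow) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if w.total_events == 0:
        return 0.0
    return (w.bad_events / w.total_events) / (1.0 - w.target)


def alert_level(rate: float, fast: float = 14.4, slow: float = 6.0) -> str:
    """Map a burn rate to an alert level; thresholds are assumed examples."""
    if rate >= fast:
        return "page"
    if rate >= slow:
        return "ticket"
    return "ok"


window = SloWindow(target=0.999, total_events=1_000_000, bad_events=400)
print(error_budget_remaining(window), burn_rate(window), alert_level(burn_rate(window)))
```

A burn rate of 1.0 means the budget would be used up exactly at the end of the SLO period; plotting `error_budget_remaining` over the month would give the burndown graph described in the first bullet.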
Buyer's Guide
Datadog
April 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: April 2025.
848,716 professionals have used our research since 2012.

For how long have I used the solution?

We've been using Datadog for a bit more than two years.

How are customer service and support?

There is extensive documentation and the support is very reactive.

Which solution did I use previously and why did I switch?

Prior to using Datadog, each team was using their own solutions. This included a mix of custom tooling, third-party tools, and AWS tools.

How was the initial setup?

The initial setup is very easy. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1477686 - PeerSpot reviewer
Senior DevOps Engineer at DigitalOnUs
Real User
Affordably-priced and improves visibility of infrastructure, apps, and services
Pros and Cons
  • "Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers."
  • "The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs. container instances."

What is our primary use case?

Our primary use of Datadog includes: 

  • Keeping a close eye on our AWS resources. Monitoring our multiple RDS and ElastiCache instances plays a big role in our indicators.
  • Kubernetes. We aren't using all of the available Kubernetes integrations, but the few that work out of the box add great value to our metrics.
  • Monitoring and alerting. We wired our most relevant monitoring and alerts to services like PagerDuty, and for the rest of them, we keep our engineers up to date with constant Slack updates. 

How has it helped my organization?

Observability is something that a lot of companies are trying to achieve. Having a clear view, not only of our infrastructure but our apps and services as well, has brought a great added value to our customers.

For a logging solution, we used to have Papertrail. It did the trick, but having a single point that manages and indexes all the logs is a BIG improvement. Also, having the option to generate metrics from logs is a game-changer that we're trying to include in our monitoring strategy.
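
As a rough illustration of what generating a metric from logs means, the sketch below counts log lines per minute and HTTP status. It is plain Python rather than Datadog's own log-based metrics feature, and the log format, the `LINE` pattern, and the function name `count_by_minute_and_status` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical access-log format: "<ISO timestamp> <HTTP status> <path>"
LINE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\S*\s+"
    r"(?P<status>\d{3})\s+(?P<path>\S+)$"
)


def count_by_minute_and_status(lines):
    """Turn raw log lines into a count keyed by (minute, status)."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        counts[(match.group("minute"), match.group("status"))] += 1
    return counts


sample = [
    "2021-03-01T10:15:02Z 200 /login",
    "2021-03-01T10:15:40Z 500 /checkout",
    "2021-03-01T10:15:59Z 500 /checkout",
    "2021-03-01T10:16:01Z 200 /login",
    "not a log line",
]
metric = count_by_minute_and_status(sample)
for (minute, status), value in sorted(metric.items()):
    print(minute, status, value)
```

Each (minute, status) pair becomes one data point, which is the kind of series an alert on the error count could be built on.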

I would like to say the same about APM but the support for PHP seems to be somewhat lacking. It works but I think this service could provide us more information.

What is most valuable?

With respect to logs, we used to integrate various kinds of tools to achieve very basic tasks and it always felt like a very fragile solution. I think logs are by far the most useful feature and at the same time, the one that we could improve.

APM - This is either a hit or a miss. Allow me to explain: we use various programming languages, mainly PHP and Ruby, and the traces generated don't always provide all of the information we want. For example, we get a great level of detail for the SQL queries that the app generates, but not so much for the PHP side. It's hard to track exactly where all of the bottlenecks are, so some analysis tools for APM would make a good addition.

What needs improvement?

Please add PHP profiling. You already have it for other popular programming languages such as Python and Java, which is great because we have a little bit of those, but our main app is powered by PHP and we don't have profiling for it yet. I guess it's only a matter of time before this is added, so in the meantime, you can consider this review a vote for PHP profiling support.

The pricing model could be simplified as it feels a bit outdated, especially when you look at the billing model of compute instances vs. container instances.

For how long have I used the solution?

We have been using Datadog for one year.

What do I think about the stability of the solution?

It's pretty stable for the main integrations. There was only one time where Datadog was down and that was scary since all of our monitoring is handled by Datadog. There was a lot of uncertainty while the outage was in place.

What do I think about the scalability of the solution?

For everyday use, it's adequate, but for very specific tasks, not so much. There was a time where I had to do a big export and as expected, the API is somewhat limited. Since it was a one-time task, it was not a big deal but if this was a regular task, I wouldn't be happy about it.

How are customer service and technical support?

For small tasks, I think it's great. For specialized support, it feels like you're under-staffed; having to wait days or weeks for a solution is a big NO-NO.

Which solution did I use previously and why did I switch?

I've used a few other products such as New Relic and AppDynamics. The switch is usually driven by two factors: pricing and convenience.

How was the initial setup?

Getting APM metrics out of Kubernetes is always a painful task. We got support to take a look at this and we had to go through various iterations to get it right, and then AGAIN the next year. This was a bad experience.

What about the implementation team?

It was all implemented in-house. The documentation is fairly up to date, for the most part.

What's my experience with pricing, setup cost, and licensing?

Pricing is somewhat affordable compared to other solutions, but to really come in under the cost of other products you need to plan your resource usage very carefully; otherwise, it can get expensive real quick.

Which other solutions did I evaluate?

Unfortunately, it wasn't my call to include Datadog for this company, but I'm sure glad that the Lead Architect made this decision. It brought many improvements in a short span of time.

What other advice do I have?

Please add PHP profiling soon!

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Architect at a tech services company
Real User
Good graphs, dashboards, and user-interface
Pros and Cons
  • "This is definitely a good product and I would consider them one of the leaders within the application monitoring and cloud monitoring space."
  • "Additional metrics should be included."

What is our primary use case?

We are a solution provider and Datadog is one of the products that I was working on with one of my clients. They are currently evaluating it for use in cloud monitoring.

Specifically, Datadog is used for monitoring cloud applications in terms of performance. The logs come into this solution from AWS and it provides dashboards for various environments.

What is most valuable?

The most valuable features are the graphs, dashboards, metrics, and the interface.

What needs improvement?

Additional metrics should be included.

Better integration with other solutions is needed.

For how long have I used the solution?

I used Datadog in a project that lasted between one and two years. 

What do I think about the stability of the solution?

In terms of stability, I have not seen any issues and don't have any complaints.

What do I think about the scalability of the solution?

Datadog is easy to scale.

How are customer service and technical support?

We have not contacted technical support.

How was the initial setup?

The initial setup was okay. I was not part of the implementation team but from my understanding, it was not complex.

What about the implementation team?

Our in-house team handled the deployment.

Which other solutions did I evaluate?

My client is currently evaluating several monitoring tools including Datadog, Dynatrace, and AppDynamics. Compared to Dynatrace, Datadog has some room for improvement.

What other advice do I have?

This is definitely a good product and I would consider them one of the leaders within the application monitoring and cloud monitoring space. My advice to anybody who is researching this solution is to consider it within the top three. That said, there are some features and metrics that are available in other products, such as Dynatrace, that are not available in Datadog.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Real User
Great dashboards, good monitoring, and easy SLAs
Pros and Cons
  • "Profiling has been made easier."
  • "Lately, chat support has a longer waiting time."

What is our primary use case?

Our primary use case would be using the dashboards and getting proper insights based on the dashboards.

The monitoring, SLO, and SLA have been better and easier since we started using the Terraform infrastructure. APM has been easier as we had to enable it through the CronJob directly.

Profiling has been made easier. We are able to get many insights into the code. Profiling provides really good insights right now. 

Logs are the most valuable and the best solution so far. Datadog can help solve any slow queries or database-related errors. 


How has it helped my organization?

Monitoring has been better and easier since we started using the Terraform infrastructure.

APM has been easier as we had to enable it through the CronJob directly.

Profiling has made it easier in terms of getting many insights into the code.

The logs are the most valuable and the best solution. Datadog can help us to solve any slow queries or database-related errors.

What is most valuable?

Profiling provides really good insights, and APM has really good tracing visibility. 

The SLA and SLO definitions and the monitoring are also really important and very valuable parts of the product and make great Datadog features. 

Datadog support is also really valuable as they provide support for the product through the chat as well. 

The Datadog premium support has helped us to provide faster outcomes for a problem. 

Also, rather than going through an email thread, it is better to get support on a call and sort out the issue, which is the kind of support we get from the Datadog CSM.

What needs improvement?

Integration should be easier. It is very tough to go through all the services and enable the Datadog integration for each AWS service.

Datadog could list the AWS services on one page and show only the services that are enabled. A similar approach should apply to any other integration.

Lately, chat support has a longer waiting time. We would love to get faster chat support. We also need additional support for sending the flare files.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2044977 - PeerSpot reviewer
Senior Site Reliability Engineer at a tech vendor with 10,001+ employees
Real User
Good alerts and monitoring with a relatively simple setup
Pros and Cons
  • "The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast."
  • "Managing dashboards as IaC is a bit hard to work out at times."

What is our primary use case?

Datadog provides us with a solution for ingesting all of our application metrics, resource metrics, APM/tracing data, etc.

We use it for dashboards, monitoring/alerting, SLO targets, incident response, etc.

We have a lot of applications across multiple languages and frameworks, and have deployed them in Kubernetes across multiple regions in AWS, along with underlying managed resources such as SQS, Aurora, etc.

Datadog makes understanding the state of these seamless. We are a company with millions of daily active users, and this level of detail is excellent.

How has it helped my organization?

Datadog has allowed us to rapidly spin up alerting and monitoring that helps our incident responders get alerted quickly when our SLOs are in danger and helps to quickly resolve issues. 

It is the single most important tool we have from an SRE perspective. 

It also provides us with an easy way to get information at a glance for all of our services through APM and create unified dashboards that track our underlying resources, such as databases, queues, etc., alongside application data. 

It has been invaluable to our organization.

What is most valuable?

The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast.

Management of resources using infrastructure-as-code has been a recent game-changer for us. Combining the two has allowed us to provide product teams with a total solution for getting their applications attached to user-focused alerting and monitoring within a matter of days rather than months - and has clearly impacted our ability to discover and respond to significant production incidents.

What needs improvement?

Managing dashboards as IaC is a bit hard to work out at times. I use custom tools to convert JSON dashboards to Terraform resources. Ideally, I'd like for some sort of building tool for this to be built into the app. For example, a templating system that can easily be exported to IaC would be transformative for us. 
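
For readers wondering what such a conversion tool can look like, here is a minimal sketch, not the reviewer's actual tooling. It wraps an exported dashboard JSON document in Terraform's JSON syntax, assuming the Datadog provider's `datadog_dashboard_json` resource with its `dashboard` string attribute. The function name `dashboard_to_tf_json` and the way the resource name is derived from the title are hypothetical.

```python
import json
import re


def dashboard_to_tf_json(dashboard: dict, resource_name: str = "") -> str:
    """Wrap a dashboard definition in a Terraform JSON (.tf.json) document."""
    # Derive a Terraform-safe resource name from the title if none is given.
    name = resource_name or re.sub(
        r"[^a-z0-9_]+", "_", dashboard.get("title", "").lower()
    ).strip("_")
    if not name:
        name = "dashboard"
    if name[0].isdigit():
        name = "_" + name  # Terraform names cannot start with a digit

    # Terraform treats strings as templates, so escape "${" and "%{".
    body = json.dumps(dashboard, sort_keys=True)
    body = body.replace("${", "$${").replace("%{", "%%{")

    document = {
        "resource": {
            # Assumed resource type and attribute from the Datadog provider.
            "datadog_dashboard_json": {name: {"dashboard": body}}
        }
    }
    return json.dumps(document, indent=2)


exported = {"title": "API Latency (prod)", "layout_type": "ordered", "widgets": []}
print(dashboard_to_tf_json(exported))
```

Writing the output to a file ending in `.tf.json` lets Terraform pick it up alongside regular `.tf` files; a fuller tool would also need to handle fields that the API returns but the provider does not accept.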

There are also some aspects of the API that can be a bit verbose - especially in the area of new features like SLOs - and take some time to understand. That said, overall, they're well-documented enough to be a minor concern for us.

For how long have I used the solution?

I've been using the solution for over five years.

What do I think about the stability of the solution?

I have never seen a major outage that prevented us from using Datadog, although I can't speak for other teams or time zones.

What do I think about the scalability of the solution?

This product is massively scalable - I haven't seen any issues as we continue to onboard new technologies and teams.

How are customer service and support?

Datadog provides us with a number of direct lines to support, although I haven't personally required their assistance.

Which solution did I use previously and why did I switch?

We previously used LightStep for APM and switched to Datadog to unify all of our application data.

How was the initial setup?

Most elements are quite simple to set up. However, some types of data collection require organization-wide engineering buy-in.

What about the implementation team?

We handled the initial setup in-house.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003829 - PeerSpot reviewer
Sr Platform Engineer at a pharma/biotech company with 11-50 employees
Real User
Good logging with lots of great integrations and an interesting dashboard
Pros and Cons
  • "Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate."
  • "Some of the interface is still confusing to use."

What is our primary use case?

We use it mostly for logging log messages from our Kubernetes and EC2 instances, for example, system messages and errors. Also, we want log messages from our firewalls and other network infrastructure in case of network issues. We intend to use it for application logging, et cetera, to get insight into internal problems in the applications in Kubernetes pods. We want to use it for monitoring in case of system problems and hardware failures so that it can notify us.

How has it helped my organization?

It's good to have a single location for all the logs. If you have logs coming from a whole lot of sources, it makes it hard to find where the problem lies. 

We had to spend a lot of time logging into various systems and perusing a billion different log files, looking for something that stood out as a possible cause of the issue. That can take a lot of time and doesn't give much visibility into the possible interactions between systems.

What is most valuable?

Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate. 

It has a lot of ability to make fancy and deep searches using regular expressions and to graph them into useful and interesting dashboard graphs. 

The plethora of built-in/downloadable integrations makes it much easier to set up for our platforms. Otherwise, we'd have to parse the log files ourselves, which would take a great deal of effort. I had to do that before when I had to use an ELK stack for logging, which was painful.

What needs improvement?

Some of the interface is still confusing to use. It has many features, and it takes a lot of effort to figure out what they all mean. Maybe having tooltips or something would be helpful. Also, some of the integrations are better than others.

For how long have I used the solution?

I've used the solution for a month.

What do I think about the stability of the solution?

The solution seems very stable.

Which solution did I use previously and why did I switch?

I have used an ELK stack before. However, it took a lot of effort to maintain, and parsing the logs was difficult.

How was the initial setup?

We implemented the solution in-house.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2004069 - PeerSpot reviewer
support Eng
Real User
Helpful dashboards with a good cloud security posture manager and cloud workload security
Pros and Cons
  • "It helps us better manage our logs."
  • "They should continue expanding and integrating with more third-party apps."

What is our primary use case?

We use the application for our application monitoring, data security monitoring, and log management. What we like about the application is that it helps us to track issues more proactively instead of reactively.

There are other improvements we would like to see.

1. Being able to restrict users from viewing specific dashboards once they log in.

2. They can cut down the prices for Cloud SIEM. It seems very useful; however, the prices are high. Some organizations are finding it difficult to make a decision about getting the tool.

How has it helped my organization?

We use the application for our application monitoring, data security monitoring, and log management. It helps us to track issues proactively instead of reactively.

It helps us better manage our logs.

We can effectively track down issues.

We have dashboards that give us an overview of our environment.

What is most valuable?

The tools I have found useful include the Datadog cloud security posture manager and cloud workload security.

What needs improvement?

Datadog is a great tool, and we value the services they offer. They should continue expanding and integrating with more third-party apps.

For how long have I used the solution?

I've used the solution for three years. 

What do I think about the stability of the solution?

I love its stability.

What do I think about the scalability of the solution?

It is very scalable.

How are customer service and support?

Technical support has been great.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used AWS.

How was the initial setup?

The initial setup is not too complex.

What was our ROI?

We've seen an ROI of 50%.

What's my experience with pricing, setup cost, and licensing?

It's a little pricey yet worth it.

Which other solutions did I evaluate?

We did not previously evaluate another solution.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003214 - PeerSpot reviewer
Sr. Director of Software Engineering at a tech consulting company with 1,001-5,000 employees
Real User
Helpful support, good incident management, and helps triage faster
Pros and Cons
  • "The RUM solution has improved our ability to triage faster and hand more capabilities to our customer support."
  • "The pricing is a bit confusing."

What is our primary use case?

The RUM is implemented for customer support session replays to quickly route, triage, and troubleshoot support issues which can be sent to our engineering teams directly. 

Customer Support will log in directly after receiving a customer request and work on the issue. Engineers will utilize the replay along with RUM to pinpoint the issue combined with APM and Infra trace to be able to look for signals to find the direct cause of the customer impact. 

Incident management will be utilized to open a Jira ticket for engineering, and it integrates with ITSM systems and on-call as needed.

How has it helped my organization?

The RUM solution has improved our ability to triage faster and hand more capabilities to our customer support.

The RUM is implemented for customer support. It can quickly route, triage, and troubleshoot support issues that are sent to our engineering teams. 

Customer support can log in and start troubleshooting after receiving a customer request. The replay and RUM help pinpoint the issue. This functionality is combined with APM and Infra trace to be able to look for the cause of the issue. Incident management is leveraged to open a Jira ticket for engineering, and it can integrate with ITSM systems and on-call as needed.

What is most valuable?

RUM with session replay combined with a future use case to support synthetics will help to identify issues earlier in our process. We have not rolled this out yet but plan for it as a future use case for our customer support process. This, combined with integrated automation for incident management, will drive down our MTTR and time spent working through tickets. Overall, we are hoping to use this to look at our data and perfection rate over time in a BI-like way to reduce our customer support headcount by saving on time spent.

What needs improvement?

I would like to see retention options greater than 30 days for session replay. I'd also like to see forwarding options for retention to custom solutions, and a greater ability to event and export data from the tooling overall to BI/DW solutions for long-term reporting and to see trends as needed.

For how long have I used the solution?

I've used the solution for about nine months.

What do I think about the stability of the solution?

So far, stability has been great.

What do I think about the scalability of the solution?

I'd like to see more bells and whistles added over time. Widgets are coming soon to help with RUM.

How are customer service and support?

Support is very good. They are responsive and give us the help we need.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We have utilized New Relic, though not for RUM. We went with Datadog to potentially switch the entire platform into an all-in-one solution that makes sense for a company of our size.

How was the initial setup?

We started on the beta, and the documentation was lagging behind. We also needed direct instructions and links from the customer support/account representative that were not immediately available by searching online.

What about the implementation team?

We implemented the solution ourselves.

What was our ROI?

Ideally, this will inform our strategy to not increase our customer support headcount as significantly into 2023 and beyond.

What's my experience with pricing, setup cost, and licensing?

The pricing is a bit confusing. However, the RUM session replay, in general, is very inexpensive compared to whole solutions.

Which other solutions did I evaluate?

We looked into LogRocket and New Relic.

What other advice do I have?

I'd advise other users to try it out.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user