Tony Martinez1 - PeerSpot reviewer
Works at VANTA INC
Vendor
Great logging, session replays, and alerting
Pros and Cons
  • "Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening."
  • "The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog."

What is our primary use case?

Our primary use cases include:

  • Alert on errors customers encounter in our product. We've set up logs that go to Slack to tell us when a certain error threshold is hit.
  • Investigate slow page load times. We have pages in our app that are loading slowly and the logs help us figure out which queries are taking the longest time.
  • Metrics. We collect metrics on product usage.
  • Session replays. We watch session replays to see what a user was doing when a page took a long time to load or hit an error. This is helpful.
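
The first use case above, alerting once an error threshold is hit, can be illustrated with a short sketch. This is plain Python meant only to show the idea; it is not how Datadog monitors are implemented, and the function name, the five-minute window, and the threshold of 10 are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

def should_alert(error_times, now, window=timedelta(minutes=5), threshold=10):
    """Return True when more than `threshold` errors fall inside the trailing window."""
    cutoff = now - window
    recent = [t for t in error_times if cutoff <= t <= now]
    return len(recent) > threshold

# Made-up data: 12 errors spaced 20 seconds apart, all inside the last 5 minutes.
now = datetime(2025, 3, 1, 12, 0, 0)
errors = [now - timedelta(seconds=20 * i) for i in range(12)]
print(should_alert(errors, now))
```

In a real setup the "alert" branch would post to a chat channel; here it is reduced to a boolean so the threshold logic stays visible.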

How has it helped my organization?

It's helped us find bugs that customers are experiencing before they're reported to us. Sometimes, customers don't report errors, so catching them ourselves lets us investigate before other users run into the same errors.

Datadog has helped us investigate slow page loading times and even see the specific queries that are taking a long time to load.

Logging lets us see the context around an error. For example, see if a backend service had an error before it surfaced on the frontend.

Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening.

What is most valuable?

The most valuable aspects include: 

  • Logging. Being able to view detailed logs helps debug issues.
  • Session replays. They are helpful for seeing what a customer was doing before they saw an error or had a slow page load.
  • Alerting. This is an important part of our on-call process to send alerts to Slack when an error threshold is crossed. Alerts/monitors are easy to configure to only alert when we want them to alert.
  • Dashboards. It's helpful to pull up dashboards that show our most common errors or page performance. It's a good way to get a bird's-eye view of how the app is performing.

What needs improvement?

The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog.

The log querying syntax can be confusing. Usually, I filter by finding a facet in a log and selecting to filter by that facet, but I'm not sure how to write the filter myself.
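
For readers in the same position, here is a hedged sketch of what a hand-written filter can look like. It assumes the commonly documented log search conventions: reserved attributes such as `service` and `status` are written as `key:value`, facets are prefixed with `@`, a leading `-` excludes a term, and terms separated by spaces are combined with AND. The helper `build_log_query` and the service name `web-app` are hypothetical and exist only for this example.

```python
def build_log_query(service=None, status=None, facets=None, exclude=None):
    """Join search terms with spaces, which the log search treats as AND."""
    terms = []
    if service:
        terms.append(f"service:{service}")
    if status:
        terms.append(f"status:{status}")
    for name, value in (facets or {}).items():
        terms.append(f"@{name}:{value}")   # facet filter
    for name, value in (exclude or {}).items():
        terms.append(f"-@{name}:{value}")  # excluded facet value
    return " ".join(terms)

# Hypothetical service name; the range form selects 5xx status codes.
query = build_log_query(
    service="web-app",
    status="error",
    facets={"http.status_code": "[500 TO 599]"},
)
print(query)
```

Clicking a facet in the UI produces the same kind of string in the search bar, so comparing the two is one way to learn the syntax.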

The monitor/alert syntax is also somewhat hard to understand.

Overall, it should be easier to learn how to use the product while you're using the product. Perhaps tooltips or a link to learn more about whatever section you're using.

Buyer's Guide
Datadog
March 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: March 2025.
839,319 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for two years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

Which other solutions did I evaluate?

We did not evaluate other options. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Works at Koddi
User
Improved response time and cost-efficiency with good monitoring
Pros and Cons
  • "The server monitoring, service monitoring, and user session monitoring are extremely helpful, as they allow us to be alerted ahead of time of issues that users might experience."
  • "I would also like to see an improvement in the server's data extraction times, as sometimes it can take up to ten minutes to download a report for a critical issue that is costing us money."

What is our primary use case?

We monitor our multiple platforms using Datadog and post alerts to Slack to notify us of server and end-user issues. We also monitor user sessions to help troubleshoot an issue being reported. 

We monitor 3.5 platforms on our Datadog instance, and the team always monitors the trends and dashboards we set up. We have two instances to span the 3.5 platforms and are currently looking to implement more platform monitoring over time. The user session monitoring is consistent for one of these platforms. 

How has it helped my organization?

Datadog has improved our response time and cost-efficiency in bug reporting and server maintenance. We're able to track our servers more fluidly, allowing us to expand our outreach and decrease response time. 

There are many different ways that Datadog is used, and we monitor three and a half platforms on the Datadog environment at this time. By monitoring all of these platforms in one easy-to-use instance, we're able to track the platform with the issue, the issue itself, and its impact on the end user. 

What is most valuable?

The server monitoring, service monitoring, and user session monitoring are extremely helpful, as they allow us to be alerted ahead of time of issues that users might experience. More often than not, we can not only identify an issue but also fix it and release the fix before an end user notices it. 

We are currently using this as an investigative tool to notice trends, identify issues, and locate areas of our program that we can improve upon that haven't been identified as pain points yet. This is another effective use case. 

What needs improvement?

I would like to see a longer retention time for user sessions, even if only by 24 to 48 hours, or at least the option to configure it. This would let us keep user sessions that have gone unnoticed for a long time and identify issues that people are working around. 

I would also like to see an improvement in the server's data extraction times, as sometimes it can take up to ten minutes to download a report for a critical issue that is costing us money. Regardless, I am very happy with Datadog and love the uses we have for the program so far.  

For how long have I used the solution?

I've used the solution for more than four years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Caleb Parks - PeerSpot reviewer
Works at iSpot.tv
User
Lots of features with a rapid log search and an easy setup process
Pros and Cons
  • "The ease of graph building is nice, and MUCH easier than Prometheus."
  • "It is far too easy to run up huge unexpected costs."

What is our primary use case?

We use the solution for logs, infrastructure metrics, and APM. We have many different teams using it across both product and data engineering.

How has it helped my organization?

The solution has improved our observability by giving us rapid log search, a correlation between hosts/logs/APM, and tons of features in one website.

What is most valuable?

I enjoy the rapid log search. It's such a pleasure to quickly find what you're looking for. The ease of graph building is also nice, and MUCH easier than Prometheus.

What needs improvement?

It is far too easy to run up huge unexpected costs. The billing model is not flexible enough to handle cases where you temporarily have thousands of nodes. It is not cost-effective for monitoring big data jobs. We had to switch to open-source Grafana plus Prometheus for those.

It would be cool to have an OpenTelemetry agent that automatically instruments everything for APM in the next release.

For how long have I used the solution?

I've used the solution for three years.

What do I think about the stability of the solution?

I'd rate the stability ten out of ten.

What do I think about the scalability of the solution?

I'd rate the scalability ten out of ten.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

How was the initial setup?

The setup is very straightforward. Users just install the helm chart, and boom, you're done.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

Be careful about pricing. Make sure you understand the billing model and that there are multiple billing models available. Set up alarms to alert you of cost overruns before they get too bad.
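
One simple way to reason about a cost-overrun alarm is a straight-line projection of month-to-date spend. The sketch below is only an illustration under that assumption; the function names and the dollar figures are made up, and it does not reflect any particular billing model.

```python
def projected_month_end(spend_to_date, day_of_month, days_in_month):
    """Extrapolate month-to-date spend linearly to the end of the month."""
    return spend_to_date / day_of_month * days_in_month

def over_budget(spend_to_date, day_of_month, days_in_month, budget):
    """True when the linear projection exceeds the monthly budget."""
    return projected_month_end(spend_to_date, day_of_month, days_in_month) > budget

# Made-up numbers: 6,000 spent by day 10 of a 30-day month, budget of 15,000.
print(projected_month_end(6000, 10, 30))
print(over_budget(6000, 10, 30, 15000))
```

A linear projection will miss short bursts, such as temporarily running thousands of nodes, so a tighter check on daily spend may be needed alongside it.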

Which other solutions did I evaluate?

We've never evaluated other solutions.

What other advice do I have?

It's a great product. However, you have to pay for quality.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Ajay Thomas - PeerSpot reviewer
Engineering Manager at dbt Labs
Vendor
Great features and synthetic testing but pricing can get expensive
Pros and Cons
  • "We have been impressed with the uptime and clean and light resource usage of the agents."
  • "I like the idea of monitoring on the go yet it seems the options are still a bit limited out of the box."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing, and alerting. We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications. 

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

By using Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. 

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Legacy .NET Framework and Windows Event Viewer as well as cutting-edge .NET Core with streaming logs all work. 

The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. The centralized pipeline tracking and error logging provides a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box.

While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas, specifically .NET Profiling and Tracing of IIS-hosted apps, that need a lot of focus to pick up on the key details needed. In some cases, the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution is very scalable, very customizable.

How are customer service and support?

Service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux, Windows, or container, cloud or on-prem hosted.

How was the initial setup?

The setup was generally simple. However, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house. 

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling. 

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

I'm excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Software Engineer at Clearstory.build
User
Top 10
Excellent for monitoring, analyzing, and optimizing performance
Pros and Cons
  • "Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization."
  • "The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency."

What is our primary use case?

Our primary use case for Datadog is monitoring, analyzing, and optimizing the performance and health of our applications and infrastructure. 

We leverage its logging, metrics, and tracing capabilities to pinpoint issues, track system performance, and improve overall reliability. Datadog’s ability to provide real-time insights and alerting on key metrics helps us quickly address issues, ensuring smooth operations. 

It’s integral for visibility across our microservices architecture and cloud environments.

How has it helped my organization?

Datadog has been incredibly valuable to our organization. Its ability to pinpoint warnings and errors in logs and provide detailed context is essential for troubleshooting. 

The platform's request tracing feature offers comprehensive insights into user flows, allowing us to quickly identify issues and optimize performance. 

Additionally, Datadog's real-time monitoring and alerting capabilities help us proactively manage system health, ensuring operational efficiency across our applications and infrastructure.

What is most valuable?

Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization. This feature helps us quickly identify performance bottlenecks and prioritize improvements. 

Additionally, the ability to filter requests by user email is extremely useful for tracking down user-specific issues faster. It streamlines the troubleshooting process and enables us to provide more targeted support to individual users, improving overall customer satisfaction.
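
The two filters described here, by latency and by user email, amount to simple predicates over request records. The sketch below shows the idea on in-memory data; the `Request` fields, the helper names, and the sample values are all hypothetical and are not taken from Datadog's APM data model.

```python
from dataclasses import dataclass

@dataclass
class Request:
    endpoint: str
    latency_ms: float
    user_email: str

def slowest_first(requests, min_latency_ms):
    """Keep requests at or above the latency floor, slowest first."""
    slow = [r for r in requests if r.latency_ms >= min_latency_ms]
    return sorted(slow, key=lambda r: r.latency_ms, reverse=True)

def for_user(requests, email):
    """Keep only the requests made by one user."""
    return [r for r in requests if r.user_email == email]

# Made-up sample data.
sample = [
    Request("/reports", 2400.0, "ana@example.com"),
    Request("/login", 120.0, "ben@example.com"),
    Request("/search", 950.0, "ana@example.com"),
]

for r in slowest_first(sample, 500):
    print(r.endpoint, r.latency_ms)
print(len(for_user(sample, "ben@example.com")))
```

Sorting the slow requests in descending order puts the endpoints most in need of optimization at the top of the list.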

What needs improvement?

The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency. Additionally, the interface can sometimes feel overwhelming, with so much happening at once, which may discourage users from exploring new features. Simplifying the layout or providing clearer guidance could enhance user experience. Any improvements related to query optimization would be highly beneficial, as it would further streamline workflows and boost productivity.

For how long have I used the solution?

I've used the solution for five years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Software Engineer at Clearstory.build
User
Top 10
Capable of pinpointing warnings and errors in logs and providing detailed context
Pros and Cons
  • "Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization."
  • "The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency."

What is our primary use case?

Our primary use case for Datadog is to monitor, analyze, and optimize the performance and health of our applications and infrastructure. 

We leverage its logging, metrics, and tracing capabilities to pinpoint issues, track system performance, and improve overall reliability. 

Datadog’s ability to provide real-time insights and alerting on key metrics helps us quickly address issues, ensuring smooth operations. It’s integral for visibility across our microservices architecture and cloud environments.

How has it helped my organization?

Datadog has been incredibly valuable to our organization. Its ability to pinpoint warnings and errors in logs and provide detailed context is essential for troubleshooting. 

The platform's request tracing feature offers comprehensive insights into user flows, allowing us to quickly identify issues and optimize performance. 

Additionally, Datadog's real-time monitoring and alerting capabilities help us proactively manage system health, ensuring operational efficiency across our applications and infrastructure.

What is most valuable?

Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization. This feature helps us quickly identify performance bottlenecks and prioritize improvements. 

Additionally, the ability to filter requests by user email is extremely useful for tracking down user-specific issues faster. It streamlines the troubleshooting process and enables us to provide more targeted support to individual users, improving overall customer satisfaction.

What needs improvement?

The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency. 

Additionally, the interface can sometimes feel overwhelming, with so much happening at once, which may discourage users from exploring new features. 

Simplifying the layout or providing clearer guidance could enhance user experience. Any improvements related to query optimization would be highly beneficial, as it would further streamline workflows and boost productivity.

For how long have I used the solution?

I've used the solution for five years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Mason Parry - PeerSpot reviewer
Data Engineer at Nursa
User
Top 20
Customizable alerts, good dashboards, and improves reliability
Pros and Cons
  • "I like how we can customize alerts, and when alerts have become too noisy, we turn their threshold down fairly easily."
  • "It's not that straightforward when creating an alert. The syntax is a little confusing."

What is our primary use case?

We have several teams and several different projects, all working in tandem, so there are a lot of logs and monitoring that need to be done. We use Datadog mostly for alerting when things go down. 

We also have several dashboards to keep track of critical operations and to make sure things are running without issues. The Slack messaging is essential in our workflow in letting us know when an alert is triggered. I also appreciate all the graphs you can make, as it gives our team a good overview of how our services are doing.

How has it helped my organization?

It has improved our reliability and our time to get back up from an outage. By creating an alert and then messaging a Slack channel, we know when something goes down fairly fast. This, in turn, improves our response time to swarm on an issue without it affecting customers. The graphs have also been useful to demonstrate to higher-ups how our services are performing, allowing them to make more informed decisions when it comes to the team. 

What is most valuable?

The alerts are the most valuable. Having alerts has saved us countless times in the past and is essentially what we use Datadog for. 

I like how we can customize alerts, and when alerts have become too noisy, we turn their threshold down fairly easily. This is also the case when alerts should be notifying us more often. 

I also like the graphs and how customizable they are. It allows us to create a nice-looking dashboard with all sorts of information relating to our project. This gives us a quick overview of how things are going.

What needs improvement?

It's not that straightforward when creating an alert. The syntax is a little confusing. I guess that the trade-off is customizability. But it would be nice to have a click-and-drag kind of way when creating an alert. So, if someone who isn't so familiar with Datadog or tech in general wanted to create an alert, they wouldn't need to know the syntax. 
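
To make the syntax less opaque, the sketch below assembles a metric monitor query from its parts. It assumes the documented general shape `time_aggr(time_window):space_aggr:metric{tags} operator threshold`; the helper `metric_monitor_query` and the `env:prod` tag are hypothetical, and log-based monitors use a different query form that is not shown here.

```python
def metric_monitor_query(time_aggr, window, space_aggr, metric, scope, operator, threshold):
    """Assemble a metric monitor query string from its named parts."""
    return f"{time_aggr}({window}):{space_aggr}:{metric}{{{scope}}} {operator} {threshold}"

# Average user CPU over the last 5 minutes, across hosts tagged env:prod, above 90.
query = metric_monitor_query("avg", "last_5m", "avg", "system.cpu.user", "env:prod", ">", 90)
print(query)
```

Naming each part separately is roughly what a form-based or click-and-drag editor would do behind the scenes before producing the query string.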

It would also be great if AI could be used to generate alerts and graphs. I could write a short prompt, and then the AI could auto-generate alerts and graphs for me.

For how long have I used the solution?

I've used the solution for more than two years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Architect at SEI Investments
Real User
Great support with a helpful APM and profiler
Pros and Cons
  • "The most valuable aspects of the product include the APM and profiler."
  • "I find the training great. That said, it is set for the LCD (lowest common denominator). Of course, this is very helpful to sell the product, yet, to really utilize the product, you need to get more detailed."

What is our primary use case?

We primarily use Datadog for:

  • Native memory
  • Logging
  • APM
  • Context switching
  • RUM
  • Synthetic
  • Databases
  • Java
  • JVM settings
  • File i/o
  • Socket i/o
  • Linux
  • Kubernetes
  • Kafka
  • Pods
  • Sizing

We are testing Datadog as a way to reduce our operational time to fix things (mean time to repair). This is step one. We hope to use Datadog as a way to be proactive instead of reactive (mean time to failure).

So far, Datadog has shown very good options to work on all of our operational and development issues. We are also trying to use Datadog to shift left, and fix things before they break (MTTF increase).

How has it helped my organization?

We are currently in a POC and do not own Datadog at the moment. 

So far, there have been a few issues due to security. There are two main security issues. 

The first is moving data off-prem. This has been resolved to a point (filtering logs, etc). However, there is still an issue with moving a JFR as a JFR potentially contains data that is not allowed off-prem.

The second security issue is more internal. The main installation requires root access or using an ACL, and our company does not use ACLs on our Linux platform. This is problematic since the install sets a no-login on the Datadog user.

What is most valuable?

The most valuable aspects of the product include the APM and profiler.

These two have given us insights into things that are very difficult to track down given the standard OS (Linux) tools. 

The native memory tracking is super difficult to see exactly where it comes from. I attended a course (continuous profiling), and it showed me the potentially very important capabilities.

If you add these details to a standard dashboard, or a sub-dashboard for techy people, or even just a notebook, it would be easy to identify issues before they occur.

Combining these details with the basic tools (infra, logging, APM, and good rules), Datadog can easily show the details that a true engineer would need. It isn't just for monitoring; I see the value in it for engineers.

What needs improvement?

I have done every training offered (and in a short period of time: two days for 20 courses).

I find the training great. That said, it is set for the LCD (lowest common denominator). Of course, this is very helpful to sell the product, yet, to really utilize the product, you need to get more detailed.

If I did the training as it is written and cut and pasted a bunch of stuff and saw it work, I didn't really learn anything. In later sessions, I quit using the editor, switched to vi, stopped cutting and pasting, and learned much more.

For how long have I used the solution?

I've used the solution for one month.

What do I think about the stability of the solution?

I'd give stability a thumbs up.

What do I think about the scalability of the solution?

We are not sure yet in terms of scalability. The off-prem solution seems to scale well (although we had issues with the training slowing down).

How are customer service and support?

Technical support is great.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I previously used Dynatrace and Elastic. We didn't switch. We are in a POC.

How was the initial setup?

The initial setup is simple yet complex. Too many teams are needed.

What about the implementation team?

We did the initial setup in-house.

What was our ROI?

In terms of ROI, the labor saving is probably the biggest. The NPR is probably second - although management would probably reverse these.

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing are fairly complicated. A GB for .1 sounds great; however, once you put all 16 or so prices together, it adds up fast. A cost model sheet on the main site would be very helpful.

Which other solutions did I evaluate?

We are currently in a POC.

What other advice do I have?

We work with all product versions.

Which deployment model are you using for this solution?

On-premises

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user