Works at iSpot.tv
Lots of features with a rapid log search and an easy setup process
Pros and Cons
- "The ease of graph building is nice, and MUCH easier than Prometheus."
- "It is far too easy to run up huge unexpected costs."
What is our primary use case?
We use the solution for logs, infrastructure metrics, and APM. We have many different teams using it across both product and data engineering.
How has it helped my organization?
The solution has improved our observability by giving us rapid log search, correlation between hosts, logs, and APM, and tons of features on one website.
What is most valuable?
I enjoy the rapid log search. It's such a pleasure to quickly find what you're looking for. The ease of graph building is also nice, and MUCH easier than Prometheus.
What needs improvement?
It is far too easy to run up huge unexpected costs. The billing model is not flexible enough to handle cases where you temporarily have thousands of nodes. It is not cost-effective for monitoring big data jobs. We had to switch to open-source Grafana plus Prometheus for those.
It would be cool to have an OpenTelemetry agent that automatically instruments everything for APM in the next release.
Buyer's Guide
Datadog
November 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.
For how long have I used the solution?
I've used the solution for three years.
What do I think about the stability of the solution?
I'd rate the stability ten out of ten.
What do I think about the scalability of the solution?
I'd rate the scalability ten out of ten.
Which solution did I use previously and why did I switch?
We did not previously use a different solution.
How was the initial setup?
The setup is very straightforward. You just install the Helm chart, and boom, you're done.
What about the implementation team?
We handled the setup in-house.
What's my experience with pricing, setup cost, and licensing?
Be careful about pricing. Make sure you understand the billing model and that there are multiple billing models available. Set up alarms to alert you of cost overruns before they get too bad.
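The advice about cost-overrun alarms can be illustrated with a small sketch. It assumes you can obtain a month-to-date spend figure from somewhere (the review does not say how); the function names, the linear projection, and the 80% warning threshold are all hypothetical choices, not anything Datadog provides.

```python
from dataclasses import dataclass


@dataclass
class BudgetCheck:
    projected_cost: float  # estimated spend at month end
    budget: float          # agreed monthly budget
    should_alert: bool     # True when the projection crosses the warning line


def project_month_end_cost(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Linearly extrapolate month-to-date spend to the full month."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must fall within the month")
    return spend_to_date / day_of_month * days_in_month


def check_budget(spend_to_date: float, day_of_month: int, days_in_month: int,
                 budget: float, warn_ratio: float = 0.8) -> BudgetCheck:
    """Flag an alert when projected spend reaches warn_ratio of the budget."""
    projected = project_month_end_cost(spend_to_date, day_of_month, days_in_month)
    return BudgetCheck(projected, budget, projected >= budget * warn_ratio)
```

Alerting on the projection rather than on actual spend is the point of the sketch: it warns while there is still time left in the month to react.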
Which other solutions did I evaluate?
We've never evaluated other solutions.
What other advice do I have?
It's a great product. However, you have to pay for quality.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Oct 1, 2024
Software Engineer 2 at Modernizing Medicine
Intuitive user interface with good log management and a helpful Log Explorer feature
Pros and Cons
- "The ease of use allowed me to get up to speed with log management since it's my first time using Datadog."
- "Interactive tutorials could be a game changer."
What is our primary use case?
In our fast-paced environment, managing and analyzing log data and performance metrics is crucial. That’s where Datadog comes in. We rely on it not just for monitoring but for deeper insights into our systems, and here’s how we make the most of it.
One of the first things we appreciate about Datadog is its ability to centralize logs from various sources—think applications, servers, and cloud services. This means we can access everything from one dashboard, which saves us a lot of time and hassle. Instead of digging through multiple platforms, we have all our log data in one place, making it much easier to track events and troubleshoot issues.
How has it helped my organization?
Before Datadog, we faced the common challenge of fragmented data. Our logs, metrics, and traces were spread across different tools and platforms, making it difficult to get a complete picture of our system’s health.
With Datadog, we now have a centralized monitoring solution that aggregates everything in one place. This has streamlined our workflow immensely. Whether it’s logs from our servers, metrics from our applications, or traces from user transactions, we can access all this information easily. This unified view has made it simpler for our teams to identify and troubleshoot issues quickly.
What is most valuable?
In my experience with Datadog, one feature stands out above the rest: the Log Explorer. It has completely transformed the way I interact with our log data and has become an essential part of my daily workflow.
The user interface is incredibly intuitive. When I first started using it, I was amazed at how easy it was to navigate. The design is clean and straightforward, allowing me to focus on the data rather than getting lost in complicated menus. Whether I’m searching for specific log entries or filtering by certain criteria, everything feels seamless.
This ease of use allowed me to get up to speed with log management since it's my first time using Datadog.
What needs improvement?
Interactive tutorials could be a game changer. Instead of just reading about how to use query filters, users could engage with step-by-step guides that walk them through the process. For example, a tutorial could start with a simple query and gradually introduce more complex filtering techniques, allowing users to practice along the way. These tutorials could include pop-up tips and hints that provide additional context or best practices as users work through examples. This hands-on approach not only reinforces learning but also builds confidence in using the tool.
For how long have I used the solution?
My company recently made Datadog available to its software engineers, and I personally have been using it for almost a year now.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Oct 2, 2024
Engineering Manager at dbt Labs
Great features and synthetic testing but pricing can get expensive
Pros and Cons
- "We have been impressed with the uptime and clean and light resource usage of the agents."
- "I like the idea of monitoring on the go yet it seems the options are still a bit limited out of the box."
What is our primary use case?
Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing, and alerting. We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.
Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.
How has it helped my organization?
Through use of Datadog across all of our apps we were able to consolidate a number of alerting and error tracking apps and Datadog ties them all together in cohesive dashboards.
Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting edge .NET Core with streaming logs all work.
The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.
What is most valuable?
When it comes to Datadog, several features have proven particularly valuable. The centralized pipeline tracking and error logging provides a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.
Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.
Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.
What needs improvement?
I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed. In some cases, the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.
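To make the log-to-trace correlation point concrete, here is a minimal sketch in Python rather than the reviewer's .NET stack. It assumes the environment variables in question are `DD_ENV`, `DD_SERVICE`, and `DD_VERSION` (Datadog's unified service tagging variables); the review does not name them. The `trace_id_getter` callable, the field names in the JSON output, and the logger name are hypothetical stand-ins for whatever a real tracing library would supply.

```python
import io
import json
import logging
import os


class CorrelationFilter(logging.Filter):
    """Attach service metadata and the current trace ID to every record."""

    def __init__(self, trace_id_getter):
        super().__init__()
        self._trace_id_getter = trace_id_getter  # hypothetical hook into a tracer

    def filter(self, record):
        # Logs and traces can only be joined if both carry the same values here.
        record.env = os.environ.get("DD_ENV", "unknown")
        record.service = os.environ.get("DD_SERVICE", "unknown")
        record.version = os.environ.get("DD_VERSION", "unknown")
        record.trace_id = self._trace_id_getter()
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, including the correlation fields."""

    def format(self, record):
        return json.dumps({
            "message": record.getMessage(),
            "env": record.env,
            "service": record.service,
            "version": record.version,
            "trace_id": record.trace_id,
        })


def build_logger(stream, trace_id_getter):
    logger = logging.getLogger("correlation-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(CorrelationFilter(trace_id_getter))
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def demo():
    """Log one line into an in-memory stream and return it parsed."""
    stream = io.StringIO()
    logger = build_logger(stream, trace_id_getter=lambda: "abc123")
    logger.info("checkout failed")
    return json.loads(stream.getvalue())
```

If any of the three variables is unset or differs between the process that emits traces and the one that emits logs, the fields will not line up, which is one plausible way the setup could take longer than expected.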
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
We have been impressed with the uptime and clean and light resource usage of the agents.
What do I think about the scalability of the solution?
The solution is very scalable, very customizable.
How are customer service and support?
Service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.
Which solution did I use previously and why did I switch?
We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux or Windows or Container, cloud or on-prem hosted.
How was the initial setup?
The setup was generally simple. However, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.
What's my experience with pricing, setup cost, and licensing?
Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.
Which other solutions did I evaluate?
New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.
What other advice do I have?
I'm excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 30, 2024
Senior Cloud Engineer, Vice President of Monitoring at a financial services firm with 10,001+ employees
Good ServiceNow integration, helpful API crawlers, and useful APM metrics
Pros and Cons
- "The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze."
- "It seems that admin cost control granularity is an afterthought."
What is our primary use case?
We are using the solution for migrating out of the data center. Old apps need to be re-architected. We are planning on moving to multi-cloud for disaster recovery and to avoid vendor lock-in.
The migration is a mix between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it's important not to cut corners, which is why we needed observability.
How has it helped my organization?
Using the product has caused a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real-time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.
What is most valuable?
For us, the most valuable features are infrastructure and APM metrics.
The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze.
We rely heavily on the API crawlers Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without having to also make them add it at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams.
With the ServiceNow integration, we can also assign tickets based on the environment. Now our top teams are using the APM/profiler to find bottlenecks and improve the speed of our apps.
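The tag-driven routing described above can be sketched in a few lines. This is not Datadog's monitor syntax or its ServiceNow integration; it is a plain-Python illustration of the idea that `key:value` tags already present on a resource decide who gets notified and how urgently. The team handles, the `team` and `env` tag keys, and the priority rule are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from a "team" tag value to a notification handle.
TEAM_HANDLES: Dict[str, str] = {
    "payments": "@team-payments",
    "search": "@team-search",
}
DEFAULT_HANDLE = "@team-monitoring"  # fallback when no team tag matches


def parse_tags(tags: List[str]) -> Dict[str, str]:
    """Turn ["team:payments", "env:prod"] into {"team": "payments", "env": "prod"}."""
    parsed: Dict[str, str] = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:  # skip tags that are not key:value pairs
            parsed[key] = value
    return parsed


def route_alert(tags: List[str]) -> Dict[str, str]:
    """Pick the recipient from the team tag and the priority from the env tag."""
    parsed = parse_tags(tags)
    handle = TEAM_HANDLES.get(parsed.get("team", ""), DEFAULT_HANDLE)
    priority = "high" if parsed.get("env") == "prod" else "low"
    return {"notify": handle, "priority": priority}
```

Because routing depends only on tags, a newly launched instance is covered as soon as it carries the right tags, with no per-instance lookup to maintain.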
What needs improvement?
The real issue with this product is cost control. For example, when logs first came out they didn't have any index cuts. This caused runaway logs and exploding costs.
It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.
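The 5X figure follows from simple arithmetic, sketched below under the assumption that synthetic test cost scales linearly with the number of test runs (the review implies this but does not spell out the pricing). The function names and the 30-day month are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_month(interval_minutes: int, locations: int = 1, days: int = 30) -> int:
    """Number of test runs in a month for one test at a fixed interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return (MINUTES_PER_DAY // interval_minutes) * locations * days


def savings_factor(current_interval: int, proposed_interval: int) -> float:
    """How many times fewer runs the proposed interval produces."""
    return runs_per_month(current_interval) / runs_per_month(proposed_interval)
```

A test firing every minute runs 43,200 times in 30 days, while one firing every five minutes runs 8,640 times, so `savings_factor(1, 5)` is 5.0.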
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
The solution is very stable. There are not too many outages, and they fix them fast.
What do I think about the scalability of the solution?
It is easy to scale. That is why we adopted it.
How are customer service and support?
Before premium support, I would avoid using them as it was so bad.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.
How was the initial setup?
The initial setup was not difficult. We just had to teach teams the concept of tags.
What about the implementation team?
We did the implementation in-house. It was me. I am the SME for Datadog at the company.
What was our ROI?
The solution has saved months of time and reduced blindspots for all app teams.
What's my experience with pricing, setup cost, and licensing?
I'd advise users to be careful with logs and the APM as those are the ones that can get expensive fast.
Which other solutions did I evaluate?
We looked into Dynatrace. However, we found the cost to be high.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Software Engineer at Apple
Consolidates alerts, offers comprehensive views, and has synthetic testing
Pros and Cons
- "The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly."
- "I like the idea of monitoring on the go, however, it seems the options are still a bit limited out of the box."
What is our primary use case?
Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting.
We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.
We're managing a hybrid multi-cloud solution across hundreds of applications, which is always a challenge. Datadog agents on each web host and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.
How has it helped my organization?
Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.
What is most valuable?
When it comes to Datadog, several features have proven particularly valuable.
The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.
Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.
Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.
What needs improvement?
I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view.
I like the idea of monitoring on the go; however, it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed.
Sometimes, the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
We have been impressed with the uptime and clean and light resource usage of the agents.
What do I think about the scalability of the solution?
The product is very scalable and very customizable.
How are customer service and support?
Technical support is always helpful to help us tune our committed costs and alert us when we start spending out of the on-demand budget.
Which solution did I use previously and why did I switch?
We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux or Windows or Container, cloud or on-prem hosted.
How was the initial setup?
The setup is generally simple. .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
ROI is reflected in significant time saved by the development team assessing bugs and performance issues.
What's my experience with pricing, setup cost, and licensing?
Set up live trials to assess cost scaling. Small decisions around how monitors are used can impact cost scaling.
Which other solutions did I evaluate?
New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.
What other advice do I have?
We're excited to explore the new offerings around LLM further and continue to expand our presence in Datadog.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 20, 2024
SRE at a financial services firm with 10,001+ employees
Excellent synthetic monitoring, APM, and alert features
Pros and Cons
- "The monitoring functionality, in general, and tagging infrastructure are great."
- "While the tool is robust with many different capabilities, users would greatly benefit from more examples in the documentation."
What is our primary use case?
We deploy various services for our main platform on AWS across multiple regions. We have a development environment, a staging environment, a QA environment, and a production environment. We deploy our many services across hundreds of instances.
We have many server farms, all responsible for various services on our market intelligence platform. The deployment of each server farm or even individual instances varies depending on what is being stood up. We have instances built in three different ways, with two different pipelines and some even on user data scripts.
How has it helped my organization?
My team has a 24/7 on-call schedule where we need to be ready to handle and mitigate incidents with the platform at any moment.
We have countless monitors set up on Datadog that alert directly to our queue using an email that generates a ticket.
The actionable steps for each type of monitor and its associated incident are easily included in the alerts whenever something is triggered. We generate links to the Datadog monitors and can instantly drill down into what went wrong and for how long.
What is most valuable?
The features I have found most helpful are synthetic monitoring, APM, and alert features. The monitoring functionality, in general, and tagging infrastructure are great.
Synthetics have become bread and butter for us as we have migrated many tests over to Datadog. We have simplified and consolidated our synthetic tests while also making them more robust with the help of Datadog's tagging.
A large portion of our monitoring is based on synthetics results, and alerts integrate seamlessly with our incident queue system. We use dashboards heavily.
The metrics capabilities are extremely helpful, and we use virtually all of the widgets.
What needs improvement?
The main area for improvement in Datadog is the documentation. While the tool is robust with many different capabilities, users would greatly benefit from more examples in the documentation.
The number of current code snippets available in the docs is not enough, and some need to be updated even today.
One function I would add would be a button to generate a report of the performance of a synthetic test and the performance of each of the steps in the test over time.
For how long have I used the solution?
How long we've used the solution varies by platform. We have one platform completely in the cloud and one still on-premises. We've had the solution for many years on AWS.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Staff Engineer at a tech services company with 1,001-5,000 employees
Great distributed tracing and flame graphs for debugging with a relatively painless setup
Pros and Cons
- "We like the distributed tracing and flame graphs for debugging. This has been invaluable for us during periods of high traffic or red alert conditions."
- "Once Datadog has gained wide adoption, it can often be overwhelming to both know and understand where to go to find answers to questions."
What is our primary use case?
We are using a mixture of on-prem and cloud solutions to bridge the gap with healthcare entities in the service of providing patients with the medication they need to live healthy lives.
Since we're a heavily regulated company, a lot of our solutions grew from on-premises monoliths. However, as we scaled out, it became harder and harder to move forward with that architecture. Today, we're investing heavily in transforming our systems from monoliths into distributed systems.
With this change in mind, the ability for us to connect the dots using Datadog has been invaluable.
How has it helped my organization?
We have an API that serves as a critical aspect of our system for generating new requests for us to process in service of a patient. This service has many tentacles, and it was always hard to track down how issues from this API are affecting things downstream. Since we've added more instrumentation in this API, Datadog has changed our status from a reactive posture to a proactive one.
It has also served as a prime example to other applications on what the benefit of a well-instrumented system is for that application and other applications around it. Due to this, more and more people are using Datadog.
What is most valuable?
We like the distributed tracing and flame graphs for debugging. This has been invaluable for us during periods of high traffic or red alert conditions. It has also informed our developers on how our various systems are interconnected and the downstream effects of the problems we might encounter for certain services.
We're still working on getting widespread adoption of these products. Still, we're already seeing developers' perspective shift from application-specific to a more holistic, systems-level view.
While this is not part of the question, it is relevant: now that I've learned more about RUM, it will be something that we leverage heavily moving forward to give us a complete view of our system from both the front-end and back-end perspectives.
What needs improvement?
Once Datadog has gained wide adoption, it can often be overwhelming to both know and understand where to go to find answers to questions. Currently, we use a combination of documentation and COPs to ensure that folks know how to leverage what we have in Datadog properly.
While the guides for Datadog go a long way, a way to switch the user experience between "advanced" and "novice" modes would help further.
For how long have I used the solution?
I've been using the solution for two years.
What do I think about the stability of the solution?
It has never failed us and therefore I consider it to be very stable.
What do I think about the scalability of the solution?
It's magic. For the most part, we just installed the product and a lot of it just worked out of the box.
How are customer service and support?
Technical support is excellent.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We have used Splunk, Sentry, and a suite of hand-made solutions. We switched since the Datadog solution was both comprehensive and cohesive. It was also easier to onboard people since the solution was well-documented and standardized.
How was the initial setup?
For the most part, it was really painless to set up.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
We're still early on in our transformation process. That said, we are gaining a lot of steam in terms of adoption. Both the engineering team and the product team are seeing tremendous value from this solution.
What other advice do I have?
Adding more tooltips and links to documentation or how-tos within the application would really go a long way for those trying to get their feet wet with Datadog.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Director at CBRE
Flexible, excellent support, and reliable
Pros and Cons
- "The most valuable features of Datadog are the flexibility and additional features when compared to other solutions, such as AppDynamics and Dynatrace. Some of the features include AI and ML capabilities and cloud and analysis monitoring"
- "Datadog could improve the flexibility with AI and ML concepts. This will allow customers to be more leveraged towards publishing."
What is most valuable?
The most valuable features of Datadog are the flexibility and additional features when compared to other solutions, such as AppDynamics and Dynatrace. Some of the features include AI and ML capabilities and cloud and analysis monitoring.
What needs improvement?
Datadog could improve the flexibility with AI and ML concepts. This will allow customers to be more leveraged towards publishing.
For how long have I used the solution?
I have been using Datadog for approximately one year.
What do I think about the stability of the solution?
Datadog is stable. We did not have a single outage.
What do I think about the scalability of the solution?
I have found Datadog to be scalable.
We have approximately 2,000 users using the solution in my organization.
How are customer service and support?
The support from Datadog is excellent.
Which solution did I use previously and why did I switch?
I have previously used AppDynamics and Dynatrace.
How was the initial setup?
Datadog's initial setup is easy because they helped us come up with the easiest way of instrumenting any of the features that needed to be deployed. We worked on it with their engineers and were happy with the result. We have set up monitoring for approximately 60 applications through Datadog since our deployment.
What about the implementation team?
We have a very tiny team of four members that do the maintenance of Datadog.
What's my experience with pricing, setup cost, and licensing?
The price of Datadog is reasonable. Other solutions are more expensive, such as AppDynamics.
What other advice do I have?
Datadog is far better than any other monitoring tool at introducing new capabilities because they anticipate concepts before Amazon AWS and Microsoft Azure introduce them. Datadog is a good tool to have for monitoring your own infrastructure.
I rate Datadog a ten out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.