Staff Full-Stack Engineer at OMERS
Prompt support with good logging and helps with standardization
Pros and Cons
- "The initial setup was straightforward from my own experience, helping integrate within the application and service levels."
- "In production, we intend to use trace IDs generated by RUM to attach to support tickets when a user experiences a traceable network error, and we want to display this trace ID to the user so if they were to contact us about a specific issue, they can provide us an exact ID displayed to them back to us. Currently, this is not possible out-of-the-box client-side without inventing our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response."
What is our primary use case?
Internally, our primary usage of Datadog centers on APM/tracing, logging, RUM (real user monitoring), synthetic testing of service/application health and state, general monitoring and observability, and custom dashboards for aggregate observability. We are also making increasing use of the more recent service catalog feature.
We have several microservices, several databases, and a few web applications (both external and internal facing), all spread across several environments ranging from dev, sit, and eat to production.
How has it helped my organization?
Datadog has had a massive impact on our department. Before, we had loose logging dumped into a sea of GCP logs with haphazard custom solutions for traceability between logs and network calls. Datadog has helped standardize and normalize our processes around observability while providing fantastic tools for aggregating insight around what is monitored regularly, all wrapped in an easy-to-use UI.
Additionally, several types of users exist within our department, and each benefits from Datadog in its own way. DevOps leverages it to easily manage infra, developers leverage it to easily monitor/debug services and applications, and business leverages it for statistics.
What is most valuable?
Personally I've found the RUM (real user monitoring) to be above and beyond what I've worked with before. Client-side monitoring has always been on the short end of the stick but the information collected and ease of instrumentation provided by Datadog is second to none.
Having a live dynamic service map is also one of my favourite features; it provides real-time insights into which services/applications are connected to which.
We are also investigating the new API catalog feature set, which I believe will provide a high-value impact for real-time documentation and information about all of our shared microservices that other dev teams can use.
What needs improvement?
In production, we intend to attach trace IDs generated by RUM to support tickets when a user experiences a traceable network error. We also want to display this trace ID to the user so that, if they contact us about a specific issue, they can give us the exact ID that was shown to them. Currently, this is not possible out of the box on the client side without building our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response.
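One of the workarounds mentioned above, returning the ID from the service response, could look roughly like the sketch below. It wraps a fetch-like function and hands a trace ID to a callback when a request fails. The header name `x-trace-id`, the helper `withTraceCapture`, and the `ResponseLike` type are assumptions made for illustration; none of them are part of Datadog's RUM SDK.

```typescript
// Minimal shape of the response we rely on; a browser Response also satisfies it.
interface ResponseLike {
  ok: boolean;
  headers: { get(name: string): string | null };
}

type FetchLike<R extends ResponseLike> = (url: string) => Promise<R>;

// Hypothetical header that the service would set to the active trace ID.
const TRACE_HEADER = "x-trace-id";

// Wraps a fetch-like function. On a non-OK response, passes the trace ID
// (if the service returned one) to the callback so the UI can display it.
function withTraceCapture<R extends ResponseLike>(
  fetchImpl: FetchLike<R>,
  onFailedTrace: (traceId: string) => void,
): FetchLike<R> {
  return async (url: string): Promise<R> => {
    const response = await fetchImpl(url);
    const traceId = response.headers.get(TRACE_HEADER);
    if (!response.ok && traceId !== null) {
      onFailedTrace(traceId);
    }
    return response;
  };
}
```

Passing the wrapped function around explicitly, rather than replacing the global `fetch`, avoids the kind of global side effect that a shim introduces.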
Buyer's Guide
Datadog
December 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
825,399 professionals have used our research since 2012.
For how long have I used the solution?
I've used the solution for approximately two years across our department; for around a year of that, it has been in practical use and fully integrated into our systems.
What do I think about the stability of the solution?
Aside from one very brief bad update from the Datadog team, in which a RUM release broke the native 'fetch' for Node (RUM used to modify the global 'fetch', and may still do so), and which was resolved quickly, Datadog as a whole has been highly stable.
What do I think about the scalability of the solution?
It's easy to implement and scale, provided there's a solid IaC solution in place to integrate across your system.
How are customer service and support?
The Datadog support team is prompt and helpful when we have submitted tickets. When their support team has been unsure, they've reached out internally to the relevant SME to help answer our questions.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
I've personally dabbled with some other open-source observability and monitoring solutions; however, prior to Datadog, our department did not have any solutions other than log dumps to GCP.
How was the initial setup?
The initial setup was straightforward from my own experience, helping integrate within the application and service levels; however, our DevOps team handled most of the infra process with minimal complaints.
What about the implementation team?
We handled the solution in-house.
What's my experience with pricing, setup cost, and licensing?
I am personally not involved in decisions around cost; however, I am aware that when we first set up Datadog, we explicitly configured our services/applications with a master switch for the Datadog integration, so that we can dynamically enable or disable it for targeted environments as needed, because costs for APM, logging, etc. are incurred on a per-service basis.
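A master switch of this kind might be sketched as follows. The review does not say how the switch is implemented, so the variable names `APP_DATADOG_ENABLED` and `APP_DATADOG_ENVIRONMENTS` and the function `isDatadogEnabled` are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical configuration keys, not Datadog settings.
const MASTER_SWITCH = "APP_DATADOG_ENABLED";
const ENABLED_ENVIRONMENTS = "APP_DATADOG_ENVIRONMENTS";

// Returns true only when the master switch is on AND the current
// environment appears in the comma-separated allow-list.
function isDatadogEnabled(env: Env, currentEnvironment: string): boolean {
  if (env[MASTER_SWITCH] !== "true") {
    return false;
  }
  const allowed = (env[ENABLED_ENVIRONMENTS] ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return allowed.indexOf(currentEnvironment) !== -1;
}
```

In a Node service, `process.env` could be passed as `env`, with tracing and logging integrations initialized only when the function returns true.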
Which other solutions did I evaluate?
I was not involved in the decision-making regarding the evaluation of other options.
What other advice do I have?
I highly recommend Datadog, and I would explore it for my own individual projects in the future, provided the cost is within reason. Otherwise, I would highly recommend it for any medium-to-large-sized org.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 30, 2024
Cloud Engineer at Looklet AB
Very good log management and alerting features with excellent reliability
Pros and Cons
- "Infrastructure monitoring gives us real-time visibility into our servers, containers, and cloud resources, helping us optimize performance and reduce downtime."
- "One area where Datadog could be improved is its pricing structure, which can sometimes make it cost-prohibitive to adopt new features."
What is our primary use case?
We use Datadog as our main monitoring platform across all environments, including production, staging, and development.
It plays a crucial role in monitoring infrastructure performance, aggregating logs, and running synthetic and browser tests. Datadog helps us track critical system metrics like CPU, memory, and network traffic, allowing us to detect issues in real-time.
Its log management and alerting features enable quick incident response, while synthetic monitoring ensures optimal user experience and service uptime by proactively identifying performance issues. We rely on browser tests to simulate real-world user interactions, ensuring that key workflows in our applications perform smoothly.
Overall, Datadog allows us to maintain a high level of reliability and performance across our systems.
How has it helped my organization?
Datadog has significantly improved our ability to consolidate information into one central platform. Before implementing Datadog, our data and monitoring metrics were scattered across various systems and tools, making it difficult to get a unified view of our infrastructure and application health. This fragmented approach often led to inefficiencies, as we had to switch between different systems to gather relevant information, delaying our response to incidents.
With Datadog, all our critical data—logs, metrics, and monitoring—is now integrated in one place, allowing us to easily correlate events, analyze performance, and quickly diagnose issues, greatly improving both operational efficiency and incident management.
What is most valuable?
The most valuable features we’ve found in Datadog are logging, API monitoring, infrastructure monitoring, and browser tests.
Logging allows us to collect and centralize logs from across all our services, making it easier to troubleshoot issues and gain insights into application performance.
API monitoring is crucial for ensuring the reliability and performance of our API endpoints, allowing us to detect issues proactively.
Infrastructure monitoring gives us real-time visibility into our servers, containers, and cloud resources, helping us optimize performance and reduce downtime.
Lastly, browser tests simulate real user interactions, ensuring that our web applications deliver a seamless experience by detecting any potential performance or functionality issues before they impact users.
Together, these features provide a comprehensive monitoring solution, making Datadog an essential tool for maintaining system reliability and performance.
What needs improvement?
One area where Datadog could be improved is its pricing structure, which can sometimes make it cost-prohibitive to adopt new features. As we continue to scale, the costs associated with enabling more advanced monitoring capabilities, like additional integrations or more detailed data retention, can add up quickly. This makes it challenging for teams to justify the expense, especially when trying to utilize new features that could enhance monitoring and performance analysis.
Another improvement would be better cost transparency within the product’s GUI. Currently, it can be difficult to track how specific features or services are contributing to overall costs. If Datadog could provide more detailed, real-time insights into pricing directly within the interface—such as breakdowns of how much each feature or integration costs—it would help users manage budgets more effectively and avoid unexpected charges. A built-in budgeting tool or cost alerting system could also be useful, allowing organizations to make more informed decisions about what features to activate without the fear of overextending their budget.
Adding these features would give customers a clearer understanding of how to optimize their usage without overspending, making the platform even more accessible for teams that are cost-conscious but still want to take advantage of the full range of Datadog’s powerful capabilities.
For how long have I used the solution?
I've used the solution for five years.
What do I think about the stability of the solution?
The solution is very stable.
What do I think about the scalability of the solution?
It's very scalable. It can handle pretty much anything you throw at it.
How are customer service and support?
Overall, support is very good and they have a responsive support team.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We previously had a range of open-source and in-house built tools. We switched to get everything in one place.
How was the initial setup?
It was easy to understand and to implement. Datadog offers great documentation.
What about the implementation team?
We implemented the solution in-house.
What's my experience with pricing, setup cost, and licensing?
Beware of costs when using the platform. Set up alerting for unusual log volumes and set up rate limiting when possible.
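The kind of check behind an alert on unusual log volume can be sketched as below. This is only the comparison logic under a simple assumption (flag anything more than a chosen multiple of the recent average); the function name and the default factor of 3 are illustrative, and in practice the alert would be configured as a monitor in Datadog rather than in application code.

```typescript
// Flags the current count as unusual when it exceeds the historical
// average multiplied by `factor`. Returns false when there is no history.
function isUnusualLogVolume(history: number[], current: number, factor = 3): boolean {
  if (history.length === 0) {
    return false;
  }
  const total = history.reduce((sum, count) => sum + count, 0);
  const average = total / history.length;
  return current > average * factor;
}
```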
Which other solutions did I evaluate?
We did evaluate logz.io.
What other advice do I have?
It's a great product, although a bit expensive.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 30, 2024
Easy to set up and good UI but needs better customization capabilities
Pros and Cons
- "The many dozens of integrations that the solution brings out of the box are excellent."
- "Deploying the agents is still very manual."
What is our primary use case?
The solution is basically used for servers and applications.
What is most valuable?
The UI is the most valuable aspect of the solution. I really like its look and feel. It's not very distinctive now, since other players have caught up; however, Datadog was the first in the market to present such an effective UI.
The many dozens of integrations that the solution brings out of the box are excellent.
It's easy to set up.
What needs improvement?
Deploying the agents is still very manual.
Network monitoring could be better or rolled into this solution so that you do not have to buy a different product.
Customization of the tool itself should be taken into account. At the moment, although what they provide out of the box is good, they don't offer many customization possibilities. I know it's difficult; however, it's something that they need to look at. When customers come to us with custom requirements, we cannot meet them.
For how long have I used the solution?
I've been dealing with the solution for five years.
What do I think about the stability of the solution?
It's quite stable. I have never had an issue in regard to reliability, so it's very stable.
What do I think about the scalability of the solution?
It's very scalable. I have not reached the limits at any time, never in the solution. I've never seen any performance degradation in large environments. I would say it's very scalable.
Each client has its own instance. We do not share instances with multiple customers. There's usually between 20 and 30, depending on the customer.
How are customer service and support?
I never use technical support, to be honest.
How was the initial setup?
The initial setup for the solution itself is quite straightforward. You just set it up and that's it. However, deploying the agents to the servers, or at least the target machines, is still a manual task. They still do not have centralized management of the Datadog agents, which delays the deployment of the solution.
How long it takes to deploy is difficult to pin down, as it varies with the size of the environment. For ten servers, it will take half an hour to an hour. For 5,000, other considerations besides the number of nodes need to be taken into account, and a large environment will take much longer. We would need to develop a solution, or an effective process, to deploy the agents and configure them in a standardized manner. This is something that neither the tool nor its provider offers out of the box; you need to build it. That's a drawback.
How many people you need for the deployment and maintenance processes depends on the environment's size and geographical area. On average, I would require one resource for every 500 nodes for implementation. For overall support, I usually assign one resource per 1,500 nodes.
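As a worked example of these ratios, the sketch below estimates headcount for a given number of nodes. Rounding up to whole people is an assumption, as is the function name `estimateStaffing`; the reviewer gives the ratios only as averages.

```typescript
interface StaffingEstimate {
  implementation: number;
  support: number;
}

// Ratios quoted in the review: one person per 500 nodes for implementation,
// one per 1,500 nodes for ongoing support. Rounding up is an assumption.
function estimateStaffing(nodes: number): StaffingEstimate {
  return {
    implementation: Math.ceil(nodes / 500),
    support: Math.ceil(nodes / 1500),
  };
}
```

For the 5,000-server environment mentioned earlier, this gives 10 people for implementation and 4 for support.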
What was our ROI?
Before, the ROI was much higher, as Datadog did not have to compete with any comparable tool; it was very good in the space. However, with time, other companies have caught up, and there are now other tools that provide a higher ROI. I cannot give a specific ROI percentage since we don't deploy it for our own use; we deploy it on behalf of customers, and the ROI varies with the deal and its size. If people are looking for a global monitoring solution that includes network monitoring in the same tool, they are hindered, as Datadog does not provide an adequate solution for it. That decreases the ROI, since you still need another tool for network monitoring.
What's my experience with pricing, setup cost, and licensing?
The licensing is a bit complicated. When you pay for it on a per-node basis, that's perfectly fine. However, when you put log analytics on top of it, the pricing is based on traffic. This is actually an issue. It gets complicated.
What other advice do I have?
I provide Datadog to customers; I'm a reseller.
I would recommend the solution.
If a company has its environment in a public cloud, such as GCP, Azure, or AWS, Datadog is a very good candidate to provide an overview of the monitoring. If you are considering a hybrid environment of systems, servers, and applications, it also provides a good solution with a lot of APM capabilities; the only drawback is network monitoring. If you want a tool that monitors the entire environment from a single point, that is possible with Datadog; however, it has no effective tool for network monitoring.
I'd rate the solution seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer:
Senior Manager - Cloud & DevOps at Publicis Sapient
Overall useful features, beneficial artificial intelligence, and effective auto scaling
Pros and Cons
- "Most of the features in the way Datadog does monitoring are commendable and that is the reason we choose it. We did some comparisons before picking Datadog. Datadog was recommended based on the features provided."
- "All solutions have some area to improve, and in Datadog they can improve their overall technology moving forward."
What is our primary use case?
My customers were using Datadog for monitoring purposes. They were using it only because the solution is running on AWS and it's a microservices-based solution. They were using an application called Dynatrace for their log.
What is most valuable?
Most of the features in the way Datadog does monitoring are commendable, and that is the reason we chose it. We did some comparisons before picking Datadog. Datadog was recommended based on the features provided.
Most monitoring tools nowadays have, or are going to have, embedded artificial intelligence and machine learning to make monitoring and logging more proactive and intelligent. Datadog has incorporated some artificial intelligence.
The solution does not require a lot of maintenance.
The solution had all the features we were looking for and we were able to create a central dashboard as per our requirements.
What needs improvement?
All solutions have some area to improve, and in Datadog they can improve their overall technology moving forward.
For how long have I used the solution?
I have been using Datadog for approximately four months.
What do I think about the stability of the solution?
Datadog is a stable solution.
What do I think about the scalability of the solution?
Datadog is a highly scalable solution because it is a SaaS solution. Having this solution be a SaaS is one of its most appealing attributes. When the vendor is going to manage data scaling and everything for you, you are only going to use the solution as per your requirements. Autoscaling is a great feature that they have.
How are customer service and support?
The support from Datadog is excellent. If you're stuck on something or facing any issue, support from the vendor itself is available. You will receive a response instantly from the vendor on anything related to the requirement, issue, or feature you are looking for. The responses have always been timely.
I rate the technical support from Datadog a five out of five.
Which solution did I use previously and why did I switch?
I have used other solutions similar to Datadog, and when I compare it with the other tools, Datadog comes out on top. It is great.
How was the initial setup?
Since Datadog is a SaaS solution, we did not deploy it on-premises or in any cloud. We were using the SaaS solution from the vendor itself. For provisioning, as well as for monitoring and dashboards, we used Terraform to define our monitoring as code. Everything was basically automated; we were not doing anything manually.
What other advice do I have?
If someone wants to set up Datadog on-premise or in any of the Cloud machines, they have to consider a lot of things from the auto-scaling perspective.
My recommendation is Datadog is very good. Your team can mainly focus on the development rather than the solution itself. The vendor is going to take care of auto-scaling and maintenance and everything for you.
I rate Datadog a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
Software Engineer at Liberis Limited
Great for logging and tracing but needs better customization
Pros and Cons
- "Real user monitoring has made triaging any possible bugs our users might face a lot easier."
- "They need to offer better/more customization on what logs we get and making tracing possible on Edge runtime logs is a real requirement."
What is our primary use case?
We're using the product for logging and monitoring of various services in production environments.
It excels at providing real-time observability across a wide range of metrics, logs, and traces, making it ideal for DevOps teams and enterprises managing complex environments.
The platform integrates seamlessly with our cloud services, but browser side logging is a little lagging.
Dashboards are very useful for quick insights, but can be time consuming to create, and the learning curve is steep. Documentation is vast, but not as detailed as I'd like.
How has it helped my organization?
The solution has made logging and tracing a lot easier, and the RUM sessions are something we did not have previously. Datadog’s real-time alerting and anomaly detection help reduce downtime by allowing us to identify and address performance issues quickly.
The platform’s intelligent alert system minimises noise, ensuring your team focuses on critical incidents. This results in faster Mean Time to Resolution (MTTR), improving service availability.
It consolidates monitoring for infrastructure, applications, logs, and security into a single platform. This enables us to view and analyse data across the entire stack in one place, reducing the time spent jumping between tools.
What is most valuable?
Real user monitoring has made triaging any possible bugs our users might face a lot easier. RUM tracks actual user interactions, including page load times, clicks, and navigation flows. This gives our organization a clear picture of how our users are experiencing our application in real-world conditions, including slow-loading pages, errors, and other performance issues that affect user satisfaction. We can then easily prioritize these and make sure we offer our users the best possible experience.
What needs improvement?
I'm not sure if this is on Datadog; however, the Vercel integration is very limited.
They need to offer better/more customization of which logs we get, and making tracing possible on Edge runtime logs is a real requirement. It is extremely difficult, if not completely impossible, to get working traces and logs displayed in Datadog with our stack of Vercel, Next.js, and Datadog. This is a very common stack in front-end development, and the difficulty of implementing it is unacceptable. Please do something about it soon. Front-end logs matter.
For how long have I used the solution?
I've used the solution for a little over a year.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Oct 2, 2024
Software Engineer at a computer software company with 201-500 employees
Very good custom metrics, dashboards, and alerts
Pros and Cons
- "The dashboards provide a comprehensive and visually intuitive way to monitor all our key data points in real-time, making it easier to spot trends and potential issues."
- "One key improvement we would like to see in a future Datadog release is the inclusion of certain metrics that are currently unavailable. Specifically, the ability to monitor CPU and memory utilization of AWS-managed Airflow workers, schedulers, and web servers would be highly beneficial for our organization."
What is our primary use case?
Our primary use case for Datadog involves utilizing its dashboards, monitors, and alerts to monitor several key components of our infrastructure.
We track the performance of AWS-managed Airflow pipelines, focusing on metrics like data freshness, data volume, pipeline success rates, and overall performance.
In addition, we monitor Looker dashboard performance to ensure data is processed efficiently. Database performance is also closely tracked, allowing us to address any potential issues proactively. This setup provides comprehensive observability and ensures that our systems operate smoothly.
How has it helped my organization?
Datadog has significantly improved our organization by providing a centralized platform to monitor all our key metrics across various systems. This unified observability has streamlined our ability to oversee infrastructure, applications, and databases from a single location.
Furthermore, the ability to set custom alerts has been invaluable, allowing us to receive real-time notifications when any system degradation occurs. This proactive monitoring has enhanced our ability to respond swiftly to issues, reducing downtime and improving overall system reliability. As a result, Datadog has contributed to increased operational efficiency and minimized potential risks to our services.
What is most valuable?
The most valuable features we’ve found in Datadog are its custom metrics, dashboards, and alerts. The ability to create custom metrics allows us to track specific performance indicators that are critical to our operations, giving us greater control and insights into system behavior.
The dashboards provide a comprehensive and visually intuitive way to monitor all our key data points in real-time, making it easier to spot trends and potential issues. Additionally, the alerting system ensures we are promptly notified of any system anomalies or degradations, enabling us to take immediate action to prevent downtime.
Beyond the product features, Datadog’s customer support has been incredibly timely and helpful, resolving any issues quickly and ensuring minimal disruption to our workflow. This combination of features and support has made Datadog an essential tool in our environment.
What needs improvement?
One key improvement we would like to see in a future Datadog release is the inclusion of certain metrics that are currently unavailable. Specifically, the ability to monitor CPU and memory utilization of AWS-managed Airflow workers, schedulers, and web servers would be highly beneficial for our organization. These metrics are critical for understanding the performance and resource usage of our Airflow infrastructure, and having them directly in Datadog would provide a more comprehensive view of our system’s health. This would enable us to diagnose issues faster, optimize resource allocation, and improve overall system performance. Including these metrics in Datadog would greatly enhance its utility for teams working with AWS-managed Airflow.
For how long have I used the solution?
I've used the solution for four months.
What do I think about the stability of the solution?
The stability of Datadog has been excellent. We have not encountered any significant issues so far.
The platform performs reliably, and we have experienced minimal disruptions or downtime. This stability has been crucial for maintaining consistent monitoring and ensuring that our observability needs are met without interruption.
What do I think about the scalability of the solution?
Datadog is generally scalable, allowing us to handle and display thousands of custom metrics efficiently. However, we’ve encountered some limitations in the table visualization view, particularly when working with around 10,000 data points. In those cases, the search functionality doesn’t always return all valid results, which can hinder detailed analysis.
How are customer service and support?
Datadog's customer support plays a crucial role in easing the initial setup process. Their team is proactive in assisting with metric configuration, providing valuable examples, and helping us navigate the setup challenges effectively. This support significantly mitigates the complexity of the initial setup.
Which solution did I use previously and why did I switch?
We used New Relic before.
How was the initial setup?
The initial setup of Datadog can be somewhat complex, primarily due to the learning curve associated with configuring each metric field correctly for optimal data visualization. It often requires careful attention to detail and a good understanding of each option to achieve the desired graphs and insights.
What about the implementation team?
We implemented the solution in-house.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 23, 2024
Senior Cloud Engineer, Vice President of Monitoring at a financial services firm with 10,001+ employees
Good ServiceNow integration, helpful API crawlers, and useful APM metrics
Pros and Cons
- "The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze."
- "It seems that admin cost control granularity is an afterthought."
What is our primary use case?
We are using the solution for migrating out of the data center. Old apps need to be re-architected. We are planning on moving to multi-cloud for disaster recovery and to avoid vendor lock-in.
The migration is a mix between an MSP (Infosys) and in-house developers. The hard part is ensuring these apps run the same in the cloud as they do on-premises. Then we also need to ensure that we improve performance when possible. With deadlines approaching quickly, it's important not to cut corners, which is why we needed observability.
How has it helped my organization?
Using the product has caused a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real-time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.
What is most valuable?
For us, the most valuable features are infrastructure and APM metrics.
The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze.
We rely heavily on the API crawlers Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without having to also make them add it at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams.
With the ServiceNow integration, we can also assign tickets based on the environment. Now our top teams are using the APM/profiler to find bottlenecks and improve the speed of our apps.
What needs improvement?
The real issue with this product is cost control. For example, when logs first came out they didn't have any index cuts. This caused runaway logs and exploding costs.
It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.
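The arithmetic behind that estimate can be sketched as follows. It assumes the cost of a synthetic test scales linearly with the number of runs and that the tests in question currently fire every minute; the function name and the single-location default are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int, locations: int = 1) -> int:
    """Number of test runs per day for one test at a given interval."""
    return (MINUTES_PER_DAY // interval_minutes) * locations


every_minute = runs_per_day(1)        # 1440 runs per day
every_five = runs_per_day(5)          # 288 runs per day
ratio = every_minute / every_five     # 5.0

print(every_minute, every_five, ratio)
```

Tests that already run less often than once a minute would see a smaller reduction, so the fivefold figure is an upper bound for the one-minute case.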
For how long have I used the solution?
I've used the solution for about three years.
What do I think about the stability of the solution?
The solution is very stable. There are not too many outages, and they fix them fast.
What do I think about the scalability of the solution?
It is easy to scale. That is why we adopted it.
How are customer service and support?
Before we had premium support, I would avoid contacting support because it was so bad.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.
How was the initial setup?
The initial setup was not difficult. We just had to teach teams the concept of tags.
What about the implementation team?
We did the implementation in-house. It was me. I am the SME for Datadog at the company.
What was our ROI?
The solution has saved months of time and reduced blindspots for all app teams.
What's my experience with pricing, setup cost, and licensing?
I'd advise users to be careful with logs and the APM as those are the ones that can get expensive fast.
Which other solutions did I evaluate?
We looked into Dynatrace. However, we found the cost to be high.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
SRE at a financial services firm with 10,001+ employees
Excellent synthetic monitoring, APM, and alert features
Pros and Cons
- "The monitoring functionality, in general, and tagging infrastructure are great."
- "While the tool is robust with many different capabilities, users would greatly benefit from more examples in the documentation."
What is our primary use case?
We deploy various services for our main platform on AWS across multiple regions. We have a development environment, a staging environment, a QA environment, and a production environment. We deploy our many services across hundreds of instances.
We have many server farms, all responsible for various services on our market intelligence platform. The deployment of each server farm or even individual instances varies depending on how it was stood up. We have instances built in three different ways, with two different pipelines and some even on user data scripts.
How has it helped my organization?
My team has a 24/7 on-call schedule where we need to be ready to handle and mitigate incidents with the platform at any moment.
We have countless monitors set up on Datadog that alert directly to our queue using an email that generates a ticket.
The actionable steps for each type of monitor and its associated incident are easily included in the alerts whenever something is triggered. We generate links to the Datadog monitors and can instantly drill down into what went wrong and for how long.
What is most valuable?
The features I have found most helpful are synthetic monitoring, APM, and alert features. The monitoring functionality, in general, and tagging infrastructure are great.
Synthetics have become our bread and butter as we have migrated many tests over to Datadog. We have simplified and consolidated our synthetic tests while also making them more robust with the help of Datadog's tagging.
A large portion of our monitoring is based on synthetics results, and alerts integrate seamlessly with our incident queue system. We use dashboards heavily.
The metrics capabilities are extremely helpful, and we use virtually all of the widgets.
What needs improvement?
The main area for improvement for Datadog would be the documentation. While the tool is robust with many different capabilities, users would greatly benefit from more examples in the documentation.
There are not enough code snippets in the docs, and some of the existing ones are out of date.
One function I would add would be a button to generate a report of the performance of a synthetic test and the performance of each of the steps in the test over time.
For how long have I used the solution?
How long we've used the solution varies by platform. We have one platform completely in the cloud and one still on-premises. We've had the solution for many years on AWS.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros
sharing their opinions.
Updated: December 2024