Our company has a microservice architecture, with different teams in charge of different services. It is also a startup, which means we have to build and move very fast. Before we were using Datadog properly, things often broke without much information on where in our system the failure happened. This was quite a big time sink: teams were unfamiliar with other teams' code, so they needed help from those teams to debug, which slowed our building down a lot. Implementing Datadog traces fixed this.
Good alerting and issue detection for many valuable features
Pros and Cons
- "With our frequent concurrent deployments, Datadog's alert monitors allow us to quickly detect issues if anything occurs."
- "The monitors can be improved."
What is our primary use case?
What is most valuable?
Datadog has many features, but the most valuable have become our primary uses.
Also, with our frequent concurrent deployments, Datadog's alert monitors allow us to quickly detect issues if anything occurs.
What needs improvement?
The monitors can be improved. The chart in the monitors only goes back a couple of hours, which is clunky. The monitors could also provide more information, such as traces. We have many alerts connected to different notification systems, such as Slack and Opsgenie.
When the on-call engineer receives a notification fired by an alert, they are taken to the monitor. Yet often, we have to open many different tabs to see logs, traces, and other information that is not accessible from the monitor. I think it would make every on-call engineer's life easier if the monitor had more data.
For how long have I used the solution?
We've used the solution for three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 21, 2024
Infrastructure engineer at an insurance company with 10,001+ employees
Good infrastructure, helpful logs, and useful alerts
Pros and Cons
- "It has a high-level insight into the infrastructure model of the application and provides important detailed data on the host and metrics, which is the main concern of our customers."
- "I sometimes log in and see items changed, either in the UI or a feature enabled. To see it for the first time without proper communication can sometimes come as a shock."
What is our primary use case?
Our use case is to provide application monitoring for our cloud organization. I use it for insight into which host in which region has activity, or which market is using Datadog to its fullest potential, and I use that to manage cost. This also helps determine who is monitoring and setting alerts versus just setting up monitoring and not doing anything about it. Another use case is checking when hosts or applications are down, or when usage of CPU, memory, etc., is too high.
How has it helped my organization?
The solution has improved our organization from a market perspective. We have multiple departments and need some time to gather that data from a grouping point of view. Grouping that data via tag or seeing the separation is easy. In addition, it provides metrics and insights for senior leadership to have a high level of usage and cost. Application teams have better insight into their application, outages, when to plan for patches, updates, etc. Also, they have a better understanding of where the data gaps may be.
What is most valuable?
The infrastructure is the most valuable. It has a high-level insight into the infrastructure model of the application and provides important detailed data on the host and metrics, which is the main concern of our customers. It provides confirmation that the layer where the application is running is monitored and that we will be alerted when it is down and not functional. The customers can have peace of mind knowing their metrics are being measured accurately. The data provided, including service name, logs, and all other pertinent details tied to the host, makes it a valuable source of information.
What needs improvement?
The solution can be improved via open communication to the broader audience on what has changed and what has not changed. I sometimes log in and see items changed, either in the UI or a feature enabled. To see it for the first time without proper communication can sometimes come as a shock.
For how long have I used the solution?
I have been using the solution for three years.
What do I think about the stability of the solution?
The stability is great.
How are customer service and support?
Technical support is great. Datadog has the resources and knowledge to tackle questions.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
I did not previously use a different solution.
How was the initial setup?
The initial setup is straightforward.
What about the implementation team?
The initial setup was handled in-house.
Which other solutions did I evaluate?
I did not evaluate any other solutions.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Datadog
November 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.
Senior Software Engineer at LeafLink
Good log stream with a useful APM and democratizes logs
Pros and Cons
- "Datadog's log aggregation is really helpful since it lets me and every other engineer on my team log in, view, and share logs when we need to debug our application."
- "The menu on the left is pretty dense (and I know it has to be). I never knew about the cmd+k functionality until recently. It would be helpful to offer more tips/cheat sheets to see handy shortcuts like that."
What is our primary use case?
We use Datadog to view and aggregate logs and monitor all of our services. We have a lot of running infrastructure and it is very convenient to have logs and metrics all aggregated somewhere we can view and chart them.
I use Datadog to create dashboards and runbooks, and sharable graphs, which really help out my whole team. We mostly use logs and APM, yet have been starting to use other products. I would like to use more synthetic monitors.
How has it helped my organization?
It has democratized our logs and metrics, allowing all engineers to have insight into how our apps perform. It is also extremely helpful when debugging issues.
It would be very difficult to debug issues without aggregated logs and APM traces.
It has also definitely saved us some money since we can keep an eye on our running infrastructure in an easy-to-see way, rather than a less friendly CLI. It has been a very big help!
What is most valuable?
The log stream has been the most useful thing. Having so many logs on so many different running containers means it is very inconvenient to view them individually. Datadog's log aggregation is really helpful since it lets me and every other engineer on my team log in, view, and share logs when we need to debug our application.
APM has also been extremely helpful for debugging issues and profiling and optimizing our apps. Dashboards have also been really helpful for communicating needs and priorities to engineering leadership.
It is very easy to get buy-in with graphs to back things up.
What needs improvement?
I recently saw the educational material, and it is amazing. Events like DASH are extremely helpful in understanding the deep set of features. Anything that helps to educate users is a huge win here.
The menu on the left is pretty dense (and I know it has to be). I never knew about the cmd+k functionality until recently. It would be helpful to offer more tips/cheat sheets to see handy shortcuts like that.
For how long have I used the solution?
I've used the solution for three years.
Which solution did I use previously and why did I switch?
We previously used AWS Cloudwatch logs. It was far less friendly and less fully featured.
How was the initial setup?
The solution is pretty straightforward to set up. It helps with logs and metrics, and the AWS integration is really great.
What about the implementation team?
We handled the implementation in-house.
What other advice do I have?
It is hard to educate an entire team. There is a big learning curve.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Architect at a comms service provider with 10,001+ employees
Good for monitoring and following metrics with a helpful flame graph
Pros and Cons
- "Flame graphs are pretty useful for understanding how GraphQL resolves our federated queries when it comes to identifying slow points in our requests in our microservice environment with 170 services."
- "I often have issues with the UI in my browser."
What is our primary use case?
We use the solution primarily for distributed tracing, service insight and observability, metrics, and monitoring. We create custom metrics from outbound service calls to trace the availability of back-office systems.
We use the flame graph to get insights into our GraphQL implementation. It helps highlight how resolvers work.
However, it's lacking in tracing which GraphQL queries are run, and we use custom spans for that.
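The review does not say how these custom spans are built. As a rough illustration only, the sketch below times a block of work and attaches the GraphQL operation name as a tag. It deliberately uses plain Python rather than any Datadog library; `custom_span`, `RECORDED_SPANS`, and `run_graphql_query` are hypothetical names invented for this example.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink; a real setup would hand spans to a tracing library.
RECORDED_SPANS = []


@contextmanager
def custom_span(name, **tags):
    """Time a block of work and record it together with descriptive tags."""
    span = {"name": name, "tags": dict(tags), "error": False}
    start = time.perf_counter()
    try:
        yield span
    except Exception:
        # Mark the span as failed, then let the error propagate unchanged.
        span["error"] = True
        raise
    finally:
        span["duration_s"] = time.perf_counter() - start
        RECORDED_SPANS.append(span)


def run_graphql_query(operation_name, query):
    """Hypothetical entry point that records which query was run."""
    with custom_span("graphql.query", operation=operation_name) as span:
        span["tags"]["query_length"] = len(query)
        # Placeholder for real query execution.
        return {"data": {"operation": operation_name}}
```

Tagging the span with the operation name is what would make a slow section of a flame graph attributable to a specific query.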
How has it helped my organization?
Previously, the team only had Instana, and few people used it. The main barriers to entry were the access (since it was not integrated into our SSO) and the user experience, which made it hard to follow. We had an on-prem version, and it wasn't the snappiest. The APM has made observability and tracing more accessible to developers.
What is most valuable?
Flame graphs are pretty useful for understanding how GraphQL resolves our federated queries when it comes to identifying slow points in our requests in our microservice environment with 170 services. There are complex transactions over the course of a single user request, since we essentially operate as a middle layer integrating with 90 back-office systems.
What needs improvement?
I often have issues with the UI in my browser. I tend to have a lot of tabs open, yet have issues with it not responding or not showing data. A couple of times, pasting the URL into an incognito window shows the data that's there.
For how long have I used the solution?
I've used the solution for two years.
How was the initial setup?
The initial setup was complex and required a bit of tweaking to get everything configured correctly and into our pipelines.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Software Engineer at a tech vendor with 1,001-5,000 employees
Great profiling and tracing but storage is expensive
Pros and Cons
- "Anything I've wanted to do, I found a way to get it done through Datadog."
- "When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself."
What is our primary use case?
We use the solution for application hosting and a little bit of everything when it comes to supporting a worldwide logistics tracking service. It's used as a central service for collecting telemetry and logs. We find it does the same work as all of our old tools combined, including Prometheus, Kibana, Google Logs, and more. Putting all of this information in a single platform makes it easy to corroborate information and associate a request with its data, which might be lost when it is saved only as logs.
How has it helped my organization?
At my organization, we have plenty of microservices written in different languages. Different teams prefer one or the other framework or library within those languages.
With Datadog, we can get in a single line and march in the same direction; our logs and metrics are collected in the same fashion, making it easy to find bugs or integration problems across services and understand how they interact with other systems.
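To make "collected in the same fashion" concrete, here is a minimal sketch of one way services could emit logs in a shared JSON shape using only Python's standard `logging` module. The field names (`service`, `level`, `message`, `logger`) and the `build_logger` helper are assumptions made for illustration, not something the review or Datadog prescribes.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object with a fixed set of fields."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        return json.dumps({
            "service": self.service,          # assumed field name
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        })


def build_logger(service):
    """Hypothetical helper: return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    return logger
```

If every service, whatever its framework, emits the same fields, a search across services can filter on one field name instead of one per team.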
What is most valuable?
I primarily prefer to utilize the profiling and tracing feature. It can potentially be used as a more-informed alternative to logs.
Beyond that, anything I've wanted to do, I found a way to get it done through Datadog. It allows for testing, logging, hardware monitoring, system performance, memory consumption, advanced observability, AI assistance, cross-team collaboration, and business analytics. Datadog helps some of the world’s biggest brands transform faster with the help of true AIOps, AI-assisted answers, UX and business analytics, cloud observability, and smart AI assistance.
It's all supporting my desire to build a great application, and in a centralized SaaS application, it's hard to say anything can beat it.
What needs improvement?
The cost of storing logs is a little bit unexpected; most services generate gigabytes of logs, and their size is not excessive. When it comes to storing the logs with Datadog, I'm not sure why it costs so much to store gigabytes or terabytes of information when it's a fraction of the cost to do so myself.
For how long have I used the solution?
I've used the solution for one year.
What do I think about the stability of the solution?
We have no concerns with stability.
What do I think about the scalability of the solution?
There appear to be no issues with scaling.
How are customer service and support?
Technical support is slow. It takes forever to get responses from the support team.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
I've previously used Kibana and Prometheus. We are still using these.
How was the initial setup?
Setting up through the environment variables made it unbelievably easy to get started.
What about the implementation team?
We've implemented the solution in-house.
What was our ROI?
I do not have this number off-hand, as I am not the finance guy. I just like the product.
What's my experience with pricing, setup cost, and licensing?
I'd advise new users not to start off by sending logs.
Which other solutions did I evaluate?
We did not really look at other options.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Security Engineering Manager at a financial services firm with 201-500 employees
Democratizes observability, great log searchability, and intuitive UI
Pros and Cons
- "I find the greatest feature is being able to search across logs from various microservices."
- "One area where I was really looking for improvement was the CSPM product line. I had really wanted to have team-level visibility for findings, since the team managing the resources has much more context and ability to resolve the issue, as the service owner. However, this was announced as an addition in a recent keynote."
What is our primary use case?
I use the solution to manage security-related logs and metrics, as well as create detection rules for security events. I am a security engineer, so one area of interest is the CSPM product, giving us the ability to look at findings across the cloud environment.
The great part about the Datadog security products is that they incorporate the context of the resources/hosts where the security event is found. This allows us to see exactly what is running on a host that we see as a security alert.
How has it helped my organization?
The greatest impact it has had is on the ability to democratize observability and put monitoring into the hands of the people. Teams can quickly get the information they need, without needing a bunch of training, since the UI is super intuitive and easy for beginners. This helps reduce time to resolution during incidents and gives context to developers quickly and easily. Context is really important since seconds matter when the ship is down, and you don't know why.
What is most valuable?
I find the greatest feature is being able to search across logs from various microservices. As a member of the security team, I find that I often need visibility into other teams' services in order to get a good picture of our security posture.
I also am a fan of the ability to easily create monitors and get alerts into Slack quickly, without too much overhead. For example, I often need to create monitors where I am not too sure where the baseline lies. Having the ability to create anomaly monitors makes this process much more straightforward. Anomaly monitors are great for a security team.
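The idea behind alerting without a known baseline can be sketched in a few lines. The example below is a simple standard-deviation check written for illustration; it is not Datadog's anomaly detection algorithm, and the function name and default threshold are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of `history`. The baseline is learned from the data
    instead of being configured by hand."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold
```

For a security team, `history` might be failed logins per hour: nobody has to decide in advance what a normal count is, only how far from normal is worth a Slack alert.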
What needs improvement?
One area where I was really looking for improvement was the CSPM product line. I had really wanted to have team-level visibility for findings, since the team managing the resources has much more context and ability to resolve the issue, as the service owner. However, this was announced as an addition in a recent keynote.
For how long have I used the solution?
Personally, I've used it my entire time employed here, more than three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Director of Software Engineering at Code Climate
Helpful dashboards, useful data-driven decision-making and good integration with PagerDuty
Pros and Cons
- "We integrate our application logs. It is great to be able to tie our metrics and our traces together."
- "The pricing should be less of a surprise."
What is our primary use case?
We primarily use the solution for charting application metrics.
We use it for all our application metrics, host metrics, and monitors with a PagerDuty integration.
We integrate our application logs. It is great to be able to tie our metrics and our traces together.
We use the APM module with traces. It is great to be able to link APM, logs, and metrics in one go, as it shortens our troubleshooting and RCA dramatically.
We are loving the tool; it is great to have all those insights in one place.
We hope that they keep making my life and our engineers' lives easier.
How has it helped my organization?
The solution improved our organization with:
- Data-driven decision making
- Dashboards we can share with our customer success team
- Dashboards we can share with our sales engineers
- Help during incidents
- Help with preventing incidents
- Integration with PagerDuty.
What is most valuable?
The most valuable aspects of the solution include:
- The charting application metrics
- Help with the business, prioritization, software design, and infrastructure design.
What needs improvement?
The pricing model hurts and forces us to work around the tool sometimes.
On top of application performance metrics, it would be great to have host performance metrics, suggesting changes to better use a cluster like: "You are over-provisioning this host" or "based on historical data, you will need to scale up in X days."
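As a rough illustration of the "you will need to scale up in X days" suggestion, the sketch below fits a straight line to daily usage figures and extrapolates to a capacity threshold. It is one naive way such a hint could be computed, not a description of any Datadog feature; the function name and inputs are hypothetical.

```python
def days_until_threshold(daily_usage, threshold):
    """Estimate how many days remain until usage reaches `threshold`,
    using a least-squares linear trend over one sample per day.
    Returns None when usage is flat or falling, or data is too sparse."""
    n = len(daily_usage)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(daily_usage) / n
    numerator = sum((x - x_mean) * (y - y_mean)
                    for x, y in enumerate(daily_usage))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None  # no upward trend to extrapolate
    intercept = y_mean - slope * x_mean
    latest_fit = intercept + slope * (n - 1)
    if latest_fit >= threshold:
        return 0.0  # the trend line is already at or above the threshold
    return (threshold - latest_fit) / slope
```

A linear fit ignores seasonality and sudden spikes, so a result like this would only be a starting point for a capacity conversation.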
Adding a module to extract data from Datadog so we can use the data in our own system would be helpful.
For how long have I used the solution?
I've used the solution for six or more years.
Which solution did I use previously and why did I switch?
We previously used New Relic, which was a great tool. That said, Datadog is a more complete solution.
What's my experience with pricing, setup cost, and licensing?
The pricing should be less of a surprise. They should allow us to cap costs which would lead to less frustration.
We need better documentation on the pricing.
It might be helpful if they added a pricing simulator.
Which deployment model are you using for this solution?
Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Software Engineer at Enable Medicine
Good technical documentation and overall education with improved visibility
Pros and Cons
- "We've found it most useful for managing Rstudio Workbench, which has its own logs that would not be picked up via Cloudwatch."
- "We primarily use the log management functionality, and the only feedback I have there is better fuzzy text searching in logs (the kind that Kibana has)."
What is our primary use case?
We primarily use the solution for log monitoring across our entire cloud infra (EB, EC2, Batch, and Lambda).
This is in addition to Rstudio Workbench, which has its own logs that would not be picked up via Cloudwatch (https://docs.rstudio.com/ide/server-pro/server_management/logging.html#default-log-file-locations).
We own several dozen of these servers, and we used to manage instance logs by tailing logs when incidents occurred. Datadog allows for much better visibility across our entire fleet and has saved us countless hours.
How has it helped my organization?
It is now way easier to search in one place rather than across all of Cloudwatch (and needing to know log groups, etc.).
Primarily, we run several separate deployments of Rstudio Workbench, which has its own logs that would not be picked up via Cloudwatch.
We own several dozen of these servers. We used to manage instance logs manually.
Datadog allows for much better visibility.
What is most valuable?
We've found it most useful for managing Rstudio Workbench, which has its own logs that would not be picked up via Cloudwatch.
Datadog allows for much better visibility across our entire fleet and has saved us countless eng hours as a result.
We plan on trying out offerings such as APM moving forward too.
Some things that Datadog does very well:
- Technical documentation (the docs are clear, concise, and include realistic code samples)
- Overall education efforts (e.g. the codelabs/workshops)
What needs improvement?
We primarily use the log management functionality, and the only feedback I have there is better fuzzy text searching in logs (the kind that Kibana has).
I've learned about a ton of other offerings, like APM, NPM, etc., over the course of workshops. Once I try those out, I'm sure I will have additional feedback.
For how long have I used the solution?
I've used the solution for one year.
Disclosure: I am a real user, and this review is based on my own experience and opinions.