Ramon Snir - PeerSpot reviewer
CTO at a tech vendor with 1-10 employees
Real User
Increases delivery velocity with less manual testing and good integrations
Pros and Cons
  • "Since we integrated Datadog, we have had increased confidence in the quality of our service, and we had an easier time increasing our delivery velocity."
  • "Since the Datadog platform has so many separate features, solving so many use cases, there are often inconsistencies in feature availability and interoperability between products."

What is our primary use case?

We use Datadog for three main use cases, including:

  • Infrastructure and application monitoring. This ensures that our services are available and performant at all times, and allows us to proactively address incidents and outages without customers contacting us. This includes monitoring of cloud resources (databases, load balancers, CPU usage, etc.), high-level application monitoring (response times, failure rates, etc.), and low-level application monitoring (business-oriented metrics and functional exceptions to the customer experience).
  • Analyzing application behavior, especially around performance. We often use Datadog's application performance monitoring on non-production environments to evaluate the impact of newly introduced features and gain confidence in changes.
  • End-to-end regression testing for APIs and browser-based experiences. Datadog's synthetic testing periodically checks that the system behaves exactly as expected. This is often used as a canary to detect issues even before users reach them organically.

How has it helped my organization?

Since we integrated Datadog, we have had increased confidence in the quality of our service, and we had an easier time increasing our delivery velocity. 

We have seen time after time that the monitors we have carefully created based on all ingested data are detecting issues quickly and accurately. 

This means we allow ourselves to manually test things less frequently. We have also had an easier time investigating application errors and slowness using Datadog's APM and log explorer products which allow us to introspect any part of the system, in its execution context.

What is most valuable?

The most valuable features include:

  • Integrated observability data ingestion: All data that Datadog collects is connected. This lets us easily connect logs with failed requests, and slow database queries with services and requests.
  • Broad integrations allow us to monitor our entire production environment in a single place, not just cloud resources. Since all parts stream metrics, logs, and events to Datadog, we can have unified dashboards and manage monitors and incidents all from the same page.
  • A high level of configuration. We can configure and modify many parts, from how data is collected from our applications to how Datadog parses and visualizes it. This means that we always get the best experience, and we don't need to find ten different products that do small things well or settle on one product that does everything badly.

What needs improvement?

Since the Datadog platform has so many separate features, solving so many use cases, there are often inconsistencies in feature availability and interoperability between products. 

Older, more mature products tend to be complete (many features, customization, broad integrations, etc.), while newer products will often be at a "just above minimum viable product" phase for a long time, doing what's intended yet missing valuable customizations and integrations.

Buyer's Guide
Datadog
December 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
825,399 professionals have used our research since 2012.

For how long have I used the solution?

We've used the solution for 12 months.

What do I think about the scalability of the solution?

The solution scales very well on technical aspects, being able to ingest large quantities of data from many services. However, the pricing often doesn't scale naturally, and effort has to be put in to keep ongoing costs at a reasonable amount.

How are customer service and support?

Customer service and support are generally very high-quality. In most cases, they reply very quickly and offer well-researched and relevant responses. This is contrasted with many vendors who take a long time to reply and send links to documentation instead of understanding the problem.

However, we had cases where support took several weeks to reply to a complicated request and sometimes eventually responded that the issue cannot be resolved. These are rare edge-case occurrences.

How would you rate customer service and support?

Positive

How was the initial setup?

A large part of the initial setup was straightforward. We were able to collect about 80% of the relevant data and 90% of the meaningful insights from just a couple of hours of connecting the AWS integration and the Datadog APM agent. 

Getting it to 100%, and configuring and customizing things to our unique situation, took about two weeks. Datadog's documentation and support team were extremely helpful during both phases.

What about the implementation team?

We handled the setup in-house.

What was our ROI?

From the number of outages stopped or shortened (which lead to lost revenue from non-renewals) and the number of hours saved on investigations (which correlates to engineering salaries), I estimate the ROI on the implementation time and monthly charges to be between 10x and 20x.

What other advice do I have?

We use the solution as a SaaS deployment.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2000457 - PeerSpot reviewer
Staff Cloud Engineer at an energy/utilities company with 51-200 employees
Real User
Good infrastructure and APM metrics with easy onboarding of new products
Pros and Cons
  • "We rely heavily on the API crawlers that Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without having also to make them add them at the agent level."
  • "The real issue with this product is cost control."

What is our primary use case?

We are using the solution for migrating out of the data center. Old apps need to be re-architected. We plan to move to multi-cloud for disaster recovery and to avoid vendor lock-in. The migration is split between an MSP (Infosys) and in-house devs. The hard part is ensuring these apps run the same in the cloud as they do on-prem. We also need to ensure that we improve performance when possible. With deadlines approaching quickly, it is important not to cut corners, which is why we needed observability.

How has it helped my organization?

The product has created a paradigm shift in how we deploy monitoring. Before, we had a one-to-one lookup in ServiceNow. This wouldn't scale, as teams wouldn't be able to create monitors on the fly and would have to wait on us to contact the ServiceNow team to create a custom lookup. Now, in real time, as new instances are spun up and down, they are still guaranteed to be covered by monitoring. This used to require a change request, and now it is automatic.

What is most valuable?

For us, the most valuable features are infrastructure and APM metrics. The seamless integration between Datadog and hundreds of apps makes onboarding new products and teams a breeze. 

We rely heavily on the API crawlers that Datadog uses for cloud integrations. These allow us to pick up and leverage the tags teams have already deployed without also having to make teams add them at the agent level. Then we use Datadog's conditionals in the monitor to dynamically alert hundreds of teams, and with the ServiceNow integration, we can also assign tickets based on the environment. Now, our top teams are using APM/profiler to find bottlenecks and improve the speed of our apps.

What needs improvement?

The real issue with this product is cost control. For example, when logs first came out, they didn't have any index cuts. This led to runaway logs and exploding costs. 

It seems that admin cost control granularity is an afterthought. For example, synthetics have been out for over four years, yet there is no way to limit teams from creating tests that fire off every minute. If we could say you can't test more than once every five minutes, that would save us 5X on our bill.
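The 5X figure can be checked with quick arithmetic. The sketch below is illustrative only: it assumes the bill scales linearly with the number of test runs, and the function names and the 30-day month are made up for this example rather than taken from Datadog's pricing.

```python
# Rough arithmetic behind the "5X" claim, assuming cost is proportional to
# the number of synthetic test runs (an assumption, not Datadog's price list).

MINUTES_PER_DAY = 24 * 60


def runs_per_month(interval_minutes: int, days: int = 30) -> int:
    """Number of runs of one test in a month at the given interval."""
    return (days * MINUTES_PER_DAY) // interval_minutes


def run_ratio(current_interval: int, proposed_interval: int) -> float:
    """How many times more runs the current interval produces than the proposed one."""
    return runs_per_month(current_interval) / runs_per_month(proposed_interval)


if __name__ == "__main__":
    print(runs_per_month(1))   # runs per test per month at a 1-minute interval
    print(runs_per_month(5))   # runs per test per month at a 5-minute interval
    print(run_ratio(1, 5))     # ratio between the two
```

A test that fires every minute runs five times as often as one that fires every five minutes, so under the linear-cost assumption a five-minute floor would cut that part of the bill to a fifth.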

For how long have I used the solution?

I've been using the solution for about three years. 

What do I think about the stability of the solution?

The solution is very stable. There are not too many outages, and they fix them fast.

What do I think about the scalability of the solution?

It is easy to scale. It's why we adopted it. 

How are customer service and support?

Before we had premium support, I would avoid contacting support since it was so bad.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We previously used AppDynamics. It isn't built for the cloud and is hard to deploy at scale.

How was the initial setup?

The initial setup was not complex. We just had to teach teams the concept of tags.

What about the implementation team?

We implemented the solution in-house. It was me. I am the SME for Datadog at the company.

What was our ROI?

We have seen an ROI. It has saved months of time and reduced blindspots for all app teams.

What's my experience with pricing, setup cost, and licensing?

We'd advise new users to be careful with logs and APM, as those are the ones that can get expensive fast.

Which other solutions did I evaluate?

We looked into Dynatrace. However, we found the cost to be high.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1996488 - PeerSpot reviewer
Software Engineer at Spring Health
User
Great dashboards and custom metrics with the ability to parse logs
Pros and Cons
  • "The dashboards are great."
  • "We need more advanced querying against logs."

What is our primary use case?

We share dashboards, set up alerts, and monitor everything that happens in our system. We use it in staging, features, production, and our load test environment. It is exceptionally helpful for making our engineering more data-driven. 

I came from a company that believes we should focus on being telemetry-driven. Instilling this in a smaller, less mature engineering organization has been challenging. However, it is much easier while using Datadog.

What is most valuable?

The dashboards are great. They are an easy way to share visibility into what we need to watch with others who are not SMEs.

I enjoy the custom metrics. With this, we can take things that were once logs and then retain them longer.

We are able to parse logs. To be honest, this was only useful due to the fact that we had not yet set up the Datadog agent properly in PHP. Once we did this, the Datadog log parsing was no longer needed.

The ability to pin to a date and time is very helpful. This allows us to pinpoint exactly what was happening.

What needs improvement?

We need more advanced querying against logs. While most issues I have had here can be alleviated by way of sending better-formatted logs, it would be cool to do SQL-type queries against our data.

We need a way to see dashboard metadata. We launched a huge customer, and we saw more people using Datadog than ever across the entire organization, yet had no way to tell.

It would be ideal if we had some way to compare arbitrary date times more easily. We would love to use the Diff Graph command against some hard-coded value, for instance, against some known event.

For how long have I used the solution?

I've used the solution for eight months.

What do I think about the scalability of the solution?

The scalability is great!

Which solution did I use previously and why did I switch?

We previously used New Relic. I was not part of the decision-making team that made the switch.

What was our ROI?

The ROI is the speed at which we can debug live sites. It has been excellent. It's amazing how many incidents we can capture before customers notice.

Which other solutions did I evaluate?

We looked into New Relic and a home-brewed solution as potential other options.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer0962486 - PeerSpot reviewer
Head of Product Design at hackajob
User
Good alerts and detailed data but needs UI improvements
Pros and Cons
  • "Session recordings have been the most valuable to me as it helps me gain insights into user behaviour at scale."
  • "In terms of UI, everything is very small, which makes it quite difficult to navigate at times."

What is our primary use case?

I work in product design, and although we use Datadog for monitoring, etc., my use case is different, as I mostly review and watch session recordings from users to gain insight into user feedback.

We watch multiple sessions per week to understand how users are using our product. From this data, we are able to hone in on specific problems that come up during the sessions. We then reach out to specific users to follow up with them via moderated testing sessions, which is very valuable for us.

How has it helped my organization?

Using Datadog has allowed us to review detailed interactions of users at a scale that leads us to make informed data-driven UX improvements as mentioned above.

Being able to pinpoint specific users via filtering is also very useful as it means when we have direct feedback from a specific user, we can follow up by watching their session back. 

The engineering team's use case for Datadog is alerting, which is also very useful for us as it gives us visibility into how stable our platform is through various lenses.

What is most valuable?

Session recordings have been the most valuable to me as it helps me gain insights into user behaviour at scale. By capturing real-time interactions, such as clicks, scrolls, and navigation paths, we can identify patterns and trends across a large user base. This helps us pinpoint usability issues, optimize the user experience, and improve the overall experience for our users. Analyzing these recordings enables us to make data-driven decisions that enhance both functionality and user satisfaction.

What needs improvement?

I'd like the ability to see more in-depth actions on user sessions, such as where specific problems occur. Rather than having to watch numerous session recordings to understand where this happens, I'd like to get alerts/notifications about specific areas where users are struggling, such as rage clicks.

In terms of UI, everything is very small, which makes it quite difficult to navigate at times, especially in terms of accessibility, so I'd love for there to be more attention on this.

For how long have I used the solution?

I've used the solution for over one year.

Which solution did I use previously and why did I switch?

We did not evaluate other options. 

What's my experience with pricing, setup cost, and licensing?

I wasn't part of the decision-making process during licensing.

Which other solutions did I evaluate?

I wasn't part of the decision-making process during the evaluation stage.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
SecOps Engineer at Ava Labs
User
Helpful support, with centralized pipeline tracking and error logging
Pros and Cons
  • "Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most."
  • "While the documentation is very good, there are areas that need a lot of focus to pick up on the key details."

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. 

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards. 

What is most valuable?

The centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing is great, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

What needs improvement?

While the documentation is very good, there are areas that need a lot of focus to pick up on the key details. In some cases the screenshots don't match the text when updates are made. 

I spent longer than I should trying to figure out how to correlate logs to traces, mostly related to environmental variables.
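For readers facing the same problem, here is a minimal, hedged sketch of a pre-flight check for the environment variables that log and trace correlation commonly depends on. It assumes the tracer reads the unified service tagging variables `DD_ENV`, `DD_SERVICE`, and `DD_VERSION` plus the `DD_LOGS_INJECTION` switch. The helper name `correlation_problems` is hypothetical, and the exact requirements and accepted values vary by tracer and version, so treat it as a checklist rather than the reviewer's actual setup.

```python
import os
from typing import List, Mapping

# Variables commonly involved in correlating logs with traces.
# This list is an assumption for illustration; check your tracer's docs.
TAGGING_VARS = ("DD_ENV", "DD_SERVICE", "DD_VERSION")
LOG_INJECTION_VAR = "DD_LOGS_INJECTION"


def correlation_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in the given environment mapping."""
    problems = [f"{name} is not set" for name in TAGGING_VARS if not env.get(name)]
    # Only the literal value "true" is treated as enabled here; some tracers
    # accept other spellings.
    if env.get(LOG_INJECTION_VAR, "").strip().lower() != "true":
        problems.append(f"{LOG_INJECTION_VAR} is not 'true'")
    return problems


if __name__ == "__main__":
    for problem in correlation_problems(os.environ):
        print(problem)
```

Running a check like this at service startup, in each environment, makes a missing or misspelled variable visible before anyone goes looking for traces that never link to their logs.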

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime.

What do I think about the scalability of the solution?

It's scalable and customizable. 

How are customer service and support?

Support is helpful. They help us tune our committed costs and alert us when we start spending out of the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility.

How was the initial setup?

Setup is generally simple. .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

There has been significant time saved by the development team in terms of assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

I'd advise others to set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling. 

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

We are excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Hoon Kang - PeerSpot reviewer
Full Stack Engineer at K HEALTH, INC
User
Top 20
Good alerting and issue detection for many valuable features
Pros and Cons
  • "Thanks to frequent concurrent deployments, the Datadog alert monitors allow us to quickly detect issues if anything occurs."
  • "The monitors can be improved."

What is our primary use case?

Our company has a microservice architecture, with different teams in charge of different services. It is also a startup, which means that we have to build and move very fast. Before we were properly using Datadog (DD), things often broke without much information about where in our system the break happened. This was quite a big time sink: teams were unfamiliar with other teams' code, so they needed help from those teams to debug, which slowed our building down a lot. Implementing DD traces fixed this.

What is most valuable?

Datadog has many features, but the most valuable have become our primary uses.

Also, thanks to frequent concurrent deployments, the Datadog alert monitors allow us to quickly detect issues if anything occurs.

What needs improvement?

The monitors can be improved. The chart in the monitors only goes back a couple of hours, which is clunky. The monitors could also provide more info, like traces. We have many alerts connected to different notification systems, such as Slack and Opsgenie. 

When the on-call engineer receives notifications fired by the alerts, they are taken to the monitors. Yet often, we have to open up many different tabs to see logs, traces, and info that is not accessible on the monitors. I think it would make every on-call engineer's life easier if the monitor had more data.

For how long have I used the solution?

We've used the solution for three years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003943 - PeerSpot reviewer
Software Engineer at a financial services firm with 10,001+ employees
Real User
Helpful support, good RUM monitoring, and nice dashboards
Pros and Cons
  • "I really enjoy the RUM monitoring features of Datadog. It allows us to monitor user behavior in a way we couldn't before."
  • "At times, it can be hard to generate metrics out of logs."

What is our primary use case?

We use it to monitor and alert on our ECS instances as well as other AWS services, including DynamoDB, API Gateway, etc. 

We have it connected to PagerDuty for alerting on all our cloud applications. 

We also use custom RUM monitoring and synthetic tests for both our internal and public-facing websites. 

For our cloud applications, we can use Datadog to define our SLOs and SLIs and generate dashboards that are used to monitor SLOs and report them to our senior leadership.

How has it helped my organization?

Datadog has been able to improve our cloud-native monitoring significantly, as CloudWatch doesn't have enough features to create robust, sustainable dashboards that are easily able to present all the information in an aggregated manner in one place for a combination of applications, databases, and other services including our UI applications. 

RUM monitoring is also something we didn't have before Datadog. We had Splunk, which was a lot harder to set up than Datadog's custom RUM metrics and its dashboards.

What is most valuable?

I really enjoy the RUM monitoring features of Datadog. It allows us to monitor user behavior in a way we couldn't before. 

It's useful to be able to obfuscate sensitive information by setting up custom RUM actions and blocking the default ones with too much data. 

I also like being able to generate custom metrics and monitors by adding facets to existing logging. Datadog can parse logs well for that purpose. The primary method of error detection for our external website is synthetic tests. This is extremely valuable for us as we have a large user base.

What needs improvement?

At times, it can be hard to generate metrics out of logs. I've seen some of those break over time and produce flaky data. 

Creating a monitor out of the metric and using it in a dashboard to generate our SLIs and SLOs has been hard, especially in cases where the data comes from nested logging facets.

For how long have I used the solution?

I've used the solution for two years.

What do I think about the stability of the solution?

The stability is pretty good.

What do I think about the scalability of the solution?

The solution is pretty scalable! However, it's hard to set up all the infra (Terraform code) required to link private links in Datadog to all of our different AWS accounts.

How are customer service and support?

They offer good support. Solutions are provided by the team when needed. For example, we had to delete all our RUM metrics when we accidentally logged sensitive data and the CTO of Datadog stepped in to help out and prioritize it at the time.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used Splunk and some internal tools. We switched due to the fact that some cloud applications don't integrate well with pre-existing solutions.

How was the initial setup?

The initial setup for connecting our different AWS accounts via Datadog private link wasn't great. There was a lot of duplicate Terraform that had to be written. The dashboard setup is way easier.

What about the implementation team?

We installed it with the help of a vendor team.

What was our ROI?

Our return on investment is great and is so much better than CloudWatch. We can easily integrate with PagerDuty for alerting.

What's my experience with pricing, setup cost, and licensing?

Our company set up the product for us, so the engineers didn't need to be involved with pricing. 

The pricing structure isn't very clear to engineers.

Which other solutions did I evaluate?

We looked into Splunk and some internal tools.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Jon Schwartz - PeerSpot reviewer
Senior Software Engineer at LeafLink
Real User
Good log stream with a useful APM and democratizes logs
Pros and Cons
  • "Datadog's log aggregation is really helpful since it lets me and every other engineer on my team log in, view, and share logs when we need to debug our application."
  • "The menu on the left is pretty dense (and I know it has to be). I never knew about the cmd+k functionality until recently. It would be helpful to offer more tips/cheat sheets to see handy shortcuts like that."

What is our primary use case?

We use Datadog to view and aggregate logs and monitor all of our services. We have a lot of running infrastructure and it is very convenient to have logs and metrics all aggregated somewhere we can view and chart them. 

I use Datadog to create dashboards, runbooks, and shareable graphs, which really help out my whole team. We mostly use logs and APM, yet have been starting to use other products. I would like to use more synthetic monitors.

How has it helped my organization?

It has democratized our logs and metrics, allowing all engineers to have insight into how our apps perform. It is also extremely helpful when debugging issues. 

It would be very difficult to debug issues without aggregated logs and APM traces. 

It has also definitely saved us some money since we can keep an eye on our running infrastructure in an easy-to-see way, rather than a less friendly CLI. It has been a very big help!

What is most valuable?

The log stream has been the most useful thing. Having so many logs on so many different running containers means it is very inconvenient to view them individually. Datadog's log aggregation is really helpful since it lets me and every other engineer on my team log in, view, and share logs when we need to debug our application. 

APM has also been extremely helpful for debugging issues and profiling and optimizing our apps. Dashboards have also been really helpful for communicating needs and priorities to engineering leadership. 

It is very easy to get buy-in with graphs to back things up.

What needs improvement?

I recently saw the education, and it is amazing. Events like DASH are extremely helpful in understanding the deep set of features. Anything that helps to educate users is a huge win here. 

The menu on the left is pretty dense (and I know it has to be). I never knew about the cmd+k functionality until recently. It would be helpful to offer more tips/cheat sheets to see handy shortcuts like that.

For how long have I used the solution?

I've used the solution for three years. 

Which solution did I use previously and why did I switch?

We previously used AWS CloudWatch Logs. It was far less friendly and less fully featured.

How was the initial setup?

The solution is pretty straightforward to set up. It helps with logs and metrics, and the AWS integration is really great.

What about the implementation team?

We handled the implementation in-house.

What other advice do I have?

It is hard to educate an entire team. There is a big learning curve.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user