reviewer08624379 - PeerSpot reviewer
Senior DevOps Engineer at MIM Software Inc.
User
Great documentation and learning platform with good built-in integrations
Pros and Cons
  • "Datadog's learning platform is second to none."
  • "Datadog's roadmap can be a bit unpredictable at times."

What is our primary use case?

We were looking for an all-in-one observability platform that could handle a number of different environments and products. At a basic level, we have a variety of on-premises servers (Windows/Mac/Linux) as well as a number of commercial, cloud-hosted products. 

While it's often possible to let each team rely on its own means for monitoring, we wanted something that the entire company could rally around - a unified platform that is developed and supported by the very same people, not others just slapping their name on some open source products they have no control over.

How has it helped my organization?

Datadog has dropped effortlessly into nearly every stage of observability for us. We appreciate its robust cross-platform support for our IT assets. For hosted products, enabling integrations often couldn't be easier, and many of them include native dashboards and even other types of content packs. 

Over the last couple of years, we have onboarded a number of engineering teams, and each of them feels comfortable using Datadog. This gives us the ability to build organizational knowledge.

What is most valuable?

Datadog's learning platform is second to none. It's the gold standard of training resources in my mind; not only are these self-paced courses available at no charge, but you can spin up an actual Datadog environment to try out its various features. 

I just hate when other vendors try to upsell you on training beyond their (often poorly-written) documentation. Apart from that, we appreciate the variety of content that comes from Datadog's built-in integrations - for common sources, we don't have to worry about parsing, creating dashboards, or otherwise reinventing the wheel.

What needs improvement?

Datadog's roadmap can be a bit unpredictable at times. For instance, a few years ago, our rep at the time stated that Datadog had dropped its plans to develop an incident on-call platform. However, this year, they released a platform that does exactly that.

They also decided to drop chat-based support just recently. While I understand that it's often easier to work with support tickets, I do miss the easy availability of live support. 

It would be nice if Datadog continued to broaden its variety of available integrations to include even more commercial platforms because that is central to its appeal. If we're looking at a new product and there isn't a native integration, then that's more work on our part.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Works at Koddi
User
Improved response time and cost-efficiency with good monitoring
Pros and Cons
  • "The server monitoring, service monitoring, and user session monitoring are extremely helpful, as they allow us to be alerted ahead of time of issues that users might experience."
  • "I would also like to see an improvement in the server's data extraction times, as sometimes it can take up to ten minutes to download a report for a critical issue that is costing us money."

What is our primary use case?

We monitor our multiple platforms using Datadog and post alerts to Slack to notify us of server and end-user issues. We also monitor user sessions to help troubleshoot an issue being reported. 

We monitor 3.5 platforms on our Datadog instance, and the team always monitors the trends and Dashboards we set up. We have two instances to span the 3.5 platforms and are currently looking to implement more platform monitoring over time. The user session monitoring is consistent for one of these platforms. 

How has it helped my organization?

Datadog has improved our response time and cost-efficiency in bug reporting and server maintenance. We're able to track our servers more fluidly, allowing us to expand our outreach and decrease response time. 

There are many different ways that Datadog is used, and we monitor three and a half platforms on the Datadog environment at this time. By monitoring all of these platforms in one easy-to-use instance, we're able to track the platform with the issue, the issue itself, and its impact on the end user. 

What is most valuable?

The server monitoring, service monitoring, and user session monitoring are extremely helpful, as they allow us to be alerted ahead of time of issues that users might experience. More often than not, an issue can be not only identified but also solved, with the fix released, before an end user notices it. 

We are currently using this as an investigative tool to notice trends, identify issues, and locate areas of our program that we can improve upon that haven't been identified as pain points yet. This is another effective use case. 

What needs improvement?

I would like to see a longer retention time for user sessions, even if by 24 to 48 hours, or even just the option to configure it. This would let us keep user sessions that have remained invisible for a long time and identify issues that people are working around. 

I would also like to see an improvement in the server's data extraction times, as sometimes it can take up to ten minutes to download a report for a critical issue that is costing us money. Regardless, I am very happy with Datadog and love the uses we have for the program so far.  

For how long have I used the solution?

I've used the solution for more than four years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Datadog
November 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.
Kenneth Dozier - PeerSpot reviewer
Associate Software Engineer at H&R Block, Inc.
User
Easy to use with good speed and helpful dashboards
Pros and Cons
  • "Watchdog is a favorite feature among a lot of the devs. It catches things they didn't even know were an issue."
  • "I would like to see the integration between PagerDuty and Datadog improved. The tags in Datadog don't match those in PagerDuty, and we have to make it work."

What is our primary use case?

We are using Datadog to improve our cloud monitoring and observability across our enterprise apps.  We have integrated a lot of different resources into Datadog, like Kubernetes, App Gateways, App Service Environments, App Service Plans, and other Web App resources. 

I will be using the monitoring and observability features of Datadog. Dashboards are used very heavily by teams and SREs. We really have seen that Datadog has already improved both our monitoring and our observability.

How has it helped my organization?

The ease and speed with which you can create a dashboard has been a huge improvement.  

The different types of monitors we can create have been huge, too. We can do so many different things with monitors that we couldn't do before with our alerts. 

Being able to click on a trace or log and drill down on it to see what happened has been great.  

Some have found the learning curve a bit steep. That said, they are coming around slowly. There is just a lot of information to learn how to navigate.

What is most valuable?

The different types of monitors have been very valuable. We have been able to make our alerts (monitors) more actionable than we were able to previously.  

Watchdog is a favorite feature among a lot of the devs. It catches things they didn't even know were an issue. 

RUM is another feature a lot of us are looking forward to; we want to see how it can help us improve our customer experience during tax season.  

We hope to enable the code review feature at some point so we can see what code caused the issue.

What needs improvement?

I would like to see the integration between PagerDuty and Datadog improved.  The tags in Datadog don't match those in PagerDuty, and we have to make it work.  Also, I would like it to be easier to replicate a KQL query in Datadog.  

I would like to see the alert communications to email or phones made better so we could hopefully move off PagerDuty and just use Datadog for that. 

There are also a lot of features that we haven't budgeted for yet and I would like for us to be able to use them in the future.

For how long have I used the solution?

I've used the solution for about two years.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: H&R Block has recently signed with Datadog.
Sid Nigam - PeerSpot reviewer
Works at RAPDEV LLC
User
Top 20
Unified platform with customizable dashboards and AI-driven insights
Pros and Cons
  • "The infrastructure monitoring capabilities, especially for our Kubernetes clusters, have helped us optimize resource allocation and reduce costs."
  • "We'd like to see more advanced incident management capabilities integrated directly into the platform."

What is our primary use case?

Our primary use case for this solution is comprehensive cloud monitoring across our entire infrastructure and application stack. 

We operate in a multi-cloud environment, utilizing services from AWS, Azure, and Google Cloud Platform. 

Our applications are predominantly containerized and run on Kubernetes clusters. We have a microservices architecture with dozens of services communicating via REST APIs and message queues. 

The solution helps us monitor the performance, availability, and resource utilization of our cloud resources, databases, application servers, and front-end applications. 

It's essential for maintaining high availability, optimizing costs, and ensuring a smooth user experience for our global customer base. We particularly rely on it for real-time monitoring, alerting, and troubleshooting of production issues.

How has it helped my organization?

Datadog has significantly improved our organization by providing us with great visibility across the entire application stack. This enhanced observability has allowed us to detect and resolve issues faster, often before they impact our end-users. 

The unified platform has streamlined our monitoring processes, replacing several disparate tools we previously used. This consolidation has improved team collaboration and reduced context-switching for our DevOps engineers. 

The customizable dashboards have made it easier to share relevant metrics with different stakeholders, from developers to C-level executives. We've seen a marked decrease in our mean time to resolution (MTTR) for incidents, and the historical data has been invaluable for capacity planning and performance optimization. 

Additionally, the AI-driven insights have helped us proactively identify potential issues and optimize our infrastructure costs.

What is most valuable?

We've found the Application Performance Monitoring (APM) feature to be the most valuable, as it provides great visibility on trace-level data. This granular insight allows us to pinpoint performance bottlenecks and optimize our code more effectively. 

The distributed tracing capability has been particularly useful in our microservices environment, helping us understand the flow of requests across different services and identify latency issues. 

Additionally, the log management and analytics features have greatly improved our ability to troubleshoot issues by correlating logs with metrics and traces. 

The infrastructure monitoring capabilities, especially for our Kubernetes clusters, have helped us optimize resource allocation and reduce costs.

What needs improvement?

While Datadog is an excellent monitoring solution, it could be improved by building more features to replace alerting apps like OpsGenie and PagerDuty. Specifically, we'd like to see more advanced incident management capabilities integrated directly into the platform. This could include features like sophisticated on-call scheduling, escalation policies, and incident response workflows. 

Additionally, we'd appreciate more customizable machine learning-driven anomaly detection to help us identify unusual patterns more accurately. Improved support for serverless architectures, particularly for monitoring and tracing AWS Lambda functions, would be beneficial. 

Enhanced security monitoring and threat detection capabilities would also be valuable, potentially reducing our reliance on separate security information and event management (SIEM) tools.

For how long have I used the solution?

I've used the solution for two years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer9816413 - PeerSpot reviewer
Engineering Manager at Video Blocks
User
Top 20
Easy, more reliable, and transparent monitoring
Pros and Cons
  • "Monitors have also been very valuable when setting up our on-call processes. It makes it easy to set up and adjust alerting to keep our teams aware of anything going wrong."
  • "One thing to improve would be making it easier to see common patterns across traces."

What is our primary use case?

We use the solution to monitor and investigate issues with production services at work. We're periodically reviewing the service catalog view for the various applications and I use it to identify any anomalies with service metrics, any changes in user behavior evident via API calls, and/or spikes in errors.  

We use monitors to trigger alerts for on-call engineers to act upon. The monitors have set thresholds for request latency, error rates, and throughput. 
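As a rough illustration of the threshold idea described above, here is a minimal Python sketch. It does not use the Datadog API; the `Threshold` class, the `breached` helper, the metric names, and every limit below are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A hypothetical alert rule: one metric, one limit, one direction."""
    metric: str
    limit: float
    breach_when_above: bool = True  # False means "alert when the value drops below"


def breached(thresholds, observed):
    """Return the names of metrics whose observed value crosses its limit."""
    alerts = []
    for t in thresholds:
        value = observed.get(t.metric)
        if value is None:
            continue  # no data for this metric; a real monitor might alert on that too
        if t.breach_when_above:
            is_breach = value > t.limit
        else:
            is_breach = value < t.limit
        if is_breach:
            alerts.append(t.metric)
    return alerts


# Illustrative limits only (assumptions, not values from the review).
THRESHOLDS = [
    Threshold("p95_latency_ms", 500.0),
    Threshold("error_rate", 0.02),
    Threshold("requests_per_second", 10.0, breach_when_above=False),
]

sample = {"p95_latency_ms": 620.0, "error_rate": 0.01, "requests_per_second": 4.0}
print(breached(THRESHOLDS, sample))
```

In this sketch, latency and error rate alert when they rise above their limits, while throughput alerts when it falls below its limit.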

We also use automated rules to block bad actors based on request volume or patterns.

How has it helped my organization?

Datadog has made setting up monitors easier, more reliable, and more transparent. This has helped standardize our on-call process and set all of our on-call engineers up for success.  

It has also standardized the way we evaluate issues with our applications by encouraging all teams to use the service catalog.  

It makes it easier for our platforms and QA teams to get other engineering teams up to speed with managing their own applications' performance. 

Overall, Datadog has been very helpful for us.

What is most valuable?

The service catalog view is very helpful for periodic reviews of our application. It has also standardized the way we evaluate issues with our applications.  Having one page with an easy-to-scan view of app metrics, error patterns, package vulnerabilities, etc., is very helpful and reduces friction for our full-stack engineers.

Monitors have also been very valuable when setting up our on-call processes. It makes it easy to set up and adjust alerting to keep our teams aware of anything going wrong.

What needs improvement?

Datadog is great overall. One thing to improve would be making it easier to see common patterns across traces. I sometimes end up in a trace but have a hard time finding other common features about the error/requests that are similar to that trace. This could be easier to get to; however, in that case, it's actually an education issue.  

Another thing that could be improved is that the service list page sometimes refreshes slowly, and I accidentally click the wrong environment since the sort changes late.

For how long have I used the solution?

I've used the solution for about a year.

What do I think about the stability of the solution?

It is very stable. I have not seen any issues with Datadog.

What do I think about the scalability of the solution?

It seems very scalable.

How are customer service and support?

I've had no specific experience with technical support.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We used Honeycomb before. We switched since Datadog offered more tooling.

How was the initial setup?

Each application has been easy to instrument.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

Engineers save an unquantifiable amount of time by having one standard view for all applications and monitors.

What's my experience with pricing, setup cost, and licensing?

I am not exposed to this aspect of Datadog.

Which other solutions did I evaluate?

We did not evaluate other options. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Dmitri Panfilov - PeerSpot reviewer
Software Engineer at Redfin Corp
User
Top 20
Easy dashboard creation and alarm monitoring with a good ROI
Pros and Cons
  • "The ease of dashboard creation and alarm monitoring has helped us not only stay competitive but be industry leaders in performance."
  • "The product can be improved by allowing the grouping of APIs to add variables. That way, any API with a unique ID could be grouped together."

What is our primary use case?

We use the solution to monitor production service uptime/downtime, latency, and log storage. 

Our entire monitoring infrastructure runs off Datadog, so all our alarms are configured with it. We also use it for tracing API performance and finding the biggest regression points. 

Finally, we use it to compare performance on SEO metrics against competitors. This is a primary use case: SEO dictates our position in Google traffic, which drives a large portion of our customer views, so it is a vital part of the business that we rely on Datadog for.

How has it helped my organization?

The product improved the organization primarily by providing consistent data with virtually zero downtime. This was a problem we had with an old provider. It also made it easy to transition an otherwise massive migration involving hundreds of alarms. 

The training provided was crucial, along with having a dedicated team that can forward our requests to and from Datadog efficiently. Without that, we may have never transitioned to Datadog in the first place since it is always hard to lead a migration for an entire company.

What is most valuable?

The API tracing has been massive for debugging latency regressions and working out how to improve the performance of our least performant APIs. Through tracing, we managed to find the slowest step of an API, improve its latency, and iterate on the process until we had our desired timings. This is important for improving our SEO, as LCP and INP draw directly from the numbers we see on Datadog for our API timings. 

The ease of dashboard creation and alarm monitoring has helped us not only stay competitive but be industry leaders in performance.

What needs improvement?

The product can be improved by allowing the grouping of APIs to add variables. That way, any API with a unique ID could be grouped together. 

Furthermore, SEO monitoring has been crucial for us but also difficult to set up, as comparing alarms between us and competitors is a tough feat. Data is not always consistent, so we have been experimenting with removing the noise in Datadog, but it's been taking a while. 

Finally, Datadog should have a feature that reports stale alarms based on activity.

For how long have I used the solution?

I've used the solution for six months.

What do I think about the stability of the solution?

It's very stable, and we have not experienced an issue with downtime on Datadog.

What do I think about the scalability of the solution?

Datadog works well for scalability as volume has not seemed to slow.

How are customer service and support?

We haven't talked to the support team. 

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We switched to Datadog as we used to have a provider that had very inconsistent logging. Our alarms would often not fire since our services were not working since the provider had a logging problem.

How was the initial setup?

The initial setup was somewhat complex due to the built-in monitoring with services. This is not always super comprehensive and has to be studied, as opposed to other metrics platforms that just service all your endpoints, which you can then trace with Grafana.

What about the implementation team?

We implemented the solution through an in-house team.

What was our ROI?

The ROI is good.

What's my experience with pricing, setup cost, and licensing?

Users must try to understand the way Datadog alarms work off the bat so that they can minimize the requirements for expensive features like custom metrics. 

It can sometimes be tempting to use them; however, it is not always necessary as you migrate to Datadog, as they are a provider that treats alarms somewhat differently than you may be used to.

Which other solutions did I evaluate?

We have evaluated New Relic, Grafana, Splunk, and many more in our quest to find the best monitoring provider.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Architect at SEI Investments
Real User
Great support with a helpful APM and profiler
Pros and Cons
  • "The most valuable aspects of the product include the APM and profiler."
  • "I find the training great. That said, it is set for the LCD (lowest common denominator). Of course, this is very helpful to sell the product, yet, to really utilize the product, you need to get more detailed."

What is our primary use case?

We primarily use Datadog for:

  • Native memory
  • Logging
  • APM
  • Context switching
  • RUM
  • Synthetic
  • Databases
  • Java
  • JVM settings
  • File i/o
  • Socket i/o
  • Linux
  • Kubernetes
  • Kafka
  • Pods
  • Sizing

We are testing Datadog as a way to reduce our operational time to fix things (mean time to repair). This is step one. We hope to use Datadog as a way to be proactive instead of reactive (mean time to failure).

So far, Datadog has shown very good options to work on all of our operational and development issues. We are also trying to use Datadog to shift left, and fix things before they break (MTTF increase).
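For readers less familiar with the two metrics mentioned above, here is a minimal Python sketch of how MTTR and MTTF are commonly computed from incident records. The function names and all the sample durations are hypothetical and do not come from the POC described in this review.

```python
def mean_time_to_repair(repair_hours):
    """MTTR: average time spent fixing an incident, in hours."""
    if not repair_hours:
        raise ValueError("need at least one incident")
    return sum(repair_hours) / len(repair_hours)


def mean_time_to_failure(uptime_hours):
    """MTTF: average running time before a failure occurs, in hours."""
    if not uptime_hours:
        raise ValueError("need at least one uptime interval")
    return sum(uptime_hours) / len(uptime_hours)


# Made-up sample data: three incidents and the uptime that preceded two failures.
repairs = [2.0, 4.0, 3.0]
uptimes = [100.0, 300.0]

print(mean_time_to_repair(repairs))   # lower is better (reactive goal)
print(mean_time_to_failure(uptimes))  # higher is better (proactive goal)
```

Reducing the first number is the reactive goal, and increasing the second is the proactive, shift-left goal.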

How has it helped my organization?

We are currently in a POC and do not own Datadog at the moment. 

So far, there have been a few issues due to security. There are two main security issues. 

The first is moving data off-prem. This has been resolved to a point (filtering logs, etc). However, there is still an issue with moving a JFR as a JFR potentially contains data that is not allowed off-prem.

The second security issue is more internal: the main installation requires root access or using an ACL. Our company does not use ACLs on our Linux platform. This is problematic since the install sets a no-login on the Datadog user.

What is most valuable?

The most valuable aspects of the product include the APM and profiler.

These two have given us insights into things that are very difficult to track down given the standard OS (Linux) tools. 

With native memory, it is super difficult to see exactly where usage comes from. I attended a course (continuous profiling), and it showed me the potentially very important capabilities.

If you add these details to a standard dashboard, or a sub-dashboard for techy people, or even just a notebook, it would be easy to identify issues before they occur.

Combining these details with the basic tools (infra, logging, APM, and good rules), Datadog can easily show the details that a true engineer would need. It isn't just for monitoring; I see the value in it for engineers.

What needs improvement?

I have done every training offered (and in a short period of time: two days for 20 courses).

I find the training great. That said, it is set for the LCD (lowest common denominator). Of course, this is very helpful to sell the product, yet, to really utilize the product, you need to get more detailed.

If I did the training as it is written, cut/pasted a bunch of stuff, and saw the cut/paste work, I didn't really learn anything. In later sessions, I quit using the editor, switched to vi, stopped cutting and pasting, and learned much more.

For how long have I used the solution?

I've used the solution for one month.

What do I think about the stability of the solution?

I'd give stability a thumbs up.

What do I think about the scalability of the solution?

We are not sure yet in terms of scalability. The off-prem solution seems to scale well (although we had issues with the training slowing down).

How are customer service and support?

Technical support is great.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I previously used Dynatrace and Elastic. We didn't switch. We are in a POC.

How was the initial setup?

The initial setup is simple yet complex. Too many teams are needed.

What about the implementation team?

We did the initial setup in-house.

What was our ROI?

In terms of ROI, the labor saving is probably the biggest. The NPR is probably second - although management would probably reverse these.

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing are fairly complicated. A GB for .1 sounds great; however, once you put all 16 or so prices together, it adds up fast. A cost model sheet on the main site would be very helpful.

Which other solutions did I evaluate?

We are currently in a POC.

What other advice do I have?

We work with all product versions.

Which deployment model are you using for this solution?

On-premises

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
JulianLewis - PeerSpot reviewer
Senior Engineer at an educational organization with 5,001-10,000 employees
Real User
I like the amount of tooling and the number of solutions they sold with their monitoring.
Pros and Cons
  • "I like the amount of tooling and the number of solutions they sold with their monitoring. Datadog was highly intuitive to use."
  • "Datadog needs more local Asia-Pacific support, and if they don't have a SaaS solution in Asia-Pacific, they should offer an on-prem version. I'm told that's not possible."

What is our primary use case?

Datadog is a SaaS solution we tried for URL and synthetic monitoring. You record a transaction going into a website and replay that transaction from various locations. Datadog is mainly used by the admin, but three or four other guys had access to the reports and notifications, so it's five altogether.  

We probably tried no more than 8 percent of what Datadog can do. There are so many other bits and modules. I've only gone into about half of what APM can do in the Datadog stack.

How has it helped my organization?

We could detect outages on particular websites or problems in specific locations. If I had paid for the full solution, I'm sure I could get a lot of value out of Datadog.

What is most valuable?

I like the amount of tooling and the number of solutions they sold with their monitoring. Datadog was highly intuitive to use. 

What needs improvement?

Datadog needs more local Asia-Pacific support, and if they don't have a SaaS solution in Asia-Pacific, they should offer an on-prem version. I'm told that's not possible. 

For how long have I used the solution?

I have used Datadog for about two or three years.

What do I think about the scalability of the solution?

I was only using Datadog to monitor on a small scale. 

How are customer service and support?

I'd rate Datadog support four out of 10. It was primarily an issue with support in the Asia-Pacific region. I sent them several emails, and they responded around three weeks later. 

They said it went around the houses. Nobody knew who to respond to. That's not good enough. They should have at least told me they'd received the email. I used to work in support.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We were just trying Datadog, and we've switched temporarily to Site24x7. We're looking for one of the bigger ones. They've all given us proposals, whereas Datadog hasn't come forward with a proposal for what they could do.

I used Datadog because I already had a relationship with them at a previous company. However, that guy's moved on now, and I wanted to see how good they were. 

How was the initial setup?

Setting up Datadog is pretty straightforward. I have a lot of experience doing that sort of thing. It took maybe a day and a half to deploy because I was picking externally facing websites.

I deployed it by myself. One person is enough for the small system we had. However, if we were moving forward, I'd recommend at least two or three people to manage it. 

What's my experience with pricing, setup cost, and licensing?

Datadog would've cost around $850 a month based on the loads we were doing, and you could estimate roughly what you would be paying monthly. I liked their pricing model. It was flexible, so you only paid for what you used. I rate Datadog pricing eight out of 10. 

Which other solutions did I evaluate?

We looked at several URL and APM monitoring solutions like Site24x7 and Pingdom. They weren't big players like Dynatrace or any of those that had already provided us a request for information. 

What other advice do I have?

Even with our negative experiences, I'd still give Datadog an eight out of 10. Datadog is a complete solution with easy-to-use templates and excellent scalability. People should know exactly what they're going to configure before they try it out. The trial is brief. Don't start a trial until you know exactly what you're going to do. 

You must be certain that you can meet any internal security requirements. If you're in the Asia-Pacific region, you might not be able to run something that's running abroad.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.