Mason Parry - PeerSpot reviewer
Data Engineer at Nursa
User
Top 20
Customizable alerts, good dashboards, and improves reliability
Pros and Cons
  • "I like how we can customize alerts, and when alerts have become too noisy, we turn their threshold down fairly easily."
  • "It's not that straightforward when creating an alert. The syntax is a little confusing."

What is our primary use case?

We have several teams and several different projects, all working in tandem, so there are a lot of logs and monitoring that need to be done. We use Datadog mostly for alerting when things go down. 

We also have several dashboards to keep track of critical operations and to make sure things are running without issues. The Slack messaging is essential in our workflow in letting us know when an alert is triggered. I also appreciate all the graphs you can make, as it gives our team a good overview of how our services are doing.

How has it helped my organization?

It has improved our reliability and our time to get back up from an outage. By creating an alert and then messaging a Slack channel, we know when something goes down fairly fast. This, in turn, improves our response time to swarm on an issue without it affecting customers. The graphs have also been useful to demonstrate to higher-ups how our services are performing, allowing them to make more informed decisions when it comes to the team. 

What is most valuable?

The alerts are the most valuable. Alerts have saved us countless times in the past and are essentially what we use Datadog for. 

I like how we can customize alerts, and when alerts have become too noisy, we turn their threshold down fairly easily. This is also the case when alerts should be notifying us more often. 

I also like the graphs and how customizable they are. It allows us to create a nice-looking dashboard with all sorts of information relating to our project. This gives us a quick overview of how things are going.

What needs improvement?

It's not that straightforward when creating an alert. The syntax is a little confusing. I guess that the trade-off is customizability. But it would be nice to have a click-and-drag kind of way when creating an alert. So, if someone who isn't so familiar with Datadog or tech in general wanted to create an alert, they wouldn't need to know the syntax. 
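For readers who have not seen the alert syntax this review refers to, the following is a minimal sketch of a metric monitor that notifies a Slack channel. It only builds the definition as a Python dictionary and makes no API call. The metric name, scope, threshold, and the `ops-alerts` channel are hypothetical values chosen for illustration, and `build_monitor` is not part of any Datadog library.

```python
import json


def build_monitor(metric, scope, threshold, window="last_5m", slack_channel="ops-alerts"):
    """Build a metric-monitor definition as a plain dict (no API call is made)."""
    # Query shape:
    # <time aggregation>(<window>):<space aggregation>:<metric>{<scope>} > <threshold>
    query = f"avg({window}):avg:{metric}{{{scope}}} > {threshold}"
    return {
        "name": f"High {metric} on {scope}",
        "type": "metric alert",
        "query": query,
        # An @slack-<channel> mention in the message routes the notification
        # to Slack, assuming the Slack integration is installed.
        "message": f"{metric} is above {threshold}. @slack-{slack_channel}",
        "options": {"thresholds": {"critical": threshold}},
    }


# Hypothetical example values.
monitor = build_monitor("system.cpu.user", "env:prod", 90)
print(json.dumps(monitor, indent=2))
```

In a setup like this, quieting a noisy alert comes down to changing the `threshold` argument, which is the kind of tuning the reviewer describes.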

It would also be great if AI could be used to generate alerts and graphs. I could write a short prompt, and then the AI could auto-generate alerts and graphs for me.

Buyer's Guide
Datadog
December 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
825,399 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for more than two years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Tejaswini A - PeerSpot reviewer
Application Engineer at Discover Financial Services
User
Top 20
Consolidates all our logs into a single place, making it easier to find errors
Pros and Cons
  • "The best way it has helped us is by consolidating all our logs into a single place and making it easier to find errors."
  • "Another issue that I have is with the search syntax, it could be simpler and it feels like there are too many ways to do the same things."

What is our primary use case?

We have a tech stack including all backend services written in TS/Node (mostly) and as a full stack engineer, it is crucial to keep track of new and existing errors. Our logs have been consolidated in Datadog and are accessible for search and review, so the service has become a daily tool for my work. 

More recently, session replay has been adopted at my company, but I do not like it so much because the UI elements are not in their place, so it is very hard to see what the users on the web app are actually clicking on.

How has it helped my organization?

The best way it has helped us is by consolidating all our logs into a single place and making it easier to find errors. Previously, using AWS CloudWatch was cumbersome and time-consuming. One issue I do have with logs is the length of time they are on the platform. Some issues happen sporadically, so it would be good to have logs for longer than one month by default or to make retention configurable. 

Another issue that I have is with the search syntax. It could be simpler, and it feels like there are too many ways to do the same things.

What is most valuable?

Logs search is the most valuable feature because it has consolidated all of our backend services logs into one place. Now we can see the relationship between them as requests are going from one service to other dependencies. 

What needs improvement?

One issue I do have with logs is the length of time they are on the platform. Some issues happen sporadically, so it would be good to have logs for longer than one month by default or to make retention configurable. I have yet to try rehydrating logs, so this might be an option I need to try. Another issue I have is with the search syntax, which could be simpler. The syntax is a bit cumbersome, and there is no intuitive way to save searches so that similar ones can be run in the future. 
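As an illustration of the search syntax mentioned here, the sketch below composes a log search string from a few common filters and keeps named queries in a dictionary so they can be reused. The helper `log_query`, the `SAVED_SEARCHES` dictionary, and the `checkout` service are hypothetical; only the query conventions (tag-style `service:` and `status:` filters, an `@` prefix for log attributes, and `[a TO b]` numeric ranges) follow Datadog's log search syntax.

```python
def log_query(service=None, status=None, attributes=None):
    """Compose a log search string from a few common filters."""
    parts = []
    if service:
        parts.append(f"service:{service}")
    if status:
        parts.append(f"status:{status}")
    # Log attributes are prefixed with "@" in the search syntax.
    for key, value in sorted((attributes or {}).items()):
        parts.append(f"@{key}:{value}")
    # Terms separated by spaces are combined with an implicit AND.
    return " ".join(parts)


# Hypothetical saved searches, kept in code so they can be reused later.
SAVED_SEARCHES = {
    "checkout-5xx": log_query(
        service="checkout",
        status="error",
        attributes={"http.status_code": "[500 TO 599]"},
    ),
}

print(SAVED_SEARCHES["checkout-5xx"])
```

Keeping queries in a small helper like this is one workaround for the missing "save for later" workflow; it does not replace any built-in saved-view feature.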

Finally, while my company replaced a different tool for session replay with Datadog's version, I find it clunky and in need of further improvements. For example, when troubleshooting a web portal issue, it is super important to know what the user clicked, but the elements are not where they should be in the replay.

It is also hard to find details about the sessions, and metadata such as user email, account, etc. that exist on other services with replay features.

For how long have I used the solution?

I have been using Datadog for approximately five years.

What do I think about the stability of the solution?

So far we haven't had any issues with uptime and Datadog has been available when needed.

What do I think about the scalability of the solution?

It seems to scale well as we continue to add services that need monitoring.

How are customer service and support?

I haven't had to contact support.

Which solution did I use previously and why did I switch?

CloudWatch was not a great tool for what we need to do to troubleshoot issues.

What about the implementation team?

We deployed it in-house with intermediate expertise.

What was our ROI?

I am not sure how much we are paying, but I use the app often enough to feel like we are getting a good ROI.

Which other solutions did I evaluate?

As a software engineer, I was not involved in the selection process.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Caleb Parks - PeerSpot reviewer
Works at iSpot.tv
User
Lots of features with a rapid log search and an easy setup process
Pros and Cons
  • "The ease of graph building is nice, and MUCH easier than Prometheus."
  • "It is far too easy to run up huge unexpected costs."

What is our primary use case?

We use the solution for logs, infrastructure metrics, and APM. We have many different teams using it across both product and data engineering.

How has it helped my organization?

The solution has improved our observability by giving us rapid log search, a correlation between hosts/logs/APM, and tons of features in one website.

What is most valuable?

I enjoy the rapid log search. It's such a pleasure to quickly find what you're looking for. The ease of graph building is also nice, and MUCH easier than Prometheus.

What needs improvement?

It is far too easy to run up huge unexpected costs. The billing model is not flexible enough to handle cases where you temporarily have thousands of nodes. It is not cost-effective for monitoring big data jobs. We had to switch to open-source Grafana plus Prometheus for those.

It would be cool to have an OpenTelemetry agent in the next release that automatically instruments everything for APM.

For how long have I used the solution?

I've used the solution for three years.

What do I think about the stability of the solution?

I'd rate the stability ten out of ten.

What do I think about the scalability of the solution?

I'd rate the scalability ten out of ten.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

How was the initial setup?

The setup is very straightforward. Users just install the Helm chart, and boom, you're done.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

Be careful about pricing. Make sure you understand the billing model and that there are multiple billing models available. Set up alarms to alert you of cost overruns before they get too bad.
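One simple way to think about an alarm for cost overruns is a linear projection of month-to-date spend against a budget. The sketch below shows only that arithmetic. The function names, the dollar figures, and the 30-day month are hypothetical, and the spend figure is assumed to come from wherever your billing data lives; nothing here reflects Datadog's actual billing model.

```python
def projected_month_end(spend_to_date, day_of_month, days_in_month=30):
    """Linearly extrapolate month-to-date spend to the end of the month."""
    return spend_to_date / day_of_month * days_in_month


def overrun_alert(spend_to_date, day_of_month, budget, days_in_month=30):
    """Return (should_alert, projected_spend) for a simple budget check."""
    projected = projected_month_end(spend_to_date, day_of_month, days_in_month)
    return projected > budget, projected


# Hypothetical figures: 600 spent by day 10 against a monthly budget of 1500.
should_alert, projected = overrun_alert(600.0, 10, 1500.0)
print(should_alert, projected)
```

A linear projection is deliberately crude. A short burst of extra nodes early in the month inflates it, which is arguably what you want from an early warning.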

Which other solutions did I evaluate?

We've never evaluated other solutions.

What other advice do I have?

It's a great product. However, you have to pay for quality.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer2004174 - PeerSpot reviewer
Senior Software Engineer at an insurance company with 10,001+ employees
Real User
Very good RUM, synthetics, and infrastructure host maps
Pros and Cons
  • "Overall, the Data UI and the usability of customer features continue to improve."
  • "It is very difficult to make the solutions fit perfectly for large organizations, especially in terms of high cardinality objects and multi-tenancy, where the data needs to be rolled up to a summarized level while maintaining its individual data granularity and identifiers."

What is our primary use case?

I have been using Datadog products and capabilities increasingly over the last 4 years, from POC to widespread adoption. 

The capabilities we use are unique for each use case and can be combined in various ways to provide the full observability coverage needed to maintain stable operations and shift from being reactive to being proactive. 

Our organization uses Datadog for site and service reliability across the range of backend and frontend services, as well as for custom monitoring and dashboards that can be dynamic and reused by multiple teams.

How has it helped my organization?

The capabilities we use are unique for each use case. They can be combined in various ways to provide the full observability coverage needed to maintain stable operations in order to become more proactive. 

Our organization uses Datadog for site and service reliability across backend and frontend services, along with custom monitoring and dashboards that can be dynamic and reused by multiple teams. 

We continue to increase the size of our footprint as we get more and more positive experiences.

What is most valuable?

The APM, RUM, synthetics, and infrastructure host maps have been some of the most popular and commonly used features. 

Overall, the Data UI and the usability of customer features continue to improve. 

The RUM session data and replays are much more convenient and applicable than other tools I have worked with in the past. By combining multiple capabilities or features, we get full visibility across the technology stacks and can identify specific bottlenecks or areas where risks and vulnerabilities are likely to exist. 

Watchdog insights take the work out of the hardest part, helping us identify the issues before our customers.

What needs improvement?

It is very difficult to make the solutions fit perfectly for large organizations, especially in terms of high cardinality objects and multi-tenancy, where the data needs to be rolled up to a summarized level while maintaining its individual data granularity and identifiers. Tagging is imperative. However, the solutions could be improved for these needs in the future.

For how long have I used the solution?

I've used the solution for over four years now.

What do I think about the stability of the solution?

The stability is excellent.

What do I think about the scalability of the solution?

You can work with engineering to make it work for your needs. They are excellent at supporting their customers.

How are customer service and support?

Technical support is excellent.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I previously used New Relic, AppDynamics, Heap, Clicktale, and more. Datadog has incorporated many of the features we were looking for into a one-stop shop.

How was the initial setup?

The initial setup is simple and straightforward.

What about the implementation team?

We had an in-house team working directly with Datadog engineering support and technical enablement.

Which other solutions did I evaluate?

We looked into New Relic, AppDynamics, Heap, Clicktale, and more. Datadog has many of the features we were looking for in one place.

What other advice do I have?

We use all versions of the solution.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer3796153 - PeerSpot reviewer
Software Engineer 2 at Modernizing Medicine
User
Intuitive user interface with good log management and a helpful Log Explorer feature
Pros and Cons
  • "The ease of use allowed me to get up to speed with log management since it's my first time using Datadog."
  • "Interactive tutorials could be a game changer."

What is our primary use case?

In our fast-paced environment, managing and analyzing log data and performance metrics is crucial. That’s where Datadog comes in. We rely on it not just for monitoring but for deeper insights into our systems, and here’s how we make the most of it. 

One of the first things we appreciate about Datadog is its ability to centralize logs from various sources—think applications, servers, and cloud services. This means we can access everything from one dashboard, which saves us a lot of time and hassle. Instead of digging through multiple platforms, we have all our log data in one place, making it much easier to track events and troubleshoot issues.

How has it helped my organization?

Before Datadog, we faced the common challenge of fragmented data. Our logs, metrics, and traces were spread across different tools and platforms, making it difficult to get a complete picture of our system’s health. 

With Datadog, we now have a centralized monitoring solution that aggregates everything in one place. This has streamlined our workflow immensely. Whether it’s logs from our servers, metrics from our applications, or traces from user transactions, we can access all this information easily. This unified view has made it simpler for our teams to identify and troubleshoot issues quickly.

What is most valuable?

In my experience with Datadog, one feature stands out above the rest: the Log Explorer. It has completely transformed the way I interact with our log data and has become an essential part of my daily workflow. 

The user interface is incredibly intuitive. When I first started using it, I was amazed at how easy it was to navigate. The design is clean and straightforward, allowing me to focus on the data rather than getting lost in complicated menus. Whether I’m searching for specific log entries or filtering by certain criteria, everything feels seamless. 

This ease of use allowed me to get up to speed with log management since it's my first time using Datadog.

What needs improvement?

Interactive tutorials could be a game changer. Instead of just reading about how to use query filters, users could engage with step-by-step guides that walk them through the process. For example, a tutorial could start with a simple query and gradually introduce more complex filtering techniques, allowing users to practice along the way. These tutorials could include pop-up tips and hints that provide additional context or best practices as users work through examples. This hands-on approach not only reinforces learning but also builds confidence in using the tool.

For how long have I used the solution?

My company has recently made Datadog available to its software engineers, and I personally have been using it for almost a year now.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Ajay Thomas - PeerSpot reviewer
Engineering Manager at dbt Labs
Vendor
Great features and synthetic testing but pricing can get expensive
Pros and Cons
  • "We have been impressed with the uptime and clean and light resource usage of the agents."
  • "I like the idea of monitoring on the go yet it seems the options are still a bit limited out of the box."

What is our primary use case?

Our primary use case is log aggregation, performance tracing, and alerting for custom and vendor-supplied web applications. We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications. 

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host, together with native integrations with GitHub, AWS, and Azure, get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through use of Datadog across all of our apps we were able to consolidate a number of alerting and error tracking apps and Datadog ties them all together in cohesive dashboards. 

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Legacy .NET Framework with Windows Event Viewer and cutting-edge .NET Core with streaming logs both work. 

The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. The centralized pipeline tracking and error logging provides a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly. 

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders. 

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas, specifically .NET profiling and tracing of IIS-hosted apps, that need a lot of focus to pick up on the key details needed. In some cases, the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly because of environment variables.
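To make the log-to-trace correlation point concrete, here is a minimal sketch of a log line that carries the fields Datadog uses to link a log to a trace. It is written in Python purely for illustration (the reviewer's apps are .NET), and `correlated_log` is a hypothetical helper. In practice a tracing library normally injects these fields for you; here the trace and span IDs are passed in by hand, and the `DD_ENV`, `DD_SERVICE`, and `DD_VERSION` values are read from the environment.

```python
import json
import os


def correlated_log(message, trace_id, span_id, env=None):
    """Return a JSON log line carrying the fields used to link it to a trace."""
    env = os.environ if env is None else env
    record = {
        "message": message,
        # Service, environment, and version tags, read from environment variables.
        "dd.env": env.get("DD_ENV", ""),
        "dd.service": env.get("DD_SERVICE", ""),
        "dd.version": env.get("DD_VERSION", ""),
        # IDs of the active trace and span; supplied by the caller in this sketch.
        "dd.trace_id": str(trace_id),
        "dd.span_id": str(span_id),
    }
    return json.dumps(record)


# Hypothetical values standing in for a real environment and a real trace.
example_env = {"DD_ENV": "prod", "DD_SERVICE": "web", "DD_VERSION": "1.0"}
line = correlated_log("payment failed", 123, 456, env=example_env)
print(line)
```

If the environment variables are missing or differ between the process that emits traces and the one that emits logs, the two sides will not line up, which is consistent with the trouble described above.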

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution is very scalable, very customizable.

How are customer service and support?

Service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether the app runs on Linux, Windows, or containers, in the cloud or on-prem.

How was the initial setup?

The setup was generally simple. However, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house. 

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling. 

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

I'm excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog. 

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Architect at SEI Investments
Real User
Great support with a helpful APM and profiler
Pros and Cons
  • "The most valuable aspects of the product include the APM and profiler."
  • "I find the training great. That said, it is set for the LCD (lowest common denominator). Of course, this is very helpful to sell the product, yet, to really utilize the product, you need to get more detailed."

What is our primary use case?

We primarily use Datadog for:

  • Native memory
  • Logging
  • APM
  • Context switching
  • RUM
  • Synthetic
  • Databases
  • Java
  • JVM settings
  • File I/O
  • Socket I/O
  • Linux
  • Kubernetes
  • Kafka
  • Pods
  • Sizing

We are testing Datadog as a way to reduce our operational time to fix things (mean time to repair). This is step one. We hope to use Datadog as a way to be proactive instead of reactive (mean time to failure).

So far, Datadog has shown very good options to work on all of our operational and development issues. We are also trying to use Datadog to shift left, and fix things before they break (MTTF increase).

How has it helped my organization?

We are currently in a POC and do not own Datadog at the moment. 

So far, there have been a few issues due to security. There are two main security issues. 

The first is moving data off-prem. This has been resolved to a point (filtering logs, etc). However, there is still an issue with moving a JFR as a JFR potentially contains data that is not allowed off-prem.

The second security issue is more internal. The main installation requires root access or the use of an ACL, and our company does not use ACLs on our Linux platform. This is problematic since the install sets a no-login on the Datadog user.

What is most valuable?

The most valuable aspects of the product include the APM and profiler.

These two have given us insights into things that are very difficult to track down given the standard OS (Linux) tools. 

The native memory tracking is super difficult to see exactly where it comes from. I attended a course (continuous profiling), and it showed me the potentially very important capabilities.

If you add these details to a standard dashboard, or a sub-dashboard for techy people, or even just a notebook, it would be easy to identify issues before they occur.

Combining these details with the basic tools (infra, logging, APM, and good rules), Datadog can easily show the details that a true engineer would need. It isn't just for monitoring; I see the value in it for engineers.

What needs improvement?

I have done every training offered (and in a short period of time: two days for 20 courses).

I find the training great. That said, it is set for the LCD (lowest common denominator). Of course, this is very helpful to sell the product, yet, to really utilize the product, you need to get more detailed.

If I do the training as it is written, cut and paste a bunch of stuff, and see the pasted commands work, I don't really learn anything. In later sessions, I quit using the editor, switched to vi, stopped cutting and pasting, and learned much more.

For how long have I used the solution?

I've used the solution for one month.

What do I think about the stability of the solution?

I'd give stability a thumbs up.

What do I think about the scalability of the solution?

We are not sure yet in terms of scalability. The off-prem solution seems to scale well, although we had issues with the training slowing down.

How are customer service and support?

Technical support is great.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I previously used Dynatrace and Elastic. We didn't switch. We are in a POC.

How was the initial setup?

The initial setup is simple yet complex. Too many teams are needed.

What about the implementation team?

We did the initial setup in-house.

What was our ROI?

In terms of ROI, the labor saving is probably the biggest. The NPR is probably second - although management would probably reverse these.

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing are fairly complicated. A GB for .1 sounds great. However, once you put all 16 or so prices together, it adds up fast. A cost model sheet on the main site would be very helpful.

Which other solutions did I evaluate?

We are currently in a POC.

What other advice do I have?

We work with all product versions.

Which deployment model are you using for this solution?

On-premises

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
JulianLewis - PeerSpot reviewer
Senior Engineer at an educational organization with 5,001-10,000 employees
Real User
I like the amount of tooling and the number of solutions they sold with their monitoring.
Pros and Cons
  • "I like the amount of tooling and the number of solutions they sold with their monitoring. Datadog was highly intuitive to use."
  • "Datadog needs more local Asia-Pacific support, and if they don't have a SaaS solution in Asia-Pacific, they should offer an on-prem version. I'm told that's not possible."

What is our primary use case?

Datadog is a SaaS solution we tried for URL and synthetic monitoring. You record a transaction going into a website and replay that transaction from various locations. Datadog is mainly used by the admin, but three or four other guys had access to the reports and notifications, so it's five altogether.  

We probably tried no more than 8 percent of what Datadog can do. There are so many other bits and modules. I've only gone into about half of what APM can do in the Datadog stack.

How has it helped my organization?

We could detect outages on particular websites or problems in specific locations. If I had paid for the full solution, I'm sure I could get a lot of value out of Datadog.

What is most valuable?

I like the amount of tooling and the number of solutions they sold with their monitoring. Datadog was highly intuitive to use. 

What needs improvement?

Datadog needs more local Asia-Pacific support, and if they don't have a SaaS solution in Asia-Pacific, they should offer an on-prem version. I'm told that's not possible. 

For how long have I used the solution?

I have used Datadog for about two or three years.

What do I think about the scalability of the solution?

I was only using Datadog to monitor on a small scale. 

How are customer service and support?

I'd rate Datadog support four out of 10. It was primarily an issue with support in the Asia-Pacific region. I sent them several emails, and they responded around three weeks later. 

They said it went around the houses. Nobody knew who to respond to. That's not good enough. They should have at least told me they'd received the email. I used to work in support.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We were just trying Datadog, and we've switched temporarily to Site24x7. We're looking for one of the bigger ones. They've all given us proposals, whereas Datadog hasn't come forward with a proposal for what they could do.

I used Datadog because I already had a relationship with them at a previous company. However, that guy's moved on now, and I wanted to see how good they were. 

How was the initial setup?

Setting up Datadog is pretty straightforward. I have a lot of experience doing that sort of thing. It took maybe a day and a half to deploy because I was picking externally facing websites.

I deployed it by myself. One person is enough for the small system we had. However, if we were moving forward, I'd recommend at least two or three people to manage it. 

What's my experience with pricing, setup cost, and licensing?

Datadog would've cost around $850 a month based on the loads we were doing, and you could estimate roughly what you would be paying monthly. I liked their pricing model. It was flexible, so you only paid for what you used. I rate Datadog pricing eight out of 10. 

Which other solutions did I evaluate?

We looked at several URL and APM monitoring solutions like Site24x7 and Pingdom. They weren't big players like Dynatrace or any of those that had already provided us a request for information. 

What other advice do I have?

Even with our negative experiences, I'd still give Datadog an eight out of 10. Datadog is a complete solution with easy-to-use templates and excellent scalability. People should know exactly what they're going to configure before they try it out. The trial is brief. Don't start a trial until you know exactly what you're going to do. 

You must be certain that you can meet any internal security requirements. If you're in the Asia-Pacific region, you might not be able to run something that's running abroad.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: December 2024