Lead Software Engineer at a retailer with 51-200 employees
Real User
Great APM and interesting log management but the UI is daunting
Pros and Cons
  • "The most useful feature is the APM."
  • "As a new customer, the Datadog user interface is a bit daunting."

What is our primary use case?

We are trying to get a handle on observability. Currently, the overall health of the stack is very anecdotal. Users are reporting issues, and Kubernetes pods are going down. We need to be more scientific and be able to catch problems early and fix them faster.

Given the fact that we are a new company, our user base is relatively small, yet growing very fast. We need to predict usage growth better and identify problem implementations that could cause a bottleneck. Our relatively small size has allowed us to be somewhat complacent with performance monitoring. However, we need to have that visibility.

How has it helped my organization?

We are still taking baby steps with Datadog. Hence, it's hard to come up with quantifiable information. The most immediate benefit is aggregating performance metrics together with log information. Having a better understanding of observability will help my team focus on the business problems they are trying to solve and write code that is conducive to being monitored, instead of reinventing the wheel and relying on their own logic to produce metrics that are out of context.

What is most valuable?

The most useful feature is the APM. Being able to quickly view which requests are time-consuming, and which calls have failed is invaluable. Being able to click on a UI and be pointed to the exact source of the problem is like magic. 

I'm also very intrigued by log management, although I haven't quite had a chance to use it effectively. In particular, the trace and span IDs don't quite seem to work for me. However, I'm very keen on getting this to work. This will also help my developers to be more diligent and considerate when creating log data.
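
For readers stuck at the same point: correlating logs with traces generally depends on every log record carrying the IDs of the trace and span that were active when it was written. The sketch below illustrates that idea with the Python standard library only. The `current_ids` context variable is a hypothetical stand-in for whatever the tracing library tracks, and the `dd.trace_id` / `dd.span_id` field names are assumptions here; the tracer's own documentation defines the exact attribute names and the supported way to inject them.

```python
import contextvars
import io
import json
import logging

# Hypothetical holder for the active (trace_id, span_id); a real tracer would own this.
current_ids = contextvars.ContextVar("current_ids", default=(0, 0))


class TraceContextFilter(logging.Filter):
    """Copy the active trace and span IDs onto every log record."""

    def filter(self, record):
        record.trace_id, record.span_id = current_ids.get()
        return True  # never drop the record, only annotate it


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so a log pipeline can parse the IDs."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "dd.trace_id": str(record.trace_id),  # assumed attribute name
            "dd.span_id": str(record.span_id),    # assumed attribute name
        })


def build_logger(stream):
    """Create a logger whose handler annotates and JSON-formats each record."""
    logger = logging.getLogger("correlation-sketch")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TraceContextFilter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
current_ids.set((1234, 5678))  # pretend a request span just started
log.info("checkout failed")
entry = json.loads(buffer.getvalue())
```

If the IDs are missing or zero in the emitted JSON, the log line cannot be linked to a trace, which is one plausible reason for the symptom described above.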

What needs improvement?

As a new customer, the Datadog user interface is a bit daunting. It gets easier once one has had a chance to get acquainted with it, yet at first, it is somewhat overwhelming. Maybe having a "lite" interface with basic features would make it easier to climb the learning curve.

Maybe the feature already exists. However, I'm not sure how to keep dashboard designs and synthetic tests in source control. For example, we may replace a UI feature, and rebuild a test accordingly in a pre-production environment, yet once the code is promoted to production, the updated test would also need to be promoted.
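
One general pattern for this, sketched here without reference to any specific Datadog API, is to export each dashboard or test definition as JSON, strip the fields that change on every save or differ between environments, and commit the result so that promotion becomes a reviewed diff. The field names in `VOLATILE_FIELDS` and both sample exports are assumptions for illustration.

```python
import json

# Assumed names of fields that vary between environments or on every save.
VOLATILE_FIELDS = {"id", "modified_at", "author_handle"}


def normalize(definition):
    """Return a copy of an exported definition without volatile top-level fields."""
    return {k: v for k, v in definition.items() if k not in VOLATILE_FIELDS}


def to_committable_text(definition):
    """Serialize deterministically so identical definitions produce identical files."""
    return json.dumps(normalize(definition), indent=2, sort_keys=True) + "\n"


# Two hypothetical exports of the same dashboard from different environments.
preprod = {"id": "abc-123", "title": "Checkout health", "modified_at": "2024-01-02", "widgets": []}
prod = {"widgets": [], "title": "Checkout health", "id": "xyz-789", "modified_at": "2024-03-04"}

# True when the two environments hold the same definition once noise is removed.
same_content = to_committable_text(preprod) == to_committable_text(prod)
```

Another reviewer on this page mentions defining SLOs with Terraform, which is a different route to the same goal of keeping monitoring configuration under version control.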

Buyer's Guide
Datadog
November 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.

For how long have I used the solution?

We have just started using the solution and have only used it for about two months.

What do I think about the stability of the solution?

We're new at this. That said, so far, there haven't been any issues to report.

What do I think about the scalability of the solution?

I have not had the opportunity to evaluate the scalability.

How are customer service and support?

Customer support is full of great folks! We're beginning our Datadog journey, so I haven't had that much experience. The little I have had has been great.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

This is all new. 

We used to work with New Relic. New Relic has an amazing APM solution. However, it also became cost-prohibitive.

How was the initial setup?

Since we are relatively greenfield, it was relatively painless to set up the product. 

What about the implementation team?

Our in-house DevOps team did the implementation.

What was our ROI?

I don't know what the ROI is at this stage.

What's my experience with pricing, setup cost, and licensing?

I'm not sure what the exact pricing is. 

What other advice do I have?

So far, it's been great!

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Engineering Manager at Indeed.com
User
Transparent, easy to use, and integrates well with Slack
Pros and Cons
  • "Datadog's seamless integration with Slack and PagerDuty helped us to receive alerts right to the most common notification methods we use (our mobile devices and Slack)."
  • "I would like better navigability across pages."

What is our primary use case?

I primarily use the solution to learn, watch and monitor business and engineering metrics in the production and QA environments of my team. 

We create monitors on key business metrics and observe regressions and anomalies.

Less often, I leverage the events ability in Datadog to get notified about significant activities happening in my teams' deployments.

We learn about Datadog monitor alerts through Slack and often attempt to create SLOs using Terraform.

We use APM for observability.

Most recently, I learned about Watchdog alerts, which I will be looking into heavily.

How has it helped my organization?

Datadog simplified my ability to watch easily and add monitors on any metric emitted by any team at my organization.

Datadog APM immensely improved our ability to understand the reasons behind production issues. Its ability to navigate across services seamlessly to understand the time spent at each critical stage of a production request is helpful. This, combined with Datadog's historical ability to show business metrics aside, helped get more powerful insights much more quickly.

Datadog's seamless integration with Slack and PagerDuty helped us to receive alerts right to the most common notification methods we use (our mobile devices and Slack).

What is most valuable?

The most valuable aspects include:

  • The ability to monitor any team's metric in my company (transparency)
  • The ability to create/clone dashboards for myself (ease of use)
  • Its integration with Slack (it is very powerful)
  • The ability to add monitors on any metric emitted by any team at my organization
  • (Through Datadog APM) the ability to understand the reasons behind production issues. Its ability to navigate across services seamlessly in order to understand the time spent at each critical stage of a production request is key. This, combined with Datadog's historical ability to show business metrics aside, helped me get more powerful insights much more quickly.
  • (Through integrations like Slack and PagerDuty) the ability to receive alerts right to the most common notification method we use (our mobile devices and Slack), which saves a lot of time and helps us maintain focus. 

What needs improvement?

I would like better navigability across pages. The UI/UX is powerful, yet less intuitive. A lot of times, I somehow navigate across buttons and pages, and I end up forgetting how to get back to a particular view that was more insightful. 

Particularly as Datadog starts offering more platform capabilities like APM, Watchdog, Shift left initiatives like instrumentation, continuous testing, intelligent test runner, and Synthetic and real user monitoring, the UI can become more and more clunky, giving users a very frustrating experience. 

For how long have I used the solution?

I've used the solution for five to six years.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sr Platform Engineer at a pharma/biotech company with 11-50 employees
Real User
Good logging with lots of great integrations and an interesting dashboard
Pros and Cons
  • "Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate."
  • "Some of the interface is still confusing to use."

What is our primary use case?

We use it mostly for logging log messages from our Kubernetes and EC2 instances, for example, system messages and errors. Also, we want log messages from our firewalls and other network infrastructure in case of network issues. We intend to use it for application logging, et cetera, to get insight into internal problems in the applications in Kubernetes pods. We want to use it for monitoring in case of system problems and hardware failures so that it can notify us.

How has it helped my organization?

It's good to have a single location for all the logs. If you have logs coming from a whole lot of sources, it makes it hard to find where the problem lies. 

We had to spend a lot of time logging into various systems and perusing a billion different log files looking for something that stood out as a possible cause of the issue. That can take a lot of time and doesn't give much visibility into the possible interactions between systems.

What is most valuable?

Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate. 

It has a lot of ability to make fancy and deep searches using regular expressions and to graph them into useful and interesting dashboard graphs. 

The plethora of built-in/downloadable integrations makes it much easier to set up for our platforms. Otherwise, we'd have to parse the log files ourselves, which would take a great deal of effort. I had to do that before when using an ELK stack for logging, which was painful.

What needs improvement?

Some of the interface is still confusing to use. It has many features, and it takes a lot of effort to figure out what they all mean. Maybe having tooltips or something would be helpful. Also, some of the integrations are better than others.

For how long have I used the solution?

I've used the solution for a month.

What do I think about the stability of the solution?

The solution seems very stable.

Which solution did I use previously and why did I switch?

I have used an ELK stack before. However, it took a lot of effort to maintain, and parsing the logs was difficult.

How was the initial setup?

We implemented the solution in-house.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Ian Schell - PeerSpot reviewer
Senior Site Reliability Architect at a tech vendor with 1,001-5,000 employees
Real User
Reduces debugging time, with good distributed tracing and useful RUM
Pros and Cons
  • "We have hundreds of microservices, and knowing how top-level requests weave throughout all of them is invaluable."
  • "There is occasional UI slowness and bugs."

What is our primary use case?

We use Datadog for general observability into our infrastructure, as well as running analytics queries for our SLI/SLO platform. This helps all of our teams be informed of how well their products are actually performing in production, and aim their efforts at the thing that will provide the highest ROI. 

We also use it for general monitoring and alerting during load tests and service releases to detect any issues related to the deployments. This helps us maintain our high contractual uptime promises to our clients.

How has it helped my organization?

It has drastically reduced the amount of time we spend on debugging issues and tracking down the root causes of incidents. What might have taken days or hours with separate vendors in the past (or even single vendors with terrible UI) is now quick and easy. 

We've often gone from detecting an incident to identifying the needed fix within ten minutes or less and covered multiple domains like APM, Logs, Database performance monitoring, etc., in just a few clicks. This is extremely powerful.

What is most valuable?

Distributed tracing is the most valuable feature. We have hundreds of microservices, and knowing how top-level requests weave throughout all of them is invaluable. 

At one glance, we can clearly see which service is slow and then switch over to the infrastructure view or container view to debug why the slowness is happening. This is true of all their other integrated products as well; the more you add, the more insights you get when looking at traces.

We also use RUM extensively. This helps us cover the last mile of application performance. Without it, we wouldn't know if our browser applications were functioning slowly for our users.

What needs improvement?

There is occasional UI slowness and bugs. While the Datadog UI is generally miles above its competitors, there are a few cases where it falls short or has started to slow down over time. They also occasionally make poor UI redesign choices. They should continue focusing on this area to maintain the high standard they started out with.

For how long have I used the solution?

I've used the solution for five years.

What do I think about the stability of the solution?

We've never had major stability issues.

What do I think about the scalability of the solution?

Scalability has never been an issue, although there is occasionally UI slowness.

How are customer service and support?

Support via tickets is absolutely terrible. It's the one obvious bad spot for Datadog. If we didn't have direct relationships with many of their product managers, our experience would be much worse.

How would you rate customer service and support?

Negative

Which solution did I use previously and why did I switch?

We previously used New Relic. It had a terrible UI and the integration between products was not great. Datadog is miles ahead of them and is continuing to increase that distance.

How was the initial setup?

The initial setup is straightforward, and the docs are done well.

What about the implementation team?

We managed the implementation in-house.

What was our ROI?

Our ROI is high.

What's my experience with pricing, setup cost, and licensing?

I'd advise users to negotiate rates. Datadog's off-the-shelf rates are pretty high.

Which other solutions did I evaluate?

We have only used and looked into New Relic.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
IT Test Manager at a transportation company with 10,001+ employees
Real User
Very good documentation provided along with regular new features
Pros and Cons
  • "Datadog is constantly adding new features."
  • "Lacks some flexibility in the customization."

What is our primary use case?

Our primary use case is log management and we also use the solution for monitoring the application and underlying infrastructure. I'm an IT test manager. 

What is most valuable?

I appreciate that they are constantly adding new features, some of which we haven't yet had a chance to implement. 

What needs improvement?

I'd like to see more flexibility in the customization. There are a few settings that need to be changed, but we are unable to make those changes as users or as the administrator. The tagging needed to interconnect the different parts of the monitoring is a bit tricky and takes time to work out.

For how long have I used the solution?

I've been using this solution for 18 months. 

What do I think about the stability of the solution?

The stability is good. 

What do I think about the scalability of the solution?

I would say that the amount that we are monitoring is not that large and we've never had any scalability issues. We have around 50 users in our department. 

How are customer service and support?

The availability or accessibility to customer service is not always good, although they generally provide solutions once you do manage to get hold of them. 

Which solution did I use previously and why did I switch?

We have previously used different tools for different parts of the monitoring. We changed to AWS when we moved to the cloud. We also found that the effort in maintaining Grafana and Prometheus and keeping it up to date was taking too much time.

How was the initial setup?

The initial setup was straightforward. We used a service provider, and they also maintain our operation in general.

What's my experience with pricing, setup cost, and licensing?

We have a four-year contract with Datadog, and the solution is pay-as-you-use. 

What other advice do I have?

I would suggest using the documentation, which is quite good. It's best to start with existing integrations, and then do the customization step-by-step.

I rate this solution eight out of 10. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1479957 - PeerSpot reviewer
Senior Director of DevOps at Housecall Pro
Real User
Good graphing and dashboards, and it improves visibility for developers
Pros and Cons
  • "Having a wealth of information has helped us investigate outages, and having historical data helps us tune our system."
  • "Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion."

What is our primary use case?

We primarily use Datadog for the monitoring of EC2 and ECS containers running mostly Rails applications that host a SaaS product. We also monitor Elasticsearch and RDS, and we are working on adding their Application Performance Monitoring solution to monitor our applications directly.

We use Datadog to create dashboards, graphs, and alerts based on interesting metrics. Datadog is the first place we look to check the performance of our system.

We also use their logging platform and it works well. Especially useful is that the logs and metrics are tightly integrated so you can jump between them easily.

How has it helped my organization?

Developers are able to see how code is running in production, whereas this was mostly opaque before we implemented Datadog. We are able to emit custom metrics that are specific to our business, and the built-in metrics have also proven useful. Having a wealth of information has helped us investigate outages, and having historical data helps us tune our system.

DevOps engineers are able to put sensors around our system to proactively detect problems, whereas before, our engineers heard about problems from customers. Logs are easier to find for developers.

What is most valuable?

Metric graphing and Dashboards are the most valuable features because they give us good observability into our system and work well to alert us when interesting things happen. We use this functionality daily.

We value the monitoring capability since it allows us to be pushed alerts, rather than have to observe graphs continually. The integrations with Slack and PagerDuty enable us to be interrupted appropriately and keep a running tab on the system without bothering us unnecessarily.

The online process monitoring has been extremely helpful, as it gives engineers the ability to see the live status of all the processes running our systems without them having to log in.

What needs improvement?

Their logging solution is expensive for our use case. They do have the capability to rehydrate old or incomplete logs, and it works, but I would rather not have to think about that operation.

Datadog has a lot of documentation, but a lot of that documentation assumes you know how the service works, which can lead to confusion. On a positive note, they do have lots of documentation; it just needs better curation.

Their APM solution still needs some work, but they are actively developing it. I would also like to see more database-specific application monitoring.

For how long have I used the solution?

I have been using Datadog for five years across two companies.

What do I think about the stability of the solution?

Any issues are addressed and communicated very quickly. I have not had any issues with uptime.

What do I think about the scalability of the solution?

If you do not need 100% of data such as logs, APM traces, etc., this scales well. It does not scale as well if you want 100% of your logs indexed. You should understand any other usage-based bills before using any part of their service as it is very easy to run up a large bill.

The performance of the system scales very well, and host monitoring and APM are relatively cheap.

How are customer service and technical support?

Account support is excellent.

Customer support is good if you get them to go beyond pointing out the right documentation.

Which solution did I use previously and why did I switch?

Previously, I used homebuilt solutions with Nagios and Cacti but found that there was far too much work to understand them and keep them up and fed compared to the value that I got. They also did not integrate well with existing data sources without a lot of effort.

I also previously used Stackdriver and found it too opinionated. I like that Datadog gives you tools to work with certain types of data and make your own graphs, monitors, etc., whereas with Stackdriver, I felt like there were a limited number of ways you could accomplish goals.

How was the initial setup?

The basic setup is easy. A more advanced setup can be tricky because the documentation assumes you know how the system works already. Support is somewhat helpful, but mostly points out the documentation you should already have found.

What about the implementation team?

We implemented in-house.

What's my experience with pricing, setup cost, and licensing?

My advice is to understand what number of hosts and data you want to commit to. Beware that usage-based billing is both a blessing and a curse. It is easy to run up a large bill, so become familiar with the cost of each piece of your bill and use the metrics they supply to estimate and monitor your bill.

I have had good luck with their support team helping us to figure out the correct commit levels. Their account support is excellent in this regard. I have heard their sales team can be aggressive, but I have not experienced it personally.

Which other solutions did I evaluate?

I originally chose Datadog because of my previous experience. We recently considered moving over to New Relic because we liked their APM solution better. However, the pricing of New Relic and our familiarity with Datadog won over. New Relic is a good product but it didn't fit our overall needs as well as Datadog.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Site Reliability Engineer at a computer software company with 201-500 employees
Real User
They have a good ecosystem for their integrations
Pros and Cons
  • "Their interface is probably one of the easiest things to use because it lets non-developers and non-engineers quickly get access to metrics and pull business value out of them. We could put together dashboards and give it to people who are non-technical, then they can see the state of the world."
  • "We have been able to set very specific CPU and memory alerts, at the very base level, then we started to pull real business value, like 99th percentile response rates for our API calls."
  • "It has turned into an operational dashboard. If you felt something is going wrong, you can immediately open up Datadog. It has been our go to application because we know the answer will be there."
  • "The way data is represented can be limiting. When I first tried it out a long time ago, you could graph a metric and another metric, and they'd overlay, but you couldn't take the ratio between the two."
  • "When I started using it years ago, it had stability problems. I remember, specifically, we ran everything in Docker containers. There were some problems getting it into a Docker container with very specific memory limits."

What is our primary use case?

We use it for custom metrics of our applications and monitoring of our systems.

How has it helped my organization?

My current company didn't have very good monitoring in the past. We had been using basic CPU monitoring. We have been able to set very specific CPU and memory alerts, at the very base level, then we started to pull real business value, like 99th percentile response rates for our API calls. 
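
To illustrate why a 99th percentile can say more than an average, here is a minimal sketch using the nearest-rank definition of a percentile. The sample response times are invented for the example, and a monitoring backend may use a different percentile estimator.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in milliseconds: 98 fast calls and 2 slow ones.
response_ms = [20] * 98 + [900, 1500]

average = sum(response_ms) / len(response_ms)
p99 = percentile_nearest_rank(response_ms, 99)
```

With this data the average stays close to the typical fast call, while the 99th percentile lands on one of the slow calls, which is the kind of signal an API latency alert is usually meant to catch.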

It has turned into an operational dashboard. If you feel something is going wrong, you can immediately open up Datadog. It has been our go-to application because we know the answer will be there.

What is most valuable?

Their interface is probably one of the easiest things to use because it lets non-developers and non-engineers quickly get access to metrics and pull business value out of them. We could put together dashboards and give it to people who are non-technical, then they can see the state of the world. 

They have a very good ecosystem for their integrations. They have a lot of different integrations, and we use a lot of them. We have integrations with Amazon for ECS, RDS, and all of the subsystems of Amazon. We also have Docker and Splunk integrations. The integrations are great because they're definitely vetted and not third-party integrations. They're part of the Datadog ecosystem and seamless.

What needs improvement?

The way data is represented can be limiting. They have added their own little query language that you can use to manipulate things, so you can graph and relate two different metrics together. This is relatively new this year. When I first tried it out a long time ago, you could graph a metric and another metric, and they'd overlay, but you couldn't take the ratio between the two. However, it looks like this is the direction that they're going, and that's a good direction. I think they should continue adding things that way.

I like being able to put the formulas in myself. I don't want the average. I want a rolling average over three minutes, not five minutes. They're getting better at letting the user customize this.
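
The difference between a three-minute and a five-minute rolling average can be sketched in a few lines of Python. This is a generic trailing-window average, not Datadog's query language, and the sample points are hypothetical.

```python
from collections import deque


def rolling_average(points, window_seconds):
    """Average of all values whose timestamp falls within the trailing window.

    points: iterable of (timestamp_seconds, value) in ascending time order.
    Returns a list of (timestamp_seconds, average) with one entry per input point.
    """
    window = deque()
    total = 0.0
    out = []
    for ts, value in points:
        window.append((ts, value))
        total += value
        # Evict points that have aged out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old = window.popleft()
            total -= old
        out.append((ts, total / len(window)))
    return out


# Hypothetical one-per-minute samples with a spike at minute 3.
samples = [(0, 10.0), (60, 10.0), (120, 10.0), (180, 40.0), (240, 10.0), (300, 10.0)]

three_min = rolling_average(samples, 180)
five_min = rolling_average(samples, 300)
```

The shorter window reacts more strongly to the spike and forgets it sooner, which is why being able to choose the window matters.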

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

When I started using it years ago, it had stability problems. I remember, specifically, we ran everything in Docker containers. There were some problems getting it into a Docker container with very specific memory limits. We couldn't nail down exactly what limits the application needed. Once we did that, we were good. However, it was tricky to get the limit right in the first place.

What do I think about the scalability of the solution?

It has always scaled for us. Cost scales up too, but that is not necessarily a bad thing. It's reasonable for what they're providing. I haven't had any concerns about scaling.

We use between 100 and 500 servers at any given point in time.

How is customer service and technical support?

For the most part, the technical support is pretty good. Every now and again, you will get stuck with a support rep who could have better training, but in general, they are very good and responsive. They're willing to talk about new features, etc.

How was the initial setup?

The integration and configuration processes have been very smooth because everything is very well-documented. The documentation is phenomenal. 

What was our ROI?

We can see trends a lot more easily than if we didn't have the solution. Management can see the changes being made, whether in performance or in the number of hosts that went down. We recently made improvements to some of our internal APIs, so we reduced the number of servers that we needed. You could see that the load on the system went down and the number of servers went down. Thus, it was easy to visualize.

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing are reasonable for what they give you. You get the first five hosts free, which is fun to play around with. Then it's about four dollars a month per host, which is very affordable for what you get out of it. We have a lot of hosts that we put a lot of custom metrics into, and every host gives you an allowance for the number of custom metrics. We have not had a problem with it.
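
As a rough worked example, the figures quoted above can be turned into a monthly estimate. This takes the reviewer's numbers at face value (five free hosts, about four dollars per host per month) and assumes the free hosts are deducted from a paid fleet, which the review does not state; current pricing may differ, and custom metrics and other products are left out.

```python
FREE_HOSTS = 5            # from the review
PRICE_PER_HOST_USD = 4    # "about four dollars a month per host", per the review


def monthly_host_cost(hosts, free_hosts=FREE_HOSTS, price=PRICE_PER_HOST_USD):
    """Estimated monthly host charge; custom metrics and other products are excluded."""
    if hosts < 0:
        raise ValueError("hosts must be non-negative")
    billable = max(hosts - free_hosts, 0)
    return billable * price


low = monthly_host_cost(100)   # lower end of the fleet size mentioned earlier
high = monthly_host_cost(500)  # upper end
```

Under those assumptions, the 100 to 500 servers mentioned earlier in this review would come to roughly 380 to 1,980 dollars per month for hosts alone.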

Which other solutions did I evaluate?

My company now is pretty good at looking at alternatives. Also, I evaluated alternative solutions at my last company. 

There are some other competitors. For example, I know one of them started doing metrics, and their licensing is very cheap because the metric size is very small and it's per megabyte. They charge you for storage, and it's very small. However, the interface and integrations aren't there.

The other thing is granularity. Datadog gives you one-second granularity for a year, whereas some of the competitors roll up: after about a week you don't have one second, you have five seconds. Then, after a month, you don't have five seconds, you have a minute. So you start to lose the granularity. Whether the rollup averages the data or takes the maximum, you start to lose the ability to see incidents historically, which is super valuable. If we have an incident that we think we've seen before and want to look back historically, we can zoom right in and see in the database where it peaked.
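
The effect of rolling up by average versus maximum can be shown with a small sketch. The readings are invented, and the rollup here is a plain fixed-size bucket, which may not match how any particular vendor aggregates.

```python
def roll_up(values, factor, aggregate):
    """Collapse each run of `factor` consecutive samples into one value."""
    return [aggregate(values[i:i + factor]) for i in range(0, len(values), factor)]


def mean(chunk):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(chunk) / len(chunk)


# Hypothetical one-second CPU readings with a single two-second spike.
per_second = [20] * 28 + [95, 95] + [20] * 30  # 60 samples

per_minute_avg = roll_up(per_second, 60, mean)
per_minute_max = roll_up(per_second, 60, max)
five_second_avg = roll_up(per_second, 5, mean)
```

Averaging to one-minute buckets nearly erases the spike, while taking the maximum keeps its height but loses when within the minute it happened and how long it lasted.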

What other advice do I have?

Give Datadog a try. It's the leader in this space. 

I have only used the AWS version of the product.

They have a thing for the color purple, but it is all good.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Real User
Top 20
Great dashboards, good monitoring, and easy SLAs
Pros and Cons
  • "Profiling has been made easier."
  • "Lately, chat support has a longer waiting time."

What is our primary use case?

Our primary use case would be using the dashboards and getting proper insights based on the dashboards.

The monitoring, SLO, and SLA have been better and easier since we started using the Terraform infrastructure. APM has been easier as we had to enable it through the CronJob directly.

Profiling has been made easier. We are able to get many insights into the code. Profiling provides really good insights right now. 

Logs are the most valuable and the best solution so far. Datadog can help solve any slow queries or database-related errors. 

The primary use case would be using the dashboards and getting proper insights based on the dashboards.

How has it helped my organization?

Monitoring has been better and easier since we started using the Terraform infrastructure.

APM has been easier as we had to enable it through the CronJob directly.

Profiling has made it easier in terms of getting many insights into the code.

The logs are the most valuable and the best solution. Datadog can help us to solve any slow queries or database-related errors.

What is most valuable?

Profiling provides really good insights, and APM has really good tracing visibility. 

The SLA and SLO definitions and the monitoring are also really important and very valuable parts of the product and make great Datadog features. 

Datadog support is also really valuable as they provide support for the product through the chat as well. 

The Datadog premium support has helped us to provide faster outcomes for a problem. 

Also, rather than having an email thread, it would be better to get the support on call and sort out the issue, which is the support we get from Datadog CSM.

What needs improvement?

Integration should have been easier. It is very tough to go to all the services and enable Datadog integration for each AWS service. 

Datadog could list the AWS services on one page and show only the ones that are enabled. Other integrations should follow a similar approach.

Lately, chat support has had a longer waiting time. We would love to get faster chat support. We also need additional support for sending the flare files.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user