Senior Site Reliability Engineer at a tech vendor with 10,001+ employees
Real User
Top 20
Good alerts and monitoring with a relatively simple setup
Pros and Cons
  • "The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast."
  • "Managing dashboards as IaC is a bit hard to work out at times."

What is our primary use case?

Datadog provides us with a solution for ingesting all of our application metrics, resource metrics, APM/tracing data, etc.

We use it for dashboards, monitoring/alerting, SLO targets, incident response, etc.

We have a lot of applications across multiple languages and frameworks, and have deployed them in Kubernetes across multiple regions in AWS, along with underlying managed resources such as SQS, Aurora, etc.

Datadog makes understanding the state of these seamless. We are a company with millions of daily active users, and this level of detail is excellent.

How has it helped my organization?

Datadog has allowed us to rapidly spin up alerting and monitoring that helps our incident responders get alerted quickly when our SLOs are in danger and helps to quickly resolve issues. 

It is the single most important tool we have from an SRE perspective. 

It also provides us with an easy way to get information at a glance for all of our services through APM and create unified dashboards that track our underlying resources, such as databases, queues, etc., alongside application data. 

It has been invaluable to our organization.

What is most valuable?

The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast.

Management of resources using infrastructure-as-code has been a recent game-changer for us. Combining the two has allowed us to provide product teams with a total solution for getting their applications attached to user-focused alerting and monitoring within a matter of days rather than months - and has clearly impacted our ability to discover and respond to significant production incidents.
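For readers unfamiliar with burn-rate monitors, the arithmetic behind them can be sketched as follows. This is a generic illustration of the standard SRE definition, not this team's monitor configuration; the function names, the 99.9% target, the 30-day period, and the error ratio are all assumptions chosen for the example.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    A value of 1.0 means the budget would be used up exactly at the end
    of the SLO period; higher values mean it runs out sooner.
    """
    error_budget = 1.0 - slo_target  # allowed fraction of bad events
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def budget_consumed(rate: float, window_hours: float, period_hours: float) -> float:
    """Fraction of the whole error budget spent during one alert window."""
    return rate * window_hours / period_hours


# Hypothetical numbers: a 99.9% SLO over 30 days (720 hours),
# with 1.44% of requests failing during the last hour.
rate = burn_rate(error_ratio=0.0144, slo_target=0.999)
spent = budget_consumed(rate, window_hours=1, period_hours=720)
print(f"burn rate: {rate:.1f}, budget spent in window: {spent:.1%}")
```

A monitor built on this idea pages when the burn rate over a short window crosses a chosen threshold, which ties the alert to user-facing impact rather than to a raw resource metric.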

What needs improvement?

Managing dashboards as IaC is a bit hard to work out at times. I use custom tools to convert JSON dashboards to Terraform resources. Ideally, I'd like some sort of builder tool for this in the app itself. For example, a templating system that can easily be exported to IaC would be transformative for us.
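As a rough idea of what such a conversion tool might look like (this is not the reviewer's tool), the sketch below wraps an exported dashboard JSON in Terraform's JSON syntax. It assumes the Terraform Datadog provider's `datadog_dashboard_json` resource, which takes the dashboard definition as a JSON string. The function name, the list of fields to strip, and the sample dashboard are hypothetical.

```python
import json

# Assumption: these keys are server-managed and should not be kept in
# source control. The exact list is illustrative, not authoritative.
READ_ONLY_FIELDS = ("id", "url", "author_handle", "author_name",
                    "created_at", "modified_at")


def dashboard_to_tf_json(dashboard: dict, resource_name: str) -> str:
    """Wrap an exported dashboard in a Terraform JSON (.tf.json) document."""
    body = {k: v for k, v in dashboard.items() if k not in READ_ONLY_FIELDS}
    tf = {
        "resource": {
            "datadog_dashboard_json": {
                resource_name: {
                    # The resource expects the dashboard as a JSON string;
                    # sort_keys keeps diffs stable between exports.
                    "dashboard": json.dumps(body, sort_keys=True)
                }
            }
        }
    }
    return json.dumps(tf, indent=2)


# Hypothetical exported dashboard, trimmed to a few fields.
exported = {
    "id": "abc-123",
    "title": "Checkout service",
    "layout_type": "ordered",
    "widgets": [],
}
tf_document = dashboard_to_tf_json(exported, "checkout_service")
print(tf_document)
```

Writing the result to a file ending in `.tf.json` lets Terraform load it alongside regular configuration files.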

There are also some aspects of the API that can be a bit verbose - especially in the area of new features like SLOs - and take some time to understand. That said, overall, they're well-documented enough to be a minor concern for us.

Buyer's Guide
Datadog
November 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.

For how long have I used the solution?

I've been using the solution for over five years.

What do I think about the stability of the solution?

I have never seen a major outage that prevented us from using Datadog, although I can't speak for other teams or time zones.

What do I think about the scalability of the solution?

This product is massively scalable. I haven't seen any issues as we continue to onboard new technologies and teams.

How are customer service and support?

Datadog provides us with a number of direct lines to support, although I haven't personally required their assistance.

Which solution did I use previously and why did I switch?

We previously used LightStep for APM and switched to Datadog to unify all of our application data.

How was the initial setup?

Most elements are quite simple to set up. However, some types of data collection require organization-wide engineering buy-in.

What about the implementation team?

We handled the initial setup in-house.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Engineering Manager, Mobile Wireless Engineering at a comms service provider with 10,001+ employees
Real User
Top 20
Efficient and helps with integration and creating queries
Pros and Cons
  • "Datadog is providing efficiency in the products we develop for the wireless device engineering department."
  • "We need more integration functionality, including certain metrics integration."

What is our primary use case?

The product is primarily used for the DevOps team. 

How has it helped my organization?

It has helped us build pipelines for ops review and other functions.

What is most valuable?

Datadog is providing efficiency in the products we develop for the wireless device engineering department. We needed to provide more developer integration tools and to make it easy to create queries, so that we could build efficient toolsets that let management make decisions based on these metrics.

What needs improvement?

We need more integration functionality, including certain metrics integration. We should be able to monitor devs, and we need it to help us build more monitoring tools and offer leadership metrics.

For how long have I used the solution?

I've used the solution for almost six months.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Specialist at a financial services firm with 501-1,000 employees
Real User
Centralized with good observability and many modules
Pros and Cons
  • "The most valuable aspect is for us to have everything in one place."
  • "We need a lot of modules since we collect all data logs from all operating systems."

What is our primary use case?

We collect all data logs from all operating systems, such as Windows, Linux, VMware, and bare metal data centers. We also automate the installation of the agent on servers.

Now we are starting a POC to analyze the APM module. In the future, the next step is to do a POC of the security modules.

The final idea is to have a single portal for observability. This will make it easy to troubleshoot at levels 1 and 2.

How has it helped my organization?

We are looking into a lot of modules. We collect all data logs from all operating systems, including Windows, Linux, VMware, and bare metal data centers. We also automate the installation of the agent on servers.

We're developing POCs for the APM and security modules. We'll also have a single portal for observability. This will make it easy to troubleshoot.

The most valuable aspect is for us to have everything in one place.

What is most valuable?

We're investigating many modules. We collect all data logs from all operating systems (Windows, Linux, VMware, and bare metal data centers). We also automate the installation of the agent on servers.

We're doing POCs in APM and security.

Soon, we'll have a single portal for observability. This will make troubleshooting easy at levels 1 and 2.

The most valuable aspect for us is to have everything in the same place.

What needs improvement?

We need a lot of modules since we collect all data logs from all operating systems. 

The most important module for us is log management. The second is the security module. The third one is the APM.

For how long have I used the solution?

We've used the solution for one year.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Test Engineer at a tech services company with 1,001-5,000 employees
Real User
Great for monitoring, helps with internal communication and offers improved visibility
Pros and Cons
  • "We've been able to glean from the monitors what servers are down, and can alert the team in Slack."
  • "The more tools that they can build that allow you to run AWX playbooks, or other similar fixes, would benefit clients greatly."

What is our primary use case?

We're moving towards the cloud yet still have several active data center contracts. As we move to the cloud, we are interested in knowing more about our services, and Datadog APM/logs should give us this perspective.

We currently use the infrastructure monitoring part of Datadog. Still, I've really seen the advantage of moving more data into the cloud for comparison and being able to have one place where we can view all related pieces of information regarding a possible incident or potential issue.

How has it helped my organization?

We've been able to glean from the monitors what servers are down, and can alert the team in Slack. Knowing what we need to do next is what we would like to move to, so seeing the power of Notebooks is key. We also have several other services that we are underutilizing (logging, error tracking, etc.) that would be better housed in Datadog since it gives us more visibility into linking all of the things together into one cohesive picture.

What is most valuable?

Monitoring has been invaluable, and as we start to look to other products, bringing in logs and APM traces will create a full picture of what we need to do to resolve incidents. We're also interested in our users' perspectives and when things are slowing down for them.

Additionally, we are interested in the workflows announced today as an actionable decision tree that will allow our teams to see what is wrong and know what to do based on the connected runbook/notebook. I feel that this is the final piece that will make Datadog so much more valuable than its competition.

What needs improvement?

I have talked to vendors who mention that Datadog draws attention to an issue. However, you'll still need to take action on your own. The more tools that they can build that allow you to run AWX playbooks, or other similar fixes, would benefit clients greatly.

For how long have I used the solution?

We've used the solution for two years.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
PeerSpot user
Senior Manager, Cyber Digital Transformation at a security firm with 1,001-5,000 employees
Real User
Insightful and easy to use solution that makes application performance monitoring easy
Pros and Cons
  • "The infrastructure monitoring capabilities are really valuable. You can just log on and see everything that is happening within an IT environment."
  • "Their security features could be improved. We looked at their Security Monitoring feature but it was early in its development. Datadog are just getting into the security space so I'm sure this will improve in the future."

What is our primary use case?

We have used this solution primarily for application performance monitoring. To do this, we needed to make sure we had the right data in the system so that people could be able to monitor their applications end-to-end.

What is most valuable?

The infrastructure monitoring capabilities are really valuable. You can just log on and see everything that is happening within an IT environment.

What needs improvement?

Their security features could be improved. We looked at their Security Monitoring feature but it was early in its development. Datadog are just getting into the security space so I'm sure this will improve in the future.

What do I think about the stability of the solution?

This is a stable solution. 

What do I think about the scalability of the solution?

This is a scalable solution. 

What other advice do I have?

To get started with this solution, I would recommend front-loading it with some sort of data or process and then filtering to view only the information you need.

I would rate this solution a nine out of ten. 

Disclosure: My company has a business relationship with this vendor other than being a customer: Reseller
PeerSpot user
DevOps Engineer at a printing company with 51-200 employees
Real User
Great visibility, good logs, and a helpful dashboard
Pros and Cons
  • "For us to have visibility into our app stack and the hardware we run has been highly beneficial."
  • "I want to applaud the efforts in making the UI extremely usable and approachable. My suggestion would be to take another look at how the menu structure is put together, however. Even after using the platform mostly every day for months, I still find myself trying to find a service or feature in the menus."

What is our primary use case?

Log aggregation was a key component for us since we have a fairly old-school app running on VMs on bare metal. We previously didn't have much insight into our logs unless we manually tunneled into each server.

The solution is reducing manual labor in troubleshooting problems in our environments server by server.

We also needed to monitor our Java app and MySQL database to understand their problems so that we could take action and resolve them.

Our use cases have since expanded to encompass all aspects of monitoring.

How has it helped my organization?

Before Datadog, all we had to go on was the gut reaction of the old guard on our team. While useful, the reactions and inherent knowledge only benefited a few folks.

Datadog has allowed us to create comprehensive dashboards and proactively send out alerts. We used the knowledge of people very versed with our products to help set up the platform and have since benefited from that.

The operative word here is visibility, and we've seen a huge improvement in that.

What is most valuable?

Seeing log trends and patterns and aggregate search was a huge first step for us. We then began using other features of the Datadog platform by enabling APM. After that, we did other integrations.

For us to have visibility into our app stack and the hardware we run has been highly beneficial.

We leverage APM, log management, and at least ten other integrations. Our DB, web servers, network, storage, and other areas are now monitored and hooked up to dashboards.

Dashboarding has also proven useful when information is going to be viewed by anyone in the organization.

What needs improvement?

Our experience has been overwhelmingly positive so far. That said, there is one area that could benefit from some polish. For example, I want to applaud the efforts in making the UI extremely usable and approachable. My suggestion would be to take another look at how the menu structure is put together, however. Even after using the platform mostly every day for months, I still find myself trying to find a service or feature in the menus.

For how long have I used the solution?

I've used the solution for around six or eight months. We've had the Datadog agents deployed on our various environments.

What do I think about the stability of the solution?

So far, we have not had any issues with stability. It should be very stable and easy to update.

What do I think about the scalability of the solution?

The solution is currently deployed on a limited scale. That said, we see the potential and benefits of deploying this in a cloud scenario.

How are customer service and support?

Customer service and the support teams have been very responsive when we need them. They are very professional.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

This was our first solution in this space.

How was the initial setup?

The initial setup steps with the agent are only confusing when using the config files for the first time. The main file includes a lot that you can specify elsewhere and it's not readily apparent which one to use until you dig in more.

What about the implementation team?

We did an in-house implementation.

What was our ROI?

Our ROI with Datadog has been very high. It's given us the ability to see how we're performing, which we didn't have before.

What's my experience with pricing, setup cost, and licensing?

Ensure you have your ingestion pipelines dialed in, or you'll likely spend more than you were expecting.

Which other solutions did I evaluate?

We evaluated free and open-source options. Ultimately, however, we decided that we didn't have the manpower as a small company to maintain them.

What other advice do I have?

There is nothing that the documentation cannot help with; it's very good.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Engineer at a retailer with 51-200 employees
Real User
Good logs, analytics and dashboards
Pros and Cons
  • "We can handle debugging and find out why things are breaking in our applications."
  • "The documentation leaves a lot to be desired for new users."

What is our primary use case?

I am using the solution for monitoring metrics, logs, traces, etc. It's mainly for making dashboards as well as monitoring our services. 

We also use Datadog to help centralize our incident management to show the logs, where issues spiked, and some metrics. 

We use Datadog to do troubleshooting in Kubernetes, specifically in our Azure Kubernetes Service. Beyond that, we are looking to use OpenTelemetry in tandem with Datadog to further our log-tracing efforts. In the future, this may be expanded.

How has it helped my organization?

This solution improves our organization as now we have higher visibility into our application that we otherwise would not have. 

Since Datadog data collection comes in three forms (agentless, scraping, and through the API), it is very flexible. It is this flexibility in how we report our logs that keeps them centralized and organized.

One major drawback of Datadog is the cost. Sometimes we put flows in place to monitor resources that end up logging more than we expected, and the bill is too high.

What is most valuable?

Dashboards have been among the most valuable parts of Datadog. Dashboards use metrics that are very helpful for monitoring services. I recently used metrics to monitor the number of pods in Kubernetes, the spikes in requests in Kubernetes, and overall CPU and memory usage in our Kubernetes clusters.

We can also use log analytics to further our understanding. We can handle debugging and find out why things are breaking in our applications. 

The log portion of Datadog has robust features to debug the applications we are running. I really appreciate the ability to use facets to pare down the logs.
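To illustrate what paring down logs with facets can look like in practice, here is a minimal sketch that builds a log search query string. It assumes Datadog's log search syntax, in which reserved attributes and tags are written as `key:value` and other faceted attributes take an `@` prefix. The function name and the example service, status, and attribute values are hypothetical.

```python
def build_log_query(reserved: dict[str, str], facets: dict[str, str]) -> str:
    """Build a log search query from reserved attributes and facets.

    reserved: attributes such as service or status, written as key:value.
    facets:   other log attributes, written with an @ prefix.
    """
    def quote(value: str) -> str:
        # Values containing spaces are wrapped in double quotes.
        return f'"{value}"' if " " in value else value

    terms = [f"{key}:{quote(value)}" for key, value in reserved.items()]
    terms += [f"@{key}:{quote(value)}" for key, value in facets.items()]
    return " ".join(terms)


# Hypothetical search: errors from one service with a specific HTTP status.
query = build_log_query(
    reserved={"service": "checkout", "status": "error"},
    facets={"http.status_code": "500"},
)
print(query)
```

Each added term narrows the result set, which is the same effect as clicking facet values in the log explorer's side panel.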

What needs improvement?

The documentation leaves a lot to be desired for new users. It has far too much text and no concise information to help people get started. Sometimes it doesn't help to read an entire essay just to get a grasp of how the logs or metrics work.

For how long have I used the solution?

I've used the solution for two years.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Production engineer at a consultancy with 51-200 employees
Real User
Offers great flexibility with useful APM traces and logging for debugging
Pros and Cons
  • "The flexibility to create notebooks and dashboards and fully customize them gives us a lot of power to track the exact services and endpoints we are working on."
  • "We need more visibility into the error tracking dashboard."

What is our primary use case?

We have deep integration with Datadog for observability and monitoring.

We use everything from APM, logs, and RUM to monitors and dashboards for tracking system health.

We are trying to move from many different solutions for error tracking/observability to a single platform (Datadog).

We are currently in the process of setting up logging in Datadog in order to maintain our logs better. We are looking to create more insights into the real user flows by using real user monitoring (RUM) too.

How has it helped my organization?

We use Datadog quite extensively. I primarily work with APM traces and logs to debug issues and unblock myself in my day-to-day role. I have found the traces and spans most useful in providing details about why certain services are performing poorly.

Datadog provides a lot of value in terms of adding monitoring and observability to our app. There are so many different solutions; it is sometimes difficult to gauge where to start, and I sometimes miss a lot of functionality (such as the very useful error-tracking dashboard mentioned elsewhere in my review).

What is most valuable?

As I mentioned above, we use Datadog quite extensively. In my day-to-day role, I primarily work with APM traces and logs to debug issues and unblock myself. 

I have found the traces and spans most useful in providing details about why certain services are performing poorly.

Additionally, the flexibility to create notebooks and dashboards and fully customize them gives us a lot of power to track the exact services and endpoints we are working on.

Furthermore, we are also using monitoring to page us if things break, and the Slack integration provides us instantaneous feedback on how things are performing.

What needs improvement?

We need more visibility into the error tracking dashboard. I only learned about it during a demo at Dash Con. That said, it seems to be a very useful tool.

Additionally, we want to export our dashboards and monitors to source control, and there doesn't seem to be any easy way to do so.

For how long have I used the solution?

I've used the solution for four years.

Which solution did I use previously and why did I switch?

For logging, we are moving from LogDNA to Datadog to have access to everything in one place. Also, searching and traversing through logs seems easier in Datadog.

Which other solutions did I evaluate?

I did not evaluate others; however, my team probably did.

What other advice do I have?

Datadog provides a lot of value in terms of adding monitoring and observability to our app. There are so many different solutions; it is sometimes difficult to gauge where to start, and I sometimes miss a lot of functionality. For example, the very useful error-tracking dashboard that I just discovered.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: November 2024