reviewer2044977 - PeerSpot reviewer
Senior Site Reliability Engineer at a tech vendor with 10,001+ employees
Real User
Good alerts and monitoring with a relatively simple setup
Pros and Cons
  • "The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast."
  • "Managing dashboards as IaC is a bit hard to work out at times."

What is our primary use case?

Datadog provides us with a solution for ingesting all of our application metrics, resource metrics, APM/tracing data, etc.

We use it for dashboards, monitoring/alerting, SLO targets, incident response, etc.

We have a lot of applications across multiple languages and frameworks, deployed in Kubernetes across multiple AWS regions, along with underlying managed resources such as SQS, Aurora, etc.

Datadog makes understanding the state of these seamless. We are a company with millions of daily active users, and this level of detail is excellent.

How has it helped my organization?

Datadog has allowed us to rapidly spin up alerting and monitoring that helps our incident responders get alerted quickly when our SLOs are in danger and helps to quickly resolve issues. 

It is the single most important tool we have from an SRE perspective. 

It also provides us with an easy way to get information at a glance for all of our services through APM and create unified dashboards that track our underlying resources, such as databases, queues, etc., alongside application data. 

It has been invaluable to our organization.

What is most valuable?

The management of SLOs and their related burn-rate monitors has allowed us to onboard teams to on-call fast.

Management of resources using infrastructure-as-code has been a recent game-changer for us. Combining the two has allowed us to provide product teams with a total solution for getting their applications attached to user-focused alerting and monitoring within a matter of days rather than months - and has clearly impacted our ability to discover and respond to significant production incidents.
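As a rough illustration of the idea behind burn-rate monitors (a generic sketch, not Datadog's implementation or API), the burn rate is the observed error rate divided by the error budget the SLO allows. The function names, the two-window check, and the 14.4 threshold below are assumptions chosen for the example; 14.4 is a commonly cited value for a one-hour window against a 30-day SLO.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed in one window.

    1.0 means the budget lasts exactly the SLO period; higher values
    exhaust it proportionally sooner.
    """
    if total_events == 0:
        return 0.0
    error_budget = 1.0 - slo_target              # allowed failure fraction
    observed_error_rate = bad_events / total_events
    return observed_error_rate / error_budget


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float = 14.4) -> bool:
    """Hypothetical multi-window check: page only if both windows burn fast."""
    return long_window_rate >= threshold and short_window_rate >= threshold


if __name__ == "__main__":
    # Illustrative numbers only: a 99.9% SLO with 200 failures in 10,000 requests.
    rate = burn_rate(bad_events=200, total_events=10_000, slo_target=0.999)
    print(f"burn rate: {rate:.1f}")
    print("page on-call:", should_page(rate, rate))
```

Requiring both a long and a short window to exceed the threshold is one way to avoid paging for a spike that has already recovered.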

What needs improvement?

Managing dashboards as IaC is a bit hard to work out at times. I use custom tools to convert JSON dashboards to Terraform resources. Ideally, I'd like some sort of tool for this to be built into the app. For example, a templating system that can easily be exported to IaC would be transformative for us.
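The JSON-to-Terraform conversion described above can be sketched roughly as follows. This is a minimal illustration, not the custom tooling mentioned in this review. It assumes the Datadog Terraform provider's `datadog_dashboard_json` resource, which takes the dashboard definition as a JSON string in its `dashboard` argument, and it emits Terraform's JSON syntax (a `.tf.json` file). The function name, the resource name, and the list of read-only keys are hypothetical.

```python
import json

# Assumption: fields Datadog fills in on export that should not be managed as code.
READ_ONLY_KEYS = {"id", "url", "created_at", "modified_at",
                  "author_handle", "author_name"}


def dashboard_to_tf_json(dashboard: dict, resource_name: str) -> str:
    """Wrap an exported dashboard dict in Terraform JSON syntax (.tf.json)."""
    cleaned = {k: v for k, v in dashboard.items() if k not in READ_ONLY_KEYS}
    body = json.dumps(cleaned, sort_keys=True)
    # Terraform parses "${" and "%{" inside strings as template sequences,
    # so escape them to keep the dashboard definition literal.
    body = body.replace("${", "$${").replace("%{", "%%{")
    tf = {
        "resource": {
            "datadog_dashboard_json": {
                resource_name: {"dashboard": body}
            }
        }
    }
    return json.dumps(tf, indent=2)


if __name__ == "__main__":
    # Illustrative stand-in for a dashboard exported from the UI.
    exported = {
        "id": "abc-123",
        "title": "Service overview",
        "layout_type": "ordered",
        "widgets": [],
    }
    print(dashboard_to_tf_json(exported, "service_overview"))
```

The output could be saved next to the provider configuration, for example as `dashboards.tf.json`, and reviewed with `terraform plan` before applying.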

There are also some aspects of the API that can be a bit verbose - especially in the area of new features like SLOs - and take some time to understand. That said, overall, they're well-documented enough to be a minor concern for us.

Buyer's Guide
Datadog
March 2025
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: March 2025.
839,422 professionals have used our research since 2012.

For how long have I used the solution?

I've been using the solution for over five years.

What do I think about the stability of the solution?

I have never seen a major outage that prevented us from using Datadog, although I can't speak for other teams or time zones.

What do I think about the scalability of the solution?

This product is massively scalable. I haven't seen any issues as we continue to onboard new technologies and teams.

How are customer service and support?

Datadog provides us with a number of direct lines to support, although I haven't personally required their assistance.

Which solution did I use previously and why did I switch?

We previously used LightStep for APM and switched to Datadog to unify all of our application data.

How was the initial setup?

Most elements are quite simple to set up. However, some types of data collection require organization-wide engineering buy-in.

What about the implementation team?

We handled the initial setup in-house.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003829 - PeerSpot reviewer
Sr Platform Engineer at a pharma/biotech company with 11-50 employees
Real User
Good logging with lots of great integrations and an interesting dashboard
Pros and Cons
  • "Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate."
  • "Some of the interface is still confusing to use."

What is our primary use case?

We use it mostly to collect log messages from our Kubernetes and EC2 instances, for example, system messages and errors. We also want log messages from our firewalls and other network infrastructure in case of network issues. We intend to use it for application logging, et cetera, to get insight into internal problems in the applications in Kubernetes pods. We want to use it for monitoring so that it can notify us of system problems and hardware failures.

How has it helped my organization?

It's good to have a single location for all the logs. If you have logs coming from a whole lot of sources, it makes it hard to find where the problem lies. 

We had to spend a lot of time logging into various systems and perusing a billion different log files looking for something that stands out as a possible cause of the issue. That can take a lot of time and doesn't give much visibility into the possible interactions between systems.

What is most valuable?

Datadog has a lot of features to be able to drill down deep into the swath of logs that our platforms generate. 

It has a lot of ability to make fancy and deep searches using regular expressions and to graph them into useful and interesting dashboard graphs. 

The plethora of built-in/downloadable integrations makes it much easier to set up for our platforms. Otherwise, we'd have to parse the log files ourselves, which would take a great deal of effort. I had to do that before when I used an ELK stack for logging, which was painful.

What needs improvement?

Some of the interface is still confusing to use. It has many features, and it takes a lot of effort to figure out what they all mean. Maybe having tooltips or something would be helpful. Also, some of the integrations are better than others.

For how long have I used the solution?

I've used the solution for a month.

What do I think about the stability of the solution?

The solution seems very stable.

Which solution did I use previously and why did I switch?

I have used an ELK stack before. However, it took a lot of effort to maintain, and parsing the logs was difficult.

How was the initial setup?

We implemented the solution in-house.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2004069 - PeerSpot reviewer
support Eng
Real User
Helpful dashboards with a good cloud security posture manager and cloud workload security
Pros and Cons
  • "It helps us better manage our logs."
  • "They should continue expanding and integrating with more third-party apps."

What is our primary use case?

We use the application for our application monitoring, data security monitoring, and log management. What we like about the application is that it helps us to track issues more proactively instead of reactively.

There are other improvements we would like to see.

1. Being able to restrict users from seeing or viewing specific dashboards once they log in

2. They could cut the prices for Cloud SIEM. It seems very useful; however, the prices are high, and some organizations are finding it difficult to decide whether to buy the tool.

How has it helped my organization?

We use the application for our application monitoring, data security monitoring, and log management. It helps us to track issues proactively instead of reactively.

It helps us better manage our logs.

We can effectively track down issues.

We have dashboards that give us an overview of our environment.

What is most valuable?

The tools I have found useful include the Datadog cloud security posture manager and cloud workload security.

What needs improvement?

Datadog is a great tool, and we value the services they offer. They should continue expanding and integrating with more third-party apps.

For how long have I used the solution?

I've used the solution for three years. 

What do I think about the stability of the solution?

I love its stability.

What do I think about the scalability of the solution?

It is very scalable.

How are customer service and support?

Technical support has been great.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously used AWS.

How was the initial setup?

The initial setup is not too complex.

What was our ROI?

We've seen an ROI of 50%.

What's my experience with pricing, setup cost, and licensing?

It's a little pricey, yet worth it.

Which other solutions did I evaluate?

We did not previously evaluate another solution.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2003214 - PeerSpot reviewer
Sr. Director of Software Engineering at a tech consulting company with 1,001-5,000 employees
Real User
Helpful support, good incident management, and helps triage faster
Pros and Cons
  • "The RUM solution has improved our ability to triage faster and hand more capabilities to our customer support."
  • "The pricing is a bit confusing."

What is our primary use case?

The RUM is implemented for customer support session replays to quickly route, triage, and troubleshoot support issues which can be sent to our engineering teams directly. 

Customer Support will log in directly after receiving a customer request and work on the issue. Engineers will utilize the replay along with RUM to pinpoint the issue combined with APM and Infra trace to be able to look for signals to find the direct cause of the customer impact. 

Incident management will be utilized to open a Jira ticket for engineering, and it integrates with ITSM systems and on-call as needed.

How has it helped my organization?

The RUM solution has improved our ability to triage faster and hand more capabilities to our customer support.

The RUM is implemented for customer support. It can quickly route, triage, and troubleshoot support issues that are sent to our engineering teams. 

Customer support can log in and start troubleshooting after receiving a customer request. The replay and RUM help pinpoint the issue. This functionality is combined with APM and Infra trace to be able to look for the cause of the issue. Incident management is leveraged to open a Jira ticket for engineering, and it can integrate with ITSM systems and on-call as needed.

What is most valuable?

RUM with session replay combined with a future use case to support synthetics will help to identify issues earlier in our process. We have not rolled this out yet but plan for it as a future use case for our customer support process. This, combined with integrated automation for incident management, will drive down our MTTR and time spent working through tickets. Overall, we are hoping to use this to look at our data and perfection rate over time in a BI-like way to reduce our customer support headcount by saving on time spent.

What needs improvement?

I would like to see retention options greater than 30 days for session replay. I'd also like to see forwarding options for retention to custom solutions, and a greater ability to event and export data from the tooling overall to BI/DW solutions for long-term reporting and trend analysis.

For how long have I used the solution?

I've used the solution for about nine months.

What do I think about the stability of the solution?

So far, stability has been great.

What do I think about the scalability of the solution?

I'd like to see more bells and whistles added over time. Widgets are coming soon to help with RUM.

How are customer service and support?

Support is very good. They are responsive and gave us the help we need.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We have used New Relic, though not for RUM. We went with Datadog to potentially switch the entire platform into an all-in-one solution that makes sense for a company of our size.

How was the initial setup?

We started on the beta, and the documentation was lagging behind. We also needed direct instructions and links from the customer support/account representative that were not immediately available by searching online.

What about the implementation team?

We implemented the solution ourselves.

What was our ROI?

Ideally, this will inform our strategy to not increase our customer support headcount as significantly into 2023 and beyond.

What's my experience with pricing, setup cost, and licensing?

The pricing is a bit confusing. However, the RUM session replay, in general, is very inexpensive compared to whole solutions.

Which other solutions did I evaluate?

We looked into LogRocket and New Relic.

What other advice do I have?

I'd advise other users to try it out.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2004210 - PeerSpot reviewer
Cloud Specialist at a financial services firm with 501-1,000 employees
Real User
Centralized with good observability and many modules
Pros and Cons
  • "The most valuable aspect is for us to have everything in one place."
  • "We need a lot of modules since we collect all data logs from all operating systems."

What is our primary use case?

We collect all data logs from all operating systems, such as Windows, Linux, VMware, and bare metal data centers. We also automate the installation of the agent on servers.

Now we are starting a POC to analyze the APM module. In the future, the next step is to do a POC of the security modules.

The end goal is to have a single portal for observability. This will make troubleshooting easy, including at levels 1 and 2.

How has it helped my organization?

We are looking into a lot of modules. We collect all data logs from all operating systems, including Windows, Linux, VMware, and bare metal data centers. We also automate the installation of the agent on servers.

We're developing POCs for APM and security modules. We'll also have a single portal for observability. This will make it easy to troubleshoot.

The most valuable aspect is for us to have everything in one place.

What is most valuable?

We're investigating many modules. We collect all data logs from all operating systems (Windows, Linux, VMware, and bare metal data centers). We also automate the installation of the agent on servers.

We're doing POCs in APM and security.

Soon, we'll have a single portal for observability. This will make troubleshooting easy at levels 1 and 2.

The most valuable aspect for us is to have everything in the same place.

What needs improvement?

We need a lot of modules since we collect all data logs from all operating systems. 

The most important module for us is log management. The second is the security module. The third one is the APM.

For how long have I used the solution?

We've used the solution for one year.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Plinio Moreira - PeerSpot reviewer
Sales Engineer at Delfia
Real User
Great Logging, APM, and RUM capabilities
Pros and Cons
  • "The CCM, Workflows, Logs, APM, and RUM are all useful aspects of the solution."
  • "We have contact with many customers that cover many areas, so we have cases where the infrastructure administration could be improved."

What is our primary use case?

I'm a Datadog partner in Brazil, and I monitor all my applications with Datadog too. I would like to enable all features in my DPN portal and get access to custom demos. We resell Datadog and a full stack of pre-sales, sales, and post-sales services. We have customers for all sectors, including governmental, financial services, services in general, telecom, et cetera. Today, we are the biggest Datadog partner in Brazil, and we are searching for an expansion in our MSP environment.

How has it helped my organization?

I resell all solutions in Datadog, so all features are important for our customers.
We are the biggest Datadog partner in Brazil, and we would like to expand our MSP environment.

What is most valuable?

The CCM, Workflows, Logs, APM, and RUM are all useful aspects of the solution. I resell all solutions in Datadog, so all features are important.

I'm a Datadog partner in Brazil, and I monitor all my applications with Datadog too. 

The solution works within all sectors, including, governmental, financial services, services in general, and telecom. 

What needs improvement?

We have contact with many customers that cover many areas, so we have cases where the infrastructure administration could be improved. In general, customers on cloud and microservices platforms such as Kubernetes see faster improvement in their environments. Our customers who use the Logs feature saw many benefits and also reduced their costs.

For how long have I used the solution?

I've used the solution for three years. 

Which solution did I use previously and why did I switch?

We used to use AppDynamics. We switched because its cloud monitoring and Kubernetes (K8s) monitoring are not as good as Datadog's.

What about the implementation team?

I'm a reseller.

What's my experience with pricing, setup cost, and licensing?

The licensing model is better; however, an option to block consumption in Infra and APM would help keep costs under better control.

Which other solutions did I evaluate?

We did not evaluate other solutions.

What other advice do I have?

We use the solution as a SaaS.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2000451 - PeerSpot reviewer
SRE at a financial services firm with 10,001+ employees
Real User
Great visibility, easy to implement, and offers the ability to set thresholds
Pros and Cons
  • "It has provided visibility with ease of implementation and allowed multiple teams to quickly onboard it."
  • "Federated views for Datadog dashboards are critical as large companies utilize multiple instances of the product and cannot link the metrics or correlate the metrics together. This stunts the usage of Datadog."

What is our primary use case?

We primarily use the solution for observability, metrics, logs, tracing, and end-to-end user flow monitoring. 

We are looking to implement this as a company-wide standard for cloud solutions.

We're currently in a POC, and we're interested in using either a Datadog agent or the OTel agent with a Datadog exporter. We have dashboards with panels that correlate metrics and allow you to link through to traces. Flame graphs show latency across services and the various spans.

While we are not security minded, we still require it and are interested in more. It's used for monitoring critical systems.

How has it helped my organization?

It has provided visibility with ease of implementation and allowed multiple teams to quickly onboard it. This provided a standard way to approach observability and visibility. 

Monitoring rules and alerting thresholds can also be set and exported to other teams for use. 

There is an issue with federated dashboards, as multiple teams running on different Datadog instances cannot use features like the service catalog or easily switch between services in a long business flow.

What is most valuable?

The Kubernetes (K8s) monitoring is extremely useful in Datadog. The preset dashboards that it provides help to speed up the work.

The metrics summary is useful. Tracing with a span breakdown is helpful for us. We like the dashboarding with Powerpacks and the correlation of traces and logs.

The Flame graph for tracing helps determine where the latency is the highest. 
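To illustrate what reading a flame graph for the highest latency amounts to, here is a small sketch that finds the span with the largest self time, meaning its own duration minus the time spent in its direct children. It is not how Datadog computes or renders flame graphs. The span fields (`id`, `parent_id`, `name`, `duration_ms`) and the function name are hypothetical, and it assumes child spans run one after another without overlapping.

```python
from collections import defaultdict


def slowest_span_by_self_time(spans: list[dict]) -> tuple[str, float]:
    """Return (name, self_time_ms) of the span that spends the most time itself."""
    # Sum the durations of each span's direct children.
    child_time = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_time[span["parent_id"]] += span["duration_ms"]

    def self_time(span: dict) -> float:
        return span["duration_ms"] - child_time[span["id"]]

    worst = max(spans, key=self_time)
    return worst["name"], self_time(worst)


if __name__ == "__main__":
    # Made-up trace: a web request that calls a database and a cache.
    trace = [
        {"id": 1, "parent_id": None, "name": "web.request", "duration_ms": 100},
        {"id": 2, "parent_id": 1, "name": "db.query", "duration_ms": 70},
        {"id": 3, "parent_id": 1, "name": "cache.get", "duration_ms": 10},
        {"id": 4, "parent_id": 2, "name": "db.connect", "duration_ms": 20},
    ]
    print(slowest_span_by_self_time(trace))
```

Self time is used rather than total duration because the root span always has the longest total duration, which says little about where the time actually goes.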

Dashboards are created as a standard set and then exported into other Datadog instances for other teams. 

These dashboards would be updated regularly and pushed out to the teams. Unfortunately, there is no way to automatically push or deploy code in a quicker way. Each team I work with has its own Datadog instance.

What needs improvement?

Federated views for Datadog dashboards are critical as large companies utilize multiple instances of the product and cannot link the metrics or correlate the metrics together. This stunts the usage of Datadog. Additionally, using an OTel agent would be more acceptable and allow for easier adoption of Datadog across the hundreds of teams here.

For how long have I used the solution?

I've used the solution for four months.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Technology Competency and Solution Head at LearningMate
Real User
Good infrastructure and traffic visualizations help with capacity planning
Pros and Cons
  • "The error traceability is an area that can be improved."

What is our primary use case?

We use Datadog for application monitoring, to help identify errors. It is also used to monitor application performance.

It also helps organizations understand the user experience through user behavior patterns.

How has it helped my organization?

It helped us reduce production issues within a defined timeframe.

It helped us refine the UX.

What is most valuable?

Datadog has a very good visualization for my complete infrastructure and network traffic, which enabled me to create a capacity plan.

This product is great because it shows you the SQL and your application request in a single view.

What needs improvement?

The error traceability is an area that can be improved. This is something that helps us to pinpoint the area where a problem is occurring. It is a function stack, and it should show us how each function is defined.

For how long have I used the solution?

We have been using Datadog for the past couple of years.

What do I think about the stability of the solution?

I have not worked on it long enough yet to properly comment on stability, because it has to be tested across my other platforms.

What do I think about the scalability of the solution?

We have not done a full evaluation yet, but given that it is cloud-based, Datadog has to be scalable.

How are customer service and support?

I have not needed to contact technical support.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We were using New Relic prior to implementing Datadog. In terms of application monitoring, Datadog is not up to the level that New Relic is. New Relic is a better product, but its price is too high, which is why we switched.

How was the initial setup?

The initial setup is not complex. Datadog offers certification, so associates can be certified before they are assigned to administration and configuration.

What about the implementation team?

We implemented it in-house and got our admin team certified.

What was our ROI?

Our ROI is in the time to resolution for production issues.

What's my experience with pricing, setup cost, and licensing?

The price is better than some competing products.

Which other solutions did I evaluate?

New Relic

What other advice do I have?

This is a good product and I can recommend it to others, although New Relic is still my first choice. Datadog is my second choice.

Overall, it is a good product and my main complaint is that it needs better error traceability.

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user