reviewer2507895 - PeerSpot reviewer
Software Architect at Keller Williams Realty, Inc.
User
Good RUM and APM with good observability
Pros and Cons
  • "We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages."
  • "The cost is pretty high."

What is our primary use case?

We use Datadog across the enterprise for observability of infrastructure, APM, RUM, SLO management, alert management and monitoring, and other features. We're also planning on using the upcoming cloud cost management features and product analytics.

For infrastructure, we integrate with our Kube systems to show all hosts and their data.

For APM, we use it with all of our API and worker services, as well as cronjobs and other Kube deployments.

We use serverless to monitor our Cloud Functions.

We use RUM for all of our user interfaces, including web and mobile.

How has it helped my organization?

It's given us the observability we need to see what's happening in our systems, end to end. We get full stack visibility from APM and RUM, through to logging and infrastructure/host visibility. It's also becoming the basis of our incident management process in conjunction with PagerDuty.

APM is probably the most prominent place where it has helped us. APM gives us detailed data on service performance, including latency and request count. This drives all of the work that we do on SLOs and SLAs.
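As a rough illustration of how latency and request-count data can feed an SLO, the sketch below computes a latency SLI and the remaining error budget from a list of request durations. It is not Datadog's own calculation; the 300 ms threshold, the 99% target, the sample data, and the function names are all assumptions made for the example.

```python
from typing import Sequence


def latency_sli(durations_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests that completed within threshold_ms."""
    if not durations_ms:
        raise ValueError("need at least one request")
    good = sum(1 for d in durations_ms if d <= threshold_ms)
    return good / len(durations_ms)


def error_budget_remaining(sli: float, target: float) -> float:
    """Share of the error budget still unspent (negative if overspent)."""
    allowed_bad = 1.0 - target   # e.g. 1% of requests may be slow
    actual_bad = 1.0 - sli       # observed share of slow requests
    return 1.0 - actual_bad / allowed_bad


# Hypothetical sample: 1,000 requests, 5 of them slower than 300 ms.
samples = [120.0] * 995 + [450.0] * 5
sli = latency_sli(samples, threshold_ms=300.0)
budget = error_budget_remaining(sli, target=0.99)
```

With this sample, 99.5% of requests are "good" against a 99% target, so roughly half of the error budget is still available.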

RUM is also prominent and is becoming the basis of our product team's vision of how our software is actually used.

What is most valuable?

APM is a fundamental part of our service management, both for viewing problems and improving latency and uptime. The latency views drive our SLOs and help us identify problems.

We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages.

RUM has been critical in identifying what our users are actually doing, and we'll be using the new product analytics tools to research and drive new feature development.

All of this feeds into the PagerDuty integration, which we use to drive our incident management process.

What needs improvement?

Sometimes the solution changes features so quickly that the UI keeps moving around. The cost is pretty high. Outside of that, we've been relatively happy.

The APM service catalog is evolving fast. That said, it is redundant with our other tools and doesn't allow us to manage software maturity. However, we do link it with our other tools using the APIs, so that's helpful.

Product analytics is relatively new and based on RUM, so it will be interesting to see how it evolves.

Sometimes some of the graphs take a while to load, based on the window of data.

Some stock dashboards don't allow customization. You need to clone them first, but this can lead to an abundance of dashboards. Also, there are some things that stock dashboards do that can't yet be duplicated with custom dashboards, especially around widget organization.

The "top users" widget on the product analytics page only groups by user email, which is unfortunate, since user ID is the field we use to identify our users.

Buyer's Guide
Datadog
November 2024
Learn what your peers think about Datadog. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.

For how long have I used the solution?

I've used the solution for three and a half years.

What do I think about the stability of the solution?

The solution is pretty stable.

What do I think about the scalability of the solution?

The solution is very scalable.

How are customer service and support?

Support was excellent during the sales process, with a huge dropoff after we purchased the product. Only recently (within the past year) has it begun to reach acceptable levels again.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did not have a global solution. Some teams were using New Relic.

How was the initial setup?

The instructions aren't always clear, especially when dealing with multiple products across multiple languages. The tracer works very differently from one language to another.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

We have built our own set of installation instructions for our teams, to ensure consistent tagging and APM setup.

Which other solutions did I evaluate?

We did look at Dynatrace.

What other advice do I have?

The service was great during the initial testing phase. However, once we bought the product, the quality of service dropped significantly. In the past year or so, it has improved and is now approaching the level we'd expect based on the cost.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Engineer at a retailer with 1,001-5,000 employees
User
Good monitoring capabilities, centralizing of logs, and making data easily searchable
Pros and Cons
  • "The intuitive user interface has been one of the most valuable features for us."
  • "While the UI and search functionality are excellent, further improvement could be made in the querying of logs by offering more advanced templates or suggestions based on common use cases."

What is our primary use case?

Our primary use of Datadog involves monitoring over 50 microservices deployed across three distinct environments. These services vary widely in their functions and resource requirements. 

We rely on Datadog to track usage metrics, gather logs, and provide insight into service performance and health. Its flexibility allows us to efficiently monitor both production and development environments, ensuring quick detection and response to any anomalies. 

We also have better insight into metrics like latency and memory usage.

How has it helped my organization?

Datadog has significantly improved our organization’s monitoring capabilities by centralizing all of our logs and making them easily searchable. This has streamlined our troubleshooting process, allowing for quicker root cause analysis. 

Additionally, its ease of implementation meant that we could cover all of our services comprehensively, ensuring that logs and metrics were thoroughly captured across our entire ecosystem. This has enhanced our ability to maintain system reliability and performance.

What is most valuable?

The intuitive user interface has been one of the most valuable features for us. Unlike platforms such as Grafana, where learning how to query involves either a lot of trial and error or memorization, almost like learning a new language, Datadog’s UI makes finding logs, metrics, and performance data straightforward and efficient. This ease of use has saved us time and reduced the learning curve for new team members, allowing us to focus more on analysis and troubleshooting rather than on learning the tool itself.

What needs improvement?

While the UI and search functionality are excellent, further improvement could be made in the querying of logs by offering more advanced templates or suggestions based on common use cases. This would help users discover powerful queries they might not think to create themselves. 

Additionally, enhancing alerting capabilities with more customizable thresholds or automated recommendations could provide better insights, especially when dealing with complex environments like ours with numerous microservices.

For how long have I used the solution?

I've used the solution for five years.

What do I think about the stability of the solution?

We have never experienced any downtime.

Which solution did I use previously and why did I switch?

We previously used Sumo Logic.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Gediminas Anza - PeerSpot reviewer
Manager, System at Visma
User
Increases efficiency, helps with customer satisfaction, and enhances collaboration
Pros and Cons
  • "The agents feature in Datadog stands out as a valuable asset within our organization due to its robust functionality, versatility, and role in providing comprehensive monitoring and observability capabilities."
  • "Presently, the billing CSV reports provide insights into billing-related information yet are somewhat limited in functionality, typically offering reports with only three columns."

What is our primary use case?

The primary use case of Datadog within our organization encompasses providing a comprehensive and sophisticated solution that caters to the diverse needs of our internal customers. We have strategically implemented Datadog to serve as a centralized platform for monitoring, analyzing, and optimizing various aspects of our operations. With a robust suite of functionalities, Datadog empowers us to meet the dynamic requirements of over 40 internal customers efficiently.

Through Datadog, we offer a wide array of services to our internal stakeholders, allowing them to access and leverage its capabilities to enhance performance, troubleshoot issues, and make data-driven decisions. The tool's versatility enables different teams within our organization to monitor and track distinct metrics, such as application performance, infrastructure health, and logs, tailored to their specific requirements.

Moreover, Datadog serves as a pivotal component in our organizational ecosystem by streamlining processes, enhancing collaboration, and fostering a culture of data-driven decision-making. By harnessing the power of Datadog, our internal customers can proactively address issues, optimize resources, and ultimately improve operational efficiency across the board.

In essence, the primary use case of Datadog in our organization revolves around empowering our internal customers with a comprehensive and feature-rich solution that enables them to monitor, analyze, and optimize various aspects of our operations seamlessly and effectively. This strategic implementation of Datadog plays a vital role in enhancing our overall performance, fostering transparency, and driving continuous improvement within our organization.

How has it helped my organization?

Datadog has significantly contributed to enhancing the overall effectiveness and efficiency of our organization through various key improvements. One of the standout benefits has been the accelerated resolution of issues. By leveraging Datadog's monitoring and alerting capabilities, we have been able to swiftly detect, diagnose, and address issues before they escalate, resulting in minimized downtime and enhanced operational continuity.

Moreover, the implementation of Datadog has had a tangible positive impact on customer satisfaction. With improved visibility into our systems and applications, coupled with proactive monitoring and performance optimization, we have been able to deliver a more reliable and seamless experience to our customers. This has translated into higher customer satisfaction scores and strengthened relationships with our stakeholders.

Another notable improvement brought about by Datadog is the streamlining of our toolset. By identifying and removing multiple unused or redundant features and tools, Datadog has helped optimize our workflows and resources. This decluttering of unnecessary functionalities has not only increased operational efficiency but also streamlined our processes, allowing us to focus on the tools and features that truly add value to our operations.

In summary, Datadog's impact on our organization has been profound, enhancing our ability to resolve issues rapidly, improving customer satisfaction levels, and streamlining our toolset for increased efficiency and focus. These improvements have led to a more robust and resilient operational environment, enabling us to better meet the needs of our internal and external stakeholders.

What is most valuable?

Within our organization, we have found the Agents feature in Datadog to be exceptionally valuable due to its rich set of functionalities and capabilities. The Agents play a crucial role in our monitoring and data collection processes, providing a comprehensive and reliable means to gather crucial performance metrics and insights across our systems and applications.

One of the key reasons why the agents feature stands out as particularly valuable is its versatility. The Agents offer a wide range of monitoring and data collection options, allowing us to capture diverse metrics and performance data with precision. This flexibility enables us to tailor our monitoring strategy to meet the specific needs of different teams and use cases within our organization.

Moreover, the agents feature in Datadog enhances the overall observability of our infrastructure and applications. By deploying Agents strategically across our environment, we can gather real-time metrics, logs, and traces, enabling us to monitor the health, performance, and behavior of our systems comprehensively. This deep level of observability empowers us to proactively identify issues, optimize performance, and make informed decisions based on accurate and timely data.

Furthermore, the agents feature in Datadog plays a pivotal role in driving actionable insights and facilitating efficient troubleshooting. With the detailed data collected by the Agents, we can perform in-depth analysis, detect anomalies, and troubleshoot issues quickly and effectively. This proactive approach to monitoring and analysis ultimately enhances our operational efficiency and resilience.

In essence, the agents feature in Datadog stands out as a valuable asset within our organization due to its robust functionality, versatility, and role in providing comprehensive monitoring and observability capabilities. By leveraging the power of the Agents feature, we can effectively monitor, analyze, and optimize our systems and applications to ensure seamless operations and performance excellence.

What needs improvement?

In assessing areas for potential improvement, one key aspect where Datadog could enhance its service is in the realm of billing CSV reports. Presently, the billing CSV reports provide insights into billing-related information yet are somewhat limited in functionality, typically offering reports with only three columns. Expanding the capabilities of the billing CSV reports to include more detailed and customizable information would greatly benefit users by allowing them to gain a deeper understanding of their usage, costs, and billing trends within Datadog.

Additionally, in considering features for inclusion in the next release of Datadog, the development of more robust and customizable billing CSV reports could be a significant enhancement. By allowing users to tailor their billing reports to specific metrics, timeframes, and parameters of interest, Datadog could provide greater transparency and control over billing data, enabling users to make informed decisions regarding resource allocation, cost optimization, and budget planning.

Moreover, the inclusion of features such as cost forecasting, budget tracking, and customizable alerts related to billing thresholds could further empower users to manage their expenses effectively and proactively monitor and control costs within Datadog. These additions would not only enhance user experience and satisfaction but also contribute to a more holistic and actionable approach to financial management within the Datadog platform.

By refining the functionality of billing CSV reports and incorporating advanced features for cost analysis, forecasting, and monitoring, Datadog can elevate its service offering and provide users with enhanced tools for optimizing their usage, expenses, and financial oversight within the platform.
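Until richer reports exist, some of this analysis can be approximated outside the platform. The sketch below is only an illustration: it assumes a three-column export with hypothetical column names (`month`, `product`, `cost_usd`) and made-up figures, totals spend per month, projects the next month with a naive average-change forecast, and checks the projection against a budget threshold.

```python
import csv
import io
from collections import defaultdict

# Hypothetical three-column export; real column names and values may differ.
SAMPLE_CSV = """month,product,cost_usd
2024-08,logs,1200.00
2024-08,apm,800.00
2024-09,logs,1500.00
2024-09,apm,820.00
2024-10,logs,1800.00
2024-10,apm,840.00
"""


def monthly_totals(csv_text):
    """Sum cost per month; returns a list of (month, total) sorted by month."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["month"]] += float(row["cost_usd"])
    return sorted(totals.items())


def naive_forecast(totals):
    """Project next month by adding the average month-over-month change."""
    costs = [cost for _, cost in totals]
    if len(costs) < 2:
        raise ValueError("need at least two months of data")
    avg_change = (costs[-1] - costs[0]) / (len(costs) - 1)
    return costs[-1] + avg_change


def over_budget(forecast, budget):
    """True when the projected spend exceeds the budget threshold."""
    return forecast > budget


totals = monthly_totals(SAMPLE_CSV)
forecast = naive_forecast(totals)
alert = over_budget(forecast, budget=2900.0)
```

A linear projection like this is deliberately simple; it is a stand-in for the forecasting and threshold alerts the review asks for, not a description of any existing Datadog feature.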

For how long have I used the solution?

I've used the solution for over three years.

What do I think about the scalability of the solution?

Datadog is easy to scale. However, it's scaled for price, so be sure to measure what you need and not push all logs to the solution, or your price will skyrocket quickly.
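One way to avoid pushing every log line is to filter and sample at the source. The sketch below uses only Python's standard `logging` module: it drops DEBUG records, keeps every Nth INFO record, and always keeps warnings and errors. The class names, the logger name, and the 1-in-10 sampling rate are assumptions for illustration, and `ListHandler` is a stand-in for whatever actually ships logs.

```python
import logging


class CostAwareFilter(logging.Filter):
    """Drop DEBUG records and keep only every Nth INFO record."""

    def __init__(self, info_sample_every=10):
        super().__init__()
        self.info_sample_every = info_sample_every
        self._info_seen = 0

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True  # always ship warnings and errors
        if record.levelno == logging.INFO:
            self._info_seen += 1
            # keep the 1st, (N+1)th, (2N+1)th ... INFO record
            return (self._info_seen - 1) % self.info_sample_every == 0
        return False  # DEBUG and below never leave the host


class ListHandler(logging.Handler):
    """Stand-in for a real log shipper; it only collects messages."""

    def __init__(self):
        super().__init__()
        self.shipped = []

    def emit(self, record):
        self.shipped.append(record.getMessage())


logger = logging.getLogger("cost_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
handler.addFilter(CostAwareFilter(info_sample_every=10))
logger.addHandler(handler)

for i in range(20):
    logger.debug("debug %d", i)
    logger.info("info %d", i)
logger.error("payment failed")
```

Of the 41 records emitted here, only three reach the handler. The right sampling rate depends on how much detail a team needs during an incident, so it is worth measuring before settling on a number.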

Which solution did I use previously and why did I switch?

We use multiple APM tools to have both price and value correlations relevant to the teams using them.

What's my experience with pricing, setup cost, and licensing?

Request a test account during the POC phase to determine if the tool is the right fit; all providers do that for free.

Which other solutions did I evaluate?

We did POC with over five products. I can't name them due to the related NDA.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer2543758 - PeerSpot reviewer
Engineering Manager at RVshare
User
Good visibility into application performance, understanding of end-user behavior, and a single pane of glass view
Pros and Cons
  • "The single pane of glass view with maneuvering between products has helped us to truly understand root causes after incidents."
  • "The wide range of products Datadog now offers can be a bit intimidating to developers."

What is our primary use case?

The primary use case for this solution is to enhance our monitoring visibility, determine the root cause of incidents, understand end-user behaviour from their point of view (RUM), and understand application performance.

Our technical environment consists of a local dev environment where Datadog is not enabled, plus deployed environments that range from UAT testing with our product org to ephemeral stacks that our developers use to test their code somewhere other than their own computers. We also have a mobile app where testing is also performed.

How has it helped my organization?

Datadog has greatly improved our organization in many ways. Some of those ways include greater visibility into application performance, understanding of end-user behavior, and a single pane of glass view into our entire infrastructure.  

Regarding visibility, our organization previously used New Relic, and when incidents or regressions happened, New Relic's query language was very hard to use. End-user behavior in RUM has improved our ability to know what to focus on. Lastly, the single pane of glass view with maneuvering between products has helped us truly understand root causes after incidents.

What is most valuable?

APM has been a top feature for us. I can speak for all developers here: they use it more often than other products. Due to a standard in tracing (even though it is customizable), engineers find it easier to walk a trace than to understand what went wrong when looking at logging.  

Another feature that I find valuable, though it isn't the first one that comes to mind, is Watchdog. I have found that it has been a good source for understanding anomalies and where we (as an organization) may need more monitoring coverage.

What needs improvement?

I am not 100% sure how this could be done, or whether it can be, but I've had to do a lot of education to ramp developers up on the platform. This feels like a natural result of the sheer growth in the number of products Datadog now offers.

When I first started using the Datadog platform, I thought a big pro of the company was that the ramp-up time was much quicker, since there was no query language to learn. I still believe that to be true when comparing the product to something like New Relic. However, with the wide range of products Datadog now offers, it can be a bit intimidating for developers to know where to go to find what they want.

For how long have I used the solution?

I have been using the solution at my current company for almost four years, and have used it at my previous company as well.

Which solution did I use previously and why did I switch?

A while ago, we used New Relic, and we switched due to Datadog being a better product.

What about the implementation team?

We did the implementation in-house.

What's my experience with pricing, setup cost, and licensing?

The value compared to pricing is reasonable, though it can be a bit of a sticker shock to some.

Which other solutions did I evaluate?

We did not evaluate other options. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Real User
Top 20
Improves monitoring and observability with actionable alerts
Pros and Cons
  • "The selection of monitors is a big feature I have been working with."
  • "The PagerDuty integration could be a little bit better."

What is our primary use case?

We are using Datadog to improve our monitoring and observability so we can hopefully improve our customer experience and reliability.  

I have been using Datadog to build better, actionable alerts to help teams across the enterprise. By using Datadog, we are also hoping to improve observability into our apps, and we are taking advantage of this process to improve our tagging strategy so teams can hopefully troubleshoot incidents faster and achieve a much reduced mean time to resolve.

We have a lot of different resources we use like Kubernetes, App Gateway and Cosmos DB just to name a few.

How has it helped my organization?

As soon as we started implementing Datadog in our cloud environment, people really liked how it looked and how easy it was to navigate. We could see more data in our Kubernetes environments than we ever could before.

Some people liked how the logs were color coded so it was easy to see what kind of log you were looking at. The ease of making dashboards has also been greatly received as a benefit. 

People have commented that there is so much information that it takes time to digest it, get used to what you are looking at, and find what you are looking for.

What is most valuable?

The selection of monitors is a big feature I have been working with. Previously with Azure Monitor we couldn't do a whole lot with their alerts. The log alerts can sometimes take a while to ingest. Also, we couldn't do any math with the metrics we received from logs to make better alerts from logs.  

The metric alerts are OK but are still very limited. With Datadog, we can make a wide range of different monitors that we can tweak in real time, because there is a graph of data as you are creating the alert, which is very beneficial. The ease of making dashboards has saved a lot of people a lot of time. There are no KQL queries to write to put together the information you are looking for, and the ability to pin any info you see into a dashboard is very convenient.

RUM is another feature we are looking forward to using this upcoming tax season, as we will have a front-row view into what frustrates customers or where things go wrong in their process of using our site. 

What needs improvement?

The PagerDuty integration could be a little bit better. If there was a way to format the monitors to different incident management software that would be awesome. As of right now, it takes a lot of manipulating of PagerDuty to get the monitors from Datadog to populate all the fields we want in PagerDuty.  

I love the fact you can query data without using something like KQL. However, it would also be helpful if there was a way to convert a complex KQL query into Datadog to be able to retrieve the same data - especially for very specific scenarios that some app teams may want to look for.

For how long have I used the solution?

I've used the solution for about two years.

Which solution did I use previously and why did I switch?

We previously used Azure Monitor, App Insights, and Log Analytics. We switched because it was a lot for developers and SREs to switch between three screens to try to troubleshoot, and when you add in the slow load times from Azure, it can take a while to get things done.

What's my experience with pricing, setup cost, and licensing?

I would advise taking a close look at logging costs, man-hours needed, and the amount of time it takes for people to get comfortable navigating Datadog because there is so much information that it can be overwhelming to narrow down what you need.

Which other solutions did I evaluate?

We did evaluate Dynatrace and looked into New Relic before settling on Datadog.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: H&R Block has just recently become a customer of Datadog.
Product Engineering Manager at FMG Suite
User
Good logging, easy to find issues, and saves time
Pros and Cons
  • "The logging in general is one of my favorite features."
  • "I would love to have some DD guru come in and do a department training directly at our setup."

What is our primary use case?

We use the solution for APM, AWS, Lambda, logging, and infrastructure. We have many different things all over AWS, and having one place to look is great.

We have all sorts of different AWS things out there that are in C# and Node. Having a single place to log and APM into is very important to us.

Keeping track of the cloud infrastructure is also important. We have Lambda, containers, EC2, etc.

Having a super simple interface to filter the searching for APM and logging is great. It is super easy to show people how to use. This is super important to us.

How has it helped my organization?

Finding issues quickly is super important, as is being able to create dashboards and alert on issues.

Having the ability to create dashboards has really taught us how to utilize the searching part of the system. We are able to share them, and build upon them so easily. Many iterations later people are putting some solid information out there.

Alerting is also important to us. We have set up many alerts that help us spot issues in the platform before they become bigger issues. This has enabled my teams to use incidents and address the issues so they are no longer problems.

What is most valuable?

Alerting on running systems is very helpful. Finding issues is quick. We have one place for logging and searching, and we are able to save searches, reference them in the future, and build upon them.

The logging in general is one of my favorite features. The search is so straightforward and easy to use. Just being able to click on a field and add it to the search has taught me so much about the interface. It might not be as useful without a shortcut like that to teach me the system. We have Cloudflare logs in there, and sometimes I have no idea how to filter on such a buried piece of JSON. That is where the interface helps me: by clicking "add to search," I get what I need.

What needs improvement?

The PagerDuty replacement is something we are very interested in. We only really use PagerDuty to call the team when things are down.

I would love to have some DD guru come in and do a department training directly at our setup. We would love to have someone come in and show us the things we could do better within our current setup.

Saving a bit of cash would also help, if there are things we are doing that are costing us. It's a big enough tool that it is tough to have someone dedicated to managing it.

For how long have I used the solution?

I've used the solution for a bit over a year at this point.

What do I think about the stability of the solution?

The stability seems good here too.

What do I think about the scalability of the solution?

Scalability seems good to me. I have no complaints.

How are customer service and support?

I get answers from our contact, and one team member did reach out. It went well.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used Loggly. 

We switched because we wanted an all-in-one tool.

How was the initial setup?

Some parts of our setup were tough. Some Windows container setups cost us a lot of time.

The AWS infrastructure was tough to fully turn on due to the large cost of everything being run.

What about the implementation team?

We handled the setup ourselves in-house.

What was our ROI?

This cost us more overall. ROI is hard to sell. That said, I can find issues way faster and see what is going on in my entire platform. I pay back the cost every month with productivity. 

What's my experience with pricing, setup cost, and licensing?

It is going to cost you more than you think to keep everything running. We saw value in the one-for-all solution; however, it came at a premium to what we were paying.

Which other solutions did I evaluate?

We did evaluate Dynatrace.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Software Engineer at an insurance company with 10,001+ employees
Real User
Very good RUM, synthetics, and infrastructure host maps
Pros and Cons
  • "Overall, the Data UI and the usability of customer features continue to improve."
  • "It is very difficult to make the solutions fit perfectly for large organizations, especially in terms of high cardinality objects and multi-tenancy, where the data needs to be rolled up to a summarized level while maintaining its individual data granularity and identifiers."

What is our primary use case?

I have been using Datadog products and capabilities increasingly over the last 4 years, from POC to widespread adoption. 

The capabilities we use are unique for each use case and can be combined in various ways to provide the full observability coverage needed to maintain stable operations and shift from being reactive to being proactive.

Our organization uses both site/service reliability for the range of backend and frontend services, custom monitoring, and dashboards that can be dynamic and reused for multiple teams.

How has it helped my organization?

The capabilities we use are unique for each use case. They can be combined in various ways to provide the full observability coverage needed to maintain stable operations in order to become more proactive. 

Our organization uses both site/service reliability for backend and frontend services. We also use custom monitoring and dashboards that can be dynamic and reused for multiple teams.

We continue to increase the size of our footprint as we get more and more positive experiences.

What is most valuable?

The APM, RUM, synthetics, and infrastructure host maps have been some of the most popular and commonly used features. 

Overall, the Data UI and the usability of customer features continue to improve. 

The RUM session data and replays are much more convenient and applicable than other tools I have worked with in the past. By combining multiple capabilities or features, we get full visibility across the technology stacks and can identify specific bottlenecks or areas where risks and vulnerabilities are likely to exist.

Watchdog insights take the work out of the hardest part, helping us identify the issues before our customers.

What needs improvement?

It is very difficult to make the solutions fit perfectly for large organizations, especially in terms of high cardinality objects and multi-tenancy, where the data needs to be rolled up to a summarized level while maintaining its individual data granularity and identifiers. Tagging is imperative. However, the solutions could be improved for these needs in the future.
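A minimal sketch of the rollup problem described above, in plain Python. The metric name, the tag keys (`tenant`, `region`), and the `rollup` helper are hypothetical illustrations, not Datadog APIs; the point is only that a summarized view drops tags, so the raw tagged points must be kept to recover per-tenant detail.

```python
from collections import defaultdict

# Hypothetical tagged metric points: (metric name, value, tags).
points = [
    ("checkout.latency_ms", 120.0, {"tenant": "acme", "region": "us-east"}),
    ("checkout.latency_ms", 80.0, {"tenant": "acme", "region": "eu-west"}),
    ("checkout.latency_ms", 200.0, {"tenant": "globex", "region": "us-east"}),
]

def rollup(points, by):
    """Average values grouped by the given tag keys; all other tags are dropped."""
    groups = defaultdict(list)
    for _name, value, tags in points:
        key = tuple(tags.get(k) for k in by)
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Summarized view: one series per tenant.
per_tenant = rollup(points, by=["tenant"])

# Finer view: one series per (tenant, region) pair. The number of series
# grows with every extra tag key, which is where cardinality becomes a concern.
per_tenant_region = rollup(points, by=["tenant", "region"])
```

Each additional tag key in `by` multiplies the number of distinct series, which is why consistent tagging matters when many tenants share one account.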

For how long have I used the solution?

I've used the solution for over four years now.

What do I think about the stability of the solution?

The stability is excellent.

What do I think about the scalability of the solution?

You can work with engineering to make it work for your needs. They are excellent at supporting their customers.

How are customer service and support?

Technical support is excellent.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I previously used New Relic, AppDynamics, Heap, Clicktale, and more. Datadog has incorporated many of the features we were looking for into a one-stop shop.

How was the initial setup?

The initial setup is simple and straightforward.

What about the implementation team?

We had an in-house team working directly with Datadog engineering support and technical enablement.

Which other solutions did I evaluate?

We looked into New Relic, AppDynamics, Heap, Clicktale, and more. Datadog has many of the features we were looking for in one place.

What other advice do I have?

We use all versions of the solution.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Enrique Bassallo - PeerSpot reviewer
AWS Cloud Architect Consultant at a manufacturing company with 10,001+ employees
Real User
A very solid option with flexible features for analyzing data and enhancing observability
Pros and Cons
  • "The solution allows flexibility and heightened observability for presenting data, creating indicators, and setting service-level objectives."
  • "The solution should provide alerts for cloud outages."

What is our primary use case?

Our company deploys the solution for our customers as an observability tool to define SLOs and SLIs along with logs and metrics. 

The solution includes incident, post-mortem, and root cause analysis that provides a level of truth for incidents and issues with applications. 

We have SREs and teams in operations, management, and applications who all have access to the solution and ensure proper integrations. 

How has it helped my organization?

Our company is adopting SRE practices, and the solution helps us align those practices with our site reliability. We get more insight into issues at the outset, which helps us make better decisions, such as continuing with agility or stopping to fix issues. 

We are at the beginning stages of using the solution but are defining it as our company standard for use by all teams. 

What is most valuable?

The solution allows flexibility and heightened observability for presenting data, creating indicators, and setting service-level objectives. There are interesting options for monitors and features that offer flexible ways to analyze data.

What needs improvement?

The solution should provide alerts for cloud outages that would allow us to report potential service impacts directly to applications or on the dashboard. Alerts are important because there is a need to determine the impact on your SLAs, SLOs, and SLIs to decide whether to move toward disaster recovery or another environment. 
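To make the SLO impact concrete, here is a rough sketch of the error-budget arithmetic behind that kind of decision. The function names, the 99.9% target, and the 30-day window are assumptions chosen for illustration; they are not taken from Datadog or from our configuration.

```python
def error_budget_minutes(slo_percent, window_days):
    """Downtime allowed by an availability SLO over the window, in minutes."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0

def budget_remaining(slo_percent, window_days, downtime_minutes):
    """Budget left after the downtime already incurred; negative means the SLO is breached."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes

# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
budget = error_budget_minutes(99.9, 30)

# Hypothetical cloud outage lasting 30 minutes so far.
remaining = budget_remaining(99.9, 30, downtime_minutes=30)

# Simple decision rule (an assumption): fail over once the budget is exhausted.
should_fail_over = remaining <= 0
```

With these numbers, a single 30-minute outage consumes most of the monthly budget, which is why an early alert on a cloud outage would be useful.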

I would like the ability to share dashboard screenshots via email rather than having to direct others to the dashboard because it sometimes requires permissions. 

For how long have I used the solution?

I have been using the solution for one year. 

What do I think about the stability of the solution?

Our SRE teams report that the solution is very solid, stable, and reliable. 

What do I think about the scalability of the solution?

We are using the SaaS model, so we do not manage scalability ourselves; the product takes care of our scaling needs with no issues. 

How are customer service and support?

I don't have direct contact with support, but our SRE teams work with them and are satisfied. 

Which solution did I use previously and why did I switch?

Our company used other products in the past but wanted to move to the cloud. We found that the solution was a very good fit for us. 

How was the initial setup?

The initial setup is not that easy because there are many choices for configuration, workloads, servers, and containers. 

We utilized technical support to help us understand integration and prepare patterns for other applications. 

What about the implementation team?

We created small configurations and then utilized technical support to configure an application selected from our portfolio. 

We utilize a team approach for implementations that sometimes includes SREs. 

What's my experience with pricing, setup cost, and licensing?

The solution is fairly priced but history and log storage can get costly depending on your needs. 

I rate the cost a four out of ten. 

What other advice do I have?

The solution is appropriate for companies that are moving to the cloud and want a very solid tool for observability, logging, and everything related to SRE practices. 

I rate the solution a nine out of ten. 

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Buyer's Guide
Download our free Datadog Report and get advice and tips from experienced pros sharing their opinions.
Updated: November 2024