We use the product as an APM solution: metrics and traces for REST API requests, plus service maps to understand upstream and downstream services.
We create dashboards and widgets to monitor status, along with alerts and monitors. We have integrated the alerts with our organization's ticketing systems, SNOW and Netcool.
We collect Kubernetes, AWS, and infrastructure metrics, as well as Kafka and Aurora Postgres logs, and we use HTTP status codes to identify error types.
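To illustrate the idea of identifying error types from HTTP status codes, here is a minimal Python sketch. It is not our actual setup: the names `ErrorType`, `classify_status`, and `error_counts` are hypothetical, and the split into client (4xx) and server (5xx) errors simply follows the standard HTTP status classes.

```python
from enum import Enum


class ErrorType(Enum):
    """Hypothetical error buckets derived from the HTTP status class."""
    NONE = "none"
    CLIENT_ERROR = "client_error"   # 4xx: the request itself was at fault
    SERVER_ERROR = "server_error"   # 5xx: the service failed to handle it


def classify_status(status_code: int) -> ErrorType:
    """Map an HTTP status code to an error type."""
    if 400 <= status_code <= 499:
        return ErrorType.CLIENT_ERROR
    if 500 <= status_code <= 599:
        return ErrorType.SERVER_ERROR
    return ErrorType.NONE


def error_counts(status_codes):
    """Count how many responses fall into each error type."""
    counts = {error_type: 0 for error_type in ErrorType}
    for code in status_codes:
        counts[classify_status(code)] += 1
    return counts
```

Counts like these are the kind of per-service figure that a dashboard widget or a monitor threshold could be built on.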
So far, the solution works very well and solves most of the problems we have. Currently, we are trying to integrate our trace IDs into Datadog and correlate the logs and metrics. However, Datadog does not support Spring-generated trace IDs, and they are not shown in the Datadog UI. It only works in the reverse direction: Datadog injects its own DD-specific trace ID into the application logs, and those logs can then be consumed in other tools, for example, CloudWatch and Splunk.
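The "reverse" direction described above, where the tracer's IDs are written into each application log line, can be sketched with Python's standard `logging` module. This is only an illustration under assumptions: `TraceContextFilter`, `build_logger`, the `get_ids` callback, and the logger name `orders-service` are hypothetical, and the callback stands in for whatever the tracing library exposes. The `dd.trace_id` and `dd.span_id` field names follow the attribute names Datadog uses for log correlation.

```python
import logging
from typing import Callable, Tuple


class TraceContextFilter(logging.Filter):
    """Attach the current trace and span IDs to every log record."""

    def __init__(self, get_ids: Callable[[], Tuple[str, str]]):
        super().__init__()
        # Hypothetical callback returning (trace_id, span_id) for the
        # active request; a real tracer would supply these values.
        self._get_ids = get_ids

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id, record.span_id = self._get_ids()
        return True  # never drop the record, only enrich it


def build_logger(get_ids, stream) -> logging.Logger:
    """Create a logger whose lines carry dd.trace_id and dd.span_id."""
    logger = logging.getLogger("orders-service")  # hypothetical service name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s dd.trace_id=%(trace_id)s dd.span_id=%(span_id)s %(message)s"
    ))
    handler.addFilter(TraceContextFilter(get_ids))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Because the IDs travel inside the log line itself, any tool that ingests the logs can search on them, which is why the logs remain usable outside Datadog.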
The most valuable aspect is the APM, which monitors metrics and latencies. There's a low error rate, and alerts can be tagged to service requests and sent via email to the required distribution lists (DLs).
We can also create incidents in our internal tools, such as SNOW and Netcool.
The monitoring provides metrics across different dimensions for both services and infrastructure.
We have cloud infrastructure monitoring for Kubernetes nodes, pods, and containers, as well as ingress metrics.
The metrics are used to create alerts, which are sent by email in case of any issues.
The solution offers good dashboards, service maps, traces and flame graphs, HTTP status codes, Powerpacks, service catalogs, and profiling.
While the logs module is not activated, we are using all other modules.
The correlation between logs and metrics needs improvement. In most cases, we might use another, cheaper logging tool, which we then have to link to Datadog ourselves.
They can improve the SSO login as well. Currently, we have to log in again every two to three days by having the login link sent explicitly.
I've been using the solution for two years.
The stability is awesome.
We are expanding beyond observability right now.
They offer pretty awesome customer support.
We did not previously use a different solution.
The initial setup was easy.
We implemented the solution with the help of a vendor team.
I'd rate the ROI ten out of ten.
I would recommend Datadog to others.
We also evaluated ECE and Splunk.
The solution has a great support model.