We primarily use the solution for observability: metrics, logs, tracing, and end-to-end user flow monitoring.
We are looking to implement this as a company-wide standard for cloud solutions.
We're currently in a POC, and we're interested in using either a Datadog agent or the OTel agent with a Datadog exporter. We have dashboards with panels that correlate metrics and let you link through to traces. We also use flame graphs to show latency across services and their spans.
While security is not our primary focus, we still require it and are interested in more security features. The solution is used for monitoring critical systems.
It has provided visibility, is easy to implement, and has allowed multiple teams to onboard quickly. This has given us a standard way to approach observability and visibility.
Monitoring rules and alerting thresholds can also be set and exported to other teams for use.
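To illustrate what sharing a monitoring rule with another team can look like, here is a minimal sketch that builds a monitor definition as plain JSON. The field names follow the general shape of a Datadog monitor export as I understand it, and the monitor name, query, tags, and thresholds are made-up examples, so check them against your own exports before relying on them.

```python
import json


def build_monitor(name, query, critical, warning=None, tags=None):
    """Return a monitor definition as a plain dict that can be shared as JSON.

    The keys mirror the usual shape of a Datadog monitor export; treat them
    as an assumption and compare with a monitor exported from your instance.
    """
    thresholds = {"critical": critical}
    if warning is not None:
        thresholds["warning"] = warning
    return {
        "name": name,
        "type": "metric alert",
        "query": query,
        "message": "CPU usage is above the agreed threshold.",
        "tags": list(tags or []),
        "options": {"thresholds": thresholds},
    }


# Hypothetical example values: service name, tag, and thresholds are invented.
monitor = build_monitor(
    name="High CPU on checkout service",
    query="avg(last_5m):avg:system.cpu.user{service:checkout} > 90",
    critical=90,
    warning=80,
    tags=["team:platform"],
)

# Serialize for hand-off to another team, then load it back as they would.
exported = json.dumps(monitor, indent=2, sort_keys=True)
imported = json.loads(exported)
```

Keeping the definition as data rather than as clicks in the UI means the same rule can be reviewed and versioned before another team imports it.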
There is an issue with federated dashboards, as multiple teams running on different Datadog instances cannot use features like the service catalog or easily switch between services in a long business flow.
The Kubernetes (K8s) monitoring in Datadog is extremely useful. The preset dashboards it provides help speed up the work.
The metrics summary is useful. Tracing with a span breakdown is helpful for us. We like the dashboarding with Powerpacks and the correlation of logs with traces.
The flame graph for tracing helps determine where the latency is highest.
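As a rough illustration of what the flame graph lets us read off, the sketch below computes the "self time" of each span (its duration minus the time spent in its direct children) and picks the largest. The span fields and service names are hypothetical, and it assumes child spans run one after another rather than in parallel.

```python
from collections import defaultdict


def self_times(spans):
    """Map span_id -> duration minus the summed duration of direct children."""
    child_total = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_total[span["parent_id"]] += span["duration_ms"]
    return {
        span["span_id"]: span["duration_ms"] - child_total[span["span_id"]]
        for span in spans
    }


def slowest_span(spans):
    """Return (span, self_time_ms) for the span with the highest self time."""
    times = self_times(spans)
    by_id = {span["span_id"]: span for span in spans}
    worst_id = max(times, key=times.get)
    return by_id[worst_id], times[worst_id]


# Hypothetical trace: a gateway request that fans out to two services.
trace = [
    {"span_id": "a", "parent_id": None, "service": "gateway", "duration_ms": 500.0},
    {"span_id": "b", "parent_id": "a", "service": "orders", "duration_ms": 420.0},
    {"span_id": "c", "parent_id": "b", "service": "inventory-db", "duration_ms": 350.0},
    {"span_id": "d", "parent_id": "a", "service": "auth", "duration_ms": 30.0},
]

span, self_ms = slowest_span(trace)
print(f"{span['service']} accounts for {self_ms:.0f} ms of self time")
```

In this made-up trace the gateway span is the longest overall, but most of the time is actually spent in the database call, which is the kind of thing the flame graph makes visible at a glance.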
Dashboards are created as a standard set and then exported into other Datadog instances for other teams.
These dashboards are updated regularly and pushed out to the teams. Unfortunately, there is no quick, automated way to push or deploy these updates. Each team I work with has its own Datadog instance.
Federated views for Datadog dashboards are critical, as large companies run multiple instances of the product and cannot link or correlate metrics across them. This limits the usage of Datadog. Additionally, using an OTel agent would be more acceptable and would allow for easier adoption of Datadog across the hundreds of teams here.
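For teams that do have API access to each instance, the export could in principle be scripted. The following is only a sketch under several assumptions: it uses the v1 dashboards endpoint and the `DD-API-KEY` / `DD-APPLICATION-KEY` headers as I understand them from Datadog's public API, and every team name, key, and dashboard ID is a placeholder. It builds the requests first so they can be reviewed before anything is sent.

```python
import json
import urllib.request


def plan_dashboard_push(dashboard, instances):
    """Build one request description per instance without sending anything.

    Instances with a known dashboard_id get an update (PUT); the others get
    a create (POST). Endpoint paths are an assumption; verify them against
    the Datadog API documentation for your site.
    """
    body = json.dumps(dashboard).encode("utf-8")
    plans = []
    for inst in instances:
        dashboard_id = inst.get("dashboard_id")
        if dashboard_id:
            method, path = "PUT", f"/api/v1/dashboard/{dashboard_id}"
        else:
            method, path = "POST", "/api/v1/dashboard"
        plans.append({
            "team": inst["team"],
            "method": method,
            "url": f"https://{inst['api_host']}{path}",
            "headers": {
                "Content-Type": "application/json",
                "DD-API-KEY": inst["api_key"],
                "DD-APPLICATION-KEY": inst["app_key"],
            },
            "body": body,
        })
    return plans


def send(plan):
    """Send one planned request and return the HTTP status code."""
    req = urllib.request.Request(
        plan["url"], data=plan["body"],
        headers=plan["headers"], method=plan["method"],
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Hypothetical standard dashboard and per-team instances (placeholder keys).
standard_dashboard = {"title": "Service overview", "layout_type": "ordered", "widgets": []}
instances = [
    {"team": "payments", "api_host": "api.datadoghq.com",
     "api_key": "<api-key>", "app_key": "<app-key>"},
    {"team": "logistics", "api_host": "api.datadoghq.eu",
     "api_key": "<api-key>", "app_key": "<app-key>", "dashboard_id": "abc-123-xyz"},
]

plans = plan_dashboard_push(standard_dashboard, instances)
for plan in plans:
    print(plan["team"], plan["method"], plan["url"])  # call send(plan) to apply
```

This does not replace federated dashboards, since each instance still holds its own copy, but it would cut down the manual export work each time the standard set changes.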
I've used the solution for four months.