Datadog vs ELK: which one is better in terms of performance, cost, and efficiency?
Dear community members,
I've been exploring Datadog and ELK, and I need your opinion on both of them in terms of performance, cost, and efficiency. Which one would you recommend?
With Datadog, we have near-live visibility across our entire platform. We have seen APM metrics impacted several times lately using the dashboards we have created with Datadog; they are very good core indicators of the health of our system. You can build very useful dashboards right out of the box using this solution. Our developers are able to see how code is running in production, and having custom built-in metrics gives us a wealth of knowledge and historical data that helps keep our system running smoothly.
The pricing with Datadog can be very high. We feel there should be a more tiered approach to give users the opportunity to buy a more tailored product specific to their organizational needs. Datadog could also be more user-friendly. We would like to see deeper application-level insight and better incident management.
Using ELK, you can gather authentication information from service providers and determine which identity provider is not performing properly. ELK is very flexible; you can have a number of scenarios and get logs from all of them. ELK is a very cost-effective solution.
But it can be a complex solution to use. Better integration with 3rd party APMs would be very helpful. Currently, upgrades with ELK are released as stacks. Plugins or extensions would save removal and reinstallation and make the process move seamlessly.
Conclusion
We researched both of these solutions and concluded Datadog gave us the best visibility, better integration, and helpful, timely support. The logs and error reporting are extremely useful to conduct analysis and root cause analysis. The setup, ease of use, and flexibility with dashboard creation and reporting are just some of the things that our team liked about Datadog.
It depends on your requirement. If you are looking for a SIEM/log management solution ELK would be a better option.
But if you are looking for more of a monitoring solution, Datadog would be better. Also, Datadog provides out-of-the-box integrations with a lot of cloud applications. ELK could be cost-effective but a bit challenging to configure and fine-tune.
Datadog: Unify logs, metrics, and traces from across your distributed infrastructure. Datadog is the leading service for cloud-scale monitoring. It is used by IT, operations, and development teams who build and operate applications that run on dynamic or hybrid cloud infrastructure. Start monitoring in minutes with Datadog!
Datadog features offered are:
200+ turn-key integrations for data aggregation
Clean graphs of StatsD and other integrations
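For anyone who hasn't seen what the StatsD integration looks like on the wire, here is a minimal sketch in Python. It assumes a Datadog Agent is listening for DogStatsD on its default UDP port 8125 on the same host. The metric name and tag below are made up for illustration and are not taken from any real setup.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None):
    """Build one DogStatsD datagram: name:value|type|#key:val,key:val."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags are appended after "|#" as comma-separated key:value pairs.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send the datagram over UDP (fire and forget, no acknowledgement)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Hypothetical gauge for Kafka consumer lag on one topic.
datagram = format_dogstatsd("kafka.topic.lag", 42, "g", {"topic": "orders"})
```

In practice you would probably use Datadog's official client library rather than raw sockets; the point is only that emitting a custom metric is a one-line UDP message.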
Elasticsearch: Open Source, Distributed, RESTful Search Engine. Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).
Elasticsearch provides the following key features:
Distributed and Highly Available Search Engine.
Multi Tenant with Multi Types.
Various set of APIs including RESTful
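To give a feel for the RESTful side, here is a small sketch that builds the newline-delimited JSON body for the Elasticsearch `_bulk` endpoint. The index name and the log fields are hypothetical. Actually sending it (an HTTP POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out so the snippet runs without a cluster.

```python
import json


def build_bulk_payload(index_name, docs):
    """Return an NDJSON body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        # Action line tells Elasticsearch which index the next document goes to.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical application log events.
events = [
    {"level": "ERROR", "service": "checkout", "message": "payment timeout"},
    {"level": "INFO", "service": "checkout", "message": "order created"},
]
payload = build_bulk_payload("app-logs", events)
```

In a real ELK deployment Beats or Logstash would usually do this shipping for you; the sketch only shows what ends up hitting the API.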
Cyber Security Consultant at a computer software company with 11-50 employees
MSP
2021-06-21T19:57:40Z
Jun 21, 2021
Dear,
Unfortunately, I can't say much about Datadog but I have used ELK for a short period.
And I can tell you not everything works the way it should. For example, I noticed heavy CPU usage for a Windows client on MS AD servers. I advise you to consider this if it's important to you.
IT Technical Architect at an insurance company with 5,001-10,000 employees
Real User
2021-10-21T12:30:56Z
Oct 21, 2021
Where do you want to spend your money, on people or licenses?
ELK requires a long-term investment in engineering resources to manage the system and to provide the capability.
Datadog provides capabilities for you so you only need some administrators. What are the capabilities? Some critical ones include availability, scalability, consuming log files, platform upgrades, ...
If you are consuming smaller data sets (hundreds of GB) with shorter retention, sizing and scaling are much easier, which makes ELK easier to run.
Do you have admins or engineers? If your team doesn't have dedicated time & skills to spend developing solutions like elastic-alert you should look for a vendor to provide capabilities.
I expect there are some capabilities in Datadog that you will not be able to replicate in ELK. If you need those, the answer is obvious.
We are going to evaluate the same for our org. We do about 10 TB a day consumption in ELK and are looking to see if we can shift $$$ from engineers and infra to SaaS.
I have used both ELK and Datadog, and there are lots of variables to consider here. The three important points that I looked at are:
- Cost. In addition to service costs, you have to consider egress and ingress costs as well.
- Real-time observability that you need during development vs long-term observability. Keep in mind that when you export data over the internet, it comes with the same reliability issues as any other service on the internet. Regardless of how Datadog classifies its service as real-time, it is not real-time, IMO. It very much depends on your definition of real-time.
- Deployment and maintenance complexity. When your ELK cluster grows it has some pain points you need to be aware of.
My general approach is to deploy ELK for development, tune the data, and then pivot toward commercial solutions if I need to. This gives you insight into your data and what you should be preserving, so you are not paying high costs if and when you do decide to take advantage of a commercial solution.
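The cost point above can be sketched as simple arithmetic. This is only a rough model: the function name is made up, and the per-GB prices in the example are placeholders, not real Datadog or cloud-provider rates. Plug in the numbers from your own quotes and bills.

```python
def monthly_cost(gb_per_day, service_price_per_gb, egress_price_per_gb, days=30):
    """Estimate monthly spend as service charges plus network egress charges."""
    volume_gb = gb_per_day * days
    service = volume_gb * service_price_per_gb
    egress = volume_gb * egress_price_per_gb
    return {
        "volume_gb": volume_gb,
        "service": service,
        "egress": egress,
        "total": service + egress,
    }


# Placeholder prices (2 and 1 currency units per GB), chosen only to keep the math obvious.
estimate = monthly_cost(gb_per_day=10, service_price_per_gb=2, egress_price_per_gb=1)
```

Running the same function before and after tuning the data in a development ELK cluster shows how much of the bill comes from volume you never needed to ship.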
Senior Software Engineer at a tech services company with 501-1,000 employees
Real User
Jun 23, 2021
@it_user860751 I want to store application logs, metrics, and Kafka queue stats at the topic level. In short, I am looking for the best APM solution in terms of cost, efficiency, and scalability. I need to log nearly 6-7 GB of data daily as part of the app logs.
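For a volume like this, a back-of-the-envelope storage estimate is a useful starting point. The sketch below assumes a self-hosted Elasticsearch cluster where each replica stores a full extra copy of the data. The 30-day retention, the single replica, and the `index_overhead` factor (indexed size relative to raw size) are all assumptions to replace with your own measurements.

```python
def stored_gb(gb_per_day, retention_days, replicas=1, index_overhead=1.0):
    """Approximate disk needed: daily volume x retention x copies x overhead."""
    copies = 1 + replicas  # one primary plus the configured replicas
    return gb_per_day * retention_days * copies * index_overhead


# The 6-7 GB/day range from the post, with assumed 30-day retention and 1 replica.
low = stored_gb(6, 30)
high = stored_gb(7, 30)
```

With these assumptions the range comes out to a few hundred GB, which is small enough that either option is technically workable and the decision comes down mostly to cost and who maintains it.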
@Shibu Babuchandran, thank you for your valuable comments.
Good luck!
Can you tell me what you actually want to do so that I can help you?