Data processing using Cloudera Distribution for Hadoop mainly involves fflow and NiFi for workflow orchestration and Kafka for message queuing. We use fflow for orchestration and Spark for real-time and batch-mode processing. I rate the overall product an eight out of ten.
Senior Business Development Manager at BBI Consultancy
Real User
Top 10
2024-03-21T10:28:21Z
Mar 21, 2024
Speaking about the security features of the tool, I feel that it is a very secure system, but I cannot comment more on it since I don't have a technical background. The product follows international security guidelines to comply with requirements for PII and other kinds of regulated data for its end customers. I recommend that those planning to use the solution examine their environment and its complexities. There are cheaper tools on the market, and not everybody is well-suited to using Cloudera Distribution for Hadoop. Large enterprises with on-premise architecture definitely need the tool. As most of our company's customers are now moving to the cloud, Cloudera's role in their environments has been reduced. The benefits of the solution stem from the fact that it is a tool for big data management that can host multiple technologies. The other benefit of Cloudera is that you can use it to support your AI initiatives, since the tool can host different data warehouses or data lakes, which gives you the flexibility of hosting an AI solution on top of it. There has been an increase in hardware utilization over the years, so the servers, hardware, memory, IOPS, and CPU required need to be much more efficient than in the past. I rate the tool a nine out of ten.
For using Cloudera, it depends on what you want to use it for. If you're looking for something easy to manage and operate in the cloud environment, then Cloudera is a good option. You don't need to do much; you can just deploy it and go. From my perspective, it depends on your use case and how you see your data needs, as well as how you manage cloud data technologies and work with different departments, teams, and identity features. If Cloudera satisfies your requirements and you have no issues with it, then go for it. Overall, I would rate it a seven out of ten.
I give Cloudera Distribution for Hadoop an eight out of ten. We have approximately 20 users, including technicians, developers, data scientists, and external users. Due to the complexity of the Cloudera Distribution for Hadoop, I recommend utilizing a cloud-based solution managed by the cloud provider's operations. On-premise solutions can be intricate in terms of configuration and the number of servers required, adding to their overall complexity. The Cloudera Distribution for Hadoop is best suited for large organizations due to its considerable complexity. Therefore, it is always preferable to find a cloud-based solution.
Technical Presales Engineer at a tech services company with 51-200 employees
Reseller
Top 20
2023-03-27T13:44:29Z
Mar 27, 2023
Cloudera is a cost-effective solution if you need more storage space. In this case, I advise you to opt for it. I rate the solution as an eight out of ten.
Head of Big Data and Analytics Competency center at OTP Bank Hungary
Real User
Top 20
2022-11-04T13:34:09Z
Nov 4, 2022
It's important to have in-house, stable operations that are not dependent on external parties that can make things very tricky. The reliability of the cluster from our perspective is due to the fact that we have a very good in-house administrator team that can react quickly if they can see something is going wrong. I rate this solution eight out of 10.
Vice President at a financial services firm with 10,001+ employees
Real User
2022-04-29T11:53:00Z
Apr 29, 2022
I'm using Cloudera Distribution for Hadoop. The advice I would give to others looking into implementing or using Cloudera Distribution for Hadoop is for them to opt for a cloud variant, particularly something scalable for Azure, because of the ease of deployment and ease of setup. Procuring Cloudera Distribution for Hadoop is also a challenge unless the customer goes for its cloud version. I would rate Cloudera Distribution for Hadoop six out of ten because of its limited features. If they can enhance their feature list, that would improve their score.
I rate this solution an eight out of ten. Cloudera is a great product and, overall, there are many features. We actually use Cloudera HDFS underneath, and we build our product on top of it. So, we don't use the Cloudera versions of all the other products, we just use the Cloudera HDFS, nothing else.
I used CDP on-premise almost 2.5 years ago. I would rate it 8/10. I did not have much to compare it against in those days because cloud was not accessible in my organisation. But CDP was definitely a good choice then with respect to open source distributions. The installation was a bit of a challenge on-premise, but once done it was quite good, with not many issues to encounter. The dashboards and selection of tools were quite good.
AD - Associate Director at a financial services firm with 10,001+ employees
Real User
2020-09-13T07:02:21Z
Sep 13, 2020
I am a part of security and software development. We are currently considering migrating to the cloud, and planning on using Microsoft Azure, mainly for the Big Data component. I would rate this solution a five out of ten.
Data engineer at a tech services company with 11-50 employees
Real User
2020-03-25T15:24:00Z
Mar 25, 2020
In terms of the advice, I would say to focus on what tools are available on the market. In terms of open-source, most companies are delivering open source technologies and providing support to these tools. Now I have the option to purchase a license for whatever platform for $1. I can deliver it with another small company at a lower cost. If I was the decision-maker, I'd invest in open-source tools. Cloudera and all of these companies are trying to adapt to these big data technologies and open source tools. Cloudera is trying to put it inside their platform so that we can have a compatible solution. I would rate it an eight out of ten.
The suitability of this solution depends on the size of the data that you are going to be working with. If you are going to be working with a huge dataset that contains many gigabytes of data, then this is a good solution. For smaller datasets, you should also consider other technologies. My advice for anybody who is implementing this solution is to take some time to learn it. Beyond that, be sure to contact support if you have any problems because they are very helpful. I would rate this solution a seven out of ten.
DBA team manager at a financial services firm with 1,001-5,000 employees
Real User
2019-07-16T05:40:00Z
Jul 16, 2019
I had a bad experience connecting the Cloudera Distribution for Hadoop cluster to my other resources in the company, like the active directory or firewall. I would like to see the outside environment to be easier to handle. I will rate this eight out of ten because the solution doesn't cover everything. It is a very complicated solution because it contains a lot of internal tools.
I would recommend the solution given that they've proven the business case and that they've proven the technology. We have found that if you don't use or address the right business code you end up buying a technology that doesn't necessarily solve your business problems. I would rate the solution seven out of ten. The main reason for not rating it higher is that I think that the overall support is not great and we've found some limitations. It wasn't mature when we started. It's getting there. It's getting better. The main reason for the score of seven is mainly the support as well as the limited functionality.
I would say that the product as it currently is should rate at an eight out of ten. The reason the score is not higher is the workarounds that we have to do when it comes to certain models that do not support using multiple programming languages. For example, a single notebook is inflexible if you want to use other programming languages. As far as other advice for people considering this solution, I would say take a good look at your business need before you decide on this technology and which solution to choose. Make sure that you are not already able to solve for your particular, identified needs using your existing technology before even considering a change. You want to be sure you're applying the technology to the right business case because of actual need and not just change for change's sake.
Cloudera Distribution for Hadoop is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, and interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.
Day-to-day maintenance is simple. We have two technical staff to take care of the solution. Overall, I rate the solution a nine out of ten.
Overall, I rate the solution an eight out of ten.
I recommend the solution. Overall, I rate it an eight out of ten.
I recommend the solution to others. Overall, I rate the solution an eight out of ten.
I would recommend this solution to others. I rate this solution as an eight out of ten.
I would rate CDH as eight out of ten.
I rate this solution nine out of 10.
My advice to others is this solution can be complex. I rate Cloudera Distribution for Hadoop a seven out of ten.
I rate Cloudera Distribution for Hadoop eight out of 10.
I would recommend this solution. I would rate Cloudera Distribution for Hadoop a nine out of ten.
I would rate this solution a nine out of ten.
I will rate this solution a nine out of ten because nothing is ever perfect. You will always face problems, but I'm quite happy with Cloudera.
I would rate this solution seven out of 10. There's tons of room for improvement.