The tool doesn't support reporting, and relational databases are still the major source of reporting data. Apache Iceberg will be launched soon within the Cloudera cluster for analytical purposes. The Cloudera Machine Learning aspect could be tuned and enhanced to enable us to host some predictive analytics, machine learning, and AI use cases.
Senior Business Development Manager at BBI Consultancy
Real User
Top 10
2024-03-21T10:28:21Z
Mar 21, 2024
The tool's ability to be deployed on a cloud model is an area of concern where improvements are required. The tool works very well when deployed on an on-premises model. The deployment on a cloud platform is where Cloudera needs to work more. There are competitors who are way ahead of Cloudera.
We switched to Airflow because Cloudera is outdated. It's not widely used. It would be good if we had Spark 3.5; the Spark version we have is quite old. Cloudera is now offering an alternate solution as a replacement for AWS. AWS works badly with small files. The solution is not fit for on-premise distributions. It should be containerized so we can deploy it as containers within Kubernetes. We had one upgrade from CDH to CDP, which took a long time. I would expect a containerized deployment to be upgraded much more quickly than what we experienced.
The governance aspect of the solution should be improved. The pricing renewal notices can also be a bit challenging for us. It requires providing a substantial amount of notice for renewal, which has been a notable difficulty in our experience.
The company is struggling to keep up with the upgrades of various components, and they are not willing to invest more in Cloudera. The company is still switching from traditional methods to cutting-edge technology. While the deployed product is generally functional, there are instances where it presents difficulties. For example, the high SPs do not allow for metadata patching once it is created in the panel. This restriction limits our ability to make changes to the metadata. I am aware that some companies are using open-source alternatives, which offer more flexibility. So, product maturity with cutting-edge technology will take more time. The primary concern is the cost. If you have the budget and are willing to pay for it, then it's fine. However, if we don't want to spend more money, it's not the best option.
Updated: January 2025.
Cloudera Distribution for Hadoop is not always completely stable, which can be a concern for big data solutions. Sometimes there are problems with the network, and there can be communication issues with Active Directory or similar systems due to authorization scheduling, resulting in occasional problems. The implementation process is quite complex because of the schedules.
Technical Presales Engineer at a tech services company with 51-200 employees
Reseller
Top 20
2023-03-27T13:44:29Z
Mar 27, 2023
They should work on the solution's pricing. Also, finding resources with good experience in the solution is difficult. Thus, they should upgrade their technical capabilities in the market. They should add features like AutoML and AutoDev for enhanced machine-learning experiences. In addition, they should consider developing an integration capability similar to Informatica for an end-to-end enterprise solution.
Head of Big Data and Analytics Competency center at OTP Bank Hungary
Real User
2022-11-04T13:34:09Z
Nov 4, 2022
The Cloudera training is terrible. Five years ago, they had up-to-date training material and instructor-led courses that were pretty good. These days, the material is outdated, and the training is very expensive and irrelevant to the new platform. It's hard to gather the necessary information for administrators or developers. We now sign up for training hosted by other companies, such as Udemy courses, which are better than Cloudera's. The quality of their professional services has also dropped. What is really missing is a well-designed UI where people can get insight into data. We don't feel that Cloudera has a good SQL UI, and there is a lot of room for improvement.
AI & Data Engineering Lead at a tech services company with 10,001+ employees
Real User
2022-05-20T12:33:27Z
May 20, 2022
Cloudera's prices are too high and are not competitive with other solutions. They could also improve the Data Science Workbench and add some more features, like wizard activities.
Vice President at a financial services firm with 10,001+ employees
Real User
2022-04-29T11:53:00Z
Apr 29, 2022
The setup and administration were not easy with Cloudera Distribution for Hadoop. They could be improved. The solution has a limited feature list, so having more features is something I'd like to see in the next release of Cloudera Distribution for Hadoop.
The only thing that needs improvement is the cost. It's a very expensive solution, which is one of the main reasons companies are not attracted to the product.
Integration is one of the main things we struggle with because we're working with several other environments. For example, we've got an MPP environment outside the Hadoop environment. Many cloud-based platforms like Azure are fully integrated with technology that gives you MPP, machine learning, and data lakes all in one environment. We've got on-premises IBM solutions and Cloudera, so it isn't easy to integrate. It would be useful if Cloudera had more tools, like SQL engines, that offer traditional relational database capabilities. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform. Ideally, we should get as much raw data as possible into the platform before we do the engineering, so we have machine learning and model training.
AD - Associate Director at a financial services firm with 10,001+ employees
Real User
2020-09-13T07:02:21Z
Sep 13, 2020
The performance can be improved. We have experienced some performance issues. It is not as sophisticated as Oracle or Sybase. Currently, we are using many other tools such as Spark and Blade Job to improve the performance. The setup is complex and could be simplified. The security needs to be improved.
Data engineer at a tech services company with 11-50 employees
Real User
2020-03-25T15:24:00Z
Mar 25, 2020
We're processing a huge amount of data on our system. Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. For us, having a big data environment is not optional. Cloudera is trying to adopt new technologies. I think open-source tools are now dominating, so Cloudera has to decide how to deal with them. I subscribe to Cloudera to get an enterprise version, but I have found that I can get some of its features from other vendors at a lower cost than Cloudera. They should lower the price.
There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon. When we are upgrading CDH, there are many things that need to be upgraded and it would be helpful if it were bundled. As it is now, we have to upgrade many different things separately.
DBA team manager at a financial services firm with 1,001-5,000 employees
Real User
2019-07-16T05:40:00Z
Jul 16, 2019
I would like to see an improvement in how the solution helps me to handle the whole cluster. For example, when I drill down to a specific tool, like Kafka, Cloudera Manager doesn't really help me. I then have to search Google and rely on other Kafka knowledge and tools.
Senior Consultant & Training at a tech services company with 51-200 employees
Consultant
2019-07-16T05:40:00Z
Jul 16, 2019
We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that.
The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it. In the next release, I think it would be helpful if there was easier integration into all the other existing data back corners. It will be a big plus as it's a favorite capability. We had to go with a third-party application in order to achieve that.
The Data Science Workbench doesn't support multiple languages. It needs to support multiple programming languages. We were trying to use Scala and Python for some solutions we wanted to deploy, but they didn't work properly. As a result, we had to come up with other workaround solutions. If the Data Science Workbench supported multiple programming languages, our workflow would be easier and the solutions could be better. Another aspect we would like to see improved is better opportunities for integration. For example, we would like to use H2O machine learning, which is an open-source product, and Cloudera doesn't support H2O. If they could support H2O and also deploy multi-language support on the Cloudera Data Science Workbench, that would be great. But the biggest thing that would help right now is H2O support. Finally, one other improvement I would suggest is integrating data privacy software into Cloudera. It is not quite complete in this aspect.
Lead Consultant - Product Development at FIS (http://www.fisglobal.com/)
Real User
2019-01-23T17:11:00Z
Jan 23, 2019
On the product side, I don't have much to comment. Other upcoming technologies, such as RPA, AI, and Go, have ample training materials with a variety of use cases that users can understand and align with their current requirements. By comparison, I didn't see much training material from Cloudera.
Cloudera Distribution for Hadoop is the world's most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH is 100% Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL, interactive search, and role-based access controls. More enterprises have downloaded CDH than all other such distributions combined.
Pricing could be improved.
The dashboard could be improved.
The pricing needs to improve. If the price was affordable, then we might have continued using Cloudera. We switched to HPE because of the cost.
The procedure for operations could be simplified.
The security of this solution could be improved. There should also be a way to have blockchain-enabled storage with HDFS.
It could be faster and more user-friendly.
There are better solutions out there that have more features than this one.
The price of this solution could be lowered.
The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better.