Cloudera Distribution for Hadoop provides a comprehensive platform for efficient data management and analytics, integrating advanced analytics tools with enterprise-grade security and hybrid cloud support.


| Product | Mindshare (%) |
|---|---|
| Cloudera Distribution for Hadoop | 14.7% |
| Apache Spark | 13.9% |
| HPE Data Fabric | 10.2% |
| Other | 61.2% |

| Title | Rating | Mindshare | Recommending | Interviews |
|---|---|---|---|---|
| Microsoft Azure Cosmos DB | 4.1 | N/A | 95% | 109 |
| MongoDB Enterprise Advanced | 4.1 | N/A | 92% | 82 |

| Company Size | Count |
|---|---|
| Small Business | 15 |
| Midsize Enterprise | 7 |
| Large Enterprise | 22 |

| Company Size | Count |
|---|---|
| Small Business | 108 |
| Midsize Enterprise | 34 |
| Large Enterprise | 121 |
Designed for handling vast datasets, Cloudera Distribution for Hadoop facilitates seamless data processing through its components such as Hive, Pig, and Spark. It supports both structured and unstructured data management with robust scalability and powerful data handling capabilities. While the latest version focuses on enhancing speed and integration, challenges remain with HBase stability and processing in Cloudera 5 clusters. Organizations leverage it for big data management tasks like data warehousing, log analytics, and real-time data processing using tools like Hadoop and Spark.
In industries such as finance, retail, and healthcare, Cloudera Distribution for Hadoop is implemented to enhance data-driven decision-making and operational efficiency. It aids in processing large volumes of data for analytics, data warehousing, and infrastructure building. Companies utilize it to streamline machine learning and log analytics, serving as a data lake for preprocessing substantial datasets.

| Author info | Rating | Review Summary |
|---|---|---|
| Head of Advanced Analytics & Intelligence; AGM at Alinma Bank | 4.0 | I've used Cloudera Distribution for Hadoop for several years for analytics and large-scale data processing. It's stable, scalable, secure, and effective, though data modification could be easier. Overall, the experience and ROI have been positive. |
| Manager, Business Development & Co Owner at Troia d.o.o. | 4.5 | I implemented Cloudera Distribution for Hadoop for an electrical distribution company to collect data from smart meters. It's unique for on-premises installation and offers a powerful hybrid solution. However, it's complex to configure, especially with Active Directory integration. |
| Senior Data Architect at Teradata Corporation | 4.0 | I use Cloudera Distribution for Hadoop for workflow distribution and real-time processing. Its distributed file system and unstructured data processing are valuable, but it lacks reporting support, relying on relational databases. Improvements in machine learning are needed. |
| Senior Data Architect at Yettel | 4.0 | We use Cloudera Distribution for Hadoop to manage our data lake and big data solutions. It's similar to open-source options but benefits from Cloudera support. However, stability and complex implementation can be issues, particularly with network and authorization. |
| Senior Business Development Manager at BBI Consultancy | 4.5 | In our experience, Cloudera is ideal for on-premises big data management, offering excellent scalability through container technologies. However, its cloud deployment capabilities require improvement, as competitors currently surpass Cloudera in this area for cloud-based solutions. |
| Senior IT Application Architect at an insurance company with 5,001-10,000 employees | 3.5 | We use Cloudera Distribution for Hadoop primarily for computing with Spark, Hive, HDFS, and Impala. Its secure environment meets our protection needs, but competitors offer better functionalities. We considered Databricks for its effective cloud capabilities. |
| Senior Architect at a comms service provider with 1,001-5,000 employees | 4.0 | In my experience with Cloudera Distribution for Hadoop, managing data services centrally is beneficial, but its outdated nature led us to switch to Airflow. Cloudera lacks containerization, which complicates deployment upgrades compared to a Kubernetes-based approach. |
| BI Manager at Discovery Health | 4.0 | We use Cloudera Distribution for Hadoop primarily for machine learning, valuing its data science capabilities. However, we find the governance aspect needs improvement, and the pricing renewal notices can be challenging. We've previously worked with Hortonworks. |
| Head of Big Data and Analytics Competency center at OTP Bank Hungary | 4.0 | I use Cloudera as a cloud-independent data lake solution, valuing its strong end-to-end security. However, I find the training and professional services have declined significantly, the setup is complex, and it can become costly at scale. |
| Data architect at Banking Sector | 4.5 | I use Cloudera Distribution for Hadoop as a big data platform to consolidate data and connect multiple sources. Its mature Data Lake, strong integration, and quarterly updates enhance value, though pricing could improve. Cloudera's compatibility surpasses HPE Ezmeral Data Fabric. |
The solution offers powerful processing and supports different file systems and query engines. It provides parallel processing for handling many requests.
The platform includes role-based access control in Cloudera Distribution for Hadoop. It secures the data itself and provides users with different roles and privileges.
The stability rating would be eight out of ten.
Positive
We performed our enhancement evaluation a long time ago, conducted comparisons, and ultimately chose Cloudera Distribution for Hadoop.
I would rate this solution eight out of ten.

We have a solution on top of the Cloudera platform for an electrical distribution company to gather data from smart meters for energy consumption.
This is the only solution that can be installed on-premises. Cloudera provides a hybrid solution that combines compute in the cloud or on-premises. It includes all machine learning algorithms in the Spark machine learning library.
All functionalities needed for a big data platform and ETL are on the platform, eliminating the need for other tools. It is scalable, ready for vertical scaling, and very powerful, offering numerous functionalities and configurations for generative AI.
It is quite complicated to configure and install. Integrating the platform into an information system is always a challenge, especially when starting with on-premise implementation. Integrating with Active Directory, managing security, and configuration are the main concerns.
I have used the solution for six years.
The solution is quite stable.
I think it is scalable and ready for vertical expansion.
The technical support is quite good and better than IBM. When I open a ticket, they always have people ready to start working on it.
Neutral
The initial setup depends on how good the engineer is. Generally, two people are required. For Kubernetes or OpenShift, more people may be needed since a specialist for OpenShift and then for Cloudera is necessary.
The price for Cloudera is average, yet it is very good compared to other solutions. It can be deployed on-premises, unlike competitors' cloud-only solutions.
Microsoft offers solutions on Microsoft Edge, and IBM has similar products, however, they are complicated and too large.
I rate the overall solution nine out of ten.

We use the solution for workflow distribution. It's an ETL for real-time and batch-mode processing. It's used for a broad range of workloads, including data warehousing.
The tool's most interesting features are the distributed file system and unstructured data processing capability. Because we have a lot of unstructured data, like XML and social media logs, these features make it more valuable than the usual data warehousing solutions.
Data warehouse solutions mainly use structured, regular, and formatted data, but Cloudera Distribution for Hadoop can handle unstructured data. This is the most interesting part. Also, the huge amount of data can be tuned in HDFS rather than relational databases. Cloudera Distribution for Hadoop can be a promising solution for distributed file systems, real-time processing, batch mode processing, AI, and machine learning use cases.
We are using several security features in the solution. These include Linux's security implementations and its built-in firewall. We also rely on single sign-on and encryption, both at rest and in transit, for sensitive data. It has access control, ensuring that not everyone can use every service; for example, some users can access Hive, others Impala, and others HBase, depending on their privileges.
We also use LDAP to track who registers or logs into the cluster. Additionally, we use key nodes to manage firewalls between Cloudera Manager or the Cloudera cluster and other data sources.
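The per-service access described above (some users reach Hive, others Impala, others HBase) can be pictured as a mapping from roles to permitted services. The sketch below is only an illustration in plain Python, not Cloudera's API or the reviewer's configuration; the role names, the `ROLE_SERVICES` table, and the `can_access` helper are all hypothetical.

```python
# Hypothetical role-to-service mapping; the role names are illustrative only
# and do not come from the review or from any Cloudera component.
ROLE_SERVICES = {
    "analyst": {"hive", "impala"},
    "engineer": {"hive", "hbase"},
    "admin": {"hive", "impala", "hbase"},
}


def can_access(roles, service):
    """Return True if any of the user's roles grants the named service."""
    wanted = service.lower()
    # Unknown roles grant nothing, so access is denied by default.
    return any(wanted in ROLE_SERVICES.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(can_access(["analyst"], "Impala"))
    print(can_access(["analyst"], "HBase"))
```

Denying by default for unknown roles mirrors the reviewer's point that not everyone can use every service.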
The tool doesn't support reporting, and relational databases are still the major source of reporting data. Apache Iceberg will be launched soon within the Cloudera cluster for analytical purposes. The Cloudera Machine Learning aspect could be tuned and enhanced to enable us to host some predictive analytics machine learning and AI use cases.
I have been working with the product for four years.
I rate the solution's stability an eight out of ten.
The tool's scalability is high. My company has 50 users for the product. I rate its scalability a nine out of ten.
We manage the product with Cloudera Manager. If we can't resolve something on our end, we open a ticket with support. They log in with us and help us determine what's going wrong. Support responds within 30 minutes to one hour.
Positive
I've worked with Teradata and Oracle Exadata and used MicroStrategy and Power BI for visualization tools.
We have some data packets for deployment. Prerequisites like hardware specs, firewall configurations, open ports, and others are predefined. There aren't many prerequisites or steps to set up a Cloudera cluster.
Data processing using Cloudera Distribution for Hadoop mainly involves fflow and NiFi for workflow orchestration and Kafka for message queuing. We use fflow for orchestration and Spark for real-time and batch-mode processing. I rate the overall product an eight out of ten.

From the end user's perspective, the platform is very similar to an open-source big data solution. However, in the case of Cloudera Distribution, we have support from the Cloudera side, which is good. We also engage an external company, which acts as a system integrator, to take part in the maintenance and installation of Cloudera Distribution for Hadoop implementations. Our own team also works with it, not solely as end users but also when developing on this platform.
Cloudera Distribution for Hadoop is not always completely stable in some cases, which can be a concern for big data solutions. Sometimes, there are problems with the network, and, of course, there can be communication issues with Active Directory or similar systems due to authorization scheduling, resulting in occasional problems. The implementation process is quite complex because of the schedules.
I have been using Cloudera Distribution for Hadoop for four years.
The stability of Cloudera Distribution for Hadoop is not good. There are other database clustering solutions that are more mature and stable. As a result, there are some issues with this particular solution.
The scalability of Cloudera Distribution for Hadoop is excellent. It is not a straightforward process to scale, but it is easy to deploy a new server and connect it. Once connected, the data can be distributed effectively.
The quality of the technical support depends on the consultant assigned to each issue. Initially, we may receive a person who is not very experienced. If they are unable to provide a solution, we have to escalate the issue, and then we will likely be assigned somebody who is more experienced and comes at a higher cost for the company. This is the standard procedure.
To deploy Cloudera Distribution for Hadoop properly, it may take a couple of months on average. However, for a complex deployment, it can take up to a year. In our case, we had over 12 data nodes and over 30 different servers involved in the implementation.
For the deployment, we require at least five knowledgeable people due to the complexity of the system.
I believe we pay for a three-year license.
I give Cloudera Distribution for Hadoop an eight out of ten.
We have approximately 20 users, including technicians, developers, data scientists, and external users.
Due to the complexity of the Cloudera Distribution for Hadoop, I recommend utilizing a cloud-based solution managed by the cloud provider's operations. On-premise solutions can be intricate in terms of configuration and the number of servers required, adding to their overall complexity. The Cloudera Distribution for Hadoop is best suited for large organizations due to its considerable complexity. Therefore, it is always preferable to find a cloud-based solution.

The tool is used by our company's different customers who have requirements for big data management. When our company's customers want to build a platform for big data management, they choose Cloudera as their tool and as a big data management platform even though there are different options in the market since it is best suited if they consider having an on-premises solution. If a customer wants a cloud-based solution for big data management, then there are other tools in the market that better suit their requirements. For an on-premises big data management platform, Cloudera is the best choice.
The best part of the tool is that it is able to expand horizontally and vertically when its customer wants to grow the business. The tool can be deployed using different container technologies, which makes it very scalable.
The tool's ability to be deployed on a cloud model is an area of concern where improvements are required. The tool works very well when deployed on an on-premises model. The deployment on a cloud platform is where Cloudera needs to work more. There are competitors who are way ahead of Cloudera.
I have been using Cloudera Distribution for Hadoop for five years. My company has a partnership with Cloudera.
It is a very stable solution. Stability-wise, I rate the solution a nine out of ten.
It is a scalable solution. Scalability-wise, I rate the solution a nine out of ten. Scalability depends on the environment, but it can scale up in an on-premises environment. There are challenges with its scalability on the cloud.
My company deals with around seven customers who use the product.
The technical team in my company deals with the product support team.
The ease or difficulty of setting up the product depends on the customer environment where the tool is deployed. For a firm in the banking, industrial, or retail sector, the deployment can range from simple to very complex, depending on the architecture, the size of the database maintained, and the applications to be hosted.
For Cloudera Distribution for Hadoop, one has to go through the usual deployment process, like for any software product. You have to have different environments before going into production, like pre-production environments, test and dev environments. You install and configure all the components in the test environment and then test them on the pre-production environment. Once UAT is done, you move them to the production environment. In general, it's a critical product deployed in a company.
The tool is expensive. Overall, it's not a cheap software tool, and that is why only large enterprises who are mature enough and have an architecture that is complex enough opt for Cloudera, as its ROI would make sense to such businesses. For the SMB market or customers whose environments are not that complex and do not have multiple systems running, Cloudera might not be a good option.
Speaking about the security features of the tool, I feel that it is a very secure system, but I cannot comment more on it since I don't have a technical background. The product follows international security guidelines to comply with the PII data and other kinds of regulated data for its end customers.
I recommend that those planning to use the solution examine their environment and its complexities. There are cheaper tools in the market, since not everybody is well-suited to using Cloudera Distribution for Hadoop. Large enterprises with an on-premises architecture definitely need the tool. As most of our company's customers are now moving to the cloud, Cloudera's role in their environments has been reduced.
The benefits of the solution stem from the fact that it is a tool for big data management that can host multiple technologies. The other benefit of Cloudera is that you can use it to support your AI or artificial intelligence initiatives since the tool can host different data warehouses or data lakes, which provides you with the flexibility of hosting an AI solution on top of it. Customers can leverage Cloudera platform for their AI initiatives. There has been an increase in hardware utilization over the years, so the servers, hardware, memory, IOPS, and CPU required need to be much more efficient than in the past.
I rate the tool a nine out of ten.
We use the product for computing. We mainly use Spark, Hive, HDFS, and Impala.
The product has been instrumental for all computing needs. We have a data warehouse and a data lake. We read from S3 and load it into different databases. We compute all the transformations, logic, and code we write in PySpark or Spark Scala. Spark is very valuable for data processing.
The product is completely secure. It meets our protection needs. We have a dedicated on-premise cluster. Every year, the vendor introduces new versions and supports many tools that are available. They have different hosts. They have a private cloud and a public cloud base.
The competitors provide better functionalities.
I have been using the solution for six years.
The tool’s stability is good. I rate the stability an eight or nine out of ten.
We have 2000 to 3000 users in our organization.
I rate the ease of setup a seven out of ten. The deployment takes 48 hours. We need six Hadoop administrators for the deployment.
The tool is not expensive. However, it has a cost to it. I rate the pricing a seven out of ten.
Databricks has a Runtime version. It works well with the cloud.
We have an analytics data mart. It is built on top of SQL Server. We use Spark for computing. We use SSIS and SSRS for SQL Server. There is a path set for analytics to migrate to Azure. I will recommend the solution to others. Cloudera is the best option if we need an on-premise implementation of Hadoop. If an organization wants to choose a cloud version, then Databricks might be a good option. Overall, I rate the solution a seven out of ten.

We share company data lakes based on cloud data on their clusters.
We had a data warehouse before all the data. We can process a lot more data structures.
The solution has data managers. You can manage all services from one place in an integrated manner. You don't have to manage the other services separately by Spark, etc.
We switched to Airflow because Cloudera is outdated. It's not widely used. It would be good if we had the Spark 3.5. Spark is quite old. Cloudera is now offering an alternate solution as a replacement for AWS. AWS works badly with small files.
The solution is not fit for on-premise distributions. It should be containerized so we can deploy it as containers within Kubernetes. We had one upgrade from CDH to CDP, which lasted for a long time. I would expect a containerized deployment to upgrade much more quickly than what we experienced.
I have been using Cloudera Distribution for Hadoop for two years. We are using version 7.1.7 of the solution.
The product is stable.
I rate the solution’s stability an eight out of ten.
We have a team of four data engineers and seven data scientists. We all work on the system, but machine learning models are delivered to other systems in our company. We have 10-15 people using this solution in total. There are five data discovery engineers using Jupyter portals.
We scaled up from six data nodes to 12 data nodes. It took considerable time because we did it during COVID. We had to wait an eternity to get the servers. Once the servers arrived, it took maybe three weeks.
I rate the solution’s scalability a seven out of ten.
We had back and forth with Cloudera support. We had to schedule the meetings and workshops, which took over a month. We had several sessions of writing. They called an engineer to fix the issues.
Neutral
We had a vendor. They have some issues with the installation, especially with the upgrade. When we upgraded from CDH to CDP on-premise, they had problems with the user interface and user authorization.
Deployment depends on the projects. It takes maybe a month for migration to CDP.
There was a local third-party vendor involved in the installation process.
The solution's license price increased five times because of CDH. We set the licensing levels like data engineering, an enterprise data hub, data science, and data engineering, and then when they moved to CDP, none of this was possible anymore. It's way more expensive now.
I rate the product's pricing a two out of ten, where one is cheap, and ten is expensive.
Overall, I rate the solution an eight out of ten.

We use it for machine learning.
The data science aspect of the solution is valuable.
The governance aspect of the solution should be improved. The pricing renewal notices can also be a bit challenging for us. It requires providing a substantial amount of notice for renewal, which has been a notable difficulty in our experience.
We have been using the solution for the past five years.
The solution is highly stable. I rate it a nine out of ten.
The solution is highly scalable. I rate it a perfect ten. We have 20 users for the solution in our company.
The technical support team is great.
Positive
We have worked with Hortonworks.
The initial setup was neither straightforward nor overly complex. It took us a few days to complete. I wasn't involved in the configuration; another team handled it. There may have been technical issues related to the clusters. We have a technical team with seven administrators.
The solution is fairly expensive.
I recommend the solution. Overall, I rate it an eight out of ten.
We use this solution as a data lake, pre-processing the large amount of data we have for further consumption by relational databases or advanced analytics. We use HDFS and Spark for that purpose and we are using Cloudera Machine Learning, a Jupyter Notebook-like environment with model monitoring opportunities, model catalog, and things like that. We are customers of Cloudera and I'm head of big data and the analytics competency center.
The solution has good features connected to end-to-end security. It's difficult to ensure a safe environment so this is a primary feature and the main reason we use Cloudera is because it's cloud-independent. That's very important because we are considering partially moving our workloads to cloud and not tying ourselves to a given vendor. For instance, if you were to go with Microsoft Azure, it would be impossible to move to another cloud provider if you're not satisfied with their pricing or quality of service.
The Cloudera training is terrible. Five years ago, they had up-to-date training material and instructor-led courses that were pretty good. These days, the material is outdated and the training is very expensive and irrelevant to the new platform. It's hard to gather the necessary information for administrators or developers. We now apply for training hosted by other companies, such as Udemy courses, which are better than Cloudera's. The quality of their professional services has also declined. What is really missing is a well-designed UI where people can get insight into data. We don't feel that Cloudera has a good SQL UI and there is a lot of room for improvement.
I've been using this solution for five years.
The solution is stable. Any problems we've had have been due to the underlying infrastructure. The only challenge was that it was sensitive to different hardware configurations. So if you had to add additional nodes to your cluster and their configuration was different, then it is sensitive to that. But there are workarounds available.
The scalability is fairly good, but we don't have much experience with it. Our users are mainly technical, automated users who are like second-service users. We have hundreds of service users who are taking and pushing forward the data from the data lake. Then we have 20 to 30 data scientists and advanced analytics people. We have a few users interested in Google Analytics logs. In total, we'd have a maximum of 50 users.
We push them towards using BI tools, and we try to make it clear that it's sometimes important to go for raw data. We are not always able to push it to the data warehouse or to a data mart for them, but it's always risky from a production process perspective because then the solution will be sensitive to source system changes.
In the past two years, we haven't needed to contact customer support.
We previously used open-source solutions and only moved to Cloudera because of security and compliance issues.
The initial setup is complex, mainly due to all the services that it encapsulates. The complexity depends on the number of components heavily used by a given client or customer. We try to restrict the number of components as much as possible. Deployment took three days with preparation and the downtime was eight hours, which can be significant if you're running a 24/7 operation. We carried out an in-place upgrade. We upgraded the existing cluster from CDH to CDP. It went fairly smoothly despite some challenges. Four administrators carried out the deployment; they are also responsible for Cloudera support.
Cloudera can become costly once you need to scale up, and more expensive than going with public cloud services. If you cross a certain threshold of nodes, you need to look for alternatives because it becomes too expensive. We are below that with around 30 to 40 Cloudera Compute Units that we pay for.
There are not many options that are comparable to Cloudera. There is another company that I think has been acquired by Hewlett-Packard but it doesn't offer the same components as Cloudera. I can't see any competitor that can give the same set of services from the data lake level up to Cloudera Machine Learning.
It's important to have in-house, stable operations that are not dependent on external parties that can make things very tricky. The reliability of the cluster from our perspective is due to the fact that we have a very good in-house administrator team that can react quickly if they can see something is going wrong.
I rate this solution eight out of 10.

There are multiple use cases of Cloudera. It is a big data platform where we collect all the data and connect other sources to get data from multiple sources. Cloudera has a Data Lake.
Data Lake is mature. Their integration across the Hadoop ecosystem and the interoperability across other distributions is mature. The technical skill set for Cloudera is available. They enhance their roadmap quarterly and provide new features to enhance current functionalities and capabilities. They are capitalizing on their product and have a clear roadmap.
Pricing could be improved.
I have been using Cloudera Distribution for Hadoop for six years.
The product is stable.
The solution is scalable. You can add many whenever you need.
We have hundreds of users using this solution. We have plans to increase the usage.
Customer support is supportive and proactive. They engage you for patching and upgrades.
I have used HPE Ezmeral Data Fabric. The difference is compatibility. HPE lacks compatibility with Informatica, while Cloudera is compatible.
The initial setup is easy.
The value is huge. We have about 30 use cases.
The product comes with an annual subscription, which is expensive. They are bundling technologies together. You have to pay an extra cost if you need the technology out of the base license.
Day-to-day maintenance is simple. We have two technical staff to take care of the solution.
Overall, I rate the solution a nine out of ten.