Miodrag-Stanic - PeerSpot reviewer
Senior Architect at a comms service provider with 1,001-5,000 employees
Real User
Top 5
You can manage all services from one place in an integrated manner
Pros and Cons
  • "We had a data warehouse before all the data. We can process a lot more data structures."
  • "The solution is not fit for on-premise distributions."

What is our primary use case?

We share company data leaks based on cloud data on their clusters.

How has it helped my organization?

We had a data warehouse before all the data. We can process a lot more data structures.

What is most valuable?

The solution has data managers. You can manage all services from one place in an integrated manner. You don't have to manage other services, such as Spark, separately.

What needs improvement?

We switched to Airflow because Cloudera is outdated. It's not widely used. It would be good if we had Spark 3.5; the Spark version we have is quite old. Cloudera is now offering an alternate solution as a replacement for AWS. AWS works badly with small files.

The solution is not fit for on-premise distributions. It should be containerized so we can deploy it as containers within Kubernetes. We had one upgrade from CDH to CDP, which took a long time. I would expect that, with a containerized deployment, upgrades would be much quicker than what we experienced.

Buyer's Guide
Cloudera Distribution for Hadoop
April 2025
Learn what your peers think about Cloudera Distribution for Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: April 2025.
848,989 professionals have used our research since 2012.

For how long have I used the solution?

I have been using Cloudera Distribution for Hadoop for two years. We are using version 7.1.7 of the solution.

What do I think about the stability of the solution?

The product is stable.

I rate the solution’s stability an eight out of ten.

What do I think about the scalability of the solution?

We have a team of four data engineers and seven data scientists. We all work on the system, but machine learning models are delivered to other systems in our company. We have 10-15 people using this solution in total. There are five data discovery engineers using Jupyter portals.

We scaled up from six data nodes to 12 data nodes. It took considerable time because we did it during COVID. We had to wait an eternity to get the servers. Once the servers arrived, it took maybe three weeks.

I rate the solution’s scalability a seven out of ten.

How are customer service and support?

We had back and forth with Cloudera support. We had to schedule the meetings and workshops, which took over a month. We had several sessions of writing. They called an engineer to fix the issues.

How would you rate customer service and support?

Neutral

How was the initial setup?

We had a vendor. They had some issues with the installation, especially with the upgrade. When we upgraded from CDH to CDP on-premise, they had problems with the user interface and user authorization.

Deployment depends on the projects. It takes maybe a month for migration to CDP.

What about the implementation team?

There was a local third-party vendor involved in the installation process.

What's my experience with pricing, setup cost, and licensing?

The solution's license price increased fivefold with the move from CDH. We used to set licensing levels such as data engineering, enterprise data hub, and data science; when they moved to CDP, none of this was possible anymore. It's way more expensive now.

I rate the product's pricing a two out of ten, where one is cheap, and ten is expensive.

What other advice do I have?

Overall, I rate the solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1272822 - PeerSpot reviewer
AD - Associate Director at a financial services firm with 10,001+ employees
Real User
Feature rich and scalable with good support, but there are performance issues and the security could be improved
Pros and Cons
  • "The main advantage is the storage is less expensive."
  • "Currently, we are using many other tools such as Spark and Blade Job to improve the performance."

What is our primary use case?

We are using this solution for storing Big Data in one centralized location.

How has it helped my organization?

It has been helpful in allowing data storage in one centralized location with data lakes and all of the surrounding applications.

All of the data processes are being stored in the Big Data Lake.

What is most valuable?

It allows us to store huge amounts of data, which is an advantage.

They have BI (Business Intelligence) tools. There are many AI tools.

We are able to connect and analyze the data to get reports. The reports are very good.

The main advantage is the storage is less expensive.

What needs improvement?

The performance can be improved. We have experienced some performance issues. It is not as sophisticated as Oracle Sybase.

Currently, we are using many other tools such as Spark and Blade Job to improve the performance.

The setup could be simplified; it's complex.

The security needs to be improved.

For how long have I used the solution?

I have been using this solution since 2015.

What do I think about the stability of the solution?

It's a stable solution.

What do I think about the scalability of the solution?

Scalability is good. Data is replicated by default; with Big Data there is a replication factor.

We have grown over the years. When we started we had 10 nodes; now we have a large number of nodes.

How are customer service and technical support?

Technical support is good. I have been able to learn from them. As a developer, I am learning every day.

I would rate the technical support a ten out of ten.

Which solution did I use previously and why did I switch?

Previously we were using Oracle Sybase SQL. We switched because we have now introduced Big Data.

How was the initial setup?

The initial setup was complex.

It's not as simple as Oracle Sybase.

It's a complex architecture because you have raw data and many engines.

What's my experience with pricing, setup cost, and licensing?

When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive.

What other advice do I have?

I am a part of security and software development. 

We are currently considering migrating to the cloud, and planning on using Microsoft Azure, mainly for the Big Data component.

I would rate this solution a five out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Hamid M. Hamid - PeerSpot reviewer
Data architect at Banking Sector
Real User
Top 5 Leaderboard
The integration across the Hadoop ecosystem and the interoperability across other distributions is mature

What is our primary use case?

There are multiple use cases of Cloudera. It is a big data platform where we collect all the data and connect other sources to get data from multiple sources. Cloudera has a Data Lake.

What is most valuable?

Data Lake is mature. Their integration across the Hadoop ecosystem and the interoperability across other distributions is mature. The technical skill set for Cloudera is available. They enhance their roadmap quarterly and provide new features to enhance current functionalities and capabilities. They are capitalizing on their product and have a clear roadmap.

What needs improvement?

Pricing could be improved.

For how long have I used the solution?

I have been using Cloudera Distribution for Hadoop for six years.

What do I think about the stability of the solution?

The product is stable.

What do I think about the scalability of the solution?

The solution is scalable. You can add many whenever you need.

We have hundreds of users using this solution. We have plans to increase the usage.

How are customer service and support?

Customer support is supportive and proactive. They engage you for patching and upgrades.

Which solution did I use previously and why did I switch?

I have used HPE Ezmeral Data Fabric. The difference is compatibility. HPE lacks compatibility with Informatica, while Cloudera is compatible.

How was the initial setup?

The initial setup is easy.

What was our ROI?

The value is huge. We have about 30 use cases.

What's my experience with pricing, setup cost, and licensing?

The product comes with an annual subscription, which is expensive. They are bundling technologies together. You have to pay extra if you need technology outside the base license.

What other advice do I have?

Day-to-day maintenance is simple. We have two technical staff to take care of the solution.

Overall, I rate the solution a nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Thishen Govender - PeerSpot reviewer
BI Manager at Discovery Health
Real User
Top 10
Includes several useful proprietary tools
Pros and Cons
  • "CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
  • "It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."

How has it helped my organization?

CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools. 

What needs improvement?

Integration is one of the main things we struggle with because we're working with several other environments. For example, we've got an MPP environment outside the Hadoop environment. Many cloud-based platforms like Azure are fully integrated with technology that gives you MPP, machine learning, and data lakes all in one environment. We've got on-premises IBM solutions and Cloudera, so it isn't easy to integrate. It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform. And ideally, we should get as much raw data as possible into the platform before we can do the engineering, so we have machine learning and model training.

For how long have I used the solution?

I've been using CDH for about two years, or rather, I manage the team that uses it.

What do I think about the stability of the solution?

We haven't had any issues with Cloudera. It's a solid product. 

What do I think about the scalability of the solution?

Cloudera is dependable, and it's completely scalable.

How are customer service and support?

We have engaged the technical support based in the UK. My team hasn't worked with them directly, but the administration team has. To my knowledge, they're fairly responsive. 

What other advice do I have?

I rate Cloudera Distribution for Hadoop eight out of 10.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Atif Tariq - PeerSpot reviewer
Cloud and Big Data Engineer | Developer at Huawei Cloud Middle East
Real User
Top 5 Leaderboard
Easy-to-manage solutions with strong support, but faces challenges with upgrades, flexibility, and high costs
Pros and Cons
  • "Customer service and support were able to fix whatever the issue was."
  • "While the deployed product is generally functional, there are instances where it presents difficulties."

What is most valuable?

It offers a pre-built distribution. Support is also available for whatever issue you face. Apart from that, they have a platinum best practice to manage tools and all these things.

What needs improvement?

The company is struggling to keep up with the upgrades of various components, and they are not willing to invest more in Cloudera.

The company is still switching from traditional methods to cutting-edge technology. While the deployed product is generally functional, there are instances where it presents difficulties. For example, the high SPs do not allow for metadata patching once it is created in the panel. This restriction limits our ability to make changes to the metadata.

I am aware that some companies are using open-source alternatives, which offer more flexibility. So, product maturity with cutting-edge technology will take more time.  

The primary concern is the cost. If you have the budget and are willing to pay for it, then it's fine. However, if we don't want to spend more money, it's not the best option.

For how long have I used the solution?

I've used it. 

How are customer service and support?

Customer service and support were able to fix whatever the issue was. But they haven't used the solution, so they cannot do anything in whatever is implementation. Because they are using self-managed open-source technology. There's no order or feature development by now. So wherever there's a limitation from the product, like the tool side, the kind of environment, they are not able to do anything.

How would you rate customer service and support?

Positive

What's my experience with pricing, setup cost, and licensing?

It is an expensive product.

What other advice do I have?

For using Cloudera, it depends on what you want to use it for. If you're looking for something easy to manage and operate in the cloud environment, then Cloudera is a good option. 

You don't need to do much; you can just deploy it and go. From my perspective, it depends on your use case and how you see your data needs, as well as how you manage cloud data technologies and work with different departments, teams, and identity features. If Cloudera satisfies your requirements and you have no issues with it, then go for it.

Overall, I would rate it a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sayyed Aadil - PeerSpot reviewer
Hadoop Admin at Tata Consultancy
Real User
A great solution for gathering, storing and processing data
Pros and Cons
  • "It is helpful to gather and process data."
  • "There are multiple bugs when we update."

What is our primary use case?

It is helpful to gather and process data.

How has it helped my organization?

We used to collect data on a small scale; with Cloudera Distribution for Hadoop, we use data on a large scale. It helps to store and protect the data and is helpful for processing.

What is most valuable?

The Cloudera Distribution for Hadoop is valuable.

What needs improvement?

There are multiple bugs when we update.

For how long have I used the solution?

We have been using this solution for three and a half years and using version 6.3. It is deployed on-premises.

What do I think about the stability of the solution?

It is a stable solution.

What do I think about the scalability of the solution?

It is a scalable solution. It is the best solution for larger companies. There are about 3000 users, and there are medical teams for medical data with about 1000 users. We require about six people for maintenance and deployment.

How are customer service and support?

Cloudera's support is helpful, and I rate the technical support a nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup was straightforward and not an issue.

What was our ROI?

We have seen a return on investment.

What other advice do I have?

I rate this solution a nine out of ten, and it is a helpful solution.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1850319 - PeerSpot reviewer
Vice President at a financial services firm with 10,001+ employees
Real User
Stores large volumes of data and makes log analytics, monitoring, and management easier, but its feature list is limited
Pros and Cons
  • "We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
  • "Cloudera Distribution for Hadoop has a limited feature list and a lot of costs involved."

What is our primary use case?

In my previous organization, we used Cloudera Distribution for Hadoop for compiling website logs and application logs. We used it for log analytics.

How has it helped my organization?

We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization.

What is most valuable?

The feature I found most valuable in Cloudera Distribution for Hadoop is the Cloudera Manager. It's a good component because it makes log management easy. It's really useful as a management and monitoring console.

What needs improvement?

The setup and administration were not easy with Cloudera Distribution for Hadoop. They could be improved.

The solution has a limited feature list, so having more features is something I'd like to see in the next release of Cloudera Distribution for Hadoop.

For how long have I used the solution?

I've been using Cloudera Distribution for Hadoop for two years. I'm still using it.

What do I think about the stability of the solution?

Cloudera Distribution for Hadoop seems to be a stable product.

What do I think about the scalability of the solution?

Cloudera Distribution for Hadoop is really easy to scale. We can add more servers to it, so it's scalable.

How are customer service and support?

I don't have experience contacting the technical support team of Cloudera Distribution for Hadoop.

How was the initial setup?

The initial setup for Cloudera Distribution for Hadoop was easy for us because we outsourced the work to the vendor. All the nitty-gritty was taken care of by them.

What about the implementation team?

We implemented Cloudera Distribution for Hadoop through the vendor. Deployment was done by an integrator. It usually doesn't take a lot of time. It usually takes just a day to deploy the solution.

Our implementation strategy for Cloudera Distribution for Hadoop was more into outsourcing. For example, the hardware, including its management, was outsourced, so the admin, data management, support, etc., were also outsourced. We were looking into having the application done in-house, with the team. We were looking at a one-year implementation plan to move more and more governance and data sets into Cloudera Distribution for Hadoop. Every quarter, we planned to have other features reintroduced into the platform.

Two people did the installation and two people did the deployment. It was deployed in a single location, and we initially had ten users of Cloudera Distribution for Hadoop.

What was our ROI?

It's tricky to derive the ROI from Cloudera Distribution for Hadoop, because in analytics, it's a little difficult to determine that this is the investment, and we're increasing the footprints and the revenue. It's very difficult to evaluate.

What's my experience with pricing, setup cost, and licensing?

Cloudera Distribution for Hadoop is expensive. There are a lot of costs involved. For example: apart from the standard licensing fees, there are support costs involved, and support could be for three years, five years, etc., so support is a pretty large part of the contract.

Which other solutions did I evaluate?

We didn't evaluate other options before choosing Cloudera Distribution for Hadoop.

What other advice do I have?

I'm using Cloudera Distribution for Hadoop.

The advice I would give to others looking into implementing or using Cloudera Distribution for Hadoop is for them to opt for a cloud variant, particularly something scalable for Azure, because of the ease of deployment and ease of setup. Procuring Cloudera Distribution for Hadoop is also a challenge unless the customer goes for its cloud version.

I would rate Cloudera Distribution for Hadoop six out of ten because of its limited features. If they can enhance their feature list, that would improve their score.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
CEO at AM-BITS LLC
Real User
Top 5
Good enterprise platform with good stability
Pros and Cons
  • "It has the best proxy, security, and support features compared to open-source products."
  • "The areas of improvement depend on the scale of the project. For banking customers, security features and an essential budget for commercial licenses would be the top priority. Data regulation could be the most crucial for a project with extensive data or an extra use case."

What is most valuable?

It is a good enterprise platform. It is easier and more stable. Additionally, it has the best proxy, security, and support features compared to open-source products.

What needs improvement?

The areas of improvement depend on the scale of the project. For banking customers, security features and an essential budget for commercial licenses would be the top priority. Data regulation could be the most crucial for a project with extensive data or an extra use case.

For how long have I used the solution?

We have been using Cloudera Distribution for Hadoop for a few years.

What do I think about the stability of the solution?

I rate the product’s stability a ten out of ten.

What do I think about the scalability of the solution?

We have ten customers using the product. They include data engineers, performance engineers, and environment engineers.

I rate its scalability a ten out of ten.

How are customer service and support?

The product has a support subscription for one year. We use technical support only for complex use cases. We work with their team as we have direct and quick access to contact them. It helps us better understand the technical and business-related queries of the customers.

How was the initial setup?

The on-cloud version is easy to set up, although it is complicated to process a large amount of data in an on-premises or hybrid setup. It is not a ready-to-use solution for telecom or finance technology. It requires the deployment of robust technology relying on network infrastructure.

What's my experience with pricing, setup cost, and licensing?

The product’s price varies from project to project. It is more expensive than open-source solutions and could be cheaper. However, in some cases, it is less costly than open-source.

What other advice do I have?

It is the best solution in the world at the moment. I advise others to go for it if you have an enterprise customer. I rate it a ten out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner