it_user900987 - PeerSpot reviewer
Data Management at BCX
Real User
Offers big data support for analytical applications but the technical support needs improvement
Pros and Cons
  • "In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
  • "The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."

What is our primary use case?

We use it primarily for big data support for analytical applications.

What is most valuable?

The feature that we've used quite intensively is Spark, specifically for how it can speed up some of the data processing.

What needs improvement?

The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it.

In the next release, I think it would be helpful if there was easier integration with all the other existing data back ends. That would be a big plus, as it's a favored capability. We had to go with a third-party application in order to achieve that.

For how long have I used the solution?

I've been using the solution since 2016.
Buyer's Guide
Cloudera Distribution for Hadoop
March 2025
Learn what your peers think about Cloudera Distribution for Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: March 2025.
839,422 professionals have used our research since 2012.

What do I think about the stability of the solution?

The stability is problematic. We did encounter quite a lot of issues with the cluster going down quite frequently.

What do I think about the scalability of the solution?

In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues. Currently, only about 10 people in total are using the solution. So we have about four business users and then four technical people. It's only limited to two environments.

How are customer service and support?

I think there's a lot of room for improvement on the technical support side. Mostly because we don't have a lot of local skills in South Africa that could have supported the solution. It was an issue.

Which solution did I use previously and why did I switch?

This is our first solution. We tested a bunch of other technologies, but that was our first one and we're still using it.

How was the initial setup?

The initial implementation was straightforward from an application side. There weren't any hiccups. In terms of deployment time, it's going to be difficult to say, because most of it was related to hardware problems. Software took about two months to deploy. We required four people for deployment.

What's my experience with pricing, setup cost, and licensing?

The pricing is very competitive. It's not bad.

Which other solutions did I evaluate?

We considered working with a few other companies, including IBM Bluemix.

What other advice do I have?

I would recommend the solution given that they've proven the business case and that they've proven the technology. We have found that if you don't address the right business case, you end up buying a technology that doesn't necessarily solve your business problems.

I would rate the solution seven out of ten. It wasn't mature when we started, but it's getting there and it's getting better. The main reasons for not rating it higher are that the overall support is not great and that we've found some limitations in functionality.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Mohammed Hamad - PeerSpot reviewer
AI & Data Engineering Lead at a tech services company with 10,001+ employees
Real User
Flexible and comprehensive solution
Pros and Cons
  • "The most valuable feature is that I can use CDH for almost all use cases across all industries, including the financial sector, public sector, private retailers, and so on."
  • "Cloudera's support is extremely bad and cannot be relied on."

What is our primary use case?

I primarily use CDH for data storage and regular dashboard reports.

What is most valuable?

The most valuable feature is that I can use CDH for almost all use cases across all industries, including the financial sector, public sector, private retailers, and so on.

What needs improvement?

Cloudera's prices are too high and are not competitive with other solutions. They could also improve the Data Science Workbench and add some more features, like wizard activities.

What do I think about the stability of the solution?

CDH is stable.

What do I think about the scalability of the solution?

CDH is scalable, but it's expensive to do it.

How are customer service and support?

Cloudera's support is extremely bad and cannot be relied on.

What's my experience with pricing, setup cost, and licensing?

I wouldn't recommend CDH to others because of its high cost.

What other advice do I have?

I would rate CDH as eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user692565 - PeerSpot reviewer
DBA team manager at a financial services firm with 1,001-5,000 employees
Real User
Helpful to build infrastructure for advanced analytics and is easy to install
Pros and Cons
  • "The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized."
  • "I would like to see an improvement in how the solution helps me to handle the whole cluster."

What is our primary use case?

I'm part of the IT team at my company, and our primary use case of this solution is building infrastructure for advanced analytics, where we copy data from our data warehouse that is now our relational database. We copy it to the Cloudera Distribution for Hadoop and then analyze it with Python and machine learning. 

What is most valuable?

The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized.

What needs improvement?

I would like to see an improvement in how the solution helps me to handle the whole cluster. For example, when I go down to a specific tool, like Kafka, Cloudera Manager doesn't really help me. Then I have to search Google for other Kafka knowledge and tools.

For how long have I used the solution?

I've been using this solution for about three years now.

What do I think about the stability of the solution?

It is a very stable solution.

What do I think about the scalability of the solution?

Not many people are currently using this solution at my organization, but I do believe it is scalable. I don't, however, have experience with upgrading or adding users. 

How are customer service and technical support?

My problem is that I started using Cloudera Express without technical support and then I purchased the Enterprise edition through another company. So now I don't really have access to Cloudera support, even though I hardly need to use it. 

How was the initial setup?

The initial setup was simple, but we had trouble implementing the cables in the Hadoop solution.

What other advice do I have?

I had a bad experience connecting the Cloudera Distribution for Hadoop cluster to my other resources in the company, like Active Directory or the firewall. I would like integration with the outside environment to be easier to handle. I will rate this eight out of ten because the solution doesn't cover everything. It is a very complicated solution because it contains a lot of internal tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user370224 - PeerSpot reviewer
Director of Data Management at a media company with 51-200 employees
Vendor
It gives us improved business intelligence reporting from daily to every two hours.

Valuable Features:

Faster runtime for batch jobs.

Improvements to My Organization:

Business Intelligence reporting improved from daily to every two hours, which satisfied the business stakeholders who previously favoured transactional systems for drafting reports because those had the latest data.

The issue with using transactional systems for reporting is that it produces multiple versions of the truth across the enterprise. With the faster turnaround time, business stakeholders are now adopting the BI systems designed to give a cohesive view of the performance metrics important to them.

Room for Improvement:

Full Support for all Spark SQL features, support for SparkR, compatibility with Hive for DataFrame saved tables.

Cloudera CDH5.5.x does not support SparkR. SparkR, which integrates R models through the Spark API, would be a great addition since it would enable fast, near real-time analytical integration of R models with data feeds.

The functionality in SparkSQL to save a DataFrame as a table in Hive produces a table that is not compatible with Hive. The workaround is to create the Hive table first and then insert the data into it.
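
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the reviewer's actual code: the helper names, the table and column names, and the choice of `STORED AS PARQUET` are all assumptions made for the example. The helper only builds and issues two HiveQL statements through whatever SQL-executing callable it is given, so it can be tried without a cluster.

```python
def hive_first_statements(table, columns, staging_view):
    """Build the HiveQL for the create-then-insert workaround.

    table        -- name of the target Hive table (hypothetical)
    columns      -- list of (column_name, hive_type) pairs
    staging_view -- temporary table holding the DataFrame rows (hypothetical)
    """
    column_ddl = ", ".join(
        "`{}` {}".format(name, hive_type) for name, hive_type in columns
    )
    # Step 1: create the table through Hive DDL so Hive owns its metadata.
    create = "CREATE TABLE IF NOT EXISTS {} ({}) STORED AS PARQUET".format(
        table, column_ddl
    )
    # Step 2: insert the DataFrame rows instead of saving the DataFrame as a table.
    select_list = ", ".join("`{}`".format(name) for name, _ in columns)
    insert = "INSERT INTO TABLE {} SELECT {} FROM {}".format(
        table, select_list, staging_view
    )
    return [create, insert]


def save_via_hive_insert(run_sql, table, columns, staging_view):
    """Issue the two statements in order through the supplied callable."""
    for statement in hive_first_statements(table, columns, staging_view):
        run_sql(statement)
```

On a Spark 1.x cluster, one would presumably register the DataFrame as a temporary table first (for example `df.registerTempTable("sales_staging")`) and pass the `sql` method of a Hive-enabled SQL context as `run_sql`. Passing `print` instead shows the statements without executing them.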

Cloudera CDH5.5.x is a great product. Adopting the features that are not currently supported would make it even better, but their absence by no means subtracts from its desirability.


Other Advice:

Do thorough research to ensure that your use cases and scale do not conflict with the system requirements, and that the features that would make a difference to you are supported.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user357645 - PeerSpot reviewer
Data/Big Data Architect at a healthcare company with 1,001-5,000 employees
Real User
We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions. At times, heavy queries do not finish at all.

What is most valuable?

Mostly HUE, Impala, Sqoop, and Hive. The impala-shell command is number one.

How has it helped my organization?

We are working on research for genomic data, looking for specific genes and variances. Even Hive was not good enough to process it correctly. Only with Impala are we getting results more quickly.

What needs improvement?

Sometimes the heavy queries do not finish at all. It would be good to see the progress of a heavy script in the Impala shell, or to have some other way to access it.

For how long have I used the solution?

We started to use Cloudera about one-and-a-half years ago.

What do I think about the stability of the solution?

We are having some issues with stability and are speaking to Cloudera support.

How are customer service and technical support?

Customer Service:

It's acceptable.

Technical Support:

It's acceptable.

Which solution did I use previously and why did I switch?

We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions.

How was the initial setup?

We have struggled a bit in installing and configuring Cloudera Manager on the AWS cluster. For now, it is good.

What about the implementation team?

We did the implementation only using our team and resources. It was a hard start, but an easy landing.

What other advice do I have?

Cloudera is good for mid-sized to big companies, but small ones can use AWS Impala/HUE. Go to training, or you are going to spend many hours finding short answers. The Cloudera solution is big with good documentation, but you need to know what and where to read first.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1046250 - PeerSpot reviewer
Senior Consultant & Training at a tech services company with 51-200 employees
Consultant
The valuable combination of all the tools enable me to solve use cases I'm working on
Pros and Cons
  • "We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
  • "We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."

What is our primary use case?

I've been working on the software installation from the beginning, and we have a client for global supply chain, so we get information from Telefonica's sales and distribution. Getting all that information into this system allows us to process it, get KPIs, and create outgoing information for business intelligence tools.

In the cloud provider enterprise we get all the information from the gamers, like delays, response, and information from the games. It allows us to see if gamers are having trouble, high latency or any other kind of issue. They test that and get information about the issues in order to solve them.

What is most valuable?

I like the combination of all the tools that allow me to provide solutions and enable me to solve the use cases I'm working on. You need tools or components to foresee everything, and they are all in our emails. Sometimes you try several of them, and sometimes one will work better than the other. So you have to test the tools to see what works for you. 

What needs improvement?

We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that. 

For how long have I used the solution?

I've been using this solution for about a year and a half now.

How was the initial setup?

It's been quite easy to install. We only had to follow the instructions and there weren't many problems. That's important for us.

What other advice do I have?

I will rate this solution a nine out of ten because nothing is ever perfect. You will always face problems, but I'm quite happy with Cloudera. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user363186 - PeerSpot reviewer
Team Lead / Data Architect at a tech services company with 51-200 employees
Consultant
The Cloudera Manager administrator webpage simplifies the administration tasks.

What is most valuable?

The Cloudera Manager administrator webpage simplifies the administration tasks and helps to maintain a global overview of the cluster performance.

How has it helped my organization?

We are moving from a standard SQL environment (Oracle DataWarehouse) to a Big Data environment, and the Hadoop cluster will be the key to our new organization. It will allow us to scale in an easy manner.

What needs improvement?

We found some difficulties when importing Hive tables from another Cluster.

I want to point out that we encountered many problems related to the cloud storage and how resources are managed. Our learning has been that, although it is quite simple to deploy single machines on the cloud, deploying clusters of machines is much more complex as many factors need to be considered: individual machines, connectivity across machines, storage.

For how long have I used the solution?

I've used it for three months.

What do I think about the stability of the solution?

We found some issues, but they were related to the hardware provider. For the moment I have not detected any problem from the Cloudera software point of view.

How are customer service and technical support?

Technical support is really efficient.

Which solution did I use previously and why did I switch?

We chose this product as it is considered a market standard and due to its wide documentation on the web. I evaluated other options, but the fact that it is now becoming a standard for many companies helped me to choose this option.

How was the initial setup?

In the cloud environment where we deployed (Azure Resource Manager) there was a ready-to-deploy template, which simplified the initial setup a lot.

What about the implementation team?

We implemented with an in-house team. Our initial idea was to stop the cluster during the weekends and when there was no usage. However, we ran into serious difficulties and were not able to start the whole cluster programmatically, so in the end we left the cluster running all the time.

These issues were mainly related to the cloud provider and how this provider manages the resources for the cluster machines.

What was our ROI?

From our point of view it is a long-time investment. We hope to get the ROI in the following years.

What other advice do I have?

I am very comfortable with this product. The combination of the Cloudera Manager administrator server, which allows the management of the Hadoop cluster, and the Hue server, which simplifies its use, makes this product a current standard on the market. Perhaps it lacks a full integration of all its components.

Disclosure: My company has a business relationship with this vendor other than being a customer: My company has a partnership relation with the vendor.
it_user347592 - PeerSpot reviewer
Senior Analyst - Strategy Analytics at a consultancy with 10,001+ employees
Real User
We were able to utilize data which was untapped previously, but the documentation on Hive could be more standardized.

What is most valuable?

The features we've found most valuable are:

  • Fast processing of data
  • Easy to manipulate using HiveQL

How has it helped my organization?

We were able to utilize data which was untapped previously. We've got great use cases now to drive business revenue.

What needs improvement?

It needs more standardized documentation on Hive.

For how long have I used the solution?

I've used it for two and a half years.

How are customer service and technical support?

Customer Service:

It's great.

Technical Support:

The level of technical support is great.

Which solution did I use previously and why did I switch?

No previous solution was used, and senior management chose to bring it in.

How was the initial setup?

I was not directly involved in deployment.

What about the implementation team?

It was done by the vendor team, who were great.

What other advice do I have?

It's good for Big Data analytics.

Disclosure: I am a real user, and this review is based on my own experience and opinions.