Sayyed Aadil - PeerSpot reviewer
Hadoop Admin at Tata Consultancy
Real User
Leaderboard
A great solution for gathering, storing and processing data
Pros and Cons
  • "It is helpful to gather and process data."
  • "There are multiple bugs when we update."

What is our primary use case?

It is helpful to gather and process data.

How has it helped my organization?

We used to collect data on a small scale; with Cloudera Distribution for Hadoop, we now work with data on a large scale. It helps us store and protect the data and is helpful for processing.

What is most valuable?

The Cloudera Distribution for Hadoop is valuable.

What needs improvement?

There are multiple bugs when we update.

Buyer's Guide
Cloudera Distribution for Hadoop
December 2024
Learn what your peers think about Cloudera Distribution for Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
824,067 professionals have used our research since 2012.

For how long have I used the solution?

We have been using this solution for three and a half years, and we use version 6.3. It is deployed on-premises.

What do I think about the stability of the solution?

It is a stable solution.

What do I think about the scalability of the solution?

It is a scalable solution. It is the best solution for larger companies. There are about 3000 users, and there are medical teams for medical data with about 1000 users. We require about six people for maintenance and deployment.

How are customer service and support?

Cloudera's support is helpful, and I rate the technical support a nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup was straightforward and not an issue.

What was our ROI?

We have seen a return on investment.

What other advice do I have?

I rate this solution a nine out of ten, and it is a helpful solution.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1272822 - PeerSpot reviewer
AD - Associate Director at a financial services firm with 10,001+ employees
Real User
Feature rich and scalable with good support, but there are performance issues and the security could be improved
Pros and Cons
  • "The main advantage is the storage is less expensive."
  • "Currently, we are using many other tools such as Spark and Blade Job to improve the performance."

What is our primary use case?

We are using this solution for storing Big Data in one centralized location.

How has it helped my organization?

It has been helpful in allowing data storage in one centralized location with data lakes and all of the surrounding applications.

All of the data processes are being stored into the Big Data Lake.

What is most valuable?

It allows us to store huge amounts of data, which is an advantage.

They have BI (Business Intelligence) tools. There are many AI tools.

We are able to connect and analyze the data to get reports. The reports are very good.

The main advantage is the storage is less expensive.

What needs improvement?

The performance can be improved. We have experienced some performance issues. It is not as sophisticated as Oracle Sybase.

Currently, we are using many other tools such as Spark and Blade Job to improve the performance.

The setup is complex and could be simplified.

The security needs to be improved.

For how long have I used the solution?

I have been using this solution since 2015.

What do I think about the stability of the solution?

It's a stable solution.

What do I think about the scalability of the solution?

Scalability is good. The data is replicated; with Big Data, there is a replication factor by default.

We have grown over the years: we started with 10 nodes and have since expanded to a large number of nodes.

How are customer service and technical support?

Technical support is good. I have been able to learn from them. As a developer, I am learning every day.

I would rate the technical support a ten out of ten.

Which solution did I use previously and why did I switch?

Previously, we were using Oracle Sybase SQL. We switched because we have now introduced Big Data.

How was the initial setup?

The initial setup was complex.

It's not as simple as Oracle Sybase.

It's a complex architecture because you have raw data and many engines.

What's my experience with pricing, setup cost, and licensing?

When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive.

What other advice do I have?

I am a part of security and software development. 

We are currently considering migrating to the cloud, and planning on using Microsoft Azure, mainly for the Big Data component.

I would rate this solution a five out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Thishen Govender - PeerSpot reviewer
BI Manager at Discovery Health
Real User
Top 10
Open-source solution for intelligent data management and analysis
Pros and Cons
  • "Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
  • "The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."

What is our primary use case?

We make recommendations to clients for using different models of this solution to handle data intelligently.

How has it helped my organization?

It gives us the opportunity to offer more options to our clients and create better solution models.

What is most valuable?

We find CDSW useful and plan to use it as a one-stop application for model build and training. Currently, we use Zeppelin notebook and we want to gravitate to a single application for notebooks.

What needs improvement?

The Data Science Workbench doesn't support multiple languages. It needs to support multiple programming languages. We were trying to use Scala and Python for some solutions we wanted to deploy, but they didn't work properly. As a result, we had to come up with other workaround solutions. If the Data Science Workbench supported multiple programming languages, our workflow would be easier and the solutions could be better.

Another aspect we would like to see improved is better opportunities for integration. For example, we would like to use H2O machine learning, which is an open-source product, and Cloudera doesn't support H2O.

If they could support H2O and also deploy multi-language support on the Cloudera Data Science that would be great. But the biggest thing that would help right now is H2O support.

Finally, one other improvement I would suggest is integrating data privacy software into Cloudera. It is not quite complete in this aspect.

For how long have I used the solution?

We have been using the solution for approximately eight weeks.

What do I think about the stability of the solution?

From a stability point of view, we know that there is a new product coming out called Unity — or that is the proposed name of the product that merges Cloudera and Hortonworks. We know that this means that some changes will be happening within the environment. We don't believe that they will be radical changes that will affect existing software that we have. It should just be added functionality of Hortonworks integration. But we know at the same time that Cloudera support will be available if we need it.

What do I think about the scalability of the solution?

While we have not yet done a lot to scale the solution, we think it is going to be quite scalable because it's working on a distributed architecture.

We will probably start with 10 or 15 users once we roll the solution out into production, which will probably be at the end of this week. Afterward, the user base will grow by double-digit percentages. But that is just to start with. Over a few years, we plan to start thinking about rolling out our experiences to our international businesses as well. This would be a substantial increase in user base.

How are customer service and technical support?

At the moment, and from what we have been able to experience, technical support seems to be fine. I would rate it between seven and eight out of ten.

Which solution did I use previously and why did I switch?

We did not consider other solutions.

How was the initial setup?

The initial setup was difficult and we didn't like it. That is only because we implemented it with other software solutions outside Cloudera and needed to do the integrations. 

We are still battling with working out problems with some integrations after eight weeks. It's up and running, but we're optimizing, so that is why I'm saying it's probably medium to complex. But that was the situation for us and our particular needs. It may not be as complex for other businesses at all.

What about the implementation team?

We have been working through the implementation with our own team.

Which other solutions did I evaluate?

We did consider other opportunities. Although we are quite comfortable with our current solution we may look at Hortonworks again, but that is not yet confirmed. We believe, from what we have read and what has been advertised, that Hortonworks and Cloudera are going to eventually merge and become one product. According to some sources, it has already happened.

We're simply trying to get the best of both worlds.

What other advice do I have?

I would say that the product as it currently is should rate at an eight out of ten. The reason that score is not higher is because of the workarounds that we have to do when it comes to certain models that do not support using multiple programming languages. For example, a single notebook is inflexible if you want to use other programming languages.

As far as other advice for people considering this solution, I would say take a good look at your business need before you decide on this technology and which solution to choose. Make sure that you are not already able to solve for your particular, identified needs using your existing technology before even considering a change.

You want to be sure you're applying the technology to the right business case because of actual need and not just change for change's sake.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1850319 - PeerSpot reviewer
Vice President at a financial services firm with 10,001+ employees
Real User
Stores large volumes of data and makes log analytics, monitoring, and management easier, but its feature list is limited
Pros and Cons
  • "We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
  • "Cloudera Distribution for Hadoop has a limited feature list and a lot of costs involved."

What is our primary use case?

In my previous organization, we used Cloudera Distribution for Hadoop for compiling website logs and application logs. We used it for log analytics.

How has it helped my organization?

We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization.

What is most valuable?

The feature I found most valuable in Cloudera Distribution for Hadoop is the Cloudera Manager. It's a good component because it makes log management easy. It's really useful as a management and monitoring console.

What needs improvement?

The setup and administration were not easy with Cloudera Distribution for Hadoop. They could be improved.

The solution has a limited feature list, so having more features is something I'd like to see in the next release of Cloudera Distribution for Hadoop.

For how long have I used the solution?

I've been using Cloudera Distribution for Hadoop for two years. I'm still using it.

What do I think about the stability of the solution?

Cloudera Distribution for Hadoop seems to be a stable product.

What do I think about the scalability of the solution?

Cloudera Distribution for Hadoop is really easy to scale. We can add more servers to it, so it's scalable.

How are customer service and support?

I don't have experience contacting the technical support team of Cloudera Distribution for Hadoop.

How was the initial setup?

The initial setup for Cloudera Distribution for Hadoop was easy for us because we outsourced the work to the vendor. All the nitty-gritty was taken care of by them.

What about the implementation team?

We implemented Cloudera Distribution for Hadoop through the vendor. Deployment was done by an integrator. It usually doesn't take a lot of time. It usually takes just a day to deploy the solution.

Our implementation strategy for Cloudera Distribution for Hadoop was more into outsourcing. For example, the hardware, including its management, was outsourced, so the admin, data management, support, etc., were also outsourced. We were looking into having the application done in-house, with the team. We were looking at a one-year implementation plan to move more and more governance and data sets into Cloudera Distribution for Hadoop. Every quarter, we planned to have other features reintroduced into the platform.

Two people did the installation and two people did the deployment. It was deployed in a single location, and we initially had ten users of Cloudera Distribution for Hadoop.

What was our ROI?

It's tricky to derive the ROI from Cloudera Distribution for Hadoop, because in analytics, it's a little difficult to determine that this is the investment, and we're increasing the footprints and the revenue. It's very difficult to evaluate.

What's my experience with pricing, setup cost, and licensing?

Cloudera Distribution for Hadoop is expensive. There are a lot of costs involved. For example: apart from the standard licensing fees, there are support costs involved, and support could be for three years, five years, etc., so support is a pretty large part of the contract.

Which other solutions did I evaluate?

We didn't evaluate other options before choosing Cloudera Distribution for Hadoop.

What other advice do I have?

I'm using Cloudera Distribution for Hadoop.

The advice I would give to others looking into implementing or using Cloudera Distribution for Hadoop is for them to opt for a cloud variant, particularly something scalable for Azure, because of the ease of deployment and ease of setup. Procuring Cloudera Distribution for Hadoop is also a challenge unless the customer goes for its cloud version.

I would rate Cloudera Distribution for Hadoop six out of ten because of its limited features. If they can enhance their feature list, that would improve their score.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Thishen Govender - PeerSpot reviewer
BI Manager at Discovery Health
Real User
Top 10
Includes several useful proprietary tools
Pros and Cons
  • "CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
  • "It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."

How has it helped my organization?

CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools. 

What needs improvement?

Integration is one of the main things we struggle with because we're working with several other environments. For example, we've got an MPP environment outside the Hadoop environment. Many cloud-based platforms like Azure are fully integrated with technology that gives you MPP machine learning and data lakes all in one environment. We've got on-premises IBM solutions and Cloudera, so it isn't easy to integrate. It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform. And ideally, we should get as much raw data as possible into the platform before we can do the engineering, so we have machine learning and model training.

For how long have I used the solution?

I've been using CDH for about two years, or rather, I manage the team that uses it.

What do I think about the stability of the solution?

We haven't had any issues with Cloudera. It's a solid product. 

What do I think about the scalability of the solution?

Cloudera is dependable, and it's completely scalable.

How are customer service and support?

We have engaged the technical support based in the UK. My team hasn't worked with them directly, but the administration team has. To my knowledge, they're fairly responsive. 

What other advice do I have?

I rate Cloudera Distribution for Hadoop eight out of 10.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Atif Tariq - PeerSpot reviewer
Cloud and Big Data Engineer | Developer at Huawei Cloud Middle East
Real User
Top 5
Leaderboard
Easy-to-manage solutions with strong support, but faces challenges with upgrades, flexibility, and high costs
Pros and Cons
  • "Customer service and support were able to fix whatever the issue was."
  • "While the deployed product is generally functional, there are instances where it presents difficulties."

What is most valuable?

It offers a pre-built distribution. Support is available for the solution, whatever issue you face. Apart from that, they have platinum best practices for managing tools and related tasks.

What needs improvement?

The company is struggling to keep up with the upgrades of various components, and they are not willing to invest more in Cloudera.

The company is still switching from traditional methods to cutting-edge technology. While the deployed product is generally functional, there are instances where it presents difficulties. For example, the high SPs do not allow for metadata patching once it is created in the panel. This restriction limits our ability to make changes to the metadata.

I am aware that some companies are using open-source alternatives, which offer more flexibility. So, product maturity with cutting-edge technology will take more time.  

The primary concern is the cost. If you have the budget and are willing to pay for it, then it's fine. However, if we don't want to spend more money, it's not the best option.

For how long have I used the solution?

I've used it. 

How are customer service and support?

Customer service and support were able to fix whatever the issue was. However, they haven't used the solution themselves, so they cannot help with implementation, because they are using self-managed open-source technology. There's no order or feature development by now. So wherever there's a limitation from the product, such as the tool side or the kind of environment, they are not able to do anything.

How would you rate customer service and support?

Positive

What's my experience with pricing, setup cost, and licensing?

It is an expensive product.

What other advice do I have?

For using Cloudera, it depends on what you want to use it for. If you're looking for something easy to manage and operate in the cloud environment, then Cloudera is a good option. 

You don't need to do much; you can just deploy it and go. From my perspective, it depends on your use case and how you see your data needs, as well as how you manage cloud data technologies and work with different departments, teams, and identity features. If Cloudera satisfies your requirements and you have no issues with it, then go for it.

Overall, I would rate it a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Data Architect Manager at Unifonic
Consultant
Great being able to manage the security layer using the shared SDX which provides flexibility
Pros and Cons
  • "With a cluster available, you can manage the security layer using the shared SDX - it provides flexibility."
  • "This is a very expensive solution."

What is our primary use case?

This product is a framework for edge AI; it comes with multiple ecosystems as a project. I'm a senior data architect manager and we are consultants. We offer Cloudera to our customers, but we don't have a partnership with them.

What is most valuable?

The best feature is the Shared Data Experience (SDX) layer. If you have a cluster available on-prem or in the cloud, you can manage that security layer using the shared SDX, and it provides flexibility. New features are constantly being added.

What needs improvement?

The only thing that needs improvement is the cost. It's a very expensive solution, and that is one of the main reasons companies are not attracted to the product.

What do I think about the stability of the solution?

This product has been around for a long time so it's very mature and stable.

What do I think about the scalability of the solution?

The scalability is very good. 

How was the initial setup?

The initial setup has become easier, although you need a dedicated admin to maintain and manage the solution because it's a framework and not a single product. Deployment nowadays is much smoother with the PaaS offering in the public cloud, so you can carry out the deployment with an in-house team. The deployment only takes a day, but a company is unlikely to go with the default, so the solution needs fine-tuning, which can take a couple of weeks.

What's my experience with pricing, setup cost, and licensing?

For enterprise organizations that can bear the cost, it's a good solution. A smaller company wouldn't be able to afford the licensing fees. You can get a free trial for 60 days. They'll never have a community version because they're the only ones in the market offering this kind of framework. 

What other advice do I have?

I rate this solution nine out of 10. 

Disclosure: My company has a business relationship with this vendor other than being a customer: Consultant
Suresh_Srinivasan - PeerSpot reviewer
Co-Founder at FORMCEPT Technologies
Real User
Top 10
Has a useful file system and is scalable
Pros and Cons
  • "The file system is a valuable feature."
  • "The security of this solution could be improved. There should also be a way to have blockchain-enabled storage with HDFS."

What is our primary use case?

We use Cloudera Distribution for file storage. 

This solution is deployed on-premises.

What is most valuable?

The file system is a valuable feature. 

What needs improvement?

The security of this solution could be improved. There should also be a way to have blockchain-enabled storage with HDFS.

For how long have I used the solution?

I have been working with Cloudera Distribution for Hadoop for 11 years. 

What do I think about the stability of the solution?

This solution is stable. 

What do I think about the scalability of the solution?

This solution is scalable enough for us. 

We have created a product using HDFS, and when our engineers install it for themselves or for customers, we use this solution. There are about 15 to 20 people using it at any point in time.

How was the initial setup?

The installation is straightforward. We use command-line-based installation and we have created our own way of installing with our product. 

Depending on the customer or depending on internal usage, our DevOps engineer will install it or my development team will install it. 

What about the implementation team?

We are very well-versed in these tools, so we implemented it ourselves.

What's my experience with pricing, setup cost, and licensing?

I haven't bought a license for this solution. I'm only using the Apache license version. 

What other advice do I have?

I rate this solution an eight out of ten. Cloudera is a great product and, overall, there are many features. 

We actually use Cloudera HDFS underneath, and we build our product on top of it. So, we don't use the Cloudera versions of all the other products; we just use the Cloudera HDFS, nothing else.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.