We use Cloudera Distribution for file storage.
This solution is deployed on-premise.
The file system is a valuable feature.
The security of this solution could be improved. There should also be a way to have blockchain-enabled storage on top of HDFS.
I have been working with Cloudera Distribution for Hadoop for 11 years.
This solution is stable.
This solution is scalable enough for us.
We have created a product using HDFS, and when our engineers install it for themselves or for customers, we use this solution. There are about 15 to 20 people using it at any point in time.
The installation is straightforward. We use command-line-based installation and we have created our own way of installing with our product.
Depending on the customer or depending on internal usage, our DevOps engineer will install it or my development team will install it.
We are very well-versed in these tools, so we implemented it ourselves.
I haven't bought a license for this solution. I'm only using the Apache license version.
I rate this solution an eight out of ten. Cloudera is a great product and, overall, there are many features.
We actually use Cloudera HDFS underneath, and we build our product on top of it. So, we don't use the Cloudera versions of all the other products, we just use the Cloudera HDFS, nothing else.
It is a good enterprise platform. It is easier and more stable. Additionally, it has the best proxy, security, and support features compared to open-source products.
The areas of improvement depend on the scale of the project. For banking customers, security features and an essential budget for commercial licenses would be the top priority. Data regulation could be the most crucial for a project with extensive data or an extra use case.
We have been using Cloudera Distribution for Hadoop for a few years.
I rate the product’s stability a ten out of ten.
We have ten customers using the product. They include data engineers, performance engineers, and environment engineers.
I rate its stability a ten out of ten.
The product has a support subscription for one year. We use technical support only for complex use cases. We work with their team as we have direct and quick access to contact them. It helps us better understand the technical and business-related queries of the customers.
The on-cloud version is easy to set up. However, it is complicated to process a large amount of data in an on-premises or hybrid setup. It is not a ready-to-use solution for telecom or finance technology. It requires the deployment of robust technology relying on network infrastructure.
The product’s price varies from project to project. It is more expensive than open-source solutions and could be cheaper. However, in some cases, it is less costly than open-source.
It is the best solution in the world at the moment. I advise others to go for it if you have an enterprise customer. I rate it a ten out of ten.
I'm part of the IT team at my company, and our primary use case of this solution is building infrastructure for advanced analytics, where we copy data from our data warehouse that is now our relational database. We copy it to the Cloudera Distribution for Hadoop and then analyze it with Python and machine learning.
The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation, and from there on the management is very simple and centralized.
I would like to see an improvement in how the solution helps me to handle the whole cluster. For example, when I drill down to a specific tool, such as Kafka, Cloudera Manager doesn't really help me. I then have to search Google for other Kafka knowledge and tools.
It is a very stable solution.
Not many people are currently using this solution at my organization, but I do believe it is scalable. I don't, however, have experience with upgrading or adding users.
My problem is that I started using Cloudera Express without technical support and then I purchased the Enterprise edition through another company. So now I don't really have access to Cloudera support, even though I hardly need to use it.
The initial setup was simple, but we had trouble implementing the cables in the Hadoop solution.
I had a bad experience connecting the Cloudera Distribution for Hadoop cluster to my other resources in the company, like Active Directory or the firewall. I would like the outside environment to be easier to handle. I rate this solution eight out of ten because it doesn't cover everything. It is a very complicated solution because it contains a lot of internal tools.
We mostly use the solution for big data analytics, data sharing, and reporting.
Cloudera, as a whole, is designed to provide organizations with solutions for big data. Cloudera is not one single component. It has many components related to storage, analytics, queries, and processing. All of these components work together to support big data implementation and analytics.
The performance of some analytics engines provided by Cloudera is not that good. So, we are using other analytics tools besides Cloudera.
I have been using the solution for more than four years.
We also use other tools like DataIQ and Apache Kudu.
I'm working with the solution myself. As a company, we are implementing it for other customers. Cloudera itself does not provide analytics. It prepares data for analytics tools that work with Big Data, such as Apache Spark, DataIQ, and Tableau.
Overall, I rate the solution a nine out of ten.
I use the solution because my data is too big. It is almost 100 TB.
The product provides many APIs to connect with other applications. The product provides better data processing features than other tools.
The dashboard could be improved.
I have been using the solution for seven years.
The tool is stable. I rate the stability an eight out of ten.
The tool is scalable. I rate the scalability an eight out of ten. It is easy to scale the product. Almost 20 to 25 people use the tool in our organization. We maintain the solution ourselves. We have nine engineers in our maintenance team.
The support is very, very helpful.
I have worked with Oracle. Oracle is too expensive.
It was pretty easy to install the product. It took us 20 minutes.
The product’s cost is higher compared to other tools. The pricing must be improved.
I recommend the solution to others. Overall, I rate the solution an eight out of ten.
We primarily use it only for big data support for analytical applications.
The feature that we've used quite intensively is Spark, specifically for how it can speed up some of the data processing.
The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it.
In the next release, I think it would be helpful if there was easier integration into all the other existing data back corners. It will be a big plus as it's a favorite capability. We had to go with a third-party application in order to achieve that.
The stability is problematic. We did encounter quite a lot of issues with the cluster going down quite frequently.
In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues. Currently, only about 10 people in total are using the solution. So we have about four business users and then four technical people. It's only limited to two environments.
I think there's a lot of room for improvement on the technical support side. Mostly because we don't have a lot of local skills in South Africa that could have supported the solution. It was an issue.
This is our first solution. We tested a bunch of other technologies, but that was our first one and we're still using it.
The initial implementation was straightforward from an application side. There weren't any hiccups. In terms of deployment time, it's going to be difficult to say, because most of it was related to hardware problems. Software took about two months to deploy. We required four people for deployment.
The pricing is very competitive. It's not bad.
We considered working with a few other companies, including IBM Bluemix.
I would recommend the solution provided that you have proven the business case and the technology. We have found that if you don't address the right business case, you end up buying a technology that doesn't necessarily solve your business problems.
I would rate the solution seven out of ten. The main reasons for not rating it higher are that the overall support is not great and that we've found some limitations in functionality. It wasn't mature when we started, but it's getting there and getting better.
Mostly HUE, Impala, Sqoop, and Hive. The impala-shell command is number one.
We are working on research for genomic data, looking for specific genes and variants. Even Hive was not good enough to process it correctly. Only with Impala are we getting results quicker.
Sometimes the heavy queries do not finish at all. It would be good to see the progress of a heavy script in the Impala shell, or to have some way to access it.
We started to use Cloudera about one-and-a-half years ago.
We are having some issues with stability and are speaking to Cloudera support.
It's acceptable.
Technical Support: It's acceptable.
We were trying AWS Impala as well, but Cloudera won as it had more functionality with HUE, Sqoop, and Solr as built-in functions.
We have struggled a bit in installing and configuring Cloudera Manager on the AWS cluster. For now, it is good.
We did the implementation only using our team and resources. It was a hard start, but an easy landing.
Cloudera is good for mid-sized to large companies, but small ones can use AWS Impala/HUE. Go to training, or you are going to spend many hours finding short answers. The Cloudera solution is big with good documentation, but you need to know what and where to read first.
We are dealing with data from the telecom industry. We were using an Oracle system but our volume has increased. We now have a lot of real-time data that needs to be transformed so that it can be made available and used.
The most valuable feature is Impala, the querying engine, which is very fast. We have been able to work with one terabyte of data in less than 20 minutes. The speed makes it easy for us to process all of the data that comes in, in time.
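To put that figure in perspective, here is a rough calculation of the average throughput it implies. This is only a sketch: it assumes decimal units (1 TB = 10**12 bytes), which the statement above does not specify, and the function name is made up for illustration.

```python
# Back-of-the-envelope throughput implied by "one terabyte in less than 20 minutes".
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).

def implied_throughput_mb_per_s(data_bytes: int, seconds: float) -> float:
    """Return the average throughput in megabytes per second."""
    return data_bytes / seconds / 10**6

ONE_TB = 10**12
TWENTY_MINUTES = 20 * 60  # seconds

# "Less than 20 minutes" makes this a lower bound on the aggregate rate.
lower_bound = implied_throughput_mb_per_s(ONE_TB, TWENTY_MINUTES)
print(f"At least {lower_bound:.0f} MB/s across the cluster")
```

This is an aggregate rate for the whole cluster, not per node, and the real number depends on the query and the hardware.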
The support is very good.
All of the data has automatic triple replication in order to secure integrity.
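Triple replication also has a storage cost worth planning for. The sketch below estimates the raw disk footprint for a given amount of logical data, assuming the HDFS default replication factor of 3. The helper name and the 10 TB example value are hypothetical, not figures from our cluster.

```python
# Rough capacity planning for replicated HDFS storage.
# Assumption: every block is stored replication_factor times (HDFS default is 3).

def raw_storage_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Return the raw disk space needed to hold logical_tb of data."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_tb * replication_factor

# Hypothetical example: 10 TB of logical data needs 30 TB of raw disk.
print(raw_storage_tb(10))
```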
The maximum block size is one gigabyte, which is an area of storage that could be improved.
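Block size matters because the NameNode keeps metadata for every block, so larger blocks mean fewer blocks to track for the same file. A minimal sketch of that arithmetic follows, assuming binary units and comparing the HDFS default of 128 MB with a one-gigabyte block. The function name and the 1 TiB file are illustrative only.

```python
import math

# How many HDFS blocks does a single file occupy at a given block size?
# Assumption: binary units (1 MiB = 1024**2 bytes); the last block may be partial.

MIB = 1024**2
GIB = 1024**3
TIB = 1024**4

def block_count(file_bytes: int, block_bytes: int) -> int:
    """Return the number of blocks needed to store a file of file_bytes."""
    return math.ceil(file_bytes / block_bytes)

# Illustrative 1 TiB file at two block sizes.
print(block_count(TIB, 128 * MIB))  # default 128 MiB blocks
print(block_count(TIB, GIB))        # 1 GiB blocks
```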
When we are upgrading CDH, there are many things that need to be upgraded and it would be helpful if it were bundled. As it is now, we have to upgrade many different things separately.
I have been working with the Cloudera Distribution for Hadoop for around two years.
It is a stable solution.
The scalability is good and it works on commodity hardware. One of the problems we have right now is that there is a lot of data and we're moving it from our Oracle solution. This means that there is a double cost, in terms of storage, during our transition to working with big data.
We are using a data lake that is a store for all of the data in our organization. There are more than 25 projects, with between 25 and 30 people in each one, for a total of almost 1,000 people. All of them are dependent on this solution.
Most of our users are technicians who have problems to solve using the data available to them. A couple of them are data scientists and the remainder are upper management, who do the analysis.
The technical support is very good. Whenever we open a ticket, we get support right away.
We did use another solution prior to this one but it could not keep up with our increase in data.
The suitability of this solution depends on the size of the data that you are going to be working with. If you are going to be working with a huge dataset that contains many gigabytes of data, then this is a good solution. For smaller datasets, you should also consider other technologies.
My advice for anybody who is implementing this solution is to take some time to learn it. Beyond that, be sure to contact support if you have any problems because they are very helpful.
I would rate this solution a seven out of ten.