I decided to give Cloudera's Manager software a try, and was pleasantly surprised at how simple it becomes to deploy a substantial Hadoop cluster.
I began by creating an automated kickstart installer for RHEL 6.2 (booting off a custom isolinux image created for this purpose), with all of the required packages, so that going from server power-on to a working 20+ node cluster takes less than 15 minutes. The number of concurrent node installs is limited by network and disk I/O bottlenecks on the deployment server. If you wanted to PXE boot the cluster in a production environment, you would ideally want a bank of deployment servers behind a load balancer.
Once the Manager is installed on the master node, you simply log into the administration webpage, and from there, add all of the hosts to deploy the cluster on. One nice discovery was that it takes advantage of regular expressions for host names or IP addresses, so you can literally create a cluster containing hundreds of nodes with a trivial amount of effort.
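To illustrate why pattern-based host entry saves so much effort, here is a minimal sketch that expands a bracketed numeric range into concrete host names. The `[01-20]` syntax, the `expand_hosts` helper, and the example host names are assumptions made for illustration; this is not Cloudera Manager's own implementation, and the pattern syntax it accepts may differ.

```python
import re
from itertools import product

# Matches a bracketed numeric range such as [01-20].
# This range syntax is an assumption for the sketch.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern):
    """Expand every [start-end] range in `pattern` into concrete host names."""
    # With two capture groups, split() yields: literal, start, end, literal, ...
    parts = _RANGE.split(pattern)
    literals = parts[0::3]
    starts = parts[1::3]
    ends = parts[2::3]

    # One list of zero-padded values per range; padding follows the width of `start`.
    choices = [
        [str(n).zfill(len(lo)) for n in range(int(lo), int(hi) + 1)]
        for lo, hi in zip(starts, ends)
    ]

    hosts = []
    for combo in product(*choices):
        name = literals[0]
        for value, literal in zip(combo, literals[1:]):
            name += value + literal
        hosts.append(name)
    return hosts


if __name__ == "__main__":
    # Hypothetical host names, used only as an example.
    for host in expand_hosts("node[01-03].example.com"):
        print(host)
```

A single pattern such as `node[001-200].example.com` would describe two hundred hosts, which is the kind of shortcut that makes adding a large cluster a trivial amount of typing.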
Once the software is deployed, you can select the roles for each of the servers. It's an incredibly painless deployment. That being said, it is not without its flaws.
One of the primary flaws is that all of the configuration and log files are in non-standard locations, and are split in non-standard ways. It's obvious from the way that the files are arranged that it simplifies programmatic deployment. It also makes it a bit harder for a human who is used to standard Hadoop deployments to figure out where everything is located.
Finally, I discovered a bug in one of the packaged software products, Oozie. One of the resource files, oozie-bundle-0.1.xsd, contains an invalid regular expression on line 22. I haven't tracked down the cause, but for some reason JDK 1.6.30 will parse that invalid regex, while JDK 1.7U2 exits with errors. Naturally, I was running JDK 1.7U2, so it took me a little extra time to debug the problem.
Overall, I quite liked Cloudera's Manager. It's certainly one of the better cluster deployment products I've seen.
We are a distributor for Hadoop. Our customers choose whether they would like to use Cloudera or another product.
Cloudera Distribution is deployed on-premise as well as on bare metal servers in AWS.
Cloudera is a very manageable solution with good support.
When you compare Cloudera with EMR, EMR has a lot of administrative features, so you don't need to manage the solution. Cloudera is not as easy, as it requires more DevOps resources than other solutions.
We have been offering this solution for five years.
Cloudera Distribution is stable.
This is a scalable solution. We have clients that have a large installation of Cloudera.
Technical support from Cloudera is fine.
The initial setup of Cloudera is difficult. After you have installed it once, it is not difficult to reproduce.
For a POC deployment, we required only one DevOps engineer. For larger-scale implementations, we also require a data engineer.
Cloudera requires a license to use.
We looked at EMR; however, Cloudera is better for on-premises deployments.
Cloudera is one of the best solutions for on-prem.
I would rate this solution an 8 out of 10.
Implementing a Hadoop cluster has become relatively straightforward using CDH. Administering it is also less complex. As a result, the effort spent in these areas is less than anticipated.
We have been using it for the last two years.
Following a single path for installation becomes confusing because there are multiple recommended approaches, e.g. parcels vs. packages.
Flume seems unstable and has to be restarted quite often.
None as such
We are mostly using Cloudera Express so we did not use their technical support. However, the Cloudera community is an active place and Cloudera representatives participate actively in understanding and resolving issues.
Cloudera is a prominent player in the Hadoop space and we did not have a need to adopt a different solution. However, we are also looking to work on Hadoop and MapR.
Following a single path for installation was initially confusing because there are multiple recommended approaches, e.g. parcels vs. packages. After a while, we managed to master it. However, knowledge of Cloudera Manager and Hadoop architecture is a must.
We have our own team of consultants who are proficient in implementing it. The high-level implementation steps remain the same; however, it is the environment-specific issues that are challenging.
We haven't really measured ROI.
Licensing price on per node basis for Cloudera seems to be pretty steep (based on the inputs we have received from Cloudera).
It is user friendly and installation is pretty straightforward. Cloudera Manager is a good tool to administer it. However, configuration for specific requirements is sometimes pretty complex.
You should have a team that is knowledgeable in Hadoop. Keep in mind that the product is still maturing, so there is a good chance that you will come across unexpected issues now and then.
Our core product is an insurance product, and its actuarial module is quite complex. Until now, our SMEs have collected data from various sources into Excel sheets and run the analytics through macros, which is a very crude way of doing the analysis. So we decided to use big data for this analysis.
That is still in the POC stage. As I mentioned, our analysts used to do the actuarial work in a spreadsheet, but after the Hadoop implementation they are gaining confidence that the analysis is now more appropriate and faster. We are now exploring a cloud implementation as well.
Keeping multiple copies of each file, and the MapReduce tools such as Pig and Hive: because of their flexibility, it is easy to develop applications with little or no knowledge of Java and SQL. Also, the capability to handle huge data sizes.
I don't have much to comment on the product side. However, other emerging technologies such as RPA, AI, GO, etc. have ample training materials with a variety of use cases that users can understand and align with their current requirements. I didn't see comparable training materials from Cloudera.
It seems quite stable; we didn't face any issues.
It is very stable; we didn't face any performance issues.
No. When we heard of Hadoop, we tried only that; that is, we tried to migrate from spreadsheets to Hadoop.
Very straightforward. A typical Windows-style installation: next, next, next clicks.
In-house.
Another department handles all of this, so I can't comment on that.
Not really.
Very solid. Excellent user experience. Good documentation. The Cloudera Manager is definitely a game changer. Packaging for Ubuntu is great for all the components.
Before the introduction of Cloudera Manager (which actually works), all the orchestration was done with scripts and Chef, and inexperienced team members had difficulty participating in maintenance. The Cloudera Hadoop manager eased the work.
More customization, better documentation for the API (basically it's the same for all Cloudera Hadoop components).
I've used it for two years.
No issues encountered.
No issues encountered.
No issues encountered.
Didn't use dedicated service or support. The documentation is a bit of a mess, but it is decent and sufficient.
Straightforward. The CDH VirtualBox image with a preconfigured environment helps for demonstration purposes.
We did it in-house.
We also looked at Hortonworks, but chose Cloudera because of my familiarity with it.
Do a comparison with Hortonworks, as it's always good to compare to another major vendor.
The most valuable feature for me are--
We used it to build an enterprise data hub.
Apache Kudu needs improvement. It's a real-time updatable database.
We used a vendor team to implement the solution.
We use the solution for data warehousing.
The product as a whole is good.
There are better solutions out there that have more features than this one.
I have just started using the solution.
I do not know of any issues with the stability of the solution.
I have an internal team that does maintenance for the solution.
The price could be better for the product.
We are a solution provider and this is one of the systems that we implement for our clients.
Our clients for this product are in the financial industry and they use it to perform cost analysis tasks.
The most valuable feature is Kubernetes.
The price of this solution could be lowered.
We have been using the Cloudera Distribution for Hadoop for five years.
It is a stable solution.
The Cloudera Distribution for Hadoop can be scaled. Our customers are enterprise-level companies and they have about 100 users for this solution.
We offer technical support for this solution to our customers.
We did not use another solution prior to this one.
The initial setup is straightforward.
The pricing is expensive.
Cloudera really has no competition.
I would rate this solution a nine out of ten.
Hi
Can I get Cloudera's Manager software for free, to test it and deploy it in a sandbox for a POC?