
Cloudera Distribution for Hadoop vs Spark SQL comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Cloudera Distribution for Hadoop
Ranking in Hadoop: 2nd
Average Rating: 8.0
Reviews Sentiment: 6.4
Number of Reviews: 50
Ranking in other categories: NoSQL Databases (8th)

Spark SQL
Ranking in Hadoop: 5th
Average Rating: 7.8
Reviews Sentiment: 7.6
Number of Reviews: 14
Ranking in other categories: None

Mindshare comparison

As of April 2025, in the Hadoop category, Cloudera Distribution for Hadoop has a mindshare of 25.0%, up from 23.0% a year earlier. Spark SQL has a mindshare of 9.8%, down from 11.6% a year earlier. Mindshare is calculated from PeerSpot user engagement data.

Featured Reviews

Rok Dolinsek - PeerSpot reviewer
Enables on-premise implementation with powerful data processing capabilities
This is the only solution that can be installed on-premises. Cloudera provides a hybrid solution that combines compute in the cloud and on-premises. It includes all the machine learning algorithms in the Spark machine learning library. All the functionality needed for a big data platform and ETL is on the platform, eliminating the need for other tools. It is scalable, ready for vertical scaling, and very powerful, offering numerous functionalities and configurations for generative AI.
Sahil Taneja - PeerSpot reviewer
Easy to use and does not require a learning curve
The Spark SQL documentation could be improved. It can be a bit unclear at times, and clearer documentation would make it easier for us to understand. SparkUI could also be improved to offer more advanced views of performance and queries.

Quotes from Members


Pros

Cloudera Distribution for Hadoop
"We also really like the Cloudera community. You can ask any question and will have your answer within a few hours."
"The tool can be deployed using different container technologies, which makes it very scalable."
"The solution's most valuable feature is the enterprise data platform."
"Very good end-to-end security features."
"The features I find most valuable are that the solution is easy to install and to work with. It starts with the installation, and from there on the management is very simple and centralized."
"The solution is stable."
"The most valuable feature is Kubernetes."
"I don't see any performance issues."

Spark SQL
"One of Spark SQL's most beautiful features is running parallel queries to go through enormous amounts of data."
"Overall, the solution is excellent."
"The stability was fine. It behaved as expected."
"Offers a variety of methods to design queries and incorporates the regular SQL syntax within tasks."
"Certain data sets that are very large are very difficult to process with Pandas and Python libraries. Spark SQL has helped us a lot with that."
"The team members don't have to learn a new language and can implement complex tasks very easily using only SQL."
"The solution is easy to understand if you have basic knowledge of SQL commands."
"Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline."

Cons

Cloudera Distribution for Hadoop
"The procedure for operations could be simplified."
"It is quite complicated to configure and install."
"The performance of some analytics engines provided by Cloudera is not that good."
"The solution is not fit for on-premise distributions."
"The price of this solution could be lowered."
"The tool doesn't support reporting, and relational databases are still the major source of reporting data. Apache Iceberg will be launched soon within the Cloudera cluster for analytical purposes. The Cloudera Machine Learning aspect could be tuned and enhanced to enable us to host some predictive analytics, machine learning, and AI use cases."
"The one thing that we struggled with predominantly was support. Because it was relatively new, support was always a big issue, and I think it's still a bit of an ongoing concern for the team currently managing it."
"The solution does not support multiple languages very well, which means users need to create workarounds to implement some solutions."

Spark SQL
"SparkUI could offer more advanced views of performance and queries."
"There are many inconsistencies in syntax across the different querying tasks."
"In terms of improvement, the only thing that could be enhanced is the stability of Spark SQL."
"It would be beneficial for aggregate functions to include a code block or toolbox that explains their calculations or supported conditional statements."
"There should be better integration with other solutions."
"In the next release, maybe visualization of some command-line features could be added."
"The solution needs to include graphing capabilities. Including financial charts would help improve everything overall."
"It would be useful if Spark SQL integrated with some data visualization tools."

Pricing and Cost Advice

Cloudera Distribution for Hadoop
"The tool is expensive... For the SMB market or customers whose environments are not that complex and do not have multiple systems running, Cloudera might not be a good option."
"The price is very high. The solution is expensive."
"Cloudera Distribution for Hadoop is expensive, with support costs involved."
"It is an expensive product."
"I wouldn't recommend CDH to others because of its high cost."
"The tool is not expensive."
"The pricing must be improved."
"The solution is expensive."

Spark SQL
"The solution is bundled with Palantir Foundry at no extra charge."
"There is no license or subscription for this solution."
"We use the open-source version, so we do not have direct support from Apache."
"The solution is open source and free."
"The on-premise solution is quite expensive in terms of hardware, cluster setup, memory, and resources. It depends on the use case, but in our case, with a shared cluster that is quite large, it is quite expensive."
"We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open source because project budgets are very small."
845,406 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

Cloudera Distribution for Hadoop
Financial Services Firm: 25%
Computer Software Company: 15%
Educational Organization: 12%
Manufacturing Company: 7%

Spark SQL
Financial Services Firm: 22%
Computer Software Company: 15%
Retailer: 8%
Manufacturing Company: 7%

Company Size

By reviewers: Large Enterprise, Midsize Enterprise, Small Business

Questions from the Community

What do you like most about Cloudera Distribution for Hadoop?
The tool can be deployed using different container technologies, which makes it very scalable.
What is your experience regarding pricing and costs for Cloudera Distribution for Hadoop?
The price for Cloudera is average, yet it is very good compared to other solutions. It can be deployed on-premises, unlike competitors' cloud-only solutions.
What needs improvement with Cloudera Distribution for Hadoop?
It is quite complicated to configure and install. Integrating the platform into an information system is always a challenge, especially when starting with on-premise implementation. Integrating wit...
What do you like most about Spark SQL?
Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline.
What is your experience regarding pricing and costs for Spark SQL?
We don't have to pay for licenses with this solution because we are working in a small market, and we rely on open-source because the budgets of projects are very small.
What needs improvement with Spark SQL?
In terms of improvement, the only thing that could be enhanced is the stability aspect of Spark SQL. There could be additional features that I haven't explored but the current solution for working ...
 

Sample Customers

Cloudera Distribution for Hadoop: 37signals, Adconion, adgooroo, Aggregate Knowledge, AMD, Apollo Group, Blackberry, Box, BT, CSC
Spark SQL: UC Berkeley AMPLab, Amazon, Alibaba Taobao, Kenshoo, Hitachi Solutions
Find out what your peers are saying about Cloudera Distribution for Hadoop vs. Spark SQL and other solutions. Updated: March 2025.