
Apache Spark vs Cloudera Data Platform comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Apr 1, 2025

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 66
Ranking in other categories: Hadoop (1st), Compute Service (4th), Java Frameworks (2nd)

Cloudera Data Platform
Average Rating: 8.0
Reviews Sentiment: 6.4
Number of Reviews: 26
Ranking in other categories: Cloud Master Data Management (MDM) Solutions (10th), Data Management Platforms (DMP) (7th)
 

Featured Reviews

Ilya Afanasyev - PeerSpot reviewer
Reliable, able to expand, and handle large amounts of data well
We use batch processing. It works well with our formats and file versions, and there is a lot of functionality. Each hour, our pipeline copies data from MongoDB to a specific file. Previously, every run copied all of the data from all of the tables, whether or not anything had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.
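The review does not say how the pipeline picks out changed data, so the following is only a minimal sketch of the general idea: keep the time of the previous run and copy only records modified after it. The `updated_at` field, the sample records, and the `changed_since` helper are assumptions made for illustration; they are not the reviewer's implementation and not a MongoDB or Spark API.

```python
from datetime import datetime, timezone


def changed_since(records, last_run):
    """Return only the records modified after the previous run."""
    return [r for r in records if r["updated_at"] > last_run]


# Hypothetical sample rows; the "updated_at" field is an assumption.
records = [
    {"_id": 1, "updated_at": datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"_id": 2, "updated_at": datetime(2025, 1, 1, 10, 30, tzinfo=timezone.utc)},
    {"_id": 3, "updated_at": datetime(2025, 1, 1, 10, 45, tzinfo=timezone.utc)},
]

# Time of the previous hourly run (also hypothetical).
last_run = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)

# Only records touched since the last run are copied.
delta = changed_since(records, last_run)
changed_ids = [r["_id"] for r in delta]
print(changed_ids)
```

With this approach each hourly run handles only the records that changed, which is the kind of reduction in copied data the reviewer credits for the lower cluster cost.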
Miodrag-Stanic - PeerSpot reviewer
Distributed computing improves data processing while upgrade complexity needs addressing
There are challenges with upgrading or updating various services like Spark, Impala, and Hive on on-premise and bare metal solutions. We aim to address these issues with a Kubernetes-based platform that will simplify the task of upgrading services. We also wish to implement lakehouse capabilities with Iceberg or Delta Lake frameworks.

Quotes from Members


Pros

"Features include machine learning, real time streaming, and data processing."
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"The product’s most valuable features are lazy evaluation and workload distribution."
"I feel the streaming is its best feature."
"One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them."
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"Ranger for security; with Ranger we can manage users’ permissions/access controls very easily."
"We use it for data science activities."
"Hortonworks should not be expensive at all to those looking into using it."
"Ambari Web UI: user-friendly."
"Cloudera Data Platform has significantly improved our data management."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"The product offers a fairly easy setup process."
"The upgrades and patches must come from Hortonworks."
 

Cons

"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"It would be beneficial to enhance Spark's capabilities by incorporating models that utilize features not traditionally present in its framework."
"The main concern is the overhead of Java when distributed processing is not necessary."
"The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable."
"It needs to provide an internal scheduler to schedule Spark jobs, with monitoring capability."
"Spark could be improved by adding support for other open-source storage layers than Delta Lake."
"The cost of the solution is high and there is room for improvement."
"It would also be nice if there were less coding involved."
"I would like to see more support for containers such as Docker and OpenShift."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"The version control of the software is also an issue."
"I work a lot with banking, IT and communications customers. Hortonworks must improve or must upgrade their services for these sectors."
"Since Cloudera acquired HDP, it's been bundled with CBH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS."
"For on-premise use, I would not recommend Cloudera Data Platform as it is expensive and complicated to upgrade."
 

Pricing and Cost Advice

"Licensing costs can vary. For instance, when purchasing a virtual machine, you're asked if you want to take advantage of the hybrid benefit or if you prefer the license costs to be included upfront by the cloud service provider, such as Azure. If you choose the hybrid benefit, it indicates you already possess a license for the operating system and wish to avoid additional charges for that specific VM in Azure. This approach allows for a reduction in licensing costs, charging only for the service and associated resources."
"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."
"The solution is affordable and there are no additional licensing costs."
"Apache Spark is an expensive solution."
"They provide an open-source license for the on-premise version."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"On the cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
"We are using the free version of the solution."
"Currently, we are using the product in a sandbox environment, and there is no licensing. We might choose a licensing option once we get the results."
"It is priced well and it is affordable."
849,475 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
Comms Service Provider: 6%
No data available
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The Spark solution could improve in scheduling tasks and managing dependencies. Spark alone cannot handle sequential tasks, requiring environments like Airflow scheduler or scripts. For instance, o...
What do you like most about Hortonworks Data Platform?
Distributed computing, secure containerization, and governance capabilities are the most valuable features.
What is your experience regarding pricing and costs for Hortonworks Data Platform?
I haven't done a price analysis specifically for HDP. However, when it was first introduced as Hadoop 2.0, there were a few use cases where the price was quite high. It was particularly expensive f...
What needs improvement with Hortonworks Data Platform?
Since Cloudera acquired HDP, it's been bundled with CBH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS. These platforms offer competitive storage solu...
 

Overview

 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Cloudera Data Platform: Information Not Available
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: March 2025.