
Apache Spark vs Cloudera Data Platform comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Apr 1, 2025

 

Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 66
Ranking in other categories: Hadoop (1st), Compute Service (4th), Java Frameworks (2nd)

Cloudera Data Platform
Average Rating: 8.0
Reviews Sentiment: 6.4
Number of Reviews: 26
Ranking in other categories: Cloud Master Data Management (MDM) Solutions (10th), Data Management Platforms (DMP) (7th)
 

Featured Reviews

Ilya Afanasyev - PeerSpot reviewer
Reliable, able to expand, and handle large amounts of data well
We use batch processing. It works well with our formats and file versions. There's a lot of functionality. In our pipeline, each hour we copy the changes from MongoDB to a specific file. Previously, each run of the pipeline copied all of the data from every table, even when nothing had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.
Miodrag-Stanic - PeerSpot reviewer
Distributed computing improves data processing while upgrade complexity needs addressing
There are challenges with upgrading or updating various services like Spark, Impala, and Hive on on-premise and bare metal solutions. We aim to address these issues with a Kubernetes-based platform that will simplify the task of upgrading services. We also wish to implement lakehouse capabilities with Iceberg or Delta Lake frameworks.
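
The first featured review describes replacing a full hourly copy with a read of only the changed data. The sketch below illustrates that general idea in plain Python using a timestamp watermark. It is not the reviewer's pipeline and does not use MongoDB's or Spark's APIs; the `updated_at` field, the `changed_since` helper, and the sample records are all hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone


def changed_since(records, watermark):
    """Return the records modified after `watermark`, plus the new watermark.

    `records` is any iterable of dicts carrying an `updated_at` timestamp
    (a hypothetical field name chosen for this sketch).
    """
    changed = [r for r in records if r["updated_at"] > watermark]
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


# Hypothetical sample data standing in for a source table.
records = [
    {"_id": 1, "updated_at": datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)},
    {"_id": 2, "updated_at": datetime(2025, 1, 1, 11, 30, tzinfo=timezone.utc)},
    {"_id": 3, "updated_at": datetime(2025, 1, 1, 12, 15, tzinfo=timezone.utc)},
]

# Timestamp recorded at the end of the previous hourly run.
last_run = datetime(2025, 1, 1, 11, 0, tzinfo=timezone.utc)

# Only records 2 and 3 were modified after the previous run.
batch, last_run = changed_since(records, last_run)
```

Each run processes only the rows newer than the stored watermark instead of every table in full, which is the kind of change the reviewer credits for the smaller cluster and lower cost.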

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The main feature that we find valuable is that it is very fast."
"One of Apache Spark's most valuable features is that it supports in-memory processing; jobs execute very fast compared to traditional tools."
"It is useful for handling large amounts of data. It is very useful for scientific purposes."
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to easily handle large volumes of data with more power."
"The distribution of tasks, like the seamless map-reduce functionality, is quite impressive."
"There's a lot of functionality."
"DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort."
"The most valuable features are fault tolerance and easy binding with other processes like machine learning and graph analytics."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"It is a scalable platform."
"The Hortonworks solution is so stable. It is working as a production system, without any error, without any downtime. If I have downtime, it is mostly caused by the hardware of the computers."
"Cloudera Data Platform has significantly improved our data management."
"The upgrades and patches must come from Hortonworks."
"The scalability is the key reason why we are on this platform."
"Ranger for security; with Ranger we can manage users' permissions and access controls very easily."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
 

Cons

"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"One limitation is that not all machine learning libraries and models support it."
"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."
"They could improve the issues related to programming language for the platform."
"In data analysis, you need to take real-time data from different data sources. You need to process this in under a second, do the transformation in under a second, and all that."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"The main concern is the overhead of Java when distributed processing is not necessary."
"More ML-based algorithms should be added to it, to make it algorithm-rich for developers."
"Security and workload management need improvement."
"More information could be there to simplify the process of running the product."
"I would like to see more support for containers such as Docker and OpenShift."
"Deleting any service requires a lot of cleanup, unlike Cloudera."
"I work a lot with banking, IT and communications customers. Hortonworks must improve or must upgrade their services for these sectors."
"It would also be nice if there were less coding involved."
"The version control of the software is also an issue."
"For on-premise use, I would not recommend Cloudera Data Platform as it is expensive and complicated to upgrade."
 

Pricing and Cost Advice

"It is an open-source platform. We do not pay for its subscription."
"Apache Spark is an expensive solution."
"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"The cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."
"Since we are using the Apache-licensed version of Spark, not the Databricks version, support and bug resolution are delayed. The Apache license is free."
"Apache Spark is an open-source solution, and there is no cost involved in deploying the solution on-premises."
"It is an open-source solution, it is free of charge."
"Currently, we are using the product in a sandbox environment, and there is no licensing. We might choose a licensing option once we get the results."
"It is priced well and it is affordable."
849,475 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
Comms Service Provider: 6%
No data available
 

Company Size

By reviewers: Large Enterprise, Midsize Enterprise, Small Business (chart values not available)
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The Spark solution could improve in scheduling tasks and managing dependencies. Spark alone cannot handle sequential tasks, requiring environments like Airflow scheduler or scripts. For instance, o...
What do you like most about Hortonworks Data Platform?
Distributed computing, secure containerization, and governance capabilities are the most valuable features.
What is your experience regarding pricing and costs for Hortonworks Data Platform?
I haven't done a price analysis specifically for HDP. However, when it was first introduced as Hadoop 2.0, there were a few use cases where the price was quite high. It was particularly expensive f...
What needs improvement with Hortonworks Data Platform?
Since Cloudera acquired HDP, it's been bundled with CBH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS. These platforms offer competitive storage solu...
 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Cloudera Data Platform: Information Not Available
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: March 2025.