
Apache Spark vs Hortonworks Data Platform comparison

 

Comparison Buyer's Guide

Categories and Ranking

Apache Spark
Ranking in Hadoop: 1st
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Compute Service (4th), Java Frameworks (2nd)

Hortonworks Data Platform
Ranking in Hadoop: 6th
Average Rating: 8.0
Reviews Sentiment: 6.1
Number of Reviews: 25
Ranking in other categories: Open Source Databases (15th), Data Management Platforms (DMP) (9th)
 

Featured Reviews

SurjitChoudhury - PeerSpot reviewer
Offers batch processing of data, and in-memory processing greatly enhances performance
Spark supports both real-time and batch processing. Immediate data, such as chat information, is processed in real time with Spark Streaming, while data that can be evaluated later is suited to batch processing. Our organization mostly uses batch processing; for streaming workloads, tools like Kafka are often integrated. In-memory processing greatly enhances performance, making Spark up to a hundred times faster than the earlier MapReduce approach. This improvement is achieved through optimization techniques such as caching, broadcasting, and partitioning, which help optimize queries for faster processing.
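As a rough illustration of the partitioning and broadcasting ideas mentioned in this review, here is a minimal plain-Python sketch. It does not use the Spark API: the function names `partition_by_key` and `broadcast_join`, the sample data, and the two-partition layout are all hypothetical, chosen only to show why copying a small lookup table to every partition avoids moving the large dataset between workers.

```python
def partition_by_key(records, num_partitions):
    """Hash-partition (key, value) records so equal keys land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def broadcast_join(partitions, small_table):
    """Join every partition against a small lookup table.

    The lookup table stands in for a broadcast value: each partition reads
    its own copy, so the large dataset never has to be reshuffled.
    """
    joined = []
    for part in partitions:  # in a cluster, each partition would run on a worker
        for key, value in part:
            if key in small_table:  # local dictionary lookup only
                joined.append((key, value, small_table[key]))
    return joined


# Hypothetical sample data: a larger event list and a small dimension table.
events = [("us", 3), ("de", 5), ("us", 7), ("fr", 1)]
countries = {"us": "United States", "de": "Germany"}

parts = partition_by_key(events, 2)
result = sorted(broadcast_join(parts, countries))
print(result)
```

The "fr" event is dropped because it has no match in the lookup table, which mirrors an inner join. In a real Spark job the engine handles partition placement and the distribution of the small table; this sketch only mimics the data flow.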
Prashant Singh - PeerSpot reviewer
A good technology with an easy setup but is at end of life
The solution is fairly simple to set up. It's not too complex or difficult. If you know the solution, it's easy. However, there is a learning curve. If you don't know anything about it, it can be more complex. You can typically deploy it within a week. We have five or six people capable of handling a deployment.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The product's deployment phase is easy."
"The deployment of the product is easy."
"The data processing framework is good."
"The product’s most valuable features are lazy evaluation and workload distribution."
"The solution is scalable."
"Spark can handle small to huge data and is suitable for any size of company."
"The distribution of tasks, like the seamless map-reduce functionality, is quite impressive."
"The product is useful for analytics."
"The upgrades and patches must come from Hortonworks."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"Ranger for security; with Ranger we can manage users’ permissions/access controls very easily."
"The product offers a fairly easy setup process."
"Ambari Web UI: user-friendly."
"The Hortonworks solution is so stable. It is working as a production system, without any error, without any downtime. If I have downtime, it is mostly caused by the hardware of the computers."
"The data platform is pretty neat. The workflow is also really good."
 

Cons

"They could improve the issues related to programming language for the platform."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"At the initial stage, the product provides no container logs to check the activity."
"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"Apache Spark provides very good performance. The tuning phase is still tricky."
"We are building our own queries on Spark, and it can be improved in terms of query handling."
"The solution needs to optimize shuffling between workers."
"It would also be nice if there were less coding involved."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases."
"It's at end of life and no longer will there be improvements."
"More information could be there to simplify the process of running the product."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"I would like to see more support for containers such as Docker and OpenShift."
"The version control of the software is also an issue."
"Since Cloudera acquired HDP, it's been bundled with CBH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS."
 

Pricing and Cost Advice

"Apache Spark is an open-source tool."
"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"Since we are using the Apache-licensed version of Spark rather than the Databricks version, support and bug resolution are often delayed. The Apache license is free."
"We are using the free version of the solution."
"The product is expensive, considering the setup."
"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"It is priced well and it is affordable."
"Currently, we are using the product in a sandbox environment, and there is no licensing. We might choose a licensing option once we get the results."
816,406 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

Apache Spark
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
University: 5%

Hortonworks Data Platform
Computer Software Company: 21%
Financial Services Firm: 14%
University: 9%
Government: 9%
 

Company Size

By reviewers: chart comparing Large Enterprise, Midsize Enterprise, and Small Business (values not captured)
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
What do you like most about Hortonworks Data Platform?
Distributed computing, secure containerization, and governance capabilities are the most valuable features.
What is your experience regarding pricing and costs for Hortonworks Data Platform?
I haven't done a price analysis specifically for HDP. However, when it was first introduced as Hadoop 2.0, there were a few use cases where the price was quite high. It was particularly expensive f...
What needs improvement with Hortonworks Data Platform?
Since Cloudera acquired HDP, it's been bundled with CBH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS. These platforms offer competitive storage solu...
 

Also Known As

Apache Spark: No data available
Hortonworks Data Platform: Hortonworks, HDP
 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Hortonworks Data Platform: Mayo Clinic, Symantec, Progressive Insurance, Noble Energy, Cardinal Health, Rogers, Mercy, Neustar, TRUECar, T-Mobile
Find out what your peers are saying about Apache Spark vs. Hortonworks Data Platform and other solutions. Updated: October 2024.