
Apache Spark vs SAP HANA comparison

 

Comparison Buyer's Guide

Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Hadoop (1st), Compute Service (4th), Java Frameworks (2nd)

SAP HANA
Average Rating: 8.4
Reviews Sentiment: 7.5
Number of Reviews: 83
Ranking in other categories: Data Virtualization (2nd), Embedded Database (4th), Relational Databases Tools (3rd)
 

Featured Reviews

SurjitChoudhury - PeerSpot reviewer
Offers batch processing of data and in-memory processing in Spark greatly enhances performance
Spark supports both real-time and batch data processing. Data that needs immediate handling, such as chat information, is processed in real time with Spark Streaming; data that can be evaluated later is suited to batch processing. Our organization mostly uses batch processing, and for streaming data, tools like Kafka are often integrated. In-memory processing in Spark greatly enhances performance, making it up to a hundred times faster than the previous MapReduce methods. This improvement is supported by optimization techniques like caching, broadcasting, and partitioning, which help queries run faster.
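The partitioning and broadcasting ideas mentioned in this review can be sketched in plain Python, with no Spark dependency. This is only an illustration of the concepts, not Spark's API or implementation; the data, field names, and the helper functions `hash_partition` and `broadcast_join` are hypothetical.

```python
from collections import defaultdict


def hash_partition(records, num_partitions, key):
    """Split records into num_partitions buckets by hashing the key field."""
    buckets = defaultdict(list)
    for record in records:
        buckets[hash(record[key]) % num_partitions].append(record)
    return [buckets[i] for i in range(num_partitions)]


def broadcast_join(partition, lookup):
    """Join one partition against a small lookup table held fully in memory."""
    return [
        {**record, "country": lookup[record["user_id"]]}
        for record in partition
        if record["user_id"] in lookup
    ]


# Hypothetical data: a larger "events" set and a small "users" lookup table.
events = [{"user_id": i % 4, "amount": i * 10} for i in range(8)]
users = {0: "DE", 1: "US", 2: "IN", 3: "BD"}

# Partition the large side; each partition can then be processed independently.
partitions = hash_partition(events, num_partitions=2, key="user_id")

# Only the small table is copied to every partition, so the large side
# never has to be moved between partitions to complete the join.
joined = [row for part in partitions for row in broadcast_join(part, users)]
```

In a real cluster the partitions would live on different machines and the lookup table would be shipped to each of them once; here everything runs in one process purely to show the data flow.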
Md Ashraful Islam - PeerSpot reviewer
Powerful in-memory processing capabilities and advanced analytics, enhancing real-time decision-making, but with challenges in reaching SAP for support
Continuous improvement is essential for these systems to stay competitive and relevant in the market. Currently, for support, we rely on local implementers who act as intermediaries. This makes it challenging for me to reach out to SAP directly. If, for instance, I encounter any issues during the ongoing project, expanding it to other companies may pose a challenge without direct access to SAP support.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"ETL and streaming capabilities."
"We use Spark to process data from different data sources."
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"The solution is very stable."
"DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort."
"The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"I found the solution stable. We haven't had any problems with it."
"It is difficult for me to narrow down what the best features are in SAP HANA because they work together to provide the overall functionality of the solution. However, the Fiori application is very good."
"Technically it resembles Oracle, but as a somewhat lighter version."
"The feature I found most valuable in SAP HANA is modeling. I also like that the solution has good integration and you can integrate it with any system, even third-party systems."
"We like that the product is both vertically and horizontally scalable, allowing us to do around 86 percent compression of documentation from 50 to seven terabytes."
"Eases management of databases."
"We've had good experiences with technical support."
"It is very stable and very innovative. You can integrate many applications with it."
"As I only worked part-time on SAP HANA, I did not have the opportunity to explore the advanced features of the solution. However, I did work with basic features, such as user administration and access controls for the accounting department. The feature that stood out to me the most was the Single Sign-On and user administration, backup, and server management. My experience with SAP HANA was mainly focused on basic server improvements."
 
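The compression figure quoted in the pros above (50 terabytes reduced to seven, described as around 86 percent) can be checked with a short calculation. This is a minimal sketch; the function name `compression_saving` is made up for illustration.

```python
def compression_saving(original_tb, compressed_tb):
    """Return the fraction of storage saved by compression."""
    return (original_tb - compressed_tb) / original_tb


# Figures taken from the reviewer's quote: 50 TB down to 7 TB.
saved = compression_saving(50, 7)
print(f"{saved:.0%}")  # prints "86%"
```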

Cons

"Apache Spark should add some resource management improvements to the algorithms."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate."
"It should support more programming languages."
"More ML based algorithms should be added to it, to make it algorithmic-rich for developers."
"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
"Apache Spark lacks geospatial data."
"While new users to this solution have the benefit of the new design, existing ERP users may experience issues with migrating legacy data. We would like to see development of ready-made tools that allow for easy mapping when upgrading."
"While SAP HANA is good, it would be better if it comes at a more affordable price."
"SAP HANA is not perfect and they could improve by having more options and more integration."
"I give the scalability of SAP HANA a six out of ten."
"SAP HANA's long run time to extract data could be improved upon."
"I would have rated the solution higher if this version was not missing some key features the newer version has."
"One notable issue is the difficulty in finding consultants with experience in the SuccessFactors product, a human resource management tool part of SAP's cloud-based solutions. For example, learning the Oracle database is straightforward. You can easily go to the Oracle website, download the database, install it on your laptop, and access technical resources and books."
"I would like to see improvement on the feedback from the road-map; it is currently extremely hard to get insight in this area."
 

Pricing and Cost Advice

"Since we are using the Apache Spark version, not the data bricks version, it is an Apache license version, the support and resolution of the bug are actually late or delayed. The Apache license is free."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"Licensing costs can vary. For instance, when purchasing a virtual machine, you're asked if you want to take advantage of the hybrid benefit or if you prefer the license costs to be included upfront by the cloud service provider, such as Azure. If you choose the hybrid benefit, it indicates you already possess a license for the operating system and wish to avoid additional charges for that specific VM in Azure. This approach allows for a reduction in licensing costs, charging only for the service and associated resources."
"It is an open-source platform. We do not pay for its subscription."
"The solution is affordable and there are no additional licensing costs."
"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"They provide an open-source license for the on-premise version."
"There is an annual payment needed to use the solution."
"Setup and licensing require planning and proper budgeting, as it is not cheap."
"Set up a consortium of consulting partners and hardware vendors to define your tech. Landscape TCO (total cost of ownership) and then approach the OEM for pricing (on-premise or on cloud or a hybrid model). Check if you can bring your own licenses for some of the existing application licenses on the new platform, to reduce TCO."
"The tool has a high price. I rate the solution’s pricing, one on a scale of ten, where one is expensive and ten is cheap."
"I would rate the pricing of the solution a ten on ten. The pricing for on-prem SAP HANA is on a yearly basis and expensive. I am not aware of the actual pricing but it is based on 64 Gigabytes. For instance, if you consider $10,000 for 64 Gigabytes, then you can calculate the cost for Terabytes in case you have one. You will have to buy expensive hardware as well which is quite a lot."
"The pricing for SAP HANA is high. You pay a lot for the license, and you also have to pay for some add-ons."
"The price of the solution could be reduced, it is expensive."
"We pay annually for the license of the solution."
816,406 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
University: 5%

Manufacturing Company: 15%
Computer Software Company: 12%
Financial Services Firm: 10%
Government: 6%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
What are the biggest benefits of using SAP HANA?
Based on my work with SAP HANA, the biggest benefit that it can bring to your business is total data management. This product is by SAP - a company that serves almost all needs a client may have co...
Is SAP HANA’s customer and technical support reliable?
We have been using SAP HANA for a fairly short period of time and have only taken advantage of their customer support. So far, we have not had issues that required specialized help from technical s...
Is SAP HANA difficult to set up and start using?
SAP HANA is fairly easy to set up, however, I do not think a complete beginner can do it. You certainly need some preparation - either you need to have experience with similar solutions, or with ot...
 

Also Known As

Apache Spark: No data available
SAP HANA: SAP High-Performance Analytic Appliance, HANA
 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
SAP HANA: Unilever, NHS 24, adidas Group, CHIO Aachen, Hamburg Port Authority (HPA), Bangkok Airways Public Company Limited
Find out what your peers are saying about Apache Spark vs. SAP HANA and other solutions. Updated: October 2024.