
Amazon EMR vs Apache Spark comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Amazon EMR
Ranking in Hadoop: 3rd
Average Rating: 7.8
Reviews Sentiment: 7.2
Number of Reviews: 22
Ranking in other categories: Cloud Data Warehouse (11th)

Apache Spark
Ranking in Hadoop: 1st
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Compute Service (4th), Java Frameworks (2nd)
 

Mindshare comparison

As of January 2025, Amazon EMR's mindshare in the Hadoop category is 14.2%, down from 18.2% a year earlier. Apache Spark's mindshare is 18.4%, down from 21.5% a year earlier. Mindshare is calculated from PeerSpot user engagement data.
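The year-over-year change in the mindshare figures above can be worked out directly. The short Python sketch below is only an illustration: the `mindshare_change` helper and the dictionary layout are hypothetical names introduced here, and the only inputs are the four percentages quoted in the text.

```python
# Mindshare figures quoted above (percent of the Hadoop category).
mindshare = {
    "Amazon EMR": {"previous": 18.2, "current": 14.2},
    "Apache Spark": {"previous": 21.5, "current": 18.4},
}


def mindshare_change(previous, current):
    """Return (percentage-point change, relative change in percent)."""
    point_change = current - previous
    relative_change = point_change / previous * 100
    return round(point_change, 1), round(relative_change, 1)


# Compute both measures of change for each product.
changes = {
    name: mindshare_change(values["previous"], values["current"])
    for name, values in mindshare.items()
}

for name, (points, relative) in changes.items():
    print(f"{name}: {points:+.1f} points ({relative:+.1f}% relative)")
```

Both products lost share over the year. Amazon EMR's drop is larger in percentage points and also larger relative to where it started.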
 

Featured Reviews

Prashant Singh - PeerSpot reviewer
Seamless data integration enhances reporting efficiency and an easy setup
Amazon EMR has multiple connectors that can connect to various data sources. The service charges are based on processing only, depending on the resources used, which can help save money. It is easy to integrate with other services for storage, allowing data to be shifted to cheaper storage based on usage.
Ilya Afanasyev - PeerSpot reviewer
Reliable, able to expand, and handle large amounts of data well
We use batch processing. It works well with our formats and file versions. There's a lot of functionality. Each hour, our pipeline copies changes from MongoDB to a specific file. Previously, every run copied all of the data from all of the tables, even when nothing had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot."
"It has a variety of options and support systems."
"The security of the managed workflow and the managed services are the best features for us. Since we inherited their security model and it's all managed services, those are the key benefits for our clients."
"The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."
"The project management is very streamlined."
"The initial setup is pretty straightforward."
"Amazon EMR's most valuable features are processing speed and data storage capacity."
"The solution is scalable."
"The data processing framework is good."
"Spark can handle small to huge data and is suitable for any size of company."
"Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly."
"The most valuable feature is the Fault Tolerance and easy binding with other processes like Machine Learning, graph analytics."
"The solution is scalable."
"DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort."
"Apache Spark can do large volume interactive data analysis."
"I found the solution stable. We haven't had any problems with it."
 

Cons

"There is no need to pay extra for third-party software."
"The product's features for storing data in static clusters could be better."
"There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."
"Amazon EMR is continuously improving, but maybe something like CI/CD out-of-the-box or integration with Prometheus Grafana."
"The legacy versions of the solution are not supported in the new versions."
"The initial setup was time-consuming."
"The product must add some of the latest technologies to provide more flexibility to the users."
"Modules and strategies should be better handled and notified early in advance."
"Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available."
"At times during the deployment process, the tool goes down, making it look less robust. To take care of the issues in the deployment process, users need to do manual interventions occasionally."
"There were some problems related to the product's compatibility with a few Python libraries."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"They could improve the issues related to programming language for the platform."
"The initial setup was not easy."
 

Pricing and Cost Advice

"There is no need to pay extra for third-party software."
"The product is not cheap, but it is not expensive."
"I rate the tool's pricing a five out of ten. It can be expensive since it's a managed service, and if you are not careful, you can run into unexpected charges. You can make a mistake that costs you tens of thousands of dollars. That's happened to us twice, so I'm sensitive to it. We're still trying to work on that. Our smallest client probably spends a hundred thousand dollars yearly on licensing, while our largest is well over a million."
"Amazon EMR's price is reasonable."
"The cost of Amazon EMR is very high."
"Amazon EMR is not very expensive."
"You don't need to pay for licensing on a yearly or monthly basis, you only pay for what you use, in terms of underlying instances."
"There is a small fee for the EMR system, but major cost components are the underlying infrastructure resources which we actually use."
"Apache Spark is an expensive solution."
"We are using the free version of the solution."
"It is an open-source solution, it is free of charge."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"Apache Spark is an open-source tool."
"It is an open-source platform. We do not pay for its subscription."
"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
831,265 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 24%
Computer Software Company: 14%
Manufacturing Company: 9%
Educational Organization: 7%

Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 7%
University: 5%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Amazon EMR?
Amazon EMR is a good solution that can be used to manage big data.
What is your experience regarding pricing and costs for Amazon EMR?
The cost of Amazon EMR is a little bit expensive, especially considering the support package, which includes a gold package.
What needs improvement with Amazon EMR?
Spark jobs take longer on Amazon EMR compared to previous experiences. This aspect could be improved to make them more efficient.
What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
 

Also Known As

Amazon EMR: Amazon Elastic MapReduce
Apache Spark: No data available
 

Overview

 

Sample Customers

Amazon EMR: Yelp
Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Find out what your peers are saying about Amazon EMR vs. Apache Spark and other solutions. Updated: January 2025.