
Amazon EMR vs Apache Hadoop comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Amazon EMR
Average Rating: 7.8
Number of Reviews: 21
Ranking in other categories: Hadoop (3rd), Cloud Data Warehouse (8th)

Apache Hadoop
Average Rating: 7.8
Number of Reviews: 37
Ranking in other categories: Data Warehouse (6th)
 

Featured Reviews

CG
Aug 1, 2023
Suitable for online deployments with a serverless architecture
EMR Serverless is useful for online deployments that require a serverless architecture. We were previously using a server-based architecture, but we have since switched to serverless. The product is very well-designed. It is easy to use, and it is very cost-effective. The solution eliminates…
AC
Apr 11, 2024
Handles huge data volumes and lets you create your own workflows and tables, but you need deeper knowledge
We primarily use Kafka for intensive data streaming. For batch-based processing, we use Hadoop. Additionally, we have our own custom batch catalog that likely helps prepare data for further analysis or use. We have many projects where our main data storage is done in Hadoop only. All projects take data from Hadoop to provide data insights and reports. Hadoop YARN for resource management is a really good aspect. It is very good for managing large data volumes. It allows us to monitor data processing effectively. We can see how much data there is, the consumption of RAM or ROM, and how resources are allocated. It's good for managing and previewing the scale of data processing.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The solution is pretty simple to set up."
"It has a variety of options and support systems."
"The solution is scalable."
"One of the valuable features about this solution is that it's managed services, so it's pretty stable, and scalable as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also has Hudi, so we are leveraging Apache Hudi on EMR for change data capture, so then it comes out-of-the-box in EMR."
"Amazon EMR's most valuable features are processing speed and data storage capacity."
"The initial setup is pretty straightforward."
"The solution helps us manage huge volumes of data."
"The security of the managed workflow and the managed services are the best features for us. Since we inherited their security model and it's all managed services, those are the key benefits for our clients."
"The performance is pretty good."
"The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable."
"Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
"It's good for storing historical data and handling analytics on a huge amount of data."
"Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
"The tool's stability is good."
"The most valuable feature is the database."
"The most valuable feature is scalability and the possibility to work with major information and open source capability."
 

Cons

"The solution can become expensive if you are not careful."
"As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data."
"The problem for us is it starts very slow."
"There is no need to pay extra for third-party software."
"The most complicated thing is configuring the cluster and ensuring it's running correctly."
"The dashboard management could be better. Right now, it's lacking a bit."
"The legacy versions of the solution are not supported in the new versions."
"The product must add some of the latest technologies to provide more flexibility to the users."
"The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks."
"Hadoop's security could be better."
"The solution needs a better tutorial. There are only documents available currently. There are a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
"The load optimization capabilities of the product are an area of concern where improvements are required."
"I would like to see more direct integration of visualization applications."
"Real-time data processing is weak. This solution is very difficult to run and implement."
"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
"The stability of the solution needs improvement."
 

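One reviewer above notes that out-of-memory failures can be bypassed by processing the data in chunks. The following is a minimal sketch of that general idea in plain Python, not Hadoop or EMR code. The column name `bytes`, the sample rows, and the helper names `iter_chunks` and `total_bytes` are hypothetical and chosen only for illustration.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield successive lists holding at most chunk_size rows each."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def total_bytes(csv_file, chunk_size: int = 1000) -> int:
    """Sum the hypothetical 'bytes' column one chunk at a time."""
    reader = csv.DictReader(csv_file)
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        # Only the current chunk is held in memory; it is dropped after the sum.
        total += sum(int(row["bytes"]) for row in chunk)
    return total


if __name__ == "__main__":
    # Hypothetical sample data standing in for a file too large to load at once.
    sample = io.StringIO("job,bytes\na,100\nb,250\nc,50\n")
    print(total_bytes(sample, chunk_size=2))
```

Memory use here is bounded by `chunk_size` rather than by the size of the input, which is the trade the reviewer describes: more passes over smaller pieces instead of one query over everything.
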
Pricing and Cost Advice

"Amazon EMR's price is reasonable."
"The price of the solution is expensive."
"There is a small fee for the EMR system, but major cost components are the underlying infrastructure resources which we actually use."
"I rate the tool's pricing a five out of ten. It can be expensive since it's a managed service, and if you are not careful, you can run into unexpected charges. You can make a mistake that costs you tens of thousands of dollars. That's happened to us twice, so I'm sensitive to it. We're still trying to work on that. Our smallest client probably spends a hundred thousand dollars yearly on licensing, while our largest is well over a million."
"There is no need to pay extra for third-party software."
"You don't need to pay for licensing on a yearly or monthly basis, you only pay for what you use, in terms of underlying instances."
"The product is not cheap, but it is not expensive."
"The cost of Amazon EMR is very high."
"There are no licensing costs involved, hence money is saved on the software infrastructure."
"The price of Apache Hadoop could be less expensive."
"It's reasonable, but there's room for improvement in cost-effectiveness."
"We just use the free version."
"Do take into consideration that data storage and compute capacity scale differently, and hence purchasing a "boxed" or "all-in-one" solution (software and hardware) might not be the best idea."
"The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
"We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
"If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
801,394 professionals have used our research since 2012.
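Several reviewers above describe Amazon EMR pricing as a small service fee on top of the underlying instances, billed for what is actually used. A rough sketch of that arithmetic follows. The per-node-hour rates, node count, and the function name `estimate_cluster_cost` are hypothetical placeholders, not actual AWS prices; real rates depend on instance type and region.

```python
def estimate_cluster_cost(nodes, hours, instance_rate, emr_fee_rate):
    """Estimate cluster cost, assuming both rates are charged per node-hour.

    instance_rate and emr_fee_rate are hypothetical figures supplied by the
    caller; this function does not look up real prices.
    """
    if min(nodes, hours, instance_rate, emr_fee_rate) < 0:
        raise ValueError("all inputs must be non-negative")
    node_hours = nodes * hours
    instance_cost = node_hours * instance_rate  # underlying infrastructure
    emr_fee = node_hours * emr_fee_rate         # managed-service fee on top
    return {
        "instance_cost": instance_cost,
        "emr_fee": emr_fee,
        "total": instance_cost + emr_fee,
    }


if __name__ == "__main__":
    # Hypothetical example: 10 nodes running for 8 hours.
    planned = estimate_cluster_cost(10, 8, instance_rate=0.20, emr_fee_rate=0.05)
    # The same cluster accidentally left running for 30 days.
    forgotten = estimate_cluster_cost(10, 24 * 30, instance_rate=0.20, emr_fee_rate=0.05)
    print(planned)
    print(forgotten)
```

Because the total scales linearly with node-hours, a cluster left running longer than intended multiplies the bill by the same factor, which is consistent with the reviewers' warnings about unexpected charges.
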
 

Top Industries

By visitors reading reviews
Amazon EMR
Financial Services Firm: 25%
Computer Software Company: 13%
Manufacturing Company: 9%
Educational Organization: 7%
Apache Hadoop
Financial Services Firm: 30%
Computer Software Company: 11%
University: 7%
Manufacturing Company: 6%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Amazon EMR?
Amazon EMR is a good solution that can be used to manage big data.
What is your experience regarding pricing and costs for Amazon EMR?
I rate the tool's pricing a five out of ten. It can be expensive since it's a managed service, and if you are not careful, you can run into unexpected charges. You can make a mistake that costs you...
What needs improvement with Amazon EMR?
The solution can become expensive if you are not careful.
What do you like most about Apache Hadoop?
It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming.
What is your experience regarding pricing and costs for Apache Hadoop?
I would rate the product's subscription-based pricing a six out of ten. It's reasonable, but there's room for improvement in cost-effectiveness.
What needs improvement with Apache Hadoop?
The product's availability of comprehensive training materials could be improved for faster onboarding and skill development among team members.
 

Also Known As

Amazon EMR: Amazon Elastic MapReduce
Apache Hadoop: No data available
 

Overview

 

Sample Customers

Amazon EMR: Yelp
Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
Find out what your peers are saying about Amazon EMR vs. Apache Hadoop and other solutions. Updated: September 2024.