Lead Data Engineer at Seven Lakes Enterprises, Inc.
Simplifies running big data frameworks, but needs to improve its modules
Pros and Cons
- "It has a variety of options and support systems."
- "Modules and strategies should be better handled and notified early in advance."
What is our primary use case?
EMR is used to analyze data for projects where we wish to ingest multiple data sources into our analysts' data lakes. EMR is also used to process millions of rows within a fixed span of hours.
What is most valuable?
It gives us multiple options, from infra to the WES and different tools. It has a variety of options and support systems. Plus, it makes life easy for our developers, whether in the cloud or on-prem. Related DevOps installations are quick and easy to maintain.
What needs improvement?
Interdependencies with third-party or open source solutions should be improved. Modules and strategies should be better handled, and we should be notified of changes well in advance. If AWS started releasing AWS-certified or AWS-verified installations tied to a specific version, just like OpenJet, that would generate even more confidence.
For how long have I used the solution?
I have been using Amazon EMR for three years.
Buyer's Guide
Amazon EMR
November 2024
Learn what your peers think about Amazon EMR. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
What do I think about the stability of the solution?
The solution is mostly stable, with certain data-related and code-related issues.
What do I think about the scalability of the solution?
Amazon EMR is a scalable solution. We have around twenty users, all of them engineers. Outside of that, we have our analytics product, which is used by most of our clients. We plan to increase our usage of the solution.
How are customer service and support?
The technical support team is good. The team is knowledgeable and quick to respond.
How was the initial setup?
The initial setup of Amazon EMR is straightforward if you know the basics and have knowledge of big data and architecture. The AWS documentation also helps with the deployment. Initially, the deployment takes a few hours and then it becomes easy.
The maintenance depends on your application and specific requirements, but for deployment you don't need a big team. One cloud engineer should be enough.
What about the implementation team?
The deployment can be done in-house.
What's my experience with pricing, setup cost, and licensing?
There is a small fee for the EMR system, but the major cost components are the underlying infrastructure resources that we actually use.
What other advice do I have?
My advice would be to do a dependency analysis to understand the limitations before planning to move in with Amazon EMR. I would rate the overall solution a six out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Cloud and Big Data Engineer | Developer at Huawei Cloud Middle East
An inexpensive solution that can be used to manage big data
Pros and Cons
- "Amazon EMR is a good solution that can be used to manage big data."
- "As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data."
What is our primary use case?
We use Amazon EMR to manage new data software like Hadoop.
What is most valuable?
Amazon EMR is a good solution that can be used to manage big data.
What needs improvement?
As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data.
For how long have I used the solution?
I have been using Amazon EMR for three years.
How are customer service and support?
The solution’s technical support is good.
How was the initial setup?
The solution's initial setup is very easy since it's all about cloud service deployment.
What's my experience with pricing, setup cost, and licensing?
Amazon EMR is not very expensive.
What other advice do I have?
I would highly recommend Amazon EMR to other users.
Overall, I rate Amazon EMR an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead Data Scientist at a manufacturing company with 10,001+ employees
A stable and scalable solution, but the initial setup is time-consuming
Pros and Cons
- "The solution is scalable."
- "The initial setup was time-consuming."
What is our primary use case?
The product is deployed on cloud.
What needs improvement?
The product could be improved by automating the upsizing and downsizing of the cluster.
For how long have I used the solution?
We have been using this solution for less than one year and are currently using one of the latest versions.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
The solution is scalable.
How are customer service and support?
I cannot rate customer service and support as we have not contacted them.
How was the initial setup?
The initial setup was time-consuming, and deployment took approximately 30 minutes. In addition, one person was required for deployment and maintenance.
What's my experience with pricing, setup cost, and licensing?
I cannot comment on licensing costs as I don't know the prices.
Which other solutions did I evaluate?
We evaluated IQ.
What other advice do I have?
I rate this solution seven out of ten. The solution is good but can be improved by making it more user-friendly and easy to set up.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Engineering Manager/Solution architect at a computer software company with 201-500 employees
Stable, scalable, and has all the necessary distributions
Pros and Cons
- "One of the valuable features about this solution is that it's managed services, so it's pretty stable, and scalable as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also has Hudi, so we are leveraging Apache Hudi on EMR for change data capture, so then it comes out-of-the-box in EMR."
- "Amazon EMR is continuously improving, but maybe something like CI/CD out-of-the-box or integration with Prometheus Grafana."
What is our primary use case?
A use case of this solution, for one of our clients with a large database of letters with addresses, is to predict if a person still lives at the listed address or if they have moved to another. We leverage EMR and SageMaker in AWS.
EMR is cloud-based and managed through the cloud.
What is most valuable?
One of the valuable features about this solution is that it's managed services, so it's pretty stable, and scalable as much as you wish. It has all the necessary distributions. With some additional work, it's also possible to change to a Spark version with the latest version of EMR. It also has Hudi, so we are leveraging Apache Hudi on EMR for change data capture, so then it comes out-of-the-box in EMR.
What needs improvement?
Amazon EMR is continuously improving, but it could use something like out-of-the-box CI/CD or integration with Prometheus and Grafana.
For how long have I used the solution?
I have been working with this solution for three years.
What do I think about the stability of the solution?
This solution is pretty stable.
What do I think about the scalability of the solution?
It's managed services, so it's scalable as much as you wish.
There are something like 40 to 50 people using EMR in my organization.
How are customer service and support?
We are an AWS Premier Partner, so we have all the necessary support and the ability to contact product teams.
Which solution did I use previously and why did I switch?
We didn't use any other products before implementing EMR. Some of our clients have Cloudera distributions, but we prefer EMR.
How was the initial setup?
The installation is straightforward because you can do it from the AWS Console or with Terraform. You can do it yourself.
What about the implementation team?
We implement this solution ourselves. On our team, we have admins, data engineers, DevOps engineers, and MLOps engineers. We have 40 or 50 data engineers.
What's my experience with pricing, setup cost, and licensing?
You don't need to pay for licensing on a yearly or monthly basis, you only pay for what you use, in terms of underlying instances.
What other advice do I have?
We have a range of clients in addition to the client with the large database of addresses. Another client is a large blockchain company, and we do analytics for them using bare metal and Hadoop, but not EMR. We're also doing Spark Streaming, Spark SQL, and some queries with Impala. We also work with a company that enriches data from mobile companies, namely the geolocations of cell phones, with a variety of data from other sources to predict profitability.
I rate Amazon EMR an eight out of ten. It's continuously improving, and now it's possible to manage EMR directly from SageMaker Notebook. It's continuously evolving. I would recommend EMR to others because it's pretty straightforward, so onboarding doesn't take much time.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Provides good scalability and is easily adaptable to the environment
Pros and Cons
- "Amazon EMR's most valuable features are processing speed and data storage capacity."
- "The product's features for storing data in static clusters could be better."
What is most valuable?
Amazon EMR's most valuable features are processing speed and data storage capacity.
What needs improvement?
The product's features for storing data in static clusters could be better. It would be helpful if they released a beta version for limited users to know about the product.
For how long have I used the solution?
I have been using Amazon EMR for one to two months.
What do I think about the stability of the solution?
It is a stable platform.
What do I think about the scalability of the solution?
It is a scalable platform and suitable for enterprise businesses.
How are customer service and support?
Amazon EMR's technical support services are good. We get an instant response and help over their chat portal.
How was the initial setup?
The platform's initial setup process is easy. The time taken for deployment depends on specific project requirements. It takes approximately an hour to complete.
What's my experience with pricing, setup cost, and licensing?
Amazon EMR's price is reasonable.
What other advice do I have?
Amazon EMR is easy to use and easily adaptable to the environment. It reduces the cost of storing data in a static cluster.
I rate it a nine out of ten.
Which deployment model are you using for this solution?
Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Science Engineer
Ability to easily and quickly resize the cluster is what really makes it stand out
Pros and Cons
- "The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions."
- "There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange."
What is most valuable?
The ability to resize the cluster is what really makes it stand out over other Hadoop and big data solutions. You can do it very easily and quickly. It is a managed service from AWS, so it removes a lot of the headaches of configuring the different environments for all the nodes in the cluster, and frees you up to do other things. You can set it up in minutes and it's very straightforward.
How has it helped my organization?
Well, I've been at two different companies and mostly I'll relate to my experience at HLI, Human Longevity, in San Diego. We used it for genomics. Genomics is a perfect use case for big data. We manage literally terabytes of data using some of the tools that are included with EMR like Spark and Hive. What we were able to do with these EMR tools - EMR is a collection of things - was to essentially set up a genomic data warehouse of people's samples and their sequenced DNA. And then we were able to quickly and easily pair that with annotation data which essentially just tells you what your genome means, like what that sequence, or what certain sections of those characters, means. That was just all very, very easy and it allowed everyone to know where, for instance, the most recent versions of certain data lived at all times, which is really important.
What needs improvement?
There were times where they would release new versions and it seemed to end up breaking old versions, which is very strange. It could have been a red herring, it could have been that something else changed in our environment that we never found out. But all of a sudden one day we couldn't run our scripts to start up clusters, the things we could do the day before. It was because they'd released a new version and we had to change things around.
They have listened to the community quite a bit. One thing we had raised with them is that they sometimes ship older versions of some of these tools, because the tools are open source and Amazon creates its own version of them. For instance, the version of Hive was pretty far behind for quite a while.
They've addressed that and I think it's partially because of customers like us telling them, "Hey, there are a lot of new features that should be available but aren't in your distribution."
For how long have I used the solution?
For close to two years now.
What do I think about the stability of the solution?
No, not really. I can definitely count on it to do what it needs to do. In the last year, whenever there was a problem, the cause was never anything but the data we were feeding into it.
You have to configure it. You may have to configure your cluster with bigger nodes or with more nodes if the shape of your data changes. That's going to be the nature of the beast with any kind of solution like this, so that's not EMR's fault.
What do I think about the scalability of the solution?
No, I have not encountered any scalability issues.
How are customer service and technical support?
I have not called them but we had a plan where, if we had an urgent case, we could email them. There were certain people in the organization who could actually call them for mission critical things in our department using EMR. We could basically either ask those people to do it or we could email them, and we could expect the response within a couple of hours.
We did have to do that when the new version came out and broke the old version. And then when there was one time it turned out to be the data that was a problem. There were so many logs and we were in a time crunch and searching through the logs, trying to figure out what was going on. So we emailed them, and both times they were very responsive, and they solved the problem very quickly.
Which solution did I use previously and why did I switch?
No, not really. The reason that we used it at that company - when I got there, that's what they were using. It was because my boss was very big on using those managed services from Amazon because it does give you an additional layer of insurance where, if something goes wrong at the level of the operating system for instance - the patching for the operating system for the nodes in the cluster - that's on Amazon to take care of that. We didn't have to focus on that so we could focus on actually getting the work done.
How was the initial setup?
It was one of those things where once you figured it out, you've got it. With this big data stuff, you put in a lot of work, trying to set something up and then you sort of set it and forget it.
Amazon has made it much easier since I first started with it. Once you get the cluster set up, if you set it up in the graphical interface, just point and click, you can actually copy a script that you could run from the command line to create that cluster. That is extremely helpful and that's the way that most people do it in production. You have a script and you run and it comes up. So it's a one-button kind of thing.
They tried to make it easy. It was fairly simple once you got through the complexity of everything that was involved with it.
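As a rough illustration of the "copy a script and run it from the command line" workflow described above, the sketch below assembles an `aws emr create-cluster` command in Python. This is not the reviewer's script. The cluster name, release label, applications, instance type, and instance count are placeholder assumptions, and the helper `build_create_cluster_command` is hypothetical. The code only builds and prints the command and does not call AWS.

```python
import shlex


def build_create_cluster_command(name, release_label, applications,
                                 instance_type, instance_count):
    """Return the argument list for an `aws emr create-cluster` call.

    All values passed in are illustrative placeholders, not the
    reviewer's real settings.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release_label,
        # Each application is passed as Name=<app> after --applications.
        "--applications", *[f"Name={app}" for app in applications],
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        # Relies on the default EMR roles already existing in the account.
        "--use-default-roles",
    ]


command = build_create_cluster_command(
    name="example-cluster",       # assumed name
    release_label="emr-6.15.0",   # assumed release
    applications=["Spark", "Hive"],
    instance_type="m5.xlarge",    # assumed instance type
    instance_count=3,
)

# Print a shell-safe version that could be pasted into a terminal.
print(shlex.join(command))
```

Keeping the command in a version-controlled script like this is one way to get the "one-button" cluster start-up the reviewer mentions. Pinning the release label explicitly may also reduce surprises when a new EMR version is released.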
Which other solutions did I evaluate?
Every now and then we would evaluate another vendor like Cloudera or MapR, but at the end of the day, we ended up sticking with EMR because nothing made a compelling enough argument to change.
We did try Cloudera. We liked Cloudera quite a bit, but between the fact that we already had such an investment in EMR and the fact was that Cloudera's cost - it's not that they weren't competitive - just wasn't enough of a cost savings to justify switching. And then MapR came in and tried to sell us on them, and none of us ever saw any benefit for using MapR over any other solutions.
Using Cloudera may have looked a little bit less expensive because Amazon EMR does charge extra fees per node based on the size of the node. When you're using EMR, it can be up to 16 to 32 times the actual original cost of the nodes. But we determined that that extra cost - for us, it was only about two to four times because of the size of the nodes we were using - the penalties weren't as great. And the benefit of not having to manage the infrastructure was enough that we said, "Well, if we want the Cloudera, we would have to do that to a certain extent, potentially." So, we said, "All right, well, it would be more work. So, let's just keep it with EMR."
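The per-node fee comparison above can be made concrete with a small calculation. The sketch below assumes a simple model in which each node incurs an hourly infrastructure rate plus a separate hourly EMR fee. All rates are made-up placeholders rather than real AWS prices, and the function name `emr_cost_breakdown` is hypothetical.

```python
def emr_cost_breakdown(node_count, ec2_hourly_rate, emr_hourly_fee, hours):
    """Split a cluster's cost into infrastructure and EMR fee components.

    Assumes every node has the same hourly rates, which is a
    simplification for illustration only.
    """
    if ec2_hourly_rate <= 0:
        raise ValueError("ec2_hourly_rate must be positive")
    infrastructure = node_count * ec2_hourly_rate * hours
    emr_fee = node_count * emr_hourly_fee * hours
    return {
        "infrastructure": infrastructure,
        "emr_fee": emr_fee,
        "total": infrastructure + emr_fee,
        # EMR fee expressed as a fraction of the underlying node cost.
        "surcharge_ratio": emr_hourly_fee / ec2_hourly_rate,
    }


# Placeholder numbers: 10 nodes for 8 hours at invented hourly rates.
breakdown = emr_cost_breakdown(
    node_count=10,
    ec2_hourly_rate=0.20,
    emr_hourly_fee=0.05,
    hours=8,
)
for item, value in breakdown.items():
    print(f"{item}: {value:.2f}")
```

Running the same calculation with the rates for your own instance sizes shows how much of the bill is the managed-service fee, which is the trade-off the reviewer weighed against managing the infrastructure themselves.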
What other advice do I have?
I would say take advantage of the documentation that exists. There are a lot of tutorials, and there's a really good community. The documentation is actually very thorough and very well-written, which is one of the greatest things about AWS. I don't know if this matters, but I'm an Associate-level Certified Developer and Solutions Architect. That doesn't mean I wouldn't criticize them if I had anything to criticize.
I gave it a nine out of 10 because nothing is perfect. Everything can always improve but, overall, it's extremely well thought out. The cost is a bit prohibitive sometimes, but the whole world of big data and cluster computing can be very daunting, especially for someone new getting into it as a developer, or from a business perspective. Amazon makes it about as easy as it can be to dip a toe in those waters.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Chief Engineer (Enterprise System Presales/Postsales) at a comms service provider with 10,001+ employees
Reliable, responsive support, and simple implementation
Pros and Cons
- "We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot."
What is our primary use case?
We are using Amazon EMR for data pipelines. We load our data into it and then transform it.
What is most valuable?
We are using applications, such as Splunk, Livy, Hadoop, and Spark. We are using all of these applications in Amazon EMR and they're helping us a lot.
For how long have I used the solution?
I have been using Amazon EMR for approximately one year.
What do I think about the stability of the solution?
Amazon EMR is reliable and stable.
How are customer service and support?
Whenever we have issues, we contact Amazon EMR support and receive a response. We are satisfied with the support.
How was the initial setup?
The initial setup of Amazon EMR is easy.
What's my experience with pricing, setup cost, and licensing?
The cost of Amazon EMR is very high.
What other advice do I have?
My advice to others is that before implementing a solution they should look around. There are multiple solutions available and one might be a better fit for their use case.
I rate Amazon EMR an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Deputy CTO at a tech company with 51-200 employees
Easily accessible to many dev teams, simple to use and very flexible
Pros and Cons
- "This is the best tool for hosts and it's really flexible and scalable."
- "The most complicated thing is configuring to the cluster and ensure it's running correctly."
What is our primary use case?
We use the solution to run Spark scripts on our system for combination algorithms on our website. It's a Hadoop cluster that does the calculations by executing Spark scripts. We have a cashback website and offer personalized recommendations to users, and EMR is used to do the calculations by accessing user data. We also use this product for building a data lake from our numerous primary data sources. We've used EMR to build the latest version of the data lake. All data is stored in an S3 bucket in Parquet format. I'm Deputy CTO of the company.
What is most valuable?
This tool is simple to use and it's really accessible to many dev teams. It's the best tool for hosts and it's really flexible and scalable which is necessary because we have a lot of data and some of our tasks take a lot of resources.
What needs improvement?
The most complicated thing is configuring the cluster and ensuring it's running correctly. You need to configure at least three Amazon policies to get authorization for all the instances, and if you're new to the system it's really complicated. It's something that could be simplified for users. For additional features, I'd like to see a better MLOps platform, but it's possible that it's already in production.
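The review does not say which three policies are meant. One plausible reading, offered here only as an assumption, is the three default IAM roles that EMR uses: a service role, an EC2 instance profile role, and an auto scaling role. The sketch below builds the trust policy document for each as plain Python dictionaries. The helper `trust_policy` is hypothetical and nothing is sent to AWS.

```python
import json


def trust_policy(*service_principals):
    """Build an IAM trust policy letting the given services assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": list(service_principals)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Assumed mapping of EMR's default role names to the services that
# assume them. Your account may use differently named custom roles.
emr_roles = {
    "EMR_DefaultRole": trust_policy("elasticmapreduce.amazonaws.com"),
    "EMR_EC2_DefaultRole": trust_policy("ec2.amazonaws.com"),
    "EMR_AutoScaling_DefaultRole": trust_policy(
        "elasticmapreduce.amazonaws.com",
        "application-autoscaling.amazonaws.com",
    ),
}

for role_name, document in emr_roles.items():
    print(role_name)
    print(json.dumps(document, indent=2))
```

Each role also needs permissions policies attached, which is where most of the configuration effort the reviewer describes tends to go. Those are omitted here because they depend on the data sources and services the cluster has to reach.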
For how long have I used the solution?
I've been using this solution for almost six years.
What do I think about the stability of the solution?
We haven't had any problems with stability.
What do I think about the scalability of the solution?
The solution is scalable. You can choose the cluster and instance type you need depending on your memory, storage, or GPU requirements. There are about five users in the company.
How was the initial setup?
The initial setup is simple when you know the tool, but when you don't, you need to look through the documentation. One of our team members carried out the deployment. We recently rebuilt our data lake and it took a day to get the right configuration.
Which other solutions did I evaluate?
We were on AWS for the web part so it was logical to take another AWS product, but today we are looking for an acceleration tool. The DevOps part takes a lot of time so we want to integrate with all the scope of MLOps and so we're looking at Databricks.
What other advice do I have?
I would recommend this solution and I rate it an eight out of 10.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Popular Comparisons
Apache Spark
Cloudera Distribution for Hadoop
HPE Ezmeral Data Fabric
Spark SQL
Hortonworks Data Platform
IBM Analytics Engine