As people are shifting from legacy solutions to other technologies, Amazon EMR needs to add more features that give more flexibility in managing user data.
Lead Data Engineer at Seven Lakes Enterprises, Inc.
Real User
Top 5
Aug 15, 2023
Interdependencies with third-party and open-source solutions should be improved. Changes to modules and strategies should be better handled and announced well in advance. If AWS started releasing AWS-certified or AWS-verified installations, similar to how OpenJet adds a specific version, that would generate even more confidence.
Technology Analyst at Proodlehospitatilityservicesltd
Real User
Top 20
Jul 27, 2023
The product's features for storing data in static clusters could be better. It would be helpful if they released a beta version to a limited group of users so they could get to know the product.
We have had issues with Boolean mathematical operations in version 2.x. They work in newer versions, but the older version of the solution does not support them, which is a compatibility issue that could be improved. In addition, legacy versions of the solution are not supported in the new versions.
The problem for us is that it starts very slowly, and the start time needs to improve. A long-running EMR cluster costs a lot of money. For a job that runs for one hour, a startup of about ten minutes is acceptable. However, if we want to run a job every five minutes, a ten-minute startup is a problem. It's a little strange that you cannot use the service for short runs. The support could also be better.
The cost is increasing, and we are looking into how to optimize our EMR costs. We're comparing Cloudera running on AWS with Amazon EMR. We don't have much control: when multiple users want to scale up, the cost increases, and we don't know how to restrict it.
Deputy CTO at a tech company with 51-200 employees
Real User
Jun 25, 2021
The most complicated thing is configuring the cluster and ensuring it's running correctly. You need to configure at least three Amazon policies to get authorization for all the instances, which is really complicated if you're new to the system. It's something that could be simplified for users. As for additional features, I'd like to see a better MLOps platform, but it's possible that one is already in production.
Dashboard management could be better; right now, it's lacking a bit. I'd like a better remote connection between my computer and the solution. We use multi-factor authentication, and at one point it became an issue because I lost my phone, which stopped me from accessing the system. We have to replicate all the infrastructure and ensure that we have scalability in production. We are hoping that Amazon will allow us to scale easily; however, we have not attempted to scale yet.
Amazon Elastic MapReduce (Amazon EMR) is a web service for processing vast amounts of data quickly and cost-effectively. It simplifies big data processing by providing a managed Hadoop framework that distributes and processes your data across dynamically scalable Amazon EC2 instances.
Spark jobs take longer on Amazon EMR than in our previous experience. This could be improved to make them more efficient.
The solution can become expensive if you are not careful.
The product's stability could be even better.
The product could be improved by automating the upsizing and downsizing of clusters.
Amazon EMR is continuously improving, but it could add something like out-of-the-box CI/CD or integration with Prometheus and Grafana.