Amazon EMR is highly scalable, reliable, and cost-effective. It runs on Amazon EC2 for compute and uses Amazon S3 for cloud storage. It supports auto-scaling and integrates easily with Hadoop and HDFS as well as tools such as Spark, Hive, and Flink. Because the platform is offered as a managed service, users spend less effort on hardware management. Users cite its processing speed, data storage capacity, and security features as benefits. It supports frameworks for managing structured and unstructured data, which helps integrate data lakes, data stores, and data marts, and it handles both real-time and batch processing.
- "I rate Amazon EMR as ten out of ten."
- "Amazon EMR has multiple connectors that can connect to various data sources."
- "The security of the managed workflow and the managed services are the best features for us. Since we inherited their security model and it's all managed services, those are the key benefits for our clients."
Amazon EMR needs improvement in its user interface, web support, and cluster configuration. Users report a steep learning curve and version issues that affect stability and compatibility. They also ask for better cost optimization, faster start times, and stronger monitoring and debugging features. Broader platform integrations and more automated provisioning and scaling could improve efficiency. Some users also want enhancements to security, pricing, and support services, and say that support for newer technologies would increase flexibility and user control.
- "There is room for improvement with respect to retries, handling the volume of data on S3 buckets, cluster provisioning, scaling, termination, security, and integration between services like S3, Glue, Lake Formation, and DynamoDB."
- "Spark jobs take longer on Amazon EMR compared to previous experiences."
- "The solution can become expensive if you are not careful."