

Amazon EMR and Cloudera Data Platform compete in the big data processing and analytics space. Cloudera seems to have an upper hand in flexibility and open-source capabilities, while Amazon EMR excels in integration with AWS services.
Features: Amazon EMR integrates well with AWS services like EC2 and S3, offers scalability and ease of management without manual intervention, and supports data processing with tools like Hive and Spark. Cloudera Data Platform provides strong open-source capabilities, offers flexibility, and features a comprehensive Hadoop ecosystem with Ambari for cluster management.
Room for Improvement: Amazon EMR could improve its user interface, enhance cost management, and reduce deployment times. It also faces challenges with integration complexities involving third-party solutions. Cloudera could focus on enhancing governance features, ease of deployment, and improving the security and integration experience with cloud components.
Ease of Deployment and Customer Service: Amazon EMR is deployed on public cloud infrastructure, providing flexibility for cloud-based environments, and its customer service receives generally positive reviews. Cloudera Data Platform, used across hybrid and private cloud environments, is more complex to deploy but offers flexibility in return. Feedback on Cloudera's support is mixed, highlighting opportunities for improvement relative to Amazon EMR's more consistent support.
Pricing and ROI: Amazon EMR operates on a pay-as-you-go model, leading to potential high costs for sustained use but allowing savings through efficient cluster management. Cloudera Data Platform, rooted in open source, can be cost-effective for large-scale deployments despite a complex pricing model. The choice largely depends on specific business scenarios, with Amazon EMR's costs linked to its managed service nature and Cloudera's value tied to open-source flexibility.
A specific example of the positive impact of Cloudera Data Platform is the time saved and the improved performance, which are its main results.
In terms of return on investment, I see great changes in operational effectiveness measured by RTO when comparing on-premises solutions with cloud solutions.
We get call support, screen-sharing support, and immediate support, so there are no problems.
They help with billing, cost determination, IAM properties, security compliance, and deployment and migration activities.
Having a common chat channel between firms and service providers would make communication faster and more efficient.
Cloudera support is timely and responsive, adhering to the SLAs they provide.
I have communicated with technical support, and they are responsive and helpful.
Scalability can be provisioned using the auto-scaling feature, EC2 instances, on-demand instances, and storage locations like block storage, S3, or file storage.
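As a rough illustration of the auto-scaling mentioned above, the sketch below builds the request parameters for EMR managed scaling in Python. The cluster ID and capacity limits are placeholder assumptions, and the helper name `managed_scaling_request` is hypothetical. The dictionary follows the shape accepted by boto3's `put_managed_scaling_policy` call on the EMR client, which is deliberately not invoked here so the snippet runs without AWS credentials.

```python
def managed_scaling_request(cluster_id, min_units, max_units):
    """Build keyword arguments for the EMR managed scaling API.

    All values are illustrative; nothing here contacts AWS.
    """
    if not 0 < min_units <= max_units:
        raise ValueError("expected 0 < min_units <= max_units")
    return {
        "ClusterId": cluster_id,
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",  # scale by instance count
                "MinimumCapacityUnits": min_units,
                "MaximumCapacityUnits": max_units,
            }
        },
    }


# Placeholder cluster ID and limits, chosen only for the example.
request = managed_scaling_request("j-EXAMPLE12345", min_units=2, max_units=10)

# With credentials configured, the request could be sent as:
#   boto3.client("emr").put_managed_scaling_policy(**request)
```

Keeping the request as plain data makes the limits easy to review before anything is applied to a live cluster.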
CDP allows for easy, mostly automated scalability where I can schedule job workflows, fine-tune system resource metrics, and add nodes with just a click.
Integration with other tools works well for us and we successfully scaled the solution after two to three years without any issues.
For scalability, I rate Cloudera Data Platform at an eight out of ten as it is an on-premise solution.
Regular updates, patch installations, monitoring, logging, alerting, and disaster recovery activities are crucial for maintaining stability.
Sometimes a node goes down, but it automatically returns to a healthy state.
Cloudera Data Platform is stable functionality-wise, but it needs some bug fixes for security.
The cost factor differs significantly. When you run a Spark application on EKS, you run at the pod level, so you can control the compute cost. But in Amazon EMR, to run even one application, you have to launch entire EC2 instances.
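The reviewer's point about billing granularity can be sketched with simple arithmetic. The Python below compares paying for whole instances with paying only for the share of an instance a pod requests. Every number (vCPU counts, hourly price, duration) is a hypothetical assumption, not a real AWS price, and the model ignores memory, storage, service fees, and the fact that shared nodes still have to be paid for by someone.

```python
import math


def whole_instance_cost(vcpus_needed, vcpus_per_instance, price_per_instance_hour, hours):
    """Cost when capacity can only be bought in whole instances."""
    instances = math.ceil(vcpus_needed / vcpus_per_instance)
    return instances * price_per_instance_hour * hours


def pod_level_cost(vcpus_needed, vcpus_per_instance, price_per_instance_hour, hours):
    """Cost attributed to a pod that uses only a share of a node's vCPUs."""
    price_per_vcpu_hour = price_per_instance_hour / vcpus_per_instance
    return vcpus_needed * price_per_vcpu_hour * hours


# Hypothetical job: 6 vCPUs for 2 hours on 16-vCPU instances at 0.80 per hour.
whole = whole_instance_cost(6, 16, 0.80, 2)
pod = pod_level_cost(6, 16, 0.80, 2)
print(f"whole-instance billing: {whole:.2f}, pod-level share: {pod:.2f}")
```

The gap narrows as a job's request approaches a full instance, so the comparison matters most for small or bursty applications.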
There is room for improvement with respect to retries, handling the volume of data on S3 buckets, cluster provisioning, scaling, termination, security, and integration between services like S3, Glue, Lake Formation, and DynamoDB.
We aim to address these issues with a Kubernetes-based platform that will simplify the task of upgrading services.
Cloudera Data Platform should include additional capabilities and features similar to those offered by other data management solutions like Azure and Databricks, which are more flexible and in tune with current trends in AI and machine learning.
Costs are involved based on cluster resources, data volumes, EC2 instances, instance sizes, Kubernetes, Docker services, storage, and data transfers.
Initially, CDH had a straightforward pricing model based on nodes, but CDP includes factors like processors, cores, terabytes, and drives, making it difficult to calculate costs.
We find Cloudera Data Platform to be cost-effective.
Amazon EMR helps with scalability, real-time and batch processing of data, efficient handling of data sources, and managing data lakes, data stores, and data marts on file systems and in S3 buckets.
Amazon EMR provides out-of-the-box functionality because we can deploy and get Spark functionality over Hadoop.
What stands out the most in Cloudera Manager is SDX, which provides centralized control for governance, security, and data lineage across multiple sources.
Cloudera Data Platform has positively impacted our organization by reducing overall manual intervention, requiring fewer efforts and resources to build a big data cluster compared to traditional methods.
The Ranger integration makes it more flexible and reliable for me by allowing control over data access, specifying who can access at what level, such as table level, masking, or data layer level.
| Product | Market Share (%) |
|---|---|
| Amazon EMR | 12.8% |
| Cloudera Distribution for Hadoop | 21.9% |
| Apache Spark | 19.0% |
| Other | 46.3% |

| Product | Market Share (%) |
|---|---|
| Cloudera Data Platform | 6.2% |
| Palantir Foundry | 25.2% |
| Informatica Intelligent Data Management Cloud (IDMC) | 14.1% |
| Other | 54.5% |


| Company Size | Count |
|---|---|
| Small Business | 6 |
| Midsize Enterprise | 5 |
| Large Enterprise | 11 |

| Company Size | Count |
|---|---|
| Small Business | 8 |
| Midsize Enterprise | 5 |
| Large Enterprise | 21 |
Cloudera Data Platform offers a powerful fusion of Hadoop technology and user-centric tools, enabling seamless scalability and open-source flexibility. It supports large-scale data operations with tools like Ranger and Cloudera Data Science Workbench, offering efficient cluster management and containerization capabilities.
Designed to support extensive data needs, Cloudera Data Platform encompasses a comprehensive Hadoop stack, which includes HDFS, Hive, and Spark. Its integration with Ambari provides user-friendliness in management and configuration. Despite its strengths in scalability and security, Cloudera Data Platform requires enhancements in multi-tenant implementation, governance, and UI, while attribute-level encryption and better HDFS namenode support are also needed. Stability, especially regarding the Hue UI, financial costs, and disaster recovery are notable challenges. Additionally, integration with cloud storage and deployment methods could be more intuitive to enhance user experience, along with more effective support and community engagement.
Cloudera Data Platform is implemented extensively across industries like hospitality for data science activities, including managing historical data. Its adaptability extends to operational analytics for sectors like oil & gas, finance, and healthcare, often enhanced by Hortonworks Data Platform for data ingestion and analytics tasks.