Apache Hadoop Reviews

Name: Apache Hadoop
Brand: Apache
Rating: 3.9 (40 reviews)

3.9 out of 5

40 reviews
89% willing to recommend

1,781 followers

What is Apache Hadoop?

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

Get the Apache Hadoop Buyer's Guide and find out what your peers are saying about Apache Hadoop, Teradata, Snowflake and more!

Apache Hadoop is the #7 ranked solution in top Data Warehouse solutions. PeerSpot users give Apache Hadoop an average rating of 7.8 out of 10. Apache Hadoop is most commonly compared to Teradata: Apache Hadoop vs Teradata. Apache Hadoop is popular among the large enterprise segment, accounting for 75% of users researching this solution on PeerSpot. The top industry researching this solution are professionals from a financial services firm, accounting for 33% of all views.

Helped 848,253 peers since 2012

Featured Apache Hadoop reviews

Sushil Arya

Software developer at Fiserv

When working with Kafka, I saw that the data came in an incremental order. The incremental data processing part is still not very effective in Apache Hadoop. If the data is already there, it can be processed very effectively, especially if the data is coming in every second. If you want to know the location of some data every second, then such data is not processed effectively in Apache Hadoop. I can say that one of the features where improvements are required revolves around the licensing cost of the tool. If the tool can build some licensing structures in a pay-per-use manner, organizations can get the look and feel of Apache Hadoop. Apache Hadoop can offer a licensing structure of the product that can be seen as similar to how AWS operates. Apache Hadoop can look into the capability of processing incremental data. The tool's setup process can be a scope of improvement. Also, it is not very simple because while doing the setup, we need to do all the server settings, including port listing and firewall configurations. If we look at other products on the market, then they can be made simpler. There are certain shortcomings when it comes to the product's technical support part, making it an area where improvements are required. The time frame for the resolution is an area that needs to be improved. The overall communication part of the technical support team also needs improvement.

Read full review

Syed Afroz Pasha

Head Of Data Governance at Alibaba Group

The tool provides functionalities to deal with data skewness or a diverse set of data. There are some configurations that it usually provides. In certain cases, the configurations for dealing with data skewness do not make any sense. We usually have to deal with it using a custom solution. Spark would deal with such cases efficiently. If Hadoop solves the issues the way Spark does, it can compete with Spark at the same level. Hive is a little slower than Spark. Spark is in-memory and parallel processing. Hive is not in-memory, but it is parallel processing.

Read full review

Juliet Hoimonthi

Manager at Robi Axiata Limited

What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe it's because I'm new to it. Sometimes it feels so tough to use, but it could be because of two aspects: one is my incompetency, for example, I don't know about all the features of Apache Hadoop, or maybe it's because of the limitations of the platform. For example, my team is maintaining the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly.

Read full review

Apache Hadoop mindshare

As of April 2025, the mindshare of Apache Hadoop in the Data Warehouse category stands at 5.0%, down from 5.7% compared to the previous year, according to calculations based on PeerSpot user engagement data.

Data Warehouse

PeerAnalyst reports based on Apache Hadoop reviews

Type	Title	Date
Category	Data Warehouse	Apr 19, 2025	Download
Product	Reviews, tips, and advice from real users	Apr 19, 2025	Download
Comparison	Apache Hadoop vs Snowflake	Apr 19, 2025	Download
Comparison	Apache Hadoop vs Oracle Exadata	Apr 19, 2025	Download
Comparison	Apache Hadoop vs Teradata	Apr 19, 2025	Download

Title	Rating	Mindshare	Recommending
Teradata	4.1	16.3%	87%	76 interviews Add to research
Snowflake	4.2	15.2%	97%	100 interviews Add to research

Valuable Features

Apache Hadoop is valued for its open-source, cost-effective nature, supporting large volumes and diverse data types. Its distributed file system, HDFS, effectively manages storage and scalability while supporting high-speed processing via Spark. Users appreciate flexibility, integration capabilities, and fault tolerance. Tools like Ambari simplify configuration, making it suitable for analytics, big data processing, and real-time streaming. The community-driven ecosystem is advantageous for organizations utilizing a variety of hardware and software environments.

"Hadoop is a distributed file system, and it scales reasonably well provided you give it sufficient resources."
"Hadoop can store any kind of data—structured, unstructured, and semi-structured—and presents it using the relational model through Hive."
"Its flexibility in handling and storing large volumes of data is particularly beneficial, as is its resilience, which ensures data redundancy and fault tolerance."

Room for Improvement

Apache Hadoop users highlight challenges with redundant data replication, rolling restarts, and inefficient I/O operations. Integration complexities and a steep learning curve hinder broader adoption. Suggestions include a more robust user interface, enhanced real-time processing, and simplified setup. Security needs strengthening, especially for large data volumes. Users seek better community support and documentation. There is a demand for incremental data processing improvements and additional training materials for quicker onboarding. Licensing concerns and pricing are recurring issues.

"The problem with Apache Hadoop arose when the guys that originally set it up left the firm, and the group that later owned it didn't have enough technical resources to properly maintain it."
"Hadoop lacks OLAP capabilities."
"Improvements in security measures would be beneficial, given the large volumes of data handled."

ROI

Users have noted that Apache Hadoop offers valuable storage and processing capabilities at a lower cost. Different users experience varying levels of return based on their data analytics sophistication and utilization. While some users find it challenging to calculate precise ROI figures, satisfaction can vary greatly. Cost savings are achieved by optimizing compute resource usage according to workload demands, but the return is dependent on each client’s unique application and the complexity of their analytic processes.

Pricing

Apache Hadoop pricing varies based on usage, largely free due to its open-source nature. Costs arise with enterprise-level implementations or specific distributions, such as Cloudera, which require licenses. The solution can be costly for smaller enterprises, but workable for large-scale data. Cloud deployments often offer cost advantages over on-premises setups. Users emphasize careful consideration of storage and compute scaling. Some distributions like Hortonworks are discontinued, impacting costs and licensing.

"The product is open-source, but some associated licensing fees depend on the subscription level."
"It's reasonable, but there's room for improvement in cost-effectiveness."
"For any big enterprise the costs can be handled, and it is suitable for big enterprises because the scale of data is large. For medium and small enterprises, the tool is on the high-price side."

Popular Use Cases

Apache Hadoop is used primarily for data storage and processing in data lakes and enterprise data hubs. It supports data aggregation for KPIs, big data analytics, real-time streaming with Spark, and integration of legacy systems. Organizations utilize it for data warehousing, ETL processes, AI/ML use cases, and securing internal applications. It also assists in migrating data warehouses, handling large data sets, and connecting to multiple data sources for efficient processing.

Service and Support

Apache Hadoop customer service varies with the distributor; Hortonworks receives high ratings. Documentation is more beneficial than some vendors. Community forums are helpful, offering solutions and support. Those using external vendors find them supportive. Challenges arise with integrating add-ons. Technical support is viewed positively, though some experience delays with complex issues. Many rely on in-house teams, online resources, or vendor support for assistance. Cloudera offers structured support; open-source nature means formal support is less available.

Deployment

Initial setup of Apache Hadoop is often complex, especially for larger node counts or when integrating additional tools. Ambari is recommended for simplifying the process, and cloud deployments can ease setup challenges. Expertise is crucial for a smoother installation, and security considerations add complexity. Learning curves are noted, with initial deployments ranging from a day to several months depending on environment and requirements. Apache Hadoop as a service can streamline the process for some users.

Scalability

Apache Hadoop is reputed for its robust scalability, especially when deployed in the cloud. Users report its seamless expansion by adding data nodes, managing large data volumes efficiently, and accommodating thousands of users. While easier to scale in cloud environments, it handles on-premise scaling well with adequate hardware. Businesses across various industries, including telecommunications and banking, find it flexible and essential for big data infrastructure and analytics processing, providing significant data management capabilities.

Stability

Many users report that Apache Hadoop is stable, especially with proper setup and configuration. Some encountered stability issues during initial setups or due to memory constraints, but these were manageable. Clusters often play a role in maintaining stability, and newer versions have shown improvements. Issues with starting or shutting down were mentioned, but these are rare. Despite some challenges with online data ingestion, Hadoop is generally seen as a stable choice by users.

These insights are based on the in-depth reviews provided by peers to help you make a better buying decision.

Download our Apache Hadoop Buyer's Guide for additional reliable information.

Review data by company size

By reviewers

By visitors reading reviews

Top industries

By visitors reading reviews

Financial Services Firm

33%

Computer Software Company

11%

University

Energy/Utilities Company

Manufacturing Company

Retailer

Government

Educational Organization

Comms Service Provider

Healthcare Company

Media Company

Real Estate/Law Firm

Non Profit

Transportation Company

Construction Company

Wholesaler/Distributor

Legal Firm

Logistics Company

Performing Arts

Insurance Company

Hospitality Company

Outsourcing Company

Pharma/Biotech Company

Compare Apache Hadoop with alternative products

Learn more about Apache Hadoop

Apache Hadoop customers

Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab