We use it to store data, and our team then builds reports on top of that data.
We primarily use Kafka for intensive data streaming, and Hadoop for batch-based processing. Additionally, we have our own custom batch catalog that helps prepare data for further analysis and use.
Many of our projects use Hadoop as their only primary data store, and all of them pull data from Hadoop to produce insights and reports.
Hadoop YARN for resource management is a really good aspect. It is very good at managing large data volumes and allows us to monitor data processing effectively: we can see how much data there is, how much memory and storage is being consumed, and how resources are allocated. It's good for managing and getting a clear view of the scale of data processing.
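As a minimal sketch of the kind of monitoring described above: YARN's ResourceManager exposes cluster-wide metrics over its REST API (GET /ws/v1/cluster/metrics), which can be queried to see running applications, memory, and vCore allocation. This is not our production tooling; the ResourceManager host below is hypothetical.

```python
# Sketch: query the YARN ResourceManager REST API for cluster resource metrics.
# Assumes the ResourceManager web address is reachable at RM_URL (hypothetical host/port).
import requests

RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical address


def print_cluster_metrics() -> None:
    # GET /ws/v1/cluster/metrics returns a "clusterMetrics" object.
    resp = requests.get(f"{RM_URL}/ws/v1/cluster/metrics", timeout=10)
    resp.raise_for_status()
    m = resp.json()["clusterMetrics"]

    print(f"Running applications : {m['appsRunning']}")
    print(f"Active nodes         : {m['activeNodes']}")

    total_mb = m["totalMB"]
    allocated_mb = m["allocatedMB"]
    if total_mb:
        pct = 100.0 * allocated_mb / total_mb
        print(f"Memory allocated     : {allocated_mb} MB / {total_mb} MB ({pct:.1f}%)")
    print(f"vCores allocated     : {m['allocatedVirtualCores']} / {m['totalVirtualCores']}")


if __name__ == "__main__":
    print_cluster_metrics()
```

The same numbers are visible in the ResourceManager web UI; pulling them over the API is simply convenient when you want to track resource allocation across runs.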