Apache Hadoop Reviews and Pricing

Donghan Kim

R&D Head, Big Data Adjunct Professor at SK Communications Co., Ltd.

Jan 26, 2022

Not dependent on third-party vendors

Pros and Cons

"We selected Apache Hadoop because it is not dependent on third-party vendors."

"Real-time data processing is weak. This solution is very difficult to run and implement."

What needs improvement?

Apache Hadoop's real-time data processing is weak and is not enough to satisfy our customers, so we may have to pick other products. We are continuously researching other solutions and other vendors.

Another weak point of this solution, technically speaking, is that it's very difficult to run and difficult to smoothly implement. Preparation and integration are important.

In the next release, I would like to see better integration with other data-related products and solutions, along with additional functions such as API connectivity.

For how long have I used the solution?

We have been using Apache Hadoop since 2011.

Which solution did I use previously and why did I switch?

We selected Apache Hadoop because it is not dependent on third-party vendors. Previously, our main business unit was related to big vendors like IBM, Oracle, and EMC, etc. We wanted to have a competitive advantage in technology, so we selected the Apache project and used Apache open source.

What about the implementation team?

The solution was implemented through a local vendor team here in Korea.

Buyer's Guide

Apache Hadoop

July 2025

Free Report: Apache Hadoop Reviews and More

Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: July 2025.

DOWNLOAD NOW

861,524 professionals have used our research since 2012.

Which other solutions did I evaluate?

We evaluated IBM, Oracle, and EMC solutions.

What other advice do I have?

My position in the company falls under the research and development of new technologies and solutions. I investigate, research, download, and read information and reports as part of my job.

Our company has a big data business division, and we propose, develop, and implement things which are related to big data projects. We are using Cloud Hadoop open source versions, distributed versions, and commercial Hadoop distributed versions. We propose all these versions to our customers from any industry.

Our focus is on the public sector. Big data is our strong point in Korea. Our company is the leader in big data technology, including infrastructure and visualization. This is a solution we provide to our customers. We are also in partnership with IBM. Our main focus is on Apache Hadoop.

We provide Apache Hadoop to our customers. I work for a systems integrator and technical consulting company.

Overall, our satisfaction with this solution is so-so. We continuously investigate new technologies and other solutions.

The Hadoop open source version was implemented in 95% of our company's customer base. Our remaining customers had the local vendor's Hadoop platform package implemented for them.

Our company is in the big data business. Before the big data business back in 1976, we implemented BI (business intelligence), DW (data warehouse), EIS, and DSS (decision support system), so we are in partnership with IBM.

I don't have advice for people looking into implementing this solution because I'm not in the business unit. I'm in the research field. My role is to plan new technology and provide consultation to our customers for big data projects in the early stages.

My rating for Apache Hadoop from a technical standpoint is eight out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Satya Raju

Architect - software engineering at Innominds

Oct 22, 2024

Robust data processing and analytics with potential improvements for streaming capabilities

Pros and Cons

"Hadoop can store any kind of data—structured, unstructured, and semi-structured—and presents it using the relational model through Hive."

"Hadoop lacks OLAP capabilities."

What is our primary use case?

I use Hadoop as a data lake in an AIML solution, where it connects to various data sources and ingests data into Hadoop. It is utilized for processing large data volumes with various data sources such as RDBMS, file systems, Kafka for real-time streaming data, IoT, web sockets, and API metadata.

How has it helped my organization?

Hadoop provides a robust data lake functionality, allowing for the ingestion and processing of varied data types. It ensures no data loss through data replication and efficient transformation jobs handled in parallel, enhancing data analytics.
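
To illustrate the replication point, here is a minimal sketch of how keeping several copies of each block protects against node loss. It is a toy model, not HDFS's actual rack-aware placement policy: the round-robin placement, the node names, and the helper functions `place_blocks` and `readable_blocks` are all assumptions made for this example. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement for illustration only; real HDFS placement
    also considers racks and free space.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = {
            nodes[(i + k) % len(nodes)] for k in range(replication)
        }
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical data node names
placement = place_blocks(["blk_0", "blk_1", "blk_2", "blk_3", "blk_4"], nodes)

# With three replicas per block, losing two nodes still leaves a copy of each.
survivors = readable_blocks(placement, ["dn1", "dn2"])
```

With three replicas on four nodes, any two nodes can fail and every block remains readable; a third failure could start to lose blocks.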

What is most valuable?

Hadoop can store any kind of data—structured, unstructured, and semi-structured—and presents it using the relational model through Hive. The combination with Spark enhances data analytics capabilities.

What needs improvement?

Hadoop lacks OLAP capabilities. I recommend adding a Delta Lake feature to make the data compatible with ACID properties. Also, video and audio streaming import issues could be improved to ensure proper data validation.

For how long have I used the solution?

I have been working with Apache Hadoop for the last ten years.

What do I think about the stability of the solution?

The stability of Hadoop is good. I have been working with it for the last ten years and have not encountered significant stability issues.

What do I think about the scalability of the solution?

Hadoop offers good scalability with horizontal scaling, especially when deployed on cloud platforms like Cloudera, which takes care of scaling the infrastructure. On-premises requires the maintenance of data nodes.

How are customer service and support?

I rate the customer support at eight out of ten. It is satisfactory.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I have worked with traditional RDBMS which do not support analytics, querying, and data replication as efficiently as Hadoop.

How was the initial setup?

Setting up Apache Hadoop involves a learning curve and is of medium complexity. I rate the setup a seven out of ten.

What about the implementation team?

We need to provision the cluster deployment, including a master node and a coordinator node, and set up Spark and Hive for development.

Which other solutions did I evaluate?

I have also worked with HDFS, Hive, and Apache Spark with Scala and Java.

What other advice do I have?

As cloud technology is emerging, it is advisable to transition from traditional Hadoop to cloud-based solutions like AWS EMR and Azure, which offer better maintenance-free infrastructure management.

I'd rate the solution seven out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Last updated: Oct 22, 2024

reviewer901065

Partner at a tech services company with 11-50 employees

Oct 10, 2021

Highly elastic and stable, but it needs better security

Pros and Cons

"Hadoop is extensible — it's elastic."

"Hadoop's security could be better."

What is our primary use case?

There are several use cases for Hadoop. Sometimes it's used for data warehousing. Other times, it's analytics. In some cases, it's used to do transformation. For example, I have one client using it to decompress, compress, or encrypt data on ingestion, so he uses it like an ETL engine.
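
As a rough illustration of the "transform on ingestion" idea, the sketch below compresses records before they are stored and decompresses them on read. It is a single-process toy using only the Python standard library, not a Hadoop job; the function names `compress_on_ingest` and `read_back` and the sample records are assumptions for this example. Encryption is left out because the standard library has no symmetric cipher.

```python
import gzip
import json


def compress_on_ingest(records):
    """Serialize records as JSON lines and gzip them before storage."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(payload.encode("utf-8"))


def read_back(blob):
    """Decompress a stored blob and parse the JSON lines."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Hypothetical sample records standing in for ingested data.
records = [{"id": 1, "event": "login"}, {"id": 2, "event": "purchase"}]
blob = compress_on_ingest(records)
restored = read_back(blob)
```

In a real cluster the same transform would run in parallel over many input splits, which is what makes the ETL-engine use case attractive.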

What is most valuable?

Hadoop is extensible — it's elastic.

What needs improvement?

Hadoop's security could be better.

For how long have I used the solution?

I've been using Hadoop for about eight years. I'm not sure exactly.

What do I think about the stability of the solution?

Performance is one of the reasons people choose Hadoop.

What do I think about the scalability of the solution?

Scalability is one of Hadoop's strong suits.

How are customer service and support?

I've never had to use Hadoop support.

How was the initial setup?

The complexity of Hadoop's setup depends on the customer and their needs. However, most of my customers wind up using Hadoop as a service, which makes it very easy. It doesn't need much maintenance. My staff maintains multiple systems, so it's not like there would ever be somebody dedicated to one, and Hadoop is not a high-touch platform.

What other advice do I have?

I rate Hadoop seven out of 10. It's very good, but it could always be better. To anyone considering Hadoop, I recommend that you be mindful of what you're trying to achieve.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Disclosure: My company has a business relationship with this vendor other than being a customer. Implementer

reviewer1384338

Vice President - Finance & IT at a consumer goods company with 1-10 employees

Jul 15, 2020

Great micro-partitions, helpful technical support and quite stable

Pros and Cons

"The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding ever since we started using it, because we started out so small. Companies that need to scale shouldn't have a problem doing so."

"The solution needs a better tutorial. There are only documents available currently. There are a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."

What is our primary use case?

As an example of a use case, when I was a contractor for Cisco, we were processing mobile network data and the volume was too big. RDBMS was not supporting anything. We started using the Hadoop framework to improve the process and get the results faster.

What is most valuable?

The data is stored in micro-partitions, which makes processing very fast compared to RDBMS systems. Apache Spark processes data in memory, and it's much better than MapReduce.

Micro-partitions and the HDFS are both excellent features.
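
For readers unfamiliar with the MapReduce model that Spark is being compared against, here is a minimal word-count sketch of its three stages: map, shuffle, and reduce. It runs in one Python process and only illustrates the programming model; the function names and sample input are assumptions for this example. In Hadoop MapReduce the intermediate results between stages are written to disk, whereas Spark keeps them in memory where it can, which is the difference the reviewer is pointing at.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical sample input.
lines = ["Hadoop stores data", "Spark processes data in memory"]
counts = reduce_phase(shuffle(map_phase(lines)))
```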

What needs improvement?

I'm not sure if I have any ideas as to how to improve the product.

Every year, the solution comes out with new features. Spark is one new feature, for example. If they could continue to release new helpful features, it will continue to increase the value of the solution.

The solution could always improve performance. This is a consistent requirement. Whenever you run it, there is always room for improvement in terms of performance.

The solution needs a better tutorial. There are only documents available currently. There are a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning.

We would prefer it if users didn't just get pushed through to certification-based learning, as certifications are expensive. Maybe if they could arrange it so that the certification was at a lesser cost. The certification cost is currently around $2,500 or thereabout.

For how long have I used the solution?

I've been using the solution for four years.

What do I think about the stability of the solution?

We haven't had too many problems with stability. For the POC we used a small amount of data and we started with 10 nodes. We're gradually increasing it now to 40 nodes. We haven't seen any issues after the small teething period in the beginning. The configuration issues and the performance issues have subsided. Once we learned how to stack everything, it has been much better.

What do I think about the scalability of the solution?

The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding ever since we started using it, because we started out so small. Companies that need to scale shouldn't have a problem doing so.

We are supporting a multitenancy model and we get the data on supporting the users. I would say, per organization, we have eight to 10 users and probably have a total of around 40 users across the board.

How are customer service and technical support?

We started on the solution as a POC. Once we got into production, we had some minor issues. We got great support. They shared advice and helped us tweak some things in terms of the configurations. We've been satisfied with the level of service we've been provided.

Which solution did I use previously and why did I switch?

We have only ever used Apache Hadoop, or a version of it. When we looked at the commercial tier, there were Cloudera and Hortonworks. We started with Hortonworks because at that time we felt it was cost-effective. However, Cloudera has since merged with Hortonworks, and now it's all basically the same solution.

How was the initial setup?

The initial setup was a little complex the first time around. We were new to the system, and we didn't have any expertise at that time. Once we got some support and insights into how to work everything properly, it went more smoothly.

First, we started with a POC (proof of concept). It took a couple of days to understand and configure everything. When we went to production, deployment took a couple of hours, and we put into practice everything we learned from the POC.

There's definitely a learning curve. It's stable for us now.

We have a team of developers doing multiple tasks on the solution, and a few of them take care of Hadoop, so we do have a few people handling maintenance.

What about the implementation team?

As we were new to the solution, we found we needed some outside assistance to guide us. However, that was for the POC. In the end, I did it myself.

What other advice do I have?

We're just a customer. We don't have a business relationship with Hadoop.

My day-to-day job is data modeling and architecting.

Originally we used it as an open-source solution. We downloaded it, then we went for a commercial version of it.

In terms of advice, I'd tell other potential users that whether the solution is right for them depends on a few items. If the data volume is too big, it's IoT data, or the stream of data is too much, this solution can handle it and I would definitely recommend Apache Hadoop.

For the last 18 months, I've been working with Snowflake on a data lake project, and I am really impressed with it. I got a certification, and we started using Snowflake for our data lake environment.

I'd rate the solution eight out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Yogesh Thakkar

Business data analyst at RBSG Internet operations

Sep 19, 2022

A low-cost solution that allows us to download data, but has latency issues when running queries

Pros and Cons

"One valuable feature is that we can download data."

"I think more of the solution needs to be focused around the parallel processing and retrieval of data."

What is our primary use case?

We use the solution as a data lake for our customer payment and SaaS information. We get data from various sources and then utilize and leverage that data.

What is most valuable?

One valuable feature is that we can download data. Another is that it is a low-cost solution. Hadoop has also made it feasible to have all the data available in one area.

What needs improvement?

We have plans to increase usage, and this is where we've realized that when we have all these clusters and we're running queries and analyzing, we face some latency issues. I think more of the solution needs to be focused around the parallel processing and retrieval of data.

For how long have I used the solution?

I have been using this solution for about seven or eight years.

What do I think about the stability of the solution?

This is a stable product.

What do I think about the scalability of the solution?

The scalability of the solution is good. Approximately 100 people are currently using this solution within our company.

How are customer service and support?

I would rate the tech support as a four out of five.

How would you rate customer service and support?

Positive

What other advice do I have?

I would recommend this product to others. I would rate it as an eight out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

DulalMali

Data Analytics Practice head at bse

Apr 29, 2022

Stable, highly scalable, but integration could improve

Pros and Cons

"The scalability of Apache Hadoop is very good."

"Integrating Apache Hadoop with the many different technologies within your business can be a challenge."

What needs improvement?

Integrating Apache Hadoop with the many different technologies within your business can be a challenge.

For how long have I used the solution?

I have been using Apache Hadoop for approximately nine years.

What do I think about the stability of the solution?

Apache Hadoop is stable.

What do I think about the scalability of the solution?

The scalability of Apache Hadoop is very good.

What's my experience with pricing, setup cost, and licensing?

The price of Apache Hadoop could be less expensive.

What other advice do I have?

My advice to others is if you have a strong engineering team then this solution is excellent.

I rate Apache Hadoop an eight out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1464630

Founder & CTO at a tech services company with 1-10 employees

Dec 15, 2020

Processes large data sets across clusters of computers

Pros and Cons

"Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."

"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."

What is our primary use case?

We mainly use Apache Hadoop for real-time streaming. Real-time streaming and integration using Spark streaming and the ecosystem of Spark technologies inside Hadoop.

What is most valuable?

I actually like most of the capabilities, but I think Spark has added reposit capabilities on top of the Hadoop ecosystem. The Spark area includes the capabilities that I like the most with Hadoop.

What needs improvement?

I don't have any concerns because each part of Hadoop has its use cases. To date, I haven't implemented a huge product or project using Hadoop, but on the level of POCs, it's fine.

The community of Hadoop is now a cluster, I think there is room for improvement in the ecosystem.

From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective.

For how long have I used the solution?

I have been using this solution for roughly five years.

What do I think about the stability of the solution?

I've never experienced any bugs or glitches.

What do I think about the scalability of the solution?

Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability.

How was the initial setup?

It's a well-known fact that Hadoop's configuration is pretty hard.

What other advice do I have?

Usually, people need to study and prepare for a few use cases and compare multiple ecosystems before choosing one. When people think of using a big data solution, Hadoop comes to mind. For certain use cases, Hadoop is comparable with other technologies. For example, when building a sort of real-time data warehouse (an enterprise data hub), people don't think about using Hadoop directly. People often use solutions like Apache Druid for building one.

At the end of the day, you need to compare existing technologies against your use cases. You need to study your use case and select the technology inside of Hadoop that will fit it. You may find another ecosystem that solves your problem. Just keep in mind that Hadoop is not the only solution; there are a lot of solutions. It depends on the use case.

Overall, on a scale from one to ten, I would give Hadoop a rating of eight.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1433400

Technical Lead at a government with 201-500 employees

Oct 20, 2020

Good distributed processing and performance, but very expensive

Pros and Cons

"The performance is pretty good."

"The solution is very expensive."

What is most valuable?

The distributed processing is excellent.

On the solution, Spark is very good.

The performance is pretty good.

What needs improvement?

For the visualization tools, we use Apache Hadoop and it is very slow.

It lacks some query language. We have to use Apache Linux. Even so, the query language still has limitations with just a bit of documentation and many of the visualization tools do not have direct connectivity. They need something like BigQuery which is very fast. We need those to be available in the cloud and scalable.

The solution needs to be powerful and offer better availability for gathering queries.

The solution is very expensive.

For how long have I used the solution?

I've been using the solution for about five years now.

What do I think about the stability of the solution?

The solution is stable and offers good performance. It doesn't crash or freeze. It's not buggy at all.

What do I think about the scalability of the solution?

You can scale the solution if you need to. We find that it's pretty easy to expand it out.

There were about 13-20 people using it at any given time.

How are customer service and technical support?

The technical support was pretty good. It's my understanding that the company was pretty satisfied with the level of support they received. They were knowledgeable and responsive.

Which solution did I use previously and why did I switch?

I've also worked with MySQL and Postgres. Hadoop is more for analytical processing. While the others claim to have a distributor, Hadoop is far better in that regard. It's excellent compared to other options.

How was the initial setup?

The initial setup was pretty straightforward. It was not overly complex for our team.

What's my experience with pricing, setup cost, and licensing?

The solution isn't cheap. It's quite costly.

What other advice do I have?

The solution is perfect for those dealing with a huge amount of data. Still, you need to check to make sure it meets your company's requirements. You need to understand them before actually choosing the technology you'll ultimately use.

Overall, I would rate the solution at a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.