Senior Hadoop Engineer at a company with 1,001-5,000 employees
The heart of BigData
What is most valuable?
- Storage
- Processing (cost efficient)
How has it helped my organization?
With the increase in data size for the business, this horizontally scalable appliance has answered every business question in terms of storage and processing. The Hadoop ecosystem has not only provided a reliable distributed aggregation system but has also allowed room for analytics, which has resulted in great data insights.
What needs improvement?
The Apache team is doing a great job, releasing Hadoop versions well ahead of what we can anticipate. Every area for improvement is addressed as soon as a new version is released by the ASF. Currently, Apache Oozie 4.0.1 has some compatibility issues with Hadoop 2.5.2.
For how long have I used the solution?
2.5 years
Buyer's Guide
Apache Hadoop
December 2024
Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
824,067 professionals have used our research since 2012.
What was my experience with deployment of the solution?
No issues at all.
What do I think about the stability of the solution?
We had stability issues when we started with Hadoop 1.x, which didn’t have HA (high availability), but now we don’t have any stability issues.
What do I think about the scalability of the solution?
Hadoop is known for its scalability. Yahoo stores approx. 455 PB in their Hadoop cluster.
How are customer service and support?
Customer Service:
It depends on the Hadoop distributor. I would rate Hortonworks 9/10.
Technical Support: I would rate Hortonworks 9/10.
Which solution did I use previously and why did I switch?
We previously used Netezza. We switched because our business required a highly scalable appliance like Hadoop.
How was the initial setup?
It's a bit complex in terms of building around commodity hardware, but it will get easier as the product matures.
What about the implementation team?
We used a vendor team, which I would rate 9/10.
What was our ROI?
Valuable storage and processing with a lower cost than previously.
What's my experience with pricing, setup cost, and licensing?
Pricing is among the best, and licensing depends on the flavor. But remember that it is only worthwhile if you have a very large data set that cannot be handled by a traditional RDBMS.
Which other solutions did I evaluate?
Cloud options.
What other advice do I have?
First, understand your business requirement; second, evaluate the scalability and capability of a traditional RDBMS; and finally, if you have reached the tip of the iceberg (RDBMS), then yes, you definitely need an island (Hadoop) for your business. Feasibility checks are important and efficient for any business before taking any crucial step. I would also say, “Don’t always flow with the stream of a river, because sometimes it will lead you to a waterfall, so always research and analyze before you take a ride.”
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Stable, highly scalable, but integration could improve
Pros and Cons
- "The scalability of Apache Hadoop is very good."
- "Integrating Apache Hadoop with the many different technologies within your business can be a challenge."
What needs improvement?
Integrating Apache Hadoop with the many different technologies within your business can be a challenge.
For how long have I used the solution?
I have been using Apache Hadoop for approximately nine years.
What do I think about the stability of the solution?
Apache Hadoop is stable.
What do I think about the scalability of the solution?
The scalability of Apache Hadoop is very good.
What's my experience with pricing, setup cost, and licensing?
Apache Hadoop could be less expensive.
What other advice do I have?
My advice to others is that this solution is excellent if you have a strong engineering team.
I rate Apache Hadoop an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Business data analyst at RBSG Internet operations
A low-cost solution that allows us to download data, but has latency issues when running queries
Pros and Cons
- "One valuable feature is that we can download data."
- "I think more of the solution needs to be focused around the parallel processing and retrieval of data."
What is our primary use case?
We use the solution as a data lake for our customer payment and SaaS information. We get data from various sources and then utilize and leverage that data.
What is most valuable?
One valuable feature is that we can download data. Another is that it is a low-cost solution. Hadoop has also made it feasible to have all the data available in one area.
What needs improvement?
We plan to increase usage, and this is where we've realized that, with all these clusters running queries and analysis, we are facing some latency issues. I think more of the solution needs to be focused around the parallel processing and retrieval of data.
For how long have I used the solution?
I have been using this solution for about seven or eight years.
What do I think about the stability of the solution?
This is a stable product.
What do I think about the scalability of the solution?
The scalability of the solution is good. Approximately 100 people are currently using this solution within our company.
How are customer service and support?
I would rate the tech support as a four out of five.
How would you rate customer service and support?
Positive
What other advice do I have?
I would recommend this product to others. I would rate it as an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
CEO at AM-BITS LLC
Good stability and scalability but the visualization isn't good
Pros and Cons
- "The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
- "There is a lack of visualization and presentation layers, so you can't take it and implement it like a ready-made solution."
What is our primary use case?
We primarily use the solution for the enterprise data hub and big data warehouse extension.
What is most valuable?
The ability to add multiple nodes without any restriction is the solution's most valuable aspect.
What needs improvement?
What needs improvement depends on the customer and the use case. We consider classical Hadoop, for example, an old variant. Most now work with flash data.
There is a very wide application for this solution, but in enterprise companies that work with classical BI systems, it would be good to include an additional presentation layer for BI solutions.
There is a lack of visualization and presentation layers, so you can't take it and implement it like a ready-made solution.
For how long have I used the solution?
We've been working with the solution for three to four years.
What do I think about the stability of the solution?
The solution is stable. It has very good disaster stability and multi-rack configuration.
What do I think about the scalability of the solution?
It is possible to scale the solution. We work with companies that have hundreds of users.
How was the initial setup?
The initial setup might not be straightforward for our customers, but it's easy enough for us to handle. However, if we don't build a proof of concept for the company first, it may take some time and be quite complex. Pilot projects take about three months to deploy, and full-spec projects take up to a year because we have to work in all the requirements for data governance, security, etc.
What's my experience with pricing, setup cost, and licensing?
We originally built on Hortonworks tech which didn't require any licensing, but that is getting discontinued in 2022, so it's been proposed we move to Cloudera which will have licensing costs associated with it.
What other advice do I have?
We use the on-premises deployment model. It's a requirement for the company we work with, which is a bank. Often customers demand we work with on-premises deployment models.
I'd rate the solution seven out of ten. In terms of the ability to build middleware and offer scalability, it would be 10 out of 10 from me. However, if you take into account only the visualization, I'd only rate it at three or four out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Partner at a tech services company with 11-50 employees
Highly elastic and stable, but it needs better security
Pros and Cons
- "Hadoop is extensible — it's elastic."
- "Hadoop's security could be better."
What is our primary use case?
There are several use cases for Hadoop. Sometimes it's used for data warehousing. Other times, it's analytics. And in some cases, it's used to do transformation. For example, I have one client using it to decompress, compress, or encrypt data on ingestion. So, they use it like an ETL engine.
What is most valuable?
Hadoop is extensible — it's elastic.
What needs improvement?
Hadoop's security could be better.
For how long have I used the solution?
I've been using Hadoop for about eight years. I'm not sure exactly.
What do I think about the stability of the solution?
Performance is one of the reasons people choose Hadoop.
What do I think about the scalability of the solution?
Scalability is one of Hadoop's strong suits.
How are customer service and support?
I've never had to use Hadoop support.
How was the initial setup?
The complexity of Hadoop's setup depends on the customer and their needs. However, most of my customers wind up using Hadoop as a service, which makes it very easy. It doesn't need much maintenance. My staff maintains multiple systems, so it's not like there would ever be somebody dedicated to one, and Hadoop is not a high-touch platform.
What other advice do I have?
I rate Hadoop seven out of 10. It's very good, but it could always be better. To anyone considering Hadoop, I recommend that you be mindful of what you're trying to achieve.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
Technical Lead at a government with 201-500 employees
Good distributed processing and performance, but very expensive
Pros and Cons
- "The performance is pretty good."
- "The solution is very expensive."
What is most valuable?
The distributed processing is excellent.
On the solution, Spark is very good.
The performance is pretty good.
What needs improvement?
For the visualization tools, we use Apache Hadoop and it is very slow.
It lacks some query language capabilities. We have to use Apache Linux. Even so, the query language still has limitations, with only limited documentation, and many of the visualization tools do not have direct connectivity. They need something like BigQuery, which is very fast. We need those to be available in the cloud and scalable.
The solution needs to be powerful and offer better availability for gathering queries.
The solution is very expensive.
For how long have I used the solution?
I've been using the solution for about five years now.
What do I think about the stability of the solution?
The solution is stable and offers good performance. It doesn't crash or freeze. It's not buggy at all.
What do I think about the scalability of the solution?
You can scale the solution if you need to. We find that it's pretty easy to expand it out.
About 13 to 20 people were using it at any given time.
How are customer service and technical support?
The technical support was pretty good. It's my understanding that the company was pretty satisfied with the level of support they received. They were knowledgeable and responsive.
Which solution did I use previously and why did I switch?
I've also worked with MySQL and Postgres. Hadoop is more for analytical processing. While the others claim to offer distributed processing, Hadoop is far better in that regard. It's excellent compared to other options.
How was the initial setup?
The initial setup was pretty straightforward. It was not overly complex for our team.
What's my experience with pricing, setup cost, and licensing?
The solution isn't cheap. It's quite costly.
What other advice do I have?
The solution is perfect for those dealing with a huge amount of data. Still, you need to check to make sure it meets your company's requirements. You need to understand them before actually choosing the technology you'll ultimately use.
Overall, I would rate the solution at a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Database/Middleware Consultant (Currently at U.S. Department of Labor) at a tech services company with 51-200 employees
There are no licensing costs involved, hence money is saved on software infrastructure
Pros and Cons
- "Data ingestion: It has rapid speed, if Apache Accumulo is used."
- "It needs better user interface (UI) functionalities."
What is our primary use case?
- Content management solution
- Unified Data solution
- Apache Hadoop running on Linux
What is most valuable?
- Data ingestion: It has rapid speed, if Apache Accumulo is used.
- Data security
- Inexpensive
What needs improvement?
It needs better user interface (UI) functionalities.
For how long have I used the solution?
Three to five years.
What's my experience with pricing, setup cost, and licensing?
There are no licensing costs involved, hence money is saved on the software infrastructure.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Founder & CTO at a tech services company with 1-10 employees
Processes large data sets across clusters of computers
Pros and Cons
- "Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
- "From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
What is our primary use case?
We mainly use Apache Hadoop for real-time streaming and integration, using Spark Streaming and the ecosystem of Spark technologies inside Hadoop.
What is most valuable?
I actually like most of the capabilities, but I think Spark has added reposit capabilities on top of the Hadoop ecosystem. The Spark area includes the capabilities that I like the most with Hadoop.
What needs improvement?
I don't have any concerns because each part of Hadoop has its use cases. To date, I haven't implemented a huge product or project using Hadoop, but on the level of POCs, it's fine.
The community of Hadoop is now a cluster, and I think there is room for improvement in the ecosystem.
From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective.
For how long have I used the solution?
I have been using this solution for roughly five years.
What do I think about the stability of the solution?
I've never experienced any bugs or glitches.
What do I think about the scalability of the solution?
Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability.
How was the initial setup?
It's a well-known fact that Hadoop's configuration is pretty hard.
What other advice do I have?
Usually, people need to study and prepare for a few use cases and compare multiple ecosystems before choosing one. When people think of using a big data solution, Hadoop comes to mind. For certain use cases, Hadoop is comparable with other technologies. For example, when building a sort of real-time data warehouse (an enterprise data hub), people don't think about using Hadoop directly. People often use solutions like Druid for building these.
At the end of the day, you need to compare existing technologies against your use cases. You need to study your use case and select the technology inside of Hadoop that will fit it. You may find another ecosystem that solves your problem. Just keep in mind that Hadoop is not the only solution; there are many others. It depends on the use case.
Overall, on a scale from one to ten, I would give Hadoop a rating of eight.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.