We use this solution for our Enterprise Data Lake.
Works
Reduces cost, saves time, and provides insight into our unstructured data
Pros and Cons
- "The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics."
- "We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it."
What is our primary use case?
How has it helped my organization?
Using this solution has reduced our overall TCO. It has also improved the processing time for machine data and provides greater insight into our unstructured data.
What is most valuable?
The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics.
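To make "adding structure to machine data" concrete, here is a minimal Python sketch that turns raw log lines into typed records ready for analytics. The log format, the `LOG_PATTERN` regex, and the `MachineEvent` fields are all hypothetical, because the reviewer does not describe their data. In a Hadoop deployment this kind of parsing would usually run inside a map task or a Hive/Spark job, not as a standalone script.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Hypothetical raw format: "<ISO timestamp> <machine id> <METRIC> <numeric value>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<machine>\S+)\s+(?P<metric>[A-Z_]+)\s+(?P<value>-?\d+(?:\.\d+)?)$"
)


@dataclass(frozen=True)
class MachineEvent:
    """One structured reading extracted from a raw log line (hypothetical schema)."""
    timestamp: datetime
    machine_id: str
    metric: str
    value: float


def parse_line(line: str) -> Optional[MachineEvent]:
    """Return a MachineEvent, or None if the line does not match the expected format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return MachineEvent(
        timestamp=datetime.fromisoformat(match["ts"]),
        machine_id=match["machine"],
        metric=match["metric"],
        value=float(match["value"]),
    )


def parse_lines(lines: Iterable[str]) -> Iterator[MachineEvent]:
    """Yield structured events, skipping malformed lines instead of failing the job."""
    for line in lines:
        event = parse_line(line)
        if event is not None:
            yield event


raw = [
    "2024-01-15T10:32:07 machine-42 TEMP 71.5",
    "garbage line that does not match",
    "2024-01-15T10:32:09 machine-07 RPM 1800",
]
events = list(parse_lines(raw))
for e in events:
    print(e.machine_id, e.metric, e.value)
```

Skipping malformed lines (rather than raising) is a design choice for large, noisy machine data; a production pipeline would more likely count or quarantine them.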
What needs improvement?
We would like more flexibility in merging this machine data with other internal data so that we can derive more meaning from it.
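One common way to merge a large machine-data set with a smaller internal table is a broadcast (map-side) join: the small table is loaded into memory and the large data set is streamed past it. The sketch below shows that pattern in plain Python; the asset table, the `(machine_id, metric, value)` tuples, and the function names are hypothetical illustrations, not the reviewer's schema or a Hadoop API.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical internal reference data: machine_id -> (plant, owning team)
AssetInfo = Tuple[str, str]
JoinedRow = Tuple[str, str, float, str, str]


def build_lookup(assets: Iterable[Tuple[str, str, str]]) -> Dict[str, AssetInfo]:
    """Load the small internal table into memory, keyed on the join column."""
    return {machine_id: (plant, team) for machine_id, plant, team in assets}


def broadcast_join(
    readings: Iterable[Tuple[str, str, float]],
    lookup: Dict[str, AssetInfo],
) -> Iterator[JoinedRow]:
    """Stream the large machine-data set and enrich each row (inner-join semantics)."""
    for machine_id, metric, value in readings:
        info = lookup.get(machine_id)
        if info is not None:  # drop readings with no matching internal record
            plant, team = info
            yield machine_id, metric, value, plant, team


assets = [("machine-42", "Plant A", "Line Ops"), ("machine-07", "Plant B", "Maintenance")]
readings = [
    ("machine-42", "TEMP", 71.5),
    ("machine-99", "TEMP", 65.0),
    ("machine-07", "RPM", 1800.0),
]

joined: List[JoinedRow] = list(broadcast_join(readings, build_lookup(assets)))
for row in joined:
    print(row)
```

This pattern only works while the internal table fits in memory on every worker; joining two large tables needs a shuffle-based (reduce-side) join instead.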
Buyer's Guide
Apache Hadoop
January 2025
Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
831,265 professionals have used our research since 2012.
For how long have I used the solution?
More than four years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Stable, highly scalable, but integration could improve
Pros and Cons
- "The scalability of Apache Hadoop is very good."
- "Integrating Apache Hadoop with lots of different technologies within your business can be a challenge."
What needs improvement?
Integrating Apache Hadoop with lots of different technologies within your business can be a challenge.
For how long have I used the solution?
I have been using Apache Hadoop for approximately nine years.
What do I think about the stability of the solution?
Apache Hadoop is stable.
What do I think about the scalability of the solution?
The scalability of Apache Hadoop is very good.
What's my experience with pricing, setup cost, and licensing?
The price of Apache Hadoop could be less expensive.
What other advice do I have?
My advice to others is that if you have a strong engineering team, this solution is excellent.
I rate Apache Hadoop an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Business data analyst at RBSG Internet Operations
A low-cost solution that allows us to download data, but has latency issues when running queries
Pros and Cons
- "One valuable feature is that we can download data."
- "I think more of the solution needs to be focused on parallel processing and retrieval of data."
What is our primary use case?
We use the solution as a data lake for our customer payment and SaaS information. We get data from various sources and then leverage that data.
What is most valuable?
One valuable feature is that we can download data. Another is that it is a low-cost solution. Hadoop has also made it feasible to have all the data available in one area.
What needs improvement?
We plan to increase usage, and that is where we've realized that when we have all these clusters and are running queries and analyses, we face some latency issues. I think more of the solution needs to be focused on parallel processing and retrieval of data.
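Hadoop's answer to query latency at scale is to split work so it can run in parallel: map tasks process input splits independently, the framework shuffles intermediate pairs so that all values for a key land together, and reduce tasks aggregate each key. The following is a single-process Python simulation of that flow, assuming hypothetical `(customer_id, payment_amount)` records; it illustrates the programming model only and is not the Hadoop MapReduce API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # hypothetical (customer_id, payment_amount)


def map_phase(split: Iterable[Record]) -> Iterator[Tuple[str, float]]:
    """Emit one (key, value) pair per record; each split could run on a different node."""
    for customer_id, amount in split:
        yield customer_id, amount


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Aggregate each key's values; different keys could be reduced in parallel."""
    return {key: sum(values) for key, values in groups.items()}


# Two input splits, as if the data were stored in two HDFS blocks.
splits: List[List[Record]] = [
    [("cust-1", 10.0), ("cust-2", 5.5)],
    [("cust-1", 2.5), ("cust-3", 7.0)],
]
intermediate = [pair for split in splits for pair in map_phase(split)]
totals = reduce_phase(shuffle(intermediate))
print(totals)
```

In a real cluster the shuffle moves data across the network, which is often where the latency the reviewer mentions comes from; reducing the volume of intermediate pairs (for example, with a combiner) is a typical mitigation.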
For how long have I used the solution?
I have been using this solution for about seven or eight years.
What do I think about the stability of the solution?
This is a stable product.
What do I think about the scalability of the solution?
The scalability of the solution is good. Approximately 100 people are currently using this solution within our company.
How are customer service and support?
I would rate the tech support as a four out of five.
How would you rate customer service and support?
Positive
What other advice do I have?
I would recommend this product to others. I would rate it as an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Partner at a tech services company with 11-50 employees
Highly elastic and stable, but it needs better security
Pros and Cons
- "Hadoop is extensible — it's elastic."
- "Hadoop's security could be better."
What is our primary use case?
There are several use cases for Hadoop. Sometimes it's used for data warehousing; other times, for analytics; and in some cases, for transformation. For example, I have one client who uses it to decompress, compress, or encrypt data on ingestion, so he uses it like an ETL engine.
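As a rough illustration of that kind of ingestion-time transformation, the Python sketch below decompresses a gzip upload, computes a checksum, and recompresses it before storage. The function and field names are hypothetical. Encryption is intentionally left out: the Python standard library has no symmetric cipher, and in Hadoop that step would more likely be handled by HDFS transparent encryption or a vetted crypto library.

```python
import gzip
import hashlib
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestedObject:
    """Result of the hypothetical ingestion step: payload plus integrity metadata."""
    payload: bytes   # recompressed bytes that would be written to storage
    sha256: str      # checksum of the uncompressed content
    raw_size: int
    stored_size: int


def ingest(incoming_gzip: bytes, level: int = 9) -> IngestedObject:
    """Decompress a gzip upload, checksum it, and recompress it with zlib.

    Encryption is omitted on purpose (see the note above this block).
    """
    raw = gzip.decompress(incoming_gzip)
    digest = hashlib.sha256(raw).hexdigest()
    stored = zlib.compress(raw, level)
    return IngestedObject(stored, digest, len(raw), len(stored))


upload = gzip.compress(b"sensor,value\n" * 1000)
obj = ingest(upload)
# Round-trip check: the stored payload restores the original bytes exactly.
assert zlib.decompress(obj.payload) == b"sensor,value\n" * 1000
print(obj.raw_size, obj.stored_size, obj.sha256[:12])
```

Checksumming the uncompressed bytes means the digest stays valid even if the data is later recompressed with a different codec.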
What is most valuable?
Hadoop is extensible — it's elastic.
What needs improvement?
Hadoop's security could be better.
For how long have I used the solution?
I've been using Hadoop for about eight years, though I'm not sure exactly.
What do I think about the stability of the solution?
Performance is one of the reasons people choose Hadoop.
What do I think about the scalability of the solution?
Scalability is one of Hadoop's strong suits.
How are customer service and support?
I've never had to use Hadoop support.
How was the initial setup?
The complexity of Hadoop's setup depends on the customer and their needs. However, most of my customers wind up using Hadoop as a service, which makes it very easy. It doesn't need much maintenance. My staff maintains multiple systems, so it's not like there would ever be somebody dedicated to one, and Hadoop is not a high-touch platform.
What other advice do I have?
I rate Hadoop seven out of 10. It's very good, but it could always be better. To anyone considering Hadoop, I recommend that you be mindful of what you're trying to achieve.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
CEO at AM-BITS LLC
Good stability and scalability but the visualization isn't good
Pros and Cons
- "The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
- "There is a lack of visualization and presentation layers, so you can't take it and implement it like a ready-made solution."
What is our primary use case?
We primarily use the solution for the enterprise data hub and big data warehouse extension.
What is most valuable?
The ability to add multiple nodes without any restriction is the solution's most valuable aspect.
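Part of why adding nodes scales so smoothly is that HDFS stores files as fixed-size blocks, each replicated across DataNodes, so new nodes simply take a share of the block replicas. The Python sketch below uses the HDFS defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The round-robin spread is a deliberate simplification for illustration; the real NameNode placement policy is rack-aware and also considers free space and load.

```python
from typing import Dict

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default dfs.blocksize (128 MB)
REPLICATION = 3                 # HDFS default dfs.replication


def block_count(file_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file occupies; the last block may be partial."""
    return -(-file_bytes // block_size)  # integer ceiling division


def replicas_per_node(file_bytes: int, nodes: int) -> Dict[int, int]:
    """Spread every block replica across `nodes` DataNodes round-robin (simplified).

    Requires at least REPLICATION nodes so that the replicas of one block
    always land on distinct nodes, as HDFS also guarantees.
    """
    if nodes < REPLICATION:
        raise ValueError("need at least as many DataNodes as the replication factor")
    total_replicas = block_count(file_bytes) * REPLICATION
    load = {n: 0 for n in range(nodes)}
    for i in range(total_replicas):
        load[i % nodes] += 1
    return load


one_tib = 1024 ** 4
print(block_count(one_tib))                 # blocks for a 1 TiB file
print(replicas_per_node(one_tib, 4)[0])     # replicas on one node with 4 DataNodes
print(replicas_per_node(one_tib, 8)[0])     # doubling the nodes halves the per-node load
```

The design point is that capacity grows roughly linearly with node count, because no node needs to hold a complete copy of any file.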
What needs improvement?
What needs improvement depends on the customer and the use case. For example, we consider classical Hadoop an old variant. Most now work with flash data.
There is a very wide application for this solution, but in enterprise companies, if you work with classical BI systems, it would be good to include an additional presentation layer for BI solutions.
There is a lack of visualization and presentation layers, so you can't take it and implement it like a ready-made solution.
For how long have I used the solution?
We've been working with the solution for three to four years.
What do I think about the stability of the solution?
The solution is stable. It offers very good disaster resilience and supports multi-rack configurations.
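Multi-rack resilience in HDFS comes largely from its default replica placement: with a replication factor of three, the first replica goes on the writer's node (if it is a DataNode), the second on a node in a different rack, and the third on a different node in that same remote rack, so losing an entire rack never loses every copy. The Python sketch below reproduces only the shape of that policy; the topology, host names, and function are hypothetical, and the real policy also weighs free space, load, and node state.

```python
import random
from typing import Dict, List, Optional

Topology = Dict[str, List[str]]  # rack name -> DataNode hostnames (hypothetical)


def place_replicas(topology: Topology, writer: Optional[str], rng: random.Random) -> List[str]:
    """Choose three DataNodes following the shape of HDFS's default policy.

    1st replica: the writer's node if it is a DataNode, else a random node (simplified).
    2nd replica: a node on a different (remote) rack.
    3rd replica: a different node on that same remote rack.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer if writer in rack_of else rng.choice(list(rack_of))
    remote_racks = [r for r in topology if r != rack_of[first] and len(topology[r]) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two DataNodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote], 2)  # two distinct nodes
    return [first, second, third]


topology = {
    "rack-1": ["dn1", "dn2", "dn3"],
    "rack-2": ["dn4", "dn5", "dn6"],
}
placement = place_replicas(topology, writer="dn2", rng=random.Random(0))
print(placement)
```

Putting two replicas on one remote rack, rather than spreading all three across three racks, trades a little fault tolerance for less cross-rack write traffic.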
What do I think about the scalability of the solution?
It is possible to scale the solution. We work with companies that have hundreds of users.
How was the initial setup?
The initial setup might not be straightforward for our customers, but it's easy enough for us to handle. However, if we don't build a proof of concept for the company first, it may take some time and be quite complex. Pilot projects take about three months to deploy, and full-spec projects take up to a year because we have to incorporate all the requirements for data governance, security, etc.
What's my experience with pricing, setup cost, and licensing?
We originally built on Hortonworks technology, which didn't require any licensing, but it is being discontinued in 2022, so it has been proposed that we move to Cloudera, which will have licensing costs associated with it.
What other advice do I have?
We use the on-premises deployment model. It's a requirement for the company we work with, which is a bank. Often customers demand we work with on-premises deployment models.
I'd rate the solution seven out of ten. In terms of the ability to build middleware and offer scalability, it would be 10 out of 10 from me. However, if you take into account only the visualization, I'd only rate it at three or four out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Technical Lead at a government with 201-500 employees
Good distributed processing and performance, but very expensive
Pros and Cons
- "The performance is pretty good."
- "The solution is very expensive."
What is most valuable?
The distributed processing is excellent.
Within the solution, Spark is very good.
The performance is pretty good.
What needs improvement?
For the visualization tools, we use Apache Hadoop and it is very slow.
It lacks some query language features. We have to use Apache Linux. Even so, the query language still has limitations and very little documentation, and many of the visualization tools do not have direct connectivity. They need something like BigQuery, which is very fast. We need those to be available in the cloud and scalable.
The solution needs to be powerful and offer better availability for gathering queries.
The solution is very expensive.
For how long have I used the solution?
I've been using the solution for about five years now.
What do I think about the stability of the solution?
The solution is stable and offers good performance. It doesn't crash or freeze. It's not buggy at all.
What do I think about the scalability of the solution?
You can scale the solution if you need to. We find that it's pretty easy to expand it out.
There were about 13-20 people using it at any given time.
How are customer service and technical support?
The technical support was pretty good. It's my understanding that the company was pretty satisfied with the level of support they received. They were knowledgeable and responsive.
Which solution did I use previously and why did I switch?
I've also worked with MySQL and Postgres. Hadoop is more for analytical processing. While the others claim to support distributed processing, Hadoop is far better in that regard. It's excellent compared to other options.
How was the initial setup?
The initial setup was pretty straightforward. It was not overly complex for our team.
What's my experience with pricing, setup cost, and licensing?
The solution isn't cheap. It's quite costly.
What other advice do I have?
The solution is perfect for those dealing with a huge amount of data. Still, you need to check to make sure it meets your company's requirements. You need to understand them before actually choosing the technology you'll ultimately use.
Overall, I would rate the solution at a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Database/Middleware Consultant (Currently at U.S. Department of Labor) at a tech services company with 51-200 employees
There are no licensing costs involved, hence money is saved on software infrastructure
Pros and Cons
- "Data ingestion: It has rapid speed, if Apache Accumulo is used."
- "It needs better user interface (UI) functionalities."
What is our primary use case?
- Content management solution
- Unified Data solution
- Apache Hadoop running on Linux
What is most valuable?
- Data ingestion: it is fast if Apache Accumulo is used.
- Data security
- Inexpensive
What needs improvement?
It needs better user interface (UI) functionalities.
For how long have I used the solution?
Three to five years.
What's my experience with pricing, setup cost, and licensing?
There are no licensing costs involved, hence money is saved on the software infrastructure.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Technical Architect at RBSG Internet Operations
Good database and highly scalable, with good plug and play analytics tools
Pros and Cons
- "The most valuable feature is the database."
- "It would be good to have more advanced analytics tools."
What is our primary use case?
We primarily dump all the prior payment transaction data into a Hadoop system, and then we use some of the plug-and-play analytics tools to translate it.
What is most valuable?
The most valuable feature is the database.
What needs improvement?
We're finding vulnerabilities in running it 24/7. We're experiencing some downtime that affects the data.
It would be good to have more advanced analytics tools.
For how long have I used the solution?
I've been using the solution for five years.
What do I think about the scalability of the solution?
The solution is scalable. From a payments perspective, we're using the solution on a large scale.
How are customer service and technical support?
We've never contacted technical support.
Which solution did I use previously and why did I switch?
We didn't previously use a different solution.
How was the initial setup?
The initial setup was complex. There was a lot of data that we had to bring over from various sources and it was quite a long process.
What about the implementation team?
We did have some assistance with the implementation.
What other advice do I have?
We use the on-premises deployment model.
We're more inclined towards an operational data source to fill our customer's needs. Hadoop is good for analytics and some reporting requirements.
It's a good solution for those needing something for the purposes of management reporting.
I'd rate the solution eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Download our free Apache Hadoop Report and get advice and tips from experienced pros sharing their opinions.
Updated: January 2025
Product Categories
Data Warehouse
Popular Comparisons
Snowflake
Teradata
Oracle Exadata
Vertica
VMware Tanzu Data Solutions
SAP BW4HANA
IBM Netezza Performance Server
Oracle Database Appliance
IBM Db2 Warehouse
SAP IQ
Microsoft Parallel Data Warehouse
Oracle Big Data Appliance
Quick Links
Learn More: Questions:
- Which data catalog can provide support for BI data sources such as SAP BO and Tableau?
- Which is the best RDBMS solution for big data?
- Apache Spark without Hadoop -- Is this recommended?
- What is the biggest difference between Apache Hadoop and Snowflake?
- Which solution is better for setting up a data lake: Apache Hadoop or Oracle Exadata?
- Oracle Exadata vs. HPE Vertica vs. EMC GreenPlum vs. IBM Netezza
- When evaluating Data Warehouse solutions, what aspect do you think is the most important to look for?
- At what point does a business typically invest in building a data warehouse?
- Is a data warehouse the best option to consolidate data into one location?
- What are the main differences between Data Lake and Data Warehouse?