
Apache Hadoop vs Azure Data Factory comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Apache Hadoop
Average Rating: 7.8
Number of Reviews: 39
Ranking in other categories: Data Warehouse (6th)

Azure Data Factory
Average Rating: 8.0
Reviews Sentiment: 6.7
Number of Reviews: 86
Ranking in other categories: Data Integration (1st), Cloud Data Warehouse (3rd)
 

Featured Reviews

Sushil Arya - PeerSpot reviewer
Provides ease of integration with the IT workflow of a business
When working with Kafka, I saw that the data came in an incremental order. The incremental data processing part is still not very effective in Apache Hadoop. If the data is already there, it can be processed very effectively, especially if the data is coming in every second. If you want to know the location of some data every second, then such data is not processed effectively in Apache Hadoop. I can say that one of the features where improvements are required revolves around the licensing cost of the tool. If the tool can build some licensing structures in a pay-per-use manner, organizations can get the look and feel of Apache Hadoop. Apache Hadoop can offer a licensing structure of the product that can be seen as similar to how AWS operates. Apache Hadoop can look into the capability of processing incremental data. The tool's setup process can be a scope of improvement. Also, it is not very simple because while doing the setup, we need to do all the server settings, including port listing and firewall configurations. If we look at other products on the market, then they can be made simpler. There are certain shortcomings when it comes to the product's technical support part, making it an area where improvements are required. The time frame for the resolution is an area that needs to be improved. The overall communication part of the technical support team also needs improvement.
Thulani David Mngadi - PeerSpot reviewer
Data flow feature is valuable for data transformation tasks
The workflow automation features in Azure Data Factory, particularly its low-code/no-code approach, are highly beneficial for accelerating development speed. This approach allows for quick creation of pipelines and offers customization options for integration needs, making it versatile for various use cases. Azure Data Factory supports a wide range of connectors, catering to a majority of integration needs. Alongside its virtual enterprise and monitoring capabilities, the visual interface makes it user-friendly and easy to teach, facilitating adoption within teams. While the monitoring capabilities are sufficient out of the box, they may not be as comprehensive as dedicated enterprise monitoring tools. The monitoring features are manageable for production use, with the option to integrate log analytics or create custom dashboards if needed. The data flow feature in Azure Data Factory is valuable for data transformation tasks, especially for those who may not have expertise in writing complex code. It simplifies data manipulation and is particularly useful for individuals unfamiliar with Spark coding. While there could be improvements for more flexibility, overall, the data flow feature effectively accomplishes its purpose.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
"Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform."
"It's open-source, so it's very cost-effective."
"The most valuable feature is scalability and the possibility to work with major information and open source capability."
"Apache Hadoop is crucial in projects that save and retrieve data daily. Its valuable features are scalability and stability. It is easy to integrate with the existing infrastructure."
"We selected Apache Hadoop because it is not dependent on third-party vendors."
"Hadoop File System is compatible with almost all the query engines."
"The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
"The two most valuable features of Azure Data Factory are that it's very scalable and that it's also highly reliable."
"Data Factory's most valuable feature is Copy Activity."
"For me, it was that there are dedicated connectors for different targets or sources, different data sources. For example, there is direct connector to Salesforce, Oracle Service Cloud, etcetera, and that was really helpful."
"I like its integration with SQL pools, its ability to work with Databricks, its pipelines, and the serverless architecture are the most effective features."
"For developers that are very accustomed to the Microsoft development studio, it's very easy for them to complete end-to-end data integration."
"The solution has a good interface and the integration with GitHub is very useful."
"From my experience so far, the best feature is the ability to copy data to any environment. We have 100 connects and we can connect them to the system and copy the data from its respective system to any environment. That is the best feature."
"The best part of this product is the extraction, transformation, and load."
 

Cons

"It needs better user interface (UI) functionalities."
"Real-time data processing is weak. This solution is very difficult to run and implement."
"The solution is very expensive."
"It could be more user-friendly."
"The integration with Apache Hadoop with lots of different techniques within your business can be a challenge."
"Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them."
"The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
"Since it is an open-source product, there won't be much support."
"Azure Data Factory should be cheaper to move data to a data center abroad for calamities in case of disasters."
"If the user interface was more user friendly and there was better error feedback, it would be helpful."
"Data Factory could be improved in terms of data transformations by adding more metadata extractions."
"It does not appear to be as rich as other ETL tools. It has very limited capabilities."
"There's space for improvement in the development process of the data pipelines."
"I would like to see this time travel feature in Snowflake added to Azure Data Factory."
"The thing we missed most was data update, but this is now available as of two weeks ago."
"The pricing scheme is very complex and difficult to understand."
 

Pricing and Cost Advice

"This is a low cost and powerful solution."
"The product is open-source, but some associated licensing fees depend on the subscription level."
"Do take into consideration that data storage and compute capacity scale differently; hence, purchasing a "boxed" or "all-in-one" solution (software and hardware) might not be the best idea."
"The price of Apache Hadoop could be less expensive."
"We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
"It's reasonable, but there's room for improvement in cost-effectiveness."
"There are no licensing costs involved, hence money is saved on the software infrastructure."
"The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
"The pricing is a bit on the higher end."
"Our licensing fees are approximately 15,000 ($150 USD) per month."
"Pricing appears to be reasonable in my opinion."
"The price is fair."
"The pricing model is based on usage and is not cheap."
"The solution's fees are based on a pay-per-minute use plus the amount of data required to process."
"I would rate Data Factory's pricing nine out of ten."
"Pricing is comparable, it's somewhere in the middle."
816,636 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

Apache Hadoop
Financial Services Firm: 32%
Computer Software Company: 11%
University: 7%
Energy/Utilities Company: 6%

Azure Data Factory
Financial Services Firm: 13%
Computer Software Company: 12%
Manufacturing Company: 9%
Healthcare Company: 7%
 

Company Size

By reviewers: Large Enterprise, Midsize Enterprise, Small Business
 

Questions from the Community

What do you like most about Apache Hadoop?
It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming.
What is your experience regarding pricing and costs for Apache Hadoop?
The product is open-source, but some associated licensing fees depend on the subscription level. While it might be free for students, organizations typically need to pay for their subscriptions. Th...
What needs improvement with Apache Hadoop?
Hadoop lacks OLAP capabilities. I recommend adding a Delta Lake feature to make the data compatible with ACID properties. Also, video and audio streaming import issues could be improved to ensure p...
How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Azure Data Factory compare with Informatica PowerCenter?
Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up an...
How does Azure Data Factory compare with Informatica Cloud Data Integration?
Azure Data Factory is a solid product offering many transformation functions. It has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power Q...
 

Learn More

 

Overview

 

Sample Customers

Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
Azure Data Factory: Adobe, BMW, Coca-Cola, General Electric, Johnson & Johnson, LinkedIn, Mastercard, Nestle, Pfizer, Samsung, Siemens, Toyota, Unilever, Verizon, Walmart, Accenture, American Express, AT&T, Bank of America, Cisco, Deloitte, ExxonMobil, Ford, General Motors, IBM, JPMorgan Chase, Microsoft (Azure Data Factory is developed by Microsoft), Oracle, Procter & Gamble, Salesforce, Shell, Visa
Find out what your peers are saying about Apache Hadoop vs. Azure Data Factory and other solutions. Updated: October 2024.