
Azure Data Factory vs Pentaho Data Integration and Analytics comparison

 

Comparison Buyer's Guide

Executive Summary
Updated on Dec 19, 2024
 

Categories and Ranking

Azure Data Factory
Ranking in Data Integration: 1st
Average Rating: 8.0
Reviews Sentiment: 6.9
Number of Reviews: 86
Ranking in other categories: Cloud Data Warehouse (3rd)

Pentaho Data Integration an...
Ranking in Data Integration: 24th
Average Rating: 8.0
Reviews Sentiment: 6.9
Number of Reviews: 52
Ranking in other categories: None
 

Mindshare comparison

As of December 2024, in the Data Integration category, the mindshare of Azure Data Factory is 11.0%, down from 13.3% the previous year. The mindshare of Pentaho Data Integration and Analytics is 1.5%, up from 0.6% the previous year. Mindshare is calculated from PeerSpot user engagement data.
 

Featured Reviews

Thulani David Mngadi - PeerSpot reviewer
Data flow feature is valuable for data transformation tasks
The workflow automation features in Azure Data Factory, particularly its low-code/no-code approach, are highly beneficial for accelerating development speed. This approach allows for quick creation of pipelines and offers customization options for integration needs, making it versatile for various use cases. Azure Data Factory supports a wide range of connectors, catering to a majority of integration needs. As for its visual interface and monitoring capabilities, the visual interface makes it user-friendly and easy to teach, facilitating adoption within teams. While the monitoring capabilities are sufficient out of the box, they may not be as comprehensive as dedicated enterprise monitoring tools. The monitoring features are manageable for production use, with the option to integrate log analytics or create custom dashboards if needed. The data flow feature in Azure Data Factory is valuable for data transformation tasks, especially for those who may not have expertise in writing complex code. It simplifies the process of data manipulation and is particularly useful for individuals unfamiliar with Spark coding. While there could be improvements for more flexibility, overall, the data flow feature effectively accomplishes its purpose within Azure Data Factory's ecosystem.
Ryan Ferdon - PeerSpot reviewer
Low-code makes development faster than with Python, but there were caching issues
If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was. It was kind of buggy sometimes. And when we ran the flow, it didn't go from a perceived start to end, node by node. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail. That was not because the job was wrong, but because Pentaho decided to go at everything at once, and something would process before it was supposed to. There were nodes you could add to make sure that, before this node kicks off, all these others have processed, but it was a bit tedious. There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or if it was fixed in version 10. Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks. One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The most valuable feature is the copy activity."
"The solution can scale very easily."
"The valuable feature of Azure Data Factory is its integration capability, as it goes well with other components of Microsoft Azure."
"We have been using drivers to connect to various data sets and consume data."
"It's extremely consistent."
"The overall performance is quite good."
"Data Factory's best feature is the ease of setting up pipelines for data and cloud integrations."
"One of the most valuable features of Azure Data Factory is the drag-and-drop interface. This helps with workflow management because we can just drag any tables or data sources we need. Because of how easy it is to drag and drop, we can deliver things very quickly. It's more customizable through visual effect."
"The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to trawl through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
"It has improved our data integration capabilities."
"It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
"The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
"It is easy to use, install, and start working with."
"Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool things is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing."
"Provides a good open source option."
"One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
 

Cons

"An area for improvement in Azure Data Factory is its speed. Parallelization also needs improvement."
"The performance could be better. It would be better if Azure Data Factory could handle a higher load. I have heard that it can get overloaded, and it can't handle it."
"The initial setup is not very straightforward."
"The solution needs to be more connectable to its own services."
"Data Factory would be improved if it were a little more configuration-oriented and not so code-oriented and if it had more automated features."
"The main challenge with implementing Azure Data Factory is that it processes data in batches, not near real-time. To achieve near real-time processing, we need to schedule updates more frequently, which can be an issue. Its interface needs to be lighter."
"I would like to see the time travel feature from Snowflake added to Azure Data Factory."
"The pricing scheme is very complex and difficult to understand."
"I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."
"If you develop it on a MacBook, it'll be quite a hassle."
"I would like to see more improvements with AS400 DB2."
"Parallel execution could be better in Pentaho. It's very simple but I don't think it works well."
"As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
"A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."
"In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version."
"It's not very stable, at least not in the case of the Community Edition. I'm working with the Community Edition right now, and I think that may be why it is not very stable; it sometimes causes the system to hang. I'm not sure if this is the case for paid tiers."
 

Pricing and Cost Advice

"The pricing is pay-as-you-go or reserved instance. Of the two options, reserved instance is much cheaper."
"The solution's fees are based on a pay-per-minute use plus the amount of data required to process."
"Data Factory is affordable."
"The cost is based on the number of data sets that we are ingesting."
"In terms of licensing costs, we pay somewhere around $14,000 USD per month. There are some additional costs. For example, we would have to subscribe to some additional computing and for elasticity, but they are minimal."
"I am aware of the pricing of Azure Data Factory, but I prefer not to disclose specific details."
"For our use case, it is not expensive. We take into the picture everything: resources, learning curve, and maintenance."
"The price is fair."
"The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good. At this time, there are no additional costs. We just have the licensing fees."
"I use it because it is free. I download from their page for free. I don't have to pay for a license. With other tools, I have to pay for the licenses. That is why I use Pentaho."
"I mostly used the open-source version. I didn't work with a license."
"When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho."
"We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it."
"I primarily work on the Community Version, which is available to use free of charge."
"For most development tasks, the Enterprise edition should be sufficient. It depends on the type of support that you require for your production environment."
"I think Lumada's price is fair compared to some of the others, like BusinessObjects, which was the other thing that I used at my previous job. BusinessObjects' price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
824,053 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

Financial Services Firm: 13%
Computer Software Company: 12%
Manufacturing Company: 9%
Healthcare Company: 7%

Financial Services Firm: 23%
Computer Software Company: 15%
Government: 8%
Comms Service Provider: 5%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Azure Data Factory compare with Informatica PowerCenter?
Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up an...
How does Azure Data Factory compare with Informatica Cloud Data Integration?
Azure Data Factory is a solid product offering many transformation functions. It has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power Q...
Which ETL tool would you recommend to populate data from OLTP to OLAP?
Hi Rajneesh, yes, here is the feature comparison between the Community and Enterprise editions: https://www.hitachivantara.com/en-us/pdf/brochure/leverage-open-source-benefits-with-assurance-of-hita...
What do you think can be improved with Hitachi Lumada Data Integrations?
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, ...
What do you use Hitachi Lumada Data Integrations for most frequently?
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could us...
 

Also Known As

No data available
Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
 


Sample Customers

Adobe, BMW, Coca-Cola, General Electric, Johnson & Johnson, LinkedIn, Mastercard, Nestle, Pfizer, Samsung, Siemens, Toyota, Unilever, Verizon, Walmart, Accenture, American Express, AT&T, Bank of America, Cisco, Deloitte, ExxonMobil, Ford, General Motors, IBM, JPMorgan Chase, Microsoft (Azure Data Factory is developed by Microsoft), Oracle, Procter & Gamble, Salesforce, Shell, Visa
66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
Find out what your peers are saying about Azure Data Factory vs. Pentaho Data Integration and Analytics and other solutions. Updated: December 2024.