
AWS Glue vs Pentaho Data Integration and Analytics comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

AWS Glue
Average Rating: 7.8
Reviews Sentiment: 7.0
Number of Reviews: 46
Ranking in other categories: Cloud Data Integration (1st)
Pentaho Data Integration an...
Average Rating: 8.0
Reviews Sentiment: 6.9
Number of Reviews: 52
Ranking in other categories: Data Integration (24th)
 

Featured Reviews

Ajaykumar Myana - PeerSpot reviewer
Provides serverless mechanism, easy data transformation and automated infrastructure management
We no longer had to worry much about infrastructure management because AWS Glue is serverless, and Amazon takes care of the underlying infrastructure. This allowed us to focus on the code and application logic without concerns about scaling, CPU management, or handling fluctuations in flow. The serverless nature of Glue jobs relieved us from these infrastructure-related worries.
Ryan Ferdon - PeerSpot reviewer
Low-code makes development faster than with Python, but there were caching issues
If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was. It was kind of buggy sometimes. And when we ran the flow, it didn't go from a perceived start to end, node by node. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail. That was not because the job was wrong, but because Pentaho decided to go at everything at once, and something would process before it was supposed to. There were nodes you could add to make sure that, before this node kicks off, all these others have processed, but it was a bit tedious. There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or if it was fixed in version 10. Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks. One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.
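The ordering problem described in the review above, where steps start together unless explicit blocking nodes are added, comes down to respecting dependencies between steps. As a minimal, tool-agnostic sketch (this is not Pentaho's API; the step names and the `run_in_dependency_order` helper are hypothetical), Python's standard-library `graphlib` module (Python 3.9+) can run each step only after everything it depends on has finished:

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(dependencies, run_step):
    """Run each step only after all of its prerequisite steps have finished.

    `dependencies` maps a step name to the set of steps it must wait for.
    Returns the order in which the steps were executed.
    """
    executed = []
    # static_order() yields every step after all of its predecessors.
    for step in TopologicalSorter(dependencies).static_order():
        run_step(step)
        executed.append(step)
    return executed


# Hypothetical flow: both loads must finish before the merge starts.
flow = {
    "load_orders": set(),
    "load_customers": set(),
    "merge": {"load_orders", "load_customers"},
    "write_report": {"merge"},
}

if __name__ == "__main__":
    run_in_dependency_order(flow, lambda step: print("running", step))
```

Declaring the dependencies once and letting a sorter derive the order avoids hand-placing a blocking node before every step that must wait.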

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The AWS Glue Data Catalog provides metadata management and schema discovery. AWS Glue simplifies data transformation with automatic schema detection, incremental data updates, and integration with other AWS services."
"The two features I find most valuable in AWS Glue are its user interface and ease of use."
"The solution is stable and reliable."
"Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues."
"The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need."
"I like that it's flexible, powerful, and allows you to write your own queries and scripts to get the needed transformations."
"It's very good to manage."
"The solution’s most valuable feature is the ETL job."
"The amount of data that it loads and processes is good."
"The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."
"It has improved our data integration capabilities​."
"The solution has a free to use community version."
"Data transformation within Pentaho is a nice feature that they have and that I value."
"Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
"One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
"The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
 

Cons

"I would like to see a more robust interface on the no-code side. This would be nice to be able to split cells."
"Beginners need additional support as it currently lacks some features required for complex transformations, often necessitating custom Python coding."
"The solution could be cheaper. The price of the solution is an area that needs improvement."
"The solution should offer features for streaming data in addition to batching data."
"Setting up pipelines is challenging, especially with version control and testing requirements."
"It fails to handle massive databases acquired from various sources."
"The price of the solution could improve."
"I would like to see stable libraries at the moment they are not there."
"Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."
"One thing that I don't like, just a little, is the backward compatibility."
"​I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support.​"
"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
"In terms of the flexibility to deploy in any environment, such as on-premise or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud."
"I would like to see improvements made for real-time data processing."
"The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
 

Pricing and Cost Advice

"AWS Glue is a paid service that doesn't come under the free trial of AWS."
"AWS Glue uses a pay-as-you-go approach which is helpful. The price of the overall solution is low and is a great advantage."
"AWS Glue follows a pay-as-you-go model, wherein the cost of the data you use will be counted as a monthly bill."
"Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients. In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend."
"If you are using the solution for an enterprise business, it will be expensive."
"It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."
"This solution is affordable and there is an option to pay for the solution based on your usage."
"The solution's pricing is based on DPUs so it is a good idea to optimize use or it can get expensive."
"We did a two or three-year deal the last time we did it. As compared to other solutions, at least so far in our experience, it has been very affordable. The licensing is by component. So, you need to make sure you only license the components that you really intend to use. I am not sure if we have relicensed after the Hitachi acquisition, but previously, multi-year renewals resulted in a good discount. I'm not sure if this is still the case. We've had the full suite for a lot of years, and there is just the initial cost. I am not aware of any additional costs."
"The cost of these types of solutions are expensive. So, we really appreciate what we get for our money. Though, we don't think of the solution as a top-of-the-line solution or anything like that."
"The price of the regular version is not reasonable and it should be lower."
"You don't need the Enterprise Edition, you can go with the Community Edition. That way you can use it for free and, for free, it's a pretty good tool to use."
"The solution reduced our ETL development time by a lot because a whole project used to take about a month to get done previously. After having Lumada, it took just a week. For a big company in Brazil, it saves a team at least $10,000 a month."
"We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it."
"I think Lumada's price is fair compared to some of the others, like BusinessObjects, which is was the other thing that I used at my previous job. BusinessObject's price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
"The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good. At this time, there are no additional costs. We just have the licensing fees."
824,067 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

AWS Glue
Financial Services Firm: 22%
Computer Software Company: 13%
Manufacturing Company: 8%
Insurance Company: 6%

Pentaho Data Integration and Analytics
Financial Services Firm: 23%
Computer Software Company: 15%
Government: 8%
Comms Service Provider: 5%
 

Company Size

[Chart: reviewers by company size (Large Enterprise, Midsize Enterprise, Small Business)]
 

Questions from the Community

How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Talend Open Studio compare with AWS Glue?
We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in...
What are the most common use cases for AWS Glue?
AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or ma...
Which ETL tool would you recommend to populate data from OLTP to OLAP?
Hi Rajneesh, yes here is the feature comparison between the community and enterprise edition : https://www.hitachivantara.com/en-us/pdf/brochure/leverage-open-source-benefits-with-assurance-of-hita...
What do you think can be improved with Hitachi Lumada Data Integrations?
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, ...
What do you use Hitachi Lumada Data Integrations for most frequently?
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could us...
 

Also Known As

AWS Glue: No data available
Pentaho Data Integration and Analytics: Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
 


Sample Customers

AWS Glue: bp, Cerner, Expedia, FINRA, HESS, Intuit, Kellogg's, Philips, TIME, Workday
Pentaho Data Integration and Analytics: 66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
Find out what your peers are saying about AWS Glue vs. Pentaho Data Integration and Analytics and other solutions. Updated: December 2024.