AWS Glue vs Pentaho Data Integration and Analytics comparison

 

Comparison Buyer's Guide


Categories and Ranking

AWS Glue
Average Rating: 7.8
Reviews Sentiment: 7.1
Number of Reviews: 45
Ranking in other categories: Cloud Data Integration (1st)

Pentaho Data Integration and Analytics
Average Rating: 8.0
Reviews Sentiment: 5.8
Number of Reviews: 51
Ranking in other categories: Data Integration (30th)

Featured Reviews

Ajaykumar Myana - PeerSpot reviewer
Provides serverless mechanism, easy data transformation and automated infrastructure management
We no longer had to worry much about infrastructure management because AWS Glue is serverless, and Amazon takes care of the underlying infrastructure. This allowed us to focus on the code and application logic without concerns about scaling, CPU management, or handling fluctuations in flow. The serverless nature of Glue jobs relieved us from these infrastructure-related worries.
Ryan Ferdon - PeerSpot reviewer
Low-code makes development faster than with Python, but there were caching issues
If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got, the slower it was. It was kind of buggy sometimes. And when we ran the flow, it didn't go from start to end, node by node, as you would expect. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail, not because the job was wrong, but because Pentaho started everything at once and something would process before it was supposed to. There were nodes you could add to make sure that, before a given node kicks off, all the others have processed, but it was a bit tedious. There were also caching issues: the cache would fill up and the program wouldn't run, so we had to write code to clear the cache every time we opened it. I don't know how hard that would be for them to fix, or whether it was fixed in version 10. Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks. One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set it up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

AWS Glue
"Its ease of use, cost-effectiveness, and highly secure architecture are some of the most valuable features."
"Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs."
"The most valuable feature of AWS Glue is that it provides a GUI format with a drag-and-drop feature."
"The solution helps organizations gain flexibility in defining the structure of the data."
"The most valuable feature for me is the visual interface of AWS Glue."
"We no longer had to worry much about infrastructure management because AWS Glue is serverless, and Amazon takes care of the underlying infrastructure."
"The key role for Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution, and our clients have been very satisfied with it."
"It is AWS-integrated. There is end-to-end integration with the other AWS services. It is also user-friendly."

Pentaho Data Integration and Analytics
"The way it has improved our product is by giving our users the ability to do ad hoc reports, which is very important to our users. We can do predictive analysis on trends coming in for contracts, which is what our product does. The product helps users decide which way to go based on the predictive analysis done by Pentaho. Pentaho is not doing predictions, but reporting on the predictions that our product is doing. This is a big part of our product."
"The product is user-friendly and intuitive."
"The amount of data that it loads and processes is good."
"The solution has a free-to-use community version."
"We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
"It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
"We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues with respect to deployment of code or with code working in one environment but not in another. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines."
"This solution allows us to create pipelines using a minimal amount of custom coding."
 

Cons

AWS Glue
"In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement."
"The drawbacks associated with the product stem from the fact that, depending on the data volume, it can become very costly."
"The solution's visual ETL tool is of no use for actual implementation."
"I have encountered challenges with multi-region support."
"The interface for AWS Glue could improve; they do not provide a lot of detail. You can write the code in PySpark or Scala, which is a big advantage, but it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment."
"The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."
"The solution could be cheaper. The price of the solution is an area that needs improvement."
"The process of entering environment variables in AWS Glue requires navigating to a different page, which could be streamlined."

Pentaho Data Integration and Analytics
"The testing and quality could really improve. Every time there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and effort, it is quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi."
"In terms of the flexibility to deploy in any environment, such as on-premises or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud."
"Some of the scheduling features in Lumada drive me buggy. The one issue that always drives me up the wall is when Daylight Saving Time changes. It doesn't take that into account elegantly. Every time it changes, I have to do something. It's not a big deal, but it's annoying."
"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's a feature we've been asking for since the beginning, and it hasn't been implemented yet."
"Although it is a low-code solution with a graphical interface, the error messages you get are often the kind that only a developer would be happy with. You get a big stack of red text and Java errors displayed on the screen, and less technical people can get intimidated by that wall of red error messages. Other graphical tools that are aimed at the power-user level provide a much more user-friendly experience in dealing with your exceptions and guiding the user to where they've made the mistake."
"The product needs more plugins."
"I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support."
"The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's our impression."
 

Pricing and Cost Advice

AWS Glue
"AWS Glue is a high-priced solution that bills the client $150,000 to $250,000 annually."
"I would rate the solution a six or seven on a scale of one to ten, with ten being very expensive. Specifically, I rate its pricing a six out of ten."
"AWS Glue follows a pay-as-you-go model, in which the cost of the data you use is charged as a monthly bill."
"It is an expensive product. I rate its pricing a nine out of ten."
"I rate the tool an eight on a scale of one to ten, where one is cheap and ten is expensive."
"Technical support is a paid service, and what you get depends on your subscription. You must pay for one of them, and it ranges from $15,000 to $25,000 per year."
"If you are using the solution for an enterprise business, it will be expensive."
"I rate pricing an eight out of ten."

Pentaho Data Integration and Analytics
"You don't need the Enterprise Edition; you can go with the Community Edition. That way you can use it for free, and for free, it's a pretty good tool to use."
"There is a good open source option (Community Edition)."
"We did a two- or three-year deal the last time we did it. Compared to other solutions, at least so far in our experience, it has been very affordable. The licensing is by component, so you need to make sure you only license the components that you really intend to use. I am not sure if we have relicensed after the Hitachi acquisition, but previously, multi-year renewals resulted in a good discount. I'm not sure if this is still the case. We've had the full suite for a lot of years, and there is just the initial cost. I am not aware of any additional costs."
"The solution reduced our ETL development time by a lot: a whole project used to take about a month to complete. After we adopted Lumada, it took just a week. For a big company in Brazil, it saves a team at least $10,000 a month."
"I think Lumada's price is fair compared to some of the others, like BusinessObjects, which was the other tool I used at my previous job. BusinessObjects' price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
"These types of solutions are expensive, so we really appreciate what we get for our money. That said, we don't think of the solution as a top-of-the-line solution or anything like that."
"For most development tasks, the Enterprise edition should be sufficient. It depends on the type of support that you require for your production environment."
"The price of the regular version is not reasonable, and it should be lower."

Top Industries

By visitors reading reviews

AWS Glue
Financial Services Firm: 21%
Computer Software Company: 14%
Manufacturing Company: 8%
Insurance Company: 6%

Pentaho Data Integration and Analytics
Financial Services Firm: 23%
Computer Software Company: 14%
Government: 7%
Comms Service Provider: 5%


Questions from the Community

How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Talend Open Studio compare with AWS Glue?
We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in...
What are the most common use cases for AWS Glue?
AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or ma...
Which ETL tool would you recommend to populate data from OLTP to OLAP?
Hi Rajneesh, yes here is the feature comparison between the community and enterprise edition : https://www.hitachivantara.com/en-us/pdf/brochure/leverage-open-source-benefits-with-assurance-of-hita...
What do you think can be improved with Hitachi Lumada Data Integration?
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, ...
What do you use Hitachi Lumada Data Integration for most frequently?
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could us...
 

Also Known As

AWS Glue: No data available
Pentaho Data Integration and Analytics: Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
 


Sample Customers

AWS Glue: bp, Cerner, Expedia, Finra, HESS, Intuit, Kellogg's, Philips, TIME, Workday
Pentaho Data Integration and Analytics: 66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
Find out what your peers are saying about AWS Glue vs. Pentaho Data Integration and Analytics and other solutions. Updated: October 2024.