
AWS Glue vs Pentaho Data Integration and Analytics comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

ROI

Sentiment score: 6.9
AWS Glue offers cost-efficient solutions for limited pipelines, enhancing efficiency and ROI, despite installation overhead and budget constraints.
Sentiment score: 7.9
Pentaho offers cost-effective integration, reducing ETL time, lowering expenses, and enhancing competitiveness with open-source flexibility and efficiency.
I advocate using Glue in such cases.
 

Customer Service

Sentiment score: 6.8
AWS Glue support is praised for responsiveness and effectiveness, despite some complaints about response times and cost-related issues.
Sentiment score: 5.2
Users rely on community support over customer service due to mixed experiences, despite responsive technical support and Hitachi's involvement.
AWS's documentation is reliable, and careful reference often resolves missed upgrade details.
Communication with the vendor is challenging.
 

Scalability Issues

Sentiment score: 8.0
AWS Glue offers easy scalability with serverless architecture, supporting diverse data requirements and receiving high user ratings.
Sentiment score: 7.3
Pentaho excels in scalability and efficient data handling but faces challenges with exceptionally large data and complex growth scenarios.
It is beneficial to upgrade jobs, and we conduct extensive testing in development before migrating to production.
Pentaho Data Integration handles larger datasets better.
 

Stability Issues

Sentiment score: 8.0
AWS Glue is highly reliable and stable, with few issues reported, though improvements are suggested for larger datasets.
Sentiment score: 7.1
Pentaho Data Integration offers reliability for small to midsize operations but may lag and freeze with complex uses.
It's pretty stable; however, it struggles when dealing with smaller amounts of data.
 

Room For Improvement

AWS Glue users seek faster start-up, enhanced integration, user-friendly features, better performance, and improved documentation and support.
Pentaho needs improvements in big data performance, error handling, UI, scheduling, backward compatibility, cloud integration, and Python support.
With AWS, I gather data from multiple sources, clean it up, normalize it, de-duplicate it, and make it presentable.
Migrating jobs from version 3.0 to 4.0 can present compatibility issues.
Pentaho Data Integration is very friendly, but it is not very useful when there isn't a lot of data to handle.
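The workflow one reviewer describes above (gather data from multiple sources, clean it, normalize it, de-duplicate it) can be pictured with a small, tool-agnostic sketch. This is plain Python, not AWS Glue or Pentaho code; the field names, the sample rows, and the choice of `email` as the de-duplication key are all assumptions made for illustration.

```python
def clean_record(record):
    """Strip surrounding whitespace from strings and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue
        cleaned[key] = value
    return cleaned


def normalize_record(record):
    """Lower-case the (hypothetical) email field so equal addresses compare equal."""
    normalized = dict(record)
    if isinstance(normalized.get("email"), str):
        normalized["email"] = normalized["email"].lower()
    return normalized


def deduplicate(records, key="email"):
    """Keep the first record seen for each key value; keep records without the key."""
    seen = set()
    unique = []
    for record in records:
        marker = record.get(key)
        if marker is not None and marker in seen:
            continue
        if marker is not None:
            seen.add(marker)
        unique.append(record)
    return unique


def run_pipeline(*sources):
    """Gather rows from every source, then clean, normalize, and de-duplicate."""
    gathered = [row for source in sources for row in source]
    prepared = [normalize_record(clean_record(row)) for row in gathered]
    return deduplicate(prepared)


# Two made-up sources that overlap on one customer.
crm_rows = [{"name": " Ada ", "email": "ADA@example.com "}]
billing_rows = [
    {"name": "Ada", "email": "ada@example.com", "plan": ""},
    {"name": "Grace", "email": "grace@example.com"},
]
result = run_pipeline(crm_rows, billing_rows)
```

Here the two "Ada" rows collapse into one because cleaning and normalizing run before de-duplication; reversing that order would leave both rows in place.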
 

Setup Cost

AWS Glue's pricing is flexible but can be costly and unpredictable compared to AWS alternatives like Lambda or EMR.
Pentaho offers a cost-effective solution with its free Community Edition and affordable subscription-based Enterprise Edition for varying needs.
Costing depends on resource usage, and cost optimization may involve redesigning jobs for flexibility.
AWS charges based on runtime, which can be quite pricey.
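Because reviewers tie cost to runtime and resource usage, a rough back-of-the-envelope model helps show why redesigning a job to run faster lowers the bill. The sketch below is only an illustration: the function names are hypothetical, and the hourly rate per worker is an assumed placeholder, not a quoted price. Check the vendor's current pricing page for real figures.

```python
def estimate_run_cost(runtime_minutes, workers, rate_per_worker_hour):
    """Cost of one run: hours of runtime x number of workers x hourly rate per worker."""
    return (runtime_minutes / 60) * workers * rate_per_worker_hour


def estimate_monthly_cost(runs_per_month, runtime_minutes, workers, rate_per_worker_hour):
    """Monthly cost when the same job runs a fixed number of times."""
    per_run = estimate_run_cost(runtime_minutes, workers, rate_per_worker_hour)
    return round(runs_per_month * per_run, 2)


# Assumed placeholder rate per worker-hour (hypothetical, for illustration only).
ASSUMED_RATE = 0.44

# The same daily job before and after a redesign that halves its runtime.
before = estimate_monthly_cost(30, 60, 10, ASSUMED_RATE)
after = estimate_monthly_cost(30, 30, 10, ASSUMED_RATE)
savings = round(before - after, 2)
```

Under this simple model, cost scales linearly with both runtime and worker count, so halving either one halves the estimate.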
 

Valuable Features

AWS Glue enables automated, cost-effective ETL with serverless architecture, seamless integrations, and user-friendly tools for large-scale data handling.
Pentaho provides an intuitive, open-source platform for efficient ETL development and data integration with minimal coding and broad compatibility.
For ETL, I feel the performance is excellent. If I create jobs in a standard way, the performance is great, and maintenance is also seamless.
I think if I'm working with big data, common languages like Python work quite nicely, which is advantageous.
It's easy to use and friendly, especially for larger data sets.
 

Categories and Ranking

AWS Glue
Average Rating: 7.8
Reviews Sentiment: 7.0
Number of Reviews: 48
Ranking in other categories: Cloud Data Integration (1st)
Pentaho Data Integration an...
Average Rating: 8.0
Reviews Sentiment: 6.9
Number of Reviews: 53
Ranking in other categories: Data Integration (23rd)
 

Featured Reviews

Muthuvel Sivaraman - PeerSpot reviewer
Handles a huge volume of data and is serverless, but it can be considered costly by some users
We use Amazon's services to provide technical support for the product. If you want to have support, Oracle and others offer a single support, and other tools have a direct support window. For Amazon, we need to pay 10 percent of my billing amount for the tool to get support services. Whether to raise a support ticket or not is an issue since ten percent is a huge amount. My company ends up using all the options without help from support. It is very difficult for any common man to understand why there is a need to pay ten percent for support. If I find an issue in the product, and I need to get support from AWS to fix it, then I need to pay ten percent of the tool's bill amount to Amazon. AWS is a very tricky tool because everything is evolving nowadays. AWS engineers are getting hired from other places, and even after that, if I am not getting any technical support, then things will be very nasty. There are some good engineers who help users outside the normal support cycle, but it doesn't meet their needs. I rate the technical support a four out of ten.
Ryan Ferdon - PeerSpot reviewer
Low-code makes development faster than with Python, but there were caching issues
If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was. It was kind of buggy sometimes. And when we ran the flow, it didn't go from a perceived start to end, node by node. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail. That was not because the job was wrong, but because Pentaho decided to go at everything at once, and something would process before it was supposed to. There were nodes you could add to make sure that, before this node kicks off, all these others have processed, but it was a bit tedious. There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or if it was fixed in version 10. Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks. One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.
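The ordering problem this reviewer describes (everything starts at once unless extra nodes force a step to wait for its predecessors) is, in general terms, a dependency-ordering problem. The sketch below is not Pentaho's API or behavior; it is a generic illustration using Python's standard `graphlib` module, with hypothetical step names, of how declaring "this step needs those steps first" yields a safe execution order.

```python
from graphlib import TopologicalSorter

# Each (hypothetical) step maps to the set of steps that must finish before it.
dependencies = {
    "load_warehouse": {"clean_orders", "clean_customers"},
    "clean_orders": {"extract_orders"},
    "clean_customers": {"extract_customers"},
}


def execution_order(deps):
    """Return one order in which every step runs after all of its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
for step in order:
    print(step)
```

Steps with no mutual dependency (here, the two extracts) may appear in either order, which is also why they could safely run in parallel; only the declared prerequisites are guaranteed to come first.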
832,138 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 22%
Computer Software Company: 13%
Manufacturing Company: 8%
Insurance Company: 6%
Financial Services Firm: 22%
Computer Software Company: 14%
Government: 8%
Comms Service Provider: 5%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Talend Open Studio compare with AWS Glue?
We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in...
What are the most common use cases for AWS Glue?
AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or ma...
Which ETL tool would you recommend to populate data from OLTP to OLAP?
Hi Rajneesh, yes here is the feature comparison between the community and enterprise edition : https://www.hitachivantara.com/en-us/pdf/brochure/leverage-open-source-benefits-with-assurance-of-hita...
What do you think can be improved with Hitachi Lumada Data Integrations?
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, ...
What do you use Hitachi Lumada Data Integrations for most frequently?
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could us...
 

Also Known As

No data available
Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
 

Overview

 

Sample Customers

bp, Cerner, Expedia, FINRA, Hess, Intuit, Kellogg's, Philips, TIME, Workday
66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
Find out what your peers are saying about AWS Glue vs. Pentaho Data Integration and Analytics and other solutions. Updated: January 2025.