
AWS Data Pipeline [EOL] vs AWS Glue comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

AWS Data Pipeline [EOL]
Average Rating
8.0
Number of Reviews
2
Ranking in other categories
No ranking in other categories
AWS Glue
Average Rating
7.8
Reviews Sentiment
7.1
Number of Reviews
43
Ranking in other categories
Cloud Data Integration (1st)
 

Featured Reviews

Geoffrey Leigh - PeerSpot reviewer
Jun 9, 2023
A stable, scalable, and reliable solution for moving and processing data
We're only considering enhancing the presentation layer to give a more multidimensional OLAP view that AWS seems to have decided on. Redshift with the data mart structure is like an OLAP cube. Oracle Analytics Cloud is an over-code killer and is not what we need. I was looking at Mondrian, which used to be part of the open-source stack from another vendor that works. Still, I am also looking at some of the other OLAP environments like Kaiser and perhaps decided to go to Azure with Microsoft Azure analysis cloud, but that's not multidimensional either as SSAS used to be. We tried the Mondrian, and that didn't perform how we expected. So, we are looking at resetting something to perform as an OLAP in the cloud, particularly AWS, so that we might consider an Azure solution.
Ajaykumar Myana - PeerSpot reviewer
Jul 31, 2023
Provides serverless mechanism, easy data transformation and automated infrastructure management
I had the source data, which was unstructured and non-fixable, and my responsibility was to convert it into structured data. For this task, I used PySpark as the programming language. With Python, I implemented the creation of a data frame using Glue jobs. Since Glue jobs are a serverless…

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"It is a stable solution...It is a scalable solution."
"The most valuable feature of the solution is that orchestration and development capabilities are easier with the tool."
"Its ease of use, cost-effectiveness, and highly secure architecture are some of the most valuable features."
"Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues."
"The key role for Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution and our clients have been very satisfied with it."
"One of the best features of the solution is its ability to easily integrate with other AWS services."
"Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
"You do not need many frameworks to run Glue."
"The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs."
"AWS Glue is fast and managed by AWS. Hence, you don't have to worry about capacity and the performance of Glue jobs. It has integrations with other data stores of AWS. The product offers metadata management, logging, and ETL processing capabilities. It comes with a powerful feature, Glue Studio, which helps to do queries interactively within the community. It is a managed service and very secure. Another popular and mature service is S3."
 

Cons

"The user-defined functions have shortcomings in AWS Data Pipeline."
"It's almost semi-automatic because you must review and approve code push, which works well. Still, we had many problems getting there during the deployment process, but we got there."
"It fails to handle massive databases acquired from various sources."
"The solution’s stability could be improved."
"Cost-wise, AWS Glue is expensive, so that's an area for improvement. The process for setting up the solution was also complex, which is another area for improvement."
"The solution should offer features for streaming data in addition to batching data."
"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
"I have encountered challenges with multi-region support."
"The monitoring is not that good."
"I would like to see stable libraries; at the moment, they are not there."
 

Pricing and Cost Advice

"I rate the pricing between six to eight on a scale from one to ten, where one is low price, and ten is high price."
"The way we use it, I think it is fair as we're getting a good value for money compared to having a server or some other data pipeline."
"The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied. There are also some time limits like 0 to 10 minutes or 10 to 20 minutes. If the pricing is according to the minutes, it would be better because you have to limit your job to 10 minutes or 20 minutes."
"It is an expensive product. I rate its pricing a nine out of ten."
"AWS Glue uses a pay-as-you-go approach which is helpful. The price of the overall solution is low and is a great advantage."
"The current cost is around forty to fifty thousand a month."
"It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."
"The solution's pricing is based on DPUs so it is a good idea to optimize use or it can get expensive."
"This solution is affordable and there is an option to pay for the solution based on your usage."
"AWS Glue is a high-priced solution that bills the client $150,000 to $250,000 annually."
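
The DPU-based pricing the reviewers describe above can be sketched as simple arithmetic. This is a rough estimate only: the $0.44 per DPU-hour rate and the 10-minute billing blocks are taken from the reviewer's comment, not from current AWS documentation, and the function name `glue_job_cost` and its parameters are hypothetical. Check current AWS pricing before relying on the numbers.

```python
import math

# Rate quoted by the reviewer (USD per DPU per hour); an assumption, not current AWS pricing.
DPU_HOUR_RATE_USD = 0.44


def glue_job_cost(dpus, runtime_minutes, billing_increment_minutes=10,
                  rate=DPU_HOUR_RATE_USD):
    """Estimate the cost of one job run in USD.

    Runtime is rounded up to the next billing increment (10 minutes by
    default, as the reviewer describes), then multiplied by the number
    of DPUs and the hourly rate.
    """
    if dpus <= 0 or runtime_minutes < 0 or billing_increment_minutes <= 0:
        raise ValueError("dpus and billing increment must be positive, runtime non-negative")
    # Round the runtime up to a whole number of billing increments.
    increments = math.ceil(runtime_minutes / billing_increment_minutes)
    billed_minutes = increments * billing_increment_minutes
    return round(dpus * (billed_minutes / 60) * rate, 4)


if __name__ == "__main__":
    # One DPU for one hour, as in the reviewer's example.
    print(glue_job_cost(1, 60))
    # Ten DPUs for one hour: the cost scales linearly with DPUs.
    print(glue_job_cost(10, 60))
    # A 12-minute job on 5 DPUs is billed as 20 minutes under 10-minute blocks.
    print(glue_job_cost(5, 12))
    # The same job if billing were per minute, which the reviewer would prefer.
    print(glue_job_cost(5, 12, billing_increment_minutes=1))
```

Comparing the last two calls shows why the reviewer asks for per-minute pricing: under block billing, a job that runs slightly past a block boundary pays for the whole next block.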
814,763 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Computer Software Company: 21%
Financial Services Firm: 20%
Government: 8%
Manufacturing Company: 5%

Financial Services Firm: 21%
Computer Software Company: 14%
Manufacturing Company: 8%
Insurance Company: 6%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
No data available
 

Questions from the Community

What do you like most about AWS Data Pipeline?
The most valuable feature of the solution is that orchestration and development capabilities are easier with the tool.
What is your experience regarding pricing and costs for AWS Data Pipeline?
I rate the pricing between six to eight on a scale from one to ten, where one is low price, and ten is high price.
What needs improvement with AWS Data Pipeline?
The user-defined functions have shortcomings in AWS Data Pipeline. The user-defined functions could be one of the areas where I can write a custom function and embed it as a part of AWS Data Pipeli...
How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Talend Open Studio compare with AWS Glue?
We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in...
What are the most common use cases for AWS Glue?
AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or ma...
 

Overview

 

Sample Customers

bp, Cerner, Expedia, FINRA, Hess, Intuit, Kellogg's, Philips, TIME, Workday
Find out what your peers are saying about Amazon Web Services (AWS), Informatica, Salesforce and others in Cloud Data Integration. Updated: October 2024.