What needs improvement with AWS Glue?

AWS Glue is a serverless cloud data integration tool that facilitates the discovery, preparation, movement, and integration of data from multiple sources for machine learning (ML), analytics, and application development. The solution includes additional productivity and data ops tooling for running jobs, implementing business workflows, and authoring. AWS Glue allows users to connect to more than 70 diverse data sources and manage data in a centralized data catalog. The solution facilitates...

score 0 · Answer 1 · 2025-01-09T16:39:00Z

I actually don't like it. It is good, but I find it quite clunky and code-heavy, which is my biggest problem. I am quite technical, yet when it comes to data pipelines, I prefer using GUIs over code-based calls. It might be easier with a different skill set. I am using an ETL tool that allows writing without needing dedicated data engineers; however, it's a bit more technical and requires some skills. It works well for big data, yet for typical financial services, other data pipelines perform better. Glue is arguably more for techies. They have introduced a drag-and-drop GUI, but it's still primitive, in my view. With AWS, I gather data from multiple sources, clean it up, normalize it, de-duplicate it, and make it presentable. Glue is mainly used for this purpose. I have worked with it a few times, but it's not my tool of choice.

Nivas Srinivasan Principal Consultant at a retailer with 1,001-5,000 employees · Answer 2 · 2025-01-02T11:21:00Z

Improvements in the UI are needed, as it is challenging to understand some functionalities. Glue is quite customizable, but technical issues, particularly with internal errors, can pose challenges. New version upgrades can also be problematic. For example, migrating jobs from version 3.0 to 4.0 can present compatibility issues. Changes might be needed to ensure the job fits all versions. Despite these challenges, upgrading our systems enhances performance. Learning the latest functionalities is crucial, and while challenging, it is a vital part of staying current and ensuring an efficient ETL process.

Andre Luis Tiago Soares Developer-Data Engineer at Collab · Answer 3 · 2024-11-19T17:26:05Z

Setting up pipelines is challenging, especially with version control and testing requirements. While the initial setup is easy, it doesn't accommodate more complex development needs. You might feel hesitant about changing pipelines that are already running and processing business-critical data due to limited versioning and testing capabilities.

Nitish Kumar Mahatha Site Reliability Engineer (AWS) at KFin Technologies Ltd · Answer 4 · 2024-10-29T07:19:00Z

AWS Glue should be more reliable and faster in processing. Enhancing the speed of data processing would be beneficial.

Anuj Saraswat AWS DATA ENGINEER at Coforge Growth Agency · Answer 5 · 2024-10-21T10:50:00Z

I see scope for improvement in the drag-and-drop feature of AWS Glue. Beginners need additional support as it currently lacks some features required for complex transformations, often necessitating custom Python coding.

reviewer2541582 Principal System Architect at a transportation company with 1,001-5,000 employees · Answer 6 · 2024-09-06T15:45:25Z

The solution’s technical support could be improved.

Rajesh Ramadoss Technology Specialist at Cognizant · Answer 7 · 2024-08-07T12:20:00Z

Comparatively, it is very difficult to learn the tool and remember its syntax. Sometimes, I face issues integrating the solution with third-party services or services that are not a part of Glue. Such integrations take a lot of time, and not much content about them is available on the internet.

Muthuvel Sivaraman AVP at a manufacturing company with 10,001+ employees · Answer 8 · 2024-06-21T06:35:50Z

The main drawback of the product is that, depending on the data volume, it can become very costly. There is a huge cost if the source system is not properly designed. If the changes are frequent and not valid, then, initially, you will use huge amounts of data in the ETL. The biggest challenges are associated with AWS Glue's costs, which account for one-third of my entire pipeline cost.

Senthil Kumar Veerasamy Senior Manager, Analytics at Azendian · Answer 9 · 2024-01-18T08:38:03Z

Since AWS Glue is not like an enterprise ETL tool, we need to put quite a lot of effort into customization. The solution has a visual editor, but most ETL transformations cannot be implemented or constructed using it. We always have to write a script. The solution's visual ETL tool is of no use for actual implementation.

ParamShah Engineering Manager at Milestone Technologies · Answer 10 · 2024-01-16T09:21:00Z

There are output limitations and configuration of its three parts. There was a lot of trial and error that we had to go through. It is not clear how the partition discovery would have been affected by more data coming in. We've made some expensive mistakes, which, if there were any tutorials available or if there was easy documentation available with FAQs, could have been avoided. There is documentation, but it doesn't cover all. There are three specific partition changes, and AWS Glue is tightly tied to Athena. We don't have much flexibility in managing the Athena. AWS Glue could integrate with an AI model or a more advanced version that processes chat-based inputs rather than configuration. This would align it more closely with the functionalities of chat-based interfaces, making it easier to adopt.

reviewer2290962 VP- Cloud Data/ Solution Architect at a financial services firm with 10,001+ employees · Answer 11 · 2023-10-09T14:32:26Z

I have encountered challenges with multi-region support.

Neelabh Sharma Data Engineer at Scania · Answer 12 · 2023-09-11T14:24:31Z

The product is expensive for data streaming compared to EMR. This area needs improvement.

Mbaye Babacar Gueye Owner at a tech services company with 51-200 employees · Answer 13 · 2023-09-01T19:46:13Z

One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools. Additionally, AWS Glue can sometimes be slow, especially when processing large datasets. Also, I couldn't directly use bucketed data. With Elastic Glue, you had to convert your data frames into the correct format before connecting them using the drag-and-drop interface. I didn't like that because the conversion process wasn't straightforward. In future releases, I would like to see a feature that could trigger a Glue pipeline using an API or something similar.
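
On triggering a pipeline through an API: the Glue API does expose a StartJobRun operation, which boto3 surfaces as `start_job_run` on the Glue client. The sketch below is a minimal illustration, assuming the job already exists. The job name `nightly-etl`, the `--run_date` argument, the helper `start_glue_job`, and the `FakeGlueClient` stand-in are all hypothetical and exist only so the example runs without AWS credentials.

```python
from typing import Any, Dict, Optional


def start_glue_job(client: Any, job_name: str,
                   arguments: Optional[Dict[str, str]] = None) -> str:
    """Start a Glue job run and return its run ID.

    `client` is expected to behave like boto3.client("glue").
    """
    kwargs: Dict[str, Any] = {"JobName": job_name}
    if arguments:
        # Job arguments are passed as a string-to-string mapping.
        kwargs["Arguments"] = arguments
    response = client.start_job_run(**kwargs)
    return response["JobRunId"]


class FakeGlueClient:
    """Hypothetical stand-in so the sketch runs without AWS access."""

    def __init__(self) -> None:
        self.calls = []

    def start_job_run(self, **kwargs: Any) -> Dict[str, str]:
        self.calls.append(kwargs)
        return {"JobRunId": "jr_example"}


if __name__ == "__main__":
    # With real credentials you would instead pass boto3.client("glue").
    fake = FakeGlueClient()
    run_id = start_glue_job(fake, "nightly-etl", {"--run_date": "2023-09-01"})
    print(run_id)
```

Passing the client in as a parameter keeps the helper easy to test; in real use the same call could be made from a scheduler, a Lambda function, or a CI step.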

RajKumar23 Sr Associate at Cognizant · Answer 14 · 2023-08-03T09:08:10Z

The solution’s stability could be improved.

AmitMataghare Associate Director at PricewaterhouseCoopers · Answer 15 · 2023-08-03T04:25:26Z

AWS Glue Studio has undergone a lot of enhancements in the last couple of months. The solution would benefit from a more user-friendly interface with features like drag and drop for building transformations. It would also be a good improvement if the product itself supported different kinds of transformations, so that the pipeline we want to create could be built easily; right now, we have to write code to do so in our company. Only people who can code, either in Java or Python, can use the product freely. Those who don't know Java or Python might find using AWS Glue difficult. AWS has pricing for spot instances that reduces the cost substantially, but that is not available for AWS Glue. Spot instance pricing is offered for products like EC2, and if the same were introduced for AWS Glue, the pricing could be reduced substantially.

score 0 · Answer 16 · 2023-07-31T17:41:50Z

In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement. Faster code execution would be beneficial. If AWS could enhance the serverless execution capabilities, like increasing CPU, RAM, and processing speed, that would be great.

Shifa Shah Data engineer at nust · Answer 17 · 2023-05-24T12:30:04Z

While working on AWS Glue, I could not find any training material for it. Although it's not a problem with the product, the solution could include better documentation.

score 0 · Answer 18 · 2023-04-26T09:07:00Z

We face performance issues when using AWS Glue for data transformation and integration. It takes almost three to four hours to execute single transformations, which is a lot. We want to improve the performance to meet customer requirements. Mainly, I am focused on improving the performance aspect because the customer is keen on this improvement.

reviewer1526064 Associate Consultant at Tata Consultancy · Answer 19 · 2023-04-20T10:59:00Z

The solution could be cheaper. The price of the solution is an area that needs improvement.

score 0 · Answer 20 · 2023-03-09T22:01:42Z

The product has only a few built-in transformations; additional custom-building transformations could be improved in the next release. For additional features, I would like documentation on the equivalent of legacy ETL tools and their equivalent in AWS to make it easier for users to migrate their ETL processing to the cloud. It would save time and help users find the best transformation or solution to satisfy their new business needs.

Syed Zakaulla Project Manager at Softway · Answer 21 · 2023-02-13T20:14:36Z

AWS Glue had some issues, which required optimization, particularly in terms of the number of workers you deploy, and that's where costing comes in. Cost-wise, AWS Glue is expensive, so that's an area for improvement. My company did some modifications, which turned out to be successful, so overall, the solution works fine. Even though there is a backup, you need to know what's happening. You need to understand why there's a failure. AWS Glue doesn't provide the information, so my company uses its logs. The development team also doesn't have specific answers because the team is still playing around with the process, which means the company is still trying to figure out other areas for improvement in AWS Glue. The process for setting up the solution was also complex, which is another area for improvement. AWS should provide help during migration and assist its users. Otherwise, it's a nightmare.
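
The link between worker count and cost mentioned above can be made concrete with simple arithmetic, since Glue bills Spark jobs by DPU-hour. The sketch below is a rough estimate only. The rate of 0.44 USD per DPU-hour and the default of 1 DPU per worker are assumptions to replace with the current figures for your region and worker type, and billing minimums and rounding rules are ignored. The function name `estimate_glue_job_cost` is hypothetical.

```python
def estimate_glue_job_cost(workers: int,
                           runtime_minutes: float,
                           dpu_per_worker: float = 1.0,
                           usd_per_dpu_hour: float = 0.44) -> float:
    """Rough cost of one job run: workers x DPUs per worker x hours x rate.

    The default rate and DPU figures are assumptions, not quoted prices.
    """
    if workers <= 0 or runtime_minutes < 0:
        raise ValueError("workers must be positive and runtime non-negative")
    dpu_hours = workers * dpu_per_worker * (runtime_minutes / 60.0)
    return round(dpu_hours * usd_per_dpu_hour, 2)


if __name__ == "__main__":
    # Doubling the workers only pays off if the runtime drops accordingly.
    print(estimate_glue_job_cost(workers=10, runtime_minutes=60))  # 4.4
    print(estimate_glue_job_cost(workers=20, runtime_minutes=30))  # 4.4
    print(estimate_glue_job_cost(workers=20, runtime_minutes=45))  # 6.6
```

Under these assumptions, adding workers is cost-neutral only when the job scales close to linearly; a job that runs 45 minutes on 20 workers instead of 60 minutes on 10 costs about half as much again.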

reviewer2070318 Manager at a construction company with 51-200 employees · Answer 22 · 2023-01-19T18:04:06Z

In general, I would like to see documentation on the limitations on which loads you can actually pull in when you are running Python. The addition of the Python Jupyter Notebook has been nice. Generally speaking, though, you cannot import every LOB. You can import branders now and you can use photos, but you cannot import a lot of the other sorts of statistical-based loads. That is an issue currently. I would also like to see a more robust interface on the no-code side. It would be nice to be able to split cells.

score 0 · Answer 23 · 2022-11-25T20:48:52Z

The mapping area and the use of the data catalog from Glue could be better. I would say those two are the main things we'd like to see improvements on. The solution needs support for big data. As I understand it, Glue is based on Lambdas and Lambdas have some limitations as far as running them continuously. Sometimes they get dropped, and they have to be reinitialized.

Sainagaraju Vaduka Data solution architect at a pharma/biotech company with 5,001-10,000 employees · Answer 24 · 2022-10-28T15:16:30Z

I would like to see stable libraries; at the moment, they are not there.

Murilo Hallgren Data Engineer at a consultancy with self employed · Answer 25 · 2022-10-17T14:45:15Z

The price of the solution could improve.

Liana Iuhas CEO at Quark Technologies SRL · Answer 26 · 2022-09-01T11:06:20Z

The interface for AWS Glue could improve; it does not show a lot of detail. You can write the code in PySpark or in Scala, which is a big advantage, but it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment. If business users want to run their own graphs, they will not be able to use features such as running code inside AWS Glue in Spark, which will be complex for them.

Ankit Shukla Data Engineer at YASH Technologies · Answer 27 · 2022-07-20T15:04:13Z

The monitoring is not that good. We'd like job progress to be clearer; right now, the way we can view it is not that good. The issue is that it is mostly Python or Scala code-based. The UX is lacking. There is a bit of a learning curve, particularly during the setup process. More connectors should be included.

Sashi Dhar Operations executive at Wipro Infotech · Answer 28 · 2022-07-18T07:42:56Z

There should be more connectors for different databases.

Diksha Hirole Data Engineer at BlazeClan Technologies · Answer 29 · 2022-07-01T09:23:35Z

There are a couple of issues with AWS Glue. First, AWS Control randomly logs off, which disturbs coding. Second, if there's a cluster-related configuration, we have to set up worker nodes, which is quite a headache when processing a large amount of data. In the next release, AWS Glue should include more transformations with AWS Studio.

Suraj Sachdeva Data Engineer | Developer at Sakshath Technologies · Answer 30 · 2022-06-21T13:28:38Z

The technical support for this solution could be improved. In future, we would like to connect more services like Athena or Kinesis to help control more loads of data.

score 0 · Answer 31 · 2022-06-16T15:42:50Z

It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options.

reviewer1084386 ECM CONSULTANT/ARCHITECT/SOFTWARE DEVELOPER, DELUXE MN at a tech services company with 5,001-10,000 employees · Answer 32 · 2021-12-02T16:14:50Z

There is a learning curve to this tool.

score 0 · Answer 33 · 2021-10-21T11:50:32Z

When there is a need to configure connections to different database sources in respect of the target, it would be good if it were easier to deal with roles. I am referring to the need to configure connections in a different target process, something which would require a certain time outlay for configuring VPC and checking that everything is okay, in respect of the creation of required roles. It would save time were this process to be made easier and more user friendly. The technical support depends on the type of question, whether there is a need to understand additional inter-related information on multiple levels. Overall, I consider the technical support to be fine, although the response time could be faster in certain cases.

Bruno Ramos CEO and Founder at HartB · Answer 34 · 2020-12-17T18:52:47Z

The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS.

score 0 · Answer 35 · 2020-10-14T06:36:55Z

Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background.

score 0 · Answer 36 · 2020-09-03T07:49:46Z

The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five to eight minutes before it starts. If the start-up time were reduced to one or two minutes, it would be great. It would also be better to have a direct linkage to Redshift in AWS. If we could use data catalogs from Redshift, it would be much easier to create data catalogs. Currently, we can only use data catalogs from S3.