Data Engineer at a tech services company with 501-1,000 employees
Offers good documentation, stability but error handling is difficult
Pros and Cons
- "It's very good to manage."
- "AWS Glue's error handling is difficult."
What is our primary use case?
I use AWS Glue for data processing. Some of my colleagues have data for software, and I use AWS Glue to transform and inspect this data.
What is most valuable?
It's very good to manage. It is easy to integrate other products with AWS.
Glue integrates with other AWS processes and networks. So, it's quite easy to integrate.
I've worked with AI integration but I haven't gone into much depth on that topic.
What needs improvement?
AWS Glue's error handling is difficult.
The errors in AWS are very hard to handle. The screen is very hard to understand.
I have to use CloudWatch to look into whatever the error was, including new ones, and I would need to go through this with someone. It's not so easy for me, and there are more things related to this.
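As a rough illustration of the triage described above, the sketch below filters a list of log events for error markers and counts repeated messages. It assumes the events have already been fetched from CloudWatch Logs and are plain dictionaries with `timestamp` and `message` keys; the function name `summarize_errors`, the marker strings, and the sample messages are all hypothetical.

```python
from collections import Counter


def summarize_errors(events, markers=("ERROR", "Exception", "Traceback")):
    """Return the events that contain an error marker and a count per first line."""
    error_events = [e for e in events if any(m in e["message"] for m in markers)]
    # Group by the first line so multi-line stack traces collapse into one entry.
    counts = Counter(e["message"].splitlines()[0].strip() for e in error_events)
    return error_events, counts


# Hypothetical sample events, shaped like CloudWatch Logs events.
sample_events = [
    {"timestamp": 1720080000000, "message": "INFO Job run started"},
    {"timestamp": 1720080005000, "message": "ERROR AnalysisException: cannot resolve column 'price'"},
    {"timestamp": 1720080009000, "message": "ERROR AnalysisException: cannot resolve column 'price'"},
    {"timestamp": 1720080012000, "message": "INFO Job run finished"},
]

errors, counts = summarize_errors(sample_events)
for message, n in counts.most_common():
    print(f"{n}x {message}")
```

Grouping by the first line makes it easier to tell a recurring error from a new one, which is the distinction the reviewer finds hard to make on the console.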
For how long have I used the solution?
I have been using it for a year and a half.
Buyer's Guide
AWS Glue
November 2024
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.
What do I think about the stability of the solution?
I would rate the stability a nine out of ten.
What do I think about the scalability of the solution?
I would rate the scalability a seven out of ten.
My data is small, so we need to consider more days. We need to deal with what we have, but I understand the documentation.
Some people find it hard, but I rated it a seven. In my company, TechOps uses AWS with about 1,200 users.
Which solution did I use previously and why did I switch?
I worked with Databricks. In my opinion, Databricks is improving and is easier to use. It's more user-friendly, and I think it's better overall.
How was the initial setup?
I work at a big company, and most of the setup is already done, much like working from a blueprint. The configuration is already working in another place, so the only thing I have to do with the cloud is the remote configuration.
What's my experience with pricing, setup cost, and licensing?
AWS can be expensive.
What other advice do I have?
Overall, I would rate it a seven out of ten. I would recommend it.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Jul 4, 2024
Data engineer at nust
Better than other tools for ETL jobs, but needs better documentation
Pros and Cons
- "AWS Glue is quite better than other tools, but you have to learn it properly before you start using it."
- "While working on AWS Glue, I could not find any training material for it."
What is our primary use case?
I built a straightforward ETL job using AWS Glue, in which I had to load a couple of files into a Teradata database.
What is most valuable?
AWS Glue is quite better than other tools, but you have to learn it properly before you start using it.
What needs improvement?
While working on AWS Glue, I could not find any training material for it. Although it's not a problem with the product, the solution could include better documentation.
For how long have I used the solution?
I have been using AWS Glue for about two months.
What do I think about the stability of the solution?
AWS Glue is a stable solution.
How was the initial setup?
AWS Glue's initial setup is quite straightforward.
What other advice do I have?
Overall, I rate AWS Glue a seven out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
CEO and Founder at HartB
Improved our time to implement a new ETL process and has a good price and scalability, but only works with AWS
Pros and Cons
- "The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features."
- "The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."
What is our primary use case?
It is a good tool for us. All the implementation in our company is done with AWS Glue. We use it to execute all the ETL processes. We have collected more or less five terabytes of information from the internet by now. We process all this data in our cloud platform and normalize the information. We first put it on a data lake that we have here on the AWS tool. After that, we use AWS Glue to transform all the information collected around the internet and put the normalized information into a data warehouse.
How has it helped my organization?
It has improved the time to implement a new ETL process by 30%. We have also seen a big improvement in the data science area.
What is most valuable?
The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features.
What needs improvement?
The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS.
For how long have I used the solution?
I have been using this solution for two years.
What do I think about the stability of the solution?
In terms of stability, we had some problems in the past, but now, it is okay. AWS provides SLA, and the integration of the tools is good.
What do I think about the scalability of the solution?
Scalability is a very strong point of this solution as compared to other solutions like PowerCenter and Pentaho. In Pentaho, you need to install a lot of machines, but in AWS Glue, you just need to find out how many instances you need. You just put this information in a form and click okay. Magically, you have the scaled processes.
We have 35 users of this solution, and they are engineers, DevOps, and data scientists. We have a lot of plans to increase the usage of AWS Glue in 2021.
How are customer service and technical support?
In the first year of using it, we had a lot of problems with the solution. Our team found more or less five bugs if I remember correctly. Our experience with AWS support was very good. The team in the US helped us to resolve the problems and fix the bugs. We are AWS partners.
Which solution did I use previously and why did I switch?
Before AWS Glue, we worked with Talend, PowerCenter, and Pentaho. In the case of PowerCenter, the biggest problem for us was the plugins because they were too expensive. That was the negative point of PowerCenter.
In the case of Talend, the problem was that in Brazil, we didn't have professionals with the skills to work with Talend. In addition, we had to use the command-line interface, which was a terrible thing because it took more time as compared to other solutions.
In the case of Pentaho, we had the same problem as Talend. We didn't have a lot of professionals. Of course, we have some courses to train people in Pentaho. We work with the biggest companies in Brazil, and we need professionals every day, but we don't have professionals with experience in Pentaho.
How was the initial setup?
The initial setup process is totally easy. You just need to put some information in the forms, and then you just need to click some buttons, and it is complete. The process to provide a new infrastructure with AWS Glue takes from 10 minutes to an hour.
What about the implementation team?
We have all the professionals inside the company.
What's my experience with pricing, setup cost, and licensing?
Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is also good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients.
In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend.
What other advice do I have?
I would rate AWS Glue a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Principal System Architect at a transportation company with 1,001-5,000 employees
Used for data engineering ETL jobs to extract, transform, and load data
Pros and Cons
- "The solution’s most valuable feature is the ETL job."
- "The solution’s technical support could be improved."
What is our primary use case?
AWS Glue is essentially used for data engineering ETL jobs to extract, transform, and load data. We use it to clean data. You have multiple data sources from your application that are not so clean. You have this data and may want to delete certain columns or fill in certain data in an Excel sheet. That's where the extract part comes in. Then, you transform, drop, or make the data uniform and load it to your destination like a data warehouse.
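The cleaning steps described above (dropping columns, filling in missing values, making the data uniform) can be sketched in plain Python. This is only an illustration of the idea, not Glue code: a real Glue job would typically do this in PySpark. The function `clean_rows`, the column names, and the default values are hypothetical.

```python
def clean_rows(rows, drop_columns, defaults):
    """Drop unwanted columns, trim strings, and fill empty values with defaults."""
    cleaned = []
    for row in rows:
        out = {}
        for key, value in row.items():
            if key in drop_columns:
                continue  # transform: remove columns that are not needed downstream
            if isinstance(value, str):
                value = value.strip()  # make string values uniform
            if value in (None, ""):
                value = defaults.get(key, value)  # fill in missing data
            out[key] = value
        cleaned.append(out)
    return cleaned


# Hypothetical extracted rows from an application export.
raw_rows = [
    {"id": "1", "city": " Lisbon ", "internal_note": "x", "country": ""},
    {"id": "2", "city": "Porto", "internal_note": "y", "country": "PT"},
]

clean = clean_rows(raw_rows, drop_columns={"internal_note"}, defaults={"country": "UNKNOWN"})
for row in clean:
    print(row)
```

The cleaned rows are what would then be loaded into the destination, such as a data warehouse.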
What is most valuable?
The solution’s most valuable feature is the ETL job. AWS Glue is an easy-to-use solution. AWS Glue integrates seamlessly with other AWS services like Athena, Redshift, and S3.
What needs improvement?
The solution’s technical support could be improved.
For how long have I used the solution?
I have been using AWS Glue for a few months.
What do I think about the stability of the solution?
AWS Glue is a stable solution.
I rate the solution’s stability eight and a half out of ten.
What do I think about the scalability of the solution?
In the future, our data sets are going to increase. For now, the solution's scalability is fine.
Which solution did I use previously and why did I switch?
I previously used Data Pipeline, and I tried using Lambda.
How was the initial setup?
The solution’s initial setup is easy.
What other advice do I have?
AWS Glue is built for large datasets, and it does the job perfectly. I would recommend the solution to other users.
Overall, I rate the solution eight and a half out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Sep 12, 2024
Consultant Data junior at a computer software company with 51-200 employees
User-friendly visual interface, but only a few built-in transformations
Pros and Cons
- "The most valuable feature for me is the visual interface of AWS Glue."
- "The product has only a few built-in transformations."
What is our primary use case?
The primary use cases of AWS Glue in our organization are for implementing ETL processes and for data flow.
What is most valuable?
The most valuable feature for me is the visual interface of AWS Glue. It is user-friendly and it is not complicated. Moreover, the coding part of AWS Glue allows users to upload their scripts after dropping some components. The product has flexibility and scalability, which is common in most cloud tools.
What needs improvement?
The product has only a few built-in transformations; support for building additional custom transformations could be improved in the next release.
For additional features, I would like documentation that maps legacy ETL tools to their equivalents in AWS, to make it easier for users to migrate their ETL processing to the cloud. It would save time and help users find the best transformation or solution to satisfy their new business needs.
For how long have I used the solution?
I have been using this solution for three months, and I am using the latest version.
What do I think about the stability of the solution?
The stability is good; I have not faced any crashes so far.
What do I think about the scalability of the solution?
I would rate its scalability a seven out of ten.
Which solution did I use previously and why did I switch?
I used a product called SysTrack. For me, it was just a switch from SysTrack to AWS Glue.
What's my experience with pricing, setup cost, and licensing?
The pricing depends on the usage, such as the number of users, computers, and the time jobs run.
What other advice do I have?
Overall, I would rate this product a seven out of ten. It is a good product, but I have not experienced all the additional features.
Which deployment model are you using for this solution?
Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Operations executive at Wipro Infotech
Good support, user-friendly, and AWS-integrated
Pros and Cons
- "It is AWS-integrated. There is end-to-end integration with the other AWS services. It is also user-friendly."
- "There should be more connectors for different databases."
What is our primary use case?
We are using it for day-to-day ETL jobs. It is being used to transfer data from Teradata to the cloud.
We are using its latest version.
What is most valuable?
It is AWS-integrated. There is end-to-end integration with the other AWS services. It is also user-friendly.
What needs improvement?
There should be more connectors for different databases.
For how long have I used the solution?
I have been using this solution for almost a year.
What do I think about the stability of the solution?
It is stable.
What do I think about the scalability of the solution?
It is scalable. We have almost 40 users.
How are customer service and support?
Their support is very good. I would rate them a five out of five.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We were not using any other solution previously.
How was the initial setup?
It was straightforward. Within a couple of hours, it was done.
What other advice do I have?
Before you start using it, you need to know PySpark.
I would rate it a nine out of ten. It is good for what we are using it for.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Net Full-Stack developer at a tech services company with 201-500 employees
A stable solution which can easily integrate with other AWS services
Pros and Cons
- "One of the best features of the solution is its ability to easily integrate with other AWS services."
- "Overall, I consider the technical support to be fine, although the response time could be faster in certain cases."
What is our primary use case?
We use the solution as a layer for loading data from the source systems.
What is most valuable?
One of the best features of the solution is its ability to easily integrate with other AWS services. Since we are using AWS as our main cloud provider, it's easy to put everything together. It is very flexible when it comes to compute features. We find the solution very useful when we make use of certain scripts. In some cases, it allows us to get rid of duplicates.
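The review does not show the scripts used to remove duplicates. As a minimal sketch of the general technique, assuming rows are dictionaries and that a duplicate means "same values in the key columns", something like the following would keep the first occurrence of each key. The function `drop_duplicates` and the sample data are hypothetical; in a PySpark script the equivalent step is usually `DataFrame.dropDuplicates`.

```python
def drop_duplicates(rows, key_columns):
    """Keep the first row seen for each combination of key column values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows loaded from a source system.
rows = [
    {"order_id": 10, "customer": "A", "amount": 5.0},
    {"order_id": 10, "customer": "A", "amount": 5.0},
    {"order_id": 11, "customer": "B", "amount": 7.5},
]

deduplicated = drop_duplicates(rows, key_columns=["order_id", "customer"])
print(len(deduplicated))
```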
What needs improvement?
When you need to configure connections to different database sources for a target, it would be good if roles were easier to deal with. Configuring connections for a different target takes a certain amount of time: you have to configure the VPC, create the required roles, and check that everything is okay. It would save time if this process were made easier and more user-friendly.
The technical support depends on the type of question, whether there is a need to understand additional inter-related information on multiple levels. Overall, I consider the technical support to be fine, although the response time could be faster in certain cases.
For how long have I used the solution?
I have been using AWS Glue for about two years.
What do I think about the stability of the solution?
The solution is stable.
How are customer service and support?
While the technical support can vary with the type of question, I feel that, overall, it is okay, although receipt of information could be faster in certain cases.
Which solution did I use previously and why did I switch?
We previously had experience with Database Migration Service at AWS. I recommend it over AWS Glue if one needs to do full database migration from on-premises deployment or in cases involving large volumes of data.
How was the initial setup?
I handled the installation on my own.
What's my experience with pricing, setup cost, and licensing?
I consider the price to be standard-plus when it comes to optimal usage.
What other advice do I have?
I rate AWS Glue as an eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Team Lead at a financial services firm with 5,001-10,000 employees
It can generate the code and has a good user interface, but it lacks Java support
Pros and Cons
- "Its user interface is quite good. You just need to choose some options to create a job in AWS Glue. The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you."
- "Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background."
What is our primary use case?
We are using it for file ingestion. Its primary role is to ingest a file from a vendor to a database.
What is most valuable?
Its user interface is quite good. You just need to choose some options to create a job in AWS Glue.
The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you.
What needs improvement?
Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background.
For how long have I used the solution?
I have been using AWS Glue for three months. We have just started using these services.
What do I think about the stability of the solution?
We have not been using AWS Glue for a long time. Till now, we haven't found any issues.
How are customer service and technical support?
Their technical support is good. We faced an issue with AWS Glue where we had to read a flat file. In a flat file, you only have spaces. You don't have commas or anything else. AWS Glue does not directly support flat files. You need to provide it with an expression to read the file, and that expression itself has a limit on the number of characters.
We contacted the AWS support team. They had a call with us and first tried to understand our problem and then our use case. We gave them some sample files for our use case, and they came up with a solution for this limitation. There are some custom patterns in AWS Glue that can be used. Even though they took some time, they provided the solution. If you give them a file today, they will take three to four days to get back to you.
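The review does not give the expression or the custom patterns that AWS support suggested. Purely to illustrate the idea of reading a space-separated flat file with a pattern, the sketch below uses a Python regular expression with named groups. The record layout (six-digit account, name, amount), the pattern `RECORD`, and the function `parse_line` are hypothetical and are not the solution from the review.

```python
import re

# Hypothetical layout: 6-digit account, free-text name, then an amount,
# with fields separated by runs of spaces rather than commas.
RECORD = re.compile(
    r"^(?P<account>\d{6})\s+(?P<name>.+?)\s{2,}(?P<amount>\d+\.\d{2})\s*$"
)


def parse_line(line):
    """Return the fields of one flat-file line as a dict, or None if it does not match."""
    match = RECORD.match(line)
    return match.groupdict() if match else None


record = parse_line("000123  Jane Doe        150.75")
print(record)
print(parse_line("not a valid record"))
```

Requiring at least two spaces before the amount lets the name itself contain single spaces, which is one common way to handle files that have no explicit delimiter.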
How was the initial setup?
It was straightforward. A lot of documentation is available on the AWS website, which can guide you through the simple steps to set it up. Its setup was easy for me.
What's my experience with pricing, setup cost, and licensing?
It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us.
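To make the "charged for the time it runs" point concrete, here is a small cost estimate for a job that runs once a day. Glue job pricing is expressed per DPU-hour, but the rate, the number of DPUs, and the run duration below are placeholder assumptions, not figures from the review; check the current AWS pricing page for the actual rate in your region. The sketch also ignores any minimum billing duration.

```python
def estimate_monthly_cost(dpus, minutes_per_run, runs_per_month, rate_per_dpu_hour):
    """Estimate monthly job cost as DPUs x hours per run x runs x hourly rate."""
    hours_per_run = minutes_per_run / 60
    return dpus * hours_per_run * runs_per_month * rate_per_dpu_hour


# Placeholder values: 2 DPUs, 15 minutes per run, one run per day for 30 days,
# and an assumed rate of 0.44 per DPU-hour.
cost = estimate_monthly_cost(
    dpus=2, minutes_per_run=15, runs_per_month=30, rate_per_dpu_hour=0.44
)
print(f"{cost:.2f}")
```

With a short daily run, the total stays small because nothing is billed while the job is idle, which matches the reviewer's experience.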
What other advice do I have?
We have just recently started to use this solution. We haven't used all features properly. It is good for the features we are using. We did not find any drawbacks or limitations so far. We are already getting whatever we want from it.
I would rate AWS Glue a seven out of ten. It needs improvements in terms of Java support and the turnaround time for our problems.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Product Categories
Cloud Data Integration
Popular Comparisons
Informatica Intelligent Data Management Cloud (IDMC)
Informatica PowerCenter
SSIS
MuleSoft Anypoint Platform
Oracle Data Integrator (ODI)
webMethods.io
Talend Open Studio
Confluent
IBM InfoSphere DataStage
AWS Database Migration Service
Oracle GoldenGate
Palantir Foundry
SAP Data Services
StreamSets
Oracle Integration Cloud Service
Quick Links
Learn More: Questions:
- Which is the best choice for cloud integration: AWS Glue or Informatica Intelligent Cloud Services (IICS)?
- Is AWS Glue a difficult solution to use if you are a complete beginner?
- Is AWS Glue effective for AWS-related products only?
- Why would you choose AWS Glue over other tools?
- What are the most common use cases for AWS Glue?
- How does Talend Open Studio compare with AWS Glue?
- Does AWS Glue offer more flexibility than other ETL (Extract, Transform, Load) tools in terms of data loading?
- Oracle ICS vs ODI
- What is data lake storage?
- When evaluating Cloud Data Integration, what aspect do you think is the most important to look for?