We use AWS Glue for building ETL pipelines.
Module Lead at Mphasis
Provides inbuilt data quality and cataloging features, but it is costly compared to other tools
Pros and Cons
- "The most valuable feature of AWS Glue is that it provides a GUI format with a drag-and-drop feature."
- "AWS Glue is more costly compared to other tools like Airflow."
What is our primary use case?
What is most valuable?
The most valuable feature of AWS Glue is that it provides a GUI format with a drag-and-drop feature. The solution also offers a no-code option, so you can build a pipeline in AWS Glue without writing any code.
If your pipeline is built on AWS services, AWS Glue interacts with them better than third-party tools do. The solution provides inbuilt data quality and cataloging features. AWS Glue is a complete package, so you can do everything in one place.
What needs improvement?
AWS Glue is more costly compared to other tools like Airflow. It would be better if the solution's pricing could be reduced. The default scheduling that AWS Glue provides is not as good as Airflow's. The scheduler could be improved because you cannot customize it.
For how long have I used the solution?
I have been using AWS Glue for more than three years.
Buyer's Guide
AWS Glue
January 2025
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
831,265 professionals have used our research since 2012.
What do I think about the stability of the solution?
AWS Glue is a stable product because it's an AWS-managed service. We can directly contact AWS for any issues we face. If there are any glitches in the version, AWS solves the issue by creating patches or version upgrades to the solution.
What do I think about the scalability of the solution?
Around 100 to 200 people are using the solution in our organization.
How was the initial setup?
As AWS Glue is a SaaS product, you don't have to set up anything. You can just create a new job, select the cluster or nodes you want it to run on, write your script, and run the job. Little administration is required for the solution's setup.
What's my experience with pricing, setup cost, and licensing?
AWS Glue is a paid service that doesn't come under the free trial of AWS. You have to pay a charge for using the solution.
What other advice do I have?
If you run jobs only once or twice a day, AWS Glue will not cost you much, and it is a reasonable choice. If you run jobs four to six times a day, or hourly, it will be costlier than other tools, and using another tool would be a better option.
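The trade-off between run frequency and cost can be roughed out with simple arithmetic. The sketch below is only an illustration: it assumes a rate of 0.44 USD per DPU-hour (the figure another reviewer on this page quotes), and the DPU count, run length, and function name are hypothetical values, not figures from this reviewer. Actual rates vary by region and Glue version.

```python
# Rough monthly cost estimate for a scheduled AWS Glue job.
# ASSUMPTIONS (illustrative only): 0.44 USD per DPU-hour, 10 DPUs,
# 15-minute runs, 30-day month. Replace with your own figures.

RATE_PER_DPU_HOUR = 0.44


def monthly_cost(runs_per_day, dpus=10, minutes_per_run=15, days=30):
    """Return the estimated monthly cost in USD."""
    dpu_hours_per_run = dpus * minutes_per_run / 60
    return round(runs_per_day * days * dpu_hours_per_run * RATE_PER_DPU_HOUR, 2)


# Compare a few schedules, from once a day to hourly.
for runs in (1, 2, 5, 24):
    print(f"{runs:>2} runs/day: ${monthly_cost(runs):.2f} per month")
```

Under these assumptions the cost grows linearly with the number of runs, which is why an hourly schedule looks so much more expensive than one or two runs a day.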
Our company decided to go with AWS Glue because the tools we were using in the pipeline were AWS services only. AWS Glue easily interacts with AWS services. The jobs we were running were also not frequent.
You can use AWS Glue for learning purposes, but testing on AWS Glue will be costly. You can learn by testing basic Spark code directly on any local system. Once you are comfortable that your code is working fine, you can run it in AWS Glue jobs.
If a person is familiar with Spark jobs or Python jobs, they can easily learn AWS Glue. A new user will take the same amount of time to learn AWS Glue as they take to get comfortable with Spark. Since it provides a GUI and a no-code option, users can start using AWS Glue directly without having to learn any code, which makes the solution much easier to learn.
Overall, I rate the solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at Tata Consultancy
Good integration with other AWS services but 30-minute runtime is limiting for large data sets
Pros and Cons
- "The solution integrates well with other AWS products or services."
- "On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded."
What is our primary use case?
Our company has five data engineers who use the solution for metadata catalogs and ETL pipelines that are built in S3 or EC2.
What is most valuable?
The solution integrates well with other AWS products or services.
What needs improvement?
The solution needs to extend its 30-minute query or runtime limit. Sometimes it fails with certain workloads, such as Athena queries, due to the limited runtime. Some large data sets exceed the limit during busy hours, so we try to avoid failures by running them at idle times or at night.
On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded. This can be quite confusing.
For how long have I used the solution?
I have been using the solution for one year.
What do I think about the stability of the solution?
The solution is stable 99% of the time but is somewhat dependent on workloads. Heavy workloads or bigger data sets sometimes fail. Improvements are needed for stability to be consistent.
How are customer service and support?
I have not needed technical support but my colleagues have used it. I do not have relevant feedback to report.
How was the initial setup?
The setup requires some knowledge and understanding of how the solution works.
What about the implementation team?
We implemented the solution in-house.
What's my experience with pricing, setup cost, and licensing?
I rate pricing an eight out of ten.
Which other solutions did I evaluate?
There are other options that offer equivalent service such as SRA and GCP but the solution is fully integrated with AWS where our data resides so we take full advantage of that.
What other advice do I have?
I rate the solution a five out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Senior Developer for cloud services at Coforge Growth Agency
Efficiently handles and moves data around, contributing to streamlined operations
Pros and Cons
- "One aspect that I would like to highlight is the Glue Crawler, which we utilize when working with large datasets to ensure the schema updates seamlessly without requiring end-team knowledge."
- "The point for improvement in AWS Glue would be the dynamic allocation of resources while utilizing Lambda functions."
What is our primary use case?
I use AWS Glue mostly for data ingestion or data extraction from multiple sources.
How has it helped my organization?
AWS Glue plays a central role in AI-based solutions and machine learning workflows by efficiently handling and moving data around, contributing to streamlined operations.
What is most valuable?
One aspect that I would like to highlight is the Glue Crawler, which we utilize when working with large datasets to ensure the schema updates seamlessly without requiring end-team knowledge.
Additionally, the pipeline orchestration and scheduling of ETL pipelines in AWS Glue are also highly effective. AWS Glue supports AI-driven projects and DAG transformations by facilitating efficient data handling required for machine learning workflows.
What needs improvement?
The point for improvement in AWS Glue would be the dynamic allocation of resources while utilizing Lambda functions. Currently, Lambda functions encounter a time-out error after fifteen minutes, necessitating an improvement in this area. Moreover, more practical examples and resources for advanced features would be beneficial for users.
For how long have I used the solution?
With AWS Glue, I have one year of experience.
What do I think about the stability of the solution?
In terms of stability, there are areas that could be improved, particularly with Lambda functions and step functions. There should be better scaling for parallel computation. Despite this, AWS's overall stability is quite good.
What do I think about the scalability of the solution?
AWS Glue receives a nine out of ten for scalability, although we feel certain functions should scale better for enhanced parallel computation.
How are customer service and support?
I would rate the technical support for AWS Glue as ten out of ten. The support is high-quality.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup of AWS Glue could be straightforward for new users of AWS services, but there should be more practical examples and guides for advanced features to assist newcomers in learning complex concepts.
What about the implementation team?
Currently, I work on the implementation within our development team, focusing on cloud services.
What's my experience with pricing, setup cost, and licensing?
I find the pricing for AWS Glue quite affordable. For students or new users, AWS offers free credits, and as usage increases, the pay-as-you-go model provides flexibility without being expensive.
What other advice do I have?
I would recommend AWS Glue to others and rate the overall solution nine out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
Last updated: Nov 24, 2024
Associate Director - Delivery (Technology DWH & Data Engineer) at MOBIUS KNOWLEDGE SERVICES PRIVATE LIMITED
Efficiently integrates and transforms data but lacks in scalability
Pros and Cons
- "I like its integration and ability to handle all data-related tasks."
- "We face performance issues when using AWS Glue for data transformation and integration."
What is our primary use case?
Our primary use cases include pulling data from multiple sources and loading it into the central capacity for data transformation, integration, and processing.
What is most valuable?
I like its performance, integration, and its ability to handle all data-related tasks.
What needs improvement?
We face performance issues when using AWS Glue for data transformation and integration. It takes almost three to four hours to execute single transformations, which is a lot. We want to improve the performance to meet customer requirements.
Mainly, I am focused on improving the performance aspect because the customer is keen on this improvement.
For how long have I used the solution?
I have been using AWS Glue for five years. I am using the latest version.
What do I think about the stability of the solution?
I would rate the stability of AWS Glue a seven out of ten. There are some performance issues.
What do I think about the scalability of the solution?
I would rate the scalability of AWS Glue a six out of ten. There are over 25 users in my company.
How are customer service and support?
The customer service and support team is okay.
How would you rate customer service and support?
Neutral
How was the initial setup?
It is a cloud-based solution, so no installation procedure is required. One DevOps engineer is required for the maintenance of AWS Glue.
What other advice do I have?
Based on the customer scenario, I have previously recommended AWS Glue. Sometimes, customers directly request either Azure RapidAPI or AWS Glue. It depends on the specific business use case. Both tools have limitations, so it's hard to say which is best. If a customer already uses Microsoft products, I suggest going with Azure. As for a general rating, I would give AWS Glue a seven out of ten.
Overall, I would rate AWS Glue a seven out of ten. The rating is not about performance; it reflects how the tool is used.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Software Engineer at a consumer goods company with 10,001+ employees
It comes with its own data catalog and supports triggers for scheduling the ETL process
Pros and Cons
- "Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
- "The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five to eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
What is our primary use case?
We are collecting some TV audience data and analyzing it.
What is most valuable?
Data catalog and triggers are the two best features for me.
AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process.
What needs improvement?
The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five to eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great.
It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3.
For how long have I used the solution?
We have been using AWS Glue for approximately one and a half years.
What do I think about the stability of the solution?
There is no problem related to stability.
What do I think about the scalability of the solution?
Scalability is good. I can reduce or increase the number of DPUs, which I find very useful.
We are trying to increase the usage of AWS Glue because of customer needs. When the data increases, our application needs some more analyzers and user interfaces. We will increase our data analyzer and user interfaces.
How are customer service and technical support?
I didn't use technical support because I didn't have a big problem or issue. I just used some information about maintenance from various communities and forums.
What's my experience with pricing, setup cost, and licensing?
The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied.
There are also billing time blocks, such as 0 to 10 minutes or 10 to 20 minutes. Per-minute pricing would be better, because as it stands you have to keep your job within 10 minutes or 20 minutes.
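The pricing points above can be illustrated with a small calculation. This is a hedged sketch, not AWS's billing logic: the 0.44 USD per DPU-hour rate and the 10-minute blocks come from the reviewer's description, while the function names and the example job (5 DPUs, 11 minutes) are hypothetical. Current AWS billing granularity may differ.

```python
import math

RATE_PER_DPU_HOUR = 0.44  # figure quoted by the reviewer


def block_billed_cost(dpus, runtime_minutes, block_minutes=10):
    """Cost when runtime is rounded up to the next billing block."""
    billed_minutes = math.ceil(runtime_minutes / block_minutes) * block_minutes
    return round(dpus * billed_minutes / 60 * RATE_PER_DPU_HOUR, 4)


def per_minute_cost(dpus, runtime_minutes):
    """Cost if billing followed actual minutes, as the reviewer suggests."""
    return round(dpus * runtime_minutes / 60 * RATE_PER_DPU_HOUR, 4)


# An 11-minute job on 5 DPUs is billed as 20 minutes under block billing.
print("block billing:     ", block_billed_cost(5, 11))
print("per-minute billing:", per_minute_cost(5, 11))
```

The gap between the two figures is largest when a job runs just past a block boundary, which is the situation the reviewer describes trying to avoid.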
What other advice do I have?
I would recommend AWS Glue. It is a great choice.
I would rate this solution a nine out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: partner
Developer-Data Engineer at Collab
Good large data processing and scalable but must overcome pipeline challenges
Pros and Cons
- "The best thing about AWS Glue is its scalability and how easy it is to process a large amount of data."
- "Setting up pipelines is challenging, especially with version control and testing requirements."
What is our primary use case?
I use AWS Glue primarily for ETL jobs. In my organization, it's just me using it as we are a small company. The IT team consists of four people, and I am the data engineering specialist.
What is most valuable?
The best thing about AWS Glue is its scalability and how easy it is to process a large amount of data. It integrates well with Redshift, S3, and AWS Glue catalog.
For processing extensive data, having a managed Spark service fulfills that role. If you're already working on AWS and you need to process a lot of data that can't be handled on a single node or server, AWS Glue will serve you well. While it's quite expensive, it's valuable for large data processing needs.
What needs improvement?
Setting up pipelines is challenging, especially with version control and testing requirements. While the initial setup is easy, it doesn't accommodate more complex development needs. You might feel hesitant about changing pipelines that are already running and processing business-critical data due to limited versioning and testing capabilities.
For how long have I used the solution?
I've been using AWS Glue since 2022, so for two years.
What do I think about the stability of the solution?
The stability of AWS Glue is fine. I haven't had any problems with it.
What do I think about the scalability of the solution?
The scalability of AWS Glue is commendable.
Which solution did I use previously and why did I switch?
Previously, in different jobs, I have worked with Databricks for ETL processes. I've also utilized Lambda functions for handling smaller data. I didn’t switch to AWS Glue, but used it in a different context.
How was the initial setup?
The initial setup of AWS Glue is easy, yet not adequate for more complex requirements. If you need to do something basic, like creating a notebook, it is straightforward.
However, when dealing with complex pipelines handling critical business data, it's hard to set up versioning and testing.
What other advice do I have?
AWS Glue receives a hesitant five out of ten from me. I recommend it if you're already on AWS and need to process large data sets. However, for smaller data volumes, I would suggest Airflow because AWS Glue can be quite expensive.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Nov 21, 2024
Consultant at a tech vendor with 10,001+ employees
Easy to set up, useful for batch processing, and is free to try
Pros and Cons
- "The solution helps organizations gain flexibility in defining the structure of the data."
- "I haven't looked into Glue in terms of seeking out flaws. I've not come across missing features."
What is our primary use case?
When you receive data and don't know its structure, Glue is very helpful for working it out, and it will identify everything for you. It has a component called Glue Crawler that is quite useful for this task. It goes through segments of your data and tries to guess their structure. It outputs the structure, and you can modify it as needed.
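The idea of sampling data and guessing its structure can be shown with a toy example. The sketch below is not how Glue Crawler is implemented; it is a minimal illustration of type inference over sample records, and the function names and sample data are hypothetical.

```python
# Toy schema inference: sample some records and guess a type per field.
# This is NOT Glue Crawler's implementation; it only illustrates the idea.

def guess_type(value):
    """Guess a simple type name for one raw string value."""
    for caster, name in ((int, "int"), (float, "double")):
        try:
            caster(value)
            return name
        except ValueError:
            pass
    return "string"


def infer_schema(records):
    """Return {field: type}, widening the type when samples disagree."""
    schema = {}
    for record in records:
        for field, value in record.items():
            guessed = guess_type(value)
            previous = schema.setdefault(field, guessed)
            if previous != guessed:
                # int and double widen to double; any other mix becomes string
                numeric = {previous, guessed} == {"int", "double"}
                schema[field] = "double" if numeric else "string"
    return schema


sample = [
    {"id": "1", "price": "9.99", "name": "widget"},
    {"id": "2", "price": "12", "name": "gadget"},
]
print(infer_schema(sample))
```

As with the crawler's output, the guessed schema is a starting point that can be edited by hand afterwards.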
It is good to basically perform the ETL when your files are stored in the S3 bucket. Glue supports other external sources also. That said, most of the time, we have basically given our proposal to clients if the data is available in S3.
How has it helped my organization?
The solution helps organizations gain flexibility in defining the structure of the data.
You can define and include the original data structure, then decide which fields are required and which ones you can omit. You can also perform certain processing tasks on the fly with Glue, such as applying a multiplying factor or doing cleanup.
What is most valuable?
The Glue Crawler can have a set of connectors, so you can utilize those connectors to connect with the external databases, which may be on-premise in different networks or maybe locally on AWS. Basically, you can use the connector to fetch the data.
Once you have a data schema, you can start streaming or fetching the data and converting it to a particular format. For example, if you have a text file, a Word document, or data in SQL, you can use the connector to convert it on the fly.
For batch processing and batch analytics, it is helpful for the ETL process.
The setup is easy.
The solution offers a free trial.
The solution can scale.
It's stable.
Users only pay for what they use once they have a license.
What needs improvement?
I haven't looked into Glue in terms of seeking out flaws. I've not come across missing features.
For how long have I used the solution?
I've been dealing with the solution for two or three years. I have given a lot of proposals based on customer demand.
What do I think about the stability of the solution?
The solution is quite stable and reliable. There are no bugs or glitches. It doesn't crash or freeze. It is reliable.
What do I think about the scalability of the solution?
Typically, data analytics individuals use the solution. It's not for an entire organization.
It's a scalable solution. I'd rate it ten out of ten.
We do have plans to increase usage. We are in the process of moving many things to the cloud, and if they move onto AWS, they'll need Glue.
How are customer service and support?
I've never been in touch with technical support. I can't speak to how helpful or responsive they are.
How was the initial setup?
The solution is very straightforward to set up and implement.
I'd rate the ease of deployment at an eight or nine out of ten. However, it all depends on the circumstances.
The deployment only takes two to three minutes. It's very fast.
In the console, AWS Glue has different sections. You specify the input data source and the output target, then the transformation. If you want to do filtering or similar steps, you have to specify them. Blueprints of transformation functions are also available, so you can select from those and then just run the job.
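The source, transformation, and target steps described above can be sketched in plain Python. This is a local illustration only and does not use any AWS Glue API: the sample CSV, the column names, and the `extract`, `transform`, and `load` functions are hypothetical stand-ins for a cataloged source, a filter transform, and an output target.

```python
import csv
import io

# Hypothetical input; in Glue this would be a configured source such as S3.
SOURCE_CSV = "id,country,amount\n1,US,120\n2,DE,80\n3,US,45\n"


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, min_amount):
    """Filter step: keep rows whose amount meets the threshold."""
    return [row for row in rows if int(row["amount"]) >= min_amount]


def load(rows):
    """Write rows to CSV text, standing in for the output target."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", "country", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


result = load(transform(extract(SOURCE_CSV), min_amount=50))
print(result)
```

Keeping the three steps as separate functions mirrors the console flow, where the source, the transformation, and the target are each specified on their own.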
What about the implementation team?
I've only just explored the solution. It has not been deployed yet.
What's my experience with pricing, setup cost, and licensing?
When you are just learning and testing the solution, it is free. I cannot speak to the full cost beyond that, as I am just experimenting with the product. They do offer it to users for a limited time to try for free, however.
My understanding is you only pay for what you use, so pricing would vary based on that. You don't need to maintain a cluster and it is serverless.
There are no extra costs beyond a standard license fee.
What other advice do I have?
We are using the latest version of the solution. The solution runs on the cloud and is serverless.
It's a good solution to use when people are not exporting analytics. If you want to perform some ETL on your data and the data is complex, then you should go for Glue. It is easy to set up.
I'd rate the solution ten out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer:
Senior Vice President & Global Head AWS BU at a tech services company with 10,001+ employees
Boosts data integration with serverless architecture and advanced compatibility
Pros and Cons
- "Its ease of use, cost-effectiveness, and highly secure architecture are some of the most valuable features."
- "There could be an enhanced way of managing pure metadata management or data cataloging."
What is our primary use case?
In my role as the global lead for AWS solutions and offerings, we work with various clients, including large-scale clients, to adopt and implement AWS cloud offerings.
Our primary focus revolves around cloud lift-and-shift migration, modernization, re-platforming, rehosting, data architecture, design strategy, and implementing generative AI-specific solutions across different industries such as banking, capital insurance, energy utilities, manufacturing, automotive, semiconductor, and aerospace and defense.
For example, we have implemented AWS Glue at several client locations, utilizing its serverless data integration capabilities during the data discovery process, enterprise transformation, cleansing, transforming, and centralizing data.
How has it helped my organization?
AWS Glue has significantly improved our data quality, enhancing the data by removing duplicates and providing timely and efficient insights.
It also aids in real-time data processing, reducing effort and cost due to its serverless architecture. These features ensure we maintain the highest level of scalability, reliability, and security compliance.
What is most valuable?
AWS Glue is fully managed, providing an easy-to-use integration environment to create, run, and monitor ETL jobs. It's broadly compatible and seamlessly integrates with other AWS services like Amazon S3, Redshift, and Athena. It's flexible with data integration, manages various data formats (JSON, ORC, CSV, etc.), and is serverless, eliminating the need for infrastructure management.
Its ease of use, cost-effectiveness, and highly secure architecture are some of the most valuable features.
What needs improvement?
There could be an enhanced way of managing pure metadata management or data cataloging.
Additionally, while it covers a wide range of integrations with AWS services, integrating with certain additional or legacy products is not seamless and can be complex.
Increasing support for more programming languages and improving advanced analytics capabilities could also be beneficial.
For how long have I used the solution?
We have been working with AWS Glue for almost three-plus years now.
What do I think about the stability of the solution?
We haven't faced any stability issues with AWS Glue. It is a scalable solution, provided that the right design principles and workload management are implemented.
What do I think about the scalability of the solution?
AWS Glue is a scalable solution due to its serverless architecture and efficient design.
How are customer service and support?
My team handles interactions with AWS for technical support, ensuring our design architectures are scalable, flexible, and well-integrated. We often reach out to the AWS team to double-check our implementation mechanisms and guidelines.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup of AWS Glue is straightforward due to its serverless architecture and fully managed nature. Specific prerequisites need to be followed, such as setting up data sources, configuring IAM permissions, creating crawlers, and running ETL jobs.
What about the implementation team?
My team escalates technical questions to AWS support, ensuring our design architectures are optimal. We have a partnership with AWS, and the technical team frequently reaches out to AWS for guidance on scalability, flexibility, and integration mechanisms.
What was our ROI?
We have seen an efficient process with AWS Glue, providing the right return on investment at the right time. It ensures efficiency for our clients, giving them the desired ROI within their expected timelines.
What other advice do I have?
Follow the right design principles and involve AWS at the right time to leverage the most current features and offerings from AWS Glue. Ensuring the right architecture will mitigate any issues. I'd rate the solution eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Last updated: Sep 16, 2024