CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Straightforward, easy to set up, and needs little to no training
Pros and Cons
- "It's fairly straightforward as a product; it's not very complicated."
- "The mapping area and the use of the data catalog from Glue could be better."
What is our primary use case?
We use the solution for the kinds of transformations that previously required ETL tools. Our purposes are mostly transformation-related, including transforming data from source to target. We are also replacing the usual ETL tools with Glue.
How has it helped my organization?
The solution has helped the organization mostly with migration. When we migrate from mostly on-premises environments to the cloud, Glue replaces a more expensive item that's also available in the cloud. Many programmers can understand Glue because they can write scripts in Python and PySpark, and quite a few programmers know those languages. With the previous components that did similar things on-premises, you needed specialized knowledge.
What is most valuable?
The fact that we can use PySpark to program them is great.
It's fairly straightforward as a product; it's not very complicated.
The fact that Amazon offers it, that it's a quick way of getting things done, and that it doesn't require a lot of training are very positive attributes.
The initial setup is straightforward.
It can scale.
What needs improvement?
The mapping area and the use of the data catalog from Glue could be better. I would say those two are the main things we'd like to see improvements on.
The solution needs support for big data.
As I understand it, Glue is based on Lambdas and Lambdas have some limitations as far as running them continuously. Sometimes they get dropped, and they have to be reinitialized.
Buyer's Guide
AWS Glue
December 2024
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
824,067 professionals have used our research since 2012.
For how long have I used the solution?
I've been using the solution for about two years.
What do I think about the stability of the solution?
So far, the solution has been stable. For everything that we've done, we didn't run into any problems, so it was fairly stable for us.
What do I think about the scalability of the solution?
The solution can scale. I don't know exactly at what scale. I was using it for a medium-scale type solution. We never tested it with extremely large scales, so I don't know how expansive it can be. That said, since it's in the cloud, if you need more scale, it would be easy enough to add additional Glue components. At least, in theory, it should be fairly scalable.
Indirectly, we have hundreds of people on the solution.
At this time, I'm not sure if any clients plan to increase usage.
How are customer service and support?
Since the solution is so easy to use, we have not needed the help of technical support.
Which solution did I use previously and why did I switch?
We've previously used many different solutions, including Informatica, DataStage from IBM, and SSIS.
How was the initial setup?
The solution is easy to set up. It's not overly complex.
At a minimum, a company would need one to two people to handle the deployment. If it is a larger scale of deployment, they might need more personnel.
What about the implementation team?
As a consultant, I assist with the implementation.
What was our ROI?
Likely, a company would receive an ROI as it's cheaper than the alternatives.
What's my experience with pricing, setup cost, and licensing?
I was not involved in the cost negotiation process.
Which other solutions did I evaluate?
Our clients typically have chosen Glue from the start. I was not involved in the evaluation process.
What other advice do I have?
We are using one of the latest versions of the solution. It's about two years old.
Depending on the number of data sources, the variety of data sources, and the variety of targets they will have, I might recommend the solution. What they have and plan to do will dictate whether Glue is a good solution or whether they would require something more sophisticated - such as Databricks. For example, if you have big data, then Databricks is probably a better solution to do ETL.
I'd rate the solution seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
ECM CONSULTANT/ARCHITECT/SOFTWARE DEVELOPER, DELUXE MN at a tech services company with 5,001-10,000 employees
Easy to perform ETL on multiple data sources, and easy to use after you learn it
Pros and Cons
- "Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs."
- "There is a learning curve to this tool."
What is our primary use case?
Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs. It is tailored and customized to use with SQL Server, which works very well in that platform.
If you want to use other data sources, the NoSQL concept makes it very easy, because missing data can be inserted as a new column or with null values.
That is not the case with many other tools. If you have on-premises tools, such as IIS, they don't manage missing data well.
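The schema-flexible behavior described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the Glue API: the function name `union_with_nulls` and the sample records are hypothetical, and the sketch only shows the general idea of combining sources whose columns differ by filling the gaps with null values.

```python
def union_with_nulls(*sources):
    """Combine rows from several sources; a column missing from a row becomes None."""
    # Collect every column name seen in any source, in first-seen order.
    columns = []
    for rows in sources:
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
    # Rebuild each row with the full column set, using None for missing values.
    return [
        {col: row.get(col) for col in columns}
        for rows in sources
        for row in rows
    ]


# Hypothetical sample data: the two sources do not share all columns.
orders = [{"id": 1, "amount": 10.0}]
returns = [{"id": 2, "reason": "damaged"}]
combined = union_with_nulls(orders, returns)
```

Each output row carries the columns `id`, `amount`, and `reason`, with `None` wherever the original source had no such field.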
What is most valuable?
If you want extremely high-performance functionality, you have to use both AWS Glue or Data Lake to store it in some temporary table. First, you will have to do some cleaning of the data, then if you need performance and speed, you have to use IIS with an IBM tool.
You have to use the right tool in the right places. For example, if you're using Oracle, you have got to use the Oracle tools. If you are using SQL, you have to use the SQL tools. There is no other tool that provides the performance.
It's context-based and project-based. In the projects that I have used, it has worked well.
What needs improvement?
There is a learning curve to this tool.
For how long have I used the solution?
I have been working with AWS Glue for four years.
Everything runs on AWS, even if it belongs to a third party. For example, if you have a Netflix subscription, it runs on AWS. We have other products or vendor subscriptions that run on AWS.
What do I think about the stability of the solution?
Undoubtedly, the cloud is built to handle failure. If you have your devices and resources configured correctly, you won't have any issues. I haven't seen a problem.
How are customer service and support?
You have to pay for their technical support, and depending on which level of subscription, you will receive a call within an hour; otherwise, you will have to wait for days.
Which solution did I use previously and why did I switch?
We also use Azure's Data Lake, and I worked with TIBCO in the past, though it's been a few years since we used it.
You should select the best tool for the job or the projects that are currently being worked on. TIBCO was heavily used in the previous project we worked on.
How was the initial setup?
It takes some time to learn, but once you get the hang of it, you'll be fine. It's like any other IT tool: nobody is inherently an expert or a non-expert; it just depends on how much exposure you have had to the tool.
You've chosen the right tool if you understand how the data works and what it needs to do. It's like going to Home Depot to get the right tool. You can purchase a set of tools, and it will work for you, but you will still need to purchase something else.
It's one of those tools in which someone must be an expert. After that, all tools and platforms become secondary.
What's my experience with pricing, setup cost, and licensing?
With AWS Glue, you pay more, but if you want to process the data, with speed and performance, you need the correct EC2 instances.
There is a price to pay. It doesn't come free.
Technical support is a paid service, and the level of support you receive depends on which subscription you have. You must pay for one of them, and it ranges from $15,000 to $25,000 per year.
You sign up for a level of service, and it does not come for free. As previously stated, everything is based on performance, ELAs.
It was very expensive, at that time. If a company wants to pay the money, it makes my job easier. However, if the company or enterprise does not have the funds to pay for it, then it is a hassle.
What other advice do I have?
In that environment, there is a lot going on. There are some things that you can get for free, and there are some add-ons that you can develop or use that have been tested. It's all about convenience and service. You will get what you pay for if you pay for what you want.
I'm not a fan of any tools; it all depends on the organization I work for, where their data is, what they want to do with it, how quickly they want to get there, and what their budget is, and you work around that. For me, I would not choose one over the other, unless I know the details of the project.
I would rate AWS Glue a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
AWS DATA ENGINEER at Coforge Growth Agency
Intuitive with a good user interface and ETL integration capabilities
Pros and Cons
- "The two features I find most valuable in AWS Glue are its user interface and ease of use."
- "Beginners need additional support as it currently lacks some features required for complex transformations, often necessitating custom Python coding."
What is our primary use case?
I have been working as a data engineer, where dealing with the ETL process is essential. We are using AWS Glue as a primary ETL tool to serve our organization's needs. I have implemented several Glue jobs still in production.
How has it helped my organization?
AWS Glue has enabled us to perform ETL processes efficiently, with ease of use for AWS cloud users, providing a serverless service that eliminates the need for infrastructure maintenance.
What is most valuable?
The two features I find most valuable in AWS Glue are its user interface and ease of use. The user interface is intuitive, and navigating through the Glue console is seamless.
Additionally, its ability to integrate with other AWS services is excellent, providing flawless coordination with services such as SNS, S3, and Lambda.
What needs improvement?
I see scope for improvement in the drag-and-drop feature of AWS Glue. Beginners need additional support as it currently lacks some features required for complex transformations, often necessitating custom Python coding.
For how long have I used the solution?
I have been using Glue for more than five years now.
What do I think about the stability of the solution?
Overall, the stability of AWS Glue is excellent. I would rate it a nine out of ten. Some network-related issues may arise. That said, they are rare and do not affect its functionality significantly.
What do I think about the scalability of the solution?
Regarding scalability, AWS Glue is nearly perfect. I would rate it a nine out of ten, although there is always room for improvement.
How are customer service and support?
AWS customer service is great, but there is room for improvement. The issue I face is the inconsistency in dealing with different customer service representatives for the same issue, which disrupts personal touch.
How would you rate customer service and support?
Neutral
What's my experience with pricing, setup cost, and licensing?
On an organizational level, the pricing of AWS Glue does not pose a concern. It is in line with other ETL tools in the market. However, AWS Glue's cost to free-tier users is an issue because it is not entirely free, even for trial purposes.
What other advice do I have?
I advise potential users to adopt AWS Glue primarily due to its user-friendly interface, extensive documentation, and seamless integration with other AWS services, making it ideal for data engineers.
I'd rate the solution nine out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer:
Last updated: Oct 29, 2024
Senior Vice President & Global Head AWS BU at a tech services company with 10,001+ employees
Boosts data integration with serverless architecture and advanced compatibility
Pros and Cons
- "Its ease of use, cost-effectiveness, and highly secure architecture are some of the most valuable features."
- "There could be an enhanced way of managing pure metadata management or data cataloging."
What is our primary use case?
In my role as the global lead for AWS solutions and offerings, we work with various clients, including large-scale clients, to adopt and implement AWS cloud offerings.
Our primary focus revolves around cloud lift-and-shift migration, modernization, re-platforming, rehosting, data architecture, design strategy, and implementing generative AI-specific solutions across different industries such as banking, capital insurance, energy utilities, manufacturing, automotive, semiconductor, and aerospace and defense.
For example, we have implemented AWS Glue at several client locations, utilizing its serverless data integration capabilities during the data discovery process, enterprise transformation, cleansing, transforming, and centralizing data.
How has it helped my organization?
AWS Glue has significantly improved our data quality, enhancing the data by removing duplicates and providing timely and efficient insights.
It also aids in real-time data processing, reducing effort and cost due to its serverless architecture. These features ensure we maintain the highest level of scalability, reliability, and security compliance.
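As a rough illustration of the duplicate removal mentioned above, here is a minimal sketch in plain Python. It is not the reviewer's pipeline or a Glue transform; the function name `drop_duplicates`, the key fields, and the sample rows are all hypothetical. It keeps the first row seen for each key and discards later repeats.

```python
def drop_duplicates(rows, key_fields):
    """Keep the first row for each combination of key_fields, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Build a hashable key from the chosen fields (missing fields count as None).
        key = tuple(row.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data with one repeated customer record.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 1, "email": "a@example.com"},
]
cleaned = drop_duplicates(customers, ["customer_id"])
```

Which fields define a duplicate is a design decision that depends on the data; here a single identifier is assumed to be enough.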
What is most valuable?
AWS Glue is fully managed, providing an easy-to-use integration environment to create, run, and monitor ETL jobs. It's broadly compatible and seamlessly integrates with other AWS services like Amazon S3, Redshift, and Athena. It's flexible with data integration, manages various data formats (JSON, ORC, CSV, etc.), and is serverless, eliminating the need for infrastructure management.
Its ease of use, cost-effectiveness, and highly secure architecture are some of the most valuable features.
What needs improvement?
There could be an enhanced way of managing pure metadata management or data cataloging.
Additionally, while it covers a wide range of integrations with AWS services, integrating with certain additional or legacy products is not seamless and can be complex.
Increasing support for more programming languages and improving advanced analytics capabilities could also be beneficial.
For how long have I used the solution?
We have been working with AWS Glue for more than three years now.
What do I think about the stability of the solution?
We haven't faced any stability issues with AWS Glue. It is a scalable solution, provided that the right design principles and workload management are implemented.
What do I think about the scalability of the solution?
AWS Glue is a scalable solution due to its serverless architecture and efficient design.
How are customer service and support?
My team handles interactions with AWS for technical support, ensuring our design architectures are scalable, flexible, and well-integrated. We often reach out to the AWS team to double-check our implementation mechanisms and guidelines.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup of AWS Glue is straightforward due to its serverless architecture and fully managed nature. Specific prerequisites need to be followed, such as setting up data sources, configuring IAM permissions, creating crawlers, and running ETL jobs.
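The sequence of steps described above (catalog a data source with a crawler, then run an ETL job) could look roughly like the following with the boto3 Glue client. This is a sketch under stated assumptions, not the reviewer's setup: the function name `catalog_then_run` and every resource name passed to it are hypothetical, the IAM role and the Glue job are assumed to exist already, and the client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
def catalog_then_run(glue, crawler_name, role_arn, database, s3_path, job_name):
    """Create and start a crawler over an S3 path, then start an existing ETL job."""
    # Register a crawler that writes table definitions into the given catalog database.
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)
    # Simplification: a real workflow would wait for the crawler to finish
    # before starting a job that depends on the cataloged tables.
    response = glue.start_job_run(JobName=job_name)
    return response["JobRunId"]


# Usage with a real client (requires boto3 and AWS credentials), names hypothetical:
#   import boto3
#   run_id = catalog_then_run(
#       boto3.client("glue"), "example-crawler",
#       "arn:aws:iam::123456789012:role/ExampleGlueRole",
#       "example_db", "s3://example-bucket/raw/", "example-etl-job",
#   )
```

Passing the client in keeps the function easy to test with a stand-in object and leaves region and credential configuration to the caller.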
What about the implementation team?
My team escalates technical questions to AWS support, ensuring our design architectures are optimal. We have a partnership with AWS, and the technical team frequently reaches out to AWS for guidance on scalability, flexibility, and integration mechanisms.
What was our ROI?
We have seen an efficient process with AWS Glue, providing the right return on investment at the right time. It ensures efficiency for our clients, giving them the desired ROI within their expected timelines.
What other advice do I have?
Follow the right design principles and involve AWS at the right time to leverage the most current features and offerings from AWS Glue. Ensuring the right architecture will mitigate any issues. I'd rate the solution eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Last updated: Sep 16, 2024
Owner at a tech services company with 51-200 employees
Capable of handling real-time but ETL interface could be more user-friendly
Pros and Cons
- "I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages."
- "One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools."
What is our primary use case?
One common use case is migrating data from one system to another. So, mostly migrating data and data engineering, getting real-time or near-real-time data using Lambda functions and migrating big data from on-prem to the cloud for historical data before starting a project.
What is most valuable?
If you have the Fund Manager, you could use a fast processing engine, which is crucial for performance.
I also like that you can add custom libraries like JAR files and use them. So, the ability to use a fast processing engine and embed basic jobs easily are significant advantages.
What needs improvement?
One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools.
Additionally, AWS Glue can sometimes be slow, especially when processing large datasets. Also, I couldn't directly use bucketed data. With Elastic Glue, you had to convert your data frames into the correct format before connecting them using the drag-and-drop interface. I didn't like that, because the conversion process wasn't straightforward.
In future releases, I would like to see a feature that could trigger a Glue pipeline using an API or something similar.
For how long have I used the solution?
I have experience with AWS Glue. I have about one year of experience in a professional setting, but I have also done some personal work with this solution.
How are customer service and support?
Support was good, but I was working with a big client, so that might have influenced the experience. The response time was fast; we heard back from them within a day.
How would you rate customer service and support?
Positive
How was the initial setup?
I would rate my experience with the initial setup an eight out of ten, where one is difficult and ten is easy.
The initial setup is not very complex. You can customize parameters like minimum and maximum for your needs. For me, it wasn't complex to deploy the solution.
What other advice do I have?
I'd rate it around six out of ten compared to other tools like Databricks.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at Scania
Provides good scalability and has an easy setup process
Pros and Cons
- "The product has a valuable feature for data catalog."
- "The product is expensive for data streaming. This area needs improvement."
What is our primary use case?
We use AWS Glue for ETL batch processing purposes.
What is most valuable?
The product has a valuable feature for data catalog.
What needs improvement?
The product is expensive for data streaming compared to EMR. This area needs improvement.
For how long have I used the solution?
We have been using AWS Glue for one and a half years.
What do I think about the stability of the solution?
I rate the product's stability a ten out of ten.
What do I think about the scalability of the solution?
We have five to six AWS Glue users. I rate its scalability a nine out of ten.
Which solution did I use previously and why did I switch?
We have used Cloudera before. We switched to AWS Glue for better pricing, scalability, and innovation.
How was the initial setup?
The initial setup is easy. I rate the process an eight or nine out of ten. It could be deployed on-premises and on the cloud as well. We have a team of five executives to carry out the implementation.
What's my experience with pricing, setup cost, and licensing?
It is an expensive product. I rate its pricing a nine out of ten.
What other advice do I have?
I rate AWS Glue a nine out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Associate Consultant at a tech vendor with 10,001+ employees
An extremely user-friendly and stable tool requiring an easy initial setup
Pros and Cons
- "The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
- "The solution could be cheaper. The price of the solution is an area that needs improvement."
What is our primary use case?
Currently, we are utilizing AWS Glue for various ETL workloads, specifically in the life sciences domain. Our primary objective is to acquire data from various sources. Then, we store it in Redshift. This is where the complete use case of AWS Glue comes into the picture.
What is most valuable?
The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users.
What needs improvement?
The solution could be cheaper. The price of the solution is an area that needs improvement.
For how long have I used the solution?
I have been using AWS Glue in my organization for a year. I am an end-user and a customer of the solution.
What do I think about the stability of the solution?
It is a stable solution. We have not faced any issues in the past year, so it's pretty stable. Stability-wise, I rate it a ten out of ten.
What do I think about the scalability of the solution?
The solution has proven to be scalable, and from my experience in the data engineering domain, I rate it an eight out of ten. It is worth noting that I may not be the most qualified person to provide a rating since I mostly manage and work on data-related tasks. Currently, approximately 20-25 people in our company use the solution.
How are customer service and support?
I had no experience with the technical support team of AWS Glue.
Which solution did I use previously and why did I switch?
Previously, I used Azure Data Factory. But I did not find it really helpful. And it was a bit complex. It was not that user-friendly. And I am much more comfortable with the AWS services as compared to Azure services.
How was the initial setup?
The initial setup of the solution is straightforward, and I find it easy to implement. I rate the setup process a nine on a scale of one to ten, where ten is the easiest. As for the deployment process, we usually request our platform team to handle it, and they are quite efficient in deploying and managing the infrastructure. Although I am not directly involved in the deployment process, my understanding is that it can be completed in just a few hours with the help of two to three team members. Our platform team consists of data engineers, architects, and platform engineers who cater to the needs of various projects and products within the AWS ecosystem. Fortunately, the solution does not require any maintenance.
What's my experience with pricing, setup cost, and licensing?
Price-wise, the solution is adequate, and we have no issues with it. We believe that the cost is justified given the number of users and the features it provides. Overall, it can be considered an average-priced tool. I would rate the solution a six or seven on a scale of one to ten, with ten being very expensive. Specifically, I rate its pricing a six out of ten.
Which other solutions did I evaluate?
Before choosing AWS Glue, I evaluated Azure Data Factory.
What other advice do I have?
I would tell those planning to use AWS Glue to try it. I rate the overall solution a ten out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at YASH Technologies
Cheap, reliable, and able to expand as needed
Pros and Cons
- "The solution is stable and reliable."
- "The monitoring is not that good."
What is most valuable?
The best feature is the price point. It's pretty cheap as compared to other tools like Informatica, et cetera. That's why major companies are moving to the cloud and using Glue. At least, that's what I found.
The solution is stable and reliable.
You can scale the product if you need to.
What needs improvement?
The monitoring is not that good. We'd like to see job progress shown more clearly. Right now, the way we can view it is not that good. The issue is that it is mostly Python or Scala code based. The UX is lacking.
There is a bit of a learning curve, particularly during the setup process.
More connectors should be included.
For how long have I used the solution?
I've been using the solution for three years.
What do I think about the stability of the solution?
The solution is very reliable. It's stable. There are no bugs or glitches. It works just fine.
What do I think about the scalability of the solution?
The solution can scale very well. It's not a problem.
How are customer service and support?
Technical support is okay. We tend to go to the partner if we have issues, and they'll go to AWS if they need to.
Which solution did I use previously and why did I switch?
I'm also familiar with Informatica. However, Glue is less expensive.
How was the initial setup?
In terms of the initial setup, the learning curve was a little steep. After that, it was okay. We didn't have any issues once we understood the process.
What about the implementation team?
We didn't require any outside assistance such as integrators or consultants. We were able to handle it ourselves.
What's my experience with pricing, setup cost, and licensing?
The price is very good. It's enticing people to move to the cloud.
That said, I do not have exact information on pricing.
What other advice do I have?
I'm an AWS engineer. My company is a gold partner.
I'd rate the product eight out of ten. So far, it's quite good. I don't have any complaints.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner