We have many microservices written in Glue that are triggered by certain events. The solution is also responsible for containerizing them and running them in the cloud. We use the solution for different purposes, including data computing.
Technology Specialist at a consultancy with 10,001+ employees
An interpreted language that does not need compilation, but it is very difficult to learn
Pros and Cons
- "You do not need many frameworks to run Glue."
- "It is very difficult to learn the tool and remember the syntaxes comparatively."
What is our primary use case?
What is most valuable?
You do not need many frameworks to run Glue. It's an interpreted language that does not need to be compiled at all.
What needs improvement?
It is comparatively difficult to learn the tool and remember its syntax. Sometimes, I face issues integrating the solution with third-party services or services that are not part of Glue. Such integrations take a lot of time, and not much content about them is available on the internet.
For how long have I used the solution?
I have been using the solution for three to four years.
Buyer's Guide
AWS Glue
November 2024
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
What do I think about the scalability of the solution?
We have eight developers on our team, four of whom, including me, work on Glue. The team works on about four to five Glue services.
How are customer service and support?
I once faced an issue with Glue. There was a scenario where I wanted Glue to pick certain images, containerize them, and run over the code. That containerization integration wasn't happening successfully. I dropped a couple of messages in the community channel. I got good support from them, which helped me resolve my issue as quickly as possible. The community is very small, but the people are very helpful.
Which solution did I use previously and why did I switch?
I previously worked in Java, .NET, and Python. I have extensive experience with Python and .NET. Since my organization is language-independent, we have microservices written in almost all the languages, including Glue, Python, Java, and .NET.
How was the initial setup?
I'm not handling the solution's end-to-end deployment, but we have a CI/CD pipeline set up for that. The CI/CD pipeline will remain the same. It's all about how you containerize your Glue application. That is the only challenge we have faced while setting up the deployment. The rest of the configuration was pretty smooth.
What other advice do I have?
Glue is not a must-have tool. You can choose Glue if you have the capability to learn it quickly. There are alternatives to Glue for which you will find plenty of articles, study material, and certifications on the internet. If you do not have any other option, go for Glue.
If Glue is not mandatory for you, go for something else because it is difficult to learn Glue and remember its syntax. You will need support whenever you have a bigger integration or connectivity with third-party libraries or services. You will not find many articles or much help on the internet. Although the community is available, you need to spend some time with them to make them understand the issue.
It is not easy for a beginner to learn to use the solution for the first time. There are a few videos and courses available, but it's difficult. It's not as easy as other languages in terms of content. It's hard, but you can use it once you understand the concept.
Overall, I rate the solution seven and a half out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Aug 8, 2024
Project Manager at Softway
It has a real-time backup feature and records and backs up information every single moment, but its cost is high, and setting it up is complex
Pros and Cons
- "What I like best about AWS Glue is its real-time data backup feature. Last week, there was a production push, and what used to take almost ten days to send out around fifty-six thousand emails now takes only two hours."
- "Cost-wise, AWS Glue is expensive, so that's an area for improvement. The process for setting up the solution was also complex, which is another area for improvement."
What is our primary use case?
We're using GPU 0.2 in ten verticals and wanted to use AWS Glue only for one purpose: to optimize Amazon Redshift.
We have millions of records that we have to back up. Previously, we did it once every six months, but the client's data has become very interactive, and we need continuous, two-way data exchange in real time. Almost one million records come and go every second. The client wanted to keep all data because they're using it for analytics and wanted to back up the data every second without delay. We tried to optimize Amazon Redshift and found out about AWS Glue, which comes with massive costs, but the client is willing to pay.
What is most valuable?
What I like best about AWS Glue is its real-time data backup feature. Last week, there was a production push, and what used to take almost ten days even to send out around fifty-six thousand emails now takes only two hours.
I also like that the data backup in AWS Glue is spontaneous, and data is recorded and backed up every single moment.
What needs improvement?
AWS Glue had some issues, which required optimization, particularly in terms of the number of workers you deploy, and that's where costing comes in. Cost-wise, AWS Glue is expensive, so that's an area for improvement. My company did some modifications, which turned out to be successful, so overall, the solution works fine.
Even though there is a backup, you need to know what's happening. You need to understand why there's a failure. AWS Glue doesn't provide the information, so my company uses its logs. The development team also doesn't have specific answers because the team is still playing around with the process, which means the company is still trying to figure out other areas for improvement in AWS Glue.
The process for setting up the solution was also complex, which is another area for improvement.
AWS should provide help during migration and assist its users. Otherwise, it's a nightmare.
For how long have I used the solution?
I've been using AWS Glue for one and a half months.
What do I think about the stability of the solution?
AWS Glue is stable, but stability depends on how many workers you deploy and the work that you do.
What do I think about the scalability of the solution?
AWS Glue is highly scalable. It can scale to almost one billion records per second.
How are customer service and support?
We did make some good friends in AWS, so they gave us technical support for AWS Glue for free. They were also new and were trying to evolve, so they provided us with free support, but they'll be charging other clients for the support moving forward.
How was the initial setup?
The setup for AWS Glue is highly complex. The company started with R&D four months ago and only completed the deployment last week.
My company used one and a half FTE resources for the deployment.
The deployment process for AWS Glue was normal and involved CI/CD, but it was mainly the backend dev ops engineers who did it. I'm more of a project manager, so I'm not involved in technical items. It's more of me helping the engineers with the R&D.
What's my experience with pricing, setup cost, and licensing?
AWS Glue is a high-priced solution that bills the client $150,000 to $250,000 annually. That's just the starting price because it's a small data sample, but if it hits over three hundred million users, the cost will probably go up almost thirty times more.
What other advice do I have?
I'm using the latest version of AWS Glue.
I'm not the end-user, as I work for a company that implements AWS Glue for clients.
My company has one client using AWS Glue, but that client has three hundred million users.
I recommend AWS Glue to others because it's an excellent solution. However, it lacks documentation; only a little is available, and even certified AWS practitioners struggle with that. You'll find complicated processes or features, such as time series tables. Even where documentation exists, implementing the solution requires a lot of trial and error, and revamping becomes a nightmare if you're using the old infrastructure.
My rating for AWS Glue is seven out of ten because of the complexity of the deployment and the lack of information and documentation, which meant my company had to do some R&D. If AWS had provided complete documentation, or sent more than one person to assist my company, it could have saved more time.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Straightforward, easy to set up, and needs little to no training
Pros and Cons
- "It's fairly straightforward as a product; it's not very complicated."
- "The mapping area and the use of the data catalog from Glue could be better."
What is our primary use case?
We use the solution for the usual types of transformations that previously required an ETL tool. Our purposes are mostly transformation-related, including transforming data from source to target. We are also replacing the usual ETL tools with Glue.
How has it helped my organization?
The solution has helped the organization mostly with migration. When we migrate from mostly on-premises to the cloud, Glue replaces a more expensive item that's also available in the cloud. Lots of programmers can understand Glue because they can write scripts in Python and PySpark, and there are quite a few programmers that know how to program in those languages. With the previous components that did similar things on-premise, you needed specialized knowledge.
What is most valuable?
The fact that we can use PySpark to program them is great.
It's fairly straightforward as a product; it's not very complicated.
The fact that Amazon offers it, that it's a quick way of getting things done, and that it doesn't require a lot of training are very positive attributes.
The initial setup is straightforward.
It can scale.
What needs improvement?
The mapping area and the use of the data catalog from Glue could be better. I would say those two are the main things we'd like to see improvements on.
The solution needs support for big data.
As I understand it, Glue is based on Lambdas and Lambdas have some limitations as far as running them continuously. Sometimes they get dropped, and they have to be reinitialized.
For how long have I used the solution?
I've been using the solution for about two years.
What do I think about the stability of the solution?
So far, the solution has been stable. For everything that we've done, we didn't run into any problems, so it was fairly stable for us.
What do I think about the scalability of the solution?
The solution can scale. I don't know exactly at what scale. I was using it for a medium-scale type solution. We never tested it with extremely large scales, so I don't know how expansive it can be. That said, since it's in the cloud, if you need more scale, it would be easy enough to add additional Glue components. At least, in theory, it should be fairly scalable.
Indirectly, we have hundreds of people on the solution.
At this time, I'm not sure if any clients plan to increase usage.
How are customer service and support?
Since the solution is so easy to use, we have not needed the help of technical support.
Which solution did I use previously and why did I switch?
We've previously used many different solutions, including Informatica, DataStage from IBM, and SSIS.
How was the initial setup?
The solution is easy to set up. It's not overly complex.
At a minimum, a company would need one to two people to handle the deployment. If it is a larger scale of deployment, they might need more personnel.
What about the implementation team?
As a consultant, I assist with the implementation.
What was our ROI?
Likely, a company would receive an ROI as it's cheaper than the alternatives.
What's my experience with pricing, setup cost, and licensing?
I was not involved in the cost negotiation process.
Which other solutions did I evaluate?
Our clients typically have chosen Glue from the start. I was not involved in the evaluation process.
What other advice do I have?
We are using one of the latest versions of the solution. It's about two years old.
Depending on the number of data sources, the variety of data sources, and the variety of targets they will have, I might recommend the solution. What they have and plan to do will dictate whether Glue is a good solution or whether they would require something more sophisticated - such as Databricks. For example, if you have big data, then Databricks is probably a better solution to do ETL.
I'd rate the solution seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at Tata Consultancy
Good integration with other AWS services but 30-minute runtime is limiting for large data sets
Pros and Cons
- "The solution integrates well with other AWS products or services."
- "On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded."
What is our primary use case?
Our company has five data engineers who use the solution for metadata catalogs and ETL pipelines that are built in S3 or EC2.
What is most valuable?
The solution integrates well with other AWS products or services.
What needs improvement?
The solution needs to extend its 30-minute query or runtime limit. Sometimes it fails with certain data types, such as Athena, because of the limited runtime. Some large data sets run over the limit during busy hours, so we try to avoid failures by running jobs at idle times or at night.
On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded. This can be quite confusing.
For how long have I used the solution?
I have been using the solution for one year.
What do I think about the stability of the solution?
The solution is stable 99% of the time but is somewhat dependent on workloads. Heavy workloads or bigger data sets sometimes fail. Improvements are needed for stability to be consistent.
How are customer service and support?
I have not needed technical support but my colleagues have used it. I do not have relevant feedback to report.
How was the initial setup?
The setup requires some knowledge and understanding of how the solution works.
What about the implementation team?
We implemented the solution in-house.
What's my experience with pricing, setup cost, and licensing?
I rate pricing an eight out of ten.
Which other solutions did I evaluate?
There are other options that offer equivalent service such as SRA and GCP but the solution is fully integrated with AWS where our data resides so we take full advantage of that.
What other advice do I have?
I rate the solution a five out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Associate Director - Delivery (Technology DWH & Data Engineer) at MOBIUS KNOWLEDGE SERVICES PRIVATE LIMITED
Efficiently integrates and transforms data but lacks in scalability
Pros and Cons
- "I like its integration and ability to handle all data-related tasks."
- "We face performance issues when using AWS Glue for data transformation and integration."
What is our primary use case?
Our primary use cases include pulling data from multiple sources and loading it into the central capacity for data transformation, integration, and processing.
What is most valuable?
I like its performance, integration, and its ability to handle all data-related tasks.
What needs improvement?
We face performance issues when using AWS Glue for data transformation and integration. It takes almost three to four hours to execute single transformations, which is a lot. We want to improve the performance to meet customer requirements.
Mainly, I am focused on improving the performance aspect because the customer is keen on this improvement.
For how long have I used the solution?
I have been using AWS Glue for five years. I am using the latest version.
What do I think about the stability of the solution?
I would rate the stability of AWS Glue a seven out of ten. There are some performance issues.
What do I think about the scalability of the solution?
I would rate the scalability of AWS Glue a six out of ten. There are over 25 users in my company.
How are customer service and support?
The customer service and support team is okay.
How would you rate customer service and support?
Neutral
How was the initial setup?
It is a cloud-based solution, so no installation procedure is required. One DevOps engineer is required for the maintenance of AWS Glue.
What other advice do I have?
Based on the customer scenario, I have previously recommended AWS Glue. Sometimes, customers directly request either Azure RapidAPI or AWS Glue. It depends on the specific business use case. Both tools have limitations, so it's hard to say which is best. If a customer already uses Microsoft products, I suggest going with Azure. As for a general rating, I would give AWS Glue a seven out of ten.
Overall, I would rate AWS Glue a seven out of ten because it's not about performance. It's because of how the tool is used.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Module Lead at Mphasis
Provides inbuilt data quality and cataloging features, but it is costly compared to other tools
Pros and Cons
- "The most valuable feature of AWS Glue is that it provides a GUI format with a drag-and-drop feature."
- "AWS Glue is more costly compared to other tools like Airflow."
What is our primary use case?
We use AWS Glue for building ETL pipelines.
What is most valuable?
The most valuable feature of AWS Glue is that it provides a GUI format with a drag-and-drop feature. The solution provides a codeless feature or no code feature, where you can write a pipeline without adding code in AWS Glue.
If you are working on AWS services for your pipeline, AWS Glue better interacts with the AWS services than other third-party tools. The solution provides inbuilt data quality and cataloging features. AWS Glue provides a complete package, and you can do all things in one place.
What needs improvement?
AWS Glue is more costly compared to other tools like Airflow. It would be better if the solution's pricing could be reduced. The default scheduling that AWS Glue provides is not as good as Airflow. The scheduler of AWS Glue could be improved because you cannot customize it.
For how long have I used the solution?
I have been using AWS Glue for more than three years.
What do I think about the stability of the solution?
AWS Glue is a stable product because it's an AWS-managed service. We can directly contact AWS for any issues we face. If there are any glitches in the version, AWS solves the issue by creating patches or version upgrades to the solution.
What do I think about the scalability of the solution?
Around 100 to 200 people are using the solution in our organization.
How was the initial setup?
As AWS Glue is a SaaS product, you don't have to set up anything. You can just create a new job, write your script, and run your job. You have to select the particular cluster or nodes you want to run on and write the code. Not much administration is required for the solution's setup.
What's my experience with pricing, setup cost, and licensing?
AWS Glue is a paid service that doesn't come under the free trial of AWS. You have to pay a charge for using the solution.
What other advice do I have?
If you are doing a job once or twice a day, AWS Glue will not cost you much. If you run jobs for four to five times a day or hourly jobs, it will be costlier compared to other tools. If you are required to run hourly jobs five to six times a day, then using other tools would be a better option. You can choose AWS Glue if you are running jobs only one or two times a day.
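To make the frequency trade-off concrete, here is a rough sketch of how the number of runs drives a pay-as-you-go bill. The rate of 0.44 USD per DPU-hour, the 10 DPUs, and the 15-minute job duration are illustrative assumptions, not figures from this review; check current AWS pricing for your region before relying on any of them.

```python
# Rough monthly cost estimate for a scheduled job billed per DPU-hour.
# All numbers below are illustrative assumptions, not quoted prices.

ASSUMED_RATE_PER_DPU_HOUR = 0.44  # USD; assumption, varies by region


def monthly_cost(runs_per_day: int, minutes_per_run: float, dpus: int,
                 rate: float = ASSUMED_RATE_PER_DPU_HOUR,
                 days: int = 30) -> float:
    """Return the estimated monthly cost in USD."""
    hours_per_run = minutes_per_run / 60
    return runs_per_day * days * hours_per_run * dpus * rate


if __name__ == "__main__":
    # Compare a daily job with more frequent schedules, up to hourly.
    for runs in (1, 2, 5, 24):
        cost = monthly_cost(runs_per_day=runs, minutes_per_run=15, dpus=10)
        print(f"{runs:>2} runs/day: about ${cost:,.2f} per month")
```

Under these assumptions the cost grows linearly with the number of runs, which is why an hourly schedule ends up many times more expensive than a once-a-day job.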
Our company decided to go with AWS Glue because the tools we were using in the pipeline were AWS services only. AWS Glue easily interacts with AWS services. The jobs we were running were also not frequent.
You can use AWS Glue for learning purposes, but it is a paid service that doesn't come under the free trial of AWS, so you have to pay to use it. You can learn by testing basic Spark code on any local system first. Once you are comfortable that your code is working fine, you can run it in AWS Glue jobs. Testing directly on AWS Glue will be costly.
If a person is familiar with Spark jobs or Python jobs, they can easily learn AWS Glue. A new user will take the same amount of time to learn AWS Glue as they take to get comfortable with Spark. Since it provides the GUI and a no-code option, users can directly start using AWS Glue without having to learn any code. That makes it much easier to learn to use the solution.
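One way to follow this advice is to keep the transformation logic in a plain function that can be exercised on a laptop before any paid job run. The sketch below is a hypothetical example in plain Python rather than Spark; `normalize_order` and its field names are invented for illustration, and in a real job the same function would have to be wired into the Spark code.

```python
# Hypothetical record-level transformation, kept free of any AWS or Spark
# dependency so it can be tested locally at no cost.

def normalize_order(record: dict) -> dict:
    """Trim the customer name, upper-case the country code, and
    convert the amount from a string to a float (None if missing)."""
    amount = record.get("amount")
    return {
        "customer": (record.get("customer") or "").strip(),
        "country": (record.get("country") or "").upper(),
        "amount": float(amount) if amount not in (None, "") else None,
    }


def test_normalize_order() -> None:
    raw = {"customer": "  Ada ", "country": "gb", "amount": "12.50"}
    expected = {"customer": "Ada", "country": "GB", "amount": 12.5}
    assert normalize_order(raw) == expected
    # Missing fields should not raise; they become empty strings or None.
    assert normalize_order({}) == {"customer": "", "country": "", "amount": None}


if __name__ == "__main__":
    test_normalize_order()
    print("local checks passed")
```

Only once checks like these pass locally would the code be submitted as a billed job.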
Overall, I rate the solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Mar 13, 2024
Sr Associate at Cognizant
A stable and easy-to-use solution that can be used for data analytics
Pros and Cons
- "AWS Glue is a stable and easy-to-use solution."
- "The solution’s stability could be improved."
What is our primary use case?
We use AWS Glue for data analytics.
What is most valuable?
AWS Glue is a stable and easy-to-use solution.
What needs improvement?
The solution’s stability could be improved.
For how long have I used the solution?
I have been using AWS Glue for the last three years.
What do I think about the stability of the solution?
I rate AWS Glue a seven out of ten for stability.
What do I think about the scalability of the solution?
AWS Glue is a very scalable solution, and you can connect multiple databases.
How are customer service and support?
AWS Glue's technical support is very good.
What's my experience with pricing, setup cost, and licensing?
AWS Glue is not a licensed solution. It follows a pay-as-you-go model, wherein you are billed monthly for what you use.
What other advice do I have?
Currently, there are many ETL tools in the marketplace. Compared to other ETL tools, AWS Glue is a low-cost and serverless solution.
Overall, I rate AWS Glue a nine out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
ECM CONSULTANT/ARCHITECT/SOFTWARE DEVELOPER, DELUXE MN at a tech services company with 5,001-10,000 employees
Easy to perform ETL on multiple data sources, and easy to use after you learn it
Pros and Cons
- "Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs."
- "There is a learning curve to this tool."
What is our primary use case?
Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs. It is tailored and customized to use with SQL Server, which works very well in that platform.
If you want to use other data sources, the NoSQL concept makes it very easy, because missing data can be inserted as a new column or with null values.
That is not the case with many other tools. If you have on-premises tools, such as IIS, they don't manage missing data well.
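The schema-flexibility point can be illustrated with a small sketch in plain Python. It is not Glue's API; `unify_schema` is a hypothetical helper that shows the general idea of a field missing from one source becoming a column filled with null values instead of causing the load to fail.

```python
# Illustration only: align records from sources with different fields by
# adding any missing field as a column holding None.

def unify_schema(records: list[dict]) -> list[dict]:
    """Return copies of the records that all share the same set of keys."""
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)  # preserve first-seen column order
    return [{col: record.get(col) for col in columns} for record in records]


if __name__ == "__main__":
    source_a = {"id": 1, "name": "Ada"}
    source_b = {"id": 2, "email": "bob@example.com"}
    for row in unify_schema([source_a, source_b]):
        print(row)
```

Each output row carries the union of all fields seen, with `None` standing in wherever a source did not supply a value.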
What is most valuable?
If you want extremely high-performance functionality, you have to use both AWS Glue or Data Lake to store it in some temporary table. First, you will have to do some cleaning of the data, then if you need performance and speed, you have to use IIS with an IBM tool.
You have to use the right tool in the right places. For example, if you're using Oracle, you have got to use the Oracle tools. If you are using SQL, you have to use the SQL tools. There is no other tool that provides the performance.
It's context-based and project-based. In the projects that I have used, it has worked well.
What needs improvement?
There is a learning curve to this tool.
For how long have I used the solution?
I have been working with AWS Glue for four years.
Everything runs on AWS, even if it belongs to a third party. For example, if you have a Netflix subscription, it runs on AWS. We have other products or vendor subscriptions that run on AWS.
What do I think about the stability of the solution?
Undoubtedly, the cloud is built to handle failure. If you have your devices, and your resources configured correctly, you won't have any issues. I haven't seen a problem.
How are customer service and support?
You have to pay for their technical support, and depending on which level of subscription, you will receive a call within an hour; otherwise, you will have to wait for days.
Which solution did I use previously and why did I switch?
We also use Azure's Data Lake, and I worked with TIBCO in the past, though it's been a few years since we used it.
You should select the best tool for the job or the projects that are currently being worked on. TIBCO was heavily used in the previous project we worked on.
How was the initial setup?
It takes some time to learn, but once you get the hang of it, you'll be fine. It's like any other IT tool, where nobody is an expert or isn't an expert, it is just the way you are exposed to a tool.
You've chosen the right tool if you understand how the data works and what it needs to do. It's like going to Home Depot to get the right tool. You can purchase a set of tools, and it will work for you, but you will still need to purchase something else.
It's one of those tools in which someone must be an expert. After that, all tools and platforms become secondary.
What's my experience with pricing, setup cost, and licensing?
With AWS Glue, you pay more, but if you want to process the data, with speed and performance, you need the correct EC2 instances.
There is a price to pay. It doesn't come free.
Technical support is a paid service, and what you get depends on which subscription you have. You must pay for one of them, and the price ranges from $15,000 to $25,000 per year.
You sign up for a level of service, and it does not come for free. As previously stated, everything is based on performance, ELAs.
It was very expensive, at that time. If a company wants to pay the money, it makes my job easier. However, if the company or enterprise does not have the funds to pay for it, then it is a hassle.
What other advice do I have?
In that environment, there is a lot going on. There are some things that you can get for free, and there are some add-ons that you can develop or use that have been tested. It's all about convenience and service. You will get what you pay for if you pay for what you want.
I'm not a fan of any tools; it all depends on the organization I work for, where their data is, what they want to do with it, how quickly they want to get there, and what their budget is, and you work around that. For me, I would not choose one over the other, unless I know the details of the project.
I would rate AWS Glue a nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.