Consultant - Business Operations at a computer software company with 10,001+ employees
Transformations are valuable for modifying complex data but rely too heavily on code
Pros and Cons
- "Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues."
- "The setup and installation is a bit complex without advanced knowledge or training."
What is our primary use case?
Our company uses the solution for ETL data movement for our customers, such as on-premises to cloud, cloud to cloud, and cloud to Snowflake. We also catalog data and schedule ETL jobs. We are able to monitor all jobs through AWS services.
What is most valuable?
Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues.
For example, it is easy to solve issues where volume is good but performance is degrading because you can split jobs into small chunks to more quickly handle data loads.
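The idea of splitting one large load into smaller jobs can be sketched in plain Python. This is only an illustration under stated assumptions: the file names, the chunk size, and the `split_into_chunks` helper are hypothetical and are not part of AWS Glue or of the reviewer's setup.

```python
from typing import Iterator, List, Sequence


def split_into_chunks(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `chunk_size` entries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


# Hypothetical input files that make up one large load.
files = [f"part-{i:03d}.csv" for i in range(10)]

# Each batch could be handed to its own, smaller job run.
batches = list(split_into_chunks(files, 4))
for number, batch in enumerate(batches, start=1):
    print(f"job {number}: {len(batch)} files")
```

Smaller batches keep each run short and let several runs proceed in parallel, which is the effect the reviewer describes.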
What needs improvement?
The setup and installation is a bit complex without advanced knowledge or training. It would be easier for an AWS expert or someone in DevOps.
Transformations need to be more user-friendly and rely less on coding, as Matillion does.
For how long have I used the solution?
I have been using the solution for three years.
Buyer's Guide
AWS Glue
January 2025
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
831,265 professionals have used our research since 2012.
What do I think about the stability of the solution?
The solution's stability is decent and rates higher than other products. It works well with Snowflake, Azure, GCP, and AWS-supported products.
A hybrid situation may cause delays in performance.
What do I think about the scalability of the solution?
The solution is scalable.
How are customer service and support?
One of our customers used technical support and found them to be helpful.
How was the initial setup?
The setup and installation is a bit complex. Training or advanced knowledge is required. Someone with AWS experience or a DevOps perspective would have fewer issues.
What about the implementation team?
We install the solution for customers and the timeline depends on the job.
A complete project will take a few days to a week for deployment. The number of jobs and components determines how many technicians are required for setup, installation, and deployment. Technician requirements can range from two to fifteen.
Deployment will take a couple of hours for a few announcement jobs that deploy from the CI/CD pipeline.
Which other solutions did I evaluate?
The solution is my second choice because I prefer Snowflake's capabilities.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Cloud Data Engineer at jems groupe
Great for serverless data transformations but more resources are needed for running Spark jobs
Pros and Cons
- "The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs."
- "The solution should offer features for streaming data in addition to batching data."
What is our primary use case?
Our company is creating data warehousing in the cloud. Our team includes four data engineers, two data ops, and two data administrators.
We use S3 as a data lake to prepare data from two databases, one in MySQL and one in Oracle. For the migration, we use DMS.
Then, we use the solution to perform data transformation. For Oracle, we use Data Catalog and Data Crawler to create our catalog. Dev Endpoint is used to develop complex data transformations. We then migrate to Studio Notebook where we develop and schedule a complex Spark job.
Finally, we load the transformed data to Redshift so our data analyst team can visualize it with QuickSight.
What is most valuable?
The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs.
The solution works with many data sources and services in the cloud.
Glue Watch monitors our Spark jobs and immediately alerts us to issues so we are able to resolve them quickly.
What needs improvement?
The solution does not work with Spark DataFrame. We can use the solution's DynamicFrame for this function but transformations are expensive.
Not enough resources or services are available to run managed Spark jobs within the solution. We have reached out to Amazon many times regarding this issue.
The solution should offer features for streaming data in addition to batching data. We can use other products such as Scala or Python but prefer the features be available in the solution.
For how long have I used the solution?
I have been using the solution for one year.
What do I think about the stability of the solution?
The solution is stable with no issues.
What do I think about the scalability of the solution?
The solution is scalable.
How are customer service and support?
Technical support has been good and has handled any issues.
I rate technical support an eight out of ten.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
The solution is the best service in its category at this time. Based on project budget and use case, we use either the solution or EMR.
EMR is used for projects that require the latest version of Spark.
We use the solution for any other versions of Spark.
How was the initial setup?
I was not involved in the initial setup.
What's my experience with pricing, setup cost, and licensing?
The solution's pricing is based on DPUs so it is a good idea to optimize use or it can get expensive.
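A rough way to see how DPU-based pricing adds up is to multiply DPUs by runtime and an hourly rate. The sketch below is a simplification: the default rate of 0.44 USD per DPU-hour is an assumption used for illustration (actual rates vary by region and job type), the `estimate_job_cost` helper is hypothetical, and minimum billing durations are ignored.

```python
def estimate_job_cost(dpus: float, runtime_minutes: float,
                      rate_per_dpu_hour: float = 0.44) -> float:
    """Estimate the cost of one job run as DPUs x hours x hourly rate.

    The default rate is an assumed figure, not an official price.
    """
    if dpus <= 0 or runtime_minutes < 0:
        raise ValueError("dpus must be positive and runtime non-negative")
    return round(dpus * (runtime_minutes / 60) * rate_per_dpu_hour, 2)


# The same 30-minute job with 10 DPUs and with 2 DPUs.
large_run = estimate_job_cost(10, 30)
small_run = estimate_job_cost(2, 30)
print(large_run, small_run)
```

Comparing the two figures shows why right-sizing the DPU count for each job matters for the bill.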
I use Studio Notebook because it is less expensive and jobs can be deleted or clustered to run in one day.
I rate pricing a four out of ten.
Which other solutions did I evaluate?
Our company only uses Amazon cloud because other cloud environments do not offer the same features.
The solution's Studio uses a GUI, which is easier than coding in Python Spark or Scala Spark.
Azure Data Factory's features do not compare to what the solution can do in the cloud.
What other advice do I have?
The solution is good for teams who do not want to worry about DevOps or who want to optimize cost by using the cloud.
I rate the solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Data Engineer | Developer at Sakshath Technologies
Data integration solution that hosts metadata before the roll out of actual data
Pros and Cons
- "The key role for Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution and our clients have been very satisfied with it."
- "The technical support for this solution could be improved. In future, we would like to connect more services like Athena or Kinesis to help control more loads of data."
What is our primary use case?
The key role of Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution and our clients have been very satisfied with it.
What is most valuable?
The most valuable aspect of this solution is its automation and ability to sync data from the source to the solution phase.
What needs improvement?
The technical support for this solution could be improved. In future, we would like to connect more services like Athena or Kinesis to help control more loads of data.
For how long have I used the solution?
I have been using this solution for three years.
What do I think about the stability of the solution?
This is a stable solution. We have isolated the environment using containerization so that if anything goes wrong, we have higher levels of scalability and availability. To achieve this, we have configured multiple servers for testing, UAT and development.
What do I think about the scalability of the solution?
This is a scalable solution which is supported in our organization by Docker and Kubernetes. We have 2,000 users.
How are customer service and support?
We used a vendor with an internal IT team who provided us with architecture so that we could leverage those services and reach a solution. They have 50 people in the IT team, who continuously help us and monitor the things that we are working on.
How would you rate customer service and support?
Positive
How was the initial setup?
The initial setup was straightforward and took approximately one month. For deployment, we worked in two teams. One person handled all the scripting which we are developing for automation. Two other members handled the database and servers.
What's my experience with pricing, setup cost, and licensing?
This solution is affordable and there is an option to pay for the solution based on your usage.
What other advice do I have?
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Team Lead at a financial services firm with 5,001-10,000 employees
It can generate the code and has a good user interface, but it lacks Java support
Pros and Cons
- "Its user interface is quite good. You just need to choose some options to create a job in AWS Glue. The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you."
- "Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background."
What is our primary use case?
We are using it for file ingestion. Its primary role is to ingest a file from a vendor to a database.
What is most valuable?
Its user interface is quite good. You just need to choose some options to create a job in AWS Glue.
The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you.
What needs improvement?
Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background.
For how long have I used the solution?
I have been using AWS Glue for three months. We have just started using these services.
What do I think about the stability of the solution?
We have not been using AWS Glue for a long time. Till now, we haven't found any issues.
How are customer service and technical support?
Their technical support is good. We faced an issue with AWS Glue where we had to read a flat file. In a flat file, you only have spaces. You don't have commas or anything else. AWS Glue does not directly support flat files. You need to provide it with an expression to read the file, and that expression itself has a character limit.
We contacted the AWS support team. They had a call with us and first tried to understand our problem and then our use case. We gave them some sample files for our use case, and they came up with a solution for this limitation. There are some custom patterns in AWS Glue that can be used. Even though they took some time, they provided the solution. If you give them a file today, they will take three to four days to get back to you.
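To show what reading a space-delimited flat file with a pattern looks like, here is a minimal sketch using Python's standard `re` module. It is not the expression or the custom pattern AWS support provided, which the review does not show. The three-column layout, the field names, and the sample line are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Assumed layout: numeric account id, alphabetic name, amount with two decimals,
# separated by one or more spaces.
LINE_PATTERN = re.compile(
    r"^(?P<account_id>\d+)\s+(?P<name>[A-Za-z]+)\s+(?P<amount>\d+\.\d{2})\s*$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one flat-file line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line with only spaces between the fields.
sample = "10023   Smith      149.50"
record = parse_line(sample)
print(record)
```

Breaking the expression into named parts keeps each piece short, which is one way to work around a length limit on a single expression.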
How was the initial setup?
It was straightforward. A lot of documentation is available on the AWS website, which can guide you through the simple steps to set it up. Its setup was easy for me.
What's my experience with pricing, setup cost, and licensing?
It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us.
What other advice do I have?
We have just recently started to use this solution. We haven't used all features properly. It is good for the features we are using. We did not find any drawbacks or limitations so far. We are already getting whatever we want from it.
I would rate AWS Glue a seven out of ten. It needs improvements in terms of Java support and the turnaround time for our problems.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Net Full-Stack developer at a tech services company with 201-500 employees
A stable solution which can easily integrate with other AWS services
Pros and Cons
- "One of the best features of the solution is its ability to easily integrate with other AWS services."
- "Overall, I consider the technical support to be fine, although the response time could be faster in certain cases."
What is our primary use case?
We use the solution as a layer for loading data from the source systems.
What is most valuable?
One of the best features of the solution is its ability to easily integrate with other AWS services. Since we use AWS as our main cloud provider, it's easy to put everything together. It is very flexible when it comes to compute features. We find the solution very useful when we make use of certain scripts. In some cases, it allows us to get rid of duplicates.
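Removing duplicates in a script usually means keeping the first row seen for each key. The sketch below shows that logic in plain Python. The `drop_duplicates` helper, the `customer_id` key, and the sample rows are hypothetical, and the reviewer's actual scripts are not shown in the review.

```python
from typing import Dict, Iterable, List


def drop_duplicates(rows: Iterable[Dict[str, object]], key: str) -> List[Dict[str, object]]:
    """Keep the first row for each distinct value of `key`, preserving input order."""
    seen = set()
    unique: List[Dict[str, object]] = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


# Hypothetical rows loaded from a source system; customer 1 appears twice.
rows = [
    {"customer_id": 1, "city": "Lyon"},
    {"customer_id": 2, "city": "Paris"},
    {"customer_id": 1, "city": "Lyon"},
]
deduplicated = drop_duplicates(rows, "customer_id")
print(deduplicated)
```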
What needs improvement?
When there is a need to configure connections to different database sources in respect of the target, it would be good if it were easier to deal with roles. I am referring to the need to configure connections in a different target process, something which would require a certain time outlay for configuring VPC and checking that everything is okay, in respect of the creation of required roles. It would save time were this process to be made easier and more user friendly.
The technical support depends on the type of question, whether there is a need to understand additional inter-related information on multiple levels. Overall, I consider the technical support to be fine, although the response time could be faster in certain cases.
For how long have I used the solution?
I have been using AWS Glue for about two years.
What do I think about the stability of the solution?
The solution is stable.
How are customer service and support?
While the technical support can vary with the type of question, I feel that, overall, it is okay, although receipt of information could be faster in certain cases.
Which solution did I use previously and why did I switch?
We previously had experience with Database Migration Service at AWS. I recommend it over AWS Glue if one needs to do full database migration from on-premises deployment or in cases involving large volumes of data.
How was the initial setup?
I handled the installation on my own.
What's my experience with pricing, setup cost, and licensing?
I consider the price to be standard-plus when it comes to optimal usage.
What other advice do I have?
I rate AWS Glue as an eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at a tech services company with 201-500 employees
Great for ETL and batch processing
Pros and Cons
- "AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code."
- "If there's a cluster-related configuration, we have to make worker nodes, which is quite a headache when processing a large amount of data."
What is our primary use case?
I mainly use AWS Glue for ETL purposes and batch processing of data.
What is most valuable?
AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code.
What needs improvement?
There are a couple of issues with AWS Glue. First, AWS Control randomly logs off, which disturbs coding. Second, if there's a cluster-related configuration, we have to make worker nodes, which is quite a headache when processing a large amount of data. In the next release, AWS Glue should include more transformations with AWS Studio.
For how long have I used the solution?
I've been using AWS Glue for around eight months.
What do I think about the stability of the solution?
AWS Glue is stable.
How are customer service and support?
AWS' technical support responds within an hour on email.
How was the initial setup?
The initial setup was very easy, with only some minimal configuration. However, there is a drawback that once we file the name of a user, it can't be changed.
What's my experience with pricing, setup cost, and licensing?
AWS Glue is quite costly, especially for small organizations. The licensing fee is around $200 per year.
What other advice do I have?
Glue supports Spark, so if you have a team that's good with Spark, definitely go with Glue. I would rate AWS Glue as eight out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Cloud Solution Architect at a tech services company with 1-10 employees
Cost-effective and stable
Pros and Cons
- "I appreciate AWS Glue for its cost-effectiveness."
- "In terms of improvement, the performance of AWS Glue could be faster."
What is our primary use case?
AWS Glue is a versatile tool and we mostly use it for "lift and shift" server migrations.
What is most valuable?
I appreciate AWS Glue for its cost-effectiveness. The service provides a good balance between its capabilities and the cost associated with using it.
What needs improvement?
In terms of improvement, the performance of AWS Glue could be faster.
For how long have I used the solution?
I have been using AWS Glue for five years.
What do I think about the stability of the solution?
It is a stable product.
What do I think about the scalability of the solution?
It is fairly scalable.
How are customer service and support?
The partner program support is very good.
How was the initial setup?
The initial setup is not too complex. To deploy and maintain a data platform, a general data team of around four to five skilled individuals is typically required.
What's my experience with pricing, setup cost, and licensing?
For AWS Glue, there is no separate license fee. It is part of the AWS service, and you pay for its usage as part of your overall AWS bill.
What other advice do I have?
Overall, I would rate AWS Glue as an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.