Liana Iuhas - PeerSpot reviewer
CEO at Quark Technologies SRL
Real User
Top 20
Highly scalable, reliable, and beneficial pay-as-you-go pricing model
Pros and Cons
  • "AWS Glue is a good solution for developers; they have the ability to write code in different languages and other software."
  • "The interface for AWS Glue could improve; it does not provide a lot of detail. You can write the code in PySpark or in Scala, which is a big advantage, but it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment."

What is our primary use case?

My colleagues work with Spark, PySpark, and Scala as programming languages for writing complex aggregations. They have a repository in order to have a general view of all the sources and jobs on the platform and AWS Glue is very helpful.

What is most valuable?

AWS Glue is a good solution for developers; they have the ability to write code in different languages and other software.

What needs improvement?

The interface for AWS Glue could improve; it does not provide a lot of detail. You can write the code in PySpark or in Scala, which is a big advantage, but it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment.

Business users who want to run their own graphs will not be able to use features such as running Spark code inside AWS Glue, which will be complex for them.

For how long have I used the solution?

I have been using AWS Glue for approximately four years.

Buyer's Guide
AWS Glue
November 2024
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,763 professionals have used our research since 2012.

What do I think about the stability of the solution?

AWS Glue is a highly stable solution. We didn't have bugs in production. 

The solution works well with Spark, which is a good framework for large volumes of data. It operates very well.

I rate the stability of AWS Glue a ten out of ten.

What do I think about the scalability of the solution?

The scalability of AWS Glue is great. It was used for enterprise customers. We worked a lot with AWS Glue for international companies.

We have approximately 10 people using AWS Glue in my company.

How are customer service and support?

I have used the support for AWS Glue. The response time could improve.

I rate the support from AWS Glue a nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup of AWS Glue is very simple.

What's my experience with pricing, setup cost, and licensing?

AWS Glue uses a pay-as-you-go approach which is helpful. The price of the overall solution is low and is a great advantage.

Which other solutions did I evaluate?

If I can compare AWS Glue to other solutions, it has the advantage of the cloud, which assures availability and scalability, and the pay-as-you-go is beneficial. This is why many companies are moving from their traditional ETL tools to the cloud because the costs will be reduced dramatically.

What other advice do I have?

I would recommend this solution to others.

I rate AWS Glue a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Murilo Hallgren - PeerSpot reviewer
Data Engineer at a consultancy with self employed
Real User
Easy to use, simple configurations, and good documentation
Pros and Cons
  • "The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need."
  • "The price of the solution could improve."

What is our primary use case?

We are using AWS Glue for transforming firewalls synced to the Data Lake in the bronze zone. The ETL uses the solution to transform fields in the silver layer, and later we will produce the gold zone. We are using the Delta Lake Architecture.

What is most valuable?

The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need.

What needs improvement?

The price of the solution could improve.

For how long have I used the solution?

I have been using AWS Glue for approximately one month.

What do I think about the stability of the solution?

The stability of AWS Glue is good.

What do I think about the scalability of the solution?

AWS Glue is highly scalable.

There are dozens of customers using this solution.

How are customer service and support?

I have not used the support from AWS Glue but I know their support is good.

Which solution did I use previously and why did I switch?

I have previously used Azure and Spark for testing.

How was the initial setup?

The initial setup of AWS Glue is simple. In other solutions, such as Spark, the configuration would take a lot longer.

What about the implementation team?

I did the deployment of AWS Glue myself with the AWS console. I am a data engineer.

What's my experience with pricing, setup cost, and licensing?

The overall cost of AWS Glue could be better. It costs approximately $1,000 a month. There is paid support available from AWS Glue.

If the cost of AWS Glue were 50 percent lower, we would not move to another solution.

What other advice do I have?

I am moving to EMR Serverless or a GCP solution.

I rate AWS Glue a nine out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sainagaraju Vaduka - PeerSpot reviewer
Data solution architect at a pharma/biotech company with 5,001-10,000 employees
Real User
Excellent scalability, with valuable features, and profitable return on investment
Pros and Cons
  • "The most valuable features currently are Glue Studio, jobs, and triggers."
  • "I would like to see stable libraries; at the moment they are not there."

What is our primary use case?

We are primarily using it for batch processing and transformations.

How has it helped my organization?

We have a large set of data on which we run transformations and identification. We clean and transform the data, then put it into the destination table. It is very comfortable to use.

What is most valuable?

The most valuable features currently are Glue Studio, jobs, and triggers.

What needs improvement?

I would like to see stable libraries; at the moment they are not there.

For how long have I used the solution?

I have been using AWS Glue for the past five years.

What do I think about the stability of the solution?

The stability I would consider to be an extensible Apache Spark.

What do I think about the scalability of the solution?

The scalability is good and we have three hundred projects we are working with.

Which solution did I use previously and why did I switch?

Previously, we used EMR, Informatica, Data Pipeline, and Azure Data Factory.

How was the initial setup?

The initial setup is straightforward.

What about the implementation team?

We did our deployment in-house with the CI/CD integrations like GitHub and deployed the code on Glue. 

What was our ROI?

We are seeing a very good return on our investment.

What's my experience with pricing, setup cost, and licensing?

The current cost is around forty to fifty thousand a month.

What other advice do I have?

I would definitely recommend using AWS Glue for batching procedures. I would rate AWS Glue an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Jorge Encinas - PeerSpot reviewer
Sr. Data Engineer at a tech services company with 5,001-10,000 employees
MSP
An event-driven, serverless computing platform that is flexible, powerful, and customizable
Pros and Cons
  • "I like that it's flexible, powerful, and allows you to write your own queries and scripts to get the needed transformations."
  • "It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."

What is our primary use case?

We used AWS Glue to build our data warehouse. We built prototypes that go all the way across the warehouse platforms. From AWS Glue to Redshift and then QuickSight, that's how we're building the warehouse.

What is most valuable?

I like that it's flexible, powerful, and allows you to write your own queries and scripts to get the needed transformations.

What needs improvement?

It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do.

For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do.

It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options.

For how long have I used the solution?

I have been using AWS Glue since last year.

What other advice do I have?

On a scale from one to ten, I would give AWS Glue a nine.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Engineer at GISbiz
Real User
Top 5
It efficiently collects and catalogs the data but needs to improve performance
Pros and Cons
  • "It is a stable and scalable solution."
  • "It fails to handle massive databases acquired from various sources."

What is our primary use case?

We use the solution to collect customers' data containing multiple files and convert it into a common database. Later, we send the database for SQL ingestion.

What is most valuable?

The solution's most valuable feature is its ability to efficiently collect and catalog the data in the warehouse.

What needs improvement?

They should improve the solution's performance with large amounts of data. Currently, AWS fails to handle massive databases acquired from various sources. Also, it is challenging to queue the data or use standard code in the AWS environment. We need to install a third-party tool to tackle the issue. We need to use another tool to convert the data as well. Thus, we are using multiple tools to handle the database. They should work on this particular area.

For how long have I used the solution?

We have been using the solution for one year.

What do I think about the stability of the solution?

It is a stable solution. I rate its stability as an eight.

What do I think about the scalability of the solution?

I rate the solution's scalability as a six.

How was the initial setup?

The initial setup is a bit complex, and I rate the process as a six. We have to install multiple third-party tools whenever we update the security patches or renew the solution. Thus, the deployment process is complicated.

What other advice do I have?

If you already have an AWS environment, you can opt for AWS Glue for its ETL operations feature. If you want to process multiple operations, such as creating a table or catalog, or need it for machine learning purposes, it is better to go for other database tools.

I rate the solution as a seven.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer:
PeerSpot user
Manager at a construction company with 51-200 employees
Real User
Top 20
Excellent capabilities, proven stability, however would like a more robust interface on the no-code side
Pros and Cons
  • "We have found it beneficial when moving data from one source to another."
  • "I would like to see a more robust interface on the no-code side. This would be nice to be able to split cells."

What is our primary use case?

Our primary use case is ETL.

How has it helped my organization?

We have found it beneficial when moving data from one source to another.

What is most valuable?

In terms of convenience, the most valuable feature is the drag-and-drop, which is really nice. The no-code interface is really nice, being able to drag in my connectors. The nice thing, as well, is that it generates the framework, the wireframe of your code, so then you can just input whatever Spark or Python you want to make any further transformations.

What needs improvement?

In general, I would like to see documentation on the limitations on which libraries you can actually pull in when you are running Python. The addition of the Python Jupyter Notebook has been nice. But generally speaking, you cannot import every library. You can import pandas now and you can use Boto, but you cannot import a lot of the other statistical libraries. That is an issue currently. I would also like to see a more robust interface on the no-code side. It would be nice to be able to split cells.

For how long have I used the solution?

I have been using AWS Glue for the past three years.

What do I think about the stability of the solution?

The stability is excellent.

What do I think about the scalability of the solution?

There is good scalability: you can set up your minimum and maximum users and you are ready to implement.

How was the initial setup?

The initial setup is straightforward. If you are just doing a file format conversion, then it is very simple, but if you want to do more robust transformations, like inserting transformations or transformations on multiple delimiters, then there is a bit of a learning curve. The deployment time is literally minutes.
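To illustrate what a transformation on multiple delimiters can involve, here is a minimal sketch in plain Python, not Glue-specific code. The function name `split_on_delimiters`, the default delimiter set, and the sample record are all assumptions made up for the example.

```python
import re


def split_on_delimiters(line, delimiters=(",", ";", "|")):
    """Split one text record on any of several literal delimiters."""
    # re.escape keeps characters such as "|" from being read as regex operators.
    pattern = "|".join(re.escape(d) for d in delimiters)
    # Strip surrounding whitespace so "; widget |" yields "widget".
    return [field.strip() for field in re.split(pattern, line)]


# Hypothetical record that mixes three delimiters in one line.
row = split_on_delimiters("2024-01-05; widget | 3, 9.50")
print(row)
```

In a Spark job the same regular expression could be applied per column, but that part is left out here to keep the sketch self-contained.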

What other advice do I have?

I would rate AWS Glue a seven on a scale of one to ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Consultant - Business Operations at a computer software company with 10,001+ employees
Real User
Transformations are valuable for modifying complex data but rely too heavily on code
Pros and Cons
  • "Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues."
  • "The setup and installation is a bit complex without advanced knowledge or training."

What is our primary use case?

Our company uses the solution for ETL data movement for our customers such as on-premises to cloud, cloud to cloud, and cloud to Snowflake. We also catalog data and schedule ETL jobs. We are able to monitor all jobs through AWS services.

What is most valuable?

Transformations are valuable because you can modify or override complex data logic from an open source or Spark to solve issues. 

For example, it is easy to solve issues where volume is good but performance is degrading because you can split jobs into small chunks to more quickly handle data loads. 
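The idea of splitting a load into small chunks can be sketched in plain Python as follows. This is only an illustration of the general technique, not the reviewer's actual job code; the helper name `chunked` and the chunk size are assumptions.

```python
from itertools import islice


def chunked(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        # islice pulls the next chunk_size items without loading everything.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Ten hypothetical records processed in batches of four.
batches = list(chunked(range(10), 4))
print(batches)
```

Each batch could then be handed to its own job run, which is one way to keep any single run small.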

What needs improvement?

The setup and installation is a bit complex without advanced knowledge or training. It would be easier for an AWS expert or someone in DevOps.

Transformations need to be more user-friendly and rely less on coding, as Matillion does.

For how long have I used the solution?

I have been using the solution for three years. 

What do I think about the stability of the solution?

The solution's stability is decent and rates higher than other products. It works well with Snowflake, Azure, GCP, and AWS-supported products. 

A hybrid situation may cause delays in performance. 

What do I think about the scalability of the solution?

The solution is scalable. 

How are customer service and support?

One of our customers used technical support and found them to be helpful. 

How was the initial setup?

The setup and installation is a bit complex. Training or advanced knowledge is required. Someone with AWS experience or a DevOps perspective would have fewer issues.

What about the implementation team?

We install the solution for customers and the timeline depends on the job. 

A complete project will take a few days to a week for deployment. The number of jobs and components determines how many technicians are required for setup, installation, and deployment. Technician requirements can range from two to fifteen. 

Deployment will take a couple of hours for a few announcement jobs that deploy from the CI/CD pipeline.

Which other solutions did I evaluate?

The solution is my second choice because I prefer Snowflake's capabilities. 

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Cloud Data Engineer at jems groupe
Real User
Great for serverless data transformations but more resources are needed for running Spark jobs
Pros and Cons
  • "The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs."
  • "The solution should offer features for streaming data in addition to batching data."

What is our primary use case?

Our company is creating data warehousing in the cloud. Our team includes four data engineers, two data ops, and two data administrators. 

We use S3 as a data lake to prepare data from two databases, MySQL and Oracle. For the migration, we use DMS.

Then, we use the solution to perform data transformation. For Oracle, we use Data Catalog and Data Crawler to create our catalog. Dev Endpoint is used to develop complex data transformations. We then migrate to Studio Notebook where we develop and schedule a complex Spark job. 

Finally, we load the transformed data to Redshift so our data analyst team can visualize it with QuickSight. 

What is most valuable?

The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs. 

The solution works with many data sources and services in the cloud. 

Glue Watch monitors our Spark jobs and immediately alerts us to issues so we are able to resolve them quickly. 

What needs improvement?

The solution does not work with Spark DataFrame. We can use the solution's DynamicFrame for this function but transformations are expensive. 

Not enough resources or services are available to run managed Spark jobs within the solution. We have reached out to Amazon many times regarding this issue. 

The solution should offer features for streaming data in addition to batching data. We can use other products such as Scala or Python but prefer the features be available in the solution. 

For how long have I used the solution?

I have been using the solution for one year. 

What do I think about the stability of the solution?

The solution is stable with no issues. 

What do I think about the scalability of the solution?

The solution is scalable. 

How are customer service and support?

Technical support has been good and has handled any issues. 

I rate technical support an eight out of ten. 

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

The solution is the best service in its category at this time. Based on project budget and use case, we use either the solution or EMR.

EMR is used for projects that require the latest version of Spark. 

We use the solution for any other versions of Spark. 

How was the initial setup?

I was not involved in the initial setup.

What's my experience with pricing, setup cost, and licensing?

The solution's pricing is based on DPUs so it is a good idea to optimize use or it can get expensive. 
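A rough way to see how DPU-based pricing adds up is to multiply DPUs by runtime in hours and by an hourly rate. The sketch below assumes a rate of 0.44 per DPU-hour purely as a placeholder; the actual rate varies by region and job type, and the sketch ignores billing minimums. The function name `estimate_job_cost` is hypothetical.

```python
def estimate_job_cost(dpus, runtime_minutes, price_per_dpu_hour=0.44):
    """Estimate the cost of one job run from DPUs and runtime.

    price_per_dpu_hour is an assumed placeholder rate, not a quoted price.
    """
    dpu_hours = dpus * runtime_minutes / 60
    return round(dpu_hours * price_per_dpu_hour, 2)


# Hypothetical run: 10 DPUs for 30 minutes is 5 DPU-hours.
print(estimate_job_cost(10, 30))
```

Halving either the DPU count or the runtime halves the estimate, which is why optimizing use matters under this model.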

I use Studio Notebook because it is less expensive and jobs can be deleted or clustered to run in one day. 

I rate pricing a four out of ten. 

Which other solutions did I evaluate?

Our company only uses Amazon cloud because other cloud environments do not offer the same features. 

The solution's Studio uses a GUI, which is easier than coding in Python Spark or Scala Spark.

Azure Data Factory's features do not compare to what the solution can do in the cloud. 

What other advice do I have?

The solution is good for teams who do not want to worry about DevOps or who want to optimize cost by using the cloud. 

I rate the solution a seven out of ten. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user