What is our primary use case?
Our company is building a data warehouse in the cloud. Our team includes four data engineers, two DataOps engineers, and two data administrators.
We use S3 as a data lake to stage data from two source databases, MySQL and Oracle. For the migration, we use DMS.
Then, we use the solution to perform data transformation. For Oracle, we use the Data Catalog and crawlers to create our catalog. A dev endpoint is used to develop complex data transformations. We then move to Studio Notebook, where we develop and schedule complex Spark jobs.
Finally, we load the transformed data to Redshift so our data analyst team can visualize it with QuickSight.
What is most valuable?
The solution is serverless, so it allows us to transform data while optimizing the cost and performance of Spark jobs.
The solution works with many data sources and services in the cloud.
CloudWatch monitors our Glue Spark jobs and immediately alerts us to issues so we are able to resolve them quickly.
What needs improvement?
The solution's built-in transforms do not work directly with Spark DataFrame. We can use the solution's DynamicFrame for this function, but its transformations are expensive.
Not enough resources or services are available to run managed Spark jobs within the solution. We have reached out to Amazon many times regarding this issue.
The solution should offer features for streaming data in addition to batch data. We can work around this with other tools written in Scala or Python, but we would prefer that the features be available in the solution.
For how long have I used the solution?
I have been using the solution for one year.
What do I think about the stability of the solution?
The solution is stable with no issues.
What do I think about the scalability of the solution?
The solution is scalable.
How are customer service and support?
Technical support has been good and has handled any issues.
I rate technical support an eight out of ten.
Which solution did I use previously and why did I switch?
The solution is the best service in its category at this time. Based on project budget and use case, we use either the solution or EMR.
EMR is used for projects that require the latest version of Spark.
We use the solution for any other versions of Spark.
How was the initial setup?
I was not involved in the initial setup.
What's my experience with pricing, setup cost, and licensing?
The solution's pricing is based on DPUs (data processing units), so it is a good idea to optimize usage or it can get expensive.
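To illustrate why optimizing DPU usage matters, here is a minimal sketch of the arithmetic, assuming cost is simply DPUs multiplied by runtime in hours and by a per-DPU-hour rate. The rate of 0.44 USD, the job names, and the DPU and runtime figures are all assumptions for illustration, not numbers from our projects; actual rates vary by region and job type, and billing minimums are ignored here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class JobRun:
    """One hypothetical Spark job run (all values are illustrative)."""
    name: str
    dpus: float      # DPUs allocated to the run
    minutes: float   # wall-clock runtime in minutes


# Assumed rate in USD per DPU-hour; check current pricing for your region.
ASSUMED_RATE_PER_DPU_HOUR = 0.44


def estimate_cost(run: JobRun, rate: float = ASSUMED_RATE_PER_DPU_HOUR) -> float:
    """Estimate one run as DPUs x hours x rate (ignores billing minimums)."""
    return run.dpus * (run.minutes / 60.0) * rate


def monthly_cost(runs: Iterable[JobRun], runs_per_month: int,
                 rate: float = ASSUMED_RATE_PER_DPU_HOUR) -> float:
    """Estimate a month of cost when every job runs `runs_per_month` times."""
    return sum(estimate_cost(r, rate) for r in runs) * runs_per_month


if __name__ == "__main__":
    # Compare a hypothetical job before and after tuning its DPU allocation.
    before = JobRun("nightly_load", dpus=10, minutes=60)
    after = JobRun("nightly_load_tuned", dpus=5, minutes=45)
    for run in (before, after):
        print(f"{run.name}: ${estimate_cost(run):.2f} per run, "
              f"${monthly_cost([run], 30):.2f} per month")
```

Under these assumptions, halving the DPU allocation cuts the cost even if the job runs somewhat longer, which is the kind of trade-off worth checking before scheduling a job.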
I use Studio Notebook because it is less expensive and jobs can be deleted or clustered to run in one day.
I rate pricing a four out of ten.
Which other solutions did I evaluate?
Our company only uses Amazon cloud because other cloud environments do not offer the same features.
The solution's Studio offers a GUI, which is easier than coding in Python Spark or Scala Spark.
Azure Data Factory's features do not compare to what the solution can do in the cloud.
What other advice do I have?
The solution is good for teams that do not want to worry about DevOps or that want to optimize cost by using the cloud.
I rate the solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner