What is your primary use case for AWS Glue?

AWS Glue is a serverless cloud data integration tool that facilitates the discovery, preparation, movement, and integration of data from multiple sources for machine learning (ML), analytics, and application development. The solution includes additional productivity and data ops tooling for running jobs, implementing business workflows, and authoring. AWS Glue allows users to connect to more than 70 diverse data sources and manage data in a centralized data catalog. The solution facilitates...

Download AWS Glue Report Read more

Related Q&As

Dec 15, 2022

Is AWS Glue a difficult solution to use if you are a complete beginner?

Dec 15, 2022

Is AWS Glue effective for AWS-related products only?

score 0 · Answer 1 · 2025-01-09T16:39:00Z

reviewer2322996

Data Architect at a financial services firm with 10,001+ employees

Real User

Top 5

Jan 9, 2025

My use case is the usual: data ingestion, curation, and transformation.

Nivas Srinivasan Principal Consultant at a retailer with 1,001-5,000 employees · Answer 2 · 2025-01-02T11:21:00Z

I am using this solution for ETL programming, which involves extraction, transformation, and loading. We have a data warehouse, Redshift. We receive some data files from outside, save those files into S3, and then call our Glue jobs to process those files. We perform transformations, including various aggregations and grouping of data. Sometimes, we also perform data cleansing, refining the data in a proper way so that customers can generate reports against it. For different models, we transform the data and perform a bulk load or an ETL incremental load into the rest of the data warehouse. We have also done some tie-outs, like comparing two different datasets, ensuring they tie out as expected, and creating exception reports. We generate mismatches in the data and perform an unload process to create exception reports out of Glue. This process involves creating a report against the table if there is a data mismatch. We extract the data and put it into an S3 dump for TPUs to use for their exceptions. If a particular data file has an issue, we identify the problem and address it to improve the quality of the data.

Andre Luis Tiago Soares Developer-Data Engineer at Collab · Answer 3 · 2024-11-19T17:26:05Z

I use AWS Glue primarily for ETL jobs. In my organization, it's just me using it as we are a small company. The IT team consists of four people, and I am the data engineering specialist.

Nitish Kumar Mahatha Site Reliability Engineer (AWS) at KFin Technologies Ltd · Answer 4 · 2024-10-29T07:19:00Z

We use AWS Glue for handling data-intensive tasks such as data lake creation, log analysis, machine learning pipelines, data warehouse population for analytics, and real-time data integration with AWS Lambda.

Anuj Saraswat AWS DATA ENGINEER at Coforge Growth Agency · Answer 5 · 2024-10-21T10:50:00Z

I have been working as a data engineer, where dealing with the ETL process is essential. We are using AWS Glue as a primary ETL tool to serve our organization's needs. I have implemented several Glue jobs still in production.

score 0 · Answer 6 · 2024-09-06T15:45:25Z

AWS Glue is essentially used for data engineering ETL jobs to extract, transform, and load data. We use it to clean data. You have multiple data sources from your application that are not so clean. You have this data and may want to delete certain columns or fill in certain data in an Excel sheet. That's where the extract part comes in. Then, you transform, drop, or make the data uniform and load it to your destination like a data warehouse.

Rajesh Ramadoss Technology Specialist at Cognizant · Answer 7 · 2024-08-07T12:20:00Z

We have a lot of microservices written in Glue, which are responsible for triggering based on certain events. The solution will be responsible for another container to containerize them and run over the cloud. We use the solution for different purposes, including data computing.

Muthuvel Sivaraman AVP at a manufacturing company with 10,001+ employees · Answer 8 · 2024-06-21T06:35:50Z

I use the solution in my company for building datalake and for a variety of data sources like Oracle, MongoDB, and other multiple data sources, like SQL server, and AWS S3 buckets as a datalake storage tool, and then further we use AWS Glue to process it and move to AWS' search engine which will be like a lakehouse solution.

Senthil Kumar Veerasamy Senior Manager, Analytics at Azendian · Answer 9 · 2024-01-18T08:38:03Z

We are implementing a solution in AWS for one of our customers. It is more of a data analytics solution. We wanted to process data from different sources and put it into a central repository that can be used for any analysis or predictive modeling.

ParamShah Engineering Manager at Milestone Technologies · Answer 10 · 2024-01-16T09:21:00Z

We use the solution to build tables on CSV data. We get data from some different sources, pull it in S3, and then create tables using Glue to get some metrics out of that data.

score 0 · Answer 11 · 2023-10-02T10:45:05Z

reviewer907167

Cloud Solution Architect at a tech services company with 1-10 employees

Real User

Top 10

Oct 2, 2023

AWS Glue is a versatile tool and we mostly use it for "lift and shift" server migrations.

Neelabh Sharma Data Engineer at Scania · Answer 12 · 2023-09-11T14:24:31Z

Neelabh Sharma

Data Engineer at Scania

Real User

Top 10

Sep 11, 2023

We use AWS Glue for ETL batch processing purposes.

Mbaye Babacar Gueye Owner at a tech services company with 51-200 employees · Answer 13 · 2023-09-01T19:46:13Z

One common use case is migrating data from one system to another. So, mostly migrating data and data engineering, getting real-time or near-real-time data using Lambda functions and migrating big data from on-prem to the cloud for historical data before starting a project.

RajKumar23 Sr Associate at Cognizant · Answer 14 · 2023-08-03T09:08:10Z

RajKumar23

Sr Associate at Cognizant

Real User

Top 5

Aug 3, 2023

We use AWS Glue for data analytics.

AmitMataghare Associate Director at PricewaterhouseCoopers · Answer 15 · 2023-08-03T04:25:26Z

In my company, we use AWS Glue to build data engineering pipelines, so we ingest data from either S3 or other sources and put it back into Redshift, where we have a data lake or data warehouse.

score 0 · Answer 16 · 2023-07-31T17:41:50Z

I had the source data, which was unstructured and non-fixable, and my responsibility was to convert it into structured data. For this task, I used PySpark as the programming language. With Python, I implemented the creation of a data frame using Glue jobs. Since Glue jobs are a serverless mechanism, I deployed my code into the Glue job, and that's how I got the job done.

Shifa Shah Data engineer at nust · Answer 17 · 2023-05-24T12:30:04Z

I constructed a straightforward ETL job using AWS Glue, wherein I had to load a couple of files in the Teradata database.

score 0 · Answer 18 · 2023-04-26T09:07:00Z

Our primary use cases include pulling data from multiple sources and loading it into the central capacity for data transformation, integration, and processing.

reviewer1526064 Associate Consultant at Tata Consultancy · Answer 19 · 2023-04-20T10:59:00Z

Currently, we are utilizing AWS Glue for various ETL workloads, specifically in the life sciences domain. Our primary objective is to acquire data from various sources. Then, we store it in Redshift. This is where the complete use case of AWS Glue comes into the picture.

score 0 · Answer 20 · 2023-03-09T22:01:42Z

The primary use cases of AWS Glue in our organization are for implementing ETL processes and for data flow.

Syed Zakaulla Project Manager at Softway · Answer 21 · 2023-02-13T20:14:36Z

We're using GPU 0.2 in ten verticals and wanted to use AWS Glue only for one purpose: to optimize Amazon Redshift. We have millions of data that we have to back up. Previously, we did it once every six months, but the client data have been very interactive, and we need spontaneous back and forth of data communication in real-time. In one second, we have almost one million records that come and go continuously. The client wanted to keep all data because they're using it for analytics and wanted to back up the data every second without delay. We tried to optimize Amazon Redshift and found out about AWS Glue, which comes with massive costs, but the client is willing to pay.

reviewer2070318 Manager at a construction company with 51-200 employees · Answer 22 · 2023-01-19T18:04:06Z

reviewer2070318

Manager at a construction company with 51-200 employees

Real User

Jan 19, 2023

Our primary use case is ETL.

score 0 · Answer 23 · 2022-11-25T20:48:52Z

We use the solution to do the usual type of transformations that before required ETL. It's mostly transformation-type purposes that we have, including transforming data from source to target. Also, we are replacing the usual ETLs with Glue, for example.

score 0 · Answer 24 · 2022-10-28T15:16:30Z

Sainagaraju Vaduka

Data solution architect at a pharma/biotech company with 5,001-10,000 employees

Real User

Oct 28, 2022

We are primarily using it for batch crossing and transformations.

Murilo Hallgren Data Engineer at a consultancy with self employed · Answer 25 · 2022-10-17T14:45:15Z

We are using AWS Glue for transforming firewalls synced to the Data Lake in the bronze zone. The ATL uses the solution to transform fields in the silver layer and later we will produce the gold zone. We are using the Delta Lake Architecture.

Liana Iuhas CEO at Quark Technologies SRL · Answer 26 · 2022-09-01T11:06:20Z

My colleagues work with Spark, PySpark, and Scala as programming languages for writing complex aggregations. They have a repository in order to have a general view of all the sources and jobs on the platform and AWS Glue is very helpful.

Sashi Dhar Operations executive at Wipro Infotech · Answer 27 · 2022-07-18T07:42:56Z

We are using it for day-to-day ETL jobs. It is being used to transfer data from Teradata to the cloud. We are using its latest version.

Diksha Hirole Data Engineer at BlazeClan Technologies · Answer 28 · 2022-07-01T09:23:35Z

Diksha Hirole

Data Engineer at BlazeClan Technologies

MSP

Top 10

Jul 1, 2022

I mainly use AWS Glue for ETL purposes and batch processing of data.

Suraj Sachdeva Data Engineer | Developer at Sakshath Technologies · Answer 29 · 2022-06-21T13:28:38Z

The key role of Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution and our clients client have been very satisfied with it.

score 0 · Answer 30 · 2022-06-16T15:42:50Z

We used AWS Glue to build our data warehouse. We built prototypes to go all the way all across their warehouse platforms. From AWS Glue to Spreadsheets and then QuickSight, that's how we're building their warehouse.

score 0 · Answer 31 · 2021-12-02T16:14:50Z

Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs. It is tailored and customized to use with SQL Server, which works very well in that platform. If you want to use other data sources, the NoSQL concept makes it very easy, because missing data can be inserted as a new column or with null values. That is not the case with many other tools. If you have on-premises tools, such as IIS, they don't manage missing data well.

score 0 · Answer 32 · 2021-10-21T11:50:32Z

reviewer1688958

Net Full-Stack developer at a tech services company with 201-500 employees

Real User

Oct 21, 2021

We use the solution as a level of loading data from the source systems.

Bruno Ramos CEO and Founder at HartB · Answer 33 · 2020-12-17T18:52:47Z

It is a good tool for us. All the implementation in our company is done with AWS Glue. We use it to execute all the ETL processes. We have collected more or less five terabytes of information from the internet by now. We process all this data in our cloud platform and normalize the information. We first put it on a data lake that we have here on the AWS tool. After that, we use AWS Glue to transform all the information collected around the internet and put the normalized information into a data warehouse.

score 0 · Answer 34 · 2020-10-14T06:36:55Z

We are using it for file ingestion. Its primary role is to ingest a file from a vendor to a database.

score 0 · Answer 35 · 2020-09-03T07:49:46Z

reviewer1412730

Senior Software Engineer at a consumer goods company with 10,001+ employees

Real User

Sep 3, 2020

We are collecting some TV audience data and analyzing it.