We use the product for scheduling and defining workflows. It helps us extensively to manage complex workflows within Cloudera's ecosystem, particularly for handling and processing data.
Director of Business Intelligence at a self-employed consultancy
Real User
Jun 20, 2024
We use the tool to schedule data pipelines. We also use Apache Airflow to orchestrate dbt, another data processing tool. Airflow helps manage dbt processes, which, in our case, load data from our data lake.
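As a rough illustration of this pattern, a minimal Airflow 2 style DAG that wraps dbt in BashOperator tasks might look like the sketch below; the DAG id, schedule, and dbt project paths are hypothetical placeholders, not the reviewer's actual setup.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG id, schedule, and dbt project paths; adjust to the real project.
with DAG(
    dag_id="dbt_data_lake_load",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the dbt models that load data from the data lake.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/project --profiles-dir /opt/dbt/profiles",
    )

    # Validate the transformed models once the run finishes.
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/project --profiles-dir /opt/dbt/profiles",
    )

    dbt_run >> dbt_test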
I have used Apache Airflow for various purposes, such as orchestrating Spark jobs, EMR clusters, and Glue jobs, and submitting jobs within the DCP data flow on Azure Databricks, including ad hoc queries, for instance, when there's a need to execute queries on Redshift or other databases.
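A hedged sketch of that kind of mixed orchestration, using plain BashOperator calls to spark-submit and psql; the script path, the REDSHIFT_URL variable, and the SQL are placeholders, and dedicated provider operators (for EMR, Glue, Databricks, or Redshift) could be used instead of shell commands.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="spark_and_adhoc_jobs",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit a Spark job via the CLI; a provider operator could replace this.
    spark_job = BashOperator(
        task_id="spark_job",
        bash_command="spark-submit --master yarn /opt/jobs/process_events.py",
    )

    # Run an ad hoc query against the warehouse; REDSHIFT_URL is a hypothetical env var.
    adhoc_query = BashOperator(
        task_id="adhoc_query",
        bash_command="psql $REDSHIFT_URL -c 'ANALYZE sales;'",
    )

    spark_job >> adhoc_query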
We utilize Apache Airflow for two primary purposes. Firstly, it serves as the tool for ingesting data from the source system application into our data warehouse. Secondly, it plays a crucial role in our ETL pipeline. After extracting data, it facilitates the transformation process and subsequently loads the transformed data into the designated target tables.
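One way to express that extract-transform-load sequence is with Airflow's TaskFlow API; the sketch below uses placeholder functions and a hypothetical DAG name rather than the reviewer's actual pipeline.

from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2024, 1, 1), schedule_interval="@daily", catchup=False)
def source_to_warehouse_etl():
    @task
    def extract():
        # Placeholder: pull rows from the source system application.
        return [{"id": 1, "amount": 100}]

    @task
    def transform(rows):
        # Placeholder: apply the business transformations.
        return [{**row, "amount_usd": row["amount"]} for row in rows]

    @task
    def load(rows):
        # Placeholder: write the transformed rows into the designated target tables.
        print(f"loading {len(rows)} rows")

    load(transform(extract()))


source_to_warehouse_etl()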
We use Apache Airflow for the automation and orchestration of model deployment, training, and feature engineering steps. It is a model lifecycle management tool.
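A minimal sketch of such a model lifecycle DAG, assuming the feature engineering, training, and deployment steps live in ordinary Python functions; the function bodies, DAG id, and weekly schedule are stand-ins for illustration.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Stand-ins for real feature engineering, training, and deployment code.
def build_features():
    print("building features")


def train_model():
    print("training model")


def deploy_model():
    print("deploying model")


with DAG(
    dag_id="model_lifecycle",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    features = PythonOperator(task_id="feature_engineering", python_callable=build_features)
    training = PythonOperator(task_id="train_model", python_callable=train_model)
    deployment = PythonOperator(task_id="deploy_model", python_callable=deploy_model)

    features >> training >> deployment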
Apache Airflow is like a freeway. Just as a freeway allows cars to travel quickly and efficiently from one point to another, Apache Airflow allows data engineers to orchestrate their workflows in a similarly efficient way. There are a lot of scheduling tools in the market, but Apache Airflow has taken over everything. With the help of Airflow operators, any task required for day-to-day data engineering work becomes possible. It manages the entire lifecycle of data engineering workflows.
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Real User
Dec 1, 2022
Our primary use case for the solution is setting up workflows and processes applied everywhere because most industries are based on workflows and processes. We've deployed it for all kinds of workflows within the organization.
Senior Data Analytics at a media company with 1,001-5,000 employees
Real User
Aug 31, 2022
Our primary use case for this solution is scheduling tasks. We capture the data from the SQL Server location and migrate it to the central data warehouse.
Currently, I am a lead data scientist. Our primary use cases for Apache Airflow are for all orchestrations, from the basic big data lake to machine learning predictions. It is used for all the ML processes. It is also used for some ELT, to transform, load, and export all big data across restricted, unrestricted, and all processing phases.
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees
Real User
Oct 5, 2021
I've used it at past companies to build a data warehouse for analytics (populating Redshift/Snowflake).
My current company is using it for similar purposes, but more to pull data from data sources across the company, join them into a central data repository (we're currently using Postgres), and build datasets with this data. Having this central data repository will help serve other use cases for us in the future.
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees
Real User
Mar 26, 2021
I'm a data engineer. In the past, I used Airflow for building data pipelines and to populate data warehouses. With my current company, it's a data product or datasets that we sell to biopharma companies. We are using those pipelines to generate those datasets.
There are a few use cases we have for Apache Airflow, one being government projects where we perform data operations on a monthly basis. For example, we'll collect data from various agencies, harmonize the data, and then produce a dashboard. In general, it's a BI use case, but focused on the social economy. We concentrate mainly on BI, and because my team members have strong technical backgrounds, we often fall back on open-source tools like Airflow and our own coded solutions. For a single project, we will typically have three of us working on Airflow at a time: two data engineers and a system administrator. Our infrastructure model is hybrid, based both in the cloud and on-premises.
Business Consultant - Digitalization, Business Development, BPM, Technology & Innovation at a consultancy with 51-200 employees
Real User
Jan 15, 2021
We mainly used the solution in banking, finance, and insurance. We are looking for some opportunities in production companies, but this is only at the very early stages.
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees
Real User
Dec 22, 2020
We normally use the solution for creating a specific flow for data transformation. We have several pipelines that we use, and because they're pretty well-defined, we use it in conjunction with other tools that do the mediation portion. With Airflow, we do the processing of that data.
We are a technology, media, and entertainment-technology company. We are using Apache Airflow for architecting our media workflows. We are using it for two major workflows. We have had it set up for some time on our own cloud. Recently, we migrated the setup to AWS.
Apache Airflow is an open-source workflow management system (WMS) that is primarily used to programmatically author, orchestrate, schedule, and monitor data pipelines as well as workflows. The solution makes it possible for you to manage your data pipelines by authoring workflows as directed acyclic graphs (DAGs) of tasks. By using Apache Airflow, you can orchestrate data pipelines over object stores and data warehouses, run workflows that are not data-related, and can also create and manage...
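For readers new to the tool, a minimal DAG definition looks roughly like the sketch below; the task ids, commands, and schedule are illustrative only.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Two tasks arranged as a small directed acyclic graph: extract must finish before load starts.
with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extracting data'")
    load = BashOperator(task_id="load", bash_command="echo 'loading data'")

    extract >> load  # the edge that makes this a graph rather than two isolated tasks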
The main use case is orchestration. We use it to schedule our jobs.
We use Apache Airflow for the orchestration of data pipelines.
Our use cases are a bit complex, but they are primarily data extraction, transformation, and loading (ETL) tasks.
We use Apache Airflow for data orchestration.
We use Apache Airflow to send our data to a third-party system.
Apache Airflow is utilized for automating data engineering tasks. When creating a sequence of tasks, Airflow can assist in automating them.
We use this solution to monitor BD tasks.
The primary use case is the orchestration and automation of ELT/ETL data pipelines.
Apache Airflow is great in this respect, and there are scheduling options to make it fully automated based on the use case.
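Those scheduling options come down to the DAG's schedule_interval argument (schedule in newer releases), which accepts cron expressions or presets such as "@daily" and "@hourly". A small sketch with an arbitrary cron expression and DAG id:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_refresh",
    start_date=datetime(2024, 1, 1),
    schedule_interval="30 2 * * *",  # cron syntax: every day at 02:30
    catchup=False,
) as dag:
    refresh = BashOperator(task_id="refresh", bash_command="echo 'refreshing warehouse'")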
Our primary use case is to integrate with SLAs.
The primary use case for this solution is to automate ETL processes for the data warehouse.