The UI is a bit outdated by modern standards and could be enhanced to support them. Small things such as dark mode and better aesthetics could be implemented.
Apache Airflow improved workflow efficiency, but we had to find solutions for large workflows. For instance, a monthly workflow with 1200 jobs had to be split into three to four pieces as it struggled with large job numbers. Loading a workflow with 500 jobs could take 10 minutes, which wasn't acceptable. The most important feature Apache Airflow lacks is support for external configuration files. All classical schedulers like Control-M or Automic allow you to load workflow definitions from YAML, XML, or JSON files, but the tool requires you to write Python programs. Airflow only supports external configuration for variables, not for workflows. To address this, I created a YAML configuration file that I converted into Python programs, but this functionality is missing from Apache Airflow itself. All of its competitors have this feature. In Control-M, Automic, and IBM's scheduler, you can load workflows from XML, JSON, or YAML files.
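For illustration, here is a minimal sketch of the kind of YAML-to-Python conversion described above, assuming a hypothetical workflow.yaml schema; Airflow itself offers no such loader out of the box.

```python
# A minimal sketch, assuming a hypothetical workflow.yaml like:
#
#   dag_id: monthly_jobs
#   schedule: "@monthly"
#   tasks:
#     - id: extract
#       command: "python extract.py"
#     - id: load
#       command: "python load.py"
#       upstream: [extract]
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

with open("workflow.yaml") as f:
    spec = yaml.safe_load(f)

with DAG(
    dag_id=spec["dag_id"],
    schedule=spec.get("schedule"),  # Airflow 2.4+; older versions use schedule_interval
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    tasks = {
        t["id"]: BashOperator(task_id=t["id"], bash_command=t["command"])
        for t in spec["tasks"]
    }
    # Wire the dependencies declared in the YAML file.
    for t in spec["tasks"]:
        for upstream_id in t.get("upstream", []):
            tasks[upstream_id] >> tasks[t["id"]]
```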
I have not come across any other challenges with the product. The scripts we use in our company rely on Python package dependencies, but those are lost when Apache Airflow starts running for a particular test. Apache Airflow's handling of the built-in Python package dependencies has some issues, making it an area that needs improvement.
The solution lacks certain features. We cannot run real-time jobs in it; it supports only batch jobs. An ETL pipeline can be either a batch job or a real-time job. Real-time jobs run continuously and are not scheduled, and Apache Airflow is for scheduled jobs, not real-time jobs. Being able to run real-time jobs would be a good improvement. Many connectors are available in the product, but some are still missing, and we have to build a custom connector when one is not available. The solution needs more built-in connectors for improved functionality.
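A hedged sketch of the kind of hand-rolled connector this refers to: a custom operator posting to a hypothetical internal HTTP service (the class name, endpoint, and payload are assumptions for illustration).

```python
import requests
from airflow.models.baseoperator import BaseOperator


class InternalServicePushOperator(BaseOperator):
    """Pushes a JSON payload to a service that has no built-in connector."""

    def __init__(self, endpoint: str, payload: dict, **kwargs):
        super().__init__(**kwargs)
        self.endpoint = endpoint  # hypothetical internal service URL
        self.payload = payload

    def execute(self, context):
        # Runs when the scheduler executes the task instance.
        response = requests.post(self.endpoint, json=self.payload, timeout=30)
        response.raise_for_status()
        return response.json()
```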
The automation capabilities could be improved; a visual workflow designer and a graphical tool to reduce coding would be very helpful. But for now, it's sufficient for our simple workflows.
Enhancements become necessary when scaling up from a few thousand workflows to five or ten thousand. At that point, resource management and threading become critical, which involves optimizing the use of resources and threads within the Kubernetes VM ecosystem.
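As a sketch of one resource-management lever at that scale, assuming the Kubernetes executor: per-task CPU/memory overrides via pod_override. The DAG id and command below are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from kubernetes.client import models as k8s

with DAG(
    dag_id="resource_tuned_example",
    schedule=None,  # Airflow 2.4+; older versions use schedule_interval
    start_date=datetime(2024, 1, 1),
    catchup=False,
):
    BashOperator(
        task_id="heavy_step",
        bash_command="python heavy_job.py",  # hypothetical workload
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",  # must match the task pod's main container
                            resources=k8s.V1ResourceRequirements(
                                requests={"cpu": "1", "memory": "2Gi"},
                                limits={"cpu": "2", "memory": "4Gi"},
                            ),
                        )
                    ]
                )
            )
        },
    )
```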
I have some issues with the solution's communication. Our DAGs use the same database or data set; sometimes we consume the same data and send it to a different place in a different DAG. In the UI, I want to be able to see when we use the same data set more than once.
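For what it's worth, Airflow 2.4+ has a Dataset abstraction whose UI view shows which DAGs produce and consume the same data set, which is close to the visibility asked for above. A sketch, with a hypothetical table URI and commands:

```python
from datetime import datetime

from airflow import DAG, Dataset
from airflow.operators.bash import BashOperator

orders = Dataset("postgres://warehouse/public/orders")  # hypothetical table

with DAG(dag_id="producer_dag", schedule="@daily",
         start_date=datetime(2024, 1, 1), catchup=False):
    BashOperator(task_id="write_orders",
                 bash_command="python write_orders.py",
                 outlets=[orders])  # marks this task as updating the dataset

with DAG(dag_id="consumer_dag", schedule=[orders],  # runs when the dataset updates
         start_date=datetime(2024, 1, 1), catchup=False):
    BashOperator(task_id="read_orders",
                 bash_command="python read_orders.py")
```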
Airflow is a pipeline for transferring code for clients, but for experimental model work, Apache Airflow does not have any solution. More features are needed for the experimentation steps.
One improvement could be the inclusion of a plugin with a drag-and-drop feature. This graphical capability would be beneficial when dealing with connectivity and integration services, such as connecting to BigQuery or other systems. As a first-time user, although the documentation is available, it would be more user-friendly to have a drag-and-drop interface within the portal: users could simply drag and drop components to create pseudo-code, making it more flexible and intuitive. Therefore, I suggest a drag-and-drop feature for a more user-friendly experience and better code management. Moreover, for admins, there should be improved logging capabilities. Apache Airflow does have logging, but it's limited to some database data; it would be better if everything went to the server where it's hosted and was surfaced at the interface level for developers.
There is an area for improvement in onboarding new people. They should make it simple for newcomers; otherwise, we have to assign a senior engineer to operate it.
Lead of Monitoring Tech at an educational organization with 1,001-5,000 employees
Real User
Jun 28, 2023
Adding more automated components in Apache Airflow for basic things like exporting the data would be helpful. Apache Airflow is not that easy to use, but we have gotten used to it.
We have faced scenarios where Apache Airflow becomes non-responsive, leading to job failures. To resolve such situations, we had to manually reboot Apache Airflow since it doesn't provide an option to restart within the application. This necessitated modifying some configurations to initiate a restart of all Apache Airflow components. Although Apache Airflow is generally dependable, it may occasionally encounter glitches that can disrupt production flows and batches.
The solution could be improved by creating a tool that allows us to build workflows graphically instead of just writing scripts. Hence, the graphical user interface could be improved.
Apache Airflow could be improved by integrating some versioning principles. Currently, we have to swap some tags in our flows. It would be helpful if we could version the whole product at once and compare which scripts have changed from last year to this year, or from last month to this month. For example, to version the flow for one project, we need to check it one by one to identify which tags and which scripts changed. All of this has to be done manually.
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees
Real User
Mar 26, 2021
I am using a Celery Executor, and I find that it crashes without any logs I can see. I can only assume it's a memory issue and have to blindly restart until, eventually, it starts up again. One of our use cases is triggered by input rather than being a batch process: we receive a batch of data, it goes through tasks one, two, and three, and when a new batch comes in, each subsequent task should operate on just the data from the prior task. The way I work with it now, the output gets written to a table and the next task selects everything from that upstream table. It could be coded so that you only write the data for that portion of the task. It could handle state machines and state changes as opposed to the batch process. I would like to see it made friendlier for other use cases.
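A hedged sketch of the handoff pattern described above, where each task receives only the prior task's output: Airflow's TaskFlow API passes return values between tasks via XCom (fine for small payloads; large data still belongs in external storage). The task names and data are hypothetical.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def per_batch_pipeline():
    @task
    def task_one() -> list:
        # Hypothetical: pick up just the newly arrived batch.
        return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

    @task
    def task_two(batch: list) -> list:
        # Operates only on the rows handed over by task_one via XCom.
        return [{**row, "value": row["value"] * 2} for row in batch]

    @task
    def task_three(batch: list) -> None:
        print(f"Loading {len(batch)} rows")

    task_three(task_two(task_one()))


per_batch_pipeline()
```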
We're currently using version 1.10, but I understand there are a lot of improvements in version 2. In the version we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult. When something fails, it's not easy to troubleshoot what went wrong; sometimes the UI becomes really slow and there's no easy way to diagnose the problem. For the most part, we have had to learn through trial and error how to operate it properly. The UI is also not that attractive, and I feel the user experience isn't that nice. Version 2 is supposedly better, but without having tried it, I would still suggest improvements to the visual UI. We want to do the ETL as code, but having a nice visual UI to facilitate the process would be great, because it would let us rely on non-technical staff rather than just the three solid technical staff we have here. If there were better UI features, like drag-and-drop, we could expand its use to more of our team.
Associate Director - Technologies at a tech services company with 51-200 employees
Real User
Dec 23, 2020
Technical support is an area that needs improvement. The contact numbers should be readily available so that we can call to get support as required. In the future, I would like to see a single-click installation.
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees
Real User
Dec 22, 2020
The graphics in the past have not been ideal. There are several areas where we feel they could be a little more flexible. One is implementation: even though we customized it, there were some specific things we had to do with the image itself. The management integration was challenging as well and required a lot of work on our end; we were creating our own way to integrate with specific tools. There's no real out-of-the-box option for ease of management integration, so we had to get a little creative and solve that ourselves. The scalability of the solution itself is not what we expected; being on the cloud, it should be easy to scale, but it's not. There is no SDC versioning and no version control for pipelines: we have to build several pipelines for several flows, yet there's no version control to generate them. There's no Python SDK, so we need to generate our own scripts, upload them, and put them there, yet there's no realistic way for us to connect to them. On top of that, the API sets provided are very limited. They are not as rich as others', and you cannot do much with them.
One specific feature missing from Airflow is that the steps of your workflow are not pipelined; the steps of any workflow are stateless. Not every workflow can be implemented within Airflow. For example, Step 1 of my workflow produces output that I definitely want to be automatically provided as input to Step 2. At the workflow level, we want common state management so that, across steps, we can reach the state information. Right now, we're using an external state repository to maintain the state. If Airflow could come up with some kind of implementation where not every step of the pipeline is an independent step, that would be helpful. I would like it if part of the output of a previous step could automatically be the input for the next step. That kind of pipelining is missing. Other products like jBPM, Camunda, or Cadence have the concept of pipelining. I would also like to see support for more platforms in terms of programming BPMs. Cadence supports Golang and Java. Legacy components can be on any platform, so if they could provide more client support, such as a Java client library and Golang, that would be helpful. I want to program it in Java.
There are some drawbacks to this solution. The code does not cover all tasks in the data warehouse automation process. Currently, in production, we have a large installation with a complex workflow that includes hundreds of tasks. Most of them are dispatched by the existing engine, but not all. For example, sometimes we need to create cycles in our workflow, but we are not able to because Airflow supports only Directed Acyclic Graphs (DAGs). We need to develop our own workflow descriptions and notations because, out of the box, Apache Airflow does not provide some features that we need; our understanding is that it is limited by design. We will wait for the 2.0 version, which is expected to be much more mature than the 1.8-1.10 versions, and we believe it will be better. There should also be some improvement to the Doc Management features within the UI. They should think about Outlook integration, which should be out of the box, and the object model should be expanded to support cyclic graphs inside the workflow.
Airflow should support dynamic DAG creation.
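For context, a common workaround today is generating DAGs dynamically in one Python file and registering them in globals() so the DAG processor finds them; a minimal sketch with hypothetical source names:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

for name in ["sales", "inventory", "billing"]:  # hypothetical sources
    with DAG(
        dag_id=f"load_{name}",
        schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
        start_date=datetime(2024, 1, 1),
        catchup=False,
    ) as dag:
        EmptyOperator(task_id="placeholder")
    # Airflow discovers DAG objects available at module level.
    globals()[f"load_{name}"] = dag
```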
Apache Airflow should have better integration with cloud platforms.
The following should be improved:
* Dashboards
* Security
* Airflow web UI
* Telemetry for logging, monitoring, and alerting purposes
* Documentation
Apache Airflow could be improved with the addition of more frameworks.
The solution could be improved by simplifying the integration process and providing access to its support team to guide integration.
The dashboard's connection to the BPM flow could be improved.