Module Lead at Mphasis
User-friendly, provides a graphical representation of the whole flow, and the user interface is pretty good
Pros and Cons
- "The tool is user-friendly."
- "We cannot run real-time jobs in the solution."
What is our primary use case?
The main use case is orchestration. We use it to schedule our jobs.
What is most valuable?
The best thing about the product is its UI. The tool is user-friendly. We can divide our work into different tasks and groups. It gives a graphical representation of the whole flow. It also creates a graph of the complete pipeline. The UI is beautiful. Whenever there is a failure, we can see it at the backend. We can retry at the point where the failure happened. We do not have to redo the whole flow. The user interface is pretty good. It provides details about the jobs. It also provides monitoring features. We can see the metrics and the history of the runs. The administration features are good. We can manage the users.
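Airflow stores each task's state in its metadata database. That is why a failed run can be resumed by clearing and rerunning only the failed task and the tasks downstream of it. The stdlib-only toy below illustrates this resume-from-failure idea (Python 3.9+ for `graphlib`). It is not Airflow's implementation: the `run_dag` helper, the dict-based state, and the task names are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_dag(deps, tasks, state):
    """Run tasks in dependency order, skipping tasks already marked 'success'.

    deps:  task name -> set of upstream task names (hypothetical toy format)
    tasks: task name -> zero-argument callable
    state: task name -> 'success' | 'failed', kept between runs
    """
    for name in TopologicalSorter(deps).static_order():
        if state.get(name) == "success":
            continue  # finished in an earlier run, so it is not redone
        if any(state.get(up) != "success" for up in deps.get(name, ())):
            continue  # an upstream task has not succeeded yet; leave this one
        try:
            tasks[name]()
            state[name] = "success"
        except Exception:
            state[name] = "failed"
    return state


calls = []
flaky = {"fail": True}


def extract():
    calls.append("extract")


def transform():
    calls.append("transform")
    if flaky["fail"]:
        raise RuntimeError("transient error")


def load():
    calls.append("load")


deps = {"transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": extract, "transform": transform, "load": load}

state = {}
run_dag(deps, tasks, state)      # extract succeeds, transform fails, load waits
after_first_run = dict(state)
flaky["fail"] = False
run_dag(deps, tasks, state)      # resumes at transform; extract is not rerun
```

On the second call, only `transform` and `load` execute. This mirrors the "retry at the point where the failure happened" behavior that the reviewer describes.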
What needs improvement?
The solution lacks certain features. We cannot run real-time jobs in the solution. It supports only batch jobs. If we are using ETL pipelines, it can either be a batch job or a real-time job. Real-time jobs run continuously. They are not scheduled. Apache Airflow is for scheduled jobs, not real-time jobs. It would be a good improvement if the solution could run real-time jobs. Many connectors are available in the product, but some are still missing. We have to build a custom connector if it is not available. The solution must have more in-built connectors for improved functionality.
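The batch-versus-real-time distinction can be made concrete. Under Airflow's standard time-based schedules, each run covers a fixed data interval and is triggered only after that interval has ended. A streaming job, by contrast, processes events continuously. The stdlib-only sketch below models just the scheduled side. The `daily_intervals` function and the dates are hypothetical, and the sketch ignores time zones and catch-up settings.

```python
from datetime import datetime, timedelta


def daily_intervals(start, now):
    """Return the [start, end) windows a daily batch schedule has closed by `now`.

    Simplified model: a window's run is only due once the window has ended,
    so a scheduled batch always trails the incoming data by at least one period.
    """
    windows = []
    window_start = start
    while window_start + timedelta(days=1) <= now:
        windows.append((window_start, window_start + timedelta(days=1)))
        window_start += timedelta(days=1)
    return windows


windows = daily_intervals(datetime(2024, 3, 1), datetime(2024, 3, 3, 12, 0))
for begin, end in windows:
    print(f"batch run covers {begin:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")
```

At noon on 3 March, the 3 March window is still open, so only two runs are due. This built-in lag is why a scheduler of this kind is not a substitute for a real-time pipeline.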
For how long have I used the solution?
I have been using the solution for four to five years.
Buyer's Guide
Apache Airflow
November 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.
What do I think about the stability of the solution?
The tool has stability issues that are typical of open-source products. It sometimes has failures or bugs. It is difficult to troubleshoot because we do not have any support for it. We have to search the community to get answers. It would be good if there were a support team for the tool.
What do I think about the scalability of the solution?
We have 5,000 to 10,000 users in our organization.
How was the initial setup?
The installation is relatively easy. It doesn't need much configuration and is straightforward. Some companies provide custom installations; these are easier, but they are costly paid services. We generally use the core product. We also have AWS Managed Services, which is a better option if we do not want to do the configuration ourselves.
What other advice do I have?
Apache Airflow is a better option for batch jobs. My advice depends on the tools people use and the jobs they schedule. Databricks has its own scheduler. If someone is using Databricks, a separate tool for scheduling would be useless. They can schedule the jobs through Databricks.
Apache Airflow is a good option if someone is not using third-party tools to run the jobs. When we use APIs to get data or get data from RDBMS systems, we can use Apache Airflow. If we use third-party vendors, using the in-built scheduler is better than Apache Airflow. Overall, I rate the solution a nine out of ten.
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Mar 19, 2024
Data Engineer Team Lead at Unibank
Can be used with multiple systems and servers, Kubernetes systems, and dashboard systems
Pros and Cons
- "The product is stable."
- "There is a need for more features on experimental evolution steps."
What is our primary use case?
We use Apache Airflow for the automation and orchestration of model deployment, training, and feature engineering steps. It is a model lifecycle management tool.
How has it helped my organization?
We have an integration with Apache Airflow in our portal for messaging. We group and transform data from Redshift to Tesco and then create a call flow to the router. This can be a source of data leakage in data engineering and machine learning, especially in a HIPAA environment. We need to check the evolution steps in the pipeline. In production, we only have two cases. Sometimes we need customer data that is not in the database, which we get from object storage. The call flow from Redshift to Tesco involves transforming the data and then generating it with the router or Kibana router for the policy. The data is then transformed and sent to the dashboard or data warehouse.
What needs improvement?
Airflow works as a pipeline for transferring code from clients, but Apache Airflow does not offer any solution for model experiments. There is a need for more features on experimental evolution steps.
For how long have I used the solution?
I have been using Apache Airflow for one and a half years.
What do I think about the stability of the solution?
The product is stable. I rate the solution’s stability an eight out of ten.
What do I think about the scalability of the solution?
Twenty users are using this solution in our organization. I rate the solution’s scalability an eight out of ten.
How was the initial setup?
The initial setup is not complex and can be done by two people. However, open-source on-premises solutions have some difficulties. We can run Apache Airflow on Kubernetes, but space limitations and installation issues may arise, as we do not have full control over the Kubernetes cluster resources and our administration rights are limited. I rate the initial setup a six out of ten, where one is difficult and ten is easy.
What other advice do I have?
I recommend Apache Airflow because it is still profitable and can be used with multiple systems and servers, Kubernetes systems, and dashboard systems. You can use it to get social media and other data, but it can be expensive. Overall, I rate the solution a nine out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Lead Engineer at Oliver Wyman
Beneficial for creating and scheduling jobs, but stability needs improvement
Pros and Cons
- "The most valuable feature of Apache Airflow is creating and scheduling jobs. Additionally, the ability to retry failed jobs is useful."
- "We have faced scenarios where Apache Airflow becomes non-responsive, leading to job failures. To resolve such situations, we had to manually reboot Apache Airflow since it doesn't provide an option to restart within the application. This necessitated modifying some configurations to initiate a restart of all Apache Airflow components. Although Apache Airflow is generally dependable, it may occasionally encounter glitches that can disrupt production flows and batches."
What is our primary use case?
Apache Airflow is utilized for automating data engineering tasks. When creating a sequence of tasks, Airflow can assist in automating them.
What is most valuable?
The most valuable feature of Apache Airflow is creating and scheduling jobs. Additionally, the ability to retry failed jobs is useful.
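In Airflow, retries are configured per task through operator arguments such as `retries`, `retry_delay`, `retry_exponential_backoff`, and `max_retry_delay`. The sketch below does not call Airflow. It models the resulting wait schedule in plain Python under a simple doubling rule. Airflow's own backoff calculation differs in detail (it adds jitter), so the numbers are only illustrative.

```python
from datetime import timedelta


def retry_delays(retries, retry_delay, exponential=False, max_retry_delay=None):
    """Return the wait before each retry attempt (simplified model, no jitter)."""
    delays = []
    for attempt in range(1, retries + 1):
        # Fixed delay, or doubled on every further attempt when backoff is on.
        delay = retry_delay * (2 ** (attempt - 1)) if exponential else retry_delay
        if max_retry_delay is not None:
            delay = min(delay, max_retry_delay)  # cap the growth
        delays.append(delay)
    return delays


fixed = retry_delays(3, timedelta(minutes=5))
backoff = retry_delays(4, timedelta(minutes=5), exponential=True,
                       max_retry_delay=timedelta(minutes=30))
print([int(d.total_seconds() // 60) for d in backoff])  # minutes per retry
```

Capping the delay keeps a long outage from pushing retries hours into the future. Without backoff, a flaky downstream system gets hit at a fixed rate.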
What needs improvement?
We have faced scenarios where Apache Airflow becomes non-responsive, leading to job failures. To resolve such situations, we had to manually reboot Apache Airflow since it doesn't provide an option to restart within the application. This necessitated modifying some configurations to initiate a restart of all Apache Airflow components. Although Apache Airflow is generally dependable, it may occasionally encounter glitches that can disrupt production flows and batches.
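One common mitigation for a non-responsive Airflow is to poll the webserver's `/health` endpoint and alert on, or restart, components that look unhealthy. In Airflow 2, this endpoint returns JSON with a `metadatabase` entry and a `scheduler` entry, and the scheduler entry includes `latest_scheduler_heartbeat`. The sketch below only parses such a response body. Fetching it, the 60-second threshold, and the action taken on failure are assumptions left to the deployment.

```python
import json
from datetime import datetime, timedelta, timezone


def unhealthy_components(health_json, now, max_heartbeat_age=timedelta(seconds=60)):
    """List the components that look unhealthy in a /health response body."""
    health = json.loads(health_json)
    problems = []
    if health.get("metadatabase", {}).get("status") != "healthy":
        problems.append("metadatabase")
    scheduler = health.get("scheduler", {})
    heartbeat = scheduler.get("latest_scheduler_heartbeat")
    if scheduler.get("status") != "healthy" or heartbeat is None:
        problems.append("scheduler")
    elif now - datetime.fromisoformat(heartbeat) > max_heartbeat_age:
        # Reported healthy, but the heartbeat is older than our own threshold.
        problems.append("scheduler")
    return problems


sample = """{"metadatabase": {"status": "healthy"},
 "scheduler": {"status": "healthy",
               "latest_scheduler_heartbeat": "2024-03-01T10:00:00+00:00"}}"""
stale = unhealthy_components(sample, datetime(2024, 3, 1, 10, 5, tzinfo=timezone.utc))
fresh = unhealthy_components(sample, datetime(2024, 3, 1, 10, 0, 30, tzinfo=timezone.utc))
```

A check like this can run from an external monitor, so the failure is caught even when the Airflow UI itself is unresponsive.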
For how long have I used the solution?
I have been using Apache Airflow for approximately three years.
What do I think about the stability of the solution?
We experienced some glitches and errors while using the solution.
I rate the stability of Apache Airflow a five out of ten.
What do I think about the scalability of the solution?
Apache Airflow is scalable because it runs within Amazon AWS.
I rate the scalability of Apache Airflow an eight out of ten.
How are customer service and support?
The technical support is good; they are able to debug issues.
How was the initial setup?
The initial setup of Apache Airflow was simple because it was all managed by Amazon AWS. The process took a few minutes.
What's my experience with pricing, setup cost, and licensing?
The solution is free if you use Amazon AWS.
What other advice do I have?
I would recommend this solution for projects even though there have been glitches. Once the solution becomes stable, it would be ideal for critical projects.
I rate Apache Airflow a seven out of ten.
This is great software to build data pipelines. However, we had many glitches that were causing some problems in production.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Big Data Engineer at BigTapp Analytics
A solution for orchestrating EMR clusters with a plug-and-play UI
Pros and Cons
- "Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details."
- "Airflow should support dynamic DAG creation."
What is our primary use case?
I have used Apache Airflow for various purposes, such as orchestrating Spark jobs, EMR clusters, and Glue jobs, and submitting jobs within the DCP data flow on Azure Databricks. It also covers ad hoc queries, for instance when there is a need to execute queries on Redshift or other databases.
How has it helped my organization?
If you are working with APIs or databases, you must write SQL queries and formulate the right statements to retrieve everything. But with the UI, it's more like plug-and-play. You go there, select the task you want to see, like logs, and click on it. It will promptly display the details of the logs, automatically showing the returned logs. However, if you're accessing logs manually from the web server, you must write commands and perform additional tasks. These overheads can be efficiently managed using the UI.
What is most valuable?
Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details. All logs are readily accessible within the interface itself. Examining the logs lets you discern which steps and processes are being executed.
You don't have to configure SMTP separately for everything; you only need to configure email settings, such as email on error, failure, or alert access. With Apache Airflow, you can send emails with just a few lines of code instead of writing extensive SMTP configuration code.
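The "few lines of code" typically look like the dictionary below. `owner`, `email`, `email_on_failure`, and `email_on_retry` are standard Airflow operator arguments. The owner name and the address are hypothetical. This is a sketch of the usual pattern, not this reviewer's actual configuration.

```python
# Shared task defaults for a DAG; the owner and address are hypothetical.
default_args = {
    "owner": "data-eng",
    "email": ["data-alerts@example.com"],
    "email_on_failure": True,   # send a mail when a task fails
    "email_on_retry": False,    # stay quiet on intermediate retries
}
# In a DAG file this dict would be passed as DAG(..., default_args=default_args),
# so every task inherits the alert settings. The SMTP server itself is
# configured once, in the [smtp] section of airflow.cfg or the matching
# environment variables.
```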
I managed a complex workflow for a finance application project. They use Apache Airflow to orchestrate processes, such as retrieving data from SFTP and landing it into S3. From S3, they trigger Glue jobs based on certain conditions. Additionally, they use the Glue catalog in Glusoft for data management, all orchestrated using Airflow. Furthermore, various logics are written in Airflow DAGs to handle scenarios like security mismatches. For instance, files are sent accordingly if there's a missing security.
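Conditional logic such as the missing-security check is usually expressed with a branching operator such as `BranchPythonOperator`, whose callable returns the `task_id` of the path to follow. The function below is a hypothetical callable of that shape. The task ids and the set-based inputs are assumptions; a real DAG would read the expected and received lists from the landed files or from XCom.

```python
def choose_next_task(expected_securities, received_securities):
    """Return the task_id to follow next; both task ids are hypothetical."""
    missing = set(expected_securities) - set(received_securities)
    if missing:
        return "notify_missing_securities"  # divert the run and alert
    return "trigger_glue_job"               # everything arrived: continue


complete = choose_next_task({"AAPL", "MSFT"}, {"AAPL", "MSFT"})
incomplete = choose_next_task({"AAPL", "MSFT"}, {"AAPL"})
```

Keeping the decision in a plain function makes it easy to unit-test without running the scheduler.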
Apache Airflow triggers a set of tasks based on DAGs. If you have multiple DAGs, such as raw, transform, and ready layers, you don't have to trigger each DAG manually; you can link them so that triggering one automatically triggers the others. You can also add conditions.
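In Airflow, one DAG can start another with `TriggerDagRunOperator`, and newer versions also offer data-aware scheduling with Datasets. The stdlib-only model below does not use either. It only mimics the control flow described above: triggering the raw layer cascades through transform and ready, gated by success and an optional condition. The `cascade` helper and the `NEXT_LAYER` map are hypothetical.

```python
# Hypothetical layer names from the review; each maps to the DAG it triggers next.
NEXT_LAYER = {"raw": "transform", "transform": "ready"}


def cascade(start, succeeded, condition=lambda dag_id: True):
    """Return the DAGs that would run, in order, when `start` is triggered.

    succeeded: dag_id -> bool, standing in for the outcome of each run
    condition: extra gate checked before the next DAG is triggered
    """
    ran = []
    dag_id = start
    while dag_id is not None:
        ran.append(dag_id)
        nxt = NEXT_LAYER.get(dag_id)
        if not succeeded(dag_id) or nxt is None or not condition(nxt):
            break  # stop on failure, at the last layer, or when the gate says no
        dag_id = nxt
    return ran


all_ok = cascade("raw", lambda d: True)
transform_failed = cascade("raw", lambda d: d != "transform")
gated = cascade("raw", lambda d: True, condition=lambda d: d != "ready")
```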
What needs improvement?
Airflow should support dynamic DAG creation.
For how long have I used the solution?
I have been using Apache Airflow for over 8 years.
What do I think about the stability of the solution?
The solution is stable.
I rate the solution's stability a nine-point five out of ten.
What do I think about the scalability of the solution?
We were using Apache Airflow on Kubernetes. As more requests came in, it scaled dynamically based on the available pods. Almost 15 data engineers are using Apache Airflow.
I rate the solution's scalability a nine out of ten.
How was the initial setup?
The initial setup is straightforward, although it can be tricky if you go with an executor or the Kubernetes operator.
If you want plug-and-play convenience, Apache Airflow supports various deployment methods, such as Docker, Helm, or Kubernetes, and spinning up Airflow should not take more than 10-15 minutes. However, if you're making customizations or prefer not to use the existing databases, the setup time could be extended due to the customization requests.
What other advice do I have?
You use Apache Airflow to automate your data pipelines. When you have a data pipeline, such as a Spark job or any other job, and want to automate it, triggering the job manually is not always necessary. You need to configure these DAGs accordingly. For instance, Airflow can initiate the job when the data becomes available. We don't need to keep the cluster running all the time, 24/7. We start the cluster using Airflow when we need to submit the job. Once the job is completed, we terminate the cluster.
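In an Airflow DAG, this pattern is usually built from a sensor that waits for the data, a create-cluster task, a submit task, and a terminate task. The terminate task uses a trigger rule such as `all_done` so that teardown still runs after a failure. The Amazon provider package ships EMR operators for these steps. The stdlib-only sketch below reproduces only the control flow. `HypotheticalCluster` is a stand-in, not an AWS API, and the timeout and poke interval values are assumptions.

```python
import time


class HypotheticalCluster:
    """Stand-in for an EMR-style cluster client; not a real AWS API."""

    def __init__(self):
        self.running = False
        self.jobs = []

    def start(self):
        self.running = True

    def submit(self, job):
        if not self.running:
            raise RuntimeError("cluster is not running")
        self.jobs.append(job)
        if job == "bad-job":
            raise RuntimeError("job failed")

    def terminate(self):
        self.running = False


def wait_for(check, timeout_s, poke_interval_s, sleep=time.sleep):
    """Sensor-style wait: poke `check` until it is True or the timeout passes."""
    waited = 0.0
    while not check():
        if waited >= timeout_s:
            return False
        sleep(poke_interval_s)
        waited += poke_interval_s
    return True


def run_on_ephemeral_cluster(cluster, job, data_ready, timeout_s=3600,
                             poke_interval_s=60, sleep=time.sleep):
    if not wait_for(data_ready, timeout_s, poke_interval_s, sleep):
        return "skipped: data never arrived"
    cluster.start()  # only pay for the cluster once the data is there
    try:
        cluster.submit(job)
        return "done"
    finally:
        cluster.terminate()  # always release the cluster, even if the job fails


polls = iter([False, False, True])  # data appears on the third poke
cluster = HypotheticalCluster()
result = run_on_ephemeral_cluster(cluster, "spark-job", lambda: next(polls),
                                  sleep=lambda s: None)
```

Passing `sleep` in as a parameter lets the example run instantly. In a real deployment, the waiting is done by Airflow's sensor machinery instead.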
I recommend the solution.
Overall, I rate the solution a nine out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Apr 10, 2024
Analytics Solution Manager at Telekom Malaysia
A cost-effective, widely adopted solution with a broad open-source community
Pros and Cons
- "Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop."
- "Onboarding new people could be improved. It should be simpler for newcomers; otherwise, we have to assign a senior engineer to operate it."
What is most valuable?
Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop.
What needs improvement?
Onboarding new people could be improved. It should be simpler for newcomers; otherwise, we have to assign a senior engineer to operate it.
For how long have I used the solution?
I have been using Apache Airflow for five years. We are using the latest version of the solution.
What do I think about the stability of the solution?
I rate the solution’s stability a seven out of ten.
What do I think about the scalability of the solution?
The scalability is good. We have five people working on it for five different projects.
I rate the solution’s scalability a ten out of ten.
Which solution did I use previously and why did I switch?
We have used open-source Apache NiFi for data flow, Talend, and SQL Server Integration Services.
We chose Apache Airflow because it is quite popular, adopted by many people, and has an open-source community and engineers. We moved with the crowd and chose it based on popularity.
How was the initial setup?
I rate the initial setup a five on a scale of one to ten, where one is difficult and ten is easy.
The deployment required a senior engineer and took a week to complete.
What's my experience with pricing, setup cost, and licensing?
The solution is cheap.
What other advice do I have?
A new user has to be prepared to adopt a new paradigm and treat the data pipeline as code rather than drag and drop. An organization should have a dedicated or experienced person looking after this.
Overall, I rate the solution an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Google Cloud Architect at Capgemini
Has an efficient user interface, but its stability needs improvement
Pros and Cons
- "The user interface for monitoring and managing workflows has been excellent, particularly in the latest version."
- "The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues."
What is our primary use case?
We use the product to orchestrate data engines and process new data files.
What is most valuable?
The product's most valuable feature is scalability. It helps us run hundreds of data jobs every day.
What needs improvement?
The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues. It requires manual intervention to resume jobs. Additionally, while extending the code is possible, it sometimes necessitates creating custom plugins.
For how long have I used the solution?
We have been using Apache Airflow for four years.
What do I think about the scalability of the solution?
We have more than 100 Apache Airflow users in our organization.
How was the initial setup?
The initial setup on Google Cloud using Cloud Composer is straightforward and simplified. However, deploying it on-premises can be complex and challenging.
What was our ROI?
The product is worth the investment.
What's my experience with pricing, setup cost, and licensing?
It is an open-source solution, so there are no hidden fees or licensing costs associated with the software. However, users need to cover the operational costs for the actual infrastructure, such as the virtual machines (VMs).
What other advice do I have?
The directed acyclic graph (DAG) functionality in Apache Airflow has significantly enhanced our workflow management. It provides a visual representation of data processing tasks.
The user interface for monitoring and managing workflows has been excellent, particularly in the latest version. It is difficult for beginners to use the platform, and some training is required.
I recommend the product to others; it is much better than its competitors, and it is open source. I rate it a seven out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Apr 1, 2024
Software engineer at Naver Corp
Convenient, easy to learn, has a simple UI, and has a huge user base
Pros and Cons
- "The UI is very simple and easy to learn."
- "The documentation must be improved."
What is our primary use case?
My team works on commerce services. We use Airflow to synchronize user information or product information from other services. We use the tool for automating data pipelines. We store user history about API calls and show it on a statistics page, such as daily or real-time statistics. We use the solution to aggregate API users' data.
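The daily statistics step is typically a small aggregation run by a scheduled task. The sketch below shows one such step in plain Python. The `(timestamp, user_id)` record shape and the `daily_call_counts` name are hypothetical. A real DAG task would read the stored call history and write the results to whatever backs the statistics page.

```python
from collections import Counter
from datetime import datetime


def daily_call_counts(call_log):
    """Count API calls per (day, user) from (timestamp, user_id) records."""
    counts = Counter()
    for ts, user_id in call_log:
        counts[(ts.date().isoformat(), user_id)] += 1
    return dict(counts)


log = [
    (datetime(2024, 3, 1, 9, 15), "user-a"),
    (datetime(2024, 3, 1, 17, 40), "user-a"),
    (datetime(2024, 3, 1, 11, 5), "user-b"),
    (datetime(2024, 3, 2, 8, 0), "user-a"),
]
stats = daily_call_counts(log)
```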
What is most valuable?
Running batch applications on Kubernetes is the most useful feature for my team. The solution uses Python and is simple, so the learning cost is low. Because we use the scheduler, we don't need to look after the batch jobs every day; we just need to pay attention when alerts fire, which is convenient for us. The product supports many other services, like Kubernetes, and I have seen some custom applications and programs. The solution integrates very well with other products.
What needs improvement?
The documents do not precisely define the function of the operators. I had to do some experiments to understand the function of the operators. The documentation must be improved. Some parts of the documentation do not precisely explain the parameters and functions. We often need to do experiments to understand how they work.
For how long have I used the solution?
I have been using the solution for one and a half years.
What do I think about the stability of the solution?
I rate the tool’s stability a nine out of ten.
What do I think about the scalability of the solution?
I rate the tool’s scalability a six or seven out of ten. We haven’t horizontally scaled the solution. At least 20% of the teams in my organization are using Airflow to do some batch jobs. There are around 300 users.
How was the initial setup?
I rate the ease of setup an eight out of ten. The product is deployed on the cloud. We release Airflow on Kubernetes. The deployment takes less than five minutes. We use a deployment tool made by our company to deploy the solution.
Which other solutions did I evaluate?
I am also using Apache Kafka.
What other advice do I have?
I will recommend the product to others. The UI is very simple and easy to learn. There are a lot of users of the product. We can find information easily on Google. Overall, I rate the tool an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Mar 4, 2024
Head of Big Data Department at IBA Group
Used for the orchestration of data pipelines, but it should have better integration with cloud platforms
Pros and Cons
- "Since it's widely adopted by the community, Apache Airflow is a user-friendly solution."
- "Apache Airflow should have better integration with cloud platforms."
What is our primary use case?
We use Apache Airflow for the orchestration of data pipelines.
What is most valuable?
Since it's widely adopted by the community, Apache Airflow is a user-friendly solution.
What needs improvement?
Apache Airflow should have better integration with cloud platforms.
For how long have I used the solution?
I have been using Apache Airflow for a couple of years.
What do I think about the stability of the solution?
Apache Airflow is not a stable solution.
What do I think about the scalability of the solution?
Around ten people are using the solution in our organization.
How was the initial setup?
The solution's initial setup is difficult and should be done by an experienced person.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is a cheap solution.
What other advice do I have?
The solution is deployed on the cloud in our organization. Before choosing Apache Airflow, users should try cloud-native services first.
Overall, I rate the solution a seven out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Feb 24, 2024
Product Categories
Business Process Management (BPM)
Popular Comparisons
Informatica Intelligent Data Management Cloud (IDMC)
Camunda
Appian
Pega Platform
SAP Signavio Process Manager
IBM BPM
Bizagi
ARIS BPA
Bonita
Nintex Process Platform
AWS Step Functions
IBM Business Automation Workflow
Oracle BPM
KiSSFLOW
Boomi AtomSphere Flow