UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis
Real User
Top 5
User-friendly, provides a graphical representation of the whole flow, and the user interface is pretty good
Pros and Cons
  • "The tool is user-friendly."
  • "We cannot run real-time jobs in the solution."

What is our primary use case?

The main use case is orchestration. We use it to schedule our jobs.

What is most valuable?

The best thing about the product is its UI. The tool is user-friendly. We can divide our work into different tasks and groups. It gives a graphical representation of the whole flow. It also creates a graph of the complete pipeline. The UI is beautiful. Whenever there is a failure, we can see it at the backend. We can retry at the point where the failure happened. We do not have to redo the whole flow. The user interface is pretty good. It provides details about the jobs. It also provides monitoring features. We can see the metrics and the history of the runs. The administration features are good. We can manage the users.
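The retry behavior described above, resuming at the failed task instead of rerunning the whole flow, can be sketched in plain Python. This is only an illustration of the idea, not Airflow's API: the `run_pipeline` helper and the `extract`, `transform`, and `load` task names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set


def run_pipeline(
    order: List[str],
    actions: Dict[str, Callable[[], None]],
    succeeded: Set[str],
) -> Optional[str]:
    """Run tasks in order, skipping ones that already succeeded.

    Returns the name of the first task that fails, or None if all pass.
    """
    for name in order:
        if name in succeeded:
            continue  # finished in an earlier attempt; do not redo it
        try:
            actions[name]()
        except Exception:
            return name  # stop here; a later retry resumes from this task
        succeeded.add(name)
    return None


# Hypothetical three-step flow whose "transform" step fails on its first try.
calls: List[str] = []
attempts = {"transform": 0}


def extract() -> None:
    calls.append("extract")


def transform() -> None:
    attempts["transform"] += 1
    calls.append("transform")
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")


def load() -> None:
    calls.append("load")


order = ["extract", "transform", "load"]
actions = {"extract": extract, "transform": transform, "load": load}
succeeded: Set[str] = set()

first_failure = run_pipeline(order, actions, succeeded)   # stops at "transform"
second_failure = run_pipeline(order, actions, succeeded)  # resumes at "transform"
```

On the second call, `extract` is skipped because it is already recorded in `succeeded`, so only the failed step and the steps after it run.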

What needs improvement?

The solution lacks certain features. We cannot run real-time jobs in the solution. It supports only batch jobs. If we are using ETL pipelines, it can either be a batch job or a real-time job. Real-time jobs run continuously. They are not scheduled. Apache Airflow is for scheduled jobs, not real-time jobs. It would be a good improvement if the solution could run real-time jobs. Many connectors are available in the product, but some are still missing. We have to build a custom connector if it is not available. The solution must have more in-built connectors for improved functionality.

For how long have I used the solution?

I have been using the solution for four to five years.

Buyer's Guide
Apache Airflow
February 2025
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
832,138 professionals have used our research since 2012.

What do I think about the stability of the solution?

The tool has stability issues that are present in open-source products. It has some failures or bugs sometimes. It is difficult to troubleshoot because we do not have any support for it. We have to search the community to get answers. It would be good if there were a support team for the tool.

What do I think about the scalability of the solution?

We have 5,000 to 10,000 users in our organization.

How was the initial setup?

The installation is relatively easy. It doesn't have much configuration. It is straightforward. Some companies provide custom installations. It is easier, but it will be a costly paid service. We generally use the core product. We also have AWS Managed Services. It is a better option if we do not want to do the configuration ourselves.

What other advice do I have?

Apache Airflow is a better option for batch jobs. My advice depends on the tools people use and the jobs they schedule. Databricks has its own scheduler. If someone is using Databricks, a separate tool for scheduling would be useless. They can schedule the jobs through Databricks.

Apache Airflow is a good option if someone is not using third-party tools to run the jobs. When we use APIs to get data or get data from RDBMS systems, we can use Apache Airflow. If we use third-party vendors, using the in-built scheduler is better than Apache Airflow. Overall, I rate the solution a nine out of ten.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sanket Suhagiya - PeerSpot reviewer
Senior Data Engineer at a consultancy with 10,001+ employees
Real User
Top 10
Efficient pipeline building with intuitive UI and powerful Python features
Pros and Cons
  • "The core features are strong, which are supported by Apache Airflow variables, DAGs, and connections."
  • "The UI is a little bit outdated according to modern standards."

What is our primary use case?

The primary use case for us is ETL pipelines. We write some pipelines to ingest the data. That is the primary use. And, secondly, we use it to run some scheduling and orchestration. We need to run some automation jobs every day. So, we just write an Airflow task and pipeline that runs every day or every hour or however we need. Those are the two things we use it for.

How has it helped my organization?

Since integrating Airflow, we have been able to build pipelines efficiently, within days. If a requirement is due within days or by the end of the week, we can create a pipeline for it.

What is most valuable?

Defining pipelines declaratively in Python is very powerful, and the learning curve is gentle. The UI is also very intuitive, and it makes sense. The core features are strong, which are supported by Apache Airflow variables, DAGs, and connections. Connections make it easy to extend with plug-ins and custom modules we write around it.

What needs improvement?

The UI is a little bit outdated according to modern standards. The UI can be enhanced to support some modern standards. Maybe small things such as dark mode and some proper aesthetics can be implemented.

For how long have I used the solution?

I have been working with Airflow for the past two years.

What do I think about the stability of the solution?

We have not faced any performance issues. Our team follows a custom deployment and uses a Kubernetes runner in the backend, so we can scale it as we need. Scalability-wise, we have not faced any issues.

What do I think about the scalability of the solution?

We use Kubernetes on the backend, which allows us to scale it as needed. We have not faced any issues with scalability.

How are customer service and support?

The team prepared comprehensive user guides and FAQs. I do not remember raising any tickets or concerns with tech support.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I have heard of people using Hadoop and some Informatica flows. Informatica flows were pretty rigid, and custom solutioning was more difficult with those.

How was the initial setup?

We used the help of the Astronomer company for setting up Airflow. Setting up the pipelines is straightforward. We have a CI/CD system, and we just write a Python script, and the pipeline is up and running in minutes.

What about the implementation team?

We used the help of the Astronomer company for setting up Airflow.

What was our ROI?

I do not have the numbers for the investment. Whatever the investment, we can build pipelines efficiently, within days. If a requirement is due within days or by the end of the week, we can create a pipeline for it. So, the ROI should be good.

What other advice do I have?

If it is a large deployment, it is good to go with a managed approach where someone else would be managing for us. If it is small, we can go on our own by spinning up some Kubernetes clusters and deploying it in the cloud.

I'd rate the solution nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Ravan Nannapaneni - PeerSpot reviewer
Senior Lead Engineer at Oliver Wyman
Real User
Top 10
Beneficial for creating and scheduling jobs, but stability needs improvement
Pros and Cons
  • "The most valuable feature of Apache Airflow is creating and scheduling jobs. Additionally, the reattempt at failed jobs is useful."
  • "We have faced scenarios where Apache Airflow becomes non-responsive, leading to job failures. To resolve such situations, we had to manually reboot Apache Airflow since it doesn't provide an option to restart within the application. This necessitated modifying some configurations to initiate a restart of all Apache Airflow components. Although Apache Airflow is generally dependable, it may occasionally encounter glitches that can disrupt production flows and batches."

What is our primary use case?

Apache Airflow is utilized for automating data engineering tasks. When creating a sequence of tasks, Airflow can assist in automating them.

What is most valuable?

The most valuable feature of Apache Airflow is creating and scheduling jobs. Additionally, the reattempt at failed jobs is useful.

What needs improvement?

We have faced scenarios where Apache Airflow becomes non-responsive, leading to job failures. To resolve such situations, we had to manually reboot Apache Airflow since it doesn't provide an option to restart within the application. This necessitated modifying some configurations to initiate a restart of all Apache Airflow components. Although Apache Airflow is generally dependable, it may occasionally encounter glitches that can disrupt production flows and batches.

For how long have I used the solution?

I have been using Apache Airflow for approximately three years.

What do I think about the stability of the solution?

We experienced some glitches using the solution with some errors.

I rate the stability of Apache Airflow a five out of ten.

What do I think about the scalability of the solution?

Apache Airflow is scalable because it is within Amazon AWS.

I rate the scalability of Apache Airflow an eight out of ten.

How are customer service and support?

The technical support is good; they are able to debug issues.

How was the initial setup?

The initial setup of Apache Airflow was simple because it was all managed by Amazon AWS. The process took a few minutes.

What's my experience with pricing, setup cost, and licensing?

The solution is free if you use Amazon AWS.

What other advice do I have?

I would recommend this solution for projects even though there have been glitches. Once the solution becomes stable, it would be ideal for critical projects.

I rate Apache Airflow a seven out of ten.

This is great software to build data pipelines. However, we had many glitches that were causing some problems in production.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Analytics Solution Manager at Telekom Malaysia
Real User
Top 20
A cost-effective solution widely adopted and with a broad open-source community
Pros and Cons
  • "Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop."
  • "There is an area for improvement in onboarding new people. They should make it simple for newcomers. Otherwise, we have to assign a senior engineer to operate it."

What is most valuable?

Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop.

What needs improvement?

There is an area for improvement in onboarding new people. They should make it simple for newcomers. Otherwise, we have to assign a senior engineer to operate it.

For how long have I used the solution?

I have been using Apache Airflow for five years. We are using the latest version of the solution.

What do I think about the stability of the solution?

I rate the solution’s stability a seven out of ten.

What do I think about the scalability of the solution?

The scalability is good. We have five people working on it for five different projects.

I rate the solution’s scalability a ten out of ten.

Which solution did I use previously and why did I switch?

We have used open-source Apache NiFi for data flow, Talend, and SQL Server Integration Services.

We chose Apache Airflow because it is quite popular, adopted by many people, and has an open-source community and engineers. We moved with the crowd and chose it based on popularity.

How was the initial setup?

I rate the initial setup five on a scale of one to ten, one being difficult and ten being easy.

The deployment required a senior engineer and took a week to complete.

What's my experience with pricing, setup cost, and licensing?

The solution is cheap.

What other advice do I have?

A new user has to be prepared to adopt a new paradigm and treat the data pipeline as code rather than drag and drop. An organization should have a dedicated or experienced person looking into this.

Overall, I rate the solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Pravin Gadekar - PeerSpot reviewer
Google Cloud Architect at Capgemini
Real User
Top 5
Has an efficient user interface, but its stability needs improvement
Pros and Cons
  • "The user interface for monitoring and managing workflows has been excellent, particularly in the latest version."
  • "The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues."

What is our primary use case?

We use the product to orchestrate data engines and process new data files.

What is most valuable?

The product's most valuable feature is scalability. It helps us run hundreds of data jobs every day.

What needs improvement?

The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues. It requires manual intervention to resume jobs. Additionally, while extending the code is possible, it sometimes necessitates creating custom plugins.

For how long have I used the solution?

We have been using Apache Airflow for four years.

What do I think about the scalability of the solution?

We have more than 100 Apache Airflow users in our organization.

How was the initial setup?

The initial setup on Google Cloud using Cloud Composer is straightforward and simplified. However, deploying it on-premises can be complex and challenging.

What was our ROI?

The product is worth the investment.

What's my experience with pricing, setup cost, and licensing?

It is an open-source solution, so there are no hidden fees or licensing costs associated with the software. However, users need to cover the operational costs for the actual infrastructure, such as the virtual machines (VMs).

What other advice do I have?

The directed acyclic graph (DAG) functionality in Apache Airflow has significantly enhanced our workflow management. It provides a visual representation of data processing tasks.

The user interface for monitoring and managing workflows has been excellent, particularly in the latest version. It is difficult for beginners to use the platform, and some training is required.

I recommend the product to others, and it is much better than its competitors. It is open source. I rate it a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Analytics Solution Manager at Telekom Malaysia
Real User
Top 20
Comes with direct support for Python, letting us easily automate our pipelines
Pros and Cons
  • "The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot."
  • "We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult."

What is our primary use case?

There are a few use cases we have for Apache Airflow, one being government projects where we perform data operations on a monthly basis. For example, we'll collect data from various agencies, harmonize the data, and then produce a dashboard. In general, it's a BI use case, but focusing on social economy.

We concentrate mainly on BI, and because my team members have strong technical backgrounds we often fall back to using open source tools like Airflow and our own coded solutions. 

For a single project, we will typically have three of us working on Airflow at a time. This includes two data engineers and a system administrator. Our infrastructure model is hybrid, based both in the cloud and on-premises. 

What is most valuable?

The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot.

It's such a natural fit because our engineers are also Python-based, and I think we also quite like that we don't have to learn different kinds of UIs. Airflow is based on standard software packages, so we don't have to learn anything new in the way of opinionated UIs from different vendors.

What needs improvement?

We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult.

When something fails, it's not that easy to troubleshoot what went wrong. Sometimes the UI becomes really slow and there's no easy way to diagnose the problem. For the most part, we have had to learn through trial and error how to operate it properly. 

The UI is also not that attractive, and I feel that the user experience isn't that nice. Version 2 is supposedly better, but without having tried it, I could suggest more improvements in the visual UI. We want to do the ETL as code, but having a nice visual UI to facilitate this process would be great, because it would mean we could also rely on non-technical staff, rather than just the three solid technical staff we have here. If there were better features for the UI, like drag-and-drop, then we could expand its use to more of our team.

For how long have I used the solution?

I've been using Apache Airflow for about two and a half years. 

What do I think about the stability of the solution?

I think how Apache Airflow works is great. We like the paradigm of ETL as code, which means you define your pipeline as code. All the while, people talk about infrastructure as code, so the practice of ETL as code really fits into that philosophy.

What do I think about the scalability of the solution?

We can scale it well, and it runs on cloud, too. It's compatible with cloud-native technologies like Kubernetes so it has no issues regarding elasticity.

How are customer service and technical support?

We contacted an Airflow developer for assistance once and it was a good experience.

Which solution did I use previously and why did I switch?

We like to explore different tools, mixing and matching them to our needs, but we have never really found any like Airflow that are to our liking. We tried looking into Talend and Alteryx but we didn't find them suitable to our style or approach.

How was the initial setup?

As a first-time user, it was complex and somewhat difficult to set up as there are many components to put together. You've got your data portion, your scheduler portion, your web server portion, etc., and you've got all these parts to set up at first.

The next project that you get to, it gets easier. You really need to acquire a feel for what you're doing, and once you get over that, it's not too bad.

What about the implementation team?

We implemented Airflow ourselves, with the help of our two in-house data engineers and system administrator. It took around three months to get it deployed initially, from concept into production. Then after that, the goal is just to operate it and keep it running.

What's my experience with pricing, setup cost, and licensing?

Although Airflow is open source software, there's also commercial support for it by Astronomer. We personally don't use the commercial support, but it's always an option if you don't mind the extra cost.

What other advice do I have?

I can recommend Apache Airflow, especially if there are serious data engineers on your team. If, on the other hand, you're looking to enable business users, then it's not suitable.

I would rate Apache Airflow an eight out of ten.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1619292 - PeerSpot reviewer
Director of Business Intelligence at a consultancy with self employed
Real User
Helps to schedule data pipelines but improvement is needed in workflow integration across the servers
Pros and Cons
  • "To increase efficiency, it's quite simple to add dbt tasks to an Apache Airflow pipeline or orchestration file. With the tool, you can specify dependencies."
  • "I would like to see workflow integration across the servers."

What is our primary use case?

We use the tool to schedule data pipelines. We also use Apache Airflow to orchestrate dbt, another data processing tool. Airflow helps manage dbt processes, which, in our case, load data from our data lake.

What is most valuable?

To increase efficiency, it's quite simple to add dbt tasks to an Apache Airflow pipeline or orchestration file. With the tool, you can specify dependencies.
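As a rough illustration of what specifying dependencies means, the sketch below orders a few tasks so that each one runs only after the tasks it depends on. It uses Python's standard `graphlib` module rather than Airflow itself, and the task names (`load_raw`, `dbt_run`, `dbt_test`, `publish`) are hypothetical, not taken from the reviewer's setup.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return task names in an order that respects every dependency.

    Each key is a task; its value is the set of tasks that must finish first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: load data, run dbt models, test them, then publish.
dependencies = {
    "load_raw": set(),
    "dbt_run": {"load_raw"},
    "dbt_test": {"dbt_run"},
    "publish": {"dbt_test"},
}

run_order = execution_order(dependencies)
```

In an Airflow DAG file the same relationships are declared between task objects, for example with the `>>` operator, and the scheduler works out the run order.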

What needs improvement?

I would like to see workflow integration across the servers. 

For how long have I used the solution?

I have been using the product for two years. 

What do I think about the stability of the solution?

The solution is stable, but we do have occasional issues. These aren't performance problems as such; rather, the Apache Airflow cluster sometimes crashes when too many tasks run simultaneously.

What do I think about the scalability of the solution?

My team has around 11 people using the tool. Each team has a separate server, so we have about 10-20 different Apache Airflow servers. Altogether, I would estimate that around 200 people in our organization use it.

How are customer service and support?

I haven't contacted the support team directly. Our system team does it. 

How was the initial setup?

Apache Airflow provides templates for deployment, which makes it easy. When deploying the tool or using dbt, we usually use Kubernetes. We configure Kubernetes to generate a Docker file that sets up the Kubernetes servers for us. This means that when we deploy, it automatically goes to production. The whole process can be completed in seven weeks. 

What's my experience with pricing, setup cost, and licensing?

I use the tool's open-source version. 

What other advice do I have?

The solution's maintenance involves upgrades. Our system team handles maintenance for us. Their main tasks are upgrading versions and addressing vulnerabilities. It's hard work, but they manage it well. Maintenance takes about two weeks per year for our system team.

I rate the product a seven out of ten and I recommend it to others. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Punit_Shah - PeerSpot reviewer
Director at Smart Analytica
Reseller
Top 10
Excels in orchestrating complex workflows, offering extensibility, a graphical user interface for clear pipeline monitoring and affordability
Pros and Cons
  • "One of its most valuable features is the graphical user interface, providing a visual representation of the pipeline status, successes, failures, and informative developer messages."
  • "Enhancements become necessary when scaling it up from a few thousand workflows to a more extensive scale of five thousand or ten thousand workflows."

What is our primary use case?

We utilize Apache Airflow for two primary purposes. Firstly, it serves as the tool for ingesting data from the source system application into our data warehouse. Secondly, it plays a crucial role in our ETL pipeline. After extracting data, it facilitates the transformation process and subsequently loads the transformed data into the designated target tables.

What is most valuable?

One of its most valuable features is the graphical user interface, providing a visual representation of the pipeline status, successes, failures, and informative developer messages. This graphical interface greatly enhances the user experience by offering clear insights into the pipeline's status.

What needs improvement?

Enhancements become necessary when scaling it up from a few thousand workflows to a more extensive scale of five thousand or ten thousand workflows. At this point, resource management and threading become critical aspects. This involves optimizing the utilization of resources and threading within the Kubernetes VM ecosystem.

For how long have I used the solution?

I have been working with it for five years.

What do I think about the stability of the solution?

I would rate its stability capabilities nine out of ten.

What do I think about the scalability of the solution?

While it operates smoothly with up to fifteen hundred pipelines, scaling beyond that becomes challenging. The performance tends to drop when dealing with five thousand pipelines or more, which is why I rate its scalability five out of ten.

How are customer service and support?

I would rate the customer service and support nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup is straightforward. I would rate it nine out of ten.

What about the implementation team?

The deployment process requires approximately four hours, and the level of involvement from individuals depends on the quantity of pipelines intended for deployment.

What's my experience with pricing, setup cost, and licensing?

The cost is quite affordable. I would rate it two out of ten.

What other advice do I have?

If you have around two thousand pipelines to execute daily within an eight to nine-hour window, Apache Airflow proves to be an excellent solution. I would rate it nine out of ten.
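For a sense of the load those figures imply, a back-of-the-envelope calculation follows. It assumes the pipelines are spread evenly across the window, which the review does not state; the helper name is hypothetical.

```python
def required_rate_per_hour(pipelines: int, window_hours: float) -> float:
    """Average number of pipelines that must complete per hour."""
    return pipelines / window_hours


# Figures from the review: about 2,000 pipelines in an 8 to 9 hour window.
rate_8h = required_rate_per_hour(2000, 8)  # 250.0 per hour
rate_9h = required_rate_per_hour(2000, 9)  # roughly 222.2 per hour
```

Under that assumption the scheduler has to complete roughly 220 to 250 pipelines per hour, or about four per minute.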

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: My company has a business relationship with this vendor other than being a customer: Service provider
PeerSpot user