What is our primary use case?
We used Apache Airflow for our orchestration needs. It ran all the jobs we had created in Databricks, Fivetran, and dbt, which were our three primary tools, along with a few others. Airflow handled the job orchestration and connected the jobs to each other to build our entire data pipeline. We also used it for dbt CI/CD.
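The chaining described above can be illustrated with a minimal sketch. It uses Python's standard-library `graphlib` rather than Airflow itself, and the task names are hypothetical stand-ins for a Fivetran sync, a Databricks job, and dbt steps. In Airflow, the same dependencies would be declared between tasks inside a DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "databricks_transform": {"fivetran_sync"},
    "dbt_run": {"databricks_transform"},
    "dbt_test": {"dbt_run"},
}


def execution_order(deps):
    """Return a task order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

An orchestrator does much more than compute this order (scheduling, retries, logging), but the ordering is the core idea behind connecting jobs into one pipeline.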
What is most valuable?
The most valuable feature is that it's the most popular data orchestration tool in the market right now. It connects to everything you need.
It's open-source. You have a lot of documentation and a lot of people helping out. It has large communities, so if you need something or you want to ask something, you can. Often, someone else would have already asked that question, and they would have already got the answer, and you can just look it up.
Development on Apache Airflow is really fast, and it's easy to use with the newer updates. Everything is in Python, so it's not hard to understand. They also have a graphical view, so if you are not a programmer and you are just an administrator, you can easily track everything and see if everything is working or not. For notifications, it can connect with different messaging tools such as Slack and Teams, as well as with webhooks. It's very easy to use, and it has a lot of features that you would expect from any of the data orchestration tools.
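As a rough illustration of the webhook-style notifications mentioned above, the sketch below builds the kind of JSON request a Slack incoming webhook accepts. The function name, DAG and task names, and URL are all hypothetical, and the request is only constructed, not sent. In Airflow, logic like this could be called from a task's failure callback, or handled by a ready-made provider integration instead.

```python
import json
import urllib.request


def build_slack_alert(webhook_url, dag_id, task_id):
    """Build (but do not send) a POST request carrying a failure message."""
    payload = {"text": f"Task {task_id} in DAG {dag_id} failed."}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real webhook URL is issued by the messaging tool.
request = build_slack_alert(
    "https://example.com/hypothetical-webhook", "daily_pipeline", "dbt_run"
)
# Calling urllib.request.urlopen(request) would actually deliver the message.
```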
What needs improvement?
Programmatically, it's very good and has no real competitors, but you cannot develop anything in the Airflow UI. Every DAG you want to build has to be coded in Python. Other tools that have recently come up as competitors offer graphical programming, whereas Airflow currently doesn't: you cannot drag and drop components to build or orchestrate a pipeline. It has a graphical view, but only for administration, not for development.
It doesn't support Windows. That's a big drawback because a lot of people use Windows.
For how long have I used the solution?
I used Apache Airflow on my previous project. We had planned to use it in our current project, but due to time issues, we were not able to deploy it. In my previous project, I used it for around eight or nine months.
What do I think about the stability of the solution?
It's a very stable product.
What do I think about the scalability of the solution?
It's highly scalable. You can scale it as much as you want. How much you need depends on the size of your workload, and you scale by sizing up your instance. We had over 3,000 DAGs in our previous project, and we didn't face any issues even with 8 GB of memory on our EC2 instance. If you have a lot of DAGs, you might need to scale up, but it's quite lightweight, so you don't need to worry much about that.
How are customer service and support?
It's open source. It was my first project, and I had a few doubts, but everything I needed was available on the internet, so I never had to contact their support. I might have been able to post my questions on their GitHub, but I didn't need that. Airflow has a very large community, so any questions you ask get answered there.
How was the initial setup?
We didn't do the setup ourselves. The Astronomer team deployed and set it up on Azure Community Services, and they took care of everything.
What about the implementation team?
Apache Airflow has two large and popular distributors. There might be others, but the two popular ones are Bitnami and Astronomer. For us, everything was set up by Astronomer.
What's my experience with pricing, setup cost, and licensing?
It's open source, and you can install it locally on your own system. In production, you normally deploy it on a cloud service such as EC2, which has some cost. Setting it up yourself, for example in a Docker container, is quite easy, and everything you need for that is available online. I have set it up on my local system, and it doesn't take long. You can also customize it for your project, such as selecting a different metadata database or a different Celery or web server setup, which is good.
If you go with a service provider such as Astronomer or Bitnami, they will charge you because they are distributors of Airflow and offer some of their own features and support.
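Customizations like these are typically expressed as configuration settings. As a small sketch, Airflow can read overrides from environment variables named `AIRFLOW__{SECTION}__{KEY}`, which is convenient in a Docker setup. The helper function, the executor choice, and the connection string below are assumptions for illustration, and the `database` section applies to recent Airflow 2 releases (older versions kept this setting under `core`).

```python
def airflow_env_var(section, key):
    """Name of the environment variable that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Example values are assumed for illustration, not taken from the review.
overrides = {
    airflow_env_var("core", "executor"): "CeleryExecutor",
    airflow_env_var("database", "sql_alchemy_conn"): (
        "postgresql+psycopg2://user:password@localhost/airflow"
    ),
}

for name, value in overrides.items():
    print(f"{name}={value}")
```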
What other advice do I have?
If you are on a Mac or Linux system, it's very easy to install. You can just follow the instructions on the Apache Airflow website and start working. Airflow doesn't offer a Windows .exe installer, though, so on Windows, some knowledge of Docker containers or WSL will be useful.
Other than that, Astronomer has an instructor called Marc Lamberti, who is very popular in the Airflow community. He has five or six YouTube videos that get into the details, and in five minutes he can teach you how to set up Airflow or what DAGs are. So, if you have no idea about Apache Airflow and you don't want to go through all the documentation, you can start with those videos.
I'd rate it a seven out of ten because it doesn't support Windows and it doesn't support graphical design: we can administer and view DAGs through the UI, but we cannot create them there. Other orchestration tools on the market provide that feature.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.