Mikalai Surta - PeerSpot reviewer
Head of Big Data Department at IBA Group
MSP
Top 5
Used for the orchestration of data pipelines, but it should have better integration with cloud platforms
Pros and Cons
  • "Since it's widely adopted by the community, Apache Airflow is a user-friendly solution."
  • "Apache Airflow should have better integration with cloud platforms."

What is our primary use case?

We use Apache Airflow for the orchestration of data pipelines.

What is most valuable?

Since it's widely adopted by the community, Apache Airflow is a user-friendly solution.

What needs improvement?

Apache Airflow should have better integration with cloud platforms.

For how long have I used the solution?

I have been using Apache Airflow for a couple of years.

Buyer's Guide
Apache Airflow
December 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
825,399 professionals have used our research since 2012.

What do I think about the stability of the solution?

Apache Airflow is not a stable solution.

What do I think about the scalability of the solution?

Around ten people are using the solution in our organization.

How was the initial setup?

The solution's initial setup is difficult and should be done by an experienced person.

What's my experience with pricing, setup cost, and licensing?

Apache Airflow is a cheap solution.

What other advice do I have?

The solution is deployed on the cloud in our organization. Before choosing Apache Airflow, users should try cloud-native services first.

Overall, I rate the solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1715364 - PeerSpot reviewer
Senior Data Engineer at a photography company with 11-50 employees
Real User
Top 5
A tool that needs to improve its complex initial setup and limited integration capabilities but can be useful in workflow automation
Pros and Cons
  • "Apache Airflow is useful for workflow automation, making it capable of automating pipelines, data pipelines, and data warehouse processes."
  • "The problem with Apache Airflow is that it is an open-source tool. You have to build it into a Kubernetes container, which is not easy to maintain, and I find it to be very clunky."

What is our primary use case?

Apache Airflow is useful for workflow automation, making it capable of automating pipelines, data pipelines, and data warehouse processes. I don't have a strong need for Apache Airflow because I do everything with dbt (data build tool), which has its own integrated workflow process.

I use Fivetran to synchronize my data. I don't need to do any automation on that and don't have any need for workflow automation. I have everything I need.

How has it helped my organization?

We were experimenting with the solution. We never reached the point where we would deploy the solution in the production capacity.

What needs improvement?

The problem with Apache Airflow is that it is an open-source tool. You have to build it into a Kubernetes container, which is not easy to maintain, and I find it to be very clunky.

Additionally, there is room for improvement with DAGs. I had a very hard time building DAGs in Apache Airflow. I decided to use Astronomer, which is on top of Apache Airflow and is supposed to make your life easier. The best part of the solution is the third-party add-on which is Astronomer.

It would be a very nice tool if it were an entirely cloud-based solution. Apache Airflow is not so nice when you have a hybrid setup, such as when half is on-premises and half is in a cloud environment. It should integrate better with the outside world.

For how long have I used the solution?

I have been using Apache Airflow for a couple of months.

What do I think about the stability of the solution?

I have no opinion on the solution's stability. The solution did not get to a production capacity. I couldn't even do file processing with Apache Airflow. None of the engineers could actually help me set up Apache Airflow. I had to give up on the product. Just buy a product that works, and you will be done with it.

How was the initial setup?

The initial setup was complex to deploy on the cloud. Installing the software is very difficult. The documentation is very bad. There is no installer where you can press a button, and it does everything for you. One may need a couple of engineers to install the solution, which is an issue with open-source tools. Price-wise, the software falls on the cheaper side. With Apache Airflow, one may spend much more on engineers.

The solution is deployed purely on the cloud.

What was our ROI?

I didn't experience any ROI using the solution. I could do everything without Apache Airflow; it would have been just a money pit.

What other advice do I have?

I suggest others not use Apache Airflow. If you use Apache Airflow, you will waste your time unless you have a bunch of engineers who already know about the solution.

If you cannot write a DAG within two hours of starting the process, then forget about the tool, and it would be better if you tried to find something else.

Overall, if the tool was working properly, it would be very good, but unfortunately, it is not.

Overall, I rate the solution a five out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Kemal Duman - PeerSpot reviewer
Team Lead, Data Engineering at Nesine.com
Real User
Top 5
Enables efficient orchestration of batch processes with very good reliability
Pros and Cons
  • "Apache Airflow is easy to scale and its UI improves with each release."
  • "I definitely recommend Apache Airflow and would rate it nine out of ten."
  • "It is not suitable for real-time ETL tasks."
  • "Running frequent jobs, such as every minute or five minutes, is not appropriate for Airflow. It is not suitable for real-time ETL tasks."

What is our primary use case?

I am using Apache Airflow to orchestrate my jobs, projects, and batch ETL jobs. It manages tasks, orchestrates jobs, and helps in handling batch processes effectively, incorporating integrations like dbt and Great Expectations.

What is most valuable?

Apache Airflow is easy to scale and its UI improves with each release. Reliability is good, and when integrated with Kubernetes, it performs better compared to on-premises environments. It also facilitates the construction of data pipelines by various teams, including those with limited technical knowledge.

What needs improvement?

Running frequent jobs, such as every minute or five minutes, is not appropriate for Airflow. It is not suitable for real-time ETL tasks. Stream jobs are not its strength.

For how long have I used the solution?

I have been using Apache Airflow for approximately five to six years.

What do I think about the stability of the solution?

Apache Airflow is stable and I have not experienced significant issues. I regard it as a reliable solution.

What do I think about the scalability of the solution?

Apache Airflow scales well, especially when deployed in Kubernetes environments. In stand-alone Linux environments, I found it limited. Kubernetes allows for better scalability.

How are customer service and support?

I generally solve problems internally without technical support; however, forums and community resources like Stack Overflow are helpful.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Before Apache Airflow, I used CronJob. I switched because Apache Airflow is the industry standard, provides better log tracking, and makes task management easier for teams with varied technical knowledge.

What's my experience with pricing, setup cost, and licensing?

I prefer using the open-source version rather than the enterprise version, which helps manage costs.

What other advice do I have?

I definitely recommend Apache Airflow and would rate it nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1619292 - PeerSpot reviewer
Director of Business Intelligence at a consultancy with self employed
Real User
Helps to schedule data pipelines but improvement is needed in workflow integration across the servers
Pros and Cons
  • "To increase efficiency, it's quite simple to add dbt tasks to an Apache Airflow pipeline or orchestration file. With the tool, you can specify dependencies."
  • "I would like to see workflow integration across the servers."

What is our primary use case?

We use the tool to schedule data pipelines. We also use Apache Airflow to orchestrate dbt, another data processing tool. Airflow helps manage dbt processes, which, in our case, load data from our data lake.

What is most valuable?

To increase efficiency, it's quite simple to add dbt tasks to an Apache Airflow pipeline or orchestration file. With the tool, you can specify dependencies.

What needs improvement?

I would like to see workflow integration across the servers. 

For how long have I used the solution?

I have been using the product for two years. 

What do I think about the stability of the solution?

The solution is stable, but we do have occasional issues. These aren't performance problems as such; rather, the Apache Airflow cluster sometimes crashes when too many tasks run simultaneously.

What do I think about the scalability of the solution?

My team has around 11 people using the tool. Each team has a separate server, so we have about 10-20 different Apache Airflow servers. Altogether, I would estimate that around 200 people in our organization use it.

How are customer service and support?

I haven't contacted the support team directly. Our system team does it. 

How was the initial setup?

Apache Airflow provides templates for deployment, which makes it easy. When deploying the tool or using dbt, we usually use Kubernetes. We configure Kubernetes to generate a Docker file that sets up the Kubernetes servers for us. This means that when we deploy, it automatically goes to production. The whole process can be completed in seven weeks. 

What's my experience with pricing, setup cost, and licensing?

I use the tool's open-source version. 

What other advice do I have?

The solution's maintenance involves upgrades. Our system team handles maintenance for us. Their main tasks are upgrading versions and addressing vulnerabilities. It's hard work, but they manage it well. Maintenance takes about two weeks per year for our system team.

I rate the product a seven out of ten and I recommend it to others. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Bernd Stroehle - PeerSpot reviewer
Enterprise Architect at KosaKya
Real User
Top 10
An open-source solution that has limitations in processing too many jobs
Pros and Cons
  • "I worked on a project at a leading German bank for two years, successfully migrating large applications with hundreds of jobs."
  • "Apache Airflow improved workflow efficiency, but we had to find solutions for large workflows. For instance, a monthly workflow with 1200 jobs had to be split into three to four pieces as it struggled with large job numbers. Loading a workflow with 500 jobs could take 10 minutes, which wasn't acceptable."

What needs improvement?

Apache Airflow improved workflow efficiency, but we had to find solutions for large workflows. For instance, a monthly workflow with 1200 jobs had to be split into three to four pieces as it struggled with large job numbers. Loading a workflow with 500 jobs could take 10 minutes, which wasn't acceptable.
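One way to approach the kind of split described above is to order the jobs by dependency and cut the ordered list into fixed-size pieces. The following is a minimal sketch in plain Python using the standard library's `graphlib`. The job names, the piece size of 400, and the `split_workflow` helper are all hypothetical, and nothing here is taken from the reviewer's actual project.

```python
from graphlib import TopologicalSorter


def split_workflow(dependencies, max_jobs):
    """Split a job graph into ordered pieces of at most max_jobs jobs.

    dependencies maps each job name to the set of jobs it depends on.
    Jobs are emitted in dependency order, so every job lands in the same
    piece as its prerequisites or in a later one.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    ordered = list(TopologicalSorter(dependencies).static_order())
    return [ordered[i:i + max_jobs] for i in range(0, len(ordered), max_jobs)]


# Hypothetical monthly workflow: a chain of 1200 jobs, job_0 -> job_1 -> ...
jobs = {"job_0": set()}
for n in range(1, 1200):
    jobs[f"job_{n}"] = {f"job_{n - 1}"}

pieces = split_workflow(jobs, max_jobs=400)
print(len(pieces), [len(p) for p in pieces])
```

Because the pieces follow dependency order, they would have to run one after another, for example as separate workflows where each one starts only once the previous one has finished.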

The most important feature Apache Airflow lacks is support for external configuration files. All classical schedulers like Control-M or Automic allow you to load workflow definitions from YAML, XML, or JSON files, but the tool requires you to write Python programs. Airflow only supports external configuration for variables, not for workflows. To address this, I created a YAML configuration file that I converted into Python programs, but this functionality is missing from Apache Airflow itself.

All of its competitors have this feature. In Control-M, Automic, and IBM's scheduler, you can load workflows from XML, JSON, or YAML files.
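The workaround described above, converting an external configuration file into Python programs, could look roughly like the sketch below. It is an illustration only, not the reviewer's implementation: it reads JSON rather than YAML so that it needs nothing outside the Python standard library, the configuration keys (`dag_id`, `schedule`, `jobs`, `after`) are made up for this example, and the generated code assumes Airflow 2.4 or later, where `DAG` accepts a `schedule` argument.

```python
import json

# Hypothetical workflow definition; the key names are assumptions.
CONFIG = """
{
  "dag_id": "monthly_load",
  "schedule": "@monthly",
  "jobs": [
    {"id": "extract", "command": "echo extract", "after": []},
    {"id": "transform", "command": "echo transform", "after": ["extract"]},
    {"id": "load", "command": "echo load", "after": ["transform"]}
  ]
}
"""


def render_dag(config):
    """Render Airflow DAG source code (as text) from a workflow config dict."""
    lines = [
        "from datetime import datetime",
        "",
        "from airflow import DAG",
        "from airflow.operators.bash import BashOperator",
        "",
        f"with DAG(dag_id={config['dag_id']!r}, start_date=datetime(2024, 1, 1),",
        f"         schedule={config['schedule']!r}, catchup=False) as dag:",
        "    tasks = {}",
    ]
    # One BashOperator per job, stored in a dict keyed by job id.
    for job in config["jobs"]:
        lines.append(
            f"    tasks[{job['id']!r}] = BashOperator("
            f"task_id={job['id']!r}, bash_command={job['command']!r})"
        )
    # One "upstream >> downstream" line per declared dependency.
    for job in config["jobs"]:
        for upstream in job["after"]:
            lines.append(f"    tasks[{upstream!r}] >> tasks[{job['id']!r}]")
    return "\n".join(lines) + "\n"


source = render_dag(json.loads(CONFIG))
compile(source, "monthly_load.py", "exec")  # syntax check only; does not import Airflow
print(source)
```

The printed text would then be saved as a `.py` file in the Airflow DAGs folder. Generating source text keeps the definitions in the external file as the single place to edit, at the cost of an extra build step.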

For how long have I used the solution?

I've been familiar with Apache Airflow for about three to four years. I worked on a project at a leading German bank for two years, successfully migrating large applications with hundreds of jobs. However, the leading German bank paused its migration strategy due to issues with the team in India. They're likely waiting for version 3, which is expected next year.

What do I think about the stability of the solution?

I rate the tool's stability a nine out of ten. 

What do I think about the scalability of the solution?

I rate the product's scalability a seven out of ten. 

How are customer service and support?

Apache Airflow doesn't have its own technical support.

How was the initial setup?

I've been involved in all aspects of Airflow deployment, including building infrastructure using Kubernetes and containers. We faced challenges migrating from enterprise schedulers like Control-M and IBM's scheduler to Airflow, as it lacked some functionality. I had to implement extra features and extensions to support things like individual calendars.

What's my experience with pricing, setup cost, and licensing?

Apache Airflow is open-source and free. Hyperscalers like Google (with Composer), Azure, and AWS offer managed Airflow services.

What other advice do I have?

I recommend Apache Airflow because it's open-source, but you must accept its limitations. However, I wouldn't recommend it to companies in biomedical, chemistry, or oil and gas industries with large workflows and thousands of jobs. For example, genomic analysis at an American multinational pharmaceutical and biotechnology corporation involved workflows with around twenty thousand jobs, which Airflow can't handle. Special schedulers are needed for such cases, as even classical schedulers like Control-M and Automic aren't suitable.

I rate the overall solution a seven out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Analytics Solution Manager at Telekom Malaysia
Real User
Top 20
Comes with direct support for Python, letting us easily automate our pipelines
Pros and Cons
  • "The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot."
  • "We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult."

What is our primary use case?

There are a few use cases we have for Apache Airflow, one being government projects where we perform data operations on a monthly basis. For example, we'll collect data from various agencies, harmonize the data, and then produce a dashboard. In general, it's a BI use case, but focusing on social economy.

We concentrate mainly on BI, and because my team members have strong technical backgrounds we often fall back to using open source tools like Airflow and our own coded solutions. 

For a single project, we will typically have three of us working on Airflow at a time. This includes two data engineers and a system administrator. Our infrastructure model is hybrid, based both in the cloud and on-premises. 

What is most valuable?

The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot.

It's such a natural fit because our engineers are also Python-based, and I think we also quite like that we don't have to learn different kinds of UIs. Airflow is based on standard software packages, so we don't have to learn anything new in the way of opinionated UIs from different vendors.

What needs improvement?

We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult.

When something fails, it's not that easy to troubleshoot what went wrong. Sometimes the UI becomes really slow and there's no easy way to diagnose the problem. For the most part, we have had to learn through trial and error how to operate it properly. 

The UI is also not that attractive, and I feel that the user experience isn't that nice. Version 2 is supposedly better, but without having tried it, I could suggest more improvements in the visual UI. We want to do the ETL as code, but having a nice visual UI to facilitate this process would be great, because it would mean we could also rely on non-technical staff rather than just the three solid technical staff we have here. If there were better features for the UI, like drag-and-drop, then we could expand its use to more of our team.

For how long have I used the solution?

I've been using Apache Airflow for about two and a half years. 

What do I think about the stability of the solution?

I think how Apache Airflow works is great. We like the paradigm of ETL as code, which means you define your pipeline as code. All the while, people talk about infrastructure as code, so the practice of ETL as code really fits into that philosophy.

What do I think about the scalability of the solution?

We can scale it well, and it runs on cloud, too. It's compatible with cloud-native technologies like Kubernetes so it has no issues regarding elasticity.

How are customer service and technical support?

We contacted an Airflow developer for assistance once and it was a good experience.

Which solution did I use previously and why did I switch?

We like to explore different tools, mixing and matching them to our needs, but we have never really found any like Airflow that are to our liking. We tried looking into Talend and Alteryx but we didn't find them suitable to our style or approach.

How was the initial setup?

As a first-time user, it was complex and somewhat difficult to set up as there are many components to put together. You've got your data portion, your scheduler portion, your web server portion, etc., and you've got all these parts to set up at first.

The next project that you get to, it gets easier. You really need to acquire a feel for what you're doing, and once you get over that, it's not too bad.

What about the implementation team?

We implemented Airflow ourselves, with the help of our two in-house data engineers and system administrator. It took around three months to get it deployed initially, from concept into production. Then after that, the goal is just to operate it and keep it running.

What's my experience with pricing, setup cost, and licensing?

Although Airflow is open source software, there's also commercial support for it by Astronomer. We personally don't use the commercial support, but it's always an option if you don't mind the extra cost.

What other advice do I have?

I can recommend Apache Airflow, especially if there are serious data engineers on your team. If, on the other hand, you're looking to enable business users, then it's not suitable.

I would rate Apache Airflow an eight out of ten.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2108010 - PeerSpot reviewer
Associate Data Engineer at an outsourcing company with 201-500 employees
MSP
Top 5
Connects to everything we need, but doesn't support development through the UI
Pros and Cons
  • "Development on Apache Airflow is really fast, and it's easy to use with the newer updates. Everything is in Python, so it's not hard to understand. They also have a graphical view, so if you are not a programmer and you are just an administrator, you can easily track everything and see if everything is working or not."
  • "Programmatically, it's very good, and it doesn't have any competitors, but you cannot develop anything in Airflow UI. You need to develop everything within the program. In the market, other tools have come up recently as competitors to Airflow, and they also give graphical programming options, whereas Airflow doesn't provide that feature currently. All the DAGs you want to build need to be coded in Python."

What is our primary use case?

We were using Apache Airflow for our orchestration needs. We used it for all the jobs that we had created in Databricks, Fivetran, or dbt. These were the three primary tools that we were using. There were a few others, but these were the three primary tools. So, Apache Airflow was for the job orchestration and connecting them to each other for building our entire data pipeline. We were also using Apache Airflow for dbt CI/CD purposes.

What is most valuable?

The most valuable feature is that it's the most popular data orchestration tool in the market right now. It connects to everything you need.

It's open-source. You have a lot of documentation and a lot of people helping out. It has large communities, so if you need something or you want to ask something, you can. Often, someone else would have already asked that question, and they would have already got the answer, and you can just look it up.

Development on Apache Airflow is really fast, and it's easy to use with the newer updates. Everything is in Python, so it's not hard to understand. They also have a graphical view, so if you are not a programmer and you are just an administrator, you can easily track everything and see if everything is working or not. For notifications, it can connect with different messaging tools such as Slack and Teams, as well as with webhooks. It's very easy to use, and it has a lot of features that you would expect from any of the data orchestration tools.

What needs improvement?

Programmatically, it's very good, and it doesn't have any competitors, but you cannot develop anything in Airflow UI. You need to develop everything within the program. In the market, other tools have come up recently as competitors to Airflow, and they also give graphical programming options, whereas Airflow doesn't provide that feature currently. All the DAGs you want to build need to be coded in Python. It doesn't provide features for graphical programming. You cannot drag and drop something, build a pipeline out of that, or orchestrate that with a drag and drop. They have a graphical feature but only for administration purposes, not for development. They don't have a UI for development.

It doesn't support the Windows system. That's a big drawback because a lot of people are using Windows. 

For how long have I used the solution?

I used Apache Airflow on my previous project. We had planned to use it in our current project, but due to time issues, we were not able to deploy it. In my previous project, I used it for around eight or nine months.

What do I think about the stability of the solution?

It's a very stable product.

What do I think about the scalability of the solution?

It's highly scalable. You can scale it as much as you want. It depends on the size, and you need to scale up your instance. We had over 3,000 DAGs in our previous project, and we didn't face any issue with even 8 GB memory in our EC2 instance. If you have a lot of DAGs, you might need to scale up, but it's quite lightweight, so you don't need to worry much about that.

How are customer service and support?

It's open source. It was my first project, and I had a few doubts, but everything I needed was available on the internet, so I never had to contact their support. I might have been able to post my questions on their GitHub, but I didn't need that. Airflow has a very large community, so any questions you ask get answered there.

How was the initial setup?

Its setup wasn't done by us. It was done by the Astronomer team on Azure Community Services. So, it was deployed and set up on Azure Community Service. Everything was taken care of by the Astronomer team.

What about the implementation team?

Apache Airflow has two large and popular distributors. There might be others, but the two popular ones are Bitnami and Astronomer. For us, everything was set up by Astronomer.

What's my experience with pricing, setup cost, and licensing?

It's open source. You can install it locally on your own system. If you are deploying it in the production system, you normally deploy it on some cloud, such as EC2 service, which would have some cost. If you are setting up a Docker container or something for Apache Airflow yourself, which is quite easy, you can do pretty much everything online. I have set it up on my local system, and it doesn't take a long time. You can do customization for your project such as selecting different repository databases or selecting different Celery or web services, which is good.

If you are going with a service provider such as Astronomer or Bitnami, they will charge you because they are a distributor of Airflow. They have some of their own features and their own support. They will charge you if you are going with them.

What other advice do I have?

If you are on a Mac or Linux system, it's very easy to install. You can just go to the Apache website to install it, and you can start working. However, Apache Airflow doesn't support installation through a Windows .exe, so if you are on Windows, some knowledge of Docker containers or WSL will be useful.

Other than that, Astronomer has an instructor called Marc Lamberti who is very popular in the Airflow community. He has YouTube videos. In five minutes, he can teach you how to set up Airflow or what DAGs are. He has five or six videos, and he gets into the details with his videos. So, if you have no idea about Apache Airflow and you don't want to go through all the documentation, you can start with those videos, but if you have a Mac or Linux system, you can directly install it on your system.

I'd rate it a seven out of ten because it doesn't support Windows, and it doesn't support graphical designing, so we cannot create DAGs in the UI. We can administer and look at DAGs through the UI, but we cannot create DAGs through the UI. Other orchestration tools that are available in the market provide that feature.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Nomena NY HOAVY - PeerSpot reviewer
Lead Data Scientist at MVola
Real User
An easy to implement and flexible solution
Pros and Cons
  • "The solution is flexible for all programming languages for all frameworks."
  • "Apache Airflow could be improved by integrating some versioning principles."

What is our primary use case?

Currently, I am a lead data scientist. Our primary use cases for Apache Airflow are for all orchestrations, from the basic big data lake to machine learning predictions. It is used for all the MLOps processes. It is also used for some ELT, to transform, load, and export all big data from restricted, unrestricted, and all phase processes.

What is most valuable?

The user experience of Apache Airflow is good. The solution is flexible for all programming languages for all frameworks. I also value that it is used for monitoring. Apache Airflow helps to easily integrate data sources with other products.

What needs improvement?

Apache Airflow could be improved by integrating some versioning principles. Currently, we have to swap some tags in our flow. It would be interesting if we can check the product and version all of the product at the same time comparing what scripts have changed from last year to this year, or last month to this month.

For example, we have a flow for one project, to version it we need to check it one by one to identify which tags changed and which scripts changed. All of these need to be done manually.
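The manual check described above, working out which scripts changed between two points in time, can be partly automated by hashing the files and comparing the results. The sketch below uses only the Python standard library. The `snapshot` and `diff_snapshots` helpers and the file names are hypothetical, and it only detects that a file changed, not what changed inside it. In practice, keeping the DAG folder in a version control system such as Git gives the same information with full history.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(dag_dir):
    """Map each Python file under dag_dir to the SHA-256 hash of its contents."""
    root = Path(dag_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.py"))
    }


def diff_snapshots(old, new):
    """Return which files were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(f for f in set(old) & set(new) if old[f] != new[f]),
    }


# Demo on a throwaway folder with hypothetical script names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train_model.py").write_text("print('v1')\n")
    (Path(tmp) / "export_data.py").write_text("print('export')\n")
    last_month = snapshot(tmp)

    # Simulate a month of edits: one script modified, one script added.
    (Path(tmp) / "train_model.py").write_text("print('v2')\n")
    (Path(tmp) / "score_model.py").write_text("print('score')\n")
    report = diff_snapshots(last_month, snapshot(tmp))

print(report)
```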

For how long have I used the solution?

I have been using Apache Airflow for four months.

What do I think about the stability of the solution?

We have experienced some bugs in Airflow. For example, the solution did not mention all the errors regarding why a process did not work. We had to investigate to try and understand why it was not working.

What do I think about the scalability of the solution?

The solution is easy to scale. We have four people in our organization that use Airflow. One is dedicated to the solution, while the others can use it to adjust the flow of their jobs on their own.

How are customer service and support?

We do not use technical support. We are trained to resolve concerns on our own. If a problem is significant, we could call support; however, there is a good developer community around Airflow that can help us resolve the issue.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Prior to using Airflow, I used Windows SSIS for three years. We made the switch because Windows SSIS uses the drag-and-drop concept, whereas Airflow requires coding. Also, Windows is oriented toward Microsoft products and is not very flexible.

How was the initial setup?

I am a technician, so the initial setup is instinctive. Without experience, it would not be as simple. Experience with configurations and parameters is required. The documentation is good; however, it does not mention some features explicitly, so some research is required.

I would rate the ease of implementation a three out of five.

What about the implementation team?

We have dedicated machine learning ops, so we manage all product deployment ourselves. The deployment takes about four days, including two days of administration. 

Apache Airflow requires maintenance. It is very important to maintain all the source codes and all the data. We are looking for a platform that would facilitate the maintenance of the project.

What's my experience with pricing, setup cost, and licensing?

We use a community edition of Apache Airflow. It is open-source and free. 

What other advice do I have?

Anyone considering Apache Airflow should make sure that they have a good team with experience, including some administration. A strong background will help to understand and exploit the strengths of the platform.

I would rate this solution a nine out of 10 overall.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user