Lead of Monitoring Tech at an educational organization with 1,001-5,000 employees
A good tool for managing data pipelines
Pros and Cons
- "Since Apache Airflow works very well with Python, we can manage everything and create pipelines there."
- "Adding more automated components in Apache Airflow for basic things like exporting the data would be helpful."
What is our primary use case?
We use Apache Airflow to send our data to a third-party system.
What is most valuable?
We already work in Python. Since Apache Airflow works very well with Python, we can manage everything and create pipelines there.
What needs improvement?
Adding more automated components in Apache Airflow for basic things like exporting the data would be helpful. Apache Airflow is not that easy to use, but we have gotten used to it.
For how long have I used the solution?
I have been using Apache Airflow for three years.
Buyer's Guide
Apache Airflow
December 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
825,399 professionals have used our research since 2012.
What do I think about the stability of the solution?
Apache Airflow is a stable solution.
What do I think about the scalability of the solution?
Apache Airflow is not a scalable solution for our use cases. We have a very long list of use cases. Over 10 developers use Apache Airflow in our organization.
How are customer service and support?
Apache Airflow's technical support team is good and provides assistance almost 90% of the time.
How was the initial setup?
Apache Airflow's initial setup is easy. It's not that difficult, but it has a learning curve.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is a cheap solution.
What other advice do I have?
Depending on your use case, if you are looking for a quick solution to work on and know Python, you should go ahead with Apache Airflow.
Apache Airflow is a good enough tool for managing data pipelines. However, the solution falls short as you scale up and need higher performance. Apache Airflow has introduced the DAG connector for managing data pipelines.
Overall, I rate Apache Airflow an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Member Of Technical Staff, Engineering Operations at VMware
Flexible open-source solution
Pros and Cons
- "Apache Airflow's best feature is its flexibility."
- "Apache Airflow could be improved with the addition of more frameworks."
What is most valuable?
Apache Airflow's best feature is its flexibility.
What needs improvement?
Apache Airflow could be improved with the addition of more frameworks.
For how long have I used the solution?
I've been using Apache Airflow for four years.
What do I think about the stability of the solution?
Apache Airflow is stable.
What do I think about the scalability of the solution?
Apache Airflow is scalable.
How was the initial setup?
The initial setup was very easy.
What about the implementation team?
We used an in-house team.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is open-source and free of charge.
What other advice do I have?
I would rate Apache Airflow eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Project Manager at Siren Analytics
Very stable, easy to learn, and quite configurable
Pros and Cons
- "The solution is quite configurable so it is easy to code within a configuration kind of environment."
- "The dashboards could be enhanced."
What is our primary use case?
We use this solution to monitor BD tasks.
What is most valuable?
The solution is quite configurable so it is easy to code within a configuration kind of environment.
The ease of learning and using the solution is quite good. The learning curve is low so new users can learn in a short period of time in comparison to other products.
What needs improvement?
The following should be improved:
- Dashboards
- Security
- Airflow web UI
- Telemetry for logging, monitoring, and alerting purposes
- Documentation
For how long have I used the solution?
I have used the solution for six months.
What do I think about the stability of the solution?
The solution is 99% stable. We have a few glitches here and there but have been able to fix them.
What do I think about the scalability of the solution?
The solution is quite scalable. You can grow in terms of users and environment. You can grow to multi-server applications. You can use the solution on desktops, mobile, or other devices.
How are customer service and support?
We have an internal tech support team, so we have not needed support from the vendor.
How was the initial setup?
The setup is straightforward. The time for deployment depends on the environment and user base.
What about the implementation team?
We implement the solution in-house. We have one implementation with 60 users and another with 75 users.
We have a tech support team that consists of ten engineers who support implementations. They follow up on issues that might arise during the process automation or implementation of the workflow itself.
For example, our tech support team will resolve a workflow that gets stuck in the MDM workflow engine. The tech team has the knowledge base to resolve any of these issues.
What's my experience with pricing, setup cost, and licensing?
The solution is open source.
What other advice do I have?
I do not have exposure to use cases for large organizations with a huge user environment, so I cannot speak to the solution's effectiveness in these scenarios.
I rate the solution an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Technical Lead at a media company with 5,001-10,000 employees
Useful for scheduling purposes but should include no-code capabilities
Pros and Cons
- "It's stable."
- "I would like to see some no-code capabilities and drag and drop abilities in Airflow."
What is our primary use case?
I use this solution for scheduling purposes. We have our own Python framework to run jobs and handle extraction, transformation, and loading.
We have 20 people who are using Airflow. It's being used on a daily basis. We don't have any plans to increase usage because we have low data sets.
The solution is deployed in the cloud. The cloud provider is Azure.
What needs improvement?
Everything is in the Python framework now. I would like to see some no-code capabilities and drag and drop abilities in Airflow.
We're expecting a few more improvements in the log generator. Currently, it's very clumsy.
For how long have I used the solution?
I have used Apache Airflow for three years.
What do I think about the stability of the solution?
It's stable.
What do I think about the scalability of the solution?
It's scalable. So far, we haven't needed more scalability because it's totally controlled by administrators.
Which solution did I use previously and why did I switch?
The only difference between Apache Airflow and BPM software is the pricing.
How was the initial setup?
Setup is of medium difficulty. You need to have some prior knowledge and experience with Docker containers and AKS.
What's my experience with pricing, setup cost, and licensing?
It's open-source.
What other advice do I have?
I would rate this solution as seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Analytics Engineer at TalkDesk
A useful tool for data orchestration and collecting information
Pros and Cons
- "The solution's UI allows me to collect all the information and see the code lines."
- "I have some issues with the solution's communication."
What is our primary use case?
We use Apache Airflow for data orchestration.
What is most valuable?
Apache Airflow is a pretty useful tool for collecting information. Apache Airflow is a pretty easy solution that can be used with Python. The solution's UI allows me to collect all the information and see the code lines.
What needs improvement?
I have some issues with how the solution communicates data usage across DAGs. Several of our DAGs use the same database or dataset: sometimes we consume the same data in a different DAG and send it to a different place. In the UI, I want to be able to see when the same dataset is used more than once.
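The kind of overview asked for here can be approximated outside the UI. The following is only a minimal sketch in plain Python, not an Airflow feature: it assumes you maintain a mapping from DAG IDs to the datasets each one reads, and every name in it (`shared_datasets`, the DAG IDs, the dataset names) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shared_datasets(dag_inputs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return the datasets that are read by more than one DAG.

    dag_inputs maps a DAG ID to the dataset names that DAG consumes.
    The result maps each shared dataset to the sorted list of DAG IDs using it.
    """
    consumers = defaultdict(set)
    for dag_id, datasets in dag_inputs.items():
        for name in datasets:
            consumers[name].add(dag_id)
    # Keep only datasets with at least two distinct consumers.
    return {
        name: sorted(dags)
        for name, dags in sorted(consumers.items())
        if len(dags) > 1
    }


# Hypothetical inventory of which DAG reads which dataset.
example = {
    "export_to_crm": ["warehouse.orders", "warehouse.customers"],
    "export_to_partner": ["warehouse.orders"],
    "daily_report": ["warehouse.sessions"],
}
report = shared_datasets(example)
```

Here `report` lists `warehouse.orders` together with the two DAGs that consume it, which is the information the reviewer would like the UI to surface.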
For how long have I used the solution?
I have been using Apache Airflow for five years.
What do I think about the stability of the solution?
I rate Apache Airflow a seven out of ten for stability.
What do I think about the scalability of the solution?
I rate Apache Airflow an eight out of ten for scalability. Around 400 users are using the solution in our organization.
Which solution did I use previously and why did I switch?
I previously used Control-M and some AWS and Google Cloud Platform tools.
How was the initial setup?
Apache Airflow's initial setup is pretty straightforward. Apache Airflow is quite intuitive to set up and create DAGs.
What about the implementation team?
It takes around two days to deploy Apache Airflow. A DAG can be created in just a few hours.
What other advice do I have?
Apache Airflow is deployed in the cloud in our organization.
Overall, I rate Apache Airflow a nine out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees
Feature rich, open-source, and good for building data pipelines
Pros and Cons
- "I like the UI rework, it's much easier."
- "I would like to see it more friendly for other use cases."
What is our primary use case?
I'm a data engineer. In the past, I used Airflow for building data pipelines and to populate data warehouses. With my current company, it's a data product or datasets that we sell to biopharma companies.
We are using those pipelines to generate those datasets.
What is most valuable?
I like the UI rework; it's much easier to use.
I use XCom for derived variables that need to pass between tasks. I don't really tend to use it for passing data, but only for a derived variable. For example, this way I don't have to re-query something every time a task uses it. I use the JSON conf for overriding certain parameters.
In our use cases, some of the inputs of the dataset are files that we pull from S3. Sometimes those files need to be redone, but we don't need to change any logic; we just need to redo the builds. Rather than redeploying the code to point to a new S3 location, we override the parameter to point to a different S3 key.
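The override pattern described above can be sketched in plain Python. In Airflow, the JSON supplied when a run is triggered is exposed to tasks as `dag_run.conf`; the helper below only shows the lookup logic a task might apply to that dictionary. The function name, the `input_key` field, and the default key are all assumptions made for illustration, not the reviewer's actual code.

```python
from typing import Any, Mapping, Optional

# Hypothetical default; the review does not give real bucket or key names.
DEFAULT_INPUT_KEY = "inputs/current/source.csv"


def resolve_input_key(
    run_conf: Optional[Mapping[str, Any]],
    default_key: str = DEFAULT_INPUT_KEY,
) -> str:
    """Return the S3 key a run should read, honoring a per-run override.

    run_conf stands in for the dictionary Airflow exposes as dag_run.conf.
    It may be None or empty when the run was triggered without any JSON.
    """
    override = (run_conf or {}).get("input_key")  # "input_key" is an assumed name
    if override is None:
        return default_key
    if not isinstance(override, str) or not override.strip():
        raise ValueError("input_key override must be a non-empty string")
    return override.strip()


# A scheduled run with no conf reads the default key.
scheduled_key = resolve_input_key(None)
# A manually triggered rebuild points at the redone file without a redeploy.
rebuild_key = resolve_input_key({"input_key": "inputs/redo/source.csv"})
```

Keeping the default in code and the override in the trigger payload means a rebuild needs no deployment, which is the benefit the reviewer describes.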
I have read that there are many different workflow pipelining tools in the biotech space, such as Snakemake and Nextflow.
There is also a CWL plugin that we may look into at some point.
Eventually, we might have a use case where a researcher has a pipeline they run locally, and then we want to convert that to a DAG.
The CWL-Airflow plugin would be useful for that. This might be something to look into later. But that would be like months, or maybe a year from now.
What needs improvement?
I am using a Celery Executor and I find that it crashes and I can't see any logs. I can only assume that it's a memory issue and have to blindly restart until eventually, it starts up again.
One of the use cases is triggered by input rather than a batch process. For example, when we receive a batch of data, it goes through tasks one, two, and three. When a new batch comes in, each subsequent task should operate only on that batch's data from the prior task.
I am used to a pattern where the output gets written to a table and the next task selects everything from that upstream table. It could be coded so that you only write the data for that portion of the task. It would help if the solution could handle state machines and state changes as opposed to only batch processing.
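One way to scope each task to a single incoming batch is to tag rows with a batch identifier and filter on it downstream, instead of selecting everything from the upstream table. The sketch below uses an in-memory SQLite database purely for illustration; the table names, the `batch_id` column, and the doubling transform are assumptions. In Airflow, the batch identifier could plausibly come from the run ID or the trigger conf, but that wiring is not shown here.

```python
import sqlite3
from typing import Iterable


def make_connection() -> sqlite3.Connection:
    """Create an in-memory database with one table per pipeline stage."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stage_one (batch_id TEXT, value INTEGER)")
    conn.execute("CREATE TABLE stage_two (batch_id TEXT, value INTEGER)")
    return conn


def load_batch(conn: sqlite3.Connection, batch_id: str, values: Iterable[int]) -> None:
    """Task one: write incoming values, tagged with their batch."""
    conn.executemany(
        "INSERT INTO stage_one (batch_id, value) VALUES (?, ?)",
        [(batch_id, v) for v in values],
    )


def transform_batch(conn: sqlite3.Connection, batch_id: str) -> int:
    """Task two: read only this batch's rows, not the whole upstream table."""
    rows = conn.execute(
        "SELECT value FROM stage_one WHERE batch_id = ? ORDER BY rowid",
        (batch_id,),
    ).fetchall()
    doubled = [(batch_id, value * 2) for (value,) in rows]
    conn.executemany(
        "INSERT INTO stage_two (batch_id, value) VALUES (?, ?)", doubled
    )
    return len(doubled)


conn = make_connection()
load_batch(conn, "b1", [1, 2])
first_count = transform_batch(conn, "b1")
load_batch(conn, "b2", [10])
# The second run touches only batch b2, even though b1 rows are still upstream.
second_count = transform_batch(conn, "b2")
result = conn.execute(
    "SELECT batch_id, value FROM stage_two ORDER BY rowid"
).fetchall()
```

Because each call filters on `batch_id`, rows from earlier batches are never reprocessed when a new batch arrives.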
I would like to see it more friendly for other use cases.
For how long have I used the solution?
In my current company, I just introduced it within the last couple of months. But I've used it at my prior two jobs as well.
We are using Version 2.0.1.
What's my experience with pricing, setup cost, and licensing?
We are using the open-source version of Apache Airflow.
What other advice do I have?
I usually create my own custom operators every time. We upgraded to 2.0, but I am not using any of the new features.
I haven't used DAG of DAGs or the new way of using Python functions in the Python operator yet. But we might use DAG of DAGs eventually.
I love this solution and I would rate it a nine out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Data Analytics at a media company with 1,001-5,000 employees
A customizable solution, but the integration process could be simplified
Pros and Cons
- "The best feature is the customization."
- "The solution could be improved by simplifying the integration process."
What is our primary use case?
Our primary use case for this solution is scheduling tasks. We capture the data from the SQL Server location and migrate it to the central data warehouse.
What is most valuable?
The best feature is the customization that can be done using Python. For example, there are use cases where we have to tweak the algorithm, and with Apache Airflow we have extra functionality that helps to change the underlying process. We can define our algorithms and processes using Python.
What needs improvement?
The solution could be improved by simplifying the integration process and providing access to its support team to guide integration.
For how long have I used the solution?
We have been using this solution for two months and it is deployed on-premises.
What do I think about the stability of the solution?
The solution is stable but primarily depends on the support team and how they manage it.
What do I think about the scalability of the solution?
Apache Airflow is scalable. Approximately 20 people use this solution on my team.
How are customer service and support?
We haven't had any experience with customer service and support.
Which solution did I use previously and why did I switch?
Previously, we were using SQL Server integration tools and SQL Server Integration Services (SSIS) packages. We had project orders and wanted to migrate everything, as Apache Airflow is open source and no license is required. We switched to Apache Airflow because we are trying to migrate all the projects developed in SSIS to Python.
How was the initial setup?
The initial setup was straightforward. Once a script is written, it takes four to five minutes to set up.
What's my experience with pricing, setup cost, and licensing?
Apache Airflow is open source, so I cannot comment on licensing costs.
Which other solutions did I evaluate?
We chose this solution because it was suitable for our business needs.
What other advice do I have?
I rate this solution a seven out of ten. My advice to new users is to have good proficiency with the Python language. The solution is good but can be improved by simplifying its integration process.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees
Integrates well with other pipelines and builds different processes well but the scalability needs improvement
Pros and Cons
- "The product integrates well with other pipelines and solutions."
- "The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not."
What is our primary use case?
We normally use the solution for creating a specific flow for data transformation. We have several pipelines that we use and due to the fact that they're pretty well-defined, we use it in conjunction with other tools that do the mediation portion. With Airflow, we do the processing of such data.
What is most valuable?
The product integrates well with other pipelines and solutions.
The ease of building different processes is very valuable to us. The difference between Kafka and Airflow is that Airflow is better for dealing with the specific flows where we want to do some transformation. It's very easy to create flows.
What needs improvement?
The graphics in the past have not been ideal.
We have several areas where we feel they could improve in terms of being a little bit more flexible. One is implementation. Even though we customized it, there were some specific things we had to do with the image by itself.
The management integration was challenging as well. It requires a lot of work on our end. We were creating our own way to integrate things specifically with specific tools. There's not really an ease of management out-of-the-box option for integration. We needed to become a little bit creative to solve that ourselves.
The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale; however, it's not.
There is no SDC versioning. There's no version control for pipelines. We have to build several pipelines for several flows, yet there's no version control for generating them.
There's no Python SDK. We need to generate our own scripts and upload them and put them there. However, there's not a realistic case that we can get connected to them. On top of that, the API sets that are provided are very limited. They are not as rich as others. You cannot do much with them.
For how long have I used the solution?
I've been using the solution for maybe three years at this point. It hasn't been too long.
What do I think about the stability of the solution?
The solution is largely stable. Obviously when you start creating more use cases, then you realize the limitations, however, it's not really, really bad.
What do I think about the scalability of the solution?
Due to the fact that the solution is on the cloud, we thought it would be fairly easy to scale. This is proving not to be the case and scalability is limited.
The challenging part is to make it really flexible in a cloud-native environment. With other applications, what you have there is the scalability that can be sensitive to your needs, based on the amount of data you are putting into the flow.
Instead of you having to create your own logic to scale it up, it should be a little more efficient on how it gets integrated into the whole environment. You have to get a little bit creative and put some commands and some logic in there and be monitoring everything. You build everything - versus other options that are more out of the box. With other solutions, if you have these bursts of data they ultimately can scale up and they are more native.
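The "own logic" for scaling that the reviewer mentions might look something like the following. This is a minimal sketch under stated assumptions, not part of Airflow: it takes a queue depth that you would have to collect yourself and returns a worker count clamped between a floor and a ceiling. The function name, parameters, and the numbers in the example are all hypothetical.

```python
def desired_workers(
    queued_tasks: int,
    tasks_per_worker: int,
    min_workers: int,
    max_workers: int,
) -> int:
    """Return how many workers to run for the current backlog.

    queued_tasks: tasks waiting to run (assumed to come from your own monitoring).
    tasks_per_worker: how many tasks one worker handles concurrently.
    The result is always within [min_workers, max_workers].
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    # Ceiling division without floats: enough workers to cover the backlog.
    needed = -(-max(queued_tasks, 0) // tasks_per_worker)
    return max(min_workers, min(needed, max_workers))


# A burst of 45 queued tasks at 16 tasks per worker needs three workers.
burst = desired_workers(queued_tasks=45, tasks_per_worker=16, min_workers=1, max_workers=10)
```

A monitoring loop would call this periodically and pass the result to whatever mechanism actually resizes the worker pool, which is the part the reviewer says had to be built by hand.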
How are customer service and technical support?
Technical support has been pretty good. We don't really have anything to complain about. We're satisfied with the service so far.
Which solution did I use previously and why did I switch?
For this particular category, we tested all the other tools, and they were too much for what we needed. We have also used other products in other projects, and nothing really worked for us. Airflow was a bit different, and we decided that it was a nice player and a good open-source tool.
We do use other tools. However, this one seems to work quite well for us.
How was the initial setup?
The initial setup isn't as straightforward as we hoped. It's not as flexible as other options. You need to be a bit creative during the process.
What's my experience with pricing, setup cost, and licensing?
This product is open-source.
What other advice do I have?
We're just customers and end-users. We don't have a special business relationship with Apache.
I'm not sure of which version of the solution we're using. It's likely the most up-to-date, or at the very most back two or three versions as we are not using any of the older versions.
I'd advise others considering the solution to first understand what exactly you're trying to achieve. Otherwise, you may end up selecting a workflow manager that is not cloud-native, or something that is way too big for what you are actually trying to achieve. Understand exactly what you need, the volumes you have to handle, and what exactly the use cases are.
After that, deployment depends on what exactly you are trying to do. If all of your solutions are cloud-native, try to do it with a cloud-native tool. Specifically, go to the CMCS site and look into the solutions listed there. Those have at least been tested against the cloud-native solutions that exist.
Then, just make sure that the components you have will match and will be available to whatever you're trying to build. For example, the user management is something that is important for us and for this specific setup. Probably for some others, it's not going to be.
Take into consideration what the different connection points are, and make sure that they are either supported or that you can support the integration of such items. You need to have a proper developer who can help you build your connector or your API.
In general, I would rate the solution at a seven out of ten. If they fix the APIs and the price on LTK, I'd rate it closer to a nine.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.