Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Real User
A useful solution to set up workflows and processes
Pros and Cons
  • "Designing processes and workflows is easier, and it assists in coordinating all of the different processes."
  • "The graphical user interface can be improved."

What is our primary use case?

Our primary use case for the solution is setting up workflows and processes, which apply almost everywhere because most industries are built on them. We've deployed it for all kinds of workflows within the organization.

What is most valuable?

The ability to easily set up and deploy workflows with Airflow is valuable. Additionally, designing processes and workflows is easier, and it assists in coordinating all of the different processes.

What needs improvement?

The solution can be improved by creating a tool that allows us to design workflows graphically instead of just writing scripts. In short, the graphical user interface can be improved.

For how long have I used the solution?

We have been using the solution for approximately one year and are currently using the latest version.

Buyer's Guide
Apache Airflow
December 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
825,399 professionals have used our research since 2012.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is scalable. Hundreds of thousands of people are using it.

How are customer service and support?

We have not had any issues that require customer service and support.

How was the initial setup?

The initial setup is of intermediate difficulty, and two people are required for deployment.

What was our ROI?

There is a significant return on investment because it's free, open source, and very useful.

What other advice do I have?

I rate the solution an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Global Data Architecture and Data Science Director at FH
Real User
Expert Moderator
Managing large-scale data pipelines and Python tasks has been made easy
Pros and Cons
  • "I found the following features very useful: DAG - Workload management and orchestration of tasks using."
  • "UI can be improved with additional user-friendly features for non-programmers and for fewer coding practitioner requirements."

We have been using Apache Airflow for the past 2 years for various use cases such as: 

  • Data Pipeline building and monitoring
  • Automation of data extraction processes and Intelligent Automation
  • Web Scraping at scale for financial services 

We manage large-scale data processing workloads using DAGs (Directed Acyclic Graphs), a core concept of Apache Airflow (commonly known as Airflow), which expedites error handling and logging. It has helped us manage complex workflows and orchestrate tasks efficiently.
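To make the DAG idea concrete, here is a minimal sketch in plain Python. It does not use Airflow's API. It relies only on the standard library's `graphlib` and shows tasks running in dependency order, with failures logged and downstream tasks skipped. The task names (`extract`, `transform`, `load`) and the `run_pipeline` helper are hypothetical and are not taken from the reviewer's setup.

```python
import logging
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_pipeline(tasks, deps):
    """Run callables in dependency order; skip tasks whose upstream did not succeed."""
    status = {}
    for name in TopologicalSorter(deps).static_order():
        # A task only runs when every upstream task finished successfully.
        if any(status.get(up) != "success" for up in deps.get(name, ())):
            status[name] = "skipped"
            log.warning("skipping %s: upstream did not succeed", name)
            continue
        try:
            tasks[name]()
            status[name] = "success"
            log.info("%s succeeded", name)
        except Exception:
            status[name] = "failed"
            log.exception("%s failed", name)
    return status


def failing_transform():
    # Hypothetical failure used to demonstrate error handling.
    raise ValueError("bad row")


# Each key is a task; each value is the set of tasks it depends on.
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": lambda: None, "transform": failing_transform, "load": lambda: None}

result = run_pipeline(tasks, deps)
print(result)
```

In this sketch the failure of `transform` is logged with its traceback and `load` is skipped rather than run against bad input, which is the kind of behavior an orchestrator provides for you.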

I found the following features very useful:

  • DAG - Workload management and orchestration of tasks using 
  • TaskFlow API - moving Python tasks has been made easy, and DAGs are cleaner using the @task decorator in Python
  • Connection and Hooks - interface to connect external systems

To implement the various useful functionalities of Airflow effectively, you need to be a very good Python programmer. The UI can be improved with additional user-friendly features for non-programmers, so that less coding expertise is required.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Solution Architect at EPAM Systems
Real User
Top 5 Leaderboard
Simple to automate using Python, but code does not cover all data warehousing tasks
Pros and Cons
  • "This is a simple tool to automate using Python."
  • "We need to develop our workflow description and notations because out of the box, Apache Airflow does not provide some features that are needed."

What is our primary use case?

The primary use case for this solution is to automate ETL processes for a data warehouse.

What is most valuable?

The most valuable feature is the UI for automation. One can monitor all ETL processes on a single screen. Complex workflows are shown as SVG images of DAGs.

This is a simple tool to automate using Python.

What needs improvement?

There are some drawbacks to this solution. The code does not cover all tasks in the data warehouse automation process. Currently, in production, we have a large installation with a complex workflow that includes hundreds of tasks. Most of them are dispatched by the existing engine, but not all.
For example, sometimes we need to create cycles in our workflow but we are not able to, because Airflow supports only Directed Acyclic Graphs (DAGs).
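The cycle limitation can be illustrated with a short plain-Python sketch. It uses the standard library's `graphlib` rather than Airflow itself. A dependency graph that contains a loop has no valid execution order, while the same loop unrolled into a fixed number of rounds does. The task names and the `unroll` helper are hypothetical, and unrolling is only one possible workaround, not something the reviewer describes using.

```python
from graphlib import CycleError, TopologicalSorter


def can_schedule(deps):
    """Return True if the dependency graph has a valid execution order."""
    try:
        list(TopologicalSorter(deps).static_order())
        return True
    except CycleError:
        return False


# A "fix and re-validate" loop expressed directly as a cycle:
# validate depends on fix, and fix depends on validate.
cyclic = {"load": set(), "validate": {"load", "fix"}, "fix": {"validate"}}


def unroll(max_rounds):
    """Unroll the loop into a fixed number of validate/fix rounds (acyclic)."""
    deps = {"load": set()}
    prev = "load"
    for i in range(1, max_rounds + 1):
        deps[f"validate_{i}"] = {prev}
        deps[f"fix_{i}"] = {f"validate_{i}"}
        prev = f"fix_{i}"
    return deps


print(can_schedule(cyclic))     # the cyclic graph cannot be ordered
print(can_schedule(unroll(3)))  # three unrolled rounds can
```

The trade-off in this sketch is that the number of rounds must be fixed in advance, which may not suit workflows that need genuinely open-ended loops.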

We need to develop our workflow description and notations because out of the box, Apache Airflow does not provide some features that are needed. It is our understanding that it is limited by design.

We will wait for the upcoming 2.0 version, as it is expected to be much more mature than the 1.8-1.10 versions. We believe that it will be better.

There should be some improvement made to the Doc Management features from within the UI. They should think about Outlook integration, which should be out of the box, and the object model should be expanded to support cyclic graphs inside the workflow.

For how long have I used the solution?

We have been using this solution for eighteen months.

What do I think about the stability of the solution?

This solution is not very stable. There are a number of configuration issues.

What do I think about the scalability of the solution?

This solution is scalable. We use this solution on a single node, but it is possible to have a cluster of workers.

It can be used for one or two thousand related tasks, but that should be done in a cluster configuration.

We don't use a cluster, rather we only use single nodes. It is sufficient for our tasks. Tasks are long and the parallelism is limited by the database engine, and not by the workflow engine. 

We would like to evaluate clusters in the future.

We are using the Cron Task scheduling feature of Apache Airflow. Users can configure Apache Airflow themselves. Up to ten users can configure it.

This is a part of the wage solution, and it is the initial point of the wage slot process. The wage solution has hundreds of users.

How are customer service and technical support?

We don't use any paid technical support, as it is an open-source solution. We have used Stack Overflow and other open information sources, but we know that some companies provide technical support. 

Having studied their solutions that are available on the internet, it is my understanding that we are at a pretty high level and could provide commercial support ourselves.

We don't use any support from commercial companies, but we were able to find some very useful recent solutions in the Apache Airflow GitHub repository, for example.

Which solution did I use previously and why did I switch?

Previously, we used Control-M for a short period. It was a solution used by our customers, and we needed to understand their difficulties and the results. 

For small to medium-scale tasks, Apache Airflow could be a substitute for Control-M.

How was the initial setup?

The deployment model we used was through a private cloud. It was a private installation on Google Cloud.

What about the implementation team?

In-house team.

What was our ROI?

It is being measured just now. Precise data is expected in 3 to 4 months. First conclusion: positive ROI.

What's my experience with pricing, setup cost, and licensing?

There are no costs associated with this solution. Apache Airflow is a free solution that can be downloaded and is ready for use at any moment.

Which other solutions did I evaluate?

Our tasks could be automated with plain Jenkins, but our customer wanted to implement them on Apache Airflow. This was a solution used by our customer.

Apache Airflow is mainstream and everyone wants to use it. Google provides Apache Airflow as part of the Cloud services.

What other advice do I have?

My advice would be to use this solution for simple tasks. 

You should have a Python expert for features that are not available out of the box, as the built-in functionality is not enough.

It could become a good solution for enterprise workflow automation, comparable to solutions like Control-M, within the next two to three years.

We are happy with this solution, but not fully satisfied, as it has both positive and negative aspects.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Engineering Manager - OTT Platform at Amagi
Vendor
Helps us maintain a clear separation of our functional logic from our operational logic
Pros and Cons
  • "The reason we went with Airflow is its DAG presentation, that shows the relationships among everything. It's more of a configuration-driven workflow."
  • "One specific feature that is missing from Airflow is that the steps of your workflow are not pipelined, meaning the stageless steps of any workflow. Not every workflow can be implemented within Airflow."

What is our primary use case?

We are a technology, media, and entertainment-technology company. We are using Apache Airflow for architecting our media workflows. We are using it for two major workflows.

We have had it set up for some time on our own cloud. Recently, we migrated the setup to AWS.

How has it helped my organization?

Airflow is our first choice because we wanted a clear separation of our functional logic from our operational logic. We don't want our microservices to have the cross-cutting responsibilities of our operational logic. Right now, our microservices are the core business' inner functional logic. The majority of our distribution, our decision making, and the majority of our workflow operational responsibilities have been added to Airflow.

What is most valuable?

The reason we went with Airflow is its DAG presentation, that shows the relationships among everything. It's more of a configuration-driven workflow. 

It's all Python, as well. The majority of the configuration is Python-friendly.

What needs improvement?

One specific feature that is missing from Airflow is that the steps of your workflow are not pipelined, meaning the steps of any workflow are stateless. Not every workflow can be implemented within Airflow. For example, Step 1 of my workflow will have output which I definitely want to be provided automatically as an input to my Step 2. At the workflow level, we want to have common state management where, across steps, we'll be able to reach the state information. Right now, we're using an external state repository to maintain the state.

If Airflow could come up with some kind of implementation where not every step of the pipeline is an independent step, that would be helpful. I would like it if a part of the output of your previous steps could be the input for your next step. That kind of pipeline is missing. When we consider other products like jBPM, Camunda, or Cadence, they have the concept of pipelining.
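The pipelining pattern described above can be sketched in plain Python, independent of any workflow engine. Each step receives the previous step's output plus a shared state dictionary, so no external state repository is needed within a single run. The step names (`probe`, `plan`, `summarize`), the `run_steps` helper, and the `demo-asset` identifier are all hypothetical. Airflow does provide XComs for passing small values between tasks, though whether they cover this reviewer's case is not stated in the review.

```python
from typing import Any, Callable

# A step takes the previous step's output and the shared workflow state.
Step = Callable[[Any, dict], Any]


def run_steps(steps: list[Step], initial: Any = None) -> tuple[Any, dict]:
    """Feed each step the previous output plus a shared state dict."""
    state: dict = {}
    value = initial
    for step in steps:
        value = step(value, state)
    return value, state


def probe(_, state):
    # Record workflow-level information that later steps can read.
    state["asset_id"] = "demo-asset"  # hypothetical identifier
    return {"duration_s": 120}


def plan(meta, state):
    # Uses the previous step's output directly as input.
    end = meta["duration_s"]
    return [(start, min(start + 60, end)) for start in range(0, end, 60)]


def summarize(segments, state):
    # Combines the previous output with the shared state.
    return f'{state["asset_id"]}: {len(segments)} segments'


result, state = run_steps([probe, plan, summarize])
print(result)
```

Keeping the state in an in-memory dictionary only works inside one process. A distributed orchestrator would need a serializable store, which is the role the reviewer's external state repository plays.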

I would also like to see support for more platforms, in terms of programming BPMs. Cadence supports Golang and Java. Legacy components can be from any platform, so if they could provide more client support, such as Java and Golang client libraries, that would be helpful. I want to be able to program it in Java.

For how long have I used the solution?

I have been using Apache Airflow for more than a year.

What do I think about the scalability of the solution?

It's definitely scalable.

We have been using Airflow for some time, but we are not heavily dependent on it. We only have a couple of use cases being executed by Airflow.

Because we have some data engineering problems, we have a good amount of analytics systems. We have a high volume of data that comes into our system, along with a lot of email, and we have to have an automated data pipeline. Given that, we have all these computing capabilities that are built of microservices. The beauty of it is its scalability. It has every step of your workflow, and it has scheduler capabilities. Every step of your workflow is delegated to one of your nodes. That is being scaled per your computing needs.

We are still evolving. Our business processes are not completely automatic. We're still in the process of identifying what all the automation cases are that we can bring under Airflow. We would like to leverage one common orchestrator or workflow BPM for our complete ecosystem. So we have some architects in our system who are happy with Airflow and others who would like to migrate to some other BPM like Cadence or Apache NiFi. There are a lot of orchestrators and we're just out of the gate. Airflow is still not being heavily used in our enterprise.

Which solution did I use previously and why did I switch?

This is the first workflow BPM tool that we are using in our platforms.

How was the initial setup?

There is comprehensive documentation for setting up a simple workflow and you just follow the documentation for setting things up. We're all engineers so we don't mind if the steps are lengthy, in terms of setting up the system. I'm quite okay with the documentation provided for getting your system up and running.

But I would appreciate it if they published a portal where we could see in what way other businesses, or other technology companies are solving their problems, with some case studies, using Airflow. It would help us to review their case studies. My biggest problem at the time when I was deciding whether Airflow fit our needs or not, was that I was looking for some case studies of technology companies that are already using the solution. With Camunda and jBPM, there is a good quantity of case studies available online.

Which other solutions did I evaluate?

There is no scarcity of BPMs. There are many products online: either open-source or community products or licensed products. There are many good BPMs. The reason that Airflow is in my system is that some of our workflows which we have onboarded are also on Python. Airflow complements that. But the first and foremost ability of any orchestrator should be to integrate with any underlying platform, be it a Java platform or a Python platform. That's the beauty of an orchestrator.

What other advice do I have?

We have a team of people, four to five team members, who initially evaluated Airflow and wanted to implement it.

We have customers onboarded on our legacy systems. I cannot disrupt the service and bring everything into Airflow. I have to onboard Airflow seamlessly, while I protect my current, ongoing business systems. So I'm trying to balance things here. We have only been able to onboard a couple of workflows. Eventually, we want to do it more fully, but there were a few challenges as I told you: There is no pipeline to take information, which is forcing me to retain my state in a separate state repository. That would be the next big area where I would like to see improvement.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Associate Director - Technologies at a tech services company with 51-200 employees
Real User
Quick and easy to set up, but the technical support needs to be improved
Pros and Cons
  • "The initial setup was straightforward and it does not take long to complete."
  • "Technical support is an area that needs improvement."

What is our primary use case?

Our primary use case is to integrate with SLAs.

What is most valuable?

The most valuable feature is the workflow.

What needs improvement?

Technical support is an area that needs improvement. The contact numbers should be readily available so that we can call to get support as required.

In the future, I would like to see a single-click installation.

For how long have I used the solution?

We have been working with Apache Airflow for approximately one month.

What do I think about the scalability of the solution?

In our company, we are doing a POC and there are only three users. We have also implemented it for clients.

We do plan to increase our usage and the POC that we are now working on is something that we will implement for other clients if it works.

How are customer service and technical support?

We are not satisfied with technical support. We rely on using Google to identify solutions for the problems we have.

Which solution did I use previously and why did I switch?

We did not use another similar solution prior to Airflow.

How was the initial setup?

The initial setup was straightforward and it does not take long to complete. The deployment took no more than an hour.

Which other solutions did I evaluate?

We evaluated Control-M and another similar product from IBM.

What other advice do I have?

This is a good product and I definitely recommend it.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1468407 - PeerSpot reviewer
Virksomhedskonsulent - Digitalisering, Forretningsudvikling, BPM, Teknologi & Innovation at a consultancy with 51-200 employees
Real User
Scalable, stable and simple installation
Pros and Cons
  • "We have been quite satisfied with the stability of the solution."
  • "The dashboard is connected into the BPM flow that could be improved."

What is our primary use case?

We mainly used the solution in banking, finance, and insurance. We are looking for some opportunities in production companies, but this is only at the very early stages.

What is most valuable?

I do not have specific feedback because it is too early in the review stage to comment.

What needs improvement?

The way the dashboard is connected to the BPM flow could be improved.

For how long have I used the solution?

I have been using the solution for half a year.

What do I think about the stability of the solution?

We have been quite satisfied with the stability of the solution.

What do I think about the scalability of the solution?

The scalability of the solution is good.

How are customer service and technical support?

We had no issue with technical support.

How was the initial setup?

The installation is straightforward.

What's my experience with pricing, setup cost, and licensing?

The pricing for the product is reasonable.

Which other solutions did I evaluate?

We are evaluating Camunda as well as this solution. We are investigating and trying to determine how suitable they are for production facilities. Additionally, we are looking at which types of processes each solution is actually suitable for.

What other advice do I have?


We are unsure which solution we will end up with; we are currently testing them. We are trying to get into new business types and new industries. We are looking into how well the solutions can be used in production facilities.

I rate Apache Airflow an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user