If it is a large deployment, it is better to go with a managed approach where someone else manages it for us. If it is small, we can run it ourselves by spinning up Kubernetes clusters and deploying it in the cloud. I'd rate the solution nine out of ten.
I recommend Apache Airflow because it's open-source, but you must accept its limitations. However, I wouldn't recommend it to companies in the biomedical, chemistry, or oil and gas industries with large workflows and thousands of jobs. For example, genomic analysis at an American multinational pharmaceutical and biotechnology corporation involved workflows with around twenty thousand jobs, which Airflow can't handle. Special schedulers are needed for such cases, as even classical schedulers like Control-M and Automic aren't suitable. I rate the overall solution a seven out of ten.
The directed acyclic graph (DAG) functionality has enhanced our company's workflow management: it helps identify the source tasks that are running and the target tasks that follow them, and it makes clear how the data flows. In a DAG, our company can group tasks, which helps us see which group of tasks is running. As for my experience with Apache Airflow's UI for monitoring and managing workflows, the UI lets you add variables and check the status of running tasks or previous runs of a particular DAG. Our company can send notifications via Apache Airflow and check connection and configuration details. I recommend the tool to others who plan to use it since it is quite easy to use and its functionality is easy to implement for our use cases. I rate the tool a nine out of ten.
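For readers unfamiliar with the grouping the reviewer mentions, here is a minimal sketch of Airflow's TaskGroup feature, which renders a set of tasks as one collapsible node in the UI. All DAG, group, and task names are hypothetical, and the `schedule` argument assumes Airflow 2.4 or later (older 2.x uses `schedule_interval`):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.utils.task_group import TaskGroup

with DAG(
    dag_id="grouped_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; use schedule_interval on older 2.x
    catchup=False,
) as dag:
    start = EmptyOperator(task_id="start")

    # Related tasks grouped together; the UI shows "extract" as one node
    # that can be expanded to see which of its tasks is running.
    with TaskGroup(group_id="extract") as extract:
        EmptyOperator(task_id="pull_orders")
        EmptyOperator(task_id="pull_customers")

    load = EmptyOperator(task_id="load_warehouse")

    start >> extract >> load
```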
You use Apache Airflow to automate your data pipelines. When you have a data pipeline, such as a Spark job or any other job, and want to automate it, you don't have to trigger the job manually; you configure the DAGs accordingly. For instance, Airflow can initiate the job when the data becomes available. We don't need to keep the cluster running 24/7: we start the cluster using Airflow when we need to submit the job, and once the job is completed, we terminate the cluster. I recommend the solution. Overall, I rate the solution a nine out of ten.
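As a rough illustration of the pattern this reviewer describes (wait for data, start a cluster, submit, tear down), here is a minimal sketch. The FileSensor and BashOperator are real Airflow operators; the file path, the Spark job, and the `cluster-ctl` command are hypothetical stand-ins for whatever provisions your cluster:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="spark_on_demand",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Block until the input data lands (path is hypothetical).
    wait_for_data = FileSensor(
        task_id="wait_for_data",
        filepath="/data/incoming/batch.csv",
        poke_interval=300,
    )

    # "cluster-ctl" stands in for your cloud CLI; it is not a real tool.
    start_cluster = BashOperator(
        task_id="start_cluster",
        bash_command="cluster-ctl start spark-cluster",
    )

    submit_job = BashOperator(
        task_id="submit_job",
        bash_command="spark-submit --master yarn etl_job.py",
    )

    # trigger_rule="all_done" tears the cluster down even if the job fails.
    stop_cluster = BashOperator(
        task_id="stop_cluster",
        bash_command="cluster-ctl stop spark-cluster",
        trigger_rule="all_done",
    )

    wait_for_data >> start_cluster >> submit_job >> stop_cluster
```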
Apache Airflow is a better option for batch jobs. My advice depends on the tools people use and the jobs they schedule. Databricks has its own scheduler. If someone is using Databricks, a separate tool for scheduling would be useless. They can schedule the jobs through Databricks. Apache Airflow is a good option if someone is not using third-party tools to run the jobs. When we use APIs to get data or get data from RDBMS systems, we can use Apache Airflow. If we use third-party vendors, using the in-built scheduler is better than Apache Airflow. Overall, I rate the solution a nine out of ten.
The solution is deployed on the cloud in our organization. Before choosing Apache Airflow, users should try cloud-native services first. Overall, I rate the solution a seven out of ten.
If you have around two thousand pipelines to execute daily within an eight to nine-hour window, Apache Airflow proves to be an excellent solution. I would rate it nine out of ten.
I recommend Apache Airflow because it is still worthwhile and can be used with multiple systems and servers, Kubernetes systems, and dashboard systems. You can use it to get social media and other data, but it can be expensive. Overall, I rate the solution a nine out of ten.
Since I have been using Apache Airflow for six to seven years, I would confidently rate the solution a solid ten. We help customers re-design and implement projects using Apache Airflow, and approximately 90% of our work revolves around this powerful tool. So, I rate this product a perfect ten.
A new user has to be prepared to adopt a new paradigm and treat the data pipeline as code rather than drag and drop. An organization should have a dedicated or experienced person looking into this. Overall, I rate the solution an eight out of ten.
Lead of Monitoring Tech at an educational organization with 1,001-5,000 employees
Real User
Top 20
Jun 28, 2023
Depending on your use case, if you are looking for a quick solution to work on and know Python, you should go ahead with Apache Airflow. Apache Airflow is a good enough tool for managing data pipelines. However, the solution is not up to the mark as you scale up and push for higher performance. Apache Airflow has introduced the DAG connector for managing data pipelines. Overall, I rate Apache Airflow an eight out of ten.
I would recommend this solution for projects even though there have been glitches; once the solution becomes stable, it would be ideal for critical projects. I rate Apache Airflow a seven out of ten. It is great software for building data pipelines, but we had many glitches that caused problems in production.
I do not have exposure to use cases for large organizations with a huge user environment, so I cannot speak to the solution's effectiveness in these scenarios. I rate the solution an eight out of ten.
Senior Data Analytics at a media company with 1,001-5,000 employees
Real User
Aug 31, 2022
I rate this solution a seven out of ten. My advice to new users is to have good proficiency with the Python language. The solution is good but could be improved by simplifying its integration process.
Anyone considering Apache Airflow should make sure that they have a good team with experience, including some administration experience. A strong background will help them understand and exploit the strengths of the platform. I would rate this solution a nine out of ten overall.
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees
Real User
Mar 26, 2021
I usually create my own custom operators every time. We upgraded to 2.0, but I am not using any of the new features. I haven't used DAG of DAGs or the new way of using Python functions in the Python operator yet, but we might use DAG of DAGs eventually. I love this solution, and I would rate it a nine out of ten.
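For context on the custom operators this reviewer writes, here is a minimal sketch of the standard Airflow pattern: subclass BaseOperator and implement execute(). The class name, parameters, and archiving behavior are hypothetical; real logic would typically delegate to a hook:

```python
from airflow.models.baseoperator import BaseOperator


class ArchiveFileOperator(BaseOperator):
    """Illustrative only: records a file's move into an archive bucket."""

    def __init__(self, src_path: str, bucket: str, **kwargs):
        super().__init__(**kwargs)
        self.src_path = src_path
        self.bucket = bucket

    def execute(self, context):
        # Real logic would call a hook (e.g. S3Hook); logging stands in here.
        self.log.info("Archiving %s to bucket %s", self.src_path, self.bucket)
        # The return value is pushed to XCom for downstream tasks to read.
        return f"s3://{self.bucket}/{self.src_path}"
```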
I can recommend Apache Airflow, especially if there are serious data engineers on your team. If, on the other hand, you're looking to enable business users, then it's not suitable. I would rate Apache Airflow an eight out of ten.
Business Consultant - Digitalization, Business Development, BPM, Technology & Innovation at a consultancy with 51-200 employees
Real User
Jan 15, 2021
We are unsure which solution we will end up with; we are currently testing them. We are trying to get into new business types and new industries, and we are looking into how well the solutions can be used in production facilities. I rate Apache Airflow an eight out of ten.
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees
Real User
Dec 22, 2020
We're just customers and end-users; we don't have a special business relationship with Apache. I'm not sure which version of the solution we're using. It's likely the most up-to-date, or at most two or three versions back, as we are not using any of the older versions. I'd advise others considering the solution to first understand exactly what they're trying to achieve. Otherwise, you either select a non-cloud-native Apache workflow manager or something that is way too big for what you are actually trying to achieve. Understand exactly what you need, the volumes you need, and what the use cases are. After that, the deployment depends on what exactly you are trying to do. If all of your solutions are cloud-native, try to do it with a cloud-native tool. Specifically, go to the CMCS site and look into the solutions there; those have at least been tested against the cloud-native solutions that exist. Then make sure the components you have will match and will be available for whatever you're trying to build. For example, user management is important for us and for this specific setup; for some others, it probably won't be. Take into consideration what the different connection points are and make sure that they are either supported or that you can support integrating them. You need a proper developer who can help you build your connector or your API. In general, I would rate the solution at a seven out of ten. If they fix the APIs and the price on LTK, I'd rate it closer to a nine.
We have a team of four to five members who initially evaluated Airflow and wanted to implement it. We have customers onboarded on our legacy systems, and I cannot disrupt the service and bring everything into Airflow at once. I have to onboard Airflow seamlessly while protecting my current, ongoing business systems, so I'm trying to balance things here. We have only been able to onboard a couple of workflows. Eventually, we want to do it more fully, but there were a few challenges, as I mentioned: there is no pipeline to take information, which forces me to retain my state in a separate state repository. That would be the next big area where I would like to see improvement.
My advice would be to use this solution for simple tasks. You should have a Python expert on hand for features that are not available out of the box, as the built-in functionality is not enough. It could become a good option for enterprise workflow automation, competing with solutions like Control-M, within the next two to three years. We are happy with this solution but not fully satisfied, as it has both positive and negative aspects. I would rate this solution a seven out of ten.
It's well-suited for certain simple programs. Overall, I would rate the solution a seven out of ten.
Apache Airflow is deployed in the cloud in our organization. Overall, I rate Apache Airflow a nine out of ten.
I rate the solution an eight out of ten.
I would rate Apache Airflow eight out of ten.
This is a good product and I definitely recommend it. I would rate this solution a seven out of ten.