We are currently migrating from on-prem to the cloud, and our on-prem tables are getting data from upstream. We used ADF to build a pipeline to facilitate this migration. A team of 15-20 people currently uses ADF, and more will join once it goes live.
Associate Specialist at Synechron
We can integrate our Databricks notebooks and schedule them
Pros and Cons
- "ADF is another ETL tool similar to Informatica that can transform data or copy it from on-prem to the cloud or vice versa. Once we have the data, we can apply various transformations to it and schedule our pipeline according to our business needs. ADF integrates with Databricks. We can call our Databricks notebooks and schedule them via ADF."
- "I rate Azure Data Factory six out of 10 for stability. ADF is stable now, but we had problems recently with indexing on an SQL database. It's slow when dealing with a huge volume of data. It depends on whether the database is configured as general purpose or hyperscale."
What is our primary use case?
What is most valuable?
ADF is another ETL tool similar to Informatica that can transform data or copy it from on-prem to the cloud or vice versa. Once we have the data, we can apply various transformations to it and schedule our pipeline according to our business needs. ADF integrates with Databricks. We can call our Databricks notebooks and schedule them via ADF.
For how long have I used the solution?
I have used Azure Data Factory for about six months.
What do I think about the stability of the solution?
I rate Azure Data Factory six out of 10 for stability. ADF is stable now, but we had problems recently with indexing on an SQL database. It's slow when dealing with a huge volume of data. It depends on whether the database is configured as general purpose or hyperscale.
Buyer's Guide
Azure Data Factory
December 2024
Learn what your peers think about Azure Data Factory. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
824,067 professionals have used our research since 2012.
How was the initial setup?
I rate Azure Data Factory eight out of 10 for ease of setup. The deployment time depends on the data volume. Four million records will take longer than four thousand. Migrating our full load from on-prem to the cloud took around 16-18 hours because the volume was 17 million.
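To put those figures in perspective, the sketch below works out the average throughput they imply. It is plain arithmetic on the numbers quoted above (17 million records, 16 to 18 hours); the function name is only illustrative, and real throughput would vary over the course of a run.

```python
# Back-of-the-envelope throughput from the figures quoted in the review:
# 17 million records migrated in roughly 16 to 18 hours.
RECORDS = 17_000_000


def records_per_second(records: int, hours: float) -> float:
    """Average sustained throughput over the whole run."""
    return records / (hours * 3600)


slow = records_per_second(RECORDS, 18)  # longer run, lower throughput
fast = records_per_second(RECORDS, 16)  # shorter run, higher throughput
print(f"{slow:.0f} to {fast:.0f} records/second")
# prints: 262 to 295 records/second
```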
What's my experience with pricing, setup cost, and licensing?
I rate ADF six out of 10 for affordability. The cost depends on the services we use. It's usage-based.
What other advice do I have?
I rate Azure Data Factory seven out of 10. Companies that want to migrate from on-prem to the cloud have lots of options. I haven't explored them all, but Azure, GCP, and AWS are essentially all the same.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Data Engineer at Shell
Helps to pull data from on-premises systems and supports large data volumes
Pros and Cons
- "The solution handles large volumes of data very well. One of its best features is its ability to integrate data end-to-end, from pulling data from the source to accessing Databricks. This makes it quite useful for our needs."
- "The main challenge with implementing Azure Data Factory is that it processes data in batches, not near real-time. To achieve near real-time processing, we need to schedule updates more frequently, which can be an issue. Its interface needs to be lighter."
What is our primary use case?
My main use case for Azure Data Factory is to pull data from on-premises systems. Most data transformation is done through Databricks, but Data Factory mainly pulls data into different services.
What is most valuable?
The solution handles large volumes of data very well. One of its best features is its ability to integrate data end-to-end, from pulling data from the source to accessing Databricks. This makes it quite useful for our needs.
What needs improvement?
The main challenge with implementing Azure Data Factory is that it processes data in batches, not near real-time. To achieve near real-time processing, we need to schedule updates more frequently, which can be an issue. Its interface needs to be lighter.
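Scheduling updates more frequently, as described above, is usually done with a schedule trigger. The following is a minimal sketch that builds such a trigger definition as a Python dictionary and prints it as JSON. The trigger name, pipeline name, start time, and 15-minute interval are all assumptions chosen for illustration, and the layout follows the general shape of a Data Factory schedule trigger definition, so check it against the current documentation before relying on it.

```python
import json

# Hypothetical names and values, chosen only for illustration.
TRIGGER_NAME = "Every15Minutes"
PIPELINE_NAME = "CopyOnPremToCloud"

trigger = {
    "name": TRIGGER_NAME,
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Minute",  # unit of the recurrence
                "interval": 15,         # run every 15 units (assumed value)
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "type": "PipelineReference",
                    "referenceName": PIPELINE_NAME,
                }
            }
        ],
    },
}

print(json.dumps(trigger, indent=2))
```

A shorter interval reduces latency but still runs batches, so it only approximates near real-time behavior.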
One specific issue is with parallel executions. When running parallel executions for multiple tables, I noticed a performance slowdown.
For how long have I used the solution?
I have been working with the product for five years.
What do I think about the stability of the solution?
We haven't faced any issues with the tool's stability.
What do I think about the scalability of the solution?
The solution can handle large datasets.
How are customer service and support?
I am satisfied with Microsoft's support. They provide solutions to our challenges.
How would you rate customer service and support?
Positive
What's my experience with pricing, setup cost, and licensing?
The solution is cheap.
What other advice do I have?
I rate the overall product an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Jul 30, 2024
Specialist Software Engineer at a financial services firm with 10,001+ employees
Faster than other solutions, has multiple connectors, and is easy to set up
Pros and Cons
- "One advantage of Azure Data Factory is that it's fast, unlike SSIS and other on-premise tools. It's also very convenient because it has multiple connectors. The availability of native connectors allows you to connect to several resources to analyze data streams."
- "There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation."
What is our primary use case?
I use Azure Data Factory for architecture creation, for example, loading data from Oracle DB to Azure Synapse Analytics, creating facts and dimensions using Azure Data Pipeline, and creating Azure Synapse notebooks for data transformation.
Another use case for Azure Data Factory is dashboard creation to help customers make informed decisions.
How has it helped my organization?
Compared to the on-premise SSIS, Azure Data Factory has better infrastructure. It also benefits my company because you can scale the solution up or down with different resources.
Azure Data Factory is also on a pay-as-you-go or pay-as-you-use model, which is suitable for the company because my company only pays for its usage or requirement.
The solution is also very user-friendly, and the Azure Data Factory support team responds quickly whenever my team has a loading issue.
What is most valuable?
One advantage of Azure Data Factory is that it's fast, unlike SSIS and other on-premise tools.
It's also very convenient because Azure Data Factory has multiple connectors. It has sixty connectors which you can't find in SSIS. The availability of native connectors allows you to connect to several resources to analyze data streams.
I also like that you can set up your own VM and infrastructure on Azure Data Factory without any help from the IT team because it only requires a single click.
What needs improvement?
What's missing in Azure Data Factory is an Oracle connector. If you want to connect directly to the Oracle database, you must copy and transform the data. There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation.
Sending out emails after a job is completed is another area for improvement in the tool.
For how long have I used the solution?
I've been using Azure Data Factory for three years.
What do I think about the scalability of the solution?
Azure Data Factory is a scalable tool.
Which solution did I use previously and why did I switch?
We used SSIS, but its on-premise version is slower than Azure Data Factory, and Azure Data Factory, infrastructure-wise, is better, so we went with Azure Data Factory.
How was the initial setup?
The initial setup for Azure Data Factory is an eight out of ten.
Manually deploying Azure Data Factory is easy and doesn't take much time, but I'm not sure how long it takes for an automated approach to deployment.
What's my experience with pricing, setup cost, and licensing?
The licensing model for Azure Data Factory is good because you won't have to overpay. Pricing-wise, the solution is a five out of ten. It was not expensive, and it was not cheap. It's in the middle.
What other advice do I have?
I have experience with both Azure Data Factory and SSIS.
I'm using the latest version of Azure Data Factory.
My rating for Azure Data Factory is eight out of ten.
My company is an Azure Data Factory user.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Project Lead at a manufacturing company with 10,001+ employees
It lets you create ETL pipelines, and it comes with a good dashboard and many connectors
Pros and Cons
- "What I like best about Azure Data Factory is that it allows you to create pipelines, specifically ETL pipelines. I also like that Azure Data Factory has connectors and solves most of my company's problems."
- "One area for improvement in Azure Data Factory is its speed. Parallelization also needs improvement."
What is our primary use case?
I can't go into specifics about the use case for Azure Data Factory, but it's for analytics related to an assessment.
What is most valuable?
What I like best about Azure Data Factory is that it allows you to create pipelines, specifically ETL pipelines.
I also like that Azure Data Factory has connectors and solves most of my company's problems. I can't recall a case where I couldn't use the solution for solving problems.
I'm also happy about the Azure Data Factory dashboard.
What needs improvement?
One area for improvement in Azure Data Factory is its speed. Parallelization also needs improvement. As for the rest of the features of Azure Data Factory, I'm happy.
I can't suggest additional features I'd like to see in Azure Data Factory because some features aren't available to us internally. They undergo a security evaluation first, and my organization controls which ones become available to users.
For how long have I used the solution?
I've been using Azure Data Factory for the last two years.
What do I think about the stability of the solution?
We're happy with the stability of Azure Data Factory.
What do I think about the scalability of the solution?
Azure Data Factory is scalable, with clusters available on demand. There isn't any issue with scaling the solution.
How are customer service and support?
We have an internal support team and the Azure Data Factory support team. We raise tickets and follow up on those tickets, and on a scale of one to five, we'd rate support as four because sometimes there are delays. Otherwise, we are satisfied with Azure Data Factory support.
How was the initial setup?
My company didn't set up Azure Data Factory as the Azure team did it.
What about the implementation team?
We outsourced the implementation of Azure Data Factory directly to the Azure team.
What's my experience with pricing, setup cost, and licensing?
I have no idea how much Azure Data Factory costs.
Which other solutions did I evaluate?
We're using AWS apart from Azure Data Factory. We're trying out Palantir Foundry as well. They are the leading service providers in the data analytics and ETL world.
What other advice do I have?
I'm familiar with Palantir Foundry, but my company just recently got the Palantir Foundry license, so I'm still not using it, but checking it for shortcomings.
I have experience with Azure Data Factory, too.
I'm unsure of the exact version of Azure Data Factory, but I'm using the latest version or whatever's available on Azure.
I only have a rough figure for the number of Azure Data Factory users, but it's somewhere between one thousand and one thousand five hundred people.
I'd tell people who want to use Azure Data Factory that Microsoft offers excellent courses through its Enterprise Skills Initiative (ESI). You should register and take the courses. Azure Data Factory is a solution I'd recommend to others.
I'd rate Azure Data Factory as nine out of ten because it has a lot of connectors, even custom connectors, for data onboarding. It can also integrate with Spark notebooks and allows my organization to parallelize code. Azure Data Factory also has provisions for Spark and SQL scripts or any scripts, plus the infrastructure is highly scalable, so it's a nine for me.
My organization is a customer of Azure Data Factory.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Integration Solutions Lead | Digital Core Transformation Service Line at Hexaware Technologies Limited
Helps to pull records and parse them quickly, but the exception handling and logging mechanisms can be improved
Pros and Cons
- "We have found the bulk load feature very valuable."
- "When the record fails, it's tough to identify and log."
What is our primary use case?
Our primary use case for the solution is data integration and we deploy it only on Azure.
How has it helped my organization?
When we were integrating the Ports product with our internal data warehouse, we had to bring all the reports from the Ports system database into our internal data warehouse. However, we were not given access to the vendor's database. Instead, they dump files or otherwise provide them to you. In one case, they provided files. In another case, they provided APIs that you need to call multiple times in batches of thousands of records. Azure Data Factory works very well for pulling the records, parsing them quickly, and posting them to the database and data warehouse.
What is most valuable?
We have found the bulk load feature very valuable.
What needs improvement?
The only challenge with Azure Data Factory is its exception-handling mechanism. When the record fails, it's tough to identify and log.
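One generic way to make failed records easier to identify is to handle errors per record and keep the rejects alongside the reason they failed. The sketch below shows that pattern in plain Python. It is not an Azure Data Factory feature or API; `parse_record`, the field names, and the sample batch are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("load")


def parse_record(raw: dict) -> dict:
    """Hypothetical transform: needs an integer 'id' and a numeric 'amount'."""
    return {"id": int(raw["id"]), "amount": float(raw["amount"])}


def load_batch(records):
    """Parse each record independently so one bad row cannot hide the rest."""
    loaded, rejected = [], []
    for position, raw in enumerate(records):
        try:
            loaded.append(parse_record(raw))
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the position, the raw payload, and the reason for later review.
            rejected.append({"position": position, "raw": raw, "error": repr(exc)})
            log.warning("record %d rejected: %r", position, exc)
    return loaded, rejected


# Sample batch: the second row has a non-numeric id, the third has no id at all.
batch = [
    {"id": "1", "amount": "9.50"},
    {"id": "x", "amount": "3"},
    {"amount": "4"},
]
loaded, rejected = load_batch(batch)
print(len(loaded), "loaded;", len(rejected), "rejected")
```

Writing the `rejected` list to a separate table or file gives a simple audit trail of what failed and why.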
For how long have I used the solution?
We have been using the solution for a year and a half and are currently using the latest version.
What do I think about the scalability of the solution?
The solution is scalable and we intend to further increase its usage in the future.
How are customer service and support?
I rate customer service and support an eight out of ten.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We previously used different solutions.
How was the initial setup?
The initial setup is straightforward.
What about the implementation team?
The implementation was done in-house.
What's my experience with pricing, setup cost, and licensing?
I cannot comment on licensing costs because I was not involved.
What other advice do I have?
I rate the solution a six out of ten. The solution is good, but its exception handling and logging mechanisms can be improved. I advise users considering this solution to go for it, especially if their integrations are heavy on the data side.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Engineering Manager at an energy/utilities company with 10,001+ employees
A good and constantly improving solution but the Flowlets could be reconfigured
Pros and Cons
- "Azure Data Factory became more user-friendly when data-flows were introduced."
- "Azure Data Factory uses many resources and has issues with parallel workflows."
What is our primary use case?
We use this solution to ingest data from one of the source systems from SAP. From the SAP HANA view, we push data to our data pond and ingest it into our data warehouse.
How has it helped my organization?
Azure Data Factory didn't bring a lot of good when we were also using Alteryx. Alteryx is user-friendly, while Azure Data Factory uses many resources and has issues with parallel workflows. Alteryx helps you diagnose issues quicker than Azure Data Factory, which runs in the cloud and whose debugger needs a cold start.
Azure Data Factory has to wake up whenever you are trying to do testing, and it takes about four to five minutes. It's not always online to do a quick test. For example, if we want to test an Excel file to see if the formatting is correct or why the data-flow or pipeline is failing, we need to wait four to five minutes to get the cold start debugger to run. Compared to Alteryx, Azure Data Factory could be better. Nevertheless, we are using it because we have to.
What is most valuable?
Initially, when we started using it, we didn't like it because it was not mature enough and had no data-flows, so we used the traditional pipeline. After that, Azure Data Factory introduced the concept of data-flows, and it started to become more mature and look more like Alteryx. Azure Data Factory became more user-friendly when data-flows were introduced.
What needs improvement?
They introduced the concept of Flowlets, but it has bugs. Flowlets are a reusable component that allows you to create data-flows. We can configure a Flowlet as a reusable pipeline and plug it inside different data-flows, so we don't have to rewrite our code or visual transformation.
If we make any changes in our data-flow, it reverts all our changes to the original state of the Flowlet. It does not retain changes, and we must reconfigure the Flowlets repeatedly. We had these issues three months ago so things might have changed. It works fine whenever we plug it in and configure it in our data-flow, but if we make minor changes to it, the Flowlet needs to be reconfigured again and loses the configuration.
For how long have I used the solution?
We have used this solution for about a month and a half. It is a cloud-based tool, so there are no versions. It is all deployed on Azure Cloud.
What do I think about the stability of the solution?
Everything is computed inside SQL Server if we're working with pipelines, so we have to be very careful when designing our solution in Azure Data Factory. Alteryx spoiled us: we never cared how things looked in the back end because all the operations happened on the Alteryx server. In Azure Data Factory, they run on the capacity of our data warehouse. Azure Data Factory does not run the queries itself. It sends them directly to the SQL Server instance or data warehouse, so we have to be very careful about how we perform certain operations.
We need to know SQL and how to optimize our queries. Joining tables in Alteryx is pretty easy: we just add a Join tool. If we want to do the same with a stored procedure in Azure Data Factory, we have to be very careful about how we write our code. That is a challenge for our team because we had not been looking into how to optimize our SQL queries when firing queries from Azure Data Factory at the data warehouse.
In addition, the workflows were running very slowly, the performance was bad, and some queries were timing out because we have a threshold. So we faced many challenges and had to reeducate ourselves on SQL and query optimization.
What do I think about the scalability of the solution?
In regards to scaling, when Azure Databricks was introduced alongside Azure Data Factory, it worked similarly to Hadoop or Spark. It had Spark clusters in the back end that could scale as much as needed and speed up performance. So it is scalable, especially with Databricks, because a lot of data-related transformations can be performed.
On my team, there are approximately 20 people who work with Azure Data Factory.
How are customer service and support?
We do not have experience with customer service and support.
How was the initial setup?
It does not require any installation and is more like software as a service. You need to create an instance of Azure Data Factory in Azure and configure some of the connections to your databases. You can connect to your blob storage, and some authentication is necessary for Azure Data Factory.
The setup is straightforward. It doesn't take much time, and it's in the cloud. It requires a few clicks, and you can quickly set it up and grant access to the developer. Then the developer can go to the link and start developing within their browser.
We have a team that takes care of the cloud infrastructure, so we raise a ticket and request infrastructure, and they just execute it based on the naming convention with the project name.
What about the implementation team?
We have an entire team that takes care of the cloud infrastructure. So we raise a ticket when we need infrastructure, which is executed based on the naming convention for the project name.
What was our ROI?
The nature of our solution is not based on ROI because we are building solutions for other functions within the same organization. In addition, due to the large size of our organization and the services we provide, the ROI is not something we consistently track. It's something discussed with the management, so I can't comment on it.
What's my experience with pricing, setup cost, and licensing?
The cost is based on usage and the computing resources consumed. However, since Azure Data Factory pipelines connect with so many other Azure services, such as Azure Functions and Logic Apps, additional costs can be incurred by using those tools.
Which other solutions did I evaluate?
We did not evaluate other options because this solution was aligned with our current work environment.
What other advice do I have?
I rate the solution a seven out of ten. The solution is good and constantly improving, but Flowlets should be reworked so that they retain the changes we make. I advise users considering this solution to thoroughly understand what Azure Data Factory is and evaluate what's available in the market. They should also assess the nature of their use cases and the kind of products they will be building before choosing a solution.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Azure Architect/Informatica ETL Developer at Relativity
A helpful and responsive GUI, but there are a lot of tasks for which you need to write code
Pros and Cons
- "The most valuable feature is the ease with which you can create an ETL pipeline."
- "The support and the documentation can be improved."
What is our primary use case?
I use this primarily for ETL tasks.
What is most valuable?
The most valuable feature is the ease with which you can create an ETL pipeline.
The GUI is very helpful when it comes to creating pipelines. The user interface is also very fast.
The connection to Snowflake is easy. I can store data and transform it during the ETL process before sending it to Snowflake.
What needs improvement?
Azure Data Factory is a bit complicated compared to Informatica. There are a lot of connectors that are missing and there are a lot of instances where I need to create a server and install Integration Runtime.
The support and the documentation can be improved.
There are a lot of tasks that you need to write code for.
For how long have I used the solution?
I have been using Azure Data Factory for about six months.
Which solution did I use previously and why did I switch?
I have experience with Informatica and I find it easier to use. For example, there are a lot of connectors that are directly available. Also, Informatica is able to take incremental copies, but with Azure, you have to write code to do that.
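The kind of incremental-copy logic referred to above is often written as a high-watermark pattern: remember the latest modification timestamp already copied, and on each run copy only rows newer than that. The sketch below illustrates the idea with two in-memory SQLite databases standing in for the source and the target. The table names, column names, and sample rows are hypothetical, and this is a generic illustration rather than Azure Data Factory code.

```python
import sqlite3

# In-memory stand-ins for a source system and a target warehouse.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
ddl = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)"
src.execute(ddl)
dst.execute(ddl)
# The watermark remembers the newest modified_at value already copied.
dst.execute("CREATE TABLE watermark (table_name TEXT PRIMARY KEY, last_value TEXT)")
dst.execute("INSERT INTO watermark VALUES ('orders', '1900-01-01T00:00:00')")


def incremental_copy(src, dst):
    """Copy only rows modified after the stored watermark, then advance it."""
    (last,) = dst.execute(
        "SELECT last_value FROM watermark WHERE table_name = 'orders'"
    ).fetchone()
    rows = src.execute(
        "SELECT id, amount, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last,),
    ).fetchall()
    if rows:
        # Upsert so that re-modified rows overwrite their earlier copies.
        dst.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
        dst.execute(
            "UPDATE watermark SET last_value = ? WHERE table_name = 'orders'",
            (rows[-1][2],),
        )
        dst.commit()
    return len(rows)


src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T00:00:00"), (2, 20.0, "2024-01-02T00:00:00")],
)
first_run = incremental_copy(src, dst)   # copies both rows
second_run = incremental_copy(src, dst)  # nothing new to copy
src.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-03T00:00:00')")
third_run = incremental_copy(src, dst)   # copies only the new row
print(first_run, second_run, third_run)
```

ISO 8601 timestamps are stored as text here because they sort correctly as strings. A production version would also need to handle rows that share the same timestamp as the watermark.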
I have also worked with Matillion and Fivetran, and I feel that there are a lot of things that Azure can learn from these products. For example, with Fivetran there are very good connectors for copying data between other solutions. This is unlike Azure, where a lot of the time, I have to build my own logic.
How was the initial setup?
The initial setup is complex.
What other advice do I have?
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Solution Architect at a computer software company with 1,001-5,000 employees
Helps us to load data to warehouses and useful for ETL processes
Pros and Cons
- "The tool's most valuable features are its connectors. It has many out-of-the-box connectors. We use ADF for ETL processes. Our main use case involves integrating data from various databases, processing it, and loading it into the target database. ADF plays a crucial role in orchestrating these ETL workflows."
- "When working with AWS, we have noticed that the difference between ADF and AWS is that AWS is more customer-focused. They're more responsive compared to any other company. ADF is not as good as AWS, but it should be. If AWS is ten out of ten, ADF is around eight out of ten. I think AWS is easier to understand from the GUI perspective compared to ADF."
What is our primary use case?
We use the product for data warehouses. It helps us to load data to warehouses.
What is most valuable?
The tool's most valuable features are its connectors. It has many out-of-the-box connectors. We use ADF for ETL processes. Our main use case involves integrating data from various databases, processing it, and loading it into the target database. ADF plays a crucial role in orchestrating these ETL workflows.
The tool's visual interface is good. The ADF scheduling feature impacts data management by determining when jobs must be run and setting up dependencies. This capability eliminates the need to rely on enterprise data scheduling tools.
What needs improvement?
When working with AWS, we have noticed that the difference between ADF and AWS is that AWS is more customer-focused. They're more responsive compared to any other company. ADF is not as good as AWS, but it should be. If AWS is ten out of ten, ADF is around eight out of ten. I think AWS is easier to understand from the GUI perspective compared to ADF.
For how long have I used the solution?
I have been using the product for 6 months.
What do I think about the stability of the solution?
ADF is stable.
What do I think about the scalability of the solution?
I rate the tool's scalability an eight out of ten.
How was the initial setup?
The tool's deployment is easy. The deployment typically takes around two to three days to set up. However, the duration may vary depending on factors such as the number of integrated endpoints. In our company, the deployment team had three to four people. This team consisted of an IT engineer, a network engineer, and an ETL admin.
We still haven't required much maintenance since we're still in the development phase. However, as time progresses and we move into production, we'll better understand the maintenance requirements.
What's my experience with pricing, setup cost, and licensing?
ADF is cheaper compared to AWS.
What other advice do I have?
The tool has met our projects' growing data needs effectively so far. I rate it an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.