We use Azure Data Factory for data integration.
Solution Architect at Giant Eagle
Easy to use and can be used for data integration
Pros and Cons
- "The most valuable features of the solution are its ease of use and the readily available adapters for connecting with various sources."
- "Some known bugs and issues with Azure Data Factory could be rectified."
What is our primary use case?
What is most valuable?
The most valuable features of the solution are its ease of use and the readily available adapters for connecting with various sources.
What needs improvement?
Some known bugs and issues with Azure Data Factory could be rectified.
For how long have I used the solution?
I have been using Azure Data Factory for about two years.
Buyer's Guide
Azure Data Factory
November 2024
Learn what your peers think about Azure Data Factory. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
814,649 professionals have used our research since 2012.
What do I think about the stability of the solution?
I rate the solution an eight out of ten for stability.
What do I think about the scalability of the solution?
Azure Data Factory is a scalable solution. Sixteen people from the data analytics team use the solution in our organization.
I rate the solution an eight out of ten for scalability.
How was the initial setup?
On a scale from one to ten, where one is difficult and ten is easy, I rate the solution's initial setup a seven out of ten.
What about the implementation team?
A team of three people deployed Azure Data Factory in three to four days.
What's my experience with pricing, setup cost, and licensing?
The solution's pricing is competitive.
What other advice do I have?
We build data pipelines primarily for integration. A few of them are real-time data transfers, and a few are batch file transfers. They move data from various sources to our data warehouse. Azure Data Factory helps us build the data pipelines and adapters.
The solution has built-in features and a control center for us to monitor the status of the pipelines. The solution's email notification also helps us in monitoring. We didn't face any challenges in setting up the data pipelines. We know there are some controls, but governance is customized for the organization's requirements. We have our own policies.
Azure Data Factory is deployed on the cloud in our organization. I would recommend Azure Data Factory to other users.
Overall, I rate the solution a nine out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Mar 25, 2024
Associate Specialist at Synechron
We can integrate our Databricks notebooks and schedule them
Pros and Cons
- "ADF is another ETL tool similar to Informatica that can transform data or copy it from on-prem to the cloud or vice versa. Once we have the data, we can apply various transformations to it and schedule our pipeline according to our business needs. ADF integrates with Databricks. We can call our Databricks notebooks and schedule them via ADF."
- "I rate Azure Data Factory six out of 10 for stability. ADF is stable now, but we had problems recently with indexing on an SQL database. It's slow when dealing with a huge volume of data. It depends on whether the database is configured as general purpose or hyperscale."
What is our primary use case?
We are currently migrating from on-prem to the cloud, and our on-prem tables are getting data from upstream. We used ADF to build a pipeline to facilitate this migration. A team of 15-20 people currently uses ADF, and more will join once it goes live.
What is most valuable?
ADF is another ETL tool similar to Informatica that can transform data or copy it from on-prem to the cloud or vice versa. Once we have the data, we can apply various transformations to it and schedule our pipeline according to our business needs. ADF integrates with Databricks. We can call our Databricks notebooks and schedule them via ADF.
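As a rough illustration of calling a Databricks notebook from ADF on a schedule, the sketch below builds two JSON documents in Python: a pipeline with one Databricks Notebook activity and a daily schedule trigger that references it. The field names follow the ADF pipeline and trigger JSON format as generally documented, but they should be checked against the current documentation. The pipeline name, notebook path, linked service name, and start time are hypothetical placeholders, not details from this review.

```python
import json


def build_notebook_pipeline(pipeline_name, notebook_path, linked_service):
    """Return a pipeline definition containing one Databricks Notebook activity."""
    return {
        "name": pipeline_name,
        "properties": {
            "activities": [
                {
                    "name": "RunNotebook",  # hypothetical activity name
                    "type": "DatabricksNotebook",
                    "linkedServiceName": {
                        "referenceName": linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "typeProperties": {"notebookPath": notebook_path},
                }
            ]
        },
    }


def build_daily_trigger(trigger_name, pipeline_name, start_time):
    """Return a schedule trigger definition that runs the pipeline once a day."""
    return {
        "name": trigger_name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


# All names below are placeholders chosen for the example.
pipeline = build_notebook_pipeline(
    "NightlyTransform", "/Shared/transform_orders", "AzureDatabricksLS"
)
trigger = build_daily_trigger("NightlyTrigger", pipeline["name"], "2024-01-01T02:00:00Z")

print(json.dumps(pipeline, indent=2))
print(json.dumps(trigger, indent=2))
```

Keeping the trigger separate from the pipeline mirrors how the schedule can change without touching the notebook activity itself.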
For how long have I used the solution?
I have used Azure Data Factory for about six months.
What do I think about the stability of the solution?
I rate Azure Data Factory six out of 10 for stability. ADF is stable now, but we had problems recently with indexing on an SQL database. It's slow when dealing with a huge volume of data. It depends on whether the database is configured as general purpose or hyperscale.
How was the initial setup?
I rate Azure Data Factory eight out of 10 for ease of setup. The deployment time depends on the data volume. Four million records will take longer than four thousand. Migrating our full load from on-prem to the cloud took around 16-18 hours because the volume was 17 million.
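To put the figures above in perspective, here is a small back-of-the-envelope calculation of the average throughput implied by 17 million records in 16 to 18 hours. It assumes a constant copy rate, which is a simplification; real throughput depends on the source, network, and sink configuration.

```python
def rows_per_second(total_rows, hours):
    """Average throughput, assuming a constant copy rate over the whole run."""
    return total_rows / (hours * 3600)


TOTAL_ROWS = 17_000_000  # volume quoted in the review

fastest = rows_per_second(TOTAL_ROWS, 16)  # 16-hour run
slowest = rows_per_second(TOTAL_ROWS, 18)  # 18-hour run

print(f"{slowest:.0f} to {fastest:.0f} rows per second")  # prints "262 to 295 rows per second"
```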
What's my experience with pricing, setup cost, and licensing?
I rate ADF six out of 10 for affordability. The cost depends on the services we use. It's usage-based.
What other advice do I have?
I rate Azure Data Factory seven out of 10. Companies that want to migrate from on-prem to the cloud have lots of options. I haven't explored them all, but Azure, GCP, and AWS are essentially all the same.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Engineering Manager at an energy/utilities company with 10,001+ employees
A good and constantly improving solution but the Flowlets could be reconfigured
Pros and Cons
- "Azure Data Factory became more user-friendly when data-flows were introduced."
- "Azure Data Factory uses many resources and has issues with parallel workflows."
What is our primary use case?
We use this solution to ingest data from one of the source systems from SAP. From the SAP HANA view, we push data to our data pond and ingest it into our data warehouse.
How has it helped my organization?
Azure Data Factory did not add much value while we were also using Alteryx. Alteryx is user-friendly, while Azure Data Factory uses many resources and has issues with parallel workflows. Alteryx helps you diagnose issues quicker than Azure Data Factory, which runs on the cloud and whose debugger needs a cold start.
Azure Data Factory has to wake up whenever you are trying to do testing, and it takes about four to five minutes. It's not always online to do a quick test. For example, if we want to test an Excel file to see if the formatting is correct or why the data-flow or pipeline is failing, we need to wait four to five minutes to get the cold start debugger to run. Compared to Alteryx, Azure Data Factory could be better. Nevertheless, we are using it because we have to.
What is most valuable?
Initially, when we started using it, we didn't like it because it was not mature enough and had no data-flows, so we used the traditional pipeline. After that, Azure Data Factory introduced the concept of data-flows, and it started to become more mature and look more like Alteryx. Azure Data Factory became more user-friendly when data-flows were introduced.
What needs improvement?
They introduced the concept of Flowlets, but it has bugs. Flowlets are a reusable component that allows you to create data-flows. We can configure a Flowlet as a reusable pipeline and plug it inside different data-flows, so we don't have to rewrite our code or visual transformation.
If we make any changes in our data-flow, it reverts all our changes to the original state of the Flowlet. It does not retain changes, and we must reconfigure the Flowlets repeatedly. We had these issues three months ago so things might have changed. It works fine whenever we plug it in and configure it in our data-flow, but if we make minor changes to it, the Flowlet needs to be reconfigured again and loses the configuration.
For how long have I used the solution?
We have used this solution for about a month and a half. It is a cloud-based tool, so there are no versions. It is all deployed on Azure Cloud.
What do I think about the stability of the solution?
Everything is computed inside the SQL server if we're working with pipelines, so we have to be very careful when designing our solution in Azure Data Factory. Alteryx spoiled us: we never cared how things looked in the back end, because all the operations ran on the Alteryx server. In Azure Data Factory, they run on the capacity of our data warehouse. Azure Data Factory does not run the queries itself; it sends them directly to the instance in the SQL server or data warehouse. So we have to be very careful about how we perform certain operations.
We need to have knowledge of SQL and how to optimize our queries. Joining tables in Alteryx is pretty easy: we just drop in a Join tool. If we want to do the same with a stored procedure in Azure Data Factory, we have to be very careful about how we write our code. That is a challenge for our team because we had not been looking into how to optimize our SQL queries when firing queries from Azure Data Factory to the data warehouse.
In addition, the workflows were running very slow, the performance was bad, and some queries were getting timed out because we have a threshold. So we faced many challenges and had to reeducate ourselves on SQL and query optimization.
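Because the work runs inside the database, the way a query can be executed matters more than the pipeline that sends it. The sketch below is only an analogy: it uses an in-memory SQLite database as a stand-in (the reviewer's warehouse and its query planner will behave differently), and the table and index names are hypothetical. It shows how the same filter goes from a full table scan to an index lookup once a suitable index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

sql = "SELECT order_id, amount FROM orders WHERE customer_id = ?"

# Without an index the engine has to read every row of the table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filter column it can look up matching rows directly.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

Inspecting the execution plan of a slow query before changing the pipeline is usually the cheaper first step, whatever engine sits behind it.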
What do I think about the scalability of the solution?
In regard to scaling, when Azure Databricks was introduced alongside Azure Data Factory, it worked similarly to Hadoop or Spark, with Spark clusters in the back end that could scale out as much as needed and speed up performance. So it is scalable, especially with Databricks, because a lot of data-related transformations can be performed.
On my team, there are approximately 20 people who work with Azure Data Factory.
How are customer service and support?
We do not have experience with customer service and support.
How was the initial setup?
It does not require any installation and is more like software as a service. You need to create an instance of Azure Data Factory in Azure and configure some of the connections to your databases. You can connect to your blob storage, and some authentication is necessary for Azure Data Factory.
The setup is straightforward. It doesn't take much time, and it's on cloud. It requires a few clicks, and you can quickly set it up and grant access to the developer. Then the developer can go to the link and start developing within their browser.
We have a team that takes care of the cloud infrastructure, so we raise a ticket and request infrastructure, and they execute it based on the naming convention with the project name.
What about the implementation team?
We have an entire team that takes care of the cloud infrastructure. So we raise a ticket when we need infrastructure, which is executed based on the naming convention for the project name.
What was our ROI?
The nature of our solution is not based on ROI because we are building solutions for other functions within the same organization. In addition, due to the large size of our organization and the services we provide, the ROI is not something we consistently track. It's something discussed with the management, so I can't comment on it.
What's my experience with pricing, setup cost, and licensing?
The cost is based on usage and the computing resources consumed. However, since Azure Data Factory pipelines connect with so many other Azure services, such as Azure Functions and Logic Apps, additional costs can be incurred by using those tools.
Which other solutions did I evaluate?
We did not evaluate other options because this solution was aligned with our current work environment.
What other advice do I have?
I rate the solution a seven out of ten. The solution is good and constantly improving, but Flowlets should be reworked so that they retain the changes we make. I advise users considering this solution to thoroughly understand what Azure Data Factory is and evaluate what's available in the market. They should also assess the nature of their use cases and the kind of products they will be building before choosing a solution.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Data Engineer at Shell
Helps to pull data from on-premises systems and supports large data volumes
Pros and Cons
- "The solution handles large volumes of data very well. One of its best features is its ability to integrate data end-to-end, from pulling data from the source to accessing Databricks. This makes it quite useful for our needs."
- "The main challenge with implementing Azure Data Factory is that it processes data in batches, not near real-time. To achieve near real-time processing, we need to schedule updates more frequently, which can be an issue. Its interface needs to be lighter."
What is our primary use case?
My main use case for Azure Data Factory is to pull data from on-premises systems. Most data transformation is done through Databricks, but Data Factory mainly pulls data into different services.
What is most valuable?
The solution handles large volumes of data very well. One of its best features is its ability to integrate data end-to-end, from pulling data from the source to accessing Databricks. This makes it quite useful for our needs.
What needs improvement?
The main challenge with implementing Azure Data Factory is that it processes data in batches, not near real-time. To achieve near real-time processing, we need to schedule updates more frequently, which can be an issue. Its interface needs to be lighter.
One specific issue is with parallel executions. When running parallel executions for multiple tables, I noticed a performance slowdown.
For how long have I used the solution?
I have been working with the product for five years.
What do I think about the stability of the solution?
We haven't faced any issues with the tool's stability.
What do I think about the scalability of the solution?
The solution can handle large datasets.
How are customer service and support?
I am satisfied with Microsoft's support. They provide solutions to our challenges.
How would you rate customer service and support?
Positive
What's my experience with pricing, setup cost, and licensing?
The solution is cheap.
What other advice do I have?
I rate the overall product an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Jul 30, 2024
Specialist Software Engineer at a financial services firm with 10,001+ employees
Faster than other solutions, has multiple connectors, and is easy to set up
Pros and Cons
- "One advantage of Azure Data Factory is that it's fast, unlike SSIS and other on-premise tools. It's also very convenient because it has multiple connectors. The availability of native connectors allows you to connect to several resources to analyze data streams."
- "There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation."
What is our primary use case?
I use Azure Data Factory for architecture creation, for example, loading data from Oracle DB to Azure Synapse Analytics, creating facts and dimensions using Azure Data Pipeline, and creating Azure Synapse notebooks for data transformation.
Another use case for Azure Data Factory is dashboard creation to help customers make informed decisions.
How has it helped my organization?
Compared to the on-premise SSIS, Azure Data Factory has better infrastructure. It also benefits my company because you can scale the solution up or down with different resources.
Azure Data Factory is also on a pay-as-you-go or pay-as-you-use model, which is suitable for the company because my company only pays for its usage or requirement.
The solution is also very user-friendly, and the Azure Data Factory support team responds quickly whenever my team has a loading issue.
What is most valuable?
One advantage of Azure Data Factory is that it's fast, unlike SSIS and other on-premise tools.
It's also very convenient because Azure Data Factory has multiple connectors. It has sixty connectors which you can't find in SSIS. The availability of native connectors allows you to connect to several resources to analyze data streams.
I also like that you can set up your own VM and infrastructure on Azure Data Factory without any help from the IT team because it only requires a single click.
What needs improvement?
What's missing in Azure Data Factory is an Oracle connector. If you want to connect directly to the Oracle database, you must copy and transform the data. There's no Oracle connector if you want to do transformation using data flow activity, so Azure Data Factory needs more connectors for data flow transformation.
Sending out emails after a job is completed is another area for improvement in the tool.
For how long have I used the solution?
I've been using Azure Data Factory for three years.
What do I think about the scalability of the solution?
Azure Data Factory is a scalable tool.
Which solution did I use previously and why did I switch?
We used SSIS, but its on-premise version is slower than Azure Data Factory, and Azure Data Factory, infrastructure-wise, is better, so we went with Azure Data Factory.
How was the initial setup?
The initial setup for Azure Data Factory is an eight out of ten.
Manually deploying Azure Data Factory is easy and doesn't take much time, but I'm not sure how long it takes for an automated approach to deployment.
What's my experience with pricing, setup cost, and licensing?
The licensing model for Azure Data Factory is good because you won't have to overpay. Pricing-wise, the solution is a five out of ten. It was not expensive, and it was not cheap. It's in the middle.
What other advice do I have?
I have experience with both Azure Data Factory and SSIS.
I'm using the latest version of Azure Data Factory.
My rating for Azure Data Factory is eight out of ten.
My company is an Azure Data Factory user.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Analytics Specialist at GlaxoSmithKline
Quick delivery due to drag-and-drop interface
Pros and Cons
- "One of the most valuable features of Azure Data Factory is the drag-and-drop interface. This helps with workflow management because we can just drag any tables or data sources we need. Because of how easy it is to drag and drop, we can deliver things very quickly. It's more customizable through visual effect."
- "Data Factory could be improved by eliminating the need for a physical data area. We have to extract data using Data Factory, then create a staging database for it with Azure SQL, which is very, very expensive. Another improvement would be lowering the licensing cost."
What is our primary use case?
My primary use case of Azure Data Factory is supporting the data migration for advanced analytics projects.
What is most valuable?
One of the most valuable features of Azure Data Factory is the drag-and-drop interface. This helps with workflow management because we can just drag any tables or data sources we need. Because of how easy it is to drag and drop, we can deliver things very quickly. It's more customizable through visual effect.
What needs improvement?
Data Factory could be improved by eliminating the need for a physical data area. We have to extract data using Data Factory, then create a staging database for it with Azure SQL, which is very, very expensive. Another improvement would be lowering the licensing cost.
For how long have I used the solution?
I have been using this solution for the past year.
What do I think about the stability of the solution?
This solution is stable. We are using an Azure subscription, so there is no maintenance or direct updates, it's just always the latest version.
What do I think about the scalability of the solution?
This solution is automatically scalable, since it's in the cloud. At my company, there were more than one thousand people using this solution because we were a big, media-based company. If there are many user requests in the front end application and the system is not responding much or has slow performance, the system will automatically scale up the performance hardware requirements.
How are customer service and support?
I have contacted technical support. I have never faced an issue like that with Denodo. Fortunately, we got some kind of a tutorial PDF, which helps us to deploy everything quickly.
Which solution did I use previously and why did I switch?
Before working with Azure, I worked with Python. In the culture I was working in, there was no integration. We were using Pure Python scripting and Python data manipulation tools. For example, we used Python's pandas library, which we coded to transform and orchestrate the data, which is necessary for the endpoint. It was not at all a visual tool. It took more time than Denodo.
How was the initial setup?
There is no installation because it's on the cloud. You just log on to the cloud with your subscription credentials, then you can use Data Factory directly.
What about the implementation team?
I implemented through an in-house team.
What's my experience with pricing, setup cost, and licensing?
Data Factory is very expensive. We are using an Azure subscription, so Data Factory has no direct updates, it's just always the latest version. Compared to Denodo, Azure is very costly. Azure Framework has multiple services, not only Data Factory. So in the cloud-based solution, if you're selecting a particular service, like Data Factory, you need to pay for each request.
Which other solutions did I evaluate?
I also use Denodo. Data Factory is like a transformation layer, but we need an additional staging database or a data storage facility, which is very expensive compared to implementing Denodo. So we extracted the data using Data Factory, then created a staging database with Azure SQL, which cost a huge amount since it's a physical data area. In Denodo, we just implement a layer, which is all handled in Denodo, and not a physical storage mechanism. I prefer customizable data solutions because they improve performance, creativity, and are helpful for front end people.
In comparison to Data Factory's drag-and-drop interface, Denodo developers need to create all the unified views by coding, so we have to create SQL queries to execute. With Data Factory, you can quickly drag and drop data or tables, but in Denodo, it takes more time because you need to code and test and all that.
What other advice do I have?
I rate Data Factory an eight out of ten, mainly because you need a staging database. I recommend Azure to others, but it depends on the architecture. In Data Factory, there is no virtualization environment, no virtualization layer to help with integration and caching. Though Data Factory is there, Denodo is going further.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Azure Architect/Informatica ETL Developer at Relativity
A helpful and responsive GUI, but there are a lot of tasks for which you need to write code
Pros and Cons
- "The most valuable feature is the ease in which you can create an ETL pipeline."
- "The support and the documentation can be improved."
What is our primary use case?
I use this primarily for ETL tasks.
What is most valuable?
The most valuable feature is the ease with which you can create an ETL pipeline.
The GUI is very helpful when it comes to creating pipelines. The user interface is also very fast.
The connection to Snowflake is easy. I can store data and transform it during the ETL process before sending it to Snowflake.
What needs improvement?
Azure Data Factory is a bit complicated compared to Informatica. There are a lot of connectors that are missing and there are a lot of instances where I need to create a server and install Integration Runtime.
The support and the documentation can be improved.
There are a lot of tasks that you need to write code for.
For how long have I used the solution?
I have been using Azure Data Factory for about six months.
Which solution did I use previously and why did I switch?
I have experience with Informatica and I find it easier to use. For example, there are a lot of connectors that are directly available. Also, Informatica is able to take incremental copies, but with Azure, you have to write code to do that.
I have also worked with Matillion and Fivetran, and I feel that there are a lot of things that Azure can learn from these products. For example, with Fivetran there are very good connectors for copying data between other solutions. This is unlike Azure, where a lot of the time, I have to build my own logic.
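The hand-written incremental copy logic mentioned above is often built around a high-watermark column. The sketch below illustrates that general pattern and is not the reviewer's implementation or an ADF API: it uses two in-memory SQLite databases as stand-ins for source and target, and the table and column names (`items`, `modified_at`) are hypothetical.

```python
import sqlite3


def incremental_copy(source, target, watermark):
    """Copy rows modified after `watermark` and return the new watermark."""
    rows = source.execute(
        "SELECT id, payload, modified_at FROM items "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    # Upsert so that rows updated at the source overwrite the older target copy.
    target.executemany(
        "INSERT OR REPLACE INTO items (id, payload, modified_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    # Rows are ordered by modified_at, so the last one carries the newest timestamp.
    return rows[-1][2] if rows else watermark


DDL = "CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT, modified_at TEXT)"
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(DDL)
target.execute(DDL)

source.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01T00:00:00"), (2, "b", "2024-01-02T00:00:00")],
)

# First run: an empty watermark sorts before every timestamp, so all rows are copied.
watermark = incremental_copy(source, target, "")

# Later changes at the source: one new row and one updated row.
source.execute("INSERT INTO items VALUES (3, 'c', '2024-01-03T00:00:00')")
source.execute(
    "UPDATE items SET payload = 'a2', modified_at = '2024-01-04T00:00:00' WHERE id = 1"
)

# Second run: only the two changed rows are read and written.
watermark = incremental_copy(source, target, watermark)
```

The strict `>` comparison can miss rows that share the exact watermark timestamp, so a real pipeline would need a tie-breaking rule, and it would also need to persist the watermark between runs.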
How was the initial setup?
The initial setup is complex.
What other advice do I have?
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Solution Architect at a computer software company with 1,001-5,000 employees
Helps us to load data to warehouses and useful for ETL processes
Pros and Cons
- "The tool's most valuable features are its connectors. It has many out-of-the-box connectors. We use ADF for ETL processes. Our main use case involves integrating data from various databases, processing it, and loading it into the target database. ADF plays a crucial role in orchestrating these ETL workflows."
- "When working with AWS, we have noticed that the difference between ADF and AWS is that AWS is more customer-focused. They're more responsive compared to any other company. ADF is not as good as AWS, but it should be. If AWS is ten out of ten, ADF is around eight out of ten. I think AWS is easier to understand from the GUI perspective compared to ADF."
What is our primary use case?
We use the product for data warehouses. It helps us to load data to warehouses.
What is most valuable?
The tool's most valuable features are its connectors. It has many out-of-the-box connectors. We use ADF for ETL processes. Our main use case involves integrating data from various databases, processing it, and loading it into the target database. ADF plays a crucial role in orchestrating these ETL workflows.
The tool's visual interface is good. The ADF scheduling feature impacts data management by determining when jobs must be run and setting up dependencies. This capability eliminates the need to rely on enterprise data scheduling tools.
What needs improvement?
When working with AWS, we have noticed that the difference between ADF and AWS is that AWS is more customer-focused. They're more responsive compared to any other company. ADF is not as good as AWS, but it should be. If AWS is ten out of ten, ADF is around eight out of ten. I think AWS is easier to understand from the GUI perspective compared to ADF.
For how long have I used the solution?
I have been using the product for 6 months.
What do I think about the stability of the solution?
ADF is stable.
What do I think about the scalability of the solution?
I rate the tool's scalability an eight out of ten.
How was the initial setup?
The tool's deployment is easy. The deployment typically takes around two to three days to set up. However, the duration may vary depending on factors such as the number of integrated endpoints. In our company, the deployment team had three to four people. This team consisted of an IT engineer, a network engineer, and an ETL admin.
We still haven't required much maintenance since we're still in the development phase. However, as time progresses and we move into production, we'll better understand the maintenance requirements.
What's my experience with pricing, setup cost, and licensing?
ADF is cheaper compared to AWS.
What other advice do I have?
The tool has met our projects' growing data needs effectively so far. I rate it an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Mar 21, 2024